Rapid service deployment and modern DevOps practices have made service development more agile and bold when it comes to introducing new features to users. However, the impact of a new feature is often unknown until the service is running in a production environment, exposed to all of your users.
By taking advantage of DevOps patterns such as introducing an edge layer to route requests, managing traffic at a specific threshold, and deploying services in a decoupled fashion, we can build A/B experimentation with the basic tools already available to us. This edge layer acts as a gateway router, coordinating traffic to the various instances of your application in the cloud.
An edge layer not only proxies incoming requests; it can also serve as an API gateway that abstracts away the potentially complex remote API calls made by clients on various platforms. Leveraging this layer allows us to perform weighted experimentation on users, helping the service provider determine whether a newly introduced feature has a meaningful, positive impact.
Aside from experience implementing bot services using the Microsoft Bot Framework and basic familiarity with deploying services through Microsoft Azure, implementing A/B bot experimentation carries little technical overhead.
The Bot Messaging Workflow
To begin our discussion, we will use a simple bot service currently deployed in the cloud. This service uses the minimal architecture required for user interaction. The assumption is that this service is deployed on Microsoft Azure and is situated across two geographical regions: West US and East US.
In order to manage traffic, Azure Traffic Manager is used to route bot service requests at the DNS level to the closest and most performant geographical region relative to the client.
Each bot service exposes a REST API endpoint. The POST endpoint /api/messages allows messages to be sent to the bot service and handled asynchronously by the Bot Framework.
The Microsoft Azure Bot Service manages the channels that invoke these API calls through the bot registration and configuration page. These channels can be Skype, Slack, Facebook Messenger, or the web. With Traffic Manager configured at the root / endpoint, we can simply change the bot registration API endpoint to the Traffic Manager endpoint so that incoming messages are routed to the service closest to the user.
Upon receipt of a message, the bot service immediately responds synchronously with HTTP 202 Accepted. The message itself is handled asynchronously: the bot application service makes a REST POST call to the Bot Framework with a payload containing the information needed to send the response back to the sender — which user or conversation to reply to, and the message response that the participants of that conversation will see.
If implemented well and performing optimally, the user sees the response in their chat connector channel as if it were one synchronous flow.
The key observation is that this composed architecture of sending, receiving, and responding to messages is decoupled and handled in a distributed fashion.
Exploiting this model enables a powerful experimentation platform that not only exposes many experimental instances, but also allows new features to be released within the service at a much higher velocity. All of this can be done with just a few modifications and additions to the underlying infrastructure of your bot service.
The Edge Service
“All problems in computer science can be solved by another level of indirection.” — David Wheeler
To enable an experimentation platform, we give the bot registration configuration another level of indirection by pointing its API endpoint at the edge service, which effectively serves as a proxy. This edge service provides the logical processing to route each request according to our A/B testing needs.
Our edge layer may initially begin its life as a single application service exposed to the bot service registration. We will refer to Service A as the “stable” service and Service B as the “experimental” service.
In this architecture, the edge service forwards requests, based on a weight, to instances sitting in either the stable service or the experimental service. The edge service is not aware of the endpoints within the bot application services themselves. Rather, it only knows the Traffic Manager endpoints, which abstract away the specific service instance endpoints.
Once the application instances receive the requests, they perform asynchronous invocations to the chat service itself. Upon invocation, the application services are no longer responsible for delivering the message; that responsibility is offloaded to the Bot Framework.
How do we decide how to route traffic? Suppose we arbitrarily decide that 10% of all users should be routed to the experimental service.
To maintain some randomness in determining which users to route, we introduce a probabilistic routing function. Routing must also be consistent: users initially deemed “experimental” must stay experimental for the duration of the experiment.
The basic approach is to generate a random number and determine whether the result falls within the range that routes the user to the experimental instance. If 10% is the chosen threshold, a generated result in the range 0 to 0.1 routes the user to the experimental instance; otherwise, the stable instance is chosen.
To persist the routing decision for a user's future requests, we must also maintain some state keyed on an attribute associated with the user. Something simple, such as hashing the conversation ID and adding it to a set for the life of the experiment, is a good approach. Note that the decision of what to hash is purely based on the logic of the service and the needs of the experiment.
We also use two in-memory buckets to track whether a conversation is stable or experimental. These buckets persist for the duration of the server process. Alternatively, a persistent data store can be introduced for robustness; this is discussed in the next section.
With all this decided, our function can then be written similarly to the one shown below.
Our endpoint handler will then make the call to retrieve the route with the conversation ID passed in as a parameter:
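A sketch of that lookup, kept framework-agnostic for clarity — resolveTarget is a hypothetical helper, and the activity shape follows the Bot Framework's conversation.id field:

```javascript
// Given a parsed Bot Framework activity (req.body) and a routing function,
// return the proxy target for this conversation.
function resolveTarget(activity, getRoute) {
  const conversationId =
    activity && activity.conversation && activity.conversation.id
      ? activity.conversation.id
      : 'unknown';
  return getRoute(conversationId);
}

// Inside the POST /api/messages handler we would then do something like:
//   const target = resolveTarget(req.body, getRoute);
//   proxy.web(req, res, { target });
```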
We also use http-proxy to proxy the request and manipulate it as our experimentation needs dictate. By handling the proxyReq event, we can shape the request however we want before forwarding it to the target endpoint. Note that we must do this regardless, because body-parser exhausts the data stream when converting the incoming request body to JSON; we must create a new body in order to stream it to the target endpoint.
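A sketch of such a proxyReq handler, assuming body-parser's JSON middleware has already populated req.body. It would be registered on an http-proxy instance with proxy.on('proxyReq', restreamBody):

```javascript
// Re-serialize the JSON body that body-parser consumed and write it to the
// outgoing proxied request, fixing up the headers to match the new payload.
function restreamBody(proxyReq, req, res, options) {
  if (!req.body || !Object.keys(req.body).length) return;
  const bodyData = JSON.stringify(req.body);
  proxyReq.setHeader('Content-Type', 'application/json');
  proxyReq.setHeader('Content-Length', Buffer.byteLength(bodyData));
  proxyReq.write(bodyData);
}
```

Resetting Content-Length is important here: if the handler mutated req.body before forwarding, the original header would no longer match the serialized payload.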
If we are not satisfied with in-memory bucketing to track which endpoint each conversation is routed to, what architectural changes are needed? We must introduce a persistence component at the edge layer.
Our architecture has now become more interesting. To save the hashed conversation ID values needed to route each user consistently to the appropriate service, we can either keep this data in memory within the edge service instance itself, or introduce a temporary but fast data store. Let us choose the latter.
Persistence with Redis Cache
A Redis cache backing our edge service can simply serve as a set that stores the information we need to determine whether a user should be routed to a specific instance.
Architecturally, the Redis cache sits on the same level as our edge service. The edge service will then invoke calls to set and get values to “know” where to route traffic to.
We can then maintain this data for the duration of the testing period and flush the cache when the lifetime of the experiment has been reached.
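A sketch of the bucketed lookup against an async set interface. Here makeMemoryCache stands in for a Redis client so the example is self-contained; in production, sadd and sismember would map to Redis SADD and SISMEMBER via a client library, and flushing the experiment would be a DEL of both keys:

```javascript
const EXPERIMENT_THRESHOLD = 0.1; // illustrative 10% threshold

// In-memory stand-in exposing the two Redis set operations we need.
function makeMemoryCache() {
  const sets = new Map();
  return {
    async sadd(key, member) {
      if (!sets.has(key)) sets.set(key, new Set());
      sets.get(key).add(member);
    },
    async sismember(key, member) {
      return sets.has(key) && sets.get(key).has(member) ? 1 : 0;
    },
  };
}

// Same sticky-routing logic as before, but backed by the external store.
async function getRouteCached(cache, conversationId) {
  if (await cache.sismember('experimental', conversationId)) return 'experimental';
  if (await cache.sismember('stable', conversationId)) return 'stable';
  const bucket = Math.random() < EXPERIMENT_THRESHOLD ? 'experimental' : 'stable';
  await cache.sadd(bucket, conversationId);
  return bucket;
}
```

Because the store now lives outside the edge service process, the routing decisions survive restarts and can be shared across multiple edge instances.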
The following video shows a conversation that has been deemed “experimental”. Interacting with the chat bot, the user receives the “experimental” experience.
At any time, we can choose to turn off experimentation, which forces the user to go back to interacting with the bot in the “stable” experience. This is seamless.
And now, the user is interacting with the chat bot in its original form.
We can also choose to, at will, force everyone to be experimental. This is akin to blue/green deployment where we route all traffic to the new production instance of a service.
And now, we can see that everyone has been fully migrated.
Although this is a simple and powerful pattern, there are considerations to keep in mind when designing the A/B experimentation infrastructure for your bot service.
The first and most obvious is that we must be careful not to put too much logic into this service. Because of the convenience of being exposed at the top-most layer of the architecture, the service can easily start doing more than it was intended to.
Also consider that this pattern adds latency to servicing a request: there are additional hops to route the request, on top of the processing and cache queries already performed at this level.
Monitoring latency is critical to optimizing the service. Any new service is also another potential point of failure, and since this edge layer is the gateway to your bot service, its failure can be catastrophic. Ensuring the service meets your defined SLA, and maintaining and monitoring its health, is key to success.
Perform load testing on your edge layer and have solutions ready for handling large amounts of traffic. Queueing requests, backing off, or throttling are basic mitigations for traffic spikes.
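As one example of the throttling idea, a fixed-window limiter can be sketched in a few lines. This is illustrative only, not production-grade (a real deployment would want a sliding window and shared state across instances):

```javascript
// Allow at most `limit` requests per `windowMs` within this process;
// returns a function that reports whether the current request may proceed.
function makeThrottle(limit, windowMs) {
  let windowStart = Date.now();
  let count = 0;
  return function allow() {
    const now = Date.now();
    if (now - windowStart >= windowMs) {
      windowStart = now; // window expired: start a fresh one
      count = 0;
    }
    count += 1;
    return count <= limit;
  };
}
```

The edge handler would call allow() before proxying and respond with HTTP 429 when it returns false.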
Finally, if your service makes use of globally shared resources such as schedulers, storage accounts, or document data stores, think through your use cases and carefully architect your logic so that experimental and stable services can safely share these resources.
Where to Go From Here?
It is quite possible to adapt this pattern to be more robust for your service’s needs. Try out the following:
- Dynamically scale the threshold used to probabilistically route the initial request from each user: 10%, to 20%, to 50%, and so on.
- Run N experiments at once; we do not have to be limited to A/B.
- Introduce monitoring of services within this edge service, along with traffic routing for advanced DevOps-style blue/green deployments.
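As a starting point for the N-experiment idea, the single 10% threshold generalizes to a weighted pick across named variants. A sketch, where the weights map (summing to 1) is an illustrative assumption:

```javascript
// Pick a variant name according to its weight, e.g.
// pickVariant({ stable: 0.8, experimentA: 0.1, experimentB: 0.1 }).
function pickVariant(weights) {
  const r = Math.random();
  let cumulative = 0;
  for (const [name, weight] of Object.entries(weights)) {
    cumulative += weight;
    if (r < cumulative) return name;
  }
  // Fallback for floating-point rounding at the top of the range.
  return Object.keys(weights)[0];
}
```

The same sticky-bucket persistence described earlier applies unchanged: once a conversation is assigned a variant, record it so later requests skip the draw.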
There is so much potential for chat bots to eventually become useful conversationalists with humans. More experimentation, and more patterns enabling the study of how new features impact these chat bots, will help realize the potential of one day deploying the “killer app” the chat bot ecosystem currently needs.
Leveraging Microsoft Azure cloud services and the Microsoft Bot Framework to develop these chat bots only makes it easier for us as software developers in the coming age of Artificial Intelligence, Natural Language Processing, and the Internet of Things.