Four years ago, I joined the startup Bonsai, which envisioned a new way of training AI agents. The idea was to enable subject matter experts to use their knowledge of a particular problem to teach an AI agent how to make decisions about it; in other words, to create an optimal control policy. Bonsai chose reinforcement learning (RL) as the first AI category to support, because we believed it could enable new use cases for automation and create significant value for future customers.
Before then, reinforcement learning had mostly received attention for teaching AI agents how to play Pong and other Atari classics. Then OpenAI released a library of example environments that could be used to learn about RL and benchmark the performance of the latest RL algorithms. Soon thereafter, simple physics-based environments were added, either as basic robotic systems or as games built on physics engines. At Bonsai, we accelerated that trend and focused on teaching AI agents to act as intelligent controllers for advanced control problems where traditional approaches fall short.
With a vision of developing an AI toolchain that enables engineers to add intelligence to their existing and future systems, we started to take a broader view of what might be possible. This goes back to the question: What is a controller? Is a supply chain or any other business process a controller? If so, could we make these controllers more intelligent?
I found a simulation model of the Beer Distribution Game, a standard supply chain model taught in business schools, on AnyLogic’s cloud platform and connected it to our deep reinforcement learning service to see if we could teach an AI agent a successful control policy for it. It worked! The trained agent easily beat our co-workers when playing the game in real time, and the result helped our sales team sign our first supply chain customer.
Fast forward to May 2020. After being acquired by Microsoft in 2018, Bonsai recently launched as a fully integrated Azure service at Microsoft’s Build developers conference. Over the past two years, we have continued to invest in our partnership with AnyLogic, and have learned more about how to use multi-method models with a deep reinforcement learning service such as ours.
If you’re not familiar with RL, it is based on framing a problem as a Markov decision process (MDP), in which an AI agent learns a control policy that picks the best possible action for any given state of the system. RL is particularly well suited to systems that are stochastic and dynamic, where a reward-based learning approach can outperform traditional control techniques. More details can be found here.
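To make this concrete, here is a minimal sketch of the agent–environment loop that underlies RL. The `env.reset()`/`env.step()` interface and the `RandomAgent` class are illustrative assumptions modeled on common RL toolkits, not the Project Bonsai API:

```python
import random

class RandomAgent:
    """Placeholder policy: picks actions at random, as training begins."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, state):
        return random.choice(self.actions)

    def learn(self, state, action, reward, next_state):
        pass  # a real RL algorithm would update its policy estimate here

def run_episode(env, agent, max_steps=1000):
    # Assumed gym-style environment: reset() returns a state,
    # step(action) returns (next_state, reward, done).
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                    # pick an action for the current state
        next_state, reward, done = env.step(action)  # the (simulated) system responds
        agent.learn(state, action, reward, next_state)
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```

Training consists of running many such episodes; the reward signal is the only feedback the agent needs to gradually improve its policy.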
Let’s look at an example: Activity-Based Costing Analysis (ABCA)
The example is a simple factory floor model in which the cost associated with processing each product is calculated and analyzed using activity-based costing (ABC). Each incoming product seizes some resources, is processed by a machine, is conveyed, and then releases the resources. The cost accumulated by a product is broken down into several categories for analysis and optimization.
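To illustrate that flow, here is a hedged sketch of such a factory process in Python using the open-source SimPy library. This is not the AnyLogic model itself; all parameter values, names, and cost categories below are assumptions chosen for illustration:

```python
import random
import simpy

PROCESS_TIME = 5.0    # minutes a machine spends per product (assumed)
CONVEY_TIME = 2.0     # minutes on the conveyor (assumed)
BUSY_COST_RATE = 1.0  # cost per minute while a resource is busy (assumed)

def product(env, name, machines, costs):
    """One product: seize a machine, process, convey, record ABC costs."""
    arrival = env.now
    with machines.request() as req:
        yield req                        # seize a machine resource
        yield env.timeout(PROCESS_TIME)  # processing step
    # resource is released when the 'with' block exits
    yield env.timeout(CONVEY_TIME)       # conveying step
    costs.append({
        "product": name,
        "processing": PROCESS_TIME * BUSY_COST_RATE,
        "conveying": CONVEY_TIME * BUSY_COST_RATE,
        "total_time": env.now - arrival,
    })

def source(env, machines, costs, interarrival=3.0):
    """Generate incoming products with exponential interarrival times."""
    i = 0
    while True:
        yield env.timeout(random.expovariate(1.0 / interarrival))
        i += 1
        env.process(product(env, f"product-{i}", machines, costs))

env = simpy.Environment()
machines = simpy.Resource(env, capacity=2)
costs = []
env.process(source(env, machines, costs))
env.run(until=100)
if costs:
    avg = sum(c["processing"] + c["conveying"] for c in costs) / len(costs)
    print(f"{len(costs)} products finished; average ABC cost {avg:.2f}")
```

The per-category cost breakdown at the end is what an ABC analysis works from, and it is exactly the quantity the AI agent is asked to optimize below.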
In this case, we’re training an AI agent to learn a policy for choosing factory floor parameters with the goal of optimizing product cost. The state is defined as the product arrival rate, the number of resources and their utilization, the idle and busy costs, the process time, and the speed of the conveyor belt. The action is defined as setting the number of resources, the process time, and the conveyor speed. The goal is to reduce the cost per product while maintaining a high overall throughput.

Users describe these parameters to the AI agent using machine teaching, then upload the simulation model and start training. The AI agent initially picks random values for the action, then assesses whether the resulting state is closer to the defined goal. Based on this assessment, the agent adjusts its actions and, over time, reaches the goals defined by the user. Depending on problem complexity, the agent may require anywhere from hundreds of thousands to millions of these iterations. Once the goal has been reached, the AI agent has learned to choose the best possible parameters for any given state of the production process. Enterprises could use a customized version of this simulation model in connection with the Project Bonsai service to optimize their factory floor processes. A sketch of how this state, action, and goal might be expressed follows below.
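Here is a hedged sketch, in Python, of how the state and action above might be typed and how the goal might be scored as a reward. The field names, constants, and reward shape are assumptions made for illustration; Project Bonsai lets users express goals declaratively through machine teaching rather than hand-coding a reward function:

```python
from dataclasses import dataclass

@dataclass
class FactoryState:
    """Observed state of the factory floor (illustrative fields)."""
    arrival_rate: float    # products arriving per unit time
    num_resources: int     # resources currently allocated
    utilization: float     # fraction of time resources are busy
    idle_cost: float       # accumulated cost of idle resources
    busy_cost: float       # accumulated cost of busy resources
    process_time: float    # machine processing time per product
    conveyor_speed: float  # current conveyor belt speed

@dataclass
class FactoryAction:
    """Parameters the agent sets at each decision step."""
    num_resources: int
    process_time: float
    conveyor_speed: float

def reward(cost_per_product: float, throughput: float,
           min_throughput: float = 50.0) -> float:
    """Reward low product cost while penalizing lost throughput.

    The functional form and constants are assumptions; they encode the
    stated goal of reducing cost per product while maintaining a high
    overall throughput.
    """
    throughput_penalty = max(0.0, min_throughput - throughput)
    return -cost_per_product - throughput_penalty
```

At each iteration the agent observes a `FactoryState`, emits a `FactoryAction`, and receives a reward of this general shape, which is all it needs to improve its policy over the hundreds of thousands of iterations described above.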
The Project Bonsai preview is now available. To get started, you can either access it directly or request engagement from an expert through this form. The model used in this example and more information can be found on the AnyLogic website.
To learn more, please join AnyLogic and our Microsoft team for a webinar showcasing how you can bring deep reinforcement learning to practical business applications in a series of concise and easy-to-follow steps. In this webinar, we’ll demonstrate how an AnyLogic model can be transformed into an RL-ready model and used as the training environment (simulator) in Project Bonsai.