Reinforcement Learning

machine learning

I recently went to a talk on Reinforcement Learning (RL) at my local .NET development group, and I found the topic fascinating. For those not familiar with it, reinforcement learning is a type of machine learning in which the algorithm learns what to do from feedback it gathers over time by interacting with its environment. It is usually treated as its own category alongside supervised and unsupervised learning: instead of being given a set of labeled training data, it uses a reward system and many repetitions to learn a desired behavior. It became a hot topic in 2016 after AlphaGo, an AI built in part on reinforcement learning, beat Lee Sedol, one of the world's top Go players, in 4 out of 5 games. Because of the complexity of Go, many observers had not expected this to happen for at least another decade.

Reinforcement Learning Points

While reinforcement learning is very good at solving certain specific types of problems, it doesn't yet seem to be a replacement for many of the other, better-developed AI techniques. For the most part, if you can solve a problem with another type of AI, that approach will give you a better solution, for now. As a practical tool, reinforcement learning is still relatively young and has a long way to go before it bests the more mature methods.

Think of a supply chain application that could leverage RL. Reinforcement learning is about taking suitable actions to maximize reward in a particular situation. Software and machines use it to find the best possible behavior or path to take in a specific situation. It differs from supervised learning in that supervised training data comes with an answer key, so the model is trained on the correct answers themselves. In reinforcement learning there is no answer key; the agent decides for itself what to do to perform the given task and, in the absence of a training dataset, has to learn from its own experience.
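To make "learning from experience" a little more concrete, here is a minimal sketch of tabular Q-learning, one common RL algorithm. Everything in it is an assumption made for illustration: the five-position corridor, the `step` function, and the parameter values are hypothetical and have nothing to do with a real supply chain. The agent is never told the right answer; it only sees a reward of 1.0 when it happens to reach the goal.

```python
import random

N_STATES = 5          # hypothetical corridor: positions 0..4, position 4 is the goal
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Toy environment: move along the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values purely from trial, error, and reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)             # explore: try a random action
            else:
                best = max(q[state])             # exploit: best known action,
                a = rng.choice([i for i in range(2) if q[state][i] == best])  # ties at random
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward reward + discounted future value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Best learned action for each non-goal position."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]

if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the learned policy is to step right from every position, even though no training data ever said so. A real application would need a far richer environment and reward design than this toy.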

RL takes a very different approach to training models than other machine learning (ML) methods. Its superpower is that it can learn very complex behaviors without requiring any labeled training data, and it can make short-term decisions while optimizing for a longer-term goal. Wow! Applied to a supply chain, the idea borrows from how AlphaGo was built: a neural network is trained extensively, on both human and computer decisions, to predict better supply chain decisions and selections. That network then strengthens a tree search over the possible decisions, resulting in higher-quality decision selection and improved outcomes in the next iteration.
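The balance between short-term decisions and a longer-term goal is usually expressed as a discounted return: each future reward is multiplied by a discount factor (often written gamma) once per step of delay. The sketch below is only an illustration; the two reward sequences and the name `discounted_return` are made up for this example.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where a reward t steps in the future is scaled by gamma**t."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Two hypothetical plans: take a small reward now, or wait for a bigger one later.
quick_win = [5, 0, 0, 0]
patient = [0, 0, 0, 10]

if __name__ == "__main__":
    for gamma in (0.9, 0.5):
        print(gamma, discounted_return(quick_win, gamma), discounted_return(patient, gamma))
```

With a discount factor of 0.9 the patient plan is worth more (about 7.29 against 5.0), while with 0.5 the quick win comes out ahead (5.0 against 1.25). Choosing that factor is how you tell the agent how far ahead to care.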

So how do you dive into a topic like reinforcement learning and get lots of people playing with it and developing it? I think Amazon has the right idea with AWS DeepRacer, a fully autonomous 1/18th scale race car for developers, driven by reinforcement learning. It makes coding fun, and programmers will want to play with it. I know because I'm a programmer, and I am interested in getting one.