Wednesday, July 12, 2017

Part 6: Reinforcement Learning

Reinforcement Learning is a branch of Machine Learning that is closely related to online learning: the model keeps learning from data as it arrives instead of from a fixed dataset.

It is used to solve interactive problems, where the data observed up to round t is used to decide which action to take at round t + 1.
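To make the round-by-round idea concrete, here is a minimal Python sketch of the interaction loop. It assumes a simulated environment in which each action pays a reward of 1 with a fixed, hypothetical probability (`true_rates`), and it uses a naive greedy rule (pick the action with the best average reward so far) purely for illustration. The function name and parameters are made up for this example.

```python
import random


def run_interaction_loop(true_rates, n_rounds, seed=0):
    """Hypothetical sketch: the action at round t + 1 is chosen only from
    the rewards observed in rounds 1..t (naive greedy rule)."""
    rng = random.Random(seed)
    n_actions = len(true_rates)
    counts = [0] * n_actions    # times each action has been taken so far
    totals = [0.0] * n_actions  # summed reward per action so far
    history = []                # (action, reward) pairs observed so far
    for t in range(n_rounds):
        if t < n_actions:
            action = t  # try every action once before comparing averages
        else:
            # decision uses only data gathered in earlier rounds
            action = max(range(n_actions), key=lambda a: totals[a] / counts[a])
        # simulated environment: reward 1 with probability true_rates[action]
        reward = 1 if rng.random() < true_rates[action] else 0
        counts[action] += 1
        totals[action] += reward
        history.append((action, reward))
    return history
```

For example, `run_interaction_loop([0.0, 1.0], 10)` tries action 0 and action 1 once, then keeps choosing action 1 because it is the only one that has paid off. A purely greedy rule like this can lock onto a poor action early, which is the exploration problem that UCB and Thompson Sampling are designed to address.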

Reinforcement Learning lets a machine or software agent learn its behaviour from feedback from the environment. This behaviour can be learnt once and for all, or keep adapting over time (e.g. self-driving cars).

It is also used in Artificial Intelligence to train machines to perform tasks such as walking. Desired outcomes give the AI a reward and undesired outcomes a punishment, so the machine learns through trial and error. For example, consider teaching a dog a new trick: you cannot tell it what to do, but you can reward it when it does the right thing and punish it when it does the wrong thing. The dog has to figure out which of its actions earned the reward or punishment; this is known as the credit assignment problem. A similar method can train computers to do many tasks, such as playing backgammon or chess, scheduling jobs, and controlling robot limbs.

Two Reinforcement Learning algorithms:
  • Upper Confidence Bound (UCB)
  • Thompson Sampling

Differences:

UCB:

  • Deterministic
  • Requires an update at every round
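A minimal sketch of UCB, assuming the common UCB1 variant (the post does not give a formula): each action's score is its average reward plus the bonus sqrt(2 ln n / n_i), where n is the number of rounds played so far and n_i is the number of times action i was chosen. Rewards come from a hypothetical pre-generated table `rewards[t][i]` (1 if action i would pay off at round t), and the function name is likewise made up. Given the same table, the sequence of choices is always the same, which is what "deterministic" means here, and the counts are updated after every round.

```python
import math


def ucb1(rewards):
    """UCB1 sketch. rewards[t][i] is the hypothetical 0/1 reward that
    action i would give at round t. Returns the list of chosen actions."""
    n_rounds = len(rewards)
    n_actions = len(rewards[0])
    counts = [0] * n_actions   # n_i: times action i was chosen
    sums = [0.0] * n_actions   # total reward collected from action i
    selected = []
    for t in range(n_rounds):  # t = rounds played so far
        best_action, best_bound = 0, -1.0
        for i in range(n_actions):
            if counts[i] == 0:
                bound = math.inf  # untried actions are chosen first
            else:
                mean = sums[i] / counts[i]
                bound = mean + math.sqrt(2 * math.log(t) / counts[i])
            if bound > best_bound:  # ties go to the lowest index
                best_action, best_bound = i, bound
        selected.append(best_action)
        # update after every round, before the next decision
        counts[best_action] += 1
        sums[best_action] += rewards[t][best_action]
    return selected
```

With a table where only action 2 ever pays off, the first three rounds try actions 0, 1 and 2 in turn, after which the bonus term shrinks for the losing actions and action 2 is chosen in most rounds.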

Thompson Sampling:
  • Probabilistic
  • Can accommodate delayed feedback
  • Better empirical evidence
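A matching sketch of Thompson Sampling for 0/1 rewards, assuming Beta(1, 1) priors (a common choice, not stated in the post). Each round it draws one plausible success rate per action from that action's Beta posterior with Python's `random.Random.betavariate` and plays the action with the highest draw, so the choice is probabilistic. The `rewards[t][i]` table (1 if action i would pay off at round t), the function name and the `seed` parameter are hypothetical.

```python
import random


def thompson_sampling(rewards, seed=0):
    """Thompson Sampling sketch for 0/1 rewards with Beta(1, 1) priors.
    rewards[t][i] is the hypothetical reward action i gives at round t."""
    rng = random.Random(seed)
    n_actions = len(rewards[0])
    successes = [0] * n_actions  # rounds where action i paid 1
    failures = [0] * n_actions   # rounds where action i paid 0
    selected = []
    for t in range(len(rewards)):
        # sample a plausible success rate for each action from Beta(s+1, f+1)
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(n_actions)]
        action = max(range(n_actions), key=lambda i: draws[i])
        selected.append(action)
        if rewards[t][action] == 1:
            successes[action] += 1
        else:
            failures[action] += 1
    return selected
```

Because each decision comes from random draws rather than a fixed score, the success and failure counts could be updated in batches (for example once per hour of clicks) and the algorithm would still spread its choices across actions in the meantime; that is one way to read the "can accommodate delayed feedback" point above. A deterministic rule such as UCB would instead repeat the same choice until its counts change.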


Hope this helps!!

Arun Manglick