Deep Reinforcement Learning


History of Deep Reinforcement Learning

Academic researchers have been developing the foundations of deep reinforcement learning for about 60 years:

  • In 1956, Richard Bellman proposed the dynamic programming equation (the Bellman equation), which was later used to update the Q-table.

The Bellman equation is a necessary condition for optimality in mathematical optimization. It expresses the value of a decision problem at a given point in time in terms of the immediate payoff and the value of the remaining decision problem that results from the initial choices.
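The Bellman-equation idea described above can be sketched as a tabular Q-learning update, in which each Q-value is moved toward the immediate reward plus the discounted value of the best next action. All names and numbers here (n_states, n_actions, alpha, gamma, the example transition) are illustrative, not taken from the original text.

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.9          # learning rate, discount factor

# Q-table: one value per (state, action) pair, initialized to zero.
Q = np.zeros((n_states, n_actions))

def bellman_update(state, action, reward, next_state):
    """Move Q(s, a) toward the Bellman target: r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])

# Hypothetical transition: action 0 in state 0 yields reward 1.0, lands in state 1.
bellman_update(0, 0, 1.0, 1)
```

Repeating this update over many observed transitions makes the table converge toward the optimal values, which is exactly the "update the Q-table" use of Bellman's equation mentioned above.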

[Application]

  • A well-known application of this equation is Robert C. Merton's intertemporal capital asset pricing model. In the solution to Merton's theoretical model, investors choose between income today and future income or capital gains, using a form of Bellman's equation.
  • Nancy Stokey, Robert E. Lucas, and Edward Prescott described stochastic and non-stochastic dynamic programming in considerable detail, which led to various applications in economics, such as optimal economic growth, resource extraction, principal–agent problems, public finance, business investment, asset pricing, factor supply, and industrial organization.
  • In the 1970s, Harry Klopf argued in a series of reports that a system was needed that could genuinely learn, rather than merely memorize given examples as supervised learning did.
  • In 2014, David Silver and colleagues proposed deterministic policy gradient methods. This technique addressed problems that occurred in traditional approaches; for example, continuous states and actions could be handled without excessive complexity.
  • In 2016, Richard Sutton and Andrew Barto released a draft of the second edition of Reinforcement Learning: An Introduction.
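The policy gradient idea mentioned in the timeline above can be sketched with a minimal REINFORCE-style update on a two-armed bandit. Everything here (the softmax parameterization, the learning rate, the reward values, the iteration count) is a hypothetical toy setup chosen for illustration, not drawn from Silver's paper.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.zeros(2)              # preferences over two actions
lr = 0.1                         # learning rate

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward(action):
    # Hypothetical bandit: action 1 pays 1.0, action 0 pays nothing.
    return 1.0 if action == 1 else 0.0

for _ in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = reward(a)
    # REINFORCE gradient for a softmax policy: (one_hot(a) - probs) * reward
    grad = -probs
    grad[a] += 1.0
    theta += lr * r * grad
```

The update nudges the preference for rewarded actions upward directly, with no Q-table; this is what lets policy gradient methods extend naturally to continuous states and actions.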

Applications of Deep Reinforcement Learning

Reinforcement learning has been used to solve problems that involve a trade-off between long-term and short-term rewards. The framework has been applied to various real-world cases, such as robot control, robotics, inventory management, resource allocation, finance, and so on.

  • Robot Control: iterative learning control[1] is applied in robotic systems, using a model of the dynamics to correct errors in trajectories. However, most industrial robotic systems still repeat a fixed motion with simple or no perception.

([1] Bristow, Douglas, Marina Tharayil, and Andrew G. Alleyne. "A Survey of Iterative Learning Control.")


  • Robotics: a reinforcement learning framework is applied to control the torque at the joints of robots that are supposed to move as humans or animals do. The system observes each of the robot's actions through various sensors and rewards correct decisions, such as navigating to an accurate target location.

[pic 1]

* image source: Boston Dynamics

  • Inventory Management: for manufacturing companies, the inventory control system is critical. Reinforcement learning helps determine how much inventory a company should purchase depending on the current inventory status.
  • Resource Allocation: in a call center, the manager is responsible for allocating human resources, the call operators, efficiently. RL helps the operation assign the right operators on the basis of 'who to serve first'.
  • Finance: RL is applied in capital markets every day. Especially for investment decisions, portfolio design, and option/asset pricing, a typical RL framework observes the real market and assigns rewards to different investment trade-offs.
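As a concrete illustration of the inventory-management case above, here is a minimal sketch of Q-learning choosing daily order quantities for a single item. The environment is a toy: demand distribution, price, holding cost, and stock capacity are all hypothetical numbers chosen for the example, not figures from the original text.

```python
import random

random.seed(0)

MAX_STOCK = 5
ACTIONS = range(MAX_STOCK + 1)     # how many units to order today
PRICE, HOLD_COST = 2.0, 0.5        # hypothetical unit price and holding cost
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

Q = {(s, a): 0.0 for s in range(MAX_STOCK + 1) for a in ACTIONS}

def step(stock, order):
    """One day: receive the order, observe random demand, sell, pay holding cost."""
    stock = min(stock + order, MAX_STOCK)
    demand = random.randint(0, 3)          # hypothetical daily demand
    sold = min(stock, demand)
    stock -= sold
    reward = PRICE * sold - HOLD_COST * stock
    return stock, reward

stock = 0
for _ in range(20000):
    # Epsilon-greedy: mostly order what currently looks best, sometimes explore.
    if random.random() < eps:
        a = random.choice(list(ACTIONS))
    else:
        a = max(ACTIONS, key=lambda x: Q[(stock, x)])
    nxt, r = step(stock, a)
    best_next = max(Q[(nxt, x)] for x in ACTIONS)
    Q[(stock, a)] += alpha * (r + gamma * best_next - Q[(stock, a)])
    stock = nxt
```

After training, reading the greedy action out of the Q-table for each stock level gives an ordering policy, which is the "how much to purchase depending on inventory status" decision described above.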
