Reinforcement Learning

Reinforcement learning algorithms resemble optimization algorithms: they search for a winning strategy that earns the maximum reward while pursuing a given objective. The technique is reward-driven. The model learns how to act in an environment by trial and error, with the aim of finding a strategy it can reuse in the future. It learns from the consequences of its own actions rather than depending on a fixed, predefined policy.

Reinforcement learning is a type of machine learning that is well suited to complex problems in robotics and process control. Reinforcement learning algorithms are fairly complex, and applying them correctly in real-world applications requires considerable expertise.

Because the model learns from a series of actions and rewards, the sequence of events plays a vital role in Reinforcement Learning. What the agent ultimately earns depends not just on its current state but on the whole sequence of states and actions it goes through. Unlike in supervised and unsupervised learning, time, in the sense of the order in which decisions are made, is therefore very important in Reinforcement Learning.

Components of Reinforcement Learning:


The agent is the learner that interacts with the environment and selects which actions to perform. Its job is to maximize the rewards it receives: after each transition from one state to another, it gets either a reward or a penalty.


The environment is the agent's whole world, in which it lives and acts. The agent interacts with various objects in the environment, and the environment defines the particular set of tasks the agent has to accomplish.


An action is what the agent does in the environment; the agent performs actions in order to earn rewards for completing a task successfully.


A reward is the feedback the agent receives from the environment after successfully completing a given task, and it is what the agent learns from. Rewards are numeric; a penalty can be treated as a negative reward.


The state is the current situation of the agent in the environment.

Working of Reinforcement Learning


Rather than learning from a fixed dataset, a reinforcement learning algorithm learns from its mistakes: through trial and error it collects its own experience, which it then uses to make better decisions in the future.

Here is a real-world example that helps explain how reinforcement learning algorithms work:

Suppose a cat is the agent, and the environment it is exposed to is our house.

Each situation the cat can be in inside the house corresponds to a state. For example, a state could be the moment you call your cat by its name because you want it to perform a specific action.

The cat (agent) reacts by performing an action to transition from one state to another state.

After the transition, our agent receives a reward or penalty for its successful or unsuccessful action respectively.

In this way, the cat (agent) learns the best strategy by choosing the best action in each state, and it uses that strategy to achieve better outcomes in similar situations in the future.
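The cat example can be turned into a small trial-and-error program. This is a simplified sketch, not a full reinforcement learning algorithm: the states, actions, reward values, and names (REWARDS, ACTIONS, train_cat) are hypothetical, each choice is treated on its own (ignoring how one action changes later situations), and the explore-or-exploit rule (epsilon-greedy) is our assumption rather than part of the example. The cat keeps a running average of the reward each action earned in each state and mostly picks the action with the best average.

```python
import random

# Hypothetical reward table: (state, action) -> reward.
# Positive values are rewards (e.g. a treat), negative values are penalties.
REWARDS = {
    ("called_by_name", "ignore"): -1.0,
    ("called_by_name", "come"): 1.0,
    ("food_bowl_empty", "scratch_sofa"): -1.0,
    ("food_bowl_empty", "wait"): 0.5,
}
# Actions available in each state. The worse action is listed first, so the
# cat initially tries it and has to learn from the penalty.
ACTIONS = {
    "called_by_name": ["ignore", "come"],
    "food_bowl_empty": ["scratch_sofa", "wait"],
}


def train_cat(trials=500, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the most reward in each state."""
    rng = random.Random(seed)
    value = {pair: 0.0 for pair in REWARDS}  # estimated reward of each (state, action)
    count = {pair: 0 for pair in REWARDS}    # how often each pair has been tried
    for _ in range(trials):
        state = rng.choice(list(ACTIONS))    # the house presents a situation
        if rng.random() < epsilon:           # explore: try a random action
            action = rng.choice(ACTIONS[state])
        else:                                # exploit: best action found so far
            action = max(ACTIONS[state], key=lambda a: value[(state, a)])
        reward = REWARDS[(state, action)]    # reward or penalty for the action
        count[(state, action)] += 1
        # Incremental average of the rewards observed for this pair.
        value[(state, action)] += (reward - value[(state, action)]) / count[(state, action)]
    # The learned strategy: the highest-valued action in each state.
    return {s: max(acts, key=lambda a: value[(s, a)]) for s, acts in ACTIONS.items()}


print(train_cat())  # {'called_by_name': 'come', 'food_bowl_empty': 'wait'}
```

Because this version ignores how an action affects the situations that follow, it only captures the reward-and-penalty part of the story; methods such as Q-learning also account for future rewards.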

Applications of Reinforcement Learning

Reinforcement Learning is used to solve various real-world problems. It has a lot of potential for decision-making in domains where prediction is very complex. Below are some major applications of Reinforcement Learning.

  • Robotics: robotics has varied application domains, ranging from manufacturing to inventory management.
  • Power Generation and System Optimizations
  • Finance
  • Recommender Systems


Advantages of Reinforcement Learning

  • Well suited to teaching robots human-like physical tasks.
  • Learns from its own errors and is self-sufficient in correcting them.
  • Helps in solving real-world problems with a high degree of complexity.

OpenAI Gym

OpenAI Gym is an open-source Python library that provides a standard API for reinforcement learning, together with a collection of environments on which reinforcement learning algorithms can be implemented and tested.
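Gym-style environments share a simple interaction loop: reset() starts an episode and step(action) applies an action and reports what happened. The sketch below imitates that loop with a hypothetical toy environment (ToyLineEnv) so that it runs without installing anything. The return values follow the convention of Gymnasium, the maintained successor of Gym: reset() returns (observation, info) and step() returns (observation, reward, terminated, truncated, info); older Gym releases returned four values from step(). With Gymnasium installed, an environment created by gymnasium.make("FrozenLake-v1") exposes the same two methods, so run_episode (also a hypothetical helper) should work with it given a policy that returns valid actions for that environment.

```python
import random


class ToyLineEnv:
    """Hypothetical environment with a Gymnasium-style reset()/step() interface.

    The agent starts at position 0 and must reach the last position of a line.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self, seed=None):
        # seed is accepted only to mirror the Gymnasium signature; it is unused here.
        self.position = 0
        self.steps = 0
        return self.position, {}  # (observation, info)

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the position is clamped to the line.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        terminated = self.position == self.length - 1  # goal reached
        truncated = self.steps >= self.max_steps        # step limit hit
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Play one episode with the given policy and return the total reward."""
    observation, info = env.reset()
    total_reward = 0.0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            return total_reward


env = ToyLineEnv()
print(run_episode(env, policy=lambda obs: 1))  # always move right: prints 1.0
rng = random.Random(0)
print(run_episode(env, policy=lambda obs: rng.choice([0, 1])))  # random policy
```

Separating "terminated" (the task ended) from "truncated" (a time limit cut the episode short) lets a learning algorithm treat the two cases differently.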

The gym library provides an easy-to-use suite of reinforcement learning tasks for Python. Please follow this link to get more information about using OpenAI Gym:

A basic way to show the performance of a reinforcement learning algorithm is to plot the cumulative reward, that is, the sum of all rewards received so far, as a function of the number of steps. One algorithm can be called superior to another only when its curve stays consistently above the other's.
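The comparison rule can be made concrete by computing each algorithm's cumulative reward after every step and checking that one curve never falls below the other. The reward sequences below are hypothetical numbers for illustration, not results from any real algorithm; is_consistently_above is a made-up helper, and "consistently above" is interpreted here as "at least as high at every step and strictly higher at some step".

```python
from itertools import accumulate


def cumulative_rewards(rewards):
    """Running total of rewards: the value plotted at each step."""
    return list(accumulate(rewards))


def is_consistently_above(curve_a, curve_b):
    """True if curve_a is never below curve_b and is strictly above it somewhere."""
    pairs = list(zip(curve_a, curve_b))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


# Hypothetical per-step rewards of two algorithms on the same task.
rewards_a = [0, 1, 1, 0, 1, 1]
rewards_b = [0, 0, 1, 0, 1, 0]

curve_a = cumulative_rewards(rewards_a)  # [0, 1, 2, 2, 3, 4]
curve_b = cumulative_rewards(rewards_b)  # [0, 0, 1, 1, 2, 2]
print(is_consistently_above(curve_a, curve_b))  # True
print(is_consistently_above(curve_b, curve_a))  # False
```

Plotting curve_a and curve_b against the step index (for example with matplotlib) gives the chart described above.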

Some approaches that can be used to implement reinforcement learning are Q-Learning and State-Action-Reward-State-Action (SARSA). Both operate within the framework of a Markov Decision Process (MDP), which should not be confused with a Hidden Markov Model. The Q-Learning method is briefly discussed below.


Q-learning is a model-free, value-based reinforcement learning algorithm. It can also be viewed as a form of asynchronous dynamic programming (DP).

Being value-based, it estimates how good an action is in a particular state, unlike policy-based methods, which directly learn what action to take in each state. Being model-free, it does not need to know beforehand how the environment will respond to those actions.

The Q-function provides the values stored in the cells of a Q-table. It takes a state-action pair as its input and is defined by the Bellman equation.
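For reference, the Bellman optimality equation for the Q-function and the Q-learning update rule derived from it are commonly written as follows. The notation is standard but is an addition to this article: s and a are the current state and action, r is the reward received, s' is the next state, γ (between 0 and 1) is a discount factor that weights future rewards, and α (between 0 and 1) is the learning rate.

```latex
Q^*(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s', a') \;\middle|\; s, a \,\right]

Q(s, a) \leftarrow Q(s, a) + \alpha \left[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\right]
```

The bracketed term in the update is the difference between what the table predicted and what was actually observed, which is the comparison the next paragraph describes.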

The Q-learning algorithm predicts the value of each state-action pair and stores it as a Q-value. After each step, it compares this prediction with what was actually observed (the reward plus the estimated value of the next state) and updates the Q-value accordingly. This helps it make better predictions in the future.
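A single Q-value update can be written as a small Python function. This is a sketch assuming a dictionary-based Q-table keyed by (state, action); q_update, alpha (learning rate), gamma (discount factor), and terminal are illustrative names and parameters not taken from the text. The error term is exactly the comparison between the predicted Q-value and the observed reward plus the estimated value of the next state.

```python
def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9, terminal=False):
    """Apply one Q-learning update to q_table[(state, action)] and return the new value."""
    predicted = q_table.get((state, action), 0.0)  # unseen pairs start at 0
    # Best estimated value obtainable from the next state (0 if the episode ended).
    next_best = 0.0 if terminal else max(q_table.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * next_best            # observed reward + discounted estimate
    error = target - predicted                     # how wrong the prediction was
    q_table[(state, action)] = predicted + alpha * error
    return q_table[(state, action)]


q = {("B", "left"): 2.0, ("B", "right"): 4.0}
new_value = q_update(q, "A", "right", reward=1.0, next_state="B", actions=["left", "right"])
print(new_value)
# target = 1.0 + 0.9 * 4.0 = 4.6, so the new value is 0.0 + 0.5 * (4.6 - 0.0) = 2.3
```

A smaller alpha makes each update more cautious; a smaller gamma makes the agent care less about rewards further in the future.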

Basic Steps in Q-Learning are:

1. Initialization of Q-Table

2. Agent performs action

3. Calculating Q-Value from the Q-Function

4. Measuring the reward received

5. Updating the Q-Table

6. Repeat steps 2 to 5 until a terminal state is reached.
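The steps above can be put together in a short, self-contained Q-learning sketch. Everything concrete here is an assumption made for illustration: a hypothetical five-cell corridor where the agent starts in cell 0 and receives a reward of 1 for reaching cell 4, an epsilon-greedy rule with random tie-breaking for choosing actions, and the values of alpha, gamma, epsilon, and episodes. The comments mark which numbered step each part corresponds to.

```python
import random

N_STATES = 5        # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = [-1, 1]   # -1 = move left, 1 = move right


def env_step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the Q-table with zeros, one entry per state-action pair.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Step 2: the agent performs an action (epsilon-greedy, random tie-breaking).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = env_step(state, action)
            # Steps 3-4: combine the measured reward with the best Q-value of the next state.
            next_best = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * next_best
            # Step 5: move the Q-table entry toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
        # Step 6: the inner loop repeats steps 2-5 until the terminal state is reached.
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}, i.e. always move right
```

Random tie-breaking matters here: with all Q-values starting at zero, always picking the first action would keep the agent moving left and it would only rarely stumble onto the goal.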


The Q-table is a lookup table that is maintained during the Q-learning process and used to estimate the maximum expected future reward for each action in each state. This enables the agent to choose the best action for a given state in the future, so the Q-table guides it at every step. Each row of a Q-table usually corresponds to a state of the environment and each column to an action.
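Reading a Q-table is a row lookup followed by picking the largest entry. The sketch below stores the table as plain Python lists, one row per state and one column per action; the state names, action names, and numbers are hypothetical values chosen for illustration, not learned from any real run, and best_action is a made-up helper.

```python
# Hypothetical Q-table: one row per state, one column per action.
ACTIONS = ["left", "right", "stay"]
STATES = ["start", "middle", "near_goal"]
Q_TABLE = [
    # left  right  stay
    [0.10, 0.50, 0.20],  # start
    [0.30, 0.40, 0.70],  # middle
    [0.60, 0.95, 0.80],  # near_goal
]


def best_action(state):
    """Return the action with the highest Q-value in this state's row, plus that value.

    The value is the table's estimate of the maximum expected future reward
    obtainable from the state.
    """
    row = Q_TABLE[STATES.index(state)]
    best_col = max(range(len(ACTIONS)), key=lambda col: row[col])
    return ACTIONS[best_col], row[best_col]


for state in STATES:
    print(state, best_action(state))
# start ('right', 0.5)
# middle ('stay', 0.7)
# near_goal ('right', 0.95)
```

For large problems the same structure is usually held in a 2-D numeric array, where the lookup becomes a row-wise argmax.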
