Contents

- 1 What is exploration and exploitation in reinforcement learning?
- 2 Why is exploration important in reinforcement learning?
- 3 What is the exploration exploitation tradeoff?
- 4 What is exploration rate?
- 5 What is the difference between exploration and exploitation?
- 6 How do you balance exploration and exploitation?
- 7 What is regret in reinforcement learning?
- 8 Is it possible to do exploration in reinforcement learning?
- 9 Is it possible to have regret in reinforcement learning?
- 10 How are greedy actions used in reinforcement learning?
- 11 Is there a problem with exploiting, and how much should we explore?

## What is exploration and exploitation in reinforcement learning?

In Reinforcement Learning, an agent is exploiting when it keeps doing what it was doing (choosing the action it currently believes is best), and exploring when it tries something new. The agent cannot tell in advance which of the two is the right choice, but there are techniques that help it figure out the best strategy.

## Why is exploration important in reinforcement learning?

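A classical approach to any reinforcement learning (RL) problem is to combine exploration and exploitation: explore to find the most rewarding way to reach the target, then keep exploiting the actions that get there. Exploration matters because an agent that only exploits never learns anything about the actions it has not tried, so it can miss better ones. Exploration is hard, though. Without proper reward functions, an algorithm can end up chasing its own tail forever.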

## What is the exploration exploitation tradeoff?

The exploration-exploitation trade-off is a fundamental dilemma whenever you learn about the world by trying things out. The dilemma is between choosing what you know and getting something close to what you expect (‘exploitation’) and choosing something you aren’t sure about and possibly learning more (‘exploration’).

## What is exploration rate?

The exploration rate, usually written ε (epsilon), is the probability that the agent will explore the environment rather than exploit it. With ε = 1, it is certain that the agent will start out by exploring the environment.
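A minimal sketch of how an exploration rate is typically used in ε-greedy action selection, assuming the agent keeps a list of estimated action values. The names `epsilon_greedy` and `q_values` are hypothetical. With probability `epsilon` the agent picks a uniformly random action (explore); otherwise it picks the action with the highest estimate (exploit).

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from estimated action values (sketch).

    With probability `epsilon`, explore: return a uniformly random action.
    Otherwise, exploit: return the action with the highest estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the largest estimate (max() returns the first on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(42)
q = [0.2, 0.9, 0.5]
print(epsilon_greedy(q, epsilon=0.0, rng=rng))  # → 1 (pure exploitation)
# With epsilon = 1.0 every pick is random (pure exploration).
picks = [epsilon_greedy(q, epsilon=1.0, rng=rng) for _ in range(1000)]
print(sorted(set(picks)))
```

Passing a seeded `random.Random` instance keeps runs reproducible; the default falls back to the module-level generator.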

## What is the difference between exploration and exploitation?

Exploration involves activities such as search, variation, risk taking, experimentation, discovery, and innovation. Exploitation involves activities such as refinement, efficiency, selection, implementation, and execution (March, 1991).

## How do you balance exploration and exploitation?

Striking a balance involves:

- enough initial exploration that the best options can be identified;
- exploiting the optimal option in order to maximise the total reward;
- continuing to set aside a small probability for experimenting with sub-optimal and unexplored options, in case they provide better returns in the future.
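One common way to cover the three points above, assuming ε-greedy action selection, is an exploration rate that starts high, decays as the agent gains experience, and never drops below a small floor. The exponential schedule and the parameter names `eps_start`, `eps_min`, and `decay` are illustrative assumptions, not values taken from this article.

```python
import math

def exploration_rate(step, eps_start=1.0, eps_min=0.01, decay=0.001):
    """Exponentially decaying exploration rate with a floor (sketch).

    - Starts at eps_start: heavy initial exploration.
    - Decays toward eps_min as steps accumulate: more exploitation.
    - Never falls below eps_min: a small chance to explore always remains.
    """
    return eps_min + (eps_start - eps_min) * math.exp(-decay * step)

for step in (0, 1000, 5000, 20000):
    print(step, round(exploration_rate(step), 4))
```

A larger `decay` shifts the agent toward exploitation sooner; a larger `eps_min` keeps more exploration in the long run.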

## What is regret in reinforcement learning?

Mathematically speaking, regret is the difference between the payoff (reward or return) of a possible action and the payoff of the action that was actually taken. If we denote the payoff function as u, the formula becomes: regret = u(possible action) − u(action taken)

## Is it possible to do exploration in reinforcement learning?

Yes. An agent explores whenever it tries an action other than the one it currently believes is best, for example by occasionally choosing an action at random. Exploration has a cost, though: when we try something new and the result is unsatisfying, we regret our decision.

## Is it possible to have regret in reinforcement learning?

Yes. To see why, we first need to define regret in RL. We start by defining the optimal action a* as the action that gives the highest expected reward. The regret of a step is then the difference between the expected reward of a* and the expected reward of the action actually taken.
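A minimal sketch of this definition, assuming a bandit-style setting where the true expected reward of each action is known for evaluation purposes only (a learning agent would not have access to it). The names `regret` and `true_means` are hypothetical. Each step's regret is the gap between the optimal action a* and the action taken, and summing over steps gives the total regret.

```python
def regret(true_means, actions_taken):
    """Per-step and total regret of a sequence of actions (sketch).

    true_means[a] is the expected reward of action a. It is used here
    only to measure performance, not to choose actions.
    """
    best = max(true_means)  # expected reward of the optimal action a*
    per_step = [best - true_means[a] for a in actions_taken]
    return per_step, sum(per_step)

means = [0.4, 0.6, 0.5]  # action 1 is the optimal action a*
per_step, total = regret(means, [0, 1, 2, 1])
print([round(r, 2) for r in per_step], round(total, 2))  # → [0.2, 0.0, 0.1, 0.0] 0.3
```

Choosing a* costs nothing; every other choice adds its gap to the total.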

## How are greedy actions used in reinforcement learning?

A greedy action is the one with the highest estimated value, and a purely greedy agent always takes it, so it only exploits. Its weakness is easy to understand intuitively: the greedy agent can lock onto an action that happened to give good results at one point in time but is not actually the optimal action. It will then keep exploiting that action while ignoring the others, which might be better.
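A small simulation of this lock-in, as a sketch under assumed conditions: a hypothetical two-armed Bernoulli bandit with success probabilities 0.4 and 0.6, value estimates initialized to 0, and ties broken toward the lower index. Because arm 0's sample-average estimate can never fall below arm 1's untouched estimate of 0, the greedy agent never tries arm 1, even though it is the better arm.

```python
import random

def run_greedy(probs, steps, seed=0):
    """Purely greedy agent on a Bernoulli bandit (hypothetical setup).

    probs[i] is the true success probability of arm i, unknown to the agent.
    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    estimates = [0.0] * n_arms  # sample-average value estimate per arm
    counts = [0] * n_arms
    for _ in range(steps):
        # Greedy: always take the arm with the highest current estimate
        # (max() returns the first maximal index on ties).
        arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts

# Arm 1 is actually better, but the greedy agent never tries it:
print(run_greedy([0.4, 0.6], steps=1000))  # → [1000, 0]
```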

## Is there a problem with exploiting, and how much should we explore?

Naturally, this raises the question of how much to exploit and how much to explore. However, exploration has a problem of its own: we don't know in advance what the outcome will be. It could be better than the current situation, or it could be worse.
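One way to get a feel for "how much to explore" is to run ε-greedy agents with different exploration rates on the same problem and compare their average reward. The sketch below assumes a hypothetical two-armed Bernoulli bandit (success probabilities 0.4 and 0.6) and a fixed ε per run; the function name and all parameters are illustrative.

```python
import random

def run_epsilon_greedy(probs, epsilon, steps, seed=0):
    """Average reward of an epsilon-greedy agent on a Bernoulli bandit (sketch)."""
    rng = random.Random(seed)
    n = len(probs)
    estimates = [0.0] * n  # sample-average value estimate per arm
    counts = [0] * n
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: uniformly random arm
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps

probs = [0.4, 0.6]  # hypothetical arms; arm 1 is better
for eps in (0.0, 0.01, 0.1, 0.5, 1.0):
    print(eps, round(run_epsilon_greedy(probs, eps, steps=10_000), 3))
```

With ε = 0 the agent stays on arm 0 (average reward near 0.4), and with ε = 1 it picks arms at random (near 0.5). Small but non-zero values typically do better than both, though the exact numbers depend on the seed and the problem.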