Contents

- 1 Is gradient descent a technique for solving optimization problems?
- 2 How does gradient descent method work?
- 3 How is the same problem solved by gradient descent?
- 4 How to calculate step sizes for gradient descent?
- 5 How is back propagation used in gradient descent?
- 6 When does the normal gradient become the natural gradient?

## Is gradient descent a technique for solving optimization problems?

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function.

## How does gradient descent method work?

Gradient descent is an iterative optimization algorithm for finding a local minimum of a function. To find a local minimum using gradient descent, we take steps proportional to the negative of the gradient (that is, we move in the direction opposite to the gradient) of the function at the current point.
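The iteration described above can be sketched in a few lines. This is a minimal, illustrative example on the 1-D function f(x) = (x − 3)², whose gradient is f'(x) = 2(x − 3); the function, starting point, and learning rate are all assumptions chosen for clarity, not taken from the original text.

```python
# Minimal sketch: gradient descent on f(x) = (x - 3)^2.
# Each step moves opposite to the gradient, scaled by a learning rate.

def gradient_descent(start, learning_rate=0.1, steps=100):
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)            # gradient of f at the current point
        x = x - learning_rate * grad  # step in the negative-gradient direction
    return x

x_min = gradient_descent(start=0.0)   # converges toward the minimum at x = 3
```

Because the step shrinks as the gradient shrinks, the iterates settle near the minimum at x = 3 without overshooting, provided the learning rate is small enough.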

## How is the same problem solved by gradient descent?

The same problem can be solved by the gradient descent technique. “Gradient descent is an iterative algorithm that starts from a random point on a function and travels down its slope in steps until it reaches the lowest point of that function.”

## How to calculate step sizes for gradient descent?

If we have more features like x1, x2, etc., we take the partial derivative of “y” with respect to each of the features. The loop is then:

1. Update the gradient function by plugging in the current parameter values.
2. Calculate the step size for each feature as: step size = gradient * learning rate.
3. Update each parameter: new parameter = old parameter − step size.
4. Repeat steps 1 to 3 until the gradient is almost 0.
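The steps above can be sketched for a linear model with two features. The data, learning rate, and stopping threshold below are illustrative assumptions (the targets are generated from y = 2·x1 + 3·x2), not values from the original text; the cost is mean squared error.

```python
# Hedged sketch of the update loop: partial derivatives of the MSE cost
# give the gradient, step size = gradient * learning rate, and we repeat
# until the gradient is almost 0.

data = [((1.0, 2.0), 8.0), ((2.0, 1.0), 7.0), ((3.0, 3.0), 15.0)]  # ((x1, x2), y)
w1, w2 = 0.0, 0.0   # parameters, starting from zero
lr = 0.02           # learning rate

for _ in range(5000):
    g1 = g2 = 0.0
    for (x1, x2), y in data:
        err = (w1 * x1 + w2 * x2) - y
        g1 += 2 * err * x1 / len(data)   # partial derivative of MSE w.r.t. w1
        g2 += 2 * err * x2 / len(data)   # partial derivative of MSE w.r.t. w2
    step1, step2 = g1 * lr, g2 * lr      # step size = gradient * learning rate
    w1, w2 = w1 - step1, w2 - step2      # update each parameter
    if abs(g1) < 1e-8 and abs(g2) < 1e-8:  # gradient is almost 0 -> stop
        break
```

On this toy data the loop recovers weights close to the generating values w1 = 2, w2 = 3.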

## How is back propagation used in gradient descent?

As defined above, back-propagation is used to compute the partial derivatives of the cost function J(w), whose values are then used in the gradient descent algorithm. The end result is a set of optimized weights, updated according to the standard gradient descent rule w = w − α · ∂J(w)/∂w, where α is the learning rate. The term “backward” means that the gradient computation starts from the output and proceeds backwards through the network.
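A tiny worked example may make the backward pass concrete. The network below is a single sigmoid neuron with squared-error cost, a deliberately minimal assumption chosen so the whole chain rule fits in a few lines; the input, target, and learning rate are illustrative.

```python
import math

# Hedged sketch: back-propagation computes dJ/dw for one sigmoid neuron
# with cost J(w) = (sigmoid(w * x) - t)^2, applying the chain rule
# backwards from the cost; the result feeds the gradient descent update.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, t = 1.0, 1.0      # input and target (illustrative)
w, lr = 0.0, 1.0     # initial weight and learning rate (illustrative)

for _ in range(200):
    # forward pass
    z = w * x
    a = sigmoid(z)
    # backward pass: chain rule from the cost back to the weight
    dJ_da = 2 * (a - t)   # cost -> activation
    da_dz = a * (1 - a)   # activation -> pre-activation
    dz_dw = x             # pre-activation -> weight
    dJ_dw = dJ_da * da_dz * dz_dw
    # gradient descent update using the back-propagated derivative
    w = w - lr * dJ_dw
```

After a couple hundred updates the neuron's output sigmoid(w·x) moves from 0.5 toward the target 1, showing the loss shrinking as the weight is optimized.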

## When does the normal gradient become the natural gradient?

When the normal gradient is scaled by the inverse of the Fisher information matrix, we call it the Natural Gradient. Now, for those who can accept the hand-wave that the Fisher matrix is a magical quantity which makes the normal gradient natural, skip to the next section. For the brave souls who stick around, a little maths is on its way.