An activation function helps a neural network to learn complex relationships and patterns in data. It takes in the output signal from the previous cell and converts it into some form that can be taken as input to the next cell. The activation function introduces non-linearity into the output of a neuron.

An activation function decides whether, and how strongly, a neuron is activated. The neuron first computes the weighted sum of its inputs and adds a bias; the activation function is then applied to the result. Its input is **W*x + b**, where **W** is the weights of the cell, **x** is the input, and **b** is the bias.
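As a minimal sketch of this computation, the snippet below evaluates W*x + b for a single neuron and passes the result through an activation. The helper names (`pre_activation`, `neuron_output`), the example numbers, and the choice of the sigmoid as the activation are illustrative assumptions, not part of the original text.

```python
import math

def pre_activation(weights, inputs, bias):
    """Weighted sum of the inputs plus the bias: W*x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(z):
    """Squash z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(weights, inputs, bias, activation=sigmoid):
    """Apply the activation function to W*x + b."""
    return activation(pre_activation(weights, inputs, bias))

# Illustrative values only.
W = [0.5, -0.25]
x = [2.0, 4.0]
b = 0.0

z = pre_activation(W, x, b)  # 0.5*2.0 + (-0.25)*4.0 + 0.0 = 0.0
a = neuron_output(W, x, b)   # sigmoid(0.0) = 0.5
```

Pure Python is used here for readability; in practice the same computation would usually be written with a numerical library operating on whole layers at once.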

Without an activation function, the output **W*x + b** can take any value, depending on the input and the weights. Hence an important use of the activation function is to keep the output within a particular range.

Different activation functions are:

**Softmax**: The softmax is a more generalised form of the sigmoid. It is used in **multi-class classification problems**. It produces one value per class in the range 0–1, and these values sum to 1, so they can be read as class probabilities. Therefore, like the sigmoid, it is used as the final layer in classification models.
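A short sketch of softmax may make this concrete. The function below subtracts the maximum score before exponentiating, a common numerical-stability trick that is an addition here rather than something the text describes. The example scores are made up, and the two-class check at the end illustrates why softmax can be seen as a generalisation of the sigmoid.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative scores for a 3-class problem.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # class with the highest score

# With two classes and the second score fixed at 0,
# the first softmax output matches sigmoid(z) up to rounding.
two_class = softmax([1.0, 0.0])[0]
same_as_sigmoid = abs(two_class - sigmoid(1.0)) < 1e-12
```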

**Tanh**: The tanh is defined as **f(x) = (e^x - e^(-x)) / (e^x + e^(-x))**

Tanh outputs values in the range -1 to 1, so it is zero-centered. This solves the main problem of the sigmoid, whose outputs are not zero-centered.
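The zero-centered behaviour can be checked with a small sketch. The function name `tanh_from_definition` is hypothetical; it writes out the standard exponential form of tanh directly, whereas real code would normally call `math.tanh`, which also handles very large inputs without overflow.

```python
import math

def tanh_from_definition(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), written out directly."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

center = tanh_from_definition(0.0)      # output at the origin
left = tanh_from_definition(-2.0)       # negative input gives a negative output
right = tanh_from_definition(2.0)       # positive input gives a positive output

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2*sigmoid(2x) - 1.
rescaled = 2.0 * sigmoid(2.0 * 0.7) - 1.0
```

By contrast, the sigmoid of 0 is 0.5, so its outputs are always positive and centered around 0.5 rather than 0.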

**ReLU**: ReLU **(Rectified Linear Unit)** is defined as **f(x) = max(0, x)**

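It is widely used in convolutional neural networks. It is easy to compute, and because its gradient is 1 for all positive inputs, it does not saturate there and is much less prone to the vanishing gradient problem than sigmoid or tanh.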

However, it suffers from the **“dying ReLU”** problem: since the output (and the gradient) is zero for all negative inputs, some nodes can die completely and stop learning anything.
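The sketch below shows ReLU together with its derivative, which is where the "dying ReLU" behaviour comes from. The helper name `relu_grad` and the sample inputs are illustrative, and taking the derivative at exactly x = 0 to be 0 is a convention assumed here.

```python
def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU; taken as 0 at x = 0 by convention in this sketch."""
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 1.5]
outputs = [relu(x) for x in inputs]     # [0.0, 0.0, 0.0, 1.5]
grads = [relu_grad(x) for x in inputs]  # [0.0, 0.0, 0.0, 1.0]
```

A neuron whose pre-activation is negative for every training example receives a zero gradient each time, so its weights are never updated.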

**Leaky ReLU and Parametric ReLU**: Leaky ReLU is defined as **f(x) = max(αx, x)**

Here α is a hyperparameter generally set to **0.01**. Leaky ReLU solves the **“dying ReLU”** problem to some extent, because negative inputs still produce a small non-zero gradient. If we set α to 1, Leaky ReLU becomes the linear function f(x) = x and is of no use as an activation. Hence, the value of **α is never set close to 1.**

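If **α** is instead treated as a parameter that is learned during training, rather than a fixed hyperparameter, and each neuron (or channel) may have its own value, we get **parametric ReLU** or **PReLU**.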
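A minimal sketch of Leaky ReLU follows, assuming 0 ≤ α ≤ 1 so that max(αx, x) picks αx for negative inputs. The gradient helpers are hypothetical names added for illustration. `grad_wrt_alpha` gives the derivative of the output with respect to α, which is the quantity a PReLU layer would use to update α during training.

```python
def leaky_relu(x, alpha=0.01):
    """f(x) = max(alpha * x, x), assuming 0 <= alpha <= 1."""
    return max(alpha * x, x)

def leaky_relu_grad(x, alpha=0.01):
    """Derivative with respect to x: 1 for positive inputs, alpha otherwise."""
    return 1.0 if x > 0 else alpha

def grad_wrt_alpha(x):
    """Derivative with respect to alpha: x for negative inputs, 0 otherwise."""
    return x if x < 0 else 0.0

negative_out = leaky_relu(-2.0)           # small negative value instead of 0
negative_grad = leaky_relu_grad(-2.0)     # alpha, so the neuron can still learn
identity_out = leaky_relu(-2.0, alpha=1)  # with alpha = 1 the function is just f(x) = x
```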

**ReLU6**: It is basically ReLU capped on the positive side, and it is defined as **f(x) = min(max(0, x), 6)**

Capping the output at 6 stops the activations from blowing up, which helps prevent the gradients from exploding (going to infinity), along with some of the other small issues that occur with normal ReLUs.
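The capping can be seen in a short sketch that compares ReLU6 with plain ReLU on a few made-up inputs:

```python
def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)

def relu6(x):
    """f(x) = min(max(0, x), 6): ReLU capped at 6 on the positive side."""
    return min(max(0.0, x), 6.0)

inputs = [-3.0, 2.5, 6.0, 100.0]
plain = [relu(x) for x in inputs]    # [0.0, 2.5, 6.0, 100.0]
capped = [relu6(x) for x in inputs]  # [0.0, 2.5, 6.0, 6.0]
```

The two functions agree on inputs up to 6; only larger activations are clipped.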