*Memos:
- My post explains Step function, Identity and ReLU.
- My post explains Leaky ReLU, PReLU and FReLU.
- My post explains ELU, SELU and CELU.
- My post explains GELU, Mish, SiLU and Softplus.
- My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
(1) Tanh:
- can convert an input value (x) to an output value between -1 and 1. *-1 and 1 are exclusive.
- ‘s formula is y = (e^x – e^(-x)) / (e^x + e^(-x)).
- is also called Hyperbolic Tangent Function.
- is Tanh() in PyTorch (see the code sketch after this list).
- is used in:
- ‘s pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
- ‘s cons:
- It causes Vanishing Gradient Problem.
- It’s computationally expensive because of the exponential and other complex operations.
- ‘s graph in Desmos:
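Below is a minimal PyTorch sketch of Tanh() (the tensor values are just illustrative, not from the post), showing that every output lands in (-1, 1):

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 2.0])  # illustrative inputs: negative, zero, positive

tanh = nn.Tanh()
print(tanh(x))        # tensor([-0.9640,  0.0000,  0.9640]) -> every output is in (-1, 1)
print(torch.tanh(x))  # the functional form gives the same result
```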
(2) Softsign:
- can convert an input value (x) to an output value between -1 and 1. *-1 and 1 are exclusive.
- ‘s formula is y = x / (1 + |x|).
- is Softsign() in PyTorch (see the code sketch after this list).
- ‘s pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
- ‘s cons:
- It causes Vanishing Gradient Problem.
- ‘s graph in Desmos:
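A minimal PyTorch sketch of Softsign() (illustrative input values), also checking the module against the formula y = x / (1 + |x|):

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 2.0])  # illustrative inputs

softsign = nn.Softsign()
print(softsign(x))        # tensor([-0.6667,  0.0000,  0.6667]) -> every output is in (-1, 1)
print(x / (1 + x.abs()))  # the formula written out gives the same result
```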
(3) Sigmoid:
- can convert an input value (x) to an output value between 0 and 1. *0 and 1 are exclusive.
- ‘s formula is y = 1 / (1 + e^(-x)).
- is Sigmoid() in PyTorch (see the code sketch after this list).
- is used in:
- Binary Classification Model.
- Logistic Regression.
- LSTM.
- GRU.
- GAN.
- ‘s pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It avoids Dying ReLU Problem.
- ‘s cons:
- It causes Vanishing Gradient Problem.
- It’s computationally expensive because of the exponential operation.
- ‘s graph in Desmos:
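A minimal PyTorch sketch of Sigmoid() (illustrative input values; the logit below is made up just to show the binary-classification use):

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 2.0])  # illustrative inputs

sigmoid = nn.Sigmoid()
print(sigmoid(x))  # tensor([0.1192, 0.5000, 0.8808]) -> every output is in (0, 1)

# Typical binary classification use: squash a model's raw logit into a probability.
logit = torch.tensor([0.7])  # hypothetical raw model output
print(sigmoid(logit))        # tensor([0.6682])
```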
(4) Softmax:
- can convert input values (xs) to output values that are each between 0 and 1 and whose sum is 1 (100%). *Memos:
- 0 and 1 are exclusive.
- If the input values are [5, 4, -1], the output values are [0.730, 0.268, 0.002], which sum to 0.730 (73%) + 0.268 (26.8%) + 0.002 (0.2%) = 1 (100%).
- ‘s formula is y_i = e^(x_i) / Σ_j e^(x_j).
- is Softmax() in PyTorch (see the code sketch after this list).
- is used in:
- Multi-Class Classification Model.
- ‘s pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It avoids Dying ReLU Problem.
- ‘s cons:
- It causes Vanishing Gradient Problem.
- It’s computationally expensive because of the exponential and other complex operations.
- ‘s graph in Desmos:
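A minimal PyTorch sketch of Softmax(), reusing the [5, 4, -1] example above to show that the outputs sum to 1:

```python
import torch
import torch.nn as nn

x = torch.tensor([5.0, 4.0, -1.0])  # the same input values as the example above

softmax = nn.Softmax(dim=-1)  # dim is the dimension along which the outputs sum to 1
y = softmax(x)
print(y)        # tensor([0.7297, 0.2685, 0.0018])
print(y.sum())  # ~1.0 -> the outputs sum to 1 (100%), up to floating-point rounding
```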