*Memos:
- My post explains Step function, Identity and ReLU.
- My post explains Leaky ReLU, PReLU and FReLU.
- My post explains ELU, SELU and CELU.
- My post explains GELU, Mish, SiLU and Softplus.
- My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
(1) Tanh:
- can convert an input value (x) to an output value between -1 and 1. *-1 and 1 are exclusive.
- ‘s formula is y = (e^x – e^(-x)) / (e^x + e^(-x)).
- is also called Hyperbolic Tangent Function.
- is Tanh() in PyTorch (see the code sketch after this list).
- is used in:
- ‘s pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
- ‘s cons:
- It causes Vanishing Gradient Problem.
- It’s computationally expensive because of the exponential and other complex operations.
- ‘s graph in Desmos:
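Below is a minimal PyTorch sketch of Tanh() (the tensor values are just illustrative, not from the post), showing that every output lands in (-1, 1):

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 2.0])  # illustrative inputs: negative, zero, positive

tanh = nn.Tanh()
print(tanh(x))        # tensor([-0.9640,  0.0000,  0.9640]) -> every output is in (-1, 1)
print(torch.tanh(x))  # the functional form gives the same result
```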
(2) Softsign:
- can convert an input value (x) to an output value between -1 and 1. *-1 and 1 are exclusive.
- ‘s formula is y = x / (1 + |x|).
- is Softsign() in PyTorch (see the code sketch after this list).
- ‘s pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
- ‘s cons:
- It causes Vanishing Gradient Problem.
- ‘s graph in Desmos:
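A minimal PyTorch sketch of Softsign() (illustrative input values), also checking the module against the formula y = x / (1 + |x|):

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 2.0])  # illustrative inputs

softsign = nn.Softsign()
print(softsign(x))        # tensor([-0.6667,  0.0000,  0.6667]) -> every output is in (-1, 1)
print(x / (1 + x.abs()))  # the formula written out gives the same result
```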
(3) Sigmoid:
- can convert an input value (x) to an output value between 0 and 1. *0 and 1 are exclusive.
- ‘s formula is y = 1 / (1 + e^(-x)).
- is Sigmoid() in PyTorch (see the code sketch after this list).
- is used in:
- Binary Classification Model.
- Logistic Regression.
- LSTM.
- GRU.
- GAN.
- ‘s pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It avoids Dying ReLU Problem.
- ‘s cons:
- It causes Vanishing Gradient Problem.
- It’s computationally expensive because of the exponential operation.
- ‘s graph in Desmos:
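A minimal PyTorch sketch of Sigmoid() (illustrative input values; the logit below is made up just to show the binary-classification use):

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 2.0])  # illustrative inputs

sigmoid = nn.Sigmoid()
print(sigmoid(x))  # tensor([0.1192, 0.5000, 0.8808]) -> every output is in (0, 1)

# Typical binary classification use: squash a model's raw logit into a probability.
logit = torch.tensor([0.7])  # hypothetical raw model output
print(sigmoid(logit))        # tensor([0.6682])
```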
(4) Softmax:
- can convert input values (xs) to output values that are each between 0 and 1 and whose sum is 1 (100%). *Memos:
- 0 and 1 are exclusive.
- If the input values are [5, 4, -1], the output values are [0.730, 0.268, 0.002], which sum to 0.730 (73%) + 0.268 (26.8%) + 0.002 (0.2%) = 1 (100%).
- ‘s formula is y_i = e^(x_i) / Σ_j e^(x_j).
- is Softmax() in PyTorch (see the code sketch after this list).
- is used in:
- Multi-Class Classification Model.
- ‘s pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It avoids Dying ReLU Problem.
- ‘s cons:
- It causes Vanishing Gradient Problem.
- It’s computationally expensive because of the exponential and other complex operations.
- ‘s graph in Desmos:
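A minimal PyTorch sketch of Softmax(), reusing the [5, 4, -1] example above to show that the outputs sum to 1:

```python
import torch
import torch.nn as nn

x = torch.tensor([5.0, 4.0, -1.0])  # the same input values as the example above

softmax = nn.Softmax(dim=-1)  # dim is the dimension along which the outputs sum to 1
y = softmax(x)
print(y)        # tensor([0.7297, 0.2685, 0.0018])
print(y.sum())  # ~1.0 -> the outputs sum to 1 (100%), up to floating-point rounding
```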