Introduction to Transformer Principles #8
Introduction to the Transformer

The Transformer is a sequence model proposed by the Google Brain team in 2017 in the paper "Attention Is All You Need".
Encoding Layer

The figure (not reproduced here) shows three kinds of modules:

- Multi-Head Attention: multi-head attention built on the scaled dot-product attention algorithm (see the sketch after this list)
- Add & Norm: residual connections and Layer Normalization
- Feed Forward: the position-wise feed-forward network (covered in its own section below)
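A minimal NumPy sketch of scaled dot-product attention and the multi-head wrapper around it. The shapes and the random projection matrices are illustrative assumptions standing in for learned parameters, not the paper's exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_self_attention(x, num_heads, rng):
    """Project to Q/K/V, split into heads, attend per head, concatenate."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Random weights stand in for the learned projections W_q, W_k, W_v, W_o.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * d_model**-0.5
                      for _ in range(4))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    heads = scaled_dot_product_attention(split_heads(Q), split_heads(K),
                                         split_heads(V))
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))  # 5 tokens, d_model = 8
print(multi_head_self_attention(x, num_heads=2, rng=rng).shape)  # (5, 8)
```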
Residual Connections

With a skip connection around the sublayer, the gradient of the block output with respect to its input always contains an identity term:

$$\frac{\partial D_{\mathrm{in}}}{\partial A_{\mathrm{out}}} = 1 + \frac{\partial C}{\partial B} \cdot \frac{\partial B}{\partial A_{\mathrm{out}}}$$

The constant 1 lets gradients flow back through the skip path unattenuated even when the sublayer's gradient product is small, which is what keeps deep stacks trainable.
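The formula follows from the chain rule once the forward pass is written out. A minimal derivation, assuming the skip connection adds $A_{\mathrm{out}}$ directly to the output of the sublayer chain $B$ then $C$ (the symbol names follow the equation above; the functional form is an assumption):

$$D_{\mathrm{in}} = A_{\mathrm{out}} + C\bigl(B(A_{\mathrm{out}})\bigr)$$

Differentiating both paths gives:

$$\frac{\partial D_{\mathrm{in}}}{\partial A_{\mathrm{out}}} = \underbrace{1}_{\text{skip path}} + \underbrace{\frac{\partial C}{\partial B}\cdot\frac{\partial B}{\partial A_{\mathrm{out}}}}_{\text{sublayer path}}$$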
Layer Normalization
Differences between BN and LN

As the figure (not reproduced here) illustrates: Batch Normalization normalizes each feature across the samples of a mini-batch, while Layer Normalization normalizes all features within a single sample. LN therefore needs no batch statistics, which makes it a better fit for variable-length sequences; a code comparison follows.
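A small NumPy sketch of the axis difference. The epsilon term and the learned scale/shift parameters are omitted for brevity:

```python
import numpy as np

x = np.random.default_rng(0).standard_normal((4, 6))  # (batch, features)

# Batch Norm: statistics per feature, computed across the batch (axis 0).
bn = (x - x.mean(axis=0)) / x.std(axis=0)

# Layer Norm: statistics per sample, computed across features (axis 1).
ln = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

print(bn.mean(axis=0).round(6))  # ~0 for every feature column
print(ln.mean(axis=1).round(6))  # ~0 for every sample row
```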
Feed Forward Network
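The paper defines the position-wise feed-forward network as FFN(x) = max(0, xW₁ + b₁)W₂ + b₂, applied identically at every position. A minimal NumPy sketch; the dimensions d_model = 512 and d_ff = 2048 follow the paper, while the random weights are placeholders for learned ones:

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # FFN(x) = max(0, x W1 + b1) W2 + b2, same weights at every position
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

d_model, d_ff = 512, 2048
rng = np.random.default_rng(0)
W1 = rng.standard_normal((d_model, d_ff)) * d_model**-0.5
b1 = np.zeros(d_ff)
W2 = rng.standard_normal((d_ff, d_model)) * d_ff**-0.5
b2 = np.zeros(d_model)

x = rng.standard_normal((10, d_model))        # 10 positions
print(feed_forward(x, W1, b1, W2, b2).shape)  # (10, 512)
```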
Positional Encoding

Applying positional encoding

How positional encoding works
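The paper's sinusoidal encoding is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), so each dimension oscillates at a different wavelength and relative positions become linear functions of the encodings. A direct NumPy sketch:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]             # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]         # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)  # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                   # even dims: sine
    pe[:, 1::2] = np.cos(angle)                   # odd dims: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16); added element-wise to the token embeddings
```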
Decoder
Masked Multi-Head Attention Layer
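Masked (causal) self-attention sets the scores above the diagonal to −∞ before the softmax, so position t can only attend to positions ≤ t and the decoder cannot peek at future tokens during training. A minimal single-head sketch:

```python
import numpy as np

def masked_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Causal mask: True above the diagonal, i.e. the "future" positions.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)      # exp(-inf) -> weight 0
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((4, 8))  # 4 target positions
out = masked_attention(Q, K, V)
# The resulting attention weight matrix is lower-triangular.
```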
Cross-Attention (Encoder-Decoder Attention) Layer
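In this layer the queries come from the decoder, while the keys and values come from the encoder output, which is how the decoder consults the source sequence. A minimal sketch of just the Q/K/V wiring, with the learned projections omitted:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(decoder_hidden, encoder_out):
    # Q from the decoder; K and V both from the encoder's final output.
    Q, K, V = decoder_hidden, encoder_out, encoder_out
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V  # one output row per target position

rng = np.random.default_rng(0)
encoder_out = rng.standard_normal((7, 16))     # 7 source tokens
decoder_hidden = rng.standard_normal((5, 16))  # 5 target tokens so far
print(cross_attention(decoder_hidden, encoder_out).shape)  # (5, 16)
```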