Now that we understand the overall architecture, let's explore the paper's most important concept: the attention mechanism. The diagram shows three orange boxes labeled "attention." Each serves a different purpose, but all share the same structure.

全体的なアーキテクチャを理解したところで、この論文で最も重要な概念であるアテンション機構について見ていきましょう。図には「attention」とラベルがついたオレンジ色のボックスが3つあります。それぞれ異なる目的を果たしていますが、すべて同じ構造を共有しています。

What is attention?

アテンション（注意）とは

Roughly speaking, attention in this model means assigning weights to other vectors when processing a particular vector.

大まかに言えば、このモデルにおけるアテンションとは、特定のベクトルを処理する際に他のベクトルに重みを割り当てることを指します。
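To make "assigning weights to other vectors" concrete, here is a minimal sketch in plain Python. It assumes scaled dot-product scoring followed by a softmax, which is the scoring the Transformer paper uses; the names `softmax`, `dot`, and `attend`, and the toy two-dimensional vectors, are illustrative choices and not taken from the text. The real model also applies learned projections to produce the queries, keys, and values, which this sketch leaves out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Weight every value by how well its key matches the query.

    Returns the attention weights and the weighted average of the values.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy example (assumed data): three vectors, and we process the third one.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, output = attend(vectors[2], vectors, vectors)
print(weights)  # one weight per vector, summing to 1
print(output)   # the weighted average of the vectors
```

The weights are what the prose calls attention: a larger weight means that vector contributes more to the result for the vector being processed.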

In the encoder, this determines how much each token relates to others. For example, when processing $\text{\_apple}$, the most important contextual token is $\text{\_ate}$ because it clarifies what "apple" means in this sentence (it's something the narrator ate, so it's likely a fruit, not a company or a city).

エンコーダーでは、それぞれのトークンが他のトークンとどの程度関連しているかが決まります。例えば、$\text{\_apple}$を処理する際には、この文における「apple」の意味を明確にする$\text{\_ate}$が、文脈上最も重要なトークンです（話者が食べたものなので、企業名や都市名ではなく、果物である可能性が高い）。

There are two attention boxes in the decoder.

デコーダーには2つのアテンションボックスがあります。

Masked Multi-Head Attention assigns weights to all previous French tokens, indicating which ones matter most for predicting the next token. For example, after writing "J'ai" (I have), it assigns a very high weight to "ai" because this auxiliary verb requires the next word to be a past participle.

Masked Multi-Head Attention は、それまでのすべてのフランス語のトークンに重みを割り当て、次のトークンを予測する上でどれが最も重要かを示します。例えば、「J'ai」（私は持っている）を書いた後、「ai」に非常に高い重みを割り当てます。この助動詞の後には過去分詞が必要だからです。
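The "masked" part can be sketched as follows. This is a simplified illustration, not the paper's full multi-head implementation: it takes a square matrix of raw scores and forces the weight of every later position to zero before the softmax, so each position only weights tokens up to and including itself. The function name `causal_attention_weights`, the token split `["J'", "ai", "mangé"]`, and the all-zero scores are assumptions made for the example.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax where position i may only see positions j <= i.

    scores[i][j] is the raw score of position i attending to position j.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Future positions get -inf, which becomes weight 0 after softmax.
        masked = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        m = max(masked)  # always finite, because j == i is never masked
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical tokenization and scores, chosen only to show the mask.
tokens = ["J'", "ai", "mangé"]
scores = [[0.0, 0.0, 0.0] for _ in tokens]
weights = causal_attention_weights(scores)
for token, row in zip(tokens, weights):
    print(token, row)
```

With equal raw scores, each row spreads its weight evenly over the visible positions, and every entry above the diagonal is exactly zero. In a trained model the scores differ, which is how a token such as "ai" can receive a much higher weight than its neighbors.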