This article is a summary of a YouTube video "【機器學習2021】自注意力機制 (Self-attention) (上)" by Hung-yi Lee
TLDR: The Self-Attention mechanism, the core of the Transformer network architecture, handles inputs of varying size and length in complex problems such as natural language processing by computing the correlation between every pair of vectors in a sequence.
Self-Attention is a mechanism built for problems, such as natural language processing, where the input is a variable-length sequence of vectors rather than a single fixed-size vector.
Many kinds of complex input can be represented as a sequence of vectors: sentences via Word Embedding, sound signals as a series of frames, and graphs as collections of node feature vectors, all of which can then feed regression or classification models.
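As a minimal sketch of the word-embedding idea, the toy vocabulary, embedding dimension, and random embedding matrix below are illustrative assumptions, not values from the lecture:

```python
import numpy as np

# Toy word embedding: each word in a small hypothetical vocabulary
# maps to a dense vector, so a sentence becomes a sequence of vectors
# (one row per word) that a downstream model can consume.
vocab = {"this": 0, "is": 1, "a": 2, "cat": 3}
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 4))  # 4-dim embeddings (assumed)

sentence = ["this", "is", "a", "cat"]
vectors = embedding[[vocab[w] for w in sentence]]
print(vectors.shape)  # (4, 4): one 4-dim vector per word
```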
POS Tagging, which labels each word in a sentence with its part of speech, illustrates the case where every input vector gets its own output label; the same pattern covers frame-level sound-signal identification and social-network graphs, where the model predicts a characteristic for each node.
Sentiment Analysis, where the machine decides whether an entire comment is positive or negative, illustrates the case where the whole sequence gets a single label; in tasks such as voice recognition, the model itself decides the length of the output sequence.
A Fully-Connected Network on its own cannot see context, but feeding it a window that concatenates each frame with its neighbors (e.g., the five frames before and after) can give good results, as sketched below, though no fixed window can cover a sequence of arbitrary length.
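Here is a rough sketch of that windowing idea; the window size, feature dimension, and zero-padding at the edges are illustrative assumptions:

```python
import numpy as np

def window_frames(frames: np.ndarray, half_window: int = 5) -> np.ndarray:
    """Concatenate each frame with its neighbors so a fully-connected
    network sees some context. frames has shape (T, d); the result has
    shape (T, (2 * half_window + 1) * d). Sequence edges are zero-padded."""
    T, d = frames.shape
    padded = np.pad(frames, ((half_window, half_window), (0, 0)))
    return np.stack(
        [padded[t : t + 2 * half_window + 1].reshape(-1) for t in range(T)]
    )

frames = np.random.default_rng(0).normal(size=(100, 39))  # e.g., 39-dim acoustic features
windows = window_frames(frames)
print(windows.shape)  # (100, 429): each row is one input to the fully-connected net
```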
Google's Transformer network architecture is built on Self-Attention, which lets every output take the entire input sequence into account rather than a fixed window.
The lecture then walks through how the self-attention mechanism computes the correlation between every pair of vectors in the input sequence and uses those correlations to produce a corresponding sequence of output vectors b.
Attention scores are calculated as in the Transformer: each input vector is projected into a Query and a Key, and the dot product between a Query and a Key yields a score, called attention, that measures how relevant one position is to another.
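As a minimal sketch of this computation, the dimensions and random projection matrices W_q, W_k, W_v below are illustrative assumptions; the softmax normalization and value-weighted sum follow the standard Transformer formulation:

```python
import numpy as np

def self_attention(a: np.ndarray, W_q, W_k, W_v) -> np.ndarray:
    """Dot-product self-attention over a sequence a of shape (n, d).
    Each input vector is projected into a query, key, and value; the
    attention score between positions i and j is the dot product of
    query i with key j, normalized with a softmax across j."""
    Q, K, V = a @ W_q, a @ W_k, a @ W_v
    scores = Q @ K.T                              # (n, n) attention scores
    scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)     # softmax over each row
    return alpha @ V                              # each b_i is a weighted sum of values

rng = np.random.default_rng(0)
n, d = 4, 8                                      # 4 input vectors, 8-dim each (assumed)
a = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
b = self_attention(a, W_q, W_k, W_v)
print(b.shape)  # (4, 8): one output vector b_i per input vector a_i
```

Because every score is a dot product between one position's query and another's key, each output vector b_i can draw on information from the whole sequence, which is exactly what the fixed window above cannot do.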