1 year ago
#190241
danielkim9
Why are models such as BERT or GPT-3 considered unsupervised learning during pre-training when there is an output (label)?
I am not very experienced with unsupervised learning, but my general understanding is that in unsupervised learning the model learns without labeled outputs. However, during pre-training of models such as BERT or GPT-3, there does seem to be an output. For example, in BERT some of the tokens in the input sequence are masked, and the model then tries to predict those tokens. Since we already know what the masked tokens originally were, we can compare them with the predictions to compute a loss. Isn't this basically supervised learning?
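To make my confusion concrete, here is a rough sketch of how I understand the "labels" being created. This is just toy Python with a whitespace tokenizer and a single `[MASK]` token, not the real BERT/WordPiece masking procedure (which masks roughly 15% of tokens with some extra rules); the point is only that the targets come from the input text itself:

```python
import random

MASK = "[MASK]"

def make_mlm_example(tokens, mask_prob=0.15):
    """Derive (input, label) pairs from raw text alone.

    The labels are simply the original tokens at the masked positions,
    so no human annotation is needed.
    """
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK)   # the model sees the mask...
            labels.append(tok)    # ...and must reconstruct the original token
        else:
            inputs.append(tok)
            labels.append(None)   # unmasked positions contribute no loss
    return inputs, labels

inputs, labels = make_mlm_example("the cat sat on the mat".split())
print(inputs)   # e.g. ['the', '[MASK]', 'sat', 'on', 'the', 'mat']
print(labels)   # e.g. [None, 'cat', None, None, None, None]
```

So there is clearly a target and a loss here, which is what makes me wonder why this is not just called supervised learning.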
machine-learning
bert-language-model
unsupervised-learning
0 Answers