- Introduction In NLP, models such as GPT or BERT remove a portion of data and learn to predict the removed content ► What is the difference in masked autoencoding between CV and NLP? Architecture was different CNN was traditionally used in the field of vision ViT addressed this gap Information density is different Language: Human-generated signals Vision: Natural signals Masking a very high por..