Hacker News

I don't think anybody following OpenAI's feature releases will be caught off guard by ChatGPT becoming multi-modal. The app already features voice input. That still translates voice into text before sending, but it works so well that you basically never need to check or correct anything. If anything, you've probably already wondered why it doesn't reply with a voice as well.

And the ability to ingest images was a highlight, and much of the hype, of the GPT-4 announcement back in March: https://openai.com/research/gpt-4



One of the original training sets for the BERT series is called 'BookCorpus', assembled by ordinary grad students for natural language processing research. Part of the content was specifically and exactly purposed to "align" movies and video with written text. That is partly why it contains several thousand teen romance novels and ordinary paperback-style storytelling content. What else is in there? Inquiring minds want to know.



