go to ORKG: http://orkg.org/orkg/predicate/P41655

pretraining corpus

The dataset or collection of text on which a model is trained during its pretraining phase.