ORKG predicate: http://orkg.org/orkg/predicate/P163013

size of training corpus (in tokens in billions)

The size of the pretraining corpus, measured in billions of tokens. A value of 1 therefore denotes 1 billion (10^9) tokens.