ORKG Shapes

size of training corpus (in tokens in billions)

The scale of the pretraining dataset, measured in billions of tokens.
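
As a minimal sketch of how a value for this property might be derived, assuming the raw token count of the corpus is known, the conversion is a division by 10^9. The function name tokens_to_billions is hypothetical and is not part of ORKG; the input figure is illustrative only.

```python
def tokens_to_billions(token_count: int) -> float:
    """Convert a raw token count into billions of tokens (1 billion = 10**9)."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    return token_count / 1e9


# Illustrative input only: a hypothetical corpus of 300 billion tokens.
print(tokens_to_billions(300_000_000_000))
```

Under this convention, a corpus of 300 billion tokens would be recorded with the value 300 rather than the raw count.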