Random samples from large datasets, for convenience.
- bluelightai-dev/dclm-full-deduped-sample (4.92M rows)
- bluelightai-dev/the-stack-dedup-sample (474k rows)
- bluelightai-dev/common-corpus-sample-open-culture (462k rows)
- bluelightai-dev/common-corpus-sample-open-government (373k rows)