We’ve been running into a practical challenge while preparing some larger datasets for release, and I’d love to hear how others in the community are handling this.
When a dataset grows beyond a certain number of files, platforms like Zenodo automatically zip the upload. This is understandable, but it breaks the folder-based structure that BDF encourages and makes it harder for people to browse, inspect, or load subsets of the data without downloading everything.
It’s a small technical issue in the grand scheme, but it has real implications for usability, especially once groups start publishing full test campaigns, long time-series, or high-resolution metadata.
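To make the subset-access point concrete, here is a minimal sketch of what folder-preserving hosting enables, assuming the dataset is mirrored to an S3-compatible bucket with its layout intact. The bucket and prefix names are hypothetical; with structure preserved, a user can pull a single run instead of the whole archive:

```python
import os
import boto3  # assumes AWS credentials are already configured locally

s3 = boto3.client("s3")
bucket = "example-bdf-datasets"    # hypothetical bucket name
prefix = "campaign-2024/run-003/"  # one run out of the full campaign

# Page through the keys under the chosen prefix and mirror them locally,
# keeping the original folder layout intact.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip folder placeholder objects
            continue
        os.makedirs(os.path.dirname(key) or ".", exist_ok=True)
        s3.download_file(bucket, key, key)
```

None of this is possible once the platform has collapsed the upload into a single zip: the only remaining operation is downloading the entire archive.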
Questions for the community:

- Where are you hosting your large datasets today?
- Have you found platforms that preserve folder structure without zipping everything?
- Is anyone using Git-LFS, S3, HuggingFace Datasets, Dataverse, Dryad, or institutional repositories? (There is a short Hugging Face sketch after this list.)
- Would a “BDF-friendly” hosted storage solution be valuable?
- Are there best practices we should document for dataset authors, so the ecosystem converges on a few stable options?
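As one data point on the Hugging Face option mentioned above: dataset repos there keep the folder tree browsable file by file, and `huggingface_hub` can fetch just a subtree. A minimal sketch, with a hypothetical repo id:

```python
from huggingface_hub import snapshot_download

# Hypothetical repo id; any dataset repo with a BDF-style folder tree works.
local_dir = snapshot_download(
    repo_id="example-org/example-bdf-campaign",
    repo_type="dataset",
    allow_patterns=["run-003/*"],  # fetch one run, not the whole campaign
)
print(local_dir)  # local cache directory mirroring the repo's folder layout
```

The key property, whichever host is chosen, is the same as in the S3 sketch above: the platform exposes individual files rather than one opaque archive.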
We’d really appreciate hearing about your experiences and recommendations. If there’s interest, we can compile the responses into a community-driven guide to BDF dataset hosting.