How are people storing and sharing large battery datasets?

We’ve been running into a practical challenge while preparing some larger datasets for release, and I’d love to hear how others in the community are handling this.

When a dataset grows beyond a certain number of files, platforms like Zenodo automatically zip the upload. This is understandable, but it breaks the folder-based structure that BDF encourages and makes it harder for people to browse, inspect, or load subsets of the data without downloading everything.

It’s a small technical issue in the grand scheme, but it has real implications for usability, especially once groups start publishing full test campaigns, long time-series, or high-resolution metadata.

Questions for the community:

  • Where are you hosting your large datasets today?

  • Have you found platforms that preserve folder structure without zipping everything?

  • Is anyone using Git-LFS, S3, HuggingFace Datasets, Dataverse, Dryad, or institutional repositories?

  • Would a “BDF-friendly” hosted storage solution be valuable?

  • Are there best practices we should document for dataset authors so the ecosystem converges on a few stable options?

We’d really appreciate hearing your experiences and recommendations. If there’s interest, we can compile everything into a community-driven guide for BDF dataset hosting.

Most datasets today seem to be hosted on platforms like Zenodo, Mendeley Data, or institution-specific repositories. These have a few benefits, e.g. (i) they are free, (ii) they mint DOIs for you, and (iii) they manage the bibliographic metadata.

However, they can be difficult to work with because they limit the total size and the number of files per record (e.g. 50 GB and 100 files on Zenodo), and there is no uniformity in how records are structured.
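To see early whether a folder-based dataset would fit inside limits like these without zipping, a quick pre-upload check can help. This is only a minimal sketch: the function name `summarize_dataset` is made up for illustration, and the default limits are just the figures quoted above, which may not match what a platform currently enforces.

```python
from pathlib import Path

# Limits as quoted in this thread for Zenodo. Treat them as assumptions and
# check the platform's current documentation before relying on them.
MAX_FILES = 100
MAX_BYTES = 50 * 10**9  # 50 GB (decimal)


def summarize_dataset(root, max_files=MAX_FILES, max_bytes=MAX_BYTES):
    """Count files and bytes under `root` and compare them to the limits."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return {
        "n_files": len(files),
        "total_bytes": total_bytes,
        "fits_file_limit": len(files) <= max_files,
        "fits_size_limit": total_bytes <= max_bytes,
    }
```

Calling `summarize_dataset("path/to/dataset")` before an upload shows whether the folder tree could go up as-is or would have to be bundled into archives.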

Git-LFS might be suitable for moderately sized records (e.g. < 1 GB), and a GitHub-hosted repository can be archived with a DOI through the Zenodo integration. But I’m not sure how well it scales to 100 GB to 1 TB+, in terms of both cost and performance.

I’m exploring options like S3 and R2 for hosting the raw data files, with an optional Git layer on top for version control.
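One way this could work on object storage, sketched here under my own assumptions rather than as an established BDF convention: use each file's path relative to the dataset root as its object key, so the folder structure survives, and publish a small manifest with sizes and checksums so people can pick a subset before downloading anything. The helper names (`build_manifest`, `select_subset`) and the manifest fields are hypothetical, and the actual upload step is left out.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """List every file under `root` with a folder-preserving key."""
    root = Path(root)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            entries.append({
                # Relative POSIX path, usable directly as an object key.
                "key": path.relative_to(root).as_posix(),
                "size": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return entries


def select_subset(manifest, prefix):
    """Return the manifest entries whose key starts with `prefix`."""
    return [entry for entry in manifest if entry["key"].startswith(prefix)]
```

The manifest itself is small enough to keep in a Git repository, which would give the version-control layer without putting the raw data in Git.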