BDF: What's Been Built and What's Next

This post is a standing summary of where the Battery Data Format (BDF) stands. I’ll keep it updated as things move. If you’re new here, the open standard thread has the community discussion history; here’s the short version.

Background

BDF started with a specification donation from Ohm in May 2025. By then, the community had been discussing what an open standard for cycler output data should look like since early 2024, evaluating existing formats, ontologies, and data management approaches. That discussion informed the direction as the donated spec was refined into what shipped.

Key contributions along the way: @DrSimonClark brought ontology alignment with BattINFO/EMMO, @smedegaard pushed for Parquet-compatible nested structures and kicked off the original spec discussion, and @TomHolland contributed the PyProBE integration.

Timeline

  • May 2025 — Ohm donates the initial BDF specification
  • Aug 2025 — Largest open source battery dataset contribution: 199 cells (NMC//graphite and LFP//graphite), each tested for 1,000 cycles under fully automated workflows
  • Dec 2025 — LF Energy publicly releases BDF
  • Jan 2026 — Partnership announcements: BattINFO Ontology, Faraday Institution (PyProBE and BDX), Microsoft Open Battery Dataset contribution, Ohm BDF Converter
  • Feb 2026 — BDF Python package released on PyPI
  • Mar 2026 — Dataset donation from Microsoft Surface Battery Development

What’s been decided

  • Canonical column schema: BDF standardizes column labels and units so datasets from different cyclers can be compared without custom glue code. Required columns are Test Time / s, Voltage / V, and Current / A, with recommended columns for cycle count, step count, temperature, etc.
  • Ontology alignment: column names and metadata terms map to BattINFO/EMMO, making BDF datasets machine-interpretable.
  • Units in column names: SI-style labels (e.g. Voltage / V) are the promoted convention; bracketed variants such as Voltage [V] and Voltage (V) are tolerated. Unit semantics live in metadata and are parseable by pint.
  • Time in seconds, not milliseconds: seconds are more natural for test durations, and the precision lost by storing time as a float is negligible for serialized data.
  • Step index: monotonically increasing, incremented on any change in control mode.
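For anyone who wants to see these conventions in code, here is a minimal sketch using only the Python standard library. It is not the batterydf API: the function names, the control-mode labels ("CC", "CV", "rest"), and the choice to start the step index at 0 are all assumptions made for illustration. The float check also assumes time is stored as a 64-bit float, which the post above does not specify.

```python
import math
import re

# Required quantities and units as listed above (label split into name and unit).
REQUIRED = {"Test Time": "s", "Voltage": "V", "Current": "A"}

# Accepts "Voltage / V" plus the tolerated "Voltage [V]" and "Voltage (V)" variants.
_LABEL = re.compile(
    r"^\s*(?P<name>.+?)\s*"
    r"(?:/\s*(?P<u1>[^\s\[\]()]+)|\[(?P<u2>[^\]]+)\]|\((?P<u3>[^)]+)\))\s*$"
)


def parse_label(label):
    """Split a column label into (quantity name, unit string or None)."""
    match = _LABEL.match(label)
    if match is None:
        return label.strip(), None
    unit = match.group("u1") or match.group("u2") or match.group("u3")
    return match.group("name"), unit.strip()


def missing_required(labels):
    """Return required quantities that are absent or carry a different unit."""
    seen = dict(parse_label(label) for label in labels)
    return [name for name, unit in REQUIRED.items() if seen.get(name) != unit]


def step_index(control_modes):
    """Monotonically increasing index, bumped on every change in control mode.

    Starting at 0 is an assumption of this sketch, not something the spec
    summary above states.
    """
    index, previous, out = 0, None, []
    for i, mode in enumerate(control_modes):
        if i > 0 and mode != previous:
            index += 1
        previous = mode
        out.append(index)
    return out


def time_resolution(seconds):
    """Spacing between adjacent 64-bit floats near `seconds` (assumes float64)."""
    return math.ulp(float(seconds))


if __name__ == "__main__":
    print(parse_label("Voltage / V"))
    print(missing_required(["Test Time / s", "Voltage [V]"]))
    print(step_index(["CC", "CC", "CV", "rest", "rest", "CC"]))
    print(time_resolution(1e8))  # resolution at roughly three years of test time
```

The regex is deliberately permissive about the unit text itself. Checking that a unit string is meaningful would be a job for pint against the metadata, which this sketch leaves out.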

What’s still open

These are the active spec questions. If you have experience here, the GitHub issues are the right place to weigh in.

Datastore

The bdf-datastore is a growing collection of real battery datasets in BDF format, organized by contributor and cell. It currently includes datasets from SINTEF and Microsoft, with contributions structured as raw vendor data alongside processed BDF files and metadata. The 199-cell dataset from European research labs (NMC//graphite and LFP//graphite, 1,000 cycles each) is the largest open source contribution to date, and Microsoft Surface Battery Development donated a dataset in March 2026.

Contributing data is straightforward — fork the repo, add your dataset following the folder conventions, and open a PR.

Tooling

  • pip install batterydf: reads vendor exports, normalizes to BDF, validates, and produces metadata. Supports Neware NDA files and interactive plotting via Plotly and hvplot. PyPI · GitHub
  • @TomHolland’s PyProBE provides a user-interface layer compatible with BDF
  • @DrSimonClark published the first individual BDF dataset to Zenodo: CR2032 discharge time series

Last updated: April 2026