to_zarr

UnitsAwareDataArray.to_zarr(store: MutableMapping | str | PathLike[str] | None = None, chunk_store: MutableMapping | str | PathLike | None = None, mode: ZarrWriteModes | None = None, synchronizer=None, group: str | None = None, encoding: Mapping | None = None, *, compute: bool = True, consolidated: bool | None = None, append_dim: Hashable | None = None, region: Mapping[str, slice | Literal['auto']] | Literal['auto'] | None = None, safe_chunks: bool = True, storage_options: dict[str, str] | None = None, zarr_version: int | None = None) -> ZarrStore | Delayed

Write DataArray contents to a Zarr store.

Zarr chunks are determined in the following way:

  • From the chunks attribute in each variable’s encoding (can be set via DataArray.chunk).

  • If the variable is a Dask array, from the Dask chunks.

  • If neither Dask chunks nor encoding chunks are present, chunks will be determined automatically by Zarr.

  • If both Dask chunks and encoding chunks are present, encoding chunks will be used, provided that there is a many-to-one relationship between encoding chunks and Dask chunks (i.e. each Dask chunk spans a whole number of encoding chunks); otherwise a ValueError is raised. This restriction ensures that no synchronization or locks are required when writing. To disable this restriction, use safe_chunks=False.
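
The many-to-one rule in the last bullet can be illustrated for a single dimension. This is a simplified sketch of the idea, not xarray's actual validation code; the helper name check_safe_chunks is hypothetical, and it assumes that only the final Dask chunk is allowed to be shorter than a multiple of the encoding chunk.

```python
def check_safe_chunks(dask_chunks, encoding_chunk):
    """Raise ValueError if a Zarr chunk would be shared by two Dask chunks.

    dask_chunks: chunk lengths along one dimension, e.g. (4, 4, 2)
    encoding_chunk: Zarr (encoding) chunk length along the same dimension
    """
    # Every Dask chunk except the last must be a whole multiple of the
    # encoding chunk, so that each Dask chunk starts on a Zarr chunk boundary.
    for size in dask_chunks[:-1]:
        if size % encoding_chunk != 0:
            raise ValueError(
                f"Dask chunk of length {size} is not a multiple of the "
                f"encoding chunk length {encoding_chunk}"
            )


# Dask chunks (4, 4, 2) with encoding chunk 2: each Dask chunk covers
# whole Zarr chunks, so parallel writes never touch the same Zarr chunk.
check_safe_chunks((4, 4, 2), 2)

# Dask chunks (3, 3, 3) with encoding chunk 2: the Zarr chunk covering
# indices 2..3 is split between the first two Dask chunks.
try:
    check_safe_chunks((3, 3, 3), 2)
except ValueError as err:
    print("unsafe:", err)
```

When two Dask chunks share a Zarr chunk, two parallel tasks would each need to read, modify and rewrite that chunk, which is where the need for locks comes from.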

Parameters:
  • store (MutableMapping, str or path-like, optional) – Store or path to directory in local or remote file system.

  • chunk_store (MutableMapping, str or path-like, optional) – Store or path to directory in local or remote file system only for Zarr array chunks. Requires zarr-python v2.4.0 or later.

  • mode ({"w", "w-", "a", "a-", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override all existing variables including dimension coordinates (create if does not exist); “a-” means only append those variables that have append_dim; “r+” means modify existing array values only (raise an error if any metadata or shapes would change). The default mode is “a” if append_dim is set. Otherwise, it is “r+” if region is set and “w-” otherwise.

  • synchronizer (object, optional) – Zarr array synchronizer.

  • group (str, optional) – Group path. (a.k.a. path in zarr terminology.)

  • encoding (dict, optional) – Nested dictionary with variable names as keys and dictionaries of variable specific encodings as values, e.g., {"my_variable": {"dtype": "int16", "scale_factor": 0.1,}, ...}

  • compute (bool, default: True) – If True write array data immediately, otherwise return a dask.delayed.Delayed object that can be computed to write array data later. Metadata is always updated eagerly.

  • consolidated (bool, optional) –

    If True, apply zarr’s consolidate_metadata function to the store after writing metadata and read existing stores with consolidated metadata; if False, do not. The default (consolidated=None) means write consolidated metadata and attempt to read consolidated metadata for existing stores (falling back to non-consolidated).

    When the experimental zarr_version=3 is used, consolidated must be either None or False.

  • append_dim (hashable, optional) – If set, the dimension along which the data will be appended. All other dimensions on overridden variables must remain the same size.

  • region (dict, optional) –

    Optional mapping from dimension names to integer slices along DataArray dimensions to indicate the region of existing zarr array(s) in which to write this DataArray’s data. For example, {'x': slice(0, 1000), 'y': slice(10000, 11000)} would indicate that values should be written to the region 0:1000 along x and 10000:11000 along y.

    Two restrictions apply to the use of region:

    • If region is set, all variables in a DataArray must have at least one dimension in common with the region. Other variables should be written in a separate call to to_zarr().

    • Dimensions cannot be included in both region and append_dim at the same time. To create empty arrays to fill in with region, use a separate call to to_zarr() with compute=False. See “Appending to existing Zarr stores” in the reference documentation for full details.

    Users are expected to ensure that the specified region aligns with Zarr chunk boundaries, and that dask chunks are also aligned. Xarray makes limited checks that these multiple chunk boundaries line up. It is possible to write incomplete chunks and corrupt the data with this option if you are not careful.

  • safe_chunks (bool, default: True) – If True, only allow writes when there is a many-to-one relationship between Zarr chunks (specified in encoding) and Dask chunks. Set False to override this restriction; however, data may become corrupted if Zarr arrays are written in parallel. This option may be useful in combination with compute=False to initialize a Zarr store from an existing DataArray with arbitrary chunk structure.

  • storage_options (dict, optional) – Any additional parameters for the storage backend (ignored for local paths).

  • zarr_version (int or None, optional) – The desired zarr spec version to target (currently 2 or 3). The default of None will attempt to determine the zarr version from store when possible, otherwise defaulting to 2.
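
Two of the parameter rules above can be written as small pre-flight checks: the default for mode, and the chunk alignment that region expects from the caller. The sketch below restates those rules in plain Python under stated assumptions. Both helper names are hypothetical and are not part of xarray, and the alignment check assumes a region may also end at the end of the stored array.

```python
def resolve_default_mode(mode=None, append_dim=None, region=None):
    """Return the mode that applies when mode is left as None."""
    if mode is not None:
        return mode
    if append_dim is not None:
        return "a"
    if region is not None:
        return "r+"
    return "w-"


def region_is_chunk_aligned(region, zarr_chunks, dim_sizes):
    """Check that each region slice starts and stops on a Zarr chunk boundary.

    region: mapping of dimension name to slice
    zarr_chunks: mapping of dimension name to Zarr chunk length
    dim_sizes: mapping of dimension name to length of the stored array
    """
    for dim, sl in region.items():
        chunk = zarr_chunks[dim]
        start = 0 if sl.start is None else sl.start
        stop = dim_sizes[dim] if sl.stop is None else sl.stop
        if start % chunk != 0:
            return False
        # A stop that is not on a boundary is only accepted at the array end,
        # where the final Zarr chunk is naturally partial.
        if stop % chunk != 0 and stop != dim_sizes[dim]:
            return False
    return True


print(resolve_default_mode())                         # no arguments set
print(resolve_default_mode(append_dim="time"))        # appending
print(resolve_default_mode(region={"x": slice(0, 1000)}))  # region write

region = {"x": slice(0, 1000), "y": slice(10000, 11000)}
print(region_is_chunk_aligned(region, {"x": 500, "y": 1000}, {"x": 20000, "y": 20000}))
```

A check like this does not replace the care the region description asks for; it only catches the most common misalignment before any data is written.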

Returns:

  • dask.delayed.Delayed if compute is False

  • ZarrStore otherwise

References

https://zarr.readthedocs.io/

Notes

Zarr chunking behavior:

If chunks are found in the encoding argument or attribute corresponding to any DataArray, those chunks are used. If a DataArray is a dask array, it is written with those chunks. If no other chunks are found, Zarr uses its own heuristics to choose automatic chunk sizes.

encoding:

The encoding attribute (if it exists) of the DataArray(s) will be used. Override any existing encodings by providing the encoding kwarg.

See also

Dataset.to_zarr

Zarr – The I/O user guide, with more details and examples.