MultiFileDataset
- class typhon.datasets.dataset.MultiFileDataset(name=None, **kwargs)
Represents a dataset where measurements are spread over multiple files.
If filenames contain timestamps, this information is used to determine the time for a granule or measurement. If filenames do not contain timestamps, this information is obtained from the file contents.
- basedir
Describes the directory under which all granules are located. Can be either a string or a pathlib.Path object.
- Type:
pathlib.Path or str
- subdir
Describes the directory within basedir where granules are located. May contain string formatting directives where particular fields are replaced, such as year, month, and day. For example: subdir = ‘{year}/{month}’. Sorting cannot be narrower than by day.
- Type:
pathlib.Path or str
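As a stdlib-only sketch of how such a template resolves to a directory (the paths and the zero-padded month are hypothetical choices for illustration, not prescribed by typhon):

```python
from datetime import date
from pathlib import Path

# Hypothetical directory layout; typhon derives these fields from granule times.
basedir = Path("/data/satellite")
subdir = "{year}/{month:02d}"

def dir_for_day(d: date) -> Path:
    # Fill the subdir template with the date's fields and join it onto basedir.
    return basedir / subdir.format(year=d.year, month=d.month)

print(dir_for_day(date(2020, 3, 14)).as_posix())  # /data/satellite/2020/03
```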
- re
Regular expression that should match valid granule files within basedir / subdir. Should use symbolic group names to capture relevant information when possible, such as starting time, orbit number, etc. For time identification, relevant fields are contained in MultiFileDataset.date_info, where each field also exists in a version with “_end” appended. MultiFileDataset.refields contains all recognised fields.
If any _end fields are found, the ending time is equal to the beginning time with any _end fields replaced. If no _end fields are found, the granule_duration attribute is used to determine the ending time, or the file is read to get the ending time (hopefully the header is enough).
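A stdlib-only sketch of the naming convention above, using a hypothetical filename pattern (the filename layout and the choice of `hour_end`/`minute_end` groups are illustrative):

```python
import re
from datetime import datetime

# Hypothetical granule filename: a start time plus *_end groups for the end time.
pat = re.compile(
    r"granule_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    r"_(?P<hour>\d{2})(?P<minute>\d{2})"
    r"_(?P<hour_end>\d{2})(?P<minute_end>\d{2})\.nc"
)

info = pat.fullmatch("granule_20200314_0915_1027.nc").groupdict()
start = datetime(int(info["year"]), int(info["month"]), int(info["day"]),
                 int(info["hour"]), int(info["minute"]))
# Ending time: the beginning time with every *_end field substituted in.
end = start.replace(hour=int(info["hour_end"]), minute=int(info["minute_end"]))
```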
- granule_cache_file
If set, use this file to cache information related to granules. This is used to cache granule times if those are not directly inferred from the filename. Otherwise, this is not used. The full path to this file shall be basedir / granule_cache_file.
- Type:
pathlib.Path or str
- granule_duration
If the filename contains starting times but no ending times, granule_duration is used to determine the ending time. This should be a datetime.timedelta object.
- Type:
datetime.timedelta
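When only a start time is available, the ending time follows by simple addition; a minimal sketch with a hypothetical duration:

```python
from datetime import datetime, timedelta

granule_duration = timedelta(minutes=98)  # hypothetical orbit period
start = datetime(2020, 3, 14, 9, 15)      # as parsed from the filename
end = start + granule_duration            # 2020-03-14 10:53
```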
- __init__(**kwargs)
Initialise a Dataset object.
All keyword arguments will be translated into attributes. Does not take positional arguments.
Note that if you create a dataset with a name that already exists, the existing object is returned, but __init__ is still called (Python does this, see https://docs.python.org/3.7/reference/datamodel.html#object.__new__).
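The name-reuse behaviour can be sketched with a minimal stand-in class (this is not typhon's code, just an illustration of the __new__/__init__ interaction described above):

```python
class NamedSingleton:
    # __new__ returns the cached instance for a given name, yet Python still
    # calls __init__ on whatever __new__ returned, so __init__ runs each time.
    _registry = {}

    def __new__(cls, name, **kwargs):
        if name not in cls._registry:
            cls._registry[name] = super().__new__(cls)
        return cls._registry[name]

    def __init__(self, name, **kwargs):
        self.name = name
        self.init_calls = getattr(self, "init_calls", 0) + 1
        for k, v in kwargs.items():
            setattr(self, k, v)

a = NamedSingleton("hirs", basedir="/data")
b = NamedSingleton("hirs", basedir="/other")
# a is b: the second call returned the existing object, and the repeated
# __init__ overwrote basedir on it.
```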
Methods
- __init__(**kwargs): Initialise a Dataset object.
- combine(my_data, other_obj[, other_data, ...]): Combine with data from other dataset.
- find_dir_for_time(dt): Find the directory containing granules/measurements at (date)time.
- find_granules([dt_start, dt_end, ...]): Yield all granules/measurement files in period.
- find_granules_sorted([dt_start, dt_end, ...]): Yield all granules, sorted by times.
- find_most_recent_granule_before(instant, ...): Find granule covering instant.
- get_additional_field(M, fld): Get additional field.
- get_info_for_granule(p): Return dict (re.fullmatch) for granule, based on re.
- get_mandatory_fields(): What extra format variables are needed in find_granules?
- get_subdir_resolution(): Return the resolution for the subdir precision.
- get_time_from_granule_contents(p): Get datetime objects for beginning and end of granule.
- get_times_for_granule(p, **kwargs): For granule stored in path, get start and end times.
- iterate_subdirs(d_start, d_end, **extra): Iterate through all subdirs in dataset.
- read([f, fields, pseudo_fields]): Read granule in file and do some other fixes.
- read_period([start, end, onerror, fields, ...]): Read all granules between start and end, in bulk.
- setlocal(): Set local attributes, from config or otherwise.
- verify_mandatory_fields(extra)
Attributes
aliases
concat_coor
datefields
default_orbit_filters
end_date
granules_firstline_db
granules_firstline_file
mandatory_fields
maxsize
my_pseudo_fields
name
read_returns
refields
related
section
start_date
time_field
unique_fields
valid_field_values