MultiFileDataset

class typhon.datasets.dataset.MultiFileDataset(name=None, **kwargs)[source]

Represents a dataset where measurements are spread over multiple files.

If filenames contain timestamps, this information is used to determine the time for a granule or measurement. If filenames do not contain timestamps, this information is obtained from the file contents.

basedir

Describes the directory under which all granules are located. Can be either a string or a pathlib.Path object.

Type:

pathlib.Path or str

subdir

Describes the directory within basedir where granules are located. May contain string formatting directives where particular fields are replaced, such as year, month, and day. For example: subdir = '{year}/{month}'. Sorting cannot be finer than by day.

Type:

pathlib.Path or str
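
As a rough illustration of how such a template might be filled in, the following standard-library sketch resolves a subdir pattern for one datetime. The helper name `resolve_subdir`, the example paths, and the zero-padding of month and day are assumptions made for this example, not part of the documented class.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical values, chosen only for illustration.
basedir = Path("/data/example_sensor")
subdir = "{year}/{month}"


def resolve_subdir(basedir, subdir, dt):
    """Fill the year/month/day fields of a subdir template for one datetime.

    Zero-padding is an assumption of this sketch; fields that the
    template does not use (here: day) are simply ignored by str.format.
    """
    filled = str(subdir).format(
        year=f"{dt.year:04d}",
        month=f"{dt.month:02d}",
        day=f"{dt.day:02d}",
    )
    return Path(basedir) / filled


granule_dir = resolve_subdir(basedir, subdir, datetime(2015, 3, 7))
print(granule_dir.as_posix())
```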

re

Regular expression that should match valid granule files within basedir / subdir. Should use symbolic group names to capture relevant information when possible, such as starting time, orbit number, etc. For time identification, relevant fields are contained in MultiFileDataset.date_info, where each field also exists in a version with “_end” appended. MultiFileDataset.refields contains all recognised fields.

If any _end fields are found, the ending time equals the beginning time with the corresponding fields replaced by their _end values. If no _end fields are found, the ending time is determined either from the granule_duration attribute or by reading the file (ideally the header is enough).

Type:

str
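
The "_end" replacement rule can be sketched with the standard library alone. The filename pattern, the `TIME_FIELDS` tuple, and the helper `times_from_filename` below are hypothetical; the sketch assumes time fields named year, month, day, hour, minute and second, and it does not handle granules that cross midnight.

```python
import re
from datetime import datetime

# Hypothetical filename pattern using symbolic group names.
# Start fields: year, month, day, hour, minute; end fields: hour_end, minute_end.
pattern = re.compile(
    r"sensor_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    r"_(?P<hour>\d{2})(?P<minute>\d{2})"
    r"_(?P<hour_end>\d{2})(?P<minute_end>\d{2})\.nc"
)

# Assumed field names for this sketch.
TIME_FIELDS = ("year", "month", "day", "hour", "minute", "second")


def times_from_filename(filename):
    """Return (start, end), where end is start with any *_end fields replaced."""
    m = pattern.fullmatch(filename)
    if m is None:
        raise ValueError(f"not a valid granule name: {filename!r}")
    info = m.groupdict()
    start_parts = {k: int(info[k]) for k in TIME_FIELDS if info.get(k) is not None}
    end_parts = dict(start_parts)  # begin with a copy of the start fields
    for k in TIME_FIELDS:
        value = info.get(k + "_end")
        if value is not None:
            end_parts[k] = int(value)  # override with the *_end value
    return datetime(**start_parts), datetime(**end_parts)


start, end = times_from_filename("sensor_20150307_1015_1157.nc")
```

Here only the hour and minute have "_end" counterparts, so the year, month and day of the ending time are carried over from the beginning time.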

granule_cache_file

If set, use this file to cache information related to granules. This is used to cache granule times if those are not directly inferred from the filename. Otherwise, this is not used. The full path to this file shall be basedir / granule_cache_file.

Type:

pathlib.Path or str

granule_duration

If the filename contains starting times but no ending times, granule_duration is used to determine the ending time. This should be a datetime.timedelta object.

Type:

datetime.timedelta
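
A minimal sketch of the fallback described above, assuming a fixed granule length: when only the starting time is known from the filename, the ending time is the start plus granule_duration. The 102-minute duration and the start time are made-up values.

```python
from datetime import datetime, timedelta

# Hypothetical duration; a real dataset would set this to its own granule length.
granule_duration = timedelta(minutes=102)

# Starting time as it might be parsed from a filename.
start = datetime(2015, 3, 7, 23, 30)

# Ending time derived from the duration; note that it may fall on the next day.
end = start + granule_duration
```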

__init__(**kwargs)[source]

Initialise a Dataset object.

All keyword arguments will be translated into attributes. Does not take positional arguments.

Note that if you create a dataset with a name that already exists, the existing object is returned, but __init__ is still called (Python does this, see https://docs.python.org/3.7/reference/datamodel.html#object.__new__).
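
The behaviour in the note can be reproduced with a small stand-in class. `NamedRegistry` is hypothetical and is not how typhon implements its datasets; it only demonstrates the Python rule that `__init__` runs whenever `__new__` returns an instance of the class, even an already existing one.

```python
class NamedRegistry:
    """Hypothetical stand-in that returns an existing object for a known name."""

    _instances = {}

    def __new__(cls, name=None, **kwargs):
        # Reuse the existing object if this name was seen before.
        if name in cls._instances:
            return cls._instances[name]
        obj = super().__new__(cls)
        if name is not None:
            cls._instances[name] = obj
        return obj

    def __init__(self, name=None, **kwargs):
        # Runs on every call, including when __new__ returned an old object.
        self.name = name
        for key, value in kwargs.items():
            setattr(self, key, value)


a = NamedRegistry(name="demo", basedir="/data/a")
b = NamedRegistry(name="demo", basedir="/data/b")
```

After the second call, `a` and `b` refer to the same object and its `basedir` attribute has been overwritten, so re-creating a dataset under an existing name can silently change the attributes of the object already in use.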

Methods

__init__(**kwargs)

Initialise a Dataset object.

as_xarray_dataset()

combine(my_data, other_obj[, other_data, ...])

Combine with data from other dataset.

find_dir_for_time(dt)

Find the directory containing granules/measurements at (date)time

find_granules([dt_start, dt_end, ...])

Yield all granules/measurement files in period

find_granules_sorted([dt_start, dt_end, ...])

Yield all granules, sorted by times.

find_most_recent_granule_before(instant, ...)

Find granule covering instant

get_additional_field(M, fld)

Get additional field.

get_info_for_granule(p)

Return dict (re.fullmatch) for granule, based on re

get_mandatory_fields()

get_path_format_variables()

What extra format variables are needed in find_granules?

get_subdir_resolution()

Return the resolution for the subdir precision.

get_time_from_granule_contents(p)

Get datetime objects for beginning and end of granule

get_times_for_granule(p, **kwargs)

For granule stored in path, get start and end times.

iterate_subdirs(d_start, d_end, **extra)

Iterate through all subdirs in dataset.

read([f, fields, pseudo_fields])

Read granule in file and do some other fixes

read_period([start, end, onerror, fields, ...])

Read all granules between start and end, in bulk.

setlocal()

Set local attributes, from config or otherwise.

verify_mandatory_fields(extra)

Attributes

aliases

basedir

concat_coor

datefields

default_orbit_filters

end_date

granule_cache_file

granule_duration

granules_firstline_db

granules_firstline_file

mandatory_fields

maxsize

my_pseudo_fields

name

re

read_returns

refields

related

section

start_date

subdir

time_field

unique_fields

valid_field_values