MultiFileDataset
- class typhon.datasets.dataset.MultiFileDataset(name=None, **kwargs)
Represents a dataset where measurements are spread over multiple files.
If filenames contain timestamps, this information is used to determine the time for a granule or measurement. If filenames do not contain timestamps, this information is obtained from the file contents.
- basedir
Describes the directory under which all granules are located. Can be either a string or a pathlib.Path object.
- Type:
pathlib.Path or str
- subdir
Describes the directory within basedir where granules are located. May contain string formatting directives where particular fields are replaced, such as year, month, and day. For example: subdir = ‘{year}/{month}’. Sorting cannot be narrower than by day.
- Type:
pathlib.Path or str
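As a stdlib-only sketch of how such a template resolves to a directory (the paths and the zero-padded month are hypothetical choices for illustration, not prescribed by typhon):

```python
from datetime import date
from pathlib import Path

# Hypothetical directory layout; typhon derives these fields from granule times.
basedir = Path("/data/satellite")
subdir = "{year}/{month:02d}"

def dir_for_day(d: date) -> Path:
    # Fill the subdir template with the date's fields and join it onto basedir.
    return basedir / subdir.format(year=d.year, month=d.month)

print(dir_for_day(date(2020, 3, 14)).as_posix())  # /data/satellite/2020/03
```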
- re
Regular expression that should match valid granule files within basedir / subdir. Should use symbolic group names to capture relevant information when possible, such as starting time, orbit number, etc. For time identification, relevant fields are contained in MultiFileDataset.date_info, where each field also exists in a version with “_end” appended. MultiFileDataset.refields contains all recognised fields.
If any _end fields are found, the ending time is equal to the beginning time with any _end fields replaced. If no _end fields are found, the granule_duration attribute is used to determine the ending time, or the file is read to get the ending time (hopefully the header is enough).
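A stdlib-only sketch of the naming convention above, using a hypothetical filename pattern (the filename layout and the choice of `hour_end`/`minute_end` groups are illustrative):

```python
import re
from datetime import datetime

# Hypothetical granule filename: a start time plus *_end groups for the end time.
pat = re.compile(
    r"granule_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    r"_(?P<hour>\d{2})(?P<minute>\d{2})"
    r"_(?P<hour_end>\d{2})(?P<minute_end>\d{2})\.nc"
)

info = pat.fullmatch("granule_20200314_0915_1027.nc").groupdict()
start = datetime(int(info["year"]), int(info["month"]), int(info["day"]),
                 int(info["hour"]), int(info["minute"]))
# Ending time: the beginning time with every *_end field substituted in.
end = start.replace(hour=int(info["hour_end"]), minute=int(info["minute_end"]))
```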
- granule_cache_file
If set, use this file to cache information related to granules. This is used to cache granule times if those are not directly inferred from the filename. Otherwise, this is not used. The full path to this file shall be basedir / granule_cache_file.
- Type:
pathlib.Path or str
- granule_duration
If the filename contains starting times but no ending times, granule_duration is used to determine the ending time. This should be a datetime.timedelta object.
- Type:
datetime.timedelta
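When only a start time is available, the ending time follows by simple addition; a minimal sketch with a hypothetical duration:

```python
from datetime import datetime, timedelta

granule_duration = timedelta(minutes=98)  # hypothetical orbit period
start = datetime(2020, 3, 14, 9, 15)      # as parsed from the filename
end = start + granule_duration            # 2020-03-14 10:53
```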
- __init__(**kwargs)
Initialise a Dataset object.
All keyword arguments will be translated into attributes. Does not take positional arguments.
Note that if you create a dataset with a name that already exists, the existing object is returned, but __init__ is still called (Python does this, see https://docs.python.org/3.7/reference/datamodel.html#object.__new__).
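The name-reuse behaviour can be sketched with a minimal stand-in class (this is not typhon's code, just an illustration of the __new__/__init__ interaction described above):

```python
class NamedSingleton:
    # __new__ returns the cached instance for a given name, yet Python still
    # calls __init__ on whatever __new__ returned, so __init__ runs each time.
    _registry = {}

    def __new__(cls, name, **kwargs):
        if name not in cls._registry:
            cls._registry[name] = super().__new__(cls)
        return cls._registry[name]

    def __init__(self, name, **kwargs):
        self.name = name
        self.init_calls = getattr(self, "init_calls", 0) + 1
        for k, v in kwargs.items():
            setattr(self, k, v)

a = NamedSingleton("hirs", basedir="/data")
b = NamedSingleton("hirs", basedir="/other")
# a is b: the second call returned the existing object, and the repeated
# __init__ overwrote basedir on it.
```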
Methods
- __init__(**kwargs): Initialise a Dataset object.
- combine(my_data, other_obj[, other_data, ...]): Combine with data from other dataset.
- find_dir_for_time(dt): Find the directory containing granules/measurements at (date)time.
- find_granules([dt_start, dt_end, ...]): Yield all granules/measurement files in period.
- find_granules_sorted([dt_start, dt_end, ...]): Yield all granules, sorted by times.
- find_most_recent_granule_before(instant, ...): Find granule covering instant.
- get_additional_field(M, fld): Get additional field.
- get_info_for_granule(p): Return dict (re.fullmatch) for granule, based on re.
- get_mandatory_fields(): What extra format variables are needed in find_granules?
- get_subdir_resolution(): Return the resolution for the subdir precision.
- get_time_from_granule_contents(p): Get datetime objects for beginning and end of granule.
- get_times_for_granule(p, **kwargs): For granule stored in path, get start and end times.
- iterate_subdirs(d_start, d_end, **extra): Iterate through all subdirs in dataset.
- read([f, fields, pseudo_fields]): Read granule in file and do some other fixes.
- read_period([start, end, onerror, fields, ...]): Read all granules between start and end, in bulk.
- setlocal(): Set local attributes, from config or otherwise.
- verify_mandatory_fields(extra)
Attributes
aliases
concat_coor
datefields
default_orbit_filters
end_date
granules_firstline_db
granules_firstline_file
mandatory_fields
maxsize
my_pseudo_fields
name
read_returns
refields
related
section
start_date
time_field
unique_fields
valid_field_values