collect

FileSet.collect(start=None, end=None, files=None, return_info=False, **kwargs)

Load all files between two dates, sorted by their starting time.

Notes

This method does not restrict the loaded data to the time period given by start and end. It fully loads every file that contains data in that period, so the returned data may extend beyond it.
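If the result should cover only the requested period, the loaded data has to be trimmed afterwards. The following is a minimal sketch, assuming each content object is a list of (timestamp, value) pairs and that a half-open interval is wanted; the real content type depends on the file handler, and trim_to_period is a hypothetical helper, not part of the FileSet API:

```python
from datetime import datetime

def trim_to_period(contents, start, end):
    """Keep only records with start <= timestamp < end (hypothetical helper)."""
    return [
        [(t, v) for t, v in content if start <= t < end]
        for content in contents
    ]

# Stand-in for loaded contents: the second file overlaps the period
# but also holds a record after its end.
contents = [
    [(datetime(2018, 1, 1, 6), 1.0), (datetime(2018, 1, 1, 18), 2.0)],
    [(datetime(2018, 1, 1, 23), 3.0), (datetime(2018, 1, 2, 1), 4.0)],
]
trimmed = trim_to_period(contents, datetime(2018, 1, 1), datetime(2018, 1, 2))
```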

The files are read in parallel using threads. This should give a speed-up if the file handler’s read function internally uses CPython code that releases the GIL. Note that this method is faster than icollect() but also uses more memory.
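The general idea of thread-based reading can be sketched with the standard library. This is an illustration only and not necessarily how collect() is implemented; collect_threaded and read_file are hypothetical names, and the reader is a placeholder for a real file handler:

```python
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    # Placeholder reader (assumption); a real handler would parse the file format.
    with open(path, "rb") as f:
        return f.read()

def collect_threaded(paths, reader=read_file, max_workers=4):
    """Read all paths in worker threads and return the contents in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which thread finishes first.
        return list(pool.map(reader, paths))
```

Threads only help when the reader spends its time in calls that release the GIL, such as blocking file I/O; a reader that runs pure Python code would gain little from this approach.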

Use this method if you need all files at once. If you want to process the files one by one in a for loop, consider using icollect() instead.
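The memory trade-off comes from loading everything up front versus producing one result at a time. A rough sketch of the two patterns, using hypothetical helper names, hypothetical filenames, and a stand-in reader rather than the actual FileSet code:

```python
def read_eagerly(paths, reader):
    """Load every content object before returning (collect-style)."""
    return [reader(path) for path in paths]

def read_lazily(paths, reader):
    """Yield one content object at a time (icollect-style generator)."""
    for path in paths:
        yield reader(path)

paths = ["a.nc", "b.nc", "c.nc"]  # hypothetical filenames

def fake_reader(path):
    # Stand-in for a real file reader (assumption).
    return path.upper()

all_contents = read_eagerly(paths, fake_reader)  # all results held in memory
first = next(read_lazily(paths, fake_reader))    # only one result produced so far
```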

Parameters
  • start – The same as in find().

  • end – The same as in find().

  • files – If you already have a list of files that you want to process, pass it here. The list can contain filenames or lists (bundles) of filenames. If this parameter is given, start and end must not be set.

  • return_info – If True, return a FileInfo object with each content value, indicating which file the content was read from.

  • **kwargs – Additional keyword arguments that are allowed for map(). Some might be overwritten by this method.

Returns

If return_info is True, two lists are returned: one with the FileInfo objects of the files and one with the read content objects. Otherwise, only the list of read content objects is returned. The lists are sorted by the starting times of the files.

Examples:

## Load all files between two dates:
# Note: data may contain timestamps exceeding the given time period
data = fileset.collect("2018-01-01", "2018-01-02")

# The above is equivalent to this magic slicing:
data = fileset["2018-01-01":"2018-01-02"]

## If you want to iterate through the files in a for loop, e.g.:
for content in fileset.collect("2018-01-01", "2018-01-02"):
    pass  # do something with content...

# Then you should rather use icollect, which uses less memory:
for content in fileset.icollect("2018-01-01", "2018-01-02"):
    pass  # do something with content...