Home > atmlab > datasets > SatDataset.m

SatDataset

PURPOSE ^

SYNOPSIS ^

This is a script file.

DESCRIPTION ^

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

DOWNLOAD ^

SatDataset.m

SOURCE CODE ^

0001 classdef SatDataset < handle % subclass from handle to pass-by-reference
0002     % Class to represent a dataset.
0003     %
0004     % Objects from this class lie at the basis of everything that the
0005     % Collocation Toolkit does. For example, a <a href="matlab:help CollocatedDataset">CollocatedDataset</a>
0006     % consists of two SatDatasets. The SatDataset describes where the data
0007     % are stored on the disk, how the data are read, and contains some more
0008     % information needed for collocations. However, a SatDataset can
0009     % certainly also be used stand-alone, for example, to loop through all
0010     % the granule in a particular time period.
0011     %
0012     % When <a href="matlab:help SatDataset/SatDataset">instantiating</a> an object from this class, (or, for that matter,
0013     % any class in the toolkit!), you should define a number of its properties
0014     % (see below). This can be done either when creating the class (see
0015     % <a href="matlab:help SatDataset/SatDataset">constructor info</a>), or later, in the same way as setting a member
0016     % of a structure. It's probably best to define overall properties, such
0017     % as name, granule_duration, etc., when creating, and to define local
0018     % properties later. For example:
0019     %
0020     % >> spam = SatDataset('name', 'MyDataset', 'granule_duration', '3600', ...
0021     %                      'satname', 'SpamSat');
0022     %
0023     % and later:
0024     %
0025     % >> spam.basedir = '/here/are/the/data';
0026     %
0027     % When a dataset is created, it is registered in a global structure
0028     % that can be obtained by calling <a href="matlab:datasets">datasets</a>. For this reason, it
0029     % is recommended to always at least give a dataset a name when it is
0030     % first created. Not doing so will result in a warning and an
0031     % automatically calculated name, which is probably neither helpful nor
0032     % descriptive.
0033     %
0034     % A number of datasets are pre-defined in <a href="matlab:help define_datasets">define_datasets</a>.
0035     % All registered datasets can be listed with <a href="matlab:help datasets">datasets</a>.
0036     %
0037     %
0038     % SatDataset properties:
0039     %
0040     %  Properties to describe the location of the granules:
0041     %
0042     %   basedir -   Specify root for all datasets
0043     %   subdir -    Specify directory for individual granule
0044     %   re -        Regular expression matching individual granule
0045     %   filename -  Exact filename calculating path for individual granule
0046     %
0047     %  Property describing how to read the granule:
0048     %
0049     %   reader -    Reading routine for one granule for this dataset
0050     %
0051     %  Other properties:
0052     %
0053     %   name -      Name with which this dataset is registered
0054     %   granule_duration - Duration of a single granule in seconds
0055     %   satname -   Name for a single-satellite dataset
0056     %   collocated_datasets - Read-only prop. listing <a href="matlab:help CollocatedDataset">CollocatedDatasets</a>.
0057     %   starttimesfile - Name for filename containing start times if not in filename
0058     %   (Use <a href="matlab:properties SatDataset">properties</a> for a complete listing)
0059     %
0060     % SatDataset methods:
0061     %
0062     %   A selection of the most important methods is shown below.
0063     %   For a full listing, see <a href="matlab:doc SatDataset">doc SatDataset</a>.
0064     %
0065     %  Class instantiation:
0066     %
0067     %   SatDataset -                Info on creating a SatDataset object
0068     %
0069     %  Methods for finding granules:
0070     %
0071     %   find_datadir_by_date -      Find dir. containing granules for date
0072     %   find_granules_by_date -     Find all granules for date
0073     %   find_granules_for_period -  Find all granules for multi-day period
0074     %   find_granule_by_datetime -  Find granule starting at exact datetime
0075     %   find_granule_by_unixsecs -  Similar, but in unixsecs
0076     %   find_extreme_granule -      Find first or last granule for dataset
0077     %   get_starttime -             Get starttime from db (if not in file)
0078     %   set_starttimes -            Update starttimes-db
0079     %
0080     %  Methods for reading granules:
0081     %
0082     %   read_granule -              Read granule starting at exact datetime
0083     %
0084     %  Methods for detecting duplicates:
0085     %
0086     %   find_granule_first_line -   Create hashtable cataloguing duplicates
0087     %   granule_first_line -        Look up entry from this hashtable
0088     %
0089     %  Other useful methods:
0090     %
0091     %   find_info_from_granule -    Get information from granule filename
0092     %
0093     % See also: CollocatedDataset, HomemadeDataset (child), AssociatedDataset, FieldCopier, Collapser
0094     %
0095     % Don't forget the Collocation User's Guide.
0096 
0097     % $Id: SatDataset.m 8777 2014-02-12 12:45:13Z gerrit $
0098     
0099     % read/write props
0100     properties
0101         % Name of the dataset (pseudo-required).
0102         %
0103         % If not given, a name will be automatically calculated. This name
0104         % is not necessarily descriptive, so it's usually good to give a
0105         % name upon creating the dataset.
0106         name;
0107         
0108         % Basedir for the dataset (required).
0109         %
0110         % The base directory in which all granules reside. For example,
0111         % mhs.basedir = '/storage3/data/mhs'.
0112         basedir;
0113         
0114         % Subdir for a particular day of data (required).
0115         %
0116         % In many datasets, data are stored per date, in subdirectories
0117         % such as year/month/day/ or year/doy/. With this property, the user
0118         % tells the dataset how the data are sorted. In locating granules,
0119         % certain strings are replaced: $SAT, $YEAR4, $YEAR2, $MONTH, $DAY,
0120         % $DOY. Example: mhs.subdir = '$SAT_mhs_$YEAR4/$MONTH/$DAY'.
0121         subdir;
0122         
0123         % Regular expression to find individual granule (conditionally optional).
0124         %
0125         % This is required for datasets where the full filename cannot be
0126         % exactly calculated from available information, for example, due
0127         % to each filename containing orbit numbers, downlink stations, or
0128         % other information not known or relevant. When <a href="matlab:help SatDataset/find_granules_by_date">searching</a>
0129         % for granules, the regular expression described by this field is
0130         % used for all files in a particular directory, expecting exactly
0131         % one match. Therefore, the regular expression (a named one passed
0132         % to <a href="matlab:help regexp">regexp</a> must match exactly one file.
0133         %
0134         % The regular expression may be applied either to the basename, or
0135         % to the full path, or to anything in-between.
0136         %
0137         % Fields used for matching are:
0138         %       year02, year04, year, month, day, doy, hour, minute, orbitno
0139         %
0140         % For datasets that are L1 or L2, data is usually sorted at least
0141         % per day, but often per granule, which may be specified either by
0142         % hour/minute or by orbitno, some unique orbit number. Hopefully
0143         % each filename specifies either hour/minute uniquely within a
0144         % day, or an orbit number uniquely. See 'kiruna_init' for examples
0145         % of what to write in re.
0146         %
0147         % See also: <a href="matlab:help SatDataset/filename">filename</a>, <a href="matlab:help SatDataset/find_granule_by_datetime">find_granule_by_datetime</a>
0148         %
0149         % Example:
0150         % mhs.re = '(?<satname>[a-z0-9]{6})_(?<type>[a-z0-9]+)_(?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})/[A-Z0-9.]+\.S(?<hour>\d{2})(?<minute>\d{2})|\.S(?<hour>\d{2})(?<minute>\d{2})'
0151         re;
0152         
0153         % String describing basename (conditionally optional).
0154         %
0155         % This is used for datasets where the full filename can be exactly
0156         % calculate from available information, and no <a href="matlab:help SatDataset/re">regular expression</a>
0157         % is needed. This includes only the basename (no directory).
0158         %
0159         % See also: <a href="matlab:help SatDataset/re">re</a>, <a href="matlab:help SatDataset/find_granule_by_datetime">find_granule_by_datetime</a>
0160         %
0161         % Example:
0162         % mhs.collocated_datasets(1).filename = 'collocations_$SAT.nc.gz'
0163         filename;
0164         
0165         % Function handle to read a single granule (required)
0166         %
0167         % This property points to a function handle used to read a single
0168         % granule (so it becomes in effect a static method). The documentation
0169         % below is a prescription of the interface that the function handle
0170         % assigned to this property must have in order for methods like
0171         % <a href="atmlab:help SatDataset/read_granule">read_granule</a> and <a href="atmlab:help CollocatedDataset/collocate_granule">CollocatedDataset.collocate_granule</a> to work.
0172         % It is not the documentation for any method that already exists!
0173         %
0174         % FORMAT
0175         %
0176         %   data = cd.reader(fullpath, fields)
0177         %
0178         % IN
0179         %
0180         %   fullpath    string      Full path to granule to be read.
0181         %   fields      cellstr
0182         %
0183         %       Any other fields to be read. This is used by the <a  href="atmlab:help FieldCopier">FieldCopier</a>
0184         %       class to copy fields from the original datasets.
0185         %
0186         % OUT
0187         %
0188         %   The function shall return a single output. This output is a
0189         %   structure with at least the following fields:
0190         %
0191         %       lat     double
0192         %
0193         %           Latitude in degrees. This must be a 2-D matrix, although
0194         %           the dimensions may be 1 (so it can be a row-vector, a
0195         %           column-vector, a scalar, or a matrix). It contains the
0196         %           latitude for every single measurement in the data.
0197         %
0198         %       lon     double
0199         %
0200         %           Longitude in degrees. Like the latitude. The sizes of the
0201         %           fields 'lat' and 'lon' must be exactly equal.
0202         %
0203         %       time    double
0204         %
0205         %           Time in seconds since 00:00 UT. This must be a column
0206         %           vector where the number of elements is equal to the number
0207         %           of rows in the 'lat' and 'lon' fields (one scanline can
0208         %           only have a single time).
0209         %
0210         % THROWS
0211         %
0212         %       If the data cannot be read, throws an error with
0213         %       error identifier `atmlab:invalid_data`.
0214         %
0215         % EXAMPLE
0216         %
0217         %   This function is not normally called directly by the user, but
0218         %   since it must often be implemented by the user, examples consist
0219         %   of example implementations. The <a href="matlab:what datasets">datasets</a> directory
0220         %   contains a number of functions with names starting with
0221         %   'common_read'. Those are all implementations for 'reader' used for
0222         %   various datasets that come with atmlab. For example, for data
0223         %   stored in the same format as AMSU, MHS, HIRS, one can use:
0224         %
0225         % mhs.reader = '@<a href="matlab:help satreaders.poes_radiometer">satreaders.poes_radiometer</a>'
0226         reader = @(varargin)[];
0227         
0228         % Function to post-process a single granule
0229         %
0230         % After a granule is read (see attribute reader), there might be
0231         % dataset-specific post-processing to be done immediately after the
0232         % reading.  For example, one may want to calculate pseudo-fields
0233         % to get IWP when the data only have IWC, and one should save
0234         % memory for later on. By default, this property is set to a no-op.
0235         %
0236         % Note that since this is a function implemented as an attribute
0237         % and not as a method (so that redefining does not require
0238         % subclassing), one needs to explicitly pass the dataset as a first
0239         % argument (see below) when calling the function. However, calling
0240         % this function is taken care of by read_granule, and normally the
0241         % user does not need to call this function explicitly.
0242         %
0243         % The method read_granule takes care of removing ballast as
0244         % indicated by the 'dependencies' fields.
0245         %
0246         % FORMAT
0247         %
0248         %   data = self.processor(self, data, fields)
0249         %
0250         % IN
0251         %
0252         %   self    SatDataset object
0253         %   data    data as returned by self.reader
0254         %   fields  fields as passed to self.reader
0255         %
0256         % OUT
0257         %   data    altered data struct
0258         %
0259         % See also: SatDataset/pseudo_fields
0260         reader_processor = @(self, data, fields)data;
0261         
0262         % Structure with pseudo-field information
0263         %
0264         % For some datasets, the user might want to acquire fields not
0265         % actually in the data. For example, to save memory, one might want
0266         % to get IWP when the data actually contain IWC. For this,
0267         % read_granule needs some information on pseudo_fields,
0268         % specifically, for each pseudo field a cell string with
0269         % dependencies. The dependencies cellstr lists the real fields upon
0270         % which the pseudo fields depend. These will be read, passed to
0271         % reader_processor, and then thrown away.
0272         %
0273         % You may wish to add some attributes for user documentation.
0274         %
0275         % Note: every pseudo field MUST have a dependencies member that is
0276         % a cellstr, but this cellstr may be empty.
0277         %
0278         % Note: Users choosing pseudo-field names equal to fields in the
0279         % actual data will be sentenced to the Usenet Death Penalty.
0280         %
0281         % See also: SatDataset/reader_processor
0282         pseudo_fields = struct();
0283         
0284         % Granule duration in seconds (required)
0285         %
0286         % Describes the duration in seconds for a single granule. In case
0287         % this varies, choose an upper limit. This is used by
0288         % <a href="matlab:help CollocatedDataset/overlap_granule">CollocatedDataset.overlap_granule</a>.
0289         %
0290         % Example:
0291         % mhs.granule_duration = 6130
0292         granule_duration;
0293         
0294         % Satellite name for single-satellite datasets (conditionally required)
0295         %
0296         % Some datasets, such as cpr, are contained only on a single satellite.
0297         % However, the database genereated by <a href="matlab:help SatDataset/find_granule_first_line">find_granule_first_line</a>
0298         % and used by <a href="matlab:help SatDataset/granule_first_line">granule_first_line</a> still needs a satellite to
0299         % generate the string for the filename to store the data. The
0300         % satellite name is also used in some verbose output. It is not
0301         % needed or used for multi-satellite datasets, where instead the
0302         % 'spec' specifies the satellite (such as in MHS).
0303         %
0304         % Example:
0305         % cpr.satname = 'cloudsat'
0306         satname;
0307         
0308         % Name for first-line database (predefined)
0309         %
0310         % Predefined property describing where firstline-db is stored
0311         %
0312         % The first-line database (see <a href="matlab:help SatDataset/find_granule_first_line">find_granule_first_line</a>
0313         % and <a href="matlab:help SatDataset/granule_first_line">granule_first_line</a>)
0314         % is needed to prevent duplicates. This contains the filename, but
0315         % not the directory. Currently unused? FIXME
0316         %
0317         % firstline_filename = 'firstline_$NAME_$SPEC';
0318 
0319         % Cache for reading granules
0320         cache;
0321         
0322         % Make this dataset visible. Defaults to true.
0323         %
0324         % If you're creating a dataset for fully internal use, you might
0325         % want to set this to false to prevent automatic searchers from
0326         % detecting it.
0327         visible = true;
0328         
0329         % List of satellites
0330         %
0331         % For multi-satellite datasets, this may contain a list of
0332         % satellites for this dataset. This is only for information and
0333         % does not need to be complete. It may be used e.g. for looping
0334         % through all the possible datasets. Not to be confused with
0335         % <a href="matlab:help SatDataset/satname">satname</a>, which is for single-satellite datasets.
0336         sats;
0337     
0338         % try to use regular expression for finding; a regular expression
0339         % may still be needed for identifying information by
0340         % find_info_from_granule, but not used for finding a filename in
0341         % the first place.
0342         tryre = true;
0343         
0344         % Flag if it needs an exernal starttimes-file
0345         %
0346         % Defaults to false
0347         needs_starttimesfile = false;
0348         
0349         % Some datasets do not contain a starting time in the filename,
0350         % even if there are multiple granule per day. To resolve this, in
0351         % the root of the dataset (e.g. directly in basedir) we store a
0352         % hashtable (Container.Map) that records the starting time based
0353         % on a id that is unique per filename (e.g. orbit number). One
0354         % example of such a dataset is Saphir.
0355         starttimesfile;
0356 
0357         % Database of starting times for datasets where those are not
0358         % clear from the filename.
0359         starttimes
0360         
0361         % Field that can be a structure for arbitrary metadata, such as
0362         % height, channel definitions, or anything. This is not used within
0363         % the framework of atmlab, but can be used by the user for
0364         % anything.
0365         metadata;
0366 
0367         % defaults cannot be set in any subclass, because re-defining a
0368         % property is not allowed except under special circumstances that
0369         % cannot be met here. Therefore, defaults should be encompassed in
0370         % a property, a structure, 'defaults', that is /not/ defined here,
0371         % but only by the class at the end of the hierarchy
0372 
0373     end
0374     
0375     % read-only props (alterable with methods)
0376     properties (SetAccess = protected)
0377         % list of CollocatedDatasets using this SatDataset (read-only)
0378         %
0379         % This property, which is read-only and automatically filled,
0380         % contains a list of <a href="matlab:help CollocatedDataset">CollocatedDataset</a>s
0381         % that use this <a href="matlab:help SatDataset">SatDataset</a>.
0382         collocated_datasets = [];
0383         
0384         % Full path for db containing start-times where applicable.
0385         %
0386         % This is simply basedir/starttimesfile
0387         starttimes_fullpath
0388     end
0389     
0390     % flags
0391     properties (Constant, GetAccess = protected)
0392         FL_INITIAL = -5;
0393         FL_NOTFOUND = -2;
0394         FL_ALLDUPL = typecast(int32(-1), 'uint32');
0395         FL_NONEXT = -3;
0396         FL_NOCUR = -4;
0397     end
0398     
0399     % internally used properties
0400     properties (Transient, GetAccess = private, SetAccess = private)
0401         firstlines;
0402         cachedstarttimes;
0403     end
0404     
0405     methods
0406         
0407         %% constructor
0408         
0409         function self = SatDataset(varargin)
0410             % Creates a SatDataset object.
0411             %
0412             % For full information on the class, see <a href="matlab:doc SatDataset">doc SatDataset</a>.
0413             %
0414             % FORMAT
0415             %
0416             %   ds = SatDataset('name', 'MyDataset', ...
0417             %                   'basedir', '/some/where', ...
0418             %                   ...);
0419             %
0420             % IN
0421             %
0422             %   Any pair of key/value arguments, as when creating a
0423             %   structure. Valid keys are properties. To get a list of
0424             %   valid properties, call <a href="matlab:properties SatDataset">properties SatDataset</a>
0425             %   and/or <a href="matlab:doc SatDataset">doc SatDataset</a>.
0426             %
0427             % OUT
0428             %
0429             %   Valid <a href="matlab:help SatDataset">SatDataset</a> object.
0430             
0431             % this may be defined in subclasses
0432             if isprop(self, 'defaults')
0433                 def_fields = fieldnames(self.defaults); %#ok<*MCNPN>
0434                 for i = 1:length(def_fields)
0435                     fldnm = def_fields{i};
0436                     self.(fldnm) = self.defaults.(fldnm);
0437                 end
0438             end
0439 
0440             for i = 1:2:nargin
0441                 self.(varargin{i}) = varargin{i+1};
0442             end
0443             
0444             if isempty(self.name)
0445                 % give a name
0446                 nm = self.calculate_name();
0447                 warning(['atmlab:' mfilename], ...
0448                     'You didn''t name me! I''ll name myself %s (for now)', nm);
0449                 self.name = nm;
0450             end
0451 
0452             if isempty(self.starttimesfile)
0453                 self.starttimesfile = 'granule_start_times.mat';
0454             end
0455             
0456             self.cache = CachedData();
0457 
0458         end
0459         
0460         %% things that can be done with datasets
0461         
0462         function fulldir = find_datadir_by_date(self, date, varargin)
0463             % find_datadir_by_date Find directory containing granules
0464             %
0465             % For the given datevec and specification, return a string with
0466             % the path to the directory that contains the granules for this particular
0467             % datevec.
0468             %
0469             % FORMAT
0470             %
0471             %   fulldir = ds.find_datadir_by_date(datevec, spec)
0472             %
0473             % IN
0474             %
0475             %   datevec     vector      [year month day] etc.
0476             %   spec        any         name of sat or cellstr {sat1 sat2}
0477             %
0478             % OUT
0479             %
0480             %   fulldir     string      path to directory
0481             
0482             % verify basedir is defined and exists
0483             
0484             spec = optargs(varargin, {''});
0485             errid = ['atmlab:' mfilename]; %find_datadir_by_date';
0486             
0487             assert(any(self.basedir), errid, 'No basedir initialised for %s %s', class(self), char(self.name));
0488             assert(exist(self.basedir, 'dir')~=0, errid, ...
0489                 ['Configured data directory for %s is %s, ' ...
0490                 'but this does not exist or is not a directory. ' ...
0491                 'Please define basedir for %s correctly ' ...
0492                 'or create the directory.'], ...
0493                 self.name, self.basedir, self.name);
0494             
0495             fulldir = fullfile(self.basedir, self.subdir);
0496             
0497             fulldir = self.repvars(fulldir, date, spec);
0498    
0499         end
0500         
0501         function S = find_info_from_granule(self, fname)
0502             % find_info_from_granule Extract info from a granule filename
0503             %
0504             % Thin shell around <a href="matlab:help regexp">regexp</a>.
0505             %
0506             % FORMAT
0507             %
0508             %   S = ds.find_info_from_granule(fname)
0509             %
0510             % IN
0511             %
0512             %   fname   string      (full) path to granule
0513             %
0514             % OUT
0515             %
0516             %  - structure containing whatever info could be obtained
0517             if isempty(self.re) % try to convert it
0518                 self.pos2re();
0519             end
0520             
0521             S = regexp(fname, self.re, 'names');
0522             
0523         end
0524         
0525         function [M, paths] = find_granules_by_date(self, date, varargin)
0526             % find_granules_by_date Find all granule start times for date/sat/dataset
0527             %
0528             % This function finds all granule start times for granules with any coverage
0529             % for the the indicated for the indicated satellite/dataset pair. This
0530             % includes the last granule of the date before, unless this is explicitly
0531             % prohibited. Note that the last granule of the date before may or may not
0532             % actually contain information for the day requested. The
0533             % function assumes at most one granule for the previous date
0534             % contains information for the current date. If two outputs are
0535             % requested, paths to the granules are also output, as a cell
0536             % array of strings.
0537             %
0538             % FORMAT
0539             %
0540             %   [M, paths] = find_granules_by_date(date[, spec[, with_yesterday]])
0541             %
0542             % OUT
0543             %
0544             %   M     N x 6 matrix     where each row corresponds to one granule.
0545             %                          The first five columns are year,
0546             %                          month, day, hour, minute. The last
0547             %                          column may contain a granule number
0548             %                          for some datasets, or -1 otherwise.
0549             %
0550             %   paths cell of strings  Cell array of strings, corresponding full paths
0551             %
0552             % IN
0553             %
0554             %   date            vector [year month day]
0555             %   spec            string or cellstr, sat or pair of sats
0556             %   with_yesterday  (optional) logical, include yesterdays last
0557             %                   granule or not. Defaults to 'true', as this
0558             %                   is needed to cover all of the date; but if
0559             %                   looping over dates, one might not want it.
0560             %
0561             % EXAMPLE
0562             %
0563             %   >> [grans, paths] = mhs.find_granules_by_date([2010, 9, 8], 'noaa18');
0564             %
0565             % $Id: SatDataset.m 8777 2014-02-12 12:45:13Z gerrit $
0566             
0567             % errid = 'atmlab:find_granules_by_date';
0568             
0569             [spec, with_yesterday] = optargs(varargin, {'', true});
0570             
0571             datadir = self.find_datadir_by_date(date, spec);
0572             
0573             if self.tryre && ~isequaln(self.re, nan) && ~isempty(self.re)
0574                 matchy = self.re;
0575             else
0576                 matchy = self.filename; % exact match only
0577             end
0578             
0579             matchy = self.repvars(matchy, date, spec);
0580             
0581             files = dir(datadir);
0582             nfiles = length(files);
0583             M = zeros(nfiles, 6);
0584             for i = 1:nfiles
0585                 fname = files(i).name;
0586                 %nam = regexp(fname, matchy, 'names');
0587                 nam = self.find_info_from_granule(fname);
0588                 if ~isempty(nam)
0589                     % if present, year/month/day should be the same
0590                     if ~self.infofit(nam, date, spec)
0591                         continue;
0592                     end
0593                     
0594                     M(i, 1:3) = date(1:3);
0595                     if isfield(nam, 'hour')
0596                         M(i, 4) = str2double(nam.hour);
0597                     end
0598                     if isfield(nam, 'minute')
0599                         M(i, 5) = str2double(nam.minute);
0600                     end
0601                     if self.needs_starttimesfile
0602                         if isfield(nam, 'orbitno2') % need both numbers
0603                             % specifies range, so so product should be
0604                             % unique
0605                             M(i, 6) = str2double(nam.orbitno) * ...
0606                                       str2double(nam.orbitno2);
0607                         else
0608                             M(i, 6) = str2double(nam.orbitno);
0609                         end
0610                     else
0611                         M(i, 6) = -1;
0612                     end
0613                 end
0614             end
0615             % all paths
0616             paths = cellfun(@(f) fullfile(datadir, f), {files.name}, 'UniformOutput', false);
0617             % remove lines with zeroes (those are not granules)
0618             nogran = M(:, 1)==0;
0619             M(nogran, :) = [];
0620             paths(nogran) = [];
0621             % sort
0622             [M, I] = sortrows(M);
0623             paths = paths(I);
0624             % add yesterday
0625             if with_yesterday
0626                 yesterday = datevec(datenum(date)-1);
0627                 [M_yesterday, paths_yesterday] = self.find_granules_by_date(yesterday, spec, false);
0628                 if ~isempty(M_yesterday) % maybe today is genesis/big bang/epoch/birth/1970-01-01
0629                     M = [M_yesterday(end, :); M];
0630                     paths = [paths_yesterday(end) paths];
0631                 end
0632             end
0633         end
0634         
0635         function datafile = find_granule_by_datetime(self, datevec, varargin)   
0636             % find_granule_by_datetime Return full path of datafile by date/time
0637             %
0638             % Returns the full path of the datafile corresponding to the input
0639             % arguments. There are two modes
0640             %
0641             %   - If <a href="matlab:help SatDataset/filename">dataset.filename</a> is defined, the path is
0642             %     calculated directly from <a href="matlab:help SatDataset">dataset</a> properties and the arguments
0643             %     passed on. The file may or may not actually exist, so in this case,
0644             %     you can use it to calculate the path when planning to create it.
0645             %
0646             %   - If this is not defined, then <a href="matlab:help SatDataset/re">dataset.re</a> has to
0647             %     be defined. It will search the filesystem for a file matching the
0648             %     regular expression and the indicated datevec/spec for this dataset.
0649             %
0650             % FORMAT
0651             %
0652             %   datafile = dataset.find_granule_by_datetime(date,[ spec[, tol]])
0653             %
0654             % IN
0655             %
0656             %   datevec     vector  Date/time to find datafile for
0657             %   spec        string/ Name of satellite. For datasets belonging to one
0658             %               cellstr satellite, a simple string. For datasets belonging
0659             %                       to two satellites (such as collocations), a cell
0660             %                       array of two strings. For
0661             %                       single-satellite datasets, not needed.
0662             %
0663             % OUT
0664             %
0665             %   datafile    string     Path to the looked-for datafile.
0666             %
0667             % EXAMPLE
0668             %
0669             %   >> path = mhs.find_granule_by_datetime([2010 9 8 8 8], 'noaa18');
0670             
0671             errid = 'atmlab:find_granule_by_datetime';
0672             spec = optargs(varargin, {''});
0673             
0674             % implementation:
0675             % 1. find the directory containing the files from basedir/subdir
0676             % 2. if possible, calculate the filename directly
0677             % 3. otherwise, list all files and match with regexp+tolerance
0678             
0679             datevec(end+1:5) = 0;
0680             
0681             fulldir = self.find_datadir_by_date(datevec, spec);
0682             
0683             if (nargin() <= 3 && ~isequaln(self.filename, nan) && ~isempty(self.filename)) || ~self.tryre % calculate directly
0684                 fn = self.repvars(self.filename, datevec, spec);
0685                 datafile = fullfile(fulldir, fn);
0686             else % will search through all granules with find_granules_by_date
0687                 [granules, paths] = self.find_granules_by_date(datevec, spec, false);
0688                 found = granules(:, 4)==datevec(4) & granules(:, 5)==datevec(5);
0689                 
0690                 nfound = sum(found);
0691                 if iscell(spec) % short-string for error message
0692                     sp = horzcat(spec{:});
0693                 else
0694                     sp = spec;
0695                 end
0696                 
0697                 if nfound==0
0698                     error(errid, 'No datafile found for %s %s [%s]', self.name, sp, num2str(datevec));
0699                 elseif nfound > 1
0700                     error(errid, 'Multiple datefiles found for %s %s', self.name, sp);
0701                 else
0702                     datafile = paths{found};
0703                 end
0704                 
0705             end
0706             
0707         end        
0708         
0709         function datafile = find_granule_by_unixsecs(self, unixsecs, varargin)
0710             % find_granule_by_unixsecs Find granule starting at indicated time
0711             %
0712             % IN
0713             %
0714             %   unixsecs    time in seconds since 1970-01-01T00:00:00Z
0715             %
0716             % ...remaining arguments passed on to <a href="matlab:help SatDataset/find_granule_by_datetime">find_granule_by_datetime</a>
0717 
0718             [year, month, day, hour, minute, second] = unixsecs2date(unixsecs);
0719             datafile = self.find_granule_by_datetime([year month day hour minute second], varargin{:});
0720         end
0721         
0722         function [allgrans, allpaths] = find_granules_for_period(self, date1, date2, spec)            
0723             % find_granules_for_period List all granules for sat/dataset for period
0724             %
0725             % For the period between date1 and date2, list all granules (as vectors
0726             % indicating the starting date/time) available.
0727             %
0728             % FORMAT
0729             %
0730             %   [allgrans, allpaths] = find_granules_for_period(date1, date2, spec)
0731             %
0732             % IN
0733             %
0734             %   date1   datevec     starting date   [y,m,d]
0735             %   date2   datevec     ending date     [y,m,d]
0736             %   spec    various     (optional)
0737             %
0738             % OUT
0739             %
0740             %   allgrans matrix     all granules in daterange
0741             %   allpaths cellstr    all paths to those granules
0742             %
0743             % EXAMPLE
0744             %
0745             %   >> [allgrans, allpaths] = mhs.find_granules_for_period([2010, 9, 5], [2010 9 7], 'noaa18');
0746             
0747             narginchk(3, 4);
0748             if ~exist('spec', 'var')
0749                 spec = [];
0750             end
0751             dates = daterange(date1(1:3), date2(1:3));
0752             ndates = size(dates, 1);
0753             
0754             allgrans = nan*zeros(ndates*15, 6);
0755             allpaths = cell(size(allgrans));
0756             
0757             n = 0;
0758             for i = 1:ndates
0759                 date = dates(i, :);
0760                 [grans, paths] = self.find_granules_by_date(date, spec, false);
0761                 ngrans = size(grans, 1);
0762                 allgrans(n+1:n+ngrans, :) = grans;
0763                 allpaths(n+1:n+ngrans) = paths;
0764                 n = n + ngrans;
0765             end
0766             
0767             to_remove = isnan(allgrans(:, 1));
0768             allgrans(to_remove, :) = [];
0769             allpaths(to_remove) = [];
0770             [allgrans, I] = sortrows(allgrans);
0771             allpaths = allpaths(I);
0772         end
0773         
0774         function [gran, path] = find_granule_covering_instant(self, datevec, varargin)
0775             % For datevec/spec, find granule covering this moment
0776             %
0777             % For a combination of a date-vector and a specification
0778             % (satellite), get the granule that covers this instant.  Note
0779             % that this method does NOT verify that the instant is really
0780             % covered, but rather returns the granule that started most
0781             % recently before the considered moment.
0782             
0783             spec = optargs(varargin, {''});
0784             [grans, paths] = self.find_granules_by_date(datevec(1:3), spec, true);
0785             grans_unisecs = date2unixsecs(grans(:, 1), grans(:, 2), grans(:, 3), grans(:, 4), grans(:, 5));
0786             dv_unisecs = date2unixsecs(datevec(1), datevec(2), datevec(3), datevec(4), datevec(5));
0787             last_i = find(dv_unisecs >= grans_unisecs, 1, 'last');
0788             gran = grans(last_i, :);
0789             path = paths(last_i);
0790             % Not implemented yet.
0791         end
0792         
0793         function data = read_granule(self, date, varargin)
0794             % read_granule Read any granule
0795             %
0796             % High-level function to read a particular granule for a
0797             % satellite/sensor-pair, if one doesn't know what function is suitable to
0798             % use to read it. It also gets rid of ballast (duplicates).
0799             %
0800             % Uses member <a href="matlab:help SatDataset/reader">reader</a> to find out what function to use to
0801             % read this particular data. Uses <a href="matlab:help SatDataset/granule_first_line">self.granule_first_line</a>
0802             % to find out the first scanline to use (e.g. the rest are doubles).
0803             %
0804             % FORMAT
0805             %
0806             %   M = read_granule(date[, spec[, extra[, remdouble[, force[, reload]]]]])
0807             %
0808             % IN
0809             %
0810             %   datevec     vector      Date-vector indicating the starting time.
0811             %   spec        various     (optional) E.g. satellite (noaa18, cloudsat, etc.)
0812             %   extra       cellstr (optional)
0813             %
0814             %       Extra args to reader. Usually, those are extra fields
0815             %       that are to be read. Two of those extra fields are a
0816             %       special-case and taken care of by read_granule, and not
0817             %       passed on to the reader: 'scanline_position' and
0818             %       'scanline_number' are assigned according to the
0819             %       position in the scanline and the scanline number,
0820             %       accordingly.
0821             %
0822             %   remdouble   logical     (optional) Remove duplicates or not? Default true.
0823             %                           One reason to pass 'false' might be
0824             %                           when searching for data according
0825             %                           to line/pos.
0826             %   force       logical     (optional) rather than throwing an error,
0827             %                           return [] in case of error. Default false.
0828             %   reload      logical     (optional) Do not use caching, but
0829             %                           reload in any case.
0830             %
0831             % OUT
0832             %
0833             %   data        struct  Contains at least:
0834             %
0835             %       - lat       Latitude in degrees
0836             %       - lon       Longitude in degrees [-180, 180]
0837             %       - time      Time in seconds since 'epoch'
0838             %       - epoch     Time in seconds since 1970-01-01
0839             %       - version   String or float.
0840             %
0841             %       Further fields should be returned depending on
0842             %       arguments passed to 'extra' and the particular dataset.
0843             %
0844             % EXAMPLE
0845             %
0846             %   >> data = mhs.read_granule([2010 9 8 8 8], 'noaa18', {}, true, false);
0847             %
0848             % $Id: SatDataset.m 8777 2014-02-12 12:45:13Z gerrit $
0849              
0850             % optional arguments
0851             [spec, fields_asked, remdouble, force, reload] = optargs(varargin, {'', {}, true, false, false});
0852             
0853             % checks for earlier error detection
0854             rqre_datatype(date, @isnumeric);
0855             rqre_datatype(spec, {@ischar, @iscellstr});
0856             rqre_datatype(fields_asked, @iscell);
0857             rqre_datatype(remdouble, @islogical);
0858             rqre_datatype(force, @islogical);
0859             rqre_datatype(reload, @islogical);
0860             
0861             % special case: remove 'scanline_position' and
0862             % 'scanline_number' from 'extra'. Those are in reality always
0863             % returned, but readers do not necessarily understand them.
0864             
0865             fields_asked(strcmp(fields_asked, 'scanline_number')) = [];
0866             fields_asked(strcmp(fields_asked, 'scanline_position')) = [];
0867             
0868             % these must always be present in reader return
0869             fields_always = {'time', 'lat', 'lon', 'epoch', 'version'};
0870             
0871             %% generate varname for caching
0872             
0873             vn = genvarname(DataHash([self.name, date, varargin]));
0874             
0875             %% read cached if appropiate
0876             if ~reload && ~isempty(self.cache) && self.cache.has_entry(vn)
0877                 logtext(atmlab('OUT'), 'Reading %s from cache (entry %s)\n', ...
0878                     self.name, vn);
0879                 data = self.cache.get_entry(vn);
0880                 return
0881             end
0882             
0883             %% find datafile and function to read it
0884             datafile = self.find_granule_by_datetime(date, spec);
0885             
0886             %% Split extras in fields in data and pseudo-fields
0887 %            pseudos = fieldnames(self.pseudo_fields);
0888             fields_in_data = {};
0889             fields_in_pseudo = {};
0890             fields_in_deps = {}; % only in dependencies
0891             for i = 1:length(fields_asked)
0892                 field = fields_asked{i};
0893                 if isfield(self.pseudo_fields, field)
0894                     fields_in_pseudo = [fields_in_pseudo field]; %#ok<AGROW>
0895                     deps = self.pseudo_fields.(field).dependencies;
0896                     for j = 1:length(deps)
0897                         dep = deps{j};
0898                         if ~any(strcmp(dep, [fields_always(:)' fields_asked(:)' fields_in_deps(:)']))
0899                             fields_in_deps = [fields_in_deps dep]; %#ok<AGROW>
0900                         end
0901                     end
0902                 else
0903                     fields_in_data = [fields_in_data field]; %#ok<AGROW>
0904                 end
0905             end
0906             
0907             %% read datafile
0908             %logtext(colloc_config('stdout'), '%s(''%s'')\n', ...
0909             %    func2str(reader), datafile);
0910             logtext(atmlab('OUT'), 'Reading %s\n', datafile);
0911             try
0912                 data = self.reader(datafile, [fields_in_data fields_in_deps]);
0913             catch ME
0914                 if force
0915                     switch ME.identifier
0916                         case {'atmlab:invalid_data', 'atmlab:exec_system_cmd:shell'}
0917                             logtext(atmlab('ERR'), ...
0918                                 'Unable to read: %s\n', ME.message);
0919                             return
0920                         otherwise
0921                             ME.rethrow();
0922                     end
0923                 else
0924                     % deciding to create a new exception, so that readers
0925                     % further down than catch all
0926                     ME2 = MException('atmlab:SatDataset:cannotread', ...
0927                         ['Failure while %s %s was reading %s.  Original error had ' ...
0928                          'id ''%s'', message ''%s''.  See ''cause'' for details.'], ...
0929                          class(self), self.name, datafile, ME.identifier, ME.message);
0930                     ME2 = ME2.addCause(ME);
0931                     ME2.throw();
0932                 end
0933             end
0934             
0935             %% do further processing
0936             
0937             data = self.reader_processor(self, data, fields_in_pseudo);
0938             
0939             %% set version from info if not already set
0940             
0941             if ~isfield(data, 'version')
0942                 info = self.find_info_from_granule(datafile);
0943                 if isfield(info, 'version')
0944                     data.version = info.version;
0945                 end
0946             end
0947             
0948             %% check/convert datafile contents
0949             
0950             data = rmfield(data, fields_in_deps);
0951             
0952             fields_expected = [fields_always fields_in_data fields_in_pseudo];
0953             ffound = isfield(data, fields_expected);
0954             if ~all(ffound);
0955                 error(['atmlab:' mfilename ':missingfield'], ...
0956                     ['After reading with %s and processing with %s, ' ...
0957                     'the following fields were expected but not found: %s'], ...
0958                     func2str(self.reader), func2str(self.reader_processor), ...
0959                     strjoin(fields_expected(~ffound), ', '));
0960             end
0961             
0962             if isinteger(data.time)
0963                 % later functions will suffocate on non-double time
0964                 data.time = double(data.time);
0965             end
0966             nlines = size(data.time, 1);
0967             npos = size(data.lat, 2);
0968             
0969             % add scanline_number and scanline_position (before removing
0970             % duplicates!)
0971             
0972             data.scanline_number = uint32(repmat((1:nlines).', [1 npos]));
0973             data.scanline_position = uint16(repmat(1:npos, [nlines 1]));
0974             
0975             if ~isfield(data, 'version') % try from filename
0976                 fileinfo = self.find_info_from_granule(datafile);
0977                 if isfield(fileinfo, 'version')
0978                     data.version = fileinfo.version;
0979                 end
0980             end
0981             
0982             %% remove duplicates
0983             if remdouble
0984                 % read ballast
0985                 firstline = self.granule_first_line(date(1:5), spec);
0986                 if firstline > 0 && firstline < intmax(class(firstline))
0987                     % get rid of ballast in all fields
0988                     for f = fieldnames(data)'
0989                         fn = f{1};
0990                         if isnumeric(data.(fn)) && size(data.(fn), 1) == nlines
0991                             % magic in next lines explained at
0992                             % http://www.mathworks.de/matlabcentral/newsreader/view_thread/290890
0993                             sz = size(data.(fn));
0994                             data.(fn) = reshape(data.(fn)(firstline:nlines, :), [nlines-firstline+1, sz(2:end)]);
0995                         end
0996                     end
0997                 end
0998             end
0999             
1000             %% cache results
1001             if ~isempty(self.cache)
1002                 self.cache.set_entry(vn, data);
1003             end
1004         end
1005                 
1006         function first = granule_first_line(self, d, varargin)
1007             % granule_first_line Returns first scanline not present in previous granule
1008             %
1009             % For a certain granule, return the number of the first scanline that
1010             % is not in the previous scanline. This m-file uses a previously created
1011             % database (a hashtable). This hash-table is cached between subsequent
1012             % calls of the function.
1013             %
1014             % There should exist an entry for each satellite/sensor granule. If it's
1015             % not found, an error is raised, and there is probably a bug somewhere. If
1016             % the satellite/sensor granule exists, but there is no (unique) previous
1017             % granule, line -2 is returned.
1018             %
1019             % FORMAT
1020             %
1021             %   first = granule_first_line(datevec[, spec[, reload]])
1022             %
1023             % IN
1024             %
1025             %   d           various     Starting date/time for granule,
1026             %                           datevec or unixsecs
1027             %   spec        various     (optional) Might be indicating satellite(s)
1028             %   reload      logical     (optional) Reload scanline data (i.e. not
1029             %                           cached). Defaults to false.
1030             %   force       logical     (optional) If nothing found, don't
1031             %                           throw an error, but return empty.
1032             %
1033             % OUT
1034             %
1035             %   first       number      First scanline not in previous granule.
1036             %                           Special values: -1 (no data found), -2 (no
1037             %                           previous granule found)
1038             %
1039             % EXAMPLE
1040             %
1041             %   >> first = mhs.granule_first_line([2010 9 8 8 8], 'noaa18', false);
1042             
1043 %             if isequal(self.firstlines, []) % first run
1044 %                 self.firstlines = containers.Map(
1045 %                 S = struct;
1046 %             end
1047             
1048             [spec, reload, force] = optargs(varargin, {'', false, false});
1049             
1050             assert(ischar(spec), ['atmlab:' mfilename], ...
1051                 'function works only for single-satellite datasets');
1052             if isempty(spec)
1053                 sat = self.satname;
1054             else
1055                 sat = spec;
1056             end
1057             
1058             if isequal(self.firstlines, []) || ~isfield(self.firstlines, sat) || reload % load first
1059                 scanfile = datasets_config('firstline_data');
1060                 scanfile = strrep(scanfile, '$SAT', sat);
1061                 scanfile = strrep(scanfile, '$SENSOR', self.name);
1062                 
1063                 logtext(atmlab('OUT'), ...
1064                     ['Reading %s. ' ...
1065                      '(If this fails, run self.find_granule_first_line ' ...
1066                      '(SatDataset.find_granule_first_line)).\n'], scanfile);
1067                 try
1068                     t = load(scanfile);
1069                 catch ME
1070                     switch (ME.identifier)
1071                         case {'MATLAB:load:couldNotReadFile'}
1072                             newME = MException(['atmlab:' mfilename ':failed'], ...
1073                                 ['Unable to locate duplicates for %s. Failed to read file: %s.\n', ...
1074                                  'You might want to run the ''find_granule_first_line'' method?'], ...
1075                                 self.name, scanfile);
1076                             newME.addCause(ME);
1077                             newME.throw();
1078                         otherwise
1079                             ME.rethrow();
1080                     end
1081                 end
1082                                 
1083                 self.firstlines.(sat) = t.first;
1084             end
1085             
1086             if isscalar(d)
1087                 unisecs = uint32(d);
1088             else
1089                 dv = num2cell(d);
1090                 unisecs = uint32(self.get_starttime(d));
1091             end
1092             
1093 
1094             if self.firstlines.(sat).isKey(unisecs)
1095                 first = self.firstlines.(sat)(unisecs);
1096             elseif force
1097                 first = [];
1098             else
1099                 error(['atmlab:' mfilename ':missing_firstline'], ['no data found for %s @ %d. ', ...
1100                     'You may want to run self.find_granule_first_line(...) ', ...
1101                     'for a sufficiently long period.'], ...
1102                     self.name, unisecs);
1103             end
1104             
1105         end
1106         
1107         function find_granule_first_line(self, startdate, enddate, varargin)
1108             % find_granule_first_line Create hashtable with scanline overlaps
1109             %
1110             % Creates a map (such as used by granule_first_line) that maps for
1111             % each granule the first scanline not occuring in the previous
1112             % scanline.
1113             %
1114             % The resulting hashtable is written to a file according to
1115             % datasets_config('firstline_data'), which is also where
1116             % granule_first_line is looking for it.
1117             %
1118             % FORMAT
1119             %
1120             %   cd.find_granule_first_line(startdate, enddate, spec)
1121             %
1122             % IN
1123             %
1124             %   startdate   datevec     start here
1125             %   enddate     datevec     end here
1126             %   spec        various     specification
1127             %
1128             % OUT
1129             %
1130             %   none, but writes a file
1131             %
1132             % EXAMPLE
1133             %
1134             %   >> mhs.find_granule_first_line([2008 1 1], [2008 12 31], 'noaa18');
1135             
1136             spec = optargs(varargin, {''});            
1137             
1138             if isempty(spec)
1139                 sat = self.satname;
1140             else
1141                 sat = spec;    
1142             end
1143 
1144             scanfile = datasets_config('firstline_data');
1145             scanfile = strrep(scanfile, '$SAT', sat);
1146 
1147             scanfile = strrep(scanfile, '$SENSOR', self.name);
1148             
1149             logtext(atmlab('OUT'), 'Locating granules\n');
1150             allgrans = self.find_granules_for_period(startdate, enddate, spec);
1151             ngrans = size(allgrans, 1);
1152             logtext(atmlab('OUT'), 'Found %d granules\n', ngrans);
1153             if ngrans == 0
1154                 return
1155             end
1156             
1157             if exist(scanfile, 'file')
1158                 tm = load(scanfile);
1159                 self.firstlines.(sat) = tm.first;
1160             else
1161                 self.firstlines.(sat) = containers.Map('KeyType', 'uint32', 'ValueType', 'int32');
1162             end
1163             
1164             uni_first = self.get_starttime(allgrans(1, :));
1165             if ~self.firstlines.(sat).isKey(uni_first)
1166                 logtext(atmlab('OUT'), 'Setting very first to flag: %d:%d\n', ...
1167                     uni_first, self.FL_INITIAL);
1168                 self.firstlines.(sat)(uni_first) = self.FL_INITIAL;
1169             end
1170             next = 0;
1171 
1172             for i = 1:ngrans-1
1173                 uni = self.get_starttime(allgrans(i+1, :));
1174                 uni = uint32(uni);
1175                                 
1176                 [ddv{1:5}] = unixsecs2date(double(uni));
1177                 logtext(atmlab('OUT'), 'granule %d/%d: %d-%02d-%02d %02d:%02d\n', i, ngrans-1, ...
1178                     ddv{1}, ddv{2}, ddv{3}, ddv{4}, ddv{5});
1179 
1180                 if self.firstlines.(sat).isKey(uni) && ...
1181                    ~any(self.firstlines.(sat)(uni) == ...
1182                             cellfun(@int64, {self.FL_INITIAL self.FL_NOTFOUND self.FL_ALLDUPL self.FL_NONEXT self.FL_NOCUR}))
1183                     logtext(atmlab('OUT'), 'Already exists (%d:%d)\n', uint32(uni), self.firstlines.(sat)(uni));
1184                     continue
1185                 end
1186                 try
1187                     couldreadcur = false;
1188                     if isequal(next, 0)
1189                         cur = self.read_granule(allgrans(i, :), spec, {}, false, false);
1190                     else
1191                         cur = next;
1192                     end
1193                     couldreadcur = true;
1194                     couldreadnext = false;
1195                     next = self.read_granule(allgrans(i+1, :), spec, {}, false, false);
1196                     couldreadnext = true;
1197                 catch ME
1198                     switch ME.identifier
1199                         case {'atmlab:find_datafile_by_date', 'atmlab:atovs_get_l1c:zamsu2l1c',...
1200                                 'atmlab:invalid_data','atmlab:find_granule_by_datetime',...
1201                                 'MATLAB:imagesci:hdfinfo:invalidFile', 'MATLAB:imagesci:validate:fileOpen', 'atmlab:exec_system_cmd:shell', 'atmlab:SatDataset:cannotread'}
1202                             logtext(atmlab('ERR'), 'Problem: %s\n', ME.message);
1203                         otherwise
1204                             ME.rethrow();
1205                     end
1206                 end
1207                 %uni = date2unixsecs(allgrans(i+1, 1), allgrans(i+1, 2), allgrans(i+1, 3), allgrans(i+1, 4), allgrans(i+1, 5));
1208                 if couldreadcur && couldreadnext
1209                     t_cur = cur.epoch + cur.time;
1210                     t_next = next.epoch + next.time;
1211                     %[t_cur, t_next] = unify_time_axis(cur.time, next.time);
1212                     if isempty(t_next) || isempty(t_cur)
1213                         firstline = 0;
1214                     else
1215                         % Replaced former by latter because modis L2 can
1216                         % have nans for time/lat/lon due to the flagging in
1217                         % the reading routine
1218                         %firstline = find(t_next > t_cur(end), 1, 'first');
1219                         firstline = find(t_next > t_cur(find(~isnan(t_cur), 1, 'last')), 1, 'first');
1220                     end
1221                     if ~isempty(firstline)
1222                         logtext(atmlab('OUT'), 'First line: %d\n', firstline);
1223                         self.firstlines.(sat)(uni) = firstline;
1224                     else
1225                         logtext(atmlab('OUT'), 'No first line, setting to flag\n');
1226                         self.firstlines.(sat)(uni) = self.FL_ALLDUPL;
1227                     end
1228                 elseif couldreadcur
1229                     logtext(atmlab('OUT'), 'Could not read next, flagging \n');
1230                     self.firstlines.(sat)(uni) = self.FL_NONEXT;
1231                 else
1232                     logtext(atmlab('OUT'), 'Could not read current. flagging\n');
1233                     self.firstlines.(sat)(uni) = self.FL_NOCUR;
1234                 end
1235                 
1236                 if mod(i, 10)==0 % store
1237                     self.store_firstline(scanfile, sat);
1238                 end
1239             end
1240             
1241             self.store_firstline(scanfile, sat);
1242                         
1243         end
1244       
1245         %{
1246         function outer = find_extreme_granule(self, spec, mode, varargin)
1247             % find the earliest or latest granule for this dataset
1248             %
1249             % Must always be between Unix Genesis (1970-01-01) and current
1250             % date. Might provide incorrect results if not every day has
1251             % collocations. Use with caution. A safe but slow alternative
1252             % is to run <a href="matlab:help SatDataset/find_granules_for_period">.find_granules_for_period</a> for a very long date interval.
1253             %
1254             % Second argument is either 'first' or 'last'.
1255             %
1256             % EXAMPLE
1257             %
1258             %   gran = d.find_extreme_granule('noaa18', 'first')
1259             %
1260             % Uses bisection. Works poorly, misses granules if total period
1261             % is short.
1262             
1263             switch mode
1264                 case 'first'
1265                     a = 1;
1266                     b = 2;
1267                     extrema = {[1970 1 1], datevec(now)};
1268                 case 'last'
1269                     a = 2;
1270                     b = 1;
1271                     extrema = {[self.find_extreme_granule(spec, 'first') 0], datevec(now)};
1272                 otherwise
1273                     error(['atmlab:' mfilename ':InvalidUse', ...
1274                         '''mode'' must be ''first'' or ''last'', not %s'], ...
1275                         mode);
1276             end
1277             MAXSTEPS = 100;
1278             %early = [1970 1 1];
1279             %late = datevec(now);
1280             mid = [0 0 0];
1281             
1282             for i = 1:MAXSTEPS
1283                 oldmid = mid;
1284                 mid = datevec((datenum(extrema{1})+datenum(extrema{2}))/2);
1285                 % ceil to just next date
1286                 mid = mid(1:3);
1287                 if all(oldmid == mid)
1288                     outer = extrema{b};
1289                     break
1290                 end
1291                 if any(self.find_granules_by_date(mid, spec, false))
1292                     extrema{b} = mid;
1293                 else
1294                     extrema{a} = mid;
1295                 end
1296                 
1297             end
1298             
1299             grans = self.find_granules_by_date(outer, spec, false);
1300             if isempty(grans)
1301                 if iscell(spec)
1302                     tt = [spec{:}];
1303                 else
1304                     tt = spec;
1305                 end
1306                 error(['atmlab:' mfilename ':nogranule'], ...
1307                     'No granules found for %s %s', self.name, tt);
1308             end
1309             
1310             switch mode
1311                 case 'first'
1312                     outer = grans(1, :);
1313                 case 'last'
1314                     outer = grans(end, :);
1315             end
1316         end
1317 %}
1318         
1319         function start_unixsecs = get_starttime(self, datevec)
1320             % get starttime for granule/granules
1321             %
1322             % This is either taken directly from the datevec, or it is
1323             % taken from a database.
1324             %
1325             % based on datevec, last elem is granule-number
1326             %
1327             % FIXME DOC
1328             
1329             if ~isvector(datevec) % pseudo-broadcast with recursion
1330                 start_unixsecs = zeros(size(datevec, 1), 1);
1331                 for i = 1:size(datevec, 1)
1332                     start_unixsecs(i) = self.get_starttime(datevec(i, :));
1333                 end
1334             else
1335                 
1336                 if self.needs_starttimesfile
1337                     
1338                     gran_no = datevec(6);
1339                     if ~self.starttimes.isKey(gran_no)
1340                         self.set_starttime(datevec);
1341                     end
1342                     start_unixsecs = self.starttimes(gran_no);
1343                 else
1344                     start_datevec = datevec(1:5);
1345                     X = num2cell(start_datevec);
1346                     start_unixsecs = date2unixsecs(X{:});
1347                 end
1348                 
1349                 start_unixsecs = round(start_unixsecs);
1350             end
1351         end
1352 
1353         function set_starttime(self, datevec)
1354             % get starttime from db, if not in file
1355             %
1356             % needs datevec including granule number
1357 
1358             gran_no = datevec(6);
1359             fullpath = self.find_granule_by_datetime(datevec);
1360             logtext(atmlab('OUT'), 'Reading %s to get start-time\n', ...
1361                 fullpath);
1362  %           data = read_saphir_l1(fullpath, 'ScanTimestart');
1363             % Don't remove duplicates, as I need to run this function
1364             % before I can even begin to /detect/ duplicates!
1365             success = false;
1366             try
1367                 S = self.read_granule(datevec, [], {}, false);
1368                 success = true;
1369             catch ME
1370                 switch ME.identifier
1371                     case 'atmlab:rqre_in_range:invalid'
1372                         logtext(atmlab('ERR'), ...
1373                             'Cannot determine start time, setting to 0: %s\n', ...
1374                             ME.message);
1375                     otherwise
1376                         ME.rethrow();
1377                 end
1378             end
1379 %             X = textscan(data.data.ScanTimestart(1).Data, '%4d%2d%2d %2d%2d%2d%6d');
1380 %            [year, month, day, hour, minute, second, ~] = deal(X{:});
1381             if success
1382                 self.starttimes(gran_no) = S.epoch;
1383                 tm = self.starttimes; %#ok<NASGU>
1384                 save(self.starttimes_fullpath, 'tm');
1385 
1386             end
1387 
1388             % since storing a small hashtable is so much quicker than
1389             % reading a large granule-file, doing this every time is a
1390             % small price to pay for easier programming
1391             % might add a conditional here?
1392         end
1393 
1394         function [gridmean, lat, lon] = level3(self, date1, date2, spec, fields, gridsize, varargin)
1395             % Calculate level3 (gridded mean) for period
1396             %
1397             %
1398             % IN
1399             %
1400             % date1
1401             % date2
1402             % spec
1403             % fields
1404             % gridsize
1405             % (optional) limiters
1406             %   Cell array.  Each element is either a function handle
1407             %   (passed directly to binned_statistics, or a two-element
1408             %   cell array with {{fields}, limiter} where gridded
1409             %   data will be limited by calling limiter on {fields}.
1410             %   Can be used e.g. to grid only day-time data with the help
1411             %   of sun_angles.  In the latter case the limiter must take a
1412             %   struct with {fields}.
1413             % (optional) struct with options to binning_fast
1414             %
1415             % Returns:
1416             %
1417             % - gridmean: 3-D matrix, [nlat nlon nfield]
1418             %
1419             % FIXME DOC
1420             
1421             [limiters, data] = optargs(varargin, {{}, struct()});
1422             %data.gridsize = gridsize;
1423             allgrans = self.find_granules_for_period(date1, date2, spec);
1424             allstats = {};
1425             % sort limiters
1426             if ~iscell(limiters)
1427                 limiters = {limiters};
1428             end
1429             limit_binnedstats = {};
1430             limit_other = {};
1431             also_fields = {};
1432             j = 1;
1433             k = 1;
1434             for i = 1:length(limiters)
1435                 limiter = limiters{i};
1436                 if isfunction_handle(limiter)
1437                     limit_binnedstats{j} = limiter;
1438                     j = j + 1;
1439                 elseif iscell(limiter)
1440                     limit_other{k} = limiter;
1441                     k = k + 1;
1442                     also_fields = union(also_fields, setdiff(limiter{1}, {'lat', 'lon', 'time', 'epoch'}));
1443                 end
1444             end
1445             i = 0;
1446             while i < size(allgrans, 1)
1447                 
1448                 data.lat = zeros(1e6, 1, 'single');
1449                 data.lon = zeros(1e6, 1, 'single');
1450                 data.data = zeros(1e6, length(fields), 'single');
1451                 
1452                 N = 1;
1453                 X = whos('data');
1454                 while X.bytes < 2e8 && i < size(allgrans, 1)
1455                     i = i + 1;
1456                     try
1457                         grandata = self.read_granule(allgrans(i, :), spec, union(fields, also_fields));
1458                     catch ME
1459                         switch ME.identifier
1460                             case 'atmlab:SatDataset:cannotread'
1461                                 logtext(atmlab('ERR'), ...
1462                                     'Cannot read granule: %s', ...
1463                                     ME.identifier);
1464                                 continue
1465                             otherwise
1466                                 ME.rethrow();
1467                         end
1468                     end
1469                     % apply limiters
1470                     mask = true(size(grandata.lat));
1471                     for k = 1:length(limit_other)
1472                         lim = limit_other{k};
1473                         mask = mask & lim{2}(getfields(grandata, lim{1}{:}));
1474                     end
1475                     grandata.lat = grandata.lat(mask);
1476                     grandata.lon = grandata.lon(mask);
1477                     for k = 1:length(fields)
1478                         field = fields{k};
1479                         grandata.(field) = grandata.(field)(mask);                    
1480                     end
1481                         
1482                     nnew = size(grandata.lat, 1);
1483                     new = N:(N+nnew-1);
1484                     if nnew==0 % no data
1485                         continue
1486                     end
1487                     if new(end) > size(data.lat, 1)
1488                         % double allocation
1489                         data.lat = [data.lat; zeros(size(data.lat, 1), 1, 'single')];
1490                         data.lon = [data.lon; zeros(size(data.lon, 1), 1, 'single')];
1491                         data.data = [data.data; zeros(size(data.data, 1), length(fields), 'single')];
1492                     end
1493                     data.lat(new) = grandata.lat;
1494                     data.lon(new) = grandata.lon;
1495                                     
1496                     C = cellfun(@(f) single(grandata.(f)), fields, 'UniformOutput', false);
1497                     data.data(new, :) = horzcat(C{:});
1498                     N = N + nnew;
1499 
1500                     X = whos('data');
1501                 end
1502                 data.lat = data.lat(1:N-1);
1503                 data.lon = data.lon(1:N-1);
1504                 data.data = data.data(1:N-1);
1505                 
1506 %                 data.lat = grandata.lat;
1507 %                 data.lon = grandata.lon;
1508 %                 C = cellfun(@(f) double(grandata.(f)), fields, 'UniformOutput', false);
1509 %                 data.data = horzcat(C{:});
1510                 logtext(atmlab('OUT'), 'Binning\n');
1511                 %[binned, lat, lon] = binning_fast(data);
1512                 indces = bin_nd({data.lat, data.lon}, {-90:gridsize:90, -180:gridsize:180});
1513                 binned = cellfun(@(ii) data.data(ii), indces, 'UniformOutput', false);
1514                 stats = binned_statistics(binned, {@(x)(size(x,1)), @(x)(sum(x,1))}, limit_binnedstats);
1515                 logtext(atmlab('OUT'), 'Calculating statistics\n');
1516                 stats.statistic1 = cell2mat(stats.statistic1);
1517                 % cell2mat on multi-dim will flatten 2nd dimension,
1518                 % unflatten and put this dimension at the end again
1519                 stats.statistic2 = permute(reshape(cell2mat(stats.statistic2), [size(stats.statistic2, 1) size(stats.statistic2{1, 1}, 2), size(stats.statistic2, 2)]), [1 3 2]);
1520                 allstats = [allstats {stats}];
1521             end
1522             logtext(atmlab('OUT'), 'Finalising\n');
1523             % calculate total count
1524             allcount = cellfun(@(x) x.statistic1, allstats, 'UniformOutput', false);
1525             allcount = cat(3, allcount{:});
1526             totalcount = sum(allcount, 3);
1527             
1528             % calculate total sum
1529             %allsums = cellfun(@(x) bsxfun(@times, x.statistic1, x.statistic2), allstats, 'UniformOutput', false);
1530             allsums = cellfun(@(x) x.statistic2, allstats, 'UniformOutput', false);
1531             allsums = cat(4, allsums{:});
1532             totalsum = sum(allsums, 4);
1533             
1534             % calculate overall mean
1535             gridmean = bsxfun(@ldivide, totalcount, totalsum);
1536         end
1537                 
1538         %% getters and setters for properties
1539         
1540         function path = get.starttimes_fullpath(self)
1541             path = fullfile(self.basedir, self.starttimesfile);
1542         end
1543         
1544         function times = get.starttimes(self)
1545             if ~self.needs_starttimesfile
1546                 times = [];
1547             else
1548                 if isequaln(self.cachedstarttimes, []) % not initialised
1549                     try
1550                         self.cachedstarttimes = loadvar(self.starttimes_fullpath, 'tm');
1551                     catch ME
1552                         switch ME.identifier
1553                             case {'MATLAB:load:couldNotReadFile', 'MATLAB:load:notBinaryFile'}
1554                                 logtext(atmlab('OUT'), ...
1555                                     ['Unable to read starttimes-file at %s ' ...
1556                                     '(%s). Initialising new.\n'], ...
1557                                     self.starttimes_fullpath, ME.message);
1558                                 self.cachedstarttimes = containers.Map('KeyType', 'uint32', 'ValueType', 'int64');
1559                         end
1560                     end
1561                 end
1562                 
1563                 times = self.cachedstarttimes;
1564             end
1565         end
1566 
1567         function set.starttimes(self, val)
1568             if ~isa(val, class(containers.Map))
1569                 error(['atmlab:' mfilename ':invalid'], ...
1570                     'starttimes should be a containers.Map, found %s', ...
1571                     class(val));
1572             end
1573             logtext(atmlab('OUT'), 'Writing %s\n', self.starttimes_fullpath);
1574             save(self.starttimes_fullpath, 'val');
1575         end
1576         
1577         function set.name(self, value)
1578             % setter for 'name' property.
1579             if ~isempty(self.name)
1580                 % deregister old name
1581                 datasets('delete', self);
1582             end
1583             self.name = value;
1584             datasets(self);
1585         end
1586 
1587         %% overloading stuff
1588         
1589         function sobj = saveobj(self)
1590 %             sobj = saveobj@handle(self);
1591              warning(['atmlab:' mfilename ':nostoring'], ...
1592                 ['You tried to store %s %s, but serialisation is not implemented. ' ...
1593                  'Attempts to load will not yield a valid object.  Sorry.'], ...
1594                  class(self), self.name);
1595              sobj = [];
1596         end
1597     end
1598     
1599     methods (Access = protected)
1600         function pos2re(~)
1601             error(['atmlab:' mfilename ':cannotconvert'], ...
1602                 ['The regular expression self.re was not defined, so ' ...
1603                  'I tried to convert it automagically from location '...
1604                  'information.  However, I don''t know how to do this. ' ...
1605                  'I give up.  Please help me: either implement ' ...
1606                  'a pos2re() method, or define a re member.']);
1607         end
1608         
1609         function nm = calculate_name(self)
1610             % calculate_name Return automated name if name is not given
1611             %
1612             % This function is called by the constructor if no name is
1613             % given. You may wish to override it in a subclass.
1614             
1615             existing_names = fieldnames(datasets);
1616             while true
1617                 nm = sprintf([class(self) '_%d_%d'], uint64(date2unixsecs()),round(rand(1, 1)*10000));
1618                 if ~any(strcmp(nm, existing_names))
1619                     break
1620                 end
1621             end
1622         end
1623         
1624         function add_collocated_dataset(ds, cd)
1625             % Adds a collocated dataset. INTERNAL USE ONLY!
1626             %
1627             % This is called by the CollocatedDataset constructor.
1628             ds.collocated_datasets = [ds.collocated_datasets cd];
1629         end
1630         
1631         function s_out = repvars(self, s, datevec, spec)
1632             % repvars Replace 'magic' variables
1633             %
1634             % Replace the 'magic' variables:
1635             %
1636             % $YEAR4
1637             % $YEAR2
1638             % $MONTH
1639             % $DAY
1640             % $DOY
1641             % $SAT  or  $SAT1 and $SAT2
1642             % $HOUR
1643             % $MINUTE
1644             %
1645             % For more replacements, overload this function.
1646             %
1647             % FORMAT
1648             %
1649             %   s_out = self.repvars(s, datevec, satname)
1650             %
1651             % IN
1652             %
1653             %   s       string  where magic is replaced
1654             %   datevec vector  date-vector [year month day]
1655             %   spec    string/ name of satellite/satellites
1656             %           cellstr
1657             %
1658             % OUT
1659             %
1660             %   s_out   string  with magic replaced
1661             
1662             year = num2str(datevec(1), '%04d');
1663             month = num2str(datevec(2), '%02d');
1664             day = num2str(datevec(3), '%02d');
1665             if length(datevec)>3
1666                 hour = num2str(datevec(4), '%02d');
1667                 minute = num2str(datevec(5), '%02d');
1668             else
1669                 hour = '0';
1670                 minute = '0';
1671             end
1672             year02 = year(3:4);
1673             doy = num2str(round(dayofyear(datevec(1), datevec(2), datevec(3))), '%03d');
1674             
1675             to_replace = {'$YEAR4', year, '$MONTH', month, '$DAY', day, ...
1676                 '$YEAR2', year02, '$DOY', doy, '$HOUR', hour, '$MINUTE', minute};
1677             
1678             if iscellstr(spec)
1679                 to_replace = [to_replace {'$SAT1', spec{1}, '$SAT2', spec{2}}];
1680             else
1681                 if strfind(s, '$SAT1')
1682                     warning('atmlab:strrep_variables', ...
1683                         ['replacing $SAT, but having $SAT1; are you sure you don''t ' ...
1684                         'want to pass TWO satellites?']);
1685                 end
1686                 to_replace = [to_replace {'$SAT', spec}];
1687             end
1688             
1689             s_out = strrep_multi(s, to_replace{:});
1690             
1691         end
1692         
1693         function matches = infofit(self, is, datevec, spec)
1694             yearstr = num2str(datevec(1), '%04d'); % for comparing easily with 2digit years
1695             
1696             if (isfield(is, 'year02') && ~isempty(is.year02) && ~strcmp(is.year02, yearstr(3:4))) || ...
1697                     (isfield(is, 'year04') && ~isempty(is.year04) && str2double(is.year04)~=datevec(1)) || ...
1698                     (isfield(is, 'year') && ~isempty(is.year) && str2double(is.year)~=datevec(1)) || ...
1699                     (isfield(is, 'month') && ~isempty(is.month) && str2double(is.month)~=datevec(2)) || ...
1700                     (isfield(is, 'day') && ~isempty(is.day) && str2double(is.day)~=datevec(3)) || ...
1701                     (isfield(is, 'doy') && ~isempty(is.doy) && str2double(is.doy)~=dayofyear(datevec(1), datevec(2), datevec(3)) || ...
1702                     (isfield(is, 'satname') && ~isempty(is.satname) && ~isequal(is.satname, spec)));
1703                 matches = false;
1704             else
1705                 matches = true;
1706             end
1707         end
1708 
1709     end
1710     
1711     methods (Access = private)
1712         function store_firstline(self, scanfile, sat)
1713             logtext(atmlab('OUT'), ...
1714                 'Writing %s\n', scanfile);
1715             first = self.firstlines.(sat); %#ok<NASGU>
1716             save(scanfile, 'first');
1717         end
1718     end
1719 end

Generated on Mon 15-Sep-2014 13:31:28 by m2html © 2005