
AssociatedDataset

SOURCE CODE

0001 classdef AssociatedDataset < HomemadeDataset
0002     % Defines data associated with CollocatedDataset
0003     %
0004     % The data stored with a <a href="matlab:help CollocatedDataset">CollocatedDataset</a> is only the
0005     % very core needed to retrieve collocations. Any other data needs to be
0006     % stored by using one or more AssociatedDataset objects.
0007     %
0008     % Classes derived from AssociatedDataset describe datasets with data
0009     % associated with <a href="matlab:help CollocatedDataset">CollocatedDataset</a>s.
0010     % The class AssociatedDataset itself is an abstract class, and can
0011     % therefore not be instantiated directly. However, it may be subclassed
0012     % for the implementation of an arbitrary AssociatedDataset.
0013     % Full implementations of AssociatedDataset that come with atmlab are
0014     % <a href="matlab:help FieldCopier">FieldCopier</a> and <a
0015     % href="matlab:help Collapser">Collapser</a>.
0016     %
0017     % The properties and methods are documented here, because subclasses
0018     % provide mere implementations; the signature does not change.
0019     %
0020     % AssociatedDataset Properties:
0021     %
0022     %  abstract properties:
0023     %
0024     %   members -       Describes how data are stored in NetCDF
0025     %   parent -        SatDataset that this AssociatedDataset belongs to
0026     %   dependencies -  Other AssociatedDataset that must be processed first
0027     %
0028     %  (remaining properties from <a href="matlab:help HomemadeDataset">HomemadeDataset</a>)
0029     %
0030     % AssociatedDataset Methods: (partial overview)
0031     %
0032     %  Constructor:
0033     %
0034     %   AssociatedDataset -     Create AssociatedDataset object
0035     %
0036     %  Abstract methods:
0037     %
0038     %   primary_arguments -     Get args for primary reader
0039     %   secondary_arguments -   Get args for secondary reader
0040     %   needs_primary_data -    Need data from primary?
0041     %   needs_secondary_data -  Need data from secondary?
0042     %   process_granule -       Process a single granule
0043     %
0044     %  Implemented methods:
0045     %
0046     %   process_delayed -       Process data when core read from disk
0047     %   merge -                 Combine core and associated
0048     %   get_mergefields -       Return fields needed to merge
0049     %   concatenate -           Vertically concatenate data
0050     %
0051     %  (remaining methods from <a href="matlab:help HomemadeDataset">HomemadeDataset</a>)
0052     %
0053     %  See also: FieldCopier (implementation), Collapser (implementation),
0054     %            HomemadeDataset (superclass), CollocatedDataset, SatDataset.
0055     %
0056     % $Id: AssociatedDataset.m 8516 2013-06-26 21:33:48Z gerrit $
0057     
0058     % need to know:
0059     % - additional arguments to reader primary
0060     % - additional arguments to reader secondary
0061             
0062     properties (Transient, Abstract, SetAccess = protected)
0063         % Describes how data are stored in NetCDF files
0064         %
0065         % This property gives a full description of how data are stored in
0066         % NetCDF files. The value of the property may be different for
0067         % different instances (objects) of any subclass, so the property
0068         % has no predefined value (unlike <a href="matlab:help CollocatedDataset/members">CollocatedDataset.members</a>).
0069         %members;
0070         % Parent dataset that this AssociatedDataset is associated with
0071         %
0072         % Pointing to a <a href="matlab:help CollocatedDataset">CollocatedDataset</a>,
0073         % this property describes what the AssociatedDataset relates to.
0074         parent;
0075         % Other AssociatedDatasets that need to be considered first.
0076         %
0077         % Contains a collection of other AssociatedDataset objects that
0078         % need to be considered first. For example, to process a
0079         % <a href="matlab:help Collapser">Collapser</a>, one first needs a <a href="matlab:help FieldCopier">FieldCopier</a>.
0080         %
0081         % See also method fields_needed_for_dependency.
0082         dependencies;
0083     end
0084     
0085     properties (Transient)
0086         % priority is dynamically set when sorting for dependencies
0087         priority = 0;
0088     end
0089     
0090     methods
0091         %% constructor
0092         
0093         function self = AssociatedDataset(varargin)
0094             % constructor for AssociatedDataset
0095             %
0096             % Note that AssociatedDataset can not be constructed, but some
0097             % sub-classes may be. In the examples below the class is called
0098             % AssociatedDataset, but replace this by whatever class you are
0099             % using to construct your object.
0100             %
0101             % FORMAT
0102             %
0103             %   ad = AssociatedDataset(cd, ...)
0104             %
0105             % IN
0106             %
0107             %   cd  CollocatedDataset
0108             %
0109             %       This argument is only present for dynamic subclasses,
0110             %       such as <a href="matlab:help FieldCopier">FieldCopier</a>.
0111             %       Contains <a href="matlab:help CollocatedDataset">CollocatedDataset</a> to which this
0112             %       AssociatedDataset belongs.
0113             %
0114             %   dp  cell array
0115             %
0116             %       This argument is only present for dynamic subclasses.
0117             %       Cell array of other AssociatedDataset objects on which
0118             %       this AssociatedDataset depends, e.g., that have to be
0119             %       calculated first.
0120             %
0121             % Remaining arguments passed on to parent. For static
0122             % subclasses, all arguments are directly passed on to the
0123             % parent.
0124             %
0125             % OUT
0126             %
0127             %   AssociatedDataset-derived object.
0128             if nargin>0 && isa(varargin{1}, 'CollocatedDataset') % dynamic style
0129                 style = 'dynamic';
0130                 cd = varargin{1};
0131                 dp = varargin{2};
0132                 [subargs{1:nargin-2}] = varargin{3:end};
0133             else
0134                 style = 'static';
0135                 [subargs{1:nargin}] = varargin{:};
0136             end
0137             self = self@HomemadeDataset(subargs{:}); % call parent constructor
0138             if strcmp(style, 'dynamic')
0139                 self.parent = cd;
0140                 self.dependencies = dp;
0141             end
0142             if self.visible
0143                 self.parent.add_associated(self);
0144             end
0145         end
0146         
0147         %% implement new methods
0148         
0149         function [M, M_cols] = merge_matrix(self, M_core, cols_core, M_self, cols_self)
0150             % horizontally combine core and associated
0151             %
0152             % Combine core data, core 'cols', associated data and
0153             % associated 'cols'.  This may or may not be trivial depending
0154             % on the actual data.  If more than two matrices need to be
0155             % merged, apply this method iteratively.
0156             %
0157             % FORMAT
0158             %
0159             %   [M, M_cols] = ad.merge_matrix(M_core, cols_core, M_self, cols_self)
0160             %
0161             % IN
0162             %
0163             %   M_core      matrix      selection from core collocations
0164             %   cols_core   structure   structure describing M_core
0165             %   M_self      matrix      selection of associated data
0166             %   cols_self   structure   structure describing M_self
0167             %
0168             % OUT
0169             %
0170             %   M           matrix      combination of M_core and M_self
0171             %   M_cols      structure   structure describing M
0172             
0173             M = [M_core M_self];
0174             M_cols = self.merge_new_cols(M_core, cols_core, cols_self);
0175         end
0176         
0177     end
0178     
0179     methods (Access = {?SatDataset})
0180         %% implement new methods
0181         
0182         function [out, localcols] = process_delayed(self, processed_core, spec1, spec2, varargin)
0183             % process associated data when core data is already there
0184             %
0185             % Sometimes, core collocations already exist, but one or more
0186             % associated datasets do not exist yet. This method, which is
0187             % not designed to be called directly by the end user, takes
0188             % care of this.
0189             %
0190             % This method just splits a day of collocations into segments
0191             % and passes each segment to <a href="matlab:help AssociatedDataset/process_granule">process_granule</a>.
0192             % That's where the actual processing is done, and for
0193             % process_granule there is no difference between processing
0194             % directly and processing later.
0195             %
0196             % FORMAT
0197             %
0198             %   [out, localcols] = ad.process_delayed(processed_core, spec1, spec2[, depies[, depcols[, fields]]])
0199             %
0200             % IN
0201             %
0202             %   processed_core      matrix
0203             %
0204             %       Matrix with processed core collocation data. The
0205             %       columns are described by self.parent.cols.
0206             %
0207             %   spec1               various     sat (or so) for primary
0208             %   spec2               various     sat (or so) for secondary
0209             %   depies              cell array
0210             %
0211             %       Contains output for all previous dependencies.
0212             %
0213             %   depcols             cell array
0214             %
0215             %       Contains column-descriptions for depies
0216             %
0217             %   fields              cell array or 'all'
0218             %
0219             %       Contains fields that are to be processed.
0220             %
0221             % OUT
0222             %
0223             %   out                 matrix
0224             %
0225             %       Data matrix with columns described by self.cols
0226             %
0227             %   localcols           structure, describes columns of 'out'
0228  
0229             [depies, depcols, fields] = optargs(varargin, {{}, {}, 'all'});
0230             % data checks
0231             errid = ['atmlab:' mfilename ':InvalidFormat'];
0232             errmes = ['Data are not properly sorted: %s %s field %s has descending elements.  ' ...
0233                       'One possible cause is that %s %s granule N is entirely contained ' ...
0234                       'in preceding granule N-1, but that granule N was not present ' ...
0235                       'when the firstline-db was generated for %s %s, although it was ' ...
0236                       'present when %s %s was generated.  This means that granule N as collocated really ' ...
0237                       'is mostly duplicates of N-1, resulting in the secondary potentially going ' ...
0238                       '''back in time'' for the collocations.  This problem is detected ' ...
0239                       'if additionals are obtained separately.  The solution is ' ...
0240                       'to rerun find_granule_first_line for %s %s for today, and then ' ...
0241                       'redo collocations for %s %s for the entire day.'];
0242             assert(all(diff(processed_core(:, self.parent.cols.START1))>=0), ...
0243                 errid, errmes, ...
0244                 class(self.parent), self.parent.name, 'START1', ...
0245                 class(self.parent.primary), self.parent.primary.name, ...
0246                 class(self.parent.primary), self.parent.primary.name, ...
0247                 class(self.parent), self.parent.name, ...
0248                 class(self.parent.primary), self.parent.primary.name, ...
0249                 class(self.parent), self.parent.name);
0250             assert(all(diff(processed_core(:, self.parent.cols.START2))>=0), ...
0251                 errid, errmes, ...
0252                 class(self.parent), self.parent.name, 'START2', ...
0253                 class(self.parent.primary), self.parent.primary.name, ...
0254                 class(self.parent.primary), self.parent.primary.name, ...
0255                 class(self.parent), self.parent.name, ...
0256                 class(self.parent.primary), self.parent.primary.name, ...
0257                 class(self.parent), self.parent.name);
0258             % divide in segments where new primary, new secondary starts
0259             [~, newprim] = unique(processed_core(:, self.parent.cols.START1), 'rows', 'first');
0260             [~, newsec] = unique(processed_core(:, self.parent.cols.START2), 'rows', 'first');
0261             % combine both lists of start indices; segment ends are derived in the loop below
0262             newseg = unique([newprim; newsec]);
0263             % empty data-structs are all I pass to processors not needing
0264             % data
0265             data1 = struct();
0266             data2 = struct();
0267             primseg = 0;
0268             seconseg = 0;
0269             out = [];
0270             
0271             logtext(atmlab('OUT'), 'Processing %d segments\n', length(newseg));
0272             for segcount = 1:length(newseg)
0273                 logtext(atmlab('OUT'), 'Processing segment %d/%d\n', segcount, length(newseg));
0274                 segstart = newseg(segcount);
0275                 % end of segment: either beginning of next, or end of data
0276                 if segcount < length(newseg)
0277                     segend = newseg(segcount+1)-1;
0278                 else
0279                     segend = size(processed_core, 1);
0280                 end
0281                 % keep track of 'primary segment' and 'secondary segment'
0282                 % to know corresponding date1, data1, etc.
0283                 if primseg<length(newprim) && segstart == newprim(primseg+1)
0284                     primseg = primseg + 1;
0285                     [dv{1:6}] = unixsecs2date(processed_core(newprim(primseg), self.parent.cols.START1));
0286                     date1 = cell2mat(dv);
0287                     if self.needs_primary_data(fields)
0288                         data1 = self.parent.primary.read_granule(date1, spec1, self.primary_arguments(fields), false, false);
0289                     end
0290                 end
0291                 if seconseg<length(newsec) && segstart == newsec(seconseg+1)
0292                     seconseg = seconseg + 1;
0293                     [dv{1:6}] = unixsecs2date(processed_core(newsec(seconseg), self.parent.cols.START2));
0294                     date2 = cell2mat(dv);
0295                     if self.needs_secondary_data(fields)
0296                         try
0297                             data2 = self.parent.secondary.read_granule(date2, spec2, self.secondary_arguments(fields), false, false);
0298                         catch ME
0299                             switch ME.identifier
0300                                 case {'atmlab:SatDataset:cannotread', 'atmlab:find_granule_by_datetime'}
0301                                     ME2 = MException('atmlab:AssociatedDataset:cannolongerread', ...
0302                                         ['%s %s trying to postprocess %s %s, in the past could read ' ...
0303                                          '%s %s, but no longer :('], ...
0304                                          class(self), self.name, class(self.parent.secondary), self.parent.secondary.name, ...
0305                                          num2str(date2), spec2);
0306                                     ME2 = ME2.addCause(ME);
0307                                     ME2.throw();
0308                                 otherwise
0309                                     ME.rethrow();
0310                             end
0311                         end
0312                     end
0313                 end
0314                 %                 if local_success
0315                 depies_seg = cellfun(@(X)X(segstart:segend, :), depies, 'UniformOutput', false);
0316                 [new_out, localcols] = self.process_granule(...
0317                     processed_core(segstart:segend, :), data1, date1, ...
0318                     spec1, data2, date2, spec2, depies_seg, depcols, ...
0319                     fields);
0320                 %                 else
0321                 %                     new_out = nan * zeros(length(rows), length(cls));
0322                 %                 end
0323                 if isempty(out)
0324                     out = new_out;
0325                 else
0326                     out = [out; new_out]; %#ok<AGROW>
0327                 end
0328             end
0329         end
0330                 
0331         
0332         function S = merge_struct(self, S_core, S_self)
0333             % merge structures as obtained from read_homemade_granule
0334             %
0335             % In most cases this is a simple structure-concatenation, but
0336             % for Collapsers and some other AssociatedDatasets it's more
0337             % involved.
0338             
0339             % Note: this uses undocumented behaviour
0340             status = warning('error', 'catstruct:DuplicatesFound');
0341             S = catstruct(S_core, S_self);
0342             warning(status);
0343         end
0344         
0345         function C = get_mergefields(self) %#ok<MANU>
0346             % Get minimum fields required to do merging
0347             %
0348             % In some cases, <a href="matlab:help AssociatedDataset/merge">merge</a> requires a certain minimum
0349             % of fields in order to perform the merging. This method
0350             % returns the minimum for a particular object (usually constant
0351             % per class).
0352             %
0353             % FORMAT
0354             %
0355             %   C = ad.get_mergefields();
0356             %
0357             % IN
0358             %
0359             %   (none)
0360             %
0361             % OUT
0362             %
0363             %   C       cell array of strings   names of needed fields
0364             %
0365             C = {};
0366         end       
0367         
0368         function new = concatenate(self, old_core_result, old_additional_result, new_additional_result)
0369             % to concatenate old and new data matrices
0370             %
0371             % To concatenate old and new data matrices, sometimes some
0372             % fields need to be corrected, otherwise this is trivial.
0373             % However, always use this method to concatenate data, just in
0374             % case data have to be corrected. An example where this is
0375             % necessary is for a <a href="matlab:help Collapser">Collapser</a>, where FIRST and LAST need to be
0376             % corrected.
0377             %
0378             % FORMAT
0379             %
0380             %   new = ad.concatenate(old_core, old_addi, new_addi)
0381             %
0382             % IN
0383             %
0384             %   old_core    matrix      old core result
0385             %   old_addi    matrix      old additional result
0386             %   new_addi    matrix      new additional result
0387             %
0388             % OUT
0389             %
0390             %   new         matrix      concatenated additional result
0391             if isempty(new_additional_result)
0392                 new = old_additional_result;
0393             elseif isempty(old_additional_result)
0394                 new = new_additional_result;
0395             else
0396                 new = [old_additional_result; new_additional_result];
0397             end
0398         end
0399         
0400         function members2cols(self)
0401             % converts self.members to corresponding self.cols
0402             %
0403             % Assumes sizes in self.members are correct. This may not
0404             % always be the case before the first data is read!
0405             % This has no input or output, because it operates entirely on
0406             % the object itself.
0407             %
0408             % FORMAT
0409             %
0410             %   ad.members2cols()
0411             %
0412             % IN
0413             %
0414             %   (none, but uses <a href="matlab:help AssociatedDataset.members">members</a>)
0415             %
0416             % OUT
0417             %
0418             %   (none, but sets <a href="matlab:help AssociatedDataset.cols">cols</a>)
0419             allnames = fieldnames(self.members);
0420             tot = 1;
0421             for i = 1:length(allnames)
0422                 fname = allnames{i};
0423                 fl = self.members.(fname);
0424                 if isfield(fl, 'dims')
0425                     no = self.members.(fname).dims{2};
0426                 else
0427                     no = 1;
0428                 end
0429                 % self.cols is in HomemadeDataset
0430                 self.cols.(fname) = tot:(tot+no-1);
0431                 tot = tot + no;
0432             end
0433         end
0434         
0435     end
0436     
0437     methods (Static, Access = {?SatDataset})       
0438         function M_cols = merge_new_cols(M_core, cols_core, cols_self)
0439             % merge different cols-structures describing matrix of data
0440             %
0441             % This static method merges two cols-structure that describe a
0442             % matrix of data.
0443             %
0444             % FORMAT
0445             %
0446             %   M_cols = merge_new_cols(M_core, cols_core, cols_self)
0447             %
0448             % IN
0449             %
0450             %   M_core      matrix      original data
0451             %   cols_core   structure   describing M_core
0452             %   cols_self   structure   describing the data to be appended
0453             %
0454             % OUT
0455             %
0456             %   M_cols      structure   new structure describing merged matrix
0457             
0458             M_cols = catstruct(cols_core, ...
0459                 structfun(@(x)x+size(M_core, 2), cols_self, 'UniformOutput', false));
0460         end
0461         
0462         function do = redo_all(~)
0463             % redo_all(software_version)
0464             %
0465             % overload and return true if some changes require that a
0466             % dataset must be overwritten (overwrite=1) even if requested
0467             % to be appended (overwrite=2)
0468             do = false;
0469         end
0470     end
0471     
0472     % those methods must be implemented by subclasses
0473     methods (Abstract, Access = {?SatDataset})
0474         % Arguments to pass on to primary reader
0475         %
0476         % This method returns a cell array with arguments that shall be
0477         % passed on to the primary reader.
0478         %
0479         % FORMAT
0480         %
0481         %   args = ad.primary_arguments(fields)
0482         %
0483         % IN
0484         %
0485         %   cell array of strings with fields or 'all' (default)
0486         %
0487         % OUT
0488         %
0489         %   cell array with arguments passed on to primary reader.
0490         args = primary_arguments(self, varargin)
0491         
0492         % Arguments to pass on to secondary reader
0493         %
0494         % See <a href="matlab:help AssociatedDataset/primary_arguments">primary_arguments</a>.
0495         args = secondary_arguments(self, varargin)
0496         
0497         % Whether primary data is used at all.
0498         %
0499         % This method is used in 'late' processing, e.g. when the
0500         % collocations already exist, but associated data does not. For
0501         % late processing, not all original data may need to be re-read.
0502         % This method tells whether the primary data should be re-read.
0503         %
0504         % FORMAT
0505         %
0506         %   reread = ad.needs_primary_data('all')
0507         %
0508         % IN
0509         %
0510         %   optionally, cell array of strings with fields; defaults to
0511         %   'all'
0512         %
0513         % OUT
0514         %
0515         %   logical scalar (boolean), true if data must be reread
0516         bool = needs_primary_data(self, varargin)
0517         
0518         % Whether secondary data is used at all
0519         %
0520         % See <a href="matlab:help AssociatedDataset/needs_primary_data">needs_primary_data</a>.
0521         bool = needs_secondary_data(self, varargin)
0522 
0523         % Process a single granule
0524         %
0525         % This is the core method for any AssociatedDataset implementation.
0526         % It takes collocations as <a href="matlab:help CollocatedDataset/process">processed</a> by a <a href="matlab:help CollocatedDataset">CollocatedDataset</a>,
0527         % as well as original data from the primary and the secondary (if
0528         % so requested by <a href="matlab:help AssociatedDataset.needs_primary_data">needs_primary_data</a> and <a href="matlab:help AssociatedDataset.needs_secondary_data">needs_secondary_data</a>).
0529         % It then does the necessary processing (such as copying in the
0530         % case of <a href="matlab:help FieldCopier">FieldCopier</a>).
0531         % It must also set self.cols correctly.
0532         %
0533         % This method is not normally called directly by the user. However,
0534         % it is to be re-implemented in any special-purpose
0535         % AssociatedDataset.
0536         %
0537         % FORMAT
0538         %
0539         %   [out, localcols] = ad.process_granule(processed_core, ...
0540         %               data1, date1, spec1, ...
0541         %               data2, date2, spec2, ...
0542         %               dependencies, depcols, fields)
0543         %
0544         % IN
0545         %
0546         %   processed_core  matrix
0547         %
0548         %       matrix with one row for each collocation and columns
0549         %       described by <a href="matlab:help CollocatedDataset/cols">CollocatedDataset.cols</a>.
0550         %       This is the output of <a href="matlab:help CollocatedDataset/process">CollocatedDataset.process</a>.
0551         %
0552         %   data1           structure
0553         %
0554         %       Full data for the primary, as output by the primary reader.
0555         %
0556         %   date1           datevec     date/time for primary granule
0557         %   spec1           various     specification (e.g. sat) for primary.
0558         %   data2           like data1, but for secondary
0559         %   date2           like date1, but for secondary
0560         %   spec2           like spec1, but for secondary
0561         %
0562         %   dependencies    cell array
0563         %
0564         %       Cell array with elements corresponding to
0565         %       <a href="matlab:help AssociatedDataset.dependencies">AssociatedDataset.dependencies</a>. For each
0566         %       dependency, this cell array contains an element with the
0567         %       output of the process_granule method for that particular
0568         %       <a href="matlab:help AssociatedDataset">AssociatedDataset</a>. For example, for a <a href="matlab:help Collapser">Collapser</a>
0569         %       it will contain a single element with the output of
0570         %       FieldCopier.process_granule.
0571         %
0572         %   depcols         cell array of structures describing columns of
0573         %                   dependencies
0574         %
0575         %   fields          cell array
0576         %
0577         %       Fields to process.  'all' for all fields (only way until
0578         %       recently)
0579         %
0580         %  OUT
0581         %
0582         %   out         Matrix containing data
0583         %
0584         %   localcols   Cols describing columns.  If all fields, this is
0585         %               simply self.cols.
0586         [out, localcols] = process_granule(self, processed_core, data1, date1, spec1, data2, date2, spec2, dependencies, depcols, fields)
0587         %store(self, date, spec, result)
0588         
0589         fields = fields_needed_for_dependency(self, fields, dependency)
0590     end
0591     
0592 end
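
The column bookkeeping done by members2cols, merge_new_cols and merge_matrix can be tried outside the class with the stand-alone sketch below. It is only an illustration under stated assumptions: the field names LAT and BT, the dimension name CHAN, the column names LINE1 and LINE2 and all sizes are made up, and a plain loop replaces catstruct so that the snippet runs without atmlab.

```matlab
% Hypothetical 'members' description (field names and sizes are assumptions).
members = struct();
members.LAT = struct('type', 'double');                        % no dims: 1 column
members.BT  = struct('type', 'double', 'dims', {{'CHAN', 5}}); % 5 columns

% Same idea as members2cols: give each field a consecutive column range.
cols = struct();
next = 1;
names = fieldnames(members);
for i = 1:numel(names)
    m = members.(names{i});
    if isfield(m, 'dims')
        width = m.dims{2};
    else
        width = 1;
    end
    cols.(names{i}) = next:(next + width - 1);
    next = next + width;
end

% Same idea as merge_matrix/merge_new_cols: append the associated data to
% the right of the core matrix and shift its column indices by the core width.
M_core = zeros(3, 4);   % 3 collocations, 4 core columns (made-up sizes)
cols_core = struct('START1', 1, 'START2', 2, 'LINE1', 3, 'LINE2', 4);
M_self = ones(3, 6);    % associated data laid out as described by 'cols'

M = [M_core M_self];
shifted = structfun(@(x) x + size(M_core, 2), cols, 'UniformOutput', false);
M_cols = cols_core;
names = fieldnames(shifted);
for i = 1:numel(names)
    M_cols.(names{i}) = shifted.(names{i});   % stand-in for catstruct
end
```

After running this, M(:, M_cols.BT) selects the five BT columns of the merged matrix, while M(:, M_cols.START1) still selects the first core column.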

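The way process_delayed splits a day of collocations into segments can be sketched in isolation as follows. The START1 and START2 values are invented example data (granule start times for primary and secondary), and the vector segend is a hypothetical shortcut for the per-iteration computation of the segment end in the method above.

```matlab
% Made-up granule start times, one row per collocation.
START1 = [100; 100; 100; 200; 200; 300];   % primary granule start
START2 = [ 50;  50;  90;  90;  90;  90];   % secondary granule start

% Both columns must be non-descending, as asserted in process_delayed.
assert(all(diff(START1) >= 0));
assert(all(diff(START2) >= 0));

% Index of the first row of every primary and every secondary granule.
[~, newprim] = unique(START1, 'first');
[~, newsec]  = unique(START2, 'first');

% A new segment starts wherever either granule changes.
newseg = unique([newprim(:); newsec(:)]);

% Each segment ends just before the next one starts; the last one ends
% at the final row.
segend = [newseg(2:end) - 1; numel(START1)];

for k = 1:numel(newseg)
    fprintf('segment %d: rows %d to %d\n', k, newseg(k), segend(k));
end
```

Within each segment both the primary and the secondary granule are constant, which is why a single pair of data1 and data2 suffices for one call to process_granule.
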
Generated on Mon 15-Sep-2014 13:31:28 by m2html © 2005