The Pycmor Standard Library

The standard library contains functions that are included in the default pipelines, and are generally used as step functions. We expose several useful ones:

  • Unit Conversion

  • Time Averaging

  • Dataset Loading

  • Variable Extraction

  • Temporal Resampling

  • Trigger Compute

  • Show Data

  • Global Attributes

  • Variable Attributes

See the documentation for each of the steps for more details.
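Every function below has the same shape: it takes the current data and a Rule, and returns (possibly new) data. The following is a minimal sketch, under that assumption, of how such step functions can be chained. `run_steps`, `scale`, `clip`, and the `SimpleNamespace` stand-in for Rule are hypothetical and are not part of pycmor.

```python
from types import SimpleNamespace
from typing import Any, Callable, List

# A step has the same shape as the std_lib functions: (data, rule) -> data.
Step = Callable[[Any, Any], Any]


def run_steps(data: Any, rule: Any, steps: List[Step]) -> Any:
    """Thread data through each step in order, passing the same rule to every step."""
    for step in steps:
        data = step(data, rule)
    return data


# Hypothetical toy steps standing in for real std_lib functions.
def scale(data, rule):
    return [x * rule.factor for x in data]


def clip(data, rule):
    return [min(x, rule.ceiling) for x in data]


rule = SimpleNamespace(factor=10, ceiling=25)  # hypothetical stand-in for pycmor's Rule
result = run_steps([1, 2, 3], rule, [scale, clip])
print(result)  # [10, 20, 25]
```

Because each step returns data of the same kind it received, pass-through steps such as show_data or checkpoint_pipeline can be dropped anywhere in such a list without changing the result.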

pycmor.std_lib.checkpoint_pipeline(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Insert a checkpoint in the pipeline processing.

This function allows for state saving during pipeline processing, which can be useful for debugging or resuming processing from a specific point.

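Parameters:
  • data (xarray.DataArray or xarray.Dataset) – The data at the current point in the pipeline.

  • rule (Rule) – The rule whose configuration controls the checkpoint behavior.

Returns: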

The input data (typically unchanged).

Return type:

xarray.DataArray or xarray.Dataset

Notes

Depending on the configuration in rule, this function might:

  • Save the current state to disk

  • Log the current state

  • Perform debugging operations
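A checkpoint that saves state to disk could look roughly like the sketch below. It assumes hypothetical `checkpoint_dir` and `name` attributes on the rule and uses pickle purely for illustration; xarray objects could equally be written with `to_netcdf` or `to_zarr`. This is not pycmor's implementation.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def checkpoint_step(data, rule):
    """Pickle the current state if the rule asks for it, then return data unchanged."""
    # `checkpoint_dir` and `name` are hypothetical rule attributes used for this sketch.
    target = getattr(rule, "checkpoint_dir", None)
    if target is not None:
        path = Path(target) / f"{rule.name}.pkl"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(data, fh)
    return data


state = {"tas": [280.0, 281.5]}
with tempfile.TemporaryDirectory() as tmp:
    rule = SimpleNamespace(name="tas_step3", checkpoint_dir=tmp)
    out = checkpoint_step(state, rule)
    # Resuming from this point would mean reading the saved state back in.
    restored = pickle.loads((Path(tmp) / "tas_step3.pkl").read_bytes())

# Without a checkpoint_dir the step is a pure pass-through.
passthrough = checkpoint_step(state, SimpleNamespace(name="tas_step3"))
```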

pycmor.std_lib.convert_units(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Convert the units of a DataArray or Dataset to those required by the selected Data Request Variable. Chemical elements and dimensionless units are handled automatically.

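Parameters:
  • data (xarray.DataArray or xarray.Dataset) – The data whose units will be converted.

  • rule (Rule) – The rule holding the selected Data Request Variable, which determines the target units.

Returns: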

The converted data.

Return type:

xarray.DataArray or xarray.Dataset
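The core of a unit conversion is scaling the values and updating the units attribute. The sketch below assumes a small hand-written factor table (`FACTORS`, hypothetical) and takes the target units as a plain string, whereas pycmor derives them from the Data Request Variable; a real converter would rely on a units library rather than a lookup table.

```python
import numpy as np
import xarray as xr

# Hypothetical conversion table: (source units, target units) -> multiplicative factor.
FACTORS = {
    ("g kg-1", "kg kg-1"): 1e-3,
    # 1 mm of water over 1 m^2 weighs 1 kg, assuming a density of 1000 kg m-3.
    ("mm day-1", "kg m-2 s-1"): 1.0 / 86400.0,
}


def convert_units_sketch(da: xr.DataArray, target_units: str) -> xr.DataArray:
    """Scale `da` into `target_units` and record the new units in its attrs."""
    source_units = da.attrs.get("units")
    if source_units == target_units:
        return da
    factor = FACTORS[(source_units, target_units)]  # KeyError if the pair is unknown
    converted = da * factor
    # Set attrs explicitly so the result does not depend on xarray's keep_attrs default.
    converted.attrs = {**da.attrs, "units": target_units}
    return converted


hus = xr.DataArray([1000.0, 500.0], dims="x", attrs={"units": "g kg-1", "long_name": "humidity"})
hus_si = convert_units_sketch(hus, "kg kg-1")
```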

pycmor.std_lib.get_variable(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Extract a variable from a dataset as a DataArray.

Parameters:
  • data (xarray.Dataset) – The dataset containing the variable to extract.

  • rule (Rule) – The rule containing the variable name to extract.

Returns:

The extracted variable as a DataArray.

Return type:

xarray.DataArray

Raises:

KeyError – If the variable specified in the rule does not exist in the dataset.
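Extracting a variable amounts to indexing the Dataset by name and failing with a KeyError when the name is absent. In this sketch the variable name is read from a hypothetical `model_variable` attribute on a `SimpleNamespace` stand-in for the rule; the real Rule may store it differently.

```python
from types import SimpleNamespace

import xarray as xr


def get_variable_sketch(ds: xr.Dataset, rule) -> xr.DataArray:
    """Return the data variable named by the rule, failing loudly if it is absent."""
    name = rule.model_variable  # hypothetical attribute holding the variable name
    if name not in ds.data_vars:
        raise KeyError(f"{name!r} not in dataset; available: {sorted(ds.data_vars)}")
    return ds[name]


ds = xr.Dataset({"temp2": ("time", [271.3, 272.8]), "precip": ("time", [0.0, 1.2])})
tas = get_variable_sketch(ds, SimpleNamespace(model_variable="temp2"))

try:
    get_variable_sketch(ds, SimpleNamespace(model_variable="tas"))
except KeyError as err:
    message = str(err)
```

Listing the available names in the error message makes a misspelled variable in a rule much quicker to diagnose.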

pycmor.std_lib.load_data(data: DataArray | Dataset | None, rule: Rule) → DataArray | Dataset

Load data from files according to the rule specification.

This function opens and combines data from multiple files that match the pattern specified in the rule. It’s useful for loading time series data that may be spread across multiple files.

Parameters:
  • data (xarray.DataArray or xarray.Dataset or None) – Existing data (if any) to incorporate with loaded data.

  • rule (Rule) – The rule containing the input patterns and other specifications for loading the data.

Returns:

The loaded data combined into a single Dataset or DataArray.

Return type:

xarray.DataArray or xarray.Dataset

Notes

The rule_spec dictionary should contain an input_patterns key holding a list of file glob patterns to match, e.g., ["path/to/data/*.nc"].
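Before any files can be opened, the input_patterns have to be expanded into a concrete file list. The sketch below shows that step with the standard library only; `expand_input_patterns` is a hypothetical helper. One plausible follow-up, not confirmed as pycmor's approach, is to pass the resulting list to `xarray.open_mfdataset(files, combine="by_coords")`, which opens several files and combines them into one Dataset.

```python
import glob
import tempfile
from pathlib import Path


def expand_input_patterns(patterns):
    """Return a sorted, de-duplicated list of files matching any of the glob patterns."""
    matches = set()
    for pattern in patterns:
        matches.update(glob.glob(str(pattern)))
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    for fname in ["tas_2001.nc", "tas_2000.nc", "README.txt"]:
        (Path(tmp) / fname).touch()
    # Overlapping patterns: the set keeps each matching file only once.
    files = expand_input_patterns([f"{tmp}/*.nc", f"{tmp}/tas_2000.nc"])
    names = [Path(f).name for f in files]
```

Sorting matters for time series split across files: it keeps files with date-stamped names in chronological order.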

pycmor.std_lib.set_global_attributes(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Set global metadata attributes for a Dataset or DataArray.

This function applies standardized global attributes to the Dataset or DataArray based on the specifications in the rule, following conventions like CMIP6.

Parameters:
  • data (xarray.DataArray or xarray.Dataset) – The data to which global attributes will be added.

  • rule (Rule) – The rule containing the global attribute specifications.

Returns:

The data with updated global attributes.

Return type:

xarray.DataArray or xarray.Dataset
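Global attributes live in the Dataset's own attrs rather than on individual variables. The sketch below merges an illustrative set of CMIP6-style global attributes (`GLOBAL_ATTRS`, hypothetical values) into a Dataset with `assign_attrs`, which returns a copy and leaves existing entries such as history in place. In pycmor the attributes would come from the rule.

```python
import xarray as xr

# Illustrative CMIP6-style global attributes; pycmor would derive these from the rule.
GLOBAL_ATTRS = {
    "Conventions": "CF-1.7 CMIP-6.2",
    "activity_id": "CMIP",
    "experiment_id": "historical",
    "frequency": "mon",
}


def set_global_attributes_sketch(ds: xr.Dataset, attrs: dict) -> xr.Dataset:
    """Return a copy of `ds` with `attrs` merged into its global attributes."""
    return ds.assign_attrs(**attrs)


ds = xr.Dataset({"tas": ("time", [287.1, 287.4])}, attrs={"history": "created by model"})
ds_out = set_global_attributes_sketch(ds, GLOBAL_ATTRS)
```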

pycmor.std_lib.set_variable_attributes(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Set variable-specific metadata attributes.

This function applies standardized variable attributes to the Dataset or DataArray based on the specifications in the rule, following conventions like CMIP6.

Parameters:
  • data (xarray.DataArray or xarray.Dataset) – The data to which variable attributes will be added.

  • rule (Rule) – The rule containing the variable attribute specifications.

Returns:

The data with updated variable attributes.

Return type:

xarray.DataArray or xarray.Dataset
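Variable attributes, by contrast, are attached to the variable itself. The sketch below handles both input types: for a DataArray it sets the attributes directly, and for a Dataset it updates the named variable in a copy. `TAS_ATTRS` holds illustrative CMIP6 metadata for near-surface air temperature, and the explicit `name` argument is an assumption; pycmor takes this information from the rule.

```python
import xarray as xr

# Illustrative CMIP6-style metadata for near-surface air temperature (tas).
TAS_ATTRS = {
    "standard_name": "air_temperature",
    "long_name": "Near-Surface Air Temperature",
    "units": "K",
}


def set_variable_attributes_sketch(data, var_attrs, name=None):
    """Attach var_attrs to a DataArray, or to data[name] when given a Dataset."""
    if isinstance(data, xr.Dataset):
        out = data.copy()
        out[name] = out[name].assign_attrs(**var_attrs)
        return out
    return data.assign_attrs(**var_attrs)


da = xr.DataArray([287.1, 287.4], dims="time", name="tas")
da_out = set_variable_attributes_sketch(da, TAS_ATTRS)
ds_out = set_variable_attributes_sketch(da.to_dataset(), TAS_ATTRS, name="tas")
```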

pycmor.std_lib.show_data(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Print data to screen for inspection and debugging purposes.

This function is useful during development and debugging to inspect the content and structure of DataArrays and Datasets.

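Parameters:
  • data (xarray.DataArray or xarray.Dataset) – The data to display.

  • rule (Rule) – The rule currently being processed.

Returns: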

The input data (unchanged).

Return type:

xarray.DataArray or xarray.Dataset

pycmor.std_lib.temporal_resample(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Resample a DataArray or Dataset to a different temporal frequency.

Parameters:
  • data (xarray.DataArray or xarray.Dataset) – The data to resample.

  • rule (Rule) – The rule containing parameters for the resampling operation, including the frequency for resampling.

Returns:

The resampled data.

Return type:

xarray.DataArray or xarray.Dataset

Notes

This function resamples time series data to a different frequency. The frequency is determined from the rule (typically from data_request_variable.frequency). Common frequencies include:

  • ‘YS’: year start

  • ‘MS’: month start

  • ‘D’: daily

  • ‘H’: hourly

See also

https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations
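The xarray resampling linked above can be wrapped as a small step. In this sketch the frequency string is passed in directly and the aggregation defaults to the mean; both are assumptions, since pycmor reads the frequency from the rule and the right aggregation depends on the variable. `temporal_resample_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd
import xarray as xr


def temporal_resample_sketch(data, freq: str, how: str = "mean"):
    """Resample along `time` to `freq` (a pandas offset alias such as "MS")."""
    resampler = data.resample(time=freq)
    return getattr(resampler, how)()  # e.g. resampler.mean()


# 60 daily values: all of January 2000 (31 days) and February 2000 (29 days, leap year).
time = pd.date_range("2000-01-01", periods=60, freq="D")
daily = xr.DataArray(np.arange(60.0), coords={"time": time}, dims="time")
monthly = temporal_resample_sketch(daily, "MS")  # monthly.values -> [15., 45.]
```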

pycmor.std_lib.time_average(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Compute the time average of a DataArray or Dataset based upon the Data Request Variable you have selected.

Parameters:
  • data (xarray.DataArray or xarray.Dataset) – The data to average.

  • rule (Rule) – The rule specifying parameters for time averaging, such as the time period or method to use for averaging.

Returns:

The averaged data.

Return type:

xarray.DataArray or xarray.Dataset
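One detail any time average has to settle is how to weight time steps of unequal length. The sketch below, which is an illustration and not a description of what pycmor does, averages monthly data weighted by the number of days in each month using xarray's `weighted` and the `dt.days_in_month` accessor; `weighted_time_mean` is a hypothetical name.

```python
import numpy as np
import pandas as pd
import xarray as xr


def weighted_time_mean(da: xr.DataArray) -> xr.DataArray:
    """Average monthly data over time, weighting each month by its number of days."""
    weights = da["time"].dt.days_in_month
    return da.weighted(weights).mean("time")


time = pd.date_range("2001-01-01", periods=2, freq="MS")  # January and February 2001
monthly = xr.DataArray([0.0, 59.0], coords={"time": time}, dims="time")
weighted = weighted_time_mean(monthly)  # (0*31 + 59*28) / 59 = 28.0
unweighted = monthly.mean("time")       # (0 + 59) / 2 = 29.5
```

The gap between 28.0 and 29.5 shows why a plain mean over months can bias longer averages.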

pycmor.std_lib.trigger_compute(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset

Trigger computation of lazy (dask-backed) data operations.

This function is useful to ensure that all pending computations are executed before proceeding with the next steps in a pipeline. It’s particularly important before saving data to files.

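Parameters:
  • data (xarray.DataArray or xarray.Dataset) – The (possibly dask-backed) data to compute.

  • rule (Rule) – The rule currently being processed.

Returns: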

The computed data with all operations applied.

Return type:

xarray.DataArray or xarray.Dataset
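The difference between lazy and computed data can be seen directly in xarray. The sketch below (it requires dask) builds a lazy, chunked array, then calls `compute()`, which runs the pending operations and returns a NumPy-backed copy; `load()` would do the same in place. `trigger_compute_sketch` is a hypothetical stand-in, and whether pycmor uses `compute()` or `load()` is not stated here.

```python
import numpy as np
import xarray as xr


def trigger_compute_sketch(data, rule=None):
    """Execute all pending dask operations and return in-memory data."""
    return data.compute()


# .chunk() makes the array dask-backed; the multiplication is only recorded, not run.
lazy = xr.DataArray(np.arange(6.0), dims="x").chunk({"x": 3}) * 2
computed = trigger_compute_sketch(lazy)
```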