The Pycmor Standard Library
The standard library contains functions that are included in the default
pipelines, and are generally used as step functions. We expose several
useful ones:
- Unit Conversion
- Time Averaging
- Dataset Loading
- Variable Extraction
- Temporal Resampling
- Trigger Compute
- Show Data
- Global Attributes
- Variable Attributes
See the documentation for each of the steps for more details.
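All of these steps are called as step(data, rule) and return the (possibly modified) data, so a pipeline can be thought of as a left fold over its steps. The following is a minimal sketch of that idea, not pycmor's own pipeline machinery; FakeRule, select_variable, tag_attributes, and run_steps are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from functools import reduce

import numpy as np
import xarray as xr


# Hypothetical stand-in for pycmor's Rule; the real class has a richer interface.
@dataclass
class FakeRule:
    name: str
    attrs: dict = field(default_factory=dict)


def select_variable(data, rule):
    """Hypothetical step: pull one variable out of a Dataset."""
    return data[rule.name]


def tag_attributes(data, rule):
    """Hypothetical step: attach attributes taken from the rule."""
    return data.assign_attrs(**rule.attrs)


def run_steps(data, rule, steps):
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda current, step: step(current, rule), steps, data)


ds = xr.Dataset({"tas": ("time", np.array([280.0, 281.5, 279.0]))})
rule = FakeRule(name="tas", attrs={"units": "K"})
result = run_steps(ds, rule, [select_variable, tag_attributes])
print(result.attrs)  # {'units': 'K'}
```

Because every step has the same shape, steps can be reordered or swapped without changing the driver loop.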
- pycmor.std_lib.checkpoint_pipeline(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Insert a checkpoint in the pipeline processing.
This function allows for state saving during pipeline processing, which can be useful for debugging or resuming processing from a specific point.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The current data in the pipeline.
rule (Rule) – The rule containing checkpoint parameters.
- Returns:
The input data (typically unchanged).
- Return type:
xarray.DataArray or xarray.Dataset
Notes
Depending on the configuration in rule, this function might:
- Save the current state to disk
- Log the current state
- Perform debugging operations
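A minimal sketch of a checkpoint-style step, assuming a hypothetical checkpoint_path attribute on the rule and using pickle as the on-disk format; pycmor's actual checkpoint configuration and storage format may differ. The property it shares with the documented step is that the data passes through unchanged.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

import numpy as np
import xarray as xr


def checkpoint_sketch(data, rule):
    """Pickle the current data to rule.checkpoint_path, then pass it through unchanged.

    `checkpoint_path` is a hypothetical rule attribute used only for this sketch.
    """
    path = getattr(rule, "checkpoint_path", None)
    if path is not None:
        with open(path, "wb") as fh:
            pickle.dump(data, fh)
    return data  # the pipeline continues with the very same object


tmpdir = tempfile.mkdtemp()
rule = SimpleNamespace(checkpoint_path=os.path.join(tmpdir, "step.pkl"))
da = xr.DataArray(np.arange(4.0), dims="time", name="tas")
out = checkpoint_sketch(da, rule)

# Reload the saved state, e.g. to resume or inspect it later.
with open(rule.checkpoint_path, "rb") as fh:
    restored = pickle.load(fh)
print(restored.identical(da))  # True
```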
- pycmor.std_lib.convert_units(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Convert units of a DataArray or Dataset based upon the Data Request Variable you have selected. Automatically handles chemical elements and dimensionless units.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to convert.
rule (Rule) – The rule containing the units to convert to.
- Returns:
The converted data.
- Return type:
xarray.DataArray or xarray.Dataset
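A minimal sketch of the scale-and-offset arithmetic behind a unit conversion. It assumes a hypothetical hand-written CONVERSIONS table and takes the target units as a plain argument, whereas the documented step reads them from the rule's Data Request Variable and also handles chemical elements and dimensionless units, which this sketch does not attempt.

```python
import numpy as np
import xarray as xr

# Hypothetical conversion table: (from_units, to_units) -> (scale, offset).
# A real implementation would rely on a units library instead of a hand-written table.
CONVERSIONS = {
    ("degC", "K"): (1.0, 273.15),
    ("kg m-2 s-1", "mm day-1"): (86400.0, 0.0),  # 1 kg m-2 of water is 1 mm
}


def convert_units_sketch(da, target_units):
    """Return a copy of `da` expressed in `target_units`."""
    source_units = da.attrs.get("units")
    if source_units == target_units:
        return da
    try:
        scale, offset = CONVERSIONS[(source_units, target_units)]
    except KeyError:
        raise ValueError(f"No conversion from {source_units!r} to {target_units!r}") from None
    converted = da * scale + offset
    # Arithmetic may drop attrs, so copy them back explicitly and update the units.
    converted.attrs = {**da.attrs, "units": target_units}
    return converted


tas = xr.DataArray([0.0, 25.0], dims="time", attrs={"units": "degC"})
tas_k = convert_units_sketch(tas, "K")
print(tas_k.attrs["units"])  # K
```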
- pycmor.std_lib.get_variable(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Extract a variable from a dataset as a DataArray.
- Parameters:
data (xarray.Dataset) – The dataset containing the variable to extract.
rule (Rule) – The rule containing the variable name to extract.
- Returns:
The extracted variable as a DataArray.
- Return type:
xarray.DataArray
- Raises:
KeyError – If the variable specified in the rule does not exist in the dataset.
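A minimal sketch of the extraction step, assuming a hypothetical rule attribute model_variable that names the variable to pull out. Plain Dataset indexing already raises KeyError for an unknown name, which matches the documented behaviour.

```python
from types import SimpleNamespace

import numpy as np
import xarray as xr


def get_variable_sketch(data, rule):
    """Return data[rule.model_variable] as a DataArray.

    `model_variable` is a hypothetical attribute name chosen for this sketch.
    Indexing a Dataset with an unknown name raises KeyError.
    """
    return data[rule.model_variable]


ds = xr.Dataset(
    {
        "temp": ("time", np.array([1.0, 2.0])),
        "salt": ("time", np.array([35.0, 34.9])),
    }
)
print(get_variable_sketch(ds, SimpleNamespace(model_variable="temp")).name)  # temp

try:
    get_variable_sketch(ds, SimpleNamespace(model_variable="missing"))
except KeyError as err:
    print("KeyError:", err)
```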
- pycmor.std_lib.load_data(data: DataArray | Dataset | None, rule: Rule) → DataArray | Dataset
Load data from files according to the rule specification.
This function opens and combines data from multiple files that match the pattern specified in the rule. It’s useful for loading time series data that may be spread across multiple files.
- Parameters:
data (xarray.DataArray or xarray.Dataset or None) – Existing data (if any) to incorporate with loaded data.
rule (Rule) – The rule containing the input patterns and other specifications for loading the data.
- Returns:
The loaded data combined into a single Dataset or DataArray.
- Return type:
xarray.DataArray or xarray.Dataset
Notes
The rule specification (rule_spec) should contain an input_patterns key with a list of file patterns to match, e.g., ["path/to/data/*.nc"].
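A minimal sketch of how input patterns might be expanded before opening, assuming the patterns are shell-style globs like the example above. resolve_input_patterns and open_inputs are hypothetical helpers, not pycmor functions; open_inputs relies on xarray.open_mfdataset, which needs dask and a netCDF backend installed. Sorting the paths keeps files in a predictable order, which is chronological when filenames embed the date in sortable form.

```python
import glob
import os
import tempfile


def resolve_input_patterns(patterns):
    """Expand glob patterns into a sorted, de-duplicated list of file paths."""
    files = set()
    for pattern in patterns:
        files.update(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files match {patterns!r}")
    return sorted(files)


def open_inputs(patterns):
    """Open all matching files as one dataset (requires xarray, dask, and a netCDF backend)."""
    import xarray as xr  # imported lazily so resolve_input_patterns works without it

    return xr.open_mfdataset(resolve_input_patterns(patterns), combine="by_coords")


# Demonstrate the pattern expansion with empty placeholder files.
tmpdir = tempfile.mkdtemp()
for year in (2001, 2000):
    open(os.path.join(tmpdir, f"tas_{year}.nc"), "w").close()

paths = resolve_input_patterns([os.path.join(tmpdir, "*.nc")])
print([os.path.basename(p) for p in paths])  # ['tas_2000.nc', 'tas_2001.nc']
```

Only the pattern expansion runs here; calling open_inputs on real netCDF files would return one combined Dataset.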
- pycmor.std_lib.set_global_attributes(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Set global metadata attributes for a Dataset or DataArray.
This function applies standardized global attributes to the Dataset or DataArray based on the specifications in the rule, following conventions like CMIP6.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to which global attributes will be added.
rule (Rule) – The rule containing the global attribute specifications.
- Returns:
The data with updated global attributes.
- Return type:
xarray.DataArray or xarray.Dataset
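A minimal sketch, assuming a hypothetical global_attributes dict on the rule. assign_attrs returns a shallow copy with updated attributes, so the input object is left untouched. The attribute values are illustrative placeholders, not a complete CMIP6 global attribute set.

```python
from types import SimpleNamespace

import numpy as np
import xarray as xr


def set_global_attributes_sketch(data, rule):
    """Merge rule.global_attributes into the object's attrs without mutating the input.

    `global_attributes` is a hypothetical rule field holding a plain dict.
    """
    return data.assign_attrs(rule.global_attributes)


rule = SimpleNamespace(
    global_attributes={
        "Conventions": "CF-1.7 CMIP-6.2",
        "mip_era": "CMIP6",
        "source_id": "EXAMPLE-MODEL",  # placeholder value
    }
)
ds = xr.Dataset({"tas": ("time", np.zeros(3))}, attrs={"history": "created"})
out = set_global_attributes_sketch(ds, rule)
print(sorted(out.attrs))  # ['Conventions', 'history', 'mip_era', 'source_id']
```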
- pycmor.std_lib.set_variable_attributes(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Set variable-specific metadata attributes.
This function applies standardized variable attributes to the Dataset or DataArray based on the specifications in the rule, following conventions like CMIP6.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to which variable attributes will be added.
rule (Rule) – The rule containing the variable attribute specifications.
- Returns:
The data with updated variable attributes.
- Return type:
xarray.DataArray or xarray.Dataset
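A minimal sketch for variable attributes, assuming hypothetical variable_name and variable_attributes fields on the rule. The attribute values shown are those commonly used for near-surface air temperature (tas); in practice they would come from the data request.

```python
from types import SimpleNamespace

import numpy as np
import xarray as xr


def set_variable_attributes_sketch(data, rule):
    """Attach rule.variable_attributes to the target variable.

    `variable_name` and `variable_attributes` are hypothetical rule fields for this sketch.
    Works on a DataArray directly, or on one named variable inside a Dataset.
    """
    attrs = rule.variable_attributes
    if isinstance(data, xr.Dataset):
        updated = data[rule.variable_name].assign_attrs(attrs)
        return data.assign({rule.variable_name: updated})
    return data.assign_attrs(attrs)


rule = SimpleNamespace(
    variable_name="tas",
    variable_attributes={
        "standard_name": "air_temperature",
        "long_name": "Near-Surface Air Temperature",
        "units": "K",
    },
)
ds = xr.Dataset({"tas": ("time", np.array([280.0, 281.0]))})
print(set_variable_attributes_sketch(ds, rule)["tas"].attrs["standard_name"])  # air_temperature
```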
- pycmor.std_lib.show_data(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Print the data to the screen for inspection and debugging purposes.
This function is useful during development and debugging to inspect the content and structure of DataArrays and Datasets.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to display.
rule (Rule) – The rule containing additional parameters.
- Returns:
The input data (unchanged).
- Return type:
xarray.DataArray or xarray.Dataset
- pycmor.std_lib.temporal_resample(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Resample a DataArray or Dataset to a different temporal frequency.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to resample.
rule (Rule) – The rule containing parameters for the resampling operation, including the frequency for resampling.
- Returns:
The resampled data.
- Return type:
xarray.DataArray or xarray.Dataset
Notes
This function resamples time series data to a different frequency. The frequency is determined from the rule (typically from data_request_variable.frequency). Common frequencies include:
- ‘YS’: year start
- ‘MS’: month start
- ‘D’: daily
- ‘H’: hourly
See also
https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations
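The resampling itself maps onto xarray's resample followed by an aggregation. The sketch below calls xarray directly with fixed offset aliases instead of reading the frequency from a rule, and assumes a mean as the aggregation, which the documentation above does not specify. Recent pandas releases spell the hourly alias 'h' rather than 'H'.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Ten days of 6-hourly data spanning the January/February boundary.
time = pd.date_range("2000-01-27", periods=40, freq="6h")
da = xr.DataArray(np.arange(40.0), coords={"time": time}, dims="time", name="tas")

daily = da.resample(time="D").mean()     # one value per calendar day
monthly = da.resample(time="MS").mean()  # bins labelled by month start

print(daily.sizes["time"])  # 10
# monthly means: 9.5 (January) and 29.5 (February)
```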
- pycmor.std_lib.time_average(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Compute the time average of a DataArray or Dataset based upon the Data Request Variable you have selected.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to average.
rule (Rule) – The rule specifying parameters for time averaging, such as the time period or method to use for averaging.
- Returns:
The averaged data.
- Return type:
xarray.DataArray or xarray.Dataset
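A minimal sketch of frequency-driven averaging. It reads rule.data_request_variable.frequency, the attribute mentioned under temporal_resample, but the FREQUENCY_TO_OFFSET mapping from CMIP-style labels to pandas offset aliases is a hypothetical assumption, as is the use of a plain unweighted mean (which, for example, ignores differing month lengths when averaging monthly values into years).

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical mapping from CMIP-style frequency labels to pandas offset aliases.
FREQUENCY_TO_OFFSET = {"6hr": "6h", "day": "D", "mon": "MS", "yr": "YS"}


def time_average_sketch(data, rule):
    """Average `data` over the periods implied by rule.data_request_variable.frequency."""
    label = rule.data_request_variable.frequency
    try:
        offset = FREQUENCY_TO_OFFSET[label]
    except KeyError:
        raise ValueError(f"Unsupported frequency label: {label!r}") from None
    return data.resample(time=offset).mean()


time = pd.date_range("2001-01-01", "2001-12-31", freq="D")
da = xr.DataArray(np.ones(time.size), coords={"time": time}, dims="time")
rule = SimpleNamespace(data_request_variable=SimpleNamespace(frequency="mon"))
monthly = time_average_sketch(da, rule)
print(monthly.sizes["time"])  # 12
```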
- pycmor.std_lib.trigger_compute(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset
Trigger computation of lazy (dask-backed) data operations.
This function is useful to ensure that all pending computations are executed before proceeding with the next steps in a pipeline. It’s particularly important before saving data to files.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data containing operations to be computed.
rule (Rule) – The rule containing additional parameters for computation.
- Returns:
The computed data with all operations applied.
- Return type:
xarray.DataArray or xarray.Dataset
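A minimal sketch of the step that relies only on xarray itself: compute() evaluates pending lazy operations and returns a new in-memory object, leaving the original unchanged. The example uses NumPy-backed data so it runs without dask; with dask-backed input (for instance after .chunk()), the same call triggers the deferred computation.

```python
import numpy as np
import xarray as xr


def trigger_compute_sketch(data, rule):
    """Evaluate any pending lazy (dask) operations and return in-memory data.

    `rule` is accepted only to match the step signature; it is unused here.
    """
    return data.compute()


da = xr.DataArray(np.arange(6.0), dims="time", name="tas")
result = trigger_compute_sketch(da * 2, rule=None)
print(result.values.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

If mutating the object in place is acceptable, load() is the in-place counterpart of compute().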