pycmor.std_lib package#

The Pycmor Standard Library#

The standard library contains functions that are included in the default pipelines, and are generally used as step functions. We expose several useful ones:

Unit Conversion
Time Averaging
Dataset Loading
Variable Extraction
Temporal Resampling
Trigger Compute
Show Data
Global Attributes
Variable Attributes

See the documentation for each of the steps for more details.
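As a rough illustration of the step-function convention used throughout this page (every step accepts data and a rule, and returns data), the sketch below chains two toy steps. `ToyRule`, `scale`, `describe`, and `run_steps` are hypothetical names used only for illustration; they are not part of pycmor, and plain lists stand in for xarray objects.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List


@dataclass
class ToyRule:
    """Hypothetical stand-in for a pycmor Rule; it only holds step parameters."""

    factor: float = 1.0


Step = Callable[[list, ToyRule], list]


def scale(data: list, rule: ToyRule) -> list:
    """Toy step: multiply every value by rule.factor."""
    return [value * rule.factor for value in data]


def describe(data: list, rule: ToyRule) -> list:
    """Toy step: print a short summary and pass the data through unchanged."""
    print(f"{len(data)} values, first={data[0]}")
    return data


def run_steps(data: list, rule: ToyRule, steps: List[Step]) -> list:
    """Feed the output of each step into the next one."""
    return reduce(lambda current, step: step(current, rule), steps, data)


result = run_steps([1.0, 2.0, 3.0], ToyRule(factor=10.0), [scale, describe])
```

Because every step shares the same two-argument shape, steps can be reordered or swapped without changing the code that drives them.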

pycmor.std_lib.checkpoint_pipeline(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset[source]#

Insert a checkpoint in the pipeline processing.

This function allows for state saving during pipeline processing, which can be useful for debugging or resuming processing from a specific point.

Parameters:

data (xarray.DataArray or xarray.Dataset) – The current data in the pipeline.
rule (Rule) – The rule containing checkpoint parameters.

Returns:

The input data (typically unchanged).

Return type:

xarray.DataArray or xarray.Dataset

Notes

Depending on the configuration in rule, this function might:

- Save the current state to disk
- Log the current state
- Perform debugging operations

pycmor.std_lib.convert_units(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset[source]#

Convert units of a DataArray or Dataset based upon the Data Request Variable you have selected. Automatically handles chemical elements and dimensionless units.

Parameters:

data (xarray.DataArray or xarray.Dataset) – The data to convert.
rule (Rule) – The rule containing the units to convert to.

Returns:

The converted data.

Return type:

xarray.DataArray or xarray.Dataset

pycmor.std_lib.get_variable(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset[source]#

Extract a variable from a dataset as a DataArray.

Parameters:

data (xarray.Dataset) – The dataset containing the variable to extract.
rule (Rule) – The rule containing the variable name to extract.

Returns:

The extracted variable as a DataArray.

Return type:

xarray.DataArray

Raises:

KeyError – If the variable specified in the rule does not exist in the dataset.

pycmor.std_lib.load_data(data: DataArray | Dataset | None, rule: Rule) → DataArray | Dataset[source]#

Load data from files according to the rule specification.

This function opens and combines data from multiple files that match the pattern specified in the rule. It’s useful for loading time series data that may be spread across multiple files.

Parameters:

data (xarray.DataArray or xarray.Dataset or None) – Existing data (if any) to incorporate with loaded data.
rule (Rule) – The rule containing the input patterns and other specifications for loading the data.

Returns:

The loaded data combined into a single Dataset or DataArray.

Return type:

xarray.DataArray or xarray.Dataset

Notes

The rule_spec dictionary should contain an input_patterns key with a list of file patterns to match, e.g., ["path/to/data/*.nc"].
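To make the idea of a list of file patterns concrete, here is a minimal sketch that expands such a list into a sorted list of files using only the standard library. The helper `expand_input_patterns` is hypothetical, and how pycmor itself resolves and opens the matched files is not described on this page.

```python
import glob
import os
import tempfile


def expand_input_patterns(input_patterns):
    """Return the sorted, de-duplicated files matching any of the patterns."""
    matches = set()
    for pattern in input_patterns:
        matches.update(glob.glob(pattern))
    return sorted(matches)


# Build a throwaway directory with three files to match against.
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("tas_2000.nc", "tas_2001.nc", "notes.txt"):
        open(os.path.join(tmpdir, name), "w").close()

    files = expand_input_patterns([os.path.join(tmpdir, "*.nc")])
    basenames = [os.path.basename(path) for path in files]
```

Sorting the matches keeps the file order deterministic, which matters when the files hold consecutive pieces of a time series.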

pycmor.std_lib.set_global_attributes(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset[source]#

Set global metadata attributes for a Dataset or DataArray.

This function applies standardized global attributes to the Dataset or DataArray based on the specifications in the rule, following conventions like CMIP6.

Parameters:

data (xarray.DataArray or xarray.Dataset) – The data to which global attributes will be added.
rule (Rule) – The rule containing the global attribute specifications.

Returns:

The data with updated global attributes.

Return type:

xarray.DataArray or xarray.Dataset

pycmor.std_lib.set_variable_attributes(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset[source]#

Set variable-specific metadata attributes.

This function applies standardized variable attributes to the Dataset or DataArray based on the specifications in the rule, following conventions like CMIP6.

Parameters:

data (xarray.DataArray or xarray.Dataset) – The data to which variable attributes will be added.
rule (Rule) – The rule containing the variable attribute specifications.

Returns:

The data with updated variable attributes.

Return type:

xarray.DataArray or xarray.Dataset

pycmor.std_lib.show_data(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset[source]#

Print data to screen for inspection and debugging purposes.

This function is useful during development and debugging to inspect the content and structure of DataArrays and Datasets.

Parameters:

data (xarray.DataArray or xarray.Dataset) – The data to display.
rule (Rule) – The rule containing additional parameters.

Returns:

The input data (unchanged).

Return type:

xarray.DataArray or xarray.Dataset

pycmor.std_lib.temporal_resample(data: DataArray | Dataset, rule: Rule) → DataArray | Dataset[source]#

Resample a DataArray or Dataset to a different temporal frequency.

Parameters:

data (xarray.DataArray or xarray.Dataset) – The data to resample.
rule (Rule) – The rule containing parameters for the resampling operation, including the frequency for resampling.

Returns:

The resampled data.

Return type:

xarray.DataArray or xarray.Dataset

Notes

This function resamples time series data to a different frequency. The frequency is determined from the rule (typically from data_request_variable.frequency). Common frequencies include:

- ‘YS’: year start
- ‘MS’: month start
- ‘D’: daily
- ‘H’: hourly

Submodules#

pycmor.std_lib.dataset_helpers module#

pycmor.std_lib.dataset_helpers.freq_is_coarser_than_data(freq: str, ds: Dataset, ref_time: Timestamp = Timestamp('1970-01-01 00:00:00')) → bool[source]#

Checks if the frequency is coarser than the time frequency of the xarray Dataset.

Parameters:

freq (str) – The frequency to compare (e.g. ‘M’, ‘D’, ‘6H’).
ds (xr.Dataset) – The dataset containing a time coordinate.
ref_time (pd.Timestamp, optional) – Reference timestamp used to convert frequency to a time delta. Defaults to the beginning of the Unix Epoch.

Returns:

True if freq is coarser (covers a longer duration) than the dataset’s frequency.

Return type:

bool
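One way to picture the comparison is sketched below with pandas: the frequency string is turned into a duration by applying it to the reference timestamp, and that duration is compared with the typical spacing of the time axis. The function name `freq_is_coarser_sketch` and the use of the median spacing are assumptions made for this illustration; the actual implementation may differ.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def freq_is_coarser_sketch(freq, times, ref_time=pd.Timestamp("1970-01-01")):
    """Compare the span of `freq` with the typical step between `times`."""
    # Turn the frequency string into a duration by applying it to ref_time.
    freq_span = (ref_time + to_offset(freq)) - ref_time
    # Assumption: the median spacing of the time axis is the data frequency.
    data_step = pd.Series(times).diff().median()
    return freq_span > data_step


daily = pd.date_range("2000-01-01", periods=10, freq="D")
monthly = pd.date_range("2000-01-01", periods=10, freq="MS")

monthly_vs_daily = freq_is_coarser_sketch("MS", daily)
daily_vs_monthly = freq_is_coarser_sketch("D", monthly)
```

Here a month-start frequency is coarser than daily data, while a daily frequency is not coarser than monthly data.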

pycmor.std_lib.dataset_helpers.get_time_label(ds)[source]#

Determines the name of the coordinate in the dataset that can serve as a time label.

Parameters: ds (xarray.Dataset) – The dataset containing coordinates to check for a time label.
Returns: The name of the coordinate that is a datetime type and can serve as a time label, or None if no such coordinate is found.
Return type: str or None

Example

>>> import xarray as xr
>>> import pandas as pd
>>> import numpy as np
>>> ds = xr.Dataset({'time': ('time', pd.date_range('2000-01-01', periods=10))})
>>> get_time_label(ds)
'time'
>>> ds = xr.DataArray(np.ones(10), coords={'T': ('T', pd.date_range('2000-01-01', periods=10))})
>>> get_time_label(ds)
'T'
>>> # The following does not have a valid time coordinate, expected to return None
>>> da = xr.Dataset({'time': ('time', [1,2,3,4,5])})
>>> get_time_label(da) is None
True

pycmor.std_lib.dataset_helpers.has_time_axis(ds) → bool[source]#

Checks if the given dataset has a time axis.

Parameters: ds (xarray.Dataset or xarray.DataArray) – The dataset to check.
Returns: True if the dataset has a time axis, False otherwise.
Return type: bool

pycmor.std_lib.dataset_helpers.is_datetime_type(arr: ndarray) → bool[source]#

Checks if array elements are datetime objects or cftime objects.

pycmor.std_lib.dataset_helpers.needs_resampling(ds, timespan)[source]#

Checks if a given dataset needs resampling based on its time axis.

Parameters:

ds (xr.Dataset or xr.DataArray) – The dataset to check.
timespan (str) – The time span for which the dataset is to be resampled, for example 10YS, 1YS, or 6MS.

Returns:

True if the dataset needs resampling, False otherwise.

Return type:

bool

Notes

After the time-averaging step, this function helps determine whether splitting into multiple files is required, based on the provided timespan.

pycmor.std_lib.exceptions module#

This module contains custom exceptions that you should raise when something specific goes wrong in the standard library.

exception pycmor.std_lib.exceptions.PycmorError[source]#

Bases: Exception

Base class for all errors raised by pycmor.

exception pycmor.std_lib.exceptions.PycmorResamplingError[source]#

Bases: PycmorError

Error raised when resampling fails.

exception pycmor.std_lib.exceptions.PycmorResamplingTimeAxisIncompatibilityError[source]#

Bases: PycmorResamplingError, ValueError

Error raised when resampling fails due to time axis incompatibility.
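The hierarchy above means that the most specific error can be caught at several levels. The sketch below redefines the three classes locally, with the same names and bases as documented, so that it runs without pycmor installed; in real code they would be imported from pycmor.std_lib.exceptions. The function `failing_step` is hypothetical.

```python
class PycmorError(Exception):
    """Local mirror of the documented base class."""


class PycmorResamplingError(PycmorError):
    """Local mirror: resampling failed."""


class PycmorResamplingTimeAxisIncompatibilityError(PycmorResamplingError, ValueError):
    """Local mirror: resampling failed because of the time axis."""


def failing_step():
    """Hypothetical step that fails with the most specific error."""
    raise PycmorResamplingTimeAxisIncompatibilityError("time axis too coarse")


# The same error is caught by each of its base classes.
caught_as = []
for expected in (ValueError, PycmorResamplingError, PycmorError):
    try:
        failing_step()
    except expected as err:
        caught_as.append(type(err).__name__)
```

Since the most specific class also derives from ValueError, callers that only know about built-in exceptions can still handle it.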

pycmor.std_lib.files module#

This module contains functions for handling file-related operations in the pycmor package. It includes functions for creating filepaths based on given rules and datasets, and for saving the resulting datasets to the generated filepaths.

Table 2: Precision of time labels used in file names

|---------------+-------------------+-----------------------------------------------|
| Frequency     | Precision of time | Notes                                         |
|               | label             |                                               |
|---------------+-------------------+-----------------------------------------------|
| yr, dec,      | “yyyy”            | Label with the years recorded in the first    |
| yrPt          |                   | and last coordinate values.                   |
|---------------+-------------------+-----------------------------------------------|
| mon, monC     | “yyyyMM”          | For “mon”, label with the months recorded in  |
|               |                   | the first and last coordinate values; for     |
|               |                   | “monC” label with the first and last months   |
|               |                   | contributing to the climatology.              |
|---------------+-------------------+-----------------------------------------------|
| day           | “yyyyMMdd”        | Label with the days recorded in the first and |
|               |                   | last coordinate values.                       |
|---------------+-------------------+-----------------------------------------------|
| 6hr, 3hr,     | “yyyyMMddhhmm”    | Label 1hrCM files with the beginning of the   |
| 1hr,          |                   | first hour and the end of the last hour       |
| 1hrCM, 6hrPt, |                   | contributing to climatology (rounded to the   |
| 3hrPt,        |                   | nearest minute); for other frequencies in     |
| 1hrPt         |                   | this category, label with the first and last  |
|               |                   | time-coordinate values (rounded to the        |
|               |                   | nearest minute).                              |
|---------------+-------------------+-----------------------------------------------|
| subhrPt       | “yyyyMMddhhmmss”  | Label with the first and last time-coordinate |
|               |                   | values (rounded to the nearest second)        |
|---------------+-------------------+-----------------------------------------------|
| fx            | Omit time label   | This frequency applies to variables that are  |
|               |                   | independent of time (“fixed”).                |
|---------------+-------------------+-----------------------------------------------|

pycmor.std_lib.files._filename_time_range(ds, rule) → str[source]#

Determine the time range used in naming the file.

Parameters:

ds (xarray.Dataset) – The input dataset.
rule (Rule) – The rule object containing information for generating the filepath.

Returns:

time_range in filepath.

Return type:

str
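The precisions in Table 2 map naturally onto strftime patterns. The following sketch builds a time-range label from the first and last time values; `time_range_label` is a hypothetical helper, joining the two labels with a hyphen is an assumption based on common CMIP6 file names, and the rounding to the nearest minute or second mentioned in the table is not handled.

```python
from datetime import datetime

# strftime patterns corresponding to the precisions listed in Table 2.
_LABEL_FORMATS = {
    "yr": "%Y", "dec": "%Y", "yrPt": "%Y",
    "mon": "%Y%m", "monC": "%Y%m",
    "day": "%Y%m%d",
    "6hr": "%Y%m%d%H%M", "3hr": "%Y%m%d%H%M", "1hr": "%Y%m%d%H%M",
    "1hrCM": "%Y%m%d%H%M", "6hrPt": "%Y%m%d%H%M",
    "3hrPt": "%Y%m%d%H%M", "1hrPt": "%Y%m%d%H%M",
    "subhrPt": "%Y%m%d%H%M%S",
}


def time_range_label(frequency, first, last):
    """Return 'start-end' at Table 2 precision, or '' for fixed fields."""
    if frequency == "fx":
        return ""  # Table 2: omit the time label for fixed fields
    fmt = _LABEL_FORMATS[frequency]
    return f"{first.strftime(fmt)}-{last.strftime(fmt)}"


monthly_label = time_range_label("mon", datetime(2000, 1, 16), datetime(2009, 12, 16))
hourly_label = time_range_label("3hr", datetime(2000, 1, 1, 1, 30), datetime(2000, 1, 31, 22, 30))
fixed_label = time_range_label("fx", datetime(2000, 1, 1), datetime(2000, 1, 1))
```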

pycmor.std_lib.files._save_dataset_with_native_timespan(da, rule, time_label, time_encoding, **extra_kwargs)[source]#

pycmor.std_lib.files.create_filepath(ds, rule)[source]#

Generate a filepath when given an xarray dataset and a rule.

This function generates a filepath for the output file based on the given dataset and rule. The filepath includes the name, table_id, institution, source_id, experiment_id, label, grid, and optionally the start and end time.

Parameters:

ds (xarray.Dataset) – The input dataset.
rule (Rule) – The rule object containing information for generating the filepath.

Returns:

The generated filepath.

Return type:

str

Notes

The rule object should have the following attributes: cmor_variable, data_request_variable, variant_label, source_id, experiment_id, output_directory, and optionally institution.

pycmor.std_lib.files.file_timespan_tail(rule)[source]#

Grab the last timestamp in each file and return them as a list. Also account for the offset (if any) defined on the rule.

pycmor.std_lib.files.get_offset(rule)[source]#

Convert the offset defined on the rule to a timedelta.

pycmor.std_lib.files.save_dataset(da: DataArray, rule)[source]#

Save dataset to one or more files.

Parameters:

da (xr.DataArray) – The dataset to be saved.
rule (Rule) – The rule object containing information for generating the filepath.

Return type:

None

Notes

If the dataset does not have a time axis, or if the time axis is a scalar, this function will save the dataset to a single file. Otherwise, it will split the dataset into chunks based on the time axis and save each chunk to a separate file.

The filepath will be generated based on the rule object and the time range of the dataset. The filepath will include the name, table_id, institution, source_id, experiment_id, label, grid, and optionally the start and end time.

If the dataset needs resampling (i.e., the time axis does not align with the time frequency specified in the rule object), this function will split the dataset into chunks based on the time axis and resample each chunk to the specified frequency. The resampled chunks will then be saved to separate files.

NOTE: Call dask.compute() before calling this function; otherwise tasks will progress very slowly.

pycmor.std_lib.files.split_data_timespan(ds, rule)[source]#

Splits the dataset into chunks based on the time axis as defined in the source files.

Parameters:

ds (xarray.Dataset) – The dataset to split.
rule (Rule) – The rule object containing information for generating the filepath.

Returns:

A list of datasets, each containing a chunk of the original dataset.

Return type:

list

pycmor.std_lib.generic module#

Generic#

This module, generic.py, provides functionalities for transforming and standardizing NetCDF files according to CMOR.

It contains several functions and classes:

Functions (can be used as actions in Rule objects):

- linear_transform: Applies a linear transformation to the data of a NetCDF file.
- invert_z_axis: Inverts the z-axis of a NetCDF file.

The Full CMOR (yes, bad pun):

Applied if no other rule sets are given for a file
Adds CMOR metadata to the file
Converts units
Performs time averaging

pycmor.std_lib.generic.create_cmor_directories(config: dict) → dict[source]#

Creates the directory structure for the CMORized files.

Parameters: config (dict) – The pycmor configuration dictionary.

pycmor.std_lib.global_attributes module#

class pycmor.std_lib.global_attributes.CMIP6GlobalAttributes(drv, cv, rule_dict)[source]#

Bases: GlobalAttributes

_registry = {}#

_variant_label_components(label: str)[source]#

get_Conventions()[source]#

get_activity_id()[source]#

get_creation_date()[source]#

get_data_specs_version()[source]#

get_experiment()[source]#

get_experiment_id()[source]#

get_forcing_index()[source]#

get_frequency()[source]#

get_further_info_url()[source]#

get_grid()[source]#

get_grid_label()[source]#

get_initialization_index()[source]#

get_institution()[source]#

get_institution_id()[source]#

get_license()[source]#

get_mip_era()[source]#

get_nominal_resolution()[source]#

get_physics_index()[source]#

get_product()[source]#

get_realization_index()[source]#

get_realm()[source]#

get_source()[source]#

get_source_id()[source]#

get_source_type()[source]#

get_sub_experiment()[source]#

get_sub_experiment_id()[source]#

get_table_id()[source]#

get_tracking_id()[source]#

get_variable_id()[source]#

get_variant_label()[source]#

global_attributes() → dict[source]#

property required_global_attributes#

subdir_path() → str[source]#
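Several of the getters above (realization, initialization, physics, and forcing index) relate to the CMIP6 variant label, which has the form r<k>i<l>p<m>f<n>. The sketch below splits such a label with a regular expression. The function name and the dictionary return format are assumptions for illustration; what `_variant_label_components` actually returns is not documented on this page.

```python
import re

# CMIP6 variant labels look like "r1i1p1f1".
_VARIANT_LABEL = re.compile(
    r"^r(?P<realization>\d+)i(?P<initialization>\d+)"
    r"p(?P<physics>\d+)f(?P<forcing>\d+)$"
)


def variant_label_components(label):
    """Split a variant label into its four integer indices."""
    match = _VARIANT_LABEL.match(label)
    if match is None:
        raise ValueError(f"not a valid variant label: {label!r}")
    return {name: int(value) for name, value in match.groupdict().items()}


components = variant_label_components("r1i2p3f4")
```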

class pycmor.std_lib.global_attributes.CMIP7GlobalAttributes[source]#

Bases: GlobalAttributes

_registry = {}#

global_attributes()[source]#

subdir_path()[source]#

class pycmor.std_lib.global_attributes.GlobalAttributes[source]#

Bases: object

_registry = {'CMIP6': <class 'pycmor.std_lib.global_attributes.CMIP6GlobalAttributes'>, 'CMIP7': <class 'pycmor.std_lib.global_attributes.CMIP7GlobalAttributes'>}#

abstractmethod global_attributes()[source]#

abstractmethod subdir_path()[source]#
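The `_registry` mapping shown above suggests a lookup from a MIP era name to the class that handles it. A minimal sketch of such a registry follows; registering subclasses through `__init_subclass__` with a `key` argument is an assumption made for this example, and all class and function names are hypothetical rather than pycmor's own.

```python
from abc import ABC, abstractmethod


class AttributesBase(ABC):
    """Hypothetical base class that keeps a registry of its subclasses."""

    _registry = {}

    def __init_subclass__(cls, key=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if key is not None:
            AttributesBase._registry[key] = cls

    @abstractmethod
    def global_attributes(self):
        """Return the global attributes as a dict."""

    @abstractmethod
    def subdir_path(self):
        """Return the output subdirectory as a string."""


class ToyCMIP6Attributes(AttributesBase, key="CMIP6"):
    """Toy implementation registered under the key 'CMIP6'."""

    def global_attributes(self):
        return {"mip_era": "CMIP6"}

    def subdir_path(self):
        return "CMIP6"


def attributes_for(mip_era):
    """Look up the registered class for a MIP era and instantiate it."""
    return AttributesBase._registry[mip_era]()


attrs = attributes_for("CMIP6").global_attributes()
```

The abstract methods ensure that a registered class cannot be instantiated unless it provides both `global_attributes` and `subdir_path`.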

pycmor.std_lib.global_attributes.set_global_attributes(ds, rule)[source]#

Set global attributes for the dataset.

pycmor.std_lib.setgrid module#

Set grid information on the data file.

Unlike cdo, xarray does not have a built-in setgrid operator. Using xarray.merge directly to merge the grid with the data does not always produce the desired result.

Some guiding rules to set the grid information:

1. At least one dimension size in both the data file and the grid file should match.
2. If the dimension sizes match but the dimension names do not, the dimension name in the data file is renamed to match the dimension name in the grid file.
3. The matching dimension must be one of the coordinate variables in both the data file and the grid file.
4. If all of the above conditions are met, the data file is merged with the grid file.
5. The coordinate variables and boundary variables (lat_bnds, lon_bnds) from the grid file are kept, while other data variables in the grid file are dropped.
6. The result of the merge is always an xarray.Dataset.

Note: Rule 5 is not strict and may go away if it is not desired.
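The size-matching and renaming rules can be sketched with plain dictionaries that map dimension names to sizes. `plan_setgrid` is a hypothetical helper that only decides whether a merge is possible and which data dimensions would be renamed; it does not perform the merge itself, and it does not resolve the ambiguous case where several grid dimensions share the same size.

```python
def plan_setgrid(data_dims, grid_dims):
    """Return (can_merge, renames) for {dimension name: size} mappings."""
    renames = {}
    can_merge = False
    for data_name, data_size in data_dims.items():
        if grid_dims.get(data_name) == data_size:
            can_merge = True  # name and size already agree
            continue
        for grid_name, grid_size in grid_dims.items():
            if grid_size == data_size:
                # Same size but different name: rename the data dimension.
                can_merge = True
                renames[data_name] = grid_name
                break
    return can_merge, renames


# Dimension names and sizes below are made up for the example.
can_merge, renames = plan_setgrid({"time": 12, "ncells": 3140}, {"nod2": 3140})
no_match = plan_setgrid({"time": 12}, {"nod2": 3140})
```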

pycmor.std_lib.setgrid.setgrid(da: Dataset | DataArray, rule: Rule) → Dataset | DataArray[source]#

Appends grid information to the data file if the necessary coordinate dimensions exist in the data file. Renames dimensions in the data file to match the dimension names in the grid file if necessary.

Parameters:

da (xr.Dataset or xr.DataArray) – The input dataarray or dataset.
rule (Rule) – The rule object containing the gridfile attribute.

Returns:

The output dataarray or dataset with the grid information.

Return type:

xr.Dataset

pycmor.std_lib.timeaverage module#

Time Averaging#

This module contains functions for time averaging of data arrays.

The approximate interval for time averaging is prescribed in the CMOR tables, using the key 'approx_interval'. This information is also provided within the library.

Functions#

_get_time_method(frequency: str) -> str – Determine the time method based on the frequency string from rule.data_request_variable.frequency.
_frequency_from_approx_interval(interval: str) -> str – Convert an interval expressed in days to a frequency string.
timeavg(da: xr.DataArray, rule: Dict) -> xr.DataArray – Time averages data with respect to time-method (mean/climatology/instant.)

Module Variables#

_IGNORED_CELL_METHODS (list) – List of cell_methods to ignore when calculating time averages.

pycmor.std_lib.timeaverage._IGNORED_CELL_METHODS = ['area: depth: time: mean', 'area: mean', 'area: mean (comment: over land and sea ice) time: point', 'area: mean time: maximum', 'area: mean time: maximum within days time: mean over days', 'area: mean time: mean within days time: mean over days', 'area: mean time: mean within hours time: maximum over hours', 'area: mean time: mean within years time: mean over years', 'area: mean time: minimum', 'area: mean time: minimum within days time: mean over days', 'area: mean time: point', 'area: mean time: sum', 'area: mean where crops time: maximum', 'area: mean where crops time: maximum within days time: mean over days', 'area: mean where crops time: minimum', 'area: mean where crops time: minimum within days time: mean over days', 'area: mean where grounded_ice_sheet', 'area: mean where ice_free_sea over sea time: mean', 'area: mean where ice_sheet', 'area: mean where land', 'area: mean where land over all_area_types time: mean', 'area: mean where land over all_area_types time: point', 'area: mean where land over all_area_types time: sum', 'area: mean where land time: mean', 'area: mean where land time: mean (with samples weighted by snow mass)', 'area: mean where land time: point', 'area: mean where sea', 'area: mean where sea depth: sum where sea (top 100m only) time: mean', 'area: mean where sea depth: sum where sea time: mean', 'area: mean where sea time: mean', 'area: mean where sea time: point', 'area: mean where sea_ice (comment: mask=siconc) time: point', 'area: mean where sector time: point', 'area: mean where snow over sea_ice area: time: mean where sea_ice', 'area: point', 'area: point time: point', 'area: sum', 'area: sum where ice_sheet time: mean', 'area: sum where sea time: mean', 'area: time: mean', 'area: time: mean (comment: over land and sea ice)', 'area: time: mean where cloud', 'area: time: mean where crops (comment: mask=cropFrac)', 'area: time: mean where floating_ice_shelf (comment: 
mask=sftflf)', 'area: time: mean where grounded_ice_sheet (comment: mask=sfgrlf)', 'area: time: mean where ice_sheet', 'area: time: mean where natural_grasses (comment: mask=grassFrac)', 'area: time: mean where pastures (comment: mask=pastureFrac)', 'area: time: mean where sea_ice (comment: mask=siconc)', 'area: time: mean where sea_ice (comment: mask=siconca)', 'area: time: mean where sea_ice (comment: mask=siitdconc)', 'area: time: mean where sea_ice_melt_pond (comment: mask=simpconc)', 'area: time: mean where sea_ice_ridges (comment: mask=sirdgconc)', 'area: time: mean where sector', 'area: time: mean where shrubs (comment: mask=shrubFrac)', 'area: time: mean where snow (comment: mask=snc)', 'area: time: mean where trees (comment: mask=treeFrac)', 'area: time: mean where unfrozen_soil', 'area: time: mean where vegetation (comment: mask=vegFrac)', 'longitude: mean time: mean', 'longitude: mean time: point', 'longitude: sum (comment: basin sum [along zig-zag grid path]) depth: sum time: mean', 'time: mean', 'time: mean grid_longitude: mean', 'time: point']#

cell_methods to ignore when calculating time averages

Type: list

pycmor.std_lib.timeaverage._frequency_from_approx_interval(interval: str)[source]#

Convert an interval expressed in days to a frequency string.

This function takes an interval expressed in days and converts it to a frequency string in a suitable time unit (decade, year, month, day, hour, minute, second, millisecond). The conversion is based on an approximate number of days for each time unit.

Parameters: interval (str) – The interval expressed in days.
Returns: The frequency string in a suitable time unit.
Return type: str
Raises: ValueError – If the interval cannot be converted to a float.

pycmor.std_lib.timeaverage._get_time_method(frequency: str) → str[source]#

Determine the time method based on the frequency string from CMIP6 table for a specific variable (rule.data_request_variable.frequency).

The type of time method influences how the data is processed for time averaging.

Parameters: frequency (str) – The frequency string from CMIP6 tables (example: “mon”).
Returns: The corresponding time method (‘INSTANTANEOUS’, ‘CLIMATOLOGY’, or ‘MEAN’).
Return type: str
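A plausible way to derive the three time methods from CMIP6 frequency names is sketched below. The suffix rules are an assumption based on the naming of the frequencies listed in Table 2 ("Pt" for point values, "C" or "CM" for climatologies); the actual logic in `_get_time_method` may differ, and `time_method_sketch` is a hypothetical name.

```python
def time_method_sketch(frequency):
    """Guess the time method from a CMIP6 frequency string."""
    if frequency.endswith("Pt"):
        return "INSTANTANEOUS"  # e.g. 3hrPt, subhrPt
    if frequency.endswith("C") or frequency.endswith("CM"):
        return "CLIMATOLOGY"  # e.g. monC, 1hrCM
    return "MEAN"  # e.g. mon, day, yr


methods = {freq: time_method_sketch(freq) for freq in ("mon", "3hrPt", "monC", "1hrCM")}
```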

pycmor.std_lib.timeaverage.custom_resample(df, freq='M', offset=0.5, func='mean')[source]#

Resample a DataFrame and place timestamps at a custom offset within each period.

Parameters:

df (DataFrame) – DataFrame with a DatetimeIndex
freq (str) – Frequency string (e.g., ‘M’ for month, ‘Y’ for year)
offset (float) – Float between 0 and 1, representing the position within each period
func (str) – Resampling function (e.g., ‘mean’, ‘sum’, ‘max’)

Returns:

Resampled DataFrame with adjusted timestamps

Return type:

DataFrame

Examples

First, set up our imports and random seed:

>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(42)
>>> date_rng = pd.date_range(start="2023-01-01", end="2023-12-31", freq="D")
>>> df = pd.DataFrame({"value": rng.random(len(date_rng))}, index=date_rng)

Test mid-month resampling:

>>> df_month_mid = custom_resample(df, freq="ME", offset=0.5)
>>> print(df_month_mid.head())
                        value
2023-01-16 00:00:00  0.565127
2023-02-14 12:00:00  0.484111
2023-03-16 00:00:00  0.434221
2023-04-15 12:00:00  0.510354
2023-05-16 00:00:00  0.443399

Test mid-year resampling:

>>> df_year_mid = custom_resample(df, freq="YE", offset=0.5)
>>> print(df_year_mid)
               value
2023-07-02  0.492457

Test mid-week resampling:

>>> df_week_mid = custom_resample(df, freq="W", offset=0.5)
>>> print(df_week_mid.head())
               value
2023-01-01  0.773956
2023-01-05  0.658835
2023-01-12  0.540872
2023-01-19  0.488221
2023-01-26  0.500237

Test one-third through each month:

>>> df_month_third = custom_resample(df, freq="ME", offset=1/3)
>>> print(df_month_third.head())
                        value
2023-01-11 00:00:00  0.565127
2023-02-10 00:00:00  0.484111
2023-03-11 00:00:00  0.434221
2023-04-10 16:00:00  0.510354
2023-05-11 00:00:00  0.443399

Test quarter-end resampling:

>>> df_quarter_end = custom_resample(df, freq="QE", offset=1)
>>> print(df_quarter_end)
               value
2023-03-31  0.494832
2023-06-30  0.496207
2023-09-30  0.461806
2023-12-31  0.517077

Test with irregular time series:

>>> irregular_dates = pd.date_range("2023-01-01", periods=100, freq="D").tolist()
>>> irregular_dates += pd.date_range("2023-05-01", periods=50, freq="2D").tolist()
>>> irregular_dates += pd.date_range("2023-07-01", periods=30, freq="3D").tolist()
>>> df_irregular = pd.DataFrame({"value": rng.random(len(irregular_dates))}, index=irregular_dates)
>>> df_irregular_month = custom_resample(df_irregular, freq="ME", offset=0.5)
>>> print(df_irregular_month.head())
                        value
2023-01-16 00:00:00  0.543549
2023-02-14 12:00:00  0.485275
2023-03-16 00:00:00  0.513365
2023-04-05 12:00:00  0.558554
2023-05-16 00:00:00  0.447175

pycmor.std_lib.timeaverage.timeavg(da: DataArray, rule)[source]#

Time averages data with respect to time-method (mean/climatology/instant.)

This function takes a data array and a rule, computes the timespan of the data array, and then performs time averaging based on the time method specified in the rule. The time methods can be "INSTANTANEOUS", "MEAN", or "CLIMATOLOGY".

For "MEAN" time method, the timestamps can be adjusted using the adjust_timestamp parameter in the rule dict.

This can be any of the following:

- A float between 0 and 1 representing the position within each period (e.g., 0.5 for mid-point)
- A string preset: “first”/“start” (0.0), “last”/“end” (1.0), “mid”/“middle” (0.5)
- A pandas offset string (e.g., “2d” for 2 days offset)

This feature is useful for setting consistent mid-month dates by setting adjust_timestamp to “14d”.
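The float and preset forms of adjust_timestamp can be pictured as a linear interpolation between the first and last time stamp of each period. The sketch below assumes that a period runs from its first to its last daily time stamp, which is consistent with the example output shown for custom_resample. `resolve_position` and `place_timestamp` are hypothetical helpers, and pandas offset strings such as “2d” are not handled here.

```python
from datetime import datetime

# Preset names and their positions, as listed in the description above.
_PRESETS = {
    "first": 0.0, "start": 0.0,
    "mid": 0.5, "middle": 0.5,
    "last": 1.0, "end": 1.0,
}


def resolve_position(adjust_timestamp):
    """Map a preset name or a float onto a fraction between 0 and 1."""
    if isinstance(adjust_timestamp, str):
        return _PRESETS[adjust_timestamp]
    position = float(adjust_timestamp)
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0 and 1")
    return position


def place_timestamp(period_start, period_end, adjust_timestamp):
    """Interpolate between the first and last time stamp of a period."""
    position = resolve_position(adjust_timestamp)
    return period_start + position * (period_end - period_start)


january = place_timestamp(datetime(2023, 1, 1), datetime(2023, 1, 31), "mid")
february = place_timestamp(datetime(2023, 2, 1), datetime(2023, 2, 28), 0.5)
```

With these inputs the January value lands on 2023-01-16 00:00 and the February value on 2023-02-14 12:00.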

Parameters:

da (xr.DataArray) – The data array to compute the timespan for.
rule (dict) – The rule dict containing the time method and other parameters. For “MEAN” time method, can include ‘adjust_timestamp’ to control timestamp positioning.

Returns:

The time averaged data array.

Return type:

xr.DataArray

pycmor.std_lib.units module#

This module deals with the auto-unit conversion in the cmorization process. In case the units in model files differ from CMIP Tables, this module attempts to convert them automatically.

Conversion to or from a dimensionless quantity is ambiguous. In this case, provide a mapping that states what the dimensionless quantity represents; this mapping is then used for the conversion. The file data/dimensionless_mappings.yaml contains some examples of how the mapping is written.

handle_unit_conversion() is the only function users care about as it handles the unit conversion of an xr.DataArray according to a Rule. The rest of the functions in this module are support functions.

pycmor.std_lib.units._get_units(da: DataArray, rule: Rule) → tuple[str, str, str][source]#

Get the units from a DataArray and a Rule.

This function extracts the units from a DataArray and a Rule. If the Rule contains a model_units entry, this takes precedence over the units defined in the dataset. The function also handles dimensionless units by looking up a unit alias in the dimensionless_unit_mappings dictionary of the Rule.

Parameters:

da (xarray.DataArray) – The DataArray to extract the units from.
rule (dict) – The Rule to extract the units from.

Returns:

from_unit (str) – The unit of the DataArray.
to_unit (str) – The unit to convert the DataArray to.
to_unit_dimensionless_mapping (str) – The unit alias used for representing the to_unit.

pycmor.std_lib.units.convert(da: DataArray, from_unit: str, to_unit: str, to_unit_dimensionless_mapping: str | None = None) → DataArray[source]#

Convert a DataArray from one unit to another.

This function handles the conversion of a xarray.DataArray from one unit to another, taking into account chemical symbols and scaling factor in units. It uses the pint library for unit conversion and supports aliasing of target units.

Parameters:

da (xarray.DataArray) – The DataArray to be converted.
from_unit (str) – The unit of the input DataArray.
to_unit (str) – The unit to convert the DataArray to.
to_unit_dimensionless_mapping (str, optional) – An alias for the target unit, if any. Defaults to None.

Returns:

The converted DataArray with the new unit.

Return type:

xarray.DataArray

Raises:

ValueError – If the conversion between the specified units is not possible.

pycmor.std_lib.units.handle_chemicals(s: str | None = None, pattern: Pattern = re.compile('mol(?P<symbol>\\w+)')) → None[source]#

Handle units containing chemical symbols.

If the unit string contains a chemical symbol (e.g. molNaCl), Pint will raise an error because it does not know the definition of the chemical symbol. This function attempts to detect chemical symbols in the unit string and register a unit definition for each with the aid of the chemicals package.

Parameters:

s (str) – The unit string to parse.
pattern (re.Pattern, optional) – The regular expression pattern to use for searching for chemical symbols in the unit string. Defaults to a pattern that matches “mol” followed by any number of word characters.

Return type:

None

Raises:

ValueError – If the chemical symbol is not recognized.
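The default pattern from the signature can be tried on its own to see which unit strings it picks up. The sketch below only extracts the symbol; `find_chemical_symbol` is a hypothetical helper, using `search` rather than `match` is an assumption, and the registration of a unit definition with Pint is not shown.

```python
import re

# Default pattern from the handle_chemicals signature.
pattern = re.compile(r"mol(?P<symbol>\w+)")


def find_chemical_symbol(unit):
    """Return the chemical symbol following 'mol', or None if there is none."""
    match = pattern.search(unit)
    return match.group("symbol") if match else None


salt = find_chemical_symbol("molNaCl")
carbon = find_chemical_symbol("mmolC m-2 s-1")
plain = find_chemical_symbol("mol m-3")
```

A plain "mol" followed by a space does not match, because the pattern requires at least one word character directly after "mol".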

pycmor.std_lib.variable_attributes module#

Pipeline steps to attach metadata attributes to xarray objects.

pycmor.std_lib.variable_attributes.set_variable_attributes(ds: Dataset | DataArray, rule: Rule) → Dataset | DataArray#

pycmor.std_lib.variable_attributes.set_variable_attrs(ds: Dataset | DataArray, rule: Rule) → Dataset | DataArray[source]#
