pycmor.std_lib package#
The Pycmor Standard Library#
The standard library contains functions that are included in the default
pipelines, and are generally used as step functions. We expose several
useful ones:
Unit Conversion
Time Averaging
Dataset Loading
Variable Extraction
Temporal Resampling
Trigger Compute
Show Data
Global Attributes
Variable Attributes
See the documentation for each of the steps for more details.
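Every step function listed here takes the current data and a rule, and returns data for the next step. The following is a minimal sketch of that calling convention using plain Python stand-ins. The names `add_offset`, `clip_negative`, and `run_pipeline`, and the `SimpleNamespace` rule, are hypothetical and not part of pycmor.

```python
from types import SimpleNamespace


def add_offset(data, rule):
    """Hypothetical step: shift every value by rule.offset."""
    return [value + rule.offset for value in data]


def clip_negative(data, rule):
    """Hypothetical step: replace negative values with zero."""
    return [max(value, 0) for value in data]


def run_pipeline(steps, data, rule):
    """Call each step in order, feeding its output to the next one."""
    for step in steps:
        data = step(data, rule)
    return data


# A stand-in for a Rule; real rules carry far more information.
rule = SimpleNamespace(offset=-2)
result = run_pipeline([add_offset, clip_negative], [1, 2, 3], rule)
print(result)  # [0, 0, 1]
```

Because each step has the same signature, steps can be reordered or swapped without changing the code that drives them.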
- pycmor.std_lib.checkpoint_pipeline(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Insert a checkpoint in the pipeline processing.
This function allows for state saving during pipeline processing, which can be useful for debugging or resuming processing from a specific point.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The current data in the pipeline.
rule (Rule) – The rule containing checkpoint parameters.
- Returns:
The input data (typically unchanged).
- Return type:
xarray.DataArray or xarray.Dataset
Notes
Depending on the configuration in rule, this function might:
- Save the current state to disk
- Log the current state
- Perform debugging operations
- pycmor.std_lib.convert_units(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Convert units of a DataArray or Dataset based upon the Data Request Variable you have selected. Automatically handles chemical elements and dimensionless units.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to convert.
rule (Rule) – The rule containing the units to convert to.
- Returns:
The converted data.
- Return type:
xarray.DataArray or xarray.Dataset
- pycmor.std_lib.get_variable(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Extract a variable from a dataset as a DataArray.
- Parameters:
data (xarray.Dataset) – The dataset containing the variable to extract.
rule (Rule) – The rule containing the variable name to extract.
- Returns:
The extracted variable as a DataArray.
- Return type:
- Raises:
KeyError – If the variable specified in the rule does not exist in the dataset.
- pycmor.std_lib.load_data(data: DataArray | Dataset | None, rule: Rule) DataArray | Dataset[source]#
Load data from files according to the rule specification.
This function opens and combines data from multiple files that match the pattern specified in the rule. It’s useful for loading time series data that may be spread across multiple files.
- Parameters:
data (xarray.DataArray or xarray.Dataset or None) – Existing data (if any) to incorporate with loaded data.
rule (Rule) – The rule containing the input patterns and other specifications for loading the data.
- Returns:
The loaded data combined into a single Dataset or DataArray.
- Return type:
xarray.DataArray or xarray.Dataset
Notes
The rule_spec dictionary should contain an input_patterns key with a list of file patterns to match, e.g., [path/to/data/*.nc].
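As an illustration of how a list of file patterns can resolve to a set of files, here is a small sketch that assumes glob-style patterns like the one in the example above. The helper `expand_input_patterns` is hypothetical, and pycmor's own matching may work differently.

```python
import glob
import os
import tempfile


def expand_input_patterns(input_patterns):
    """Return the sorted, de-duplicated files matching any of the patterns."""
    matches = set()
    for pattern in input_patterns:
        matches.update(glob.glob(pattern))
    return sorted(matches)


# Build a throwaway directory with two NetCDF-named files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("tas_2000.nc", "tas_2001.nc", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = expand_input_patterns([os.path.join(tmp, "*.nc")])
    found_names = [os.path.basename(path) for path in found]

print(found_names)  # ['tas_2000.nc', 'tas_2001.nc']
```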
- pycmor.std_lib.set_global_attributes(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Set global metadata attributes for a Dataset or DataArray.
This function applies standardized global attributes to the Dataset or DataArray based on the specifications in the rule, following conventions like CMIP6.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to which global attributes will be added.
rule (Rule) – The rule containing the global attribute specifications.
- Returns:
The data with updated global attributes.
- Return type:
xarray.DataArray or xarray.Dataset
- pycmor.std_lib.set_variable_attributes(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Set variable-specific metadata attributes.
This function applies standardized variable attributes to the Dataset or DataArray based on the specifications in the rule, following conventions like CMIP6.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to which variable attributes will be added.
rule (Rule) – The rule containing the variable attribute specifications.
- Returns:
The data with updated variable attributes.
- Return type:
xarray.DataArray or xarray.Dataset
- pycmor.std_lib.show_data(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Print data to screen for inspection and debugging purposes.
This function is useful during development and debugging to inspect the content and structure of DataArrays and Datasets.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to display.
rule (Rule) – The rule containing additional parameters.
- Returns:
The input data (unchanged).
- Return type:
xarray.DataArray or xarray.Dataset
- pycmor.std_lib.temporal_resample(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Resample a DataArray or Dataset to a different temporal frequency.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to resample.
rule (Rule) – The rule containing parameters for the resampling operation, including the frequency for resampling.
- Returns:
The resampled data.
- Return type:
xarray.DataArray or xarray.Dataset
Notes
This function resamples time series data to a different frequency. The frequency is determined from the rule (typically from data_request_variable.frequency). Common frequencies include:
- ‘YS’: year start
- ‘MS’: month start
- ‘D’: daily
- ‘H’: hourly
See also
https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations
- pycmor.std_lib.time_average(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Compute the time average of a DataArray or Dataset based upon the Data Request Variable you have selected.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data to average.
rule (Rule) – The rule specifying parameters for time averaging, such as the time period or method to use for averaging.
- Returns:
The averaged data.
- Return type:
xarray.DataArray or xarray.Dataset
- pycmor.std_lib.trigger_compute(data: DataArray | Dataset, rule: Rule) DataArray | Dataset[source]#
Trigger computation of lazy (dask-backed) data operations.
This function is useful to ensure that all pending computations are executed before proceeding with the next steps in a pipeline. It’s particularly important before saving data to files.
- Parameters:
data (xarray.DataArray or xarray.Dataset) – The data containing operations to be computed.
rule (Rule) – The rule containing additional parameters for computation.
- Returns:
The computed data with all operations applied.
- Return type:
xarray.DataArray or xarray.Dataset
Submodules#
pycmor.std_lib.dataset_helpers module#
- pycmor.std_lib.dataset_helpers.freq_is_coarser_than_data(freq: str, ds: Dataset, ref_time: Timestamp = Timestamp('1970-01-01 00:00:00')) bool[source]#
Checks if the frequency is coarser than the time frequency of the xarray Dataset.
- Parameters:
freq (str) – The frequency to compare (e.g. ‘M’, ‘D’, ‘6H’).
ds (xr.Dataset) – The dataset containing a time coordinate.
ref_time (pd.Timestamp, optional) – Reference timestamp used to convert frequency to a time delta. Defaults to the beginning of the Unix Epoch.
- Returns:
True if freq is coarser (covers a longer duration) than the dataset’s frequency.
- Return type:
bool
- pycmor.std_lib.dataset_helpers.get_time_label(ds)[source]#
Determines the name of the coordinate in the dataset that can serve as a time label.
- Parameters:
ds (xarray.Dataset) – The dataset containing coordinates to check for a time label.
- Returns:
The name of the coordinate that is a datetime type and can serve as a time label, or None if no such coordinate is found.
- Return type:
str or None
Example
>>> import xarray as xr
>>> import pandas as pd
>>> import numpy as np
>>> ds = xr.Dataset({'time': ('time', pd.date_range('2000-01-01', periods=10))})
>>> get_time_label(ds)
'time'
>>> ds = xr.DataArray(np.ones(10), coords={'T': ('T', pd.date_range('2000-01-01', periods=10))})
>>> get_time_label(ds)
'T'
>>> # The following does not have a valid time coordinate, expected to return None
>>> da = xr.Dataset({'time': ('time', [1,2,3,4,5])})
>>> get_time_label(da) is None
True
- pycmor.std_lib.dataset_helpers.has_time_axis(ds) bool[source]#
Checks if the given dataset has a time axis.
- Parameters:
ds (xarray.Dataset or xarray.DataArray) – The dataset to check.
- Returns:
True if the dataset has a time axis, False otherwise.
- Return type:
bool
- pycmor.std_lib.dataset_helpers.is_datetime_type(arr: ndarray) bool[source]#
Checks if array elements are datetime objects or cftime objects.
- pycmor.std_lib.dataset_helpers.needs_resampling(ds, timespan)[source]#
Checks if a given dataset needs resampling based on its time axis.
- Parameters:
ds (xr.Dataset or xr.DataArray) – The dataset to check.
timespan (str) – The time span to which the dataset is to be resampled, e.g. 10YS, 1YS, or 6MS.
- Returns:
bool – True if the dataset needs resampling, False otherwise.
Notes
After the time-averaging step, this function helps determine whether the data must be split into multiple files for the provided timespan.
pycmor.std_lib.exceptions module#
This module contains custom exceptions that you should raise when something specific goes wrong in the standard library.
- exception pycmor.std_lib.exceptions.PycmorError[source]#
Bases: Exception
Base class for all errors raised by pycmor.
- exception pycmor.std_lib.exceptions.PycmorResamplingError[source]#
Bases: PycmorError
Error raised when resampling fails.
- exception pycmor.std_lib.exceptions.PycmorResamplingTimeAxisIncompatibilityError[source]#
Bases: PycmorResamplingError, ValueError
Error raised when resampling fails due to time axis incompatibility.
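The documented bases determine which `except` clauses catch each error. The sketch below re-declares local stand-in classes with the same names and bases as listed above (they are not imported from pycmor) and reports which handlers would apply. The helper `caught_by` is hypothetical.

```python
class PycmorError(Exception):
    """Local stand-in for the documented base class."""


class PycmorResamplingError(PycmorError):
    """Local stand-in: resampling failed."""


class PycmorResamplingTimeAxisIncompatibilityError(PycmorResamplingError, ValueError):
    """Local stand-in: resampling failed because of the time axis."""


def caught_by(exc):
    """List the handler types that would catch the given exception."""
    handlers = (ValueError, PycmorResamplingError, PycmorError)
    return [handler.__name__ for handler in handlers if isinstance(exc, handler)]


caught = caught_by(PycmorResamplingTimeAxisIncompatibilityError("bad time axis"))
print(caught)  # ['ValueError', 'PycmorResamplingError', 'PycmorError']
```

With this hierarchy, code that already handles `ValueError` also catches the time axis error, while `except PycmorError` catches every error in the family.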
pycmor.std_lib.files module#
This module contains functions for handling file-related operations in the pycmor package. It includes functions for creating filepaths based on given rules and datasets, and for saving the resulting datasets to the generated filepaths.
Table 2: Precision of time labels used in file names

|---------------+-------------------+-----------------------------------------------|
| Frequency     | Precision of time | Notes                                         |
|               | label             |                                               |
|---------------+-------------------+-----------------------------------------------|
| yr, dec,      | “yyyy”            | Label with the years recorded in the first    |
| yrPt          |                   | and last coordinate values.                   |
|---------------+-------------------+-----------------------------------------------|
| mon, monC     | “yyyyMM”          | For “mon”, label with the months recorded in  |
|               |                   | the first and last coordinate values; for     |
|               |                   | “monC” label with the first and last months   |
|               |                   | contributing to the climatology.              |
|---------------+-------------------+-----------------------------------------------|
| day           | “yyyyMMdd”        | Label with the days recorded in the first and |
|               |                   | last coordinate values.                       |
|---------------+-------------------+-----------------------------------------------|
| 6hr, 3hr,     | “yyyyMMddhhmm”    | Label 1hrCM files with the beginning of the   |
| 1hr,          |                   | first hour and the end of the last hour       |
| 1hrCM, 6hrPt, |                   | contributing to climatology (rounded to the   |
| 3hrPt,        |                   | nearest minute); for other frequencies in     |
| 1hrPt         |                   | this category, label with the first and last  |
|               |                   | time-coordinate values (rounded to the        |
|               |                   | nearest minute).                              |
|---------------+-------------------+-----------------------------------------------|
| subhrPt       | “yyyyMMddhhmmss”  | Label with the first and last time-coordinate |
|               |                   | values (rounded to the nearest second)        |
|---------------+-------------------+-----------------------------------------------|
| fx            | Omit time label   | This frequency applies to variables that are  |
|               |                   | independent of time (“fixed”).                |
|---------------+-------------------+-----------------------------------------------|
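The precisions in Table 2 map directly onto `strftime` patterns. The following sketch shows one way to build a time label from that mapping. The names `TIME_LABEL_FORMAT` and `time_range_label` are hypothetical, joining the two labels with a hyphen is an assumption, and rounding to the nearest minute or second is left out.

```python
from datetime import datetime

# strftime patterns for the time-label precisions listed in Table 2.
_FREQUENCIES_BY_FORMAT = {
    "%Y": ("yr", "dec", "yrPt"),
    "%Y%m": ("mon", "monC"),
    "%Y%m%d": ("day",),
    "%Y%m%d%H%M": ("6hr", "3hr", "1hr", "1hrCM", "6hrPt", "3hrPt", "1hrPt"),
    "%Y%m%d%H%M%S": ("subhrPt",),
}
TIME_LABEL_FORMAT = {
    freq: fmt for fmt, freqs in _FREQUENCIES_BY_FORMAT.items() for freq in freqs
}


def time_range_label(frequency, first, last):
    """Return 'start-end' at the table's precision, or '' for 'fx'.

    The hyphen separator is an assumption; rounding is not handled.
    """
    if frequency == "fx":
        return ""  # fixed fields carry no time label
    fmt = TIME_LABEL_FORMAT[frequency]
    return f"{first.strftime(fmt)}-{last.strftime(fmt)}"


label = time_range_label("mon", datetime(2000, 1, 16, 12), datetime(2009, 12, 16, 12))
print(label)  # 200001-200912
```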
- pycmor.std_lib.files._filename_time_range(ds, rule) str[source]#
Determine the time range used in naming the file.
- Parameters:
ds (xarray.Dataset) – The input dataset.
rule (Rule) – The rule object containing information for generating the filepath.
- Returns:
time_range in filepath.
- Return type:
str
- pycmor.std_lib.files._save_dataset_with_native_timespan(da, rule, time_label, time_encoding, **extra_kwargs)[source]#
- pycmor.std_lib.files.create_filepath(ds, rule)[source]#
Generate a filepath when given an xarray dataset and a rule.
This function generates a filepath for the output file based on the given dataset and rule. The filepath includes the name, table_id, institution, source_id, experiment_id, label, grid, and optionally the start and end time.
- Parameters:
ds (xarray.Dataset) – The input dataset.
rule (Rule) – The rule object containing information for generating the filepath.
- Returns:
The generated filepath.
- Return type:
Notes
The rule object should have the following attributes: cmor_variable, data_request_variable, variant_label, source_id, experiment_id, output_directory, and optionally institution.
- pycmor.std_lib.files.file_timespan_tail(rule)[source]#
Grab the last timestamp in each file and return them as a list. Also accounts for an offset, if one is defined on the rule.
- pycmor.std_lib.files.save_dataset(da: DataArray, rule)[source]#
Save dataset to one or more files.
- Parameters:
da (xr.DataArray) – The dataset to be saved.
rule (Rule) – The rule object containing information for generating the filepath.
- Return type:
None
Notes
If the dataset does not have a time axis, or if the time axis is a scalar, this function will save the dataset to a single file. Otherwise, it will split the dataset into chunks based on the time axis and save each chunk to a separate file.
The filepath will be generated based on the rule object and the time range of the dataset. The filepath will include the name, table_id, institution, source_id, experiment_id, label, grid, and optionally the start and end time.
If the dataset needs resampling (i.e., the time axis does not align with the time frequency specified in the rule object), this function will split the dataset into chunks based on the time axis and resample each chunk to the specified frequency. The resampled chunks will then be saved to separate files.
NOTE: Call dask.compute() before calling this function; otherwise tasks will progress very slowly.
- pycmor.std_lib.files.split_data_timespan(ds, rule)[source]#
Splits the dataset into chunks based on the time axis as defined in the source files.
- Parameters:
ds (xarray.Dataset) – The dataset to split.
rule (Rule) – The rule object containing information for generating the filepath.
- Returns:
A list of datasets, each containing a chunk of the original dataset.
- Return type:
pycmor.std_lib.generic module#
Generic#
This module, generic.py, provides functionalities for transforming and standardizing NetCDF files according to CMOR.
It contains several functions and classes:
Functions (can be used as actions in Rule objects):
- linear_transform: Applies a linear transformation to the data of a NetCDF file.
- invert_z_axis: Inverts the z-axis of a NetCDF file.
- The Full CMOR (yes, bad pun):
Applied if no other rule sets are given for a file
Adds CMOR metadata to the file
Converts units
Performs time averaging
- pycmor.std_lib.generic.create_cmor_directories(config: dict) dict[source]#
Creates the directory structure for the CMORized files.
- Parameters:
config (dict) – The pycmor configuration dictionary.
See also
https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit
- pycmor.std_lib.generic.dummy_load_data(data, rule_spec, *args, **kwargs)[source]#
A dummy function for testing. Loads the xarray tutorial data.
- pycmor.std_lib.generic.dummy_logic_step(data, rule_spec, *args, **kwargs)[source]#
A dummy function for testing. Prints data to screen and adds a dummy attribute to the data.
- pycmor.std_lib.generic.dummy_save_data(data, rule_spec, *args, **kwargs)[source]#
A dummy function for testing. Saves the data to a netcdf file.
- pycmor.std_lib.generic.dummy_sleep(data, rule_spec, *arg, **kwargs)[source]#
A dummy function for testing. Sleeps for 5 seconds.
- pycmor.std_lib.generic.get_variable(data, rule_spec, *args, **kwargs)[source]#
Gets a particular variable out of an xr.Dataset.
- Parameters:
data (xr.Dataset) – Assumes data is a dataset already. No checks are done for this!!
rule_spec (Rule) – Rule describing the DataRequestVariable for this pipeline run
- Return type:
xr.DataArray
- pycmor.std_lib.generic.invert_z_axis(filepath: Path, execute: bool = False, flip_sign: bool = False)[source]#
Inverts the z-axis of a NetCDF file.
- Parameters:
filepath (Path) – Path to the input file.
execute (bool, optional) – If True, the function will execute the inversion. If False, it will only print the changes that would be made.
- pycmor.std_lib.generic.linear_transform(filepath: Path, execute: bool = False, slope: float = 1, offset: float = 0)[source]#
Applies a linear transformation to the data of a NetCDF file.
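The arithmetic behind a linear transformation with the `slope` and `offset` parameters is `slope * value + offset`. The sketch below applies it to a plain list; `apply_linear` is a hypothetical helper, and it does not read or write NetCDF files or model the `execute` flag.

```python
def apply_linear(values, slope=1.0, offset=0.0):
    """Return slope * value + offset for every value."""
    return [slope * value + offset for value in values]


# Example: converting pressures from hPa to Pa is a pure scaling by 100.
pressures_pa = apply_linear([1000, 850, 500], slope=100)
print(pressures_pa)  # [100000.0, 85000.0, 50000.0]
```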
- pycmor.std_lib.generic.load_data(data, rule_spec, *args, **kwargs)[source]#
Loads data described by the rule_spec.
- pycmor.std_lib.generic.rename_dims(data, rule_spec)[source]#
Renames the dimensions of the array based on the key/values of rule_spec[“model_dim”]
- pycmor.std_lib.generic.resample_monthly(data, rule_spec, *args, **kwargs)[source]#
monthly means per year
- pycmor.std_lib.generic.resample_yearly(data, rule_spec, *args, **kwargs)[source]#
monthly means per year
- pycmor.std_lib.generic.show_data(data, rule_spec, *args, **kwargs)[source]#
Prints data to screen. Useful for debugging.
pycmor.std_lib.global_attributes module#
- class pycmor.std_lib.global_attributes.CMIP6GlobalAttributes(drv, cv, rule_dict)[source]#
Bases: GlobalAttributes
- _registry = {}#
- property required_global_attributes#
- class pycmor.std_lib.global_attributes.CMIP7GlobalAttributes[source]#
Bases: GlobalAttributes
- _registry = {}#
pycmor.std_lib.setgrid module#
Set grid information on the data file.
Unlike cdo, xarray has no built-in setgrid operator. Merging the grid with the data directly through xarray.merge does not always produce the desired result.
Some guiding rules to set the grid information:
1. At least one dimension size in the data file should match a dimension size in the grid file.
2. If the dimension sizes match but the dimension names do not, the dimension in the data file is renamed to match the dimension name in the grid file.
3. The dimension with the matching size must be a coordinate variable in both the data file and the grid file.
4. If all of the above conditions are met, the data file is merged with the grid file.
5. The coordinate variables and boundary variables (lat_bnds, lon_bnds) from the grid file are kept, while other data variables in the grid file are dropped.
6. The result of the merge is always an xarray.Dataset.
Note: Rule 5 is not strict and may go away if it is not desired.
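The size-based matching and renaming described in the guiding rules can be sketched with plain dictionaries of dimension sizes. The helper `dims_to_rename` and the dimension names in the example are hypothetical, and the sketch picks the first size match without resolving ambiguous cases, which the real implementation may handle differently.

```python
def dims_to_rename(data_dims, grid_dims):
    """Map data dimension names to grid dimension names of equal size.

    Both arguments are {dimension name: size} dictionaries. Dimensions
    whose name and size already agree need no renaming and are skipped.
    """
    rename = {}
    for data_name, size in data_dims.items():
        if grid_dims.get(data_name) == size:
            continue  # name and size already agree
        for grid_name, grid_size in grid_dims.items():
            taken = grid_name in data_dims or grid_name in rename.values()
            if grid_size == size and not taken:
                rename[data_name] = grid_name
                break
    return rename


data_dims = {"time": 12, "ncells": 3140}
grid_dims = {"nod2": 3140, "nv": 3}
print(dims_to_rename(data_dims, grid_dims))  # {'ncells': 'nod2'}
```

An empty result means either that nothing needs renaming or that no sizes match; a caller would need to distinguish the two before merging.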
- pycmor.std_lib.setgrid.setgrid(da: Dataset | DataArray, rule: Rule) Dataset | DataArray[source]#
Appends grid information to the data file if the necessary coordinate dimensions exist in the data file. Renames dimensions in the data file to match the dimension names in the grid file if necessary.
- Parameters:
da (xr.Dataset or xr.DataArray) – The input dataarray or dataset.
rule (Rule) – The rule object containing the gridfile attribute.
- Returns:
The output dataarray or dataset with the grid information.
- Return type:
xr.Dataset
pycmor.std_lib.timeaverage module#
Time Averaging#
This module contains functions for time averaging of data arrays.
The approximate interval for time averaging is prescribed in the CMOR tables,
using the key 'approx_interval'. This information is also provided
within the library.
Functions#
- _get_time_method(frequency: str) -> str:
Determine the time method based on the frequency string from rule.data_request_variable.frequency.
- _frequency_from_approx_interval(interval: str) -> str:
Convert an interval expressed in days to a frequency string.
- timeavg(da: xr.DataArray, rule: Dict) -> xr.DataArray:
Time averages data with respect to time-method (mean/climatology/instant.)
Module Variables#
- _IGNORED_CELL_METHODS (list)
List of cell_methods to ignore when calculating time averages.
- pycmor.std_lib.timeaverage._IGNORED_CELL_METHODS = ['area: depth: time: mean', 'area: mean', 'area: mean (comment: over land and sea ice) time: point', 'area: mean time: maximum', 'area: mean time: maximum within days time: mean over days', 'area: mean time: mean within days time: mean over days', 'area: mean time: mean within hours time: maximum over hours', 'area: mean time: mean within years time: mean over years', 'area: mean time: minimum', 'area: mean time: minimum within days time: mean over days', 'area: mean time: point', 'area: mean time: sum', 'area: mean where crops time: maximum', 'area: mean where crops time: maximum within days time: mean over days', 'area: mean where crops time: minimum', 'area: mean where crops time: minimum within days time: mean over days', 'area: mean where grounded_ice_sheet', 'area: mean where ice_free_sea over sea time: mean', 'area: mean where ice_sheet', 'area: mean where land', 'area: mean where land over all_area_types time: mean', 'area: mean where land over all_area_types time: point', 'area: mean where land over all_area_types time: sum', 'area: mean where land time: mean', 'area: mean where land time: mean (with samples weighted by snow mass)', 'area: mean where land time: point', 'area: mean where sea', 'area: mean where sea depth: sum where sea (top 100m only) time: mean', 'area: mean where sea depth: sum where sea time: mean', 'area: mean where sea time: mean', 'area: mean where sea time: point', 'area: mean where sea_ice (comment: mask=siconc) time: point', 'area: mean where sector time: point', 'area: mean where snow over sea_ice area: time: mean where sea_ice', 'area: point', 'area: point time: point', 'area: sum', 'area: sum where ice_sheet time: mean', 'area: sum where sea time: mean', 'area: time: mean', 'area: time: mean (comment: over land and sea ice)', 'area: time: mean where cloud', 'area: time: mean where crops (comment: mask=cropFrac)', 'area: time: mean where floating_ice_shelf (comment: 
mask=sftflf)', 'area: time: mean where grounded_ice_sheet (comment: mask=sfgrlf)', 'area: time: mean where ice_sheet', 'area: time: mean where natural_grasses (comment: mask=grassFrac)', 'area: time: mean where pastures (comment: mask=pastureFrac)', 'area: time: mean where sea_ice (comment: mask=siconc)', 'area: time: mean where sea_ice (comment: mask=siconca)', 'area: time: mean where sea_ice (comment: mask=siitdconc)', 'area: time: mean where sea_ice_melt_pond (comment: mask=simpconc)', 'area: time: mean where sea_ice_ridges (comment: mask=sirdgconc)', 'area: time: mean where sector', 'area: time: mean where shrubs (comment: mask=shrubFrac)', 'area: time: mean where snow (comment: mask=snc)', 'area: time: mean where trees (comment: mask=treeFrac)', 'area: time: mean where unfrozen_soil', 'area: time: mean where vegetation (comment: mask=vegFrac)', 'longitude: mean time: mean', 'longitude: mean time: point', 'longitude: sum (comment: basin sum [along zig-zag grid path]) depth: sum time: mean', 'time: mean', 'time: mean grid_longitude: mean', 'time: point']#
cell_methods to ignore when calculating time averages
- Type:
list
- pycmor.std_lib.timeaverage._frequency_from_approx_interval(interval: str)[source]#
Convert an interval expressed in days to a frequency string.
This function takes an interval expressed in days and converts it to a frequency string in a suitable time unit (decade, year, month, day, hour, minute, second, millisecond). The conversion is based on an approximate number of days for each time unit.
- Parameters:
interval (str) – The interval expressed in days.
- Returns:
The frequency string in a suitable time unit.
- Return type:
str
- Raises:
ValueError – If the interval cannot be converted to a float.
- pycmor.std_lib.timeaverage._get_time_method(frequency: str) str[source]#
Determine the time method based on the frequency string from CMIP6 table for a specific variable (rule.data_request_variable.frequency).
The type of time method influences how the data is processed for time averaging.
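As an illustration only, the sketch below derives one of the three time methods named in this module ("INSTANTANEOUS", "MEAN", "CLIMATOLOGY") from the suffix of a CMIP6 frequency string. The suffix rule (`Pt` for point values, `C` or `CM` for climatologies) is an assumption based on CMIP6 frequency names, and `guess_time_method` is a hypothetical stand-in, not the library's implementation.

```python
def guess_time_method(frequency):
    """Guess the time method from a CMIP6-style frequency string."""
    if frequency.endswith("Pt"):
        return "INSTANTANEOUS"  # point-in-time samples, e.g. 3hrPt
    if frequency.endswith(("C", "CM")):
        return "CLIMATOLOGY"  # climatological means, e.g. monC, 1hrCM
    return "MEAN"  # everything else is treated as a time mean


for freq in ("mon", "3hrPt", "monC", "1hrCM"):
    print(freq, guess_time_method(freq))
```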
- pycmor.std_lib.timeaverage.custom_resample(df, freq='M', offset=0.5, func='mean')[source]#
Resample a DataFrame and place timestamps at a custom offset within each period.
- Parameters:
- Returns:
Resampled DataFrame with adjusted timestamps
- Return type:
DataFrame
Examples
First, set up our imports and random seed:
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(42)
>>> date_rng = pd.date_range(start="2023-01-01", end="2023-12-31", freq="D")
>>> df = pd.DataFrame({"value": rng.random(len(date_rng))}, index=date_rng)
Test mid-month resampling:
>>> df_month_mid = custom_resample(df, freq="ME", offset=0.5)
>>> print(df_month_mid.head())
                        value
2023-01-16 00:00:00  0.565127
2023-02-14 12:00:00  0.484111
2023-03-16 00:00:00  0.434221
2023-04-15 12:00:00  0.510354
2023-05-16 00:00:00  0.443399
Test mid-year resampling:
>>> df_year_mid = custom_resample(df, freq="YE", offset=0.5)
>>> print(df_year_mid)
               value
2023-07-02  0.492457
Test mid-week resampling:
>>> df_week_mid = custom_resample(df, freq="W", offset=0.5)
>>> print(df_week_mid.head())
               value
2023-01-01  0.773956
2023-01-05  0.658835
2023-01-12  0.540872
2023-01-19  0.488221
2023-01-26  0.500237
Test one-third through each month:
>>> df_month_third = custom_resample(df, freq="ME", offset=1/3)
>>> print(df_month_third.head())
                        value
2023-01-11 00:00:00  0.565127
2023-02-10 00:00:00  0.484111
2023-03-11 00:00:00  0.434221
2023-04-10 16:00:00  0.510354
2023-05-11 00:00:00  0.443399
Test quarter-end resampling:
>>> df_quarter_end = custom_resample(df, freq="QE", offset=1)
>>> print(df_quarter_end)
               value
2023-03-31  0.494832
2023-06-30  0.496207
2023-09-30  0.461806
2023-12-31  0.517077
Test with irregular time series:
>>> irregular_dates = pd.date_range("2023-01-01", periods=100, freq="D").tolist()
>>> irregular_dates += pd.date_range("2023-05-01", periods=50, freq="2D").tolist()
>>> irregular_dates += pd.date_range("2023-07-01", periods=30, freq="3D").tolist()
>>> df_irregular = pd.DataFrame({"value": rng.random(len(irregular_dates))}, index=irregular_dates)
>>> df_irregular_month = custom_resample(df_irregular, freq="ME", offset=0.5)
>>> print(df_irregular_month.head())
                        value
2023-01-16 00:00:00  0.543549
2023-02-14 12:00:00  0.485275
2023-03-16 00:00:00  0.513365
2023-04-05 12:00:00  0.558554
2023-05-16 00:00:00  0.447175
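The timestamps printed in these examples are consistent with placing each label a fraction `offset` of the way between the first and last timestamps present in a period. For February 2023 with daily data and an offset of 0.5, that gives 2023-02-14 12:00. The sketch below shows only this arithmetic with the standard library. It is inferred from the printed outputs rather than taken from pycmor's source, and `place_timestamp` is a hypothetical helper.

```python
from datetime import datetime


def place_timestamp(first, last, offset=0.5):
    """Return the time lying `offset` of the way from first to last."""
    if not 0.0 <= offset <= 1.0:
        raise ValueError("offset must be between 0 and 1")
    return first + (last - first) * offset


# Daily data for February 2023 runs from the 1st to the 28th.
mid_february = place_timestamp(datetime(2023, 2, 1), datetime(2023, 2, 28))
print(mid_february)  # 2023-02-14 12:00:00

# Daily data for all of 2023 runs from January 1 to December 31.
mid_year = place_timestamp(datetime(2023, 1, 1), datetime(2023, 12, 31))
print(mid_year)  # 2023-07-02 00:00:00
```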
- pycmor.std_lib.timeaverage.timeavg(da: DataArray, rule)[source]#
Time averages data with respect to time-method (mean/climatology/instant.)
This function takes a data array and a rule, computes the timespan of the data array, and then performs time averaging based on the time method specified in the rule. The time methods can be "INSTANTANEOUS", "MEAN", or "CLIMATOLOGY".
For the "MEAN" time method, the timestamps can be adjusted using the adjust_timestamp parameter in the rule dict. This can be one of:
- A float between 0 and 1 representing the position within each period (e.g., 0.5 for mid-point)
- A string preset: “first”/“start” (0.0), “last”/“end” (1.0), “mid”/“middle” (0.5)
- A pandas offset string (e.g., “2d” for 2 days offset)
This feature is useful for setting consistent mid-month dates by setting adjust_timestamp to “14d”.
- Parameters:
da (xr.DataArray) – The data array to compute the timespan for.
rule (dict) – The rule dict containing the time method and other parameters. For “MEAN” time method, can include ‘adjust_timestamp’ to control timestamp positioning.
- Returns:
The time averaged data array.
- Return type:
xr.DataArray
pycmor.std_lib.units module#
This module deals with the auto-unit conversion in the cmorization process. In case the units in model files differ from CMIP Tables, this module attempts to convert them automatically.
Conversion to-or-from a dimensionless quantity is ambiguous. In this case,
provide a mapping of what this dimensionless quantity represents and that
is used for the conversion. data/dimensionless_mappings.yaml contains some
examples on how the mapping is written.
handle_unit_conversion() is the main entry point for users: it handles
the unit conversion of an xr.DataArray according to a Rule. The rest
of the functions in this module are support functions.
- pycmor.std_lib.units._get_units(da: DataArray, rule: Rule) tuple[str, str, str][source]#
Get the units from a DataArray and a Rule.
This function extracts the units from a DataArray and a Rule. If the Rule contains a model_units entry, this takes precedence over the units defined in the dataset. The function also handles dimensionless units by looking up a unit alias in the dimensionless_unit_mappings dictionary of the Rule.
- Parameters:
da (xarray.DataArray) – The DataArray to extract the units from.
rule (dict) – The Rule to extract the units from.
- Returns:
from_unit (str) – The unit of the DataArray.
to_unit (str) – The unit to convert the DataArray to.
to_unit_dimensionless_mapping (str) – The unit alias used for representing the to_unit.
- pycmor.std_lib.units.convert(da: DataArray, from_unit: str, to_unit: str, to_unit_dimensionless_mapping: str | None = None) DataArray[source]#
Convert a DataArray from one unit to another.
This function handles the conversion of an xarray.DataArray from one unit to another, taking into account chemical symbols and scaling factors in units. It uses the pint library for unit conversion and supports aliasing of target units.
- Parameters:
da (xarray.DataArray) – The DataArray to be converted.
from_unit (str) – The unit of the input DataArray.
to_unit (str) – The unit to convert the DataArray to.
to_unit_dimensionless_mapping (str, optional) – An alias for the target unit, if any. Defaults to None.
- Returns:
The converted DataArray with the new unit.
- Return type:
xarray.DataArray
- Raises:
ValueError – If the conversion between the specified units is not possible.
- pycmor.std_lib.units.handle_chemicals(s: str | None = None, pattern: Pattern = re.compile('mol(?P<symbol>\\w+)')) None[source]#
Handle units containing chemical symbols.
If the unit string contains a chemical symbol (e.g. molNaCl), Pint will raise an error because it does not know the definition of the chemical symbol. This function attempts to detect chemical symbols in the unit string and register a unit definition for each with the aid of the chemicals package.
- Parameters:
s (str) – The unit string to parse.
pattern (re.Pattern, optional) – The regular expression pattern to use for searching for chemical symbols in the unit string. Defaults to a pattern that matches “mol” followed by any number of word characters.
- Return type:
None
- Raises:
ValueError – If the chemical symbol is not recognized.
See also
periodic_table – Periodic table of elements
compile
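The default pattern shown in the signature, `mol(?P<symbol>\w+)`, captures whatever word characters follow "mol". The sketch below demonstrates only that detection step with the standard library; `find_chemical_symbol` is a hypothetical helper, and registering the unit with Pint is not shown.

```python
import re

# The default pattern from the handle_chemicals signature.
CHEMICAL_PATTERN = re.compile(r"mol(?P<symbol>\w+)")


def find_chemical_symbol(unit):
    """Return the symbol following 'mol' in a unit string, or None."""
    match = CHEMICAL_PATTERN.search(unit)
    return match.group("symbol") if match else None


print(find_chemical_symbol("molNaCl m-2 s-1"))  # NaCl
print(find_chemical_symbol("mol m-3"))  # None
```

A plain "mol" followed by a space does not match, because the pattern requires at least one word character directly after "mol".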
- pycmor.std_lib.units.handle_scalar_units(da: DataArray, from_unit: str, to: str) DataArray[source]#
Convert a DataArray with scalar units from one unit to another.
This function handles the conversion of an xarray.DataArray containing scalar units to another unit. The function uses the pint library for unit conversion. If the initial quantification fails due to an undefined unit, it attempts to assign and quantify the unit manually.
- Parameters:
da (xarray.DataArray) – The DataArray to be converted.
from_unit (str) – The unit of the input DataArray.
to (str) – The unit to convert the DataArray to.
- Returns:
The converted DataArray with the new unit.
- Return type:
xarray.DataArray
- Raises:
ValueError – If the conversion between the specified units is not possible.
- pycmor.std_lib.units.handle_unit_conversion(da: DataArray, rule: Rule) DataArray[source]#
Handle unit conversion of a DataArray according to a Rule.
This function applies the necessary unit conversion to a DataArray based on the units defined in the Rule. It takes into account user-defined units, chemical symbols and dimensionless units.
- Parameters:
da (xarray.DataArray) – The DataArray to be converted.
rule (dict) – The Rule containing the units to convert to.
- Returns:
The converted DataArray with the new unit.
- Return type:
xarray.DataArray
pycmor.std_lib.variable_attributes module#
Pipeline steps to attach metadata attributes to xarray objects.