The Pycmor Cookbook#

A showcase of some more complicated use cases.

If you’d like to contribute with your own recipe, or ask for a recipe, please open a documentation issue on our GitHub repository.

Default Unit conversions#

In this example, we show how to do “default unit conversions”. pycmor can handle some standard cases, assuming your NetCDF files are sensible. We show two examples:

  1. mmolC/m2/d –> kg m-2 s-2, as in fgco2 for CMIP6_Omon.json:

"fgco2": {
    "frequency": "mon",
    "modeling_realm": "ocnBgchem",
    "standard_name": "surface_downward_mass_flux_of_carbon_dioxide_expressed_as_carbon",
    "units": "kg m-2 s-1",
    "cell_methods": "area: mean where sea time: mean",
    "cell_measures": "area: areacello",
    "long_name": "Surface Downward Mass Flux of Carbon as CO2 [kgC m-2 s-1]",
    "comment": "Gas exchange flux of CO2 (positive into ocean)",
    "dimensions": "longitude latitude time depth0m",
    "out_name": "fgco2",
    "type": "real",
    "positive": "down",
    "valid_min": "",
    "valid_max": "",
    "ok_min_mean_abs": "",
    "ok_max_mean_abs": ""
},
  1. µatm –> Pa, as in spco2 for CMIP6_Omon.json:

"spco2": {
    "frequency": "mon",
    "modeling_realm": "ocnBgchem",
    "standard_name": "surface_partial_pressure_of_carbon_dioxide_in_sea_water",
    "units": "Pa",
    "cell_methods": "area: mean where sea time: mean",
    "cell_measures": "area: areacello",
    "long_name": "Surface Aqueous Partial Pressure of CO2",
    "comment": "The surface called 'surface' means the lower boundary of the atmosphere. The partial pressure of a dissolved gas in sea water is the partial pressure in air with which it would be in equilibrium. The partial pressure of a gaseous constituent of air is the pressure which it alone would exert with unchanged temperature and number of moles per unit volume. The chemical formula for carbon dioxide is CO2.",
    "dimensions": "longitude latitude time depth0m",
    "out_name": "spco2",
    "type": "real",
    "positive": "",
    "valid_min": "",
    "valid_max": "",
    "ok_min_mean_abs": "",
    "ok_max_mean_abs": ""
},

You can download test data with the provided script:

$ ./download-example-ata.sh

This extracts ten years of example data for the two variables in questions.

In our configuration file, we don’t need to specify anything extra, since the default pipeline can handle these cases.

Dealing with wrong input units#

In this example we show how to deal with incorrect units set on the input data. The motivating example comes from the variables dissic and talk for oceanic biogeochemistry:

cmip6-cmor-tables/Tables/CMIP6_Omon.json#
    "dissic": {
        "standard_name": "mole_concentration_of_dissolved_inorganic_carbon_in_sea_water",
        "units": "mol m-3",
    },
    "talk": {
        "standard_name": "sea_water_alkalinity_expressed_as_mole_equivalent",
        "units": "mol m-3",
    },

These two variables need DIC and Talk, which are saved as bgc02 and bgc03 by REcoM. Unfortunately, the units attribute in the NetCDF files is wrong, e.g.:

Excerpt of ncdump -h bgc02_3d.nc#
 float bgc02(time, nodes_3d) ;
         bgc02:description = "bgc tracer 02" ;
         bgc02:units = "" ;
         bgc02:grid_type = "unstructured" ;
         bgc02:_FillValue = 1.e+30f ;

The actual input unit in both cases is \(\frac{mmol}{m^{3}}\), i.e. use mult_factor = 1/1e3 to convert from mmol to mol.

Happily enough, pycmor provides an easy way to rectify this without needing to edit the source files with something like ncatted. You can specify the correct units in the model_unit setting of the rule. Here is an example of how to do that:

 1general:
 2  name: "AWI-ESM-1-1-lr PI Control"
 3  description: "CMOR configuration for AWIESM 1.1 LR"
 4  maintainer: "pgierz"
 5  email: "pgierz@awi.de"
 6  cmor_version: "CMIP6"
 7  mip: "CMIP"
 8  CMIP_Tables_Dir: "/work/ab0246/a270077/SciComp/Projects/pycmor/cmip6-cmor-tables/Tables"
 9  CV_Dir: "/work/ab0246/a270077/SciComp/Projects/pycmor/cmip6-cmor-tables/CMIP6_CVs/"
10pycmor:
11  # parallel: True
12  warn_on_no_rule: False
13  use_flox: True
14  dask_cluster: "slurm"
15  dask_cluster_scaling_mode: fixed
16  fixed_jobs: 1
17  # minimum_jobs: 8
18  # maximum_jobs: 30
19  # You can add your own path to the dimensionless mapping table
20  # If nothing is specified here, it will use the built-in one.
21rules:
22  - name: Dissolved Inorganic Carbon in Seawater
23    description: "dissic from REcoM, showing missing units in NetCDF"
24    inputs:
25      - path: "/work/ab0246/a270077/SciComp/Projects/pymor/examples/03-incorrect-units-in-source-files/model_runs/piControl_LUtrans1850/outdata/recom/"
26        pattern: bgc02_fesom_.*.nc
27    grid_file: /pool/data/AWICM/FESOM1/MESHES/core/griddes.nc
28    mesh_path: /pool/data/AWICM/FESOM1/MESHES/core
29    cmor_variable: dissic
30    model_variable: "bgc02"
31    model_unit: "mmol m-3"
32    output_directory: .
33    variant_label: r1i1p1f1
34    experiment_id: piControl
35    source_id: AWI-CM-1-1-HR
36    model_component: ocnBgchem
37    grid_label: gn
38  - name: Seawater Alkalinity
39    description: "talk from REcoM, showing missing units in NetCDF"
40    inputs:
41      - path: "/work/ab0246/a270077/SciComp/Projects/pymor/examples/03-incorrect-units-in-source-files/model_runs/piControl_LUtrans1850/outdata/recom/"
42        pattern: bgc03_fesom_.*.nc
43    grid_file: /pool/data/AWICM/FESOM1/MESHES/core/griddes.nc
44    mesh_path: /pool/data/AWICM/FESOM1/MESHES/core
45    cmor_variable: talk
46    model_variable: "bgc03"
47    model_unit: "mmol m-3"
48    output_directory: .
49    variant_label: r1i1p1f1
50    experiment_id: piControl
51    source_id: AWI-CM-1-1-HR
52    model_component: ocnBgchem
53    grid_label: gn
54distributed:
55  worker:
56    memory:
57      target: 0.6 # Target 60% of worker memory usage
58      spill: 0.7 # Spill to disk when 70% of memory is used
59      pause: 0.8 # Pause workers if memory usage exceeds 80%
60      terminate: 0.95 # Terminate workers at 95% memory usage
61    resources:
62      CPU: 4 # Assign 4 CPUs per worker
63    death-timeout: 600 # Worker timeout if no heartbeat (seconds): Keep workers alive for 5 minutes
64# SLURM-specific settings for launching workers
65jobqueue:
66  slurm:
67    name: pycmor-worker
68    queue: compute # SLURM queue/partition to submit jobs
69    account: ab0246 # SLURM project/account name
70    cores: 4 # Number of cores per worker
71    memory: 128GB # Memory per worker
72    walltime: '00:30:00' # Maximum walltime per job
73    interface: ib0 # Network interface for communication
74    job-extra-directives: # Additional SLURM job options
75      - '--exclusive' # Run on exclusive nodes
76      - '--nodes=1'
77    # Worker template
78    worker-extra:
79      - "--nthreads"
80      - 4
81      - "--memory-limit"
82      - "128GB"
83      - "--lifetime"
84      - "25m"
85      - "--lifetime-stagger"
86      - "4m"
87    # How to launch workers and scheduler
88    job-cpu: 128
89    job-mem: 256GB
90    # worker-command: dask-worker
91    processes: 4 # Limited by memory per worker!
92    # scheduler-command: dask-scheduler

Multivariable Inputs and vertical integration#

Here we show how use two input variables and perform a vertical integration. The motivating example is from the CMIP6_Omon.json table:

   "intpp": {
       "standard_name": "net_primary_mole_productivity_of_biomass_expressed_as_carbon_by_phytoplankton",
       "units": "mol m-2 s-1",
       "long_name": "Primary Organic Carbon Production by All Types of Phytoplankton",
       "comment": "Vertically integrated total primary (organic carbon) production by phytoplankton.  This should equal the sum of intpdiat+intpphymisc, but those individual components may be unavailable in some models.",
}

In the AWI-ESM-1-REcoM case two phytoplanktion groups contribute to primary production:

  1. nanophytoplankton (variable diags3d01)

  2. diatoms (variable diags3d02)

Hence, the workflow to get implemented would be:

\[\text{NPP} = \int_{\text{bottom}}^{\text{top}} (\text{NPP}_{\text{nanopythoplankton}} + \text{NPP}_{\text{diatoms}})\]

In pseudo-code:

pp = diags3d01 + diags3d03
pp.sum(dim="depth")
mmol_mol = 1e-3
per_day_to_per_sec = 1/86400
conversion_factor_mmolC_m2_d_to_molC_m2_s = 1/1e3/86400  # About 1.157407e-08

We implement custom steps for the addition and vertical integration.

Preparing the example

You should run this example on DKRZ Levante. Start by downloading the sample data:

$ ./download-example-data.sh

This will extract 10 years of example data for the two variables in question.

Next, we define some custom functions in intpp_recom.py:

 1def add_pp_components(data, rule):
 2    """Adds together Net Primary Production components"""
 3    # This is a USER FUNCTION and therefore you need some human brainpower and knowledge
 4    # of your model in order to know what you want to do.
 5
 6    data["pp"] = data["diags3d01"] + data["diags3d02"]
 7    # This return is mandatory for the pipeline to usefully continue
 8    # We return our new variable to be further processed:
 9    return data["pp"]  # Return type: DataArray!
10
11
12def set_pp_units(data, rule):
13    """Corrects missing units on PP"""
14    # Again, a user function. Insider knowledge of REcoM is needed
15    data["pp"].attrs["units"] = "molC m-2 day-1"
16    # Actually, data is in mmol, so we do a "by hand" conversion here to get the mol correct
17    data["pp"] *= 1e-6
18    return data
19
20
21def vertical_integration(data, rule):
22    data = data.sum(dim="depth")
23    return data
24
25
26def manual_breakpoint(data, rule):
27    breakpoint()
28    return data

Working with Dimensionless Units#

Problem#

You need to work with variables that have ambiguous dimensionless units in CMIP6, such as:

  • Pure dimensionless quantities (unit: “1”)

  • Percentage values (unit: “%”)

  • Ratios (e.g., “kg kg-1”)

  • Salinity values (unit: “0.001”)

  • Parts-per-million concentrations (unit: “1e-06”)

And you need these units to be properly converted or recognized by Pycmor. In essence, we need to tell Pycmor more about the physical meaning of the units so that it can convert from one unit to another arbitrarily. An example is a mass ratio of 0.001. If mass is not specified that is ambiguous because it could also be, for example, a volume ratio. The ratios of mass and volume are different depending on the density. To help Pycmor to convert to 0.001 (mass ratio), we need to tell Pycmor that it is a mass ratio and we do this by indicating that 0.001 means g/kg or 10-3 kg kg-1.

Solution#

  1. First, check if the variable exists in the dimensionless mappings file. The file is typically located at:

    <your_pycmor_installation>/src/pycmor/data/dimensionless_mappings.yaml

    For example: /Users/username/Codes/pycmor/src/pycmor/data/dimensionless_mappings.yaml

    Open this file and search for your variable name (e.g., “sisali”) to see if it already exists.

  2. If your variable exists but has an empty mapping (or if it’s missing entirely), you need to add the appropriate mapping:

    # For salinity variables (with unit "0.001")
    sisali:  # sea_ice_salinity
      "0.001": g/kg
    
    # For pure dimensionless variables (with unit "1")
    abs550aer:  # atmosphere_absorption_optical_thickness_due_to_aerosol_particles
      "1":
    
    # For variables with mole fraction units
    co2:  # mole_fraction_of_carbon_dioxide_in_air
      "1e-06":
    
  3. If you have added a new mapping, you can now use it in your regular Pycmor workflow. The cmorize function will automatically use the dimensionless mapping to interpret and convert the units correctly.

  4. To contribute your dimensionless mappings back to the Pycmor repository:

    1. Fork the Pycmor repository on GitHub: esm-tools/pycmor

    2. Clone your fork and create a branch for your changes

    3. Update the dimensionless_mappings.yaml file with your additions/corrections

    4. Commit your changes with a descriptive message explaining the mappings you’ve added

    5. Push your changes and create a pull request to the main repository

    Your contributions help improve Pycmor for the entire climate science community!