The Pycmor Cookbook#
A showcase of some more complicated use cases.
If you’d like to contribute with your own recipe, or ask for a recipe, please open a documentation issue on our GitHub repository.
Default Unit conversions#
In this example, we show how to do “default unit conversions”. pycmor can handle some standard cases, assuming
your NetCDF files are sensible. We show two examples:
mmolC/m2/d–>kg m-2 s-2, as infgco2forCMIP6_Omon.json:
"fgco2": {
"frequency": "mon",
"modeling_realm": "ocnBgchem",
"standard_name": "surface_downward_mass_flux_of_carbon_dioxide_expressed_as_carbon",
"units": "kg m-2 s-1",
"cell_methods": "area: mean where sea time: mean",
"cell_measures": "area: areacello",
"long_name": "Surface Downward Mass Flux of Carbon as CO2 [kgC m-2 s-1]",
"comment": "Gas exchange flux of CO2 (positive into ocean)",
"dimensions": "longitude latitude time depth0m",
"out_name": "fgco2",
"type": "real",
"positive": "down",
"valid_min": "",
"valid_max": "",
"ok_min_mean_abs": "",
"ok_max_mean_abs": ""
},
µatm–>Pa, as inspco2forCMIP6_Omon.json:
"spco2": {
"frequency": "mon",
"modeling_realm": "ocnBgchem",
"standard_name": "surface_partial_pressure_of_carbon_dioxide_in_sea_water",
"units": "Pa",
"cell_methods": "area: mean where sea time: mean",
"cell_measures": "area: areacello",
"long_name": "Surface Aqueous Partial Pressure of CO2",
"comment": "The surface called 'surface' means the lower boundary of the atmosphere. The partial pressure of a dissolved gas in sea water is the partial pressure in air with which it would be in equilibrium. The partial pressure of a gaseous constituent of air is the pressure which it alone would exert with unchanged temperature and number of moles per unit volume. The chemical formula for carbon dioxide is CO2.",
"dimensions": "longitude latitude time depth0m",
"out_name": "spco2",
"type": "real",
"positive": "",
"valid_min": "",
"valid_max": "",
"ok_min_mean_abs": "",
"ok_max_mean_abs": ""
},
You can download test data with the provided script:
$ ./download-example-ata.sh
This extracts ten years of example data for the two variables in questions.
In our configuration file, we don’t need to specify anything extra, since the default pipeline can handle these cases.
Dealing with wrong input units#
In this example we show how to deal with incorrect units set on the input data. The motivating example comes from the variables
dissic and talk for oceanic biogeochemistry:
"dissic": {
"standard_name": "mole_concentration_of_dissolved_inorganic_carbon_in_sea_water",
"units": "mol m-3",
},
"talk": {
"standard_name": "sea_water_alkalinity_expressed_as_mole_equivalent",
"units": "mol m-3",
},
These two variables need DIC and Talk, which are saved as bgc02 and bgc03 by REcoM. Unfortunately, the units attribute in the NetCDF
files is wrong, e.g.:
ncdump -h bgc02_3d.nc# float bgc02(time, nodes_3d) ;
bgc02:description = "bgc tracer 02" ;
bgc02:units = "" ;
bgc02:grid_type = "unstructured" ;
bgc02:_FillValue = 1.e+30f ;
The actual input unit in both cases is \(\frac{mmol}{m^{3}}\), i.e. use mult_factor = 1/1e3 to convert from mmol to mol.
Happily enough, pycmor provides an easy way to rectify this without needing to edit the source files with something like ncatted. You
can specify the correct units in the model_unit setting of the rule. Here is an example of how to do that:
1general:
2 name: "AWI-ESM-1-1-lr PI Control"
3 description: "CMOR configuration for AWIESM 1.1 LR"
4 maintainer: "pgierz"
5 email: "pgierz@awi.de"
6 cmor_version: "CMIP6"
7 mip: "CMIP"
8 CMIP_Tables_Dir: "/work/ab0246/a270077/SciComp/Projects/pycmor/cmip6-cmor-tables/Tables"
9 CV_Dir: "/work/ab0246/a270077/SciComp/Projects/pycmor/cmip6-cmor-tables/CMIP6_CVs/"
10pycmor:
11 # parallel: True
12 warn_on_no_rule: False
13 use_flox: True
14 dask_cluster: "slurm"
15 dask_cluster_scaling_mode: fixed
16 fixed_jobs: 1
17 # minimum_jobs: 8
18 # maximum_jobs: 30
19 # You can add your own path to the dimensionless mapping table
20 # If nothing is specified here, it will use the built-in one.
21rules:
22 - name: Dissolved Inorganic Carbon in Seawater
23 description: "dissic from REcoM, showing missing units in NetCDF"
24 inputs:
25 - path: "/work/ab0246/a270077/SciComp/Projects/pymor/examples/03-incorrect-units-in-source-files/model_runs/piControl_LUtrans1850/outdata/recom/"
26 pattern: bgc02_fesom_.*.nc
27 grid_file: /pool/data/AWICM/FESOM1/MESHES/core/griddes.nc
28 mesh_path: /pool/data/AWICM/FESOM1/MESHES/core
29 cmor_variable: dissic
30 model_variable: "bgc02"
31 model_unit: "mmol m-3"
32 output_directory: .
33 variant_label: r1i1p1f1
34 experiment_id: piControl
35 source_id: AWI-CM-1-1-HR
36 model_component: ocnBgchem
37 grid_label: gn
38 - name: Seawater Alkalinity
39 description: "talk from REcoM, showing missing units in NetCDF"
40 inputs:
41 - path: "/work/ab0246/a270077/SciComp/Projects/pymor/examples/03-incorrect-units-in-source-files/model_runs/piControl_LUtrans1850/outdata/recom/"
42 pattern: bgc03_fesom_.*.nc
43 grid_file: /pool/data/AWICM/FESOM1/MESHES/core/griddes.nc
44 mesh_path: /pool/data/AWICM/FESOM1/MESHES/core
45 cmor_variable: talk
46 model_variable: "bgc03"
47 model_unit: "mmol m-3"
48 output_directory: .
49 variant_label: r1i1p1f1
50 experiment_id: piControl
51 source_id: AWI-CM-1-1-HR
52 model_component: ocnBgchem
53 grid_label: gn
54distributed:
55 worker:
56 memory:
57 target: 0.6 # Target 60% of worker memory usage
58 spill: 0.7 # Spill to disk when 70% of memory is used
59 pause: 0.8 # Pause workers if memory usage exceeds 80%
60 terminate: 0.95 # Terminate workers at 95% memory usage
61 resources:
62 CPU: 4 # Assign 4 CPUs per worker
63 death-timeout: 600 # Worker timeout if no heartbeat (seconds): Keep workers alive for 5 minutes
64# SLURM-specific settings for launching workers
65jobqueue:
66 slurm:
67 name: pycmor-worker
68 queue: compute # SLURM queue/partition to submit jobs
69 account: ab0246 # SLURM project/account name
70 cores: 4 # Number of cores per worker
71 memory: 128GB # Memory per worker
72 walltime: '00:30:00' # Maximum walltime per job
73 interface: ib0 # Network interface for communication
74 job-extra-directives: # Additional SLURM job options
75 - '--exclusive' # Run on exclusive nodes
76 - '--nodes=1'
77 # Worker template
78 worker-extra:
79 - "--nthreads"
80 - 4
81 - "--memory-limit"
82 - "128GB"
83 - "--lifetime"
84 - "25m"
85 - "--lifetime-stagger"
86 - "4m"
87 # How to launch workers and scheduler
88 job-cpu: 128
89 job-mem: 256GB
90 # worker-command: dask-worker
91 processes: 4 # Limited by memory per worker!
92 # scheduler-command: dask-scheduler
Multivariable Inputs and vertical integration#
Here we show how use two input variables and perform a
vertical integration. The motivating example is from the
CMIP6_Omon.json table:
"intpp": { "standard_name": "net_primary_mole_productivity_of_biomass_expressed_as_carbon_by_phytoplankton", "units": "mol m-2 s-1", "long_name": "Primary Organic Carbon Production by All Types of Phytoplankton", "comment": "Vertically integrated total primary (organic carbon) production by phytoplankton. This should equal the sum of intpdiat+intpphymisc, but those individual components may be unavailable in some models.", }
In the AWI-ESM-1-REcoM case two phytoplanktion groups
contribute to primary production:
nanophytoplankton (variable
diags3d01)diatoms (variable
diags3d02)
Hence, the workflow to get implemented would be:
In pseudo-code:
pp = diags3d01 + diags3d03
pp.sum(dim="depth")
mmol_mol = 1e-3
per_day_to_per_sec = 1/86400
conversion_factor_mmolC_m2_d_to_molC_m2_s = 1/1e3/86400 # About 1.157407e-08
We implement custom steps for the addition and vertical integration.
Preparing the example
You should run this example on DKRZ Levante. Start by downloading the sample data:
$ ./download-example-data.sh
This will extract 10 years of example data for the two variables in question.
Next, we define some custom functions in intpp_recom.py:
1def add_pp_components(data, rule):
2 """Adds together Net Primary Production components"""
3 # This is a USER FUNCTION and therefore you need some human brainpower and knowledge
4 # of your model in order to know what you want to do.
5
6 data["pp"] = data["diags3d01"] + data["diags3d02"]
7 # This return is mandatory for the pipeline to usefully continue
8 # We return our new variable to be further processed:
9 return data["pp"] # Return type: DataArray!
10
11
12def set_pp_units(data, rule):
13 """Corrects missing units on PP"""
14 # Again, a user function. Insider knowledge of REcoM is needed
15 data["pp"].attrs["units"] = "molC m-2 day-1"
16 # Actually, data is in mmol, so we do a "by hand" conversion here to get the mol correct
17 data["pp"] *= 1e-6
18 return data
19
20
21def vertical_integration(data, rule):
22 data = data.sum(dim="depth")
23 return data
24
25
26def manual_breakpoint(data, rule):
27 breakpoint()
28 return data
Working with Dimensionless Units#
Problem#
You need to work with variables that have ambiguous dimensionless units in CMIP6, such as:
Pure dimensionless quantities (unit: “1”)
Percentage values (unit: “%”)
Ratios (e.g., “kg kg-1”)
Salinity values (unit: “0.001”)
Parts-per-million concentrations (unit: “1e-06”)
And you need these units to be properly converted or recognized by Pycmor. In essence, we need to tell Pycmor more about the physical meaning of the units so that it can convert from one unit to another arbitrarily. An example is a mass ratio of 0.001. If mass is not specified that is ambiguous because it could also be, for example, a volume ratio. The ratios of mass and volume are different depending on the density. To help Pycmor to convert to 0.001 (mass ratio), we need to tell Pycmor that it is a mass ratio and we do this by indicating that 0.001 means g/kg or 10-3 kg kg-1.
Solution#
First, check if the variable exists in the dimensionless mappings file. The file is typically located at:
<your_pycmor_installation>/src/pycmor/data/dimensionless_mappings.yamlFor example:
/Users/username/Codes/pycmor/src/pycmor/data/dimensionless_mappings.yamlOpen this file and search for your variable name (e.g., “sisali”) to see if it already exists.
If your variable exists but has an empty mapping (or if it’s missing entirely), you need to add the appropriate mapping:
# For salinity variables (with unit "0.001") sisali: # sea_ice_salinity "0.001": g/kg # For pure dimensionless variables (with unit "1") abs550aer: # atmosphere_absorption_optical_thickness_due_to_aerosol_particles "1": # For variables with mole fraction units co2: # mole_fraction_of_carbon_dioxide_in_air "1e-06":
If you have added a new mapping, you can now use it in your regular Pycmor workflow. The cmorize function will automatically use the dimensionless mapping to interpret and convert the units correctly.
To contribute your dimensionless mappings back to the Pycmor repository:
Fork the Pycmor repository on GitHub: esm-tools/pycmor
Clone your fork and create a branch for your changes
Update the dimensionless_mappings.yaml file with your additions/corrections
Commit your changes with a descriptive message explaining the mappings you’ve added
Push your changes and create a pull request to the main repository
Your contributions help improve Pycmor for the entire climate science community!