Temporal Frequency Inference and Resolution Validation
This module provides tools to infer the temporal frequency of the time coordinate of an xarray Dataset or DataArray, including data on non-standard calendars (e.g. 360_day, noleap). It also checks whether the temporal resolution is fine enough for operations such as resampling or aggregation, for example when preparing CMIP6-compliant output.
Features
📅 Calendar-aware frequency inference (standard, noleap, 360_day)
🧠 Fallback inference when xarray.infer_freq() fails
🛠 xarray accessors for infer_frequency() and check_resolution()
✅ Comparison to CMIP6 approx_interval (e.g. 30.4375 days for monthly)
🔍 Strict mode to detect missing or irregular time steps
🧾 Human-readable logging of inference results
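Calendar awareness matters because the same nominal monthly series has different spacing in days under different calendars. The standard-library sketch below illustrates this. The helpers day_offset and median_delta_days and the MONTH_LENGTHS table are hypothetical and not part of pycmor, which works with cftime objects. Leap years are ignored for simplicity.

```python
from statistics import median

# Hypothetical month-length tables; leap years are ignored in this sketch.
MONTH_LENGTHS = {
    "360_day": [30] * 12,
    "noleap": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
}


def day_offset(year, month, day, calendar):
    """Days elapsed since 0000-01-01 in a calendar with fixed month lengths."""
    lengths = MONTH_LENGTHS[calendar]
    return year * sum(lengths) + sum(lengths[: month - 1]) + (day - 1)


def median_delta_days(dates, calendar):
    """Median spacing in days between consecutive (year, month, day) tuples."""
    offsets = [day_offset(y, m, d, calendar) for y, m, d in dates]
    deltas = [later - earlier for earlier, later in zip(offsets, offsets[1:])]
    return median(deltas)


# Mid-month timestamps for January to April 2000
monthly = [(2000, m, 15) for m in range(1, 5)]
print(median_delta_days(monthly, "360_day"))  # 30
print(median_delta_days(monthly, "noleap"))   # 31
```

A 360_day monthly series is exactly regular at 30 days. The same dates in a noleap calendar are spaced 31, 28 and 31 days apart, so a purely day-based check needs to know the calendar.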
Quick Start
import xarray as xr
import cftime
from pycmor.core.infer_freq import infer_frequency
# Create a DataArray with 360_day calendar
times = [cftime.Datetime360Day(2000, m, 15) for m in range(1, 5)]
da = xr.DataArray([1, 2, 3, 4], coords={"time": times}, dims="time")
# Simple frequency inference (returns FrequencyResult object)
result = da.timefreq.infer_frequency(log=False)
print(f"Frequency: {result.frequency}") # Output: "M"
# Detailed frequency inference with metadata (returns FrequencyResult)
result = infer_frequency(times, return_metadata=True, calendar="360_day")
print(f"Frequency: {result.frequency}") # "M"
print(f"Delta: {result.delta_days} days") # 30.0 days
print(f"Exact: {result.is_exact}") # True
print(f"Status: {result.status}") # "valid"
# Validate resolution against CMIP6 monthly approx_interval
da.timefreq.check_resolution(target_approx_interval=30.4375)
DataArray Accessor (``da.timefreq``):
# Infer frequency with metadata (returns FrequencyResult object)
result = da.timefreq.infer_frequency(strict=True, calendar="360_day", log=False)
print(result.frequency) # 'M'
print(result.is_exact) # True
print(result.status) # 'valid'
# Check if resolution is fine enough for resampling
check = da.timefreq.check_resolution(target_approx_interval=30.4375)
print(check['is_valid_for_resampling']) # True
# Safe resampling with automatic resolution validation
resampled = da.timefreq.resample_safe(
    freq_str="M",
    target_approx_interval=30.4375,
    calendar="360_day",
    method="mean",
)
Dataset Accessor (``ds.timefreq``):
# Wrap the DataArray from the Quick Start in a Dataset (variable name "tas" is illustrative)
ds = da.to_dataset(name="tas")
# Infer frequency from the dataset's time dimension
info = ds.timefreq.infer_frequency(time_dim="time", log=False)
# Check resolution for the entire dataset
check = ds.timefreq.check_resolution(
    target_approx_interval=30.4375,
    time_dim="time",
)
# Safe dataset resampling
resampled_ds = ds.timefreq.resample_safe(
    freq_str="M",
    target_approx_interval=30.4375,
    time_dim="time",
    calendar="360_day",
    method="mean",
)
API Reference
FrequencyResult
When return_metadata=True, frequency inference functions return a FrequencyResult namedtuple with the following fields:
from collections import namedtuple
FrequencyResult = namedtuple('FrequencyResult', [
    'frequency',   # str or None: inferred frequency string (e.g., 'M', '2D')
    'delta_days',  # float or None: median time delta in days
    'step',        # int or None: step multiplier for the frequency
    'is_exact',    # bool: whether the time series has exact regular spacing
    'status',      # str: status message ('valid', 'irregular', 'no_match', etc.)
])
Example Usage:
# Get detailed metadata
result = infer_frequency(times, return_metadata=True)
# Access fields by name instead of unpacking the tuple positionally
if result.frequency:
    print(f"Found {result.frequency} frequency")
    print(f"Time delta: {result.delta_days:.2f} days")
    print(f"Regular spacing: {result.is_exact}")
    print(f"Status: {result.status}")
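Because FrequencyResult is a namedtuple, positional indexing and tuple unpacking work alongside named access. This keeps code that expects the (freq, delta, step, is_exact, status) tuple described in the API reference working. The `_asdict()` method returns a plain dictionary, which is convenient for logging. The self-contained sketch below uses illustrative values chosen to match the 360_day Quick Start example. `step=1` is an assumption.

```python
from collections import namedtuple

FrequencyResult = namedtuple(
    "FrequencyResult", ["frequency", "delta_days", "step", "is_exact", "status"]
)

# Illustrative values only (step=1 is assumed for a plain monthly series)
result = FrequencyResult("M", 30.0, 1, True, "valid")

# Named access and positional access refer to the same data
assert result.frequency == result[0]

# Tuple unpacking still works for code that expects a 5-tuple
freq, delta, step, is_exact, status = result

# _asdict() returns a regular dict, handy for structured logging
summary = result._asdict()
print(f"{summary['frequency']} every {summary['delta_days']:.1f} days ({summary['status']})")
# M every 30.0 days (valid)
```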
Status Values
The status field in FrequencyResult indicates the quality and characteristics of the inferred frequency:
“valid”: Regular time series with exact spacing
“irregular”: Time intervals vary without a clear pattern of missing steps
“missing_steps”: Regular pattern detected, but with gaps in the expected sequence
“no_match”: No recognizable frequency pattern found
“too_short”: Time series has fewer than 2 time points
“invalid_input”: An error occurred while processing the time values
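The choice between these statuses can be pictured as a simplified classifier over time offsets in days. This is a hypothetical sketch, not pycmor's actual rule set. Several details are assumptions: the relative tolerance, using the median spacing as the reference step, and reporting missing_steps only when strict=True. "no_match" is omitted because it depends on mapping the spacing to a frequency string.

```python
from statistics import median


def classify_spacing(offsets, strict=False, tolerance=0.01):
    """Return a status string for time offsets (in days) in ascending order.

    Hypothetical logic for illustration only.
    """
    if len(offsets) < 2:
        return "too_short"
    deltas = [b - a for a, b in zip(offsets, offsets[1:])]
    step = median(deltas)
    if step <= 0:
        # Duplicate or descending timestamps cannot define a frequency
        return "invalid_input"
    if all(abs(d - step) <= tolerance * step for d in deltas):
        return "valid"
    # In strict mode, gaps that are whole multiples (>= 1) of the step mean missing steps
    if strict and all(
        round(d / step) >= 1 and abs(d / step - round(d / step)) <= tolerance
        for d in deltas
    ):
        return "missing_steps"
    return "irregular"


print(classify_spacing([0, 30, 60]))                   # valid
print(classify_spacing([0, 19, 44, 69]))               # irregular
print(classify_spacing([0, 1, 2, 6, 7], strict=True))  # missing_steps
print(classify_spacing([0]))                           # too_short
```

The sample offsets correspond to the 360_day examples below, counted in days from the first timestamp.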
Examples of Different Status Values:
import cftime
from pycmor.core.infer_freq import infer_frequency
# Valid: Perfect monthly spacing
times_valid = [
cftime.Datetime360Day(2000, 1, 15),
cftime.Datetime360Day(2000, 2, 15),
cftime.Datetime360Day(2000, 3, 15)
]
result = infer_frequency(times_valid, return_metadata=True, log=True)
# Status: "valid", Frequency: "M"
# Irregular: Varying intervals but no clear gaps
times_irregular = [
cftime.Datetime360Day(2000, 1, 1),
cftime.Datetime360Day(2000, 1, 20),  # 19 days
cftime.Datetime360Day(2000, 2, 15),  # 25 days (30-day months)
cftime.Datetime360Day(2000, 3, 10)   # 25 days
]
result = infer_frequency(times_irregular, return_metadata=True, log=True)
# Status: "irregular", Frequency: detected pattern with inexact spacing (is_exact=False)
# Missing Steps: Regular pattern with gaps (requires strict=True)
times_missing = [
cftime.Datetime360Day(2000, 1, 1), # Day 1
cftime.Datetime360Day(2000, 1, 2), # Day 2
cftime.Datetime360Day(2000, 1, 3), # Day 3
# Days 4, 5 and 6 are missing (three skipped daily steps)
cftime.Datetime360Day(2000, 1, 7), # Day 7
cftime.Datetime360Day(2000, 1, 8) # Day 8
]
result = infer_frequency(times_missing, return_metadata=True, strict=True, log=True)
# Status: "missing_steps", Frequency: "D" (daily pattern with gaps)
# Too Short: Insufficient data
times_short = [cftime.Datetime360Day(2000, 1, 1)]
result = infer_frequency(times_short, return_metadata=True, log=True)
# Status: "too_short", Frequency: None
Core Functions
Accessor Methods
The following methods are available via xarray accessors:
DataArray Accessor (``da.timefreq``):
- TimeFrequencyAccessor.infer_frequency(strict=False, calendar='standard', log=True, time_dim=None, return_metadata=True)
Infer the time frequency of the DataArray's time coordinate, returning a pandas-style frequency string.
- Parameters:
strict (bool, optional) – If True, performs additional checks for irregular time series and returns a status message. Defaults to False.
calendar (str, optional) – Calendar type to use for cftime objects. Defaults to “standard”.
log (bool, optional) – If True, logs the results of the frequency check. Defaults to True.
time_dim (str, optional) – Name of the time dimension in the DataArray. If None, the time dimension is detected automatically using get_time_label. Defaults to None.
return_metadata (bool, optional) – If True, returns a FrequencyResult (frequency, delta_days, step, is_exact, status) instead of just the frequency string. Defaults to True.
- Returns:
Inferred frequency string (e.g., ‘M’), or a FrequencyResult if return_metadata=True.
- Return type:
str or FrequencyResult
- TimeFrequencyAccessor.check_resolution(target_approx_interval, calendar='standard', strict=True, tolerance=0.01, log=True, time_dim=None)
Check whether the time resolution is fine enough for resampling to the target interval.
- Parameters:
target_approx_interval (float) – Expected interval in days for the target frequency.
calendar (str, optional) – Calendar type, by default “standard”.
strict (bool, optional) – If True, performs additional checks for irregular time series and returns a status message. Defaults to True.
tolerance (float, optional) – Tolerance for time interval comparison, by default 0.01.
log (bool, optional) – If True, logs the results of the frequency check. Defaults to True.
time_dim (str, optional) – Name of the time dimension. If None, the time dimension is detected automatically using get_time_label. Defaults to None.
- Returns:
Dictionary containing the inferred interval, the comparison status, and whether the data is valid for resampling (key is_valid_for_resampling).
- Return type:
dict
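One way to picture this check is as a comparison between the inferred interval and the target, using a relative tolerance. The sketch below is hypothetical. It assumes tolerance is a relative margin, and it invents the dictionary keys inferred_interval and comparison_status. Only is_valid_for_resampling appears in the Quick Start examples.

```python
def check_resolution_sketch(inferred_delta_days, target_approx_interval, tolerance=0.01):
    """Compare an inferred spacing (days) with a target interval (days).

    Hypothetical stand-in for check_resolution; tolerance is treated as relative.
    """
    lower = target_approx_interval * (1 - tolerance)
    upper = target_approx_interval * (1 + tolerance)
    if inferred_delta_days < lower:
        comparison = "finer"
    elif inferred_delta_days <= upper:
        comparison = "equal"
    else:
        comparison = "coarser"
    return {
        "inferred_interval": inferred_delta_days,
        "comparison_status": comparison,
        # Data at the target spacing or finer can be aggregated; coarser data cannot
        "is_valid_for_resampling": comparison != "coarser",
    }


print(check_resolution_sketch(1.0, 30.4375)["is_valid_for_resampling"])     # True
print(check_resolution_sketch(365.25, 30.4375)["is_valid_for_resampling"])  # False
```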
- TimeFrequencyAccessor.resample_safe(target_approx_interval=None, freq_str=None, calendar='standard', method='mean', time_dim=None, tolerance=0.01, **resample_kwargs)
Safely resample time series data after checking its temporal resolution.
Users can specify the target frequency in two ways:
1. Provide target_approx_interval (float, in days); it is converted to a frequency string.
2. Provide freq_str (a pandas frequency string); it is used directly for resampling.
If both are provided, freq_str is used for resampling and target_approx_interval is used for validation.
- Parameters:
target_approx_interval (float, optional) – Expected interval in days for the target frequency. If provided without freq_str, it is converted to an appropriate frequency string. If provided with freq_str, it is used for validation only.
freq_str (str, optional) – Target frequency string (e.g., ‘M’ for monthly, ‘3H’ for 3-hourly). If provided, it takes precedence for resampling.
calendar (str, optional) – Calendar type, by default “standard”.
method (str or dict, optional) – Resampling method, by default “mean”.
time_dim (str, optional) – Name of the time dimension. If None, the time dimension is detected automatically using get_time_label. Defaults to None.
tolerance (float, optional) – Tolerance for time interval comparison, by default 0.01.
**resample_kwargs – Additional arguments passed to xarray’s resample.
- Returns:
Resampled data
- Return type:
xarray.DataArray
- Raises:
ValueError – If neither target_approx_interval nor freq_str is provided, or if the time resolution is too coarse for the target frequency.
Examples
# Using approximate interval (will be converted to frequency string)
data.timefreq.resample_safe(target_approx_interval=30.0)  # ~monthly
# Using frequency string directly
data.timefreq.resample_safe(freq_str='3M')  # 3-monthly
# Using both (freq_str used for resampling, target_approx_interval for validation)
data.timefreq.resample_safe(target_approx_interval=90.0, freq_str='3M')
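When only target_approx_interval is given, it has to be translated into a frequency string. One plausible approach tries base units from coarsest to finest and accepts the first unit of which the interval is close to a whole multiple. The function interval_to_freq, the unit table, the aliases, and the 5% relative tolerance are all assumptions for illustration, not pycmor's actual conversion.

```python
# Hypothetical base units (alias, length in days), coarsest first
BASE_UNITS = [("Y", 365.25), ("M", 30.4375), ("D", 1.0), ("H", 1.0 / 24.0)]


def interval_to_freq(approx_days, tolerance=0.05):
    """Map an approximate interval in days to a pandas-style alias such as '3M'."""
    for alias, unit_days in BASE_UNITS:
        multiple = approx_days / unit_days
        n = round(multiple)
        # Accept the unit if the interval is within tolerance of n whole units
        if n >= 1 and abs(multiple - n) <= tolerance * n:
            return alias if n == 1 else f"{n}{alias}"
    raise ValueError(f"no frequency alias found for {approx_days} days")


print(interval_to_freq(30.0))   # M
print(interval_to_freq(90.0))   # 3M
print(interval_to_freq(0.125))  # 3H
```

Trying coarse units first means 30 days maps to "M" rather than "30D". This matches the intent of the ~monthly example above.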
Dataset Accessor (``ds.timefreq``):
- DatasetFrequencyAccessor.infer_frequency(time_dim=None, **kwargs)
Infer the time frequency of the Dataset's time coordinate, returning a pandas-style frequency string.
- Parameters:
time_dim (str, optional) – Name of the time dimension in the Dataset. If None, the time dimension is detected automatically using get_time_label. Defaults to None.
**kwargs – Additional arguments passed to infer_frequency.
- Returns:
Inferred frequency string (e.g., ‘M’), or a FrequencyResult if return_metadata=True.
- Return type:
str or FrequencyResult
- DatasetFrequencyAccessor.resample_safe(target_approx_interval=None, freq_str=None, time_dim=None, calendar='standard', method='mean', tolerance=0.01, **resample_kwargs)
Safely resample the dataset's time series data after checking its temporal resolution.
Users can specify the target frequency in two ways:
1. Provide target_approx_interval (float, in days); it is converted to a frequency string.
2. Provide freq_str (a pandas frequency string); it is used directly for resampling.
If both are provided, freq_str is used for resampling and target_approx_interval is used for validation.
- Parameters:
target_approx_interval (float, optional) – Expected interval in days for the target frequency. If provided without freq_str, it is converted to an appropriate frequency string. If provided with freq_str, it is used for validation only.
freq_str (str, optional) – Target frequency string (e.g., ‘M’ for monthly, ‘3H’ for 3-hourly). If provided, it takes precedence for resampling.
time_dim (str, optional) – Name of the time dimension. If None, the time dimension is detected automatically using get_time_label. Defaults to None.
calendar (str, optional) – Calendar type, by default “standard”.
method (str or dict, optional) – Resampling method, by default “mean”.
tolerance (float, optional) – Tolerance for time interval comparison, by default 0.01.
**resample_kwargs – Additional arguments passed to xarray’s resample.
- Returns:
Resampled dataset
- Return type:
xarray.Dataset
- Raises:
ValueError – If neither target_approx_interval nor freq_str is provided, or if the time resolution is too coarse for the target frequency.
Examples
# Using approximate interval (will be converted to frequency string)
dataset.timefreq.resample_safe(target_approx_interval=30.0)  # ~monthly
# Using frequency string directly
dataset.timefreq.resample_safe(freq_str='3M')  # 3-monthly
# Using both (freq_str used for resampling, target_approx_interval for validation)
dataset.timefreq.resample_safe(target_approx_interval=90.0, freq_str='3M')
Calendar Support
The following calendars are supported:
standard or gregorian: 365.25 days/year
noleap: 365 days/year
360_day: 360 days/year
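These year lengths also determine the mean month length. The 30.4375-day monthly approx_interval used throughout this page equals 365.25 / 12. In the minimal sketch below, the dictionary restates the list above; the helper name mean_month_days is hypothetical.

```python
# Days per year for each supported calendar, as listed above
DAYS_PER_YEAR = {
    "standard": 365.25,
    "gregorian": 365.25,
    "noleap": 365.0,
    "360_day": 360.0,
}


def mean_month_days(calendar):
    """Average month length in days for the given calendar."""
    return DAYS_PER_YEAR[calendar] / 12


for name in DAYS_PER_YEAR:
    print(f"{name}: {mean_month_days(name):.4f} days")
```

This prints 30.4375 for standard and gregorian, 30.4167 for noleap and 30.0000 for 360_day.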