Temporal Frequency Inference and Resolution Validation

This module infers the temporal frequency of the time coordinate in an xarray Dataset or DataArray, including data on non-standard calendars (e.g. 360_day, noleap). It also validates whether the temporal resolution of the data is fine enough for operations such as resampling or aggregation, by comparing it with the CMIP6 approx_interval of the target frequency.

Features

  • 📅 Calendar-aware frequency inference (standard, noleap, 360_day)

  • 🧠 Intelligent fallback if xarray.infer_freq() fails

  • 🛠 xarray accessors for infer_frequency() and check_resolution()

  • ✅ Comparison to CMIP6 approx_interval (e.g. 30.4375 days for monthly)

  • 🔍 Strict mode to detect missing or irregular time steps

  • 🧾 Human-readable logging of inference results
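The fallback and the calendar-aware comparison listed above can be pictured with a small, library-independent sketch. It is an illustration under assumptions, not the module's implementation: `APPROX_INTERVALS`, `day_number_360`, `guess_frequency`, and the 5% relative tolerance are all hypothetical, and the table holds only a handful of approximate intervals.

```python
from statistics import median

# Hypothetical lookup table: approximate interval in days -> frequency label.
APPROX_INTERVALS = {
    1.0: "D",
    30.0: "M",       # one month in the 360_day calendar
    30.4375: "M",    # 365.25 / 12, the monthly approx_interval quoted above
    360.0: "Y",
    365.25: "Y",
}


def day_number_360(year, month, day):
    """Days since year 0 in a 360_day calendar (every month has 30 days)."""
    return year * 360 + (month - 1) * 30 + (day - 1)


def guess_frequency(day_numbers, rel_tol=0.05):
    """Match the median step (in days) to the closest table entry, or return None."""
    if len(day_numbers) < 2:
        return None
    deltas = [b - a for a, b in zip(day_numbers, day_numbers[1:])]
    step = median(deltas)
    nearest = min(APPROX_INTERVALS, key=lambda ref: abs(ref - step))
    if abs(nearest - step) <= rel_tol * nearest:
        return APPROX_INTERVALS[nearest]
    return None


# Mid-month time stamps, January to April, as in the Quick Start below.
monthly = [day_number_360(2000, m, 15) for m in range(1, 5)]
print(guess_frequency(monthly))  # M
```

Working on plain day numbers keeps the sketch free of calendar libraries; with real cftime objects the same deltas would come from subtracting consecutive time stamps.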

Quick Start

import xarray as xr
import cftime
from pycmor.core.infer_freq import infer_frequency

# Create a DataArray with 360_day calendar
times = [cftime.Datetime360Day(2000, m, 15) for m in range(1, 5)]
da = xr.DataArray([1, 2, 3, 4], coords={"time": times}, dims="time")

# Simple frequency inference (returns FrequencyResult object)
result = da.timefreq.infer_frequency(log=False)
print(f"Frequency: {result.frequency}")  # Output: "M"

# Detailed frequency inference with metadata (returns FrequencyResult)
result = infer_frequency(times, return_metadata=True, calendar="360_day")
print(f"Frequency: {result.frequency}")      # "M"
print(f"Delta: {result.delta_days} days")   # 30.0 days
print(f"Exact: {result.is_exact}")          # True
print(f"Status: {result.status}")           # "valid"

# Validate resolution against CMIP6 monthly approx_interval
da.timefreq.check_resolution(target_approx_interval=30.4375)

DataArray Accessor (da.timefreq):

# Infer frequency with metadata (returns FrequencyResult object)
result = da.timefreq.infer_frequency(strict=True, calendar="360_day", log=False)
print(result.frequency)     # 'M'
print(result.is_exact)      # True
print(result.status)        # 'valid'

# Check if resolution is fine enough for resampling
check = da.timefreq.check_resolution(target_approx_interval=30.4375)
print(check['is_valid_for_resampling'])  # True

# Safe resampling with automatic resolution validation
resampled = da.timefreq.resample_safe(
    freq_str="M",
    target_approx_interval=30.4375,
    calendar="360_day",
    method="mean"
)

Dataset Accessor (ds.timefreq):

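# Build a Dataset from the DataArray created in the Quick Start ("var" is an arbitrary name)
ds = da.to_dataset(name="var")

# Infer frequency from dataset's time dimension
info = ds.timefreq.infer_frequency(time_dim="time", log=False)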

# Check resolution for entire dataset
check = ds.timefreq.check_resolution(
    target_approx_interval=30.4375,
    time_dim="time"
)

# Safe dataset resampling
resampled_ds = ds.timefreq.resample_safe(
    freq_str="M",
    target_approx_interval=30.4375,
    time_dim="time",
    calendar="360_day",
    method="mean"
)

API Reference

FrequencyResult

When return_metadata=True, frequency inference functions return a FrequencyResult namedtuple with the following fields:

FrequencyResult = namedtuple('FrequencyResult', [
    'frequency',      # str or None: inferred frequency string (e.g., 'M', '2D')
    'delta_days',     # float or None: median time delta in days
    'step',           # int or None: step multiplier for the frequency
    'is_exact',       # bool: whether the time series has exact regular spacing
    'status'          # str: status message ('valid', 'irregular', 'no_match', etc.)
])

Example Usage:

# Get detailed metadata
result = infer_frequency(times, return_metadata=True)

# Access fields by name (much cleaner than tuple unpacking!)
if result.frequency:
    print(f"Found {result.frequency} frequency")
    print(f"Time delta: {result.delta_days:.2f} days")
    print(f"Regular spacing: {result.is_exact}")
    print(f"Status: {result.status}")
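Because FrequencyResult is a namedtuple, the usual tuple behaviour is available alongside attribute access. The sketch below uses a local stand-in with the same field names so that it runs without the library; the field values are made up for illustration.

```python
from collections import namedtuple

# Local stand-in with the same fields as the FrequencyResult documented above.
FrequencyResult = namedtuple(
    "FrequencyResult", ["frequency", "delta_days", "step", "is_exact", "status"]
)

# Hand-written values, for illustration only.
result = FrequencyResult("M", 30.0, 1, True, "valid")

# Positional unpacking still works, since a namedtuple is a tuple.
frequency, delta_days, step, is_exact, status = result

# _asdict() gives a plain mapping, convenient for logging or serialisation.
as_dict = result._asdict()
print(as_dict["frequency"], as_dict["status"])  # M valid
```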

Status Values

The status field in FrequencyResult indicates the quality and characteristics of the inferred frequency:

  • “valid”: Regular time series with exact spacing

  • “irregular”: Time intervals vary but no clear pattern of missing steps

  • “missing_steps”: Regular pattern detected but with gaps in the expected sequence

  • “no_match”: No recognizable frequency pattern found

  • “too_short”: Time series has fewer than 2 time points

  • “invalid_input”: Error processing the time values
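One way to picture how these categories differ is a toy classifier over the time steps, given in days. The rules below are assumptions chosen for illustration (the smallest step is taken as the base interval, and `classify_spacing` and its `tol` parameter are hypothetical); the module's actual decision logic may differ.

```python
def classify_spacing(deltas, tol=0.01):
    """Return a status string for a list of time steps given in days.

    Hypothetical rules, for illustration only:
    - no deltas (fewer than 2 time points)               -> "too_short"
    - a zero or negative step                            -> "invalid_input"
    - every delta equals the smallest one                -> "valid"
    - every delta is a whole multiple of the smallest    -> "missing_steps"
    - anything else                                      -> "irregular"
    """
    if not deltas:
        return "too_short"
    base = min(deltas)
    if base <= 0:
        return "invalid_input"
    ratios = [d / base for d in deltas]
    if all(abs(r - 1) <= tol for r in ratios):
        return "valid"
    if all(abs(r - round(r)) <= tol for r in ratios):
        return "missing_steps"
    return "irregular"


print(classify_spacing([30, 30, 30]))   # valid
print(classify_spacing([1, 1, 4, 1]))   # missing_steps
print(classify_spacing([19, 25, 25]))   # irregular
print(classify_spacing([]))             # too_short
```

Using the smallest step as the base is a simplification: a series with steps of 19 and 38 days would be reported as "missing_steps" here, which a median-based rule might judge differently.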

Examples of Different Status Values:

import cftime
from pycmor.core.infer_freq import infer_frequency

# Valid: Perfect monthly spacing
times_valid = [
    cftime.Datetime360Day(2000, 1, 15),
    cftime.Datetime360Day(2000, 2, 15),
    cftime.Datetime360Day(2000, 3, 15)
]
result = infer_frequency(times_valid, return_metadata=True, log=True)
# Status: "valid", Frequency: "M"

# Irregular: Varying intervals but no clear gaps
times_irregular = [
    cftime.Datetime360Day(2000, 1, 1),
    cftime.Datetime360Day(2000, 1, 20),  # 19 days
    cftime.Datetime360Day(2000, 2, 15),  # 25 days (every 360_day month has 30 days)
    cftime.Datetime360Day(2000, 3, 10)   # 25 days
]
result = infer_frequency(times_irregular, return_metadata=True, log=True)
# Status: "irregular", Frequency: closest detected pattern, logged with ❌ for inexact spacing

# Missing Steps: Regular pattern with gaps (requires strict=True)
times_missing = [
    cftime.Datetime360Day(2000, 1, 1),   # Day 1
    cftime.Datetime360Day(2000, 1, 2),   # Day 2
    cftime.Datetime360Day(2000, 1, 3),   # Day 3
    # Missing days 4, 5, 6 (three missing steps)
    cftime.Datetime360Day(2000, 1, 7),   # Day 7
    cftime.Datetime360Day(2000, 1, 8)    # Day 8
]
result = infer_frequency(times_missing, return_metadata=True, strict=True, log=True)
# Status: "missing_steps", Frequency: "D" (daily pattern with gaps)

# Too Short: Insufficient data
times_short = [cftime.Datetime360Day(2000, 1, 1)]
result = infer_frequency(times_short, return_metadata=True, log=True)
# Status: "too_short", Frequency: None

Core Functions

Accessor Methods

The following methods are available via xarray accessors:

DataArray Accessor (da.timefreq):

TimeFrequencyAccessor.infer_frequency(strict=False, calendar='standard', log=True, time_dim=None, return_metadata=True)

Infer time frequency from datetime-like array, returning pandas-style frequency strings.

Parameters:
  • strict (bool, optional) – If True, performs additional checks for irregular time series and returns a status message. Defaults to False.

  • calendar (str, optional) – Calendar type to use for cftime objects. Defaults to “standard”.

  • log (bool, optional) – If True, logs the results of the frequency check. Defaults to True.

  • time_dim (str, optional) – Name of the time dimension in the DataArray. If None, automatically detects the time dimension using get_time_label. Defaults to None.

  • return_metadata (bool, optional) – If True, returns a FrequencyResult (frequency, delta_days, step, is_exact, status) instead of just the frequency string. Defaults to True.

Returns:

Inferred frequency string (e.g., ‘M’), or a FrequencyResult (frequency, delta_days, step, is_exact, status) if return_metadata=True.

Return type:

str or FrequencyResult

TimeFrequencyAccessor.check_resolution(target_approx_interval, calendar='standard', strict=True, tolerance=0.01, log=True, time_dim=None)

Check if the time resolution is fine enough for resampling.

Parameters:
  • target_approx_interval (float) – Expected interval in days for the target frequency

  • calendar (str, optional) – Calendar type, by default “standard”

  • strict (bool, optional) – If True, performs additional checks for irregular time series and returns a status message. Defaults to True.

  • tolerance (float, optional) – Tolerance for time interval comparison, by default 0.01

  • log (bool, optional) – If True, logs the results of the frequency check. Defaults to True.

  • time_dim (str, optional) – Name of the time dimension. If None, automatically detects the time dimension using get_time_label. Defaults to None.

Returns:

Dictionary containing the inferred interval, comparison status, and validity for resampling.

Return type:

dict
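The comparison behind this check can be sketched without the library. Everything here is an assumption made for illustration: `check_resolution_sketch` is a hypothetical helper, the tolerance is treated as an absolute number of days, and only the `is_valid_for_resampling` key is taken from the Quick Start; the other dictionary keys are invented.

```python
def check_resolution_sketch(inferred_interval, target_approx_interval, tolerance=0.01):
    """Compare an inferred time step with a target interval, both in days."""
    # The data is usable when its step is finer than, or within tolerance of, the target.
    is_valid = inferred_interval <= target_approx_interval + tolerance
    if abs(inferred_interval - target_approx_interval) <= tolerance:
        comparison = "equal"
    elif inferred_interval < target_approx_interval:
        comparison = "finer"
    else:
        comparison = "coarser"
    return {
        "inferred_interval": inferred_interval,   # hypothetical key
        "comparison_status": comparison,          # hypothetical key
        "is_valid_for_resampling": is_valid,      # key used in the Quick Start
    }


# Daily data can be aggregated to monthly means; yearly data cannot.
print(check_resolution_sketch(1.0, 30.4375)["is_valid_for_resampling"])     # True
print(check_resolution_sketch(365.25, 30.4375)["is_valid_for_resampling"])  # False
```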

TimeFrequencyAccessor.resample_safe(target_approx_interval=None, freq_str=None, calendar='standard', method='mean', time_dim=None, tolerance=0.01, **resample_kwargs)

Safely resample time series data after checking temporal resolution.

Users can specify the target frequency in two ways:

  1. Provide target_approx_interval (float, in days), which is converted to freq_str.

  2. Provide freq_str (pandas frequency string), which is used directly for resampling.

If both are provided, freq_str takes precedence for resampling, and target_approx_interval is used for validation.
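How an interval in days might be turned into a frequency string is not spelled out here, so the following is only a plausible sketch. `interval_to_freq_str`, its unit table, and the 5% matching window are hypothetical and not taken from the library; the unit codes follow the strings used in this page ('M', '3H').

```python
def interval_to_freq_str(approx_interval_days):
    """Map an approximate interval in days to a pandas-style frequency string."""
    # (days per unit, unit code), checked from the coarsest unit down.
    units = [(365.25, "Y"), (30.4375, "M"), (1.0, "D")]
    for days_per_unit, code in units:
        multiple = round(approx_interval_days / days_per_unit)
        if multiple < 1:
            continue
        expected = multiple * days_per_unit
        # Accept the unit when the interval is within 5% of a whole multiple.
        if abs(approx_interval_days - expected) <= 0.05 * expected:
            return code if multiple == 1 else f"{multiple}{code}"
    # Sub-daily intervals: express them in whole hours.
    hours = round(approx_interval_days * 24)
    if hours < 1:
        raise ValueError("interval is shorter than one hour")
    return f"{hours}H"


print(interval_to_freq_str(30.0))   # M
print(interval_to_freq_str(90.0))   # 3M
print(interval_to_freq_str(0.125))  # 3H
```

Passing freq_str explicitly avoids any ambiguity of this kind, which is presumably why it takes precedence when both arguments are given.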

Parameters:
  • target_approx_interval (float, optional) – Expected interval in days for the target frequency. If provided without freq_str, this will be converted to an appropriate frequency string. If provided with freq_str, this is used for validation only.

  • freq_str (str, optional) – Target frequency string (e.g., ‘M’ for monthly, ‘3H’ for 3-hourly). If provided, this takes precedence for resampling operations.

  • calendar (str, optional) – Calendar type, by default “standard”

  • method (str or dict, optional) – Resampling method, by default “mean”

  • time_dim (str, optional) – Name of the time dimension. If None, automatically detects the time dimension using get_time_label. Defaults to None.

  • tolerance (float, optional) – Tolerance for time interval comparison, by default 0.01

  • **resample_kwargs – Additional arguments passed to xarray’s resample

Returns:

Resampled data

Return type:

xarray.DataArray

Raises:

ValueError – If neither target_approx_interval nor freq_str is provided, or if the time resolution is too coarse for the target frequency

Examples

# Using approximate interval (will be converted to frequency string)
data.timefreq.resample_safe(target_approx_interval=30.0)  # ~monthly

# Using frequency string directly
data.timefreq.resample_safe(freq_str='3M')  # 3-monthly

# Using both (freq_str used for resampling, target_approx_interval for validation)
data.timefreq.resample_safe(target_approx_interval=90.0, freq_str='3M')

Dataset Accessor (ds.timefreq):

DatasetFrequencyAccessor.infer_frequency(time_dim=None, **kwargs)

Infer time frequency from datetime-like array, returning pandas-style frequency strings.

Parameters:
  • time_dim (str, optional) – Name of the time dimension in the Dataset. If None, automatically detects the time dimension using get_time_label. Defaults to None.

  • **kwargs – Additional arguments passed to infer_frequency.

Returns:

Inferred frequency string (e.g., ‘M’), or a FrequencyResult (frequency, delta_days, step, is_exact, status) if return_metadata=True.

Return type:

str or FrequencyResult

DatasetFrequencyAccessor.resample_safe(target_approx_interval=None, freq_str=None, time_dim=None, calendar='standard', method='mean', tolerance=0.01, **resample_kwargs)

Safely resample dataset time series data after checking temporal resolution.

Users can specify the target frequency in two ways:

  1. Provide target_approx_interval (float, in days), which is converted to freq_str.

  2. Provide freq_str (pandas frequency string), which is used directly for resampling.

If both are provided, freq_str takes precedence for resampling, and target_approx_interval is used for validation.

Parameters:
  • target_approx_interval (float, optional) – Expected interval in days for the target frequency. If provided without freq_str, this will be converted to an appropriate frequency string. If provided with freq_str, this is used for validation only.

  • freq_str (str, optional) – Target frequency string (e.g., ‘M’ for monthly, ‘3H’ for 3-hourly). If provided, this takes precedence for resampling operations.

  • time_dim (str, optional) – Name of the time dimension. If None, automatically detects the time dimension using get_time_label. Defaults to None.

  • calendar (str, optional) – Calendar type, by default “standard”

  • method (str or dict, optional) – Resampling method, by default “mean”

  • tolerance (float, optional) – Tolerance for time interval comparison, by default 0.01

  • **resample_kwargs – Additional arguments passed to xarray’s resample

Returns:

Resampled dataset

Return type:

xarray.Dataset

Raises:

ValueError – If neither target_approx_interval nor freq_str is provided, or if the time resolution is too coarse for the target frequency

Examples

# Using approximate interval (will be converted to frequency string)
dataset.timefreq.resample_safe(target_approx_interval=30.0)  # ~monthly

# Using frequency string directly
dataset.timefreq.resample_safe(freq_str='3M')  # 3-monthly

# Using both (freq_str used for resampling, target_approx_interval for validation)
dataset.timefreq.resample_safe(target_approx_interval=90.0, freq_str='3M')

Calendar Support

The following calendars are supported:

  • standard or gregorian: 365.25 days/year

  • noleap: 365 days/year

  • 360_day: 360 days/year
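The year lengths above explain where a monthly approx_interval such as 30.4375 comes from: it is the year length divided by 12. A short arithmetic sketch (the `DAYS_PER_YEAR` table and `mean_month_length` helper are illustrative names, not part of the module):

```python
# Days per year for each supported calendar, as listed above.
DAYS_PER_YEAR = {
    "standard": 365.25,
    "gregorian": 365.25,
    "noleap": 365.0,
    "360_day": 360.0,
}


def mean_month_length(calendar):
    """Average month length in days: year length divided by 12."""
    return DAYS_PER_YEAR[calendar] / 12


for name in ("standard", "noleap", "360_day"):
    print(f"{name}: {mean_month_length(name):.4f} days per month")
# standard: 30.4375 days per month
# noleap: 30.4167 days per month
# 360_day: 30.0000 days per month
```

This is why a perfectly regular 360_day monthly series has a 30.0-day step and still counts as fine enough against a 30.4375-day monthly target.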