Gridded Data Module
Memory-efficient loading and processing of large gridded datasets.
This module provides automatic tiling, memory monitoring, and caching for processing large external datasets (SNODAS snow data, temperature grids, precipitation, etc.).
Prevents out-of-memory errors on large DEMs by automatically splitting processing into tiles with configurable memory limits.
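The tile-splitting idea can be sketched as follows; `iter_tile_slices` is a hypothetical helper for illustration, not the module's actual implementation:

```python
import numpy as np

def iter_tile_slices(height, width, tile=2000):
    """Yield (row_slice, col_slice) pairs covering a (height, width) output grid."""
    for r0 in range(0, height, tile):
        for c0 in range(0, width, tile):
            yield slice(r0, min(r0 + tile, height)), slice(c0, min(c0 + tile, width))

# Process one tile at a time so only ~tile*tile output pixels are resident at once.
out = np.empty((4096, 4096), dtype=np.float32)
for rs, cs in iter_tile_slices(*out.shape):
    out[rs, cs] = 0.0  # stand-in for a per-tile load/process step
```

Each tile covers at most `tile × tile` output pixels, so peak memory is bounded regardless of the full grid size.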
Core Classes
- class src.terrain.gridded_data.GriddedDataLoader(terrain, cache_dir=None, auto_tile=True, tile_config=None)[source]
Bases: object

Load and cache external gridded data with pipeline processing.

This class provides a general framework for:

- Loading gridded data from arbitrary formats
- Processing data through multi-step pipelines
- Caching each pipeline step independently
- Smart cache invalidation based on step dependencies
Pipeline format: List of (name, function, kwargs) tuples
Example
>>> def load_data(source, extent, target_shape):
...     # Load and crop data
...     return {"raw": data_array}
>>>
>>> def compute_stats(input_data):
...     # Compute statistics from previous step
...     raw = input_data["raw"]
...     return {"mean": raw.mean(), "std": raw.std()}
>>>
>>> pipeline = [
...     ("load", load_data, {}),
...     ("stats", compute_stats, {}),
... ]
>>>
>>> loader = GriddedDataLoader(terrain, cache_dir=Path(".cache"))
>>> result = loader.run_pipeline(
...     data_source="/path/to/data",
...     pipeline=pipeline,
...     cache_name="my_analysis",
... )
Main class for loading and processing gridded data with automatic tiling.
Key features:
Transparent automatic tiling for large datasets
Memory monitoring with failsafe (prevents OOM/thrashing)
Per-step and merged result caching
Smart aggregation (spatial concatenation, statistical averaging)
Usage pattern:
Define data bounds and output shape
Add pipeline steps (load, process, transform)
Execute with automatic tiling if needed
Get cached results on subsequent runs
Example:
from pathlib import Path

from src.terrain.gridded_data import GriddedDataLoader, TiledDataConfig

# Configure tiling
config = TiledDataConfig(
    max_output_pixels=4096 * 4096,  # ~16M pixels = ~64MB
    target_tile_outputs=2000,       # 2000x2000 tiles
    enable_memory_monitoring=True,
    max_memory_percent=85.0,
)

# Create loader (terrain provides extent and resolution)
loader = GriddedDataLoader(
    terrain,
    cache_dir=Path(".gridded_cache"),
    tile_config=config,
)

# Define pipeline steps and execute (tiles automatically if needed)
pipeline = [
    ("load", load_snow_data_fn, {}),
    ("process", compute_score_fn, {}),
    ("smooth", smooth_fn, {}),
]
result = loader.run_pipeline(
    data_source="/path/to/data",
    pipeline=pipeline,
    cache_name="snow_analysis",
)
See Snow Integration: Sledding Location Analysis for real-world usage.
- Parameters:
cache_dir (Path)
auto_tile (bool)
tile_config (TiledDataConfig | None)
- __init__(terrain, cache_dir=None, auto_tile=True, tile_config=None)[source]
Initialize gridded data loader.
- Parameters:
terrain – Terrain object (provides extent and resolution)
cache_dir (Path) – Directory for caching (default: .gridded_data_cache)
auto_tile (bool) – Enable automatic tiling when outputs exceed memory threshold (default: True)
tile_config (TiledDataConfig | None) – TiledDataConfig for tiling behavior (uses defaults if None)
- run_pipeline(data_source, pipeline, cache_name, force_reprocess=False)[source]
Execute a processing pipeline with caching at each step.
Features:

- Transparent automatic tiling for large outputs
- Memory monitoring with failsafe
- Per-step and merged result caching
- Parameters:
data_source (Any) – Data source (directory, file list, URL, etc.)
pipeline (List[Tuple[str, Callable, Dict]]) – List of (step_name, function, kwargs) tuples Each function receives previous step’s output as first arg
cache_name (str) – Base name for cache files
force_reprocess (bool) – Force reprocessing all steps even if cached
- Returns:
Output of final pipeline step
- Raises:
MemoryLimitExceeded – If memory limits exceeded during tiling
- Return type:
Any

Configuration
- class src.terrain.gridded_data.TiledDataConfig(max_output_pixels=16777216, target_tile_outputs=2000, halo=0, enable_tile_cache=True, aggregation_strategy='auto', max_memory_percent=85.0, max_swap_percent=50.0, memory_check_interval=5.0, enable_memory_monitoring=True)[source]
Bases: object

Configuration for automatic tiling behavior in GriddedDataLoader.
Key parameters:

- max_output_pixels: Tile if output exceeds this (default: 16M pixels)
- target_tile_outputs: Target tile size (default: 2000×2000)
- max_memory_percent: Abort if RAM usage exceeds this (default: 85%)
- max_swap_percent: Abort if swap usage exceeds this (default: 50%)
- enable_memory_monitoring: Enable safety checks (default: True)
Example:
# Conservative settings (low-memory systems)
config = TiledDataConfig(
    max_output_pixels=2048 * 2048,  # 4M pixels
    target_tile_outputs=1000,       # 1000x1000 tiles
    max_memory_percent=70.0,
)

# Aggressive settings (high-memory systems)
config = TiledDataConfig(
    max_output_pixels=8192 * 8192,  # 64M pixels
    target_tile_outputs=4000,       # 4000x4000 tiles
    max_memory_percent=90.0,
)
- Parameters:
- max_output_pixels: int = 16777216
Maximum output pixels before triggering tiling (default: ~16M = ~64MB for float32).
- halo: int = 0
Halo size for operations needing boundary overlap (default: 0 for gridded data).
- aggregation_strategy: str = 'auto'
How to merge tiles: 'concatenate', 'mean', 'weighted_mean', or 'auto' (default).
- __init__(max_output_pixels=16777216, target_tile_outputs=2000, halo=0, enable_tile_cache=True, aggregation_strategy='auto', max_memory_percent=85.0, max_swap_percent=50.0, memory_check_interval=5.0, enable_memory_monitoring=True)
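The aggregation strategies named above can be sketched as follows; `merge_tiles` is a hypothetical helper for illustration, not the module's actual merging code:

```python
import numpy as np

def merge_tiles(tile_results, out_shape=None, strategy="concatenate"):
    """tile_results: list of ((row_slice, col_slice), value) pairs."""
    if strategy == "concatenate":
        # Spatial results: place each tile block into its output slice
        out = np.empty(out_shape, dtype=np.float32)
        for (rs, cs), block in tile_results:
            out[rs, cs] = block
        return out
    if strategy == "mean":
        # Scalar per-tile results (e.g. statistics): average across tiles
        return float(np.mean([value for _, value in tile_results]))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

'concatenate' suits spatial outputs that tile the full grid, while 'mean' suits scalar summaries computed per tile.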
Memory Monitoring
- class src.terrain.gridded_data.MemoryMonitor(config)[source]
Bases: object

Monitors system memory and aborts processing if limits are exceeded.
What it monitors:
RAM usage (percent of total)
Swap usage (percent of total)
Available memory (absolute bytes)
When it aborts:
- RAM usage > max_memory_percent (default: 85%)
- Swap usage > max_swap_percent (default: 50%)

Requires: psutil package (pip install psutil)

Example:
from src.terrain.gridded_data import MemoryMonitor, TiledDataConfig

config = TiledDataConfig(max_memory_percent=85.0)
monitor = MemoryMonitor(config)

# Start monitoring in background thread
monitor.start()

# Do expensive processing...
process_large_data()

# Stop monitoring
monitor.stop()
- Parameters:
config (TiledDataConfig)
- __init__(config)[source]
Initialize memory monitor.
- Parameters:
config (TiledDataConfig) – TiledDataConfig with memory thresholds
- check_memory(force=False)[source]
Check memory usage and raise MemoryLimitExceeded if over threshold.
- Parameters:
force (bool) – Force check even if check_interval hasn’t elapsed
- Raises:
MemoryLimitExceeded – If memory or swap usage exceeds limits
- Return type:
None
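The threshold logic above can be sketched in a few lines; this is a hypothetical simplification, not MemoryMonitor's actual code. In the real monitor the readings would come from psutil (e.g. psutil.virtual_memory().percent and psutil.swap_memory().percent):

```python
class MemoryLimitExceeded(RuntimeError):
    """Raised when RAM or swap usage crosses a configured limit (stand-in
    for the module's own MemoryLimitExceeded exception)."""

def check_limits(ram_percent, swap_percent,
                 max_memory_percent=85.0, max_swap_percent=50.0):
    # Abort if either reading is over its threshold
    if ram_percent > max_memory_percent:
        raise MemoryLimitExceeded(
            f"RAM at {ram_percent:.1f}% exceeds {max_memory_percent:.1f}%")
    if swap_percent > max_swap_percent:
        raise MemoryLimitExceeded(
            f"swap at {swap_percent:.1f}% exceeds {max_swap_percent:.1f}%")
```

Aborting early like this trades a failed run for the alternative of swapping or an OS-level OOM kill mid-computation.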
Tile Specification
- class src.terrain.gridded_data.TileSpecGridded(src_slice, out_slice, extent, target_shape)[source]
Bases: object

Specification for a single tile with geographic extent.

Attributes:

- src_slice: Slice into source data (with halo padding)
- out_slice: Slice into output array
- extent: Geographic bounds (minx, miny, maxx, maxy)
- target_shape: Output shape for this tile (height, width)
Utility Functions
- src.terrain.gridded_data.downsample_for_viz(arr, max_dim=2000)[source]
Downsample array using stride slicing for visualization.
- Parameters:
arr – Array to downsample
max_dim (int) – Maximum dimension of the downsampled output (default: 2000)
- Returns:
Tuple of (downsampled_array, stride_used)
- Return type:
Tuple
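Stride slicing keeps every k-th row and column, which is cheap because it copies no data until needed. A minimal sketch (hypothetical helper; the module's exact stride choice may differ):

```python
import numpy as np

def downsample_sketch(arr, max_dim=2000):
    # Choose the smallest stride that brings the largest dimension under max_dim
    stride = max(1, int(np.ceil(max(arr.shape[:2]) / max_dim)))
    return arr[::stride, ::stride], stride
```

For visualization this is usually indistinguishable from proper resampling while being orders of magnitude faster.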
- src.terrain.gridded_data.create_mock_snow_data(shape)[source]
Create mock snow data for testing.
Generates realistic-looking mock snow statistics using statistical distributions that mimic real SNODAS patterns.
- Parameters:
shape (Tuple[int, int]) – Shape of the snow data arrays (height, width)
- Returns:
Dictionary with mock snow statistics:

- median_max_depth: Snow depth in mm (gamma distribution)
- mean_snow_day_ratio: Fraction of days with snow (beta distribution)
- interseason_cv: Year-to-year variability (beta distribution)
- mean_intraseason_cv: Within-winter variability (beta distribution)
Example:
mock_data = create_mock_snow_data(shape=(1000, 1000))
# Returns dict with 'median_max_depth', 'mean_snow_day_ratio',
# 'interseason_cv', 'mean_intraseason_cv' arrays
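The documented output can be sketched with NumPy's distribution samplers; the distribution parameters below are illustrative assumptions, not the function's actual values:

```python
import numpy as np

def mock_snow_sketch(shape, seed=0):
    rng = np.random.default_rng(seed)
    return {
        # Gamma-distributed depths mimic the right-skewed SNODAS pattern
        "median_max_depth": rng.gamma(shape=2.0, scale=150.0, size=shape),  # mm
        # Beta distributions keep the ratio/CV fields in [0, 1]
        "mean_snow_day_ratio": rng.beta(2.0, 5.0, size=shape),
        "interseason_cv": rng.beta(2.0, 8.0, size=shape),
        "mean_intraseason_cv": rng.beta(2.0, 6.0, size=shape),
    }
```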
Performance Notes
Memory efficiency:
Processes tiles independently (only one tile in memory)
Automatic garbage collection between tiles
Memory-mapped caching for large results
Typical tiling overhead:
No tiling: ~0ms overhead
Tiled (4 tiles): ~100-200ms overhead (cache lookups, tile merging)
Tiled (16 tiles): ~500-1000ms overhead
When tiling triggers:
- Output pixels > max_output_pixels (default: 16M)
- Example: a 4096×4096 DEM triggers tiling by default
Cache effectiveness:
First run: Full computation + caching
Subsequent runs: ~100x faster (cache hits only)
Cache invalidation: Automatic on parameter/data changes
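Parameter-driven invalidation can be sketched as a parameter-sensitive cache key; `step_cache_key` is a hypothetical helper, not the module's actual invalidation scheme:

```python
import hashlib
import pickle

def step_cache_key(cache_name, step_name, kwargs):
    # Hash the step name and kwargs so the key changes whenever either
    # changes, making stale cache entries simply unreachable.
    payload = pickle.dumps((step_name, sorted(kwargs.items())))
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{cache_name}_{step_name}_{digest}"
```

A scheme like this needs no explicit cache-clearing: changed parameters produce a new key, and the old entry is never read again.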