Gridded Data Module
Memory-efficient loading and processing of large gridded datasets.
This module provides automatic tiling, memory monitoring, and caching for processing large external datasets (SNODAS snow data, temperature grids, precipitation, etc.).
Prevents out-of-memory errors on large DEMs by automatically splitting processing into tiles with configurable memory limits.
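The tile-splitting idea can be sketched as follows; `iter_tile_slices` is a hypothetical helper for illustration, not the module's actual implementation:

```python
import numpy as np

def iter_tile_slices(height, width, tile=2000):
    """Yield (row_slice, col_slice) pairs covering a (height, width) output grid."""
    for r0 in range(0, height, tile):
        for c0 in range(0, width, tile):
            yield slice(r0, min(r0 + tile, height)), slice(c0, min(c0 + tile, width))

# Process one tile at a time so only ~tile*tile output pixels are resident at once.
out = np.empty((4096, 4096), dtype=np.float32)
for rs, cs in iter_tile_slices(*out.shape):
    out[rs, cs] = 0.0  # stand-in for a per-tile load/process step
```

Each tile covers at most `tile × tile` output pixels, so peak memory is bounded regardless of the full grid size.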
Core Classes
- class src.terrain.gridded_data.GriddedDataLoader(terrain, cache_dir=None, auto_tile=True, tile_config=None)[source]
Bases: object

Load and cache external gridded data with pipeline processing.

This class provides a general framework for:

- Loading gridded data from arbitrary formats
- Processing data through multi-step pipelines
- Caching each pipeline step independently
- Smart cache invalidation based on step dependencies
Pipeline format: List of (name, function, kwargs) tuples
Example
>>> def load_data(source, extent, target_shape):
...     # Load and crop data
...     return {"raw": data_array}
>>>
>>> def compute_stats(input_data):
...     # Compute statistics from previous step
...     raw = input_data["raw"]
...     return {"mean": raw.mean(), "std": raw.std()}
>>>
>>> pipeline = [
...     ("load", load_data, {}),
...     ("stats", compute_stats, {}),
... ]
>>>
>>> loader = GriddedDataLoader(terrain, cache_dir=Path(".cache"))
>>> result = loader.run_pipeline(
...     data_source="/path/to/data",
...     pipeline=pipeline,
...     cache_name="my_analysis",
... )
Main class for loading and processing gridded data with automatic tiling.
Key features:
Transparent automatic tiling for large datasets
Memory monitoring with failsafe (prevents OOM/thrashing)
Per-step and merged result caching
Smart aggregation (spatial concatenation, statistical averaging)
Usage pattern:
Define data bounds and output shape
Add pipeline steps (load, process, transform)
Execute with automatic tiling if needed
Get cached results on subsequent runs
Example:
from pathlib import Path

from src.terrain.gridded_data import GriddedDataLoader, TiledDataConfig

# Configure tiling
config = TiledDataConfig(
    max_output_pixels=4096 * 4096,  # ~16M pixels = ~64MB
    target_tile_outputs=2000,       # 2000x2000 tiles
    enable_memory_monitoring=True,
    max_memory_percent=85.0,
)

# Create loader (terrain provides extent and resolution)
loader = GriddedDataLoader(
    terrain,
    cache_dir=Path(".gridded_cache"),
    tile_config=config,
)

# Define pipeline steps and execute (tiles automatically if needed)
pipeline = [
    ("load", load_snow_data_fn, {}),
    ("process", compute_score_fn, {}),
    ("smooth", smooth_fn, {}),
]
result = loader.run_pipeline(
    data_source="/path/to/data",
    pipeline=pipeline,
    cache_name="snow_analysis",
)
See Snow Integration: Sledding Location Analysis for real-world usage.
- Parameters:
cache_dir (Path)
auto_tile (bool)
tile_config (TiledDataConfig | None)
- __init__(terrain, cache_dir=None, auto_tile=True, tile_config=None)[source]
Initialize gridded data loader.
- Parameters:
terrain – Terrain object (provides extent and resolution)
cache_dir (Path) – Directory for caching (default: .gridded_data_cache)
auto_tile (bool) – Enable automatic tiling when outputs exceed memory threshold (default: True)
tile_config (TiledDataConfig | None) – TiledDataConfig for tiling behavior (uses defaults if None)
- run_pipeline(data_source, pipeline, cache_name, force_reprocess=False)[source]
Execute a processing pipeline with caching at each step.
Features:

- Transparent automatic tiling for large outputs
- Memory monitoring with failsafe
- Per-step and merged result caching
- Parameters:
data_source (Any) – Data source (directory, file list, URL, etc.)
pipeline (List[Tuple[str, Callable, Dict]]) – List of (step_name, function, kwargs) tuples Each function receives previous step’s output as first arg
cache_name (str) – Base name for cache files
force_reprocess (bool) – Force reprocessing all steps even if cached
- Returns:
Output of final pipeline step
- Raises:
MemoryLimitExceeded – If memory limits exceeded during tiling
- Return type:
Any

Configuration
- class src.terrain.gridded_data.TiledDataConfig(max_output_pixels=16777216, target_tile_outputs=2000, halo=0, enable_tile_cache=True, aggregation_strategy='auto', max_memory_percent=85.0, max_swap_percent=50.0, memory_check_interval=5.0, enable_memory_monitoring=True)[source]
Bases: object

Configuration for automatic tiling behavior in GriddedDataLoader.
Key parameters:

- max_output_pixels: Tile if output exceeds this (default: 16M pixels)
- target_tile_outputs: Target tile size (default: 2000×2000)
- max_memory_percent: Abort if RAM usage exceeds this (default: 85%)
- max_swap_percent: Abort if swap usage exceeds this (default: 50%)
- enable_memory_monitoring: Enable safety checks (default: True)
Example:
# Conservative settings (low-memory systems)
config = TiledDataConfig(
    max_output_pixels=2048 * 2048,  # 4M pixels
    target_tile_outputs=1000,       # 1000x1000 tiles
    max_memory_percent=70.0,
)

# Aggressive settings (high-memory systems)
config = TiledDataConfig(
    max_output_pixels=8192 * 8192,  # 64M pixels
    target_tile_outputs=4000,       # 4000x4000 tiles
    max_memory_percent=90.0,
)
- Parameters:
- max_output_pixels: int = 16777216
Maximum output pixels before triggering tiling (default: ~16M = ~64MB for float32).
- halo: int = 0
Halo size for operations needing boundary overlap (default: 0 for gridded data).
- aggregation_strategy: str = 'auto'
How to merge tiles: 'concatenate', 'mean', 'weighted_mean', or 'auto' (default).
- __init__(max_output_pixels=16777216, target_tile_outputs=2000, halo=0, enable_tile_cache=True, aggregation_strategy='auto', max_memory_percent=85.0, max_swap_percent=50.0, memory_check_interval=5.0, enable_memory_monitoring=True)
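The aggregation strategies named above can be sketched as follows; `merge_tiles` is a hypothetical helper for illustration, not the module's actual merging code:

```python
import numpy as np

def merge_tiles(tile_results, out_shape=None, strategy="concatenate"):
    """tile_results: list of ((row_slice, col_slice), value) pairs."""
    if strategy == "concatenate":
        # Spatial results: place each tile block into its output slice
        out = np.empty(out_shape, dtype=np.float32)
        for (rs, cs), block in tile_results:
            out[rs, cs] = block
        return out
    if strategy == "mean":
        # Scalar per-tile results (e.g. statistics): average across tiles
        return float(np.mean([value for _, value in tile_results]))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

'concatenate' suits spatial outputs that tile the full grid, while 'mean' suits scalar summaries computed per tile.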
Memory Monitoring
- class src.terrain.gridded_data.MemoryMonitor(config)[source]
Bases: object

Monitors system memory and aborts processing if limits are exceeded.
What it monitors:
RAM usage (percent of total)
Swap usage (percent of total)
Available memory (absolute bytes)
When it aborts:
- RAM usage > max_memory_percent (default: 85%)
- Swap usage > max_swap_percent (default: 50%)

Requires: psutil package (pip install psutil)

Example:
from src.terrain.gridded_data import MemoryMonitor, TiledDataConfig

config = TiledDataConfig(max_memory_percent=85.0)
monitor = MemoryMonitor(config)

# Start monitoring in background thread
monitor.start()

# Do expensive processing...
process_large_data()

# Stop monitoring
monitor.stop()
- Parameters:
config (TiledDataConfig)
- __init__(config)[source]
Initialize memory monitor.
- Parameters:
config (TiledDataConfig) – TiledDataConfig with memory thresholds
- check_memory(force=False)[source]
Check memory usage and raise MemoryLimitExceeded if over threshold.
- Parameters:
force (bool) – Force check even if check_interval hasn’t elapsed
- Raises:
MemoryLimitExceeded – If memory or swap usage exceeds limits
- Return type:
None
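The threshold logic above can be sketched in a few lines; this is a hypothetical simplification, not MemoryMonitor's actual code. In the real monitor the readings would come from psutil (e.g. psutil.virtual_memory().percent and psutil.swap_memory().percent):

```python
class MemoryLimitExceeded(RuntimeError):
    """Raised when RAM or swap usage crosses a configured limit (stand-in
    for the module's own MemoryLimitExceeded exception)."""

def check_limits(ram_percent, swap_percent,
                 max_memory_percent=85.0, max_swap_percent=50.0):
    # Abort if either reading is over its threshold
    if ram_percent > max_memory_percent:
        raise MemoryLimitExceeded(
            f"RAM at {ram_percent:.1f}% exceeds {max_memory_percent:.1f}%")
    if swap_percent > max_swap_percent:
        raise MemoryLimitExceeded(
            f"swap at {swap_percent:.1f}% exceeds {max_swap_percent:.1f}%")
```

Aborting early like this trades a failed run for the alternative of swapping or an OS-level OOM kill mid-computation.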
Tile Specification
- class src.terrain.gridded_data.TileSpecGridded(src_slice, out_slice, extent, target_shape)[source]
Bases: object

Specification for a single tile with geographic extent.

Attributes:

- src_slice: Slice into source data (with halo padding)
- out_slice: Slice into output array
- extent: Geographic bounds (minx, miny, maxx, maxy)
- target_shape: Output shape for this tile (height, width)
Utility Functions
- src.terrain.gridded_data.downsample_for_viz(arr, max_dim=2000)[source]
Downsample array using stride slicing for visualization.
- Parameters:
arr – Array to downsample
max_dim (int) – Maximum dimension of the downsampled output (default: 2000)
- Returns:
Tuple of (downsampled_array, stride_used)
- Return type:
Tuple
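Stride slicing keeps every k-th row and column, which is cheap because it copies no data until needed. A minimal sketch (hypothetical helper; the module's exact stride choice may differ):

```python
import numpy as np

def downsample_sketch(arr, max_dim=2000):
    # Choose the smallest stride that brings the largest dimension under max_dim
    stride = max(1, int(np.ceil(max(arr.shape[:2]) / max_dim)))
    return arr[::stride, ::stride], stride
```

For visualization this is usually indistinguishable from proper resampling while being orders of magnitude faster.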
- src.terrain.gridded_data.create_mock_snow_data(shape)[source]
Create mock snow data for testing.
Generates realistic-looking mock snow statistics using statistical distributions that mimic real SNODAS patterns.
- Parameters:
shape (Tuple[int, int]) – Shape of the snow data arrays (height, width)
- Returns:
Dictionary with mock snow statistics:

- median_max_depth: Snow depth in mm (gamma distribution)
- mean_snow_day_ratio: Fraction of days with snow (beta distribution)
- interseason_cv: Year-to-year variability (beta distribution)
- mean_intraseason_cv: Within-winter variability (beta distribution)
Example:
mock_data = create_mock_snow_data(shape=(1000, 1000))
# Returns dict with 'median_max_depth', 'mean_snow_day_ratio',
# 'interseason_cv', 'mean_intraseason_cv' arrays
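The documented output can be sketched with NumPy's distribution samplers; the distribution parameters below are illustrative assumptions, not the function's actual values:

```python
import numpy as np

def mock_snow_sketch(shape, seed=0):
    rng = np.random.default_rng(seed)
    return {
        # Gamma-distributed depths mimic the right-skewed SNODAS pattern
        "median_max_depth": rng.gamma(shape=2.0, scale=150.0, size=shape),  # mm
        # Beta distributions keep the ratio/CV fields in [0, 1]
        "mean_snow_day_ratio": rng.beta(2.0, 5.0, size=shape),
        "interseason_cv": rng.beta(2.0, 8.0, size=shape),
        "mean_intraseason_cv": rng.beta(2.0, 6.0, size=shape),
    }
```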
Performance Notes
Memory efficiency:
Processes tiles independently (only one tile in memory)
Automatic garbage collection between tiles
Memory-mapped caching for large results
Typical tiling overhead:
No tiling: ~0ms overhead
Tiled (4 tiles): ~100-200ms overhead (cache lookups, tile merging)
Tiled (16 tiles): ~500-1000ms overhead
When tiling triggers:
- Output pixels > max_output_pixels (default: 16M)
- Example: a 4096×4096 DEM triggers tiling by default
Cache effectiveness:
First run: Full computation + caching
Subsequent runs: ~100x faster (cache hits only)
Cache invalidation: Automatic on parameter/data changes
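Parameter-driven invalidation can be sketched as a parameter-sensitive cache key; `step_cache_key` is a hypothetical helper, not the module's actual invalidation scheme:

```python
import hashlib
import pickle

def step_cache_key(cache_name, step_name, kwargs):
    # Hash the step name and kwargs so the key changes whenever either
    # changes, making stale cache entries simply unreachable.
    payload = pickle.dumps((step_name, sorted(kwargs.items())))
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{cache_name}_{step_name}_{digest}"
```

A scheme like this needs no explicit cache-clearing: changed parameters produce a new key, and the old entry is never read again.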