Cache Module
Caching system for efficient terrain processing pipeline.
This module provides three caching systems to avoid reloading and reprocessing expensive operations: DEM loading, transforms, and full pipeline stages.
All caches use hash-based validation to automatically invalidate when source data changes.
DEMCache
- class src.terrain.cache.DEMCache(cache_dir=None, enabled=True)[source]
Bases: object
Manages caching of loaded and merged DEM data with hash validation.
The cache stores:
- DEM array as .npz file
- Metadata including file hash, timestamp, and file list
- cache_dir
Directory where cache files are stored
- enabled
Whether caching is enabled
What it caches:
Merged DEM arrays (.npz files)
Affine transforms
Source file metadata (paths, modification times)
When cache is invalidated:
Files are added/removed from source directory
Files are modified (based on mtime)
Directory path changes
Example:
```python
from src.terrain.cache import DEMCache

cache = DEMCache(cache_dir='.dem_cache', enabled=True)

# Load with caching
dem, transform = cache.get_or_load(
    directory_path='data/hgt',
    pattern='*.hgt',
    load_func=lambda: load_dem_files('data/hgt')
)
```
- compute_source_hash(directory_path, pattern, recursive=False)[source]
Compute hash of source DEM files based on paths and modification times.
This ensures the cache is invalidated if:
- Files are added/removed
- Files are modified
- Directory path changes
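The behavior described above can be sketched as follows. This is an illustrative implementation, not the module's actual code; it assumes only that the hash covers the resolved directory path plus each matching file's path and modification time:

```python
import hashlib
from pathlib import Path

def compute_source_hash(directory_path, pattern, recursive=False):
    """Illustrative sketch: hash source file paths and mtimes."""
    root = Path(directory_path)
    files = sorted(root.rglob(pattern) if recursive else root.glob(pattern))
    h = hashlib.sha256()
    h.update(str(root.resolve()).encode())  # directory path changes invalidate
    for f in files:
        # Each file contributes its path and modification time, so adding,
        # removing, or touching a file changes the resulting hash.
        h.update(str(f).encode())
        h.update(str(f.stat().st_mtime).encode())
    return h.hexdigest()
```

Because the file list is sorted before hashing, the key is deterministic across runs regardless of filesystem enumeration order.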
- save_cache(dem_array, transform, source_hash, cache_name='dem')[source]
Save DEM array and transform to cache.
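A minimal sketch of the save path, assuming (as described earlier) that the array goes into an .npz file and a JSON sidecar holds the hash and timestamp. The function signature and the `{name}_{hash}` naming scheme are illustrative assumptions, not the module's actual layout:

```python
import json
import time
from pathlib import Path
import numpy as np

def save_cache(cache_dir, dem_array, transform, source_hash, cache_name='dem'):
    """Illustrative sketch: write array as .npz plus a .json metadata sidecar."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    stem = f"{cache_name}_{source_hash[:8]}"
    # Array and affine-transform coefficients go in the .npz payload.
    np.savez_compressed(cache_dir / f"{stem}.npz",
                        dem=dem_array, transform=np.asarray(transform))
    # Hash and timestamp go in the JSON sidecar for cheap validation.
    meta = {"source_hash": source_hash, "timestamp": time.time()}
    (cache_dir / f"{stem}.json").write_text(json.dumps(meta))
    return stem
```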
TransformCache
- class src.terrain.cache.TransformCache(cache_dir=None, enabled=True)[source]
Bases: object
Cache for transform pipeline results with dependency tracking.
Tracks chains of transforms (reproject -> smooth -> water_detect) and computes cache keys that incorporate the full dependency chain, ensuring downstream caches are invalidated when upstream params change.
- cache_dir
Directory where cache files are stored
- enabled
Whether caching is enabled
- dependencies
Graph of transform dependencies
- transforms
Registered transforms with their parameters
Caches transformed raster data (reprojection, smoothing, etc.).
What it caches:
Transformed DEM arrays
Transform parameters (for validation)
Intermediate processing results
Cache key components:
Transform name (e.g., 'reproject', 'smooth')
Parameter dictionary (all args)
Input data hash
Example:
```python
from src.terrain.cache import TransformCache

cache = TransformCache(cache_dir='.transform_cache')

# Cache expensive transform
result = cache.get_or_compute(
    'wavelet_denoise',
    compute_fn=lambda: wavelet_denoise_dem(dem, wavelet='db4'),
    params={'wavelet': 'db4', 'levels': 3}
)
```
See Transforms Module for usage with transform functions.
- compute_transform_hash(upstream_hash, transform_name, params)[source]
Compute cache key from upstream hash and transform parameters.
The key incorporates:
- Upstream cache key (propagating the full dependency chain)
- Transform name
- All transform parameters (sorted for determinism)
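The key construction above can be sketched in a few lines. This is an assumption-laden illustration (the real implementation may serialize differently); it shows only how sorting the parameters and folding in the upstream key makes the hash deterministic and chain-sensitive:

```python
import hashlib
import json

def compute_transform_hash(upstream_hash, transform_name, params):
    """Illustrative sketch: chain upstream key + name + sorted params."""
    payload = json.dumps(
        {"upstream": upstream_hash, "name": transform_name, "params": params},
        sort_keys=True,  # sorted for determinism across runs
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Since the upstream key is part of the payload, changing any upstream transform's parameters changes every downstream key, which is exactly the invalidation behavior described above.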
- save_transform(cache_key, data, transform_name, metadata=None)[source]
Save transform result to cache.
PipelineCache
- class src.terrain.cache.PipelineCache(cache_dir=None, enabled=True)[source]
Bases: object
Target-style caching for terrain processing pipelines.
Like a build system (Make, Bazel), this cache:
- Tracks targets with defined parameters and dependencies
- Computes cache keys that incorporate the FULL dependency chain
- Ensures downstream targets are invalidated when upstream changes
- Supports file inputs with mtime tracking
Example:
```python
cache = PipelineCache()
cache.define_target("dem_loaded", params={"path": "/data"})
cache.define_target("reprojected", params={"crs": "EPSG:32617"},
                    dependencies=["dem_loaded"])

# First run: cache miss
if cache.get_cached("reprojected") is None:
    data = expensive_operation()
    cache.save_target("reprojected", data)

# Second run (same params): cache hit
# If dem_loaded params change: cache miss (invalidated)
```
- cache_dir
Directory where cache files are stored
- enabled
Whether caching is enabled
- targets
Dict of target definitions {name: {params, dependencies, file_inputs}}
High-level caching for full terrain processing pipelines.
What it caches:
Complete pipeline stage outputs
Multi-step processing chains
Score computations
Pipeline stages:
- load: DEM loading and merging
- transform: Geographic transforms (reproject, flip, downsample)
- smooth: DEM smoothing operations
- scores: Score grid computations
- water: Water body detection
- mesh: Final mesh generation
Example:
```python
from src.terrain.cache import PipelineCache

cache = PipelineCache(cache_dir='.pipeline_cache')

# Cache pipeline stage
smoothed_dem = cache.get_or_run_stage(
    stage='smooth',
    compute_fn=lambda: run_smoothing_pipeline(dem),
    params={
        'wavelet': True,
        'adaptive': True,
        'bilateral': True
    }
)
```
Used in Combined Render: Full-Featured Example to avoid reprocessing expensive operations.
- define_target(name, params, dependencies=None, file_inputs=None)[source]
Define a pipeline target with its parameters and dependencies.
- Raises:
ValueError – If adding this target would create a circular dependency
- Return type:
None
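The circular-dependency check that `define_target` performs can be sketched with a standard depth-first search. The helper name `would_create_cycle` and its dict-of-lists graph representation are hypothetical, chosen only to illustrate the technique:

```python
def would_create_cycle(targets, name, dependencies):
    """Illustrative sketch: True if adding `name` with `dependencies`
    introduces a cycle. `targets` maps target name -> list of dep names."""
    graph = dict(targets)
    graph[name] = list(dependencies)
    done, path = set(), set()  # fully explored nodes / current DFS path

    def visit(node):
        if node in path:
            return True          # back-edge: cycle found
        if node in done or node not in graph:
            return False         # already cleared, or an undefined leaf
        path.add(node)
        done.add(node)
        cyclic = any(visit(dep) for dep in graph[node])
        path.discard(node)
        return cyclic

    return visit(name)
```

In the real class this check would run before registering the target, so a `ValueError` can be raised without mutating the target table.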
- compute_target_key(target_name)[source]
Compute cache key for a target, incorporating all upstream dependencies.
The key is a SHA256 hash that changes if:
- Target's own params change
- Any upstream target's params change
- Any file inputs are modified
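A sketch of how such a key can incorporate the full dependency chain, by recursively resolving upstream keys before hashing. The target-dict shape here is an assumption for illustration; the real class stores more fields (e.g., file inputs):

```python
import hashlib
import json

def compute_target_key(targets, name):
    """Illustrative sketch: SHA256 key covering the full upstream chain.
    `targets` maps name -> {"params": dict, "dependencies": [names]}."""
    spec = targets[name]
    # Recursively resolve upstream keys so any upstream change
    # propagates into this target's key.
    upstream = [compute_target_key(targets, dep)
                for dep in sorted(spec.get("dependencies", []))]
    payload = json.dumps(
        {"name": name, "params": spec["params"], "upstream": upstream},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```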
Cache Management
Cache directory structure:
```
.dem_cache/
├── dem_abc123.npz         # Cached DEM data
└── dem_abc123.json        # Metadata

.transform_cache/
├── reproject_def456.npz   # Cached transform results
└── reproject_def456.json  # Transform params

.pipeline_cache/
├── stage_smooth_ghi789.npz
└── stage_smooth_ghi789.json
```
Clearing cache:
```python
# Clear all caches
import shutil
shutil.rmtree('.dem_cache')
shutil.rmtree('.transform_cache')
shutil.rmtree('.pipeline_cache')
```
Cache validation:
All caches automatically validate:
- Source file changes (modification time)
- Parameter changes
- Data shape/dtype changes
Invalid cache entries are automatically regenerated.
Performance Notes
Typical speedups:
DEM loading: 50-100x faster (0.1s vs 5-10s for large merges)
Transforms: 10-50x faster (depends on complexity)
Pipeline stages: 100-1000x faster for multi-stage pipelines
Cache overhead:
Hash computation: ~10-50ms per cache lookup
Disk I/O: ~50-200ms for large DEMs
Memory: Minimal (uses memory-mapped arrays when possible)
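The memory-mapping point above can be demonstrated with NumPy directly. This is a generic illustration (the cache module's actual loading code is not shown here); it assumes only that a large array saved as an uncompressed .npy can be opened with `mmap_mode` so pages are read on demand rather than loading the whole DEM into RAM:

```python
import os
import tempfile
import numpy as np

# Stand-in for a large cached DEM, saved as uncompressed .npy.
path = os.path.join(tempfile.mkdtemp(), 'dem_cached.npy')
dem = np.arange(16, dtype=np.float32).reshape(4, 4)
np.save(path, dem)

# mmap_mode='r' returns a read-only memory-mapped view; only the
# pages actually indexed are paged in from disk.
view = np.load(path, mmap_mode='r')
print(view.shape, view.dtype, float(view[2, 3]))
```

Note that `mmap_mode` applies to plain .npy files; compressed .npz archives must be decompressed on read, which is one reason a cache might trade disk space for uncompressed storage.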
When to use caching:
Development/iteration (same data, different parameters)
Batch processing (same DEM, multiple visualizations)
Testing (avoid reloading data between test runs)
When NOT to use caching:
One-off renders (cache overhead > compute time)
Rapidly changing source data
Limited disk space