Cache Module
============

Caching system for an efficient terrain processing pipeline.

This module provides three caches that avoid repeating expensive operations:
DEM loading, transforms, and full pipeline stages. All caches use hash-based
validation, so entries are invalidated automatically when the source data changes.

DEMCache
--------

.. autoclass:: src.terrain.cache.DEMCache
   :members:
   :undoc-members:

Caches loaded and merged DEM data with hash validation.

**What it caches:**

- Merged DEM arrays (.npz files)
- Affine transforms
- Source file metadata (paths, modification times)

**When cache is invalidated:**

- Files are added to or removed from the source directory
- Files are modified (based on mtime)
- The directory path changes

Example::

    from src.terrain.cache import DEMCache

    cache = DEMCache(cache_dir='.dem_cache', enabled=True)

    # Load with caching
    dem, transform = cache.get_or_load(
        directory_path='data/hgt',
        pattern='*.hgt',
        load_func=lambda: load_dem_files('data/hgt')
    )

TransformCache
--------------

.. autoclass:: src.terrain.cache.TransformCache
   :members:
   :undoc-members:

Caches transformed raster data (reprojection, smoothing, etc.).

**What it caches:**

- Transformed DEM arrays
- Transform parameters (for validation)
- Intermediate processing results

**Cache key components:**

- Transform name (e.g., 'reproject', 'smooth')
- Parameter dictionary (all args)
- Input data hash

Example::

    from src.terrain.cache import TransformCache

    cache = TransformCache(cache_dir='.transform_cache')

    # Cache expensive transform
    result = cache.get_or_compute(
        'wavelet_denoise',
        compute_fn=lambda: wavelet_denoise_dem(dem, wavelet='db4'),
        params={'wavelet': 'db4', 'levels': 3}
    )

See :doc:`transforms` for usage with transform functions.

PipelineCache
-------------

.. autoclass:: src.terrain.cache.PipelineCache
   :members:
   :undoc-members:

High-level caching for full terrain processing pipelines.

**What it caches:**

- Complete pipeline stage outputs
- Multi-step processing chains
- Score computations

**Pipeline stages:**

- ``load``: DEM loading and merging
- ``transform``: Geographic transforms (reproject, flip, downsample)
- ``smooth``: DEM smoothing operations
- ``scores``: Score grid computations
- ``water``: Water body detection
- ``mesh``: Final mesh generation

Example::

    from src.terrain.cache import PipelineCache

    cache = PipelineCache(cache_dir='.pipeline_cache')

    # Cache pipeline stage
    smoothed_dem = cache.get_or_run_stage(
        stage='smooth',
        compute_fn=lambda: run_smoothing_pipeline(dem),
        params={
            'wavelet': True,
            'adaptive': True,
            'bilateral': True
        }
    )

Used in :doc:`../examples/combined_render` to avoid reprocessing expensive operations.

Cache Management
----------------

**Cache directory structure:**

```
.dem_cache/
├── dem_abc123.npz           # Cached DEM data
└── dem_abc123.json          # Metadata

.transform_cache/
├── reproject_def456.npz     # Cached transform results
└── reproject_def456.json    # Transform params

.pipeline_cache/
├── stage_smooth_ghi789.npz
└── stage_smooth_ghi789.json
```

**Clearing cache:**

```python
import shutil

# Clear all caches; ignore_errors=True avoids a FileNotFoundError
# when a cache directory has not been created yet.
shutil.rmtree('.dem_cache', ignore_errors=True)
shutil.rmtree('.transform_cache', ignore_errors=True)
shutil.rmtree('.pipeline_cache', ignore_errors=True)
```

**Cache validation:**

All caches automatically validate:

- Source file changes (modification time)
- Parameter changes
- Data shape/dtype changes

Invalid cache entries are automatically regenerated.

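
The validation rules above can be illustrated with a minimal sketch. It is not
the module's actual implementation: ``directory_fingerprint`` and ``params_key``
are hypothetical helpers, and the use of SHA-256 with 12-character keys is an
assumption. The sketch covers source file changes and parameter changes only;
shape/dtype checks are left out.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def directory_fingerprint(directory, pattern="*.hgt"):
    """Hash the directory path plus the name and mtime of each matching file.

    Hypothetical helper: the fingerprint changes when files are added,
    removed, or modified, or when the directory path itself changes.
    """
    root = Path(directory).resolve()
    digest = hashlib.sha256(str(root).encode("utf-8"))
    for path in sorted(root.glob(pattern)):
        entry = f"{path.name}:{path.stat().st_mtime_ns}\n"
        digest.update(entry.encode("utf-8"))
    return digest.hexdigest()[:12]


def params_key(name, params):
    """Hash a transform or stage name together with its parameters.

    Hypothetical helper: params must be JSON-serializable. sort_keys makes
    the key independent of dictionary insertion order.
    """
    payload = json.dumps({"name": name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    tile = Path(tmp) / "tile_a.hgt"
    tile.write_bytes(b"\x00" * 16)
    original = directory_fingerprint(tmp)

    # Adding a file changes the fingerprint, so a cached entry would be stale.
    (Path(tmp) / "tile_b.hgt").write_bytes(b"\x00" * 16)
    print("changed after add:", directory_fingerprint(tmp) != original)

    # Changing the mtime of an existing file changes it again.
    with_two_files = directory_fingerprint(tmp)
    os.utime(tile, ns=(10**18, 10**18))
    print("changed after touch:", directory_fingerprint(tmp) != with_two_files)

# A different parameter value yields a different key.
print(params_key("smooth", {"wavelet": True}) != params_key("smooth", {"wavelet": False}))
```

Hashing modification times rather than file contents keeps the lookup cheap
for large DEM tiles. The trade-off is that a file rewritten with identical
contents still invalidates the entry.
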
Performance Notes
-----------------

**Typical speedups:**

- DEM loading: 50-100x faster (0.1s vs 5-10s for large merges)
- Transforms: 10-50x faster (depends on complexity)
- Pipeline stages: 100-1000x faster for multi-stage pipelines

**Cache overhead:**

- Hash computation: ~10-50ms per cache lookup
- Disk I/O: ~50-200ms for large DEMs
- Memory: Minimal (uses memory-mapped arrays when possible)

**When to use caching:**

- Development/iteration (same data, different parameters)
- Batch processing (same DEM, multiple visualizations)
- Testing (avoid reloading data between test runs)

**When NOT to use caching:**

- One-off renders (cache overhead > compute time)
- Rapidly changing source data
- Limited disk space
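
The trade-off between cache overhead and compute time can be estimated with a
simple break-even model. The sketch below is an illustration only:
``caching_pays_off`` is a hypothetical helper, not part of this module, and the
cost of writing a cache entry (``store_s``) is an assumed figure that is not
listed above.

```python
def caching_pays_off(compute_s, lookup_s, store_s, runs):
    """Return True if caching is expected to reduce total time over `runs` runs.

    Simplified model (an assumption, not measured behavior):
    - compute_s: time to compute the result once without a cache
    - lookup_s:  time for one cache lookup (hashing plus disk read)
    - store_s:   time to write a new cache entry
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    without_cache = runs * compute_s
    # First run: a lookup that misses, the computation itself, then a write.
    # Each later run: one lookup that hits.
    with_cache = (lookup_s + compute_s + store_s) + (runs - 1) * lookup_s
    return with_cache < without_cache


# A 5 s DEM merge with 0.25 s lookups and an assumed 0.2 s write cost.
print(caching_pays_off(compute_s=5.0, lookup_s=0.25, store_s=0.2, runs=1))
print(caching_pays_off(compute_s=5.0, lookup_s=0.25, store_s=0.2, runs=2))

# A 0.1 s operation is cheaper to recompute than to look up.
print(caching_pays_off(compute_s=0.1, lookup_s=0.25, store_s=0.2, runs=10))
```

Under this model a single run never benefits, since the lookup and write costs
are added on top of the computation. Repeated runs benefit only when one
computation costs more than one lookup.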