Cache Module

Caching system for an efficient terrain processing pipeline.

This module provides three caches that avoid repeating expensive work: one for DEM loading, one for transforms, and one for full pipeline stages.

All caches use hash-based validation to automatically invalidate when source data changes.

DEMCache

class src.terrain.cache.DEMCache(cache_dir=None, enabled=True)[source]

Bases: object

Manages caching of loaded and merged DEM data with hash validation.

The cache stores:

DEM array as .npz file
Metadata including file hash, timestamp, and file list

Parameters:

cache_dir (Path | None)
enabled (bool)

cache_dir: Directory where cache files are stored

enabled: Whether caching is enabled

Caches loaded and merged DEM data with hash validation.

What it caches:

Merged DEM arrays (.npz files)
Affine transforms
Source file metadata (paths, modification times)

When cache is invalidated:

Files are added/removed from source directory
Files are modified (based on mtime)
Directory path changes

Example:

from src.terrain.cache import DEMCache

cache = DEMCache(cache_dir='.dem_cache', enabled=True)

# Load with caching
dem, transform = cache.get_or_load(
    directory_path='data/hgt',
    pattern='*.hgt',
    load_func=lambda: load_dem_files('data/hgt')
)

__init__(cache_dir=None, enabled=True)[source]

Initialize DEM cache.

Parameters:

cache_dir (Path | None) – Directory for cache files. If None, uses .dem_cache/ in project root
enabled (bool) – Whether caching is enabled (default: True)

compute_source_hash(directory_path, pattern, recursive=False)[source]

Compute hash of source DEM files based on paths and modification times.

This ensures the cache is invalidated if:

Files are added/removed
Files are modified
Directory path changes

Parameters:

directory_path (str) – Path to DEM directory
pattern (str) – File pattern (e.g., “*.hgt”)
recursive (bool) – Whether search is recursive

Returns:

SHA256 hash of source file metadata

Return type:

str
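The following is a minimal sketch of how a path-and-mtime hash of this kind could be computed. It is not the module's implementation: the function name `sketch_source_hash` and the exact fields fed into the digest are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sketch_source_hash(directory_path: str, pattern: str, recursive: bool = False) -> str:
    """Hypothetical helper: hash the directory path plus each file's path and mtime."""
    root = Path(directory_path).resolve()
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    files = sorted(p for p in matches if p.is_file())

    digest = hashlib.sha256()
    # The directory path itself is part of the hash, so moving the data changes the key.
    digest.update(f"{root}\n".encode("utf-8"))
    for path in files:
        # Path and mtime both feed the hash, so adding, removing,
        # or modifying a file produces a different result.
        digest.update(f"{path}|{path.stat().st_mtime_ns}\n".encode("utf-8"))
    return digest.hexdigest()
```

Hashing metadata rather than file contents keeps the lookup cheap for large DEM tiles, at the cost of missing edits that preserve the modification time.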

get_cache_path(source_hash, cache_name='dem')[source]

Get the path for a cache file.

Parameters:

source_hash (str) – Hash of source files
cache_name (str) – Name of cache item (default: “dem”)

Returns:

Path to cache file

Return type:

Path

get_metadata_path(source_hash, cache_name='dem')[source]

Get the path for cache metadata file.

Parameters:

source_hash (str) – Hash of source files
cache_name (str) – Name of cache item (default: “dem”)

Returns:

Path to metadata file

Return type:

Path

save_cache(dem_array, transform, source_hash, cache_name='dem')[source]

Save DEM array and transform to cache.

Parameters:

dem_array (ndarray) – Merged DEM array
transform (Affine) – Affine transform
source_hash (str) – Hash of source files
cache_name (str) – Name of cache item (default: “dem”)

Returns:

Tuple of (cache_file_path, metadata_file_path)

Return type:

Tuple[Path, Path]

load_cache(source_hash, cache_name='dem')[source]

Load cached DEM data.

Parameters:

source_hash (str) – Hash of source files
cache_name (str) – Name of cache item (default: “dem”)

Returns:

Tuple of (dem_array, transform) or None if cache doesn’t exist

Return type:

Tuple[ndarray, Affine] | None

clear_cache(cache_name='dem')[source]

Clear all cached files for a given cache name.

Parameters:

cache_name (str) – Name of cache item to clear

Returns:

Number of files deleted

Return type:

int
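As a rough illustration of what clearing by cache name could involve, the sketch below deletes files named `<cache_name>_<hash>.npz` and `<cache_name>_<hash>.json`. That naming scheme is taken from the directory layout shown under Cache Management; the function `sketch_clear_cache` is hypothetical and not part of the module.

```python
from pathlib import Path


def sketch_clear_cache(cache_dir, cache_name: str = "dem") -> int:
    """Hypothetical helper: delete data and metadata files for one cache name."""
    deleted = 0
    for path in Path(cache_dir).glob(f"{cache_name}_*"):
        # Only touch the two file types the cache is documented to write.
        if path.is_file() and path.suffix in {".npz", ".json"}:
            path.unlink()
            deleted += 1
    return deleted
```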

get_cache_stats()[source]

Get statistics about cached files.

Returns:

Dictionary with cache statistics

Return type:

dict

TransformCache

class src.terrain.cache.TransformCache(cache_dir=None, enabled=True)[source]

Bases: object

Cache for transform pipeline results with dependency tracking.

Tracks chains of transforms (reproject -> smooth -> water_detect) and computes cache keys that incorporate the full dependency chain, ensuring downstream caches are invalidated when upstream params change.

Parameters:

cache_dir (Path | None)
enabled (bool)

cache_dir: Directory where cache files are stored

enabled: Whether caching is enabled

dependencies: Graph of transform dependencies

transforms: Registered transforms with their parameters

Caches transformed raster data (reprojection, smoothing, etc.).

What it caches:

Transformed DEM arrays
Transform parameters (for validation)
Intermediate processing results

Cache key components:

Transform name (e.g., ‘reproject’, ‘smooth’)
Parameter dictionary (all args)
Input data hash

Example:

from src.terrain.cache import TransformCache

cache = TransformCache(cache_dir='.transform_cache')

# Cache expensive transform
result = cache.get_or_compute(
    'wavelet_denoise',
    compute_fn=lambda: wavelet_denoise_dem(dem, wavelet='db4'),
    params={'wavelet': 'db4', 'levels': 3}
)

See Transforms Module for usage with transform functions.

__init__(cache_dir=None, enabled=True)[source]

Initialize transform cache.

Parameters:

cache_dir (Path | None) – Directory for cache files. If None, uses .transform_cache/
enabled (bool) – Whether caching is enabled (default: True)

compute_transform_hash(upstream_hash, transform_name, params)[source]

Compute cache key from upstream hash and transform parameters.

The key incorporates:

Upstream cache key (propagating the full dependency chain)
Transform name
All transform parameters (sorted for determinism)

Parameters:

upstream_hash (str) – Hash of upstream data/transform
transform_name (str) – Name of this transform (e.g., “reproject”, “smooth”)
params (dict) – Transform parameters dict

Returns:

SHA256 hash string (64 chars)

Return type:

str
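One way such a key could be derived is sketched below, assuming the parameters are JSON-serializable (anything else falls back to `str`). The function name `sketch_transform_hash`, the placeholder source hash, and the `sigma` parameter are illustrative assumptions, not the module's actual code.

```python
import hashlib
import json


def sketch_transform_hash(upstream_hash: str, transform_name: str, params: dict) -> str:
    """Hypothetical helper: derive a key from the upstream key, name, and sorted params."""
    payload = json.dumps(
        {"upstream": upstream_hash, "transform": transform_name, "params": params},
        sort_keys=True,  # sorted keys make the key independent of dict ordering
        default=str,     # assumption: non-JSON values are represented by str()
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Chaining: each key feeds the next, so an upstream change propagates downstream.
source_hash = "0" * 64  # placeholder standing in for a real source hash
reproject_key = sketch_transform_hash(source_hash, "reproject", {"crs": "EPSG:32617"})
smooth_key = sketch_transform_hash(reproject_key, "smooth", {"sigma": 2.0})
```

Because `reproject_key` is an input to `smooth_key`, changing the reprojection parameters yields a different smoothing key even when the smoothing parameters are unchanged.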

get_cache_path(cache_key, transform_name)[source]

Get path for cache file.

Parameters:

cache_key (str) – Cache key hash
transform_name (str) – Name of transform

Returns:

Path to cache .npz file

Return type:

Path

get_metadata_path(cache_key, transform_name)[source]

Get path for metadata file.

Parameters:

cache_key (str) – Cache key hash
transform_name (str) – Name of transform

Returns:

Path to metadata .json file

Return type:

Path

save_transform(cache_key, data, transform_name, metadata=None)[source]

Save transform result to cache.

Parameters:

cache_key (str) – Cache key hash
data (ndarray) – Transform result array
transform_name (str) – Name of transform
metadata (dict | None) – Optional additional metadata

Returns:

Tuple of (cache_path, metadata_path) or (None, None) if disabled

Return type:

Tuple[Path | None, Path | None]

load_transform(cache_key, transform_name)[source]

Load transform result from cache.

Parameters:

cache_key (str) – Cache key hash
transform_name (str) – Name of transform

Returns:

Cached array or None if cache miss/disabled

Return type:

ndarray | None

register_dependency(child, upstream)[source]

Register a dependency between transforms.

Parameters:

child (str) – Name of dependent transform
upstream (str) – Name of upstream transform it depends on

Return type:

None

register_transform(name, upstream, params)[source]

Register a transform with its parameters.

Parameters:

name (str) – Transform name
upstream (str) – Name of upstream dependency
params (dict) – Transform parameters

Return type:

None

get_dependency_chain(transform_name)[source]

Get full dependency chain for a transform.

Parameters:

transform_name (str) – Name of transform

Returns:

List of transform names from root to target

Return type:

list[str]
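The root-to-target ordering can be illustrated with a small sketch. It assumes each transform has at most one upstream, stored as a `child -> upstream` mapping, which is a guess based on the `register_dependency(child, upstream)` signature. The function `sketch_dependency_chain` is hypothetical.

```python
def sketch_dependency_chain(dependencies: dict, transform_name: str) -> list:
    """Hypothetical helper: walk child -> upstream links and return root-to-target order."""
    chain = [transform_name]
    seen = {transform_name}
    current = transform_name
    while current in dependencies:
        current = dependencies[current]
        if current in seen:
            # Guard against a malformed graph so the loop always terminates.
            raise ValueError(f"circular dependency at {current!r}")
        seen.add(current)
        chain.append(current)
    chain.reverse()  # collected target-to-root, returned root-to-target
    return chain


deps = {"smooth": "reproject", "water_detect": "smooth"}
chain = sketch_dependency_chain(deps, "water_detect")
```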

get_full_cache_key(transform_name, source_hash)[source]

Compute full cache key incorporating dependency chain.

Parameters:

transform_name (str) – Target transform name
source_hash (str) – Hash of original source data

Returns:

Cache key hash

Return type:

str

invalidate_downstream(transform_name)[source]

Invalidate all caches downstream of a transform.

Parameters:

transform_name (str) – Name of transform whose downstream should be invalidated

Returns:

Number of cache files deleted

Return type:

int
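Finding what counts as "downstream" amounts to a graph traversal. The sketch below shows one possible approach using the same assumed `child -> upstream` mapping. It only collects names and deletes nothing, and it assumes the named transform itself is excluded, which the documentation does not state. `sketch_downstream` is a hypothetical name.

```python
from collections import deque


def sketch_downstream(dependencies: dict, transform_name: str) -> list:
    """Hypothetical helper: list every transform that depends, directly or not, on one."""
    # Invert child -> upstream into upstream -> [children].
    children = {}
    for child, upstream in dependencies.items():
        children.setdefault(upstream, []).append(child)

    found = []
    seen = set()
    queue = deque(children.get(transform_name, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        found.append(name)
        queue.extend(children.get(name, []))
    return found
```

A real invalidation step would then remove the cache and metadata files belonging to each returned name.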

PipelineCache

class src.terrain.cache.PipelineCache(cache_dir=None, enabled=True)[source]

Bases: object

Target-style caching for terrain processing pipelines.

Like a build system (Make, Bazel), this cache:

Tracks targets with defined parameters and dependencies
Computes cache keys that incorporate the FULL dependency chain
Ensures downstream targets are invalidated when upstream changes
Supports file inputs with mtime tracking

Example:

cache = PipelineCache()
cache.define_target("dem_loaded", params={"path": "/data"})
cache.define_target(
    "reprojected",
    params={"crs": "EPSG:32617"},
    dependencies=["dem_loaded"],
)

# First run: cache miss
if cache.get_cached("reprojected") is None:
    data = expensive_operation()
    cache.save_target("reprojected", data)

# Second run (same params): cache hit
# If dem_loaded params change: cache miss (invalidated)

Parameters:

cache_dir (Path | None)
enabled (bool)

cache_dir: Directory where cache files are stored

enabled: Whether caching is enabled

targets: Dict of target definitions {name: {params, dependencies, file_inputs}}

High-level caching for full terrain processing pipelines.

What it caches:

Complete pipeline stage outputs
Multi-step processing chains
Score computations

Pipeline stages:

load: DEM loading and merging
transform: Geographic transforms (reproject, flip, downsample)
smooth: DEM smoothing operations
scores: Score grid computations
water: Water body detection
mesh: Final mesh generation

Example:

from src.terrain.cache import PipelineCache

cache = PipelineCache(cache_dir='.pipeline_cache')

# Cache pipeline stage
smoothed_dem = cache.get_or_run_stage(
    stage='smooth',
    compute_fn=lambda: run_smoothing_pipeline(dem),
    params={
        'wavelet': True,
        'adaptive': True,
        'bilateral': True
    }
)

Used in Combined Render: Full-Featured Example to avoid reprocessing expensive operations.

__init__(cache_dir=None, enabled=True)[source]

Initialize pipeline cache.

Parameters:

cache_dir (Path | None) – Directory for cache files. If None, uses .pipeline_cache/
enabled (bool) – Whether caching is enabled (default: True)

define_target(name, params, dependencies=None, file_inputs=None)[source]

Define a pipeline target with its parameters and dependencies.

Parameters:

name (str) – Unique name for this target
params (dict) – Parameters that affect the target’s output
dependencies (list[str] | None) – List of upstream target names this depends on
file_inputs (list[Path] | None) – List of file paths whose mtimes should be tracked

Raises:

ValueError – If adding this target would create a circular dependency

Return type:

None
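The circular-dependency check can be illustrated with a reachability test: a new definition creates a cycle if the target can be reached from any of its own proposed dependencies. The sketch below assumes the `targets` layout described above (`{name: {params, dependencies, ...}}`). Both function names are hypothetical and the real check may differ.

```python
def would_create_cycle(targets: dict, name: str, dependencies: list) -> bool:
    """Hypothetical helper: True if `name` is reachable from its proposed dependencies."""
    stack = list(dependencies)
    visited = set()
    while stack:
        current = stack.pop()
        if current == name:
            return True
        if current in visited:
            continue
        visited.add(current)
        stack.extend(targets.get(current, {}).get("dependencies", []))
    return False


def sketch_define_target(targets: dict, name: str, params: dict, dependencies=None) -> None:
    """Hypothetical helper: register a target, rejecting definitions that form a cycle."""
    dependencies = list(dependencies or [])
    if would_create_cycle(targets, name, dependencies):
        raise ValueError(f"target {name!r} would create a circular dependency")
    targets[name] = {"params": params, "dependencies": dependencies}
```

Checking before the assignment means a rejected definition leaves the existing target table untouched.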

compute_target_key(target_name)[source]

Compute cache key for a target, incorporating all upstream dependencies.

The key is a SHA256 hash that changes if:

Target’s own params change
Any upstream target’s params change
Any file inputs are modified

Parameters:

target_name (str) – Name of the target

Returns:

64-character hex SHA256 hash, or empty string if target undefined

Return type:

str
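A recursive sketch of such a key is shown below. It assumes an acyclic `targets` dict with `params`, `dependencies`, and `file_inputs` entries, and that parameters are JSON-serializable. The name `sketch_target_key` and the payload layout are assumptions for illustration only.

```python
import hashlib
import json
from pathlib import Path


def sketch_target_key(targets: dict, target_name: str) -> str:
    """Hypothetical helper: key over own params, upstream keys, and file mtimes."""
    target = targets.get(target_name)
    if target is None:
        return ""  # mirrors the documented behavior for undefined targets

    # Recursing into dependencies folds the whole upstream chain into this key.
    upstream_keys = [
        sketch_target_key(targets, dep) for dep in target.get("dependencies") or []
    ]
    file_state = []
    for file_input in target.get("file_inputs") or []:
        path = Path(file_input)
        mtime = path.stat().st_mtime_ns if path.exists() else None
        file_state.append([str(path), mtime])

    payload = json.dumps(
        {
            "name": target_name,
            "params": target.get("params", {}),
            "upstream": upstream_keys,
            "files": file_state,
        },
        sort_keys=True,
        default=str,  # assumption: non-JSON values are represented by str()
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```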

save_target(target_name, data, metadata=None)[source]

Save target output to cache.

Parameters:

target_name (str) – Name of the target
data – numpy array, or dict of arrays to cache
metadata (dict | None) – Optional additional metadata (can include Affine transforms)

Returns:

Path to cache file, or None if disabled

Return type:

Path | None

get_cached(target_name, return_metadata=False)[source]

Get cached target output if available.

Parameters:

target_name (str) – Name of the target
return_metadata (bool) – If True, return (data, metadata) tuple

Returns:

Cached data (array or dict of arrays), or None if cache miss. If return_metadata=True, returns (data, metadata) or (None, None)

clear_target(target_name)[source]

Clear cache files for a specific target.

Parameters:

target_name (str) – Name of target to clear

Returns:

Number of files deleted

Return type:

int

clear_all()[source]

Clear all cache files.

Returns:

Number of files deleted

Return type:

int

Cache Management

Cache directory structure:

```
.dem_cache/
├── dem_abc123.npz     # Cached DEM data
└── dem_abc123.json    # Metadata

.transform_cache/
├── reproject_def456.npz     # Cached transform results
└── reproject_def456.json    # Transform params

.pipeline_cache/
├── stage_smooth_ghi789.npz
└── stage_smooth_ghi789.json
```

Clearing cache:

```python
import shutil

# Clear all caches; ignore_errors=True avoids an exception if a directory does not exist
for cache_dir in ('.dem_cache', '.transform_cache', '.pipeline_cache'):
    shutil.rmtree(cache_dir, ignore_errors=True)
```

Cache validation:

All caches automatically validate:

Source file changes (modification time)
Parameter changes
Data shape/dtype changes

Invalid cache entries are automatically regenerated.

Performance Notes

Typical speedups:

DEM loading: 50-100x faster (0.1s vs 5-10s for large merges)
Transforms: 10-50x faster (depends on complexity)
Pipeline stages: 100-1000x faster for multi-stage pipelines

Cache overhead:

Hash computation: ~10-50ms per cache lookup
Disk I/O: ~50-200ms for large DEMs
Memory: Minimal (uses memory-mapped arrays when possible)

When to use caching:

Development/iteration (same data, different parameters)
Batch processing (same DEM, multiple visualizations)
Testing (avoid reloading data between test runs)

When NOT to use caching:

One-off renders (cache overhead > compute time)
Rapidly changing source data
Limited disk space