Cache Module

Caching system for an efficient terrain processing pipeline.

This module provides three caches that avoid repeating expensive work: DEM loading, transforms, and full pipeline stages.

All caches use hash-based validation to automatically invalidate when source data changes.

DEMCache

class src.terrain.cache.DEMCache(cache_dir=None, enabled=True)[source]

Bases: object

Manages caching of loaded and merged DEM data with hash validation.

The cache stores:

  • DEM array as .npz file

  • Metadata including file hash, timestamp, and file list

Parameters:
  • cache_dir (Path | None)

  • enabled (bool)

cache_dir

Directory where cache files are stored

enabled

Whether caching is enabled

Caches loaded and merged DEM data with hash validation.

What it caches:

  • Merged DEM arrays (.npz files)

  • Affine transforms

  • Source file metadata (paths, modification times)

When cache is invalidated:

  • Files are added/removed from source directory

  • Files are modified (based on mtime)

  • Directory path changes

Example:

from src.terrain.cache import DEMCache

cache = DEMCache(cache_dir='.dem_cache', enabled=True)

# Load with caching
dem, transform = cache.get_or_load(
    directory_path='data/hgt',
    pattern='*.hgt',
    load_func=lambda: load_dem_files('data/hgt')
)

__init__(cache_dir=None, enabled=True)[source]

Initialize DEM cache.

Parameters:
  • cache_dir (Path | None) – Directory for cache files. If None, uses .dem_cache/ in project root

  • enabled (bool) – Whether caching is enabled (default: True)

compute_source_hash(directory_path, pattern, recursive=False)[source]

Compute hash of source DEM files based on paths and modification times.

This ensures the cache is invalidated if:

  • Files are added/removed

  • Files are modified

  • Directory path changes

Parameters:
  • directory_path (str) – Path to DEM directory

  • pattern (str) – File pattern (e.g., “*.hgt”)

  • recursive (bool) – Whether search is recursive

Returns:

SHA256 hash of source file metadata

Return type:

str
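
The following is a minimal sketch of how a hash over paths and modification times could be built. It is not the module's actual implementation: the function name `sketch_source_hash` and the exact fields fed into the digest are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sketch_source_hash(directory_path: str, pattern: str, recursive: bool = False) -> str:
    """Hash file paths and mtimes under a directory (illustrative only)."""
    root = Path(directory_path).resolve()
    # rglob descends into subdirectories; glob stays at the top level.
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    # Sorting keeps the hash independent of filesystem listing order.
    files = sorted(p for p in matches if p.is_file())

    digest = hashlib.sha256()
    # Including the directory itself means a changed path changes the hash.
    digest.update(f"{root}\n".encode("utf-8"))
    for path in files:
        # Path plus mtime covers added, removed, and modified files.
        digest.update(f"{path}|{path.stat().st_mtime_ns}\n".encode("utf-8"))
    return digest.hexdigest()
```

Because only metadata is hashed, the cost is independent of DEM file size, at the price of missing edits that preserve a file's mtime.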

get_cache_path(source_hash, cache_name='dem')[source]

Get the path for a cache file.

Parameters:
  • source_hash (str) – Hash of source files

  • cache_name (str) – Name of cache item (default: “dem”)

Returns:

Path to cache file

Return type:

Path

get_metadata_path(source_hash, cache_name='dem')[source]

Get the path for cache metadata file.

Parameters:
  • source_hash (str) – Hash of source files

  • cache_name (str) – Name of cache item (default: “dem”)

Returns:

Path to metadata file

Return type:

Path
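
As a rough illustration of how the data and metadata paths relate, the sketch below derives both from one shared stem, following the `<cache_name>_<hash>` naming shown under Cache Management. The helper name `sketch_cache_paths` and the `hash_chars` truncation length are hypothetical; the real methods may use the full hash or a different prefix length.

```python
from pathlib import Path
from typing import Tuple


def sketch_cache_paths(cache_dir: str, source_hash: str, cache_name: str = "dem",
                       hash_chars: int = 16) -> Tuple[Path, Path]:
    """Return (data_path, metadata_path) sharing one stem (illustrative only)."""
    base = Path(cache_dir)
    # hash_chars is an assumed truncation, not a documented setting.
    stem = f"{cache_name}_{source_hash[:hash_chars]}"
    return base / f"{stem}.npz", base / f"{stem}.json"
```

Sharing the stem keeps each array next to its metadata, so a cache entry can be removed by deleting both files with the same prefix.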

save_cache(dem_array, transform, source_hash, cache_name='dem')[source]

Save DEM array and transform to cache.

Parameters:
  • dem_array (ndarray) – Merged DEM array

  • transform (Affine) – Affine transform

  • source_hash (str) – Hash of source files

  • cache_name (str) – Name of cache item (default: “dem”)

Returns:

Tuple of (cache_file_path, metadata_file_path)

Return type:

Tuple[Path, Path]

load_cache(source_hash, cache_name='dem')[source]

Load cached DEM data.

Parameters:
  • source_hash (str) – Hash of source files

  • cache_name (str) – Name of cache item (default: “dem”)

Returns:

Tuple of (dem_array, transform) or None if cache doesn’t exist

Return type:

Tuple[ndarray, Affine] | None

clear_cache(cache_name='dem')[source]

Clear all cached files for a given cache name.

Parameters:

cache_name (str) – Name of cache item to clear

Returns:

Number of files deleted

Return type:

int

get_cache_stats()[source]

Get statistics about cached files.

Returns:

Dictionary with cache statistics

Return type:

dict

TransformCache

class src.terrain.cache.TransformCache(cache_dir=None, enabled=True)[source]

Bases: object

Cache for transform pipeline results with dependency tracking.

Tracks chains of transforms (reproject -> smooth -> water_detect) and computes cache keys that incorporate the full dependency chain, ensuring downstream caches are invalidated when upstream params change.

Parameters:
  • cache_dir (Path | None)

  • enabled (bool)

cache_dir

Directory where cache files are stored

enabled

Whether caching is enabled

dependencies

Graph of transform dependencies

transforms

Registered transforms with their parameters

Caches transformed raster data (reprojection, smoothing, etc.).

What it caches:

  • Transformed DEM arrays

  • Transform parameters (for validation)

  • Intermediate processing results

Cache key components:

  • Transform name (e.g., ‘reproject’, ‘smooth’)

  • Parameter dictionary (all args)

  • Input data hash

Example:

from src.terrain.cache import TransformCache

cache = TransformCache(cache_dir='.transform_cache')

# Cache expensive transform
result = cache.get_or_compute(
    'wavelet_denoise',
    compute_fn=lambda: wavelet_denoise_dem(dem, wavelet='db4'),
    params={'wavelet': 'db4', 'levels': 3}
)

See Transforms Module for usage with transform functions.

__init__(cache_dir=None, enabled=True)[source]

Initialize transform cache.

Parameters:
  • cache_dir (Path | None) – Directory for cache files. If None, uses .transform_cache/

  • enabled (bool) – Whether caching is enabled (default: True)

compute_transform_hash(upstream_hash, transform_name, params)[source]

Compute cache key from upstream hash and transform parameters.

The key incorporates:

  • Upstream cache key (propagating the full dependency chain)

  • Transform name

  • All transform parameters (sorted for determinism)

Parameters:
  • upstream_hash (str) – Hash of upstream data/transform

  • transform_name (str) – Name of this transform (e.g., “reproject”, “smooth”)

  • params (dict) – Transform parameters dict

Returns:

SHA256 hash string (64 chars)

Return type:

str
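
One way such a key could be computed is sketched below, assuming the three components are serialized as JSON with sorted keys before hashing. The name `sketch_transform_hash` and the serialization format are illustrative assumptions, not the module's confirmed behavior.

```python
import hashlib
import json


def sketch_transform_hash(upstream_hash: str, transform_name: str, params: dict) -> str:
    """Combine upstream key, transform name, and params into one SHA256 key."""
    payload = json.dumps(
        {"upstream": upstream_hash, "transform": transform_name, "params": params},
        sort_keys=True,  # parameter order must not affect the key
        default=str,     # fall back to str() for values JSON cannot encode
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Chaining: each key feeds the next, so an upstream change ripples downstream.
reproject_key = sketch_transform_hash("source-hash", "reproject", {"crs": "EPSG:32617"})
smooth_key = sketch_transform_hash(reproject_key, "smooth", {"sigma": 2.0})
```

Since `smooth_key` is derived from `reproject_key`, changing the reprojection parameters yields a different smoothing key without any explicit invalidation step.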

get_cache_path(cache_key, transform_name)[source]

Get path for cache file.

Parameters:
  • cache_key (str) – Cache key hash

  • transform_name (str) – Name of transform

Returns:

Path to cache .npz file

Return type:

Path

get_metadata_path(cache_key, transform_name)[source]

Get path for metadata file.

Parameters:
  • cache_key (str) – Cache key hash

  • transform_name (str) – Name of transform

Returns:

Path to metadata .json file

Return type:

Path

save_transform(cache_key, data, transform_name, metadata=None)[source]

Save transform result to cache.

Parameters:
  • cache_key (str) – Cache key hash

  • data (ndarray) – Transform result array

  • transform_name (str) – Name of transform

  • metadata (dict | None) – Optional additional metadata

Returns:

Tuple of (cache_path, metadata_path) or (None, None) if disabled

Return type:

Tuple[Path | None, Path | None]

load_transform(cache_key, transform_name)[source]

Load transform result from cache.

Parameters:
  • cache_key (str) – Cache key hash

  • transform_name (str) – Name of transform

Returns:

Cached array or None if cache miss/disabled

Return type:

ndarray | None

register_dependency(child, upstream)[source]

Register a dependency between transforms.

Parameters:
  • child (str) – Name of dependent transform

  • upstream (str) – Name of upstream transform it depends on

Return type:

None

register_transform(name, upstream, params)[source]

Register a transform with its parameters.

Parameters:
  • name (str) – Transform name

  • upstream (str) – Name of upstream dependency

  • params (dict) – Transform parameters

Return type:

None

get_dependency_chain(transform_name)[source]

Get full dependency chain for a transform.

Parameters:

transform_name (str) – Name of transform

Returns:

List of transform names from root to target

Return type:

list[str]
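
A root-to-target chain can be recovered by walking upstream links, as in the sketch below. It assumes each transform has at most one upstream, stored in a plain `child -> upstream` dict; the function name `sketch_dependency_chain` and that storage layout are hypothetical.

```python
from typing import Dict, List


def sketch_dependency_chain(dependencies: Dict[str, str], transform_name: str) -> List[str]:
    """Walk child -> upstream links and return names ordered root to target."""
    chain = [transform_name]
    seen = {transform_name}
    current = transform_name
    while current in dependencies:
        current = dependencies[current]
        if current in seen:
            # Guard against a malformed graph looping forever.
            raise ValueError(f"circular dependency at {current!r}")
        seen.add(current)
        chain.append(current)
    chain.reverse()  # collected target-first, so flip to root-first
    return chain


deps = {"smooth": "reproject", "water_detect": "smooth"}
print(sketch_dependency_chain(deps, "water_detect"))
# ['reproject', 'smooth', 'water_detect']
```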

get_full_cache_key(transform_name, source_hash)[source]

Compute full cache key incorporating dependency chain.

Parameters:
  • transform_name (str) – Target transform name

  • source_hash (str) – Hash of original source data

Returns:

Cache key hash

Return type:

str

invalidate_downstream(transform_name)[source]

Invalidate all caches downstream of a transform.

Parameters:

transform_name (str) – Name of transform whose downstream should be invalidated

Returns:

Number of cache files deleted

Return type:

int
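
Finding what counts as "downstream" amounts to a graph traversal. The sketch below only collects the affected transform names from a hypothetical `child -> upstream` dict; it deletes no files, and whether the real method also clears the named transform itself is not stated here.

```python
from typing import Dict, List


def sketch_downstream(dependencies: Dict[str, str], transform_name: str) -> List[str]:
    """Return every transform that directly or indirectly depends on transform_name."""
    # Invert child -> upstream into upstream -> [children].
    children: Dict[str, List[str]] = {}
    for child, upstream in dependencies.items():
        children.setdefault(upstream, []).append(child)

    found: List[str] = []
    stack = [transform_name]
    seen = {transform_name}
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found
```

With `{"smooth": "reproject", "water_detect": "smooth"}`, invalidating `reproject` would affect both `smooth` and `water_detect`, while invalidating `water_detect` affects nothing else.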

PipelineCache

class src.terrain.cache.PipelineCache(cache_dir=None, enabled=True)[source]

Bases: object

Target-style caching for terrain processing pipelines.

Like a build system (Make, Bazel), this cache:

  • Tracks targets with defined parameters and dependencies

  • Computes cache keys that incorporate the FULL dependency chain

  • Ensures downstream targets are invalidated when upstream changes

  • Supports file inputs with mtime tracking

Example:

from src.terrain.cache import PipelineCache

cache = PipelineCache()
cache.define_target("dem_loaded", params={"path": "/data"})
cache.define_target("reprojected", params={"crs": "EPSG:32617"},
                    dependencies=["dem_loaded"])

# First run: cache miss
if cache.get_cached("reprojected") is None:
    data = expensive_operation()
    cache.save_target("reprojected", data)

# Second run (same params): cache hit
# If dem_loaded params change: cache miss (invalidated)

Parameters:
  • cache_dir (Path | None)

  • enabled (bool)

cache_dir

Directory where cache files are stored

enabled

Whether caching is enabled

targets

Dict of target definitions {name: {params, dependencies, file_inputs}}

High-level caching for full terrain processing pipelines.

What it caches:

  • Complete pipeline stage outputs

  • Multi-step processing chains

  • Score computations

Pipeline stages:

  • load: DEM loading and merging

  • transform: Geographic transforms (reproject, flip, downsample)

  • smooth: DEM smoothing operations

  • scores: Score grid computations

  • water: Water body detection

  • mesh: Final mesh generation

Example:

from src.terrain.cache import PipelineCache

cache = PipelineCache(cache_dir='.pipeline_cache')

# Cache pipeline stage
smoothed_dem = cache.get_or_run_stage(
    stage='smooth',
    compute_fn=lambda: run_smoothing_pipeline(dem),
    params={
        'wavelet': True,
        'adaptive': True,
        'bilateral': True
    }
)

Used in Combined Render: Full-Featured Example to avoid reprocessing expensive operations.

__init__(cache_dir=None, enabled=True)[source]

Initialize pipeline cache.

Parameters:
  • cache_dir (Path | None) – Directory for cache files. If None, uses .pipeline_cache/

  • enabled (bool) – Whether caching is enabled (default: True)

define_target(name, params, dependencies=None, file_inputs=None)[source]

Define a pipeline target with its parameters and dependencies.

Parameters:
  • name (str) – Unique name for this target

  • params (dict) – Parameters that affect the target’s output

  • dependencies (list[str] | None) – List of upstream target names this depends on

  • file_inputs (list[Path] | None) – List of file paths whose mtimes should be tracked

Raises:

ValueError – If adding this target would create a circular dependency

Return type:

None
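
The circular-dependency check can be pictured as a reachability test: a new target is rejected if any of its dependencies can already reach it. The sketch below stores targets in a plain dict and omits `file_inputs`; the name `sketch_define_target` and this storage layout are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def sketch_define_target(targets: Dict[str, dict], name: str, params: dict,
                         dependencies: Optional[List[str]] = None) -> None:
    """Register a target, refusing definitions that would form a cycle."""
    deps = list(dependencies or [])

    def reaches(start: str, goal: str, seen: set) -> bool:
        # Depth-first search along existing upstream links.
        if start == goal:
            return True
        if start in seen:
            return False
        seen.add(start)
        upstream = targets.get(start, {}).get("dependencies", [])
        return any(reaches(dep, goal, seen) for dep in upstream)

    for dep in deps:
        if reaches(dep, name, set()):
            raise ValueError(f"circular dependency between {name!r} and {dep!r}")
    # Only store the definition once every dependency has passed the check.
    targets[name] = {"params": params, "dependencies": deps}
```

Checking before storing means a rejected definition leaves the existing target graph untouched.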

compute_target_key(target_name)[source]

Compute cache key for a target, incorporating all upstream dependencies.

The key is a SHA256 hash that changes if:

  • Target’s own params change

  • Any upstream target’s params change

  • Any file inputs are modified

Parameters:

target_name (str) – Name of the target

Returns:

64-character hex SHA256 hash, or empty string if target undefined

Return type:

str
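
A recursive sketch of such a key is shown below, assuming targets live in a dict shaped like the documented `targets` attribute and that the graph is acyclic. The function name `sketch_target_key` and the JSON payload layout are illustrative choices, not the module's confirmed internals.

```python
import hashlib
import json
import os
from typing import Dict


def sketch_target_key(targets: Dict[str, dict], target_name: str) -> str:
    """Hash a target's params, its upstream keys, and its file mtimes."""
    target = targets.get(target_name)
    if target is None:
        return ""  # mirrors the documented empty string for undefined targets
    payload = {
        "name": target_name,
        "params": target.get("params", {}),
        # Recursing makes every upstream change ripple into this key.
        "upstream": [sketch_target_key(targets, dep)
                     for dep in target.get("dependencies", [])],
        # mtime of each tracked file; a missing file is recorded as None.
        "files": {
            str(path): (os.stat(path).st_mtime_ns if os.path.exists(path) else None)
            for path in target.get("file_inputs", [])
        },
    }
    text = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Because the upstream keys are part of the payload, no separate invalidation pass is needed: a changed upstream simply produces a key with no matching cache file.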

save_target(target_name, data, metadata=None)[source]

Save target output to cache.

Parameters:
  • target_name (str) – Name of the target

  • data – numpy array, or dict of arrays to cache

  • metadata (dict | None) – Optional additional metadata (can include Affine transforms)

Returns:

Path to cache file, or None if disabled

Return type:

Path | None

get_cached(target_name, return_metadata=False)[source]

Get cached target output if available.

Parameters:
  • target_name (str) – Name of the target

  • return_metadata (bool) – If True, return (data, metadata) tuple

Returns:

Cached data (array or dict of arrays), or None if cache miss. If return_metadata=True, returns (data, metadata) or (None, None)

clear_target(target_name)[source]

Clear cache files for a specific target.

Parameters:

target_name (str) – Name of target to clear

Returns:

Number of files deleted

Return type:

int

clear_all()[source]

Clear all cache files.

Returns:

Number of files deleted

Return type:

int

Cache Management

Cache directory structure:

```
.dem_cache/
├── dem_abc123.npz            # Cached DEM data
└── dem_abc123.json           # Metadata

.transform_cache/
├── reproject_def456.npz      # Cached transform results
└── reproject_def456.json     # Transform params

.pipeline_cache/
├── stage_smooth_ghi789.npz
└── stage_smooth_ghi789.json
```

Clearing cache:

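```python
# Clear all caches
import shutil

# ignore_errors=True avoids an exception when a cache directory does not exist yet
shutil.rmtree('.dem_cache', ignore_errors=True)
shutil.rmtree('.transform_cache', ignore_errors=True)
shutil.rmtree('.pipeline_cache', ignore_errors=True)
```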

Cache validation:

All caches automatically validate:

  • Source file changes (modification time)

  • Parameter changes

  • Data shape/dtype changes

Invalid cache entries are automatically regenerated.

Performance Notes

Typical speedups:

  • DEM loading: 50-100x faster (0.1s vs 5-10s for large merges)

  • Transforms: 10-50x faster (depends on complexity)

  • Pipeline stages: 100-1000x faster for multi-stage pipelines

Cache overhead:

  • Hash computation: ~10-50ms per cache lookup

  • Disk I/O: ~50-200ms for large DEMs

  • Memory: Minimal (uses memory-mapped arrays when possible)

When to use caching:

  • Development/iteration (same data, different parameters)

  • Batch processing (same DEM, multiple visualizations)

  • Testing (avoid reloading data between test runs)

When NOT to use caching:

  • One-off renders (cache overhead > compute time)

  • Rapidly changing source data

  • Limited disk space