GPU Operations Module
GPU-accelerated terrain processing operations using PyTorch.
This module provides GPU-accelerated versions of common terrain processing operations. Functions automatically use CUDA when available, falling back to CPU otherwise.
All functions accept numpy arrays and return numpy arrays for easy integration.
Requirements: PyTorch (pip install torch). A CUDA-enabled build is needed for GPU acceleration; without one the functions run on the CPU.
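The NumPy-in, NumPy-out pattern with automatic device selection described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `pick_device` and `run_on_device` are hypothetical names introduced here, and only `torch.cuda.is_available()`, `torch.from_numpy()` and `Tensor.cpu().numpy()` are real PyTorch calls.

```python
import numpy as np
import torch


def pick_device() -> torch.device:
    """Return the CUDA device when one is usable, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def run_on_device(data: np.ndarray, op) -> np.ndarray:
    """Hypothetical helper: NumPy in, tensor op on the chosen device, NumPy out."""
    device = pick_device()
    # Convert to a contiguous float32 array and move it to the device.
    tensor = torch.from_numpy(np.ascontiguousarray(data, dtype=np.float32)).to(device)
    result = op(tensor)  # op receives and returns a torch.Tensor
    # Bring the result back to host memory before converting to NumPy.
    return result.cpu().numpy()


doubled = run_on_device(np.arange(6, dtype=np.float32).reshape(2, 3), lambda t: t * 2)
```

The same code path runs on a CPU-only machine; only the device chosen by `pick_device` changes.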
Slope Calculation
- src.terrain.gpu_ops.gpu_horn_slope(dem)[source]
Calculate slope magnitude using Horn’s method with GPU acceleration.
Uses PyTorch’s F.conv2d for efficient convolution on GPU. Produces identical results to scipy.ndimage.convolve with Horn’s kernels.
- Parameters:
dem (ndarray) – 2D elevation data (H, W). Can contain NaN values.
- Returns:
Slope magnitude array (same shape as input). NaN values preserved.
- Return type:
ndarray
Performance:
GPU (CUDA): ~7x faster than CPU
Typical: 100ms for 4096×4096 DEM on GPU vs 700ms on CPU
How it works:
Uses PyTorch’s F.conv2d for efficient convolution
Sobel-like kernels for gradient estimation
Handles NaN values via interpolation
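The steps above can be sketched roughly as below. This is an illustration under stated assumptions, not the implementation of `gpu_horn_slope`: the function name `horn_slope_sketch` is hypothetical, a unit cell size is assumed, the output is the raw gradient magnitude (rise over run), and NaN gaps are filled with the mean of the valid cells as a simple stand-in for the interpolation the module performs. Note that `F.conv2d` computes a cross-correlation (no kernel flip); for Horn's antisymmetric kernels this only flips the gradient sign, which leaves the magnitude unchanged.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Horn's 3x3 gradient kernels (Sobel-like weights, normalized by 8).
KX = torch.tensor([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]]) / 8.0
KY = KX.t()


def horn_slope_sketch(dem: np.ndarray) -> np.ndarray:
    """Gradient magnitude per cell (unit cell size assumed)."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    z = torch.from_numpy(dem.astype(np.float32)).to(device)
    nan_mask = torch.isnan(z)
    # Assumption: fill gaps with the valid mean instead of true interpolation,
    # so slopes next to NaN cells are only approximate.
    if (~nan_mask).any():
        fill = z[~nan_mask].mean()
    else:
        fill = torch.tensor(0.0, device=device)
    z = torch.where(nan_mask, fill, z)
    # conv2d expects (batch, channels, H, W); replicate-pad to keep the shape.
    z4 = F.pad(z[None, None], (1, 1, 1, 1), mode="replicate")
    kernels = torch.stack([KX, KY])[:, None].to(device)  # shape (2, 1, 3, 3)
    grads = F.conv2d(z4, kernels)[0]                      # shape (2, H, W)
    slope = torch.sqrt(grads[0] ** 2 + grads[1] ** 2)
    slope[nan_mask] = float("nan")                        # NaN cells stay NaN
    return slope.cpu().numpy()


# A plane rising 2 elevation units per cell along x.
plane = np.tile(2.0 * np.arange(8, dtype=np.float32), (8, 1))
plane_slope = horn_slope_sketch(plane)  # interior cells evaluate to 2.0
```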
Example:
from src.terrain.gpu_ops import gpu_horn_slope
# Calculate slopes (auto-detects GPU)
slopes = gpu_horn_slope(dem_data)
# Results identical to scipy.ndimage implementation
print(f"Max slope: {slopes.max():.2f}°")
Used by horn_slope().
Filtering Operations
- src.terrain.gpu_ops.gpu_gaussian_blur(data, sigma)[source]
Apply Gaussian blur using GPU acceleration.
Uses separable 1D convolutions for efficiency. Produces results very similar to scipy.ndimage.gaussian_filter.
- Parameters:
data (ndarray) – Input array.
sigma – Standard deviation of the Gaussian kernel.
- Returns:
Blurred array (same shape as input). NaN handling: edges of NaN regions may have partial values, centers remain NaN.
- Return type:
ndarray
GPU-accelerated Gaussian blur using separable convolution.
Performance:
GPU: ~10-20x faster than scipy.ndimage for large arrays
Uses separable kernels for efficiency (2 passes vs 2D kernel)
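The separability argument can be checked independently of the GPU. The NumPy-only sketch below shows that two 1D passes give the same result as one pass with the full 2D kernel, while touching 2(2r+1) instead of (2r+1)² values per pixel. The function names, the `truncate` cutoff of 4 sigma, and the edge-replicated borders are assumptions made for this illustration; the module's own kernel radius and border handling are not documented here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel_1d(sigma: float, truncate: float = 4.0) -> np.ndarray:
    """Normalized 1D Gaussian; radius = truncate * sigma is an assumed cutoff."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def blur_separable(data: np.ndarray, sigma: float) -> np.ndarray:
    """Two 1D passes: along rows, then along columns."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(data, dtype=np.float64), r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def blur_full_2d(data: np.ndarray, sigma: float) -> np.ndarray:
    """One pass with the full 2D kernel (outer product of the 1D kernel)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(data, dtype=np.float64), r, mode="edge")
    windows = sliding_window_view(padded, (len(k), len(k)))
    return (windows * np.outer(k, k)).sum(axis=(-2, -1))


sample = np.random.default_rng(0).random((12, 10))
same = np.allclose(blur_separable(sample, 1.0), blur_full_2d(sample, 1.0))
```

For sigma=2.0 the assumed cutoff gives a 17-tap kernel, so the separable form reads 34 values per pixel where the 2D form reads 289.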
Example:
from src.terrain.gpu_ops import gpu_gaussian_blur
# Smooth DEM with sigma=2.0
smoothed = gpu_gaussian_blur(dem_data, sigma=2.0)
- src.terrain.gpu_ops.gpu_median_filter(data, kernel_size=3)[source]
Apply median filter using GPU acceleration.
Uses unfold to extract sliding windows, then computes median of each. Produces identical results to scipy.ndimage.median_filter.
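- Parameters:
data (ndarray) – Input array.
kernel_size (int) – Size of the filter window. Defaults to 3.
- Returns:
Filtered array (same shape as input). NaN regions preserved.
- Return type:
ndarray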
GPU-accelerated median filter for noise removal.
Performance:
GPU: ~5-10x faster than scipy.ndimage
Uses PyTorch’s unfold + median for efficiency
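A rough sketch of the unfold-then-median idea follows. It is not the module's code: `median_filter_sketch` is a hypothetical name, the replicate border mode is an assumption, odd kernel sizes are required, and the NaN preservation described above is left out for brevity.

```python
import numpy as np
import torch
import torch.nn.functional as F


def median_filter_sketch(data: np.ndarray, kernel_size: int = 3) -> np.ndarray:
    """Median over each kernel_size x kernel_size window (odd sizes only)."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    pad = kernel_size // 2
    x = torch.from_numpy(data.astype(np.float32)).to(device)[None, None]
    # Replicate-pad so every pixel has a full window (assumed border mode).
    x = F.pad(x, (pad, pad, pad, pad), mode="replicate")
    # unfold -> (1, k*k, H*W): one column of window values per pixel.
    windows = F.unfold(x, kernel_size=kernel_size)
    med = windows.median(dim=1).values  # shape (1, H*W)
    return med.reshape(data.shape).cpu().numpy()


# A single spike in a flat field is removed by a 3x3 median.
spiky = np.zeros((5, 5), dtype=np.float32)
spiky[2, 2] = 100.0
despiked = median_filter_sketch(spiky, kernel_size=3)
```

Because `unfold` materializes every window, memory use grows with the square of `kernel_size`, which is one reason a sketch like this suits small kernels best.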
Example:
from src.terrain.gpu_ops import gpu_median_filter
# Remove salt-and-pepper noise
cleaned = gpu_median_filter(dem_data, kernel_size=3)
- src.terrain.gpu_ops.gpu_max_filter(data, kernel_size=3)[source]
Apply maximum filter (dilation) using GPU acceleration.
Uses max_pool2d for efficient GPU computation. Produces identical results to scipy.ndimage.maximum_filter.
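- Parameters:
data (ndarray) – Input array.
kernel_size (int) – Size of the filter window. Defaults to 3.
- Returns:
Filtered array (same shape as input).
- Return type:
ndarray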
GPU-accelerated maximum filter (morphological dilation).
Performance:
GPU: ~15-25x faster than scipy.ndimage
Uses max_pool2d for optimal GPU utilization
Example:
from src.terrain.gpu_ops import gpu_max_filter
# Morphological dilation
dilated = gpu_max_filter(dem_data, kernel_size=5)
- src.terrain.gpu_ops.gpu_min_filter(data, kernel_size=3)[source]
Apply minimum filter (erosion) using GPU acceleration.
Uses max_pool2d on negated data for efficient GPU computation. Produces identical results to scipy.ndimage.minimum_filter.
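- Parameters:
data (ndarray) – Input array.
kernel_size (int) – Size of the filter window. Defaults to 3.
- Returns:
Filtered array (same shape as input).
- Return type:
ndarray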
GPU-accelerated minimum filter (morphological erosion).
Performance:
GPU: ~15-25x faster than scipy.ndimage
Uses max_pool2d on negated data
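The negation trick relies on the identity min(x) = -max(-x). The sketch below shows a stride-1 `max_pool2d` used as a maximum filter and a minimum filter built on top of it. The function names are hypothetical, odd kernel sizes are assumed, and NaN handling is not covered.

```python
import numpy as np
import torch
import torch.nn.functional as F


def max_filter_sketch(data: np.ndarray, kernel_size: int = 3) -> np.ndarray:
    """Sliding-window maximum via max_pool2d with stride 1 (odd sizes only)."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.from_numpy(data.astype(np.float32)).to(device)[None, None]
    # max_pool2d pads implicitly with -inf, so border windows only see real pixels.
    out = F.max_pool2d(x, kernel_size=kernel_size, stride=1, padding=kernel_size // 2)
    return out[0, 0].cpu().numpy()


def min_filter_sketch(data: np.ndarray, kernel_size: int = 3) -> np.ndarray:
    """Sliding-window minimum: negate, take the maximum, negate back."""
    return -max_filter_sketch(-data, kernel_size)


grid = np.array([[1, 2, 3],
                 [4, 9, 6],
                 [7, 8, 5]], dtype=np.float32)
dilated_grid = max_filter_sketch(grid, 3)  # every 3x3 window contains the 9
eroded_grid = min_filter_sketch(grid, 3)
```

Reusing the pooling kernel for both filters means erosion needs no separate implementation, at the cost of two cheap elementwise negations.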
Example:
from src.terrain.gpu_ops import gpu_min_filter
# Morphological erosion
eroded = gpu_min_filter(dem_data, kernel_size=5)
Device Management
Performance Notes
GPU vs CPU speedups:
| Operation | Array Size | GPU Speedup |
|---|---|---|
| Horn slope | 4096² | 7x |
| Gaussian blur | 4096² | 10-20x |
| Median filter | 4096² | 5-10x |
| Max/Min filter | 4096² | 15-25x |
Memory requirements:
GPU operations require ~3-4x array size in VRAM
4096×4096 float32 array: ~64MB → ~200MB VRAM needed
Most operations fall back to CPU if VRAM is insufficient
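The memory figures above follow from simple arithmetic, sketched here. The helper names are hypothetical; the 3-4x overhead factor is the one quoted in the list, and sizes are reported in MiB (2**20 bytes).

```python
def array_megabytes(height: int, width: int, bytes_per_value: int = 4) -> float:
    """Size of a dense array in MiB; float32 uses 4 bytes per value."""
    return height * width * bytes_per_value / 2**20


def vram_estimate_mb(height: int, width: int, overhead=(3, 4)) -> tuple:
    """Low and high VRAM estimates using the 3-4x rule of thumb."""
    base = array_megabytes(height, width)
    return base * overhead[0], base * overhead[1]


base_mb = array_megabytes(4096, 4096)          # 64.0
low_mb, high_mb = vram_estimate_mb(4096, 4096)  # 192.0 and 256.0
```

The quoted ~200MB sits at the low end of this 192-256 MiB range.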
When to use GPU ops:
Large arrays (>2048×2048)
Repeated operations (amortize data transfer)
Batch processing
Real-time/interactive applications
When NOT to use:
Small arrays (<1024×1024) - overhead > speedup
Limited VRAM
CPU-only systems (auto-fallback but no benefit)
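The guidance above can be turned into a small dispatch heuristic. This is only a sketch: `should_use_gpu` is a hypothetical helper, not part of the module, and the handling of sizes between 1024×1024 and 2048×2048 (which the lists above do not cover) is an assumption that the GPU pays off there only when transfer cost is amortized over repeated operations.

```python
LARGE_PIXELS = 2048 * 2048  # above this, the GPU is recommended
SMALL_PIXELS = 1024 * 1024  # below this, overhead exceeds the speedup


def should_use_gpu(shape, cuda_available: bool, repeated: bool = False) -> bool:
    """Heuristic based on array size, CUDA availability, and workload."""
    if not cuda_available:
        return False  # auto-fallback works, but there is no benefit
    pixels = shape[0] * shape[1]
    if pixels > LARGE_PIXELS:
        return True
    if pixels < SMALL_PIXELS:
        return False
    # Assumption: in-between sizes only benefit when transfers are amortized.
    return repeated


use_gpu_large = should_use_gpu((4096, 4096), cuda_available=True)
use_gpu_small = should_use_gpu((512, 512), cuda_available=True)
```

In practice `cuda_available` would come from `torch.cuda.is_available()`, and the thresholds are worth re-measuring on the target hardware.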
Integration Example
Integrate GPU ops into a terrain pipeline:
from src.terrain.gpu_ops import gpu_horn_slope, gpu_gaussian_blur
from src.terrain.transforms import slope_adaptive_smooth
# NOTE: load_dem_files, Terrain, and transform are not imported or defined
# in this snippet; they are assumed to come from the surrounding pipeline code.
# Load DEM
dem = load_dem_files('data/hgt')
# GPU-accelerated slope calculation
slopes = gpu_horn_slope(dem)
# GPU-accelerated smoothing
dem_smooth = gpu_gaussian_blur(dem, sigma=2.0)
# Use in pipeline
terrain = Terrain(dem_smooth, transform)
# ...
See Also
Advanced Visualization Module - Uses gpu_horn_slope for slope calculation
Transforms Module - CPU-based transform functions