GPU Operations Module

GPU-accelerated terrain processing operations using PyTorch.

This module provides GPU-accelerated versions of common terrain processing operations. Functions automatically use CUDA when available, falling back to CPU otherwise.

All functions accept numpy arrays and return numpy arrays for easy integration.

Requirements: PyTorch with CUDA support (pip install torch)

Slope Calculation

src.terrain.gpu_ops.gpu_horn_slope(dem)[source]

Calculate slope magnitude using Horn’s method with GPU acceleration.

Uses PyTorch’s F.conv2d for efficient convolution on GPU. Produces identical results to scipy.ndimage.convolve with Horn’s kernels.

Parameters:

dem (ndarray) – 2D elevation data (H, W). Can contain NaN values.

Returns:

Slope magnitude array (same shape as input). NaN values preserved.

Return type:

ndarray

Calculate slope magnitude using Horn’s method with GPU acceleration.

Performance:

  • GPU (CUDA): ~7x faster than CPU

  • Typical: 100ms for 4096×4096 DEM on GPU vs 700ms on CPU

How it works:

  • Uses PyTorch’s F.conv2d for efficient convolution

  • Sobel-like kernels for gradient estimation

  • Handles NaN values via interpolation

Example:

from src.terrain.gpu_ops import gpu_horn_slope

# Calculate slopes (auto-detects GPU)
slopes = gpu_horn_slope(dem_data)

# Results identical to scipy.ndimage implementation
print(f"Max slope: {slopes.max():.2f}°")

Used by horn_slope().

Filtering Operations

src.terrain.gpu_ops.gpu_gaussian_blur(data, sigma)[source]

Apply Gaussian blur using GPU acceleration.

Uses separable 1D convolutions for efficiency. Produces results very similar to scipy.ndimage.gaussian_filter.

Parameters:
  • data (ndarray) – 2D input array (H, W). Can contain NaN values.

  • sigma (float) – Standard deviation of Gaussian kernel.

Returns:

edges of NaN regions may have partial values, centers remain NaN.

Return type:

Blurred array (same shape as input). NaN handling

GPU-accelerated Gaussian blur using separable convolution.

Performance:

  • GPU: ~10-20x faster than scipy.ndimage for large arrays

  • Uses separable kernels for efficiency (2 passes vs 2D kernel)

Example:

from src.terrain.gpu_ops import gpu_gaussian_blur

# Smooth DEM with sigma=2.0
smoothed = gpu_gaussian_blur(dem_data, sigma=2.0)
src.terrain.gpu_ops.gpu_median_filter(data, kernel_size=3)[source]

Apply median filter using GPU acceleration.

Uses unfold to extract sliding windows, then computes median of each. Produces identical results to scipy.ndimage.median_filter.

Parameters:
  • data (ndarray) – 2D input array (H, W). Can contain NaN values.

  • kernel_size (int) – Size of the median filter kernel (odd number).

Returns:

Filtered array (same shape as input). NaN regions preserved.

Return type:

ndarray

GPU-accelerated median filter for noise removal.

Performance:

  • GPU: ~5-10x faster than scipy.ndimage

  • Uses PyTorch’s unfold + median for efficiency

Example:

from src.terrain.gpu_ops import gpu_median_filter

# Remove salt-and-pepper noise
cleaned = gpu_median_filter(dem_data, kernel_size=3)
src.terrain.gpu_ops.gpu_max_filter(data, kernel_size=3)[source]

Apply maximum filter (dilation) using GPU acceleration.

Uses max_pool2d for efficient GPU computation. Produces identical results to scipy.ndimage.maximum_filter.

Parameters:
  • data (ndarray) – 2D input array (H, W).

  • kernel_size (int) – Size of the filter kernel (odd number).

Returns:

Filtered array (same shape as input).

Return type:

ndarray

GPU-accelerated maximum filter (morphological dilation).

Performance:

  • GPU: ~15-25x faster than scipy.ndimage

  • Uses max_pool2d for optimal GPU utilization

Example:

from src.terrain.gpu_ops import gpu_max_filter

# Morphological dilation
dilated = gpu_max_filter(dem_data, kernel_size=5)
src.terrain.gpu_ops.gpu_min_filter(data, kernel_size=3)[source]

Apply minimum filter (erosion) using GPU acceleration.

Uses max_pool2d on negated data for efficient GPU computation. Produces identical results to scipy.ndimage.minimum_filter.

Parameters:
  • data (ndarray) – 2D input array (H, W).

  • kernel_size (int) – Size of the filter kernel (odd number).

Returns:

Filtered array (same shape as input).

Return type:

ndarray

GPU-accelerated minimum filter (morphological erosion).

Performance:

  • GPU: ~15-25x faster than scipy.ndimage

  • Uses max_pool2d on negated data

Example:

from src.terrain.gpu_ops import gpu_min_filter

# Morphological erosion
eroded = gpu_min_filter(dem_data, kernel_size=5)

Device Management

src.terrain.gpu_ops._get_device()[source]

Get the best available device (CUDA > CPU).

Get best available device (CUDA > CPU).

Automatically detects CUDA availability.

Example:

from src.terrain.gpu_ops import _get_device

device = _get_device()
print(f"Using device: {device}")  # "cuda" or "cpu"

Performance Notes

GPU vs CPU speedups:

Operation

Array Size

GPU Speedup

Horn slope Gaussian blur Median filter Max/Min filter

4096² 4096² 4096² 4096²

7x 10-20x 5-10x 15-25x

Memory requirements:

  • GPU operations require ~3-4x array size in VRAM

  • 4096×4096 float32 array: ~64MB → ~200MB VRAM needed

  • Most operations fall back to CPU if VRAM insufficient

When to use GPU ops:

  • Large arrays (>2048×2048)

  • Repeated operations (amortize data transfer)

  • Batch processing

  • Real-time/interactive applications

When NOT to use:

  • Small arrays (<1024×1024) - overhead > speedup

  • Limited VRAM

  • CPU-only systems (auto-fallback but no benefit)

Integration Example

Integrate GPU ops into terrain pipeline:

from src.terrain.gpu_ops import gpu_horn_slope, gpu_gaussian_blur
from src.terrain.transforms import slope_adaptive_smooth

# Load DEM
dem = load_dem_files('data/hgt')

# GPU-accelerated slope calculation
slopes = gpu_horn_slope(dem)

# GPU-accelerated smoothing
dem_smooth = gpu_gaussian_blur(dem, sigma=2.0)

# Use in pipeline
terrain = Terrain(dem_smooth, transform)
# ...

See Also