Monitoring rice paddy fields is critical for food security, water management, and methane emission tracking. However, traditional optical satellite imagery (like Sentinel-2) is often unusable in tropical and subtropical regions due to persistent cloud cover during the monsoon cropping season.
This project implements an end-to-end Deep Learning pipeline to automatically segment paddy fields using Sentinel-1 SAR (Synthetic Aperture Radar) Ground Range Detected (GRD) data. By leveraging the unique backscatter temporal signature of rice during its flooding and growth stages, the model achieves high-precision mapping regardless of weather conditions.
- Multi-Temporal Fusion: Processes a 60-band data stack representing a 20-date cropping season. Each date includes VV, VH, and VV/VH ratio polarizations to capture the "double-bounce" scattering effect characteristic of rice stems in water.
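The stack layout described above (20 dates × VV, VH, VV/VH = 60 bands) can be sketched as follows; this is a minimal NumPy illustration assuming per-date VV and VH arrays are already co-registered, not the project's exact stacking code:

```python
import numpy as np

def build_season_stack(vv_dates, vh_dates, eps=1e-6):
    """Stack 20 dates x (VV, VH, VV/VH) into a (60, H, W) array.

    vv_dates, vh_dates: lists of 2D backscatter arrays, one per acquisition date.
    """
    bands = []
    for vv, vh in zip(vv_dates, vh_dates):
        # The VV/VH ratio emphasizes the double-bounce response of flooded rice
        bands.extend([vv, vh, vv / (vh + eps)])
    return np.stack(bands, axis=0)

# 20 synthetic dates of 4x4 imagery -> a 60-band stack
vv = [np.random.rand(4, 4) for _ in range(20)]
vh = [np.random.rand(4, 4) for _ in range(20)]
stack = build_season_stack(vv, vh)
print(stack.shape)  # (60, 4, 4)
```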
- Geospatial Preprocessing: SAR radiometric calibration, thermal noise removal, and terrain correction (orthorectification) with ESA SNAP, driven from Python via pyroSAR.
- MLOps Architecture:
- DVC (Data Version Control): Manages heavy 60-band GeoTIFF stacks and model weights without bloating the Git repository.
- MLflow: Tracks hyperparameter experiments, loss curves, and evaluation metrics (IoU, F1-Score).
- High Performance: Optimized a U-Net + ResNet34 architecture to achieve a 0.85 IoU, successfully filtering "speckle noise" inherent in SAR data.
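The IoU and F1 scores logged to MLflow can be computed from binary masks as below; this is a minimal NumPy sketch of the metric definitions, not the project's evaluation code:

```python
import numpy as np

def iou_f1(pred, target):
    """Compute IoU and F1 for binary (0/1) segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    iou = inter / union if union else 1.0
    f1 = 2 * inter / total if total else 1.0
    return iou, f1

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
iou, f1 = iou_f1(pred, target)
print(round(iou, 3), round(f1, 3))  # 0.5 0.667
```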
- Sentinel-1 SAR: Multi-temporal C-band data (20 dates).
- JAXA LULC Map: Used for automated ground-truth label generation (Rice vs. Non-Rice).
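Deriving a binary label from a LULC raster reduces to a class-code comparison. A sketch (the `RICE_CLASS` code here is a placeholder, not the actual JAXA class value used in `labeling.py`):

```python
import numpy as np

RICE_CLASS = 7  # placeholder: substitute the paddy class code of the JAXA product

def lulc_to_binary(lulc):
    """Map a LULC class raster to a binary Rice / Non-Rice mask."""
    return (lulc == RICE_CLASS).astype(np.uint8)

lulc = np.array([[7, 3], [7, 1]])
mask = lulc_to_binary(lulc)
print(mask)  # [[1 0] [1 0]]
```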
- ROI: Niigata, Japan (High-intensity rice production region).
- Python 3.10 or 3.11
- NVIDIA GPU with CUDA support (Recommended)
Construct a virtual environment and install the required dependencies:
```shell
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

On Windows systems, you might encounter `OSError: [WinError 1114]` when importing `torch`. This is resolved by manually loading `c10.dll`. The following snippet is included in the project scripts:
```python
import os
import platform
import ctypes
from importlib.util import find_spec

if platform.system() == "Windows":
    try:
        if (spec := find_spec("torch")) and spec.origin:
            dll_path = os.path.join(os.path.dirname(spec.origin), "lib", "c10.dll")
            if os.path.exists(dll_path):
                ctypes.CDLL(os.path.normpath(dll_path))
    except Exception:
        pass
```

We use specific stable versions to ensure compatibility on Windows:
```
torch==2.5.1+cu121
torchvision==0.20.1+cu121
torchgeo==0.6.2
```
- `src/`: Source code for the pipeline stages.
  - `training.py`: Handles tiling and model training.
  - `testing.py`: Runs inference on test images and generates visualizations.
  - `labeling.py`: Prepares binary labels from external data.
- `data/`: Data directory (standardized structure).
  - `raw/`: Unprocessed input data.
  - `processed/`: Features, stacks, and labels.
  - `external/`: External shapefiles and validation data.
- `config.yaml`: Centralized configuration for paths and training parameters.
- `dvc.yaml`: DVC pipeline definition.
The project uses DVC to manage the workflow:
- Prepare Labels:
  ```shell
  python src/labeling.py
  ```
  - Merges source TIFs and creates binary masks/GeoJSON.
- Train Model:
  ```shell
  python src/training.py
  ```
  - Generates tiles (patches) from input rasters and labels.
  - Trains the segmentation model.
- Inference (Experimental):
  ```shell
  python src/testing.py
  ```
  - Performs semantic segmentation on test images.
  - Orthorectifies results and generates split-map visualizations.
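The stages above could be wired together in `dvc.yaml` roughly as follows; stage names, dependencies, and paths here are illustrative, so refer to the actual `dvc.yaml` in the repository:

```yaml
stages:
  labeling:
    cmd: python src/labeling.py
    deps:
      - src/labeling.py
      - data/external
    outs:
      - data/processed/labels
  training:
    cmd: python src/training.py
    deps:
      - src/training.py
      - data/processed
    params:
      - tile_size
      - epochs
      - batch_size
    outs:
      - models
```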
Modify config.yaml to adjust training parameters:
- `tile_size`: Size of the input patches (default: 512).
- `epochs`: Number of full passes over the training data.
- `batch_size`: Number of samples per training step.
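Splitting a large raster into `tile_size` patches, as `training.py` does before training, can be sketched like this (a simplified version that drops incomplete edge tiles; the real script may handle padding and overlap differently):

```python
import numpy as np

def make_tiles(raster, tile_size=512):
    """Split a (C, H, W) raster into non-overlapping (C, tile_size, tile_size) patches."""
    c, h, w = raster.shape
    tiles = []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(raster[:, y:y + tile_size, x:x + tile_size])
    return tiles

# Small synthetic 60-band stack; tile_size is 512 in the actual config
raster = np.zeros((60, 64, 96), dtype=np.uint8)
tiles = make_tiles(raster, tile_size=32)
print(len(tiles), tiles[0].shape)  # 6 (60, 32, 32)
```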
By default, large data files and models are stored in:
- Project root (`data/`, `models/`)
- External drive (if configured in `training.py`, e.g., `G:\data`)
To run the full pipeline through DVC:
```shell
dvc repro
```

To run individual steps:

```shell
python src/training.py
python src/testing.py
```