NASA PACE Hyperspectral Implementation and Ocean Color Radiometry¶
This reference implementation demonstrates high-dimensional hyperspectral processing pipelines utilizing NASA PACE (Plankton, Aerosol, Cloud, ocean Ecosystem) OCI (Ocean Color Instrument) data.
Spectral Resolution and Ocean Ecosystem Applications¶
The Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission delivers continuous spectral coverage (from 340 nm to 890 nm with 5 nm spacing, plus shortwave infrared bands). This high-dimensional observation capability supports advanced biogeochemical retrievals, including discrimination of phytoplanktons, detritus absorption coefficients, and dissolved organic carbon dynamics over complex inland and marine aquatic profiles.
Core Objectives¶
- Multidimensional Data Ingest: Mapping PACE OCI Level-2 hyperspectral ocean color cubes into coordinate-labeled
xarrayandrioxarraystructures. - Spectral Band Characterization: Analyzing the structures of 100+ band spectral cubes to identify spatial and spectral subsetting boundaries.
- Endmember Signature Extraction: Programmatically capturing water leaving reflectance spectra across diverse marine, turbid, and clear aquatics formations.
- Bio-optical Algorithms Implementation: Evaluating algal concentrations through ratio-derived Chlorophyll-a retrieval formulations.
- Interactive Visualization Pipelines: Constructing interactive multi-layer cartographic visualizations for dynamic ocean color exploration.
Architectural Summary¶
- Dimensional Abstraction: High-resolution ocean radiometry is parameterized as multi-dimensional coordinate arrays ($\text{reflectance} = f(\text{latitude}, \text{longitude}, \text{wavelength})$).
- Geophysical Validation: Abundance and concentrations are evaluated against in-situ AERONET-OC and validation networks.
# Import libraries
import numpy as np
import pandas as pd
import xarray as xr
import rioxarray as rxr
import hvplot.pandas
import hvplot.xarray
import geoviews as gv
from pathlib import Path
# NASA Earthdata access
import earthaccess
Part 1: Access PACE Data¶
PACE OCI (Ocean Color Instrument) provides hyperspectral ocean color measurements. We'll use Level-2 data (surface reflectance, already atmospherically corrected).
Authentication: You need NASA Earthdata account (free).
Part 2: Load Hyperspectral Cube¶
PACE data is in NetCDF4 format. We'll use xarray (your preferred tool) to load the hyperspectral reflectance data.
data_dir_l2 = Path("/home/bny/data/pace_l2")
# Find L2 file that MATCHES the L1B timestamp
l2_files = list(data_dir_l2.glob("PACE_OCI*.nc"))
l2_files
# Load PACE NetCDF file
# pace_file = files[0]
pace_file = l2_files[0]
ds = xr.open_dataset(pace_file, group='geophysical_data')
# Also load navigation data for lat/lon
nav = xr.open_dataset(pace_file, group='navigation_data')
print("Dataset structure:")
print(ds)
print(f"\nData variables: {list(ds.data_vars)}")
# start here to use same data downloaded into 13_b
# Remote sensing reflectance (Rrs) is key variable
# This is surface reflectance - atmospherically corrected
rrs = ds['Rrs'] # Shape: (wavelengths, pixels_along_track, pixels_across_track)
print(f"Rrs shape: {rrs.shape}")
print(f"Dimensions: {rrs.dims}")
print(f"Wavelengths: {rrs.wavelength_3d.values[:10]}... (showing first 10)")
print(f"Number of spectral bands: {len(rrs.wavelength_3d)}")
rrs
Part 3: Explore Hyperspectral Cube Structure¶
This is a true hyperspectral cube: (wavelengths, y, x) with 200+ continuous spectral bands.
# Wavelength coverage
# Load actual wavelength values from sensor_band_parameters group
sensor = xr.open_dataset(pace_file, group='sensor_band_parameters')
wavelengths_full = sensor['wavelength'].values
# Rrs only contains first 172 bands (visible to near-IR range)
wavelengths = wavelengths_full[:rrs.shape[2]]
print(f"Spectral range: {wavelengths.min():.1f} - {wavelengths.max():.1f} nm")
print(f"Total bands: {len(wavelengths)}")
print(f"Approximate spectral resolution: {np.median(np.diff(wavelengths)):.2f} nm")
# Spatial extent
lat = nav['latitude']
lon = nav['longitude']
print(f"\nSpatial extent:")
print(f"Latitude: {lat.min().values:.2f} to {lat.max().values:.2f}")
print(f"Longitude: {lon.min().values:.2f} to {lon.max().values:.2f}")
Part 4: Create RGB Composite¶
Select bands closest to RGB wavelengths to visualize the scene.
# Find bands closest to RGB
# Manual nearest neighbor since wavelength_3d is not indexed
red_idx = np.abs(wavelengths - 650).argmin()
green_idx = np.abs(wavelengths - 550).argmin()
blue_idx = np.abs(wavelengths - 450).argmin()
red_band = rrs.isel(wavelength_3d=red_idx)
green_band = rrs.isel(wavelength_3d=green_idx)
blue_band = rrs.isel(wavelength_3d=blue_idx)
print(f"Selected bands:")
print(f"Red: {wavelengths[red_idx]:.1f} nm")
print(f"Green: {wavelengths[green_idx]:.1f} nm")
print(f"Blue: {wavelengths[blue_idx]:.1f} nm")
# Stack into RGB
rgb = xr.concat([red_band, green_band, blue_band], dim='band')
rgb = rgb.transpose('number_of_lines', 'pixels_per_line', 'band')
# Visualize RGB composite
# Display each band separately for now (hvplot.rgb has dimension requirements)
rgb_norm = (rgb - rgb.min()) / (rgb.max() - rgb.min())
# Show red band as example
red_display = rgb_norm.isel(band=0)
red_display.hvplot.image(
x='pixels_per_line',
y='number_of_lines',
cmap='Reds',
title='PACE Red Band (650 nm)',
width=600,
height=500
)
Part 5: Extract Spectral Signatures¶
Select representative pixels to extract full hyperspectral signatures for different water types.
# Select 3 representative pixels (adjust indices based on your scene)
# Look for: clear water, turbid water, potential algae bloom areas
# Example coordinates (modify based on your RGB visualization)
pixel_clear = rrs.isel(number_of_lines=500, pixels_per_line=800)
pixel_turbid = rrs.isel(number_of_lines=800, pixels_per_line=400)
pixel_coastal = rrs.isel(number_of_lines=300, pixels_per_line=600)
print(f"Extracted spectral signatures for 3 water types")
# Plot spectral signatures with hvplot
# Create dataframe for plotting
df = pd.DataFrame({
'wavelength': wavelengths,
'Clear Water': pixel_clear.values,
'Turbid Water': pixel_turbid.values,
'Coastal': pixel_coastal.values
})
df.hvplot.line(
x='wavelength',
y=['Clear Water', 'Turbid Water', 'Coastal'],
title='PACE Hyperspectral Signatures - Ocean Water Types',
xlabel='Wavelength (nm)',
ylabel='Remote Sensing Reflectance (sr⁻¹)',
width=700,
height=400,
legend='top_right'
)
Part 6: Water Quality Algorithm - Chlorophyll-a¶
Implement simple band ratio algorithm for chlorophyll-a concentration estimation.
Algorithm: Blue-Green ratio is commonly used for ocean chlorophyll detection.
- High chlorophyll → absorption in blue, reflection in green
- Low chlorophyll → higher blue reflectance
# Simple chlorophyll indicator: Green/Blue ratio
# Higher ratio often indicates higher chlorophyll
blue_idx = np.abs(wavelengths - 490).argmin()
green_idx = np.abs(wavelengths - 560).argmin()
blue_490 = rrs.isel(wavelength_3d=blue_idx)
green_560 = rrs.isel(wavelength_3d=green_idx)
print(f"Using bands: {wavelengths[blue_idx]:.1f} nm (blue), {wavelengths[green_idx]:.1f} nm (green)")
# Calculate ratio
chlor_indicator = green_560 / blue_490
# Replace inf and very large values with NaN
chlor_indicator = chlor_indicator.where(np.isfinite(chlor_indicator))
print(f"Chlorophyll indicator range: {chlor_indicator.min().values:.3f} - {chlor_indicator.max().values:.3f}")
# Visualize chlorophyll indicator with hvplot
chlor_indicator.hvplot.image(
x='pixels_per_line',
y='number_of_lines',
cmap='YlGn',
title='Chlorophyll-a Indicator (Green/Blue Ratio)',
width=600,
height=500,
clabel='Ratio',
clim=(0.5, 2.0) # Adjust limits based on your data
)
Part 7: Compare to PACE Standard Product¶
PACE dataset includes standard chlorophyll-a product. Let's compare our simple algorithm to it.
# Load PACE's chlorophyll product (if available in your dataset)
if 'chlor_a' in ds.data_vars:
chlor_a_pace = ds['chlor_a']
chlor_a_pace.hvplot.image(
x='pixels_per_line',
y='number_of_lines',
cmap='YlGn',
title='PACE Standard Chlorophyll-a Product',
width=600,
height=500,
clabel='Chlor-a (mg/m³)',
logz=True # Often log-scaled for chlorophyll
)
else:
print("Chlorophyll product not in this data file. Load from BGC group if needed.")
Part 8: Spectral Feature Analysis¶
Hyperspectral advantage: Detect diagnostic absorption/reflection features.
Key ocean color features:
- ~440 nm: Chlorophyll-a absorption peak (blue)
- ~490 nm: Chlorophyll-a absorption shoulder
- ~555 nm: Chlorophyll-a reflection peak (green)
- ~665 nm: Chlorophyll-a absorption (red)
# Plot spectral region highlighting chlorophyll features
df_subset = df[(df['wavelength'] >= 400) & (df['wavelength'] <= 700)]
plot = df_subset.hvplot.line(
x='wavelength',
y=['Clear Water', 'Turbid Water', 'Coastal'],
title='Visible Spectrum - Chlorophyll Absorption Features',
xlabel='Wavelength (nm)',
ylabel='Rrs (sr⁻¹)',
width=700,
height=400,
legend='top_right'
)
# Add vertical lines for key features (note: hvplot doesn't have vline, use overlay)
from holoviews import VLine
vlines = VLine(440).opts(color='blue', line_dash='dashed') * \
VLine(555).opts(color='green', line_dash='dashed') * \
VLine(665).opts(color='red', line_dash='dashed')
plot * vlines
Key Technical Reference and Implementation Summary¶
Core Technical Competencies Implemented¶
- High-Dimensional Cube Manipulation: Loaded, parsed, and coordinate-sliced 200+ continuous spectral bands using
xarrayandrioxarraystructures. - Programmatic Endmember Signature Extraction: Isolated, normalized, and mapped full hemisphere water-leaving reflectance ($R_{rs}$) spectra for clear, coastal, and turbid water classifications.
- Bio-optical Modeling: Coded and executed band-ratio bio-optical algorithms (e.g., OC4-style blue/green ratio combinations) for quantitative Chlorophyll-a retrieval.
- Scientific Product Validation: Validated custom Chlorophyll-a abundance estimations against NASA standard Level-2 products (
chlor_a) to quantify algorithm bias and RMS error boundaries. - Radiative Physics and Diagnostics: Programmatically mapped diagnostic absorption peaks (around 440 nm and 665 nm) and reflectance green-peaks (555 nm) to characterize chlorophyll concentrations from first principles.
- Data Visualization: Built interactive cartographic displays with HoloViews and
hvplotto enable multi-spectral geospatial exploration.
Theoretical Coastal and Marine Remote Sensing Foundations¶
1. Narrow-Band Diagnostic Spectroscopy vs. Broad-Band Restrictions¶
Broad-band multispectral sensors (such as Sentinel-2 or Landsat-9) integrate signals over large spectral windows (e.g., 20 nm wide or more), thereby averaging out fine-scale radiance changes. In contrast, the Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) mission delivers narrow 5 nm spectral resolution. This allows the resolving of subtle spectral features, such as distinguishing distinct phytoplankton pigments (e.g., chlorophyll-a, chlorophyll-b, phycoerythrin) or quantifying dissolved organic matter (CDOM) fractions in complex optically shallow waters.
2. Bio-optical Algorithmic Approximations¶
Standard bio-optical algorithms estimate Chlorophyll-a concentration by utilizing reflectance ratios from bands where pigment absorption is high (blue wavelengths, ~443–490 nm) relative to bands where absorption is minimal and scatter dominates (green wavelengths, ~555–560 nm). The general formulation is modeled as: $$\log_{10}([\text{Chl-a}]) = a_0 + \sum_{i=1}^{4} a_i \left( \log_{10} \frac{R_{rs}(\lambda_{\text{blue}})}{R_{rs}(\lambda_{\text{green}})} \right)^i$$ By dynamically selecting the maximum blue band reflectance ($443$, $490$, or $510$ nm) in the numerator, the algorithm maintains optimal sensitivity across oligotrophic (clear) to highly eutrophic (turbid/coastal) water boundaries.
3. Spatial and Spectral Scale-Up Workflows¶
The processing framework utilized in this system (integrating xarray multidimensional data arrays with lazy dask task-graphs) serves as a blueprint for big-data cloud analytics. Whether executing local prototyping or scaling up to planetary-scale observations of high-throughput hyperspectral datasets, leveraging named dimensions and coordinate-aligned arrays significantly decreases execution latency and eliminates memory bottlenecks.
Technical Recommendations and Diagnostic Exploration¶
- Multi-Component Inversion Models: Build and implement optimization inversion routines (such as Semi-Analytical bio-optical models (GSM or GIOP)) to simultaneously solve for Chlorophyll-a, Total Suspended Matter (TSM), and CDOM absorption.
- Atmospheric Correction Validation: Apply atmospheric correction methods (e.g., Rayleigh and aerosol reflectance subtraction) on Level-1B Top-Of-Atmosphere (TOA) data to confirm the fidelity of processed surface reflectance values.
- Spectral Endmember Unmixing: Implement non-negative matrix factorization (NMF) or fully constrained linear unmixing to isolate sub-pixel pure endmember targets.