1  openEO local processing

In this first flood mapping example we assume that the user has the relevant data on their own local device. The data required for TU Wien flood mapping algorithm consists of \(\sigma^0\) backscatter data, the projected local incidence angle (PLIA) values of those measurements, and the harmonic parameters of a model fit on the pixel’s backscatter time series.

Note

The data can be formatted to the correct structure with the files found in the GitHub repository.

1.1 Setting-up a Python Session

We begin by loading openEO for local processing and some additional packages for transforming and viewing data.

from datetime import datetime
from pathlib import Path
import numpy as np

from openeo_flood_mapper_local.view_flood_map import view_flood_map

from openeo.local import LocalConnection
from openeo.processes import ProcessBuilder, array_element, add, multiply, sin, cos, mask, exp, median
Did not load machine learning processes due to missing dependencies: Install them like this: `pip install openeo-processes-dask[implementations, ml]`

1.2 Data Sources

The paths to the local data sources define the collections to be loaded in the next steps. In the case of a local openEO instance, we do this by just supplying the paths to the files required for the analysis.

ROOT_DATA = ""
hparam_id = Path(f"{ROOT_DATA}openEO_local/tuw_s1_harpar/S1_CSAR_IWGRDH/SIG0-HPAR/V0M2R3/EQUI7_EU020M/E054N006T3/D080.nc")
plia_id = Path(f"{ROOT_DATA}openEO_local/s1_parameters/S1_CSAR_IWGRDH/PLIA-TAG/V01R03/EQUI7_EU020M/E054N006T3/PLIA-TAG-MEAN_20200101T000000_20201231T235959__D080_E054N006T3_EU020M_V01R03_S1IWGRDH.nc")
sig0_id = Path(f"{ROOT_DATA}openEO_local/s1_parameters/S1_CSAR_IWGRDH/SIG0/V1M1R1/EQUI7_EU020M/E054N006T3/SIG0_20180228T043908__VV_D080_E054N006T3_EU020M_V1M1R1_S1AIWGRDH_TUWIEN.nc")

1.3 Connect to an openEO Backend

Establish the local connection by supplying path(s) to root directories of the data source(s). This results in a connection object which is a critical aspect of collection discovery on the backend by openEO, where in this instance the backend is your own machine.

local_connection = LocalConnection([
    hparam_id.parent.as_posix(), 
    plia_id.parent.as_posix(), 
    sig0_id.parent.as_posix()
])

1.4 Load Collections

We can then load the collections. This is done by using the method load_collection() and by using the collection ids as defined above.

hparam_dc = local_connection.load_collection(str(hparam_id))
plia_dc = local_connection.load_collection(str(plia_id))
sig0_dc = local_connection.load_collection(str(sig0_id))

1.5 openEO Analysis

openEO supplies a set of conventional and EO-specific functions (or “processes” in openEO terminology) to work with EO data. In this example, we begin by defining functions that represent the flood mapping algorithm as defined in Bauer-Marschallinger et al. (2022), and by using standard openEO functions. The flood mapping algorithm extracts the following information: 1) the expected backscattering from water bodies, and 2) the expected backscatter intensity over land pixels given historical data. Hence, this function makes use of observations along both the spatial and temporal dimensions of the datacube.

We first define the function that extracts average backscatter intensity over water bodies and we name it water_backscatter(). This functions applies so-called openEO “band math”, which are basically mathematical computations on the bands of the datacube. In this case, the band of the incidence angle (degrees) of the retrieved backscatter signal is multiplied by a factor consisting of the slope plus an intercept from a linear model. This linear model describes the relationship between incidence angle and backscattering over water globally. Applying this linear model results in expected water back scattering. In a follow-up, we use a so-called reducer function, taking the mean of the band over the time dimension, after which, we rename this dimension “wbsc” of type band.

def water_backscatter(plia_dc):
    return (plia_dc * -0.394181 + -4.142015).reduce_bands('mean'). \
        add_dimension('bands', 'wbsc', 'bands')

We can apply this function to the incidence angle datacube, as follows:

water_bsc_dc = water_backscatter(plia_dc)
water_bsc_dc

Here we can see the basic premise of openEO. The previous call did not actually perform the data processing it only generates a JSON representation of the processing graph. Only by calling execute() on this object, we can actually perform the processing, like so:

water_bsc_dc.execute()
<xarray.DataArray (bands: 1, y: 1300, x: 930)>
dask.array<broadcast_to, shape=(1, 1300, 930), dtype=float32, chunksize=(1, 1300, 930), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 5.661e+06 5.661e+06 5.661e+06 ... 5.68e+06 5.68e+06
  * y            (y) float64 6.42e+05 6.42e+05 6.42e+05 ... 6.16e+05 6.16e+05
    spatial_ref  int64 ...
  * bands        (bands) <U4 'wbsc'
Attributes:
    reduced_dimensions_min_values:  {'bands': 'PLIA'}

We define a second function to obtain expected backscattering over land pixels. In this case we will have to use historical Sentinel-1 data for each pixel to negate the effect of seasons on the sigma nought signal. Hence a so-called harmonic model is fitted. The following function harmonic_expected_backscatter() uses this harmonic model for estimations optimised to filter out seasonal signals.

def harmonic_expected_backscatter(data, dtime_str):
    w = np.pi * 2 / 365
    dt = datetime.strptime(dtime_str, "%Y-%m-%d")
    t = dt.timetuple().tm_yday
    wt = w * t

    M0 = data.band('M0')
    S1 = data.band('S1')
    S2 = data.band('S2')
    S3 = data.band('S3')
    C1 = data.band('C1')
    C2 = data.band('C2')
    C3 = data.band('C3')
    hm_c1 = (M0 + S1 * np.sin(wt)) + (C1 * np.cos(wt))
    hm_c2 = ((hm_c1 + S2 * np.sin(2 * wt)) + C2 * np.cos(2 * wt))
    hm_c3 = ((hm_c2 + S3 * np.sin(3 * wt)) + C3 * np.cos(3 * wt))
    return hm_c3.add_dimension('bands', 'hbsc', 'bands')

We can now again apply this function to a datacube. For this operation we use the hparam_dc datacube defining the harmonic parameters of said model. These parameters together with the date of the flooding event generate the expected backscattering per land pixel.

land_bsc_dc = harmonic_expected_backscatter(hparam_dc, '2018-02-01')

So far we have covered a couple of the core features of the openEO Python Client syntax. However, in all these cases, the shape of the datacube was not altered. In the following section we cover what to do when want to change shape through accumulation or reduction of input values.

1.6 Processes with Child “callbacks”

Now we will define the last function, which calculates the probability of flooding with a Bayesian classification model. The output of this function tells if a pixel is flooded based on the previous defined expected land and water backscattering. The implementation of this function is, however, different from the previous functions, as it is applied to subsets of datacubes through the openEO function reduce_bands() and it thereby changes the shape of the input values. This family of functions, which also includes, e.g., apply(), reduce_dimension(), and aggregate_spatial(), are known as the “parent” functions. These parent functions invoke a subprocess on the datacube, so-called child “callbacks”.

The following function will be used as a child callback to calculate flooding probabilities. Note that the function requires the openEO helper object ProcessBuilder.

def bayesian_flood_decision(x: ProcessBuilder) -> ProcessBuilder:
    nf_std = 2.754041
    sig0 = x.array_element(index=0)
    std = x.array_element(index=1)
    wbsc = x.array_element(index=2)
    hbsc = x.array_element(index=3)

    f_prob = (1.0 / (std * np.sqrt(2 * np.pi))) * np.exp(-0.5 * \
        (((sig0 - wbsc) / nf_std) ** 2))
    nf_prob = (1.0 / (nf_std * np.sqrt(2 * np.pi))) * np.exp(-0.5 * \
        (((sig0 - hbsc) / nf_std) ** 2))

    evidence = (nf_prob * 0.5) + (f_prob * 0.5)
    f_post_prob = (f_prob * 0.5) / evidence 
    nf_post_prob = (nf_prob * 0.5) / evidence 

    # flood if flood class has higher probability
    return f_post_prob.gt(nf_post_prob)

We will use a child callback to reduce a datacube consisting of the expected backscatter over water, the expected backscattering over land and it’s standard deviation, and the sigma nought values to one new band comprising the flood classifications per pixel.

For this we will first have to load the standard deviations of the expected land backscattering for each pixel and the sigma nought values for the particular timeperiod of the expected flooding (February 2018).

std_dc = hparam_dc.band('STD').add_dimension('bands', 'std', 'bands')
sig0_dc = sig0_dc.reduce_bands('mean').add_dimension('bands', 'sig0', 'bands')

Now we can merge all these datacubes, like so:

decision_in = sig0_dc. \
    merge_cubes(std_dc). \
    merge_cubes(water_bsc_dc). \
    merge_cubes(land_bsc_dc). \
    merge_cubes(plia_dc)

2 Classification

The merged preprocessed datacube can then be used as input for the Bayesian flood mapping function that reduces all these bands to just one in the resulting datacube.

flood_dc = decision_in.reduce_bands(bayesian_flood_decision). \
    add_dimension('bands', 'dec', 'bands')
flood_dc = flood_dc.merge_cubes(decision_in)

2.1 openEO Flood Mapping

We can check the results of the openEO by executing the processing steps with execute() and by plotting the flood mapping classification.

flood_decision = flood_dc.execute()
view_flood_map(flood_decision[0])

openEO floodmap - no pre-processing

By comparing this figure with the original study (Bauer-Marschallinger et al. 2022), we see that the openEO workflow can perform the same operations. There are, however, some differences with the original flood mapping study. These differences relate to the absence of the low sensitivity masking and post-processing steps of the flood probabilities in the openEO workflow. A priori low sensitivity masking removes observations in which situations arise that cause insensitivity to flood conditions for physical, geometric, or sensor-side reasons. Whereas, post-processing removes e.g. the small patches of supposed flooded pixels scattered throughout the image also known as “speckles”. These speckles produce a more noisy picture in the openEO example.

2.2 Masking of Low Sensitivity Pixels

We continue by improving our flood map by filtering out observations that we expect to have low sensitivity to flooding based on a predefined set of criteria.

2.2.1 Masking of Exceeding Incidence Angles

Firstly we mask areas where the incidence angle exceeds the maximum tolerable range of \(27\degree\) to \(48\degree\). These larger than usual incidence angles are a result of the area’s topography as beams reflect from steep slopes.

mask_ia = (flood_dc.band("PLIA") >= 27) * (flood_dc.band("PLIA") <= 48)
flood_dc = flood_dc * mask_ia

This results in the following map:

openEO floodmap - masking exceeding incidence angles

2.2.2 Identification of Conflicting Distributions

We remove values that have already low backscatter values during normal conditions and which do not represent water. Examples of such surfaces are highways, airstrips. salt panes, or arid sand and/or bedrock. Identification of such conflicting distribution is done by comparing the expected local land distribution (from the harmonic model) with those from the water distribution, if these cannot be distinguished from each other, the pixel is excluded.

water_bsc_threshold = decision_in.band("wbsc") + 0.5 * 2.754041
mask_conflict = decision_in.band("hbsc") > water_bsc_threshold
flood_dc = flood_dc * mask_conflict

Exclusion of these conflicting distributions look as follows:

openEO floodmap - masking conflicting distributions + exceeding incidence angles

2.2.3 Removal of Measurement Outliers

Extreme backscatter values are yet another source of insensitivity to floods. These outliers are not properly represented by the Bayesian model probabilities.

land_bsc_lower = flood_dc.band("hbsc") - 3 * flood_dc.band("std")
land_bsc_upper = flood_dc.band("hbsc") + 3 * flood_dc.band("std")
water_bsc_upper = flood_dc.band("wbsc") + 3 * 2.754041

mask_land_outliers = (flood_dc.band("sig0") > land_bsc_lower) * (flood_dc.band("sig0") < land_bsc_upper)
mask_water_outliers = flood_dc.band("sig0") < water_bsc_upper
flood_dc = flood_dc * (mask_land_outliers | mask_water_outliers)

Adding this to the prvious masking results in the following map:

openEO floodmap - masking extreme outliers + conflicting distributions + exceeding incidence angles

2.2.4 Denial of High Uncertainty on Decision

In some cases the posterior distribution is ambiguous as it falls close to a 0.5 probability of flooding (i.e., a coin flip). This happens when the probability distributions for water and land backscattering overlap and/or the measured backscatter values falls exactly in the middle of the two distributions. Hence a cut-off of 0.2 is used to limit the potential of falls positive classifications.

mask_uncertainty = flood_dc.band("dec") > 0.8
flood_dc = flood_dc * mask_uncertainty

This results in the following floodmap.

openEO floodmap - masking high uncertainty classifications + extreme outliers + conflicting distributions + exceeding incidence angles

2.3 Postprocessing

The following steps are designed to further improve the clarity of the floodmaps. These filters do not directly relate to prior knowledge on backscattering, but consists of contextual evidence that supports, or oppose, a flood classification.

2.3.1 Removal of Speckles

One such filtering approach targets so-called speckles. These speckles are areas of one or a few pixels, and which are likely the result of the diversity of scattering surfaces at a sub-pixel level. In this approach it is argued that small, solitary flood surfaces are unlikely. Hence speckles are removed by applying a smoothing filter which consists of a rolling window median along the x and y-axis simultaneously.

This approach is realized in openEO with the parent function apply_neighborhood() with a window size of 3 by 3 pixels. The child process median is used to average the window’s values.

flood_decision = flood_dc. \
    apply_neighborhood("median", dict(x=-1, y=-1), dict(x=2, y=2)). \
    execute()

This results in a much clearer flood map, as shown below:

/home/mschobbe/miniconda3/envs/openeo-flood-mapper-local/lib/python3.10/site-packages/dask/utils.py:73: RuntimeWarning: All-NaN slice encountered
  return func(*args, **kwargs)

openEO floodmap - masking high uncertainty classifications + extreme outliers + conflicting distributions + exceeding incidence angles + majority filter