Calculating the Two-Step Floating Catchment Area Index

This guide solves one specific problem: computing a defensible Two-Step Floating Catchment Area (2SFCA) accessibility score from a network travel-time matrix without the silent biases that creep in from coordinate-system drift, unfiltered origin-destination explosion, and routing gaps treated as zero access. It is part of Spatial Equity Index Calculation, within Healthcare Access & Network Analysis Automation.

Problem Context & Constraints

The 2SFCA index quantifies spatial access to care by chaining two ratio passes: first a supply-to-demand ratio at each facility, then an aggregation of those ratios back to each demand site, with distance decay applied in both passes. The naive implementation — a single nested loop over every facility-population pair, a fixed travel-distance buffer, and a hard cutoff at the threshold — produces numbers, but the wrong ones, for three reasons that are specific to public health surveillance data:

Distance computed in the wrong units. A 2SFCA score is only as valid as the distances feeding it. Summing or thresholding distances in WGS84 degrees mixes units and distorts with latitude: away from the equator a degree of longitude covers less ground than a degree of latitude, so a nominal “30 km” catchment reaches farther north-to-south than east-to-west. Distance and decay math is only correct after projecting every layer to a shared metric CRS. This is the same projection discipline enforced when you align WGS84 and UTM for county health data before any distance calculation, drawn from the broader Coordinate Reference Systems for Public Health method set.
A hard distance cutoff is a step function, not access behaviour. Binary catchments (in or out at exactly 30 km) make the score discontinuous: a population unit one meter past the threshold contributes nothing, one meter inside contributes fully. Real utilization decays gradually, so an enhanced 2SFCA applies a continuous decay kernel inside the catchment instead of a flat weight.
Missing distances become fabricated zeros. When network routing drops a pair — disconnected component, restricted segment, or an API timeout in batch OSM routing — the absent row is not “no access,” it is “not measured.” Filling it with zero understates accessibility in exactly the underserved areas the index exists to surface. The pipeline must distinguish a true out-of-catchment pair from an unmeasured one.
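
To make the first of these reasons concrete, the sketch below measures the ground length of one fixed degree offset in both directions at several latitudes. It assumes a spherical Earth (haversine), and the 0.27 degree step and the helper names are illustrative choices, not part of the pipeline.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def degree_step_in_m(lat_deg, step_deg=0.27):
    """Ground length of the same degree offset, north-south versus east-west."""
    north_south = haversine_m(0.0, lat_deg, 0.0, lat_deg + step_deg)
    east_west = haversine_m(0.0, lat_deg, step_deg, lat_deg)
    return north_south, east_west

for lat in (0.0, 30.0, 45.0, 60.0):
    ns, ew = degree_step_in_m(lat)
    print(f"lat {lat:>4.0f}: 0.27 deg N-S = {ns / 1000:5.1f} km, E-W = {ew / 1000:5.1f} km")
```

The north-south length stays near 30 km at every latitude, while the east-west length shrinks with the cosine of latitude and is about half as long at 60 degrees.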

The implementation below addresses all three with explicit CRS enforcement, a Gaussian decay kernel, spatial pre-filtering to keep the origin-destination matrix from exploding to O(n²), and an explicit measured-versus-unmeasured flag carried through to the output.

Prerequisites

Pin the GIS stack so decay math and spatial joins are reproducible across runs:

# requirements (pinned)
# geopandas==0.14.4
# shapely==2.0.4
# pyproj==3.6.1
# pandas==2.2.2
# numpy==1.26.4
# scipy==1.13.1

Input requirements:

Supply layer (supply_gdf): one row per facility, a stable facility_id, an aggregated capacity column (FTE clinicians, licensed beds, or weekly appointment slots — never patient-level records), and point geometry. De-identify at ingestion: strip PII, retain only capacity counts.
Demand layer (demand_gdf): one row per population unit (census tract, block group, or hex cell), a stable demand_id, a population denominator, and geometry.
Travel matrix (od_matrix): a long-format table with from_id (demand), to_id (facility), and distance_m or time_s, produced upstream by a network router. The code below reads distance_m; if your router emits time_s instead, substitute it throughout and express the threshold in seconds. Pairs beyond the maximum catchment may be omitted; pairs that failed to route must be retained with a null distance and a flag, not silently dropped.
CRS: every layer projected to a shared metric CRS (UTM zone or state plane) before any code below runs.
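
The travel-matrix contract above can be checked before any scoring runs. This is a minimal sketch, not a required part of the pipeline: the function name check_od_contract and the flag column name route_failed are hypothetical, since the source does not name the flag column.

```python
import pandas as pd

def check_od_contract(od_matrix: pd.DataFrame, flag_col: str = "route_failed") -> dict:
    """Summarize whether a long-format OD table honors the input contract."""
    required = {"from_id", "to_id", "distance_m", flag_col}
    missing = required - set(od_matrix.columns)
    if missing:
        raise ValueError(f"od_matrix is missing columns: {sorted(missing)}")

    null_dist = od_matrix["distance_m"].isna()
    flagged = od_matrix[flag_col].astype(bool)
    return {
        "pairs": len(od_matrix),
        "failed_routes": int(null_dist.sum()),
        # Null distance without a flag: a measurement that went missing silently
        "unflagged_nulls": int((null_dist & ~flagged).sum()),
        # Flagged as failed but still carrying a distance: contradictory row
        "flagged_with_distance": int((~null_dist & flagged).sum()),
        "negative_distances": int((od_matrix["distance_m"] < 0).sum()),
        "duplicate_pairs": int(od_matrix.duplicated(["from_id", "to_id"]).sum()),
    }

od = pd.DataFrame({
    "from_id": ["t1", "t1", "t2", "t2"],
    "to_id": ["f1", "f2", "f1", "f1"],
    "distance_m": [1200.0, None, 8000.0, 8000.0],
    "route_failed": [False, True, False, False],
})
report = check_od_contract(od)
print(report)
```

Any non-zero count other than pairs and failed_routes is worth resolving upstream before the matrix reaches the decay step.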

Step-by-Step Solution

1. Enforce a shared metric CRS

Run every layer through a single projection-and-repair gate before routing or decay. This is the guard against the units bug and against invalid geometries that corrupt spatial joins downstream:

import geopandas as gpd
import logging
from pyproj import CRS
from shapely.validation import make_valid

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

def standardize_crs_and_validate(
    gdf: gpd.GeoDataFrame, target_epsg: int, layer_name: str
) -> gpd.GeoDataFrame:
    """Project to a metric CRS, repair geometries, drop empties. Logs every action for audit."""
    target_crs = CRS.from_epsg(target_epsg)
    if gdf.crs is None:
        raise ValueError(f"[{layer_name}] missing CRS; refuse to assume one")
    if gdf.crs != target_crs:
        logging.info(f"[{layer_name}] transforming {gdf.crs.to_epsg()} -> EPSG:{target_epsg}")
        gdf = gdf.to_crs(target_crs)

    gdf = gdf.copy()
    # Null geometries have no .is_valid, so skip them here and drop them below
    gdf["geometry"] = gdf["geometry"].apply(lambda g: g if g is None else make_valid(g))
    empty_mask = gdf.geometry.is_empty | gdf.geometry.isna()
    if empty_mask.any():
        logging.warning(f"[{layer_name}] dropped {int(empty_mask.sum())} empty/null geometries")
    return gdf[~empty_mask].copy()

A projected CRS such as a UTM zone keeps distance_m linear and in meters; refusing to proceed on a missing CRS prevents the silent-degree-arithmetic failure mode entirely.
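
If the target EPSG is not already fixed by policy, one way to pick it is from a representative point of the study area, such as its centroid. The helper below is a sketch under simplifying assumptions: utm_epsg_for is a hypothetical name, it uses the standard 6-degree zone grid on WGS84, and it ignores the Norway and Svalbard zone exceptions. GeoPandas also offers estimate_utm_crs() on a GeoDataFrame for the same purpose.

```python
import math

def utm_epsg_for(lon: float, lat: float) -> int:
    """EPSG code of the WGS84 / UTM zone containing a lon/lat point (degrees)."""
    if not (-180.0 <= lon <= 180.0 and -80.0 <= lat <= 84.0):
        raise ValueError("point is outside the area covered by standard UTM zones")
    # Zones are 6 degrees wide and numbered from 1 at 180W; lon == 180 folds into zone 60
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    # 326xx = northern hemisphere, 327xx = southern hemisphere
    return (32600 if lat >= 0 else 32700) + zone

print(utm_epsg_for(-93.27, 44.98))
```

A study area that straddles two zones still needs a single target CRS for every layer; a state plane system may be the better choice there.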

2. Build a Gaussian decay kernel

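The enhanced 2SFCA replaces the binary catchment with a continuous weight that falls gradually inside the catchment and is cut to zero beyond the threshold. A Gaussian kernel is a common choice for healthcare utilization. With the default sigma_ratio of 0.5 the weight at the threshold is exp(-2) ≈ 0.135, so a small step remains at the boundary rather than a smooth fall to zero: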

import numpy as np

def gaussian_decay(dist_m: np.ndarray, threshold_m: float, sigma_ratio: float = 0.5) -> np.ndarray:
    """Decay weights in [0, 1]. Sigma is a fraction of the threshold; beyond threshold -> 0."""
    sigma = threshold_m * sigma_ratio
    weights = np.exp(-0.5 * (dist_m / sigma) ** 2)
    weights[dist_m > threshold_m] = 0.0
    return weights

The sigma_ratio controls how sharply weight falls within the catchment; record the value you ship, because the score is not comparable across runs that used different decay parameters.
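
To see what sigma_ratio does in practice, the sketch below evaluates the kernel at a few probe distances for three illustrative ratios. The kernel is repeated so the block runs on its own; the 30 km threshold and the probe distances are assumptions chosen for the example.

```python
import numpy as np

def gaussian_decay(dist_m, threshold_m, sigma_ratio=0.5):
    """Same kernel as above, repeated so this sketch is self-contained."""
    dist = np.asarray(dist_m, dtype=float)
    weights = np.exp(-0.5 * (dist / (threshold_m * sigma_ratio)) ** 2)
    weights[dist > threshold_m] = 0.0
    return weights

threshold_m = 30_000.0
probe_m = np.array([0.0, 7_500.0, 15_000.0, 30_000.0, 30_001.0])

for ratio in (0.25, 0.5, 1.0):
    w = gaussian_decay(probe_m, threshold_m, sigma_ratio=ratio)
    # Weight remaining exactly at the threshold: exp(-0.5 / ratio^2)
    edge = np.exp(-0.5 / ratio**2)
    print(f"sigma_ratio={ratio}: weights={np.round(w, 3)}, weight at threshold={edge:.3f}")
```

A larger ratio flattens the kernel and leaves a bigger jump at the cutoff; a smaller ratio concentrates weight near the facility and makes the cutoff almost irrelevant.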

3. Pre-filter the matrix and run both passes vectorized

A full demand × supply matrix grows as O(n²) and can exhaust memory on large study areas. Keep only pairs inside the catchment (the upstream router should already cap distance; this is a defensive trim) and run both 2SFCA passes as pandas group-bys instead of Python loops:

import pandas as pd
import numpy as np
import geopandas as gpd

def calculate_2sfca(
    od_matrix: pd.DataFrame,
    supply_gdf: gpd.GeoDataFrame,
    demand_gdf: gpd.GeoDataFrame,
    capacity_col: str,
    population_col: str,
    threshold_m: float,
    sigma_ratio: float = 0.5,
) -> gpd.GeoDataFrame:
    """
    od_matrix : columns from_id (demand), to_id (facility), distance_m.
    supply_gdf: facility_id + capacity_col.
    demand_gdf: demand_id + population_col.
    Returns demand_gdf with access_score and a measured-coverage flag.
    """
    # Deterministic ordering so the run is byte-reproducible
    od = od_matrix.sort_values(["from_id", "to_id"]).copy()

    # Defensive trim: drop pairs beyond the catchment; keep failed routes flagged separately
    routed = od["distance_m"].notna()
    in_catchment = routed & (od["distance_m"] <= threshold_m)
    od = od.loc[in_catchment].copy()

    # Attach demand population to each edge, then decay-weight it
    pop = demand_gdf[["demand_id", population_col]].rename(columns={"demand_id": "from_id"})
    od = od.merge(pop, on="from_id", how="left")
    od["decay"] = gaussian_decay(od["distance_m"].to_numpy(), threshold_m, sigma_ratio)
    od["weighted_pop"] = od["decay"] * od[population_col]

    # --- Step 1: supply perspective -> ratio R_j at each facility ---
    fac_demand = od.groupby("to_id")["weighted_pop"].sum().rename("weighted_demand")
    supply = supply_gdf.merge(fac_demand, left_on="facility_id", right_index=True, how="left")
    supply["weighted_demand"] = supply["weighted_demand"].fillna(0.0)
    supply["r_j"] = np.where(
        supply["weighted_demand"] > 0,
        supply[capacity_col] / supply["weighted_demand"],
        0.0,  # isolated facility: defined, not NaN
    )

    # --- Step 2: demand perspective -> accessibility A_i at each site ---
    od = od.merge(
        supply[["facility_id", "r_j"]], left_on="to_id", right_on="facility_id", how="left"
    )
    od["access_contrib"] = od["decay"] * od["r_j"].fillna(0.0)
    access = od.groupby("from_id")["access_contrib"].sum().rename("access_score")

    out = demand_gdf.merge(access, left_on="demand_id", right_index=True, how="left")
    # Sites with no in-catchment, routable facility: genuine zero access
    out["access_score"] = out["access_score"].fillna(0.0)
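    # Coverage flag: did routing return a distance for this site at all? Read it from
    # the untrimmed matrix: after the catchment trim, od holds only in-catchment pairs,
    # so every zero-access site would otherwise be flagged as unmeasured.
    # Sites absent from the matrix altogether also read as unmeasured.
    routed_pairs = od_matrix.loc[od_matrix["distance_m"].notna(), "from_id"]
    out["measured"] = out["demand_id"].isin(routed_pairs.unique())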
    return out

The two floating-catchment passes chain together: facility supply ratios are computed first, then carried into the demand aggregation.
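
Written out, the two passes implemented by calculate_2sfca take the following form. The symbols are notation introduced here for illustration: S_j is the facility capacity, P_i the population of a demand site, d_ij the network distance between them, and d_0 the catchment threshold (threshold_m).

```latex
\begin{aligned}
w(d) &= \begin{cases}
  \exp\!\left(-\dfrac{d^{2}}{2\sigma^{2}}\right) & d \le d_0 \\
  0 & d > d_0
\end{cases},
\qquad \sigma = \texttt{sigma\_ratio} \cdot d_0 \\[6pt]
R_j &= \frac{S_j}{\sum_{i \,:\, d_{ij} \le d_0} P_i \, w(d_{ij})}
\qquad (R_j = 0 \text{ when the denominator is } 0) \\[6pt]
A_i &= \sum_{j \,:\, d_{ij} \le d_0} R_j \, w(d_{ij})
\end{aligned}
```

The same weight w appears in both passes, which is why the decay parameters have to be recorded with every run.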

The grouping is deterministic because the matrix is sorted by stable IDs first, so two runs over identical inputs produce byte-identical output — a prerequisite for the audit trail below.
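
A toy run helps confirm the two passes behave as intended. The sketch below uses made-up capacities, populations, and distances, and reimplements the passes in a few lines rather than calling calculate_2sfca, so it runs without GeoPandas. When every facility has some in-catchment demand, population-weighted access should add back up to total capacity, which makes a cheap sanity check.

```python
import numpy as np
import pandas as pd

threshold_m, sigma_ratio = 10_000.0, 0.5
capacity = pd.Series({"f1": 10.0, "f2": 4.0})                      # S_j (made-up values)
population = pd.Series({"t1": 1000.0, "t2": 500.0, "t3": 2000.0})  # P_i (made-up values)

# Every pair here is inside the catchment, so no trimming is needed
od = pd.DataFrame({
    "from_id": ["t1", "t1", "t2", "t3"],
    "to_id": ["f1", "f2", "f1", "f2"],
    "distance_m": [2_000.0, 8_000.0, 5_000.0, 3_000.0],
})

sigma = threshold_m * sigma_ratio
od["decay"] = np.exp(-0.5 * (od["distance_m"] / sigma) ** 2)

# Pass 1: R_j = S_j / sum_i P_i * w_ij
od["weighted_pop"] = od["decay"] * od["from_id"].map(population)
r_j = capacity / od.groupby("to_id")["weighted_pop"].sum()

# Pass 2: A_i = sum_j R_j * w_ij
od["access_contrib"] = od["decay"] * od["to_id"].map(r_j)
access = od.groupby("from_id")["access_contrib"].sum()

# Conservation check: population-weighted access equals total capacity
allocated = float((access * population).sum())
print(access.round(5))
print(f"allocated capacity = {allocated:.6f}, total capacity = {capacity.sum():.6f}")
```

If the two totals diverge on real data, the usual causes are isolated facilities (their capacity is never allocated) or populations that failed to join onto the matrix.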

Validation & Edge Cases

Three failure modes recur in production. Each has a cheap diagnostic.

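Isolated facilities produce divide-by-zero. A facility with no demand in its catchment has weighted_demand == 0. The np.where guard returns 0.0 rather than inf/NaN, but you still want to see how many fired. Because supply is local to calculate_2sfca, run this check inside the function, right after r_j is computed: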

isolated = int((supply["weighted_demand"] == 0).sum())
logging.info(f"isolated facilities (no in-catchment demand): {isolated} / {len(supply)}")
# Example output -> INFO: isolated facilities (no in-catchment demand): 3 / 412

Unmeasured sites masquerade as zero-access deserts. The measured flag separates genuine zeros from routing gaps. Audit the split before anyone reads the surface as a coverage map:

zeros = out["access_score"] == 0
genuine = int((zeros & out["measured"]).sum())     # truly no reachable capacity
unmeasured = int((zeros & ~out["measured"]).sum())  # routing never reached this site
logging.info(f"zero-access sites: {genuine} genuine, {unmeasured} unmeasured")
# WARNING here if unmeasured is non-trivial -> investigate the routing job, not the geography

A non-trivial unmeasured count points back upstream — usually a disconnected network component or a routing batch that lost chunks — and should block publication until resolved.
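
One way to make that block enforceable is a small gate that raises when the unmeasured share is too high. This is a sketch: assert_publishable is a hypothetical helper, and the 1% tolerance is an assumed default that should be set by your own release policy.

```python
import pandas as pd

def assert_publishable(out: pd.DataFrame, max_unmeasured_share: float = 0.01) -> float:
    """Raise if too many zero-access sites are routing gaps rather than real zeros."""
    zeros = out["access_score"] == 0
    unmeasured = zeros & ~out["measured"].astype(bool)
    share = float(unmeasured.mean()) if len(out) else 0.0
    if share > max_unmeasured_share:
        raise RuntimeError(
            f"{int(unmeasured.sum())} of {len(out)} sites ({share:.1%}) are unmeasured zeros; "
            "fix the routing job before publishing"
        )
    return share

demo = pd.DataFrame({
    "demand_id": ["t1", "t2", "t3", "t4"],
    "access_score": [0.004, 0.0, 0.0, 0.002],
    "measured": [True, True, False, True],
})
try:
    assert_publishable(demo)
except RuntimeError as err:
    print(f"blocked: {err}")
```

Here t2 is a genuine zero and passes, while t3 is an unmeasured zero and pushes the share to 25%, so the gate blocks the run.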

The decay parameter silently changes the ranking. Re-running with a different sigma_ratio reshuffles which sites look underserved. Cross-check the output’s spread against an independent reference — the proportion of demand falling inside federally designated Health Professional Shortage Areas (HPSAs), or historical utilization — and treat a large drift between quarters as a routing or parameter change to investigate, not a real-world shift:

prev, curr = previous_run["access_score"], out["access_score"]
prev_median = prev.median()
# Guard: a zero or NaN prior median makes relative drift undefined
if prev_median > 0:
    drift = (curr.median() - prev_median) / prev_median
    if abs(drift) > 0.15:
        logging.warning(f"median access drifted {drift:.1%} vs prior run; validate routing & params")
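
Median drift does not show whether the ranking itself moved. A complementary check, sketched below with made-up scores, is the Spearman rank correlation between two runs, computed here as the Pearson correlation of ranks so it needs only pandas. The name rank_stability is hypothetical, and both series are assumed to be indexed by demand_id.

```python
import pandas as pd

def rank_stability(run_a: pd.Series, run_b: pd.Series) -> float:
    """Spearman rank correlation between two access surfaces indexed by demand_id."""
    # Compare only the sites present in both runs
    a, b = run_a.align(run_b, join="inner")
    # Pearson correlation of average ranks is Spearman's rho
    return float(a.rank().corr(b.rank()))

# Made-up scores for the same four sites under two decay settings
narrow = pd.Series({"t1": 0.010, "t2": 0.004, "t3": 0.007, "t4": 0.001})
wide = pd.Series({"t1": 0.008, "t2": 0.006, "t3": 0.005, "t4": 0.002})
print(f"rank stability: {rank_stability(narrow, wide):.2f}")
```

A value near 1 means the same sites look underserved under both settings; a noticeably lower value means the parameter change reordered the priority list and deserves a documented justification.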

Compliance Notes

For regulatory defensibility, every score must be reconstructable from its inputs and parameters. Log, and persist alongside the output, the values that change the result:

Source CRS and the target EPSG the layers were projected to, captured by standardize_crs_and_validate.
threshold_m and sigma_ratio — the catchment size and decay shape fully determine the kernel.
Capacity and population column names, plus the de-identification step applied at ingestion (capacity counts only, no patient-level data). Keeping patient-level records out of the pipeline supports, but does not by itself guarantee, HIPAA Safe Harbor de-identification or GDPR compliance.
The measured-versus-unmeasured split, so a reviewer can see which zeros are real.
A configuration hash and run timestamp. Emit the parameter set to a companion JSON manifest and record a SHA-256 of it; combined with the stable-ID sort above, this makes the run deterministic and reproducible. Serialize the surface to GeoParquet or GeoPackage with an ISO 19115 geographic metadata record so the lineage travels with the file.
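
The manifest and hash described in the last item can be produced with the standard library alone. The sketch below is one possible layout, not a prescribed schema: build_manifest, the field names, and the example parameter values are all assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_manifest(params: dict) -> dict:
    """Return a run manifest with a SHA-256 over the canonical parameter JSON."""
    # sort_keys plus fixed separators make the serialization independent of dict order
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return {
        "params": params,
        "config_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        # The timestamp sits outside the hash so identical parameters hash identically
        "run_timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }

# Example values only; substitute the parameters actually used in the run
params = {
    "source_epsg": 4326,
    "target_epsg": 32615,
    "threshold_m": 30000.0,
    "sigma_ratio": 0.5,
    "capacity_col": "fte_clinicians",
    "population_col": "population",
}
manifest = build_manifest(params)
print(json.dumps(manifest, indent=2))
```

Two runs with the same parameters then share a config_sha256 regardless of when they ran, which lets a reviewer match any published surface to its exact settings.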

FAQ

When should I use 2SFCA instead of a simple provider-to-population ratio?

Use 2SFCA when access crosses administrative boundaries — when residents of one tract routinely use facilities in another. A plain provider-to-population ratio assumes people only use facilities inside their own unit, which overstates access where capacity is concentrated and understates it next door. 2SFCA’s two passes let supply spill across boundaries with distance decay, which is the realistic case for clinics and hospitals.

Why Gaussian decay rather than a binary catchment?

A binary catchment is a step function: a population unit one meter past the threshold contributes nothing while one just inside contributes fully, which makes the score discontinuous and sensitive to where exactly you draw the line. The Gaussian kernel lowers weight gradually with distance inside the catchment, so the result is far less sensitive to small distance changes and better matches observed utilization. A residual step remains at the threshold (exp(-2) ≈ 0.135 at the default sigma_ratio of 0.5), and a smaller sigma_ratio shrinks it. Keep the threshold itself fixed as the policy definition of the catchment and let the kernel shape decay inside it.

A demand site shows zero access but a clinic is clearly nearby — what happened?

Check the measured flag first. If the site is unmeasured, the routing job never produced a pair for it — typically a disconnected network component or a dropped routing chunk — so the zero is an artifact, not a real desert. If the site is measured but still zero, the nearby clinic is beyond threshold_m along the actual network (not the straight line), or it has zero capacity. The flag tells you which investigation to run.

Do I have to project before computing distances?

Yes. Distances summed or thresholded in WGS84 degrees mix units and distort with latitude, which invalidates both the catchment cutoff and the decay weights. Project every layer to a metric CRS (UTM or state plane) first, and record the resolved EPSG code in the run log so the result is reproducible.

Related Guides

Spatial Equity Index Calculation: the parent disparity-scoring pipeline that consumes this accessibility surface.
Handling API Timeouts in Batch OSM Routing: how to keep the travel-time matrix complete so no pair becomes a fabricated zero.
Generating 15-Minute Walk Isochrones for Rural Clinics: the pedestrian catchments that can feed the demand-side step.
How to Align WGS84 and UTM for County Health Data: the projection discipline the distance math depends on.