Overview

This pipeline is designed for targeted thematic extraction from OpenStreetMap, especially for early-stage infrastructure, environmental, and access-screening work. It assumes you are pulling specific feature classes rather than mirroring an entire country extract.

Why it matters

The failure mode in many OSM workflows is not the query itself. It is everything after the query:

The pipeline fixes that by separating the steps.

When to use

Use this pattern when:

Inputs

Live normalization example
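
The following is a minimal sketch of what normalization might look like, not the contents of normalize_osm_geojson.js. It assumes each GeoJSON feature has an id such as "node/123" and carries its OSM tags as flat properties; real exports vary by converter. The output field names (osm_type, osm_id), the promoted tag list, and the function names are illustrative assumptions.

```javascript
// Hypothetical list of tags promoted to stable, always-present fields.
const PROMOTED_TAGS = ["name", "amenity"];

// Normalize one GeoJSON feature. Returns null for clearly broken records.
// Assumes feature.id looks like "node/123" and properties holds flat OSM tags.
function normalizeFeature(feature) {
  const geometry = feature && feature.geometry;
  if (!geometry || !Array.isArray(geometry.coordinates) || geometry.coordinates.length === 0) {
    return null; // no usable geometry
  }
  const match = /^(node|way|relation)\/(\d+)$/.exec(String(feature.id ?? ""));
  if (!match) return null; // cannot trace the record back to an OSM element

  const tags = feature.properties || {};
  const properties = { osm_type: match[1], osm_id: Number(match[2]) };
  for (const key of PROMOTED_TAGS) {
    // Trim string values; treat missing or empty tags as null.
    const value = typeof tags[key] === "string" ? tags[key].trim() : "";
    properties[key] = value === "" ? null : value;
  }
  return { type: "Feature", geometry, properties };
}

// Normalize a whole FeatureCollection, dropping records that could not be normalized.
function normalizeCollection(collection) {
  const features = (collection.features || [])
    .map(normalizeFeature)
    .filter((feature) => feature !== null);
  return { type: "FeatureCollection", features };
}
```

Returning null for broken records, rather than throwing, lets the caller count how many were dropped and report that number during QA.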

Workflow and method

  1. Request: Draft the Overpass query and save the query text alongside the output.
  2. Extract: Export the raw JSON or GeoJSON exactly as returned.
  3. Clean: Standardize field names, promote essential tags, and remove clearly broken records.
  4. Dedupe: Resolve obvious duplicates across nodes, ways, and relations or from overlapping pulls.
  5. Split: Separate outputs by geometry type or analytical class where needed.
  6. Export: Produce final layers for mapping, analysis, and archival reuse.

Suggested file naming

Use names that encode geography, source, theme, stage, and date.

moz_temane_osm_schools-clinics_raw_2026-03-19.geojson
moz_temane_osm_schools-clinics_clean_2026-03-19.geojson
moz_temane_osm_schools-clinics_final_2026-03-19.gpkg

That naming pattern makes it obvious which file is disposable and which file is a handoff artifact.
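
The naming pattern can be generated rather than typed by hand. The sketch below assumes the geography part is a country code plus an area name, as in the examples above; the function name buildFileName and its parameter names are hypothetical.

```javascript
// Build a file name from the parts the naming pattern encodes.
// Parameter names are illustrative; the pattern follows the example file names.
function buildFileName({ country, area, source, theme, stage, date, extension }) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error(`date must be YYYY-MM-DD, got: ${date}`);
  }
  // Underscores separate the parts, so spaces or underscores inside a part become hyphens.
  const parts = [country, area, source, theme, stage].map((part) =>
    String(part).trim().toLowerCase().replace(/[\s_]+/g, "-")
  );
  return `${parts.join("_")}_${date}.${extension}`;
}

const rawName = buildFileName({
  country: "moz",
  area: "temane",
  source: "osm",
  theme: "schools-clinics",
  stage: "raw",
  date: "2026-03-19",
  extension: "geojson",
});
// rawName → "moz_temane_osm_schools-clinics_raw_2026-03-19.geojson"
```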

Pipeline logic

Request

Store the query text in version control or beside the output folder. The query is part of the data provenance.
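
One way to keep the query tied to its output is a small provenance record saved beside the export. This is a sketch under assumptions: the function name buildProvenance, the record's field names, and the sample query text are all hypothetical. Only the query path and output file name come from this document.

```javascript
const nodeCrypto = require("node:crypto");

// Build a provenance record to save next to an export.
// Field names are hypothetical; adapt them to your own conventions.
function buildProvenance({ queryPath, queryText, outputFile, fetchedAt }) {
  const querySha256 = nodeCrypto
    .createHash("sha256")
    .update(queryText, "utf8")
    .digest("hex");
  return {
    query_path: queryPath,
    query_sha256: querySha256, // changes whenever the query text changes
    query_text: queryText,
    output_file: outputFile,
    fetched_at: fetchedAt,
  };
}

const record = buildProvenance({
  queryPath: "queries/schools_clinics.overpassql",
  // Placeholder query and bounding box, for illustration only.
  queryText: "[out:json];node[amenity=school](-22,35,-21,36);out;",
  outputFile: "moz_temane_osm_schools-clinics_raw_2026-03-19.geojson",
  fetchedAt: "2026-03-19T00:00:00Z", // placeholder timestamp
});
// JSON.stringify(record, null, 2) could then be written to a sidecar file beside the output.
```

Storing a hash as well as the text makes it cheap to confirm later that two exports came from the same query.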

Extract

Keep one untouched raw export. If you need to rerun cleaning logic later, that file is the baseline.

Clean

Typical cleaning tasks:

Dedupe

Check for:
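
As a minimal sketch of the dedupe step, the code below handles two cases named in the workflow: exact repeats of one OSM element from overlapping pulls, and possible duplicates across element types. It assumes each feature has properties osm_type, osm_id, name, and amenity. Those field names and both function names are hypothetical, and the name-matching rule is a simple heuristic, not the method used by dedupe_osm_features.js.

```javascript
// Drop exact repeats of the same OSM element (e.g. from overlapping pulls).
// The first occurrence is kept.
function dedupeByOsmId(features) {
  const seen = new Set();
  const kept = [];
  for (const feature of features) {
    const { osm_type: type, osm_id: id } = feature.properties;
    const key = `${type}/${id}`;
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(feature);
  }
  return kept;
}

// Flag possible cross-element duplicates (e.g. a school mapped as both a node and a way)
// by grouping on a case-insensitive name plus amenity. This only flags; it never deletes.
function findNameCollisions(features) {
  const groups = new Map();
  for (const feature of features) {
    const { name, amenity } = feature.properties;
    if (!name) continue; // unnamed features cannot be matched this way
    const key = `${String(name).trim().toLowerCase()}|${amenity ?? ""}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(feature);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}
```

Identical names do not prove two features are the same place, so the second function returns candidate groups for manual review instead of removing anything.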

Split

Split outputs when different downstream consumers need different packages, for example:
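
Splitting by geometry type can be sketched as below. This matters for formats such as Shapefile, which hold a single geometry type per layer. The function name splitByGeometry and the layer keys (points, lines, polygons, other) are hypothetical.

```javascript
// Map each GeoJSON geometry type to a layer family.
const GEOMETRY_FAMILY = new Map([
  ["Point", "points"],
  ["MultiPoint", "points"],
  ["LineString", "lines"],
  ["MultiLineString", "lines"],
  ["Polygon", "polygons"],
  ["MultiPolygon", "polygons"],
]);

// Group features by geometry family so each output layer holds one kind of geometry.
// Features with a missing or unrecognized geometry land in "other" for review.
function splitByGeometry(features) {
  const layers = { points: [], lines: [], polygons: [], other: [] };
  for (const feature of features) {
    const type = feature.geometry ? feature.geometry.type : null;
    const family = GEOMETRY_FAMILY.get(type) || "other";
    layers[family].push(feature);
  }
  return layers;
}
```

Keeping an explicit "other" bucket means unexpected geometries surface in the output counts instead of being dropped silently.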

Export

Produce one clean analytical layer, plus a separate publish-ready layer when its styling or schema requirements differ.

Tools and suggested script layout

scripts/
  overpass/
    fetch_osm_overpass.sh
    normalize_osm_geojson.js
    dedupe_osm_features.js
    export_osm_packages.js
data/
  raw/
  interim/
  processed/
queries/
  schools_clinics.overpassql

Recommended command flow:

./scripts/overpass/fetch_osm_overpass.sh queries/schools_clinics.overpassql
node ./scripts/overpass/normalize_osm_geojson.js
node ./scripts/overpass/dedupe_osm_features.js
node ./scripts/overpass/export_osm_packages.js

QA checks

Run QA before you publish or hand off:

Outputs

The pipeline should end with:

Limitations

This pipeline improves reliability, but it does not solve sparse mapping, uncertain place names, or project-specific interpretation problems. Those still require method choices and sometimes field validation.