Overview

This pipeline is for situations where multiple GeoJSON files describe the same class of features but differ in schema, geography, or timing. The objective is not just to merge them. It is to merge them without losing the ability to explain where a record came from and why one duplicate was kept over another.

Why it matters

A merged layer that cannot explain its own provenance is difficult to audit. When someone asks why a record was kept, changed, or dropped, the answer should come from the pipeline's QA table and logs, not from memory.

When to use

Use this pattern when:
  - multiple GeoJSON files describe the same class of features;
  - the files differ in schema, geography, or collection timing; and
  - every record in the final layer must be traceable to its source, including why one duplicate was kept over another.

Inputs

Two or more GeoJSON files describing the same class of features, along with whatever is known about each file's collection date and schema (captured during the inventory step).

Workflow and method

  1. Inventory: Record the input files, dates, and schemas.
  2. Normalize: Standardize field names and geometry types.
  3. Append: Combine inputs into one staged layer while preserving source metadata.
  4. Detect duplicates: Use identifier, geometry, name, or proximity logic.
  5. Resolve duplicates: Keep, merge, or flag records based on a documented rule.
  6. Export: Produce a final layer and a QA table of what changed.
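
The normalize and append steps (2–3) can be sketched as a small field-alias pass that also stamps provenance onto each feature. The alias map and property names below are illustrative assumptions, not the pipeline's actual schema:

```javascript
// Sketch of normalize + provenance stamping. The alias map and the
// canonical names (facility_name, source_id) are illustrative only.
const fieldAliases = {
  facility_name: ["name", "NAME", "site_name"],
  source_id: ["id", "ID", "uid"],
};

function normalizeFeature(feature, sourceFile) {
  const props = feature.properties || {};
  const out = {};
  for (const [canonical, aliases] of Object.entries(fieldAliases)) {
    for (const alias of [canonical, ...aliases]) {
      if (alias in props) {
        out[canonical] = props[alias];
        break; // first matching alias wins
      }
    }
  }
  // Preserve provenance so the merge stays explainable (step 3).
  out._source_file = sourceFile;
  return { ...feature, properties: out };
}

const f = {
  type: "Feature",
  properties: { NAME: "Clinic A", id: "17" },
  geometry: { type: "Point", coordinates: [30.1, -1.95] },
};
console.log(normalizeFeature(f, "raw/health_2021.geojson").properties);
// → { facility_name: 'Clinic A', source_id: '17', _source_file: 'raw/health_2021.geojson' }
```

Stamping the source file at normalize time is what makes the later QA table possible: every downstream decision can cite where each record entered the pipeline.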

Duplicate logic options

| Rule type | Useful when | Risk |
| --- | --- | --- |
| Same source ID | IDs are stable across files | Fails when IDs changed or were dropped |
| Same normalized name + near-identical geometry | Feature labels are reliable | Can collapse legitimately distinct nearby sites |
| Spatial proximity threshold | Point data is loosely consistent | Can over-merge dense clusters |
| Source priority | One source is clearly authoritative | Can hide useful attributes from lower-priority sources |
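
As a sketch of the spatial-proximity rule, a naive pairwise pass with a haversine distance is enough for small point layers. The function names and the 50 m default threshold are illustrative assumptions, not recommendations:

```javascript
// Great-circle distance between two [lon, lat] points, in meters.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (d) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Naive O(n^2) pass over point features; returns candidate duplicate
// pairs for the resolve step, rather than deleting anything outright.
function findProximityPairs(features, thresholdMeters = 50) {
  const pairs = [];
  for (let i = 0; i < features.length; i++) {
    for (let j = i + 1; j < features.length; j++) {
      const d = haversineMeters(
        features[i].geometry.coordinates,
        features[j].geometry.coordinates
      );
      if (d <= thresholdMeters) pairs.push({ i, j, meters: d });
    }
  }
  return pairs;
}
```

A quadratic scan is fine for a few thousand points; beyond that a spatial index (grid or R-tree) is needed, and dense clusters will over-merge exactly as the table warns, which is why the output is candidate pairs rather than automatic deletions.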

Suggested file layout

data/
  raw/
  staged/
  processed/
scripts/
  merge/
    normalize_geojson.js
    detect_duplicates.js
    resolve_duplicates.js
    export_final_geojson.js
logs/

QA checks

  - Record counts reconcile: the staged count equals the sum of input counts, and the final count equals the staged count minus resolved duplicates.
  - Every merged, dropped, or flagged record appears in the QA table with the rule that was applied.
  - Source metadata survives normalization and append for every final record.

Outputs

Expected outputs:

  - A final merged layer in data/processed/.
  - A QA table listing each duplicate decision and the rule behind it.
  - Logs from each script run in logs/.

Limitations

No universal duplicate rule works across every layer. Dense settlement points, facilities with shared campuses, and mixed point/polygon representations often require manual review or class-specific logic.