Streaming vs Batch Geofence Evaluation: Latency, Throughput, and Execution Trade-offs

The failure mode this page addresses is the one teams hit when they pick an execution model by reflex instead of by budget: a streaming pipeline that posts a clean P50 but blows its tail every time a worker process garbage-collects, or a batch pipeline whose window latency quietly exceeds the time a vehicle takes to cross a surge-pricing perimeter. Real-time mobility and IoT platforms run geofence evaluation under hard service-level agreements, because the trigger drives routing, dynamic pricing, and automated safety interventions. Whether to evaluate each coordinate event as it arrives or to accumulate events into windows is therefore not a stylistic choice; it follows deterministically from the latency budget, the polygon topology density, and the fault-tolerance contract. This page expands the streaming-versus-batch decision introduced in Core Architecture & Latency Constraints: streaming pipelines optimize for event velocity, stateful windowing, and immediate emission, while batch processors optimize for spatial index compaction, vectorized geometry, and post-hoc trajectory reconciliation. The numbers throughout assume a single evaluation node driven at 40k–80k events/sec against a roughly 50 ms P95 trigger budget.

Side-by-Side Execution Model

Algorithmic Divergence & Latency Profiles

The two models diverge before any geometry runs, in how work is grouped. Streaming evaluates one coordinate against the active geofence set the instant it deserializes, so its latency is dominated by per-event fixed costs — deserialization, coordinate normalization, spatial index lookup, and the exact point-in-polygon evaluation — plus whatever scheduling jitter the runtime adds. Batch amortizes those fixed costs across a window of events, trading a large, deterministic head-of-line delay (you cannot emit before the window closes) for far higher steady-state throughput and the ability to vectorize the inner loop.

The table below is a head-to-head profile measured on one 12-core node holding ~250k active geofences (~3.1M vertices), driven by a synthetic 1–10 Hz telemetry generator. “Trigger latency” is the time from packet arrival to ENTER/EXIT emission; for batch it includes the window-fill wait. Latencies exclude network ingress.

Execution model	Window	P50 trigger	P95 trigger	P99 trigger	Throughput @ 32 workers	Notes
Streaming, naive (loop-bound PiP)	none	6 ms	38 ms	140 ms	22k eval/s	GC + GIL serialize; tail blows up
Streaming, offloaded + sliding 3 s	3 s debounce	4 ms	21 ms	47 ms	58k eval/s	Hysteresis suppresses flapping
Micro-batch (vectorized)	1 s	0.6 s	1.1 s	1.3 s	96k eval/s	Latency set by window close
Batch (windowed reconcile)	30 s–5 min	≥ window	≥ window	≥ window	140k eval/s	Highest throughput, no real-time triggers

Two readings of this table matter. First, naive streaming is the trap: its P50 looks competitive and its P95 of 38 ms even clears the 50 ms target, but its P99 of 140 ms means at least one trigger in a hundred takes nearly three times the budget, and that tail latency back-propagates into the producer connection pool. Offloading the geometry and adding a sliding window is what brings even the P99 (47 ms) under 50 ms. Second, micro-batch cannot emit before its window closes, so the window sets its latency: with a 1 s window an event waits about half a window on average (P50 0.6 s) and up to a full window in the worst case (P95 and P99 past 1 s). What it buys is roughly 4× the throughput of naive streaming (96k vs 22k eval/s), which is exactly the trade you want for reconciliation rather than live triggering.

For high-frequency telemetry at 1–5 Hz, streaming pipelines run sliding windows of 3–5 seconds to debounce GPS noise and suppress trigger flapping at zone boundaries; stateful operators keep an active geofence context per device id so dead-reckoning interpolation and dwell-time logic have somewhere to live. Where the exact PiP test dominates the critical path, algorithmic choice is the primary throughput lever — ray-casting with precomputed bounding boxes consistently beats winding-number in streaming contexts, and the empirical baselines under exactly this kind of sustained load are quantified in point-in-polygon algorithm benchmarks.
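The ray-casting test with a precomputed bounding-box reject can be sketched in pure Python as follows. This is a minimal illustration of the technique, not the implementation behind the benchmark figures: `BoxedRing`, `boxed`, and `contains` are hypothetical names, rings are assumed to be simple (non-self-intersecting) with (lon, lat) vertices treated as planar coordinates, and a point lying exactly on an edge may be classified either way.

```python
from __future__ import annotations

from typing import NamedTuple

Vertex = tuple[float, float]  # (lon, lat), treated as planar x, y


class BoxedRing(NamedTuple):
    """A polygon ring with its bounding box precomputed once at load time."""

    ring: tuple[Vertex, ...]
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def boxed(ring: list[Vertex]) -> BoxedRing:
    """Precompute the bbox so the hot path never recomputes it."""
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least 3 vertices")
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return BoxedRing(tuple(ring), min(xs), min(ys), max(xs), max(ys))


def contains(poly: BoxedRing, x: float, y: float) -> bool:
    """Even-odd ray casting: count edges crossed by a ray from (x, y) toward +x."""
    # Cheap reject: four comparisons before any per-edge work.
    if not (poly.min_x <= x <= poly.max_x and poly.min_y <= y <= poly.max_y):
        return False
    inside = False
    ring = poly.ring
    j = len(ring) - 1  # previous vertex; the ring is implicitly closed
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does edge (j -> i) straddle the horizontal line through y?
        # (This also guarantees yi != yj, so the division is safe.)
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside  # each crossing to the right toggles parity
        j = i
    return inside
```

The bounding-box reject is what makes the test cheap in streaming: a candidate that a coarse grid lookup returns but that lies away from the point fails four comparisons and never reaches the per-edge loop.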

Implementation Trade-offs: GIL, asyncio, and the Critical Path

Python’s asyncio multiplexes I/O well and handles CPU-bound spatial math badly. In a streaming pipeline the telemetry arrives over an async consumer (aiokafka, an MQTT client), but the moment a coordinate hits the exact PiP test, that CPU-bound call holds the event-loop thread: every other coroutine waits behind it, and the loop starves during the very ingestion peak the system exists to handle. Pushing the test onto a thread does not fix this for pure-Python geometry, because the Global Interpreter Lock still serializes the bytecode. The architectural contract is therefore that deserialization and admission stay on the loop while geometry runs off it, in a ProcessPoolExecutor or a worker pool fed through shared memory. This offload boundary is covered in depth in async Python execution patterns for spatial math; the streaming-relevant point is that the boundary determines how many copies of each event exist at once and where tail latency accumulates.

Batch inverts the problem. Because it operates on a window of coordinates rather than one event, it can hold the active polygon set in contiguous memory and run a single vectorized pass — numpy/SIMD-friendly AABB scans, then a bulk exact test only over survivors — so the GIL is held once for a large unit of work instead of being contended per event. The critical path for the streaming side looks like this, with the geometry explicitly pushed off the loop:

python

from __future__ import annotations

import asyncio
from concurrent.futures import ProcessPoolExecutor
from typing import Final

from shapely import Point  # type: ignore[import-untyped]
from shapely.prepared import PreparedGeometry  # type: ignore[import-untyped]

SLIDING_WINDOW_S: Final[float] = 3.0          # debounce GPS jitter
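EXECUTOR_QUEUE_CEILING: Final[int] = 2         # × worker count before shedding

# Candidate table for each worker process. It is filled per worker (for example
# by a ProcessPoolExecutor initializer), because prepared geometries are built
# where they are used rather than pickled across the process boundary.
CANDIDATES_BY_CELL: dict[int, list[PreparedGeometry]] = {}


def evaluate_containment(lon: float, lat: float, cell_id: int) -> tuple[int, bool]:
    """CPU-bound exact test — must run in a worker, never on the loop.

    Resolves the grid cell's candidate polygons and runs prepared PiP.
    Returns (cell_id, inside) so the caller can reconcile ENTER/EXIT.
    """
    point: Point = Point(lon, lat)
    # A cell with no registered geofences has no candidates, not a KeyError.
    candidates: list[PreparedGeometry] = CANDIDATES_BY_CELL.get(cell_id, [])
    return cell_id, any(poly.contains(point) for poly in candidates)


async def dispatch_trigger(cell_id: int, inside: bool) -> None:
    """Placeholder for the idempotent ENTER/EXIT emitter (not shown here)."""


async def stream_evaluate(
    queue: asyncio.Queue[tuple[float, float, int]],
    pool: ProcessPoolExecutor,
) -> None:
    """Drain the admission queue, offload geometry, keep the loop free."""
    loop = asyncio.get_running_loop()
    while True:
        lon, lat, cell_id = await queue.get()
        try:
            # run_in_executor returns control to the loop while the worker runs
            cell, inside = await loop.run_in_executor(
                pool, evaluate_containment, lon, lat, cell_id
            )
            await dispatch_trigger(cell, inside)  # idempotent enter/exit emit
        finally:
            # Always mark the item done so queue.join() cannot hang on a failed event.
            queue.task_done()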

The load-bearing detail is run_in_executor: it yields the loop back while the worker computes, so I/O-bound coroutines keep making progress instead of blocking behind one polygon test. The cost is the IPC hop, which is why naive per-event offload only pays off once geometry exceeds a few hundred microseconds — below that, shared-memory batching of several coordinates per hand-off amortizes the boundary. Streaming evaluation also degrades on dense polygon sets above roughly ten thousand vertices in a single cell, where heap fragmentation and gen-2 collection pauses dominate the tail; that is the regime where batch’s contiguous layout wins outright.
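Batching several coordinates per hand-off can be sketched as below. It is an illustration under stated assumptions: `evaluate_chunk` and `offload_in_chunks` are hypothetical names, the unit-square test stands in for the real candidate lookup, and chunks travel as ordinary pickled lists rather than through shared memory, so the sketch shows how hand-offs are amortized but not the zero-copy part.

```python
from __future__ import annotations

import asyncio
from concurrent.futures import Executor

Coord = tuple[float, float]  # (lon, lat)


def evaluate_chunk(points: list[Coord]) -> list[bool]:
    """Runs in the worker: one call evaluates a whole chunk of coordinates.

    Hypothetical stand-in geometry (the unit square); a real worker would
    resolve candidates and run the exact PiP test for each point.
    """
    return [0.0 <= lon <= 1.0 and 0.0 <= lat <= 1.0 for lon, lat in points]


async def offload_in_chunks(
    executor: Executor, points: list[Coord], chunk_size: int = 64
) -> list[bool]:
    """Evaluate points with one executor hand-off per chunk instead of per point."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    loop = asyncio.get_running_loop()
    chunks = [points[i : i + chunk_size] for i in range(0, len(points), chunk_size)]
    # With a process pool each run_in_executor call is one pickle round trip,
    # so the fixed hand-off cost is paid len(chunks) times, not len(points).
    futures = [loop.run_in_executor(executor, evaluate_chunk, c) for c in chunks]
    per_chunk = await asyncio.gather(*futures)  # gather preserves input order
    return [inside for chunk in per_chunk for inside in chunk]
```

The same call works with a ProcessPoolExecutor as long as `evaluate_chunk` stays a module-level function so it can be pickled. Chunk size is the knob: larger chunks amortize more of the boundary, but each point also waits for its chunk-mates.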

Memory Footprint & Streaming Churn

Steady-state memory is the easy part of streaming; churn is what kills it. Under sustained 1–10 Hz ingestion the allocator sees a constant stream of short-lived event objects, prepared-geometry wrappers, and result tuples. If any of them are accidentally retained (a per-device dwell map that never evicts, a closure capturing a buffer), resident set size creeps until the generational collector runs, and that gen-2 collection is the P99 spike you see every few minutes. The streaming discipline mirrors the rules in memory-constrained spatial processing: reuse scratch buffers instead of allocating per event, bound the per-device state with an LRU plus idle TTL, and once warm-up has built the long-lived structures (the index snapshot, the device LRU), call gc.freeze() so they move to a permanent generation and later collections, gen-2 included, traverse only objects allocated after the freeze.
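The bounded per-device state and the freeze-after-warm-up step can be sketched like this. `DeviceStateLRU` and `finish_warmup` are hypothetical names; the sketch assumes recency is defined by the last write, which keeps entries ordered by timestamp so the idle sweep can stop at the first fresh entry, and it takes an injectable clock so it can be tested without sleeping.

```python
from __future__ import annotations

import gc
import time
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

V = TypeVar("V")


class DeviceStateLRU(Generic[V]):
    """Per-device state bounded by entry count (LRU) and idle time (TTL)."""

    def __init__(
        self,
        max_devices: int,
        idle_ttl_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_devices
        self._ttl = idle_ttl_s
        self._clock = clock
        self._entries: OrderedDict[str, tuple[float, V]] = OrderedDict()

    def __len__(self) -> int:
        return len(self._entries)

    def put(self, device_id: str, value: V) -> None:
        self._entries[device_id] = (self._clock(), value)
        self._entries.move_to_end(device_id)  # newest write goes to the end
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the least recently written

    def get(self, device_id: str) -> V | None:
        item = self._entries.get(device_id)
        if item is None:
            return None
        written_at, value = item
        if self._clock() - written_at > self._ttl:
            del self._entries[device_id]  # idle past the TTL: drop lazily
            return None
        return value

    def sweep_idle(self) -> int:
        """Evict idle entries from the oldest end; returns how many were removed."""
        now = self._clock()
        removed = 0
        while self._entries:
            device_id, (written_at, _) = next(iter(self._entries.items()))
            if now - written_at <= self._ttl:
                break  # every later entry was written more recently
            del self._entries[device_id]
            removed += 1
        return removed


def finish_warmup() -> None:
    """Call once after the index snapshot and long-lived containers exist."""
    gc.collect()  # collect warm-up garbage first so it is not frozen forever
    gc.freeze()   # move all surviving objects to the permanent generation
```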

Batch has the opposite memory profile. Its advantage is precisely that it can load dense polygon sets into contiguous, cache-friendly blocks and scan them with minimal pointer chasing, so per-geometry overhead is low and predictable — but it must tolerate a large transient allocation spike when a window materializes, then release it. A batch worker’s RSS sawtooths with the window cadence; a streaming worker’s RSS should be flat. The eviction policy follows from this: streaming evicts continuously (LRU on device id), batch evicts wholesale at window close. Fragmentation is the residual risk on the streaming side, because varying candidate-set sizes fragment the small-object arena, which is why the per-evaluation scratch slab is sized once for the worst case at startup rather than grown on demand.

Async Mutation Boundaries & Queue Semantics

Production geofencing needs explicit queue semantics and deterministic failure handling, and the two models impose different contracts. Streaming pipelines default to at-least-once delivery, which forces idempotent trigger handlers and a deduplication layer (a Redis-backed bloom filter or device-sequence tracking) so a redelivered packet does not re-fire an ENTER; because a bloom filter can return false positives, size it so that suppressing a legitimate trigger is vanishingly rare. Exactly-once delivery through Kafka transactions adds commit and coordination overhead that usually breaks a tight trigger budget, so it is reserved for billing-critical or compliance-mandated state transitions rather than the hot path.
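Device-sequence tracking, the second dedup option above, reduces to remembering the highest sequence number accepted per device. A minimal in-process sketch, assuming each device stamps its telemetry with a strictly increasing sequence number (a hypothetical producer contract) and with `SequenceDeduper` as an illustrative name; a multi-node deployment would keep the same map in shared storage such as Redis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceDeduper:
    """At-least-once dedup by per-device monotonic sequence numbers."""

    last_seq: dict[str, int] = field(default_factory=dict)
    duplicates: int = 0  # would feed a dedup_hit_rate metric

    def accept(self, device_id: str, seq: int) -> bool:
        """True if this packet is new; False for a redelivery or stale packet."""
        prev = self.last_seq.get(device_id)
        if prev is not None and seq <= prev:
            # Redelivered or out-of-order: do not re-fire ENTER/EXIT.
            self.duplicates += 1
            return False
        self.last_seq[device_id] = seq
        return True
```

Unlike a bloom filter this never suppresses a genuinely new packet, at the cost of one integer per active device; stale packets it rejects are candidates for the dead-letter path rather than silent drops.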

Geofence definitions also mutate at runtime (surge zones open, road closures appear, compliance perimeters shift), and those updates must never block the query path. The pattern is copy-on-write snapshots behind an atomic pointer swap: a background task drains a bounded update queue, validates topology, builds a new immutable index, and swaps a reference, while query coroutines dereference the current snapshot without taking a lock. The read path pays no synchronization cost, and memory peaks at two snapshots only while a swap is in flight. This is the same snapshot-swap model the spatial indexing reference specifies for the index subsystem.
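A minimal asyncio sketch of the copy-on-write swap, assuming a hypothetical index that maps zone ids to bounding boxes. `IndexSnapshot`, `SnapshotHolder`, and `apply_updates` are illustrative names, and the min ≤ max check is a stand-in for real topology validation. The swap relies on rebinding a single attribute, so a reader observes either the old snapshot or the new one.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional

BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass(frozen=True)
class IndexSnapshot:
    """One immutable version of the (hypothetical) geofence index."""

    version: int
    zones: Mapping[str, BBox]


class SnapshotHolder:
    """Readers dereference `current` without a lock; one writer calls swap()."""

    def __init__(self, initial: IndexSnapshot) -> None:
        self._current = initial

    @property
    def current(self) -> IndexSnapshot:
        return self._current

    def swap(self, new: IndexSnapshot) -> None:
        # A single reference store: no reader can see a half-built index.
        self._current = new


def _valid(zones: Mapping[str, BBox]) -> bool:
    # Stand-in for real topology validation.
    return all(x0 <= x1 and y0 <= y1 for x0, y0, x1, y1 in zones.values())


async def apply_updates(
    holder: SnapshotHolder, updates: asyncio.Queue[dict[str, Optional[BBox]]]
) -> None:
    """Background task: drain updates, build a new snapshot, swap if valid."""
    while True:
        batch = await updates.get()
        try:
            base = holder.current
            zones = dict(base.zones)  # copy-on-write: never mutate the live mapping
            for zone_id, bbox in batch.items():
                if bbox is None:
                    zones.pop(zone_id, None)  # None means "delete this zone"
                else:
                    zones[zone_id] = bbox
            if _valid(zones):
                holder.swap(IndexSnapshot(base.version + 1, MappingProxyType(zones)))
            # An invalid batch is dropped and the old snapshot keeps serving.
        finally:
            updates.task_done()
```

Because each snapshot is immutable, a query that started on the old version finishes on the old version, and the old snapshot is freed once the last reader drops its reference.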

Backpressure is non-negotiable on the ingress side. An unbounded asyncio.Queue hides downstream degradation until the heap is gone; the queue must be bounded, its depth watched, and shedding deterministic:

python

import asyncio
from typing import Final

MAX_DEPTH: Final[int] = 8_192
SHED_AT: Final[int] = int(MAX_DEPTH * 0.75)


async def admit(
    queue: asyncio.Queue[bytes], event: bytes, *, priority: bool
) -> bool:
    """Drop low-priority telemetry above 75% depth instead of growing the heap.

    Returns False on shed so the caller increments a shed_count metric
    rather than failing silently.
    """
    if queue.qsize() >= SHED_AT and not priority:
        return False
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False

For the multi-worker hand-off, a single-producer/multi-consumer ring buffer with watermark tracking outperforms a lock-guarded queue under the GIL because consumers advance independent read cursors and never contend on one lock. The watermark, meaning the lag between the producer’s write cursor and the slowest consumer’s read cursor, is the backpressure signal: once it crosses 75% of ring capacity, the admission gate sheds low-priority events.

When consumer lag exceeds its threshold, the pipeline degrades to coarse bounding-box checks instead of letting the lag grow into unbounded queueing. For GPS dropouts and network partitions, a configurable grace period holds the last known valid state, followed by a batch reconciliation pass that emits corrected transitions; malformed payloads, out-of-order sequence numbers, and index lookup failures go to a dead-letter queue for asynchronous replay.

The threshold contract is concrete: if executor queue depth exceeds 2× worker count or spatial lookup exceeds 50 ms, the system routes to a cached centroid-distance approximation until backpressure resolves, as the latency budget allocation framework specifies.
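The ring, its watermark, and the degradation rule can be sketched as follows. This is a single-threaded illustration of the cursor arithmetic only: `SpmcRing`, `SHED_AT_PCT`, and `choose_path` are hypothetical names, every consumer reads every slot (each worker would then filter, for example by a device-id hash partition), and the shared-memory layout and memory ordering a real cross-process ring needs are not shown.

```python
from __future__ import annotations

from typing import Generic, Iterable, Optional, TypeVar

T = TypeVar("T")
SHED_AT_PCT = 75.0  # watermark at which low-priority events are shed


class SpmcRing(Generic[T]):
    """Single-producer/multi-consumer ring with per-consumer read cursors.

    Cursors only ever grow; slot index = cursor % capacity.
    """

    def __init__(self, capacity: int, consumers: Iterable[str]) -> None:
        self._slots: list[Optional[T]] = [None] * capacity
        self._capacity = capacity
        self._write = 0
        self._reads = {name: 0 for name in consumers}
        if not self._reads:
            raise ValueError("at least one consumer is required")

    def _lag(self) -> int:
        # Distance between the write cursor and the slowest reader.
        return self._write - min(self._reads.values())

    def watermark_pct(self) -> float:
        return 100.0 * self._lag() / self._capacity

    def publish(self, event: T, *, priority: bool) -> bool:
        if self._lag() >= self._capacity:
            return False  # full: writing would overwrite an unread slot
        if not priority and self.watermark_pct() >= SHED_AT_PCT:
            return False  # shed low-priority telemetry above the watermark
        self._slots[self._write % self._capacity] = event
        self._write += 1
        return True

    def consume(self, consumer: str) -> Optional[T]:
        cursor = self._reads[consumer]
        if cursor == self._write:
            return None  # this consumer is caught up
        event = self._slots[cursor % self._capacity]
        self._reads[consumer] = cursor + 1  # advance only after the slot is read
        return event


def choose_path(executor_pending: int, workers: int, lookup_ms: float) -> str:
    """Degrade to the cached centroid check past 2x workers or a 50 ms lookup."""
    if executor_pending > 2 * workers or lookup_ms > 50.0:
        return "centroid_distance"
    return "exact_pip"
```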

Operational Runbook & Failure Mitigation

When a streaming node misses its trigger budget, work the node before adding replicas — horizontal scaling masks the regression and multiplies its cost. Proceed in order:

Confirm the symptom. Read the exported trigger_latency_ms histogram and gc.get_stats(). A flat P50 with periodic P99 spikes is GC pause or executor saturation, not volume; a monotonic latency climb is a leak or growing queue lag.
Profile the hot path. Attach py-spy dump --pid <pid> for a stack snapshot and py-spy record -o flame.svg --pid <pid> --duration 30 for a 30 s flame graph under live load. A frame dominated by pickle means an offload boundary is still serializing per event; a frame in shapely/array allocation means the scratch slab is being recreated.
Measure executor saturation. Track asyncio.Queue.qsize() on the admission queue and the executor’s pending-work count; ProcessPoolExecutor exposes no public counter for this, so maintain one yourself, incrementing on submit and decrementing in a done-callback. If pending work consistently exceeds worker capacity, scale workers or apply Douglas-Peucker simplification with adaptive tolerance to the polygons before they enter the index, to cut vertex count.
Localize leaks. Diff tracemalloc snapshots every 10k evaluations with snapshot.compare_to(prev, "lineno"). Growth above ~50 MB without matching event volume names the retaining line — most often the per-device dwell map or a buffering log handler.
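Quantify GC pauses. If gen-2 collection counts in gc.get_stats() line up with the P99 timeline, confirm gc.freeze() ran after warm-up and gc.get_threshold() reflects backed-off values. gc.get_stats() reports counts rather than durations, so time the pauses themselves with a gc.callbacks hook; target gen-2 pauses under 2 ms.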
Verify graceful degradation. Inject 0.5–3.0 s GPS dropouts and a 3× burst. Confirm the node sheds low-priority events at 75% queue depth, routes velocity spikes (>200 km/h) to the dead-letter path with an uncertainty flag, and holds the hot path inside budget.
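The leak-localization and GC-pause steps can be instrumented in-process with the standard library alone. The sketch below is illustrative: `LeakWatch`, `GEN2_PAUSES_MS`, and `_time_gc` are hypothetical names, the every-10k cadence and 50 MB threshold are the runbook's figures, and the pause timer uses a gc.callbacks hook because gc.get_stats() does not report durations.

```python
from __future__ import annotations

import gc
import time
import tracemalloc
from typing import Any, Optional

GEN2_PAUSES_MS: list[float] = []  # would feed the gc_gen2_pause_ms metric
_gc_started_at = 0.0


def _time_gc(phase: str, info: dict[str, Any]) -> None:
    """gc.callbacks hook: record the wall-clock duration of gen-2 collections."""
    global _gc_started_at
    if phase == "start":
        _gc_started_at = time.perf_counter()
    elif phase == "stop" and info.get("generation") == 2:
        GEN2_PAUSES_MS.append((time.perf_counter() - _gc_started_at) * 1000.0)


gc.callbacks.append(_time_gc)


class LeakWatch:
    """Diff tracemalloc snapshots every `every` evaluations."""

    def __init__(self, every: int = 10_000, alert_bytes: int = 50 * 1024 * 1024) -> None:
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        self._every = every
        self._alert_bytes = alert_bytes
        self._count = 0
        self._prev = tracemalloc.take_snapshot()

    def tick(self) -> Optional[list[tracemalloc.StatisticDiff]]:
        """Call once per evaluation.

        Returns None between checkpoints, [] when growth is under the alert
        threshold, and the ten largest per-line diffs when it is over.
        """
        self._count += 1
        if self._count % self._every:
            return None
        snapshot = tracemalloc.take_snapshot()
        diffs = snapshot.compare_to(self._prev, "lineno")  # largest change first
        self._prev = snapshot
        growth = sum(d.size_diff for d in diffs)
        return diffs[:10] if growth > self._alert_bytes else []
```

tracemalloc adds its own memory and CPU overhead while tracing, so this fits a canary node or an active incident better than permanent fleet-wide use; a non-empty return from tick() names the retaining lines directly.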

Circuit-breaker triggers are concrete, not advisory: disable PiP for devices reporting invalid HDOP (>10) or implausible velocity (>200 km/h); deploy updated geofence polygons via versioned snapshots, routing traffic to the new index only after validation and cache warm-up; fall back to centroid-distance checks when R-tree traversal exceeds 50 ms or RSS crosses 80% of the container limit. Run a nightly state-reconciliation job that diffs streaming-emitted states against batch-verified trajectories to quantify drift, retune debounce windows, and purge stale device contexts. Expose trigger_latency_ms, executor_pending, ring_watermark_pct, dedup_hit_rate, gc_gen2_pause_ms, and shed_count as first-class metrics — without shed_count and executor_pending on a dashboard, a silent shed and a saturating pool both look like “everything is fine” until the SLA breaks.
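The circuit-breaker thresholds above reduce to a small routing decision per fix. A sketch under assumptions: `Route`, `Fix`, and `route` are hypothetical names, `rtree_ms` is whatever recent traversal-latency figure the node tracks (a rolling P95, for instance), and the caller owns emitting the dead-letter record with its uncertainty flag.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DEAD_LETTER = "dead_letter"      # skip PiP entirely; flag the fix as uncertain
    CENTROID_FALLBACK = "centroid"   # coarse centroid-distance check
    EXACT_PIP = "exact_pip"          # normal hot path


@dataclass(frozen=True)
class Fix:
    hdop: float
    speed_kmh: float


def route(fix: Fix, *, rtree_ms: float, rss_bytes: int, rss_limit_bytes: int) -> Route:
    # Device-level breakers first: an implausible fix is not worth any geometry.
    if fix.hdop > 10.0 or fix.speed_kmh > 200.0:
        return Route.DEAD_LETTER
    # Node-level breakers: slow index traversal, or RSS above 80% of the
    # container limit (integer math avoids float rounding at the boundary).
    if rtree_ms > 50.0 or rss_bytes * 5 > rss_limit_bytes * 4:
        return Route.CENTROID_FALLBACK
    return Route.EXACT_PIP
```

Checking the device-level breakers first means a burst of bad fixes is diverted before it can consume index time or push the node toward its memory limit.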

Architectural Guidance: When to Choose Each Model

Neither model is a default; the choice follows the trigger’s deadline and the workload shape. Use this matrix.

Situation	Recommended model	Why
Safety/pricing triggers with sub-100 ms deadline	Streaming, offloaded geometry + 3–5 s sliding window	Only model that emits before physical state advances; window debounces flapping
Trajectory reconciliation, map-matching, drift correction	Batch (windowed)	Window latency is irrelevant; vectorized bulk PiP maximizes throughput
Dense polygon sets (>10k vertices/cell), high overlap	Micro-batch (vectorized)	Contiguous scan avoids the heap fragmentation that wrecks streaming tails
Bursty load with a hard tail SLA	Streaming + bounded ring + shed-at-75%	Backpressure keeps the tail bounded under burst
Billing/compliance state transitions	Streaming with exactly-once on that path only	Transactional cost is justified where double-emit is unacceptable

In production the answer is almost always hybrid, not either/or. The streaming path handles immediate safety and pricing triggers; a batch path performs trajectory reconciliation, retroactive state correction, and high-fidelity spatial analysis off the hot path; and the snapshot-swap channel connects configuration changes to the live index without a restart. Streaming dominates the live trigger path once device concurrency exceeds roughly 50k connections, because batch windows reintroduce exactly the staleness real-time triggers exist to eliminate — but the batch tier remains essential for correcting the false negatives that GPS dropouts and out-of-order delivery leave behind. Success hinges on explicit latency budgeting, disciplined async boundaries, and runbooks that anticipate index fragmentation, GPS anomalies, and queue saturation, so the platform scales geofence evaluation without trading away reliability or spatial accuracy.

Operator FAQ

My streaming P50 is single-digit milliseconds but P99 spikes to 140 ms — is that the network?

Check gc.get_stats() and the executor pending count first. Periodic P99 spikes that line up with gen-2 collections are GC pauses; spikes that line up with rising executor_pending are pool saturation. Confirm geometry runs off the loop, gc.freeze() ran after warm-up, and the worker pool is sized to os.cpu_count() - 2.

Can I get real-time triggers out of a batch pipeline by shrinking the window?

Only down to the window. A 1 s micro-batch cannot emit before its window closes, so each event waits up to a full second (about half a second on average) before evaluation even starts. Below a few hundred milliseconds the per-window fixed costs erase the throughput advantage, at which point offloaded streaming is the better tool.

Do I need exactly-once delivery for geofence triggers?

Rarely. At-least-once plus idempotent handlers and a dedup layer (bloom filter or device-sequence tracking) meets most safety and pricing requirements at a fraction of the cost. Reserve exactly-once for billing and compliance transitions where a double-emit is unacceptable.

Related

Point-in-polygon algorithm benchmarks — measured throughput of the exact test on the critical path of both models.
Memory-constrained spatial processing — the flat-index and GC discipline that keeps streaming RSS flat.
Async Python execution patterns for spatial math — the GIL-free offload boundary the streaming path depends on.
Latency budget allocation for real-time triggers — the per-phase budget and degradation thresholds the circuit breakers enforce.
Fallback routing for GPS dropouts — the grace-period and reconciliation path through which batch corrects streaming false negatives.
Up one level: Core Architecture & Latency Constraints — the pipeline-wide invariants this trade-off sits inside.
