Optimizing Ray Casting vs Winding Number for GPS Streams

Real-time location pipelines processing high-frequency GPS telemetry from IoT sensors, ride-hailing fleets, and logistics trackers run each containment check inside a sub-millisecond slice of the per-event budget. When that check becomes the bottleneck, the choice between ray casting and the winding number test — and how each is tuned for a streaming workload — decides whether the service scales linearly or collapses under coordinate volume. Degradation rarely shows up as a single timeout; it surfaces as cascading consumer-group lag, erratic CPU spikes, and silent false-positive boundary crossings that corrupt downstream billing, dispatch routing, and compliance audits. This page narrows the point-in-polygon algorithm benchmarks comparison to one practical question — how to make each kernel hold its budget on jittery GPS streams — and operates inside the per-trigger contract defined in Core Architecture & Latency Constraints.

The optimization target is not the median. A geofence trigger is a physical-state race: the asset is already moving, so it is the P99 evaluation that lets a vehicle drift metres past a surge perimeter or customs line before the containment result resolves. Tuning the kernel is therefore an exercise in flattening the tail, not shaving the average.

Concept & Specification

Both kernels resolve exact containment of a query point $p$ against a polygon of $n$ edges, and both are $O (n)$ in edge count. What differs is the constant factor and the behaviour on degenerate geometry. With a bounding-box reject in front, the per-evaluation cost is:

$T_{eval} = \underbrace{O(1)}_{\text{bbox reject}} + p_{hit} \cdot \underbrace{O(n)}_{\text{edge scan on survivors}}$

where $p_{hit}$ is the fraction of points that survive the box test and reach the edge scan. Because the $O (n)$ kernel dominates, driving $p_{hit}$ down with a cheap vectorized box test is the single highest-leverage move — it keeps the expensive scan off the hot path for the ~90% of pings that are nowhere near a fence.
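To make the leverage of $p_{hit}$ concrete, the cost model can be evaluated numerically. This is an illustrative sketch only: `expected_eval_ns` is a hypothetical helper, the 5 ns box-test cost is an assumed placeholder rather than a measurement, and the 2,100 ns kernel cost borrows the 300-vertex ray-cast P50 from the benchmark table later on this page.

```python
def expected_eval_ns(box_ns: float, kernel_ns: float, p_hit: float) -> float:
    """Expected per-point cost: every point pays the box test, survivors also pay the kernel."""
    if not 0.0 <= p_hit <= 1.0:
        raise ValueError("p_hit must be a fraction in [0, 1]")
    return box_ns + p_hit * kernel_ns


# Assumed inputs: 5 ns box test (placeholder), 2,100 ns edge scan (300-vertex P50).
no_reject = expected_eval_ns(0.0, 2100.0, 1.0)     # kernel runs on every point
with_reject = expected_eval_ns(5.0, 2100.0, 0.10)  # kernel runs on 10% of points
speedup = no_reject / with_reject
```

Under these assumed numbers the expected cost falls from 2,100 ns to about 215 ns per point. The box test's own cost barely matters; the survivor fraction does.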

Ray casting (the even-odd crossing test) shoots a horizontal ray from $p$ and toggles a boolean each time it crosses an edge; an odd crossing count means inside. The inner loop is one sign comparison and one cross-multiplication per edge — no transcendental calls. The winding number test accumulates the net number of times the polygon wraps $p$:

$wn(p) = \frac{1}{2\pi} \sum_{i=0}^{n-1} \Delta\theta_{i}$

The naive angle-summation form above calls atan2 per edge and is ~4x slower — never ship it. The robust formulation (Sunday’s) discards atan2 and counts signed up/down crossings, so its per-edge cost lands close to ray casting while staying correct on self-overlapping geometry.

The parameters that materially move the tail on GPS streams:

Parameter	Symbol	Typical range	Effect on P99
Polygon vertex count	$n$	10–1000	Linear on the edge scan
Box-survivor rate	$p_{hit}$	0.02–0.20	Sets how often the kernel runs
Coordinate precision	—	5–9 decimals	Higher = more FP compares, no accuracy gain
Kernel locus	—	Python / Numba / GEOS	Removes interpreter + GIL serialization
GC thresholds (gen0 / gen1 / gen2)	—	700 / 10 / 10	Pause frequency under allocation churn

Step-by-Step Implementation

Prerequisites: Python 3.11+, numpy>=1.24, shapely>=2.0 (vectorized contains, GEOS C-API), optional numba>=0.59 for a JIT kernel, plus py-spy and tracemalloc for profiling. Coordinate batches are float64 arrays of shape (N, 2) ordered [lon, lat]; fence polygons are pre-loaded into a read-only catalog so geometry never crosses a process boundary per call.

1. Reject the majority of points with a vectorized box test. Apply an axis-aligned bounding-box (AABB) filter before any kernel call, using a branchless NumPy comparison with no Python-level iteration over candidates.

python

from __future__ import annotations

import numpy as np


def aabb_survivors(points: np.ndarray, bounds: np.ndarray) -> np.ndarray:
    # points: (N, 2) float64 [lon, lat]; bounds: [min_lon, min_lat, max_lon, max_lat]
    return (
        (points[:, 0] >= bounds[0]) & (points[:, 0] <= bounds[2]) &
        (points[:, 1] >= bounds[1]) & (points[:, 1] <= bounds[3])
    )

Gotcha: compute bounds once per fence at load time and cache it on the catalog entry. Recomputing polygon.min(axis=0) per batch silently reintroduces the O(n) cost the box test exists to avoid.
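One way to cache the bounds on the catalog entry is a small immutable record built at load time. This is a minimal sketch: `FenceEntry` and `from_vertices` are hypothetical names, chosen only to mirror the `fence.vx`, `fence.vy`, and `fence.bounds` attributes used elsewhere on this page.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class FenceEntry:
    """Hypothetical catalog record holding one fence ring and its cached box."""
    vx: np.ndarray      # vertex longitudes, float64
    vy: np.ndarray      # vertex latitudes, float64
    bounds: np.ndarray  # [min_lon, min_lat, max_lon, max_lat]

    @classmethod
    def from_vertices(cls, vertices: np.ndarray) -> "FenceEntry":
        verts = np.ascontiguousarray(vertices, dtype=np.float64)
        vx, vy = verts[:, 0].copy(), verts[:, 1].copy()
        # The O(n) min/max scan happens here, once, not per batch.
        bounds = np.array([vx.min(), vy.min(), vx.max(), vy.max()])
        for arr in (vx, vy, bounds):
            arr.setflags(write=False)  # keep the catalog read-only
        return cls(vx=vx, vy=vy, bounds=bounds)
```

Marking the arrays read-only is a design choice rather than a requirement; it makes an accidental in-place edit of shared geometry fail loudly instead of corrupting every later evaluation.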

2. Run the exact kernel only on survivors. Slice the survivor subset and pass it to a compiled kernel. The Numba-jitted ray-cast below releases the GIL (nogil=True) so it can run under a thread pool without serializing the event loop.

python

import numpy as np
from numba import njit


@njit(nogil=True, fastmath=False, cache=True)
def ray_cast_inside(px: float, py: float, vx: np.ndarray, vy: np.ndarray) -> bool:
    inside = False
    n = vx.shape[0]
    j = n - 1
    for i in range(n):
        if (vy[i] > py) != (vy[j] > py):
            x_cross = (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]
            if px < x_cross:
                inside = not inside
        j = i
    return inside

Gotcha: keep fastmath=False. Fast-math lets the compiler reassociate floating-point operations, which can change the rounding of x_cross and flip the px < x_cross comparison for points that sit on, or within rounding distance of, an edge. That turns a deterministic boundary into a flickering one as GPS jitter nudges the coordinate.
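The flow in this step (box mask, survivor indices, exact kernel) can be sketched end to end as follows. This is an assumption-laden illustration: `contains_batch` is a hypothetical wrapper, and the `ray_cast_inside` defined here is a pure-Python stand-in for the compiled kernel so that the example runs without Numba.

```python
import numpy as np


def ray_cast_inside(px, py, vx, vy):
    # Pure-Python stand-in for the compiled even-odd kernel.
    inside = False
    j = len(vx) - 1
    for i in range(len(vx)):
        if (vy[i] > py) != (vy[j] > py):
            x_cross = (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def contains_batch(points, bounds, vx, vy, kernel=ray_cast_inside):
    """Box-reject first, then run the exact kernel only on survivors."""
    mask = (
        (points[:, 0] >= bounds[0]) & (points[:, 0] <= bounds[2]) &
        (points[:, 1] >= bounds[1]) & (points[:, 1] <= bounds[3])
    )
    out = np.zeros(points.shape[0], dtype=np.bool_)  # rejected points stay False
    for k in np.flatnonzero(mask):                   # indices of box survivors
        out[k] = kernel(points[k, 0], points[k, 1], vx, vy)
    return out
```

Calling a compiled kernel once per point from Python still pays a dispatch cost on every call, so a production version would probably move the survivor loop inside the compiled function; the structure of the flow stays the same.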

3. Choose the kernel by geometry, not by default. Route simple/convex fences to ray casting and reserve the winding number test for self-intersecting municipal boundaries, donut polygons (an exclusion inside a delivery zone), and any perimeter where regulatory correctness is mandated. The even-odd rule treats a hole and an overlap identically; the winding number distinguishes them by sign.

python

def contains(point: tuple[float, float], fence) -> bool:
    if not _in_box(point, fence.bounds):
        return False
    if fence.is_simple:          # precomputed once at load
        return ray_cast_inside(point[0], point[1], fence.vx, fence.vy)
    return winding_number_inside(point[0], point[1], fence.vx, fence.vy)

4. Normalize coordinate precision before evaluation. Quantize incoming GPS to 6 decimal places (~0.11 m) at ingest. Precision beyond that adds floating-point comparison overhead without improving any operational decision.

python

def quantize(points: np.ndarray) -> np.ndarray:
    return np.round(points, 6)

5. Keep a simplified fallback geometry warm. Maintain a pre-computed lower-vertex approximation of each critical fence via Douglas-Peucker simplification. When consumer lag crosses a threshold, route traffic to the reduced polygon, trading a controlled 2–5 m boundary tolerance for throughput. For coarse pre-bucketing of which fences a point could even touch, H3 hexagon indexing narrows the candidate set before any exact kernel runs.
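The fallback geometry could be precomputed with something like the sketch below. It is a plain-Python Douglas-Peucker variant that measures distance to the chord segment rather than the infinite line, written iteratively to avoid recursion limits on large rings; `douglas_peucker` and its helper are hypothetical names. The tolerance is in coordinate units, so for lon/lat input a value near 3e-5 degrees is roughly 3 m of latitude, by the same scale as the 6-decimal figure in step 4.

```python
import math


def _point_segment_distance(p, a, b):
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate chord, e.g. a closed ring's first == last
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Keep the endpoints, then keep any vertex farther than `tolerance` from its chord."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        lo, hi = stack.pop()
        worst, worst_idx = 0.0, -1
        for i in range(lo + 1, hi):
            d = _point_segment_distance(points[i], points[lo], points[hi])
            if d > worst:
                worst, worst_idx = d, i
        if worst_idx != -1 and worst > tolerance:
            keep[worst_idx] = True
            stack.append((lo, worst_idx))
            stack.append((worst_idx, hi))
    return [p for p, kept in zip(points, keep) if kept]
```

Pass the ring closed (first vertex repeated at the end) so the closing edge is simplified too, and re-validate vertex count and area on the result before putting it in the catalog, since aggressive tolerances can collapse a fence into a sliver.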

Benchmark / Verification

Figures are single-core, CPython 3.11, warm cache, timed with perf_counter_ns on pre-allocated arrays, with the box reject applied first: 10^6 random points per cell at a ~10% box-survivor rate. Run your own with the same harness; relative gaps tend to be stable across machines even when absolute numbers are not.

Vertices	Kernel	Throughput	P50	P95	P99
50	Ray cast (Numba)	1.9M/s	0.4µs	0.7µs	1.1µs
50	Winding (Sunday)	1.4M/s	0.6µs	1.0µs	1.6µs
300	Ray cast (Numba)	410k/s	2.1µs	3.4µs	4.9µs
300	Winding (Sunday)	290k/s	3.0µs	4.8µs	6.8µs
300	Winding (atan2)	78k/s	11µs	17µs	24µs

Ray casting holds a roughly 1.4x–1.5x edge on simple fences in the table above (for example, 410k/s vs 290k/s at 300 vertices); the gap is real but small enough that correctness, not speed, should decide the kernel for complex boundaries. The dominant win is upstream of either kernel: adding the vectorized box reject moved a 300-vertex pure-Python pipeline from a measured P99 of 41µs/point to 4.9µs/point, roughly 8x, because the kernel now runs on the ~10% of points that survive the box rather than all of them. Confirm the win with tracemalloc: a tuned pipeline should hold steady-state allocation near zero per evaluation, versus the 40–60 MB/s per worker that naive Point/Polygon instantiation churns.
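The exact harness behind these figures is not shown on this page, so the following is only a sketch of what a comparable per-call measurement might look like. `latency_percentiles` is a hypothetical helper; it uses `time.perf_counter_ns`, a pre-allocated sample list, and nearest-rank percentiles.

```python
import math
import time


def latency_percentiles(kernel, points, vx, vy, warmup=100):
    """Time each kernel call and report P50/P95/P99 in nanoseconds."""
    if not points:
        raise ValueError("need at least one point to measure")
    for px, py in points[:warmup]:       # warm caches (and the JIT, if any)
        kernel(px, py, vx, vy)
    samples = [0] * len(points)          # pre-allocated, no growth while timing
    clock = time.perf_counter_ns
    for idx, (px, py) in enumerate(points):
        start = clock()
        kernel(px, py, vx, vy)
        samples[idx] = clock() - start
    samples.sort()

    def pick(q):
        # Nearest-rank percentile over the sorted samples.
        rank = max(1, math.ceil(q * len(samples)))
        return samples[rank - 1]

    return {"p50": pick(0.50), "p95": pick(0.95), "p99": pick(0.99)}
```

Timing individual calls includes the clock's own overhead, which is not negligible next to a sub-microsecond kernel. Timing fixed-size batches and dividing by the batch size is a reasonable alternative when only the relative gap between kernels matters.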

Failure Modes & Edge Cases

NaN / null coordinates. A dropped GPS fix arrives as NaN, and every ordered comparison against NaN (<, <=, >, >=) returns False, so the box test silently rejects it and the point vanishes from the stream. Filter explicitly with np.isnan(points).any(axis=1) and route bad fixes to a dead-letter path rather than letting them disappear.
On-edge and vertex-grazing points. Jitter parks a coordinate exactly on an edge or vertex. Ray casting’s result there depends on the strict-vs-non-strict comparison; pick one convention (< vs <=) and apply it consistently across the whole catalog, or two adjacent fences will both claim or both disown the point.
Self-intersecting and donut polygons. Ray casting returns wrong containment for self-overlapping multipolygons and for holes. Flag these at load time (fence.is_simple) and force them onto the winding number path; do not let a bad-geometry import quietly degrade accuracy.
Empty or degenerate fences. A polygon with fewer than three vertices, or a zero-area sliver from an over-aggressive simplification, makes the edge loop produce nonsense. Validate vertex count and area at load and reject the geometry rather than evaluating it.
GC pauses landing mid-evaluation. Per-evaluation object creation pushes gen-2 collections into the hot path, and a flame graph attributes the pause to whatever kernel frame was executing — making “the kernel is slow” the wrong conclusion roughly half the time. Pool buffers, keep arrays pre-allocated, and confirm with gc.get_stats() before tuning the kernel itself.
GIL serialization under fan-out. A pure-Python kernel serializes every worker on the GIL, so adding threads buys nothing. The nogil=True Numba kernel (or a GEOS call through Shapely) is what lets a thread pool actually parallelize the edge scan; verify with py-spy that wall time under the kernel frame drops as you add threads. The async offload boundary that makes this safe is covered in benchmarking spatial containment in async Python.
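Two of the guards above, the NaN filter and the load-time rejection of degenerate fences, are small enough to sketch directly. Both function names are hypothetical, and the `min_area` threshold is an assumption to be tuned per deployment; the area is in squared coordinate units (square degrees for lon/lat input).

```python
import numpy as np


def split_bad_fixes(points):
    """Return (clean, dead_letter) so NaN fixes are routed, not silently dropped."""
    bad = np.isnan(points).any(axis=1)
    return points[~bad], points[bad]


def validate_ring(vx, vy, min_area=0.0):
    """Reject fences with fewer than three vertices or (near-)zero area."""
    if len(vx) != len(vy) or len(vx) < 3:
        raise ValueError("fence needs at least three vertices")
    # Shoelace formula over the closed ring.
    twice_area = np.dot(vx, np.roll(vy, -1)) - np.dot(vy, np.roll(vx, -1))
    area = 0.5 * abs(float(twice_area))
    if not area > min_area:  # written this way so a NaN area is rejected too
        raise ValueError("fence area is zero or below the configured minimum")
    return area
```

Running `validate_ring` on both the original and the simplified fallback geometry catches slivers introduced by simplification before they reach the edge loop.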

Related

Point-in-Polygon Algorithm Benchmarks: parent comparison of PIP kernels across vertex counts and concurrency
Reducing P99 Latency in Python Geofence Services: the tail-reduction playbook this kernel tuning feeds into
Handling Polygon Edge Cases in High-Frequency Telemetry: degenerate-geometry handling under memory pressure
Core Architecture & Latency Constraints: the per-trigger budget every kernel runs inside
