Skip to content

Geohash, Quadtree, R-tree, S2, and H3 — A Comparison

TL;DR

Five spatial indexing methods, each with different strengths. Geohash: simple, string-based, prefix matching, but has edge problems. Quadtree: adaptive to density, good for sparse/dense mixtures. R-tree: best for range queries and arbitrary shapes (PostGIS). S2: Hilbert curve on a sphere, used by Google. H3: hexagonal grid, uniform adjacency, used by Uber. For interviews, know when to pick each one. For "design a proximity service," Geohash is the safe default. For "design a ride-sharing system," H3 or S2 are better answers.


The Problem

You have 10 million restaurants. A user opens an app and asks "show me restaurants within 2 km." You need to answer this in under 100ms.

Naive approach: compute the distance from the user to every restaurant. 10 million distance calculations. Even at 10 nanoseconds each, that is 100ms. Barely acceptable, and it gets worse with 100 million points.

You need a spatial index: a data structure that partitions 2D (or spherical) space so that a proximity query only examines nearby points. Every spatial indexing method is a way to convert a 2D problem into something that existing data structures (B-trees, hash tables, sorted arrays) can handle.


The Algorithms

1. Geohash

Geohash encodes a latitude/longitude pair as a string by interleaving the bits of the latitude and longitude binary representations, then base-32 encoding the result.

Location: (37.7749, -122.4194)  [San Francisco]

Geohash encoding:
1. Binary divide latitude range [-90, 90]:
   37.7749 → 10110...
2. Binary divide longitude range [-180, 180]:
   -122.4194 → 01000...
3. Interleave bits: longitude, latitude, longitude, latitude...
   → 0110 1001 0100...
4. Base-32 encode groups of 5 bits:
   → "9q8yy"

Geohash: "9q8yy" (5 characters, ~4.9 km x 4.9 km cell)

Precision levels:

Characters Cell Size (lat x lon) Use Case
1 5,000 km x 5,000 km Continent
3 156 km x 156 km Large city
5 4.9 km x 4.9 km Neighborhood
6 1.2 km x 610 m Street block
7 153 m x 153 m Building group
8 38 m x 19 m Single building

Key property: Geohashes are hierarchical. A geohash is a prefix of all geohashes within it. All locations in "9q8yy" start with "9q8y" which starts with "9q8". This means a SQL query like WHERE geohash LIKE '9q8y%' finds all points in that cell using a standard B-tree index.

The edge problem: Two points can be very close geographically but have completely different geohash prefixes if they straddle a cell boundary. The point at the eastern edge of cell "9q8yy" and the point at the western edge of cell "9q8yz" are neighbors on the map but share no prefix.

Solution: For any proximity query, search the target cell AND its 8 neighboring cells. This eliminates false negatives at cell boundaries.

2. Quadtree

A quadtree recursively divides 2D space into four quadrants. Each internal node has exactly four children (NW, NE, SW, SE). Subdivision continues until each leaf contains fewer than a threshold number of points (or reaches a maximum depth).

Root: entire map
├── NW: sparse area → leaf (3 points)
├── NE: sparse area → leaf (5 points)
├── SW: dense city → subdivide further
│   ├── NW: 2 points
│   ├── NE: subdivide further...
│   ├── SW: 4 points
│   └── SE: 1 point
└── SE: moderate → subdivide once
    ├── NW: 6 points
    ├── NE: 3 points
    ├── SW: 2 points
    └── SE: 7 points

Adaptive density: A quadtree automatically creates smaller cells in dense areas (Manhattan) and larger cells in sparse areas (rural Montana). This is its primary advantage over fixed-grid approaches like Geohash.

Proximity query: Start at the root. Descend to the leaf containing the query point. Expand to neighboring leaves until the search radius is covered. No need to check distant quadrants.

Limitation: Quadtrees are in-memory structures. They do not map naturally to disk-based B-tree indices. If you need the index in a relational database, Geohash or R-tree is a better fit.

3. R-tree

An R-tree indexes spatial objects using minimum bounding rectangles (MBRs). Each node contains a bounding rectangle that encloses all its children. Leaf nodes contain bounding rectangles of actual spatial objects.

Root: MBR covering entire dataset
├── Internal node: MBR covering western half
│   ├── Leaf: MBR around cluster of restaurants
│   └── Leaf: MBR around cluster of parks
└── Internal node: MBR covering eastern half
    ├── Leaf: MBR around cluster of buildings
    └── Leaf: MBR around a lake polygon

Key advantage: R-trees can index arbitrary shapes -- polygons, lines, multipoints -- not just points. This makes them the standard choice for GIS (Geographic Information Systems).

PostGIS (the spatial extension for PostgreSQL) uses R-trees (specifically GiST indices) as its primary spatial index. The query ST_DWithin(geom, point, 1000) (find all geometries within 1000 meters) uses an R-tree to prune the search space before computing exact distances.

Range queries: R-trees excel at "find all objects intersecting this rectangle." The query descends the tree, pruning branches whose MBR does not intersect the query rectangle.

Limitation: R-tree insertion can cause MBR overlap between sibling nodes, leading to multiple branches being explored during queries. R*-tree variants minimize overlap with reinsertion strategies, but the worst case is still slower than point-based indices.

4. S2 Geometry (Google)

S2 projects the sphere onto six faces of a cube, then uses a Hilbert curve to map each face to a 1D space. This converts 2D spherical coordinates into a single 64-bit integer (S2 Cell ID) that preserves spatial locality.

Latitude/longitude
  → cube face projection (one of six faces)
  → Hilbert curve mapping (2D → 1D)
  → 64-bit S2 Cell ID

Cell levels: 0 (face) to 30 (~1 cm^2)
Level 12: ~3.3 km^2 (city neighborhood)
Level 15: ~0.05 km^2 (block)
Level 20: ~0.5 m^2 (sub-meter precision)

Region covering: Given an arbitrary region (circle, polygon), S2 computes a set of S2 cells that cover the region. These cells can be at different levels (adaptive precision). The covering is then used as a set of range queries on the S2 Cell ID column.

"Find restaurants within 2 km of user":
1. Compute S2 covering of the 2 km circle → e.g., 8 cells at level 13.
2. Query: WHERE s2_cell_id BETWEEN cell1_min AND cell1_max
          OR s2_cell_id BETWEEN cell2_min AND cell2_max
          ... (8 range queries on a B-tree index)
3. Filter results by exact distance.

Key advantage: Works on a sphere (no projection distortion). The Hilbert curve preserves spatial locality better than the Z-curve used by Geohash. Nearby points on the sphere tend to have nearby S2 Cell IDs.

Used by: Google Maps, Google Earth, Foursquare, MongoDB (2dsphere index uses S2 internally).

5. H3 (Uber)

H3 is a hexagonal hierarchical spatial index. The world is divided into hexagonal cells at 16 resolution levels.

Why hexagons?

  • Every hexagon has exactly 6 equidistant neighbors. Squares have 8 neighbors at two different distances (4 edge-adjacent, 4 corner-adjacent). This non-uniform adjacency causes problems in proximity algorithms.
  • Hexagons approximate circles better than squares. The distance from center to any edge is constant. For squares, the corner distance is sqrt(2) times the edge distance.
  • Hexagonal grids are the most efficient way to cover a plane with equal-area cells and uniform adjacency.
Resolution levels:
0: 4.3 million km^2 (continental)
5: 252 km^2 (metro area)
9: 0.1 km^2 (city block)
12: 307 m^2 (building)
15: 0.9 m^2 (sub-meter)

K-ring query: To find all cells within distance k, H3 provides kRing(cell, k). This returns a set of hexagonal cells expanding outward. The uniform adjacency makes this clean: kRing(cell, 1) returns 7 cells (center + 6 neighbors). kRing(cell, 2) returns 19 cells.

Used by: Uber (surge pricing, ETA computation, driver dispatch), Lyft, Foursquare, Starbucks (store placement analysis).


Comparison Table

Aspect Geohash Quadtree R-tree S2 H3
Ease of implementation Very easy Easy Hard Medium Easy (library)
Precision control Character count Tree depth MBR size Cell level Resolution
Edge handling Poor (boundary cells) Good Good Good (Hilbert) Good (hex)
Density adaptation No (fixed grid) Yes Yes Yes (covering) No (fixed grid)
Range query efficiency Moderate Good Excellent Good Good
Arbitrary shapes No (points only) Points Yes Region covering Points
Spherical correctness No (distorts at poles) No Depends on projection Yes Yes
Database integration B-tree index In-memory GiST (PostGIS) B-tree index B-tree/custom
Space curve Z-curve N/A N/A Hilbert curve N/A

Proof/Correctness Intuition

Why Space-Filling Curves Work

A space-filling curve maps 2D space to 1D such that nearby 2D points tend to map to nearby 1D values. This converts a 2D proximity query ("find all points within radius r") into a set of 1D range queries ("find all cell IDs between X and Y"), which standard B-tree indices handle efficiently.

The Hilbert curve (S2) preserves locality better than the Z-curve (Geohash). The Z-curve has "jumps" where spatially adjacent regions map to distant 1D positions, especially at cell boundaries. The Hilbert curve minimizes these jumps, resulting in fewer and tighter range queries.

No space-filling curve perfectly preserves all 2D distances. There will always be some nearby 2D points that map to distant 1D values. This is why all spatial index queries require a post-filtering step: compute exact distances and discard false positives.


Real-World Usage

System Index Used Why
PostGIS R-tree (GiST) Best for arbitrary geometries + SQL integration
Google Maps S2 Spherical, hierarchical, Hilbert locality
MongoDB S2 (2dsphere) Spherical queries via S2 covering
Uber H3 Uniform adjacency for pricing + dispatch
Elasticsearch Geohash Simple, works with inverted indices
DynamoDB Geohash Simple, string prefix matching
Redis Geohash GEOADD/GEOSEARCH commands

Interview Application

Decision guide for interviews:

"Design a nearby friends feature":
  → Geohash. Simple, works with any database. Search cell + 8 neighbors.

"Design a ride-sharing service (Uber/Lyft)":
  → H3. Uniform adjacency for surge pricing zones and driver dispatch.

"Design a mapping/navigation service":
  → S2. Spherical correctness, hierarchical precision, Google-proven.

"Design a GIS system for land parcels":
  → R-tree (PostGIS). Arbitrary polygon support, range queries on shapes.

"Design a real-time location tracking system":
  → Quadtree. Adaptive density handles urban/rural differences.

What interviewers want to hear:

  1. You know that spatial queries cannot use standard 1D indices directly.
  2. You can explain at least two approaches (Geohash is the minimum).
  3. You understand the edge/boundary problem with grid-based approaches.
  4. You can justify your choice for the specific problem.

Trade-offs

Geohash Z Curve

Approach Best For Worst For
Geohash Simple proximity, any DB backend Polar regions, fine boundary cases
Quadtree Variable-density point data Disk-based storage, persistence
R-tree Polygon data, range queries High-throughput point updates
S2 Global-scale, spherical queries Simplicity (learning curve)
H3 Uniform adjacency, pricing zones Arbitrary shapes, non-point data

Common Mistakes

Geohash handles all proximity cases

Geohash has a well-known boundary problem. Two points 1 meter apart can have completely different geohash prefixes if they straddle a cell boundary. Always search neighboring cells.

Higher Geohash precision is always better

Higher precision means smaller cells, which means searching more cells for a given radius. A 5 km radius search with precision-8 geohashes (38m cells) requires checking thousands of cells. Match the precision to your query radius.

Quadtrees work on disk

Quadtrees are pointer-based structures optimized for in-memory use. Serializing them to disk requires careful design. For disk-based spatial indexing, use R-trees (PostGIS) or space-filling curves (Geohash, S2) with B-tree indices.

R-trees are the best for everything spatial

R-trees are the best for range queries on arbitrary shapes in a database. For simple point proximity at massive scale (billions of points, real-time), Geohash or S2 with a key-value store is more scalable.

Just use latitude/longitude columns with a compound index

A compound index on (lat, lon) creates a B-tree that sorts by latitude first. A range query WHERE lat BETWEEN a AND b AND lon BETWEEN c AND d can use the index for latitude but must scan all matching latitudes to filter longitude. This is an order of magnitude slower than a spatial index for large datasets.

H3 and Geohash are interchangeable

They solve different problems. Geohash cells are rectangles with non-uniform adjacency (edge vs corner neighbors). H3 cells are hexagons with uniform adjacency. For algorithms that depend on "what are the neighboring cells" (surge pricing, heat maps), H3 is meaningfully better. For simple "is this point near this point" queries in a relational database, Geohash is simpler.