Geohash, Quadtree, R-tree, S2, and H3 — A Comparison
TL;DR
Five spatial indexing methods, each with different strengths. Geohash: simple, string-based, prefix matching, but has edge problems. Quadtree: adaptive to density, good for sparse/dense mixtures. R-tree: best for range queries and arbitrary shapes (PostGIS). S2: Hilbert curve on a sphere, used by Google. H3: hexagonal grid, uniform adjacency, used by Uber. For interviews, know when to pick each one. For "design a proximity service," Geohash is the safe default. For "design a ride-sharing system," H3 or S2 are better answers.
The Problem
You have 10 million restaurants. A user opens an app and asks "show me restaurants within 2 km." You need to answer this in under 100ms.
Naive approach: compute the distance from the user to every restaurant. 10 million distance calculations. Even at 10 nanoseconds each, that is 100ms. Barely acceptable, and it gets worse with 100 million points.
You need a spatial index: a data structure that partitions 2D (or spherical) space so that a proximity query only examines nearby points. Every spatial indexing method is a way to convert a 2D problem into something that existing data structures (B-trees, hash tables, sorted arrays) can handle.
The Algorithms
1. Geohash
Geohash encodes a latitude/longitude pair as a string by interleaving the bits of the latitude and longitude binary representations, then base-32 encoding the result.
Location: (37.7749, -122.4194) [San Francisco]
Geohash encoding:
1. Binary divide latitude range [-90, 90]:
37.7749 → 10110...
2. Binary divide longitude range [-180, 180]:
-122.4194 → 01000...
3. Interleave bits: longitude, latitude, longitude, latitude...
→ 0110 1001 0100...
4. Base-32 encode groups of 5 bits:
→ "9q8yy"
Geohash: "9q8yy" (5 characters, ~4.9 km x 4.9 km cell)
Precision levels:
| Characters | Cell Size (lat x lon) | Use Case |
|---|---|---|
| 1 | 5,000 km x 5,000 km | Continent |
| 3 | 156 km x 156 km | Large city |
| 5 | 4.9 km x 4.9 km | Neighborhood |
| 6 | 1.2 km x 610 m | Street block |
| 7 | 153 m x 153 m | Building group |
| 8 | 38 m x 19 m | Single building |
Key property: Geohashes are hierarchical. A geohash is a prefix of all geohashes within it. All locations in "9q8yy" start with "9q8y" which starts with "9q8". This means a SQL query like WHERE geohash LIKE '9q8y%' finds all points in that cell using a standard B-tree index.
The edge problem: Two points can be very close geographically but have completely different geohash prefixes if they straddle a cell boundary. The point at the eastern edge of cell "9q8yy" and the point at the western edge of cell "9q8yz" are neighbors on the map but share no prefix.
Solution: For any proximity query, search the target cell AND its 8 neighboring cells. This eliminates false negatives at cell boundaries.
2. Quadtree
A quadtree recursively divides 2D space into four quadrants. Each internal node has exactly four children (NW, NE, SW, SE). Subdivision continues until each leaf contains fewer than a threshold number of points (or reaches a maximum depth).
Root: entire map
├── NW: sparse area → leaf (3 points)
├── NE: sparse area → leaf (5 points)
├── SW: dense city → subdivide further
│ ├── NW: 2 points
│ ├── NE: subdivide further...
│ ├── SW: 4 points
│ └── SE: 1 point
└── SE: moderate → subdivide once
├── NW: 6 points
├── NE: 3 points
├── SW: 2 points
└── SE: 7 points
Adaptive density: A quadtree automatically creates smaller cells in dense areas (Manhattan) and larger cells in sparse areas (rural Montana). This is its primary advantage over fixed-grid approaches like Geohash.
Proximity query: Start at the root. Descend to the leaf containing the query point. Expand to neighboring leaves until the search radius is covered. No need to check distant quadrants.
Limitation: Quadtrees are in-memory structures. They do not map naturally to disk-based B-tree indices. If you need the index in a relational database, Geohash or R-tree is a better fit.
3. R-tree
An R-tree indexes spatial objects using minimum bounding rectangles (MBRs). Each node contains a bounding rectangle that encloses all its children. Leaf nodes contain bounding rectangles of actual spatial objects.
Root: MBR covering entire dataset
├── Internal node: MBR covering western half
│ ├── Leaf: MBR around cluster of restaurants
│ └── Leaf: MBR around cluster of parks
└── Internal node: MBR covering eastern half
├── Leaf: MBR around cluster of buildings
└── Leaf: MBR around a lake polygon
Key advantage: R-trees can index arbitrary shapes -- polygons, lines, multipoints -- not just points. This makes them the standard choice for GIS (Geographic Information Systems).
PostGIS (the spatial extension for PostgreSQL) uses R-trees (specifically GiST indices) as its primary spatial index. The query ST_DWithin(geom, point, 1000) (find all geometries within 1000 meters) uses an R-tree to prune the search space before computing exact distances.
Range queries: R-trees excel at "find all objects intersecting this rectangle." The query descends the tree, pruning branches whose MBR does not intersect the query rectangle.
Limitation: R-tree insertion can cause MBR overlap between sibling nodes, leading to multiple branches being explored during queries. R*-tree variants minimize overlap with reinsertion strategies, but the worst case is still slower than point-based indices.
4. S2 Geometry (Google)
S2 projects the sphere onto six faces of a cube, then uses a Hilbert curve to map each face to a 1D space. This converts 2D spherical coordinates into a single 64-bit integer (S2 Cell ID) that preserves spatial locality.
Latitude/longitude
→ cube face projection (one of six faces)
→ Hilbert curve mapping (2D → 1D)
→ 64-bit S2 Cell ID
Cell levels: 0 (face) to 30 (~1 cm^2)
Level 12: ~3.3 km^2 (city neighborhood)
Level 15: ~0.05 km^2 (block)
Level 20: ~0.5 m^2 (sub-meter precision)
Region covering: Given an arbitrary region (circle, polygon), S2 computes a set of S2 cells that cover the region. These cells can be at different levels (adaptive precision). The covering is then used as a set of range queries on the S2 Cell ID column.
"Find restaurants within 2 km of user":
1. Compute S2 covering of the 2 km circle → e.g., 8 cells at level 13.
2. Query: WHERE s2_cell_id BETWEEN cell1_min AND cell1_max
OR s2_cell_id BETWEEN cell2_min AND cell2_max
... (8 range queries on a B-tree index)
3. Filter results by exact distance.
Key advantage: Works on a sphere (no projection distortion). The Hilbert curve preserves spatial locality better than the Z-curve used by Geohash. Nearby points on the sphere tend to have nearby S2 Cell IDs.
Used by: Google Maps, Google Earth, Foursquare, MongoDB (2dsphere index uses S2 internally).
5. H3 (Uber)
H3 is a hexagonal hierarchical spatial index. The world is divided into hexagonal cells at 16 resolution levels.
Why hexagons?
- Every hexagon has exactly 6 equidistant neighbors. Squares have 8 neighbors at two different distances (4 edge-adjacent, 4 corner-adjacent). This non-uniform adjacency causes problems in proximity algorithms.
- Hexagons approximate circles better than squares. The distance from center to any edge is constant. For squares, the corner distance is sqrt(2) times the edge distance.
- Hexagonal grids are the most efficient way to cover a plane with equal-area cells and uniform adjacency.
Resolution levels:
0: 4.3 million km^2 (continental)
5: 252 km^2 (metro area)
9: 0.1 km^2 (city block)
12: 307 m^2 (building)
15: 0.9 m^2 (sub-meter)
K-ring query: To find all cells within distance k, H3 provides kRing(cell, k). This returns a set of hexagonal cells expanding outward. The uniform adjacency makes this clean: kRing(cell, 1) returns 7 cells (center + 6 neighbors). kRing(cell, 2) returns 19 cells.
Used by: Uber (surge pricing, ETA computation, driver dispatch), Lyft, Foursquare, Starbucks (store placement analysis).
Comparison Table
| Aspect | Geohash | Quadtree | R-tree | S2 | H3 |
|---|---|---|---|---|---|
| Ease of implementation | Very easy | Easy | Hard | Medium | Easy (library) |
| Precision control | Character count | Tree depth | MBR size | Cell level | Resolution |
| Edge handling | Poor (boundary cells) | Good | Good | Good (Hilbert) | Good (hex) |
| Density adaptation | No (fixed grid) | Yes | Yes | Yes (covering) | No (fixed grid) |
| Range query efficiency | Moderate | Good | Excellent | Good | Good |
| Arbitrary shapes | No (points only) | Points | Yes | Region covering | Points |
| Spherical correctness | No (distorts at poles) | No | Depends on projection | Yes | Yes |
| Database integration | B-tree index | In-memory | GiST (PostGIS) | B-tree index | B-tree/custom |
| Space curve | Z-curve | N/A | N/A | Hilbert curve | N/A |
Proof/Correctness Intuition
Why Space-Filling Curves Work
A space-filling curve maps 2D space to 1D such that nearby 2D points tend to map to nearby 1D values. This converts a 2D proximity query ("find all points within radius r") into a set of 1D range queries ("find all cell IDs between X and Y"), which standard B-tree indices handle efficiently.
The Hilbert curve (S2) preserves locality better than the Z-curve (Geohash). The Z-curve has "jumps" where spatially adjacent regions map to distant 1D positions, especially at cell boundaries. The Hilbert curve minimizes these jumps, resulting in fewer and tighter range queries.
No space-filling curve perfectly preserves all 2D distances. There will always be some nearby 2D points that map to distant 1D values. This is why all spatial index queries require a post-filtering step: compute exact distances and discard false positives.
Real-World Usage
| System | Index Used | Why |
|---|---|---|
| PostGIS | R-tree (GiST) | Best for arbitrary geometries + SQL integration |
| Google Maps | S2 | Spherical, hierarchical, Hilbert locality |
| MongoDB | S2 (2dsphere) | Spherical queries via S2 covering |
| Uber | H3 | Uniform adjacency for pricing + dispatch |
| Elasticsearch | Geohash | Simple, works with inverted indices |
| DynamoDB | Geohash | Simple, string prefix matching |
| Redis | Geohash | GEOADD/GEOSEARCH commands |
Interview Application
Decision guide for interviews:
"Design a nearby friends feature":
→ Geohash. Simple, works with any database. Search cell + 8 neighbors.
"Design a ride-sharing service (Uber/Lyft)":
→ H3. Uniform adjacency for surge pricing zones and driver dispatch.
"Design a mapping/navigation service":
→ S2. Spherical correctness, hierarchical precision, Google-proven.
"Design a GIS system for land parcels":
→ R-tree (PostGIS). Arbitrary polygon support, range queries on shapes.
"Design a real-time location tracking system":
→ Quadtree. Adaptive density handles urban/rural differences.
What interviewers want to hear:
- You know that spatial queries cannot use standard 1D indices directly.
- You can explain at least two approaches (Geohash is the minimum).
- You understand the edge/boundary problem with grid-based approaches.
- You can justify your choice for the specific problem.
Trade-offs

| Approach | Best For | Worst For |
|---|---|---|
| Geohash | Simple proximity, any DB backend | Polar regions, fine boundary cases |
| Quadtree | Variable-density point data | Disk-based storage, persistence |
| R-tree | Polygon data, range queries | High-throughput point updates |
| S2 | Global-scale, spherical queries | Simplicity (learning curve) |
| H3 | Uniform adjacency, pricing zones | Arbitrary shapes, non-point data |
Common Mistakes
Geohash handles all proximity cases
Geohash has a well-known boundary problem. Two points 1 meter apart can have completely different geohash prefixes if they straddle a cell boundary. Always search neighboring cells.
Higher Geohash precision is always better
Higher precision means smaller cells, which means searching more cells for a given radius. A 5 km radius search with precision-8 geohashes (38m cells) requires checking thousands of cells. Match the precision to your query radius.
Quadtrees work on disk
Quadtrees are pointer-based structures optimized for in-memory use. Serializing them to disk requires careful design. For disk-based spatial indexing, use R-trees (PostGIS) or space-filling curves (Geohash, S2) with B-tree indices.
R-trees are the best for everything spatial
R-trees are the best for range queries on arbitrary shapes in a database. For simple point proximity at massive scale (billions of points, real-time), Geohash or S2 with a key-value store is more scalable.
Just use latitude/longitude columns with a compound index
A compound index on (lat, lon) creates a B-tree that sorts by latitude first. A range query WHERE lat BETWEEN a AND b AND lon BETWEEN c AND d can use the index for latitude but must scan all matching latitudes to filter longitude. This is an order of magnitude slower than a spatial index for large datasets.
H3 and Geohash are interchangeable
They solve different problems. Geohash cells are rectangles with non-uniform adjacency (edge vs corner neighbors). H3 cells are hexagons with uniform adjacency. For algorithms that depend on "what are the neighboring cells" (surge pricing, heat maps), H3 is meaningfully better. For simple "is this point near this point" queries in a relational database, Geohash is simpler.