Google Maps System Design –> how to design a navigation application similar to Google Maps.
- Functional Requirements:
- Provide routes from point A to point B.
- Estimate distance and time for a given route.
- Offer multiple route options (e.g., minimizing distance or time).
- Have a pluggable model to easily add new information like traffic, weather, accidents, construction, or road blockages without changing the core architecture.
- Identify new roads using government data as a starting point and capturing organic user data (e.g., number of users, traffic volume, speed) to determine road existence and characteristics like width and number of lanes.
- Non-Functional Requirements (NFRs):
- High Availability: The system should always be available and not go down.
- Good Accuracy: While not necessarily the “best” route, it should provide a “good enough” route.
- Acceptable Speed: The system should respond within a few seconds (e.g., 1-3 seconds) for calculations, not milliseconds, but also not excessively slow (e.g., not 15 seconds).
- Scalability: The system needs to handle a massive number of users and requests, with roughly a billion monthly active users (MAU) generating 5-10 billion requests per month for route information, plus usage by companies like Uber.
- Challenges in Building a Navigation App:
- Massive Data Set: The sheer number of roads globally (estimated 50-100 million) makes modeling the network as a graph extremely complex, potentially involving 50 million vertices and hundreds of millions of edges. Accessing comprehensive road data is also a challenge as there’s no single source.
- Quantifying Attributes: It’s difficult to mathematically quantify real-world factors like traffic, weather, and road quality that impact Estimated Time of Arrival (ETA).
- Unpredictability: Events like random accidents or road closures are unpredictable, making real-time ETA calculation complex.
- Core Concepts and Modeling:
- Segments: The system uses a “Dynamic Programming” ideology by dividing the globe into small, manageable areas called “Segments” (e.g., 1×1 kilometer squares). Each segment has identified boundaries and an ID. Points can be easily mapped to segments based on their coordinates, and approximate aerial distances between segments can be calculated.
- Road Network as a Graph: Roads are modeled as a graph where junctions (points) are vertices and the roads connecting them are edges.
- Multiple Weights on Edges: Unlike typical graphs, each road can have multiple weights, such as length (e.g., 2 kilometers), average time to cover (e.g., 150 seconds), and traffic-based weights.
- Directed Relationships: Roads are treated as directed relationships to account for one-way roads or different travel times/traffic patterns in opposite directions.
- Efficient Route Finding:
- Within a Segment: Standard graph algorithms like Dijkstra or Bellman-Ford can find the shortest path between two junctions. To optimize, Floyd-Warshall Algorithm can pre-calculate and cache all-pairs shortest paths within a segment, storing them as “calculated edges”.
- Handling Non-Junction Points: If a start/end point is not a junction, it’s mapped to its two closest junctions on the road, and distances from these junctions are incorporated into the path calculation.
- Across Multiple Segments:
- Exits: Each segment calculates and caches the shortest routes from its junctions to its “exit points” (points where roads leave the segment).
- Limited Dijkstra Scope: For routes spanning multiple segments, instead of running Dijkstra across the entire globe, the system identifies a bounding box of relevant segments based on aerial distance (e.g., 20 segments in each direction for a 10km aerial distance).
- Segment-Level Graph: A Dijkstra algorithm is run on a smaller graph composed primarily of segment entry and exit points, using the pre-calculated distances between these exits as weights.
- Mega-Segments: For inter-city or inter-country travel, the concept is extended by grouping segments into “Mega-Segments.” This creates another layer of abstraction, allowing recursive solution for pathfinding across these larger units, typically with 3 levels of nesting.
- Handling Weights, Traffic, and ETA Impact:
- Base Weights: Distance and normal ETA (under ideal conditions) are fundamental weights. Average speed can be derived (distance/time).
- Traffic & Weather as Attributes, Not Weights: Traffic, weather, and roadblocks are not directly added as weights to graph edges. Instead, they are attributes that impact the average speed, which in turn influences the ETA. This ensures a pluggable design where new attributes don’t require changing core Dijkstra logic.
- Real-Time User Data for ETA: Google Maps leverages organic data from active users. Where many users are present, the system uses their actual travel times to derive a statistically normal distribution of ETAs, minimizing reliance on explicit traffic/weather data.
- Tiers for Quantification: In areas without sufficient user data, traffic and weather are quantified using “tiers” (e.g., low, medium, high traffic; good, bad weather). These tiers are used to relatively adjust the average speed (e.g., medium traffic reduces speed by 20%).
- Bubbling Up Changes: When a real road’s weight (ETA) changes due to new data (e.g., traffic), the system identifies all “calculated edges” (cached paths) within that segment that use the affected road and updates their ETAs. This change then recursively “bubbles up” to the Mega-Segment level if those mega-segment paths also include the affected routes.
- Recalculation Thresholds: If an ETA for a path within a segment increases significantly (e.g., by more than a configurable 30%), it triggers a recalculation of all paths within that segment to find if a new shortest path has emerged.
- Historical ETAs: Historical data based on the day of the week and hour of the day (e.g., Monday 5 PM vs. Sunday 5 PM) is cached and used to predict ETAs, especially when real-time traffic information is unavailable.
- System Architecture:
- Part 1: User Location Data Capture and Service Improvement:
- User Devices: Send regular location pings (more frequent when in transit, less when stationary).
- WebSocket Handler & WebSocket Manager: Maintain persistent WebSocket connections with devices and track which handler serves which user (using Redis).
- Location Service: Ingests user location data (User ID, Timestamp, Lat/Long) and stores it permanently in Cassandra. It also streams these pings to Kafka.
- Spark Streaming Consumers: Read from Kafka to perform various real-time calculations:
- New Road Job: Identifies new roads based on user movement patterns and writes updates to Kafka.
- Average Speed Job: Calculates average speeds for road segments from user data, updating graph weights, and writes to Kafka.
- Hotspot Identifier: Detects areas with unusual gatherings of people.
- Road Classifier: Uses ML on Hadoop data to classify new roads (one-way, multi-lane).
- Vehicle Identifier: Infers vehicle type (two-wheeler, four-wheeler, public transport) from movement patterns.
- Map Update Service & Traffic Update Service: Listen to Kafka topics from Spark Streaming jobs and update road information and traffic data, respectively, in Cassandra via the Graph Processing Service.
- Hadoop Cluster: Stores historical location pings for batch ML jobs.
- Part 2: User Navigation Flow:
- Area Search Service: Converts location searches (fuzzy search in Elastic Search) and addresses into Lat/Long coordinates. It also puts search events into Kafka for analytics.
- Navigation Tracking Service: Tracks user location during active navigation, detects route deviations, and pushes data to Kafka for later analysis of route recommendation quality.
- Map Service: An interface service for user devices and external companies (handling rate-limiting). It forwards requests to the Graph Processing Service.
- Graph Processing Service: The core intelligence for pathfinding:
- Queries Segment Service (on Cassandra) to get segment information for start/end points.
- If points are in the same segment, it checks cache or runs Dijkstra.
- For different segments, it builds a subgraph using relevant segments, their entry/exit points, and road data from its own Cassandra database.
- Incorporates live traffic information (updated by Third-Party Data Manager).
- Queries Historical Data Service (on Cassandra) for predicted ETAs if live data is unavailable.
- Performs path calculations (Dijkstra or cached lookup) and returns the result to Map Service.
- Part 1: User Location Data Capture and Service Improvement:
- Analytics:
- Third-Party Data Validation: Analyze data from Kafka to determine the accuracy of information received from third-party data providers by comparing it with internal user data.
- Road Identification & Mapping: Infer current road names and upcoming junctions for turn-by-turn navigation.
- ETA Accuracy: Monitor how accurate the predicted ETAs are compared to actual travel times to improve algorithms.
- Route Recommendation Analysis: Identify routes recommended by the system that users frequently do not take, indicating potential issues with the routes in the database.
- Points of Interest/Hotspots: Identify common gathering places in the city to build points of interest.
- User Profiling: Infer user behavior and preferences (e.g., social, loves travel, home/work locations) based on location data.
- Handling Disputed Areas:
- Google Maps uses a “smart and tricky” approach to handle politically disputed territories (e.g., Jammu and Kashmir between India, Pakistan, and China).
- The map boundaries displayed vary based on the user’s country of origin. For example, a user from India will see the whole area as belonging to India, while a user from Pakistan will see the parts they claim as theirs, and disputed areas shown with dotted lines. This helps avoid upsetting large user bases.
Leave a Reply