System Design Interview Guide
Posted in System Design on April 16, 2023 by Admin ‐ 19 min read
The topic of system design is vast, and interviews on this topic aim to assess your ability to develop technical solutions for abstract problems. These interviews are not meant to have a specific answer, and the interactive nature of the discussion between the interviewer and the candidate is a unique aspect of system-design interviews.
Moreover, expectations differ across engineering levels: candidates with practical experience approach these open-ended conversations differently from those who are new to the industry. It can therefore be challenging to devise a single strategy to stay organized during the interview, but you can follow a blueprint for approaching system design questions.
Let’s explore these:
1. System Design Requirements Clarification
In system design interviews, the questions asked are often vague or abstract. Therefore, it is crucial to inquire about the problem’s specific scope and clarify functional requirements at the beginning of the interview.
Given the open-ended nature of system design questions, it is crucial to ask questions and resolve any ambiguities early in the interview process. Candidates who invest time in comprehending the system’s end goals are more likely to succeed, as they can tailor their solutions to meet those objectives.
1.1 Functional Requirements
Functional requirements specify what a system or piece of software is supposed to do; they include the core business functions and capabilities the system must provide. Functional requirements are mainly concerned with the logical workings of the system rather than technical details.
They focus on WHAT the system should do rather than HOW it should do it. Evaluation is done based on whether the final system meets the key business needs. If the functional requirements are not met, the system will not be useful.
Benefits of functional requirements:
• They ensure the system meets key business needs and objectives.
• They provide a shared understanding between stakeholders about what the system should do.
• They serve as a reference point to evaluate design, development and testing efforts.
• They reduce ambiguity, misunderstandings and rework.
• They form the basis for functional testing to validate whether the system meets requirements.
• They facilitate effective communication between technical and business teams.
• They serve as documentation for future enhancements, upgrades and maintenance of the system.
Some functional requirements questions are:
• Who are the intended users of the system?
• What is the intended usage pattern of the system?
• How many users are expected to use the system?
• What are the functionalities of the system?
• What are the inputs and outputs of the system?
• How much data is expected to be processed by the system?
• What is the expected request rate per second?
• What is the expected read-to-write ratio?
1.2 Non-Functional Requirements
Non-functional requirements specify constraints on the system rather than specific functions. They define qualities that the system must have to be operational. Some key types of non-functional requirements are:
Performance requirements: These specify constraints on the speed, scalability, throughput, response time etc. of the system. E.g. The system shall respond to 90% of requests within 2 seconds.
Usability requirements: These determine how easy and intuitive the system is to use. E.g. The system shall have a low user error rate (<5%), easy to learn and use interface.
Reliability requirements: These specify availability, fault tolerance, recoverability and stability of the system. E.g. The system shall have 99.9% uptime and zero unscheduled downtime.
Security requirements: These determine characteristics like authentication, authorization, encryption, auditing etc. E.g. The system shall implement role-based access control, encrypt all sensitive data, and audit all user activities.
Scalability requirements: These specify how the system can grow in capability and capacity. E.g. The system shall scale to handle 10x more users with minimal degradation in performance.
Portability requirements: These refer to the ability of the system to work across different platforms, interfaces, technologies etc. E.g. The system interfaces shall be platform and language-independent.
Supportability requirements: These determine how easy the system is to support, maintain, upgrade and enhance. E.g. The system design shall be modular, well-documented, and smoothly migratable to new technologies.
Benefits of non-functional requirements are:
• They provide constraints for system design and development. Designers can evaluate alternative solutions based on these requirements.
• They ensure that the system meets essential quality attributes and business needs beyond pure functionality.
• They facilitate the optimization of key attributes such as performance, scalability, security, usability etc.
• They establish service-level agreements and govern the acceptability of the system. Compliance with these requirements can be validated through testing.
• They improve quality, reduce defects and lower the total cost of ownership. Meeting non-functional requirements upfront avoids rework and additional costs later.
• They provide better understandability of system capabilities and limitations to stakeholders. Unrealistic requirements can lead to disputes, delays and budget overruns.
• They enable the selection of appropriate technologies and architecture to meet the required qualities. Every design decision can be evaluated based on non-functional requirements.
• They facilitate smoother system integration since interfaces can properly conform to non-functional constraints. Disparate systems can work together seamlessly.
2. Back of the Envelope Estimation and Constraints
Back-of-the-envelope estimation refers to making rough calculations, as if jotting numbers on the back of an envelope. It is an informal way of estimating metrics such as: the time required to complete key tasks or milestones (e.g. the duration of analysis, coding and integration testing); resource requirements (e.g. the number of team members needed for roles such as developers, testers and architects); and system scalability (e.g. the maximum transactions per second or number of concurrent users the system can handle).
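As a concrete illustration, the sketch below runs such an estimate in Python. Every input number is an assumed, illustrative figure, not data from any real system.

```python
# Back-of-the-envelope capacity estimate for a hypothetical service.
# All input numbers below are illustrative assumptions, not real data.

DAU = 10_000_000            # assumed daily active users
REQUESTS_PER_USER = 20      # assumed requests per user per day
READ_WRITE_RATIO = 100      # assumed 100 reads for every write
AVG_RECORD_BYTES = 500      # assumed average stored record size
SECONDS_PER_DAY = 86_400

total_requests_per_day = DAU * REQUESTS_PER_USER
avg_qps = total_requests_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * 2      # rough rule of thumb: peak is ~2x average

writes_per_day = total_requests_per_day / (READ_WRITE_RATIO + 1)
storage_per_day_gb = writes_per_day * AVG_RECORD_BYTES / 1e9

print(f"Average QPS: {avg_qps:,.0f}")            # ~2,315
print(f"Peak QPS:    {peak_qps:,.0f}")           # ~4,630
print(f"New storage per day: {storage_per_day_gb:.2f} GB")  # ~0.99 GB
```

The point is not precision: round numbers like these are enough to decide, for instance, whether a single database instance can absorb the write load.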
3. Data Model and API Design
After obtaining the necessary estimations, the next step is to define the database schema. This should be done early in the interview process to gain a clear understanding of the data flow, which serves as the backbone of any system. This step involves defining all the entities and their relationships.
A good data model design has many benefits for a system. Some key aspects of data model design are:
Defines the entities and their attributes: The entities represent real-world objects that are relevant for the system. Attributes capture the key details of each entity. E.g. a Customer entity with name, address, contact etc. attributes.
Establishes relationships between entities: How entities are linked together logically using relationships. E.g. Order entity is linked to the Customer entity through a relationship. Cardinality specifies the number of instances in a relationship.
Enforces data integrity constraints: Rules such as uniqueness, not null, check constraints, default values etc. These ensure data integrity and quality.
Facilitates data querying and manipulation: The design should support efficiently querying and updating data. E.g. Flat model for few relations, and a normalized model for many relations. Denormalization is also needed for performance at times.
Cater to future growth: The model should accommodate potential growth in data volume, new entities, attributes, relationships etc. without major redesign. Modular and scalable design is important.
Defines the data storage structures: The model can guide the choice of data tables, columns, indexes, keys etc. in the selected database system. The model and storage structures have a close correspondence.
Ensures data consistency: The design should ensure a single point of reference for data values. No redundancy, inconsistency or corruption of data. Values should have the same meaning wherever they occur.
Key principles of good data model design are:
• Simplicity: Avoid over-engineering the model with too many entities/attributes. Keep it as simple as possible.
• Integrity: Satisfy all constraints completely and consistently across the model. No invalid or corrupted data should be allowed.
• Uniformity: Use consistent naming conventions, data types, units of measurement etc. for attributes across the model.
• Minimize redundancy: Store any repetitive data only once; avoid duplicate copies of the same information.
• Modularity: Split the model into smaller, independent sub-models to make it manageable and scalable. Keep coupling between sub-models loose.
• Anomaly-free: Avoid insertion, deletion and modification anomalies. Represent business rules properly.
• Future-proof: Allow room for potential growth to accommodate new features without redesign.
• Balanced: Do not over-normalize (too many tables) or under-normalize (too few tables). A balanced normalized model is ideal.
• Comprehensive: Cover all aspects of data - entities, attributes, relationships, constraints, indexes etc. No scope should be missed.
• Stable: Minimize frequent changes to the model. Changes should not impact existing data, queries or dependent elements. Version model changes properly.
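To make the entity, relationship and constraint ideas above concrete, here is a minimal sketch using SQLite's in-memory mode. The customers/orders schema is purely illustrative.

```python
# Minimal data model sketch in SQLite: two entities (customers, orders),
# a one-to-many relationship, and integrity constraints.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE                  -- uniqueness constraint
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
""")

conn.execute("INSERT INTO customers (id, name, email) VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO orders (customer_id, total_cents) VALUES (1, 1999)")

# Referential integrity: an order pointing at a missing customer is rejected.
rejected = False
try:
    conn.execute("INSERT INTO orders (customer_id, total_cents) VALUES (99, 100)")
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```

The constraints (NOT NULL, UNIQUE, CHECK, foreign keys) are the machine-enforced form of the business rules discussed above.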
API or Application Programming Interface design has some important aspects to consider:
Decide how to expose functionality: Will a single API provide the full functionality, or will multiple APIs have separate responsibilities? Monolithic vs. microservices architecture.
Choose between synchronous and asynchronous APIs: Synchronous APIs block the caller until a response is received. Asynchronous APIs return immediately and respond later. Consider scalability and user experience needs.
Determine the access level: Public (openly available), partner (limited to partners), private (restricted internal use). This guides access control, authentication etc.
Choose between RESTful, GraphQL or gRPC APIs: Evaluating different paradigms based on needs like query language, versioning support, tooling etc. Hybrid approaches are also possible.
Define the API resources, endpoints, methods and schemas: The URI scheme, available resources, possible operations on each resource and request/response format schemas.
Specify the request and response formats: Query parameters, path parameters, headers, request bodies, status codes, response bodies etc. Consistency across APIs is important. JSON, XML, and Protobuf are common formats.
Establish a consistent design style: For things such as naming conventions, data normalization approaches, error handling, logging, security, versioning, caching, etc. The style should continue seamlessly across multiple APIs.
Determine service level agreements: Performance metrics such as latency, throughput, uptime and error rates. Govern scalability, high availability and quality of service.
Ensure strong developer experience: By providing complete documentation, SDKs, code samples, tutorials, forums, design tools, Postman collections etc. Cater to both beginners and experts.
Consider cross-cutting concerns: Issues that transcend individual APIs like security, analytics, monitoring, billing, caching, load balancing, failover etc. A centralized approach benefits management and consistency.
Facilitate integrations and ecosystems: Easy to discover, understand, consume and integrate third-party services. Interoperate with other internal and external APIs seamlessly.
Key principles of good API design are:
• Simplicity: Avoid over-engineering. Simple, intuitive and minimalistic, with fewer components and concepts to understand.
• Consistency: Apply design rules consistently across all APIs. Standards and conventions make APIs easy to use and surprise-free.
• Robustness: APIs should be robust, reliable, resilient, scalable, secure and fault tolerant. Consider different usage patterns and edge cases.
• Ease of use: Provide great documentation, code samples, SDKs, tooling support, endpoints, data schemas, status codes etc. for an enjoyable developer experience.
• Modularity: Split into separate, reusable components as much as possible, with limited dependencies between modules. Easier to build and integrate; microservices enable this well.
• Versionability: Graceful versioning support to scale functionality without breaking existing integrations. Maintain backwards compatibility as new versions roll out.
• Divisibility: Encapsulate and expose only part of the total functionality in each API. Allow APIs to evolve and improve independently of each other over multiple versions.
• Adaptiveness: Flexible and accommodating to change. Able to adapt to new technologies, tools, standards etc. without major redesign.
• Availability: High availability, fault tolerance and disaster recovery support are as important as for any service. Reliability and responsiveness under load.
• Integrability: Easy to integrate with other services, both internal and third-party. Interoperability across different platforms and languages.
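As a rough illustration of resource-oriented endpoint design, the sketch below models a URL-shortener API as plain Python handlers returning (status, body) pairs. The route names and handlers are hypothetical, not any specific framework's API.

```python
# Sketch of a URL-shortener's REST-style endpoints as plain handlers.
# Each handler returns an (HTTP status, response body) pair.
URLS = {}  # in-memory stand-in for the data store: code -> long_url

def create_url(body):          # POST /urls
    URLS[body["code"]] = body["long_url"]
    return 201, {"code": body["code"]}

def get_url(code):             # GET /urls/<code>
    if code in URLS:
        return 200, {"long_url": URLS[code]}
    return 404, {"error": "not found"}

def delete_url(code):          # DELETE /urls/<code>
    if code in URLS:
        del URLS[code]
        return 204, {}
    return 404, {"error": "not found"}

status, body = create_url({"code": "abc", "long_url": "https://example.com"})
print(status, body)            # 201 {'code': 'abc'}
status, body = get_url("abc")
print(status, body)            # 200 {'long_url': 'https://example.com'}
```

Note how the principles above show up even in a toy: one resource, standard verbs, consistent status codes, and a predictable response shape.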
4. Database Design
Before building a system, it is crucial to properly design its data management components. This involves analyzing how data will be collected, stored, processed and presented as outputs. Some important aspects to consider at this stage are:
Identifying data inputs and outputs: Determine what kind of data will be entered into the system as inputs and what information will be produced as outputs. The outputs should meet stakeholder needs and business requirements.
Data flow: Mapping the overall flow of data through the system across various stages - inputs, storage, processing, querying, updating, accessing, reporting etc. Ensure there are no leaks, loops or roadblocks in the flow that can hamper system performance or integrity.
Data models: Logically modeling the data by defining entities (tables), attributes (columns), relationships, constraints, keys etc. Entity Relationship (ER) diagrams or Unified Modeling Language (UML) class diagrams are often used. Normalization should be addressed to minimize data redundancy.
Database selection: Evaluating which database technology (relational, NoSQL, graph etc.) would be the most suitable and optimized solution for the design. Consider data characteristics, access patterns, performance needs, scalability, features, complexity etc.
Data storage: Determining how and where the data will be physically stored and managed. The storage design should align with the logical data model and selected DBMS technology and suit the overall system architecture.
Access controls: Defining who has access to which data and what permissions or privileges different types of users will have. This is critical for security, privacy, auditing and regulatory compliance.
Integrity constraints: Establishing business rules through constraints, checks, non-null rules, unique constraints, referential integrity constraints, validated data format rules etc. This enforces data validity, consistency and quality within the system.
Indices: Applying indices judiciously to specific columns or column combinations based on how the data will be queried. Indices improve query performance but also slow down inserts, updates and deletes. The trade-off needs to be optimized.
Data design defines the framework for efficiently storing, managing and processing data within a system. It has a profound impact on key attributes like data quality, integrity, security, performance, scale, cost etc.
Robust data design upfront avoids issues later and enables a well-architected, flexible and future-ready data management solution.
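The index trade-off discussed above can be observed directly in SQLite: the sketch below compares the query plan for the same lookup before and after adding an index. Table and column names are illustrative.

```python
# Sketch: how an index changes SQLite's plan for a frequent lookup.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in the last column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT id FROM users WHERE email = 'a@example.com'"
before = plan(query)                       # full table scan
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)                        # index lookup

print("before:", before)
print("after: ", after)
```

The "before" plan scans the whole table, while the "after" plan searches via `idx_users_email`; the hidden cost is that every insert and update must now also maintain that index.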
5. High-level Component Design
After the database, data model and API design have been outlined, the next step is to give a high-level overview of the architecture. Identify all the components, such as servers, load balancers, API gateways, CDNs, storage etc.
Below are some points to consider while making your component choices:
Identify critical components: Determine the essential components required to solve the overall problem from end to end. This could include components for data management, business logic, APIs, interfaces, security, algorithms, caching, messaging etc. Ensure all scope and requirements are fully covered.
High-level design: Outline how the critical components will interact with each other at a conceptual level. A diagram representing the system architecture and major data/control flows can help visualize and document the high-level design.
Comprehensive design: The design should comprehensively cover all perspectives - functional, non-functional, technical, business, operational etc. Any missing scope could lead to rework, adding technical debt. Get inputs from architects, designers, developers, testers, business analysts, domain experts etc.
Integration: Clearly specify how different components will integrate and exchange information. Define common standards and interfaces to enable seamless interoperability. Loose coupling and high cohesion should be aimed for.
Scalability: The design must scale gracefully to accommodate growth. It should not place rigid constraints and should flexibly support adding more resources, increasing load, expanding features etc. Microservices architecture helps achieve great scalability.
Modularity: A modular design with separate, reusable and independent components is easier to build, integrate, deploy and manage. It localizes changes and reduces impacts across the system. However, some dependencies between modules are also inevitable. Find the right balance of cohesion and coupling.
Constraints: Ensure the design complies with all constraints around cost, time, resources, technology stack, standards, SLAs, policies, regulations etc. No external dependency or limitation should be violated. Some constraints may be at odds with each other, requiring optimization.
Alternatives: Evaluate multiple alternatives and options before finalizing a design. Consider pros and cons of each alternative based on requirements, constraints, metrics, experiences etc. Comparing alternatives leads to selecting the optimal design.
Future-readiness: The designed system should be adaptable to future changes. Some flexibility and headroom should be built into the architecture, components, processes, interfaces, standards, and technologies. Ease of evolving the design should be prioritized over initial efficiency gains.
Optimize: Apply optimization techniques like simplification, generalization, abstraction, decoupling etc. to improve the design. Reducing complexity makes the design easier to implement, integrate, scale, maintain and manage. But over-optimizing can also introduce constraints. Strike a balance.
A high-level System design should be comprehensive, integrated, scalable, modular, constraint-compliant, adaptable and optimized. Explaining the overall structure and interactions between key components leads to a design that meets all business, functional, non-functional, technical and quality requirements.
6. Detailed Core Component Design
After the interviewer is satisfied with your high-level design, more often than not they will dig deeper into a specific component that requires further attention to detail. For example, here is a quick summary of each major component of a TinyURL service:
- Data storage: We need a data store to map short URLs to full destination URLs.
• SQL database: Simple, scalable, ACID-compliant. But expensive for high volumes of URLs.
• NoSQL database: Scales horizontally and handles large volumes. But weaker ACID guarantees and more complex queries.
• In-memory cache: Fastest lookups, handles load spikes. But limited capacity, and data is not persisted.
Chosen approach: Start with Redis for high performance, and persist to MongoDB. Scale MongoDB as volume grows.
- URL abbreviation: We need an algorithm to generate short, unique URLs.
• Simple incrementing numbers: Easy to implement but will quickly exhaust short options.
• Random strings: Very high scalability but harder-to-remember URLs.
• Hashcodes: Deterministic, even distribution. But different long URLs can potentially hash to the same code (collisions).
Chosen approach: Use an incremental counter for the first few million URLs, then switch to a hash-based approach as the numbers get too big.
- Redirect handling : Short URLs need to redirect to original long URLs.
• Store redirection rules in the database and redirect programmatically: Scales well but complex to implement.
• Use an external redirect service: Paid, but faster and easier to set up and scale.
• Set up our own redirect microservice: Most cost-effective and flexible long term, but high initial setup costs.
Chosen approach: Start with a CDN, and build an internal redirect service as volumes increase and costs become a concern.
- API design: Expose creation, lookup and delete short URL functionality via API.
• REST API: Most popular, with many options for generation and testing. But can be verbose.
• GraphQL API: Powerful schema-first API but a steep learning curve.
• gRPC API: Efficient binary API, scalable. But niche, with limited tooling support.
Chosen approach: REST API, for the widest adoption and ease of use.
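The counter-based short-code scheme described above is commonly implemented by encoding the incrementing ID in base 62 (digits plus lower- and upper-case letters). Here is a minimal sketch; the alphabet ordering is an arbitrary choice.

```python
# Sketch: encode an incrementing integer ID as a compact base-62 short code.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Convert a non-negative integer to its base-62 string form."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(s: str) -> int:
    """Inverse of encode_base62: recover the integer ID from a code."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode_base62(125))   # prints 21 (2*62 + 1 = 125)
```

Seven base-62 characters cover 62^7 (over 3.5 trillion) distinct codes, which is why even large services get away with very short URLs.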
7. Scaling the Design
Scalability is the ability of a system to handle increasing workload demand without degradation in performance or quality of service.
Some key techniques to ensure a system scales well are:
Load balancing: Distributes workload across multiple compute resources. Prevents overload of any single component. Options: Application load balancers, network load balancers, session affinity etc.
Horizontal scaling: Increases total workload handling capacity by adding more instances/nodes. Scales compute, storage and network resources independently based on demand. Options: Scale out more EC2 instances/EC2 Auto Scaling, Scale Storage and Databases. Achieved via replication, sharding, partitioning etc.
Caching: Stores frequently accessed data/computed results in memory for faster access. Reduces load on other system components. Options: Application caching (Redis), CDN caching, Database caching etc.
Replication: Duplicates data/work across multiple servers. Improves availability, performance, and fault tolerance. Options: Data replication (Mongo, Couchbase etc), Compute replication.
Partitioning: Breaks large data/work into smaller independent pieces. Allowing separate scaling of partitions based on demand. Options: Data partitions, Database shards.
Database Sharding: Horizontal partitioning of data across multiple DB instances. Enables exceeding the memory/storage limits of a single instance. Options: Range-based, Hash-based, List-based etc. Requires shard keys to determine data distribution.
Async Processing: Offloads tasks or job work to asynchronous/background processing. Improves responsiveness and throughput of synchronous API requests. Options: Message queues (RabbitMQ), Task queues (Resque, Beanstalkd etc).
To scale a system, determine which techniques are appropriate and cost-effective for the workload profile. Some scaling options compound each other, requiring an optimized solution. Start small, monitor key metrics, and scale out components/resources independently based on empirically observed demands. Scalability needs to be designed from the beginning, not bolted on later.
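As one concrete example of the partitioning and sharding techniques above, the sketch below routes keys to shards with a stable hash. The shard count and key format are illustrative, and plain dicts stand in for real database instances.

```python
# Sketch of hash-based sharding: a shard key is hashed to pick which of
# NUM_SHARDS instances owns a record.
import hashlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]   # stand-ins for DB instances

def shard_for(key: str) -> int:
    # A stable hash (unlike Python's per-process randomized hash()) keeps
    # routing consistent across processes and restarts.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

put("user:42", {"name": "Ada"})
print(get("user:42"))   # prints {'name': 'Ada'}
```

One caveat this sketch glosses over: with plain modulo hashing, changing NUM_SHARDS remaps most keys, which is why production systems often use consistent hashing instead.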
8. Removing Bottlenecks and Identifying Drawbacks
No system design is perfect; there will be drawbacks and bottlenecks that affect the overall performance of the system. These bottlenecks need to be monitored, and action should be taken based on the metrics. For example, if the number of requests increases, scaling out your servers based on demand would be a viable option.
Here are some key points to remember while identifying a bottleneck or drawback:
Measure key metrics: Measure everything, anything that can be measured can be corrected. Track metrics like throughput, latency, error rates, resource usage, queue sizes, partition access patterns etc. that indicate system performance and scalability. Identify metrics that are degrading over time.
Find causes of bottlenecks: Analyze metrics and logs to determine the root causes of bottlenecks. Things like a disproportionate load on some components, resources running out of capacity, unoptimized code paths, lack of caching/indexing etc. can cause bottlenecks.
Prioritize issues: Not all bottlenecks require immediate resolution. Prioritize them based on impact on user experience, costs, compliance risks etc. Focus on high-priority issues that hamper system scalability and performance.
Scalability techniques: Use techniques like horizontal scaling, load balancing, caching, partitioning, replication etc. to increase overall throughput and handle more load. But only scale portions of system that require additional capacity.
Optimize code paths: Review code paths that execute very frequently under load and optimize them. Eliminating nested loops, inefficient algorithms, excess function calls etc. can improve performance. Use tools for performance profiling and debugging.
Add caching: Caching stores frequently accessed and computed results in memory, speeding up access by avoiding re-computing redundant results on every request. Choose caching solutions based on data access patterns.
Improve indexing: Ensure useful indexes are used for frequent database queries. Indexes support fast lookups, complex queries, sorting and searches. But too many indexes also slow down inserts and updates.
Increase resources: If other techniques have been exhausted, increasing resources for CPU, memory, network etc. may help improve throughput. But this also increases costs and introduces resource idling during non-peak loads.
Monitor post-resolution: Once a bottleneck has been resolved, continue monitoring the metrics to ensure the resolution is effective under load and does not cause any new bottlenecks or compromise other performance characteristics. Frequently repeat analysis to keep the system optimized.
Centralized logging: A centralized, robust logging system helps immensely in monitoring the system’s health, diagnosing issues, analyzing metrics and debugging. Logs provide an audit trail of each request, its processing steps and corresponding responses which aids performance analysis and bottleneck resolution.
In summary, diagnose bottlenecks through data-driven analysis, find root causes, apply suitable scaling and optimization techniques, and monitor and re-analyze continuously.
Avoid rushing to increase resources or throwing excessive spending at the problem without trying more pragmatic solutions first. Some bottleneck resolution requires experimentation to determine what really works for the particular system and workload profile.
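To illustrate the caching advice above, here is a minimal sketch using Python's functools.lru_cache to memoize a hot lookup. expensive_lookup is a stand-in for a real slow database or network call.

```python
# Sketch: relieving a hot code path with an in-process memoization cache.
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    CALLS["count"] += 1           # track how often the slow path runs
    return key.upper()            # placeholder for real expensive work

for _ in range(1000):
    expensive_lookup("hot-key")   # 1000 requests for the same hot key

print("slow-path calls:", CALLS["count"])   # prints 1: the rest hit the cache
```

This is exactly the data-driven loop the section describes: the CALLS counter is the metric, and the cache is the targeted fix, applied only to the path the metric showed was hot.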
Make sure to read the engineering blog of the company you’re interviewing with. This will help you get a sense of what technology stack they’re using and which problems are important to them.