Software Architecture
Definition of Software Architecture
(1-) The software architecture of a system is a high-level description of the system's structure (-1), (2-) its different components, and how those components communicate with each other (-2) (3-) to fulfill the system's requirements and constraints (-3).
Let's break this long sentence down:
(1) The architecture is an abstraction that shows us the important components that help us reason about the system, while hiding implementation details from view. This implies that things like technologies or programming languages are not part of the software architecture; they belong to the implementation instead. Decisions about implementation should be delayed until the very end of the design.
(2) When we talk about software architecture, the components are "black box" elements, defined by their behavior and their APIs. The components may themselves be complex systems with their own software architecture diagrams, so this definition can be applied recursively when needed.
(3) The software architecture should describe how all those components come together to do what the system must do, which is captured in the requirements, and how the system avoids doing what it shouldn't do, which is captured in the system constraints. We're going to see all these different components in greater detail later on in the course.
Levels Of Abstraction
When it comes to software development, we can talk about software architecture on many different levels of abstraction.
- Classes
- Modules
- Services
The lowest level of abstraction deals with individual classes or structs (depending on the programming language) and the organization of, and communication between, objects inside a single program. Modules and services represent progressively higher levels of abstraction.
Requirements
Motivation
A description of what exactly we need to build. Requirements come at a high level of abstraction and with a high level of ambiguity.
Classification
Quality Attributes in Large Scale Systems
- Performance
- Availability
- Scalability
Performance
- Response time
The time between a client sending a request and receiving a response.
The response time is usually broken down into 2 parts:
Response time = Processing time + Latency
Processing time = the time it takes for our system to process the request and send the response.
Latency = the duration of time the request/response spends inactively in our system, usually in transit on the network or sitting in software queues, waiting to be handled or sent to its destination.
- Throughput
The ability to ingest and analyze large amounts of data in a given period of time. The more data our system can ingest and analyze per unit of time, the higher its performance. The metric used for this is called throughput.
Formal definition = the amount of work performed by our system per unit of time.
Some examples:
- number of tasks accomplished per second.
- amount of data processed by our system per unit of time.
Tail Latency Graph
Tail Latency = the small percentage of response times from a system that take the longest, in comparison to the rest of the values. We want this tail to be as short as possible.
When we define the response time goals for our system, we need to do it in terms of percentiles and tail latency, not in terms of averages.
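To make these metrics concrete, here is a minimal sketch that computes throughput and percentile-based response times from a list of measured request durations; the sample values and the nearest-rank percentile helper are illustrative, not from the course:

```python
# A minimal sketch: computing throughput and tail latency (percentiles)
# from measured response times. Sample values are illustrative.
import math

def percentile(sorted_values, p):
    """Value at the p-th percentile using the nearest-rank method."""
    index = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[index]

# Response times (in milliseconds) collected over a 60-second window.
response_times_ms = [12, 15, 14, 18, 22, 13, 16, 250, 17, 19, 21, 900]
window_seconds = 60

throughput = len(response_times_ms) / window_seconds   # requests per second
ordered = sorted(response_times_ms)

print(f"throughput: {throughput:.2f} req/s")
print(f"average:    {sum(ordered) / len(ordered):.1f} ms")  # skewed by the two slow requests
print(f"p50:        {percentile(ordered, 50)} ms")          # typical user experience
print(f"p99:        {percentile(ordered, 99)} ms")          # the tail we want to keep short
```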
- Scalability
We can scale our system up (also called Vertical Scalability), scale our system out (also called Horizontal Scalability), or scale our team/organization (Team/Organization Scalability).
- Scale up, a.k.a. Vertical Scalability
- Scale out, a.k.a. Horizontal Scalability
- Team/Organization Scalability
** Vertical Scalability
Vertical Scalability is adding more power to your current machine. This is the most basic approach: buy more memory, buy more disk space, buy a better network card to add more bandwidth. This type of scaling is known as Scaling Up.
*** Pros:
- Any application can benefit from it.
- No code changes are required.
- The migration between different machines is very easy.
When the traffic is high, we can simply migrate our application to a stronger machine, and when the traffic is low we can move our application instance to a weaker machine and save money. This is particularly easy if we're renting hardware from a cloud service provider.
*** Cons:
- The scope of upgrade is very limited
- We are confined to a centralized system which cannot provide:
- High availability
- Fault tolerance
The major downside of scaling our system vertically is that there's a limit to how much we can upgrade our hardware. When we're building a system at internet scale, we reach that limit very fast, and there's nowhere to go beyond that point. A centralized system also cannot provide us with High Availability and Fault Tolerance, which are 2 extremely important quality attributes that we're going to discuss in the following lectures.
** Horizontal Scalability
In contrast to Vertical Scalability, instead of upgrading the existing hardware, we simply add more units of the resources we already have, which allows us to spread the load across a group of machines. This type of scaling is known as Scaling Out.
*** Pros:
- No limit on scalability
- It's easy to add/remove machines
- If designed correctly we get:
- High availability
- Fault tolerance
When we scale our system horizontally there's virtually no limit to our ability to scale, because we can easily add more machines as needed, and just as easily remove them when the load on our system goes down. Finally, if we design our system correctly, horizontal scaling can also provide us with High Availability and Fault Tolerance.
*** Cons:
- Initial code changes may be required.
- Increased complexity, coordination overhead.
On the other hand, not every application can be easily ported to run as multiple instances on different computers, so potentially we would need to make significant changes to our code. However, once we make those changes, adding or removing resources shouldn't require any further changes. Finally, the biggest disadvantage of horizontal scalability and running a group of instances of our application is the increased complexity and coordination overhead. We will talk about some of those challenges and how to solve them during the course.
So now that we covered both vertical and horizontal scalability and compared them to each other, let's talk about the last but definitely not least scalability dimension, which is team or organizational scalability. If we look back at the definition of scalability, and look at it not from a client's perspective but from a developer's perspective, the work that we need to do on the system is adding new features, testing, fixing bugs, deploying releases, and so on. So to increase that type of work on our system, the resource we can add is more engineers.
Now, let's assume that we have a group of engineers working on one big monolithic code base. It turns out that if we plot productivity as a function of the number of engineers in the team, up to a certain point the more engineers we have in the team, the more work we get done, which means our productivity increases. However, at some point, the more engineers we add to the team, the less productivity we actually get. Intuitively, if a team of many engineers works on the same big code base, we can think of many reasons for such a degradation in productivity as we hire more engineers. For example, meetings become a lot more frequent and a lot more crowded, which makes them a lot less productive. Also, because there are a lot of developers working on many features simultaneously, code merge conflicts are inevitable, which adds a lot of overhead. Essentially, everyone starts stepping on each other's toes. As the code base grows with the number of developers, each new developer who joins the team has a lot more to learn before they become productive. Testing also becomes very hard and slow, because there is basically no isolation: every minor change can potentially break something, so every time we make any change to the code base, we need to test everything. And finally, releases become very risky, because each release includes a lot of new changes that came from many engineers. As a response to that, a lot of companies make releases even less frequent, which actually makes things even worse, because now every release contains even more changes.
So believe it or not, our software architecture impacts not only the system's performance but also our ability to scale the team and increase engineering velocity. For example, if we split our code base into separate modules, or abstract away pieces into separate libraries, then separate groups of developers can work on each individual module independently and not interfere with each other as much. However, even with the separation into modules and libraries, all those pieces are still part of the same service, which means they're still very tightly coupled, especially when it comes to releasing new versions to production. So the next step is to separate our code base into separate services.
Each service has its own code base, its own stack and its own release schedule, and the services communicate with each other through loosely coupled protocols over the network. This breakdown into multiple services allows for much better engineering productivity and helps us scale our organization. Of course, breaking our monolithic code base into multiple services doesn't come for free, but we'll talk about that in much greater detail when we get to the topic of architectural patterns. As a final note, vertical scalability, horizontal scalability, and team scalability are orthogonal to each other, which means we can scale our system on one dimension, two dimensions or all three dimensions.
In this lecture, we talked about a very important quality attribute of the system, which is scalability. We first got some motivation based on the fact that the traffic and the load on our system is never constant and can increase over time, as well as in seasonal or sporadic patterns. After that, we defined scalability more formally as the measure of a system's ability to handle a growing amount of work, in an easy and cost-effective way, by adding more resources to the system. Later, we learned about the three orthogonal ways we can scale our system in terms of performance and productivity. The first was vertical scalability, which we achieve by adding resources or upgrading the existing resources on a single computer. After that, we talked about horizontal scalability, which can be achieved by adding more resources in the form of new instances running on different computers. And finally, we talked about the third scalability dimension, which is team or organizational scalability. Unlike the first two, which are focused mainly on performance, this scalability allows our company to keep increasing its productivity while hiring more engineers into the team. I will see you soon in the next lecture. ---
- Availability
Availability is one of the most important quality attributes of a large-scale system. In this lecture, we're going to talk about the importance of high availability, how to define and measure availability in general, and later what specifically constitutes high availability of a system.
Let's start with the importance of high availability. Availability is one of the most important quality attributes when designing a system, because arguably it has the greatest impact on both our users and our business. When it comes to users, we can only imagine the frustration of a user trying to purchase something from our online store when the page simply doesn't load after they type the URL of our service into the browser. Or maybe it does load, but when they go to checkout, instead of getting a confirmation they get an error. Similarly, if we're an email service provider, we can only imagine the impact of everyone losing access to their email for a few hours or even a whole day. On that note, if we are a company that provides services to other companies instead of to end users, the impact of a prolonged outage in our services is compounded. A very famous example is the four-hour outage of AWS Simple Storage Service (S3) in February 2017. Because hundreds of companies and more than a hundred thousand websites were using that service, it's safe to say that that kind of outage almost took down the entire internet. But system downtime is not always just an inconvenience. If our software is used for mission-critical services, like air traffic control in an airport or managing the healthcare of patients in a hospital, and our system crashes or becomes unavailable for an extended period of time, we potentially have people's lives on the line.
Now, from the business perspective, the consequences of our system going down are twofold. First, if the users cannot use our system, then our ability to make money simply goes to zero. Second, when users lose access to our system for too long or too frequently, they will inevitably go to our competitors. So essentially, when our system goes down, our business may lose both money and customers.
Now that we have a feel for how important it is for our system to be available, let's define availability a bit more formally. We can define availability as either the fraction of time or the probability that our service is operationally functional and accessible to the user. The time that our system is operationally functional and accessible to the user is often referred to as the 'uptime' of the system, while the time that our system is unavailable to the user is commonly referred to as 'downtime'. Availability is usually measured as a percentage, representing the ratio between the uptime and the entire time our system is running, which is the total sum of the uptime and downtime of the system:
Availability = Uptime / (Uptime + Downtime)
As a side note, sometimes uptime and availability are used interchangeably. So if you read an agreement and see that uptime is described in percentages and not as an absolute time, it is simply talking about availability. Another way to define, as well as estimate, availability is using two other statistical metrics, the MTBF and the MTTR.
- MTBF
- MTTR
MTBF, the mean time between failures, represents the average time our system is operational. This metric is very useful if we're dealing with multiple pieces of hardware, such as computers, routers, hard drives and so on, where each component has its own operational shelf life. Typically though, if we're dealing with an existing service, like a cloud service provider or a third-party API, those services usually state their general uptime and availability upfront, so we don't really need to make any calculations.
MTTR, which stands for mean time to recovery, is the average time it takes us to detect and recover from a failure, which intuitively is the average downtime of our system, because until we fully recover from a failure, our system is essentially non-operational. So the alternative formula for the availability of a system is:
Availability = MTBF / (MTBF + MTTR)
Now, because this formula is more statistical and talks about probabilities, it's more useful for estimating availability than for actually measuring it. One very important aspect that we can see in this formula, which we cannot easily see in the previous formula, is that if we minimize the average time to detect and recover from a failure, theoretically all the way down to zero, then we essentially achieve 100% availability regardless of the average time between failures:
Availability = MTBF / (MTBF + MTTR), with MTTR → 0
Availability = MTBF / MTBF = 100%
Of course, in practice we cannot get the recovery time completely to zero. But make a mental note of the effect that fast detection and recovery have on availability, as this serves as a clue to one way we can achieve high availability. Now, how much availability should we aim for?
- Users would want 100% availability
- On the engineering side it is:
- Extremely hard to achieve
- Leaves no time for maintenance/upgrades
Obviously, 100% availability is what our users would want, but on the engineering side it is extremely hard to achieve, and it essentially leaves us no time for either planned or emergency maintenance or upgrades. So what would be an acceptable number that still provides our users with high availability? Before we try to answer this question, let's look at a few availability percentages and get a feel for what each of them means for our users.
Availability | Daily Downtime | Weekly Downtime | Monthly Downtime | Annual Downtime |
---|---|---|---|---|
90% | 2h 24m | 16h 48m | 3d | 36d 12h |
95% | 1h 12m | 8h 24m | 1d 12h | 18d 6h |
99% | 14m 24s | 1h 40m 48s | 7h 12m | 3d 15h 36m |
99.9% | 1m 26s | 10m 5s | 43m 12s | 8h 45m 36s |
Let's start with 90% availability, which may seem like a pretty high number, but if we look at the table we see that with that availability, our users may not be able to use our system for more than two hours every day, or for more than 36 days out of an entire year, which is more than a month. So intuitively we already see that this cannot be considered high availability. What about 95% availability? That's definitely better, but still, a day and a half of downtime every month, or over an hour every day, is not high enough for most use cases.
There's no strict definition of what constitutes high availability, but the industry standard set by cloud vendors is generally anywhere between 99% and 100%, typically 99.9% or higher. If we look back at our availability table, that means their service may be unavailable for less than a minute and a half per day, which would not be very noticeable, but still leaves about eight hours per year for emergency maintenance work if needed. To make discussions about availability a bit easier, percentages are also sometimes referred to by the number of nines in their digits. For example, 99.9% would be referred to as three nines, 99.99% as four nines, and so on.
Before we move on to concrete strategies for achieving high availability, let's quickly summarize what we learned in this lecture. We learned the importance of high availability, both from a user and a business perspective. Specifically, we mentioned the two risks that our business faces if our system goes down for prolonged periods of time or too frequently, which are the loss of revenue and the potential loss of customers to our competitors. After that, we defined availability as the fraction of time, or the probability, that our service is operationally functional and accessible to the user. We then learned about the two formulas to measure and estimate the system's availability: one using the uptime and downtime of the system, and another using the mean time between failures and the mean time to recovery. Finally, we defined high availability as three nines or above, based on the standard that the cloud vendors set in the industry.
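As a quick sanity check on these numbers, here is a minimal sketch that derives the allowed downtime for a given availability percentage and estimates availability from MTBF and MTTR; the helper name and the sample MTBF/MTTR values are mine, not from the course:

```python
# A minimal sketch: downtime budget per period for a given availability
# target, and availability estimated from MTBF/MTTR. Sample values are
# illustrative.

def allowed_downtime(availability_pct, period_hours):
    """Downtime budget (in minutes) for a period of the given length."""
    return (1 - availability_pct / 100) * period_hours * 60

for pct in (90.0, 95.0, 99.0, 99.9):
    print(f"{pct}% -> daily: {allowed_downtime(pct, 24):.1f} min, "
          f"monthly: {allowed_downtime(pct, 30 * 24):.1f} min, "
          f"yearly: {allowed_downtime(pct, 365 * 24) / 60:.1f} h")

# Availability estimated from MTBF and MTTR (both in hours here).
mtbf_hours = 500.0   # average time between failures
mttr_hours = 0.5     # average time to detect and recover
availability = mtbf_hours / (mtbf_hours + mttr_hours) * 100
print(f"Estimated availability: {availability:.3f}%")  # ~99.9%
```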
Fault Tolerance & High Availability
In the previous lecture we talked about availability in general and in particular the importance of high availability for both users and our business. In this lecture, we're going to focus on what prevents us from getting high availability out of the box and what measures and tactics we can take to improve the availability of our system. So generally there are three sources or categories of failures.
- Pushing a faulty config to production
- Running the wrong command/script
- Deploying an incompletely tested new version of the software
The first and most common source of failures is human error, which includes pushing a faulty config to production, running the wrong command or the wrong script, or deploying a new version of our software that hasn't been thoroughly tested.
The second category of failures, is software errors.
- Long garbage collections
- Out-of-memory exceptions
- Null pointer exceptions
- Segmentation faults
Those include exceptionally long garbage collections and crashes such as out-of-memory exceptions, null pointer exceptions, segmentation faults, and so on.
Lastly, we have hardware failures such as:
- Servers/routers/storage devices breaking down due to limited shelf life
- Power outages due to natural disasters
- Network failures because of:
- Infrastructure issues
- General congestion
So: servers, routers or storage devices breaking down due to limited shelf life, power outages due to natural disasters, or network failures because of infrastructure issues or general congestion. So, how can we deal with all those failures and still achieve high availability?
- Failures will happen despite:
- Improvements to our code
- Review, testing, and release processes
- Performing ongoing maintenance to our hardware
- Fault Tolerance is the best way to achieve High Availability
Despite the obvious improvements that we can make to our code review, testing and release processes as well as performing ongoing maintenance to our hardware, inevitably failures will happen. So the best way to achieve high availability in our system is through fault tolerance.
"Fault tolerance enables our system to remain operational and available to the users despite failures within one or multiple of its components.
When failures happen:
- Continue operating at the same/reduced level of performance.
- Prevent the system from becoming unavailable.
When those failures happen, a fault-tolerant system may continue operating at the same level of performance or at a reduced level of performance, but it will prevent the system from becoming entirely unavailable.
- Fault Tolerance revolves around 3 major tactics:
- Failure Prevention
- Failure Detection & Isolation
- Recovery
Generally, fault tolerance revolves around three major tactics: failure prevention, failure detection and isolation, and recovery.
Failure Prevention
The first thing we can do to prevent our system from going down is to eliminate any single point of failure in our system. A few examples of a single point of failure are running our application on a single server, or storing all our data on one instance of a database that runs on a single computer. The best way to eliminate a single point of failure is through replication and redundancy. For example, instead of running our service as a single instance on one server, we can run multiple copies or instances of our application on multiple servers. So if one of those servers goes down, for whatever reason, we can direct all the network traffic to the other servers instead. Similarly, when it comes to databases, instead of storing all the information on one computer, we can run multiple replicas of our database, containing the same data, on multiple computers, so losing any one of those replicas will not cause us to lose any data.
When it comes to computations, besides the spatial redundancy that comes in the form of running replicas of our application on different computers, we can also have what's referred to as time redundancy. Time redundancy essentially means repeating the same operation or the same request multiple times, until we either succeed or give up.
Now, when it comes to redundancy and replication, we have two strategies, both of which are extensively used in the industry in different systems. The first strategy is called active-active architecture. If we take the database example, it means that requests go to all the replicas, which forces them all to stay in sync with each other. So if one of the replicas goes down, the other replicas can step in and take all the requests immediately. The second strategy is called active-passive architecture, which implies that we have one primary instance or replica that takes all the requests, while the other, passive, replica or replicas follow the primary by taking periodic snapshots of its state. The advantage of the active-active architecture is that we can spread the load among all the replicas, which is identical to horizontal scalability; this allows us to take more traffic and provide better performance. The disadvantage of this approach is that, since all the replicas are taking requests, keeping them in sync with each other is not trivial and requires additional coordination and overhead. Conversely, if we take the active-passive approach, we lose the ability to scale our system, because all the requests still go to one machine, but the implementation is a lot easier, because there is a clear leader that has the most up-to-date data and the rest of the replicas are just followers.
The second tactic for achieving fault tolerance in our system is failure detection and isolation. If we go back to the picture of our application running as multiple instances on different computers, if one of the instances crashes because of software or hardware issues, then we need the ability to detect that this instance is faulty and isolate it from the rest of the group. This way, we can take actions like stopping requests from going to that particular server. Typically, to detect such failures, we need a separate system that acts as a monitoring service. The monitoring service can monitor the health of our instances by simply sending them periodic health-check messages.
Alternatively, it can listen for periodic messages, called heartbeats, that should arrive regularly from our healthy application instances. In either of those strategies, if the monitoring service does not hear from a particular server for a predefined duration of time, it can assume that that server is no longer available. Of course, if the issue was in the network, or that host simply had a temporarily long garbage collection, the monitoring service is going to have a false positive and assume that a healthy host is faulty. But that's okay; we don't need a perfect monitoring service, as long as it doesn't have false negatives, because a false negative would mean that some of our servers may have crashed but our monitoring system did not detect it.
Besides the simple exchange of messages in the form of pings and heartbeats, our monitoring system can be more complex and monitor for certain conditions. For example, it can collect data about the number of exceptions or errors each host gets per minute, and if the error rate on one of those hosts becomes exceptionally high, interpret that as an overall failure of that host. Similarly, it can collect information about how long each host takes to respond to incoming requests, and if that time becomes too long, decide that something must be wrong with that server because it became exceptionally slow.
Now, let's talk about the third tactic for achieving fault tolerance, which is recovery from a failure. If we recall the second formula for availability that we learned in the previous lecture, we can see that, regardless of our system's failure rate, if we can detect and recover from each failure faster than the user can notice, then our system will have high availability. Once we detect and isolate the faulty instance or server, we can take several actions. The obvious one is to stop sending any traffic or any workload to that particular host. The second action we can take is to attempt to restart it, with the assumption that after the restart the problem may go away. Another commonly used strategy is called a rollback, which essentially means going back to a version that was still considered stable and correct. Rollbacks are very common in databases: if for some reason we get to a state that violates some condition or the integrity of the data, we simply roll back to the last state in the past that is considered correct. Similarly, if we're rolling out a new version of our software to all the servers and we detect that all the servers with the new version are getting a lot of errors, we can automatically roll back to the previous version of our software to prevent our system from becoming entirely unavailable.
In this lecture, we learned about fault tolerance, which is how we achieve high availability in large-scale systems. We first discussed the different sources of failures we can have in our system, which are human error, software crashes, and hardware failures. Later, we defined fault tolerance as the ability of our system to remain operational and available to the users despite failures in one or some of its components. And finally, we discussed in detail the three tactics we use to achieve fault tolerance, which are failure prevention, failure detection and isolation, and recovery. I will see you soon in the next lecture.
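To make the detection-and-isolation tactic concrete, here is a minimal sketch of a heartbeat-based monitor; the instance names, timeout, and in-memory bookkeeping are illustrative assumptions, since a real monitoring service would receive heartbeats over the network:

```python
# A minimal sketch of heartbeat-based failure detection. Instance names,
# the timeout, and the in-memory bookkeeping are illustrative assumptions.
import time

HEARTBEAT_TIMEOUT_SECONDS = 10.0

# Last heartbeat timestamp reported by each application instance.
last_heartbeat = {}

def record_heartbeat(instance_id: str) -> None:
    """Called whenever a heartbeat message arrives from an instance."""
    last_heartbeat[instance_id] = time.monotonic()

def detect_failed_instances() -> list:
    """Instances we haven't heard from within the timeout are presumed dead."""
    now = time.monotonic()
    return [
        instance_id
        for instance_id, seen_at in last_heartbeat.items()
        if now - seen_at > HEARTBEAT_TIMEOUT_SECONDS
    ]

# Example usage: two healthy instances report in; a third went silent.
record_heartbeat("server-1")
record_heartbeat("server-2")
last_heartbeat["server-3"] = time.monotonic() - 60  # simulate a silent host

for failed in detect_failed_instances():
    # Isolation step: e.g. remove the host from the load balancer rotation.
    print(f"{failed} presumed unavailable -- stop routing traffic to it")
```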
Part 11: SLA, SLO & SLI
Now that we've covered the most important quality attributes and everything we need to know about them, let's conclude the section with three very important terms that essentially aggregate the promises we make to our users in regard to those quality attributes: SLAs, SLOs and SLIs. Let's start with the first term, which is the service level agreement, or SLA in short.
Term 1: SLA
The SLA is an agreement between us, the service provider, and our clients or users. It's essentially a legal contract that represents the promise we make to our users in terms of quality of service, such as availability, performance, data durability, the time it takes us to respond to system failures, and so on. It also explicitly states the penalties and financial consequences for us if we fail to deliver those promises and breach the contract. Those penalties can include full or partial refunds, subscription or license extensions, service credits and so on.
- SLAs exist for:
- External paying users (always)
- Free external users (sometimes)
- Internal users within our company (occasionally)
SLAs almost always exist for external paying users but sometimes also exist for free external users and occasionally even for internal users within our company.
- Internal SLAs usually don't include penalties
- SLAs for free external users make sense if our system has major issues during a free trial
- We compensate those users with a trial extension or credits for future use
- Companies providing entirely free services usually don't publish SLAs
Internal SLAs are unlikely to include any penalties. The reason an SLA still makes sense for free external users is that, for example, if we're providing a three-day free trial of our service and during that free trial our system had major issues, then it would make sense for us to compensate those users by giving them an extension of the trial or maybe some credits they can use in the future. However, most companies that provide entirely free services avoid making any strict promises to the public and don't publish an SLA. When it comes to internal customers within our company, those customers may be other teams that provide services to external customers, and in order for them to meet their SLA, they need to know ours and rely on us to meet that agreement.
Term 2: SLO
Now let's talk about the next term, the SLO which stands for service level objective.
What is SLO?
- Individual goals set for our system.
- Each SLO represents a target value/range that our service needs to meet (for our quality attributes).
SLOs are the individual goals that we set for our system. Each SLO represents a target value or a target range of values that our service needs to meet.
For example, we can have an:
- An availability Service Level Objective of three nines.
- A response time SLO of less than a hundred milliseconds at the 90th percentile.
- An issue resolution time objective of between 24 and 48 hours.
All the quality attribute requirements that we gathered at the beginning of the design process will make their way into one or multiple SLOs, as well as other objectives that we want to hit with our system and hold ourselves accountable for. If our system has an SLA, then each SLO represents an individual agreement within that SLA about a specific metric of the system. So essentially, an SLA aggregates all the service level objectives into a single legal document. But even systems that don't have an SLA must still have SLOs, because if we don't have those objectives, our internal or external users don't know what to expect when they rely on our system.
Term 3: SLI
Now, the third term we're going to cover is the service level indicator, or SLI. A service level indicator is a quantitative measure of our compliance with a service level objective. In other words, it's the actual numbers we measure using a monitoring system, or that we calculate from our logs. Once we calculate the numbers for those service level indicators, we can compare them to the goals we set for ourselves when we defined the service level objectives. For example, we can measure the percentage of user requests that received a successful response and use that as an indicator of the availability of our system, comparing it to the availability service level objective of three nines that we set for our service. Similarly, we can collect the response time of each request, bucket them into time windows, and calculate the average or the percentile distribution of the response times that our users experienced. Later, we can compare those numbers to the end-to-end latency SLO of a hundred milliseconds at the 90th percentile that we originally set for ourselves.
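As an illustration, here is a minimal sketch that computes an availability SLI and a p90 latency SLI from a batch of request records and compares them to SLO targets like the ones just described; the field names, sample data and thresholds are illustrative assumptions:

```python
# A minimal sketch: computing SLIs from request records and comparing them
# to SLO targets. Field names, sample data and targets are illustrative.
import math

AVAILABILITY_SLO_PCT = 99.9       # "three nines"
LATENCY_SLO_P90_MS = 100.0        # 100 ms at the 90th percentile

# Each record: (succeeded, response_time_ms), e.g. taken from logs.
requests = [(True, 42.0), (True, 61.5), (False, 0.0), (True, 95.0),
            (True, 120.0), (True, 55.0), (True, 88.0), (True, 30.0)]

# SLI 1: availability = fraction of requests that got a successful response.
availability_sli = 100.0 * sum(ok for ok, _ in requests) / len(requests)

# SLI 2: p90 response time (nearest-rank percentile) of successful requests.
times = sorted(t for ok, t in requests if ok)
p90_index = max(0, math.ceil(0.9 * len(times)) - 1)
latency_p90_sli = times[p90_index]

print(f"availability SLI: {availability_sli:.2f}% (SLO {AVAILABILITY_SLO_PCT}%)"
      f" -> {'OK' if availability_sli >= AVAILABILITY_SLO_PCT else 'BREACH'}")
print(f"p90 latency SLI: {latency_p90_sli:.1f} ms (SLO {LATENCY_SLO_P90_MS} ms)"
      f" -> {'OK' if latency_p90_sli <= LATENCY_SLO_P90_MS else 'BREACH'}")
```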
Since the SLOs essentially represent the target values for our most important quality attributes, we now understand why it was so important for our quality attributes to be testable and measurable. If they weren't measurable, we wouldn't be able to find any service level indicators to validate that we are indeed meeting our SLOs. And if we can't measure and prove that we are meeting our SLOs, we can't definitively say that we are meeting our service level agreement, which is a legal contract. Now, while the service level agreements are crafted by the business and legal teams, the software engineers and architects have a lot more control in defining and setting the service level objectives and the service level indicators associated with those SLOs. So when it comes to the service level objectives, there are a few considerations that we need to take into account.
Important Considerations
- We shouldn't take every SLI that we can measure in our system and define an objective associated with it. Think about the metrics that users care about the most, and define the SLOs around those metrics. From the SLOs, define the right SLIs to track those SLOs.
- Promising fewer SLOs is better. With too many SLOs it's hard to prioritize all of them equally. With few SLOs, it's easier to focus the entire software architecture around them.
- Set realistic goals with a budget for error. We shouldn't commit to five nines of availability even if we can provide it; we should commit to lower availability than we can actually provide. This saves costs and leaves room for unexpected issues. Internal SLOs tend to be more aggressive than external SLOs. For example, externally we can commit to 99.9% availability while internally we aim for 99.99% availability. This helps us avoid financial penalties.
- Create a recovery plan for when the SLIs show that we are not meeting our SLOs. We need to decide ahead of time what to do if:
- The system goes down for a long time
- The performance degrades
- There are too many reports about issues/bugs in the system
This plan should include:
- Automatic alerts to engineers/DevOps
- Automatic failovers/restarts/rollbacks/auto-scaling policies
- Predefined handbooks on what to do in certain situations
The first consideration is that we shouldn't take every service level indicator that we can measure in our system and define an objective associated with it. Instead, we should first think about the metrics that users care about the most and define the service level objectives around those metrics. From that, we find the right service level indicators to track those service level objectives.
The second consideration is that the fewer service level objectives we promise, the better it is for us. With too many SLOs, it's hard for us to prioritize all of them in an equal manner. When we have just a few SLOs, it is much easier to focus our entire software architecture around them and make sure we meet our goals.
The third consideration is setting realistic goals with a budget for error. Just because we can provide five nines of availability doesn't mean that we should commit to that. Instead, we should commit to a lower availability than we can actually provide. This allows us to save costs and also leaves us enough room for unexpected issues. This is especially true when our service level objectives are represented in an external service level agreement. In those cases, companies sometimes define separate external service level objectives, which are a lot looser than the internal service level objectives that tend to be more aggressive. For example, externally we can commit to 99.9% availability, but internally we commit to 99.99% availability. This way, we can strive for a better quality of service internally while committing to a lot less to our clients, and also avoid financial penalties if we don't reach the high bar that we set for ourselves internally.
The final consideration is that we need to create a recovery plan to deal with any situation where the service level indicators start showing us that we are not meeting our service level objectives. In other words, we need to decide ahead of time what to do if, all of a sudden, our system goes down for long periods of time, our performance degrades, or we suddenly get too many reports about issues or bugs in our system in a short period of time. This plan should include automatic alerts to engineers or DevOps; automatic failovers, restarts, rollbacks or auto-scaling policies; and predefined handbooks on what to do in certain situations, so that the person on call doesn't have to improvise when they're alerted about an emergency in our system. In this lecture, we learned about three very important terms in designing a real-life system:
- The service level agreement or SLA
- The service level objectives or SLOs
- And the service level indicators or SLIs
We learned that the SLA is a legal contract between a service provider and its customers, which essentially aggregates the most important service level objectives that we promise to our users, while the service level indicators are used to measure our compliance with those objectives. Finally, we learned about four important considerations when defining service level objectives. The first one was defining the most important service level objectives that our users care about and then finding the service level indicators based on those objectives. The second and third considerations told us to commit to the bare minimum in terms of the number of objectives and how aggressive they are. And finally, the last consideration was having a recovery plan ahead of time to deal with situations where there is a potential breach of our SLOs.
Section 4: API Design
In this lecture, we're going to talk about designing the application programming interface (API) for our system.
As always, we will start with some introduction and motivation for why we need to design an API for our system. And later we will talk about a few categories of APIs as well as the best practices and patterns for designing a good API for our clients.
Introduction to API Design
So let's start with defining what an API is and why we need it. After we capture all the functional requirements, we can start thinking about our system as a black box.
- After capturing all functional requirements, we can think of our system as a black box.
- That black box has:
- Behavior
- Well-defined interface
- That interface is a contract between:
- Engineers who implement the system
- Client Applications who use the system
- Since this interface is called by other applications, it is referred to as Application Programming Interface or API.
- In a large-scale system, API is called by other applications remotely through the network.
So, that black box has behavior as well as a well-defined interface. That interface is essentially a contract between us, the engineers who implement the system, and the client applications who use our system. Since this interface is going to be called by other applications, we refer to it as an application programming interface, or API in short. To clarify, since we are building a large-scale system, and not a class or a library, this API is going to be called by other applications remotely, through the network. The applications that are going to be calling our API may be front-end clients like mobile applications or web browsers, they can be backend systems that belong to other companies, or they can be internal systems within our organization. Additionally, once we finish the internal design of our system and decide on its different components, each such component will have its own API, which will be called by other applications within our system. Now let's talk about the different categories of APIs, depending on the system we're designing and the type of clients that are using it.
APIs are classified into 3 groups:
- Public APIs
- Private/Internal APIs
- Partner APIs
Public APIs:
- Exposed to the general public
- Any developer can use/call them from their application
- Good practice:
- Requiring the users to register with us before allowing them to send requests and use the system. This allows:
- Control over who uses the system externally
- Control over how they use the system
- Better security
- To blacklist users breaking the rules
Private APIs:
- Exposed only internally within the company
- They allow other teams/parts of the organization to:
- Take advantage of our system
- Provide bigger value for the company
- Not expose the system directly outside the organization
Partner APIs:
- Similar to Public APIs
- Exposed only to companies/users having a business relationship with us.
- The business relationship is in the form of:
- Customer Agreement after buying our product
- Subscribing to our service
Benefits of API:
- Clients who use it can immediately and easily enhance their business by using our system.
- They don't need to know anything about our system's internal design or implementation.
- Once we define and expose our API, clients can integrate with us without waiting for full implementation of our system.
- API makes it easier to design and architect the internal structure of our system, since:
- It defines the endpoints to the different routes that the user can take to use our system.
Best Practices for making good API:
- Complete Encapsulation of the internal design and implementation, abstracting it away from the developers wanting to use our system. If a client wanting to use our API finds themselves requiring a lot of information about how it is implemented internally, or needing to know our business logic in order to use it, then the whole purpose of the API is defeated. The API should be completely decoupled from our internal design and implementation, so that we can change the design later without breaking the contract with our clients.
- Easy to use. Easy to understand. Impossible to misuse. The way to make an API simple can be:
- Only one way to get certain data/perform a task.
- Having descriptive names for actions and resources.
- Exposing only the information and actions that users need.
- Keeping things consistent all across our API.
- Keeping the operations Idempotent as much as possible. An idempotent operation is an operation that doesn't have any additional effect on the result if it's performed more than once. For example, updating the user's address to a new address is an idempotent operation, because the result is the same regardless of whether we perform this operation one time, five times or ten times. On the other hand, incrementing a user's balance by $100 is not an idempotent operation, because the result will be different depending on the number of times we perform it. The reason we prefer idempotent operations for our API is that our API is going to be used through the network. So if the client application sends us a message, that message can be lost, the response to that message may be lost, or a critical component inside our system may go down and never receive the message at all. Because of this network decoupling, the client application has no idea which one of those scenarios actually happened. So if the operation is idempotent, the client application can simply resend the same message again without risking adverse consequences.
- API Pagination. Pagination is an important API feature for scenarios when the response from our system to the client request contains a very large payload or dataset. Without pagination, most client applications would either not be able to handle such big responses or result in a poor user experience.
- Asynchronous Operations
- Versioning your API. Allow the clients to know which version they are currently using. A best-practice design allows us to make changes to the internal design and implementation without changing the API. In practice, however, we may need to make non-backward-compatible API changes. If we explicitly version the APIs, we can maintain two versions of the API at the same time, and deprecate the older one gradually, with proper communication with the clients who are still using it.
Generally, we can classify APIs into three groups: public APIs, private or internal APIs, and partner APIs. Public APIs are exposed to the general public, and any developer can use them and call them from their application. A good general practice for public APIs is requiring the users to register with us before we allow them to send us requests and use our system. This allows us to have better control over who uses our system externally and how they're using it, which in turn provides us with better security and also allows us to blacklist users that don't play by our rules. Private APIs are exposed only internally within our company. They allow other teams or parts of our organization to take advantage of our system and provide bigger value for our company, without exposing our system directly outside the organization. Partner APIs are very similar to public APIs, but instead of being exposed and available to anyone who wants to use them, they're exposed only to companies or users that have a business relationship with us, in the form of a customer agreement after buying our product or subscribing to our service.
Having an explicit contract in the form of an API with our clients has a few benefits. The main benefit of a well-defined API is that the client who uses it can immediately and easily enhance their business by using our system, without knowing anything about our system's internal design or implementation. However, there are a few other benefits to consider. As we're going through the design process of our system, once we define our API and expose it to anyone who wants to use it in the future, those clients don't have to wait until we finish implementing our system; once they know our API, they can already start making progress towards their integration with us. Another benefit is that once we define our API, it's a lot easier for us to design and architect the internal structure of our system, because this interface essentially defines the endpoints to the different routes that the user can take to use our system.
Now, let's talk about the important considerations, patterns and best practices for designing a good and easy-to-use API. The first rule that makes a good API for any system, not only a large-scale system, is complete encapsulation of the internal design and implementation, abstracting it away from the developer that wants to use our system. In other words, if a client who wants to use our API finds themselves requiring information about how this API is implemented internally, or needing to know too much about our business logic in order to use it, the whole purpose of the API is defeated. In addition, we want our API to be completely decoupled from our internal design and implementation, so that we can change that design anytime in the future without breaking our contract with existing clients. Another thing to consider when designing an API is that it needs to be easy to use, easy to understand, and impossible to misuse, accidentally or on purpose. A few things that can help make an API simple are: having only one way to get certain data or perform a task, and not many; having descriptive names for our actions and resources; exposing only the information and the actions that the users need, and not more than that; and keeping things consistent all across our API. The next best practice for a good API is keeping the operations idempotent as much as possible.
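To make the idempotency best practice concrete, here is a minimal sketch (the account structure and function names are illustrative assumptions) contrasting an idempotent update with a non-idempotent increment, and showing why blind retries are only safe for the former:

```python
# A minimal sketch contrasting idempotent and non-idempotent operations.
# The account structure and function names are illustrative assumptions.

account = {"address": "1 Old Street", "balance_cents": 10_000}

def update_address(acct: dict, new_address: str) -> None:
    """Idempotent: calling it once or five times leaves the same result."""
    acct["address"] = new_address

def increment_balance(acct: dict, amount_cents: int) -> None:
    """Not idempotent: every retry changes the result again."""
    acct["balance_cents"] += amount_cents

# A client that never got a response can safely retry the idempotent call...
for _ in range(3):                         # simulate retries after timeouts
    update_address(account, "2 New Avenue")
print(account["address"])                  # 2 New Avenue, same as one call

# ...but retrying the non-idempotent call triples the charge.
for _ in range(3):
    increment_balance(account, 100 * 100)  # $100 in cents
print(account["balance_cents"])            # 40_000 instead of 20_000
```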
An idempotent operation is an operation that doesn't have any additional effect on the result if it's performed more than once. For example, updating the user's address to a new address is an idempotent operation, because the result is the same regardless of whether we perform this operation one time, five times or ten times. On the other hand, incrementing a user's balance by $100 is not an idempotent operation, because the result will be different depending on the number of times we perform this operation. The reason we prefer idempotent operations for our API is that our API is going to be used through the network. So if the client application sends us a message, that message can be lost, the response to that message may be lost, or a critical component inside our system may go down and never receive the message at all. Because of this network decoupling, the client application has no idea which one of those scenarios actually happened. So if the operation is idempotent, the client application can simply resend the same message again without risking adverse consequences.
The next best practice is called API pagination. Pagination is an important API feature for scenarios where the response from our system to the client's request contains a very large payload or dataset. Without pagination, most client applications would either not be able to handle such big responses, or would deliver a poor user experience. Imagine what would happen if you opened your email account in your web browser and, instead of seeing only the last 10 or 20 emails, you were presented with all the thousands of emails that you ever received since you opened your email account, all on one page. Similarly, imagine what would happen if you opened your favorite online store or search engine on your mobile phone or web browser, and anytime you searched for a particular item, you got all the thousands or maybe millions of items that somehow matched your search query. Obviously, even if your application or web browser was able to handle so much data without crashing, which in most cases is highly unlikely, it would take an unreasonable amount of time to show all those results. So pagination allows the client to request only a small segment of the response, by specifying the maximum size of each response from our system and an offset within the overall dataset. If they want to receive the next segment, they simply increment the offset.
The next best practice and pattern refers to operations that take a long time for our system to complete. Unlike the previous examples, where partial results were meaningful so we could serve them using pagination, some operations just need one big result at the end, and there is nothing meaningful that we can provide before the entire operation finishes. A few examples of such operations are running a big report that requires our system to talk to a lot of databases, big data analysis that scans a lot of records or log files for aggregation purposes, or compression of large video files. In all those use cases, if the operation takes a long time, the client application is forced to wait for the result without the ability to make any progress. The pattern that is used for this type of situation is called an asynchronous API. With an asynchronous API, a client application receives a response immediately, without having to wait for the final result.
That response usually includes some kind of identifier that allows the client application to track the progress and status of the operation, and eventually receive the final result. The final important best practice for designing a good API is explicitly versioning it and allowing the client to know which version of the API they're currently using. The motivation behind versioning our API is as follows: as we mentioned earlier, the best API we can design is one that allows us to make changes to the internal design and implementation without needing to make any changes to the API itself. However, in practice, we can't always predict the future, and we may be forced to make non-backward-compatible API changes at one point or another. So if we explicitly version the APIs, we can essentially maintain two versions of the API at the same time and deprecate the older one gradually, with proper communication with the clients who are still using it.
Now, theoretically we're free to define our API in any way we want, as long as we adhere to the best practices we just mentioned. But over time, a few API styles became more standard in the industry, so we're going to cover those types of APIs in the next lectures. But before we go there, let's quickly summarize what we learned in this lecture. We defined an API as a contract that we define for other applications so they can use our system without knowing anything about its internal design and implementation. Later, we talked about the three categories of APIs, which are public APIs, private APIs and partner APIs. And finally, we talked about a few best practices and patterns for designing a good API, which are encapsulation, ease of use, making our operations idempotent, pagination, asynchronous operations and API versioning. I will see you soon in the next lecture.
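As an illustration of the pagination best practice described above, here is a minimal sketch of a client paging through a large result set; the fetch_page helper, the limit/offset parameter names, and the fake in-memory dataset are assumptions standing in for a real paginated endpoint:

```python
# A minimal sketch of API pagination from the client's point of view.
# The "server" is faked with an in-memory list; in reality fetch_page would
# issue a network request carrying limit/offset (or page token) parameters.

ALL_ITEMS = [f"email-{i}" for i in range(1, 96)]  # pretend server-side dataset

def fetch_page(limit: int, offset: int) -> list:
    """Stand-in for a hypothetical GET /emails?limit={limit}&offset={offset}."""
    return ALL_ITEMS[offset:offset + limit]

limit, offset = 20, 0
while True:
    page = fetch_page(limit=limit, offset=offset)
    if not page:
        break                      # no more results
    print(f"offset={offset}: {len(page)} items, first={page[0]}")
    offset += limit                # ask for the next segment

# Output: pages of 20, 20, 20, 20 and a final page of 15 items.
```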
Section 4: RPC
What is an RPC?
A Remote Procedure Call (RPC) is the ability of a client application to execute a subroutine on a remote server. What's unique about this type of API is that the remote method invocation looks and feels like calling a normal local method, in terms of the code that the developer needs to write. This unique feature of an RPC is also commonly referred to as location transparency: in the eyes of the developer of the client application, a method executed locally and a method executed remotely look almost exactly the same.
In this lecture we're going to learn about the first type of API, commonly referred to as Remote Procedure Calls, or RPC in short. We're first going to learn what it is and how it works, and later we'll talk about its benefits, its drawbacks and when to prefer this style of API over the others. So what is an RPC? A Remote Procedure Call is the ability of a client application to execute a subroutine on a remote server. However, what's unique about this type of API is that this remote method invocation looks and feels like calling a normal local method, in terms of the code that the developer needs to write. This unique feature of an RPC is also commonly referred to as location transparency: in the eyes of the developer of the client application, a method executed locally or remotely looks almost exactly the same. Another feature that usually comes with most, but not all, RPC framework implementations is support for multiple programming languages, so that applications written in different programming languages can seamlessly talk to each other using RPC.
How does an RPC work?
The way an RPC works is as follows: the API, as well as the data types used in the API methods, are declared using a special interface description language, which varies between different RPC framework implementations. This is essentially a schema definition of the communication between a remote client and the server, which is part of our system. The value of defining the interface and the data types using this specialized language is that, once we have the interface definition, we can use a special compiler or code-generation tool, which is part of the RPC framework, to generate two separate implementations of the API methods: one for the server application, and one for the client application. The server-side implementation of an RPC method is called the server stub, and the auto-generated implementation on the client application side is called the client stub. Those stubs take care of all the implementation details of the remote procedure invocation. Additionally, all the custom object types that we declare using the interface description language are compiled into classes or structs, depending on the programming language. Those auto-generated object types are also commonly referred to as Data Transfer Objects, or DTOs in short.
At run time, whenever the client application calls a particular RPC method with some set of parameters, the client stub takes care of encoding the data, which is also commonly referred to as serialization or marshalling. After the encoding, it initiates the connection to the remote server application and sends the data over to the remote server stub. On the other end, the server stub for that particular RPC method is listening for the client application's messages, and when a message is received, the data is deserialized and the real implementation of the method is invoked on the server application. Once the server completes the operation, the result is passed back to the server stub, which in turn serializes the response and sends it back over to the client application. Finally, the client stub for that particular method receives the encoded response, deserializes it, and gives it back to the caller as the return value of what looks like a local method invocation.
This concept of implementing an API using an RPC has been around for decades, and the only things that change over time are the frameworks, the details of their implementation, and their efficiency. So our job as API developers is to pick an appropriate framework, define the API as well as the relevant data types using the framework's interface description language, and publish that description. At this point, the client which uses our system and the server which is part of our system are completely decoupled. When we finish designing and implementing our system, we can generate the stub for the server, and when a new client wants to integrate with us and make API calls to us, all they have to do is use the framework's publicly available tools to generate their client stub based on the API definition that we published earlier. Furthermore, if we pick an RPC framework that supports multiple programming languages, then we don't limit ourselves to a choice of programming language on the server side, and we allow the client to pick their favorite programming language to communicate with our system.
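As a small illustration of the stub idea, here is a minimal sketch using Python's built-in xmlrpc module; the method name and values are made up, and unlike the IDL-based frameworks described above, xmlrpc doesn't generate stubs from a schema — the ServerProxy object simply plays the role of the client stub:

```python
# A minimal sketch of the RPC idea using Python's built-in xmlrpc module.
# Method name and values are illustrative; real RPC frameworks would
# generate the client/server stubs from a published interface definition.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# --- Server side: the real implementation behind the server stub ---
def debit_account(account_id: str, amount_cents: int) -> dict:
    # In a real system this would update a database; here we just echo back.
    return {"account_id": account_id, "debited_cents": amount_cents, "status": "OK"}

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True, logRequests=False)
server.register_function(debit_account, "debit_account")
threading.Thread(target=server.serve_forever, daemon=True).start()

# --- Client side: the "client stub" ---
# The proxy serializes the arguments, sends them over the network, and
# deserializes the response -- but the call looks like a local method.
proxy = ServerProxy("http://localhost:8000", allow_none=True)
result = proxy.debit_account("acct-42", 1500)
print(result)  # a plain dict, as if the function had run locally
```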
Once they generate their stub, they can communicate with our system easily by simply calling methods on objects, which looks and feels exactly like calling normal local methods. All the details of how the communication is established or how the data is passed between the client and the server are entirely abstracted away from the developers. Also, any failure in the communication with the server simply results in an error or an exception, depending on the programming language, just like with any normal method.

Now let's talk about the drawbacks of RPC. The main drawback of the RPC style is that, unlike local methods executed on the client side, those remote methods are a lot slower and a lot less reliable. The slowness may introduce surprising performance bottlenecks, because the client never knows how long those remote method invocations can actually take. However, on the surface, code-wise, they look very much like local methods, which are generally fast. So as API designers, we need to help the client application developers avoid blocking their code execution when we introduce methods that we know are going to be slow on our end. The way to do it is by introducing asynchronous versions of those slow methods, which, not surprisingly, is one of the best practices for these situations that we learned in the previous lecture.

The unreliability stems from the fact that the client is located remotely, running on a computer that potentially belongs to a different company, and is using the network to communicate with a server in our system. So if we're not careful in designing the API, we can introduce some very confusing situations for the client application developers. For example, if we're a credit card company and the client is running an online store and is trying to call the debit account method which is part of our API, a failure or an exception in calling such a method can leave them with a dilemma: should they retry calling the method and run the risk of charging the user twice, or not retry it and run the risk of not charging the user at all? That, of course, comes from the fact that there is no real way for the client to know whether the server in our system received the message and the acknowledgement simply got lost in the network, or the server in our system crashed and never received the message. Now, although we can't really solve the unreliability problem, we can mitigate the issue by sticking to one of the other best practices we learned in the previous lecture, which is making our operations idempotent when possible.

Now let's talk about when we should prefer the RPC style for defining our API and when we should take a different approach. Remote Procedure Calls are very commonly used in communication between two backend systems. And although there are frameworks that support RPC calls from frontend clients like the web browser, they are usually less common. So in summary, the RPC style is a perfect choice for an API we provide to a different company, rather than to an end-user app or webpage. It's also a great choice for communication between different components internally within a large-scale system. The RPC API style is also a good choice for situations where we want to completely abstract away the network communication and focus only on the actions the client wants to perform on the server system. This isn't always the case for all types of APIs.
So in cases where we don't want to abstract the network communication away, and we do want to take direct advantage of things like HTTP cookies or headers, the RPC approach would not be a good fit, and there are other styles that would be a better match for this use case. Finally, the RPC style revolves more around actions and less around data or resources. The way this manifests itself in the style is that every action is simply a new method with a different name and a different signature, and we can easily define as many methods or actions as we want, almost without limitation. However, if we are designing an API that is more data-centric and all the operations we need are simple CRUD operations, then there are other styles of APIs that can be a much better fit for this type of scenario. We're going to talk about one of those styles in the following lecture.

In this lecture, we talked about the first type of API, the Remote Procedure Call. We mentioned the three components of the RPC style, which are the API definition, typically done using an interface description language, the client stub, and the server stub. We also talked about some of the benefits of RPCs, which include location transparency and full abstraction of network communication. And finally, we mentioned some of the drawbacks of RPCs, which include slowness and unreliability. Those drawbacks can be mitigated by applying the best practices we learned in a previous lecture. I will see you guys in the next lecture.
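To make the stub idea more concrete, here is a minimal sketch using Python's standard-library `xmlrpc` module as a simple stand-in for a full RPC framework (real frameworks such as gRPC add an interface description language and code generation on top of the same flow). The `debit_account` method, port, and payload are made up for illustration.

```python
# A minimal RPC sketch using Python's standard-library xmlrpc as a stand-in
# for a full RPC framework. The method name, port, and payload are illustrative.
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def debit_account(account_id: str, amount_cents: int) -> dict:
    # The real implementation would talk to the billing system; here we just echo.
    return {"account_id": account_id, "charged_cents": amount_cents, "status": "OK"}

# Server side: register_function plays the role of the generated server stub,
# dispatching incoming calls to the real implementation.
server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True, logRequests=False)
server.register_function(debit_account)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy plays the role of the generated client stub. The call
# looks like a local method, but the arguments are marshalled, sent over the
# network, and the result is unmarshalled behind the scenes.
proxy = xmlrpc.client.ServerProxy("http://localhost:8000", allow_none=True)
print(proxy.debit_account("acct-123", 1999))
```

Note how the client-side call gives no visual hint that a network hop is involved, which is exactly why slow or failed remote invocations can surprise client developers.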
Section 4: REST API
REST is short for: Representational State Transfer.
Let's talk about how RESTful APIs help us achieve the high performance, scalability & high availability quality attributes:
- Stateless
- Cacheability
URI = Uniform Resource Identifier
Now let's talk about another style of APIs, which is called REST API. We're first going to get familiar with the general concepts and benefits of a REST API, and later we'll learn how to define a REST API step by step. So what is a REST API? REST is a relatively new style that originated from a dissertation published by Roy Fielding in the year 2000. REST, which stands for Representational State Transfer, is a set of architectural constraints and best practices for defining APIs for the web. It's important to note that it is not a standard or a protocol, but just an architectural style for designing APIs that are easy for our clients to use and understand, and that makes it easy for us to build a system with quality attributes such as scalability, high availability, and performance. An API that obeys the REST architectural constraints is commonly referred to as a RESTful API.

It's easier to understand the REST API style when we compare it to the RPC API style that we learned in the previous lecture. The RPC API style revolves around methods that are exposed to the client and are organized in an interface or a set of interfaces. In other words, our system is abstracted away from the client through a set of methods that the client can call. And if we want to expand our API and allow the user to perform more operations on our system, we would do that by adding more methods to the API. In contrast, the REST API style takes a more resource-oriented approach, so the main abstraction to the user is a named resource and not a method. Those resources encapsulate different entities in our system. Also in contrast to the RPC API style, a REST API allows the user to manipulate those resources only through a small number of methods. In a REST API, a client requests one of those named resources and our system responds with a representation of the current state of that resource, which the client can then use. The protocol that is commonly used to request those resources is HTTP.

It's important to note that our system sends back only the representation of the resource state; the resource itself can be implemented in a completely different way that is transparent to the client. Let's clarify this statement with an example. Let's say we're running an online news magazine and the resource that we provide to the clients is the homepage. So when a client requests the representation of that resource's state, they will get a webpage with a title and a bunch of articles and pictures. However, this is just the representation of the current state of the homepage resource. In reality, our system can implement that resource using many entities spread out across multiple database tables, files, or even external services. This is what we mean by the resource being an abstraction whose representation can be requested on demand by calling our REST API.

Another big difference from the general RPC world is the dynamic nature of the REST API. In the RPC world, the only actions that the client can take, regardless of its state, are statically defined ahead of time in the interface description language. However, in a RESTful API, this interface is a lot more dynamic through a concept called Hypermedia as the Engine of Application State. The way this is achieved is by accompanying a state representation response to the client with hypermedia links, so the client can follow those links and progress its internal state.
For example, if we have a chat or email system, a client may send us a request for their current messages. In response, we can send the client application not only the object that represents all the information about their messages, but also an additional object that describes the resources that the client application can use to either get additional information or take specific actions.

Now let's talk about how RESTful APIs help us achieve the high performance, scalability, and high availability quality attributes. One important requirement of a system that provides a RESTful API is that the server is stateless and does not maintain any session information about a client. Instead, each request should be served by the server in isolation, without any information about previous requests. This requirement helps us achieve high scalability and availability, and let's see why. If the server does not maintain any session information, we can easily run a group of servers and spread the load of the requests from the clients among a large number of machines, completely transparently to the client. So the client may send one request to one machine and another request to a different machine, and they won't even notice. The second important requirement is cacheability. That means the server has to either explicitly or implicitly define each response as either cacheable or non-cacheable. This allows the client to eliminate a potential roundtrip to the server and back if the response to a particular request is already cached somewhere closer to the client. As a side benefit to us, it also reduces the load on our system.

Now let's talk about the resources and discuss what those resources can be and how they should be named and organized. In a RESTful API, each resource is named and addressed using a URI. The resources are organized in a hierarchy, where each resource is either a simple resource or a collection resource. The hierarchy is represented using forward slashes. A simple resource has a state and, optionally, can contain one or more sub-resources, while a collection resource is a special resource that contains a list of resources of the same type. For example, if we have a movie streaming service, we can have a collection resource called movies, and each sub-resource in that collection is of type movie. Additionally, each movie, which is a simple resource, can have a collection sub-resource of its directors and another collection sub-resource that represents its actors. Each actor, which is also a simple resource, can in turn have a profile picture sub-resource and a contact information sub-resource. The representation of each resource's state can be expressed in a variety of ways: an image, a link to a movie stream, an object, an HTML page, some binary blob, or even executable code like JavaScript that can be executed in the browser.

Now let's talk about some of the best practices for naming our resources. The first best practice is naming our resources using nouns only. Naming our resources using nouns makes a clear distinction between the resources themselves and the actions that we're going to take on them, for which we use verbs. The second best practice is making a distinction between collection resources and simple resources. The way we make that distinction is by using plural names for collections and singular names for simple resources.
If we look at the previous example, we can see that the movies resource, the directors resource, and the actors resource are all collections, while the other resources are simple resources. The next best practice is giving our resources clear and meaningful names. If we give our resources meaningful names, our users will find it very easy to use our API, which will help prevent incorrect usage and mistakes. Overly generic collection names like elements, entities, items, instances, values, or objects should be avoided because they can mean anything and depend heavily on the context. The final best practice is that resource identifiers should be unique and URL-friendly, so they can be easily and safely used on the web.

Now let's switch gears and talk about the methods and operations that we can perform on REST API resources. As we mentioned earlier, unlike RPC, the REST API style limits the number of methods we can perform on each resource to just a few predefined operations. Those operations are creating a new resource, updating an existing resource, deleting an existing resource, and getting the current state of a resource. As a side note, when the resource is a collection resource, getting its state usually means getting the list of its sub-resources. Because RESTful APIs are commonly implemented using HTTP, which already has several standard methods, the REST operations are mapped to HTTP methods as follows. When we want to create a new resource, we use the POST HTTP method. When we want to update an existing resource, we use the PUT method. When we want to delete an existing resource, we use the DELETE method. And finally, when we want to get the state of a resource, or the list of sub-resources of a collection, we use the GET method. In some situations, we may want to define additional custom methods, but those situations are uncommon.

The HTTP semantics give us a few guarantees about those methods. For example, the GET method is considered to be safe, meaning that applying it to a resource does not change its state in any way. Additionally, the GET, PUT, and DELETE methods are idempotent, which, as we remember from the previous lecture, means that applying those operations multiple times results in the same state change as applying them once. Also, GET requests are usually considered cacheable by default, while responses to POST requests can be made cacheable by setting the appropriate HTTP headers as part of the response to the client. This feature of the HTTP protocol allows us to conform to the cacheability requirement of a REST API. Finally, when the client needs to send additional information to our system, as part of a POST or PUT command for example, we would typically use the JSON format, although other formats like XML are also acceptable.

Now let's learn how to create a REST API step by step by going through a real-life example. The system we're going to design is a movie streaming service, which we already saw a few examples of. The first step in creating a REST API is identifying the different entities in our system, which will serve as the resources of our API. To keep things simple, let's assume that the only entities we have in our movie streaming service are users, movies, reviews, and actors. The second step is mapping those entities to URIs. In this step, we define the resources based on the entities we identified in the previous step, and also organize the resources in a hierarchy based on their relationships.
For example, users and movies are two collection resources that are entirely independent of each other. The actors collection is also independent, because the same actors can appear in different movies. However, the reviews are going to be a sub-resource of the movies collection, because each review is associated with only a single movie. At this point, we have all the resources defined, so the next step is choosing the representation of each type of resource. Theoretically, we can represent each resource in any way we like, but the most common way to represent a resource is using the JSON format. For example, the movies collection resource would be represented as an object containing an array of movie names mapped to movie IDs. This representation makes it easy to search for a particular movie by name and also get its identifier. Using this identifier, a client application can append it to the movies URI and get a particular movie resource. A single movie resource object would contain all the information about that particular movie, including links that can help us play the movie on the user's device, as well as perform additional operations such as getting the movie's reviews and the movie's actors. Those links allow our API to be driven by hypermedia as the engine of application state.

The last step in creating a REST API is assigning HTTP methods to the actions that we can perform on our resources. For example, we can define the POST operation on the users collection resource to register a new user in our movie streaming service. The response to that request would contain the newly created user ID. Then we can define a GET method on a particular user to get all the information stored for that user, including the list of favorite movies or the ones they want to watch in the future. Similarly, we can define a PUT operation on a particular user resource to update the user's profile, or to make any other changes to the user's information or preferences. And finally, we can define a DELETE operation on a particular user to remove that user entirely from our system. To finish designing our API, we need to repeat this process for all the resources we have in our system.

In this lecture, we learned about a new style of API, which is called Representational State Transfer, or REST in short. We compared the REST API to the general RPC approach by emphasizing that the REST API is more resource-oriented and limits the number of operations we can perform on those named resources to just a few methods. We later talked about how the REST API requirements allow us to provide high performance, high availability, and scalability. After that, we talked in detail about what those resources are, how they're organized, and what operations we can perform on them. And finally, we concluded by providing a step-by-step process for defining a REST API, following a practical real-life example. I hope you're having fun and I will see you soon in the next lecture.
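As a rough sketch of the step-by-step example above, here is what a small slice of the movies API could look like using Python and Flask. Flask, the in-memory data, and the exact URIs are assumptions for illustration only; a real service would add storage, validation, and authentication.

```python
# An illustrative REST sketch for the movie streaming example using Flask.
# Data lives in memory; a real system would use a database and authentication.
from flask import Flask, jsonify, request

app = Flask(__name__)
movies = {"1": {"name": "Some Movie", "reviews": []}}  # movie id -> resource state

@app.route("/movies", methods=["GET"])             # GET on a collection lists its sub-resources
def list_movies():
    return jsonify({movie_id: movie["name"] for movie_id, movie in movies.items()})

@app.route("/movies/<movie_id>", methods=["GET"])  # GET returns one movie's representation,
def get_movie(movie_id):                           # including hypermedia links to related resources
    movie = movies[movie_id]
    return jsonify({"name": movie["name"],
                    "links": {"reviews": f"/movies/{movie_id}/reviews"}})

@app.route("/movies/<movie_id>/reviews", methods=["POST"])  # POST creates a new sub-resource
def add_review(movie_id):
    movies[movie_id]["reviews"].append(request.get_json())
    return jsonify({"status": "created"}), 201

if __name__ == "__main__":
    app.run(port=5000)
```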
Section 5: Large Scale Systems Architectural Building Blocks
Building Block 1: Load Balancer
A load balancer distributes incoming traffic across a group of servers and hides that group behind a single entry point, so that to the client our system looks like a single, very powerful server.
Quality Attributes:
- Scalability
- High Availability
- Performance
- Maintainability
Types of Load Balancers:
- DNS load balancing
- Hardware load balancing
- Software load balancing
- Global Server load balancing
In this lecture, we're going to learn about the first and one of the most important software architecture building blocks, which is used in pretty much 100% of all real-life, large-scale systems. This building block is called a load balancer. After getting some motivation for using a load balancer, we will learn what quality attributes this building block can provide to our system. And finally, we will learn about the different types of load balancing solutions and how to use those solutions in architecting a real-life, large-scale system.

So let's start with an introduction to load balancers. As the name suggests, the basic role of a load balancer is to balance the traffic load among a group of servers in our system. If we remember from the previous lectures, the best way to achieve high availability and horizontal scalability is running multiple identical instances of our application on multiple computers. However, without a load balancer, the client applications that run on our customers' computers would have to know the addresses of those computers, as well as the number of application instances. This tightly couples the client application to our system's internal implementation and makes it very hard for us to make any changes. So while the main purpose of a load balancer is to balance the load among a group of servers and make sure that no individual server is overloaded, as an added feature most load balancing solutions also provide an abstraction between the client application and our group of servers. This abstraction makes our entire system look like a single server, capable of immense computing power and a lot of memory. Different load balancing solutions offer different levels of abstraction, which we will see very soon when we talk about the different types of load balancers.

Now, let's talk about what specific quality attributes a load balancer can provide to our system. The first quality attribute we get from a load balancer is high scalability. As we already mentioned, by hiding a group of servers behind a load balancer, we can scale our system horizontally, both up and down, by adding additional servers when the load on our system increases and removing unnecessary servers when the load on our system decreases, to save money. In a cloud environment, where we can easily rent more hardware on demand, we can use auto-scaling policies to intelligently add or remove servers based on different criteria, like the number of requests per second, network bandwidth, and so on. The next quality attribute the load balancer provides us with is high availability. Most load balancers can easily be configured to stop sending traffic to servers that cannot be reached. With this monitoring feature, load balancers can intelligently balance the load only among healthy servers, while ignoring the ones that are considered to be dead or excessively slow.

Now, let's see how load balancers affect the system's performance. When it comes to performance, load balancers may add a little bit of latency and increase the response time to the user, but that's generally an acceptable price to pay for increased performance in terms of throughput. Since the load balancer allows us to theoretically have as many backend servers as we like, of course with some reasonable limitations, the number of requests or tasks that we can perform per unit of time is much larger than the throughput we could get from a single server.
Another important quality attribute that the load balancer helps us achieve is maintainability. Since we can easily add or remove servers from the rotation, we can take down individual servers one by one to perform maintenance or upgrade the application version without any disruption to the client. When the maintenance on a server is complete, we can add it back to the load balancer and take down the next server. This way we can have a rolling release, while still keeping our SLA in terms of availability.

Finally, let's talk about a few different types of load balancers that we can choose from. One of the most basic forms of load balancing can be achieved through DNS. The Domain Name System is part of the internet infrastructure that maps human-friendly URLs, like amazon.com, netflix.com, or apple.com, to IP addresses that can be used by network routers to route requests to individual computers on the web. It's essentially the phone book of the internet. So when a user or a client application wants to communicate with our system, it sends a DNS query to the DNS server, and the DNS server responds with an IP address that corresponds to our domain name. Then, the client application can use that IP address to send a request directly to the server. However, a single DNS record doesn't have to be mapped to a single IP address and can easily be configured to return a list of IP addresses corresponding to different servers. Most DNS servers are implemented in such a way that they return the list of addresses for each domain in a different order on each client request, and by convention, most client applications simply pick the first address in the list and use it as the resolved IP address for that domain. This way, the Domain Name System essentially balances the load on our servers by rotating this list in a round-robin fashion.

Although this way of providing load balancing capability is super simple and cheap, as it essentially comes for free with purchasing a domain name, it has a few drawbacks. The main drawback is that DNS doesn't monitor the health of our servers. In other words, if one of our servers stops responding, the Domain Name System will not know about it and will continue referring clients to that particular server. The list of IP addresses changes only every so often, based on the time to live (TTL) configured for that particular DNS record. Additionally, the list of addresses that a particular domain name maps to can be cached in different locations, such as the client's computer, which makes the time between a particular server going down and the point when requests are no longer sent to that server even longer. Another drawback of DNS-based load balancing is that the load balancing strategy is always just simple round-robin, which doesn't take into account the fact that some of our application instances may be running on more powerful servers than others, nor can it detect that one of our servers may be more overloaded than the others. The third drawback of DNS-based load balancing is that the client application gets the direct IP addresses of all our servers. This exposes some implementation details of our system and, more importantly, makes our system less secure. The reason is that there is nothing preventing a malicious client application from just picking one IP address and sending requests only to that particular server, which, of course, would overload it more than the others.
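To see the DNS side of this in action, the short snippet below resolves a domain name and prints every address returned; simple clients typically just take the first one, which is exactly what makes round-robin rotation work. The domain is a placeholder, and the number and order of addresses depend entirely on how that domain's DNS records are configured.

```python
# Resolve a domain and print every A record returned by DNS. Many domains map
# to several addresses; naive clients pick the first one, so rotating the
# order (round-robin) spreads clients across servers.
import socket

hostname = "example.com"  # placeholder domain; substitute your own
_, _, ip_addresses = socket.gethostbyname_ex(hostname)

print(f"{hostname} resolves to {len(ip_addresses)} address(es):")
for ip in ip_addresses:
    print(" -", ip)

print("A simple client would connect to:", ip_addresses[0])
```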
To address all those drawbacks, there are two load balancing solutions that are a lot more powerful and intelligent. Those two types of solutions are hardware load balancers and software load balancers. The only difference between these two types is that hardware load balancers run on dedicated devices designed and optimized specifically for load balancing, while software load balancers are just programs that can run on any general-purpose computer and perform the load balancing function. With software and hardware load balancers, in contrast to DNS load balancing, all the communication between the client and our group of servers is done through the load balancer. In other words, the individual IP addresses, as well as the number of servers we have behind the load balancer, are not exposed to the users, which makes our system a lot more secure. Another feature of hardware and software load balancers is that they can actively monitor the health of our servers by sending them periodic health checks, to actively detect if one of our servers has become unresponsive. Finally, both hardware and software load balancers can balance the load among our servers a lot more intelligently, taking into account the different types of hardware our application instances are running on, the current load on each server, the number of open connections, and so on.

The nice thing about software and hardware load balancers is that, in addition to balancing requests from external users, they can also be used inside our system to create an abstraction between different services. For example, if we have an online store system, we can separate the service that responds directly to client requests and serves the frontend to the client browsers from the fulfillment service and the billing service. Each such service can be deployed independently as multiple application instances running on a group of servers, and those services communicate with each other through a load balancer. This way, we can scale each service completely independently and transparently to the other services.

While software and hardware load balancers are superior to DNS in terms of load balancing, monitoring, failure recovery, and security, they are usually collocated with the group of servers they balance the load on. The reason is that if we put the load balancer too far from the actual servers, we add a lot of extra latency, since all the communication, both to the servers and back to the client, has to go through the load balancer. So if we run our system in multiple geographical locations, which are commonly referred to as data centers, then having only one load balancer for both groups of servers would sacrifice the performance of at least one of those locations. Additionally, load balancers on their own do not solve the DNS resolution problem, so we would still need some kind of DNS solution to map human-readable domain names to an IP address. For that, there is a fourth load balancing solution, which is called a Global Server Load Balancer, or GSLB in short. A GSLB is somewhat of a hybrid between a DNS service and a hardware or software load balancer. A GSLB solution typically provides a DNS service just like any other DNS server that we're familiar with; however, in addition, it can also make more intelligent routing decisions. On one hand, the GSLB can figure out the user's location based on the origin IP of the incoming request.
On the other hand, a GSLB service has similar monitoring capabilities to a typical software or hardware load balancer, so at any given moment it knows the location and state of each server that we register with it. In a typical large-scale system deployment, those servers are load balancers located in different data centers in different geographical locations. So when a user sends a DNS query to the GSLB, the GSLB may return just the address of the nearest load balancer. From that point on, the user will use that IP address to communicate directly with our system in that data center, through a collocated software or hardware load balancer. The cool part about GSLBs is that most of them can be configured to route traffic based on a variety of strategies and not just by physical location. Since they're in constant communication with our data centers, they can be configured to route users based on the current traffic or CPU load in each data center, or based on the best estimated response time or bandwidth between the user and that particular data center. Thanks to this feature, we can provide the best possible performance for each user, regardless of their geographical location. Additionally, GSLBs play a very important role in disaster recovery situations. If there's a natural disaster or a power outage in one of our data centers, the users can easily be routed to different locations, which provides us with higher availability. Finally, to prevent a load balancer from being a single point of failure in each region, we can place multiple load balancers and register all their addresses with the GSLB's DNS service or any other DNS service, so the client applications can get a list of all our load balancers and either send a request to the first one in the list or pick one themselves at random.

We learned a lot in this lecture, so let's quickly summarize it. In this lecture, we learned about a very important software architecture building block, the load balancer. We learned about four load balancing solutions, which are DNS load balancing, hardware load balancing, software load balancing, and Global Server Load Balancing. We also talked about the different quality attributes that the load balancer provides to our system. And finally, we saw how we can combine all those different solutions to architect a large-scale system that can provide high performance and high availability, and scale to millions of users located in different geographical locations. I will see you soon in the next lecture.
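As a toy illustration of the core idea behind software load balancers, the sketch below combines round-robin selection with a health check so that traffic is only sent to responsive backends. The backend addresses and the `/health` endpoint are hypothetical; production solutions like HAProxy or NGINX implement this logic, and much more, for us.

```python
# A toy sketch of the core selection logic inside a software load balancer:
# round-robin rotation restricted to backends that pass a health check.
# Backend addresses and the /health endpoint are made up for illustration.
import itertools
import urllib.request

BACKENDS = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
_rotation = itertools.cycle(BACKENDS)

def is_healthy(base_url: str, timeout: float = 1.0) -> bool:
    """Consider a backend healthy if its /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False

def choose_backend() -> str:
    """Return the next healthy backend in round-robin order, skipping dead ones."""
    for _ in range(len(BACKENDS)):
        candidate = next(_rotation)
        if is_healthy(candidate):
            return candidate
    raise RuntimeError("no healthy backends available")

# The load balancer would forward each incoming client request to choose_backend().
```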
Open Source Software Load Balancing Solutions
HAProxy
HAProxy is a free and open-source, reliable, high performance TCP/HTTP load balancer. It is particularly suited for very high traffic web sites, and powers a significant portion of the world's most visited ones. It is considered the de-facto standard open-source load balancer, and is shipped with most mainstream Linux distributions. HAProxy supports most Unix style operating systems.
NGINX
NGINX is a free, open-source, high-performance HTTP server and reverse proxy (load balancer). It is known for its high performance, stability, rich feature set and simple configuration. For a full tutorial on how to install, configure and use NGINX, follow this [link](https://www.nginx.com/resources/wiki/start/).
Cloud Based Load Balancing Solutions
AWS - Elastic Load Balancing (ELB)
Amazon ELB is a highly scalable load balancing solution.
It is an ideal solution for running on AWS, and integrates seamlessly with all AWS services.
It can operate in 4 different modes:
- Application (Layer 7) Load Balancer - Ideal for advanced load balancing of HTTP and HTTPS traffic.
- Network (Layer 4) Load Balancer - Ideal for load balancing of both TCP and UDP traffic.
- Gateway Load Balancer - Ideal for deploying, scaling, and managing your third-party virtual appliances.
- Classic Load Balancer (Layer 4 and 7) - Ideal for routing traffic to EC2 instances.
For the full documentation on Amazon ELB and its auto-scaling policies, follow this [link](https://docs.aws.amazon.com/autoscaling/ec2/userguide/autoscaling-load-balancer.html).
GCP - Cloud Load Balancing
Google Cloud Platform Load Balancer is Google's highly scalable and robust load balancing solution.
"Cloud Load Balancing allows you to put your resources behind a single IP address that is externally accessible or internal to your Virtual Private Cloud (VPC) network".
Some of the load balancer types available as part of the GCP Cloud Load Balancing are:
- External HTTP(S) Load Balancer - Externally facing HTTP(S) (Layer 7) load balancer which enables you to run and scale your services behind a single external IP address.
- Internal HTTP(S) Load Balancer - Internal Layer 7 load balancer that enables you to run and scale your services behind an internal IP address.
- External TCP/UDP Network Load Balancer - Externally facing TCP/UDP (Layer 4) load balancer.
- Internal TCP/UDP Load Balancer - Internally facing TCP/UDP (Layer 4) load balancer.
Microsoft Azure Load Balancer
Microsoft Azure's load balancing solution provides 3 different types of load balancers:
- Standard Load Balancer - Public and internal Layer 4 load balancer.
- Gateway Load Balancer - High performance and high availability load balancer for third-party Network Virtual Appliances.
- Basic Load Balancer - Ideal for small-scale applications.
GSLB Solutions
• Amazon Route 53 - Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service.
• Google Cloud Platform Load Balancer & Cloud DNS - Reliable, resilient, low-latency DNS serving from Google's worldwide network with everything you need to register, manage, and serve your domains.
• Azure Traffic Manager - DNS-based load balancing
Building Block 2: Message Brokers
What is a message broker? Why do we need it?
Synchronous communication vs. asynchronous communication. A message broker entirely decouples senders from receivers: senders don't have to keep a live connection to the receivers, they communicate only with the message broker. True, now the message broker needs to be alive instead of the receiver, but storing messages is faster than processing them. The user also doesn't need to be left in a suspended state waiting for the receiver to acknowledge the message; the message broker can acknowledge that the message has been received and will be taken care of sometime in the future.
Publish/Subscribe Pattern.
Now it's time for us to learn about the most fundamental architectural building block for asynchronous architectures: the message broker. We will start with the motivation for message brokers by exploring some use cases where an asynchronous architecture can provide us with more benefits and better capabilities. And finally, we'll learn what kind of quality attributes message brokers can add to our system.

So what is a message broker and why do we need it? If we recall the previous lecture, where we learned about load balancers, you probably noticed that in each case where we mentioned two applications talking to each other, either directly or through a load balancer, there was always an implicit assumption that both the sender application and the receiver application maintained an active connection. This implies that they were both healthy and running at the same time. This type of communication is called synchronous communication. While synchronous communication is the most straightforward type of communication between services, it has a few drawbacks. The first drawback, which we already mentioned, is the fact that both application instances that establish the communication with each other have to remain healthy and maintain the connection to complete the transaction. While this is easy to achieve when we have two services that exchange very short messages that take very little time to process and respond to, things get a lot more complex when we have a service that takes a long time to complete its operation and provide a response.

As an example, let's consider a system that sells tickets to different shows or performances offered by different theaters. Let's assume that we have two services. The first service simply provides the frontend to the user and receives requests for purchasing tickets. The second service fulfills the order by reserving the ticket through an external API. Then it bills the user by communicating with a credit card company. And in the end, it may send a confirmation email or talk to an external service that would send a physical ticket to the user. Now, while the ticket reservation service is doing its job, the frontend service has to maintain an open connection and wait for a response. But while it's waiting for the response from the ticket reservation service, it's also holding the user in suspense, because until the ticket is reserved and the user is successfully billed, we won't know whether the operation succeeded. And besides the fact that the operation itself, even when successful, may take a long time, things can get a lot worse if the application server that performs all those operations suddenly crashes and we need to start over.

Another drawback of synchronous communication is that there is no buffer in the system to absorb a sudden increase in traffic or load. For example, let's say we have an online store that has two services, just like before: one for receiving direct traffic from users and another to fulfill the actual purchase. And let's also assume that we're running a limited-time promotion for one of our products. That promotion results in many requests from users to our frontend service, which we can easily handle. But it also results in a very large number of purchase fulfillment requests to our order fulfillment service, which we cannot easily handle, even if we scale that service to many server instances. The reason is simply that fulfilling each order involves many operations that take a very long time.
All those scenarios can be easily solved with an architectural building block called a message broker. A message broker is a software architectural building block that uses the queue data structure to store messages between senders and receivers. To be clear, unlike a load balancer, which can easily be used to take external traffic from a client application, a message broker is an architectural building block that is used inside our system and is generally not exposed externally. In addition to simply storing or temporarily buffering the messages, message brokers can provide additional functionality such as message routing, transformation, validation, and even load balancing. However, unlike load balancers, message brokers entirely decouple senders from receivers by providing their own communication protocols and APIs. Message brokers are the fundamental building block for any type of asynchronous software architecture. When we have two services communicating with each other through a message broker, the sender doesn't have to wait for any confirmation from the receiver after it sends the message to the message broker. In fact, the receiver may not even be available to take any messages while the sender sends that message.

So in the case of the ticket reservation system, the end user may get an acknowledgement immediately after placing the order, and later they will get an email asynchronously with an official confirmation of the ticket purchase, after the ticket reservation service actually completes the transaction. By using the message broker, we can break the ticket reservation service into multiple services, each responsible for one stage in the transaction, with each pair of services also decoupled from each other by a message broker. Another important benefit that a message broker provides us with is buffering of messages to absorb traffic spikes. In our online store scenario, when we have a lot of orders in a short period of time, the frontend service may simply decrement a counter in a database for the number of items left in stock, while the actual orders are stored inside the message broker's queue. Those orders can then be fulfilled one by one after the sale is already over and the traffic to our store goes down.

Most message broker implementations also offer the publish/subscribe pattern, where multiple services can publish messages to a particular channel and multiple services can subscribe to that channel and get notified when a new event is published. With this pattern, we can take the same online store and, without any modification to the system, easily add another service that subscribes to the orders channel and feeds that data to an analytics service. This service would analyze the purchasing patterns of different users and suggest certain products to users in the future. Similarly, we can add another service that, for every order from a user, sends a push notification to the user's mobile phone. This way a user can be alerted if a purchase order was placed from their account. And later, just as easily, we can add another service that, for every purchase from a user, schedules a survey or a review request a certain time after the purchase has been made. As you can see, with this pattern all those services were added without any modifications to the system. So now that we understand all the benefits and capabilities of a message broker, let's talk more concretely about what quality attributes we get from using a message broker in our system.
A message broker adds a lot of fault tolerance to our system, since it allows different services to communicate with each other while some of them may be temporarily unavailable. Message brokers also prevent messages from being lost, which is another characteristic of a fault-tolerant system. All this additional fault tolerance helps us provide higher availability to our users. Additionally, since a message broker can queue up messages when there is a traffic spike, it allows our system to scale to high traffic without needing any modifications. It's worth noting that while a message broker provides us with superior availability and scalability, we do pay a little bit in performance when it comes to latency. The reason for that additional latency is that a message broker adds a significant indirection between two services, in comparison to straightforward synchronous communication, even through a load balancer. But generally, this performance penalty is not too significant for most systems.

In this lecture, we learned about a very important architectural building block which is fundamental to any asynchronous software architecture: the message broker. We got the motivation for using a message broker by comparing it to the synchronous communication we typically use between services, either directly or through a load balancer. Later, we talked about the different benefits and capabilities that a message broker provides us with, such as asynchronous architecture, buffering, and others. And finally, we talked about the two main quality attributes that we get from using a message broker. Those quality attributes are high availability, which comes thanks to the superior fault tolerance to service outages and message loss, and high scalability, which comes mainly from the ability to buffer messages when there are sudden spikes in the load on our system. I will see you in the next lecture with another exciting architectural building block.
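As a rough sketch of this decoupling, here is what the order flow could look like with RabbitMQ and its Python client library `pika`, assuming a broker running on localhost; the queue name and message fields are made up for illustration. The producer returns as soon as the broker has stored the message, while the consumer processes orders at its own pace, possibly much later.

```python
# Sketch: the frontend service publishes an order to a queue and returns
# immediately; the fulfillment service consumes orders at its own pace.
# Assumes a RabbitMQ broker on localhost and the 'pika' client library.
import json
import pika

# --- Producer (frontend service) ---
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="orders", durable=True)           # broker keeps the queue
channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=json.dumps({"order_id": 42, "item": "concert-ticket"}).encode(),
    properties=pika.BasicProperties(delivery_mode=2),         # persist the message
)
connection.close()  # the user can be acknowledged now; fulfillment happens later

# --- Consumer (fulfillment service, typically a separate process) ---
def handle_order(ch, method, properties, body):
    order = json.loads(body)
    # ... reserve the ticket, bill the user, send the confirmation email ...
    ch.basic_ack(delivery_tag=method.delivery_tag)            # remove it from the queue

consumer_connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
consumer_channel = consumer_connection.channel()
consumer_channel.queue_declare(queue="orders", durable=True)
consumer_channel.basic_consume(queue="orders", on_message_callback=handle_order)
consumer_channel.start_consuming()
```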
Message Brokers Solutions & Cloud Technologies
Open Source Message Brokers
Apache Kafka - The most popular open-source message broker nowadays. Apache Kafka is a distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
RabbitMQ - A widely deployed open-source message broker. It is used worldwide at small startups and large enterprises.
Cloud Based Message Brokers
Amazon Simple Queue Service (SQS) - Fully managed message queuing service that enables you to decouple and scale micro-services, distributed systems, and serverless applications.
GCP Pub/Sub and Cloud Tasks - Publisher/Subscriber and message queue solutions offered by Google Cloud Platform. See this article for comparison between the two offerings.
Building Block 3: API Gateway
- Micro-services navigation
- Request aggregation
- Security, authorization & authentication
- Caching
- Monitoring & alerting
Some considerations:
- API Gateway shouldn't contain any business logic.
- API Gateway may become a single point of failure.
- Avoid bypassing the API Gateway from outside.
Point number 2 can actually be solved by deploying multiple instances of an API Gateway and placing them behind a Load Balancer.
In this lecture, we're going to learn about the API Gateway component, which is an architectural building block and design pattern used by pretty much all large-scale systems in the industry. But before we talk about what an API Gateway can do for us, let's first understand the problem we're trying to solve. Let's imagine we're building a video sharing and streaming system where users can upload their videos, and also watch and comment on other people's videos. In the beginning, all we have is one service in our system that serves the frontend to the user in the form of HTML and JavaScript, and also takes care of the user profile, channel subscriptions, notifications, video storing and streaming, and comments left on each video. Additionally, this service needs to implement security, because if a particular user wants to upload a new video, delete an existing video, or update their profile, we first need to make sure that the user is properly authenticated and authorized to make those changes.

Over time, we realize that this one service's code base has become too big and complex for us to develop and maintain, so we apply the organizational scalability principle and split this one service into multiple services, each with only one purpose. The consequence of this change is that the single API we originally exposed is now split into multiple APIs implemented by each service. So now we need to update the frontend code that runs in the client's web browser to be aware of the internal organization of our system, consisting of different services, and to make calls to different services depending on the task. For example, when a user wants to simply go to the homepage and see the latest activity on the website, the client code would have to call the Frontend Service and the Users Service. And when a user wants to watch a particular video, they would need to also make a call to the Frontend Service to load the different pages, then call the Video Service to load the actual video, and then call the Comments Service to get all the comments that other users left on that particular video. Now, because we split our system into multiple services and the client application code makes separate calls to each service, each service needs to reimplement its own security, authentication, and authorization, which adds a lot of performance overhead and duplication.

So to eliminate all those redundancies, decouple the client application from the internal organization of our system, and simplify our external API, we add an additional abstraction in the form of a service called an API Gateway. An API Gateway is an API management service that sits between the client and the collection of backend services. The API Gateway follows a software architectural pattern called API composition. In this pattern, we compose all the different APIs of the services that we want to expose externally into one single API, so client applications can call it by sending requests to only one single service. The API Gateway creates an abstraction between the client and the rest of our system that provides us with a lot of benefits. The first benefit of the API Gateway is that it allows us to make internal changes to our system completely seamlessly and transparently to our API consumers. For example, if we have both desktop and mobile users, at some point we may want to split our Frontend Service into two different services that serve different data depending on the device the request originated from.
Similarly, we can split our video streaming service into two separate services: one for high-resolution video, optimized for desktops, and another for lower resolutions, optimized for mobile devices. The second benefit of having an API Gateway as the front door to our system is that we can consolidate all the security, authorization, and authentication in one single place. So if we have a malicious user trying to impersonate another user, we can stop their request at the API Gateway, while a real user can be successfully authenticated at the API Gateway, where we can perform SSL termination and forward the decrypted request to the rest of the services. Additionally, we can allow a user to perform different operations depending on their permissions and role. A few examples of operations that we can allow to some users and disallow to others include viewing private videos, deleting or uploading new videos, and so on. As part of the security feature, we can also implement rate limiting at the API Gateway to block denial of service attacks from malicious users.

Another benefit of the API Gateway is that it can improve the performance of our system. Besides the fact that we save a lot of the overhead of authenticating every request from the user at each service by performing all the security and authentication in a single place, we can also save the user from making multiple requests to different services. This feature is called request aggregation. For example, when a user wants to watch a particular movie without the API Gateway, the user would have to make three calls to our system: one to load the frontend page from the Frontend Service, then load the video from the Video Service, and then make another call to the Comments Service to load all the comments that other users left for that particular video. With the API Gateway between the user and our services, the client code can make a single call to the API Gateway, which routes the request to all the appropriate services and aggregates all the responses into one single response. Another way the API Gateway can improve our performance is by caching static content, as well as responses to certain requests. This, of course, reduces the response time to the user, because if we already have cached responses for particular requests, we can return them immediately from the API Gateway without the need to make requests to the different services.

Another added benefit we get from routing all the traffic through one single service is monitoring and alerting. By adding monitoring logic to our API Gateway, we can gain real-time visibility into the traffic pattern and the load on our system. This also allows us to create alerts in case the traffic suddenly drops or increases unexpectedly. This feature helps us improve our system's observability and availability. Finally, the API Gateway allows us to perform protocol translation in one single place. For example, externally we can expose a REST API that uses JSON to represent different objects, but internally some of our services may use different RPC technologies or different formats to represent their objects. At the same time, we may even have some legacy services that support older protocols like HTTP 1 and represent their objects using XML. On the other hand, externally we may also integrate with other systems that may bring us additional revenue. For example, we may integrate with a digital advertising company that may want to run ads on our videos.
Or maybe another system wants to host their videos on our platform and simply request them on demand to embed them in their webpages. All those companies may already have systems that support only particular protocols, and they may be reluctant to make big changes to their systems to support our API's protocols and formats. In this case, we can simply extend our API to support their protocols and formats at the API Gateway, and perform the appropriate translation to call our services using the existing implementation.

So now that we have listed all the great benefits of an API Gateway, and mentioned all the quality attributes that the API Gateway provides us with, such as security, performance, and the high availability that we get through monitoring, let's talk about a few best practices and anti-patterns of using this architectural building block. The first important consideration in using this design pattern is making sure that our API Gateway doesn't contain any business logic. While security, caching, and monitoring are nice added features, the main purpose of an API Gateway is API composition and routing of requests to different services. Those services are the ones that make the business decisions and perform the actual tasks. If we make the mistake of following the anti-pattern of adding business logic to our API Gateway and making it too smart, we may end up again with a single service that does all the work and contains an unmanageable amount of code, which is exactly the problem we wanted to solve in the first place by splitting our system into multiple services.

The next thing to consider is that, since all the traffic to our system now goes through the API Gateway, the API Gateway may become a single point of failure. We can easily solve the scalability, availability, and performance aspects by simply deploying multiple instances of our API Gateway service and placing them all behind a load balancer. But another thing to consider is that if we push a bad release or introduce a bug that crashes our API Gateway service, our entire system becomes unavailable to the clients. So we need to be extra careful to eliminate any possibility of human error, and deploy new releases to the API Gateway service with extreme caution.

Finally, we do need to acknowledge that adding an additional service that our client needs to go through any time a request is sent to our system does add a little bit of performance overhead. While overall, by having an API Gateway, we typically benefit more than we sacrifice in terms of performance, in certain situations we may be tempted to over-optimize and bypass the API Gateway. But this is an anti-pattern that we should try to avoid. For example, consider the case where the Users Service team wants to make a change to their API. If they know that the only component that calls them externally is the API Gateway, they can simply make the corresponding change in the API Gateway service and then safely release their new changes. However, if other external clients can call the service directly, then the Users Service team will have to be a lot more cautious and slower in releasing those changes, because they will have to go and update each client's code before they can release their change. This, again, tightly couples the service to external clients' code, which is the problem we solved by using the API Gateway to begin with. So now that we know all the do's and don'ts of using an API Gateway in our system, let's summarize what we learned in this lecture.
In this lecture, we learned about a very important architectural building block and design pattern of large-scale systems. We learned about some of the benefits of an API Gateway, such as API composition, security, caching, monitoring, and so on. And finally, we talked about a few considerations for using an API Gateway correctly. Those considerations were keeping the business logic out of the API Gateway, being extra careful when making modifications to the API Gateway, and not breaking the API Gateway abstraction by bypassing it externally. I hope you have already learned a lot so far, and I will see you soon in the next lecture.
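As a small illustration of the request aggregation idea, here is a toy gateway endpoint built with Flask and the `requests` library that fans one client call out to two internal services and merges the responses. The internal hostnames and paths are hypothetical, and a real gateway would also handle authentication, rate limiting, caching, and failures.

```python
# A toy API Gateway endpoint that fans one client request out to two internal
# services and aggregates their responses into a single reply. The internal
# hostnames and paths are hypothetical.
import requests
from flask import Flask, jsonify

app = Flask(__name__)

VIDEO_SERVICE = "http://video-service.internal:8080"
COMMENTS_SERVICE = "http://comments-service.internal:8080"

@app.route("/watch/<video_id>", methods=["GET"])
def watch(video_id):
    # One external call from the client becomes two internal calls by the gateway.
    video = requests.get(f"{VIDEO_SERVICE}/videos/{video_id}", timeout=2).json()
    comments = requests.get(f"{COMMENTS_SERVICE}/videos/{video_id}/comments", timeout=2).json()
    return jsonify({"video": video, "comments": comments})

if __name__ == "__main__":
    app.run(port=8000)
```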
API Gateway Solutions & Cloud Technologies
Open Source API Gateways
Netflix Zuul
Zuul is a free and open-source application gateway written in Java that provides capabilities for dynamic routing, monitoring, resiliency, security, and more.
Cloud-Based API Gateways
Amazon API Gateway
Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. Supports RESTful APIs and WebSocket APIs (bi-directional communication between client and server).
Google Cloud Platform API Gateway
Google Cloud Platform API Gateway enables you to provide secure access to your services through a well-defined REST API that is consistent across all of your services, regardless of service implementation. For full documentation, follow this [link](https://cloud.google.com/api-gateway/docs/about-api-gateway).
Microsoft Azure API Management
API Management helps organizations publish APIs to external, partner, and internal developers to unlock the potential of their data and services.
Building Block 4: CDN - Content Delivery Network
In this lecture, we're going to talk about CDNs. Although CDNs are not so much an architectural building block but more of a service, they are one of the most important technologies that power the internet as we know it today. So what problem are we solving using a CDN? Even with distributed web hosting across multiple data centers, which is powered by technologies like Global Server Load Balancing, there is still a significant latency caused by the physical distance between an end user and the location of the hosting servers. Additionally, each request between an end user and the destination server has to go through multiple hops between different network routers, which adds even more to the overall latency. To get a better idea of where a CDN comes in handy, let's take an example of a user located in Brazil who wants to load the homepage of our online store hosted in a data center somewhere on the east coast of the United States. Let's assume that this HTML webpage contains links to 10 different assets, like images, JavaScript, and CSS files. Also, for simplicity, let's assume that the network latency between the user's computer in Brazil and the data center in the United States is about 200 milliseconds. Now, since HTTP uses TCP under the hood, the TCP three-way handshake used to establish the connection between the client and the server requires three network trips, in this case amounting to 600 milliseconds. Only once the TCP connection is established can the client send the first HTTP request to get the homepage and receive the response with the HTML webpage. So that network round trip adds another 400 milliseconds of latency. And when the user finally receives the webpage, the browser makes asynchronous requests to load the 10 different assets the webpage needs. That adds another 200 milliseconds for the requests to arrive at the server and another 2,000 milliseconds to load all the assets onto the page. So if we add up all those numbers, we end up with a little over three seconds before the user gets a complete webpage that they can interact with. Now, in the context of a study Google published in March 2016, which indicated that 53% of mobile users abandon a website if it takes longer than three seconds to load, this number we just calculated does not look good for our service. Of course, we can try to improve our system's performance by replicating our service and running it in more data centers. But if you think about it, it's not our service, which contains all the business logic, that we need to bring closer to the users. It's mostly the static content, like images, HTML pages, JavaScript, CSS files, and video files, that we need to get closer to the users to make it load faster. This is where CDNs come in. A CDN, which stands for Content Delivery Network, is a globally distributed network of servers located in strategic places, with the main purpose of speeding up the delivery of content to the end users. CDNs were originally created to address the problem often referred to as the World Wide Wait, which is a term describing a bad user experience caused by a slow network connection or an overloaded web server. CDNs provide that service by caching our website content on their servers, often referred to as edge servers, which are located at different Points of Presence. Those edge servers are both physically closer to the user and more strategically located in terms of network infrastructure.
This allows them to transfer the content to the user much quicker and improve our perceived system performance. Content Delivery Networks can be used to deliver webpage content and assets, like images, text, CSS, and JavaScript files, but they can also be used to deliver video streams, both live and on-demand. Today, CDNs are used by pretty much all digital service companies that interact with users through either a website or a mobile app. A few examples include e-commerce services, financial institutions like banks, technology and software-as-a-service companies, media companies that deliver video streaming or news, and social media companies. Now, besides faster page loads, the distributed nature of CDNs improves the overall availability of our service. The reason for that is that any issue or slowness in our system won't be as noticeable to the users, because most of the content will actually come from the CDN and not from our servers directly. CDNs also improve our system's security and help protect us against DDoS attacks: since those malicious requests will not go to our system but will instead be distributed among a large number of servers hosted by the CDN provider, their ability to impact us is very low. To get some intuition of how much improvement we can get by using a CDN, let's assume that the same user in Brazil who's trying to load our online store's homepage is now going to reach a cached version of our website on a CDN close by. So let's assume that now the latency between the user and the CDN server is about 50 milliseconds. Now, to establish the connection, the user will have to pay only 150 milliseconds of latency, which accounts for the TCP three-way handshake. Then, to send the HTTP request for the webpage and get it back, we spend another 100 milliseconds. And finally, to send the requests for the different assets that the webpage needs and to get all those assets loaded by the browser, we pay another 550 milliseconds. This brings the total latency to load our webpage to comfortably under one second, which is an excellent user experience by all measures. And we were able to achieve it by utilizing a CDN to cache the content of our homepage on its network of servers. In addition to being physically closer to the users, CDN providers utilize other techniques to transfer the content to the user a lot faster. For example, the CDN servers use faster and more optimized hard drives to store our cached content. In addition, CDNs also reduce bandwidth by compressing the content delivered over the network using techniques like Gzip or minification of JavaScript files. Now, there are generally two strategies that we can use when integrating with Content Delivery Networks with regard to caching content. The first strategy is called the Pull Strategy. In this strategy, all we need to do is tell the CDN provider which content on our website we want to be cached and how often this cache needs to be invalidated, which can be configured by setting the Time To Live (TTL) property on each asset or type of asset. In this model, the first time a user requests a certain asset, the Content Delivery Network will have to populate its cache by sending a request to a server in our system. Once that asset is cached on the CDN, subsequent requests by users will be served by the edge servers directly, which saves all the network latency associated with the communication with our servers.
Now, when a user requests an asset that has already expired, the CDN will send a request to our servers to check if we have a new version of that asset. If the version of the asset did not change, the CDN will simply refresh the expiration time for that asset and serve it back to the user. Otherwise, if a new version of that asset is available, the server will send it back to the CDN, and the CDN will serve the new version instead of the old one to the user. Now, the second strategy is called the Push Strategy. In this strategy, we manually or automatically upload or publish the content that we want to be delivered through the CDN. And whenever that content changes in our system, we are responsible for republishing the new versions to the edge servers. It's worth pointing out that some CDN providers support this model directly, while others enable the strategy by simply setting a very long TTL for our assets, so the cache essentially never expires. And whenever we want to publish a new version, we simply purge the content from the cache, which forces the CDN to fetch that content from our servers the next time a user requests it. Now, before we can choose the right strategy for us, we need to understand the advantages and disadvantages of each of those strategies. The main advantage of the Pull Model is, of course, lower maintenance on our part, because once we configure which assets need to be cached by the CDN and how often they need to expire, we don't need to do anything to keep those CDN caches up-to-date. Essentially, from that point on, everything is going to be taken care of by the CDN provider. Now, the obvious drawback is that the first users to request a certain asset that hasn't been cached by the CDN yet will experience a longer latency, because the CDN will need to fetch that asset from our system first. Another drawback is that if we set the Time To Live for all the assets to be the same, we may see frequent traffic spikes when all the assets expire at the same time, which results in a large number of requests coming from the CDN to refresh its caches. Now, while the Pull Strategy definitely adds to our system's availability, we still need to maintain generally high availability of our system, because if certain assets expire on the CDN and our system is not available for the CDN to pull a new version from, the user will get an error. On the other hand, in the Push Strategy, if our content doesn't change too frequently, we can simply push it once to the CDN, and from that point on, the traffic will go directly to the edge servers. This significantly reduces the traffic to our system and also reduces the burden on our system to maintain high availability, because even if our system goes down temporarily, users will still get all the data from the CDN and won't be affected by our system's internal issues at all. On the other hand, if our content does change frequently, we have to actively publish new versions of the content to the CDN. Otherwise, users will get stale and out-of-date content, which is definitely not what we want. In this lecture, we learned about the Content Delivery Network, which is a service that we use in conjunction with our system to improve our users' experience. We talked about the different features and capabilities of CDNs, which add to our system's performance, availability, and security. And finally, we talked about the two strategies for publishing our content to a CDN, which are the Pull Strategy and the Push Strategy.
And after that, we compared the two by listing the advantages and disadvantages of each publishing model. I hope you're having fun and I will see you soon in the next lecture.
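To double-check the back-of-the-envelope page-load numbers used in this lecture, here is a small sketch that reproduces the lecture's simplified model: a TCP handshake of three one-way trips, one request/response round trip for the HTML, and then one request trip plus one response trip per asset. The latencies and the asset count are the lecture's assumptions.

```python
# Reproduces the lecture's simplified page-load model. All numbers are assumptions
# taken from the example in the lecture, not measurements.
def page_load_ms(one_way_latency_ms: int, num_assets: int = 10) -> int:
    tcp_handshake = 3 * one_way_latency_ms          # SYN, SYN-ACK, ACK
    html_round_trip = 2 * one_way_latency_ms        # HTTP request + HTML response
    assets = one_way_latency_ms + num_assets * one_way_latency_ms  # asset request + responses
    return tcp_handshake + html_round_trip + assets

print(page_load_ms(200))  # ~3200 ms: user in Brazil, origin on the US east coast
print(page_load_ms(50))   # ~800 ms: same user served by a nearby CDN edge server
```

With a 200 ms one-way latency this comes out to roughly 3.2 seconds, and with a 50 ms latency to a nearby edge server it drops to roughly 800 milliseconds, matching the figures in the lecture.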
CDN Solutions & Cloud Technologies
Cloudflare
Cloudflare offers ultra-fast static and dynamic content delivery over its global edge network. It helps reduce bandwidth costs and provides built-in, unmetered DDoS protection.
Fastly
Fastly's Deliver@Edge is a modern, efficient, and highly configurable CDN that gives you the freedom to control how your content is cached so you can deliver the content your users want quickly.
Akamai
Akamai has a large variety of offerings for API Acceleration, Global Traffic Management, Image & Video Management, Media Delivery, and much more.
Amazon CloudFront
Amazon CloudFront is a content delivery network (CDN) service built for high performance, security, and developer convenience. Some of its use-cases include delivering fast secure websites, accelerating dynamic content delivery and APIs, live streaming, video-on-demand, and others.
Google Cloud Platform CDN
GCP CDN offers fast, reliable web and video content delivery with a global scale and reach.
Microsoft Azure Content Delivery Network
Microsoft's CDN solution offers global coverage, full integration with Azure services, and a simple setup.
Section 6: Data Storage at Global Scale
Relational Databases & ACID Transactions
Now that we mastered the most important architectural building blocks in terms of traffic, API, and content delivery management, it's time for us to talk about databases. When it comes to databases, there are so many options to choose from that it's natural for us to feel overwhelmed. So instead of choosing the most popular or trendy database, in the next few lectures we will learn a methodical way to choose the right database for our system's use case. In this lecture, we'll talk about the first type of database, called the relational database. We will first learn what relational databases are and what advantages and disadvantages they provide us with, and later we'll talk about when to use a relational database and when we should go the other route. So let's start talking about relational databases. In a relational database, data is stored in tables. Each row in the table represents a single record, and all the records are related to each other through a predefined set of columns that they all have. Each column in a table has a name, a type, and optionally a set of constraints. This relationship between all the records inside a table is what gives this type of database the name relational database. Now, each record in the table is uniquely identified by what's called a primary key, which can be represented by either one column or a set of columns in the table. The structure of each table is always defined ahead of time and is also referred to as the schema of the table. Because of this predefined schema for each table, which gives us the knowledge of what each record in the table must have, we can use a very robust query language to both analyze and update the data in the table. The industry-standard query language to perform such queries is called SQL, which stands for Structured Query Language. Different relational database implementations have their own additional features in their own version of the language; however, the majority of the standard operations are the same for all relational databases. Relational databases have been around since the 1970s, so they are a very well-known and proven way of structuring data. One of the biggest advantages of relational databases that gave rise to their popularity so early on was the fact that storage used to be very expensive, and storing data in separate tables allows us to eliminate the need for data duplication. Although storage has become a lot cheaper since then, the amount of data that we collect nowadays is a lot larger than before, so for large-scale systems, storage costs are still a major factor. Let's see how we can save space and avoid data duplication in a relational database with a small but very realistic example. Let's assume we have an online store where we sell a limited number of products. We store all the information about those products in a table that contains each product's information, like its name, its model year, company, category, and base price. Now, as customers place orders for those products, we need to store those orders in a separate table. However, in addition, we also want the ability to easily analyze and report on things like which products sell the most or the least in a given timeframe. We also may want to rank companies based on the performance of their products, or we may want to get a breakdown of which category of products does better during certain sales or seasons.
So one approach would be to have one table where each row contains a single purchase of a particular product, as well as all the information about that product that we need to perform all those analytics. Of course, we can already see that this table will take a lot of space, because it contains a lot of information on each purchased product. And this table will continuously grow as we get more orders for our products. However, if we look closely, most of the information in this table is duplicated from the products table. So because a relational database allows us to have not only relationships between rows inside a table but also relationships between multiple tables, we can actually avoid this duplication and keep the orders table a lot smaller by storing only the ID of the product that was involved in each purchase. Using this one column that is shared between the two tables, we can use the join operation and combine the information from both tables without having to store that information twice anywhere. Now let's talk about the advantages and disadvantages of relational databases. The clear advantage of storing our data in a relational database is the ability to perform complex and very flexible queries using the SQL language. This allows us to gain deep insight into the data we have about our business or our users. Also, as we mentioned a few moments ago, because of our ability to join multiple tables, we can save a lot of space on storage, which directly translates to cost savings for our business. Another advantage of relational databases is that they're very easy to reason about. The reason for that is that this table structure is very natural for human beings, and it doesn't require any knowledge about sophisticated data structures and computer science. Finally, relational databases provide us with very powerful properties called ACID transactions, where ACID stands for atomicity, consistency, isolation, and durability. In the context of a database, a transaction is a sequence of operations that, for an external observer, should appear as a single operation. For instance, transferring money from one user's account to another may involve multiple operations internally, but should be perceived as one single operation externally. After all, we don't want to have a situation, even for a brief moment, where it appears that the money was withdrawn from the first account but is still missing from the second account. Or similarly, we don't want a situation where the money appears in both accounts simultaneously. So to make sure we don't get into either of those situations, relational databases guarantee the atomicity of transactions, which means each set of operations that is part of one transaction either appears all at once or doesn't appear at all. In other words, it's either everything or nothing, but never anything in between. Now let's talk about consistency. Consistency guarantees that a transaction that was already committed will be seen by all future queries and transactions. For example, if we transfer a hundred dollars from user A to user B as a single transaction, once that transaction is complete, there is no way that a future query will suddenly see that hundred dollars still in the first user's account, or fail to see those funds in the second user's account. But consistency doesn't guarantee only that; it also guarantees that a transaction doesn't violate any constraints that we set for our data.
For example, if we set a constraint that a particular user can't have more than a thousand dollars in their account, the consistency guarantee will make sure that a transaction will not leave that user with more money than the allowed amount in their account. So if that limit is reached during a transaction, that transaction will simply fail. The next property a relational database guarantees as part of its ACID transaction semantics is isolation. Isolation is somewhat related to atomicity, but in the context of concurrent operations performed on our database. For example, if we continue with the same example of transferring money from one user's account to another, isolation guarantees that if there's another transaction happening simultaneously, that second concurrent transaction will not see an intermediate state of the money being either present in both accounts or missing from both accounts. So in other words, those two concurrent transactions are isolated from each other in such a way that they do not see each other's intermediate state. The last guarantee that comes from a relational database as part of the ACID transaction properties is durability. Durability simply means that once a transaction is complete, its final state will persist and remain permanently inside the database. So with this property, we will never have a situation where, for example, a user makes a purchase of a product from our online store and transfers the money to us as part of the transaction, but the record about the purchase disappears. As long as the purchase and the transfer of the money were part of the same transaction, and that transaction completed successfully, the final state of that entire transaction will contain both the record of the purchase and the money transfer. Now that we talked about all the awesome advantages and properties that we get from a relational database, let's talk about its disadvantages and drawbacks. The main disadvantage of using a relational database is its rigid structure, enforced by each table's schema. Since the schema applies to all the rows inside a table, it has to be defined ahead of time before we can use the table. And if at some point we want to change the schema of the table, for example by adding a new column or removing a column that we no longer need, we would have to have some maintenance time, which results in our table not being available for any operations. So when we design the schema of our tables, we need to be very careful and very thorough in planning ahead, so we don't need to change the schema very often, or preferably not change it at all. Also, relational databases tend to be a bit harder and more costly to maintain and scale because of their complexity. After all, supporting the SQL query language and providing ACID transaction guarantees is not a simple task. Additionally, because of all this complexity and all those guarantees that we get from our relational database, read operations tend to be a bit on the slower side compared to the other types of databases that we're going to talk about in another lecture. Of course, different implementations of relational databases have different performance optimizations and guarantees. But as a general rule, relational databases are notoriously slower than non-relational databases. So now that we talked about both the advantages and disadvantages of a relational database, it's easier to extrapolate that knowledge and see when it's a good idea to use a relational database to store our data.
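Before moving on, here is a compact sketch that ties the ACID discussion above together, using Python's built-in sqlite3 module. The account names, balances, and the thousand-dollar limit are made up for illustration; a production system would rely on its own database's transaction API.

```python
# Sketch of an atomic money transfer with a consistency constraint, using sqlite3.
# Account names, balances, and the $1,000 limit are illustrative assumptions.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE accounts (
        owner   TEXT PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0 AND balance <= 1000)  -- consistency constraint
    )
""")
db.execute("INSERT INTO accounts VALUES ('user_a', 100.0), ('user_b', 950.0)")
db.commit()

def transfer(amount, src, dst):
    try:
        with db:  # one transaction: both updates commit together or not at all (atomicity)
            db.execute("UPDATE accounts SET balance = balance - ? WHERE owner = ?", (amount, src))
            db.execute("UPDATE accounts SET balance = balance + ? WHERE owner = ?", (amount, dst))
        print("transfer committed")
    except sqlite3.IntegrityError:
        print("transfer rolled back: it would violate a constraint")  # no partial update remains

transfer(100, "user_a", "user_b")  # rolled back: user_b would exceed the $1,000 limit
transfer(50, "user_a", "user_b")   # committed: both rows change as a single unit
print(list(db.execute("SELECT * FROM accounts")))  # [('user_a', 50.0), ('user_b', 1000.0)]
```

The first transfer fails as a whole: the debit from the first account is rolled back together with the rejected credit, which is exactly the "everything or nothing" behavior described above.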
All we need to do is look at the advantages and see if those properties are important to us, while the disadvantages can take a backseat. For example, if performing complex and flexible queries to analyze our data is an important use case for us, or we need to guarantee ACID transactions between different entities in our database, then a relational database is the perfect choice for us. However, if there isn't any inherent relationship between different records that justifies storing our data in tables, or if read performance is the most important quality that we need for providing a good user experience, then there are other, better alternatives that we're going to talk about in the next lecture. In this lecture, we started the discussion about databases. We learned about the first type of database, called relational databases, which are also commonly referred to as SQL databases because of their support for the SQL query language. Then we talked about some of the main advantages that we get from relational databases, such as the ability to perform powerful and flexible analysis of our data, efficient storage that allows us to eliminate data duplication, the natural structure of the data in human-readable tables, and, last but definitely not least, the ACID transaction guarantees. After that, we talked about some of the drawbacks of typical relational databases, which mainly revolve around the rigid schema that we have to define for each table. This rigid schema increases the complexity of our database and also creates challenges for scalability and performance. In the next lecture, we're going to talk about the other type of databases and what qualities those databases can provide to us that relational databases cannot. See you soon in the next lecture.
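As a concrete companion to the products/orders example from this lecture, here is a small sketch, again using Python's built-in sqlite3 module, of how a join recombines two normalized tables without duplicating product data in every order. The table and column names are illustrative, not taken from the course.

```python
# Normalization sketch: orders store only a product_id, and a JOIN recombines
# the data without duplicating product details in every order row.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        company TEXT NOT NULL,
        category TEXT NOT NULL,
        base_price REAL NOT NULL
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(id),
        ordered_at TEXT NOT NULL
    );
""")
db.execute("INSERT INTO products VALUES (1, 'Laptop Pro', 'Acme', 'laptops', 1200.0)")
db.execute("INSERT INTO orders VALUES (1, 1, '2024-01-15'), (2, 1, '2024-01-16')")

# Report sales per category without ever storing product details inside the orders table.
for row in db.execute("""
    SELECT p.category, COUNT(*) AS units_sold
    FROM orders o JOIN products p ON p.id = o.product_id
    GROUP BY p.category
"""):
    print(row)  # ('laptops', 2)
```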
Non-Relational Databases
Now that we're familiar with the first type of databases, which is relational databases, let's talk about the second type, commonly referred to as non-relational databases or NoSQL databases. We're first going to get an introduction and motivation for using those types of databases. And after that, we'll learn about a few categories of non-relational databases for different use cases and purposes. So let's start talking about non-relational databases. Non-relational databases are a relatively new concept which became very popular in the mid-2000s. They mainly came to solve the drawbacks of relational databases, which we saw in an earlier lecture. For example, in a relational database, records that belong to the same table have the same schema, which means they all have the same columns. So if we wanted to add additional data only to some records, we would have to change the schema of the entire table, even though that data makes sense only for a subset of records. Non-relational databases solve exactly that problem. They generally allow us to logically group a set of records without forcing all of them to have the same structure. So we can easily add additional attributes to one or multiple records without affecting the rest of the existing records. Another problem we had with relational databases is that, essentially, relational databases support only one data structure, which is a table. So while tables are very natural for human beings to analyze records, they are less intuitive for programmers and programming languages. In fact, most programming languages don't even have a table as a data structure. Instead, programming languages support other data structures that are more computer-science oriented, like lists, arrays, maps, and so on. So non-relational databases don't store data in tables; instead, depending on the type of database, they support data structures that are more native to programming languages. This typically eliminates the need for an ORM to translate the business logic as we store it inside our program to what it looks like inside the relational database. Finally, while relational databases were originally designed for efficient storage, back in the days when storage used to be expensive, non-relational databases are typically more oriented towards faster queries. And different types of non-relational databases are optimized for different types of queries based on the use case. Of course, everything in software engineering is a trade-off, and switching from a relational database to a non-relational database is no exception. When we allow flexible schemas and don't enforce any relationship between records, we lose the ability to easily analyze those records, because essentially, now each record can have a completely different structure and completely different data. Similarly, analyzing multiple groups of records the same way we used to join tables in a relational database also becomes very hard. Most such operations are either limited or aren't supported at all, since each database supports a completely different set of operations and a different set of data structures. Finally, ACID transaction guarantees are also rarely supported by non-relational databases, though there are a few exceptions to this rule. So now that we got familiar with what non-relational databases provide to us and what compromises we have to make to take full advantage of them, let's talk about the three main types of non-relational databases that are commonly used in the industry.
As a side note, the boundaries between the different categories are somewhat blurry, so don't be surprised if you see the same database listed under a different category in different sources. The first and simplest type of non-relational database is a key/value store. In a key/value store, we have a key that uniquely identifies a record and a value that represents the data associated with the record. This value is theoretically completely opaque to the database and can be as simple as an integer or a string, or as complex as an array, a set, or even a binary blob. It's easy to think of a key/value store as essentially a large-scale hashtable or dictionary with very few constraints on the type of values we have for each key. Key/value stores are a perfect choice for use cases like counters that multiple services or application instances can read or increment, or for caching pages or pieces of data that can be easily queried and fetched without the need to do any slow or complex SQL querying against a relational database. The next type of non-relational database is a document store. In a document store, we can store collections of documents, with a little bit more structure inside each document. Each document can be thought of as an object with different attributes, and those attributes can be of different types, in a similar way to how a class can have different fields of different types. So documents inside a document store are much more easily mapped to objects inside a programming language. A few examples of documents are a JSON object representing a movie, a YAML configuration representing some business logic, or an XML document representing a form submitted by a user. The final type of database is a graph database, which is nothing more than an extension of a document store, but with additional capabilities to link, traverse, and analyze multiple records more efficiently. These types of databases are particularly optimized for navigating and analyzing relationships between different records in our database. Some of the most popular use cases for graph databases include fraud detection, where multiple logical users may be easily identified as the same person trying to initiate multiple transactions using the same email or the same computer. Recommendation engines, such as the ones used in e-commerce, also make extensive use of graph databases to recommend new products to users based on the past purchase history of users with similar purchasing patterns, or of friends of a particular user. So the last question we need to answer is when we should use non-relational databases in our system. Following the same strategy we used when deciding on a relational database, we're going to take the same approach when it comes to choosing a non-relational database: we simply need to analyze our use case and figure out what properties of a database are the most important to us, and which ones we can compromise on. Since, generally speaking, non-relational databases are superior to relational databases when it comes to query speed, non-relational databases are a perfect choice for things like caching. So we can still have all our data stored efficiently, without duplication, in our relational database, where we can perform complex queries when we need to; but additionally, we can store certain very common query results that correspond to user views in a non-relational database to improve the user experience. In-memory key/value stores are particularly optimized for such use cases.
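As a rough illustration of the key/value and flexible-schema ideas described above, here is a tiny sketch in which plain Python dictionaries stand in for a key/value store and a document collection. A real system would of course use an actual NoSQL database; the keys, fields, and values here are made up.

```python
# In-memory stand-ins for a key/value store and a document collection.
# The point: values are opaque, and documents in the same collection
# do not have to share a schema.
import json

key_value_store = {}                       # key/value store: opaque values behind unique keys
key_value_store["page_views:home"] = 0     # e.g. a shared counter
key_value_store["page_views:home"] += 1

user_profiles = [                          # document collection: records with differing fields
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "company": "Acme", "interests": ["golf", "chess"]},
]

# Adding an attribute to one document requires no schema change and touches no other record.
user_profiles[0]["premium"] = True
print(json.dumps(user_profiles, indent=2))
```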
In other cases, such as real-time big data, where relational databases are just too slow or not scalable enough, we can also use non-relational databases such as document stores. Another perfect use case for a non-relational database is when our data is not structured and different records can contain different attributes. For those cases, non-relational databases like document stores are always a better choice than a relational database. A few examples include user profiles, where different users may provide different types of information, or content management, where we can have different types of user-generated content such as images, comments, videos, and so on. Of course, if we're in neither of those use cases, then choosing a relational database is usually a safer bet due to its simplicity and long-term popularity. In this lecture, we learned about the second type of database, the non-relational database, also commonly referred to as a NoSQL database. We learned about some of the main advantages of non-relational databases, such as flexible schemas, fast queries, and data structures that are more natural for programming languages. After that, we talked about the three main categories of non-relational databases, which are key/value stores, document stores, and graph databases. And finally, we talked about a few considerations for choosing a non-relational database and mentioned a few classic use cases that are particularly suitable for non-relational databases. See you guys in the next lecture.
Non-Relational Databases - Solutions
Key/Value Stores Examples
- Redis
- Aerospike
- Amazon DynamoDB
Document Store Examples
- Cassandra
- MongoDB
Graph Databases Examples
- Amazon Neptune
- NEO4J
Techniques To Improve Performance, Availability & Scalability of Databases
Now that we have a good understanding of what types of databases are available to us and how to pick the right database for our use case, let's continue our discussion about databases in the context of a large-scale system. In this lecture, we're going to learn about three techniques to improve the performance, availability, and scalability of our database. The first technique, which improves the performance of our database queries, is called indexing. The purpose of indexing is to speed up retrieval operations and locate the desired records in sublinear time. Without indexing, those operations may require a full table scan, which can take a very long time when we have large tables. Let's look at a few examples of such operations. Let's assume that we have a very large table with many rows that contain data about our company's users. If we want to perform a query to find all the users that live in a particular city, so we can send them relevant notifications for example, our SQL query would look something like `SELECT * FROM users WHERE city = 'LA'`. Now, internally, to find all the rows in the table that match this condition, our database would have to scan linearly through all the rows, which for large tables can take a very long time. Other examples of costly operations that involve a full table scan include getting the list of users in our table sorted, for example by their last name, age, or income. So in addition to the sorting operation, we would still need to go over the entire table at least once. Either of those operations, if performed very frequently or on very large tables, can become a performance bottleneck and degrade performance for our users. This is where indexing comes in. A database index is a helper table that we can create from a particular column or a group of columns. When the index is created from a single column, this index table contains a mapping from the column value to the record that contains that value. Once that index table is created, we can put it inside a data structure. For example, we can use a hashmap, which provides us with very fast lookups, or we can use a type of self-balancing tree like a B-Tree, which keeps all the values sorted, so it's easier both to find a particular record in it and to return a sorted list of rows that match a certain condition. For example, if we create an index from the city column and place that index in a hash table, then a query that looks for all the users that live in the city of LA can return this list immediately, without the need to scan through the entire table. Similarly, if we want to get a sorted view of all the users based on their age within a certain range, then an index that is organized in a balanced tree will provide us with this view in logarithmic time complexity, avoiding both scanning the entire table and sorting it for every query. Now, as we already mentioned, indexes can be formed not only from one column, but also from a set of columns. An example of that would be if we want to get all the users that live in a particular city and also have a particular last name. If we were to create an index only for the city column, we would get all the users who live in that particular city immediately. However, we would still need to scan linearly through all those users and check if their last name matches the last name in our query. On the other hand, if we create a composite index from both columns, we can have a direct mapping from a pair of values to the row that contains them.
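Here is a minimal sketch of creating a single-column index and a composite index, using Python's built-in sqlite3 module. The table and index names are illustrative, and `EXPLAIN QUERY PLAN` simply lets us confirm that the query can use an index instead of a full table scan.

```python
# Single-column and composite indexes; EXPLAIN QUERY PLAN shows whether a query
# can be answered with an index search rather than a full table scan.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, city TEXT, last_name TEXT, age INTEGER)")

# Single-column index: speeds up lookups by city alone.
db.execute("CREATE INDEX idx_users_city ON users (city)")
# Composite index: speeds up lookups by city AND last name together.
db.execute("CREATE INDEX idx_users_city_last_name ON users (city, last_name)")

query = "SELECT * FROM users WHERE city = ? AND last_name = ?"
for row in db.execute("EXPLAIN QUERY PLAN " + query, ("LA", "Smith")):
    print(row)  # the plan should report a SEARCH using the composite index, not a SCAN
```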
So for the same query that looks for people that live in a particular city and have a particular last name, we can get the result immediately, without any linear scan. Of course, as always, if we prioritize one type of operation, we have to make a tradeoff somewhere. In the case of indexing, we make the read queries faster at the expense of additional space for storing the index tables, and also at the expense of the speed of write operations. The reason the write operations become slower is that whenever we want to update or add a new row, we now also need to update the index table. It's important to note that although we used a relational database for all our examples, indexing is also extensively used in non-relational databases, such as document stores, to speed up queries. The next technique we're going to talk about is database replication. When we store mission-critical data about our business in a database, our database instance becomes a potential single point of failure. So just like we avoided a single point of failure for compute instances, we're going to apply the same technique for databases. If we replicate our data and run multiple instances of our database on different computers, we can increase the fault tolerance of our system, which in turn provides us with higher availability. With multiple replicas of our data, we can make sure that if one of the replicas goes down or becomes unavailable, either temporarily or permanently, our business is not affected, because queries can continue going to the available replicas while we work to either restore or replace the faulty instance. In addition to higher availability, we can also get better performance in the form of higher throughput. If we have thousands or millions of users making requests to our database through our services, we can handle a much larger volume of queries if we distribute them among a larger number of computers. The tradeoff that we make when we introduce replication in our database is much higher complexity, especially when it comes to write, update, and delete operations. Making sure that concurrent modifications to the same records don't conflict with each other and provide some predictable guarantees in terms of consistency and correctness is not a trivial task. Distributed databases are notorious for being difficult to design, configure, and manage, especially at a high scale, and they require some competency in the field of distributed systems. Database replication is supported by pretty much all modern databases. In particular, all the non-relational databases incorporate replication out of the box, as those databases were designed in an age where high availability and large scale were already a big concern for most companies. The support for replication in relational databases varies among different relational database implementations. The third technique we're going to talk about is database partitioning, which is also referred to as database sharding. Unlike replication, where we run multiple instances of our database with the same copy of the data on each one of them, when we do database partitioning, we split the data among different database instances, and for increased performance, we typically run each instance on a separate computer. When our data is split among a group of computers, we can store way more data than we could originally if we had only one machine at our disposal. Additionally, with partitioning, different queries that touch different parts of our data can be performed completely in parallel.
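One common way to decide which database instance owns which record is hash-based routing on a partition key. Here is a minimal sketch of that idea; the shard addresses are hypothetical, and real databases usually handle this routing for us.

```python
# Hash-based shard routing: the partition key decides which database instance
# (shard) stores the record. Shard addresses are hypothetical placeholders.
import hashlib

SHARDS = [
    "db-shard-0.internal:5432",
    "db-shard-1.internal:5432",
    "db-shard-2.internal:5432",
]

def shard_for(key: str) -> str:
    # Hash the partition key and map it deterministically to one of the shards.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user:12345"))   # every query for this user is routed to the same shard
print(shard_for("user:67890"))   # a different user may land on a different shard
```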
So with database partitioning, we get both better performance and higher scalability. Of course, just like in the case of database replication, database sharding essentially turns our database into a distributed database. This increases the complexity of the database and also adds some overhead, as now we also need to be able to route queries to the right shards and make sure that none of the shards becomes too large in comparison to the others. Database sharding is a first-class feature in pretty much all non-relational databases. The reason for that is that non-relational databases, by design, decouple different records from each other, so storing the records on different computers is a lot more natural and easier to implement. When it comes to relational databases, just like in the case of replication, the support for partitioning depends on the implementation. The reason it's less supported in relational databases is that relational database queries that involve multiple records are a lot more common, and having those records spread across multiple machines is just a lot more challenging to implement in a performant way while still supporting things like ACID transactions or table joins. So when we choose a relational database for a use case that involves a high volume of data, we need to make sure that partitioning is well supported. While we're on the topic of partitioning, it's worth mentioning that partitioning is not only used for databases but can also be used to logically split our infrastructure. For example, we can partition our compute instances using configuration files so that requests from paid customers go to some machines and traffic from free users goes to other, maybe less powerful, machines. Alternatively, we can send traffic from mobile devices to one group of computers and send desktop traffic to another group of computers running the same application. This way, if we have an outage, we can easily know what type of users are affected and decide how to act upon it. As a final note, these three techniques, indexing, replication, and partitioning, are completely orthogonal to each other, so we don't need to choose one over the other. In fact, all three of them are commonly used together in most real-life, large-scale systems. In this lecture, we learned about three techniques that we can apply to our database to make it much more robust in a large-scale system. We first learned about indexing, which improves the performance of our database by speeding up search and retrieval operations. Then we learned about database replication, which improves our system's availability and performance through increased throughput. And finally, we learned about the third technique, which is database partitioning. Database partitioning improves our database's scalability by splitting our data across multiple database instances running on different computers. I'll see you soon in the next lecture.
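As a companion to the sharding sketch above, here is an equally simplified illustration of the throughput benefit of replication: writes go to a primary while reads are spread across replicas. The hostnames are hypothetical, and real replication and routing are normally handled by the database or its client driver.

```python
# Simplified read/write routing over replicas for higher read throughput.
# Hostnames are hypothetical; actual replication happens inside the database.
import itertools

PRIMARY = "db-primary.internal:5432"
REPLICAS = ["db-replica-1.internal:5432", "db-replica-2.internal:5432"]
_read_cycle = itertools.cycle(REPLICAS)

def route(statement: str) -> str:
    """Send writes to the primary; spread reads across the replicas."""
    is_read = statement.lstrip().upper().startswith("SELECT")
    return next(_read_cycle) if is_read else PRIMARY

print(route("SELECT * FROM users WHERE city = 'LA'"))  # one of the replicas
print(route("INSERT INTO users VALUES (1, 'LA', 'Smith', 30)"))  # the primary
```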
Brewer's (CAP) Theorem
In the previous lecture, we learned a few techniques to improve the performance, scalability and availability of our database. A few of those techniques turned our database into a distributed database. So although the details of implementing and managing a distributed database are outside the scope of this course, there is one topic that we, as software architects and system designers, absolutely need to understand. So in this lecture, we're going to talk about the CAP Theorem. We're first going to get the intuition behind this theorem. And later, we will define it more formally using precise terminology. So what is the CAP Theorem all about? The CAP Theorem, which was first introduced by Professor Eric Brewer in the late '90s, states that in the presence of a network partition, a distributed database cannot guarantee both consistency and availability and has to choose only one of them. Let's explain what it means with a more concrete example. Let's imagine that we have a database that we replicated onto multiple computers to provide high availability for our system. For simplicity, let's assume that this is a very simple NoSQL key-value store. Also, let's assume that inside the key-value store, we have a record that represents a counter that multiple services read and increment. So in normal conditions, we have no problem. As long as all those replicas can successfully communicate with each other through the network, any updates to this counter can easily propagate to all the replicas. For example, if Service A incremented the counter on Replica 1 and that update did not make its way to the rest of the replicas yet, and Service B wants to get the most up-to-date value of the counter, it can still get it even though it sends the query to Replica 2. Replica 2 can easily send a message to Replica 1 and Replica 3 to check if there were any modifications to the counter value. And if there was an update, then it can update its own copy and also return the most up-to-date value of the counter to Service B. Now it's important to note that the exact method of propagating the new values to all the replicas is a database implementation detail and is not actually that important for the theorem as long as all the replicas can communicate with each other through the network. Now, the problem happens when some replicas in our distributed database become inaccessible to others. For example, because of a network switch problem or some faulty network cables, Replica 1 and Replica 2 can still talk to each other. However, Replica 3 cannot talk to the rest of the replicas and is now isolated from the rest of the database. This type of problem is called a network partition. So now let's imagine what happens if Service A updates the counter maybe even multiple times on Replica 1 while Replica 3 has no way to get that update because of that network partition. So when Service B sends a read request to Replica 3, we have two options. The first option favors availability over consistency. With this option, Replica 3 responds to Service B with its own value of the counter knowing that this value may be inconsistent with the rest of the database. The second option favors consistency over availability. In this option, Replica 3 returns an error message to Service B telling it to try again in the future because at the moment, it cannot guarantee that the value it returns is the most up to date and that it's consistent with what the rest of the services see. 
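The two options above boil down to a choice that we can sketch in a few lines. The function and its parameters are purely illustrative, not how any real database is implemented.

```python
# Toy sketch of the choice a replica faces when it is cut off from its peers:
# favor availability (answer with a possibly stale value) or consistency (refuse).
class TryAgainLater(Exception):
    """Returned instead of a value when consistency cannot be guaranteed."""

def read_counter(local_value: int, peers_reachable: bool, favor_availability: bool) -> int:
    if peers_reachable:
        # No network partition: the replica can sync with its peers,
        # so it can be both consistent and available -- no tradeoff needed.
        return local_value
    if favor_availability:
        # Availability over consistency: always answer, possibly with a stale value.
        return local_value
    # Consistency over availability: refuse to answer rather than risk staleness.
    raise TryAgainLater("cannot confirm the latest value during a partition")
```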
This is essentially what the CAP Theorem states: in the presence of a network partition in our distributed database, we have to choose either consistency or availability, but we cannot provide both simultaneously. It's important to note, and it's something that many people tend to be confused about, that this theorem forces our database to make this choice only when there is a network partition. The majority of the time, when there is no network partition and our replicas can freely communicate with each other, there is no tradeoff to be made, and we can easily provide both consistency and availability. So now that we have a pretty good understanding of what the CAP Theorem is all about, let's define the terms used in this theorem a bit more formally. The C in the CAP Theorem stands for consistency, A stands for availability, and P stands for partition tolerance. The definitions of consistency and availability as used in the CAP Theorem are a bit different from the ones we used before throughout the course. In the CAP Theorem, consistency means that every read request receives either the most recent write or an error. In other words, if there are no network issues, a consistent database will return the value of the record that corresponds to the most recent write operation to that particular record. This consistency guarantees that all the clients see the same value at the same time, regardless of which instance of the database they talk to. The definition of availability is that every request receives a non-error response, without the guarantee that it contains the most recent write. This implies that occasionally, different clients may get different versions of a particular record, some of which may be stale. But the most important thing when we have availability is that all requests return successfully with a valid value. Now, partition tolerance means that the system continues to operate despite an arbitrary number of messages being lost or delayed by the network between different computers. So with those definitions in mind, let's revisit the CAP Theorem and interpret what it means for us in practice. The CAP Theorem essentially tells us that when we either choose or configure a database, we have to drop one of those three properties: we can have a consistent and available database, but with no partition tolerance; we can choose a consistent and partition-tolerant database, but if we get into a network partition, we will not be able to provide availability; or we can prioritize availability and partition tolerance and compromise on consistency. But can we really have a database that guarantees both consistency and availability? Well, network partitions can happen at any time for any distributed database. Even a database that consists of only two replicas running on two separate computers will, given sufficient time and query volume, encounter a network partition. So the only way we can have a database that guarantees both availability and consistency is one that runs on only one computer. In other words, only with a centralized database can we avoid network partitions, by eliminating network communication entirely. However, it's important to realize that with a high amount of data and query volume, such a database simply cannot scale and provide high enough performance and fault tolerance. So if we do choose to go the distributed route and run multiple replicas of our database on different computers, we have to also choose partition tolerance.
And in that case, we have to choose to either drop availability or consistency. So when should we favor consistency over availability? If we go back to the same example of a shared counter, if this counter corresponds to the inventory of a particular item in our online store, then consistency is much more important than availability. The reason for that is, for example, if we're running low on a particular item and we have only one left in stock while two clients are trying to purchase this item, both clients should see the same consistent picture of our inventory. So if the first client completes the purchase before the second, the second client should immediately see that there are no more items left in stock and get an error if they try to pay for that item. On the other hand, in a different system, that same counter can represent a completely different thing. For example, in a social media system, this counter can represent the number of likes or views for a particular post or video. It would be inconceivable for a user to get an error in the browser just because they temporarily cannot get the most accurate number of likes for a particular post, and it's completely acceptable to see a not-so-up-to-date number of likes for some duration of time. So in this case, we would definitely favor availability over consistency for our distributed database. Now, it's also worth noting that in practice, when it comes to consistency and availability, things aren't completely black and white. In other words, we don't have to choose entirely between availability with no consistency, or 100% consistency with no availability. When we configure a distributed database, in most cases we have a choice of how much availability and how much consistency we need or can tolerate. So it's more of a dial that we can move depending on our requirements. And anytime we move that dial and add more consistency, we get less availability, and if we want more availability, we get less consistency. The CAP Theorem is really one of the most pronounced examples of making tradeoffs when choosing the quality attributes for our system, and it's important to make those tradeoffs correctly in the architectural design phase, when we formalize the non-functional requirements. In this lecture, we learned about a very important concept for databases that operate at a high scale: the CAP Theorem. After getting the intuition for the CAP Theorem, we learned the definitions of consistency, availability, and partition tolerance with respect to a distributed database. Later, we formalized the CAP Theorem as a tradeoff between consistency and availability that we need to make in the presence of network partitions. And finally, we talked about the considerations we need to make when we choose between availability and consistency in a distributed database for our use case. I hope to see you soon in the next lecture.
Scalable Unstructured Data Storage
Now that we extensively covered the topic of databases, we have all the knowledge on how to store and access structured data effectively. However, there is one more type of data that we haven't discussed yet, which is equally important, and that is unstructured data. So in this lecture, we'll first define what we mean by unstructured data. Then we'll talk about some use cases where unstructured data is used. And finally, we'll talk about two solutions with very different properties: a distributed file system and an object store. So first of all, what is unstructured data? Unstructured data is data that doesn't follow a particular structure, schema, or model. If we look at the data we stored in a non-relational database, although the structure of each set of documents or key/value pairs within the same collection did not have to match, each record still had a well-defined structure. On the other hand, when we talk about binary files like audio, video, images, or even PDF documents, without a special tool that can decode the file's content, all we have is a blob, which stands for binary large object. Of course, some databases allow storing blobs in them, but typically databases are optimized for structured data and not for unstructured data, and most of them impose strict size limits on storing such binary objects, typically on the order of megabytes, because otherwise those databases would suffer from performance and scalability problems. So before we talk about other solutions for storing unstructured data, let's talk about a few common use cases for storing and accessing such data. The first use case is allowing users to upload data to our system for further processing. We're talking about raw or uncompressed images, video files, audio files, or even documents that users upload to our system. This data can be compressed, transcoded, and moved to a different location so it can be either shared, such as in the case of social media or streaming platforms, or used for backup purposes, such as in the case of file hosting services. Another use case is database backup and archiving. In this case, we can take periodic snapshots of the state of the relational or non-relational database used in our system. Those snapshots are stored in proprietary, database-specific binary formats. We can then treat those snapshots as unstructured data, which can be stored elsewhere for two purposes. One purpose is disaster recovery: in case there is an unfortunate event where we lose the primary or even the replicas of our database, we can always recover our data from that backup. The second purpose is archiving backups of transactions, emails, or documents, which can later be used for auditing. This archiving is required by law in certain industries, like financial institutions or health care. The third common use case is web hosting. Any web content like images, thumbnails, digital downloads, or other media we need to display on our website is unstructured data that we need to store somewhere, preferably with the ability to update it frequently. And finally, we have data points collected for analytics or machine learning purposes. Those data sets can be huge, and they also can contain binary data, such as sensor measurements taken from IoT devices or images taken by satellites or surveillance cameras. The main feature of those use cases is that those data sets can be very big, which means we need our storage system to be able to scale to terabytes or even petabytes of data.
Also, every single object, like a video file, audio file, or document, can on its own be very big, way too big to be stored in a database. So now that we understand the need for unstructured data storage, let's look at two scalable solutions for storing and accessing such data. The first solution is the file system, or more specifically, a distributed file system. A distributed file system provides us with the same abstraction as if we stored the data on our local hard drive, except that instead of using a single storage device, we have many storage devices connected to each other through a network. With different distributed file systems, we can get different replication, consistency, and auto-healing guarantees, depending on the type of file system. But the main feature is that our binary objects are stored in a familiar way, as files within folders in a tree-like structure. One benefit of storing our unstructured data in a distributed file system is that we don't need any special API to access those files. Another benefit is that we can easily modify those files if we need to. For example, if we want to modify a document or append data to the end of a log file or a video file, we can easily do that. The last benefit is that performance-intensive operations, such as analysis or transformations of big data sets, are very efficient and fast when we run them directly on the distributed file system. This is particularly useful for machine learning or IoT projects. On the other hand, a distributed file system has limitations that make it suboptimal for certain scenarios. For instance, most distributed file systems are limited in the number of files we can create, which is a big scalability issue, especially if we have many relatively small files like images. Another problem is that we can't easily allow access to those files from a web API, as in the case of serving web content to the browser, and we would have to build additional abstractions on top of the file system to provide that access. So the second storage solution is using an object store. An object store is a scalable storage solution specifically designed for storing unstructured data at internet scale. An object store can scale linearly, like a distributed file system, by adding more storage devices. However, unlike a distributed file system, we have virtually no limitation on the number of binary objects we can store in it. Also, object stores usually have a very high limit on the size of a single object, generally on the order of several terabytes. This makes object stores a perfect solution for database backups and archiving. But by far the best feature of a typical object store is that it exposes a simple HTTP REST API that makes it perfect for storing web content such as images, thumbnails, articles, digital downloads, or anything we can link to on our web page. The last major benefit of using an object store is object versioning. If we wanted versioning in a file system, we would have to use an external version control system. However, when we use an object store, we get object versioning out of the box, which allows us to revert changes and undo delete operations very easily. In contrast to a file system, objects are not stored in a directory hierarchy. Instead, they are stored in a flat structure organized into containers commonly called buckets. Generally, those buckets don't have a limit on the number of objects we can store in them. The only limit is our storage budget.
The main abstraction in an object store is, not surprisingly, an object. An object consists of a unique name or identifier and a value, which is its content. In addition, it typically also contains metadata, which is a list of key-value pairs that provide additional information about the object, like its size, type, or format. Each object also contains an access control list for managing permissions on who can read or overwrite that object. Nowadays, pretty much every cloud vendor offers object store services on top of their storage infrastructure at very reasonable costs that are typically broken into tiers or storage classes. Each storage class has different pricing and SLA guarantees. The names of the tiers vary by cloud vendor, but usually at the top tier we have the most expensive option that provides the highest availability, usually around four nines (99.99%). It also offers the lowest latency, the highest throughput, and durability guarantees of above ten nines. This is the best option for frequently accessed, user-facing content like video or images. In the middle tiers, we typically have a range of offerings with lower availability guarantees, typically around three nines (99.9%), as well as cheaper pricing. Those options usually have more limited performance, and some even have limits on the frequency with which we can access data stored in those storage classes. Those storage classes are perfect for data backups that aren't needed very frequently. Finally, at the lowest tier, we have the optimal storage classes for long-term archiving purposes. Those storage classes are cheap and typically used by law firms, health care companies, or financial institutions. Those companies use that storage to archive data for many years for legal purposes, but that data is rarely accessed. Now, in some cases, using a cloud-based object store is not an option because of budget, legal, or performance constraints. For that, there are quite a few open-source and third-party managed solutions for running an object store on on-premises storage devices. Some of those solutions follow the same API as some cloud vendors, so we can easily use them in a hybrid cloud deployment, where part of our system runs in the cloud and the other part runs in our private data center, and we can use the same object store API to store and access data in both locations. Finally, just like a distributed file system, object stores typically use data replication under the hood. This ensures that losing a physical storage device does not result in actual loss of data. Now, despite all those benefits of using an object store, it has a few downsides that are worth mentioning. The first drawback is that objects in an object store are immutable. In other words, unlike in a file system, where we can always open a file and modify its content, we can't do that in an object store. The only thing we can do is replace an existing object with a new version. This, of course, has performance implications if we want to store large documents for collaboration purposes, and use cases like storing log files, where we need to append new data, are out of the question completely. Another downside is that for writing data to or reading data from an object store, we either need to use a special client library or a REST API, so we can't access that data as easily as in a regular file system. Finally, distributed file systems are usually the preferred storage option over object stores for very high throughput operations.
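To make the object abstraction a bit more concrete, here is a minimal sketch using an S3-style client (boto3); the bucket name, key, and metadata values below are purely hypothetical, and other vendors' SDKs follow a very similar put/get model.

```python
# Minimal sketch of storing and reading an object in an S3-style object store.
# Assumes boto3 is installed and credentials are configured; the bucket name,
# key, and metadata below are hypothetical examples.
import boto3

s3 = boto3.client("s3")

# Write an object: a unique key, a binary value, optional metadata, and an ACL.
with open("course-intro.mp4", "rb") as f:
    s3.put_object(
        Bucket="my-media-bucket",          # flat container, no real directory tree
        Key="videos/course-intro.mp4",     # the "/" is just part of the name
        Body=f,
        Metadata={"content-kind": "video", "encoding": "h264"},
        ACL="private",
    )

# Read it back; objects are immutable, so an "update" means writing a new version.
response = s3.get_object(Bucket="my-media-bucket", Key="videos/course-intro.mp4")
data = response["Body"].read()
print(len(data), response["Metadata"])
```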
In this lecture, we learned about unstructured data and the issues we have in storing such data in data stores like relational or non-relational databases. We then talked about some very common use cases for unstructured data, such as storing raw user-uploaded data, backup and archiving, web hosting, and machine learning. And finally, we learned about two solutions for storing such data. The first one was a distributed file system, which is a more scalable and available version of the local file system we use on our computers. The second option was an object store. This option may not be as performant as a distributed file system, but it is a much more suitable choice for web content, thanks to its HTTP-based REST API and its ability to store many small and large objects.
Scalable Unstructured Data Storage - Cloud and Open Source Solutions
Cloud-Based Object Store Solutions
Amazon S3 (Simple Storage Service) - Amazon's highly scalable cloud storage service that stores object data within buckets. Designed to store and protect any amount of data for various use cases, such as websites, cloud-native applications, backups, archiving, machine learning, and analytics.
GCP Cloud Storage - Google Cloud's managed service for storing unstructured data for companies of all sizes.
Azure Blob Storage - Microsoft's massively scalable and secure object storage for cloud-native workloads, archives, data lakes, high-performance computing, and machine learning.
Alibaba Cloud OSS (Object Storage Service) - Fully managed enterprise-ready Object Storage Service to store and access any amount of data from anywhere.
Open Source and Third-Party Object Store Solutions
OpenIO - A software-defined open-source object storage solution ideal for Big Data, HPC, and AI. It is S3 compatible and can be deployed on-premises or cloud-hosted on any hardware that you choose.
MinIO - High-performance, S3-compatible object storage. It is native to Kubernetes and 100% open source under GNU AGPL v3.
Ceph - Open-source, reliable and scalable storage. Ceph provides a unified storage service with object, block, and file interfaces from a single cluster built from commodity hardware components.
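Since several of these solutions are S3-compatible (MinIO, for example), the same client code can often be pointed at an on-premises deployment simply by overriding the endpoint, which is what makes the hybrid-cloud setup described earlier convenient. A hedged sketch, with a hypothetical endpoint, credentials, and bucket name:

```python
# Hedged sketch: reusing an S3-style client against an S3-compatible,
# self-hosted object store (e.g., MinIO). Endpoint, credentials, and bucket
# names are hypothetical placeholders.
import boto3

on_prem = boto3.client(
    "s3",
    endpoint_url="http://minio.internal.example:9000",  # hypothetical on-prem endpoint
    aws_access_key_id="LOCAL_ACCESS_KEY",
    aws_secret_access_key="LOCAL_SECRET_KEY",
)

# The same bucket/key operations now target the private data center,
# while identical code can target the cloud vendor's endpoint.
on_prem.put_object(Bucket="backups", Key="db/snapshot-2024-01-01.bak", Body=b"...")
```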
Section 7: Software Architecture Patterns and Styles
Introduction to Software Architectural Patterns
In this section, we're going to learn about the most popular and useful Architectural Patterns for modern software systems. But before we talk about any of those patterns, let's have a brief introduction to what Architectural Patterns are and when to use them. Software Architectural Patterns are general, repeatable solutions to commonly occurring system design problems. Unlike design patterns, such as a singleton, factory, or strategy, that you may have heard of, which are used to organize code within a single application, Software Architectural Patterns are common solutions to software architecture problems that involve multiple components running as separate runtime units, such as applications or services. Over the years, many software architects have been observing how other companies in similar industries went about solving similar design problems to their own and tried to learn what worked for them and why. Even more importantly, they observed what mistakes were made so that other companies wouldn't have to waste resources repeating those same anti-patterns. Those general software architecture practices that seemed to be successful became what we know as Software Architectural Patterns. Now, let's talk about why we should be using those Software Architectural Patterns. As software architects and system designers, we're not strictly required to follow any of those patterns. However, we do have a few incentives to follow them. The first incentive is to save valuable time and resources for ourselves and our organization. Essentially, if other companies that had a very similar use case to the problem we're trying to solve, and operate at a similar scale to ours, have already found an architecture and development practices that work for them, then it's better for us to take that knowledge and use it instead of reinventing the wheel and trying to come up with something completely new. The second motivation to use an existing Software Architectural Pattern is to mitigate the risk of getting into a situation where our architecture resembles what's called a big ball of mud. The big ball of mud is an anti-pattern of a system that seemingly lacks any structure, where every service talks to every other service, everything is tightly coupled together, all the information is global or duplicated, and there's no clear scope of responsibility for any of the components. Needless to say, it's not a situation we want to be in. Surprisingly, many companies got into this situation, or some variation of it, for many reasons, such as rapid growth or the lack of an overarching, well-defined software architecture that the entire organization adhered to. A system that gets into this situation is very hard to develop, maintain, and scale, which can have detrimental consequences for our business. So we definitely want to avoid it. And finally, the third incentive to follow a well-known Software Architectural Pattern is that other software engineers and architects can continue working on our system and can easily carry on and stick to the same architecture, because everyone can read about the pattern we're following and understand exactly what to do and what not to do. However, with all of that in mind, all the Software Architectural Patterns we're going to learn are just guidelines. At the end of the day, our job is to define the best software architecture for our use case and apply what is relevant to our unique situation.
Before we proceed to learn about the first Architectural Pattern, there's one more thing I want to point out. As systems evolve, certain Architectural Patterns that used to fit our system perfectly may not fit us anymore, and that is expected. At that point, we would need to do some restructuring and potentially migrate to a different Software Architectural Pattern that now fits us better. However, the best part about following those common Architectural Patterns is that many companies already went through such migrations in the past so we can follow their best practices to make those migrations quickly and safely. So now that we got some introduction to what we're going to learn in this section, let's proceed to the next lecture and learn about our first Architectural Pattern.
Pattern 1: Multi-Tier Architecture
- Two Tier Architecture (Java Desktop)
- Three Tier Architecture (monolithic)
- Four Tier Architecture (with an API Gateway as the 2nd tier)
In this lecture, we're going to learn our first architectural pattern: The Multi-Tier Architecture. We will first get a general introduction to this pattern, and later, we will proceed to learn about some of the most common variations of this pattern. Most importantly, by the end of the lecture, we'll know when to apply this pattern to our system and when to seek other solutions. So let's start with a general introduction to the Multi-Tier architectural pattern. The Multi-Tier Architecture organizes our system into multiple physical and logical tiers. The logical separation limits the scope of responsibility on each tier, and the physical separation allows each tier to be deployed, upgraded, and scaled separately by different teams. It's important to point out that, although occasionally, you may hear people using the terms Multi-Tier and Multilayer architecture interchangeably, they are, in fact, two different concepts. Multi layered architecture usually refers to the internal separation inside a single application into multiple logical layers or modules. However, even if the application is logically separated into multiple layers, at runtime, it will run as a single unit and will be considered as a single tier. When we talk about Multi-Tier Architecture, we mean that the applications in each tier physically run on different infrastructure. Besides the benefits of logical and physical separation that allows us to develop, update and scale each tier independently, there are a few restrictions in this architectural pattern that simplify the design. The first restriction is that each pair of applications that belong to adjacent tiers communicate with each other using the Client-Server Model, which is a perfect model for RESTful APIs, for example. The second restriction discourages communication that skips through tiers. This restriction keeps the tiers loosely coupled with each other, which again, allows us to easily make changes to each tier without affecting the entire system. So now that we got an introduction to the Multi-Tier architectural pattern, let's talk about one of the most common Multi-Tier architectural variations, which is the Three-Tier Architecture. The Three-Tier Architecture, to this day, is one of the most common and popular architectural patterns for client-server, web-based services. In the Three-Tier Architecture, the top level tier contains the user interface, and is often referred to as the Presentation Tier. The responsibility of the Presentation Tier is to display information to the user and also take the user's input through a graphical user interface. Examples of the Presentation Tier include a webpage that runs on the client's browser, a mobile app that interacts with the user on a mobile device, or a desktop GUI application. The Presentation Tier normally does not contain any business logic for many reasons. One of those reasons, that applies specifically to code that runs in the client's browser, is that this code is directly accessible and visible to the user. Since, generally, we don't want the user to see and potentially change our business rules, it is an anti-pattern to include any business-specific logic in the Presentation Tier in general. The second, middle tier, is called the Application Tier or sometimes also referred to as the Business Tier or the Logic Tier. The Application Tier is the tier that provides all the functionality and features that we gathered from our functional requirements. 
This tier is responsible for processing the data that comes from the Presentation Tier and applying the relevant business logic to it. The last tier is the Data Tier. This tier is responsible for storage and persistence of user and business-specific data. This tier may include files on our file system, and most commonly, it includes a database. So, what makes the Three-Tier Architecture such a popular choice? One of the reasons that the Three-Tier Architecture is so popular is because it fits a large variety of use cases. Almost any web-based service fits this model, be it an online store, a news website, or even a video or audio streaming service. It's also pretty easy to scale horizontally to take large traffic and handle a lot of data. And let's see how. The Presentation Tier, of course, does not need any special scaling because it simply runs on the user's devices, so it essentially scales by itself. If we keep the Application Tier stateless, like we should when we use something like a REST API, we can then easily place the Application Tier instances behind a load balancer and run as many instances as we need. And finally, the database can also be scaled very easily if we use a well-established distributed database, which we can scale by using the techniques that we already learned, such as replication and partitioning. Finally, this architectural style is very easy to maintain and develop because all the logic is concentrated in one place, in the Application Tier, where all the back end development actually happens. And we don't need to worry about the integration of multiple code bases, services or projects. So, when does the Three-Tier Architecture stop working for us? Well, the Three-Tier Architecture has one major drawback, which is exactly the reason why it's so popular and easy to apply. This drawback is the monolithic structure of our logic tier. Since, as we mentioned earlier, we do not want to have any business logic in the Presentation Tier, and of course, we can't have any logic at all in the data tier, we are in a situation where all our business logic is concentrated in a single code base that runs as a single runtime unit. That has two implications. The first implication of this drawback is that each of our application instances simply becomes too CPU intensive and starts consuming too much memory. This makes our applications slower and less responsive, especially with memory-managed languages like Java or C#, which, as a result, will have much longer and more frequent garbage collections. All those issues may require us to start upgrading each computer we run our application on and we already know that vertical scaling is both expensive and limited. The second implication of the monolithic nature of the Application Tier is Low Development Velocity. As our code base becomes larger and more complex, it gets much harder to develop, maintain and reason about. And hiring more developers will not add too much value because more concurrent development will simply cause more merge conflicts and higher overhead. We could, somewhat, mitigate this problem by logically splitting the application's codebase into separate modules. However, those modules will still be somewhat tightly coupled because we can release new versions of those modules only when the entire application is upgraded. So, in other words, the organizational scalability of the Three-Tier Architecture is somewhat limited. 
So in conclusion, the Three-Tier Architecture is the perfect choice for companies whose code base is relatively small and not too complex, and also, it's maintained by a relatively small team of developers. This includes early-stage startup companies, as well as well-established companies that fit those criteria in terms of the size of the codebase and the organization. In addition to the Three-Tier Architecture, there are a few other variations to the Multi-Tier architectural pattern. So, let's take a look at a few of them. Besides the obvious One-Tier Architecture, which is simply a standalone application, we have the Two-Tier architectural pattern, which is maybe less common than the Three-Tier Architecture, but is still pretty popular. In the Two-Tier Architecture, the top-level tier includes both the presentation and the business logic and usually runs as a feature-rich mobile or desktop application. The second tier is the data tier, which, just like before, takes care of the storage and persistence of the users and business data. The Two-Tier Architecture eliminates the overhead of the middle Logic Tier and usually provides a faster and more native experience to the users. A few examples of the Two-Tier Architecture use cases include desktop or mobile versions of a document, image, or music editor that provides all the functionality and the graphical interface on the user's device, while all the storage and backup takes place on a remote server that belongs to the company that provides this application. On the other end of the spectrum, we can have a fourth tier in between the Presentation Tier and the Business Tier, which separates some of the functionality that doesn't belong in either of those tiers. For example, if we support multiple client applications in the Presentation Tier that have different APIs to us or have different data and performance requirements, we can introduce something like an API Gateway Tier that can take care of security, API, and format translation, as well as caching. Having more than four tiers is extremely rare because more tiers, normally, do not provide much value and simply adds additional performance overhead. This overhead comes from the restriction against bypassing tiers, which otherwise, would lead to tight coupling. So now, every request from the client has to pass through multiple services, which can increase the response time for the user. However, we rarely need to pass every request through so many services. So in the following lectures, we will look at other and better options of organizing our system if we need to split our code base into multiple components. In this lecture, we learned about the first architectural pattern, the Multi-Tier Architecture. We learned about its general structure, which splits our system architecture into multiple logical and physical tiers, which allows us to develop, upgrade, and scale each tier completely independently. After that, we talked about the most common variation of the Multi-Tier Architecture, which is the Three-Tier Architecture. More importantly, we talked about the use cases, the scale, and the complexity of the code base that best fits this architectural model, and when we should consider other alternatives. And finally, we concluded with a few other variations of the Multi-Tier Architecture, the Two-Tier Architecture and the Four-Tier architectures, which are also very common. I'll see you soon in the next lecture.
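Before moving on, here is a minimal sketch of what a stateless Application Tier endpoint might look like; Flask and the `products` table are illustrative assumptions, not a prescribed stack. Because the handler keeps no per-user state in memory and delegates persistence to the Data Tier, any number of identical instances can run behind a load balancer.

```python
# Minimal sketch of a stateless Application Tier endpoint (Three-Tier style).
# Flask and the "products" table are illustrative assumptions, not a prescribed stack.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = "shop.db"  # stands in for the Data Tier; a real system would use a shared database

@app.get("/products/<int:product_id>")
def get_product(product_id: int):
    # No session or user state is kept in this process, so any instance
    # behind the load balancer can serve this request.
    with sqlite3.connect(DB_PATH) as conn:
        row = conn.execute(
            "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
        ).fetchone()
    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": row[0], "name": row[1], "price": row[2]})

if __name__ == "__main__":
    app.run(port=8080)
```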
Pattern 2: Micro-Services Architecture
- Microservices architecture organizes our business logic as a collection of loosely coupled and independently deployed services.
- Each service is owned by a small team and has a narrow scope of responsibility.
Advantages:
- Smaller Codebase.
- Better Performance & Horizontal Scalability
- Better Organizational Scalability
- Better security (in the form of fault isolation)
With the Codebase being smaller, we benefit because:
- Development becomes easier and faster.
- The codebase loads instantly in our IDE.
- Building and testing the code becomes easier and faster.
- Adding more features becomes easier.
- New developers can become fully productive faster.
Breaking the monolithic application into micro-services benefits us in performance because:
- Instances become less CPU intensive and less memory-consuming.
- Services can be scaled horizontally by adding more instances of low-end computers.
We get a lot of benefit on the organizational level, because:
- Each service can be independently developed, maintained, and deployed by a separate small team.
- Leads to high throughput from the entire organization as a whole.
- Each team is autonomous to decide on: programming languages, frameworks, technologies, release schedule/process they want to follow.
Security benefits are:
- If we have an issue in one of our services, or it crashes, it is easier to isolate it and mitigate the problem.
Micro-Services Considerations:
- We don't get all these benefits out-of-the-box.
- They do come with a fair amount of overhead and challenges
- Follow best practices:
- SRP (Single Responsibility Principle)
- Separate database per Service (see the sketch after this list). Data duplication is to be expected! This is an overhead we need to accept!
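A minimal sketch of the database-per-service idea: each service owns its own data store and exposes only methods (in a real system, a network API), so no other service ever touches its tables directly. The service names and schemas are hypothetical.

```python
# Hedged sketch of "separate database per service": each service owns its own
# SQLite file and only exposes methods, never its tables. Names are hypothetical.
import sqlite3

class UserProfileService:
    def __init__(self) -> None:
        self.db = sqlite3.connect("user_profiles.db")  # owned exclusively by this service
        self.db.execute("CREATE TABLE IF NOT EXISTS profiles (user_id TEXT PRIMARY KEY, name TEXT)")
        self.db.commit()

    def get_name(self, user_id: str) -> str | None:
        row = self.db.execute("SELECT name FROM profiles WHERE user_id = ?", (user_id,)).fetchone()
        return row[0] if row else None

class BillingService:
    def __init__(self, profiles: UserProfileService) -> None:
        self.db = sqlite3.connect("billing.db")        # a completely separate database
        self.db.execute("CREATE TABLE IF NOT EXISTS invoices (user_id TEXT, amount REAL)")
        self.db.commit()
        self.profiles = profiles                        # talks to the other service only via its API

    def create_invoice(self, user_id: str, amount: float) -> str:
        self.db.execute("INSERT INTO invoices VALUES (?, ?)", (user_id, amount))
        self.db.commit()
        # Copying the user's name here instead of joining across databases is the
        # kind of accepted data duplication mentioned in the list above.
        return f"Invoice for {self.profiles.get_name(user_id)}: ${amount}"
```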
In this lecture, we're going to learn about another very popular architectural pattern which is called Microservices Architecture. We will start with the motivation for using this architectural pattern by comparing it to the Monolithic three-tier Architecture pattern, which we learned in a previous lecture. And later we will talk about the advantages, best practices, and challenges that come with this architectural pattern. So what is microservices architecture and when should we use it? In the lecture about the multi-tier architectural pattern, we learned about the three-tier architecture which we also refer to as a monolithic architecture, because all the business logic was concentrated in one single service in the application tier. As we mentioned earlier, the monolithic three-tier architecture is the perfect choice for small teams that manage a fairly small and not too complex codebase. However, as the size and complexity of our codebase grow, it becomes extremely difficult to troubleshoot and add new features to it, as well as build, test, and even load the entire codebase in our IDE. On the organizational scalability dimension, we also start having problems, because the more engineers we add to the team, the more code merge conflicts we get and our meetings become larger, longer, and less productive. Once we start seeing those problems, we need to start switching gears and consider migrating our architecture towards microservices. So what is microservices architecture and how does it solve all the problems that we just mentioned? In contrast to the monolithic three-tier architecture, microservices architecture organizes our business logic as a collection of loosely coupled and independently deployed services. Each service is owned by a small team and has a narrow scope of responsibility. This architectural style offers us a lot of advantages. The narrow scope of responsibility in each microservice makes the codebase a lot smaller than what we had in the monolithic architecture. This provides us with a wide range of benefits. With a small codebase, development just becomes a lot easier and faster. For example, now the code base loads instantaneously in our IDE. Also building and testing the code becomes much easier and faster simply because there are far fewer things to build and test. Additionally, troubleshooting or adding new features also becomes much easier because the code itself is much easier to reason about, and new developers who join the team can become fully productive a lot faster. In terms of performance and scalability, we also get a lot of benefits from breaking the monolithic application into microservices. Each instance of our microservice becomes a lot less CPU intensive and consumes far less memory so it can run much smoother on commodity hardware. And we can scale each such service horizontally by adding more instances of fairly low-end computers. On the organizational scalability, we also get a lot of advantages. Since each service can be independently developed, maintained, and deployed by a separate small team, we can get very high throughput from the entire organization as a whole. On top of that, each team is completely autonomous to decide what programming languages, frameworks, and technologies they want to use and what kind of release schedule or process they want to follow. 
Finally, if all those benefits aren't enough, we also get better security in the form of Fault Isolation, which means that if we have an issue in one of our services, or one of our services starts crashing, it's a lot easier for us to isolate and mitigate the problem than if we had an issue in our single monolithic application. Now, it's hard not to get excited about this style of architecture. Unfortunately, a lot of organizations jump too quickly to microservices without considering two major factors. The first factor is that while, theoretically, we can achieve all those benefits from migrating to microservices, we don't get all of them for free out of the box just by splitting our monolithic codebase into an arbitrary collection of services. So in order for us to get the full benefit of this architecture, there are a few rules of thumb and best practices that we need to follow. And if we don't follow them, we can easily fall into our nemesis, the Big Ball of Mud. The second factor to consider is that microservices do come with a fair amount of overhead and challenges, which have to be taken into consideration before migrating to this architectural style. So first of all, in order for us to achieve full organizational decoupling, so that each team can operate independently, we need to make sure that the services are logically separated in a way that every change in the system can happen in only one service, so it wouldn't involve multiple teams. Otherwise, if every single change requires meeting with another team and coordinating the development and the release of the new feature, then we don't gain much from the migration to microservices. To achieve this, there are a few best practices that we need to follow. The first best practice is the Single Responsibility Principle, which means that each service needs to be responsible for only one business capability, domain, resource, or action. For example, if we have an online dating service, we can have the following microservices, separated by their business capabilities and subdomains. The user profile service is responsible for the business object that relates to the user's profile and every way that the user can interact with it. The image service takes care of the storage, resizing, and presentation of the profile-related images. The matching service is responsible for matching different users based on the rules that we define, as well as the preferences that the users define in their profiles. And finally, the billing service is responsible for everything related to charging users money for the services they're consuming. So now, when the user wants to update their profile or see someone else's profile, they would simply be directed to the user profile service. If they want to upload images, they can be directed to the image service. And when they want to get suggestions for potential partners, they can contact the matching service, which would in turn talk to the user profile service to figure out which profiles would be a good fit for the current user. And of course, whenever the user wants to purchase additional services, they can be directed to the billing service, which in turn may contact the user profile service to unlock certain features for that particular user. Let's take another example from the e-commerce space, and this time we will separate the microservices by the actions related to the way users interact with our system, as well as the entities we have in our system.
On the action side, we have the product search service, which, given a search query from the user, performs a search and returns a personalized view of the relevant products for that particular user. To find those products, it talks to the product inventory service, which encapsulates the products entity. Another action-oriented microservice we have is the checkout service. So when the user goes to checkout, they will see what they have in the cart and also see the amount of tax they need to pay, which we calculate by talking to another action-oriented microservice, the tax calculator. This service calculates the tax based on the product's price, category, and the user's location. And finally, when the user confirms the purchase, the checkout service will orchestrate the entire transaction by talking to the billing service, shipping service, and the product inventory service, which are all entity-oriented services. Additionally, we can also break the API gateway that used to be a monolithic service into multiple microservices. So depending on the type of devices or clients, we can have separate API Gateway services that are more specialized and therefore more lightweight. Now, the second best practice to make sure that there is no coupling between different services is to have a separate database for each service. Otherwise, if two services share a single database, then every single schema or document structure change will require careful coordination between multiple teams. On the other hand, if each service has its own database, then the database essentially becomes an implementation detail of each service, and it can be easily updated or replaced completely transparently to the rest of the system. It's important to note that when we split the original monolithic database and provide each service with its own database, the data has to be split in such a way that each microservice can be completely independent and fully capable of doing its work while minimizing the need to call other services. Of course, some data duplication is expected and is the overhead that we need to accept when we move to this type of architecture. So in conclusion, following all those best practices will definitely set us in the right direction for success using this architectural style. However, I want to reiterate that microservices architecture provides all those benefits, despite the complexity and overhead, only when we reach a certain complexity and organizational scale. It's always best to start with the simple monolithic approach first, and only when that architecture stops working for us should we consider microservices. In this lecture, we learned about a very popular and useful architectural pattern and style called Microservices Architecture. We talked about all the benefits that this architectural style provides us with, especially when we reach a point of complexity and scale at which the monolithic three-tier architecture stops being a good fit for us. All those benefits result in higher organizational and operational scalability, better performance, faster development, better security, and so on. And finally, we talked about a few considerations and best practices that we need to follow to achieve all those benefits when we use Microservices.
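As a rough sketch of the checkout orchestration described in this lecture, the flow could look something like the following, where the service URLs, routes, and payloads are hypothetical placeholders and plain HTTP calls stand in for whatever inter-service communication is actually used.

```python
# Hedged sketch of the checkout service orchestrating other microservices over HTTP.
# Service URLs, paths, and payloads are hypothetical placeholders.
import requests

TAX_SERVICE = "http://tax-calculator.internal/calculate"
BILLING_SERVICE = "http://billing.internal/charge"
SHIPPING_SERVICE = "http://shipping.internal/ship"
INVENTORY_SERVICE = "http://inventory.internal/reserve"

def checkout(user_id: str, cart: list[dict], location: str) -> dict:
    # Ask the tax calculator (an action-oriented service) for the tax amount.
    tax = requests.post(TAX_SERVICE, json={"items": cart, "location": location}).json()["tax"]

    # Orchestrate the entity-oriented services to complete the purchase.
    requests.post(INVENTORY_SERVICE, json={"items": cart})
    requests.post(BILLING_SERVICE, json={"user_id": user_id,
                                         "amount": sum(i["price"] for i in cart) + tax})
    requests.post(SHIPPING_SERVICE, json={"user_id": user_id, "items": cart})

    return {"status": "confirmed", "tax": tax}
```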
Pattern 3: Event-Driven Architecture
- Fact events
- User clicking on an ad
- Item being added to a shopping cart
- Change events
- Player of a video game
- IoT device (lamp)
3 components in an event-driven architecture:
- Event Emitters / Producers
- Event Consumers
- Event Channel / Message Broker
Decoupling of Microservices.
Event Sourcing Pattern.
CQRS Pattern (Command Query Responsibility Segregation).
CQRS Pattern solves 2 problems:
- The first problem is optimizing a database that has a high load of both Read and Update Operations.
- Joining multiple tables located in separate databases that belong to different microservices.
In this lecture, we're going to learn about another architectural style and pattern, which is called Event-Driven Architecture. First, we will get an introduction to what Event-Driven Architecture is. And later, we will learn about all its benefits, as well as use cases where this type of architecture can be very powerful and useful. So what is an Event-Driven Architecture and what are its main components? If we recall from the lecture about microservices, if microservice A wants to communicate with microservice B, then not only microservice A needs to be aware of the existence of microservice B but also it needs to know what API microservice B provides and how to call it. Also, at runtime, microservice A has to call microservice B synchronously and wait for its response. So from what we just described, we can see that, essentially, microservice A has a dependency on microservice B. In an Event-Driven Architecture, instead of direct messages that issue commands or requests that ask for data, we have only events. An event is an immutable statement of a fact or a change. For example, a user clicking on a digital ad or an item being added to a shopping cart can be thought of as fact events while a player of a video game or an IoT device, like a vacuum cleaner, moving from one position to another can be thought of as change events. In an Event-Driven Architecture, we have three components. On the sending side, we have the event emitters, which are also referred to as producers. On the receiving end, we have the event consumers. And in between, we have the event channel, which is essentially the message broker we learned about in a past lecture. When we use the Event-Driven Architecture style with microservices, we can get a lot of very useful properties and benefits. For example, now, if microservice A communicates with microservice B by producing events, the dependency of microservice A on microservice B is removed. In fact, microservice A doesn't need to know anything about the existence of microservice B. And once microservice A produces the event, it doesn't need to wait for any response from any consumer. Because services don't need to know about each other's existence or API and all the messages are exchanged completely asynchronously, we can decouple microservices more effectively, which in turn provides our system with higher scalability. And we can add more services to the system without making any changes. For example, if we have a banking system, we can start with just two services. The first service is the front-end service, which simply provides the user with the user interface and takes in the user's input, such as money deposits, transfers between accounts, and so on. The second service is the account service, which maintains and updates the balance for each user. Now, if we use an Event-Driven Architecture, then every action that the user performs on their account can be events that are produced by the front-end service. And the account service simply subscribes to those events so it can consume them and update the user's balance. Now, because we decouple those services using a message broker, we can easily add a mobile notification service, which subscribes to the same channel. And every time a user makes changes to their account, the service can send push notifications to the user's mobile device about the activity in their account. Notice that adding the service was super easy and did not require us to make any changes to the front-end service. 
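Here is a minimal, in-process sketch of the three components for the banking example above; a real system would put a message broker between producers and consumers, and the event fields shown are hypothetical.

```python
# Minimal in-process sketch of producers, an event channel, and consumers.
# A real deployment would use a message broker; the event fields are hypothetical.
from collections import defaultdict
from typing import Callable

class EventChannel:
    """Stands in for the message broker between producers and consumers."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer never knows who (if anyone) consumes the event.
        for handler in self._subscribers[topic]:
            handler(event)

channel = EventChannel()
balances: dict[str, float] = defaultdict(float)

# Consumer 1: the account service updates balances.
def update_balance(event: dict) -> None:
    balances[event["account"]] += event["amount"]

# Consumer 2: a notification service added later, with no change to the producer.
def notify(event: dict) -> None:
    print(f"push notification: {event['type']} of {event['amount']} on {event['account']}")

channel.subscribe("account-events", update_balance)
channel.subscribe("account-events", notify)

# Producer: the front-end service just publishes facts about what the user did.
channel.publish("account-events", {"type": "deposit", "account": "alice", "amount": 100.0})
channel.publish("account-events", {"type": "transfer-out", "account": "alice", "amount": -30.0})
print(balances["alice"])  # 70.0
```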
Later, we can just as easily add a Fraud Detection Service, again, without making any changes to the existing services. And similarly, we can add another producer service that can integrate with other third-party services. For example, it can integrate with utility companies that charge our clients automatically for gas, electricity, or water. Or we can integrate with payroll services that perform direct deposits to the client's account on behalf of the client's employer. And adding this producer will also not require us to make any changes to our system. In addition to the horizontal and organizational scalability, Event-Driven Architecture allows us to analyze streams of data, detect patterns, and act upon them in real time. For example, the Fraud Detection Service can detect suspicious activity in the user's account by looking at the stream of events that happen in the account in real time without waiting for this data to be post-processed and stored in some database. For example, it can look at the recent transactions, and notice that in the last hour, two transactions happened in stores and restaurants in Los Angeles, California while one of the transactions happened in a remote location somewhere in a different state. The Fraud Detection Service can easily detect the suspicious activity, which is possibly a result of someone stealing our user's credit card or account information and trying to make a purchase. Similarly, by analyzing a stream of transactions in real time, it can detect that five different transactions were made within a short amount of time, which simply cannot correlate with real-human activity. This activity would be immediately flagged as fraudulent and would trigger the Fraud Detection Service to communicate with the account service, which would freeze the user's account. And it would also communicate with the notification service, which would alert the user about the situation. When we store all the events that happen in our system inside a message broker besides performing real-time analysis, we can also use this information to implement very powerful architectural patterns. One of those architectural patterns is called event sourcing. If we continue with the same example of the banking system and imagine what a hypothetical log of events for a particular user would look like, we can notice that, essentially, this event log represents all the transactions that happened in the user's account. And if we replay all those transactions from the beginning of time, we can arrive at the account balance that is currently stored in the account service database. So by using the event sourcing pattern instead of storing the current state in a database, we can simply store only the events, which can be replayed whenever we need to know the current state, and we eliminate the need for the database. Now, because events are immutable, we never modify them. And we simply append new events to the log as they come. By using this pattern, we can add another service that can generate a statement or look up any number of transactions that happened in our user's account simply by looking at the last N events in the log. And if our Fraud Detection Service decided that one of those transactions was not approved by the customer, it can easily be fixed by adding another event into the log that compensates for it, and all of that without the need to freeze the user's account or modifying records in a database. 
For example, if the user was charged $100 by somebody who stole the user's credit card information, the Fraud Detection Service can simply credit the user's account with $100. And all of that information will be reflected in the user's statement immediately. Using the event sourcing pattern, we can choose to store those events for as long as we want and allow the user to look back at his transactions from 5 or even 10 years ago. But we can also make our querying faster by adding snapshot events, let's say, every month that summarizes everything that happened until that point of time. Another very powerful architectural pattern that we can implement using Event-Driven Architecture is called CQRS. CQRS stands for command query responsibility segregation. This pattern can solve two problems for us. The first problem is optimizing a database that has a high load of both Read and Update operations. When we have a database that has a high load of both Read and Update operations, concurrent operations to the same records or tables contend with each other, making all the operations very slow. Additionally, if we use a distributed database, generally, we can optimize it only for one type of operation at the expense of the other. For example, if we have a read-intensive workload, we can compromise on slower writes, or if we have our write-intensive workload, we can compromise on the performance of Read operations. However, in the case when both operations are equally frequent, we have a problem. The CQRS architectural pattern allows us to separate Update and Read operations into separate databases, sitting behind two separate services. In this case, service A would take all the Update operations and perform those updates in its own database, where it optimally stores the data for such updates. But additionally, every time an Update operation is performed, it also publishes an event into a message broker. Meanwhile, service B subscribes to those update events and applies all those changes in its own read optimized database. And now all the Read operations will only go to service B. Now both Update and Read operations can go to two separate services without interfering with each other. And the data in each service is stored in an optimized way for each type of operation. The second problem that CQRS architectural pattern helps us solve is joining multiple tables that are located in separate databases that belong to different microservices. Before we split our monolithic application into microservices, we had all the data tables inside one single database. If that was a relational database, we could easily and relatively quickly combine and analyze records from multiple tables by simply using the SQL Join operation. However, once we migrate to microservices architecture and follow the best practice of having a separate database for each microservice, those Join operations are a lot harder. Now we need to send a request to each service separately, which is a lot slower. And then we also need to combine this data programmatically because now we potentially have different types of databases, some of which may not even be relational databases. So CQRS solves exactly that problem. Now, every time there is a change in the data stored in service A or service B databases, those services would also publish those changes as events to which service C subscribes. 
Meanwhile, service C stores what's called a materialized view of the joined ready-to-query data from both service A and service B in its own read-only database. And now, whenever we need to get a join view, we don't need to send a request to two different services. Instead, we need to send a request to service C only, which already has the data ready for us. Now let's demonstrate both of those use cases for CQRS pattern in a real-life example. Let's assume that we have an online store where we have hundreds of thousands of products and we have millions of users who search for and leave reviews for those products daily. So if we use the microservices architecture, we would have the product service that has its own database. That database contains things like the product's name, their inventory count, description, price, and so on. We would also have the review service, which manages and stores all the reviews for all those products. So as we can imagine, reviews are coming to our system constantly so updates to the reviews database are very frequent. Similarly, our product's inventory keeps changing, as people purchase products or additional products are added and updated. But we also need to read and combine the product's information and their reviews information very quickly and frequently because users constantly search for products and want to see both their description and prices, which would come from the product service, and also see each product's reviews and rating, which we would need to get from the review service. So to solve this problem, we can use the CQRS architectural pattern. By using this pattern, we can add the product search service, which would have its own database. This database would store all the necessary combined data for each product and its reviews. And the service would also subscribe to updates to this data from the review service and the product service. So now a user who wants to search for a vacuum cleaner or maybe a pair of shoes can get a page with all the potential results very quickly by sending a request only to the product search service. And if the user wants to sort by or filter by the number of reviews or rating, it can also be done very easily and quickly. In this lecture, we learned about the Event-Driven Architecture. We learned about a few benefits that this type of architecture provides to us, especially when combined with the microservices architecture. A few of those benefits include decoupling microservices, which provides us with higher horizontal and organizational scalability. Additionally, Event-Driven Architecture allows us to analyze and respond to large streams of data in real time. We also learned about two very useful and powerful architectural patterns within Event-Driven Architecture. The first one was event sourcing, which allowed us to store and audit the current state of a business entity by only appending immutable events and replaying them when we need to. And the second architectural pattern was CQRS, which allowed us to optimize our database for both Update and Read operations by completely splitting the operations to different services. This pattern also allowed us to quickly and efficiently join data from completely different services and databases. I hope you learned a lot in this lecture, and I will see you all very soon.
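To round off the lecture, here is a small sketch of the event sourcing idea: the append-only log is the source of truth, the current balance is derived by replaying events, and a fraud correction is just another appended event. The event format is a hypothetical one.

```python
# Hedged sketch of event sourcing: state is derived by replaying an immutable,
# append-only event log. The event structure below is a hypothetical example.
event_log: list[dict] = []

def append_event(account: str, amount: float, reason: str) -> None:
    # Events are immutable facts; we only ever append, never modify.
    event_log.append({"account": account, "amount": amount, "reason": reason})

def current_balance(account: str) -> float:
    # The current state is not stored anywhere; it is replayed from the log.
    return sum(e["amount"] for e in event_log if e["account"] == account)

append_event("alice", 500.0, "salary deposit")
append_event("alice", -100.0, "card purchase")
# Fraud detected: compensate with another event instead of editing history.
append_event("alice", 100.0, "fraud compensation")

print(current_balance("alice"))  # 500.0
```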
Section 8: Big Data Architecture Patterns
Introduction to Big Data
3 Vs:
- Volume
- Variety
- Velocity

Value from big data: visualization, querying capabilities, predictive analysis.
In this lecture, we're going to get an introduction and motivation for big data processing. So let's start with describing what big data means. Big data is a term used to describe datasets that are either too large in size, too complex in structure, or come to our system at such a high rate that exceeds the capacity of a traditional application to process it fast enough to provide any value. There are many characteristics of big data, but three of them are the most prominent. The first characteristic of big data is volume. Volume refers to the quantity of the data that we need to process, store, and analyze. When we talk about big data, we're talking about large quantities of data, typically in the order of magnitude of terabytes, petabytes, or even more per day. A few technology fields where we have a high volume of data include internet search companies that have to analyze the entire internet and provide instant search capabilities for their users. Another example is medical software systems that collect, analyze, and store a lot of information about patients in hospitals or clinics and can help in preventing or detecting diseases. Then we have real-time security systems that analyze multiple video streams coming from cameras located throughout certain neighborhoods, cities, or high security facilities. The purpose of those systems is to help combat crime. Finally, we have the weather prediction systems that analyze a lot of data from different sensors located on satellites, as well as in different locations throughout the large geographical region. Those systems can help us predict the weather, as well as alert us about upcoming storms or tsunamis. The second characteristic of big data is variety. In traditional non-big data systems, we typically work with a limited number of structured and well-defined types of data. However, when we move to the field of big data, we can have a large variety of potentially unstructured data that we collect from multiple sources. Our goal is to process all that data and combine it together through a process called data fusion. This can help us find hidden patterns or provide business insights for our organization that aren't obvious if we analyze only one data source. An example of that are social media services or apps that collect a lot of different types of data about the behavior of their user base in real time. For instance, they can collect information about the user's clicks, likes, shares, or posts, as well as capture the amount of time a user spent watching a particular video or even hovering over an article or a digital ad. All those types of seemingly unrelated activities can be combined together and build models that can predict the behavior and response of each user to future ads of particular products. But also on an aggregate level, if we combine all that data from multiple users, we can detect internet trends, as well as clusters of interest. Now finally, the third characteristic of big data is velocity. When we deal with big data, we normally have a continuous stream of data that comes to our system at a very high rate. The high rate of incoming data can be either due to the large scale of our system or simply the high frequency of events. For example, if we have an online store that operates on a global scale with millions of users visiting our website every day, browsing and purchasing our products, then the higher rate of events simply comes from the fact that we have a very large number of users. 
On the other hand, we have the field of internet of things which deals with connecting multiple devices and getting analytics from their sensors. In this case, we can have a relatively small fleet of buses, trains, or autonomous cars, but all those vehicles can generate lots and lots of data points from their sensors about their location, speed, surrounding objects and so on. Similarly, we can have a food or clothing production factory which is full of robots, assembly lines, and other machinery that constantly generate data about their production quality and speed from their sensors. In both those use cases, the number of devices may not be large. However, each sensor on a robot, a piece of machinery, or autonomous car can generate a continuous stream of data points that we have to ingest very quickly. Now, it's important to point out that storing and processing big data is pretty complex, as well as very expensive, but the value we get from it usually outweighs the cost associated with it. The insights we get from analyzing big data can provide a significant competitive advantage over our competitors. Those insights can come in a form of visualization, querying capabilities or predictive analysis. Visualization is a very powerful tool that can allow humans to make sense of otherwise meaningless data stored in some file system or database. In many cases, after we collect a lot of data, we don't necessarily know what to do with it or how we can benefit from it right away, so querying capabilities allow us to run ad hoc analysis on that data, which eventually helps us find those insights or patterns that were not obvious before. And finally, on the predictive analysis front, we can go as fancy as building algorithms or machine learning models to predict the behavior of our users and suggest products that they will more likely purchase. But we can also go as simple as detecting anomalies in our system by analyzing logs coming from our servers and automatically roll back a new release or alert the engineers on-call. So now that we have the understanding and the motivation behind big data processing, let's proceed and learn about a few architectural styles that help us in processing and analyzing big data.
Big Data Processing Strategies
Now that we have a good introduction to the field of Big Data in general, let's learn about two architectural patterns for processing big data using event-driven architecture. But before we talk about big data processing, let's clearly define the problem we're trying to solve.
The Problem
Let's assume we're getting a continuous stream of data coming from different sources into our system. This may be user interactions on our social media platform, logs or metrics coming from production application instances, or data coming from transportation devices such as planes, trains, or autonomous cars. All this large volume of raw data is coming to us in real time. And we want to process it so we can analyze it and provide insights, visualization, or predictions. Now how do we go about processing this data? Well, there are generally two strategies or patterns we can use.
Strategies To Solve
Batch Processing
The first strategy is called Batch Processing. When we use batch processing, we normally store the incoming data either in a distributed database or, more commonly, directly on a distributed file system. That data is never modified, and we only append to it as more data arrives. The key principle in batch processing is that we do not process each piece of data that comes into our system individually. Instead, we process the data in batches of records, and we typically perform this processing on a fixed schedule, though we can also run it whenever a fixed number of records has accumulated. The batch processing job can run on any schedule that fits our use case, which can be once a month, once a day, once an hour, and so on. The logic that goes into the job that processes this data is written by us, and we can easily update it whenever we want to.

Every time this batch processing job runs, it picks up the new raw data that arrived since the last time it ran, analyzes it, and produces an up-to-date view of all the data we currently have. This view can be stored in a well-structured and indexed database that we can easily query to get the desired insights. It's important to note that the view we generate should reflect the knowledge we have about our entire dataset. Depending on the use case, the batch processing job can pick up only the data that arrived recently, or it can process the entire dataset from scratch to produce a completely new view of the data we have collected.

Let's look at two use cases that fit batch processing particularly well. Assume we have an online learning subscription platform where we offer thousands of video courses to millions of students around the world. While students watch those video courses, they can leave a review and rate the course they are currently watching to reflect their satisfaction with the course at that point in time. On our end, what we get is a stream of two types of events. The first type of event indicates the progress a student is making through a video course; for example, for every additional minute that a student watches, we get an event into our system. If we have a hundred million students on our platform, then even if only 10% of them are active at a given moment, we can expect roughly ten million events per minute, which definitely falls within the scope of big data. The other type of event we get is a review and a star rating for a particular course.

We do not need to process any of this data in real time. However, if we process it in batches and take into account the historical data for each course, we can provide a lot of mission-critical insights for our business. For example, we can use the data about content consumption to compensate our instructors based on the percentage of the content that was consumed from their courses out of all the available content on the platform. Also, based on the ratings we get for each course, we can recalculate the average rating for each course daily. However, we can go much further and fuse the data from those two types of events. For example, when we calculate the average rating for each course, we can give higher weight to ratings that came from students who watched a higher percentage of the course and lower weight to ratings that came from students who barely watched a few lectures, as sketched in the example below.
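As a minimal sketch of what the core of such a batch job could look like for the course-rating example, assuming the job receives the day's events already reduced to the fields it cares about, here is one way to fuse the two event types. Weighting by the fraction of the course watched is an illustrative choice, not the platform's actual formula.

```python
from collections import defaultdict


def build_rating_view(progress_events, rating_events):
    """Fuse progress and rating events into a per-course view.

    progress_events: iterable of (student_id, course_id, fraction_watched)
    rating_events:   iterable of (student_id, course_id, stars)
    Returns a dict of {course_id: completion-weighted average rating}.
    """
    # How much of each course each student has watched so far (keep the max seen).
    watched = defaultdict(float)
    for student_id, course_id, fraction in progress_events:
        key = (student_id, course_id)
        watched[key] = max(watched[key], fraction)

    # Weight each rating by how much of the course that student watched.
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for student_id, course_id, stars in rating_events:
        weight = watched.get((student_id, course_id), 0.0)
        weighted_sum[course_id] += weight * stars
        weight_total[course_id] += weight

    return {
        course_id: weighted_sum[course_id] / weight_total[course_id]
        for course_id in weighted_sum
        if weight_total[course_id] > 0
    }
```

On each scheduled run, the job would pick up the events that arrived since its last run (or re-read the entire dataset), call build_rating_view, and write the result into the indexed database that serves queries.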
With this kind of weighting, if we have 10 students who left a one-star rating but watched less than 10% of the course, and another 10 students who left a five-star rating and watched the entire course, the weighted average lands much closer to five stars (with weights proportional to the fraction watched, around 4.6 stars rather than the unweighted 3.0), which gives us more confidence that the rating reflects the quality of the course accurately. Another example of data fusion from those two types of events is ranking the courses in each category based on both the rating and the overall engagement of the students in each course. Finally, we can combine all that data to build a machine learning model that predicts which type of student can benefit from which courses, and then send them push notifications or display those courses at the top of the page.

Another very common use case for batch processing is search engine services. When we have large amounts of data, such as websites, articles, or images, that we want to provide search capabilities for, trying to scan all of it every time a user performs a search would take hours to produce a result. So what search companies typically do is crawl the entire dataset periodically, then organize, index, and store it in a way that is very easy and fast to search. Because this crawling process takes a very long time anyway, and there is no real need or expectation that every new website or article will appear in the search results immediately, this is also a perfect use case for batch processing (a sketch of the indexing step follows this comparison).

Now that we have some intuition for batch processing from real use cases, let's talk about the advantages and disadvantages of this processing model. The first advantage of batch processing is that it's relatively easy to implement, because we don't have to worry about latency. Batch processing also provides us with high availability: until a currently running job is done analyzing the entire dataset and producing the new view, the old view is still available for querying, so there is essentially no downtime for the users. Another advantage of batch processing is efficiency; processing data in batches is usually a lot more efficient than processing each incoming record individually. With batch processing we also have much higher fault tolerance towards human error. For example, if we push some bad code to our processing job, there is not much harm done, because we still have our original data; we can fix the bug, redeploy the processing job, and run it again over the entire dataset, which produces a new and correct view of our data. And finally, batch processing can perform very complex and deep analysis of large datasets that span years of data points, which can give us very advanced prediction models and insights.

On the other hand, batch processing has one major drawback that makes it unsuitable for many use cases: the long delay between the data coming into our system and the result we get from the processing job. If we run our batch processing job every day or every hour, we don't get a real-time view of the data coming in, and if we don't get a real-time view of what's happening on our platform, we can't respond to it fast enough, which is extremely important in many use cases. This also forces our users to wait a long time before they can get feedback on actions they take in our system, which may cause some confusion if they're not aware of the fact that we analyze our data in batches.
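As promised, here is a minimal sketch of the indexing step from the search-engine use case: a batch job that builds an inverted index over crawled documents so that queries become fast lookups. The whitespace tokenization and the (doc_id, text) document shape are simplifications for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it.

    documents: iterable of (doc_id, text) pairs produced by the crawl.
    """
    index = defaultdict(set)
    for doc_id, text in documents:
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = index.get(words[0], set()).copy()
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Rebuilding this index from scratch on a schedule is exactly the batch pattern described earlier: slow to produce, but cheap and fast to query once built.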
Coming back to that delay: for example, if we have an online store that uses batch processing to index the products on our platform for search purposes, a merchant who just added or updated their product description may be surprised that their product doesn't show up in the search results on our online store. In this particular case, we can simply inform them that those changes may take effect only after one business day, and they may be okay with that. But in other cases, batch processing just won't work. For example, if we have a log and metrics analysis system that collects streams of log files and data points from thousands of production application instances, we have to be able to ingest and visualize all this data in real time, because when there's a production issue in our data center, the engineers on call need those logs and metric graphs available immediately so they can figure out how to fix the issue. Similarly, if we have an online stock trading system, we have to be able to take in both bids and asks, match them together so people can trade on our platform, and also update the price of each stock in real time.

So the second way we can process big data is real-time processing. In real-time processing, we put each new event that comes into our system in a queue or a message broker, and on the other end we have a processing job that processes each individual data record as it arrives. After a record is processed, the processing job updates a database that provides querying capabilities for real-time visualization and analysis. The obvious advantage of the real-time processing model is that we can analyze and respond to data immediately as it comes into our system, without having to wait hours for a job to process it. The drawback of real-time processing is that it's very hard to do any complex analysis in real time, so we don't get as much value or insight into our system as we would with batch processing. Additionally, fusing data from events that happened at different points in time, or analyzing historical data, is nearly impossible, so we're limited to the recent data we currently have when providing those insights or predictions. A minimal sketch of this real-time path follows at the end of the lecture.

In this lecture, we learned about two different strategies for processing the large volumes of data coming into our system: batch processing and real-time processing. We covered quite a few use cases that fit each of those models and compared the two strategies by listing their advantages and disadvantages. I will see you soon in the next lecture.
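As referenced above, here is a minimal sketch of the real-time path, assuming Python's built-in queue stands in for the message broker and a plain dict stands in for the indexed database; each event is a plain dict with the same fields as the hypothetical RawEvent shown earlier. A real system would use dedicated infrastructure for both.

```python
import queue

event_queue = queue.Queue()  # stand-in for a message broker (assumption)
realtime_view = {}           # stand-in for the database: course_id -> (count, total_stars)


def process_event(event, view):
    """Update the view incrementally from a single rating event."""
    if event["event_type"] != "course_rating":
        return  # other event types would have their own handlers
    course_id = event["payload"]["course_id"]
    stars = event["payload"]["stars"]
    count, total = view.get(course_id, (0, 0.0))
    view[course_id] = (count + 1, total + stars)


def run_consumer():
    """Consume events one at a time, as they arrive, and keep the view fresh."""
    while True:
        event = event_queue.get()  # blocks until the next event arrives
        process_event(event, realtime_view)
        event_queue.task_done()
```

The average rating for a course is then total divided by count at query time. The view is always fresh, but each update can only look at one event plus whatever running state we keep, which is exactly the trade-off described in this lecture.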