There are two common ways to scale your application: Vertical Scaling and Horizontal Scaling.

Vertical & Horizontal Scaling

Let’s say you have your own web app running on a single server. As the number of users grows rapidly, you may start to hit performance bottlenecks in the following areas:

  • CPU.
    • Some functions require a lot of processing power, such as image conversion or video decoding.
  • Memory.
    • Your app may need to hold a lot of data in memory. Once memory fills up, it can’t process as many requests at the same time.
  • I/O.
    • I/O is how fast your application can read from and write to your storage.
    • If you store images or videos on the hard drive, there is a limit on how much data you can access at once.
  • Bandwidth.
    • If you are streaming, the amount of data that you can push through the network from a single server is also limited.

Vertical Scaling (Scale-Up)

If your server is slow, a simple way to resolve this is to get a bigger server with more resources, i.e. more CPUs, more memory, or more storage space. That’s what Vertical Scaling, or Scale-Up, means.

  • Easiest way to scale an application.
  • Diminishing returns and limits to scalability.
    • There is a limit to the number of CPU cores or the amount of memory you can add to one server. As the server gets larger, you get less effect for more money.
  • Single point of failure.

Horizontal Scaling (Scale-Out)

Instead of having a single huge server handle all the traffic, we have multiple smaller servers, and we can add or remove servers as demand changes.

  • More complexity up front, but more efficient long term.
  • Redundancy built-in.
  • Needs a load balancer to distribute traffic.
  • Cloud providers make this easier.
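As a rough illustration of what a load balancer does, here is a minimal round-robin sketch in Python. It is not how any particular load balancer product works. The class name `RoundRobinBalancer` and the server names are made up for this example, and a real balancer would also handle health checks and failed servers.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical server names, used only for illustration.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
```

Six requests are spread evenly over the three servers, so each one handles two of them.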


  • Hadoop.
    • Hadoop is based on MapReduce. It breaks up a massive amount of data and splits it off so that it can be worked on by thousands of different servers, then puts the results back together.
    • It essentially abstracts away all the complexity of handling those thousands of servers.
  • Docker.
    • Docker allows you to put your applications in containers and easily deploy them to various servers.
  • Kubernetes.
    • Kubernetes orchestrates containers, such as those built with Docker, and again abstracts away a lot of the complexity of dealing with all those servers.
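To make the MapReduce idea behind Hadoop more concrete, here is a small word-count sketch in plain Python. It does not use Hadoop or its API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and everything runs in one process, whereas a real cluster would send each chunk to a different server.

```python
from collections import defaultdict


def map_phase(chunk):
    # Emit a (word, 1) pair for every word in one chunk of text.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all values by key, as a framework would between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Combine the values for each key into a single result.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different server;
    # here the chunks are processed one after another in a single process.
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


# Two made-up chunks standing in for pieces of a much larger data set.
chunks = ["scale up scale out", "scale out wins"]
counts = word_count(chunks)
print(counts)
```

The split into map, shuffle, and reduce steps is what lets the work be spread across many machines, since each chunk can be mapped independently of the others.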


Diminishing Returns - Vertical Scaling

Figure: Web Server Elapsed Time.

The diagram above shows the total elapsed time for different server specs across different web server configurations.

For Apache 2.2, the blue bar (1 core) takes about 5 hours, the red bar (2 cores) drops to about 2 hours 40 minutes, the green bar (4 cores) to about 1 hour 30 minutes, and the purple bar (8 cores) to about 50 minutes. The trend is clear: doubling the CPU cores does not double the performance.

That’s what is called Diminishing Returns.
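The trend can be checked with a bit of arithmetic. The sketch below takes the approximate Apache 2.2 times quoted above (converted to minutes) and computes the speedup over one core and the efficiency per core. The helper names `speedup` and `efficiency` are just for this example, and the times are rough readings, not exact measurements.

```python
# Approximate elapsed times for Apache 2.2, in minutes, keyed by core count.
elapsed_minutes = {1: 300, 2: 160, 4: 90, 8: 50}

baseline = elapsed_minutes[1]


def speedup(cores):
    # How many times faster than the single-core run.
    return baseline / elapsed_minutes[cores]


def efficiency(cores):
    # Speedup per core: 1.0 would mean perfect linear scaling.
    return speedup(cores) / cores


for cores in elapsed_minutes:
    print(f"{cores} cores: speedup {speedup(cores):.2f}x, "
          f"efficiency {efficiency(cores):.0%}")
```

With these numbers, 8 cores give about a 6x speedup rather than 8x, and the efficiency per core falls with every doubling.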

Latency - Horizontal Scaling

Figure: Cloud provider data center locations.


  • Initially, the cost of Horizontal Scaling is higher than that of Vertical Scaling, primarily because of the complexity that multiple servers introduce.
  • As we begin to scale, the cost becomes more linear, because we have already paid the upfront cost of designing the system to scale horizontally.
  • With Vertical Scaling, as you need more capacity, the components and servers get much more expensive, and eventually the cost grows exponentially.
  • Past a certain point of scaling vertically, you will have to transition to a horizontally scaled system and change the architecture of your app, which takes a lot of engineering time and money.

Vertical vs Horizontal Scaling

Figure: Horizontal scaling becomes much cheaper after a certain threshold
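The shape of the two cost curves can be sketched with a toy model. Every number below is invented for illustration: the exponential growth factor for vertical scaling, and the upfront and per-unit costs for horizontal scaling, are assumptions rather than real pricing. The point is only to show how a linear curve with a high starting cost eventually drops below an exponential one.

```python
def vertical_cost(capacity, base=100.0, growth=1.5):
    # Hypothetical: each extra unit of capacity multiplies the price.
    return base * growth ** (capacity - 1)


def horizontal_cost(capacity, upfront=1000.0, per_unit=100.0):
    # Hypothetical: a fixed design cost plus a linear per-server cost.
    return upfront + per_unit * capacity


def crossover(max_capacity=100):
    # First capacity at which horizontal scaling becomes the cheaper option.
    for capacity in range(1, max_capacity + 1):
        if horizontal_cost(capacity) < vertical_cost(capacity):
            return capacity
    return None


threshold = crossover()
print(f"Horizontal scaling becomes cheaper at {threshold} capacity units")
```

Changing the made-up parameters moves the threshold, but as long as one curve is exponential and the other linear, a crossover point exists.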