Planning and Assessing I/O Performance
System/Storage IO Performance: IOPS, Latency, Bandwidth, and Throughput

The terms IOPS, latency, bandwidth, and throughput are used frequently in connection with I/O performance. It is often confusing when IOPS numbers do not correlate with throughput across different system or storage configurations. It is important to understand these terms and the factors that influence overall I/O performance. Here is a quick reference covering the terms, the influencing factors, and a few example scenarios illustrating these concepts.

IOPS: Input/output operations per second

IOPS is the number of read and write requests a storage system can service in one second. You might use an IOPS figure to describe the amount of I/O generated by a database, or you might use it to define the maximum performance of a storage system. One is a real-world value and the other a theoretical maximum, but both use the term IOPS.

Latency for random and sequential I/O

For hard disk drives, every time you need to access a block, the disk actuator arm has to move the head to the correct track (seek time), and then the disk platter has to rotate until the correct sector passes under the head (rotational latency). The time this mechanical action takes (seek time + rotational latency) is what we call I/O latency.

Obviously the amount of time depends on where the head was previously located and how fortunate you are with the location of the sector on the platter: if it is directly under the head you do not need to wait, but if it has just passed the head you have to wait for a complete revolution. Even on the fastest 15k RPM disk, a full rotation takes 4 milliseconds (15,000 rotations per minute = 250 rotations per second, so one rotation is 1/250th of a second, or 4ms). On top of that you will have roughly 1ms of seek time, for a total of around 5ms of I/O latency to read or write that block. And a typical workload needs to read or write a large number of blocks.
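The rotational-latency arithmetic above can be reproduced with a short sketch. This is a simplified model using the full-rotation figure from the text (the worst case; the average wait is half a rotation), and the 1ms seek time is the illustrative number assumed above, not a measured value:

```python
def rotation_time_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    rotations_per_second = rpm / 60      # 15,000 RPM -> 250 rotations/sec
    return 1000 / rotations_per_second   # one rotation in milliseconds

# A 15k RPM disk: one full rotation takes 4 ms.
print(rotation_time_ms(15000))       # 4.0
# Add ~1 ms of assumed seek time for ~5 ms total per random I/O.
print(rotation_time_ms(15000) + 1)   # 5.0
```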

What about the next block? Well, if that next block is somewhere else on the disk, you will need to incur the same penalties of seek time and rotational latency. We call this type of operation a random I/O. But if the next block happened to be located directly after the previous one on the same track, the disk head would encounter it immediately afterwards, incurring no wait time (i.e. no latency). This, of course, is a sequential I/O.

Bandwidth or Throughput

Bandwidth and throughput describe volumes of data transferred and are slightly different from IOPS. Bandwidth usually describes the maximum theoretical limit of data transfer, while throughput describes a real-world measurement; you might say that bandwidth is the maximum possible throughput. Bandwidth and throughput figures are usually given in units of size over units of time, e.g. MB/sec or GB/sec.

Throughput is defined as: Throughput = IOPS x I/O size. It is time to start thinking about that I/O size now. If we read or write a single random block in one second, then the number of IOPS is 1 and the I/O size is also 1 (using "blocks" as the unit to keep things simple). The throughput can therefore be calculated as (1 x 1) = 1 block/second.

Alternatively, if we wanted to read or write eight contiguous blocks from disk as a sequential operation then this again would only result in the number of IOPS being 1, but this time the I/O size is 8. The throughput is therefore calculated as (1 x 8) = 8 blocks / second.
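The two cases above can be expressed as a tiny sketch of the Throughput = IOPS x I/O size relationship (units are "blocks" as in the text):

```python
def throughput(iops, io_size_blocks):
    """Throughput in blocks/second = IOPS x I/O size."""
    return iops * io_size_blocks

# One random, single-block operation per second:
print(throughput(1, 1))   # 1 block/second
# One sequential operation covering eight contiguous blocks:
print(throughput(1, 8))   # 8 blocks/second
```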

Hopefully, you can see from this example the great benefit of sequential I/O on disk systems: it allows increased throughput. Every time you increase the I/O size you get a corresponding increase in throughput, while the IOPS figure remains resolutely fixed. But what happens if you increase the number of IOPS?

Latency impacts disk performance on random I/O

In the example above, a single-threaded process reads or writes a single random block on a disk. That I/O incurs a certain amount of latency, as described earlier (seek time plus rotational latency). We said the rotational latency of a 15k RPM disk is 4ms, so let us add another millisecond for the disk head seek time and call the average I/O latency 5ms. How many (single-threaded) random IOPS can we perform if each operation incurs an average 5ms wait? The answer is 1 second / 5ms = 200 IOPS. Our process is hitting a physical limit of 200 IOPS on this disk.

What do you do if you need more IOPS? With a disk system you only really have one choice: add more disks. If each spindle can drive 200 IOPS and you require 80,000 IOPS, then you need (80,000 / 200) = 400 spindles. Buying hundreds of disks just to reach an IOPS target quickly becomes impractical.
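Both calculations, the per-disk IOPS ceiling and the spindle count, can be sketched as follows (the 5ms latency and 80,000 IOPS target are the illustrative numbers from the text):

```python
import math

def max_iops(latency_ms):
    """Single-threaded IOPS ceiling when each I/O waits latency_ms."""
    return 1000 / latency_ms

def spindles_needed(target_iops, iops_per_disk):
    """Number of disks required to reach target_iops, rounded up."""
    return math.ceil(target_iops / iops_per_disk)

per_disk = max_iops(5)                    # 5 ms per I/O -> 200 IOPS
print(per_disk)                           # 200.0
print(spindles_needed(80000, per_disk))   # 400
```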

On the other hand, if you can perform the I/O sequentially you may be able to reduce the IOPS requirement and increase the throughput, allowing the disk system to deliver more data.

I/O performance is determined by three workload characteristics: pattern, operation size, and read/write mix. These characteristics are most meaningful when considered together with latency, and they matter most for random workloads.

Pattern: Sequential vs. Random IO

The pattern describes how the I/Os are laid out: random or sequential. Random I/O patterns are scattered uniformly across the data set. Sequential I/O patterns operate on data that is located close together.

Operation size: Small, large, and extra-large

Operation size is the size of each read or write. Typical ranges are small (4 to 8 KB), large (~64 KB), and extra-large (512 KB to 2+ MB). It is common to see small operation sizes with random patterns and large operation sizes with sequential patterns.

Mix: Reads vs. writes

Mix is the composition of reads and writes, usually expressed as a percentage of reads and a percentage of writes for the given scenario.

Here are some examples of workload characteristics. OLTP database workloads are random in pattern, 8 KB in size, with a mix of roughly 70% reads / 30% writes. Data backup workloads are sequential in pattern, 64 KB in size, with a mix of 100% reads. Video workloads are sequential in pattern, 512+ KB in size, with a variable mix of reads and writes.
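Tying the operation sizes back to the throughput formula, a sketch like the following estimates throughput for each profile. The I/O sizes come from the text; the IOPS figures are hypothetical values chosen purely for illustration, not measurements:

```python
# Workload profiles: sizes from the text, IOPS figures assumed.
workloads = {
    "OLTP database": {"iops": 10000, "io_size_kb": 8},
    "Data backup":   {"iops": 2000,  "io_size_kb": 64},
    "Video stream":  {"iops": 500,   "io_size_kb": 512},
}

for name, w in workloads.items():
    # Throughput = IOPS x I/O size, converted from KB/sec to MB/sec.
    mb_per_sec = w["iops"] * w["io_size_kb"] / 1024
    print(f"{name}: {mb_per_sec:.0f} MB/sec")
```

Note how the backup and video profiles reach far higher throughput from far fewer IOPS, which is exactly the sequential-I/O advantage described earlier.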