The Meltdown Exploit Explained (Part 1): The Basics

It has been a while since the terms “Meltdown” and “Spectre” exploded on the Internet. The media and large parts of the information technology (IT) security community consider them two of the most serious and dangerous security issues of recent years.

For most people, especially those who do not work in IT, the inner workings of Meltdown and Spectre remain a mystery. Even for IT professionals, some questions remain unanswered.

This article focuses on a single question: how does a Meltdown attack actually work on a technical level? We will explain it without requiring you to understand all the nasty details of modern CPU architectures. We’ll use a common savings bank as an example – a concept everyone is familiar with.

This article will NOT cover the following topics, most of which are already discussed extensively in other articles:

  • Spectre: This will be part of another article in this series.
  • What countermeasures should be taken to protect affected systems?
  • How can an attacker exploit the weaknesses in a corporate or personal environment?
  • How have existing countermeasures affected updated systems?

The Root Of All Evil – Where Did The Problem Originate?

First of all, it is important to point out that Meltdown is based on a flaw in hardware (the processor, or CPU, to be exact) and not in software. It is a design flaw that has existed in a large share of the processors manufactured since 1995. It did not bother anyone. Until now.

What went wrong all those years?

Let’s start with the basics. To understand the Meltdown attack, the following seemingly unrelated features need to be understood at a high level:

  • Multitasking
  • Performance gaps between CPU and memory
  • Optimization techniques
  • Side-Channel Attacks

Multitasking – Why is it possible to do more than one thing at a time on a modern computer?

Most modern operating systems (OSs) work in multitasking mode. What does that mean? A single CPU core executes only one stream of instructions at a time. However, users expect computers to execute various tasks (or processes) in parallel. On a single core this is not actually possible. Instead, it is “simulated”: the operating system makes the CPU switch between the tasks every few milliseconds. This creates the illusion that multiple processes run at the same time.
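The switching described above can be sketched as a toy round-robin scheduler. This is a minimal illustration and not how any real operating system is implemented: the function name `round_robin`, the task names and the abstract "work units" are all assumptions made for this example.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate one CPU core switching between tasks.

    tasks: dict mapping a task name to the work units it still needs.
    time_slice: work units a task may run before the core switches away.
    Returns the order in which tasks ran, as (name, units) pairs.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Not finished yet: go back to the end of the queue.
            queue.append((name, remaining - ran))
    return timeline


# Hypothetical workload: three tasks sharing a single core.
timeline = round_robin({"browser": 3, "editor": 2, "music": 1}, time_slice=1)
print([name for name, _ in timeline])
```

Each task only ever runs for one short slice at a time, yet all three make progress, which is what creates the impression of parallel execution.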

The security of multitasking relies completely on the assumption that the processor guarantees full isolation of each running process. If this assumption does not hold, in the worst case any process can access any data that is being processed by other processes. Such an isolation break would allow unauthorized processes to read, modify, or delete sensitive data such as passwords, cryptographic keys or private files.
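The isolation assumption can be pictured with a toy memory model in which every address belongs to exactly one process. This is only a sketch: the class `IsolatedMemory`, the process names and the address are hypothetical, and real systems enforce isolation in hardware and in the operating system rather than in application code.

```python
class IsolatedMemory:
    """Toy memory in which every address belongs to exactly one process."""

    def __init__(self):
        self._cells = {}  # address -> (owner, value)

    def write(self, process, address, value):
        # The first writer of an address becomes its owner.
        owner, _ = self._cells.get(address, (process, None))
        if owner != process:
            raise PermissionError(f"{process} may not write to {address:#x}")
        self._cells[address] = (process, value)

    def read(self, process, address):
        owner, value = self._cells[address]
        # The check happens BEFORE any data is handed out.
        if owner != process:
            raise PermissionError(f"{process} may not read {address:#x}")
        return value


memory = IsolatedMemory()
memory.write("editor", 0x1000, "confidential draft")

try:
    memory.read("browser", 0x1000)
    browser_got_data = True
except PermissionError:
    browser_got_data = False

print(browser_got_data)  # False
```

As long as every access goes through a check like this, the browser process cannot see the editor's data.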

To give a more vivid example: imagine if my web browser, running untrusted HTML and JavaScript code, could all of a sudden read the highly confidential contents of a text file I’m working on in “parallel”. During the last two decades, processor manufacturers fully relied on this process isolation assumption and produced their processors accordingly.

Figure 1: The difference between CPU and memory data processing speeds.

Another seemingly unrelated topic that needs an introduction is the speed gap between CPUs and memory. Let’s recap who does what in a modern computer system:

  • The CPU is used for processing elementary instructions and data of the current process.
  • The memory (RAM) is used to hold all the instructions and data of the current process.

Over time, CPUs underwent heavy performance optimization, and processor operating frequencies grew much faster than memory operating frequencies. As a result, the CPU can process data much faster than the memory can deliver and store it. This creates a performance bottleneck.
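A rough calculation shows why this gap hurts. The figures below (a 3 GHz CPU and a memory access taking 100 nanoseconds) are assumed values chosen only for illustration, not measurements of any particular system, and the function name is hypothetical.

```python
def cycles_spent_waiting(cpu_frequency_ghz, memory_latency_ns):
    """Number of CPU clock cycles that pass during one memory access.

    A CPU running at f GHz completes f cycles per nanosecond.
    """
    return round(cpu_frequency_ghz * memory_latency_ns)


# Assumed example values, chosen only for illustration.
print(cycles_spent_waiting(cpu_frequency_ghz=3.0, memory_latency_ns=100))  # 300
```

Under these assumptions, the CPU could have completed hundreds of clock cycles in the time it spends waiting for a single value from memory.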

Figure 1 visualizes this problem.

These performance bottlenecks forced CPU manufacturers to develop a variety of optimization techniques, which added layers between the CPU and memory. Two of the resulting concepts matter here:

  1. Out-of-order execution blocks – an intelligent CPU optimization technique for making smart guesses and anticipating what to do next.
  2. Processor’s cache (or just cache) – very fast memory with a small storage capacity located very close to the CPU. Modern CPUs cache instructions as well, but only cached data matters in this article.

Out-of-order execution blocks make their own assumptions about which data the running process will need in the near future. Based on those assumptions, the CPU fetches the data from memory in advance and stores it in the cache:

  • If the assumption turns out to be correct and if the running process has the permissions to access the fetched data, the data is processed.
  • In all other cases it is just thrown away by the processor.
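The benefit of fetching ahead can be sketched with a toy model. The costs of 1 and 100 are assumed units rather than real latencies, the class `ToyCPU` and its methods are hypothetical, and the permission check is left out here to keep the focus on the speed-up.

```python
MEMORY_COST = 100  # assumed cost of a slow access to main memory
CACHE_COST = 1     # assumed cost of a fast access to the cache


class ToyCPU:
    def __init__(self, memory):
        self.memory = memory  # address -> value
        self.cache = {}       # small, fast copy of some memory cells

    def fetch_ahead(self, address):
        """Guess that `address` will be needed soon and cache it now."""
        self.cache[address] = self.memory[address]

    def load(self, address):
        """Return (value, cost) for a load issued by the running process."""
        if address in self.cache:
            return self.cache[address], CACHE_COST
        value = self.memory[address]
        self.cache[address] = value
        return value, MEMORY_COST


cpu = ToyCPU({0x10: "a", 0x20: "b"})
cpu.fetch_ahead(0x10)  # the CPU guesses that 0x10 comes next

_, cost_guessed = cpu.load(0x10)    # guess was right: served from cache
_, cost_unguessed = cpu.load(0x20)  # not anticipated: goes to memory
print(cost_guessed, cost_unguessed)  # 1 100
```

When the guess is right, the process gets its data at cache speed instead of waiting for memory.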

Applications and the operating system have no influence on this behaviour. It is built into the hardware.

Introduction to Side-Channel Attacks

The root cause that is abused by the attacker during the Meltdown exploit

So far, the features described above do not seem to be problematic. However, there are two key issues where things start to go wrong:

  • Problem 1: Fetching from memory to cache without checking permissions first. Affected CPUs check the permissions for accessing the data only after it has been fetched from memory into the cache.
  • Problem 2: Not deleting data from cache. Data stored in the cache is not removed immediately after the permission check – even if the check failed.
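The two problems can be made concrete with a toy model of an affected CPU. This is a deliberately simplified sketch of the behaviour described above, not a model of real processor internals: the class `ToyVulnerableCPU`, the process names and the address are hypothetical.

```python
class ToyVulnerableCPU:
    def __init__(self, memory, owners):
        self.memory = memory  # address -> value
        self.owners = owners  # address -> process allowed to access it
        self.cache = {}

    def load(self, process, address):
        # Problem 1: the data is fetched into the cache first ...
        self.cache[address] = self.memory[address]
        # ... and only then are the permissions checked.
        if self.owners[address] != process:
            # Problem 2: the cache entry is NOT removed on failure.
            raise PermissionError(f"{process} may not read {address:#x}")
        return self.cache[address]


cpu = ToyVulnerableCPU(memory={0x1000: "secret"}, owners={0x1000: "editor"})

try:
    cpu.load("attacker", 0x1000)
    attacker_got_value = True
except PermissionError:
    attacker_got_value = False

print(attacker_got_value)   # False: the read itself was refused
print(0x1000 in cpu.cache)  # True: but the fetch left a trace in the cache
```

The attacker never receives the value directly, yet the state of the cache has changed because of the forbidden access.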

This is the point where a so-called side channel opens and starts to leak valuable information to an attacker. By definition, a side channel is an implicit source of information that an attacker might use to disclose sensitive data. Side-channel attacks are not new to the IT security world. Successful attacks in the past were based on changes in power consumption, electromagnetic radiation or timing fluctuations in the system’s performance.

For instance, in March 2008 the KeeLoq system was broken by high-precision measurement of a device’s power consumption during encryption. KeeLoq is a proprietary block cipher designed for hardware implementation and owned by Microchip Technology Inc. It is used in a variety of remote control systems for cars and buildings. As a result of the attack, these lock systems were completely compromised.
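A timing-based side channel of the kind mentioned above can be sketched as follows. To keep the example deterministic, it uses assumed access costs (1 for the cache, 100 for memory) instead of real clock measurements, and all function names and addresses are hypothetical.

```python
CACHE_COST = 1     # assumed cost of reading a cached address
MEMORY_COST = 100  # assumed cost of reading an uncached address


def access_cost(address, cached_addresses):
    """Simulated time the attacker observes when reading `address`."""
    return CACHE_COST if address in cached_addresses else MEMORY_COST


def find_fast_addresses(candidates, cached_addresses, threshold=50):
    """Return the candidates that respond faster than the threshold."""
    return [a for a in candidates
            if access_cost(a, cached_addresses) < threshold]


# The attacker cannot inspect this set directly, only observe timings.
cached = {0x2000}
fast = find_fast_addresses([0x1000, 0x2000, 0x3000], cached)
print([hex(a) for a in fast])  # ['0x2000']
```

The attacker never reads the cache contents, but the difference in response time alone reveals which address was touched recently.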

Summary of all used components in a modern vulnerable system

To summarize, the presence of the following standard components made modern computers vulnerable to Meltdown:

| Component | Description |
| --- | --- |
| Abstract computer system | Simplified model of a modern (vulnerable) computer system consisting of memory, a CPU and a cache. |
| Processes running on a computer system | The full set of active applications and their components. |
| Malicious process eager to get secret data | Process running in the system that tries to get unauthorized access to the data of other processes. |
| Secret data the attacker wants to steal | Data used by a process and protected from reading, modification and deletion by other processes. |
| CPU | Central Processing Unit. An integrated circuit which executes the instructions of the currently running process. |
| Memory/RAM | Stores all the instructions and data of the current process. The CPU accesses a specific memory location by its address. |
| Cache of CPU | Very fast and small memory located very close to the CPU. In this article, it matters as a store for data. |
| Out-of-order execution block in CPU | Optimization technique that predicts which data will be needed soon, fetches it from memory in advance and stores it in the CPU cache. |
| Side channel abused by attacker | An implicit source of information (e.g. changes in power consumption or electromagnetic radiation) that discloses sensitive data. |

If you would like a practical explanation, stay tuned for the next article, which covers the Meltdown mechanism and illustrates the attack using the metaphor of a simple banking institution: The Meltdown-Exploit explained (Part 2): the bank robbery.