The AWS S3 Denial of Wallet Amplification Attack
If you publicly host large data files on AWS S3 and pay for AWS transfer costs, you may be vulnerable to a “Denial of Wallet” amplification attack. Even if you are not hosting data publicly, a malicious third party or even a programming error can cause major costs within a short time frame.
Introduction
Cloud computing enables companies to build fast-scaling applications. This allows for new ways of analyzing data, especially in medicine and bioinformatics. But with resources scaling up and down so easily, cost control changes significantly for cloud customers.
By design, resources scale in cloud environments, and costs rise with them. Keeping those costs under control is a difficult job. Even though AWS offers its own pricing calculator, cost estimation is complex enough that many companies offer products to assist with cost control or cost overview; see for example “AWS Costs: Surprise, Surprise? It Shouldn’t Be!”
Because of the large variety of cloud services and their cost models, it is very important that customers can rely on certain assumptions. A simple assumption is that downloading S3 data over the Internet costs an amount of money proportional to the amount of data downloaded (see the S3 pricing structure). With such an assumption, people build software and have a rough idea of the costs a certain service may incur.
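To make this assumption concrete, here is a minimal sketch of the naive cost model. The $0.09/GB rate is the published first-tier S3 Data Transfer Out price at the time of writing; check the current pricing page before relying on it.

```python
# Naive cost model implied by the proportionality assumption:
# egress cost grows linearly with bytes downloaded.

PRICE_PER_GB = 0.09  # USD, first S3 egress pricing tier (assumption)

def expected_egress_cost(gigabytes: float) -> float:
    """Expected egress cost under the 'cost is proportional to data' model."""
    return gigabytes * PRICE_PER_GB

print(expected_egress_cost(100))  # 100 GB -> $9.00 under this model
```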
Since cloud resources scale so easily, cloud users must take precautions against malicious attacks that may incur unintended costs. For example, if your application is hit by a DDoS attack, costs may rise significantly because your cloud resources can be configured to scale with demand. You need to implement multiple mechanisms to prevent this, and cloud providers offer tools to reduce the attack surface (see here for AWS best practices on this topic).
The Denial of Wallet amplification attack
Here, we provide a previously unknown example of a potential attack on cloud resources where the assumed costs differ significantly from the real costs. It stems from how costs for AWS S3 egress (transfer to the Internet) are calculated. With a very small number of special requests, it is possible to generate costs that exceed the costs for the data actually downloaded by a factor of 50. We coined the term “Denial of Wallet amplification attack” for this scenario: it is not an attack against your service but against your wallet.
We publish this information in a blog post because we were caught unaware by this behavior ourselves, and we feel the need to raise awareness about the issue, as it could be exploited by malicious third parties.
Data sets in health care may be large
In the healthcare industry, many large companies and organizations use AWS S3 and similar cloud platforms for data storage and processing. For example, many public bioinformatics data repositories, including those run by government agencies, provide large files publicly on S3 (e.g. NCBI SRA or gnomAD). Even the cloud-computing platforms of large sequencing instrument manufacturers are affected. Controlling the costs of data storage and data transfer is essential when data volumes grow at this scale.
Attack amplifying S3 egress costs
One day we noticed an anomaly in our S3 egress costs. We analyzed it and found that the amount of data actually downloaded did not match the amount on our invoice. With our partners’ help, we reproduced this behavior and identified a minor bug in a bioinformatics library. It was an accident. Unfortunately, this accidental behavior could also be ‘weaponized’ in an attack.
After contacting AWS Support, we were directed to the following passage on the S3 pricing page (highlighting by us):
Data Transfer Out may be different from the data received by your application in case the connection is prematurely terminated by you, for example, if you make a request for a 10 GB object and terminate the connection after receiving the first 2 GB of data. Amazon S3 attempts to stop the streaming of data, but it does not happen instantaneously. In this example, the Data Transfer Out may be 3 GB (1 GB more than 2 GB you received). As a result, you will be billed for 3 GB of Data Transfer Out.
Okay, this is half the explanation. AWS customers are not billed for the data actually delivered to the Internet but for the amount of data S3 attempts to stream before the cancellation takes effect.
The main problem is the potential difference between the amount of data received by an entity outside of AWS and the amount of data on your bill. In the example above from the AWS documentation, the difference is a factor of 1.5 (2 GB received, 3 GB billed). In our real-world example, the difference was almost a factor of 50: about 3 TB received, 130 TB billed. That can make the difference between “we do not care” and “whaaat?”.
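The amplification factor, i.e. the ratio of billed egress to data actually received, captures both cases:

```python
def amplification(billed_gb: float, received_gb: float) -> float:
    """Ratio of billed egress to the data the client actually received."""
    return billed_gb / received_gb

print(amplification(3, 2))            # AWS's documented example: 1.5
print(amplification(130_000, 3_000))  # our incident: ~43.3
```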
The extent of the problem
It may be even worse: we were able to reproduce a scenario in which we downloaded 300 MB of data in 30 seconds from AWS S3 and were billed for more than 6 GB by AWS. If an attacker can induce costs for 6 GB in 30 seconds, how much cost can be generated in a day, or over a weekend, with many threads running in parallel? Note that AWS S3 is highly available, and you may never reach any bandwidth limits in real-world scenarios. That should scare anyone who hosts large files in a public S3 bucket, because egress costs can skyrocket in this scenario.
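A back-of-the-envelope extrapolation of this reproduced scenario, assuming a single client thread and the first-tier egress rate of $0.09/GB (both assumptions):

```python
# Extrapolate the reproduced scenario: >6 GB billed per 30 s per client.

billed_gb_per_30s = 6
seconds_per_day = 24 * 60 * 60
price_per_gb = 0.09  # USD, assumed first-tier egress rate

gb_per_day = billed_gb_per_30s * seconds_per_day / 30
print(gb_per_day)                           # 17280 GB, i.e. ~17 TB/day, one thread
print(round(gb_per_day * price_per_gb, 2))  # ~$1555/day, one thread
```

Multiply by the number of parallel threads an attacker can afford, and the bill grows accordingly.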
The good news: we only observed this behavior for large files (>1 GB) that software clients download via HTTP(S) range requests. With a range request, a client retrieves only part of a file rather than the entire file. By quickly canceling such requests, a client can request parts of a file without downloading all the data. Due to the way AWS calculates egress costs, the transfer of the entire requested byte range is billed, even though the request is canceled before the data is sent. There may also be dependencies on access patterns and timing, but we were able to reproduce the behavior across different buckets and files and over several weeks (so this is not a fluke).
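As an illustration, this is roughly what such a canceled range request looks like from the client side. This is a sketch using Python’s standard library; the bucket URL is a placeholder, and the actual network call is left commented out:

```python
import urllib.request

def range_header(start: int, length: int) -> dict:
    """Build an HTTP Range header requesting `length` bytes from `start`."""
    return {"Range": f"bytes={start}-{start + length - 1}"}

# Hypothetical request for the first 1 GiB of a large public object.
url = "https://example-bucket.s3.amazonaws.com/large-file.bam"  # placeholder
req = urllib.request.Request(url, headers=range_header(0, 1 << 30))

# A client that reads only a small prefix and then closes the connection:
# with urllib.request.urlopen(req) as resp:
#     resp.read(1024 * 1024)  # read ~1 MiB of the requested 1 GiB
# Closing the connection here cancels the transfer, but per the behavior
# described above, the full requested byte range may still be billed.
```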
This means that everybody who hosts large files on S3 is at risk, especially if these files are publicly accessible.
Potential remedies
Can I prevent this attack?
Do not host large files on S3 if they can be accessed via range requests in a way you cannot control. If you rely on hosting large S3 data sets publicly, then you cannot prevent this attack. AWS best practice guidelines recommend restricting data access to S3 buckets.
What does AWS recommend?
We do not know exactly what AWS recommends for companies that rely on hosting data publicly; we did not find any official documentation with recommendations for files in public S3 buckets.
For example, the “Security best practices for Amazon S3” state:
Unless you explicitly require anyone on the internet to be able to read or write to your S3 bucket, make sure that your S3 bucket is not public.
But AWS does not say anything about those cases where you do need public access (at least we did not find any under the AWS best practices for storage).
By the way: restricting access to your S3 buckets may not prevent you from incurring costs for unauthorized requests as @maciej.pocwierz found out: https://medium.com/@maciej.pocwierz/how-an-empty-s3-bucket-can-make-your-aws-bill-explode-934a383cb8b1.
Are there workarounds to mitigate the attack?
Yes. As the first and most important step, you should create cost alerts. We suggest activating AWS Cost Anomaly Detection. When you activate and configure this tool correctly, you will be informed about abnormal billing events. In our case, this helped us identify the adverse event within a short time frame, limiting its financial impact.
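Alongside anomaly detection, a fixed budget alert is a simple guardrail. Below is a sketch of the request payloads for the AWS Budgets API; the budget name, limit, threshold, and email address are all hypothetical, and the actual boto3 call is left commented out:

```python
# Hypothetical monthly cost budget with an alert at 80% of the limit.
budget = {
    "BudgetName": "s3-egress-guardrail",  # hypothetical name
    "BudgetLimit": {"Amount": "100", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
}
notification = {
    "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80.0,  # alert at 80% of the budget limit
        "ThresholdType": "PERCENTAGE",
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "ops@example.com"}],
}

# With AWS credentials configured, these payloads would be passed to:
# boto3.client("budgets").create_budget(
#     AccountId=account_id, Budget=budget,
#     NotificationsWithSubscribers=[notification])
```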
Sadly, preventing such an attack is no easy task because of the design of AWS S3: the service is built to serve large amounts of data as fast as possible. You could monitor HTTP API requests to your S3 buckets, which is possible with a delay of a few hours. If you see an unusually high number of API requests, you could prohibit access to the resource. But this is a blunt last resort that may disrupt your service entirely.
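One way to spot the pattern in such monitoring is to compare the bytes actually sent against the object size in S3 server access logs: many GET requests that transfer only a tiny fraction of a large object are a red flag. A sketch, assuming the documented access-log field order; the sample line and the 10% threshold are illustrative, and legitimate range-based access (common in bioinformatics) will also trigger it, so treat it as a heuristic:

```python
import re

# Leading fields of an S3 server access log line, per the documented format.
LOG_RE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+) (?P<error_code>\S+) '
    r'(?P<bytes_sent>\S+) (?P<object_size>\S+)'
)

def is_suspicious(line: str, max_fraction: float = 0.1) -> bool:
    """Flag object GETs where far less than the object was actually sent."""
    m = LOG_RE.match(line)
    if not m or m["operation"] != "REST.GET.OBJECT":
        return False
    if m["bytes_sent"] == "-" or m["object_size"] == "-":
        return False
    return int(m["bytes_sent"]) < max_fraction * int(m["object_size"])

# Illustrative log line: ~1 MiB sent of a ~2 GiB object.
sample = ('owner123 example-bucket [06/Feb/2024:00:00:38 +0000] 192.0.2.3 '
          '- 3E57427F3EXAMPLE REST.GET.OBJECT big.bam '
          '"GET /big.bam HTTP/1.1" 206 - 1048576 2147483648')
print(is_suspicious(sample))  # True
```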
So, if I’m not hosting public S3 files I’m safe?
Unfortunately, no: we observed similar behavior when serving S3 files via pre-signed URLs, for example.
Acknowledgement
We thank Stephan Drukewitz from the Institute for Human Genetics, Leipzig, for his help in troubleshooting the issue. Also thanks to Martin Garbe and Roland Ewald who substantially researched and authored this article.
We thank the AWS Security team for their feedback on earlier drafts of this post.
Edits
2024–05–02: Clarified that not the entire file gets billed on a range request, but the egress of the entire request byte range (even though the request was canceled and no data transferred).