A recent news item got me reading up on new S3 features, and I discovered one I’d missed. Here we go.
S3 Express One Zone
Faster, cheaper data retrieval, single-AZ S3. Why? Why not?
Firstly here’s the announcement: https://aws.amazon.com/blogs/aws/new-amazon-s3-express-one-zone-high-performance-storage-class/
Pros
Faster
This really is lower latency. If you’ve a whole bunch of systems needing to access shared storage, this could do the trick. The lower latency isn’t just a nice-to-have; it can save money if it reduces the amount of compute time required. This isn’t conjecture (see: https://aws.amazon.com/blogs/storage/amazon-s3-express-one-zone-delivers-cost-and-performance-gains-for-chaossearch-customers/), but clearly you’ll need to be optimising heavily to actually benefit from a saving here.
Still, speed is nice, if you need it.
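If you want to check the latency claim against your own workload, a quick probe is easy to write. This is just a sketch: the bucket and key names are made up, it assumes your credentials are configured and a recent boto3 (which handles directory-bucket session auth for you), and a real benchmark would want far more samples and percentiles.

```python
import time
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

def mean_get_seconds(bucket: str, key: str, n: int = 100) -> float:
    """Average wall-clock seconds to GET an object n times."""
    start = time.perf_counter()
    for _ in range(n):
        s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return (time.perf_counter() - start) / n

# Hypothetical buckets: one Standard, one Express One Zone directory bucket.
print("standard:", mean_get_seconds("my-standard-bucket", "probe/small.bin"))
print("express :", mean_get_seconds("my-demo--use1-az5--x-s3", "probe/small.bin"))
```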
Cheaper data retrieval
The data retrieval cost is lower. From https://aws.amazon.com/s3/pricing/:
| (per 1,000 requests) | PUT/POST etc. | GET etc. |
| --- | --- | --- |
| Standard | $0.005 | $0.0004 |
| Express One Zone | $0.0025 | $0.0002 |
I can see this being awesome for lots of requests for small objects. I used to work with HPC people, and what they really needed was fast access to shared storage full of small files. They used NFS and then GPFS, but this could fit similar workloads (a sketch of that sort of access pattern is below), provided they were okay with object- rather than file-based storage. More on that later…
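For illustration, the access pattern that benefits here looks something like the following: lots of small GETs in parallel. The bucket and key names are hypothetical; this is a sketch of the shape of the workload, not a tuned client.

```python
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Many small objects, fetched concurrently (names are made up).
keys = [f"inputs/chunk-{i:05d}.dat" for i in range(10_000)]

def fetch(key: str) -> bytes:
    return s3.get_object(Bucket="my-demo--use1-az5--x-s3", Key=key)["Body"].read()

with ThreadPoolExecutor(max_workers=64) as pool:
    results = list(pool.map(fetch, keys))
```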
Cons
More expensive storage
Yes, data retrieval is cheaper, but the actual storage cost is higher than S3 Standard.
Using the same pricing page as above, storage for Standard is $0.023/GB-month (tiering down to $0.021 at higher volumes), while Express One Zone is $0.16/GB-month. That’s a really significant hike, roughly 7x, so this is clearly not going to be good for general storage.
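A quick back-of-the-envelope calculation, using the prices above, shows just how hot data has to be before the cheaper requests pay for the pricier storage:

```python
# Prices in USD, from the tables above.
storage_premium_per_gb_month = 0.16 - 0.023        # extra storage cost per GB-month
get_saving_per_request = (0.0004 - 0.0002) / 1000  # saving per single GET

break_even_gets = storage_premium_per_gb_month / get_saving_per_request
print(f"{break_even_gets:,.0f} GETs per GB per month")  # ~685,000
```

So each GB needs to be read around 685,000 times a month (ignoring PUT savings and data transfer) before Express One Zone comes out ahead on cost alone; the latency win is the real draw.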
Single AZ
Yes, this is in a single AZ, which I reckon is what makes the lower latency possible: the data is going to be in the same data centre as your compute. But having the data in only one AZ carries a risk. The usual S3 hardware resilience is still built in (lots of 9s there), except in the case of AZ-level fire, flood etc. If the data centre has a catastrophe, the data might be gone for good. How much of a problem is that?
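The single-AZ placement is explicit at bucket-creation time: directory buckets are pinned to an availability zone ID, which even appears in the bucket name. A hedged boto3 sketch (the base name and AZ ID here are examples; check the current API docs for your SDK version):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Directory bucket names embed the AZ ID: <base-name>--<az-id>--x-s3
s3.create_bucket(
    Bucket="my-demo--use1-az5--x-s3",
    CreateBucketConfiguration={
        "Location": {"Type": "AvailabilityZone", "Name": "use1-az5"},
        "Bucket": {"DataRedundancy": "SingleAvailabilityZone", "Type": "Directory"},
    },
)
```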
Given the cost and use case, I’m seeing this as not being the primary location of this data – anything really needed would shift to Standard at some point, along the lines of the sketch below.
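That shift to Standard can be as simple as a server-side copy into a multi-AZ general-purpose bucket (again, a sketch with hypothetical names):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Promote a result worth keeping out of the one-zone directory bucket
# into a durable, multi-AZ Standard bucket.
s3.copy_object(
    Bucket="my-durable-standard-bucket",
    Key="results/output.parquet",
    CopySource={"Bucket": "my-demo--use1-az5--x-s3", "Key": "results/output.parquet"},
)
```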
Limited Regions
In time-honoured tradition, this is only available in a few regions right now, but it should be rolled out to others. For now, it’s N. Virginia (us-east-1), Oregon (us-west-2), Tokyo (ap-northeast-1) and Stockholm (eu-north-1).
Conclusion
I’m seeing this really as an expensive, fast, local cache – it’s roughly 7x the price of Standard storage, but it’s closer to your compute and faster (and cheaper) to get the data out of. That it’s an object (not file) store can be overcome with the use of…
Mountpoint for Amazon S3
A while back I worked on a project requiring S3 to be mounted on a Linux system – we used s3fs because that’s what there was, and it worked fine. Since then, AWS has released Mountpoint for Amazon S3 – what’s this about?
Mountpoint also lets you mount an S3 bucket as a file system, but it has some pretty big differences. Basically, Mountpoint is more performant but has fewer features. If you need high-throughput reads from multiple clients (maybe against S3 Express One Zone?), it’s great. If you want to rename a file/object, it can’t do that. s3fs is more general purpose and implements more POSIX features, but I don’t think it’ll be anywhere near as fast. If you really need a proper shared file system, EFS or FSx might be the right fit, YMMV.
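To make the difference concrete, here’s a small sketch (hypothetical bucket and paths, and it assumes the mount-s3 binary is installed and credentials are configured): reads through the mount work as expected, while POSIX operations Mountpoint doesn’t map to S3, like rename, simply error.

```python
import os
import subprocess

# Mount the bucket with Mountpoint's CLI: mount-s3 <bucket> <directory>
subprocess.run(["mount-s3", "my-bucket", "/mnt/my-bucket"], check=True)

# Sequential reads through the mount work well.
with open("/mnt/my-bucket/data/input.csv") as f:
    first_line = f.readline()

# Rename isn't mapped to S3 by Mountpoint, so this raises an OSError.
try:
    os.rename("/mnt/my-bucket/data/input.csv", "/mnt/my-bucket/data/renamed.csv")
except OSError as exc:
    print(f"rename not supported: {exc}")
```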
Slightly deeper details on Mountpoint’s semantics: https://github.com/awslabs/mountpoint-s3/blob/main/doc/SEMANTICS.md