Cloudflare R2 for AI Training Data: Why Zero Egress Changes the Math

Cloudflare R2 as a home for AI training data, with zero egress on repeated reads

Why Egress Is the Hidden Tax on Training Data

Training a model means reading the same dataset over and over, once per epoch, often from GPUs that sit outside your storage provider's network. On most object stores you pay an egress fee every time that data leaves the bucket. Cloudflare R2 does not charge egress fees, so reading a dataset a hundred times costs the same in transfer as reading it once. For read-heavy AI work, that quietly changes the math.

People size storage by the price per terabyte and then get surprised by the transfer line on the bill. For an archive you rarely open, egress barely matters. For a training set you stream through a data loader thousands of times, egress is the cost.

What Makes Training Data Different From an Archive

Training data has a few traits that make egress the deciding factor:

  • It is read many times. Every epoch reads the whole set again. Hyperparameter sweeps and multiple runs multiply that.
  • It is large. Image, video, audio, and text corpora run to terabytes, and embeddings pile on more.
  • The compute is often elsewhere. GPUs in another cloud or a rented cluster mean the data crosses a network boundary on every read, which is exactly what egress charges for.

Put those together and a metered-egress store can cost more to read than to hold.
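
To make that concrete, here is a minimal sketch of the read-side arithmetic. It assumes every epoch re-reads the full dataset from the remote store with no local caching, and it uses an illustrative metered rate of $0.087 per GB. The dataset size, epoch count, and the function name `egress_cost_usd` are assumptions chosen for the example, so substitute your own provider's pricing.

```python
def egress_cost_usd(dataset_gb: float, epochs: int, runs: int = 1,
                    rate_per_gb: float = 0.0) -> float:
    """Transfer cost of reading the whole dataset once per epoch, per run.

    Assumes no caching: each epoch pulls every byte across the network again.
    """
    if dataset_gb < 0 or epochs < 0 or runs < 0 or rate_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    return dataset_gb * epochs * runs * rate_per_gb


# Illustrative inputs, not measurements: a 2 TB corpus read for 100 epochs.
metered = egress_cost_usd(2000, epochs=100, rate_per_gb=0.087)
zero_egress = egress_cost_usd(2000, epochs=100, rate_per_gb=0.0)
print(f"metered: ${metered:,.2f}  zero-egress: ${zero_egress:,.2f}")
```

The only term that changes between the two calls is the per-GB rate, which is why the number of epochs stops mattering once that rate is zero.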

Two properties do the work. First, R2 does not charge egress fees, so repeated reads from outside Cloudflare do not accumulate transfer cost. Second, R2 is S3-compatible, so the data loaders, SDKs, and tools your pipeline already uses point at it by changing the endpoint and the keys. You do not rewrite your training code to adopt it.
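
As a rough illustration of "changing the endpoint and the keys", the sketch below builds the arguments for a boto3 S3 client aimed at R2. It assumes your pipeline uses boto3, that the account ID and keys come from your Cloudflare dashboard, and the helper names `r2_client_kwargs` and `make_r2_client` are hypothetical. Treat it as a starting point rather than a reference configuration.

```python
from typing import Any, Dict


def r2_client_kwargs(account_id: str, access_key_id: str,
                     secret_access_key: str) -> Dict[str, Any]:
    """Keyword arguments for boto3.client() that target an R2 account."""
    return {
        "service_name": "s3",
        # R2's S3-compatible endpoint is scoped to the Cloudflare account ID.
        "endpoint_url": f"https://{account_id}.r2.cloudflarestorage.com",
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        # R2 does not use AWS regions; "auto" is the conventional value.
        "region_name": "auto",
    }


def make_r2_client(account_id: str, access_key_id: str, secret_access_key: str):
    """Create the client. boto3 is imported lazily so the helper above
    can be used and tested without boto3 installed."""
    import boto3
    return boto3.client(**r2_client_kwargs(account_id, access_key_id,
                                           secret_access_key))
```

Everything downstream of the client, such as `get_object` calls inside a data loader, stays as it was.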

A couple of honest caveats, because the math is not free in every direction. R2 still charges for storage and for request operations, and throughput depends on how your loader and network are set up. If your training compute lives in the same cloud as your current data, reads inside that cloud may already avoid egress, so R2's advantage is largest when storage and compute would otherwise sit on different networks. Confirm Cloudflare's current terms before you commit a pipeline to them.

A training corpus rarely starts life in one place. It is scraped to a local disk, staged in an S3 bucket, or scattered across a few accounts from different collaborators. Consolidating it into one R2 bucket is the setup step.

Blober moves data into R2 directly from AWS S3, Backblaze B2, Wasabi, DigitalOcean Spaces, Azure Blob, Dropbox, Google Drive, or local storage. It copies in parallel, keeps the folder structure intact, and supports skip-existing, so the first run stages the whole corpus and later runs carry only the new files as the dataset grows. You are not downloading the whole set to a laptop's disk and pushing it back up, which matters when the corpus is bigger than any one machine's disk.

  1. Choose R2 as the dataset home if your training compute reads it repeatedly from outside Cloudflare.
  2. Stage the corpus into an R2 bucket with Blober, in parallel and with structure preserved.
  3. Point your S3-compatible data loader at the R2 endpoint and train.
  4. Re-run Blober with skip-existing as you add data, so only the new files move.
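
The skip-existing idea in step 4 can be sketched in a few lines. This is not Blober's implementation, and how Blober decides that a file already exists is not stated here. The sketch assumes the simplest rule, that an object is skipped when the same key is already present at the destination, and the function name and sample keys are hypothetical.

```python
from typing import Iterable, List


def plan_skip_existing(source_keys: Iterable[str],
                       destination_keys: Iterable[str]) -> List[str]:
    """Return the source keys that still need copying.

    Assumption: a key already present at the destination is skipped,
    without comparing sizes or checksums.
    """
    already_there = set(destination_keys)
    return sorted(k for k in set(source_keys) if k not in already_there)


# First run staged one image; the dataset has since grown by two files.
source = ["images/0001.jpg", "images/0002.jpg", "labels/train.csv"]
destination = ["images/0001.jpg"]
print(plan_skip_existing(source, destination))
```

A key-only comparison is cheap but will not notice a file that changed in place, so a stricter rule (size or checksum) may be worth it for datasets that get edited rather than appended to.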

Keep a second copy somewhere else as well. One bucket is one copy, and the 3-2-1 rule applies to a dataset you cannot easily recreate just as much as to family photos.

Does Cloudflare R2 charge egress fees?

No. R2 does not charge egress fees for reading your data out, which is its main draw for read-heavy workloads like model training. Confirm the current terms on Cloudflare's site before committing.

Is Cloudflare R2 good for machine learning datasets?

Yes, especially when your training compute reads the dataset repeatedly from outside Cloudflare's network. Zero egress removes the per-read transfer cost that dominates training storage bills.

Is R2 S3-compatible for data loaders?

Yes. R2 exposes an S3-compatible API, so existing S3 data loaders, SDKs, and tools work by changing the endpoint and credentials.

How do I move my training data into R2?

Use a tool that transfers directly and in parallel. Blober stages datasets into R2 from S3, B2, Wasabi, Spaces, Azure Blob, and local storage, with skip-existing for incremental updates.

Stage your training data into R2 without a scripting project. Blober moves datasets into R2 from S3, B2, Wasabi, Spaces, Azure Blob, and local storage, in parallel and with structure intact.

Download Blober at blober.io

How to Move Data from Azure Blob Storage to Cloudflare R2

Move data from Azure Blob Storage to Cloudflare R2 with Blober

Azure Blob Storage charges $0.087 per GB for data leaving its network. If you serve 1 TB of files per month to users or external systems, that is $87/month in egress alone, on top of storage costs.

Cloudflare R2 charges $0 for egress. Zero. Nothing. You pay for storage ($0.015/GB/month) and operations, but downloading data from R2 is free. For applications that serve files to users, APIs, CDNs, or other services, switching to R2 can cut your cloud bill significantly.

The most common reason is cost. If your Azure Blob account is mostly used for serving static assets, media files, backups that get restored frequently, or API responses, the egress fees can dwarf your storage costs. R2 removes that variable entirely.

Another reason is simplicity. R2 is S3-compatible, meaning any tool or SDK that works with S3 works with R2. If your application already uses the S3 API (many do, even on Azure), the migration is mostly about moving data and updating the endpoint.

Blober supports both Azure Blob Storage and Cloudflare R2 as native providers. The transfer works like any other Blober workflow: connect both accounts, select files, run.

Step 1: Connect Azure Blob Storage

Add Azure Blob as a provider with your connection string. Blober lists your containers and their contents.

Add Cloudflare R2 as a provider. You will need your Account ID along with an S3-compatible Access Key ID and Secret Access Key from the Cloudflare dashboard. If you also provide a Cloudflare API token, Blober can list your buckets through Cloudflare's native API with server-side pagination, which is more efficient for accounts with many buckets.

Set Azure Blob as the source and Cloudflare R2 as the destination. Browse your Azure containers, select the files or containers you want to migrate, and choose the destination bucket in R2.

Blober streams data from Azure through your machine to R2. It runs parallel transfers on both ends, so large files move efficiently. If the transfer is interrupted, Blober resumes from where it stopped.

What About Azure Egress Fees During Migration?

This is the unavoidable part. Moving data out of Azure means paying egress. For the initial migration, you pay Azure's $0.087/GB once, when the data leaves Azure for the machine where Blober runs. The second leg, from that machine into R2, is an upload, which R2 does not bill as egress.

For 1 TB, that is about $87 in egress fees. That is a one-time cost. After the migration, your ongoing egress from R2 is $0.

If you were paying $87/month in Azure egress, the migration pays for itself in the first month.

Data Size    Azure Egress Cost (one-time)    Monthly Savings on R2
500 GB       ~$43                            Depends on egress pattern
1 TB         ~$87                            Up to $87/month
5 TB         ~$435                           Up to $435/month
10 TB        ~$870                           Up to $870/month
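
The figures above follow from one multiplication, and the break-even point from one division. The sketch below reproduces them using the $0.087/GB rate quoted in this post and 1 TB = 1,000 GB. The function names are hypothetical, and the monthly savings value is whatever egress you would otherwise have paid Azure, which depends on your traffic.

```python
AZURE_EGRESS_USD_PER_GB = 0.087  # rate quoted in this post; check current pricing


def migration_egress_usd(size_gb: float,
                         rate_per_gb: float = AZURE_EGRESS_USD_PER_GB) -> float:
    """One-time Azure egress cost of moving `size_gb` out of Azure."""
    return size_gb * rate_per_gb


def breakeven_months(one_time_cost: float, monthly_savings: float) -> float:
    """Months of avoided egress needed to recover the one-time cost."""
    if monthly_savings <= 0:
        return float("inf")  # no savings means the cost is never recovered
    return one_time_cost / monthly_savings


for label, size_gb in [("500 GB", 500), ("1 TB", 1000),
                       ("5 TB", 5000), ("10 TB", 10000)]:
    print(f"{label:>6}  one-time egress: ${migration_egress_usd(size_gb):,.2f}")

# If you serve about as much per month as you store, break-even is one month.
one_tb_cost = migration_egress_usd(1000)
print(f"break-even: {breakeven_months(one_tb_cost, one_tb_cost):.1f} month(s)")
```

If your monthly egress is much smaller than the amount you migrate, the break-even period stretches accordingly, which is the case the "Depends on egress pattern" cell is pointing at.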

R2's S3 compatibility matters because your application code likely uses the AWS SDK or an S3-compatible client. After migrating data to R2, updating your app is often as simple as changing the endpoint URL and credentials. No SDK changes, no API rewrites.
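
One way to see what "no SDK changes" means in practice: code written against the S3 client interface does not need to know which provider is behind it. The sketch below assumes a boto3-style client, and the function name `iter_keys` is hypothetical. Only the client's construction (endpoint and credentials) would differ between providers.

```python
from typing import Any, Iterator


def iter_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination.

    `client` is any object exposing the boto3 S3 client interface; the
    function never inspects which endpoint it was created with.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage, assuming boto3 is installed and credentials are set elsewhere:
#   import boto3
#   client = boto3.client("s3", endpoint_url="https://<ACCOUNT_ID>.r2.cloudflarestorage.com", ...)
#   for key in iter_keys(client, "my-bucket", "assets/"):
#       print(key)
```

Listing is a handy first check after a migration, since comparing key counts against the source shows quickly whether anything was left behind.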

Blober connects to R2 using the same S3 protocol, so the transfer is seamless.

When Azure Is Still the Right Choice

R2 is excellent for serving files and eliminating egress. But Azure has features that R2 does not:

  • Storage tiers (Hot, Cool, Cold, Archive) for lifecycle cost optimization
  • Geo-redundant replication built into the platform
  • Azure Functions and event triggers tied to blob operations
  • Enterprise compliance certifications that some industries require

If you need those features, Azure is worth the egress premium. Many teams keep some data on Azure (for processing and compliance) and move the served/public data to R2 (for cost savings).

One-time purchase. Transfer as much data as you need.

Download Blober at blober.io