Showing posts with label storage costs. Show all posts
Showing posts with label storage costs. Show all posts

Friday, March 14, 2025

Archival Storage

I gave a talk at the Berkeley I-school's Information Access Seminar entitled Archival Storage. Below the fold is the text of the talk with links to the sources and the slides (with yellow background).

Friday, July 8, 2022

Economic Model Revived

Five years ago, at the urging of the Internet Archive, I implemented a simple interactive version of the Economic Model of Long-Term Storage in Python. It estimated the endowment needed to store a Terabyte for 100 years based on a set of parameters that people could vary. It ran on a Raspberry Pi at the end of our Sonic DSL home Internet connection. The endowment is the capital which, together with the interest it earns, is enough to cover the costs incurred stoing the data for the century.

Alas, the Pi became a casualty when, early in the pandemic, we upgraded to the wonderful Sonic gigabit fiber (Best ISP Evah!), needed to support multiple grandkids each in a different virtual school.

Fortunately, Sawood Alam at the Internet Archive has forked the code, re-implemented it in Javascript, improved the user interface, and deployed it at Github. This new version is once again available here. I'm very grateful to Sawood and the team at the Internet Archive, both for pushing me to do the initial implementation, and now for bringing it back from the dead.

Below the fold I have a couple of caveats.

Thursday, March 25, 2021

Internet Archive Storage

The Internet Archive is a remarkable institution, which has become increasingly important during the pandemic. It has been for many years in the world's top 300 Web sites and is currently ranked #209, sustaining almost 60Gb/s outbound bandwidth from its collection of almost half a trillion archived Web pages and much other content. It does this on a budget of under $20M/yr, yet maintains 99.98% availability.

Jonah Edwards, who runs the Core Infrastructure team, gave a presentation on the Internet Archive's storage infrastructure to the Archive's staff. Below the fold, some details and commentary.

Friday, May 15, 2020

Economics Of Decentralized Storage

Almost two years ago, in The Four Most Expensive Words in the English Language , I wrote skeptically about the economics of decentralized storage networks. I followed up two months later with The Triumph Of Greed Over Arithmetic. Now, Got a few spare terabytes of storage sitting around unused? Tardigrade can turn that into crypto-bucks is Thomas Claiburn's report on initial experience with Tardigrade, the "decentralized" storage network from Storj Labs. Below the fold, some more skepticism.

Tuesday, March 31, 2020

Archival Cloud Storage Pricing

Although there are significant technological risks to data stored for the long term, its most important vulnerability is to interruptions in the money supply. The current pandemic is likely to cause archives to suffer significant interruptions in the money supply.

In Cloud For Preservation I described how much of the motivation for using cloud services was their month-by-month pay-for-what-you-use billing, which transforms capital expenditures (CapEx) into operational expenditures (OpEx). Organizations typically find OpEx much easier to justify than CapEx because:
  • The numbers they look at are smaller, even if what they add up to over time is greater.
  • OpEx is less of a commitment, since it can be decreased if circumstances change.
Unfortunately, the lower the commitment the higher the risk to long-term preservation. Since it doesn't deliver immediate returns, it is likely to be first on the chopping block. Thus both reducing storage cost and increasing its predictability are important for sustainable digital preservation. Below the fold I revisit this issue.

Tuesday, September 24, 2019

Promising New Hard Disk Technology

It has been too long, two-and-a-half years, since the last of Tom Coughlin's Storage Valley Supper Club events. But he just organized one to coincide with the Flash Memory Summit. It featured an extremely interesting talk by Karim Kaddeche, CEO of L2 Drive, a company whose technology seems likely to have a big impact on the hard disk market. Follow me below the fold for the explanation. I didn't take notes, so what follows is from memory. I apologize for any errors.

Thursday, August 29, 2019

SSD vs. HDD (Updated)

IDC & TrendForce data
via Aaron Rakers
Chris Mellor's How long before SSDs replace nearline disk drives? starts with a quote I think the good Dr. Pangloss would love:
Aaron Rakers, the Wells Fargo analyst, thinks enterprise storage buyers will start to prefer SSDs when prices fall to five times or less that of hard disk drives. They are cheaper to operate than disk drives, needing less power and cooling, and are much faster to access.
Below the fold, some skepticism.

Thursday, May 16, 2019

Review Of Data Storage In DNA

Luis Ceze, Jeff Nivala and Karin Strauss of the University of Washington and Microsoft Research team have published a fascinating review of the history and state-of-the-art in Molecular digital data storage using DNA. The abstract reads:
Molecular data storage is an attractive alternative for dense and durable information storage, which is sorely needed to deal with the growing gap between information production and the ability to store data. DNA is a clear example of effective archival data storage in molecular form. In this Review , we provide an overview of the process, the state of the art in this area and challenges for mainstream adoption. We also survey the field of in vivo molecular memory systems that record and store information within the DNA of living cells, which, together with in vitro DNA data storage, lie at the growing intersection of computer systems and biotechnology.
They include a comprehensive bibliography. Below the fold, some commentary and a few quibbles.

Thursday, May 2, 2019

Lets Put Our Money Where Our Ethics Are

I found a video of Jefferson Bailey's talk at the Ethics of Archiving the Web conference from a year ago. It was entitled Lets Put Our Money Where Our Ethics Are. The talk is the first 18.5 minutes of this video. It focused on the paucity of resources devoted to archiving the huge proportion of our culture that now lives on the evanescent Web. I've also written on this topic, for example in Pt. 2 of The Amnesiac Civilization. Below the fold, some detailed numbers (that may by now be somewhat out-of-date) and their implications.

Thursday, March 21, 2019

Cost-Reducing Writing DNA Data

In DNA's Niche in the Storage Market, I addressed a hypothetical DNA storage company's engineers and posed this challenge:
increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.
Now, a company called Catalog plans to demo a significant step in the right direction:
The goal of the demonstration, says Park, is to store 125 gigabytes, ... in 24 hours, on less than 1 cubic centimeter of DNA. And to do it for $7,000.
That would be 1E11 bits for $7E3. At the theoretical maximum 2 bits/base, it would be $3.5E-8 per base, versus last year's estimate of 1E-4, or around 30,000 times better.

If the demo succeeds, it marks a major achievement. But below the fold I continue to throw cold water on the medium-term prospects for DNA storage.

Tuesday, March 5, 2019

Demand Is Far From Insatiable

Based on numbers that IDC conjures from thin air, pundits believe that demand for storage is insatiable because everyone says Lets Just Keep Everything Forever In The Cloud. That idea assumes storage is free, but Storage Will Be Much Less Free Than It Used To Be. (Both links are from 2012). Below the fold I look at some real-world numbers showing how much storage actual customers are buying.

Tuesday, February 26, 2019

Economic Models Of Long-Term Storage

My work on the economics of long-term storage with students at the UC Santa Cruz Center for Research in Storage Systems stopped about six years ago some time after the funding from the Library of Congress ran out. Last year to help with some work at the Internet Archive I developed a much simplified economic model, which runs on a Raspberry Pi.

Two recent developments provide alternative models:
  • Last year, James Byron, Darrell Long, and Ethan Miller's Using Simulation to Design Scalable and Cost-Efficient Archival Storage Systems (also here) reported on a vastly more sophisticated model developed at the Center. It includes both much more detailed historical data about, for example, electricity cost, and covers various media types including tape, optical, and SSDs.
  • At the recent PASIG Julian Morley reported on the model being used at the Stanford Digital Repository, a hybrid local and cloud system, and he has made the spreadsheet available for use.
Below the fold some commentary on all three models.

Thursday, November 8, 2018

What's Happening To Storage?

My only post about storage since May, was October's Betteridge's Law Violation, another critique of IDC's Digital Universe, and their constant pushing of the idea that the demand for storage is insatiable. So its time for an update on what is happening in the real world of storage media, instead of IDC's Universe. Below the fold, some quick takes.

Tuesday, November 6, 2018

Making PIEs Is Hard

In The Four Most Expensive Words In The English Language I wrote:
Since the key property of a cryptocurrency-based storage service is a lack of trust in the storage providers, Proofs of Space and Time are required. As Bram Cohen has pointed out, this is an extraordinarily difficult problem at the very frontier of research.
The post argues that the economics of decentralized storage services aren't viable, so the difficulty of Proofs of Space and Time isn't that important. All the same, this area of research is fascinating. Now, in One File for the Price of Three: Catching Cheating Servers in Decentralized Storage Networks Ethan Cecchetti, Ian Miers, and Ari Juels have pushed the frontier further out by inventing PIEs. Below the fold, some details.

Thursday, October 18, 2018

Betteridge's Law Violation

Erez Zadok points me to Wasim Ahmed Bhat's Is a Data-Capacity Gap Inevitable in Big Data Storage? in IEEE Computer. It is a violation of Betteridge's Law of Headlines because the answer isn't no. But what, exactly, is this gap? Follow me below the fold.

Tuesday, September 4, 2018

Chia Network

Back in March I wrote Proofs of Space, analyzing Bram Cohen's fascinating EE380 talk. I've now learned more about Chia Network, the company that is implementing a network using his methods. Below the fold I look into their prospects.

Friday, August 24, 2018

Triumph Of Greed Over Arithmetic

I discussed FileCoin's ICO in The Four Most Expensive Words in the English Language and worked out that:
Filecoin needs to generate $25.7M/yr over and above what it pays the providers. But it can't charge the customers more than S3, or $0.276/GB/yr. If it didn't pay the providers anything it would need to be storing over 93PB right away to generate a 10% return. That's a lot of storage to expect providers to donate to the system.
On my bike ride this morning I thought of another way of looking at FileCoin's optimistic economics.

FileCoin won't be able, as S3 does, to claim 11 nines of durability and triple redundancy across data centers. So the real competition is S3's Reduced Redundancy Storage, which currently costs $23K/PB/month. Assuming that Amazon continues its historic 15%/year Kryder rate, storing a Petabyte in RRS for a decade is $1.48M. So, if you believe cryptocurrency "prices", FileCoin's "investors" pre-paid $257M for data storage at some undefined time in the future. They could instead have, starting now, stored 174PB in S3's RRS for 10 years. So FileCoin needs to store at least 174PB for 10 years before breaking even.

It gets worse. S3 is by no means the low-cost provider in the storage market. If we assume that the competition is Backblaze's B2 service at $0.06/GB/yr and that their Kryder rate is zero, FileCoin would need to store 428PB for 10 years before breaking even. Nearly half an Exabyte for a decade!

Tuesday, July 31, 2018

Amazon's Margins Again

AMZN operating margins
I've been pointing out that economies of scale allow for the astonishing margins Amazon enjoys on S3, and the rest of AWS, for six years. Now, This is the Amazon everyone should have feared — and it has nothing to do with its retail business by Jason Del Rey and Rani Molla documents AWS' margins in this table.
Amazon’s $52.9 billion of revenue in the second quarter of the year came in a tad below what Wall Street analysts expected — and that doesn’t matter whatsoever.

That’s because the massive online retailer once again posted its largest quarterly profit in history — $2.5 billion for the quarter — on the back of two businesses that were afterthoughts just a few years ago: Amazon Web Services, its cloud computing unit, as well as its fast-growing advertising business.
Below the fold, I discuss one of the implications of these amazing margins.

Tuesday, June 19, 2018

The Four Most Expensive Words in the English Language

There are currently a number of attempts to deploy a cryptocurrency-based decentralized storage network, including MaidSafe, FileCoin, Sia and others. Distributed storage networks have a long history, and decentralized, peer-to-peer storage networks a somewhat shorter one. None have succeeded; Amazon's S3 and all other successful network storage systems are centralized.

Despite this history, initial coin offerings for these nascent systems have raised incredible amounts of "money", if you believe the heavily manipulated "markets". According to Sir John Templeton the four words are "this time is different". Below the fold I summarize the history, then ask what is different this time, and how expensive is it likely to be?