I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation.
Showing posts with label storage costs. Show all posts
Friday, March 14, 2025
I gave a talk at the Berkeley I-school's Information Access Seminar entitled Archival Storage. Below the fold is the text of the talk with links to the sources and the slides (with yellow background).
Friday, July 8, 2022
Economic Model Revived

Alas, the Pi became a casualty when, early in the pandemic, we upgraded to the wonderful Sonic gigabit fiber (Best ISP Evah!), needed to support multiple grandkids each in a different virtual school.
Fortunately, Sawood Alam at the Internet Archive has forked the code, re-implemented it in JavaScript, improved the user interface, and deployed it at GitHub. This new version is once again available here. I'm very grateful to Sawood and the team at the Internet Archive, both for pushing me to do the initial implementation, and now for bringing it back from the dead.
Below the fold I have a couple of caveats.
Thursday, March 25, 2021
Internet Archive Storage

Jonah Edwards, who runs the Core Infrastructure team, gave a presentation on the Internet Archive's storage infrastructure to the Archive's staff. Below the fold, some details and commentary.
Friday, May 15, 2020
Economics Of Decentralized Storage
Almost two years ago, in The Four Most Expensive Words in the English Language, I wrote skeptically about the economics of decentralized storage networks. I followed up two months later with The Triumph Of Greed Over Arithmetic. Now, Got a few spare terabytes of storage sitting around unused? Tardigrade can turn that into crypto-bucks is Thomas Claburn's report on initial experience with Tardigrade, the "decentralized" storage network from Storj Labs. Below the fold, some more skepticism.
Tuesday, March 31, 2020
Archival Cloud Storage Pricing
Although there are significant technological risks to data stored for the long term, its most important vulnerability is to interruptions in the money supply. The current pandemic is likely to cause archives to suffer significant interruptions in the money supply.
In Cloud For Preservation I described how much of the motivation for using cloud services was their month-by-month pay-for-what-you-use billing, which transforms capital expenditures (CapEx) into operational expenditures (OpEx). Organizations typically find OpEx much easier to justify than CapEx because:
- The numbers they look at are smaller, even if what they add up to over time is greater.
- OpEx is less of a commitment, since it can be decreased if circumstances change.
Labels:
amazon,
cloud economics,
digital preservation,
storage costs
Tuesday, September 24, 2019
Promising New Hard Disk Technology
It has been too long, two-and-a-half years, since the last of Tom Coughlin's Storage Valley Supper Club events. But he just organized one to coincide with the Flash Memory Summit. It featured an extremely interesting talk by Karim Kaddeche, CEO of L2 Drive, a company whose technology seems likely to have a big impact on the hard disk market. Follow me below the fold for the explanation. I didn't take notes, so what follows is from memory. I apologize for any errors.
Thursday, August 29, 2019
SSD vs. HDD (Updated)
Figure: IDC & TrendForce data via Aaron Rakers
Aaron Rakers, the Wells Fargo analyst, thinks enterprise storage buyers will start to prefer SSDs when their prices fall to five times those of hard disk drives or less. SSDs are cheaper to operate than disk drives, needing less power and cooling, and are much faster to access. Below the fold, some skepticism.
Thursday, May 16, 2019
Review Of Data Storage In DNA
Luis Ceze, Jeff Nivala and Karin Strauss of the University of Washington and Microsoft Research have published a fascinating review of the history and state-of-the-art in Molecular digital data storage using DNA. The abstract reads:
Molecular data storage is an attractive alternative for dense and durable information storage, which is sorely needed to deal with the growing gap between information production and the ability to store data. DNA is a clear example of effective archival data storage in molecular form. In this Review, we provide an overview of the process, the state of the art in this area and challenges for mainstream adoption. We also survey the field of in vivo molecular memory systems that record and store information within the DNA of living cells, which, together with in vitro DNA data storage, lie at the growing intersection of computer systems and biotechnology.
They include a comprehensive bibliography. Below the fold, some commentary and a few quibbles.
Tuesday, May 7, 2019
Demand Is Even Less Insatiable Than It Used To Be
In Demand Is Far From Insatiable I looked at Chris Mellor's overview of the miserable Q2 numbers from Seagate, Nearline disk drive demand dip dropkicks Seagate: How deep is the trough, how deep is the trough?, and Western Digital, Weak flash demand and disk sales leave Western Digital scrabbling to claw back $800m a year. This quarter was equally dismal. Below the fold, the gory details.
Thursday, May 2, 2019
Lets Put Our Money Where Our Ethics Are
I found a video of Jefferson Bailey's talk at the Ethics of Archiving the Web conference from a year ago. It was entitled Lets Put Our Money Where Our Ethics Are. The talk is the first 18.5 minutes of this video. It focused on the paucity of resources devoted to archiving the huge proportion of our culture that now lives on the evanescent Web. I've also written on this topic, for example in Pt. 2 of The Amnesiac Civilization. Below the fold, some detailed numbers (that may by now be somewhat out-of-date) and their implications.
Thursday, March 21, 2019
Cost-Reducing Writing DNA Data
In DNA's Niche in the Storage Market, I addressed a hypothetical DNA storage company's engineers and posed this challenge:
increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.
Now, a company called Catalog plans to demo a significant step in the right direction:
The goal of the demonstration, says Park, is to store 125 gigabytes, ... in 24 hours, on less than 1 cubic centimeter of DNA. And to do it for $7,000.
That would be 1E12 bits for $7E3, or $7E-9 per bit. At the theoretical maximum of 2 bits/base, it would be $1.4E-8 per base, versus last year's estimate of 1E-4, or around 7,000 times better.
If the demo succeeds, it marks a major achievement. But below the fold I continue to throw cold water on the medium-term prospects for DNA storage.
Tuesday, March 5, 2019
Demand Is Far From Insatiable
Based on numbers that IDC conjures from thin air, pundits believe that demand for storage is insatiable because everyone says Lets Just Keep Everything Forever In The Cloud. That idea assumes storage is free, but Storage Will Be Much Less Free Than It Used To Be. (Both links are from 2012). Below the fold I look at some real-world numbers showing how much storage actual customers are buying.
Tuesday, February 26, 2019
Economic Models Of Long-Term Storage
My work on the economics of long-term storage with students at the UC Santa Cruz Center for Research in Storage Systems stopped about six years ago, some time after the funding from the Library of Congress ran out. Last year, to help with some work at the Internet Archive, I developed a much simplified economic model, which runs on a Raspberry Pi.
Two recent developments provide alternative models:
- Last year, James Byron, Darrell Long, and Ethan Miller's Using Simulation to Design Scalable and Cost-Efficient Archival Storage Systems (also here) reported on a vastly more sophisticated model developed at the Center. It both includes much more detailed historical data about, for example, electricity cost, and covers various media types including tape, optical, and SSDs.
- At the recent PASIG Julian Morley reported on the model being used at the Stanford Digital Repository, a hybrid local and cloud system, and he has made the spreadsheet available for use.
Thursday, November 8, 2018
What's Happening To Storage?
My only post about storage since May was October's Betteridge's Law Violation, another critique of IDC's Digital Universe and their constant pushing of the idea that the demand for storage is insatiable. So it's time for an update on what is happening in the real world of storage media, instead of IDC's Universe. Below the fold, some quick takes.
Tuesday, November 6, 2018
Making PIEs Is Hard
In The Four Most Expensive Words In The English Language I wrote:
Since the key property of a cryptocurrency-based storage service is a lack of trust in the storage providers, Proofs of Space and Time are required. As Bram Cohen has pointed out, this is an extraordinarily difficult problem at the very frontier of research.
The post argues that the economics of decentralized storage services aren't viable, so the difficulty of Proofs of Space and Time isn't that important. All the same, this area of research is fascinating. Now, in One File for the Price of Three: Catching Cheating Servers in Decentralized Storage Networks Ethan Cecchetti, Ian Miers, and Ari Juels have pushed the frontier further out by inventing PIEs. Below the fold, some details.
Thursday, October 18, 2018
Betteridge's Law Violation
Erez Zadok points me to Wasim Ahmed Bhat's Is a Data-Capacity Gap Inevitable in Big Data Storage? in IEEE Computer. It is a violation of Betteridge's Law of Headlines because the answer isn't no. But what, exactly, is this gap? Follow me below the fold.
Labels:
big data,
deduplication,
long-lived media,
seagate,
storage costs,
storage media
Tuesday, September 4, 2018
Chia Network
Back in March I wrote Proofs of Space, analyzing Bram Cohen's fascinating EE380 talk. I've now learned more about Chia Network, the company that is implementing a network using his methods. Below the fold I look into their prospects.
Labels:
bitcoin,
cloud economics,
kryder's law,
storage costs
Friday, August 24, 2018
Triumph Of Greed Over Arithmetic
I discussed FileCoin's ICO in The Four Most Expensive Words in the English Language and worked out that:
Filecoin needs to generate $25.7M/yr over and above what it pays the providers. But it can't charge the customers more than S3, or $0.276/GB/yr. If it didn't pay the providers anything it would need to be storing over 93PB right away to generate a 10% return. That's a lot of storage to expect providers to donate to the system.
On my bike ride this morning I thought of another way of looking at FileCoin's optimistic economics.
FileCoin won't be able, as S3 does, to claim 11 nines of durability and triple redundancy across data centers. So the real competition is S3's Reduced Redundancy Storage, which currently costs $23K/PB/month. Assuming that Amazon continues its historic 15%/year Kryder rate, storing a Petabyte in RRS for a decade is $1.48M. So, if you believe cryptocurrency "prices", FileCoin's "investors" pre-paid $257M for data storage at some undefined time in the future. They could instead have, starting now, stored 174PB in S3's RRS for 10 years. So FileCoin needs to store at least 174PB for 10 years before breaking even.
It gets worse. S3 is by no means the low-cost provider in the storage market. If we assume that the competition is Backblaze's B2 service at $0.06/GB/yr and that their Kryder rate is zero, FileCoin would need to store 428PB for 10 years before breaking even. Nearly half an Exabyte for a decade!
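The break-even figures in this excerpt can be rechecked with a few lines of arithmetic. What follows is a minimal sketch using only the numbers quoted above. The function and variable names are illustrative, and it assumes prices are paid annually, with the 15% Kryder-rate decline applied once per year starting from the current price.

```python
# Figures quoted in the excerpt; all names here are illustrative.
PREPAID_USD = 257e6              # what FileCoin's "investors" pre-paid
S3_RRS_USD_PER_PB_MONTH = 23e3   # S3 Reduced Redundancy Storage price
S3_KRYDER_RATE = 0.15            # assumed annual price decline for S3
B2_USD_PER_GB_YEAR = 0.06        # Backblaze B2, assumed Kryder rate of zero
GB_PER_PB = 1e6
YEARS = 10


def decade_cost_per_pb(first_year_usd, kryder_rate, years=YEARS):
    """Total cost of keeping 1 PB for `years`, with the annual price
    falling by `kryder_rate` each year (assumption: annual steps)."""
    return sum(first_year_usd * (1 - kryder_rate) ** y for y in range(years))


s3_first_year = S3_RRS_USD_PER_PB_MONTH * 12   # $276K/PB/yr, i.e. $0.276/GB/yr
s3_decade = decade_cost_per_pb(s3_first_year, S3_KRYDER_RATE)
b2_decade = decade_cost_per_pb(B2_USD_PER_GB_YEAR * GB_PER_PB, 0.0)

# PB that the pre-payment could have kept in each service for a decade.
s3_breakeven_pb = PREPAID_USD / s3_decade
b2_breakeven_pb = PREPAID_USD / b2_decade

# PB needed to return 10%/yr on the pre-payment when charging the S3 RRS
# price and paying the providers nothing.
return_pb = 0.10 * PREPAID_USD / s3_first_year

print(f"S3 RRS, 1 PB for a decade: ${s3_decade / 1e6:.2f}M")
print(f"Break-even vs S3 RRS: {s3_breakeven_pb:.0f} PB")
print(f"Break-even vs B2:     {b2_breakeven_pb:.0f} PB")
print(f"PB for a 10% return:  {return_pb:.1f} PB")
```

Under these assumptions the script reproduces the quoted $1.48M, 174PB and 428PB, and the "over 93PB" needed for a 10% return.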
Tuesday, July 31, 2018
Amazon's Margins Again
Figure: AMZN operating margins
Amazon’s $52.9 billion of revenue in the second quarter of the year came in a tad below what Wall Street analysts expected — and that doesn’t matter whatsoever.
That’s because the massive online retailer once again posted its largest quarterly profit in history — $2.5 billion for the quarter — on the back of two businesses that were afterthoughts just a few years ago: Amazon Web Services, its cloud computing unit, as well as its fast-growing advertising business.
Below the fold, I discuss one of the implications of these amazing margins.
Tuesday, June 19, 2018
The Four Most Expensive Words in the English Language
There are currently a number of attempts to deploy a cryptocurrency-based decentralized storage network, including MaidSafe, FileCoin, Sia and others. Distributed storage networks have a long history, and decentralized, peer-to-peer storage networks a somewhat shorter one. None have succeeded; Amazon's S3 and all other successful network storage systems are centralized.
Despite this history, initial coin offerings for these nascent systems have raised incredible amounts of "money", if you believe the heavily manipulated "markets". According to Sir John Templeton the four words are "this time is different". Below the fold I summarize the history, then ask what is different this time, and how expensive is it likely to be?
Labels:
bitcoin,
crowdfunding,
storage costs,
venture capital