Showing posts with label software preservation.

Tuesday, November 19, 2019

Seeds Or Code?

Svalbard Summer '69  
I'd like to congratulate Microsoft on a truly excellent PR stunt, drawing attention to two important topics about which I've been writing for a long time: the cultural significance of open source software, and the need for digital preservation. Ashlee Vance provides the channel to publicize the stunt in Open Source Code Will Survive the Apocalypse in an Arctic Cave. In summary, near Longyearbyen on Spitsbergen is:
the Svalbard Global Seed Vault, where seeds for a wide range of plants, including the crops most valuable to humans, are preserved in case of some famine-inducing pandemic or nuclear apocalypse.
Nearby, in a different worked-out coal mine, is the Arctic World Archive:
The AWA is a joint initiative between Norwegian state-owned mining company Store Norske Spitsbergen Kulkompani (SNSK) and very-long-term digital preservation provider Piql AS. AWA is devoted to archival storage in perpetuity. The film reels will be stored in a steel-walled container inside a sealed chamber within a decommissioned coal mine on the remote archipelago of Svalbard. The AWA already preserves historical and cultural data from Italy, Brazil, Norway, the Vatican, and many others.
GitHub, the newly-acquired Microsoft subsidiary, will deposit there:
The 02/02/2020 snapshot archived in the GitHub Arctic Code Vault will sweep up every active public GitHub repository, in addition to significant dormant repos as determined by stars, dependencies, and an advisory panel. The snapshot will consist of the HEAD of the default branch of each repository, minus any binaries larger than 100KB in size. Each repository will be packaged as a single TAR file. For greater data density and integrity, most of the data will be stored QR-encoded. A human-readable index and guide will itemize the location of each repository and explain how to recover the data.
Follow me below the fold for an explanation of why I call this admirable effort a PR stunt, albeit a well-justified one.
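To give a concrete feel for the selection rule quoted above, here is a minimal sketch, in Python, of packaging a single checked-out repository (assumed to be the HEAD of its default branch) into one TAR file while dropping binaries over 100KB. This is purely illustrative and is not GitHub's actual archiving pipeline; the binary-detection heuristic and the directory layout are assumptions.

```python
# Illustrative sketch only -- not GitHub's actual archiving pipeline.
# Packages a checked-out working tree (assumed to be the HEAD of the
# default branch) into a single TAR file, skipping binary files larger
# than 100KB, roughly mirroring the selection rule quoted above.
import os
import tarfile

SIZE_LIMIT = 100 * 1024  # 100KB

def looks_binary(path, probe=8192):
    """Crude heuristic (an assumption): treat files containing NUL bytes as binary."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(probe)

def package_repo(checkout_dir, tar_path):
    with tarfile.open(tar_path, "w") as tar:
        for root, dirs, files in os.walk(checkout_dir):
            dirs[:] = [d for d in dirs if d != ".git"]  # skip git metadata
            for name in files:
                full = os.path.join(root, name)
                if os.path.getsize(full) > SIZE_LIMIT and looks_binary(full):
                    continue  # drop large binaries, per the stated rule
                tar.add(full, arcname=os.path.relpath(full, checkout_dir))

# Example: package_repo("/tmp/myrepo-checkout", "/tmp/myrepo.tar")
```

The QR encoding and the human-readable index mentioned in the quote are separate downstream steps, not sketched here.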

Tuesday, September 17, 2019

Interesting Articles From Usenix

Unless you're a member of Usenix (why aren't you?), you'll have to wait a year to read two of the three interesting preservation-related articles in the Fall 2019 issue of ;login:. Below the fold is a little taste of each of them, with links to the full papers if you don't want to wait a year.

Thursday, August 8, 2019

Wine on Windows 10

David Gerard posts Wine on Windows 10. It works.
Windows 10 introduced Windows Subsystem for Linux — and the convenience of Ubuntu downloadable from the Microsoft Store. This makes this dumb idea pretty much Just Work out of the box, apart from having to set your DISPLAY environment variable by hand.

So far, it's mindbogglingly useless. It can only run 64-bit Windows apps, which doesn't even include all the apps that come with Windows 10 itself.

But I want to stress again: this now works trivially. I'm not some sort of mad genius to do this thing — I only appear to be the first person to admit to having done it publicly.
Gerard recounts the history of this "interesting" idea. Although he treats this as a "geek gotta do what a geek gotta do" thing, the interest from the perspective of Emulation and Virtualization as Preservation Strategies is in the tail of the post:
TO DO: 32-bit support. This will have to wait for Microsoft to release WSL 2. I wonder if ancient Win16 programs will work then — they should do in Wine, even if they don't in Windows any more.
Of course, if they run in Wine on Ubuntu on Windows 10 on an x86, they should run on Wine on Ubuntu on an x86. But being able to run Wine in an official Microsoft environment might make deployment of preserved Win16 programs easier to get past an institution's risk-averse lawyers.
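As a concrete illustration of the manual step Gerard mentions, here is a minimal sketch of launching a 64-bit Windows program under Wine from inside the WSL Ubuntu environment, with DISPLAY set by hand. It assumes the wine64 package is installed in the Ubuntu environment and that an X server is reachable on display :0; both are assumptions of mine, not details from Gerard's post.

```python
# Minimal sketch, assuming wine64 is installed inside WSL Ubuntu and an
# X server is reachable on display :0 (e.g. one running on the Windows
# host). Purely illustrative; not taken from Gerard's post.
import os
import subprocess

env = dict(os.environ, DISPLAY=":0")  # the "set by hand" step in the quote
# Launch a 64-bit Windows application; as Gerard notes, only 64-bit
# Windows apps work under WSL 1.
subprocess.run(["wine64", "notepad.exe"], env=env, check=True)
```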

Thursday, August 1, 2019

Emulation as a Service

I've written before about the valuable work of the Software Preservation Network (SPN). Now they have released their EaaSI Sandbox, in which you can explore the capabilities of "Emulation as a Service" (EaaS), a topic I discussed in my report Emulation and Virtualization as Preservation Strategies. Below the fold, I try EaaSI for the first time.

Tuesday, July 16, 2019

The EFF vs. DMCA Section 1201

As the EFF's Parker Higgins wrote:
Simply put, Section 1201 means that you can be sued or even jailed if you bypass digital locks on copyrighted works—from DVDs to software in your car—even if you are doing so for an otherwise lawful reason, like security testing.
Section 1201 is obviously a big problem for software preservation, especially when it comes to games.

Last December, in Software Preservation Network, I discussed both of the SPN's important documents relating to the DMCA.
Below the fold, some important news about Section 1201.

Thursday, November 1, 2018

Ithaka's Perspective on Digital Preservation

Oya Rieger of Ithaka S+R has published a report entitled The State of Digital Preservation in 2018: A Snapshot of Challenges and Gaps. In June and July Rieger:
talked with 21 experts and thought leaders to hear their perspectives on the state of digital preservation. The purpose of this report is to share a number of common themes that permeated through the conversations and provide an opportunity for broader community reaction and engagement, which will over time contribute to the development of an Ithaka S+R research agenda in these areas.
Below the fold, a critique.

Tuesday, August 28, 2018

Lending Emulations?

In my report Emulation and Virtualization as Preservation Strategies I discussed the legal issues around emulating obsolete software, the basis for the burgeoning retro-gaming industry. These issues have attracted attention recently, as Kyle Orland reports:
In the wake of Nintendo's recent lawsuits against other ROM distribution sites, major ROM repository EmuParadise has announced it will preemptively cease providing downloadable versions of copyrighted classic games.
Below the fold, some comments on this threat to our cultural history.

Thursday, June 21, 2018

Software Heritage Archive Goes Live

June 7th was a big day for software preservation; it was the formal opening of Software Heritage's archive. Congratulations to Roberto di Cosmo and the team! There's a post on the Software Heritage blog with an overview:
Today, June 7th 2018, we are proud to be back at Unesco headquarters to unveil a major milestone in our roadmap: the grand opening of the doors of the Software Heritage archive to the public (the slides of the presentation are online). You can now look at what we archived, exploring the largest collection of software source code in the world: you can explore the archive right away, via your web browser. If you want to know more, an upcoming post will guide you through all the features that are provided and the internals backing them.
Morane Gruenpeter's Software Preservation: A Stepping Stone for Software Citation is an excellent explanation of the role that Software Heritage's archive plays in enabling researchers to cite software:
In recent years software has become a legitimate product of research gaining more attention from the scholarly ecosystem than ever before, and researchers feel increasingly the need to cite the software they use or produce. Unfortunately, there is no well established best practice for doing this, and in the citations one sees used quite often ephemeral URLs or other identifiers that offer little or no guarantee that the cited software can be found later on.

But for software to be findable, it must have been preserved in the first place: hence software preservation is actually a prerequisite of software citation.
The importance of preserving software, and in particular open source software, is something I've been writing about for nearly a decade. My initial post about the Software Heritage Foundation started:
Back in 2009 I wrote:
who is to say that the corpus of open source is a less important cultural and historical artifact than, say, romance novels.
Back in 2013 I wrote:
Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.
Please support this important work by donating to the Software Heritage Foundation.

Thursday, April 5, 2018

Emulating Stephen Hawking's Voice

Jason Fagone at the San Francisco Chronicle has a fascinating story of heroic, successful (and timely) emulation in The Silicon Valley quest to preserve Stephen Hawking’s voice. It's the story of a small team which started work in 2009 trying to replace Hawking's voice synthesizer with more modern technology. Below the fold, some details to get you to read the whole article.

Wednesday, November 1, 2017

Randall Munroe Says It All

The latest XKCD is a succinct summation of the situation, especially the mouse-over.

Thursday, October 19, 2017

Preserving Malware

Jonathan Farbowitz's NYU MA thesis More Than Digital Dirt: Preserving Malware in Archives, Museums, and Libraries is well worth a more leisurely reading than I've given it so far. He expands greatly on the argument I've made that preserving malware is important, and that attempting to ensure archives are malware-free is harmful:
At ingest time, the archive doesn't know what it is about the content future scholars will be interested in. In particular, they don't know that the scholars aren't studying the history of malware. By modifying the content during ingest they may be destroying its usefulness to future scholars.
For example, Farbowitz introduces his third chapter A Series of Inaccurate Analogies thus:
In my research, I encountered several criticisms of both the intentional collection of malware by cultural heritage institutions and the preservation of malware-infected versions of digital artefacts. These critics have attempted to draw analogies between malware infection and issues that are already well-understood in the treatment and care of archival collections. I will examine each of these analogies to help clarify the debate and elucidate how malware fits within the collecting mandate of archives, museums, and libraries.
He goes on to demolish the ideas that malware is like dirt or mold. He provides several interesting real-world examples of archival workflows encountering malware. His eighth chapter Risk Assessment Considerations for Storage and Access is especially valuable in addressing the reasons why malware preservation is so controversial.

Overall, a very valuable contribution.

Wednesday, April 19, 2017

Emularity strikes again!

The Internet Archive's massive collection of software now includes in-browser emulations, using the Emularity framework, of the original Mac with MacOS from 1984 to 1989, and of a Mac Plus with MacOS 7.0.1 from 1991. Shaun Nichols at The Register reports that:
The emulator itself is powered by a version of Hampa Hug's PCE Apple emulator ported to run in browsers via JavaScript by James Friend. PCE and PCE.js have been around for a number of years; now that tech has been married to the Internet Archive's vault of software.
Congratulations to Jason Scott and the software archiving team!

Wednesday, January 25, 2017

Rick Whitt on Digital Preservation

Google's Rick Whitt has published "Through A Glass, Darkly": Technical, Policy, and Financial Actions to Avert the Coming Digital Dark Ages (PDF), a very valuable 114-page review of digital preservation aimed at legal and policy audiences. Below the fold, some encomia and some quibbles (but much less than 114 pages of them).

Tuesday, October 11, 2016

Software Art and Emulation

Apart from a short paper describing a heroic effort of Web archaeology, recreating Amsterdam's De Digitale Stad, the whole second morning of iPRES2016 was devoted to the preservation of software and Internet-based art. It featured a keynote by Sabine Himmelsbach of the House of Electronic Arts (HeK) in Basel, and three papers using the bwFLA emulation technology to present preserved software art (proceedings in one PDF):
  • A Case Study on Emulation-based Preservation in the Museum: Flusser Hypertext, Padberg et al.
  • Towards a Risk Model for Emulation-based Preservation Strategies: A Case Study from the Software-based Art Domain, Rechert et al.
  • Exhibiting Digital Art via Emulation – Boot-to-Emulator with the EMiL Kiosk System, Espenschied et al.
Preserving software art is an important edge case of software preservation. Each art piece is likely to have many more dependencies on specific hardware components, software environments and network services than mainstream software. Focus on techniques for addressing these dependencies in an emulated environment is useful in highlighting them. But it may be somewhat misleading, by giving an exaggerated idea of how hard emulating more representative software would be. Below the fold, I discuss these issues.

Thursday, October 6, 2016

Software Heritage Foundation

Back in 2009 I wrote:
who is to say that the corpus of open source is a less important cultural and historical artifact than, say, romance novels.
Back in 2013 I wrote:
Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.
There are no legal obstacles to collecting and preserving open source code. Technically, doing so is much easier than general Web archiving. It seemed to me like a no-brainer, especially because almost all other digital preservation efforts depended upon the open source code no-one was preserving! I urged many national libraries to take this work on. They all thought someone else should do it, but none of the someones agreed.

Finally, a team under Roberto di Cosmo, with initial support from INRIA, has stepped into the breach. As you can see at their website, they are already collecting a vast amount of code from open source repositories around the Internet.
softwareheritage.org statistics 06Oct16
They are in the process of setting up a foundation to sustain the effort. Everyone should support this essential work.

Tuesday, October 4, 2016

Panel on Software Preservation at iPRES

I was one of five members of a panel on Software Preservation at iPRES 2016, moderated by Maureen Pennock. We each had three minutes to answer the question "what have you contributed towards software preservation in the past year?" Follow me below the fold for my answer.

Wednesday, July 6, 2016

Talk at JISC/CNI Workshop

I was invited to give a talk at a workshop convened by JISC and CNI in Oxford. Below the fold, an edited text with links to the sources.

Monday, June 13, 2016

Eric Kaltman on Game Preservation

At How They Got Game, Eric Kaltman's Current Game Preservation is Not Enough is a detailed discussion of why game preservation has become extraordinarily difficult. Eric expands on points made briefly in my report on emulation. His TL;DR sums it up:
The current preservation practices we use for games and software need to be significantly reconsidered when taking into account the current conditions of modern computer games. Below I elaborate on the standard model of game preservation, and what I’m referring to as “network-contingent” experiences. These network-contingent games are now the predominant form of the medium and add significant complexity to the task of preserving the “playable” historical record. Unless there is a general awareness of this problem with the future of history, we might lose a lot more than anyone is expecting. Furthermore, we are already in the midst of this issue, and I think we need to stop pushing off a larger discussion of it.
Well worth reading.

Thursday, March 24, 2016

Long Tien Nguyen & Alan Kay's "Cuneiform" System

Jason Scott points me to Long Tien Nguyen and Alan Kay's paper from last October entitled The Cuneiform Tablets of 2015. It describes what is in effect a better implementation of Raymond Lorie's Universal Virtual Computer. They attribute the failure of the UVC to its complexity:
They tried to make the most general virtual machine they could think of, one that could easily emulate all known real computer architectures easily. The resulting design has a segmented memory model, bit-addressable memory, and an unlimited number of registers of unlimited bit length. This Universal Virtual Computer requires several dozen pages to be completely specified and explained, and requires far more than an afternoon (probably several weeks) to be completely implemented.
They are correct that the UVC was too complicated, but the reasons why it was a failure are far more fundamental and, alas, apply equally to Chifir, the much simpler virtual machine they describe. Below the fold, I set out these reasons.
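To make the contrast concrete, here is a toy sketch of the kind of deliberately tiny virtual machine Nguyen and Kay have in mind, one whose complete specification fits on a page and whose interpreter can be written in an afternoon. This is not Chifir itself (its instruction set and memory model differ); it is only an illustration of the scale of the idea.

```python
# A toy illustration (not Chifir, whose instruction set differs): a
# deliberately tiny virtual machine whose entire interpreter is a few
# dozen lines, in contrast with the UVC's dozens of pages of spec.
def run(program, memory):
    """program: list of (opcode, a, b, c) tuples; memory: list of ints."""
    pc = 0
    while pc < len(program):
        op, a, b, c = program[pc]
        if op == "add":      memory[a] = memory[b] + memory[c]
        elif op == "sub":    memory[a] = memory[b] - memory[c]
        elif op == "load":   memory[a] = memory[memory[b]]
        elif op == "store":  memory[memory[a]] = memory[b]
        elif op == "jumpz":  # jump to instruction a if memory[b] == 0
            if memory[b] == 0:
                pc = a
                continue
        elif op == "halt":
            break
        pc += 1
    return memory

# Example: compute memory[2] = memory[0] + memory[1]
print(run([("add", 2, 0, 1), ("halt", 0, 0, 0)], [3, 4, 0]))  # -> [3, 4, 7]
```

Compare this with the UVC's segmented memory model, bit-addressable memory, and unlimited registers of unlimited bit length, which no afternoon implementation could hope to cover.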