IIPC Steering Committee Election 2025: Call for Nominations

Wed, 02 July 2025 IIPC Staff

The nomination process for the IIPC Steering Committee is now open.

The Steering Committee (SC) is composed of no more than fifteen Member Institutions. SC Members provide oversight of the Consortium and define and oversee action on its strategy. This year, five seats are up for election.

What is at stake?

Serving on the Steering Committee is an opportunity for motivated members to help guide the IIPC’s mission of improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage. SC members are expected to actively contribute to the leadership and governance of the organization.

Every year, three SC members are designated as IIPC Officers (Chair, Vice-Chair and Treasurer) to serve on the IIPC Executive Board and are responsible for implementing the Strategic Plan.

The SC members meet in person (if circumstances allow) at least once a year. Face-to-face meetings are supplemented by two teleconferences plus additional ones as required. The key tasks for the upcoming term include guiding the implementation of the new Strategic Plan and Consortium Agreement.

Who can run for election?

The SC shall ideally reflect a diverse range of types and sizes of member organizations and roles within the web archiving community and represent the geographic spread of membership. Participation in the SC is open to any IIPC member in good standing. We strongly encourage any organisation interested in serving on the SC to nominate themselves for election.

Please note that the nomination should be on behalf of an organisation, not an individual; however, we do ask that you include the name of the likely representative in the nomination statement. The list of current SC member organisations is available on the IIPC website.

How to run for election?

All nominee institutions, both new and existing members whose term is expiring but are interested in continuing to serve on the SC, are asked to write a short statement (no longer than 200 words) outlining their vision for how they would contribute to IIPC via serving on the SC. Statements can point to current and past contributions to the IIPC activities (e.g. through collaborative projects, conference hosting, participation in SC, Working Groups or task forces), relevant experience or expertise, new ideas for advancing the organisation, or any other relevant information. View past nomination statements here.

All statements will be posted online and emailed to members prior to the election, giving all members ample time to review them. The results will be announced in November, and the three-year term on the Steering Committee will start on 1 January.

Below is the election calendar. We are very much looking forward to receiving your nominations. If you have any questions, please contact the IIPC Senior Program Officer (SPO).

Election Calendar

2 July – 1 October 2025: Nomination period. IIPC Designated Representatives are invited to nominate their organisation by emailing the IIPC SPO. The nomination statement should be no longer than 200 words.

2 October 2025: Nominee statements are published on the Netpreserve blog and circulated to the Members mailing list. Nominees are encouraged to campaign through their own networks.

2 October – 3 November 2025: Members are invited to vote online. The vote is cast by the Designated Representative.

5 November 2025: The results of the vote are announced on the Netpreserve blog and Members mailing list.

1 January 2026: The newly elected SC members start their three-year term.

Preserving Government Social Media in the Netherlands and Luxembourg (Part 2)

Mon, 30 June 2025 IIPC Staff

Techniques and Tools for Social Media Archiving: Casting the Right Fly or Net

by Susanne van den Eijkel (Advisor Digital Recordkeeping) and Lotte Wijsman (Preservation Researcher), National Archives of the Netherlands and Guilhem Costenoble (Archivist), Michel Cottin (Digital Curator), Maxime Detant (Archivist), and Camille Forget (Digital Curator), National Archives of Luxembourg.

This blog post is the second of three about social media archiving at the National Archives of the Netherlands and Luxembourg and their collaboration on the subject. The first blog can be found here.

Over the past years, we have seen various social media platforms change. Platforms apply different rules and restrictions to retrieving information online, so different techniques are needed to archive the content on these platforms properly. Social media archiving has also become an increasingly discussed topic. There is no universal method to collect social media. Choices depend on institutional goals, legal and technical constraints, and evolving platforms. Despite the challenges, practical guidelines can support the process.

Defining the significant properties of social media

In the first blog, we defined social media and provided context for different policy approaches. The next step is to look at the live web and declare what you think are the significant properties of the social media posts and accounts. In other words: what are the elements that have to be included in the archived version of social media? Is it important to show images, GIFs, and videos, or is text enough? Don’t forget to consider the emojis. They can have a specific meaning in a specific context. How do you make sure you archive that context as well?

Significant properties are defined as “the characteristics of digital objects that must be preserved over time in order to ensure the continued accessibility, usability, and meaning of the objects” (JISC, 2009). The properties are divided into five categories:

Content: This is about what the content, such as text or an emoji, expresses, not about its form. For example, it’s not only about seeing a “thumbs up”, but it also has to be clear what “thumbs up” means.
Context: Metadata, such as author and date, clarifies who posted something and when. It therefore gives context to a message.
Appearance: Think of color and layout. For example, the specific color blue of Facebook, or the bird/X of Twitter/X. This determines whether the archived view is authentic relative to the live version.
Behaviour: Interaction and functionality. For example, a form you can fill in or a video you can play.
Structure: Different types of parts in a post can be related, such as posts from one account that are shared to another account (reposts).

*Figure 1: Example of the category behaviour. Those elements that have a border have to act in a specific way. For example, the link has to refer to a new web page.*

Facing technical constraints

Since the mid-2010s, major social media platforms have progressively restricted access to their data, complicating archiving efforts. Initially driven by the protection of their commercial interests, these restrictions have intensified over the years. Twitter, for example, limited the sharing and transfer of data collected via its APIs as early as 2016, before easing these restrictions for non-commercial research between 2017 and 2020. However, in 2020, the introduction of the “Twitter Academic” API, separate from the paid general API, excluded most archival institutions.

An overview of Social Media Data Policies is shown in Figure 2 below. It is based on feedback from Ben Els (National Library of Luxembourg), Archive-It, and Vladimir Tybin (National Library of France), and on the thesis of Beatrice Cannelli (2024).

*Figure 2: Evolution of Social Media Data Policies, showing how access to platforms has changed over time.*

Unfortunately, large-scale harvests no longer deliver the same quality. The restrictions that have appeared in a finally delimited period must lead us to rethink how to collect data in a more reasoned way and to make tactical choices. It is by observing this “evolving” enclosure of the platforms and the irregular changes of rules that we opted, at the National Archives of Luxembourg, for an extraction technique from the settings of the individual account. It seems that these modalities remain possible in a relatively uniform way.

This individual extraction method adopted by the National Archives of Luxembourg for collecting social media archives offers a major advantage in terms of sustainability. Unlike web harvesting tools, which face increasingly strict restrictions imposed by platforms, individual extraction relies on a fundamental right guaranteed by a European Union Regulation, the General Data Protection Regulation (GDPR): the right to data portability. Article 20 of the GDPR requires data controllers, including social media platforms, to allow users to retrieve their personal data in a structured, commonly used, and machine-readable format. In other words, even if platforms tighten their restrictions on scraping and crawling, they cannot prevent a user from accessing and retrieving their own data.

The National Archives of the Netherlands have researched multiple techniques to archive social media, keeping the limitations in mind. As a result, we recommend using more than one technique to archive the material, to get as many of the significant properties as possible. Ideally, this means choosing a method that focuses on the text and a method that makes sure to archive the look and feel of the platform. We briefly describe four techniques here, including the pros and cons.

API

API is short for Application Programming Interface. It allows computers to ‘talk’ with each other: one asks, the other responds. In practice, this means that a user can type in a web address to open a specific web page (request) and the computer makes sure it shows the web page that is requested (response). This works for social media, as each post is a little web page in itself.

Apart from requiring some technical background (because you have to understand how the API works and how to connect with it), this was a fairly easy way to access social media data. The outcome is structured, with textual data that can be saved as a CSV file. This could be opened in Excel, for example, and would provide different columns that include the text of the post, the publisher, the location, the timestamp, et cetera.
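To make the CSV outcome more concrete, here is a minimal sketch in Python. It assumes the posts have already been retrieved and look like the hypothetical records below; the field names (`text`, `publisher`, `location`, `timestamp`) simply mirror the columns mentioned above and are not taken from any real platform's API.

```python
import csv
import io

# Hypothetical records, shaped like what a platform API might return.
# The field names are assumptions for illustration, not a real schema.
posts = [
    {"text": "Road works on the ring road this weekend.",
     "publisher": "ExampleMinistry",
     "location": "The Hague",
     "timestamp": "2025-06-30T09:15:00Z"},
    {"text": "Thanks for your question \U0001F44D",
     "publisher": "ExampleMinistry",
     "location": "",
     "timestamp": "2025-06-30T10:02:00Z"},
]

COLUMNS = ["text", "publisher", "location", "timestamp"]


def posts_to_csv(records, columns=COLUMNS):
    """Serialise post records to CSV text, one row per post."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({column: record.get(column, "") for column in columns})
    return buffer.getvalue()


csv_text = posts_to_csv(posts)
print(csv_text)
```

Note that the emoji survives as a character in the text column, but nothing in the CSV records how it was displayed or what it meant in context, which is why this technique mainly serves the content and context categories.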

However, in recent years access to the APIs of social media platforms has been restricted. One of the main changes is that you now have to pay for access. Furthermore, the platforms are not transparent about what they’re sharing. This makes it almost impossible to reconstruct how complete and authentic this kind of data is.

“Download” function within social media accounts

Most of the social media platforms offer the option to download your own data. This is the most basic option: one does not need technical skills to retrieve the data. The package that you download most frequently consists of an index HTML file that you can open in your browser. It is possible to navigate through it without being connected to any social media account or even to a network, while keeping the feeling of navigating through the original social media website. Next to that, you’ll receive separate files, such as JSON and JavaScript files, CSV, JPEG and MP4 files. In these files, you can find the publications’ content (whether textual or visual, with images and videos) and context (publication date, and the date taken for audiovisual content), as well as information about the account, advertisements and followers.
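A first check on such a package is simply to list what it contains. The sketch below counts the files in an unpacked download by file extension. The folder layout and file names in the demo are invented for illustration; real packages differ per platform and change over time.

```python
import tempfile
from collections import Counter
from pathlib import Path


def inventory_package(package_dir):
    """Count the files in an unpacked download package by file extension."""
    counts = Counter()
    for path in Path(package_dir).rglob("*"):
        if path.is_file():
            # Normalise case so that .JPEG and .jpeg are counted together.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return dict(counts)


# Demo on a tiny, made-up package (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.html").write_text("<html></html>", encoding="utf-8")
    (root / "posts.json").write_text("[]", encoding="utf-8")
    (root / "media").mkdir()
    (root / "media" / "photo1.JPEG").write_bytes(b"")
    (root / "media" / "photo2.jpeg").write_bytes(b"")
    (root / "media" / "clip.mp4").write_bytes(b"")
    result = inventory_package(root)

print(result)
```

An inventory like this can be stored alongside the transfer, so that later checks can confirm that no files went missing between extraction and ingest.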

It looks quite like the real deal, but there are some differences from the live platform to keep in mind. The most important thing about Luxembourg’s collection principles is that the archive only contains publications from the account owner. This principle is the subject of a collection agreement between ANLux and the ministries. Furthermore, the images are smaller files than those that were posted on the live web.

Do note that the content of your discussions with others may be included, meaning that if you or the person you corresponded with shared personal data, this data is now included in your archive download. However, keep in mind that you can still decide to exclude this private message data before the account extraction, or to remove it afterwards.

The main difficulty with this method is that you need to reach out to users who have access to the accounts you wish to archive and convince them to perform the data extraction. This requires active participation and collaboration between the data producers and the archiving institutions, leading to another form of collection that we have called “partnered collection”. This is the method ANLux chose, with good results for targeted collection campaigns with limited perimeters.

Screen capturing

A screen capture is a static or dynamic recording of your screen. This could be a still image (screenshot) or a more dynamic video of the recording. There are multiple tools to choose from that you can use, that enable you to automatically or manually capture the content. This method results in an image, PDF or video of the content. Only what’s seen on your computer screen is recorded, so this means that, if you are doing this manually, you have to open every separate post and reaction in order to capture it. If it’s automated, you need to find a way to check if everything was opened. As you can imagine, this is very exhausting work.

Web harvesting

Web harvesting (also known as ‘scraping’ or ‘crawling’) is the most common web archiving method. Software is used to archive web content automatically, and it often results in a WACZ or WARC file. The WARC file is the standard file format that is used for web archives. In NANETH’s opinion, this format is the most suitable for social media archives as well. WARC files contain not only the content of the webpages, but also some metadata. WARC is a container file format that includes all the file formats that have been archived, and with a special reader the file can be displayed in the correct order on your screen. There are multiple free tools available to harvest web pages and to render the files in a WARC reader. Most preservation systems include a viewer that can render WARC files as well.
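To illustrate what “content plus metadata in one container” means, the sketch below builds one small, uncompressed WARC record by hand and reads it back using only the Python standard library. The URL, date and record ID are made-up sample values. Real WARC files hold many records and are usually gzip-compressed, so in practice a dedicated library such as warcio is a better fit than this simplified parser.

```python
def parse_warc_record(raw):
    """Split one uncompressed WARC record into version, headers and content block."""
    # The WARC header section ends at the first blank line.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length says how many bytes of archived content follow.
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


# The archived HTTP response: the content of the record (sample values).
payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>Hello</p>"

# The metadata the crawler adds around it (sample values).
header_lines = [
    b"WARC/1.0",
    b"WARC-Type: response",
    b"WARC-Target-URI: https://example.org/post/1",
    b"WARC-Date: 2025-06-30T09:15:00Z",
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>",
    b"Content-Type: application/http; msgtype=response",
    b"Content-Length: " + str(len(payload)).encode("ascii"),
]
record = b"\r\n".join(header_lines) + b"\r\n\r\n" + payload + b"\r\n\r\n"

version, headers, block = parse_warc_record(record)
print(version, headers["WARC-Type"], headers["WARC-Target-URI"], headers["WARC-Date"])
```

The headers record what was captured and when, while the content block keeps the server's response byte for byte; a WARC reader uses both to replay the page.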

Conclusion: Return of the fisher to the port

Having multiple collection methods is always an advantage. Knowing how to combine them is a challenge. In heritage institutions, collecting everything requires method and discernment. The experience of the National Archives of Luxembourg in applying traditional archive collection processes to social networks has demonstrated the feasibility of carrying out a social media collection, though key differences remain with web harvesting in scope and archival status. A partnered collection cannot replace all harvesting techniques. To date, we have not yet had the experience of combining different collection techniques on communities or themes, for example: the fishers must sometimes venture into deeper waters or even further out to sea.

*Figure 3: Multiple collection approaches are to be evaluated.*

Call for Nominations: WWII 80th Anniversary Commemoration

Wed, 21 May 2025 IIPC Staff

By lead curator Melissa Wertheimer, Senior Digital Collections Specialist for Web Archiving at the Library of Congress and Content Development Group Co-Chair.

The Co-Chairs of IIPC’s Content Development Working Group (CDG) invite the public to contribute web-based content to a new event-based collaborative web archive collection: the World War Two 80th Anniversary Commemoration Web Archive. This collection relates thematically to IIPC’s 2015 World War I Commemoration Web Archive.

The year 2025 marks the 80th anniversary of the end of World War II, which concluded in 1945. This year includes specific dates of note that commemorate events that led to the war’s end. The 80th anniversary of Victory in Europe Day (VE Day) is 8 May 2025. The 80th anniversaries of the atomic bombings of Hiroshima and Nagasaki are 6 and 9 August 2025. The 80th anniversary of Victory over Japan Day/Victory in the Pacific Day (VJ Day/VP Day) is 15 August 2025 in the United Kingdom and 2 September 2025 in the United States.

What we are collecting

This event-based collaborative collection between IIPC members and the public will include websites and individual web pages that document anniversary events, including:

physical and online sites of memory, such as memorial ceremonies, veterans’ activities, and military cemeteries;
physical exhibits and related events, such as those hosted by museums and historical societies;
information on commemorative works of literature, visual art, and performing arts;
memorial events related to specific battles in the European, Pacific, and North African theaters; and
memorial events that commemorate specific events including VE Day, VJ Day, the Holocaust and concentration camp liberations, and the atomic bomb attacks on Hiroshima and Nagasaki.

Websites and webpages that represent the geo-political and linguistic breadth of the war’s participants are vital to the commemorative nature of this web archive. Online news articles that cover 80th anniversary commemorations, interviews, and ceremonies are especially valued. Complete websites and individual web pages created by organizations, institutions, and groups whose founding origins or missions relate directly to promoting the memory of the Second World War and its social, political, economic, and artistic impacts are also vital to the authenticity of the collection.

Out of scope

Online content created by Holocaust deniers is out of the scope of this collection; however, legitimate online textual content that documents the existence and dangers of Holocaust denial is within scope, especially pertaining to the anniversary year.

Social media is excluded from this collection for technical reasons: to maximize available data for websites and individual webpages and to ensure capture quality over the short duration of the event-based crawls.

How to participate

Members of the public may nominate URLs by using this online form. The collection will run three crawls during the first week of June, August, and September 2025.

For more information and updates, you can contact the IIPC Content Development Working Group team at Collaborative-collections@iipc.simplelists.com.

Preserving Government Social Media in the Netherlands and Luxembourg (Part 1)

Wed, 09 April 2025 IIPC Staff

Strategies Beyond Web Harvesting: Net or Fly Fishing?

This blog post is the first of three about social media archiving at the National Archives of the Netherlands and Luxembourg and their collaboration on the subject. This first part focuses on the undertaken initiatives to archive social media and its scope. The second will provide an overview of tools and techniques, and the third will discuss advocacy.

Pieces of the Same Puzzle

As National Archives, we are responsible for the selection, evaluation, maintenance, preservation and access to physical and digital information. Information can take many forms, and social media is one of them. Internationally, there has been increasing attention on how to archive and preserve social media content. The ongoing instability surrounding social media content, including efforts to restrict web scraping, has made its long-term preservation a challenge that crosses professional and geographical boundaries for archivists and librarians. Government communication, in particular, presents a sensitive issue for institutions focused on the preservation of historical heritage.

During the International Archival Symposium in Arnhem, the Netherlands, in spring 2024, colleagues of the National Archives of Luxembourg (ANLux) and the Netherlands (NANETH) started a conversation about our respective experiences on the topic of social media archiving. We shared knowledge on the topic via various online meetings and found out that both organisations had pieces of the same puzzle. We spoke of the projects that we had executed within our own organisations, the legal and technical challenges, and our experiences with selection policies, tools and preservation of the material. This blog series is the result of this collaboration.

All Roads Lead to Rome

We found that we had a similar problem but took a different path to find solutions for archiving social media content. ANLux has been promoting the download function on platforms that allows users to export an ‘archive’ of their own data. The method is now published in practical guidelines for Luxembourg public sector organizations to use and updated each year to follow the social media evolutions. NANETH, on the other hand, started research on different techniques to archive social media content that has been published in guidelines for Dutch government organisations to use. There is one major difference between our approaches. ANLux is actively archiving social media, in collaboration with the public administrations and ministries, as a daily business and with annual campaigns. However, NANETH is not responsible for archiving the material. We only advise on how to archive social media and research different techniques. NANETH is responsible for the long-term preservation of the content, once it is transferred to the National Archives.

In Luxembourg, archiving social media was one of the National Library of Luxembourg’s (BnL) attributions of legal deposit. But, as time went by, the BnL was faced with the challenge of adapting traditional web archiving methods, which were limited by publishers to capture social media, while ANLux had been developing retention schedules for Luxembourg public sector organizations since 2018. Moreover, ANLux can rely on a dense network of archiving delegates within these organizations. As a result, the BnL requested the support of the ANLux which developed a procedure to extract government-related accounts in strict compliance with the retention schedules. This led to a first partnered campaign at the end of 2023, with a collection scope limited to the Ministers of Xavier Bettel’s government, 2018-2023, which has had significant results.

In the Netherlands, the Dutch Archival Law (1995) states that ‘documents, whatever their form, received or prepared by public authorities’ have to be archived. This includes government information on the various social media platforms, which means the National Archives are obliged by law to permanently store transferred material. This contrasts with the role of the Dutch National Library (KBNL), which does not have a legal mandate to preserve (digital) material for the long term. In 2020, the first projects on social media archiving started in the Netherlands, in collaboration with the Dutch Digital Heritage Network. Since then, there have been multiple initiatives and projects to research how best to archive social media, as shown in Figure 1.

In 2025, NANETH published a guideline on archiving social media. The main focus is government organisations, but the guideline is suitable for heritage institutions as well. Dutch government institutions are responsible for archiving their own content. As National Archives, we can only advise on how to do it, and what’s best for the long-term preservation of the content. For example, in the guideline, we have explained that the method you choose to archive the material determines the information object you create. The guideline will be used as a basis for policymakers and has been developed by experts in the field of social media archiving from the Netherlands and Belgium.

*Figure 1: Timeline of social media archiving initiatives in the Netherlands.*

In Luxembourg, the law of August 17, 2018 relating to archiving states that “all documents, including data, regardless of their date, place of storage, material form and medium, produced or received by any natural or legal person and by any public or private service or body in the exercise of their activity” are archives that must be preserved. Besides, this law makes it mandatory to develop retention schedules for public sector organisations. In these retention schedules, a specific category is dedicated to the content of social networks, whose final outcome is conservation. ANLux has published a practical guideline for government services with the procedures for extracting data from individual accounts for social media platforms used for government communication, mainly Facebook, LinkedIn, Instagram, Twitter/X. These practical procedures are adapted from the “user guides” of each platform and include recommendations on the types of metadata to preserve as well as limitations to comply with the GDPR. This document is supplemented by the payments procedure in the administrative sense as well as recommendations for data transfer techniques.

On the occasion of the governmental transition of 2023, ANLux decided to launch a campaign to collect social media content published by each minister. This project was carried out in collaboration with the BnL, the Government press service, and the Prime Minister’s office. To prepare this campaign, ANLux had the opportunity to carry out a collection test based on the Facebook account of the Prime Minister who agreed to set an example. This helped not only to determine the selection criteria, but also to pave the way for other ministers, showing that individual extractions are still possible and without any risks for privacy. More on this will be explained in the second blog.

*Figure 2: Timeline of social media partnered collection in Luxembourg.*

The NANETH guideline explains the different techniques there are to archive social media, which is linked to the principle of significant properties and (where possible) archiving by design. For example, we explain how the extraction of data with the API works, what the outcome is (e.g. in which file format the data is saved and how it can be viewed) and what this means for the trustworthiness and authenticity of the material. More on the techniques can be found in blog 2. In contrast to exploring the available techniques, ANLux has proposed a more accessible solution of downloading the users’ own content as long as this option is available. ANLux also provides information on what are the best formats to choose for long-term preservation. There is no right or wrong in this, as there are different roads to take. However, our destination is the same. We try to safeguard the information from this relatively new medium for future generations.

Exploring the Scope of Social Media

Before starting with social media archiving, it is important to define what we mean by social media. The term has been in use since the 1990s “to indicate a new medium that enabled social interaction between users on the web” (Cannelli, 2024). This also means you have to decide which social media platforms to focus on. The ones that are the main focus for both Luxembourg and the Netherlands are X (Twitter), Facebook, LinkedIn, and Instagram. Perhaps we will need to reconsider the scope of archived platforms in the future, with the use of new social networks and the decline or even disappearance of other platforms. For example, ministries and ministers are starting to create TikTok accounts in Luxembourg, while many public organisations have already made the decision to leave X (Twitter). In the Netherlands, some of these organisations are shifting to Bluesky and Mastodon.

At NANETH, we adopted a rather broad definition of social media. We consider everything the Dutch government publishes on social media as information we want to archive. For social media, this includes communications with Dutch citizens about the COVID-19 lockdown in 2020 or initiatives such as the national garden bird-counting event. Additionally, we believe that the interaction on platforms with civilians, for example, is important to archive as well. If a government organisation is responding to a third-party post, we want to archive the reaction and the original post to ensure completeness. It is important to keep this in mind, as the method you choose for archiving social media has a big part to play in the information object you create. However, private messages are not in scope for social media archiving. They will be part of a direct messaging policy that is under development, and which covers chat apps such as WhatsApp and Signal.

ANLux chose to follow two main principles to determine their collection strategy: the principle of preserving intentional and official publications from the government and the public sector organisations, and the principle of reduced community collection. The first principle means that only the elements which had been the subject of a publication (in the sense of legal deposit) are preserved. This is why the focus is on public posts, notably excluding interactions with the followers (such as private messages or comments) and social network accounts that did not have the status of government communication. This selective approach ensured that only publicly accessible information was archived, respecting privacy and confidentiality.

The second principle, collecting material dedicated to a specific community, came into play from the beginning of the project. The very fact that ANLux would only collect social media accounts mentioned in the retention schedules has consequences for the scope of the collection. This principle is also largely influenced by a factor independent of the National Archives’ will: the conditions of use and extraction permitted by the editors of the targeted social networks. Indeed, automated harvesting with robots is limited. It seems certain that social media platforms will not back down on restricting mass harvesting.

The partnered collection method that we chose at ANLux has the advantage of involving the data producer in setting the rules of the game, as in more traditional archive transfers. Let’s take up the challenge with all its opportunities, but also its share of constraints. This method allows us to envisage broadening the scope initially targeted in the annual campaigns. Indeed, these campaigns enabled ANLux to make itself more accessible to the ministries and administrations, whether to collect social media accounts or other types of archives. Since these campaigns, other organizations – outside the scope of the archiving law – have informed us that they collect social media accounts, for example when deciding to leave certain platforms.

To figure out what needs to be included when archiving social media, NANETH has used significant properties, a strategy also used by the Library of Congress. In our guideline on social media archiving, we have assessed multiple techniques and scored them on various criteria. Among them are the five categories of significant properties: content, context, appearance, behaviour, and structure. In the next blog, we will provide more information about tools and techniques, and compare the different approaches of ANLux and NANETH.

Whether you’re a devoted fan of net fishing or fly fishing, the collection of government social media is an ever-evolving sport. Between web harvesting and partner-driven collection, our Luxembourgish and Dutch approaches explore different strategies to preserve these fragile yet essential digital traces. If this first blog post has helped define the stakes and scope, the game is only just beginning!

In our next blog post, we’ll dive into the technical depths of collection methods, results and statistics before dedicating a third post to the challenges of advocacy and integrating these practices into digital preservation strategies. Stay in the loop with fishers and keep the feed rolling!

IIPC 2024 Summer Olympics and Paralympics collaborative collection

Tue, 25 March 2025 IIPC Staff

By Helena Byrne, Curator of Web Archives, British Library

The 2024 Summer Olympics and Paralympics held in Paris were record-breaking events. As with previous Games since 2010, the International Internet Preservation Consortium (IIPC) Content Development Group organised a collaborative transnational web archive collection on the Games. The collection covers events on and off the field of play through web publications from 86 countries. There are 47 languages represented in the collection. Not surprisingly, the largest number of nominations was in French, with 1,181 records, while many languages have as few as 1 or 2 records.

The collection is available to view from the IIPC Archive-it.org account – https://archive-it.org/collections/22312.

IIPC CDG 2024 Olympics/Paralympics collection – Google My Maps

The majority of these records were nominated by IIPC members but a small number of unique records were nominated through the public nomination form that was launched on the IIPC blog in July 2024.

As with our previous collection on the 2022 Winter Olympics and Paralympics, social media was excluded from this collection, because it was very difficult to preserve any meaningful social media captures through the Archive-It platform at the time of the event. One change in the scoping rules compared to previous Olympic and Paralympic collections was to exclude the “seed page plus 1 click” scope, which covers the seed page and every link on it (e.g. a single news page linking to multiple articles), because these types of crawls normally pick up lots of irrelevant content that eats up data.

All seeds added to the crawler were capped at 2mb. This is generally enough data to capture a standard website but would mean we only have shallow captures of bigger media heavy websites. Overall the 3,429 websites and webpages that were archived amounted to 458 GB and 6,315,815 documents.

Resources

For more information and updates on Content Development Group activities, you can contact the IIPC CDG team at Collaborative-collections@iipc.simplelists.com.

Web Archiving the War in Ukraine: call for nominations

Wed, 05 March 2025 / Tue, 11 March 2025 IIPC Staff

By the lead curators: Anaïs Crinière-Boizet, Digital Curator at the National Library of France, and Vladimir Tybin, Head of Digital Legal Deposit at the National Library of France.

The IIPC Content Development Working Group launched “the War in Ukraine” collaborative collection in July 2022, a few months after the beginning of this conflict, in order to capture its impact on digital history and culture on the web. Based on suggestions by curators, web archivists and members of the public worldwide, we have launched six crawls in total: three in 2022, two in 2023 and one in 2024. You can read an update on this collaborative effort here.

Many of the pages nominated in the early crawls are already offline, which shows why it is important to continue this effort. We encourage everyone to nominate websites around the themes listed below. The first 2025 crawl will start on 24 March.

What we want to collect

This collection is built through the following themes:

General information about the military confrontations
Consequences of the war on the civilian population
Refugee crisis and international relief efforts
Political consequences
International relations
Diaspora communities – Ukrainian people around the world
Human rights organisations
Foreign embassies and diplomatic relations
Sanctions imposed against Russia by foreign powers
Consequences on energy and agri-food trade
Public opinion: blogs/protest sites/activists

The list is not exhaustive and it is expected that contributing partners may wish to explore other subtopics within their areas of interest and expertise, providing that they are within the general collection development scope.

Out of scope

The following types of content are out of scope for the collection:

Data-intensive audio/video content (e.g. YouTube channels)
Social media platforms
Private member forums, intranets, or email (non-published material)
Content identifying vulnerable people and compromising their safety

How to get involved

Once you have selected the web pages that you would like to see in the collection, it takes less than 5 minutes to fill in the submission form: bit.ly/Ukraine-2022-collection-public-nominations

For more information and updates, you can contact the IIPC Content Development Working Group team at Collaborative-collections@iipc.simplelists.com.

Three years on: an update on the War in Ukraine IIPC collaborative collection

Wed, 05 March 2025 / Tue, 11 March 2025 IIPC Staff

By the lead curators: Anaïs Crinière-Boizet, Digital Curator at the National Library of France and Vladimir Tybin, Head of Digital Legal Deposit at the National Library of France.

The War in Ukraine collaborative collection led by the IIPC Content Development Working Group (WIU CDG) was initiated in July 2022, a few months after the beginning of this conflict, in order to capture its impact on digital history and culture on the web. Based on suggestions by curators, web archivists and members of the public worldwide, we have launched six crawls in total, three in 2022, two in 2023 and one in 2024. Together with our colleague Kees Teszelszky, we provided an update on the status of the collection after the 2023 crawls.

This blog post aims to give an update on this transnational collection documenting an important historical event. Since the beginning of the collection, many IIPC members but also the public responded to the call for contributions to document the conflict. In total 1,528 member proposals were received and 322 via the public nomination form, making 1,850 seeds in total. After cleaning up duplicates and invalid URLs, 1,822 seeds remained. All these were crawled at least once between July 2022 and November 2024. Today the collection represents 2.3 TB.
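The clean-up step mentioned above (removing duplicates and invalid URLs from the nominated seeds) can be illustrated with a minimal Python sketch. This is not the workflow the curators actually used: the function names `normalise_seed` and `clean_seeds`, the normalisation rules, and the example URLs are all assumptions chosen for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalise_seed(url: str) -> Optional[str]:
    """Return a normalised http(s) seed URL, or None if the input is unusable.

    Hypothetical rules: lower-case the scheme and host, treat an empty path
    as "/", and drop any fragment.
    """
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. broken IPv6 hosts.
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def clean_seeds(nominations: list[str]) -> list[str]:
    """Drop invalid URLs and duplicates, keeping first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in nominations:
        seed = normalise_seed(raw)
        if seed is None or seed in seen:
            continue
        seen.add(seed)
        cleaned.append(seed)
    return cleaned


if __name__ == "__main__":
    # Invented nominations: one duplicate, one non-web scheme, one non-URL.
    nominations = [
        "https://example.org/",
        "HTTPS://Example.org",
        "ftp://example.org/file",
        "not a url",
        "https://example.org/news?id=1",
    ]
    print(clean_seeds(nominations))
```

In practice, deciding whether two nominations are "the same" seed is a curatorial judgement (for example, whether a trailing slash or a tracking parameter matters), so any automated pass like this one would still need a manual check.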

Below is the geographic coverage of nominated content and the IIPC members’ contributions.

Collection scope

The selections cover the following topics given in the call for nominations: general information on military confrontations; consequences of the war on the civilian population in Ukraine; refugee crisis and international relief efforts in and outside Europe; political consequences; international relations; diaspora communities such as Ukrainians around the world; human rights organizations; foreign embassies and diplomatic relations; sanctions imposed on Russia by foreign powers; consequences on energy and agri-food trade; and public opinion such as blogs, protest sites and online writings of activists. Websites from countries all over the world and in all languages are in scope. Special attention has been devoted to websites which can be a source of internet culture, such as sites with internet memes.

To learn more about the context of this collection, see this 2022 article published by the SUCHO initiative.

Screenshot of the SUCHO Meme Wall taken on December 6, 2023.

Timeline and crawl depth

We launched the sixth crawl for the War in Ukraine web collection on 1,206 seeds in November 2024, with a budget of 500 GB. 157 new seeds were submitted by members and through the public nomination form between the last crawl in December 2023 and November 2024. 419 seeds have been deactivated since December 2023. These were pages which had not been updated since the last crawl or had gone offline. These “404 file not found” errors also show why our collection work is important, as some sites have already gone offline. In total, 25 jobs have been launched: 21 crawls with the standard crawler and 3 with Brozzler, a distributed web crawler that uses a real web browser (Chrome or Chromium) to fetch pages and embedded URLs and to extract links.

The tendency to select « One page + » sites is confirmed as the collection continues, with the ‘standard’ scope being the least used of the three proposed. This can be explained by the fact that the first type of site selected (see Figure 2) is ‘News’. « One page + » depth allows crawling of newspaper or online media pages pointing to articles related to the war in Ukraine.
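To make the idea of a « One page + » depth concrete, here is a minimal Python sketch that computes which URLs fall into scope for a single seed: the seed itself plus every page one click away. It illustrates the concept only and is not how Archive-It or its crawlers implement scoping; the class `LinkCollector`, the function `one_page_plus_scope` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags found in one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def one_page_plus_scope(seed_url: str, seed_html: str) -> list[str]:
    """Return the seed plus every web page linked directly from it."""
    collector = LinkCollector()
    collector.feed(seed_html)
    scope = [seed_url]
    for href in collector.hrefs:
        # Resolve relative links against the seed and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(seed_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in scope:
            scope.append(absolute)
    return scope


if __name__ == "__main__":
    # Invented front page of a news section linking to two articles.
    html = (
        '<a href="/ukraine/article-1">First article</a>'
        '<a href="article-2#top">Second article</a>'
        '<a href="mailto:desk@example.org">Contact</a>'
        '<a href="/ukraine/article-1">First article again</a>'
    )
    for url in one_page_plus_scope("https://news.example.org/ukraine/", html):
        print(url)
```

For a news section page, this scope captures the linked articles without following links any further, which is why it suits seeds of the ‘News’ type.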

Looking at the distribution of sites by website type, it is noticeable that a large proportion of the sites are news sites, NGOs and government websites. The role of blogs in internet culture has diminished in recent years, as is also visible in this collection. In contrast, NGO websites contain more and more information worth preserving for historians of the future, as they document their activities to their donors.

Screenshot of Ukraine’s 24 Channel taken on October 9, 2022.

Compared to the first crawls, international languages such as English (451) and French (260) are still in the lead, but third place is now occupied by Ukrainian (225) and seventh by Russian (71), reflecting our effort to encourage selections of pages in those languages. The impact of the conflict on the rest of Central, Eastern and Southern Europe around Ukraine can be seen in the collection of sites in Hungarian (46), Czech (53), and Serbian (42).

1,580 selections are already available on Archive-It. The rest are still waiting for QA. The need to check all the URLs and the poor quality of some captures due to blocking or inherent limitations of the crawler make this a long process, but we hope that the collection will soon be completely available on Archive-It.

Quality assurance (QA)

For QA, we had the welcome help of Eilidh MacGlone, web archivist at the National Library of Scotland, who tells us in this article about what she learnt by volunteering on the War in Ukraine collection.

“Volunteering to quality assure (QA) targets in Archive-It was a beneficial experience, building on what I knew about Regular expressions (Regex) and learning about Sort-friendly URI Reordering Transforms (SURTs). I alternately scoped in and blocked URLs for several targets, directing Heritrix to intended pages and avoiding traps and redundant locations. This work will reduce the memory requirement for seeds remaining active in any future crawls. I employed an AI assistant to create Regex phrases which I then checked at regexchecker.com, mixing links I wished the crawler to find with others from the domain I did not and running my generated candidate against these. The patterns I needed are characteristic and I found myself reusing them through this work.

Another aspect of the work was a light touch metadata check, downloaded and edited in a Google spreadsheet as recommended by Archive-It. I reduced the number of alternate titles, and ran a spellcheck, though only for the English language text. I amended a few sentences as I worked, ran a final check for spelling mistakes – and found them, given human nature! Uploading to overwrite the metadata for a collection of more than 2000 items was nerve-wracking but worked well.”
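The regex-checking routine described in the quotation (running a candidate pattern against a mix of wanted and unwanted links) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Eilidh MacGlone's actual workflow: the function names, the example URLs and the pattern are invented, the sketch assumes the crawler applies an expression to the whole URL, and `surt_prefix` is a deliberately simplified rendering of the SURT prefix form that ignores ports, credentials and query strings.

```python
import re
from urllib.parse import urlsplit


def surt_prefix(url: str) -> str:
    """Return a simplified SURT prefix: host labels reversed, then the path."""
    parts = urlsplit(url)
    labels = reversed(parts.hostname.split("."))
    return f"{parts.scheme}://({','.join(labels)},){parts.path}"


def check_pattern(pattern: str, wanted: list[str], unwanted: list[str]) -> dict[str, list[str]]:
    """Report which URLs a candidate pattern gets wrong.

    "missed" lists wanted URLs the pattern fails to match;
    "leaked" lists unwanted URLs it matches by mistake.
    """
    compiled = re.compile(pattern)
    return {
        "missed": [u for u in wanted if not compiled.fullmatch(u)],
        "leaked": [u for u in unwanted if compiled.fullmatch(u)],
    }


if __name__ == "__main__":
    # Invented test URLs for a hypothetical news site.
    wanted = ["https://news.example.org/ukraine/2024/11/report.html"]
    unwanted = [
        "https://news.example.org/sport/results.html",
        "https://news.example.org/ukraine-calendar?month=1",
    ]
    candidate = r"https?://news\.example\.org/ukraine/.*"
    print(check_pattern(candidate, wanted, unwanted))
    print(surt_prefix("http://www.example.org/ukraine/"))
```

An empty "missed" and "leaked" list means the candidate behaves as intended on the sample; a pattern that is too broad shows up immediately in "leaked", which is the same feedback an online regex checker gives.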

Call for nominations

In a period of uncertainty regarding the future of the conflict, influenced by global political shifts and leadership changes, it is more important than ever to document this event for future generations and researchers.

Three years after the start of this conflict, the importance of continuing to archive its traces on the web is as crucial as ever. We are launching a new call for nominations and encourage everyone to suggest content that should be crawled, particularly from countries underrepresented in the current collection.

If you have been involved in a collection or a project documenting the war in Ukraine and would like to see it included in our next update, please use this form to contact us: https://bit.ly/archiving-the-war-in-Ukraine.

We would like to thank Kees Teszelszky, who started this project with us, Eilidh MacGlone for her QA work, Carlos Lelkes-Rarugal and Nicola Bingham for their continuous support and guidance.

20 Years of OASIS

Tue, 31 December 2024 / Fri, 24 January 2025 IIPC Staff

This year, the Online Archiving and Searching Internet Sources (OASIS) at the National Library of Korea (NLK) celebrated its 20th anniversary. NLK joined the IIPC in 2008, and as a member of the IIPC, NLK participated in building our collaborative collections, including the 2018 Pyeongchang Winter Olympic Games and the COVID-19 pandemic. Thematic and social media collections are openly available at https://nl.go.kr/oasis/.

By Ahn Kyung-Ja, Librarian at the National Library of Korea

The National Library of Korea held a special exhibition, ‘Webtro: Digital Memory’ celebrating the 20th anniversary of “OASIS”, a web resources archive of Korea.

(Exhibition room on the first floor of the main building, October 14 – December 8, 2024)

Poster for the OASIS Exhibition at the National Library of Korea — Exhibition Key Visual

OASIS (Online Archiving & Searching Internet Sources) was launched in 2004 to preserve Korean digital intellectual heritage. OASIS has collected and preserved a vast amount of domestic and international K-web resources, totaling around 2.6 million items, and provides services to the public through the OASIS website (nl.go.kr/oasis).

We prepared an exhibition to celebrate OASIS’s 20th anniversary and promote it to the public. It is the OASIS project team’s first attempt to present purely digital web resources in a physical space.

The web is a communication network that produces and shares various types of content in real-time and is the most popular and practical knowledge information resource representing this era. Since the early 2000s, member libraries in the IIPC have been collaborating with institutions around the world to preserve web resources that change and evolve every day. To enhance awareness of the necessity and value of web preservation, the National Library of Korea organized an exhibition to show how web archiving works.

The exhibition title, ‘Webtro’, is a newly coined word that combines ‘web’ and ‘retro’ and refers to visitors’ past experiences, which are collected and recorded in the OASIS system. We organized domestic websites and web resources collected in OASIS over the past 20 years so that visitors can see them at a glance by theme and by period.

OASIS: Exhibition Intro

We applied pixel design to maximize the unique features of the web and redesigned a once-famous character created for a South Korean social network service. The exhibition’s main goal was to create a trip through digital memory in a physical space. We also showed three video works by young web artists inspired by OASIS, and presented a vision of the future of OASIS in virtual space on a big screen.

The exhibition consists of 4 parts in total.

Part 1 Understanding OASIS: Digital Time Capsule

Part 1 introduces a world map showing web preservation projects and participating institutions around the world, explaining the formation of the IIPC and worldwide cooperation for web preservation. It shows video images from major web preservation institutions and introduces OASIS from its birth to its current status.

The Chronology of OASIS

Map of IIPC and Web Preservation Cooperation

Part 2 Exploring the Web: Bringing Back the Lost Web

It’s time to explore the K-Web resources in OASIS. We prepared a chronological archive table that shows 45 major disasters in twenty years and representative K-culture collections such as BTS, Squid Game, and Han Kang’s Nobel Prize in Literature. Visitors can also use a touchscreen to search nostalgic websites that are not currently serviced.

OASIS Web Resources Exploration

Part 3 Retro Experience: Y2K Time Travel

This is a space where you can experience time travel into the past. We provide infographics of thirteen major sports events, from the 2014 Incheon Asian Games to the 2024 Paris Olympics and Paralympics.

Sports Time Machine Zone

The Miniroom Area in Our Memory

We created a retrospective space where you can experience a once-popular virtual community in the 2000s. Cyworld, launched in 1999, was the beginning of the Korean Social Network Service. Users created their own mini-me (avatar) and mini-room in cyberspace and communicated with friends and others within the virtual space. By restoring virtual images in real space, we gave visitors an opportunity to relive a good memory. There is also a digital guestbook in which visitors can write a message. We will permanently preserve it after the exhibition ends.

Part 4 Extended Future: Connect to 2050

In the final part of the exhibition, we extended the scope of the web. We imagined a future for OASIS in metaverse space, complete with web art. In the gallery area, you can enjoy a Paik Nam June-style media artwork, three web artworks created by young web artists, and a video illustrating a possible future of OASIS 2050 in the metaverse space.

A Collaborative Work Combining the Web, Art and the Future of OASIS

Celebrating the 20th anniversary of OASIS, the exhibition conveyed the value and importance of preserving digital cultural heritage to the Korean public. We have taken a leading role in web preservation in South Korea, and the exhibition showed the public that many overseas partner institutions cooperate on web preservation projects. We also appeared on TV news and live broadcasts to raise public awareness of our mission. About 3,500 visitors came to the exhibition and recognized the need to preserve web resources. They left encouraging words and wrote short memos to show their thanks for the web preservation project.

“Collapse and Connection” by Lee Yoon-su

“Upanishad” by Jang Hyun-wook

“From Oasis to Ocean” by Areumdamda

I would like to express my sincere gratitude to IIPC Chairman Jeffrey van der Hoeven and to Olga Holownia for their support and kind congratulatory letters. I thank Alex Osborne for visiting the exhibition and promoting it in Australia, and the Bibliothèque Nationale de France for visiting the exhibition.

I hope this exhibition will increase public interest, support, and encouragement for the pioneering work of web archiving.

IIPC Steering Committee Election 2024: Nomination Statements and Results

Thu, 12 September 2024 / Tue, 01 July 2025 IIPC Staff

The call for nominations for the 2024 Steering Committee elections has closed. We had four vacant seats and received three nominations, so an election process is not required this year. We would like to congratulate and thank the British Library, the National Library of France, and the National Library of the Netherlands, who will be continuing for another term.

Please find the statements of all the nominees below. The new, three-year term starts on 1 January 2025.

Nomination statements:

Bibliothèque nationale de France | National Library of France

The National Library of France (BnF) started its web archiving programme in the early 2000s and now holds an archive of more than 2.2 petabytes. We develop national strategies for the growth and outreach of web archives and host several academic projects in our DataLab. We use and share our expertise on key tools for IIPC members (Heritrix 3, NetarchiveSuite, OpenWayback, SolrWayback, webarchive-discovery) and contribute to the development of several of them.

As one of the founding members of the IIPC, we have always actively contributed to the GA & WAC meetings, workshops and most of the working groups, and we remain committed to the development of a strong community sharing knowledge and practices. Recently, the BnF has been particularly active within the CDG, leading a collaborative collection on the War in Ukraine, participating in the work of various groups, and also hosting the 2024 GA & WAC in Paris. BnF is currently co-leading the Membership Engagement Portfolio. By drafting and implementing the main thrusts of the new Strategic Plan and Consortium Agreement, our participation in the steering committee will be focused on making web archiving a thriving community, engaging researchers in the study of web archives, developing harvest and access strategies.

The British Library

This year, the British Library has had good cause to reflect on the importance of an international community in supporting resilience as well as development of capability in web archiving. The British Library became a founding member of the IIPC because we recognised the need to work collaboratively and share knowledge and experience in a field that was technologically challenging and rapidly changing. That remains the case today, as the technology of the web, web archiving tools and researcher needs have advanced.

The next 3 years will be a period of rapid change for the UK Web Archive as we restore our service following last year’s cyber attack. We remain engaged in how the community can support the development of tools on which we depend. Collaborative collecting remains a key part of how we work with the IIPC. We are excited by new research, including support for use of the archived web as data. We are a member of the Steering Committee and, as Treasurer for 2022 – 2023, gained a good understanding of how the organisation operates. As a member of the Steering Committee we would additionally take a role in the strategic direction of the IIPC.

Koninklijke Bibliotheek | National Library of the Netherlands

We believe the IIPC is an important network organization which brings together ideas, knowledge and best practices on how to preserve the web and retain access to its information in all its diversity. In recent years, KB National Library of the Netherlands (KBNL) has taken an active role in the IIPC: we co-hosted the 2023 Web Archiving Conference and our representatives have served in various leadership roles, including as portfolio leads, IIPC vice-chair (2023) and chair (2024).

We would like to continue our work and bring together more organizations, large and small across the world, to learn from each other and ensure web content remains findable, accessible and re-usable for generations to come. Our main focus will be to support the IIPC in reshaping the Consortium Agreement and developing the new strategic plan, taking input from our members and the wider web archiving community.

As a national library, our work is fueled by the power of the written word. It preserves stories, essays, and ideas, both printed and digital. When people come into contact with these words, whether through reading, studying, or conducting research, they impact their lives. With this perspective in mind, we find it vital to preserve web content for future generations.

WAC 2024 Student Travel Report

Wed, 31 July 2024 / Thu, 15 May 2025 IIPC Staff

By Kayla Martin-Gant, IIPC Administrative Officer

Co-organized by the IIPC and the BnF in partnership with the French National Audiovisual Institute (INA), the IIPC Web Archiving Conference (WAC) 2024, held from April 24-26 at the National Library of France (BnF) in Paris, brought together over two hundred members of the web archiving community from all over the world. Below are insights and experiences of a few of the attendees who received student bursaries from the IIPC, collected from their submitted travel reports.

Student Bursary Experience

Each year, the IIPC awards bursaries to up to ten applicants in good academic standing to cover their registration costs. We had nine student bursary recipients this year, four of whom also attended as presenters at the conference. While many were local students of the National School of Charters, others hailed from Belgium, Portugal, England, and the United States. Additionally, five of the twenty-four mentees in this year’s mentoring program – a fairly new but much appreciated element of the conference – were student bursary recipients.

Jonas Melo, a student of the University of Porto’s Information and Communication in Digital Platforms program, was a familiar face at WAC, having attended past conferences. As both an attendee and a speaker at this year’s conference, Melo expressed his gratitude for the assistance. “The student bursary program was a fantastic initiative by the IIPC, providing financial support that made it possible for students like me to attend the conference,” he says, noting the ease of the application process and that “the support from the IIPC team was exceptional. I am grateful for this opportunity and hope that the program continues to support future students.” The other recipients echoed his sentiments in their own reports.

Program Favorites

The conference featured a variety of presentations, short talks, panels, posters, and workshops on a range of diverse topics. Attendees had the challenging task of choosing between parallel tracks, each offering valuable insights and innovations. Though they found value in all the sessions they were able to attend, the bursary recipients made note of those they enjoyed the most.

Melo points to the very first session as a standout for him, saying it “sparked many ideas on how we can leverage AI to improve the efficiency and accuracy of our archiving efforts.” He went on to add that the second panel of the conference, Archiving Social Media in an Age of APIcalypse, was “particularly relevant as it underscored the importance of balancing technological advancements with ethical responsibilities.”

Panel 2: *Archiving Social Media In An Age of APIcalypse*
From the left: Anat Ben-David, Benjamin Ooghe-Tabanou,
Frédéric Clavert, Beatrice Cannelli, and Jerôme Thièvre
Photo credit: Olga Holownia / IIPC

Beatrice Cannelli, a PhD student at the University of London School of Advanced Study and a panelist on Archiving Social Media in an Age of APIcalypse, had a tough time deciding between the simultaneous sessions. “I seriously wished I had a time-turner at some points!” she exclaims, describing the program as “incredibly rich” and praising the attention to legality and ethics as well as “archiving diverse communities while ensuring inclusivity.” She also found the Unusual Content session to be particularly engaging, especially Christopher Rauch’s Saving Ads: Assessing and Improving Web Archives’ Holdings of Online Advertisements and Valérie Schafer’s Put it Back! Archived Memes in Context.

Fellow attendee Lizzy Zarate, a student of New York University’s Archives and Public History program, agrees with Melo on the quality of both the second panel, which she describes as “an examination of the legal, ethical, and technical issues relating to the regulation of API access by tech platforms that are not incentivized to act in the public interest,” as well as the Artificial Intelligence & Machine Learning session. “As somebody who primarily performs quality assurance checks on archived websites, I was interested in learning about attempts to automate this process and other facets of web archiving using machine learning and artificial intelligence,” she explains.

She points to Benjamin Lee’s work with the End of Term Archive as “an interesting exploration of how preserving materials like PDFs can be accomplished using machine learning. Projects such as these seem particularly important for government accountability as well as potential uses for curation.” She goes on to add that, aside from AI, Alex Dempsey’s lightning talk on the Internet Archive’s deduplication work was “an introduction to a topic that I had never encountered in my work, and I am excited to track how IA continues to address this issue in the future.”

With this year’s conference, I confirmed my affinity for web archives, both as the object of my studies and as a field I would like to work in my future archivist career. I met many engaging professionals, ready to have a small talk around a coffee.

– Alice Guérin

The opening keynote panel, Here Ya Free! Crossed Views on Skyblog, the French Pioneer of Digital Social Networks, was mentioned as a favorite by nearly all of the student bursary recipients.

“After almost 20 years of providing users with a personal digital space, enabling them to connect with other users sharing the same interests, the platform announced its closure in 2023,” explains Cannelli, whose doctoral research focuses on the strategies employed by archiving initiatives in the preservation of social media platforms. “The BnF and INA – as France’s electronic deposit institutions – coordinated an emergency capture to preserve billions of URLs.”

The panel featured Pierre Bellanger, founder and CEO of Skyrock Radio; freelance journalist and former Skyblog user Pauline Ferrari; and Web Archiving Technical Leads Jerôme Thièvre of INA and Sara Aubry of the BnF. It was moderated by Emmanuelle Bermès, Educational Manager of the Digital Technologies Applied to History master’s program at the National School of Charters.

Opening Keynote Panel *Here Ya Free! Crossed Views on Skyblog, the French Pioneer of Digital Social Networks*
From the left: Pierre Bellanger, Pauline Ferrari, Emmanuelle Bermès, Sara Aubry, and Jerôme Thièvre
Photo credit: Nola N’Diaye / BnF

“This mix of voices underscored the important role that such platforms play in our daily lives and the vital function performed by web archiving institutions in ensuring the long-term preservation of such content even beyond the platforms’ lifespan,” says Cannelli. Zarate, who regularly works with student blogs and engages with university students in her research, says the panel “helped illuminate the value and challenges of preserving materials created by young people on relatively unregulated platforms.”

Alice Guérin, who is pursuing a master’s degree in Digital Technologies Applied to History at the National School of Charters under Bermès, cited her current thesis on the history of the Skyblog platform as the reason the keynote panel drew her so strongly. She also notes that the entirety of the Digital Preservation session, as well as Niels Brügger’s presentation on web history, The Form Of Websites: Studying The Formal Development Of Websites, The Case Of Professional Danish Football Clubs, “offered very interesting perspectives for researchers.”

Personal Insights and Takeaways

Despite the packed program, the conference provided ample time to mingle, from casual chats during session breaks to purposefully engineered networking opportunities. Attendees appreciated the chance to engage with such a diverse cross-section of the web archiving community.

As a student bursary recipient, I found the conference to be an invaluable learning experience. The sessions were not only informative but also thought-provoking, encouraging us to think critically about the future of web archiving. I appreciated the opportunity to engage with experts in the field and to gain insights that will undoubtedly shape my future research and career.

– Jonas Melo

“Something that really surprised me was the wide variety of disciplines represented across the conference,” notes Zarate, who learned that her future master’s degree in Archives is an uncommon one in many other countries. At the conference, she says, she was able to meet “archivists, librarians, and computer programmers from around the world…Hearing about the sheer number of different ongoing projects expanded my view of what I had previously thought was possible within web archives.”

Guérin had the opportunity to participate in the Early Scholars Spring School on Web Archives organized by Emmanuelle Bermès (National School of Charters, PSL University of Paris) and Valérie Schafer (Luxembourg Centre for Contemporary and Digital History, C2DH; Internet and Society Center, CNRS). While this did mean she was unable to attend any of the pre-WAC workshops on April 24th, the Spring School gave her a chance to prepare for the intensity of the conference and to have familiar faces to look for at the conference proper. She agrees that the diversity of both the careers and experiences of the attendees was a key component in the enrichment of their discussions, adding that this year’s mentoring program provided her and her fellow participants with “valuable insight on their career prospects and research subjects.”

Networking break in the Grand Auditorium Foyer of the BnF’s François Mitterrand site.
Photo credit: Olga Holownia / IIPC

Conclusion

Overall, the bursary recipients found immeasurable value in the 2024 Web Archiving Conference, leaving with a wealth of gained knowledge, new connections, and a renewed sense of purpose in their web archiving careers.

“This conference was instrumental in shaping my understanding of web archival work, and I hope to use this knowledge as I prepare to begin my career as an archivist.”

– Lizzy Zarate

“Although some panels were too technical for my understanding, I can’t wait to have more experience in the field to understand its subtleties,” says Guérin, emphasizing that the conference experience confirmed her desire for a future career in web archiving. Melo agrees that the interactions with his fellow attendees “were instrumental in expanding my understanding of the global web archiving community,” and that he hopes that the connections he formed will lead to future collaborations.

“I left the conference with so much food for thought, and I am looking forward to the 2025 IIPC Web Archiving Conference in Oslo,” says Cannelli, before offering a “special thanks to the organizers for putting together such a fantastic event, and to the IIPC for their invaluable support through the student bursary.”

LINKS:

WAC 2024 Program
WAC 2024 Recordings
Reflections on the 2024 IIPC General Assembly and Web Archiving Conference by Friedel Geeraert
Web Archiving Conference 2024 Travel Report by Anastasia Nefeli Vidaki