After several years of experimentation, the legal deposit of the Internet at the National Library of France (BnF) now rests on a legal framework. To cope with the sheer mass of digital collections, websites are collected by automatic or semi-automatic means. The resulting changes concern selection (targeted collections now complement robot crawls) and promotion of the archive: do they signal profound changes for the profession? What room for manoeuvre and what added value can the librarian keep?

After several years of experimentation and testing of different approaches, the National Library of France has just created a legal framework for the mandatory deposit of websites. The huge increase in digital-only collections means that websites are now harvested automatically or semi-automatically. Do developments such as the use of targeted collection strategies in addition to automatic crawlers as a means of improving site selection reflect profound changes in library science as a whole? What freedom do librarians have to intervene in the process and what added value will they be able to bring?

Managing collection operations, and preparing for their imminent expansion to larger volumes, required hiring seven staff members. It took several years to put together this team, which was trained largely on the job and took advantage of the cooperation opportunities offered by the International Internet Preservation Consortium (IIPC). In fact, monitoring collection robots is not a purely technical process. The quality of an archival collection is determined when the sites to be collected are registered and their collection is scheduled, monitored and evaluated. Engineers and technicians involved in this work must be fully aware of the content they produce, the documentary issues and the risks associated with cultural heritage.
The librarians who work with them, in turn, cannot ignore how the servers operate or the IT costs that weigh on production, at the risk of upsetting the overall economics and coherence of the work.

Consult the online report of phase 2 of the project: Valérie Beaudouin, Zeynep Pehlivan, “Cartography of the Great War on the Web” (91 pages).

The archives that this course aims to present, which invite the Internet user to browse a small part of the history of the Web in France in the 1990s, share several particularities.

Consult the online report of phase 3 of the project: Lionel Maurel, Josselin Morvan, “The circulation of a digitized corpus on the web: the example of Valois albums” (75 pages).

The aim is to stimulate the web in France; it will soon also be to stimulate a French-language web. A number of web players, such as the internet club, also communicate on this subject. In 2003 and 2004, an extensive awareness-raising campaign on this imminent threat made it possible to find additional resources to implement the entire plan at a pace suited to the estimated deterioration of the media.

Once a year, the BnF carries out a broad collection, which aims to archive the largest possible number of websites on the French Internet, though not necessarily in their entirety. Some sites are selected by librarians according to documentary criteria. They are the subject of targeted collections that vary in frequency (from once a year to several times a day) and in depth (all or part of the site). A website that has been the subject of a collection request will be archived within a maximum of three months.

Under this system, the conditions do not need to be verified before collection (which would undermine the economics of the process), but can be checked retrospectively in the event of a dispute or claim. This definition of scope is a pragmatic compromise. It leaves it to the BnF and the publishers to examine, on the basis of objective criteria, whether legal deposit applies to a given publication. In any case, it allows collection beyond the .fr domain, which is known to cover only a limited part of the content likely to be of interest to the national heritage [4]. The digital heritage that the BnF has built since 2004 already includes 130 terabytes of data, or roughly 130 trillion bytes, and 12 billion files: it is one of the largest collections of web archives in the world, after those of the Internet Archive and the Library of Alexandria [5].
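As a quick sanity check on the size of the collection, the minimal sketch below converts 130 terabytes to bytes and derives the average size per archived file. It assumes decimal units (1 terabyte = 10^12 bytes); the constant and function names are illustrative and are not BnF terminology.

```python
TERABYTE = 10**12  # assumption: decimal (SI) terabyte, not 2**40


def average_file_size(total_bytes: int, file_count: int) -> float:
    """Return the mean size of one file, in bytes."""
    if file_count <= 0:
        raise ValueError("file_count must be positive")
    return total_bytes / file_count


total_bytes = 130 * TERABYTE   # 130 terabytes of archived data
file_count = 12 * 10**9        # 12 billion files
avg = average_file_size(total_bytes, file_count)

print(f"{total_bytes:,} bytes across {file_count:,} files")
print(f"average: {avg / 1000:.1f} kB per file")
```

Under these assumptions the average archived file is small, on the order of ten kilobytes, which is consistent with a collection made up mostly of individual web resources rather than large media files.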

Inside the collection server. © David Paul Carr/BnF

This funding made it possible to make extensive use of subcontracting while installing, at INA, backup and digitization chains dedicated to delivering archives to our customers. The progressive and systematic digitization of the collections has also made it possible to rediscover forgotten broadcasts and to open up direct access to image and sound: first for professionals, with the INA Médiapro site, from 2004, then for the general public, with ina.fr, in 2006. In 2020, INA launched its new content platforms: mediaclip for professionals and madelen for the general public. In the meantime, new formats and collections have been incorporated into the original NHP, such as the ORP funds. In 2020, these digitization measures made it possible to save a very large number of new programs from the France 3, France Bleu and Outre-mer 1ère networks. They also fed the Institute's various channels for promoting the collections, both on the shelf and in its digital offers.

The BnF also produces thematic collections of the Web.

Captures are not regular, and some French sites have escaped them altogether. Mainly because of the technical limitations of the collection tools, web archives are a world of abundant but also incomplete sources. Archived sites are rarely captured in great depth, and the researcher comes across unarchived pages, missing images and broken links. Oddities sometimes appear on the pages themselves (linked content captured several months apart, chronological jumps or anomalies, etc.). Looking inside the black box of the web archive partly explains these reconstructions; opening it also gives a better understanding of what born-digital heritage is and provides the tools to use it.
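The chronological anomalies described above can be made concrete with a small sketch. It assumes each archived resource carries a 14-digit capture timestamp (YYYYMMDDhhmmss), a convention common in web archive tooling; the sample records and the function name `capture_drift_days` are hypothetical and are not taken from BnF data.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # 14-digit capture timestamp


def capture_drift_days(page_ts: str, resource_ts: str) -> float:
    """Days between a page capture and the capture of an embedded resource."""
    page = datetime.strptime(page_ts, TIMESTAMP_FORMAT)
    resource = datetime.strptime(resource_ts, TIMESTAMP_FORMAT)
    return abs((resource - page).total_seconds()) / 86400


# Hypothetical replayed page: the HTML comes from one crawl,
# its embedded resources from other crawls (or from none at all).
page_capture = "20100315120000"
embedded = {
    "logo.png": "20100315120500",
    "style.css": "20100920080000",
    "banner.jpg": None,  # never archived: appears as a missing image
}

for name, ts in embedded.items():
    if ts is None:
        print(f"{name}: not archived")
    else:
        drift = capture_drift_days(page_capture, ts)
        print(f"{name}: {drift:.1f} days from page capture")
```

A large drift means the replayed page combines content that never coexisted on the live web, which is one reason an archived page should be read as a reconstruction rather than a snapshot.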
