GitHub has created a thousand-year vault in which it will preserve open-source repositories for posterity



Former coal mine hosting the Arctic World Archive. Photo: Guy Martin / Bloomberg Businessweek



Open-source software is the cornerstone of modern civilization and the shared heritage of all humanity. The mission of the GitHub Archive Program is to preserve this code for future generations so that the fate of the Library of Alexandria is never repeated.



To achieve this, GitHub will maintain multiple backups on different media, including long-term storage in the Arctic Code Vault on Svalbard. The vault is located in a former coal mine, 250 meters deep in the permafrost, and is designed to last at least 1,000 years.



A snapshot of humanity's code will be taken on February 2, 2020.



The long-term data storage project was launched jointly with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, the Arctic World Archive and other partners.



LOCKSS Project



Code that is vital today may be forgotten or lost over time. The worst case would be a global catastrophe in which we lose all the information stored on “ephemeral” media: HDDs, SSDs, CDs and DVDs, which are designed to last a few decades, and magnetic tape, whose nominal 30-year lifespan requires strict control of temperature and humidity.



The solution is to duplicate backups, that is, to have software archived by several organizations and in different forms. This approach, known as LOCKSS (“Lots Of Copies Keep Stuff Safe”), has been pursued for almost 20 years. In May 2019, LOCKSS 2.0-alpha was released: the first prototype of software for long-term distributed data storage that supports many participants and external storage.
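The core idea of keeping many copies is that damage to one copy can be detected and repaired by comparing it with the others. A minimal sketch of that idea, assuming each replica's content is hashed and the digest shared by a strict majority is treated as intact. This is a simplified illustration, not the actual LOCKSS polling protocol; the function names `audit_replicas` and `repair` and the replica names are hypothetical.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Hash every replica and report those that disagree with the majority.

    `replicas` maps a (hypothetical) replica name to its stored bytes.
    Returns (majority_digest, sorted names of damaged replicas).
    """
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    # The digest held by the most replicas is assumed to be the intact version.
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no strict majority; cannot tell which copy is intact")
    damaged = sorted(name for name, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


def repair(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Replace every damaged replica with a copy of an intact one."""
    majority_digest, damaged = audit_replicas(replicas)
    good = next(data for data in replicas.values()
                if hashlib.sha256(data).hexdigest() == majority_digest)
    return {name: (good if name in damaged else data) for name, data in replicas.items()}


if __name__ == "__main__":
    copies = {
        "library-a": b"print('hello')\n",
        "library-b": b"print('hello')\n",
        "library-c": b"print('he1lo')\n",  # simulated bit rot
    }
    print(audit_replicas(copies)[1])  # ['library-c']
```

Requiring a strict majority means that with only two copies a single disagreement cannot be resolved, which is one reason such schemes favour many independent copies.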



The developers start from the premise that hardware can be far more durable than ephemeral media; hence, “there is a range of possible futures in which working modern computers exist, but their software has largely been lost.”



GitHub points to many lost technologies that could have been useful: Roman concrete (its recipe was rediscovered only in 2014), the antimalarial insecticide DFDT, and lost drawings of the Saturn V rocket. It's easy to imagine a future in which today's software is seen as a quaint, long-forgotten curiosity until an unexpected need for it arises. “Like any backup, the GitHub Archive Program is also designed for the unforeseen future,” says the GitHub Archive Program website.



GitHub Archive



The GitHub Archive Program provides three tiers of backups: hot, warm and cold.





Whenever GitHub users do anything, all Git data is replicated to several data centers around the world. Git data, issues, pull requests and all other user data on GitHub are stored in several locations. This information is available in real time through the GitHub API.
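As an illustration of reading that data in real time, here is a minimal sketch using only the Python standard library. It assumes the public GitHub REST API endpoint `GET /repos/{owner}/{repo}/events`; the helper names are our own, and authentication, pagination and error handling are left out.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def repo_events_request(owner: str, repo: str, per_page: int = 30) -> urllib.request.Request:
    """Build a request for a repository's recent public events."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/events?per_page={per_page}"
    return urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})


def fetch_event_types(owner: str, repo: str) -> list[str]:
    """Fetch recent events and return their types, e.g. 'PushEvent'.

    Needs network access; unauthenticated calls are rate-limited.
    """
    with urllib.request.urlopen(repo_events_request(owner, repo)) as resp:
        events = json.load(resp)
    return [event["type"] for event in events]
```

For example, `fetch_event_types("octocat", "Hello-World")` lists the kinds of activity recently recorded for GitHub's sample repository.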



In addition, the GHTorrent project crawls GitHub recursively and publishes archives on a daily or monthly basis. GH Archive records GitHub's public activity, and its snapshots can be queried with BigQuery. Further copies of the code are kept by the Internet Archive, known for its Wayback Machine, which stores data in several locations. Finally, the Software Heritage Foundation will regularly crawl GitHub and add its public repositories to its archive, which offers a public API.
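Besides BigQuery, GH Archive publishes its raw event stream as hourly gzip-compressed files with one JSON event per line. A minimal sketch, assuming the `https://data.gharchive.org/YYYY-MM-DD-H.json.gz` naming scheme (hour not zero-padded) described on the GH Archive site; the function names are hypothetical and downloading is left to the reader.

```python
import gzip
import json
from collections import Counter
from datetime import datetime, timedelta

GH_ARCHIVE_ROOT = "https://data.gharchive.org"


def gharchive_urls(start: datetime, end: datetime) -> list[str]:
    """URLs of the hourly dumps from `start` (truncated to the hour, UTC)
    up to but not including `end`."""
    current = start.replace(minute=0, second=0, microsecond=0)
    urls = []
    while current < end:
        # File names look like 2020-02-02-0.json.gz: the hour is not zero-padded.
        urls.append(f"{GH_ARCHIVE_ROOT}/{current:%Y-%m-%d}-{current.hour}.json.gz")
        current += timedelta(hours=1)
    return urls


def count_event_types(path: str) -> Counter:
    """Count event types in one downloaded hourly dump (one JSON object per line)."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                counts[json.loads(line)["type"]] += 1
    return counts
```

For example, `gharchive_urls(datetime(2020, 2, 2), datetime(2020, 2, 3))` yields the 24 hourly files covering the snapshot day.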



Arctic GitHub Storage



On February 2, 2020, GitHub will take a snapshot of all active public repositories and place it in the GitHub Arctic Code Vault.



Data will be stored on 3,500-foot film reels supplied by the Norwegian company Piql, which specializes in long-term data storage. According to ISO standards, this silver-halide-on-polyester film has a lifespan of 500 years. Simulated aging tests have shown that Piql film retains data at least twice as long.
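For readers more used to metric units, the figures in the paragraph above work out as follows. The only added input is the exact definition of the international foot (0.3048 m); the lifespan figure is simply the ISO rating multiplied by the “at least twice” factor from Piql's aging tests.

```python
FEET_TO_METRES = 0.3048  # exact, by definition of the international foot

reel_length_ft = 3_500
iso_lifespan_years = 500
piql_factor = 2  # "at least twice as long", per Piql's simulated aging tests

reel_length_m = reel_length_ft * FEET_TO_METRES
min_piql_lifespan_years = iso_lifespan_years * piql_factor

print(f"{reel_length_m:.1f} m per reel, at least {min_piql_lifespan_years} years")
# prints "1066.8 m per reel, at least 1000 years"
```

The doubled lifespan matches the vault's stated design target of at least 1,000 years.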



In addition, the GitHub Archive Program is working with researchers on Microsoft's Project Silica to record all public repositories on quartz glass plates using a femtosecond laser. This medium should keep data safe for more than 10,000 years.



The GitHub Arctic Code Vault is housed in the Arctic World Archive (AWA), 250 meters deep in the permafrost. The archive is located in a former coal mine on Spitsbergen, in the Svalbard archipelago, not far from the North Pole. Global warming affects only the top few meters of permafrost and poses no threat to the mine for the foreseeable future (several thousand years).



Svalbard is a demilitarized zone under an international treaty. According to GitHub, it is one of the most remote and geopolitically stable human settlements on Earth. Nearby is the famous Svalbard Global Seed Vault, mankind's main hope in the event of an apocalypse.





Svalbard Global Seed Vault



AWA is a joint initiative of the Norwegian state-owned mining company Store Norske Spitsbergen Kulkompani (SNSK) and the digital preservation provider Piql AS. Historical and cultural data from Italy, Brazil, Norway, the Vatican and other countries is already stored there.





Photo: Guy Martin / Bloomberg Businessweek



GitHub's reels will be stored in a steel-walled container inside a sealed chamber. The February 2, 2020 snapshot will include all active GitHub repositories and a significant share of inactive ones (selected by stars, dependencies and similar signals), along with all binary files up to 100 KB. Each repository will be stored as a separate tar file. Everything should fit on 200 reels of 120 GB each.
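The packaging rules above (one tar file per repository, binaries kept only up to 100 KB, a fixed reel budget) can be sketched as follows. This is a hypothetical illustration, not GitHub's actual pipeline: it assumes “100 KB” means 100 KiB, “120 GB” means decimal gigabytes, and it detects binaries with a simple heuristic (a NUL byte in the first 8,000 bytes, similar to the check Git uses). All function names are our own.

```python
import tarfile
import tempfile
from pathlib import Path

# Assumptions, not stated in the source: 100 KB = 100 KiB, 120 GB = 120 * 10**9 bytes.
MAX_BINARY_BYTES = 100 * 1024
REELS = 200
REEL_CAPACITY_BYTES = 120 * 10**9


def is_binary(path: Path, sniff_bytes: int = 8000) -> bool:
    """Heuristic: treat a file as binary if a NUL byte appears near its start."""
    with path.open("rb") as fh:
        return b"\0" in fh.read(sniff_bytes)


def keep_file(path: Path) -> bool:
    """Keep every text file; keep binaries only up to MAX_BINARY_BYTES."""
    return not is_binary(path) or path.stat().st_size <= MAX_BINARY_BYTES


def pack_repository(repo_dir: Path, out_dir: Path) -> Path:
    """Write one uncompressed tar file per repository, skipping large binaries."""
    out_path = out_dir / f"{repo_dir.name}.tar"
    with tarfile.open(out_path, "w") as tar:
        for path in sorted(repo_dir.rglob("*")):
            if path.is_file() and keep_file(path):
                # Store paths as "<repo>/<file>" so each archive is self-describing.
                tar.add(path, arcname=str(path.relative_to(repo_dir.parent)))
    return out_path


def fits_on_reels(total_bytes: int) -> bool:
    """Check a total snapshot size against 200 reels x 120 GB = 24 TB."""
    return total_bytes <= REELS * REEL_CAPACITY_BYTES


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "demo"
        repo.mkdir()
        (repo / "README.md").write_text("hello\n")
        (repo / "big.bin").write_bytes(b"\0" * (200 * 1024))  # over the limit, skipped
        with tarfile.open(pack_repository(repo, Path(tmp))) as tar:
            print(tar.getnames())  # ['demo/README.md']
```

Keeping each repository in its own uncompressed tar file means a future reader can recover any single project without decoding the rest of the archive.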



Alongside the archive, GitHub will deposit a human-readable index and technical guides to QR decoding, file formats, character encodings and other critical metadata, so that future generations can convert the data back into source code.
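The article does not describe the format of the human-readable index. As a hypothetical sketch of what such a catalog could contain, the code below writes one plain-text line per tar file with its name, size in bytes and SHA-256 digest, so a future reader can check that each recovered file is intact. The tab-separated layout and the function names are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_catalog(archive_dir: Path, catalog_path: Path) -> list[str]:
    """Write a tab-separated catalog: name, size in bytes, SHA-256 per tar file."""
    lines = ["# name\tsize_bytes\tsha256"]
    for tar_path in sorted(archive_dir.glob("*.tar")):
        lines.append(f"{tar_path.name}\t{tar_path.stat().st_size}\t{sha256_of(tar_path)}")
    catalog_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "demo.tar").write_bytes(b"placeholder archive bytes")
        for line in write_catalog(root, root / "CATALOG.txt"):
            print(line)
```

A plain-text, line-oriented format keeps the catalog readable even if every other file format in the archive has been forgotten.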



The archive will also include the Tech Tree, a general guide to technology, in case future readers no longer have working computers and need to rebuild technology from scratch.


