History of Version Control Systems





In this article, we compare the best-known version control systems from a technical point of view, grouped by generation (we plan to expand the list in the future):



  1. First generation

  2. Second generation
  3. Third generation


First-generation version control systems (VCS) tracked changes to individual files, and editing was supported only locally and by one user at a time. These systems were built on the assumption that all users would log in to their own accounts on the same shared Unix host.



Second-generation VCS introduced network support, which led to centralized repositories holding the “official” versions of projects. This was significant progress, as several users could work with the code at the same time, committing to the same central repository. However, commits required access to the network.



The third generation consists of distributed VCS, where all copies of the repository are considered equal and there is no central repository. This paves the way for commits, branches, and merges that are created locally without network access and pushed to other repositories as needed.



VCS Release Timeline



For context, here is a chart with the dates these tools appeared:







SCCS (Source Code Control System): first generation



SCCS is considered one of the first successful version control systems. It was developed in 1972 by Marc Rochkind of Bell Labs. The system is written in C and designed to track versions of source files. In addition, it made it much easier to track down the source of bugs in a program. The basic architecture and syntax of SCCS make it possible to understand the roots of modern VCS tools.



Architecture



Like most modern systems, SCCS has a set of commands for working with file versions:



  1. Check in files so that SCCS tracks their history.
  2. Check out specific versions of files for review or compilation.
  3. Check out specific versions for editing.
  4. Check in new versions of files along with comments explaining the changes.
  5. Discard changes made to a checked-out file.
  6. Basic branching and merging of changes.
  7. View a file's change log.


When a file is added for tracking in SCCS, a special type of file is created, called an s-file or history file. It has the same name as the source file, only with the s. prefix, and is stored in the SCCS subdirectory. Thus, for the file test.txt, a history file s.test.txt will be created in the ./SCCS/ directory. At the time of creation, the history file contains the initial contents of the source file, as well as some metadata that helps track versions. Checksums are stored here to ensure that the contents have not been modified. Unlike in later-generation VCS, the contents of the history file are neither compressed nor encoded.



Since the contents of the source file are now stored in the history file, the file can be extracted to the working directory for viewing, compilation, or editing. Changes made to the extracted file (added, modified, or deleted lines) can then be checked in to the history file, which increments its version number.



Subsequent check-ins of a file store only the deltas, or changes, and not its entire contents. This reduces the size of the history file. Each delta is stored inside the history file in a structure called a delta table. As mentioned earlier, the actual contents of the file are more or less copied verbatim, with special escape sequences marking the beginning and end of sections of added and deleted content. Because SCCS history files do not use compression, they are usually larger than the actual file whose changes are tracked. SCCS uses a method called interleaved deltas, which guarantees a constant retrieval time regardless of the age of the retrieved version; that is, older versions are retrieved at the same speed as new ones.
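The interleaved-delta idea can be illustrated with a minimal Python sketch. This is not SCCS's actual implementation: it assumes a linear history without branches, the function name `extract_version` is hypothetical, and the control character that SCCS files display as ^A is written as "\x01". The `I`, `D`, and `E` markers mirror the insert, delete, and end markers visible in an SCCS history file.

```python
SOH = "\x01"  # the control character that SCCS files display as ^A


def extract_version(weave, target):
    """Return the lines of delta number `target` from an interleaved body.

    Assumption: the history is linear, so the deltas that make up `target`
    are simply all serial numbers <= target.
    """
    active = {}  # serial number -> "I" (insert block) or "D" (delete block)
    result = []
    for line in weave:
        if line.startswith(SOH):
            kind, serial = line[1], int(line[2:])
            if kind == "E":
                active.pop(serial, None)  # end of the block opened by this delta
            else:
                active[serial] = kind     # start of an insert or delete block
            continue
        # A text line is visible if every enclosing insert belongs to the
        # requested version and no enclosing delete does.
        visible = all(
            (serial <= target) if kind == "I" else (serial > target)
            for serial, kind in active.items()
        )
        if visible:
            result.append(line)
    return result


# A small body in the same spirit as an SCCS history file (illustrative data).
weave = [
    SOH + "I 1", "Hi there", SOH + "E 1",
    SOH + "I 2", SOH + "D 3", "This is a test of SCCS", SOH + "E 3", SOH + "E 2",
    SOH + "I 3", "A test of SCCS", SOH + "E 3",
]
for version in (1, 2, 3):
    print(version, extract_version(weave, version))
```

Every version is produced by one pass over the same body, which is why retrieval time does not depend on how old the requested version is.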



It is important to note that all files are tracked and checked in separately. It is not possible to check in changes to multiple files as a single atomic unit, like a commit in Git. Each tracked file has its own history file, in which its change history is stored. In general, this means that the version numbers of different files in a project usually do not match. However, the versions can be kept in sync by checking out every file in the project for editing at the same time (even without making any real changes) and checking them all back in together. This increments the version number of every file at once and keeps them consistent, but it is not the same as including multiple files in a single commit, as in Git: SCCS adds a separate entry to each file's history rather than recording one commit that contains all the changes.



When a file is extracted for editing in SCCS, a lock is placed on it, so that no one else can edit it. This prevents overwriting of changes by other users, but also limits development, because at any given time only one user can work with this file.



SCCS supports branches that store change sequences in a specific file. You can merge a branch with the original version or with another branch.



Basic commands



The following is a list of the most common SCCS commands.





For more information on SCCS internals, see Eric Allman's guide and Oracle's Programming Utility Guide.



SCCS History File Example



 ^Ah20562
 ^As 00001/00001/00002
 ^Ad D 1.3 19/11/26 14:37:08 jack 3 2
 ^Ac Here is a comment.
 ^Ae
 ^As 00002/00000/00001
 ^Ad D 1.2 19/11/26 14:36:00 jack 2 1
 ^Ac No.
 ^Ae
 ^As 00001/00000/00000
 ^Ad D 1.1 19/11/26 14:35:27 jack 1 0
 ^Ac date and time created 19/11/26 14:35:27 by jack
 ^Ae
 ^Au
 ^AU
 ^Af e 0
 ^At
 ^AT
 ^AI 1
 Hi there
 ^AE 1
 ^AI 2
 ^AD 3
 This is a test of SCCS
 ^AE 2
 ^AE 3
 ^AI 3
 A test of SCCS
 ^AE 3
      
      





RCS (Revision Control System): first generation



RCS was written in 1982 by Walter Tichy in C as an alternative to the SCCS system, which at that time was not open source.



Architecture



RCS has a lot in common with its predecessor, including:





When a file is first added to RCS, a corresponding history file is created for it in the local ./RCS/ directory. The extension ,v is added to this file; that is, a file called test.txt will be tracked by a file called test.txt,v.



RCS uses a reverse-delta scheme to store changes. When you add a file, a full snapshot of its contents is saved in the history file. When the file is modified and checked in again, a delta is calculated against the existing contents of the history file. The old snapshot is discarded, and the new one is saved along with the delta needed to return to the old state. This is called a reverse delta, because to retrieve an older version, RCS takes the latest version and applies deltas one after another until it reaches the desired version. This method makes retrieving the current version very fast, since a full snapshot of the current revision is always available. However, the older the version, the longer retrieval takes, because more and more deltas have to be applied.
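A rough Python sketch of reverse-delta retrieval follows. It is an illustration under stated assumptions, not the RCS source: the function names are hypothetical, the five-line head revision is made-up data, and the edit script uses the `dN M` (delete M lines starting at line N) and `aN M` (add M lines after line N) commands in the style of an RCS history file, with line numbers referring to the newer text.

```python
def apply_rcs_delta(lines, script):
    """Apply an RCS-style edit script to `lines` and return the result."""
    out = []
    pos = 0  # index of the next unread line in `lines`
    i = 0
    while i < len(script):
        command = script[i]
        op = command[0]
        start, count = (int(x) for x in command[1:].split())
        i += 1
        if op == "d":
            out.extend(lines[pos:start - 1])  # keep lines before the deletion
            pos = start - 1 + count           # skip the deleted lines
        elif op == "a":
            out.extend(lines[pos:start])      # keep lines up to the insert point
            pos = max(pos, start)
            out.extend(script[i:i + count])   # the next `count` lines are new text
            i += count
        else:
            raise ValueError(f"unknown edit command: {command!r}")
    out.extend(lines[pos:])
    return out


def checkout(head_lines, reverse_deltas, steps_back):
    """Start from the newest revision and walk back `steps_back` revisions."""
    lines = list(head_lines)
    for script in reverse_deltas[:steps_back]:
        lines = apply_rcs_delta(lines, script)
    return lines


# Hypothetical data: a five-line head revision and one reverse delta that
# turns it back into the one-line initial revision.
head = ["line one", "line two", "line three", "line four", "line five"]
reverse_deltas = [["d1 5", "a5 1", "hi there"]]

print(checkout(head, reverse_deltas, 0))  # newest revision, no deltas applied
print(checkout(head, reverse_deltas, 1))  # one step back in history
```

The loop in `checkout` shows the cost model described above: the newest revision needs no work, and each step further back applies one more delta.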



In SCCS, it's different: it takes the same time to retrieve any version. In addition, no checksum is stored in RCS history files, so file integrity cannot be ensured.



Basic commands



Below is a list of the most common RCS commands:





For more information on RCS internals, see the GNU RCS Manual.



Example RCS History File



 head 1.2;
 access;
 symbols;
 locks; strict;
 comment @# @;

 1.2
 date 2019.11.25.05.51.55; author jstopak; state Exp;
 branches;
 next 1.1;

 1.1
 date 2019.11.25.05.49.02; author jstopak; state Exp;
 branches;
 next ;

 desc
 @This is a test.
 @

 1.2
 log
 @Edited the file.
 @
 text
 @hi there, you are my bud. You are so cool! The end.
 @

 1.1
 log
 @Initial revision
 @
 text
 @d1 5
 a5 1
 hi there
 @
      
      





CVS (Concurrent Versions System): second generation



CVS was created by Dick Grune in 1986 to add network support to version control. It is also written in C and marks the birth of the second generation of VCS tools, which gave geographically dispersed development teams the ability to work on projects together.



Architecture



CVS is a front end for RCS: it has a new set of commands for interacting with files in a project, but under the hood it uses the same RCS history file format and RCS commands. For the first time, CVS allowed several developers to work on the same files simultaneously. This is implemented using a centralized repository model. The first step is to set up a centralized repository with CVS on a remote server. Projects can then be imported into the repository. When a project is imported into CVS, each file is converted to a ,v history file and stored in a central directory called a module. The repository is usually located on a remote server that is accessed over a local network or the Internet.



A developer checks out a copy of the module, which is copied to a working directory on their local computer. No files are locked in this process, so there is no limit on the number of developers who can work on the module at the same time. Developers can modify their files and commit changes as needed. Once one developer has committed a change, other developers need to update their working copies, using a (usually) automated merge process, before committing their own changes. Sometimes merge conflicts have to be resolved manually before committing. CVS also provides the ability to create and merge branches.
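The "update before you commit" rule can be sketched in a few lines of Python. This is a toy model under stated assumptions, not CVS code: the class `CentralRepo`, the exception `OutOfDateError`, and the file contents are all hypothetical, revisions are plain integers, and the merge step that a real update performs is left out.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on a revision that is no longer the latest."""


class CentralRepo:
    def __init__(self):
        # file name -> list of contents; revision n is stored at index n - 1
        self.files = {}

    def checkout(self, name):
        """Return (latest revision number, latest content) without locking."""
        history = self.files[name]
        return len(history), history[-1]

    def commit(self, name, base_rev, content):
        """Accept the commit only if it was made against the latest revision."""
        history = self.files.setdefault(name, [])
        if base_rev != len(history):
            raise OutOfDateError(
                f"{name}: working copy is at r{base_rev}, repository is at r{len(history)}"
            )
        history.append(content)
        return len(history)


repo = CentralRepo()
repo.commit("hi.txt", 0, "hi there")           # initial import, revision 1

alice_rev, _ = repo.checkout("hi.txt")         # both developers check out r1
bob_rev, _ = repo.checkout("hi.txt")

repo.commit("hi.txt", alice_rev, "hi, Alice")  # Alice commits first: r2
try:
    repo.commit("hi.txt", bob_rev, "hi, Bob")  # Bob is still based on r1
except OutOfDateError as error:
    print("rejected:", error)

bob_rev, latest = repo.checkout("hi.txt")      # Bob updates (merge omitted here)
repo.commit("hi.txt", bob_rev, latest + " and Bob")
```

Because nothing is locked, both developers can edit at once; the repository only checks at commit time that the change was made against the latest revision.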



Basic commands





For more information on CVS internals, see the GNU CVS Handbook and Dick Grune's article.



CVS History File Example



 head 1.1;
 branch 1.1.1;
 access ;
 symbols start:1.1.1.1 jack:1.1.1;
 locks ; strict;
 comment @# @;

 1.1
 date 2019.11.26.18.45.07; author jstopak; state Exp;
 branches 1.1.1.1;
 next ;
 commitid zsEBhVyPc4lonoMB;

 1.1.1.1
 date 2019.11.26.18.45.07; author jstopak; state Exp;
 branches ;
 next ;
 commitid zsEBhVyPc4lonoMB;

 desc
 @@

 1.1
 log
 @Initial revision
 @
 text
 @hi there
 @

 1.1.1.1
 log
 @Imported sources
 @
 text
 @@
      
      





SVN (Subversion): second generation



Subversion was created in 2000 by CollabNet, Inc. and is currently maintained by the Apache Software Foundation. It is written in C and was designed as a more reliable centralized solution than CVS.



Architecture



Like CVS, Subversion uses a centralized repository model. Remote users require a network connection to commit to the central repository.



Subversion introduced the functionality of atomic commits with the guarantee that the commit is either completely successful or completely canceled in the event of a problem. In CVS, if a malfunction occurs in the middle of a commit (for example, due to a network failure), the repository could remain in a damaged and inconsistent state. In addition, a commit or version in Subversion may include several files and directories. This is important because it allows you to track sets of related changes together as a grouped block, and not separately for each file, as in the systems of the past.



Subversion currently uses the FSFS (File System atop the File System) storage back end. It creates a database of files and directories that map onto the host file system. A distinctive feature of FSFS is that it is designed to track not only files and directories but also their versions; in effect, it is a file system with a time dimension. Directories are also first-class objects in Subversion. You can commit empty directories, whereas other systems (even Git) do not track them.



When you create a Subversion repository, an (almost) empty database of files and folders is created inside it. A db/revs directory is created, which stores all version tracking information for the committed files. Each commit (which may include changes to several files) is stored in a new file in the revs directory, named with a sequential numeric identifier starting at 1. The first commit of a file saves its full contents. Later commits of the same file save only the changes, also called diffs or deltas, to save space. In addition, deltas are compressed with the lz4 or zlib compression algorithm to reduce their size.



This scheme only works up to a point. Deltas save space, but if there are many of them, retrieval becomes slow, because every delta must be processed to recreate the current state of the file. For this reason, by default, Subversion saves up to 1023 deltas per file and then stores a new full copy of the file. This provides a good balance between storage and speed.
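The trade-off between delta chains and periodic full copies can be sketched in Python. This is a simplified model, not Subversion's storage format: `DeltaStore`, `make_delta`, and `apply_delta` are hypothetical names, deltas are computed with the standard `difflib` module rather than Subversion's own delta encoding, and the chain limit is set to 2 so the effect is visible with a handful of revisions (the text above cites 1023 as the default).

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as copy/insert instructions relative to `old`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no instruction: the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaStore:
    def __init__(self, max_chain):
        self.max_chain = max_chain  # deltas allowed before a new full copy
        self.entries = []           # ("full", lines) or ("delta", instructions)
        self._chain = 0
        self._last = None

    def commit(self, lines):
        lines = list(lines)
        if self._last is None or self._chain >= self.max_chain:
            self.entries.append(("full", lines))
            self._chain = 0
        else:
            self.entries.append(("delta", make_delta(self._last, lines)))
            self._chain += 1
        self._last = lines
        return len(self.entries)    # revision numbers start at 1

    def checkout(self, rev):
        index = rev - 1
        base = index
        while self.entries[base][0] != "full":
            base -= 1               # walk back to the nearest full copy
        lines = list(self.entries[base][1])
        for _, delta in self.entries[base + 1:index + 1]:
            lines = apply_delta(lines, delta)
        return lines


store = DeltaStore(max_chain=2)
versions = [["a"], ["a", "b"], ["a", "b", "c"], ["a", "c"], ["x", "c"]]
for version in versions:
    store.commit(version)
print([kind for kind, _ in store.entries])
```

With a limit of 2, no checkout ever has to apply more than two deltas, at the cost of storing an extra full copy every third revision.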



SVN does not use a conventional branching and tagging system. Instead, a typical Subversion repository layout contains three folders in the root: trunk/, branches/, and tags/.

The trunk/ directory holds the production version of the project. The branches/ directory stores subfolders corresponding to individual branches. The tags/ directory stores tags representing specific (usually significant) versions of the project.



Basic commands





For more information on SVN internals, see Subversion Versioning.



Sample SVN History File



 DELTA SVN^B^@^@ ^B ^A<89> hi there ENDREP id: 2-1.0.r1/4 type: file count: 0 text: 1 3 21 9 12f6bb1941df66b8f138a446d4e8670c 279d9035886d4c0427549863c4c2101e4a63e041 0-0/_4 cpath: /trunk/hi.txt copyroot: 0 / DELTA SVN^B^@^@$^B%^AΒ€$K 6 hi.txt V 15 file 2-1.0.r1/4 END ENDREP id: 0-1.0.r1/6 type: dir count: 0 text: 1 5 48 36 d84cb1c29105ee7739f3e834178e6345 - - cpath: /trunk copyroot: 0 / DELTA SVN^B^@^@'^B#^AΒ’'K 5 trunk V 14 dir 0-1.0.r1/6 END ENDREP id: 0.0.r1/2 type: dir pred: 0.0.r0/2 count: 1 text: 1 7 46 34 1d30e888ec9e633100992b752c2ff4c2 - - cpath: / copyroot: 0 / _0.0.t0-0 add-dir false false false /trunk _2.0.t0-0 add-file true false false /trunk/hi.txt L2P-INDEX ^A<80>@^A^A^A^M^H^@Γ€^HΓ·^AΓ©^FDÎ^BzΓ¨^AP2L-INDEX ^A<91>^E<80><80>@^A?^@'2^@<8d>»Ý<90>^CΒ§^A^X^@Γ΅ Β½Β½^N= ^@ΓΌ<8d>Ôã^Ft^V^@<92><9a><89>Γƒ^E; ^@<8a>Γ₯w|I^@<88><83>Î<93>^L`^M^@ΓΉ<92>Γ€^EΓ―ΓΊ?^[^@^@657 6aad60ec758d121d5181ea4b81a9f5f4 688 75f59082c8b5ab687ae87708432ca406I
      
      





Git: third generation



Git was developed in 2005 by Linus Torvalds (the creator of Linux). It is written primarily in C, along with some shell scripts. It differs from earlier VCS in features, flexibility, and speed. Torvalds originally wrote the system for the Linux code base, but its scope has expanded over time, and today it is the most popular version control system in the world.



Architecture



Git is a distributed system. There is no central repository: all copies are created equal, which is very different from second-generation VCS, where work revolves around checking files in to and out of the central repository. This means that developers can exchange changes directly with each other before merging them into an official branch.



In addition, developers can make their changes to the local copy of the repository without the knowledge of other repositories. This allows commits without being connected to a network or the Internet. Developers can work locally offline, until they are ready to share their work with others. At this point, the changes are sent to other repositories for verification, testing, or deployment.



When a file is added for tracking in Git, its contents (prefixed with a short header) are hashed using the SHA-1 hash function and compressed using the zlib compression algorithm. The hash uniquely identifies the contents of this file. Git stores the result in the object database, which is located in the hidden folder .git/objects. The file name is derived from the generated hash, and the file contains the compressed content. These files are called blobs and are created each time a new file (or a modified version of an existing file) is added to the repository.
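How a blob is named and stored can be sketched in Python with the standard library. The function name `git_blob` is hypothetical, and the sketch only computes values in memory; it does not write to a repository. It assumes the loose-object layout in which the first two hex digits of the hash form a subdirectory of .git/objects.

```python
import hashlib
import zlib


def git_blob(content: bytes):
    """Return (object id, loose-object path, compressed bytes) for a blob."""
    # The object is the header "blob <size>\0" followed by the raw content.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()  # the hash covers header + content
    path = f".git/objects/{object_id[:2]}/{object_id[2:]}"
    return object_id, path, zlib.compress(store)


object_id, path, compressed = git_blob(b"hi there\n")
print(object_id)
print(path)
print(len(compressed), "bytes after zlib compression")
```

Because the name depends only on the content, adding the same file twice produces the same blob, so identical content is stored once.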



Git implements a staging index, which acts as an intermediate area for changes that are being prepared for a commit. As new changes are staged, their compressed contents are referenced in a special index file, which is organized like a tree object. A tree is a Git object that associates blobs with their real file names, file permissions, and links to other trees, and thus represents the state of a specific set of files and directories. When all relevant changes are staged, the index tree can be committed to the repository, which creates a commit object in the Git object database. The commit refers to the root tree for a particular version, as well as to the commit's author, email address, date, and message. Each commit also stores a link to its parent commit(s), and so over time a history of the project's development is built up.
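The relationship between blobs, trees, and commits can be sketched in Python. The helper names (`git_object_id`, `tree_body`, `commit_body`) and the author string are hypothetical, and the sketch is simplified: it handles a flat list of files only (real Git also sorts directory entries with a trailing slash and supports nested trees) and uses the same identity for author and committer.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object as the header "<kind> <size>\\0" followed by its body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Build a tree body from (mode, file name, hex object id) tuples."""
    body = b""
    for mode, name, object_id in sorted(entries, key=lambda entry: entry[1]):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(object_id)
    return body


def commit_body(tree_id, parent_ids, author, message):
    """Build a commit body that points at a root tree and its parent commits."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]
    lines += [f"author {author}", f"committer {author}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


# Hypothetical identity; timestamp and time zone follow the commit format.
author = "Jane Doe <jane@example.com> 1574915303 -0800"

blob_id = git_object_id("blob", b"hi there\n")
tree_id = git_object_id("tree", tree_body([("100644", "hi.txt", blob_id)]))
first = git_object_id("commit", commit_body(tree_id, [], author, "Initial commit"))
second = git_object_id("commit", commit_body(tree_id, [first], author, "Second commit"))
print(blob_id, tree_id, first, second, sep="\n")
```

Each commit hash covers the tree hash and the parent hashes, so changing any file or any earlier commit changes every hash that follows it.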



As already mentioned, all Git objects (blobs, trees, and commits) are compressed, hashed, and stored in the object database under their hashes. Objects stored this way are called loose objects. No diffs are used here to save space, which makes Git very fast, since the full contents of each version of a file are available as a loose object. However, some operations, such as pushing commits to a remote repository, accumulating a very large number of objects, or manually running Git's garbage collection command, cause the objects to be repacked into packfiles. During packing, reverse diffs are calculated and compressed to eliminate redundancy and reduce size. This produces .pack files containing the object contents, and for each of them an .idx (index) file that references the packed objects and their locations in the packfile.



When branches are pushed to or fetched from remote repositories, these packfiles are transferred over the network. When you pull or fetch branches, the packfiles are unpacked to create loose objects in the object database.



Basic commands





If you want to know how the Git code works, check out the Git Starter Guide. For more information on internals, see the corresponding chapter in the Pro Git book.



Example Git blob, tree, and commit



Blob with hash 37d4e6c5c48ba0d245164c4e10d5f41140cab980:



 hi there
      
      





Tree object with hash b769f35b07fbe0076dcfc36fd80c121d747ccc04:



 100644 blob 37d4e6c5c48ba0d245164c4e10d5f41140cab980 hi.txt
      
      





Commit object with hash dc512627287a61f6111705151f4e53f204fbda9b:



 tree b769f35b07fbe0076dcfc36fd80c121d747ccc04
 author Jacob Stopak 1574915303 -0800
 committer Jacob Stopak 1574915303 -0800

 Initial commit
      
      





Mercurial: third generation



Mercurial was created in 2005 by Matt Mackall and is written in Python. It was also designed to host the Linux code base, but Git was eventually chosen for that task. It is the second most popular version control system, although it is used far less often.



Architecture



Mercurial is also a distributed system that allows any number of developers to work with their own copy of the project independently of others. Mercurial uses many of the same techniques as Git, including compression and SHA-1 hashing, but applies them differently.



When a new file is committed for tracking in Mercurial, a corresponding revlog file is created for it in the hidden directory .hg/store/data/. You can think of a revlog (revision log) file as an improved version of the history files in older VCS like CVS, RCS, and SCCS. Unlike Git, which creates a new blob for each version of each staged file, Mercurial simply creates a new revlog entry for the file. To save space, each new entry contains only the delta from the previous version. When a threshold number of deltas is reached, a full snapshot of the file is saved again. This reduces the lookup time when a large number of deltas must be processed to restore a specific version of a file.



Each revlog is named after the file it tracks, with the extension .i or .d. The .d files contain the compressed deltas. The .i files are used as indexes for quickly locating the different versions inside the .d files. For small files with few changes, both the index and the content are stored in the .i file. Revlog entries are compressed for performance and hashed for identification. These hash values are called nodeids.
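A hedged Python sketch of how such an identifier might be computed follows. It rests on an assumption that is not stated in this article: that a nodeid is the SHA-1 of the two parent nodeids (in sorted order) followed by the revision text, with a missing parent represented by 20 zero bytes. The function name `revlog_nodeid` and the sample texts are hypothetical.

```python
import hashlib

NULL_ID = b"\0" * 20  # assumed placeholder for "no parent"


def revlog_nodeid(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> str:
    """Hash a revision together with its parents (assumed scheme, see above)."""
    low, high = sorted((p1, p2))  # parent order must not affect the result
    digest = hashlib.sha1(low)
    digest.update(high)
    digest.update(text)
    return digest.hexdigest()


first = revlog_nodeid(b"hi there\n")
second = revlog_nodeid(b"hi there, again\n", p1=bytes.fromhex(first))
print(first)
print(second)
```

Under this assumption, the same file content committed on top of different parents gets different nodeids, which ties each entry to its place in the history and not only to its content.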



With each commit, Mercurial keeps track of all the file versions in that commit in what is called a manifest. The manifest is also a revlog file: it stores entries corresponding to particular states of the repository. Instead of file contents, as in a file revlog, the manifest stores a list of file names and nodeids that determine which file versions exist in a given version of the project. These manifest entries are also compressed and hashed, and their hash values are also called nodeids.



Finally, Mercurial uses another type of revlog called changelog, that is, a change log. This is a list of entries that associate each commit with the following information:





Each changelog entry also generates a hash known as its nodeid.



Basic commands





Example Mercurial Files



Manifest:



 hey.txt 208b6e0998e8099b16ad0e43f036ec745d58ec04
 hi.txt 74568dc1a5b9047c8041edd99dd6f566e78d3a42
      
      





Change log (changelog):



 b8ee947ce6f25b84c22fbefecab99ea918fc0969
 Jacob Stopak
 1575082451 28800
 hey.txt

 Add hey.txt
      
      





Additional information about Mercurial internals:





