In this article, we compare the best-known version control systems from a technical point of view (we plan to expand the list in the future). They fall into three generations:
- First generation
- Second generation
- Third generation
First-generation version control systems (VCS) tracked changes to individual files, and editing was supported only locally and by one user at a time. These systems were built on the assumption that all users would log in to their own accounts on the same shared Unix host.
Second-generation VCS introduced network support, which led to centralized repositories holding the "official" versions of projects. This was significant progress, since several users could work on the code at the same time, committing to the same central repository. However, committing required network access.
The third generation consists of distributed VCS, in which all copies of the repository are considered equal and there is no central repository. This paves the way for commits, branches, and merges that are created locally without network access and pushed to other repositories as needed.
VCS Release Timeline
For context, these are the years in which the tools first appeared: SCCS (1972), RCS (1982), CVS (1986), Subversion (2000), Git (2005), and Mercurial (2005).
SCCS (Source Code Control System): first generation
SCCS is considered one of the first successful version control systems. It was developed in 1972 by Marc Rochkind at Bell Labs. The system is written in C and was designed to track versions of source files. It also made it much easier to track down the sources of bugs in a program. The basic architecture and syntax of SCCS make it possible to understand the roots of modern VCS tools.
Architecture
Like most modern systems, SCCS has a set of commands for working with file versions:
- Check in files for history tracking in SCCS.
- Check out specific versions of files for review or compilation.
- Check out specific versions for editing.
- Check in new versions of files along with comments explaining the changes.
- Discard changes made to a checked-out file.
- Perform basic branching and merging of changes.
- View a file's change log.
When a file is added to SCCS for tracking, a special type of file is created, called an s-file or history file. It has the same name as the source file, only with the s. prefix, and is stored in the SCCS subdirectory. Thus, for the test.txt file, a history file s.test.txt is created in the ./SCCS/ directory. At the time of creation, the history file contains the initial contents of the source file, as well as some metadata that helps track versions. Checksums are stored here to verify that the contents have not been modified. The contents of the history file are not compressed or encoded (unlike in later generations of VCS).
Since the contents of the source file are now stored in the history file, the file can be checked out into the working directory for viewing, compilation, or editing. Changes such as added, modified, and deleted lines can then be checked in to the history file, which increments its version number.
Subsequent check-ins store only the deltas, or changes, rather than the file's entire contents. This reduces the size of the history file. Each delta is recorded inside the history file in a structure called a delta table. As mentioned earlier, the actual contents of the file are stored more or less verbatim, with special control sequences marking the beginning and end of sections of added and deleted content. Because SCCS history files do not use compression, they are usually larger than the file being tracked. SCCS uses a method called interleaved deltas, which guarantees a constant retrieval time regardless of the age of the retrieved version; that is, older versions are retrieved at the same speed as new ones.
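To make the constant retrieval time concrete, here is a minimal Python sketch of reading one version out of an interleaved-delta body. It assumes a simplified model: control lines start with the SOH byte (shown as ^A in listings) followed by I (insert), D (delete), or E (end) and a delta serial number, and a text line is visible when every enclosing insert belongs to the requested version and no enclosing delete does. The names get_version and WEAVE are illustrative and are not part of SCCS.

```python
SOH = "\x01"  # control lines begin with this byte, displayed as ^A

# A small weave in the simplified format assumed above.
WEAVE = [
    SOH + "I 1",
    "Hi there",
    SOH + "E 1",
    SOH + "I 2",
    SOH + "D 3",
    "This is a test of SCCS",
    SOH + "E 2",
    SOH + "E 3",
    SOH + "I 3",
    "A test of SCCS",
    SOH + "E 3",
]


def get_version(weave_lines, applied):
    """Return the text lines visible when the deltas in `applied` are in effect."""
    open_inserts, open_deletes = set(), set()
    visible = []
    for line in weave_lines:
        if line.startswith(SOH):
            op = line[1]
            serial = int(line[2:].split()[0])
            if op == "I":
                open_inserts.add(serial)
            elif op == "D":
                open_deletes.add(serial)
            elif op == "E":
                open_inserts.discard(serial)
                open_deletes.discard(serial)
            continue
        # Keep the line if all enclosing inserts apply and no enclosing delete applies.
        if open_inserts <= applied and not (open_deletes & applied):
            visible.append(line)
    return visible


if __name__ == "__main__":
    for applied in ({1}, {1, 2}, {1, 2, 3}):
        print(sorted(applied), get_version(WEAVE, applied))
```

Whichever version is requested, the function makes exactly one pass over the body, which is why an old version costs no more to retrieve than the newest one.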
It is important to note that all files are tracked and recorded separately. It is not possible to check in changes to multiple files as a single atomic unit, like a commit in Git. Each tracked file has its own history file, in which its change history is stored. In general, this means that the version numbers of the various files in a project do not match. However, the versions can be brought into step by checking out all the files in the project for editing at once (without necessarily making any real changes to them) and checking them all in at the same time. This increments the version number of every file together and keeps them consistent, but note that this is not the same as including multiple files in a single commit, as in Git. SCCS adds to the history of each file individually, rather than recording one large commit that includes all the changes at once.
When a file is extracted for editing in SCCS, a lock is placed on it, so that no one else can edit it. This prevents overwriting of changes by other users, but also limits development, because at any given time only one user can work with this file.
SCCS supports branches that store change sequences in a specific file. You can merge a branch with the original version or with another branch.
Basic commands
The following is a list of the most common SCCS commands.
- sccs create <filename.ext>: add a new file to SCCS and create a new history file for it (by default in the ./SCCS/ directory).
- sccs get <filename.ext>: check out the file from its history file and place it in the working directory in read-only mode.
- sccs edit <filename.ext>: check out the file from its history file for editing. This locks the history file so that other users cannot modify it.
- sccs delta <filename.ext>: check in the changes to the specified file. The system prompts for a comment, saves the changes to the history file, and releases the lock.
- sccs prt <filename.ext>: display the change log for the tracked file.
- sccs diffs <filename.ext>: show the differences between the current working copy of the file and the state of the file when it was checked out.
For more information on SCCS internals, see Eric Allman's Guide and Oracle's Programming Utility Guide.
SCCS History File Example
^Ah20562
^As 00001/00001/00002
^Ad D 1.3 19/11/26 14:37:08 jack 3 2
^Ac Here is a comment.
^Ae
^As 00002/00000/00001
^Ad D 1.2 19/11/26 14:36:00 jack 2 1
^Ac No.
^Ae
^As 00001/00000/00000
^Ad D 1.1 19/11/26 14:35:27 jack 1 0
^Ac date and time created 19/11/26 14:35:27 by jack
^Ae
^Au
^AU
^Af e 0
^At
^AT
^AI 1
Hi there
^AE 1
^AI 2
^AD 3
This is a test of SCCS
^AE 2
^AE 3
^AI 3
A test of SCCS
^AE 3
RCS (Revision Control System): first generation
RCS was written in 1982 by Walter Tichy in C as an alternative to the SCCS system, which at that time was not open source.
Architecture
RCS has a lot in common with its predecessor, including:
- Versioning separately for each file.
- Changes to multiple files cannot be grouped into a single commit.
- Tracked files cannot be edited by multiple users at the same time.
- No network support.
- Versions of each tracked file are stored in the corresponding history file.
- Branching and merging versions for individual files only.
When a file is first added to RCS, a corresponding history file is created for it in the local ./RCS/ directory. The suffix ,v is appended to the file name; that is, a file called test.txt is tracked by a history file called test.txt,v.
RCS uses a reverse-delta scheme to store changes. When a file is first checked in, a full snapshot of its contents is saved in the history file. When the file is modified and checked in again, a delta is computed against the existing contents of the history file. The old snapshot is discarded, and the new one is saved together with the delta needed to get back to the old state. This is called a reverse delta, because to retrieve an older version, RCS takes the latest version and applies deltas one after another until it reaches the desired version. This method makes retrieving the current version very fast, since a full snapshot of the latest revision is always available. However, the older the version, the longer the checkout takes, because more and more deltas must be applied.
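The following Python sketch illustrates the reverse-delta idea under simplifying assumptions. The edit scripts use the two commands that appear in RCS history files ("dN C" deletes C lines starting at line N, "aN C" adds the next C lines after line N, with line numbers referring to the newer text). The sample revisions and the names apply_edit_script and checkout are made up for illustration and are not RCS internals.

```python
def apply_edit_script(lines, script):
    """Apply an RCS-style edit script to a list of lines and return the result."""
    out, pos, i = [], 0, 0
    while i < len(script):
        command = script[i]
        op = command[0]
        start, count = (int(n) for n in command[1:].split())
        i += 1
        if op == "d":
            out.extend(lines[pos:start - 1])  # keep everything before the deleted run
            pos = start - 1 + count           # then skip the deleted lines
        elif op == "a":
            out.extend(lines[pos:start])      # keep everything up to line N
            pos = max(pos, start)
            out.extend(script[i:i + count])   # then insert the added lines
            i += count
        else:
            raise ValueError(f"unknown command: {command!r}")
    out.extend(lines[pos:])                   # copy whatever is left unchanged
    return out


def checkout(head_text, reverse_deltas, steps_back):
    """Start from the newest revision and walk back `steps_back` revisions."""
    text = head_text
    for script in reverse_deltas[:steps_back]:
        text = apply_edit_script(text, script)
    return text


# Hypothetical history: the newest revision is stored in full,
# older revisions only as deltas against their successor.
HEAD = ["alpha", "beta", "gamma"]        # revision 1.3
REVERSE_DELTAS = [
    ["d2 1"],                            # 1.3 -> 1.2: drop line 2
    ["d1 1", "a1 1", "ALPHA"],           # 1.2 -> 1.1: replace line 1
]

if __name__ == "__main__":
    for steps in range(3):
        print(steps, checkout(HEAD, REVERSE_DELTAS, steps))
```

Retrieving the head applies no deltas at all, while each step further back adds one more pass, which matches the cost profile described above.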
In SCCS, it's different: retrieving any version takes the same amount of time. In addition, RCS history files do not store a checksum, so file integrity cannot be verified.
Basic commands
Below is a list of the most common RCS commands:
- <filename.ext>: add a new file to RCS and create a new history file for it (by default in the ./RCS/ directory).
- co <filename.ext>: check out the file from its history file and place it in the working directory in read-only mode.
- co -l <filename.ext>: check out the file from its history file for editing. This locks the history file so that other users cannot modify it.
- ci <filename.ext>: check in the file's changes and create a new revision for it in its history file.
- merge <file-to-merge-into.ext> <parent.ext> <file-to-merge-from.ext>: merge the changes from two modified children of the same parent file.
- rcsdiff <filename.ext>: display the differences between the current working copy of the file and the state of the file when it was checked out.
- rcsclean: delete working files that are not locked.
For more information on RCS internals, see the GNU RCS Manual.
Example RCS History File
head 1.2;
access;
symbols;
locks; strict;
comment @# @;
1.2
date 2019.11.25.05.51.55; author jstopak; state Exp;
branches;
next 1.1;
1.1
date 2019.11.25.05.49.02; author jstopak; state Exp;
branches;
next ;
desc
@This is a test.
@
1.2
log
@Edited the file.
@
text
@hi there, you are my bud. You are so cool! The end.
@
1.1
log
@Initial revision
@
text
@d1 5
a5 1
hi there
@
CVS (Concurrent Versions System): second generation
CVS was created by Dick Grune in 1986 to add network support to version control. It is also written in C and marks the birth of the second generation of VCS tools, which gave geographically dispersed development teams the ability to work on projects together.
Architecture
CVS is a frontend for RCS: it adds a new set of commands for interacting with the files in a project, but under the hood it uses the same RCS history file format and RCS commands. For the first time, CVS allowed several developers to work on the same files simultaneously. This is implemented using a centralized repository model. The first step is to set up a centralized repository with CVS on a remote server. Projects can then be imported into the repository. When a project is imported into CVS, each file is converted to a ,v history file and stored in a central directory called a module. The repository usually lives on a remote server that is accessed over a local network or the Internet.
A developer checks out a copy of the module into a working directory on their local computer. No files are locked in this process, so there is no limit on the number of developers who can work on the module at the same time. Developers can modify their files and commit the changes as needed. If one developer commits a change, the other developers must update their working copies, using a (usually) automated merge process, before committing their own changes. Sometimes merge conflicts have to be resolved manually before committing. CVS also provides the ability to create and merge branches.
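The lock-free workflow can be illustrated with a toy model of the "update before you commit" rule. This is only a sketch of the idea, not how CVS is implemented: the CentralRepository class, its methods, and the per-file integer revisions are all hypothetical, and the merge step is left to the caller.

```python
class CentralRepository:
    """Toy central repository that rejects commits from stale working copies."""

    def __init__(self):
        # file name -> list of committed contents; revision n is stored at index n - 1
        self.files = {}

    def checkout(self, name):
        """Return (latest revision number, latest content) without taking any lock."""
        history = self.files.get(name, [])
        return len(history), (history[-1] if history else "")

    def commit(self, name, base_revision, content):
        """Accept the commit only if the working copy was based on the latest revision."""
        history = self.files.setdefault(name, [])
        if base_revision != len(history):
            raise RuntimeError("working copy is out of date; update and merge first")
        history.append(content)
        return len(history)


if __name__ == "__main__":
    repo = CentralRepository()
    repo.commit("hi.txt", 0, "hi there\n")

    alice_rev, _ = repo.checkout("hi.txt")   # both developers start from revision 1
    bob_rev, _ = repo.checkout("hi.txt")

    repo.commit("hi.txt", alice_rev, "hi there, Alice\n")
    try:
        repo.commit("hi.txt", bob_rev, "hi there, Bob\n")
    except RuntimeError as error:
        print("rejected:", error)

    bob_rev, latest = repo.checkout("hi.txt")  # Bob updates, merges by hand, retries
    repo.commit("hi.txt", bob_rev, latest + "hi from Bob\n")
```

Nobody is blocked while editing; the only synchronization point is the commit itself, which is where conflicts surface.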
Basic commands
- export CVSROOT=<path/to/repository>: set the root directory of the CVS repository so that you do not need to specify it in each command.
- cvs import -m 'Import module' <module-name> <vendor-tag> <release-tag>: import directories with files into a CVS module. Before starting this process, go to the root directory of the project.
- cvs checkout <module-name>: copy the module to the working directory.
- cvs commit <filename.ext>: commit the modified file back to the module in the central repository.
- cvs add <filename.txt>: add a new file for change tracking.
- cvs update: update the working copy by merging in committed changes that exist in the central repository but not in the working copy.
- cvs status: show general information about the checked-out working copy of the module.
- cvs tag <tag-name> <files>: add a tag to a file or set of files.
- cvs tag -b <new-branch-name>: create a new branch in the repository (it must be checked out before you can work on it locally).
- cvs checkout -r <branch-name>: check out an existing branch into the working directory.
- cvs update -j <branch-to-merge>: merge an existing branch into the local working copy.
For more information on CVS internals, see the GNU CVS Handbook and Dick Grune's article.
CVS History File Example
head 1.1;
branch 1.1.1;
access ;
symbols start:1.1.1.1 jack:1.1.1;
locks ; strict;
comment @# @;
1.1
date 2019.11.26.18.45.07; author jstopak; state Exp;
branches 1.1.1.1;
next ;
commitid zsEBhVyPc4lonoMB;
1.1.1.1
date 2019.11.26.18.45.07; author jstopak; state Exp;
branches ;
next ;
commitid zsEBhVyPc4lonoMB;
desc
@@
1.1
log
@Initial revision
@
text
@hi there
@
1.1.1.1
log
@Imported sources
@
text
@@
SVN (Subversion): second generation
Subversion was created in 2000 by CollabNet, Inc. and is now maintained by the Apache Software Foundation. The system is written in C and was designed as a more reliable centralized solution than CVS.
Architecture
Like CVS, Subversion uses a centralized repository model. Remote users require a network connection to commit to the central repository.
Subversion introduced atomic commits, guaranteeing that a commit either succeeds completely or is rolled back completely if a problem occurs. In CVS, if a failure occurred in the middle of a commit (for example, because of a network outage), the repository could be left in a damaged and inconsistent state. In addition, a commit, or revision, in Subversion can include several files and directories. This is important because it lets you track sets of related changes together as a single unit, rather than separately for each file as in earlier systems.
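As an illustration of the all-or-nothing guarantee, here is a small in-memory Python sketch. It is not Subversion's implementation; the Repository class and its commit method are hypothetical. The point is only that a multi-file change is prepared on a private copy and becomes a new numbered revision in a single final step.

```python
class Repository:
    """Toy repository with repository-wide, atomic, multi-file revisions."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree: path -> content

    def commit(self, changes):
        """Apply `changes` (path -> new content, or None to delete) as one revision."""
        new_tree = dict(self.revisions[-1])  # work on a private copy of the latest tree
        for path, content in changes.items():
            if content is None:
                if path not in new_tree:
                    raise KeyError(f"cannot delete missing path: {path}")
                del new_tree[path]
            else:
                new_tree[path] = content
        # Publish only after every change has been applied successfully.
        self.revisions.append(new_tree)
        return len(self.revisions) - 1


if __name__ == "__main__":
    repo = Repository()
    print(repo.commit({"trunk/hi.txt": "hi there\n", "trunk/hey.txt": "hey\n"}))
    try:
        repo.commit({"trunk/hi.txt": "changed\n", "trunk/missing.txt": None})
    except KeyError as error:
        print("commit rejected, nothing was recorded:", error)
    print(len(repo.revisions) - 1, repo.revisions[-1]["trunk/hi.txt"])
```

Because the failed commit never reaches the final append, the repository still shows the previous revision with both files intact.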
Subversion currently uses the FSFS (File System atop the File System) storage backend. It creates a database whose structure of files and directories mirrors the host file system. A distinctive feature of FSFS is that it tracks not only files and directories but also their versions; in other words, it is a time-aware file system. Directories are also first-class objects in Subversion. You can commit empty directories, whereas other systems (even Git) do not track them.
When you create a Subversion repository, an (almost) empty database of files and folders is created inside it. This includes the db/revs directory, which stores all version-tracking information for committed files. Each commit (which may include changes to several files) is stored in a new file in the revs directory, named with a sequential numeric identifier starting at 1. The first commit of a file saves its full contents. Later commits of the same file save only the changes, also called diffs or deltas, to save space. In addition, the deltas are compressed with the lz4 or zlib compression algorithm to reduce their size.
This scheme only works up to a point. Deltas save space, but if there are many of them, retrieval becomes slow, because all of the deltas must be processed to reconstruct the current state of the file. For this reason, Subversion by default stores up to 1023 deltas per file and then writes a new full copy of the file. This provides a good balance between storage and speed.
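The trade-off between delta chains and full copies can be sketched in a few lines of Python. This is a simplified model, not the FSFS format: the delta here just records the changed middle of a string between a common prefix and suffix, MAX_CHAIN is a deliberately tiny stand-in for the 1023 limit mentioned above, and the names FileHistory, make_delta, and apply_delta are hypothetical.

```python
MAX_CHAIN = 3  # illustrative threshold; the text above cites 1023 for Subversion


def make_delta(old, new):
    """Describe `new` as: keep a prefix of `old`, insert a middle, keep a suffix."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old, delta):
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


class FileHistory:
    """Stores versions as deltas, with a full copy after MAX_CHAIN deltas in a row."""

    def __init__(self):
        self.entries = []  # each entry is ("full", text) or ("delta", delta)

    def add(self, text):
        chain = 0
        for kind, _ in reversed(self.entries):
            if kind == "full":
                break
            chain += 1
        if not self.entries or chain >= MAX_CHAIN:
            self.entries.append(("full", text))
        else:
            previous = self.get(len(self.entries) - 1)
            self.entries.append(("delta", make_delta(previous, text)))

    def get(self, index):
        start = index
        while self.entries[start][0] != "full":  # walk back to the nearest full copy
            start -= 1
        text = self.entries[start][1]
        for _, delta in self.entries[start + 1:index + 1]:
            text = apply_delta(text, delta)
        return text


if __name__ == "__main__":
    history = FileHistory()
    for version in ["a", "ab", "abc", "xabc", "xabcz", "xbz"]:
        history.add(version)
    print([kind for kind, _ in history.entries])
    print([history.get(i) for i in range(len(history.entries))])
```

With the threshold in place, reconstructing any version never requires applying more than MAX_CHAIN deltas, at the cost of an occasional full copy.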
SVN does not use a conventional branching and tagging system. The standard Subversion repository layout contains three folders in the root:
- trunk/
- branches/
- tags/
The trunk/ directory holds the production version of the project. The branches/ directory holds subfolders corresponding to individual branches. The tags/ directory holds tags representing specific (usually significant) versions of the project.
Basic commands
- svnadmin create <path-to-repository>: create a new empty repository shell in the specified directory.
- svn import <path-to-project> <svn-url>: import a directory of files into the specified Subversion repository.
- svn checkout <svn-path> <path-to-checkout>: copy the repository to the working directory.
- svn commit -m 'Commit message': commit a set of modified files and folders along with the message.
- svn add <filename.txt>: add a new file for change tracking.
- svn update: update the working copy by merging in committed changes that exist in the central repository but not in the working copy.
- svn status: display a list of tracked files that have changed in the working directory (if any).
- svn info: show general information about the checked-out copy.
- svn copy <branch-to-copy> <new-branch-path-and-name>: create a new branch by copying an existing one.
- svn switch <existing-branch>: switch the working directory to an existing branch so that you can work with its files.
- svn merge <existing-branch>: merge the specified branch into the current one checked out in the working directory. Note that you will need to commit afterwards.
- svn log: show the commit history and the corresponding messages for the active branch.
For more information on SVN internals, see Subversion Versioning.
Sample SVN History File
DELTA SVN^B^@^@ ^B ^A<89> hi there ENDREP id: 2-1.0.r1/4 type: file count: 0 text: 1 3 21 9 12f6bb1941df66b8f138a446d4e8670c 279d9035886d4c0427549863c4c2101e4a63e041 0-0/_4 cpath: /trunk/hi.txt copyroot: 0 / DELTA SVN^B^@^@$^B%^AΒ€$K 6 hi.txt V 15 file 2-1.0.r1/4 END ENDREP id: 0-1.0.r1/6 type: dir count: 0 text: 1 5 48 36 d84cb1c29105ee7739f3e834178e6345 - - cpath: /trunk copyroot: 0 / DELTA SVN^B^@^@'^B#^AΒ’'K 5 trunk V 14 dir 0-1.0.r1/6 END ENDREP id: 0.0.r1/2 type: dir pred: 0.0.r0/2 count: 1 text: 1 7 46 34 1d30e888ec9e633100992b752c2ff4c2 - - cpath: / copyroot: 0 / _0.0.t0-0 add-dir false false false /trunk _2.0.t0-0 add-file true false false /trunk/hi.txt L2P-INDEX ^A<80>@^A^A^A^M^H^@Γ€^HΓ·^AΓ©^FDΓ^BzΓ¨^AP2L-INDEX ^A<91>^E<80><80>@^A?^@'2^@<8d>Β»Γ<90>^CΒ§^A^X^@Γ΅ Β½Β½^N= ^@ΓΌ<8d>ΓΓ£^Ft^V^@<92><9a><89>Γ^E; ^@<8a>Γ₯w|I^@<88><83>Γ<93>^L`^M^@ΓΉ<92>Γ^EΓ―ΓΊ?^[^@^@657 6aad60ec758d121d5181ea4b81a9f5f4 688 75f59082c8b5ab687ae87708432ca406I
Git: third generation
The Git system was developed in 2005 by Linus Torvalds (the creator of Linux). It is written primarily in C, together with some shell scripts. It differs from earlier VCS in features, flexibility, and speed. Torvalds originally wrote the system for the Linux codebase, but over time its scope expanded, and today it is the most popular version control system in the world.
Architecture
Git is a distributed system. There is no central repository: all copies are created equal, which is very different from second-generation VCS, where work revolves around checking files in to and out of the central repository. This means that developers can exchange changes directly with each other before merging them into an official branch.
In addition, developers can make their changes to the local copy of the repository without the knowledge of other repositories. This allows commits without being connected to a network or the Internet. Developers can work locally offline, until they are ready to share their work with others. At this point, the changes are sent to other repositories for verification, testing, or deployment.
When a file is added for tracking in Git, its contents (prefixed with a short header) are hashed using the SHA-1 hash function and compressed using the zlib compression algorithm. The hash uniquely identifies the contents of the file. Git stores the result in the object database, which is located in the hidden folder .git/objects. The file name is derived from the generated hash, and the file holds the compressed content. These files are called blobs and are created each time a new file (or a modified version of an existing file) is added to the repository.
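A short Python sketch shows how a blob's name and stored bytes can be derived. It relies on the loose-object layout as commonly documented: Git hashes a header of the form "blob <size>" plus a NUL byte followed by the raw content, and stores the zlib-compressed result under a path built from the hash. The function name git_blob is made up for this example.

```python
import hashlib
import zlib


def git_blob(content: bytes):
    """Return (object id, loose-object path, compressed bytes) for a blob."""
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()   # the hash covers header + raw content
    compressed = zlib.compress(store)             # what is written to disk
    path = f".git/objects/{object_id[:2]}/{object_id[2:]}"
    return object_id, path, compressed


if __name__ == "__main__":
    object_id, path, compressed = git_blob(b"hi there\n")
    print(object_id)
    print(path)
    print(len(compressed), "bytes on disk")
```

Because the object id depends only on the content, adding the same file twice produces the same blob, and any change to the content produces a different one.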
Git implements a staging index, which acts as an intermediate area for changes that are being prepared for a commit. As new changes are staged, their compressed contents are referenced in a special index file, which takes the form of a tree object. A tree is a Git object that associates blobs with their real file names, file permissions, and links to other trees, and thus represents the state of a specific set of files and directories. When all relevant changes have been staged, the index tree can be committed to the repository, which creates a commit object in the Git object database. The commit refers to the root tree for a particular version, as well as to the commit author, email address, date, and commit message. Each commit also stores a reference to its parent commit(s), and so over time a history of the project's development builds up.
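To show how blobs, trees, and commits reference one another, here is a simplified Python sketch that builds the three object types in memory. The serialization follows the commonly documented object formats (a tree entry is "<mode> <name>", a NUL byte, then the 20-byte binary id; a commit is a small text record), but the sketch only handles flat trees of files, and the helper names, the author string, and the messages are invented for the example.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object the way loose objects are named: header + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """entries: list of (mode, file name, hex object id); files only, no subtrees."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda entry: entry[1]):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return body


def commit_body(tree_id, parent_ids, author, timestamp, message):
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]
    lines += [f"author {author} {timestamp}", f"committer {author} {timestamp}", "", message]
    return ("\n".join(lines) + "\n").encode()


if __name__ == "__main__":
    author = "A U Thor <author@example.com>"   # hypothetical identity
    blob_id = object_id("blob", b"hi there\n")
    tree_id = object_id("tree", tree_body([("100644", "hi.txt", blob_id)]))
    first = object_id("commit", commit_body(tree_id, [], author, "1574915303 -0800", "Initial commit"))
    second = object_id("commit", commit_body(tree_id, [first], author, "1574915400 -0800", "Second commit"))
    print("blob  ", blob_id)
    print("tree  ", tree_id)
    print("commit", first)
    print("commit", second, "(parent:", first + ")")
```

Each id is computed from content that embeds the ids below it, so a commit id indirectly pins down the whole tree and the entire history behind it.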
As already mentioned, all Git objects (blobs, trees, and commits) are compressed, hashed, and stored in the object database under their hashes. In this form they are called loose objects. No diffs are used here to save space, which makes Git very fast, since the full contents of each version of a file are available as a loose object. However, some operations, such as pushing commits to a remote repository, accumulating a very large number of objects, or manually running Git's garbage collection command, cause the objects to be repacked into packfiles. During packing, reverse diffs are computed and compressed to eliminate redundancy and reduce size. The result is a set of .pack files holding the object contents, and for each of them an .idx (index) file that references the packed objects and their locations in the packfile.
When branches are pushed to or fetched from remote repositories, these packfiles are transferred over the network. When you pull or fetch branches, the packfiles are unpacked to create loose objects in the object database.
Basic commands
- git init: initialize the current directory as a Git repository (this creates the hidden .git folder and its contents).
- git clone <git-url>: download a copy of the Git repository at the specified URL.
- git add <filename.ext>: add an untracked or modified file to the staging area (this creates the corresponding entries in the object database).
- git commit -m 'Commit message': commit a set of changed files and folders along with a commit message.
- git status: show the status of the working directory, the current branch, untracked files, modified files, and so on.
- git branch <new-branch>: create a new branch based on the currently checked-out branch.
- git checkout <branch>: check out the specified branch into the working directory.
- git merge <branch>: merge the specified branch into the current one checked out in the working directory.
- git pull: update the working copy by merging in committed changes that exist in the remote repository but not in the working copy.
- git push: pack the loose objects for the local commits of the active branch into packfiles and transfer them to a remote repository.
- git log: show the commit history and the corresponding messages for the active branch.
- git stash: save all uncommitted changes from the working directory to a stash for later retrieval.
If you want to know how Git's code works, check out the Git Starter Guide. For more information on Git internals, see the corresponding chapter in the Pro Git book.
Example Git blob, tree, and commit
Blob with hash 37d4e6c5c48ba0d245164c4e10d5f41140cab980:
hi there
Tree object with hash b769f35b07fbe0076dcfc36fd80c121d747ccc04:
100644 blob 37d4e6c5c48ba0d245164c4e10d5f41140cab980 hi.txt
Commit with hash dc512627287a61f6111705151f4e53f204fbda9b:
tree b769f35b07fbe0076dcfc36fd80c121d747ccc04
author Jacob Stopak 1574915303 -0800
committer Jacob Stopak 1574915303 -0800
Initial commit
Mercurial: third generation
Mercurial was created in 2005 by Matt Mackall and is written in Python. It was also designed to host the Linux codebase, but Git was eventually chosen for that task. It is the second most popular version control system, although it is used far less often than Git.
Architecture
Mercurial is also a distributed system that allows any number of developers to work on their own copy of the project independently of others. Mercurial uses many of the same technologies as Git, including compression and SHA-1 hashing, but applies them differently.
When a new file is committed for tracking in Mercurial, a corresponding revlog file is created for it in the hidden .hg/store/data/ directory. You can think of a revlog (revision log) file as an upgraded version of the history files in older VCS such as CVS, RCS, and SCCS. Unlike Git, which creates a new blob for each version of each staged file, Mercurial simply creates a new revlog entry for the file. To save space, each new entry contains only the delta from the previous version. When a threshold number of deltas is reached, a full snapshot of the file is saved again. This reduces lookup time when many deltas must be processed to restore a specific version of a file.
Each revlog is named after the file it tracks, but with the extension .i or .d. The .d files contain the compressed deltas. The .i files are used as indexes to quickly locate different versions within the .d files. For small files with few changes, both the index and the content are stored inside the .i file. Revlog entries are compressed for performance and hashed for identification. These hash values are called nodeids.
With each commit, Mercurial keeps track of all the file versions in that commit in what is called a manifest. The manifest is also a revlog file: it stores entries corresponding to particular states of the repository. Instead of file contents, as in a file revlog, the manifest stores a list of file names and nodeids that determine which file versions exist in that version of the project. These manifest entries are also compressed and hashed, and their hash values are likewise called nodeids.
Finally, Mercurial uses another type of revlog called changelog, that is, a change log. This is a list of entries that associate each commit with the following information:
- Manifest nodeid: the complete set of file versions that exist at a particular point in time.
- One or two nodeids for parent commits: this allows Mercurial to build a timeline or branch of the project history. One or two parent IDs are stored depending on the type of commit (regular or merge).
- Commit author
- Commit date
- Commit message
Each changelog entry is also hashed to produce its own nodeid.
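The way file revlogs, the manifest, and the changelog chain together through nodeids can be sketched in Python. This is a simplified model under stated assumptions: a nodeid is taken to be the SHA-1 of the two parent ids (sorted) followed by the entry text, with twenty zero bytes standing in for a missing parent; file and manifest parents are ignored for brevity; and the build_commit helper and the user name are invented for the example.

```python
import hashlib

NULL_ID = b"\0" * 20  # placeholder for "no parent"


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Hash an entry together with its parents so that history is part of the id."""
    first, second = sorted([p1, p2])
    return hashlib.sha1(first + second + text).digest()


def build_commit(files, user, date, message, parent=NULL_ID):
    """files: dict of file name -> content bytes. Returns the ids at each level."""
    # 1. One revlog entry (and nodeid) per file.
    file_nodes = {name: node_id(data) for name, data in files.items()}
    # 2. The manifest lists every file name with the nodeid of its version.
    manifest_text = b"".join(
        name.encode() + b"\0" + file_nodes[name].hex().encode() + b"\n"
        for name in sorted(file_nodes)
    )
    manifest_node = node_id(manifest_text)
    # 3. The changelog entry ties the manifest to author, date, files, and message.
    changelog_text = "\n".join(
        [manifest_node.hex(), user, date, *sorted(file_nodes), "", message]
    ).encode()
    return {
        "files": file_nodes,
        "manifest": manifest_node,
        "changeset": node_id(changelog_text, parent),
    }


if __name__ == "__main__":
    first = build_commit({"hi.txt": b"hi there\n"}, "Jane Doe", "1575082451 28800", "Add hi.txt")
    second = build_commit(
        {"hi.txt": b"hi there\n", "hey.txt": b"hey\n"},
        "Jane Doe", "1575082500 28800", "Add hey.txt",
        parent=first["changeset"],
    )
    for commit in (first, second):
        print(commit["changeset"].hex(), "manifest", commit["manifest"].hex())
```

The three levels mirror the description above: the changeset id depends on the manifest id, which in turn depends on the nodeid of every file version in the commit.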
Basic commands
- hg init: initialize the current directory as a Mercurial repository (this creates the hidden .hg folder and its contents).
- hg clone <hg-url>: download a copy of the Mercurial repository at the specified URL.
- hg add <filename.ext>: add a new file for change tracking.
- hg commit -m 'Commit message': commit the set of modified files and folders along with the commit message.
- hg status: display information about the status of the working directory, untracked files, modified files, and so on.
- hg update <revision>: check out the specified revision into the working directory.
- hg merge <branch>: merge the specified branch into the current one checked out in the working directory.
- hg pull: download new revisions from a remote repository, but do not merge them into the working directory.
- hg push: transfer new revisions to the remote repository.
- hg log: show the commit history and the corresponding messages for the active branch.
Example Mercurial Files
Manifest:
hey.txt 208b6e0998e8099b16ad0e43f036ec745d58ec04
hi.txt 74568dc1a5b9047c8041edd99dd6f566e78d3a42
Changelog:
b8ee947ce6f25b84c22fbefecab99ea918fc0969
Jacob Stopak
1575082451 28800
hey.txt
Add hey.txt
Additional information about Mercurial internals: