Linux package managers are a very interesting mechanism. For one thing, the package manager is the main thing that distinguishes one family of Linux distros from another. Things like the desktop environment, the window manager, and the set of programs installed by default are all malleable and don’t distinguish a distro beyond its initial configuration, but the package manager is unique to each family.
Another thing that makes a package manager interesting is that it relies on a complex and intricate toolchain of back-ends and front-ends. This toolchain is what I will be examining here. We’re going to take a look at what actually happens when you install a package using a package manager like apt.
First, we need to establish the difference between source-based and binary-based package managers, the two main approaches used by Linux distros. As the terms suggest, a source-based package manager downloads the source code from a repository, builds it using the GNU toolchain or something similar, and then configures and installs the software. A binary-based package manager follows the same process except that it skips the build step, because the packages in its repository are already precompiled. The apt package manager used in Debian-based distros like Ubuntu is binary-based, as is the pacman program used in Arch Linux and its variants, and the yum system (and its successor dnf) used by Red Hat, Fedora, and related distros sits on the binary-based rpm format, though rpm also supports source packages (SRPMs). The classic source-based example is portage, used by Gentoo.
I’m going to walk through the full source-based sequence, since the steps a binary-based package manager takes are basically a subset of it. To keep things concrete I’ll use apt’s workflow for the surrounding steps, and point out where a genuinely source-based manager like portage would do more. Let’s say you’re running a Debian-based distro and you want to install a package, say Vim. So you type the following at the command line:
$ sudo apt install vim
When you run the package manager, several things happen. The steps are as follows:
- The package manager checks the Debian repository (or whatever repo corresponds to the Linux distro you’re using) to find the package. It looks at what dependencies the package has and installs any of those using the same procedure that it will use for the target package. This step can be done at different times during the procedure depending on where the dependency information is and whether the package is source-based or binary-based.
- If the target package has been successfully found, the package manager downloads it. Different package managers do this in different ways: some call out to a general-purpose downloader like wget as a back-end, while apt, as far as I can tell, has its own download machinery built into it.
- The package manager verifies the package with a cryptographic hash to make sure it was not corrupted in transit. Nowadays this is typically a SHA-256 checksum, with the repository index that lists the checksums itself protected by a GPG signature; older systems relied on MD5, which is no longer considered safe.
- You now have a verified package file with the .deb extension (assuming you’re using apt). This extension conceals the true nature of the file: a .deb is actually an ar archive bundling two tarballs, one holding control metadata and one holding the actual files to install. Back in the day those inner tarballs were of the .tar.gz variety; now most of them use the newer xz (or zstd) compression format rather than gzip.
- To install the package, the package manager now invokes the back-end portion, in this case dpkg. This program first unpacks the archive, extracting the inner tarballs with the equivalent of ar and tar.
- Assuming it’s a source-based package, the package manager back-end then goes into the newly unpacked source directory and runs the build, typically a configure script followed by make against the project’s Makefile. The build scripts may check for dependencies if they haven’t already been installed; make then invokes the GNU toolchain to build the software from source. Binary-based package managers skip this and the following step.
- The GNU toolchain executes on the source files. For our purposes we will assume we are compiling a C program. The gcc program is a driver for a chain of four different programs: first the preprocessor cpp, which expands any macros and includes any header files; then the compiler proper cc1, which translates the C code into assembly; then as, the GNU assembler, which translates the assembly code into a linkable machine-code object file; then the linker ld, which links all the object files and library files together to produce a single binary.
- Finally, the package manager copies all files to their proper locations. Binaries are typically copied to /usr/bin, man pages are copied to /usr/share/man, any shared library files are copied to /lib or /usr/lib, and any additional header files are copied to /usr/include. If necessary, any config files are updated to accommodate the new software.
- The package manager performs a cleanup step, where the temporary build and staging files are deleted. (apt actually keeps the downloaded .deb files cached in /var/cache/apt/archives until you clear them with apt clean.)
So now you see how the front-end chain works: apt is a front-end for dpkg, which (when a package is built from source) hands the build off to make, which is a front-end for gcc, which is a front-end for cpp/cc1/as/ld. Or, in the Fedora family, yum is a front-end for rpm, which hands off to make, and so on.