How Linux Package Managers Work

Linux package managers are a very interesting mechanism. For one thing, the package manager is the main thing that distinguishes one family of Linux distros from another. Things like the desktop environment, the window manager, and the programs installed by default are all malleable and don’t distinguish a distro beyond its initial configuration, but the package manager is unique to each family.

Another thing that makes a package manager interesting is that it relies on a complex and intricate toolchain of back-ends and front-ends. This toolchain is what I will be examining here. We’re going to take a look at what actually happens when you install a package using a package manager like apt, pacman, yum, etc.

First, we need to establish the difference between source-based and binary-based package managers. These are the two main variations of package management used by Linux distros. As the terms suggest, a source-based package manager downloads the source from a repository, builds it using the GNU toolchain or something similar, and then configures and installs the software. A binary-based package manager follows the same process except it skips the build step, because the packages in the repository are already precompiled. Portage, used in Gentoo, is source-based. The apt package manager used in Debian-based distros like Ubuntu is binary-based by default, as are pacman in Arch Linux and its variants and yum in Red Hat, Fedora, and related distros. All three families also publish source packages or build recipes, which can be fetched and built with companion tools (apt source, makepkg, rpmbuild).

I’m going to walk through the source-based sequence, since the steps taken for a binary package are basically a subset of it. To illustrate, I will use apt, with the caveat that a plain apt install fetches a precompiled .deb and so skips the build steps described below. Let’s say you’re running a Debian-based distro and you want to install a package, say Vim. So you type the following at the command line:


$ sudo apt install vim

When you run the package manager, several things happen. The steps are as follows:

  1. The package manager checks the Debian repository (or whatever repo corresponds to the Linux distro you’re using) to find the package. It looks at what dependencies the package has and installs any of those using the same procedure that it will use for the target package. This step can be done at different times during the procedure depending on where the dependency information is and whether the package is source-based or binary-based.
  2. If the target package has been successfully found, the package manager downloads it. Different package managers do this in different ways: pacman downloads through libcurl by default and can be configured to call an external program such as wget through its XferCommand option. apt has its own download methods built in.
  3. The package manager verifies the package to make sure it was not corrupted or tampered with in transit. This is typically a checksum (MD5 in older setups, SHA-256 in current ones), a PGP signature, or both, depending on what package system you’re using.
  4. You now have a verified package file with the .deb extension (assuming you’re using apt). This extension conceals the true nature of the file, which is an ar archive holding a small version file and two tarballs: one with the package’s control metadata and one with the files to install. Back in the day, these tarballs were of the .tar.gz variety. Now most of them use the newer xz compression format rather than gzip. So the payload inside a .deb is really just a .tar.xz.
  5. To install the package, the package manager now invokes the back-end portion, in this case dpkg. This program first unpacks the package’s contents, doing the equivalent of unxz followed by tar -x.
  6. Assuming it’s a source-based package, the build tooling then goes into the newly unpacked source directory and runs the Makefile. (For Debian source packages this is driven by dpkg-buildpackage, which runs the debian/rules makefile, rather than by dpkg itself.) The Makefile may check for dependencies if they haven’t already been installed. It then invokes the GNU toolchain to build the software from source. Binary-based package managers will skip this and the following step.
  7. The GNU toolchain executes on the source files. For our purposes we will assume we are compiling a C program. The gcc program is a front-end for a chain of four stages: first preprocessing (available standalone as cpp, though modern GCC performs it inside cc1), which expands any macros and includes any header files; then the compiler cc1, which translates the C code into assembly code; then as, the GNU Assembler, which translates the assembly code into a relocatable machine code object file; then the linker ld, which links all the object files and library files together to produce a single binary.
  8. Finally, the package manager copies all files to their proper locations. Binaries are typically copied to /usr/bin, man pages are copied to /usr/share/man, any shared library files are copied to /lib or /usr/lib, and any additional header files are copied to /usr/include. If necessary, any config files are updated to accommodate the new software.
  9. The package manager performs a cleanup step, where all the original package files are deleted.
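Steps 1 and 3 above can be sketched in a few lines of Python. This is only an illustration, not how apt or any other package manager is actually implemented: the DEPENDS table and the function names install_order and verify_sha256 are made up for the example, and real package metadata also carries version constraints, alternatives, and conflicts.

```python
import hashlib

# Hypothetical dependency table, invented for this example.
DEPENDS = {
    "vim": ["vim-runtime", "libc6"],
    "vim-runtime": [],
    "libc6": [],
}


def install_order(package, depends, order=None, visited=None):
    """Return the packages to install, with dependencies before dependents."""
    if order is None:
        order, visited = [], set()
    if package in visited:
        return order  # already handled (this also stops dependency cycles)
    visited.add(package)
    for dep in depends.get(package, []):
        install_order(dep, depends, order, visited)
    order.append(package)  # appended only after all of its dependencies
    return order


def verify_sha256(data, expected_hex):
    """Compare downloaded bytes against the checksum from the repository index."""
    return hashlib.sha256(data).hexdigest() == expected_hex


if __name__ == "__main__":
    print(install_order("vim", DEPENDS))

    payload = b"pretend this is a downloaded package"
    expected = hashlib.sha256(payload).hexdigest()  # stands in for the index entry
    print(verify_sha256(payload, expected))         # untouched download
    print(verify_sha256(payload + b"!", expected))  # corrupted download
```

The depth-first walk is the simplest way to guarantee that a dependency is handled before the package that needs it. A checksum on its own only catches corruption; it is the signature over the repository index that ties the checksum back to the distro.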

So now you see how the front-end chain works. apt is a front-end for dpkg; for a source build, the packaging tools act as a front-end for make, which is a front-end for gcc, which is a front-end for cpp/cc1/as/ld. Or, in the Fedora family, yum is a front-end for rpm, with rpmbuild driving make for source builds, and so on.
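The last link in that chain can be made visible with gcc’s own -E, -S, and -c flags, which stop after preprocessing, compiling, and assembling respectively. The sketch below only builds the command lines for a hypothetical hello.c; the helper name gcc_stage_commands is invented for the example. It is a rough illustration of the stages, not what a Makefile would normally do, since a normal build lets gcc run all of them in one invocation.

```python
from pathlib import Path


def gcc_stage_commands(source):
    """Return one gcc command line per stage for a single C source file."""
    stem = Path(source).stem
    return [
        ["gcc", "-E", source, "-o", f"{stem}.i"],       # preprocess only
        ["gcc", "-S", f"{stem}.i", "-o", f"{stem}.s"],  # compile to assembly
        ["gcc", "-c", f"{stem}.s", "-o", f"{stem}.o"],  # assemble to an object file
        ["gcc", f"{stem}.o", "-o", stem],               # link into an executable
    ]


if __name__ == "__main__":
    for command in gcc_stage_commands("hello.c"):
        print(" ".join(command))
```

If gcc is installed and hello.c exists, each list can be passed to subprocess.run(command, check=True) to run the stages one at a time and inspect the intermediate .i, .s, and .o files.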

12 thoughts on “How Linux Package Managers Work”

        1. Actually I wasn’t familiar with dselect either. Just looked it up on Wikipedia, and now I vaguely recall reading about it in an old edition of Linux in a Nutshell.

          1. My lecturer for our Linux-related modules is very pro-CLI, so he encourages us to either use apt directly (we’re taught with Mint and normal Debian) or use dselect as our “UI” for it rather than Synaptic.
