Cleaning Up Gentoo Distfiles


January 18, 2020

Category: Linux

Tags: gentoo, linux, portage

One of the major benefits Gentoo has over other Linux distributions is its source-based package system.

While more widespread distros (such as Ubuntu or Fedora) rely on binary packages, packages in Gentoo are usually compiled on the target system. This means compile-time features can be enabled/disabled, and multiple versions of a single package can be installed side-by-side.

The downside to this is that a lot of cached files remain on the system, and are not removed automatically, wasting disk space.

It is fairly straightforward to trim the cache, and potentially reclaim gigabytes of space.

Portage Package Cache

When installing a package, Portage (the Gentoo package manager) will first save a copy of the package source. From this, it will then build and install the package. By default, these source archives are saved in /var/cache/distfiles, but the directory can be overridden by setting DISTDIR in /etc/portage/make.conf. On my system, the cache is stored in /var/portage/distfiles.
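If you are unsure where your own distfiles live, the following is a minimal sketch for finding out. It assumes portageq (shipped with Portage) is on the PATH and falls back to the default location mentioned above when it is not; the function names distdir and dir_size are made up for this example.

```shell
#!/bin/sh
# Resolve the distfiles directory: ask Portage if portageq is available,
# otherwise assume the default path.
distdir() {
    if command -v portageq >/dev/null 2>&1; then
        portageq envvar DISTDIR
    else
        echo "/var/cache/distfiles"
    fi
}

# Print the total size of a directory in human-readable form.
# Prints nothing if the directory does not exist.
dir_size() {
    du -sh "$1" 2>/dev/null | cut -f1
}

echo "distfiles directory: $(distdir)"
echo "size: $(dir_size "$(distdir)")"
```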

All of this cached source code will be kept by Portage, even when the package is no longer installed. If you need to downgrade to a previous version, or rebuild a current version, the source will not need to be downloaded again – both speeding up the process and reducing server load.

The downside to this is that, over time, the system will accumulate a large quantity of old source files, which are no longer needed. While it would be useful to keep copies of 1 or 2 versions of a package, any more than this is a waste of space.

Distfiles before cleaning
Baobab showing the current size of the distfiles folder on my system.

A quick look at this folder (using sys-apps/baobab, also known as ‘Disk Usage Analyser’) shows that, in the 18 months since my system was installed, almost 20GB worth of package source files have been cached.

Of course, some of this would be retained after the clean, but there is definitely space to be saved.

Installing & Using Eclean

Fortunately, there is a utility available to make the process of cleaning up the Portage cache easy – eclean. Part of Gentoolkit, eclean is specifically designed to clean up unneeded package source files, and binary packages.

It is assumed that all commands below will be executed in a root shell. If not, every command should be prefixed with ‘sudo’ to run them with root privileges.

You can install Gentoolkit by running…

emerge -a app-portage/gentoolkit

Running eclean on the source files is as simple as running either

eclean distfiles

…or the shorthand version…

eclean-dist

A summary of all of the removed files, their individual sizes, and the overall space saved will be returned. Adding -p (or --pretend) will show a list of the files that would be cleaned, without making any changes. With the long form, the option goes before the action (eclean -p distfiles); with the shorthand, it can simply be added to the end (eclean-dist -p).

Running eclean
The end of my eclean output – over 6GB has been freed.

On my system, this has saved over 6GB (around 1/3) from the distfiles folder, which can be confirmed by taking another look at baobab.

Distfiles after cleaning
Still a large directory, but a third smaller than it was originally.

By default, eclean will remove the source of any package version that is not installed, and not available from the repositories.

It is possible to save even more space by running either of

eclean --deep distfiles
eclean-dist --deep

This will additionally remove the source for any package versions available from the repos, but not currently installed, leaving only the sources for installed packages. On my system, a deep clean saved a further 9GB, for a total saving of around 16GB (~85%).
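To check the savings on your own system without a graphical tool, a small wrapper can measure the directory before and after a clean. This is only a sketch: dir_kib and measure_clean are hypothetical helper names, and the directory and cleanup command are passed in as arguments, so nothing here is specific to eclean.

```shell
#!/bin/sh
# Size of a directory in KiB
dir_kib() {
    du -sk "$1" | cut -f1
}

# Run a cleanup command and report how much space it freed in a directory.
# Usage: measure_clean DIRECTORY COMMAND [ARGS...]
measure_clean() {
    dir=$1
    shift
    before=$(dir_kib "$dir")
    "$@"
    after=$(dir_kib "$dir")
    echo "freed $(( (before - after) / 1024 )) MiB"
}

# Example (as root), assuming the default distfiles location:
#   measure_clean /var/cache/distfiles eclean-dist --deep
```

The figure is rounded down to whole MiB, so very small cleans will report 0.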

Running Eclean on a Schedule

To maintain the benefits of using eclean, it will need to be run regularly. It is possible to automate the process, using cron.

You will first need to make sure there is a cron daemon installed on the system. To do this, run

emerge -a virtual/cron

At the time of writing, cronie is the Gentoo default.

To add a cron job, we add our command to the crontab file. To use crontab, run

crontab -e

Alternatively, you can add ‘sudo’ to the front of the crontab command, rather than using su.

By default, crontab will attempt to use Vi as the editor. If you do not have Vi installed, you will receive an error. To use an alternative editor, add

EDITOR=editor_name

in front of the crontab command, replacing editor_name with the name of the editor you wish to use. For example, to use nano, the command would be

EDITOR=nano crontab -e

The structure of a crontab entry is as follows:

minute  hour  day of month  month  day of week  command
20      15    *             *      6            /bin/echo "hello" > ~/test

The value in each column should be separated by a space.

20 15 * * 6 /bin/echo "hello" > ~/test

The first five fields define the time and day on which to run the command. The example above will write the string “hello” to the file ‘test’ in the user’s home directory (overwriting any previous contents), and will run at 15:20 every Saturday (day 6 of the week).
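To make the field layout concrete, here is a small sketch that splits a crontab line into its six parts. The function name describe_cron is invented for this example; it relies only on the shell's read builtin, which assigns the remainder of the line to the last variable.

```shell
#!/bin/sh
# Split a crontab line into its five schedule fields and the command.
# read -r splits on whitespace; the final variable (cmd) receives
# everything left over, so the command may itself contain spaces.
describe_cron() {
    printf '%s\n' "$1" | {
        read -r min hour dom mon dow cmd
        echo "minute=$min hour=$hour day-of-month=$dom month=$mon day-of-week=$dow command=$cmd"
    }
}

describe_cron '20 15 * * 6 /bin/echo "hello" > ~/test'
```

Note that this only labels the fields; it does not validate them the way a cron daemon would.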

It is also possible to define an interval between runs, using */. For example,

*/5 * * * * command

will run every 5 minutes.

There are also useful shorthand statements, which run the command at set intervals.

@reboot
@yearly
@annually
@monthly
@weekly
@daily
@hourly
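Apart from @reboot, each shorthand stands for a fixed five-field schedule. The lookup below sketches those equivalents as documented in crontab(5); the function name expand_shorthand is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Print the five-field schedule that a crontab shorthand stands for.
# Returns a non-zero status for anything that is not a known shorthand.
expand_shorthand() {
    case $1 in
        @yearly|@annually) echo "0 0 1 1 *" ;;   # midnight on 1 January
        @monthly)          echo "0 0 1 * *" ;;   # midnight on the 1st of each month
        @weekly)           echo "0 0 * * 0" ;;   # midnight on Sunday
        @daily)            echo "0 0 * * *" ;;   # midnight every day
        @hourly)           echo "0 * * * *" ;;   # on the hour, every hour
        @reboot)           echo "(runs once at startup; no five-field equivalent)" ;;
        *)                 return 1 ;;
    esac
}

expand_shorthand @weekly
```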

Once you have chosen a timeframe, add it, along with the command to run, into the file. You can use any variation of the eclean command mentioned in the previous section.

@weekly eclean-dist --deep

I use the command above, to run the deeper clean once a week.
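If you would rather not open an editor, the entry can also be installed from a script. This is a sketch under two assumptions: your crontab implementation accepts ‘-’ to read the new table from standard input (cronie does), and ensure_entry is a helper name made up for this example. The helper only appends the line when it is not already present, so running it twice does not create a duplicate.

```shell
#!/bin/sh
# Read an existing crontab on stdin and print it back out,
# appending the given entry only if an identical line is absent.
ensure_entry() {
    entry=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x matches whole lines, -F treats the entry as a literal string
    if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Example (as root):
#   crontab -l 2>/dev/null | ensure_entry '@weekly eclean-dist --deep' | crontab -
```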

The final step to set up cron is to ensure that the service to run it is enabled. If using cronie, run the appropriate command below.

systemctl enable --now cronie # systemd
rc-update add cronie default && rc-service cronie start # OpenRC

For other cron daemons, replace cronie with the service name of the daemon. Note that the two init systems order their arguments differently: systemctl takes the action (enable) before the service name, while rc-update takes the service name between the action (add) and the runlevel (default).

To test that the cron job is working, it is a good idea to add an output log to the command, as below.

@weekly eclean-dist --deep > /home/{username}/cron 2>&1

Replace {username} with the name of the user whose home directory should hold the file. You will find the output in a file named ‘cron’. The ‘>’ redirects stdout (standard output) to that file, and the ‘2>&1’ on the end of the command then redirects stderr (standard error) to the same place as stdout, so both normal output and error messages are logged.
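The effect of the redirection can be seen with a stand-in command that writes to both streams. This is a sketch only; noisy is a made-up function and the log goes to a temporary file rather than the path used above.

```shell
#!/bin/sh
# A stand-in command that writes one line to each output stream
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

log=$(mktemp)

# "> file" sends stdout to the file; "2>&1" then points stderr at the
# same destination, so both lines end up in the log.
noisy > "$log" 2>&1

cat "$log"
```

The order matters: writing ‘2>&1’ before ‘> file’ would leave stderr attached to its original destination.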

Once you have verified it is running, the logging command can be removed from crontab (unless you want to keep it, of course).

The Portage distfile cache is one of the largest space hogs on a Gentoo system – with this automated cleanup, the cache will be kept to a sensible size.