Backup Scheduling

If you implement your own backups or if the backup service you are using doesn’t provide automatic backups upon file-save (I’m looking at you, Google Drive on Linux ಠ_ಠ), you will have to come up with a way to trigger the backups yourself.

This post is part of a series of articles about scripting your own backups which I started recently. This time I look into how we can automate script execution for backups on desktops and notebooks. Spoiler alert: cron doesn’t work well here at all…

Cron

If you know your way around Linux servers, your first instinct is probably to use cron for this. To run backups every Monday at 18:00 you could create the following crontab:

$ cat /etc/crontab
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user  command
0 18  *   *   1   root  /home/fgrosse/scripts/backup.zsh

However, you will soon realize that this cannot work reliably on devices which are not running 24/7 (e.g. your notebook). The reason is that you only tell cron to run a job at certain points in time by specifying a schedule such as “every Monday at 18:00”. If your computer happens to be powered off at that time, cron cannot run the job, and the next time you boot it will not notice that it missed the last scheduled execution. In that sense cron is entirely stateless: it has no concept of the history of its jobs or of whether they are overdue.
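
To make the missing state concrete, here is a minimal sketch of the check a scheduler needs in order to catch up on missed runs: remember when the job last ran and compare that against the interval. The function name is_overdue and its arguments are made up for illustration; cron itself offers nothing like this.

```shell
#!/bin/sh
# Sketch only: is_overdue and its arguments are hypothetical names,
# not part of cron or anacron.

# is_overdue LAST_RUN NOW INTERVAL
# All three values are in seconds (Unix epoch for the first two).
# Succeeds (exit status 0) when at least INTERVAL seconds have passed
# since LAST_RUN, i.e. when the job should run again.
is_overdue() {
    last_run=$1
    now=$2
    interval=$3
    [ $((now - last_run)) -ge "$interval" ]
}

week=$((7 * 24 * 60 * 60))

# Last run 8 days ago: a stateless scheduler would simply have skipped
# the job, a scheduler that keeps a time stamp can catch up.
if is_overdue 0 $((8 * 24 * 60 * 60)) "$week"; then
    echo "backup is overdue"
fi
```

In a real setup the last-run time would come from a stamp file that is rewritten after every successful run.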

Anacron

To work around that issue people have invented anacron. Unlike cron, anacron does not assume the system is running all the time. Instead it remembers when each job was last executed using time-stamped files in /var/spool/anacron. This enables anacron to detect and execute overdue jobs when the system boots. The disadvantage is that the smallest interval anacron can manage is one day (i.e. not minutes like cron). While this is not a problem for backup automation, it means that anacron is not meant as a replacement for cron but as an addition to it. Anacron is available on all the major Linux distributions. I’m using Fedora, where anacron comes as part of cronie (the cron daemon project). Anacron has its own format to specify which jobs to run:

$ cat /etc/anacrontab
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin

#period in days   delay in minutes   job-identifier   command
1         5   cron.daily		nice run-parts /etc/cron.daily
7         25  cron.weekly		nice run-parts /etc/cron.weekly
@monthly  45  cron.monthly		nice run-parts /etc/cron.monthly

As you can see above in the default anacrontab (which comes from cronie) this configuration is not typically meant to be edited much by hand. Instead it uses run-parts to execute all scripts in the /etc/cron.{daily,weekly,monthly} directories once within their respective interval. So in order to schedule automatic weekly backups I created a script in /etc/cron.weekly and thought I would be done.

However, for some reason I could not get this to work reliably. Initially my scripts were executed just as expected, but sometimes they were not triggered at all. I spent at least one entire evening debugging this issue and I want to briefly share my findings and why I eventually decided on another scheduling approach.

As I said I am running Fedora on a notebook where anacron comes as part of cronie which is installed as the systemd unit called crond. In order to debug the problem I started by looking at the corresponding journalctl output:

$ journalctl -u crond -e --since=today --no-pager
-- Logs begin at Wed 2018-03-14 21:32:15 CET, end at Thu 2018-05-10 11:33:31 CEST. --
May 10 00:01:01 kronos CROND[23782]: (root) CMD (run-parts /etc/cron.hourly)
May 10 00:01:01 kronos anacron[23793]: Anacron started on 2018-05-10
May 10 00:01:01 kronos anacron[23793]: Will run job 'cron.daily' in 5 min.
May 10 00:01:01 kronos anacron[23793]: Jobs will be executed sequentially
May 10 00:01:01 kronos run-parts[23795]: (/etc/cron.hourly) finished 0anacron
May 10 00:06:01 kronos anacron[23793]: Job 'cron.daily' started
May 10 00:06:01 kronos run-parts[24065]: (/etc/cron.daily) starting google-chrome
May 10 00:06:01 kronos run-parts[24100]: (/etc/cron.daily) starting logrotate
May 10 00:06:01 kronos run-parts[24105]: (/etc/cron.daily) finished logrotate
May 10 00:06:01 kronos anacron[23793]: Job 'cron.daily' terminated
May 10 00:06:01 kronos anacron[23793]: Normal exit (1 job run)
May 10 11:01:01 kronos CROND[7637]: (root) CMD (run-parts /etc/cron.hourly)

During debugging the log output seemed normal and I could see that anacron was started without any issues. However, when I expected the script to be executed automatically one day later, I could not find any mention of it in the logs, so I had to look a bit deeper.

The first problem here is that it is quite hard to make some changes such as changing the permission of a file and then test immediately if it would work with anacron. Instead I either had to wait until the next day (so anacron runs again) or I had to delete the /var/spool/anacron/cron.daily file and accept that all daily scripts were executed again. Anyway I was not in a big hurry so I just tried a couple of things over multiple days without much success.

Looking at the logs I had to understand first how cron was starting anacron. In the journalctl output I always saw that cron.hourly was started by cron but anacron was only executed sometimes. So I started to wonder how all of this was actually wired together:

$ grep -r -l 'run-parts /etc/cron.hourly' /etc/cron.*
/etc/cron.d/0hourly

$ cat /etc/cron.d/0hourly
# Run the hourly jobs
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
01 * * * * root run-parts /etc/cron.hourly

With my crontab being entirely empty, cron only runs the jobs defined in the /etc/cron.d directory. This is also well documented in man cron but I didn’t care to look there first. Let’s see what hourly jobs are defined for cron:

$ ls /etc/cron.hourly
0anacron

Ok, seems like that’s how anacron is started by cron at the beginning of every hour. Now I started to realize the first issue. Sometimes I start my notebook just for a short time (e.g. less than 1h) before I put it back to sleep. If the system is not up at the first minute of the hour then none of the /etc/cron.hourly tasks are executed. In the bigger picture that’s not a problem because most of the time I’m actually using my notebook for more than one hour, but I started to dislike the apparent flakiness of the anacron/cronie approach.

I was getting closer to the core of the problem when I checked how anacron is actually started by cron:

$ cat /etc/cron.hourly/0anacron
#!/bin/sh
# Check whether 0anacron was run today already
if test -r /var/spool/anacron/cron.daily; then
    day=`cat /var/spool/anacron/cron.daily`
fi
if [ `date +%Y%m%d` = "$day" ]; then
    exit 0
fi

# Do not run jobs when on battery power

online=1
for psupply in AC ADP0 ; do
    sysfile="/sys/class/power_supply/$psupply/online"

    if [ -f $sysfile ] ; then
        if [ `cat $sysfile 2>/dev/null`x = 1x ]; then
            online=1
            break
        else
            online=0
        fi
    fi
done
if [ $online = 0 ]; then
    exit 0
fi

/usr/sbin/anacron -s

There I finally had the bigger issue: anacron jobs were simply never executed when I was running on battery power! I briefly considered just changing this script, but then what happens when the cronie package is updated? Also, not running all anacron jobs on battery power didn’t sound like a bad idea in general, so I decided not to touch the cronie setup and instead started looking for alternatives for scheduling.
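
If you want to check what this script would decide on your own machine, the power check can be pulled out into a small function. This is only a sketch: on_ac_power is a hypothetical name, and the optional directory argument exists purely so the logic can be tried against a test directory instead of /sys/class/power_supply.

```shell
#!/bin/sh
# Sketch only: on_ac_power is a hypothetical helper that mirrors the
# power check in 0anacron. The optional first argument (the sysfs base
# directory) is an addition for testing and not part of the original.
on_ac_power() {
    base=${1:-/sys/class/power_supply}
    online=1   # no known power supply found: assume mains, like 0anacron
    for psupply in AC ADP0; do
        sysfile="$base/$psupply/online"
        if [ -f "$sysfile" ]; then
            if [ "$(cat "$sysfile" 2>/dev/null)" = 1 ]; then
                return 0   # a supply reports online: stop looking
            fi
            online=0       # supply exists but is offline
        fi
    done
    [ "$online" = 1 ]
}

if on_ac_power; then
    echo "on AC power: anacron would be started"
else
    echo "on battery: anacron would be skipped"
fi
```

Keep in mind that 0anacron also exits early when the daily jobs have already run today, so this covers only the second of its two checks.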

Systemd Timers

After looking into other options (e.g. jobber) I stumbled upon timers in systemd. It turns out these are quite nice to work with, especially during the initial trial-and-error phase, because systemd timers can be started ad hoc and it’s easy to get the state and output of any systemd unit in general. Let me show you the setup I finally settled on.

Remember, I want my backups to be executed exactly once within a given time interval (e.g. once per week). If the system isn’t running at the scheduled time, the backups should be executed the next time the system starts. Let’s start with the definition of the systemd timer unit:

[Unit]
Description=Perform system backup

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target

This timer will be triggered once every week on Monday at 00:00 (midnight). If the system is powered off at that time, Persistent=true triggers the timer the next time the computer starts.

The next step is to create a service unit. By convention the service file should have the same base name as the timer but with a .service extension. Thus I have a backup.timer and the following backup.service file:

[Unit]
Description=Perform system backup

[Service]
Type=simple
ExecStart=/etc/backups/backup.zsh

This unit simply executes a script which I chose to put into /etc/backups. Of course you can put that script in any other location. Since the service is triggered by the timer, it needs no [Install] section and does not have to be enabled in systemd.
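
For the trial-and-error phase a placeholder script is enough. The sketch below is not my real backup script (which is a zsh script); it only prints the greeting you can see in the status output later in this post. Since systemd executes the ExecStart file directly, the script needs a shebang line and the executable bit (chmod +x).

```shell
#!/bin/sh
# Placeholder backup script (sketch). backup_main is a hypothetical
# name; the real backup logic is not part of this post.
backup_main() {
    echo "Hello Backup World!"
}

backup_main
```

Whatever the script prints ends up in the journal of the service unit.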

However we need to start and enable the timer. I do this via the following Makefile which also handles validation of my configuration:

.PHONY: install verify

# Install the service first so that the timer can find the unit it activates.
install: /etc/systemd/system/backup.service \
		 /etc/systemd/system/backup.timer

/etc/systemd/system/backup.timer: backup.timer verify
	sudo cp "$<" "$@"
	sudo systemctl daemon-reload
	sudo systemctl enable backup.timer
	sudo systemctl start backup.timer

/etc/systemd/system/backup.service: backup.service verify
	sudo cp "$<" "$@"
	sudo systemctl daemon-reload

# Annoyingly systemd-analyze doesn't care to set the exit code so
# we have to do it by hand by checking if there was any output...
# (Tested with systemd version 234)
verify:
	@err="$$(sudo systemd-analyze verify backup.service 2>&1)"; \
	if [ -n "$$err" ]; then \
		echo "$$err"; \
		false; \
	fi
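
The trick used in the verify target (treat any output as a failure) also works outside of make. Here is a minimal sketch as a plain shell function; fail_on_output is a hypothetical name, and true and echo merely stand in for the systemd-analyze call so the example runs anywhere.

```shell
#!/bin/sh
# Sketch only: fail_on_output is a hypothetical helper name.
# It runs the given command, prints whatever the command wrote to
# stdout or stderr, and fails if there was any output at all.
fail_on_output() {
    out=$("$@" 2>&1)
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
        return 1
    fi
    return 0
}

# Stand-in commands; in the Makefile above the command would be:
#   systemd-analyze verify backup.service
fail_on_output true && echo "quiet command: ok"
fail_on_output echo "some warning" || echo "noisy command: treated as failure"
```

This only makes sense for tools that stay silent on success, which is the behavior the Makefile relies on for systemd-analyze verify.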

When you have installed the units you can check the status of the service and timer:

# Show backup timer
$ systemctl list-timers backup.timer
NEXT                          LEFT         LAST              PASSED UNIT         ACTIVATES
Mon 2018-05-14 00:00:00 CEST  1h 8min left Sun 2018-05-13 …  8h ago backup.timer backup.service

1 timers listed.
Pass --all to see loaded but inactive timers, too.

# Show status of backup script
$ systemctl status backup
● backup.service - Perform system backup
   Loaded: loaded (/etc/systemd/system/backup.service; static; vendor preset: disabled)
   Active: inactive (dead) since Sun 2018-05-13 14:38:12 CEST; 8h ago
 Main PID: 10720 (code=exited, status=0/SUCCESS)
      CPU: 16ms

May 13 14:38:12 kronos backup.zsh[10720]: Hello Backup World!
May 13 14:38:11 kronos systemd[1]: Started Perform system backup.
…

# Show the full log output of the backup script
$ journalctl --unit=backup
…

Both the service and the timer have a lot more options, such as configuring more specific dates for the timer execution. More detailed information on the possible configuration can be found via man systemd.timer and man systemd.service. If you do not want to read through all of that, I found the documentation of the Arch Linux people quite helpful as well.

Conclusion

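As you can see, systemd timers have a lot of advantages over regular cron/anacron for this use case: Persistent=true catches up on runs that were missed while the system was powered off, timers and services can be started ad hoc while you are still experimenting, and the state and log output of every unit are easy to inspect with systemctl and journalctl.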

I have been running the systemd timers approach now for roughly 1.5 months and am still very happy with it. The next post will probably show the actual backup script I am using so far. \ʕ◔ϖ◔ʔ/

Cheers
Friedrich