A little gotcha with Postfix’s header_checks

I run my own mail on all of my personal domains using Postfix and Dovecot, doing almost all of my day-to-day interaction with it using Thunderbird. Many years ago, I realized I would rather not be leaking my home IP address in the initial Received: header of the mail I send. So, guided by various online postings that I can no longer identify with certainty, I initially did the following.

First, I set up a custom cleanup service to be used by the message submission service by adding this entry to master.cf:

# Scrubs client IP address from things arriving over the submission port.
# DON'T FORGET "-o cleanup_service_name=subcleanup" IN THE SERVICE'S ENTRY!
subcleanup unix n       -       -       -       0       cleanup
  -o syslog_name=postfix/submission
  -o header_checks=pcre:/etc/postfix/submission-header-checks
# (Don't actually deploy the snippet above - keep reading.)

And I put the following in /etc/postfix/submission-header-checks to define the actual substitution to be performed:

/^(Received:\s+)from\s.*?\b(by\s.*)$/mi REPLACE $1$2
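With that in place, a freshly-submitted message whose first Received: header looks like this (hostnames and address made up for illustration):

Received: from [192.0.2.10] (cpe-192-0-2-10.example-isp.net [192.0.2.10])
	by mail.stump.io (Postfix) with ESMTPSA id 3ABC123DEF
	for <someone@example.com>; Mon,  1 Feb 2016 12:34:56 -0500 (EST)

goes out with the client information scrubbed:

Received: by mail.stump.io (Postfix) with ESMTPSA id 3ABC123DEF
	for <someone@example.com>; Mon,  1 Feb 2016 12:34:56 -0500 (EST)

(The lazy .*? keeps everything from the first “by” onward, and since Postfix’s pcre tables turn on PCRE_DOTALL by default, the . metacharacters happily match across the line breaks of the folded header.)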

And finally, I tacked this line onto the submission entry in master.cf to choose the new custom cleanup service:

  -o cleanup_service_name=subcleanup

(Don’t forget the indentation.)
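For context, a submission entry with that option tacked on might look something like this – the other -o options are illustrative, and yours will differ:

submission inet n       -       -       -       -       smtpd
  -o syslog_name=postfix/submission
  -o smtpd_tls_security_level=encrypt
  -o smtpd_sasl_auth_enable=yes
  -o cleanup_service_name=subcleanup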

This worked well for quite a while, but it has a subtle shortcoming that I didn’t hit until a few days ago, when I found myself needing to provide someone with copies of some spam I had received. I decided to forward the messages as attachments – something I had never done before – to ensure their headers would be preserved.

That’s when I found out that, by default, header_checks also applies to MIME headers and to the headers of attached (nested) messages – including the Received: headers in the attached spam, stripping away the very information I was trying to preserve by forwarding the messages as attachments.

(I noticed immediately because I was watching Postfix’s logs as I sent the message, having temporarily reconfigured Postfix not to use my usual outbound relay service since… well… it kind of might look like I was trying to send spam through them… and I wanted to see that Postfix really was delivering the message directly. It’s nice that header_checks rules log every change they make.)

A quick look at cleanup(8) revealed that mime_header_checks and nested_header_checks default to the value of header_checks, so I updated the master.cf entry to set those to nothing:

# Scrubs client IP address from things arriving over the submissions port.
# DON'T FORGET "-o cleanup_service_name=subcleanup" IN THE SERVICE'S ENTRY!
subcleanup unix n       -       -       -       0       cleanup
  -o syslog_name=postfix/submissions
  -o header_checks=pcre:/etc/postfix/submission-header-checks
  -o mime_header_checks=
  -o nested_header_checks=

…and resent the message (after first testing with a resend to myself, of course). Only the one intended hit of the Received:-modifying rule occurred this time, so I re-enabled use of the outbound relay service and went about my day.

(That block now refers to submissions – note the extra “s” – because between the time I first added Received: header filtering to my configuration and the time of this incident, I moved the submission service from port 587 with STARTTLS to port 465 with implicit TLS from the start, in accordance with RFC 8314 not just undeprecating that port but calling for it to become the new best practice.)

Looking back at the documentation, I now feel that setting disable_mime_input_processing = yes (disclaimer: I haven’t actually tried it) might have been a better fix, but I don’t quite feel up to messing further with something that works for the moment.
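Should you want to try that route yourself, the subcleanup entry would presumably become the following – again, untested on my end, so watch the logs before trusting it:

subcleanup unix n       -       -       -       0       cleanup
  -o syslog_name=postfix/submissions
  -o header_checks=pcre:/etc/postfix/submission-header-checks
  -o disable_mime_input_processing=yes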

If this article happens to directly help you get endpoint IP scrubbing working properly (i.e. without also removing information from messages forwarded as attachments) in your Postfix installation, I’d love to know in the comments.

On screensaver daemons

Quite a while ago, I noticed that my laptop would turn its screen off after 10 minutes of keyboard and mouse inactivity, no matter what I was doing or whether any program had an active screensaver inhibition, and that there was no setting I could find that would prevent this or even change the delay. (As you will see, though, I merely didn’t look hard enough at that time.)

I blamed my long-standing franken-half-GNOME-half-other-stuff desktop configuration, but left things as they were and dealt with it.

Until last week, when I finally decided to investigate this behavior again after getting sufficiently fed up with the lack of respect for screensaver inhibitions.

It turned out to be the X server itself: all of its DPMS timeouts were set to 10 minutes. A quick xset dpms 0 0 0 (disabling the timeouts, but leaving DPMS enabled so the screen could still be turned off when I lock the machine) fixed it for the rest of the session, but I wanted to solve this permanently, and more elegantly than just having that xset invocation run on session startup.
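(For the record, those timeouts can also be zeroed out in the server’s own configuration – something like this ServerFlags section in xorg.conf should be equivalent to the xset invocation, though I haven’t actually deployed it:

Section "ServerFlags"
    Option "BlankTime"   "0"
    Option "StandbyTime" "0"
    Option "SuspendTime" "0"
    Option "OffTime"     "0"
EndSection

BlankTime additionally disables the X server’s non-DPMS screen blanking, i.e. the xset s timeout.)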

But while deciding what to do about that, I noticed that my screen now never turned off when I left my session locked.

My screensaver daemon has always been gnome-screensaver, carried over from when I switched (way back when) from running a full GNOME 2 session to running a custom session that still used GNOME components for any parts I didn’t specifically want to replace with something else.

I never needed any other functionality from a screensaver daemon than screen blanking and locking (I’ve always preferred a blank, ideally powered-off, screen over graphical demos for this purpose), so gnome-screensaver lived on in my session even as my session continued to evolve over the intervening years.

Somewhere along the line, gnome-screensaver got almost completely gutted; nowadays it’s capable of little more than blanking the screen and implementing a screen lock, which is exactly why the change escaped my notice. The gutting took out everything to do with power management, so it was the X server’s built-in timeout – nothing to do with gnome-screensaver – that was turning my screen off 10 minutes after I locked my machine (in addition, of course, to 10 minutes after taking my hands off of it for any other reason).

This was greatly disappointing (especially in the way that it escaped my notice for so long), so after looking around at a few options (and how they handle power management, which involved some source-diving), I switched to xscreensaver. I felt a little odd doing so given the disagreements (including one very recently) between its upstream and Debian, but of the available options it was the one that appeared to fit my needs best that wasn’t specifically made for some other desktop environment.

With Mode set to Blank Screen Only, and with “Quick Power-Off in Blank Only Mode” checked, it does everything I need, and if I decide at some point that I do want graphical demos after all, it has me covered.

All because I still want my display to be turned off, but I want it to respect screensaver inhibitions, so the power management trigger logic needs to be in the screensaver daemon…

irssi and AFS

For the vast majority of my IRC needs, I used to use a Quassel core. But for a couple of the more lighthearted channels I hang out in, I instead started using what on its face appears to be a traditional irssi-in-screen setup on a JHU ACM system (and I have since folded the rest of my IRC setup into it), so that I could load scripts implementing some fun little bits of functionality that others in those channels can invoke.

The traditionality ends there, though.

This is because the ACM uses AFS, so my homedir is in AFS. Some of those channels I am archiving, and to do that, irssi needs to be able to access my homedir for as long as it continues to run. So I need something to maintain an AFS token, which in turn requires maintaining a Kerberos ticket – but I don’t want this to give irssi access to everything my Kerberos identity is able to do.

So here’s what I did:

First, I made a new Kerberos principal stump/irc, plus a keytab and a pts entry for it, and saved the keytab as ~/irc.keytab. (As part of the JHU ACM sysadmin team I could just do this myself; we’d be happy to do the same for any user who asks.) Since my own realm is cross-realmed with the ACM’s, I could have made this principal here in STUMP.IO-land instead (and then gotten a ticket and run aklog to create the pts entry), but I chose not to.

Then I granted the pts entry the minimum permissions it needs on my homedir to meaningfully run irssi with my configuration and archiving. For me, that is l on the root level of my homedir, rlidw recursively on ~/irclogs, rl recursively on ~/perl5 (because I used the wonderful local::lib to install some Perl packages to my homedir that my scripts use but that aren’t installed system-wide), and rlidw recursively on ~/.irssi.
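Concretely, granting those came down to something like the following (assuming the pts entry came out named stump.irc, the usual mapping for a principal named stump/irc – and note that AFS ACLs are per-directory, hence the find invocations for the recursive cases):

fs setacl -dir ~ -acl stump.irc l
find ~/irclogs ~/.irssi -type d -exec fs setacl -dir {} -acl stump.irc rlidw \;
find ~/perl5 -type d -exec fs setacl -dir {} -acl stump.irc rl \;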

Then I wrote a wrapper script to do the runtime setup for starting irssi, as follows:

#!/bin/sh
set -e
# Private scratch directory for the keytab copy and the ticket cache.
TMPDIR=`mktemp -d /tmp/stump_irc_XXXXXX`
trap 'rm -rf "$TMPDIR"' 0 1 2 15
# Copy the keytab out of AFS first (see below for why).
cp -a ~/irc.keytab "$TMPDIR"/keytab
# -U takes the client principal from the keytab; -t runs aklog so there's
# an AFS token, kept fresh along with the ticket for as long as irssi runs.
k5start -U -f "$TMPDIR"/keytab -k "$TMPDIR"/krb5cc -t -- irssi "$@"

(The reason for copying the keytab into the temp dir is that k5start sets up a PAG, and thereby loses access to the AFS tokens it was started with, before it reads the keytab. And if, like me, you’re in the [very good!] habit of exec-ing the real program when you write a wrapper script, note that you can’t do this here due to the trap.)

Now I can just run that script in a screen and get an irssi that will continue working indefinitely.


…or it would, but because AFS only writes back file contents on close or fsync, each channel’s archive since the last time it was (re-)joined cannot be viewed from other systems. And worse, if the machine loses power or otherwise shuts down ungracefully (which semi-regularly happens, due to the state of constant flux the ACM systems are in, though things are much better now than they were when I first set this up), those spans of archives fall on the floor, which is very sad.

So I wrote a quick script that periodically calls fsync on every file descriptor irssi has open (there was no obvious way to figure out which ones were the logfiles, so I just took the brute-force approach). Here it is:

use strict;
use warnings;

use Irssi;
use IO::Handle;
use POSIX;

our $VERSION = '0.1';
our %IRSSI = (
  'authors' => 'John Stumpo',
  'name' => 'fsync.pl',
  'description' => 'Periodically fsync()s all file descriptors irssi has open',
  'license' => 'Public domain (CC0)',
);

our $FSYNC_MSECS = 300000; # every 5 minutes

sub fsync_everything {
  my $io = IO::Handle->new();
  for (my $fd = 0; $fd < 1024; $fd++) {
    # Sadly, modern Perls don't let you just call POSIX::fsync on an int,
    # insisting that you do it through an IO::Handle, which has the highly
    # undesirable (for us) behavior of closing the file descriptor when the
    # object goes away without any documented-reliable way to override this.
    # Therefore, dup the fd first, and fsync and close the duplicate. The dup
    # also serves as a check that $fd is actually an open file descriptor.
    my $duped_fd = POSIX::dup($fd);
    next unless defined($duped_fd);
    $io->fdopen($duped_fd, 'r');
    $io->sync();
    $io->close();
  }
}

Irssi::timeout_add($FSYNC_MSECS, \&fsync_everything, undef);

As the comment says, it would have been nice to just call POSIX::fsync on each int from 0 to some reasonable value and ignore errors (as I could by using fsync from C or os.fsync from Python), but Perl no longer lets you do that without going through a handle object that closes the file descriptor when it goes away and gives you no choice about that – hence the dance with POSIX::dup. (And closing a dup’d file descriptor that goes into AFS doesn’t flush cached writes; that only happens when the last file descriptor closes, or when fsync is called.)

Happy IRCing on AFS!

SSHFP records: A different way to check host keys

The JHU ACM has many machines that users might want to SSH into. The JHU ACM also occasionally reinstalls those machines for various reasons, which changes their SSH keys. And I’m usually not near the office anymore to verify for myself that the keys I’m seeing are correct.

The ACM happens to do DNSSEC on their domain, though, and I run a validating resolver, so I can use SSHFP records as a trusted source of SSH fingerprints.

Using SSHFP records

On my end I added this line to ~/.ssh/config:

VerifyHostKeyDNS yes

And ssh checks the key it sees against SSHFP records (falling back to known_hosts if there aren’t any), giving the usual loud warning about any discrepancies it may come across.

(It even knows the difference between a DNSSEC-authenticated response and the other kind, and it will still ask you whether the key is OK if the matching SSHFP record is unauthenticated and the key isn’t in known_hosts. You can make it ask even on authenticated SSHFP records by setting VerifyHostKeyDNS to ask instead.)
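For reference, an SSHFP record carries just a key algorithm number, a fingerprint type number, and the fingerprint itself. One for a host’s ed25519 key (algorithm 4) with a SHA256 fingerprint (type 2) looks like this, hostname and fingerprint made up:

host.acm.jhu.edu.	IN	SSHFP	4 2 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef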

That’s all well and good… what about the other end?

ssh-keygen can generate SSHFP records for a given key file when run with the -r option. We wrapped it in a script, which you can run on the machine in question to get a zonefile snippet (and which only looks at public key files, and so doesn’t need to be run as root):

#!/bin/sh
# Generate a zone file snippet for SSHFP records for the host this is run on.

set -e

# This is in increasing order of algorithm ID in the SSHFP records.
for algo in rsa dsa ecdsa ed25519; do
  keyfile="/etc/ssh/ssh_host_${algo}_key.pub"
  if test -f "$keyfile"; then
    if ! ssh-keygen -r "`hostname`" -f "$keyfile" | awk '
    {
      # Do not bother with SHA1 fingerprints - only process SHA256 ones.
      # Reformat the lines so the whitespace matches common zone file layout.
      if ($5 == "2")
        print $1 (length($1) < 8 ? "\t" : "") "\t" $2 "\t" $3 "\t" $4, $5, $6
    }' | grep SSHFP  # the grep is just to check whether there was any output
    then
      if test x"$algo" = xed25519; then
        echo "; placeholder for SSHFP record for `hostname` ed25519 key"
        # Complain on stderr about ed25519 SSHFP records not yet being supported
        # so the output of this script can be sensibly directed into a zone file.
        cat >&2 <<"EOF"

There is an ed25519 key, but ssh-keygen did not turn it into an SSHFP
record. It is probably not a new enough version to support doing so -
if this is the case, remember to regenerate the records once it is.
EOF
      else
        echo "unable to generate SSHFP record for $keyfile" >&2
        # ssh-keygen puts some error messages on stdout (shame!), so re-run
        # the failing invocation to get that output and throw away stderr.
        ssh-keygen -r "`hostname`" -f "$keyfile" 2>/dev/null
        exit 1
      fi
    fi
  fi
done

(Remove the if ($5 == "2") if you also want SHA1 SSHFPs, which older versions of ssh might need. The ugly conditionals involving ed25519 are because at the time the script was written, Debian stable’s OpenSSH didn’t support SSHFP for ed25519, Debian testing’s did, we had machines running both, and the OpenSSH tools could stand to do much better when it comes to error exit codes and to what goes to stdout and what goes to stderr. The script is also available as /afs/acm.jhu.edu/group/admins.pub.ro/scripts/gen-sshfp-records.)

Since we would occasionally forget to update the SSHFP records when reinstalling a machine (or when the key changed for some other reason), I wrote a Nagios plugin for checking SSHFP records, which the ACM now uses. It checks that all key types offered by the server have SSHFP records, that every SSHFP record goes to a key type the server offers, and that every SSHFP record is correct.
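And if you ever want to eyeball what your resolver is handing ssh for a given host, you can query the records directly (hypothetical hostname):

dig +dnssec host.acm.jhu.edu. SSHFP

A validating resolver sets the ad flag on the response if the records passed DNSSEC validation.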

The b(ack)log

I’ve been highly remiss in posting anything here.

Which is terrible, because over the last few years, I have very regularly had some experience well worth a blog post, whether it’s to do with sysadminly things, coding, scripting to make my life easier, otherwise improving how I interact with my daily-driver laptop, engineering something interesting, reverse-engineering something interesting, other technical things, or even other non-technical things.

So I want to share some of that.

I will do my best to regularly post about something I did in that timespan – at the very least, weekly. I have built up a long list to pick from and flesh out, which should keep me going for a long while yet.

Here’s a high-level overview of what else has been going on with me:

  • I’m now all finished with my master’s degree in computer science from Johns Hopkins.
  • The JHU ACM moved to a new building and rebuilt its systems from the ground up; I was on the sysadmin team throughout (as I had been for years before, and still am) and helped implement any number of interesting bits of functionality.
  • I spun up my own Kerberos realm and AFS cell to play around with (since the ACM uses those too) and have put them to good use. (Expect some posts about making particular services use GSSAPI authentication and interact more nicely with having their backing stores in AFS.)
  • I went to DebConf 14 (and used an Amtrak USA Rail Pass to expand my trip there into a month-long adventure across the continental United States and back, spending time in seven extra cities).
  • I gave a talk at FOSSCON 2013 about the challenges I experienced with the FoFiX project; I went to FOSSCON 2015 too (missed 2014 because of the DebConf trip).
  • I’ve taken up speedrunning (completing video games as fast as possible) and Twitch streaming. (Expect some posts about glitches and quirks in games and game hardware, and about the technical side of streaming from Unix-like systems.)
  • I finally got a new laptop in winter 2014 and dealt with a number of things for the first time while setting it up, such as configuring disk encryption and handling UEFI.
  • I went to AGDQ 2015 and SGDQ 2015 (and couched two runs at SGDQ, mainly to provide technical background on glitches).
  • I finally got a new VPS in summer 2015 and am moving my services over. (I challenged myself to move each service with as close to zero downtime as possible; this was much easier for some services than others. More details in future posts.)

As I said, I will try to write about no less than one thing every week; there are all kinds of stories (technical and not) built up over this long blog gap. See you then!

(And let me know in the comments if there’s anything in particular that you’d like me to try to prioritize writing about.)