2010-03-28

How to cope with NIS (YP) server outages on Linux

This blog post explains how to cope with NIS (YP) server outages on Linux (tested on Debian Lenny), i.e. how to give applications an instant and valid response even when the NIS server is unreachable. When parts of the user (passwd) and group databases are stored not locally but on a NIS (YP) server, each user and group lookup request (such as getpwnam(3) and id USERNAME) is sent to the NIS server. Should the NIS server be down or unreachable, each such request blocks until a very long timeout expires (sometimes more than a minute). Also, the client (ypbind) sometimes fails to detect that the NIS server is back up, so user lookups tend to fail or time out even after the NIS server has recovered. Another problem with communicating with the NIS server is that it's hard to write a strict firewall rule, because the UDP port number of the NIS service (ypserv) is not fixed, but dynamically assigned and managed by the RPC portmap service.
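To see whether a machine is affected, it can help to time a single lookup. The following is a minimal Python 3 sketch (not part of the original setup, which targets Python 2.4); the function name timed_lookup is made up for illustration. It only measures how long the system's getpwnam(3) takes through Python's pwd module.

```python
import pwd
import time


def timed_lookup(username):
    """Return (found, seconds) for one getpwnam(3) lookup via the pwd module."""
    start = time.monotonic()
    try:
        pwd.getpwnam(username)
        found = True
    except KeyError:
        # pwd.getpwnam raises KeyError when no configured source knows the user.
        found = False
    return found, time.monotonic() - start
```

Calling timed_lookup with a user that exists only in NIS, once with the server reachable and once without, should show the difference described above. With the cached setup, both calls are expected to return quickly.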

The solution presented here involves caching all NIS data locally, updating the cache from a cron job, and directing applications to always read the cache. This design gives instant response to applications, no matter whether the NIS server is reachable, and since cache updates are performed by a dedicated user, more precise firewall packet filters can be written.
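The core of this design is that the cache is replaced only when a download succeeded, and that readers never see a half-written file. Here is a minimal Python 3 sketch of that idea, with the hypothetical name publish_cache. It is an illustration only: the actual script below is written for Python 2.4 and downloads into a separate directory before renaming, while this sketch assumes the temporary file lives in the same directory as the target so that the rename stays on one filesystem.

```python
import os
import tempfile


def publish_cache(data, target_path):
    """Atomically replace target_path with data, unless data is empty.

    Returns True if the cache was updated, False if the old copy was kept.
    """
    if not data:
        return False  # A failed download must not wipe the existing cache.
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Write to a temporary file in the same directory, so that os.rename()
    # stays within one filesystem and replaces the target atomically.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp_path, 0o644)  # World-readable, like /etc/passwd.
        os.rename(tmp_path, target_path)
    except BaseException:
        os.unlink(tmp_path)  # Don't leave temporary files behind on failure.
        raise
    return True
```

Because an empty download is rejected before anything is written, an unreachable NIS server leaves the previous cache in place, which is what keeps lookups working during an outage.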

The instructions below have been verified on Debian Lenny, but they should work similarly on other Linux systems. It is assumed that the NIS domain to connect to is called mydomain (the real domain name is usually available in /etc/domainname), and that the IP address of the NIS server is 1.2.3.4.

  1. Make sure root can download the passwd database from NIS. Try this command: # ypcat -d mydomain -h 1.2.3.4 passwd This should print the NIS user passwd database, including user names and (encrypted) passwords. If the 2nd field is x for all users, then the passwords may be in the shadow file on the NIS server. Try this: # ypcat -d mydomain -h 1.2.3.4 shadow If this doesn't print any encrypted passwords, and you need password-based login on your machine for NIS users, then this setup won't work for you (you may try complaining to the admin of the NIS server).
  2. Install software (as root) to the client Linux system: # apt-get install sudo nis python2.4 libnss-extrausers
  3. Create a user named nis-update (as root): # adduser --system --home=/tmp/blah --group nis-update (This disables login by default, and sets the user's login shell to /bin/false.)
  4. Make sure that the line in /etc/shadow starting with nis-update: starts with nis-update:*: instead of nis-update:!:. This disables login for that user.
  5. If you have a firewall which restricts outgoing packets on the local machine, you may have to add these rules so that the nis-update user can download the NIS data:
    # Allow portmap.
    iptables -A OUTPUT -m owner --uid-owner nis-update \
        -p tcp --dport 111 -d 1.2.3.4 -j ACCEPT
    # Allow all Sun RPC, including NIS (YP).
    iptables -A OUTPUT -m owner --uid-owner nis-update \
        -p udp --dport 511:2999  -d 1.2.3.4 -j ACCEPT
  6. Make sure that the nis-update user can download the passwd database from NIS. Check that this command prints all user records: # sudo -u nis-update ypcat -d mydomain -h 1.2.3.4 passwd
  7. Add the following script as an executable /usr/local/sbin/nis-update.py:
    #! /usr/bin/python2.4
    # by pts@fazekas.hu at Sun Mar 28 18:21:20 CEST 2010
    
    """Script to download NIS (YP) users and groups for libnss_extrausers.so* .
    
    This script should be run from a cron job.
    
    Please also check /etc/nsswitch.conf for libnss_extrausers.so* . It reads
    from /var/lib/extrausers/passwd etc.
    """
    
    __author__ = 'pts@fazekas.hu (Peter Szabo)'
    
    import pwd
    import signal
    import sys
    import os
    
    NIS_UPDATE_DOWNLOAD_DIR = '/var/cache/nis-update'
    NIS_UPDATE_COPY_TARGET_DIR = '/var/lib/extrausers'
    NIS_SERVER='1.2.3.4'   #### fix at install time
    NIS_DOMAIN='mydomain'  #### fix at install time
    
    def NisUpdate(uid, gid, nis_filename):
      assert nis_filename in ('passwd', 'group', 'shadow')
      download_filename = os.path.join(NIS_UPDATE_DOWNLOAD_DIR, nis_filename)
      target_filename = os.path.join(NIS_UPDATE_COPY_TARGET_DIR, nis_filename)
      fd = os.open(download_filename, os.O_WRONLY|os.O_TRUNC|os.O_CREAT, 0644)
      euid = os.geteuid()
      # Check privileges before forking, so a failure is reported only once.
      assert euid == uid or euid == 0, (euid, uid)
      pid = os.fork()
      if not pid:
        try:
          fdnull = os.open('/dev/null', os.O_RDONLY)
          if fdnull != 0:
            os.dup2(fdnull, 0)
            os.close(fdnull)
          if fd != 1:
            os.dup2(fd, 1)
            os.close(fd)
          if euid != uid:  # It's root
            os.setgroups([])
            os.setregid(gid, gid)
            os.setreuid(uid, uid)
          # The pending alarm survives exec, so a hung ypcat gets killed by
          # SIGALRM after 5 seconds.
          signal.alarm(5)
          # This doesn't need a running ypbind.
          os.execl('/usr/bin/ypcat', 'ypcat', '-d', NIS_DOMAIN, '-h', NIS_SERVER,
                   nis_filename)
        except:
          exc_info = sys.exc_info()
          print >>sys.stderr, 'error in child: %s: %s' % (
             exc_info[1].__class__, exc_info[1])
          os._exit(1)
      os.close(fd)
      got_pid, status = os.waitpid(pid, 0)
      assert got_pid == pid
      st = os.stat(download_filename)
      # Keep the old cache if ypcat failed or printed nothing.
      if status or not st.st_size:
        print >>sys.stderr, 'warning: child %s failed with status 0x%x' % (
            nis_filename, status)
        return False
      os.rename(download_filename, target_filename)
      return True
    
    if __name__ == '__main__':
      # Don't print anything, we're optimized for running as a cron job. 
      p = pwd.getpwnam('nis-update')
      NisUpdate(p.pw_uid, p.pw_gid, 'passwd')
      NisUpdate(p.pw_uid, p.pw_gid, 'group')
      # TODO: Download and os.chmod the shadow file if necessary.
  8. In the script above, search for ####, and customize those settings.
  9. As root, create the script output directories: # mkdir -p /var/cache/nis-update /var/lib/extrausers; chown root. /var/cache/nis-update /var/lib/extrausers; chmod 700 /var/cache/nis-update; chmod 755 /var/lib/extrausers
  10. Run the script as root: # /usr/local/sbin/nis-update.py This should not print any error messages, and it should create the files /var/lib/extrausers/{passwd,group}.
  11. If you also need the shadow file (because it contains passwords), then modify the script so it creates that file as well.
  12. Add the script to your crontab so it runs every 5 minutes: # echo '0-55/5 * * * * root /usr/local/sbin/nis-update.py' | sudo tee /etc/cron.d/nis-update
  13. Make sure that the libnss-extrausers package is installed. (There was an instruction above.) # apt-get install libnss-extrausers
  14. Edit /etc/nsswitch.conf, and set the following values:
    passwd: files extrausers
    group: files extrausers
    shadow: files extrausers
    (The previous values were probably compat or files nis.)
  15. At this point, applications will use the cached files /var/lib/extrausers/{passwd,group}. Test this by checking that $ perl -e 'while(@L=getpwent){print join("+",@L),"\n"}' prints all local users followed by all NIS users.
  16. Turn off the NIS client: # /etc/init.d/nis stop
  17. Run the check above again and make sure that all NIS users are present: $ perl -e 'while(@L=getpwent){print join("+",@L),"\n"}'
  18. Disable NIS client startup at boot time: # mv /etc/domainname{,.not}
  19. Occasionally check root's e-mail for error messages from the nis-update.py cron job.
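For the shadow file mentioned in step 11 and in the script's TODO, the downloaded file should get restrictive permissions before it appears under its final name. The following is a minimal Python 3 sketch of such a helper; the name install_shadow is hypothetical, and the mode 0640 with an optional group owner is an assumption modelled on how Debian protects /etc/shadow, not something the original script prescribes. In the Python 2.4 script the mode would be written as 0640 rather than 0o640.

```python
import os


def install_shadow(download_path, target_path, owner_gid=None):
    """Move a downloaded shadow map into place with restrictive permissions.

    Returns False (leaving any existing target alone) if the download is empty.
    """
    if os.stat(download_path).st_size == 0:
        return False
    # Tighten permissions before the file becomes visible under its final name.
    os.chmod(download_path, 0o640)
    if owner_gid is not None:
        # Assumption: root owns the file and one group may read it. Needs root.
        os.chown(download_path, 0, owner_gid)
    os.rename(download_path, target_path)
    return True
```

A possible call, run as root after the shadow map has been downloaded, would be install_shadow('/var/cache/nis-update/shadow', '/var/lib/extrausers/shadow', grp.getgrnam('shadow').gr_gid). Since /var/cache/nis-update is readable only by root (step 9), the encrypted passwords are not exposed while the download is in progress.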

2 comments:

delta160 said...

Stumbled on this somewhat old post. Have you considered nscd? It seems to be a standard way to accomplish the same thing. Or does your solution have some advantage?

pts said...

@delta160: One obvious advantage of my solution is that it persists indefinitely, even if the local computer is rebooted, and it's 100% guaranteed that getpwnam(3) etc. calls won't ever be waiting for the network. This makes it more resilient than nscd to network outages.