FreeBSD Diskless Booting

The following is a work in progress but we are booting half a dozen servers with this method.

Our intent is to use the facilities provided by the OS to reduce per-client configuration to a minimum. With only a single OS installation across multiple machines, we also hope to make upgrades easier, to allow easy retrogression, and to allow easy substitution of one piece of hardware for another. Our motivation has nothing to do with saving the cost of a boot disk, and we are not using the clients as xterms. This is not installing from the net. Not that there is anything wrong with that, but we have a different purpose. Our systems are really diskless - but they boot from a centralized and shared network resource and provide remote compute, storage and network services.

We have experience only with PXE booting FreeBSD 5.4, 6.0, 6.2 and 7.0 - bootp and earlier versions of FreeBSD are quite different and covered in many other tutorials and in /usr/share/examples/diskless to this day. Those instructions will be quite misleading when applied to any recent release. If you see instructions greatly different from those here, they are likely for FreeBSD 4.x or prior. The only other sources of information I am aware of for 5.4+ are in the recently published 2nd edition of the book "Absolute FreeBSD" by Michael Lucas, and a posting by Eric Norgaard. The FreeBSD handbook has recently been revised, but it is quite sketchy.

Overview

The client PXE boot code in ROM directs the client as it obtains from the dhcp server the IP address of the tftp server (called "next-server" by dhcpd) offering the boot loader (called "pxeboot" by FreeBSD), and of the NFS server exporting the root directory. The boot loader arranges to NFS mount the root directory read only. Once root is mounted, the script /etc/rc.initdiskless creates and populates /etc and /var, and otherwise readies the machine for login.

While /usr, /bin and /sbin can be shared across multiple systems without change, /var and /etc contain many per-client files, and many writable files, which require special treatment. During the diskless boot, /etc/rc.initdiskless creates and populates in-memory versions of /var and /etc according to the contents of the /pxeroot/conf directory. While that directory is read-only, the in-memory copies on the client can be written, although the contents are lost on reboot. Unfortunately many applications like to write outside the user home directory, and these require some customization to operate satisfactorily. An alternative to memory file systems would be NFS mounted persistent file systems, but after experimenting with those, we found the memory filesystems easier to deal with. They only consume a couple of megabytes of RAM.

Conventions

In this document, the hostname of the diskless server is bsdboot, the clients are client1, client2... and all filenames and paths are on the server.

Client CMOS setup

Most any PC with a motherboard ethernet manufactured since 1999 can be set for PXE boot, however the settings are always well hidden. Generally you need to turn on the ethernet port, turn on the LAN boot ROM, and set the boot sequence to include LAN on three separate menus. Sometimes you will need to "clear ESCD" also (resetting to factory defaults does not include clearing ESCD). Sometimes LAN doesn't appear on the boot menu till the other flags have been set and CMOS saved and reloaded. Sometimes you can't make it work and must add a PCI card.

If you use a bootable PCI ethernet card, it should show a configuration prompt on the screen for a second or so before the main bios takes effect. Make sure the client displays some kind of "attempting to boot from network" message, otherwise it probably isn't trying. That message will include the MAC address, and since vendors stopped putting little printed stickers on the cards that is likely the way you will get the address - which is needed for the DHCP server configuration.

Argon Technologies sells a $16 ethernet card that should PXE boot, but that card sometimes arrives with the configuration message and PXE boot facility turned off, and we haven't figured out how to turn them on. There is no documentation with the card. We have stopped using them for this reason, but if you find out how to work these cards, we'd like to hear from you.

Intel cards are the only cards generally available at retail that will likely support PXE. Don't expect to see it mentioned on the box or manual, however. They have always worked for us, but are more expensive.

Bootix apparently sells PXE bootroms for most other vendors' adaptors - we have no experience with that or with bootrom-on-floppy.

Regardless of PXE support in the motherboard firmware, your FreeBSD kernel may not support the onboard ethernet, in which case the kernel will stop with a "nfs_diskless: No interface" prompt. We have discovered that the motherboard ethernet may function, but fall over with heavy use, often during the boot process. In that case we substitute an Intel card.

Server installation

We start with a blank system on our intended boot server and do a "Standard Installation" with "All system sources, binaries and Xwindows" software, and make minimal modifications from there. No doubt much less than a full install is required, but we want something easily reproducible both by us and by any reader of this missive. A disadvantage of our procedure is that we don't learn what a minimal install would look like.

During the installation we accepted all defaults except for automatic partitioning. We place the read-only "pxeroot" directory on its own partition (mounted as /pxeroot to match the default requested by pxeboot). This partition does not need to be large - several hundred megabytes is sufficient unless you wish to add many ports. We allocate 4 gig, and it remains 90% empty.

TFTPD

In FreeBSD 6.0+ inetd is protected by /etc/hosts.allow, which disallows all tftp requests from other than localhost. I suggest adding the following to that file:

tftpd: client1 client2 : ALLOW

You can use any TFTPD daemon to serve the boot loader. One is part of the default install, but is not turned on. Edit /etc/inetd.conf to uncomment the "UDP" version of tftpd:

tftp dgram udp wait root /usr/libexec/tftpd tftpd -l -s /tftpboot

Restart inetd with an appropriate kill -HUP pid.

The PXE bootloader is shipped in an obscure location and needs to be copied to /tftpboot. Make sure it is readable (but not writable) by all.

 cp /boot/pxeboot /tftpboot

A tftp client is part of most Unix installations (not FC4, though), so after a reboot go to another system and test the server with:

tftp bsdboot
get pxeboot

It is probably a good idea to test after each step of the installation - if you wait till the end the symptoms won't seem very diagnostic unless you have considerable experience with FreeBSD.

DHCP

I won't discuss installing dhcpd - it seems likely you already have one, and another could cause conflicts. You will want to add some parameters to dhcpd.conf for the diskless booting group, and some for each of the client systems:

group {
next-server 66.251.72.8;
filename "pxeboot";
option root-path "66.251.72.8:/pxeroot";

host client1 { fixed-address client1; hardware ethernet 0:02:55:97:c9:15; }
host client2 { fixed-address client2; hardware ethernet 0:02:55:97:c9:16; }
}

where next-server specifies the tftp server, root-path points to the FreeBSD boot server and the client hostnames are in your DNS. The filename and root-path specifications are actually optional, since below we use the default path on the default server, but there is no default for next-server. There is a bug report, docs/39348, claiming that "option host-name" is required - we haven't found that to be true.
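
The per-client host entries are repetitive, so with more than a few clients they can be generated from a list. The following is a minimal sketch and not part of our setup: the function name gen_dhcp_hosts and the input format (one "hostname mac" pair per line) are assumptions made for illustration.

```sh
# gen_dhcp_hosts: hypothetical helper, not part of dhcpd or FreeBSD.
# Reads "hostname mac" pairs on stdin and prints one dhcpd.conf host
# declaration per pair, ready to paste inside the group { } block.
gen_dhcp_hosts() {
    while read -r name mac; do
        # skip blank lines and comment lines in the input list
        case "$name" in
            ''|'#'*) continue ;;
        esac
        printf 'host %s { fixed-address %s; hardware ethernet %s; }\n' \
            "$name" "$name" "$mac"
    done
}
```

Used as "gen_dhcp_hosts < clients.txt", where clients.txt is whatever file you keep the list in. The hostnames must still resolve in your DNS, as noted above.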

Restart your DHCP server and give the client a reboot to test your progress. The client console should show pxeboot load, and show the IP addresses for the boot server and gateway and the root path. If it doesn't, check the MAC address. Of course it will also show an error message for failing to find a kernel. That is our next step.

Making the actual FreeBSD installation

Theoretically one should be able to export the root of the server for the client machines to use, but that would require / and /usr to be in a single partition, which isn't the standard install. So we recompile world to get a root for the clients. This will do it:

setenv DESTDIR /pxeroot
cd /usr/src
make world
make kernel
cd etc
make distribution
mkdir  $DESTDIR/boot
cp /boot/device.hints $DESTDIR/boot

In our installation no kernel modifications were required and no configuration files were edited. Make takes about an hour on our system and produces about 171 megabytes of files on /pxeroot. Surprisingly, subsequent makes of the same source take the same amount of time.

Copying the existing installation

An alternative to make world is to copy the required files. For reasons unclear to us this didn't work when we started, however we are now having success with the following as a quicker substitute:

cd /
cp -pR bin boot cdrom compat dev disk dist etc lib /pxeroot
cp -pR libexec media rescue root sbin sys usr var  /pxeroot
rm /pxeroot/var/db/mounttab
cd /pxeroot
mkdir -p dev proc tmp
chmod 1777 tmp

The list of directories changes slowly with new versions of the OS so you may have to add a few. /var/db/mounttab needs to be removed because it lists the currently mounted nfs partitions on the boot server, which you don't necessarily want the client to mount. /dev and /proc will be populated at boot time. We generally put /tmp on a local hard disk (specified in fstab) but it will need a place to mount.
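
Before moving on it may be worth checking the copied tree for the points just mentioned. The following is a minimal sketch under stated assumptions: check_pxeroot is a hypothetical helper name, and the kernel path boot/kernel/kernel is the stock location on the releases we use, so adjust it if yours differs.

```sh
# check_pxeroot: hypothetical sanity check for a copied client root.
# Prints one line per problem found; returns non-zero if there were any.
check_pxeroot() {
    root=$1
    problems=0
    # mount points that must exist (possibly empty) in the client root
    for d in dev proc tmp; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d"
            problems=$((problems + 1))
        fi
    done
    # the boot loader needs a kernel to load (assumed stock location)
    if [ ! -f "$root/boot/kernel/kernel" ]; then
        echo "missing kernel: $root/boot/kernel/kernel"
        problems=$((problems + 1))
    fi
    # a copied mounttab would list the boot server's own NFS mounts
    if [ -e "$root/var/db/mounttab" ]; then
        echo "stale file: $root/var/db/mounttab"
        problems=$((problems + 1))
    fi
    [ "$problems" -eq 0 ]
}
```

Run as "check_pxeroot /pxeroot" on the server. It only looks at the few items discussed here and says nothing about whether the rest of the copy is complete.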

NFS exports

NFS is part of the default install, but isn't turned on. To turn it on add this to /etc/rc.conf:

nfs_server_enable="YES"
rpcbind_enable="YES"

Edit /etc/exports to export the root partition read-only:

/pxeroot -ro -maproot=0 -alldirs client1 client2...

We set "maproot=0" so that the client will have access to the password database on the server - which is readable only by root. We enumerate the allowed clients so that the password database and other material is not made readable to an insecure client. At this point you can reboot the client, and some semblance of a Unix system should load. If pxeboot complains that the kernel is unavailable, check that the NFS export is functioning and that forward and reverse DNS match.

showmount -e
will show the server exports.
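
To test the export from a script rather than by eye, the output of showmount can be filtered. This is a minimal sketch: exports_include is a hypothetical helper, and it assumes the usual showmount -e layout of one header line followed by one exported path per line.

```sh
# exports_include: hypothetical helper. Reads "showmount -e" output on
# stdin and succeeds if the given directory appears as an exported path.
exports_include() {
    want=$1
    found=1
    # first word of each line is the exported path; the header line
    # ("Exports list on ...") never matches an absolute path
    while read -r path rest; do
        if [ "$path" = "$want" ]; then
            found=0
        fi
    done
    return "$found"
}
```

For example, "showmount -e bsdboot | exports_include /pxeroot || echo not exported" can be run from any machine allowed to query the server.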

Once FreeBSD presents a login prompt, you can login as root (no password yet) and observe that /etc is nearly empty. With a diskless client, many of the tasks that sysinstall would otherwise do for you are done by rc.initdiskless, which runs each time a client boots with an NFS root. It will populate memory resident /var and /etc according to the contents of /pxeroot/conf. The "diskless" manpage describes the configuration of that script in greater detail than presented here. One facility is described that doesn't work with DHCP - ${class}.

Briefly, initdiskless copies over to the client /etc the files found in /pxeroot/conf/base/etc/, then copies over that the files found in /pxeroot/conf/default/etc/, then copies over that the files in /pxeroot/conf/hostname/etc (where hostname is the hostname or dotted numeric ip address of the client). A similar procedure is performed for the var directory.
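
The layering can be tried out on the server before rebooting a client. The sketch below only imitates the copy order just described; it is not the code in rc.initdiskless, and overlay_etc and its arguments are hypothetical names chosen for illustration.

```sh
# overlay_etc: hypothetical illustration of the copy order only; the real
# work is done by /etc/rc.initdiskless. Later layers overwrite earlier ones.
overlay_etc() {
    conf=$1      # e.g. /pxeroot/conf
    host=$2      # client hostname or dotted IP address
    dest=$3      # scratch directory standing in for the client's /etc
    mkdir -p "$dest"
    for layer in base default "$host"; do
        if [ -d "$conf/$layer/etc" ]; then
            # "/." copies the directory's contents, not the directory itself
            cp -Rp "$conf/$layer/etc/." "$dest/"
        fi
    done
}
```

Running "overlay_etc /pxeroot/conf client1 /tmp/client1-etc" and inspecting the result shows roughly which version of each file client1 should end up with. Layers that do not exist are simply skipped.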

Start with a copy of the standard /etc:

mkdir -p /pxeroot/conf/base
cp -r /etc /pxeroot/conf/base/etc

This is a good time to reboot the client and note that it now boots with the same configuration as the diskless server. If all your clients are identical, you could just edit the configuration files (rc.local, rc.conf, fstab) in /pxeroot/conf/default/etc, but we need separate configurations for each of our clients so we create files such as /pxeroot/conf/66.251.72.16/etc/rc.conf or /pxeroot/conf/client1/etc/rc.conf to contain rc.conf for the client with that IP address. No mount for / is required in any fstab file you create. However, if there is a /pxeroot/etc/fstab that shows a local disk as the root FS (as would happen if you just copy a locally installed distribution into /pxeroot) then it will mount that as /, defeating the diskless boot.

We add the following to every /pxeroot/conf/.../etc/rc.conf but your needs may vary:

amd_enable="YES"
usbd_enable="YES"
sshd_enable="YES"
ntpdate_enable="YES"
nis_client_enable="YES"
nisdomainname="nberorgyp"
nfs_client_enable="YES"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpc_bin_enable="YES"

nfs_client_enable is redundant, but lockd and statd need to be explicitly turned on or nfs file locking will not occur.

Passwords

Here is a script that will copy the server password file to the common client /etc. With this script if you asked for NIS in the original server install, it will be available to the client also.

cd /etc
cp passwd master.passwd /pxeroot/conf/default/etc/
cd /pxeroot/etc
pwd_mkdb -d /pxeroot/etc master.passwd

Note the space before master.passwd in the pwd_mkdb command. Somehow this (or an equivalent) script will have to be run whenever accounts are updated. This does give all the clients the same root password.
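
One way to decide when the copy must be rerun is to compare the server's file with the copy under the conf tree. This is a minimal sketch, not something we run: passwd_needs_sync is a hypothetical helper, and the two paths in the comments are taken from the script above.

```sh
# passwd_needs_sync: hypothetical helper. Succeeds (returns 0) when the
# copy is missing or differs from the server's file, i.e. when the copy
# script above should be run again.
passwd_needs_sync() {
    src=$1   # e.g. /etc/master.passwd on the server
    dst=$2   # e.g. /pxeroot/conf/default/etc/master.passwd
    [ ! -f "$dst" ] || ! cmp -s "$src" "$dst"
}
```

A cron job on the server could call this and run the copy script only when it succeeds. Clients that are already up keep their in-memory /etc until they reboot, so this only affects what they get at the next boot.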

Proc filesystem

Eventually we noticed that there was no proc filesystem, which we remedied by adding:

proc /proc procfs rw 0 0

to /pxeroot/conf/default/etc/fstab.

Mounting non-os filesystems

Since / is nfs mounted and read only, you can't mount drives on /mnt. You can only mount drives on the memory filesystems /var and /tmp. That is why we made a /var/mnt under the conf directory and use that for mounting remote filesystems.

Updates with pkg_add

pkg_add keeps a database of installed packages at /var/db/pkg. We do updates from a client machine with rw access to /pxeroot and the original bsdboot:/pxeroot/var mounted as its /var.

Daemons

adapted from a message from Alex Aminoff

We have daemons that run on only one or a few of our diskless servers. We could just have separate rc.conf files for each server, but that makes maintenance error-prone. But you can put portions of rc.conf in separate files in /pxeroot/conf/.../etc/rc.conf.d and all will be concatenated to rc.conf when the diskless system boots. That is, we can create the file /pxeroot/conf/client1/etc/rc.conf.d/foo with the content:

foo_enable="Yes"
foo_config="/etc/foo.rc"

and client1 will boot running the foo service. This also makes it easy to check which machines are running foo with:

ls /pxeroot/conf/*/etc/rc.conf.d/foo
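
The ls above lists every client that has a fragment for foo, including any where the fragment disables it. A slightly stricter check reads the fragment. This is a minimal sketch: clients_running is a hypothetical helper, and it only recognizes a value of "yes" in any letter case, not the other spellings rc.conf accepts.

```sh
# clients_running: hypothetical helper. Prints the name of every client
# directory under the conf tree whose rc.conf.d fragment for the given
# service sets <service>_enable to "yes" (any letter case).
clients_running() {
    conf=$1      # e.g. /pxeroot/conf
    svc=$2       # e.g. foo
    for f in "$conf"/*/etc/rc.conf.d/"$svc"; do
        [ -f "$f" ] || continue     # the glob matched nothing
        if grep -Eiq "^${svc}_enable=\"?yes\"?" "$f"; then
            # strip the leading conf directory, then everything after
            # the first remaining path component
            client=${f#"$conf"/}
            echo "${client%%/*}"
        fi
    done
}
```

For example, "clients_running /pxeroot/conf foo" would print client1 for the fragment shown above.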

Power Failures

We have several multi-hour power failures each year, but it isn't necessary to maintain services during the outage, provided service restores when power returns. Our initial thought was that with only /tmp on local storage, there wasn't a great need for any UPS on each client computer. However we quickly ran into the problem that after power returns the client compute servers would attempt to boot before the NFS boot server was ready. The clients only attempt to load the kernel once, and then hang if it isn't yet available.

Some older NICs would keep requesting DHCP addresses until they obtained one, so with those clients one could simply ensure that dhcpd didn't come up before the tftp and boot NFS servers. None of our current clients include that desirable feature.
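
For NICs that do keep retrying, the ordering could be enforced on the DHCP server by holding dhcpd back until the boot server answers. The following is a minimal sketch we have not deployed: wait_until is a hypothetical helper, and how dhcpd is actually started afterwards depends on how it was installed on your system.

```sh
# wait_until: hypothetical helper. Runs a command up to <tries> times,
# sleeping <delay> seconds after each failed attempt; succeeds as soon
# as the command does, fails if every attempt fails.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    n=0
    while [ "$n" -lt "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, "wait_until 60 10 showmount -e bsdboot" waits up to ten minutes for the NFS server to answer, after which the dhcpd start command can follow. This does nothing for clients whose boot ROM gives up after a single attempt.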

The best hope appears to be "autoboot_delay" together with a really big UPS on the dhcp and tftp servers. Other possibilities include getting a really big UPS for all the servers or getting smaller UPSs for the clients with a settable power-up delay. Another possibility is to use wol, if the clients can be persuaded to power-up to a wol sensitive state without attempting to boot.

Applications

According to the Unix Filesystem Standard /var is for variable data that must be preserved across reboots. There are a lot of applications that depend on that, and each needs special treatment when /var is a memory filesystem. A few we encountered are mentioned below. In retrospect, perhaps we should have tried NFS mounting a separate /var for each client.

Cron, at, batch

The crontab database lives in /var, so if you want cron to run on the clients you have to muck with the /pxeroot/conf tree. At and batch store their requests in directories /var/at and /var/batch. You would have to relocate those directories to persistent storage to preserve requests across reboots. We also needed to touch /var/at/at.deny, create /var/at/spool, and make both /var/at/jobs and /var/at/spool owned by daemon (not root).

Printing

Replies to PR 71488 indicate that /etc/rc.d/var should prepare for and start the line printer daemon, however we found it necessary to add the following commands to /etc/rc.local to enable printing:

/usr/sbin/chkprintcap -d
lpd

Syslog

With the above setup, system logs are lost on reboot, so you probably want to establish a syslog server and direct logs there. We have a syslog server named just that and add the following line to the default /etc/syslog.conf:

*.*              @syslog

Sendmail

There are numerous writable files in /etc/mail of a genuine sendmail installation. These are not written by sendmail itself but by makemap and newaliases. Some facility is probably required for updating /etc/mail on the clients when /etc/mail on the server is updated, or a client reboot will be required whenever the aliases or access files change.

We ran into the limitation that rc.initdiskless cannot copy a symlink on top of another symlink if the target symlink points to a directory. If all the clients forward mail to a mailhub, such changes can perhaps be avoided. The source for rc.initdiskless includes information on advanced use.

NTP

ntp likes to keep a driftfile on /var. We haven't done anything to keep it between reboots.

Samba

Samba creates a number of temporary files in /var/db/samba when it starts, but does not create the directory itself. Also, smbpasswd and secrets.tdb in /usr/local/etc/samba need to be writeable, and samba needs to be able to create files in that directory. Lastly, logging needs to be relocated from /usr/local/samba/var with the "syslog" or "log file" directives in smb.conf.

SSH daemon

The first time it runs on any host, sshd creates some host key files in /etc/ssh that are specific to that host. If you preserve these files by copying them to /pxeroot/conf/ipaddress/etc/ssh/ then users connecting to these machines won't be warned that they may be the victim of a man-in-the-middle attack.
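
Preserving the keys amounts to copying a handful of files into the per-client conf tree. This is a minimal sketch: save_hostkeys is a hypothetical helper, and it assumes the keys have first been fetched from the client into a staging directory on a machine with write access to /pxeroot, since the client itself mounts that tree read-only.

```sh
# save_hostkeys: hypothetical helper. Copies the ssh_host_* key files from
# a directory holding one client's keys into that client's conf tree so
# that they are restored at every boot.
save_hostkeys() {
    keydir=$1    # staging directory holding the client's host keys
    confdir=$2   # per-client key directory under /pxeroot/conf/<ipaddress>/etc
    mkdir -p "$confdir"
    for k in "$keydir"/ssh_host_*; do
        [ -f "$k" ] || continue     # the glob matched nothing
        # -p keeps the restrictive mode on the private keys
        cp -p "$k" "$confdir/"
    done
}
```

Only files whose names begin with ssh_host_ are copied, so configuration files in the same directory are left to the shared base and default layers.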

/tmp

rc.initdiskless actually creates a memory filesystem /tmp, which has not been a problem for us. Otherwise /tmp could be mounted over NFS (but must not be shared) or on a local drive.

strace

The strace command terminates immediately with an IO Error, for unknown reasons. We are very interested in why, especially since it might help us debug the sudo command.

Other applications

Although we haven't been using them in our diskless booted clients, I am aware that the following default to keeping their databases in /var: GNU Mailman, MySQL, named (BIND), and nis (yp). Presumably a symbolic link is sufficient to move the actual files to permanent storage if you need to keep updates across boots, but as mentioned above, rc.initdiskless may choke on the symlink, and you may have to write a script to handle this after booting is complete.

Non-problems

We did not run into the problems with SSH and vi discussed in those threads, perhaps because they relate to earlier versions of FreeBSD or those applications. The Handbook covers problems with swap and X, but they haven't affected us because we don't use those facilities on our diskless clients.

System updates

Currently we have moved our root filesystem to a Netapp filer, which creates an additional problem: updates won't work unless the "chflags" command is turned into a noop because that command won't work over NFS. I expect there are other problems we haven't run into. The Handbook mentions problems with /dev - this hasn't been the case for us.

Advanced Topics

We haven't tried this but it looks interesting as an alternative way of differentiating client systems.

Comments and suggestions on diskless booting are welcome.

Daniel Feenberg
feenberg isat nber dotte org
(with thanks to Alex Aminoff, Mohan Ramanujan and Clarence Chu. Inspired by Kenneth Cleary)


Date last modified: 22 August 2008  


 