
Pixie Chronicles: Part 3 PXE boot

By Henry Grebler

The Story So Far

In Part 1 I outlined my plans: to build a server using network install. However, I got sidetracked by problems. In Part 2 I made some progress and dealt with one of the problems. Now I'm going to detail what I did when I got it right.

Quick Overview

The task is to install Fedora 10 on a machine (the target machine) which will become a web and email server. I'm not going to use CDs; I'm going to install from a network. So, somewhere on the network, I need a machine to supply all the information normally supplied by CDs or a DVD.

How booting from a disk works

When a machine is (re)started, the BIOS takes control. Typically, a user configures the BIOS to try various devices in order (DVD/CD, Floppy, HDD). If there is no removable disk in any drive, the boot proceeds from the hard drive. The BIOS typically reads the first block from the hard drive into memory and executes the code it finds (typically GRUB). This leads to more reading and more executing.

How PXE works

When a machine using PXE is (re)started, it behaves like a DHCP client. It broadcasts a request on a NIC using the MAC address (aka Ethernet address) of the NIC as an identifier.

If the machine is recognised, the DHCP server sends a reply. For PXE, the reply contains an IP address for the client and a file name (in this case the PXELINUX boot loader, pxelinux.0). The PXE client uses tftp to download the specified file into memory and execute it. This leads to more reading and more executing.

Get the software for the PXE server

You'll need tftp-server, a DHCP server and syslinux.

I was already running dnsmasq as a DHCP server, so I didn't need to install anything. There are many packages which can act as DHCP servers. If you are already running DHCP software, you should be able to use that.

I didn't have tftp. Getting it on a Fedora machine is just:

	yum install tftp-server

PXELINUX is part of the syslinux RPM (which came as part of my server's Fedora distribution), but it's also available on the installation images.

Configure DHCP

Since I was already running dnsmasq as a DHCP server, I just needed to add a couple of lines:

	dhcp-host=00:D0:B7:4E:31:1B,b2,192.168.0.60
	dhcp-boot=pxelinux.0

These lines say: if the host with MAC address 00:D0:B7:4E:31:1B asks, tell him his hostname is b2, he should use IP address 192.168.0.60 and tell him to boot the file pxelinux.0 (by implication, using tftp).

	NB I'm using a non-routable IP address from the subnet
	192.168.0.0/24 during the build of the server. Later,
	I will configure different IP addresses in preparation
	for using this server on a different subnet.

	NB Although I can specify a hostname here, in this case it
	acts merely as documentation because the install process will
	require that the name be specified again.

In other words, at this stage, these are interim values which may or may not be the same as final values.
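If more target machines are added later, each needs its own dhcp-host line. The following is only a sketch of how such lines might be generated; dhcp_host_line is a hypothetical helper, not something that ships with dnsmasq, and the MAC check is a simple format test, nothing more.

```shell
#!/bin/sh
# Hypothetical helper (not part of dnsmasq): print a dhcp-host line
# for one PXE client after a basic sanity check of the MAC address.
dhcp_host_line() {
    mac=$1 name=$2 ip=$3
    # Accept only six colon-separated pairs of hex digits.
    if ! printf '%s\n' "$mac" |
        grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "bad MAC address: $mac" >&2
        return 1
    fi
    printf 'dhcp-host=%s,%s,%s\n' "$mac" "$name" "$ip"
}

# Reproduces the line used for b2 above.
dhcp_host_line 00:D0:B7:4E:31:1B b2 192.168.0.60
```

The dhcp-boot line is not repeated per machine; in the configuration shown above it appears once.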

Having made changes to the config file, I restart the DHCP server:

	/etc/rc.d/init.d/dnsmasq restart

Configure tftp

By default, tftp, which runs from xinetd, is turned off. Edit /etc/xinetd.d/tftp:

	< disable = yes
	---
	> disable = no

Restart xinetd:

	/etc/rc.d/init.d/xinetd restart

If you examine /etc/xinetd.d/tftp, you will see that the default configuration wants to serve /tftpboot. I don't like to have such directories on my root partition, so I created /tftpboot as a symlink pointing to the directory which contained the requisite data.

	mkdir /Big/PXEBootServer/tftpboot
	ln -s /Big/PXEBootServer/tftpboot /

Note that the use of a symlink keeps the path short. That matters because there are limits on path lengths, and it also means there is less to type.

Either I wasn't thinking too clearly, or perhaps I was concerned that I might not be able to get the net install to work, because I downloaded 6 CD images instead of a single DVD. Knowing what I know now, if I had to do it over, I would download just the DVD image. (The old machine on which I want to install Fedora 10 only has a CD drive, not DVD.)

Populate /Big/PXEBootServer/tftpboot:

One of the CD images which comes with Fedora 10 is called Fedora-10-i386-netinst.iso; it has a copy of pxelinux.0 and some other needed files.

	mount -o loop -r /Big/downloads/Fedora-10-i386-netinst.iso /mnt2
	cp /mnt2/isolinux/vmlinuz /mnt2/isolinux/initrd.img /tftpboot
	cp /usr/lib/syslinux/pxelinux.0 /tftpboot

(Note: I could also have typed

	cp /usr/lib/syslinux/pxelinux.0 /Big/PXEBootServer/tftpboot

but using the symlink saves typing.)
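Before going further, it can be worth confirming that the three files really landed in the TFTP directory. This is a minimal sketch; missing_boot_files is a hypothetical helper name, and the check only looks for non-empty, readable files, not for correct contents.

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected boot files are
# missing from a TFTP root directory (default: /tftpboot).
missing_boot_files() {
    root=${1:-/tftpboot}
    for f in pxelinux.0 vmlinuz initrd.img; do
        # -s: exists and is not empty; -r: readable by the current user
        if [ ! -s "$root/$f" ] || [ ! -r "$root/$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running missing_boot_files with no argument checks /tftpboot; no output means all three files are present.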

Configure pxelinux

	mkdir /tftpboot/pxelinux.cfg

What do we have so far?

	ls -lA /tftpboot/.
total 19316
-rw-r--r-- 1 root root  17159894 Nov 20 11:50 initrd.img
-rw-r--r-- 1 root root     13100 Feb  9  2006 pxelinux.0
drwxrwxr-x 3 root staff     4096 Nov 27 15:42 pxelinux.cfg
-rwxr-xr-x 1 root root   2567024 Nov 20 11:50 vmlinuz

	cd /tftpboot/pxelinux.cfg

Most of the documentation says to create (touch) a whole stack of files, but I prefer to make just two: one called default as a backstop, and one with a name like 01-00-d0-b7-4e-31-1b. What is this?

Get the MAC address of the NIC to be used. This is nowhere near as easy as it sounds. My PC has 2 NICs. Which is the one to use? It's probably the one that corresponds to eth0. I discovered which one was eth0 by first booting into the Knoppix CD - a very good idea for all sorts of reasons (see Part 2). Once booted into Knoppix, ifconfig eth0 will give the MAC address. In my case it was 00:40:05:58:81:2F.

Now convert the colons to hyphens and upper- to lower-case:

        echo 00:D0:B7:4E:31:1B | tr ':A-Z' '-a-z'
00-d0-b7-4e-31-1b

The required filename is this string preceded by 01- (the ARP hardware type code for Ethernet), i.e. 01-00-d0-b7-4e-31-1b.
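The two steps can be wrapped up in a small function, which is handy if there will be more than one target machine. This is just a sketch; pxe_cfg_name is a hypothetical name. It writes the tr sets as 'A-Z:' and 'a-z-' so that neither argument starts with a hyphen.

```shell
#!/bin/sh
# Hypothetical helper: turn a NIC's MAC address into the file name
# PXELINUX looks for under pxelinux.cfg/.
pxe_cfg_name() {
    # Lower-case the hex digits and swap colons for hyphens, then add
    # the "01-" prefix.
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-Z:' 'a-z-')"
}

pxe_cfg_name 00:D0:B7:4E:31:1B
```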

I chose to be very specific. I figured that if the net install is successful, I can use the same technique for other target machines. To achieve that, each machine would be distinguished by its MAC address. Consequently, I did not think it was a good idea to use default for a specific machine. Rather, I configured default to contain generic parameters to handle the possible case of an as yet unconfigured machine performing a PXE boot. default would also handle the case where I got the MAC address wrong (for example, while I was trying to figure out whether the other file should be in upper-case or lower-case). Of course, the PXE boot would behave incorrectly, but at least it would show that there was some life in the system.
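The "whole stack of files" in other documentation comes from the way PXELINUX searches for its config file. According to the PXELINUX documentation, after the MAC-based name it tries the client's IP address in upper-case hexadecimal, dropping one trailing digit at a time, and finally default. The sketch below lists those fallback names for a given address; pxe_ip_fallbacks is a hypothetical helper, and other PXELINUX versions may try additional names first.

```shell
#!/bin/sh
# Hypothetical helper: list the IP-based config names PXELINUX falls
# back to, most specific first, ending with "default".
# The body runs in a subshell so the IFS change does not leak out.
pxe_ip_fallbacks() (
    # Split the dotted quad into four positional parameters.
    IFS=.
    set -- $1
    # Upper-case hex, two digits per octet: 192.168.0.60 -> C0A8003C
    name=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
    # Drop one trailing hex digit at a time until nothing is left.
    while [ -n "$name" ]; do
        printf '%s\n' "$name"
        name=${name%?}
    done
    printf 'default\n'
)

pxe_ip_fallbacks 192.168.0.60
```

For 192.168.0.60 the list starts with C0A8003C and ends with default, so a file with any of those names would be picked up if the MAC-based file were missing.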

Here are the contents of the /tftpboot/pxelinux.cfg/default:

prompt 1
default linux
timeout 100

label linux
kernel vmlinuz
append initrd=initrd.img ramdisk_size=9216 noapic acpi=off

Readers familiar with GRUB or LILO will recognise this as a boot config file.

default is a generic config file. I could use it for the installation, but I'd have to type a lot of extra args at the boot: prompt, for example

	boot: linux ks=nfs:192.168.0.3:/NFS/b2

To save typing and to provide me with documentation and an audit trail, I've tailored a config file specifically for my current server. And how better to tie it to the target machine than by the target machine's MAC address? (In the absence of user intervention, MAC addresses are unique.)

/tftpboot/pxelinux.cfg/01-00-d0-b7-4e-31-1b:

prompt 1
default lhd
timeout 100
display help.txt
f1 help.txt
f2 help.txt
f3 b2.cfg

# Boot from local disk (lhd = local hard drive)

label lhd
localboot 0

label linux
kernel vmlinuz
append initrd=initrd.img ramdisk_size=9216 noapic acpi=off

label b2
kernel vmlinuz
append initrd=initrd.img ramdisk_size=9216 noapic acpi=off ksdevice=eth0 ks=nfs:192.168.0.3:/NFS/b2

        NB It may look ugly, but the append line must be a single
        line. PXELINUX cannot handle any mechanism which attempts to
        split the text over more than one line.
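One way to live with the single-line restriction is to keep the options as separate pieces in a script and let the script join them. The following is a sketch under that assumption; pxe_label is a hypothetical helper, and the common options are simply the ones used in the config files above.

```shell
#!/bin/sh
# Hypothetical helper: print a PXELINUX label stanza, assembling the
# append line from separate, readable pieces.
# $1 = label name; remaining arguments = extra kernel options.
pxe_label() {
    label=$1
    shift
    # Options common to every install entry in the config files above.
    append="initrd=initrd.img ramdisk_size=9216 noapic acpi=off"
    # Join any extra options onto the same single line.
    for opt in "$@"; do
        append="$append $opt"
    done
    printf 'label %s\nkernel vmlinuz\nappend %s\n' "$label" "$append"
}

pxe_label b2 ksdevice=eth0 ks=nfs:192.168.0.3:/NFS/b2
```

The output is the b2 stanza shown above, with the append options still on one line.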

See /usr/share/doc/syslinux-3.10/syslinux.doc for info on the items in the config file.

Please note that this is vastly different from the first file I used (the one that got me into trouble). By this stage, the config file has been refined to within an inch of its life.

Here are the little wrinkles that make this config file so much better:

default lhd
	tells the PXE client to use the entries under the label lhd
	by default

ksdevice=eth0
	forces the install to use eth0 and avoids having anaconda (the
	Linux installer) ask the user which interface to use

ks=nfs:192.168.0.3:/NFS/b2
	tells anaconda where to find the kickstart file

When the target machine is started, the PXE client will eventually come to a prompt which says

	boot:

If the user presses Enter, the machine will attempt to boot from the local hard drive. If the user does not respond within 10 seconds (timeout 100 has units of 1/10 of a second), the boot will continue with the default label (lhd). Alternatively, the user can enter linux (with or without arguments) for different behaviour. In truth, I used this label earlier in the development of this config file; I would lose little if I now removed this label.

If the user presses function-key 1 or function-key 2, help is displayed. If the user presses function-key 3, the actual boot config file is displayed (on the PXE server b2.cfg is just a symlink pointing to /tftpboot/pxelinux.cfg/01-00-d0-b7-4e-31-1b). I added the f3 entry largely for debugging and understanding - there's nothing the user can do but choose menu options which are displayed by help.txt.
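The b2.cfg symlink is not created by any of the earlier commands. A sketch of how it might be made follows; link_cfg is a hypothetical helper, and the relative target matches what the directory listing near the end of this article shows.

```shell
#!/bin/sh
# Hypothetical helper: make a short name in the TFTP root point at a
# per-machine config file.
# $1 = config file name, $2 = link name, $3 = TFTP root (default /tftpboot)
link_cfg() {
    root=${3:-/tftpboot}
    # A relative target keeps the link valid whether the directory is
    # reached as /tftpboot or by its real path.
    ln -s "pxelinux.cfg/$1" "$root/$2"
}
```

For this server the call would be link_cfg 01-00-d0-b7-4e-31-1b b2.cfg.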

This completes this part of the exercise. This is the point at which you can try out the PXE process.

Try it out

Machines differ and I daresay PXEs differ. I will describe how my machine behaves and what to expect.

When turned on, my machine, a Compaq Deskpro EP/SB Series, announces its own Ethernet (aka MAC) address and that it is trying DHCP. The MAC address indicates which NIC is being used. This machine will present this MAC address to the DHCP server, which hopefully will respond with the offer of an IP address. The same MAC address is also used at PXE boot time when searching the TFTP directory for the machine's config file.

If the DHCP server responds, then a series of dots appears as my machine downloads the PXELINUX boot loader. When PXELINUX is given control it announces itself with the prompt boot:.

If you get to this point, then the PXE server has been configured correctly, and the client and server are working harmoniously. In Part 4 I'll discuss the rest of the install.

It didn't work for me!

You might recall that in Part 1 Lessons from Mistakes I discussed this subject. I will also confess that the first few times I tried to get things going, I was not successful.

Here's what to do if things don't behave as expected.

	This terminology is misleading. As much as you might like
	things to work first time, if you're human, chances are they
	don't. Consequently, you should expect things to not work.
	(You can feel that you are batting better than average when
	they do work.)

Here's what to do when things go wrong. First, the target machine is in a very primitive state so it is not likely to be of much help. Since the process under investigation involves a dialogue between 2 machines, you would like to monitor the conversation (A said U1, then B said U2, then A said U3, ... ). Who was the last to speak? What did he say? What did I expect him to say?

Ideally, you would have a third machine on the same subnet as the 2 machines (the target machine and the PXE server). But that's unlikely, and adds very little. Clearly, the place to monitor the conversation is on the PXE server. Use tcpdump, tshark and/or ethereal.

As my notes from the exercise say,

	This gave me the opportunity to try and try, again and again,
	turning on tcpdump and tshark, and fiddling and fiddling until
	about the 15th time I got it right.
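The notes don't record the exact command line. One plausible way to narrow the capture to the PXE conversation is sketched below; pxe_filter is a hypothetical helper, and the interface name eth0 in the usage comment is an assumption about the PXE server.

```shell
#!/bin/sh
# Hypothetical helper: build a tcpdump filter for one PXE client.
# $1 = the IP address the DHCP server will hand to the client.
pxe_filter() {
    # "arp" catches the address lookups, UDP ports 67 and 68 catch the
    # DHCP exchange (the client has no address yet), and "host" catches
    # the TFTP transfers once the client is using its assigned address.
    printf 'arp or udp port 67 or udp port 68 or host %s\n' "$1"
}

# To capture (as root; eth0 is an assumed interface name):
#   tcpdump -n -i eth0 "$(pxe_filter 192.168.0.60)"
```

Filtering on port 69 alone would miss most of the TFTP traffic, because only the initial read request goes to that port.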

Here's the expected conversation:

PXE client	sends broadcast packet with its own MAC address
PXE server	sends BOOTP/DHCP reply

I see the above repeated another 3 times. Then,

PXE client	arp enquiry
PXE server	arp reply

The arp enquiry comes from an IP address (earlier packets have an IP address of 0.0.0.0), and confirms that the DHCP server has sent the client an IP address and that the client has used the value specified.

PXE client	sends a RRQ (tftp read request) specifying pxelinux.0
PXE server	sends the file

There should be many packets, large ones from the server (containing the file data); small (length 4) acks from the client.
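The packet sizes can be predicted from the file size. Each TFTP data packet carries a 4-byte header plus up to one block of data, and the transfer ends with a packet holding less than a full block. The sketch below does the arithmetic for pxelinux.0 (13100 bytes in the listing above) with the block size of 1456 that the client requests in the capture further down; tftp_packets is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: predict the TFTP data packets for one file.
# $1 = file size in bytes, $2 = negotiated block size in bytes.
tftp_packets() {
    size=$1 blk=$2
    full=$((size / blk))    # packets carrying a full block
    last=$((size % blk))    # payload bytes in the final, short packet
    # Each data packet adds a 4-byte header (opcode + block number).
    printf '%d packets of length %d, then 1 of length %d\n' \
        "$full" "$((blk + 4))" "$((last + 4))"
}

tftp_packets 13100 1456
```

This prints "8 packets of length 1460, then 1 of length 1456", which is the pattern of server packets in the capture.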

PXE client	sends a RRQ specifying pxelinux.cfg/01-00-d0-b7-4e-31-1b

This is where you find out what exactly your PXE client is asking for.

I guess there are zillions of other ways that things could go wrong, but the above covers pretty much everything I can think of.

Here's a complete conversation:

12:03:06.208042 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:d0:b7:4e:31:1b, length: 548
12:03:06.243002 IP 192.168.0.3.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length: 300
12:03:07.192231 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:d0:b7:4e:31:1b, length: 548
12:03:07.192641 IP 192.168.0.3.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length: 300
12:03:09.169864 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:d0:b7:4e:31:1b, length: 548
12:03:09.170287 IP 192.168.0.3.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length: 300
12:03:13.125134 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:d0:b7:4e:31:1b, length: 548
12:03:13.142644 IP 192.168.0.3.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length: 300
12:03:13.143673 arp who-has 192.168.0.3 tell 192.168.0.60
12:03:13.143716 arp reply 192.168.0.3 is-at 00:01:6c:31:ec:77
12:03:13.143912 IP 192.168.0.60.2070 > 192.168.0.3.69:  27 RRQ "pxelinux.0" octet tsize 0 
12:03:13.261195 IP 192.168.0.3.34558 > 192.168.0.60.2070: UDP, length 14
12:03:13.261391 IP 192.168.0.60.2070 > 192.168.0.3.34558: UDP, length 17
12:03:13.261601 IP 192.168.0.60.2071 > 192.168.0.3.69:  32 RRQ "pxelinux.0" octet blksize 1456 
12:03:13.284171 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 15
12:03:13.284367 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.309059 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.310453 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.333326 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.334717 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.358162 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.359557 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.381382 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.382769 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.405583 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.406981 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.430128 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.431518 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.453394 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.454782 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.478026 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1460
12:03:13.479420 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.503527 IP 192.168.0.3.34558 > 192.168.0.60.2071: UDP, length 1456
12:03:13.504917 IP 192.168.0.60.2071 > 192.168.0.3.34558: UDP, length 4
12:03:13.517760 IP 192.168.0.60.57089 > 192.168.0.3.69:  63 RRQ "pxelinux.cfg/01-00-d0-b7-4e-31-1b" octet tsize 0 blks
12:03:13.544383 IP 192.168.0.3.34558 > 192.168.0.60.57089: UDP, length 25
12:03:13.544592 IP 192.168.0.60.57089 > 192.168.0.3.34558: UDP, length 4
12:03:13.584227 IP 192.168.0.3.34558 > 192.168.0.60.57089: UDP, length 373
12:03:13.584728 IP 192.168.0.60.57089 > 192.168.0.3.34558: UDP, length 4
12:03:13.584802 IP 192.168.0.60.57090 > 192.168.0.3.69:  38 RRQ "help.txt" octet tsize 0 blksize 1440 
12:03:13.615601 IP 192.168.0.3.34559 > 192.168.0.60.57090: UDP, length 25
12:03:13.615803 IP 192.168.0.60.57090 > 192.168.0.3.34559: UDP, length 4
12:03:13.639528 IP 192.168.0.3.34559 > 192.168.0.60.57090: UDP, length 368
12:03:13.640023 IP 192.168.0.60.57090 > 192.168.0.3.34559: UDP, length 4
12:03:18.259021 arp who-has 192.168.0.60 tell 192.168.0.3
12:03:18.259210 arp reply 192.168.0.60 is-at 00:d0:b7:4e:31:1b
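In a long capture, the interesting lines are the read requests, since they show exactly which files the client asked for. A minimal sketch for pulling them out of tcpdump's text output follows; tftp_requests is a hypothetical helper, and it assumes the output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: list the file names in TFTP read requests (RRQ),
# reading tcpdump's text output on standard input.
tftp_requests() {
    # Keep only lines containing RRQ "name" and print just the name.
    sed -n 's/.* RRQ "\([^"]*\)".*/\1/p'
}
```

Fed the conversation above, it prints pxelinux.0 (twice), pxelinux.cfg/01-00-d0-b7-4e-31-1b and help.txt. If the MAC-based config file were missing or misnamed, further requests for other config names would be expected to show up here.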

/tftpboot/help.txt:

The default is to boot from the local hard drive.

You can enter one of:

        lhd     (default) boot from local hard drive
        linux   (legacy) normal (ie manual) install
        b2      (hmg addition) kickstart install for b2
                ksdevice=eth0
                ks=nfs:192.168.0.3:/NFS/b2

        f3      display the current boot config file

The final contents of /tftpboot:

	ls -lAt /tftpboot/.
total 19328
drwxrwxr-x 3 root   staff      4096 Dec 21  2008 pxelinux.cfg
-rw-rw-r-- 1 root   staff       364 Dec  7  2008 help.txt
lrwxrwxrwx 1 root   staff        33 Dec  7  2008 b2.cfg -> pxelinux.cfg/01-00-d0-b7-4e-31-1b
-rw-r--r-- 1 root   root   17159894 Nov 20  2008 initrd.img
-rwxr-xr-x 1 root   root    2567024 Nov 20  2008 vmlinuz
-rw-r--r-- 1 root   root      13100 Feb  9  2006 pxelinux.0

	ls -lAt /tftpboot/pxelinux.cfg/
total 24
-rw-rw-r-- 1 root   staff   369 Dec  7  2008 01-00-d0-b7-4e-31-1b
-rw-rw-r-- 1 root   staff   123 Nov 26  2008 default

Resources:

http://www.thekelleys.org.uk/dnsmasq/doc.html
http://fedoraproject.org/en/get-fedora
ftp://mirror.optus.net/fedora/linux/releases/10/Fedora/i386/iso/Fedora-10-i386*

References:

http://linux-sxs.org/internet_serving/pxeboot.html
http://www.stanford.edu/~alfw/PXE-Kickstart/PXE-Kickstart-6.html
http://syslinux.zytor.com/wiki/index.php/PXELINUX
http://docs.fedoraproject.org/release-notes/f10preview/en_US/What_is_New_for_Installation_and_Live_Images.html
fedora-install-guide-en_US/sn-automating-installation.html
http://www.linuxdevcenter.com/pub/a/linux/2004/11/04/advanced_kickstart.html




Henry has spent his days working with computers, mostly for computer manufacturers or software developers. His early computer experience includes relics such as punch cards, paper tape and mag tape. It is his darkest secret that he has been paid to do the sorts of things he would have paid money to be allowed to do. Just don't tell any of his employers.

He has used Linux as his personal home desktop since the family got its first PC in 1996. Back then, when the family shared the one PC, it was a dual-boot Windows/Slackware setup. Now that each member has his/her own computer, Henry somehow survives in a purely Linux world.

He lives in a suburb of Melbourne, Australia.


Copyright © 2010, Henry Grebler. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 175 of Linux Gazette, June 2010
