Experience of Using Grub For DOS

J. Cobb 9/8/2006;25/10/2007
Useful Links
Grub:http://www.gnu.org/software/grub/
Grub for DOS:http://freshmeat.net/projects/grub4dos/
Grub for DOS: http://linux.softpedia.com/get/System/Boot/GRUB-for-DOS-3507.shtml
Our Local Code
This document primarily relates to experience with Grub for DOS version 0.4.2; however, some comments on our early experience with version 0.4.3 are given below.

Grub is an open source Operating System loader capable of loading a wide range of systems (not only open systems such as Linux and BSD but also Microsoft Windows). Grub for DOS is an extension of this loader. It allows the Grub loader to be loaded and run from a DOS command prompt but, as importantly, it allows Grub to operate from disk images held in memory, thus providing another route by which discless machines can be supported. This is useful to us at QM where we already have infrastructure booting into DOS on discless machines and where various initial choices/configurations (beyond the scope of the basic grub menu) are made before choosing the appropriate OS. Grub for DOS has many advantages over the loadlin loader that we currently use to boot Linux, not the least being that the size of INITRD initial ram drive is much less restricted.

To start with we had a problem: while Grub for DOS worked fine when a machine was booted from a floppy, it hung when the machine was booted remotely from the corresponding floppy image. This problem was traced to a change that the remote boot ROM code made to location 0x0540+0x6B. Reading the code, it appears that Grub for DOS expects a copy of the interrupt table from 0x0540 up. I don't know what puts this copy there (DOS itself presumably), but it turned out that the remote boot ROM code upset the entry corresponding to interrupt 0x1a (real time clock). Perhaps it hooks the entry before DOS starts, DOS then starts and copies the interrupt table, and later the ROM exits and restores the interrupt; I don't know. I fixed the problem by writing a small program -fixrb- to run before grub that takes the current contents of the int 0x1a vector (which were fine) and stores them in the backup table at 0x0540+0x6B. See Our Local Code for the source code. We have also discovered one other problem, apparently related to some but not all of our network cards, which seems to prevent Grub for DOS from correctly identifying the version of DOS. This has been addressed by implementing an extra parameter --dos7 which forces the choice of OS to be DOS7 (or later). Our local code includes a copy of the dosstart.S code with the parameter code (plus some extra diagnostic features); see the into.htm file for more details.
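
The fixrb repair described above can be sketched as follows. This is a minimal Python simulation, not the fixrb source: it assumes that the backup table at 0x0540 mirrors the layout of the real-mode interrupt vector table (4 bytes per entry, a 16-bit offset followed by a 16-bit segment), which is consistent with int 0x1a occupying bytes 0x68 to 0x6B of the table. The function names and the vector values are hypothetical.

```python
import struct

IVT_BASE = 0x0000       # live real-mode interrupt vector table
BACKUP_BASE = 0x0540    # copy that Grub for DOS appears to read (from the text)
ENTRY_SIZE = 4          # assumed: 16-bit offset followed by 16-bit segment


def entry_address(base, int_no):
    """Address of the 4-byte vector for interrupt int_no in a table at base."""
    return base + ENTRY_SIZE * int_no


def fix_backup_entry(memory, int_no):
    """Copy the live vector for int_no over its entry in the backup table."""
    src = entry_address(IVT_BASE, int_no)
    dst = entry_address(BACKUP_BASE, int_no)
    memory[dst:dst + ENTRY_SIZE] = memory[src:src + ENTRY_SIZE]


# Simulated low memory with a good live vector and a damaged backup entry.
memory = bytearray(0x1000)
live = entry_address(IVT_BASE, 0x1A)
backup = entry_address(BACKUP_BASE, 0x1A)
struct.pack_into("<HH", memory, live, 0xFE6E, 0xF000)    # hypothetical vector
struct.pack_into("<HH", memory, backup, 0xFE6E, 0xC800)  # final byte altered

fix_backup_entry(memory, 0x1A)
print(hex(backup), struct.unpack_from("<HH", memory, backup))
```

Under these assumptions the last byte of the int 0x1a entry is the one at 0x0540+0x6B, the location that the remote boot ROM was seen to change.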

The extra features that we have tried amongst those that Grub for DOS implements are:

Of these it is the first and last that were of real interest to us, though the MD device was quite useful for looking around memory while we were experimenting. (You can use the Grub cat command to dump arbitrary addresses, as in: cat --hex (MD)blocknumber+blocks.) The INT 13h facility was (for us) more of a curiosity. The improved interface for initial commands is useful but is rather limited by the DOS command interface.

To make use of the ram drive feature I wrote a program -xmsel- to load an arbitrary file to an arbitrary address in high memory. The program is not very sophisticated and relies on the extended XMS interface to lock memory and move data into it. It assumes that upper memory is available as one block and that any address supplied will sit inside it (there is no checking). Also, the XMS spec leaves the DOS extender free to move stuff around in real memory once a high memory block has been unlocked, but my code assumes this won't happen. If it becomes a problem it would be possible to create a version of the program that did more checking (even allowed for multiple memory blocks) and exited leaving the memory locks in place. See Our Local Code for the source code.

In selecting a load address for the data one must be very careful to steer clear of any areas of memory likely to be used later by Grub - somewhere near the middle seems about right.

Under Linux we generated a disk image -dskimg- (with the kernel, the initrd and a partition table). (http://www.osdev.org/osfaq2/index.php/Disk Images Under Linux was very useful in doing this.)
Using this, our boot procedure looked something like this:
xmsel -fdskimg -a128
fixrb
<unload dos network drivers>
grub

map --ram-drive=0x81
map --rd-base=0x8000000
map --rd-size=0x400000
root (rd,0)
kernel /kernel root=/dev/ram0 rw ip=bootp ramdisk_size=32768 ...
initrd /initrd
boot
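
The DOS half and the Grub half of this procedure have to agree on where the image sits. The Python sketch below derives the three map commands from a load address and an image size. It assumes that the -a argument of xmsel is a load address in megabytes (128 MB is 0x8000000, which matches the example) and that --rd-size is a byte count; the function name is hypothetical.

```python
MB = 1024 * 1024


def ram_drive_commands(load_mb, image_bytes, drive=0x81):
    """Return the Grub for DOS map commands for an image loaded at load_mb."""
    base = load_mb * MB
    return [
        f"map --ram-drive={drive:#x}",
        f"map --rd-base={base:#x}",
        f"map --rd-size={image_bytes:#x}",
    ]


# Image loaded at 128 MB, 4 MB long, as in the procedure above.
for line in ram_drive_commands(128, 4 * MB):
    print(line)
```
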
Another practical problem is the Grub parameter file. In this context it is mighty inconvenient that the parameter file sits in the file system you are trying to boot from. I can't agree with the author of the code (if indeed it is they) who states in a comment that it is certain that that is where one wants the configuration file. Almost certainly not, I would have thought: if you are trying to say where in memory your disk image is, it is not very helpful if that information is in a file that can only be accessed once the information is already known! What is fairly certain is that the boot loader proper can't be expected to use OS facilities to read files (after all, it is trying to replace the OS!). The author has partly avoided the problem by allowing grub commands to be input on the Grub for DOS command line, introducing a variant of the --config-file parameter which is a list of grub commands rather than a file name. Unfortunately this is seriously limited by the size of the DOS command line (except when operating as a device in config.sys). It would obviously be possible to utilise the mechanism that passes the list of commands to the grub loader to pass the contents of a (smallish) file, but this is not implemented.

To solve the above problem I have implemented an extension to the Grub for DOS loader that accepts a single parameter @filename and then uses the contents of that file as if it were the command line. This allows up to 4K's worth of grub configuration (if I have understood the code right!); anyway, it seems to work (though of course I give no warranty). I looked to see if I could actually implement a proper configuration file on the lines suggested above but, while I could see where the modification would need to go, I was not sufficiently confident of my understanding of the code to feel it was safe.
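
The idea behind the @filename parameter can be sketched as follows. This Python fragment illustrates the approach only and is not the modified dosstart.S: the 4096-byte limit is taken from the estimate above, while the function name and the choice to pass the file contents through unchanged are assumptions.

```python
import os
import tempfile

MAX_COMMAND_BYTES = 4096  # "up to 4K" per the text; the exact limit is unverified


def expand_parameter(param):
    """Return the effective command line for a loader parameter.

    A parameter of the form @filename is replaced by the contents of that
    file; anything else is returned unchanged.
    """
    if not param.startswith("@"):
        return param
    with open(param[1:], "rb") as handle:
        data = handle.read()
    if len(data) > MAX_COMMAND_BYTES:
        raise ValueError("configuration exceeds the command buffer")
    return data.decode("ascii")


# Usage: write a small (hypothetical) parameter file and expand it.
with tempfile.NamedTemporaryFile("wb", suffix=".cfg", delete=False) as tmp:
    tmp.write(b"map --ram-drive=0x81\n")
    path = tmp.name
print(expand_parameter("@" + path))
os.remove(path)
```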

Some Critical points

It seems rather churlish to criticise what has the potential to be a very useful package, but the main flaws are a lack of documentation (to the extent that I had to read the code to work out how some things worked) and a rather poor user interface to the extensions. Most of the extensions are invoked by extensions to Grub's map command but the extensions do not really obey the 'spirit' of the command they have been embedded into. For instance it is entirely strange that in order to define the location of a disk image in memory one should enter three separate commands to do what is conceptually one action.
You must say:
map --ram-drive=0x81
map --rd-base=0x8000000
map --rd-size=0x400000
You cannot say:
map --ram-drive=0x81 --rd-base=0x8000000 --rd-size=0x400000
(The numbers are purely illustrative - obviously they depend on circumstance).

This is scarcely intuitive and there is no warning given in the help text; though it is true that other extensions to the map command do have warnings that they must stand on their own.

The actual function (in grub) of the map command is to map between device numbers, and normally it has two obligatory parameters (the to and from devices). The modifications turn it into a mechanism for creating pseudo memory-based devices and (mostly) don't use the obligatory parameters. It's not clear that defining a drive and mapping one drive to another are the same thing, and I suspect that implementing a separate command would be more intuitive for the user, less confusing with regard to the help text (which is one horrible long splurge), and possibly easier to maintain.

On the documentation front, at least examples of all the extensions would be helpful (the RD extension doesn't seem to have one). Also it should be spelt out that the disk images used must have a partition table (as often you have to take special action to ensure that there is one). Unfortunately we found that our ram drive software (XMSDSK) didn't bother. Also there needs to be some advice about where in memory it is safe to load images. After all, if one is loading Linux, grub is going to load the kernel (and initrd) into memory somewhere: perhaps over your image. In practice Grub will also use some memory as work space. As a matter of fact Grub for DOS itself uses memory at 2MB to keep a copy of low memory so that it can restore DOS if it needs to. From my cursory reading of the code grub seems to use memory above 1Mb as follows.

<1Mb>
kernel image
initrd <temporary copy>
<gap if you are lucky>
initrd <final resting place>
<top of memory>
but I may easily have missed something. The final location of initrd is also dependent on the Linux version and on entries in the image header (which can determine the maximum useable address) as well as user parameters.
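
In the absence of such advice, one can at least check a candidate load address against the regions one knows about. The following Python sketch tests half-open address ranges for overlap. Only the 2MB location of the saved low-memory copy comes from the text above; the sizes given to the regions and the function names are hypothetical.

```python
MB = 1024 * 1024


def overlaps(start_a, size_a, start_b, size_b):
    """True if the ranges [start, start + size) share at least one byte."""
    return start_a < start_b + size_b and start_b < start_a + size_a


def conflicts(image_start, image_size, reserved):
    """Names of the reserved regions that the image would collide with."""
    return [name for name, (start, size) in reserved.items()
            if overlaps(image_start, image_size, start, size)]


reserved = {
    "kernel image": (1 * MB, 1 * MB),        # hypothetical size
    "saved low memory": (2 * MB, 1 * MB),    # at 2MB per the text; size assumed
}

print(conflicts(0x8000000, 0x400000, reserved))  # image loaded at 128 MB
print(conflicts(0x200000, 0x400000, reserved))   # image loaded at 2 MB
```

A real check would also need the two initrd regions, whose positions depend on the kernel version and the image header as noted above.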

Another problem with the examples is that they assume too much knowledge for a novice (for instance familiarity with the grub loader). We were new to Grub and it took us quite a while to realise that the example using the cat command to dump memory required a block number, not a memory address. To a grub user no doubt it would be obvious that the syntax used was a standard way of specifying a block range.
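
For readers in the same position, the conversion from a memory address to a block range is simple. The Python sketch below assumes that blocks on the (MD) device are 512-byte sectors, as on an ordinary disk; this should be checked against the version of Grub for DOS in use. The function name is hypothetical.

```python
BLOCK_SIZE = 512  # assumed sector size for the (MD) memory device


def md_cat_command(address, length):
    """Build a cat command that dumps the blocks covering the given bytes."""
    first = address // BLOCK_SIZE
    last = (address + length - 1) // BLOCK_SIZE
    return f"cat --hex (MD){first}+{last - first + 1}"


# Dump the first 512 bytes of an image loaded at 128 MB (0x8000000).
print(md_cat_command(0x8000000, 512))
```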

As an individual capable of reading C and assembler code but not necessarily particularly experienced in Linux, or Gnu C (and its assembler), or the ways of open source development, I would quite like to see some sort of overview of what was being attempted and what changes had been made. Of course one can read the differences and so on, but it's jolly hard work to deduce backwards from the modifications what is actually intended. Also it is all very well having a chronological sequence of changes (I've done this, I've done that) but a chronological sequence is not necessarily a logical sequence.

As far as I can see the modifications fall primarily into four areas:

  1. code (essentially a separate program) that creates the right conditions for grub to execute and that loads grub to its intended start address (by copying it in memory). This code also implements an alternative method of passing initial grub commands and attempts to save the state of DOS and recover it afterwards if grub exits. This code is compiled as a separate program and the rest of grub is appended to the end of it at a later stage of the make process. (module: dosstart.S)
  2. an int 13h handler to provide interfaces between bootloaders and the pseudo devices. (module: asm.s)
  3. modifications to the io library to handle memory backed 'devices'. (module: bios.c, routines: biosdisc, get_discinfo)
  4. modifications to the map builtin command to provide the user interface (module: builtins.c)

The dosstart.S module operates in four main phases:
  1. parameter analysis
  2. determination of OS (dos7, dosbox, dos3.3 etc)
  3. reinstatement of 'boot' conditions
  4. loading grub

Grub for DOS 0.4.3

The good news for 0.4.3 is that an equivalent facility to my configuration file patch has been included. The bad news is that I cannot make the code work on our system at all. It complains about interrupt 9, which didn't seem to be an issue with 0.4.2. Worse, attempts to change the code seem to produce instabilities that I can't understand. (For instance, I inserted a block of code after the dos_start1: label whose mere presence (not execution, because I inserted a jump around it) caused some later code to hang while it was probing interrupt 74.) It is hard to see how the size of the code in this part of the program could have such an effect.

Inspection of the code shows that phases 2 and 3 have been merged together and an entirely different technique adopted, implemented by the 'probe_int' and 'restore_vector' subroutines. Probe_int essentially just loops round interrupts from 0-0x80 looking at where they are in memory, possibly calling restore_vector, and building up a modified version of the vector table. There does not seem to be a clear rationale behind which interrupts are probed (some are clearly skipped because probing them has caused problems). The main obstacle to understanding this code is getting some idea of what the restore_vector routine is actually doing; unfortunately there are few clues.

Clearly I do not understand what is happening, but I strongly suspect that in attempting to protect against TSRs hooking interrupts and causing difficulties in the boot process the author has introduced a completely different set of vulnerabilities.

Experience with PXE remote boot

We are in the process of switching away from Novell's remote boot to PXE. This seems not to have the problem with interrupt 0x1a but on some machines (but not all) does have a problem with two other interrupts: 0x13 (disc) and 0x15 (a rag bag). The problem only occurs when trying to use the disc mapping facility to boot from a second disc, and it only occurs for some varieties of hardware (presumably it is BIOS/PXE version dependent). The only fix I could find was to boot the machine with a floppy, find the values in the table at 0x540 (addresses 0x58c, 0x594) using debug.exe, then write a program to patch these values back in before we called Grub for DOS. This seemed to work.
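
The two addresses are consistent with a table of 4-byte entries indexed by interrupt number, which the short Python check below confirms. The entry size is an assumption carried over from the layout of the real-mode interrupt vector table, and the function name is hypothetical.

```python
TABLE_BASE = 0x540  # start of the table, from the text
ENTRY_SIZE = 4      # assumed: offset and segment, 2 bytes each


def table_entry(int_no):
    """Address of the table entry for the given interrupt number."""
    return TABLE_BASE + ENTRY_SIZE * int_no


# Entries for the disc (0x13) and miscellaneous services (0x15) interrupts.
for int_no in (0x13, 0x15):
    print(f"int {int_no:#04x} -> {table_entry(int_no):#x}")
```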