The DJGPP Project

Note from DJ: This is a copy of a presentation Eli Zaretskii gave during a trip to Japan.

This file describes the DJGPP project, its goals, current status, and future perspectives.

Introduction: What is this all about?
History: Why and how was DJGPP developed?
Extending DOS: How to run protected-mode programs on top of DOS?
Features: Important features of DJGPP.
I18N: Internationalization aspects of DJGPP.
Outlook: What's been achieved and what's in the future.
Index

Node:Introduction, Next:History, Previous:Top, Up:Top

Introduction

DJGPP, an acronym for DJ's GNU Programming Platform¹, is a project which brings the GNU development tools to MS-DOS and MS-Windows systems. Its originator and principal maintainer is DJ Delorie; that's where the "DJ" in DJGPP comes from.

DJGPP is about Free Software. The ported GNU packages are, of course, free; however, the library and utilities developed specifically for DJGPP are also distributed under the GNU license. Since DJGPP supports platforms which have such a huge installed base, and since it is highly popular among MS-DOS/MS-Windows users, the project is a very important member of the Free Software movement. Significantly, a large proportion of DJGPP users are young programmers at the very beginning of their careers. Teaching those young people about the importance of free software and free sharing of ideas is in itself a worthy goal. DJGPP is in a unique position to perform this important community service because it usually is the first serious compiler used by young programmers.

But DJGPP is also about fun. It is fun to port industry-strength applications to MS-DOS and have them running seamlessly on top of a 16-bit "toy operating system". It is fun to see how these applications change the way your system looks and feels, to a point that you can almost think it is a Unix box. It is fun to have all the source code, down to the darkest corners of the library internals, free for reading and hacking. It is fun to be able to find and fix bugs no matter whether they occurred in the application code, in the library, or in the compiler. And it is fun to discuss all these matters with other users and developers all over the world, and to join forces to make the free software better and more powerful. More about this later.

This article presents an overview of the DJGPP project. Section 1 briefly tells the history of the project development. Section 2 explains how protected-mode DJGPP programs manage to run on top of MS-DOS even though MS-DOS and protected mode are incompatible. Section 3 describes several important features that DJGPP brings to MS-DOS and MS-Windows. Internationalization (a.k.a. I18N) aspects specific to DJGPP are discussed in Section 4. Finally, Section 5 summarizes the achievements of 10 years of DJGPP development, and attempts to predict its future.

Node:History, Next:Extending DOS, Previous:Introduction, Up:Top

The History of DJGPP

"In the beginning was the Word...", says the Bible.

Like every other human endeavor, DJGPP also started with a Word. And, as happens with almost everything else in the free software world, that word belonged to Richard Stallman. Here's how DJ Delorie himself describes the genesis of DJGPP²:

DJGPP was born around 1989 [...], when Richard Stallman spoke at a meeting of the Northern New England Unix Users Group (NNEUUG) at Data General, where I then worked. I asked if the FSF ever planned on porting gcc to MS-DOS [...], and he said it couldn't be done because gcc was too big and MS-DOS was a 16-bit operating system. Challenge in hand, I began.

Consequently, we should consider Richard Stallman a progenitor of DJGPP, or at least its godfather. Had it not been for his scepticism, it's possible that DJGPP would not have existed...

The first version of GCC ported by DJ was 1.35. It was compiled on a 386 machine running ISC Unix and linked with a hacked libc.a taken from that machine, which had DOS-compatible replacements for system calls such as open, read, stat, etc. The result was converted to a DOS executable format with a custom program written by DJ. Thus the first version of DJGPP, originally called djgcc, was born. It required Phar Lap's DOS Extender to run protected-mode code on top of real-mode DOS. See DJGPP Programs and MS-DOS, for more about the DOS extender's role.

To compile itself, gcc needs lots of memory, which PCs didn't have at that time. Since the DOS extender used to run gcc didn't support virtual memory, DJ wrote his own DOS extender called go32. GCC version 1.37 was the first version built on a DOS platform using go32.

Next came the library. The first version was based on the BSD library whose sources were free'd at that time, and augmented with many custom DOS-specific functions that interfaced with the OS. The header files were based on those distributed with gcc.

The name was changed from DJGCC to DJGPP when C++ support was added. Initially, the name stood for DJ's G++, with the + characters replaced by ps because DOS doesn't allow + in file names. However, since the C++ compiler is integral to gcc distribution, DJGPP now probably stands for something like DJ's GNU Programming Platform³.

DJGPP version 1.05 was the first one available commercially, and it was a big success. Version 1.11 supported all DOS configurations, had somewhat limited support for running on MS-Windows (e.g., graphics and floating-point emulation didn't work), and appeared on the GNU Compiler Binaries CD-ROM.

This was in 1992, and around that time I myself began using DJGPP. I had just bought a brand-new 486-DX/33 and got an email account, and DJGPP v1.11m5 was the first version I downloaded and installed. Exposed to Unix-style tools by the excellent book Software Tools by B.W. Kernighan and P.J. Plauger, I had for years been porting GNU programs to MS-DOS using 16-bit proprietary compilers. I became tired of dealing with missing headers, like <unistd.h>, missing functions, like popen and alloca, and missing functionality, like long command lines in a Makefile. DJGPP solved all these and many other problems, and I became instantly hooked.

By the end of 1994 DJGPP had become so popular, and the traffic on its mailing list so intense, that a FAQ list was sorely needed; the first version of the FAQ was released in February 1995. Today, in its 7th edition, the DJGPP FAQ list includes answers to 200 questions, its Texinfo source totals 540K bytes, and its printed version is more than 200 pages long.

DJGPP v1.x could not bootstrap itself: it required Borland's compiler to build the go32 extender. Cygnus, a big user of DJGPP for their DOS-based products, requested a self-bootstrapping version, so DJGPP v2 was born. Version 2 moves some parts of go32 into the C library, other parts into a stub loader produced by a special-purpose assembler capable of producing 16-bit code, and it relies on DPMI services to run on top of DOS; more about this in the next section.

Meanwhile, in response to the growing interest and user base, a news group dedicated to DJGPP, <comp.os.msdos.djgpp>, was created in June 1995. Nowadays, the traffic on the news group averages about 70 messages per day.

Version 2.0 of DJGPP was shipped in February 1996, after more than two years of development and testing. The v2 library is Posix-compliant, the only library that offers Posix compliance on MS-DOS, and one of the two available for MS-Windows. It also introduced transparent and automatic support of long file names on Windows 9X.

Version 2.01 was released in October 1996. The GNU Software for MS-Windows and MS-DOS CD-ROM, based on DJGPP v2.01 ports of many GNU packages, was released in the last quarter of 1998, and its first edition out-sold all other GNU CD-ROMs.

The latest version 2.02 of DJGPP was released in December 1998.

Node:Extending DOS, Next:Features, Previous:History, Up:Top

DJGPP Programs and MS-DOS

GCC generates 32-bit code, so DJGPP programs are 32-bit programs. GCC also doesn't know anything about the segmented architecture of the x86 processors, so its code effectively requires the data, stack, and code segments to stay constant during program execution. However, real-mode segments of x86 CPUs are only 64KB long. Therefore, to be able to compile large programs, like GCC itself, DJGPP must run in protected mode. This section describes the tricks pulled by DJGPP to make this possible.

Protected Mode and DOS: Why can't they live together.
DOS Extender: How the extender solves it.
v1 and go32: The extender used by DJGPP v1.x.
v2 and DPMI: DPMI is used by DJGPP v2.x.
Startup: What the startup code does in v2.x.
Library: Low-level v2.x library functionality.

Node:Protected Mode and DOS, Next:DOS Extender, Previous:Extending DOS, Up:Extending DOS

DOS Cannot Run in Protected Mode

Switching the CPU into protected mode is easy, but you cannot call DOS and BIOS services while the CPU is in protected mode. Why? Because DOS and BIOS code was written for execution in real mode, and so it constantly violates the rules of protected-mode programming. For example, DOS code loads many different values into segment registers, to overcome the 64KB limitation of a real-mode segment. But in protected mode, a segment register can only be loaded with a value that corresponds to one of the existing selectors; any other value causes a General Protection Fault (GPF for short).

So, if a program switches the CPU into protected mode and then calls DOS, e.g. to print a message, it will immediately crash the system. You can't write even the simplest Hello World program without hitting this brick wall!

It gets worse. DOS and BIOS code needs to be run even if the application program doesn't call any of their services. For example, 18 times a second there's a timer tick, a hardware interrupt issued by the timer chip that's supposed to advance the system clock. But the handler for the timer tick interrupt is part of BIOS, and it employs real-mode code.

So even if a program does nothing to call any real-mode code, some asynchronous system events will do that anyway, and the machine will still crash very promptly. Can the conflict between DOS/BIOS and the protected mode be solved? Yes; read on.

Node:DOS Extender, Next:v1 and go32, Previous:Protected Mode and DOS, Up:Extending DOS

DOS Extender Allows DOS and Protected Mode to Co-exist

The solution to this conflict, if you don't want to write a protected-mode operating system which replaces DOS and BIOS completely⁴, is to add a layer of software between your program and DOS/BIOS code that would switch the CPU from protected to real mode and back, as appropriate. This software layer is called a DOS extender.

With a DOS extender, when a protected-mode program calls a real-mode service, the extender traps the call, switches the CPU to real mode, reissues the call, waits for the service to do its thing, then switches the CPU back into protected mode, and returns to the application code that called the real-mode service. Hardware interrupts, such as the timer tick and the keyboard interrupt, are also trapped by the extender, and also cause a switch to real mode and back.

You might think that these mode switches would considerably slow down the application. However, in practice, most programs don't call the OS services too often, and even when they do, the peripheral devices accessed by most of these services, such as the hard disk, are so much slower than modern CPUs, that the overhead of the mode switch is hardly ever noticed.

Node:v1 and go32, Next:v2 and DPMI, Previous:DOS Extender, Up:Extending DOS

DJGPP v1.x Setup with the `go32` Extender

In DJGPP v1.x, go32 was such a DOS extender. It was loaded automatically by every program during its startup. In addition to the usual functions performed by DOS extenders, it also handled some unique DJGPP-related tasks:

Loading the application and setting it up for execution.
Since DJGPP executables use COFF format, which DOS doesn't understand, go32 was responsible for reading the COFF header and setting up the code, data, and other segments as recorded in the header.
Unix-style command-line expansion.
This is required to overcome deficiencies in stock DOS shells which prevent even the simple task of compiling GCC without extensive hacking of its Makefiles. See Features Provided by DJGPP, for more details about this.
Floating-point emulation in protected mode.
FP emulation needs special handling in protected mode. DJGPP supplies an FP emulator which go32 would load and set up.
Graphics support.
To facilitate graphics programs, go32 could load a driver suitable for the installed video hardware, and worked with the VGA bank switching features to create an illusion of a linear video memory.

Using an extender had an important advantage of being able to run on any DOS configuration, since go32 had special code to adapt itself to all known methods of switching into protected mode and managing extended memory. But it did have a significant drawback as well: the extender was loaded into conventional memory and each instance used about 130KB of that memory. Since most DOS systems had about 500 to 600 KBytes of free conventional memory, this means you couldn't have more than 3-4 nested levels of DJGPP programs. This was a grave limitation: for example, you couldn't build programs whose Makefiles required more than 2 recursive levels of make invocation (because GCC and the compiler passes it invokes require 2 additional levels of program nesting). DJGPP v2 solves this problem, as described below.

Node:v2 and DPMI, Next:Startup, Previous:v1 and go32, Up:Extending DOS

DJGPP v2.x Setup with the DPMI services

DJGPP v2.x gets rid of the extender, and instead requires DPMI services to run. DPMI, an acronym for DOS Protected-Mode Interface, is a special API that allows protected-mode programs to run on top of DOS. It defines several functions that a protected-mode program (called a DPMI client) can use to perform such tasks as entering protected mode, allocating memory and segment descriptors, calling real-mode services, hooking interrupts, etc. Many modern operating systems for Intel CPUs include the DPMI services; all versions of MS-Windows, OS/2, and the Linux DOS emulator are notable examples. There are also several proprietary DPMI servers for DOS, usually bundled with DOS memory managers such as QEMM and 386MAX; and FreeDOS includes a DPMI server as part of the default setup. For those systems which don't have a DPMI server, DJGPP v2.x comes with a free server called CWSDPMI; not surprisingly, CWSDPMI reuses a lot of code from go32. The DJGPP startup code checks for DPMI services, and if they aren't available, automatically looks for and loads cwsdpmi.exe, the CWSDPMI server.

The DPMI server (a.k.a. the DPMI host) solves most of the problems of running a protected-mode program on top of real-mode DOS. The rest of the functionality, which in v1.x was the responsibility of go32, is handled in v2.x by the DJGPP startup code and low-level library functions. Let me now briefly describe these two aspects of DJGPP operation.

Node:Startup, Next:Library, Previous:v2 and DPMI, Up:Extending DOS

DJGPP v2.x Startup Code

The DJGPP v2.x startup code includes two parts: the stub loader and the library startup module. The former is a single assembly-language module which is compiled by a special-purpose assembler, called djasm, that is capable of producing 16-bit DOS executables. This stub loader is prepended to every DJGPP program during linking, and is the only part that DOS understands; all the rest--the COFF executable--is just some weird data, as far as DOS is concerned.

The second part of the startup is in the library. It consists of several modules written partly in C and partly in assembly. This is where the COFF image entry point is, and where the stub passes control after it loads the program and sets it up.

Here's a short description of what the stub does:

Allocate memory for the transfer buffer.
This buffer is required for passing data to and from real-mode services. Its role is described in Library Interface with DOS and BIOS, below.
Check whether a DPMI server is already loaded.
DPMI services would already be available if either (1) a resident DPMI server, such as the one built into MS-Windows, is installed; or (2) this is a nested DJGPP program, and its parent already loaded CWSDPMI.
If DPMI services are not available, the stub loads cwsdpmi.exe⁵. It looks for cwsdpmi.exe in the same directory where this program's executable is kept, and inside directories listed by the PATH environment variable.
Read the COFF executable header into memory.
This is required to know how much memory needs to be allocated for the various sections of the DJGPP program.
Switch the CPU into protected mode, by calling the entry point provided by the DPMI host.
Note that the rest of the stub runs in protected mode.
Allocate memory for the program's code and data segments.
This is done by calling the DPMI functions to allocate segment descriptors and memory for code and data, and set their base address, limit, and privileges.
Read the COFF executable into memory.
The code, data, and BSS sections are read into the memory allocated above, by calling DOS via the DPMI service that allows protected-mode programs to call real-mode functions.
Jump to the COFF image entry point.
This entry point is inside the library startup module, described next.
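
The "read the COFF executable header" step can be illustrated with a small, portable sketch. It assumes the classic 20-byte little-endian COFF file header and the i386 magic number 0x014c; the structure and function names (coff_file_header, parse_coff_header) are invented for this illustration and are not the names used in the real stub, which is written in assembly.

```c
#include <stddef.h>
#include <stdint.h>

#define COFF_FILE_HEADER_SIZE 20      /* size of the classic COFF file header */
#define COFF_I386_MAGIC       0x014c  /* magic number of i386 COFF images */

/* Hypothetical in-memory form of the file header (names are illustrative). */
struct coff_file_header {
    uint16_t magic;            /* target machine identifier */
    uint16_t num_sections;     /* number of section headers that follow */
    uint32_t timestamp;        /* creation time */
    uint32_t symtab_offset;    /* file offset of the symbol table */
    uint32_t num_symbols;      /* number of symbol table entries */
    uint16_t opt_header_size;  /* size of the optional (a.out) header */
    uint16_t flags;            /* image flags */
};

static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header from raw bytes.  Returns 0 on success, or -1 if the
   buffer is too short or does not start with the i386 COFF magic. */
int parse_coff_header(const unsigned char *buf, size_t len,
                      struct coff_file_header *out)
{
    if (len < COFF_FILE_HEADER_SIZE)
        return -1;
    out->magic           = read_le16(buf);
    out->num_sections    = read_le16(buf + 2);
    out->timestamp       = read_le32(buf + 4);
    out->symtab_offset   = read_le32(buf + 8);
    out->num_symbols     = read_le32(buf + 12);
    out->opt_header_size = read_le16(buf + 16);
    out->flags           = read_le16(buf + 18);
    return out->magic == COFF_I386_MAGIC ? 0 : -1;
}
```

A loader would go on to read the optional header and the section headers that follow, which is where the sizes of the code, data, and BSS sections are recorded.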

Here's what the library startup code does:

Make the null page uncommitted.
This causes a frequent programming error known as the NULL pointer dereference to trigger an exception, and the offending program gets the SIGSEGV signal. The DPMI function required for this is not part of the basic DPMI 0.9 spec, and is unsupported by Windows and many other proprietary DPMI servers; but CWSDPMI does support it.
Set up the sbrk memory-allocation mechanism.
This might sound simple, but is actually quite complicated, due to some peculiarities of DPMI memory allocation. For example, it requires a special 16-bit code that runs in real mode to be loaded into a buffer of conventional memory.
Set up the run-time stack for the program.
The stack size of DJGPP programs is 512KB by default, but it can be changed both by the application and using the stubedit program.
Allocate a selector for accessing conventional memory.
Many DOS programs need to access conventional memory, either to pass data to and from DOS/BIOS functions, or to access memory-mapped devices such as the video memory of the graphics adapter. Since the conventional memory is by default not mapped into the program's data segment, a special selector, known as _dos_ds, is provided for these purposes.
Set up the signal handling.
This requires hooking some hardware interrupts, e.g. to generate SIGINT when Ctrl-C is pressed, or to generate SIGPROF on a timer tick.
Copy the program environment into the environ[] array.
Read a special file which defines additional DJGPP-specific environment variables.
Get and expand the command-line arguments.
This includes getting long command lines from the parent DJGPP program and Unix-style expansion of file-name wildcards. See Long command lines, and also see Unix-style file-name globbing, for more details.
Set up the x87 FPU and load the FP emulator, if needed.
Call the static constructors.
Call the application's main function.

Node:Library, Previous:Startup, Up:Extending DOS

Library Interface with DOS and BIOS

Since DJGPP programs use DOS and BIOS for system calls, many library functions need to actually issue various real-mode DOS/BIOS calls. I already described above how this is done in principle: by calling a special DPMI service provided for that.

However, many real-mode services require some data to be passed. For example, when you write the contents of a buffer to a file, the corresponding DOS function requires a pointer to the buffer to be put into the DS:DX pair of registers. Moreover, the buffer whose pointer is passed to DOS must reside in the first Megabyte of the address space, because real-mode addresses use only 20 bits. In contrast, protected-mode programs use the full 32 bits for addressing, and all the data is always above the 1MB mark⁶. Now, how do we pass such addresses to DOS?

This is where the so-called transfer buffer comes to our help. As we saw, this buffer is allocated in conventional memory during the program startup. The buffer is 16KB long by default, but its size can be changed to any value between 2KB and 64KB using the stubedit program. Every library function that needs to pass data to, or retrieve data from, DOS/BIOS, needs to move that data between the transfer buffer and the protected-mode memory. For example, to write a buffer to a file, the contents of that buffer are copied to the transfer buffer, and the real-mode segment:offset-style address of the transfer buffer is passed to DOS; to read data from a file, the address of the transfer buffer is passed to DOS, and the data put there by DOS is then copied from the transfer buffer to the buffer in protected-mode memory whose address was passed by the calling application.

The startup code stores the real-mode address of the transfer buffer and its size in global variables, which are used by the library functions to move data to and from the transfer buffer. The library also provides special functions to move the data between protected-mode memory and the transfer buffer as fast as possible, and thus to make this overhead smaller.
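
To make the addressing constraint concrete, here is a minimal sketch of the real-mode arithmetic: a segment:offset pair denotes the linear address segment * 16 + offset, so only addresses below 1MB can be expressed in the form DOS expects. The type and function names (real_mode_addr, real_to_linear, linear_to_real) are illustrative assumptions, not part of the DJGPP library.

```c
#include <stdint.h>

#define REAL_MODE_LIMIT 0x100000UL  /* 1MB: the top of the 20-bit address space */

/* A real-mode address as DOS expects it, e.g. in the DS:DX register pair. */
struct real_mode_addr {
    uint16_t segment;
    uint16_t offset;
};

/* Real mode forms a linear address as segment * 16 + offset. */
uint32_t real_to_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Split a linear address into a normalized segment:offset pair, with the
   offset in the range 0..15.  Returns 0 on success, or -1 if the address
   lies at or above 1MB and so cannot be expressed this way. */
int linear_to_real(uint32_t linear, struct real_mode_addr *out)
{
    if (linear >= REAL_MODE_LIMIT)
        return -1;
    out->segment = (uint16_t)(linear >> 4);
    out->offset  = (uint16_t)(linear & 0x0F);
    return 0;
}
```

For instance, a transfer buffer at linear address 0x12345 would be passed to DOS as 1234:0005, whereas a buffer at 0x100000 or above has no such representation, which is exactly why the data must first be copied below the 1MB mark.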

As long as the application calls relatively high-level library functions, such as open, read, write, stat etc., all of the special processing just described is done automatically and transparently by the library; the application doesn't need to know anything about the transfer buffer and data copying that goes on under the hood.

Library functions also provide other specialized processing in some cases. For example, DOS cannot read or write more than 64K bytes in one call, so the library breaks large requests into smaller chunks, each one the size of the transfer buffer, and feeds them to DOS one by one. As another example, consider memory-allocation functions such as malloc. Instead of allocating blocks off the conventional memory by calling DOS, like real-mode programs do, DJGPP issues DPMI calls to allocate extended memory and provide demand-paged virtual memory, so that all of the available memory and swap space can be used by the application via standard function calls.
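
The chunking described above can be sketched in portable C. This is a simulation under stated assumptions, not the library's code: the transfer buffer is an ordinary static array, the real-mode DOS call is replaced by a callback, and all names (chunked_write, dos_write_fn, mem_sink) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TB_SIZE 16384  /* default transfer-buffer size mentioned in the text */

/* Stand-in for the transfer buffer that lives in conventional memory. */
static unsigned char transfer_buffer[TB_SIZE];

/* Stand-in for the real-mode DOS write call: consumes up to count bytes
   from the transfer buffer and returns how many it actually wrote. */
typedef size_t (*dos_write_fn)(const unsigned char *tb, size_t count, void *ctx);

/* Write len bytes by copying them through the transfer buffer, one chunk
   per simulated DOS call.  Returns the number of bytes written; if calls
   is not NULL, it receives the number of DOS calls that were issued. */
size_t chunked_write(const void *data, size_t len,
                     dos_write_fn dos_write, void *ctx, size_t *calls)
{
    const unsigned char *src = data;
    size_t done = 0, n_calls = 0;

    while (done < len) {
        size_t chunk = len - done;
        size_t written;

        if (chunk > TB_SIZE)
            chunk = TB_SIZE;
        memcpy(transfer_buffer, src + done, chunk);  /* protected mode -> buffer */
        written = dos_write(transfer_buffer, chunk, ctx);
        n_calls++;
        done += written;
        if (written < chunk)  /* short write, e.g. the disk is full */
            break;
    }
    if (calls != NULL)
        *calls = n_calls;
    return done;
}

/* A mock "DOS" that appends to a memory area of limited capacity. */
struct mem_sink {
    unsigned char *data;
    size_t capacity;
    size_t used;
};

size_t mem_sink_write(const unsigned char *tb, size_t count, void *ctx)
{
    struct mem_sink *sink = ctx;
    size_t room = sink->capacity - sink->used;

    if (count > room)
        count = room;
    memcpy(sink->data + sink->used, tb, count);
    sink->used += count;
    return count;
}
```

With the default 16KB buffer, a 40000-byte request turns into three simulated DOS calls; the caller sees a single write.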

Node:Features, Next:I18N, Previous:Extending DOS, Up:Top

Features provided by DJGPP

This section describes some advanced features provided by DJGPP. Most of these features are built into the C library, but some are provided by the basic development utilities which are part of the DJGPP development environment. Since DJGPP is a Posix-compliant environment, many of these features are motivated by Unix compatibility.

Compatible headers and libraries.
The DJGPP header files and library functions are highly compatible with other popular environments. In addition to full ANSI and Posix compliance, DJGPP also offers compatibility with many PC and Unix libraries. For example, DJGPP provides library functions that are usually absent from other DOS- and Windows-based libraries, like popen, glob, statfs, getmntent, getpwnam, select, and ftw. Other functions, although they exist in DOS/Windows libraries, are incompatible with Posix in subtle ways. For example, the ANSI-standard function rename typically fails in DOS/Windows implementations if the target file already exists (because the underlying OS call fails). DJGPP makes a point of sticking to Posix or Unix behavior in such cases, even if it means more processing (like removing the target file in the case of rename).
A case in point is the library functions stat and fstat. Unix programs make extensive use of the inode number and the mode bits returned by these functions. For example, GNU diff examines the inode numbers of the files it is about to compare, and if they are equal, exits immediately on the assumption that both file names point to the same file. However, DOS and Windows don't support inodes, and most other DOS/Windows implementations return zero in the st_ino member of struct stat, which of course breaks diff. Also, the mode bits returned by fstat are usually incorrect. In contrast, the DJGPP implementation of these functions goes out of its way to be compatible, and in particular returns meaningful inode numbers⁷, even though it takes quite a lot of code (for example, the compiled stat code totals about 17KB, together with other library functions it calls).
Such high compatibility makes porting programs very easy.
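The kind of check that depends on meaningful inode numbers can be sketched as follows. This is not GNU diff's actual code, only a minimal illustration of the idiom using the standard stat call; the function name same_file is an assumption made for this example.
```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return 1 if both names refer to the same file, 0 if they refer to
   different files, and -1 if either name cannot be examined. */
int same_file(const char *name_a, const char *name_b)
{
    struct stat sa, sb;

    if (stat(name_a, &sa) != 0 || stat(name_b, &sb) != 0)
        return -1;
    /* Two names denote one file when both the device and the inode match. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```
If a library returned zero for every inode number, this function would report any two files on the same drive as identical, which is the failure mode described above.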
Long command lines.
When DOS invokes programs, it limits the length of the command line to 126 characters (excluding the program's name). This is a ridiculously small limit; it doesn't even allow compiling GCC, since many commands in GCC Makefiles are much longer.
Therefore, DJGPP provides a mechanism to pass long command lines to child programs. The actual command is stored in the transfer buffer, and a pointer to that buffer is passed to the child program instead of the command line itself. The startup code of the child program then retrieves the actual command-line arguments and puts them into the argv[] array passed to main.
DJGPP also supports the so-called response file method of passing long command lines, whereby the command line is stored on a disk file, and the name of that file is passed as @response-file. For example:
```
 ar cq libmylib.a @files-list
```
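A rough sketch of how a startup routine might expand such an argument is shown below. It is deliberately simplified: arguments in the file are split on whitespace only, with no quoting support, and an unreadable file leaves the argument unchanged. Both of these are choices made for this sketch rather than a description of the DJGPP startup code, and the function names are hypothetical.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of s, or NULL if memory runs out. */
static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Read whitespace-separated arguments from the file called path into
   out[], storing at most max_args of them.  Returns the number of
   arguments stored, or -1 if the file cannot be opened. */
int read_response_file(const char *path, char **out, int max_args)
{
    FILE *fp = fopen(path, "r");
    char word[260];
    int n = 0;

    if (fp == NULL)
        return -1;
    while (n < max_args && fscanf(fp, "%259s", word) == 1) {
        out[n] = copy_string(word);
        if (out[n] == NULL)
            break;
        n++;
    }
    fclose(fp);
    return n;
}

/* Copy argv[] into out[], replacing every "@file" argument by the
   arguments found in that file.  Returns the new argument count, or -1
   if memory runs out.  The caller owns the strings stored in out[]. */
int expand_args(int argc, char **argv, char **out, int max_args)
{
    int i, n = 0;

    for (i = 0; i < argc && n < max_args; i++) {
        if (argv[i][0] == '@' && argv[i][1] != '\0') {
            int got = read_response_file(argv[i] + 1, out + n, max_args - n);

            if (got >= 0) {
                n += got;
                continue;
            }
            /* Unreadable file: fall through and keep the argument as is. */
        }
        out[n] = copy_string(argv[i]);
        if (out[n] == NULL)
            return -1;
        n++;
    }
    return n;
}
```
Because the expansion happens before main runs, the program itself never needs to know that its arguments came from a file.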
Unix-style file-name globbing.
All Unix programs assume that any file-name wildcards on their command line were already expanded by the shell, to yield normal file names. But DOS shells don't provide this functionality, so the wildcards would wind up verbatim in the argv[] array. To avoid the need to have special code in every ported program that expands the wildcards, the DJGPP startup code expands the wildcards automatically. The expansion follows the Unix conventions, so * expands to all file names, unlike the DOS conventions where it excludes file names with extensions.
The globbing code supports Unix-style quoting with the ' and " characters (most other DOS/Windows compilers and shells only support "). Escaping special characters with \ is limited to the quote characters themselves, since \ serves as a directory separator in DOS/Windows file names.
DJGPP also provides a special extension: the ... wildcard expands recursively to all the subdirectories. Thus, the following command would search all files in all the subdirectories, recursively:
```
 grep foo .../*
```
(This was hard to achieve even on Unix, until the recent release of the GNU Grep package introduced the --recursive option.)
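The Unix convention that the startup code follows can be demonstrated with the standard Posix glob function, which the text lists among the functions DJGPP provides. The sketch below only counts matches; the helper name count_matches is invented for this example, and the DJGPP-specific ... wildcard is not exercised here.
```c
#include <glob.h>
#include <stddef.h>

/* Return the number of file names matching a Unix-style wildcard
   pattern, 0 if nothing matches, or -1 on any other error. */
int count_matches(const char *pattern)
{
    glob_t results;
    int rc = glob(pattern, 0, NULL, &results);
    int count;

    if (rc == 0)
        count = (int)results.gl_pathc;
    else if (rc == GLOB_NOMATCH)
        count = 0;
    else
        count = -1;
    globfree(&results);
    return count;
}
```
In a directory holding README, main.c, and util.c, the pattern * matches all three names under the Unix rules, with or without an extension, while *.c matches two.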
Extending the shell via the system function.
Traditionally, the system library function calls the shell to process its argument. However, stock DOS shell COMMAND.COM is too dumb to be useful in many cases. For example, it doesn't support long command lines, even though DJGPP programs do; it doesn't understand forward slashes in file names; and it doesn't return the exit code of the child program to the parent.
Therefore, the DJGPP version of system usually doesn't call COMMAND.COM at all. Instead, it internally emulates its functionality, including redirection and pipes, and invokes the programs directly. This makes the following important features possible:
- Long command lines.
  See Long command lines; here it means that shell commands can have arbitrary length, even though the shell itself doesn't support that!
- Unix-style file names.
  File names which are targets of redirection can be given in the Unix /foo/bar style. Unix devices, such as /dev/null, are also supported (see Unix devices).
- Multiple commands in a single command line.
  The emulation code supports the foo ; bar feature of several commands separated by a semi-colon.
- Improved emulation of internal shell commands.
  The emulation of the shell command cd allows Unix-style forward slashes in its argument, and also changes the drive if the argument includes the drive letter.
- Support for Unix-style shells.
  If the environment variable SHELL points to a name like sh or bash, system invokes the shell to do everything, since the internal shell emulation is not sophisticated enough to cover Unix shell functionality.
- Direct invocation of Unix shell scripts.
  Shell scripts can be invoked even if the SHELL environment variable doesn't point to a Unix-style shell, provided that the interpreter whose name appears on the first script line after the #! signature can be found somewhere along the PATH.
- Exit code of the child program is returned to the caller.
COMMAND.COM is only invoked by system to run batch files or commands internal to the shell. However, system always looks for external programs first, so if you have e.g. a port of the GNU echo program installed, system will call it even though COMMAND.COM has an internal (and very much inferior) command by that name.
These features come in especially handy in the DJGPP port of GNU make. Where the original Unix code of make invokes the shell, the DJGPP port simply calls system to execute the commands in rules, and automatically gets support for long command lines and Unix-style shells required to run many Makefiles of Unix origin.
The above extended functionality also means that whenever a Unix program calls system, in most cases the same call will work without any changes when compiled with DJGPP. The result is not only ease of porting, but also a lower probability of leaving subtle bugs in the ported program due to an overlooked fragment which assumes a Unix shell.
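As an illustration of the kind of Unix code that relies on these properties, here is a minimal sketch of a helper that runs a command through system and recovers the child's exit code. It is written against the Posix interface (the status macros from <sys/wait.h>) and was not tested with DJGPP; the name run_command is an assumption made for this example.
```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command and return the exit code of the last program it
   ran, or -1 if the command could not be run or ended abnormally. */
int run_command(const char *command)
{
    int status = system(command);

    if (status == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```
A call such as run_command("true ; exit 5") depends both on the semicolon syntax and on the exit code being propagated, two of the features that stock COMMAND.COM lacks.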
Transparent conversion of special file names.
All DJGPP library functions pass file names to DOS via a single low-level function. This makes it possible to remap some special file names to their DOS equivalents. For example, Unix-standard device names /dev/null and /dev/tty are converted to their DOS equivalents NUL and CON, respectively. File names which begin with /dev/x/, where x is a drive letter, are converted to the DOS x:/ form; this is required for running some Unix shell scripts which take apart the PATH variable where colons separate directories. The implementation of the chroot functionality, which isn't supported directly by DOS and Windows, also uses this file-name conversion.
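The three conversions named above can be sketched as a single string-mapping function. This is an illustration only: the function name remap_special_name and its buffer-based interface are assumptions, and the library's own low-level routine handles more cases than are shown here.
```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Convert a Unix-style special file name to its DOS equivalent, writing
   the result into out (of size outsz).  Names that are not special are
   copied unchanged.  Returns 0 on success, -1 if out is too small. */
int remap_special_name(const char *name, char *out, size_t outsz)
{
    int n;

    if (strcmp(name, "/dev/null") == 0)
        n = snprintf(out, outsz, "NUL");
    else if (strcmp(name, "/dev/tty") == 0)
        n = snprintf(out, outsz, "CON");
    else if (strncmp(name, "/dev/", 5) == 0 &&
             isalpha((unsigned char)name[5]) && name[6] == '/')
        /* /dev/x/rest becomes x:/rest, where x is a drive letter */
        n = snprintf(out, outsz, "%c:/%s", name[5], name + 7);
    else
        n = snprintf(out, outsz, "%s", name);

    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```
For example, /dev/c/djgpp/bin maps to c:/djgpp/bin, which lets a PATH-style list use colons as separators without colliding with drive letters.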
Filesystem extensions.
This feature is built into the low-level file-oriented library functions. It allows the application to install a handler for certain filesystem calls, like open, read, fstat, dup, close, etc. If installed, such a handler is called just before the appropriate primitive is invoked to pass the call to DOS. If the handler returns a non-zero value, it is assumed to have handled the call, and the usual primitive call is bypassed. Otherwise, the library proceeds with calling DOS as usual.
This facility provides an easy way of handling special files and devices which DOS and Windows don't support directly. For example, a program can install a handler for special file names like /dev/ptyp0 and emulate these non-existent devices via an async communications library.
Another way of putting filesystem extensions to good use is when there's a need to emulate functionality that DOS file I/O doesn't support, even though the associated devices do exist. For example, suppose you need to port code which sends special commands to the terminal device via termcap functions. DOS supports a terminal device, but doesn't support termcap. However, it is possible to achieve the same effects if direct screen writes are used instead of file I/O. By installing a filesystem extension handler for the standard output handle, you could redirect all terminal I/O to direct screen writes and implement all the necessary termcap functionality, without any changes to the program's source code. This is how the DJGPP port of GNU ls supports the --color option without forcing users to install a special terminal driver that interprets ANSI escape sequences.
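The dispatch rule described above (a handler that returns non-zero has handled the call, otherwise the library proceeds to DOS) can be modelled in a few lines of portable C. This is a toy model with hypothetical names throughout; it does not reproduce the interface the DJGPP library actually exposes, and the "DOS" primitive is a stub that just returns a fixed handle.
```c
#include <stddef.h>
#include <string.h>

enum fs_call { FS_OPEN, FS_READ, FS_CLOSE };

/* A handler returns non-zero if it handled the call, in which case it
   must store the call's result in *result. */
typedef int (*fs_handler)(enum fs_call call, const char *name, int *result);

static fs_handler installed_handler = NULL;
static int dos_calls = 0;  /* how many calls fell through to "DOS" */

void install_handler(fs_handler handler)
{
    installed_handler = handler;
}

int dos_call_count(void)
{
    return dos_calls;
}

/* Stub for the primitive that would pass the call to DOS. */
static int dos_open(const char *name)
{
    (void)name;
    dos_calls++;
    return 5;  /* pretend DOS returned file handle 5 */
}

/* Library-level open: give the handler the first chance, then fall back. */
int lib_open(const char *name)
{
    int result;

    if (installed_handler != NULL &&
        installed_handler(FS_OPEN, name, &result))
        return result;
    return dos_open(name);
}

/* Example handler that claims the non-existent device /dev/ptyp0. */
int pty_handler(enum fs_call call, const char *name, int *result)
{
    if (call == FS_OPEN && strcmp(name, "/dev/ptyp0") == 0) {
        *result = 100;  /* a made-up handle owned by the emulation */
        return 1;
    }
    return 0;  /* not ours: let the library call DOS */
}
```
Once pty_handler is installed, opening /dev/ptyp0 never reaches the DOS stub, while every other file name is passed through untouched.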
Support for long file names.
DOS system calls are limited to file names in the so-called 8+3 format: a maximum of 8 characters for the basename and a maximum of 3 characters for the extension. Therefore, it is impossible to access the long file names, offered by Windows 9X and Windows NT, via the DOS system calls. However, Windows 9X provides a special API (a bunch of special functions of software interrupt 21h) that allows DOS programs to access long file names. This API is widely known as the LFN API, where LFN is an acronym for Long File Names. For each file-oriented DOS system call, the LFN API includes a replacement that supports long file names. For example, there are functions to open files, list the files in a directory, create a directory, etc. using long names. The LFN API also adds several functions to access extended functionality supported by the Windows filesystems. For example, it is possible to get and set three time stamps for each file, like on Unix, instead of the single time stamp supported by DOS.
The DJGPP library features transparent and automatic support for long file names on Windows 9X⁸. The DJGPP startup code queries the system for the availability of the LFN API, and if it's available, all low-level file-oriented primitives are automatically switched to using the special LFN-aware functions. This run-time detection of the LFN support means that the same executable will run on DOS and on Windows, and will automatically support long file names when it runs on Windows 9X.
Emulation of links.
DOS doesn't support hard and symbolic links. However, DJGPP emulates them to some extent. The link library function simulates hard links by copying. The symlink library function simulates a symbolic link for executable programs only, by creating a 2KB stub which is set up to run the COFF image from the target of the link. Thus, ln -s grep fgrep does what you'd expect.
Emacs compatibility.
Emacs is special because when it dumps itself during the build process, static and global variables are frozen in the dumped image with the last value they had at the time the program was dumped. DJGPP has a special facility in the library through which library functions can detect that the program was dumped and restarted. All library functions that need static variables use this facility to reinitialize them. This allows Emacs to be built with DJGPP without the need to analyze whether each library function called by Emacs is dump-safe.
Special-purpose utilities.
In addition to relying on the GNU development tool-chain, DJGPP introduces several utilities written specifically for the project. These utilities are meant to assist the developer in solving specific tasks common in the DJGPP environment. Some of these utilities are listed below:
- djtar
  djtar is a program that unpacks archives (but cannot create them). It was originally written to unpack files created by tar, because DOS and Windows lack standard programs for that. Since the original release, djtar functionality was significantly extended, and now it can unpack .tar.gz and .zip files as well. It also can unpack archives from floppy disks written as raw /dev/rfd0a devices on Unix systems, and it uncompresses and untars .tar.gz files on the fly, by feeding the untar code with output of the unzip code. The latter feature is very important when unpacking large distributions, such as emacs-XX.YY.tar.gz, because pipes are implemented as temporary disk files on DOS/Windows, and so on-the-fly decompression avoids creating huge temporary disk files.
  The ability to unzip .zip archives makes djtar the only free program which does that, since it turns out that InfoZip's UnZip license does not comply with FSF's definition of free software (according to Richard Stallman).
  In addition, djtar offers several features designed to prevent problems due to DOS/Windows file-name restrictions; see DOS file names handling, below.
- djsplit and djmerge
  These two programs come in handy when you need to carry a large file (usually, a compressed archive of a large distribution) on floppies. djsplit splits a file into smaller chunks whose size is user-defined, and djmerge splices the chunks back together.
- dtou and utod
  These programs are close cousins of dos2unix and unix2dos, respectively, but they have several clever tricks up their sleeve. First, they take file names from the command-line arguments and rewrite each file, instead of reading stdin and writing stdout; thus, they can convert many files in a single run. And second, they preserve the time stamps of the converted files, to keep utilities like make happy. With these programs, I can convert the entire directory tree of C source files to the DOS CR-LF format with a single command:
```
 utod .../*.[ch]
```
  This uses the DJGPP wildcard expansion and the special ... wildcard mentioned above.
- update
  This is a replacement for the well-known move-if-changed shell script. It is very handy in Makefiles which should run on systems that don't have Bash installed. Since it understands Unix-style forward slashes (like all DJGPP programs do), it is also widely used in Makefiles for copying files, instead of the shell's internal COPY command, since make doesn't live well with backslashes in file names.
- redir
  As its name implies, redir redirects standard handles. It was originally written to allow redirection of stderr, which the stock DOS shell, COMMAND.COM, cannot do. You need this redirection, e.g., when GCC spits out a long list of error messages which scroll off the screen. redir can also append redirected output to an existing file (à la >>) and redirect stderr to the same place as stdout or vice versa, like what >& does.
  In addition, redir reports the exit status of the program it runs, and prints the elapsed time used by the child. These features are provided because, unlike on Unix, there are no standard utilities to do that.
- symify
  DJGPP debugging support doesn't include Unix-style core files which allow post-mortem debugging of a crashed program. To compensate for this deficiency, when a program crashes, a special library module prints the values stored in the CPU registers and the traceback of the function calls that led to the crash, as stored in the call frames pushed onto the stack.
  However, the stack traceback, as printed, is hard to interpret, because it only includes numeric addresses of the functions. The symify program solves this problem. It reads the traceback directly from the video memory, and uses the debug info recorded in the program's executable file to convert the addresses into file names and line numbers of the source files. It then adds the file names and line numbers information near the corresponding addresses, thus making the traceback easy to comprehend.
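The core of what symify does, turning a raw address from the traceback into the function that contains it, boils down to a search in a table sorted by address. The sketch below is only an illustration of that idea: the record layout, the function lookup_addr, and the demo table are all made up, and real debug info (which also yields line numbers) is far richer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol record; real debug info is far richer. */
struct sym {
    unsigned long start;        /* address where the function begins */
    const char *name;
    const char *file;
};

/* Return the entry with the greatest start address that is <= addr,
   or NULL if addr lies below every entry.  The table must be sorted
   by ascending start address. */
const struct sym *lookup_addr(const struct sym *tab, size_t n,
                              unsigned long addr)
{
    size_t lo = 0, hi = n;      /* the search window is [lo, hi) */

    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the index of the first entry that starts above addr. */
    return lo == 0 ? NULL : &tab[lo - 1];
}

/* Made-up table, for illustration only. */
const struct sym demo_syms[] = {
    { 0x1000UL, "main",   "prog.c" },
    { 0x1080UL, "parse",  "parse.c" },
    { 0x1200UL, "report", "report.c" },
};
```

With the demo table, an address such as 0x10a4 resolves to the parse entry, and an address below 0x1000 yields NULL, which a tool like this would simply leave unannotated.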
DJGPP-specific extensions to GNU utilities.
Besides the library functions and DJGPP-specific programs, a lot of special code went into the utilities ported or written for DJGPP, so that these utilities could work together smoothly and have the effect a user would expect. Some of these extensions are listed below:
- Bash supports Unix-style PATH format. Unix uses : to separate directory names in the value of environment variables such as PATH. Many shell scripts rely on this feature to look for programs along the PATH. For example, the GNU-standard configure scripts do that to find gcc, ranlib and other programs, as part of the auto-configuration process.
  However, DOS and Windows use ; to separate directories in PATH (because absolute file names include a drive letter, like in d:/foo/bar). This breaks shell scripts which search along the PATH.
  To allow these scripts to run without changes, the DJGPP port of Bash introduces a special variable PATH_SEPARATOR. If this variable is set to :, Bash converts the value of PATH to pseudo-Unix form. For example, if the original value of PATH is like this:
```
 PATH=c:\djgpp\bin;d:\gnu\emacs\bin
```
  then setting PATH_SEPARATOR=: converts it to this:
```
 PATH=/dev/c/djgpp/bin:/dev/d/gnu/emacs/bin
```
  This lets Unix shell scripts run unaltered. However, to prevent the external commands from breaking (because they don't know anything about PATH_SEPARATOR), Bash converts the value of PATH back to its usual DOS style in the environment it passes to child programs.
  The DJGPP library supports the special /dev/x/ file names by converting them to the usual DOS x:/ format, before it issues DOS calls, so all DJGPP-compiled utilities can be safely run by a script when PATH_SEPARATOR is set to :.
- test -x foo looks for foo.exe, foo.com, foo.bat, etc. This is important e.g. in GNU configure scripts which look for programs along the PATH.
- install foo /bin/foo actually installs foo.exe in the target directory. Similarly, gcc -o foo creates both foo and foo.exe; the first keeps make happy when a Unix Makefile is in use (since the target names are usually extension-less on Unix), while the second can be run from the DOS command prompt, since the stock DOS shell refuses to run a program without one of the executable extensions (.exe, .com or .bat) it knows about. Both of these features are intended for using Unix Makefiles without changes.
- Shell specifications such as /bin/sh cause the shell to be looked for along the PATH as well, so that users won't need to have a /bin directory.
- Programs which should pipe text to lpr write to the local printer device instead if lpr could not be located. Emacs and dvips are two examples of programs that offer this feature.
- DOS file names handling: programs which unpack file archives rename files whose names are invalid on DOS/Windows. The DJGPP ports of GNU tar and cpio programs, and the djtar utility supplied with the DJGPP development kit are examples of such programs. They replace characters which aren't allowed in file names, like + on MS-DOS or " on MS-Windows, and rename files whose names are reserved on DOS/Windows by character devices (and therefore writing to them could have unexpected results).
  Another potential problem in unpacking file archives is that several different file names can map to the same name after truncation to the DOS 8+3 limits (see 8+3 file names) or as a result of the automatic renaming I just described. For this reason, djtar refuses to overwrite existing files, and requires the user to type in another name under which the file will be extracted. If the user presses <RET>, the file is skipped.
  This interactive, one-by-one renaming might be tedious and error-prone when there are a lot of files to rename. A case in point is the test suite in the GNU Textutils distribution with a lot of names like n+4b2l10f-0FF, njml17f-lmlmlo, etc. For these cases, djtar has a command-line option which can be used to submit a file with a mapping between original and DOS names; djtar will automatically rename every file mentioned there and will leave all other file names intact. An example of putting this feature to use can be seen in the latest versions of Textutils (look for the file djgpp/fnchange.lst and the instructions to use it in djgpp/README).
The features mentioned above are mostly small niceties. But can you imagine the amount of hacking needed to get Unix Makefiles and shell scripts to work on DOS and Windows machines, if these tidbits didn't exist?
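As an illustration of the PATH_SEPARATOR conversion described in the list above, here is a minimal sketch in portable C. It is not the code used by the DJGPP port of Bash; the function name path_to_unix and its interface are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Bash's actual code: convert a DOS-style
   PATH value to the pseudo-Unix form.  Returns 0 on success, or -1
   if dst is too small to hold the result. */
int path_to_unix(const char *src, char *dst, size_t dstsize)
{
    size_t o = 0;
    int at_start = 1;           /* at the start of a directory entry? */

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (at_start && isalpha((unsigned char)src[0]) && src[1] == ':') {
            /* "x:" at the start of an entry becomes "/dev/x" */
            if (o + 6 >= dstsize)
                return -1;
            memcpy(dst + o, "/dev/", 5);
            o += 5;
            dst[o++] = src[0];
            src += 2;
            at_start = 0;
            continue;
        }
        if (o + 1 >= dstsize)
            return -1;
        if (*src == ';') {
            dst[o++] = ':';     /* entry separator */
            at_start = 1;
        } else {
            dst[o++] = (*src == '\\') ? '/' : *src;
            at_start = 0;
        }
        src++;
    }
    if (o >= dstsize)
        return -1;
    dst[o] = '\0';
    return 0;
}
```

Fed the example value c:\djgpp\bin;d:\gnu\emacs\bin, this sketch produces /dev/c/djgpp/bin:/dev/d/gnu/emacs/bin, the same form shown in the list. The reverse conversion, which Bash applies to the environment of child programs, would follow the same pattern in the opposite direction.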

DJGPP and Internationalization

Modern development environments support internationalization by providing facilities to read, write, and display text in languages other than English and character sets other than US-ASCII. For example, most GNU packages support the gettext library and proprietary facilities similar to it, which allow the messages printed by programs to be in any of the supported native languages.

DJGPP, being a DOS/Windows-based environment which uses lots of software ported from Unix, faces several unique challenges on its way to internationalization. This section briefly outlines the problems and their possible solutions.

First, some background on international aspects of the operating systems supported by DJGPP.

The international features of MS-DOS rely on so-called DOS codepages. A codepage is a particular mapping between 128 non-ASCII characters and their 8-bit codes in the range [128..255] (the lower 128 codes in every codepage are always occupied by the usual 7-bit ASCII characters). IBM defined several codepages, each one identified with a unique number, to support certain character sets, and these codepages are included with each version of DOS. Every codepage roughly corresponds to one of the ISO-8859 character sets, but the mapping of the high 128 characters is different. For example, codepage 850 corresponds to ISO-8859-1 (a.k.a. Latin-1) character set, codepage 862 corresponds to the ISO-8859-8 (Hebrew) set, etc.

In the default text-mode operation, the DOS terminal is a character terminal which can display a single set of 256 glyphs at a time. This set is determined by the current DOS codepage. The default set of glyphs which corresponds to the native locale is usually burnt into the video hardware; to install a different codepage, you need to edit the system configuration files and reboot. This loads the glyphs of the character set supported by the new codepage into memory, and also updates other devices; for example, it downloads the corresponding font into the local printer.

Windows defines additional codepages, many of them similar or nearly identical to the ISO-8859 character sets for the same locale (e.g., codepage 1252 matches the Latin-1 set except in the range [128..159], where it defines additional printable characters). However, Windows doesn't allow DOS programs to use these new codepages, and it still requires a system reboot to replace the single supported DOS codepage. So DJGPP programs can still support only one codepage at a time, even when they run on Windows.

Therefore, to use i18n facilities such as the GNU gettext package, DJGPP programs need an additional layer of recoding characters, because the DOS codepage for a given locale maps characters differently from the corresponding ISO-8859 character set. One solution to this problem is to convert the existing *.po files supplied with GNU packages to corresponding DOS codepages. Such conversion can be performed automatically by the GNU recode utility, which supports many of the existing codepages.
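The recoding step itself is a simple table lookup. The sketch below maps a handful of Latin-1 codes to their codepage 850 equivalents; it is not how recode is implemented, the function names are made up, and the table is deliberately partial (a real one covers all 128 high codes), so unmapped bytes are replaced by a question mark here.

```c
#include <assert.h>
#include <stddef.h>

/* A few Latin-1 -> codepage 850 pairs.  This table is partial on
   purpose; a real one covers all 128 high codes. */
static const unsigned char pairs[][2] = {
    { 0xC7, 0x80 },             /* capital C with cedilla */
    { 0xC9, 0x90 },             /* capital E with acute */
    { 0xE4, 0x84 },             /* small a with diaeresis */
    { 0xE7, 0x87 },             /* small c with cedilla */
    { 0xE9, 0x82 },             /* small e with acute */
    { 0xF1, 0xA4 },             /* small n with tilde */
    { 0xF6, 0x94 },             /* small o with diaeresis */
    { 0xFC, 0x81 },             /* small u with diaeresis */
};

/* Map one Latin-1 byte to codepage 850.  ASCII passes through;
   bytes missing from the partial table become '?'. */
unsigned char latin1_to_cp850(unsigned char c)
{
    size_t i;

    if (c < 0x80)
        return c;
    for (i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
        if (pairs[i][0] == c)
            return pairs[i][1];
    return '?';
}

/* Recode a NUL-terminated Latin-1 string in place. */
void recode_in_place(unsigned char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = latin1_to_cp850(*s);
}
```

Running recode_in_place over the text of a message catalog is, in essence, what converting a *.po file to a DOS codepage amounts to. Because both encodings use one byte per character, the conversion can be done in place without changing string lengths.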

The DJGPP version of Emacs 20.4 employs a similar technique to display the character set supported by the current DOS codepage. However, unlike gettext, Emacs performs the conversion from the ISO charset to the codepage and back in real time, by defining a special coding system, which is driven by a table that maps the ISO charset into the DOS codepage. The same coding system is also used to read and write files produced by other DOS-based software. This solution avoids introducing new character sets into Emacs, which would be extremely undesirable, as Emacs already has too many partially-overlapping character sets.

Conversion of a single character set might be the way to make a program speak your native language, but what about programs that need to display more than a single character set at a time, like Emacs 20? Well, one solution is to simulate the glyphs that cannot be displayed with similar glyphs from other character sets. Thus, some Cyrillic characters can be simulated by glyphs of similar-looking ASCII characters. Where no single glyph can reasonably stand for a non-ASCII character, it could be simulated with strings of several characters. For example, the Latin-1 character ç (a small c with a cedilla) could be displayed as the string {c,}, where the braces serve as a visual indication that this is a single character. Emacs makes this solution based on glyph remapping possible by providing a facility known as a display table, whereby each character can be mapped either to a code of a single glyph, or to a string. If a character is mapped to a string, Emacs redisplay code knows that this string stands for a single character, and so commands which e.g. move point and count columns still work correctly. This is how the DJGPP version of Emacs 20.4 manages to display character sets beyond the one supported by the current codepage.
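The reason column counting keeps working can be seen in a small sketch. This is a C illustration of the idea only, not Emacs code (Emacs display tables are Lisp objects); the table layout and the names set_display, char_columns, and column_of are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical display table indexed by character code: an entry is
   either NULL (show the character as is) or the replacement string. */
static const char *display_table[256];

void set_display(unsigned char c, const char *replacement)
{
    display_table[c] = replacement;
}

/* Number of screen columns one character occupies. */
size_t char_columns(unsigned char c)
{
    return display_table[c] != NULL ? strlen(display_table[c]) : 1;
}

/* Screen column at which the character at index pos starts.  The
   buffer position advances by one per character, however many
   columns that character takes on the screen. */
size_t column_of(const unsigned char *text, size_t pos)
{
    size_t col = 0, i;

    assert(text != NULL);
    for (i = 0; i < pos && text[i] != '\0'; i++)
        col += char_columns(text[i]);
    return col;
}
```

If the Latin-1 code for ç (0xE7) is mapped to {c,} with set_display, the character still counts as one position in the text but occupies four columns on the screen, so the character that follows it starts four columns further to the right. Keeping the two notions separate is what lets cursor motion and column counting stay correct.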

Solutions are also required for printing multi-lingual text from Emacs. Currently, the only solution available is via the ps-print package, which requires a printer with PostScript support or a PostScript interpreter such as Ghostscript. Other printing commands, like lpr-buffer, currently support only one character set: the one which corresponds to the installed DOS codepage.

In sum, as far as i18n is concerned, DJGPP is certainly more limited than modern GUI environments such as X Windows, but current solutions are quite adequate for most needs of a typical user.

Summary and Perspective

The DJGPP project has existed for 10 years. This might not seem like a long time, but it is. Consider this: in 1989, when DJ Delorie began porting GCC, MS-DOS v4.00 had just been released and become the hottest issue in the trade press, MS-Windows was not yet heard of outside Microsoft, Linux was still several years away, the latest version of GCC was 1.35, and Emacs was in version 18.5x. We might also reflect on what each one of us did around that year, to get a feeling for how much water has gone under the bridge since then....

So what has DJGPP achieved during this time? This section offers a retrospective summary, and then attempts to outline future developments.

I think the most important achievement is that DJGPP brought the free software to the large community of DOS/Windows users. We may not like the reasons why these systems are so widespread, and we might resent the quality of the software which they run, but the fact remains that there is a huge installed base of such systems. DJGPP brings many users of these systems in touch with free software. It teaches them the value of free access to the sources and free exchange of knowledge and ideas about software internals. It also shows them how this freedom helps to make their software much better than proprietary tools, haunted by software patents, undocumented behavior, and non-disclosure agreements, ever could. Thanks to DJGPP, many young programmers have learned these lessons at the very beginning of their careers, and these are lessons they will not forget easily.

On a more practical note, consider the large body of free software successfully ported to DOS/Windows as part of DJGPP over the years. Besides GCC and Binutils, more than 50 GNU and free software packages were ported, including Emacs, Bash, GDB, Make, Gawk, Perl, TeX, Ghostscript, RCS, CVS, Tar, and many others. The document you are reading now was written in Texinfo using Emacs 20.3, spell-checked with Ispell, converted into Info and HTML with makeinfo, typeset with TeX, previewed as a PDF file produced with dvipdfm, and printed with dvips, all of them DJGPP ports. The GNU Software for MS-Windows and MS-DOS CD-ROM, first released by the FSF in the last quarter of 1998, holds 400MB of GNU software ported to DJGPP; people who bought that CD sometimes write to me that using the software makes them forget which OS they booted in the morning. All of these ports are in active maintenance, and new versions are ported as the GNU maintainers release them. Many GNU packages already include DJGPP support as part of the official distribution, and work is under way to add such support to other packages.

This abundance of free, high-quality, actively-maintained software which runs on platforms found in each household and in every office really makes a difference. It certainly makes the GNU project and its goals known and popular among users who might never have heard about GNU were it not for DJGPP. To me, it is no surprise that the GNU DOS/Windows CD-ROM instantly became such a big hit and sold more disks than all other GNU CD-ROMs together (200 copies sold during the first 2 months, which brought FSF about $9600). Thus, DJGPP not only makes GNU popular, it also helps to raise funds for the GNU project. Ironically, a project which began because the FSF thought it was impossible ended up supporting the FSF. History has come full circle.

I know I promised to try to predict the future of DJGPP. But now that we have come all this way and reached the end of this document, I must confess: I lied. I don't want to set my feet on the slippery path of predicting the future, first, because I'm not good at that, but mostly because DJGPP defies all predictions. DJGPP produces DOS executables, so it doesn't support native Windows programming (although DJGPP programs still make very good console applications when they run on Windows). Microsoft declared DOS dead and actively tries to retire all DOS-based software by deliberately preventing DOS programs running on Windows from accessing some lucrative and useful Windows services. In theory, this should have killed DJGPP. Nevertheless, many people not only use DJGPP, they even choose to run it not on Windows, but in plain DOS. All the hype about Windows being "the way of the future" notwithstanding, users prefer the stability and reliability of the DOS-based DJGPP environment to a fancy GUI.

One thing I can be positive about: we will certainly see DJGPP ports of more free software. Several packages, like egcs, inetutils, recode, and UCB Logo are being ported as we speak.

As for the core of DJGPP, its development depends on too many factors unbeknownst to me. One obvious direction is to add support for creating native Windows programs. But this is a large project which requires several dedicated volunteers to work on it for several months. It is not clear whether such a team could be assembled, given that many potential candidates either switch to Linux or use one of the existing free Windows development environments, like Mingw32 and Cygwin.

So the truth is, I don't know what the future of DJGPP will look like. Instead, let me tell you what I hope the free software movement will learn from the DJGPP experience. I hope we can learn that free software projects should not ignore popular platforms just because we don't like their operating system. By supporting enthusiasts who are ready to bring free software to those platforms, we could do much better: we could expose a much larger audience to our projects, and we could raise money for continuing our projects by selling software ported to those platforms and support services for them.

Index

PATH separator, Unix-style: Features
chroot support: Features
ioctl, emulation: Features
make, support for Unix features: Features
system function, extended functionality: Features
DJGPP FAQ list: History
DJGPP utilities: Features
DJGPP, history of development: History
DJGPP, news group: History
DJGPP, what it is: Introduction
Codepages: I18N
Codepages, and Emacs: I18N
Command lines, longer than 126 characters: Features
Converting text files: Features
Core dumps: Features
DJ Delorie, author of DJGPP: Introduction
DOS codepage: I18N
DPMI services: v2 and DPMI
Debugging, post-mortem: Features
Device I/O, emulation: Features
Device names, Unix: Features
Emacs and DOS codepages: I18N
Emacs dumping, and library functions: Features
Exit status reporting: Features
FAQ list for DJGPP: History
Filesystem extensions facility: Features
Globbing: Features
Header files, compatibility: Features
LFN API: Features
Library, compatibility: Features
Links, emulation: Features
Long command lines: Features
Long file name support: Features
News group for DJGPP: History
Post-mortem debugging: Features
Printing, multi-lingual: I18N
Redirecting stderr: Features
Richard Stallman, and DJGPP development: History
Splitting large files: Features
Symlinks, emulation: Features
Timing programs: Features
Transfer buffer: Library
Unix compatibility: Features
Unix device names: Features
Unpacking compressed archives: Features
Updating files: Features
Wildcards: Features
utilities, DJGPP-specific: Features

Footnotes

¹ This is not what DJGPP originally stood for, see The History of DJGPP.
² See the DJGPP history page on DJ's Web server, for more details.
³ There is no official interpretation of the acronym DJGPP. A contest for the best name was held more than a year ago; the results can be found by searching the DJGPP mail archives.
⁴ This is exactly what Linux, Hurd, and latest versions of MS-Windows do. Interestingly enough, the original reason for DJ Delorie's interest in porting GCC was that he wanted to use it to write a 32-bit OS for PCs.
⁵ The name of the default DPMI server program is recorded in the stub and can be changed by editing the stub with a special program called stubedit.
⁶ Theoretically, memory below 1MB could be used by DJGPP programs. However, since this memory is usually at a premium, all DPMI servers leave it alone; CWSDPMI uses it only if there's not enough memory above 1MB.
⁷ My personal involvement with the DJGPP library development began when I wrote the first version of stat and fstat which returned meaningful inode numbers and also corrected some other frequent blunders in DOS versions of these functions.
⁸ Windows NT does not include this API, therefore DJGPP programs cannot access long file names on NT systems. However, a beta version of a free LFN driver for NT is available.

Copyright © 1999	Updated Jul 1999
