Book: Linkers & Loaders
This is an excellent book that I’ve read three times, and here is my review…
Title: Linkers & Loaders
Author: John R. Levine
This book deals with a subject that is rarely covered in the computer science literature. It is written specifically to cover the inner workings of two of the most important tasks in the everyday use of software: the linking that takes place while an application is built, and the loading, which involves numerous stages on modern systems. In my opinion, anyone interested in either low level system programming or system level security should read it. Some parts of the content are outdated, though, so don’t expect to find the latest information on linking & loading here. Anyway, here is my detailed review, chapter by chapter.
Chapter 1: Linking and Loading
Here the author introduces the basics of linking and loading. He also provides a realistic example using the C programming language and open source development tools to demonstrate the linking and loading steps.
Chapter 2: Architectural Issues
This chapter is one of my favorites since it deals with the low level architectural issues that matter for the linking and loading procedures. It discusses various subjects, from common ones such as the different ABIs, memory addressing, function calling conventions etc. to PIC/PIE, shared libraries and the tricks of embedded architectures. The chapter includes a brief discussion of three popular architectures, Intel x86, IBM 370 and SPARC (both V8 and V9), but it also addresses issues on other processors, such as the address space and NULL pointer bugs on the VAX, the successor to the PDP-11. At the end of the chapter the author provides some simple exercises on these topics.
Chapter 3: Object Files
The object files chapter is a large one compared to the others. It starts by introducing the essentials of object files and then uses DOS EXE files to describe basic features like relocation, fixup entries, loading etc. In the chapter you can also find information on relocation symbols and a brief analysis of the following object file formats:
– UNIX a.out
– System V ELF
– IBM 360 Object
– AIX extended COFF
– Windows PE
– DOS OMF
This chapter also provides information on relocation and symbol tables and it concludes with some exercises along with the beginning of a linker project written in Perl.
Chapter 4: Storage Allocation
This is probably the most important task of a linker or loader. This large and very interesting chapter goes through everything from simple storage allocation to multiple segment types and alignment issues, up to more specific topics like Fortran’s common storage feature. Then it discusses the main difficulties that arise with C++ duplicate code removal during linking, which is a complex task because of the virtual function tables, templates, extern inline functions and other features that the language supports. More system specific features are analyzed, including Microsoft Windows’ COMDAT flag, the .gnu.linkonce sections of the GNU linker, the BeOS dynamic linker’s library reference dependencies as well as the .init and .fini segments used by C++. The chapter contains plenty of information on widely known low level features like constructor and destructor methods, but it also covers less popular subjects like IBM mainframes’ external dummy sections feature, which exists because of the IBM pseudo-registers, PL/I problems, the lack of pseudo-register support in OS/360, RISC’s limited address range etc. Next, the loading phase is explained in detail, including how the segments are initialized and which processor registers are used to perform this task. Finally, the chapter contains some basic information on linker control, such as the GNU linker’s scripting support and the Microsoft linker’s command switches, and it ends with the usual exercises along with an extension of the linker project.
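To make the storage allocation idea a bit more concrete, here is a minimal sketch in Python (not the Perl of the book's project). It assumes a very simple model that is my own and not the author's: every module has text, data and bss segments, segments of the same type are placed back to back with word alignment, and each segment type starts on a page boundary. The module names, sizes and the allocate() helper are all hypothetical.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def allocate(modules, base=0x1000, word=4, page=0x1000):
    """Assign a start address to every (module, segment) pair.

    modules: list of (name, {"text": size, "data": size, "bss": size}).
    Segments of the same type from all modules are grouped together,
    each one word-aligned; each segment type starts on a page boundary.
    """
    layout = {}
    address = base
    for segment in ("text", "data", "bss"):
        address = align_up(address, page)
        for name, sizes in modules:
            address = align_up(address, word)
            layout[(name, segment)] = address
            address += sizes.get(segment, 0)
    return layout


# Hypothetical input: two object files with made-up segment sizes.
modules = [
    ("main.o", {"text": 0x123, "data": 0x10, "bss": 0x40}),
    ("util.o", {"text": 0x51, "data": 0x6, "bss": 0x8}),
]
layout = allocate(modules)
for (name, segment), address in sorted(layout.items(), key=lambda kv: kv[1]):
    print(f"{address:#06x}  {name}:{segment}")
```

Note how util.o's text lands at 0x1124 rather than 0x1123 because of the word alignment, which is the kind of detail the chapter spends time on.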
Chapter 5: Symbol Management
This chapter goes from basic concepts such as name resolution, general symbol information and binding to more detailed subjects such as symbol table formats, module tables and global symbol tables. Then it moves to one of the most critical linking stages, name mangling, which is described in detail with examples in C as well as Fortran. The author also mentions some architectural problems like the name collisions on the PDP-6 and PDP-10. The chapter continues with a complex topic, namely the naming rules and general name mangling with C++ type encoding. An example of link-time type checking is also given, based on the PL/I linker, and the chapter ends by explaining the debugging, line number, symbol and variable information, including some practical issues regarding symbol management. The linker project is also extended to provide basic symbol management functionality.
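The core of name resolution can be sketched in a few lines. The following Python snippet is only my illustration, assuming each module is reduced to a list of defined symbols and a list of referenced symbols; the resolve() function and the sample modules are hypothetical and not taken from the book.

```python
def resolve(modules):
    """Build a global symbol table from per-module symbol lists.

    modules: list of (name, defined_symbols, referenced_symbols).
    Returns (table, duplicates, undefined) where table maps each symbol
    to the module that defines it.
    """
    table = {}
    duplicates = []
    # First pass: record every definition and note multiple definitions.
    for name, defined, _ in modules:
        for symbol in defined:
            if symbol in table:
                duplicates.append((symbol, table[symbol], name))
            else:
                table[symbol] = name
    # Second pass: every reference must match a recorded definition.
    undefined = sorted(
        {symbol for _, _, refs in modules for symbol in refs if symbol not in table}
    )
    return table, duplicates, undefined


# Hypothetical modules: util.o defines helper twice over with extra.o.
modules = [
    ("main.o", ["main"], ["helper", "printf"]),
    ("util.o", ["helper"], []),
    ("extra.o", ["helper"], ["main"]),
]
table, duplicates, undefined = resolve(modules)
print(table, duplicates, undefined)
```

A real linker would go on to search libraries for the undefined names before reporting an error.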
Chapter 6: Libraries
From the linking & loading point of view, libraries are a pretty special category. In this chapter the author explains them, starting from a general overview of library formats and moving to the archive file formats that are operating system specific. Various nuances are discussed, like the __.SYMDEF member of BSD text archives, the current COFF and ELF issues, Microsoft’s Windows ECOFF etc. The chapter also deals with problems faced on 64-bit architectures, and a full section is devoted to Intel’s OMF libraries used by the ISIS operating system. The next topics covered are library creation, library searching and, of course, the performance issues of both searching and scanning libraries. Finally, the linker project is extended based on the newly acquired knowledge.
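Library searching with a symbol directory can be sketched roughly as follows. This is a simplified Python model of my own, assuming the archive provides a directory that maps each symbol to the member defining it; select_members() and the member names are hypothetical.

```python
def select_members(undefined, directory, members):
    """Pick the archive members needed to satisfy undefined symbols.

    directory: symbol -> member name (the archive's symbol directory).
    members:   member name -> (defined_symbols, referenced_symbols).
    Members pulled in may reference further symbols, so the search
    continues until nothing is pending.
    """
    included = []
    defined = set()
    unresolved = set()
    pending = list(undefined)
    while pending:
        symbol = pending.pop()
        if symbol in defined:
            continue
        member = directory.get(symbol)
        if member is None:
            unresolved.add(symbol)
            continue
        defined.add(symbol)
        if member not in included:
            included.append(member)
            member_defs, member_refs = members[member]
            defined.update(member_defs)
            pending.extend(member_refs)
    return included, sorted(unresolved)


# Hypothetical archive contents.
directory = {"printf": "printf.o", "vfprintf": "vfprintf.o",
             "write": "write.o", "sqrt": "sqrt.o"}
members = {
    "printf.o": (["printf"], ["vfprintf"]),
    "vfprintf.o": (["vfprintf"], ["write"]),
    "write.o": (["write"], []),
    "sqrt.o": (["sqrt"], []),
}
included, unresolved = select_members(["printf", "missing"], directory, members)
print(included, unresolved)
```

Only the members that are actually needed get linked in, which is why sqrt.o is left out here.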
Chapter 7: Relocation
As the author states, this is the heart of the modern linking process. Both hardware and software relocation are discussed and explained in detail. Concepts like the slow x86 segment lookups or library “bit creep” are also analyzed. After that, a few more advanced issues are described, including link-time and load-time relocation, symbol relocation, segment relocation etc., with operating system specific examples. Next, the chapter contains information on instruction relocation, from simple cases such as x86, which handles just PC relative and direct addresses, to SPARC, which has no direct addresses and supports four different branch formats, including the special SETHI absolute address relocation feature. Relocation is analyzed for different file formats including ELF and OMF, and then special relinkable and relocatable file formats are discussed. Once again, the linker project continues throughout the chapter, adding new relocation features.
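Here is a small sketch of what applying relocations means in practice, written in Python for brevity. It assumes 32-bit little-endian fields with the addend stored in place, and two relocation kinds computed like the i386 ELF types R_386_32 (S + A) and R_386_PC32 (S + A - P). The function name, the tuple layout and the sample addresses are my own assumptions.

```python
import struct


def apply_relocations(code, load_address, relocations, symbols):
    """Patch 32-bit little-endian fields in a code image.

    relocations: list of (offset, kind, symbol), kind is "abs" or "pcrel".
    symbols:     symbol name -> final address.
    The addend A is read from the field itself, S is the symbol address
    and P is the address of the field being patched.
    """
    patched = bytearray(code)
    for offset, kind, symbol in relocations:
        (addend,) = struct.unpack_from("<i", patched, offset)
        place = load_address + offset
        if kind == "abs":
            value = symbols[symbol] + addend            # S + A
        elif kind == "pcrel":
            value = symbols[symbol] + addend - place    # S + A - P
        else:
            raise ValueError(f"unknown relocation kind: {kind}")
        struct.pack_into("<I", patched, offset, value & 0xFFFFFFFF)
    return bytes(patched)


# Hypothetical x86 fragment: "call helper" then "mov eax, [counter]".
code = b"\xe8" + struct.pack("<i", -4) + b"\xa1" + struct.pack("<i", 0)
relocations = [(1, "pcrel", "helper"), (6, "abs", "counter")]
symbols = {"helper": 0x1100, "counter": 0x2000}
patched = apply_relocations(code, 0x1000, relocations, symbols)
print(patched.hex())
```

The call displacement comes out as 0xfb because the call ends at 0x1005 and the target is at 0x1100.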
Chapter 8: Loading and Overlays
The process of loading a program into memory and executing it might seem simple, but this chapter proves the opposite. It starts by explaining the basic loading procedure with and without relocation, giving information on MVS hardware relocation as well as the possible performance cost of load-time relocation. It then moves to Position-Independent Code, explaining how IBM’s TSS/360 used a simple brute-force approach to support PIC. The chapter continues with more specific topics like per-routine pointer tables and IBM AIX’s table of contents (TOC) technique for PIC. Next, the author moves to the popular ELF file format and how it implements the PIC scheme that was introduced by UNIX System V Release 4, providing some great figures and x86 assembly snippets to demonstrate it. As you might expect, the pros and cons of PIC come next, and a small section is dedicated to the bootstrap process, which is basically the loading of the first program in a system. The chapter ends with a complete analysis of tree-structured overlays, which were widely used in the pre-virtual memory era and are still a good approach for DSPs. Of course, the linker project isn’t complete yet and new features are added.
Chapter 9: Shared Libraries
In this chapter statically linked shared libraries are explained in detail from a linker & loader’s point of view. The author moves from simple problems, such as library specific binding-time issues, to more practical topics like address space management, the structure of shared libraries, creation of the jump table etc. The next part covers linking with shared libraries, where an example with COFF is provided. The following section moves on to the next stage, which is running programs that use shared libraries and the problems that arise from this. Finally, the author gives an example of the above based on the malloc() “hack” using pointers to functions, and the chapter ends with the usual project exercises to improve the linker project.
Chapter 10: Dynamic Linking and Loading
Dynamic linking has lots of advantages and a few disadvantages; this is how the chapter begins. After that a quick historical overview is given, saying that dynamic shared libraries were basically introduced by SunOS, and then a complete analysis of dynamic linking on the ELF file format starts. When this is done, the chapter moves to the next stage, which is a step-by-step explanation of the loading of a dynamically linked program. Since the examples are focused on ELF, the following section discusses the use of the Procedure Linkage Table (PLT) to perform lazy procedure linkage on the x86 architecture and ends with some peculiarities of dynamic linking like library versioning as well as static initializations. Still using ELF examples, the next sections talk about dynamic loading at run-time using dlopen(), dlsym() etc. along with the ELF dynamic linker. The chapter then moves to the Microsoft world by introducing some concepts of Microsoft dynamic-link libraries, like the loading of PE files, the Windows kernel-side dynamic linker and Microsoft’s run-time relocation known as rebasing. Imported and exported symbols of PE files are also analysed, and then the author discusses the lazy binding feature and, as in the ELF case, mentions LoadLibrary(), FreeLibrary(), GetProcAddress() etc. and their use. DLLs and threads are also described along with the Thread Local Storage (TLS) method, and this part ends with pseudo-static shared libraries like those of OSF/1. The last sections deal with possible performance improvements, giving real world examples like SGI’s quickstart (basically it’s just pre-loading objects), BeOS’ caching of relocated libraries, Microsoft Windows’ pre-relocation etc. Before the exercises and the linker project there is a small section that compares the various dynamic linking methods, basically Windows with PE against UNIX with ELF.
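The idea behind lazy procedure linkage is easier to see in a toy model than in x86 assembly. The Python class below is only a simulation under my own assumptions: a dictionary of callables stands in for a loaded shared library and a dictionary of slots plays the role of the table entries that the dynamic linker patches. It is not how a real PLT is implemented, and every name in it is hypothetical.

```python
class LazyTable:
    """Toy model of lazy binding: resolve a symbol on its first call only."""

    def __init__(self, library):
        self.library = library   # symbol name -> callable (stand-in for a shared library)
        self.slots = {}          # resolved entries, patched in on first use
        self.lookups = 0         # how many times the slow symbol lookup ran

    def call(self, name, *args):
        target = self.slots.get(name)
        if target is None:
            # First call: do the (expensive) lookup and patch the slot
            # so later calls go straight to the target.
            self.lookups += 1
            target = self.library[name]
            self.slots[name] = target
        return target(*args)


# Hypothetical "library" with two exported functions.
library = {"add": lambda a, b: a + b, "negate": lambda a: -a}
table = LazyTable(library)
first = table.call("add", 2, 3)
second = table.call("add", 10, 20)
print(first, second, table.lookups)
```

The "negate" entry is never resolved because it is never called, which is exactly the saving that lazy binding is after.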
Chapter 11: Advanced Techniques
The final chapter of the book is dedicated to advanced topics in the linking and loading process. This means techniques to overcome the challenges raised by complicated languages like C++ (name mangling, global initializers, templates etc.). The author discusses the “trial linking” introduced in the cfront compiler and similar functionality in other linkers. The next problem described is duplicate code elimination, where examples for both GCC and Microsoft’s linkers are provided. The chapter continues by analyzing more complex subjects, including incremental linking and relinking, with many different examples from real world implementations, and link-time garbage collection, which was a pretty interesting section in my opinion. Then comes one of the most complicated tasks: how to perform link-time optimization. This could easily be a book on its own, but the author gives excellent information in a really limited space. The last three sections discuss link-time and load-time code generation, and finally there is a section explaining how the sophisticated Java linking model works, going through the stages of the linking process step by step.
To conclude, this is an excellent book that in my opinion covers its subject in an excellent manner. The author is an expert in this field and his book is very easy to read and follow. Even though it’s less than 300 pages, it includes all the essential information required to understand and probably start working on linkers & loaders. Figures, diagrams and code snippets are given only when they are needed, which sets this book apart from the boring ones filled with useless figures, code snippets etc. for things that have already been discussed in detail and are easily understandable. Anyway, it’s a great choice for anyone interested in linking and loading from a developer’s point of view. :)