Packages

Design goals

The over-arching goal of the Common Lisp package system is manage the symbols of very large Lisp systems. This devolves into three subgoals:

Hierarchical
Little packages are built up symbol by symbol. Medium sized packages are built up from little packages. Large packages are built from medium sized ones. And so on. It is unacceptable to have to build very large packages symbol by symbol
Fast
Symbol look up must be fast. It is unacceptable for the Common Lisp Reader to wait while the package system walks a large tree of package definitions. The target ideal is to simply hash the string and find the symbol.
Memory efficient
Since the package system targets very large systems the target ideal is memory consumption that scales linearly with the number of symbols in the system. It is not acceptable to use lots of memory to fully expand the tree of package definitions
Since these goals are in conflict, the package system has an explicit non-goal, intended to create a degree of freedom for the design.

Non-goal: Coherent mutability

One could attempt to compute with the package system by changing it as the computation proceeds. This would involve tracking internal dependencies in order to maintain coherence in the face of mutation. This is explicitly rejected. You build the package system. You use it, taking advantage of the fact that it is fast and memory efficient. Finally you exit the Lisp image, discarding it. It only supports sufficient mutability for construction and debugging.

A two stage naming process

In Common Lisp the names of variables and functions are symbols. Symbols are not strings of characters. Symbols are in-core data structures with various slots including a property list, a function cell, and a value cell. Symbols have names and the names of symbols are strings of characters. Thus access to a variable from the keyboard is a two stage operation. First the user types in some characters. It is a symbol name and is looked up to find a symbol. That symbol names a variable which is looked up to find its value.

Rather than try to justify Common Lisp's two step naming process, I will contradict myself by denying it. Programs that write programs and programs that analyse programs never touch the keyboard with a robotic hand. They work with symbols as the names of functions and variables and never look at the symbols' names. From their point of view it is a single step process.

Uniqueness requirements

The names of functions and variables are unique. Distinct functions have distinct name. Distinct varibles have distinct names. So a symbol names at most one function and one variable.

That account is not fully pedantic.

(flet ((s (x) 1)
       ((setf s)(new-value x) 2))
  (let ((s 3))
    (declare (special s))
    (let ((s 4))
      (setf (s s)
            (locally 
              (declare (special s))
              (s s))))))
In the form (setf (s s)(locally (declare (special s)) (s s))) I see s naming two functions and two variables. Nothing important is at stake here. The names of functions and variables are unique. A symbol names at most one lexical variable, at most one special variable, at most one getter function, and at most one setter function.

There is no uniqueness requirement on the names of symbols. A common character string, such as "x" quite frequently names a dozen or more symbols. When you type in john::smith the function CL:READ upcases the symbol name from smith to SMITH and upcases the package prefix from john to JOHN. It does this for historical reasons. Then it goes looking for a symbol called SMITH in a package called JOHN.

Neither disjoint nor exhaustive

If you have done a little mathematics you have probably come across the theorem that an equivalence relation partitions a set. Here partition is a technical term indicating both that the partition is exhaustive, every element of the set belongs to some equivalence class, and disjoint, none of the equivalence classes overlap. The notion of a partition is an attractive one and every-one who studies the Common Lisp package system starts by imagining that the packages partition the set of symbols in a program. This is a catastrophic error and you must continually fight against it.

Common Lisp packages are not exhaustive. There are symbols that belong to no package. There are a great many such symbols and they are deliberately created outside the package system, usually with gensym The point is that program writing programs, such as one creates with defmacro, need their own private variables that cannot clash with those typed in by the user. Omitting these symbols from the package system makes it impossible to type them in, which prevents name clashes.

Common Lisp packages are not disjoint. Indeed all 978 symbols in the COMMON-LISP package usually belong to every package. If a program contains widely used utility functions the symbols that name those functions typically pervade nearly all of the packages that make up the program

Fundamental requirement

The key uniqueness reqirement on the package system is that for each package individually the symbols belonging to that package have unique names. Given a symbol-name, such as SMITH, and a package-name, such as JOHN, there is only one possible symbol that they could be naming.

Notice the weakness of this uniqueness requirement. The symbol from the previous paragraph could also belong to another package PAUL. Then john::smith and paul::smith are the same symbol. The presence of the symbol called SMITH in the JOHN package prevents any other symbol called SMITH from joining the JOHN package. When the symbol called SMITH signs up to the PAUL package as well it blocks that package from any other symbol called SMITH. It is hogging two packages.

This kind of package hogging is ubiquitous. Symbols in the COMMON-LISP package get everywhere. JOHN::CAR = PAUL::CAR = COMMON-LISP:CAR. If your program involves automobiles you are out of luck! Well, out of luck for several paragraphs, until we get to shadowing symbols.

Notice that my john::smith example clunks. A package name doesn't really work like a symbol's given name, nor does it work like a symbol's family name. Indeed the clunkiness of the example conveys this message: don't try to think of the package name as an extra component to the symbols name, it doesn't really work like that.

A partition

Each package classifies the symbols that belong to it as internal or external. This is a partition. Each symbol is either internal or external but never both. A symbol may belong to many packages. It is internal or external in each package independently. If a symbol belongs to n packages the system must store n bits just to keep track of whether it is internal or external in each of its packages. One assumes that this is implemented as one bit in each package.

Simple example

A manufacturing program. Alice writes the code for handling wood in a package called WOOD. Bob writes the code for handling metal in a package called METAL.

(defpackage "WOOD"
  (:use "COMMON-LISP")
  (:export "GLUE" "NAIL"
           "BURN" "GROW"))

Variables and functions are named by symbols. What kind of object names a package? If packages were named by symbols we would need to look up the name of the symbol in a package to find out which symbol was intended. Which package should we look in? What would that package be called? Common Lisp avoids an infinite regress by using strings to name packages.

Alice doesn't want to be typing

(cl:defun square (x) (cl:* x x))
so she "uses" the COMMON-LISP package, adding all those symbols to her own package. Now when she is working in the WOOD package she has two extra options:
(wood::defun square (x) (wood::* x x))
and
(defun square (x) (* x x))

Bob knows more about Lisp. He knows that COMMON-LISP has a nickname CL, and he understands a rather technical point that we leave for later that means that he can get away with writing:

(defpackage metal
  (:use cl)
  (:export weld rivet
           smelt cast))
The :export keyword is marking certain symbols as external symbols. When you :use a package you only use its external symbols. A package exports its external symbols to its users.

Alice and Bob cooperate on the main program, which lives in the MAIN package.

(defpackage main
  (:use cl wood metal))
Since MAIN uses both WOOD and METAL there is the potential for name clashes between the external symbols of WOOD and the external symbols of METAL. Alice and Bob must coordinate their choices of names for external symbols of their packages. However they remain unconstrained in their use of internal symbols.

Grand Strategy

Part of the Lisp vision is to use S-expressions as a general purpose human readable data format. The reader is important because it is used for reading Lisp source code. The wider vision puts it an the center of information technology strategy, standardising on a single format for both code and data

This is an attractive vision, for practical reasons. Every additional data format your organisation supports is costing you money, but your customers don't care and won't pay for them. The fuss and bother of multiple data formats is pure muda. On the other hand this strategy puts great pressure on READ. It must be efficient. This pressure also bears on the infrasture that supports READ. When READ reads a symbol it has to look it up via the package system, so the package system must also be efficient.

The key to understanding the package system is to approach it from the perspective of speed and memory consumption. What design makes it fast without making it bloated? From this point of view the design of the Common Lisp package system has a compelling logic to it, and the details are as one would expect from the general principles.

Anticipating the threats to efficiency

You can already see the basic problem taking shape. MAIN, WOOD, and METAL all use CL, all 978 symbols of it. When we write

(defpackage uber-package
  (:use package1 package2 ... package-n))
what are we doing to our memory usage? Is it scaling with the number of packages we use, which is unavoidable, or is it scaling with the total number of symbols that they export, which could get us into serious trouble.

The design of the Common Lisp package system is that UBER-PACKAGE merely remembers which packages it uses. When READ wants to look up a string in UBER-PACKAGE in order to find a symbol, the package first checks the symbols that it is looking after directly, then it checks all the packages that it is using. This avoids the problem of bloat, but it causes another problem to loom up, that of pointer chasing.

The concern about pointer chasing is that a large system might accumulate long chains of package use, with package "A" using package "B" which uses package "C" and so on. Indeed if package "C" uses package "A" one might say "and so on, add infinitum". The design of the Common Lisp package system resolves these concerns by limiting the system to a single level of indirection.

This decision guarantees efficiency but it blows a hole in the design by omitting necessary functionality. Sometimes package "A" uses package "B" and package "B" uses package "C" and you need package "B" to pass a symbol through from package "C" to package "A". Obviously we need to expain how this hole is patched, but the limitation to a single level of indirection is achieved in an interesting way, so we discuss that first, leaving the issue of passing symbols through a package for later, even though it is very important.

Using packages doesn't chain

When package YING uses package YANG, the external symbols of YANG are added to the internal symbols of YING. If in addition package YANG uses package YING this appears to create a loop in the package system. This is an illusion. YANG only uses the external symbols of YING. The symbols that YING got from YANG became internal symbols of YING, so YANG does not see its own symbols coming back to it via YING.

So circular package definitions, in which two packages use each other, are not really circular. They should be treated as common place and unremarkable. Although the rule that the external symbols of the used package become the internal symbols of the using package avoids chains of inheritance and avoids cycles there is the banal issue of forward reference. We deal with this next, while the loose thread of passing symbols through a package contines to dangle.

Forward references

The issue of forward references is not banal the first time one meets it, but one has met it before, in the context of mutually recursive functions. A common coding style is to code up the function calling tree starting from the leaves. If f calls g and h then one writes the code and places it in the source file in the order g,h,f. If functions u and v are mutually recursive which do you put first? In Common Lisp organising your source file starting at the leaves and working towards the trunk is a convention that you are free to reject. The language does not require it. In PASCAL (if memory serves) leaf-to-trunk is required by the language and there is a special forward declaration that is necessary to achieve mutual recursion. Common Lisp is able to cope without forward references because the forward referenced function is not actually called until run time, which happens after the whole file has been compiled.

The package system has a problem. Rember that it has a uniqueness requirement. No package may contain two symbols with the same name. That requirement goes right to the point of the package system and has to be checked. The designers opted for eager checking.

(defpackage "A" (:use "B" "C"))
checks immediately for conflicts between all the symbols of "A" andthe external symbols of "B" and "C". Obviously "B" and "C" must already exist.

Let us put this in the context of our example. The program for manufacturing needs some maths utilities, sine, cosine, logarithm, square-root. Suppose that Alice and Bob divide up the work between them with Alice putting sine and cosine in her WOOD package, and Bob putting logarithm and square-root in his METAL package. This is probably a bad decision: they should create a fourth package MATHS and put all four in there, but the decision is good enough to give verisimilitude to my example.

Now the file packages.lisp that contains the package definitions reads like this:

(defpackage wood
  (:use cl)
  (:export glue nail
           burn grow
           sine cosine))

(defpackage metal
  (:use cl wood)
  (:export weld rivet
           smelt cast
           logarithm square-root))

;; Now that we have defined metal we can go
;; back and fix up wood to use metal
(use-package 'metal 'wood)

(defpackage main
  (:use cl wood metal))

Take a moment to ponder the difference between defun and defpackage. You can distribute your defuns across multiple files and load them in any order. However the system must enocunter defpackages in leaf-to-trunk order, with fix-up commands in the right places. If you distribute your defpackage commands across your source files you impose constraints on the order of loading your files to build your system, constraints that you might have avoided. This is why you are advised to put all of the defpackage commands in a single packages file that is loaded first. This confines the ordering requirements imposed by the package system to a single constraint: load packages.lisp first.

In Common Lisp: The Language, Guy Steele shows how to write files with mutually using defpackages distributed across source files. The clever code checks for the existence of packages and lets you load the files in any order. This is an unnecessary flight of technique that makes the package system appear more difficult than it really is. Just put your package defintions in a single file, packages.lisp, and have an easy time of it.

Once again my example clunks. It would have been better to organise the packages like this

(defpackage maths
  (:use cl)
  (:export sine cosine
           logarithm square-root))

(defpackage wood
  (:use cl maths)
  (:export glue nail
           burn grow)

(defpackage metal
  (:use cl maths)
  (:export weld rivet)
           smelt cast)

(defpackage main
  (:use cl wood metal maths))
My suspicion is that is rare that having mutual use of packages is a good design. CL permits it; that doesn't mean you have to do it.

Passing symbols on

We have seen that package use doesn't chain. the external symbols of the used package become internal symbols of the using package and so are not picked up by further use. This leaves a loose end because sometimes you need to pass a symbol on, from a leaf package, through a branch package, towards the trunk. You do it like this:

(defpackage leaf
  (:export popular))
  
(defpackage branch
  (:use leaf)
  (:export popular))

(defpackage trunk (:use branch)) 
This causes the branch package to create its own copy of the mapping from "POPULAR" to leaf:popular. This can be implemented fairly cheaply, just add an entry to the hashtable inside the package with a pointer to the string as the key and a pointer to the symbol as the value; it probably costs 12 bytes.

There are three words of jargon associated with this, with present, accessible, and inherited being technical terms.

Present

The implications of a symbol being present in a package are that memory is consumed by the package object linearly with the number of symbols that are present and that access to that symbol via the package attains the ideal of hashing the string and finding the symbol.

Inherited

The implications of a symbol being inherited by a package are that memory is consumed by the package object in proportion to the number of packages used by the package, and that access to that symbol is slowed down by the need to search through the list of packages.

accessible

Either present or inherited

Comparing a present symbol with an inherited symbol we notice that inherited symbols are very economical on memory. A package can use another package with a great many external symbols, all for a small fixed cost in memory.

In our initial example all the symbols are present in exactly one package and accessible in one or two packages. Then we added four symbols for mathematics and they were accessible in all three packages, but they were still only present in a single package.

The typical situation in a small program is that every symbol is present exactly once and many symbols are accessible via several different packages. In a larger program some symbols are passed through from leaf package via a branch package to a trunk package. Such symbols are present in both the leaf and the branch. Indeed they can be present in several branches.

We have so far only added symbols to packages either in bulk, with :use and use-package, or individually with :external. Looking ahead there are also import commands that can add a symbol from one package to another package. These provide a second way in which a symbol can be present in more than one package. We will motivate this additional commands before discussing them further.

Symbol initialisation without package qualifiers

When the reader encounters a symbol without a package qualifier it looks for it in the current package. The current package is stored in *package*. If the symbol isn't accessible in the current package then the reader creates it adding it to the package. It immediately makes arrangements for printing the symbol.

The goal of these arrangements is "print-read consistancy". When a symbol is printed out, reading that text back in should get back to the same symbol. See 2.3.6 Package System Consistency Rules for extra consistancy rules

If we continue to use the same package printing-read consitancy is easy: it is sufficient to print just the name of the symbol. If we change to a different package, but one in which the symbol is accessible, merely printing the sybol-name remains sufficient.

However we may change to a package in which the symbol is not accessible. Now the printer must prefix the symbol's name with a package qualifier. It is sufficient to use any package in which the symbol is accessible. If we consider the possibility of un-using a package we see that it is more robust to use a package in which the symbol is present.

Common Lisp adopts a simple approach. When a symbol is created it makes a note of the package then current. This is the symbols home-package and can be accessed with symbol-package. Later, if the symbol becomes present in other packages in addition to the home package this creates forward pointers from those packages to the symbol, but Common Lisp does not create backward pointers from the symbol to the additional packages in which it is present

Unintern

Cancelling inadvertent symbol creation

When you are working interactively and changing between packages it is common to accidently create a symbol in the wrong package. You can cancel the package entry of a newly created symbol with unintern.

Cancelling inadvertent pass-throughs

Sometimes you export an inherited symbol that you didn't mean to export. Now it is present in a package in which it should only be accessible via inheritance. You can cancel this with unintern which removes its entry in a specified package

The limits of unintern

In more complicated situations a symbol may be present in multiple packages. If you unintern it from its home package Common Lisp will set the symbol's home package to nil. If you print it out Common Lisp will check the home package cell in the symbol's data structure and because it is nilthe symbol will appear as #:symbol indicating that it is an uninterned symbol. However it is still accessible via the packages in which it is present. Worse it will often be present in those packages becauses they export it. So it will remain accessible via all the package that use any of the packages in which it is still present. In the worst case it may still be accessible in the package from which it was unintern - unintern will appear to have partially failed to work.

Remember the package systems non-goal. If you are debugging and the package system has got into a mess it is time to correct your sources, quit the Lisp image and start a fresh build.

Subtraction

So far we have been building new packages