Next: G. MEMORY ALLOCATION AND Up: LISP 1.5 Programmer's Manual Previous: E. OVERLORD - THE

F. LISP INPUT AND OUTPUT

This appendix describes the LISP read and write programs and the character-manipulation programs. The read and write programs allow one to read and write S-expressions. The character-manipulating programs allow one to read and write individual characters, to process them internally, to break atomic symbols into their constituent characters, and to form atomic symbols from individual characters.

The actual input/output routines are identical for both the LISP read and write, and the character read and write. Input is always from either SYSPIT or the card reader. Printed output is always written on SYSPOT and/or the on-line printer. Punched output is always on SYSPPT and/or the on-line printer. The manner in which these choices are controlled was described in Appendix E.

LISP READ PRINT and PUNCH

The LISP read program reads S-expressions in the form of BCD characters and translates them into list structures. It recognizes the delimiters "(" and ")" and the separators ".", "," and (blank). The comma and blank are completely equivalent.

An atomic symbol, when read in, is compared with existing atomic symbols. If it has not been encountered previously, a new atomic symbol with its property list is created. All atomic symbols except numbers and gensyms are on a list called the object list. This list is made of sublists called buckets. The atomic symbols are thrown into buckets by a hash process on their names. This speeds up the search that must occur during reading.
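The bucket search can be pictured with a small model. The following Python sketch is not LISP 1.5 code; the bucket count, the hash function, and the names Atom, bucket_index, and lookup_or_create are assumptions made for illustration only.

```python
# Illustrative model only: the bucket count and hash are assumptions,
# not the values used by the LISP 1.5 system.
NUM_BUCKETS = 8


class Atom:
    """Stand-in for an atomic symbol with a print name and property list."""

    def __init__(self, pname):
        self.pname = pname
        self.plist = {}  # stand-in for the property list


def bucket_index(pname):
    # Hypothetical hash: sum of character codes modulo the bucket count.
    return sum(ord(ch) for ch in pname) % NUM_BUCKETS


# The object list is a list of sublists (buckets).
object_list = [[] for _ in range(NUM_BUCKETS)]


def lookup_or_create(pname):
    """Return the existing atom with this name, or create and record one."""
    bucket = object_list[bucket_index(pname)]
    for atom in bucket:  # only one bucket is searched, not the whole list
        if atom.pname == pname:
            return atom
    atom = Atom(pname)
    bucket.append(atom)
    return atom
```

Reading the same name twice returns the same object, and only the atoms sharing a bucket with that name are compared during the search.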

For the purpose of giving a more extended definition of an atomic symbol than was given in Section I, the 48 BCD characters are divided into the following categories.

        Class A A B C ...Z = * /
        Class B 0 1 2 3 4 5 6 7 8 9 + - (11 punch)
        Class C ( ) , . (blank)
        Class D $
        Class E - (4-8 punch)

The 4-8 punch should not be used.

Symbols beginning with a Class B character are interpreted as numbers. Some sort of number conversion is attempted even if it does not make sense.

An ordinary atomic symbol is a sequence of up to 30 characters from classes A, B, and D, with the following restrictions.

a. The first character must not be from class B.

b. The first two characters must not be $ $.

c. It must be delimited on either side by a character from class C.
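Restrictions a and b, together with the length and character-class rules, can be checked mechanically. The following Python sketch is a model only; the name is_ordinary_atom is hypothetical, the single "-" stands for the 11-punch minus, and restriction c (delimiting by a class C character) is assumed to be handled by whatever splits the input into tokens.

```python
import string

# Character classes as listed above. Python has a single minus sign, so
# "-" here stands for the 11-punch minus of class B; class E is not modeled.
CLASS_A = set(string.ascii_uppercase) | set("=*/")
CLASS_B = set(string.digits) | set("+-")
CLASS_D = {"$"}
ALLOWED = CLASS_A | CLASS_B | CLASS_D


def is_ordinary_atom(token):
    """Check a token against the rules for an ordinary atomic symbol."""
    if not 1 <= len(token) <= 30:
        return False
    if any(ch not in ALLOWED for ch in token):
        return False
    if token[0] in CLASS_B:       # restriction a: would be read as a number
        return False
    if token.startswith("$$"):    # restriction b: reserved for the $$ form
        return False
    return True
```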

There is a provision for reading in atomic symbols containing arbitrary characters.

This is done by punching the form $$dsd, where s is any string of up to 30 characters, and d is any character not contained in the string s. Only the string s is used in forming the print name of the atomic symbol; d and the dollar signs will not appear when the atomic symbol is printed out.

Examples

        Input         will print as
        $$XAX         A
        $$()))(       )))
        $$_UV.)_      UV.)
        $$/_./        _.
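The rule for the form $$dsd can be sketched as follows. This is a Python model, not the LISP read program; the function name read_dollar_form is hypothetical, and the treatment of malformed input (raising an error) is an assumption.

```python
def read_dollar_form(text):
    """Return the print name s from a token of the form $$dsd."""
    if not text.startswith("$$") or len(text) < 4:
        raise ValueError("not a $$dsd form")
    d = text[2]                 # the delimiter character chosen by the user
    end = text.find(d, 3)       # first later occurrence of d closes the string
    if end == -1:
        raise ValueError("closing delimiter missing")
    s = text[3:end]
    if len(s) > 30:
        raise ValueError("print name longer than 30 characters")
    return s
```

Applied to the four inputs in the table above, the sketch yields A, ))), UV.) and _. respectively.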

The operation of the read program is critically dependent upon the parsing of left and right parentheses. If an S-expression is deficient in one or more right parentheses, reading will continue into the next S-expression. An unmatched right parenthesis, or a dot that is out of context, will terminate reading and cause an error diagnostic.
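The effect of missing or extra parentheses can be illustrated with a simple depth counter. The following Python sketch is a model under stated assumptions, not the LISP read program: the name end_of_first_expression is hypothetical, only parenthesized expressions are handled, and the $$ form and out-of-context dots are ignored.

```python
def end_of_first_expression(text):
    """Return the index just past the first complete parenthesized
    expression, or None if the text ends before the parentheses balance."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                # An unmatched right parenthesis terminates reading.
                raise SyntaxError("unmatched right parenthesis")
            depth -= 1
            if depth == 0:
                return i + 1
    # Deficient in right parentheses: a reader would continue into
    # whatever follows.
    return None
```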

The read program is called at the beginning of each packet to read doublets for evalquote until it comes to the S-expression STOP. read may also be used explicitly by the programmer. In this case, it will begin reading with the card following the STOP card because the read buffer is cleared by evalquote after the doublets and STOP have been read. After this, card boundaries are ignored, and one S-expression is read each time read is called. read has no arguments. Its value is the S-expression that it reads.

The pseudo-functions print and punch write one S-expression on the printed or punched output, respectively. In each case, the print or punch buffer is written out and cleared so that the next section of output begins on a new record.

prin1 is a pseudo-function that prints its argument, which must be an atomic symbol, and does not terminate the print line (unless it is full).

terpri prints what is left in the print buffer, and then clears it.

Characters and Character Objects

Each of the sixty-four 6-bit binary numbers corresponds to a BCD character, if we include illegal characters. Therefore, in order to manipulate these characters via LISP functions, each of them has a corresponding object. Of the 64 characters, 48 correspond to characters on the key punch, and the key-punch character is simply that character. The print names of the remaining characters will be described later. When a LISP function is described which has a character as either value or argument, we really mean that it has an object corresponding to a character as value or argument, respectively.

The first group of legal characters consists of the letters of the alphabet from A to Z. Each letter is a legitimate atomic symbol, and therefore may be referred to in a straightforward way, without ambiguity.

The second group of legal characters consists of the digits from 0 to 9. These must be handled with some care because if a digit is considered as an ordinary integer rather than a character a new nonunique object will be created corresponding to it, and this object will not be the same as the character object for the same digit, even though it has the same print name. Since the character-handling programs depend on the character objects being in specific locations, this will lead to error.

The read program has been arranged so that digits 0 through 9 read in as the corresponding character objects. These may be used in arithmetic just as any other number but, even though the result of an arithmetic operation lies between 0 and 9, it will not point to the corresponding character object. Thus character objects for 0 through 9 may be obtained only by reading them or by manipulation of print names.

The third group of legal characters is the special characters. These correspond to the remaining characters on the key punch, such as "$" and "=". Since some of the characters are not legitimate atomic symbols, there is a set of special character-value objects which can be used to refer to them.

A typical special character-value object, say DOLLAR, has the following structure

|  .----.---.  .-----.---.  .---.---.  .-----.---.  .---.---.
`->| -1 |   |->|PNAME|   |->|   |   |->|APVAL|   |->|   |***|
   `----'---'  `-----'---'  `---'---'  `-----'---'  `---'---'
                              |                       |
                            .-V-.---.               .-V-.---.
                            |   |***|               | $ |***|
                            `---'---'               `---'---'
                              |
                            .-V------.
                            | DOLLAR |
                            `--------'

Thus "DOLLAR" has value "$".

The special character value objects and their permanent values are:

        DOLLAR         $
        SLASH         /
        LPAR            (
        RPAR            )
        COMMA         ,
        PERIOD         .
        PLUSS         +
        DASH            - (11 punch)
        STAR            *
        BLANK         blank
        EQSIGN         =

The following examples illustrate the use of their objects and their raison d'etre. Each example consists of a doublet for evalquote followed by the result.

Examples

EVAL (DOLLAR NIL) value is "$"
EVAL ((PRINT PERIOD) NIL) value is "." and "." is also printed.

The remaining characters are all illegal as far as the key punch is concerned. The two characters corresponding to 12 and 72 have been reserved for end-of-file and end-of-record, respectively. The end-of-file character has print name $EOF$ and the end-of-record character has print name $EOR$; corresponding to these character objects are two character value objects EOR and EOF, whose values are $EOR$ and $EOF$ respectively. The rest of the illegal character objects have print names corresponding to their octal representations preceded by $IL and followed by $. For instance, the character 77 corresponds to a character object with print name $IL77$.

The character objects are arranged in the machine so that their first cells occupy successive storage locations. Thus it is possible to go from a character to the corresponding object or conversely by a single addition or subtraction. This speeds up character-handling considerably, because it isn't necessary to search property lists of character objects for their print names; the names may be deduced from the object locations.
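The address arithmetic can be sketched as follows. This Python model is illustrative only: the base address CHAR_BASE and the function names are assumptions, and no actual storage location from the system is implied.

```python
# Hypothetical address of the first cell of the object for character code 0.
CHAR_BASE = 0o1000


def object_address(code):
    """Go from a 6-bit character code to its object by one addition."""
    if not 0 <= code < 64:
        raise ValueError("character codes are 6-bit numbers")
    return CHAR_BASE + code


def character_code(address):
    """Go from a character object back to its code by one subtraction."""
    code = address - CHAR_BASE
    if not 0 <= code < 64:
        raise ValueError("address is not a character object")
    return code
```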

Packing and Unpacking Characters

When a sequence of characters is to be made into either a print name or a numerical object, the characters must be put one by one into a buffer called BOFFO. BOFFO is used to store the characters until they are to be combined. It is not available explicitly to the LISP programmer, but the character-packing functions are described in terms of their effects on BOFFO. At any point, BOFFO contains a sequence of characters. Each operation on BOFFO either adds another character at the end of the sequence or clears BOFFO, i.e., sets BOFFO to the null sequence. The maximum length of the sequence is 120 characters; an attempt to add more characters will cause an error.

The character-packing functions are:

1. pack [c] : SUBR pseudo-function

The argument of pack must be a character object. pack adds the character c at the end of the sequence of characters in BOFFO. The value of pack is NIL.

2. clearbuff [ ] : SUBR pseudo-function

clearbuff is a function of no arguments. It clears BOFFO and has value NIL. The contents of BOFFO are undefined until a clearbuff has been performed.

3. mknam [ ] : SUBR pseudo-function

mknam is a function of no arguments. Its value is a list of full words containing the characters in BOFFO in packed BCD form. The last word is filled out with the illegal character code 77 if necessary. After mknam is performed, BOFFO is automatically cleared. Note that intern [mknam[ ]] yields the object whose print name is in BOFFO.

4. numob [ ] : SUBR pseudo-function

numob is a function of no arguments. Its value is the numerical
object represented by the sequence of characters in BOFFO.
(Positive decimal integers from 0 to 9 are converted so as to
point to the corresponding character object.)

5. unpack [x] : SUBR pseudo-function

This function has as argument a pointer to a full word. unpack considers the full word to be a set of 6 BCD characters, and has
as value a list of these characters ignoring all characters
including and following the first 77.

6. intern [pname] : SUBR pseudo-function

This function has as argument a pointer to a PNAME type structure
such as -

.---.---.   .---.---.
|   |   |-->|   |***|
`---'---'   `---'---'
  |           |
.-V------.  .-V------.
| EXAMPL |  | E????? |
`--------'  `--------'

Its value is the atomic symbol having this print name. If it
does not already exist, then a new atomic symbol will be
created.
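The behavior of BOFFO under pack, clearbuff, mknam, and unpack can be modeled as below. This is a Python sketch under stated assumptions, not the system's code: characters are represented as 6-bit integer codes, full words as 36-bit integers, the class name Boffo is hypothetical, and the codes used in any example are placeholders rather than real BCD codes. numob and intern are not modeled.

```python
FILL = 0o77          # illegal character code used to fill out the last word
WORD_CHARS = 6       # six 6-bit characters per full word
BOFFO_LIMIT = 120    # maximum number of characters in BOFFO


class Boffo:
    def __init__(self):
        self._codes = []

    def clearbuff(self):
        self._codes = []
        return None  # stands for NIL

    def pack(self, code):
        if not 0 <= code < 64:
            raise ValueError("character codes are 6-bit numbers")
        if len(self._codes) >= BOFFO_LIMIT:
            raise OverflowError("BOFFO holds at most 120 characters")
        self._codes.append(code)
        return None  # stands for NIL

    def mknam(self):
        """Return a list of packed full words and clear the buffer."""
        codes, self._codes = self._codes, []
        words = []
        for i in range(0, len(codes), WORD_CHARS):
            chunk = codes[i:i + WORD_CHARS]
            chunk = chunk + [FILL] * (WORD_CHARS - len(chunk))
            word = 0
            for code in chunk:
                word = (word << 6) | code
            words.append(word)
        return words


def unpack(word):
    """List the characters of one full word, stopping at the first 77."""
    codes = []
    for shift in range(30, -1, -6):
        code = (word >> shift) & 0o77
        if code == FILL:
            break
        codes.append(code)
    return codes
```

Packing seven characters and calling mknam gives two words, the second holding one character followed by five fill codes, which matches the two-word structure pictured under intern.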

The Character-Classifying Predicates

1. liter [c] : SUBR predicate

liter has as argument a character object. Its value is T if the
character is a letter of the alphabet, and F otherwise.

2. digit [c] : SUBR predicate

digit has as argument a character object. Its value is T if the
character is a digit between 0 and 9, and F otherwise.

3. opchar [c] : SUBR predicate

opchar has as argument a character object. Its value is T if
the character is +, -, /, *, or =, and F otherwise. opchar treats both minus signs equivalently.

4. dash [c] : SUBR predicate

dash has as argument a character object. Its value is T if the
character is either an 11-punch minus or an 8-4 punch minus, and
F otherwise.
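The four predicates can be summarized in a short model. The following Python sketch represents a character object as a one-character string, which is an assumption; Python has only one minus sign, so the two minus punches cannot be told apart here and both are represented by "-".

```python
import string

LETTERS = set(string.ascii_uppercase)
DIGITS = set(string.digits)
OPERATORS = set("+-/*=")


def liter(c):
    """T if the character is a letter of the alphabet."""
    return c in LETTERS


def digit(c):
    """T if the character is a digit between 0 and 9."""
    return c in DIGITS


def opchar(c):
    """T if the character is +, -, /, *, or =."""
    return c in OPERATORS


def dash(c):
    """T if the character is a minus sign (either punch, in this model)."""
    return c == "-"
```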

The Character-Reading Functions

The character-reading functions make it possible to read
characters one by one from input.

There is an object CURCHAR whose value is the character most
recently read (as an object). There is also an object CHARCOUNT
whose value is an integer object giving the column just read on
the card, i.e., the column number of the character given by
CURCHAR. There are three functions which affect the value of
CURCHAR:

1. startread [ ] : SUBR pseudo-function

startread is a function of no arguments which causes a new card
to be read. The value of startread is the first character on
that card, or more precisely, the object corresponding to the
first character on the card. If an end-of-file condition
exists, the value of startread is $EOF$. The value of CURCHAR
becomes the same as the output of startread, and the value of
CHARCOUNT becomes 1. Both CURCHAR and CHARCOUNT are undefined
until a startread is performed. A startread may be
performed before the current card has been completely read.

2. advance [ ] : SUBR pseudo-function

advance is a function of no arguments which causes the next
character to be read. The value of advance is that character.
After the 72nd character on the card has been read, the next
advance will have value $EOR$. After reading $EOR$, the next
advance will act like a startread, i.e., will read
the first character of the next card unless an end-of-file condition
exists. The new value of CURCHAR is the same as the output of
advance; executing advance also increases the value of
CHARCOUNT by 1. However, CHARCOUNT is undefined when CURCHAR is either
$EOR$ or $EOF$.

3. endread [ ] : SUBR pseudo-function

endread is a function of no arguments which causes the
remainder of the card to be read and ignored. endread sets
CURCHAR to $EOR$ and leaves CHARCOUNT undefined; the value of
endread is always $EOR$. An advance following endread
acts like a startread. If CURCHAR already has value $EOR$ and
endread is performed, CURCHAR will remain the same and
endread will, as usual, have value $EOR$.
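The interplay of startread, advance, and endread with CURCHAR and CHARCOUNT can be modeled as a small state machine. The following Python sketch is an illustration under stated assumptions, not the system's code: the class name CardReader is hypothetical, cards are given as strings padded to 72 columns, an undefined CHARCOUNT is represented by None, and the behavior of advance before the first startread or after $EOF$ (not specified above) is a choice made for the model.

```python
EOR = "$EOR$"
EOF = "$EOF$"
CARD_WIDTH = 72


class CardReader:
    def __init__(self, cards):
        # Each card is padded with blanks (or cut) to exactly 72 columns.
        self._cards = [c.ljust(CARD_WIDTH)[:CARD_WIDTH] for c in cards]
        self._next = 0
        self._card = None
        self.curchar = None    # undefined until the first startread
        self.charcount = None  # None stands for "undefined"

    def startread(self):
        if self._next >= len(self._cards):
            self._card = None
            self.curchar, self.charcount = EOF, None
            return EOF
        self._card = self._cards[self._next]
        self._next += 1
        self.curchar, self.charcount = self._card[0], 1
        return self.curchar

    def advance(self):
        if self.curchar is None:
            # Assumption: the text leaves this case undefined.
            raise RuntimeError("advance before the first startread")
        if self.curchar == EOF:
            return EOF  # assumption: end of file persists
        if self.curchar == EOR:
            return self.startread()  # acts like a startread
        if self.charcount == CARD_WIDTH:
            self.curchar, self.charcount = EOR, None
            return EOR
        self.charcount += 1
        self.curchar = self._card[self.charcount - 1]
        return self.curchar

    def endread(self):
        # The remainder of the card is skipped.
        self.curchar, self.charcount = EOR, None
        return EOR
```

With two cards, a startread followed by 71 calls to advance reaches column 72; the next advance has value $EOR$, and the one after that returns the first character of the second card.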

Diagnostic Function

error1 [ ] : SUBR pseudo-function

error1 is a function of no arguments and has value NIL. It
should be executed only while reading characters from a card (or
tape). Its effect is to mark the character just read, i.e.,
CURCHAR, so that when the end of the card is reached, either by
successive advances or by an endread, the entire card
is printed out along with a visual pointer to the defective
character. For a line consisting of ABCDEFG followed by
blanks, a pointer to C would look like this:

                          V
                        ABCDEFG
                          A

If error1 is performed an even number of times on the same
character, the A will not appear. If error1 is performed before
the first startread or while CURCHAR has value $EOR$ or $EOF$, it
will have no effect. Executing a startread before the current
card has been completed will cause the error1 printout to be
lost. The card is considered to have been completed when CURCHAR
has been set to $EOR$. Successive endreads will cause the
error1 printout to be reprinted. Any number of characters in
a given line may be marked by error1.
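The layout of the printout can be sketched as follows. This Python model is illustrative only: the function name error1_printout and its arguments are hypothetical, and it assumes that the V is printed for every marked column while the A appears only for an odd number of marks, which is one reading of the description above.

```python
from collections import Counter

CARD_WIDTH = 72


def error1_printout(card, marked_columns):
    """Return the three printed lines for a card.

    marked_columns holds one 1-based column number per error1 call.
    """
    counts = Counter(marked_columns)
    above = [" "] * CARD_WIDTH
    below = [" "] * CARD_WIDTH
    for column, times in counts.items():
        if not 1 <= column <= CARD_WIDTH:
            raise ValueError("column must be between 1 and 72")
        above[column - 1] = "V"
        if times % 2 == 1:  # assumption: an even count suppresses the A
            below[column - 1] = "A"
    return ["".join(above).rstrip(), card.rstrip(), "".join(below).rstrip()]
```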
