Line termination in OpenMCL


Table of Contents

Overview
Functional Reference
*alternate-line-terminator* [Special Variable]
:EXTERNAL-FORMAT [Keyword Argument]
*DEFAULT-EXTERNAL-FORMAT* [Special Variable]

Overview

MacOSX effectively supports two distinct line-termination conventions. Programs in its Darwin substrate follow the Unix convention of recognizing #\Linefeed as a line terminator; traditional MacOS programs use #\Return for this purpose.

OpenMCL follows the Unix convention on both Darwin and LinuxPPC, but offers (as of version 0.11) some support for reading and writing files that use the MacOS convention as well.

This support (and anything like it) is by nature heuristic: it can successfully hide the distinction between newline conventions much of the time, but could mistakenly change the meaning of otherwise correct programs (typically when files contain both #\Return and #\Linefeed characters or when files contain mixtures of text and binary data). Because of this concern, the default settings of some of the variables that control newline translation and interpretation are somewhat conservative.

Although the issue of multiple newline conventions primarily affects MacOSX users, the functionality described here is available under LinuxPPC as well (and may occasionally be useful there).

None of this addresses (or attempts to address) issues related to the third newline convention ("CRLF") in widespread use (since that convention isn't native to any platform on which OpenMCL currently runs). If OpenMCL is ever ported to such a platform, that issue might be revisited.

Note that some MacOS programs (including some versions of commercial MCL) may use HFS file type information to recognize TEXT and other file types, and so may fail to recognize files created with OpenMCL or other Darwin applications (regardless of line termination issues).

Unless otherwise noted, the symbols mentioned in this documentation are exported from the CCL package.

Functional Reference

*alternate-line-terminator* [Special Variable]

This variable is currently used only by the standard reader macro function for #\; (single-line comments). That function reads successive characters until it reaches EOF, reads a #\Newline, or reads a character EQL to the value of *alternate-line-terminator*. In OpenMCL for Darwin, the value of this variable is initially #\Return; in OpenMCL for LinuxPPC, it's initially NIL.

Their default treatment by the #\; reader macro is the primary way in which #\Return and #\Linefeed differ syntactically; by extending the #\; reader macro to (conditionally) treat #\Return as a comment terminator, that distinction is eliminated. This seems to make LOAD and COMPILE-FILE insensitive to line-termination issues in many cases. It could fail in the (hopefully rare) case where an LF-terminated (Unix) text file contains embedded #\Return characters. The mechanism also isn't adequate to handle cases where newlines are embedded in string constants or other tokens (and presumably should be translated from an external convention to the internal one): it doesn't change what READ-CHAR or READ-LINE "see", and changing that may be necessary to handle some more complicated cases.
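
The comment-skipping rule described above can be modeled in portable Common Lisp. This is only a sketch of the stated behavior, not OpenMCL's actual reader macro: *ALT-TERMINATOR* is a hypothetical stand-in for *alternate-line-terminator*, and SKIP-COMMENT and TEXT-AFTER-COMMENT are illustrative names.

```lisp
;; Hypothetical stand-in for CCL:*ALTERNATE-LINE-TERMINATOR*.
;; #\Return models the Darwin default; NIL models the LinuxPPC default.
(defvar *alt-terminator* #\Return)

;; Consume the rest of a ; comment from STREAM.  Stops (after consuming
;; the terminator) at EOF, at a newline, or at a character EQL to
;; *ALT-TERMINATOR*.
(defun skip-comment (stream)
  (loop for ch = (read-char stream nil nil)
        until (or (null ch)
                  (eql ch #\Newline)
                  (eql ch *alt-terminator*)))
  (values))

;; Return whatever is left of STRING once one leading comment is skipped.
(defun text-after-comment (string)
  (with-input-from-string (in string)
    (skip-comment in)
    (with-output-to-string (out)
      (loop for ch = (read-char in nil nil)
            while ch
            do (write-char ch out)))))

;; CR-terminated comment followed by a form: the form survives.
(text-after-comment (format nil "; note~C(foo)" #\Return))
;; => "(foo)"

;; Same input with the alternate terminator disabled: the comment
;; swallows everything up to EOF.
(let ((*alt-terminator* nil))
  (text-after-comment (format nil "; note~C(foo)" #\Return)))
;; => ""
```

The failure case mentioned above shows up in the same model: for an LF-terminated line such as "; see a", #\Return, "b", the comment ends at the embedded #\Return and the trailing "b" is left to be read as code.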

:EXTERNAL-FORMAT [Keyword Argument]

Per ANSI CL, OpenMCL supports the :EXTERNAL-FORMAT keyword argument to the functions OPEN, LOAD, and COMPILE-FILE. This argument is intended to provide a standard way of providing implementation-dependent information about the format of files opened with an element-type of CHARACTER. This argument can meaningfully take on the values :DEFAULT (the default), :MACOS, :UNIX, or :INFERRED in OpenMCL.

When defaulted to or specified as :DEFAULT, the format of the file stream is determined by the value of the variable CCL:*DEFAULT-EXTERNAL-FORMAT*. See below.

When specified as :UNIX, all characters are read from and written to files verbatim.

When specified as :MACOS, all #\Return characters read from the file are immediately translated to #\Linefeed (#\Newline); all #\Newline (#\Linefeed) characters are written externally as #\Return characters.
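
As a rough illustration of the :MACOS translation, the two directions can be modeled on strings in portable Common Lisp. MACOS-DECODE and MACOS-ENCODE are hypothetical names used only for this sketch; OpenMCL performs the translation inside the file stream, not on whole strings.

```lisp
;; Model of reading under :MACOS: every #\Return becomes #\Newline.
(defun macos-decode (string)
  (substitute #\Newline #\Return string))

;; Model of writing under :MACOS: every #\Newline becomes #\Return.
(defun macos-encode (string)
  (substitute #\Return #\Newline string))

;; A pure CR-terminated text round-trips unchanged.
(let ((external (format nil "one~Ctwo~C" #\Return #\Return)))
  (string= external (macos-encode (macos-decode external))))
;; => T

;; A text that already mixes #\Return and #\Newline does not: after
;; decoding, the two characters can no longer be told apart.
(let ((external (format nil "one~Ctwo~%" #\Return)))
  (string= external (macos-encode (macos-decode external))))
;; => NIL
```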

When specified as :INFERRED and the file is open for input, the first bufferful of input data is examined; if a #\Return character appears in the buffer before the first #\Linefeed, the file stream's external-format is set to :MACOS; otherwise, it is set to :UNIX.
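
The inference rule can be sketched as a function over the first bufferful of data, here represented as a string. INFER-EXTERNAL-FORMAT is a hypothetical name, and the sketch assumes that a buffer containing a #\Return but no #\Linefeed at all counts as :MACOS (the text above doesn't say so explicitly, but that is the shape of a purely CR-terminated file).

```lisp
;; BUFFER is a string holding the first bufferful of a file.
;; Returns :MACOS if a #\Return occurs before the first #\Newline
;; (or, by assumption, if there is a #\Return and no #\Newline at all);
;; otherwise returns :UNIX.
(defun infer-external-format (buffer)
  (let ((cr (position #\Return buffer))
        (lf (position #\Newline buffer)))
    (if (and cr (or (null lf) (< cr lf)))
        :macos
        :unix)))

(infer-external-format (format nil "a~Cb~Cc" #\Return #\Return)) ; => :MACOS
(infer-external-format (format nil "a~%b~Cc" #\Return))          ; => :UNIX
(infer-external-format "no line terminators here")               ; => :UNIX
```

Because only the first bufferful is examined, a #\Return that appears later in an otherwise LF-terminated file has no effect on the result in this model.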

All other values of :EXTERNAL-FORMAT - and any combinations that don't make sense, such as trying to infer the format of a newly-created output file stream - are treated as if :UNIX had been specified. As noted above, the :EXTERNAL-FORMAT argument applies only to files opened with an element-type of CHARACTER; it doesn't apply to binary file streams.

The translation performed when :MACOS is specified or inferred has a somewhat greater chance of doing the right thing than the *alternate-line-terminator* mechanism does; it probably has a somewhat greater chance of doing the wrong thing, as well.

*DEFAULT-EXTERNAL-FORMAT* [Special Variable]

The value of this variable is used when :EXTERNAL-FORMAT is unspecified or specified as :DEFAULT. It can meaningfully be given any of the values :UNIX, :MACOS, or :INFERRED, each of which is interpreted as described above.

Because there's some risk that unsolicited newline translation could have undesirable consequences, the initial value of this variable in OpenMCL is :UNIX.
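
Taken together, the rules for :DEFAULT, for unrecognized values, and for :INFERRED on output can be summarized in a small portable sketch. *DEFAULT-FORMAT* is a hypothetical stand-in for CCL:*DEFAULT-EXTERNAL-FORMAT*, RESOLVE-EXTERNAL-FORMAT is an illustrative name, and treating only :INPUT as eligible for inference is an assumption of this model rather than something the text above spells out.

```lisp
;; Hypothetical stand-in for CCL:*DEFAULT-EXTERNAL-FORMAT*;
;; :UNIX mirrors the conservative initial value described above.
(defvar *default-format* :unix)

;; Map a requested :EXTERNAL-FORMAT value and a stream direction to the
;; format that would actually be in effect under the rules above.
(defun resolve-external-format (requested direction)
  (let ((fmt (if (eq requested :default) *default-format* requested)))
    (cond ((member fmt '(:unix :macos)) fmt)
          ;; Inference needs existing input to examine (assumption:
          ;; only :INPUT streams qualify in this model).
          ((and (eq fmt :inferred) (eq direction :input)) :inferred)
          ;; Anything else is treated as :UNIX.
          (t :unix))))

(resolve-external-format :default :input)    ; => :UNIX
(resolve-external-format :macos :output)     ; => :MACOS
(resolve-external-format :inferred :output)  ; => :UNIX
(resolve-external-format :latin-1 :input)    ; => :UNIX

;; Rebinding the default changes what :DEFAULT means.
(let ((*default-format* :inferred))
  (resolve-external-format :default :input)) ; => :INFERRED
```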