Literate Programming - Issues and Problems

Kurt Nørmark

Department of Computer Science
Aalborg University

Abstract. The purpose of this paper is to put forward a number of arguments for an improved practice of program documentation, in the direction of literate programming. The paper is structured as a series of issues, each of which discusses a particular point of interest.


This is a note about 'issues and problems' in the discipline called Literate Programming. The term 'Literate Programming' was coined by Knuth in his 1984 paper, which also introduced a particular tool for literate programming.

Since then, a number of researchers have worked sporadically on the ideas as well as on tools for literate programming. But it is probably fair to say that the ideas have not had any significant impact on 'the state of the art of software development'.

It may sound a little old-fashioned to continue the discussion about a 'software crisis'. However, every now and then it becomes painfully obvious that something is wrong in the software industry. One very clear and well-known example is the 'Year 2000 problem'. Here, an enormous body of programs needs to be updated because of a relatively naive problem, and the cost of the remedy is estimated at a huge amount of money. One of the important underlying difficulties is to understand the programs well enough to extend the representation of years from two to four (or more) digits. It is interesting to imagine these programs as literate programs, and to consider how much easier it would be to attack the Year 2000 problem in a literate program than in a traditional, 'illiterate' program.

In the following, we discuss issues and problems in literate programming in a concise style.

The name of the game. Maybe one of the problems is the term itself: 'Literate Programming'. The real issue is more down to earth: 'program documentation' and 'the relations between pieces of program and pieces of documentation'. Very few of us share Knuth's ambition of writing 'world program literature'. Most of us are satisfied if we succeed in writing well-documented programs that can be understood by fellow programmers - today and tomorrow.

Ideas, not a particular tool. Many novices who are introduced to literate programming identify its ideas with Knuth's original tools, namely WEB and its derivatives. For most programmers, TeX, LaTeX (or similar text formatting languages) together with WEB are far away from mainstream modern text processing and programming environments. As a consequence, it is a challenge to re-emphasize the original ideas of literate programming, and to envision better and more modern tool support for them.

Pragmatism, not academics. Many students object that the ideas of literate programming are academic, in the sense of being speculative and without practical purpose. The scepticism seems to be justified by the fact that very little practical program development is done 'the literate way'. We need to sell the ideas of literate programming more convincingly, such that the real issues, problems and possibilities become clear to the next generation of software developers.

Will the investment pay off? One of the main ideas behind literate programming is to formulate - and write down - the understanding of the program, and to connect this understanding to the traditional program source. Many programmers see this as a major investment, and it is not obvious to the managers of a software project when - or even if - the investment will pay off. This may be seen as a major problem, but I think the focus is wrong. Rather, we should discuss the problem raised in the next issue.

The forgotten understanding. By and large, it is necessary to understand a problem in order to program a solution to it. To program is to understand. However, the understanding is very difficult to extract from a traditional implementation written in some standard programming language. In some sense, the program must be decoded and decrypted in order to recover some of the original understanding embedded in it. The main points from these observations are the following:
  • The major investment is to achieve the understanding prior to programming.
  • It is a minor additional effort to formulate and articulate it - to write down the understanding.
  • Moreover, being forced to formulate the understanding adds quality to the solution. In many cases it turns out during the 'formulation phase' that the understanding is not deep or good enough. It is much better to realize such problems early (at 'program explanation time') than later (e.g., at 'program test time' or 'program application time').

The educational challenge. Professionals in the software industry are most likely a lost cause when it comes to a better program documentation discipline. Misconceptions such as 'self-documenting programs' seem to thrive among software professionals. If there is hope, it lies with the next generation of programmers, who are being educated at schools and universities today (or will be tomorrow). It is probably possible - although not necessarily easy - to influence students' attitudes toward the value of 'high quality and articulated program understanding'. However, in order to do so we need to demonstrate a pragmatic and reasonable process, including integrated tool support, for carrying out a 'program understanding and development effort'. Such a demonstration cannot be based on WEB or WEB-like literate programming tools. Thus, in order to address the educational challenge, we need access to much better tools. It is not unrealistic to assume that such tools require a major rework of some of the ideas underlying literate programming.

Paper is not the main program medium. In Knuth's work, beautiful literate programs were woven from ugly mixes of markup in combined program and documentation files. The literate program was directly targeted at paper and books. For obvious reasons, programmers do not very often print their programs on paper (and, as argued above, few of us intend to publish our programs in a book). Within hours, such a printout is no longer up to date. Furthermore, a paper representation cannot exploit the computer's dynamic potential, such as outlining and searching.

The primary program medium is the screen of the computer on which we develop the program. This calls for 'you can understand what you see' (YCUWYS) screen presentations. How far we should go towards 'first class program and documentation typography' directly in the tools of the software development environment is worth a good discussion. However, it is beyond reasonable discussion that a WEB-like program source is inadequate as the primary 'face of the program'.

The development of the Internet in recent years makes it attractive to present 'literate programs' on the Internet. We can think of a World Wide Web browser as a new and attractive medium for program presentation. This is already a fact within the Java culture (although the Javadoc approach has little to do with the ideals of literate programming). We can even imagine 'World Wide program development efforts' via Internet browsers.

The documentation paradigm versus the programming paradigm.

Literate programming is one possible, and potentially very attractive, program documentation paradigm. It seems that the ideas of literate programming can be applied in every programming paradigm (imperative, functional, logic, or object-oriented). However, the detailed elaboration of literate programming depends on the paradigm - or style - of programming. Let us look at one example of particular interest and importance:

The computer science and programming pioneers, like Wirth and Knuth, used a different programming style from the one we recommend today. One notable difference is the use of procedure abstractions. When Wirth and Knuth wrote their influential books and programs, procedure calls were used with care, partly because they were relatively expensive. Consequently, they needed an extra level of structuring within a program or procedure. 'Programming by stepwise refinement' as well as 'literate programming' can be seen as techniques and representations that remedy this problem. The use of many small (procedure) abstractions goes in another direction. If you have attempted to use literate programming on a program (perhaps an object-oriented program) with lots of small abstractions, you will probably have noticed a misfit between, on the one hand, being forced to name literate program fragments (scraps) and, on the other hand, the named program abstractions. We do not want to leave the impression that this is a major conceptual problem, but it is the author's experience that we need a variation of literate programming that is well suited to programs with many small (named) abstractions.

It should be expected that other similar interferences between the documentation paradigm and the programming paradigm will occur.

Documentation in program, or program in documentation.

Most programmers are used to the 'documentation lives in the program' approach. Here, documentation takes the form of program comments. Usually, the comments are simply discarded when the program is processed. Less commonly, tools process the comments in order to generate program documentation. Literate programming represents the opposite approach: in a literate program, 'the program lives in the documentation'. Given this, it is a bit more difficult to process the program, because the program first needs to be extracted from the documentation and assembled piece by piece. The literate program itself, on the other hand, can be processed rather directly to produce high quality and 'explanation rich' documentation. The latter is, indeed, the major point of the second approach.
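
To make the extraction step concrete, here is a minimal sketch of 'tangling' a literate program, written in Python. It assumes a markup loosely modelled on noweb, which this paper does not prescribe: a line <<name>>= opens a named program fragment (scrap), a line containing only @ returns to documentation, and a line <<name>> inside a scrap refers to another scrap. The function names collect_chunks and tangle are hypothetical. The program is assembled by expanding a root scrap recursively, while all documentation lines are dropped.

```python
import re

# Markup assumed for this sketch (loosely modelled on noweb; hypothetical here):
#   <<name>>=   on its own line opens a program fragment (scrap) called "name"
#   @           on its own line closes the scrap; documentation follows
#   <<name>>    on its own line inside a scrap refers to another scrap
DEF = re.compile(r"^<<(.+)>>=\s*$")
REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def collect_chunks(source: str) -> dict[str, list[str]]:
    """Gather the code lines of every scrap; documentation lines are dropped."""
    chunks: dict[str, list[str]] = {}
    current = None  # name of the scrap being read, or None inside documentation
    for line in source.splitlines():
        m = DEF.match(line)
        if m:
            current = m.group(1)
            chunks.setdefault(current, [])  # repeated definitions are appended
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks: dict[str, list[str]], name: str,
           indent: str = "", active: tuple[str, ...] = ()) -> list[str]:
    """Assemble scrap `name`, recursively expanding references to other scraps."""
    if name in active:
        raise ValueError(f"cyclic scrap reference: {name}")
    out: list[str] = []
    for line in chunks[name]:
        m = REF.match(line)
        if m:
            # Keep the reference's indentation for every line of the expansion.
            out.extend(tangle(chunks, m.group(2), indent + m.group(1),
                              active + (name,)))
        else:
            out.append(indent + line)
    return out


LITERATE = """\
The program greets the user. Its overall structure is a main function:
<<main>>=
def main():
    <<greet>>
@
Greeting is a single print statement.
<<greet>>=
print("Hello")
@
"""

print("\n".join(tangle(collect_chunks(LITERATE), "main")))
# Output:
# def main():
#     print("Hello")
```

The same scrap name may be defined more than once; the sketch appends the definitions in order, as noweb does. Weaving, the opposite direction, would instead keep the documentation lines and typeset the scraps.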

It may, however, very well be worthwhile to consider more symmetric relationships between program and documentation. Instead of embedding one kind of information in the other, we can model documentation fragments and program fragments as separate entities tied together by relations. The relations can be implemented in a number of different ways, e.g., as hypertext links or via database technology.
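
The symmetric alternative can be sketched with database technology, here Python's standard sqlite3 module. The schema (tables program_fragment, doc_fragment and relation, and the relation kind 'explains') and the helper functions docs_for and programs_for are hypothetical illustrations, not a design taken from this paper. Neither kind of fragment is embedded in the other, and the relation table lets us navigate from program to documentation and back with equally simple queries.

```python
import sqlite3

# Hypothetical schema: program fragments and documentation fragments are
# separate entities; neither is embedded in the other. The relation table
# ties them together, so navigation is equally easy in both directions.
SCHEMA = """
CREATE TABLE program_fragment (id TEXT PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE doc_fragment (id TEXT PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE relation (
    program_id TEXT NOT NULL REFERENCES program_fragment(id),
    doc_id     TEXT NOT NULL REFERENCES doc_fragment(id),
    kind       TEXT NOT NULL,
    PRIMARY KEY (program_id, doc_id, kind)
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with foreign-key checking switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def docs_for(conn: sqlite3.Connection, program_id: str) -> list[tuple[str, str]]:
    """Documentation fragments related to a given program fragment."""
    return conn.execute(
        "SELECT DISTINCT d.id, d.body FROM relation AS r "
        "JOIN doc_fragment AS d ON d.id = r.doc_id "
        "WHERE r.program_id = ? ORDER BY d.id", (program_id,)).fetchall()


def programs_for(conn: sqlite3.Connection, doc_id: str) -> list[tuple[str, str]]:
    """Program fragments related to a given documentation fragment."""
    return conn.execute(
        "SELECT DISTINCT p.id, p.body FROM relation AS r "
        "JOIN program_fragment AS p ON p.id = r.program_id "
        "WHERE r.doc_id = ? ORDER BY p.id", (doc_id,)).fetchall()


conn = open_store()
conn.executemany("INSERT INTO program_fragment VALUES (?, ?)", [
    ("parse_year", "def parse_year(s): return 1900 + int(s)"),
    ("format_year", "def format_year(y): return f'{y % 100:02d}'"),
])
conn.execute("INSERT INTO doc_fragment VALUES (?, ?)",
             ("year-repr", "Years are stored as two digits, assuming the 1900s."))
conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", [
    ("parse_year", "year-repr", "explains"),
    ("format_year", "year-repr", "explains"),
])

print(docs_for(conn, "parse_year"))
# [('year-repr', 'Years are stored as two digits, assuming the 1900s.')]
print([pid for pid, _ in programs_for(conn, "year-repr")])
# ['format_year', 'parse_year']
```

A hypertext realization would replace the relation table by links in both directions; in either case it is the relation, not the nesting of one text inside the other, that carries the connection.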

Last updated: 13.8.1998, 15:27:06