VP-SE Research Group (C)

Software Engineering Techniques for CAL


Abstract:

The development of Computer Aided Learning software (CAL) calls for a large scale project if one does not want to limit oneself to lessons of purely personal utility. In our opinion, the development of such software is a team effort. The most common approach in this field is to use authoring languages, imposing a rather rigid framework on their users. As opposed to this approach, we have chosen to use a very simple formalism for the specification of the lessons and a general purpose programming language to implement these specifications. A whole development environment has therefore been developed to run on a graphic workstation, in order to ease the translation of the specifications into programs that could run on different target machines. This development environment includes, among other tools, a graphic script editor, an automatic program generator and a synchronous multi-window editor to hand-code the parts of the CAL program not generated automatically.

Keywords:

Computer aided learning, CAL, Computer based learning, CBL, software engineering, automatic programming, graphical specification, development environment.

1. Introduction

At the time our team entered the Computer Aided Learning field and more specifically the development of tutorials with the mastery learning approach, the major trend seemed to be towards the elaboration and use of "authoring systems" intended for non-programmers who wanted to develop their own lessons autonomously (e.g. Plato, the Diane project [5]). Very little was done on using a general purpose programming language, for which collaboration between teachers and programmers would be needed.

The major argument for using authoring systems is usually that authors do not need to know any programming language in order to use them. This is why they are preferred by teachers who want to develop their own lessons relatively cheaply. The main problem with this is that authoring systems that are easily learned have to be simplistic, leading to poor quality lessons. Those offering more complex primitives (e.g. programming possibilities) are harder to use and thus are rarely used by the authors themselves. Moreover, these programming possibilities barely conform to such modern programming criteria as security and readability.

In order not to impose any limitation on the teachers who want to develop CAL lessons, we have chosen to use a two phase approach in which the teachers specify the behaviour of a lesson in a detailed script (pedagogical design phase) and then a team of coders implement this script using a general purpose programming language (coding phase). The programming language has to be modular in order to support team work well. Independently from the programming language, we use a graphical specification formalism developed by Prof. A.Bork's team [1] at the University of California, Irvine. Albeit simple, this formalism allows a complete and detailed description (in natural language) of a lesson. Similarly to a movie script, but less linearly, the dialogue between the computer and the learner can be described, specifying the text that has to be output (messages), actions that have to be done (instructions to the coder) and the sequencing of operations corresponding to the user's answers or actions.

2. Specification formalism

The specification formalism we use is built on just three basic primitives, in order to keep it simple enough to be used by teachers who are not necessarily knowledgeable in computer use. These primitives are: text that has to appear on the screen, instructions to the coder, and test boxes (containing an answer analysis or action criterion). The specification of a lesson can thus be represented as a directed graph, in which each node corresponds to one of the three basic primitives and where the edges indicate the sequence of operations (Figures 1 and 2).

The various analysis criteria of the learner's answer (or action) are specified in a sequence of adjacent test boxes. The edges originating from a test box are generally numbered. They indicate the next action that has to be done on the first, second, etc., occasion that the criterion specified in the box has been satisfied. If the criterion is not satisfied, the one just below (if present) is then evaluated, and so on. If there is no further test box below, the edges originating from the bottom of the last box indicate the actions to be taken.
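The evaluation rule for a column of adjacent test boxes can be sketched in a few lines of Python. This is only an illustration, not the code produced by our tools: the names `CriterionBox` and `next_node` are hypothetical, the edges of a box are assumed to be numbered 1..n without gaps, and the choice to reuse the highest numbered edge once the occasions run out is an assumption that the formalism itself does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CriterionBox:
    """One test box: a criterion plus its numbered outgoing edges."""
    criterion: Callable[[str], bool]
    edges: Dict[int, str]   # occasion number -> name of the next node
    hits: int = 0           # how many times the criterion has been satisfied


def next_node(boxes: List[CriterionBox], answer: str, fallthrough: str) -> str:
    """Evaluate adjacent test boxes from top to bottom."""
    for box in boxes:
        if box.criterion(answer):
            box.hits += 1
            # Assumption: after the last numbered occasion, keep using the last edge.
            occasion = min(box.hits, max(box.edges))
            return box.edges[occasion]
    # No criterion satisfied: follow the edge leaving the bottom of the last box.
    return fallthrough


boxes = [
    CriterionBox(lambda a: a.strip() == "4", {1: "praise"}),
    CriterionBox(lambda a: a.strip().isdigit(), {1: "hint", 2: "show_answer"}),
]
for answer in ["5", "7", "four", "4"]:
    print(answer, "->", next_node(boxes, answer, "ask_again"))
```

The counter is kept per box, so the first wrong number leads to a hint and the second to the answer, while an unrecognised reply falls through to the bottom edge.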

This formalism does not prejudge which of the learner or the computer controls the interaction. It has been used to specify very different kinds of lessons, ranging from mastery learning tutorials to microworlds with embedded tutoring.

3. First step: code reusability

The first developments of CAL programs which we undertook some years ago already included separate pedagogical design and coding phases. The script was hand-written with paper and pencil. Generally, the teachers and pedagogues who participate in the pedagogical design find no difficulty in mastering the rather simple formalism. The heaviest part of the work has then to be done by the team of coders in translating the specification into programs.

A set of reusable packages was developed in order to ease this translation and to avoid repeating the same efforts each time. These packages are used to handle text and graphic windows on the screen, the keyboard or mouse input, direct access message files to separate the text of the dialogues from the program logic, raster image files allowing imaging code to be independent from the screen resolution [3], the access to a Local Area Network (LAN) for the on-line supervision of the lessons and for automatic statistics collection [2], as well as pattern matching algorithms to analyse the learner's answers.

The CAL programs which we developed using these packages used message files to ease later modifications of the dialogues that do not imply changing the logic of the program. These are not messages in the sense of object oriented programming, rather, a message is just text to be displayed on the screen. Each message has a name (a string) and the program passes this name to a display primitive to cause the text of the dialogue to appear on the screen. Messages can also contain answer analysis criteria. This separation facilitates the translation of the dialogues to other natural languages.
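The separation between dialogue text and program logic can be illustrated with a minimal Python sketch. The file layout used here (one `name: text` pair per line) and the functions `load_messages`, `display` and `run_intro` are assumptions made for the example; our actual message files are direct access files with a different layout.

```python
from typing import Dict, List


def load_messages(text: str) -> Dict[str, str]:
    """Parse a hypothetical message file with one 'name: text' pair per line."""
    messages: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, body = line.partition(":")
        messages[name.strip()] = body.strip()
    return messages


def display(messages: Dict[str, str], name: str) -> str:
    """Stand-in for the display primitive: the text is looked up by name."""
    text = messages[name]
    print(text)
    return text


ENGLISH = "welcome: Welcome to the lesson.\nask_sum: How much is 2 + 2?"
FRENCH = "welcome: Bienvenue dans la leçon.\nask_sum: Combien font 2 + 2 ?"


def run_intro(messages: Dict[str, str]) -> List[str]:
    # The program logic only knows message names, never the text itself.
    return [display(messages, "welcome"), display(messages, "ask_sum")]


run_intro(load_messages(ENGLISH))
run_intro(load_messages(FRENCH))
```

Because `run_intro` refers to messages by name only, switching to another natural language amounts to loading another message file.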

Even if their utility is obvious, these various packages are useful only in a small part of the life cycle of a piece of CAL software. Moreover, a good deal of the programming that has to be done manually is rather tedious and error prone.

4. The development environment

The experience we acquired in the development of various CAL lessons allowed us to better define the tools needed in connection with the use of a general purpose programming language. These tools are, as a matter of fact, quite similar to those of any large scale project: specification tools, implementation tools, prototyping tools and maintenance tools.

The various phases in the development of a CAL application can be illustrated by Figure 3.

In order to provide one or more tools for each of these phases and to reduce the programming effort as much as possible, we have started the "IDEAL" project (Interactive Development Environment for Assisted Learning). This development environment is mainly intended to be used by computer specialists. This means that it is the power of functionality rather than the simplicity of use that has been given priority, without giving up user friendliness. We indeed think that, even for a computer professional, user friendliness is a key factor for productivity.

This development environment is based on a large screen workstation. For the sake of portability and manufacturer independence, we have chosen to use Ada as a programming language and the X window system (developed at the Massachusetts Institute of Technology in the Athena project) as a windowing environment [4]. The lessons to be developed with this environment are intended to run on personal computers and need far fewer computing resources than the development environment itself, typically 640 KBytes of memory and a monochrome 12" graphic screen. The target programming language also need not be the same as the one chosen to write the environment tools. In fact, the environment is table driven and can generate, among other languages, Ada, Modula 2, UCSD Pascal, or Turbo Pascal programs.

The development environment includes for the pedagogical design phase a graphical script editor allowing the user to enter a graphic specification of a new lesson, or to modify the specification of an existing one.

For the review of the pedagogical design, we use an automatic program generator. Even if it cannot translate the whole script (instructions to the programmer, written in natural language, are not that easy to translate), this tool can produce a program that runs on a target machine and gives a good idea of what the final product will be. This allows us to do fast prototyping and, already at this stage, enables us to modify some aspects of the script, even before the coding phase has started.

The coding and code review phases use a synchronous multi-window editor. This tool is used by a programmer to complete, in one window, the automatically generated code corresponding to the part of the script that is showing in another window. The use of a modular programming language is essential here. It allows a complete separation between the automatically generated code and the code added by the programmers, so that the programmers do not need to modify the code produced by the automatic translation.

The coding review phase also uses a remote supervision tool on a development machine to follow what is going on in a lesson running on a target machine connected to the same LAN. This is done by directly showing on the script the current node of the graph of the target program. It is thus much easier to find where in a lesson a given error occurred. The synchronous multi-window editor can then be used by the programmer to move directly to the code that generated the error and correct it.

A similar mechanism makes it possible to collect statistics on the learning process automatically, once the CAL program is operational. The analysis of these statistics can help the designers of the lessons to review the pedagogical design. This review can then be carried out using the script editor.

Besides the design, the coding and the debugging of lessons, there is in our view another rather important phase: the translation of the dialogues. Indeed, all the CAL software we develop is intended to be easily translated into different languages using the Latin alphabet (with some variants for accented letters). This translation phase can happen before, as well as after, design reviews. Therefore, it implies using an editing tool that also has the functionalities of a version manager to handle a database of updates.

All these tools are interdependent and form a coherent set that can be illustrated by Figure 4.

5. The script editor

The script editor is essentially a graphic editor in which the basic primitives have been tailored to the functionality of the tool. It is used in the pedagogical design phase to build the detailed specification of a lesson, as illustrated in Figures 1 and 2. This editor is interactive and uses the mouse for most operations (except text input): selection of a command in a pop-up menu, selection of node(s) to be modified, indication of the insertion point, etc.

The data structure built by this editor to represent the content of a lesson is a directed graph in which each node represents an elementary action: display of text (a message) on the screen, execution of the operation involved in an instruction to the programmer, or evaluation of a condition about a user's action. The edges represent the sequence of operations. This structure is non-linear, since a node representing a test (generally an answer analysis criterion) may have many successors:

an implicit successor which is the node containing the next test (no visible edge in between). That test is evaluated only if the current test appears to be negative.
one or more explicit successors that are indicated with labelled edges. The labels are either a simple integer or an interval of integers. They indicate which edge must be used each time the test is positive. The i-th time a test is positive, the edge with the label i, or with an interval containing i, will be used.
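The selection of an explicit successor from its label can be sketched as follows. The representation is an assumption made for the example: every label is stored as an inclusive interval `(low, high)`, a simple integer label i being stored as `(i, i)`, and the function name `select_edge` is hypothetical.

```python
from typing import List, Optional, Tuple

# An edge is a label, stored as an inclusive (low, high) interval, plus a target node.
Edge = Tuple[Tuple[int, int], str]


def select_edge(edges: List[Edge], occasion: int) -> Optional[str]:
    """Return the target of the edge whose label contains `occasion`.

    `occasion` is i when the test is positive for the i-th time.
    """
    for (low, high), target in edges:
        if low <= occasion <= high:
            return target
    return None  # no label covers this occasion: the script is incomplete


edges: List[Edge] = [
    ((1, 1), "hint"),
    ((2, 3), "stronger_hint"),
    ((4, 99), "show_answer"),
]
for i in range(1, 6):
    print(i, select_edge(edges, i))
```

Returning `None` when no label matches makes the gap visible instead of silently choosing a path, which is the kind of omission a coherence check on the graph is meant to report.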

In addition to more classical direct positioning primitives or elevator mechanisms, the placement primitives include primitives to move along the edges (forward or backward) allowing the user to move along the structure easily. The script editor also includes a primitive to check the coherence of the graph, immediately showing missing edges or incomplete paths.
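A coherence check of this kind can be approximated with a simple graph traversal. The sketch below is an illustration under stated assumptions, not the primitive of the script editor: the graph is reduced to a dictionary of successor lists, a "missing edge" is taken to mean a non-final node without successors, and an "incomplete path" is taken to mean an edge to an undefined node or a node that cannot be reached from the start.

```python
from typing import Dict, List, Set


def check_graph(graph: Dict[str, List[str]], start: str, ends: Set[str]) -> List[str]:
    """Return a list of human readable problems found in the script graph."""
    problems: List[str] = []
    for node, successors in graph.items():
        if not successors and node not in ends:
            problems.append(f"{node}: missing outgoing edge")
        for target in successors:
            if target not in graph:
                problems.append(f"{node}: edge to undefined node {target}")
    # Depth-first search from the start node to find unreachable nodes.
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen or node not in graph:
            continue
        seen.add(node)
        stack.extend(graph[node])
    for node in graph:
        if node not in seen:
            problems.append(f"{node}: not reachable from {start}")
    return problems


script = {
    "intro": ["ask"],
    "ask": ["praise", "hint"],
    "hint": [],            # the designer forgot the edge back to "ask"
    "praise": ["end"],
    "end": [],
    "orphan": ["end"],     # never referenced by any other node
}
for problem in check_graph(script, "intro", {"end"}):
    print(problem)
```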

In order to maintain the separation between the dialogues and the logic of the lesson, the messages are not directly stored in the graph structure. Instead, a name is automatically generated and associated with the message. It is this name that is stored in a "message" node of the graph.
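The indirection between a "message" node and its text can be sketched as follows. The class `ScriptStore` and the naming scheme `MSG_0001`, `MSG_0002`, ... are hypothetical; the names actually generated by the script editor follow a different scheme.

```python
from typing import Dict, List, Tuple


class ScriptStore:
    """Keeps the graph nodes and the message texts in two separate tables."""

    def __init__(self) -> None:
        self.messages: Dict[str, str] = {}       # name -> text (the message file)
        self.nodes: List[Tuple[str, str]] = []   # (node kind, payload)
        self._counter = 0

    def add_message(self, text: str) -> str:
        """Store the text under a generated name and add a 'message' node."""
        self._counter += 1
        name = f"MSG_{self._counter:04d}"        # hypothetical naming scheme
        self.messages[name] = text
        self.nodes.append(("message", name))     # only the name goes into the graph
        return name


store = ScriptStore()
store.add_message("Welcome to the lesson.")
store.add_message("How much is 2 + 2?")
print(store.nodes)
print(store.messages)
```

Since the graph never holds the text itself, a translated message file can replace the original one without touching the graph.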

Graphics that have to be incorporated in a lesson are generally described with a few words in an instruction to the coder, e.g. { picture of a plane taking off }. In a later stage we intend to incorporate into the script editor some sketching facilities that would allow the designers to make an approximate drawing of the desired picture. There are so many graphic editors on the market that there is no point in incorporating a full graphic editor into our script editor. We prefer to make reference in the script to images that are prepared externally with more appropriate tools.

On exit, the script editor creates a file containing the graph corresponding to the lesson and a message file containing the text of the dialogues (the content of the "message" nodes). These two files can be used later on by other tools in the environment (Figure 4) to get a paper hardcopy or to generate the corresponding programs.

6. The automatic program generator

The directed graph structure built by the script editor can be very easily translated into a program. Indeed, the graph structure corresponds to a finite state automaton. Each state of the automaton contains the execution of the action specified in the corresponding node of the graph and each edge can be translated into a change of the state variable of the automaton.
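The correspondence between the graph and a finite state automaton can be made concrete with a small Python sketch. It is not the output of the program generator: the lesson, the state names and the helper `make_lesson` are invented for the illustration, and canned answers replace keyboard input so that the example runs unattended.

```python
from typing import Callable, Dict, List, Optional

# Each state executes its action and returns the next state (None ends the lesson).
State = Callable[[], Optional[str]]


def run_automaton(states: Dict[str, State], start: str) -> List[str]:
    """Run the automaton and return the sequence of states that were executed."""
    trace: List[str] = []
    state: Optional[str] = start   # the state variable of the automaton
    while state is not None:
        trace.append(state)
        state = states[state]()    # following an edge changes the state variable
    return trace


def make_lesson(answers: List[str]) -> Dict[str, State]:
    pending = iter(answers)        # canned answers instead of keyboard input

    def intro() -> Optional[str]:
        print("Welcome.")
        return "ask"

    def ask() -> Optional[str]:
        print("How much is 2 + 2?")
        return "praise" if next(pending) == "4" else "hint"

    def hint() -> Optional[str]:
        print("Try counting on your fingers.")
        return "ask"

    def praise() -> Optional[str]:
        print("Correct!")
        return None

    return {"intro": intro, "ask": ask, "hint": hint, "praise": praise}


print(run_automaton(make_lesson(["5", "4"]), "intro"))
```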

A node of type message is translated into a call to the text display primitive with the name of the corresponding message as parameter. A node of type test is translated into a call to a boolean function whose body will be defined in a separate module. A node of type instruction to the coder is translated into a call to a procedure whose body will also be defined externally in a separate module.

The code that the programmer(s) has to add is thus isolated in modules that are separate from the code produced by the automatic program generator. These external modules have their definition (interface) produced by the automatic program generator and the coder(s) can use the synchronous multi-window editor to specify their implementation.

In order to allow fast prototyping of the lessons, the automatic program generator initially produces a dummy implementation of the external modules. In this initial implementation, the procedures that correspond to the instructions to the coder contain the necessary code to display the instructions instead of executing them, and the functions that correspond to tests prompt the user for the result they are supposed to return.
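The behaviour of such a dummy implementation can be sketched as follows. The factory functions `dummy_instruction` and `dummy_test` are hypothetical names, and the yes/no prompt is an assumption about how the prototype asks for the result of a test.

```python
from typing import Callable


def dummy_instruction(instruction: str,
                      out: Callable[[str], None] = print) -> Callable[[], None]:
    """Build a procedure that displays the instruction instead of executing it."""
    def procedure() -> None:
        out("{ " + instruction + " }")
    return procedure


def dummy_test(criterion: str,
               ask: Callable[[str], str] = input) -> Callable[[], bool]:
    """Build a boolean function that asks the person running the prototype."""
    def function() -> bool:
        reply = ask(f"Is '{criterion}' satisfied? (y/n) ")
        return reply.strip().lower().startswith("y")
    return function


# Stubs for one instruction to the coder and one test box of a script.
draw_plane = dummy_instruction("picture of a plane taking off")
answer_is_four = dummy_test("the answer is 4", ask=lambda prompt: "y")

draw_plane()
print(answer_is_four())
```

Passing the output and input functions as parameters keeps the stubs usable both interactively and in automated runs; a coder later replaces each stub with the real body without touching the generated automaton.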

7. The synchronous multi-window editor

One major problem in using code produced by an automatic program generator is that this code is hardly readable by a human being. Most of the symbols used in the code are generated by rather simplistic algorithms and are thus quite cryptic. As we have seen previously, in order to avoid having to deal with automatically generated code, the parts that have to be "hand-coded" are in separate modules. The aim of the synchronous editor is to help the programmer(s) write these separate modules. The idea is to have different windows show different views of the same thing: a regular text editor in one window to edit one of the separate modules that have to be hand-coded, and the script editor in another window to show the corresponding specification.

Any script window can be synchronized with a module window, or vice-versa. The script window is read-only in the sense that one can only issue positioning commands in it. One can enter positioning commands in any of the windows and ask for the other window to "synchronize", i.e. show the corresponding part of the view it handles. The programmer can thus very easily see, write or modify the code corresponding to a specific part of the script, by finding the location in the script window and then asking for the module window to synchronize, i.e. show the code for that part of the script. One can also find a specific location in the code (e.g. where an error occurred) and then ask for the script window to synchronize, i.e. show the part of the script that is the specification of the code.
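Synchronization in both directions requires a correspondence between script nodes and code locations. A minimal sketch of such an index is given below; the class `SyncIndex`, the node names and the use of line ranges are assumptions made for the example, not a description of the editor's internal tables.

```python
from typing import Dict, Optional, Tuple


class SyncIndex:
    """Two-way lookup between script nodes and line ranges of a hand-coded module."""

    def __init__(self) -> None:
        self._by_node: Dict[str, Tuple[int, int]] = {}

    def register(self, node: str, first_line: int, last_line: int) -> None:
        self._by_node[node] = (first_line, last_line)

    def code_for(self, node: str) -> Optional[Tuple[int, int]]:
        """Script window to module window: where is the code for this node?"""
        return self._by_node.get(node)

    def node_for(self, line: int) -> Optional[str]:
        """Module window to script window: which node does this line implement?"""
        for node, (first, last) in self._by_node.items():
            if first <= line <= last:
                return node
        return None


index = SyncIndex()
index.register("test_answer_is_four", 10, 24)
index.register("draw_plane", 26, 61)
print(index.code_for("draw_plane"))
print(index.node_for(17))
```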

One can have more than one pair of script-program windows at the same time. With this tool, the programmer can always see the specification at the same time as the code he is working on. This specification (the script) should help him understand and manage the code, since it can be seen as a human readable documentation of the code.

8. The message editor

As stated earlier, the CAL software we develop is, from the beginning, designed to allow an easy translation of the dialogues. If this translation were done only after the initial version had been fully completed, there would be no problem in using a simple text editor to create a translated version of the original message file. However, it often happens that some modifications are made to the script after the lesson has been used for a while in field tests, and users or teachers have made some remarks and suggestions.

In order to help maintain the coherence of the different translations, a multi-window editing utility manages the modifications that are made to the messages. This tool maintains a database of modifications for the different message files, allowing the person(s) in charge of the translation to find directly the places that need to be updated. It shows the original message in the initial language in one window, the updated version of the message for this same language in a second window, and the translation of the original message in a third window; in a fourth window, the user can enter the update for the translated message. Only the fourth window is active, i.e. it is the only one in which the user can apply a modification. The other windows are passive (read only), but all four windows are synchronized and a move in any of them generates a similar move in the other ones.

In addition to usual text editing primitives, this tool includes version management primitives, maintaining a database of updates. It also includes primitives that make use of the update database: for instance, to move directly to messages where an update has been made in one language and not in another one.
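The query "updated in one language and not in another" can be sketched with a small revision table. The class `UpdateDatabase` and the use of integer revision numbers are assumptions made for the illustration; the actual update database records more information than this.

```python
from typing import Dict, List


class UpdateDatabase:
    """Records, per message and per language, the revision of the text."""

    def __init__(self) -> None:
        # message name -> language -> revision number of the source text it reflects
        self._revision: Dict[str, Dict[str, int]] = {}

    def record(self, name: str, language: str, revision: int) -> None:
        self._revision.setdefault(name, {})[language] = revision

    def stale(self, source: str, target: str) -> List[str]:
        """Messages whose `source` text is newer than their `target` translation."""
        return sorted(
            name
            for name, revisions in self._revision.items()
            if revisions.get(source, 0) > revisions.get(target, 0)
        )


db = UpdateDatabase()
db.record("welcome", "en", 2)   # the English text was revised after field tests
db.record("welcome", "fr", 1)   # the French text still reflects revision 1
db.record("ask_sum", "en", 1)
db.record("ask_sum", "fr", 1)
print(db.stale("en", "fr"))
```

A message that has never been translated has no entry for the target language and is therefore also reported as stale.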

9. The lesson supervisor

The fact that the graph corresponding to the script is translated into a finite state automaton allows us to visualize easily, on a development workstation, the progress of a learner running a lesson on a target machine. It is only required that the workstation and the target machine be on the same LAN. To make use of this link, the automatic program generator can include in the code of the finite state automaton the instructions needed to send a message onto the network each time the state variable changes (i.e. each time a node of the graph is executed). These messages contain the identification of the target machine, an indication of the lesson being run and the value of the state variable.
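The content of these messages, and the requirement that a lesson must keep running when the network is unavailable, can be sketched as follows. The JSON encoding, the class `StateReporter` and the injected `send` function are assumptions chosen to keep the example self-contained; they do not describe the format or the network module we actually use.

```python
import json
from typing import Callable, List


def encode_state_message(machine: str, lesson: str, state: str) -> bytes:
    """Encode the three fields carried by every supervision message."""
    record = {"machine": machine, "lesson": lesson, "state": state}
    return json.dumps(record).encode("utf-8")


class StateReporter:
    """Sends one message per change of the state variable."""

    def __init__(self, machine: str, lesson: str,
                 send: Callable[[bytes], None]) -> None:
        self._machine = machine
        self._lesson = lesson
        self._send = send   # hypothetical transport, e.g. a datagram socket's send

    def state_changed(self, state: str) -> bool:
        try:
            self._send(encode_state_message(self._machine, self._lesson, state))
            return True
        except OSError:
            return False    # the lesson goes on even if the network is down


sent: List[bytes] = []
reporter = StateReporter("pc-07", "fractions", sent.append)
for state in ["intro", "ask", "hint"]:
    reporter.state_changed(state)
print(sent)
```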

When a lesson is running on a target machine, one can use a development workstation to listen to the network and visually follow the progress of the lesson on the workstation screen by highlighting directly on the script the node that is currently being executed. In parallel to showing on the screen the progress of a lesson, it is possible to use the same technique to collect statistics on the progress of the different lessons automatically. These statistics are useful to get a better feeling of the "impact" of the lessons on the learners, showing the parts of the scripts where many learners had problems. The designers of the lessons may then use this information to modify the script, if necessary, in order to improve its quality. In the past, we already made use of a LAN to collect statistics on CBL material, but the code to send the appropriate information onto the network had to be added by the coder [2].
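Collecting statistics from the stream of state messages amounts to counting how often each node of a script is executed. The sketch below assumes that each received message has already been decoded into a dictionary with the keys `machine`, `lesson` and `state`; the function names and the sample data are invented for the illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def visits_per_node(messages: Iterable[Dict[str, str]], lesson: str) -> Counter:
    """Count how many times each node of `lesson` was executed."""
    counts: Counter = Counter()
    for message in messages:
        if message["lesson"] == lesson:
            counts[message["state"]] += 1
    return counts


def machines_reaching(messages: Iterable[Dict[str, str]],
                      lesson: str, node: str) -> List[str]:
    """List the target machines on which `node` of `lesson` was executed."""
    return sorted({
        m["machine"] for m in messages
        if m["lesson"] == lesson and m["state"] == node
    })


received = [
    {"machine": "pc-01", "lesson": "fractions", "state": "ask"},
    {"machine": "pc-01", "lesson": "fractions", "state": "hint"},
    {"machine": "pc-02", "lesson": "fractions", "state": "ask"},
    {"machine": "pc-02", "lesson": "fractions", "state": "hint"},
    {"machine": "pc-02", "lesson": "fractions", "state": "hint"},
    {"machine": "pc-03", "lesson": "decimals", "state": "ask"},
]
print(visits_per_node(received, "fractions"))
print(machines_reaching(received, "fractions", "hint"))
```

A node with a high count relative to its neighbours, such as a hint that is shown repeatedly, points to a part of the script where many learners had problems.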

By just adding the name of the learner to the messages sent by the target machine, it is also possible to do automatic curriculum tracking. However, we are not strongly in favour of this solution, because it is not always easy to authenticate the actual identity of the person running the lesson. It would also be detrimental to the atmosphere of confidence which the learner can feel when he knows that nobody is looking over his shoulder and that he can thus make errors without feeling guilty.

The module to access the network is written in such a way that the target machine does not hang if the network goes down. In addition to its obvious utility in the operational phase of the CBL material, this possibility of watching a lesson run on a target machine is very useful for the debugging of the code during the coding phase. Indeed, it allows the programmer to know exactly where in the script one was when a lesson "crashed". The synchronous editor can then be used to find the corresponding code quickly, making it much easier to locate and correct coding errors.

10. Current implementation

At the time of writing, this project is still ongoing. It is being developed in Ada on monochrome SUN workstations with 4 MBytes of memory and uses version 11 release 2 of the X window system as a multi-window environment. It started with a prototype of the script editor and the automatic program generator about a year ago, but the funding and thus the main effort started about six months ago. The estimated code size of the whole environment is between fifty and a hundred thousand lines of Ada code. This does not include the Ada - X bindings that were not written by our group. Since the environment is not yet finished, nothing can be said for now on its usability or on how it compares with authoring systems.

11. Future extensions

Even though the development of this environment is not yet complete, we are already working on extending its capabilities. As the reader may have noticed, the specification formalism we use is rather open ended, in the sense that it allows the designer to give part of the specification in natural language (as instructions to the coder). In the first phase, we designed the automatic program generator so that it takes care only of the display of the messages and of the sequencing of the operations. The instructions for the coder are, for the moment, translated into a procedure call and the answer analysis criteria into function calls. These procedures and functions are in a separate module and their body must be written by a programmer.

In a second phase, we intend to augment the translation capabilities of the automatic program generator by using artificial intelligence techniques, i.e. knowledge acquisition and generalization techniques. It will try to detect similarities between different instructions to the coder and query the programmer for a predicate, allowing the generator to associate specific parameters to each instruction to the coder.

12. References
