Safe REXX on the Desktop,


or
Will They Still Respect My Code in the Morning?

by
Shmuel (Seymour J.) Metz

(Part 1 of 2 parts)

This article is intended for those who, although new to REXX, have some programming background and understand the basic concepts of command languages.  Because of both its intrinsic merit and its status as the SAA Procedures Language, it is likely that you will be using REXX to some extent in the future.  Although REXX is in many ways a good language, it has some pitfalls; an understanding of these pitfalls and of some easy ways to deal with them will make your experience of REXX more enjoyable.  This article does not address extensions adopted by ANSI, nor does it address Object Oriented REXX.

INTRODUCTION

What is REXX?

REXX is a language that was originally designed to replace the EXEC and EXEC2 command-macro languages in the CMS component of IBM's VM/SP.  Since then it has spread to a large number of other platforms, including Unix, and has been designated by IBM as the SAA Procedures Language.  REXX has been used to implement a wide variety of applications beyond its original problem domain, including many of substantial size.

Summary of pitfalls

REXX has a number of features that can trap the unwary.  This does not mean that REXX is a bad language, just that you need to understand it for what it is, as you must for any other programming language.  Some of these features are just language glitches, while others were added as the necessary price for greater expressive power.

One of the easiest ways to run afoul of REXX is to be misled by superficial similarity with other languages, especially PL/I, TSO CLISTs and languages derived from them.  Make a conscious effort to learn REXX on its own terms, without relying on analogies with other languages.

Other areas that may confuse the neophyte are the use of abutment for concatenation, the use of uninitialized variables as constants, the rules for continuation, parsing, the block structure and the way variable references are passed.  I will go into detail in the next section.

You may write REXX code that must run on multiple platforms, or in different environments on the same platform.  REXX has some language features that may impede portability.  It also has some features that may be exploited to improve portability.  I will give some guidelines on how to ease migration between environments and between platforms.

Of course, there are many generic principles of defensive programming that apply just as much to REXX as to any other language.  These include:

    Use meaningful variable names
    Use judicious comments
    Use a consistent indentation style.

Although I will be discussing only issues and solutions specific to REXX, those generic principles are of equal importance in avoiding programming errors.

SPECIFIC EXAMPLES AND RECOMMENDED AVOIDANCE TACTICS

Although REXX has a number of features that lend themselves to fast prototyping, it has a few pitfalls that can beset the unwary.

Abutment

Although REXX has a conventional concatenation operator (||), it also supports two other forms of concatenation:  abutment with white space and abutment without white space.  See Figure 1.  With abutment, one expression is simply written next to another.  If there is white space (e.g., blanks, tabs) between the two, the resulting value is formed by placing a _single_ blank between the two values; otherwise the result is formed by simple concatenation.  It is a common beginner's error to add or remove a blank that appears to be irrelevant to the program's semantics, only to change the output.

Figure 1: Concatenation operators

    /* Explicit (conventional) concatenation */
    dog = "Peke"
    say "Tom's " || dog || "s" /* output is "Tom's Pekes" */

    /* Abutment */
    dog = "Peke"
    say "Dick's "dog"s"        /* output is "Dick's Pekes" */

    /* Abutment with white space */
    dog = "Peke"
    say "Harry's" dog "s"      /* output is "Harry's Peke s" */

    /* Incorrect abutment of X */
    x = 'unknown'
    say 'C1'X                  /* Will display EBCDIC "A",
                                     not "C1unknown" */

Another common error is to abut a literal string with a one-character variable name.  If the variable name is a valid suffix for a literal string, e.g., X (for hexadecimal), it will be treated as part of the literal string, not as a variable reference.  For this reason, among others, it is best not to use variable names only one character in length.  It is so easy to misuse abutment that some recommend not using it at all.  I consider that position to be extreme, since abutment is so convenient and readable, but you should exercise caution and good judgement in its use.
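
As a minimal sketch of that cautious style (the names size and b are invented here purely for illustration), the explicit operator keeps a one-character variable from being absorbed into the preceding literal and makes the intended spacing visible:

```rexx
/* Sketch: explicit concatenation next to a short variable name */
size = 64
b    = 'bytes'
say 'Block size:' size b        /* blanks between terms: Block size: 64 bytes */
say 'Block size: ' || size || b /* nothing supplied before b: Block size: 64bytes */
say '01' || b                   /* operator separates literal from b: 01bytes */
```

Without the operator, the literal and the variable name in the last line would be read together as a single binary string rather than as two terms.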

Continuation

REXX allows implicit continuation; a statement is treated as continued if it would otherwise be syntactically invalid.  You indicate explicit continuation with a trailing comma.  This presents two common pitfalls for the unwary.

If you break a procedure invocation after a comma, the trailing comma will be treated as an explicit continuation request rather than as an argument separator.  In this situation you _must_ add an additional comma as an explicit continuation request in order to allow the separator to be recognized.  See Figure 2.

Figure 2: Continuation after argument separator

    say value('X',,'OS2ENVIRONMENT') /* retrieves X with no side
                                        effects */
    say value('X',,              ,
              'OS2ENVIRONMENT')      /* same as above */
    say value('X',,
              'OS2ENVIRONMENT')      /* displays current value of X and
                                        then sets X to OS2ENVIRONMENT */

If you break an expression after a literal or variable that is not enclosed in parentheses, the statement will be treated as complete and the next line will be treated as a new statement.  In this situation you _must_ supply a trailing comma as a continuation request.  See Figure 3.

Figure 3: Continuation after expression

    'ECHO' 'DIR'                     /* Displays 'DIR' */
    'ECHO'  ,                        /* Note continuation comma */
    'DIR'                            /* Also displays 'DIR' */
    'ECHO'                           /* Runs ECHO with no operand */
    'DIR'                            /* Displays directory */

Note that although in some cases REXX will recognize a syntax error when you omit a required explicit continuation character, in other cases you will get incorrect results with no error message.
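
The two rules above can be sketched together as follows.  The variable names are illustrative only, and MAX is used simply because it is a built-in function that accepts several arguments:

```rexx
/* Sketch: explicit continuation commas */
msg = 'The quick brown fox' ,
      'jumps over the lazy dog'
say msg                     /* The quick brown fox jumps over the lazy dog */

id = 'ABC' ||,
     'DEF'
say id                      /* ABCDEF */

big = max(10, 20,,          /* first comma separates, second continues */
          30)
say big                     /* 30 */
```

In each case the continuation comma is removed before the statement is evaluated, so it contributes nothing but a blank between the joined lines.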

Keywords

Avoid the use of variables with the same name as a REXX keyword.  If you use such names you risk having statements misinterpreted or rejected as invalid.  See Figure 4.  This is similar to the problem of one-character variable names being misinterpreted when abutted to literal strings.

Figure 4: Misinterpreted keyword

    text = 'tom dick harry'
    with = 'Ada Emmy Gracie Lise'
    /* we want to parse 'tom dick harry Ada Emmy Gracie Lise'
       with 'with first rest' */
    parse value text with first rest      /* wrong ! parses only text */
    parse value text with with first rest /* also wrong ! sets with='tom'
                                             first='dick' rest='harry' */
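
One way out, sketched here under the assumption that you are free to rename the variable (the name others is invented for this example), is simply to pick a name that is not a keyword:

```rexx
/* Sketch: the same parse with a non-keyword variable name */
text   = 'tom dick harry'
others = 'Ada Emmy Gracie Lise'
parse value text others with first rest
say first                   /* tom */
say rest                    /* dick harry Ada Emmy Gracie Lise */
```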

Labels and SIGNAL

The SIGNAL statement in REXX looks very much like a GOTO in PL/I and other block-structured languages, but its semantics are very different.  Do not attempt to use SIGNAL as a substitute GOTO or you will cause yourself serious difficulties.  Although the form SIGNAL <labelname> will cause a jump to the code with that label, it also flushes the control stack. A subsequent END statement will be detected as an error.  See Figure 5. It is best to use SIGNAL strictly for its intended purpose of indicating exceptional conditions.

Figure 5: SIGNAL errors

    do forever
       signal BELL
       whatever
       BELL:
       end                           /* an error will be detected here
                                        because the SIGNAL logically
                                        terminated the DO */
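
The following sketch contrasts the two uses.  Inside a loop, the ITERATE and LEAVE instructions do the work that a GOTO might do elsewhere; SIGNAL is kept for trapping a condition.  The label trouble is a made-up name, and the division by zero is there only to provoke an error:

```rexx
/* Sketch: loop control without SIGNAL */
do i = 1 to 5
   if i = 2 then iterate         /* skip the rest of this pass */
   if i = 4 then leave           /* end the loop early */
   say 'pass' i                  /* runs for i = 1 and i = 3 */
end

/* SIGNAL used for an exceptional condition */
signal on syntax name trouble
say 1 / 0                        /* division by zero raises SYNTAX */
exit 0

trouble:
   say 'Error' rc 'on line' sigl':' errortext(rc)
   exit 1
```

Because the handler ends the program, the fact that SIGNAL has discarded any active DO groups does no harm here.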

Parsing

The parsing facilities of REXX have several features that may be confusing to the neophyte.

REXX has keywords for abbreviated forms of PARSE, e.g., ARG is short for PARSE UPPER ARG.  Beginners often forget that these abbreviated forms will translate all data to upper case.

When using PARSE or its abbreviations, it is important that you remember that the last variable or period (".") is treated differently from all of the others; in general its value will include leading _and_ trailing blanks.  Use the STRIP function or a trailing period to remove these if they are unwanted.
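
A short sketch of both points, using invented variable names; the brackets are there only to make the blanks visible:

```rexx
/* Sketch: the last template variable keeps surplus blanks */
line = 'alpha   beta   '
parse var line first rest
say '['first']'                 /* [alpha] */
say '['rest']'                  /* blanks around beta are retained */
say '['strip(rest)']'           /* [beta] */

parse var line first second .   /* trailing period takes the remainder */
say '['second']'                /* [beta] */

parse upper var line loud .     /* what ARG and PULL do implicitly */
say loud                        /* ALPHA */
```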

As previously noted, it is best not to use variable names that are the same as REXX keywords.  In particular, do not use the names

    ARG             PULL            VAR
    EXTERNAL        SOURCE          VERSION
    NUMERIC         VALUE           WITH

Even if you are careful to write code that does what you want, use of those names will confuse whoever has to modify your code, possibly including yourself.

Also, be careful about your use of the keywords VALUE, VAR and WITH. The code in Figures 6 and 7 will produce quite unexpected results, and was probably meant to behave like the code in Figure 8.  In general, use VAR for simple parsing and avoid the forms shown in Figures 6 and 7.

Figure 6:  Misuse of WITH

    stg = abc
    parse var stg with x +1 y +1 z
    /* WITH is taken as a variable name: sets with='A' x='' y='B' z='C' */

Figure 7:  Misuse of VALUE

    stg = abc
    parse value stg x +1 y +1 z
    /* Error 38! */

Figure 8:  Correct use of WITH and VALUE

    stg=abc
    parse value stg with x +1 y +1 z
    /* sets x='A' y='B' z='C' */
    /* Equivalent to */
    parse var stg x +1 y +1 z
    /* which is better form in this case */

Scoping rules

Although superficially REXX appears to be a block-structured language, it is actually a hybrid between dynamic and static scoping.  It is possible, although bad form, to call a label inside a DO from code outside the DO. It is possible to perform code at an arbitrary label as both a call and as a function invocation.  It is incumbent upon the programmer to supply the discipline that the language omits.

The scope of a procedure is determined strictly dynamically; there is no static terminator such as END.

Do not write code intended to serve as both inline and out-of-line code; programs in which you both call and fall through into the same code are notoriously error prone.  Precede each internal subprocedure with a statement that will prevent accidentally falling into it, e.g., EXIT; if your logic permits, begin the procedure with a PROCEDURE statement, which must be the first statement after the label.  See Figure 9.

Figure 9: Procedure isolation

    saytime: PROCEDURE /* here I can get away with hiding all variables */
       say time()
       return
    /* Note that there is no END statement ! */

    exit /* In case of fallthrough, since I can't use PROCEDURE */
    putdata:
       parse arg name .
       say name'='value(name)
       return
    /* Note that there is no END statement ! */

    badstyle: PROCEDURE                 /* This entry has no access */
    badform:                            /* This entry has access    */
    ...
    return
    /* Don't ever do this; it is an extremely dangerous style */

The PROCEDURE statement hides all variables except those explicitly listed in an EXPOSE clause.  If your subroutine accesses the caller's variables and constructs those variable names from its arguments, then you must not use the PROCEDURE statement.  This is the only situation in which you should omit it.  See Figure 9.

It is possible to write procedures with overlapping scope in which one procedure hides variables with a PROCEDURE statement and the other procedure leaves all variables exposed by default.  See Figure 9.  This is a dangerous practice, and should be avoided.
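
As a minimal sketch of the recommended form (the routine addto and its variables are hypothetical), a procedure can share exactly one variable with its caller and keep everything else private:

```rexx
/* Sketch: PROCEDURE EXPOSE shares only what is listed */
total  = 0
amount = 'untouched'
call addto 5
call addto 7
say total                       /* 12 */
say amount                      /* untouched */
exit 0                          /* guard against falling into addto */

addto: procedure expose total
   parse arg amount             /* local; the caller's amount is hidden */
   total = total + amount
   return
```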

Type and range checking

Unlike most other languages, REXX has neither variable typing nor arrays. Arrays are often simulated using compound variables.  This leads to several possible types of undetected errors.

When you assign a value to a variable, there is no check that the value is consistent with the intended type.  If your logic requires any constraints on the values that can be assigned, it is your responsibility to code explicit checks using, e.g., the DATATYPE function.

When you use a variable name as part of a compound variable in order to simulate an access to an array element, REXX does not check that the index is within the array extents, or even that it is an integer.  If your logic requires enforcing such constraints, you must code them explicitly.  Note that even an uninitialized variable can be used as an "index" for a compound variable.
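
The checks described above might be coded along the following lines.  This is only a sketch: the routine getitem is invented, and keeping the element count in item.0 is a common convention that is assumed here, not something the language enforces:

```rexx
/* Sketch: explicit type and range checks on a simulated array */
item.0 = 3                      /* assumed convention: count kept in tail 0 */
item.1 = 'red'
item.2 = 'green'
item.3 = 'blue'
say getitem(2)                  /* green */
say getitem(7)                  /* out of range */
say getitem('two')              /* not a whole number */
exit 0

getitem: procedure expose item.
   parse arg idx
   if \datatype(idx, 'W') then return 'not a whole number'
   idx = trunc(idx)             /* normalize forms such as 02 */
   if idx < 1 | idx > item.0 then return 'out of range'
   return item.idx
```

The type test comes first so that the range comparison is only ever applied to a value already known to be a whole number.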

Uninitialized variables used as constants

When you refer to an uninitialized variable, its value is by default its name in upper case.  This is frequently a convenient alternative to the use of literal strings.  However, if you inadvertently use that name for a true variable elsewhere in the program, you may get incorrect and apparently inexplicable results.  It is best to adopt conventions for your variable names that minimize the risk of such problems.

Some recommend always using explicit literal strings for constants. Although well meant, this advice can lead to programs that are harder to read.  I recommend that you do use uninitialized variables, but judiciously.
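
The hazard is easy to reproduce.  In this sketch the name fast is invented; the built-in SYMBOL function reports whether a name currently holds an assigned value:

```rexx
/* Sketch: a "constant" that stops being constant */
say 'Mode:' fast                /* Mode: FAST */
say symbol('fast')              /* LIT  (no value assigned yet) */
fast = 0                        /* the same name used as a real variable */
say 'Mode:' fast                /* Mode: 0 */
say symbol('fast')              /* VAR */
```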

Variable References

REXX does not currently allow passing parameters to procedures by name or by reference.  However, you can often get similar results by passing constants and using them to construct names.  This is an extremely common and powerful technique, especially in conjunction with compound variables. However, there are a few pitfalls.

If you call a procedure that has an EXPOSE clause on the PROCEDURE statement, it will only have access to the variables that you exposed. If you pass an argument containing the name of some other variable, the code will only be able to access a local version of that variable.

If you call a procedure that requires a variable name as a parameter, and use an uninitialized variable to represent its own name for that parameter, you will probably get incorrect results on your second time through:  once the variable has been assigned a value, the reference passes that value rather than the name.
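
A minimal sketch of the technique, with the name passed as a literal string so that it stays the same on every call.  The routine bump and the variable hits are hypothetical, and the routine deliberately has no PROCEDURE statement so that it can reach the caller's variables:

```rexx
/* Sketch: passing a variable by name */
hits = 0
call bump 'HITS'                /* literal name: safe on every call */
call bump 'HITS'
say hits                        /* 2 */
exit 0

bump:                           /* no PROCEDURE: works on the caller's variables */
   parse arg bump_name .
   call value bump_name, value(bump_name) + 1
   return
```

Note that bump_name itself lands in the caller's variable pool, which is one reason to give such working variables distinctive names.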


Part 2 of this 2-part article from the keyboard of Shmuel (Seymour J.) Metz will be published in an upcoming RexxLA newsletter.

Mr. Metz can be reached at 703-256-4764 (between 6:00 PM and 9:00 PM EST), at his office at 703-306-1185, X3095, and as SHMUEL@ACM.ORG on the Internet.