Safe REXX on the Desktop, or

This article is intended for those who, although new to REXX, have some programming background and understand the basic concepts of command languages. Because of both its intrinsic merit and its status as the SAA Procedures Language, it is likely that you will be using REXX to some extent in the future. Although REXX is in many ways a good language, it has some pitfalls; an understanding of these pitfalls and of some easy ways to deal with them will make your experience of REXX more enjoyable. This article does not address extensions adopted by ANSI, nor does it address Object Oriented REXX.

INTRODUCTION

What is REXX?

REXX is a language that was originally designed to replace the EXEC and EXEC2 command-macro languages in the CMS component of IBM's VM/SP. Since then it has spread to a large number of other platforms, including Unix, and has been designated by IBM as the SAA procedures language. REXX has been used to implement a wide variety of applications beyond its original problem domain, including many of substantial size.

Summary of pitfalls

REXX has a number of features that can trap the unwary. This does not mean that REXX is a bad language, just that you need to understand it for what it is, as you must for any other programming language. Some of these features are just language glitches, while others were added as the necessary price for greater expressive power.

One of the easiest ways to run afoul of REXX is to be misled by superficial similarity with other languages, especially PL/I, TSO CLISTs and languages derived from them. Make a conscious effort to learn REXX on its own terms, without relying on analogies with other languages.

Other areas that may confuse the neophyte are the use of abutment for concatenation, the use of uninitialized variables as constants, the rules for continuation, parsing, the block structure and the way variable references are passed. I will go into detail in the next section.
You may write REXX code that must run on multiple platforms, or in different environments on the same platform. REXX has some language features that may impede portability. It also has some features that may be exploited to improve portability. I will give some guidelines on how to ease migration between environments and between platforms.

Of course, there are many generic principles of defensive programming that apply just as much to REXX as to any other language. These include:

Use meaningful variable names

Although I will be discussing only issues and solutions specific to REXX, those generic principles are of equal importance in avoiding programming errors.

SPECIFIC EXAMPLES AND RECOMMENDED AVOIDANCE TACTICS

Although REXX has a number of features that lend themselves to fast prototyping, it has a few pitfalls that can beset the unwary.

Abutment

Although REXX has a conventional concatenation operator (||), it also supports two other concatenation operators: abutment with white space and abutment without white space. See Figure 1. With abutment, one expression is placed directly next to a second expression. If there is white space (e.g., blanks, tabs) between the two, the resulting value is formed by concatenating a _single_ blank between the other two values; otherwise the result is formed by simple concatenation. It is a common beginner's error to add or remove a blank that appears to be irrelevant to the program's semantics, only to change the output.

Figure 1: Concatenation operators
/* Explicit (conventional) concatenation */

Another common error is to abut a literal string with a one-character variable name. If the variable name is a valid suffix for a literal string, e.g., X (for hexadecimal), it will be treated as part of the literal string, not as a variable reference. For this reason, among others, it is best not to use variable names only one character in length.

It is so easy to misuse abutment that some recommend not using it at all.
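
The three forms of concatenation described above, and the literal-suffix trap, can be sketched roughly as follows. This is an illustration only, not a reconstruction of Figure 1; the variable names are hypothetical, and the last line assumes an ASCII system.

```rexx
/* Sketch only: FIRST and SECOND are hypothetical variable names */
first  = 'REXX'
second = 'pitfalls'

say first || second      /* explicit operator:            REXXpitfalls  */
say first second         /* abutment with white space:    REXX pitfalls */
say first     second     /* extra blanks, still one:      REXX pitfalls */
say first'-'second       /* abutment without white space: REXX-pitfalls */

x = 'oops'
say '41'x                /* X is read as the hexadecimal-string suffix, */
                         /* not as the variable X; on an ASCII system   */
                         /* this displays A                             */
```

Writing the last line as `say '41' || x` makes the intent explicit and avoids the suffix interpretation.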
I consider that position to be extreme, since abutment is so convenient and readable, but you should exercise caution and good judgement in its use.

Continuation

REXX allows implicit continuation; a statement is treated as continued if it would otherwise be syntactically invalid. You indicate explicit continuation with a trailing comma. This presents two common pitfalls for the unwary.

If you break a procedure invocation after a comma, the trailing comma will be treated as an explicit continuation request rather than as an argument separator. In this situation you _must_ add an additional comma as an explicit continuation request in order to allow the separator to be recognized. See Figure 2.

Figure 2: Continuation after argument separator
say value('X',,'OS2ENVIRONMENT') /* retrieves X with no side

If you break an expression after a literal or variable that is not enclosed in parentheses, the statement will be treated as complete and the next line will be treated as a new statement. In this situation you _must_ supply a trailing comma as a continuation request. See Figure 3.

Figure 3: Continuation after expression
'ECHO' 'DIR' /* Displays 'DIR' */

Note that although in some cases REXX will recognize a syntax error when you omit a required explicit continuation character, in other cases you will get incorrect results with no error message.

Keywords

Avoid the use of variables with the same name as a REXX keyword. If you use such names you risk having statements misinterpreted or rejected as invalid. See Figure 4. This is similar to the problem of one-character variable names being misinterpreted when abutted to literal strings.

Figure 4: Misinterpreted keyword
text = 'tom dick harry'

Labels and SIGNAL

The SIGNAL statement in REXX looks very much like a GOTO in PL/I and other block-structured languages, but its semantics are very different. Do not attempt to use SIGNAL as a substitute GOTO or you will cause yourself serious difficulties.
Although the form SIGNAL <labelname> will cause a jump to the code with that label, it also flushes the control stack. A subsequent END statement will be detected as an error. See Figure 5. It is best to use SIGNAL strictly for its intended purpose of indicating exceptional conditions.

Figure 5: SIGNAL errors
do forever

Parsing

The parsing facilities of REXX have several features that may be confusing to the neophyte.

REXX has keywords for abbreviated forms of PARSE, e.g., ARG is short for PARSE UPPER ARG. Beginners often forget that these abbreviated forms will translate all data to upper case.

When using PARSE or its abbreviations, it is important that you remember that the last variable or period (".") is treated differently from all of the others; in general its value will include leading _and_ trailing blanks. Use the STRIP function or a trailing period to remove these if they are unwanted.

As previously noted, it is best not to use variable names that are the same as REXX keywords. In particular, do not use the names

ARG PULL VAR

Even if you are careful to write code that does what you want, use of those names will confuse whoever has to modify your code, possibly including yourself.

Also, be careful about your use of the keywords VALUE, VAR and WITH. The code in Figures 6 and 7 will produce quite unexpected results, and was probably meant to behave like the code in Figure 8. In general, use VAR for simple parsing and don't use the following:

Figure 6: Misuse of WITH
stg = abc

Figure 7: Misuse of VALUE
stg = abc

Figure 8: Correct use of WITH and VALUE
stg=abc

Scoping rules

Although superficially REXX appears to be a block-structured language, it is actually a hybrid between dynamic and static scoping. It is possible, although bad form, to call a label inside a DO from code outside the DO. It is possible to perform code at an arbitrary label as both a call and as a function invocation.
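
The last point can be sketched as follows: one label is reached both with CALL and as a function. The routine name TWICE is hypothetical and chosen only for illustration.

```rexx
/* Sketch only: TWICE is a hypothetical routine */
call twice 21            /* CALL form: the returned value goes to RESULT */
say result               /* displays 42                                  */
say twice(4)             /* function form: value used in the expression; */
                         /* displays 8                                   */
exit                     /* do not fall through into the routine         */

twice: procedure
  parse arg n
  return n * 2
```

Nothing in the language distinguishes the two uses; the only requirement is that a routine invoked as a function must return a value.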
It is incumbent upon the programmer to supply the discipline that the language omits.

The scope of a procedure is determined strictly dynamically; there is no static terminator such as END. Do not write code intended to serve as both inline and out-of-line code; programs in which you both call and fall through into the same code are notoriously error prone. Precede each internal subprocedure with a statement that will prevent accidentally falling into it, e.g., EXIT; if your logic permits, begin the procedure with a PROCEDURE statement, which must be the first statement after the label. See Figure 9.

Figure 9: Procedure isolation
saytime: PROCEDURE /* here I can get away with hiding all variables */

The PROCEDURE statement hides all variables except those explicitly listed in an EXPOSE clause. If your subroutine accesses the caller's variables and constructs those variable names from its arguments, then you must not use the PROCEDURE statement. This is the only situation in which you should omit it. See Figure 9.

It is possible to write procedures with overlapping scope in which one procedure hides variables with a PROCEDURE statement and the other procedure leaves all variables exposed by default. See Figure 9. This is a dangerous practice, and should be avoided.

Type and range checking

Unlike most other languages, REXX has neither variable typing nor arrays. Arrays are often simulated using compound variables. This leads to several possible types of undetected errors.

When you assign a value to a variable, there is no check that the value is consistent with the intended type. If your logic requires any constraints on the values that can be assigned, it is your responsibility to code explicit checks using, e.g., the DATATYPE function.

When you use a variable name as part of a compound variable in order to simulate an access to an array element, REXX does not check that the index is within the array extents, or even that it is an integer.
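
A minimal sketch of such explicit checking, using DATATYPE with the 'W' (whole number) option, might look like this. The stem LIST., the convention of keeping the element count in LIST.0, and the routine SHOWITEM are all assumptions made for the example; none of them is built into the language.

```rexx
/* Sketch only: LIST., LIST.0 as a count, and SHOWITEM are hypothetical */
list.0 = 3               /* assumed convention: tail 0 holds the count */
list.1 = 'tom'
list.2 = 'dick'
list.3 = 'harry'

call showitem 2          /* displays: dick                             */
call showitem 7          /* displays: index 7 is out of range          */
call showitem 'abc'      /* displays: index abc is not a whole number  */
exit                     /* do not fall through into the routine       */

showitem: procedure expose list.
  parse arg idx
  select
    when datatype(idx, 'W') = 0 then
      say 'index' idx 'is not a whole number'
    when idx < 1 | idx > list.0 then
      say 'index' idx 'is out of range'
    otherwise
      idx = trunc(idx)   /* normalize forms such as 2.0 to 2 */
      say list.idx
  end
  return
```

The TRUNC call is there because a value such as 2.0 passes the whole-number test but, used unchanged as a tail, would refer to a different compound variable than 2.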
If your logic requires enforcing such constraints, you must code them explicitly. Note that even an uninitialized variable can be used as an "index" for a compound variable.

Uninitialized variables used as constants

When you refer to an uninitialized variable, its value is by default its name in upper case. This is frequently a convenient alternative to the use of literal strings. However, if you inadvertently use that name for a true variable elsewhere in the program, you may get incorrect and apparently inexplicable results. It is best to adopt conventions for your variable names that minimize the risk of such problems.

Some recommend always using explicit literal strings for constants. Although well meant, this advice can lead to programs that are harder to read. I recommend that you do use uninitialized variables, but judiciously.

Variable References

REXX does not currently allow passing parameters to procedures by name or by reference. However, you can often get similar results by passing constants and using them to construct names. This is an extremely common and powerful technique, especially in conjunction with compound variables. However, there are a few pitfalls.

If you call a procedure that has an EXPOSE clause on the PROCEDURE statement, it will only have access to the variables that you exposed. If you pass an argument containing the name of some other variable, the code will only be able to access a local version of that variable.

If you call a procedure that requires a variable name as a parameter, and use an uninitialized variable to represent its own name for that parameter, you will probably get incorrect results on your second time through, since by then the variable will typically have been assigned a value and will no longer evaluate to its own name.

Part 2 of this 2-part article from the keyboard of Shmuel (Seymour J.) Metz will be published in an upcoming RexxLA newsletter. Mr. Metz can be reached at 703-256-4764 (between 6:00 PM and 9:00 PM EST), at his office at 703-306-1185, X3095, and as SHMUEL@ACM.ORG on the Internet.