IMP Core Environment Standard

Section 4: String Manipulation

This section of the Core Environment standard describes the facilities
which are available for the manipulation of IMP string values.

4.1 Basic String Operations

This section covers procedures which apply to all string values
irrespective of the kind of data which is being held in the string.

* byte map CHARNO ( string(*) name S, integer Pos )

  Returns a reference to the character at position POS within the
  string S.  The parameter POS must be within the current size of the
  string, that is:

       1 <= POS <= LENGTH(S)

  It is an error (ERR0001; CHARNO argument out of range) if this
  condition does not hold.

* byte function LENGTH ( string(255) S )

  This function returns the current length of the string value passed
  as parameter, i.e. the number of characters which it contains.

* string(255) function SUB STRING ( string(255) S,
                                    integer From, To )

  This function is used to extract a contiguous sequence of characters
  from a string value and return that sequence as result.  The
  subsequence to be extracted is located by its first and last
  character positions within the string value.

  SUB STRING is defined by the following IMP code fragment:

       string(255) function Sub String ( string(255) S,
                                         integer From, To )
          string(255) Temp = ""
          integer P
          for P = From, 1, To cycle
             Temp = Temp . To String(Charno(S,P))
          repeat
          result = Temp
       end

  It is an error (ERR0003; string inside-out) if the implied length of
  the resultant string is less than zero characters, that is if
  TO-FROM+1 is negative.  Note that the code fragment given above
  instead involves a different error (ERR0004; for loop cannot
  terminate) in this circumstance.
  This standard separates ERR0003 so that it can be reported separately
  if the implementation is able to do so.

  It is an error (ERR0005; SUB STRING bounds) if either the FROM or TO
  parameters exceed the bounds of the string argument S and at the same
  time the implied resultant string is not null.  More formally, it is
  an error if:

       1) TO<0 or FROM<0 or TO>LENGTH(S) or FROM>LENGTH(S)
  and
       2) TO-FROM+1 >= 0

  Note that the code fragment given above instead involves a different
  error (ERR0001; CHARNO argument out of range) in this circumstance.
  This standard separates ERR0005 so that it can be reported separately
  if the implementation is able to do so.

* string(1) function TO STRING ( byte N )

  This function performs the same operation as the IMP language's
  integer to string coercion feature, but is preferred by some
  programmers as being easier to read.  For example, TO STRING is used
  in this standard in place of the newer coercion feature for
  readability.

  A definition of TO STRING in terms of the more primitive CHARNO is as
  follows:

       string(1) function To String ( byte N )
          string(1) S = "*"
          Charno(S,1) = N
          result = S
       end

* string(255) function TRIM ( string(255) S, integer Maximum )

  Like SUB STRING, the TRIM function returns a sub-string of its string
  value argument for use as a string value within a string expression.
  As can be seen from the IMP definition below, TRIM is in fact defined
  in terms of SUB STRING:

       string(255) function TRIM ( string(255) S,
                                   integer Maximum )
          result = S if Length(S) <= Maximum
          result = Sub String(S, 1, Maximum)
       end

  The effect of TRIM is to return its string value argument if its
  length is less than or equal to the provided MAXIMUM, or the first
  MAXIMUM characters of it if it is longer.
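As an informal cross-check of the SUB STRING and TRIM definitions, the following Python sketch models them outside IMP. It is not part of the standard: the names `ImpError`, `charno`, `sub_string` and `trim` are hypothetical, a Python `str` stands in for an IMP string(255) value, and the 255 character limit is not modelled. The sketch follows the IMP code fragments, so an out-of-range position surfaces as the CHARNO error (ERR0001) rather than as the separately reported ERR0005.

```python
class ImpError(Exception):
    """Hypothetical stand-in for an IMP error report."""


def charno(s: str, pos: int) -> str:
    """Character at 1-based position pos; models the CHARNO range check."""
    if not 1 <= pos <= len(s):
        raise ImpError("ERR0001; CHARNO argument out of range")
    return s[pos - 1]


def sub_string(s: str, frm: int, to: int) -> str:
    """Characters frm..to of s (1-based, inclusive), built one at a time."""
    if to - frm + 1 < 0:
        raise ImpError("ERR0003; string inside-out")
    temp = ""
    for p in range(frm, to + 1):      # zero iterations when to = frm - 1
        temp += charno(s, p)
    return temp


def trim(s: str, maximum: int) -> str:
    """s itself if it is short enough, else its first maximum characters."""
    if len(s) <= maximum:
        return s
    return sub_string(s, 1, maximum)


print(repr(sub_string("STANDARD", 1, 5)))
print(repr(sub_string("STANDARD", 4, 3)))   # null string, not an error
print(repr(trim("STANDARD", 3)))
```

Because `maximum` is an ordinary run-time value here, the sketch also reflects the point that TRIM can truncate to a size chosen during execution.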
  For example:

       TRIM("ABC", 4) = "ABC"
  but
       TRIM("ABCDE",4) = "ABCD"

  TRIM provides a similar effect to the IMP language's obsolescent "jam
  transfer" assignment operator "<-" with two additional advantages.

  Firstly, while jam transfer may only be used in the context of an
  assignment, TRIM may be used in any string expression.  For example,
  TRIM can be used to truncate the string value parameters for a
  procedure.

  Secondly, the jam transfer operator determines the maximum size of
  the resultant value from the declared (i.e. statically determined)
  size of the destination of the assignment, while TRIM may be used to
  truncate a value to a size determined during the execution of a
  program.

4.2 Numeric to String Conversions

Three string(255) functions are available to perform conversions
between numeric values and textual representations of those values.
The procedure I TO S provides a textual representation of an integer
value in a fixed-point format without decimal point.  For real and long
real values, a choice is provided of a fixed-point textual
representation from R TO S or a floating-point representation using the
procedure F TO S.

Each of the three procedures described in this section is used as the
basis of one of the "derived I/O" procedures in section 7.

* string(255) function I TO S ( integer N, Places )

  This function returns a string which contains a decimal
  representation of the integer parameter N.  The parameter PLACES
  controls the format of the output string.  A definition of the I TO S
  function is given as an IMP program fragment below:

       string(255) function I TO S ( integer N, Places )

          string(255) function Digits ( integer X )
             integer I = X//10
             string(255) S = "" ; S = Digits(I) if I # 0
             result = S . To String(X-I*10+'0')
          end

          string(255) S = Digits(|N|)

          if N < 0 start
             S = "-" . S
          else if Places > 0
             S = " " . S
          finish

          if Places <= 0 then Places = -Places -
                         else Places = Places+1

          S = " " . S while Length(S) < Places
          result = S
       end

  Values of PLACES greater than zero are taken as a number of character
  positions to be allocated to the number, to which is added one more
  character position for a sign.  The sign will be '-' for negative N
  and a space character for positive N.

  Negative or zero values of PLACES imply that the use of a space for a
  positive sign character is to be suppressed, and the absolute value
  of PLACES is used to determine the total field width.

  After conversion to a decimal representation with sign, the number is
  padded out to the required field size, if necessary, by leading space
  characters.  Note that this implies that numbers which do not fit
  into the field specified are represented in the shortest form
  available.  This is in contrast to FORTRAN formatted output, where
  numbers too large for the specified field are omitted and replaced by
  a field full of asterisks.

  The following table gives the textual representations returned by
  I TO S for the numbers +100 and -100 under a wide range of PLACES
  values.  The results are enclosed in string quotes (") to indicate
  the presence of space characters, if any.  The string quotes do not
  appear in genuine results from I TO S.

       PLACES     N=+100      N=-100

         -5       "  100"     " -100"
         -4       " 100"      "-100"
         -3       "100"       "-100"
         -2       "100"       "-100"
         -1       "100"       "-100"
          0       "100"       "-100"
          1       " 100"      "-100"
          2       " 100"      "-100"
          3       " 100"      "-100"
          4       "  100"     " -100"
          5       "   100"    "  -100"

* string(255) function R TO S ( long real R,
                                integer Before, After )

  This function returns a fixed-point textual representation of the
  parameter R.  The truncated integer part of R is expressed as if it
  had been generated by I TO S with BEFORE places, except that R TO S
  must handle a larger range of numbers correctly.
  This I TO S-like portion is followed by a decimal point and AFTER
  digits of fractional part.

  In general, the following grammar describes the syntax of the string
  generated by R TO S:

       dec digit      = "0" | "1" | "2" | "3" | "4" |
                        "5" | "6" | "7" | "8" | "9" ;
       dec digits     = dec digit, { dec digit } ;
       R TO S format  = { " " }, [ "-" ], dec digits, ".",
                        { dec digit } ;

  Examples:

       R TO S(1.5, 5, 2) = "     1.50"
       R TO S(0.0, 5, 2) = "     0.00"
       R TO S(1.2,-5, 0) = "    1."

* string(255) function F TO S ( long real F,
                                integer Before, After )

  This function returns a textual representation of the parameter F in
  exponential format.  The leading digit of the value is printed as if
  by I TO S in BEFORE places.  This is followed by a decimal point and
  AFTER places of fraction.  Finally, this is followed by an exponent
  composed of an '@' character, a sign character for the exponent ('+'
  or '-') and then an implementation defined fixed width exponent field
  including leading zeros as necessary.  The size of the exponent field
  will be large enough to hold the maximum possible exponent value for
  the long real data type.

  In general, the following grammar describes the syntax of the string
  generated by F TO S:

       dec digit      = "0" | "1" | "2" | "3" | "4" |
                        "5" | "6" | "7" | "8" | "9" ;
       dec digits     = dec digit, { dec digit } ;
       sign           = "+" | "-" ;
       F TO S format  = { " " }, [ "-" ], dec digit, ".",
                        { dec digit }, "@", sign, dec digits ;

  Examples:

       F TO S(1.5, 5, 2) = "     1.50@+00"
       F TO S(0.0, 5, 2) = "     0.00@+00"
       F TO S(1.2,-5, 0) = "    1.@+00"

4.3 String to Numeric Conversions

This section describes functions which are provided to allow textual
representations of numbers (i.e. numbers represented as IMP string
values containing text like "-3") to be converted to the equivalent
numerical representation as integer or floating-point values.
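Since the integer conversion in this section has to read back whatever I TO S in section 4.2 writes, a concrete model of that text may be useful. The Python sketch below is an illustration only and not part of the standard: the name `i_to_s` is hypothetical, it mirrors the IMP definition of I TO S step by step, and Python's built-in `int` is used merely as a stand-in for reading back the decimal form.

```python
def i_to_s(n: int, places: int) -> str:
    """Model of I TO S: decimal text for n, padded on the left with spaces."""
    s = str(abs(n))                  # corresponds to Digits(|N|)
    if n < 0:
        s = "-" + s
    elif places > 0:
        s = " " + s                  # a space takes the place of a '+' sign
    # Field width: |places| when places <= 0, places + 1 (for the sign) otherwise.
    width = -places if places <= 0 else places + 1
    return s.rjust(width)            # pads only; an over-long number is never cut


for places in (-5, 0, 1, 5):
    print(places, repr(i_to_s(100, places)), repr(i_to_s(-100, places)))

# Decimal round trip: int() ignores the leading spaces and honours the sign.
assert all(int(i_to_s(n, p)) == n
           for n in (-100, 0, 7, 100)
           for p in range(-6, 7))
```

The use of `rjust` reflects the rule that a number too wide for its field is returned in its shortest form rather than being replaced by asterisks.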
* integer function S TO I ( string(255) S )

  This function can be seen as an inverse to the function I TO S
  described above, that is it takes as parameter a textual
  representation of an integer value and returns the corresponding
  integer value.  Thus for any integer values of I and X, the following
  relation will always hold:

       S To I ( I To S(I,X) ) = I

  Further to the above requirement, S TO I must always be able to
  ignore leading and trailing white space in its argument.  Thus, for
  example:

       S To I ( S." " ) = S To I ( S )
  and
       S To I ( " ".S ) = S To I ( S )

  In addition to the textual form returned by I TO S, S TO I will
  return the numerical equivalent of numbers represented in the "based
  constant" form used in the IMP language, for example "16_11" being
  equivalent to "17" in decimal notation.

  S TO I will signal events (yet to be assigned) in the cases when the
  argument provided does not correspond to a legal integer constant.
  Particularly important examples of illegal parameters to S TO I are
  the null string (""), any string consisting solely of white space, or
  a string containing a character such as "." which cannot form part of
  an integer constant.

  The following grammar describes the string values which are
  acceptable as input to the S TO I procedure.  Any other string value
  will cause an event (yet to be allocated) to be signalled.

       white char     = ? any single ASCII character whose value
                          is strictly less than 33 ? ;
       white space    = { white char } ;
       UC letter      = "A" | "B" | "C" | "D" | "E" | "F" | "G" |
                        "H" | "I" | "J" | "K" | "L" | "M" | "N" |
                        "O" | "P" | "Q" | "R" | "S" | "T" | "U" |
                        "V" | "W" | "X" | "Y" | "Z" ;
       LC letter      = "a" | "b" | "c" | "d" | "e" | "f" | "g" |
                        "h" | "i" | "j" | "k" | "l" | "m" | "n" |
                        "o" | "p" | "q" | "r" | "s" | "t" | "u" |
                        "v" | "w" | "x" | "y" | "z" ;
       NC letter      = UC letter | LC letter ;
                        (* in NC letter, an LC letter has the same
                           semantics as the corresponding UC letter *)
       dec digit      = "0" | "1" | "2" | "3" | "4" |
                        "5" | "6" | "7" | "8" | "9" ;
       pos dec const  = dec digit, { dec digit } ;
       based digit    = dec digit | NC letter ;
                        (* a based digit takes the integer values
                           0 .. 9 for the decimal digits, 10 .. 35
                           for the alphabet.  This value may not equal
                           or exceed the base constant preceding the
                           '_' character. *)
       based digits   = based digit, { based digit } ;
       base constant  = pos dec const (* 2 <= X <= 36 *) ;
       unsigned const = base constant, "_", based digits
                      | pos dec const ;
       sign           = "+" (* no effect *)
                      | "-" (* entire value is negated *) ;
       int const      = [ sign ], unsigned const ;
       STOI format    = white space, int const, white space ;

* long real function S TO R ( string(255) S )

  The following grammar describes the string values which are
  acceptable as input to the S TO R procedure.  Any other string value
  will cause an event (yet to be allocated) to be signalled.

       white char     = ? any single ASCII character whose value
                          is strictly less than 33 ? ;
       white space    = { white char } ;
       UC letter      = "A" | "B" | "C" | "D" | "E" | "F" | "G" |
                        "H" | "I" | "J" | "K" | "L" | "M" | "N" |
                        "O" | "P" | "Q" | "R" | "S" | "T" | "U" |
                        "V" | "W" | "X" | "Y" | "Z" ;
       LC letter      = "a" | "b" | "c" | "d" | "e" | "f" | "g" |
                        "h" | "i" | "j" | "k" | "l" | "m" | "n" |
                        "o" | "p" | "q" | "r" | "s" | "t" | "u" |
                        "v" | "w" | "x" | "y" | "z" ;
       NC letter      = UC letter | LC letter ;
                        (* in NC letter, an LC letter has the same
                           semantics as the corresponding UC letter *)
       dec digit      = "0" | "1" | "2" | "3" | "4" |
                        "5" | "6" | "7" | "8" | "9" ;
       dec digits     = dec digit, { dec digit } ;
       pos dec const  = dec digits ;
       sign           = "+" (* no effect *)
                      | "-" (* entire value is negated *) ;
       exponent       = "@", [ sign ], pos dec const ;
                        (* note exponent is always in decimal *)
       based digit    = dec digit | NC letter ;
                        (* a based digit takes the integer values
                           0 .. 9 for the decimal digits, 10 .. 35
                           for the alphabet.  This value may not equal
                           or exceed the base constant preceding the
                           '_' character. *)
       based digits   = based digit, { based digit } ;
       base constant  = pos dec const (* 2 <= X <= 36 *) ;
       dec mantissa   = dec digits, [ ".", [ dec digits ] ]
                      | ".", dec digits ;
       based mantissa = based digits, [ ".", [ based digits ] ]
                      | ".", based digits ;
       mantissa       = dec mantissa
                      | base constant, "_", based mantissa ;
       unsigned real  = mantissa, [ exponent ] ;
       real const     = [ sign ], unsigned real ;
       STOR format    = white space, real const, white space ;

4.4 Text Manipulation

This section describes a number of procedures designed to operate on
string values which contain text.  In particular, these procedures
allow the case of a string value or variable to be converted to a
standard form, for example to be used during later comparisons.
* routine TO LOWER ( string(*) name S )

  This procedure converts each upper case letter ('A' to 'Z' inclusive,
  ASCII values 65 to 90) in the string variable passed as parameter to
  the lower case equivalent ('a' to 'z', ASCII values 97 to 122).  All
  other characters in the string variable are left unchanged.

       routine To Lower ( string(*) name S )
          integer I
          byte name P
          for I = 1, 1, Length(S) cycle
             P == Charno(S, I)
             if 'A' <= P <= 'Z' then P = P-'A'+'a'
          repeat
       end

* routine TO UPPER ( string(*) name S )

  This procedure converts each lower case letter ('a' to 'z' inclusive,
  ASCII values 97 to 122) in the string variable passed as parameter to
  the upper case equivalent ('A' to 'Z', ASCII values 65 to 90).  All
  other characters in the string variable are left unchanged.

       routine To Upper ( string(*) name S )
          integer I
          byte name P
          for I = 1, 1, Length(S) cycle
             P == Charno(S, I)
             if 'a' <= P <= 'z' then P = P-'a'+'A'
          repeat
       end

  An example of a program fragment using this procedure might be the
  following:

       string(255) Word
       Prompt("Yes or No:")
       Read(Word)
       To Upper(Word)
       if Word = "YES" then ... {accepts any case "yes"}

* string(255) function LOWER CASE ( string(255) S )

  This function returns a string value which is identical to its
  argument except for any upper case letters ('A' to 'Z' inclusive,
  ASCII values 65 to 90), which are converted to their lower case
  equivalents ('a' to 'z', values 97 to 122).  This function can be
  defined in terms of the TO LOWER procedure as follows:

       string(255) function Lower Case ( string(255) S )
          To Lower(S)
          result = S
       end

* string(255) function UPPER CASE ( string(255) S )

  This function performs the opposite case standardisation operation to
  LOWER CASE, i.e. it returns a string value which is identical to its
  argument except for any instances of the lower case letters ('a' to
  'z' inclusive, ASCII values 97 to 122).  These characters are
  converted to their upper case equivalents ('A' to 'Z', ASCII values
  65 to 90).  This function can be defined in terms of the TO UPPER
  procedure as follows:

       string(255) function Upper Case ( string(255) S )
          To Upper(S)
          result = S
       end

  An example of an (extremely unlikely) program fragment using both the
  UPPER CASE and LOWER CASE functions might be the following:

       string(255) Word
       Prompt("Word:")
       Read(Word)
       if Upper Case(Word) = Lower Case(Word) start
          Print String("string contained no letters")
          New Line
       finish
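The case conversions can also be modelled outside IMP. The Python sketch below is illustrative only and not part of the standard: the names `to_lower`, `to_upper` and `has_no_letters` are hypothetical. Python strings are immutable, so the in-place routines and the value-returning functions collapse into one function each, and the built-in `str.lower` and `str.upper` are deliberately avoided because they also convert letters outside the ASCII ranges given in this section.

```python
def to_lower(s: str) -> str:
    """Map 'A'..'Z' to 'a'..'z'; every other character is left unchanged."""
    return "".join(
        chr(ord(c) - ord("A") + ord("a")) if "A" <= c <= "Z" else c
        for c in s
    )


def to_upper(s: str) -> str:
    """Map 'a'..'z' to 'A'..'Z'; every other character is left unchanged."""
    return "".join(
        chr(ord(c) - ord("a") + ord("A")) if "a" <= c <= "z" else c
        for c in s
    )


def has_no_letters(word: str) -> bool:
    """The comparison from the last IMP fragment: both forms agree."""
    return to_upper(word) == to_lower(word)


print(to_upper("Yes") == "YES")       # the "Yes or No" test, any case accepted
print(to_lower("Hello, World 42"))
print(has_no_letters("1234-56"), has_no_letters("abc"))
```

The two functions differ only in the direction of the offset, which matches the symmetry between the TO LOWER and TO UPPER definitions.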