User-Defined Character Classes

All programs that support true regular expressions provide a way to specify sets of characters, usually with the notation [abc] or (a|b|c), sometimes both. Such notations have two limitations, however. First, even if the sets are predefined in the initialization file so that the same set need not be typed repeatedly, once a set has been inserted into the regular expression it is easy to lose track of what it consists of. Second, some sets are large, so the regular expression becomes long and unwieldy. Redet overcomes these limitations by means of user-defined character classes.

Redet allows the user to define any number of named character sets. This facility is disabled by default to prevent it from confusing users who expect the normal behavior of the chosen program. You can enable it interactively via the Tools:Classes menu or from your initialization file by including the line:

UserClassesEnabledP T

It is automatically enabled if you define a class interactively.

Character classes may be defined interactively via the command Enter Character Class Definition, read from a file via the command Load Character Class Definitions, or defined in the initialization file using the command DefineCharacterClass. Definitions entered interactively may be saved for future use via the command Save Character Class Definitions to File.

A definition consists of three parts: the class name, the set of characters that belong to the class, and a gloss describing the class.

If defined interactively, each of the three components is entered separately, as shown below:

A character class definition entered interactively

A character class definition file contains one class per line, with the class name, the characters themselves, and the gloss in that order in three fields separated by tabs, e.g.:

vowels	aeiou	The English vowel letters

Similarly, the initialization file command DefineCharacterClass takes three arguments: the class name, the set of characters, and the gloss.
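The tab-separated file format described above can be read with a few lines of code. The following Python sketch is illustrative only and is not Redet's own implementation; the names CharClass and parse_class_definitions are hypothetical, and the handling of blank lines and malformed lines is an assumption, since the manual does not specify it.

```python
from typing import Dict, NamedTuple


class CharClass(NamedTuple):
    """One class definition: name, member characters, and gloss."""
    name: str
    chars: str
    gloss: str


def parse_class_definitions(text: str) -> Dict[str, CharClass]:
    """Parse class definitions, one per line, in three tab-separated fields."""
    classes: Dict[str, CharClass] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped
        fields = line.split("\t")
        if len(fields) != 3:
            # assumption: a malformed line is reported rather than ignored
            raise ValueError(
                f"line {lineno}: expected 3 tab-separated fields, got {len(fields)}"
            )
        name, chars, gloss = fields
        # A repeated name simply replaces the earlier entry.
        classes[name] = CharClass(name, chars, gloss)
    return classes


sample = "vowels\taeiou\tThe English vowel letters\n"
classes = parse_class_definitions(sample)
print(classes["vowels"].chars)  # aeiou
```

Keying the result by class name means that a later line with the same name replaces the earlier definition without any warning.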

If a class defined in a file is already defined, the new definition overrides the old one silently. However, if a definition is entered interactively for an existing class, Redet asks whether it should redefine the class.

A popup asking whether Redet should redefine an existing class definition

If the user prefers not to redefine the existing class, the definition window reappears with the character and gloss fields filled in as before but with the name field empty, ready to receive a new name.

A stage two class definition popup

Once defined, a character class may be included in a regular expression just like any other component. User-defined character classes are listed in a palette, from which they may be copied into the regular expression window with a mouse click, just like the entries in the program palettes. To see which characters belong to a user-defined character class, double-click the palette entry for the class with the left mouse button.

A palette of user-defined character classes

One of the advantages of character classes defined in this way is that they do not clutter up the regular expression window. However, sometimes it is desirable to see exactly what is being executed. The command Display Regular Expression Actually Executed on the Class menu pops up a window showing the regular expression as executed. When first popped up, it shows the last regular expression executed. If left up, it is updated each time a regular expression is executed.

User-defined character classes are entered using a fixed notation; Redet automatically translates this notation into notation appropriate for the selected program.

A regular expression as actually executed
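The translation step can be pictured as a simple substitution. The sketch below is a rough illustration in Python, not Redet's implementation. This section does not show the fixed notation Redet uses for class references, so the %name% placeholder syntax here is purely hypothetical, as are the function names. The output uses Python-style bracket expressions, whereas Redet emits whatever notation suits the selected program.

```python
import re
from typing import Dict

# Hypothetical placeholder syntax used only in this sketch: %name%
PLACEHOLDER = re.compile(r"%(\w+)%")


def to_bracket_expression(chars: str) -> str:
    """Render a set of characters as a Python-style bracket expression."""
    # re.escape protects characters such as ], ^, - and \ inside the brackets.
    return "[" + "".join(re.escape(c) for c in chars) + "]"


def expand_classes(pattern: str, classes: Dict[str, str]) -> str:
    """Replace each %name% placeholder with the bracket expression for that class."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in classes:
            raise KeyError(f"undefined character class: {name}")
        return to_bracket_expression(classes[name])

    return PLACEHOLDER.sub(substitute, pattern)


classes = {"vowels": "aeiou"}
executed = expand_classes("b%vowels%t", classes)
print(executed)  # b[aeiou]t
```

The expression typed by the user stays short, while the expanded form is what the matching engine actually receives.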

An additional extension provided by Redet allows user-defined named character classes to be intersected. A sequence of two or more user-defined named character classes enclosed within angle brackets is translated into the intersection of those classes, that is, the set of characters that belong to every class in the sequence.

Intersection of character classes

Here we see a regular expression matching a sequence of three characters. The classes use the feature notation used in linguistics. The first character is specified as a voiced labial. The labials in English are the consonants p, b, m, f, v, and w. Of these, p and f are voiceless (that is, the vocal folds do not vibrate during these sounds); the others are voiced. The second character is specified as a front vowel, the third as a nasal.

The regular expression actually executed as a result of character class intersection
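The intersection of the voiced and labial classes in the example above can be worked through in a few lines. This Python sketch is illustrative only and makes several assumptions: the function names are hypothetical, the "voiced" set is an arbitrary illustrative selection of voiced consonant letters rather than Redet's definition, and the output is a Python-style bracket expression rather than whatever notation Redet would generate for the selected program.

```python
import re
from typing import Dict, Sequence


def intersect_classes(classes: Dict[str, str], names: Sequence[str]) -> str:
    """Return the characters common to all named classes, in the order of the first."""
    if len(names) < 2:
        raise ValueError("intersection needs two or more classes")
    common = classes[names[0]]
    for name in names[1:]:
        members = set(classes[name])
        common = "".join(c for c in common if c in members)
    return common


def to_bracket_expression(chars: str) -> str:
    """Render a non-empty set of characters as a Python-style bracket expression."""
    if not chars:
        raise ValueError("empty intersection cannot be written as a bracket expression")
    return "[" + "".join(re.escape(c) for c in chars) + "]"


classes = {
    "labial": "pbmfvw",        # the English labials listed in the text
    "voiced": "bdgvzmnlrwj",   # assumption: illustrative set of voiced consonant letters
}

voiced_labials = intersect_classes(classes, ["labial", "voiced"])
print(voiced_labials)                         # bmvw
print(to_bracket_expression(voiced_labials))  # [bmvw]
```

The voiceless labials p and f drop out, leaving b, m, v, and w, which agrees with the description of voiced labials in the text.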

Some pattern-matching engines use angle brackets as metacharacters, and it is sometimes necessary to match literal angle brackets, so using angle brackets as delimiters for character class intersection can create conflicts. One way to deal with this problem is to disable the user-defined character class facility, either from the User Class submenu of the Tools menu or by means of the initialization file command UserClassesEnabledP. Another is to redefine the intersection delimiters by means of the initialization file commands SetLeftUserClassIntersectionDelimiter and SetRightUserClassIntersectionDelimiter.

