com.raelity.text
Class RegExp

java.lang.Object
  extended by com.raelity.text.RegExp
Direct Known Subclasses:
RegExpJava

public abstract class RegExp
extends Object

This is a simple interface to regular expressions. It exists because the available regex packages have different interfaces. This package does not implement regular expression handling; rather, it makes different regular expression packages appear to have the same interface. It supports compiling a regular expression, matching a regular expression against a string, and retrieving the matched parenthesized subexpressions.

This RegExp class is not directly instantiated. Concrete classes derived from this class are available through RegExpFactory.

Fortunately, the available packages all appear to be modeled after Perl, and their capabilities are generally the same. For better documentation on regular expressions, refer to the Perl documentation or to the excellent web pages of the adapted packages, noted below.

The regular expression packages that have adaptors are com.stevesoft.pat and com.oroinc.text.regex. To use this software you must have one of the supported packages installed on your system and available in the classpath.

Here are simple examples to demonstrate the use of this software, not to teach you regular expressions.


     RegExp re = RegExpFactory.create();
     re.setEscape('#'); // use '#' instead of '\' for sanity
     try {
         re.compile("#s?(#w+)#s+(#w+)#s?"); // 2 words and maybe white space
     } catch(RegExpPatternError e) {
       e.printStackTrace();  // Prints implementation stacktrace
     }
     re.search("hello      world       ");   // returns true
     System.out.println(re.group(1)+" "+re.group(2)); // "hello world"
 

The example above accesses the RegExp object directly to get results. Alternatively, a result object can be obtained; it has the same access methods. The end of the above example could have been written as:


     RegExpResult r = re.getResult();
     System.out.println(r.group(1)+" "+r.group(2)); // "hello world"
 

When searching, either a character array or a string can be specified as the input, and an index/offset from the beginning of the chars can also be specified. If the index/offset is zero, it is assumed to be the beginning of a line for use with the '^' anchor. If the index/offset is greater than zero, the character before the index/offset is used to determine whether it is at the beginning of a line.
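
The beginning-of-line rule above can be illustrated with java.util.regex used directly as a stand-in, since the adapted packages may not be on the classpath. This is only a sketch: the class and method names (AnchorOffsetDemo, searchFrom) are hypothetical, and it is an assumption that a RegExp implementation treats '^' the same way the JDK does in MULTILINE mode.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorOffsetDemo {
    /**
     * Search input for regex, starting at the given offset.
     * In MULTILINE mode the JDK lets '^' match at offset 0, or at a later
     * offset when the preceding character is a line terminator.
     */
    static boolean searchFrom(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(input);
        return m.find(start);
    }

    public static void main(String[] args) {
        // Offset 0 counts as the beginning of a line.
        System.out.println(searchFrom("^one", "one\ntwo", 0));
        // Offset 4 follows '\n', so '^' can match there.
        System.out.println(searchFrom("^two", "one\ntwo", 4));
        // Offset 3 follows 'e', so '^' cannot match there.
        System.out.println(searchFrom("^two", "onetwo", 3));
    }
}
```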


Extensions
Some extensions are available.

There are features of the underlying packages that are not available. In the future.....

See Also:
RegExpFactory

Field Summary
protected  char escape
          The escape character.
static int IGNORE_CASE
          A compile option
protected  boolean matched
          Records if the last call to search returned true.
protected  boolean optim
          When true, do maximum optimization.
static int PATTERN_NONE
          simple pattern matching only
static int PATTERN_PERL5
          perl5 patterns
static int PATTERN_SIMPLE
          simple pattern matching only
 
Constructor Summary
RegExp()
           
 
Method Summary
static boolean canInstantiate()
          This is used internally to determine if an underlying regular expression implementation is available.
 void compile(String pattern)
          Prepare this regular expression to use pattern for matching.
abstract  void compile(String pattern, int compileFlags)
          Prepare this regular expression to use pattern for matching.
 void enableOptimize(boolean state)
          Use this before compile is called.
static String getAdaptedName()
           
static String getDisplayName()
           
abstract  RegExpResult getResult()
          Returns a result object for the last call that used this RegExp for searching an input, for example search.
abstract  String group(int i)
          Get the ith backreference.
 boolean isMatch()
          Check if the last call to search matched.
abstract  int length(int i)
          The length of the corresponding backreference.
abstract  int nGroup()
          Return the number of backreferences; this is the number of parenthesis pairs.
static int patternType()
          This method returns the type of pattern that is handled.
abstract  boolean search(char[] input, int start, int length)
          Search for a match in char array starting at char position start with the indicated length.
abstract  boolean search(String input)
          Search for a match.
abstract  boolean search(String input, int start)
          Search for a match in the string, starting at char position start.
 void setEscape(char escape)
          Use the specified char as the escape character.
abstract  int start(int i)
          The offset from the beginning of the input to the start of the specified group.
abstract  int stop(int i)
          The offset from the beginning of the input to the end of the specified group.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

IGNORE_CASE

public static final int IGNORE_CASE
A compile option

See Also:
Constant Field Values

PATTERN_NONE

public static final int PATTERN_NONE
simple pattern matching only

See Also:
Constant Field Values

PATTERN_SIMPLE

public static final int PATTERN_SIMPLE
simple pattern matching only

See Also:
Constant Field Values

PATTERN_PERL5

public static final int PATTERN_PERL5
perl5 patterns

See Also:
Constant Field Values

escape

protected char escape
The escape character.


matched

protected boolean matched
Records if the last call to search returned true.


optim

protected boolean optim
When true, do maximum optimization.

Constructor Detail

RegExp

public RegExp()
Method Detail

getDisplayName

public static String getDisplayName()

patternType

public static int patternType()
This method returns the type of pattern that is handled.


canInstantiate

public static boolean canInstantiate()
This is used internally to determine if an underlying regular expression implementation is available. A direct implementation of RegExp always returns true.

Since this is usually invoked through a java.lang.reflect.Method, it is declared public to reduce the security requirements.

Returns:
True if the underlying regular expression implementation is available, otherwise false.
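
A reflective availability check of this kind might look like the following sketch. Everything here is hypothetical: ProbeDemo, FakeAdaptor and probe are illustration names, and the way RegExpFactory actually probes its adaptors is not documented on this page. The sketch only shows why a public static method is convenient: it can be invoked through java.lang.reflect.Method without calling setAccessible.

```java
import java.lang.reflect.Method;

class ProbeDemo {
    /** Hypothetical stand-in for an adaptor class such as RegExpJava. */
    public static class FakeAdaptor {
        public static boolean canInstantiate() {
            try {
                // Available only if the underlying regex package can be loaded.
                Class.forName("java.util.regex.Pattern");
                return true;
            } catch (ClassNotFoundException | LinkageError e) {
                return false;
            }
        }
    }

    /** Load the named class and call its public static canInstantiate(). */
    static boolean probe(String adaptorClassName) {
        try {
            Class<?> c = Class.forName(adaptorClassName);
            Method m = c.getMethod("canInstantiate");
            // Static method, so the receiver argument is null.
            return Boolean.TRUE.equals(m.invoke(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            // A missing class, a missing method, or a failure inside it
            // all mean "not available".
            return false;
        }
    }
}
```

With this sketch, probe(ProbeDemo.FakeAdaptor.class.getName()) reports availability, while probing a class name that does not exist returns false instead of throwing.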

getAdaptedName

public static String getAdaptedName()
Returns:
The name of the regular expression class that this class adapts.

setEscape

public void setEscape(char escape)
Use the specified char as the escape character. The escape character defaults to '\'. To have an effect this must be done before the compile method is invoked.
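
One way an adaptor could honor an alternate escape character is to rewrite the pattern before handing it to the underlying package. The sketch below is an assumption about how that might work, not the actual implementation; EscapeSketch and translateEscape are hypothetical names, and treating a doubled escape character as a literal is a design choice made only for this illustration.

```java
class EscapeSketch {
    /**
     * Rewrite a pattern that uses a custom escape char into one that uses '\'.
     * Assumed rules (hypothetical): a doubled escape char is a literal escape
     * char, and a literal backslash is quoted because it is no longer special.
     */
    static String translateEscape(String pattern, char escape) {
        if (escape == '\\') {
            return pattern; // default escape, nothing to translate
        }
        StringBuilder out = new StringBuilder(pattern.length() + 8);
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                out.append("\\\\");          // ordinary backslash, quote it
            } else if (c == escape) {
                boolean doubled = i + 1 < pattern.length()
                        && pattern.charAt(i + 1) == escape;
                if (doubled) {
                    out.append(escape);      // "##" means a literal '#'
                    i++;
                } else {
                    out.append('\\');        // "#w" becomes "\w"
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, the pattern from the class example, "#s?(#w+)#s+(#w+)#s?" with '#' as the escape, translates to \s?(\w+)\s+(\w+)\s? before compilation.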


enableOptimize

public void enableOptimize(boolean state)
Enable or disable optimization. To have an effect, this must be called before compile. When state is true, the pattern parser is optimized for speed. Not all regular expression packages have optional optimization. Optimization defaults to false.


compile

public void compile(String pattern)
             throws RegExpPatternError
Prepare this regular expression to use pattern for matching.

Parameters:
pattern - The regular expression.
Throws:
RegExpPatternError - This is thrown if there is a syntax error detected in the regular expression.

compile

public abstract void compile(String pattern,
                             int compileFlags)
                      throws RegExpPatternError
Prepare this regular expression to use pattern for matching.

Parameters:
pattern - The regular expression.
Throws:
RegExpPatternError - This is thrown if there is a syntax error detected in the regular expression.
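
The two compile forms and the error reporting can be sketched on top of java.util.regex as follows. This is a hedged illustration, not the library's code: CompileSketch and PatternError are hypothetical stand-ins, the IGNORE_CASE value of 1 is invented for the example, and the assumption that the one-argument compile delegates to the two-argument form with no flags is only inferred from the fact that one is concrete and the other abstract.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileSketch {
    /** Hypothetical stand-in for RegExpPatternError. */
    static class PatternError extends Exception {
        PatternError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical flag value; the real RegExp.IGNORE_CASE may differ. */
    static final int IGNORE_CASE = 1;

    private Pattern compiled;

    /** Assumed behavior: the one-argument form compiles with no flags set. */
    void compile(String pattern) throws PatternError {
        compile(pattern, 0);
    }

    /** Map the adaptor flag to the JDK flag and wrap syntax errors. */
    void compile(String pattern, int compileFlags) throws PatternError {
        int flags = (compileFlags & IGNORE_CASE) != 0
                ? Pattern.CASE_INSENSITIVE : 0;
        try {
            compiled = Pattern.compile(pattern, flags);
        } catch (PatternSyntaxException e) {
            throw new PatternError(e.getDescription(), e);
        }
    }

    /** True if the compiled pattern occurs anywhere in input. */
    boolean search(String input) {
        return compiled.matcher(input).find();
    }
}
```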

search

public abstract boolean search(String input)
Search for a match.

Returns:
true if input matches this regular expression.

search

public abstract boolean search(String input,
                               int start)
Search for a match in the string, starting at char position start.

Returns:
true if input matches this regular expression.

search

public abstract boolean search(char[] input,
                               int start,
                               int length)
Search for a match in char array starting at char position start with the indicated length.

Returns:
true if input matches this regular expression.

isMatch

public boolean isMatch()
Check if the last call to search matched.

Returns:
True if the match succeeded. False if it failed or if no match has been attempted.

getResult

public abstract RegExpResult getResult()
Returns a result object for the last call that used this RegExp for searching an input, for example search. Further calls to search do not change the RegExpResult that is returned.

Returns:
The results of the match or null if the match failed.
See Also:
RegExpResult
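
The snapshot behavior described for getResult has a close analogue in the JDK, where Matcher.toMatchResult() returns a result that later searches do not alter. The sketch below uses that analogue as a stand-in; SnapshotDemo and its methods are hypothetical names, and it is an assumption that RegExpResult behaves the same way in every adaptor.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SnapshotDemo {
    /** Search once; return a snapshot of the match, or null on failure. */
    static MatchResult searchAndSnapshot(Matcher m) {
        return m.find() ? m.toMatchResult() : null;
    }

    /** Return group 1 of the first match, read after a second search ran. */
    static String[] firstThenSecond(String input) {
        Matcher m = Pattern.compile("(\\w+)").matcher(input);
        MatchResult first = searchAndSnapshot(m);   // snapshot of match 1
        MatchResult second = searchAndSnapshot(m);  // matcher has moved on
        return new String[] {
            first == null ? null : first.group(1),
            second == null ? null : second.group(1)
        };
    }
}
```

For the input "hello world", the first snapshot still reports "hello" after the second search has matched "world".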

nGroup

public abstract int nGroup()
Return the number of backreferences; this is the number of parenthesis pairs.

See Also:
RegExpResult.nGroup()

group

public abstract String group(int i)
Get the ith backreference.

See Also:
RegExpResult.group(int)

length

public abstract int length(int i)
The length of the corresponding backreference.


start

public abstract int start(int i)
The offset from the beginning of the input to the start of the specified group.

See Also:
RegExpResult.start(int)

stop

public abstract int stop(int i)
The offset from the beginning of the input to the end of the specified group.

See Also:
RegExpResult.stop(int)
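
The relationship between nGroup, group, start, stop and length can be sketched with the corresponding java.util.regex calls (groupCount, group, start, end). This is an illustration under assumptions: GroupOffsetsDemo and describe are hypothetical names, and it is assumed, not confirmed by this page, that stop(i) is an exclusive end offset so that length(i) equals stop(i) minus start(i).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsetsDemo {
    /** List each group with its text, [start,stop) offsets, and length. */
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        // Groups are numbered from 1; groupCount() is the number of
        // parenthesis pairs in the pattern.
        for (int i = 1; i <= m.groupCount(); i++) {
            int start = m.start(i);
            int stop = m.end(i);      // exclusive end offset in the JDK
            sb.append(i).append(':').append(m.group(i))
              .append('[').append(start).append(',').append(stop).append(')')
              .append(" len=").append(stop - start).append('\n');
        }
        return sb.toString();
    }
}
```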