CSC/ECE 517 Summer 2008/wiki1 1 mb
Regular expression support in Java and Ruby
What are regular expressions?
Regular expressions provide an efficient way to match patterns against strings. Both Java and Ruby use a regular expression syntax similar to Perl.
Some of the more common elements of regular expression patterns include:
- Anchoring: Anchors provide a mechanism to ensure that a pattern matches only a specific portion of the string. Special characters are used to indicate if a match should occur at the beginning or end of a string or at the beginning or end of a line.
String | Regular Expression | Result |
---|---|---|
"This is a line. \nSo is this" | ^So | match |
"This is a line. \nSo is this" | \ASo | no match |
In this example, the first regular expression matches because the '^' character anchors the match to the beginning of a line, and "So" begins the second line of the string. The second regular expression does not match because the sequence '\A' anchors the match to the beginning of the string, which starts with "This". Note that Ruby always treats '^' as a start-of-line anchor, whereas Java does so only when the pattern is compiled with the Pattern.MULTILINE flag; by default, '^' in Java matches only at the beginning of the input.
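The two rows of the table can be checked in Java with a short sketch. The class name AnchorDemo and the helper found are hypothetical names chosen for illustration; the flag value is passed straight to Pattern.compile.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // The sample string from the table: two lines separated by a newline.
    static final String TEXT = "This is a line. \nSo is this";

    // Returns true if the regex is found anywhere in the text.
    // 'flags' is handed to Pattern.compile unchanged (0 means no flags).
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        // Default mode: '^' matches only at the very start of the input.
        System.out.println(found("^So", 0, TEXT));                    // false
        // MULTILINE mode: '^' also matches right after a line terminator.
        System.out.println(found("^So", Pattern.MULTILINE, TEXT));    // true
        // '\A' means start of input regardless of flags.
        System.out.println(found("\\ASo", Pattern.MULTILINE, TEXT));  // false
    }
}
```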
- Character Classes: Character classes provide a way to represent a set of characters within a regular expression.
Regular Expression | Matches |
---|---|
[abc] | a, b, or c |
[0-3] | 0, 1, 2, or 3 |
[^abc] | anything that is not a, b, or c |
. | any single character (by default, except a newline) |
- Multiplicity: It is often desirable to match a predetermined number of occurrences of a pattern within a string. This is accomplished using the following modifiers:
Modifier | Matches |
---|---|
* | zero or more occurrences |
+ | one or more occurrences |
? | zero or one occurrence |
{m,n} | at least m and at most n occurrences |
{m,} | at least m occurrences |
{m} | exactly m occurrences |
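As a rough illustration of how character classes and these modifiers combine, the sketch below uses Pattern.matches, which succeeds only when the entire input matches. The class name QuantifierDemo, the helper matchesWhole, and the sample inputs are assumptions made for this example.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Pattern.matches requires the ENTIRE input to match the expression.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // One or more characters from the class [0-3].
        System.out.println(matchesWhole("[0-3]+", "3021"));     // true
        System.out.println(matchesWhole("[0-3]+", "3041"));     // false: '4' is outside the class
        // Zero or more characters that are not a, b, or c.
        System.out.println(matchesWhole("[^abc]*", "xyz"));     // true
        // Between two and three occurrences of 'a'.
        System.out.println(matchesWhole("a{2,3}", "aaaa"));     // false: four is too many
        // At least two occurrences of 'a'.
        System.out.println(matchesWhole("a{2,}", "aaaa"));      // true
        // Zero or one occurrence of 'u'.
        System.out.println(matchesWhole("colou?r", "color"));   // true
    }
}
```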
- Grouping: Parentheses group elements within a regular expression so that the enclosed elements are treated as a single element.
String | Regular Expression | Result |
---|---|---|
"mississippi" | m(iss){2}ippi | match |
"mississippi" | m(iss){3,}ippi | no match |
- Alternation: It is possible to build patterns that match one portion of a string OR an alternative portion. The '|' character is used to accomplish this, as in the pattern m(on|onk)ey (the '|' is written as '\|' in the table below so that it is not read as a column separator).
String | Regular Expression | Result |
---|---|---|
"monkey" | m(on\|onk)ey | match |
"money" | m(on\|onk)ey | match |
These are only a few of the features that allow regular expressions to match strings succinctly and efficiently. For a more detailed description of these features and a complete definition of the syntax, visit the following site.
Perl regular expression syntax
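Grouping and alternation can be tried out in Java along the following lines. This is only a sketch: the class name GroupingDemo and the helper matchesWhole are hypothetical, and Pattern.matches is used so that the whole string must match the pattern.

```java
import java.util.regex.Pattern;

class GroupingDemo {
    // Pattern.matches requires the ENTIRE input to match the expression.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Grouping: the quantifier applies to the whole group "iss".
        System.out.println(matchesWhole("m(iss){2}ippi", "mississippi"));  // true
        // {2,} means "at least two", so two repetitions still match.
        System.out.println(matchesWhole("m(iss){2,}ippi", "mississippi")); // true
        // Three repetitions are required here, but only two are present.
        System.out.println(matchesWhole("m(iss){3}ippi", "mississippi"));  // false

        // Alternation: either "on" or "onk" may appear between "m" and "ey".
        System.out.println(matchesWhole("m(on|onk)ey", "monkey"));         // true
        System.out.println(matchesWhole("m(on|onk)ey", "money"));          // true
        System.out.println(matchesWhole("m(on|onk)ey", "mopey"));          // false
    }
}
```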
How regular expressions are handled in Java
In Java, regular expressions are supported using the java.util.regex package.
Package summary for java.util.regex
The java.util.regex package is built around two main classes:
- the Pattern class, and
- the Matcher class.
Instances of the Pattern class are used to represent regular expressions. Instances of the Matcher class are used to match the regular expressions described by the Pattern object against strings. The Pattern class provides methods to facilitate splitting strings based upon provided regular expressions, while the Matcher class provides methods that allow patterns to be found, replaced, and examined further.
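The division of labor between the two classes might look like the following sketch. Pattern.split and Matcher.replaceAll are part of java.util.regex; the class name PatternMatcherDemo, the helper names, and the sample strings are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Pattern: split a line on commas, ignoring whitespace around each comma.
    static List<String> splitFields(String line) {
        Pattern comma = Pattern.compile("\\s*,\\s*");
        return Arrays.asList(comma.split(line));
    }

    // Matcher: replace every run of digits in the text with '#'.
    static String maskNumbers(String text) {
        Matcher m = Pattern.compile("[0-9]+").matcher(text);
        return m.replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(splitFields("red , green,blue"));   // [red, green, blue]
        System.out.println(maskNumbers("room 101, floor 3"));  // room #, floor #
    }
}
```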
If multiple calls are made to the find method of a Matcher, each search resumes where the previous match ended. This is the default behavior of the Matcher class, whereas Perl requires the g flag to request a match that resumes where the last match left off. This difference between Java and Perl, as well as a summary of Java regular expression constructs, can be found in the Java API documentation for the Pattern class.
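A minimal sketch of this resuming behavior follows. The class name FindDemo, the helper findAll, and the sample input are hypothetical; only find and group come from the Matcher API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Collects every match by calling find() repeatedly on one Matcher.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {        // each call resumes after the previous match
            hits.add(m.group());  // text of the current match
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[0-9]+", "a1b22c333")); // [1, 22, 333]
    }
}
```

Calling reset on the Matcher starts the search from the beginning of the input again.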
How regular expressions are handled in Ruby
In Ruby, regular expressions are supported via the Regexp class.
Class summary for Regexp
The match and last_match methods of the Regexp class return MatchData objects.
Class summary for MatchData
Similar to the Pattern class in Java, instances of the Regexp class are used to represent regular expressions in Ruby. The methods of the Regexp class provide functionality comparable to that of Java's Pattern class, such as testing for the existence of matches. Splitting and substitution are provided by String methods such as split, sub, and gsub, which accept a Regexp as an argument. Details of an individual match, such as captured groups and match offsets, are available through the methods of the MatchData class.
An important distinction between Java regular expressions and Ruby regular expressions is how they are created. Java uses Pattern and Matcher objects, which are created explicitly. In Ruby, regular expression objects can be created explicitly with Regexp.new or implicitly using the literal forms /pattern/ or %r{pattern}. The implicit creation of regular expression objects in Ruby makes creating, passing, and manipulating regular expressions within code more succinct.
Regexp.new('[a-z]') = /[a-z]/ = %r{[a-z]}
An Example
Example: Examine a string and replace any moo sound with moo. A moo sound is a word that begins with "moo" and may have more "o"'s on the end of it.
These are valid moo sounds: moo, moooo, mooooooo
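Java:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "Cows say moooooo, when they are tired they say moooo";
    Pattern p = Pattern.compile("mo{2,}");  // "m" followed by two or more "o"s
    Matcher m = p.matcher(s);
    s = m.replaceAll("moo");                // replace every match with "moo"

Ruby:

    s = "Cows say moooooo, when they are tired they say moooo"
    e = Regexp.new(/mo{2,}/)                # "m" followed by two or more "o"s
    s = s.gsub(e, "moo")                    # replace every match with "moo"

The resulting string for both Java and Ruby will now be:

    Cows say moo, when they are tired they say moo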
Useful links
Basic regular expression syntax
Advanced regular expression syntax
java.util.regex Examples from The Java Developers Almanac 1.4