CSC/ECE 517 Summer 2008/wiki1 1 mb


Regular expression support in Java and Ruby

What are regular expressions?

Regular expressions provide an efficient way to match patterns against strings. Both Java and Ruby use a regular expression syntax similar to Perl.

Perl regular expression syntax

Some of the more common elements of regular expression patterns include:

  • Anchoring: Anchors provide a mechanism to ensure that a pattern matches only a specific portion of the string. Special characters are used to indicate if a match should occur at the beginning or end of a string or at the beginning or end of a line.
Anchor example
String                            Regular Expression   Result
"This is a line. \nSo is this"    ^So                  match
"This is a line. \nSo is this"    \ASo                 no match

In this example, the first regular expression matches because '^' anchors the match to the beginning of a line, and "So" begins the second line of the string. (Ruby treats '^' this way by default; Java does so only when the Pattern.MULTILINE flag is set.) The second regular expression does not match because '\A' anchors the match to the beginning of the entire string, which starts with "This".
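
The anchor behavior described above can be checked with a short Java sketch. The class name AnchorDemo and the helper found are illustrative names, not part of any library. Note the assumption that Pattern.MULTILINE is enabled for the '^' case, since Java otherwise treats '^' as the start of the whole input.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Returns true if the pattern is found anywhere in the input.
    public static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "This is a line. \nSo is this";

        // In Java, '^' matches at the start of each line only with MULTILINE set.
        Pattern lineStart = Pattern.compile("^So", Pattern.MULTILINE);
        // '\A' always anchors to the very beginning of the input.
        Pattern inputStart = Pattern.compile("\\ASo");

        System.out.println(found(lineStart, text));   // true
        System.out.println(found(inputStart, text));  // false
    }
}
```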

  • Character Classes: Character classes provide a way to represent a set of characters within a regular expression.
Some Character Classes
Regular Expression   Matches
[abc]                a, b, or c
[0-3]                0, 1, 2, or 3
[^abc]               any character other than a, b, or c
.                    any character (by default, any character except a line terminator)
  • Multiplicity: It is often desirable to match a specific number of occurrences of a pattern within a string. This is accomplished using the following modifiers (also called quantifiers):
Pattern Multiplicity
Modifier   Matches
*          zero or more occurrences
+          one or more occurrences
?          zero or one occurrence
{m,n}      at least m and at most n occurrences
{m,}       at least m occurrences
{m}        exactly m occurrences
  • Grouping: Parentheses are used to group elements within a regular expression. The enclosed elements are then treated as a single element, so a modifier that follows the group applies to the whole group.
Grouping example
String          Regular Expression   Result
"mississippi"   m(iss){2}ippi        match
"mississippi"   m(iss){3,}ippi       no match
  • Alternation: It is possible to build patterns that match one alternative OR another at a given position in the string. The '|' character is used to accomplish this.
Alternation example
String     Regular Expression   Result
"monkey"   m(on|onk)ey          match
"money"    m(on|onk)ey          match
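
The character class, multiplicity, grouping, and alternation elements listed above can be tried out with a minimal Java sketch. The class name SyntaxDemo and the helper whole are illustrative names. The sketch assumes whole-string matching via Pattern.matches, so each result says whether the entire input fits the pattern.

```java
import java.util.regex.Pattern;

public class SyntaxDemo {
    // Pattern.matches requires the whole input to match the regex.
    public static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Character classes
        System.out.println(whole("[abc]", "b"));      // true
        System.out.println(whole("[0-3]", "4"));      // false
        System.out.println(whole("[^abc]", "d"));     // true

        // Multiplicity
        System.out.println(whole("a{2,3}", "aaa"));   // true
        System.out.println(whole("a{2,3}", "aaaa"));  // false
        System.out.println(whole("ab?c", "ac"));      // true

        // Grouping: the {2} applies to the whole group "iss"
        System.out.println(whole("m(iss){2}ippi", "mississippi"));  // true

        // Alternation: either "on" or "onk" may follow the "m"
        System.out.println(whole("m(on|onk)ey", "monkey"));  // true
        System.out.println(whole("m(on|onk)ey", "money"));   // true
    }
}
```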

How regular expressions are handled in Java

In Java, regular expressions are supported using the java.util.regex package.
Package summary for java.util.regex

The java.util.regex package contains two main classes:
the Pattern class and the Matcher class.

Instances of the Pattern class are used to represent regular expressions. Instances of the Matcher class are used to match the regular expressions described by the Pattern object against strings. The Pattern class provides methods to facilitate splitting strings based upon provided regular expressions, while the Matcher class provides methods that allow patterns to be found, replaced, and examined further.
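
As a rough illustration of this division of labor, the following sketch uses Pattern for compiling and splitting, and Matcher for finding and examining matches. The class name PatternMatcherDemo, the helper methods, and the sample strings are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatcherDemo {
    // Splits the input around each match of the regex (a Pattern operation).
    public static String[] splitOn(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    // Counts non-overlapping matches by repeatedly calling Matcher.find().
    public static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOn(",\\s*", "a, b,c")));  // [a, b, c]
        System.out.println(countMatches("\\d+", "10 apples, 20 pears"));  // 2

        // Examine each match further: captured groups and match position.
        Matcher m = Pattern.compile("(\\d+) (\\w+)").matcher("10 apples, 20 pears");
        while (m.find()) {
            System.out.println(m.group(2) + " = " + m.group(1) + " at index " + m.start());
        }
    }
}
```

A Pattern is immutable and can be reused to create many Matcher objects, so compiling it once outside a loop is usually preferable when the same expression is applied to many strings.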

How regular expressions are handled in Ruby

In Ruby, regular expressions are supported via the Regexp class.
Class summary for Regexp

The match and last_match methods of the Regexp class return MatchData objects.
Class summary for MatchData

Similar to the Pattern class in Java, instances of the Regexp class are used to represent regular expressions in Ruby. The Regexp class provides methods for testing whether a string matches, such as match and =~. Operations such as splitting and replacing are provided by methods of the String class (split, sub, gsub) that accept a Regexp as an argument. Details of an individual match, such as captured groups and match offsets, are available through the methods of the MatchData class.

Code Examples

Example: Examine a string and replace every moo sound with "moo". A moo sound is a word that begins with "moo" and may have additional "o" characters at the end.
These are valid moo sounds: moo, moooo, mooooooo

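Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Cows say moooooo, when they are tired they say moooo";
Pattern p = Pattern.compile("mo{2,}");  // "m" followed by two or more "o" characters
Matcher m = p.matcher(s);
s = m.replaceAll("moo");                // replace every match with "moo"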

Ruby:
s = "Cows say moooooo, when they are tired they say moooo"
e = /mo{2,}/              # a regexp literal already creates a Regexp object
s = s.gsub(e, "moo")      # replace every match with "moo"

The resulting string in both Java and Ruby will be:
Cows say moo, when they are tired they say moo

Useful links

Basic regular expression syntax
Advanced regular expression syntax
java.util.regex Examples from The Java Developers Almanac 1.4