CSC/ECE 517 Summer 2008/wiki1 1 mb

From Expertiza_Wiki

Regular expression support in Java and Ruby

What are regular expressions?

Regular expressions provide a concise and efficient way to describe patterns and match them against strings. Both Java and Ruby use a regular expression syntax similar to Perl's.

Some of the more common elements of regular expression patterns include:

  • Anchoring: Anchors provide a mechanism to ensure that a pattern matches only a specific portion of the string. Special characters are used to indicate if a match should occur at the beginning or end of a string or at the beginning or end of a line.
Anchor example
String                            Regular Expression   Result
"This is a line. \nSo is this"    ^So                  match
"This is a line. \nSo is this"    \ASo                 no match

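In this example the first regular expression matches because '^' requires the match to begin at the start of a line, and "So" begins the second line of the string. The second regular expression does not match because '\A' requires the match to begin at the start of the entire string. Note that the first result describes Ruby, where '^' always matches at the start of any line. In Java, '^' matches only at the start of the input unless the pattern is compiled with the Pattern.MULTILINE flag, in which case it also matches after each line terminator.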
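Because Java's default treatment of '^' differs from Ruby's, a minimal Java sketch may help. It reuses the string from the anchor example; the class AnchorDemo and its helper found are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares '^' and '\A' on the anchor example's string.
class AnchorDemo {
    static final String TEXT = "This is a line. \nSo is this";

    // Returns true if 'regex', compiled with 'flags', is found anywhere in TEXT.
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        // Default mode: '^' matches only at the start of the input, so this fails.
        System.out.println(found("^So", 0));                    // false
        // MULTILINE mode: '^' also matches right after a line terminator.
        System.out.println(found("^So", Pattern.MULTILINE));    // true
        // '\A' always means the start of the input, regardless of flags.
        System.out.println(found("\\ASo", Pattern.MULTILINE));  // false
    }
}
```

In Java source the pattern \ASo has to be written as "\\ASo", because the backslash must itself be escaped inside a string literal.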

  • Character Classes: Character classes provide a way to represent a set of characters within a regular expression.
Some Character Classes
Regular Expression   Matches
[abc]                a, b, or c
[0-3]                0, 1, 2, or 3
[^abc]               any character that is not a, b, or c
.                    any character except a newline (by default)
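The character classes above can be tried in Java with Pattern.matches, which succeeds only if the entire input matches. This is a small sketch; CharClassDemo and whole are hypothetical names. It also shows that '.' skips newlines unless the DOTALL flag is given.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the character classes in the table.
class CharClassDemo {
    // True if the whole input 's' matches 'regex'.
    static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(whole("[abc]", "b"));   // true
        System.out.println(whole("[0-3]", "4"));   // false
        System.out.println(whole("[^abc]", "d"));  // true
        System.out.println(whole(".", "\n"));      // false: '.' skips line terminators by default
        // With DOTALL, '.' also matches line terminators.
        System.out.println(Pattern.compile(".", Pattern.DOTALL).matcher("\n").matches()); // true
    }
}
```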
  • Multiplicity: It is often desirable to specify how many times an element of a pattern may occur. This is accomplished with the following modifiers (quantifiers), each of which applies to the element immediately before it:
Pattern Multiplicity
Modifier   Matches
*          zero or more occurrences
+          one or more occurrences
?          zero or one occurrence
{m,n}      at least m and at most n occurrences
{m,}       at least m occurrences
{m}        exactly m occurrences
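As a quick illustration of the quantifiers, here is a minimal Java sketch that checks a few "moo"-style strings against each modifier. QuantifierDemo and whole are hypothetical names, and Pattern.matches is used so that the whole string must match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the multiplicity modifiers.
class QuantifierDemo {
    // True if the whole input 's' matches 'regex'.
    static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(whole("mo*", "m"));           // true: zero o's allowed
        System.out.println(whole("mo+", "m"));           // false: needs at least one o
        System.out.println(whole("mo?", "moo"));         // false: at most one o
        System.out.println(whole("mo{2,3}", "mooo"));    // true: three o's is within 2..3
        System.out.println(whole("mo{2,}", "moooooo"));  // true: any count of two or more
        System.out.println(whole("mo{2}", "mooo"));      // false: exactly two o's required
    }
}
```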
  • Grouping: Parentheses group several elements of a regular expression so that they are treated as a single element. For example, a modifier placed after the closing parenthesis applies to the whole group.
Grouping example
String          Regular Expression   Result
"mississippi"   m(iss){2}ippi        match
"mississippi"   m(iss){2,}ippi       match
"mississippi"   m(iss){3,}ippi       no match
  • Alternation: It is possible to build patterns that match one sequence of characters OR a different one. The '|' character separates the alternatives, and parentheses limit which part of the pattern the alternation applies to.
Alternation example
String     Regular Expression   Result
"monkey"   m(on|onk)ey          match
"money"    m(on|onk)ey          match

These are only a few of the features that allow regular expressions to match strings succinctly and efficiently. For a more detailed description of these features and a complete definition of the syntax, see the following reference:

Perl regular expression syntax

How regular expressions are handled in Java

In Java, regular expressions are supported using the java.util.regex package.
Package summary for java.util.regex

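The java.util.regex package is built around two classes, Pattern and Matcher. (It also defines the MatchResult interface and the PatternSyntaxException class, which is thrown when a pattern cannot be compiled.)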

Instances of the Pattern class represent compiled regular expressions. Instances of the Matcher class match the regular expression held by a Pattern object against a particular string. The Pattern class provides methods for splitting strings around matches of the regular expression (split), while the Matcher class provides methods for finding matches (find, matches, lookingAt), replacing them (replaceAll, replaceFirst), and examining them further (group, start, end).
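The division of labour between Pattern and Matcher can be sketched as follows. The class PatternMatcherDemo, its helpers, and the sample strings are hypothetical; the Pattern and Matcher calls are standard java.util.regex methods.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing typical Pattern and Matcher roles.
class PatternMatcherDemo {
    // Compiled once and reused: a comma followed by optional whitespace.
    static final Pattern COMMA = Pattern.compile(",\\s*");
    // Two groups of digits separated by a hyphen, e.g. "10-20".
    static final Pattern RANGE = Pattern.compile("(\\d+)-(\\d+)");

    // Pattern.split: break the input around each match.
    static String[] splitOnCommas(String s) {
        return COMMA.split(s);
    }

    // Matcher.replaceAll: $1 and $2 refer to the captured groups.
    static String swapRange(String s) {
        return RANGE.matcher(s).replaceAll("$2-$1");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnCommas("red, green,blue"))); // [red, green, blue]

        // Matcher.find / group / start / end: locate and examine one match.
        Matcher m = RANGE.matcher("pages 10-20");
        if (m.find()) {
            System.out.println(m.group(1) + " to " + m.group(2)); // 10 to 20
            System.out.println(m.start() + "," + m.end());        // 6,11
        }
        System.out.println(swapRange("pages 10-20"));             // pages 20-10
    }
}
```

Compiling a Pattern once and creating a new Matcher per input is the usual design, since a Pattern is immutable and can be shared while a Matcher carries per-search state.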

If the find method of a Matcher is called repeatedly, each call resumes searching where the previous match ended (unless the matcher has been reset). This is the default behavior of the Matcher class, whereas Perl requires the g modifier to resume matching where the last match left off. This difference between Java and Perl, as well as a summary of Java regular expression constructs, can be found in the Java API documentation for the Pattern class.
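The resume-after-last-match behavior makes a simple while loop over find the idiomatic way to collect every match. This is a sketch; FindLoopDemo and allMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: collects all matches with repeated find() calls.
class FindLoopDemo {
    // Each successful find() continues from where the previous match ended.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("\\d+", "3 cows, 12 sheep, 100 hens")); // [3, 12, 100]
    }
}
```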

How regular expressions are handled in Ruby

In Ruby, regular expressions are supported via the Regexp class.
Class summary for Regexp

The Regexp#match instance method and the Regexp.last_match class method return MatchData objects (or nil when there is no match).
Class summary for MatchData

Similar to the Pattern class in Java, instances of the Regexp class represent regular expressions in Ruby. Together with the String methods that accept a Regexp (such as split, scan, sub, and gsub), the Regexp class provides functionality comparable to Java's Pattern class, such as testing for the existence of matches (=~ and match) and splitting strings. The details of a particular match, such as the captured groups and the text before and after the match (pre_match and post_match), are examined through the methods of the MatchData class.
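For comparison with MatchData, one rough Java counterpart is the MatchResult snapshot returned by Matcher.toMatchResult(). The sketch below is only an illustration of that analogy, not an official equivalence; MatchResultDemo and firstMatch are hypothetical names, and the substring calls merely imitate Ruby's pre_match and post_match.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing Java's MatchResult with Ruby's MatchData.
class MatchResultDemo {
    // Returns a snapshot of the first match, or null if there is none
    // (loosely analogous to Regexp#match returning MatchData or nil).
    static MatchResult firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.toMatchResult() : null;
    }

    public static void main(String[] args) {
        String input = "Cows say moooo loudly";
        MatchResult r = firstMatch("m(o+)", input);
        if (r != null) {
            System.out.println(r.group());                     // moooo
            System.out.println(r.group(1));                    // oooo
            // Imitating pre_match and post_match with the match offsets.
            System.out.println(input.substring(0, r.start())); // "Cows say "
            System.out.println(input.substring(r.end()));      // " loudly"
        }
    }
}
```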

An important distinction between Java and Ruby regular expressions is how they are created. In Java, Pattern and Matcher objects are always created explicitly. In Ruby, regular expression objects can be created explicitly with Regexp.new or implicitly with the literal forms /pattern/ and %r{pattern}. These literals make it more succinct to create, pass, and manipulate regular expressions in code.

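For example, each of the following Ruby expressions creates an equivalent Regexp object that matches a single lowercase letter:
Regexp.new('[a-z]'), /[a-z]/, %r{[a-z]}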
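Java has no regular expression literal, so a pattern is always written as a String and compiled, and every backslash in the pattern must be doubled inside the Java string literal. A small sketch of the Java spelling of two patterns (PatternLiteralDemo and its field names are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical demo class: Java spellings of patterns Ruby would write as literals.
class PatternLiteralDemo {
    // Ruby /[a-z]/ : no backslashes, so the Java string holds the same text.
    static final Pattern LOWER = Pattern.compile("[a-z]");
    // Ruby /\d+\.\d+/ : each backslash is doubled in the Java string literal.
    static final Pattern DECIMAL = Pattern.compile("\\d+\\.\\d+");

    public static void main(String[] args) {
        System.out.println(LOWER.matcher("Abc").find());          // true
        System.out.println(DECIMAL.matcher("pi is 3.14").find()); // true
        System.out.println(DECIMAL.matcher("pi is 314").find());  // false
        // The compiled pattern's source text contains single backslashes.
        System.out.println(DECIMAL.pattern());                    // \d+\.\d+
    }
}
```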

An Example

Example: Examine a string and replace every moo sound with a plain "moo". A moo sound is a word that begins with "moo" and may have additional "o"s on the end.
These are valid moo sounds: moo, moooo, mooooooo

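Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Cows say moooooo, when they are tired they say moooo";
Pattern p = Pattern.compile("mo{2,}");  // "m" followed by two or more "o"s
Matcher m = p.matcher(s);
s = m.replaceAll("moo");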

Ruby:
s = "Cows say moooooo, when they are tired they say moooo"
e = Regexp.new('mo{2,}')   # same as the literal /mo{2,}/
s = s.gsub(e, "moo")

The resulting string for both Java and Ruby will now be:
Cows say moo, when they are tired they say moo

It is also worth noting that the gsub method used in the Ruby example belongs to the String class, whereas the replaceAll method used in the Java example belongs to the Matcher class.
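Java's String class also has a replaceAll(regex, replacement) convenience method, which gives the same result as Pattern.compile(regex).matcher(str).replaceAll(replacement) and so brings the Java version closer to Ruby's gsub. A sketch of the moo example written that way (MooReplaceDemo and normalizeMoo are hypothetical names):

```java
// Hypothetical demo class: the moo example using String.replaceAll.
class MooReplaceDemo {
    // String.replaceAll compiles the regex internally on every call.
    static String normalizeMoo(String s) {
        return s.replaceAll("mo{2,}", "moo");
    }

    public static void main(String[] args) {
        String s = "Cows say moooooo, when they are tired they say moooo";
        System.out.println(normalizeMoo(s)); // Cows say moo, when they are tired they say moo
    }
}
```

Because the pattern is recompiled on each call, the explicit Pattern/Matcher form is usually preferred when the same regular expression is applied many times.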

Useful links

Basic regular expression syntax
Advanced regular expression syntax
java.util.regex Examples from The Java Developers Almanac 1.4
Neat Regular Expression Tool
Yet Another Regular Expression Tool