CSC/ECE 517 Summer 2008/wiki1 1 mf

From Expertiza_Wiki

Regular expressions in Ruby versus Java

Ruby supports regular expressions as a language feature, without the inclusion of any special classes or modules. Java, on the other hand, does not have native regular expression support: to use regular expressions you need a regular expression package such as java.util.regex.

If you just want to learn some patterns quickly, see the reference here.

Ruby regexp support in more depth

Since Ruby borrows its regular expression syntax from Perl, creating a regular expression takes nothing more than one of the following three forms (the options are always optional):

/pattern/modifiers
OR
%r{pattern}options
OR
Regexp.new( 'pattern' [, options ])

Any one of these creates an instance of the Regexp class. You can then apply the regular expression to a String object, for example with =~, match, sub, or scan.
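For comparison, here is a minimal Java sketch of the closest counterpart. Pattern.compile in java.util.regex takes the pattern as an ordinary string, with no /.../ delimiters, so backslashes must be doubled inside a Java string literal; options are passed as int flags such as Pattern.CASE_INSENSITIVE, which plays roughly the role of Ruby's i modifier. The class and method names here are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

class RubyStyleLiterals {
    // Rough Java counterpart of Ruby's /laughter/i or Regexp.new('laughter', Regexp::IGNORECASE):
    // no delimiters, and the option is passed as a flag constant.
    static final Pattern LAUGHTER = Pattern.compile("laughter", Pattern.CASE_INSENSITIVE);

    // Rough counterpart of Ruby's /\w+/: the backslash is doubled because
    // "\\w" in a Java string literal is the two characters \ and w.
    static final Pattern WORD = Pattern.compile("\\w+");

    static boolean containsLaughter(String s) {
        // find() searches anywhere in the input, much like Ruby's =~
        return LAUGHTER.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLaughter("MANSLAUGHTER")); // true
        System.out.println(WORD.pattern());                   // \w+
    }
}
```

The doubled backslash is the most common stumbling block when porting Ruby patterns to Java: the regex engine still sees \w, but the Java compiler needs \\ to produce a single backslash.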

Java regexp support in more depth

Support for regexp

Java has been around for a while but has never had native regular expression support, so regular expression packages had to be created. Sun, Java's main contributor, did not offer comprehensive regular expression support until Java 1.4. As a result, there are multiple third-party regular expression packages for Java floating around:

  • java.util.regex: The most widely used package for regular expressions now, due to its inclusion in the JDK since Java 1.4. From this point forward, this document assumes we are using this package for regular expressions in Java.
  • Jakarta: Around since 1996, Jakarta was donated to the Apache Software Foundation and is released under an open-source, BSD-style license.
  • dk.brics.automaton: Automaton is known for being one of the fastest Java regular expression implementations.
  • And the list goes on...

Classes with regular expression abilities

String class

The String class provides simple regular expression support. It is the quickest way to write code that does matching, replacement, or splitting on a string. However, methods such as matches(), replaceFirst() and replaceAll() compile the regular expression again on every call, so they should be avoided when performance is a factor. String's matches() method also has a severe limitation: the regular expression must match the whole string, as if ^ were prepended and $ appended to your expression.
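A small sketch of the String methods described above, assuming nothing beyond the standard JDK; the helper names are hypothetical. It shows that matches() is anchored to the whole string while replaceAll() and split() search anywhere.

```java
import java.util.Arrays;

class StringRegexDemo {
    // Each of these String methods compiles its regex on every call.
    static boolean matchesWhole(String s, String regex) {
        return s.matches(regex);          // must match the ENTIRE string
    }

    static String replaceEveryA(String s) {
        return s.replaceAll("a", "4");    // searches anywhere in the string
    }

    static String[] splitOnCommas(String s) {
        return s.split(",");              // trailing empty strings are dropped
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("Manslaughter", "laughter"));    // false
        System.out.println(matchesWhole("Manslaughter", ".*laughter"));  // true
        System.out.println(replaceEveryA("Manslaughter"));               // M4nsl4ughter
        System.out.println(Arrays.toString(splitOnCommas("a,b,,c,,")));  // [a, b, , c]
    }
}
```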

Pattern class

The Pattern class is a compiled representation of a regular expression. The regular expression is supplied as a string and then compiled, so that it can be used repeatedly to create Matcher objects, or a single time to provide a single match. Using it for only a single match gives up the main benefit of compiling the expression once. Once compiled, a Pattern can also split a string into an array of the substrings that lie between matches.
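A minimal sketch of the compile-once, reuse-many-times style, plus Pattern.split(). The patterns, inputs, and method names are illustrative assumptions, not anything prescribed by the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once, then reused for every input. Pattern objects are immutable
    // and safe to share between threads; Matcher objects are not.
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern SEPARATOR = Pattern.compile("\\s*,\\s*");

    // Counts the lines containing at least one run of digits.
    static long countWithNumbers(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            if (DIGITS.matcher(line).find()) { // a fresh, cheap Matcher per line
                count++;
            }
        }
        return count;
    }

    // split() returns the pieces BETWEEN matches of the pattern, not the matches.
    static String[] splitFields(String line) {
        return SEPARATOR.split(line);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("room 101", "no digits", "42");
        System.out.println(countWithNumbers(lines));                          // 2
        System.out.println(Arrays.toString(splitFields("red , green,blue"))); // [red, green, blue]
    }
}
```

Holding the compiled Pattern in a static final field is a common design choice: the compilation cost is paid once, and each call only creates a lightweight Matcher.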

Matcher class

The Matcher class is an engine that performs match operations on a string by interpreting a Pattern. A Matcher can do both matching and replacement. It can also find every match in its input string, but it returns them one at a time through repeated calls to find().
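The following sketch walks a Matcher through its input with find(), reporting each match and its start index, then uses reset(CharSequence) to reuse the same Matcher on a different string. The vowel pattern and the describeMatches helper are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherWalkDemo {
    // Collects "text@start" for every match; each find() call advances
    // to the next match in the input.
    static List<String> describeMatches(Matcher m) {
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + "@" + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern vowels = Pattern.compile("[aeiou]+");
        Matcher m = vowels.matcher("Manslaughter");
        System.out.println(describeMatches(m)); // [a@1, au@5, e@10]

        // reset(CharSequence) points the same Matcher at a new input string
        m.reset("Mansion");
        System.out.println(describeMatches(m)); // [a@1, io@4]
    }
}
```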

Examples

Match a pattern

Let's see if we can have Manslaughter without laughter.

Ruby

myStr = "Manslaughter"
myStr =~ /laughter/ # this returns 4, the index at which laughter starts in Manslaughter
myStr =~ /hilarious/ # this returns nil because there is no instance of hilarious in Manslaughter

OR

myStr = "Manslaughter"
/laughter/.match(myStr) # returns an instance of MatchData class because there is at least 1 match
/hilarious/.match(myStr) # returns nil because there are 0 matches

Java

String myStr = "Manslaughter";
// Java patterns are plain strings, written without Ruby's /.../ delimiters
myStr.matches("laughter"); // this returns false because Java requires the expression to match the whole string, as if ^ and $ were added
myStr.matches(".*laughter"); // this returns true, but is not the way you'd think of doing it first

OR

import java.util.regex.Pattern;

String myStr = "Manslaughter";
Pattern.matches(".*laughter", myStr); // this returns true; like String.matches(), it must match the whole string

OR

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myStr = "Manslaughter";
Pattern p = Pattern.compile("laughter");
Matcher m = p.matcher(myStr);
m.find(); // this returns true; m.matches() would return false (this method was truly overkill, no pun intended)
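The three Matcher methods differ only in how much of the input they must cover, which is the source of the surprises above. This sketch compares them and adds a rough Java counterpart of Ruby's =~, which returns the start index of the first match. The firstMatchIndex helper is hypothetical, and it returns -1 (in the spirit of String.indexOf) where Ruby would return nil.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModesDemo {
    // Rough counterpart of Ruby's  str =~ /regex/ : the index where the first match
    // starts, or -1 (in the spirit of String.indexOf) where Ruby would return nil.
    static int firstMatchIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("laughter");
        String s = "Manslaughter";

        System.out.println(p.matcher(s).matches());   // false: must cover the whole input
        System.out.println(p.matcher(s).lookingAt()); // false: must match at the very start
        System.out.println(p.matcher(s).find());      // true: may match anywhere

        System.out.println(firstMatchIndex("laughter", s));  // 4
        System.out.println(firstMatchIndex("hilarious", s)); // -1
    }
}
```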

Search for text and replace

Replace laughter in Manslaughter with ion

Ruby

myStr = "Manslaughter"
myNewStr = myStr.sub(/laughter/,"ion") # myNewStr now contains Mansion

Java

String myStr = "Manslaughter";
String myNewStr = myStr.replaceFirst("laughter", "ion"); // myNewStr now contains Mansion

OR

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myStr = "Manslaughter";
Pattern p = Pattern.compile("laughter");
Matcher m = p.matcher(myStr);
String myNewStr = m.replaceFirst("ion"); // myNewStr now contains Mansion
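Replacement strings in Java can refer to captured groups, much as Ruby's sub and gsub can. This sketch swaps "Last, First" into "First Last" with replaceAll() and $1/$2 group references, and shows Matcher.quoteReplacement() for replacement text that must be taken literally. The pattern, inputs, and method names are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    private static final Pattern LAST_FIRST = Pattern.compile("(\\w+), (\\w+)");

    // Rough counterpart of Ruby's str.gsub(/(\w+), (\w+)/, '\2 \1'):
    // Java refers to groups as $1, $2 in the replacement string.
    static String swapNames(String input) {
        return LAST_FIRST.matcher(input).replaceAll("$2 $1");
    }

    // '$' and '\' are special in replacement strings; quoteReplacement makes them literal.
    static String replaceWithLiteral(String input, String regex, String literal) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapNames("Doe, Jane"));                           // Jane Doe
        System.out.println(replaceWithLiteral("Price: 5", "\\d+", "$5.00"));  // Price: $5.00
    }
}
```

Without quoteReplacement, the "$5" in the last call would be read as a reference to group 5 and the call would throw an exception.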

Collect matches

Collect the individual words of a string into an array-like structure

Ruby

myStr = "This is a sample"
matches = myStr.scan(/\w+/) # matches is now an array with "This", "is", "a", and "sample" in it

Java

String myStr = "This is a sample";
String[] matches = myStr.split("\\s+");
/* matches is now an array with "This", "is", "a", and "sample" in it.
Notice that this is a different regular expression from the Ruby version: split() matches the separators, not the words */

OR

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myStr = "This is a sample";
Pattern p = Pattern.compile("\\w+");
Matcher m = p.matcher(myStr);
List<String> matches = new ArrayList<>(); // a growable list instead of an array, so that we can add elements
while( m.find() ) {
    matches.add( m.group() );
}
// Now down here matches is a list with "This", "is", "a", and "sample" in it
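The two Java approaches above are not interchangeable: split() describes what to discard, while a find() loop describes what to keep, like Ruby's scan. This sketch, using a hypothetical input with leading spaces and trailing punctuation, shows where they diverge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitVsFindDemo {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Describes what to KEEP (the words), like Ruby's scan(/\w+/)
    static List<String> scanWords(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    // Describes what to DISCARD (the whitespace separators)
    static List<String> splitWords(String s) {
        return Arrays.asList(s.split("\\s+"));
    }

    public static void main(String[] args) {
        String s = "  This is a sample.";
        System.out.println(scanWords(s));  // [This, is, a, sample]
        System.out.println(splitWords(s)); // [, This, is, a, sample.]
    }
}
```

The leading whitespace gives split() an empty first element, and the period stays attached to "sample." because it is not whitespace; the find() loop avoids both issues.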


-- Michael Frisch (Tuesday, June 3, 2008)