CSC/ECE 517 Summer 2008/wiki1 1 mf
Regular expressions in Ruby versus Java
Ruby supports regular expressions as a language feature, without the inclusion of any special classes or modules. Java, on the other hand, does not have native regular expression support in the language itself; it requires the use of a regular expression package.
If you just want to learn some patterns quickly, reference here.
Ruby regexp support in more depth
Since Ruby borrows its regular expression syntax from Perl, creating one takes nothing more than any one of the following three forms (options are always optional):
/pattern/modifiers OR %r{pattern}options OR Regexp.new( 'pattern' [, options ])
This alone is all that is needed to create an instance of the Regexp class. You can then use the regular expression together with a String object to exercise the expression.
Java regexp support in more depth
Support for regexp
Java has been around for a while but has never had regular expression support built into the language itself. Because of this, regular expression packages had to be created. There was no comprehensive regexp support from Java's main contributor, Sun, until Java 1.4. As a result, there are multiple regexp packages for Java floating around, most of them from third parties:
- java.util.regex The most widely used for regular expressions now due to its inclusion in the JDK since Java 1.4. This document will assume from this point forward that we are using this package for our regular expressions in Java.
- Jakarta Around since 1996, Jakarta was donated to the Apache Software Foundation and is under an open-source, BSD style license
- dk.brics.automaton Automaton is known for being the fastest of the Java regexp implementations
- And the list goes on...
Classes with regular expression abilities
String class
The String class provides simple regular expression support through methods such as matches(), replaceFirst(), replaceAll(), and split(). It is the quickest way to write code to do matching, replacement, or splitting on a string. However, each call compiles the regular expression again, so it is not very fast and should not be used repeatedly with the same expression if performance is a factor. The String class's regular expression matching also has a severe limitation: any regular expression passed to matches() has to span the whole string, as if ^ were prepended to the front and $ appended to the tail of your expression.
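The behaviour described above can be sketched as follows. This is only an illustration: the class name StringRegexSketch and its helper methods are made up for this example, and they simply wrap the standard String methods matches(), replaceFirst(), and split().

```java
import java.util.Arrays;

// Hypothetical class, used only to illustrate the String regex methods.
class StringRegexSketch {
    // True only when the regex matches the ENTIRE input string.
    static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Replaces the first substring that matches the regex.
    static String replaceFirstMatch(String input, String regex, String replacement) {
        return input.replaceFirst(regex, replacement);
    }

    // Splits the input around every match of the regex.
    static String[] splitOn(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String myStr = "Manslaughter";
        System.out.println(wholeMatch(myStr, "laughter"));               // false
        System.out.println(wholeMatch(myStr, ".*laughter"));             // true
        System.out.println(replaceFirstMatch(myStr, "laughter", "ion")); // Mansion
        // Backslashes must be doubled inside a Java string literal.
        System.out.println(Arrays.toString(splitOn("This is a sample", "\\s+")));
    }
}
```

Note that only matches() demands a whole-string match; replaceFirst() and split() look for the pattern anywhere in the string.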
Pattern class
The Pattern class is a compiled representation of a regular expression. A regular expression is input as a string and then compiled so that it can be used repeatedly by the Matcher class, or a single time to provide a single match. Using it for a single match is a fairly inefficient use of the class. Once compiled, a Pattern can also be used to split a string into an array of the substrings found around each match.
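A minimal sketch of the compile-once, reuse-many-times idea, assuming a made-up class PatternReuseSketch with two hypothetical helpers. Only Pattern.compile(), Pattern.matcher(), Matcher.find(), and Pattern.split() come from java.util.regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical class, used only to illustrate reusing compiled patterns.
class PatternReuseSketch {
    // Each pattern is compiled once and shared by every call below.
    static final Pattern LAUGHTER = Pattern.compile("laughter");
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Counts how many inputs contain the pattern somewhere.
    static int countContaining(String[] inputs) {
        int count = 0;
        for (String s : inputs) {
            if (LAUGHTER.matcher(s).find()) {
                count++;
            }
        }
        return count;
    }

    // Splits the input around each run of whitespace.
    static String[] words(String input) {
        return WHITESPACE.split(input);
    }

    public static void main(String[] args) {
        String[] inputs = {"Manslaughter", "Mansion", "laughter"};
        System.out.println(countContaining(inputs));                    // 2
        System.out.println(Arrays.toString(words("This is a sample"))); // [This, is, a, sample]
    }
}
```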
Matcher class
The Matcher class is an engine that performs match operations on a string by interpreting a Pattern. A Matcher is able to do matching and replacement. It also has the ability to return all matches in a string, but only one match at a time, through repeated calls to find().
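As a rough sketch of how a Matcher hands back one match at a time, the hypothetical class below (MatcherLoopSketch and its method names are assumptions made for this example) loops over find() and reads each match with group(). It also shows replacement through Matcher.replaceAll().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, used only to illustrate iterating with a Matcher.
class MatcherLoopSketch {
    // Collects every match by calling find() until it returns false.
    static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<String>();
        while (m.find()) {
            found.add(m.group()); // text of the match that find() just located
        }
        return found;
    }

    // Replaces every match of the regex with the replacement text.
    static String replaceEvery(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(allMatches("\\w+", "This is a sample")); // [This, is, a, sample]
        System.out.println(replaceEvery("a", "banana", "o"));       // bonono
    }
}
```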
Examples
Match a pattern
Let's see if we can have Manslaughter without laughter.
Ruby
myStr = "Manslaughter"
myStr =~ /laughter/   # this returns 4, the index at which the first match of laughter begins in Manslaughter
myStr =~ /hilarious/  # this returns nil because there is no instance of hilarious in Manslaughter
OR
myStr = "Manslaughter"
/laughter/.match(myStr)   # returns an instance of the MatchData class because there is at least 1 match
/hilarious/.match(myStr)  # returns nil because there are 0 matches
Java
String myStr = "Manslaughter";
myStr.matches("laughter");   // this returns false because matches() requires the expression to span the whole string (Java patterns take no / delimiters)
myStr.matches(".*laughter"); // this returns true, but is not the way you'd think of doing it first
OR
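import java.util.regex.Pattern;
String myStr = "Manslaughter";
Pattern.matches(".*laughter", myStr); // this returns true; as with String.matches(), the expression must span the whole string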
OR
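import java.util.regex.Matcher;
import java.util.regex.Pattern;
String myStr = "Manslaughter";
Pattern p = Pattern.compile("laughter");
Matcher m = p.matcher(myStr);
m.find(); // this returns true; find() looks for the pattern anywhere in the string (this method was truly overkill, no pun intended)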
Search for text and replace
Replace laughter in Manslaughter with ion
Ruby
myStr = "Manslaughter"
myNewStr = myStr.sub(/laughter/, "ion") # myNewStr now contains Mansion
Java
String myStr = "Manslaughter";
String myNewStr = myStr.replaceFirst("laughter", "ion"); // myNewStr now contains Mansion
OR
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String myStr = "Manslaughter";
Pattern p = Pattern.compile("laughter");
Matcher m = p.matcher(myStr);
String myNewStr = m.replaceFirst("ion"); // myNewStr now contains Mansion
Collect matches
Collect the individual words in a string in an array-like collection
Ruby
myStr = "This is a sample"
matches = myStr.scan(/\w+/) # matches is now an array with "This", "is", "a", and "sample" in it
Java
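String myStr = "This is a sample";
String[] matches = myStr.split("\\s"); // the backslash must be doubled inside a Java string literal
/* matches is now an array with "This", "is", "a", and "sample" in it
   notice that it's a different regular expression from the Ruby example though:
   it describes the separators rather than the words */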
OR
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String myStr = "This is a sample";
Pattern p = Pattern.compile("\\w+"); // the backslash must be doubled inside a Java string literal
Matcher m = p.matcher(myStr);
Vector<String> matches = new Vector<String>(); // we have to use a vector instead of an array so that we can add elements
while (m.find()) {
    matches.add(m.group());
}
// Now down here matches is a vector with "This", "is", "a", and "sample" in it
References
- Ruby Regexp Class - Regular Expressions in Ruby
- Rails for PHP Developers - Regular Expressions in Ruby
- Using Regular Expressions in Java
- java.util.regex
- Jakarta
- dk.brics.automaton
- Java Classes
-- Michael Frisch (Tuesday, June 3, 2008)