CSC/ECE 517 Summer 2008/wiki1 1 mf
= Regular expressions in Ruby versus Java =
Ruby supports regular expressions as a language feature, without the inclusion of any special classes or modules. Java, on the other hand, does not have native regular expression support and requires the use of special regular expression packages.
If you just want to learn some patterns quickly, see the reference [http://www.regular-expressions.info/reference.html here].
== Ruby regexp support in more depth ==
Since Ruby borrows its regular expression syntax from Perl, creating a regular expression takes nothing more than any one of the following three forms (the options are always optional):
/''pattern''/''modifiers''
OR
%r{''pattern''}''options''
OR
Regexp.new( '''pattern''' [, ''options'' ])
Any one of these forms is all that is needed to create an instance of the Regexp class. You can then use your regular expression in conjunction with a String object to exercise the expression.
== Java regexp support in more depth ==
=== Support for regexp ===
Java has been around for a while but has never had native regexp support, so regular expression packages had to be created. Java's main contributor, Sun, offered no comprehensive regexp support until Java 1.4. Because of this, there are multiple regexp packages for Java floating around:
* [http://java.sun.com/docs/books/tutorial/essential/regex/ java.util.regex] The most widely used package for regular expressions now, due to its inclusion in the JDK since Java 1.4. This document will assume from this point forward that we are using this package for our regular expressions in Java.
* [http://jakarta.apache.org/regexp/index.html Jakarta] Around since 1996, Jakarta was donated to the Apache Software Foundation and is under an open-source, BSD-style license.
* [http://www.brics.dk/~amoeller/automaton/ dk.brics.automaton] Automaton has a reputation for being one of the fastest Java regexp implementations.
* And the list goes on...
===Classes with regular expression abilities===
====String class====
The [http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html String class] provides simple regular expression support. It is the quickest way to write code that does matching, replacement, or splitting on a string. However, the expression is compiled again on every call, so it is not very fast and should not be used if performance is a factor. The String class's matches method also has a severe limitation: any regular expression passed to it is interpreted as if it has to span the whole string, i.e. as if ''^'' were prepended and ''$'' appended to your expression.
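The whole-string behaviour is easy to trip over, so here is a minimal sketch of it. The class name StringRegexDemo and the helper containsMatch are made up for illustration and are not part of the JDK; the helper is only one possible workaround, which wraps the expression in ''.*'' so that matches() can succeed on a partial hit.

```java
// Sketch: String.matches() must span the whole string; replaceFirst() need not.
class StringRegexDemo {
    // Hypothetical helper (not a JDK method): true if regex matches anywhere in text.
    static boolean containsMatch(String text, String regex) {
        // (?s) lets . match line terminators too, so multi-line text also works.
        return text.matches("(?s).*(?:" + regex + ").*");
    }

    public static void main(String[] args) {
        String word = "Manslaughter";
        System.out.println(word.matches("laughter"));             // false: not the whole string
        System.out.println(word.matches("Mans.*"));               // true: expression covers everything
        System.out.println(containsMatch(word, "laughter"));      // true
        System.out.println(word.replaceFirst("laughter", "ion")); // Mansion: replaceFirst is not anchored
    }
}
```

For anything beyond a quick check, the Pattern and Matcher classes described next are usually the better fit.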
====Pattern class====
The [http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html Pattern class] is a compiled representation of a regular expression. A regular expression is input as a string and then compiled so that it can be used repeatedly by the Matcher class, or a single time to provide a single match. Using it for a single match is a fairly inefficient use of the class. Once compiled, a Pattern can also be used to split a string into an array of the substrings found between its matches.
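As a rough illustration of compiling once and reusing the result, the sketch below keeps a single Pattern in a constant and applies it to several strings. The class name PatternReuseDemo and the method countWords are assumptions made for this example only.

```java
import java.util.regex.Pattern;

// Sketch: compile a Pattern once, then reuse it for many inputs.
class PatternReuseDemo {
    // Compiled a single time; Pattern instances are immutable and safe to share.
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: counts whitespace-separated words in a line.
    static int countWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0; // split() on an empty string would yield one empty element
        }
        return WHITESPACE.split(trimmed).length;
    }

    public static void main(String[] args) {
        String[] lines = { "This is a sample", "  two   words ", "" };
        for (String line : lines) {
            System.out.println(countWords(line)); // prints 4, then 2, then 0
        }
    }
}
```

The same compiled Pattern serves every call to countWords, which is where the class pays off compared with the String methods.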
====Matcher class====
The [http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html Matcher class] is an engine that performs match operations on a string by interpreting a Pattern. A Matcher is able to do matching and replacement. It can also return all of the matches in a string, but only one match at a time.
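To make the one-match-at-a-time behaviour concrete, here is a small sketch that walks a Matcher through a string and records each match together with the index where it starts. The class name MatcherWalkDemo and the method describeMatches are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: step through every match with find(), reading group() and start().
class MatcherWalkDemo {
    // Hypothetical helper: describes each match as "text@startIndex".
    static String describeMatches(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) { // each call advances to the next match
            if (out.length() > 0) {
                out.append(", ");
            }
            out.append(m.group()).append('@').append(m.start());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: This@0, is@5, a@8, sample@10
        System.out.println(describeMatches("This is a sample", "\\w+"));
    }
}
```

Each call to find() moves the Matcher forward, so group() and start() always describe the most recent match only.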
== Examples ==
===Match a pattern===
Let's see if we can have Manslaughter without laughter.
====Ruby====
<pre>
myStr = "Manslaughter"
myStr =~ /laughter/  # this returns 4, the index at which laughter starts in Manslaughter
myStr =~ /hilarious/ # this returns nil because there is no instance of hilarious in Manslaughter
</pre>
''OR''
<pre>
myStr = "Manslaughter"
/laughter/.match(myStr)  # returns an instance of the MatchData class because there is a match
/hilarious/.match(myStr) # returns nil because there are 0 matches
</pre>
====Java====
Java has no regular expression literals: a pattern is an ordinary string, so it is written without the surrounding slashes.
<pre>
String myStr = "Manslaughter";
myStr.matches("laughter");   // this returns false because Java treats the expression as if ^ and $ surrounded it
myStr.matches(".*laughter"); // this returns true, but is not the way you'd think of doing it first
</pre>
''OR''
<pre>
import java.util.regex.Pattern;

String myStr = "Manslaughter";
Pattern.matches(".*laughter", myStr); // this returns true; like String.matches, it must match the whole string
</pre>
''OR''
<pre>
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myStr = "Manslaughter";
Pattern p = Pattern.compile("laughter");
Matcher m = p.matcher(myStr);
m.find(); // this returns true; m.matches() would return false (this method was truly overkill, no pun intended)
</pre>
=== Search for text and replace ===
Replace laughter in Manslaughter with ion.
====Ruby====
<pre>
myStr = "Manslaughter"
myNewStr = myStr.sub(/laughter/, "ion") # myNewStr now contains Mansion
</pre>
====Java====
<pre>
String myStr = "Manslaughter";
String myNewStr = myStr.replaceFirst("laughter", "ion"); // myNewStr now contains Mansion
</pre>
''OR''
<pre>
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myStr = "Manslaughter";
Pattern p = Pattern.compile("laughter");
Matcher m = p.matcher(myStr);
String myNewStr = m.replaceFirst("ion"); // myNewStr now contains Mansion
</pre>
=== Collect matches ===
Collect the individual words of a string in an array-like collection.
====Ruby====
<pre>
myStr = "This is a sample"
matches = myStr.scan(/\w+/) # matches is now an array with "This", "is", "a", and "sample" in it
</pre>
====Java====
<pre>
String myStr = "This is a sample";
String[] matches = myStr.split("\\s"); // backslashes must be doubled inside a Java string
/* matches is now an array with "This", "is", "a", and "sample" in it
   notice that it's a different regular expression from the Ruby one, though:
   it matches the separators rather than the words */
</pre>
''OR''
<pre>
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String myStr = "This is a sample";
Pattern p = Pattern.compile("\\w+");
Matcher m = p.matcher(myStr);
Vector<String> matches = new Vector<String>(); // we have to use a vector instead of an array so that we can add elements
while (m.find()) {
    matches.add(m.group());
}
// Now down here matches is a vector with "This", "is", "a", and "sample" in it
</pre>
= References =
* [http://www.regular-expressions.info/ruby.html Ruby Regexp Class - Regular Expressions in Ruby]
* [http://railsforphp.com/2008/01/17/regular-expressions-in-ruby/ Rails for PHP Developers - Regular Expressions in Ruby]
* [http://www.regular-expressions.info/java.html Using Regular Expressions in Java]
* [http://java.sun.com/docs/books/tutorial/essential/regex/ java.util.regex]
* [http://jakarta.apache.org/regexp/index.html Jakarta]
* [http://www.brics.dk/~amoeller/automaton/ dk.brics.automaton]
* Java Classes
** [http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html String]
** [http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html Pattern]
** [http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html Matcher]
** [http://java.sun.com/j2se/1.4.2/docs/api/java/util/Vector.html Vector]
-- Michael Frisch (Tuesday, June 3, 2008)
Latest revision as of 23:35, 13 June 2008