CSC/ECE 517 Summer 2008/wiki1 1 mf

Regular expressions in Ruby versus Java

Ruby supports regular expressions as a language feature, without the inclusion of any special classes or modules. Java, on the other hand, does not have native regular expression support; it requires the use of special regular expression packages.

Ruby regexp support in more depth

Since Ruby borrows its regular expression syntax from Perl, a regular expression literal is nothing more than the simple syntax of:

/pattern/modifiers

This alone is all that is needed to create an instance of the Regexp class. You can then use the regular expression in conjunction with a String object to exercise the expression.

Java regexp support in more depth

Support for regexp

Java has been around for a while and has never had native regexp support in the language itself. Because of this, regular expression packages had to be created. There was no comprehensive regexp support from Java's main contributor, Sun, until Java 4 (J2SE 1.4). As a result, there are multiple regexp packages for Java floating around, most of them from third parties:

  • java.util.regex: The most widely used package for regular expressions now, due to its inclusion in the JDK since Java 4. This document will assume from this point forward that we are using this package for our regular expressions in Java.
  • Jakarta Regexp: Around since 1996, this package was donated to the Apache Software Foundation and is distributed under an open-source, BSD-style license.
  • dk.brics.automaton: Automaton is often cited as one of the fastest Java regexp implementations.
  • And the list goes on...

Classes with regular expression abilities

String class

The String class provides simple regular expression support through its matches, replaceAll, replaceFirst, and split methods. It is the quickest way to write code to do matching, replacement, or splitting on a string. However, these methods generally compile the pattern again on every call, so they are not very fast and should not be used if performance is a factor. The String class's regular expression matching (the matches method) also has a severe limitation: any regular expression passed to it is interpreted as if it has to span the whole string, i.e. as if ^ were prepended to the front and $ appended to the tail of your expression.
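The following is a minimal sketch of the String convenience methods described above. The class name StringRegexDemo, the helper method names, and the sample strings are made up for illustration; only the String methods themselves come from the JDK.

```java
public class StringRegexDemo {
    // String.matches succeeds only if the pattern covers the ENTIRE string.
    static boolean matchesWhole(String input, String regex) {
        return input.matches(regex);
    }

    // replaceAll is not anchored: it replaces every matching substring.
    static String replaceDigits(String input) {
        return input.replaceAll("[0-9]+", "#");
    }

    // split uses the regex as the delimiter between fields.
    static String[] splitOnCommas(String input) {
        return input.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        // "java" occurs in the input, but it does not span the whole string.
        System.out.println(matchesWhole("ruby and java", "java"));      // false
        // Adding .* on both sides lets the pattern span the whole string.
        System.out.println(matchesWhole("ruby and java", ".*java.*"));  // true
        System.out.println(replaceDigits("room 101, floor 3"));         // room #, floor #
        System.out.println(String.join("|", splitOnCommas("a , b,c"))); // a|b|c
    }
}
```

Note that only matches behaves as if the expression were anchored; replaceAll and split find the pattern anywhere in the string.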

Pattern class

The Pattern class is a compiled representation of a regular expression. A regular expression is input as a string and then compiled so that it can be used repeatedly by the Matcher class, or a single time to provide a single match. Using it for a single match is a fairly inefficient use of the class, since the cost of compiling is paid for only one use.
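A short sketch of the difference between compiling once and compiling per call might look like this. The class name PatternReuseDemo, the ZIP-code pattern, and the sample inputs are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

public class PatternReuseDemo {
    // Compiled once and reused for every candidate string.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    // Reuses the precompiled pattern; each call only creates a Matcher.
    static int countValid(String[] candidates) {
        int count = 0;
        for (String candidate : candidates) {
            if (ZIP.matcher(candidate).matches()) {
                count++;
            }
        }
        return count;
    }

    // One-shot convenience method: the regex is compiled on every call.
    static boolean isValidOnce(String candidate) {
        return Pattern.matches("\\d{5}(-\\d{4})?", candidate);
    }

    public static void main(String[] args) {
        String[] samples = {"27695", "27695-8206", "2769", "abcde"};
        System.out.println(countValid(samples));        // 2
        System.out.println(isValidOnce("27695-8206"));  // true
    }
}
```

Both methods give the same answers; the first form is preferable when the same expression is applied to many strings.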

Matcher class

The Matcher class is an engine that performs match operations on a string by interpreting a Pattern. A Matcher object is able to do matching and replacement. The Matcher class does have the ability to return all matches in a string, but only one match at a time: each call to find advances to the next match.
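As a rough sketch of how a Matcher steps through a string one match at a time, consider the following. The class name MatcherDemo, the helper names, and the word pattern are hypothetical; find, group, and replaceAll are standard Matcher methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    // Matches runs of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Collects every match by calling find() until it returns false.
    static List<String> collectWords(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = WORD.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group()); // text of the current match
        }
        return found;
    }

    // Replaces every match in one call.
    static String maskWords(String input) {
        return WORD.matcher(input).replaceAll("*");
    }

    public static void main(String[] args) {
        System.out.println(collectWords("Ruby 1.8 vs Java 6")); // [Ruby, vs, Java]
        System.out.println(maskWords("Ruby 1.8 vs Java 6"));    // * 1.8 * * 6
    }
}
```

Because find returns one match per call, gathering all matches requires a loop like the one in collectWords.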

Examples

Match a pattern

Search for text and replace

Collect matches

References

-- Michael Frisch (Tuesday, June 3, 2008)