CSC/ECE 517 Fall 2009/wiki1b 5 j8
Regular Expressions
Regular expressions are a critical part of most modern programming languages, especially ones that deal with string processing as a core part of their functionality. They allow a developer to easily match or replace text using patterns that range from very simple to very complex. Although the way regular expressions are used varies from language to language, the general principle is the same, and similar syntax can generally be used across the board.
Usage
Perl
Perl has regular expressions built into the language itself via the '=~' operator. A simple match could be done like this:
$string = "lee"; if ($string =~ m/l.*/) { print "Matches"; }
This would print "Matches" since 'lee' starts with an 'l' and has zero or more characters after the 'l'. Replacements can be done simply by using 's' to indicate substitutions:
$string = "peewee"; $string =~ s/e+/aa/g; print $string
This would print "paawaa". The 'g' following the regular expression indicates a global replacement; omit it to replace only the first match of 'e+', which would print "paawee". [1]
Java
Unlike Perl and Ruby, Java does not have regular expression syntax built into the language. Instead, it uses Pattern and Matcher objects from the java.util.regex package to process regular expressions.
Pattern patt = Pattern.compile("l.*"); Matcher match = patt.matcher("lee"); return match.matches();
This would return true. Since the Pattern object is created with the regular expression, it can be reused with different inputs for increased speed.
Pattern patt = Pattern.compile("l.*"); Matcher match = patt.matcher("eel"); return match.matches();
This would return false since 'eel' does not start with an 'l'. If a developer wants to use a regular expression only once and does not need to reuse the Pattern, he or she can use the static 'matches' method of Pattern:
Pattern.matches("l.*", "lee");
or call 'matches' directly on the String:
String str = "lee"; str.matches("l.*");
Replacements are done using:
String str = "peewee"; str.replaceAll("e+", "aa");
This would return the string 'paawaa', replacing each run of one or more 'e' characters with two 'a's. Because Java strings are immutable, the original string is unchanged and the result must be assigned to a variable. To replace only the first instance, use replaceFirst:
String str = "peewee"; str.replaceFirst("e+", "aa");
which would return 'paawee'.
Ruby
Ruby's support for regular expressions is very similar to Perl's, but with some differences. Matches are done in exactly the same manner:
str = "lee"
if str =~ /l.*/
  print "Matches"
end
This would print "Matches".
Substitutions are one point where Ruby differs greatly from Perl. Instead of using the "s/regex/replace/" format, the methods sub, gsub, sub!, and gsub! can be called on any string. sub and gsub return a new string with the specified substitution, whereas sub! and gsub! perform the substitution in place. gsub differs from sub in that it replaces every match instead of only the first.
str = "peewee"
print str.gsub(/e+/, "aa")
This would print 'paawaa'.
Python
Python, like Java, does not have built-in language support for regular expressions. It does, however, provide regular expressions through its standard library, in the 're' module. A simple match test can be done as follows:
import re
if re.match("l.*", "lee"):
    print("Match")
The above would print "Match". For substitution, Python uses the "sub" function:
re.sub("e+", "aa", "peewee")
This would return "paawaa". To replace only the first match of 'e+', pass the optional count argument with a value of 1:
re.sub("e+", "aa", "peewee", count=1)
This would return "paawee". The count argument tells the sub function to substitute only the first match.
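The Python calls above can be collected into one short sketch. It also brings in two functions the text does not cover, so treat them as assumptions about what is useful here: re.compile, which, much like Java's Pattern, lets a pattern be built once and reused, and re.fullmatch, which requires the whole string to match, whereas re.match only anchors at the start of the string. The helper name starts_with_l is hypothetical.

```python
import re

# Compile once and reuse, similar in spirit to Java's Pattern object.
L_PATTERN = re.compile(r"l.*")


def starts_with_l(text):
    """Hypothetical helper: True if text begins with 'l'."""
    # re.match only anchors at the start of the string.
    return L_PATTERN.match(text) is not None


match_results = [starts_with_l(s) for s in ("lee", "eel")]

# fullmatch requires the entire string to match the pattern.
full_lee = re.fullmatch(r"l.*", "lee") is not None
full_prefix = re.fullmatch(r"l", "lee") is not None  # "ee" is left over
prefix_only = re.match(r"l", "lee") is not None

# Replace every run of 'e', then only the first run.
replace_all = re.sub(r"e+", "aa", "peewee")
replace_first = re.sub(r"e+", "aa", "peewee", count=1)
```

The difference between match and fullmatch matters when porting patterns from Java, whose matches method tests the entire input rather than just a prefix.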
PHP
PHP also does not have built-in language support for regular expressions. To test for a match, use the preg_match function:
if (preg_match("/l.*/", "lee")) echo "Match";
This would print "Match". The preg functions use Perl-compatible regular expression syntax, including the enclosing delimiters. Substitutions are done via the preg_replace function:
preg_replace( "/e+/" , "aa", "peewee")
This would return "paawaa". As in Python, if you provide an optional fourth argument (the limit) of 1, only the first instance of the pattern is replaced.
Ease of Use
Although ease of use is largely dependent upon the user, a language that has built-in language support for regular expressions is generally easier to use. By this metric it is no surprise that Perl would be the easiest of them all. Since Ruby has some built-in language support, it would come next, and the rest would probably be rated about the same.
Advanced Features
Unicode
All of the above support Unicode and internationalized strings. There are, however, several caveats:
- Ruby did not have support until version 1.9.
- Perl did not have support until version 5.6.
- PHP supports it, but requires the 'u' pattern modifier (for example, "/e+/u").
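To illustrate what Unicode support means in practice, here is a minimal sketch in Python. It assumes Python 3, where str patterns are Unicode-aware by default; the re.ASCII flag is used only to show the contrast, and the sample text is invented for illustration.

```python
import re

# "naïve café", written with escapes so the source file stays ASCII.
text = "na\u00efve caf\u00e9"

# By default, \w in a str pattern matches Unicode word characters.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
```

With Unicode matching, each accented word is found whole; with ASCII-only matching, the accented letters act as word breaks.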
POSIX Syntax Support
POSIX-style regular expressions are older and much more limited than Perl-style syntax, but are still in use today.
- PHP supports them through the ereg family of functions (for example "eregi", the case-insensitive variant), which can be used similarly to the preg functions. These functions were deprecated in PHP 5.3 and removed in PHP 7.0.
- None of the other languages discussed in this document support POSIX-style regular expressions.