CSC/ECE 517 Fall 2009/wiki1b 5 j8
Regular Expressions
Compare the support for regular expressions in Ruby, Python, PHP, Perl 5, and any other appropriate scripting language with each other. Also compare the syntactic features in these languages with Java's package-based support. What features and syntax do the languages have in common? Are some features supported by some languages and not by others? How robust and easy-to-use are regular expressions in all these languages?
Regular expressions are a critical part of most modern programming languages, especially those that treat string processing as a core part of their functionality. Although the way regular expressions are used varies from language to language, the general principles are the same, and much of the pattern syntax carries over from one language to another.
Usage
Perl
Perl has regular expressions built into the language itself via the '=~' binding operator. A simple match could be done like this:
$string = "lee";
if ($string =~ m/l.*/) {
    print "Matches";
}
This would print "Matches" because 'lee' contains an 'l' followed by zero or more characters. The pattern is not anchored, so it would also match 'eel'; write m/^l.*/ to require the string to start with an 'l'. Replacements are done by using 's' to indicate a substitution:
$string = "peewee";
$string =~ s/e+/aa/g;
print $string;
This would print "paawaa". The 'g' after the regular expression requests a global replacement; omit it to replace only the first match of 'e+', which would print "paawee".
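Because Perl's m/l.*/ is unanchored, '=~' succeeds wherever the pattern occurs in the string. For comparison, a minimal Python sketch of the same distinction: re.search scans the whole string as Perl's '=~' does, while re.match only tries at the start, which behaves like m/^l.*/.

```python
import re

words = ["lee", "eel"]

# re.search looks for the pattern anywhere, like Perl's unanchored =~ m/l.*/
searched = {w: bool(re.search("l.*", w)) for w in words}

# re.match only tries at index 0, like Perl's anchored m/^l.*/
matched = {w: bool(re.match("l.*", w)) for w in words}

print(searched)  # {'lee': True, 'eel': True}
print(matched)   # {'lee': True, 'eel': False}
```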
Java
Unlike Perl and Ruby, Java has no regular expression syntax built into the language. Instead, the java.util.regex package provides Pattern objects, and the Matcher objects they create, to process regular expressions.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern patt = Pattern.compile("l.*");
Matcher match = patt.matcher("lee");
return match.matches();
This would return true. Because the regular expression is compiled once into the Pattern object, the same Pattern can be reused with different inputs without recompiling it, which saves time:
Matcher match = patt.matcher("eel"); // reuses patt; no recompilation
return match.matches();
This would return false, because matches() requires the entire input to match 'l.*' and 'eel' does not start with an 'l'. If a developer only wants to use a regular expression once and does not need to reuse the Pattern, he or she can call the static 'matches' method of Pattern:
Pattern.matches("l.*", "lee");
or call the matches method on the String itself:
String str = "lee";
str.matches("l.*");
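For comparison, Python uses the same compile-once design as Java's Pattern class. In this sketch, re.compile returns a reusable pattern object, and its fullmatch method (available since Python 3.4) requires the entire string to match, as Java's Matcher.matches() does. The match method, by contrast, only anchors at the start of the string.

```python
import re

# Compile once and reuse the object, as with Pattern.compile("l.*") in Java
patt = re.compile("l.*")

# fullmatch() must consume the whole string, like Matcher.matches()
whole = [patt.fullmatch(s) is not None for s in ("lee", "eel")]
print(whole)  # [True, False]

# match() only anchors at the start, so matching a prefix is enough
prefix = re.compile("l")
print(prefix.match("lee") is not None)      # True: only the first 'l' is needed
print(prefix.fullmatch("lee") is not None)  # False: 'ee' is left unmatched
```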
Replacements are done with replaceAll. Because Java strings are immutable, the method returns a new String, which has to be assigned back:
String str = "peewee";
str = str.replaceAll("e+", "aa");
This changes str from 'peewee' to 'paawaa' by replacing each run of one or more 'e's with two 'a's. To replace only the first match, use replaceFirst instead:
String str = "peewee";
str = str.replaceFirst("e+", "aa");
which would change the string to 'paawee'.
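Java's replaceAll and replaceFirst never modify the original String, because Java strings are immutable. Python strings are immutable too. This short sketch shows that re.sub likewise returns a new string, so the name has to be rebound to see the change.

```python
import re

s = "peewee"
re.sub("e+", "aa", s)      # result is discarded, so s is unchanged
print(s)                   # peewee
s = re.sub("e+", "aa", s)  # rebind the name to the new string
print(s)                   # paawaa
```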
Ruby
Ruby's support for regular expressions is very similar to Perl's, with some differences. Matches are done in exactly the same manner:
str = "lee"
if (str =~ /l.*/)
  print "Matches"
end
This would print "Matches".
Substitutions are one point where Ruby differs greatly from Perl. Instead of the "s/regex/replacement/" form, Ruby provides the methods sub, gsub, sub!, and gsub!, which can be called on any string. sub and gsub return a new string with the substitution applied, whereas sub! and gsub! modify the string in place. gsub differs from sub in that it replaces every match instead of only the first one.
str = "peewee"
print str.gsub(/e+/, "aa")
would print 'paawaa'.
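Python has no in-place equivalent of sub! and gsub!, since Python strings are immutable. Ruby's bang methods also return nil when no substitution was made, which is a convenient way to detect whether anything changed. This sketch shows how Python's re.subn covers that case by returning the new string together with the number of substitutions performed.

```python
import re

# subn returns a tuple: (new_string, number_of_substitutions)
new, n = re.subn("e+", "aa", "peewee")
print(new, n)  # paawaa 2

# A count of 0 means nothing matched, similar to gsub! returning nil
unchanged, n2 = re.subn("x+", "aa", "peewee")
print(unchanged, n2)  # peewee 0
```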
Python
Like Java, Python has no regular expression syntax built into the language; it provides regular expressions through the 're' module of its standard library. A simple match test can be done as follows:
import re
if re.match("l.*", "lee"):
    print("Match")
The above would print "Match". Note that re.match only looks for a match at the beginning of the string. For substitution, Python uses the re.sub function:
re.sub("e+", "aa", "peewee")
This would return "paawaa". To replace only the first match of 'e+', pass 1 as the optional count argument:
re.sub("e+", "aa", "peewee", 1)
which would return "paawee". The count of 1 tells re.sub to substitute only the first match.
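The same calls can be written with keyword arguments. In this sketch, the default count of 0 means every match is replaced. Passing count, and any flags, by keyword avoids a common mistake: the fourth positional parameter of re.sub is count, not flags. Python 3.13 also deprecates passing these arguments positionally.

```python
import re

text = "peewee"

# count defaults to 0, meaning "replace every match" (like Perl's /g)
all_replaced = re.sub("e+", "aa", text)

# Passing count by keyword makes the "first match only" intent explicit
first_only = re.sub("e+", "aa", text, count=1)

# Flags should also go by keyword; positionally, the 4th slot is count
ignored_case = re.sub("E+", "aa", text, flags=re.IGNORECASE)

print(all_replaced, first_only, ignored_case)  # paawaa paawee paawaa
```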
PHP
Like Python, PHP has no regular expression syntax built into the language; instead it provides a set of functions. To test for a match, use the preg_match function:
if (preg_match("/l.*/", "lee")) echo "Match";
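would print "Match". The preg_ functions use Perl-compatible regular expressions (PCRE), so their pattern syntax is essentially the same as Perl's, including the delimiters around the pattern. If POSIX regular expressions are preferred, the ereg function (or eregi for case-insensitive matching) can be used instead, with the pattern written without delimiters; note that the ereg functions were deprecated in PHP 5.3 and removed in PHP 7. Substitutions are done with the preg_replace function: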
preg_replace("/e+/", "aa", "peewee")
This would return "paawaa". As in Python, passing an optional fourth argument (the limit) of 1 replaces only the first match.
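Forgetting the closing delimiter, for example writing "/l.*" instead of "/l.*/", is an easy mistake. PHP's preg patterns carry their own delimiters, with modifiers such as /i or /u placed after the closing one. Python has no delimiters and passes modifiers as flags instead. The sketch below uses a hypothetical helper, compile_delimited, which is not part of PHP or Python, to show how the two notations line up. It assumes the same character opens and closes the pattern and does not handle escaped delimiters or bracket-style delimiters.

```python
import re

# Hypothetical mapping from PCRE modifiers to Python re flags
MODIFIER_FLAGS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
    "u": 0,  # Python 3 str patterns are already Unicode-aware
}

def compile_delimited(pattern):
    """Hypothetical helper: compile a PHP-style "/body/modifiers" pattern.

    Assumes the first character is the delimiter and that the last
    occurrence of that character closes the pattern. Unknown modifiers
    raise KeyError.
    """
    delim = pattern[0]
    end = pattern.rfind(delim)
    if end == 0:
        # Same mistake as writing "/l.*" without the closing "/"
        raise ValueError("no ending delimiter %r found" % delim)
    body, modifiers = pattern[1:end], pattern[end + 1:]
    flags = 0
    for mod in modifiers:
        flags |= MODIFIER_FLAGS[mod]
    return re.compile(body, flags)

print(compile_delimited("/e+/").sub("aa", "peewee"))   # paawaa
print(compile_delimited("/E+/i").sub("aa", "peewee"))  # paawaa
```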
Advanced Features
Unicode
All of the languages above support Unicode and internationalized strings, but with several caveats:
* Ruby did not have full support until version 1.9.
* Perl did not have support until version 5.6.
* PHP supports it, but the preg functions require the /u modifier.
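For Python, str patterns in Python 3 are Unicode-aware by default, so \w matches accented letters. In Python 2 this required the re.UNICODE flag. The re.ASCII flag restores ASCII-only matching. A small sketch of the difference:

```python
import re

text = "café naïve"

# In Python 3, \w on a str pattern matches any Unicode word character
words_unicode = re.findall(r"\w+", text)
print(words_unicode)  # ['café', 'naïve']

# re.ASCII limits \w to [a-zA-Z0-9_], so accented letters split the words
words_ascii = re.findall(r"\w+", text, flags=re.ASCII)
print(words_ascii)    # ['caf', 'na', 've']
```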