CSC/ECE 517 Fall 2009/wiki1a 1 c9)
Introduction
Regular expressions are a key to powerful, flexible, and efficient text processing. With their general pattern notation, regular expressions are themselves like a mini programming language that allows you to describe and parse text. With additional support provided by the particular tool being used, regular expressions can add, remove, isolate, and generally fold, spindle, and mutilate all kinds of text and data. The host tool might be as simple as a text editor's search command or as powerful as a full text processing language. A regular expression is a specific kind of text pattern that can be used with many modern applications and programming languages. It can be used to verify whether input fits the pattern, to find text that matches the pattern within a larger body of text, to replace text matching the pattern with other text or with rearranged bits of the matched text, and to split a block of text into a list of subtexts.
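The four uses listed above (verify, find, replace, split) can be sketched in a few lines of Python. This is only an illustration: the date pattern and the sample strings are made up for the example and do not come from any particular application.

```python
import re

# Hypothetical pattern for illustration: a date written as YYYY-MM-DD.
date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
text = "Released 2009-09-14, revised 2009-10-02."

# 1. Verify that an entire input fits the pattern.
is_date = date.fullmatch("2009-09-14") is not None

# 2. Find text matching the pattern within a larger body of text.
found = [m.group(0) for m in date.finditer(text)]

# 3. Replace matches with rearranged bits of the matched text (DD/MM/YYYY).
swapped = date.sub(r"\3/\2/\1", text)

# 4. Split a block of text into a list of subtexts.
parts = re.split(r"\s*,\s*", "ruby, perl ,python")

print(is_date)
print(found)
print(swapped)
print(parts)
```

The same four operations exist in most regex-capable languages, although the function names and calling conventions differ.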
Overview
Regular expressions and the way they are used can vary wildly from tool to tool. When looking at regular expressions in the context of their host language or tool, there are three broad issues to consider:
1) Which metacharacters are supported and what they mean. This is often known as the regex flavor.
2) How regular expressions interface with the language or tool, such as how to specify regular expression operations, what operations are allowed, and what text they operate on.
3) How the regular expression engine actually goes about applying a regular expression to some text. The method that the language or tool designer uses to implement the regular expression engine has a strong influence on the results one might expect from any given regular expression.
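As a small illustration of the third point, the sketch below shows how the engine's strategy affects the result. It assumes Python's re module, which uses a backtracking engine that tries alternatives from left to right; the patterns are made up for the example.

```python
import re

# A backtracking engine tries alternatives left to right and keeps the
# first one that lets the overall match succeed.
first_wins = re.search(r"a|ab", "ab").group()

# Reordering the alternatives changes the result for the same input.
longer_first = re.search(r"ab|a", "ab").group()

print(first_wins)    # a
print(longer_first)  # ab
```

An engine that follows the POSIX leftmost-longest rule would report "ab" for both patterns, so the same regular expression can give different results in different tools.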
== Importing the Regular Expression Library ==
Some programming languages, such as Ruby, have built-in support for regular expressions, so nothing needs to be imported. Other languages, such as C# and Java, require importing a library with an import statement in the source code. Still other languages have no regular expression support at all; for these, the programmer must compile and link in a regular expression library.
JavaScript
Regular expression support is built in.
Ruby
Regular Expression support is built in.
Python
import re
The re module must be imported into the script before Python's regular expression functions can be used.
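A minimal sketch of what the import makes available, assuming a made-up sample string:

```python
import re  # without this line, the name `re` is undefined (NameError)

# Functions such as search, findall, and sub are reached through the module.
numbers = re.findall(r"\d+", "room 12, floor 3")
print(numbers)  # ['12', '3']
```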
Perl
Regular Expression support is built in.
PHP
The preg functions are built in and available in PHP 4.2.0 and later.
C#
using System.Text.RegularExpressions;
VB.Net
Imports System.Text.RegularExpressions
Java
import java.util.regex.*;
Creating Regular Expression Objects
A regular expression must be compiled before the regular expression engine can match it against a string. This happens while the application is running: the regular expression constructor parses the string holding the regular expression and converts it into a tree structure or a state machine. The function that performs the actual pattern matching then traverses this tree or state machine. Programming languages that support literal regular expressions compile the expression when execution reaches the regular expression operator.
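Because compilation happens at run time, a malformed regular expression is only detected when the constructor parses it. The sketch below shows this in Python; the helper name compile_or_none is hypothetical and exists only for this example.

```python
import re

def compile_or_none(pattern_text):
    """Return a compiled pattern, or None if the text is not a valid regex."""
    try:
        # Parsing and compilation happen here, while the program is running.
        return re.compile(pattern_text)
    except re.error:
        return None

# Compile once, then reuse the compiled object for many matches.
word = compile_or_none(r"\w+")
print(word.findall("compile once, match many times"))

# An unbalanced parenthesis is a syntax error in the pattern.
print(compile_or_none("(unclosed"))  # None
```

Keeping the compiled object in a variable avoids repeating the parsing step each time the pattern is applied.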
Ruby
In Ruby, a special syntax is used to declare regular expressions: the regular expression is placed between two forward slashes. A forward slash within the regular expression is escaped with a backslash. To avoid escaping forward slashes, prefix the regular expression with %r and use any punctuation character as the delimiter. Ruby is quite similar to JavaScript in this respect. The only difference is that the name of the class is Regexp, as one word, in Ruby and RegExp, with camel caps, in JavaScript.
myregexp = /regex pattern/
A regular expression retrieved from user input, as a string stored in the variable userinput:
myregexp = Regexp.new(userinput)
== Perl ==
Data processing in Perl relies heavily on regular expressions. Literal regular expressions are used with the pattern-matching operator and the substitution operator. The pattern-matching operator starts with m, followed by the regex between two forward slashes: m/regex/. Forward slashes within the regex must be escaped with a backslash. When any kind of opening and closing punctuation (parentheses, braces, or brackets) is used as the delimiter, the two must be matched up, for example m{regex}. Any other punctuation character is written twice, once on each side of the regex.
The substitution operator starts with s. With brackets or similar punctuation as the delimiter, two pairs are needed: s[regex][replace]. Any other punctuation character is used three times: s/regex/replace/.
Perl distinguishes between dollar signs used as anchors and dollar signs used for variable interpolation. The @ sign also triggers variable interpolation (of arrays), so it should be escaped in literal regular expressions in Perl code.
To compile a regular expression, the "quote regex" operator can be used and its result assigned to a variable. It uses the same syntax as the match operator, except that it starts with qr instead of m.
$myregex = qr/regex pattern/;
A regular expression retrieved from user input, as a string stored in the variable $userinput:
$myregex = qr/$userinput/;
== PHP ==
PHP has three regex engines: "preg", "ereg", and "mb_ereg". Unlike JavaScript and Perl, PHP does not have a native regular expression type, so regular expressions must be quoted as strings. For the preg functions, the regular expression inside the string is written as a Perl-style literal regular expression. For example, /regex/ in Perl becomes the string '/regex/' in PHP. PHP supports both single-quoted and double-quoted strings.
PHP does not provide a way to store a compiled regular expression in a variable, so the regular expression must be passed as a string to one of the preg functions.
Python
In Python, literal regular expressions are passed as strings. Python provides several ways to quote strings, and choosing among them according to the characters in the regular expression can reduce the number of characters that must be escaped with backslashes. In raw strings, backslashes do not need to be escaped, for example r"\d+" instead of "\\d+". A raw string delimited by a single pair of quotes is awkward when the regular expression contains both single and double quote characters; in such a case, the raw string can be triple-quoted.
reobj = re.compile("regex pattern")
A regular expression retrieved from user input, as a string stored in the variable userinput:
reobj = re.compile(userinput)
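The quoting rules can be checked directly. The following sketch compares the raw and escaped spellings of the same pattern and uses a triple-quoted raw string for a pattern that contains both kinds of quote character; the pattern and sample text are made up for the example.

```python
import re

# The raw string and the doubly escaped string are the same pattern text.
assert r"\d+" == "\\d+"

# Triple-quoted raw string: the pattern may contain both ' and " unescaped.
# It matches a quoted phrase whose closing quote equals its opening quote.
quoted = re.compile(r"""(["'])(.*?)\1""")

text = '''She said "hello" and then 'goodbye'.'''
contents = [m.group(2) for m in quoted.finditer(text)]
print(contents)  # ['hello', 'goodbye']
```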
JavaScript
To reuse the same regular expression object, assign it to a variable. If the regular expression is stored in a string variable, the RegExp() constructor can be used to compile it.
== Java ==
Regular expression objects are created with the Pattern.compile() class factory. It requires only one parameter: a string containing the regular expression. If the regular expression has a syntax error, Pattern.compile() throws a PatternSyntaxException.