CSC/ECE 517 Fall 2009/wiki1b 5 kf
Revision as of 16:43, 18 September 2009
Introduction
Regular expressions are a key to powerful, flexible and efficient text processing. With their general pattern notation, regular expressions are themselves like a mini programming language that lets you describe and parse text. With additional support from the particular tool being used, regular expressions can add, remove, isolate and generally fold, spindle and mutilate all kinds of text and data. That tool might be as simple as a text editor's search command or as powerful as a full text-processing language. Regular expressions are a specific kind of text pattern that can be used with many modern applications and programming languages. They can be used to verify whether input fits the pattern, to find text that matches the pattern within a larger body of text, to replace text matching the pattern with other text or with rearranged bits of the matched text, and to split a block of text into a list of subtexts.
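The four uses listed above (verify, find, replace, split) can be sketched with Python's re module. Python is used here only for illustration, and the date pattern and sample strings are hypothetical:

```python
import re

# Hypothetical pattern: a date written as YYYY-MM-DD.
date = re.compile(r"\d{4}-\d{2}-\d{2}")

# 1. Verify that the whole input fits the pattern.
is_valid = date.fullmatch("2009-09-18") is not None

# 2. Find every match within a larger body of text.
text = "Revised on 2009-09-18, first posted 2009-09-01."
found = date.findall(text)

# 3. Replace matches with rearranged pieces of the matched text
#    (YYYY-MM-DD becomes DD/MM/YYYY via the three capture groups).
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# 4. Split a block of text into a list of subtexts at each comma,
#    swallowing any whitespace around the comma.
fields = re.split(r"\s*,\s*", "Perl , Ruby,Python ,  Java")
```

Other languages expose the same four operations under different names.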
Overview
A regular expression is a concise way to specify a pattern of characters to be matched in a string. Regular expressions have become a standard feature in a variety of languages and popular tools such as Perl, Ruby, Python, Java, PHP, VB.Net and MySQL. The .NET types that support regular expressions are based on Perl 5 regular expressions and support both search and search/replace functions. Regular expressions can be used for tasks such as validating text input, parsing textual data into better-structured forms, and replacing patterns of text in a document.
Syntax
Regular expressions consist of normal characters, character classes, wildcard characters and quantifiers. A normal character, also known as a literal, matches itself. For example, if a pattern consists of "ab", then only the input sequence "ab" matches it. Characters such as newline and tab are specified using the standard escape sequences, which begin with a '\'. A character class is a group of characters, written by putting the characters of the class between square brackets. For example, the class [abc] matches a, b or c. The wildcard character is the dot (.), which matches (almost) any character. A quantifier determines the number of times an expression is matched: + means one or more times, * means zero or more times, and ? means zero or one time.
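A minimal sketch of these building blocks, written in Python for illustration. The sample strings and variable names are hypothetical, and other regex flavors may differ in details:

```python
import re

# Literal characters match themselves: "ab" is found inside "cabin".
literal = re.search(r"ab", "cabin")

# A character class matches any one of the listed characters.
char_class = re.findall(r"[abc]", "a1b2c3d")

# The dot stands for exactly one character, so "ac" alone does not match.
wildcard = re.findall(r"a.c", "abc a-c ac")

# Quantifiers: + is one or more, * is zero or more, ? is zero or one.
one_or_more = re.fullmatch(r"ab+", "abbb")
zero_or_more = re.fullmatch(r"ab*", "a")
optional = re.findall(r"colou?r", "color colour")

# A backslash turns the dot back into a literal character.
escaped_dot = re.findall(r"\.", "a.b.c")
```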
Importing the Regular expression library
Most programming languages provide strings, integers, arrays and so on, but regular expressions are built into only some of them, mainly scripting languages such as Ruby, JavaScript and Perl, where nothing needs to be done to enable regular expression support. Other languages, such as C# and Java, require importing a library by writing an import statement in the source code. Some languages have no regular expression support at all; for these, the programmer has to compile and link in a regular expression library. Some libraries are available for multiple languages, and some languages offer a choice of different libraries. Built-in support for regular expressions makes pattern matching and substitution convenient as well as concise.
JavaScript:
Regular expression support is built in.
Ruby:
Regular Expression support is built in.
Perl:
Regular Expression support is built in.
Python:
import re
The re module must be imported into the script before Python's regular expression functions can be used.
PHP:
The preg functions are built in and available in PHP 4.2.0 and later.
C#:
using System.Text.RegularExpressions;
VB.Net:
Imports System.Text.RegularExpressions
Java:
import java.util.regex.*;
Creating Regular Expression Objects
A regular expression has to be compiled before the regular expression engine can match it against a string. Compilation happens while the application is running: the regular expression constructor parses the string holding the regular expression and converts it into a tree structure or a state machine. The function that performs the actual pattern matching then traverses this tree or state machine. Programming languages that support literal regular expressions compile the regular expression when execution reaches the regular expression operator.
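As an illustration of compiling once and matching many times, here is a small Python sketch. The pattern and the sample lines are hypothetical, and how the compiled form is represented internally is up to the engine:

```python
import re

# Compile once: the pattern string is parsed into a reusable pattern object.
word = re.compile(r"[A-Za-z]+")

lines = ["regular expressions", "42", "pattern matching"]

# Reuse the compiled object for every string instead of re-parsing the pattern.
counts = [len(word.findall(line)) for line in lines]
```

Keeping the compiled object in a variable avoids repeating the parsing step inside the loop.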
JavaScript:
To use the same regular expression object again, assign it to a variable. If the regular expression is stored in a string variable, use the RegExp() constructor to compile it.
Ruby:
In Ruby, regular expressions are objects of type Regexp. They can be created by calling the constructor or by using the literal forms /pattern/ and %r{pattern}.
m = Regexp.new('n')   # => /n/
After the object has been created, it can be matched against a string by using:
Regexp#match(string)
or by using the operators =~ (positive match) and !~ (negative match).
name = "Rains"
name =~ /n/   # => 3
A pattern that matches a string containing the text Perl or the text Python can be written as:
/Perl|Python/
Repetition within patterns can also be specified. Another feature is matching one of a group of characters within a pattern. For example, the character class \s matches a whitespace character, and a dot matches (almost) any character. Ruby's regular expression literals are quite similar to JavaScript's. One visible difference is that the class is named Regexp, as one word, in Ruby and RegExp, in camel case, in JavaScript.
A literal regular expression in the code:
myregexp = /regex pattern/;
A regular expression retrieved from user input, as a string stored in the variable userinput:
myregexp=Regexp.new(userinput);
Perl:
Data processing in Perl programs relies heavily on regular expressions. Perl provides regular expression operators that mesh with the constructs and operators making up the rest of the language. Literal regular expressions are used with the pattern-matching operator and the substitution operator. The pattern-matching operator starts with m, followed by the regex between two forward slashes. Forward slashes inside the regex must be escaped with a backslash. Other punctuation can serve as the delimiter instead. When any kind of opening and closing punctuation (parentheses, braces or brackets) is used as the delimiter, the opening and closing characters must be matched up.
Example:
m{regex}
Any other punctuation character used as the delimiter is written twice.
The substitution operator starts with s. If brackets or similar punctuation are used as the delimiter, two pairs are needed:
s[regex][replace]
Any other punctuation character used as the delimiter is written three times:
s/regex/replace/
Perl distinguishes between dollar signs used as anchors and dollar signs used for variable interpolation. The @ sign is also used for variable interpolation in Perl (for arrays), so it should be escaped in literal regular expressions in Perl code. The variety of options offered by Perl's operators and functions is its biggest strength as well as its greatest weakness.
To compile a regular expression, the "quote regex" operator can be used and its result assigned to a variable. It uses the same syntax as the match operator, except that it starts with qr instead of m.
$myregex = qr/regex pattern/;
A regular expression retrieved from user input and stored as a string in the variable $userinput:
$myregex = qr/$userinput/;
Python:
Python supports regular expressions through its re module. In Python, literal regular expressions are passed as strings. Python provides several ways to quote strings, and depending on the characters in the regular expression, choosing a suitable one can reduce the number of characters that need to be escaped with backslashes. In raw strings, backslashes do not need to be doubled. For example, r"\d+" can be written instead of "\\d+". A singly quoted raw string cannot be used when the regular expression contains both single and double quote characters. In that case, the raw string can be triple-quoted.
reobj=re.compile("regex pattern")
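The quoting options can be checked with a short sketch. The pattern that extracts quoted words and the sample sentence are hypothetical illustrations:

```python
import re

# A raw string and an ordinary string with a doubled backslash
# hold exactly the same pattern text.
same = r"\d+" == "\\d+"

# A triple-quoted raw string can contain both quote characters unescaped.
quoted = re.compile(r"""["'](\w+)["']""")
words = quoted.findall("""say "hello" or 'goodbye'""")
```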
A regular expression retrieved from user input, as a string stored in the variable userinput:
reobj = re.compile(userinput)
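User input may not be a valid regular expression, in which case re.compile raises re.error. A minimal sketch of handling this follows; the helper name compile_user_pattern and the sample inputs are hypothetical, and returning None is just one possible policy:

```python
import re

def compile_user_pattern(userinput):
    """Return a compiled pattern, or None if userinput is not a valid regex."""
    try:
        return re.compile(userinput)
    except re.error:
        return None

ok = compile_user_pattern(r"\d+")        # valid pattern
bad = compile_user_pattern("(unclosed")  # unbalanced parenthesis

# If the input should be matched literally rather than as a pattern,
# re.escape neutralizes its metacharacters first.
literal = re.compile(re.escape("1+1"))
```

Whether user input is treated as a pattern or as literal text depends on the application.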
PHP:
PHP has three regex engines: "preg", "ereg" and "mb_ereg". The ereg and mb_ereg engines implement POSIX ERE, while preg is based on PCRE. Unlike JavaScript and Perl, PHP does not have a native regular expression type, so regular expressions have to be quoted as strings. For the preg functions, the regular expression inside the string is written as a Perl-style literal regular expression. For example, where Perl has /regex/, the PHP string is '/regex/'. PHP supports both single-quoted and double-quoted strings.
Regular expressions are compiled at runtime. PHP keeps a large cache of 4096 entries for compiled regular expressions, so in practice a pattern string is compiled only the first time it is used. However, PHP does not provide a way to store a compiled regular expression in a variable, so the pattern always has to be passed as a string to one of the preg functions.
Java:
Regular expression processing is supported by the java.util.regex package. Two classes work together to support it: Pattern and Matcher. Pattern defines the regular expression, and Matcher matches that pattern against a character sequence. A pattern is created by calling the compile() factory method.
static Pattern compile(String pattern)
Once the Pattern object has been created, it is used to create a Matcher by calling the matcher() factory method.
Matcher matcher(CharSequence str)
str is the character sequence that the pattern will be matched against. If the regular expression contains a syntax error, Pattern.compile() throws a PatternSyntaxException.