CSC/ECE 517 Fall 2009/wiki1b 5 kf): Difference between revisions

Revision as of 15:19, 18 September 2009

Introduction

Regular expressions are a key to powerful, flexible and efficient text processing. With their general pattern notation, regular expressions are themselves like a mini programming language that allows you to describe and parse text. With additional support from the particular tool being used, regular expressions can add, remove, isolate and generally fold, spindle and mutilate all kinds of text and data. Their use might be as simple as a text editor's search command or as powerful as a full text-processing language. They are a specific kind of text pattern that can be used with many modern applications and programming languages. They can be used to verify whether input fits a text pattern, to find text that matches the pattern within a larger body of text, to replace text matching the pattern with other text or with rearranged bits of the matched text, and to split a block of text into a list of subtexts.
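The four uses listed above (verify, find, replace, split) can be sketched in a few lines. This is a minimal illustration in Python, not taken from the original page; the date pattern and the sample strings are assumptions chosen only for the example.

```python
import re

# Hypothetical sample data, chosen only for illustration.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
text = "Released 2009-09-18, revised 2009-10-02."

# 1. Verify that an entire input fits the pattern.
is_valid = date_pattern.fullmatch("2009-09-18") is not None

# 2. Find text matching the pattern within a larger body of text.
found = date_pattern.findall(text)

# 3. Replace matches, rearranging pieces of the matched text.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# 4. Split a block of text into a list of subtexts.
parts = re.split(r"\s*,\s*", "Perl, Ruby ,Python")
```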


Overview

A regular expression is a simple way to specify a pattern of characters that is to be matched in a string. Regular expressions have become a standard feature in a variety of languages and popular tools such as Perl, Ruby, Python, Java, PHP, VB.NET and MySQL. The .NET types that support regular expressions are based on Perl 5 regular expressions and support both search and search/replace functions. Regular expressions can be used for tasks such as validating text input, parsing textual data into better-structured forms, and replacing patterns of text in a document.


Syntax of Regular Expression

Regular expressions consist of normal characters, character classes, wildcard characters and quantifiers. A normal character, also known as a literal, is matched as it is. For example, if a pattern consists of "ab", then only the input sequence "ab" matches it. Characters such as newline and tab are specified by using the standard escape sequences, which begin with a '\'. A character class is a set of characters, written by putting the characters in the class between brackets. For example, the class [abc] matches a, b or c. The wildcard character is the dot (.), which matches any character (in most engines, any character except a newline by default). A quantifier determines the number of times an expression is matched: + means one or more times, * means zero or more times, and ? means zero or one time.
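The building blocks described above can be tried out directly. The following is a small sketch in Python; the variable names and test strings are made up for illustration, and re.fullmatch is used so that the whole input has to fit the pattern.

```python
import re

# Literal characters match themselves.
literal_ok = re.fullmatch(r"ab", "ab") is not None
literal_miss = re.fullmatch(r"ab", "abc") is None

# A character class matches any single character listed between the brackets.
class_hits = re.findall(r"[abc]", "cabbage")

# The dot matches any character; in Python it skips newlines unless re.DOTALL is set.
dot_ok = re.fullmatch(r"a.c", "a-c") is not None
dot_newline = re.fullmatch(r"a.c", "a\nc") is not None

# Quantifiers: + is one or more, * is zero or more, ? is zero or one.
plus_ok = re.fullmatch(r"ab+", "abbb") is not None
star_ok = re.fullmatch(r"ab*", "a") is not None
optional_ok = re.fullmatch(r"ab?", "abb") is not None
```

Here optional_ok ends up False, because ? allows at most one b while the input has two.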


Importing the Regular expression library

Most programming languages have strings, integers, arrays and so on. Regular expression support, however, is built into only some languages, typically scripting languages such as Ruby, JavaScript and Perl, and nothing needs to be imported in these languages. Other languages, such as C# and Java, require a library to be imported by writing an import statement in the source code. There are also languages without any support for regular expressions; for these, the programmer has to compile and link in a regular expression library. Some libraries are available for multiple languages, and some languages offer a choice of different libraries. Built-in support for regular expressions makes pattern matching and substitution both convenient and concise.

JavaScript:

Regular expression support is built in.

Ruby:

Regular expression support is built in.

Perl:

Regular expression support is built in.

Python:

import re

The re module must be imported into the script before Python's regular expression functions can be used.

PHP:

The preg functions are built in and available in PHP 4.2.0 and later.

C#:

using System.Text.RegularExpressions;

VB.Net:

Imports System.Text.RegularExpressions

Java:

import java.util.regex.*;


Creating Regular Expression Objects

A regular expression has to be compiled before the regular expression engine can match it against a string. This happens while the application is running, when the regular expression constructor parses the string holding the regular expression. The string is converted into a tree structure or a state machine, which is then traversed by the function that performs the actual pattern matching. Programming languages that support literal regular expressions compile the regular expression when execution reaches the regular expression operator.
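The compile-then-match sequence described above can be illustrated with a rough sketch in Python. The pattern and the sample lines are assumptions for the example; the point is only that the pattern string is parsed once and the resulting object is reused for every match.

```python
import re

# Compile once: the pattern string is parsed into an internal matcher object.
word = re.compile(r"\w+")

# Reuse the compiled object for several matches without re-parsing the string.
lines = ["alpha beta", "gamma", ""]
counts = [len(word.findall(line)) for line in lines]
```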

JavaScript:

To reuse the same regular expression object, assign it to a variable. If the regular expression is stored in a string variable, the RegExp() constructor can be used to compile it.

Ruby:

In Ruby, regular expressions are objects of type Regexp. The objects can be created by calling the constructor or by using the literal forms /pattern/ and %r{pattern}:

                         m = Regexp.new('n')   # => /n/

After the object has been created, it can be matched against a string by using:

                         Regexp#match(string) 

or by using the operators =~ (positive match) and !~ (negative match):

                         name = "Rains"
                         name =~ /n/   # => 3

A pattern that matches a string containing the text Perl or the text Python can be written as:

                         /Perl|Python/

Repetition within patterns can also be specified. Another feature is matching one of a group of characters within a pattern. For example, the character class \s matches a whitespace character, and a dot matches (almost) any character. Creating regular expression objects in Ruby is quite similar to JavaScript. The main difference is the name of the class: Regexp, as one word, in Ruby and RegExp, with camel caps, in JavaScript.

                         myregexp = /regex pattern/;

A regular expression retrieved from user input, as a string stored in the variable userinput:

                         myregexp=Regexp.new(userinput);

Perl:

Data processing in Perl programs relies heavily on regular expressions. Perl provides regular expression operators that are meshed with the other constructs and operators that make up the Perl language. Literal regular expressions are used with the pattern-matching operator and the substitution operator. The pattern-matching operator starts with m, followed by the required regex between two delimiters, usually forward slashes. Forward slashes inside the regex must then be escaped with a backslash. When any type of opening and closing punctuation (parentheses, braces or brackets) is used as the delimiter, the opening and closing characters must be matched up.

Example:

                         m{regex}

Any other punctuation used as the delimiter is simply written twice.
              

The substitution operator starts with s. If brackets or similar punctuation are used as the delimiter, two pairs are needed:

                         s[regex][replace]

With any other punctuation, the delimiter is written three times:

                         s/regex/replace/

Perl clearly differentiates between dollar signs used as anchors and dollar signs used for variable interpolation. The @ sign is also used for variable interpolation in Perl, so it should be escaped in literal regular expressions in Perl code. The variety of options offered by Perl's operators and functions is its biggest strength as well as its greatest weakness.

To compile a regular expression, the "quote regex" operator can be used and its result assigned to a variable. It uses the same syntax as the match operator, except that it starts with qr instead of m.

                           $myregex = qr/regex pattern/

A regular expression retrieved from user input, as a string stored in the variable $userinput:

                           $myregex=qr/$userinput/


Python:

Python supports regular expressions through its re module. In Python, literal regular expressions have to be passed as strings. Python provides several ways to quote strings, and choosing a suitable one for the characters in the regular expression can reduce the number of characters that need to be escaped with backslashes. In raw strings, backslashes do not need to be escaped. For example, r"\d+" can be written instead of "\\d+". A plain raw string cannot be used, however, when the regular expression contains both single and double quote characters. In such a case, the raw string can be triple quoted.

                          reobj=re.compile("regex pattern")
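The quoting options described above can be checked with a short sketch. The variable names and the sample sentence are assumptions for illustration; the example only shows that the escaped and raw spellings denote the same pattern and that a triple-quoted raw string can hold both kinds of quote character.

```python
import re

# Without a raw string, each regex backslash must itself be escaped.
escaped = "\\d+"
# A raw string keeps backslashes exactly as written.
raw = r"\d+"
same_pattern = escaped == raw

# Triple-quoted raw string: the pattern can contain both quote characters.
quote_chars = re.compile(r"""["']""")
quotes_found = quote_chars.findall("""He said "it's fine".""")
```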

A regular expression retrieved from user input, as a string stored in the variable userinput:

                          reobj = re.compile(userinput)
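Because user input may not be a valid regular expression, a program will usually want to guard the call to re.compile(). The following is a minimal sketch; the helper name compile_user_pattern is hypothetical and not part of the re module, while re.error is the exception that re.compile() raises for an invalid pattern.

```python
import re

def compile_user_pattern(userinput):
    """Return a compiled pattern, or None if userinput is not a valid regex.

    Hypothetical helper for illustration; not part of the re module.
    """
    try:
        return re.compile(userinput)
    except re.error:
        # re.compile() raises re.error when the pattern has a syntax error.
        return None

valid = compile_user_pattern(r"\d+")
invalid = compile_user_pattern("(unclosed")
```

Returning None keeps the sketch short; a real program might instead report the error message back to the user.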

PHP:

The three regex engines in PHP are the "preg", "ereg" and "mb_ereg" engines. Two of them implement POSIX ERE, while the third, preg, is based on PCRE. Unlike JavaScript and Perl, PHP does not have a native regular expression type, so regular expressions have to be quoted as strings. Within the string, the regular expression should be written as a Perl-style literal regular expression. For example, where Perl code has /regex/, the PHP string becomes '/regex/'. PHP supports both single-quoted and double-quoted strings.

Regular expressions are compiled at runtime. PHP keeps a large cache of up to 4096 compiled regular expressions, so a given pattern string is normally compiled only the first time it is used. However, PHP does not provide a way to store a compiled regular expression in a variable, so the pattern has to be passed as a string to one of the preg functions each time.

Java:

Regular expression processing is supported by the java.util.regex package. Two classes work together to support it: Pattern and Matcher. Pattern defines a regular expression, and Matcher matches that pattern against another character sequence. A pattern is created by calling the compile() factory method.

                          static Pattern compile(String pattern)

Once the Pattern object has been created, it is used to create a Matcher by calling the matcher() factory method.

                          Matcher matcher(CharSequence str)

Here str is the character sequence that the pattern will be matched against. If the regular expression has a syntax error, the Pattern.compile() factory method throws a PatternSyntaxException.