== Introduction ==

Regular expressions are a key to powerful, flexible and efficient text processing. With their general pattern notation, regular expressions are themselves like a mini programming language that allows you to describe and parse text. With additional support from the particular tool being used, regular expressions can add, remove, isolate and generally fold, spindle and mutilate all kinds of text and data. The tool might be as simple as a text editor's search command or as powerful as a full text-processing language.

A regular expression is a specific kind of text pattern that can be used with many modern applications and programming languages. Regular expressions can be used to verify whether input fits a text pattern, to find text that matches the pattern within a larger body of text, to replace text matching the pattern with other text or with rearranged parts of the matched text, and to split a block of text into a list of subtexts.

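The four uses listed above can be sketched with Python's built-in re module. This is a minimal illustration, not tied to any particular tool; the date pattern and the helper names (is_date, find_dates, reformat_dates, split_fields) are hypothetical choices made for the example.

```python
import re

# Hypothetical pattern: a date in YYYY-MM-DD form, with three capturing groups.
DATE = r"(\d{4})-(\d{2})-(\d{2})"

def is_date(text):
    # Verify: the whole input must fit the pattern.
    return re.fullmatch(DATE, text) is not None

def find_dates(text):
    # Find: collect every match inside a larger body of text.
    return [m.group(0) for m in re.finditer(DATE, text)]

def reformat_dates(text):
    # Replace: rearrange the matched parts (YYYY-MM-DD becomes DD/MM/YYYY).
    return re.sub(DATE, r"\3/\2/\1", text)

def split_fields(text):
    # Split: break the text wherever a comma or semicolon (plus spaces) occurs.
    return re.split(r"\s*[,;]\s*", text)

print(is_date("2024-03-15"))                        # True
print(find_dates("from 2024-03-15 to 2024-04-01"))  # ['2024-03-15', '2024-04-01']
print(reformat_dates("due 2024-03-15"))             # due 15/03/2024
print(split_fields("a, b;c ,d"))                    # ['a', 'b', 'c', 'd']
```

The same pattern serves all four jobs; only the function that applies it changes.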
== Overview ==

Regular expressions, and the way they are used, can vary widely from tool to tool. When looking at regular expressions in the context of their host language or tool, there are three broad issues to consider:

1) Which metacharacters are supported, and what they mean. This is often called the regex flavor.

2) How regular expressions interface with the language or tool: how regular expression operations are specified, which operations are allowed, and what text they operate on.

3) How the regular expression engine actually goes about applying a regular expression to some text. The method the language or tool designer uses to implement the engine strongly influences the results one might expect from any given regular expression.

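Points 1 and 3 can be made concrete with Python's re module, assumed here as a representative backtracking engine. The sketch shows that alternatives are tried left to right (so reordering them changes the match), that greedy and lazy quantifiers pick different matches, and that even \d is flavor-dependent: for str patterns, Python's \d matches any Unicode decimal digit unless the re.ASCII flag is given. A POSIX-style longest-leftmost engine would instead report "Jeffrey" for both alternations.

```python
import re

# Alternatives are tried left to right; the first one that lets the
# overall match succeed is accepted, even if a later one is longer.
first = re.match(r"Jeff|Jeffrey", "Jeffrey").group(0)
swapped = re.match(r"Jeffrey|Jeff", "Jeffrey").group(0)

# Greedy .+ grabs as much as possible and backtracks; lazy .+? stops early.
greedy = re.search(r"<.+>", "<b>bold</b>").group(0)
lazy = re.search(r"<.+?>", "<b>bold</b>").group(0)

# Flavor detail: for str patterns, \d matches any Unicode decimal digit
# (here Arabic-Indic 1, 2, 3) unless re.ASCII restricts it to [0-9].
arabic_indic = "\u0661\u0662\u0663"
unicode_digits = re.fullmatch(r"\d+", arabic_indic) is not None
ascii_digits = re.fullmatch(r"\d+", arabic_indic, re.ASCII) is not None

print(first, swapped)                # Jeff Jeffrey
print(greedy, lazy)                  # <b>bold</b> <b>
print(unicode_digits, ascii_digits)  # True False
```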
== Importing the Regular Expression Library ==

Some programming languages, such as Ruby, have built-in support for regular expressions, so there is nothing to import. Others, such as C# and Java, require the regular expression library to be imported with a statement in the source code. Some languages have no regular expression support at all; for these, the programmer has to compile and link in a regular expression library.

== JavaScript ==

Regular expression support is built in.

== Ruby ==

Regular expression support is built in.

== Python ==

import re

The re module must be imported into the script before Python's regular expression functions can be used.

== Perl ==

Regular expression support is built in.

== PHP ==

The preg functions are built in and available in PHP 4.2.0 and later.

== C# ==

using System.Text.RegularExpressions;

== VB.NET ==

Imports System.Text.RegularExpressions

== Java ==

import java.util.regex.*;

== Creating Regular Expression Objects ==

Before the regular expression engine can match a regular expression to a string, the regular expression has to be compiled. This happens while the application is running: the regular expression constructor parses the string holding the regular expression and converts it into a tree structure or a state machine. The function that performs the actual pattern match then traverses this tree or state machine. Programming languages that support literal regular expressions compile the regular expression when execution reaches the regular expression operator.

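Python has no literal regular expression syntax, so it falls into the constructor case described above: re.compile() parses the pattern string while the program runs and returns a compiled Pattern object that can be reused. A minimal sketch, assuming Python 3; count_words and try_compile are hypothetical helper names. Python's module-level functions such as re.search() also keep a small internal cache of recently compiled patterns, so the gain from compiling explicitly is usually modest.

```python
import re

# re.compile() parses the pattern once and returns a reusable Pattern object.
WORD = re.compile(r"\b\w+\b")

def count_words(lines):
    # The same compiled object is reused for every line.
    return sum(len(WORD.findall(line)) for line in lines)

def try_compile(pattern):
    # Compilation happens at run time, so a malformed pattern is only
    # detected when this call executes; re.compile() then raises re.error.
    try:
        return re.compile(pattern)
    except re.error as exc:
        print(f"invalid pattern {pattern!r}: {exc}")
        return None

print(count_words(["one two", "three"]))  # 3
print(try_compile("(unclosed") is None)   # prints the error, then True
```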
== Ruby ==

Ruby uses a special syntax to declare regular expressions: the regular expression is placed between two forward slashes. If the regular expression itself contains a forward slash, that slash is escaped with a backslash. To avoid escaping forward slashes, you can prefix the regular expression with %r and then use any punctuation character as the delimiter. Ruby is quite similar to JavaScript here; the only difference is that the class is named Regexp (one word) in Ruby and RegExp (with camel caps) in JavaScript.

myregexp = /regex pattern/

The regular expression retrieved from user input, as a string stored in the variable userinput:

myregexp = Regexp.new(userinput)

== Perl ==

Data processing in Perl relies heavily on regular expressions. Literal regular expressions are used with the pattern-matching operator and the substitution operator. The pattern-matching operator starts with m, followed by the regex between two forward slashes, as in m/regex/; any forward slash inside the regex must be escaped with a backslash. If you use opening and closing punctuation (parentheses, braces or brackets) as the delimiter, the pair must match up, for example m{regex}. Any other punctuation character is simply written twice, as in m!regex!.

The substitution operator starts with s. If brackets or similar paired punctuation are used as the delimiter, two pairs are needed: s[regex][replace]. Any other punctuation character is used three times: s/regex/replace/.

Perl distinguishes between dollar signs used as anchors and dollar signs used for variable interpolation. The @ sign is also used for variable interpolation in Perl, so a literal @ should be escaped with a backslash in literal regular expressions in Perl code.

To compile a regular expression and assign it to a variable, use the "quote regex" operator. It takes the same syntax as the match operator, except that it starts with qr instead of m:

$myregex = qr/regex pattern/;

The regular expression retrieved from user input, stored as a string in the variable $userinput:

$myregex = qr/$userinput/;

== PHP ==

PHP has three regex engines: the "preg", "ereg" and "mb_ereg" engines (the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0). Unlike JavaScript and Perl, PHP has no native regular expression type, so regular expressions must be quoted as strings. Within the string, the regular expression is written as a Perl-style literal regular expression: where you would write /regex/ in Perl, the PHP string becomes '/regex/'. PHP supports both single-quoted and double-quoted strings. However, PHP does not provide a way to store a compiled regular expression in a variable, so the regular expression string has to be passed to one of the preg functions each time.

== Python ==

In Python, literal regular expressions have to be passed as strings. Python provides several ways to quote strings, and choosing the right one can reduce the number of characters that need to be escaped with backslashes. Raw strings do not treat backslashes as escape characters, so you can write r"\d+" instead of "\\d+". However, an ordinary raw string cannot cleanly hold both single and double quotes, so when the regular expression contains both, the raw string can be triple-quoted instead.

reobj = re.compile("regex pattern")

The regular expression retrieved from user input, as a string stored in the variable userinput:

reobj = re.compile(userinput)

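The quoting options above can be checked directly. A minimal sketch, assuming Python 3: it confirms that r"\d+" and "\\d+" are the same string, uses a triple-quoted raw string to hold both quote characters, and shows one way to handle user input. compile_user_input is a hypothetical helper; re.escape() is part of the standard re module and escapes metacharacters so the user's text is matched literally, which is often safer than compiling raw user input as a pattern.

```python
import re

# A raw string leaves backslashes alone, so these spell the same pattern.
print(r"\d+" == "\\d+")  # True

# A triple-quoted raw string can hold both ' and " without escaping.
QUOTED = re.compile(r"""(["'])(.*?)\1""")

def quoted_values(text):
    # Group 1 captures the opening quote; \1 requires the same quote to close.
    return [m.group(2) for m in QUOTED.finditer(text)]

def compile_user_input(userinput, literal=True):
    # Hypothetical helper: with literal=True, re.escape() backslash-escapes
    # metacharacters so the user's text is matched as plain text.
    return re.compile(re.escape(userinput) if literal else userinput)

print(quoted_values("""name="Ann" nick='Al'"""))            # ['Ann', 'Al']
print(bool(compile_user_input("1+1").search("is 1+1=2?")))  # True
```

With literal=False, the same input "1+1" would be read as the pattern 1+1 (one or more 1s followed by another 1), which does not occur in "is 1+1=2?".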
== JavaScript ==

To use the same regular expression object repeatedly, assign it to a variable. If the regular expression is stored in a string variable, compile it with the RegExp() constructor.

== Java ==

Pattern objects are created with the Pattern.compile() class factory, which requires only one parameter: a string containing the regular expression. If the regular expression has a syntax error, Pattern.compile() throws a PatternSyntaxException.