CSC/ECE 517 Fall 2009/wiki1b 5 13: Difference between revisions
No edit summary |
|||
Line 109: | Line 109: | ||
{| border="1" cellspacing="0" cellpadding="5" align="center" | {| border="1" cellspacing="0" cellpadding="5" align="center" | ||
} | ! Multi-core Architecture | ||
! Number of levels | |||
! Line Size | |||
! Cache Size | |||
! Associativity | |||
! Latency | |||
! Is the Level shared | |||
! Coherence Protocol used | |||
|- | |||
| AMD Opteron Processor | |||
| 2 | |||
| - | |||
| 64 Byte L1 Cache - Data and Instruction Cache Separated, 1024 KByte L2 | |||
| 2 Way Associative ECC Protected L1 Data Cache & Parity Protected Instruction Cache; <br>16 Way Associative Parity Protected L2 Cache | |||
| Two 64 bit operations per 3 cycle latency | |||
| No | |||
| Exclusive cache architecture | |||
|- | |||
| AMD Athlon X2 Dual Core | |||
| 2 | |||
| - | |||
| 64 Byte L1 Cache - Data and Instruction Cache Separated, 1024 KByte L2 | |||
| 2 Way Associative ECC Protected L1 Data Cache & Parity Protected Instruction Cache; <br>16 Way Associative Parity Protected L2 Cache | |||
| Two 64 bit operations per 3 cycle latency | |||
| No | |||
| Exclusive cache architecture | |||
|- | |||
| AMD Turin 64 Mobile | |||
| 2 | |||
| - | |||
| 64 Kbyte L1; Upto 1MByte of L2 with 512 Kbyte Options | |||
| 2-Way Associative ECC-Protected L1 Data Cache & Parity Protected L1 Instruction Cache; <br>16-Way Associative ECC-Protected L2 Cache | |||
| Two 64-bit operations per cycle, 3-cycle latency - With advanced branch prediction | |||
| No | |||
| Exclusive cache architecture—storage | |||
|- | |||
| AMD Sempron Processor | |||
| 2 | |||
| - | |||
| 64-Kbyte ECC-Protected L1 Data Cache && Parity-Protected Instruction Cache; <br>256-Kbyte ECC-Protected | |||
L2 Cache | |||
| 2-Way Associative L1 Cache ; 16-Way Associative L2 Cache | |||
| Two 64-bit operations per cycle, 3-cycle latency | |||
| No | |||
| Exclusive cache architecture—storage | |||
|- | |||
| AMD Athlon Duron Processor | |||
| 2 | |||
| - | |||
| Integrated 128-Kbyte L1 Cache and an exclusive 64-Kbyte L2 Cache | |||
| - | |||
| - | |||
| No | |||
| Exclsive cache architecture-storage | |||
|- | |||
| AMD Palemo Processor | |||
| 2 | |||
| - | |||
| 64 KByte L1 Data Cache & L1 Instruction Cache; <br> Unified 128 or 256 KByte L2 Cache | |||
| - | |||
| - | |||
| No | |||
| Inclusive | |||
|- | |||
| AMD Thoroughbred (TBRED) | |||
| 2 | |||
| - | |||
| 64 KByte L1 Data Cache & L1 Instruction Cache; <br> Unified 256 KByte full-speed L2 Cache | |||
| - | |||
| - | |||
| No | |||
| Inclusive | |||
|- | |||
| AMD Barton Processor | |||
| 2 | |||
| - | |||
| 64 KByte L1 Data Cache & L1 Instruction Cache; <br> Unified 512 KByte L2 Cache | |||
| - | |||
| - | |||
| No | |||
| - | |||
|- | |||
| AMD Thunderbird | |||
| 2 | |||
| - | |||
| - | |||
| 16-Way | |||
| - | |||
| - | |||
| - | |||
|- | |||
| CELL Processor (Playstation3 Processor) Manufactured by TOSHIBA, IBM and SONY | |||
| Power PC Core, which is at the centre of the Cell, contains 2 Levels ; <br> Each of the "surrounding" SPEs have just one level of Cache | |||
| - | |||
| 32 KByte Data Cache + 32 Kbyte Instruction Cache in the Power PC Core ; <br> The surrounding SPEs have 256 Kbyte Unified Cache | |||
| - | |||
| - | |||
| The L2 Cache of the Power PC Core is shared by the surrounding SPEs | |||
| - | |||
|- | |||
| AMD Athlon 64 X2 Dual Core - 4600+ | |||
| 2 | |||
| - | |||
| 128 KByte L1 Unified Cache ; 512 KByte L2 Unified Cache | |||
| - | |||
| - | |||
| No | |||
| - | |||
|- | |||
| Storm-1 Family by Stream Processors | |||
| 1 | |||
| - | |||
| 16 KByte L1 Data / Instruction Cache | |||
| - | |||
| 533 MHz Data Rate between L1 and DDR Memory | |||
| No | |||
| NA | |||
|- | |||
| UltraSPARC IV | |||
| 2 | |||
| 128 bytes to 512 bytes | |||
| 64KByte Dual L1 Data Cache; 32KByte L2 extendable up to 16MB | |||
| 2way set associative per core | |||
| - | |||
| No | |||
| - | |||
|- | |||
| UltraSPARC IV+ | |||
| 2 | |||
| - | |||
| 32 MByte L2 on-chip ; 32 MByte L3 External | |||
| - | |||
| - | |||
| No | |||
| - | |||
|- | |||
| UlatraSPARC T1 | |||
| 2 | |||
| 4 Banks for L2 | |||
| 8 KByte Data Cache ; 16 KByte Instruction Cache; <br> 3 MByte L2 Cache | |||
| 4-Way Set Associative for L1 ; 12-Way Set Associative for L2 Cache | |||
| - | |||
| No | |||
| - | |||
|- | |||
| Intel Pentium D Series | |||
| 3 (L1 + L2 + Execution Trace cache holding the decoded Micro-Ops) | |||
| - | |||
| 2 * 16 KByte of L1 Data Cache; <br> 2 * 2 MByte L2 Unified Cache | |||
| - | |||
| - | |||
| Yes | |||
| - | |||
|- | |||
| Intel Itanium 2 | |||
| 3 | |||
| - | |||
| 16 KByte L1 Instruction Cache ; 16 KByte L1 Data Cache; 265 KByte L2 Unified Cache; 3 - 9 MByte of L3 Unified Cache | |||
| 4-Way Set Associative L1 Cache ; 8-Way Set Associative L2 Cache | |||
| 6.4 - 2.1 GB/s transfer rate between L3 and External Memory | |||
| L2 and L3 Cache is Shared | |||
| - | |||
|- | |||
| CRAY X1 | |||
| 2 | |||
| - | |||
| 32 KByte Core L1 Unified Cache; 512 MByte of L2 Unified Cache | |||
| - | |||
| 76 GB/s - 50 GB/s for Loads and 26 GB/s for stores between the L2 Cache and the Memory | |||
| Yes | |||
| Integrated Vector Cache is used for Coherence and helps tolerate memory latency | |||
|} |
Revision as of 19:46, 20 September 2009
Regular Expressions:
A pattern matching language used to match user defined patterns within text strings/files. Regular expressions gained popularity with the Unix ed and grep (grep derived from ed's search command g/re/p[1]). Since then regular expressions have been incorporated into text editors, command utilities and programming languages.
Regular Expression Engines:
There are two prominent types of regular expression engines; DFA (Deterministic Finite Automaton) and NFA (Nondeterministic Finite Automaton). NFA is found primarily in modern scripting languages, while DFA is found in older parsers.
DFA (Deterministic Finite Automaton) : State machine based algorithm that keeps track of all matches in progress. As soon as a match is made it returns "leftmost match wins"[2]; with the exception of alternation matches (<substring1>|<substring2>). If all alternation options match then the longest match will be selected - "longest leftmost match wins". Non-greedy searches are not supported since they are nondeterministic in nature. ie A.c =~ AbcvAbcAdd returns Abc DFA Languages : lex, awk, vi(m)
NFA (Nondeterministic Finite Automaton) : Greedy search routine that keeps track where the algorithm branches to determine if the most greedy match succeeds; will roll back to original position for subsequent less greedy matches. Supports non-greedy matches (ie match 0 or 1 times). ie A.c =~ AbcvAbcAdd matches both occurrences of Abc in the string. NFA Types:
- Traditional NFA : gives preference to the earliest substring in alternation matches (dog|doggy). (dog|doggy) =~ doggy returns dog. (doggy|dog) =~ doggy returns doggy.
- POSIX NFA : handles alternation matches like DFA engines. ie (dog|doggy) =~ doggy return doggy.
Traditional NFA languages : Perl, Python, Ruby, Java, PHP, JavaScript
POSIX NFA : mawk[3]
DFA vs NFA
Since the DFA engine is deterministic its performance can be bounded by its search set. NFA performance isn't as well defined since, the NFA can roll back on itself to find a less greedy match. Effectively redoing the regular expression search for the less greedy case. Those interesting in the theory and performance differences should review the following page: http://swtch.com/~rsc/regexp/regexp1.html
Regular Expression Support in Programming Languages:
All examples are the same that want to get the month, day and year from the following format (mm-dd-yyyy).
Perl
Perl has built in language support for regular expressions without requiring additional package loads or imports. Regular expression matches and returns are stored in special after match variables that are considered "reserved" variable names ($1, $2... or \1, \2....).
Example
$date = '01-02-2001' $date =~ '(\d\d)-(\d\d)-(\d\d\d\d)' $1 => '01' $2 => '02' $3 => '2001'
Python
Python has built in support via the import of the 're' package that comes standard in the Python installation libraries. Regular expression matches and returns are stored in regular expression objects that are accessed through '.' notation.
Example
import re date = '01-02-2001' date_match = re.search(r'(\d\d)-(\d\d)-(\d\d\d\)', date) month = date_match.group(1) => '01' day = date_match.group(2) => '02' year = date_match.group(3) => '2001'
Python allows for regular expressions to be "pre-compiled" at run time through the use of the re.compile() function. "Pre-compiled" regular expressions will exist while they are scoped which can result in a performance increase if the same regex is going to be used multiple times.
Ruby
Ruby has built in language support for regular expressions without addition package loads or imports. Ruby being "Perl but even better"[4] supports Perl style =~ or =! regular expression matches as well as a more OO approach via <string>.match(). Regular expression returns are stored in after match variables as well as in return regular expression objects.
Option 1, use Ruby's version of Perl syntax:
date = '01-02-2001' date =~ '(\d\d)-(\d\d)-(\d\d\d\d)' $1 => '01' $2 => '02' $3 => '2001'
Option 2, use Ruby's OO version of regular expressions by using the STRING objects match method.
date_match = date.match('(\d\d)-(\d\d)-(\d\d\d\d)') month = date_match[1] => '01' day = date_match[2] => '02' year = date_match[3] => '2001'
Option 3, use Ruby's OO version of regular expressions by using the Regexp object.
date_match = Regexp.new('(\d\d)-(\d\d)-(\d\d\d\d)') date = date_match.match(date) month = date[1] => '01' day = date[2] => '02' year = date[3] => '2001'
Like the python re.compile, the use of Regexp.new/compile also scopes the regular expression object for repeated use in code blocks.
JavaScript
JavaScript has support for pattern matching within the String objects or if more advanced features are desired the RegExp object can be utilized. RegEx matches are returned as an array or -1 (-1 indicates no match).
Example:
var date = "01-02-2001"; var regex = new RegExp("(\d\d)-(\d\d)-(\d\d\d\d)"); var match_array = RegExp.exec(date) match_array[1] => '01' match_array[2] => '02' match_array[3] => '2001'
Java
Java doesn't have built in native support for regular expressions but Sun and others have developed Regular Expression packages that can be imported to allow for regular expression functionality. Since regular expressions are an add-on and Java is strictly typed there is more syntactic overhead to produce a Java based regular expression.
Example:
PatternObj date_pat = new PatternObj("(\d\d)-(\d\d)-(\d\d\d\d)"); MatcherObj date_mat = date_pat('01-02-2001'); date_mat.Group(1) => '01' date_mat.Group(2) => '02' date_mat.Group(3) => '2001'
PHP
PHP has built in support via the preg() method routine. Regular expression matches are returned in user defined arrays.
Example:
preg_match('(\d\d)-(\d\d)-(\d\d\d\d)', '01-02-2001', $date_mat) $date_mat[1] => '01' $date_mat[2] => '02' $date_mat[3] => '2001'
Below are some of the regular expressions Characters and support in different languages
Multi-core Architecture | Number of levels | Line Size | Cache Size | Associativity | Latency | Is the Level shared | Coherence Protocol used |
---|---|---|---|---|---|---|---|
AMD Opteron Processor | 2 | - | 64 Byte L1 Cache - Data and Instruction Cache Separated, 1024 KByte L2 | 2 Way Associative ECC Protected L1 Data Cache & Parity Protected Instruction Cache; 16 Way Associative Parity Protected L2 Cache |
Two 64 bit operations per 3 cycle latency | No | Exclusive cache architecture |
AMD Athlon X2 Dual Core | 2 | - | 64 Byte L1 Cache - Data and Instruction Cache Separated, 1024 KByte L2 | 2 Way Associative ECC Protected L1 Data Cache & Parity Protected Instruction Cache; 16 Way Associative Parity Protected L2 Cache |
Two 64 bit operations per 3 cycle latency | No | Exclusive cache architecture |
AMD Turin 64 Mobile | 2 | - | 64 Kbyte L1; Upto 1MByte of L2 with 512 Kbyte Options | 2-Way Associative ECC-Protected L1 Data Cache & Parity Protected L1 Instruction Cache; 16-Way Associative ECC-Protected L2 Cache |
Two 64-bit operations per cycle, 3-cycle latency - With advanced branch prediction | No | Exclusive cache architecture—storage |
AMD Sempron Processor | 2 | - | 64-Kbyte ECC-Protected L1 Data Cache && Parity-Protected Instruction Cache; 256-Kbyte ECC-Protected L2 Cache |
2-Way Associative L1 Cache ; 16-Way Associative L2 Cache | Two 64-bit operations per cycle, 3-cycle latency | No | Exclusive cache architecture—storage |
AMD Athlon Duron Processor | 2 | - | Integrated 128-Kbyte L1 Cache and an exclusive 64-Kbyte L2 Cache | - | - | No | Exclsive cache architecture-storage |
AMD Palemo Processor | 2 | - | 64 KByte L1 Data Cache & L1 Instruction Cache; Unified 128 or 256 KByte L2 Cache |
- | - | No | Inclusive |
AMD Thoroughbred (TBRED) | 2 | - | 64 KByte L1 Data Cache & L1 Instruction Cache; Unified 256 KByte full-speed L2 Cache |
- | - | No | Inclusive |
AMD Barton Processor | 2 | - | 64 KByte L1 Data Cache & L1 Instruction Cache; Unified 512 KByte L2 Cache |
- | - | No | - |
AMD Thunderbird | 2 | - | - | 16-Way | - | - | - |
CELL Processor (Playstation3 Processor) Manufactured by TOSHIBA, IBM and SONY | Power PC Core, which is at the centre of the Cell, contains 2 Levels ; Each of the "surrounding" SPEs have just one level of Cache |
- | 32 KByte Data Cache + 32 Kbyte Instruction Cache in the Power PC Core ; The surrounding SPEs have 256 Kbyte Unified Cache |
- | - | The L2 Cache of the Power PC Core is shared by the surrounding SPEs | - |
AMD Athlon 64 X2 Dual Core - 4600+ | 2 | - | 128 KByte L1 Unified Cache ; 512 KByte L2 Unified Cache | - | - | No | - |
Storm-1 Family by Stream Processors | 1 | - | 16 KByte L1 Data / Instruction Cache | - | 533 MHz Data Rate between L1 and DDR Memory | No | NA |
UltraSPARC IV | 2 | 128 bytes to 512 bytes | 64KByte Dual L1 Data Cache; 32KByte L2 extendable up to 16MB | 2way set associative per core | - | No | - |
UltraSPARC IV+ | 2 | - | 32 MByte L2 on-chip ; 32 MByte L3 External | - | - | No | - |
UlatraSPARC T1 | 2 | 4 Banks for L2 | 8 KByte Data Cache ; 16 KByte Instruction Cache; 3 MByte L2 Cache |
4-Way Set Associative for L1 ; 12-Way Set Associative for L2 Cache | - | No | - |
Intel Pentium D Series | 3 (L1 + L2 + Execution Trace cache holding the decoded Micro-Ops) | - | 2 * 16 KByte of L1 Data Cache; 2 * 2 MByte L2 Unified Cache |
- | - | Yes | - |
Intel Itanium 2 | 3 | - | 16 KByte L1 Instruction Cache ; 16 KByte L1 Data Cache; 265 KByte L2 Unified Cache; 3 - 9 MByte of L3 Unified Cache | 4-Way Set Associative L1 Cache ; 8-Way Set Associative L2 Cache | 6.4 - 2.1 GB/s transfer rate between L3 and External Memory | L2 and L3 Cache is Shared | - |
CRAY X1 | 2 | - | 32 KByte Core L1 Unified Cache; 512 MByte of L2 Unified Cache | - | 76 GB/s - 50 GB/s for Loads and 26 GB/s for stores between the L2 Cache and the Memory | Yes | Integrated Vector Cache is used for Coherence and helps tolerate memory latency |