CSC/ECE 506 Fall 2007/wiki1 3 as1506

Latest revision as of 03:38, 11 September 2007

Commercial computers have long used parallel architectures for their high-end applications. However, unlike scientific or engineering computing, where most of the work depends on raw computing ability, commercial applications require the system to support the maximum number of transactions at any given time so that it can handle a large database and serve a large number of customers. This class of systems is referred to as on-line transaction processing (OLTP).

TPC benchmarks

The goal of the TPC (Transaction Processing Performance Council) benchmarks is to define a set of functional requirements that can be run on any transaction processing system, regardless of hardware or operating system. This methodology allows anyone to implement a TPC benchmark and guarantees that end-users will see a like-for-like comparison. In other words, it puts different processing systems on a level playing field so that they can be compared efficiently.

One of the OLTP (Online Transaction Processing) benchmarks in this suite, TPC-C, simulates a complete environment in which a number of terminal operators execute transactions against a database. The benchmark is centered on the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

The throughput of TPC-C is a direct result of the level of activity at the terminals. Each warehouse has ten terminals, and all five transactions are available at each terminal. A remote terminal emulator (RTE) is used to maintain the required mix of transactions over the performance measurement period. This mix represents the complete business processing of an order as it is entered, paid for, checked, and delivered. More specifically, the required mix is defined to produce an equal number of New-Order and Payment transactions, and to produce one Delivery transaction, one Order-Status transaction, and one Stock-Level transaction for every ten New-Order transactions.
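
As an illustration, the mix described above can be turned into per-transaction shares. The following is a minimal sketch that uses only the ratios stated in this section; the names `MIX_WEIGHTS` and `mix_fractions` are hypothetical and are not part of any TPC tool.

```python
from fractions import Fraction

# Relative weights taken from the mix described in this section:
# equal New-Order and Payment counts, and one each of Delivery,
# Order-Status, and Stock-Level per ten New-Order transactions.
MIX_WEIGHTS = {
    "New-Order": 10,
    "Payment": 10,
    "Delivery": 1,
    "Order-Status": 1,
    "Stock-Level": 1,
}


def mix_fractions(weights):
    """Return each transaction type's exact share of the total mix."""
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


shares = mix_fractions(MIX_WEIGHTS)
for name, share in shares.items():
    # Print the exact fraction and its percentage form.
    print(f"{name:13s} {share} ({float(share):.1%})")
```

Under these ratios, New-Order and Payment each account for 10 of every 23 transactions, and the remaining three types for 1 of every 23 each.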

The tpm-C metric is the number of New-Order transactions executed per minute. Because New-Order transactions run as part of the required mix, the metric reflects a complete business activity rather than just one or two transactions or computer operations. For this reason, tpm-C is considered a measure of business throughput: it measures how many complete business operations can be processed per minute, not just a few basic computer or database transactions. The benchmark should give users a more extensive, more complex yardstick for measuring OLTP system performance.
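
The definition of the metric can be sketched as a one-line calculation. This is only an illustration of the arithmetic: the function name `tpm_c` and the transaction count used in the example are hypothetical, not measured results.

```python
def tpm_c(new_order_count, interval_minutes):
    """New-Order transactions completed per minute of measurement."""
    if interval_minutes <= 0:
        raise ValueError("measurement interval must be positive")
    return new_order_count / interval_minutes


# Hypothetical run: 12,000,000 New-Order transactions completed
# during a 120-minute measurement interval.
example_tpm_c = tpm_c(12_000_000, 120)
print(f"tpm-C = {example_tpm_c:,.0f}")
```

Only New-Order transactions are counted; the other four transaction types still run as part of the mix and load the system, but they do not appear in the numerator.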

The current version of the TPC-C benchmark is Version 5.9. Compared to the version mentioned in the text (David E. Culler, Jaswinder Pal Singh, with Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach), the pricing changes include reducing maintenance support pricing from 5 years to 3 years, raising maintenance coverage from 8x5 to 24x7, removing terminal network pricing (hubs, switches), and allowing pricing quotes from web pages and print materials. The runtime changes include reducing the disk space requirement from 180 days to 60 days, increasing the measurement interval from 20 minutes to 2 hours, reporting checkpoint durations, and reporting the number of user connections lost during the measurement interval.

For the complete description regarding the TPC-C benchmark, please visit http://www.tpc.org/tpcc/detail.asp

Top 10 TPC-C according to performance

| Company | System | tpmC | Price/tpmC ($) |
|---|---|---|---|
| HP | HP Integrity Superdome- Itanium2/1.6Ghz/24MB iL3 | 4,092,799 | 2.93 |
| IBM | IBM System p5 595 | 4,033,378 | 2.97 |
| IBM | IBM eServer p5 595 | 3,210,540 | 5.07 |
| IBM | IBM System p 570 | 1,616,162 | 3.54 |
| IBM | IBM eServer p5 595 | 1,601,784 | 5.05 |
| FUJITSU | PRIMEQUEST 540 16p/32c | 1,238,579 | 3.94 |
| HP | HP Integrity Superdome | 1,231,433 | 4.82 |



Top 10 TPC-C according to price/performance

| Company | System | tpmC | Price/tpmC ($) |
|---|---|---|---|
| HP | HP ProLiant ML350G5 | 100,926 | 0.74 |
| DELL | PowerEdge 2900/1/2.33Ghz/2x4M | 69,564 | 0.91 |
| DELL | PowerEdge 2900/3Ghz/4M | 65,833 | 0.98 |
| DELL | PowerEdge 2800/1/2.8Ghz/2+2M | 38,622 | 0.99 |
| DELL | PowerEdge 2800/1/3.6Ghz/2M | 28,244 | 1.29 |
| DELL | PowerEdge 2900/1/2.66Ghz/2x4M | 126,371 | 1.33 |
| DELL | PowerEdge 2800/1/3.4Ghz/2M | 28,122 | 1.40 |

To get a complete list of the TPC-C results, please visit http://www.tpc.org/information/results_spreadsheet.asp
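
Because price/performance is reported as dollars per tpmC, an approximate total system price can be recovered by multiplying the two columns. The sketch below does this for one row from each of the tables above; the function name `estimated_price` is hypothetical, and the results are only estimates because the published price/tpmC values are rounded to two decimal places.

```python
def estimated_price(tpmc, price_per_tpmc):
    """Approximate total priced-system cost in dollars."""
    return tpmc * price_per_tpmc


# (system, tpmC, price/tpmC in $), copied from the tables above.
ROWS = [
    ("HP Integrity Superdome- Itanium2/1.6Ghz/24MB iL3", 4_092_799, 2.93),
    ("HP ProLiant ML350G5", 100_926, 0.74),
]

for system, tpmc, price_per_tpmc in ROWS:
    total = estimated_price(tpmc, price_per_tpmc)
    print(f"{system}: about ${total:,.0f}")
```

The comparison shows why the two rankings differ so much: the top performer delivers roughly forty times the throughput of the best price/performance system, but at a total price that is higher by a much larger factor.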

Processor and memory

The ever-increasing gap between processor and memory speeds has elevated memory system design to a critical performance factor for commercial workloads. Memory design depends on a number of factors, such as data and instruction locality and branch behavior. A well-designed memory system will have fewer cache misses and lower branch miss rates. Memory sizes range from a few gigabytes up to terabytes.

The following table shows the memory and processor resources of some of today's systems used for commercial workloads.

| System | Processor | Number of processors | L3 cache/proc. (MB) | Memory (GB) | tpmC |
|---|---|---|---|---|---|
| HP Integrity Superdome- Itanium2/1.6Ghz/24MB iL3 | 1.6GHz Intel Itanium 2 | 64 | 24 | 2048 | 4,092,799 |
| IBM System p5 595 | 2.3GHz POWER5+ | 32 | 36 | 2048 | 4,033,378 |
| IBM eServer p5 595 | 3.2GHz 1MB L3 Xeon Processor | 32 | 36 | 2048 | 3,210,540 |
| IBM System p 570 | 4.7GHz POWER6 | 8 | 32 | 768 | 1,616,162 |
| IBM eServer p5 595 | 3.2GHz 1MB L3 Xeon Processor | 2 | 36 | 2048 | 1,601,784 |
| PRIMEQUEST 540 16p/32c | 1.6GHz Dual-Core Intel Itanium2 | 16 | 24 | 1024 | 1,238,579 |
| HP Integrity Superdome | Intel Itanium 2 9M CPUs at 1.6GHz | 64 | 9 | 1024 | 1,231,433 |
| HP Integrity rx5670 Cluster–Itanium2/1.5 GHz-64p/6 | 1.5GHz Itanium 2 6M w/ 6MB Cache | 64 | 6 | 768 | 1,184,893 |
| IBM eServer pSeries 690 Model 7040-681 | 2.8 GHz Xeon W/512KB L2 Cache | 2 | 0.5 | 2 | 1,025,486 |
| IBM System p5 570 Model 9117-570 | 3.2GHz 1MB L3 Xeon Processor | 2 | 1 | 2 | 1,025,169 |
| HP Integrity rx6600 | Dual Core Itanium 2 Processor 9050 | 4 | 24 | 192 | 344,928 |
| HP Integrity rx8620 – Itanium2/1.6 GHz-16p/16c | Intel Itanium2 - 1.6 GHz | 16 | 6 | 256 | 332,265 |

From the above table, it can be seen that processor clock speed alone does not determine performance: the HP Integrity Superdome with a 1.6GHz processor outperforms the IBM System p5 595 with a 2.3GHz processor, which in turn outperforms the IBM eServer p5 595 with a 3.2GHz processor.

The same data also shows that the IBM eServer p5 595 (L3 cache/proc. = 36MB, memory = 2048GB) outperforms the IBM System p5 570 Model 9117-570 (L3 cache/proc. = 1MB, memory = 2GB), even though both systems have the same number of processors (2) and the same processor type (3.2GHz 1MB L3 Xeon Processor).

This suggests that memory design is more critical to system performance than processor technology.

Speedup

Adding processors increases the parallelism available in a system. This can be seen in the table below, which compares several IBM systems; only IBM systems are considered in order to keep the comparison fair. Speedup is measured relative to the IBM System p5 570 Model 9117-570. Going from 2 to 8 processors yields little additional speedup (about 1.56 to 1.58), which points to a significant penalty in the parallel implementation. For the same fourfold step from 8 to 32 processors, however, the speedup roughly doubles (from about 1.58 to between 3.13 and 3.93), indicating good scalability over that range. Parallelism is therefore almost indispensable for commercial applications. In fact, several vendors supplying database hardware or software offer multiprocessor systems that provide performance substantially beyond their uniprocessor products.

| System | Number of processors | tpmC | Speedup |
|---|---|---|---|
| IBM System p5 570 Model 9117-570 | 2 | 1,025,169 | 1 |
| IBM eServer pSeries 690 Model 7040-681 | 2 | 1,025,486 | 1.000309217 |
| IBM eServer p5 595 | 2 | 1,601,784 | 1.562458482 |
| IBM System p 570 | 8 | 1,616,162 | 1.576483487 |
| IBM eServer p5 595 | 32 | 3,210,540 | 3.131717795 |
| IBM System p5 595 | 32 | 4,033,378 | 3.934354238 |
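
The Speedup column can be reproduced by dividing each system's tpmC by that of the first row. The sketch below does this and also computes a parallel efficiency figure (speedup divided by the growth in processor count), which is not part of the original table and is added here only as an assumption about how one might quantify scalability. The names `SYSTEMS`, `speedup`, and `efficiency` are hypothetical.

```python
# (system, number of processors, tpmC), copied from the table above.
SYSTEMS = [
    ("IBM System p5 570 Model 9117-570", 2, 1_025_169),
    ("IBM eServer pSeries 690 Model 7040-681", 2, 1_025_486),
    ("IBM eServer p5 595", 2, 1_601_784),
    ("IBM System p 570", 8, 1_616_162),
    ("IBM eServer p5 595", 32, 3_210_540),
    ("IBM System p5 595", 32, 4_033_378),
]

# The first row is the baseline against which speedup is measured.
_, BASE_PROCS, BASE_TPMC = SYSTEMS[0]


def speedup(tpmc):
    """Throughput relative to the baseline system."""
    return tpmc / BASE_TPMC


def efficiency(procs, tpmc):
    """Speedup divided by the increase in processor count."""
    return speedup(tpmc) / (procs / BASE_PROCS)


for name, procs, tpmc in SYSTEMS:
    print(f"{name:40s} {procs:3d}  "
          f"speedup={speedup(tpmc):.3f}  "
          f"efficiency={efficiency(procs, tpmc):.3f}")
```

Because the rows mix different processor types and memory configurations, these ratios compare whole systems rather than isolating the effect of processor count.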

References

1. David E. Culler, Jaswinder Pal Singh, Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach
2. http://www.tpc.org
3. http://barroso.org/publications/isca98_1.pdf
4. http://en.wikipedia.org/wiki/OLTP