Fast data crunching; a lot of work done per unit of time.
Historically,
|-------------|
| Processor 1 |-----------o-------------o-------------o
|-------------| | | |
| | |
| | |
|-------------| | | |
| Processor 2 |-----------o-------------o-------------o
|-------------| | | |
| | |
| | |
|-------------| | | |
| Processor 3 |-----------o-------------o-------------o
|-------------| | | |
. . .
. . .
. . .
|-------------| | | |
| Processor N |-----------o-------------o-------------o
|-------------| | | |
| | |
| | |
|----------| |----------| |----------|
| Memory | | Memory | | Memory |
| Block 1 | | Block 2 | | Block 3 |
|----------| |----------| |----------|
Decoding circuitry for Dividing Shared-Memory M into R blocks
R x f(M/R)
Switches(relatively inexpensive) N x R
Fully interconnected N processors N x (N-1)/2 lines
/ 2i for 0<=i<=N/2-1
j = |
\ 2i+1-N for N/2<=i<=N-1
|-----------| |-----------| |-----------|
| Processor | | Processor | | Processor |
| 1 | | 2 | . . . | N |
|-----------| |-----------| |-----------|
| Memory | | Memory | | Memory |
| 1 | | 2 | . . . | N |
|-----------| |-----------| |-----------|
^ ^ ^
| | |
| | |
| | |
v v v
|----------------------------------------------|
| |
| ORIGIN 2000 Shared-Memory |
| |
|----------------------------------------------|
|----------------------------------------------|
| SHRED MEMORY |
| |
| OR |
| |
| INTERCONNECTED NETWORK |
| |
|----------------------------------------------|
| | |
|DATA |DATA |DATA
|STREAM |STREAM |STREAM
| 1 | 2 | N
v v v
|-----------| |-----------| |-----------|
| Processor | | Processor | | Processor |
| 1 | | 2 | . . . | N |
|-----------| |-----------| |-----------|
^ ^ ^
| | |
|INSTRUCTION |INSTRUCTION |INSTRUCTION
|STREAM |STREAM |STREAM
| 1 | 2 | N
| | |
|-----------| |-----------| |-----------|
| Control | | Control | | Control |
| 1 | | 2 | . . . | N |
|-----------| |-----------| |-----------|
|----------------------------------------------|
| Closely Connected Switch |
| |
| OR |
| |
| INTERCONNECTED NETWORK |
| (Fast-Ethernet, ATM, etc.) |
| |
|----------------------------------------------|
^ ^ ^
| | |
|DATA |DATA |DATA
|EXCHANGE |EXCHANGE |EXCHANGE
| 1 | 2 | N
v v v
|-----------| |-----------| |-----------|
| Processor | | Processor | | Processor |
| 1 | | 2 | . . . | N |
|-----------| |-----------| |-----------|
| Memory | | Memory | | Memory |
| 1 | | 2 | . . . | N |
|-----------| |-----------| |-----------|
Machine Convex C3820 Cray C90/6256 IBM SP2 SGI Origin 2000 Number of CPU 2 6 32 8 CPU speed - MHz 62.5 250 66.7 200 Data cache size N.A. N.A. 128/256K 4M L2 Memory size - MB 1024 2048 512/node 8198 Vector processing Yes Yes No No Disk storage - GB 27 110 2/node 80
SP1 SP2-thin SP2-wide
RS6K equiv. model no. 370 390 590
CPU speed, MHz 62.5 66.7 66.7
Data Cache size, bytes 32K 64K 256K
Cache line size, bytes 64 128 256
NO. OF CACHE LINES 128X4 128X4 256X4
TLB ENTRIES 128 512 512
"TLB SPACE", BYTES 512K 2M 2M
MAIN MEMORY, BYTES 64M 128M >=256M
The 64KB data cache is organized into four 16KB congruence classes. Each congruence class consists of 128 lines,and each line is 128 bytes. Logically, virtual storage is similar divided into congruence classes of 16KB. Any memory location in virtual storage can be associated with one of a group of four cache lines out of 512 cache lines available.
Speed-up
where
We need further take into account of the parallel overhead introduced by parallel execution:
where