Posts

Showing posts from January, 2015

SUMMARY OF TRENDS IN COMPUTER ARCHITECTURE

■ SUMMARY In the RISC approach, the most frequently occurring instructions are optimized by eliminating or reducing the complexity of other instructions and addressing modes commonly found in CISC architectures. The performance of RISC architectures is further enhanced by pipelining and increasing the number of registers available to the CPU. Superscalar and VLIW architectures are examples of newer performance enhancements that extend, rather than replace, the RISC approach. Parallel architectures can be classified as MISD, SIMD, or MIMD. The MISD approach is used for systolic array processing, and is the least general architecture of the three. In a SIMD architecture, all PEs carry out the same operations on different data sets, in an “army of ants” approach to parallel processing. The MIMD approach can be characterized as a “herd of elephants,” because there are a small number of powerful processors, each with its own data and instruction streams. The current trend is moving ...

Trends in computer architecture: case study: parallel processing in the Sega Genesis (the Sega Genesis architecture, Sega Genesis operation and Sega Genesis programming).

  Case Study: Parallel Processing in the Sega Genesis Home video game systems are examples of (nearly) full-featured computer architectures. They have all of the basic features of modern computer architectures, and several advanced features. One notably lacking feature is permanent storage (like a hard disk) for saving information, although newer models even have that to a degree. One notably advanced feature, which we explore here, is the use of multiple processors in a MIMD configuration. Three of the most prominent home video game platforms are manufactured by Sony, Nintendo, and Sega. For the purpose of this discussion, we will study the Sega Genesis, which exploits parallel processing for real-time performance. THE SEGA GENESIS ARCHITECTURE Figure 10-34 illustrates the external view of the Sega Genesis home video game system. The Sega Genesis consists of a motherboard, which contains electronic components such as the processor, memory, and interconnects, and...

Trends in computer architecture: parallel architecture (mapping an algorithm onto a parallel architecture, fine-grain parallelism – the Connection Machine CM-1 and coarse-grain parallelism: the CM-5).

MAPPING AN ALGORITHM ONTO A PARALLEL ARCHITECTURE The process of mapping an algorithm onto a parallel architecture begins with a dependency analysis in which data dependencies among the operations in a program are identified. Consider the C code shown in Figure 10-23. In an ordinary SISD processor, the four numbered statements require four time steps to complete, as illustrated in the control sequence of Figure 10-24a. The dependency graph shown in Figure 10-24b exposes the natural parallelism in the control sequence. The dependency graph is created by assigning each operation in the original program to a node in the graph, and then drawing a directed arc from each node that produces a result to the node(s) that needs it. The control sequence requires four time steps to complete, but the dependency graph shows that the program can be completed in just three time steps, since operations 0 and 1 do not depend on each other and can be executed simultaneously ... may not be very great...

Trends in computer architecture: parallel architecture (the Flynn taxonomy and interconnection networks).

Parallel Architecture One method of improving the performance of a processor is to decrease the time needed to execute instructions. This will work up to a limit of about 400 MHz (Stone, 1991), at which point an effect known as ringing on busses prohibits further speedup with conventional bus technology. This is not to say that higher clock speeds are not possible, because indeed current microprocessors have clock rates well above 400 MHz, but that “shared bus” approaches become impractical at these speeds. As conventional architectural approaches to improving performance wear thin, we need to consider alternative methods of improving performance. One alternative to further increasing bus speed is to increase the number of processors, and to decompose and distribute a single program onto the processors. This approach is known as parallel processing, in which a number of processors work collectively, in parallel, on a common problem. We see an example of parallel processing earlier ...

Trends in computer architecture: VLIW machines and case study: the IA-64 (Merced) architecture (background—the 80x86 CISC architecture and the Merced: an EPIC architecture).

10.7 VLIW Machines There is an architecture that is in a sense competitive with superscalar architectures, referred to as the VLIW (Very Long Instruction Word) architecture. In VLIW machines, multiple operations are packed into a single instruction word that may be 128 or more bits wide. The VLIW machine has multiple execution units, similar to the superscalar machine. A typical VLIW CPU might have two IUs, two FPUs, two load/store units, and a BPU. It is the responsibility of the compiler to organize multiple operations into the instruction word. This relieves the CPU of the need to examine instructions for dependencies, or to order or reorder instructions. A disadvantage is that the compiler must out of necessity be pessimistic in its estimates of dependencies. If it cannot find enough instructions to fill the instruction word, it must fill the blank spots with NOP instructions. Furthermore, VLIW architectural improvements require software to be recompiled to take advantage of them...

Trends in computer architecture: multiple instruction issue (superscalar) machines – the PowerPC 601 and case study: the PowerPC™ 601 as a superscalar architecture (instruction set architecture of the PowerPC 601 and hardware architecture of the PowerPC 601).

10.4 Multiple Instruction Issue (Superscalar) Machines – The PowerPC 601 In the earlier pipelining discussion, we see how several instructions can be in various phases of execution at once. Here, we look at superscalar architecture, where, with separate execution units, several instructions can be executed simultaneously. In a superscalar architecture, there might be one or more separate Integer Units (IUs), Floating Point Units (FPUs), and Branch Processing Units (BPUs). This implies that instructions need to be scheduled into the various execution units, and further, that instructions might be executed out-of-order. Out-of-order execution means that instructions need to be examined prior to dispatching them to an execution unit, not only to determine which unit should execute them, but also to determine whether executing them out of order would result in an incorrect program, because of dependencies between the instructions. This in turn implies an Instruction Unit, IU...

Trends in computer architecture: overlapping register windows

Overlapping Register Windows One modern architectural feature that has not been as widely adopted as other features (such as pipelining) is overlapping register windows, which to date has only been adopted by the SPARC family. This feature is based upon studies that show typical programs spend much of their time dealing with procedure call-and-return overhead, which involves passing parameters on a stack located in main memory in traditional architectures. The SPARC architecture reduces much of this overhead by employing multiple register sets that overlap. These registers are used for passing parameters between procedures, instead of using a stack in main memory. Procedure calls may be deeply nested in an ordinary program, but for a given window of time, the nesting depth fluctuates within a narrow band. Figure 10-6 illustrates this behavior. For a nesting depth window size of five, the window moves only 18 times for 100 procedure calls. Results produced by a group at UC B...