## CPU Structure and Function Ch 12

General Organisation
Registers
Instruction Cycle
Pipelining
Branch Prediction
Interrupts

26.9.2002 Copyright Teemu Kerola 2002

### General CPU Organization (4)

- ALU
  - does all real work
- Registers
  - data stored here
- Internal CPU Bus
- Control
  - determines who does what when
  - driven by clock
  - uses control signals (wires) to control what every circuit is doing at any given clock cycle
  - More in Chapters 16-17 (Ch 14-15 [Stal99])

Fig. 12.1 (Fig. 11.1 [Stal99])


### Register Organisation (4)

- Registers make up CPU work space
- User visible registers
  - accessible directly via instructions
  - E.g., ADD R1,R2,R3
- Control and status registers
  - may be accessible indirectly via instructions
    - E.g., BNeq Loop
  - may be accessible only internally
    - E.g., HW exception
- Internal latches for temporary storage during instruction execution
  - E.g., ALU operand either from constant in instruction or from machine register


### User Visible Registers (6)

- Varies from one architecture to another
- General purpose registers (GPR)
  - Data, address, index, PC, condition, ...
- Data registers
  - Int, FP, Double, Index
- Address registers
- Segment and stack pointers
  - only privileged instructions can write?
- Condition codes
  - result of some previous ALU operation


## Control and Status Registers (5)

- PC
  - next instruction (not current!)
  - part of process state
- IR, Instruction (Decoding) Register
  - current instruction



- MAR, Memory Address Register
  - current memory address
- MBR, Memory Buffer Register
  - current data to/from memory
- PSW, Program Status Word
  - what is allowed? What is going on?
  - part of process state


### PSW - Program Status Word (6)

- State info from latest ALU-op
  - Sign, zero?
  - Carry (for multiword ALU ops)?
  - Overflow?
- Interrupts that are enabled/disabled?
- Pending interrupts?
- CPU execution mode (supervisor, user)?
- Stack pointer, page table pointer?
- I/O registers?
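
The ALU state bits listed above can be illustrated with a small sketch. It assumes an 8-bit two's complement ALU and an add operation; the function name `add_with_flags` and the flag names are hypothetical, chosen only for this example.

```python
def add_with_flags(a, b, bits=8):
    """Add two bit patterns and derive PSW-style condition flags.

    Assumes a `bits`-wide two's complement ALU (an assumption of this sketch).
    """
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b              # result before truncation to the register width
    result = full & mask
    flags = {
        "sign": bool(result & sign_bit),   # most significant bit of the result
        "zero": result == 0,
        "carry": full > mask,              # carry out, used for multiword adds
        # signed overflow: operands have the same sign, result has the other
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags


result, flags = add_with_flags(0x7F, 0x01)   # 127 + 1 overflows signed 8 bits
```

Here the carry flag is what a multiword addition would feed into the next, more significant word, while overflow only matters when the operands are interpreted as signed values.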














## 2-stage Instruction Execution Pipeline (4)

Fig. 12.9 (Fig. 11.10 [Stal99])

- Good: instruction pre-fetch at the same time as execution of previous instruction
- Bad: execution phase is longer, i.e., fetch stage is sometimes idle
- Bad: sometimes (jump, branch) the wrong instruction is fetched
  - every 6th instruction?
- Not enough parallelism ⇒ more stages?


## Another Possible Instruction Execution Pipeline

- FI Fetch instruction
- DI Decode instruction
- CO <u>Calculate operand effective addresses</u>
- FO Fetch operands from memory
- EI Execute Instruction
- WO Write operand (result) to memory

Fig. 12.10 (Fig. 11.11 [Stal99])




## Pipeline Execution Time (3)

- Time to execute one instruction, i.e., latency, may be longer than on a non-pipelined machine
  - extra latches to store intermediate results
- Time to execute 1000 instructions is shorter (better) than on a non-pipelined machine, i.e., throughput (instructions per second) is higher
  - parallel actions speed up the overall workload
- Is this good or bad? Why?
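
The throughput claim can be checked with a rough calculation. The sketch below assumes an idealized pipeline: all `k` stages take the same time, there are no stalls or branches, and latch overhead is ignored. Under those assumptions `n` instructions need `k + n - 1` stage times instead of `n * k`. The function names are hypothetical.

```python
def sequential_time(n, k, stage_time=1.0):
    """Time for n instructions when each runs all k stages before the next starts."""
    return n * k * stage_time


def pipelined_time(n, k, stage_time=1.0):
    """Time for n instructions in an ideal k-stage pipeline.

    The first instruction needs k stage times to fill the pipeline;
    after that one instruction completes per stage time.
    """
    return (k + n - 1) * stage_time


def speedup(n, k):
    """Ratio of non-pipelined to pipelined execution time."""
    return sequential_time(n, k) / pipelined_time(n, k)


# 1000 instructions on the 6-stage pipeline: 6000 vs. 1005 stage times
ideal = speedup(1000, 6)
```

For a single instruction the two times are equal in this model; with latch overhead added to each stage, the pipelined latency would be the longer one, which is the first point above.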


### Pipeline Speedup Problems

- Some stages are shorter than the others
- Dependencies between instructions
  - control dependency
    - E.g., conditional branch decision known only after EI stage

Fig. 12.11 (Fig. 11.12 [Stal99]) Fig. 12.12-13 (Fig. 11.13 [Stal99])


### Pipeline Speedup Problems (3)

Fig. 12.11 (Fig. 11.12 [Stal99])

- Dependencies between instructions
  - data dependency
    - One instruction depends on data produced by some earlier instruction
    - E.g., MUL R1,R2,R3 followed by LOAD R6, ArrB(R1)
      - value known after EI stage, value needed in CO stage
  - structural dependency
    - Many instructions need the same resource at the same time
    - memory bus, ALU, ...
    - E.g., memory bus use in FI: STORE R1, VarX; ADD R2,R3,VarY; R3,R4,R5
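
A data dependency like the MUL/LOAD pair can be detected by comparing the register one instruction writes with the registers a following instruction reads. The sketch below is a simplified illustration, not how any particular CPU does it: the `Instr` record, the `raw_hazards` function and the `window` parameter are hypothetical, and it assumes the first operand of MUL is the destination register.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    writes: Optional[str]     # register written, if any
    reads: Tuple[str, ...]    # registers read (including address registers)


def raw_hazards(program, window=1):
    """Find read-after-write pairs within `window` instructions of each other.

    Returns (producer index, consumer index, register) tuples. Simplification:
    it does not check whether an instruction in between overwrites the register.
    """
    hazards = []
    for i, producer in enumerate(program):
        if producer.writes is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.writes in program[j].reads:
                hazards.append((i, j, producer.writes))
    return hazards


program = [
    Instr("MUL R1,R2,R3", "R1", ("R2", "R3")),      # R1 known after EI
    Instr("LOAD R6, ArrB(R1)", "R6", ("R1",)),      # R1 needed in CO
]
found = raw_hazards(program)
```

A larger `window` would model a deeper pipeline, where the consumer may be several instructions behind the producer and still need the value too early.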









### Branch Problem Solutions (contd) (6)

- Multiple instruction streams
  - execute speculatively in both directions
    - Problem: we do not know the branch target address early!
  - if one direction splits, continue each way again
  - lots of hardware
    - speculative results (registers!), control
  - speculative instructions may delay real work
    - bus & register contention?
    - Need multiple ALUs?
  - need to be able to <u>cancel</u> not-taken instruction streams in pipeline


### Branch Problem Solutions (contd) (2)

- Prefetch Branch Target
  - IBM 360/91 (1967)
  - prefetch just branch target instruction
  - do not execute it, i.e., do only FI stage
  - if branch taken, no need to wait for memory
- Loop Buffer
  - keep *n* most recently fetched instructions in high speed buffer inside CPU
  - works for small loops (at most *n* instructions)


### Branch Problem Solutions (contd) (4)

- Static Branch Prediction
  - guess (intelligently) which way branch will go
  - static prediction: all taken or all not taken
  - static prediction based on opcode
    - E.g., because BLE instruction is usually at the end of loop, guess "taken" for all BLE instructions


### Branch Problem Solutions (contd) (5)

- Dynamic branch prediction
  - based on previous time this instruction was executed
  - need a CPU "cache" of addresses of branch instructions, and taken/not taken information
    - 1 bit
  - end of loop always wrong twice!
  - extension: prediction based on the two previous executions of that branch instruction
    - need more space (2 bits)

Fig. 12.17 (Fig. 11.16 [Stal99])
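
The "wrong twice" effect of 1-bit prediction can be seen by counting mispredictions for a loop-closing branch. The sketch below is an illustration under stated assumptions: the 2-bit version is a saturating counter, one common variant whose transitions may differ from the state diagram in the figure, and the function names and the initial states are chosen for this example only.

```python
def mispredictions_1bit(outcomes, initial=True):
    """1-bit scheme: predict whatever the branch did last time."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken            # remember only the latest outcome
    return misses


def mispredictions_2bit(outcomes, initial=3):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    counter = initial
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # move one step towards the actual outcome, staying within 0..3
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Loop-closing branch: taken 9 times, then falls through; loop entered 3 times
loop_branch = ([True] * 9 + [False]) * 3
one_bit = mispredictions_1bit(loop_branch)
two_bit = mispredictions_2bit(loop_branch)
```

With one bit, every loop exit flips the prediction, so the first iteration of the next loop entry is also mispredicted. With two bits, a single not-taken outcome is not enough to change the prediction, so only the exit itself is missed.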


### Branch Address Prediction (3)

- It is not enough to know whether branch is taken or not
- Must also know the branch address to fetch the target instruction
- Branch History Table
  - state information to guess whether branch will be taken or not
  - previous branch target <u>address</u>
  - stored in CPU "cache" for each branch


Table 12.3 (Tbl. 11.3), Table 12.4 (Tbl. 11.4)

### Branch History Table

- Cached (PowerPC 620)
  - entries only for most recent branches
    - Branch instruction address, or tag bits for it
    - Branch taken prediction bits (2?)
    - Target address (from previous time) or complete target instruction?
- Why cached
  - expensive hardware, not enough space for all possible branches
  - at lookup time check first whether the entry is for the correct branch instruction
    - Index/tag bits of branch instruction address
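
The lookup with index and tag bits can be sketched as a small direct-mapped table. This is a simplified model, not the PowerPC 620 design: the class name, the `index_bits` parameter, the 2-bit saturating counter and the replacement policy (a new branch simply overwrites the old entry) are all assumptions of this example, and the low address bits are used as the index without dropping alignment bits.

```python
class BranchHistoryTable:
    """Direct-mapped cache of branch entries: tag, 2-bit counter, target."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)

    def _split(self, branch_addr):
        index = branch_addr & ((1 << self.index_bits) - 1)  # selects the slot
        tag = branch_addr >> self.index_bits                # identifies the branch
        return index, tag

    def lookup(self, branch_addr):
        """Return (predict_taken, target) or None if no entry for this branch."""
        index, tag = self._split(branch_addr)
        entry = self.entries[index]
        if entry is None or entry["tag"] != tag:
            return None                   # slot empty or owned by another branch
        return entry["counter"] >= 2, entry["target"]

    def update(self, branch_addr, taken, target):
        """Record the actual outcome once the branch has been resolved."""
        index, tag = self._split(branch_addr)
        entry = self.entries[index]
        if entry is None or entry["tag"] != tag:
            # allocate, replacing whatever branch used this slot before
            self.entries[index] = {"tag": tag, "counter": 2 if taken else 1,
                                   "target": target}
            return
        if taken:
            entry["counter"] = min(entry["counter"] + 1, 3)
            entry["target"] = target      # remember the most recent target
        else:
            entry["counter"] = max(entry["counter"] - 1, 0)


bht = BranchHistoryTable(index_bits=4)
bht.update(0x100, taken=True, target=0x80)
prediction = bht.lookup(0x100)
```

The tag comparison is what prevents a branch at a different address that maps to the same slot from using another branch's prediction and target.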


## CPU Example: PowerPC

Fig. 12.23 (Fig. 11.22 [Stal99])

- User Visible Registers
  - 32 general purpose regs, each 64 bits
    - Exception reg (XER), 32 bits; Fig. 12.24a (Fig. 11.23a)
  - 32 FP regs, each 64 bits
    - FP status & control (FPSCR), 32 bits
  - branch processing unit registers
    - Condition, 32 bits; Fig. 12.24b (Fig. 11.23b)
      - 8 fields, each 4 bits
      - identity given in instructions
    - Link reg, 64 bits
      - E.g., return address
    - Count reg, 64 bits
      - E.g., loop counter


## CPU Example: PowerPC

- Interrupts
  - cause
    - system condition or event
    - instruction

Table 12.5 (Fig. 11.5 [Stal99])


## CPU Example: PowerPC

Table 12.6 (Tbl. 11.6 [Stal99])

- Machine State Register, 64 bits
  - bit 48: external (I/O) interrupts enabled?
  - bit 49: privileged state or not
  - bits 52&55: which FP interrupts enabled?
  - bit 59: data address translation on/off
  - bit 63: big/little endian mode
- Save/Restore Regs SRR0 and SRR1
  - temporary data needed for interrupt handling
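
Testing individual MSR bits can be sketched as follows. The sketch assumes the PowerPC convention that bit 0 is the most significant bit, so bit 63 of the 64-bit register is the least significant one. The function names and dictionary keys are hypothetical, and the sketch only reports whether each bit is set without interpreting which value means what.

```python
MSR_WIDTH = 64


def msr_bit(msr, bit):
    """Return True if the given MSR bit is set (bit 0 = most significant)."""
    return bool((msr >> (MSR_WIDTH - 1 - bit)) & 1)


def describe_msr(msr):
    """Report the state of the MSR bits listed above."""
    return {
        "bit48_external_interrupts": msr_bit(msr, 48),
        "bit49_privilege_state": msr_bit(msr, 49),
        "bit59_data_address_translation": msr_bit(msr, 59),
        "bit63_endian_mode": msr_bit(msr, 63),
    }


# Example value with only bits 48 and 63 set
example_msr = (1 << (63 - 48)) | (1 << (63 - 63))
state = describe_msr(example_msr)
```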


## PowerPC Interrupt Invocation

Table 12.6 (Tbl. 11.6 [Stal99])

- Save return PC to SRR0
  - current or next instruction at the time of interrupt
- Copy relevant areas of MSR to SRR1
- Copy additional interrupt info to SRR1
- Copy fixed new value into MSR
  - different for each interrupt
  - address translation off, disable interrupts
- Copy interrupt handler entry point to PC
  - two possible handlers, selection based on bit 57 of original MSR


## PowerPC Interrupt Return

Table 12.6 (Tbl. 11.6 [Stal99])

- Return From Interrupt (rfi) instruction
  - privileged
- Rebuild original MSR from SRR1
- Copy return address from SRR0 to PC
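
The register traffic of interrupt invocation and return can be modelled in a few lines. This is a simplified sketch, not the actual hardware behaviour: it copies the whole MSR to SRR1 instead of only the relevant areas plus extra interrupt info, it ignores the handler selection by MSR bit 57, and the class, function names and example addresses are hypothetical.

```python
class Cpu:
    """Minimal register state needed for the interrupt mechanism."""

    def __init__(self, pc, msr):
        self.pc = pc
        self.msr = msr
        self.srr0 = 0     # save/restore register for the return address
        self.srr1 = 0     # save/restore register for the old MSR


def invoke_interrupt(cpu, handler_entry, new_msr, return_pc):
    """Interrupt invocation: save state, load fixed MSR, jump to handler."""
    cpu.srr0 = return_pc      # current or next instruction, depending on cause
    cpu.srr1 = cpu.msr        # simplification: whole MSR instead of parts
    cpu.msr = new_msr         # fixed value, e.g. interrupts disabled
    cpu.pc = handler_entry


def return_from_interrupt(cpu):
    """What the privileged rfi instruction does in this model."""
    cpu.msr = cpu.srr1        # rebuild original MSR
    cpu.pc = cpu.srr0         # resume interrupted program


cpu = Cpu(pc=0x1000, msr=0x8000)
invoke_interrupt(cpu, handler_entry=0x500, new_msr=0, return_pc=0x1004)
in_handler = (cpu.pc, cpu.msr)
return_from_interrupt(cpu)
```

Because SRR0 and SRR1 are single registers, a handler in this model would have to save them elsewhere before interrupts are enabled again, otherwise a nested interrupt would overwrite the return state.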


