## CPU Structure and Function Ch 11

General Organisation
Registers
Instruction Cycle
Pipelining
Branch Prediction
Interrupts

26/09/2001 Copyright Teemu Kerola 2001

#### User Visible Registers

- Varies from one architecture to another
- General purpose register (GPR)
  - Data, address, index, PC, condition, ...
- Data register
  - Int, FP, Double, Index
- Address register
- Segment and stack pointers
  - only privileged instruction can write?
- Condition codes
  - result of some previous ALU operation

#### General CPU Organization (4)

- ALU
  - does all real work
- Registers
  - data stored here
- Internal CPU Bus
- Control
  - determines who does what when
  - driven by clock
  - uses control signals (wires) to control what every circuit is doing at any given clock cycle
  - More in Chapters 14-15

Fig. 11.1

Fig. 11.2

## Control and Status Registers (5)

- PC, Program Counter
  - next instruction (not current!)
  - part of process state
- IR, Instruction (Decoding) Register
  - current instruction
- MAR, Memory Address Register
  - current memory address
- MBR, Memory Buffer Register
  - current data to/from memory
- PSW, Program Status Word
  - what is allowed? What is going on?
  - part of process state

Fig. 11.7

#### Register Organisation (4)

- Registers make up CPU work space
- User visible registers
  - accessible directly via instructions
  - e.g., ADD R1,R2,R3
- Control and status registers
  - may be accessible indirectly via instructions
    - e.g., BNeq Loop
  - may be accessible only internally
    - e.g., HW exception
- Internal latches for temporary storage during instruction execution
  - E.g., ALU operand either from constant in instruction or from machine register

#### PSW - Program Status Word (6)

- State info from latest ALU-op
  - Sign, zero?
  - Carry (for multiword ALU ops)?
  - Overflow?
- Interrupts that are enabled/disabled?
- Pending interrupts?
- CPU execution mode (supervisor, user)?
- Stack pointer, page table pointer?
- I/O registers?
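As a rough illustration of the ALU state bits listed above, the following sketch derives sign, zero, carry and overflow for one addition. The function name `add_with_flags` and the 8-bit default width are assumptions made for this example and are not taken from any particular architecture.

```python
def add_with_flags(a, b, width=8):
    """Add two `width`-bit values and derive PSW-style condition codes.

    Hypothetical helper for illustration only.
    """
    mask = (1 << width) - 1          # e.g. 0xFF for 8 bits
    sign_bit = 1 << (width - 1)      # most significant bit of the word
    a &= mask
    b &= mask
    full = a + b                     # sum before truncation to the word size
    result = full & mask
    flags = {
        "sign": bool(result & sign_bit),
        "zero": result == 0,
        # carry out of the top bit, needed for multiword arithmetic
        "carry": full > mask,
        # signed overflow: operands share a sign, the result has the other one
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags


if __name__ == "__main__":
    print(add_with_flags(0x7F, 0x01))
    print(add_with_flags(0xFF, 0x01))
```

The first call overflows as a signed addition without producing a carry, while the second produces a carry and a zero result without signed overflow, which is why the two bits are kept separately.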

## 2-stage Instruction Execution Pipeline (4)

Fig. 11.10

- Good: instruction pre-fetch at the same time as execution of previous instruction
- Bad: execution phase is longer, i.e., the fetch stage is sometimes idle
- Bad: sometimes (jump, branch) the wrong instruction is fetched
  - every 6th instruction?
- Not enough parallelism ⇒ more stages?

## Pipeline Execution Time (3)

- <u>Time</u> to execute <u>one instruction</u> (latency, seconds) may be <u>longer</u> than for non-pipelined machine
  - extra latches to store intermediate results
- <u>Time</u> to execute 1000 instructions (seconds) is <u>shorter</u> (better) than that for non-pipelined machine, i.e.,
  - <u>Throughput</u> (instructions per second) for pipelined machine is <u>better</u> (bigger) than that for non-pipelined machine
- Is this good or bad? Why?
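The latency versus throughput trade-off above can be made concrete with a small calculation. This is only a sketch under stated assumptions: the stage delays, the 1 ns latch overhead and the function name `pipeline_timing` are made up for the example, and stalls are ignored.

```python
def pipeline_timing(stage_delays_ns, latch_overhead_ns, n_instructions):
    """Compare one-instruction latency and n-instruction time (no stalls).

    Assumes the pipelined clock is set by the slowest stage plus the
    latch overhead, and that the non-pipelined machine needs the sum
    of all stage delays per instruction.
    """
    k = len(stage_delays_ns)
    cycle = max(stage_delays_ns) + latch_overhead_ns
    unpipelined_latency = sum(stage_delays_ns)
    pipelined_latency = k * cycle
    return {
        "unpipelined_latency": unpipelined_latency,
        "pipelined_latency": pipelined_latency,
        "unpipelined_total": n_instructions * unpipelined_latency,
        # first instruction needs k cycles, each later one completes 1 cycle apart
        "pipelined_total": (k + n_instructions - 1) * cycle,
    }


if __name__ == "__main__":
    # Hypothetical six-stage machine, delays in nanoseconds.
    print(pipeline_timing([10, 8, 10, 9, 7, 8], 1, 1000))
```

With these made-up numbers a single instruction takes 66 ns instead of 52 ns, but 1000 instructions take 11055 ns instead of 52000 ns.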

## Another Possible Instruction Execution Pipeline

- FE Fetch instruction
- DI Decode instruction
- CO Calculate operand effective addresses
- FO Fetch operands from memory
- EI Execute Instruction
- WO Write operand (result) to memory
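To see how the six stages above overlap, the following sketch builds a simple space-time table with one row per instruction and one column per clock cycle. It assumes an ideal pipeline in which every stage takes one cycle and nothing ever stalls; the function name `timing_diagram` is hypothetical.

```python
STAGES = ["FE", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions):
    """Return one row per instruction; each row has one cell per clock cycle.

    Ideal case only: instruction i enters FE in cycle i and advances
    one stage per cycle, with no dependencies or branches.
    """
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles      # "--" marks cycles where the instruction is not in the pipeline
        for s, name in enumerate(STAGES):
            row[i + s] = name            # stage s of instruction i runs in cycle i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(timing_diagram(4), start=1):
        print(f"I{i}: " + " ".join(row))
```

Each row is shifted one cycle to the right of the previous one, so after the pipeline has filled, one instruction completes in every cycle.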

Fig. 11.11

## Pipeline Speedup Problems

- Some stages are shorter than the others
- Dependencies between instructions
  - control dependency
    - E.g., conditional branch decision is known only after the EI stage

Fig. 11.12

Fig. 11.13

#### Branch Problem Solutions (5)

- Delayed Branch
  - compiler places some useful instructions (1 or more!) after branch (or jump) instructions
  - these instructions are almost completely executed when branch decision is known
  - less actual work lost
  - can be difficult to do



#### Pipeline Speedup

- n instructions, k stages
- $\tau$ = stage delay = cycle time
- not pipelined: $T_1 = n k \tau$
  - pessimistic because of assuming that each stage would still have $\tau$ cycle time
- pipelined: $T_k = [k + (n-1)]\tau$
  - k cycles until 1st instruction completes
  - 1 cycle for each of the rest (n-1) instructions
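A minimal sketch of the speedup implied by this slide, reading the non-pipelined time as n·k·τ and the pipelined time as [k + (n-1)]·τ. The ratio of the two is the speedup; the function name `pipeline_speedup` is an assumption for the example, and stalls and branches are ignored.

```python
def pipeline_speedup(n, k, tau=1.0):
    """Ideal speedup of a k-stage pipeline over n instructions.

    t_unpipelined = n * k * tau        (every instruction takes k stage times)
    t_pipelined   = (k + (n - 1)) * tau  (k cycles for the first, 1 for each other)
    """
    t_unpipelined = n * k * tau
    t_pipelined = (k + (n - 1)) * tau
    return t_unpipelined / t_pipelined


if __name__ == "__main__":
    for n in (1, 10, 1000, 1_000_000):
        print(n, round(pipeline_speedup(n, k=6), 3))
```

For n = 1 there is no gain, for n = 1000 and k = 6 the ratio is 6000/1005 (roughly 5.97), and for large n it approaches k without reaching it.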

## Branch Probl. Solutions (contd) (6)

- Multiple instruction streams
  - execute speculatively in both directions
    - Problem: we do not know the branch target address early!
  - if one direction splits, continue each way again
  - lots of hardware
    - speculative results (registers!), control
  - speculative instructions may delay real work
    - bus & register contention?
  - need to be able to <u>cancel</u> not-taken instruction streams in pipeline

#### Branch Probl. Solutions (contd) (2)

- Prefetch Branch Target
  - e.g., IBM 360/91 (1967)
  - prefetch just branch target instruction
  - do not execute it, i.e., do only the instruction fetch (FI) stage
  - if branch is taken, no need to wait for memory
- Loop Buffer
  - keep n most recently fetched instructions in high speed buffer inside CPU
  - works for small loops (at most *n* instructions)

#### Branch Probl. Solutions (contd) (5)

- Branch Prediction
  - guess (intelligently) which way branch will go
  - static prediction: all taken or all not taken
  - static prediction based on opcode
    - E.g., because BLE instruction is usually at the end of loop, guess "taken"
  - dynamic prediction taken/not taken
    - based on the previous time this instruction was executed
    - need space (1 bit) in CPU for each (?) branch
    - end of loop always wrong twice!
    - extension based on the two previous executions
      - need more space (2 bits)
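The difference between 1-bit and 2-bit dynamic prediction can be checked with a small simulation of a loop-closing branch. This is a sketch under assumptions: the 2-bit scheme here is a saturating counter, which is one common variant and may differ in detail from the state diagram in Fig. 11.16, and all function names are hypothetical.

```python
def loop_branch_outcomes(iterations, runs):
    """Outcomes of a loop-closing branch: taken on every iteration except the last."""
    return ([True] * (iterations - 1) + [False]) * runs


def mispredictions_1bit(outcomes, initial=True):
    """Predict that the branch goes the same way as last time."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken               # remember only the latest outcome
    return misses


def mispredictions_2bit(outcomes, initial=3):
    """2-bit saturating counter: values 2 and 3 predict taken, 0 and 1 not taken."""
    counter = initial
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


if __name__ == "__main__":
    outcomes = loop_branch_outcomes(iterations=10, runs=5)
    print(mispredictions_1bit(outcomes), mispredictions_2bit(outcomes))
```

For a 10-iteration loop executed 5 times, the 1-bit scheme misses 9 times (once at the first loop exit, then twice per run: at loop exit and again at the next loop entry), while the 2-bit counter misses only at each loop exit, 5 times in total.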

Fig. 11.16

#### CPU Example: PowerPC

- User Visible Registers
  - 32 general purpose regs, each 64 bits
    - Exception reg (XER), 32 bits
  - 32 FP regs, each 64 bits
    - FP status & control (FPSCR), 32 bits
  - branch processing unit registers
    - Condition, 32 bits
      - 8 fields, each 4 bits
      - identity given in instructions
    - Link reg, 64 bits
      - E.g., return address
    - Count regs, 64 bits
      - E.g., loop counter

Fig. 11.23a

Table 11.3

Fig. 11.23b

Table 11.4

#### Branch Address Prediction (3)

- It is not enough to know whether branch is taken or not
- Must also know the branch target address to fetch the target instruction
- Branch History Table
  - state information to guess whether branch will be taken or not
  - previous branch target address
  - stored in CPU for each (?) branch

### CPU Example: PowerPC

- Interrupts
  - cause
    - system condition or event
    - instruction

Table 11.5

### **Branch History Table**

- PowerPC 620
- entries only for most recent branches
  - Branch instruction address, or tag bits for it
  - Branch taken prediction bits (2?)
  - Target address (from previous time) or complete target instruction?
- Why cached?
  - expensive hardware, not enough space for all possible branches
  - at lookup time, first check whether the entry is for the correct branch instruction
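The cached organisation described above can be sketched as a small direct-mapped table. Everything specific here is an assumption made for illustration (the class names, the 16-entry default size, indexing by the low address bits, a 2-bit saturating counter for the prediction bits); it does not describe the actual PowerPC 620 design.

```python
from dataclasses import dataclass


@dataclass
class BhtEntry:
    tag: int        # identifies which branch instruction owns this slot
    counter: int    # 2 prediction bits: 2 and 3 mean "taken"
    target: int     # target address seen the previous time


class BranchHistoryTable:
    """Hypothetical direct-mapped branch history table."""

    def __init__(self, n_entries=16):
        self.n_entries = n_entries
        self.entries = [None] * n_entries

    def _index_and_tag(self, branch_addr):
        return branch_addr % self.n_entries, branch_addr // self.n_entries

    def predict(self, branch_addr):
        """Return the predicted target address, or None for 'not taken / unknown'."""
        index, tag = self._index_and_tag(branch_addr)
        entry = self.entries[index]
        if entry is None or entry.tag != tag:
            return None                  # slot is empty or belongs to another branch
        return entry.target if entry.counter >= 2 else None

    def update(self, branch_addr, taken, target):
        """Record the real outcome once the branch has been resolved."""
        index, tag = self._index_and_tag(branch_addr)
        entry = self.entries[index]
        if entry is None or entry.tag != tag:
            # Replace whatever was in the slot: only recent branches are kept.
            self.entries[index] = BhtEntry(tag, 2 if taken else 1, target)
            return
        entry.counter = min(3, entry.counter + 1) if taken else max(0, entry.counter - 1)
        if taken:
            entry.target = target
```

The tag comparison in `predict` is the lookup-time check from the last bullet: two branches that map to the same slot must not use each other's history.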

## CPU Example: PowerPC

- Machine State Register (MSR), 64 bits
  - bit 48: external (I/O) interrupts enabled?
  - bit 49: privileged state or not
  - bits 52&55: which FP interrupts enabled?
  - bit 59: data address translation on/off
  - bit 63: big/little endian mode
- Save/Restore Regs SRR0 and SRR1
  - temporary data needed for interrupt handling

Table 11.6

## Power PC Interrupt Invocation

- Save return PC to SRR0
  - current or next instruction at the time of interrupt
- Copy relevant areas of MSR to SRR1
- Copy additional interrupt info to SRR1
- Copy fixed new value into MSR
  - different for each interrupt
  - address translation off, disable interrupts
- Copy interrupt handler entry point to PC
  - two possible handlers, selection based on bit 57 of original MSR

Table 11.6

## Power PC Interrupt Return

Table 11.6

- Return From Interrupt (rfi) instruction
  - privileged
- Rebuild original MSR from SRR1
- Copy return address from SRR0 to PC
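The interrupt invocation and return steps can be sketched as a pair of functions operating on a toy CPU state. This is a heavily simplified model under stated assumptions: the whole MSR is copied to SRR1 (the slides say only the relevant areas plus extra interrupt info), bit 0 is taken to be the most significant MSR bit, and the handler base addresses, the offset and all names are hypothetical.

```python
from dataclasses import dataclass


def msr_mask(bit):
    """Mask for MSR bit `bit`, assuming bit 0 is the most significant of 64 bits."""
    return 1 << (63 - bit)


EXTERNAL_INT_ENABLE = msr_mask(48)   # external (I/O) interrupts enabled
DATA_TRANSLATION = msr_mask(59)      # data address translation on
HANDLER_SELECT = msr_mask(57)        # selects one of the two handler locations

# Hypothetical base addresses for the two possible handler locations.
HANDLER_BASE_LOW = 0x0000_0000
HANDLER_BASE_HIGH = 0xFFF0_0000


@dataclass
class Cpu:
    pc: int = 0
    msr: int = 0
    srr0: int = 0
    srr1: int = 0


def take_interrupt(cpu, return_pc, new_msr, handler_offset):
    """Interrupt invocation: save state, load the fixed MSR value, jump to the handler."""
    cpu.srr0 = return_pc                 # current or next instruction address
    cpu.srr1 = cpu.msr                   # simplification: whole MSR saved
    base = HANDLER_BASE_HIGH if cpu.srr1 & HANDLER_SELECT else HANDLER_BASE_LOW
    cpu.msr = new_msr                    # fixed value, different for each interrupt
    cpu.pc = base + handler_offset


def return_from_interrupt(cpu):
    """rfi: rebuild the MSR from SRR1 and resume at the address in SRR0."""
    cpu.msr = cpu.srr1
    cpu.pc = cpu.srr0
```

Because the original MSR is restored only by `return_from_interrupt`, the handler runs with whatever the fixed new value specifies, for example with external interrupts disabled and address translation off.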

