This was the first update to OR1K architecture for many years. It addresses architectural and documentation issues in the manual.

Download pdf
Changes
- jump/branch delay slot is now optional
- improved version tracking
- relocatable exception vector space
- correction of arithmetic overflow and carry detection
- improvements to MAC and integer multiply behaviour
- various clarifications regarding hardware and software behaviour

This page outlines the changes made for OR1K 1.0. It does not give the full details of all the changes at present. Please refer to the document (table 1-2) for full details.

Summary

New SPRs

VR2 (Version register 2), group 0, number 9
AVR (Architecture Version register), group 0, number 10
EVBAR (Exception Vector Base Address register) (optional), group 0, number 11
AECR (Arithmetic Exception Control register) (optional), group 0, number 12
AESR (Arithmetic Exception Status register) (optional), group 0, number 13
ISR0-ISR7 (Implementation-specific registers) (optional) group 0, numbers 21-28

Modified SPRs

VR (Version Register) - bit 6 now indicates the presence of the VR2
CPUCFGR - several bits added indicating
- bit 10, ND, whether CPU implements delay slot
- bit 11, AVRP, AVR present
- bit 12, EVBARP, EVBAR present
- bit 13, ISRP, ISRs present
- bit 14, AECSRP, AECR & AESR present

VR

The VR is deprecated in OR1K 1.0, software can check for the new version register, (VR2) by checking if bit 6 of VR is set. Bit 6 (previously reserved) of the VR is now defined as the Updated Version Register Present bit. This indicates the presence of the VR2 register.

VR2

The VR2 now holds 2 fields.

8-bit CPU identifier - these are unique identifiers and there is one per CPU implementation as per [[OR1K_CPU_Cores#CPU_ID_Table this table]]
24-bit CPU implementation version number (meaning is implementation-specific but version number overall should increment with newer versions)

CPUCFGR bits

There have been additional bits added to the CPUCFGR SPR indicating features and presence of optional architectural features.

AECR, AESR

These registers make it possible to configure and determine which arithmetic exceptions trigger the range exception.

The addition of these registers, in combination with clarifying the signed and unsigned overflow conditions setting the carry and overflow flags for the add/subtract and multiplication instructions, now make OR1K’s arithmetic exception handling clearer and more configurable.

ISRs

The implementation-specific registers allow CPU implementations to embed up to 256-bits of information. This space can be used for any purpose, but it’s intended to be used to store things such as

implementation-level configuration information (ie, synthesis-time configuration settings)
local build version/date information
debug features

Updates to implementations

It’s suggested the following is done to bring implementations into line with OR1k 1.0:

SPRs

Add new VR2, indicating the CPU implementation, and its version
- Also requires adding the UVRP bit in VR (set VR[6])
Add new AVR, indicating the architectural version
- Also requires adding the ARVP bit in CPUCFGR
If implementation does not implementation delay-slots on branches/jumps, set the CPUCFGR[ND] bit
Unimplemented SPR space (SPR addresses for which there are no registers) should read as zero and writing should have no effect

Arithmetic operations

Carry and overflow handling should now behave like so:

for all integer addition, subtraction and multiplication instructions, SR[CY] (the carry bit) should only be set on ‘'’unsigned’’’ overflow, and SR[OV] should only be set for ‘'’signed’’’ overflow
integer divide by zero has the following behaviour:
- on l.div sets SR[OV]
- on l.divu sets SR[CY]
If the implementation supports the MAC instructions
- add similar overflow/carry setting logic for the addition stage

Other new features

These are all optional.

Proposals

Below are the proposals which were suggested and accepted into OR1K 1.0.

GPR0 usage, implementation

To eliminate confusion regarding the behaviour of writes to R0 when it’s hardwired in implementation, the mention of its role according to the ABI and the potential implementation shortcuts should be removed. This is intended to imply that R0 must be implemented in the same way as the other general-purpose registers. The mention of its use as constant zero should only be present in the ABI section of the document.

The following paragraph in 4.4 should be removed:

R0 is used as a constant zero. Whether or not R0 is actually hardwired to zero is implementation dependent. R0 should never be used as a destination register. Functions of other registers are explained in chapter Application Binary Interface on page 319.

The following sentence should replace it:

Functions of the registers are explained in chapter Application Binary Interface on page 319.

In section 16.2.1, the following:

 R0 [Zero] Always fixed to zero. 
 Even if it is writable in some embedded implementations, the software shouldn’t modify it.

Should be changed to:

 R0 [Zero] Holds a zero value.

l.trap condition

The architecture spec currently indicates the trap instruction checks a bit in the SR to test whether it should occur:

 Format:
 l.trap K
 
 Description:
 Execution of trap instruction results in the trap exception if specified bit in SR is set. Trap
 exception is a request to the operating system or to the debug facility to execute certain
 debug services. Immediate value is used to select which SR bit is tested by trap
 instruction.
 
 32-bit Implementation:
 if SR[K] = 1 then trap-exception()

This is not obeyed in any implementation and has no good justification for being specified this way. The specification should remove mention of being conditional on SR[K].

Accessing unimplemented SPRs

It should be stated in section 4.3 that un-implemented or reserved SPR space should be read as zero and writing should have no consequence (ie l.mtspr for an unimplemented SPR should be equivalent to a l.nop).

rdiez: In my OR10 CPU accessing non-existing SPRs raises an exception. This helps catch bugs early.

firefalcon: +1 rdiez. Also, to make the ISA amenable to virtualization, any attempted access to any (existing or not) privileged SPR should result in an exception.

ABI, returning structures by value

Currently, the description for returning structures by value says:

 A function that returns a structure or union places the address of the structure or 
 union in the general-purpose RV register."

This should be extended with the following information:

 A function that returns a structure by value expects the location where that
 structure is to be placed to be supplied in function parameter word 0 (R3).

Typo for lf.sfle.d instruction

The current lf.sfle.d (double precision set flag less or equal) states that register rA is 364 bits in size (rA[363:0]) instead of just 64 (rA[63:0]).

Remove “Signed” from name of addition and subtraction instructions

At present, the l.add, l.addi, l.addc, l.addic and l.sub all say the instructions are “signed” whereas the instructions

work for both signed and unsigned operands
signal both signed overflow and and unsigned overflow (carry)

… so actually are not signed-specific. So removing the “Signed” from the name of the instructions makes things less confusing.

Versioning

There are a few version-tracking issues to be dealt with.

Version Registers

The existing implementation version registers are not as good as they could be, and some proposals for changing them have been made.

Add new 32-bit version register, VR2, in SPR space, address 21. It identifies the implementation (model) and version of the OpenRISC 1000 processor.

 [31:24] VER Version
 Implementation-specific version information. This value should 
 increase for more recent versions. The CPU implementation 
 specification document should indicate how to interpret this field.

 [23:0] CPUID CPU Identification
 Implementation-specific identification number. Each unique 
 implementation should have a unique identification value.

The 8-bit CPUID field of the VR can be used to determine the implementation. A list of know implementations and unique IDs can probably be maintained in a document kept alongside the architecture spec and/or on this wiki page. I think the OR1200 should have its ID as 0x12 and or1ksim as 0x01. Any sufficiently different branch of any implementation, that is not likely to be re-merged, should get its own unique ID.

A VR2-presence bit in existing VR register - bit 15 (currently reserved)
Implementation revision/build information registers, REVIR0-REVIR4, SPR addresses 22-26. These 5 registers provide detailed information on the CPU’s revision. The exact use of these registers is implementation-specific but are big enough to store a 160-bit SHA1 hash value.

The REVI registers helps us use that neat trick with git, but is mostly designed to help keep track of local modifications to the project before synthesis is performed. If people aren’t using git locally there’s many other ways to store local modifications there. For or1ksim, it’s a way of getting a useful size amount of information into registers which the software can read.

Architecture Versions

As the architecture is modified and possibly expanded there needs to be a way of tracking the various versions of the architecture, both in implementation (eg. via a register) and in the document itself.

Although optional individual features will all have newly added presence bits to test, adherence to other architectural advances will need to be conveyed via a register for quick and easy determination by software.

For example, the GPR0 issue - there can be assurance that any implementation with an architecture revision value above 0 will adhere to the clarified definition of its implementation and use.

Architecture Revision Register

A new register, the Architecture Revision Register (ARR) should be added at address 27.

This register will contain the latest architecture revision the implementation contains features from. This will allow software to broadly detect which generation of CPU it is running on, and so which assumptions are safe.

Architecture Document Versions

The document itself will change over time as its contents are amended, clarified or expanded. There needs to be a clear way of tracking the versions of the document.

As the document is amended at present, the document revision history table in section 1.3 must be updated. However the only way to track the document revision is by the recorded date of the edit. It is proposed a new column is added to table 1-2 with an entry for each document revision of the format X-Y where X is the architecture revision the document is at that point and Y is an integer beginning at 0 for each new revision and incrementing by 1 per document update. This would mean the existing document revision is ‘0-17’ (17 recorded updates since March 2000.) The update corresponding with the addition of new features for architecture revision 1 would mean the document revision changes to ‘1-0’.

Exception Vector Base Address

The current options for locations of the exception vectors are not very flexible, the only two possible locations are at address 0x0 or address 0xf0000000.

The proposal is for the addition of an (optional) SPR register (EVBA)in group 0 (System Control and Status registers) at address 1536 (right after the last possible GPR mapped into SPR space), which would hold the (upper part of the) base address of the exception vectors.

 [31:13] Exception Vector Base Address
 Location for the start of exception vectors.
 Reset value: Implementation specific
 [12:0] Reserved / Constant 0

This register is optional, and in case EPH is asserted, it should be OR’ed together with the value in EVBA.

The presence of EVBA can be detected in software by writing a value to it and see if the same value reads back.

EVBA is only writable/readable when in supervisor mode (i.e. when SR[SM]=1)

Delay Slot Optional

There is a proposal to make delay slots optional. This will be stated in the architecture specification. Whether a machine has branch delay slots will be indicated via a the ND bit (position 10) in CPUCFGR.

On machines where CPUCFGR[ND] is set, jumps and branches change the PC immediately, instead of after the following delay slot instruction. Additionally, the l.jal and l.jalr instructions write PC+4 to the link register (R9), instead of PC+8.

The following changes will be made to the toolchain in order to facilitate the use of this option:

The EF_OR1K_NODELAY bit (position 0) in the flags field of the ELF header will be set in all binaries. The linker should warn whenever this bit is incompatibly set, but not fail or abort. This warning should be able to be turned off.
GAS will now understand the directive .nodelay. For now its only effect is to cause the EF_OR1K_NODELAY bit in the ELF file to be set.
GCC will have the following new flags:
- -mdelay: forces use of the delay slot
- -mno-delay: forces branches and jumps to not have a delay slot
- -mcompat-delay: forces delay slots, and fills them with nops
The or1knd-elf and or1knd-linux machine specifications will be added.
- When GCC is compiled for the or1k-* targets, -mdelay will be the default
- When GCC is compiled for the or1knd-* targets, -mno-delay will be the default
- GCC compiled for either target will understand all 3 flags
- When using -mno-delay GCC will emit a .nodelay directive in the intermediate .s file, and the C preprocessor will include the predefined macro OR1K_NODELAY
- Separate GCC multilib configurations might be useful for each of these flags
or1ksim will be modified to allow the delay slot to be disabled. This can be configured by setting the ND bit (position 10) in the cfg configuration variable of the cpu section of sim.cfg.
Where necessary, assembly source files (in testsuites, newlib, etc.) will be modified to use the C preprocessor to detect whether delay slots are to be used, and possibly exchange the position of delay slot instructions.

Change Use of Carry and Overflow Flags by Multiply Instructions

This proposal aims to simplify the implementation of the OpenRISC multiplier, as well as resolve conflicting implementation between the OR1200 and or1ksim.

The l.mul instruction is hard to implement in the current state. It performs a signed multiply, but must the set CY flag to indicate that overflow would have happened if the numbers were treated as unsigned. This is difficult to implement because multiplier macros and DSP slices on FPGAs only let you do either signed or unsigned multiply, and almost never give you the ability to detect unsigned overflow when doing signed multiply. Implementing it pretty much requires a hand-coded multiplier. The or1ksim simulator does get this requirement correct, but it does so by multiplying the operands twice, in both signed and unsigned modes. Notably, the OR1200 does not appear to implement this correctly. Instead, it appears to set the OV bit for both signed overflow in l.mul and unsigned overflow in l.mulu.

For consistency, we will always use OV to indicate signed overflow, and CY to indicate unsigned overflow. We will add a CYE bit to the SR register that causes a RANGE exception to be triggered when the CY bit is set by an instruction. Additionally, we will change the requirement that the l.mul instruction set the CY bit for unsigned overflow, and instead require that it always clear the bit. (I had considered having it just leave that bit unchanged, but it seems all the arithmetic instructions write both, if only to clear them, so just clearing it is more consistent.)

Summary of changes:

Remove the requirement that the l.mul instruction set the CY flag to indicate unsigned overflow. The instruction will instead always set CY to zero (for consistency with other arithmetic instructions).
That’s it!

Control of Carry and Overflow Flags and Exceptions

Continuing with the idea of fixing the definitions of instructions and their setting of carry and overflow flags, where we will have carry (SR[CY]) set on unsigned overflow, and overflow (SR[OV]) set on signed overflow, it’s also likely we will want to have relatively fine-grained control over exceptions triggered by this arithmetic.

For instance, we may want an exception on signed multiply overflow, but not on addition overflow (addition is not specific about whether its operands are to be considered signed or unsigned, so it will be setting both OV and CY on signed and unsigned overflow, respectively.)

(Unfortunately the OR1K ISA lacks unsigned addition, so it’s not possible to distinguish between unsigned and signed overflow for addition and subtraction.)

So to implement more useful arithmetic overflow detection and management I propose the addition of 2 SPRs for control and status of arithmetic overflow exceptions.

AECR - Arithmetic Exception Control Register

 [0] CYADDE
   Carry flag set by unsigned overflow on integer addition and subtraction instructions causes exception
 [1] OVADDE
   Overflow flag set by signed overflow on integer addition and subtraction instructions causes exception
 [2] CYMULE
   Carry flag set by unsigned overflow on unsigned integer multiply instructions causes exception
 [3] OVMULE
   Overflow flag set by signed overflow on signed integer multiply instructions causes exception
 [4] DBZE
   Overflow flag set by divide-by-zero on integer division instruction, or carry flag set by divide-by-zero on l.divu instruction, causes exception
 [5] CYMACADDE
   Carry flag set by unsigned overflow on MAC-unit add/sub operations causes exception
 [6] OVMACADDE
   Overflow flag set by signed overflow on MAC-unit add/sub operations causes exception

AESR - Arithmetic Exception Status Register

 [0] CYADDE
   Carry flag set by unsigned overflow on integer addition and subtraction instructions caused exception
 [1] OVADDE
   Overflow flag set by signed overflow on integer addition and subtraction instructions caused exception
 [2] CYMULE
   Carry flag set by unsigned overflow on unsigned integer multiply instructions caused exception
 [3] OVMULE
   Overflow flag set by signed overflow on signed integer multiply instructions caused exception
 [4] DBZE
   Overflow flag set by divide-by-zero on l.div instruction, or carry flag set by divide-by-zero on l.divu instruction, caused exception
 [5] CYMACADDE
   Carry flag set by unsigned overflow on MAC-unit add/sub operations caused exception
 [6] OVMACADDE
   Overflow flag set by signed overflow on MAC-unit add/sub operations caused exception

Setting the SR[OVE] will cause a range exception to occur when any of the situations specified in AECR have their bit set. SR[OV] and SR[CY], as appropriate, shall be set regardless of whether an exception is triggered. This will cause the range exception to occur and set the bit corresponding to the cause for the exception in the AESR. Writing ‘0’ to the bit in the AESR will cause it to be cleared. Only the occurrence of the exception will cause the range exception to occur, so leaving the bit set in the AESR and l.rfe with ESR[OVE] set shall not cause another exception.

Both registers will be R/W in supervisor mode only.

Function of bits [8:2] are obviously dependent on corresponding instructions/units being present.

Modifications to instructions are as follows:

 l.add, l.addc, l.addi, l.addic, l.sub
   Set SR[CY] on unsigned overflow, else clear
   Set CR[OV] on signed overflow, else clear

 l.mul, l.muli
   Clear SR[CY]
   Set SR[OV] on signed overflow, else clear

 l.mulu
   Set SR[CY] on unsigned overflow, else clear
   Clear SR[OV]
   * This differs from current definition, which sets SR[OV] instead of SR[CY] 

 l.div
   Clear SR[CY]
   Set SR[OV] on divide-by-zero, else clear

 l.divu
   Set SR[CY] on divide-by-zero, else clear
   Clear SR[OV]
   * This differs from current definition, which sets SR[OV] instead of SR[CY] 

 signed l.mac instructions
   Clear SR[CY]
   Set CR[OV] on signed overflow of add/subtract with {MACHI,MACLO}, else clear
   * This differs from current definition, which does not set either SR[OV] or SR[CY]

 (proposed) unsigned l.mac instructions
   Set SR[CY] on unsigned overflow of add/subtract with {MACHI,MACLO}, else clear
   Clear SR[OV]
  * This will apply to an entirely new set of instructions

Update: I propose that when these registers are not present (as indicated in the CPUCFGR) then the default behaviour is that any carry or overflow occurrences (as defined by the updated instruction behaviour) cause a range exception when SR[OVE] is set (just like all bits in AECR are set to ‘1’.)

Optional overflow/carry detection logic and exception trigger

I propose that the functionality of the carry and overflow detection be optional, and the capability of the CPU to perform it is indicated by a bit in the CPUCFGR.

In the case that it is not present, the SR[CY] and SR[OV] bits are tied low, and SR[OVE] cannot be set. If the AECR/AESR registers are implemented (silly if they are) then they have no effect.

Multiply and Accumulate

This is a potentially disruptive change, as it is different from the operation of the OR1200. It may be a good idea to implement it under a new UPR code or similar.

The l.mac instructions truncate the product of the two registers to 32-bits before performing a 64-bit add. This seems less useful, and unnecessary, since the simplest implementation of the l.mul instructions would use a 32x32-bit to 64-bit multiplier in order to be implemented using FPGA DSP slices, and still accurately capture the signed/unsigned overflow flags. Additionally, the MACHI register value is only useful if the programmer knows the product of the operands fits in 32 bits.

So I propose changing the l.mac instruction to perform a full 64-bit addition of the accumulator and the 64-bit product, without first truncating the product to 32 bits.

Note that if the programmer has already guaranteed that the products are smaller than 32 bits, the program will produce the same result after this ISA change. Additionally, the the MACLO value produced will be identical to before, so if the program only uses l.mac, l.msb, and l.macrc instructions, there will be no visible difference.

Additionally, it seems simple and useful to add unsigned versions of the mac instructions, and a version that does not perform an add or subtraction, and allows the full 64-bit product to be recovered from the HI/LO registers.

Finally, these changes will simplify the implementation and improve the performance of C long long and bignums in general.