At present, the SPRs are defined to be 32-bits wide, which restricts the width of the MACHI:MACLO accumulator register. This is emphasized by the definition of the l.macrc instruction. Further, user-space access to MACHI and MACLO SPRs may be disabled by the kernel via SR[SUMRA]. Better to introduce another method of performing double-word arithmetic and accumulation that does not depend on SPRs at all.

For the purposes of discussion, name this set of instructions “ACC” (Alternate aCCumulate).

31    26   21   16   11    6     5     4     0
[ 0x14 | rD | rA | rB | rC | SOV | SCY | opc ]

Mnemonic    Opc   Setup
l.aadd      0000  a=D:A, m=0, c=0, s=0
l.asub      0001  a=D:A, m=0, c=1, s=1
l.aadc      0010  a=D:A, m=0, c=SR[CY], s=0
l.asbb      0011  a=D:A, m=0, c=~SR[CY], s=1
l.amul      0100  a=0, m=1, e=1, c=0, s=0
l.amulu     0101  a=0, m=1, e=0, c=0, s=0
l.amac      0110  a=D:A, m=1, e=1, c=0, s=0
l.amacu     0111  a=D:A, m=1, e=0, c=0, s=0
l.amsb      1000  a=D:A, m=1, e=1, c=1, s=1
l.amsbu     1001  a=D:A, m=1, e=0, c=1, s=1

if (m)
  x = (e ? exts(rB) : extz(rB))
  y = (e ? exts(rC) : extz(rC))
  p = x * y
  p = rB:rC

rD:rA <- a + (s ? ~p : p) + c

if (~SCY)
  SR[CY] <- (s ? ~carry-out : carry-out) (unsigned overflow)
if (~SOV)
  SR[OV] <- signed overflow

The circuits needed to implement these insns are virtually identical to that needed to implement the MAC instructions. Only the sources and destinations differ.

This uses pairs of registers to perform the same double-word arithmetic that’s currently possible with the MAC instructions, but leaves the result in general registers. This makes it simple to organize coding of general multi-word arithmetic (libgmp), the performance of which is crucial for implementing public-key encryption.

Note the l.aadc and l.asbb instructions include the carry-in, and so can be strung together to produce two words of a multi-word addition each.

Note that unlike l.mac vs l.macu, these instructions produce both SR[CY] and SR[OV] from the addition stage. Whether or not the multiply stage is zero-extended or not should not imply anything about the signed-ness of the addition.

Note the SOV and SCY bits to “suppress” generation of either SR[OV] or SR[CY] from the addition when they’re not desired by the programmer. The inverted sense of these bits (set to suppress rather than set to generate) will tie in to another proposal for normal arithmetic.