Post by EricP
Post by MitchAlsup
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for
free in his My 66000. The add instruction (as well as others)
allows you to negate an operand. So standard subtract is simply
add with the second operand negated. Similarly, subtract reverse
would be add with the first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to
calculate the space remaining in a buffer. Mitch's approach of
having just the regular HW adder defined and all the argument
inversions exposed in the instruction set just feels so very right.
Terje
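Terje's buffer-remaining case can be sketched in C; the names here are illustrative, not from any real codebase:

```c
#include <stddef.h>

/* Illustrative only: "space remaining" subtracts the live value (the
 * write pointer) FROM the buffer end, i.e. end - ptr rather than
 * ptr - end -- exactly the reverse-subtract operand order. */
static size_t buf_remaining(const char *buf_end, const char *wr_ptr)
{
    return (size_t)(buf_end - wr_ptr);
}
```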
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
No, it is useless if it does something that no one wants.
Negating a constant would be one. Adding zero can be another.
I'm not suggesting My66000 does this.
Sometimes, redundant encodings (assuming they are not being used) can be
reclaimed for something else.
Post by EricP
In many load/store ISAs an immediate is the second source operand.
Subtract opcode means negate the second operand.
So Subtract Immediate is a useless 3-operand opcode.
But if we make immediates the first source operand then it works
for both ADD and SUB and we don't need 3 operand Subtract Reverse.
Now we can do SUB rd,-5,rs.
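A rough sketch of what immediate-first operands buy, assuming the semantics described (the immediate occupies the first source slot; function names are made up):

```c
/* Hypothetical semantics sketch: with the immediate first, one SUB
 * opcode covers the reverse case directly, and ADD with a negated
 * constant covers the forward case. */
static int sub_imm_first(int imm, int rs) { return imm - rs; } /* SUB rd,imm,rs */
static int add_imm_first(int imm, int rs) { return imm + rs; } /* ADD rd,imm,rs */
/* forward:  rs - 5 == add_imm_first(-5, rs)
 * reverse:  5 - rs == sub_imm_first(5, rs)   */
```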
FWIW, in my case there are no SUB with immediate cases, but instead ADD
with a one-extended immediate.
Basically works because:
z=x-3;
Can essentially be encoded equivalently as:
z=x+(-3);
Also, addition is commutative whereas subtraction is not, ...
Though, support for SUB is needed in the ALU, in the current
implementation, this is done by doing both the ADD and SUB cases in
parallel, and then picking the desired result afterwards (these results
drive ADD/SUB, CMPEQ/CMPGT/..., as well as various SIMD operations).
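The compute-both-then-select structure described above might look like this in C (a behavioral sketch, not the actual implementation):

```c
#include <stdint.h>

/* Behavioral sketch: both the ADD and SUB results are produced
 * unconditionally, then a late mux selects the one wanted. */
static uint32_t alu_addsub(uint32_t a, uint32_t b, int is_sub)
{
    uint32_t sum  = a + b;
    uint32_t diff = a - b;
    return is_sub ? diff : sum;
}
```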
Post by EricP
But consider 2 operand SUB2 rsd,rs is rsd = rsd-rs.
Now Subtract Reverse does have a use SUBR2 rsd,rs is rsd = rs-rsd.
Also reverse operands are useful for 3 operand divide immediate
as we need both rd = rs/imm and rd = imm/rs.
And, still no hardware divider here.
Current dividers I have:
Software shift-subtract loop, currently written in ASM, not super fast
but works fairly reliably.
Software divider which tries to compose (1/x) via lookup tables (or
values less than 16 via a switch table), which is currently faster but
not used automatically because it doesn't give exact results (once a
reciprocal is composed, a widening multiply is used to perform the
division).
It is also possible to cast to double, do an FP divide, then cast back.
However, this isn't really all that much faster, and doesn't seem
particularly safe either (there is a non-zero possibility that the
results will be incorrectly rounded for an integer divide).
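For reference, a minimal restoring shift-subtract divider of the kind described first (the poster's ASM version is not shown; this is a generic sketch, assuming a nonzero divisor):

```c
#include <stdint.h>

/* Generic restoring shift-subtract division: bring down one dividend
 * bit per step, trial-subtract the divisor, and set a quotient bit
 * on success. Assumes d != 0. */
static uint32_t udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem) *rem = r;
    return q;
}
```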
Post by EricP
Post by MitchAlsup
Post by EricP
For example, a 3 register ADD can swap source registers rs1 and rs2.
Or that (-rd) = (-rs1) + (-rs2) is just an ADD.
Or when rs1 = rs2, rd = 0, and (-rd) is rd = -1.
And that if one operand is an immediate there is no need
to negate at run time as the assembler/compiler does so.
At first look it seems there are only 8 variants so why bother.
But then when you consider there are 5 different ADD operations,
with non-trapping (wrapping), signed and unsigned trap or saturate,
these start to add up. Also one source can be a register or immediate,
and there are various sizes of immediates, let's say 32 or 64 bits.
So now there are 8*5*3 = 120 different kinds of ADD.
Clearly (CLEARLY) at this point the encoding space is out of hand....
This is what would happen if I tried mixing my priority features,
trapping/saturating arithmetic, with your operand negate features.
And I'm not necessarily bothered by it, rather just recognizing
where that trail leads.
Saturating arithmetic is another of those things.
As-is, it would look something like:
ADD R8, R9, R4
CMPGT 255, R4
MOV?T 255, R4
CMPGT 0, R4
MOV?F 0, R4
Explicit clamping ops could make sense (there is a clamping intrinsic,
but it basically does something similar to the above).
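In C, the sequence above amounts to a plain clamp into 0..255 (a sketch of what the compare/conditional-move pair computes):

```c
#include <stdint.h>

/* What the CMPGT / MOV?T / MOV?F sequence computes: add, then clamp
 * the result into the unsigned byte range. */
static int32_t add_clamp_u8(int32_t a, int32_t b)
{
    int32_t r = a + b;
    if (r > 255) r = 255;  /* CMPGT 255 true  -> MOV?T 255 */
    if (r < 0)   r = 0;    /* CMPGT 0   false -> MOV?F 0   */
    return r;
}
```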
With a more recent addition, it is possible to SIMD this case though.
MOV 0x0000000000000000, R6
MOV 0x00FF00FF00FF00FF, R7
PADD.W R8, R9, R4
PCMPGT.W R7, R4
PCSELT.W R7, R4, R4
PCMPGT.W R6, R4
PCSELT.W R4, R6, R4
Similar is now also possible with floating-point cases.
But, could be worse...
Post by EricP
Post by MitchAlsup
a) negation/inversion parts are common {Logical, Integer, and Floating Point}
b) That all immediates can be negated/inverted at compile time
c) Small immediates in Rs1 position are useful (1<<j); saving space
So, one has to realize that a computation has 3 phases, operand delivery,
calculation, and result delivery. Negation and Inversion can be done
during operand delivery, and Negation or Inversion can be done during
result delivery.
I'm just looking at where asymmetries enter and their effect
on possible encodings.
Post by MitchAlsup
Post by EricP
If one has a 32-bit opcode then this is not so bad.
But if one is trying to pack instructions into a 16-bit opcode,
then you DON'T do this kind of encoding. The 16-bit space is for addressing
code footprint, not a be-all encoding space.
Right, and this is where orthogonality is tossed away.
The way I approached it is to lay out all 0,1,2,3 and some 4 operand
instructions in 32 bit formats and mark the ones that can also fit
into 16 bits. A guesstimate of frequency of usage for various short
16-bit formats says which to choose.
There are also some that are specifically designed to be short,
like Add Tiny ADDTY rsd,tiny and SUBTY rsd,tiny adds or subtracts
a 4-bit tiny value in the range 1..16 (the value 0 is considered 16).
For incrementing/decrementing through arrays.
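The tiny-immediate decode rule above is simple to state in C (the field name here is made up):

```c
/* Decode the 4-bit "tiny" field: encodings 1..15 mean themselves,
 * and 0 means 16, giving the useful range 1..16. */
static int decode_tiny(unsigned enc4)
{
    enc4 &= 15u;
    return enc4 == 0 ? 16 : (int)enc4;
}
```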
Also short 2 operand ADD2 rsd,rs and ADD2 rsd,imm could be useful.
Similar.
A lot of the 16-bit ops with immediate values in my ISA also use 4-bit
immediate fields (along with 4 bit register IDs).
Not much space to afford much more.
The 1R ops generally have a full 5 bit register though.
There are a few ops with an 8 bit immediate, namely LDI and ADD.
There are two ops with a 12-bit immediate (which load an Imm12 with zero
or one extension into R0). A few of these made more sense in the early
form of the ISA, but no longer make quite as much sense. Defining the
32-bit encodings as the baseline, *1, and having a lot of 32-bit Imm9
and Imm10 encodings, significantly reduces the number of cases where
these are useful.
Or:
7zzz: Still not used
9zzz: Old FPU, no longer used.
Ajjj: Imm12, could drop eventually (loss of relevance)
Bjjj: Imm12, could drop eventually (loss of relevance)
*1: Early on, the idea was for 16-bit encodings to be the baseline, with
32-bit encodings as an extension (like in SH), but this switched around
partly when I later looked into a fixed-length subset, and realized that
making the whole ISA encodable as 32-bit ops made more sense than
shoe-horning everything into 16-bit encodings (a fixed 32-bit case would
have modestly worse code density; but a fixed 16-bit case would
take a pretty big hit in terms of performance).
Still, I budgeted most of the encoding space to 16-bit encodings, as
they need it more than the 32-bit encodings did.
Though, one could still likely get by with probably ~ 15.59 bits of
encoding space (eg: 0zzz..Bzzz as 16-bit, Czzz..Fzzz as 32-bit), or
maybe even just 15 bits (0zzz..7zzz as 16-bit, 8zzz..Fzzz as 32-bit).
Cramming everything down to 14 bits would get a bit harder though.
Post by EricP
Post by MitchAlsup
Post by EricP
along with two 4-bit register fields and some instruction length bits,
120 different kinds of ADD starts to look a little extravagant.
It is already way over the top. This might also be a good time to state
that the 16-bit encodings do not give one access to both signed and
unsigned data {or overflowing or saturating}.
I don't follow you here.
Which opcodes wind up with short 16-bit format encodings is all to do
with (a) can it possibly fit and (b) which has the highest usage.
All 5 wrap/trap/sat of the 2 operand ADD2 opcodes fit into a
16-bit format. The question is which need to be there.
Yep.
The most "well trodden ground" in my compiler output seems to be:
Memory load/store (particularly SP relative cases);
Branch ops;
ALU 2R ops (such as "ADD Rm, Rn");
Various 0R and 1R ops (such as 'RTS' and similar);
Small immed ops.
A few "massive spikes" are:
"MOV Rm, Rn"
"ADD Imm8, Rn"
"LDI Imm8, Rn"
Within 32-bit encodings, the overall distribution seems to be similar.
Post by EricP
Post by MitchAlsup
Post by EricP
And one can say all the same things for logic ops AND, OR, XOR,
and complementing the source and/or result bits.
When one looks at the integer side of computations {Logical, signed,
unsigned} one sees that an XOR gate in the operand delivery path and
carry inputs to the integer adder, provide for both negation and
inversion. This comes at no more gate-delay because one of these XORs
is already in the path to enable SUBtracts!
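The XOR-plus-carry trick is the standard two's-complement identity; a behavioral sketch:

```c
#include <stdint.h>

/* One adder serves both ADD and SUB: an XOR stage in the operand
 * path optionally inverts b, and carry-in supplies the +1, since
 * a - b == a + ~b + 1 in two's complement. */
static uint32_t addsub_shared(uint32_t a, uint32_t b, int invert_b)
{
    uint32_t b2  = invert_b ? ~b : b;   /* operand-path XOR */
    uint32_t cin = invert_b ? 1u : 0u;  /* carry-in for SUB */
    return a + b2 + cin;
}
```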
On the FP side, negate is a single XOR gate on the sign bit (often
done with a multiplexer to get ABS() and -ABS() at the same time).
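FP negate as a sign-bit XOR, done portably in C via memcpy (a sketch; real hardware just flips the wire):

```c
#include <stdint.h>
#include <string.h>

/* Negate a double by XOR-ing bit 63, the IEEE-754 sign bit; forcing
 * that bit to 0 or 1 instead gives ABS() and -ABS(). */
static double fneg_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    u ^= 0x8000000000000000ull;
    memcpy(&x, &u, sizeof x);
    return x;
}
```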
Oh I know they are inexpensive to execute.
I might toss integer and float MIN and MAX in there too
as they happen a lot and it costs just a mux.
Post by MitchAlsup
Post by EricP
Even more so when one considers effective NOPs, zero or -1 results
when both source registers are the same.
All those redundant or useless encodings start
to look ripe for clawing back and reassignment.
And the ISA starts to look a little more weathered.
One must choose carefully.
Right, I just don't need 50 different ways to zero a register
taking up precious 16-bit opcode space.
Still kind of funny that I still have some 16-bit encoding space left
over. It doesn't really seem though that there is much more which
could use a 16-bit encoding and would actually fit there.