Discussion:
Reverse subtract
Rick C. Hodgin
2020-09-17 01:34:43 UTC
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?

It would take a value in a register, subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result.

It would be used like this:

; Setup a scan operation
mov ecx,maximum_length_of_string
mov al,char_to_search_for
mov edi,offset start_of_string

; Scan forward
cld
repne scasb

; Right here, ecx has been decreased from its starting value of
; maximum_length_of_string by the number of characters
; skipped while the comparison was not equal.
; Could be zero, or could have been the starting value.
; It doesn't store a count of how many it moved, so a
; calculation to obtain that value is needed:

; Reverse subtract
rsub ecx,maximum_length_of_string
; ecx is set to (maximum_length_of_string - ecx)

It's the equivalent of:

; Traditional way
mov ebx,maximum_length_of_string
sub ebx,ecx
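In C terms, the requested operation is just subtraction with the operands swapped; a minimal sketch of the semantics (the rsub and chars_scanned names are illustrative, not from any real assembler):

```c
#include <stdint.h>

/* Sketch of the proposed "rsub dst, src": the destination is replaced
   by (src - dst), the reverse of a normal sub's (dst - src). */
static uint32_t rsub(uint32_t dst, uint32_t src)
{
    return src - dst;
}

/* The post-scan count from the example above: characters consumed is
   the starting length minus what is left in ecx. */
static uint32_t chars_scanned(uint32_t max_len, uint32_t ecx_after)
{
    return rsub(ecx_after, max_len);   /* max_len - ecx_after */
}
```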
--
Rick C. Hodgin
Quadibloc
2020-09-17 02:06:27 UTC
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
Not that I know of, but I do recall that the Texas Instruments 9900 had a "clear bits corresponding" instruction that was sort of an inverted-AND, so
such an architecture could exist out there.
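For reference, that 9900 operation is an AND-NOT; a one-line C model (the function name is mine, and if memory serves the 9900 mnemonic was SZC, "set zeros corresponding"):

```c
#include <stdint.h>

/* "Clear bits corresponding": clear in dst every bit that is set in
   src, i.e. an AND with the complement (an inverted-AND). */
static uint16_t clear_bits_corresponding(uint16_t dst, uint16_t src)
{
    return dst & (uint16_t)~src;
}
```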

John Savard
BGB
2020-09-17 03:15:33 UTC
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Could be done (in the immediate case), and wouldn't actually be all that
difficult or expensive in my case, but not sure if its use would be
common enough to make it worthwhile (or to justify its cost in terms of
encoding space; I have very few Imm9 spots left).

Skimming over some code, it seems likely to come up roughly a few
times per translation unit. Could be more relevant if it happens in a
tight loop.

Though, either way, "y=imm-x;" seems to occur somewhat less often than
"y=x-imm;".


For the register case, both the normal and reverse cases exist if using
3R ops.

For memory, it is N/A in my case because the ISA is Load/Store.
MitchAlsup
2020-09-17 15:51:46 UTC
Post by BGB
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Could be done (in the immediate case), and wouldn't actually be all that
difficult or expensive in my case, but not sure if its use would be
common enough to make it worthwhile (or to justify its cost in terms of
encoding space; I have very few Imm9 spots left).
Skimming over some code, it seems like it is likely to maybe come up
roughly a few times per translation unit. Could be more relevant if it
happens in a tight loop.
Though, either way, "y=imm-x;" seems to occur somewhat less often than
"y=x-imm;".
For the register case, both the normal and reverse cases exist if using
3R ops.
For memory, it is N/A in my case because the ISA is Load/Store.
IBM 360 has Load and Negate
LN Rd,disp(Rbase,Rindex)
Stephen Fuld
2020-09-17 16:27:53 UTC
Post by MitchAlsup
Post by BGB
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Could be done (in the immediate case), and wouldn't actually be all that
difficult or expensive in my case, but not sure if its use would be
common enough to make it worthwhile (or to justify its cost in terms of
encoding space; I have very few Imm9 spots left).
Skimming over some code, it seems like it is likely to maybe come up
roughly a few times per translation unit. Could be more relevant if it
happens in a tight loop.
Though, either way, "y=imm-x;" seems to occur somewhat less often than
"y=x-imm;".
For the register case, both the normal and reverse cases exist if using
3R ops.
For memory, it is N/A in my case because the ISA is Load/Store.
IBM 360 has Load and Negate
LN Rd,disp(Rbase,Rindex)
And the Univac 1108 has load, load negative, load magnitude (abs), and
load negative magnitude. There are also store, store negative and store
magnitude (no store negative magnitude).

I guess these are not really violations of the no load-op dictum of RISC
designs (though neither of the above designs are RISC), as the
"operation", negation and/or abs, doesn't require an additional operand
input.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
BGB
2020-09-17 19:57:05 UTC
Post by MitchAlsup
Post by BGB
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Could be done (in the immediate case), and wouldn't actually be all that
difficult or expensive in my case, but not sure if its use would be
common enough to make it worthwhile (or to justify its cost in terms of
encoding space; I have very few Imm9 spots left).
Skimming over some code, it seems like it is likely to maybe come up
roughly a few times per translation unit. Could be more relevant if it
happens in a tight loop.
Though, either way, "y=imm-x;" seems to occur somewhat less often than
"y=x-imm;".
Had noted that I do have:
ADD Rm, Imm9, Rn
Which currently doesn't use the Q bit (IIRC, it previously encoded an
"ADD with high bits undefined" op, but this went away as "ADD with sign
or zero extension" can serve the same purpose and on-average leads to
more efficient code generation anyways).

It is "possible" I could define the Q bit to decode as if it were:
SUB Imm9, Rm, Rn

But, I am left to debate whether it makes sense to do so, as looking
into it a little more, it seems to be less common than my initial
estimate (it seems there are many translation units where this case
doesn't happen at all).
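The Q-bit idea could be modeled like this (purely illustrative: the field names and layout here are this sketch's assumptions, not the actual encoding):

```c
#include <stdint.h>

/* Hypothetical decode of the Imm9 slot described above: with the Q
   bit clear it stays "ADD Rm, Imm9, Rn"; with Q set it decodes as a
   reverse subtract, "SUB Imm9, Rm, Rn". */
static int32_t imm9_op(int q_bit, int32_t rm, int32_t imm9)
{
    return q_bit ? (imm9 - rm)   /* rn = imm9 - rm (reverse subtract) */
                 : (rm + imm9);  /* rn = rm + imm9 */
}
```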
Post by MitchAlsup
Post by BGB
For the register case, both the normal and reverse cases exist if using
3R ops.
For memory, it is N/A in my case because the ISA is Load/Store.
IBM 360 has Load and Negate
LN Rd,disp(Rbase,Rindex)
Possible, but not necessarily a win on an instruction-timing front (with
two's complement).
MitchAlsup
2020-09-17 20:42:31 UTC
Post by BGB
Post by MitchAlsup
Post by BGB
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Could be done (in the immediate case), and wouldn't actually be all that
difficult or expensive in my case, but not sure if its use would be
common enough to make it worthwhile (or to justify its cost in terms of
encoding space; I have very few Imm9 spots left).
Skimming over some code, it seems like it is likely to maybe come up
roughly a few times per translation unit. Could be more relevant if it
happens in a tight loop.
Though, either way, "y=imm-x;" seems to occur somewhat less often than
"y=x-imm;".
ADD Rm, Imm9, Rn
Which currently doesn't use the Q bit (IIRC, it previously encoded an
"ADD with high bits undefined" op, but this went away as "ADD with sign
or zero extension" can serve the same purpose and on-average leads to
more efficient code generation anyways).
SUB Imm9, Rm, Rn
But, I am left to debate whether it makes sense to do so, as looking
into it a little more, it seems to be less common than my initial
estimate (it seems there are many translation units where this case
doesn't happen at all).
Post by MitchAlsup
Post by BGB
For the register case, both the normal and reverse cases exist if using
3R ops.
For memory, it is N/A in my case because the ISA is Load/Store.
IBM 360 has Load and Negate
LN Rd,disp(Rbase,Rindex)
Possible, but not necessarily a win on an instruction-timing front (with
twos complement).
Only in the case where you are trying to compress the LD pipeline into 2
cycles is this "a speed path". With the necessity of Registered SRAMs
you can't compress the LD pipeline into 2 cycles (a la the MIPS R2000) anyway.
The Load aligner is only 2 gates (and lots of wires) deep, while tag
comparison is at least 5-gates. There is time to insert the negation.

But, I have been informed that I am wrong in that the IBM 360 ISA does
not have this functionality.
BGB
2020-09-18 18:24:52 UTC
Post by MitchAlsup
Post by BGB
Post by MitchAlsup
Post by BGB
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Could be done (in the immediate case), and wouldn't actually be all that
difficult or expensive in my case, but not sure if its use would be
common enough to make it worthwhile (or to justify its cost in terms of
encoding space; I have very few Imm9 spots left).
Skimming over some code, it seems like it is likely to maybe come up
roughly a few times per translation unit. Could be more relevant if it
happens in a tight loop.
Though, either way, "y=imm-x;" seems to occur somewhat less often than
"y=x-imm;".
ADD Rm, Imm9, Rn
Which currently doesn't use the Q bit (IIRC, it previously encoded an
"ADD with high bits undefined" op, but this went away as "ADD with sign
or zero extension" can serve the same purpose and on-average leads to
more efficient code generation anyways).
SUB Imm9, Rm, Rn
But, I am left to debate whether it makes sense to do so, as looking
into it a little more, it seems to be less common than my initial
estimate (it seems there are many translation units where this case
doesn't happen at all).
Post by MitchAlsup
Post by BGB
For the register case, both the normal and reverse cases exist if using
3R ops.
For memory, it is N/A in my case because the ISA is Load/Store.
IBM 360 has Load and Negate
LN Rd,disp(Rbase,Rindex)
Possible, but not necessarily a win on an instruction-timing front (with
twos complement).
Only in the case where you are trying to compress the LD pipeline into 2
cycles is this "a speed path". With the necessity of Registered SRAMs
you can't compress the LD pipeline into 2-cycles (ala MIPS 2000) anyway.
The Load aligner is only 2 gates (and lots of wires) deep, while tag
comparison is at least 5-gates. There is time to insert the negation.
Currently:
EX1: Generate address, initiate fetch from cache.
EX2: Check for cache miss, extract value from block.
EX3: Insert value into cache line, initiate store-back (Store);
     get value back into pipeline (Load).
WB:  Value from Load gets written back to register;
     memory store gets cache line written back.

There was previously some internal forwarding in the cache to deal with
loads from a cache-line that was just written into, or a miss involving
a just stored-into block.

This has since been replaced by merely detecting this case and
initiating a stall (to give the cache time to finish up), since doing so
is cheaper and has relatively little performance impact.


While potentially the value could be negated in EX3, timing with the
register forwarding is already pretty tight (a similar reason for
ADD/SUB/... being multi-cycle operations; though ALU ops do get 1-cycle
throughput if one doesn't try to use the result immediately, similar to
the present case with Load/Store ops; one can run ALU ops in parallel,
but they still have non-zero latency).


In VLIW code, when writing ASM, it makes sense to try for an even/odd
bundle pattern regarding things like ALU ops. When results are computed,
try to avoid using them directly within the next bundle (though the
next-even or next-odd bundle is generally safe).
Post by MitchAlsup
But, I have been informed that I am wrong in that the IBM 360 ISA does
not have this functionality.
...

It also seems like one of those things which probably wouldn't come up
all that often.
MitchAlsup
2020-09-18 20:31:42 UTC
Post by BGB
Post by MitchAlsup
Post by BGB
Post by MitchAlsup
Post by BGB
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Could be done (in the immediate case), and wouldn't actually be all that
difficult or expensive in my case, but not sure if its use would be
common enough to make it worthwhile (or to justify its cost in terms of
encoding space; I have very few Imm9 spots left).
Skimming over some code, it seems like it is likely to maybe come up
roughly a few times per translation unit. Could be more relevant if it
happens in a tight loop.
Though, either way, "y=imm-x;" seems to occur somewhat less often than
"y=x-imm;".
ADD Rm, Imm9, Rn
Which currently doesn't use the Q bit (IIRC, it previously encoded an
"ADD with high bits undefined" op, but this went away as "ADD with sign
or zero extension" can serve the same purpose and on-average leads to
more efficient code generation anyways).
SUB Imm9, Rm, Rn
But, I am left to debate whether it makes sense to do so, as looking
into it a little more, it seems to be less common than my initial
estimate (it seems there are many translation units where this case
doesn't happen at all).
Post by MitchAlsup
Post by BGB
For the register case, both the normal and reverse cases exist if using
3R ops.
For memory, it is N/A in my case because the ISA is Load/Store.
IBM 360 has Load and Negate
LN Rd,disp(Rbase,Rindex)
Possible, but not necessarily a win on an instruction-timing front (with
twos complement).
Only in the case where you are trying to compress the LD pipeline into 2
cycles is this "a speed path". With the necessity of Registered SRAMs
you can't compress the LD pipeline into 2-cycles (ala MIPS 2000) anyway.
The Load aligner is only 2 gates (and lots of wires) deep, while tag
comparison is at least 5-gates. There is time to insert the negation.
EX1: Generate Address, Initiate fetch from cache
EX2: Check for cache-miss, extract value from block
EX3: Insert value to cache line, initiate store-back (Store)
Get value back into pipeline (Load).
WB: Value from Load gets written back to register.
Memory store gets cache-line written back.
There was previously some internal forwarding in the cache to deal with
loads from a cache-line that was just written into, or a miss involving
a just stored-into block.
This has since been replaced by merely detecting this case and
initiating a stall (to give the cache time to finish up), since doing so
is cheaper and has relatively little performance impact.
While potentially, it is possible that the value could be negated in
EX3, timing with the register forwarding is already pretty tight
(similar reason for ADD/SUB/... being multi-cycle operations; though ALU
ops will get 1-cycle throughput if one doesn't try to use the result
immediately, similar to the present case with Load/Store ops; one can
run ALU ops in parallel, but they still have an issue though with
non-zero latency).
Here, you are assuming one needs to "fully negate" the result, when in
fact all you need to do is bit-invert the field and remember to insert
a carry when you get to the ALU stage.

-r becomes {~r, -1}

So there is only 1 gate of delay for the negation. Now, notice that if the
consumer is the logical unit, then the operands are merely inverted, not
negated; the adder uses the carry-in while the logical unit ignores it,
and, presto, it all works with 1 gate of delay!
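The trick reads in C as the two's-complement identity a - b == a + ~b + 1: invert one operand on the way into the shared adder and supply the deferred +1 as the carry-in. A sketch (function and names are mine):

```c
#include <stdint.h>

/* One adder covers ADD, SUB, and reverse SUB: bit-invert at most one
   operand during operand delivery and feed the deferred +1 in as the
   adder's carry-in, using -v == ~v + 1 in two's complement.
   op: 0 = a+b, 1 = a-b (invert b), 2 = b-a (invert a). */
static uint32_t alu(uint32_t a, uint32_t b, int op)
{
    uint32_t x   = (op == 2) ? ~a : a;
    uint32_t y   = (op == 1) ? ~b : b;
    uint32_t cin = (op != 0);      /* the carry inserted at the ALU stage */
    return x + y + cin;
}
```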
Quadibloc
2020-09-22 22:10:03 UTC
Post by MitchAlsup
Post by MitchAlsup
IBM 360 has Load and Negate
LN Rd,disp(Rbase,Rindex)
But, I have been informed that I am wrong in that the IBM 360 ISA does
not have this functionality.
Actually, you're wrong but the IBM 360 ISA (or at least the 370 ISA) does have... _half_ this functionality.

There's a Load Complement instruction, but it's _only_ for floating-point numbers.

John Savard
MitchAlsup
2020-09-23 16:33:06 UTC
Post by Quadibloc
Post by MitchAlsup
Post by MitchAlsup
IBM 360 has Load and Negate
LN Rd,disp(Rbase,Rindex)
But, I have been informed that I am wrong in that the IBM 360 ISA does
not have this functionality.
Actually, you're wrong but the IBM 360 ISA (or at least the 370 ISA) does have... _half_ this functionality.
There's a Load Complement instruction, but it's _only_ for floating-point numbers.
There is LNR (Load Negative Register), but not from memory, which is
what my original statement purported.
John Levine
2020-09-23 18:31:19 UTC
Post by MitchAlsup
Post by Quadibloc
Actually, you're wrong but the IBM 360 ISA (or at least the 370 ISA) does have... _half_ this functionality.
There's a Load Complement instruction, but it's _only_ for floating-point numbers.
There is LNR Load Negative Register but not from memory which is what my
original statement purported.
Sheesh, is it really that hard to go back and check?

The 360 had LCR which loads the two's complement of the register
contents, LPR which loads the absolute value, and LNR which loads the
negative of the absolute value. They all set the condition code. LCR
and LPR can overflow if the value is 0x80000000, LNR cannot.
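Those three fixed-point ops model directly in C (done in uint32_t so the 0x80000000 wraparound is well defined; the lowercase helper names are mine):

```c
#include <stdint.h>

/* C model of the S/360 register-to-register ops described above:
   LCR = two's complement, LPR = absolute value, LNR = negative of
   the absolute value.  0x80000000 negates to itself, which is the
   overflow case for LCR and LPR. */
static uint32_t lcr(uint32_t r) { return 0u - r; }
static uint32_t lpr(uint32_t r) { return (r & 0x80000000u) ? 0u - r : r; }
static uint32_t lnr(uint32_t r) { return (r & 0x80000000u) ? r : 0u - r; }
```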

There's also LTR which loads and sets the condition code and the more
common LR which does not change the condition code.

Those are all fixed point instructions. There were also floating point
LTER LTDR LCER LCDR LNER LNDR LPER LPDR.

I never recall seeing an LNR instruction. I wonder if they had a use case in
mind or it was just there for symmetry with LPR.
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
MitchAlsup
2020-09-23 18:40:46 UTC
Post by John Levine
Post by MitchAlsup
Post by Quadibloc
Actually, you're wrong but the IBM 360 ISA (or at least the 370 ISA) does have... _half_ this functionality.
There's a Load Complement instruction, but it's _only_ for floating-point numbers.
There is LNR Load Negative Register but not from memory which is what my
original statement purported.
Sheesh, is it really that hard to go back and check?
I did go out and read the original 1963 printing of Principles of Operation.
Post by John Levine
The 360 had LCR which loads the two's complement of the register
register to register
Post by John Levine
contents, LPR which loads the absolute value, and LNR which loads the
both register to register
Post by John Levine
negative of the absolute value. They all set the condition code. LCR
and LPR can overflow if the value is 0x80000000, LNR cannot.
My first comment was attached to the notion one could READ MEMORY and
obtain the inversion/negative value in the destination register.

The memory reference version of this instruction DOES NOT EXIST in 360 ISA,
while the register to register versions do.
Post by John Levine
There's also LTR which loads and sets the condition code and the more
common LR which does not change the condition code.
Those are all fixed point instructions. There were also floating point
LTER LTDR LCER LCDR LNER LNDR LPER LPDR.
I never recall seeing an LNR instruction. I wonder if they had a use case in
mind or it was just there for symmetry with LPR.
Stephen Fuld
2020-09-17 06:00:24 UTC
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his My 66000. The add instruction (as well as others) allows you to
negate an operand. So standard subtract is simply add with the second
operand negated. Similarly, subtract reverse would be add with the
first value negated.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
Terje Mathisen
2020-09-17 06:17:24 UTC
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his My 66000.  The add instruction (as well as others) allows you to
negate an operand.  So standard subtract is simply add with the second
operand negated.  Similarly, subtract reverse would be add with the
first value negated.
Exactly right.

I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer. Mitch's approach of having just the
regular HW adder defined, and all the argument inversions exposed in the
instruction set, just feels so very right.
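The buffer case is exactly the y = limit - x shape; a trivial C sketch (the struct and field names are illustrative):

```c
#include <stddef.h>

/* Space remaining in a buffer: the running count is subtracted *from*
   a fixed capacity, i.e. the reverse-subtract operand order. */
struct buf { size_t capacity; size_t used; };

static size_t space_remaining(const struct buf *b)
{
    return b->capacity - b->used;
}
```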

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
EricP
2020-09-17 15:18:57 UTC
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his My 66000. The add instruction (as well as others) allows you to
negate an operand. So standard subtract is simply add with the second
operand negated. Similarly, subtract reverse would be add with the
first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer. Mitch's approach of having just the
regular HW adder defined, and all the argument inversions exposed in the
instruction set, just feels so very right.
Terje
An opcode could have Negate flags for both source operands and the
result operand, but this has some redundancy that uses opcode space
for the sake of orthogonality.

The question is does one eliminate redundant or useless encodings?
For example, a 3-register ADD can swap source registers rs1 and rs2.
Or that (-rd) = (-rs1) + (-rs2) is just an ADD.
Or when rs1 = rs2, rd = 0, and (-rd) is rd = -1.
And that if one operand is an immediate there is no need
to negate at run time as the assembler/compiler does so.

At first look it seems there are only 8 variants so why bother.
But then when you consider there are 5 different ADD operations,
with non-trapping (wrapping), signed and unsigned trap or saturate,
these start to add up. Also one source can be a register or immediate,
and there are various sizes of immediates, let's say 32 or 64 bits.

So now there are 8*5*3 = 120 different kinds of ADD.
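The multiplication checks out: 8 negate-flag combinations, times 5 ADD flavours, times 3 operand forms (register, 32-bit immediate, 64-bit immediate):

```c
/* The encoding count from the text: 2^3 negate-flag variants (rs1,
   rs2, rd), times 5 ADD flavours (wrapping, signed/unsigned trapping,
   signed/unsigned saturating), times 3 operand forms. */
enum {
    NEGATE_VARIANTS = 8,
    ADD_FLAVOURS    = 5,
    OPERAND_FORMS   = 3,
    ADD_ENCODINGS   = NEGATE_VARIANTS * ADD_FLAVOURS * OPERAND_FORMS
};
```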

If one has a 32-bit opcode then this is not so bad.
But if one is trying to pack instructions into a 16-bit opcode,
along with two 4-bit register fields and some instruction length bits,
120 different kinds of ADD starts to look a little extravagant.

And one can say all the same things for logic ops AND, OR, XOR,
and complementing the source and/or result bits.
Even more so when one considers effective NOP's, zero or -1 results
when both source registers are the same.

All those redundant or useless encodings start
to look ripe for clawing back and reassignment.
And the ISA starts to look a little more weathered.
MitchAlsup
2020-09-17 16:08:16 UTC
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his MY 6600. The add instruction (as well as others) allows you to
negate an operand. So standard subtract is simply add with the second
operand negated. Similarly, subtract reverse would be add with the
first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer, Mitch's approach of a having just the
regular HW adder defined and all the argument inversions exposed in the
instruction set just feels so very right.
Terje
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
Post by EricP
For example, for a 3 register ADD can swap source registers rs1 and rs2.
Or that (-rd) = (-rs1) + (-rs2) is just an ADD.
Or when rs1 = rs2, rd = 0, and (-rd) is rd = -1.
And that if one operand is an immediate there is no need
to negate at run time as the assembler/compiler does so.
At first look it seems there are only 8 variants so why bother.
But then when you consider there are 5 different ADD operations,
with non-trapping (wrapping), signed and unsigned trap or saturate,
these start to add up. Also one source can be a register or immediate,
and there are various sizes of immediates, let's say 32 or 64 bits.
So now there are 8*5*3 = 120 different kinds of ADD.
Clearly (CLEARLY) at this point the encoding space is out of hand....

When I faced this problem, I soon realized that:
a) negation/inversion parts are common {Logical, Integer, and Floating Point}
b) all immediates can be negated/inverted at compile time
c) small immediates in the Rs1 position are useful (1<<j), saving space

So, one has to realize that a computation has 3 phases, operand delivery,
calculation, and result delivery. Negation and Inversion can be done
during operand delivery, and Negation or Inversion can be done during
result delivery.
Post by EricP
If one has a 32-bit opcode then this is not so bad.
But if one is trying to pack instructions into a 16-bit opcode,
Then you DON'T do this kind of encoding. The 16-bit space is for addressing
code footprint, not a be-all encoding space.
Post by EricP
along with two 4-bit register fields and some instruction length bits,
120 different kinds of ADD starts to look a little extravagant.
It is already way over the top. This might also be a good time to state
that the 16-bit encodings do not give one access to both signed and
unsigned data {or overflowing or saturating}.
Post by EricP
And one can say all the same things for logic ops AND, OR, XOR,
and complementing the source and/or result bits.
When one looks at the integer side of computations {Logical, signed,
unsigned} one sees that an XOR gate in the operand delivery path and
carry inputs to the integer adder, provide for both negation and
inversion. This comes at no more gate-delay because one of these XORs
is already in the path to enable SUBtracts!

On the FP side, negate is a single XOR gate on the sign bit (often
done with a multiplexer to get ABS() and -ABS() at the same time.
Post by EricP
Even more so when one considers effective NOP's, zero or -1 results
when both source registers are the same.
All those redundant or useless encodings start
to look ripe for clawing back and reassignment.
And the ISA starts to look a little more weathered.
One must choose carefully.
Jonathan Brandmeyer
2020-09-17 16:22:44 UTC
Permalink
Post by MitchAlsup
Post by EricP
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
a) negation/inversion parts are common {Logical, Integer, and Floating Point}
b) That all immediates can be negated/inverted at compiler time
c) Small immediates in Rs1 position are useful (1<<j); saving space
So, one has to realize that a computation has 3 phases, operand delivery,
calculation, and result delivery. Negation and Inversion can be done
during operand delivery, and Negation or Inversion can be done during
result delivery.
Stupid question: When do you need/want both operand and result neg/inv support? There are distributive properties for both the logical and arithmetic cases, after all.
MitchAlsup
2020-09-17 17:48:58 UTC
Permalink
Post by Jonathan Brandmeyer
Post by MitchAlsup
Post by EricP
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
a) negation/inversion parts are common {Logical, Integer, and Floating Point}
b) That all immediates can be negated/inverted at compiler time
c) Small immediates in Rs1 position are useful (1<<j); saving space
So, one has to realize that a computation has 3 phases, operand delivery,
calculation, and result delivery. Negation and Inversion can be done
during operand delivery, and Negation or Inversion can be done during
result delivery.
Stupid question: When do you need/want both operand and result neg/inv support? There are distributive properties for both the logical and arithmetic cases, after all.
2-operand and 3-operand has only negation on the operand side, 1-operand
have negation on both sides::

For example:: {Remembering My 66000 has transcendental instructions}

-EXP( -x )
-POP( ~x )
-FF1( ~x )
MitchAlsup
2020-09-17 17:52:58 UTC
Permalink
Post by MitchAlsup
Post by Jonathan Brandmeyer
Post by MitchAlsup
Post by EricP
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
a) negation/inversion parts are common {Logical, Integer, and Floating Point}
b) That all immediates can be negated/inverted at compiler time
c) Small immediates in Rs1 position are useful (1<<j); saving space
So, one has to realize that a computation has 3 phases, operand delivery,
calculation, and result delivery. Negation and Inversion can be done
during operand delivery, and Negation or Inversion can be done during
result delivery.
Stupid question: When do you need/want both operand and result neg/inv support? There are distributive properties for both the logical and arithmetic cases, after all.
2-operand and 3-operand has only negation on the operand side, 1-operand
For example:: {Remembering My 66000 has transcendental instructions}
-EXP( -x )
-POP( ~x )
-FF1( ~x )
I should also add that the 1-operand encoding space is currently using only
9 of the 128 possible encodings (which could easily be increased to 128*32
= 4096 possibles.) So, one can justify more waste in this sub-space.
EricP
2020-09-17 19:36:02 UTC
Permalink
Post by MitchAlsup
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his My 66000. The add instruction (as well as others) allows you to
negate an operand. So standard subtract is simply add with the second
operand negated. Similarly, subtract reverse would be add with the
first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer; Mitch's approach of having just the
regular HW adder defined and all the argument inversions exposed in the
instruction set just feels so very right.
Terje
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
No, it is useless if it does something that no one wants.
Negating a constant would be one. Adding zero can be another.
I'm not suggesting My66000 does this.

In many load/store ISA's an immediate is the second source operand.
Subtract opcode means negate the second operand.
So Subtract Immediate is a useless 3-operand opcode.
But if we make immediates the first source operand then it works
for both ADD and SUB and we don't need 3 operand Subtract Reverse.
Now we can do SUB rd,-5,rs.

But consider 2 operand SUB2 rsd,rs is rsd = rsd-rs.
Now Subtract Reverse does have a use SUBR2 rsd,rs is rsd = rs-rsd.

Also reverse operands are useful for 3 operand divide immediate
as we need both rd = rs/imm and rd = imm/rs.
Post by MitchAlsup
Post by EricP
For example, for a 3 register ADD can swap source registers rs1 and rs2.
Or that (-rd) = (-rs1) + (-rs2) is just an ADD.
Or when rs1 = rs2, rd = 0, and (-rd) is rd = -1.
And that if one operand is an immediate there is no need
to negate at run time as the assembler/compiler does so.
At first look it seems there are only 8 variants so why bother.
But then when you consider there are 5 different ADD operations,
with non-trapping (wrapping), signed and unsigned trap or saturate,
these start to add up. Also one source can be a register or immediate,
and there are various sizes of immediates, lets say 32 or 64 bits.
So now there are 8*5*3 = 120 different kinds of ADD.
Clearly (CLEARLY) at this point the encoding space is out of hand....
This is what would happen if I tried mixing my priority features,
trapping/saturating arithmetic, with your operand negate features.

And I'm not necessarily bothered by it, rather just recognizing
where that trail leads.
Post by MitchAlsup
a) negation/inversion parts are common {Logical, Integer, and Floating Point}
b) That all immediates can be negated/inverted at compiler time
c) Small immediates in Rs1 position are useful (1<<j); saving space
So, one has to realize that a computation has 3 phases, operand delivery,
calculation, and result delivery. Negation and Inversion can be done
during operand delivery, and Negation or Inversion can be done during
result delivery.
I'm just looking at where asymmetries enter and their effect
on possible encodings.
Post by MitchAlsup
Post by EricP
If one has a 32-bit opcode then this is not so bad.
But if one is trying to pack instructions into a 16-bit opcode,
Then you DON'T do this kind of encoding. The 16-bit space is for addressing
code footprint, not a be-all encoding space.
Right, and this is where orthogonality is tossed away.

The way I approached it is to lay out all 0,1,2,3 and some 4 operand
instructions in 32 bit formats and mark the ones that can also fit
into 16 bits. A guesstimate of frequency of usage for various short
16-bit formats says which to choose.

There are also some that are specifically designed to be short,
like Add Tiny ADDTY rsd,tiny and SUBTY rsd,tiny adds or subtracts
a 4-bit tiny value in the range 1..16 (the value 0 is considered 16).
For incrementing/decrementing through arrays.
Also short 2 operand ADD2 rsd,rs and ADD2 rsd,imm could be useful.
Post by MitchAlsup
Post by EricP
along with two 4-bit register fields and some instruction length bits,
120 different kinds of ADD starts to look a little extravagant.
It is already way over the top. This might also be a good time to state
that the 16-bit encodings do not give one access to both signed and
unsigned data {or overflowing or saturating}.
I don't follow you here.
Which opcodes wind up with short 16-bit format encodings is all to do
with (a) can it possibly fit and (b) which has the highest usage.

All 5 wrap/trap/sat of the 2 operand ADD2 opcodes fit into a
16-bit format. The question is which need to be there.
Post by MitchAlsup
Post by EricP
And one can say all the same things for logic ops AND, OR, XOR,
and complementing the source and/or result bits.
When one looks at the integer side of computations {Logical, signed,
unsigned} one sees that an XOR gate in the operand delivery path and
carry inputs to the integer adder, provide for both negation and
inversion. This comes at no more gate-delay because one of these XORs
is already in the path to enable SUBtracts!
On the FP side, negate is a single XOR gate on the sign bit (often
done with a multiplexer to get ABS() and -ABS() at the same time.
Oh I know they are inexpensive to execute.
I might toss integer and float MIN and MAX in there too
as they happen a lot and it costs just a mux.
Post by MitchAlsup
Post by EricP
Even more so when one considers effective NOP's, zero or -1 results
when both source registers are the same.
All those redundant or useless encodings start
to look ripe for clawing back and reassignment.
And the ISA starts to look a little more weathered.
One must choose carefully.
Right, I just don't need 50 different ways to zero a register
taking up precious 16-bit opcode space.
MitchAlsup
2020-09-17 20:38:03 UTC
Permalink
Post by EricP
Post by MitchAlsup
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his My 66000. The add instruction (as well as others) allows you to
negate an operand. So standard subtract is simply add with the second
operand negated. Similarly, subtract reverse would be add with the
first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer; Mitch's approach of having just the
regular HW adder defined and all the argument inversions exposed in the
instruction set just feels so very right.
Terje
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
No, it is useless if it does something that no one wants.
Negating a constant would be one. Adding zero can be another.
I'm not suggesting My66000 does this.
In many load/store ISA's an immediate is the second source operand.
check
Post by EricP
Subtract opcode means negate the second operand.
check
Post by EricP
So Subtract Immediate is a useless 3-operand opcode.
I came to the same conclusion but even stronger--SUB itself is the useless
OpCode !! since a-b = a+(-b)
Post by EricP
But if we make immediates the first source operand then it works
for both ADD and SUB and we don't need 3 operand Subtract Reverse.
bingo
Post by EricP
Now we can do SUB rd,-5,rs.
Did you mean ADD rd,-5,rs ??
Post by EricP
But consider 2 operand SUB2 rsd,rs is rsd = rsd-rs.
Now Subtract Reverse does have a use SUBR2 rsd,rs is rsd = rs-rsd.
Which in the imm16 case SPARC did.
Post by EricP
Also reverse operands are useful for 3 operand divide immediate
as we need both rd = rs/imm and rd = imm/rs.
a) I still call these 2-operand (and 1-result) instructions
b) My 66000 provides these in full generality
DIV Rd,12345678901234,Rs2
Post by EricP
Post by MitchAlsup
Post by EricP
For example, for a 3 register ADD can swap source registers rs1 and rs2.
Or that (-rd) = (-rs1) + (-rs2) is just an ADD.
Or when rs1 = rs2, rd = 0, and (-rd) is rd = -1.
And that if one operand is an immediate there is no need
to negate at run time as the assembler/compiler does so.
At first look it seems there are only 8 variants so why bother.
But then when you consider there are 5 different ADD operations,
with non-trapping (wrapping), signed and unsigned trap or saturate,
these start to add up. Also one source can be a register or immediate,
and there are various sizes of immediates, lets say 32 or 64 bits.
So now there are 8*5*3 = 120 different kinds of ADD.
Clearly (CLEARLY) at this point the encoding space is out of hand....
This is what would happen if I tried mixing my priority features,
trapping/saturating arithmetic, with your operand negate features.
And I'm not necessarily bothered by it, rather just recognizing
where that trail leads.
Post by MitchAlsup
a) negation/inversion parts are common {Logical, Integer, and Floating Point}
b) That all immediates can be negated/inverted at compiler time
c) Small immediates in Rs1 position are useful (1<<j); saving space
So, one has to realize that a computation has 3 phases, operand delivery,
calculation, and result delivery. Negation and Inversion can be done
during operand delivery, and Negation or Inversion can be done during
result delivery.
I'm just looking at where asymmetries enter and their effect
on possible encodings.
Post by MitchAlsup
Post by EricP
If one has a 32-bit opcode then this is not so bad.
But if one is trying to pack instructions into a 16-bit opcode,
Then you DON'T do this kind of encoding. The 16-bit space is for addressing
code footprint, not a be-all encoding space.
Right, and this is where orthogonality is tossed away.
The way I approached it is to lay out all 0,1,2,3 and some 4 operand
instructions in 32 bit formats and mark the ones that can also fit
into 16 bits. A guesstimate of frequency of usage for various short
16-bit formats says which to choose.
I might note at this point that when the instruction set has [Base+scaled
Index+Displacement] addressing, the density of shift instructions drops into
the 2% range.
Post by EricP
There are also some that are specifically designed to be short,
like Add Tiny ADDTY rsd,tiny and SUBTY rsd,tiny adds or subtracts
a 4-bit tiny value in the range 1..16 (the value 0 is considered 16).
These are mostly (80%-ile) covered by INC and DEC, should you like.
And {1,2,4,8} covers the 95%-ile.
Post by EricP
For incrementing/decrementing through arrays.
Also short 2 operand ADD2 rsd,rs and ADD2 rsd,imm could be useful.
One of the reason My 66000 went so far in operand sign control was to
eliminate essentially ALL negate and Invert instructions (2%-level gain)
Post by EricP
Post by MitchAlsup
Post by EricP
along with two 4-bit register fields and some instruction length bits,
120 different kinds of ADD starts to look a little extravagant.
It is already way over the top. This might also be a good time to state
that the 16-bit encodings do not give one access to both signed and
unsigned data {or overflowing or saturating}.
I don't follow you here.
Which opcodes wind up with short 16-bit format encodings is all to do
with (a) can it possibly fit and (b) which has the highest usage.
Signed calculations Overflow/Underflow while Unsigned and Saturating do not.

I suggest that, when using 16-bit encodings, you do not want one variant that
raises an exception and another that does not. If you want access to "special"
arithmetic properties, use the 32-bit instruction space. Since the vast
majority of integer arithmetic will never O/U (on a 64-bit machine), eating
a large OpCode when one wants that special "feature" is the proper thing to
do.

One other note concerning Overflow on unsigned integers is how to detect
that::

ADD Rd,Rs1,-2 // underflows 0 !
while
ADD Rd,Rs1,0xFFFFFFFFFFFFFFFE // overflows U_Pos_Max !
Post by EricP
All 5 wrap/trap/sat of the 2 operand ADD2 opcodes fit into a
16-bit format. The question is which need to be there.
The ones used most often.
Post by EricP
Post by MitchAlsup
Post by EricP
And one can say all the same things for logic ops AND, OR, XOR,
and complementing the source and/or result bits.
When one looks at the integer side of computations {Logical, signed,
unsigned} one sees that an XOR gate in the operand delivery path and
carry inputs to the integer adder, provide for both negation and
inversion. This comes at no more gate-delay because one of these XORs
is already in the path to enable SUBtracts!
On the FP side, negate is a single XOR gate on the sign bit (often
done with a multiplexer to get ABS() and -ABS() at the same time.
Oh I know they are inexpensive to execute.
I might toss integer and float MIN and MAX in there too
as they happen a lot and it costs just a mux.
My 66000 has signed, unsigned, and floating point; MIN and MAX. ABS() just
happens to be:
MAX Rd,Rs1,-Rs1 // !!
saving OpCode space. While -ABS() happens to be:
MIN Rd,Rs1,-Rs1 // !!

{I am sure glad that IEEE 754-2019 got rid of the 754-2008 MIN and MAX crap}
Post by EricP
Post by MitchAlsup
Post by EricP
Even more so when one considers effective NOP's, zero or -1 results
when both source registers are the same.
All those redundant or useless encodings start
to look ripe for clawing back and reassignment.
And the ISA starts to look a little more weathered.
One must choose carefully.
Right, I just don't need 50 different ways to zero a register
taking up precious 16-bit opcode space.
Especially the precious 16-bit space.

But one thing troubles me with the 16-bit space:: In the 32-bit space,
My 66000 went out of its way to permanently reserve 6-OpCodes to prevent
jumping into random data and doing much for long. These reserved OpCodes
correspond to:
Integer{ 0..2^26-1, and -2^26..-1} and
floating point {-1/128..-32,+1/128..+32}
So jumping into most kinds of 32-bit or 64-bit data simply results in
the raising of an OPERATION exception.

Note 1: The reverse field choice in RISC-V prevents this kind of weak
protection.
Note 2: The MMU tables can also prevent this in a much stronger way.
EricP
2020-09-17 22:35:19 UTC
Permalink
Post by MitchAlsup
Post by EricP
There are also some that are specifically designed to be short,
like Add Tiny ADDTY rsd,tiny and SUBTY rsd,tiny adds or subtracts
a 4-bit tiny value in the range 1..16 (the value 0 is considered 16).
These are mostly (80%-ile) covered by INC and DEC, should you like.
And {1,2,4,8} covers the 95%-ile.
Ok, but I have a handy 4-bit register field there in the 16-bit opcode
and that saves going to a longer instruction with a 16-bit immediate.
So it seemed worth it.

Also I liked the idea ever since I saw it in the NatSemi 32016
instead of INC and DEC back days of yore.
Post by MitchAlsup
Post by EricP
For incrementing/decrementing through arrays.
Also short 2 operand ADD2 rsd,rs and ADD2 rsd,imm could be useful.
One of the reason My 66000 went so far in operand sign control was to
eliminate essentially ALL negate and Invert instructions (2%-level gain)
Post by EricP
Post by MitchAlsup
Post by EricP
along with two 4-bit register fields and some instruction length bits,
120 different kinds of ADD starts to look a little extravagant.
It is already way over the top. This might also be a good time to state
that the 16-bit encodings do not give one access to both signed and
unsigned data {or overflowing or saturating}.
I don't follow you here.
Which opcodes wind up with short 16-bit format encodings is all to do
with (a) can it possibly fit and (b) which has the highest usage.
Signed calculations Overflow/Underflow while Unsigned and Saturating do not.
I suggest that, when using 16-bit encodings, you do not want one variant that
raises an exception and another that does not. If you want access to "special"
arithmetic properties, use the 32-bit instruction space. Since the vast
majority of integer arithmetic will never O/U (on a 64-bit machine), eating
a large OpCode when one wants that special "feature" is the proper thing to
do.
One other note concerning Overflow on unsigned integers is how to detect
ADD Rd,Rs1,-2 // underflows 0 !
while
ADD Rd,Rs1,0xFFFFFFFFFFFFFFFE // overflows U_Pos_Max !
I don't make a distinction between signed overflow or underflow traps.
Its just trap-on-wrap to the same vector as bounds check CompareAndTrap.
Add Saturate Unsigned can only saturate high,
Subtract Saturate Unsigned can only saturate low.

ADD and SUB source operands are sign or zero extended to 65 bits
and produce 65 bit results of which [63:0] may be saved.
The difference is what happens with result msb [64:64] (carry/Borrow bit)
or [64:63] the extended sign bits.

ADD zero extends the sources, tosses the result msb [64:64] and wraps.

ADDTU Add Trap Unsigned zero extends sources and
traps if result msb [64:64] Carry is set.
ADDTS Add Trap Signed sign extends sources and
traps if result extended sign bits [64:63] are different (XOR).
ADDSU Add saturate Unsigned zero extends sources and
saturates high to UINT_MAX if result [64:64] Carry is set.
ADDSS Add Saturate Signed sign extends sources and
saturates high to INT_MAX if result [64:63] == 01b (wrap high)
saturates low to INT_MIN if result [64:63] == 10b (wrap low).

SUBSU Subtract Saturate Unsigned zero extends sources and
saturates low to 0 if result [64:64] Borrow is set.
The rest of SUBxx are same behavior as ADDxx.

If rather than ADD and SUB opcodes there were Negate flags
then signed trap & sat operations would work the same,
testing result extended sign bits [64:63].
But unsigned saturates would have to look at the combination
of Negate flags to decide what direction, higher or lower,
the operation was going in order to decide to saturate high or low.

Alternatively, it could zero extend unsigned values to 66 bits,
treat them as signed 66-bit values, apply any negates as though
they were 66-bit signed values, producing a 66-bit result.
Then bits [65:64] can be tested the same as signed values
to determine whether to saturate to 0 low or UINT_MAX high.
MitchAlsup
2020-09-17 23:10:13 UTC
Permalink
Post by EricP
Post by MitchAlsup
Post by EricP
There are also some that are specifically designed to be short,
like Add Tiny ADDTY rsd,tiny and SUBTY rsd,tiny adds or subtracts
a 4-bit tiny value in the range 1..16 (the value 0 is considered 16).
These are mostly (80%-ile) covered by INC and DEC, should you like.
And {1,2,4,8} covers the 95%-ile.
Ok, but I have a handy 4-bit register field there in the 16-bit opcode
and that saves going to a longer instruction with a 16-bit immediate.
So it seemed worth it.
I did the same with 5-bit immediates in place of Rs1.
Post by EricP
Also I liked the idea ever since I saw it in the NatSemi 32016
instead of INC and DEC back days of yore.
Post by MitchAlsup
Post by EricP
For incrementing/decrementing through arrays.
Also short 2 operand ADD2 rsd,rs and ADD2 rsd,imm could be useful.
One of the reason My 66000 went so far in operand sign control was to
eliminate essentially ALL negate and Invert instructions (2%-level gain)
Post by EricP
Post by MitchAlsup
Post by EricP
along with two 4-bit register fields and some instruction length bits,
120 different kinds of ADD starts to look a little extravagant.
It is already way over the top. This might also be a good time to state
that the 16-bit encodings do not give one access to both signed and
unsigned data {or overflowing or saturating}.
I don't follow you here.
Which opcodes wind up with short 16-bit format encodings is all to do
with (a) can it possibly fit and (b) which has the highest usage.
Signed calculations Overflow/Underflow while Unsigned and Saturating do not.
I suggest that, when using 16-bit encodings, you do not want one variant that
raises an exception and another that does not. If you want access to "special"
arithmetic properties, use the 32-bit instruction space. Since the vast
majority of integer arithmetic will never O/U (on a 64-bit machine), eating
a large OpCode when one wants that special "feature" is the proper thing to
do.
One other note concerning Overflow on unsigned integers is how to detect
ADD Rd,Rs1,-2 // underflows 0 !
while
ADD Rd,Rs1,0xFFFFFFFFFFFFFFFE // overflows U_Pos_Max !
I don't make a distinction between signed overflow or underflow traps.
I was more pointing this out to the others here to illustrate the
difficulty in determining whether something overflows or underflows.

If the source code was::

uint64_t d = S1 - 2; //it underflows

But if the source code was::

uint64_t d = S1 + 0xFFFFFFFFFFFFFFFE; // it overflows!

Yet in almost all architectures, the encoding for -2 is the same as the
encoding for 0xFFFFFFFFFFFFFFFE !
Post by EricP
It's just trap-on-wrap to the same vector as bounds check CompareAndTrap.
Add Saturate Unsigned can only saturate high,
Subtract Saturate Unsigned can only saturate low.
So, if ADD (above) saturated high, and now one SUBs, it becomes "natural"
again ?!? "natural" ≡ within normal integer bounds.
Post by EricP
ADD and SUB source operands are sign or zero extended to 65 bits
and produce 65 bit results of which [63:0] may be saved.
The difference is what happens with result msb [64:64] (carry/Borrow bit)
or [64:63] the extended sign bits.
This is the std way of doing the data path to avoid the boundary conditions.

The other trick here is that for each 8 bits of the operand, you use a 9-bit adder.
If you want to clip the carry insert 00
If you want to propagate the carry insert 01 or 10
if you want to insert a carry insert 11.
Post by EricP
ADD zero extends the sources, tosses the result msb [64:64] and wraps.
ADDTU Add Trap Unsigned zero extends sources and
traps if result msb [64:64] Carry is set.
ADDTS Add Trap Signed sign extends sources and
traps if result extended sign bits [64:63] are different (XOR).
ADDSU Add saturate Unsigned zero extends sources and
saturates high to UINT_MAX if result [64:64] Carry is set.
ADDSS Add Saturate Signed sign extends sources and
saturates high to INT_MAX if result [64:63] == 01b (wrap high)
saturates low to INT_MIN if result [64:63] == 10b (wrap low).
SUBSU Subtract Saturate Unsigned zero extends sources and
saturates low to 0 if result [64:64] Borrow is set.
The rest of SUBxx are same behavior as ADDxx.
If rather than ADD and SUB opcodes there were Negate flags
then signed trap & sat operations would work the same,
testing result extended sign bits [64:63].
check
Post by EricP
But unsigned saturates would have to look at the combination
of Negate flags to decide what direction, higher or lower,
the operation was going in order to decide to saturate high or low.
I think XOR gate solves this.
Post by EricP
Alternatively, it could zero extend unsigned values to 66 bits,
treat them as signed 66-bit values, apply any negates as though
they were 66-bit signed values, producing a 66-bit result.
Then bits [65:64] can be tested the same as signed values
to determine whether to saturate to 0 low or UINT_MAX high.
EricP
2020-09-17 23:44:29 UTC
Permalink
Post by EricP
Post by MitchAlsup
One other note concerning Overflow on unsigned integers is how to detect
ADD Rd,Rs1,-2 // underflows 0 !
while
ADD Rd,Rs1,0xFFFFFFFFFFFFFFFE // overflows U_Pos_Max !
I don't make a distinction between signed overflow or underflow traps.
It's just trap-on-wrap to the same vector as bounds check CompareAndTrap.
Add Saturate Unsigned can only saturate high,
Subtract Saturate Unsigned can only saturate low.
ADD and SUB source operands are sign or zero extended to 65 bits
and produce 65 bit results of which [63:0] may be saved.
The difference is what happens with result msb [64:64] (carry/Borrow bit)
or [64:63] the extended sign bits.
ADD zero extends the sources, tosses the result msb [64:64] and wraps.
ADDTU Add Trap Unsigned zero extends sources and
traps if result msb [64:64] Carry is set.
ADDTS Add Trap Signed sign extends sources and
traps if result extended sign bits [64:63] are different (XOR).
ADDSU Add saturate Unsigned zero extends sources and
saturates high to UINT_MAX if result [64:64] Carry is set.
ADDSS Add Saturate Signed sign extends sources and
saturates high to INT_MAX if result [64:63] == 01b (wrap high)
saturates low to INT_MIN if result [64:63] == 10b (wrap low).
SUBSU Subtract Saturate Unsigned zero extends sources and
saturates low to 0 if result [64:64] Borrow is set.
The rest of SUBxx are same behavior as ADDxx.
If rather than ADD and SUB opcodes there were Negate flags
then signed trap & sat operations would work the same,
testing result extended sign bits [64:63].
But unsigned saturates would have to look at the combination
of Negate flags to decide what direction, higher or lower,
the operation was going in order to decide to saturate high or low.
Alternatively, it could zero extend unsigned values to 66 bits,
treat them as signed 66-bit values, apply any negates as though
they were 66-bit signed values, producing a 66-bit result.
Then bits [65:64] can be tested the same as signed values
to determine whether to saturate to 0 low or UINT_MAX high.
This is slightly wrong for Negate flags, since each N-bit
intermediate negate operation also produces an N+1-bit result.

So it can either use 65 bit ADD and check each intermediate
result for overflow or saturate as originally described,
or for 3 negate flags and the ADD op, sign or zero extend
to 68 bits, and for unsigned check that bits [67:64] are zero,
or for signed [67:63] are all 1 or all 0.
The msb extended sign bit [67:67] tells us the wrap direction.
Ivan Godard
2020-09-20 17:47:07 UTC
Permalink
Post by MitchAlsup
Post by EricP
Post by MitchAlsup
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his My 66000. The add instruction (as well as others) allows you to
negate an operand. So standard subtract is simply add with the second
operand negated. Similarly, subtract reverse would be add with the
first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer; Mitch's approach of having just the
regular HW adder defined and all the argument inversions exposed in the
instruction set just feels so very right.
Terje
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
No, it is useless if it does something that no one wants.
Negating a constant would be one. Adding zero can be another.
I'm not suggesting My66000 does this.
In many load/store ISA's an immediate is the second source operand.
check
Post by EricP
Subtract opcode means negate the second operand.
check
Post by EricP
So Subtract Immediate is a useless 3-operand opcode.
I came to the same conclusion but even stronger--SUB itself is the useless
OpCode !! since a-b = a+(-b)
In integral math, but not in twos-comp. MININT - MININT should be zero
with no overflow exception. MININT + (-MININT) gets you an overflow on
the "-MININT", and will get you another on the add if the first is ignored.

Life is so much easier for the hardware if wrapping semantics can be
assumed.

And so much more difficult for the software.
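The MININT point can be checked directly with the GCC/Clang checked-arithmetic builtins (a sketch; __builtin_sub_overflow is compiler-specific, not ISO C, and the helper names are made up):

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a - b wraps in int arithmetic. */
static bool sub_overflows(int a, int b) {
    int r;
    return __builtin_sub_overflow(a, b, &r);
}

/* INT_MIN - INT_MIN is 0 with no overflow... */
static bool minint_sub_ok(void)  { return !sub_overflows(INT_MIN, INT_MIN); }

/* ...but the negation in INT_MIN + (-INT_MIN) already overflows,
 * since -INT_MIN is not representable in twos-complement int. */
static bool minint_neg_bad(void) { return sub_overflows(0, INT_MIN); }
```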
Terje Mathisen
2020-09-18 06:20:34 UTC
Permalink
Post by EricP
Post by MitchAlsup
On the FP side, negate is a single XOR gate on the sign bit (often
done with a multiplexer to get ABS() and -ABS() at the same time.
Oh I know they are inexpensive to execute.
I might toss integer and float MIN and MAX in there too
as they happen a lot and it costs just a mux.
Please make sure that you implement the latest (ieee754 2019) fp min/max
operations! The previous was seriously messed up when handling NaNs:

A Max() done across an array with both regular numbers and NaNs could
return almost any array element depending upon the order you did the
operations!

This already exists in the form of a SIMD fp max implemented as a
binary ladder, i.e. 0<->1 at the same time as 2<->3, with the winners
compared in the second stage: it can return either a NaN or one of the
larger (but possibly not the largest) of the normal values.
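The order-dependence is easy to demonstrate with an x86-maxps-style max, which returns its second operand whenever the compare is unordered (unlike the IEEE 754-2019 maximum operations); the helper names here are made up:

```c
#include <math.h>

/* maxps-style max: any compare involving NaN is false, so b wins. */
static double simd_style_max(double a, double b) {
    return (a > b) ? a : b;
}

/* Left-to-right reduction over an array. */
static double reduce_max(const double *v, int n) {
    double m = v[0];
    for (int i = 1; i < n; i++) m = simd_style_max(m, v[i]);
    return m;
}

/* Same three elements in two orders: one reduction loses the NaN... */
static double nan_in_middle(void) {
    double v[] = {1.0, NAN, 3.0};
    return reduce_max(v, 3);
}
/* ...and the other propagates it (checked via the NaN != NaN trick). */
static int nan_at_end_is_nan(void) {
    double v[] = {3.0, 1.0, NAN};
    double r = reduce_max(v, 3);
    return r != r;
}
```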

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
BGB
2020-09-18 17:00:54 UTC
Permalink
Post by EricP
Post by MitchAlsup
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for
free in his MY 6600.  The add instruction (as well as others)
allows you to negate an operand.  So standard subtract is simply
add with the second operand negated.  Similarly, subtract reverse
would be add with the first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to
calculate the space remaining in a buffer, Mitch's approach of a
having just the regular HW adder defined and all the argument
inversions exposed in the instruction set just feels so very right.
Terje
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
No, it is useless if it does something that no one wants.
Negating a constant would be one. Adding zero can be another.
I'm not suggesting My66000 does this.
Sometimes, redundant encodings (assuming they are not being used) can be
reclaimed for something else.
Post by EricP
In many load/store ISA's an immediate is the second source operand.
Subtract opcode means negate the second operand.
So Subtract Immediate is a useless 3-operand opcode.
But if we make immediates the first source operand then it works
for both ADD and SUB and we don't need 3 operand Subtract Reverse.
Now we can do SUB rd,-5,rs.
FWIW, in my case there are no SUB with immediate cases, but instead ADD
with a one-extended immediate.

Basically works because:
z=x-3;
Can essentially be encoded equivalently as:
z=x+(-3);
Also, addition is commutative whereas subtraction is not, ...

Though, support for SUB is needed in the ALU, in the current
implementation, this is done by doing both the ADD and SUB cases in
parallel, and then picking the desired result afterwards (these results
drive ADD/SUB, CMPEQ/CMPGT/..., as well as various SIMD operations).
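A toy C model of the shared datapath idea: one adder, with an XOR mask on the second operand and a carry-in deciding ADD versus SUB (negate = invert plus one), so both results come from the same hardware (the function name is made up):

```c
#include <stdint.h>

/* One adder serves both operations: for SUB, invert b and add 1. */
static uint32_t alu_addsub(uint32_t a, uint32_t b, int is_sub) {
    uint32_t mask = is_sub ? 0xFFFFFFFFu : 0;   /* invert b on SUB */
    return a + (b ^ mask) + (uint32_t)is_sub;   /* +1 completes negate */
}
```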
Post by EricP
But consider 2 operand SUB2 rsd,rs  is rsd = rsd-rs.
Now Subtract Reverse does have a use SUBR2 rsd,rs  is rsd = rs-rsd.
Also reverse operands are useful for 3 operand divide immediate
as we need both rd = rs/imm and rd = imm/rs.
And, still no hardware divider here.


Current dividers I have:

Software shift-subtract loop, currently written in ASM, not super fast
but works fairly reliably.

Software divider which tries to compose (1/x) via lookup tables (or, for
values less than 16, via a switch table), which is currently faster but
not used automatically because it doesn't give exact results (once a
reciprocal is composed, a widening multiply is used to perform the
division).
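The reciprocal-multiply principle can be sketched like so; the post's table-composed reciprocal is not shown, so recip32 here is a hypothetical stand-in using one 64-bit divide, with a correction step to make the result exact:

```c
#include <stdint.h>

/* Approximate reciprocal: floor(2^32 / d). */
static uint64_t recip32(uint32_t d) {
    return ((uint64_t)1 << 32) / d;
}

/* Divide via widening multiply; the truncated reciprocal underestimates
 * by less than 1, so at most one correction step is needed. */
static uint32_t div_via_recip(uint32_t n, uint32_t d) {
    uint32_t q = (uint32_t)(((uint64_t)n * recip32(d)) >> 32);
    while ((uint64_t)(q + 1) * d <= n) q++;
    return q;
}
```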

It is also possible to cast to double, do an FP divide, then cast back.
However, this isn't really all that much faster, and doesn't seem
particularly safe either (there is a non-zero possibility that the
results will be incorrectly rounded for an integer divide).
Post by EricP
Post by MitchAlsup
Post by EricP
For example, for a 3 register ADD can swap source registers rs1 and rs2.
Or that (-rd) = (-rs1) + (-rs2) is just an ADD.
Or when rs1 = rs2, rd = 0, and (-rd) is rd = -1.
And that if one operand is an immediate there is no need
to negate at run time as the assembler/compiler does so.
At first look it seems there are only 8 variants so why bother.
But then when you consider there are 5 different ADD operations,
with non-trapping (wrapping), signed and unsigned trap or saturate,
these start to add up. Also one source can be a register or immediate,
and there are various sizes of immediates, lets say 32 or 64 bits.
So now there are 8*5*3 = 120 different kinds of ADD.
Clearly (CLEARLY) at this point the encoding space is out of hand....
This is what would happen if I tried mixing my priority features,
trapping/saturating arithmetic, with your operand negate features.
And I'm not necessarily bothered by it, rather just recognizing
where that trail leads.
Saturating arithmetic is another of those things.

As-is, it would look something like:
ADD R8, R9, R4
CMPGT 255, R4
MOV?T 255, R4
CMPGT 0, R4
MOV?F 0, R4

Explicit clamping ops could make sense (there is a clamping intrinsic,
but it basically does something similar to the above).
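In C, the compare-and-conditional-move sequence above amounts to a plain add followed by a clamp to [0, 255] (the function name is made up):

```c
#include <stdint.h>

/* C equivalent of the ADD / CMPGT / MOV?T / MOV?F sequence:
 * add, then saturate the result to the byte range. */
static int32_t add_clamp_u8(int32_t a, int32_t b) {
    int32_t r = a + b;
    if (r > 255) r = 255;   /* CMPGT 255 / MOV?T 255 */
    if (r < 0)   r = 0;     /* CMPGT 0   / MOV?F 0   */
    return r;
}
```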

With a more recent addition, it is possible to SIMD this case though.
MOV 0x0000000000000000, R6
MOV 0x00FF00FF00FF00FF, R7
PADD.W R8, R9, R4
PCMPGT.W R7, R4
PCSELT.W R7, R4, R4
PCMPGT.W R6, R4
PCSELT.W R4, R6, R4

Similar is now also possible with floating-point cases.

But, could be worse...
Post by EricP
Post by MitchAlsup
a) negation/inversion parts are common {Logical, Integer, and Floating Point}
b) That all immediates can be negated/inverted at compiler time
c) Small immediates in Rs1 position are useful (1<<j); saving space
So, one has to realize that a computation has 3 phases, operand delivery,
calculation, and result delivery. Negation and Inversion can be done
during operand delivery, and Negation or Inversion can be done during
result delivery.
I'm just looking at where asymmetries enter and their effect
on possible encodings.
Post by MitchAlsup
Post by EricP
If one has a 32-bit opcode then this is not so bad.
But if one is trying to pack instructions into a 16-bit opcode,
Then you DON'T do this kind of encoding. The 16-bit space is for addressing
code footprint, not a be-all encoding space.
Right, and this is where orthogonality is tossed away.
The way I approached it is to lay out all 0,1,2,3 and some 4 operand
instructions in 32 bit formats and mark the ones that can also fit
into 16 bits. A guesstimate of frequency of usage for various short
16-bit formats says which to choose.
There are also some that are specifically designed to be short,
like Add Tiny ADDTY rsd,tiny and SUBTY rsd,tiny adds or subtracts
a 4-bit tiny value in the range 1..16 (the value 0 is considered 16).
For incrementing/decrementing through arrays.
Also short 2 operand ADD2 rsd,rs and ADD2 rsd,imm could be useful.
Similar.

A lot of the 16-bit ops with immediate values in my ISA also use 4-bit
immediate fields (along with 4 bit register IDs).

Not much space to afford much more.
The 1R ops generally have a full 5 bit register though.


There are a few ops with an 8 bit immediate, namely LDI and ADD.


There are two ops with a 12-bit immediate (which load an Imm12 with zero
or one extension into R0). A few of these made more sense in the early
form of the ISA, but no longer make quite as much sense. Defining the
32-bit encodings as the baseline, *1, and having a lot of 32-bit Imm9
and Imm10 encodings, significantly reduces the number of cases where
these are useful.

Or:
7zzz: Still not used
9zzz: Old FPU, no longer used.
Ajjj: Imm12, could drop eventually (loss of relevance)
Bjjj: Imm12, could drop eventually (loss of relevance)


*1: Early on, the idea was for 16-bit encodings to be the baseline, with
32-bit encodings as an extension (like in SH), but this switched around
partly when I later looked into a fixed-length subset, and realized that
making the whole ISA encodable as 32-bit ops made more sense than
shoe-horning everything into 16-bit encodings (a fixed 32-bit case would
have modestly worse code density; but a fixed 16-bit case would
take a pretty big hit in terms of performance).

Still, I budgeted most of the encoding space to 16-bit encodings, as
they need it more than the 32-bit encodings did.


Though, one could likely get by with only ~15.59 bits of
encoding space (e.g.: 0zzz..Bzzz as 16-bit, Czzz..Fzzz as 32-bit), or
maybe even just 15 bits (0zzz..7zzz as 16-bit, 8zzz..Fzzz as 32-bit).

Cramming everything down to 14 bits would get a bit harder though.
Post by EricP
Post by MitchAlsup
Post by EricP
along with two 4-bit register fields and some instruction length bits,
120 different kinds of ADD starts to look a little extravagant.
It is already way over the top. This might also be a good time to state
that the 16-bit encodings do not give one access to both signed and
unsigned data {or overflowing or saturating}.
I don't follow you here.
Which opcodes wind up with short 16-bit format encodings is all to do
with (a) can it possibly fit and (b) which has the highest usage.
All 5 wrap/trap/sat of the 2 operand ADD2 opcodes fit into a
16-bit format. The question is which need to be there.
Yep.


The most "well trodden ground" in my compiler output seems to be:
Memory load/store (particularly SP relative cases);
Branch ops;
ALU 2R ops (such as "ADD Rm, Rn");
Various 0R and 1R ops (such as 'RTS' and similar);
Small immed ops.

A few "massive spikes" are:
"MOV Rm, Rn"
"ADD Imm8, Rn"
"LDI Imm8, Rn"


Within 32-bit encodings, the overall distribution seems to be similar.
Post by EricP
Post by MitchAlsup
Post by EricP
And one can say all the same things for logic ops AND, OR, XOR,
and complimenting the source and/or result bits.
When one looks at the integer side of computations {Logical, signed,
unsigned} one sees that an XOR gate in the operand delivery path and
carry inputs to the integer adder, provide for both negation and
inversion. This comes at no more gate-delay because one of these XORs
is already in the path to enable SUBtracts!
On the FP side, negate is a single XOR gate on the sign bit (often
done with a multiplexer to get ABS() and -ABS() at the same time.
Oh I know they are inexpensive to execute.
I might toss integer and float MIN and MAX in there too
as they happen a lot and it costs just a mux.
Post by MitchAlsup
Post by EricP
Even more so when one considers effective NOP's, zero or -1 results
when both source registers are the same.
All those redundant or useless encodings start
to look ripe for clawing back and reassignment.
And the ISA starts to look a little more weathered.
One must choose carefully.
Right, I just don't need 50 different ways to zero a register
taking up precious 16-bit opcode space.
Still kind of funny that I have some 16-bit encoding space left
over. There doesn't seem to be much left, though, that could both use a
16-bit encoding and actually fit there.
Terje Mathisen
2020-09-18 20:53:25 UTC
Permalink
Post by BGB
Though, support for SUB is needed in the ALU, in the current
implementation, this is done by doing both the ADD and SUB cases in
parallel, and then picking the desired result afterwards (these results
drive ADD/SUB, CMPEQ/CMPGT/..., as well as various SIMD operations).
This is the same approach I came up with for emulated FADD/FSUB on the
Mill with its zero-cycle pick(): Calculate both and use the xor of the
signs to select the proper value to propagate.
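That trick can be modeled in scalar C (a sketch, not Mill code; fadd_pick is a hypothetical name): compute both the magnitude sum and the magnitude difference, and let the XOR of the operand sign bits pick which one to propagate.

```c
#include <math.h>

/* Toy model of FADD via "compute both, pick by xor of signs". */
static double fadd_pick(double a, double b) {
    double ma = fabs(a), mb = fabs(b);
    double sum  = ma + mb;                         /* signs agree  */
    double diff = (ma >= mb) ? ma - mb : mb - ma;  /* signs differ */
    int sa = signbit(a) != 0, sb = signbit(b) != 0;
    double mag = (sa ^ sb) ? diff : sum;           /* xor selects  */
    /* The result carries the sign of the larger-magnitude operand. */
    int neg = (ma >= mb) ? sa : sb;
    return neg ? -mag : mag;
}
```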

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Stephen Fuld
2020-09-23 17:13:33 UTC
Permalink
Post by MitchAlsup
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his MY 6600. The add instruction (as well as others) allows you to
negate an operand. So standard subtract is simply add with the second
operand negated. Similarly, subtract reverse would be add with the
first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer, Mitch's approach of a having just the
regular HW adder defined and all the argument inversions exposed in the
instruction set just feels so very right.
Terje
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
And for the logical operations, you do get some potentially useful
variants. I think, negating both inputs to an OR instruction gives you
a NAND output and negating both inputs to an AND instruction gives you a
NOR, both of which would require additional instruction(s) on
architectures that don't support this feature. However, I believe
negating both operands on an XOR instruction is truly redundant with
negating neither. Would it make sense to "special case" this to give
XNOR? Probably not.
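These claims are just De Morgan's laws on 32-bit values; a quick check (the helper names are made up):

```c
#include <stdint.h>

/* OR with both inputs negated is NAND: ~a | ~b == ~(a & b). */
static uint32_t nand32(uint32_t a, uint32_t b) { return ~a | ~b; }

/* AND with both inputs negated is NOR: ~a & ~b == ~(a | b). */
static uint32_t nor32(uint32_t a, uint32_t b)  { return ~a & ~b; }
```

And XOR with both inputs negated collapses back to XOR, since the two inversions cancel.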
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
Stephen Fuld
2020-09-23 18:01:32 UTC
Permalink
Post by Stephen Fuld
Post by MitchAlsup
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his MY 6600.  The add instruction (as well as others) allows you to
negate an operand.  So standard subtract is simply add with the second
operand negated.  Similarly, subtract reverse would be add with the
first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer, Mitch's approach of a having just the
regular HW adder defined and all the argument inversions exposed in the
instruction set just feels so very right.
Terje
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
And for the logical operations, you do get some potentially useful
variants.  I think, negating both inputs to an OR instruction gives you
a NAND output and negating both inputs to an AND instruction gives you a
NOR, both of which would require additional instruction(s) on
architectures that don't support this feature.  However, I believe
negating both operands on an XOR instruction is truly redundant with
negating neither.  Would it make sense to "special case" this to give
XNOR?  Probably not.
Sorry. Of course you can get XNOR by negating either one, but not both,
of the inputs. That makes two redundant forms: one of the two forms with
a single operand negated, and the form with both negated.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
MitchAlsup
2020-09-23 18:36:17 UTC
Permalink
Post by Stephen Fuld
Post by MitchAlsup
Post by EricP
Post by Terje Mathisen
Post by Stephen Fuld
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
It would take a value in a register, and subtract that value from
the next operand (an immediate, memory value, or register), and
replace the value in the register with the result?
Of course, Mitch is the expert, but I think this is available for free
in his MY 6600. The add instruction (as well as others) allows you to
negate an operand. So standard subtract is simply add with the second
operand negated. Similarly, subtract reverse would be add with the
first value negated.
Exactly right.
I most often feel the need for reverse subtract when I need to calculate
the space remaining in a buffer, Mitch's approach of a having just the
regular HW adder defined and all the argument inversions exposed in the
instruction set just feels so very right.
Terje
An opcode could have Negate flags for both source operands
and the result operand but this has some redundancy
that uses opcode space for the sake of orthogonality.
The question is does one eliminate redundant or useless encodings?
Done properly there are no redundant encodings.
Is an encoding useless if it is only used rarely?
And for the logical operations, you do get some potentially useful
variants. I think, negating both inputs to an OR instruction gives you
a NAND output and negating both inputs to an AND instruction gives you a
NOR, both of which would require additional instruction(s) on
architectures that don't support this feature.
yes,
Post by Stephen Fuld
However, I believe
negating both operands on an XOR instruction is truly redundant with
negating neither. Would it make sense to "special case" this to give
XNOR? Probably not.
Inverting both operands to an XOR leaves it as an XOR, but it costs no
(zero, nada) gates to do this as it happens "on the data path" and not
"in the calculation".

When performing logic, XOR and XNOR are seldom used (except in multiplier
trees.) One should be able to assume this would hold for HLLs, too.

But note:: When one needs an XNOR, if the compiler can remember that the
value out of the XOR needs to be inverted, it may be inverted when used
as a source. So, in practice, XNOR would add little to the efficiency of
the ISA.
Post by Stephen Fuld
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
Anton Ertl
2020-09-24 08:10:06 UTC
Permalink
Post by MitchAlsup
When performing logic, XOR and XNOR are seldom used (except in multiplier
trees.) One should be able to assume this would hold for HLLs, too.
Yes, you do not see (a<b)==(c<d) often, nor (a<b)!=(c<d). For bitwise
operations, ~(_^_) and (_^_) may be relatively more frequent, but
bitwise operations are not that frequent overall.

In the Gforth image we have the following number of occurrences:

298 and (used for conditionals and bitwise)
137 or (used for conditionals and bitwise)
38 xor (used for conditionals and bitwise)
7 invert (bitwise)
178 0= (used for arbitrary cells and to invert conditionals)
Post by MitchAlsup
But note:: When one needs an XNOR, if the compiler can remember that the
value out of the XOR needs to be inverted, it may be inverted when used
as a source. So, in practice, XNOR would add little to the efficiency of
the ISA.
Thanks to De Morgan's laws and the law (~a)^b=a^(~b)=~(a^b), you can
push the logical inversions around for all bitwise operations to where
it is convenient (or where they cancel each other out). If the
bitwise operations form a tree, you can use tree parsing (a technique
used in instruction selection) to push them to optimal places in
linear time.

So out of the various combinations of inversion and other operations,
you can leave quite many away without increasing the number of
instructions executed for most code. I found Alpha's approach in this
area quite elegant: for each of AND, OR, XOR, it also has a version
that inverts the right input: ANDNOT (aka BIC), ORNOT, and XORNOT (aka
EQV). The right input may be a (IIRC zero-extended) 16-bit immediate
value or a register, so allowing to invert the right register adds
flexibility.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-***@googlegroups.com>
BGB
2020-09-24 17:13:03 UTC
Permalink
Post by Anton Ertl
Post by MitchAlsup
When performing logic, XOR and XNOR are seldom used (except in multiplier
trees.) One should be able to assume this would hold for HLLs, too.
Yes, you do not see (a<b)==(c<d) often, nor (a<b)!=(c<d). For bitwise
operations, ~(_^_) and (_^_) may be relatively more frequent, but
bitwise operations are not that frequent overall.
298 and (used for conditionals and bitwise)
137 or (used for conditionals and bitwise)
38 xor (used for conditionals and bitwise)
7 invert (bitwise)
178 0= (used for arbitrary cells and to invert conditionals)
I have seen similar patterns...

Quick ranking of the ALU ops from something I am looking at.

RegRegReg (3R):
ADDx.L 1518 (ADD with Sign/Zero extend)
SUBx.L 1051
ADD 795
AND 368
MUL 237
SUB 144
SHAD 130 (Arithmetic Shift)
OR 89
XOR 32
SHLD 17 (Logical Shift)
RegImmReg:
ADD 3805
ADDx.L 3756
SHAD 2802
MUL 1050
AND 662
OR 304
SHLD 284
XOR 26

What this seems to imply is that operations involving an immediate are a
lot more common than ones involving solely registers (and, by extension
in this case, that the vast majority of immediate values also fit into 9
bits).

However, a few other common ops:
MOV(2R) 23903
LDI(I,R) 13402
ADD(I,R) 4274

Granted, there should probably be somewhat fewer RegReg MOV ops, but alas.
For the most part, register MOV ops represent an inefficiency in the
code generation, but could be worse (at least they are not all memory
loads/stores...).
Post by Anton Ertl
Post by MitchAlsup
But note:: When one needs an XNOR, if the compiler can remember that the
value out of the XOR needs to be inverted, it may be inverted when used
as a source. So, in practice, XNOR would add little to the efficiency of
the ISA.
Thanks to De Morgan's laws and the law (~a)^b=a^(~b)=~(a^b), you can
push the logical inversions around for all bitwise operations to where
it is convenient (or where they cancel each other out). If the
bitwise operations form a tree, you can use tree parsing (a technique
used in instruction selection) to push them to optimal places in
linear time.
So out of the various combinations of inversion and other operations,
you can leave quite many away without increasing the number of
instructions executed for most code. I found Alpha's approach in this
area quite elegant: for each of AND, OR, XOR, it also has a version
that inverts the right input: ANDNOT (aka BIC), ORNOT, and XORNOT (aka
EQV). The right input may be a (IIRC zero-extended) 16-bit immediate
value or a register, so allowing to invert the right register adds
flexibility.
Hmm...
Hadn't heard of this...


Otherwise, not much new in my case ISA wise.

Had recently been working on support in my C compiler for things like
VLA's, and then ended up working on a few parts related to my BS2
language (I partially leveraged some BS2 related mechanisms for the VLA
support in C; though they ended up being treated as a separate type with
slightly different behaviors).

Or, for context, the ability to do things like:
int n=x*13;
int arr[n];
And have arr be sized dynamically...

In this case, VLA's build on top of the "alloca()" mechanism, which in
this case exists as a wrapper on top of "malloc()" (mostly by internally
keeping track of memory allocations in a linked list).
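A minimal sketch of that scheme (the names are hypothetical, not BGB's actual runtime): each alloca-style block gets a small header and goes on a per-frame linked list, and the whole list is released when the frame exits.

```c
#include <stdlib.h>

typedef struct AllocaNode { struct AllocaNode *next; } AllocaNode;

/* Allocate n bytes on the heap, tracked on this frame's list. */
static void *frame_alloca(AllocaNode **frame, size_t n) {
    AllocaNode *nd = malloc(sizeof(AllocaNode) + n);
    if (!nd) return NULL;
    nd->next = *frame;      /* push onto the frame's list */
    *frame = nd;
    return nd + 1;          /* user memory follows the header */
}

/* Free every block on frame exit. */
static void frame_release(AllocaNode **frame) {
    while (*frame) {
        AllocaNode *nd = *frame;
        *frame = nd->next;
        free(nd);
    }
}

/* Exercise the pair: two allocations, both usable, all freed. */
static int frame_demo(void) {
    AllocaNode *frame = NULL;
    char *p = frame_alloca(&frame, 16);
    char *q = frame_alloca(&frame, 32);
    int ok = (p != NULL) && (q != NULL) && (p != q);
    if (ok) { p[0] = 'x'; q[31] = 'y'; }
    frame_release(&frame);
    return ok && frame == NULL;
}
```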


They will kinda suck though, but VLAs are at best sort of a rarely used
edge case.

This was along with working some on _Complex and _Imaginary and similar...


On the BS2 side of things, mostly made a change that the compiler is now
partly aware of the type-tagging rules, and the initial state of the
pointer-tagging system is known (but needs to be kept in sync between
the compiler and runtime for the range of types which may be emitted at
compile time).

This allows more stuff to be initialized at compile time:
variant vj, vi=3;
string s="Foo";
int i;
...
vj=i; //convert 'int' to 'fixnum'

Can now be done without needing to use runtime calls.

Where, "variant" is a dynamically-typed value where the type of value
held may vary at run-time, requiring any value held in a variant to be
type-tagged. Generally, any other operations on these are implemented
via runtime calls (IOW: use sparingly).

The "string" type is basically a string, though also uses pointer
type-tagging. Although not exactly the same, in premise it isn't too far
off from a "const char *". In the current implementation, it comes in
ASCII (CP1252) and UTF-16 "flavors". It is now initialized as a tagged
pointer into the ".strtab" section, so not too much different from a
normal C string (apart from the type tags and similar; vs having
separate static types for "const char *" vs "const wchar_t *" and similar).

Note that the language uses a single-inheritance + interfaces model
(similar to C# and Java), which similarly type-tags the pointers. For
the most part though, the language uses a static type system (similar to
that of other C family languages).


Though, there is still a long way to go before this is likely to be
something anyone would want to use (IOW: "sufficiently not suck").

Goal is mostly for something that could be "reasonably competitive" with
C at similar tasks (nevermind the obvious code portability drawback).

Note that both C and BS2 are using the same backend in this case, so the
difference will mostly be down to the relative costs of various language
features (and it is kinda pointless if the features end up being too
expensive to be usable).

...

David Brown
2020-09-17 07:26:31 UTC
Permalink
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
ARM has the "RSB" instruction that does what you describe.
Jonathan Brandmeyer
2020-09-17 15:25:18 UTC
Permalink
Post by David Brown
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
ARM has the "RSB" instruction that does what you describe.
Similarly, POWER has subtract-from

https://devblogs.microsoft.com/oldnewthing/20180808-00/?p=99445
MitchAlsup
2020-09-17 15:46:41 UTC
Permalink
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
Forward subtract:

ADD Rd,Rs1,-Rs2

Reverse subtract:

ADD Rd,-Rs1,Rs2

Dual subtract:

ADD Rd,-Rs1,-Rs2
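In twos-complement, the reverse-subtract form is the same adder with the first input inverted and a carry-in of 1 (negate = invert + 1), so b - a == ~a + b + 1; a quick C model (the function name is made up):

```c
#include <stdint.h>

/* Models ADD Rd, -Rs1, Rs2: reverse subtract via invert-plus-carry. */
static uint32_t rsub32(uint32_t a, uint32_t b) {
    return ~a + b + 1u;   /* == b - a in 32-bit wrapping arithmetic */
}
```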
Ivan Godard
2020-09-20 17:54:38 UTC
Permalink
Post by MitchAlsup
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
ADD Rd,Rs1,-Rs2
ADD Rd,-Rs1,Rs2
ADD Rd,-Rs1,-Rs2
How do you detect overflow, for the code that cares?
MitchAlsup
2020-09-20 18:28:17 UTC
Permalink
Post by Ivan Godard
Post by MitchAlsup
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
ADD Rd,Rs1,-Rs2
ADD Rd,-Rs1,Rs2
ADD Rd,-Rs1,-Rs2
How do you detect overflow, for the code that cares?
There is an s-bit in the 2-register operand instructions which chooses
unsigned (s≡0) or signed with OVERFLOW detect (s≡1).

Overflow is detected by wrap with consideration of the S1-bit and S2-bit.
John Dallman
2020-09-19 07:55:00 UTC
Permalink
Post by Rick C. Hodgin
Is there a mnemonic and syntax in any ISA that performs a reverse
subtract?
x86 has this for floating-point. It started with the x87 floating-point
registers, where it can be quite useful in avoiding the need to swap
stack members; there's also reverse divide, for similar reasons.

John