Discussion:
DEC64 decimal floating point
William Edwards
2014-03-10 12:52:05 UTC
Permalink
This is coming from the Javascript big-names:

http://dec64.com/

What are people's thoughts on it?
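For anyone who doesn't want to click through: my reading of the format described
there is a 64-bit word holding a signed 56-bit decimal coefficient in the high
bits and a signed 8-bit base-10 exponent in the low byte, so that
value = coefficient * 10^exponent. A quick Python sketch of that packing (my own
illustration, not their reference code):

def dec64_pack(coefficient, exponent):
    # Both fields are two's complement; no range or NaN handling in this sketch.
    return ((coefficient & ((1 << 56) - 1)) << 8) | (exponent & 0xFF)

def dec64_unpack(word):
    coefficient = word >> 8
    if coefficient & (1 << 55):          # sign-extend the 56-bit coefficient
        coefficient -= 1 << 56
    exponent = word & 0xFF
    if exponent & 0x80:                  # sign-extend the 8-bit exponent
        exponent -= 1 << 8
    return coefficient, exponent

word = dec64_pack(314, -2)               # 3.14 as coefficient 314, exponent -2
assert dec64_unpack(word) == (314, -2)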
Nick Maclaren
2014-03-10 13:55:00 UTC
Permalink
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
It's clearly written by people who understand a tiny proportion of
the requirement, and believe that they understand all of it. There
are also some significant errors in its historical claims.

Technically, the proposal is quite reasonable for that very limited
range of requirements, and completely demented for all of the others.


Regards,
Nick Maclaren.
Quadibloc
2014-03-10 14:39:33 UTC
Permalink
Post by William Edwards
What are people's thoughts on it?
The claim that the market for decimal arithmetic was destroyed by the IEEE 754 standard is inaccurate. It was in abeyance long before then.

Essentially, while it may give fast performance for integers, it does not give fast performance for floating-point numbers. So having it as the *only* floating-point format on a computer does not make sense.

They are correct that the idea is not new. A format of this class was used with the JOSS interpreter. Using it with a JavaScript-like language would also be quite reasonable.

John Savard
Ivan Godard
2014-03-10 15:38:10 UTC
Permalink
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
Nick Maclaren
2014-03-10 16:20:00 UTC
Permalink
Post by Ivan Godard
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
Because this is simply a wheel, and not a wheel that can take multiple
forms, including the ability to extend legs to cover terrain that
wheels can't. That aspect I can't fault!

However, I accept that they should have at least taken note of the
fact that IEEE 754 decimal floating-point has received a resounding
yawn (when not actually a Bronx cheer) from most IT communities as
an indication that this might well be a solution desperately looking
for a requirement.


Regards,
Nick Maclaren.
Robert Wessel
2014-03-10 18:55:48 UTC
Permalink
Post by Nick Maclaren
Post by Ivan Godard
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
Because this is simply a wheel, and not a wheel that can take multiple
forms, including the ability to extend legs to cover terrain that
wheels can't. That aspect I can't fault!
However, I accept that they should have at least taken note of the
fact that IEEE 754 decimal floating-point has received a resounding
yawn (when not actually a Bronx cheer) from most IT communities as
an indication that this might well be a solution desperately looking
for a requirement.
I think they did ("A later revision of IEEE 754 attempted to remedy
this, but the formats it recommended were so inefficient that it has
not found much acceptance.").

As for acceptance, Fujitsu has recently announced adding it to SPARC,
so there are now three ISAs implementing it in hardware. Still, I'm
puzzled by the whole thing. Proper scaled integer arithmetic is
necessary for certain problems, particularly those involving currency,
but I'm just not sure how much decimal FP, even decimal FP tweaked to
support currency-type operations, really buys you.
Nick Maclaren
2014-03-10 19:38:07 UTC
Permalink
Post by Robert Wessel
Post by Nick Maclaren
Post by Ivan Godard
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
Because this is simply a wheel, and not a wheel that can take multiple
forms, including the ability to extend legs to cover terrain that
wheels can't. That aspect I can't fault!
However, I accept that they should have at least taken note of the
fact that IEEE 754 decimal floating-point has received a resounding
yawn (when not actually a Bronx cheer) from most IT communities as
an indication that this might well be a solution desperately looking
for a requirement.
I think they did ("A later revision of IEEE 754 attempted to remedy
this, but the formats it recommended were so inefficient that it has
not found much acceptance.").
The lack of acceptance predates even the original IEEE 754 and,
as far as I know, has nothing whatsoever to do with performance.
The facility is simply not wanted by the vast majority of IT
communities.
Post by Robert Wessel
As for acceptance, Fujitsu has recently announced adding it to SPARC,
so there are now three ISAs implementing it in hardware.
Two. Intel abandoned their plans several years ago.
Post by Robert Wessel
Still, I'm
puzzled by the whole thing. Proper scaled integer arithmetic is
necessary for certain problems, particularly those involving currency,
but I'm just not sure how much decimal FP, even decimal FP tweaked to
support currency-type operations, really buys you.
Essentially damn-all, as has been known since time immemorial.

The first lunacy is that the fixed-point overflow exception is
essentially the same as the floating-point inexact one, but the
former is a serious error and the latter not an error. So should
you trap it or ignore it? The 'solution' is for everyone using
fixed-point to use 128-bit for everything - ha, ha.

The second is that almost all requirements for decimal fixed-
point are legally constrained, and the rules for handling
multiplication, division and precision conversion are legion,
with few of them being available in IEEE 754. Using floating-
point merely introduces the well-known problems of double
rounding.
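For anyone who hasn't met double rounding: rounding to an intermediate
precision first can change the final answer. A small illustration using
Python's decimal module, purely to show the effect (not any particular
legal rule):

from decimal import Decimal, ROUND_HALF_UP

x = Decimal("0.1149999")
direct = x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)     # 0.11
step   = x.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)    # 0.115
double = step.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # 0.12
assert direct != double

Whichever answer the regulation mandates, an extra uncontrolled intermediate
rounding is exactly what you cannot afford.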

That's not all, but those are the main ones that I know of.


Regards,
Nick Maclaren.
Robert Wessel
2014-03-10 20:24:49 UTC
Permalink
Post by Nick Maclaren
Post by Robert Wessel
Post by Nick Maclaren
Post by Ivan Godard
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
Because this is simply a wheel, and not a wheel that can take multiple
forms, including the ability to extend legs to cover terrain that
wheels can't. That aspect I can't fault!
However, I accept that they should have at least taken note of the
fact that IEEE 754 decimal floating-point has received a resounding
yawn (when not actually a Bronx cheer) from most IT communities as
an indication that this might well be a solution desperately looking
for a requirement.
I think they did ("A later revision of IEEE 754 attempted to remedy
this, but the formats it recommended were so inefficient that it has
not found much acceptance.").
The lack of acceptance predates even the original IEEE 754 and,
as far as I know, has nothing whatsoever to do with performance.
The facility is simply not wanted by the vast majority of IT
communities.
Post by Robert Wessel
As for acceptance, Fujitsu has recently announced adding it to SPARC,
so there are now three ISAs implementing it in hardware.
Two. Intel abandoned their plans several years ago.
POWER, Z, and now (or soon) SPARC.
Nick Maclaren
2014-03-10 20:41:18 UTC
Permalink
Post by Robert Wessel
Post by Nick Maclaren
Post by Robert Wessel
Post by Ivan Godard
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
As for acceptance, Fujitsu has recently announced adding it to SPARC,
so there are now three ISAs implementing it in hardware.
Two. Intel abandoned their plans several years ago.
POWER, Z, and now (or soon) SPARC.
Z and POWER were more-or-less integrated a long time ago. They
are very different in some respects, but use the same component
library. And don't believe that, just because the hardware supports
it, the compilers will necessarily use it - or that, even if they
do, they will use it to replace the binary types. They won't.

Being an old cynic, I remember when people said "and now (or soon)
Intel" :-)



Regards,
Nick Maclaren.
Robert Wessel
2014-03-11 00:56:25 UTC
Permalink
Post by Nick Maclaren
Post by Robert Wessel
Post by Nick Maclaren
Post by Robert Wessel
Post by Ivan Godard
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
As for acceptance, Fujitsu has recently announced adding it to SPARC,
so there are now three ISAs implementing it in hardware.
Two. Intel abandoned their plans several years ago.
POWER, Z, and now (or soon) SPARC.
Z and POWER were more-or-less integrated a long time ago. They
are very different in some respects, but use the same component
library.
They really aren't. There was a slide of a roadmap from a number of
years back that showed those architectures being merged, but...

While there's no doubt that Z and POWER systems share some stuff, the
Z microprocessors and the POWER ones still have radically different
microarchitectures. One place they do share is the DFP unit, which
Z largely stole/borrowed from POWER.

The P and I (AS/400) systems have, OTOH, largely joined hardware.
Post by Nick Maclaren
And don't believe that, just because the hardware supports
it, the compilers will necessarily use it - or that, even if they
do, they will use it to replace the binary types. They won't.
The newest Cobol compilers on Z now will use the DFP instructions
internally. Somewhat amusingly, the Cobol compiler doesn't actually
support DFP as a type yet (although several other languages do).
Although that may say more about the performance of the traditional
packed decimal instructions than about the value of DFP.
Nick Maclaren
2014-03-11 09:44:18 UTC
Permalink
Post by Robert Wessel
Post by Nick Maclaren
And don't believe that, just because the hardware supports
it, the compilers will necessarily use it - or that, even if they
do, they will use it to replace the binary types. They won't.
The newest Cobol compilers on Z now will use the DFP instructions
internally. Somewhat amusingly, the Cobol compiler doesn't actually
support DFP as a type yet (although several other languages do).
As far as I know, that is just C and C++ (via the ghastly hack),
though possibly RPG may still be around and join them. And, in
NO case, is the format intended to replace binary (which was the
claim for that ghastly Javascript format in the document that
started this).


Regards,
Nick Maclaren.
Robert Wessel
2014-03-11 10:19:54 UTC
Permalink
Post by Nick Maclaren
Post by Robert Wessel
Post by Nick Maclaren
And don't believe that, just because the hardware supports
it, the compilers will necessarily use it - or that, even if they
do, they will use it to replace the binary types. They won't.
The newest Cobol compilers on Z now will use the DFP instructions
internally. Somewhat amusingly, the Cobol compiler doesn't actually
support DFP as a type yet (although several other languages do).
As far as I know, that is just C and C++ (via the ghastly hack),
though possibly RPG may still be around and join them. And, in
NO case, is the format intended to replace binary (which was the
claim for that ghastly Javascript format in the document that
started this).
PL/I on Z does too. I think the AIX (POWER) version of PL/I does as well.
Ivan Godard
2014-03-10 20:59:43 UTC
Permalink
Post by Nick Maclaren
Post by Robert Wessel
Post by Nick Maclaren
Post by Ivan Godard
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
Because this is simply a wheel, and not a wheel that can take multiple
forms, including the ability to extend legs to cover terrain that
wheels can't. That aspect I can't fault!
However, I accept that they should have at least taken note of the
fact that IEEE 754 decimal floating-point has received a resounding
yawn (when not actually a Bronx cheer) from most IT communities as
an indication that this might well be a solution desperately looking
for a requirement.
I think they did ("A later revision of IEEE 754 attempted to remedy
this, but the formats it recommended were so inefficient that it has
not found much acceptance.").
The lack of acceptance predates even the original IEEE 754 and,
as far as I know, has nothing whatsoever to do with performance.
The facility is simply not wanted by the vast majority of IT
communities.
Post by Robert Wessel
As for acceptance, Fujitsu has recently announced adding it to SPARC,
so there are now three ISAs implementing it in hardware.
Two. Intel abandoned their plans several years ago.
Nope. Intel did not abandon their plan for decimal; they abandoned their
plan for decimal *hardware*. They put on a massive standard-twisting
effort to get IEEE to bless a format that could be halfway performant in
software when they discovered what the cost would be for
commercial-grade hardware on the x86 frame.

Decimal is a market that Intel cares about. The problem is that it's an
isolated market, big enough to matter but not big enough that the
non-decimal users don't matter. We are in exactly the same situation,
but we have the emulation mechanism to deal with compatibility issues,
and Intel doesn't.
Ivan Godard
2014-03-10 21:13:25 UTC
Permalink
<snip>
Post by Nick Maclaren
Post by Robert Wessel
Still, I'm
puzzled by the whole thing. Proper scaled integer arithmetic is
necessary for certain problems, particularly those involving currency,
but I'm just not sure how much decimal FP, even decimal FP tweaked to
support currency-type operations, really buys you.
Essentially damn-all, as has been known since time immemorial.
The first lunacy is that the fixed-point overflow exception is
essentially the same as the floating-point inexact one, but the
former is a serious error and the latter not an error. So should
you trap it or ignore it? The 'solution' is for everyone using
fixed-point to use 128-bit for everything - ha, ha.
The second is that almost all requirements for decimal fixed-
point are legally constrained, and the rules for handling
multiplication, division and precision conversion are legion,
with few of them being available in IEEE 754. Using floating-
point merely introduces the well-known problems of double
rounding.
That's not all, but those are the main ones that I know of.
Commercial databases are a reasonably large market. We had a study that
evaluated where Oracle spent its time. Turns out that a database schema
format "numeric" means decimal and somewhat over a quarter of all
machine cycles were spent in the decimal arithmetic routines.

We had lots of input from the COBOL committee; they are intimately
familiar with the issues of legal requirements for computation. They
were desperate for a standard, and have incorporated 754 into the new
COBOL upgrade. They felt that the facilities in IEEE Decimal were
suitable for programming against all legal requirements.

COBOL is the Rodney Dangerfield of programming languages and gets no
respect. However, it is still true that a huge number of cycles are
executed by COBOL programs, and, within its application domain, COBOL is
quite suitable and preferable to other popular languages. Try writing -
and maintaining - the equivalent of MOVE CORRESPONDING in C and you will
see what I mean.
Nick Maclaren
2014-03-10 22:00:37 UTC
Permalink
Post by Ivan Godard
Don't be misled by the name: IEEE Decimal is really scaled integer, with
semi-dynamic scaling. You have explicit control over the scaling, so it
is static scaling, but only when you want the control, so it is also
dynamic scaling aka floating point.
It's not as simple as that. It's a horse designed by a committee.
Post by Ivan Godard
Post by Nick Maclaren
Post by Robert Wessel
As for acceptance, Fujitsu has recently announced adding it to SPARC,
so there are now three ISAs implementing it in hardware.
Two. Intel abandoned their plans several years ago.
Nope. Intel did not abandon their plan for decimal; they abandoned their
plan for decimal *hardware*. ...
That was the context, and that is what I said.
Post by Ivan Godard
Post by Nick Maclaren
The second is that almost all requirements for decimal fixed-
point are legally constrained, and the rules for handling
multiplication, division and precision conversion are legion,
with few of them being available in IEEE 754. Using floating-
point merely introduces the well-known problems of double
rounding.
Commercial databases are a reasonably large market. We had a study that
evaluated where Oracle spent its time. Turns out that a database schema
format "numeric" means decimal and somewhat over a quarter of all
machine cycles were spent in the decimal arithmetic routines.
We had lots of input from the COBOL committee; they are intimately
familiar with the issues of legal requirements for computation. They
were desperate for a standard, and have incorporated 754 into the new
COBOL upgrade. They felt that the facilities in IEEE Decimal were
suitable for programming against all legal requirements.
I am not disputing that you were told that, or even whether it is
(in at least some sense) true.

Given that virtually no HPC program spends its time in actual
computation any longer, and commercial and database codes always
tended to be I/O bound rather than CPU-bound, I have deep
suspicions of this claim. Without seeing the evidence, I have
no idea whether the comparisons are against a competent approach
or an emulation of IBM System/360 packed decimal. Or whether the
time measured is that spent in the CPU core, and ignores that spent
in the memory subsystem. Or what.
Post by Ivan Godard
COBOL is the Rodney Dangerfield of programming languages and gets no
respect. However, it is still true that a huge number of cycles are
executed by COBOL programs, and, within its application domain, COBOL is
quite suitable and preferable to other popular languages. Try writing -
and maintaining - the equivalent of MOVE CORRESPONDING in C and you will
see what I mean.
All of that is true, but irrelevant. What I said in the paragraph
earlier is the point - IEEE 754 decimal isn't actually going to help,
compared to several simpler approaches.


Regards,
Nick Maclaren.
Michael S
2014-03-10 22:15:15 UTC
Permalink
Post by Ivan Godard
<snip>
Post by Nick Maclaren
Post by Robert Wessel
Still, I'm
puzzled by the whole thing. Proper scaled integer arithmetic is
necessary for certain problems, particularly those involving currency,
but I'm just not sure how much decimal FP, even decimal FP tweaked to
support currency-type operations, really buys you.
Essentially damn-all, as has been known since time immemorial.
The first lunacy is that the fixed-point overflow exception is
essentially the same as the floating-point inexact one, but the
former is a serious error and the latter not an error. So should
you trap it or ignore it? The 'solution' is for everyone using
fixed-point to use 128-bit for everything - ha, ha.
The second is that almost all requirements for decimal fixed-
point are legally constrained, and the rules for handling
multiplication, division and precision conversion are legion,
with few of them being available in IEEE 754. Using floating-
point merely introduces the well-known problems of double
rounding.
That's not all, but those are the main ones that I know of.
Commercial databases are a reasonably large market. We had a study that
evaluated where Oracle spent its time. Turns out that a database schema
format "numeric" means decimal
What do you mean when you say that "numeric" means "decimal"?
Post by Ivan Godard
and somewhat over a quarter of all
machine cycles were spent in the decimal arithmetic routines.
On which benchmark on which machine?

IMHO, what would really be useful for that sort of application is not DFP, but improved hardware support for 128-bit integers, likely including multiplication. Maybe even division, although I am not sure that the impact of division is above the noise level.
As to BCD<->binary conversions, when competently coded on SNB and friends they are already damn fast. Hardware support can make them faster yet, but why should we try to speed up something which is not a bottleneck?

Now, my opinion on the issue is really humble; deep inside I don't care to be corrected.
I'd guess that Terje, Robert and Nick have far less humble opinions that are similar to mine.
Ivan Godard
2014-03-10 23:50:10 UTC
Permalink
Post by Michael S
Post by Ivan Godard
<snip>
Post by Nick Maclaren
Still, I'm puzzled by the whole thing. Proper scaled integer
arithmetic is necessary for certain problems, particularly
those involving currency, but I'm just not sure how much
decimal FP, even decimal FP tweaked to support currency-type
operations, really buys you.
Essentially damn-all, as has been known since time immemorial.
The first lunacy is that the fixed-point overflow exception is
essentially the same as the floating-point inexact one, but the
former is a serious error and the latter not an error. So
should you trap it or ignore it? The 'solution' is for everyone
using fixed-point to use 128-bit for everything - ha, ha.
The second is that almost all requirements for decimal fixed-
point are legally constrained, and the rules for handling
multiplication, division and precision conversion are legion,
with few of them being available in IEEE 754. Using floating-
point merely introduces the well-known problems of double
rounding.
That's not all, but those are the main ones that I know of.
Commercial databases are a reasonably large market. We had a study
that evaluated where Oracle spent its time. Turns out that a
database schema format "numeric" means decimal
What you mean when you say that "numeric" means "decimal"?
Post by Ivan Godard
and somewhat over a quarter of all machine cycles were spent in the
decimal arithmetic routines.
On which benchmark on which machine?
IMHO, what would really be useful for that sort of application is
not DFP, but improved hardware support for 128-bit integers, likely
including multiplication. Maybe even division, although I am not
sure that the impact of division is above the noise level. As to
BCD<->binary conversions, when competently coded on SNB and friends
they are already damn fast. Hardware support can make them faster yet,
but why should we try to speed up something which is not a bottleneck?
Now, my opinion on the issue is really humble; deep inside I don't
care to be corrected. I'd guess that Terje, Robert and Nick have far
less humble opinions that are similar to mine.
Quad int gives you the precision, but then you have to manage the units
explicitly yourself. (units in this context means where to put the
decimal point). 754 decimal does that for you. Given the complexity of
business calculation (HPC weather code has nothing on commercial
multi-division tax code) the simplification produces lower costs in real
money and fewer bugs in bet-your-business code.
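To make the bookkeeping difference concrete, here is a sketch in Python, with
the decimal module standing in for a 754 decimal type:

from decimal import Decimal

# Scaled integers: the value and its scale travel separately,
# and the programmer must keep them consistent by hand.
price_units, price_scale = 1999, 2        # 19.99
qty_units,   qty_scale   = 3500, 3        # 3.500
total_units  = price_units * qty_units    # 6996500
total_scale  = price_scale + qty_scale    # ...and you must remember this is 5

# A decimal FP type carries the exponent along with the value.
total = Decimal("19.99") * Decimal("3.500")   # Decimal('69.96500')

Multiply enough of those together across a multi-division tax calculation and
the hand-carried scales are where the bugs live.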

To say that commercial can be done in quad int, so there's no need for
correctly-rounded decimal, is to say that we can code in octal, so
there's no need for {your choice of language}. I'll believe it when I
hear someone who makes his living writing commercial code say so.

Done a payroll lately?
Nick Maclaren
2014-03-11 00:11:54 UTC
Permalink
Post by Ivan Godard
Quad int gives you the precision, but then you have to manage the units
explicitly yourself. (units in this context means where to put the
decimal point). 754 decimal does that for you. Given the complexity of
business calculation (HPC weather code has nothing on commercial
multi-division tax code) the simplification produces lower costs in real
money and fewer bugs in bet-your-business code.
If it delivered that, it would. Opinions on that vary.
Post by Ivan Godard
To say that commercial can be done in quad int, so there's no need for
correctly-rounded decimal, is to say that we can code in octal, so
there's no need for {your choice of language}. I'll believe it when I
hear someone who makes his living writing commercial code say so.
Nonsense. It is nothing like that. All the compiler has to do
is generate calls to internal procedures, which was established
technology by the early 1960s. Lots of systems have used scaled
integers for commercial calculations, very successfully.


Regards,
Nick Maclaren.
Robert Wessel
2014-03-11 01:05:22 UTC
Permalink
Post by Ivan Godard
Post by Michael S
Post by Ivan Godard
<snip>
Post by Nick Maclaren
Still, I'm puzzled by the whole thing. Proper scaled integer
arithmetic is necessary for certain problems, particularly
those involving currency, but I'm just not sure how much
decimal FP, even decimal FP tweaked to support currency-type
operations, really buys you.
Essentially damn-all, as has been known since time immemorial.
The first lunacy is that the fixed-point overflow exception is
essentially the same as the floating-point inexact one, but the
former is a serious error and the latter not an error. So
should you trap it or ignore it? The 'solution' is for everyone
using fixed-point to use 128-bit for everything - ha, ha.
The second is that almost all requirements for decimal fixed-
point are legally constrained, and the rules for handling
multiplication, division and precision conversion are legion,
with few of them being available in IEEE 754. Using floating-
point merely introduces the well-known problems of double
rounding.
That's not all, but those are the main ones that I know of.
Commercial databases are a reasonably large market. We had a study
that evaluated where Oracle spent its time. Turns out that a
database schema format "numeric" means decimal
What you mean when you say that "numeric" means "decimal"?
Post by Ivan Godard
and somewhat over a quarter of all machine cycles were spent in the
decimal arithmetic routines.
On which benchmark on which machine?
IMHO, what would really be useful for that sort of application is
not DFP, but improved hardware support for 128-bit integers, likely
including multiplication. Maybe even division, although I am not
sure that the impact of division is above the noise level. As to
BCD<->binary conversions, when competently coded on SNB and friends
they are already damn fast. Hardware support can make them faster yet,
but why should we try to speed up something which is not a bottleneck?
Now, my opinion on the issue is really humble; deep inside I don't
care to be corrected. I'd guess that Terje, Robert and Nick have far
less humble opinions that are similar to mine.
Quad int gives you the precision, but then you have to manage the units
explicitly yourself. (units in this context means where to put the
decimal point). 754 decimal does that for you. Given the complexity of
business calculation (HPC weather code has nothing on commercial
multi-division tax code) the simplification produces lower costs in real
money and fewer bugs in bet-your-business code.
To say that commercial can be done in quad int, so there's no need for
correctly-rounded decimal, is to say that we can code in octal, so
there's no need for {your choice of language}. I'll believe it when I
hear someone who makes his living writing commercial code say so.
Done a payroll lately?
Nobody is saying that you don't need *decimal* scaled arithmetic, but
there's no need for that to be in actual decimal. Sufficiently large
values, and in particular intermediates, are needed as well. If I
change the COMP-3s (packed decimal) in:

01 A PIC S9(9)V99.
01 B PIC S999V9999.
01 C PIC S9(9)V999.

COMPUTE C ROUNDED = A * B.

to COMPs (binary), I'll get exactly the same result.

What the ISA needs is support for fast conversion between binary and
decimal, support for long intermediates, and decently performing
scaling (multiplying/dividing by powers of 10) and rounding.
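To spell out what "the same result" looks like in binary: hold the values as
plain integers with implied decimal scales (2, 4 and 3 digits, matching the
PICs above), and let the rescale after the multiply do the decimal rounding.
A sketch in Python, assuming COBOL's default ROUNDED behaviour of rounding
half away from zero:

def scaled_mul_rounded(a, b, drop_digits):
    # a and b are binary integers carrying implied decimal scales;
    # drop `drop_digits` decimal digits from the product, rounding
    # half away from zero.
    product = a * b
    divisor = 10 ** drop_digits
    q, r = divmod(abs(product), divisor)
    if 2 * r >= divisor:
        q += 1
    return q if product >= 0 else -q

a = 250      # 2.50   (2 implied decimals)
b = 13333    # 1.3333 (4 implied decimals)
c = scaled_mul_rounded(a, b, drop_digits=3)   # 3333, i.e. 3.333 at scale 3

The arithmetic itself is binary; only the scaling and rounding need to know
about powers of ten.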
Ivan Godard
2014-03-11 01:47:28 UTC
Permalink
Post by Robert Wessel
What the ISA needs is support for fast conversion between binary and
decimal, support for long intermediates, and decently performing
scaling (multiplying/dividing by powers of 10) and rounding.
Rounding is the problem. Note that you cannot do (legally required)
correct decimal by doing the computation in binary to *any* precision
and then converting. You cannot even correctly take a decimal number,
convert it to binary, and convert it back to the same decimal number.
$0.01 does not have an exact representation in binary.
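Easy to check with Python, whose float is binary64 and whose decimal module
keeps decimal digits exactly:

from decimal import Decimal

total = 0.0
for _ in range(100):
    total += 0.01                    # one cent, a hundred times, in binary
print(total == 1.0)                  # False: total is 1.0000000000000007 or so

dtotal = sum(Decimal("0.01") for _ in range(100))
print(dtotal == Decimal("1.00"))     # True: the cents stay exact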

Yes, you can use integer and do the scaling yourself. Until you have a
divide, of course. By the time you are done with your library you will
discover that you have a nice software implementation - of 754 decimal.
Except that you have an idiosyncratic representation (like the folks
whose proposal started this thread) whose data nobody else can use.

Kind of like binary floating point before IEEE. There are good reasons
to have a standard, even one that nobody is entirely happy with. And the
people who use this stuff seem more than just happy, they seem grateful.

So if you have long intermediates and fast scaling, is there any reason
not to package that up in a standard-conforming ISA? You've already done
the work, after all :-)
Robert Wessel
2014-03-11 06:16:18 UTC
Permalink
Post by Ivan Godard
Post by Robert Wessel
What the ISA needs is support for fast conversion between binary and
decimal, support for long intermediates, and decently performing
scaling (multiplying/dividing by powers of 10) and rounding.
Rounding is the problem. Note that you cannot do (legally required)
correct decimal by doing the computation in binary to *any* precision
and then converting. You cannot even correctly take a decimal number,
convert it to binary, and convert it back to the same decimal number.
$0.01 does not have an exact representation in binary.
Yes, you can use integer and do the scaling yourself. Until you have a
divide, of course. By the time you are done with your library you will
discover that you have a nice software implementation - of 754 decimal.
Except that you have an idiosyncratic representation (like the folks
whose proposal started this thread) whose data nobody else can use.
I was specifically talking about doing scaled (binary) integer
arithmetic.
Nick Maclaren
2014-03-11 09:52:18 UTC
Permalink
Post by Ivan Godard
Post by Robert Wessel
What the ISA needs is support for fast conversion between binary and
decimal, support for long intermediates, and decently performing
scaling (multiplying/dividing by powers of 10) and rounding.
Rounding is the problem. Note that you cannot do (legally required)
correct decimal by doing the computation in binary to *any* precision
and then converting. You cannot even correctly take a decimal number,
convert it to binary, and convert it back to the same decimal number.
$0.01 does not have an exact representation in binary.
Agreed. And you cannot perform the operation in decimal floating-
point, using another rounding and precision reduction rule, and
then fix it up afterwards. Have you ever tried to do that?
I have, and I have also read published analyses.

And THAT is why IEEE 754 decimal floating-point does not help,
except for the VERY few cases where the legally required rounding
and precision reduction rules match one of its few options.
Post by Ivan Godard
Yes, you can use integer and do the scaling yourself. Until you have a
divide, of course. By the time you are done with your library you will
discover that you have a nice software implementation - of 754 decimal.
Er, no. What you will do, which IEEE 754 does not, is be able to
match your specific rounding and precision reduction rules.
Post by Ivan Godard
Kind of like binary floating point before IEEE. There are good reasons
to have a standard, even one that nobody is entirely happy with. And the
people who use this stuff seem more than just happy, they seem grateful.
Some are. Others, and usually the more clued-up, aren't. And the
most clued-up are concerned by the way that it has simplified the
task of gormless programmers to get consistently wrong answers.
Post by Ivan Godard
So if you have long intermediates and fast scaling, is there any reason
not to package that up in a standard-conforming ISA? You've already done
the work, after all :-)
Yes. Rounding and precision reduction rules :-(


Regards,
Nick Maclaren.
Terje Mathisen
2014-03-11 10:33:05 UTC
Permalink
Post by Nick Maclaren
Post by Ivan Godard
So if you have long intermediates and fast scaling, is there any reason
not to package that up in a standard-conforming ISA? You've already done
the work, after all :-)
Yes. Rounding and precision reduction rules :-(
As Ivan has indicated we are working on a set of helper ops which will
allow sw emulation of hw FP with reasonable efficiency (given reasonable
integer mul/mulh operations).

It is quite obvious that in this setup rounding will be a separate
operation, which would tend to make it much easier to replace just this
part of the algorithm.

I believe you have argued for exactly this kind of setup?

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Nick Maclaren
2014-03-11 12:04:17 UTC
Permalink
Post by Terje Mathisen
Post by Nick Maclaren
Post by Ivan Godard
So if you have long intermediates and fast scaling, is there any reason
not to package that up in a standard-conforming ISA? You've already done
the work, after all :-)
Yes. Rounding and precision reduction rules :-(
As Ivan has indicated we are working on a set of helper ops which will
allow sw emulation of hw FP with reasonable efficiency (given reasonable
integer mul/mulh operations).
It is quite obvious that in this setup rounding will be a separate
operation, which would tend to make it much easier to replace just this
part of the algorithm.
I believe you have argued for exactly this kind of setup?
Yes, but NOT as a way of fixing up the emulation of fixed-point.

It's feasible for multiplication, at increased cost, by doing
all multiplications in full precision (i.e. doubling the width),
but isn't for division.

Inter alia, consider a rule where division is required to round
in some fashion, and the remainder is accumulated separately.
That cannot be fixed up by a rounding add-on, no matter what
you do.
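A made-up but representative example of that kind of rule, in Python with
integer cents: split a total into n installments, truncate each, and settle
the accumulated remainder with the last one. The quotient and remainder are
needed as a pair; no per-operation rounding mode expresses it.

def installments(total_cents, n):
    q, r = divmod(total_cents, n)    # truncate each installment
    payments = [q] * n
    payments[-1] += r                # the accumulated remainder goes here
    return payments

print(installments(10000, 3))        # [3333, 3333, 3334] - nothing lost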


Regards,
Nick Maclaren.
Terje Mathisen
2014-03-11 13:02:37 UTC
Permalink
Post by Nick Maclaren
Post by Terje Mathisen
Post by Nick Maclaren
Post by Ivan Godard
So if you have long intermediates and fast scaling, is there any reason
not to package that up in a standard-conforming ISA? You've already done
the work, after all :-)
Yes. Rounding and precision reduction rules :-(
As Ivan has indicated we are working on a set of helper ops which will
allow sw emulation of hw FP with reasonable efficiency (given reasonable
integer mul/mulh operations).
It is quite obvious that in this setup rounding will be a separate
operation, which would tend to make it much easier to replace just this
part of the algorithm.
I believe you have argued for exactly this kind of setup?
Yes, but NOT as a way of fixing up the emulation of fixed-point.
It's feasible for multiplication, at increased cost, by doing
all multiplications in full precision (i.e. doubling the width),
but isn't for division.
Inter alia, consider a rule where division is required to round
in some fashion, and the remainder is accumulated separately.
That cannot be fixed up by a rounding add-on, no matter what
you do.
The rounding has to be specified in _some_ way that makes it possible to
do it, at least by hand, right?

If you return both the result of the division, _and_ the remainder
(which could be positive or negative), then you can obviously write code
to implement any given rounding rule as a fixup to be applied
afterwards, and this might still be much faster than what you get on
platforms with no extra support.

I.e. please specify a rounding rule that you think would be hard to
implement!

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Nick Maclaren
2014-03-11 16:27:04 UTC
Permalink
Post by Terje Mathisen
Post by Nick Maclaren
Post by Terje Mathisen
Post by Nick Maclaren
Post by Ivan Godard
So if you have long intermediates and fast scaling, is there any reason
not to package that up in a standard-conforming ISA? You've already done
the work, after all :-)
Yes. Rounding and precision reduction rules :-(
As Ivan has indicated we are working on a set of helper ops which will
allow sw emulation of hw FP with reasonable efficiency (given reasonable
integer mul/mulh operations).
It is quite obvious that in this setup rounding will be a separate
operation, which would tend to make it much easier to replace just this
part of the algorithm.
I believe you have argued for exactly this kind of setup?
Yes, but NOT as a way of fixing up the emulation of fixed-point.
It's feasible for multiplication, at increased cost, by doing
all multiplications in full precision (i.e. doubling the width),
but isn't for division.
Inter alia, consider a rule where division is required to round
in some fashion, and the remainder is accumulated separately.
That cannot be fixed up by a rounding add-on, no matter what
you do.
The rounding has to be specified in _some_ way that makes it possible to
do it, at least by hand, right?
Why? :-) These are rules written by bureaucrats, under directions
from politicians and conflicting lawyers ....
Post by Terje Mathisen
If you return both the result of the division, _and_ the remainder
(which could be positive or negative), then you can obviously write code
to implement any given rounding rule as a fixup to be applied
afterwards, and this might still be much faster than what you get on
platforms with no extra support.
Yes. And that is irrespective of whether you do it in floating-point,
fixed-point, scaled integer or anything else like that.
Post by Terje Mathisen
I.e. please specify a rounding rule that you think would be hard to
implement!
Don't tempt me :-) Try rounding up if the exact values of the
previous two values rounded are expressible as a complex number
in the conventional Mandelbrot set and down otherwise ....

The serious question is the hardest plausible one, and there are
definitely some of the form I described.


Regards,
Nick Maclaren.
Terje Mathisen
2014-03-11 18:14:14 UTC
Permalink
Post by Nick Maclaren
Post by Terje Mathisen
The rounding has to be specified in _some_ way that makes it possible to
do it, at least by hand, right?
Why? :-) These are rules written by bureaucrats, under directions
from politicians and conflicting lawyers ....
And very often the starting point for government IT projects that blow
up badly. :-(
Post by Nick Maclaren
Post by Terje Mathisen
If you return both the result of the division, _and_ the remainder
(which could be positive or negative), then you can obviously write code
to implement any given rounding rule as a fixup to be applied
afterwards, and this might still be much faster than what you get on
platforms with no extra support.
Yes. And that is irrespective of whether you do it in floating-point,
fixed-point, scaled integer or anything else like that.
Absolutely!

This is the point where one should read Knuth in order to realize that
_most_ fp algorithms are independent of the number base used. (The
exceptions often work best in binary, right?)

The round to nearest_or_even rule for division corresponds to checking
if twice the remainder is <=> the divisor: If equal then check the
division result for odd/even, which is fast in all even number bases.

If the remainder was negative and of half the magnitude of the divisor,
then do the same but subtract one if the result was odd.
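In code, for the non-negative case, that rule looks like this (a sketch using
only the quotient and remainder):

def div_round_half_even(dividend, divisor):
    q, r = divmod(dividend, divisor)
    # Round up if the remainder is more than half the divisor,
    # or exactly half and the quotient is odd (ties to even).
    if 2 * r > divisor or (2 * r == divisor and q % 2 == 1):
        q += 1
    return q

assert div_round_half_even(25, 10) == 2   # 2.5 -> 2 (even)
assert div_round_half_even(35, 10) == 4   # 3.5 -> 4 (even)
assert div_round_half_even(26, 10) == 3   # 2.6 -> 3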
Post by Nick Maclaren
Post by Terje Mathisen
I.e. please specify a rounding rule that you think would be hard to
implement!
Don't tempt me :-) Try rounding up if the exact values of the
previous two values rounded are expressible as a complex number
in the conventional Mandelbrot set and down otherwise ....
The serious question is the hardest plausible one, and there are
definitely some of the form I described.
OK, rounding rules in the form of state machines with memory would make
things more interesting, but still feasible as long as you don't allow
any form of evaluation order optimization _and_ the rounding function is
a monitor so only a single operation can use it at once.

I.e. this would be a very effective guard against fast code. :-(

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Stephen Fuld
2014-03-12 15:40:55 UTC
Permalink
snip
Post by Nick Maclaren
Post by Terje Mathisen
The rounding has to be specified in _some_ way that makes it possible to
do it, at least by hand, right?
Why? :-) These are rules written by bureaucrats, under directions
from politicians and conflicting lawyers ....
Post by Terje Mathisen
If you return both the result of the division, _and_ the remainder
(which could be positive or negative), then you can obviously write code
to implement any given rounding rule as a fixup to be applied
afterwards, and this might still be much faster than what you get on
platforms with no extra support.
Yes. And that is irrespective of whether you do it in floating-point,
fixed-point, scaled integer or anything else like that.
Post by Terje Mathisen
I.e. please specify a rounding rule that you think would be hard to
implement!
Don't tempt me :-) Try rounding up if the exact values of the
previous two values rounded are expressible as a complex number
in the conventional Mandelbrot set and down otherwise ....
The serious question is the hardest plausible one, and there are
definitely some of the form I described.
While I will take your word for it that rules like you described are
used in some contexts, remember, decimal floating point is designed
specifically for meeting certain business/legal requirements more easily
than can be done with binary (floating point). And I can state, without
fear of being contradicted that essentially no business application uses
complex numbers. Their rounding rules may be complex, but that is a
different matter :-).

I suspect that somewhere there is a list of all the rounding rules
needed by business applications (probably in the internal documentation
of some business software provider). I see no need to provide support,
much less efficient support, for arcane rounding rules that are not
going to be applicable in the domain where the basic data type is going
to be used.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
Nick Maclaren
2014-03-12 16:23:51 UTC
Permalink
Post by Stephen Fuld
Post by Nick Maclaren
Post by Terje Mathisen
The rounding has to be specified in _some_ way that makes it possible to
do it, at least by hand, right?
Why? :-) These are rules written by bureaucrats, under directions
from politicians and conflicting lawyers ....
Post by Terje Mathisen
If you return both the result of the division, _and_ the remainder
(which could be positive or negative), then you can obviously write code
to implement any given rounding rule as a fixup to be applied
afterwards, and this might still be much faster than what you get on
platforms with no extra support.
Yes. And that is irrespective of whether you do it in floating-point,
fixed-point, scaled integer or anything else like that.
Post by Terje Mathisen
I.e. please specify a rounding rule that you think would be hard to
implement!
Don't tempt me :-) Try rounding up if the exact values of the
previous two values rounded are expressible as a complex number
in the conventional Mandelbrot set and down otherwise ....
The serious question is the hardest plausible one, and there are
definitely some of the form I described.
While I will take your word for it that rules like you described are
used in some contexts, remember, decimal floating point is designed
specifically for meeting certain business/legal requirements more easily
than can be done with binary (floating point). And I can state, without
fear of being contradicted that essentially no business application uses
complex numbers. Their rounding rules may be complex, but that is a
different matter :-).
It is PRECISELY those rules I am referring to! And, no, I wasn't
referring to that damn-fool example, which was obviously a joke
(at least it will have been obvious to Terje). Dammit, bureaucrats
wouldn't know the properties of a Mandelbrot set if it was tattooed
on their backsides.

There are lots of accounting and legal rules that require the
remainders to be added into the next value, or accumulated
separately, or .... Terje quite rightly pointed out that any
such rule can be implemented if the division operation delivers
the quotient and remainder together.

My point here is that you CAN'T implement such rules by a rounding
rule alone, and you either need quotient+remainder or (equivalently)
some form of effectively infinite accuracy intermediate, AND to
maintain some state.


Regards,
Nick Maclaren.
Quadibloc
2014-03-12 19:21:38 UTC
Permalink
Post by Stephen Fuld
I see no need to provide support,
much less efficient support, for arcane rounding rules that are not
going to be applicable in the domain where the basic data type is going
to be used.
I presume the idea is that the only reason to use a decimal type, either the old-fashioned fixed-point packed decimal, or the new-fangled decimal floating point, is in order to handle just exactly those business applications which have arcane rounding rules.

Therefore, a decimal type that cannot support them has essentially no application domain - at least that seems to be Nick Maclaren's claim.

John Savard
Nick Maclaren
2014-03-12 19:44:42 UTC
Permalink
Post by Quadibloc
Post by Stephen Fuld
I see no need to provide support,
much less efficient support, for arcane rounding rules that are not
going to be applicable in the domain where the basic data type is going
to be used.
I presume the idea is that the only reason to use a decimal type, either the old-fashioned fixed-point packed decimal, or the new-fangled decimal floating point, is in order to handle just exactly those business applications which have arcane rounding rules.
Therefore, a decimal type that cannot support them has essentially no application domain - at least that seems to be Nick Maclaren's claim.
Essentially, yes.


Regards,
Nick Maclaren.
Robert Wessel
2014-03-12 20:31:06 UTC
Permalink
On Wed, 12 Mar 2014 12:21:38 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Stephen Fuld
I see no need to provide support,
much less efficient support, for arcane rounding rules that are not
going to be applicable in the domain where the basic data type is going
to be used.
I presume the idea is that the only reason to use a decimal type, either the old-fashioned fixed-point packed decimal, or the new-fangled decimal floating point, is in order to handle just exactly those business applications which have arcane rounding rules.
Therefore, a decimal type that cannot support them has essentially no application domain - at least that seems to be Nick MacLaren's claim.
And as has been pointed out several times, what you actually need is
decimal *scaling*. Decimal arithmetic can provide that fairly easily
(which is a point in its favor), but it's hardly necessary.

The question at the end of the day is how much time systems are
spending doing decimal math, and whether that's enough to justify
special hardware.

There are some arguments that decimal arithmetic can lead to less
surprise for users*, but that seems a limited argument (users are
still surprised when (1/3)*3 doesn't equal 1, even with decimal math).
And if that is an issue, decimal scaling is most of the actual
requirement there too.


*Many calculators do arithmetic in decimal, at least partially for
that reason (simple calculators use decimal primarily because they'd
have to do more work to convert to/from binary).
Stephen Fuld
2014-03-12 20:41:14 UTC
Permalink
Post by Robert Wessel
On Wed, 12 Mar 2014 12:21:38 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Stephen Fuld
I see no need to provide support,
much less efficient support, for arcane rounding rules that are not
going to be applicable in the domain where the basic data type is going
to be used.
I presume the idea is that the only reason to use a decimal type, either the old-fashioned fixed-point packed decimal, or the new-fangled decimal floating point, is in order to handle just exactly those business applications which have arcane rounding rules.
Therefore, a decimal type that cannot support them has essentially no application domain - at least that seems to be Nick MacLaren's claim.
The error here is that not being able to support every type of arcane
rounding rule is most definitely not the same as not being able to support
the most common ones. Yes, not supporting some arcane, infrequently
used rule does limit the application domain, but that is a far cry from
saying there is no useful application domain.
Post by Robert Wessel
And has been pointed out several times, what you actually need is
decimal *scaling*. Decimal arithmetic can provide that fairly easily
(which is a point in its favor), but it's hardly necessary.
The question at the end of the day is how much time systems are
spending doing decimal math, and whether that's enough to justify
special hardware.
Agreed.
Post by Robert Wessel
There are some arguments that decimal arithmetic can lead to less
surprise for users*, but that seems a limited argument (users are
still surprised when (1/3)*3 doesn't equal 1, even with decimal math).
And if that is an issue, decimal scaling is most of the actual
requirement there to.
Well, people dealing with decimal arithmetic know there is no such thing
as 1/3. There is only (in US currency as an example), 33 cents, or .33
dollars. And pretty much everyone knows that three times 33 cents
equals 99 cents, not one dollar.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
Tom Gardner
2014-03-12 21:07:25 UTC
Permalink
Well, people dealing with decimal arithmetic know there is no such thing as 1/3. There is only (in US currency as an example), 33 cents, or .33 dollars. And pretty much everyone knows that three
times 33 cents equals 99 cents, not one dollar.
Not always :( I've seen (otherwise tolerable) programmers
on someone else's project manage to actively avoid thinking
about that and go through some truly remarkable gyrations
to (fail to) fix things up.

They also had to be led by the hand to realise that
if you know X and Y to +-10%, you don't know X+Y to 10%.

Never underestimate human stupidity. And you can't make
things foolproof because fools are so damn ingenious.
Terje Mathisen
2014-03-12 21:38:07 UTC
Permalink
Post by Tom Gardner
Post by Stephen Fuld
Well, people dealing with decimal arithmetic know there is no such
thing as 1/3. There is only (in US currency as an example), 33 cents,
or .33 dollars. And pretty much everyone knows that three
times 33 cents equals 99 cents, not one dollar.
Not always :( I've seen (otherwise tolerable) programmers
on someone else's project manage to actively avoid thinking
about that and go through some truly remarkable gyrations
to (fail to) fix things up.
They also had to be led by the hand to realise that
if you know X and Y to +-10%, you don't know X+Y to 10%.
That's actually a subtle problem:

The maximum error in the sum is +-20%, but only if the original bounds
were absolute and X and Y are approximately equal.

With independent gaussian distributed errors, the summing will reduce
the expected relative error.
Post by Tom Gardner
Never underestimate human stupidity. And you can't make
things foolproof because fools are so damn ingenious.
:-)
Terje
Terje Mathisen
2014-03-12 22:28:03 UTC
Permalink
Post by Terje Mathisen
Post by Tom Gardner
Post by Stephen Fuld
Well, people dealing with decimal arithmetic know there is no such
thing as 1/3. There is only (in US currency as an example), 33 cents,
or .33 dollars. And pretty much everyone knows that three
times 33 cents equals 99 cents, not one dollar.
Not always :( I've seen (otherwise tolerable) programmers
on someone else's project manage to actively avoid thinking
about that and go through some truly remarkable gyrations
to (fail to) fix things up.
They also had to be led by the hand to realise that
if you know X and Y to +-10%, you don't know X+Y to 10%.
The maximum error in the sum is +-20%, but only if the original bounds
were absolute and X and Y are approximately equal.
Mea culpa!

With opposite signs the sum (i.e. effectively a difference) can give unbounded
relative error, while adding values of the same sign will maintain the 10% error limit.
Post by Terje Mathisen
With independent gaussian distributed errors, the summing will reduce
the expected relative error.
Post by Tom Gardner
Never underestimate human stupidity. And you can't make
things foolproof because fools are so damn ingenious.
:-)
Terje
Tom Gardner
2014-03-12 23:09:50 UTC
Permalink
Post by Terje Mathisen
Post by Terje Mathisen
Post by Tom Gardner
Post by Stephen Fuld
Well, people dealing with decimal arithmetic know there is no such
thing as 1/3. There is only (in US currency as an example), 33 cents,
or .33 dollars. And pretty much everyone knows that three
times 33 cents equals 99 cents, not one dollar.
Not always :( I've seen (otherwise tolerable) programmers
on someone else's project manage to actively avoid thinking
about that and go through some truly remarkable gyrations
to (fail to) fix things up.
They also had to be led by the hand to realise that
if you know X and Y to +-10%, you don't know X+Y to 10%.
The maximum error in the sum is +-20%, but only if the original bounds
were absolute and X and Y are approximately equal.
Mea culpa!
With opposite signs the sum (i.e. difference) can give infinite relative error, while addition will maintain the 10% error limit.
If I still worked there, you would give the perpetrators
some small satisfaction! Unfortunately /they/ didn't manage
to work out the correct answer on their own.
Nick Maclaren
2014-03-12 22:37:38 UTC
Permalink
Post by Tom Gardner
Well, people dealing with decimal arithmetic know there is no such thing as 1/3. There is only (in US currency as an example), 33 cents, or .33 dollars. And pretty much everyone knows that three
times 33 cents equals 99 cents, not one dollar.
Not always :( I've seen (otherwise tolerable) programmers
on someone else's project manage to actively avoid thinking
about that and go through some truly remarkable gyrations
to (fail to) fix things up.
Until very recently, the University of Cambridge levied fines in
marks, which are one third of a pound sterling, which used to be
6 shillings 8 pence. When the UK was decimalised, that was changed
to 33 pence, but with the rule that 3 of them was one pound.

No, I am not kidding.


Regards,
Nick Maclaren.
Tom Gardner
2014-03-12 23:12:35 UTC
Permalink
Post by Nick Maclaren
Post by Tom Gardner
Well, people dealing with decimal arithmetic know there is no such thing as 1/3. There is only (in US currency as an example), 33 cents, or .33 dollars. And pretty much everyone knows that three
times 33 cents equals 99 cents, not one dollar.
Not always :( I've seen (otherwise tolerable) programmers
on someone else's project manage to actively avoid thinking
about that and go through some truly remarkable gyrations
to (fail to) fix things up.
Until very recently, the University of Cambridge levied fines in
marks, which are one third of a pound sterling, which used to be
6 shillings 8 pence.
Excellent. When was that exchange rate (a) of practical use
and (b) determined?
Post by Nick Maclaren
When the UK was decimalised, that was changed
to 33 pence, but with the rule that 3 of them was one pound.
No, I am not kidding.
Seems a good engineering rule of thumb.

But simpler if the University had decreed that all
fines should be a multiple of three marks.
Terje Mathisen
2014-03-13 06:00:33 UTC
Permalink
Post by Tom Gardner
Post by Nick Maclaren
When the UK was decimalised, that was changed
to 33 pence, but with the rule that 3 of them was one pound.
No, I am not kidding.
Seems a good engineering rule of thumb.
But simpler if the University had decreed that all
fines should be a multiple of three marks.
What the rule means is that all amounts must be calculated as integers
(of marks) only, then "rounded" at the point of payment by a division
by 3, truncated to a multiple of .01.

This is of course different from normal rounding in that two of them
would be .66 and not .67 as default fp hw would give, in both binary and
decimal flavors.
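A rough C sketch of that scheme (illustrative only, not anyone's actual code; fines assumed small enough not to overflow a long):

/* fine kept as a whole number of marks; converted to pence only at
   payment, dividing by 3 and truncating */
long marks_to_pence(long marks)
{
    return (marks * 100L) / 3L;    /* 1 -> 33, 2 -> 66, 3 -> 100 = one pound */
}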

Terje
Nick Maclaren
2014-03-13 09:15:17 UTC
Permalink
Post by Tom Gardner
Post by Nick Maclaren
Post by Tom Gardner
Well, people dealing with decimal arithmetic know there is no such thing as 1/3. There is only (in US currency as an example), 33 cents, or .33 dollars. And pretty much everyone knows that three
times 33 cents equals 99 cents, not one dollar.
Not always :( I've seen (otherwise tolerable) programmers
on someone else's project manage to actively avoid thinking
about that and go through some truly remarkable gyrations
to (fail to) fix things up.
Until very recently, the University of Cambridge levied fines in
marks, which are one third of a pound sterling, which used to be
6 shillings 8 pence.
Excellent. When was that exchange rate (a) of practical use
and (b) determined?
It's not an exchange rate - as with the dollar, the mark was a
unit of English currency. And when? Sometime in the past
800 years - before my time, anyway :-)


Regards,
Nick Maclaren.

MitchAlsup
2014-03-13 00:22:14 UTC
Permalink
Post by Nick Maclaren
Until very recently, the University of Cambridge levied fines in
marks, which are one third of a pound sterling, which used to be
6 shillings 8 pence. When the UK was decimalised, that was changed
to 33 pence, but with the rule that 3 of them was one pound.
Seems to me with rules like that the *.gov should be in the business
of designing math units able to put up with the arcane rules the
*.legislature imposes.

{I'm not holding my breath.}

Mitch
Nick Maclaren
2014-03-12 22:43:39 UTC
Permalink
Post by Stephen Fuld
Post by Robert Wessel
On Wed, 12 Mar 2014 12:21:38 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Stephen Fuld
I see no need to provide support,
much less efficient support, for arcane rounding rules that are not
going to be applicable in the domain where the basic data type is going
to be used.
I presume the idea is that the only reason to use a decimal type, either the old-fashioned fixed-point packed decimal, or the new-fangled decimal floating point, is in order to handle just exactly those business applications which have arcane rounding rules.
Therefore, a decimal type that cannot support them has essentially no application domain - at least that seems to be Nick MacLaren's claim.
The error here is that not being able to support every type of arcane
rounding rule is most definitely not the same as being able to support
the most common ones. Yes, not supporting some arcane, infrequently
used rule does limit the application domain, but that is a far cry from
saying there is no useful application domain.
Your error is in changing the context, from there being no
application domain that is not already satisfied by existing
mechanisms to there being no domain in an absolute sense.

Yes, OF COURSE, decimal floating- and fixed-point can be used
for other purposes but, in almost every case, binary floating-
point is better (and already near-universal).

Essentially the ONLY reason for wanting decimal arithmetic is for
codes constrained by arcane accounting and legal rules. If it
can't meet those, then it has no useful function.


Regards,
Nick Maclaren.
Stephen Fuld
2014-03-12 23:13:21 UTC
Permalink
Post by Nick Maclaren
Post by Stephen Fuld
Post by Robert Wessel
On Wed, 12 Mar 2014 12:21:38 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Stephen Fuld
I see no need to provide support,
much less efficient support, for arcane rounding rules that are not
going to be applicable in the domain where the basic data type is going
to be used.
I presume the idea is that the only reason to use a decimal type, either the old-fashioned fixed-point packed decimal, or the new-fangled decimal floating point, is in order to handle just exactly those business applications which have arcane rounding rules.
Therefore, a decimal type that cannot support them has essentially no application domain - at least that seems to be Nick MacLaren's claim.
The error here is that not being able to support every type of arcane
rounding rule is most definitely not the same as being able to support
the most common ones. Yes, not supporting some arcane, infrequently
used rule does limit the application domain, but that is a far cry from
saying there is no useful application domain.
Your error is in changing the context, from there being no
application domain that is not already satisfied by existing
mechanisms to there being no domain in an absolute sense.
Yes, OF COURSE, decimal floating- and fixed-point can be used
for other purposes but, in almost every case, binary floating-
point is better (and already near-universal).
Essentially the ONLY reason for wanting decimal arithmetic is for
codes constrained by arcane accounting and legal rules. If it
can't meet those, then it has no useful function.
I don't think we disagree much here. I agree that out of the area of
business, decimal in general and decimal floating point in particular
have essentially no applicability. And I think we agree that if the
implementation didn't satisfy at least a sufficiently large percentage
of the existing business usage, including rounding rules, then it would
have no applicability.

I maintain that if decimal can handle the vast majority of uses in the
business world, then it may be sufficiently useful to implement even if
it can't satisfy every arcane set of rules in use somewhere.

Furthermore, I believe that this is precisely the situation i.e. it can
handle the vast majority, but not all the cases.

Perhaps our only disagreement is how arcane is arcane? :-)
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
William Clodius
2014-03-13 02:00:57 UTC
Permalink
Post by Nick Maclaren
Post by Stephen Fuld
Post by Quadibloc
I see no need to provide support, much less efficient support, for
arcane rounding rules that are not going to be applicable in the
domain where the basic data type is going to be used.
I presume the idea is that the only reason to use a decimal type,
either the old-fashioned fixed-point packed decimal, or the new-fangled
decimal floating point, is in order to handle just exactly those
business applications which have arcane rounding rules.
Therefore, a decimal type that cannot support them has essentially no
application domain - at least that seems to be Nick MacLaren's claim.
The error here is that not being able to support every type of arcane
rounding rule is most definitely not the same as being able to support
the most common ones. Yes, not supporting some arcane, infrequently
used rule does limit the application domain, but that is a far cry from
saying there is no useful application domain.
Your error is in changing the context, from there being no
application domain that is not already satisfied by existing
mechanisms to there being no domain in an absolute sense.
Yes, OF COURSE, decimal floating- and fixed-point can be used
for other purposes but, in almost every case, binary floating-
point is better (and already near-universal).
Essentially the ONLY reason for wanting decimal arithmetic is for
codes constrained by arcane accounting and legal rules. If it
can't meet those, then it has no useful function.
Regards,
Nick Maclaren.
The real question is not whether it can meet them (with suitable
libraries it can of course be made to meet them), but whether those
libraries will be sufficiently better, in a significant sense, than
libraries based on other types already in existence, the libraries
currently in existence in particular. Better can be any combination of
faster, simpler, easier to maintain, easier to extend, ...
Quadibloc
2014-03-13 01:38:22 UTC
Permalink
Post by Robert Wessel
And has been pointed out several times, what you actually need is
decimal *scaling*. Decimal arithmetic can provide that fairly easily
(which is a point in its favor), but it's hardly necessary.
Yes, that is entirely correct. An integer, multiplied by a power of ten, has the same information whether that integer is represented in base-10, base-2, or base-3 or base-57.

However, I've already addressed this point. While decimal arithmetic is slower, and involves more overhead in terms of transistors for any given speed of implementation, that is nothing compared to the time required to perform *division by 10* in order to align exponents.

This is true even if you have a hardcoded table of the reciprocals of 10, 100, 10000, 100000000, and so on, to help speed things up.

This was all right for a JOSS interpreter; it is not acceptable if one wants a level of decimal floating-point performance suitable for serious numerical work (i.e. running a CFD code again in base-10 as a crude substitute for proper error analysis).

Of course, since nobody is proposing that DFP should be used for serious numerical work, the objection may be moot.

John Savard
Quadibloc
2014-03-11 05:37:27 UTC
Permalink
Post by Robert Wessel
Nobody is saying that you don't need *decimal* scaled arithmetic, but
there's no need for that to be in actual decimal.
What the ISA needs is support for
...
Post by Robert Wessel
decently performing
scaling (multiplying/dividing by powers of 10)
For certain values of "decently", the only way *to* do that (for dividing by 10; _multiplying_ by 10 is not bad, and would cause _less_ of a performance hit than doing arithmetic in decimal) _is_ to do decimal scaled arithmetic in actual decimal.

John Savard
Robert Wessel
2014-03-11 06:25:58 UTC
Permalink
On Mon, 10 Mar 2014 22:37:27 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Robert Wessel
Nobody is saying that you don't need *decimal* scaled arithmetic, but
there's no need for that to be in actual decimal.
What the ISA needs is support for
...
Post by Robert Wessel
decently performing
scaling (multiplying/dividing by powers of 10)
For certain values of "decently", the only way *to* do that (for dividing by 10; _multiplying_ by 10 is not bad, and would cause _less_ of a performance hit than doing arithmetic in decimal) _is_ to do decimal scaled arithmetic in actual decimal.
Reciprocal multiplication takes care of a big chunk of the work for
the decimal right shifts, but does produce some awkward aspects to
rounding. You really want access to the remainder, which means you're
stuck doing another multiplication. Some fast hardware support for
that, for at least a few powers of ten (say one to six or seven),
would go a long way.
Terje Mathisen
2014-03-11 06:28:45 UTC
Permalink
Post by Quadibloc
Post by Robert Wessel
Nobody is saying that you don't need *decimal* scaled arithmetic,
but there's no need for that to be in actual decimal.
What the ISA needs is support for
...
Post by Robert Wessel
decently performing scaling (multiplying/dividing by powers of 10)
For certain values of "decently", the only way *to* do that (for
dividing by 10; _multiplying_ by 10 is not bad, and would cause
_less_ of a performance hit than doing arithmetic in decimal) _is_
to do decimal scaled arithmetic in actual decimal.
You scale by any power of 10 using a lookup table of reciprocals:

Less than 10 cycles even if you also need to get the remainder with
back-multiplication and subtraction.
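A minimal sketch of that, for the divide-by-10 entry of such a table (illustrative, not Terje's code; the constant is ceil(2^35/10), which gives the exact quotient for every 32-bit input):

#include <stdint.h>

/* q = x / 10 plus the remainder, using a multiply and shift instead of a divide */
static inline uint32_t div10(uint32_t x, uint32_t *rem)
{
    uint32_t q = (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
    *rem = x - q * 10u;            /* back-multiplication and subtraction */
    return q;
}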

Terje
Quadibloc
2014-03-13 04:13:18 UTC
Permalink
Post by Terje Mathisen
Less than 10 cycles even if you also need to get the remainder with
back-multiplication and subtraction.
You probably would, if you need to satisfy some perverse rounding rule.

But "less than 10 cycles" added to a floating add is *not* trivial. Floating adds don't normally take that long, and if you implement decimal arithmetic with suitable hardware, you can do better than that.

I'm not saying the Intel format is wildly impractical, just that it's only suitable if you're willing to accept DFP performance that's quite a bit inferior to normal floating-point performance. If, instead, you want to get as close to parity as can possibly be achieved, you do need to do the arithmetic in decimal.

John Savard
Terje Mathisen
2014-03-13 06:21:34 UTC
Permalink
Post by Quadibloc
Post by Terje Mathisen
Less than 10 cycles even if you also need to get the remainder
with back-multiplication and subtraction.
You probably would, if you need to satisfy some perverse rounding rule.
But "less than 10 cycles" added to a floating add is *not* trivial.
Floating adds don't normally take that long, and if you implement
decimal arithmetic with suitable hardware, you can do better than
that.
I'm not suggesting you do this at all, only that if you have to you
should at least do it right. :-)

I took a look at the reference implementation of the dec64 code which
started this thread, it does all scaling with loops of either division
or multiplication by 10!

Rounding is fixed to nearest, ties away from zero, and since the mantissa
format is twos complement, this makes for pretty complicated logic.

The only reason he can get reasonable performance is the fact that he
only normalizes when he has to, i.e. when running out of digits!

When working with add/sub of fixed-scale numbers or integer multiples of
such, no rescaling is needed, so you can do dec64_add() in 2-3 cycles
(plus the call or inlining overhead) and dec64_mul() in the same plus
the imul time.
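As a hedged sketch of that fast path, using the layout described at dec64.com (coefficient in the high 56 bits, exponent in the low 8 bits); the function name is invented, and coefficient overflow, NaN and the general case are deliberately ignored:

#include <stdint.h>

/* returns 1 and writes the sum when both operands share an exponent,
   0 when the caller must take the general (slow) path instead */
static int dec64_add_same_exp(int64_t a, int64_t b, int64_t *out)
{
    if ((uint8_t)a != (uint8_t)b)
        return 0;
    int64_t sum = (a >> 8) + (b >> 8);                   /* add coefficients */
    *out = (int64_t)((uint64_t)sum << 8) | (uint8_t)a;   /* repack, overflow unchecked */
    return 1;
}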

I.e. this looks like a reasonable platform for the phone bill benchmark,
except/unless that benchmark specifies that every sub-item (i.e. call)
has to be rounded to N (probably 2!) digits and the price per second has
M (not equal to N) fractional digits.

Terje
Bill Findlay
2014-03-11 01:48:17 UTC
Permalink
Post by Ivan Godard
Quad int gives you the precision, but then you have to manage the units
explicitly yourself. (units in this context means where to put the
decimal point).
Not if your programming language supports fixed-point decimal natively.
With a beautiful irony, my favourite language (Ada) does so, GNAT using
compiler-scaled binary as a fast and portable implementation.
Post by Ivan Godard
To say that commercial can be done in quad int, so there's no need for
correctly-rounded decimal, is to say that we can code in octal, so
there's no need for {your choice of language}.
I'll believe it when I hear someone who makes his living
writing commercial code say so.
http://www.adacore.com/press/deep-blue-capital-financial-system-development/
--
Bill Findlay
with blueyonder.co.uk;
use surname & forename;
Ivan Godard
2014-03-11 03:23:16 UTC
Permalink
Post by Bill Findlay
Post by Ivan Godard
Quad int gives you the precision, but then you have to manage the units
explicitly yourself. (units in this context means where to put the
decimal point).
Not if your programming language supports fixed-point decimal natively.
With a beautiful irony, my favourite language (Ada) does so, GNAT using
compiler-scaled binary as a fast and portable implementation.
And both definition and implementation predated the 754 standard, right?
And the proposals for the next revision of Ada include accepting IEEE
decimal, right?
Robert A Duff
2014-03-11 17:10:04 UTC
Permalink
Post by Ivan Godard
Post by Bill Findlay
Post by Ivan Godard
Quad int gives you the precision, but then you have to manage the units
explicitly yourself. (units in this context means where to put the
decimal point).
Not if your programming language supports fixed-point decimal natively.
With a beautiful irony, my favourite language (Ada) does so, GNAT using
compiler-scaled binary as a fast and portable implementation.
And both definition and implementation predated the 754 standard, right?
Decimal fixed point was added to Ada in Ada 95. It was probably
implemented in GNAT and other Ada compilers in 1996 or so.
You can say:

type Money is delta 0.01 digits 18;

which is a type whose range is exactly
-9_999_999_999_999_999.99 .. 9_999_999_999_999_999.99.
This is typically implemented as a scaled *binary* two's complement
integer -- i.e. it's internally an exact integer number of pennies.
The rounding rules and whatnot are all well-specified and portable.
There is no need to do scaling by hand; the compiler takes care of it.
There is a way to request a BCD representation, but I don't think
many people use that.

You can also say:

type Money is delta 0.01 digits 18
range 0.0 .. 10.0**16 - 0.01;

to get the range 0.0 .. 9_999_999_999_999_999.99.

In other words, this seems like a programming language issue,
not a machine architecture issue, and I see no need for the DEC64
type that started this discussion. Just use a language that has
good support for exact fixed-point arithmetic. It is indeed a
huge pain to do scaling by hand!

If you wanted to support DEC64 in C, you'd have to extend the C
language. But if you're in the business of language extensions,
why wouldn't you add fixed point (i.e. scaled integers) to C,
which is already fully supported on existing hardware?
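For comparison, a bare-bones sketch of what those scaled integers look like when the scaling is carried by hand in today's C; this is exactly the bookkeeping the Ada compiler does for you (illustrative only, overflow ignored):

#include <stdint.h>

typedef int64_t pennies_t;              /* money as an exact count of pennies */

static pennies_t money_add(pennies_t a, pennies_t b) { return a + b; }

/* multiplying two scaled values needs an explicit rescale by the factor 100;
   the rounding rule (here plain truncation) is the programmer's problem */
static pennies_t money_mul(pennies_t a, pennies_t b)
{
    return (a * b) / 100;
}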
Post by Ivan Godard
And the proposals for the next revision of Ada include accepting IEEE
decimal, right?
The current version of the language is Ada 2012. The next version
will probably be Ada 2022 or thereabouts.

As far as I know, there are no plans to add decimal floating point
to Ada. I am on the committee that makes such decisions, and I don't
recall any requests for it from the user community.

- Bob
Bill Findlay
2014-03-11 18:10:38 UTC
Permalink
Post by Robert A Duff
Post by Ivan Godard
Post by Bill Findlay
Post by Ivan Godard
Quad int gives you the precision, but then you have to manage the units
explicitly yourself. (units in this context means where to put the
decimal point).
Not if your programming language supports fixed-point decimal natively.
With a beautiful irony, my favourite language (Ada) does so, GNAT using
compiler-scaled binary as a fast and portable implementation.
And both definition and implementation predated the 754 standard, right?
Decimal fixed point was added to Ada in Ada 95. It was probably
implemented in GNAT and other Ada compilers in 1996 or so.
type Money is delta 0.01 digits 18;
which is a type whose range is exactly
-9_999_999_999_999_999.99 .. 9_999_999_999_999_999.99.
This is typically implemented as a scaled *binary* two's complement
integer -- i.e. it's internally an exact integer number of pennies.
The rounding rules and whatnot are all well-specified and portable.
There is no need to do scaling by hand; the compiler takes care of it.
Would it be simple to derive another type from such, so that the arithmetic
operations could be overloaded to implement different rounding rules?
--
Bill Findlay
with blueyonder.co.uk;
use surname & forename;
Robert A Duff
2014-03-11 23:20:24 UTC
Permalink
Post by Bill Findlay
Post by Robert A Duff
type Money is delta 0.01 digits 18;
which is a type whose range is exactly
-9_999_999_999_999_999.99 .. 9_999_999_999_999_999.99.
This is typically implemented as a scaled *binary* two's complement
integer -- i.e. it's internally an exact integer number of pennies.
The rounding rules and whatnot are all well-specified and portable.
There is no need to do scaling by hand; the compiler takes care of it.
Would it be simple to derive another type from such, so that the arithmetic
operations could be overloaded to implement different rounding rules?
Well, I'm mostly a compiler writer, so I don't write a lot of code
involving money (or involving floating point, either), but I guess
it wouldn't be too hard to do what you suggest.

I think the important thing is that rounding be well defined, so
programmers can easily obey laws that specify rounding rules for amounts
of money. Someone thought about this for Ada 95, as evidenced by the
following from the Annotated Ada Reference Manual (note the ``"undone"
by hand'' comment):

33.a/2 Discussion: {AI95-00267-01} This was implementation defined in
Ada 83. There seems no reason to preserve the nonportability in Ada
95. Round-away-from-zero is the conventional definition of rounding,
and standard Fortran and COBOL both specify rounding away from zero,
so for interoperability, it seems important to pick this. This is also
the most easily "undone" by hand. Round-to-nearest-even is an
alternative, but that is quite complicated if not supported by the
hardware. In any case, this operation is not usually part of an inner
loop, so predictability and portability are judged most important. A
floating point attribute function Unbiased_Rounding is provided (see
A.5.3) for those applications that require round-to-nearest-even, and
a floating point attribute function Machine_Rounding (also see A.5.3)
is provided for those applications that require the highest possible
performance. "Deterministic" rounding is required for static
conversions to integer as well. See 4.9.

This is talking about rounding involving both floating-point and
fixed-point types.
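A small sketch of that round-half-away-from-zero rule applied to scaled integers (an illustration only, not the Ada runtime; assumes den > 0):

#include <stdint.h>

static int64_t div_round_away(int64_t num, int64_t den)
{
    int64_t q = num / den, r = num % den;   /* C division truncates toward zero */
    int64_t ar = (r < 0) ? -r : r;
    if (ar >= den - ar)                     /* |r| is at least half of den */
        q += (num < 0) ? -1 : 1;            /* bump away from zero */
    return q;
}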

- Bob
Nick Maclaren
2014-03-12 08:51:01 UTC
Permalink
Post by Robert A Duff
I think the important thing is that rounding be well defined, so
programmers can easily obey laws that specify rounding rules for amounts
of money. Someone thought about this for Ada 95, as evidenced by the
following from the Annotated Ada Reference Manual (note the ``"undone"
33.a/2 Discussion: {AI95-00267-01} This was implementation defined in
Ada 83. There seems no reason to preserve the nonportability in Ada
95. Round-away-from-zero is the conventional definition of rounding,
and standard Fortran and COBOL both specify rounding away from zero,
so for interoperability, it seems important to pick this. This is also
the most easily "undone" by hand. Round-to-nearest-even is an
alternative, but that is quite complicated if not supported by the
hardware. In any case, this operation is not usually part of an inner
loop, so predictability and portability are judged most important. A
floating point attribute function Unbiased_Rounding is provided (see
A.5.3) for those applications that require round-to-nearest-even, and
a floating point attribute function Machine_Rounding (also see A.5.3)
is provided for those applications that require the highest possible
performance. "Deterministic" rounding is required for static
conversions to integer as well. See 4.9.
This is talking about rounding involving both floating-point and
fixed-point types.
Er, Fortran doesn't specify it :-) From Fortran 2008, 1 Overview,
1.1 Scope:

* the physical properties of the representation of quantities
and the method of rounding, approximating, or computing numeric
values on a particular processor, except by reference to the
IEEE International Standard under conditions specified in
Clause 14, [ i.e. optional, and optionally selected ]


Regards,
Nick Maclaren.
Nick Maclaren
2014-03-11 20:21:53 UTC
Permalink
Post by Robert A Duff
Post by Ivan Godard
And the proposals for the next revision of Ada include accepting IEEE
decimal, right?
The current version of the language is Ada 2012. The next version
will probably be Ada 2022 or thereabouts.
As far as I know, there are no plans to add decimal floating point
to Ada. I am on the committee that makes such decisions, and I don't
recall any requests for it from the user community.
I was active in WG14 at the time that was picked up for C (and hence
C++), and I saw no evidence that there was any user interest among
the C user community, either. As with long long, I investigated,
and found that most of the claims of its proponents were not
supported by the facts. That made no difference in either case.

I raised it in WG5, and the reaction was as I expected :-) That
was that the existing Fortran model already allows an implementation
to have different real kinds with different bases, but nobody could
think of any good reason to bother with decimal floating-point if
binary is available. Well, it was actually rather more, er, robust.


Regards,
Nick Maclaren.
Robert A Duff
2014-03-11 23:44:16 UTC
Permalink
Post by Nick Maclaren
I was active in WG14 at the time that was picked up for C (and hence
C++), and I saw no evidence that there was any user interest among
the C user community, either. As with long long, I investigated,
Interesting.

How does WG14 work in practice? The corresponding thing for Ada is WG9,
but it delegates almost all technical work to a subcommittee called Ada
Rapporteur Group (ARG). WG9 is mostly political, as opposed to
technical, and with rare exceptions, WG9 does whatever ARG says.

What is/was the "long long" issue?
Post by Nick Maclaren
and found that most of the claims of its proponents were not
supported by the facts. That made no difference in either case.
Yeah, well, sometimes language standards committees add features
that users don't particularly want. Ada 2005 added support for leap
seconds, which as far as I can tell was just a waste of effort.
C added support for threads, but I don't see the point, because there's
already a standard for threads in C -- Posix. (Ada had threads since
Day 1, of course. The GNAT implementation is built on top of Posix
threads, on platforms that support it.)
Post by Nick Maclaren
I raised it in WG5, and the reaction was as I expected :-) That
was that the existing Fortran model already allows an implementation
to have different real kinds with different bases, but nobody could
think of any good reason to bother with decimal floating-point if
binary is available. Well, it was actually rather more, er, robust.
;-)

- Bob
Nick Maclaren
2014-03-12 09:13:49 UTC
Permalink
Post by Robert A Duff
Post by Nick Maclaren
I was active in WG14 at the time that was picked up for C (and hence
C++), and I saw no evidence that there was any user interest among
the C user community, either. As with long long, I investigated,
Interesting.
How does WG14 work in practice?
Dysfunctionally. It is run by a caucus, essentially entirely from
within one country. It was a little better in C99, but the only
other active country voted a flat "no" on both technical and
procedural grounds.
Post by Robert A Duff
What is/was the "long long" issue?
In K&R C and C90, (unsigned) long was guaranteed to be the longest
integer type. In C99, it was asserted that vast amounts of software
assumed that long was 32-bit, and many compilers used long long to
provide 64-bit. Despite repeated requests, the proponents never
produced ONE example of such a compiler, though they mentioned
Microsoft - in fact, the relevant compiler was still in development.
I investigated, and found no examples, but the proposed change
broke the majority of 24 widely-used and portable Internet programs,
with only gcc itself having any significant use of long long!

That was not the only reason that so many C using communities
gave C99 the thumbs down, but it was perhaps the main one. Even
today, a huge number of programs rely on the C90 guarantee, and
all but one compiler (with one option) provide it, as far as I
know.
Post by Robert A Duff
Yeah, well, sometimes language standards committees add features
that users don't particularly want. Ada 2005 added support for leap
seconds, which as far as I can tell was just a waste of effort.
C added support for threads, but I don't see the point, because there's
already a standard for threads in C -- Posix. (Ada had threads since
Day 1, of course. The GNAT implementation is built on top of Posix
threads, on platforms that support it.)
POSIX threading is so broken as to be effectively unusable in any
program that wants either portability or reliability. However,
that is not a justification for adding it to C, because almost
nobody in the IT world wants any extensions (let alone yet more
incompatible changes) to C. Indeed, of the C using communities
I followed, the number that had adopted C99 even in principle
reached half only in 2012. Few of them have even heard of C11,
and I know none that give a damn about it except on legalistic
grounds.


Regards,
Nick Maclaren.
Quadibloc
2014-03-12 10:50:48 UTC
Permalink
Post by Nick Maclaren
In C99, it was asserted that vast amounts of software
assumed that long was 32-bit, and many compilers used long long to
provide 64-bit. Despite repeated requests, the proponents never
produced ONE example of such a compiler, though they mentioned
Microsoft - in fact, the relevant compiler was still in development.
There may not be any compilers for which long long is a 64-bit integer.

But there are many compilers for which int is 16 bits, and long is 32 bits. This may well have been true even for some compilers for the 8086 and 8088, and probably was, before the 80386 came along.

But I will have to admit that there aren't many PDP-11s in active service; they're usually in museums.

Of course, if one can say that C assumes you're programming on a PDP-11, then I suppose you can say that FORTRAN assumes that you're programming on an IBM 704.

So when will the personal computer be modified to be compatible with FORTRAN, by adding

- 36-bit single precision floating point
- Four sense lights
- Six sense switches

all of which are missing from today's PCs?

I suppose the sense lights and sense switches could be added to keyboards, rather than being put on the box containing the CPU itself (the system unit).

John Savard
Tom Gardner
2014-03-12 11:27:30 UTC
Permalink
Post by Quadibloc
But I will have to admit that there aren't many PDP-11s in active service; they're usually in museums.
I'm not sure whether you class this as a museum...
http://www.theregister.co.uk/2013/06/19/nuke_plants_to_keep_pdp11_until_2050/
Assembler programmers wanted...
Quadibloc
2014-03-13 04:15:03 UTC
Permalink
Post by Tom Gardner
I'm not sure whether you class this as a museum...
I did see that news item. But then, I said "usually".

John Savard
Casper H.S. Dik
2014-03-12 12:31:19 UTC
Permalink
Post by Quadibloc
Post by Nick Maclaren
In C99, it was asserted that vast amounts of software
assumed that long was 32-bit, and many compilers used long long to
provide 64-bit. Despite repeated requests, the proponents never
produced ONE example of such a compiler, though they mentioned
Microsoft - in fact, the relevant compiler was still in development.
There may not be any compilers for which long long is a 64-bit integer.
All the 32 bit Solaris/SunOS compilers defined "long long" as a 64-bit
integer. long/int/intptr_t are all 32-bit in the 32 bit compilation
environment.

The natural way to extend this to 64-bit was keeping the int as 32-bits
but made long/pointers 64 bits; but some believed that int and long
should be the same size so we have Microsoft (int/long 32) on the one
hand, HAL (first SPARC Solaris port) with int/long 64 bit and no natural
32 bit type and the middleground with int 32 bit and long 64 bits.

Casper
Nick Maclaren
2014-03-12 12:55:32 UTC
Permalink
Post by Casper H.S. Dik
Post by Quadibloc
Post by Nick Maclaren
In C99, it was asserted that vast amounts of software
assumed that long was 32-bit, and many compilers used long long to
provide 64-bit. Despite repeated requests, the proponents never
produced ONE example of such a compiler, though they mentioned
Microsoft - in fact, the relevant compiler was still in development.
There may not be any compilers for which long long is a 64-bit integer.
All the 32 bit Solaris/SunOS compilers defined "long long" as a 64-bit
integer. long/int/intptr_t are all 32-bit in the 32 bit compilation
environment.
The natural way to extend this to 64-bit was keeping the int as 32-bits
but made long/pointers 64 bits; but some believed that int and long
should be the same size so we have Microsoft (int/long 32) on the one
hand, HAL (first SPARC Solaris port) with int/long 64 bit and no natural
32 bit type and the middleground with int 32 bit and long 64 bits.
Yes. The issue was actually whether size_t could be longer than
unsigned long, and the claim was that (in 1987-8), there were many
such systems. As far as I could discover, there were none, and
there was only one under development. And there weren't all that
many that had a long long type at all.

As the incompatible change to introduce long long into C99 and allow
the standard types to exceed long broke a HUGE number of portable
programs, there was considerable opposition. The UK attempted to
ameliorate the incompatibility by having size_t and ptrdiff_t
constrained, but that was effectively ignored. However, the fact
that size_t and ptrdiff_t are no longer than long is still assumed
by a large number of portable programs and guaranteed by almost
all compilers.


Regards,
Nick Maclaren.
Anton Ertl
2014-03-12 13:58:38 UTC
Permalink
Post by Nick Maclaren
Post by Casper H.S. Dik
Post by Nick Maclaren
In C99, it was asserted that vast amounts of software
assumed that long was 32-bit, and many compilers used long long to
provide 64-bit. Despite repeated requests, the proponents never
produced ONE example of such a compiler, though they mentioned
Microsoft - in fact, the relevant compiler was still in development.
...
Post by Nick Maclaren
Post by Casper H.S. Dik
All the 32 bit Solaris/SunOS compilers defined "long long" as a 64-bit
integer. long/int/intptr_t are all 32-bit in the 32 bit compilation
environment.
Same for all other common Unices.
Post by Nick Maclaren
Yes. The issue was actually whether size_t could be longer than
unsigned long, and the claim was that (in 1987-8), there were many
such systems.
But that has nothing to do with long long.

These days, there is Microsoft with its IL32LLP64 Windows, but I
think, for the rest of the world sizeof(size_t)<=sizeof(long).
Post by Nick Maclaren
And there weren't all that
many that had a long long type at all.
In the C99 time frame? Doubtful. On Unices at least there are 64-bit
variants of lseek(), and on 32-bit systems (with 32-bit longs), this
was done with long long. Certainly gcc had long long.
Post by Nick Maclaren
As the incompatible change to introduce long long into C99 and allow
the standard types to exceed long broke a HUGE number of portable
programs, there was considerable opposition. The UK attempted to
ameliorate the incompatibility by having size_t and ptrdiff_t
constrained, but that was effectively ignored. However, the fact
that size_t and ptrdiff_t are no longer than long is still assumed
by a large number of portable programs and guaranteed by almost
all compilers.
Of course C standards bigots would argue that these programs are not
portable and they would probably claim that the program was broken
already...

- anton
--
M. Anton Ertl Some things have to be seen to be believed
***@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
Nick Maclaren
2014-03-12 14:49:06 UTC
Permalink
Post by Anton Ertl
Post by Nick Maclaren
Yes. The issue was actually whether size_t could be longer than
unsigned long, and the claim was that (in 1987-8), there were many
such systems.
But that has nothing to do with long long.
It was the main argument used to claim that long long was essential,
actually - the file offset one was secondary.
Post by Anton Ertl
These days, there is Microsoft with it's IL32LLP64 Windows, but I
think, for the rest of the world sizeof(size_t)<=sizeof(long).
I.e. they have maintained compatibility with C90.
Post by Anton Ertl
Post by Nick Maclaren
And there weren't all that
many that had a long long type at all.
In the C99 time frame? Doubtful. On Unices at least there are 64-bit
variants of lseek(), and on 32-bit systems (with 32-bit longs), this
was done with long long. Certainly gcc had long long.
Actually, no, it wasn't c. 1995, which was the relevant time. Most
of the 32-bit systems didn't support 64-bit lseeks, and most of
those that did used implementation-specific types or other hacks.

As I posted, gcc was the ONLY one of the 24 codes I looked at that
used long long in open (i.e. not system-specific) code, and none
of the others used it for more than one or two extreme systems.
Post by Anton Ertl
Post by Nick Maclaren
As the incompatible change to introduce long long into C99 and allow
the standard types to exceed long broke a HUGE number of portable
programs, there was considerable opposition. The UK attempted to
ameliorate the incompatibility by having size_t and ptrdiff_t
constrained, but that was effectively ignored. However, the fact
that size_t and ptrdiff_t are no longer than long is still assumed
by a large number of portable programs and guaranteed by almost
all compilers.
Of course C standards bigots would argue that these programs are not
portable and they would probably claim that the program was broken
already...
They did. They were telling porkies. I posted some examples of
specific code that would break, and the bigots merely got abusive.
For example:

void fred (ptrdiff_t arg) {printf("%ld\n",(long)arg);}  /* the cast loses information if ptrdiff_t is wider than long */

That was the most common feature broken in the programs I looked
at, but there were quite a lot that used long for calculation,
which were correct and portable in C90.


Regards,
Nick Maclaren.
Quadibloc
2014-03-13 04:24:49 UTC
Permalink
Post by Nick Maclaren
Yes. The issue was actually whether size_t could be longer than
unsigned long,
Well, if long is a 32-bit signed number, and long long is a 64-bit signed number, then unsigned long has a range from 0 to 4294967295, and long long has a range from -9223372036854775808 to 9223372036854775807.

Maybe C on the Sun didn't have size_t? I seem to be missing your point here, unless you're trying to claim that size_t is allowed to have a domain that excludes some types that actually exist, or it's allowed to lie.

One of these days somebody is going to make a computer that does 128 bit fixed-point arithmetic. Do you want to force such a computer to use 64 bit integers as its default type?

If not, long long is *necessary* to allow C itself to be ported to systems with an arbitrary set of data types, and if it breaks programs, that's just too bad.

I might suggest, though, that to prevent this from happening in future, they'd better include long long long, long long long long, and so on, in the language definition as well.

John Savard
Ivan Godard
2014-03-13 05:12:31 UTC
Permalink
Post by Quadibloc
Post by Nick Maclaren
Yes. The issue was actually whether size_t could be longer than
unsigned long,
Well, if long is a 32-bit signed number, and long long is a 64-bit
signed number, then unsigned long has a range from 0 to 4294967295,
and long long has a range from -9223372036854775808 to
9223372036854775807.
Maybe C on the Sun didn't have size_t? I seem to be missing your
point here, unless you're trying to claim that size_t is allowed to
have a domain that excludes some types that actually exist, or it's
allowed to lie.
One of these days somebody is going to make a computer that does 128
bit fixed-point arithmetic. Do you want to force such a computer to
use 64 bit integers as its default type?
If not, long long is *necessary* to allow C itself to be ported to
systems with an arbitrary set of data types, and if it breaks
programs, that's just too bad.
I might suggest, though, that to prevent this from happening in
future, they'd better include long long long, long long long long,
and so on, in the language definition as well.
John Savard
Mill:
byte = 8
short = 16
int = 32
long = 64
long long = 128
size_t = 64
ptrdiff_t = 64

Size_t is the type returned by sizeof(), i.e. it must be able to
represent the size of the largest possible array. Ptrdiff_t is the type
returned by p-q. There is no reason either must have any particular
connection with the sizes used for numeric computation.
Terje Mathisen
2014-03-11 06:21:57 UTC
Permalink
Post by Michael S
Post by Ivan Godard
and somewhat over a quarter of all machine cycles were spent in the
decimal arithmetic routines.
On which benchmark on which machine?
Indeed.

25% is a proof of incompetence, totally independent of the application
domain.
Post by Michael S
IMHO, what would really be useful for that sort of application is
not DFP, but improved hardware support for 128-bit integers, likely
including multiplication. Maybe even division, although I am not
sure that the impact of division is above the noise level. As for
64x64->128 mul, either directly or as separate mul and mulh opcodes,
that is the only needed support, and (if fast!) it also makes the best
support for general bigint operations.

For division you really only need a fast way to generate reciprocals.
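For what it is worth, a portable sketch of that 64x64->128 "mul + mulh" primitive, assuming the __int128 extension that gcc and clang provide on 64-bit targets (a four-way 32-bit split would do elsewhere):

#include <stdint.h>

static inline uint64_t mulhi_u64(uint64_t a, uint64_t b, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;   /* full 128-bit product */
    *lo = (uint64_t)p;
    return (uint64_t)(p >> 64);
}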
Post by Michael S
BCD<->binary conversions, when competently coded on SNB and friends,
are already damn fast. Hardware support can make them faster yet, but
why should we try to speed up something which is not a bottleneck?
Indeed.

Even when it is a bottleneck, the algorithm I published here many years
ago runs in ~30 cycles on cpus with fast integer mul.

This is a full 32-bit unsigned to 10 bcd/ascii digits conversion.

64 bit to 20 digits takes less than twice as long, particularly since
modern 64-bit capable cores usually allow 3-way superscalar so I can
split the operation into three 7-digit parts that overlap completely.
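For reference, the obvious serial loop such a routine improves on: ten dependent divide-by-10 steps, here done with the usual reciprocal multiply so there is no hardware divide (a sketch only; output is zero-padded to 10 digits):

#include <stdint.h>

static void u32_to_ascii10(uint32_t x, char out[11])
{
    for (int i = 9; i >= 0; i--) {
        uint32_t q = (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);   /* x / 10 */
        out[i] = (char)('0' + (x - q * 10u));                         /* low digit */
        x = q;
    }
    out[10] = '\0';
}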
Post by Michael S
Now, my opinion on the issue is really humble; deep inside I don't
care to be corrected. I'd guess that Terje, Robert and Nick have far
less humble opinions that are similar to mine.
IMNSHO you are completely right. :-)

Terje
Michael S
2014-03-11 09:40:19 UTC
Permalink
Post by Terje Mathisen
Even when it is a bottleneck, the algorithm I published here many years
ago runs in ~30 cycles on cpus with fast integer mul.
This is a full 32-bit unsigned to 10 bcd/ascii digits conversion.
64 bit to 20 digits takes less than twice as long, particularly since
modern 64-bit capable cores usually allow 3-way superscalar so I can
split the operation into three 7-digit parts that overlap completely.
My latest AVX attempt at binary to BCD conversion (mostly a SIMD implementation of your algorithm) that I published here slightly more than a year ago certainly takes less than twice as long.

https://groups.google.com/forum/#!original/comp.arch/APbcPjoqs_g/pAjyaKjAsDIJ

But I am still unconvinced that binary-to-packed-BCD conversion is the best starting point for binary-to-unpacked-BCD conversion. Or that it is somehow useful in its own right.
Terje Mathisen
2014-03-11 10:28:10 UTC
Permalink
Post by Michael S
Post by Terje Mathisen
Even when it is a bottleneck, the algorithm I published here many
years ago runs in ~30 cycles on cpus with fast integer mul.
This is a full 32-bit unsigned to 10 bcd/ascii digits conversion.
64 bit to 20 digits takes less than twice as long, particularly
since modern 64-bit capable cores usually allow 3-way superscalar
so I can split the operation into three 7-digit parts that overlap
completely.
My latest AVX attempt at binary to BCD conversion (mostly, SIMD
implementation of your algorithm) that I published here slightly more
than a year ago certainly takes less than twice as long.
https://groups.google.com/forum/#!original/comp.arch/APbcPjoqs_g/pAjyaKjAsDIJ
Thanks, I remember this one, it makes it quite efficient to work with
19-digit numbers.

At this point one of the DFP bigots will come along with 34-digit
requirements, which does in fact hurt a little bit since you'll have to
start with a binary split using either a 128/64->(64,64) DIV or a much
more complicated twice unrolled reciprocal mul, back-multiplication and
subtraction startup.
Post by Michael S
But I am still unconvinced that binary-to-packed-BCD conversion is
the best starting point for binary-to-unpacked-BCD conversion. Or
that it is somehow useful in its own right.
If you need binary to ascii (i.e. unpacked BCD more or less), then
you'll have that as the target of your conversion algorithm instead of
having to unpack/widen the packed result.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Michael S
2014-03-11 11:05:51 UTC
Permalink
Post by Terje Mathisen
Post by Michael S
Post by Terje Mathisen
Even when it is a bottleneck, the algorithm I published here many
years ago runs in ~30 cycles on cpus with fast integer mul.
This is a full 32-bit unsigned to 10 bcd/ascii digits conversion.
64 bit to 20 digits takes less than twice as long, particularly
since modern 64-bit capable cores usually allow 3-way superscalar
so I can split the operation into three 7-digit parts that overlap
completely.
My latest AVX attempt at binary to BCD conversion (mostly, SIMD
implementation of your algorithm) that I published here slightly more
than a year ago certainly takes less than twice as long.
https://groups.google.com/forum/#!original/comp.arch/APbcPjoqs_g/pAjyaKjAsDIJ
Thanks, I remember this one, it makes it quite efficient to work with
19-digit numbers.
At this point one of the DFP bigots will come along with 34-digit
requirements, which does in fact hurt a little bit since you'll have to
start with a binary split using either a 128/64->(64,64) DIV or a much
more complicated twice unrolled reciprocal mul, back-multiplication and
subtraction startup.
I was not thinking specifically about 34-digit conversion, but in a post above I did call for improved 128-bit multiplication. It was you, not me, who said that 64x64=128 was sufficient ;)
Post by Terje Mathisen
Post by Michael S
But I am still unconvinced that binary-to-packed-BCD conversion is
the best starting point for binary-to-unpacked-BCD conversion. Or
that it is somehow useful in its own right.
If you need binary to ascii (i.e. unpacked BCD more or less), then
you'll have that as the target of your conversion algorithm instead of
having to unpack/widen the packed result.
What I am trying to say here (and was also saying a year ago) is that if the final target is an ASCII==unpacked_BCD then I am not at all sure that conversion to any form of base10 is the most optimal first step before seemingly inevitable final table lookup step. Given a task, I would try conversions to base100 and to base1000 as alternatives. It's very likely that what algorithm ends up the fastest depends not only on underlying CPU architecture and microarchitecture, but also on behavior of calling application, i.e. questions like "Is conversion called in relatively long bursts or individually/in short bursts?"
Terje Mathisen
2014-03-11 12:49:22 UTC
Permalink
Post by Michael S
Post by Terje Mathisen
Post by Michael S
But I am still unconvinced that binary-to-packed-BCD conversion
is the best starting point for binary-to-unpacked-BCD conversion.
Or that it is somehow useful in its own right.
If you need binary to ascii (i.e. unpacked BCD more or less), then
you'll have that as the target of your conversion algorithm instead
of having to unpack/widen the packed result.
What I am trying to say here (and was also saying a year ago) is that
if the final target is an ASCII==unpacked_BCD then I am not at all
sure that conversion to any form of base10 is the most optimal first
step before seemingly inevitable final table lookup step. Given a
task, I would try conversions to base100 and to base1000 as
alternatives. It's very likely that what algorithm ends up the
fastest depends not only on underlying CPU architecture and
microarchitecture, but also on behavior of calling application, i.e.
questions like "Is conversion called in relatively long bursts or
individually/in short bursts?"
OK, that all makes sense.

For most apps/problems I'm quite happy to get within a factor of 2 of an
optimal solution, most times even a factor of 10 is OK.

I'm usually called in to troubleshoot underperforming solutions when
they are missing several orders of magnitude. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Wessel
2014-03-11 01:08:39 UTC
Permalink
Post by Ivan Godard
<snip>
Post by Nick Maclaren
Post by Robert Wessel
Still, I'm
puzzled by the whole thing. Proper scaled integer arithmetic is
necessary for certain problems, particularly those involving currency,
but I'm just not sure how much decimal FP, even decimal FP tweaked to
support currency-type operations, really buys you.
Essentially damn-all, as has been known since time immemorial.
The first lunacy is that the fixed-point overflow exception is
essentially the same as the floating-point inexact one, but the
former is a serious error and the latter not an error. So should
you trap it or ignore it? The 'solution' is for everyone using
fixed-point to use 128-bit for everything - ha, ha.
The second is that almost all requirements for decimal fixed-
point are legally constrained, and the rules for handling
multiplication, division and precision conversion are legion,
with few of them being available in IEEE 754. Using floating-
point merely introduces the well-known problems of double
rounding.
That's not all, but those are the main ones that I know of.
Commercial databases are a reasonably large market. We had a study that
evaluated where Oracle spent its time. Turns out that a database schema
format "numeric" means decimal and somewhat over a quarter of all
machine cycles were spent in the decimal arithmetic routines.
We had lots of input from the COBOL committee; they are intimately
familiar with the issues of legal requirements for computation. They
were desperate for a standard, and have incorporated 754 into the new
COBOL upgrade. They felt that the facilities in IEEE Decimal were
suitable for programming against all legal requirements.
COBOL is the Rodney Dangerfield of programming languages and gets no
respect. However, it is still true that a huge number of cycles are
executed by COBOL programs, and, within its application domain, COBOL is
quite suitable and preferable to other popular languages. Try writing -
and maintaining - the equivalent of MOVE CORRESPONDING in C and you will
see what I mean.
Without getting into specific language constructs, I'd quibble about
the "huge" number of cycles. Cobol has significant cycle-share only a
small number of platforms, the most significant by far being the IBM
mainframes. Of which there are about 10,000 in the world.
Quadibloc
2014-03-11 05:43:05 UTC
Permalink
Post by Robert Wessel
Without getting into specific language constructs, I'd quibble about
the "huge" number of cycles. Cobol has significant cycle-share only a
small number of platforms, the most significant by far being the IBM
mainframes. Of which there are about 10,000 in the world.
Yes, that is true.

However, if you look at older microcomputer databases, like dBase II, they did their arithmetic in decimal, however clumsy that might have been on an 8080 or an 8086.

While newer database programs do have binary types, in general there is a preference for keeping numbers in (unpacked!) decimal form in a database.

Commercial applications, even if not done on mainframes specifically designed to perform them well, do use up a great deal of computer power. So if somebody were to make microprocessors that did decimal well, but which sold at commodity chip prices instead of mainframe prices, there would be a market.

Given the huge economies of scale in chip manufacture, though, that market indeed might not be economical to support, unfortunately. Decimal arithmetic on FPGAs anyone?

John Savard
Robert Wessel
2014-03-11 06:39:17 UTC
Permalink
On Mon, 10 Mar 2014 22:43:05 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Robert Wessel
Without getting into specific language constructs, I'd quibble about
the "huge" number of cycles. Cobol has significant cycle-share only a
small number of platforms, the most significant by far being the IBM
mainframes. Of which there are about 10,000 in the world.
Yes, that is true.
However, if you look at older microcomputer databases, like dBase II, they did their arithmetic in decimal, however clumsy that might have been on an 8080 or an 8086.
While newer database programs do have binary types, in general there is a preference for keeping numbers in (unpacked!) decimal form in a database.
Commercial applications, even if not done on mainframes specifically designed to perform them well, do use up a great deal of computer power. So if somebody were to make microprocessors that did decimal well, but which sold at commodity chip prices instead of mainframe prices, there would be a market.
I'm with Nick here, I just have trouble seeing that the *decimal* part
of the vast majority of commercial loads is really that significant.
Between the high I/O loads and the low IPCs of most commercial loads,
the decimal math, which should actually run at quite high IPCs, and
cache near perfectly, should normally be a pretty small part of the
load. Some of the very high estimates you hear (like a quarter of the
cycles), have to be (significant) outliers.
Post by Quadibloc
Given the huge economies of scale in chip manufacture, though, that market indeed might not be economical to support, unfortunately. Decimal arithmetic on FPGAs anyone?
Eh. There's plenty of stuff with limited applicability in ISAs like
x86 already. If Intel could make x86s a third faster running Oracle
or SQL Server with a modest investment like that, they'd be all over
it. And ISAs like POWER and SPARC, that have always hosted
significant database/commercial loads, have only recently grown
decimal support.
Ivan Godard
2014-03-11 06:52:31 UTC
Permalink
Post by Robert Wessel
On Mon, 10 Mar 2014 22:43:05 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Robert Wessel
Without getting into specific language constructs, I'd quibble
about the "huge" number of cycles. Cobol has significant
cycle-share only a small number of platforms, the most
significant by far being the IBM mainframes. Of which there are
about 10,000 in the world.
Yes, that is true.
However, if you look at older microcomputer databases, like dBase
II, they did their arithmetic in decimal, however clumsy that might
have been on an 8080 or an 8086.
While newer database programs do have binary types, in general
there is a preference for keeping numbers in (unpacked!) decimal
form in a database.
Commercial applications, even if not done on mainframes
specifically designed to perform them well, do use up a great deal
of computer power. So if somebody were to make microprocessors that
did decimal well, but which sold at commodity chip prices instead
of mainframe prices, there would be a market.
I'm with Nick here, I just have trouble seeing that the *decimal*
part of the vast majority of commercial loads is really that
significant. Between the high I/O loads and the low IPCs of most
commercial loads, the decimal math, which should actually run at
quite high IPCs, and cache near perfectly, should normally be a
pretty small part of the load. Some of the very high estimates you
hear (like a quarter of the cycles), have to be (significant)
outliers.
Try doing a range join on a million rows. All of disk and memory are
completely streamed, but you still have to do 10^6 X 10^6 X 2 compares.
Decimal compares.

And a million rows is a small database.
Post by Robert Wessel
Post by Quadibloc
Given the huge economies of scale in chip manufacture, though, that
market indeed might not be economical to support, unfortunately.
Decimal arithmetic on FPGAs anyone?
Eh. There's plenty of stuff with limited applicability in ISAs like
x86 already. If Intel could make x86s a third faster running Oracle
or SQL Server with a modest investment like that, they'd be all over
it.
Exactly. And they *were* all over it, to the point of arguably packing
the standards committee and subverting the process. I was there.
Lawyers, and salaries for a few meetings, are cheaper than a hunk of
hardware as big as four FPUs (128 bits, remember - and more if vectors
are supported) that is of use for only a specialized market, however big.

Post by Robert Wessel
And ISAs like POWER and SPARC, that have always hosted
significant database/commercial loads, have only recently grown
decimal support.
There hasn't been a standard to support for very long. The former 854
standard wasn't well-defined, so people made do with BCD and
idiosyncratic ad-hocery.
Nick Maclaren
2014-03-11 10:01:54 UTC
Permalink
Post by Ivan Godard
Post by Robert Wessel
I'm with Nick here, I just have trouble seeing that the *decimal*
part of the vast majority of commercial loads is really that
significant. Between the high I/O loads and the low IPCs of most
commercial loads, the decimal math, which should actually run at
quite high IPCs, and cache near perfectly, should normally be a
pretty small part of the load. Some of the very high estimates you
hear (like a quarter of the cycles), have to be (significant)
outliers.
Try doing a range join on a million rows. All of disk and memory are
completely streamed, but you still have to do 10^6 X 10^6 X 2 compares.
Decimal compares.
And a million rows is a small database.
Such comparison is FASTER and more energy efficient using a scaled
integer approach. Even if it were not, to overload a modern CPU needs
an aggregate bandwidth in the TB/s range.
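A minimal C sketch of that scaled-integer compare (assuming both columns
are stored as 64-bit integers at the same fixed scale, e.g. cents; the
names here are illustrative, not from any real codebase):

    #include <stdint.h>
    #include <stdbool.h>

    typedef int64_t money;   /* scaled integer: value in cents */

    /* the join predicate per row: two plain integer compares,
       no decimal alignment or digit-by-digit work needed */
    static inline bool in_range(money value, money lo, money hi)
    {
        return value >= lo && value <= hi;
    }

The compiler keeps everything in registers and can vectorize the loop,
which is exactly why the scaled-integer form is cheaper.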
Post by Ivan Godard
Post by Robert Wessel
And ISAs like POWER and SPARC, that have always hosted
significant database/commercial loads, have only recently grown
decimal support.
There hasn't been a standard to support for very long. The former 854
standard wasn't well-defined, so people made do with BCD and
idiosyncratic ad-hocery.
It was 15 years before even the original 754 standard became almost
universally available, and that is still only at the hardware level.
As far as I know, no language yet supports it in full, nor even has
plans to.

No, this was not a requirement looking for a solution.


Regards,
Nick Maclaren.
Terje Mathisen
2014-03-11 10:42:42 UTC
Permalink
Post by Nick Maclaren
Post by Ivan Godard
Try doing a range join on a million rows. All of disk and memory are
completely streamed, but you still have to do 10^6 X 10^6 X 2 compares.
Decimal compares.
And a million rows is a small database.
Such comparison is FASTER and more energy efficient using a scaled
integer approach. Even if it were not, to overload a modern CPU needs
an aggregate bandwidth in the TB/s range.
Not only that:

If I had to implement this as a really time-critical operation, I would
look into using the parallel range compare operation, i.e. the SSE
opcode which allows you to set up an array of ranges and compare all
against all.

(BTW, a range compare of bcd data can be handled just fine with a binary
compare as long as the endianness is consistent.)
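A tiny C sketch of why that works, assuming unsigned packed BCD fields of
equal length and scale, stored most-significant digit first:

    #include <string.h>
    #include <stdbool.h>

    /* digit order matches byte order, so a raw byte compare
       gives the numeric order directly */
    static bool bcd_less(const unsigned char *a, const unsigned char *b,
                         size_t n)
    {
        return memcmp(a, b, n) < 0;
    }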

I would have to block one of the tables into L2-sized chunks, then
stream the second table past the first one once for each chunk.
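In rough C terms that blocking looks something like this (the sizes, types
and function name are placeholders, not anything real):

    #include <stddef.h>
    #include <stdint.h>

    typedef int64_t money;               /* scaled integer, same scale */
    #define BLOCK (64 * 1024)            /* "fits comfortably in L2"   */

    /* count matches of a.value against [b.min, b.max]: block table A
       into L2-sized chunks, then stream all of B past each chunk so
       the chunk stays cache-resident for the whole pass */
    size_t range_join_count(const money *a_val, size_t na,
                            const money *b_min, const money *b_max,
                            size_t nb)
    {
        size_t hits = 0;
        for (size_t a0 = 0; a0 < na; a0 += BLOCK) {
            size_t a1 = a0 + BLOCK < na ? a0 + BLOCK : na;
            for (size_t j = 0; j < nb; j++)
                for (size_t i = a0; i < a1; i++)
                    if (a_val[i] >= b_min[j] && a_val[i] <= b_max[j])
                        hits++;
        }
        return hits;
    }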

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
George Neuner
2014-03-11 23:38:08 UTC
Permalink
On Tue, 11 Mar 2014 11:42:42 +0100, Terje Mathisen
Post by Terje Mathisen
Post by Nick Maclaren
Post by Ivan Godard
Try doing a range join on a million rows. All of disk and memory are
completely streamed, but you still have to do 10^6 X 10^6 X 2 compares.
Decimal compares.
And a million rows is a small database.
Such comparison is FASTER and more energy efficient using a scaled
integer approach. Even if it were not, to overload a modern CPU needs
an aggregate bandwidth in the TB/s range.
If I had to implement this as a really time-critical operation, I would
look into using the parallel range compare operation, i.e. the SSE
opcode which allows you to set up an array of ranges and compare all
against all.
The problem is that only column-store RDBMS naturally keep data in a
format that makes corresponding fields in different records friendly
to processing with short-vector SIMD.

Obviously [unless your stable storage is SSDs] you can afford to
reformat data during I/O - the extra processing will add little to the
already huge I/O overhead. But regardless of which primary format you
have, some fundamental relational operation on it already is slow:
projection for row-store, selection for column-store, both if you try
to be cute by storing data items individually and simulating rows and
columns using indexes.

Using a non-relational DBMS doesn't help: every NoSQL storage scheme
I've seen spreads corresponding data the same as in a row-store RDBMS.
Post by Terje Mathisen
(BTW, a range compare of bcd data can be handled just fine with a binary
compare as long as the endianness is consistent.)
I would have to block one of the tables into L2-sized chunks, then
stream the second table past the first one once for each chunk.
Terje
George
Robert Wessel
2014-03-11 17:33:45 UTC
Permalink
Post by Ivan Godard
Post by Robert Wessel
On Mon, 10 Mar 2014 22:43:05 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Robert Wessel
Without getting into specific language constructs, I'd quibble
about the "huge" number of cycles. Cobol has significant
cycle-share on only a small number of platforms, the most
significant by far being the IBM mainframes. Of which there are
about 10,000 in the world.
Yes, that is true.
However, if you look at older microcomputer databases, like dBase
II, they did their arithmetic in decimal, however clumsy that might
have been on an 8080 or an 8086.
While newer database programs do have binary types, in general
there is a preference for keeping numbers in (unpacked!) decimal
form in a database.
Commercial applications, even if not done on mainframes
specifically designed to perform them well, do use up a great deal
of computer power. So if somebody were to make microprocessors that
did decimal well, but which sold at commodity chip prices instead
of mainframe prices, there would be a market.
I'm with Nick here, I just have trouble seeing that the *decimal*
part of the vast majority of commercial loads is really that
significant. Between the high I/O loads and the low IPCs of most
commercial loads, the decimal math, which should actually run at
quite high IPCs, and cache near perfectly, should normally be a
pretty small part of the load. Some of the very high estimates you
hear (like a quarter of the cycles), have to be (significant)
outliers.
Try doing a range join on a million rows. All of disk and memory are
completely streamed, but you still have to do 10^6 X 10^6 X 2 compares.
Decimal compares.
And a million rows is a small database.
"Range join" means several different things to me. Do you mean a
query along the lines of: "SELECT ... INNER JOIN ... WHERE a.value >=
b.min AND a.value <= b.max"?

First, that's not really a very common thing to do. Second, if the two
tables are both too big to fit in memory, performance is going to be
horrible no matter what. If table A fits in memory, I'd expect the
optimizer to build a temporary index on a.value, and even if it's still
stuck doing a huge scan, the comparison, at least if the scaling is
the same, is only moderately more complex than a string comparison,
and would likely be modest compared to the overhead of processing all
10**12 combinations of records to compare.

So unless you mean something else, it's an outlier for usage, and not
really a good example of high overhead anyway.
George Neuner
2014-03-12 16:32:43 UTC
Permalink
On Tue, 11 Mar 2014 12:33:45 -0500, Robert Wessel
Post by Robert Wessel
Post by Ivan Godard
Try doing a range join on a million rows. All of disk and memory are
completely streamed, but you still have to do 10^6 X 10^6 X 2 compares.
Decimal compares.
And a million rows is a small database.
"Range join" means several different things to me. Do you mean a
query along the lines of: "SELECT ... INNER JOIN ... WHERE a.value >=
b.min AND a.value <= b.max"?
First, that's not really a very common thing to do,
Actually that's a common way to correlate dates or timestamps.
However, dates and timestamps are small values.

What isn't common (or shouldn't be) is to join tables that have very
many columns - the join result may have many columns but the source
tables ideally should not. If a solution requires joining very large
multicolumn tables then the DB design is poor and/or the question
being asked is wrong.

In any event, a sane person would project both A and B to eliminate
unnecessary columns, select the relevant ranges from both projections
and join those. Even so, it might touch a very large amount of data.
Post by Robert Wessel
So unless you mean something else, it's an outlier for usage, and not
really a good example of high overhead anyway.
Joins are the most CPU intensive operations in RDBMS. The mark of a
well written query is how *little* data it brings to a join.

George
Terje Mathisen
2014-03-12 17:30:32 UTC
Permalink
Post by George Neuner
On Tue, 11 Mar 2014 12:33:45 -0500, Robert Wessel
In any event, a sane person would project both A and B to eliminate
unnecessary columns, select the relevant ranges from both projections
and join those. Even so, it might touch a very large amount of data.
Post by Robert Wessel
So unless you mean something else, it's an outlier for usage, and not
really a good example of high overhead anyway.
Joins are the most CPU intensive operations in RDBMS. The mark of a
well written query is how *little* data it brings to a join.
Which is why any join that the query optimizer decides it needs a
table scan to perform is a red flag.

This particular 1e12 operation would pretty much have to consist of a
match between two indexed columns, reducing the total data amount to a
few MB.

(I'm assuming you don't end up selecting every possible pairing, since
1e12 actual records will take a while to deliver and fill up your disk
space while doing so! :-) )

Terje
Ivan Godard
2014-03-10 20:52:32 UTC
Permalink
Post by Robert Wessel
Post by Nick Maclaren
Post by Ivan Godard
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
Because this is simply a wheel, and not a wheel that can take multiple
forms, including the ability to extend legs to cover terrain that
wheels can't. That aspect I can't fault!
However, I accept that they should have at least taken note of the
fact that IEEE 754 decimal floating-point has received a resounding
yawn (when not actually a Bronx cheer) from most IT communities as
an indication that this might well be a solution desperately looking
for a requirement.
I think they did ("A later revision of IEEE 754 attempted to remedy
this, but the formats it recommended were so inefficient that it has
not found much acceptance.").
As for acceptance, Fujitsu has recently announced adding it to SPARC,
so there are now three ISAs implementing it in hardware. Still, I'm
puzzled by the whole thing. Proper scaled integer arithmetic is
necessary for certain problems, particularly those involving currency,
but I'm just not sure how much decimal FP, even decimal FP tweaked to
support currency-type operations, really buys you.
Don't be misled by the name: IEEE Decimal is really scaled integer, with
semi-dynamic scaling. You have explicit control over the scaling, so it
is static scaling, but only when you want the control, so it is also
dynamic scaling aka floating point.
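As a toy illustration of that reading (this is just the idea of a
scaled-integer pair, not the IEEE 754 decimal encoding, and the names are
made up):

    #include <stdint.h>

    typedef struct { int64_t coef; int exp; } dec;  /* value = coef * 10^exp */

    /* addition is nothing more than aligning the scales and adding
       the integers; overflow handling is omitted in this sketch */
    static dec dec_add(dec a, dec b)
    {
        while (a.exp > b.exp) { a.coef *= 10; a.exp--; }
        while (b.exp > a.exp) { b.coef *= 10; b.exp--; }
        return (dec){ a.coef + b.coef, a.exp };
    }

Keep the exponent fixed yourself and it behaves as static scaling; let it
move and it behaves as floating point.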
Quadibloc
2014-03-10 17:08:32 UTC
Permalink
Post by Ivan Godard
Why are they reinventing the wheel?
I thought they were reinventing a _different_ wheel. This isn't a decimal floating point type, although they call it one; the significand is a binary integer. JOSS did that; and, of course, the problem is that it's slow, since normalization now requires multiplies or divides instead of shifts.

Works reasonably well in software only for machines with only binary hardware, of course.

John Savard
Ivan Godard
2014-03-10 18:38:32 UTC
Permalink
Post by Quadibloc
Post by Ivan Godard
Why are they reinventing the wheel?
I thought they were reinventing a _different_ wheel. This isn't a
decimal floating point type, although they call it one; the
significand is a binary integer. JOSS did that; and, of course, the
problem is that it's slow, since normalization now requires
multiplies or divides instead of shifts.
Works reasonably well in software only for machines with only binary hardware, of course.
John Savard
754 permits two different representations: IBM-form, otherwise known as
DPD, in which the significand is a set of 10-bit declets holding values
0-999 and the overall value is base-1000; and Intel-form, aka BID, in
which the significand is represented as a binary integer, as in the
proposal.

IBM-form is much easier to implement and the fastest in hardware, and
the hardware is as fast as binary FP. Intel-form is much easier to
implement in software, and is slow but no worse than BCD on machines
with no hardware.

There's nothing in the proposal that has any advantage over standard
Intel-form. Why re-invent the wheel?


p.s. Don't ask why the two forms have company names. Standards is like
sausage-making :-(
Michael S
2014-03-10 20:54:51 UTC
Permalink
Post by Ivan Godard
Post by Quadibloc
Post by Ivan Godard
Why are they reinventing the wheel?
I thought they were reinventing a _different_ wheel. This isn't a
decimal floating point type, although they call it one; the
significand is a binary integer. JOSS did that; and, of course, the
problem is that it's slow, since normalization now requires
multiplies or divides instead of shifts.
Works reasonably well in software only for machines with only binary
hardware, of course.
John Savard
754 permits two different representations: IBM-form, otherwise known as
DPD, in which the significand is a set of 10-bit declets holding values
0-999 and the overall value is base-1000; and Intel-form, aka BID, in
which the significand is represented as a binary integer, as in the
proposal.
IBM-form is much easier to implement and the fastest in hardware, and
the hardware is as fast as binary FP.
I don't know about z10, z196 and Power7, never saw their DFP hardware performance documented.
But I did see Power6 DFP hardware performance documented and it is a lot slower than BFP on the same machine.
Post by Ivan Godard
Intel-form is much easier to
implement in software, and is slow but no worse than BCD on machines
with no hardware.
There's nothing in the proposal that has any advantage over standard
Intel-form. Why re-invent the wheel?
p.s. Don't ask why the two forms have company names. Standards is like
sausage-making :-(
Robert Wessel
2014-03-11 01:17:16 UTC
Permalink
On Mon, 10 Mar 2014 13:54:51 -0700 (PDT), Michael S
Post by Michael S
Post by Ivan Godard
Post by Quadibloc
Post by Ivan Godard
Why are they reinventing the wheel?
I thought they were reinventing a _different_ wheel. This isn't a
decimal floating point type, although they call it one; the
significand is a binary integer. JOSS did that; and, of course, the
problem is that it's slow, since normalization now requires
multiplies or divides instead of shifts.
Works reasonably well in software only for machines with only binary
hardware, of course.
John Savard
754 permits two different representations: IBM-form, otherwise known as
DPD, in which the significand is a set of 10-bit declets holding values
0-999 and the overall value is base-1000; and Intel-form, aka BID, in
which the significand is represented as a binary integer, as in the
proposal.
IBM-form is much easier to implement and the fastest in hardware, and
the hardware is as fast as binary FP.
I don't know about z10, z196 and Power7, never saw their DFP hardware performance documented.
But I did see Power6 DFP hardware performance documented and it is a lot slower than BFP on the same machine.
IBM's first mainframe implementation of DFP was on the z9, as a
millicode retrofit. And yes, it was pretty slow. The z10 had actual
DFP hardware and was much faster. I'm not sure about the early
implementation history on POWER, but if the first generation did
microcode, it could well have been very slow too.
Quadibloc
2014-03-11 05:48:05 UTC
Permalink
Post by Robert Wessel
IBM's first mainframe implementation of DFP was on the z9, as a
millicode retrofit. And yes, it was pretty slow. The z10 had actual
DFP hardware and was much faster.
I thought I had read something about the z10's decimal hardware, and it seemed as though it would have offered performance comparable with that of the old packed decimal instructions instead of something a lot better. But that depends on how _they_ were implemented.

Basically, they did _not_ include things like decimal Wallace Trees. And, yes, they are perfectly feasible.

http://www.quadibloc.com/comp/cp0202.htm

shows the logic diagram of a decimal carry-save adder.
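A software analogue of that digit-wise carry handling is the classic
bias-and-correct trick (a well-known bit-twiddling sketch, not anything
from the linked page; it assumes unsigned packed BCD and that the
16-digit sum does not overflow):

    #include <stdint.h>

    /* add two 16-digit unsigned packed-BCD numbers held in 64-bit words */
    static uint64_t bcd_add16(uint64_t a, uint64_t b)
    {
        uint64_t t1 = a + 0x0666666666666666ULL;   /* bias each digit by 6  */
        uint64_t t2 = t1 + b;                      /* binary add            */
        uint64_t t3 = t1 ^ b;                      /* sum without carries   */
        uint64_t t4 = t2 ^ t3;                     /* where carries crossed */
        uint64_t t5 = ~t4 & 0x1111111111111110ULL; /* digits with no carry  */
        uint64_t t6 = (t5 >> 2) | (t5 >> 3);       /* a 6 for each of those */
        return t2 - t6;                            /* remove unused bias    */
    }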

John Savard
Robert Wessel
2014-03-11 06:46:47 UTC
Permalink
On Mon, 10 Mar 2014 22:48:05 -0700 (PDT), Quadibloc
Post by Quadibloc
Post by Robert Wessel
IBM's first mainframe implementation of DFP was on the z9, as a
millicode retrofit. And yes, it was pretty slow. The z10 had actual
DFP hardware and was much faster.
I thought I had read something about the z10's decimal hardware, and it seemed as though it would have offered performance comparable with that of the old packed decimal instructions instead of something a lot better. But that depends on how _they_ were implemented.
Basically, they did _not_ include things like decimal Wallace Trees. And, yes, they are perfectly feasible.
http://www.quadibloc.com/comp/cp0202.htm
shows the logic diagram of a decimal carry-save adder.
We were comparing to BFP, not integer (packed) decimal. Although the
packed instructions haven't been barn burners on any of the Z
machines, it's quite possible that they were still faster than the DFP
instructions on the early implementations. I think the DFP
instructions are generally faster on EC12s than roughly equivalent
packed instructions, although it's tough to make an exact comparison.
There are certainly cases where even slow DFP instructions would be
significantly faster than the packed instructions, particularly when
dealing with longer numbers, where some quite painful subroutines get
invoked (anything involving multiplications or divisions with numbers
more than about 15 decimal digits long gets painful).
Terje Mathisen
2014-03-11 10:46:18 UTC
Permalink
Post by Robert Wessel
We were comparing to BFP, not integer (packed) decimal. Although the
packed instructions haven't been barn burners on any of the Z
machines, it's quite possible that they were still faster than the DFP
instructions on the early implementations. I think the DFP
instructions are generally faster on EC12s than roughly equivalent
packed instructions, although it's tough to make an exact comparison.
There are certainly cases where even slow DFP instructions would be
significantly faster than the packed instructions, particularly when
dealing with longer numbers, where some quite painful subroutines get
invoked (anything involving multiplications or divisions with numbers
more than about 15 decimal digits long gets painful).
15 digits?

I would expect anything up to 19 to be OK, since that would still allow
a 64-bit binary mantissa to fit in 64 bits.

Above this point and up to the 34-digit max for DFP, you do get some
uglier code, particularly if you need rescaling the results, which you
normally do, right?

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Wessel
2014-03-11 11:18:44 UTC
Permalink
On Tue, 11 Mar 2014 11:46:18 +0100, Terje Mathisen
Post by Terje Mathisen
Post by Robert Wessel
We were comparing to BFP, not integer (packed) decimal. Although the
packed instructions haven't been barn burners on any of the Z
machines, it's quite possible that they were still faster than the DFP
instructions on the early implementations. I think the DFP
instructions are generally faster on EC12s than roughly equivalent
packed instructions, although it's tough to make an exact comparison.
There are certainly cases where even slow DFP instructions would be
significantly faster than the packed instructions, particularly when
dealing with longer numbers, where some quite painful subroutines get
invoked (anything involving multiplications or divisions with numbers
more than about 15 decimal digits long gets painful).
15 digits?
I would expect anything up to 19 to be OK, since that would still allow
a 64-bit binary mantissa to fit in 64 bits.
Above this point and up to the 34-digit max for DFP, you do get some
uglier code, particularly if you need rescaling the results, which you
normally do, right?
Again, I'm talking about the packed instructions. While these
nominally take (up to) 16 byte operands, for multiplication and
division the operands have considerable restrictions on the
combination of sizes.

For example, the first operand in a MP (multiply (packed) decimal)
must have enough leading zeros so that the result doesn't overflow
("The multiplicand must have at least as many bytes of leftmost zeros
as the number of bytes in the multiplier;"), this effectively limits
the number of digits in the multiplier to 15.

DP splits the first operand into two separate fields to store both the
quotient and remainder in the (up to) 16 byte field ("The quotient is
placed leftmost in the first-operand location. The number of bytes in
the quotient field is equal to the difference between the dividend and
divisor lengths (L1 - L2). The remainder is placed rightmost in the
first-operand location and has a length equal to the divisor
length.").

That's not the complete set of rules, but it gets close. That
requires that operations with operands longer than about 15 digits be
done with multiple-precision subroutines. Still in decimal, of
course. DFP has the advantage that some of those restrictions are
effectively eased with the longer formats.
Terje Mathisen
2014-03-11 12:58:11 UTC
Permalink
Post by Robert Wessel
On Tue, 11 Mar 2014 11:46:18 +0100, Terje Mathisen
Post by Terje Mathisen
15 digits?
I would expect anything up to 19 to be OK, since that would still allow
a 64-bit binary mantissa to fit in 64 bits.
Above this point and up to the 34-digit max for DFP, you do get some
uglier code, particularly if you need rescaling the results, which you
normally do, right?
Again, I'm talking about the packed instructions. While these
nominally take (up to) 16 byte operands, for multiplication and
division the operands have considerable restrictions on the
combination of sizes.
OK, I was thinking about something in BID format, i.e. decimal with a
binary mantissa.

(Nearly) all your considerations below are the same for DPD and BID
format storage.
Post by Robert Wessel
For example, the first operand in a MP (multiply (packed) decimal)
must have enough leading zero so that a the result doesn't overflow
("The multiplicand must have at least as many bytes of leftmost zeros
as the number of bytes in the multiplier;"), this effectively limits
the number of digits in the multiplier to 15.
This is just like in the old days when I wrote out multiplication
problems on paper with 5mm square cells: You had to start so far to the
right that there would be room for the product, something you got if you
wrote the two numbers one after the other.
Post by Robert Wessel
DP splits the first operand into two separate fields to store both the
quotient and remainder in the (up to) 16 byte field ("The quotient is
placed leftmost in the first-operand location. The number of bytes in
the quotient field is equal to the difference between the dividend and
divisor lengths (L1 - L2). The remainder is placed rightmost in the
first-operand location and has a length equal to the divisor
length.").
Right, the remainder has to be in the [0..divisor> range.
Post by Robert Wessel
That's not the complete set of rules, but it gets close. That
requires that operations with operands longer than about 15 digits be
done with multiple-precision subroutines. Still in decimal, of
course. DFP has the advantage that some of those restrictions are
effectively eased with the longer formats.
I would state it slightly differently, in that it is the sum of the # of
digits that matters for multiplication, and for division it is a _lot_
harder if the divisor is longer than the maximum supported by the cpu
hardware.

Using binary mantissas you do get maximum coverage range on a given cpu. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Robert Wessel
2014-03-11 17:16:46 UTC
Permalink
On Tue, 11 Mar 2014 13:58:11 +0100, Terje Mathisen
Post by Terje Mathisen
Post by Robert Wessel
On Tue, 11 Mar 2014 11:46:18 +0100, Terje Mathisen
Post by Terje Mathisen
15 digits?
I would expect anything up to 19 to be OK, since that would still allow
a 64-bit binary mantissa to fit in 64 bits.
Above this point and up to the 34-digit max for DFP, you do get some
uglier code, particularly if you need rescaling the results, which you
normally do, right?
Again, I'm talking about the packed instructions. While these
nominally take (up to) 16 byte operands, for multiplication and
division the operands have considerable restrictions on the
combination of sizes.
OK, I was thinking about something in BID format, i.e. decimal with a
binary mantissa.
(Nearly) all your considerations below are the same for DPD and BID
format storage.
Post by Robert Wessel
For example, the first operand in a MP (multiply (packed) decimal)
must have enough leading zeros so that the result doesn't overflow
("The multiplicand must have at least as many bytes of leftmost zeros
as the number of bytes in the multiplier;"), this effectively limits
the number of digits in the multiplier to 15.
This is just like in the old days when I wrote out multiplication
problems on paper with 5mm square cells: You had to start so far to the
right that there would be room for the product, something you got if you
wrote the two numbers one after the other.
Post by Robert Wessel
DP splits the first operand into two separate fields to store both the
quotient and remainder in the (up to) 16 byte field ("The quotient is
placed leftmost in the first-operand location. The number of bytes in
the quotient field is equal to the difference between the dividend and
divisor lengths (L1 - L2). The remainder is placed rightmost in the
first-operand location and has a length equal to the divisor
length.").
Right, the remainder has to be in the [0..divisor> range.
Post by Robert Wessel
That's not the complete set of rules, but it gets close. That
requires that operations with operands longer than about 15 digits be
done with multiple-precision subroutines. Still in decimal, of
course. DFP has the advantage that some of those restrictions are
effectively eased with the longer formats.
I would state it slightly differently, in that it is the sum of the # of
digits that matters for multiplication, and for division it is a _lot_
harder if the divisor is longer than the maximum supported by the cpu
hardware.
Pretty much, although as I mentioned, there are a few additional rules
to consider (for example the multiplier can't actually be more than 15
digits, although the multiplicand can be - which just means you need
to reverse the multiplication into a temporary). Doing multiple
precision multiplication is a nuisance because of the lack of a good
carry flag and the continuing presence of the sign nibble in all the
actual operations. As you mentioned, multi-precision division is just
a PITA to do well.

And to attach this to a prior thread, the shift-and-round-decimal
instruction actually uses a signed shift count. I really wasn't
thinking about non-binary arithmetic at the time.
Michael S
2014-03-11 09:11:59 UTC
Permalink
Post by Robert Wessel
On Mon, 10 Mar 2014 13:54:51 -0700 (PDT), Michael S
Post by Michael S
Post by Ivan Godard
Post by Quadibloc
Post by Ivan Godard
Why are they reinventing the wheel?
I thought they were reinventing a _different_ wheel. This isn't a
decimal floating point type, although they call it one; the
significand is a binary integer. JOSS did that; and, of course, the
problem is that it's slow, since normalization now requires
multiplies or divides instead of shifts.
Works reasonably well in software only for machines with only binary
hardware, of course.
John Savard
754 permits two different representations: IBM-form, otherwise known as
DPD, in which the significand is a set of 10-bit declets holding values
0-999 and the overall value is base-1000; and Intel-form, aka BID, in
which the significand is represented as a binary integer, as in the
proposal.
IBM-form is much easier to implement and the fastest in hardware, and
the hardware is as fast as binary FP.
I don't know about z10, z196 and Power7, never saw their DFP hardware performance documented.
But I did see Power6 DFP hardware performance documented and it is a lot slower than BFP on the same machine.
IBM's first mainframe implementation of DFP was on the z9, as a
millicode retrofit. And yes, it was pretty slow. The z10 had actual
DFP hardware and was much faster. I'm not sure about the early
implementation history on POWER, but if the first generation did
microcode, it could well have been very slow too.
No, Power6 DFP is in hardware and it is not slow in absolute terms, just not nearly as fast as BFP. Actually, latency-wise the difference is rather small - 1.5x to 3.x depending on the operation. But when it comes to throughput, a single non-pipelined (some of the conversions are pipelined, but according to my understanding, the arithmetic instructions are not), FMA-less DFP unit is simply not in the same class as dual FMA-capable, fully-pipelined BFP.

It seems I heard that at the hardware level the z10 DFP is nearly identical to Power6's, but it is possible that I am confusing DFP with some other HW block.
Robert Wessel
2014-03-10 18:51:03 UTC
Permalink
Post by Ivan Godard
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
I read "A later revision of IEEE 754 attempted to remedy this, but the
formats it recommended were so inefficient that it has not found much
acceptance." as a dismissal of decimal 754 on performance grounds.
Terje Mathisen
2014-03-10 22:13:52 UTC
Permalink
Post by Ivan Godard
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
IEEE754 defines a complete decimal FP type that provides all that this
does. IBM supports the IEEE decimal format in hardware, and there are
quality software implementations available. 754 decimal is even a
supported type on the new Mill CPU. Why are they reinventing the wheel?
Indeed.

This is a bowdlerized version of 64 bit decimal fp, using the legal
binary mantissa format.

Terje
Terje Mathisen
2014-03-10 22:10:20 UTC
Permalink
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
There is a bug in the first and only example. :-)

Terje
Terje Mathisen
2014-03-11 07:04:13 UTC
Permalink
Post by Terje Mathisen
Post by William Edwards
http://dec64.com/
What are people's thoughts on it?
There is a bug in the first and only example. :-)
Since nobody responded to this hint I'll do it myself. :-)

The add example special-cases an exponent of zero, which is going to be
quite rare since that doesn't allow any form of decimal fraction, while
the text explaining the example claims that it works as long as the
exponents are identical (i.e. all calculations in monetary units with N
decimal places would share the same exponent):
Post by Terje Mathisen
; Add rdx to rax.
mov cl,al ; load the exponent of rax into cl
or cl,dl ; then or the two exponents together
jnz slow_path ; if both are zero, take the fast path
add rax,rdx ; add the coefficients together
jo overflow ; if there was no overflow, we are done
Notice that the code is also stupid because it needs to copy the
exponent value out!

You have to use code like this instead:

; adding rdx to rax

cmp al,dl ;; Comparing the exponents
jne slow_add ;; Slow code if unequal...

;; MOV AL,0 or XOR AL,AL would do the same but introduce a potential
;; Partial Register Stall so we need a 56-bit mask. :-(
and rax, NOT 255 ;; Mask away the exponent field of the target

add rax,rdx ;; Adds the mantissas, fixes exp!
jo overflow_add

This can in fact run in two cycles which is pretty good.
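For comparison, a rough C rendering of that fast path (dec64_add_slow is
an assumed helper for the mismatched-exponent and overflow cases,
__builtin_add_overflow is GCC/Clang-specific, and NaN handling is ignored
just as in the asm above):

    #include <stdint.h>

    typedef int64_t dec64;   /* 56-bit coefficient in the high bits,
                                8-bit exponent in the low byte */

    extern dec64 dec64_add_slow(dec64 a, dec64 b);   /* assumed helper */

    dec64 dec64_add(dec64 a, dec64 b)
    {
        if ((uint8_t)a != (uint8_t)b)        /* exponents differ: slow path */
            return dec64_add_slow(a, b);
        int64_t sum;
        /* mask the exponent out of one operand; a single binary add then
           sums the coefficients and carries the shared exponent through */
        if (__builtin_add_overflow(a & ~(int64_t)0xFF, b, &sum))
            return dec64_add_slow(a, b);     /* coefficient overflowed */
        return (dec64)sum;
    }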

The one significant speedup their format offers comes from allowing
arbitrary normalization, so that for potentially many (or even all)
operations in a process you can avoid decimal scaling operations.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"