Discussion:
Radians Or Degrees?
Michael S
2024-02-22 21:39:12 UTC
Permalink
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1). For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
You are working with approximations anyway. That those approximations
happen to exactly equal some random value seems irrelevant.
As to real-world use, how about any physical phenomena where one is
interested in the resonance frequencies of the system. For a simple
example see https://www.feynmanlectures.caltech.edu/I_49.html where
one might write f(x) = sin(kx) = sin(pi * (2*x/L)) with L a length
of say a clamped string.
I don’t see a formula anything like that anywhere on that page. On
the other hand, I see lots of use of “ω” symbols, representing
frequencies in radians per second.
There are also uses with computing other functions, e.g., the true
gamma function via the Euler reflection formula.
gamma(x) * gamma(1 - x) = pi / sin(pi * x) = pi / sinpi(x)
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the dozens
or hundreds of examples of the usefulness of radians as an angle unit?
In digital signal processing, circle-based units are pretty much always
more natural than radians.
For the specific case of a 1/2 circle, I can't see where it would be used
directly.
From an algorithmic perspective, a full circle looks like the most obvious
choice.
From a [binary floating point] numerical-properties perspective, 1/8th of
the circle (== pi/4 radians = 45 degrees) is probably the best option
for a library routine, because the derivative of sin() at 0 is then the
closest to 1 among all power-of-two scalings, which means that the loss
of precision near 0 is very similar for input and for output. But this
advantage does not sound like a particularly big deal.
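As a minimal sketch of what circle-based units buy you in code (my own
illustration, nothing from an actual library): a DSP-style oscillator that
keeps its phase in turns wraps with a plain subtraction, and pi only shows
up in the final sin() call; a sinpi()-style routine would take 2*phase
directly.

#include <math.h>

/* Phase accumulator in turns (1.0 == one full circle).  The wrap is an
 * exact floating-point operation; dphase would be f / sample_rate. */
typedef struct { double phase, dphase; } osc;

static double osc_next(osc *o)
{
    double s = sin(2.0 * 3.14159265358979323846 * o->phase);
    o->phase += o->dphase;
    if (o->phase >= 1.0)
        o->phase -= 1.0;        /* exact wrap, no fmod() against 2*pi */
    return s;
}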
MitchAlsup1
2024-02-22 22:57:20 UTC
Permalink
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1).
You are better off with r in [-½,+½]
Post by Michael S
For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
I have a patent on doing argument reduction for sin(x) in 4 cycles...that
is as good as Payne-Hanek argument reduction over -infinity..+infinity
Post by Michael S
You are working with approximations anyway. That those approximations
happen to exactly equal some random value seems irrelevant.
As to real-world use, how about any physical phenomena where one is
interested in the resonance frequencies of the system. For a simple
example see https://www.feynmanlectures.caltech.edu/I_49.html where
one might write f(x) = sin(kx) = sin(pi * (2*x/L)) with L a length
of say a clamped string.
I don’t see a formula anything like that anywhere on that page. On
the other hand, I see lots of use of “ω” symbols, representing
frequencies in radians per second.
There are also uses with computing other functions, e.g., the true
gamma function via the Euler reflection formula.
gamma(x) * gamma(1 - x) = pi / sin(pi * x) = pi / sinpi(x)
I would not have expected the magnitude of gamma(x) * gamma(1 - x)
to always be greater than pi....but there it is (|sinpi(x)| <= 1, so
|pi / sinpi(x)| >= pi wherever it is defined).
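A quick sanity check of the reflection formula needs nothing beyond
<math.h> (an illustrative snippet using C99 tgamma(); it makes no attempt
to handle the poles at integer x):

#include <math.h>
#include <stdio.h>

/* Compare gamma(x)*gamma(1-x) with pi/sin(pi*x) for a few non-integer x. */
int main(void)
{
    const double pi = 3.14159265358979323846;
    const double xs[] = { 0.5, 0.25, 1.75, -2.3, 7.1 };
    for (int i = 0; i < 5; i++) {
        double x   = xs[i];
        double lhs = tgamma(x) * tgamma(1.0 - x);
        double rhs = pi / sin(pi * x);
        printf("x = %5.2f   gamma*gamma = %14.6e   pi/sinpi = %14.6e\n",
               x, lhs, rhs);
    }
    return 0;
}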
Post by Michael S
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the dozens
or hundreds of examples of the usefulness of radians as an angle unit?
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious
choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like particularly big deal.
I should remind you that my patents include methods for calculating sin()
and cos() in 18 cycles, and sinpi() and cospi() in 14 cycles, while achieving
an accuracy of 0.5002 bits of error. All of this is in HW, and all
carried out in precision wider than IEEE 754 with a bunch of HW tricks
not available to SW.

At this performance level, let the algorithms wanting pi-based trigonometry
have it and those wanting unit-based trigonometry have it too. Let
programmers use what is easiest for them.
Steven G. Kargl
2024-02-23 00:13:02 UTC
Permalink
Post by MitchAlsup1
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1).
You are better off with r in [-½,+½]
Partial implementation details for single precision C code (trying
to keep this on topic for c.l.c) are at
https://cgit.freebsd.org/src/tree/lib/msun/src/s_sinpif.c

One uses sinpi(-|x|) = -sinpi(|x|), which naturally gets one to
r = x - floor(x) in [0,1). One then uses kernels for sin() and
cos() to do the actual computation.

r in [0,0.25) is ksin(r).
r in [0.25,0.5) is kcos(0.5-r).
r in [0.5,0.75) is kcos(r - 0.5).
r in [0.75,1) is ksin(1 - r).
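A stripped-down sketch of that dispatch, with sinf(pi*t)/cosf(pi*t)
standing in for the real ksin()/kcos() polynomial kernels (so this shows
only the control flow, not the msun implementation):

#include <math.h>

static const float PI_F = 3.14159265358979323846f;
static float ksin_stub(float t) { return sinf(PI_F * t); }  /* kernel stand-in */
static float kcos_stub(float t) { return cosf(PI_F * t); }  /* kernel stand-in */

static float sinpif_sketch(float x)
{
    if (x < 0.0f)                     /* sinpi(-|x|) = -sinpi(|x|) */
        return -sinpif_sketch(-x);
    float r = x - floorf(x);          /* r in [0,1) */
    if (r < 0.25f) return ksin_stub(r);
    if (r < 0.50f) return kcos_stub(0.5f - r);
    if (r < 0.75f) return kcos_stub(r - 0.5f);
    return ksin_stub(1.0f - r);
}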
Post by MitchAlsup1
Post by Michael S
For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
I have a patent on doing argument reduction for sin(x) in 4 cycles...that
is as good as Payne-Hanek argument reduction over -infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the
top of the stack of papers on my desk. Still need to internalize
the details.

(much trimmed for brevity)
Post by MitchAlsup1
Post by Michael S
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious
choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like particularly big deal.
I should remind that my patents include methods for calculating sin()
and cos() in 18 cycles, sinpi() and cospi() in 14 cycles while achieving
an accuracy of 0.5002 bits of error. All of this is in HW, and all
carried out in precision wider than IEEE 754 with a bunch of HW tricks
not available to SW.
At this performance level, let the algorithms wanting pi based trigonometry
have it and those wanting unit-based trigonometry have it too. Let
programmers use what is easiest for them.
Agreed a programmer should use what is required by the problem
that they are solving. I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
--
steve
Chris M. Thomasson
2024-02-23 00:49:25 UTC
Permalink
Post by Steven G. Kargl
Post by MitchAlsup1
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1).
You are better off with r in [-½,+½]
Partial implementation details for single precision C code (trying
to keep this on topic for c.l.c) are at
https://cgit.freebsd.org/src/tree/lib/msun/src/s_sinpif.c
One uses sinpi(-|x|) = -sinpi(|x|), which naturally gets one to
r = x - floor(x) in [0,1). One then uses kernels for sin() and
cos() to do the actual computation.
r in [0,0.25) is ksin(r).
r in [0.25,0.5) is kcos(0.5-r).
r in [0.5,0.75) is kcos(r - 0.5).
r in [0.75,1) is ksin(1 - r).
Post by MitchAlsup1
Post by Michael S
For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
I have a patent on doing argument reduction for sin(x) in 4 cycles...that
is as good as Payne-Hanek argument reduction over -infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the
top of the stack of papers on my desk. Still need to internalize
the details.
(much trimmed for brevity)
Post by MitchAlsup1
Post by Michael S
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious
choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like particularly big deal.
I should remind that my patents include methods for calculating sin()
and cos() in 18 cycles, sinpi() and cospi() in 14 cycles while achieving
an accuracy of 0.5002 bits of error. All of this is in HW, and all
carried out in precision wider than IEEE 754 with a bunch of HW tricks
not available to SW.
At this performance level, let the algorithms wanting pi based trigonometry
have it and those wanting unit-based trigonometry have it too. Let
programmers use what is easiest for them.
Agreed a programmer should use what is required by the problem
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indeed! :^)
Post by Steven G. Kargl
that they are solving. I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
Use some form of arbitrary precision if you want to get to:

A 10^4141 mset zoom!? It must have taken many hours to render... ;^)


Chris M. Thomasson
2024-02-23 00:59:37 UTC
Permalink
Post by Chris M. Thomasson
A 10^4141 mset zoom!? It must have taken many hours to render... ;^)
http://youtu.be/Xjy_HSUujaw
A 10^275 zoom:


MitchAlsup1
2024-02-23 02:42:23 UTC
Permalink
Post by Chris M. Thomasson
Post by Chris M. Thomasson
A 10^4141 mset zoom!? It must have taken many hours to render... ;^)
http://youtu.be/Xjy_HSUujaw
http://youtu.be/0jGaio87u3A
Argument reduction zoom::

COS( 6381956970095103×2^797) = -4.68716592425462761112×10^-19


You can argue all you want about whether this has any significance, but
it happens to be the value closest to the cosine of that radian argument
when computed with infinite precision.
Chris M. Thomasson
2024-02-23 20:28:40 UTC
Permalink
Post by MitchAlsup1
Post by Chris M. Thomasson
Post by Chris M. Thomasson
A 10^4141 mset zoom!? It must have taken many hours to render... ;^)
http://youtu.be/Xjy_HSUujaw
http://youtu.be/0jGaio87u3A
COS( 6381956970095103×2^797) = -4.68716592425462761112×10^-19
You can argue all you want about whether this has any significance, but
it happens to be the value closest to that radian argument when performed
with infinite precision.
Well, I use a lot of trig wrt my work. One example is getting at the
roots of complex numbers, polar form to rectangular form, etc... So
zooming in "deep" on say, something like my MultiJulia IFS work is going
to require arbitrary precision trig...

https://paulbourke.net/fractals/multijulia

Btw, my friend Paul took the time to create this nice write up and all
of the renders of my MultiJulia IFS experiment.
MitchAlsup1
2024-02-23 02:28:07 UTC
Permalink
Post by Steven G. Kargl
Post by MitchAlsup1
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
Apparently, you missed the part about argument reduction. For
sinpi(x), it is fairly easy to reduce x = n + r with n an integer
and r in [0,1).
You are better off with r in [-½,+½]
Partial implementation details for single precision C code (trying
to keep this on topic for c.l.c) are at
https://cgit.freebsd.org/src/tree/lib/msun/src/s_sinpif.c
One uses sinpi(-|x|) = -sinpi(|x|), which naturally gets one to
r = x - floor(x) in [0,1). One then uses kernels for sin() and
cos() to do the actual computation.
r in [0,0.25) is ksin(r).
r in [0.25,0.5) is kcos(0.5-r).
r in [0.5,0.75) is kcos(r - 0.5).
r in [0.75,1) is ksin(1 - r).
By "better off" I mean you have ½-bit of greater reduced argument precision
when the reduced argument is centered on zero.
Post by Steven G. Kargl
Post by MitchAlsup1
Post by Michael S
For the extended interval, x in [0,2^23], there are
roughly 2^23 values with r = 0.5; and thus, sinpi(x) = 1 exactly.
There are no such values for sin(x), and argument reduction for
sin(x) is much more involved.
I have a patent on doing argument reduction for sin(x) in 4 cycles...that
is as good as Payne-Hanek argument reduction over -infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the
top of the stack of papers on my desk. Still need to internalize
the details.
Here is the version found in the SUN transcendentals::

static void ReduceFull(double *xp, int *a, double x)
{
    Double X = { x };                      /* bit-field view of x (see the union sketch below) */
    int ec = X.s.exponent - (1023+33);     /* unbiased exponent of x, offset by 33 */
    int k = (ec + 26) * (607*4) >> 16;     /* ~ (ec + 26)/27: selects the window into the table */
    int m = 27*k - ec;
    int offset = m >> 3;
    x *= 0x1p-400;
    double xDekker = x * (0x1p27 + 1);     /* Dekker split of x into x0 (high) + x1 (low) */
    double x0 = xDekker - (xDekker - x);
    double x1 = x - x0;
    const double *p0 = &TwoOverPiWithOffset[offset][k]; // 180 DP FP numbers
    const double fp0 = p0[0];
    const double fp1 = p0[1];
    const double fp2 = p0[2];
    const double fp3 = p0[3];
    const double f0 = x1 * fp0 + fp1 * x0; /* partial products of x times 2/pi */
    double f = x1 * fp1 + fp2 * x0;
    const double fi = f0 + f;
    static const double IntegerBias = 0x1.8p52;
    Double Fi = { fi + IntegerBias };      /* push the integer part into the low significand bits */
    *a = Fi.s.significand2;                /* integer part (low bits) of x * (2/pi) */
    double fint = Fi.d - IntegerBias;
    const double fp4 = p0[4];
    const double fp5 = p0[5];
    const double fp6 = p0[6];
    f = f0 - fint + f;                     /* keep only the fraction, then fold in the tail terms */
    f += x1 * fp2 + fp3 * x0;
    f += x1 * fp3 + fp4 * x0;
    f += x1 * fp4 + fp5 * x0;
    f += x1 * fp5 + fp6 * x0;
    *xp = f * 0x3.243F6A8885A3p-1;         /* scale the fraction of a quadrant by pi/2 */
}

Which I can do in 4 cycles in HW !! {{I happen to have this on hand to explain how
I can do this in HW in 4 cycles......}}

I should also note that the conversion of f back into *xp loses ¼ bit of precision.
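For anyone puzzled by the Double type above: the routine evidently relies
on a union that exposes the IEEE 754 binary64 fields, something along these
lines (my reconstruction for a little-endian machine, not the original
header):

#include <stdint.h>

/* Assumed layout: the first member lets 'Double X = { x };' store the
 * double, while the bit-fields give access to the biased exponent and to
 * the low 32 significand bits used to pull the integer part out of
 * fi + 0x1.8p52. */
typedef union {
    double d;
    struct {
        uint64_t significand2 : 32;   /* low 32 bits of the significand  */
        uint64_t significand1 : 20;   /* high 20 bits of the significand */
        uint64_t exponent     : 11;   /* biased exponent                 */
        uint64_t sign         : 1;
    } s;
} Double;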
Post by Steven G. Kargl
(much trimmed for brevity)
Post by MitchAlsup1
Post by Michael S
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious
choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like particularly big deal.
I should remind that my patents include methods for calculating sin()
and cos() in 18 cycles, sinpi() and cospi() in 14 cycles while achieving
an accuracy of 0.5002 bits of error. All of this is in HW, and all
carried out in precision wider than IEEE 754 with a bunch of HW tricks
not available to SW.
At this performance level, let the algorithms wanting pi based trigonometry
have it and those wanting unit-based trigonometry have it too. Let
programmers use what is easiest for them.
Agreed a programmer should use what is required by the problem
that they are solving. I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754 precision.
Terje Mathisen
2024-02-23 10:10:00 UTC
Permalink
Post by MitchAlsup1
Post by Steven G. Kargl
Agreed a programmer should use what is required by the problem
that they are solving.  I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754 precision.
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.

There is a suggestion on the table to make that a (probably optional
imho) feature for an upcoming ieee754 revision.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Michael S
2024-03-14 09:26:55 UTC
Permalink
On Fri, 23 Feb 2024 11:10:00 +0100
Post by Terje Mathisen
Post by MitchAlsup1
Post by Steven G. Kargl
Agreed a programmer should use what is required by the problem
that they are solving.  I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754 precision.
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
At what cost in table sizes?
Post by Terje Mathisen
There is a suggestion on the table to make that a (probably optional
imho) feature for an upcoming ieee754 revision.
Terje
The critical point here is the definition of what is considered exact. If
'exact' is measured only on the y side of y=foo(x), disregarding
possible imprecision on the x side, then you are very likely to end up
with results that are slower to calculate, but not at all more useful
from the point of view of an engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.

The definition of 'exact' should be:
For any finite-precision function foo(x), let's designate the same
mathematical function calculated with infinite precision as Foo(x).
Let's designate the operation of rounding an infinite-precision number to
the desired finite precision as Rnd(). Rounding is done in to-nearest mode.
Unlike in the case of basic operations, ties are allowed to be broken in
any direction.
The result of y=foo(x) for a finite-precision number x is considered
exact if *at least one* of the two conditions is true:
(1=Y-clause) Rnd(Foo(x)) == y
(2=X-clause) There exists an infinite-precision number X for which
both Foo(X) == y and Rnd(X) == x.

As follows from (2), it is possible and not uncommon that more
than one finite-precision number y is accepted as an exact result of foo(x).

If the Committee omits the 2nd clause, then the whole proposition will be
not just useless, but harmful.
MitchAlsup1
2024-03-14 17:34:57 UTC
Permalink
Post by Michael S
On Fri, 23 Feb 2024 11:10:00 +0100
Post by Terje Mathisen
Post by MitchAlsup1
Post by Steven G. Kargl
Agreed a programmer should use what is required by the problem
that they are solving.  I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754 precision.
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
At what cost in table sizes?
Post by Terje Mathisen
There is a suggestion on the table to make that a (probably optional
imho) feature for an upcoming ieee754 revision.
Terje
The critical point here is definition of what considered exact. If
'exact' is measured only on y side of y=foo(x), disregarding
possible imprecision on the x side then you are very likely to end up
with results that are slower to calculate, but not at all more useful
from point of view of engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.
For any finite-precision function foo(x) lets designate the same
mathematical function calculated with infinite precision as Foo(x).
Let's designate an operation of rounding of infinite-precision number to
desired finite precision as Rnd(). Rounding is done in to-nearest mode.
Unlike in the case of basic operations, ties are allowed to be broken in
any direction.
The result of y=foo(x) for finite-precision number x considered
(1=Y-clause) Rnd(Foo(x)) == y
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
Post by Michael S
As follows from the (2), it is possible and not uncommon that more
than one finite-precision number y is accepted exact result of foo(x).
If Committee omits the 2nd clause then the whole proposition will be not
just not useful, but harmful.
An interesting side effect of greater intermediate precision is the lack
of need to round prior to the final result. Thus, my sin(x) over its
entire calculation suffers exactly 1 rounding. Payne & Hanek does not
have this property.
MitchAlsup1
2024-03-14 19:48:26 UTC
Permalink
And thirdly::

Let us postulate that the reduced argument r is indeed calculated with 0.5ULP
error.

What makes you think you can calculate the polynomial without introducing
any more error ??
Terje Mathisen
2024-03-15 11:16:27 UTC
Permalink
Post by MitchAlsup1
Let us postulate that the reduced argument r is indeed calculated with 0.5ULP
error.
What makes you think you can calculate the polynomial without introducing
any more error ??
It should be obvious that any argument reduction step must return a
value with significantly higher precision than the input(s), and that
this higher precision value is then used in any polynomial evaluation.

With careful setup, it is very often possible to reduce the amount of
extended-precision work needed to just one or two steps, i.e. for the
classic Taylor sin(x) series, with x fairly small, the x^3 and higher
terms can make do with double precision, so that the final step is to
add the two parts of the leading x term: First the trailing part and
then when adding the upper 53 bits of x you get a single rounding at
this stage.

This is easier and better when done with 64-bit fixed-point values,
augmented with a few 128-bit operations.
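A rough illustration of that scheme in plain C (my sketch, not Terje's
code; the handful of raw Taylor terms below is only for shape, where a
real kernel would use a tuned minimax polynomial):

/* sin(r) for a reduced argument supplied as an unevaluated pair
 * r = r_hi + r_lo.  The x^3-and-up tail is small enough for ordinary
 * double arithmetic; the pieces are summed smallest-first so the one
 * significant rounding happens when r_hi is added last. */
static double sin_reduced(double r_hi, double r_lo)
{
    double r    = r_hi + r_lo;      /* good enough for the tail */
    double r2   = r * r;
    double tail = r * r2 * (-1.0/6 + r2 * (1.0/120 - r2 * (1.0/5040)));
    return r_hi + (r_lo + tail);
}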

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Lawrence D'Oliveiro
2024-03-14 20:30:47 UTC
Permalink
Post by MitchAlsup1
Post by Michael S
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
No idea why that’s relevant. Michael S was talking about “Rnd”, not “RND”.

When you say “RND”, I think of a bad random-number generator found in
various implementations of BASIC.
Chris M. Thomasson
2024-03-14 22:12:43 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by MitchAlsup1
Post by Michael S
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
No idea why that’s relevant. Michael S was talking about “Rnd”, not “RND”.
When you say “RND”, I think of a bad random-number generator found in
various implementations of BASIC.
PRNG, an LCG? Fwiw, check this shit out:

https://groups.google.com/g/comp.lang.c++/c/7u_rLgQe86k/m/fYU9SnuAFQAJ

;^D
MitchAlsup1
2024-03-14 22:19:16 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by MitchAlsup1
Post by Michael S
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
No idea why that’s relevant. Michael S was talking about “Rnd”, not “RND”.
You have just selected yourself as someone I will never reply to again.
Post by Lawrence D'Oliveiro
When you say “RND”, I think of a bad random-number generator found in
various implementations of BASIC.
Chris M. Thomasson
2024-03-14 22:21:33 UTC
Permalink
Post by MitchAlsup1
Post by Lawrence D'Oliveiro
Post by MitchAlsup1
Post by Michael S
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
No idea why that’s relevant. Michael S was talking about “Rnd”, not “RND”.
You have just selected yourself as someone I will never reply to again.
It might be an AI? Just have a very strange feeling... Humm...
Post by MitchAlsup1
Post by Lawrence D'Oliveiro
When you say “RND”, I think of a bad random-number generator found in
various implementations of BASIC.
Chris M. Thomasson
2024-03-14 22:22:45 UTC
Permalink
[...]

An AI set on ass mode?
Michael S
2024-03-15 11:49:36 UTC
Permalink
On Thu, 14 Mar 2024 17:34:57 +0000
Post by MitchAlsup1
Post by Michael S
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
In the second clause:: are we guaranteed that RND(Foo(X))= Y ??
No, we are not.
Post by MitchAlsup1
Post by Michael S
As follows from the (2), it is possible and not uncommon that more
than one finite-precision number y is accepted exact result of foo(x).
If Committee omits the 2nd clause then the whole proposition will
be not just not useful, but harmful.
An interesting side effect of greater intermediate precision is the
lack of need to round prior to the final result. Thus, my sin(x) over
its entire calculation suffers exactly 1 rounding. Payne & Hanek does
not have this property.
Which does not help to recover precision lost during rounding of x
that happened before your wonderful instruction.
Terje Mathisen
2024-03-15 10:23:45 UTC
Permalink
Michael, I for the main part agree with you here, i.e. calculating
sin(x) with x larger than 2^53 or so, is almost certainly stupid.

Actually using and depending upon the result is more stupid.

OTOH, it is, and has always been, a core principle of ieee754 that basic
operations (FADD/FSUB/FMUL/FDIV/FSQRT) shall assume that the inputs are
exact (no fractional ulp uncertainty), and that we from that starting
point must deliver a correctly rounded version of the infinitely precise
exact result of the operation.

Given the latter, it is in fact very tempting to see if that basic
result rule could be applied to more of the non-core operations, but I
cannot foresee any situation where I would use it myself: If I find
myself in a situation where the final fractional ulp is important, then
I would far rather switch to doing the operation in fp128.

Terje
Post by Michael S
On Fri, 23 Feb 2024 11:10:00 +0100
Post by Terje Mathisen
Post by MitchAlsup1
Post by Steven G. Kargl
Agreed a programmer should use what is required by the problem
that they are solving.  I'll note that SW implementations have
their sets of tricks (e.g., use of double-double arithmetic to
achieve double precision).
To get near IEEE desired precision, one HAS TO use more than 754 precision.
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
At what cost in table sizes?
Post by Terje Mathisen
There is a suggestion on the table to make that a (probably optional
imho) feature for an upcoming ieee754 revision.
Terje
The critical point here is definition of what considered exact. If
'exact' is measured only on y side of y=foo(x), disregarding
possible imprecision on the x side then you are very likely to end up
with results that are slower to calculate, but not at all more useful
from point of view of engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.
For any finite-precision function foo(x) lets designate the same
mathematical function calculated with infinite precision as Foo(x).
Let's designate an operation of rounding of infinite-precision number to
desired finite precision as Rnd(). Rounding is done in to-nearest mode.
Unlike in the case of basic operations, ties are allowed to be broken in
any direction.
The result of y=foo(x) for finite-precision number x considered
(1=Y-clause) Rnd(Foo(x)) == y
(2=X-clause) There exist an infinite precision number X for which
both Foo(X) == y and Rnd(X) == x.
As follows from the (2), it is possible and not uncommon that more
than one finite-precision number y is accepted exact result of foo(x).
If Committee omits the 2nd clause then the whole proposition will be not
just not useful, but harmful.
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Michael S
2024-03-15 12:15:52 UTC
Permalink
On Fri, 15 Mar 2024 11:23:45 +0100
Post by Terje Mathisen
Michael, I for the main part agree with you here, i.e. calculating
sin(x) with x larger than 2^53 or so, is almost certainly stupid.
Actually using and depending upon the result is more stupid.
OTOH, it is and have always been, a core principle of ieee754 that
basic operations (FADD/FSUB/FMUL/FDIV/FSQRT) shall assume that the
inputs are exact (no fractional ulp uncertainty), and that we from
that starting point must deliver a correctly rounded version of the
infinitely precise exact result of the operation.
Given the latter, it is in fact very tempting to see if that basic
result rule could be applied to more of the non-core operations, but
I cannot foresee any situation where I would use it myself: If I find
myself in a situation where the final fractional ulp is important,
then I would far rather switch to doing the operation in fp128.
Terje
To make it less tempting, you could try to push for inclusion of
rsqrt() into the basic set. Long overdue, IMHO.

Right now, I can't think of any other transcendental that I really want
to elevate to higher status. It seems to me that elevation of log2(x)
and of 2**x will do no harm, but I am not sure about usefulness.
MitchAlsup1
2024-03-16 01:23:12 UTC
Permalink
Post by Michael S
To make it less tempting, you could try to push for inclusion of
rsqrt() into basic set. Long overdue, IMHO.
Many Newton-Raphson iterations converge more rapidly with RSQRT than
with SQRT, for reasons I never looked that deeply into.
Post by Michael S
Right now, I can't think of any other transcendental that I really want
to elevate to higher status. It seems to me that elevation of log2(x)
and of 2**x will do no harm, but I am not sure about usefulness.
Ln2 and EXP2 are the basis of My 66000 log and EXP families, performed in
HW function unit.

Ln2 has the property that a binade [1..2) maps directly to another binade
[2..4); exp2 has a similar property. Ln2 is easier to get good accuracy
from than Ln or Log as a starting point. But in any event:: you are always
within a high precision multiply of the correct result = round(53×106);
Terje Mathisen
2024-03-16 15:59:19 UTC
Permalink
Post by MitchAlsup1
Post by Michael S
To make it less tempting, you could try to push for inclusion of
rsqrt() into basic set. Long overdue, IMHO.
Many Newton-Raphson iterations converge more rapidly with RSQRT than
with SQRT, for reasons I never looked that deeply into.
The classic NR iteration for rsqrt() uses only fmul & fadd/fsub, while
the naive sqrt() version needs a divide in every iteration.

Personally I prefer to use the y = rsqrt(x) form and then just multiply
by x in the end:

y = sqrt(x) = x/sqrt(x) = rsqrt(x)*x

In order to get the rounding correct from this form you can probably
just square the final result and compare this with the original input?
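For concreteness, the multiply-only iteration looks like this (a sketch;
the libm-based seed is just a stand-in for the table lookup or the
approximate-rsqrt opcode a real implementation would use):

#include <math.h>

/* Newton-Raphson for y ~ 1/sqrt(x): y' = y*(1.5 - 0.5*x*y*y).
 * Only multiplies and adds; each step roughly doubles the correct bits. */
static double rsqrt_nr(double x, double y, int steps)
{
    while (steps-- > 0)
        y = y * (1.5 - 0.5 * x * y * y);
    return y;
}

static double sqrt_via_rsqrt(double x)
{
    double seed = 1.0 / sqrt(x);       /* stand-in for a ~12-bit seed */
    double y    = rsqrt_nr(x, seed, 2);
    return x * y;                      /* sqrt(x) = x * rsqrt(x)      */
}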
Post by MitchAlsup1
Post by Michael S
Right now, I can't think of any other transcendental that I really want
to elevate to higher status. It seems to me that elevation of log2(x)
and of 2**x will do no harm, but I am not sure about usefulness.
Ln2 and EXP2 are the basis of My 66000 log and EXP families, performed in
HW function unit.
Ln2 has the property that a binade [1..2) maps directly to another binade
[2..4); exp2 has a similar property. Ln2 is easier to get good accuracy
from than Ln or Log as a starting point. But in any event:: you are
always within a high precision multiply of the correct result =
round(53×106);
Exactly the same reasoning I used to compute rsqrt() over the [1.0 -
4.0> range, or (simplifying the exp logic) over [0.5 - 2.0> so that the
last exp bit maps to the first vs second half of the range. Back before
we got approximate rsqrt() lookup opcodes, I wrote code that took the
last exp bit and the first n mantissa bits and looked up an optimal
starting point for that value + 0.5 ulp. Getting 11-12 bits this way
means approximately correct float after a single iteration and a useful
double after two.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Chris M. Thomasson
2024-03-15 20:59:52 UTC
Permalink
Post by Terje Mathisen
Michael, I for the main part agree with you here, i.e. calculating
sin(x) with x larger than 2^53 or so, is almost certainly stupid.
[...]

;^D tooooooo big. :^)

Now, wrt the results, arbitrary precision for trig is useful, in say...
Deep fractal zooms...

Zooming in really deep in say something like this, well the precision of
trig can become an issue:

https://paulbourke.net/fractals/multijulia/

Trig would be used, say, in rectangular to-from polar forms wrt getting
the n-ary roots of a complex number?
Chris M. Thomasson
2024-03-15 21:13:11 UTC
Permalink
Post by Chris M. Thomasson
Post by Terje Mathisen
Michael, I for the main part agree with you here, i.e. calculating
sin(x) with x larger than 2^53 or so, is almost certainly stupid.
[...]
;^D tooooooo big. :^)
Now, wrt the results, arbitrary precision for trig is useful, in say...
Deep fractal zooms...
Say I want at least 30 digits of precision for a certain calculation of
complex numbers at a certain zoom level. Along the lines of trying to
match convergents of continued fractions, take pi:

3.1415926535897932384...

two digits of decimal precision:

22/7 = 3.14_28571428571428571428571428571


four digits of decimal precision:

333/106 = 3.1415_094339622641509433962264151


six digits of decimal precision:

355/113 = 3.141592_9203539823008849557522124

On and on...
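Those convergents drop out of the standard continued-fraction recurrence;
a small sketch (the first several terms come out exactly, after which long
double runs out of digits):

#include <math.h>
#include <stdio.h>

int main(void)
{
    long double x = 3.141592653589793238462643383279503L;
    long long p_prev = 1, q_prev = 0;    /* convergent h(-1)/k(-1) */
    long long p_prev2 = 0, q_prev2 = 1;  /* convergent h(-2)/k(-2) */
    for (int k = 0; k < 8; k++) {
        long long a = (long long)floorl(x);
        long long p = a * p_prev + p_prev2;
        long long q = a * q_prev + q_prev2;
        printf("%lld/%lld = %.15Lf\n", p, q, (long double)p / q);
        p_prev2 = p_prev; q_prev2 = q_prev;
        p_prev = p; q_prev = q;
        x = 1.0L / (x - a);
    }
    return 0;
}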

Well...
Post by Chris M. Thomasson
Zooming in really deep in say something like this, well the precision of
https://paulbourke.net/fractals/multijulia/
Trig would be used, say, in rectangular to-from polar forms wrt getting
the n-ary roots of a complex number?
Chris M. Thomasson
2024-03-16 20:23:46 UTC
Permalink
On 3/15/2024 2:13 PM, Chris M. Thomasson wrote:
[...]
Post by Chris M. Thomasson
333/106 = 3.1415_094339622641509433962264151
[...]
Keith Thompson
2024-03-15 21:16:33 UTC
Permalink
Post by Chris M. Thomasson
Post by Terje Mathisen
Michael, I for the main part agree with you here, i.e. calculating
sin(x) with x larger than 2^53 or so, is almost certainly stupid.
[...]
;^D tooooooo big. :^)
Now, wrt the results, arbitrary precision for trig is useful, in
say... Deep fractal zooms...
Zooming in really deep in say something like this, well the precision
https://paulbourke.net/fractals/multijulia/
Trig would be used, say, in rectangular to-from polar forms wrt
getting the n-ary roots of a complex number?
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately? (Since I'm posting to
comp.lang.c, I'll mention that "^" is meant to be exponentiation, not
bitwise xor.)
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
Chris M. Thomasson
2024-03-15 21:26:56 UTC
Permalink
Post by Keith Thompson
Post by Chris M. Thomasson
Post by Terje Mathisen
Michael, I for the main part agree with you here, i.e. calculating
sin(x) with x larger than 2^53 or so, is almost certainly stupid.
[...]
;^D tooooooo big. :^)
Now, wrt the results, arbitrary precision for trig is useful, in
say... Deep fractal zooms...
Zooming in really deep in say something like this, well the precision
https://paulbourke.net/fractals/multijulia/
Trig would be used, say, in rectangular to-from polar forms wrt
getting the n-ary roots of a complex number?
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
Nope. :^) Fwiw, my fractals deal with zooming in and the numbers
comprising their initial state are tiny compared to 2^53. For instance
take these initial states for my multijulia fractal:

____________
c0 = {0.5,0.0}, c1 = {-5.5,0.0}

c0 = {0.0,1.0}, c1 = {0.0,-1.0}

c0 = {0.726514,0.240242}, c1 = {0.171039,0.235043}

c0 = {-1.444991,0.139179}, c1 = {-0.063294,-1.401774}

c0 = {1,0}, c1 = {-1,0}

c0 = {-.75, .06 }, c1 = {-.45, .6 }

c0 = {-1,0}, c1 = {1,0}, c2 = {0,-1}, c3 = {0,1}
____________

Notice how the numbers are small? However, if we zoom in enough, then
2^53 might become an issue..
Post by Keith Thompson
(Since I'm posting to
comp.lang.c, I'll mention that "^" is meant to be exponentiation, not
bitwise xor.)
Indeed.
Chris M. Thomasson
2024-03-15 21:30:09 UTC
Permalink
On 3/15/2024 2:26 PM, Chris M. Thomasson wrote:
[...]
Post by Chris M. Thomasson
Notice how the numbers are small? However, if we zoom in enough, then
2^53 might become an issue..
[...]

Argh! I mean if we zoom in enough, then only 2^53 of precision might
become an issue... We can go in really deep:

A deep zoom might be 10^275. Somebody did one to the 10^4141! I feel
like including a youtube link here, but I will refrain out of respect
for you.
Keith Thompson
2024-03-15 22:48:39 UTC
Permalink
Post by Chris M. Thomasson
[...]
Post by Chris M. Thomasson
Notice how the numbers are small? However, if we zoom in enough,
then 2^53 might become an issue..
[...]
Argh! I mean if we zoom in enough, then only 2^53 of precision might
A deep zoom might be 10^275. Somebody did one to the 10^4141! I feel
like including a youtube link here, but I will refrain out of respect
for you.
I don't object to YouTube links as long as they're reasonably relevant
and there's enough information for readers to decide whether it's worth
their time to open them.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
Chris M. Thomasson
2024-03-17 20:41:16 UTC
Permalink
Post by Keith Thompson
Post by Chris M. Thomasson
[...]
Post by Chris M. Thomasson
Notice how the numbers are small? However, if we zoom in enough,
then 2^53 might become an issue..
[...]
Argh! I mean if we zoom in enough, then only 2^53 of precision might
A deep zoom might be 10^275. Somebody did one to the 10^4141! I feel
like including a youtube link here, but I will refrain out of respect
for you.
I don't object to YouTube links as long as they're reasonably relevant
and there's enough information for readers to decide whether it's worth
their time to open them.
Fair enough. :^)
Chris M. Thomasson
2024-03-18 04:49:17 UTC
Permalink
Post by Chris M. Thomasson
Post by Keith Thompson
Post by Chris M. Thomasson
[...]
Post by Chris M. Thomasson
Notice how the numbers are small? However, if we zoom in enough,
then 2^53 might become an issue..
[...]
Argh! I mean if we zoom in enough, then only 2^53 of precision might
A deep zoom might be 10^275. Somebody did one to the 10^4141! I feel
like including a youtube link here, but I will refrain out of respect
for you.
I don't object to YouTube links as long as they're reasonably relevant
and there's enough information for readers to decide whether it's worth
their time to open them.
Fair enough. :^)
Fwiw, another algorithm of mine that would benefit from high
precision... The fun part is that a point can be placed anywhere in the
field. So, it can be a big one. Depending on the user's placement, so to
speak:

https://gallery.bridgesmathart.org/exhibitions/2024-joint-mathematics-meetings/chris-m-thomasson
MitchAlsup1
2024-03-16 01:16:25 UTC
Permalink
Post by Keith Thompson
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the programmer
to remain within his sandbox. Since the early 1990s, when Payne and Hanek
argument reduction became widely available, there is no reason NOT to
perform argument reduction with great accuracy. NOTE: you don't HAVE to
use Payne and Hanek when simpler strategies work adequately; but when
they do not, you do not have to "Throw up your hands" and run away.
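For the "simpler strategies" case, a Cody-Waite-style reduction really is
only a few lines. This sketch builds its own two-piece split of pi/2 so
that k*hi is computed exactly for moderate k; it is illustrative only and
runs out of accuracy long before Payne-Hanek territory:

#include <math.h>

/* Reduce x to r = x - k*(pi/2), k the nearest integer, and return the
 * quadrant k mod 4.  The high part of pi/2 keeps only ~24 significant
 * bits, so k*hi is exact while |k| stays moderate; the bits of pi/2
 * below 'lo' are simply dropped, which is what limits the range. */
static double reduce_pio2_sketch(double x, int *quadrant)
{
    const double pio2 = 1.57079632679489661923;  /* pi/2 rounded to double */
    const double hi   = (double)(float)pio2;     /* ~24-bit high part      */
    const double lo   = pio2 - hi;               /* exact remainder        */

    double k = nearbyint(x / pio2);
    *quadrant = (int)((long long)k & 3);         /* good enough for a sketch */
    return (x - k * hi) - k * lo;                /* subtract in two steps    */
}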
Michael S
2024-03-16 17:08:15 UTC
Permalink
On Sat, 16 Mar 2024 01:16:25 +0000
Post by MitchAlsup1
Post by Keith Thompson
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
sin(2**53) is not what the programmer wanted, because 2**53 is not the x he
really wanted. It has nothing to do with the calculation of sin().
Post by MitchAlsup1
Since early 1990s, when
Payne and Hanek argument reduction became widely available, there is
no reason NOT to perform argument reduction with great accuracy.
NOTE: you don't HAVE to use Payne and Hanek when simpler strategies
work adequately; but when they do not, you do not have to "Throw up
your hands" and run away.
MitchAlsup1
2024-03-16 17:22:50 UTC
Permalink
Post by Michael S
On Sat, 16 Mar 2024 01:16:25 +0000
Post by MitchAlsup1
Post by Keith Thompson
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).

With high precision argument reduction you have to do (wait for it)
NoThInG !!.

Without high precision argument reduction you have to do a lot of
programming--programming that I insist should have been done by
the programmer responsible for sin(x).
Post by Michael S
sin(2**53) is not what programmer wanted, because 2**53 is not the x he
really wanted. It has nothing to do with calculation of sin().
And EvErYtHiNg to do with high precision argument reduction; and
maybe the programmer has no requirement on what he is allowed to
read in from a file and pass onto sin(x).
Post by Michael S
Post by MitchAlsup1
Since early 1990s, when
Payne and Hanek argument reduction became widely available, there is
no reason NOT to perform argument reduction with great accuracy.
NOTE: you don't HAVE to use Payne and Hanek when simpler strategies
work adequately; but when they do not, you do not have to "Throw up
your hands" and run away.
Scott Lurndal
2024-03-16 18:32:52 UTC
Permalink
Post by MitchAlsup1
Post by Michael S
On Sat, 16 Mar 2024 01:16:25 +0000
Post by MitchAlsup1
Post by Keith Thompson
I can see how computing sin(x) with high precision for "reasonable"
values of x would be useful, but does any of that benefit from being
able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
That generally seems to be a highly unlikely scenario.
Michael S
2024-03-16 18:49:45 UTC
Permalink
On Sat, 16 Mar 2024 17:22:50 +0000
Post by MitchAlsup1
Post by Michael S
On Sat, 16 Mar 2024 01:16:25 +0000
Post by MitchAlsup1
Post by Keith Thompson
I can see how computing sin(x) with high precision for
"reasonable" values of x would be useful, but does any of that
benefit from being able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
With high precision argument reduction you have to do (wait for it)
NoThInG !!.
Without high precision argument reduction yo have to do a lot of
programming--programming that I insist should have been done by
the programmer responsible for sin(x).
I have never met programmers that want to do anything of this sort.
Programmers that I meet in the real world could sometimes want something
else. They could want, for example, to draw 1000 points of the wave of a
blue LED with frequency 663.42 THz (663.42e12 Hz), with a step of 10 nm
(1e-8 m), starting at a distance of 1 million km. Well, not really, but at
least I can imagine them wanting to do it*.
Will a double-precision sin() routine that does perfect argument
reduction be any more helpful for their task than a sin() implementation
that makes no effort at all beyond simple reduction modulo 2*pi? No, they
would both be the same.
Maybe sinpi(), which does a perfect reduction more or less by
definition, would be more helpful?
For naive folks - not at all.
For advanced folks - yes, but just a little. It still wouldn't be easy.
And among programmers that I know, the naive outnumber the advanced by
more than a factor of 10.

-----
* - For a more realistic case, they could want to draw the same wave at a
distance of 2 m using single-precision arithmetic.
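To make the footnoted example concrete, here is roughly what the
"advanced" version looks like: keep the phase in turns (whole periods), so
the reduction is a subtraction and a sinpi()-style call would fit
naturally. The frequency, step and distance are the numbers above; the
rest is an illustrative sketch, and the comment shows where the precision
still leaks away:

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double c    = 299792458.0;   /* m/s                     */
    const double f    = 663.42e12;     /* Hz: the "blue LED"      */
    const double x0   = 1.0e9;         /* 1 million km, in metres */
    const double step = 1.0e-8;        /* 10 nm                   */

    /* x0/lambda ~ 2.2e15 periods.  In double its ulp is ~0.25 of a
     * period, so the fractional phase would be pure noise; even long
     * double only leaves a few digits of it, which is exactly the
     * "it still wouldn't be easy" point. */
    long double periods0 = (long double)x0 * f / c;
    double phase = (double)(periods0 - floorl(periods0));   /* in turns */
    const double dphase = step * f / c;                     /* turns per step */

    for (int k = 0; k < 1000; k++) {
        double y = sin(2.0 * 3.14159265358979323846 * phase);
        if (k % 100 == 0)
            printf("%4d  %+.6f\n", k, y);
        phase += dphase;
        if (phase >= 1.0)
            phase -= 1.0;
    }
    return 0;
}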
Keith Thompson
2024-03-16 23:19:11 UTC
Permalink
Post by MitchAlsup1
Post by Michael S
On Sat, 16 Mar 2024 01:16:25 +0000
Post by MitchAlsup1
Post by Keith Thompson
I can see how computing sin(x) with high precision for
"reasonable" values of x would be useful, but does any of that
benefit from being able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than just
doing it for the sake of doing it).

If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/- pi for those who prefer ASCII)?

If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the
application.

Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter
functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to make
the result practically meaningless.

#include <math.h>
#include <stdio.h>
#include <limits.h>
#include <float.h>

int main(void) {
    {
        printf("float (%zu bits, %d mantissa bits)\n", CHAR_BIT * sizeof (float), FLT_MANT_DIG);
        const float x = (float)(1LL<<53);
        const float y = nextafterf(x, x*2);
        printf("%.8f %.8f\n", x, sinf(x));
        printf("%.8f %.8f\n", y, sinf(y));
    }
    putchar('\n');
    {
        printf("double (%zu bits, %d mantissa bits)\n", CHAR_BIT * sizeof (double), DBL_MANT_DIG);
        const double x = (double)(1LL<<53);
        const double y = nextafter(x, x*2);
        printf("%.8f %.8f\n", x, sin(x));
        printf("%.8f %.8f\n", y, sin(y));
    }
    putchar('\n');
    {
        printf("long double (%zu bits, %d mantissa bits)\n", CHAR_BIT * sizeof (long double), LDBL_MANT_DIG);
        const long double x = (long double)(1LL<<53);
        const long double y = nextafterl(x, x*2);
        printf("%.8Lf %.8Lf\n", x, sinl(x));
        printf("%.8Lf %.8Lf\n", y, sinl(y));
    }
}

Output:

float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181

double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655

long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
bart
2024-03-17 00:00:00 UTC
Permalink
Post by Keith Thompson
Post by MitchAlsup1
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than just
doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/- pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the
application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter
functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to make
the result practically meaningless.
...
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
Is this output supposed to be different between gcc -O0 and gcc -O3?
Keith Thompson
2024-03-17 01:38:02 UTC
Permalink
Post by bart
Post by Keith Thompson
Post by MitchAlsup1
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than just
doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/- pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the
application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter
functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to make
the result practically meaningless.
...
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
Is this output supposed to be different between gcc -O0 and gcc -O3?
No, why do you ask?

On my system, the output is identical with gcc and clang, and tcc at all
optimization levels, with both glibc and musl. I do get slightly
different results for long double on Cygwin vs. Ubuntu (Cygwin uses
newlib, not glibc).
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
bart
2024-03-17 01:57:22 UTC
Permalink
Post by Keith Thompson
Post by bart
Post by Keith Thompson
Post by MitchAlsup1
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than just
doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/- pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the
application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter
functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to make
the result practically meaningless.
...
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
Is this output supposed to be different between gcc -O0 and gcc -O3?
No, why do you ask?
On my system, the output is identical with gcc and clang, and tcc at all
optimization levels, with both glibc and musl. I do get slightly
different results for long double on Cygwin vs. Ubuntu (Cygwin uses
newlib, not glibc).
Because I get the results shown below. Even if the library is different
from yours, I would have assumed matching results.


------------------------------

C:\c>gcc c.c -O3

C:\c>a
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181

double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655

long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168

C:\c>gcc c.c -O0

C:\c>a
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84893209
9007200328482816.00000000 -0.34160271

double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84893209
9007199254740994.00000000 -0.12728505

long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84893209
9007199254740992.00097656 -0.84944780

C:\c>gcc --version
gcc (tdm64-1) 10.3.0
Keith Thompson
2024-03-17 02:57:17 UTC
Permalink
Post by bart
Post by Keith Thompson
Post by bart
Post by Keith Thompson
Post by MitchAlsup1
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than just
doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/- pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the
application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter
functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to make
the result practically meaningless.
...
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
Is this output supposed to be different between gcc -O0 and gcc -O3?
No, why do you ask?
On my system, the output is identical with gcc and clang, and tcc at all
optimization levels, with both glibc and musl. I do get slightly
different results for long double on Cygwin vs. Ubuntu (Cygwin uses
newlib, not glibc).
Because I get the results shown below. Even if the library is
different from yours, I would have assumed matching results.
------------------------------
C:\c>gcc c.c -O3
C:\c>a
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
C:\c>gcc c.c -O0
C:\c>a
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84893209
9007200328482816.00000000 -0.34160271
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84893209
9007199254740994.00000000 -0.12728505
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84893209
9007199254740992.00097656 -0.84944780
C:\c>gcc --version
gcc (tdm64-1) 10.3.0
I see similar results (possibly identical, I haven't checked) with
i686-w64-mingw32-gcc on Cygwin. I think it uses the same, or a similar,
runtime library as the tdm64 implementation you're using.

Comparing the generated assembly, at -O0 it generates a call to "sin",
while at -O1 and above it replaces the expression with a compile-time
constant. Apparently that compile-time constant matches the run-time
computed value for glibc, but not for whatever tdm64 uses.

I don't *think* it's a conformance issue as long as the precision is
within the standard's (fairly vague) requirements. When I modify the
program to start with sin(1.0) rather than sin(2**53), the output is
identical at all optimization levels.

Apparently glibc and the library used by tdm64 behave differently when
computing the sin of very large values. The result for sin(2**53) in
long double differs by about one part in 2**17 between -O0 and -O1, or
between gcc/glibc and tdm64.
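A minimal sketch that makes the comparison concrete (the volatile
variable is just one way to force a run-time libm call; at -O1 and
above gcc may fold the constant-argument call at compile time):

#include <math.h>
#include <stdio.h>

int main(void) {
    volatile double xr = 9007199254740992.0;    /* 2^53, not foldable   */
    double folded  = sin(9007199254740992.0);   /* may be folded by gcc */
    double runtime = sin(xr);                   /* always a libm call   */
    printf("compile-time candidate: %.17g\n", folded);
    printf("run-time libm:          %.17g\n", runtime);
    printf("difference:             %.3g\n", folded - runtime);
}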
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
David Brown
2024-03-17 12:10:56 UTC
Permalink
Post by Keith Thompson
Post by bart
Post by Keith Thompson
Post by bart
Post by Keith Thompson
Post by MitchAlsup1
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than just
doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/- pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the
application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter
functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to make
the result practically meaningless.
...
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
Is this output supposed to be different between gcc -O0 and gcc -O3?
No, why do you ask?
On my system, the output is identical with gcc and clang, and tcc at all
optimization levels, with both glibc and musl. I do get slightly
different results for long double on Cygwin vs. Ubuntu (Cygwin uses
newlib, not glibc).
Because I get the results shown below. Even if the library is
different from yours, I would have assumed matching results.
------------------------------
C:\c>gcc c.c -O3
C:\c>a
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
C:\c>gcc c.c -O0
C:\c>a
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84893209
9007200328482816.00000000 -0.34160271
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84893209
9007199254740994.00000000 -0.12728505
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84893209
9007199254740992.00097656 -0.84944780
C:\c>gcc --version
gcc (tdm64-1) 10.3.0
I see similar results (possibly identical, I haven't checked) with
i686-w64-mingw32-gcc on Cygwin. I think it uses the same, or a similar,
runtime library as the tdm64 implementation you're using.
Comparing the generated assembly, at -O0 it generates a call to "sin",
while at -O1 and above it replaces the expression with a compile-time
constant. Apparently that compile-time constant matches the run-time
computed value for glibc, but not for whatever tdm64 uses.
I don't *think* it's a conformance issue as long as the precision is
within the standard's (fairly vague) requirements. When I modify the
program to start with sin(1.0) rather than sin(2**53), the output is
identical at all optimization levels.
Apparently glibc and the library used by tdm64 behave differently when
computing the sin of very large values. The result for sin(2**53) in
long double differs by about one part in 2**17 between -O0 and -O1, or
between gcc/glibc and tdm64.
That is interesting. It might not be a conformance issue, but it is a
divergence from the intention of the compiler. gcc makes use of a very
comprehensive library that emulates a whole range of floating-point
hardware implementations, precisely so that its compile-time
calculations are a bit-perfect match for what you would get at
run-time, regardless of compilation flags and optimisation.

So if the program is producing different results for run-time and
compile-time versions of "sin", then there is a mismatch between the
configuration of the compiler and the run-time library it is using. I
don't know enough about the configuration options for gcc (there are a
/lot/ of them), but I know it can be configured to assume you are using
glibc or a glibc-compatible library. Perhaps the mingw build here is
configured with this option, but is actually using a run-time library
with a slightly different implementation of these functions (such as an MS DLL C
library). If that is the case, then this is a mistake in the
configuration of gcc here, and I would think it could be reported back
to the mingw people as a bug.
Michael S
2024-03-17 09:06:21 UTC
Permalink
On Sat, 16 Mar 2024 16:19:11 -0700
Post by Keith Thompson
Post by MitchAlsup1
Post by Michael S
On Sat, 16 Mar 2024 01:16:25 +0000
Post by MitchAlsup1
Post by Keith Thompson
I can see how computing sin(x) with high precision for
"reasonable" values of x would be useful, but does any of that
benefit from being able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than
just doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual value
to be known within ±π (+/- pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on the
application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The nextafter
functions are used to compute the nearest representable number. For
long double, the value of sin() changes by about 1 part in 1600, which
seems decent, but it's not nearly as precise as for values around 1.0.
For float and double, the imprecision of the argument is enough to
make the result practically meaningless.
#include <math.h>
#include <stdio.h>
#include <limits.h>
#include <float.h>
int main(void) {
    {
        printf("float (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (float), FLT_MANT_DIG);
        const float x = (float)(1LL<<53);
        const float y = nextafterf(x, x*2);
        printf("%.8f %.8f\n", x, sinf(x));
        printf("%.8f %.8f\n", y, sinf(y));
    }
    putchar('\n');
    {
        printf("double (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (double), DBL_MANT_DIG);
        const double x = (double)(1LL<<53);
        const double y = nextafter(x, x*2);
        printf("%.8f %.8f\n", x, sin(x));
        printf("%.8f %.8f\n", y, sin(y));
    }
    putchar('\n');
    {
        printf("long double (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (long double), LDBL_MANT_DIG);
        const long double x = (long double)(1LL<<53);
        const long double y = nextafterl(x, x*2);
        printf("%.8Lf %.8Lf\n", x, sinl(x));
        printf("%.8Lf %.8Lf\n", y, sinl(y));
    }
}
float (32 bits, 24 mantissa bits)
9007199254740992.00000000 -0.84892595
9007200328482816.00000000 -0.34159181
double (64 bits, 53 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740994.00000000 -0.12729655
long double (128 bits, 64 mantissa bits)
9007199254740992.00000000 -0.84892596
9007199254740992.00097656 -0.84944168
As written, your example does not emphasize that the problem has
nothing to do with the implementation of the sinX() library routine.
That is best illustrated by the followup conversation with bart, which
is IMHO 100% O.T.
To make the point clearer, I'd rather change it to the following form:

#include <math.h>
#include <stdio.h>
#include <limits.h>
#include <float.h>

void foo(long double x1, long double x2)
{
    const double y1 = (double)sinl(x1);
    const double y2 = (double)sinl(x2);
    printf("%.20Le %.17f\n", x1, y1);
    printf("%.20Le %.17f\n", x2, y2);
}

int main(void) {
    const float x0 = (float)(1LL<<53);
    {
        printf("float (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (float), FLT_MANT_DIG);
        const float x1 = x0;
        const float x2 = nextafterf(x1, FLT_MAX);
        foo(x1, x2);
    }
    putchar('\n');
    {
        printf("double (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (double), DBL_MANT_DIG);
        const double x1 = x0;
        const double x2 = nextafter(x1, FLT_MAX);
        foo(x1, x2);
    }
    putchar('\n');
    {
        printf("long double (%zu bits, %d mantissa bits)\n",
               CHAR_BIT * sizeof (long double), LDBL_MANT_DIG);
        const long double x1 = x0;
        const long double x2 = nextafterl(x1, FLT_MAX);
        foo(x1, x2);
    }
}
Michael S
2024-03-17 09:34:58 UTC
Permalink
On Sun, 17 Mar 2024 11:06:21 +0200
Post by Michael S
On Sat, 16 Mar 2024 16:19:11 -0700
Post by Keith Thompson
Post by MitchAlsup1
Post by Michael S
On Sat, 16 Mar 2024 01:16:25 +0000
Post by MitchAlsup1
Post by Keith Thompson
I can see how computing sin(x) with high precision for
"reasonable" values of x would be useful, but does any of that
benefit from being able to compute sin(2^53) accurately?
Because accurate argument reduction reduces the burden on the
programmer to remain within his sandbox.
Not really.
Say you are a programmer and you receive a value like 2^53 from an
Input read and you want the most accurate possible SIN( of that ).
I can't think of a scenario where that would be useful (other than
just doing it for the sake of doing it).
If 2^53 represents a physical quantity, how likely is the actual
value to be known within ±π (+/- pi for those who prefer ASCII)?
If you can get better precision without too much extra cost, that's
great. I don't know enough to have an opinion about what the best
tradeoff is, but I presume it's going to be different depending on
the application.
Here's a C program that shows how precise sin(2^53) can be for types
float, double, and long double (I used gcc and glibc). The
nextafter functions are used to compute the nearest representable
number. For long double, the value of sin() changes by about 1
part in 1600, which seems decent, but it's not nearly as precise as
for values around 1.0. For float and double, the imprecision of the
argument is enough to make the result practically meaningless.
<snip>
Post by Michael S
As written, your example does not emphasize that the problem has
nothing to do with implementation of sinX() library routine.
It's best illustrated by followup conversation with bart, IMHO 100%
O.T.
<snip>
BTW, personally I'd prefer to illustrate the futility of Payne-Hanek or
equivalent reduction with the following example:

#include <math.h>
#include <stdio.h>
#include <limits.h>
#include <float.h>

void foo(long double x)
{
const double y = (double)sinl(x);
printf("%.20Le %25.16La %.17f\n", x, x, y);
}

int main(void) {
const float a0 = 0x1p56;
const float b0 = 11;
{
printf("%-12s (%3zu bits, %d mantissa bits) ",
"float", CHAR_BIT * sizeof (float), FLT_MANT_DIG);
const float a = a0;
const float b = b0;
const float x = a / b;
foo(x);
}
{
printf("%-12s (%3zu bits, %d mantissa bits) ",
"double", CHAR_BIT * sizeof (double), DBL_MANT_DIG);
const double a = a0;
const double b = b0;
const double x = a / b;
foo(x);
}
{
printf("%-12s (%3zu bits, %d mantissa bits) ",
"long double", CHAR_BIT * sizeof (long double), LDBL_MANT_DIG);
const long double a = a0;
const long double b = b0;
const long double x = a / b;
foo(x);
}
}
Chris M. Thomasson
2024-03-15 22:13:44 UTC
Permalink
Post by Chris M. Thomasson
Post by Terje Mathisen
Michael, I for the main part agree with you here, i.e. calculating
sin(x) with x larger than 2^53 or so, is almost certainly stupid.
[...]
;^D tooooooo big. :^)
Actually, I can think of one of my fractals that tries to scale back
numbers that get too big.

https://www.shadertoy.com/view/XtscDl

When a complex number breaches an escape time barrier, my code simply
scales it back, say z = z * .242, and tries the fractal again. It has
the effect of exploding the fractal out beyond its original boundaries...
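A tiny C sketch of that scale-back idea (not the shader itself; the
escape radius, retry limit, and sample points below are arbitrary
choices for illustration, only the 0.242 factor comes from the
description above):

#include <complex.h>
#include <math.h>
#include <stdio.h>

static int iterate(double complex c)
{
    double complex z = 0;
    int retries = 0;
    for (int i = 0; i < 256; i++) {
        z = z * z + c;                 /* ordinary escape-time step        */
        if (cabs(z) > 4.0) {
            if (++retries > 3)
                return i;              /* finally treat it as escaped      */
            z *= 0.242;                /* scale it back and keep iterating */
        }
    }
    return -1;                         /* treated as inside the set        */
}

int main(void)
{
    printf("%d\n", iterate(-0.75 + 0.10 * I));
    printf("%d\n", iterate( 0.40 + 0.40 * I));
}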
Post by Chris M. Thomasson
Now, wrt the results, arbitrary precision for trig is useful, in say...
Deep fractal zooms...
Zooming in really deep in say something like this, well the precision of
https://paulbourke.net/fractals/multijulia/
Trig would be used, say, in rectangular to-from polar forms wrt getting
the n-ary roots of a complex number?
Stefan Monnier
2024-03-18 19:18:17 UTC
Permalink
Post by Michael S
Post by Terje Mathisen
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
Post by Michael S
At which cost in tables sizes?
My impression was that it wasn't costly in that respect, but since my
recollection seems to be off on the performance, maybe it's off here
as well.
Post by Michael S
The critical point here is the definition of what is considered exact. If
'exact' is measured only on the y side of y=foo(x), disregarding
possible imprecision on the x side, then you are very likely to end up
with results that are slower to calculate but not at all more useful
from the point of view of an engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.
I don't know what are/were the motivations for the people working on
exact transcendentals, but they have applications unrelated to the fact
that they're "better": the main benefit (from this here PL guy) is that
it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.


Stefan
MitchAlsup1
2024-03-18 22:19:19 UTC
Permalink
Post by Stefan Monnier
Post by Michael S
Post by Terje Mathisen
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×
Post by Stefan Monnier
Post by Michael S
At which cost in tables sizes?
Making transcendentals faster is always a tradeoff between table size
and speed. SIN() and COS() can use 10-term polynomials when the reduced
argument is -¼pi..+¼pi; on the other hand, when the reduced argument
is ±0.008, a 3-term polynomial is just as accurate, but you need 128×3
tables.
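For the wide-interval case, the evaluation itself is just an odd
polynomial in Horner form on the reduced argument. A sketch (Taylor
coefficients are used here purely for illustration; a real library
would use minimax coefficients and take more care with rounding):

#include <math.h>
#include <stdio.h>

static double sin_poly(double r)        /* assumes |r| <= pi/4 */
{
    const double c3  = -1.0/6,         c5  =  1.0/120;
    const double c7  = -1.0/5040,      c9  =  1.0/362880;
    const double c11 = -1.0/39916800,  c13 =  1.0/6227020800.0;
    const double r2 = r * r;
    double p = c13;
    p = p * r2 + c11;
    p = p * r2 + c9;
    p = p * r2 + c7;
    p = p * r2 + c5;
    p = p * r2 + c3;
    return r + r * r2 * p;              /* sin(r) ~ r + r^3*(c3 + ...) */
}

int main(void)
{
    for (double r = -0.75; r <= 0.75; r += 0.25)
        printf("% .2f  % .17f  % .17f\n", r, sin_poly(r), sin(r));
}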
Post by Stefan Monnier
My impression was that it wasn't costly in that respect, but since my
recollection seems to be off on the performance, maybe it's off here
as well.
ITANIC did rather well, here, because it had 2 FMAC units and could use
both for the polynomials.
Post by Stefan Monnier
Post by Michael S
The critical point here is the definition of what is considered exact. If
'exact' is measured only on the y side of y=foo(x), disregarding
possible imprecision on the x side, then you are very likely to end up
with results that are slower to calculate but not at all more useful
from the point of view of an engineer or physicist. Exactly like Payne-Hanek
or Mitch's equivalent of Payne-Hanek.
I don't know what are/were the motivations for the people working on
exact transcendentals, but they have applications unrelated to the fact
that they're "better": the main benefit (from this here PL guy) is that
it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to machine.
When libm is correctly rounded, there is no issue at all; not so
otherwise.
Post by Stefan Monnier
Stefan
Stefan Monnier
2024-03-20 13:54:36 UTC
Permalink
Post by MitchAlsup1
Post by Stefan Monnier
Post by Terje Mathisen
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project claims
to get much better performance (basically, in the same ballpark as
not-correctly-rounded implementations).

[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]

My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
Post by MitchAlsup1
Post by Stefan Monnier
I don't know what are/were the motivations for the people working on
exact transcendentals, but they have applications unrelated to the fact
that they're "better": the main benefit (from this here PL guy) is that
it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to machine.
When libm is correctly rounded, there is no issue at all; not so
otherwise.
Exactly!
[ Or should I say "Correctly rounded!"? ]


Stefan
Michael S
2024-03-20 16:21:47 UTC
Permalink
On Wed, 20 Mar 2024 09:54:36 -0400
Post by Stefan Monnier
Post by MitchAlsup1
Post by Stefan Monnier
Post by Terje Mathisen
There are groups who have shown that exactly rounded
transcendental functions are in fact achievable with maybe 3X
reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
Post by Stefan Monnier
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
It all depends on what you compare against.
For scalar call for majority of transcendental functions on IEEE-754
list, it's probably very easy to get correctly rounded binary32 results
in approximately the same time as results calculated with max. err of,
say, 0.75 ULP. Especially so if target machine has fast binary64
arithmetic.
But in practice when we use lower (than binary64) precision we often
care about vector performance rather than scalar.
I.e. we care little about speed of sinf(), but want ippsTone_32f() as
fast as possible. In case you wonder, this function is part of Intel
Performance Primitives and it is FAST. Writing a correctly rounded
function that approaches the speed of this *almost* correctly
rounded routine (I think, for sane input ranges it's better than
0.55 ULP) would not be easy at all!
Post by Stefan Monnier
Post by MitchAlsup1
Post by Stefan Monnier
I don't know what are/were the motivations for the people working
on exact transcendentals, but they have applications unrelated to
the fact that they're "better": the main benefit (from this here
PL guy) is that it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to
machine. When libm is correctly rounded, there is no issue at all;
not so other- wise.
Exactly!
[ Or should I say "Correctly rounded!"? ]
Stefan
You like this proposal because you are an implementer of the language/lib.
It makes your regression tests easier. And it's a good challenge.
I don't like it because I am primarily a user of the language/lib. My
floating-point tests have zero chance of repeatability of this sort, for
a thousand other reasons.
I don't want to pay for correct rounding of transcendental functions.
Neither in speed nor especially in table footprint. Not even a
little. Because for me there are no advantages.
Now, there are things that I am ready to pay for. E.g. preservation of
the mathematical properties of the original exact function. I.e. if the
original is monotonic on a certain interval then I do want at least weak
monotonicity of the approximation. If the original is even (or odd) I
want the same for the approximation. If the original never exceeds 1, I
want the same for the approximation. Etc... But correct rounding is not
on the list.
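A quick sketch of what checks for those properties can look like for
sinf() (random sampling, the sample count, and the [0, 1.5] interval
for the monotonicity probe are arbitrary choices here):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int bad_bound = 0, bad_odd = 0, bad_mono = 0;
    srand(12345);
    for (int i = 0; i < 1000000; i++) {
        float x = (float)rand() / RAND_MAX * 100.0f;      /* samples in [0, 100]  */
        if (fabsf(sinf(x)) > 1.0f) bad_bound++;           /* |sin| must stay <= 1 */
        if (sinf(-x) != -sinf(x))  bad_odd++;             /* odd symmetry         */

        float m  = (float)rand() / RAND_MAX * 1.5f;       /* inside [0, pi/2)     */
        float m2 = nextafterf(m, INFINITY);
        if (sinf(m2) < sinf(m))    bad_mono++;            /* weak monotonicity    */
    }
    printf("bound: %d  odd: %d  monotonicity: %d violations\n",
           bad_bound, bad_odd, bad_mono);
}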
Stefan Monnier
2024-03-20 16:59:24 UTC
Permalink
Post by Michael S
Post by Stefan Monnier
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
Indeed, I hoped by now they had attacked the world of 64bit floats, but
it looks like it's not the case yet. In theory the same idea is
applicable there and should give similar results, but of course it's
harder for two reasons:

- The search space to consider is much larger, making it much more
costly to do exhaustive testing (and to collect the info they need to
choose/compute the polynomials).
- You can't cheaply use higher-precision floats for internal computations.
Post by Michael S
It all depend of what you compare against.
For scalar call for majority of transcendental functions on IEEE-754
list, it's probably very easy to get correctly rounded binary32 results
in approximately the same time as results calculated with max. err of,
say, 0.75 ULP. Especially so if target machine has fast binary64
arithmetic.
IIUC that was not the case before their work: it was "easy" to get the
correct result in 99% of the cases, but covering all 100% of the cases
used to be costly because those few cases needed a lot more
internal precision.
Post by Michael S
I don't want to pay for correct rounding of transcendental functions.
Neither in speed and especially nor in tables footprint. Not even a
little. Because for me there are no advantages.
For that reason, correctly rounded results were nowhere near the menu,
indeed. But the proposition of Rlibm is that maybe we can have our cake
and eat it too, because (if all goes well) you don't have to pay for it
in speed nor in table size.

I'm hoping that pans out :-)
Post by Michael S
Now, there are things that I am ready to pay for. E.g. preservation of
mathematical properties of original exact function. I.e. if original is
monotonic on certain interval then I do want at least weak monotonicity
of approximation. If original is even (or odd) I want the same for
approximation. If original never exceed 1, I want the same
for approximation. Etc... But correct rounding is not on the list.
But correct rounding would give you those properties.


Stefan
MitchAlsup1
2024-03-20 20:40:25 UTC
Permalink
Post by Stefan Monnier
Post by Michael S
Post by Stefan Monnier
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
Indeed, I hoped by now they had attacked the world of 64bit floats, but
it looks like it's not the case yet. In theory the same idea is
applicable there and should give similar results, but of course it's
- The search space to consider is much larger, making it much more
costly to do exhaustive testing (and to collect the info they need to
choose/compute the polynomials).
- You can't cheaply use higher-precision floats for internal computations.
You can in HW, just not in SW. That is a strong argument for migrating
these functions from SW procedures to HW instructions. You do not need
FP128, just FP with a 64-bit fraction.
Post by Stefan Monnier
Post by Michael S
It all depends on what you compare against.
OK:: SIN(x) takes 19 cycles and is accurate to 0.5002 ulp with Payne & Hanek
argument reduction--against any SW implementation.
Post by Stefan Monnier
Post by Michael S
For scalar call for majority of transcendental functions on IEEE-754
list, it's probably very easy to get correctly rounded binary32 results
in approximately the same time as results calculated with max. err of,
say, 0.75 ULP.
Yes, at the cost of 4× larger tables.
Post by Stefan Monnier
Especially so if target machine has fast binary64
Post by Michael S
arithmetic.
Obviously.
Post by Stefan Monnier
IIUC that was not the case before their work: it was "easy" to get the
correct result in 99% of the cases, but covering all 100% of the cases
used to be costly because those few cases needed a lot more
internal precision.
Muller indicates one typically needs 2×n+6 to 2×n+12 bits to get correct
rounding 100% of the time. FP128 only has 2×n+3 and is insufficient by
itself.
Post by Stefan Monnier
Post by Michael S
I don't want to pay for correct rounding of transcendental functions.
Neither in speed and especially nor in tables footprint. Not even a
little. Because for me there are no advantages.
For that reason, correctly rounded results were nowhere near the menu,
indeed. But the proposition of Rlibm is that maybe we can have our cake
and eat it too, because (if all goes well) you don't have to pay for it
in speed nor in table size.
I'm hoping that pans out :-)
Post by Michael S
Now, there are things that I am ready to pay for. E.g. preservation of
mathematical properties of original exact function. I.e. if original is
monotonic on certain interval then I do want at least weak monotonicity
of approximation. If original is even (or odd) I want the same for
approximation. If original never exceed 1, I want the same
for approximation. Etc... But correct rounding is not on the list.
But correct rounding would give you those properties.
Stefan
Terje Mathisen
2024-03-21 07:52:18 UTC
Permalink
Post by MitchAlsup1
Post by Stefan Monnier
IIUC that was not the case before their work: it was "easy" to get the
correct result in 99% of the cases, but covering all 100% of the cases
used to be costly because those few cases needed a lot more
internal precision.
Muller indicates one typically need 2×n+6 to 2×n+12 bits to get correct
roundings 100% of the time. FP128 only has 2×n+3 and is insufficient by
itself.
I agree with everything else you've written about this subject, but
afair, fp128 is using 1:15:112 while double is of course 1:10:53.

On the one hand, 53*2+6 -> 112, on the other hand (if we include the
hidden bits) we get 54*2+5 -> 113.

So significantly more than 2n+3 but not enough on its own to guarantee
correct rounding.

As Michael S have mentioned, we want these algorithms to work for
vector/SIMD calculations, and at that point both lookup tables and
widening the size of the temporaries are very costly.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Michael S
2024-03-21 12:51:33 UTC
Permalink
On Thu, 21 Mar 2024 08:52:18 +0100
Post by Terje Mathisen
Post by MitchAlsup1
Post by Stefan Monnier
IIUC that was not the case before their work: it was "easy" to get
the correct result in 99% of the cases, but covering all 100% of
the cases used to be costly because those few cases needed a lot
more internal precision.
Muller indicates one typically need 2×n+6 to 2×n+12 bits to get
correct roundings 100% of the time. FP128 only has 2×n+3 and is
insufficient by itself.
I agree with everything else you've written about this subject, but
afair, fp128 is using 1:15:112 while double is of course 1:10:53.
IEEE-754 binary64 is 1:11:52 :-)

But anyway I am skeptical about Muller's rules of thumb.
I'd expect that different transcendental functions would exercise
non-trivially different behaviors, mostly because they have different
relationships between input and output ranges. Some of them compress
wider inputs into narrower output and some do the opposite.
Yet another factor is luck.

Besides, I see nothing special about binary128 as a helper format.
It is not supported on the vast majority of HW, and even when it is
supported, like on IBM POWER, for the majority of operations it is slower
than emulated 128-bit fixed-point. Fixed-point is more work for the coder,
but sounds like a more sure path to success.
Post by Terje Mathisen
On the one hand, 53*2+6 -> 112, on the other hand (if we include the
hidden bits) we get 54*2+5 -> 113.
So significantly more than 2n+3 but not enough on its own to
guarantee correct rounding.
As Michael S have mentioned, we want these algorithms to work for
vector/SIMD calculations, and at that point both lookup tables and
widening the size of the temporaries are very costly.
Terje
MitchAlsup1
2024-03-21 16:37:29 UTC
Permalink
Post by Michael S
On Thu, 21 Mar 2024 08:52:18 +0100
Post by Terje Mathisen
Post by MitchAlsup1
Post by Stefan Monnier
IIUC that was not the case before their work: it was "easy" to get
the correct result in 99% of the cases, but covering all 100% of
the cases used to be costly because those few cases needed a lot
more internal precision.
Muller indicates one typically need 2×n+6 to 2×n+12 bits to get
correct roundings 100% of the time. FP128 only has 2×n+3 and is
insufficient by itself.
I agree with everything else you've written about this subject, but
afair, fp128 is using 1:15:112 while double is of course 1:10:53.
IEEE-754 binary64 is 1:11:52 :-)
But anyway I am skeptical about Muller's rules of thumb.
See Chapter 10 of "Elementary Functions: Algorithms and Implementation"
by Jean-Michel Muller {you can find a free copy on the net} and in
particular tables 10.5-10.14 -- quoting from the book::

"In his PhD dissertation [206], Lef`evre gives algorithms for finding
the worst cases of the table maker’s dilemma. These algorithms use linear
approximations and massive parallelism. A recent presentation of these
algorithms, with some improvements, can be found in [207]. We have run
Lef`evre’s algorithms to find worst cases in double-precision for the
most common functions and domains. The results obtained so far, first
published in [208] are given in Tables 10.5 to 10.14

They are NOT rules of thumb !! But go read it for yourself.
Post by Michael S
I'd expect that different transcendental functions would exercise
non-trivially different behaviors, mostly because they have different
relationships between input and output ranges. Some of them compress
wider inputs into narrower output and some do the opposite.
Yet another factor is luck.
Besides, I see nothing special about binary128 as a helper format.
It is not supported on the vast majority of HW, and even when it is
supported, like on IBM POWER, for the majority of operations it is slower
than emulated 128-bit fixed-point. Fixed-point is more work for the coder,
but sounds like a more sure path to success.
Post by Terje Mathisen
On the one hand, 53*2+6 -> 112, on the other hand (if we include the
hidden bits) we get 54*2+5 -> 113.
So significantly more than 2n+3 but not enough on its own to
guarantee correct rounding.
As Michael S have mentioned, we want these algorithms to work for
vector/SIMD calculations, and at that point both lookup tables and
widening the size of the temporaries are very costly.
Terje
Terje Mathisen
2024-03-23 08:11:38 UTC
Permalink
Post by Michael S
On Thu, 21 Mar 2024 08:52:18 +0100
Post by Terje Mathisen
Post by Stefan Monnier
IIUC that was not the case before their work: it was "easy" to get
the correct result in 99% of the cases, but covering all 100% of
the cases used to be costly because those few cases needed a lot
more internal precision.
Muller indicates one typically need 2×n+6 to 2×n+12 bits to get
correct roundings 100% of the time. FP128 only has 2×n+3 and is
insufficient by itself.
I agree with everything else you've written about this subject, but
afair, fp128 is using 1:15:112 while double is of course 1:10:53.
IEEE-754 binary64 is 1:11:52 :-)
Oops! Mea Culpa!
I _do_ know that double has a 10-bit exponent bias (1023), so it has to
be 11 bits wide. :-(
Post by Michael S
But anyway I am skeptical about Muller's rules of thumb.
I'd expect that different transcendental functions would exercise
non-trivially different behaviors, mostly because they have different
relationships between input and output ranges. Some of them compress
wider inputs into narrower output and some do the opposite.
Yet another factor is luck.
I agree, this is a per-function problem, with some being substantially
harder than others.
Post by Michael S
Besides, I see nothing special about binary128 as a helper format.
It is not supported on the vast majority of HW, and even when it is
supported, like on IBM POWER, for the majority of operations it is slower
than emulated 128-bit fixed-point. Fixed-point is more work for the coder,
but sounds like a more sure path to success.
In my own code (since I don't have Mitch's ability to use much wider
internal fp formats) I also prefer 64-bit u64 as the working chunk size.

Almost 30 years ago, during the FDIV workaround, I needed a q&d way to
verify that our fpatan2 replacement was correct, so what I did was to
write a 1:31:96 format library over a weekend.

Yeah, it was much more exponent than needed, but with only 32-bit
registers available it was much easier to get the asm correct.

For the fpatan2 I used a dead simple approach with little range
reduction, just a longish Taylor series (i.e. no Cheby optimizations).

I had previously written 3-4 different implementations of arbitrary
precision atan2() when I wanted to calculate as many digits of pi as
possible, so I just reused one of those algorithms.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Steven G. Kargl
2024-03-20 17:02:57 UTC
Permalink
Post by Michael S
On Wed, 20 Mar 2024 09:54:36 -0400
Post by Stefan Monnier
Post by MitchAlsup1
Post by Stefan Monnier
Post by Terje Mathisen
There are groups who have shown that exactly rounded
transcendental functions are in fact achievable with maybe 3X
reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
I skimmed their logf(x) implementation. Their technique will
fall apart for binary64 (and other higher precisions). With
logf(x), they combine an argument reduction step with table look-up and
a 5th-order polynomial approximation. The polynomial is computed
in double precision, and provides the correct rounding. For
binary64, the polynomial would require many more terms and
double-double or binary128 evaluation.
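For readers who have not looked at that code, a structural sketch of
the reduce / look-up / short-polynomial shape (this is not Rlibm; the
table size is arbitrary, the tables are filled with the system log()
purely for illustration, and the polynomial is plain Taylor):

#include <math.h>
#include <stdio.h>

#define TBL_BITS 7
static double tbl_log[1 << TBL_BITS];    /* log(1 + i/128) */
static double tbl_inv[1 << TBL_BITS];    /* 1/(1 + i/128)  */

static void init_tables(void)
{
    for (int i = 0; i < (1 << TBL_BITS); i++) {
        double c = 1.0 + (double)i / (1 << TBL_BITS);
        tbl_log[i] = log(c);
        tbl_inv[i] = 1.0 / c;
    }
}

static float sketch_logf(float x)        /* assumes x > 0 and finite  */
{
    int e;
    double m = frexp((double)x, &e);     /* x = m * 2^e, m in [0.5,1) */
    m *= 2.0; e -= 1;                    /* now m in [1,2)            */
    int i = (int)((m - 1.0) * (1 << TBL_BITS));
    double r = m * tbl_inv[i] - 1.0;     /* residual in [0, 1/128)    */
    double p = r - r*r/2 + r*r*r/3 - r*r*r*r/4 + r*r*r*r*r/5;
    return (float)(e * M_LN2 + tbl_log[i] + p);
}

int main(void)
{
    init_tables();
    for (float x = 0.5f; x < 8.0f; x *= 1.7f)
        printf("%-10g %.9g  %.9g\n", x, sketch_logf(x), logf(x));
}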
--
steve
MitchAlsup1
2024-03-20 20:47:02 UTC
Permalink
Post by Steven G. Kargl
Post by Michael S
On Wed, 20 Mar 2024 09:54:36 -0400
Post by Stefan Monnier
Post by MitchAlsup1
Post by Stefan Monnier
Post by Terje Mathisen
There are groups who have shown that exactly rounded
transcendental functions are in fact achievable with maybe 3X
reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project
claims to get much better performance (basically, in the same
ballpark as not-correctly-rounded implementations).
I had only read the 1st page.
It sounds like they are not particularly interested in IEEE binary64
which appears to be the major point of interest of math libs of
conventional languages.
I skimmed their logf(x) implementation. Their technique will
fall apart for binary64 (and other higher precisions). With
logf(x), they combine an argument reduction step with table look-up and
a 5th-order polynomial approximation. The polynomial is computed
in double precision, and provides the correct rounding. For
binary64, the polynomial would require many more terms and
double-double or binary128 evaluation.
The first 7 terms of the DP polynomial do not require FP128, whereas
the last 3 do/will.

Whereas: HW with 64-bits of fraction does not need FP128 at all.
64-bits of fraction with 64-bit coefficients added into a 128-bit
accumulator is more than sufficient.

Note FP64 has 53-bits of fraction.....
MitchAlsup1
2024-03-20 20:33:44 UTC
Permalink
Post by Michael S
On Wed, 20 Mar 2024 09:54:36 -0400
Post by Stefan Monnier
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
It all depends on what you compare against.
For scalar call for majority of transcendental functions on IEEE-754
list, it's probably very easy to get correctly rounded binary32 results
in approximately the same time as results calculated with max. err of,
say, 0.75 ULP. Especially so if target machine has fast binary64
arithmetic.
But in practice when we use lower (than binary64) precision we often
care about vector performance rather than scalar.
I.e. we care little about speed of sinf(), but want ippsTone_32f() as
fast as possible. In case you wonder, this function is part of Intel
Performance Primitives and it is FAST. Writing a correctly rounded
function that approaches the speed of this *almost* correctly
rounded routine (I think, for sane input ranges it's better than
0.55 ULP) would not be easy at all!
I challenge ANY software version of SIN() correctly rounded or not
to compete with my <patented> HW implementations for speed (or even
power).
Post by Michael S
Post by Stefan Monnier
Post by MitchAlsup1
Post by Stefan Monnier
I don't know what are/were the motivations for the people working
on exact transcendentals, but they have applications unrelated to
the fact that they're "better": the main benefit (from this here
PL guy) is that it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to
machine. When libm is correctly rounded, there is no issue at all;
not so otherwise.
Exactly!
[ Or should I say "Correctly rounded!"? ]
Stefan
You like this proposal because you are implementer of the language/lib.
It makes your regression tests easier. And it's good challenge.
I don't like it because I am primarily user of the language/lib. My
floating-point tests have zero chance of repeatability of this sort for
a thousand of other reasons.
I don't want to pay for correct rounding of transcendental functions.
Even when the HW is 7× faster than SW algorithms ??
Post by Michael S
Neither in speed and especially nor in tables footprint. Not even a
little. Because for me there are no advantages.
My HW algorithms use no memory (cache, DRAM, or even LDs.)
Post by Michael S
Now, there are things that I am ready to pay for. E.g. preservation of
mathematical properties of original exact function.
You know, of course, that incorrectly rounded SIN() and COS() do not
maintain the property of SIN()^2+COS()^2 == 1.0
Post by Michael S
I.e. if original is
monotonic on certain interval then I do want at least weak monotonicity
of approximation.
My HW algorithms have a numeric proof of this maintenance.
Post by Michael S
If original is even (or odd) I want the same for
approximation. If original never exceed 1, I want the same
for approximation. Etc... But correct rounding is not on the list.
SIN(x) can be incorrectly rounded to be greater than 1.0.

Still want incorrect rounding--or just a polynomial that does not
have SIN(x) > 1.0 ??
Michael S
2024-03-20 22:03:48 UTC
Permalink
On Wed, 20 Mar 2024 20:33:44 +0000
Post by MitchAlsup1
Post by Michael S
On Wed, 20 Mar 2024 09:54:36 -0400
Post by Stefan Monnier
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact
result and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
My impression was that their performance was good enough that the
case for not-correctly-rounded implementations becomes very weak.
It all depends on what you compare against.
For scalar call for majority of transcendental functions on IEEE-754
list, it's probably very easy to get correctly rounded binary32
results in approximately the same time as results calculated with
max. err of, say, 0.75 ULP. Especially so if target machine has
fast binary64 arithmetic.
But in practice when we use lower (than binary64) precision we often
care about vector performance rather than scalar.
I.e. we care little about speed of sinf(), but want ippsTone_32f()
as fast as possible. In case you wonder, this function is part of Intel
Performance Primitives and it is FAST. Writing a correctly rounded
function that approaches the speed of this *almost* correctly
rounded routine (I think, for sane input ranges it's better than
0.55 ULP) would not be easy at all!
I challenge ANY software version of SIN() correctly rounded or not
to compete with my <patented> HW implementations for speed (or even
power).
Before you post this response, you could as well look at what
ippsTone_32f() is doing. Hint - it's not generic scalar sin().
IMHO, for a long enough vector and on a modern enough Intel or AMD CPU it
will very easily beat any scalar-oriented, binary64-oriented HW
implementation of sin() or cos().
This function is not about latency. It's about throughput.

AFAIR, you were quite surprised by the speed (throughput) of another IPP
primitive, ippsSin_64f_A53(), when I posted results of timing
measurements here less than 2 years ago. So, before you issue a
challenge, just take into account that ippsTone_32f() is both more
specialized than ippsSin_64f_A53() and has much lower precision. So,
while I didn't test, I expect that it is much, much faster.
MitchAlsup1
2024-03-20 20:26:03 UTC
Permalink
Post by Stefan Monnier
Post by MitchAlsup1
Post by Stefan Monnier
Post by Terje Mathisen
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project claims
to get much better performance (basically, in the same ballpark as
not-correctly-rounded implementations).
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
This sounds like the "Minefield Method"
Post by Stefan Monnier
My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
Post by MitchAlsup1
Post by Stefan Monnier
I don't know what are/were the motivations for the people working on
exact transcendentals, but they have applications unrelated to the fact
that they're "better": the main benefit (from this here PL guy) is that
it gives them a reliable, reproducible semantics.
Bit-for-bit reproducibility makes several things much easier.
Consider moving an application which uses libm from machine to machine.
When libm is correctly rounded, there is no issue at all; not so
otherwise.
Exactly!
[ Or should I say "Correctly rounded!"? ]
Stefan
Stefan Monnier
2024-03-20 20:34:22 UTC
Permalink
Post by MitchAlsup1
Post by Stefan Monnier
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
This sounds like the "Minefield Method"
Indeed it is.


Stefan
Terje Mathisen
2024-03-21 07:38:53 UTC
Permalink
Post by Stefan Monnier
Post by MitchAlsup1
Post by Stefan Monnier
Post by Terje Mathisen
There are groups who have shown that exactly rounded transcendental
functions are in fact achievable with maybe 3X reduced performance.
That much? I had the impression it was significantly cheaper.
The J. M. Muller book indicates about 2× to 2.5×
The [Rlibm](https://people.cs.rutgers.edu/~sn349/rlibm/) project claims
to get much better performance (basically, in the same ballpark as
not-correctly-rounded implementations).
[ Their key insight is the idea that to get correct rounding, you
shouldn't try to compute the best approximation of the exact result
and then round, but you should instead try to compute any
approximation whose rounding gives the correct result. ]
That is indeed interesting. However, it is also very interesting that
they only do this for 32-bit or less. I.e. the domains for which it is
almost trivially easy to verify the results by checking all possible
inputs. :-)
Post by Stefan Monnier
My impression was that their performance was good enough that the case
for not-correctly-rounded implementations becomes very weak.
I agree in principle: If you can use polys that are not much more
complicated than the min-max/cheby case, and which always round to the
desired values, then that's an obvious good.
...
Reading the full log2f() code, it seems that they can use double for all
evaluations, and with a single (!) mantissa exception, this produces the
correct results for all inputs and all rounding modes. :-)

I.e. with 53 bits to work with, giving up about one ulp for the range
reduction, the 52 remaining bits correspond to 2n+6 bits available for
the 5-term poly to evaluate a final float result.

When asking for perfectly rounded transcendentals, with approximately the
same runtime, the float case is a form of cheating, simply because
current FPUs tend to run float and double at the same speed.

OTOH, I'm still persuaded that for a float library, this approach
might in fact be included in the 754 standard.

Doing the same for double by "simply" doing all operations with fp128
variables would definitely take significantly longer, and verification
would be an "interesting" problem. (Interesting in the "multiple PhDs"
domain.)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Michael S
2024-02-23 12:32:50 UTC
Permalink
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
Post by Steven G. Kargl
Post by MitchAlsup1
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the
top of stack of papers on my desk. Still need to internalize
the details.
Payne-Hanek is a right answer to a wrong question.
In science/engineering a floating-point number x represents a range
[x-ULP:x+ULP], with hopefully a much higher probability of being in the
range [x-ULP/2:x+ULP/2]. However, within this reduced range the
probability is flat; an "exact" x is no worse and no better than any
other point within this range.
Which means that when we look for y=sin(x), then for
scientific/engineering purposes any answer in the range*
[sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is the sin() of IEEE-754 binary64
number 1e17?" is "Anything in the range [-1:1]". The wise and useful
answer to the same question is "You're holding it wrong".

* In this paragraph sin(x) designates the result calculated with
infinite precision.
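A concrete look at the 1e17 example (one binary64 ulp there is 16, so
the uncertainty interval spans roughly 2.5 full periods of sin):

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double x  = 1e17;
    const double up = nextafter(x, INFINITY);
    printf("ulp(1e17)        = %g\n", up - x);             /* 16 */
    printf("periods per ulp  = %g\n", (up - x) / (2 * M_PI));
    printf("sin(x)           = % .17f\n", sin(x));
    printf("sin(x + 1 ulp)   = % .17f\n", sin(up));
}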
MitchAlsup1
2024-02-23 20:02:00 UTC
Permalink
Post by Michael S
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
Post by Steven G. Kargl
Post by MitchAlsup1
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the
top of stack of papers on my desk. Still need to internalize
the details.
Payne-Hanek is a right answer to a wrong question.
If someone is looking for fast sin(x) you are correct; for a person looking
for the best possible (correctly rounded) result, you are not. In any
event I have driven that monstrosity down from 50-odd cycles to 4--
making it at least palatable.
Post by Michael S
In science/engineering floating-point number x represents a range
[x-ULP:x+ULP] with hopefully much higher probability of being in range
[x-ULP/2:x+ULP/2]. However in this reduced range probability is flat,
an "exact" x is no worse and no better than any other point within this
range.
Which means that when we look for y=sin(x) then for
scientific/engineering purposes any answer in range*
[sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is a sin() of IEEE-754 binary64
number 1e17?" is "Anything in range [-1:1]". The wise and useful answer
to the same question is "You're holding it wrong".
There are verification tests that know the exact value of both x and sin(x).
For these uses your argument is not valid--however I am willing to grant
that outside of verification, and in actual use, your argument holds as
well as the noise in the data provides.
Post by Michael S
* in paragraph Sin(x) designates result calculated with infinite
precision.
Steven G. Kargl
2024-02-23 20:38:59 UTC
Permalink
Post by MitchAlsup1
Post by Michael S
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
Post by MitchAlsup1
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the top of
stack of papers on my desk. Still need to internalize the details.
Payne-Hanek is a right answer to a wrong question.
If someone looking for fast sin(x) you are correct, for a person looking
for the best possible (correctly rounded) result, you are not. In any
event I have driven that monstrosity down from 50-odd cycles to 4--
making it at least palatable.
Post by Michael S
In science/engineering floating-point number x represents a range
[x-ULP:x+ULP] with hopefully much higher probability of being in range
[x-ULP/2:x+ULP/2]. However in this reduced range probability is flat,
an "exact" x is no worse and no better than any other point within this
range.
Which means that when we look for y=sin(x) then for
scientific/engineering purposes any answer in range*
[sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is a sin() of IEEE-754 binary64
number 1e17?" is "Anything in range [-1:1]". The wise and useful answer
to the same question is "You're holding it wrong".
There are verification tests that know the exact value of both x and sin(x).
For these uses you argument is not valid--however I am willing to grant
that outside of verification, and in actual use, your argument holds as
well ad the noise in the data provides.
Fortunately, for 32-bit single precision floating point, one can
exhaustively test single-argument functions. For the SW implementation
on FreeBSD, exhaustive testing shows

% tlibm sinpi -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinpif: [0,8.38861e+06]
1000000 calls, 0.015453 secs, 0.01545 usecs/call
ulp <= 0.5: 99.842% 1253631604 | 99.842% 1253631604
0.5 < ulp < 0.6: 0.158% 1989420 | 100.000% 1255621024
Max ulp: 0.583666 at 6.07950642e-05

% tlibm sin -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinf: [0,8.38861e+06]
1000000 calls, 0.019520 secs, 0.01952 usecs/call
ulp <= 0.5: 99.995% 1249842491 | 99.995% 1249842491
0.5 < ulp < 0.6: 0.005% 60102 | 100.000% 1249902593
Max ulp: 0.500900 at 3.68277281e+05

The speed test is for 1M values evenly distributed over
the interval. The difference in speed for sinpi vs sin
is due to the argument reduction.
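The shape of such an exhaustive run, as a minimal sketch (this is not
tlibm: it uses sinl() as the reference, a simplified ulp measure, and
it takes a while to walk all ~1.25e9 inputs):

#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    double worst = 0.0;
    float worst_x = 0.0f;
    for (uint32_t bits = 0; bits <= 0x4B000000u; bits++) {  /* 0x4B000000 == 2^23 */
        float x;
        memcpy(&x, &bits, sizeof x);                        /* next float in [0, 2^23] */
        long double ref = sinl((long double)x);
        float got = sinf(x);
        float rf = (float)ref;
        double ulp = nextafterf(rf, INFINITY) - rf;         /* simplified ulp of ref */
        double err = fabsl((long double)got - ref) / ulp;
        if (err > worst) { worst = err; worst_x = x; }
    }
    printf("max ulp error %.6f at %.8e\n", worst, (double)worst_x);
}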

Note 1: libm built with clang/llvm with only -O2 on a
AMD Phenom II X2 560 Processor (3300.14-MHz K8-class CPU).

Note 2: subnormal values for x are not included in
the count, as subnormals are tested with a different approach.
--
steve
MitchAlsup1
2024-02-23 22:29:17 UTC
Permalink
Post by Steven G. Kargl
Post by MitchAlsup1
Post by Michael S
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
Post by MitchAlsup1
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Haneok argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payn-Hanek paper on the top of
stack of papers on my desk. Still need to internalize the details.
Payne-Hanek is a right answer to a wrong question.
If someone looking for fast sin(x) you are correct, for a person looking
for the best possible (correctly rounded) result, you are not. In any
event I have driven that monstrosity down from 50-odd cycles to 4--
making it at least palatable.
Post by Michael S
In science/engineering floating-point number x represents a range
[x-ULP:x+ULP] with hopefully much higher probability of being in range
[x-ULP/2:x+ULP/2]. However in this reduced range probability is flat,
an "exact" x is no worse and no better than any other point within this
range.
Which means that when we look for y=sin(x) then for
scientific/engineering purposes any answer in range*
[sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is a sin() of IEEE-754 binary64
number 1e17?" is "Anything in range [-1:1]". The wise and useful answer
to the same question is "You're holding it wrong".
There are verification tests that know the exact value of both x and sin(x).
For these uses you argument is not valid--however I am willing to grant
that outside of verification, and in actual use, your argument holds as
well ad the noise in the data provides.
Fortunately, for 32-bit single precision floating point, one can
All of the numerics I have spoken about are DP (64-bit) values.
Post by Steven G. Kargl
exhaustively test single-argument functions. For the SW implementation
on FreeBSD, exhaustive testing shows
% tlibm sinpi -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinpif: [0,8.38861e+06]
1000000 calls, 0.015453 secs, 0.01545 usecs/call
ulp <= 0.5: 99.842% 1253631604 | 99.842% 1253631604
0.5 < ulp < 0.6: 0.158% 1989420 | 100.000% 1255621024
Max ulp: 0.583666 at 6.07950642e-05
% tlibm sin -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinf: [0,8.38861e+06]
1000000 calls, 0.019520 secs, 0.01952 usecs/call
ulp <= 0.5: 99.995% 1249842491 | 99.995% 1249842491
0.5 < ulp < 0.6: 0.005% 60102 | 100.000% 1249902593
Max ulp: 0.500900 at 3.68277281e+05
I find it interesting that sinpi() has worse numeric error than sin().

I also find it interesting that the highest error is in the easy part
of polynomial evaluation without argument reduction whereas sin() has
its worst result in a region where significant argument reduction has
transpired.

{{Seems like another case of faster poor answers top slower correct
answers.}}
Post by Steven G. Kargl
The speed test is for 1M values evenly distributed over
the interval. The difference in speed for sinpi vs sin
is due to the argument reduction.
But result precision suffers nonetheless.
Post by Steven G. Kargl
Note 1: libm built with clang/llvm with only -O2 on a
AMD Phenom II X2 560 Processor (3300.14-MHz K8-class CPU).
Note 2: subnormal values for x are test not included in
the count as subnormal are tested with a different approach.
Lawrence D'Oliveiro
2024-02-23 22:39:19 UTC
Permalink
Post by MitchAlsup1
I find it interesting that sinpi() has worse numeric error than sin().
Another argument for sticking with the simpler solution? One set of
radian-specific functions plus conversion factors, instead of a separate
set of unit-specific functions for every unit?
Steven G. Kargl
2024-02-24 04:03:08 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by MitchAlsup1
I find it interesting that sinpi() has worse numeric error than sin().
Another argument for sticking with the simpler solution? One set of
radian-specific functions plus conversion factors, instead of a separate
set of unit-specific functions for every unit?
Here's a counter argument for your simple conversion
factors. Changing FreeBSD's libm to return sinf(M_PI*x)
for sinpif(x), one observes

%./tlibm sinpi -f -x 0 -X 0x1p22 -PED
Interval tested for sinpif: [0,4.1943e+06]
ulp <= 0.5: 83.174% 1037372501 | 83.174% 1037372501
0.5 < ulp < 0.6: 0.937% 11690904 | 84.111% 1049063405
0.6 < ulp < 0.7: 0.689% 8587516 | 84.800% 1057650921
0.7 < ulp < 0.8: 0.497% 6200902 | 85.297% 1063851823
0.8 < ulp < 0.9: 0.323% 4023726 | 85.620% 1067875549
0.9 < ulp < 1.0: 0.157% 1952256 | 85.776% 1069827805
1.0 < ulp < 1.5: 0.347% 4332879 | 86.124% 1074160684
1.5 < ulp < 2.0: 0.247% 3083647 | 86.371% 1077244331
2.0 < ulp < 3.0: 0.360% 4491936 | 86.731% 1081736267
3.0 < ulp < 0.0: 13.269% 165496149 | 100.000% 1247232416
Max ulp: 8388278.500000 at 5.65300049e+03

% ./tlibm sinpi -f -a 5.65300049e+03
x = 5.65300049e+03f, /* 0x45b0a801 */
libm = -2.51050433e-03f, /* 0xbb248746 */
mpfr = -1.53398013e-03f, /* 0xbac90fd5 */
ULP = 8388278.50000

I can assure you that one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
--
steve
Lawrence D'Oliveiro
2024-02-24 04:47:29 UTC
Permalink
Post by Steven G. Kargl
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
That’s 9 figures, but single-precision IEEE754 floats only allow about 7.
Steven G. Kargl
2024-02-24 05:27:10 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Steven G. Kargl
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
That’s 9 figures, but single-precision IEEE754 floats only allow about 7.
ROTFL. I suggest (re-)reading the entire section

IEEE Std 754TM-2008

5.12.2 External decimal character sequences representing
finite numbers

However, I'll give you the salient part on p.32


For the purposes of discussing the limits on correctly
rounded conversion, define the following quantities:

for binary32, Pmin (binary32) = 9
...

Conversions from a supported binary format bf to an external
character sequence and back again results in a copy of the
original number so long as there are at least Pmin (bf)
significant digits specified and the rounding-direction
attributes in effect during the two conversions are round to
nearest rounding-direction attributes.

You also seem to miss that I gave you the bit pattern as
an unsigned 32-bit int. There are no extra binary digits.
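For anyone who wants to check that claim on their own machine, here is a
minimal C sketch (mine, not part of tlibm) that reinterprets the quoted bit
pattern as a binary32 value, prints it with 9 significant digits, and confirms
that the decimal string round-trips to the identical bits on a typical
IEEE 754 system:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint32_t bits = 0x45b0a801u;            /* the bit pattern quoted above */
    float x;
    memcpy(&x, &bits, sizeof x);            /* reinterpret as binary32 */

    printf("x = %.8e\n", x);                /* 9 significant digits: 5.65300049e+03 */

    float y = strtof("5.65300049e+03", NULL);
    printf("round trip exact: %s\n",
           memcmp(&x, &y, sizeof x) == 0 ? "yes" : "no");
    return 0;
}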

At this point, I think it might be prudent for you to
read Goldberg's paper [1] and IEEE 754.

[1] https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

I'm done responding to your trolls and apologies other c.l.c
posters.
--
steve
Kaz Kylheku
2024-02-24 05:48:33 UTC
Permalink
Post by Steven G. Kargl
Post by Lawrence D'Oliveiro
Post by Steven G. Kargl
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
That’s 9 figures, but single-precision IEEE754 floats only allow about 7.
ROTFL. I suggest (re-)reading the entire section
[ snip ]
Post by Steven G. Kargl
for binary32, Pmin (binary32) = 9
Also, from the comp.lang.c point of view, in the C language we have
FLT_DIG and FLT_DECIMAL_DIG in <float.h>.

On IEEE 754 systems these are 6 and 9.

FLT_DIG gives you, roughly speaking, how many decimal digits of
precision a float can preserve in the decimal -> float -> decimal
conversion sequence

FLT_DECIMAL_DIG indicates how many digits are required to exactly
preserve a float value in decimal form: float -> decimal -> float.
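A throwaway C11 program (my illustration, not from the standard) that prints
these limits on a typical IEEE 754 implementation:

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("FLT_DIG         = %d\n", FLT_DIG);          /* 6: decimal -> float -> decimal */
    printf("FLT_DECIMAL_DIG = %d\n", FLT_DECIMAL_DIG);  /* 9: float -> decimal -> float */
    printf("DBL_DIG         = %d\n", DBL_DIG);          /* 15 */
    printf("DBL_DECIMAL_DIG = %d\n", DBL_DECIMAL_DIG);  /* 17 */
    return 0;
}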
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Lawrence D'Oliveiro
2024-02-24 05:38:53 UTC
Permalink
Post by Steven G. Kargl
% ./tlibm sinpi -f -a 5.65300049e+03
x = 5.65300049e+03f, /* 0x45b0a801 */
libm = -2.51050433e-03f, /* 0xbb248746 */
mpfr = -1.53398013e-03f, /* 0xbac90fd5 */
ULP = 8388278.50000
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
They’re all wrong. Python says (in double-precision) that
math.sin(5653.00049) is -0.9566595256716378.

Even GCC’s sinf function says the value is -0.9565167, which is only a
slight discrepancy in the fourth decimal place, so even in single-
precision it is doing a darn sight better than you have managed above.

So where is there a version of the code you’re using, using different
angle units, that gets closer to the right answer?
Steven G. Kargl
2024-02-24 06:13:58 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Steven G. Kargl
% ./tlibm sinpi -f -a 5.65300049e+03
x = 5.65300049e+03f, /* 0x45b0a801 */
libm = -2.51050433e-03f, /* 0xbb248746 */
mpfr = -1.53398013e-03f, /* 0xbac90fd5 */
ULP = 8388278.50000
I can assure one of the libm and mpfr values, when x = 5653.00049,
is quite wrong.
They’re all wrong. Python says (in double-precision), that
math.sin(5653.00049) is -0.9566595256716378.
Even GCC’s sinf function says the value is -0.9565167, which is only a
slight discrepancy in the fourth decimal place, so even in single-
precision it is doing a darn sight better than you have managed above.
So where is there a version of the code you’re using, using different
angle units, that gets closer to the right answer?
sin(x) is not sinpi(x). The conversion factor that you're missing
is M_PI as in sinpi(x) = sin(M_PI*x). The whole point of the
half-cycle trig functions is that one does not need to do the
multiplication by pi, or more importantly the argument reduction
modulo pi.
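As a sketch of the "conversion factor" version being compared against here
(my illustration, not the FreeBSD code; M_PI is POSIX, not ISO C):

#include <math.h>

/* Naive sinpif built from sinf() and a conversion factor.  The product
 * M_PI*x is rounded to a float before sinf() ever sees it, and for large
 * x that rounding error is amplified by the reduction modulo pi -- which
 * is exactly where the multi-million-ULP errors in the earlier table
 * come from. */
static float naive_sinpif(float x)
{
    return sinf((float)(M_PI * x));
}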

% ./tlibm sin -f -a 5.65300049e+3
x = 5.65300049e+03f, /* 0x45b0a801 */
libm = -9.56659019e-01f, /* 0xbf74e79b */
mpfr = -9.56659019e-01f, /* 0xbf74e79b */
ULP = 0.10338
--
steve
MitchAlsup1
2024-03-14 17:15:11 UTC
Permalink
Post by Steven G. Kargl
sin(x) is not sinpi(x). The conversion factor that you're missing
is M_PI as in sinpi(x) = sin(M_PI*x).
When I faced this kind of accuracy/precision problem in my HW transcendentals,

To get a sufficiently correct reduced argument I had to multiply the fraction
of x by a 2/pi number selected such that the HoB was aligned to 2 (quadrant)
and the multiplied result had 51+53 bits of precision so that up to 51 leading
bits of the reduced argument (after the quadrant bits) could be skipped if 0.
This required 3 uses of the multiplier tree; one produced the leading bits::

Lead bits |/
Middle bits / /
trail bits /|

which were arranged |/ /| into a single 104 bit (minimum) product. My current
implementation uses 128 bits. And my polynomial evaluation uses 58-bit
arguments (min, 64-bit current) at each iteration. And I still only get 0.502
(58-bit:: 0.5002 64-bit) precision.

Payne and Hanek argument reduction--because it does not have access to the
intermediate bits, needs 4 multiplies instead of 2 and very careful arithmetic
to preserve accuracy. I can do this in 2 trips through the multiplier array
(patented)

So, I used 159-bits of 2/pi in 2 multiplies over 2 trips through the array
and get perfect argument (DP) reduction in 4 cycles. Payne and Hanek use 256-
bits of DP FP operands and 30+ instructions to do the same thing.

My multiplier tree is cut into 2 sections (just like one would do for dual SP)
but here I feed the top 2/pi bits into the left hand side and the bottom 2/pi
bits into the right hand side so both get computed simultaneously; the subsequent
cycle multiplies by the middle bits of 2/pi. The 2/pi table is indexed by exponent.
Steven G. Kargl
2024-02-23 23:20:17 UTC
Permalink
Post by MitchAlsup1
Post by Steven G. Kargl
Post by MitchAlsup1
Post by Michael S
On Fri, 23 Feb 2024 00:13:02 -0000 (UTC)
Post by MitchAlsup1
I have a patent on doing argument reduction for sin(x) in 4
cycles...that is as good as Payne-Hanek argument reduction over
-infinity..+infinity
By a strange coincidence, I have the Payne-Hanek paper on the top of
stack of papers on my desk. Still need to internalize the details.
Payne-Hanek is a right answer to a wrong question.
If someone looking for fast sin(x) you are correct, for a person
looking for the best possible (correctly rounded) result, you are not.
In any event I have driven that monstrosity down from 50-odd cycles to
4-- making it at least palatable.
Post by Michael S
In science/engineering floating-point number x represents a range
[x-ULP:x+ULP] with hopefully much higher probability of being in
range [x-ULP/2:x+ULP/2]. However in this reduced range probability is
flat, an "exact" x is no worse and no better than any other point
within this range.
Which means that when we look for y=sin(x) then for
scientific/engineering purposes any answer in range*
[sin(x-ULP/2):sin(x+ULP/2)] is as "right" as any other answer. The
right answer to the question "What is a sin() of IEEE-754 binary64
number 1e17?" is "Anything in range [-1:1]". The wise and useful
answer to the same question is "You're holding it wrong".
There are verification tests that know the exact value of both x and sin(x).
For these uses you argument is not valid--however I am willing to
grant that outside of verification, and in actual use, your argument
holds as well ad the noise in the data provides.
Fortunately, for 32-bit single precision floating point, one can
All of the numerics I have spoken about are DP (64-bit) values.
Sure, but it isn't possible to exhaustively test DP on current hardware
(available to me). Simple testing can give one a warm fuzzy feeling.
With 10M values in the range [0,0x1p53], the SW sinpi seems to be better
than the SW sin.

% tlibm sin -d -x 0 -X 0x1p53 -N 10 -s 0
Interval tested for sin: [0,9.0072e+15]
10000000 calls, 2.594275 secs, 0.25943 usecs/call
count: 10000000
xm = 3.4119949728897955e+15, /* 0x43283e61, 0xf8ab4587 */
libm = 7.2135591711063873e-01, /* 0x3fe71559, 0x0118855e */
mpfr = 7.2135591711063862e-01, /* 0x3fe71559, 0x0118855d */
ULP = 0.77814

% tlibm sinpi -d -x 0 -X 0x1p53 -N 10 -s 0
Interval tested for sinpi: [0,9.0072e+15]
10000000 calls, 0.184541 secs, 0.01845 usecs/call
count: 10000000
xm = 1.5384297865527400e+12, /* 0x42766318, 0xf99b8bd7 */
libm = 7.2898962872051931e-01, /* 0x3fe753e2, 0x0ecf4a3e */
mpfr = 7.2898962872051942e-01, /* 0x3fe753e2, 0x0ecf4a3f */
ULP = 0.69476

Note, the timing difference is again dominated by argument reduction.

Also note, that each of the above tests took ~180 cpu seconds. At
the moment, tlibm will not use multiple cores or cpus on other nodes.
Post by MitchAlsup1
Post by Steven G. Kargl
exhaustively test single-argument functions. For the SW implementation
on FreeBSD, exhaustive testing shows
[0,8.38861e+06]
1000000 calls, 0.015453 secs, 0.01545 usecs/call
ulp <= 0.5: 99.842% 1253631604 | 99.842% 1253631604
0.583666 at 6.07950642e-05
[0,8.38861e+06]
1000000 calls, 0.019520 secs, 0.01952 usecs/call
ulp <= 0.5: 99.995% 1249842491 | 99.995% 1249842491
0.500900 at 3.68277281e+05
I find it interesting that sinpi() has worse numeric error than sin().
I also find it interesting that the highest error is in the easy part of
polynomial evaluation without argument reduction whereas sin() has its
worst result in a region where significant argument reduction has
transpired.
{{Seems like another case of faster poor answers top slower correct
answers.}}
For my purposes, a max ulp of 0.583 is good enough. The time needed
to reduce this to 0.5009 is better spent on other unimplemented
functions in FreeBSD's libm or on implemented functions with
much worse ULP.
Post by MitchAlsup1
Post by Steven G. Kargl
The speed test is for 1M values evenly distributed over the interval.
The difference in speed for sinpi vs sin is due to the argument
reduction.
But result precision suffers nonetheless.
Argument reduction itself isn't the culprit. Go read the FreeBSD source
code. What is done once one has the reduced argument is the issue. I
suspect that any patch you come up with would be gladly accepted.
--
steve
Terje Mathisen
2024-02-25 15:19:02 UTC
Permalink
Post by Steven G. Kargl
Fortunately, for 32-bit single precision floating point, one can
exhaustively test single-argument functions. For the SW implementation
on FreeBSD, exhaustive testing shows
% tlibm sinpi -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinpif: [0,8.38861e+06]
1000000 calls, 0.015453 secs, 0.01545 usecs/call
ulp <= 0.5: 99.842% 1253631604 | 99.842% 1253631604
0.5 < ulp < 0.6: 0.158% 1989420 | 100.000% 1255621024
Max ulp: 0.583666 at 6.07950642e-05
That seems worse than it should be; I would expect better results than
the 0.5009 quoted for regular sin() below, and still with the same or
better performance. The only obvious explanation would be that sin(x)
over +/- 45 degrees would have a first Taylor term of 1.0*x, and you
could probably force the first Cheby term to be the same, so by adding
that as the last stage you get improved precision for free?
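The idea of adding the exact leading term as the last stage looks something
like this (a sketch with truncated Taylor coefficients as placeholders, not a
fitted minimax polynomial and not the FreeBSD kernel):

/* sin() kernel for small |x| (roughly [-pi/4, pi/4]), structured so the
 * exact leading term x is added last:  sin(x) ~= x + x*p(x*x). */
static double sin_kernel(double x)
{
    const double S1 = -1.0 / 6.0;        /* placeholder: Taylor, not minimax */
    const double S2 =  1.0 / 120.0;
    const double S3 = -1.0 / 5040.0;
    double z = x * x;
    double p = z * (S1 + z * (S2 + z * S3));
    return x + x * p;                    /* exact x enters at the final add */
}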
Post by Steven G. Kargl
% tlibm sin -fPED -x 0 -X 0x1p23 -s 0
Interval tested for sinf: [0,8.38861e+06]
1000000 calls, 0.019520 secs, 0.01952 usecs/call
ulp <= 0.5: 99.995% 1249842491 | 99.995% 1249842491
0.5 < ulp < 0.6: 0.005% 60102 | 100.000% 1249902593
Max ulp: 0.500900 at 3.68277281e+05
The speed test is for 1M values evenly distributed over
the interval. The difference in speed for sinpi vs sin
is due to the argument reduction.
Note 1: libm built with clang/llvm with only -O2 on a
AMD Phenom II X2 560 Processor (3300.14-MHz K8-class CPU).
Note 2: subnormal values for x are test not included in
the count as subnormal are tested with a different approach.
sin(x) for subnormal x seems like it should be trivial: if you just
return x, the maximum error of a Taylor series with two terms
(x - x^3/3!)
would be that second term, which is by definition zero since x is already
subnormal, right?

If you want to perform well on a platform where subnormal ops trap to a
much slower hardware or software path, then it might make sense to start
by checking for |x| < 2^-340 or so?
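A minimal sketch of such an early-out (the threshold is a conservative
ballpark of my own, not a tuned constant from any particular libm):

#include <math.h>

/* For tiny |x| the x^3/6 term cannot move the correctly rounded result
 * away from x itself, so return x directly; this also covers every
 * subnormal input without doing any subnormal arithmetic. */
static double sin_small(double x)
{
    if (fabs(x) < 0x1p-30)      /* conservative; around 2^-26 would still do */
        return x;
    return sin(x);              /* placeholder for the real polynomial path */
}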

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
MitchAlsup1
2024-02-25 18:11:40 UTC
Permalink
Post by Terje Mathisen
sin(x) for Subnormal x seems like it should be trivial: If you just
return x the maximum error of a Taylor series with two terms
(x - x^3/3!)
would be that second term which is by definition zero since x is already
subnormal, right?
The 2-term polynomial begins to be accurate when |x| < ~2^-19, which is far
bigger than 2^-1023. (2^-19)^3 is 2^-57, which contributes only a few low-order
bits to a result of magnitude ~2^-19; certainly no 3rd term is required.
Post by Terje Mathisen
If you want to perform well on a platform where subnormal ops traps to a
much slower hardware or software path, then it might make sense to start
by checking for |x| < 2^-340 or so?
Many subnormal operands yield constant results or themselves::

cos(sn) = 1.0
sin(sn) = sn
tan(sn) = sn
Post by Terje Mathisen
Terje
Terje Mathisen
2024-02-23 10:01:02 UTC
Permalink
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the dozens
or hundreds of examples of the usefulness of radians as an angle unit?
In digital signal processing circle-based units are pretty much always
more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most obvious
choice.
From [binary floating point] numerical properties perspective, 1/8th of
the circle (==pi/4 radians = 45 degrees) is probably the best option
for a library routine, because for sin() its derivative at 0 is closest
to 1 among all powers of two which means that loss of precision
near 0 is very similar for input and for output. But this advantage
does not sound like particularly big deal.
ieee754 defines sinpi() and siblings, but imho it really doesn't matter
if you use circles, half-circles (i.e. sinpi) or some other binary
fraction of a circle: Argument reduction for huge inputs is just as
easy, you might just have to multiply by the corresponding power of two
(i.e. adjust the exponent) before extracting the fractional term.

For sinpi(x) I could do it like this:

if (abs(x) >= two_to_52nd_power) error("Zero significant bits.");
ix = int(x);
x_reduced = x - (double) (ix & ~1);
if (x_reduced < 0.0) x_reduced += 2.0;

but it is probably better to return a value in the [-1.0 .. 1.0> range?
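Spelled out as compilable C, the same reduction might look like this (my
sketch; it returns 0 for |x| >= 2^52 instead of raising an error, since every
such double is an integer and sinpi of an integer is zero):

#include <math.h>

/* Reduce x for sinpi(): strip whole periods (multiples of 2) exactly,
 * leaving a reduced argument in [0, 2).  All steps are exact: trunc and
 * fmod of integer-valued doubles are exact, and x - even is exact
 * (Sterbenz when |x| >= 2, trivial when even == 0). */
static double sinpi_reduce(double x)
{
    if (fabs(x) >= 0x1p52)
        return 0.0;                       /* x is an integer: sinpi(x) == 0 */

    double ix   = trunc(x);               /* integer part of x */
    double even = ix - fmod(ix, 2.0);     /* ix rounded to even, toward zero */
    double r    = x - even;               /* in (-2, 2) */
    if (r < 0.0)
        r += 2.0;
    return r;                             /* in [0, 2); [-1, 1) also works */
}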

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Michael S
2024-02-23 12:02:09 UTC
Permalink
On Fri, 23 Feb 2024 11:01:02 +0100
Post by Terje Mathisen
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the
dozens or hundreds of examples of the usefulness of radians as an
angle unit?
In digital signal processing circle-based units are pretty much
always more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most
obvious choice.
From [binary floating point] numerical properties perspective,
1/8th of the circle (==pi/4 radians = 45 degrees) is probably the
best option for a library routine, because for sin() its derivative
at 0 is closest to 1 among all powers of two which means that loss
of precision near 0 is very similar for input and for output. But
this advantage does not sound like particularly big deal.
ieee754 defines sinpi() and siblings, but imho it really doesn't
matter if you use circles, half-circles (i.e. sinpi) or some other
binary fraction of a circle: Argument reduction for huge inputs are
just as easy, you might just have to multiply by the corresponding
power of two (i.e. adjust the exponent) before extracting the
fractional term.
if (abs(x) >= two_to_52nd_power) error("Zero significant bits.");
ix = int(x);
x_reduced = x - (double) (ix & ~1);
if (x_reduced < 0.0) x_reduced += 2.0;
but it is probably better to return a value in the [-1.0 .. 1.0> range?
Terje
Both you and Mitch look at it from wrong perspective.
When we define a library API, an ease of implementation of the library
function should be pretty low on our priority scale. As long as
reasonably precise calculation is theoretically possible, we should
give credit to intelligence of implementor, that's all.
The real concern of API designer should be with avoidance of loss of
precision in preparation of inputs and use of outputs.
In specific case of y=sin2pi(x), it is x that is more problematic,
because near 0 it starts to lose precision 3 octaves before y. In
subnormal range we lose ~2.5 bits of precision in preparation of the
argument. An implementation, no matter how good, can't recover what's
already lost.
sinpi() is slightly better, but only slightly, not enough better to
justify less natural semantics.

My yesterday suggestion is a step in the right direction, but today I think
that it is not a sufficiently radical step. In the specific case of sin/cos
there is no good reason to match loss of precision on input with loss of
precision on output.
<Thinking out loud>
There are advantages in matched loss of precision for input and for
output when both input and output occupy the full range of real numbers
(ignoring sign). The output of sin/cos does not occupy a full range. But
for tan() it does. So, maybe, there is one good reason for matching
input with output for sin/cos - consistency with tan.
</Thinking out loud>
So, ignoring tan(), what is really an optimal input scaling for sin/cos
inputs? Today I think that it is a scaling in which full circle
corresponds to 2**64. With such scaling you never lose any input
precision to subnormals before precision of the result is lost
completely.
Now, one could ask "Why 2**64, why not 2**56 that has the same
property?". My answer is "Because 2**64 is a scaling that is most
convenient for preparation of trig arguments in fixed-point [on 64-bit
computer]." I.e. apart from being a good scaling for avoidance of loss
of precision in tiny range, it happens to be the best scaling for
interoperability with fixed point.
That is my answer today. Will it hold tomorrow? Tomorrow will tell.
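To make the idea concrete, here is a rough sketch of what a 2**64-counts-per-
circle interface could look like (names and the final conversion to radians
are mine, purely for illustration; M_PI is POSIX, not ISO C):

#include <stdint.h>
#include <math.h>

/* An angle is a uint64_t "phase"; 2^64 counts == one full circle.
 * Reduction modulo a whole circle is free: unsigned wrap-around, so
 * "phase += increment;" accumulates with no explicit range reduction. */
static double sin_turns64(uint64_t phase)
{
    double frac = (double)phase * 0x1p-64;   /* fraction of a circle, [0, 1) */
    return sin(2.0 * M_PI * frac);           /* stand-in for a real kernel that
                                                would consume the phase directly */
}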
Terje Mathisen
2024-02-25 15:00:58 UTC
Permalink
Post by Michael S
On Fri, 23 Feb 2024 11:01:02 +0100
Post by Terje Mathisen
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the
dozens or hundreds of examples of the usefulness of radians as an
angle unit?
In digital signal processing circle-based units are pretty much
always more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most
obvious choice.
From [binary floating point] numerical properties perspective,
1/8th of the circle (==pi/4 radians = 45 degrees) is probably the
best option for a library routine, because for sin() its derivative
at 0 is closest to 1 among all powers of two which means that loss
of precision near 0 is very similar for input and for output. But
this advantage does not sound like particularly big deal.
ieee754 defines sinpi() and siblings, but imho it really doesn't
matter if you use circles, half-circles (i.e. sinpi) or some other
binary fraction of a circle: Argument reduction for huge inputs are
just as easy, you might just have to multiply by the corresponding
power of two (i.e. adjust the exponent) before extracting the
fractional term.
if (abs(x) >= two_to_52nd_power) error("Zero significant bits.");
ix = int(x);
x_reduced = x - (double) (ix & ~1);
if (x_reduced < 0.0) x_reduced += 2.0;
but it is probably better to return a value in the [-1.0 .. 1.0> range?
Terje
Both you and Mitch look at it from wrong perspective.
When we define a library API, an ease of implementation of the library
function should be pretty low on our priority scale. As long as
reasonably precise calculation is theoretically possible, we should
give credit to intelligence of implementor, that's all.
The real concern of API designer should be with avoidance of loss of
precision in preparation of inputs and use of outputs.
In specific case of y=sin2pi(x), it is x that is more problematic,
because near 0 it starts to lose precision 3 octaves before y. In
subnormal range we lose ~2.5 bits of precision in preparation of the
argument. An implementation, no matter how good, can't recover what's
already lost.
sinpi() is slightly better, but only slightly, not enough better to
justify less natural semantics.
My yesterday suggestion is a step in right direction, but today I think
that it is not sufficiently radical step. In specific case of sin/cos
there is no good reason to match loss of precision on input with loss of
precision.
<Thinking out load>
There are advantages in matched loss of precision for input and for
output when both input and output occupy full range of real numbers
(ignoring sign). Output of sin/cos does not occupy o full range. But
for tan() it does. So, may be, there is a one good reason for matching
input with output for sin/cos - consistency with tan.
</Thinking out load>
So, ignoring tan(), what is really an optimal input scaling for sin/cos
inputs? Today I think that it is a scaling in which full circle
corresponds to 2**64. With such scaling you never lose any input
precision to subnoramals before precision of the result is lost
completely.
Now, one could ask "Why 2**64, why not 2**56 that has the same
property?". My answer is "Because 2**64 is a scaling that is most
convenient for preparation of trig arguments in fix-point [on 64-bit
computer]." I.e. apart from being a good scaling for avoidance of loss
of precision in tiny range, it happens to be the best scaling for
interoperability with fixed point.
That is my answer today. Will it hold tomorrow? Tomorrow will tell.
This is, except for today being a 64-bit world as opposed to 30 years
earlier, exactly the same reasoning Garmin's programmers used to decide
that all their lat/long calculations would use 2^32 as the full circle.

With a signed 32-bit int you get a resolution of 40e6m / 2^32 = 0.0093m
or 9.3 mm, which they considered was more than good enough back in the
days of SA and its ~100 RMS noise, and even after Clinton got rid of
that (May 2 2000 or 2001?), sub-cm GPS is very rarely available.

Doing the same with 64-bit means that you get a resolution of 2.17e-12 m
which is 2.17 picometers or 0.0217 Å, so significantly smaller than a
single H atom which is about 1 Å in size.
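The arithmetic behind those two figures, as a trivial C check (circumference
value as used above):

#include <stdio.h>

int main(void)
{
    const double circumference = 40.0e6;               /* metres, ~Earth */
    printf("2^32: %.4g m/count\n", circumference / 0x1p32);
    /* ~0.009313 m, i.e. ~9.3 mm */
    printf("2^64: %.4g m/count\n", circumference / 0x1p64);
    /* ~2.168e-12 m, i.e. ~0.0217 angstrom */
    return 0;
}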

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
MitchAlsup1
2024-03-14 17:28:35 UTC
Permalink
Post by Terje Mathisen
Post by Michael S
On Fri, 23 Feb 2024 11:01:02 +0100
Post by Terje Mathisen
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the
dozens or hundreds of examples of the usefulness of radians as an
angle unit?
In digital signal processing circle-based units are pretty much
always more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
From algorithmic perspective, full circle looks like the most
obvious choice.
From [binary floating point] numerical properties perspective,
1/8th of the circle (==pi/4 radians = 45 degrees) is probably the
best option for a library routine, because for sin() its derivative
at 0 is closest to 1 among all powers of two which means that loss
of precision near 0 is very similar for input and for output. But
this advantage does not sound like particularly big deal.
ieee754 defines sinpi() and siblings, but imho it really doesn't
matter if you use circles, half-circles (i.e. sinpi) or some other
binary fraction of a circle: Argument reduction for huge inputs are
just as easy, you might just have to multiply by the corresponding
power of two (i.e. adjust the exponent) before extracting the
fractional term.
if (abs(x) >= two_to_52nd_power) error("Zero significant bits.");
ix = int(x);
x_reduced = x - (double) (ix & ~1);
if (x_reduced < 0.0) x_reduced += 2.0;
but it is probably better to return a value in the [-1.0 .. 1.0> range?
Terje
Both you and Mitch look at it from wrong perspective.
When we define a library API, an ease of implementation of the library
function should be pretty low on our priority scale. As long as
reasonably precise calculation is theoretically possible, we should
give credit to intelligence of implementor, that's all.
The real concern of API designer should be with avoidance of loss of
precision in preparation of inputs and use of outputs.
In specific case of y=sin2pi(x), it is x that is more problematic,
because near 0 it starts to lose precision 3 octaves before y. In
subnormal range we lose ~2.5 bits of precision in preparation of the
argument. An implementation, no matter how good, can't recover what's
already lost.
sinpi() is slightly better, but only slightly, not enough better to
justify less natural semantics.
My yesterday suggestion is a step in right direction, but today I think
that it is not sufficiently radical step. In specific case of sin/cos
there is no good reason to match loss of precision on input with loss of
precision.
<Thinking out load>
There are advantages in matched loss of precision for input and for
output when both input and output occupy full range of real numbers
(ignoring sign). Output of sin/cos does not occupy o full range. But
for tan() it does. So, may be, there is a one good reason for matching
input with output for sin/cos - consistency with tan.
</Thinking out load>
So, ignoring tan(), what is really an optimal input scaling for sin/cos
inputs? Today I think that it is a scaling in which full circle
corresponds to 2**64. With such scaling you never lose any input
precision to subnoramals before precision of the result is lost
completely.
Now, one could ask "Why 2**64, why not 2**56 that has the same
property?". My answer is "Because 2**64 is a scaling that is most
convenient for preparation of trig arguments in fix-point [on 64-bit
computer]." I.e. apart from being a good scaling for avoidance of loss
of precision in tiny range, it happens to be the best scaling for
interoperability with fixed point.
That is my answer today. Will it hold tomorrow? Tomorrow will tell.
This is, except for today being a 64-bit world as opposed to 30 years
earlier, exactly the same reasoning Garmin's programmers used to decide
that all their lat/long calculations would use 2^32 as the full circle.
With a signed 32-bit int you get a resolution of 40e6m / 2^32 = 0.0093m
or 9.3 mm, which they considered was more than good enough back in the
days of SA and its ~100 RMS noise, and even after Clinton got rid of
that (May 2 2000 or 2001?), sub-cm GPS is very rarely available.
The drift rate of the oscillators in the satellites is such that it will never be.
The clock drift is reset every orbit.

Also note: hackers are now using ground-based GPS transmitters to alter
where your GPS calculates you are. This is most annoying
around airports when planes use GPS to auto-guide themselves to runways.
Post by Terje Mathisen
Doing the same with 64-bit means that you get a resolution of 2.17e-12 m
which is 2.17 picometers or 0.0217 Å, so significantly smaller than a
single H atom which is about 1 Å in size.
And yet, driving by Edwards AFB sometimes my car's GPS shows me 50m off the
interstate quality road, and sometimes not.
Post by Terje Mathisen
Terje
Chris M. Thomasson
2024-03-14 22:06:55 UTC
Permalink
Post by MitchAlsup1
Post by Terje Mathisen
Post by Michael S
On Fri, 23 Feb 2024 11:01:02 +0100
Post by Terje Mathisen
Post by Michael S
On Thu, 22 Feb 2024 21:04:52 -0000 (UTC)
π radians = half a circle. Are there any other examples of the
usefulness of half-circles as an angle unit? As opposed to the
dozens or hundreds of examples of the usefulness of radians as an
angle unit?
In digital signal processing circle-based units are pretty much
always more natural than radians.
For specific case of 1/2 circle, I can't see where it can be used
directly.
  From algorithmic perspective, full circle looks like the most
obvious choice.
  From [binary floating point] numerical properties perspective,
1/8th of the circle (==pi/4 radians = 45 degrees) is probably the
best option for a library routine, because for sin() its derivative
at 0 is closest to 1 among all powers of two which means that loss
of precision near 0 is very similar for input and for output. But
this advantage does not sound like particularly big deal.
ieee754 defines sinpi() and siblings, but imho it really doesn't
matter if you use circles, half-circles (i.e. sinpi) or some other
binary fraction of a circle: Argument reduction for huge inputs are
just as easy, you might just have to multiply by the corresponding
power of two (i.e. adjust the exponent) before extracting the
fractional term.
   if (abs(x) >= two_to_52nd_power) error("Zero significant bits.");
   ix = int(x);
   x_reduced = x - (double) (ix & ~1);
   if (x_reduced < 0.0) x_reduced += 2.0;
but it is probably better to return a value in the [-1.0 .. 1.0> range?
Terje
Both you and Mitch look at it from wrong perspective.
When we define a library API, an ease of implementation of the library
function should be pretty low on our priority scale. As long as
reasonably precise calculation is theoretically possible, we should
give credit to intelligence of implementor, that's all.
The real concern of API designer should be with avoidance of loss of
precision in preparation of inputs and use of outputs.
In specific case of y=sin2pi(x), it is x that is more problematic,
because near 0 it starts to lose precision 3 octaves before y. In
subnormal range we lose ~2.5 bits of precision in preparation of the
argument. An implementation, no matter how good, can't recover what's
already lost.
sinpi() is slightly better, but only slightly, not enough better to
justify less natural semantics.
My yesterday suggestion is a step in right direction, but today I think
that it is not sufficiently radical step. In specific case of sin/cos
there is no good reason to match loss of precision on input with loss of
precision.
<Thinking out load>
There are advantages in matched loss of precision for input and for
output when both input and output occupy full range of real numbers
(ignoring sign). Output of sin/cos does not occupy o full range. But
for tan() it does. So, may be, there is a one good reason for matching
input with output for sin/cos - consistency with tan.
</Thinking out load>
So, ignoring tan(), what is really an optimal input scaling for sin/cos
inputs? Today I think that it is a scaling in which full circle
corresponds to 2**64. With such scaling you never lose any input
precision to subnoramals before precision of the result is lost
completely.
Now, one could ask "Why 2**64, why not 2**56 that has the same
property?". My answer is "Because 2**64 is a scaling that is most
convenient for preparation of trig arguments in fix-point [on 64-bit
computer]." I.e. apart from being a good scaling for avoidance of loss
of precision in tiny range, it happens to be the best scaling for
interoperability with fixed point.
That is my answer today. Will it hold tomorrow? Tomorrow will tell.
This is, except for today being a 64-bit world as opposed to 30 years
earlier, exactly the same reasoning Garmin's programmers used to
decide that all their lat/long calculations would use 2^32 as the full
circle.
With a signed 32-bit int you get a resolution of 40e6m / 2^32 =
0.0093m or 9.3 mm, which they considered was more than good enough
back in the days of SA and its ~100 RMS noise, and even after Clinton
got rid of that (May 2 2000 or 2001?), sub-cm GPS is very rarely
available.
The drift rate of the oscillators in the satellites is such it will neve be.
The clock drift is reset every orbit.
Also note: hackers are now using ground based GPS transmitters to alter
where your GPS calculates where you think you are. This is most annoying
around airports when planes use GPS to auto guide the planes to runways.
GPS Spoofing?
Post by MitchAlsup1
Post by Terje Mathisen
Doing the same with 64-bit means that you get a resolution of 2.17e-12
m which is 2.17 picometers or 0.0217 Å, so significantly smaller than
a single H atom which is about 1 Å in size.
And yet, driving by Edwards AFB sometimes my car's GPS shows my 50m off the
interstate quality road, and sometimes not.
Post by Terje Mathisen
Terje
MitchAlsup1
2024-03-14 22:18:27 UTC
Permalink
Post by Chris M. Thomasson
Post by MitchAlsup1
The drift rate of the oscillators in the satellites is such it will neve be.
The clock drift is reset every orbit.
Also note: hackers are now using ground based GPS transmitters to alter
where your GPS calculates where you think you are. This is most annoying
around airports when planes use GPS to auto guide the planes to runways.
GPS Spoofing?
More so in Europe than USA, but it is coming. Mentor Pilot had a video on
this last week.
Chris M. Thomasson
2024-03-14 22:28:28 UTC
Permalink
Post by MitchAlsup1
Post by Chris M. Thomasson
Post by MitchAlsup1
The drift rate of the oscillators in the satellites is such it will neve be.
The clock drift is reset every orbit.
Also note: hackers are now using ground based GPS transmitters to alter
where your GPS calculates where you think you are. This is most annoying
around airports when planes use GPS to auto guide the planes to runways.
GPS Spoofing?
More so in Europe than USA, but it is coming. Mentor Pilot had a video on
this last week.
Thanks. The device thinks it's all right... However, it's connected to a
hyper-nefarious dynamic data provider. Yikes!
Terje Mathisen
2024-03-15 11:00:45 UTC
Permalink
Post by MitchAlsup1
Post by Terje Mathisen
This is, except for today being a 64-bit world as opposed to 30 years
earlier, exactly the same reasoning Garmin's programmers used to
decide that all their lat/long calculations would use 2^32 as the full
circle.
With a signed 32-bit int you get a resolution of 40e6m / 2^32 =
0.0093m or 9.3 mm, which they considered was more than good enough
back in the days of SA and its ~100 RMS noise, and even after Clinton
got rid of that (May 2 2000 or 2001?), sub-cm GPS is very rarely
available.
The drift rate of the oscillators in the satellites is such it will neve be.
The clock drift is reset every orbit.
Sub-meter (typically 2 cm at 10-20 Hz) GPS requires a nearby (static)
base station which can measure the individual pseudo-range errors from
each sat and send that to the measuring device (the rover) over a
separate channel.

If you record all the pseudorange values, then you can do the same with
post-processing; this can be useful for DIY surveying.
Post by MitchAlsup1
Also note: hackers are now using ground based GPS transmitters to alter
where your GPS calculates where you think you are. This is most annoying
around airports when planes use GPS to auto guide the planes to runways.
The potential for this attack is one of the reasons the military signal
is encrypted, so that it cannot be spoofed.
Post by MitchAlsup1
Post by Terje Mathisen
Doing the same with 64-bit means that you get a resolution of 2.17e-12
m which is 2.17 picometers or 0.0217 Å, so significantly smaller than
a single H atom which is about 1 Å in size.
And yet, driving by Edwards AFB sometimes my car's GPS shows my 50m off the
interstate quality road, and sometimes not.
50 m is a bit high, but still fairly typical for what commercial
receivers can do in the vicinity of reflecting surfaces like cliffs or
buildings.

Modern multi-system receivers are actually getting much better at
detecting and mitigating such problems. Starting with
GPS+Glonass+Galileo having worldwide coverage, then adding in the
Chinese and Japanese sats means that as long as you don't worry too much
about power usage, you can do quite well. My personal target is sub-3 m
when under a wet forest canopy, which is good enough for orienteering map
field survey work.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
George Neuner
2024-03-15 15:40:57 UTC
Permalink
On Fri, 15 Mar 2024 12:00:45 +0100, Terje Mathisen
Post by Terje Mathisen
Post by MitchAlsup1
Post by Terje Mathisen
This is, except for today being a 64-bit world as opposed to 30 years
earlier, exactly the same reasoning Garmin's programmers used to
decide that all their lat/long calculations would use 2^32 as the full
circle.
With a signed 32-bit int you get a resolution of 40e6m / 2^32 =
0.0093m or 9.3 mm, which they considered was more than good enough
back in the days of SA and its ~100 RMS noise, and even after Clinton
got rid of that (May 2 2000 or 2001?), sub-cm GPS is very rarely
available.
The drift rate of the oscillators in the satellites is such it will neve be.
The clock drift is reset every orbit.
Sub-meter (typically 2 cm at 10-20 Hz) GPS requires a nearby (static)
base station which can measure the individual pseudo-range errors from
each sat and send that to the measuring device (the rover) over a
separate channel.
If you record all the pseudorange values, then you can do the same with
post-processing, this can be useful for DIY surveying.
Post by MitchAlsup1
Also note: hackers are now using ground based GPS transmitters to alter
where your GPS calculates where you think you are. This is most annoying
around airports when planes use GPS to auto guide the planes to runways.
The potential for this attack is one of the reasons the military signal
is encrypted, so that it cannot be spoofed.
Post by MitchAlsup1
Post by Terje Mathisen
Doing the same with 64-bit means that you get a resolution of 2.17e-12
m which is 2.17 picometers or 0.0217 Å, so significantly smaller than
a single H atom which is about 1 Å in size.
And yet, driving by Edwards AFB sometimes my car's GPS shows my 50m off the
interstate quality road, and sometimes not.
50 m is a bit high, but still fairly typical for what commercial
receivers can do in the vicinity of refelcting surfaces like cliffs or
buildings.
Also the AFB operates differential GPS for its runways. An OTS unit
might be confused by close proximity to the ground transmitter.
Post by Terje Mathisen
Modern multi-system receivers are actually getting much better at
detecting and mitigating such problems. Starting with
GPS+Glonass+Galileo having worldwide coverage, then adding in the
Chinese and Japanese sats means that as long as you don't worry to much
about pwer usage, you can do quite well. My personal target is sub 3m
when under a wet forest canopy, that is good enough for orienteering map
field survey work.
Terje
MitchAlsup1
2024-03-14 22:31:57 UTC
Permalink
Post by Michael S
On Fri, 23 Feb 2024 11:01:02 +0100
Both you and Mitch look at it from wrong perspective.
For the record, I am only addressing SIN() or SINPI() as a HW instruction.
Post by Michael S
When we define a library API, an ease of implementation of the library
function should be pretty low on our priority scale. As long as
reasonably precise calculation is theoretically possible, we should
give credit to intelligence of implementor, that's all.
At the HW level, when someone executes a
SIN R9,R17
instruction, they should get a result that has been computed to very
high precision, including argument reductions (which may require more
bits in the reduced argument than fit in an IEEE container).
Post by Michael S
The real concern of API designer should be with avoidance of loss of
precision in preparation of inputs and use of outputs.
With an instruction performing all the work, this argument is moot.
Post by Michael S
In specific case of y=sin2pi(x), it is x that is more problematic,
because near 0 it starts to lose precision 3 octaves before y. In
subnormal range we lose ~2.5 bits of precision in preparation of the
argument.
NOT when the argument reduction process does not contain ROUNDINGs !!!
OR when the argument reduction process contains more precision bits than
fit in an IEEE container !!! I determined for SIN() this needed 57-bits
of intermediate reduced argument, for TAN() and ATAN() this requires
58-bits:: compared to the 53 available in normalized IEEE 754.
Post by Michael S
An implementation, no matter how good, can't recover what's
already lost.
My 66000 ISA has these as instructions (not function calls) and has
the execution width to avoid your perils.
Post by Michael S
sinpi() is slightly better, but only slightly, not enough better to
justify less natural semantics.
SINPI() is also an instruction.
SIN() takes 19 cycles whereas SINPI() takes 16.....
Post by Michael S
My yesterday suggestion is a step in right direction, but today I think
that it is not sufficiently radical step. In specific case of sin/cos
there is no good reason to match loss of precision on input with loss of
precision.
In the end, you either have a precise result, or you do not.
Post by Michael S
<Thinking out load>
There are advantages in matched loss of precision for input and for
output when both input and output occupy full range of real numbers
(ignoring sign). Output of sin/cos does not occupy o full range. But
for tan() it does. So, may be, there is a one good reason for matching
input with output for sin/cos - consistency with tan.
</Thinking out load>
So, ignoring tan(), what is really an optimal input scaling for sin/cos
inputs? Today I think that it is a scaling in which full circle
corresponds to 2**64. With such scaling you never lose any input
precision to subnoramals before precision of the result is lost
completely.
See USPTO 10,761,806 and 10,983,755 for your answer.