February 20, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to Nick B
On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
>
> The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/
"The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second"
Ok, I think I know a bunch of people who could question the contents of that sentence. Or, at the very least, question this guy's way of presenting sensational news.
February 20, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to jerro
> Also, because they are variable sized, you need to access them through pointers if you want random access. Now you are reading both the pointer and the value from memory.
We might not need random access though.
For basic linear algebra, forward or bidirectional access should be enough, which should suit this format.
Depends on the application.
February 20, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to Nordlöw
> We might not need random access though.

Even if you don't need random access, you can't store them in a packed way if you want to be able to mutate them in place, since mathematical operations on them can change the number of bits they take.

> Depends on the application.

I suspect that in most numerical applications the number of bits needed to store the numbers will keep increasing during the computation until it reaches the maximal supported value. That can actually happen very fast - a single division by a non-power of two is enough. Applications that could actually benefit from this in any way are probably extremely rare. The only one I can think of is to use it as a very simple form of compression, but I'm sure there are better options for that.
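jerro's bit-growth point is easy to see with exact rationals: a value has a finite binary expansion only if its denominator (in lowest terms) is a power of two, so a single division by 10 already breaks exactness, and further divisions keep growing the storage needed. A quick Python sketch, just an illustration rather than unum code:

```python
from fractions import Fraction

# A value has a finite binary expansion only if its denominator
# (in lowest terms) is a power of two.
def is_exact_in_binary(x: Fraction) -> bool:
    d = x.denominator
    return d & (d - 1) == 0

x = Fraction(1)                  # exactly representable
x = x / 10                       # one division by a non-power of two...
print(is_exact_in_binary(x))     # False: 1/10 never terminates in binary

# Under exact arithmetic the denominator keeps growing with each such division:
y = Fraction(1)
for _ in range(5):
    y = y / 3
print(y.denominator)             # 243 = 3**5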
February 20, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to Nick B
The unum variable length encoding is very similar to how msgpack packs integers. See msgpack-d on github for a superb implementation in D.
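For comparison, msgpack's unsigned-integer encoding picks the smallest of a few fixed widths, with a tag byte selecting the width. A minimal Python sketch of that scheme (tag values follow the msgpack spec; this is not msgpack-d itself):

```python
import struct

# msgpack-style unsigned-integer packing: small values fit in one byte,
# larger ones get a one-byte tag followed by a fixed-width big-endian payload.
def pack_uint(n: int) -> bytes:
    if n < 0x80:                                 # positive fixint: value is the tag
        return struct.pack("B", n)
    if n <= 0xFF:
        return b"\xcc" + struct.pack(">B", n)    # uint8
    if n <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", n)    # uint16
    if n <= 0xFFFFFFFF:
        return b"\xce" + struct.pack(">I", n)    # uint32
    return b"\xcf" + struct.pack(">Q", n)        # uint64

print(len(pack_uint(5)))        # 1 byte
print(len(pack_uint(300)))      # 3 bytes
```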
February 20, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to Nordlöw
On Thursday, 20 February 2014 at 23:21:26 UTC, Nordlöw wrote:
> We might not need random access though.
>
> For basic linear algebra forward or bidirectional should be enough which should suit this format.
>
> Depends on the application.
"Basic Linear Algebra" often requires sparse matrices. Sparse Matrices = both random access and indirection.
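A toy CSR (compressed sparse row) sketch illustrates the point: the index arrays jump straight into the packed value array, which only works when every value has a fixed size. (Plain Python, a hypothetical toy rather than any real sparse library.)

```python
# CSR layout: row i occupies values[row_ptr[i]:row_ptr[i+1]],
# and col_idx[k] gives the column of values[k].
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values  = [10.0, 20.0, 30.0, 40.0, 50.0]

def matvec(x):
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):   # random jump into values
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y

print(matvec([1.0, 1.0, 1.0]))   # [30.0, 30.0, 90.0]
```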
February 20, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to Francesco Cattoglio
On Thursday, 20 February 2014 at 23:13:20 UTC, Francesco Cattoglio wrote:
> On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
>>
>> The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/
>
> "The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second"
>
> Ok, I think I know a bunch of people who could question the contents of that sentence. Or, at the very least, question this guy's way of presenting sensational news.
I don't quite understand his ubox stuff, but his unum format doesn't really solve the 0.1 problem, except maybe by allowing the size of his values to exceed 64 bits so that precision errors creep up a little bit slower. (I'm not sure how many bits his format tops out at, and I don't want to re-open the PDF to look.) It also wasn't clear whether his format removed the multiple values of NaN, -0, etc. It looked like it was just the current IEEE formats bonded to a sliding bit-length, which would bring along with it all the problems of the IEEE format that he mentioned.
I think the only way to solve for problems like 0.1 in decimal not mapping to any reasonable value in binary is to store numbers as integer equations (i.e. 0.1 = 1/10), which would take a hell of a complex format to represent and some pretty fancy CPUs.
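The integer-equation idea already exists in software form as rational arithmetic. A quick Python sketch of how it sidesteps the 0.1 problem, at the cost of unbounded operand sizes rather than fancy CPUs:

```python
from fractions import Fraction

# Storing 0.1 as the integer pair (1, 10) keeps it exact,
# while the binary float has to round it.
tenth = Fraction(1, 10)
print(tenth + tenth + tenth == Fraction(3, 10))   # True: exact rational arithmetic
print(0.1 + 0.1 + 0.1 == 0.3)                     # False: binary rounding error

# What the IEEE double 0.1 actually stores:
print(Fraction(0.1))   # 3602879701896397/36028797018963968
```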
February 21, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to Chris Williams
On Thursday, 20 February 2014 at 23:52:13 UTC, Chris Williams wrote:
> I don't quite understand his ubox stuff, but his unum format doesn't really solve the 0.1 problem, except maybe by allowing the size of his values to exceed 64 bits so that precision errors creep up a little bit slower. (I'm not sure how many bits his format tops out at and I don't want to re-open the PDF to look.)

Exactly. If I read correctly, it should cover 128+ bits.

> It also wasn't clear whether his format removed the multiple values of NaN, -0, etc. It looked like it was just the current IEEE formats bonded to a sliding bit-length

Pretty sure it would not. It seems to me that the space wasted by multiple representations is still there. The only real advantage is that you can probably store a NaN in 8 bits. Honestly, who cares? If I get a NaN in my numerical simulation, I have bigger concerns than saving memory space.

> I think the only way to solve for problems like 0.1 in decimal not mapping to any reasonable value in binary is to store numbers as integer equations (i.e. 0.1 = 1/10), which would take a hell of a complex format to represent and some pretty fancy CPUs.

In fact, true rational numbers can only be represented by rational numbers. How extraordinary! :P

The whole idea has one merit: the float now stores its own accuracy. But I think you can achieve more or less the same goal by storing a pair of something. The whole idea of saving space sounds bogus, at least for my field of application. We already have an amazing technique for saving a lot of space in numerical simulation of PDEs: it's called grid refinement.
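The "storing a pair of something" idea is essentially interval arithmetic: each quantity carries a (lo, hi) bracket, so the value records its own accuracy. A toy sketch with hypothetical helper names (a real implementation would also need directed/outward rounding, omitted here):

```python
# Each value is a (lo, hi) pair bracketing the true quantity.
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    # The product's bounds come from the extreme corner products.
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(ps), max(ps))

tenth = (0.09999999999999999, 0.1000000000000001)  # 0.1 bracketed by floats
three_tenths = add(add(tenth, tenth), tenth)
print(three_tenths[0] <= 0.3 <= three_tenths[1])   # True: truth stays inside
```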
February 21, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to jerro
On Thursday, 20 February 2014 at 23:41:12 UTC, jerro wrote:
>> We might not need random access though.
>
> Even if you don't need random access, you can't
> store them in a packed way if you want to be able
> to mutate them in place, since mathematical operations
> on them can change the number of bits they take.
>
>> Depends on the application.
>
> I suspect that in most numerical applications
> the number of bits needed to store the numbers
> will keep increasing during the computation until
> it reaches the maximal supported value. That can
> actually happen very fast - a single division
> with a non power of two is enough. Applications
> that could actually benefit from this in any way
> are probably extremely rare. The only one I can
> think of is to use it as a very simple form of
> compression, but I'm sure there are better options
> for that.
Yes, but his ubox method attempts to solve this problem by
providing only the most accurate answers. By using a "sliding"
floating point you can control the accuracy much better and by
using the ubox method you can zero in on the most accurate
solutions.
I think a few here are missing the point. First: the unums self-scale
to provide the "fewest bit" representation, à la his Morse code vs.
ASCII example. Not a huge deal; after all, it's only memory. But by
using such representations one can have more accurate results, because
one can better represent values that are not accurately representable
in standard fp.
I think though adding a "repeating" bit would make it even more
accurate so that repeating decimals within the bounds of maximum
bits used could be represented perfectly. e.g., 1/3 = 0.3333...
could be represented perfectly with such a bit and sliding fp
type. With proper cpu support one could have 0.3333... * 3 = 1
exactly.
By having two extra bits one could represent constants to any
degree of accuracy. e.g., the last bit says the value represents
the ith constant in some list. This would allow very common
irrational constants to be used: e, pi, sqrt(2), etc... with more
accuracy than normal and handled internally by the cpu. With such
constants one could have sqrt(2)*sqrt(2) = 2 exactly. (Designate part
of the constants as "squares", since one could have up to 2^n - 1
constants.)
The problem I see is that to make it all work requires a lot of
work on the hardware. Without it, there is no real reason to use
it.
One also runs into the problem of optimization. Having variable-sized
fp numbers may not be very efficient in memory alignment. A list of
resizable fp types would contain 1-, 2-, 3-, 4-, 5-, ... byte numbers
of random size.
If you do a calculation on the list and try to use the list as
storage again, you could end up with a different-sized list, which
could be very inefficient. You'd probably have to allocate a list of
the maximum possible size in the first place to prevent buffer
overflows, which then defeats the purpose in some sense.
In any case, one could have a type where the last 2 bits
designate the representation.
00 - Positive Integer
01 - Floating point
10 - Constant - (value represents an index, possibly virtual,
into a table of constants)
11 - Negative Integer
For floating points, the 3rd last bit could represent a repeating
decimal or they could be used in the constants for common
repeating decimals. (since chances are, repeated calculations
would not produce repeating decimals)
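The 2-bit tag scheme above fits in a few lines. The tag values follow the table in the post; everything else is a hypothetical illustration (in Python for brevity):

```python
# Low two bits of a word select how the remaining bits are interpreted.
TAGS = {0b00: "positive integer",
        0b01: "floating point",
        0b10: "constant index",
        0b11: "negative integer"}

def encode(payload: int, tag: int) -> int:
    return (payload << 2) | tag        # payload in the high bits, tag in the low two

def decode(word: int):
    return word >> 2, TAGS[word & 0b11]

w = encode(42, 0b00)
print(decode(w))    # (42, 'positive integer')
```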
February 21, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to Frustrated
On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:
> I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly.
>
> By having two extra bits one could represent constants to any degree of accuracy. e.g., the last bit says the value represents the ith constant in some list. This would allow very common irrational constants to be used: e, pi, sqrt(2), etc...

Unfortunately, maths (real-world maths) isn't made of "common" constants. Moreover, such a "repeating bit" would have to become a "repeating counter", since you usually get a certain number of repeating digits, not just a single one.

> For floating points, the 3rd last bit could represent a repeating decimal or they could be used in the constants for common repeating decimals. (since chances are, repeated calculations would not produce repeating decimals)

Things like those are cool and might have their applications (I'm thinking mostly of message passing via TCP/IP), but they have no real use in scientific computation. If you want good precision, you might as well be better off with bignum numbers.
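The repeating-counter point is easy to demonstrate with exact rationals: the repetend length varies per value (for 1/n with n coprime to the base, it is the multiplicative order of the base modulo n), so a single bit cannot capture it, while a bignum rational already gives the 1/3 * 3 = 1 behaviour with no special hardware. A Python sketch:

```python
from fractions import Fraction

# Exact rational arithmetic: no repeating-digit machinery needed.
third = Fraction(1, 3)
print(third * 3 == 1)    # True

# Length of the repeating block of 1/n in base 10 (n coprime to 10):
# the multiplicative order of 10 modulo n.
def repetend_len(n: int) -> int:
    k, r = 1, 10 % n
    while r != 1:
        r = (r * 10) % n
        k += 1
    return k

print(repetend_len(3))   # 1 digit:  0.333...
print(repetend_len(7))   # 6 digits: 0.142857142857...
```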
February 21, 2014 Re: Implement the "unum" representation in D ?
Posted in reply to Nordlöw
On Thursday, 20 February 2014 at 23:43:18 UTC, Nordlöw wrote:
> The unum variable length encoding is very similar to how msgpack packs integers. See msgpack-d on github for a superb implementation in D.
msgpack-d is indeed a great library that makes serialization almost instant to implement.
I implemented CBOR, another binary encoding scheme, and it was obvious CBOR brings nothing over msgpack; it even did worse with integer encoding.
Copyright © 1999-2021 by the D Language Foundation