Thread overview
Re: std.digest can't CTFE?
Jun 01, 2018
Johannes Pfau
Jun 01, 2018
Kagamin
Jun 01, 2018
Johannes Pfau
Jun 01, 2018
Kagamin
Jun 02, 2018
Atila Neves
Jun 08, 2018
Johannes Pfau
Jun 08, 2018
Manu
Jun 10, 2018
Johannes Pfau
June 01, 2018
On Thu, 31 May 2018 18:12:35 -0700, Manu wrote:

> Hashing's not low-level. It would be great if these did CTFE; generating
> compile-time hashes is a thing that would be really useful!
> Right here, I have a string class that carries a hash around with it for
> comparison reasons. Such string literals would prefer to have CT hashes.
> 

As I was the one who wrote that doc comment: for basically all hash implementations, you'll be casting from an integer type to the raw byte representation somewhere. As the binary representation needs to be portable, you need to be aware of the endianness of the system you're running your code on. AFAIR CTFE does (did?) not provide any way to do endianness-dependent conversions at all, and there's also no way to know the CTFE endianness, so this is a fundamental limitation. (E.g. if you have a cross-compiler targeting a system with a different endianness, version(BigEndian) will give you the target endianness. But what will actually be used in CTFE?)

I don't know if anything has changed in this regard since std.digest was written some time ago. But if you get the std.bitmanip nativeTo*Endian and *EndianToNative functions to work in CTFE, std.digest should work as well.

There may be some workaround, as IIRC druntime's core.internal.hash works in CTFE? It's either this, or it's buggy in that cross-compilation scenario ;-)

-- 
Johannes
June 01, 2018
On Friday, 1 June 2018 at 08:37:33 UTC, Johannes Pfau wrote:
> I don't know if anything has changed in this regard since std.digest was written some time ago. But if you get the std.bitmanip nativeTo*Endian and *EndianToNative functions to work in CTFE, std.digest should work as well.

Standard cryptographic algorithms are by design not dependent on endianness; rather, they settle on a specific endianness.
June 01, 2018
On Fri, 01 Jun 2018 08:50:19 +0000, Kagamin wrote:

> On Friday, 1 June 2018 at 08:37:33 UTC, Johannes Pfau wrote:
>> I don't know if anything has changed in this regard since std.digest was written some time ago. But if you get the std.bitmanip nativeTo*Endian and *EndianToNative functions to work in CTFE, std.digest should work as well.
> 
> Standard cryptographic algorithms are by design not dependent on endianness; rather, they settle on a specific endianness.

Whatever you want to call it, the algorithms interpret data as numbers, which means that the binary representation differs based on endianness. If you want portable results, you can't ignore that fact in the implementation. So even though the algorithms are not dependent on the endianness, the representation of the result is. Therefore standards usually do specify an internal byte order.

-- 
Johannes
June 01, 2018
On Friday, 1 June 2018 at 10:04:52 UTC, Johannes Pfau wrote:
> Whatever you want to call it, the algorithms interpret data as numbers, which means that the binary representation differs based on endianness. If you want portable results, you can't ignore that fact in the implementation. So even though the algorithms are not dependent on the endianness, the representation of the result is. Therefore standards usually do specify an internal byte order.

Huh? The algorithm packs bytes into integers and does it independently of platform. Once integers are formed, the arithmetic operations are independent of endianness. It works this way even in pure javascript, which is not sensitive to endianness.
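
The packing described here could be sketched in D roughly as follows. `loadBigEndian` is a hypothetical helper name chosen for illustration, not a Phobos function. It builds the value purely from shifts and ORs, so the byte order of the host never enters the picture.

```d
// Hypothetical helper (not part of Phobos): assemble a 32-bit word from
// four bytes, most significant byte first, using only shifts and ORs.
uint loadBigEndian(const(ubyte)[] bytes) pure nothrow @nogc @safe
{
    assert(bytes.length >= 4);
    return (cast(uint) bytes[0] << 24)
         | (cast(uint) bytes[1] << 16)
         | (cast(uint) bytes[2] << 8)
         |  cast(uint) bytes[3];
}

void main()
{
    immutable ubyte[4] data = [0x12, 0x34, 0x56, 0x78];
    // The result is defined by the arithmetic alone, whatever the host is.
    assert(loadBigEndian(data[]) == 0x12345678);
}
```

No `version(BigEndian)` or pointer cast appears anywhere, which is the property the argument relies on.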
June 02, 2018
On Friday, 1 June 2018 at 20:12:23 UTC, Kagamin wrote:
> On Friday, 1 June 2018 at 10:04:52 UTC, Johannes Pfau wrote:
>> Whatever you want to call it, the algorithms interpret data as numbers, which means that the binary representation differs based on endianness. If you want portable results, you can't ignore that fact in the implementation. So even though the algorithms are not dependent on the endianness, the representation of the result is. Therefore standards usually do specify an internal byte order.
>
> Huh? The algorithm packs bytes into integers and does it independently of platform. Once integers are formed, the arithmetic operations are independent of endianness. It works this way even in pure javascript, which is not sensitive to endianness.

It's a common programming misconception that endianness matters much. It's one of those that just won't go away, like "GC languages are slow" or "C is magically fast". I recommend reading this:

https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html

In short, unless you're a compiler writer or implementing a binary protocol, endianness only matters if you cast between pointers and integers. So... Don't.

Atila

June 08, 2018
On Sat, 02 Jun 2018 06:31:37 +0000, Atila Neves wrote:

> On Friday, 1 June 2018 at 20:12:23 UTC, Kagamin wrote:
>> On Friday, 1 June 2018 at 10:04:52 UTC, Johannes Pfau wrote:
>>> Whatever you want to call it, the algorithms interpret data as numbers, which means that the binary representation differs based on endianness. If you want portable results, you can't ignore that fact in the implementation. So even though the algorithms are not dependent on the endianness, the representation of the result is. Therefore standards usually do specify an internal byte order.
>>
>> Huh? The algorithm packs bytes into integers and does it independently of platform. Once integers are formed, the arithmetic operations are independent of endianness. It works this way even in pure javascript, which is not sensitive to endianness.
> 
> It's a common programming misconception that endianness matters much. It's one of those that just won't go away, like "GC languages are slow" or "C is magically fast". I recommend reading this:
> 
> https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
> 
> In short, unless you're a compiler writer or implementing a binary protocol, endianness only matters if you cast between pointers and integers. So... Don't.
> 
> Atila

That's an interesting point. When I said the algorithm depends on the
system endianness I was indeed always thinking in terms of machine code
(i.e. if system endianness = data endianness you hopefully do nothing at
all, otherwise you need some conversion).
But it is indeed true that describing conversion as mathematical shift
operations + indexing will leave handling these differences to the
compilers. So you can probably say the algorithm doesn't depend on system
endianness, although a low-level representation of implementations will. I
guess this is what Kagamin wanted to explain; please excuse me for not
getting the point.

So in our case, we can obviously use that higher-abstraction-level interpretation, and the idiom used in the article indeed works fine in CTFE. So somebody (@Manu?) just has to fix the std.bitmanip *EndianToNative and nativeTo*Endian functions to use this (probably benchmarking the performance impact). Then std.digest should simply start working, or should at least be easy to fix for CTFE support.
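
A minimal sketch of what such shift-based conversions might look like. `toLittleEndianBytes` and `fromLittleEndianBytes` are hypothetical stand-ins written for illustration; this is not the actual std.bitmanip code. Initializing an `enum` and using `static assert` forces the compiler to evaluate the functions in CTFE.

```d
// Hypothetical stand-ins for shift-based conversions; NOT the std.bitmanip code.
ubyte[4] toLittleEndianBytes(uint value) pure nothrow @nogc @safe
{
    ubyte[4] result;
    foreach (i; 0 .. 4)
        result[i] = cast(ubyte) (value >> (8 * i)); // byte i holds bits 8*i .. 8*i+7
    return result;
}

uint fromLittleEndianBytes(const ubyte[4] bytes) pure nothrow @nogc @safe
{
    uint value = 0;
    foreach (i; 0 .. 4)
        value |= cast(uint) bytes[i] << (8 * i);
    return value;
}

// Initializing an enum forces compile-time evaluation (CTFE).
enum ubyte[4] encoded = toLittleEndianBytes(0x12345678);
enum ubyte[4] expected = [0x78, 0x56, 0x34, 0x12];
static assert(encoded == expected);
static assert(fromLittleEndianBytes(encoded) == 0x12345678);

void main()
{
    // The same functions give the same answer at run time.
    assert(toLittleEndianBytes(0x12345678) == expected);
    assert(fromLittleEndianBytes(expected) == 0x12345678);
}
```

Neither function needs to know the native byte order, so the cross-compilation question (which endianness CTFE uses) does not arise for code written this way. Whether this is as fast as a cast-based version at run time would need the benchmarking mentioned above.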

-- 
Johannes
June 08, 2018
On Fri, 8 Jun 2018 at 11:35, Johannes Pfau via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>
> On Sat, 02 Jun 2018 06:31:37 +0000, Atila Neves wrote:
>
> > On Friday, 1 June 2018 at 20:12:23 UTC, Kagamin wrote:
> >> On Friday, 1 June 2018 at 10:04:52 UTC, Johannes Pfau wrote:
> >>> Whatever you want to call it, the algorithms interpret data as numbers, which means that the binary representation differs based on endianness. If you want portable results, you can't ignore that fact in the implementation. So even though the algorithms are not dependent on the endianness, the representation of the result is. Therefore standards usually do specify an internal byte order.
> >>
> >> Huh? The algorithm packs bytes into integers and does it independently of platform. Once integers are formed, the arithmetic operations are independent of endianness. It works this way even in pure javascript, which is not sensitive to endianness.
> >
> > It's a common programming misconception that endianness matters much. It's one of those that just won't go away, like "GC languages are slow" or "C is magically fast". I recommend reading this:
> >
> > https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
> >
> > In short, unless you're a compiler writer or implementing a binary protocol, endianness only matters if you cast between pointers and integers. So... Don't.
> >
> > Atila
>
> That's an interesting point. When I said the algorithm depends on the
> system endianness I was indeed always thinking in terms of machine code
> (i.e. if system endianness = data endianness you hopefully do nothing at
> all, otherwise you need some conversion).
> But it is indeed true that describing conversion as mathematical shift
> operations + indexing will leave handling these differences to the
> compilers. So you can probably say the algorithm doesn't depend on system
> endianness, although a low-level representation of implementations will. I
> guess this is what Kagamin wanted to explain; please excuse me for not
> getting the point.
>
> So in our case, we can obviously use that higher-abstraction-level interpretation, and the idiom used in the article indeed works fine in CTFE. So somebody (@Manu?) just has to fix the std.bitmanip *EndianToNative and nativeTo*Endian functions to use this (probably benchmarking the performance impact). Then std.digest should simply start working, or should at least be easy to fix for CTFE support.

I'm already burning about 3x my reasonably allocatable free time on
DMD PRs...
I'd really love it if someone else would look at that :)

I'm not quite sure what you mean though; endian conversion functions
are still endian conversion functions, and they shouldn't be affected
here.
The problem is in the std.digest code where it *calls* endian
functions (or makes endian assumptions). There need be no reference to
endian in std.digest... if code is pulling bytes from an int (i.e.
cast(byte*)) or something, just use ubyte[4] and index it instead of
uint, etc. I'm surprised that digest code would use anything other
than byte buffers.
It may be that there are some optimised version()-ed fast paths that
might be endian-conscious, but the default path has no reason to not work.
June 10, 2018
On Fri, 08 Jun 2018 11:46:41 -0700, Manu wrote:
> 
> I'm already burning about 3x my reasonably allocatable free time on
> DMD PRs...
> I'd really love it if someone else would look at that :)

I'll see if I can allocate some time for that. Should be a mostly trivial change.

> I'm not quite sure what you mean though; endian conversion functions are still endian conversion functions, and they shouldn't be affected here.

Yes, but the point made in that article is that you can implement *Endian<=>native conversions without knowing the native endianness. This would immediately make these functions CTFE-able.

> The problem is in the std.digest code where it *calls* endian functions (or makes endian assumptions). There need be no reference to endian in std.digest... if code is pulling bytes from an int (i.e. cast(byte*)) or something, just use ubyte[4] and index it instead of uint, etc. I'm surprised that digest code would use anything other than byte buffers. It may be that there are some optimised version()-ed fast paths that might be endian-conscious, but the default path has no reason to not work.

That's not how hash algorithms are usually specified. These algorithms perform bit-rotate operations, additions, and multiplications on these values*. You could probably implement these on byte[4] values instead, but you'll waste time porting the algorithm and benchmarking possible performance impacts, and it will be more difficult to compare the implementation to the reference implementation (think of audits).

So it's not realistic to change this.
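
For illustration, here is a sketch of how word-level operations and CTFE can coexist: the words are loaded from the byte buffer with shifts, and the rotates, additions, and multiplications then run on plain `uint` values, much as a specification would describe them. The mixing step is a toy invented for this example and is not any real hash algorithm; `rotl`, `loadLE`, and `toyMix` are hypothetical names.

```d
// Rotate left on a 32-bit word; n must be in 1 .. 31.
uint rotl(uint x, uint n) pure nothrow @nogc @safe
{
    return (x << n) | (x >> (32 - n));
}

// Assemble a little-endian 32-bit word from four bytes using shifts only.
uint loadLE(const(ubyte)[] b) pure nothrow @nogc @safe
{
    return  cast(uint) b[0]
         | (cast(uint) b[1] << 8)
         | (cast(uint) b[2] << 16)
         | (cast(uint) b[3] << 24);
}

// Toy mixing loop, invented for illustration only; NOT a real hash function.
uint toyMix(const(ubyte)[] data) pure nothrow @nogc @safe
{
    uint state = 0x9E3779B9; // arbitrary starting constant
    for (size_t i = 0; i + 4 <= data.length; i += 4)
    {
        state += loadLE(data[i .. i + 4]); // word arithmetic, as in a spec
        state = rotl(state, 13) * 5;
    }
    return state; // trailing bytes (length % 4) are ignored in this sketch
}

// Evaluated by the compiler: no pointer casts, no version(BigEndian).
enum uint atCompileTime = toyMix([1, 2, 3, 4, 5, 6, 7, 8]);

void main()
{
    immutable ubyte[8] sample = [1, 2, 3, 4, 5, 6, 7, 8];
    // Run-time evaluation agrees with the compile-time result.
    assert(toyMix(sample[]) == atCompileTime);
}
```

The body of the loop stays close to how such algorithms are written down; only the load from bytes differs from a cast-based implementation.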

* An interesting question here is whether you could actually always ignore system endianness and do simple casts if you cleverly adjust all constants in the algorithm to fit.
-- 
Johannes