Vote for the new std.hash (oops, std.digest) (page 3) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Vote for the new std.hash (oops, std.digest) (page 3)

August 26, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Johannes Pfau
in reply to Johannes Pfau

Johannes Pfau

Posted in reply to Johannes Pfau

Am Sun, 26 Aug 2012 15:43:48 +0200
schrieb Johannes Pfau <nospam@example.com>:

> Am Sun, 26 Aug 2012 13:19:35 +0300
> schrieb Manu <turkeyman@gmail.com>:
> 
> > Looks good, though one thing annoys me as always throughout the D docs, liberal use of auto can make them very difficult to understand.
> > 
> > auto result = hash.finish();
> > 
> > >From the examples where this appears, I have absolutely no idea what
> > 'result' could possibly be and what I can do with it, and you're forcing me to go and dig further for that information (waste of time). Surely there would be no harm in just writing the type there for clarity?
> > 
> > <rant>
> > I'd personally like to see auto abolished, or at least heavily
> > discouraged for the sake of clarity from code examples throughout
> > the docs. I'm constantly having to chase up what auto's may resolve
> > to when reading examples >_<
> > You may argue this demonstrated un-idiomatic code, and my trouble is
> > due to inexperience; I ask, who is most likely to be reading docs?
> > 
> 
> OK, I'll fix this if std.digest is accepted into phobos.

Seems some people would rather want to keep auto in the examples. I'm not sure what to do about this. I never used auto return types, so the return type can be looked up in the documentation but it's of course not obvious by just reading the examples.

August 26, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by nazriel
in reply to Johannes Pfau

nazriel

Posted in reply to Johannes Pfau

Yes.

----
Yea, I agree that docs shouldn't use auto. It's real pain in ...
I am fine with auto in code examples but in function/method description the returned type should be clearly pointed out.

August 26, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Dmitry Olshansky
in reply to Andrei Alexandrescu

Dmitry Olshansky

Posted in reply to Andrei Alexandrescu

On 26-Aug-12 19:11, Andrei Alexandrescu wrote:> On 8/22/12 8:36 AM, Dmitry Olshansky wrote:
>> The discussion around new unified API for digest hash functions has
>> subdued, just in time as the review period has ended.
>>
>> The voting for std.digest package starts today and ends on 29 August.
>>
>> Rules are simple: reply in this thread with definite "YES" or "NO" on
>> whether this package should go into Phobos. Including descriptive short
>> notes is appreciated (esp. the NO ones).
>
> I couldn't make time for a review during the window, apologies.
> Nevertheless I'd like to make a pass in hope that a few issues can be
> fixed before the code is publicly released.
>
Well I don't think it'll make any harm to see what issues you have found that need to be addressed. If they prove to be way too much we may consider prolonging review/postponing voting. I'll reply to some points.

> * No need for start() after default construction in the documentation
> example of CRC32. BTW the documentation of start() suggests that it
> _must_ be called even right after default construction, is that the case?
>
I believe it is. FWIW that's how OpenSSL and it's ilk work and people thought that being able to wrap it cleanly would good idea. And there is Q of being able to do initialization at C-T, say if wrapping OS primitives (like Win32 Crypto Provider).

> * In the template interface, no need for void peek(ref ...). Returning
> by value is fine due to the RVO. Same about finish() - just keep the
> functions returning by value.
>
> * This example:
>
> copy(oneMillionRange, &ctx); //Note: You must pass a pointer to copy!
>
> suggests we're doing something wrong. I think a better solution would be
> to have copy() take the target range by "auto ref", and institute this
> passing convention as a general rule for output ranges.
>
This one actually is the most important (IMO).
Namely the suggested pattern is error prone and needs to get fixed. Interestingly though, some array code may break because it never modified passed slice (and it would now).

> * s/digestType/DigestType/
>
> * s/isDigestable/isDigestible/
>
> * s/startDigest/makeDigest/ - but then again I wonder if requiring
> start() after default construction is needed.
>
> * s/asArray/asStaticArray/. Better yet, we should include this primitive
> in std.conv. Then we get to use to!(ubyte[16])(dynarray).
>
Better yet there should be (there is ?) a better way to overload to! template.

> * The fact that digests aren't unique is trivial by the pigeonhole
> principle, and should not be listed as a bug (the BUGS: tag implies bugs
> in the implementation). It's more interesting if a document can be
> altered to produce a specific digest. If that's what you meant, include
> a link to the appropriate work.

I do suspect it's the latter as it is a well known issue that MD5 is reversible in reasonable amount of time on modern PCs. See rainbow tables.

> * I don't think the aliases for WrapperDigest!Xyz should be defined.
> Just call that Digest!Xyz and have people use it. It's about as short as
> the aliases.

Actually +1. I'm just wondering if there is something that'll make people do class version but not the struct one (bypassing the wrapper).

> I vote for inclusion whether or not the points above are addressed, but
> of course it would be great if they were minded before release.
> Apologies for the late review.


-- 
Olshansky Dmitry

August 26, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Bernard Helyer
in reply to nazriel

Bernard Helyer

Posted in reply to nazriel

> Yea, I agree that docs shouldn't use auto. It's real pain in ...
> I am fine with auto in code examples but in function/method description the returned type should be clearly pointed out.

I personally only try to use auto where the type is unambiguous.
For example, if I'm calling a function in the same module, I'll
still write the type out -- a little work for more clarity.

Where auto shines of course is redundant type specification.

August 26, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Manu
in reply to Johannes Pfau

Manu

Posted in reply to Johannes Pfau

Attachments:

text/html part

On 26 August 2012 16:43, Johannes Pfau <nospam@example.com> wrote:

> Am Sun, 26 Aug 2012 13:19:35 +0300
> schrieb Manu <turkeyman@gmail.com>:
>
> > Looks good, though one thing annoys me as always throughout the D docs, liberal use of auto can make them very difficult to understand.
> >
> > auto result = hash.finish();
> >
> > >From the examples where this appears, I have absolutely no idea what
> > 'result' could possibly be and what I can do with it, and you're forcing me to go and dig further for that information (waste of time). Surely there would be no harm in just writing the type there for clarity?
> >
> > <rant>
> > I'd personally like to see auto abolished, or at least heavily
> > discouraged for the sake of clarity from code examples throughout the
> > docs. I'm constantly having to chase up what auto's may resolve to
> > when reading examples >_<
> > You may argue this demonstrated un-idiomatic code, and my trouble is
> > due to inexperience; I ask, who is most likely to be reading docs?
> >
>
> OK, I'll fix this if std.digest is accepted into phobos.
>
> However, in this example auto is not just used for convenience, it has an actual use case:
>
> As documented here
>
> http://dl.dropbox.com/u/24218791/d/phobos/std_digest_digest.html#ExampleDigest
> (finish method) the type returned by finish can differ between different
> hash implementations. It's always a static array of ubyte, but for crc
> it's ubyte[4] for sha1 it's ubyte[20] and for md5 it's ubyte[16]. So
> in templated methods, auto is more generic than hardcoding a type. The
> equivalent to auto is using digestType!Digest, so
>
> auto result = hash.finish();
> digestType!(typeof(hash)) result = hash.finish();
>
> are equivalent. But
>
> ubyte[4] result = hash.finish();
>
> does only work if hash is a CRC32 digest. It does not work for md5, sha1 digests, etc.
>
> But I agree that examples are easier to read without auto, so I'll change them to use the explicit type where possible.
>

Case in point, look how much information you just gave us that is
completely obscured by 'auto' :)
My suggestion, simplify the example, or, at least the example associated
with the direct function documentation:

 auto md5   = digest!MD5(  "The quick brown fox jumps over the lazy dog");
 auto sha1  = digest!SHA1( "The quick brown fox jumps over the lazy dog");
 auto crc32 = digest!CRC32("The quick brown fox jumps over the lazy dog");

Those autos can be ubyte[n].
Though that said, looking at the API doc for finish(), it appears to return
a dynamic array, not a static array as you say... so ubyte[] could be used
everywhere, and that would be nice and clear?

August 26, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Andrei Alexandrescu
in reply to Johannes Pfau

Andrei Alexandrescu

Posted in reply to Johannes Pfau

On 8/26/12 1:32 PM, Johannes Pfau wrote:
> Seems some people would rather want to keep auto in the examples. I'm
> not sure what to do about this. I never used auto return types, so the
> return type can be looked up in the documentation but it's of course
> not obvious by just reading the examples.

After looking through the docs, it seems examples are in good shape as they are.

Andrei

August 27, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Manu
in reply to Andrei Alexandrescu

Manu

Posted in reply to Andrei Alexandrescu

Attachments:

text/html part

On 27 August 2012 02:06, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>wrote:

> On 8/26/12 1:32 PM, Johannes Pfau wrote:
>
>> Seems some people would rather want to keep auto in the examples. I'm not sure what to do about this. I never used auto return types, so the return type can be looked up in the documentation but it's of course not obvious by just reading the examples.
>>
>
> After looking through the docs, it seems examples are in good shape as they are.


Assuming it's clear at a glance that it returns byte[]s. Also, are they binary, hex string? base32? :)

August 27, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Johannes Pfau
in reply to d_follower

Johannes Pfau

Posted in reply to d_follower

Am Fri, 24 Aug 2012 17:49:00 +0200
schrieb "d_follower" <d_follower@fakemail.com>:

> 
> But I'd like to see at least one algorithm being implemented (e.g. CRC32 which currently works with CTFE) as a proof of concept (and a unit-test).

Proof of concept:
http://dpaste.dzfl.pl/b14e8aad

As you can see the templated digest and toHexString functions already work fine in CTFE. Please note that it can only work with data which can be converted to ubyte[] in CTFE (so char[] works, but wchar[] probably not).

The changes for CRC32 where quite simple, replacing "nativeToLittleEndian" with version(BigEndian) and bswap and using the getByte function to convert from uint to ubyte[4].

I won't include this code now though, as the review is almost finished. I'll post an additional pull request instead. However, I also have to be sure what getByte does is actually supported and not just some oversight in CTFE.

August 27, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Johannes Pfau
in reply to Manu

Johannes Pfau

Posted in reply to Manu

Am Mon, 27 Aug 2012 01:49:12 +0300
schrieb Manu <turkeyman@gmail.com>:

> Though that said, looking at the API doc for finish(), it appears to return a dynamic array, not a static array as you say... so ubyte[] could be used everywhere, and that would be nice and clear?
> 

That's the Digest / OOP APIs finish method. As all finish methods in the OOP API must have the same return type so ubyte[] is used. The drawback is that the data is allocated using the GC unless a buffer was explicitly passed to finish.

The template API uses ubyte[n] and can therefore return the value using stack space and no allocations.

I made another pass through the docs and replaced auto where possible. I also added another simple example.

August 27, 2012

Re: Vote for the new std.hash (oops, std.digest)

Posted by Johannes Pfau
in reply to Dmitry Olshansky

Johannes Pfau

Posted in reply to Dmitry Olshansky

Am Sun, 26 Aug 2012 23:57:24 +0400
schrieb Dmitry Olshansky <dmitry.olsh@gmail.com>:

> On 26-Aug-12 19:11, Andrei Alexandrescu wrote:> On 8/22/12 8:36 AM,
> Dmitry Olshansky wrote:
>  >> The discussion around new unified API for digest hash functions
>  >> has subdued, just in time as the review period has ended.
>  >>
>  >> The voting for std.digest package starts today and ends on 29
>  >> August.
>  >>
>  >> Rules are simple: reply in this thread with definite "YES" or
>  >> "NO" on whether this package should go into Phobos. Including
>  >> descriptive short notes is appreciated (esp. the NO ones).
>  >
>  > I couldn't make time for a review during the window, apologies.
>  > Nevertheless I'd like to make a pass in hope that a few issues can
>  > be fixed before the code is publicly released.
>  >
> Well I don't think it'll make any harm to see what issues you have
> found that need to be addressed. If they prove to be way too much we
> may consider prolonging review/postponing voting. I'll reply to some
> points.
> 
>  > * No need for start() after default construction in the
>  > documentation example of CRC32. BTW the documentation of start()
>  > suggests that it _must_ be called even right after default
>  > construction, is that the case?
>  >
> I believe it is. FWIW that's how OpenSSL and it's ilk work and people
> thought that being able to wrap it cleanly would good idea. And there
> is Q of being able to do initialization at C-T, say if wrapping OS
> primitives (like Win32 Crypto Provider).

Yes, generic code should always call start for the reasons mentioned above. Examples using the CRC32, MD5 and SHA1 digests directly could avoid the call though.

I'll also document that the current implementations don't need a call to start after default construction.
> 
>  > * In the template interface, no need for void peek(ref ...).
>  > Returning by value is fine due to the RVO. Same about finish() -
>  > just keep the functions returning by value.

OK, that makes sense. As it also makes implementing a digest easier, I'm all for it.

>  >
>  > * This example:
>  >
>  > copy(oneMillionRange, &ctx); //Note: You must pass a pointer to
>  > copy!
>  >
>  > suggests we're doing something wrong. I think a better solution
>  > would be to have copy() take the target range by "auto ref", and
>  > institute this passing convention as a general rule for output
>  > ranges.
>  >
> This one actually is the most important (IMO).
> Namely the suggested pattern is error prone and needs to get fixed.
> Interestingly though, some array code may break because it never
> modified passed slice (and it would now).

I agree this should be fixed in copy. Should we special case arrays
then?
BTW: This does work as well, but it's still error-prone:
ctx = copy(oneMillionRange, ctx);
> 
>  > * s/digestType/DigestType/

OK
>  > * s/isDigestable/isDigestible/

OK
>  > * s/startDigest/makeDigest/ - but then again I wonder if requiring
>  > start() after default construction is needed.

OK. As makeDigest is meant to work correctly for all digests a call to start is needed (see above, for example OpenSSL wrappers need this)

>  > * s/asArray/asStaticArray/. Better yet, we should include this
>  > primitive in std.conv. Then we get to use to!(ubyte[16])(dynarray).
>  >
> Better yet there should be (there is ?) a better way to overload to!
> template.

I just removed asArray from the documentation for now. I'll probably
file a pull request to merge it into std.conv.
Currently the code just asserts in debug mode to make sure the dynamic
array is big enough. Would I have to use enforce to get this merged
into std.conv.to?

BTW: Is the asArray behavior really what'd be expected from
to!(ubyte[16])(dynarray)?

asArray does explicitly not make a copy, so changing the returned static array will change the dynamic array's content.

>  > * The fact that digests aren't unique is trivial by the pigeonhole
>  > principle, and should not be listed as a bug (the BUGS: tag
>  > implies bugs in the implementation). It's more interesting if a
>  > document can be altered to produce a specific digest. If that's
>  > what you meant, include a link to the appropriate work.
> 
> I do suspect it's the latter as it is a well known issue that MD5 is reversible in reasonable amount of time on modern PCs. See rainbow tables.

That was part of the original std.md5 documentation and I just left it
in. To be honest I'm not sure what was meant by that statement. I
always thought it referred to hash collisions, but as you said those
occur for every digest.
So I think you're right and it's better to remove this from the BUGS
section.

> 
>  > * I don't think the aliases for WrapperDigest!Xyz should be
>  > defined. Just call that Digest!Xyz and have people use it. It's
>  > about as short as the aliases.
> 
> Actually +1. I'm just wondering if there is something that'll make people do class version but not the struct one (bypassing the wrapper).

The interface is already called Digest ;-)
I'm not sure, I'd rather leave things as they are. There might be
reasons not to use the wrapper template for some digests (e.g. special
functionality not covered by the wrapper)

> 
>  > I vote for inclusion whether or not the points above are
>  > addressed, but of course it would be great if they were minded
>  > before release. Apologies for the late review.
> 
> 

Thanks for the review, I hope I addressed most of your points.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation