Early std.crypto - D Programming Language Discussion Forum

October 25, 2011

Early std.crypto

Posted by Piotr Szturmaj

https://github.com/pszturmaj/phobos/tree/master/std/crypto

This is some early work on a std.crypto proposal. Currently it covers only MD5, HMAC and the whole SHA family (excluding SHA0, which is very old, broken and no longer in use). I plan to add other crypto primitives later.

I know of one SHA1 pull request optimized for SSSE3. I think native code must be there to support non-x86 CPUs, and SIMD optimizations may be added at any time later.

Any opinions are welcome, especially on whether this design is good or bad and what needs to be changed.

Thanks :)

October 25, 2011

Re: Early std.crypto

Posted by Martin Nowak
in reply to Piotr Szturmaj

On Tue, 25 Oct 2011 02:10:49 +0200, Piotr Szturmaj <bncrbme@jadamspam.pl> wrote:

> https://github.com/pszturmaj/phobos/tree/master/std/crypto
>
> This is some early work on std.crypto proposal. Currently only MD5, HMAC and all SHA family functions (excluding SHA0 which is very old, broken and no longer in use). I plan to add other crypto primitives later.
>
> I know about one SHA1 pull request optimized for SSSE3. I think native code must be there to support other non x86 CPUs and SIMD optimization may be added at any time later.
>
> Any opinions are welcome. Especially if such design is good or bad, and what needs to be changed.
>
> Thanks :)

Great to push this a little.

I have to say though that I like the current struct based interface
much better.

struct Hash
{
    // enhanced by some compile time traits
    enum hashLength  = 16;
    enum blockLength =  0;

    // three interface functions
    void start();
    void update(const(ubyte)[] data);
    void finish(ref ubyte[hashLength] digest);
}

You wouldn't need the save, restore functions.
Some unnecessary allocations could go away.
Most importantly, instances would have less mutable state.

You could probably parameterize a Merkle–Damgård base with free
functions for the transformation.

A dynamic interface can be obtained through templated instances, similar to what std.range does.
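
One way to read the last point, as a sketch under assumptions: std.range wraps a templated range in a class (inputRangeObject), and the same trick could put a struct hash behind a runtime interface. The names DynamicHash, HashObject and XorHash below are hypothetical and not taken from the proposal; XorHash is a toy and not a cryptographic hash.

```d
// Toy struct "hash" following the three-function interface above.
// It only XORs bytes together and is NOT cryptographic.
struct XorHash
{
    enum hashLength  = 1; // digest size in bytes, known at compile time
    enum blockLength = 0; // no block structure, as in the sketch above

    private ubyte acc;

    void start() { acc = 0; }
    void update(const(ubyte)[] data) { foreach (b; data) acc ^= b; }
    void finish(ref ubyte[hashLength] digest) { digest[0] = acc; }
}

// Hypothetical runtime interface for code that needs a stable ABI.
interface DynamicHash
{
    void start();
    void update(const(ubyte)[] data);
    ubyte[] finish();
    @property size_t hashLength();
}

// Templated adapter: one class instance per struct hash type.
final class HashObject(H) : DynamicHash
{
    private H impl;

    void start() { impl.start(); }
    void update(const(ubyte)[] data) { impl.update(data); }

    ubyte[] finish()
    {
        ubyte[H.hashLength] digest;
        impl.finish(digest);
        return digest[].dup; // heap copy, since the length is a runtime value here
    }

    @property size_t hashLength() { return H.hashLength; }
}
```

Code that only needs the static interface keeps using XorHash directly and pays nothing for the class; only callers that go through DynamicHash incur the allocation in finish().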

October 25, 2011

Re: Early std.crypto

Posted by Piotr Szturmaj
in reply to Martin Nowak

Martin Nowak wrote:
> I have to say though that I like the current struct based interface
> much better.
>
> struct Hash
> {
> // enhanced by some compile time traits
> enum hashLength = 16;
> enum blockLength = 0;

The reason why hash and block length are runtime variables is that some hash functions are parametrized with values that vary widely. For example, CubeHash may have any number of rounds, and any block and hash output size.

> // three interface functions
> void start();
> void update(const(ubyte)[] data);
> void finish(ref ubyte[hashLength] digest);
> }

In my code, these are:

reset();
put();
finish();

The put() function makes hash implementation an OutputRange.
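
A minimal sketch of what the OutputRange property buys, assuming a toy accumulator in place of a real hash. ToyHash and feedAll are hypothetical names for illustration only; isOutputRange and the free function put are the existing Phobos primitives.

```d
import std.range.primitives : isOutputRange, put;

// Toy accumulator standing in for a real hash; the XOR "digest" is an
// illustrative assumption, not part of the proposal.
struct ToyHash
{
    private ubyte acc;

    void reset() { acc = 0; }

    // Typesafe variadic: accepts a single ubyte or a slice of them.
    void put(const(ubyte)[] data...) { foreach (b; data) acc ^= b; }

    ubyte[1] finish()
    {
        ubyte[1] digest = [acc];
        reset();
        return digest;
    }
}

// Having put() is what makes the type usable as an output range of bytes.
static assert(isOutputRange!(ToyHash, ubyte));

// Generic code can write into any such sink without knowing its concrete type.
void feedAll(Sink)(ref Sink sink, const(ubyte)[][] chunks)
    if (isOutputRange!(Sink, ubyte))
{
    foreach (chunk; chunks)
        put(sink, chunk);
}
```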

> You wouldn't need the save, restore functions.

They're not needed. They only serve as a speed optimization when hashing many messages that share the same first block. This is used in HMAC, which is:

HMAC(func, key, message) = func(key ^ opad, func(key ^ ipad, message));

When func supports saving the IV, the first parts are precomputed; when it does not, HMAC falls back to full hashing. This optimization is also mentioned in the HMAC spec.
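
The precomputation can be sketched as follows. This is only an illustration built on the SHA256 struct from today's std.digest.sha, not on the proposed std.crypto classes, and it snapshots the whole hash context by value rather than just the IV. PrecomputedHmac is a hypothetical name. The comma in the formula above stands for concatenation.

```d
import std.digest.sha : SHA256;

// HMAC-SHA-256 where the two keyed first blocks are hashed only once.
// Hypothetical helper for illustration, not an API of Phobos or the proposal.
struct PrecomputedHmac
{
    private SHA256 inner; // context after absorbing key ^ ipad
    private SHA256 outer; // context after absorbing key ^ opad

    this(const(ubyte)[] key)
    {
        enum blockSize = 64; // SHA-256 block size in bytes
        ubyte[blockSize] k;  // zero-initialised, so short keys are zero-padded

        if (key.length > blockSize)
        {
            // Keys longer than one block are replaced by their hash.
            SHA256 h;
            h.start();
            h.put(key);
            ubyte[32] hashed = h.finish();
            k[0 .. 32] = hashed[];
        }
        else
            k[0 .. key.length] = key[];

        ubyte[blockSize] ipad, opad;
        foreach (i, b; k)
        {
            ipad[i] = cast(ubyte)(b ^ 0x36);
            opad[i] = cast(ubyte)(b ^ 0x5c);
        }

        inner.start();
        inner.put(ipad[]);
        outer.start();
        outer.put(opad[]);
    }

    // Each message resumes from a copy of the saved contexts.
    ubyte[32] mac(const(ubyte)[] message)
    {
        SHA256 i = inner; // value copy; the saved context is untouched
        i.put(message);
        ubyte[32] innerDigest = i.finish();

        SHA256 o = outer;
        o.put(innerDigest[]);
        return o.finish();
    }
}
```

With one key and many messages, each call to mac() skips the two block compressions for the padded key, which is the saving the save/restore functions aim at.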

> Some unnecessary allocations could go away.
> Most important instances would have less mutable state.

Could you specify which ones, please?

> You could probably parameterize a Merkle Damgård base with free
> functions for the transformation.

What would be the difference from the current class parametrization?

> A dynamic interface can be obtained by templated instances similar to
> what std.range does.

Could you elaborate? I don't know exactly what you mean. Function templates?

Thanks a lot!

October 25, 2011

Re: Early std.crypto

Posted by Martin Nowak
in reply to Piotr Szturmaj

On Tue, 25 Oct 2011 09:43:48 +0200, Piotr Szturmaj <bncrbme@jadamspam.pl> wrote:

> Martin Nowak wrote:
>> I have to say though that I like the current struct based interface
>> much better.
>>
>> struct Hash
>> {
>> // enhanced by some compile time traits
>> enum hashLength = 16;
>> enum blockLength = 0;
>
> The reason why hash and block length are runtime variables is that some hash functions are parametrized with variables of great amplitude, for example CubeHash may have any number of rounds, and any size of block and hash output.
>
>> // three interface functions
>> void start();
>> void update(const(ubyte)[] data);
>> void finish(ref ubyte[hashLength] digest);
>> }
>
> There, it is:
>
> reset();
> put();
> finish();
>
Reset does two different things depending on the internal state. Not so good.

> The put() function makes hash implementation an OutputRange.
>
>> You wouldn't need the save, restore functions.
>
> They're not needed. They only serve as speed optimization when hashing many messages which have the same beginning block. This is used in HMAC, which is:
>
> HMAC(func, key, message) = func(key ^ opad, func(key ^ ipad, message));
>
> when func supports saving the IV, the first parts are precomputed, when not HMAC resorts to full hashing. This optimization is also mentioned in HMAC spec.
>
If hash contexts were value types, you could simply write:
auto saved = hash_ctx;
Alternatively, one could add a 'save()' function to an isSaveableHash(H)
concept.
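
A rough sketch of such a concept, assuming save() returns a copy of the context. Only the name isSaveableHash comes from the text above; Counting, Opaque and afterPrefix are hypothetical toys, and neither struct is a real hash.

```d
// True when H offers save() returning an H (assumed shape of the concept).
enum isSaveableHash(H) = is(typeof(H.init.save()) == H);

// Toy context that merely counts the bytes it has seen.
struct Counting
{
    size_t n;
    void put(const(ubyte)[] data) { n += data.length; }
    Counting save() { return this; }
}

// Toy context without save().
struct Opaque
{
    void put(const(ubyte)[] data) {}
}

static assert(isSaveableHash!Counting);
static assert(!isSaveableHash!Opaque);

// Absorb a shared prefix once and hand back a snapshot to resume from.
H afterPrefix(H)(H ctx, const(ubyte)[] prefix)
{
    ctx.put(prefix);
    static if (isSaveableHash!H)
        return ctx.save();
    else
        return ctx; // plain value copy, as in `auto saved = hash_ctx;`
}
```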

>> Some unnecessary allocations could go away.
>> Most important instances would have less mutable state.
>
> Could you specify which ones, please?
>
Basically every 'new' in std.hash.crypto.base but especially the ones in hash(T) and hashToHex(T).

>> You could probably parameterize a Merkle Damgård base with free
>> functions for the transformation.
>
> What would be the difference from current class parametrization?
>
I just wanted to point out a specific alternative if code reuse is a concern.
If you are not using classes, you need a way to inject the transformation, which could
be done like this:
alias MerkleDamgard!(uint, 5, 80, 16, 20, sha1Transform) SHA1;

>> A dynamic interface can be obtained by templated instances similar to
>> what std.range does.
>
> Could you elaborate? I don't know exactly what do you mean. Function templates?
>
http://www.digitalmars.com/d/2.0/phobos/std_range.html#InputRange
DynamicAllocatorTemplate at https://github.com/dsimcha/TempAlloc/blob/master/std/allocators/allocator.d

These are to support cases where you either want a stable ABI or
have a template firewall for scalability issues (e.g. could be sensible for the HMAC implementation although not really necessary).

> Thanks a lot!

October 25, 2011

Re: Early std.crypto

Posted by Brad Roberts
in reply to Piotr Szturmaj

On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
> https://github.com/pszturmaj/phobos/tree/master/std/crypto
> 
> This is some early work on std.crypto proposal. Currently only MD5, HMAC and all SHA family functions (excluding SHA0 which is very old, broken and no longer in use). I plan to add other crypto primitives later.
> 
> I know about one SHA1 pull request optimized for SSSE3. I think native code must be there to support other non x86 CPUs and SIMD optimization may be added at any time later.
> 
> Any opinions are welcome. Especially if such design is good or bad, and what needs to be changed.
> 
> Thanks :)

A key element to a lot of crypto code is speed.  I really don't think we want to re-invent all the optimizations on all the platforms.  To that end, I really suggest that we stick to wrapping existing implementations, like openssl.  While I hate the openssl apis, I do respect the continual effort that various companies invest in optimizing the code.

My 2 cents,
Brad

October 25, 2011

Re: Early std.crypto

Posted by Piotr Szturmaj
in reply to Brad Roberts

Brad Roberts wrote:
> On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
>> https://github.com/pszturmaj/phobos/tree/master/std/crypto
>>
>> This is some early work on std.crypto proposal. Currently only MD5, HMAC and all SHA family functions (excluding SHA0
>> which is very old, broken and no longer in use). I plan to add other crypto primitives later.
>>
>> I know about one SHA1 pull request optimized for SSSE3. I think native code must be there to support other non x86 CPUs
>> and SIMD optimization may be added at any time later.
>>
>> Any opinions are welcome. Especially if such design is good or bad, and what needs to be changed.
>>
>> Thanks :)
>
> A key element to a lot of crypto code is speed.  I really don't think we want to re-invent all the optimizations on all
> the platforms.  To that end, I really suggest that we stick to wrapping existing implementations, like openssl.  While I
> hate the openssl apis, I do respect the continual effort that various companies invest in optimizing the code.

You are of course right about speed, but there are some reasons for having our own code _if_ we want std.crypto:

1. Phobos independence
2. The OpenSSL API is not D-friendly
3. No need to link with OpenSSL to compute a simple hash
4. Licensing

We also have some options for improving speed while retaining our API:

1. Wrap OpenSSL _on demand_. This is transparent to the user; the API doesn't change.
2. Much (if not all) of the OpenSSL asm code may be obtained under the CRYPTOGAMS license (BSD). Reading http://www.openssl.org/~appro/cryptogams/ also suggests that the author might be willing to license it under Boost.
3. Adapt the Crypto++ asm code, which is public domain (x86/64 only).

The first option should be easy, and the user would have a choice between OpenSSL wrapped within std.crypto and direct access to etc.c.openssl.

October 25, 2011

Re: Early std.crypto

Posted by Walter Bright
in reply to Piotr Szturmaj

On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
> https://github.com/pszturmaj/phobos/tree/master/std/crypto
>
> This is some early work on std.crypto proposal. Currently only MD5, HMAC and all
> SHA family functions (excluding SHA0 which is very old, broken and no longer in
> use). I plan to add other crypto primitives later.
>
> I know about one SHA1 pull request optimized for SSSE3. I think native code must
> be there to support other non x86 CPUs and SIMD optimization may be added at any
> time later.
>
> Any opinions are welcome. Especially if such design is good or bad, and what
> needs to be changed.

Thanks for championing this.

The input to the functions should be a range, not an array (although an array is a range).

In general, for Phobos, all arbitrary input data should be in the form of ranges, and all arbitrary output data should present itself as a range. This facilitates the idea of:

    range => algorithm => range

So, for example, I want to encrypt and then zip a file and send the output to a socket:

    file => encrypt => compress => socket

All the components here will just "snap" together. With the existing design of crypto, I'd have to read the file into an array, then pass the array to encrypt, etc.

Think of it like the filter concept in Unix that has been so successful.
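
A minimal sketch of that snapping together, assuming toy stages: toyEncrypt and xorChecksum are hypothetical names, the XOR step is not real encryption, and a byte array stands in for the file. Only std.algorithm's map is an existing Phobos function.

```d
import std.algorithm.iteration : map;

// Toy stand-in for a cipher stage: XORs every byte with a key.
// This is NOT encryption; it only shows the shape of a lazy range stage.
auto toyEncrypt(R)(R input, ubyte key)
{
    return input.map!(b => cast(ubyte)(b ^ key));
}

// Toy sink: consumes any input range of bytes, one element at a time.
ubyte xorChecksum(R)(R input)
{
    ubyte acc = 0;
    foreach (b; input)
        acc ^= b;
    return acc;
}
```

A call such as data.toyEncrypt(0x5A).xorChecksum pulls bytes through both stages lazily, so no intermediate array is built between them.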

October 25, 2011

Re: Early std.crypto

Posted by Piotr Szturmaj
in reply to Martin Nowak

Martin Nowak wrote:
> On Tue, 25 Oct 2011 09:43:48 +0200, Piotr Szturmaj
> <bncrbme@jadamspam.pl> wrote:
>
>> Martin Nowak wrote:
>>> I have to say though that I like the current struct based interface
>>> much better.
>>>
>>> struct Hash
>>> {
>>> // enhanced by some compile time traits
>>> enum hashLength = 16;
>>> enum blockLength = 0;
>>
>> The reason why hash and block length are runtime variables is that
>> some hash functions are parametrized with variables of great
>> amplitude, for example CubeHash may have any number of rounds, and any
>> size of block and hash output.
>>
>>> // three interface functions
>>> void start();
>>> void update(const(ubyte)[] data);
>>> void finish(ref ubyte[hashLength] digest);
>>> }
>>
>> There, it is:
>>
>> reset();
>> put();
>> finish();
>>
> Reset does two different things depending on the internal state. Not so
> good.

I think that's negligible, but it may be "unbranched" easily.

>> The put() function makes hash implementation an OutputRange.
>>
>>> You wouldn't need the save, restore functions.
>>
>> They're not needed. They only serve as speed optimization when hashing
>> many messages which have the same beginning block. This is used in
>> HMAC, which is:
>>
>> HMAC(func, key, message) = func(key ^ opad, func(key ^ ipad, message));
>>
>> when func supports saving the IV, the first parts are precomputed,
>> when not HMAC resorts to full hashing. This optimization is also
>> mentioned in HMAC spec.
>>
> If hash contexts were value type you could simply do.
> auto saved = hash_ctx;
> Or alternatively one could add a 'save()' function to an isSaveableHash(H)
> concept.

Yes, I thought about that, but the current way is faster because only the initialization vector is saved, and the API name emphasises that. A general save() on SHA512 would need to copy the IV and 80 ulongs (internal state).

>>> Some unnecessary allocations could go away.
>>> Most important instances would have less mutable state.
>>
>> Could you specify which ones, please?
>>
> Basically every 'new' in std.hash.crypto.base but especially the ones in
> hash(T) and hashToHex(T).

Yes, this will be fixed.

>>> You could probably parameterize a Merkle Damgård base with free
>>> functions for the transformation.
>>
>> What would be the difference from current class parametrization?
>>
> Just wanted to point out a specific alternative if code reuse is of a
> concern.
> If not using classes you need a way to inject the transformation which
> could
> be done like.
> alias MerkleDamgard!(uint, 5, 80, 16, 20, sha1Transform) SHA1;

Either way, this function must be written, whether as a free function or as a class method. I don't think anyone would use the transformation function directly.

>>> A dynamic interface can be obtained by templated instances similar to
>>> what std.range does.
>>
>> Could you elaborate? I don't know exactly what do you mean. Function
>> templates?
>>
> http://www.digitalmars.com/d/2.0/phobos/std_range.html#InputRange
> DynamicAllocatorTemplate at
> https://github.com/dsimcha/TempAlloc/blob/master/std/allocators/allocator.d
>
> These are to support cases where you either want a stable ABI or
> have a template firewall for scalability issues (e.g. could be sensible
> for the HMAC implementation although not really necessary).
>

I will look into it. Thanks!

October 25, 2011

Re: Early std.crypto

Posted by Piotr Szturmaj
in reply to Walter Bright

Walter Bright wrote:
> On 10/24/2011 5:10 PM, Piotr Szturmaj wrote:
>> https://github.com/pszturmaj/phobos/tree/master/std/crypto
>>
>> This is some early work on std.crypto proposal. Currently only MD5,
>> HMAC and all
>> SHA family functions (excluding SHA0 which is very old, broken and no
>> longer in
>> use). I plan to add other crypto primitives later.
>>
>> I know about one SHA1 pull request optimized for SSSE3. I think native
>> code must
>> be there to support other non x86 CPUs and SIMD optimization may be
>> added at any
>> time later.
>>
>> Any opinions are welcome. Especially if such design is good or bad,
>> and what
>> needs to be changed.
>
>
> Thanks for championing this.
>
> The input to the functions should be a range, not an array (although an
> array is a range).
>
> In general, for Phobos, all arbitrary input data should be in the form
> of ranges, and all arbitrary output data should present itself as a
> range. This facilitates the idea of:
>
> range => algorithm => range
>
> So, for example, I want to encrypt and then zip a file and send the
> output to a socket:
>
> file => encrypt => compress => socket
>
> All the components here will just "snap" together. With the existing
> design of crypto, I'd have to read the file into an array, then pass the
> array to encrypt, etc.
>
> Think of it like the filter concept in Unix that has been so successful.
>

I share your opinion. I was thinking about such a filter concept for std.crypto.cipher (TBD), but I will also try to convert the current hash function code to ranges.

Thanks for pointing that out.

October 25, 2011

Re: Early std.crypto

Posted by Walter Bright
in reply to Piotr Szturmaj

On 10/25/2011 3:40 PM, Piotr Szturmaj wrote:
> I share your opinion. I was thinking about such filter concept for
> std.crypto.cipher (TBD), but I will also try to convert current hash function
> code to ranges.
>
> Thanks for pointing that out.

Andrei and I have pretty much failed at articulating this vision for Phobos. We need to get our act together.

Copyright © 1999-2021 by the D Language Foundation