Feature Request: Hash-Based Assertion
November 26, 2015
I brought this topic in "Learn" a while ago, but I want to talk about it again.

You are on a big team or working with a big code base. APIs, configuration constants, and data structures are constantly being defined and modified.

You are coding on the business logic side, relying on the current APIs, configuration, and data structures. Part of the code has been updated on the API side, but you are not aware of it, or time has passed and you simply assume that your code will still work properly. Nobody is going to check every single part of the business logic line by line.

At runtime, you will get unexpected results and lose some hair until you find where the problem is. Tracking down unexpected results in a long-running process causes even more trouble.

---

What I currently do is this: I calculate a hash of the API code (functions, configuration, etc. together) with a hash function, and store it as a constant where the API is defined.

public enum HASH_OF_THIS_API = 0x1234;

// Hash is calculated from here
public void my_api_function(){}

public enum my_api_constant = 5;
// till here

Then wherever I use that API, I insert a "static assert( HASH_OF_THIS_API == 0x1234 );".

Whoever modifies the API calculates the hash of the most recent code after the modification and updates the constant. This lets the compiler warn the business logic programmer about changes to the API code, so the changed parts can be reviewed and the calling code adjusted if required.

---

Here comes the feature request part: the API programmer may forget to update the hash value in the code. Also, comments in the code shouldn't affect the hash value. This needs to be automated at compile time, so that the compiler calculates the hash of the code automatically and the value can be read at compile time. Then no constant is required to store the hash value.

What is needed is to be able to bind a hash value to any block with a name.

November 26, 2015
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> I brought this topic in "Learn" a while ago, but I want to talk about it again.
>
> [...]

One applicable solution: __traits( hashOf, apiFunctionName/structName/variableName/className )
November 26, 2015
On Thursday, 26 November 2015 at 11:14:54 UTC, tcak wrote:
> On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
>> I brought this topic in "Learn" a while ago, but I want to talk about it again.
>>
>> [...]
>
> One applicable solution: __traits( hashOf, apiFunctionName/structName/variableName/className )

Can't you calculate the hash of the involved files at compile time?
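
A whole-file hash of this kind can already be computed at compile time. The sketch below is one hedged way to do it, assuming a hand-written 32-bit FNV-1a function (not something this thread prescribes) and a hypothetical file name `api.d` that would be pulled in through a string import. A literal string stands in for the file so the example compiles on its own. Note that comments and whitespace do change this hash.

```d
/// 32-bit FNV-1a; simple enough to run in CTFE (compile-time function execution).
uint fnv1a(string text) pure nothrow @nogc @safe
{
    uint h = 2166136261u;      // FNV offset basis
    foreach (char c; text)
    {
        h ^= c;
        h *= 16777619u;        // FNV prime; uint arithmetic wraps modulo 2^32
    }
    return h;
}

// Sanity check against a published FNV-1a test vector.
static assert(fnv1a("a") == 0xE40C292C);

// In a real build this would be a string import of the API file:
//     enum apiSource = import("api.d");   // hypothetical file; needs dmd -J<dir>
// A literal stands in here (assumption) so the sketch is self-contained.
enum apiSource = "public void my_api_function(){}\npublic enum my_api_constant = 5;\n";

/// Evaluated by the compiler on every build; nobody updates it by hand.
enum uint apiHash = fnv1a(apiSource);

// A caller would pin the value it was written against, for example:
//     static assert(apiHash == 0x...);   // value taken once from pragma(msg, apiHash)
```

Because the hash covers the whole text, any edit to the file invalidates every pinned value at once.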
November 26, 2015
On Thursday, 26 November 2015 at 11:18:19 UTC, Andrea Fontana wrote:
> On Thursday, 26 November 2015 at 11:14:54 UTC, tcak wrote:
>> On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
>>> I brought this topic in "Learn" a while ago, but I want to talk about it again.
>>>
>>> [...]
>>
>> One applicable solution: __traits( hashOf, apiFunctionName/structName/variableName/className )
>
> Can't you calculate hash of involved files at compile time?

One file can contain many API functions. If there are 50 functions in it and only one of them has been modified, the whole hash will change. The compiler then cannot tell which API has changed. The purpose is to take the burden off the programmer and put it onto the compiler.
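
Per-function granularity can be approximated today for signatures, though not for function bodies. The sketch below assumes a hand-written FNV-1a hash and hypothetical API functions (`addUser` and friends are invented for illustration). It hashes `typeof(fn).mangleof`, which encodes parameter and return types but neither the function name nor its implementation, so it is weaker than the requested feature.

```d
/// 32-bit FNV-1a, usable at compile time.
uint fnv1a(string text) pure nothrow @nogc @safe
{
    uint h = 2166136261u;
    foreach (char c; text)
    {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

/// Hash of a function's type only: parameter types, return type, linkage.
/// The body of the function does not contribute to this value.
enum uint signatureHash(alias fn) = fnv1a(typeof(fn).mangleof);

// Hypothetical API functions, for illustration only.
int addUser(string name, int age) { return 0; }
int addUserNewBody(string name, int age) { return 1; }   // same signature, other body
int addUserChanged(string name, long age) { return 0; }  // parameter type changed

// Same signature: the hash cannot see the different body.
static assert(signatureHash!addUser == signatureHash!addUserNewBody);
// Changed parameter type: the hash differs, so a pinned static assert would fire.
static assert(signatureHash!addUser != signatureHash!addUserChanged);
```

A signature change of this kind usually fails type checking at the call site anyway; the hash mainly helps where implicit conversions would let the old call compile silently.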
November 26, 2015
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> I brought this topic in "Learn" a while ago, but I want to talk about it again.
>
> You are on a big team or working with a big code base. APIs, configuration constants, and data structures are constantly being defined and modified.
>
> You are coding on the business logic side, relying on the current APIs, configuration, and data structures. Part of the code has been updated on the API side, but you are not aware of it, or time has passed and you simply assume that your code will still work properly. Nobody is going to check every single part of the business logic line by line.

This is the job of the type checker, isn't it? What would a hash provide that a type checker does not?

November 26, 2015
On 2015-11-26 12:24, tcak wrote:

> One file can contain many API functions. If there are 50 functions in
> it and only one of them has been modified, the whole hash will change.
> The compiler then cannot tell which API has changed. The purpose is to
> take the burden off the programmer and put it onto the compiler.

With a complete D front end working at compile time it would at least be possible in theory.

-- 
/Jacob Carlborg
November 26, 2015
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> I brought this topic in "Learn" a while ago, but I want to talk about it again.
>
> [...]

So it's not just the function's signature you want to hash, but its code as well? What about functions called from the API function? Or functions that set data that'll later be used by the API functions?

If anything, I would have hashed the unittests of the API function. If the behavior of the API function changes in a fashion that requires a modification of the unittest, then you might need to alert the business logic programmers. Anything less than that is just useless noise that'll hide the actual changes you want to be warned about among the endless clutter created by trivial changes.
November 26, 2015
On Thursday, 26 November 2015 at 11:12:07 UTC, tcak wrote:
> I brought this topic in "Learn" a while ago, but I want to talk about it again.
>
> [...]

I'm wondering if a diff tool could somehow be combined with a parser to create a list of functions/symbols which may have experienced behavioural changes between versions of dmd. What I'm suggesting is a diff tool which is aware of a symbol's dependencies, so that even if a function body wasn't changed, the symbols it depends on could be checked as well.

If such a tool existed, it could be run against each new release of dmd and produce a comma-separated list of functions that may have experienced behavioural changes. With that list in hand, one could then simply grep for each symbol in their own repository each time they upgrade dmd.

I hereby place this idea in the public domain ;)

   Bit
November 27, 2015
I see many solutions here that do not require any language change. To start, have a linter yell at the programmer when (s)he submits a diff. Devs commit directly? What the fuck are you doing? Do code review and get a linter.

Alternatively, generate a .di file and hash it. You can have a bot do it and commit the result from a commit hook.

DMD can dump information about the program in JSON format. Hash this and run with it.
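
A hedged sketch of the JSON route follows. It assumes the layout that `dmd -X` is generally seen to emit: a top-level array of modules, each with `name` and `members`, and each member with `name` and `deco` (the mangled type). Those field names are assumptions to check against the actual compiler output, and `hashSymbols` is a made-up helper name. The FNV-1a hash is likewise just one possible choice.

```d
import std.json : JSONValue, parseJSON;

/// 32-bit FNV-1a hash.
uint fnv1a(string text) pure nothrow @nogc @safe
{
    uint h = 2166136261u;
    foreach (char c; text)
    {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

/// Maps "module.symbol" to a hash of that symbol's mangled type.
/// ASSUMPTION: modules carry "name"/"members", members carry "name"/"deco".
uint[string] hashSymbols(string jsonText)
{
    uint[string] result;
    auto root = parseJSON(jsonText);
    foreach (mod; root.array)
    {
        auto members = "members" in mod;
        if (members is null)
            continue;                       // module without declarations
        auto modName = "name" in mod;
        const prefix = modName ? modName.str ~ "." : "";
        foreach (member; members.array)
        {
            auto name = "name" in member;
            auto deco = "deco" in member;
            if (name is null || deco is null)
                continue;                   // entry has no type information
            result[prefix ~ name.str] = fnv1a(deco.str);
        }
    }
    return result;
}
```

The input could be produced with something like `dmd -o- -X -Xf=api.json api.d`, and the resulting maps from two revisions compared key by key. This reports which symbol changed, but only for signature changes, since `deco` does not cover function bodies.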

You may also change your source control strategy: https://www.youtube.com/watch?v=W71BTkUbdqE . Unified source code alleviates these kinds of issues completely, to boot.

November 27, 2015
On Friday, 27 November 2015 at 05:33:52 UTC, deadalnix wrote:
> I see many solutions here that do not require any language change. To start, have a linter yell at the programmer when (s)he submits a diff. Devs commit directly? What the fuck are you doing? Do code review and get a linter.
>
> Alternatively, generate a .di file and hash it. You can have a bot do it and commit the result from a commit hook.
>
> DMD can dump information about the program in JSON format. Hash this and run with it.
>
> You may also change your source control strategy: https://www.youtube.com/watch?v=W71BTkUbdqE . Unified source code alleviates these kinds of issues completely, to boot.

Nothing in your suggestions gives a simple solution like:

static assert( __traits( hashOf, std.file.read ) == 0x1234, "They have changed implementation again." );

static assert( __traits( hashOf, facebook.apis.addUser ) == 0x5543, "Check API documentation again for addUser." );



A .di file wouldn't work. It doesn't contain the implementation code. Also, all the APIs are in it. We need a specific hash for each API, so that it doesn't take a long time to find where the problem is.

JSON is the same as .di. No difference.


Your suggestions are not helping; they make everything more complex.