July 23, 2011
Am 24.07.2011 01:02, schrieb so:
> On Sun, 24 Jul 2011 01:47:32 +0300, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> and then either remove the .tmp if identical or rename it forcefully
>> to filename.di if different. That's a classic in code generation tools.
>
> Not sure i got it right but how renaming it forcefully would solve this?
> This must be two separate process for the compiler, issuing an error and
> if it was intended then the .di file must be generated.

Because if not done forcefully there may be an error because filename.di already exists.
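
For illustration, the write-to-temporary, compare, then rename-forcefully step could be sketched as below. This is a sketch in Python, not anything dmd does: the function name `update_if_changed` and the idea that the generator has already written a `.tmp` file are assumptions made for the example. `os.replace` is used because it overwrites an existing destination, whereas a plain rename can fail on some platforms when filename.di already exists.

```python
import filecmp
import os


def update_if_changed(tmp_path, final_path):
    """Install tmp_path as final_path only when the content differs.

    Returns True if final_path was (re)written, False if it was left alone.
    """
    if os.path.exists(final_path) and filecmp.cmp(tmp_path, final_path, shallow=False):
        # Identical output: drop the temporary so final_path keeps its
        # old timestamp and nothing depending on it gets rebuilt.
        os.remove(tmp_path)
        return False
    # Different content (or no previous file): os.replace overwrites an
    # existing destination, which is the "forceful" rename.
    os.replace(tmp_path, final_path)
    return True
```

A build script would call `update_if_changed("filename.di.tmp", "filename.di")` right after the generation step.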
July 23, 2011
On 7/23/11 6:10 PM, Daniel Gibson wrote:
> Am 24.07.2011 01:02, schrieb so:
>> On Sun, 24 Jul 2011 01:47:32 +0300, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> and then either remove the .tmp if identical or rename it forcefully
>>> to filename.di if different. That's a classic in code generation tools.
>>
>> Not sure i got it right but how renaming it forcefully would solve this?
>> This must be two separate process for the compiler, issuing an error and
>> if it was intended then the .di file must be generated.
>
> Because if not done forcefully there may be an error because filename.di
> already exists.

Exactly right.

Andrei
July 23, 2011
On Sun, 24 Jul 2011 01:47:32 +0300, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

>> Yes, neither of the above are "proper" solutions. But, unless I've lost
>> track of something, you're trying to justify a solid amount of work on
>> the compiler to implement the "proper" solution, when the above
>> alternatives are much simpler in practice. (If you have more
>> counter-arguments, I'd like to hear them.)
>
> I don't think at all these aren't proper. As I said, people are willing to do crazy things to keep large projects sane. The larger question here is how to solve a failure scenario, i.e. what can we offer that senior engineer who fixes the build when the .di files are not in sync anymore.

OK. I'm not going to ask for your estimate of how likely this is to happen in practice, or whether it justifies the implementation effort of going in this direction, as I think I've fully elaborated my point already.

One thing I should have mentioned before is one of my reasons for this argument: the design for implementing verification of manually-maintained .di files has been discussed and published by D's creators, so now someone just needs to go and implement it. However, there is no discussion or consensus about improving .di-file generation. Even choosing the name of whatever attribute gets DMD to copy function bodies to .di files would be a start.

>>> But wait, there's less. The programmers don't have the option of
>>> grouping method implementations in a hierarchy by functionality (which
>>> is common in visitation patterns - even dmd does so). They must define
>>> one class with everything in one place, and there's no way out of that.
>>
>> Sorry, I don't understand this part. Could you elaborate?
>
> A class hierarchy defines foo() and bar(). We want to put all foo() implementations together and all bar() implementations together.

Ah, so this is a new possibility created by the "method definitions outside a class" requirement. But how would this play with the module system? Would you be allowed to place method definitions in modules other than the class declaration?

-- 
Best regards,
 Vladimir                            mailto:vladimir@thecybershadow.net
July 23, 2011
On Sun, 24 Jul 2011 02:10:47 +0300, Daniel Gibson <metalcaedes@gmail.com> wrote:

> Am 24.07.2011 01:02, schrieb so:
>> On Sun, 24 Jul 2011 01:47:32 +0300, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> and then either remove the .tmp if identical or rename it forcefully
>>> to filename.di if different. That's a classic in code generation tools.
>>
>> Not sure i got it right but how renaming it forcefully would solve this?
>> This must be two separate process for the compiler, issuing an error and
>> if it was intended then the .di file must be generated.
>
> Because if not done forcefully there may be an error because filename.di already exists.

What I meant is:

You have file.di and file.d, with file.di being your reference.
Now you get a conflict. This conflict is most likely unintended, because file.di is your reference and you don't change things in there often. In that case, IMO, the compiler should just issue an error and let the user deal with it by explicitly generating file.di from file.d. If you generate it explicitly, the compiler shouldn't care whether the file was already there. (Again, it is free to compare the two results and keep the old one if they are the same.)

Doing this forcefully is also a solution, but it had better be optional (a compiler flag), not the default.
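
The policy described above (error by default, overwrite only on request) could be sketched like this. It is an illustration in Python under stated assumptions: `reconcile`, `InterfaceMismatch` and the `force` parameter are hypothetical names standing in for whatever the compiler and its flag would be called; no existing dmd switch is implied.

```python
class InterfaceMismatch(Exception):
    """Raised when the generated interface differs from the reference."""


def reconcile(reference_text, generated_text, force=False):
    """Return the text that should end up in file.di.

    reference_text is the current file.di content (None if the file is
    missing); generated_text is what would be emitted from file.d.
    """
    if reference_text is None or reference_text == generated_text:
        # Nothing to protect, or nothing changed.
        return generated_text
    if not force:
        # Default: treat the conflict as a probable mistake.
        raise InterfaceMismatch("file.di is out of sync with file.d")
    # Explicit opt-in: regenerate over the reference.
    return generated_text
```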
July 24, 2011
On 7/23/2011 3:50 PM, Andrei Alexandrescu wrote:
> On 7/23/11 5:39 PM, bearophile wrote:
>> I have suggested some fine-grained hashing. Compute a hash from a
>> class definition, and later quickly compare this value with a value
>> stored elsewhere (like automatically written in the .di file).
>
> I discussed four options with Walter, and this was one of them. It has issues.
> The proposal as in this thread is the simplest and most effective I could find.

The only way the linker can detect mismatches is by embedding the hash into the name, i.e. more name mangling. This has serious issues:

1. The hashing cannot be reversed. Hence, the user will be faced with really, really ugly error messages from the linker that will make today's mangled names look like a marvel of clarity. Consider all the users today, who have a really hard time with things like:

    undefined symbol: _foo

from the linker. Now imagine it's:

    undefined symbol: _foo12345WQERTYHBVCFDERTYHGFRTYHGFTYUHGTYUHGTYUJHGTYU

They'll run screaming, and I would, too.

2. This hash will get added to all struct/class names, so there will be an explosion in the length of names the linker sees. This can make tools that deal with symbolic names in the executable (like debuggers, disassemblers, profilers, etc.) much more messy to deal with.

3. Hashes aren't perfect, they can have collisions, unless you want to go with really long ones like MD5.
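
To make the first two points concrete, here is a small sketch of what embedding a hash in the symbol name would look like. It is written in Python purely for illustration: `mangle` is a hypothetical helper, the mangling scheme is invented, and SHA-256 is an arbitrary choice of digest for the example.

```python
import hashlib


def mangle(name, definition):
    """Append a digest of the type definition to the symbol name."""
    digest = hashlib.sha256(definition.encode("utf-8")).hexdigest()
    return "_" + name + digest


# The interface file and the implementation disagree on field order.
declared = mangle("foo", "class A { int a; double b; }")
defined = mangle("foo", "class A { double b; int a; }")

symbols = {defined}                 # what the object file exports
missing = declared not in symbols   # the linker's lookup fails
if missing:
    # All the user sees is the name plus 64 opaque hex digits.
    print("undefined symbol:", declared)
```

The mismatch is detected, but the message says nothing about which field or method differs, and every such symbol grows by the length of the digest.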


July 24, 2011
On 07/23/2011 11:54 PM, Andrei Alexandrescu wrote:
> The problem with this setup is that it's extremely fragile, in ways that
> are undetectable during compilation or runtime. For example, just
> swapping a and b in the implementation file makes the program print
> "08.96566e-31344". Similar issues occur if fields or methods are added
> or removed from one file but not the other.
>
> In an attempt to fix this, the developers may add an "import a" to a.d,
> thinking that the compiler would import a.di and would verify the bodies
> of the two classes for correspondence. That doesn't work - the compiler
> simply ignores the import. Things can be tenuously arranged such that
> the .d file and the .di file have different names, but in that case the
> compiler complains about duplicate definitions.

If the .di files are this fragile, the compiler should just always check whether the .d file matches the .di file (if present); then there is no need for the extra import.

-- 
Mike Wey
July 24, 2011
On 7/24/11 4:31 AM, Mike Wey wrote:
> On 07/23/2011 11:54 PM, Andrei Alexandrescu wrote:
>> The problem with this setup is that it's extremely fragile, in ways that
>> are undetectable during compilation or runtime. For example, just
>> swapping a and b in the implementation file makes the program print
>> "08.96566e-31344". Similar issues occur if fields or methods are added
>> or removed from one file but not the other.
>>
>> In an attempt to fix this, the developers may add an "import a" to a.d,
>> thinking that the compiler would import a.di and would verify the bodies
>> of the two classes for correspondence. That doesn't work - the compiler
>> simply ignores the import. Things can be tenuously arranged such that
>> the .d file and the .di file have different names, but in that case the
>> compiler complains about duplicate definitions.
>
> If the .di files are this fragile, the compiler should just always check
> whether the .d file matches the .di file (if present); then there is no
> need for the extra import.

Trouble is, they sometimes may be in different dirs.

Andrei
July 24, 2011
On 22/07/2011 23:06, Andrei Alexandrescu wrote:
> I see this as a source of problems going forward, and I propose the
> following changes to the language:
>
> 1. The compiler shall accept method definitions outside a class.
>
> 2. A method cannot be implemented unless it was declared with the same
> signature in the class definition.

So what you're proposing is akin to this in C++?

class A
{
  void foo();
};

void A::foo()
{
}

The equivalent D being s/::/./? My question is then, do we allow something like:

// a.d
class A
{
  void foo();
}

// b.d
import a;
void A.foo() {}

Which causes the module system to break down somewhat, or do we require the method to be implemented in the same module (.di or .d)?

-- 
Robert
http://octarineparrot.com/
July 24, 2011
On Sat, 23 Jul 2011 21:14:27 -0300, Walter Bright <newshound2@digitalmars.com> wrote:

> On 7/23/2011 3:50 PM, Andrei Alexandrescu wrote:
>> On 7/23/11 5:39 PM, bearophile wrote:
>>> I have suggested some fine-grained hashing. Compute a hash from a
>>> class definition, and later quickly compare this value with a value
>>> stored elsewhere (like automatically written in the .di file).
>>
>> I discussed four options with Walter, and this was one of them. It has issues.
>> The proposal as in this thread is the simplest and most effective I could find.
>
> The only way the linker can detect mismatches is by embedding the hash into the name, i.e. more name mangling. This has serious issues:
>
> 1. The hashing cannot be reversed. Hence, the user will be faced with really, really ugly error messages from the linker that will make today's mangled names look like a marvel of clarity. Consider all the users today, who have a really hard time with things like:
>
>      undefined symbol: _foo
>
> from the linker. Now imagine it's:
>
>      undefined symbol: _foo12345WQERTYHBVCFDERTYHGFRTYHGFTYUHGTYUHGTYUJHGTYU
>
> They'll run screaming, and I would, too.

A simplistic suggestion:

This could be made better by specifying a hash introduction character, known
by or specifiable in all tools. That could give
_foo^12345WQERTYHBVCFDERTYHGFRTYHGFTYUHGTYUHGTYUJHGTYU
in tools not yet aware of the hash intro character, and just
_foo
in tools that have been adapted to take advantage of it.
Both cases are easier to read IMHO, and the system enables easy implementation
of the second case in various tools.

> 2. This hash will get added to all struct/class names, so there will be an explosion in the length of names the linker sees. This can make tools that deal with symbolic names in the executable (like debuggers, disassemblers, profilers, etc.) much more messy to deal with.
> 3. Hashes aren't perfect, they can have collisions, unless you want to go with really long ones like MD5.

The system above would make the length of the hash almost irrelevant, because
it would simplify adapting tools to not display the symbol's hash, while
also making the symbol easier to read in old tools not yet adapted.

I do not know if other compiled languages have the same problem, but if they
do, such a convention might be nice for them as well.
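
A tool that knows the convention would need very little code. The following is a minimal sketch in Python, assuming `^` as the hash introduction character (as in the example above); `HASH_INTRO` and `display_name` are hypothetical names, not part of any existing tool.

```python
HASH_INTRO = "^"  # assumed separator; could be made configurable per tool


def display_name(symbol, intro=HASH_INTRO):
    """Return the symbol without its hash suffix, if it has one."""
    name, _sep, _hash = symbol.partition(intro)
    return name
```

Symbols without the intro character pass through unchanged, so the same function works on names produced before such a scheme existed.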

Roald
July 24, 2011
Don:

> As far as performance goes, it would seem much simpler to just

Have Walter and Andrei seen this post by Don?

Bye,
bearophile