December 18, 2012 Re: Compilation strategy

Posted in reply to Jacob Carlborg

On 12/17/2012 11:40 PM, Jacob Carlborg wrote:
> On 2012-12-17 00:09, Walter Bright wrote:
>
>> Figure out the cases where it happens and fix those cases.
>
> How is it supposed to work? Could there be some issue with the dependency
> tracker that should otherwise have indicated that more modules needed to be
> recompiled?
It should only generate function bodies that are needed, not all of them.
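Walter's point, that only the function bodies actually needed should be emitted, can be sketched as a reachability pass over a call graph. This is a toy illustration with invented names, not dmd's actual logic:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> functions it calls directly.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Collect the set of functions reachable from the given roots.
// Only these would need generated bodies; the rest can be skipped.
std::set<std::string> neededBodies(const CallGraph& g,
                                   const std::vector<std::string>& roots) {
    std::set<std::string> needed;
    std::vector<std::string> work(roots.begin(), roots.end());
    while (!work.empty()) {
        std::string f = work.back();
        work.pop_back();
        if (!needed.insert(f).second)
            continue;  // already visited
        auto it = g.find(f);
        if (it != g.end())
            for (const auto& callee : it->second)
                work.push_back(callee);
    }
    return needed;
}
```

Starting the walk from `main`, a function on no call path from it would get no body, which is the behavior being described.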
December 18, 2012 Re: Compilation strategy

Posted in reply to Jacob Carlborg

On Tuesday, 18 December 2012 at 07:48:01 UTC, Jacob Carlborg wrote:
> On 2012-12-17 23:12, Walter Bright wrote:
>
>> I have toyed with the idea many times, however, of having dmd support
>> zip files. Zip files can contain an arbitrary file hierarchy, with
>> individual files in compressed, encrypted, or plaintext at the selection
>> of the zip builder. An entire project, or library, or collection of
>> source modules can be distributed as a zip file, and could be compiled
>> with nothing more than:
>
> I think that a package manager should handle this. Example:
>
> https://github.com/jacob-carlborg/orbit/wiki/Orbit-Package-Manager-for-D
>
> Yes, I can change the orbfiles to be written in D.
Hehe. :)
I even checked out your code with that idea in mind, but other things keep having higher priority.
--
Paulo
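Walter's zip idea quoted above essentially asks the front-end to resolve module lookups against an archive's file table instead of the filesystem. A minimal sketch of that indirection, with the archive contents supplied directly (real zip decoding, which a zip-reading library would provide, is out of scope here, and the names are invented):

```cpp
#include <map>
#include <optional>
#include <string>

// Stand-in for a zip central directory: member path -> decompressed bytes.
// A real implementation would back this with an actual zip reader; this
// only illustrates the lookup indirection.
struct SourceArchive {
    std::map<std::string, std::string> members;

    // Resolve an import like "foo.bar" to "foo/bar.d" inside the archive.
    std::optional<std::string> findModule(const std::string& dotted) const {
        std::string path;
        for (char c : dotted)
            path += (c == '.') ? '/' : c;
        path += ".d";
        auto it = members.find(path);
        if (it == members.end())
            return std::nullopt;
        return it->second;
    }
};
```

The compiler's existing import-path search would simply consult such an archive before (or instead of) the filesystem.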
December 18, 2012 Re: Compilation strategy

Posted in reply to Walter Bright

On 12/18/2012 4:42 AM, Walter Bright wrote:
> On 12/17/2012 3:03 PM, deadalnix wrote:
>> I know that. I'm not arguing against that. I'm arguing against the fact
>> that this is a blocker. It is a blocker in very few use cases, in fact. I'm
>> just looking at the whole picture here. People needing that are the
>> exception, not the rule.
>
> I'm not sure what you mean. A blocker for what?
>
>> And what prevents us from using a bytecode that loses information?
>
> I'd turn that around and ask why have a bytecode?
>
>> As long as it is CTFEable, most people will be happy.
>
> CTFE needs the type information and AST trees and symbol table.
> Everything needed for decompilation.

The fact that CTFE has to crawl AST trees is, AFAIK, mere happenstance. It helps nothing except the way it was hacked into the current compiler structure. There should be a far more suitable IR (if you don't like the term "bytecode") if we are to run CTFE at speeds even marginally comparable to run time.

> I know that bytecode has been around since 1995 in its current
> incarnation, and there's an ingrained assumption that since there's such
> an extensive ecosystem around it, there is some advantage to it.

I don't care for ecosystems, and none are involved in this argument.

> But there isn't.

Compared to doing computations on AST trees (and looking up every name in the symbol table?), creating fake nodes when the result is computed, etc.? I'm out of words.

--
Dmitry Olshansky
December 18, 2012 Re: Compilation strategy

Posted in reply to H. S. Teoh

On Tuesday, 18 December 2012 at 00:15:04 UTC, H. S. Teoh wrote:
> On Tue, Dec 18, 2012 at 02:08:55AM +0400, Dmitry Olshansky wrote:
> [...]
>> I suspect it's one of prime examples where UNIX philosophy of
>> combining a bunch of simple (~ dumb) programs together in place of
>> one more complex program was taken *far* beyond reasonable lengths.
>>
>> Having a pipe-line:
>> preprocessor -> compiler -> (still?) assembler -> linker
>>
>> where every program tries hard to know nothing about the previous
>> ones (and be as simple as possibly can be) is bound to get
>> inadequate results on many fronts:
>> - efficiency & scalability
>> - cross-border error reporting and detection (linker errors? errors
>> for expanded macro magic?)
>> - cross-file manipulations (e.g. optimization, see _how_ LTO is done in GCC)
>> - multiple problems from a loss of information across pipeline*
>
> The problem is not so much the structure preprocessor -> compiler ->
> assembler -> linker; the problem is that these logical stages have been
> arbitrarily assigned to individual processes residing in their own
> address space, communicating via files (or pipes, whatever it may be).
>
> The fact that they are separate processes is in itself not that big of a
> problem, but the fact that they reside in their own address space is a
> big problem, because you cannot pass any information down the chain
> except through rudimentary OS interfaces like files and pipes. Even that
> wouldn't have been so bad, if it weren't for the fact that user
> interface (in the form of text input / object file format) has also been
> conflated with program interface (the compiler has to produce the input
> to the assembler, in *text*, and the assembler has to produce object
> files that do not encode any direct dependency information because
> that's the standard file format the linker expects).
>
> Now consider if we keep the same stages, but each stage is not a
> separate program but a *library*. The code then might look, in greatly
> simplified form, something like this:
>
> import libdmd.compiler;
> import libdmd.assembler;
> import libdmd.linker;
>
> void main(string[] args) {
> // typeof(asmCode) is some arbitrarily complex data
> // structure encoding assembly code, inter-module
> // dependencies, etc.
> auto asmCode = compiler.lex(args)
> .parse()
> .optimize()
> .codegen();
>
> // Note: no stupid redundant convert to string, parse,
> // convert back to internal representation.
> auto objectCode = assembler.assemble(asmCode);
>
> // Note: linker has direct access to dependency info,
> // etc., carried over from asmCode -> objectCode.
> auto executable = linker.link(objectCode);
> auto output = File(outfile, "w");
> executable.generate(output);
> }
>
> Note that the types asmCode, objectCode, executable, are arbitrarily
> complex, and may contain lazy-evaluated data structure, references to
> on-disk temporary storage (for large projects you can't hold everything
> in RAM), etc.. Dependency information in asmCode is propagated to
> objectCode, as necessary. The linker has full access to all info the
> compiler has access to, and can perform inter-module optimization, etc.,
> by accessing information available to the *compiler* front-end, not just
> some crippled object file format.
>
> The root of the current nonsense is that perfectly-fine data structures
> are arbitrarily required to be flattened into some kind of intermediate
> form, written to some file (or sent down some pipe), often with loss of
> information, then read from the other end, interpreted, and
> reconstituted into other data structures (with incomplete info), then
> processed. In many cases, information that didn't make it through the
> channel has to be reconstructed (often imperfectly), and then used. Most
> of these steps are redundant. If the compiler data structures were
> already directly available in the first place, none of this baroque
> dance is necessary.
>
>
>> *Semantic info on interdependency of symbols in a source file is
>> destroyed right before the linker and thus each .obj file is
>> included as a whole or not at all. Thus all C run-times I've seen
>> _sidestep_ this by writing each function in its own file(!). Even
>> this alone should have been a clear indication.
>>
>> While simplicity (and correspondingly size in memory) of programs
>> was the king in 70's it's well past due. Nowadays I think is all
>> about getting highest throughput and more powerful features.
> [...]
>
> Simplicity is good. Simplicity lets you modularize a very complex piece
> of software (a compiler that converts D source code into executables)
> into manageable chunks. Simplicity does not require shoe-horning modules
> into separate programs with separate address spaces with separate (and
> deficient) input/output formats.
>
> The problem isn't with simplicity, the problem is with carrying over the
> archaic mapping of compilation stage -> separate program. I mean,
> imagine if std.regex was written so that regex compilation runs in a
> separate program with a separate address space, and the regex matcher
> that executes the match runs in another separate program with a separate
> address space, and the two talk to each other via pipes, or worse,
> intermediate files.
>
> I've mentioned a few times before a horrendous C++ project that I had to
> work with once, where to make a single function call to a particular
> subsystem, it had to go through 6 layers of abstraction, one of which
> was IPC through a local UNIX socket, *and* another of which involved
> fwrite()ing function parameters into a file and fread()ing said
> parameters from the file in another process, with the 6 layers repeating
> in reverse to propagate the return value of the function back to the
> caller.
>
> In the new version of said project, that subsystem exposes a library API
> where to make a function call, you, um, just call the function (gee,
> what a concept). Needless to say, it didn't take a lot of effort to
> convince customers to upgrade, upon which we proceeded with great relish
> to delete every single source file having to do with that 6-layered
> monstrosity, and had a celebration afterwards.
>
> From the design POV, though, the layout of the old version of the
> project utterly made sense. It was superbly (over)engineered, and if you
> made UML diagrams of it, they would be works of art fit for the British
> Museum. The implementation, however, was "somewhat" disappointing.
>
>
> T
IMO, it's not even an issue of the separate address spaces. The core problem is the direct result of relying on *archaic file formats*.

Simply serializing the intermediate data structure already solves the data-loss problems; all that remains are efficiency concerns, which are much less pressing given current compilation speeds. Separate address spaces can still be useful once distributed and concurrent builds enter the mix.
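The serialization point above can be made concrete: a round trip through a serialized form keeps information (here, symbol dependency edges) that traditional object formats discard. The `Symbol` layout, field names, and the whitespace wire format are invented for illustration:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy intermediate node: a symbol plus its direct dependencies, the kind
// of cross-module information a classic object file format drops.
struct Symbol {
    std::string name;
    std::vector<std::string> deps;
};

// Serialize one symbol per line: name followed by its dependencies.
std::string save(const std::vector<Symbol>& syms) {
    std::ostringstream out;
    for (const auto& s : syms) {
        out << s.name;
        for (const auto& d : s.deps)
            out << ' ' << d;
        out << '\n';
    }
    return out.str();
}

// Deserialize; no information from save() is lost on the way back.
std::vector<Symbol> load(const std::string& text) {
    std::vector<Symbol> syms;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Symbol s;
        fields >> s.name;
        std::string d;
        while (fields >> d)
            s.deps.push_back(d);
        syms.push_back(s);
    }
    return syms;
}
```

The linker-side consumer of `load()` would then see exactly the dependency graph the compiler produced, which is the data-loss argument in miniature.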
December 18, 2012 Re: Compilation strategy

Posted in reply to Walter Bright

On Tuesday, 18 December 2012 at 00:48:40 UTC, Walter Bright wrote:
>> Wow, I think that's exactly what we could use! It serves multiple optional use
>> cases all at once!
>>
>> Was there a technical reason for you not getting around towards implementing, or
>> just a lack of time?
>
> There always seemed something more important to be doing, and Andrei thought it would be better to put such a capability in rdmd rather than dmd.
This is inconsistent with D's design of providing useful features built in (docs generator, testing, profiling, etc.).

Moreover, it breaks encapsulation: the compiler would expose an inferior format that is later wrapped by a more capable packaging format, thus exposing implementation details and adding an external dependency on that inferior format. Besides, the other compilers merge in the same front-end code, so they'll gain the same feature anyway. There's no gain in separating it out to rdmd.

The main question is whether you approve of the concept and are willing to put it on the to-do list. I'm sure that if you endorse this feature, someone else will come in and implement it.
December 18, 2012 Re: Compilation strategy

Posted in reply to foobar

On 12/18/12, foobar <foo@bar.com> wrote:
> Besides, the other compilers merge in the same front-end
> code so they'll gain the same feature anyway. There's no gain in
> separating it out to rdmd.
Adding more front-end features adds more work for maintainers of compilers which are based on the DMD front-end, and not all compilers are based on the DMD front-end.
Don't forget the huge gain of using D over C++ to implement the feature.
December 18, 2012 D Frontend and shared code base (moving away from calling it DMD front-end)
Posted in reply to Andrej Mitrovic

On Tuesday, 18 December 2012 at 12:36:31 UTC, Andrej Mitrovic wrote:
> On 12/18/12, foobar <foo@bar.com> wrote:
>> Besides, the other compilers merge in the same front-end
>> code so they'll gain the same feature anyway. There's no gain in
>> separating it out to rdmd.
>
> Adding more front-end features adds more work for maintainers of
> compilers which are based on the DMD front-end, and not all compilers
> are based on the DMD front-end.

Nine times out of ten, new features don't harm compilers for other backends. The only ones you need to look out for are features that pass new information to the backend to handle (e.g. nullable types and vectors were the last two off the top of my head).

Also, I personally refer to it as the D Front-end (or DFE for short, sometimes, in IRC), as for the most part it has long since stopped being specific to DMD.

However, the situation could be better. Despite my regarding the front-end as shared code, it is far from that: GDC and LDC make quite a lot of changes to make the DFE work with their respective backends and to port it to non-x86 architectures. See this diff between LDC, DMD and GDC as an example of just how bad the current situation is:

http://img21.imageshack.us/img21/1396/meldview1.png

I have spent a huge amount of time trying to thin down these differences, as doing so eases the merge process when a new update to the DFE gets rolled out. The following is an example where I moved the building of these calls into the GDC glue code, at the expense of GDC's codebase growing in size:

http://img43.imageshack.us/img43/4922/meldview3.png

By and large though, LDC and GDC actually share a lot of backend-specific changes. Here's an example which returns the target alignment size of types:

http://img191.imageshack.us/img191/6156/meldview2.png

For the cases where portability matters, I'd hope that these sorts of changes wouldn't need to be conditional. I'd rather the DFE be a common repository used by all the compilers that build on it, with any discrepancy between the compilers replaced by an unconditional hook, left to each compiler maintainer to implement in the way that is correct for them.

I'd like to get the core developers of the D Front-end to work with the people maintaining the other compilers (GDC, LDC, and any others that might come into existence) so that there can genuinely be a shared, portable source base for the D Front-end code, used by all maintainers, without conditionals based on which compiler it's used in, and with that shared source base using only an absolute minimum of headers from the DMD backend.

This could be done with a new interface, for example struct target, meaning the example in the screenshot above would be reduced to the following shared code:

unsigned TypeBasic::alignsize()
{
    return target.alignsize(this);
}

I'll let you all brood over that, and would welcome any feedback to get some sort of plan rolling (even if it's just a DIP).

Thanks,
Iain.
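The proposed `struct target` hook could take roughly the following shape. Everything here beyond the `alignsize` name mentioned in the post is invented to illustrate the pattern, not a real interface listing:

```cpp
// Sketch of the proposed hook: front-end code calls through a single
// Target instance, and each compiler (DMD, GDC, LDC) supplies its own.
// The Type struct is a stand-in for the front-end's real hierarchy.
struct Type {
    unsigned size;  // simplified: real front-end types carry much more
};

struct Target {
    // Each backend fills this in with its ABI's answer.
    unsigned (*alignsize)(const Type*);
};

// Example backend implementation: align to the type's size, capped at 8.
// This rule is hypothetical, chosen only to make the sketch concrete.
static unsigned exampleAlignsize(const Type* t) {
    return t->size < 8 ? t->size : 8;
}

// The single shared definition the front-end links against.
Target target = { &exampleAlignsize };

// Front-end code then needs no per-compiler conditionals:
unsigned typeAlignsize(const Type* t) {
    return target.alignsize(t);
}
```

The point of the design is that `typeAlignsize` lives in the shared front-end source, while the definition of `target` is the one file each compiler maintainer owns.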
December 18, 2012 Re: D Frontend and shared code base (moving away from calling it DMD front-end)

Posted in reply to Iain Buclaw

On 12/18/12, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> I'd like to get the core developers of D Front-end to
> work with the people maintaining other such compilers (GDC, LDC,
> and any others that might come into existence) so that there can
> genuinely be a shared, portable source base for the D Front-end
> code..
Maybe as a first baby step, GDC/LDC devs could identify all the functions that should be abstracted away into such a "target" struct.
December 18, 2012 Re: D Frontend and shared code base (moving away from calling it DMD front-end)

Attachments:
On 18 December 2012 14:07, Andrej Mitrovic <andrej.mitrovich@gmail.com> wrote:
> On 12/18/12, Iain Buclaw <ibuclaw@ubuntu.com> wrote:
>> I'd like to get the core developers of D Front-end to
>> work with the people maintaining other such compilers (GDC, LDC,
>> and any others that might come into existence) so that there can
>> genuinely be a shared, portable source base for the D Front-end
>> code..
>
> Maybe as a first baby step, GDC/LDC devs could identify all the
> functions that should be abstracted away into such a "target" struct.

*Takes a big breath*

I'll draw up a list when I'm travelling on the train this weekend.

--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
December 18, 2012 Re: Compilation strategy

Posted in reply to Dmitry Olshansky

On 12/18/2012 1:43 AM, Dmitry Olshansky wrote:
> Compared to doing computations on AST trees (and looking up every name in the
> symbol table?), creating fake nodes when the result is computed, etc.?

CTFE does not look up every (or any) name in the symbol table. I don't see any advantage to interpreting bytecode over interpreting ASTs. In fact, Java bytecode is just a serialized AST.
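The claim that walking trees serves as well as looping over bytecode can be illustrated with a toy tree-walking evaluator. The node shape here is invented and has nothing to do with dmd's actual CTFE engine:

```cpp
#include <memory>

// Minimal expression AST evaluated by direct recursion over the tree,
// with no intermediate bytecode form.
struct Expr {
    char op;                  // '+' or '*', or 'n' for a literal
    long value = 0;           // used when op == 'n'
    std::unique_ptr<Expr> lhs, rhs;
};

long eval(const Expr& e) {
    switch (e.op) {
    case 'n': return e.value;
    case '+': return eval(*e.lhs) + eval(*e.rhs);
    case '*': return eval(*e.lhs) * eval(*e.rhs);
    }
    return 0;  // unreachable for well-formed trees
}

// Convenience constructors for building test trees.
std::unique_ptr<Expr> lit(long v) {
    auto e = std::make_unique<Expr>();
    e->op = 'n';
    e->value = v;
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> a,
                          std::unique_ptr<Expr> b) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(a);
    e->rhs = std::move(b);
    return e;
}
```

A bytecode design would first flatten the tree into an instruction array and then loop over it; the tree walker skips that serialization step entirely, which is the parallel Walter is drawing.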
Copyright © 1999-2021 by the D Language Foundation