June 15, 2011
On Tue, 14 Jun 2011 22:24:04 -0400, Nick Sabalausky <a@a.a> wrote:

> "Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message
> news:4DF7D92A.8050606@erdani.org...
>> On 6/14/11 4:38 PM, Nick Sabalausky wrote:
>>> - Putting it in the compiler forces it all to be written in C++. As an
>>> external tool, we could use D.
>>
>> Having the compiler communicate with a download tool supplied with the
>> distribution seems to be a very promising approach that would address this
>> concern.
>>
>
> A two way "compiler <-> build tool" channel is messier than "build tool
> invoked compiler", and I don't really see much benefit.

It's neither.  It's not a build tool, it's a fetch tool.  The build tool has nothing to do with getting the modules.

The drawback here is that the build tool has to interface with said fetch tool in order to do incremental builds.

However, we could make an assumption that files that are downloaded are rather static, and therefore, the target doesn't "depend" on them.  To override this, just do a rebuild-from-scratch on the rare occasion you have to update the files.

>>> - By default, it ends up downloading an entire library one inferred
>>> source
>>> file at a time. Why? Libraries are a packaged whole. Standard behavior
> should be for libraries to be treated as such.
>>
>> Fair point, though in fact the effect is that one ends up downloading
>> exactly the used modules from that library and potentially others.
>>
>
> I really don't see a problem with that. And you'll typically end up needing
> most, if not all, anyway. It's very difficult to see this as an actual
> drawback.

When a given module is requested, it is quite likely (I would say almost certain) to be part of a larger package.  The fetch tool could know to fetch the entire package and extract it into the cache.
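
A rough sketch of that extraction step, in D, assuming the package arrives as a zip file and the cache has one directory per base URL (the function and path names are invented for illustration):

import std.algorithm : endsWith;
import std.file : mkdirRecurse, write;
import std.path : buildPath, dirName;
import std.zip : ZipArchive;

// Hypothetical helper: unpack a downloaded package archive into the cache
// directory corresponding to its base URL.
void extractPackage(ubyte[] zipData, string cacheDir)
{
    auto archive = new ZipArchive(zipData);
    foreach (name, member; archive.directory)
    {
        if (name.endsWith("/"))                 // skip directory entries
            continue;
        auto target = buildPath(cacheDir, name);
        mkdirRecurse(dirName(target));          // create dcollections/, etc.
        archive.expand(member);                 // decompress this entry
        write(target, member.expandedData);     // e.g. cacheDir/dcollections/TreeMap.d
    }
}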

>>> - Does every project that uses libX have to download it separately? If
>>> not
>>> (or really even if so), how does the compiler handle different versions
>>> of
>>> the lib and prevent "dll hell"? Versioning seems to be an afterthought in
>>> this DIP - and that's a guaranteed way to eventually find yourself in dll
>>> hell.
>>
>> Versioning is a policy matter that can, I think, be addressed within the
>> URL structure. This proposal tries to support versioning without
>> explicitly imposing it or standing in its way.
>>
>
> That's exactly my point. If you leave it open like that, everyone will come
> up with their own way to do it, many will not even give it any attention at
> all, and most of those approaches will end up being wrong WRT avoiding dll
> hell. Hence, dll hell will get in and library users will end up having to
> deal with it. The only way to avoid it is to design it out of the system up
> front, *with* explicitly imposing it.

If the proposal becomes one where the include path specifies base urls, then the build tool can specify exact versions.

The cache should be responsible for making sure files named the same from different URLs do not conflict.

for example:

-Ihttp://url.to.project/v1.2.3

in one project and

-Ihttp://url.to.project/v1.2.4

in another.
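
One simple way for the cache to keep those apart is to key the on-disk location on the base URL itself.  A sketch in D (the cache layout is invented):

import std.array : replace;
import std.path : buildPath, expandTilde;

// Map a base URL (as given via -I) to its own cache directory, so that
// identically named modules fetched from different URLs -- say v1.2.3 and
// v1.2.4 of the same project -- never overwrite each other.
string cacheDirFor(string baseUrl)
{
    // Turn the URL into a single, filesystem-safe directory name.
    auto key = baseUrl.replace("://", "_").replace("/", "_").replace(":", "_");
    return buildPath(expandTilde("~/.dget/cache"), key);   // layout is invented
}

// cacheDirFor("http://url.to.project/v1.2.3") => ~/.dget/cache/http_url.to.project_v1.2.3
// cacheDirFor("http://url.to.project/v1.2.4") => ~/.dget/cache/http_url.to.project_v1.2.4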

I still feel that specifying the url in the source is the wrong approach -- it puts too much information into the source, and any small change requires modifying source code.  We don't specify full paths for local imports, so why should we specify full paths for remote ones?

-Steve
June 15, 2011
On Tue, 14 Jun 2011 09:53:16 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> http://www.wikiservice.at/d/wiki.cgi?LanguageDevel/DIPs/DIP11
>
> Destroy.

I put this as replies in several threads, but I'll throw it out there as its own thread:

* You already agree that having the fetching done by a separate program (possibly written in D) makes the solution cleaner (i.e. you are not infiltrating the code that actually does compiling with code that does network fetching).

* I think specifying the entire url in the pragma is akin to specifying the full path of a given module on your local disk.  I think it's not the right place for it, the person who is building the code should be responsible for where the modules come from, and import should continue to specify the module relative to the include path.

* A perfect (IMO) way to configure the fetch tool is by using the same mechanism that configures dmd on how to get modules -- the include path.  For instance -Ihttp://xxx.yyy.zzz/package can be passed to the compiler or put into the dmd.conf.

* DMD already has a good mechanism to specify configuration and you would barely have to change anything internally.

Here's how it would work.  I'll walk through how it goes from the command line to the finished build (note the http path below is not a valid path, it's just an example):

dmd -Ihttp://www.dsource.org/projects/dcollections/import testproj.d

1. dmd recognizes the url pattern and stores this as an 'external' path
2. dmd reads the file testproj.d and sees that it imports dcollections.TreeMap
3. Using its non-external paths, it cannot find the module.
4. It calls:
    dget -Ihttp://www.dsource.org/projects/dcollections/import dcollections.TreeMap
5. dget checks its internal cache to see if the file dcollections/TreeMap.[d|di] already exists -- not found
6. dget uses internal logic to generate a request to download either
   a. an entire package which contains the requested import (preferred)
   b. just the specific file dcollections/TreeMap.d
7. Using the url as a key, it stores the TreeMap.d file in a cache so it doesn't have to download it again (can be stored globally or local to the user/project)
8. dget pipes the file to stdout and returns 0 for success; dmd reads the file from that pipe
9. dmd finishes compiling.

On a second run of dmd, it would go through the same process, but dget succeeds at step 5 by finding the file in the cache and pipes it to stdout.
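
To make steps 4 and 8 concrete, here's roughly what the hand-off could look like from dmd's side -- a sketch in D (dget doesn't exist, so the name and flags are just the ones assumed above):

import std.process : execute;

// Hypothetical hand-off: ask the fetch tool for a module that wasn't found
// on any local include path.  Returns the module source, or null on failure.
string fetchRemoteModule(string externalPath, string moduleName)
{
    auto dget = execute(["dget", "-I" ~ externalPath, moduleName]);
    if (dget.status != 0)
        return null;      // dget couldn't locate or download the module
    return dget.output;   // the module source, piped back via stdout
}

// e.g. fetchRemoteModule("http://www.dsource.org/projects/dcollections/import",
//                        "dcollections.TreeMap");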

Some issues with this scheme:

1. Dependency checking would be difficult for a build tool (like make) doing incremental builds.  However, traditionally one does not specify standard library files as dependencies, so downloaded files would probably fall into the same category.  I.e. if you need to rebuild, you'd have to clear the cache and do a make clean (or equivalent).  Another option is to have dget check whether the file on the server has been modified.

2. It's possible that dget fetches files one at a time, which might be very slow (on the first build).  However, one can trigger whole package downloads easily enough (for example, by making the include path entry point at a zip file or tarball).  dget should be smart enough to handle extracting packages.

I can't really think of any other issues.

-Steve
June 15, 2011
On 6/15/11 7:53 AM, Steven Schveighoffer wrote:
> On Tue, 14 Jun 2011 16:47:01 -0400, Adam D. Ruppe
> <destructionator@gmail.com> wrote:
>
>> BTW, I don't think it should be limited to just passing a
>> url to the helper program.
>>
>> I'd do it something like this:
>>
>> dget module.name url_from_pragma
>
> I still don't like the url being stored in the source file -- where
> *specifically* on the network to get the file has nothing to do with
> compiling the code, and fixing a path problem shouldn't involve editing
> a source file -- there is too much risk.

First, clearly we need command-line equivalents for the pragmas. They can be subsequently loaded from a config file. The embedded URLs are for people who want to distribute libraries without requiring their users to change their config files. I think that simplifies matters for many. Again - the ULTIMATE place where dependencies exist is in the source files.

> For comparison, you don't have to specify a full path to the compiler of
> where to get modules, they are specified relative to the configured
> include paths. I think this model works well, and we should be able to
> re-use it for this purpose also. You could even just use urls as include
> paths:
>
> -Ihttp://www.dsource.org/projects/dcollections/import

I also think that model works well, except HTTP does not offer search the same way a filesystem does. You could do that with FTP though.


Andrei
June 15, 2011
On 6/14/11 8:44 PM, Nick Sabalausky wrote:
> "Adam D. Ruppe"<destructionator@gmail.com>  wrote in message
> news:it91b0$aa0$1@digitalmars.com...
>> Nick Sabalausky wrote:
>>> Just one extra deps-gathering invocation each time a
>>> deps-gathering invocation finds unsatisfied dependencies, and *only*
>>> the first time you build.
>>
>> It could probably cache the last successful command...
>
> Nothing would need to be cached. After the initial "gather everything and
> build" build, all it would ever have to do is exactly what RDMD already does
> right now: Run DMD once to find the deps, check them to see if anything
> needs to be rebuilt, and if so, run DMD a second time to build. There'd never be
> any need for more than those two invocations (and the first one tends to be
> much faster anyway) until a new library dependency is introduced.

I think this works, but I personally find it clumsy. Particularly because when dmd fails, you don't know exactly why - it may have been a missing import, or it may have been something else. So the utility essentially needs to remember the last import attempted (which won't work once the compiler uses multiple threads) and scrape dmd's stderr output, parsing it for something that looks like a specific "module not found" error message (see http://arsdnet.net/dcode/build.d). It's quite a shaky design that relies on a bunch of stars aligning.

Andrei
June 15, 2011
On Wed, 15 Jun 2011 09:53:31 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 6/15/11 7:53 AM, Steven Schveighoffer wrote:
>> On Tue, 14 Jun 2011 16:47:01 -0400, Adam D. Ruppe
>> <destructionator@gmail.com> wrote:
>>
>>> BTW, I don't think it should be limited to just passing a
>>> url to the helper program.
>>>
>>> I'd do it something like this:
>>>
>>> dget module.name url_from_pragma
>>
>> I still don't like the url being stored in the source file -- where
>> *specifically* on the network to get the file has nothing to do with
>> compiling the code, and fixing a path problem shouldn't involve editing
>> a source file -- there is too much risk.
>
> First, clearly we need command-line equivalents for the pragmas. They can be subsequently loaded from a config file. The embedded URLs are for people who want to distribute libraries without requiring their users to change their config files. I think that simplifies matters for many. Again - the ULTIMATE place where dependencies exist is in the source files.

We have been getting along swimmingly without pragmas for adding local include paths.  Why do we need to add them using pragmas for network include paths?

Also, for someone who's making a piece of software, I don't see the major difference between adding the include path to their source file and adding it to their build script.

But in any case, it doesn't matter if both options are available -- it doesn't hurt to have a pragma option as long as a config option is available.  I just don't want to *require* the pragma solution.

>
>> For comparison, you don't have to specify a full path to the compiler of
>> where to get modules, they are specified relative to the configured
>> include paths. I think this model works well, and we should be able to
>> re-use it for this purpose also. You could even just use urls as include
>> paths:
>>
>> -Ihttp://www.dsource.org/projects/dcollections/import
>
> I also think that model works well, except HTTP does not offer search the same way a filesystem does. You could do that with FTP though.

dget would just add the appropriate path:

import dcollections.TreeMap =>
get http://www.dsource.org/projects/dcollections/import/dcollections/TreeMap.d
hm.. doesn't work
get http://www.dsource.org/projects/dcollections/import/dcollections/TreeMap.di
ok, there it is!
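
In code, that lookup amounts to very little.  A sketch (the helper name is invented):

import std.array : replace;

// Turn an import name plus a base include URL into the candidate URLs to
// try, in order -- first .d, then .di, exactly as traced above.
string[] candidateUrls(string baseUrl, string moduleName)
{
    auto path = moduleName.replace(".", "/");   // dcollections.TreeMap -> dcollections/TreeMap
    return [baseUrl ~ "/" ~ path ~ ".d",
            baseUrl ~ "/" ~ path ~ ".di"];
}

// candidateUrls("http://www.dsource.org/projects/dcollections/import",
//               "dcollections.TreeMap") yields exactly the two URLs tried above.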

As I said in another post, you could also specify a zip file or tarball as a base path, and the whole package would be downloaded instead.  We may need some sort of manifest in order to verify that the import will be found, rather than downloading the entire package only to find out.

-Steve
June 15, 2011
On 6/15/11 8:33 AM, Steven Schveighoffer wrote:
> On Tue, 14 Jun 2011 09:53:16 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> http://www.wikiservice.at/d/wiki.cgi?LanguageDevel/DIPs/DIP11
>>
>> Destroy.
>
> I put this as replies in several threads, but I'll throw it out there as
> its own thread:
>
> * You already agree that having the fetching done by a separate program
> (possibly written in d) makes the solution cleaner (i.e. you are not
> infiltrating the code that actually does compiling with code that does
> network fetching).

I agree.

> * I think specifying the entire url in the pragma is akin to specifying
> the full path of a given module on your local disk. I think it's not the
> right place for it, the person who is building the code should be
> responsible for where the modules come from, and import should continue
> to specify the module relative to the include path.

I understand. It hasn't been rare that I would have preferred to specify an -I equivalent through a pragma in my D programs. Without that, all of a sudden I needed a more elaborate dmd/rdmd line, and then I thought, heck, I need a script or makefile or a dmd.conf to build this simple script... I don't think one is good and the other is bad. Both have their uses.

BTW, Perl and Python (and probably others) have a way to specify paths for imports.

http://www.perlhowto.com/extending_the_library_path
http://stackoverflow.com/questions/279237/python-import-a-module-from-a-folder

> * A perfect (IMO) way to configure the fetch tool is by using the same
> mechanism that configures dmd on how to get modules -- the include path.
> For instance -Ihttp://xxx.yyy.zzz/package can be passed to the compiler
> or put into the dmd.conf.

HTTP is not a filesystem so the mechanism must be different. I added a section "Command-line equivalent": http://www.wikiservice.at/d/wiki.cgi?LanguageDevel/DIPs/DIP11#section10

My concern about using cmdline/conf exclusively remains. There must be a way to specify dependencies where they belong - with the source. That is _literally_ where they belong!

One additional problem is one remote library that depends on another. You end up needing to add K URLs where K is the number of dependent libraries. The process of doing so will be mightily annoying - repeated failure to compile and RTFMs.

> * DMD already has a good mechanism to specify configuration and you
> would barely have to change anything internally.
>
> Here's how it would work. I'll specify how it goes from command line to
> final (note the http path is not a valid path, it's just an example):
>
> dmd -Ihttp://www.dsource.org/projects/dcollections/import testproj.d
>
> 1. dmd recognizes the url pattern and stores this as an 'external' path
> 2. dmd reads the file testproj.d and sees that it imports
> dcollections.TreeMap
> 3. Using it's non-external paths, it cannot find the module.
> 4. It calls:
> dget -Ihttp://www.dsource.org/projects/dcollections/import
> dcollections.TreeMap
> 5. dget checks its internal cache to see if the file
> dcollections/TreeMap.[d|di] already exists -- not found
> 6. dget uses internal logic to generate a request to download either
> a. an entire package which contains the requested import (preferred)
> b. just the specific file dcollections/TreeMap.d
> 7. Using the url as a key, it stores the TreeMap.d file in a cache so it
> doesn't have to download it again (can be stored globally or local to
> the user/project)
> 8. Pipes the file to stdout, dmd reads the file, and returns 0 for success
> 9. dmd finishes compiling.

Not so fast. What if dcollections depends on stevesutils, to be found on http://www.stevesu.ti/ls and larspath, to be found on http://la.rs/path? The thing will fail to compile, and there will be no informative message on what to do next.


Andrei
June 15, 2011
On 6/15/11 9:13 AM, Steven Schveighoffer wrote:
> On Wed, 15 Jun 2011 09:53:31 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 6/15/11 7:53 AM, Steven Schveighoffer wrote:
>>> On Tue, 14 Jun 2011 16:47:01 -0400, Adam D. Ruppe
>>> <destructionator@gmail.com> wrote:
>>>
>>>> BTW, I don't think it should be limited to just passing a
>>>> url to the helper program.
>>>>
>>>> I'd do it something like this:
>>>>
>>>> dget module.name url_from_pragma
>>>
>>> I still don't like the url being stored in the source file -- where
>>> *specifically* on the network to get the file has nothing to do with
>>> compiling the code, and fixing a path problem shouldn't involve editing
>>> a source file -- there is too much risk.
>>
>> First, clearly we need command-line equivalents for the pragmas. They
>> can be subsequently loaded from a config file. The embedded URLs are
>> for people who want to distribute libraries without requiring their
>> users to change their config files. I think that simplifies matters
>> for many. Again - the ULTIMATE place where dependencies exist is in
>> the source files.
>
> We have been getting along swimmingly without pragmas for adding local
> include paths. Why do we need to add them using pragmas for network
> include paths?

That doesn't mean the situation is beyond improvement. If I had my way I'd add pragma(liburl) AND pragma(libpath).

> Also, I don't see the major difference in someone who's making a piece
> of software from adding the include path to their source file vs. adding
> it to their build script.

Because in the former case the whole need for a build script may be obviated. That's where I'm trying to be.

> But in any case, it doesn't matter if both options are available -- it
> doesn't hurt to have a pragma option as long as a config option is
> available. I just don't want to *require* the pragma solution.

Sounds good. I actually had the same notion, just forgot to mention it in the dip (fixed).

>>> For comparison, you don't have to specify a full path to the compiler of
>>> where to get modules, they are specified relative to the configured
>>> include paths. I think this model works well, and we should be able to
>>> re-use it for this purpose also. You could even just use urls as include
>>> paths:
>>>
>>> -Ihttp://www.dsource.org/projects/dcollections/import
>>
>> I also think that model works well, except HTTP does not offer search
>> the same way a filesystem does. You could do that with FTP though.
>
> dget would just add the appropriate path:
>
> import dcollections.TreeMap =>
> get
> http://www.dsource.org/projects/dcollections/import/dcollections/TreeMap.d
> hm.. doesn't work
> get
> http://www.dsource.org/projects/dcollections/import/dcollections/TreeMap.di
> ok, there it is!

This assumes the URL contains the package prefix. That would work, but imposes too much on the URL structure. I find the notation -Upackage=url more general.
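
To illustrate, under one possible reading of -Upackage=url the resolver would be no more than this sketch (nothing of it exists yet, and it assumes a dotted module name):

import std.algorithm : findSplit;
import std.array : replace;

// One possible reading of -Upackage=url: the URL points at the directory
// holding that package's modules, wherever it happens to live on the server.
string resolveUrl(string[string] packageUrls, string moduleName)
{
    auto parts = moduleName.findSplit(".");   // "dcollections" | "." | "TreeMap"
    if (auto base = parts[0] in packageUrls)
        return *base ~ "/" ~ parts[2].replace(".", "/") ~ ".d";
    return null;                              // no -U mapping for this package
}

// With -Udcollections=http://url.to.project/v1.2.3, importing dcollections.TreeMap
// would fetch http://url.to.project/v1.2.3/TreeMap.d -- the URL structure no
// longer has to mirror the package name.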

> As I said in another post, you could also specify a zip file or tarball
> as a base path, and the whole package is downloaded instead. We may need
> some sort of manifest instead in order to verify the import will be
> found instead of downloading the entire package to find out.

Sounds cool.


Andrei
June 15, 2011
On 6/15/11 8:33 AM, Steven Schveighoffer wrote:
> I can't really think of any other issues.

Allow me to repeat: the scheme as you describe it is unable to figure out and load the remote libraries that a remote library itself depends on. It's essentially a flat scheme in which you know only the top remote library but nothing about the rest.

The dip takes care of that by using transitivity and by relying on the presence of dependency information exactly where it belongs - in the dependent source files. Separating that information from source files has two liabilities. First, it breaks the whole transitivity thing. Second, it adds yet another itsy-bitsy pellet of metadata/config/whatevs files that need to be minded. I just don't see the advantage of imposing that.


Andrei

June 15, 2011
On 15.06.2011 17:33, Steven Schveighoffer wrote:
> On Tue, 14 Jun 2011 09:53:16 -0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
>
>> http://www.wikiservice.at/d/wiki.cgi?LanguageDevel/DIPs/DIP11
>>
>> Destroy.
>
> I put this as replies in several threads, but I'll throw it out there as its own thread:
>
> * You already agree that having the fetching done by a separate program (possibly written in d) makes the solution cleaner (i.e. you are not infiltrating the code that actually does compiling with code that does network fetching).
>
> * I think specifying the entire url in the pragma is akin to specifying the full path of a given module on your local disk.  I think it's not the right place for it, the person who is building the code should be responsible for where the modules come from, and import should continue to specify the module relative to the include path.
>
> * A perfect (IMO) way to configure the fetch tool is by using the same mechanism that configures dmd on how to get modules -- the include path.  For instance -Ihttp://xxx.yyy.zzz/package can be passed to the compiler or put into the dmd.conf.
>
> * DMD already has a good mechanism to specify configuration and you would barely have to change anything internally.
>
> Here's how it would work.  I'll specify how it goes from command line to final (note the http path is not a valid path, it's just an example):
>
> dmd -Ihttp://www.dsource.org/projects/dcollections/import testproj.d

Now it's abundantly clear that dmd should have rdmd's 'make' functionality built-in. Otherwise you'd have to specify TreeMap.d (or the library) on the command line.

>
> 1. dmd recognizes the url pattern and stores this as an 'external' path
> 2. dmd reads the file testproj.d and sees that it imports dcollections.TreeMap
> 3. Using it's non-external paths, it cannot find the module.
> 4. It calls:
>     dget -Ihttp://www.dsource.org/projects/dcollections/import dcollections.TreeMap
> 5. dget checks its internal cache to see if the file dcollections/TreeMap.[d|di] already exists -- not found
> 6. dget uses internal logic to generate a request to download either
>    a. an entire package which contains the requested import (preferred)
>    b. just the specific file dcollections/TreeMap.d
> 7. Using the url as a key, it stores the TreeMap.d file in a cache so it doesn't have to download it again (can be stored globally or local to the user/project)
> 8. Pipes the file to stdout, dmd reads the file, and returns 0 for success
> 9. dmd finishes compiling.
>
> On a second run to dmd, it would go through the same process, but dget succeeds on step 5 of finding it in the cache and pipes it to stdout.
>
> Some issues with this scheme:
>
> 1. dependency checking would be difficult for a build tool (like make) for doing incremental builds.  However, traditionally one does not specify standard library files as dependencies, so downloaded files would probably be under this same category.  I.e. if you need to rebuild, you'd have to clear the cache and do a make clean (or equivalent).  Another option is to have dget check to see if the file on the server has been modified.
>
> 2. It's possible that dget fetches files one at a time, which might be very slow (on the first build).  However, one can trigger whole package downloads easily enough (for example, by making the include path entry point at a zip file or tarball).  dget should be smart enough to handle extracting packages.
>
> I can't really think of any other issues.
>
> -Steve

dmd should be able to run multiple instances of dget without any conflicts (also parallel builds etc.).
Other than that it looks quite good to me.

P.S. It seems like dget is, in fact, dcache :)

-- 
Dmitry Olshansky

June 15, 2011
On 15/06/2011 15:33, Andrei Alexandrescu wrote:
> On 6/15/11 9:13 AM, Steven Schveighoffer wrote:
>> We have been getting along swimmingly without pragmas for adding local
>> include paths. Why do we need to add them using pragmas for network
>> include paths?
>
> That doesn't mean the situation is beyond improvement. If I had my way
> I'd add pragma(liburl) AND pragma(libpath).

pragma(lib) doesn't (and can't) work as it is, so why do you want to add more useless pragmas? Command line arguments are the correct way to go here. Not to mention that paths most likely won't be standardized across machines, so the latter would be useless.

>> Also, I don't see the major difference in someone who's making a piece
>> of software from adding the include path to their source file vs. adding
>> it to their build script.
>
> Because in the former case the whole need for a build script may be
> obviated. That's where I'm trying to be.

This can't happen in a lot of cases, e.g. if you're interfacing with a scripting language, you need certain files automatically generated during the build, etc. Admittedly, for the most part, you'll just want to be able to build libraries given a directory, or an executable given a file with _Dmain() in it. There'll still be a lot of cases where you want to specify some things to be dynamic libs, others static libs, and what, if any, of it you want in the resulting binary.

>> But in any case, it doesn't matter if both options are available -- it
>> doesn't hurt to have a pragma option as long as a config option is
>> available. I just don't want to *require* the pragma solution.
>
> Sounds good. I actually had the same notion, just forgot to mention it
> in the dip (fixed).

I'd agree with Steven that we need command line arguments for it, but I completely disagree about pragmas, given that they don't work (as mentioned above). Just because I know you're going to ask:

# a.d has a pragma(lib) in it
$ dmd -c a.d
$ dmd -c b.d
$ dmd a.o b.o
<Linker errors>

This is unavoidable unless you put metadata in the object files, and even then you leave clutter in the resulting binary, unless you specify that the linker should remove it (I don't know if it can).
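
For reference, the pragma in question is just a source-level directive; a small illustrative example (the library name is made up):

// a.d -- the library name is purely illustrative
pragma(lib, "curl");          // request to link against libcurl
void download() { /* uses the library */ }

// As described above: "dmd -c a.d" compiles this fine, but the later link
// step "dmd a.o b.o" never sees the pragma -- hence the linker errors.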

>> dget would just add the appropriate path:
>>
>> import dcollections.TreeMap =>
>> get
>> http://www.dsource.org/projects/dcollections/import/dcollections/TreeMap.d
>>
>> hm.. doesn't work
>> get
>> http://www.dsource.org/projects/dcollections/import/dcollections/TreeMap.di
>>
>> ok, there it is!
>
> This assumes the URL contains the package prefix. That would work, but
> imposes too much on the URL structure. I find the notation -Upackage=url
> more general.

I personally think there should be a central repository listing packages, their URLs, etc., which massively simplifies what needs to be passed on the command line. E.g. -RmyPackage would cause myPackage to be looked up on the central server, which would have the relevant URL.

Of course, there should be some sort of override method for private remote servers.
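
A minimal sketch of what the client side of such a lookup could be, assuming a hypothetical registry that answers with a package's base URL (none of this exists, and the registry address is made up):

import std.net.curl : get;      // assumes the libcurl bindings are available
import std.string : strip;

// Hypothetical client side of -RmyPackage: ask a central registry for the
// base URL of a package; a private server could be swapped in via a switch.
string lookupPackage(string name,
                     string registry = "http://registry.example.org/lookup?pkg=")
{
    auto reply = get(registry ~ name);   // registry answers with the base URL
    return reply.idup.strip();           // e.g. "http://url.to.project/v1.2.4"
}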

>> As I said in another post, you could also specify a zip file or tarball
>> as a base path, and the whole package is downloaded instead. We may need
>> some sort of manifest instead in order to verify the import will be
>> found instead of downloading the entire package to find out.
>
> Sounds cool.

I don't believe this tool should exist without compression being the default.

-- 
Robert
http://octarineparrot.com/