January 18, 2005
D and clusters
Has anyone tried running a D program over a cluster of any kind?  I'm wondering
how it would be handled with the GC statically built into the executable.  I'm
guessing it would be fine.  I've just been reading articles lately about
bioinformatics and how a lot of lower level libraries are built with C and then
used by Python or Java and often distributed over a cluster.  D might be a good
fit in this arena; as a library workhorse or as a pipeline driver front-end.

Kramer
January 19, 2005
Re: D and clusters
In article <csjsuu$2db2$1@digitaldaemon.com>, Kramer says...
>
>Has anyone tried running a D program over a cluster of any kind?  I'm wondering
>how it would be handled with the GC statically built into the executable.  I'm
>guessing it would be fine.  I've just been reading articles lately about
>bioinformatics and how a lot of lower level libraries are built with C and then
>used by Python or Java and often distributed over a cluster.  D might be a good
>fit in this arena; as a library workhorse or as a pipeline driver front-end.



Mango (dsource.org) has a reasonably extensive clustering package: it
distributes queue & cache content as D classes, supports optimized
cache-coherency, and can squirt behaviour around the network as mobile-tasks.
It's also rather easy to use.

However, the GC really needs to support DLLs properly to make the latter operate
in a robust, truly dynamic manner. 

That is, if the mobile-code functionality you need can be defined statically
(per cluster node) then it will currently operate just fine within a cluster. If
you need dynamic Java-style loading of classes via a DLL distribution mechanism,
then you may run into the MM problems that plague D & DLLs -- each DLL will end
up with its own GC, which can wreak havoc if you're not rather careful.

Mango.cluster is designed to handle both scenarios, but you currently have to be
aware of the multiple GC issues within the dynamic scenario. There are a number
of past topics where people are lamenting the lack of useful GC support
vis-a-vis DLLs. Any dynamic system will run into these problems: imagine having
to reconfigure & reboot an entire site just to install a new servlet ...

Walter prefers everything to be statically linked, and previously indicated that
he does not like DLLs at all (due to potential versioning problems). This is
likely the reason why the multiple GC problem apparently has rather minimal
priority.

Needless to say, many of us believe D would benefit greatly from some attention
in this arena.

Two things need to happen:

1) the GC has to be isolated into a DLL itself (so there's only one instance)

2) As I recall, the static-data extents of each DLL have to be registered with
the GC (in the same manner as executables). This could theoretically be done
manually, but should be done by the compiler instead.
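To make point 2 concrete, here is a minimal sketch in C of what "registering static-data extents" with a single collector could look like. Every name in it (gc_register_static_range, gc_is_reachable_from_statics, the exe_statics and dll_statics arrays) is hypothetical and is not the Phobos or Ares API; the "collector" only performs a conservative scan of the registered ranges. It is meant to show why an unregistered DLL is a problem: an object referenced only from that DLL's static data looks unreachable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; an illustration, not a real GC interface. */
#define MAX_RANGES 16

typedef struct {
    void *const *lo;   /* first word of a module's static data */
    void *const *hi;   /* one past the last word */
} RootRange;

static RootRange g_ranges[MAX_RANGES];
static size_t g_range_count = 0;

/* Each module (exe or DLL) would call this once with its static-data extent. */
int gc_register_static_range(void *const *lo, void *const *hi)
{
    if (g_range_count == MAX_RANGES || lo > hi)
        return -1;
    g_ranges[g_range_count].lo = lo;
    g_ranges[g_range_count].hi = hi;
    g_range_count++;
    return 0;
}

/* Conservative scan: is 'target' stored in any registered static range? */
int gc_is_reachable_from_statics(const void *target)
{
    for (size_t i = 0; i < g_range_count; i++)
        for (void *const *p = g_ranges[i].lo; p < g_ranges[i].hi; p++)
            if (*p == target)
                return 1;
    return 0;
}

/* Stand-ins for the static data segments of an exe and of a DLL. */
static void *exe_statics[4];
static void *dll_statics[4];

/* Returns 1 if the object is invisible before the DLL registers and visible after. */
int demo_unregistered_dll_loses_object(void)
{
    static int object;            /* stands in for a GC-allocated object */
    dll_statics[2] = &object;     /* only the DLL's static data refers to it */

    int rc = gc_register_static_range(exe_statics, exe_statics + 4);
    assert(rc == 0);
    int before = gc_is_reachable_from_statics(&object);

    rc = gc_register_static_range(dll_statics, dll_statics + 4);
    assert(rc == 0);
    int after = gc_is_reachable_from_statics(&object);

    return before == 0 && after == 1;
}
```

Calling demo_unregistered_dll_loses_object() walks through both states. In a real toolchain the compiler or the DLL's startup code would supply the range boundaries, which is the part that needs compiler support.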

Sean has been working on #1 (as part of the 'Ares' project), while #2 really
needs support from Walter himself. I can only suggest that more people encourage
Walter to assist. Failing that, Sean's work could get picked up by GDC and a
language fork could occur.

- Kris
January 20, 2005
Re: D and clusters
In article <csmcjh$5ro$1@digitaldaemon.com>, Kris says...
>

[snip]

>Mango.cluster is designed to handle both scenarios, but you currently have to be
>aware of the multiple GC issues within the dynamic scenario. There are a number
>of past topics where people are lamenting the lack of useful GC support
>vis-a-vis DLLs. Any dynamic system will run into these problems: imagine having
>to reconfigure & reboot an entire site just to install a new servlet ...
>
>Walter prefers everything to be statically linked, and previously indicated that
>he does not like DLLs at all (due to potential versioning problems). This is
>likely the reason why the multiple GC problem apparently has rather minimal
>priority.
>
>Needless to say, many of us believe D would benefit greatly from some attention
>in this arena.
>
>Two things need to happen:
>
>1) the GC has to be isolated into a DLL itself (so there's only one instance)
>
>2) As I recall, the static-data extents of each DLL have to be registered with
>the GC (in the same manner as executables). This could theoretically be done
>manually, but should be done by the compiler instead.
>
>Sean has been working on #1 (as part of the 'Ares' project), while #2 really
>needs support from Walter himself. I can only suggest that more people encourage
>Walter to assist. Failing that, Sean's work could get picked up by GDC and a
>language fork could occur.
>
>- Kris
>

There definitely were some posts mentioning DLLs and the GC as important issues
when the MIID thread was active.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/10456
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/9166
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/10555

-Kramer
January 21, 2005
Re: D and clusters
"Kris" <Kris_member@pathlink.com> wrote in message news:csmcjh$5ro$1@digitaldaemon.com...
> In article <csjsuu$2db2$1@digitaldaemon.com>, Kramer says...
>>
>>Has anyone tried running a D program over a cluster of any kind?  I'm wondering
>>how it would be handled with the GC statically built into the executable.  I'm
>>guessing it would be fine.  I've just been reading articles lately about
>>bioinformatics and how a lot of lower level libraries are built with C and then
>>used by Python or Java and often distributed over a cluster.  D might be a good
>>fit in this arena; as a library workhorse or as a pipeline driver front-end.
>
>
>
> Mango (dsource.org) has a reasonably extensive clustering package: it
> distributes queue & cache content as D classes, supports optimized
> cache-coherency, and can squirt behaviour around the network as mobile-tasks.
> It's also rather easy to use.
>
> However, the GC really needs to support DLLs properly to make the latter operate
> in a robust, truly dynamic manner.
>
> That is, if the mobile-code functionality you need can be defined statically
> (per cluster node) then it will currently operate just fine within a cluster. If
> you need dynamic Java-style loading of classes via a DLL distribution mechanism,
> then you may run into the MM problems that plague D & DLLs -- each DLL will end
> up with its own GC, which can wreak havoc if you're not rather careful.
>
> Mango.cluster is designed to handle both scenarios, but you currently have to be
> aware of the multiple GC issues within the dynamic scenario. There are a number
> of past topics where people are lamenting the lack of useful GC support
> vis-a-vis DLLs. Any dynamic system will run into these problems: imagine having
> to reconfigure & reboot an entire site just to install a new servlet ...
>
> Walter prefers everything to be statically linked, and previously indicated that
> he does not like DLLs at all (due to potential versioning problems). This is
> likely the reason why the multiple GC problem apparently has rather minimal
> priority.
>
> Needless to say, many of us believe D would benefit greatly from some attention
> in this arena.
>
> Two things need to happen:
>
> 1) the GC has to be isolated into a DLL itself (so there's only one instance)
>
> 2) As I recall, the static-data extents of each DLL have to be registered with
> the GC (in the same manner as executables). This could theoretically be done
> manually, but should be done by the compiler instead.
>
> Sean has been working on #1 (as part of the 'Ares' project), while #2 really
> needs support from Walter himself. I can only suggest that more people encourage
> Walter to assist. Failing that, Sean's work could get picked up by GDC and a
> language fork could occur.

I'm not convinced that the GC *has* to be in a DLL/.so, but I completely agree that this issue needs to be sorted.

To be frank, I'm surprised it's not received any input in the months I've been away. I'll certainly lend a voice
and, sometime later next month, technical input to the cause.

But I would say now that I believe that D should support the following scenarios, all correctly functional:

   1. Compilation of an exe, statically linked, without any non-system runtime dynamic library dependencies
   2. Compilation of an exe, dynamically linked to use "The D DLL" (let's call it DGC.DLL)
   3. Compilation of an exe, statically linked, that can load a statically linked D DLL
   4. Compilation of an exe, statically linked, that can load a dynamically linked D DLL (i.e. the DLL uses DGC.DLL)
   5. Compilation of an exe, dynamically linked to DGC.DLL, that can load a statically linked D DLL
   6. Compilation of an exe, dynamically linked to DGC.DLL, that can load a dynamically linked D DLL (i.e. the DLL uses DGC.DLL)

If it fails to do any of these, it's stillborn, IMO, since it will fail to be better than C++ and/or Java/.NET in their
respective areas of weakness.

AFAIK the current state of play is that only 1 is supported, and possibly 3.

If we say that 1, 3, 4 & 5 are not needed, then it's easy to do, but D becomes another VM/DLL-hell white elephant joke
like Java and .NET, suitable only for large-scale, highly proactively managed projects whose installations have to be
nursed by experts.
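As a rough illustration of why some of the six scenarios are harder than others, the following C sketch counts how many collectors end up in one process under the status quo described earlier in the thread. It assumes that every statically linked D module carries its own private GC and that all dynamically linked modules share the single copy in DGC.DLL. The type and function names are made up for the example.

```c
#include <assert.h>

/* Hypothetical model of the six scenarios; not taken from any D toolchain. */
typedef enum { LINK_NONE, LINK_STATIC, LINK_DYNAMIC } Linkage;

typedef struct {
    Linkage exe;   /* how the exe gets its GC */
    Linkage dll;   /* how the loaded D DLL gets its GC (LINK_NONE: no DLL) */
} Scenario;

/* Index 0 is scenario 1, index 5 is scenario 6. */
static const Scenario scenarios[6] = {
    { LINK_STATIC,  LINK_NONE    },
    { LINK_DYNAMIC, LINK_NONE    },
    { LINK_STATIC,  LINK_STATIC  },
    { LINK_STATIC,  LINK_DYNAMIC },
    { LINK_DYNAMIC, LINK_STATIC  },
    { LINK_DYNAMIC, LINK_DYNAMIC },
};

/* Assumption: each static module embeds a private GC; dynamic ones share one. */
int collectors_in_process(Scenario s)
{
    int private_copies = (s.exe == LINK_STATIC) + (s.dll == LINK_STATIC);
    int shared_copy = (s.exe == LINK_DYNAMIC) || (s.dll == LINK_DYNAMIC);
    return private_copies + shared_copy;
}

/* Takes a scenario number from 1 to 6; returns 1 if more than one GC exists. */
int scenario_needs_coordination(int number)
{
    assert(number >= 1 && number <= 6);
    return collectors_in_process(scenarios[number - 1]) > 1;
}
```

Under that assumption, scenarios 1, 2 and 6 have a single collector and need nothing special, while 3, 4 and 5 each put two collectors in one process. Those three are the ones that need the modules to coordinate.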

I think I proposed many moons ago that the GC objects inside the exe *and* inside any DLLs must, at the epoch of their
initialisation, work out who is in first, and defer to that. Naturally, there are some complications, since one might
load two D DLLs from a non-D program. In such a case, were the second D DLL to defer all its GC to the first, and the
first to be unloaded, the second one might snuff it in an unseemly fashion.

Methinks that the better way would be to associate the GC with the _process_, rather than the _module_, so that each
D-GC-using module either creates the GC, or attaches to it if it already exists. Naturally, the single per-process GC
would have to operate some kind of reference counting.

The other complication, of course, is where the GC code resides. If it's not in the process, but rather in a DLL, the
second and subsequent D DLLs would themselves have to take module references (a la LoadLibrary(GetModuleFileName()))
on the first, so as to ensure that its code stays in the process. I'm not sure of all the subtleties involved here off
the top of my head, but at worst it might mean that the first D DLL would remain in memory for the lifetime of the
process.
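The create-or-attach idea with reference counting can be sketched in portable C as below. This is only a model under stated assumptions: ProcessGC, gc_attach, gc_detach and the other names are invented for the example, the per-process slot is an ordinary static variable (in a real process the modules would need some way to find a shared slot, which is exactly the unresolved part), and nothing here is thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-process collector handle; a model, not a real GC. */
typedef struct {
    int refs;   /* number of modules currently attached */
} ProcessGC;

/* Assumption: one slot per process that every module can reach. */
static ProcessGC *g_gc = NULL;

/* Called from each module's initialisation: create the GC or attach to it. */
ProcessGC *gc_attach(void)
{
    if (g_gc == NULL) {
        g_gc = calloc(1, sizeof *g_gc);
        if (g_gc == NULL)
            return NULL;          /* out of memory: no collector available */
    }
    g_gc->refs++;
    return g_gc;
}

/* Called when a module unloads: the last one out tears the GC down. */
void gc_detach(void)
{
    if (g_gc == NULL)
        return;
    assert(g_gc->refs > 0);
    if (--g_gc->refs == 0) {
        free(g_gc);
        g_gc = NULL;
    }
}

int gc_is_alive(void)  { return g_gc != NULL; }
int gc_ref_count(void) { return g_gc ? g_gc->refs : 0; }
```

With two D DLLs loaded from a non-D program, both calls to gc_attach() return the same handle. When the first DLL calls gc_detach(), the collector survives because the second still holds a reference, which is the failure case the "defer to whoever was first" scheme cannot handle.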