April 07, 2006
Walter Bright wrote:
> kris wrote:
>> Yes, that's correct. But typeinfo is a rather rudimentary part of the language support. Wouldn't you agree? If I, for example, declare an array of 10 bytes (static byte[10]) then I'm bound over to import std.string ~ simply because TypeInfo_StaticArray wants to use std.string.toString(int), rather than the C library version of itoa() or a "low-level support" version instead.
> 
> It has nothing to do with having a static byte[10] declaration. For the program:
> 
> void main()
> {
>     static byte[10] b;
> }
> 
> The only things referenced by the object file are _main, __acrtused_con, and __Dmain. You can verify this by running obj2asm on the output, which gives:
> 
> -------------------------------------
> _TEXT   segment dword use32 public 'CODE'       ;size is 0
> _TEXT   ends
> _DATA   segment para use32 public 'DATA'        ;size is 0
> _DATA   ends
> CONST   segment para use32 public 'CONST'       ;size is 0
> CONST   ends
> _BSS    segment para use32 public 'BSS' ;size is 10
> _BSS    ends
> FLAT    group
> includelib phobos.lib
>         extrn   _main
>         extrn   __acrtused_con
>         extrn   __Dmain
> __Dmain COMDAT flags=x0 attr=x0 align=x0
> 
> _TEXT   segment
>         assume  CS:_TEXT
> _TEXT   ends
> _DATA   segment
> _DATA   ends
> CONST   segment
> CONST   ends
> _BSS    segment
> _BSS    ends
> __Dmain comdat
>         assume  CS:__Dmain
>                 xor     EAX,EAX
>                 ret
> __Dmain ends
>         end
> ----------------------------------

As expected, building this against Ares produces the exact same output.

> Examining the .map file produced shows that only these functions are pulled in from std.string:
> 
> 0002:00002364       _D3std6string7iswhiteFwZi  00404364
> 0002:000023A4       _D3std6string3cmpFAaAaZi   004043A4
> 0002:000023E8       _D3std6string4findFAawZi   004043E8
> 0002:00002450       _D3std6string8toStringFkZAa 00404450
> 0002:000024CC       _D3std6string9inPatternFwAaZi 004044CC
> 0002:00002520       _D3std6string6columnFAaiZk 00404520
> 
> I do not know offhand why a couple of those are pulled in, but I suggest that obj2asm and the generated .map files are invaluable at determining what pulls in what. Sometimes the results are surprising.

Do I have to do anything special to get this data in the .map file? Mine contains no function references at all.  Here's the first few lines (where it seems the function data should be):

 Start         Length     Name                   Class
 0002:00000000 0000E1B8H  _TEXT                  CODE 32-bit
 0002:0000E1B8 00000162H  ICODE                  ICODE 32-bit
 0003:00000000 00000004H  .CRT$XIA               DATA 32-bit

> It's just not a big deal. Try the following:
> 
> extern (C) int printf(char* f, ...) { return 0; }
> 
> void main()
> {
>     static byte[10] b;
> }
> 
> and compare the difference in exe file sizes, with and without the printf stub.

Compiled against Ares with "-release" specified, the EXE is 82,972 bytes without the stub and 82,972 bytes with the stub.  Compiled against Phobos, it's 87,068 bytes without the stub and 86,556 with the stub.  So you're right, it's not a big difference at all.  And neither is the ~5K executable size difference--I think the gap has actually closed over time, as I remember it being wider.  The zero byte difference for Ares is a bit confusing though.  I'll take a look at the binaries on my way home and see if I can suss out the differences.


Sean
April 07, 2006
On Thu, 06 Apr 2006 17:06:15 -0700, Sean Kelly wrote:

> Walter Bright wrote:
> Do I have to do anything special to get this data in the .map file?
> Mine contains no function references at all.  Here's the first few lines
> (where it seems the function data should be):
> 
>   Start         Length     Name                   Class
>   0002:00000000 0000E1B8H  _TEXT                  CODE 32-bit
>   0002:0000E1B8 00000162H  ICODE                  ICODE 32-bit
>   0003:00000000 00000004H  .CRT$XIA               DATA 32-bit

dmd yourprog.d -L/map


-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
7/04/2006 10:25:31 AM
April 07, 2006
Sean Kelly wrote:
> Do I have to do anything special to get this data in the .map file? Mine contains no function references at all.

Add the switch -L/map to the dmd command line.


> Compiled against Ares with "-release" specified, the EXE is 82,972 bytes without the stub and 82,972 bytes with the stub.  Compiled against Phobos, it's 87,068 bytes without the stub and 86,556 with the stub.  So you're right, it's not a big difference at all.  And neither is the ~5K executable size difference--I think the gap has actually closed over time, as I remember it being wider.  The zero byte difference for Ares is a bit confusing though.  I'll take a look at the binaries on my way home and see if I can suss out the differences.

Segment sizes get rounded up, I think to the page size.
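
As a rough illustration of the rounding effect: if the linker pads each segment to a fixed alignment, a change smaller than the remaining slack leaves the padded size untouched. This is only a sketch. The 4096-byte alignment and the segment sizes are assumptions chosen for illustration (the real values are not given in this thread), roundUp is a hypothetical helper, and the code targets a current D compiler rather than the 2006 toolchain.

```d
import std.stdio : writeln;

/// Round size up to the next multiple of alignment (alignment must be > 0).
size_t roundUp(size_t size, size_t alignment)
{
    return (size + alignment - 1) / alignment * alignment;
}

void main()
{
    enum size_t alignment = 4096;    // assumed page size, for illustration only
    size_t without = 57_000;         // hypothetical segment size without the stub
    size_t withStub = without + 16;  // hypothetical: the change is a few bytes

    // Both values fall into the same 4096-byte bucket, so the padded sizes match.
    writeln(roundUp(without, alignment));
    writeln(roundUp(withStub, alignment));
}
```

Both calls print 57344, so a 16-byte difference disappears once the size is padded.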
April 07, 2006
Long post; sorry about that.


Walter Bright wrote:
> kris wrote:
> 
>> Yes, that's correct. But typeinfo is a rather rudimentary part of the language support. Wouldn't you agree? If I, for example, declare an array of 10 bytes (static byte[10]) then I'm bound over to import std.string ~ simply because TypeInfo_StaticArray wants to use std.string.toString(int), rather than the C library version of itoa() or a "low-level support" version instead.
> 
> 
> It has nothing to do with having a static byte[10] declaration. For the program:
> 
> void main()
> {
>     static byte[10] b;
> }
> 
> The only things referenced by the object file are _main, __acrtused_con, and __Dmain. You can verify this by running obj2asm on the output, which gives:
> 
> -------------------------------------
> _TEXT   segment dword use32 public 'CODE'       ;size is 0
> _TEXT   ends
> _DATA   segment para use32 public 'DATA'        ;size is 0
> _DATA   ends
> CONST   segment para use32 public 'CONST'       ;size is 0
> CONST   ends
> _BSS    segment para use32 public 'BSS' ;size is 10
> _BSS    ends
> FLAT    group
> includelib phobos.lib
>         extrn   _main
>         extrn   __acrtused_con
>         extrn   __Dmain
> __Dmain COMDAT flags=x0 attr=x0 align=x0
> 
> _TEXT   segment
>         assume  CS:_TEXT
> _TEXT   ends
> _DATA   segment
> _DATA   ends
> CONST   segment
> CONST   ends
> _BSS    segment
> _BSS    ends
> __Dmain comdat
>         assume  CS:__Dmain
>                 xor     EAX,EAX
>                 ret
> __Dmain ends
>         end
> ----------------------------------
> 


It would help if you'd note under what circumstances the TypeInfo /is/ included, then. For example, this program:

void main()
{
        throw new Exception ("");
}


causes all kinds of TypeInfo to be linked:

_D3std8typeinfo2Aa11TypeInfo_Aa5tsizeFZk 004074E8
_D3std8typeinfo2Aa11TypeInfo_Aa6equalsFPvPvZi 00407470
_D3std8typeinfo2Aa11TypeInfo_Aa7compareFPvPvZi 004074CC
_D3std8typeinfo2Aa11TypeInfo_Aa7getHashFPvZk 00407430
_D3std8typeinfo2Aa11TypeInfo_Aa8toStringFZAa 00407424
_D3std8typeinfo7ti_char10TypeInfo_a4swapFPvPvZv 0040466C
_D3std8typeinfo7ti_char10TypeInfo_a5tsizeFZk 00404664
_D3std8typeinfo7ti_char10TypeInfo_a6equalsFPvPvZi 00404630
_D3std8typeinfo7ti_char10TypeInfo_a7compareFPvPvZi 0040464C
_D3std8typeinfo7ti_char10TypeInfo_a7getHashFPvZk 00404624
_D3std8typeinfo7ti_char10TypeInfo_a8toStringFZAa 00404618
_D3std8typeinfo7ti_uint10TypeInfo_k4swapFPvPvZv 00407400
_D3std8typeinfo7ti_uint10TypeInfo_k5tsizeFZk 004073F8
_D3std8typeinfo7ti_uint10TypeInfo_k6equalsFPvPvZi 004073B0
_D3std8typeinfo7ti_uint10TypeInfo_k7compareFPvPvZi 004073CC
_D3std8typeinfo7ti_uint10TypeInfo_k7getHashFPvZk 004073A4
_D3std8typeinfo7ti_uint10TypeInfo_k8toStringFZAa 00407398
_D6object14TypeInfo_Array4swapFPvPvZv 004028A8
_D6object14TypeInfo_Array5tsizeFZk 004028A0
_D6object14TypeInfo_Array6equalsFPvPvZi 00402778
_D6object14TypeInfo_Array7compareFPvPvZi 00402808
_D6object14TypeInfo_Array7getHashFPvZk 0040271C
_D6object14TypeInfo_Array8toStringFZAa 004026F8
_D6object14TypeInfo_Class5tsizeFZk 00402C48
_D6object14TypeInfo_Class6equalsFPvPvZi 00402BB8
_D6object14TypeInfo_Class7compareFPvPvZi 00402C00
_D6object14TypeInfo_Class7getHashFPvZk 00402BA8
_D6object14TypeInfo_Class8toStringFZAa 00402B9C
_D6object15TypeInfo_Struct5tsizeFZk 00402D3C
_D6object15TypeInfo_Struct6equalsFPvPvZi 00402C94
_D6object15TypeInfo_Struct7compareFPvPvZi 00402CE8
_D6object15TypeInfo_Struct7getHashFPvZk 00402C58
_D6object15TypeInfo_Struct8toStringFZAa 00402C50
_D6object16TypeInfo_Pointer4swapFPvPvZv 004026E0
_D6object16TypeInfo_Pointer5tsizeFZk 004026D8
_D6object16TypeInfo_Pointer6equalsFPvPvZi 004026AC
_D6object16TypeInfo_Pointer7compareFPvPvZi 004026C8
_D6object16TypeInfo_Pointer7getHashFPvZk 004026A0
_D6object16TypeInfo_Pointer8toStringFZAa 0040267C
_D6object16TypeInfo_Typedef4swapFPvPvZv 00402664
_D6object16TypeInfo_Typedef5tsizeFZk 00402658
_D6object16TypeInfo_Typedef6equalsFPvPvZi 00402628
_D6object16TypeInfo_Typedef7compareFPvPvZi 00402640
_D6object16TypeInfo_Typedef7getHashFPvZk 00402618
_D6object16TypeInfo_Typedef8toStringFZAa 00402610
_D6object17TypeInfo_Delegate5tsizeFZk 00402B94
_D6object17TypeInfo_Delegate8toStringFZAa 00402B70
_D6object17TypeInfo_Function5tsizeFZk 00402B6C
_D6object17TypeInfo_Function8toStringFZAa 00402B48
_D6object20TypeInfo_StaticArray4swapFPvPvZv 00402A40
_D6object20TypeInfo_StaticArray5tsizeFZk 00402A2C
_D6object20TypeInfo_StaticArray6equalsFPvPvZi 00402960
_D6object20TypeInfo_StaticArray7compareFPvPvZi 004029BC
_D6object20TypeInfo_StaticArray7getHashFPvZk 00402924
_D6object20TypeInfo_StaticArray8toStringFZAa 004028E4
_D6object25TypeInfo_AssociativeArray5tsizeFZk 00402B40
_D6object25TypeInfo_AssociativeArray8toStringFZAa 00402AFC


Where did all that come from? I suspect you're looking at this concern with a microscope only, while I think the bigger picture is perhaps more important.


> Examining the .map file produced shows that only these functions are pulled in from std.string:
> 
> 0002:00002364       _D3std6string7iswhiteFwZi  00404364
> 0002:000023A4       _D3std6string3cmpFAaAaZi   004043A4
> 0002:000023E8       _D3std6string4findFAawZi   004043E8
> 0002:00002450       _D3std6string8toStringFkZAa 00404450
> 0002:000024CC       _D3std6string9inPatternFwAaZi 004044CC
> 0002:00002520       _D3std6string6columnFAaiZk 00404520
> 
> I do not know offhand why a couple of those are pulled in, but I suggest that obj2asm and the generated .map files are invaluable at determining what pulls in what. Sometimes the results are surprising.


Yes they are surprising ~ partly because there's more than one might imagine:

 0003:00000D74       _D3std6string10whitespaceG6a 00411D74
 0003:00000D7C       _D3std6string2LSw          00411D7C
 0003:00000D80       _D3std6string2PSw          00411D80
 0002:00002464       _D3std6string3cmpFAaAaZi   00404464
 0002:000024A8       _D3std6string4findFAawZi   004044A8
 0002:000025E4       _D3std6string6columnFAaiZi 004045E4
 0003:00000CF4       _D3std6string6digitsG10a   00411CF4
 0002:00002424       _D3std6string7iswhiteFwZi  00404424
 0003:00000D40       _D3std6string7lettersG52a  00411D40
 0003:00000D84       _D3std6string7newlineG2a   00411D84
 0002:00002514       _D3std6string8toStringFkZAa 00404514
 0003:00000CE4       _D3std6string9hexdigitsG16a 00411CE4
 0002:00002590       _D3std6string9inPatternFwAaZi 00404590
 0003:00000D08       _D3std6string9lowercaseG26a 00411D08
 0003:00000D00       _D3std6string9octdigitsG8a 00411D00
 0003:00000D24       _D3std6string9uppercaseG26a 00411D24

Please see the extensive list at the end for some further surprises


> 
>> That's tight-coupling within very low-level language support. Uncool.
>> Wouldn't you at least agree that specific instance is hardly an absolute necessity?
> 
> 
> std.string.toString is 124 bytes long, and doesn't pull anything else in (except see below). Writing another version of it in typeinfo isn't going to reduce the size of the program *at all*, in fact, it will likely increase it because now there'll be two versions of it.


You're focusing purely on the fact that adding an itoa() would increase the executable size. At the same time, completely ignoring the explicit mention of using the C runtime function instead (which is usually linked also), and the clear fact that importing std.string brings along with it the following:

 0003:00000D74       _D3std6string10whitespaceG6a 00411D74
 0003:00000D7C       _D3std6string2LSw          00411D7C
 0003:00000D80       _D3std6string2PSw          00411D80
 0002:00002464       _D3std6string3cmpFAaAaZi   00404464
 0002:000024A8       _D3std6string4findFAawZi   004044A8
 0002:000025E4       _D3std6string6columnFAaiZi 004045E4
 0003:00000CF4       _D3std6string6digitsG10a   00411CF4
 0002:00002424       _D3std6string7iswhiteFwZi  00404424
 0003:00000D40       _D3std6string7lettersG52a  00411D40
 0003:00000D84       _D3std6string7newlineG2a   00411D84
 0002:00002514       _D3std6string8toStringFkZAa 00404514
 0003:00000CE4       _D3std6string9hexdigitsG16a 00411CE4
 0002:00002590       _D3std6string9inPatternFwAaZi 00404590
 0003:00000D08       _D3std6string9lowercaseG26a 00411D08
 0003:00000D00       _D3std6string9octdigitsG8a 00411D00
 0003:00000D24       _D3std6string9uppercaseG26a 00411D24


Along with a number of dependencies.

And, apparently, you think it's perhaps responsible for bringing in the floating point support too.

The point being made is that of coupling between low and high levels ~ illustrated quite well by the above.

I think this kind of thing is worth addressing, for a number of reasons.


>>> Although there is a lot of code in std.string, unreferenced free functions in it should be discarded by the linker. A check of the generated .map file should verify this - it is certainly supposed to work that way. One problem Java has is that there are no free functions, so referencing one function wound up pulling in every part of the class the function resided in.
>>
>> This is exactly the case with printf <g>. It winds up linking the world
> 
> 
> No, it does not link in the world, floating point, or graphics libraries. It links in C's standard I/O (which usually gets linked in anyway), and about 4000 bytes of code. That's somewhat less than a megabyte <g>.


Who says the standard C IO should /always/ get linked in? D currently /enforces/ that, whereas it's not a requirement at all for valid operation. What's more, the enforcement is simply because Object.d has a print() method, which uses printf() like so:

print ()
{
    printf ("%.*s", toString());
}

Why not just use ConsoleWrite(), or anything but printf()? There's a number of valid (and decoupled) alternatives to this approach. Why can't they be used instead? Your answer is "well, it doesn't make any difference anyway". That's entirely silly. Yes, the C-library console-startup wrapper causes the IO system to be linked also. But that can be replaced, since it's not directly part of the D runtime support. To make things worse, Object.print() is perhaps the least used method in all of D! Thus, it tends to place this whole issue on the verge of ridiculous.

Why not just remove the dependency instead?

One of the tenets of good library design is to build in layers, and then ensure there's no dependencies between a lower layer and any of the higher ones. Here's two cases of just such a dependency ~ they are almost trivial to fix, yet nothing happens ... why?

Thus, I really don't wish to argue with you on this one, Walter. If you simply refuse to accept that any system might prefer to avoid the default IO platform, for whatever valid reason it may have, then there's little point in even discussing the nature of tight-coupling.

One can hack the internal dependencies in an attempt to rectify the concerns; yet why? Better to leave all of /internal and friends as it stands to avoid branching the code. I really thought you'd understand the value in making that part platform (library) agnostic. And for such a minor cost, too.


>> because it's a general purpose utility function that does all kinds of conversion and all kinds of IO. Printf() is an all or nothing design ~ you can't selectively link pieces of it.
>>
>> That's usually not a problem. However, you've chosen to bind it to low-level language support (in the root Object). That choice causes tight coupling between the language low-level support and a high-level library function ~ one which ought to be optional.
>>
>> Wouldn't you at least agree this specific case is not necessary for the D language to function correctly? That there are other perfectly workable alternatives?
> 
> 
> It's just not a big deal. Try the following:
> 
> extern (C) int printf(char* f, ...) { return 0; }
> 
> void main()
> {
>     static byte[10] b;
> }
> 
> and compare the difference in exe file sizes, with and without the printf stub.


Funny :-D

It makes little difference because all the other dependency code is linked in from other places, Walter. It can be fixed one step at a time.

What you're saying here is the following. Take a shotgun, and pepper the boat you're standing in with holes. Now, see? When you plug up this one hole, it really doesn't stop the water coming in? See? Hardly any difference!

Needless to say, I think you're being somewhat disingenuous. Or, at least trying to obfuscate a simple case of unnecessary low-high coupling in D. But let's move on ...


>>> printf doesn't pull in the floating point library (I went to a lot of effort to make that so!). It does pull in the C IO library, which is very hard to not pull in (there always seems to be something referencing it). It shouldn't pull in the C wide character stuff. D's IO (writefln) will pull in C's IO anyway, so the only thing extra is the integer version of the specific printf code (about 4K).
>>
>> How can it convert %f, %g and so on if it doesn't use FP support at all? 
> 
> 
> It's magic! Naw, it's just that if you actually use floating point in a program, the compiler emits a special extern reference (to __fltused) which pulls in the floating point IO formatting code. Otherwise, it defaults to just a stub. Try it.


void main()
{
        throw new Exception ("");
}

I'm quite familiar with __fltused. It's clearly used by the little example program above, given that this stuff is linked in:

 0003:00007150       ___wpscanfloat             00418150
 0003:00007154       ___wpfloatfmt              00418154
 0003:00007158       ___pscanfloat              00418158
 0003:0000715C       ___pfloatfmt               0041815C
 0003:0000453C       __8087                     0041553C
 0003:0000453C       __80x87                    0041553C
 0002:0000E560       __8087_init                00410560
 0002:0000E9B0       __FCOMPP@                  004109B0
 0002:0000E9CE       __FTEST0@                  004109CE
 0002:0000E9EE       __FTEST@                   004109EE
 0002:0000EA06       __DTST87@                  00410A06
 0002:0000EA0A       __87TOPSW@                 00410A0A
 0002:0000EA0F       __DBLTO87@                 00410A0F
 0002:0000EA1A       __DBLINT87@                00410A1A
 0002:0000EA3B       __DBLLNG87@                00410A3B
 0002:0000EA57       __FLTTO87@                 00410A57
 0002:0000EA5E       __status87                 00410A5E
 0002:0000EA63       __clear87                  00410A63
 0002:0000EA6C       __control87                00410A6C
 0002:0000EA93       __fpreset                  00410A93

That looks rather like floating point support; where in the program is floating point actually used? I don't get it.


>> Either way, it's not currently possible to build a D program without a swathe of FP support code,
>> printf,
>> the entire C IO package,
>> wide-char support,
>> and a whole lot more besides. I'd assumed the linked FP support was for printf, but perhaps it's for std.string instead? I've posted the linker maps (in the past) to illustrate exactly this.
> 
> 
> My point is that assuming what is pulled in by what is about as reliable as guessing where the bottlenecks in one's code are. You can't tell bottlenecks without a profiler, and you've got both hands tied behind your back trying to figure out who pulls in what if you're not using .map files, grep, and obj2asm.
> 
>> Are you not at all interested in improving this aspect of the language usage?
> 
> 
> Sure, but based on accurate information. 

*Cough*



> Pulling printf won't do anything. Try it if you don't agree.


That's your claim, not mine :)

See the analogy above.


> 
> For example, which modules pull in the floating point formatting code? It isn't printf. We can find out by doing a grep for __fltused:
> 
> boxer.obj:      __fltused
> complex.obj:    __fltused
> conv.obj:       __fltused
> date.obj:       __fltused
> demangle.obj:   __fltused
> format.obj:     __fltused
> gamma.obj:      __fltused
> math.obj:       __fltused
> math2.obj:      __fltused
> outbuffer.obj:  __fltused
> stream.obj:     __fltused
> string.obj:     __fltused
> ti_Acdouble.obj:        __fltused
> ti_Acfloat.obj: __fltused
> ti_Acreal.obj:  __fltused
> ti_Adouble.obj: __fltused
> ti_Afloat.obj:  __fltused
> ti_Areal.obj:   __fltused
> ti_cdouble.obj: __fltused
> ti_cfloat.obj:  __fltused
> ti_creal.obj:   __fltused
> ti_double.obj:  __fltused
> ti_float.obj:   __fltused
> ti_real.obj:    __fltused
> 
> Some examination of the .map file shows that the only one of these pulled in by default is std.string. So I think a reasonable approach would be to look at removing the floating point from std.string 


So importing std.string is causing FP support to be imported? No surprises there; something is certainly bringing it in. Along with the "world", as one can see from the attached .map of the example program:

void main()
{
        throw new Exception ("");
}

Keep in mind it's not the number of entries, but the number of superfluous entries that are of concern (I removed all Win32 imports in an attempt to make the list more manageable).

Also, please keep in mind that the concern is one of unnecessary coupling from the low-level runtime support, into the high-level library functions. This will often result in a cascade of dependencies, much like what we see below. Not only does it cause code-bloat, but it makes the language-support dependent upon a specific high-level library. These dependencies are /very/ easy to remedy, with an appropriate reduction in code size as a bonus.

The map file is here, since it's too big to attach: http://www.dsource.org/projects/mango/browser/trunk/doc/map.txt?rev=818&format=raw
April 07, 2006
kris wrote:
> It would help if you'd note under what circumstances the TypeInfo /is/ included, then. For example, this program:
> 
> void main()
> {
>         throw new Exception ("");
> }
> 
> 
> causes all kinds of TypeInfo to be linked:

In general, an easy way to see why a particular module is being pulled in is to temporarily remove it from the library (lib phobos -foo;), link, and see where the undefined reference is coming from. I'd start by running obj2asm on the module you just compiled, and see what extern directives it puts out.

I'm not trying to be a jerk by telling you this procedure rather than just giving the answer, but 1) I don't know the answer offhand and I'd have to follow the same procedure to figure it out and 2) I hope that by giving you the tools and methodology for figuring it out, this kind of question won't repeatedly come up (and yes, it has come up repeatedly). 3) I hope that anyone else with these kinds of questions will get familiar with how to use these tools, too. It's a lot better than guessing and assuming.

Tools like lib, obj2asm, and grep are incredibly useful.
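
For readers without grep to hand, the map-file filtering step can be sketched in D itself. This is an illustration, not part of the procedure described above: mangledPrefix and symbolsFrom are hypothetical helper names, the sample lines are copied from the map excerpts in this thread, and the code assumes a current D compiler and standard library rather than the 2006 toolchain. Matching on the prefix alone is a simplification.

```d
import std.algorithm.iteration : filter, splitter;
import std.algorithm.searching : canFind;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;

/// Build the mangled prefix for a module, e.g. "std.string" -> "_D3std6string".
string mangledPrefix(string moduleName)
{
    string result = "_D";
    foreach (part; moduleName.splitter('.'))
        result ~= part.length.to!string ~ part;  // length-prefixed identifier
    return result;
}

/// Keep only the map lines that mention a symbol from the given module.
string[] symbolsFrom(string[] mapLines, string moduleName)
{
    auto prefix = mangledPrefix(moduleName);
    return mapLines.filter!(line => line.canFind(prefix)).array;
}

void main()
{
    // Sample lines taken from the map excerpts quoted in this thread.
    auto lines = [
        " 0002:00002464       _D3std6string3cmpFAaAaZi   00404464",
        " 0002:0000E560       __8087_init                00410560",
        " 0002:00002514       _D3std6string8toStringFkZAa 00404514",
    ];
    foreach (line; symbolsFrom(lines, "std.string"))
        writeln(line);
}
```

Run against these three lines, only the two std.string entries are printed. To scan a real map, the lines would be read from the file produced with -L/map.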


> Where did all that come from? I suspect you're looking at this concern with a microscope only, while I think the bigger picture is perhaps more important.

I don't think there is a bigger picture. There's only a case by case analysis of what is needed and what isn't.

> Yes they are surprising ~ partly because there's more than one might imagine:
> 
>  0003:00000D74       _D3std6string10whitespaceG6a 00411D74
>  0003:00000D7C       _D3std6string2LSw          00411D7C
>  0003:00000D80       _D3std6string2PSw          00411D80
>  0002:00002464       _D3std6string3cmpFAaAaZi   00404464
>  0002:000024A8       _D3std6string4findFAawZi   004044A8
>  0002:000025E4       _D3std6string6columnFAaiZi 004045E4
>  0003:00000CF4       _D3std6string6digitsG10a   00411CF4
>  0002:00002424       _D3std6string7iswhiteFwZi  00404424
>  0003:00000D40       _D3std6string7lettersG52a  00411D40
>  0003:00000D84       _D3std6string7newlineG2a   00411D84
>  0002:00002514       _D3std6string8toStringFkZAa 00404514
>  0003:00000CE4       _D3std6string9hexdigitsG16a 00411CE4
>  0002:00002590       _D3std6string9inPatternFwAaZi 00404590
>  0003:00000D08       _D3std6string9lowercaseG26a 00411D08
>  0003:00000D00       _D3std6string9octdigitsG8a 00411D00
>  0003:00000D24       _D3std6string9uppercaseG26a 00411D24
> 
> Please see the extensive list at the end for some further surprises


All those other names are the static data. Things like:

const dchar LS = '\u2028';      /// UTF line separator
const dchar PS = '\u2029';      /// UTF paragraph separator

I submit that they aren't significant. The significant thing is the entire std.string.obj is not linked in.
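
The split between functions and static data can be read off the mangled names: after the `_D` marker and the length-prefixed identifiers comes a type code, where `F` starts a function type, `G<n>` is a static array, `a` is char and `w` is dchar. A minimal sketch of that check follows. It assumes a current D compiler, typeSuffix and isFunctionSymbol are hypothetical helpers, and it handles only the simple name forms that appear in this thread.

```d
import std.ascii : isDigit;
import std.stdio : writefln;

/// Return the type suffix of a D mangled name, or null if it is not one.
/// Handles only the plain `_D` + length-prefixed identifiers form.
string typeSuffix(string mangled)
{
    if (mangled.length < 2 || mangled[0 .. 2] != "_D")
        return null;
    size_t i = 2;
    while (i < mangled.length && isDigit(mangled[i]))
    {
        size_t len = 0;
        while (i < mangled.length && isDigit(mangled[i]))
            len = len * 10 + (mangled[i++] - '0');  // parse the length prefix
        if (i + len > mangled.length)
            return null;                            // malformed length prefix
        i += len;                                   // skip the identifier itself
    }
    return mangled[i .. $];
}

/// True when the symbol's type code marks it as a D function.
bool isFunctionSymbol(string mangled)
{
    auto t = typeSuffix(mangled);
    return t.length > 0 && t[0] == 'F';
}

void main()
{
    foreach (sym; ["_D3std6string7iswhiteFwZi",
                   "_D3std6string10whitespaceG6a",
                   "_D3std6string2LSw"])
        writefln("%-32s %s", sym, isFunctionSymbol(sym) ? "function" : "data");
}
```

Here iswhite is reported as a function, while whitespace (a char[6]) and LS (a dchar) are reported as data.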

> You're focusing purely on the fact that adding an itoa() would increase the executable size.

Yes.

> At the same time, completely ignoring the explicit
> mention of using the C runtime function instead (which is usually linked also), and the clear fact that importing std.string brings along with it the following:

And the only possible problem I see there is worrying about executable size.

> 
>  0003:00000D74       _D3std6string10whitespaceG6a 00411D74
>  0003:00000D7C       _D3std6string2LSw          00411D7C
>  0003:00000D80       _D3std6string2PSw          00411D80
>  0002:00002464       _D3std6string3cmpFAaAaZi   00404464
>  0002:000024A8       _D3std6string4findFAawZi   004044A8
>  0002:000025E4       _D3std6string6columnFAaiZi 004045E4
>  0003:00000CF4       _D3std6string6digitsG10a   00411CF4
>  0002:00002424       _D3std6string7iswhiteFwZi  00404424
>  0003:00000D40       _D3std6string7lettersG52a  00411D40
>  0003:00000D84       _D3std6string7newlineG2a   00411D84
>  0002:00002514       _D3std6string8toStringFkZAa 00404514
>  0003:00000CE4       _D3std6string9hexdigitsG16a 00411CE4
>  0002:00002590       _D3std6string9inPatternFwAaZi 00404590
>  0003:00000D08       _D3std6string9lowercaseG26a 00411D08
>  0003:00000D00       _D3std6string9octdigitsG8a 00411D00
>  0003:00000D24       _D3std6string9uppercaseG26a 00411D24
> 
> 
> Along with a number of dependencies.

Take a look at those functions and data - what dependencies?

> And, apparently, you think it's perhaps responsible for bringing in the floating point support too.

That is a problem, and I can fix that. No big deal - it wasn't printf bringing in the floating point - and a reengineering or rewrite of Phobos is not necessary. I don't even need to change any library source code.

> The point being made is that of coupling between low and high levels ~ illustrated quite well by the above.
> I think this kind of thing is worth addressing, for a number of reasons.

I think you're seeing an effect that is an issue, but are mistaken as to the cause of the problem.


> Who says the standard C IO should /always/ get linked in? D currently /enforces/ that, whereas it's not a requirement at all for valid operation.

There isn't that much to it, and it doesn't hurt anything.

> What's more, the enforcement is simply because Object.d has a print() method, which uses printf() like so:
> 
> print ()
> {
>     printf ("%.*s", toString());
> }

Again, it isn't necessarily printf doing that. Try the code I posted in the last message that stubs out printf, which will *prevent* it from being linked in from the library. Compile/link it, and examine the .map file.

(The stubbing out method is another technique for figuring out what pulls in what.)

> Why not just use ConsoleWrite(), or anything but printf()?

Because it's not portable (what should the Linux one look like?), and does not deliver the billed benefits. But the worst thing about calling ConsoleWrite() directly is that it does not play well with any other IO the user may have done or be in the process of doing. What will happen is that any object.print()'s will not be synchronized with the output from writef, printf, or any other of the stdout functions.

> There's a number of valid (and decoupled) alternatives to this approach. Why can't they be used instead? Your answer is "well, it doesn't make any difference anyway". That's entirely silly. Yes, the C-library console-startup wrapper causes the IO system to be linked also. But that can be replaced, since it's not directly part of the D runtime support.

Why does the C library need replacing? I honestly don't get it.


> To make things worse, Object.print() is perhaps the least used method in all of D! Thus, it tends to place this whole issue on the verge of ridiculous.
> Why not just remove the dependency instead?

Because it doesn't buy anything to remove it. Try it and see (or even easier, try the source I posted with the stubbed out printf - that will absolutely, positively prevent printf from being linked in from the library, without needing to change or recompile object.d at all).


> One of the tenets of good library design is to build in layers, and then ensure there's no dependencies between a lower layer and any of the higher ones. Here's two cases of just such a dependency ~ they are almost trivial to fix, yet nothing happens ... why?
> 
> Thus, I really don't wish to argue with you on this one, Walter. If you simply refuse to accept that any system might prefer to avoid the default IO platform, for whatever valid reason it may have, then there's little point in even discussing the nature of tight-coupling.

If you want to use a system that for some reason can't have C's IO subsystem, then just include the one liner:

extern (C) int printf(char* f, ...) { return 0; }

somewhere in your code, and it's gone.


> One can hack the internal dependencies in an attempt to rectify the concerns; yet why? Better to leave all of /internal and friends as it stands to avoid branching the code. I really thought you'd understand the value in making that part platform (library) agnostic. And for such a minor cost, too.

You don't need to hack the internals to get rid of any vestige of printf. Just stub it out.


> Or, at least trying to obfuscate a simple case of unnecessary low-high coupling in D. But let's move on ...

I'm trying to point out that things aren't so simple.


> I'm quite familiar with __fltused.

Your questions about how printf avoided linking in %f support indicated otherwise.

> It's clearly used by the little example program above, given that this stuff is linked in:
...
> That looks rather like floating point support; where in the program is floating point actually used? I don't get it.

I went over that in my last post, too.


>> Pulling printf won't do anything. Try it if you don't agree.
> That's your claim, not mine :)

You don't have to believe me, that's why I encourage you to try it and give you the tools and methodology to figure these things out.


> Keep in mind it's not the number of entries, but the number of superfluous entries that are of concern (I removed all Win32 imports in an attempt to make the list more manageable).

Until you've tracked down each and every one and understand where it is pulled in from and why it is there, there is no way to decide which ones are superfluous or not.

There's an awful lot of startup and shutdown going on - stuff that is required for D (or the C runtime library, for that matter) to function. An awful lot is required for the exception handling support to work (that has to be in all programs) and for the gc to start up and shut down gracefully. It goes on.


> Also, please keep in mind that the concern is one of unnecessary coupling from the low-level runtime support, into the high-level library functions. This will often result in a cascade of dependencies, much like what we see below. Not only does it cause code-bloat, but it makes the language-support dependent upon a specific high-level library. These dependencies are /very/ easy to remedy, with an appropriate reduction in code size as a bonus.

As we've discovered, pulling printf out of object.d isn't going to remedy anything. It just is not that simple.
April 07, 2006
Walter Bright wrote:
> Georg Wrede wrote:
> 
>> I admit this is a "feelings based" thing with most people I've talked with. It seems that on embedded platforms, many expect to write all the needed code themselves. It's also felt (possibly unduly??) that Phobos (or whatever general Win+*nix standard library) is mostly useless in embedded applications.
> 
> I'd like to get to the bottom of this feeling. For example, Kris was unhappy that typeinfo imported std.string. I can't figure out what the problem with that is.
> 
>> To give a parallel (to explain my view here): There are many Linux distributions that are compiled with 386 as target. At the same time, their specs for memory, clock speed, etc. _in_practice_ rule out any machine not using recent Intel processors. I see this as a joke.
>>
>> Call this inconsistent specs. I'm discussing here so D would avoid this kind of inconsistency.
> 
> For the embedded people I've talked with, D without floating point would have been a good match.

Uh-oh, after having read what Kris and others have posted as replies to your post, I can't push D for embedded development. At least until the issues they've brought up are resolved.

>> Insisting on not needing hardware FP is ok. But to legitimize that, one has to cater to scarce resources in other areas too. Conversely, not genuinely making the language usable in smaller environments, makes striving to independence of FPU not worth the effort and inconvenience.
April 07, 2006
Georg Wrede wrote:
> 
> Uh-oh, after having read what Kris and others have posted as replies to your post, I can't push D for embedded development. At least until the issues they've brought up are resolved.

For what it's worth, the D spec doesn't require any of the behavior Kris has been talking about.  I'd consider most of it specific to the DMD implementation.


Sean
April 07, 2006
Georg Wrede wrote:
> Uh-oh, after having read what Kris and others have posted as replies to your post, I can't push D for embedded development. At least until the issues they've brought up are resolved.

Which particular issue?
April 07, 2006
Walter Bright wrote:
> kris wrote:
> 
>> It would help if you'd note under what circumstances the TypeInfo /is/ included, then. For example, this program:
>>
>> void main()
>> {
>>         throw new Exception ("");
>> }
>>
>>
>> causes all kinds of TypeInfo to be linked:
> 
> 
> In general, an easy way to see why a particular module is being pulled in is to temporarily remove it from the library (lib phobos -foo;), link, and see where the undefined reference is coming from. I'd start by running obj2asm on the module you just compiled, and see what extern directives it puts out.
> 
> I'm not trying to be a jerk by telling you this procedure rather than just giving the answer, but 1) I don't know the answer offhand and I'd have to follow the same procedure to figure it out and 2) I hope that by giving you the tools and methodology for figuring it out, this kind of question won't repeatedly come up (and yes, it has come up repeatedly). 3) I hope that anyone else with these kinds of questions will get familiar with how to use these tools, too. It's a lot better than guessing and assuming.
> 
> Tools like lib, obj2asm, and grep are incredibly useful.

Yes, they are useful. And I'm not trying to be a jerk by pointing out that the D runtime is missing some much needed TLC (to put it very nicely). Why do you think Ares exists anyway?


>> Where did all that come from? I suspect you're looking at this concern with a microscope only, while I think the bigger picture is perhaps more important.
> 
> 
> I don't think there is a bigger picture. There's only a case by case analysis of what is needed and what isn't.

I see.


> All those other names are the static data. Things like:
> 
> const dchar LS = '\u2028';      /// UTF line separator
> const dchar PS = '\u2029';      /// UTF paragraph separator
> 
> I submit that they aren't significant. 

Yes, you're right. Unless they're huge tables (such as the Unicode character map; oh wait; is that linked by default? :)


>> You're focusing purely on the fact that adding an itoa() would increase the executable size.
> 
> 
> Yes.
> 
>> At the same time, completely ignoring the explicit mention of using the C runtime function instead (which is usually linked also), and the clear fact that importing std.string brings along with it the following:
> 
> 
> And the only possible problem I see there is worrying about executable size.


Forgive me, but, there's a certain unattributed reputation for avoiding any and all important and/or salient points whenever it suits ~
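
To make the "low-level support" alternative to std.string.toString concrete, here is a minimal sketch of an integer formatter with no library dependencies at all. The name utoa and the caller-supplied buffer are assumptions made for illustration; nothing like this is claimed to exist in Phobos or Ares.

```d
// Hypothetical helper: writes the decimal digits of `value` into the tail of
// `buf` and returns the slice that was filled. No allocation, no imports.
char[] utoa(char[] buf, uint value)
{
    size_t i = buf.length;
    do
    {
        assert(i > 0, "buffer too small");
        buf[--i] = cast(char)('0' + value % 10);   // emit lowest digit first
        value /= 10;
    } while (value != 0);
    return buf[i .. $];
}

void main()
{
    char[10] tmp;   // 10 digits is enough for any 32-bit unsigned value
    assert(utoa(tmp[], 0) == "0");
    assert(utoa(tmp[], 1234) == "1234");
    assert(utoa(tmp[], uint.max) == "4294967295");
}
```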


>> And, apparently, you think it's perhaps responsible for bringing in the floating point support too.
> 
> 
> That is a problem, and I can fix that. No big deal - it wasn't printf bringing in the floating point - and a reengineering or rewrite of Phobos is not necessary. I don't even need to change any library source code.


What's all this about necessary re-engineering and rewriting of Phobos? Where the heck did that come from?


>> The point being made is that of coupling between low and high levels ~ illustrated quite well by the above.
>> I think this kind of thing is worth addressing, for a number of reasons.
> 
> 
> I think you're seeing an effect that is an issue, but are mistaken as to the cause of the problem.


I see low-level code being dependent upon high-level. I also see a large brick wall with an entirely unresponsive mason sitting atop.



> Again, it isn't necessarily printf doing that. Try the code I posted in the last message that stubs out printf, which will *prevent* it from being linked in from the library. Compile/link it, and examine the .map file.


Sigh. I did that last year, as you well know. After all, that's the reason you sent me the source code.


> Because it's not portable (what should the Linux one look like?), and does not deliver the billed benefits. But the worst thing about calling ConsoleWrite() directly is that it does not play well with any other IO the user may have done or be in the process of doing. What will happen is that any object.print()'s will not be synchronized with the output from writef, printf, or any other of the stdout functions.


Fair point about the synchronization aspect. I'm glad you brought that up.

This is when library designers do one of two things: insert an indirect hook (like Sean has done in many places), or remove the functionality from that layer (again, like Sean has done). The worst possible thing to do is just leave it in there. It becomes part of the "legacy" and thus is impossible to remove cleanly at some future date; and it tends to negate reasonable attempts to clean up other similar concerns.
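
A minimal sketch of the indirect-hook option, assuming a current D compiler. Every name in it (PrintHook, setPrintHook, printObject, Widget) is hypothetical and is not taken from Phobos or Ares; it only illustrates the shape of the technique.

```d
// The low-level layer knows only this function-pointer type, not any IO library.
alias PrintHook = void function(const(char)[] text);

// Default sink discards the text, so nothing from an IO subsystem is referenced.
private void discardText(const(char)[] text) {}

// Thread-local by default in current D; a real runtime might want it shared.
private PrintHook printHook = &discardText;

// Called by a higher layer (for example an IO library) to claim the output.
void setPrintHook(PrintHook hook)
{
    printHook = (hook is null) ? &discardText : hook;
}

// What a runtime-level print could reduce to: format via toString, then delegate.
void printObject(Object o)
{
    printHook(o.toString());
}

class Widget
{
    override string toString() { return "Widget"; }
}

string captured;   // stands in for a real, properly synchronized output stream

void captureText(const(char)[] text) { captured ~= text.idup; }

void main()
{
    printObject(new Widget);       // default hook: output is dropped
    assert(captured == "");
    setPrintHook(&captureText);    // the higher layer installs its own sink
    printObject(new Widget);
    assert(captured == "Widget");
}
```

Because the higher layer owns the sink, it can route the text through the same buffered stream as its other output, which is one way to address the synchronization concern.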

Heck, object.print() should probably be entirely removed anyway; the functionality provided is of dubious value, other than for some truly lazy and limited debugging; it caused enough concern that there was a general consensus to remove it, *or at least make it a null-op*, two years ago; and it's not even applied to any extent. The toString() method provides similar capability without the coupling issues. Again, it could simply become a null-op, or be removed. There's no reasonable balanced need for it to exist as it does today.

But then, why ever bother cleaning /anything/ up, when it won't make any difference anyway?


> Why does the C library need replacing? I honestly don't get it.

Who said it /needed/ replacing? I'm simply talking about applying a small sprinkling of decoupling dust.


>> Why not just remove the dependency instead?
> 
> Because it doesn't buy anything to remove it. Try it and see (or even easier, try the source I posted with the stubbed out printf - that will absolutely, positively prevent printf from being linked in from the library, without needing to change or recompile object.d at all).


You snipped what I thought a fair and appropriate analogy, so I'll repeat it:

"Take a shotgun, and pepper the boat you're standing in with holes. Now, see? When you plug up this one hole, it really doesn't stop the water coming in? See? Hardly any difference"

There /is/ a bigger picture here. One has to first see it, and then approach a resolution in small steps. Unfortunately the "it doesn't buy anything to remove it" outlook is entirely non-conducive to stepwise refinement. Nothing will ever get improved with that attitude.


> If you want to use a system that for some reason can't have C's IO subsystem, then just include the one liner:
> 
> extern (C) int printf(char* f, ...) { return 0; }
> 
> somewhere in your code, and it's gone.


Forgive me, but that's just printf(). If I have, as you say, a system that can't have C's IO subsystem, the above will hardly help me at all. I suspect you're well aware the C IO subsystem consists of significantly more than just printf? Say, perhaps 50-ish functions? You're going to suggest I stub them all out just because Object.print() calls printf()?



> I'm trying to point out that things aren't so simple.


And I'm trying to point out just how simple it is to /start/ the process of eliminating questionable couplings that lead to Derek's sad example.



> Until you've tracked down each and every one and understand where it is pulled in from and why it is there, there is no way to decide which ones are superfluous or not.


There's rarely a need to do any of that when one is careful to decouple responsibilities.


> There's an awful lot of startup and shutdown going on - stuff that is required for D (or the C runtime library, for that matter) to function. An awful lot is required for the exception handling support to work - that has to be in all programs. For the gc to start up and shut down gracefully. It goes on.


Sure; that's a given. Yet it's also quite clear the D runtime links the kitchen-sink too. Given Derek's example:

void main() {}

The .map for that is a fine specimen of "unexpected" coupling. It does seem as though much of that is actually in the C library, but then you're arguing most fervently against doing anything to fix any such things.


>> Also, please keep in mind that the concern is one of unnecessary coupling from the low-level runtime support, into the high-level library functions. This will often result in a cascade of dependencies, much like what we see below. Not only does it cause code-bloat, but it makes the language-support dependent upon a specific high-level library. These dependencies are /very/ easy to remedy, with an appropriate reduction in code size as a bonus.
> 
> 
> As we've discovered, pulling printf out of object.d isn't going to remedy anything. 

Sigh. I'm now at a complete loss at how to respond. Instead, I'll remind you what prompted this exercise in futility:

~~~~~~~~~~~~~~~~~~~~

Walter Bright wrote:
> Phobos doesn't require floating point support from the processor
> unless one actually uses floating point in the application code.

That turned out to be somewhat less than truthful.

> I also really don't understand why anyone using D would require
> not using Phobos. What's the problem?

With much respect, the 'problem' is perhaps that you don't see any?


April 07, 2006
Sean Kelly wrote:
> Georg Wrede wrote:
> 
>>
>> Uh-oh, after having read what Kris and others have posted as replies to your post, I can't push D for embedded development. At least until the issues they've brought up are resolved.
> 
> 
> For what it's worth, the D spec doesn't require any of the behavior Kris has been talking about.  I'd consider most of it specific to the DMD implementation.

Agreed. If there were a good embedded compiler available (via GDC) then I'd certainly use Ares plus an appropriate C lib.