April 28, 2009 Re: Yet another strike against the current AA implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Daniel Keep wrote:
>> Michel Fortin wrote:
>>> On 2009-04-27 10:51:22 -0400, Frits van Bommel
>>> <fvbommel@REMwOVExCAPSs.nl> said:
>>>
>>>> I edited this code to work with ldc (D1) + Tango, and saw the Direct
>>>> and opApply cases generate identical code (inc, cmp, jne, with the
>>>> loop counter in a register) [1], so they're equally fast (modulo
>>>> process scheduling randomness).
>>> Thank you for your timings. I think it shows my point: that by prefering
>>> ranges over opApply we're just optimising around a deficiency in DMD's
>>> optimizer.
>> Not true. Here's an excellent reason to use ranges over opApply: you
>> can't define zip with opApply. Because opApply uses inversion of
>> control, you can't use more than one without bringing threads into the
>> equation.
>>
>
> Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?
Second, isn't the overhead still relatively high compared to simple function calls?
| |||
April 28, 2009 Re: Phobos2: zip, integral ranges, map, Any, All, Map | ||||
|---|---|---|---|---|
| ||||
Posted in reply to bearophile | == Quote from bearophile (bearophileHUGS@lycos.com)'s article > Some more notes about Phobos2. Some of the things I say may be wrong because my experience with Phobos2 is limited still. > Daniel Keep: > > Not true. Here's an excellent reason to use ranges over opApply: you can't define zip with opApply. Because opApply uses inversion of control, you can't use more than one without bringing threads into the equation. > I'll try to zip two ranges that return the leaves of two different binary trees. This can be a simple example to show to attract people to D2 language :-) > So I've tried using zip: > import std.range: zip; > import std.stdio: writeln; > void main() { > auto a = [1, 2, 3]; > string[] b = ["a", "b", "c"]; > foreach (xy; zip(a, b)) > writeln(xy.at!(0), " ", xy.at!(1)); > } > That doesn't work: > ...\dmd\src\phobos\std\range.d(1847): Error: template instance Zip!(int[3u],immutable(char)[][]) does not match template declaration Zip(R...) if (R.length && allSatisfy!(isInputRange,R)) > probably because 'a' is a static array. Is isInputRange false for static arrays? Yes, because the range interface for arrays relies on std.array being imported and using extension method syntax. This has bitten me a bunch of times, too. The functions in std.array take dynamic arrays and static arrays are not implicitly convertible to dynamic arrays, except in the specific case of binding an array literal to a variable. Whether they should be, I don't know. > ------------ > The most basic and useful range is the one of integer numbers. How can I create with Phobos2 lazy and eager ranges like the following ones? > >>> range(1, 5) > [1, 2, 3, 4] > >>> range(5, 1, -1) > [5, 4, 3, 2] > >>> list(xrange(5, 10)) > [5, 6, 7, 8, 9] > >>> list(xrange(5, 10, 2)) > [5, 7, 9] > Similar ranges are useful with map,zip, and in many other situations. > (I hope the x..y range syntax of D2 foreach will evolve in a syntax that can be used everywhere an integral range can be used). I think you're looking for std.range.iota, although it doesn't work so well right now because of bug 2871. | |||
April 28, 2009 Re: Phobos2: zip, integral ranges, map, Any, All, Map | ||||
|---|---|---|---|---|
| ||||
Posted in reply to dsimcha | dsimcha: >Whether they should be, I don't know.< In my D1 dlibs most things digest static arrays too (but they don't let you iterate on them or return them, that's a limit of the language). I don't know D2 enough yet to give you a better answer, but I think it's better to remove similar limits from the language. zip-ing two static arrays has to be a legit operation. >I think you're looking for std.range.iota,< Yes, sorry, I have found it few seconds after sending the post. I am dumb. >although it doesn't work so well right now because of bug 2871.< And I have found the bug, of course... :-) I can also see Denis Koroskin has suggested a small fix: http://d.puremagic.com/issues/show_bug.cgi?id=2871 Regarding iota: it's better for the third argument of iota to defult to 1, becasuse most of the times you want a step = 1. In Python range() goes one step further: if you give just one argument it's meant to be the second one and the first defaults to zero. In practice this causes no problems because in Python you use range()/xrange() all the time. In D I may accept iota to have 2-3 arguments because it will probably used much less often. Bye and thank you, bearophile | |||
April 28, 2009 Re: Yet another strike against the current AA implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to grauzone | grauzone wrote:
> downs wrote:
>> Daniel Keep wrote:
>>> Here's an excellent reason to use ranges over opApply: you
>>> can't define zip with opApply. Because opApply uses inversion of
>>> control, you can't use more than one without bringing threads into the
>>> equation.
>>
>> Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
>
> First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?
A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each.
—Joel Salomon
| |||
April 28, 2009 Re: Yet another strike against the current AA implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Joel C. Salomon | Joel C. Salomon wrote:
> grauzone wrote:
>> downs wrote:
>>> Daniel Keep wrote:
>>>> Here's an excellent reason to use ranges over opApply: you
>>>> can't define zip with opApply. Because opApply uses inversion of
>>>> control, you can't use more than one without bringing threads into the
>>>> equation.
>>> Your point stands, of course, but I just wanted to mention that stackthreads/fibers work too and have far less overhead.
>> First, don't they have LOTS of memory overhead, because each fiber needs a new/separate stack?
>
> A fiber-specific stack needn’t be very large. A few KB is often enough even for long-running threads; if the call stack is only going to be a few levels deep you might get away with a few hundred bytes each.
>
> —Joel Salomon
Furthermore, using anonymous mmapped files the memory is only allocated when accessed - the only thing taken is address space. Which, granted, can be limiting. Luckily 64-bit will fix that. :)
| |||
April 29, 2009 Re: Yet another strike against the current AA implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to downs | downs wrote:
> Joel C. Salomon wrote:
>> grauzone wrote:
>>> downs wrote:
>>>> Daniel Keep wrote:
>>>>> Here's an excellent reason to use ranges over opApply: you
>>>>> can't define zip with opApply. Because opApply uses inversion of
>>>>> control, you can't use more than one without bringing threads into the
>>>>> equation.
>>>> Your point stands, of course, but I just wanted to mention that
>>>> stackthreads/fibers work too and have far less overhead.
>>> First, don't they have LOTS of memory overhead, because each fiber needs
>>> a new/separate stack?
>> A fiber-specific stack needn’t be very large. A few KB is often enough
>> even for long-running threads; if the call stack is only going to be a
>> few levels deep you might get away with a few hundred bytes each.
The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs.
*If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed.
This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case".
Andrei, Walter???
PS, on the 6502 (the C64 CPU), the stack was hard wired to 0100-01FF. This made the processor vastly simpler, which was essential for cost savings in an era where an additional transistor in the processor did cost an arm and a leg.
The area before the stack was used with 8-bit addressing, which was faster, and traditionally it was used more or less as an extension to processor registers, because the 6502 only had the X, Y, A, PC, and Flags registers. The area after the stack was free for the circuit board designer to use in any way he liked. On the C64 this was used for ROM, memory mapped peripherals, IO, the heap, and the program area. Interestingly, self-modifying code was used for keyboard input, to enhance speed and code size. This concept was copied from the VIC-20, which came with 3k of usable RAM.
| |||
April 29, 2009 Re: Yet another strike against the current AA implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Georg Wrede | Georg Wrede wrote:
> downs wrote:
>> Joel C. Salomon wrote:
>>> grauzone wrote:
>>>> downs wrote:
>>>>> Daniel Keep wrote:
>>>>>> Here's an excellent reason to use ranges over opApply: you
>>>>>> can't define zip with opApply. Because opApply uses inversion of
>>>>>> control, you can't use more than one without bringing threads into the
>>>>>> equation.
>>>>> Your point stands, of course, but I just wanted to mention that
>>>>> stackthreads/fibers work too and have far less overhead.
>>>> First, don't they have LOTS of memory overhead, because each fiber needs
>>>> a new/separate stack?
>>> A fiber-specific stack needn’t be very large. A few KB is often enough
>>> even for long-running threads; if the call stack is only going to be a
>>> few levels deep you might get away with a few hundred bytes each.
>
> The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs.
>
> *If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed.
>
> This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case".
>
> Andrei, Walter???
Since C64 programs have gotten a lot bigger, programming style favors deeper call stacks, and (mutually) recursive functions are not a rarity.
Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.
Andrei
| |||
April 29, 2009 Re: Yet another strike against the current AA implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Andrei Alexandrescu | Andrei Alexandrescu wrote:
> Georg Wrede wrote:
>> downs wrote:
>>> Joel C. Salomon wrote:
>>>> grauzone wrote:
>>>>> downs wrote:
>>>>>> Daniel Keep wrote:
>>>>>>> Here's an excellent reason to use ranges over opApply: you
>>>>>>> can't define zip with opApply. Because opApply uses inversion of
>>>>>>> control, you can't use more than one without bringing threads into the
>>>>>>> equation.
>>>>>> Your point stands, of course, but I just wanted to mention that
>>>>>> stackthreads/fibers work too and have far less overhead.
>>>>> First, don't they have LOTS of memory overhead, because each fiber needs
>>>>> a new/separate stack?
>>>> A fiber-specific stack needn’t be very large. A few KB is often enough
>>>> even for long-running threads; if the call stack is only going to be a
>>>> few levels deep you might get away with a few hundred bytes each.
>>
>> The Commodore-64 had a stack of 0.25kB shared between your Basic program, subroutine calls, and the operating system. While not unheard of, it still was unusual to have to make amendments to program design because of stack limitations, even in large programs.
>>
>> *If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed.
>>
>> This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case".
>>
>> Andrei, Walter???
>
> Since C64 programs have gotten a lot bigger, programming style favors deeper call stacks, and (mutually) recursive functions are not a rarity.
>
> Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky.
The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too.
A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value.
This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size.
If this shows the maximum stack usage as 160 bytes, then today's youngsters could allocate, say, 0.25k, (or even 1k) instead of the 50k they'd otherwise allocate.
Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too.
| |||
April 29, 2009 Re: Yet another strike against the current AA implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Georg Wrede | Georg Wrede wrote: > *If* we had a convenient way to record the high-water mark of stack usage for functions (and maybe also threads), then people could have new insights on how little stack space is needed. > > This would be /very/ convenient once we start using thread specific stacks, because the stack space has to be allocated in advance, hopefully not wasting huge amounts of space "just in case". What do you mean by “thread-specific stacks”? > Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too. How much rewriting of the stack would have to happen? Relocating the stack will invalidate any pointers to automatic (in the C sense) variables. Detecting stack overflow at runtime might also not be trivial. —Joel Salomon | |||
April 29, 2009 Re: Yet another strike against the current AA implementation | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Georg Wrede | Georg Wrede wrote: > Andrei Alexandrescu wrote: >> Computing the stack depth even for non-recursive functions is an interprocedural analysis so it's difficult to do modularly. The stack is also unchecked so going with a "good enough" approximation is bound to be risky. > > The old size estimation problem. No matter how big you decide on, somewhere is an idiot who will blow that size too. > > > A runtime solution just occurred to me. The newly allocated stack for a thread is zeroed first, right? Well then, just before the thread finishes, one could scan the stack area, and find the first address that contains a non-zero value. > > This isn't bullet proof in theory, but hey, the aim is to help in /estimating/ the needed stack size. You can't be more serious than Mark Twain when giving advice on the stock market: "Buy some stock. If it goes up, sell it. If it doesn't go up, don't buy it." > Incidentally, this raises another thought: what if the stack was extendable? Like D arrays? A stack overflow would trigger a relocation of the stack, and then continue. Being able to have hundreds of small stacks would be useful in other things than threads, too. That would be great. I don't know whether it can be done. By the way, I think you're underestimating how much stack the OS gives to the stack. On my system: $ grep PTHREAD_STACK /usr/include/**/*.h /usr/include/bits/local_lim.h:#define PTHREAD_STACK_MIN 16384 /usr/include/pthread.h: minimal size of the block must be PTHREAD_STACK_MIN.*/ /usr/include/pthread.h: to be started. This size must never be less than PTHREAD_STACK_MIN $ _ You can't have a stack smaller than 16KB as far as I understand. I seem to recall the default stack size is much bigger. Andrei | |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply