April 12, 2013 Re: Was: Re: Vote for std.process | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev Attachments:
| On 13 April 2013 00:19, Vladimir Panteleev <vladimir@thecybershadow.net>wrote: > On Friday, 12 April 2013 at 13:39:38 UTC, Manu wrote: > >> If allocating a string on the stack makes it buggy, then there is >> something >> really wrong. It should be no less convenient if appropriate helpers are >> available. >> > > Please see my reply to your other post. > > > With consideration to the string[string] argument, surely instances like >> that can be reconsidered? How is string[] going to produce more bugs than string[string]? >> > > env ~= "FOO=BAR"; > > This will probably not do what you want if there was already a line starting with "FOO=" in env. > > An array of strings is a less direct representation of the environment than a string map. Certain common operations, such as finding the value of a variable, or setting / overwriting a variable, become more difficult. I didn't see any attempt to index the array by key in this case. That's what an AA is for, and it's not being used here, so it's not a job for an AA. I wouldn't use env ~= "FOO=BAR"; I would use env ~= EnvVar("FOO", "BAR"); Or whatever key/value pair structure you like. You're being paranoid, or sensationalising the effect of simple >> optimisation. >> > > Strong words... Well it seemed appropriate. I can't understand what's so wildly complex that it would make code utterly unmaintainable, and error prone. And >> again, speed is not my concern here, it's inconsiderate the allocation policy. >> > > I'm interested in eliminating allocations. It's just another function that >> can't be called in a no-gc area. If it used the stack for its temporaries, no problem. >> > > Why allocations, specifically, if not for the performance costs of allocation and garbage collection? That's one aspect, but it's also about having control over the allocation patterns of your program in general. Lots of small allocations fragment the heap, and they also push the memory barrier. I couldn't disable the GC and call into phobos functions for very long, micro-allocations of temporaries not being freed would quickly eat all the system memory. Reducing allocations is always better where possible. As much is convenient without causing you to start obscuring your code? >> > That's my personal rule. >> But I make it a habit to consider efficiency when designing code, I never >> retrofit it. I tend to choose designs that are both simple and efficient >> at >> the start. >> > > OK, so if I understand you correctly: you would like Phobos to adopt a policy of avoiding heap allocations whenever possible, and this argument applies to std.process not because doing so would result in a tangible improvement of its performance or other metric, but for the purpose of being consistent across Phobos. Assuming that the language can provide or allow implementing suitably safe abstractions for doing so without complicating the code much, I think that's a goal worth looking forward, and we have been doing so for some time (hence the pending allocator design). > It starts as soon as the majority agree it's important enough to enforce. Although I think a tool like: char[len] stackString; would be a super-useful tool to make this considerably more painless. Some C compilers support this. |
April 12, 2013 Re: Was: Re: Vote for std.process | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Friday, 12 April 2013 at 14:58:10 UTC, Manu wrote: > I didn't see any attempt to index the array by key in this case. That's > what an AA is for, and it's not being used here, so it's not a job for an > AA. Sorry, not following. Are you suggesting to use a dynamic array for creating processes, but an associative array for examining the current process's environment? > I wouldn't use env ~= "FOO=BAR"; > I would use env ~= EnvVar("FOO", "BAR"); > Or whatever key/value pair structure you like. OK, but I don't see how it changes your argument. Also, you mentioned string[] earlier. > Well it seemed appropriate. I can't understand what's so wildly complex > that it would make code utterly unmaintainable, and error prone. I did not say it would be _utterly_ unmaintainable or error prone. Just more so. > It starts as soon as the majority agree it's important enough to enforce. > Although I think a tool like: char[len] stackString; would be a > super-useful tool to make this considerably more painless. Some C compilers > support this. I believe this is the feature: http://en.wikipedia.org/wiki/Variable-length_array It is part of C99. ------------------------------------------------------------- Earlier today, I wrote: > Please rewrite some part of std.process with performance in mind, and post it here for review. This way, we can analyze the benefits and drawbacks based on a concrete example, instead of vapor and hot air. I've tried doing this myself, for the bit of code you brought up (constructing environment variables). Here's what I ended up with: ------------------------------------------------------------- import std.array; private struct StaticAppender(size_t SIZE, T) { T[SIZE] buffer = void; Appender!(T[]) appender; alias appender this; } /// Returns a struct containing a fixed-size buffer and an /// appender. The appender will use the buffer (which has /// SIZE elements) until it runs out of space, at which /// point it will reallocate itself on the heap. auto staticAppender(size_t SIZE, T)() { StaticAppender!(SIZE, T) result; result.appender = appender(result.buffer[]); return result; } /// Allows allocating a T[] given a size. /// Contains a fized-size buffer of T elements with SIZE /// length. When asked to allocate an array with /// length <= than SIZE, the buffer is used instead of /// the heap. struct StaticArray(size_t SIZE, T) { T[SIZE] buffer = void; T[] get(size_t size) { if (size <= SIZE) { buffer[0..size] = T.init; return buffer[0..size]; } else return new T[size]; } } // -------------------------------------------------------- void exec(const(char)*[] envz) { /+ ... +/ } void oldWay(string[string] environment) { auto envz = new const(char)*[environment.length + 1]; int pos = 0; foreach (var, val; environment) envz[pos++] = (var~'='~val~'\0').ptr; exec(envz); } void newWay(string[string] environment) { auto buf = staticAppender!(4096, char)(); StaticArray!(64, size_t) envpBuf; size_t[] envp = envpBuf.get(environment.length + 1); size_t pos; foreach (var, val; environment) { envp[pos++] = buf.data.length; buf.put(var); buf.put('='); buf.put(val); buf.put('\0'); } // Convert offsets to pointers in-place auto envz = cast(const(char)*[])envp; foreach (n; 0..pos) envz[n] += cast(size_t)buf.data.ptr; exec(envz); } ------------------------------------------------------------- As you can see, the code is quite more verbose, even with the helper types. It's no longer obvious at a glance what the code is doing. Perhaps you can come up with better abstractions? |
April 12, 2013 Re: Was: Re: Vote for std.process | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Fri, 12 Apr 2013 09:42:43 -0400, Manu <turkeyman@gmail.com> wrote:
> On 12 April 2013 23:21, Lars T. Kyllingstad <public@kyllingen.net> wrote:
>
>> On Friday, 12 April 2013 at 07:04:23 UTC, Manu wrote:
>>
>>>
>>> string[string] is used in the main API to receive environment variables;
>>> perhaps kinda convenient, but it's impossible to supply environment
>>> variables with loads of allocations.
>>>
>>
>> Environment variables are a mapping of strings to strings. The natural
>> way to express such a mapping in D is with a string[string]. It shouldn't
>> be necessary to allocate an AA literal, though.
>>
>
> That's a good point, do AA's support literals that don't allocate? You
> can't even produce an array literal without it needlessly allocating.
No, because it would have to be COW.
However, a string[string] literal that uses strings only must allocate the structure of the AA, not the strings themselves.
The compiler might be able to optimize this by figuring out how much memory to allocate in order to contain the entire AA structure, then create one block that has all the data in it. It still needs a new block though, otherwise, what happens when you change an AA that was once a literal?
I think the best path forward is to replace the functions with a templated one that takes a indexable type as the env pointer. Then one can optimize as much as one desires.
-Steve
|
April 12, 2013 Re: Was: Re: Vote for std.process | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev Attachments:
| On 13 April 2013 01:29, Vladimir Panteleev <vladimir@thecybershadow.net>wrote: > On Friday, 12 April 2013 at 14:58:10 UTC, Manu wrote: > >> I didn't see any attempt to index the array by key in this case. That's what an AA is for, and it's not being used here, so it's not a job for an AA. >> > > Sorry, not following. > > Are you suggesting to use a dynamic array for creating processes, but an associative array for examining the current process's environment? I didn't see anywhere where it was possible to example the current processed environment? I only saw the mechanism for feeding additional env vars to the system command. Linear array of key/value pair struct would be fine. I wouldn't use env ~= "FOO=BAR"; >> I would use env ~= EnvVar("FOO", "BAR"); >> Or whatever key/value pair structure you like. >> > > OK, but I don't see how it changes your argument. Also, you mentioned string[] earlier. Sorry. I didn't think it through at the time. It starts as soon as the majority agree it's important enough to enforce. >> > Although I think a tool like: char[len] stackString; would be a >> super-useful tool to make this considerably more painless. Some C >> compilers >> support this. >> > > I believe this is the feature: > > http://en.wikipedia.org/wiki/**Variable-length_array<http://en.wikipedia.org/wiki/Variable-length_array> > > It is part of C99. > > ------------------------------**------------------------------**- > > Earlier today, I wrote: > > Please rewrite some part of std.process with performance in mind, and >> post it here for review. This way, we can analyze the benefits and drawbacks based on a concrete example, instead of vapor and hot air. >> > > I've tried doing this myself, for the bit of code you brought up (constructing environment variables). Here's what I ended up with: > > ------------------------------**------------------------------**- > > import std.array; > > private struct StaticAppender(size_t SIZE, T) > { > T[SIZE] buffer = void; > Appender!(T[]) appender; > alias appender this; > } > > /// Returns a struct containing a fixed-size buffer and an > /// appender. The appender will use the buffer (which has > /// SIZE elements) until it runs out of space, at which > /// point it will reallocate itself on the heap. > auto staticAppender(size_t SIZE, T)() > { > StaticAppender!(SIZE, T) result; > result.appender = appender(result.buffer[]); > return result; > } > > /// Allows allocating a T[] given a size. > /// Contains a fized-size buffer of T elements with SIZE > /// length. When asked to allocate an array with > /// length <= than SIZE, the buffer is used instead of > /// the heap. > struct StaticArray(size_t SIZE, T) > { > T[SIZE] buffer = void; > T[] get(size_t size) > { > if (size <= SIZE) > { > buffer[0..size] = T.init; > return buffer[0..size]; > } > else > return new T[size]; > } > } > > // ------------------------------**-------------------------- > > void exec(const(char)*[] envz) { /+ ... +/ } > > void oldWay(string[string] environment) > { > auto envz = new const(char)*[environment.**length + 1]; > int pos = 0; > foreach (var, val; environment) > > envz[pos++] = (var~'='~val~'\0').ptr; > > exec(envz); > } > > void newWay(string[string] environment) > { > auto buf = staticAppender!(4096, char)(); > StaticArray!(64, size_t) envpBuf; > size_t[] envp = envpBuf.get(environment.length + 1); > > size_t pos; > foreach (var, val; environment) > { > envp[pos++] = buf.data.length; > buf.put(var); > buf.put('='); > buf.put(val); > buf.put('\0'); > } > > // Convert offsets to pointers in-place > auto envz = cast(const(char)*[])envp; > foreach (n; 0..pos) > envz[n] += cast(size_t)buf.data.ptr; > > exec(envz); > } > > ------------------------------**------------------------------**- > > As you can see, the code is quite more verbose, even with the helper types. It's no longer obvious at a glance what the code is doing. Perhaps you can come up with better abstractions? > Beautiful! Actually, I think if you look a couple of pages below, I think you'll see something rather like that already there in the windows code. Thought not sure why you use a size_t array which you just cast to a char* array? I think you could also fold that into one pass, rather than 2. And I'm not sure about this line: envz[n] += cast(size_t)buf.data.ptr; But those helpers make the problem rather painless. I wonder if there's opportunity for improvement by having appender support the ~ operator? Might be able to jig it to use natural concat syntax rather than put()... |
April 12, 2013 Re: Was: Re: Vote for std.process | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Friday, 12 April 2013 at 16:04:09 UTC, Manu wrote: > I didn't see anywhere where it was possible to example the current > processed environment? I only saw the mechanism for feeding additional env > vars to the system command. > Linear array of key/value pair struct would be fine. There's the "environment" object. It generally acts like a string[string], and has a toAA() method that constructs a real string[string]. > Beautiful! Actually, I think if you look a couple of pages below, I think > you'll see something rather like that already there in the windows code. I see, it also uses appender, although appender's buffer will be on the heap, and there is no need for an envz array. > Thought not sure why you use a size_t array which you just cast to a char* > array? > I think you could also fold that into one pass, rather than 2. > And I'm not sure about this line: envz[n] += cast(size_t)buf.data.ptr; The problem is that the pointer to the data may change once appender reallocates the buffer when it reaches the current buffer's capacity. For this reason, we can't store pointers to the strings we store, since they can "move" around until the point that we're done appending. I think this is the most common gotcha when writing / working with appenders, and it bit be once too. As you can see, the code is not completely obvious ;) > But those helpers make the problem rather painless. > I wonder if there's opportunity for improvement by having appender support > the ~ operator? Might be able to jig it to use natural concat syntax rather > than put()... Allowing put() take multiple arguments would be an improvement as well - not just in usability, but performance as well, since it would only need to check for overflow once for all arguments. I have this in my own appender: https://github.com/CyberShadow/ae/blob/master/utils/appender.d Rob Jacques was working on a Phobos appender replacement which also had this, I believe: http://d.puremagic.com/issues/show_bug.cgi?id=5813 Too bad nothing came out of the latter. |
April 12, 2013 Re: Was: Re: Vote for std.process | ||||
---|---|---|---|---|
| ||||
Posted in reply to Vladimir Panteleev | > void oldWay(string[string] environment)
> {
> auto envz = new const(char)*[environment.length + 1];
> int pos = 0;
> foreach (var, val; environment)
> envz[pos++] = (var~'='~val~'\0').ptr;
>
> exec(envz);
> }
>
> void newWay(string[string] environment)
> {
> auto buf = staticAppender!(4096, char)();
> StaticArray!(64, size_t) envpBuf;
> size_t[] envp = envpBuf.get(environment.length + 1);
>
> size_t pos;
> foreach (var, val; environment)
> {
> envp[pos++] = buf.data.length;
> buf.put(var);
> buf.put('=');
> buf.put(val);
> buf.put('\0');
> }
>
> // Convert offsets to pointers in-place
> auto envz = cast(const(char)*[])envp;
> foreach (n; 0..pos)
> envz[n] += cast(size_t)buf.data.ptr;
>
> exec(envz);
> }
I just thought of something!
I don't comment here often, but I want to express my opinion:
Currently D, being a system language, offers full control over the allocation method, be it the stack or the heap. The helpers above show how flexible it is in doing custom optimized stuff if one wants to. But there's an obvious drawback, quoting Vladimir:
"the code is quite more verbose, even with the helper types. It's no longer obvious at a glance what the code is doing."
But think for a moment - does the programmer usually needs to choose the allocation method? Why explicit is default?
It could be the other way around:
void oldWay(string[string] environment)
{
const(char)* envz[environment.length + 1]; // Compiler decides whether to use stack or heap.
int pos = 0;
foreach (var, val; environment)
envz[pos++] = (var~'='~val~'\0').ptr; // Use appender. Use stack and switch to heap if necessary
exec(envz);
}
How nifty would that be, don't you think?
Another benefit could be that the compiler could adjust the stack allocation limit per architecture, and probably it could be defined as a command line parameter, e.g. when targeting a low end device.
This principle could work on fields other than allocation, e.g. parameter passing by ref/value. The programmer needs to only specify whether he wants a copy or the actual value, and the compiler would decide the optimal way to pass it.
e.g.:
immutable int n;
func(n); // by value
immutable int arr[30];
func(arr); // by ref
But these are probably suggestions for D3 or something... Too drastic :)
|
April 13, 2013 Re: Was: Re: Vote for std.process | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mr. Anonymous | After some Googling, I found out a similar technique already exists: auto_buffer in C++. http://goo.gl/3RLK6 I think it would be perfect to make it the default allocation method in D. |
April 13, 2013 Re: Was: Re: Vote for std.process | ||||
---|---|---|---|---|
| ||||
Posted in reply to Mr. Anonymous | > Someone (bearophile?) once suggested static arrays whose length is determined at runtime, which would > be a great addition to the language: > void foo(int n) { int[n] myArr; } That would be great. Question about the type: it won't be int[n] as n is runtime not compile time. would it be int[] or some other type? pros of some other type: makes some optimizations that benefit from the fact it's stack allocated possible cons: we need to be able to reuse existing algos that don't make such distinction. maybe another type that would satisfy both type traits: isDynamicArray!T isAllocaArray!T On Sat, Apr 13, 2013 at 4:19 AM, Mr. Anonymous <mailnew4ster@gmail.com> wrote: > After some Googling, I found out a similar technique already exists: > auto_buffer in C++. > http://goo.gl/3RLK6 > > I think it would be perfect to make it the default allocation method in D. |
Copyright © 1999-2021 by the D Language Foundation