V2 string
July 04, 2007
I'm converting Bud to compile using V2 and so far it's been a very hard thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all over the place, which is exactly what I thought would happen. Bud does a lot of text manipulation so having 'string' as invariant means that calls to functions that return string need to often be .dup'ed because I need to assign the result to a malleable variable.

I might have to rethink the design of the application to avoid the performance hit of all these dups.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
July 04, 2007
Derek Parnell wrote:
> I'm converting Bud to compile using V2 and so far it's been a very hard
> thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
> over the place, which is exactly what I thought would happen. Bud does a
> lot of text manipulation so having 'string' as invariant means that calls
> to functions that return string need to often be .dup'ed because I need to
> assign the result to a malleable variable. 
> 
> I might have to rethink the design of the application to avoid the
> performance hit of all these dups.
> 

First of all, if you were returning string literals as char[] and trying to manipulate them, they'd fail on linux at run time (because string literals are put into read only segments).

Second, you can use char[] instead of string.
July 04, 2007
On Wed, 04 Jul 2007 15:48:45 -0700, Walter Bright wrote:

> Derek Parnell wrote:
>> I'm converting Bud to compile using V2 and so far it's been a very hard thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all over the place, which is exactly what I thought would happen. Bud does a lot of text manipulation so having 'string' as invariant means that calls to functions that return string need to often be .dup'ed because I need to assign the result to a malleable variable.
>> 
>> I might have to rethink the design of the application to avoid the performance hit of all these dups.
>> 
> 
> First of all, if you were returning string literals as char[] and trying to manipulate them, they'd fail on linux at run time (because string literals are put into read only segments).

But I'm not, and never have been, returning string literals anywhere.

> Second, you can use char[] instead of string.

The idiom I'm using is that functions that receive text have those parameters as 'string' to guard against the function inadvertently modifying that which is passed, and functions that return text return 'string' to guard against calling functions inadvertently modifying data that they did not create (own).

This leads to constructs like ...

   char[] result;

   result = SomeTextFunc(data).dup;
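
A minimal sketch of the idiom described above, written for a current D compiler (where 'invariant' is spelled 'immutable' and 'string' is an alias for immutable(char)[]). The function someTextFunc is a hypothetical stand-in for SomeTextFunc; its body is an assumption made only so the example runs.

```d
import std.stdio : writeln;

// Hypothetical stand-in for SomeTextFunc: accepts and returns read-only text.
string someTextFunc(string data)
{
    return data ~ "!";
}

void main()
{
    // Read-only use: the result can be kept as-is, no copy needed.
    string view = someTextFunc("abc");

    // Mutable use: .dup allocates a fresh char[] that the caller owns.
    char[] result = someTextFunc("abc").dup;
    result[0] = 'X';            // only the copy changes

    // Going back to read-only text takes .idup (another allocation).
    string frozen = result.idup;

    assert(view == "abc!");
    assert(result == "Xbc!");
    assert(frozen == "Xbc!");
    writeln(view, " ", result, " ", frozen);
}
```

Each .dup or .idup is one allocation plus a copy, which is the cost being weighed in this post.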

Another commonly used idiom that I had to stop using was ...

   char[] text;
   text = getvalue();
   if (wrongvalue(text))
       text = ""; // Reset to an empty string

I now code ...

       text.length = 0; // Reset to an empty string

which is slightly less readable.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
July 05, 2007
On Thu, 05 Jul 2007 02:23:11 +0300, Derek Parnell <derek@psych.ward> wrote:

> This leads to constructs like ...
>
>    char[] result;
>
>    result = SomeTextFunc(data).dup;

Is SomeTextFunc allocating a copy of the string which it is returning? If it is, then there's no reason why it should return a "string" type. If it isn't, then modifying the data in the returned char[] could have unforeseen consequences.

> Another commonly used idiom that I had to stop using was ...
>
>    char[] text;
>    text = getvalue();
>    if (wrongvalue(text))
>        text = ""; // Reset to an empty string

Since empty string literals don't really point to data, I'd suggest that empty string and array literals shouldn't be const/invariant in favor of the above example. It breaks some consistency, but "a foolish consistency is the hobgoblin of little minds" ;)

-- 
Best regards,
  Vladimir                          mailto:thecybershadow@gmail.com
July 05, 2007
On Thu, 05 Jul 2007 04:44:41 +0300, Vladimir Panteleev wrote:

> On Thu, 05 Jul 2007 02:23:11 +0300, Derek Parnell <derek@psych.ward> wrote:
> 
>> This leads to constructs like ...
>>
>>    char[] result;
>>
>>    result = SomeTextFunc(data).dup;
> 
> Is SomeTextFunc allocating a copy of the string which it is returning? If it is, then there's no reason why it should return a "string" type. If it isn't, then modifying the data in the returned char[] could have unforeseen consequences.

Yes, I realize this and I'm not saying it's doing the wrong thing, and actually I'm not even complaining. I'm just letting people know some of the observations I've had in moving to v2. In this case, someone has to copy the resulting data - either the function that created it or the routine that called the function. If the called function does the duplication, it could be a waste if the calling function is not going to further modify it, that is why I elected to pass a 'const' reference to the new data. The calling function can then decide if it needs a copy (to modify it) or not.

   string result;
   result = SomeTextFunc(data); // no need to dup if I'm not changing it.


I've got a set of aliases to help me ...

   alias char[]  text;
   alias wchar[] wtext;
   alias dchar[] dtext;

so now I see 'text' as mutable and 'string' as immutable.

>> Another commonly used idiom that I had to stop using was ...
>>
>>    char[] txt;
>>    txt = getvalue();
>>    if (wrongvalue(txt))
>>        txt = ""; // Reset to an empty string
> 
> Since empty string literals don't really point to data, I'd suggest that empty string and array literals shouldn't be const/invariant in favor of the above example. It breaks some consistency, but "a foolish consistency is the hobgoblin of little minds" ;)

Nice idea, but I can't see it happening because of the inconsistency angle.

Instead I've decided to use the idiom ...

    text txt;
    txt = getvalue();
    if (wrongvalue(txt))
        txt = text.init; // Reset to an empty string
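
A small sketch of the reset idioms discussed here, assuming a current D compiler. The alias uses today's 'alias name = type' syntax rather than the form shown above, and the sample value is made up for illustration.

```d
alias text = char[];    // mutable text, as in the alias set above

void main()
{
    text txt = "bad value".dup;

    // txt = "";        // does not compile: "" is a read-only string, txt is char[]

    txt = text.init;    // the default value of a dynamic array is null
    assert(txt.length == 0);
    assert(txt is null);

    txt = "bad value".dup;
    txt.length = 0;     // the earlier workaround: also leaves txt empty
    assert(txt.length == 0);
}
```

Both forms leave txt with length 0; neither needs a mutable copy of a literal.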

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
5/07/2007 3:52:27 PM
July 05, 2007
Derek Parnell wrote:
> I'm converting Bud to compile using V2 and so far it's been a very hard
> thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all
> over the place, which is exactly what I thought would happen. Bud does a
> lot of text manipulation so having 'string' as invariant means that calls
> to functions that return string need to often be .dup'ed because I need to
> assign the result to a malleable variable. 

So just use char[] instead of 'string'.  I don't plan to use the aliases much either.


Sean
July 05, 2007
On Thu, 05 Jul 2007 00:15:41 -0700, Sean Kelly wrote:

> Derek Parnell wrote:
>> I'm converting Bud to compile using V2 and so far it's been a very hard thing to do. I'm finding that I'm now having to use '.dup' and '.idup' all over the place, which is exactly what I thought would happen. Bud does a lot of text manipulation so having 'string' as invariant means that calls to functions that return string need to often be .dup'ed because I need to assign the result to a malleable variable.
> 
> So just use char[] instead of 'string'.  I don't plan to use the aliases much either.

It's not so clear cut. Firstly, a lot of phobos routines now return 'string' results and expect 'string' inputs. Secondly, I like the idea of general purpose functions returning 'const' data, because it helps guard against inadvertent modifications by the calling routines. It is up to the calling function to explicitly decide if it is going to modify returned stuff or not.

For example, if I know that I'll not need to modify the 'fullpath' then I might do this ...

   string fullpath;

   fullpath = CanonicalPath(shortname);


However, if I might need to update it ...

   char[] fullpath;

   fullpath = CanonicalPath(shortname).dup;
   version(Windows)
   {
      setLowerCase(fullpath);
   }

The point is that the 'CanonicalPath' function hasn't got a clue what the calling function is intending to do with the result so it is trying to be responsible by guarding it against mistakes by the caller.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
5/07/2007 5:17:33 PM
July 05, 2007
Derek Parnell wrote:
> The idiom I'm using is that functions that receive text have those
> parameters as 'string' to guard against the function inadvertently
> modifying that which is passed, and functions that return text return
> 'string' to guard against calling functions inadvertently modifying data
> that they did not create (own).
> 
> This leads to constructs like ...
> 
>    char[] result;
> 
>    result = SomeTextFunc(data).dup;

If you're needing to guard against inadvertent modification, that's just what const strings are for. I'm not understanding the issue here.

> Another commonly used idiom that I had to stop using was ...
> 
>    char[] text;
>    text = getvalue();
>    if (wrongvalue(text))
>        text = ""; // Reset to an empty string
> 
> I now code ...
> 
>        text.length = 0; // Reset to an empty string
> 
> which is slightly less readable.

This should do it nicely:

	text = null;
July 05, 2007
Derek Parnell wrote:
> However, if I might need to update it ...
> 
>    char[] fullpath;
> 
>    fullpath = CanonicalPath(shortname).dup;
>    version(Windows)
>    {
>       setLowerCase(fullpath);
>    }
> 
> The point is that the 'CanonicalPath' function hasn't got a clue what the
> calling function is intending to do with the result so it is trying to be
> responsible by guarding it against mistakes by the caller.

If you write it like this:

string fullpath;

fullpath = CanonicalPath(shortname);
version(Windows)
{
      fullpath = std.string.tolower(fullpath);
}

you won't need to do the .dup.
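
The two approaches can be put side by side. This is a sketch under stated assumptions: canonicalPath is a hypothetical stand-in for CanonicalPath with an invented body, and toLower / toLowerInPlace from std.uni are used in place of the std.string.tolower and setLowerCase calls shown in the posts, since those are the names available in current Phobos.

```d
import std.uni : toLower, toLowerInPlace;

// Hypothetical stand-in for CanonicalPath; the real function is not shown in the thread.
string canonicalPath(string shortName)
{
    return "C:\\Work\\" ~ shortName;
}

// Caller takes a mutable copy first, then edits it in place.
char[] lowerByCopy(string shortName)
{
    char[] fullpath = canonicalPath(shortName).dup;
    toLowerInPlace(fullpath);
    return fullpath;
}

// Caller keeps a string and rebinds it to the lowered result.
// The characters are read-only, but the variable itself can be reassigned.
string lowerByRebind(string shortName)
{
    string fullpath = canonicalPath(shortName);
    fullpath = toLower(fullpath);
    return fullpath;
}

void main()
{
    assert(lowerByCopy("Bud.D") == "c:\\work\\bud.d");
    assert(lowerByRebind("Bud.D") == "c:\\work\\bud.d");
}
```

The rebind version has no explicit .dup at the call site; any allocation happens inside the lowering function when the text actually changes.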
July 05, 2007
Walter Bright Wrote:
> Derek Parnell wrote:
> > The idiom I'm using is that functions that receive text have those parameters as 'string' to guard against the function inadvertently modifying that which is passed, and functions that return text return 'string' to guard against calling functions inadvertently modifying data that they did not create (own).
> > 
> > This leads to constructs like ...
> > 
> >    char[] result;
> > 
> >    result = SomeTextFunc(data).dup;
> 
> If you're needing to guard against inadvertent modification, that's just what const strings are for. I'm not understanding the issue here.
> 
> > Another commonly used idiom that I had to stop using was ...
> > 
> >    char[] text;
> >    text = getvalue();
> >    if (wrongvalue(text))
> >        text = ""; // Reset to an empty string
> > 
> > I now code ...
> > 
> >        text.length = 0; // Reset to an empty string
> > 
> > which is slightly less readable.
> 
> This should do it nicely:
> 
> 	text = null;

Aaargh!  You're confusing empty and non-existent (null) again!  <g>

In some cases there is an important difference between the two.  In this case maybe not; I don't really know.

Regan
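
To make the empty versus null distinction concrete, here is a short sketch for a current D compiler. The helper names isEmpty and isNull are invented for this example.

```d
// Hypothetical helpers, named only for this illustration.
bool isEmpty(string s) { return s.length == 0; }
bool isNull(string s)  { return s is null; }

void main()
{
    string empty = "";      // an existing string with no characters
    string none  = null;    // no string at all

    // Both have length 0, and == compares contents, so they look equal.
    assert(isEmpty(empty) && isEmpty(none));
    assert(empty == none);

    // 'is' compares identity: only the null one is null.
    assert(isNull(none));
    assert(!isNull(empty));
}
```

Code that only checks length or uses == cannot tell the two apart; code that tests with 'is null' can, which is where the difference starts to matter.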
