V2 string (page 5) - D Programming Language Discussion Forum

July 06, 2007

Posted by Regan Heath
in reply to Walter Bright

Walter Bright wrote:
> Regan Heath wrote:
>> Walter Bright Wrote:
>>> string fullpath;
>>>
>>> fullpath = CanonicalPath(shortname);
>>> version(Windows)
>>> {
>>>        fullpath = std.string.tolower(fullpath);
>>> }
>>>
>>> you won't need to do the .dup .
>>
>> Because tolower does it for you, but it still returns string
> 
> tolower only dups the string if it needs to. It won't dup a string that is already in lower case.

Sure, but there is still a case where it does dup.  (dup #1)

>> and if for example you need to add something to the end of the path, like a filename, you will end up doing yet another dup somewhere.
> 
> Concatenating strings does not require a .dup.

opCatAssign does. (dup #2)

OR

newString = constString ~ bitToAdd; (is a copy of constString to newString which is essentially a dup) (dup #2)

So, the worst case scenario is that 2 dups are done.

Further, if the input is char[] you can still hit this worst case, because tolower returns string instead of char[].  A templated version gives you a much more efficient tolower for char[].

Regan

July 06, 2007

Posted by Regan Heath
in reply to Walter Bright

Proof of concept.

Only duplicate when the input is 'string'. This allows more efficient handling of char[] parameters: callers can pass a mutable char[] parameter, receive the result as a mutable char[], and avoid future dup calls on the returned data.

Output:
sStringM: 0x  416080 becomes 0x  880FD0 DUP
sCharM  : 0x  880FE0 becomes 0x  880FE0 SAME
sString : 0x  416110 becomes 0x  416110 SAME
sChar   : 0x  880FC0 becomes 0x  880FC0 SAME


Code:
# /*
#  * Common Public License Version 1.0
#  * http://www.opensource.org/licenses/cpl1.0.php
#  */
# import std.stdio;
# import std.uni;   // isUniUpper, toUniLower
# import std.utf;   // encode
#
# void main()
# {
# 	string sStringM = "tEsT";
# 	char[] sCharM = sStringM.dup;
# 	string rStringM = .tolower(sStringM);
# 	char[] rCharM = .tolower(sCharM);
# 	
# 	writefln("sStringM: 0x%08x becomes 0x%08x %s", sStringM.ptr, rStringM.ptr, (sStringM.ptr!=rStringM.ptr)?"DUP":"SAME");
# 	writefln("sCharM  : 0x%08x becomes 0x%08x %s", sCharM.ptr, rCharM.ptr, (sCharM.ptr!=rCharM.ptr)?"DUP":"SAME");
#
# 	string sString = "test";
# 	char[] sChar = sString.dup;
# 	string rString = .tolower(sString);
# 	char[] rChar = .tolower(sChar);
#
# 	writefln("sString : 0x%08x becomes 0x%08x %s", sString.ptr, rString.ptr, (sString.ptr!=rString.ptr)?"DUP":"SAME");
# 	writefln("sChar   : 0x%08x becomes 0x%08x %s", sChar.ptr, rChar.ptr, (sChar.ptr!=rChar.ptr)?"DUP":"SAME");
# }
#
# T tolower(T)(T s)
# {
#     bool changed;
#     char[] r;
#
#     if (is(typeof(s) == char[]))
#     {
#     	changed = true;
#     	r = cast(char[])s;
#     }
#
#     for (size_t i = 0; i < s.length; i++)
#     {
# 	auto c = s[i];
# 	if ('A' <= c && c <= 'Z')
# 	{
# 	    if (!changed)
# 	    {
# 		r = s.dup;
# 		changed = true;
# 	    }
# 	    r[i] = cast(char) (c + (cast(char)'a' - 'A'));
# 	}
# 	else if (c >= 0x7F)
# 	{
# 	    foreach(size_t j, dchar dc; s[i .. length])
# 	    {
# 		if (std.uni.isUniUpper(dc))
# 		{
# 		    dc = std.uni.toUniLower(dc);
# 		    if (!changed)
# 		    {
# 			r = s[0 .. i + j].dup;
# 			changed = true;
# 		    }
# 		}
# 		if (changed)
# 		{
# 		    if (r.length != i + j)
# 			r = r[0 .. i + j];
# 		    std.utf.encode(r, dc);
# 		}
# 	    }
# 	    break;
# 	}
#     }
#     return changed ? r : s;
# }
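
The same idea can be sketched in present-day D. This is only an illustration under stated assumptions: the name lowerAscii is hypothetical, it handles ASCII only, and it uses std.ascii and std.exception.assumeUnique rather than the 2007 Phobos functions in the post above. Mutable input is lowered in place and returned as char[]; immutable input is copied only when a change is needed.

```d
import std.ascii : isUpper, toLower;
import std.exception : assumeUnique;

// Hypothetical name; ASCII-only to keep the sketch short.
T lowerAscii(T)(T s)
    if (is(T == string) || is(T == char[]))
{
    static if (is(T == char[]))
    {
        // Mutable input: the caller owns the data, so lower it in place.
        foreach (ref c; s)
            c = toLower(c);
        return s;
    }
    else
    {
        // Immutable input: copy only once an upper-case character is found.
        foreach (i, c; s)
        {
            if (isUpper(c))
            {
                char[] r = s.dup;
                foreach (ref rc; r[i .. $])
                    rc = toLower(rc);
                // r has no other references, so it can become immutable.
                return assumeUnique(r);
            }
        }
        return s; // already lower case, no copy made
    }
}

void main()
{
    string mixed = "tEsT";
    char[] mutableMixed = mixed.dup;
    string lower = "test";

    assert(lowerAscii(mixed) == "test");
    assert(lowerAscii(mixed).ptr !is mixed.ptr);             // copied
    assert(lowerAscii(mutableMixed).ptr is mutableMixed.ptr); // in place
    assert(mutableMixed == "test");
    assert(lowerAscii(lower).ptr is lower.ptr);               // untouched
}
```

The compile-time branch (static if) means the char[] instantiation never contains the copying path at all, which is the efficiency argument made above.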

July 06, 2007

Re: V2 string (general issues)

Posted by Kristian Kilpi
in reply to Walter Bright

On Thu, 05 Jul 2007 22:11:37 +0300, Walter Bright <newshound1@digitalmars.com> wrote:
> Kristian Kilpi wrote:
>> First, I am wondering why some functions are formed as follows:
>> (but I'm sure someone will (hopefully) enlighten me about that ;) )
>>    string foo(string bar);
>>  That is, if they return something other than 'bar' (they do some string manipulation).
>> Shouldn't they return char[] instead?
>
> No, because then they must always dup the string. If they don't need to dup the string, they can return a reference to the parameter, and if so, it must be const.
>
>> There should be two different functions, one for each group:
>>    char[] tolower(char[] str);  //modifies and returns 'str'
>>    char[] getlower(string str);  //returns a copy
>
> When one would use a mutating tolower, one is already manipulating the contents of a string character by character. In such cases, one can tolower the characters in that process, instead of doing it later (the former will be more efficient anyway, and the only advantage to a mutating tolower is an efficiency improvement).

That makes sense (especially with strings).

Of course, as said, it's not a perfect solution because
unnecessary .dupping can occur.

For example:

  s = "blah " ~ foo(tolower(str).dup);

'foo()' modifies its input string and returns it.

If 'foo' were a copy-on-write function, you could just do:

  s = "blah " ~ foo(tolower(str));

That's much nicer, but 'str' could be copied twice in both of the cases above.
If both 'foo()' and 'tolower()' modified 'str' in place, no copying
would be done (by these functions).

Well, it's just a matter of how you like to code and build things.
Both ways have their own pros and cons.

July 06, 2007

Posted by Leandro Lucarella
in reply to Derek Parnell

Derek Parnell, on July 6 at 14:23, you wrote to me:
> On Thu, 05 Jul 2007 20:58:11 -0700, Walter Bright wrote:
> 
> > James Dennett wrote:
> >> I've found many times when the difference between an empty string and no string was important; they generally have nothing to do with extending at all.  I'd be interested to know why you assert that no such cases exist.
> > 
> > I'd like to know of such cases.
> 
>   char[] Option;
> 
>   Option = getOptionFromUser();
>   if (Option.ptr is null)
>   {
>    Option = DefaultOption;
>   }
> 
> However, if the user sets the option to "" then that is what they want and not the default one.

Basically, it's the same issue as NULL versus NOT NULL in SQL...

-- 
LUCA - Leandro Lucarella - Usando Debian GNU/Linux Sid - GNU Generation
------------------------------------------------------------------------
E-Mail / JID:     luca@lugmen.org.ar
GPG Fingerprint:  D9E1 4545 0F4B 7928 E82C  375D 4B02 0FE0 B08B 4FB2
GPG Key:          gpg --keyserver pks.lugmen.org.ar --recv-keys B08B4FB2
------------------------------------------------------------------------

July 06, 2007

Posted by Sean Kelly
in reply to Bill Baxter

Bill Baxter wrote:
> 
> Anyway googling for "null versus empty" turns up a bevy of hits, so from that I think we can presume that the distinction is important to a non-empty subset of programmers.

Either that or it's important to a non-null set of programmers.


;-) Sean

July 07, 2007

Posted by James Dennett
in reply to Walter Bright

Walter Bright wrote:
> James Dennett wrote:
>> I've found many times when the difference between an empty string and no string was important; they generally have nothing to do with extending at all.  I'd be interested to know why you assert that no such cases exist.
> 
> I'd like to know of such cases.

Any time you need a difference between "specified, and
known to be empty" and "unspecified or unknown", which
is very common.  The alternative is to carry a boolean
around to say whether the string is in use.

Others have raised the case of null meaning "use default" (but let's not spend too much time on that specific case), and the fact that the database world often (though not always) distinguishes null from empty.  Many people have found good reason to do this.  The "Maybe" or "Fallible" type constructors used in other languages also cover cases where "absent" can usefully be handled separately from "empty" (in more general cases than just strings).

-- James
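
A minimal sketch of the "Maybe"-style alternative in D, assuming std.typecons.Nullable as the wrapper (the thread does not name a specific D type, and the helper resolve is hypothetical). It keeps "absent" and "empty" apart without relying on a null array:

```d
import std.typecons : Nullable;

// Hypothetical helper: picks the fallback only when no value was given.
string resolve(Nullable!string opt, string fallback)
{
    return opt.isNull ? fallback : opt.get;
}

void main()
{
    Nullable!string unset;              // "unspecified or unknown"
    auto empty = Nullable!string("");   // "specified, and known to be empty"

    assert(unset.isNull);
    assert(!empty.isNull);
    assert(resolve(unset, "default") == "default");
    assert(resolve(empty, "default").length == 0);
}
```

This is the typed equivalent of carrying a boolean alongside the string.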

July 07, 2007

Posted by Serg Kovrov
in reply to Walter Bright

Walter Bright wrote:
> James Dennett wrote:
>> I've found many times when the difference between an empty
>> string and no string was important; they generally have
>> nothing to do with extending at all.  I'd be interested to
>> know why you assert that no such cases exist.
> 
> I'd like to know of such cases.

I am used to this pattern:
void foo(char[] bar=null)
{
    if (bar is null)
        m_bar = "default_value";
    else
        m_bar = bar; // even if it's empty
}

often as a one-liner:
m_bar = (bar is null) ? "default_value" : bar;

This is the most used one (at least by me), but of course there are more.


-- serg.
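
For illustration, a small self-contained sketch of how this pattern behaves in current D. The struct Holder is hypothetical and the parameter is typed string so that a literal can be passed. The identity test `is null` keeps null and "" apart, whereas `==` treats them as equal:

```d
// Hypothetical holder mirroring the pattern above.
struct Holder
{
    string m_bar;

    void foo(string bar = null)
    {
        // `is null` compares pointer and length, so "" does not match;
        // `bar == null` would compare contents and match "" as well.
        m_bar = (bar is null) ? "default_value" : bar;
    }
}

void main()
{
    Holder h;

    h.foo();                       // no argument: default is used
    assert(h.m_bar == "default_value");

    h.foo("");                     // empty but not null: kept as given
    assert(h.m_bar.length == 0);

    string none = null;
    assert(none is null);
    assert("" !is null);           // the literal has a non-null pointer
    assert(none == "");            // equality cannot tell them apart
}
```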

July 07, 2007

Posted by Bruno Medeiros
in reply to Regan Heath

Regan Heath wrote:
> Bruno Medeiros wrote:
>>>> The current signature:
>>>>   const(char)[] tolower(const(char)[] str)
>>>> is kinda incorrect, because it returns a const reference for an array that has no mutable references, and that is the same as an invariant reference, so tolower might as well return invariant(char)[].
>>>
>>> Again, that only holds if a copy was actually made at run time. If no copy was made the original input is returned, to which there may be mutable references.
>>
>> You're right, if a copy is not made *every* time (which is the case
>> after all), then the above doesn't hold.
>> But then, what I think is happening is that Phobos's current tolower is
>> suboptimal in terms of usefulness, because of the fact that we don't know
>> if a new copy is made or not. I'm wondering now what would be the more
>> useful form, or forms, of tolower (and similar functions) to have.
>> Now that I think of it again (admittedly I haven't got much experience with string manipulation in C++ or D, though), but perhaps the best form is an in-place mutable version:
>>   char[] tolower(char[] str);
>> And it's this one after all that is the most general form. If you want to call tolower on a const or invariant array you dup it yourself on the call:
>>   char[] str = tolower("FOO".dup);
> 
> True.. but it's unfortunate that the most efficient case, where no duplication is needed, is no longer possible :(
> 

Algorithms should care about worst-case or average-case performance. That most efficient "case", where a string is already lower case, is a minority case in most applications, and is never a worst-case scenario. So why bother?
Also, implementing tolower like that would cause other performance problems, like this one:

> The only problem is that the case where you pass const data and it has to dup, you get back a const reference to a piece of data with no other owner (meaning it doesn't need to be const) which might cause another dup in your code at a later point.
> 
> Regan

Indeed, in such a scenario, you would end up with worse performance overall.

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

July 07, 2007

Posted by Bruno Medeiros
in reply to Walter Bright

Walter Bright wrote:
> Derek Parnell wrote:
>> Let's say that there is this library routine, which is closed source and I
>> don't have access to its source, that accepts a string as its argument.
>> Further more, if that passed string is null the routine uses a default
>> value - whatever that is because I don't know it. Now in my code I call it
>> with ...
>>
>>    SomeFunc("");   -- Use an empty string to do its magic
>>    SomeFunc(null); -- But this time, use the default value
>>
>> Remember, I have no control over the SomeFunc routine's implementation.
> 
> Of course, if a function is documented to behave that way, and you have no control over it, you must adhere to its documentation.
> 
> There are other ways to do default arguments. I suspect we could argue about it like we could argue about tab stops, and never reach any sort of resolution <g>.

Uh, unlike tab stops, I think it is widely recognized by the developer community that it is useful to have a distinction between *valid* and *invalid* values of something.

Why is there a NaN for floats (and in D, NaN is the default value for floats)? What if NaN were equal to zero? Didn't you yourself, Walter, once say that if there were a way to have an actual invalid value for ints (without sacrificing precision) you would like to have that, and you would make it the default value for int, instead of 0 (which is a valid int)?
So why shouldn't arrays (which are already reference types) have a value that means "invalid array", especially if we can get that for free (unlike ints)?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
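
The default initializers this argument rests on can be checked with a short sketch. It assumes a current D compiler and uses std.math.isNaN:

```d
import std.math : isNaN;

void main()
{
    float f;    // float.init is NaN, an "invalid" value
    int i;      // int.init is 0, a perfectly valid int
    char[] a;   // array .init is null, with length 0

    assert(isNaN(f));
    assert(f != f);      // NaN never compares equal, even to itself
    assert(i == 0);
    assert(a is null);
    assert(a.length == 0);
}
```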

Copyright © 1999-2021 by the D Language Foundation