toUTFz and WinAPI GetTextExtentPoint32W (page 2) - D Programming Language Discussion Forum

September 20, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Andrej Mitrovic

On 9/20/11, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> Or std.range.walkLength. I don't know why we really have std.utf.count. It
> just calls walkLength anyway. I suspect that it's a function that predates
> walkLength and was made to use walkLength after walkLength was introduced.
> But it's kind of pointless now.
>
> - Jonathan M Davis

I don't think having better-named aliases is a bad thing. Although now I see that it's not just an alias but a function.

What exactly is the "static if (E.sizeof < 4)" in there for, by the way? When would the element type exceed 4 bytes while still passing the isSomeChar constraint, and in that case why not stop compilation instead of returning "s.length"?

September 20, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Andrej Mitrovic

One other thing: count can only take an array, which seems too restrictive since walkLength can take any range at all. So maybe count should just be an alias for walkLength, or it could possibly be removed (I'm against removing it entirely because I already use it in code and I think the name makes sense).
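The difference Andrej describes, walkLength accepting any finite input range rather than only arrays, can be illustrated with a small D sketch. It assumes current Phobos, where iterating a narrow string yields decoded dchar code points:

```d
import std.algorithm.iteration : filter;
import std.range : iota, walkLength;

unittest
{
    // A lazy, filtered range: not an array and without .length,
    // so a function that only accepts arrays could not count it.
    assert(iota(10).filter!(x => x % 2 == 0).walkLength == 5);

    // Filtering a string yields a lazy range of decoded code points,
    // which walkLength can still count.
    assert("naïve café".filter!(c => c != ' ').walkLength == 9);
}
```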

September 20, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Jonathan M Davis

On Tuesday, September 20, 2011 14:43 Andrej Mitrovic wrote:
> I don't think having better-named aliases is a bad thing. Although now I'm seeing it's not just an alias but a function.

We specifically avoid adding aliases to Phobos just to provide alternate function names. Aliases need to actually be useful, or they shouldn't be there.

> What exactly is the "static if (E.sizeof < 4)" in there for btw? When would the element type exceed 4 bytes while still passing the isSomeChar contract, and then why not stop compilation at that point instead of return "s.length"?

The static if is there to special-case narrow strings. It's unnecessary now (though it does eliminate a function call when -inline isn't used). It would have been necessary before count started forwarding to walkLength, but it isn't anymore.
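For readers following the static if question, here is a minimal sketch of the shape being discussed. It is reconstructed from the description in this thread, not copied from Phobos, and the name countSketch is hypothetical. Narrow strings (char, wchar) must be decoded to count code points, while a dchar array already holds one code point per element:

```d
import std.range : walkLength;
import std.traits : isSomeChar;

// Hypothetical reconstruction of the shape described in the thread,
// not the actual std.utf.count source.
size_t countSketch(E)(E[] s)
if (isSomeChar!E)
{
    static if (E.sizeof < 4)
        return s.walkLength; // char/wchar: decode to count code points
    else
        return s.length;     // dchar: one element per code point
}

unittest
{
    assert("héllo".countSketch == 5);  // 6 UTF-8 code units
    assert("héllo"w.countSketch == 5); // 5 UTF-16 code units
    assert("héllo"d.countSketch == 5); // 5 UTF-32 code units
}
```

Since walkLength already returns .length for ranges that have one, the dchar branch only saves a call, which is consistent with the static if being an optimization rather than a correctness requirement.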

> One other thing, count can only take an array which seems too restrictive since walkLength can take any range at all. So maybe count should be just an alias to walkLength or it should possibly be removed (I'm against fully removing it because I already use it in code and I think the name does make sense).

I don't know if we're going to remove std.utf.count or not, but it _is_ the kind of thing that we've been removing. It doesn't add any real value. It's just another function which does exactly the same thing as walkLength except that it's restricted to strings, and we don't generally like having pointless aliases around (or pointless function wrappers, which amounts to pretty much the same thing). So, it wouldn't surprise me at all if it goes away, but if/when it does, it'll go through the proper deprecation cycle rather than just being removed, so if/when we do that, it's not like your code would immediately break.
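For readers unfamiliar with the deprecation cycle mentioned here: in D, a symbol is typically first marked with the deprecated attribute, so existing callers still compile but receive a deprecation message. This is a hypothetical sketch (stringCount is an invented name, not a Phobos symbol), and it does not claim to show how Phobos handled count:

```d
import std.range : walkLength;

// Hypothetical string-only wrapper that is being phased out.
// Callers still compile; the compiler reports a deprecation message.
deprecated("Use std.range.walkLength instead")
size_t stringCount(string s)
{
    return s.walkLength;
}

// A deprecated unittest may call deprecated symbols without a message.
deprecated unittest
{
    assert(stringCount("héllo") == 5);
}
```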

- Jonathan M Davis

September 20, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Andrej Mitrovic

On 9/20/11, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> We specifically avoid having aliases in Phobos simply for having alternate function names. Aliases need to actually be useful, or they shouldn't be there.

And function names have to be useful to library users. walkLength is an awful name for something that returns the character count.

If you ask a GUI developer to look for a function that creates a rectangle path, you can be sure he'll start looking for Rectangle or DrawRect or something similar, and not "ClosedShapePointN!4" or something that generic.

September 20, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Jonathan M Davis

On Tuesday, September 20, 2011 15:10 Andrej Mitrovic wrote:
> On 9/20/11, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> > We specifically avoid having aliases in Phobos simply for having alternate function names. Aliases need to actually be useful, or they shouldn't be there.
> 
> And function names have to be useful to library users. walkLength is an awful name for something that returns the character count.
> 
> If you ask a GUI developer to look for a function that creates a rectangle path, you can be sure he'll start looking for Rectangle or DrawRect or something similar, and not "ClosedShapePointN!4" or something that generic.

In this case, if there's a problem it's not how generic the function is, it's the name walkLength. There's nothing special about strings which makes the name count better for them than it is for other ranges. The function is returning the number of elements in the range - be they code points or integers or whatever. The name walkLength works just as well for strings as it does for anything else. So, if there's a problem it's that the name walkLength isn't necessarily all that great. Strings aren't so special that they merit their own function name for the same functionality. So, if count stays, it's simply because it's been around for a while, not because it's inherently better to have a separate count function.
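The point that walkLength simply counts elements, whatever their type, can be seen in a short sketch. It assumes Phobos's auto-decoding, under which narrow strings are ranges of dchar code points, so walkLength and .length can differ for non-ASCII text:

```d
import std.range : walkLength;

unittest
{
    // Integers: walkLength and .length agree.
    int[] numbers = [10, 20, 30];
    assert(numbers.walkLength == 3 && numbers.length == 3);

    // UTF-8: 'é' takes two code units, so .length is 6 but there are 5 code points.
    string s8 = "héllo";
    assert(s8.length == 6 && s8.walkLength == 5);

    // UTF-16: 'é' fits in one code unit, so both counts are 5.
    wstring s16 = "héllo"w;
    assert(s16.length == 5 && s16.walkLength == 5);
}
```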

- Jonathan M Davis

September 20, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Christophe
in reply to Jonathan M Davis

"Jonathan M Davis", in message (digitalmars.D.learn:29637), wrote:
> On Tuesday, September 20, 2011 14:43 Andrej Mitrovic wrote:
>> I don't think having better-named aliases is a bad thing. Although now I'm seeing it's not just an alias but a function.
>

std.utf.count has one advantage: someone looking for the function will
find it. The programmer might not look in std.range for a function
about UTF strings, and even if he did, the walkLength documentation does not
say that it works with (narrow) strings the way it does. To know you can use
walkLength, you must know that:
- popFront works differently on strings;
- hasLength is not true for strings;
- what walkLength is.

So yes, experienced programmers don't need std.utf.count, but newbies do.
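The three facts in this list can be checked directly. A sketch against current Phobos, where front, popFront, hasLength and ElementType are reachable through std.range (which re-exports std.range.primitives):

```d
import std.range : ElementType, front, hasLength, popFront, walkLength;

// Narrow strings deliberately do not satisfy hasLength...
static assert(!hasLength!string);
// ...and their range element type is the decoded code point, dchar.
static assert(is(ElementType!string == dchar));

unittest
{
    string s = "é!";
    assert(s.length == 3);        // two code units for 'é', one for '!'
    assert(s.front == 'é');       // front decodes a whole code point
    s.popFront();                 // popFront skips every code unit of 'é'
    assert(s == "!");
    assert("é!".walkLength == 2); // so walkLength counts code points
}
```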

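Last point: walkLength is not optimized for strings. std.utf.count should be.

This short implementation of count was 3 to 8 times faster than walkLength in a simple benchmark:

// Counts the code points in a UTF-8 string without decoding or validating it:
// every byte is assumed to start a code point unless it is a continuation byte.
size_t myCount(string text)
{
    size_t n = text.length;
    foreach (i; 0 .. text.length)   // size_t index, so long strings are fine
    {
        // s holds the top two bits of the byte: 0, 1, 2 or 3.
        immutable s = text[i] >> 6;
        // (s >> 1) - ((s + 1) >> 2) is 1 only for s == 2 (10xxxxxx), else 0.
        n -= (s >> 1) - ((s + 1) >> 2);
    }
    return n;
}

(Compiled with GDC on 64-bit; the sample text was the introduction of the French Wikipedia article on UTF-8, down to the table of contents: http://fr.wikipedia.org/wiki/UTF-8.)

The reason is that the loop can be unrolled by the compiler.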
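The arithmetic in myCount can be checked by hand: s = byte >> 6 is 0, 1, 2 or 3, and (s >> 1) - ((s + 1) >> 2) evaluates to 0, 0, 1 and 0 respectively. It is 1 only for UTF-8 continuation bytes (10xxxxxx), so the function counts the bytes that start a code point. A small sketch (continuationTerm is a hypothetical helper) checks this exhaustively and compares the idea with walkLength on valid input:

```d
import std.algorithm.searching : count;
import std.range : walkLength;
import std.string : representation;

// Hypothetical helper: the per-byte term that myCount subtracts.
int continuationTerm(ubyte b)
{
    immutable s = b >> 6;             // top two bits: 0, 1, 2 or 3
    return (s >> 1) - ((s + 1) >> 2); // 0, 0, 1, 0
}

unittest
{
    // Exhaustive check: the term is 1 exactly for 10xxxxxx bytes.
    foreach (b; 0 .. 256)
        assert(continuationTerm(cast(ubyte) b) == ((b & 0xC0) == 0x80 ? 1 : 0));

    // On valid UTF-8, counting non-continuation bytes matches walkLength.
    string text = "Unicode, l'ensemble des caractères";
    assert(text.representation.count!(b => (b & 0xC0) != 0x80) == text.walkLength);
}
```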

September 21, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Timon Gehr
in reply to Christophe

On 09/21/2011 01:57 AM, Christophe wrote:
> Last point: walkLength is not optimized for strings.
> std.utf.count should be.
>
> This short implementation of count was 3 to 8 times faster than
> walkLength in a simple benchmark:
>
> [code snipped]
>
> (compiled with gdc on 64 bits; the sample text was the introduction of
> the French Wikipedia article on UTF-8, down to the table of contents:
> http://fr.wikipedia.org/wiki/UTF-8 ).
>
> The reason is that the loop can be unrolled by the compiler.

Very good point, you might want to file an enhancement request. It would make the functionality different enough to prevent count from being removed: walkLength throws on an invalid UTF sequence.

September 21, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Christophe
in reply to Timon Gehr

Timon Gehr, in message (digitalmars.D.learn:29641), wrote:
> Very good point, you might want to file an enhancement request. It would make the functionality different enough to prevent count from being removed: walkLength throws on an invalid UTF sequence.

I would be glad to do so, but I am quite new here, so I don't know how to. A little pointer could help.

-- 
Christophe

September 21, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Dmitry Olshansky
in reply to Timon Gehr

On 21.09.2011 4:04, Timon Gehr wrote:
> On 09/21/2011 01:57 AM, Christophe wrote:
>> Last point: walkLength is not optimized for strings.
>> std.utf.count should be.
>>
>> This short implementation of count was 3 to 8 times faster than
>> walkLength in a simple benchmark:
>>
>> [code snipped]
>>
>> (compiled with gdc on 64 bits; the sample text was the introduction of
>> the French Wikipedia article on UTF-8, down to the table of contents:
>> http://fr.wikipedia.org/wiki/UTF-8 ).
>>
>> The reason is that the loop can be unrolled by the compiler.
>
> Very good point, you might want to file an enhancement request. It would
> make the functionality different enough to prevent count from being
> removed: walkLength throws on an invalid UTF sequence.

Actually, I don't buy it. I guess the reason it's faster is that it doesn't check whether the code point is valid. In fact, you can easily get ridiculous overflowed "negative" lengths. Maybe we can put it here as an unsafe but fast version, though.
Also check std.utf.stride to see whether you can do better; it's the beast behind narrow-string popFront.
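To make the validity question concrete: because each byte subtracts at most one, the posted formula always yields a value between 0 and text.length, so on malformed input it returns a silently wrong count rather than a wrapped-around one. A checked alternative can step through the string with std.utf.stride, which throws UTFException when a sequence begins with an invalid lead byte. A sketch only; fastCount and strideCount are hypothetical names:

```d
import std.exception : assertThrown;
import std.utf : stride, UTFException;

// Hypothetical fast count using the same formula as myCount: no validation.
size_t fastCount(string text)
{
    size_t n = text.length;
    foreach (i; 0 .. text.length)
    {
        immutable s = text[i] >> 6;
        n -= (s >> 1) - ((s + 1) >> 2); // never more than 1 per byte
    }
    return n;
}

// Hypothetical checked count: steps by std.utf.stride, the primitive this
// thread describes as driving narrow-string popFront. stride throws
// UTFException when the byte at i is not a valid lead byte.
size_t strideCount(string text)
{
    size_t n, i;
    while (i < text.length)
    {
        i += stride(text, i);
        ++n;
    }
    return n;
}

unittest
{
    assert(fastCount("héllo") == 5 && strideCount("héllo") == 5);

    // Three stray continuation bytes: not valid UTF-8.
    immutable(ubyte)[] raw = [0x80, 0x80, 0x80];
    auto bad = cast(string) raw;
    assert(fastCount(bad) == 0); // silently wrong, but within 0 .. length
    assertThrown!UTFException(strideCount(bad));
}
```

stride only inspects the lead byte of each sequence, so this is not full validation; std.utf.validate checks a whole string and throws UTFException if it is malformed.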

-- 
Dmitry Olshansky

September 21, 2011

Re: toUTFz and WinAPI GetTextExtentPoint32W

Posted by Timon Gehr
in reply to Christophe

On 09/21/2011 02:15 AM, Christophe wrote:
> I would be glad to do so, but I am quite new here, so I don't know how
> to. A little pointer could help.
>

http://d.puremagic.com/issues/

You can tick 'Severity: enhancement request'. It would probably be best if it threw when the final result is larger than text.length, though.

Copyright © 1999-2021 by the D Language Foundation