String Prefix Predicate
August 14, 2014
What's the preferred way to check if a string starts with another string if the string is a

1. string (utf-8) BiDir
2. wstring (utf-16) BiDir
3. dstring (utf-32) Random
August 14, 2014
On Thursday, 14 August 2014 at 17:17:13 UTC, Nordlöw wrote:
> What's the preferred way to check if a string starts with another string if the string is a

Should I use std.algorithm.startsWith() in all cases?
August 14, 2014
On Thu, 14 Aug 2014 17:17:11 +0000, Nordlöw wrote:

> What's the preferred way to check if a string starts with another string if the string is a
>
> 1. string (utf-8) BiDir
> 2. wstring (utf-16) BiDir
> 3. dstring (utf-32) Random

std.algorithm.startsWith?  Should auto-decode, so it'll do a utf-32 comparison behind the scenes.
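As a minimal sketch of the suggestion above (the sample strings are made up for illustration), the same startsWith call can be used for all three string types, and also when the two arguments have different widths:

```d
import std.algorithm : startsWith;

void main()
{
    string  s = "héllo wörld";   // UTF-8 code units
    wstring w = "héllo wörld"w;  // UTF-16 code units
    dstring d = "héllo wörld"d;  // UTF-32 code units

    // Both arguments have the same width.
    assert(s.startsWith("héllo"c));
    assert(w.startsWith("héllo"w));
    assert(d.startsWith("héllo"d));

    // Mixed widths also compile; the elements are then compared as code points.
    assert(s.startsWith("héllo"w));
    assert(w.startsWith("héllo"d));
    assert(d.startsWith("héllo"c));

    // A non-prefix is rejected.
    assert(!s.startsWith("wörld"));
}
```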
August 14, 2014
On Thursday, 14 August 2014 at 17:33:41 UTC, Justin Whear wrote:
> std.algorithm.startsWith?  Should auto-decode, so it'll do a

What about https://github.com/D-Programming-Language/phobos/pull/2043

Auto-decoding should be avoided when possible.

I guess something like

whole.byDchar().startsWith(part.byDchar())

is preferred right?

If so is this what we will live with until Phobos has been upgraded to using pull 2043 in a few years?
August 14, 2014
On Thursday, 14 August 2014 at 17:41:08 UTC, Nordlöw wrote:
> On Thursday, 14 August 2014 at 17:33:41 UTC, Justin Whear wrote:
>> std.algorithm.startsWith?  Should auto-decode, so it'll do a
>
> What about https://github.com/D-Programming-Language/phobos/pull/2043
>
> Auto-decoding should be avoided when possible.
>
> I guess something like
>
> whole.byDchar().startsWith(part.byDchar())
>
> is preferred right?
>
> If so is this what we will live with until Phobos has been upgraded to using pull 2043 in a few years?

Except that you _have_ to decode in this case. Unless the string types match, there's no way around it. And startsWith won't decode if the string types match. So, I really see no issue in just straight-up using startsWith.

Where you run into problems with auto-decoding in Phobos functions is when a function results in a new range type. That forces you into a range of dchar, whether you wanted it or not. But beyond that, Phobos is actually pretty good about avoiding unnecessary decoding (though there probably are places where it could be improved). The big problem is that that requires special-casing a lot of functions, whereas that wouldn't be required with a range of char or wchar.

So, the biggest problems with automatic decoding are when a function returns a range of dchar when you wanted to operate on code units or when you write a function and then have to special case it for strings if you want to avoid the auto-decoding, whereas that's already been done for you with most Phobos functions.

- Jonathan M Davis
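To illustrate the point about new range types, here is a small sketch, assuming a Phobos version that ships std.utf.byCodeUnit. Wrapping a string in a generic range such as take yields dchar elements, while wrapping the code-unit view keeps char elements. The sample string is made up.

```d
import std.array : array;
import std.range : ElementType, take;
import std.utf : byCodeUnit;

void main()
{
    // A plain string is treated as a range of dchar by range primitives.
    static assert(is(ElementType!string == dchar));

    // take wraps the string in a new range type, so its elements are dchar.
    auto decoded = "héllo".take(2);
    static assert(is(ElementType!(typeof(decoded)) == dchar));
    assert(decoded.array == "hé"d);

    // Going through byCodeUnit first keeps the elements as char code units.
    auto units = "héllo".byCodeUnit.take(2);
    static assert(is(ElementType!(typeof(units)) : char));
    assert(units.length == 2); // 'h' plus the first byte of 'é'
}
```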
August 16, 2014
On Thursday, 14 August 2014 at 17:41:08 UTC, Nordlöw wrote:
> On Thursday, 14 August 2014 at 17:33:41 UTC, Justin Whear wrote:
>> std.algorithm.startsWith?  Should auto-decode, so it'll do a
>
> What about https://github.com/D-Programming-Language/phobos/pull/2043
>
> Auto-decoding should be avoided when possible.
>
> I guess something like
>
> whole.byDchar().startsWith(part.byDchar())
>
> is preferred right?

I don't get it? If you use "byDchar", you are *explicitly* decoding. How is that any better? If anything, you are *preventing* the (many) opportunities phobos has to *avoid* decoding when it can...

If you really want to avoid decoding, use either "representation", which will do a char[] => ubyte[] conversion, or "byCodeUnit", which will create a range that returns single elements (IMO, "byCodeUnit" should be preferred over "byChar", as it infers the correct width).
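A short sketch of the two options mentioned above, assuming std.string.representation and std.utf.byCodeUnit are available in the Phobos version in use. The variable names and strings are illustrative only. Note that for an immutable string, representation yields immutable(ubyte)[] rather than ubyte[].

```d
import std.algorithm : startsWith;
import std.string : representation;
import std.utf : byCodeUnit;

void main()
{
    string whole = "naïve text";
    string part  = "naï";

    // representation reinterprets the string as raw bytes without copying.
    immutable(ubyte)[] raw = whole.representation;
    assert(raw.startsWith(part.representation));

    // byCodeUnit gives a range whose elements are individual char code units.
    assert(whole.byCodeUnit.startsWith(part.byCodeUnit));
    assert(!whole.byCodeUnit.startsWith("text".byCodeUnit));
}
```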

August 16, 2014
On Saturday, 16 August 2014 at 20:59:47 UTC, monarch_dodra wrote:
> If anything, you are *preventing* the (many) opportunities phobos has to *avoid* decoding when it can...

By that I want to stress what Jonathan M Davis said
"Unless the string types match, there's no way around it."

You should absolutely realize that that means that when the string types (widths) *do* match, then "search" (which includes all flavors in phobos) will NOT decode.

Heck, if you do a "string, element" search, eg find("my phrase", someDchar), then phobos will *encode* someDchar into a correctly sized string, and then do a full non-decoding string-string search, which is actually much faster than the naive decoding search.
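For reference, a "string, element" search of the kind described above looks like this from the caller's side. This is only a usage sketch with made-up strings; whether the needle is encoded internally is an implementation detail described in the post, not something this code checks.

```d
import std.algorithm : find;

void main()
{
    string haystack = "Nordlöw wrote";
    dchar  needle   = 'ö';

    // find returns the rest of the haystack starting at the first match,
    // still as a string.
    string rest = haystack.find(needle);
    assert(rest == "öw wrote");

    // No match leaves an empty result.
    assert(haystack.find('z').length == 0);
}
```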
August 18, 2014
On Saturday, 16 August 2014 at 20:59:47 UTC, monarch_dodra wrote:
> I don't get it? If you use "byDchar", you are *explicitly* decoding. How is that any better? If anything, you are *preventing* the (many) opportunities phobos has to *avoid* decoding when it can...

byDchar and the like are lazy ranges, i.e. they don't allocate.

They also don't throw exceptions, which is preferable in some cases.

Read the details at
https://github.com/D-Programming-Language/phobos/pull/2043
August 18, 2014
On Monday, 18 August 2014 at 11:28:25 UTC, Nordlöw wrote:
> On Saturday, 16 August 2014 at 20:59:47 UTC, monarch_dodra wrote:
>> I don't get it? If you use "byDchar", you are *explicitly* decoding. How is that any better? If anything, you are *preventing* the (many) opportunities phobos has to *avoid* decoding when it can...
>
> byDchar and the like are lazy ranges, i.e. they don't allocate.

Lazy does NOT mean does not allocate. You are making a terrible mistake if you assume that.

Furthermore decoding does NOT allocate either. At worst, it can throw an exception, but that's exceptional.
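A minimal sketch of the "exceptional" case, assuming current Phobos behavior: explicit decoding of malformed UTF-8 throws a UTFException, while a same-type startsWith on the same data succeeds because it compares code units. The byte values are chosen only to force invalid input.

```d
import std.algorithm : startsWith;
import std.exception : assertThrown;
import std.utf : decode, UTFException;

void main()
{
    // 0xFF can never appear in well-formed UTF-8.
    immutable(ubyte)[] bytes = [0xFF, 0x61, 0x62];
    string bad = cast(string) bytes;

    // Explicit decoding hits the invalid sequence and throws.
    size_t index = 0;
    assertThrown!UTFException(decode(bad, index));

    // Both arguments are string, so the prefix test does not need to decode.
    assert(bad.startsWith(bad[0 .. 1]));
}
```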

> They also don't throw exceptions, which is preferable in some cases.

Even then, "startsWith(string1, string2)" will *NOT* decode. It will do a binary comparison of the code units. A fast one at that, since you'll use SIMD vector comparison. Because of this, it won't throw any exceptions either. This compiles just fine:
import std.algorithm : startsWith;

void main() nothrow
{
    // Both arguments are string, so no decoding (and no throwing) is involved.
    bool b = "foobar".startsWith("foo");
}


In contrast, with:
whole.byDchar().startsWith(part.byDchar())
You *will* decode. *THAT* will be painfully slow.

> Read the details at
> https://github.com/D-Programming-Language/phobos/pull/2043

If you are using a string, the only thing helpful in there is `byCodeUnit`. The rest is only useful if you have actual ranges.

If you are using phobos, you should really trust the implementation that decoding will only happen on an "as needed" basis.
August 18, 2014
On Monday, 18 August 2014 at 12:42:25 UTC, monarch_dodra wrote:
> On Monday, 18 August 2014 at 11:28:25 UTC, Nordlöw wrote:
>> On Saturday, 16 August 2014 at 20:59:47 UTC, monarch_dodra wrote:
>>> I don't get it? If you use "byDchar", you are *explicitly* decoding. How is that any better? If anything, you are *preventing* the (many) opportunities phobos has to *avoid* decoding when it can...
>>
>> byDchar and the like are lazy ranges, i.e. they don't allocate.
>
> Lazy does NOT mean does not allocate. You are making a terrible mistake if you assume that.

Ok, sorry about that. My mistake. And thanks for correcting me on this matter.

> Furthermore decoding does NOT allocate either. At worst, it can throw an exception, but that's exceptional.
>
>> They also don't throw exceptions, which is preferable in some cases.
>
> Even then, "startsWith(string1, string2)" will *NOT* decode. It will do a binary comparison of the code units. A fast one at that, since you'll use SIMD vector comparison. Because of this, it won't throw any exceptions either. This compiles just fine:
> import std.algorithm : startsWith;
>
> void main() nothrow
> {
>     bool b = "foobar".startsWith("foo");
> }

Ok, so decoding is needed only when whole and part have different encodings.

> In contrast, with:
> whole.byDchar().startsWith(part.byDchar())
> You *will* decode. *THAT* will be painfully slow.

Ok.

>
>> Read the details at
>> https://github.com/D-Programming-Language/phobos/pull/2043
>
> If you are using a string, the only thing helpful in there is `byCodeUnit`. The rest is only useful if you have actual ranges.

Actual ranges of... characters and strings? Could you give some examples? I'm curious.

> If you are using phobos, you should really trust the implementation that decoding will only happen on an "as needed" basis.

Ok, got it.