August 18, 2006 Operating with substrings in strings
Hi, I haven't found a function in the Phobos library that reads a block of chars of a given length from a string, taking the start index as a parameter. For example, given the word "hello", I want to read starting from index 1 and get a substring of length 2, so the result should be "el". I've seen this in other languages, where the function looks like: GetSubString(string mystring, int startindex, int length).

Is there a way to accomplish this?

Thx
August 18, 2006 Re: Operating with substrings in strings
Posted in reply to Heinz
Heinz wrote:
> Hi, i haven't found a function in the phobos lib to read a block of chars of a
> given length from a string taking the start index as a parameter, for example:
> we have the word "hello", i want to read starting from index 1 and i want this
> substring to have a length of 2, so the result should be "el", i've seen this
> in other languages, the function looks like: GetSubString(string mystring, int
> startindex, int length).
>
> Is there a way to acomplish this?
>
> Thx

Slicing:

char[] h = "hello";
char[] sub = h[1..3]; // Slice the string "hello"
writefln(sub); // Prints "el"

http://digitalmars.com/d/arrays.html#slicing

--
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org
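
For readers on current D (D2), a minimal sketch of the helper the original question describes, built on the slice operator shown above. The name getSubString is hypothetical and not a Phobos function; the sketch also assumes `string` in place of `char[]`, since D2 string literals are immutable and do not convert to a mutable `char[]`.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): returns `length` code units
// of `s` starting at index `start`, using the built-in slice operator.
string getSubString(string s, size_t start, size_t length)
{
    // A slice s[a .. b] covers indices a up to, but not including, b.
    return s[start .. start + length];
}

void main()
{
    string h = "hello";
    writeln(h[1 .. 3]);             // direct slice, prints "el"
    writeln(getSubString(h, 1, 2)); // same result via the helper
}
```

The slice refers to the same memory as the original string rather than copying it, and the indices count code units (bytes for `string`), which only matches character positions for ASCII text.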
August 18, 2006 Re: Operating with substrings in strings
Posted in reply to Kirk McDonald
> Slicing:
>
> char[] h = "hello";
> char[] sub = h[1..3] // Slice the string "hello"
> writefln(sub); // Prints "el"
>
> http://digitalmars.com/d/arrays.html#slicing
>
I do not know much about UTF-8, and I am often not sure whether I do string processing right. Can someone enlighten me?
If I have
char[] str = ... some multibyte utf8 chars;
what does str.length give me: the number of bytes, or the number of characters, found by looking at every character to see which ones are multi-byte?
If I do some slicing (str[3..4]), do the indices slice at these byte positions, with the risk of destroying the string, or does it look at the characters to find the start of the third UTF-8 character?
Or did I miss something completely?
August 18, 2006 Re: Operating with substrings in strings
Posted in reply to Frank Benoit
Frank Benoit wrote:
>
>> Slicing:
>>
>> char[] h = "hello";
>> char[] sub = h[1..3] // Slice the string "hello"
>> writefln(sub); // Prints "el"
>>
>> http://digitalmars.com/d/arrays.html#slicing
>>
>
> I do not know much about UTF8. And I am often not sure if I do string processing right. Can someone enlighten me?
>
> If I have
> char[] str = ... some multibyte utf8 chars;
>
> What does str.length give me. The number of bytes or the number of characters by looking at every character, which one are multi-bytes?

The number of bytes.

>
> If I do some slicing (str[3..4]), does the indices slice at these byte positions and I have the risk of destroying the string or does it look at the characters to find the start of the third utf8 character?

It counts the byte positions. And you are correct. You risk splitting in the middle of a utf-8 code sequence making the string invalid.

>
> Or did I miss something completely?

Not as far as I can tell. :)

/Oskar
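
To make the byte-versus-character distinction concrete, here is a small sketch, assuming current D (D2) and Phobos. It compares `.length` with a code point count and uses `std.utf.stride` to find a safe slice boundary. The sample string is an arbitrary choice for illustration.

```d
import std.stdio : writeln;
import std.range : walkLength;
import std.utf : stride;

void main()
{
    string s = "h\u00E4l";    // "häl"; U+00E4 occupies two bytes in UTF-8

    writeln(s.length);        // number of UTF-8 code units (bytes)
    writeln(s.walkLength);    // number of code points, found by decoding

    // stride(s, i) returns the byte length of the UTF-8 sequence that
    // starts at index i, so it can step from one boundary to the next.
    size_t first = stride(s, 0);       // length of 'h'
    size_t second = stride(s, first);  // length of U+00E4
    writeln(s[first .. first + second]); // slice on valid boundaries
}
```

Here `s.length` is 4 while `s.walkLength` is 3, which is the gap the question is about. Note that a code point is still not always what a user perceives as one character (combining marks, for instance), so this only addresses UTF-8 validity.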
August 18, 2006 Re: Operating with substrings in strings
Posted in reply to Oskar Linde
Oskar Linde wrote:
> Frank Benoit wrote:
>> What does str.length give me. The number of bytes or the number of characters by looking at every character, which one are multi-bytes?
>
> The number of bytes.
>
>> If I do some slicing (str[3..4]), does the indices slice at these byte positions and I have the risk of destroying the string or does it look at the characters to find the start of the third utf8 character?
>
> It counts the byte positions. And you are correct. You risk splitting in the middle of a utf-8 code sequence making the string invalid.
>
> /Oskar
char is a UTF-8 character. What is the difference from ubyte or an 'ascii/latin1/...' char if there is no native support?
If the functionality is in a library like Phobos' std.utf, ubyte/ushort/uint would work as well. (OK, the init values are different, but I hope that is not all.)
Is dchar (UTF-32) the only safe option to easily work with strings in a correct way?
August 18, 2006 Re: Operating with substrings in strings
Posted in reply to Frank Benoit
Frank Benoit wrote:
>
> Is dchar (utf32) the only save option to easily work with strings in a
> correct way?
Yes. Though I think in practice, slicing through the middle of a UTF8 character is probably unlikely as most string operations begin with search operations and the like.
Sean
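
The search-then-slice pattern mentioned above can be sketched as follows, assuming current D (D2) and `std.string.indexOf`. The sample text and search term are made up for illustration.

```d
import std.stdio : writeln;
import std.string : indexOf;

void main()
{
    string s = "na\u00EFve caf\u00E9";   // "naïve café"

    // indexOf returns a code unit (byte) index, or -1 if not found.
    auto i = s.indexOf("caf");
    if (i >= 0)
    {
        // A match of a valid UTF-8 needle starts at the beginning of a
        // UTF-8 sequence, so slicing there does not split a character.
        writeln(s[i .. $]);
    }
}
```

Because the index comes from a search over the same UTF-8 data, it lands on a sequence boundary even though the text before it contains a multi-byte character.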
August 19, 2006 Re: Operating with substrings in strings
Posted in reply to Frank Benoit
On Fri, 18 Aug 2006 22:03:49 +0200, Frank Benoit wrote:

>> Slicing:
>>
>> char[] h = "hello";
>> char[] sub = h[1..3] // Slice the string "hello"
>> writefln(sub); // Prints "el"
>>
>> http://digitalmars.com/d/arrays.html#slicing
>>
>
> I do not know much about UTF8. And I am often not sure if I do string processing right. Can someone enlighten me?
>
> If I have
> char[] str = ... some multibyte utf8 chars;
>
> What does str.length give me. The number of bytes or the number of characters by looking at every character, which one are multi-bytes?

The number of bytes not characters.

> If I do some slicing (str[3..4]), does the indices slice at these byte positions and I have the risk of destroying the string or does it look at the characters to find the start of the third utf8 character?
>
> Or did I miss something completely?

No you didn't. The above slicing is only guaranteed if the variable contains ASCII text. If it doesn't then you will have to use more sophisticated methods. For example:

char[] subtext;
char[] text;
subtext = toUTF8(toUTF32(text)[1..3]);

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
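
A runnable version of the round trip through UTF-32 shown above, as a sketch assuming current D (D2), where `string` replaces `char[]` for literals. The sample text is an assumption chosen so that byte and code point indices differ.

```d
import std.stdio : writeln;
import std.utf : toUTF8, toUTF32;

void main()
{
    string text = "h\u00E9llo";   // "héllo"; U+00E9 takes two bytes

    // Convert to UTF-32 so each array element is one code point,
    // slice by code point index, then convert back to UTF-8.
    string subtext = toUTF8(toUTF32(text)[1 .. 3]);
    writeln(subtext);             // two code points: U+00E9 and 'l'

    // The same indices applied to the UTF-8 data select bytes instead:
    // here they cover exactly the two bytes of U+00E9.
    writeln(text[1 .. 3]);
}
```

The conversion allocates a temporary UTF-32 copy of the whole string, so for long texts it may be cheaper to walk the UTF-8 data to the wanted boundaries and slice there.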
Copyright © 1999-2021 by the D Language Foundation