Thread overview: odd behavior of split() function
Bedros (Jun 07, 2013)
Jonathan M Davis (Jun 07, 2013)
Bedros (Jun 07, 2013)
Benjamin Thaut (Jun 07, 2013)
June 07, 2013
I would like to split "A+B+C+D" into "A", "B", "C", "D"

but when using split() I get

"A+B+C+D", "B+C+D", "C+D", "D"


The code is below:


import std.stdio;
import std.string;
import std.array;

int main()
{
    string[] str_list;
    string test_str = "A+B+C+D";
    str_list = test_str.split("+");
    foreach (item; str_list)
        printf("%s\n", cast(char*)item);

    return 0;
}
June 07, 2013
On Friday, June 07, 2013 09:18:57 Bedros wrote:
> I would like to split "A+B+C+D" into "A", "B", "C", "D"
> 
> but when using split() I get
> 
> "A+B+C+D", "B+C+D", "C+D", "D"

That would be because of your misuse of printf. If you used

foreach(item; str_list)
    writeln(item);

you would have been fine. D string literals do happen to have a null character one past their end so that you can pass them directly to C functions, but D strings in general are _not_ null terminated, and printf expects strings to be null terminated. If you want to convert a D string to a null terminated string, you need to use std.string.toStringz, not a cast. You should pretty much never cast a D string to char* or const char* or any variant thereof. So, you could have done

printf("%s\n", toStringz(item));

but I don't know why you'd want to use printf rather than writeln or writefln - both of which (unlike printf) are typesafe and understand D types.
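A minimal sketch of the alternatives described above, assuming a reasonably recent D2 compiler and Phobos (the selective imports are only there to show where each symbol comes from): writeln and writefln take the slice directly, and printf only ever receives a pointer produced by toStringz rather than a cast.

```d
import core.stdc.stdio : printf;
import std.array : split;
import std.stdio : writefln, writeln;
import std.string : toStringz;

void main()
{
    string test_str = "A+B+C+D";
    string[] str_list = test_str.split("+");

    foreach (item; str_list)
    {
        // writeln knows the slice's length, so it prints only that slice.
        writeln(item);

        // writefln gives printf-style formatting but stays typesafe.
        writefln("item: %s", item);

        // If printf is really wanted, toStringz produces a
        // null-terminated C string instead of reinterpreting the slice.
        printf("%s\n", toStringz(item));
    }
}
```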

You got

"A+B+C+D", "B+C+D", "C+D", "D"

because the original string, being a string literal, had a null character one past its end, and each of the strings returned by split was a slice of that original string. printf blithely ignored the actual boundaries of each slice and kept reading until it found a null character somewhere in memory, which, because all the slices pointed into the same string literal, happened to be the end of the original literal. The printed strings differed only because each slice started at a different position in the underlying array.
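To see the slicing explanation in action, the sketch below (again assuming a current D2 toolchain) prints where each element returned by split starts inside the original literal and which byte follows it. Only the last element is followed by the literal's terminating '\0', which is where printf stopped in every case.

```d
import std.array : split;
import std.stdio : writefln;

void main()
{
    string test_str = "A+B+C+D";
    string[] str_list = test_str.split("+");

    foreach (item; str_list)
    {
        // Each element is a slice into test_str's memory, not a copy,
        // so its pointer lies inside the original array.
        auto offset = item.ptr - test_str.ptr;

        // The byte right after each slice is '+' (43), except for the
        // last one, where it is the literal's hidden terminator (0).
        // A C function scanning for '\0' therefore runs on to the end
        // of the original literal, as printf did in the question.
        auto next = cast(int) test_str.ptr[offset + item.length];

        writefln("%s starts at offset %s; next byte is %s", item, offset, next);
    }
}
```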

- Jonathan M Davis
June 07, 2013
First of all, many thanks for the quick reply.

I'm learning D, and I used printf instead of writef purely out of habit, without thinking about it.

Thanks again.

-Bedros

On Friday, 7 June 2013 at 07:29:48 UTC, Jonathan M Davis wrote:
> That would be because of your misuse of printf. [...]

June 07, 2013
On 07.06.2013 09:53, Bedros wrote:
> First of all, many thanks for the quick reply.
>
> I'm learning D, and I used printf instead of writef purely out of
> habit, without thinking about it.
>
> thanks again.
>
> -Bedros
>
> On Friday, 7 June 2013 at 07:29:48 UTC, Jonathan M Davis wrote:
>> On Friday, June 07, 2013 09:18:57 Bedros wrote:
>>> I would like to split "A+B+C+D" into "A", "B", "C", "D"
>>>
>>> but when using split() I get
>>>
>>> "A+B+C+D", "B+C+D", "C+D", "D"
>>>
>>>
>>> the code is below
>>>
>>>
>>> import std.stdio;
>>> import std.string;
>>> import std.array;
>>>
>>> int main()
>>> {
>>>       string [] str_list;
>>>       string test_str = "A+B+C+D";
>>>       str_list = test_str.split("+");
>>>       foreach(item; str_list)
>>>               printf("%s\n", cast(char*)item);
>>>
>>>       return 0;
>>> }
>>
>> That would be because of your misuse of printf. If you used
>>
>> foreach(item; str_list)
>>     writeln(item);
>>
>> you would have been fine. D string literals do happen to have a null
>> character
>> one past their end so that you can pass them directly to C functions,
>> but D
>> strings in general are _not_ null terminated, and printf expects
>> strings to be
>> null terminated. If you want to convert a D string to a null terminated
>> string, you need to use std.string.toStringz, not a cast. You should
>> pretty
>> much never cast a D string to char* or const char* or any variant
>> thereof. So,
>> you could have done
>>
>> printf("%s\n", toStringz(item));
>>
>> but I don't know why you'd want to use printf rather than writeln or
>> writefln -
>> both of which (unlike printf) are typesafe and understand D types.
>>
>> You got
>>
>> "A+B+C+D", "B+C+D", "C+D", "D"
>>
>> because the original string (being a string literal) had a null
>> character one
>> past its end, and each of the strings returned by split was a slice of
>> the
>> original string, and printf blithely ignored the actual boundaries of the
>> slice looking for the next null character that it happened to find in
>> memory,
>> which - because they were all slices of the same string literal -
>> happened to
>> be the end of the original string literal. And the strings printed
>> differed,
>> because each slice started in a different portion of the underlying
>> array.
>>
>> - Jonathan M Davis
>

You can use printf if you want to, but the correct usage is not as nice:

string str = "test";
// "%.*s" reads the precision as a C int, so the size_t length must be cast
printf("%.*s\n", cast(int) str.length, str.ptr);
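Wrapped up as a function, the same "%.*s" technique can print every element of the split result with printf and no null terminator at all. printSlice is a hypothetical helper name used only for illustration (it is not part of Phobos); the int cast is there because the "*" precision argument is read as a C int, while .length is a size_t.

```d
import core.stdc.stdio : printf;
import std.array : split;

/// Hypothetical helper (not part of Phobos): prints a D string through
/// printf without needing a null terminator, by passing the length.
void printSlice(const(char)[] s)
{
    // "%.*s" prints at most the given number of chars starting at s.ptr.
    // The cast assumes s.length fits in an int.
    printf("%.*s\n", cast(int) s.length, s.ptr);
}

void main()
{
    foreach (item; "A+B+C+D".split("+"))
        printSlice(item); // prints A, B, C, D on separate lines
}
```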

Kind Regards
Benjamin Thaut