to delete the '\0' characters
September 22, 2022

Is there a more accurate way to delete the '\0' characters at the end of the string? I tried functions in this module: https://dlang.org/phobos/std_string.html

auto foo(string s)
{
  string r;
  foreach(c; s)
  {
    if(c > 0)
    {
      r ~= c;
    }
  }
  return r;
}

SDB@79

September 22, 2022
On 22.09.22 12:53, Salih Dincer wrote:
> Is there a more accurate way to delete the '\0' characters at the end of the string? I tried functions in this module: https://dlang.org/phobos/std_string.html
> 
> ```d
> auto foo(string s)
> {
>    string r;
>    foreach(c; s)
>    {
>      if(c > 0)
>      {
>        r ~= c;
>      }
>    }
>    return r;
> }
> ```

I don't understand what you mean by "more accurate".

Here's a snippet that's a bit shorter than yours and doesn't copy the data:

    while (s.length > 0 && s[$ - 1] == '\0')
    {
        s = s[0 .. $ - 1];
    }
    return s;

But do you really want to allow embedded '\0's? I.e., should foo("foo\0bar\0") really resolve to "foo\0bar" and not "foo"?

Usually, it's the first '\0' that signals the end of a string. In that case you'd better start the search at the front and stop at the first hit.
September 22, 2022
On 22.09.22 13:14, ag0aep6g wrote:
> On 22.09.22 12:53, Salih Dincer wrote:
[...]
>> ```d
>> auto foo(string s)
>> {
>>    string r;
>>    foreach(c; s)
>>    {
>>      if(c > 0)
>>      {
>>        r ~= c;
>>      }
>>    }
>>    return r;
>> }
>> ```
[...]
> Here's a snippet that's a bit shorter than yours and doesn't copy the data:
> 
>      while (s.length > 0 && s[$ - 1] == '\0')
>      {
>          s = s[0 .. $ - 1];
>      }
>      return s;
> 
> But do you really want to allow embedded '\0's? I.e., should foo("foo\0bar\0") really resolve to "foo\0bar" and not "foo"?

Whoops. Your code actually turns "foo\0bar" into "foobar", removing the embedded '\0'. So my supposed alternative is wrong.

Still, you usually want to stop at the first '\0'.
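
To make the difference discussed above concrete, here is a minimal sketch that puts both behaviors side by side. The function names `trimTrailingZeros` and `removeAllZeros` are made up for illustration; the first wraps the slicing loop from this post, the second reproduces what the original `foo` does.

```d
// Hypothetical name: the slicing loop wrapped in a function.
// Only trailing zeros are dropped; the result is a slice of s, no copy is made.
string trimTrailingZeros(string s)
{
    while (s.length > 0 && s[$ - 1] == '\0')
    {
        s = s[0 .. $ - 1];
    }
    return s;
}

// Hypothetical name: same behavior as the original foo, every zero is dropped.
string removeAllZeros(string s)
{
    string r;
    foreach (char c; s)
    {
        if (c != '\0')
        {
            r ~= c;
        }
    }
    return r;
}

void main()
{
    // An embedded zero survives trimming but not filtering.
    assert(trimTrailingZeros("foo\0bar\0") == "foo\0bar");
    assert(removeAllZeros("foo\0bar\0") == "foobar");

    // With zeros only at the end, both give the same text.
    assert(trimTrailingZeros("foo\0\0") == removeAllZeros("foo\0\0"));
}
```

The two only agree when all the zeros sit at the end of the input.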
September 22, 2022

On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer wrote:
> Is there a more accurate way to delete the '\0' characters at the end of the string? I tried functions in this module: https://dlang.org/phobos/std_string.html
>
> auto foo(string s)
> {
>   string r;
>   foreach(c; s)
>   {
>     if(c > 0)
>     {
>       r ~= c;
>     }
>   }
>   return r;
> }
>
> SDB@79

Two remarks:

  1. The zero implicitly added to literals is not part of the string. For example, s[$-1] will not give 0 unless you added it explicitly to the literal.

  2. Your code removes all the zeros, not only the ones at the end. Since that still achieves what you want, maybe try stripRight(). The second overload lets you specify the characters to remove.
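
Both remarks can be checked with a short sketch. It assumes a Phobos version that ships the two-argument `stripRight(str, chars)` overload in `std.string` and that the data is an ordinary `string`; the variable names are only for illustration.

```d
import std.string : stripRight;

void main()
{
    string plain = "abc";
    // The terminator the compiler stores after a literal is not counted.
    assert(plain.length == 3);

    string padded = "abc\0\0";
    // Zeros written into the literal are ordinary characters of the string.
    assert(padded.length == 5);
    assert(padded[$ - 1] == '\0');

    // Two-argument overload: strip any of the given characters from the right.
    assert(padded.stripRight("\0") == "abc");

    // Only the right end is touched; embedded zeros stay.
    assert("a\0b\0".stripRight("\0") == "a\0b");
}
```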

September 22, 2022

On Thursday, 22 September 2022 at 13:29:43 UTC, user1234 wrote:
> Two remarks:
>
>   1. The zero implicitly added to literals is not part of the string. For example, s[$-1] will not give 0 unless you added it explicitly to the literal.
>
>   2. Your code removes all the zeros, not only the ones at the end. Since that still achieves what you want, maybe try stripRight(). The second overload lets you specify the characters to remove.

As I mentioned earlier, stripRight() and others don't work. What I'm talking about is not the terminating character. Actually, I'm the one who added the '\0' characters, and there are several of them. For example:

4B 6F 72 6B 6D 61 20 73 F6 6E 6D 65 7A 20 62 75 20 15F 61 66 61 6B 6C 61 72 64 61 20 79 FC 7A 65 6E 20 61 6C 20 73 61 6E 63 61 6B 0 0
53 F6 6E 6D 65 64 65 6E 20 79 75 72 64 75 6D 75 6E 20 FC 73 74 FC 6E 64 65 20 74 FC 74 65 6E 20 65 6E 20 73 6F 6E 20 6F 63 61 6B 0
4F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 20 79 131 6C 64 131 7A 131 64 131 72 20 70 61 72 6C 61 79 61 63 61 6B 0 0 0 0
4F 20 62 65 6E 69 6D 64 69 72 20 6F 20 62 65 6E 69 6D 20 6D 69 6C 6C 65 74 69 6D 69 6E 64 69 72 20 61 6E 63 61 6B 0 0
C7 61 74 6D 61 20 6B 75 72 62 61 6E 20 6F 6C 61 79 131 6D 20 E7 65 68 72 65 6E 69 20 65 79 20 6E 61 7A 6C 131 20 68 69 6C 61 6C 0 0
4B 61 68 72 61 6D 61 6E 20 131 72 6B 131 6D 61 20 62 69 72 20 67 FC 6C 20 6E 65 20 62 75 20 15F 69 64 64 65 74 20 62 75 20 63 65 6C E2 6C 0 0 0 0 0 0

Thanks, SDB@79

September 22, 2022

On Thursday, 22 September 2022 at 10:53:32 UTC, Salih Dincer wrote:
> Is there a more accurate way to delete the '\0' characters at the end of the string? I tried functions in this module: https://dlang.org/phobos/std_string.html
>
> auto foo(string s)
> {
>   string r;
>   foreach(c; s)
>   {
>     if(c > 0)
>     {
>       r ~= c;
>     }
>   }
>   return r;
> }

import std.algorithm : filter;
import std.utf : byCodeUnit;
import std.array : array;

string removeZeroes(string s)
{
    return s.byCodeUnit
        .filter!(c => c != '\0')
        .array;
}
September 22, 2022
On 9/22/22 03:53, Salih Dincer wrote:
> Is there a more accurate way to delete the '\0' characters at the end of
> the string? I tried functions in this module:
> https://dlang.org/phobos/std_string.html

Just a reminder: the following modules are always relevant as well, because strings are arrays, which are ranges:

  std.range
  std.algorithm
  std.array

>        r ~= c;

Stefan Koch once said the ~ operator should be called "the slow operator". Meaning, if you want to make your code slow, then use that operator. :)

The reason is, that operation may need to allocate memory from the heap and copy existing elements there. And any memory allocation may trigger a garbage collection cycle.

Of course, none of that matters if we are talking about a short string. However, it may become a dominating reason why a program may be slow.
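
As a rough illustration of the allocation point, one common way to avoid repeated reallocation is `std.array.appender` with an up-front `reserve`. This is only a sketch: the name `removeZerosBuffered` is made up, and whether it is measurably faster depends on the input size and the program.

```d
import std.array : appender;

// Hypothetical helper name; produces the same result as the original foo.
string removeZerosBuffered(string s)
{
    auto buf = appender!string();
    // Upper bound: the result is never longer than the input,
    // so one reservation should cover every later put().
    buf.reserve(s.length);
    foreach (char c; s)
    {
        if (c != '\0')
        {
            buf.put(c);
        }
    }
    return buf.data;
}

void main()
{
    assert(removeZerosBuffered("ab\0cd\0\0") == "abcd");
    assert(removeZerosBuffered("") == "");
}
```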

I was going to suggest Paul Backus' solution as well but I may leave the array part out in my own code until I really need it:

string noZeroes(string s)
{
    return s.byCodeUnit.filter!(c => c != '\0');
}

Now, a caller may be happy without an array:

    auto a = s.noZeroes.take(10);

And another can easily add a .array when really needed:

    auto b = s.noZeroes.array;

That may be seen as premature optimization but I see it as avoiding a premature pessimization because I did not put in any extra work there. But again, this all depends on each program.

If we were talking about mutable elements and the order of elements did not matter, then the fastest option would be to remove with SwapStrategy.unstable:

import std;

void main() {
    auto arr = [ 1, 0, 2, 0, 0, 3, 4, 5 ];
    arr = remove!(i => i == 0, SwapStrategy.unstable)(arr);
    writeln(arr);
}

unstable works by swapping the first 0 that it finds with the last non-zero that it finds and continues in that way. No memory is allocated. As a result, the order of elements will not be preserved, but unstable can be very fast compared to .stable (which is the default) because .stable must move elements to the left (multiple times in some cases) and can be expensive, especially for some types.

The result of the program above is the following:

[1, 5, 2, 4, 3]

Zeros are removed but the order is not preserved.

And very important: Don't forget to assign remove's return value back to 'arr'. ;)

I know this will not work for a string but something to keep in mind...

Ali
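
For comparison, here is a small sketch of the same array with the default (stable) strategy next to the unstable one, plus the "forgot to assign" case. It deliberately does not assert a particular order for the unstable result, only that the same elements survive; the variable names are illustrative.

```d
import std.algorithm : remove, sort, SwapStrategy;

void main()
{
    auto arr = [1, 0, 2, 0, 0, 3, 4, 5];

    // Default strategy (stable): survivors keep their relative order.
    auto stable = remove!(i => i == 0)(arr.dup);
    assert(stable == [1, 2, 3, 4, 5]);

    // Unstable: the same survivors, but their order is not guaranteed.
    auto unstable = remove!(i => i == 0, SwapStrategy.unstable)(arr.dup);
    assert(unstable.length == 5);
    sort(unstable);
    assert(unstable == [1, 2, 3, 4, 5]);

    // Without assigning the result back, the slice keeps its old length.
    auto forgotten = arr.dup;
    remove!(i => i == 0)(forgotten);
    assert(forgotten.length == 8);
}
```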


September 22, 2022
On 9/22/22 08:19, Ali Çehreli wrote:

> string noZeroes(string s)
> {
>      return s.byCodeUnit.filter!(c => c != '\0');
> }

That won't compile; the return type must be 'auto'.

Ali
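
With that correction applied, the lazy version and its two callers can be sketched as follows. This is a minimal example assuming the input is a `string`; the sample data is made up.

```d
import std.algorithm : filter;
import std.array : array;
import std.range : take;
import std.utf : byCodeUnit;

// 'auto' lets the function return the lazy filter range itself
// instead of forcing a conversion to string.
auto noZeroes(string s)
{
    return s.byCodeUnit.filter!(c => c != '\0');
}

void main()
{
    string s = "ab\0cd\0\0";

    // Lazy use: only as many code units as needed are visited.
    assert(s.noZeroes.take(3).array == "abc");

    // Eager use: materialize a string only where one is really needed.
    string t = s.noZeroes.array;
    assert(t == "abcd");
}
```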


September 22, 2022

On Thursday, 22 September 2022 at 15:22:06 UTC, Ali Çehreli wrote:
> On 9/22/22 08:19, Ali Çehreli wrote:
>
>> string noZeroes(string s)
>> {
>>      return s.byCodeUnit.filter!(c => c != '\0');
>> }
>
> That won't compile; the return type must be 'auto'.
>
> Ali

Thank you for all the valuable information you wrote. I chose to split because the '\0' characters are at the end of the string:

string splitz(string s)
{
  import std.string : indexOf;
  size_t seekPos = s.indexOf('\0');
  return s[0..seekPos];
}

SDB@79

September 22, 2022

On Thursday, 22 September 2022 at 20:53:28 UTC, Salih Dincer wrote:
> string splitz(string s)
> {
>   import std.string : indexOf;
>   size_t seekPos = s.indexOf('\0');
>   return s[0..seekPos];
> }

I ignored the possibility of not finding '\0'. I'm fixing it now:

string splitz(string s)
{
  import std.string : indexOf;
  auto seekPos = s.indexOf('\0');
  return seekPos > 0 ? s[0..seekPos] : s;
}

But I also wish it could be like this:

string splitz(string s)
{
  import std.string : indexOf;
  if(auto seekPos = s.indexOf('\0') > 0)
  {
    return s[0..seekPos];
  }
  return s;
}

SDB@79
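
A note on the variants above, with a sketch of one possible way to write them. `indexOf` returns -1 when nothing is found and 0 for a hit at the very first position, so a `> 0` test treats a leading '\0' as "not found". In the `if (auto seekPos = ... > 0)` form, the declared variable receives the result of the comparison, not the index. The sketch below tests `>= 0` instead and also shows an alternative based on `std.algorithm.searching.findSplitBefore`; the name `splitzAlt` is made up for illustration.

```d
import std.algorithm.searching : findSplitBefore;
import std.string : indexOf;

// Cut the string at the first '\0'; return it unchanged if there is none.
string splitz(string s)
{
    immutable pos = s.indexOf('\0'); // -1 means "not found"
    return pos >= 0 ? s[0 .. pos] : s;
}

// Hypothetical alternative: findSplitBefore yields the part before the
// needle, or the whole haystack when the needle is absent.
string splitzAlt(string s)
{
    return s.findSplitBefore("\0")[0];
}

void main()
{
    assert(splitz("abc\0\0") == "abc");
    assert(splitz("abc") == "abc");
    assert(splitz("\0abc") == "");

    assert(splitzAlt("abc\0\0") == "abc");
    assert(splitzAlt("abc") == "abc");
    assert(splitzAlt("\0abc") == "");
}
```

Both versions return a slice of the input, so no data is copied.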
