Same process to different results? - D Programming Language Discussion Forum

July 01, 2015

Same process to different results?

Posted by Taylor Hillegeist

When I run the code (compiled on DMD 2.067.1):


------------------------------------------------------
import std.algorithm;
import std.stdio;
import std.range;

string A="AaA";
string B="BbBb";
string C="CcCcC";

void main(){
  int L=25;

  int seg1len=(L-B.length)/2;
  int seg2len=B.length;
  int seg3len=L-seg1len-seg2len;

  (A.cycle.take(seg1len).array
  ~B.cycle.take(seg2len).array
  ~C.cycle.take(seg3len).array).writeln;

  string q = cast(string)
  (A.cycle.take(seg1len).array
  ~B.cycle.take(seg2len).array
  ~C.cycle.take(seg3len).array);

  q.writeln;

}
-----------------------------------------------

I get a weird result of
AaAAaAAaAABbBbCcCcCCcCcCC
A   a   A   A   a   A   A   a   A   A   B   b   B   b   C   c   C   c   C   C   c   C   c   C   C

Any ideas why?

July 01, 2015

Re: Same process to different results?

Posted by Adam D. Ruppe
in reply to Taylor Hillegeist

I betcha it is because A, B, and C are modified by the first pass. A lot of the range functions consume their input.

July 01, 2015

Re: Same process to different results?

Posted by Taylor Hillegeist
in reply to Adam D. Ruppe

On Wednesday, 1 July 2015 at 17:06:01 UTC, Adam D. Ruppe wrote:
> I betcha it is because A, B, and C are modified by the first pass. A lot of the range functions consume their input.

Running them one at a time produces the same result.

For some reason:

  (A.cycle.take(seg1len).array
  ~B.cycle.take(seg2len).array
  ~C.cycle.take(seg3len).array).writeln;

is different from:

  string q = cast(string)
  (A.cycle.take(seg1len).array
  ~B.cycle.take(seg2len).array
  ~C.cycle.take(seg3len).array);
  q.writeln;

I was wondering if it might be the cast?

July 01, 2015

Re: Same process to different results?

Posted by anonymous
in reply to Taylor Hillegeist

On Wednesday, 1 July 2015 at 17:13:03 UTC, Taylor Hillegeist wrote:
>   string q = cast(string)
>   (A.cycle.take(seg1len).array
>   ~B.cycle.take(seg2len).array
>   ~C.cycle.take(seg3len).array);
>   q.writeln;
>
> I was wondering if it might be the cast?

Yes, the cast is wrong. You're reinterpreting (not converting) an array of `dchar`s (UTF-32 code units) as an array of `char`s (UTF-8 code units).

If you print the numeric values of the string, e.g. via std.string.representation, you can see that every actual character has three null bytes following it:
----
import std.string: representation;
writeln(q.representation);
----
[65, 0, 0, 0, 97, 0, 0, 0, 65, 0, 0, 0, 65, 0, 0, 0, 97, 0, 0, 0, 65, 0, 0, 0, 65, 0, 0, 0, 97, 0, 0, 0, 65, 0, 0, 0, 65, 0, 0, 0, 66, 0, 0, 0, 98, 0, 0, 0, 66, 0, 0, 0, 98, 0, 0, 0, 67, 0, 0, 0, 99, 0, 0, 0, 67, 0, 0, 0, 99, 0, 0, 0, 67, 0, 0, 0, 67, 0, 0, 0, 99, 0, 0, 0, 67, 0, 0, 0, 99, 0, 0, 0, 67, 0, 0, 0, 67, 0, 0, 0]
----

Use std.conv.to for less surprising conversions. And don't use casts unless you know exactly what you're doing.
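
A minimal sketch of the difference between reinterpreting and converting, assuming the same three segments as in the original post. The helper name `buildWide` and its `size_t` parameters are made up for illustration and are not part of the original code.

```d
import std.array : array;
import std.conv : to;
import std.range : cycle, take;
import std.stdio : writeln;

// Hypothetical helper: builds the same three segments as the original post.
// Assumes total >= b.length and that none of the strings is empty.
dchar[] buildWide(string a, string b, string c, size_t total)
{
    immutable seg1 = (total - b.length) / 2;
    immutable seg2 = b.length;
    immutable seg3 = total - seg1 - seg2;
    // Each .array call yields dchar[] because strings are decoded on iteration.
    return a.cycle.take(seg1).array
         ~ b.cycle.take(seg2).array
         ~ c.cycle.take(seg3).array;
}

void main()
{
    dchar[] wide = buildWide("AaA", "BbBb", "CcCcC", 25);

    // Reinterpreting cast: same bytes, viewed as four times as many chars.
    auto reinterpreted = cast(string) wide;
    assert(reinterpreted.length == wide.length * dchar.sizeof);

    // Converting: every code point is re-encoded as UTF-8.
    string converted = wide.to!string;
    assert(converted.length == 25);
    writeln(converted); // AaAAaAAaAABbBbCcCcCCcCcCC
}
```

The cast leaves the 100 bytes untouched, which is where the extra zero bytes in the `representation` output come from; `to!string` allocates a new UTF-8 string instead.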

July 01, 2015

Re: Same process to different results?

Posted by Taylor Hillegeist
in reply to Taylor Hillegeist

On Wednesday, 1 July 2015 at 17:00:51 UTC, Taylor Hillegeist wrote:
> When I run the code (compiled on DMD 2.067.1):
>
>
> ------------------------------------------------------
> import std.algorithm;
> import std.stdio;
> import std.range;
>
> string A="AaA";
> string B="BbBb";
> string C="CcCcC";
>
> void main(){
> 	int L=25;
>
>   int seg1len=(L-B.length)/2;
>   int seg2len=B.length;
>   int seg3len=L-seg1len-seg2len;
>
>   (A.cycle.take(seg1len).array
>   ~B.cycle.take(seg2len).array
>   ~C.cycle.take(seg3len).array).writeln;
>
>   string q = cast(string)
>   (A.cycle.take(seg1len).array
>   ~B.cycle.take(seg2len).array
>   ~C.cycle.take(seg3len).array);
>
>   q.writeln;
>
> }
> -----------------------------------------------
>
> I get a weird result of
> AaAAaAAaAABbBbCcCcCCcCcCC
> A   a   A   A   a   A   A   a   A   A   B   b   B   b   C   c   C   c   C   C   c   C   c   C   C
>
> Any ideas why?

Somehow the type was converted to dchar[] during this process:

 A.cycle.take(seg1len).array
~B.cycle.take(seg2len).array
~C.cycle.take(seg3len).array

Why would it change the type so sneakily? Unless maybe it's the default behavior with strings, because 32 bits is (typically) one grapheme?
I bet cycle did this.

July 01, 2015

Re: Same process to different results?

Posted by Steven Schveighoffer
in reply to Taylor Hillegeist

On 7/1/15 1:00 PM, Taylor Hillegeist wrote:
> When I run the code (compiled on DMD 2.067.1):
>
>
> ------------------------------------------------------
> import std.algorithm;
> import std.stdio;
> import std.range;
>
> string A="AaA";
> string B="BbBb";
> string C="CcCcC";
>
> void main(){
>      int L=25;
>
>    int seg1len=(L-B.length)/2;
>    int seg2len=B.length;
>    int seg3len=L-seg1len-seg2len;
>
>    (A.cycle.take(seg1len).array
>    ~B.cycle.take(seg2len).array
>    ~C.cycle.take(seg3len).array).writeln;
>
>    string q = cast(string)
>    (A.cycle.take(seg1len).array
>    ~B.cycle.take(seg2len).array
>    ~C.cycle.take(seg3len).array);
>
>    q.writeln;
>
> }
> -----------------------------------------------
>
> I get a weird result of
> AaAAaAAaAABbBbCcCcCCcCcCC
> A   a   A   A   a   A   A   a   A   A   B   b   B   b   C   c   C   c   C   C   c   C   c   C   C
>
> Any ideas why?

Schizophrenia of Phobos.

Phobos thinks a string is a range of dchar instead of a range of char. So what cycle, take, and array all output are dchar ranges and arrays.

When you cast the dchar[] result to a string (which is an immutable(char)[]), it then treats the zero bytes in each dchar element as '\0' characters, which apparently print as blanks.

-Steve
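
A small sketch of the behavior described above, assuming a Phobos build with auto-decoding enabled (the default in Phobos 2). It only checks element types and lengths; the sample string is made up for illustration.

```d
import std.array : array;
import std.range : ElementEncodingType, ElementType, take, walkLength;
import std.traits : Unqual;

// A string is stored as UTF-8 code units ...
static assert(is(string == immutable(char)[]));
static assert(is(Unqual!(ElementEncodingType!string) == char));

// ... but the range primitives decode it, so range algorithms see dchar.
static assert(is(ElementType!string == dchar));
static assert(is(typeof("abc".take(2).array) == dchar[]));

void main()
{
    string s = "h\u00E9llo";    // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);      // counts code units
    assert(s.walkLength == 5);  // counts decoded code points
}
```

This is why the result of `cycle`, `take`, and `array` on a string is a `dchar[]` even though the input was a `string`.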

July 01, 2015

Re: Same process to different results?

Posted by Steven Schveighoffer
in reply to Steven Schveighoffer

On 7/1/15 1:44 PM, Steven Schveighoffer wrote:

> Schizophrenia of Phobos.
>
> Phobos thinks a string is a range of dchar instead of a range of char.
> So what cycle, take, and array all output are dchar ranges and arrays.
>
> When you cast the dchar[] result to a string (which is an
> immutable(char)[]), it then treats the zero bytes in each dchar element
> as '\0' characters, which apparently print as blanks.

This has to be one of the most obvious cases I've ever seen where it's clear that Phobos treating string as a range of dchar was the wrong decision. That one can't use ranges to make a new string is ridiculous. Just the thought of "fixing" this by re-encoding...

-Steve

July 01, 2015

Re: Same process to different results?

Posted by H. S. Teoh
in reply to Steven Schveighoffer

On Wed, Jul 01, 2015 at 02:14:49PM -0400, Steven Schveighoffer via Digitalmars-d-learn wrote:
> On 7/1/15 1:44 PM, Steven Schveighoffer wrote:
> 
> >Schizophrenia of Phobos.
> >
> >Phobos thinks a string is a range of dchar instead of a range of char.  So what cycle, take, and array all output are dchar ranges and arrays.
> >
> >When you cast the dchar[] result to a string (which is an immutable(char)[]), it then treats the zero bytes in each dchar element as '\0' characters, which apparently print as blanks.
> 
> This has to be one of the most obvious cases I've ever seen where it's clear that Phobos treating string as a range of dchar was the wrong decision. That one can't use ranges to make a new string is ridiculous. Just the thought of "fixing" this by re-encoding...
[...]

Yeah, although Andrei has vetoed all suggestions of getting rid of autodecoding, this is one of the glaring cases where autodecoding is obviously a bad idea.

It almost makes me want to create my own custom string type that serves up char instead of dchar.
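
Short of a custom string type, one possible way to get a range of char is `std.utf.byCodeUnit`, assuming a Phobos version that provides it. The helper name `repeatTo` is hypothetical. The sketch counts code units, so it is only safe when the cut never lands inside a multi-byte UTF-8 sequence (plain ASCII input, as in this thread), and it assumes a non-empty input.

```d
import std.array : array;
import std.conv : to;
import std.range : ElementType, cycle, take;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical helper: repeat `s` until `n` code units have been produced.
// Assumes `s` is non-empty and that cutting at `n` does not split a
// multi-byte sequence.
string repeatTo(string s, size_t n)
{
    auto units = s.byCodeUnit;  // range of char, no decoding to dchar
    static assert(is(Unqual!(ElementType!(typeof(units))) == char));
    // to!string converts whatever array type comes back into a string
    // (it is the identity if the array is already a string).
    return units.cycle.take(n).array.to!string;
}

void main()
{
    assert(repeatTo("AaA", 10) == "AaAAaAAaAA");
    assert(repeatTo("BbBb", 4) == "BbBb");
}
```

Because the elements stay char throughout, no cast is needed at the end of the pipeline.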


T

-- 
There are four kinds of lies: lies, damn lies, and statistics.

Copyright © 1999-2021 by the D Language Foundation