splitting numbers from a test file
September 19, 2012
Hello, I am trying to read in a set of numbers from a text file.
The file in question looks something like this:

35  2  0  1
    0    0.49463548699999998  0.88077994719999997    0
    1    0.60672109949999997  0.2254208717    0


After each line I want to check how many numbers were on the line
I just read. My code to read this file looks like:

1 import std.stdio;
2 import std.conv;
3
4 int main( string[] argv ) {
5    real[] numbers_read;
6    size_t line_count=1;
7
8    auto f = std.stdio.File("test.txt", "r");
9    foreach( char[] s; f.byLine() ) {
10     string line = std.string.strip( to!string(s) );
11     auto parts = std.array.splitter( line );
12     writeln("There are ", parts.length, " numbers in line ", line_count++);
13     foreach(string p; parts) {
14     numbers_read ~= to!real(p);
15      }
16    }
17    f.close();
18    return 0;
19 }

When I try to compile this I get an error:
test.d(12): Error undefined identifier 'length;

However, shouldn't splitter be returning an array (that's what the
docs seem to show)? What is the type of 'parts'? (I tried using
std.traits to figure this out, but that just generated more
syntax errors for me).

Cheers,

Craig


September 19, 2012
Craig Dillabaugh:

> 8    auto f = std.stdio.File("test.txt", "r");
> 9    foreach( char[] s; f.byLine() ) {
> 10     string line = std.string.strip( to!string(s) );
> 11     auto parts = std.array.splitter( line );
> 12     writeln("There are ", parts.length, " numbers in line ",
> line_count++);
> 13     foreach(string p; parts) {
> 14     numbers_read ~= to!real(p);
> 15      }
> 16    }
> 17    f.close();
> 18    return 0;
> 19 }
>
> When I try to compile this I get an error:
> test.d(12): Error undefined identifier 'length;

Here to!string() is probably unnecessary, it's a wasted allocation.

splitter() returns a lazy range that doesn't know its length.

To solve your problem there are two main solutions: to use split() instead of splitter(), or to use walkLength() on the range given by splitter().

In theory splitter() should be faster, but in practice this isn't always true.

Keep in mind that "real" is usually more than 64 bits long, and it's not so fast.

Maybe nowadays there are other ways to load that data; I don't know if readfln("%(%f %)%") or something similar works.

Bye,
bearophile
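
The two fixes suggested in the post above (split() or walkLength()) can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names countEager and countLazy are made up, and it assumes a recent Phobos in which the whitespace-splitting splitter is imported from std.algorithm rather than std.array.

```d
import std.algorithm : splitter;
import std.array : split;
import std.range : walkLength;

// Eager: split() allocates an array of slices, so .length is available.
size_t countEager(const(char)[] line)
{
    return split(line).length;
}

// Lazy: splitter() builds no array; walkLength iterates once to count.
size_t countLazy(const(char)[] line)
{
    return splitter(line).walkLength;
}
```

Either helper could replace parts.length in the original program. The lazy version avoids the array allocation but walks the line one extra time if the elements are iterated again afterwards.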
September 19, 2012
On 09/18/2012 07:50 PM, Craig Dillabaugh wrote:

> 11 auto parts = std.array.splitter( line );
> 12 writeln("There are ", parts.length, " numbers in line ",

> When I try to compile this I get an error:
> test.d(12): Error undefined identifier 'length;

That is a very common confusion with ranges.

> However, shouldn't splitter be returning an array (thats what the
> docs seem to show)?

No, parts is a lazy range, which is ready to serve its elements as needed. If you want to convert its elements to an array eagerly, you can call std.array.array:

import std.array;
// ...

    writeln("There are ", array(parts).length,
            " numbers in line ", line_count++);

> What is the type of 'parts'?

    writeln(typeid(parts));

or

    writeln(typeof(parts).stringof);

Ali

-- 
D Programming Language Tutorial: http://ddili.org/ders/d.en/index.html
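
Besides printing the type at run time, the question "is this an array or a range?" can also be answered at compile time. The sketch below is an illustration only: the names Parts and partsTypeName are made up, and it assumes a recent Phobos where the whitespace splitter is imported from std.algorithm.

```d
import std.algorithm : splitter;
import std.range : hasLength, isInputRange;
import std.traits : isArray;

// The static type of what splitter returns for a string argument.
alias Parts = typeof(splitter("35  2  0  1"));

// Its name as a string, available at compile time.
enum partsTypeName = Parts.stringof;

static assert(isInputRange!Parts); // it can be iterated with foreach
static assert(!isArray!Parts);     // but it is not an array
static assert(!hasLength!Parts);   // and it has no .length property
```

If any of the static asserts did not hold, compilation would fail, which makes this a cheap way to check an assumption about a return type declared as auto.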

September 19, 2012
On Wednesday, September 19, 2012 04:50:45 Craig Dillabaugh wrote:
> However, shouldn't splitter be returning an array (that's what the docs seem to show)? What is the type of 'parts'? (I tried using std.traits to figure this out, but that just generated more syntax errors for me).

The docs do not show that splitter returns an array, because it doesn't. It returns a lazy range type which finds each successive element as you iterate over it. It doesn't have a length property, because its length isn't known until you iterate over it. You have three options:

1. Use std.array.split, which returns an array (so, it's eager and requires additional memory allocations to create the array, but you'll have its length without having to iterate over it multiple times).

2. Use std.range.walkLength to get the length of the range. If a range has a length property, then walkLength just returns that, otherwise it iterates over the whole range and counts its elements. So, you won't get extra memory allocations, but you'll have to iterate over the range twice.

3. Simply count up the number of elements as you iterate over them and _then_ print out the length.

Also, there's no need to convert s to a string like that. If you were saving the string or needed an actual string instead of char[], then that would make sense, but you're just splitting it and then converting it to a number. char[] will work just fine for that. So, something like this would probably be better:

import std.array;
import std.conv;
import std.stdio;
import std.string;

void main()
{
    real[] numbers_read;
    size_t line_count = 0;

    auto f = std.stdio.File("test.txt", "r");
    foreach(line; f.byLine())
    {
        // strip accepts char[] directly, so no to!string conversion is needed
        line = strip(line);
        auto parts = std.array.splitter(line);
        size_t length = 0;

        // Count the elements while converting them (option 3 above)
        foreach(p; parts)
        {
            numbers_read ~= to!real(p);
            ++length;
        }

        writeln("There are ", length, " numbers in line ", ++line_count);
    }
}

If you aren't familiar with ranges, then read this:

http://ddili.org/ders/d.en/ranges.html

But ranges are used quite heavily in Phobos, so you should be familiar with them if you intend to use D.

- Jonathan M Davis
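
The count-as-you-go approach from the post above can also be packaged as a small function, which makes it easy to check in isolation. This is a sketch under assumptions, not code from the thread: the name appendNumbers is made up, double is used instead of real (following the earlier remark about real), and the whitespace splitter is imported from std.algorithm as in recent Phobos releases.

```d
import std.algorithm : splitter;
import std.conv : to;
import std.string : strip;

// Appends every number found on `line` to `numbers` and returns how many
// were found. The count is taken during the single pass over the range.
size_t appendNumbers(const(char)[] line, ref double[] numbers)
{
    size_t count = 0;
    foreach (part; splitter(strip(line)))
    {
        // to!double accepts a character slice; no string conversion needed
        numbers ~= to!double(part);
        ++count;
    }
    return count;
}
```

Inside the foreach over f.byLine(), the body would then reduce to a single call such as appendNumbers(line, numbers_read), with the returned count passed to writeln.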
September 19, 2012
On Wednesday, 19 September 2012 at 02:58:33 UTC, bearophile wrote:
> Here to!string() is probably unnecessary, it's a wasted allocation.

Thanks very much.

I tried the strip() without to!string and got a syntax error when
I tried to compile.

Cheers,
Craig

September 19, 2012
On Wednesday, 19 September 2012 at 03:12:21 UTC, Jonathan M Davis
wrote:
> The docs do not show that splitter returns an array, because it doesn't. It
> returns a lazy range type which finds each successive element as you iterate
> over it. It doesn't have a length property, because it's length isn't known
> until you iterate over it. You have three options:

Thanks, a few others have pointed that out to me too.  But as a D
newbie how would I have any clue what splitter returns since the
return type is auto?
There is an example in the docs:

auto a = " a     bcd   ef gh ";
assert(equal(splitter(a), ["", "a", "bcd", "ef", "gh"][]));

I guessed that since the return of splitter was equal to :
["", "a", "bcd", "ef", "gh"][]
it was returning some sort of 2D array!

When a function in Phobos returns 'auto', is this generally
indicative of the return value being a range?

> Also, theres no need to convert s to a string like that. If you were saving
> the string or needed an actual string instead of char[], then that would make
> sense, but you're just splitting it and then converting it to a number. char[]
> will work just fine for that. So, something like this would probably be better

I think my problem was that I was trying to call strip on it first
to remove leading/trailing whitespace and I was getting syntax
errors when I called strip() on the char[]. Just calling split works
as you say.





September 19, 2012
On Wednesday, September 19, 2012 05:36:36 Craig Dillabaugh wrote:
> Thanks, a few others have pointed that out to me too.  But as a D newbie how would I have any clue what splitter returns since the return type is auto?

The documentation says that it returns a range. Presumably then, the problem is that you're not familiar with ranges, and that needs to be handled better. We really need a proper article/tutorial on the main site which explains them, and we don't. But I don't know what we'd do differently in the documentation for functions in general. Ranges are a concept that is used quite heavily in Phobos, and it wouldn't make sense to try to explain them for every function that uses them.

> There is an example in the docs.
> 
> auto a = " a     bcd   ef gh ";
> assert(equal(splitter(a), ["", "a", "bcd", "ef", "gh"][]));

It would have used == if it were an array. equal operates on ranges, so if it's used, odds are that the types on the right and left sides are different.

> I guessed that since the return of splitter was equal to :
> ["", "a", "bcd", "ef", "gh"][]
> it was returning some sort of 2D array!
> 
> When a function returns an 'auto' in the Phobos is this generally indicative of the return value being a range?

That's the most common, but it's not always the case. It will usually say in the documentation though (and if you're familiar with ranges, it's generally fairly obvious if the return type is a range just based on what the function is doing), and in this case it does.

> I think my problem was that I was trying to call strip on it first
> to remove leading/trailing whitespace and I was getting syntax
> errors when I called strip() on the char[]. Just calling split works as
> you say.

strip works just fine on a char[]. I don't know why you were having problems with it. Maybe you're using an older release of the compiler and strip used to take a string rather than being templated on character type? I don't know. If you're on 2.060 though, strip should work just fine with char[].

- Jonathan M Davis
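
The difference between equal and == mentioned above can be made concrete with a short sketch. The helper names sameWords and wordsOf are made up for illustration, and the code assumes a recent Phobos in which the whitespace splitter is imported from std.algorithm and merges runs of whitespace.

```d
import std.algorithm : equal, splitter;
import std.array : array;

// equal() compares two ranges element by element, whatever their types,
// so a lazy splitter result can be compared against an array.
bool sameWords(string text, string[] expected)
{
    return equal(splitter(text), expected);
}

// array() copies the lazy range into a real string[], after which ==
// against an array literal becomes possible.
string[] wordsOf(string text)
{
    return splitter(text).array;
}
```

Writing splitter(text) == expected directly would not compile, because the left side is a range struct and the right side is an array. That is why the documentation example reaches for equal.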
September 19, 2012
On Wednesday, 19 September 2012 at 04:03:44 UTC, Jonathan M Davis
wrote:
> On Wednesday, September 19, 2012 05:36:36 Craig Dillabaugh wrote:
>> Thanks, a few others have pointed that out to me too.  But as a D
>> newbie how would I have any clue what splitter returns since the
>> return type is auto?
>
> The documentation says that it returns a range. Presumably then, the problem
> is that you're not familiar with ranges, and that needs to be handled better.
> We really need a proper article/tutorial on the main site which explains them,
> and we don't. But I don't know what we'd do differently in the documentation
> for functions in general. Ranges are a concept that is used quite heavily in
> Phobos, and it wouldn't make sense to try and explain them for every function
> that uses them.
From:
http://dlang.org/phobos/std_array.html#splitter

The documentation (copied and pasted) for splitter reads:

auto splitter(C)(C[] s);
Splits a string by whitespace.

Example:
auto a = " a     bcd   ef gh ";
assert(equal(splitter(a), ["", "a", "bcd", "ef", "gh"][]));

I have this awful feeling that I am missing something blatantly
obvious here, and that by posting this reply I am leaving a
permanent testament to my stupidity on the internet, but I really
want to understand this ...

I just want to figure out how you can explicitly say "the
documentation says it returns a range" based on that!  Is it
simply because you recognize the range from the assert statement
in the example?

I am sure the Phobos developers have better things to do than
writing documentation that coddles newbies, but could the
documentation not say:

auto splitter(C)(C[] s);
Splits a string by whitespace. Returns an InputRange of all
substrings.

Or something to that effect.

Thanks again for your time.

clip ....

September 19, 2012
On 09/18/2012 09:56 PM, Craig Dillabaugh wrote:
> On Wednesday, 19 September 2012 at 04:03:44 UTC, Jonathan M Davis
> wrote:

>> The documentation says that it returns a range.

> From:
> http://dlang.org/phobos/std_array.html#splitter
>
> The documentation (copied and pasted) for splitter reads:
>
> auto splitter(C)(C[] s);
> Splits a string by whitespace.
>
> Example:
> auto a = " a     bcd   ef gh ";
> assert(equal(splitter(a), ["", "a", "bcd", "ef", "gh"][]));

It is unfortunate that there is also the other splitter, which at least implies ranges: :-/

  http://dlang.org/phobos/std_algorithm.html#splitter

Yes, the documentation can be much better.

For example, the documentation for the second splitter above looks exactly like the other one, except that one says "using an element as a separator." while the other one says "using another range as a separator".

I think it is a ddoc limitation: Template constraints are not included in documentation yet.

Ali

September 19, 2012
On Wednesday, September 19, 2012 06:56:23 Craig Dillabaugh wrote:
> From:
> http://dlang.org/phobos/std_array.html#splitter

Ah. I was looking at std.algorithm.splitter (which operates on generic ranges and separators) which _does_ explicitly say that it returns a range.

Yeah. The documentation on std.array.splitter is incredibly sparse. It doesn't even state the result is lazy (though if it did, it would be bound to say that it was a lazy range, which would then mean that it was stating that the return type was a range), making the difference between it and split not at all obvious. That should be fixed. Internally, it just does

return std.algorithm.splitter!(std.uni.isWhite)(s);

- Jonathan M Davis
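
For readers who want to see how a predicate-based split differs from a whitespace-merging one, here is a small sketch. It is written against a recent Phobos and is not taken from the thread: the helper names are made up, and the behaviour of the 2012 std.array.splitter may not match what current releases do.

```d
import std.algorithm : splitter;
import std.range : walkLength;
import std.uni : isWhite;

// Predicate form: every whitespace character acts as a separator,
// so a run of whitespace yields empty elements in between.
size_t countWithPredicate(string s)
{
    return splitter!isWhite(s).walkLength;
}

// Plain whitespace form: runs of whitespace are merged.
size_t countWords(string s)
{
    return splitter(s).walkLength;
}
```

For "hello  world" (two spaces), the predicate form counts three elements, the middle one empty, while the plain form counts two. An empty element would make a conversion such as to!real fail, so the choice of overload matters for input with repeated spaces like the file in this thread.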