Thread overview
Multidimensional dynamic array of strings initialized with split()
Sep 04, 2013
Ludovit Lucenic
Sep 04, 2013
H. S. Teoh
Sep 04, 2013
Ludovit Lucenic
Sep 05, 2013
Ludovit Lucenic
Sep 05, 2013
Ali Çehreli
Sep 22, 2017
Ludovit Lucenic
September 04, 2013
Hello friends,

with the following code

import std.stdio;
import std.array;

auto file71 = File(argv[2], "r");

string[][] buffer;
foreach (line; file71.byLines) {
    buffer ~= split(line, "\t");
}

I am trying to cut the lines from the file with tab as delimiter to pre-fetch the content of a file before further processing.

Each split() call gives correct string[] values in and of itself.
But when I try to read buffer after the loop, I get corrupted data, like this:

[ ["-", "_Unit226", "constructor", "sub_00BE896C\t1\t?:?\t\t//con", "t", "uc...

Obviously the concatenation is doing no good, since there are tabs in the values...

What am I missing here? Is it that split() allocates memory that gets overwritten in the loop, and the ~= copies only the subarrays without copying the sub-subarrays? How can I overcome this?

Thank you very much,
Ludovit
September 04, 2013
On Thu, Sep 05, 2013 at 12:57:34AM +0200, Ludovit Lucenic wrote:
> Hello friends,
> 
> with the following code
> 
> import std.stdio;
> import std.array;
> 
> auto file71 = File(argv[2], "r");
> 
> string[][] buffer;
> foreach (line; file71.byLines) {
>     buffer ~= split(line, "\t");
> }
> 
> I am trying to cut the lines from the file with tab as delimiter to pre-fetch the content of a file before further processing.
> 
> Each split() call gives correct string[] values in and of itself. But when I try to read buffer, after the loop, I got corrupted data, like this:
> 
> [ ["-", "_Unit226", "constructor", "sub_00BE896C\t1\t?:?\t\t//con", "t", "uc...
> 
> Obviously the concatenation is doing no good, since there are tabs in the values...
> 
> What am I missing here ? Is it that split() allocated memory that gets overwritten in the loop and the ~= just copies the subarrays not copying the subsubarrays ? How to overcome this ?
[...]

The problem is that File.byLine() reuses its buffer for efficiency, and split is optimized to return slices into that buffer instead of copying each substring. So after every iteration the buffer (and therefore the slices into it) gets overwritten.

Replace the loop body with the following and it should work:

	buffer ~= split(line.dup, "\t");
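
To make the aliasing visible without a file, here is a minimal sketch. The helper name fieldsAfterOverwrite and the sample data are made up for illustration, and a plain char[] stands in for the buffer that byLine reuses; it is not how byLine is implemented internally.

```d
import std.array : split;
import std.stdio : writeln;

/// Hypothetical helper: splits a buffer, overwrites the buffer in place,
/// and returns the fields as they look after the overwrite.
string[] fieldsAfterOverwrite(bool copyFirst)
{
    char[] lineBuffer = "a\tb".dup;   // stand-in for byLine's reused buffer

    char[][] fields = copyFirst
        ? split(lineBuffer.dup, "\t") // fields point into a private copy
        : split(lineBuffer, "\t");    // fields point into lineBuffer itself

    lineBuffer[] = "x\ty";            // simulate the next line arriving

    string[] result;
    foreach (f; fields)
        result ~= f.idup;             // snapshot what the slices show now
    return result;
}

void main()
{
    writeln(fieldsAfterOverwrite(false)); // slices follow the overwritten buffer
    writeln(fieldsAfterOverwrite(true));  // the copy keeps the original fields
    assert(fieldsAfterOverwrite(false) == ["x", "y"]);
    assert(fieldsAfterOverwrite(true) == ["a", "b"]);
}
```

Without the copy, the fields change when the buffer is overwritten; with the copy, they keep the values from the line that was split.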


T

-- 
Dogs have owners ... cats have staff. -- Krista Casada
September 04, 2013
On Wednesday, 4 September 2013 at 23:06:10 UTC, H. S. Teoh wrote:
>
> The problem is that File.byLine() reuses its buffer for efficiency, and
> split is optimized to return slices into that buffer instead of copying
> each substring. So after every iteration the buffer (and therefore the
> slices into it) gets overwritten.
>
> Replace the loop body with the following and it should work:
>
> 	buffer ~= split(line.dup, "\t");
>
>
> T

Thank you so much for your explanation.
It helped me a lot to understand things, and it actually works :-)
LL
September 05, 2013
I have created a wiki page on this one.
http://wiki.dlang.org/Read_table_data_from_file

September 05, 2013
On 09/05/2013 01:14 AM, Ludovit Lucenic wrote:
> I have created a wiki on this one.
> http://wiki.dlang.org/Read_table_data_from_file
>

Compiling with "DMD64 D Compiler v2.064-devel-52cc287" produces the following errors:

* You had byLines in your original code as well. Shouldn't it be byLine?

* You are missing the closing brace of the foreach loop as well.

* "Error: cannot append type char[][] to type string[][]". I had to replace .dup with .idup.
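
The difference between .dup and .idup in the last point can be checked at compile time. This is only a sketch with made-up sample data: the function name appendWithIdup is hypothetical, and line stands in for the char[] that byLine hands to the loop body.

```d
import std.array : split;

/// Hypothetical helper showing which of the two appends type-checks.
string[][] appendWithIdup()
{
    char[] line = "a\tb".dup; // stand-in for a line obtained from byLine
    string[][] buffer;

    // .dup gives a mutable copy, so split yields char[][],
    // and appending that to string[][] is rejected by the compiler.
    static assert(is(typeof(split(line.dup, "\t")) == char[][]));
    static assert(!__traits(compiles, buffer ~= split(line.dup, "\t")));

    // .idup gives an immutable copy, so split yields string[].
    static assert(is(typeof(split(line.idup, "\t")) == string[]));
    buffer ~= split(line.idup, "\t");
    return buffer;
}

void main()
{
    assert(appendWithIdup() == [["a", "b"]]);
}
```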

The following version is lazy:

import std.stdio;
import std.array;
import std.algorithm;

auto readInData(File inputFile, string fieldSeparator)
{
    return
        inputFile
        .byLine
        .map!(line => line
                      .idup
                      .split(fieldSeparator));
}

The caller can either use the result lazily:

import std.range;

void main()
{
    auto file = File("deneme.txt");
    writeln(readInData(file, "\t").take(2));
}

Or call .array on the result to consume the range eagerly:

    auto table = readInData(file, "\t").array;
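
For anyone who wants to try the eager variant end to end, here is a self-contained sketch. The scratch file name read_table_demo.txt, its contents, and the helper readScratchTable are assumptions made up for the demonstration; this readInData passes fieldSeparator on to split.

```d
import std.algorithm : map;
import std.array : array, split;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

/// Lazily yields one string[] per line of inputFile.
auto readInData(File inputFile, string fieldSeparator)
{
    return inputFile
        .byLine
        .map!(line => line.idup.split(fieldSeparator));
}

/// Hypothetical helper: writes a two-line scratch file, reads it back
/// eagerly, and removes the file again.
string[][] readScratchTable()
{
    auto path = buildPath(tempDir(), "read_table_demo.txt");
    write(path, "a\tb\nc\td\n");
    scope (exit) remove(path);

    // .array consumes the whole range, so the table no longer
    // depends on the file or on byLine's buffer.
    return readInData(File(path), "\t").array;
}

void main()
{
    auto table = readScratchTable();
    writeln(table);
    assert(table == [["a", "b"], ["c", "d"]]);
}
```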

Ali

September 22, 2017
On Thursday, 5 September 2013 at 16:22:46 UTC, Ali Çehreli wrote:
>
> Compiling with "DMD64 D Compiler v2.064-devel-52cc287" produces the following errors:
>
> * You had byLines in your original code as well. Shouldn't it be byLine?
>
> * You are missing the closing brace of the foreach loop as well.
>
> * "Error: cannot append type char[][] to type string[][]" I have to replace .dup with .idup

Thank you for pointing out the errors, Ali.
I have updated the example.

>
> The following version is lazy:
>
> import std.stdio;
> import std.array;
> import std.algorithm;
>
> auto readInData(File inputFile, string fieldSeparator)
> {
>     return
>         inputFile
>         .byLine
>         .map!(line => line
>                       .idup
>                       .split("\t"));
> }
>
> The caller can either use the result lazily:
>
> import std.range;
>
> void main()
> {
>     auto file = File("deneme.txt");
>     writeln(readInData(file, "\t").take(2));
> }
>
> Or call .array on the result to consume the range eagerly:
>
>     auto table = readInData(file, "\t").array;
>
> Ali

Thank you for the alternative approaches. This thread is linked from the Credits section of the wiki page, in case someone wants to find out more on the topic.