Thread overview
Record separator is being lost after string cast
Posts (all Feb 04, 2015): Kadir Erdem Demir, Kagamin, Kagamin, ketmar, Kadir Erdem Demir, ketmar, Kadir Erdem Demir
February 04, 2015
I am opening a .gz file and reading it chunk by chunk to uncompress it.

The uncompressed data looks like aRSbRScRSd: there is a record separator (ASCII code 30) between each pair of records (in my dummy example the records are a, b, c and d).
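The record layout just described (records delimited by RS, ASCII 30) can be split on RS explicitly. As far as I can tell from the Phobos docs, std.string.splitLines breaks on line-break characters such as \n and \r, not on RS. What follows is a minimal sketch under that assumption, for data that is already decompressed. The `records` helper and the `RS` constant are hypothetical names for illustration.

```d
import std.algorithm.iteration : splitter;
import std.array : array;
import std.stdio : writeln;

enum char RS = '\x1e'; // ASCII 30, the record separator (assumed delimiter)

/// Hypothetical helper: split decoded text into records on RS,
/// rather than on line breaks as splitLines does.
string[] records(string data)
{
    return data.splitter(RS).array;
}

void main()
{
    foreach (rec; records("a\x1eb\x1ec\x1ed"))
        writeln(rec); // one record per output line: a, b, c, d
}
```

A trailing RS at the very end of the data would yield one extra empty record. Drop it if your files end that way.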

    import std.stdio;
    import std.string : splitLines;
    import std.zlib : UnCompress;

    void main()
    {
        File file = File("mylog.gz", "rb"); // "rb": open the compressed file in binary mode
        auto uc = new UnCompress();
        foreach (ubyte[] curChunk; file.byChunk(4096*1024))
        {
            auto uncompressed = cast(string)uc.uncompress(curChunk);
            writeln(uncompressed);
            auto stringRange = uncompressed.splitLines();
            foreach (string line; stringRange)
            {
                // Do something with line
            }
        }
    }

The output of the code above is abcd; unfortunately, the record separators (ASCII 30) are missing.

By examining the data, I found that the record separators go missing after I cast ubyte[] to string.

Now I have two questions:

The urgent one (my boss is already a little unhappy that I started this task in D, so I need to solve it): what should I change in the code to keep the record separators?

The second one: how can I write the code above with fewer foreach loops? I want to read the gz file line by line.

A simpler, self-contained example for the first question:

    import std.stdio;
    void main()
    {
        ubyte[] temp = [ 65, 30, 66, 30, 67];
        writeln(temp);
        string tempStr = cast(string) temp;
        writeln(tempStr);
    }

The result is ABC, which is not what I want.

Thanks
Kadir Erdem
February 04, 2015
It looks like RS is an unprintable character; that's why you don't see it in the console.
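One way to check this without trusting the console is to make unprintable bytes visible. Below is a minimal sketch. It treats the string as ASCII, so multi-byte UTF-8 characters would also be escaped byte by byte. The helper name `visible` is hypothetical.

```d
import std.algorithm.searching : count;
import std.ascii : isPrintable;
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper: copy `s`, replacing every unprintable ASCII byte
/// with a \xNN escape so control characters such as RS (0x1E) show up.
string visible(string s)
{
    string result;
    foreach (char c; s)          // iterate raw UTF-8 code units, no decoding
    {
        if (isPrintable(c))
            result ~= c;
        else
            result ~= format("\\x%02X", cast(ubyte) c);
    }
    return result;
}

void main()
{
    ubyte[] temp = [65, 30, 66, 30, 67];
    string tempStr = cast(string) temp.idup; // reinterprets the bytes; nothing is dropped
    writeln(tempStr.length);                 // 5
    writeln(tempStr.count('\x1e'));          // 2
    writeln(visible(tempStr));               // A\x1EB\x1EC
}
```

The length and the RS count show that the cast keeps every byte. Only the terminal hides them.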
February 04, 2015
You can use C functions in D too:

import core.stdc.stdio;
void main() {
    ubyte[] temp = [ 65, 30, 66, 30, 67, 0]; // the trailing 0 terminates the C string for puts
    puts(cast(char*)temp.ptr);
}
February 04, 2015
On Wed, 04 Feb 2015 08:13:28 +0000, Kadir Erdem Demir wrote:

> A more general and understandable code for first question :
> 
>      ubyte[] temp = [ 65, 30, 66, 30, 67];
>      writeln(temp);
>      string tempStr = cast(string) temp;
>      writeln (tempStr);
> 
> Result is : ABC which is not desired.

nothing is lost in the program. what you see is a quirk of tty output: '\x1e' (RS, ASCII 30) is an unprintable character, so you simply cannot see it. redirect the output to a file and open that file in any hex editor, and you will find your separators intact.
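The file-and-hex-editor check can also be done from D itself. The following is a minimal sketch: it writes the string to a scratch file, reads the raw bytes back, and prints them in hex. The helper `hexRoundTrip` and the file name `rs_check.bin` are hypothetical.

```d
import std.file : read, remove, tempDir, write;
import std.format : format;
import std.path : buildPath;
import std.stdio : writeln;

/// Hypothetical helper: write `s` to `path`, read the raw bytes back,
/// and return them as space-separated hex, like a hex editor would show.
string hexRoundTrip(string s, string path)
{
    write(path, s);                               // std.file.write stores bytes unchanged
    auto bytes = cast(const(ubyte)[]) read(path);
    scope (exit) remove(path);                    // clean up the scratch file
    return format("%(%02X %)", bytes);            // each byte as two hex digits
}

void main()
{
    ubyte[] temp = [65, 30, 66, 30, 67];
    auto path = buildPath(tempDir, "rs_check.bin"); // hypothetical scratch file
    writeln(hexRoundTrip(cast(string) temp.idup, path)); // 41 1E 42 1E 43
}
```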

don't believe what you see! ;-)

February 04, 2015
> don't believe what you see! ;-)

I am sorry for making a busy community even busier with false alarms.
When I wrote the output to a file, I saw that the record separators really are there.

I hope my second question is a valid one.
How can I write the code below better? How can I reduce the number of foreach statements?

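    import std.stdio;
    import std.string : splitLines;
    import std.zlib : UnCompress;

    void main()
    {
        File file = File("mylog.gz", "rb"); // "rb": open the compressed file in binary mode
        auto uc = new UnCompress();
        foreach (ubyte[] curChunk; file.byChunk(4096*1024))
        {
            auto uncompressed = cast(string)uc.uncompress(curChunk);
            writeln(uncompressed);
            auto stringRange = uncompressed.splitLines();
            foreach (string line; stringRange)
            {
                // Do something with line
            }
        }
    }
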
Thanks a lot for the replies
Kadir Erdem
February 04, 2015
On Wed, 04 Feb 2015 09:28:27 +0000, Kadir Erdem Demir wrote:

> I am sorry for making a busy community even busier with false alarms.

don't worry about it. ;-) "D.learn" is for *any* questions about the language, no matter how strange they may seem.

> How can I write the code below better? How can I reduce the number of foreach statements?

actually, your loop is not quite right anyway: a chunk boundary can easily fall in the middle of a line, so you may get only part of it. sadly, Phobos has no ready-made line-by-line streaming interface for gz files, so your best bet is to read the whole file into memory, unpack it all at once, and then process it. just make sure you have enough RAM. something like this:

  import std.stdio;
  import std.string;
  import std.zlib;

  void main () {
    char[] unpacked;
    // read the whole file and unpack it
    {
      auto fl = File("test.txt.gz", "rb");
      auto packed = new ubyte[](cast(size_t)fl.size); // File.size is ulong; size_t is D's array length type
      fl.rawRead(packed);
      auto up = new UnCompress();
      unpacked ~= cast(char[])up.uncompress(packed);
      unpacked ~= cast(char[])up.flush();
    }
    foreach (s; unpacked.splitLines) { // foreach infers the type; 'auto' is not allowed here
      writeln(s);
    }
  }
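If the file is too large to unpack in memory, one alternative to the whole-file approach above is to keep decompressing chunk by chunk but carry any unfinished record over to the next chunk. This avoids the partial-line problem. Below is a minimal sketch. It assumes RS (ASCII 30) is the record delimiter, and that the paths of the .gz files are passed on the command line. `RecordSplitter` is a hypothetical helper, not a Phobos type.

```d
import std.algorithm.iteration : splitter;
import std.stdio : File, writeln;
import std.string : lastIndexOf;
import std.zlib : UnCompress;

/// Hypothetical helper (not part of Phobos): turns decompressed chunks into
/// complete records, carrying a record cut by a chunk boundary forward.
struct RecordSplitter
{
    char sep = '\x1e';   // assumed delimiter: RS, ASCII 30
    char[] pending;      // unfinished record from the previous chunk(s)

    /// Append `data` and return every record that is now complete.
    string[] feed(const(char)[] data)
    {
        pending ~= data;
        auto cut = pending.lastIndexOf(sep);
        if (cut < 0)
            return null;                     // no separator yet: keep buffering
        string[] done;
        foreach (rec; pending[0 .. cut].splitter(sep))
            done ~= rec.idup;
        pending = pending[cut + 1 .. $].dup; // keep only the unfinished tail
        return done;
    }

    /// Return the last record if the data did not end with a separator.
    string[] finish()
    {
        return pending.length ? [pending.idup] : null;
    }
}

void main(string[] args)
{
    foreach (path; args[1 .. $])             // each command-line argument is a .gz file
    {
        auto uc = new UnCompress();          // default header detection accepts gzip
        RecordSplitter rs;
        foreach (ubyte[] chunk; File(path, "rb").byChunk(4096 * 1024))
            foreach (rec; rs.feed(cast(const(char)[]) uc.uncompress(chunk)))
                writeln(rec);
        foreach (rec; rs.feed(cast(const(char)[]) uc.flush()))
            writeln(rec);
        foreach (rec; rs.finish())
            writeln(rec);
    }
}
```

Memory use is bounded by the chunk size plus the longest record, rather than by the size of the unpacked file. The price is a little bookkeeping in `feed`.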


February 04, 2015
Thanks a lot,

I will follow your advice and implement this part the same way as your example.

Regards
Kadir Erdem