Reading files using delimiters/terminators
December 27, 2020
I'm trying to read a file with entries separated by '\n\n' (an empty line), where the entries themselves contain '\n'. I thought File.byLine(KeepTerminator, Terminator) might work, as it seems to accept strings as terminators, since there seems to have been a thread regarding '\r\n' separators.

I don't know if there's some underlying reason, but when I try to use "\n\n" as a terminator, I end up getting the entire file into 1 char[], so it's not delimited.

Should this work or is there a reason one cannot use byLine like this?

For context, I'm trying this with the puzzle input of day 6 of this year's advent of code. (https://adventofcode.com/)
December 27, 2020
On Sunday, 27 December 2020 at 00:13:30 UTC, Rekel wrote:
> I'm trying to read a file with entries separated by '\n\n' (an empty line), where the entries themselves contain '\n'. I thought File.byLine(KeepTerminator, Terminator) might work, as it seems to accept strings as terminators, since there seems to have been a thread regarding '\r\n' separators.
>
> I don't know if there's some underlying reason, but when I try to use "\n\n" as a terminator, I end up getting the entire file into 1 char[], so it's not delimited.
>
> Should this work or is there a reason one cannot use byLine like this?
>
> For context, I'm trying this with the puzzle input of day 6 of this year's advent of code. (https://adventofcode.com/)

Unfortunately std.csv is character based and not string. https://dlang.org/phobos/std_csv.html#.csvReader

But your use case sounds like splitter is more aligned with your needs.

https://dlang.org/phobos/std_algorithm_iteration.html#.splitter
December 26, 2020
On 12/26/20 4:13 PM, Rekel wrote:
> I'm trying to read a file with entries separated by '\n\n' (an empty line), where the entries themselves contain '\n'. I thought File.byLine(KeepTerminator, Terminator) might work, as it seems to accept strings as terminators, since there seems to have been a thread regarding '\r\n' separators.
> 
> I don't know if there's some underlying reason, but when I try to use "\n\n" as a terminator, I end up getting the entire file into 1 char[], so it's not delimited.
> 
> Should this work or is there a reason one cannot use byLine like this?
> 
> For context, I'm trying this with the puzzle input of day 6 of this year's advent of code. (https://adventofcode.com/)

byLine should work:

import std.stdio;

void main() {
  auto f = File("deneme.d");

  // Warning: byLine reuses an internal buffer. Use byLineCopy
  // if the line (or any slice of it) needs to persist.
  foreach (line; f.byLine) {
    if (line.length == 0) {
      writeln("EMPTY LINE");

    } else {
      writeln(line);
    }
  }
}
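
Building on the loop above, here is a minimal sketch of how the empty lines could be used to collect the entries. The helper name groupLines is made up for this example, and the input is an in-memory string so that it runs without a file; with a real file one would pass File("input").byLine instead.

```d
import std.stdio;
import std.string : lineSplitter;

// Hypothetical helper: collects consecutive non-empty lines into groups,
// treating each empty line as a group separator.
string[][] groupLines(R)(R lines) {
  string[][] result;
  string[] current;
  foreach (line; lines) {
    if (line.length == 0) {
      if (current.length) result ~= current;
      current = null;
    } else {
      current ~= line.idup; // copy, because byLine reuses its buffer
    }
  }
  if (current.length) result ~= current;
  return result;
}

void main() {
  // With a real file: groupLines(File("input").byLine)
  auto groups = groupLines("abc\n\na\nb\nc\n".lineSplitter);
  writeln(groups);
}
```

The .idup call is what makes this safe with byLine; with byLineCopy it would be redundant.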

Ali

December 27, 2020
On Sunday, 27 December 2020 at 02:41:12 UTC, Jesse Phillips wrote:
> Unfortunately std.csv is character based and not string. https://dlang.org/phobos/std_csv.html#.csvReader
>
> But your use case sounds like splitter is more aligned with your needs.
>
> https://dlang.org/phobos/std_algorithm_iteration.html#.splitter

But I'm not using csv, right? Additionally, shouldn't byLine also work with "\r\n"?
December 27, 2020
On 27.12.20 01:13, Rekel via Digitalmars-d-learn wrote:
> For context, I'm trying this with the puzzle input of day 6 of this year's advent of code. (https://adventofcode.com/)

For that specific puzzle I simply did:

foreach (group; readText("input").splitter("\n\n")) { ... }

Since the input is never that big, I prefer reading in the whole thing and then doing the processing.
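
For reference, a self-contained sketch of that approach. The helper name groups is made up for this example, the text is inlined rather than read with readText("input"), and the strip call is an addition here so that a trailing newline does not end up in the last group.

```d
import std.algorithm : splitter;
import std.array : array;
import std.string : strip;

// Hypothetical helper: splits text into blank-line-separated groups.
string[] groups(string text) {
  return text.strip.splitter("\n\n").array;
}

void main() {
  import std.stdio : writeln;
  // In the real program this would be readText("input").
  auto text = "abc\n\na\nb\nc\n\nab\nac\n";
  foreach (group; groups(text))
    writeln(group.length, " characters in group");
}
```

The .array call is only there to make the result easy to inspect; iterating the splitter result directly, as in the one-liner above, avoids that allocation.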

Also, on other days, when the input is more uniform, there's always https://dlang.org/library/std/file/slurp.html which makes reading it in even easier, e.g. day02:

alias Record = Tuple!(int, "low", int, "high", char, "needle", string, "hay");
auto input = slurp!Record("input", "%d-%d %s: %s");

P.S.: would've loved to have had multiwayIntersection in the stdlib for day06 part2, especially when there's already multiwayUnion in setops. fold!setIntersection felt a bit clunky.
December 27, 2020
On Sunday, 27 December 2020 at 13:21:44 UTC, Rekel wrote:
> On Sunday, 27 December 2020 at 02:41:12 UTC, Jesse Phillips wrote:
>> Unfortunately std.csv is character based and not string. https://dlang.org/phobos/std_csv.html#.csvReader
>>
>> But your use case sounds like splitter is more aligned with your needs.
>>
>> https://dlang.org/phobos/std_algorithm_iteration.html#.splitter
>
> But I'm not using csv, right? Additionally, shouldn't byLine also work with "\r\n"?

Right, you weren't using csv. I'm not familiar enough with the file terminator to know why it didn't work.

byLine would accept "\r\n" as a terminator as well as '\n'.
December 27, 2020
On Sunday, 27 December 2020 at 13:27:49 UTC, oddp wrote:
> foreach (group; readText("input").splitter("\n\n")) { ... }

> Also, on other days, when the input is more uniform, there's always https://dlang.org/library/std/file/slurp.html which makes reading it in even easier, e.g. day02:
>
> alias Record = Tuple!(int, "low", int, "high", char, "needle", string, "hay");
> auto input = slurp!Record("input", "%d-%d %s: %s");
>
> P.S.: would've loved to have had multiwayIntersection in the stdlib for day06 part2, especially when there's already multiwayUnion in setops. fold!setIntersection felt a bit clunky.

Oh my, all these things are new to me, haha, thanks a lot! I'll be looking into those (slurp & tuple). By the way, is there a reason to use either 'splitter' or 'split'? I'm not sure I see why the difference would matter in the end.

Side tangent: I don't mean to bash the learning tour, as it's been really useful for getting started, but I'm surprised stuff like tuples and files aren't mentioned there.
Especially since the documentation tends to trip me up, with stuff like 'isSomeString' mentioning 'built in string types', while I haven't been able to find that concept elsewhere, let alone the functionality one can expect in this case (like .length and the like), and stuff like 'countUntil' not being called 'indexOf', although that also exists and does basically the same thing. Also, assumeUnique seems to be a thing?
December 27, 2020
On Sunday, 27 December 2020 at 23:12:46 UTC, Rekel wrote:
> Side tangent: I don't mean to bash the learning tour, as it's been really useful for getting started, but I'm surprised stuff like tuples and files aren't mentioned there.

Update:
Any clue why there's both "std.file" and "std.stdio.File"?
I was mostly unaware of the former.
December 27, 2020
On 12/27/20 3:12 PM, Rekel wrote:

> is there a reason to use
> either 'splitter' or 'split'? I'm not sure I see why the difference
> would matter in the end.

splitter() is a lazy range algorithm. split() is a range algorithm as well but it is eager; it will put the results in an array that it grows. The string elements would not be copies of the original range; they will still be just the pair of .ptr and .length but it can be expensive if there are a lot of parts.

Further, if you want to process just a small number of the initial parts, then being eager would be wasteful.

Like all lazy range algorithms, splitter() is just an iteration object waiting to be used. It does not allocate any array but serves the parts one by one. You can filter the parts as you iterate over them, or you can stop at any point. For example, the following would take the first 3 non-empty lines:

import std.stdio;
import std.range;
import std.algorithm;

void main() {
  auto s = "hello\n\nworld\n\n\nand\nmoon";
  writefln!"%(%s, %)"(s.splitter('\n').filter!(part => !part.empty).take(3));
}
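
To put the eager and lazy versions side by side, here is a small sketch. The helper name firstParts is made up for this example; split, splitter, take and array are the Phobos functions discussed above.

```d
import std.algorithm : splitter;
import std.array : array, split;
import std.range : take;

// Hypothetical helper: returns only the first n comma-separated parts.
// splitter stops working once take(n) is satisfied.
string[] firstParts(string s, size_t n) {
  return s.splitter(",").take(n).array;
}

void main() {
  auto s = "a,b,c,d";

  // Eager: all four parts are found and stored in a new array.
  string[] eager = s.split(",");
  assert(eager.length == 4);

  // The elements are slices of the original string, not copies.
  assert(eager[0].ptr is s.ptr);

  // Lazy: only as many parts as requested are ever produced.
  assert(firstParts(s, 2) == ["a", "b"]);
}
```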

> Side tangent: I don't mean to bash the learning tour, as it's been really
> useful for getting started, but I'm surprised stuff like tuples and
> files aren't mentioned there.

Alternative place to search: :)

  http://ddili.org/ders/d.en/ix.html

> Especially since the documentation tends to trip me up, with stuff like
> 'isSomeString' mentioning 'built in string types', while I haven't been
> able to find that concept elsewhere,

Built in strings are just arrays of character types: char[], wchar[], and dchar[]. Commonly used by their respective immutable aliases: string, wstring, and dstring.
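
A quick way to see what that means for isSomeString is to check it at compile time. This is only an illustration; the non-ASCII string in main is chosen to show that .length counts code units rather than characters.

```d
import std.range : walkLength;
import std.traits : isSomeString;

// string is nothing more than an alias for immutable(char)[].
static assert(is(string == immutable(char)[]));

static assert(isSomeString!string);
static assert(isSomeString!(char[]));
static assert(isSomeString!wstring);
static assert(isSomeString!(dchar[]));

// Arrays of other types and static character arrays do not qualify.
static assert(!isSomeString!(int[]));
static assert(!isSomeString!(char[4]));

void main() {
  string s = "héllo";
  assert(s.length == 6);     // UTF-8 code units ('é' takes two)
  assert(s.walkLength == 5); // code points, when iterated as a range
}
```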

> 'countUntil' not being called 'indexOf'

countUntil() is more general because it works with any range while indexOf requires a string.
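
A short sketch of the difference, using plain ASCII input so that both functions agree on the result:

```d
import std.algorithm.searching : countUntil;
import std.string : indexOf;

void main() {
  // On a string, both report the same position here.
  assert("hello".indexOf('l') == 2);
  assert("hello".countUntil('l') == 2);

  // countUntil also works on ranges that are not strings...
  assert([10, 20, 30].countUntil(30) == 2);

  // ...and accepts a predicate instead of a value.
  assert([10, 20, 30].countUntil!(x => x > 15) == 1);

  // Both return -1 when nothing matches.
  assert("hello".indexOf('z') == -1);
  assert([10, 20, 30].countUntil(99) == -1);
}
```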

> assumeUnique seems to be a thing?

That appears in the index I posted above as well. ;)
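
In short, assumeUnique (from std.exception) converts a mutable array to an immutable one without copying, on the programmer's promise that no other mutable reference to the data exists. A minimal sketch, with a made-up function name makeGreeting:

```d
import std.exception : assumeUnique;

// Hypothetical example: builds a string in a mutable buffer,
// then hands it out as an immutable string without copying.
string makeGreeting(string name) {
  char[] buf = "Hello, ".dup;
  buf ~= name;
  // Safe only because buf is the sole reference to the data.
  // assumeUnique also resets buf to null.
  return assumeUnique(buf);
}

void main() {
  assert(makeGreeting("D") == "Hello, D");
}
```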

Ali

December 28, 2020
On 28.12.20 00:12, Rekel via Digitalmars-d-learn wrote:
> is there a reason to use either 'splitter' or 'split'?

split gives you a newly allocated array with the results; splitter is the lazy equivalent and doesn't allocate. Feel free to use either, it doesn't matter much with these small puzzle inputs.

> Side tangent: I don't mean to bash the learning tour, as it's been really useful for getting started, but I'm surprised stuff like tuples and files aren't mentioned there.
> Especially since the documentation tends to trip me up, with stuff like 'isSomeString' mentioning 'built in string types', while I haven't been able to find that concept elsewhere, let alone the functionality one can expect in this case (like .length and the like), and stuff like 'countUntil' not being called 'indexOf', although that also exists and does basically the same thing. Also, assumeUnique seems to be a thing?

Might be worth discussing that in a new topic. The stdlib is vast and has tons of useful utilities, not all of which can be explained in detail in a series of overview posts. Ali's "Programming in D" [1], which has a free online version, functions as an excellent in-depth introduction to the language, going over all the important topics.

Regarding function names and docs: Yes, some might seem slightly off coming from other languages (e.g. find vs. dropWhile, until vs. takeWhile, cumulativeFold vs. scan/accumulate, etc.), but it's all in there somewhere, implemented with the utmost care not to waste precious cycles. That might make it harder to grok when going over the implementation or docs for the very first time, but it gets easier after a while. Furthermore, alternative names are oftentimes mentioned in the docs, so a quick Google search should bring you to the right place.

[1] http://ddili.org/ders/d.en/index.html
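
To make the name mapping mentioned above a bit more concrete, here is a small sketch with a made-up array. The comparisons to dropWhile, takeWhile and scan are loose analogies to other languages, not exact equivalents in every detail.

```d
import std.algorithm : cumulativeFold, equal, find, until;

void main() {
  auto nums = [1, 2, 3, 4, 5];

  // find skips elements until the predicate holds,
  // much like dropWhile with the negated predicate.
  assert(nums.find!(x => x > 2).equal([3, 4, 5]));

  // until yields elements up to (by default excluding) the first match,
  // much like takeWhile with the negated predicate.
  assert(nums.until!(x => x > 2).equal([1, 2]));

  // cumulativeFold yields the running results, like scan/accumulate.
  assert(nums.cumulativeFold!((a, b) => a + b).equal([1, 3, 6, 10, 15]));
}
```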