Lazily parse a JSON text file using stdx.data.json?
December 16, 2017
I'm a longtime fan of dlang, but haven't had a chance to do much in-depth dlang programming, and especially not range programming. Today I thought I'd use stdx.data.json to read from a text file. Since it's a somewhat large file, I thought I'd create a text range from the file and parse it that way. stdx.data.json has a great interface for lazily parsing text into JSON values, so all I had to do was turn a text file into a lazy range of UTF-8 chars that stdx.data.json's lexer could use. (In my best Clarkson voice:) How hard could it be?

Several hours later, I've finally given up and am just reading the whole file into a string. There may be a magic incantation I could use to make it work, but I can't find it, and frankly I can't see why I should need an incantation in the first place. It really ought to just be a method of std.stdio.File.

Apparently some of the complexity is caused by autodecoding (e.g. joiner returns a range of dchar from char ranges), and some of the fault may be in stdx.data.json, but either way I'm surprised that I couldn't do it. This is the kind of thing I expected to be ground level stuff.
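
For readers wondering what such a range could look like with Phobos alone, here is a minimal sketch. It assumes that a single-pass input range of char is acceptable to the consumer; whether stdx.data.json's lexer accepts it is not verified here. The helper name lazyChars and the sample file name are made up for illustration.

```d
import std.algorithm.iteration : joiner, map;
import std.algorithm.searching : count;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

/// Lazily yields the bytes of a file as chars, one chunk buffer at a time.
/// The result is a single-pass input range: it has no save and cannot be rewound.
/// (lazyChars is a hypothetical helper, not part of Phobos or stdx.data.json.)
auto lazyChars(string path, size_t chunkSize = 4096)
{
    return File(path)
        .byChunk(chunkSize)        // range of ubyte[] chunks; the buffer is reused
        .joiner                    // flattened into a range of ubyte
        .map!(b => cast(char) b);  // treat each byte as a UTF-8 code unit
}

void main()
{
    // Sample file created here only so the sketch is self-contained.
    auto path = buildPath(tempDir, "lazy_chars_example.json");
    write(path, `{"a": [1, 2, 3], "b": {"c": null}}`);
    scope (exit) remove(path);

    // A tiny chunk size forces several reads from the file.
    assert(lazyChars(path, 8).count('{') == 2);
}
```

Because byChunk reuses its buffer and offers no save, this only helps consumers that never need to look back or copy the range.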
December 17, 2017
On Saturday, December 16, 2017 21:34:22 David Gileadi via Digitalmars-d wrote:
> I'm a longtime fan of dlang, but haven't had a chance to do much in-depth dlang programming, and especially not range programming. Today I thought I'd use stdx.data.json to read from a text file. Since it's a somewhat large file, I thought I'd create a text range from the file and parse it that way. stdx.data.json has a great interface for lazily parsing text into JSON values, so all I had to do was turn a text file into a lazy range of UTF-8 chars that stdx.data.json's lexer could use. (In my best Clarkson voice:) How hard could it be?
>
> Several hours later, I've finally given up and am just reading the whole file into a string. There may be a magic incantation I could use to make it work, but I can't find it, and frankly I can't see why I should need an incantation in the first place. It really ought to just be a method of std.stdio.File.
>
> Apparently some of the complexity is caused by autodecoding (e.g. joiner returns a range of dchar from char ranges), and some of the fault may be in stdx.data.json, but either way I'm surprised that I couldn't do it. This is the kind of thing I expected to be ground level stuff.

I don't know what problems specifically you were hitting, but a lot of range-based stuff (especially parsing) requires forward ranges so that there can be some amount of lookahead (having just a basic input range can be incredibly restrictive), and forward ranges and lazily reading from a file don't tend to go together very well, because it tends to require allocating buffers that then have to be copied on save. It gets to be rather difficult to do it efficiently. std.stdio.File does support lazily reading in a file, which works well with foreach, but if you're trying to process the entire file as a range, it's usually just way easier to read in the entire file at once and operate on it as a dynamic array. The option halfway in between is to use std.mmfile so that the file gets treated as a dynamic array but the OS is reading it in piecemeal for you. If I were seriously looking at reading in a file lazily as a forward range, I'd look at http://code.dlang.org/packages/iopipe, though as I understand it, it's very much a work in progress.
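
A minimal sketch of the std.mmfile route, assuming a non-empty file that can be opened read-only. The function name countByte and the sample file name are hypothetical; the mapped slice is just an array of code units that could be handed to any parser that accepts one.

```d
import std.file : remove, tempDir, write;
import std.mmfile : MmFile;
import std.path : buildPath;

/// Counts occurrences of one code unit in a file through a memory mapping.
/// The OS pages the contents in on demand; nothing is read in explicitly.
/// (countByte is a hypothetical helper for illustration.)
size_t countByte(string path, char needle)
{
    auto mm = new MmFile(path);   // opens the file read-only
    scope (exit) destroy(mm);     // release the mapping before returning

    // The slice is only valid while the mapping is alive.
    auto text = cast(const(char)[]) mm[];

    size_t n;
    foreach (c; text)             // plain foreach iterates code units, no decoding
        if (c == needle)
            ++n;
    return n;
}

void main()
{
    // Sample file created here only so the sketch is self-contained.
    auto path = buildPath(tempDir, "mmfile_example.json");
    write(path, `{"a": [1, 2, 3], "b": {"c": null}}`);
    scope (exit) remove(path);

    assert(countByte(path, '{') == 2);
}
```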

As for auto-decoding, yeah, it sucks. You can work around it with stuff like std.utf.byCodeUnit, but auto-decoding is a problem all around, and it's one that we're likely stuck with, because unfortunately, we haven't found a way to remove it without breaking everything.
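
To make the byCodeUnit workaround concrete, here is a small sketch using only Phobos. The helper name joinCodeUnits and the sample chunks are invented for illustration; the static asserts show the element types in each case.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : joiner, map;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

/// Joins string chunks lazily while keeping the elements as UTF-8 code units.
/// (joinCodeUnits is a hypothetical helper for illustration.)
auto joinCodeUnits(string[] chunks)
{
    return chunks.map!(c => c.byCodeUnit).joiner;
}

void main()
{
    string[] chunks = [`{"a":`, ` 1}`];

    // Autodecoding: joining plain strings yields decoded code points.
    auto decoded = chunks.joiner;
    static assert(is(ElementType!(typeof(decoded)) == dchar));

    // Wrapping each chunk with byCodeUnit keeps char as the element type.
    auto units = joinCodeUnits(chunks);
    static assert(is(Unqual!(ElementType!(typeof(units))) == char));

    assert(units.equal(`{"a": 1}`.byCodeUnit));
}
```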

- Jonathan M Davis

December 17, 2017
On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
> I'm a longtime fan of dlang, but haven't had a chance to do much in-depth dlang programming, and especially not range programming. Today I thought I'd use stdx.data.json to read from a text file. Since it's a somewhat large file, I thought I'd create a text range from the file and parse it that way. stdx.data.json has a great interface for lazily parsing text into JSON values, so all I had to do was turn a text file into a lazy range of UTF-8 chars that stdx.data.json's lexer could use. (In my best Clarkson voice:) How hard could it be?
>
> [...]

uh I don't know about stdx.data.json but if you didn't manage to succeed yet, I know that asdf[1] works really well with streaming json. There is also an example of how it works.

[1]: http://asdf.dub.pm
December 17, 2017
On 12/17/17 4:44 AM, Jonathan M Davis wrote:

> If I were seriously looking at
> reading in a file lazily as a forward range, I'd look at
> http://code.dlang.org/packages/iopipe, though as I understand it, it's very
> much a work in progress.

There is an even more work-in-progress library built on that, but it's not yet in dub (this was the library I wrote for my dconf talk this year): https://github.com/schveiguy/jsoniopipe

This kind of demonstrates how to parse json data lazily with pretty high performance.

It really depends on what you are trying to do, though.

> As for auto-decoding, yeah, it sucks. You can work around it with stuff like
> std.utf.byCodeUnit, but auto-decoding is a problem all around, and it's one
> that we're likely stuck with, because unfortunately, we haven't found a way
> to remove it without breaking everything.

I think there eventually will have to be a day of reckoning for auto-decoding. But it probably will take a monumental effort to show how it can be done without being too painful for existing code. I still believe it can be done.

-Steve
December 17, 2017
On 12/17/17 3:28 AM, WebFreak001 wrote:
> On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
> uh I don't know about stdx.data.json but if you didn't manage to succeed yet, I know that asdf[1] works really well with streaming json. There is also an example of how it works.
> 
> [1]: http://asdf.dub.pm

Thanks, reading the whole file into memory worked fine. However, asdf looks really cool. I'll definitely look into it next time I need to deal with JSON.
December 31, 2017
On Sun, 17 Dec 2017 10:21:33 -0700,
David Gileadi <gileadisNOSPM@gmail.com> wrote:

> On 12/17/17 3:28 AM, WebFreak001 wrote:
> > On Sunday, 17 December 2017 at 04:34:22 UTC, David Gileadi wrote:
> > uh I don't know about stdx.data.json but if you didn't manage to succeed
> > yet, I know that asdf[1] works really well with streaming json. There is
> > also an example of how it works.
> > 
> > [1]: http://asdf.dub.pm
> 
> Thanks, reading the whole file into memory worked fine. However, asdf looks really cool. I'll definitely look into it next time I need to deal with JSON.

There is also the JSON parser from
https://github.com/mleise/fast
if you need to parse 2x faster than RapidJSON ;)

-- 
Marco

January 01, 2018
On 12/30/17 8:16 PM, Marco Leise wrote:
> There is also the JSON parser from
> https://github.com/mleise/fast
> if you need to parse 2x faster than RapidJSON ;)

Nice, I'll take a look.

My original post was mainly to express how surprised I was that one of D's front-page features was, for me, impossible to get working in this context. I posted in hopes that more experienced folks might consider making fixes to help smooth future attempts by others.

I realize that compile-time ranges are not runtime interfaces like many languages provide for iteration, but right now ranges seem too hard to get right when it feels like they should just work.
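
For anyone unfamiliar with what "compile-time" means here: a range's capabilities are checked with template traits rather than through an interface type. The sketch below uses the traits in std.range.primitives; the alias name Chunks is made up for illustration. It shows why a lazily read file and an in-memory string are treated differently by range-based code.

```d
import std.range.primitives : isForwardRange, isInputRange, isRandomAccessRange;
import std.stdio : File;

// The type produced by File.byChunk, named without opening any file.
// (Chunks is a hypothetical alias for illustration.)
alias Chunks = typeof(File.init.byChunk(4096));

static assert(isInputRange!Chunks);     // can be iterated once
static assert(!isForwardRange!Chunks);  // no save, so no lookahead by copying

static assert(isForwardRange!string);        // an in-memory string can be saved
static assert(!isRandomAccessRange!string);  // autodecoding hides indexing from range code

void main() {}
```

An algorithm that requires a forward range will therefore refuse the file-backed range at compile time, which tends to surface as a template constraint failure rather than a runtime error.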
May 22, 2018
On Sunday, 17 December 2017 at 16:51:21 UTC, Steven Schveighoffer wrote:
> On 12/17/17 4:44 AM, Jonathan M Davis wrote:
>
>> [...]
>
> There is an even more work-in-progress library built on that, but it's not yet in dub (this was the library I wrote for my dconf talk this year): https://github.com/schveiguy/jsoniopipe
>
> This kind of demonstrates how to parse json data lazily with pretty high performance.
>
> It really depends on what you are trying to do, though.
>
>> [...]
>
> I think there eventually will have to be a day of reckoning for auto-decoding. But it probably will take a monumental effort to show how it can be done without being too painful for existing code. I still believe it can be done.
>
> -Steve

Does this cause an infinite loop?
https://github.com/schveiguy/jsoniopipe/blob/master/source/jsoniopipe/dom.d#L134
May 22, 2018
On 5/22/18 3:58 PM, Dr.No wrote:
> Does this cause an infinite loop?
> https://github.com/schveiguy/jsoniopipe/blob/master/source/jsoniopipe/dom.d#L134 
> 

Possibly. Bug reports are welcome :) I think on this line, it will make progress: https://github.com/schveiguy/jsoniopipe/blob/master/source/jsoniopipe/dom.d#L148, but I'm not confident enough to say I'm sure of it.

Of course, as you can probably see, I've spent almost no time working on that code base so far. I need to get back to it. The DOM parser has very little real usage; I just got it working with the given unittests and then checked it in.

I've changed iopipe a bit since then as well, but I think I got it compiling just before my "lightning talk" at the Munich D meetup during dconf. Didn't have time to demonstrate it though.

-Steve