June 23, 2015
stdx.data.json needs a layer on top

It's great, but it's not quite a replacement for std.json, as I see it.

The stream parser is fast, and it's valuable to be able to access it at a low level. However, it was consciously designed to be low-level, and for something else to go on top.

As I understand it, there is a gap between what you can currently do with std.json (and indeed vibe.d's JSON module) and what you can do with stdx.data.json. And the capability falls short of what can be done in other standard libraries, such as Python's.

So since we are going for a nuclear-power-station-included approach, does that not mean that we need to specify what this layer should do, and that somebody should start to work on it?
June 23, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Laeeth Isharc

On 24/06/2015 12:17 a.m., Laeeth Isharc wrote:
> It's great, but it's not quite a replacement for std.json, as I see it.
>
> The stream parser is fast, and it's valuable to be able to access it at a low level.
>
> However, it was consciously designed to be low-level, and for something else to go on top.
>
> As I understand it, there is a gap between what you can currently do with std.json (and indeed vibe.d's JSON module) and what you can do with stdx.data.json. And the capability falls short of what can be done in other standard libraries, such as Python's.
>
> So since we are going for a nuclear-power-station-included approach, does that not mean that we need to specify what this layer should do, and that somebody should start to work on it?

Please come onto https://www.livecoding.tv/alphaglosined/ and hang out for half an hour. I want to show you something related.
June 23, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Laeeth Isharc

On 23.06.2015 at 14:17, Laeeth Isharc wrote:
> It's great, but it's not quite a replacement for std.json, as I see it.
>
> The stream parser is fast, and it's valuable to be able to access it at a low level.
>
> However, it was consciously designed to be low-level, and for something else to go on top.
>
> As I understand it, there is a gap between what you can currently do with std.json (and indeed vibe.d's JSON module) and what you can do with stdx.data.json. And the capability falls short of what can be done in other standard libraries, such as Python's.
>
> So since we are going for a nuclear-power-station-included approach, does that not mean that we need to specify what this layer should do, and that somebody should start to work on it?

One thing, which I consider the most important missing building block, is Jacob's anticipated std.serialization module [1]*. Skipping the data representation layer and going straight for statically typed access to the data is the way to go in a language such as D, at least in most situations.

Another part is a high-level layer on top of the stream parser that has existed for a while (albeit with room for improvement), but whose documentation I had forgotten to update. I've now caught up on that, and it can be found under [2] - see the read[...] and skip[...] functions.

Do you, or anyone else, have further ideas for higher-level functionality, or any concrete examples in other standard libraries?

[1]: https://github.com/jacob-carlborg/orange
[2]: http://s-ludwig.github.io/std_data_json/stdx/data/json/parser.html

* Or any other suitable replacement, if that doesn't work out for some reason. The vibe.data.serialization module, to me, is not a suitable candidate as it stands, because it lacks some features of Jacob's solution, such as proper handling of (duplicate/interior) references. But it's a perfect fit for my own class of problems, so I currently can't justify putting work into this either.
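A minimal sketch of the statically typed access Sönke describes, using vibe.d's existing deserializeJson/serializeToJsonString purely as stand-ins for the anticipated std.serialization (the Quote struct and its fields are invented for illustration, and the eventual std.serialization names may well differ):

import std.stdio : writeln;
import vibe.data.json : deserializeJson, serializeToJsonString;

// Invented record type: the JSON structure maps straight onto a D struct
// instead of going through a generic DOM value.
struct Quote
{
    string symbol;
    double price;
    long volume;
}

void main()
{
    auto q = deserializeJson!Quote(`{"symbol": "ABC", "price": 12.5, "volume": 1000}`);
    writeln(q.symbol, " @ ", q.price);

    // And back out again, without ever touching a DOM-style value type.
    writeln(serializeToJsonString(q));
}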
June 23, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Rikki Cattermole

On Tuesday, 23 June 2015 at 12:28:00 UTC, Rikki Cattermole wrote:
> Please come onto https://www.livecoding.tv/alphaglosined/ and hang out for half an hour. I want to show you something related.

What times GMT or BST are good for you?
June 23, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Sönke Ludwig

On Tuesday, 23 June 2015 at 14:06:38 UTC, Sönke Ludwig wrote:
>> As I understand it, there is a gap between what you can currently do with std.json (and indeed vibe.d's JSON module) and what you can do with stdx.data.json. And the capability falls short of what can be done in other standard libraries, such as Python's.
>>
>> So since we are going for a nuclear-power-station-included approach, does that not mean that we need to specify what this layer should do, and that somebody should start to work on it?
>
> One thing, which I consider the most important missing building block, is Jacob's anticipated std.serialization module [1]*. Skipping the data representation layer and going straight for statically typed access to the data is the way to go in a language such as D, at least in most situations.

Thanks, Sönke. I appreciate your taking the time to reply, and I hope I represented my understanding of things correctly. I think things often get stuck in limbo because people don't know what's most useful, so a central list of "things that need to be done" in the D ecosystem might be nice, provided it doesn't become excessively structured and bureaucratic. (I'm not volunteering to maintain it, as I can't commit to it.)

The thing is, there are different use cases. For example, I pull data from Quandl - the metadata is standard and won't change in format often, but the data for a particular series will. For example, if I pull volatility data, it will have different fields from price or economic data, and I don't know beforehand the total set of possibilities. This must be quite a common use case, and indeed I just hit another one recently with a poorly documented internal corporate database for securities.

Maybe it's fine to generate the static typing in response to reading the data, but then it ought (ultimately) to be easy to do so. Because otherwise you hack something up in Python because it's just easier, and that hack job becomes the basis for something larger than you ever intended or wanted, and it's never worth rewriting given the other stuff you need. But even if you prefer static typing generated on the fly (which maybe becomes useful via introspection, a la Alexandrescu's talk), sometimes one will prefer dynamic typing, and since it's easy to provide in a way that doesn't destroy the elegance and coherence of the whole project, why not give people the option?

It seems to me that Guido painted a target on Python by saying "it's fast enough, and you are usually I/O bound anyway", because the numerical computing people have different needs. So BLAS and the like may be part of that, but also having something like pandas - and the ability to get data in and out of it - would be an important part of making it easy and fun to use D for this purpose, and it's not so hard to do, just a fair bit of work. Not that it makes sense to undergo a death march to duplicate Python functionality, but there are some things that are relatively easy and have a high payoff - like John Colvin's pydmagic. (The link here, which may not be so obvious, is that in a way pandas is a kind of replacement for a spreadsheet, and being able to just pull stuff in without minding your p's and q's to get a quick result lends itself to the kind of iterative exploration that keeps spreadsheets overused even today. And that's the link to JSON and (de)serialization.)

> Another part is a high-level layer on top of the stream parser that has existed for a while (albeit with room for improvement), but whose documentation I had forgotten to update. I've now caught up on that, and it can be found under [2] - see the read[...] and skip[...] functions.

Thank you for the link.

> Do you, or anyone else, have further ideas for higher-level functionality, or any concrete examples in other standard libraries?

I will think it through and try to come up with some simple examples. Paging John Colvin and Russell Winder, too.

> * Or any other suitable replacement, if that doesn't work out for some reason. The vibe.data.serialization module, to me, is not a suitable candidate as it stands, because it lacks some features of Jacob's solution, such as proper handling of (duplicate/interior) references. But it's a perfect fit for my own class of problems, so I currently can't justify putting work into this either.

Is it worth you or someone else trying to articulate what it does well that is missing from stdx.data.json?
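For the explore-first, type-later case Laeeth describes, here is a rough sketch of what dynamic access looks like with today's std.json (the payload and field names are invented; a coherent equivalent would presumably sit on top of stdx.data.json's value type instead):

import std.json;
import std.stdio : writeln;

void main()
{
    // A response whose exact fields are not known up front.
    auto doc = parseJSON(`{"dataset": {"name": "VOL-ABC", "data": [[1.0, 2.0], [1.5, 2.5]]}}`);

    // Discover the fields at run time instead of declaring a struct first.
    foreach (key, value; doc["dataset"].object)
        writeln(key, ": ", value.type);

    // Then drill into whichever parts turn out to be interesting.
    foreach (row; doc["dataset"]["data"].array)
        writeln(row[0].floating, " -> ", row[1].floating);
}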
June 24, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Laeeth Isharc

On 24/06/2015 7:05 a.m., Laeeth Isharc wrote:
> On Tuesday, 23 June 2015 at 12:28:00 UTC, Rikki Cattermole wrote:
>> Please come onto https://www.livecoding.tv/alphaglosined/ and hang out
>> for half an hour. I want to show you something related.
>
> What times GMT or BST are good for you?
12pm UTC+0 is when I aim to stream. Hopefully I'll stream again tonight, although I'm getting a bit tired after streaming for three days (usually I only stream twice a week)!
Follow or keep an eye on livecodingtv on Twitter to know when I start.
June 24, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Laeeth Isharc

On 23/06/15 21:22, Laeeth Isharc wrote:
> The thing is, there are different use cases. For example, I pull data from Quandl - the metadata is standard and won't change in format often, but the data for a particular series will. For example, if I pull volatility data, it will have different fields from price or economic data, and I don't know beforehand the total set of possibilities. This must be quite a common use case, and indeed I just hit another one recently with a poorly documented internal corporate database for securities.

If the data can change between calls or is not consistent, my serialization library is not a good fit. But if the data is consistent and only changes over time, say once a month, my serialization library could work, provided you update the data structures when the data changes.

My serialization library can also work with optional fields if custom serialization is used.

--
/Jacob Carlborg
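To illustrate what optional fields buy you in this workflow, here is a minimal sketch using vibe.data.serialization's @optional attribute purely as an example of the idea (it is not Orange's API, and the Series struct and its fields are invented):

import std.stdio : writeln;
import vibe.data.json : deserializeJson;
import vibe.data.serialization : optional;

// Invented example type: only the fields you care about are declared,
// and fields that are sometimes absent are marked @optional.
struct Series
{
    string name;
    @optional string unit;   // not every data set carries this field
    double[] values;
}

void main()
{
    // Deserializes fine even though "unit" is missing from the payload.
    auto s = deserializeJson!Series(`{"name": "CPI", "values": [1.1, 1.2]}`);
    writeln(s.name, " ", s.values);
}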
June 24, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Jacob Carlborg

On Wednesday, 24 June 2015 at 13:15:52 UTC, Jacob Carlborg wrote:
> On 23/06/15 21:22, Laeeth Isharc wrote:
>
>> The thing is, there are different use cases. For example, I pull data from Quandl - the metadata is standard and won't change in format often, but the data for a particular series will. For example, if I pull volatility data, it will have different fields from price or economic data, and I don't know beforehand the total set of possibilities. This must be quite a common use case, and indeed I just hit another one recently with a poorly documented internal corporate database for securities.
>
> If the data can change between calls or is not consistent, my serialization library is not a good fit. But if the data is consistent and only changes over time, say once a month, my serialization library could work, provided you update the data structures when the data changes.
>
> My serialization library can also work with optional fields if custom serialization is used.

Thanks, Jacob. Some series shouldn't change too often. On the other hand, just with Quandl there are 10 million data series taken from a whole range of different sources, some of them rather unfinished, and it's hard to know.

My needs are not relevant for the library, except that I think people often want to explore new data sets iteratively (over the course of weeks and months). Of course it doesn't take long to write the struct (or to make something that will write it, given the data and some guidance), but that's one more layer of friction. So from the perspective of D succeeding, I would think giving people the option of using static or dynamic typing as they prefer - within a coherent framework, so you're not using one library here and another there, when in other language ecosystems it is not fragmented - would pay off.

I don't know if you have looked at pandas and the IPython notebook much. But now that one can call D code from the IPython notebook (again, a 'trivial' piece of glue, but ingenious, and removing this small friction makes getting work done much easier), maybe having the option of dynamic types with JSON will have more value. See here, as one simple example: http://nbviewer.ipython.org/gist/wesm/4757075/PandasTour.ipynb

So it would be nice to be able to do something like Adam Ruppe does here: https://github.com/adamdruppe/arsd/blob/master/jsvar.d

var j = json!q{
    "hello": {
        "data": [1, 2, "giggle", 4]
    },
    "world": 20
};

writeln(j.hello.data[2]);

Obviously the scope is outside a serialization library, but I'm just thinking about the broader integrated and coherent library offering we should have.
June 24, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Laeeth Isharc

On 24/06/15 15:48, Laeeth Isharc wrote:
> So it would be nice to be able to do something like Adam Ruppe does here: https://github.com/adamdruppe/arsd/blob/master/jsvar.d
>
> var j = json!q{
>     "hello": {
>         "data": [1, 2, "giggle", 4]
>     },
>     "world": 20
> };
>
> writeln(j.hello.data[2]);
>
> Obviously the scope is outside a serialization library, but I'm just thinking about the broader integrated and coherent library offering we should have.

I understand, and I agree it would be nice to have.

--
/Jacob Carlborg
June 24, 2015
Re: stdx.data.json needs a layer on top
Posted in reply to Sönke Ludwig

On 06/23/2015 04:06 PM, Sönke Ludwig wrote:
>
> Do you, or anyone else, have further ideas for higher level functionality, or any concrete examples in other standard libraries?
Being able to lazily foreach over the elements would be nice.
foreach (elem; nodes.readArray)
{
    // each elem would be a bounded node stream (range)
    foreach (key, value; elem.readObject)
    {
    }
}