February 09, 2012
This. And decoded JSON strings are always smaller than encoded strings--JSON uses escaping to encode non-UTF-8 stuff, so in the case where someone sends a surrogate pair (legal in JSON) it's encoded as \u0000\u0000. In short, it's absolutely possible to create a pull parser that never allocates, even for decoding. As proof, I've done it before. :-p
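A rough illustration of why decoding can only shrink a JSON string: every escape sequence is at least two input characters but decodes to a UTF-8 sequence no longer than the escape itself. This is just a sketch, not library code:

```d
import std.stdio;

void main()
{
    // `...` strings are WYSIWYG in D, so these hold the raw JSON escapes.
    string encoded = `\u0041\n\t`; // 10 characters of JSON escapes
    string decoded = "A\n\t";      // decodes to just 3 bytes
    assert(decoded.length <= encoded.length);
    writeln(decoded.length, " <= ", encoded.length); // prints "3 <= 10"
}
```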

On Feb 9, 2012, at 3:07 AM, Johannes Pfau <nospam@example.com> wrote:

> Am Wed, 08 Feb 2012 20:49:48 -0600
> schrieb "Robert Jacques" <sandford@jhu.edu>:
> 
>> On Wed, 08 Feb 2012 02:12:57 -0600, Johannes Pfau <nospam@example.com> wrote:
>>> Am Tue, 07 Feb 2012 20:44:08 -0500
>>> schrieb "Jonathan M Davis" <jmdavisProg@gmx.com>:
>>>> On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
>>>>> On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis
>> [snip]
>>> 
>>> Using ranges of dchar directly can be horribly inefficient in some
>>> cases, you'll need at least some kind off buffered dchar range. Some
>>> std.json replacement code tried to use only dchar ranges and had to
>>> reassemble strings character by character using Appender. That sucks
>>> especially if you're only interested in a small part of the data and
>>> don't care about the rest.
>>> So for pull/sax parsers: Use buffering, return strings(better:
>>> w/d/char[]) as slices to that buffer. If the user needs to keep a
>>> string, he can still copy it. (String decoding should also be done
>>> on-demand only).
>> 
>> Speaking as the one proposing said Json replacement, I'd like to point out that JSON strings != UTF strings: manual conversion is required some of the time. And I use appender as a dynamic buffer in exactly the manner you suggest. There's even an option to use a string cache to minimize total memory usage. (Hmm... that functionality should probably be re-factored out and made into its own utility.) That said, I do end up doing a bunch of useless encodes and decodes, so I'm going to special-case those away and add slicing support for strings. wstrings and dstrings will still need to be converted, as currently Json values only accept strings and therefore Json tokens also only support strings. As a potential user of the sax/pull interface, would you prefer the extra clutter of special side channels for zero-copy wstrings and dstrings?
> 
> Regarding wstrings and dstrings: Well, JSON seems to be UTF-8 in almost all cases, so it's not that important. But I think it should be possible to use templates to implement identical parsers for string/wstring/dstring.
> 
> Regarding the use of Appender: Long text ahead ;-)
> 
> I think pull parsers should really be as fast as possible and low-level. For easy-to-use, high-level stuff there's always DOM, and a safe, high-level serialization API should be implemented based on the PullParser as well. The serialization API would read only the requested data, skipping the rest:
> ----------------
> struct Data
> {
>    string link;
> }
> auto data = unserialize!Data(json);
> ----------------
> 
> So in the PullParser we should
> avoid memory allocation whenever possible, I think we can even avoid it
> completely:
> 
> I think dchar ranges are just the wrong input type for parsers, parsers
> should use buffered ranges or streams (which would be basically the
> same). We could then use a generic BufferedRange for real
> dchar ranges. This BufferedRange could use a static buffer, so
> there's no need to allocate anything.
> 
> The pull parser should return slices to the original string (if the
> input is a string) or slices to the Range/Stream's buffer.
> Of course, such a slice is only valid till the pull parser is called
> again. The slice also wouldn't be decoded yet. And a slice string could
> only be as long as the buffer, but I don't think this is an issue, a
> 512KB buffer can already store 524288 characters.
> 
> If the user wants to keep a string, he should really do decodeJSONString(data).idup. There's a little more opportunity for optimization: as long as a decoded JSON string is always smaller than the encoded one (I don't know if it is), we could have a decodeJSONString function which overwrites the original buffer --> no memory allocation.
> 
> If that's not the case, decodeJSONString has to allocate iff the
> decoded string is different. So we need a function which always returns
> the decoded string as a safe-to-keep copy, and a function which returns
> the decoded string as a slice if the decoded string is
> the same as the original.
> 
> An example:
> 
> string json = `{
>   "link":"http://www.google.com",
>   "useless_data":"lorem ipsum",
>   "more":{
>      "not interested":"yes"
>   }
> }`;
> 
> Now I'm only interested in the link. It should be possible to parse that with zero memory allocations:
> 
> auto parser = Parser(json);
> parser.popFront();
> while(!parser.empty)
> {
>    if(parser.front.type == KEY
>       && tempDecodeJSON(parser.front.value) == "link")
>    {
>        parser.popFront();
>        assert(!parser.empty && parser.front.type == VALUE);
>        return decodeJSON(parser.front.value); //Should return a slice
>    }
>    //Skip everything else;
>    parser.popFront();
> }
> 
> tempDecodeJSON returns a decoded string which (usually) isn't safe to store (it can/should be a slice into the internal buffer; here it's a slice into the original string, so it could be stored, but there's no guarantee). In this case, the call to tempDecodeJSON could even be left out, as we only search for "link", which doesn't need decoding.
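The overwrite-the-buffer decode discussed above might look roughly like this. The name decodeJSONString comes from the quoted text, but the implementation is my own sketch and handles only a few escapes:

```d
// Hypothetical sketch of in-place JSON string decoding: since a decoded
// string is never longer than its escaped form, we can write the result
// back into the same buffer and return a slice of it -- no allocation.
char[] decodeJSONString(char[] buf)
{
    size_t w = 0; // write position, always <= read position
    for (size_t r = 0; r < buf.length; r++)
    {
        if (buf[r] == '\\' && r + 1 < buf.length)
        {
            r++;
            switch (buf[r])
            {
                case 'n':  buf[w++] = '\n'; break;
                case 't':  buf[w++] = '\t'; break;
                case '"':  buf[w++] = '"';  break;
                case '\\': buf[w++] = '\\'; break;
                // \uXXXX and the other escapes omitted for brevity
                default:   buf[w++] = buf[r]; break;
            }
        }
        else
            buf[w++] = buf[r];
    }
    return buf[0 .. w]; // slice of the original buffer
}

unittest
{
    char[] s = `hello\nworld`.dup; // WYSIWYG string: literal backslash-n
    assert(decodeJSONString(s) == "hello\nworld");
}
```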
February 09, 2012
Am Thu, 09 Feb 2012 08:18:15 -0600
schrieb "Robert Jacques" <sandford@jhu.edu>:

> On Thu, 09 Feb 2012 05:13:52 -0600, Johannes Pfau <nospam@example.com> wrote:
> > Am Wed, 08 Feb 2012 20:49:48 -0600
> > schrieb "Robert Jacques" <sandford@jhu.edu>:
> >>
> >> [snip]
> >
> > BTW: Do you know DYAML?
> > https://github.com/kiith-sa/D-YAML
> >
> > I think it has a pretty nice DOM implementation which doesn't require any changes to phobos. As YAML is a superset of JSON, adapting it for std.json shouldn't be too hard. The code is boost licensed and well documented.
> >
> > I think std.json would have better chances of being merged into phobos if it didn't rely on changes to std.variant.
> 
> I know about D-YAML, but haven't taken a deep look at it; it was developed long after I wrote my own JSON library.

I know, I didn't mean to criticize. I just thought DYAML could give some useful inspiration for the DOM api.

> I did look into
> YAML before deciding to use JSON for my application; I just didn't
> need the extra features and implementing them would've taken extra
> dev time.

Sure, I was only referring to DYAML because the DOM is very similar. Just remove some features and it would suit JSON very well. One problem is that DYAML uses an older YAML version which isn't 100% compatible with JSON, so it can't be used as a JSON parser. There's also no way to tell it to generate only JSON-compatible output (and AFAIK that's a design decision, not simply a missing feature).
> 
> As for reliance on changes to std.variant, this was a change *suggested* by Andrei.
Ok, then those changes obviously make sense. I actually thought Andrei didn't like some of those changes.

> And while it is the slower route to go, I
> believe it is the correct software engineering choice; prior to the
> change I was implementing my own typed union (i.e. I poorly
> reinvented std.variant). Actually, most of my initial work on Variant
> was to make its API just as good as my home-rolled JSON type.
> Furthermore, a quick check of the YAML code-base seems to indicate
> that underneath the hood, Variant is being used. I'm actually a
> little curious about what prevented YAML from being expressed using
> std.variant directly and if those limitations can be removed.

I guess the custom Node type was only added to support additional methods (isScalar, isSequence, isMapping, add, remove, removeAt), and I'm not sure if the current Variant supports length, foreach, opIndex, and opIndexAssign, but IIRC those are supported in your new std.variant.
> 
> * The other thing slowing both std.variant and std.json down is my thesis writing :)


February 09, 2012
On 2/9/12 6:56 AM, Adam D. Ruppe wrote:
> Here's the ddoc:
> http://arsdnet.net/web.d/cgi.html

Cue the choir: "Please submit to Phobos".

Andrei
February 09, 2012
On Thursday, 9 February 2012 at 17:36:01 UTC, Andrei Alexandrescu wrote:
> Cue the choir: "Please submit to Phobos".

Perhaps when I finish the URL struct in there. (It
takes a url and breaks it down into parts you can edit,
and can do rebasing. Currently, the handling of the Location:
header is technically wrong - the http spec says it is supposed
to be an absolute url, but I don't enforce that.

Now, in cgi mode, it doesn't matter, since the web server
fixes it up for us. But, in http mode... well, it still
doesn't matter since the browsers can all figure it out,
but I'd like to do the right thing anyway.)


I might change the http constructor and/or add one
that takes a std.socket socket cuz that would be cool.



But I just don't want to submit it when I still might
be making some big changes in the near future.




BTW, I spent a little time reorganizing and documenting
dom.d a bit more.

http://arsdnet.net/web.d/dom.html

Still not great docs, but if you come from javascript,
I think it is pretty self-explanatory anyway.
February 10, 2012
On Tuesday, 7 February 2012 at 20:00:26 UTC, Adam D. Ruppe wrote:
> I'm taking this to an extreme with this:
>
> http://arsdnet.net:8080/


hehehe, I played with this a little bit more tonight.

http://arsdnet.net/dcode/sse/

needs the bleeding edge dom.d from my github.
https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

Here's the code, not very long.
http://arsdnet.net/dcode/sse/test.d

The best part is this:

 document.mainBody.addEventListener("click", (Element thislol, Event event) {
   event.target.style.color = "red";
   event.target.appendText(" clicked! ");
   event.preventDefault();
 });


An HTML onclick handler written in D!



Now, like I said before, probably not usable for real work. What
this does is create a server-side DOM object for each user
session.

Using observers on the DOM, it listens for changes and forwards them
to Javascript. You use the D api to change your document, and it
sends them down. I've only implemented a couple mutation events,
but they go a long way - appendChild and setAttribute - as they
are the building blocks for many of the functions.

On the client side, the javascript listens for events and forwards
them to D.

To sync the elements on both sides, I added a special feature
to dom.d to put an attribute there that is usable on both sides.
The Makefile in there shows the -version needed to enable it.


Since it is a server-side document, btw, you can refresh the browser
and keep the same document. It could quite plausibly degrade gracefully!



But, yeah, lots of fun. D rox.
September 17, 2014
I think there might be a better one out; I noticed something in a forum (Aug 2013).

I've found that GTkD and arsd\dom.d (Adam's) have 'Event' symbols that clash with each other.

With the code 'alias jEvent = gdk.Event;' I get errors saying gtk.Event is used as a type.

Is there an easier way to sort out symbol clashes?

I've got some code here:
https://github.com/joelcnz/Jyble

Thanks for any help.
September 17, 2014
On Wed, 17 Sep 2014 04:57:03 +0000
Joel via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

> Is there an easier way to sort out symbol clashes?
Sure. The documentation on the 'import' keyword should enlighten you.


September 17, 2014
On Wednesday, 17 September 2014 at 04:57:05 UTC, Joel wrote:
> Is there an easier way to sort out symbol clashes?

Since most of dom.d's functionality is inside the Document and other classes, you can probably get your code to compile most easily with a selective import:

import arsd.dom : Document, Element, Form, Link;

or something like that. You might not even be using Form and Link. Then, since the rest of the functions used are methods of those classes, you don't need to list them individually; they'll just work. And since you didn't import the Event class, it should be as if it doesn't even exist.
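For the GtkD clash specifically, a renamed or static import is another option. The module path here is assumed from the alias in the earlier post, so treat this as a sketch:

```d
// Keep GtkD's module fully qualified so its Event class never
// collides with arsd.dom's Event at module scope.
import arsd.dom;            // dom.d's Event is now just "Event"
static import gdk.Event;    // GtkD's must be written gdk.Event.Event

// Or give each class its own unambiguous alias:
alias DomEvent = arsd.dom.Event;
alias GdkEvent = gdk.Event.Event;
```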
September 18, 2014
Yay! Got it working again! Thanks guys!

I want to get it working on OSX at some point. I haven't tried yet.