simple sax-style xml parser

Jul 20, 2016

i wrote a simple sax-style xml parser[1][2] for my own needs, and decided to share it. it has two interfaces: the `xmparse()` function, which simply calls callbacks without any validation or encoding conversion, and the `SaxyEx` class, which does some validation, converts content to utf-8 (from anything std.encoding supports), and calls callbacks when the given path is triggered.

it can parse any `char` input range, or std.stdio.File. parsing files is probably slightly faster than parsing ranges.

internally it extensively reuses the memory buffers it has allocated, so it should not put much pressure on the GC.

you are expected to copy any data you need in callbacks (not just slice, but .dup!).

so far i'm using it to parse fb2 files, and it parses an 8.5 megabyte utf-8 file (and creates internal reader structures, including splitting text to words and some other housekeeping) in one second on my i3 (with dmd -O, even without -inline and -release).

it is not really documented, but i think it is "intuitive". there are also some comments in the source code; please, read those! ;-)

p.s. it decodes standard xml entities (&# and &#x probably work right only in utf-8 files, though), and understands CDATA and comments.

enjoy, and happy hacking!

[1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
[2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests
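The advice to `.dup` rather than slice follows from the buffer reuse described above. The sketch below is only an illustration of that hazard, not saxy's code or API: `eachWord`, `collect` and `Collected` are hypothetical names for a toy tokenizer that, like the parser, hands each callback a slice of one internal buffer and then overwrites that buffer.

```d
import std.stdio : writeln;

// Hypothetical toy tokenizer, NOT the saxy API. It reports each
// space-separated word through a callback, always passing a slice of
// the same internal buffer, which is overwritten by the next word.
void eachWord(const(char)[] input,
              scope void delegate(const(char)[] word) onWord)
{
    auto buf = new char[](64); // one heap buffer, reused for every word
    size_t len = 0;
    foreach (char ch; input)
    {
        if (ch == ' ')
        {
            if (len > 0) onWord(buf[0 .. len]);
            len = 0;
        }
        else if (len < buf.length)
        {
            buf[len++] = ch; // longer words are truncated in this toy
        }
    }
    if (len > 0) onWord(buf[0 .. len]);
}

struct Collected
{
    const(char)[][] sliced; // slices into the tokenizer's buffer (wrong)
    char[][] copied;        // private copies made with .dup (right)
}

Collected collect(const(char)[] input)
{
    Collected result;
    eachWord(input, (const(char)[] word) {
        result.sliced ~= word;     // still aliases the reused buffer
        result.copied ~= word.dup; // owned by the caller from now on
    });
    return result;
}

void main()
{
    auto r = collect("alpha beta");
    // The copies survive intact.
    assert(r.copied[0] == "alpha" && r.copied[1] == "beta");
    // The first slice was clobbered when "beta" was written over it.
    assert(r.sliced[0] != "alpha");
    writeln(r.copied);
}
```

The slice kept from the first callback no longer reads "alpha" once the second word has been written into the same buffer, while the `.dup` copy is unaffected. Whether a copy should be `.dup` (mutable) or `.idup` (immutable `string`) depends on what the caller does with it afterwards.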

July 29, 2016

Re: simple sax-style xml parser

Posted by Chris
in reply to ketmar

On Wednesday, 20 July 2016 at 01:49:37 UTC, ketmar wrote:
> i wrote a simple sax-style xml parser[1][2] for my own needs, and decided to share it. it has two interfaces: `xmparse()` function which simply calls callbacks without any validation or encoding conversion, and `SaxyEx` class, which does some validation, converts content to utf-8 (from anything std.encoding supports), and calls callbacks when the given path is triggered.
>
> it can parse any `char` input range, or std.stdio.File. parsing files is probably slightly faster than parsing ranges.
>
> internally it extensively reuses the memory buffers it has allocated, so it should not put much pressure on the GC.
>
> you are expected to copy any data you need in callbacks (not just slice, but .dup!).
>
> so far i'm using it to parse fb2 files, and it parses an 8.5 megabyte utf-8 file (and creates internal reader structures, including splitting text to words and some other housekeeping) in one second on my i3 (with dmd -O, even without -inline and -release).
>
> it is not really documented, but i think it is "intuitive". there are also some comments in source code; please, read those! ;-)
>
> p.s. it decodes standard xml entities (&# and &#x probably work right only in utf-8 files, though), and understands CDATA and comments.
>
> enjoy, and happy hacking!
>
> [1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
> [2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests

Thanks. I might actually use it. I need an XML parser and wrote a very basic and incomplete one for my needs.

On Friday, 29 July 2016 at 14:47:08 UTC, Chris wrote:
> Thanks. I might actually use it. I need an XML parser and wrote a very basic and incomplete one for my needs.

great. don't forget to get the latest versions from those links. and feel free to report any bugs here, i'll try to fix them asap. ;-)