How to simply parse and print the XML with dxml?
September 09

Maybe I missed something obvious in the docs but how can I just parse the XML and print its content?

import dxml.parser;

auto xml = parseXML!simpleXML(layout);
xml.map!(e => e.text).join.writeln;

throws core.exception.AssertError@../../../.dub/packages/dxml-0.4.3/dxml/source/dxml/parser.d(1457): text cannot be called with elementStart.

September 09

On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals wrote:
> Maybe I missed something obvious in the docs but how can I just parse the XML and print its content?
>
> import dxml.parser;
>
> auto xml = parseXML!simpleXML(layout);
> xml.map!(e => e.text).join.writeln;
>
> throws core.exception.AssertError@../../../.dub/packages/dxml-0.4.3/dxml/source/dxml/parser.d(1457): text cannot be called with elementStart.

I'm not very experienced with dxml, but I once used it to read Glade files [1]. I used dxml.dom rather than dxml.parser. Hope it helps.

1: https://github.com/aferust/makegtkdclass/blob/master/source/gladeparser.d#L43
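
For readers who want to try the dxml.dom route, here is a minimal sketch, not taken from the linked file. It assumes `parseDOM!simpleXML` and the `children`, `type`, and `text` members of `DOMEntity`; the helper names `collectText` and `textViaDom` and the sample XML are made up for illustration.

```d
#! /usr/bin/env dub
/++ dub.sdl:
    dependency "dxml" version="~>0.4.0"
+/
import dxml.dom;
import dxml.parser : EntityType, simpleXML;
import std.stdio : writeln;
import std.string : strip;

// Hypothetical helper: walks a DOM subtree and appends the stripped
// text of every text node to `found`.
void collectText(E)(E node, ref string[] found)
{
    foreach (child; node.children)
    {
        if (child.type == EntityType.text)
            found ~= child.text.strip;
        else if (child.type == EntityType.elementStart)
            collectText(child, found); // only start tags have children
    }
}

// Hypothetical helper: parses the whole document into a DOM first,
// then gathers its text nodes in document order.
string[] textViaDom(string xml)
{
    string[] found;
    collectText(parseDOM!simpleXML(xml), found);
    return found;
}

void main()
{
    // Sample input invented for this sketch.
    enum layout = "<root><foo>some text<whatever/></foo><bar/>more text</root>";
    foreach (line; textViaDom(layout))
        writeln(line);
}
```

Unlike the streaming parser, this builds the whole tree in memory before anything is printed, which is usually fine for small files.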

September 09
On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals wrote:
> Maybe I missed something obvious in the docs but how can I just parse the XML and print its content?

idk how to use dxml but my dom.d makes these things trivial

http://arsd-official.dpldocs.info/arsd.dom.html

https://github.com/adamdruppe/arsd/blob/master/dom.d
https://code.dlang.org/packages/arsd-official%3Adom

if you're familiar with javascript you'll find a lot of similarities with my api there.

for strict xml mode you just use `new XmlDocument` instead of `new Document`
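
A rough sketch of what that might look like, assuming the `XmlDocument` constructor takes the XML as a string and that `root` and `innerText` behave as in the linked docs. The helper name `allText`, the sample XML, and the wildcard version in the dub header are assumptions for illustration only.

```d
#! /usr/bin/env dub
/++ dub.sdl:
    dependency "arsd-official:dom" version="*"
+/
import arsd.dom;
import std.stdio : writeln;

// Hypothetical helper: parses the input in strict XML mode and
// returns the concatenated text found under the root element.
string allText(string xml)
{
    auto document = new XmlDocument(xml);
    return document.root.innerText;
}

void main()
{
    // Sample input invented for this sketch.
    writeln(allText("<root><foo>some text</foo><bar/></root>"));
}
```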
September 09

On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals wrote:
> Maybe I missed something obvious in the docs but how can I just parse the XML and print its content?
>
> import dxml.parser;
>
> auto xml = parseXML!simpleXML(layout);
> xml.map!(e => e.text).join.writeln;
>
> throws core.exception.AssertError@../../../.dub/packages/dxml-0.4.3/dxml/source/dxml/parser.d(1457): text cannot be called with elementStart.

dxml.parser is a streaming XML parser. The documentation at http://jmdavisprog.com/docs/dxml/0.4.0/dxml_parser.html has a link to more information about this at the top, behind 'StAX'. Thus, when you're mapping over xml, you're not getting <a>some text</a> at a time, but <a>, some text, and </a> separately, as they're parsed. The <a> there is an elementStart which lacks a text, hence the error.
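
To see that separation directly, a small sketch can list the type of each entity the parser yields for the `<a>some text</a>` case. The helper name `entityTypes` is made up; the rest uses `parseXML`, `simpleXML`, and the entity's `type` as they appear in this thread.

```d
#! /usr/bin/env dub
/++ dub.sdl:
    dependency "dxml" version="~>0.4.0"
+/
import dxml.parser;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: records the EntityType name of every entity
// in the order the streaming parser produces them.
string[] entityTypes(string xml)
{
    string[] kinds;
    foreach (entity; parseXML!simpleXML(xml))
        kinds ~= entity.type.to!string;
    return kinds;
}

void main()
{
    // One element with text inside comes out as three separate entities.
    writeln(entityTypes("<a>some text</a>"));
}
```

Only the middle entity may be asked for its `text`; calling it on the other two is what trips the assertion.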

Here's a script:

#! /usr/bin/env dub
/++ dub.sdl:
    dependency "dxml" version="0.4.0"
    stringImportPaths "."
+/
import dxml.parser;
import std;

enum text = import(__FILE__)
    .splitLines
    .find("__EOF__")
    .drop(1)
    .join("\n");

void main() {
    foreach (entity; parseXML!simpleXML(text)) {
        if (entity.type == EntityType.text)
            writeln(entity.text.strip);
    }
}
__EOF__
<!-- comment -->
<root>
    <foo>some text<whatever/></foo>
    <bar/>
    <baz></baz>
    more text
</root>

that runs with this output:

some text
more text
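
The one-liner from the question can probably be kept in range style by filtering on the entity type before calling `text`. This is a sketch under that assumption; the helper name `textNodes` and the inline sample XML are invented for illustration.

```d
#! /usr/bin/env dub
/++ dub.sdl:
    dependency "dxml" version="~>0.4.0"
+/
import dxml.parser;
import std.algorithm : filter, map;
import std.array : array;
import std.stdio : writeln;
import std.string : strip;

// Hypothetical helper: keeps only text entities, so `text` is never
// called on an elementStart or elementEnd.
string[] textNodes(string xml)
{
    return parseXML!simpleXML(xml)
        .filter!(e => e.type == EntityType.text)
        .map!(e => e.text.strip)
        .array;
}

void main()
{
    // Sample input invented for this sketch.
    enum layout = "<root><foo>some text<whatever/></foo><bar/>more text</root>";
    foreach (line; textNodes(layout))
        writeln(line);
}
```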
September 09

On Thursday, 9 September 2021 at 18:40:53 UTC, jfondren wrote:
> On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals wrote:
>> [...]
>
> dxml.parser is a streaming XML parser. The documentation at http://jmdavisprog.com/docs/dxml/0.4.0/dxml_parser.html has a link to more information about this at the top, behind 'StAX'. Thus, when you're mapping over xml, you're not getting <a>some text</a> at a time, but <a>, some text, and </a> separately, as they're parsed. The <a> there is an elementStart which lacks a text, hence the error.
>
> [...]

That's a nice trick you did there

September 09

On Thursday, 9 September 2021 at 23:29:56 UTC, Imperatorn wrote:
> On Thursday, 9 September 2021 at 18:40:53 UTC, jfondren wrote:
>> On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals wrote:
>>> [...]
>>
>> dxml.parser is a streaming XML parser. The documentation at http://jmdavisprog.com/docs/dxml/0.4.0/dxml_parser.html has a link to more information about this at the top, behind 'StAX'. Thus, when you're mapping over xml, you're not getting <a>some text</a> at a time, but <a>, some text, and </a> separately, as they're parsed. The <a> there is an elementStart which lacks a text, hence the error.
>>
>> [...]
>
> That's a nice trick you did there

Something in the quoted text?

Or if you mean the self-string-importing script that uses the content after __EOF__, yeah, that's in imitation of Perl's __DATA__: https://perldoc.perl.org/perldata#Special-Literals

September 10

On Thursday, 9 September 2021 at 18:40:53 UTC, jfondren wrote:
> On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals wrote:
>> [...]
>
> dxml.parser is a streaming XML parser. The documentation at http://jmdavisprog.com/docs/dxml/0.4.0/dxml_parser.html has a link to more information about this at the top, behind 'StAX'. Thus, when you're mapping over xml, you're not getting <a>some text</a> at a time, but <a>, some text, and </a> separately, as they're parsed. The <a> there is an elementStart which lacks a text, hence the error.
>
> Here's a script:
>
> #! /usr/bin/env dub
> /++ dub.sdl:
>     dependency "dxml" version="0.4.0"
>     stringImportPaths "."
> +/
> import dxml.parser;
> import std;
>
> enum text = import(__FILE__)
>     .splitLines
>     .find("__EOF__")
>     .drop(1)
>     .join("\n");
>
> void main() {
>     foreach (entity; parseXML!simpleXML(text)) {
>         if (entity.type == EntityType.text)
>             writeln(entity.text.strip);
>     }
> }
> __EOF__
> <!-- comment -->
> <root>
>     <foo>some text<whatever/></foo>
>     <bar/>
>     <baz></baz>
>     more text
> </root>
>
> that runs with this output:
>
> some text
> more text

Ok, that makes sense now. Thank you.

As for dxml, I believe a small quick-start example would be very beneficial for newcomers, especially people like me who are not aware of the different XML parser types and just need to extract text from an XML file.

September 10

On Thursday, 9 September 2021 at 23:42:42 UTC, jfondren wrote:
> On Thursday, 9 September 2021 at 23:29:56 UTC, Imperatorn wrote:
>> On Thursday, 9 September 2021 at 18:40:53 UTC, jfondren wrote:
>>> [...]
>>
>> That's a nice trick you did there
>
> Something in the quoted text?
>
> Or if you mean the self-string-importing script that uses the content after __EOF__, yeah, that's in imitation of Perl's __DATA__: https://perldoc.perl.org/perldata#Special-Literals

Yeah, the import thing

September 11

On Friday, 10 September 2021 at 07:50:29 UTC, tastyminerals wrote:
> As for dxml, I believe a small quick-start example would be very beneficial for newcomers, especially people like me who are not aware of the different XML parser types and just need to extract text from an XML file.

Submit a request:

https://github.com/jmdavis/dxml/issues

I don't know how active Jonathan is these days, but it won't get implemented at all if no one requests it.