Posted by tastyminerals in reply to jfondren
On Thursday, 9 September 2021 at 18:40:53 UTC, jfondren wrote:
> On Thursday, 9 September 2021 at 17:17:23 UTC, tastyminerals wrote:
> [...]
dxml.parser is a streaming XML parser. The documentation at http://jmdavisprog.com/docs/dxml/0.4.0/dxml_parser.html has a link to more information about this at the top, behind 'StAX'. Thus, when you're mapping over xml, you're not getting <a>some text</a> at a time, but <a>, some text, and </a> separately, as they're parsed. The <a> there is an elementStart, which lacks a text, hence the error.
Here's a script:
#! /usr/bin/env dub
/++ dub.sdl:
dependency "dxml" version="0.4.0"
stringImportPaths "."
+/
import dxml.parser;
import std;
enum text = import(__FILE__)
    .splitLines
    .find("__EOF__")
    .drop(1)
    .join("\n");

void main() {
    foreach (entity; parseXML!simpleXML(text)) {
        if (entity.type == EntityType.text)
            writeln(entity.text.strip);
    }
}
__EOF__
<!-- comment -->
<root>
<foo>some text<whatever/></foo>
<bar/>
<baz></baz>
more text
</root>
that runs with this output:
some text
more text
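To make the "separately, as they're parsed" point concrete, here is a minimal sketch that lists every entity the parser yields instead of filtering for text. It assumes the same dxml 0.4.0 setup as the script above. `describeEntities` is a hypothetical helper name, not part of dxml. It reads `entity.name` only on start/end tags and `entity.text` only on text entities, since asking an entity for a property its type lacks is what triggers the error described above.

```d
#! /usr/bin/env dub
/++ dub.sdl:
dependency "dxml" version="0.4.0"
+/
import dxml.parser;
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper (not part of dxml): one description per parsed entity.
string[] describeEntities(string xml) {
    string[] lines;
    foreach (entity; parseXML!simpleXML(xml)) {
        switch (entity.type) {
            // Start and end tags carry a name but no text.
            case EntityType.elementStart:
            case EntityType.elementEnd:
                lines ~= format("%s %s", entity.type, entity.name);
                break;
            // Text entities carry text but no name.
            case EntityType.text:
                lines ~= format("%s %s", entity.type, entity.text);
                break;
            // Anything else (e.g. cdata): report only the type.
            default:
                lines ~= entity.type.to!string;
                break;
        }
    }
    return lines;
}

void main() {
    foreach (line; describeEntities("<a>some text</a>"))
        writeln(line);
}
```

Run on `<a>some text</a>`, this should print one line per entity: the start tag, the text, and the end tag, in that order.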
Ok, that makes sense now. Thank you.
As for dxml, I believe a small quick-start example would be very beneficial for newcomers, especially people like me who are not aware of the XML parser types and just need to extract text from an XML file.