Thread overview
Parsing with dxml
Nov 18, 2019
Joel
Nov 18, 2019
Joel
Nov 19, 2019
Jonathan M Davis
Nov 19, 2019
Joel
Nov 19, 2019
Joel
Nov 19, 2019
Kagamin
Nov 20, 2019
Joel
Nov 20, 2019
Joel
November 18, 2019
I can only parse one row successfully. I tried increasing the number of popFront calls until it said I'd gone off the end.

Running ./app
core.exception.AssertError@../../../../.dub/packages/dxml-0.4.1/dxml/source/dxml/parser.d(1457): text cannot be called with elementEnd
----------------
??:? _d_assert_msg [0x104b3981a]
../../JMiscLib/source/jmisc/base.d:161 pure @property @safe immutable(char)[] dxml.parser.EntityRange!(dxml.parser.Config(1, 1, 1, 1), immutable(char)[]).EntityRange.Entity.text() [0x104b2297b]
source/app.d:26 _Dmain [0x104aeb46e]
Program exited with code 1

```
<?xml version="1.0"?>

<resultset statement="SELECT * FROM bible.t_asv
" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
	<field name="id">01001001</field>
	<field name="b">1</field>
	<field name="c">1</field>
	<field name="v">1</field>
	<field name="t">In the beginning God created the heavens and the earth.</field>
  </row>

  <row>
	<field name="id">01001002</field>
	<field name="b">1</field>
	<field name="c">1</field>
	<field name="v">2</field>
	<field name="t">And the earth was waste and void; and darkness was upon the face of the deep: and the Spirit of God moved upon the face of the waters.</field>
  </row>

```

```d
void main() {
    import std.stdio;
    import std.file : readText;
    import dxml.parser;
    import std.conv : to;

    struct Verse {
        string id;
        int b, c, v;
        string t;
    }

    auto range = parseXML!simpleXML(readText("xmltest.xml"));

    // simpleXML skips comments

    void pops(int c) {
        foreach(_; 0 .. c)
            range.popFront();
    }
    pops(3);

    Verse[] vers;
    foreach(_; 0 .. 2) {
        Verse ver;
        ver.id = range.front.text;
        pops(3);
        ver.b = range.front.text.to!int;
        pops(3);
        ver.c = range.front.text.to!int;
        pops(3);
        ver.v = range.front.text.to!int;
        pops(3);
        ver.t = range.front.text;

        with(ver)
            vers ~= Verse(id,b,c,v,t);

        pops(2);
    }
    foreach(verse; vers) with(verse)
        writeln(id, " Book: ", b, " ", c, ":", v, " -> ", t);
}
```

November 18, 2019
On Monday, 18 November 2019 at 06:44:43 UTC, Joel wrote:
>         with(ver)
>             vers ~= Verse(id,b,c,v,t);
>

Or, vers ~= ver;
November 18, 2019
On Sunday, November 17, 2019 11:44:43 PM MST Joel via Digitalmars-d-learn wrote:
> [...]

You need to be checking the type of the entity before you call either name or text on it, because not all entities have a name, and not all entities have text - e.g. <field name="id"> is an EntityType.elementStart, so it has a name (which is "field"), but it doesn't have text, whereas the 01001001 between the <field name="id"> and </field> tags has no name but does have text, because it's an EntityType.text. If you call name or text without verifying the type first, then you're almost certainly going to get an assertion failure at some point (assuming that you don't compile with -release anyway), since you're bound to end up with an entity that you don't expect at some point (either because you were wrong about where you were in the document, or because the document didn't match the layout that was expected).
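Here is a minimal sketch of that advice. It assumes dxml ~>0.4.1 (the version in Joel's stack trace) and uses dub's single-file package header so it can be run with `dub run --single`. The function name `describeEntities` is hypothetical. The function switches on `entity.type` before touching anything else: it reads `name` only from start and end tags and `text` only from text entities. Numbering the output also makes it easy to count how many `popFront` calls separate two positions in the document.

```d
/+ dub.sdl:
    name "entitydump"
    dependency "dxml" version="~>0.4.1"
+/
import std.format : format;
import std.stdio : writeln;
import dxml.parser;

// Describe each entity, reading name only from tags and text only from text entities.
string[] describeEntities(string xml)
{
    string[] result;
    foreach (entity; parseXML!simpleXML(xml))
    {
        switch (entity.type)
        {
            case EntityType.elementStart, EntityType.elementEnd:
                result ~= format("%s %s", entity.type, entity.name);
                break;
            case EntityType.text:
                result ~= format("%s %s", entity.type, entity.text);
                break;
            default:
                // Other entity types are only reported by type in this sketch.
                result ~= format("%s", entity.type);
                break;
        }
    }
    return result;
}

void main()
{
    auto xml = `<resultset>
  <row>
    <field name="id">01001001</field>
    <field name="b">1</field>
  </row>
</resultset>`;
    // The index of each entity shows how many popFront calls apart two positions are.
    foreach (i, line; describeEntities(xml))
        writeln(i, ": ", line);
}
```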

Per the assertion's message, you managed to call text on an EntityType.elementEnd, and per the stack trace, text was called on this line

         ver.id = range.front.text;

If I add

         if(range.front.type == EntityType.elementEnd)
         {
             writeln(range.front.name);
             writeln(range.front.pos);
         }

right above that, I get

row
TextPos(11, 4)

indicating that the end tag was </row> and that it was on line 11, 4 code units in (and since this is ASCII, that would be 4 characters). So, you managed to parse all of the <field>***</field> lines but didn't correctly deal with the end of that section.

If I add

    writeln(range.front);

right before

    pops(2);

then I get:

Entity(text, TextPos(10, 25), , Text!(ByCodeUnitImpl)(In the beginning God
created the heavens and the earth., TextPos(10, 25)))

So, prior to popping twice, it's on the text between <field name="t"> and </field>, which looks like it's what you intended. If you look at the XML after that, it should be clear why you're in the wrong place afterwards.

Since at that point range.front is on the EntityType.text between <field name="t"> and </field>, popping once makes range.front </field>, and popping a second time makes range.front </row>, which is where the range is when it then tries to call text at the top of the loop.
Presumably, you want it to be on the EntityType.text in

        <field name="id">01001002</field>

To get there from </row>, you'd have to pop once to get to <row>, a second time to get to <field>, and a third time to get to 01001002. So, if you had

        pops(5);

instead of

        pops(2);

the range would be at the correct place at the top of the loop - though it would then be the wrong number of times to pop the second time around. With the text as provided, it would throw an XMLParsingException when it reached the end of the loop the second time, because the XML document doesn't have the matching </resultset> tag, and with that fixed, you end up with an assertion failure, because popFront was called on an empty range (since there aren't 7 elements left in the range at that point):

core.exception.AssertError@../../.dub/packages/dxml-0.4.0/dxml/source/dxml/parser.d(1746): It's illegal to call popFront() on an empty EntityRange.

So, you'd need to adjust the end of the loop so that it only pops what it needs to pop on the second iteration. If you don't care about any data after that point, you could just make it not pop on the last iteration. A better approach would probably be to write the loop so that it expects to start on <row> and exits if it's instead on an end tag (since that would indicate the end of that section; in this case, it would mean that it was on the last entity in the document).
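A sketch of that loop shape follows. It assumes dxml ~>0.4.1 and a document that includes the closing </resultset> tag. The names `parseVerses` and `atStart` are hypothetical, and the field columns are matched by their `name` attribute instead of by position. Each pass of the outer loop starts on <row>, and the inner loop consumes `<field>` elements until it reaches </row>. After the last row, the range lands on </resultset> (an end tag), so the loop stops without counting pops. Unexpected structure is reported with `enforce` and does not reach one of dxml's assertions.

```d
/+ dub.sdl:
    name "rowloop"
    dependency "dxml" version="~>0.4.1"
+/
import std.conv : to;
import std.exception : enforce;
import std.stdio : writeln;
import dxml.parser;

struct Verse
{
    string id;
    int b, c, v;
    string t;
}

// Hypothetical helper: true if the current entity is a start tag with the given name.
bool atStart(R)(ref R range, string name)
{
    return range.front.type == EntityType.elementStart && range.front.name == name;
}

Verse[] parseVerses(string xml)
{
    auto range = parseXML!simpleXML(xml);
    enforce(range.atStart("resultset"), "expected <resultset>");
    range.popFront();

    Verse[] verses;
    // Each pass starts on <row>; an end tag here can only be </resultset>.
    while (range.front.type != EntityType.elementEnd)
    {
        enforce(range.atStart("row"), "expected <row>");
        range.popFront();

        Verse verse;
        // Consume <field name="...">value</field> until </row>.
        while (range.front.type != EntityType.elementEnd)
        {
            enforce(range.atStart("field"), "expected <field>");
            string column;
            foreach (attr; range.front.attributes)
                if (attr.name == "name")
                    column = attr.value;
            range.popFront();

            enforce(range.front.type == EntityType.text, "expected text in <field>");
            auto value = range.front.text;
            range.popFront();
            enforce(range.front.type == EntityType.elementEnd, "expected </field>");
            range.popFront(); // now on the next <field> or on </row>

            switch (column)
            {
                case "id": verse.id = value; break;
                case "b": verse.b = value.to!int; break;
                case "c": verse.c = value.to!int; break;
                case "v": verse.v = value.to!int; break;
                case "t": verse.t = value; break;
                default: break; // ignore columns this struct doesn't have
            }
        }
        range.popFront(); // step past </row>
        verses ~= verse;
    }
    return verses;
}

void main()
{
    auto verses = parseVerses(`<?xml version="1.0"?>
<resultset>
  <row>
    <field name="id">01001001</field>
    <field name="b">1</field>
    <field name="c">1</field>
    <field name="v">1</field>
    <field name="t">In the beginning God created the heavens and the earth.</field>
  </row>
</resultset>`);
    foreach (verse; verses) with (verse)
        writeln(id, " Book: ", b, " ", c, ":", v, " -> ", t);
}
```

To use it on a file, pass it `readText("xmltest.xml")` after adding the missing </resultset>. Without that tag, the popFront after the last </row> should throw an XMLParsingException, which is the behaviour described above.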

Regardless, if you're actually looking to parse a document like this in production code instead of in something that's just thrown together to get something done, you'd actually need to be checking the EntityType of each entity to make sure that it was what was expected so that you can provide an error to the user when the document is malformed. dxml expects that you will only ever call a property of an EntityRange.Entity which is valid for that EntityType, and it asserts that it's not called on the wrong type. So, if you don't check the EntityType, unless you can guarantee that the XML document is as expected, you're going to get assertion failures when not compiling with -release, and you'll get weird results when the assertions are compiled out with -release.
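One way to produce a user-facing error is sketched below. It assumes dxml ~>0.4.1, and `expectType` and `readField` are hypothetical helpers. Before any property is read, the current entity's type is checked. On a mismatch, the code throws an ordinary Exception that includes the entity's `pos` (line and column), so the check still runs when assertions are compiled out with -release. dxml's own XMLParsingException also derives from Exception, so a single `catch` handles both kinds of malformed input.

```d
/+ dub.sdl:
    name "expecttype"
    dependency "dxml" version="~>0.4.1"
+/
import std.exception : enforce;
import std.format : format;
import std.stdio : writeln;
import dxml.parser;

// Hypothetical helper: throw a descriptive Exception (instead of relying on
// dxml's assertions) when the current entity is not of the expected type.
void expectType(R)(ref R range, EntityType expected)
{
    enforce(!range.empty,
            format("expected %s but reached the end of the document", expected));
    auto pos = range.front.pos;
    enforce(range.front.type == expected,
            format("line %s, column %s: expected %s but found %s",
                   pos.line, pos.col, expected, range.front.type));
}

// Read one <field ...>value</field>, verifying each entity before touching .text.
string readField(R)(ref R range)
{
    expectType(range, EntityType.elementStart);
    range.popFront();
    expectType(range, EntityType.text);
    auto value = range.front.text;
    range.popFront();
    expectType(range, EntityType.elementEnd);
    range.popFront();
    return value;
}

void main()
{
    // The second field is empty, so there is no text entity where one is expected.
    auto range = parseXML!simpleXML(
        `<row><field name="id">01001001</field><field name="b"></field></row>`);
    range.popFront(); // step past <row>
    try
    {
        writeln(readField(range));
        writeln(readField(range));
    }
    catch (Exception e) // XMLParsingException is also an Exception
    {
        writeln("malformed document: ", e.msg);
    }
}
```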

On an unrelated note, std.range.primitives.popFrontN (or std.range.popFrontN, since std.range publicly imports std.range.primitives) does what your pops function does - and it does it more efficiently for ranges which have slicing (which dxml's EntityRange doesn't, but either way, you can just use the function from Phobos instead of writing your own).
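The sketch below replaces `pops` with `popFrontN`, assuming dxml ~>0.4.1. One difference matters here. `popFrontN` checks `empty` before each pop and returns how many elements it actually popped, so running past the end does not trip the "popFront() on an empty EntityRange" assertion. The unchecked variant in Phobos is `popFrontExactly`.

```d
/+ dub.sdl:
    name "popfrontn"
    dependency "dxml" version="~>0.4.1"
+/
import std.range.primitives : popFrontN;
import std.stdio : writeln;
import dxml.parser;

void main()
{
    auto range = parseXML!simpleXML(
        `<resultset><row><field name="id">01001001</field></row></resultset>`);

    // Same effect as pops(3): <resultset> -> <row> -> <field> -> text.
    immutable popped = range.popFrontN(3);
    writeln(popped, " ", range.front.text);

    // Unlike pops, popFrontN stops at the end of the range and reports how far it got.
    writeln(range.popFrontN(100), " ", range.empty);
}
```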

- Jonathan M Davis



November 19, 2019
On Tuesday, 19 November 2019 at 02:45:29 UTC, Jonathan M Davis wrote:
> On Sunday, November 17, 2019 11:44:43 PM MST Joel via Digitalmars-d-learn wrote:
>> [...]
>
> You need to be checking the type of the entity before you call either name or text on it, because not all entities have a name, and not all entities have text - e.g. <field name="id"> is an EntityType.elementStart, so it has a name (which is "field"), but it doesn't have text, whereas the 01001001 between the <field name="id"> and </field> tags has no name but does have text, because it's an EntityType.text. If you call name or text without verifying the type first, then you're almost certainly going to get an assertion failure at some point (assuming that you don't compile with -release anyway), since you're bound to end up with an entity that you don't expect at some point (either because you were wrong about where you were in the document, or because the document didn't match the layout that was expected).
>
> [...]

Thanks for taking the time to reply.

I have had another xml Bible version text in the past [1]. It had a different format. And Adam Ruppe helped me by writing code that worked (with just one tweak). I think I want another example that I can just paste into my program, using the same structs as the last xml version (see link).

[1] https://forum.dlang.org/thread/j7ljs5$24r2$1@digitalmars.com
November 19, 2019
On Tuesday, 19 November 2019 at 04:43:31 UTC, Joel wrote:
> On Tuesday, 19 November 2019 at 02:45:29 UTC, Jonathan M Davis wrote:
>> [...]
>
> Thanks for taking the time to reply.
>
> I have had another xml Bible version text in the past [1]. It had a different format. And Adam Ruppe helped me by writing code that worked (with just one tweak). I think I want another example that I can just paste into my program, using the same structs as the last xml version (see link).
>
> [1] https://forum.dlang.org/thread/j7ljs5$24r2$1@digitalmars.com

-classes (not structs)
November 19, 2019
On Monday, 18 November 2019 at 06:44:43 UTC, Joel wrote:
> ```
> <?xml version="1.0"?>
>
> <resultset statement="SELECT * FROM bible.t_asv
> " xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>   </row>
>
> ```

You're missing a closing tag.
November 20, 2019
On Tuesday, 19 November 2019 at 14:20:39 UTC, Kagamin wrote:
> On Monday, 18 November 2019 at 06:44:43 UTC, Joel wrote:
>> ```
>> <?xml version="1.0"?>
>>
>> <resultset statement="SELECT * FROM bible.t_asv
>> " xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>   </row>
>>
>> ```
>
> You're missing a closing tag.

I can store the ASV Bible in an array (I check for the last book, chapter, and verse number instead of a closing tag). But I haven't figured out how to get it into the class setup I've got.
November 20, 2019
On Wednesday, 20 November 2019 at 00:07:53 UTC, Joel wrote:
> On Tuesday, 19 November 2019 at 14:20:39 UTC, Kagamin wrote:
>> On Monday, 18 November 2019 at 06:44:43 UTC, Joel wrote:
>>> ```
>>> <?xml version="1.0"?>
>>>
>>> <resultset statement="SELECT * FROM bible.t_asv
>>> " xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>>>   </row>
>>>
>>> ```
>>
>> You're missing a closing tag.
>
> I can store the ASV Bible in an array (I check for the last book, chapter, and verse number instead of a closing tag). But I haven't figured out how to get it into the class setup I've got.

Ok, got it working. I didn't use any XML tools, though; I just split the XML file into lines and went from there. I used my trace function in a mixin to trace what was happening. It comes from simple code I reuse in my programs, and it shows the variable and its value without having to write the variable name twice.

```d
	g_bible = new Bible;

	int b, c, v;
	size_t j;
	// One pass per book; the labeled break leaves all three loops after the last verse.
	break0: do {
		b = verses[j].b;
		g_bible.m_books ~= new Book(bookNames[b-1]);
		version(asvtrace)
			mixin(trace("g_bible.m_books[$-1].m_bookTitle"));
		// One pass per chapter of the current book.
		do {
			c = verses[j].c;
			g_bible.m_books[$-1].m_chapters ~= new Chapter(c.to!string);
			version(asvtrace)
				mixin(trace("j g_bible.m_books[$-1].m_chapters[$-1].m_chapterTitle".split));
			// One pass per verse of the current chapter.
			do {
				v = verses[j].v;
				g_bible.m_books[$-1].m_chapters[$-1].m_verses ~= new Verse(v.to!string);
				g_bible.m_books[$-1].m_chapters[$-1].m_verses[$-1].verse = verses[j].t;
				version(asvtrace)
					mixin(trace(("j g_bible.m_books[$-1].m_chapters[$-1].m_verses[$-1].m_verseTitle" ~
						" g_bible.m_books[$-1].m_chapters[$-1].m_verses[$-1].verse").split));
				j += 1;
				if (j == verses.length)
					break break0;
			} while(verses[j].v != 1); // verse 1 starts a new chapter
		} while(verses[j].c != 1); // chapter 1 starts a new book (verses[j] is always in bounds here)
	} while(true);
```
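As an alternative to the nested do/while loops, which detect a new chapter or book by watching for verse 1 and chapter 1, consecutive verses can be grouped with Phobos' `std.algorithm.iteration.chunkBy`. This is only a sketch. `FlatVerse`, `BookNode`, `ChapterNode`, `VerseNode` and `buildBooks` are hypothetical stand-ins, because the actual Bible/Book/Chapter/Verse classes from the linked thread aren't shown here. Like the loops above, it assumes the verses are already in document order (book, then chapter, then verse).

```d
import std.algorithm.iteration : chunkBy;
import std.conv : to;
import std.stdio : writeln;

// Flat record, as produced by parsing the rows (same fields as the Verse struct above).
struct FlatVerse
{
    string id;
    int b, c, v;
    string t;
}

// Hypothetical stand-ins for the thread's Bible/Book/Chapter/Verse classes.
class VerseNode
{
    string title, text;
    this(string title, string text) { this.title = title; this.text = text; }
}

class ChapterNode
{
    string title;
    VerseNode[] verses;
    this(string title) { this.title = title; }
}

class BookNode
{
    string title;
    ChapterNode[] chapters;
    this(string title) { this.title = title; }
}

// Group runs of equal book numbers, then runs of equal chapter numbers within each book.
BookNode[] buildBooks(FlatVerse[] verses, string[] bookNames)
{
    BookNode[] books;
    foreach (bookGroup; verses.chunkBy!((x, y) => x.b == y.b))
    {
        auto book = new BookNode(bookNames[bookGroup.front.b - 1]);
        foreach (chapterGroup; bookGroup.chunkBy!((x, y) => x.c == y.c))
        {
            auto chapter = new ChapterNode(chapterGroup.front.c.to!string);
            foreach (fv; chapterGroup)
                chapter.verses ~= new VerseNode(fv.v.to!string, fv.t);
            book.chapters ~= chapter;
        }
        books ~= book;
    }
    return books;
}

void main()
{
    auto verses = [
        FlatVerse("01001001", 1, 1, 1, "In the beginning God created the heavens and the earth."),
        FlatVerse("01001002", 1, 1, 2, "And the earth was waste and void; and darkness was upon the face of the deep: and the Spirit of God moved upon the face of the waters."),
    ];
    foreach (book; buildBooks(verses, ["Genesis"]))
        foreach (chapter; book.chapters)
            foreach (verse; chapter.verses)
                writeln(book.title, " ", chapter.title, ":", verse.title, " ", verse.text);
}
```

Because each group ends where the book or chapter number changes, this version doesn't depend on the first verse or chapter being numbered 1. It also never looks ahead past the end of the array.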