Thread overview
XML Parsing
Mar 20, 2012: Chris Pons
Mar 20, 2012: Adam D. Ruppe
Mar 20, 2012: Chris Pons
May 18, 2012: Iain
May 18, 2012: Adam D. Ruppe
May 18, 2012: Iain
May 19, 2012: Iain
May 19, 2012: Adam D. Ruppe
March 20, 2012
Hey Guys,
I am trying to parse an XML document with std.xml. I've looked over the std.xml reference as well as the example, but I'm still stuck. I've also looked over some example code, but it's a bit confusing and doesn't entirely explain what I'm doing wrong.

As far as I understand it, I should load a file with read in std.file and save that into a string. From there, I check to make sure the string xmlData is in a proper xml format.

This is where it gets a bit confusing. I followed the example, created a new instance of the DocumentParser class, and then tried to read an attribute from the start tag of map. The value I'm targeting right now is the width of the map in tiles, which I want to save into an integer. However, the value I get is 0.

Any help would be MUCH appreciated.

Here is a reference to the XML file: http://pastebin.com/tpUU1Wtv


//These two functions are called in my main loop.
	void LoadMap(string filename)
	{
		enforce( filename != "" , "Filename is invalid!" );

		xmlData = cast(string) read(filename);

		enforce( xmlData != "", "Read file Failed!" );

		debug StopWatch sw = StopWatch(AutoStart.yes);
		check(xmlData);
		debug writeln( "Verified XML in ", sw.peek.msecs, "ms.");		
	}
	
	void ParseMap()
	{
		auto xml = new DocumentParser(xmlData);

		xml.onStartTag["map"] = (ElementParser xml)
		{
			mapWidth = to!int(xml.tag.attr["width"]);
			xml.parse();
		};
		xml.parse();
		writeln("Map Width: ", mapWidth);
	}
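
One possible explanation for the 0, offered as a sketch rather than a confirmed diagnosis: std.xml's DocumentParser consumes the root start tag in its constructor and exposes it as xml.tag, while onStartTag handlers only fire for elements nested inside the root. If map is the root element of the file (an assumption, since the linked file is not reproduced here), the "map" handler would never run. The helper name rootWidth and the sample string below are made up for illustration, and the code assumes a Phobos release that still ships std.xml, as the 2012-era ones did.

```d
import std.conv : to;
import std.stdio : writeln;
import std.xml; // assumed available, as in 2012-era Phobos releases

// Hypothetical helper: reads the "width" attribute of the root element.
int rootWidth(string xmlData)
{
    check(xmlData); // throws CheckException if the text is not well-formed

    auto xml = new DocumentParser(xmlData);

    // The constructor has already parsed the root start tag, so its
    // attributes can be read directly, without registering a handler.
    return to!int(xml.tag.attr["width"]);
}

void main()
{
    // Stand-in for the real file, assuming <map> is the root element.
    enum sample = `<map width="25" height="19"><layer name="ground"/></map>`;
    writeln("Map Width: ", rootWidth(sample));
}
```

Handlers registered through onStartTag would still be the place to pick up nested elements such as layers or tiles.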
March 20, 2012
I know very little about std.xml (I looked at it and
said 'meh' and wrote my own lib), but my lib
makes this pretty simple.

https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

grab dom.d and characterencodings.d

This has a bit of an html bias, but it works for xml too.

===
import arsd.dom;
import std.file;
import std.stdio;
import std.conv;

void main() {
	auto document = new Document(readText("test12.xml"), true, true);

	auto map = document.requireSelector("map");

	writeln(to!int(map.width), "x", to!int(map.height));

	foreach(tile; document.getElementsByTagName("tile"))
		writeln(tile.gid);
}
===

$ dmd test12.d dom.d characterencodings.d
$ test12
25x19
<snip tile data>





Let me explain the lines:

	auto document = new Document(readText("test12.xml"), true, true);

We use std.file.readText to read the file as a string. Document's
constructor is: (string data, bool caseSensitive, bool strictMode).

So, "true, true" means it will act like an XML parser, instead of
trying to correct for html tag soup.


Now, document is a DOM, like you see in W3C or web browsers
(via javascript), though it is expanded with a lot of convenience
and sugar.

	auto map = document.requireSelector("map");

querySelector and requireSelector use CSS selector syntax
to fetch one element. querySelector may return null, whereas
requireSelector will throw an exception if the element is not
found.

You can learn more about CSS selector syntax on the web. I tried
to cover a good chunk of the standard, including most css2 and some
css3.

Here, I'm asking for the first element with tag name "map".


You can also use querySelectorAll to get all the elements that
match, returned as an array, which is great for looping.
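
A small sketch of how the three selector functions differ, assuming dom.d and characterencodings.d are compiled in alongside it. The sample XML, the "objectgroup" tag used as a deliberately missing element, and the helper countMatches are all invented for illustration.

```d
import arsd.dom;
import std.stdio : writeln;

// Made-up sample data standing in for the real map file.
enum sampleXml =
    `<map width="25" height="19"><layer><tile gid="1"/><tile gid="2"/></layer></map>`;

// Hypothetical helper: counts the elements matching a CSS selector.
size_t countMatches(string xml, string selector)
{
    auto document = new Document(xml, true, true); // case sensitive, strict
    return document.querySelectorAll(selector).length;
}

void main()
{
    auto document = new Document(sampleXml, true, true);

    // querySelector: null when nothing matches, so test before use.
    auto missing = document.querySelector("objectgroup");
    writeln("objectgroup found: ", missing !is null);

    // requireSelector: throws instead of returning null.
    try
    {
        document.requireSelector("objectgroup");
    }
    catch (ElementNotFoundException e)
    {
        writeln("requireSelector threw for a missing element");
    }

    // querySelectorAll: every match, as an array that can be looped over.
    foreach (tile; document.querySelectorAll("tile"))
        writeln("tile gid=", tile.getAttribute("gid"));
}
```

requireSelector suits elements the rest of the program cannot work without; querySelector suits optional ones.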

	writeln(to!int(map.width), "x", to!int(map.height));


The attributes on an element are exposed via dot syntax,
or you can use element.getAttribute("name") if you
prefer.

They are returned as strings. Using std.conv.to, we can
easily convert them to integers.
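
Since the values arrive as strings, the conversion can fail on a missing or malformed attribute. A minimal sketch of a defensive wrapper around std.conv.to follows; the helper name attrToInt and its fallback parameter are invented here and are not part of the library.

```d
import std.conv : ConvException, to;
import std.stdio : writeln;

// Hypothetical helper: converts an attribute string to int, returning
// a fallback when the text is missing or is not a valid number.
int attrToInt(string value, int fallback = 0)
{
    if (value.length == 0)
        return fallback; // missing attribute (null or empty string)

    try
    {
        return to!int(value);
    }
    catch (ConvException)
    {
        return fallback; // e.g. "wide" instead of "25"
    }
}

void main()
{
    writeln(attrToInt("25"));       // 25
    writeln(attrToInt("wide", -1)); // -1
    writeln(attrToInt(null, -1));   // -1
}
```

Whether a silent fallback or a thrown exception is better depends on how much the rest of the program trusts the file.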


	foreach(tile; document.getElementsByTagName("tile"))
		writeln(tile.gid);

And finally, we get all the tile tags in the document and
print out their gid attribute.

Note that you can also call the element search functions
on individual elements. In that case the search only covers
that element and its children.



Here, you didn't need it, but you can also use
element.innerText to get the text inside a tag,
pretty much covering basic data retrieval.
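
To illustrate the last two points together, here is a sketch that searches from an individual element and then reads innerText. The sample XML (layers holding a data child) and the helper layerData are made up for this example; dom.d and characterencodings.d are assumed to be compiled in.

```d
import arsd.dom;
import std.stdio : writeln;

// Made-up sample data: each layer carries its contents as text.
enum sampleXml =
    `<map><layer name="ground"><data>1,2</data></layer>` ~
    `<layer name="walls"><data>3,4</data></layer></map>`;

// Hypothetical helper: collects the text inside each layer's <data> tag.
string[] layerData(string xml)
{
    auto document = new Document(xml, true, true); // case sensitive, strict
    string[] result;

    foreach (layer; document.getElementsByTagName("layer"))
    {
        // Searching from `layer` only looks inside that one element.
        auto data = layer.querySelector("data");
        if (data !is null)
            result ~= data.innerText;
    }
    return result;
}

void main()
{
    foreach (text; layerData(sampleXml))
        writeln(text);
}
```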




Note: my library is not good at handling huge files;
it eats a good chunk of memory and loads the whole
document at once. But, it is the easiest way I've
seen (I'm biased though) to work with xml files,
so I like it.
March 20, 2012
Thank you. I'll check it out.


May 18, 2012
Hi Adam,

I'm also interested in your solution, as the std.xml page is so sparsely documented I can't make head nor tail of it.  Also, neither of the examples compile for me, making life that little bit harder!

Sadly, I can't get your code working either!  I have downloaded the folder zip from your github link, and extracted it so that all the .d files are living in C:\D\dmd2\src\phobos\arsd\

If I try to compile the code you gave above, I get a pile of linking errors using D 2.059:

C:\D\dmd2\windows\bin\dmd.exe parseSpain -O
OPTLINK (R) for Win32  Release 8.00.12
Copyright (C) Digital Mars 1989-2010  All rights reserved.
http://www.digitalmars.com/ctg/optlink.html
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom12__ModuleInfoZ
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom8__assertFiZv
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom24ElementNotFoundException7__ClassZ
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom24ElementNotFoundException6__ctorMFAyaAya
AyaiZC4arsd3dom24ElementNotFoundException
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom8Document6__ctorMFAyabbZC4arsd3dom8Docume
nt
parseSpain.obj(parseSpain)
 Error 42: Symbol Undefined _D4arsd3dom8Document7__ClassZ
--- errorlevel 6


Do you have any idea what's going on?!
May 18, 2012
On Friday, 18 May 2012 at 23:08:59 UTC, Iain wrote:
> If I try to compile the code you gave above, I get a pile of linking errors using D 2.059:

You have to link in the modules too on the command line:

dmd.exe parseSpain arsd/dom.d arsd/characterencodings.d

(or whatever the full path to the modules is)

May 18, 2012
On Friday, 18 May 2012 at 23:16:26 UTC, Adam D. Ruppe wrote:
> On Friday, 18 May 2012 at 23:08:59 UTC, Iain wrote:
>> If I try to compile the code you gave above, I get a pile of linking errors using D 2.059:
>
> You have to link in the modules too on the command line
>
> dmd.exe parseSpain arsd/dom.d arsd/characterencodings.d
>
> (or whatever the full path to the modules is)

Aah thank you!  Finally, an XML parser that works in D!!!
May 19, 2012
On Friday, 18 May 2012 at 23:31:05 UTC, Iain wrote:
> Aah thank you!  Finally, an XML parser that works in D!!!

Adam, thanks for this!  I guess you don't need much documentation for your code, as you can just look up the wealth of tutorials that have been written for Javascript's XML parser.

I have re-jigged one of std.xml's examples as follows - and it works!

If there were a vote (and there probably should be) I would suggest your code ought to replace std.xml.  How can D be taken seriously when it has major parts of the standard library broken?



/*
 *  Read all the titles from book.xml.
 *
 *  Uses dom.d and characterencodings.d by Adam D. Ruppe:
 *  https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff
 */

import arsd.dom;
import std.file;
import std.stdio;

void main()
{
	// http://msdn2.microsoft.com/en-us/library/ms762271(VS.85).aspx
	auto document = new Document(readText("book.xml"), true, true);

	// Throws if there is no <catalog> element; the result is not used further.
	auto catalog = document.requireSelector("catalog");

	foreach (book; document.getElementsByTagName("book"))
	{
		// Assumes every <book> has at least one <title> element.
		string title = book.getElementsByTagName("title")[0].innerText();

		writeln(title);
	}
}





May 19, 2012
On Saturday, 19 May 2012 at 00:00:50 UTC, Iain wrote:
> I guess you don't need much documentation for your code, as
> you can just look up the wealth of tutorials that have been written for Javascript's XML parser.

Yeah, that's basically how I feel about it. I started writing
some documentation but haven't gotten around to finishing it
yet.

But, if you know Javascript, you can probably get work
done with my thing too.


> If there were a vote (and there probably should be) I would suggest your code ought to replace std.xml.

This has come up before and some people are for it, but my
code isn't built for speed or memory efficiency, so it isn't
right for everybody.