Thread overview
Learning to XML with D
Feb 06, 2015
Derix
Feb 06, 2015
Chris
Feb 06, 2015
Marc Schütz
Feb 06, 2015
Adam D. Ruppe
Feb 09, 2015
Derix
Feb 06, 2015
CraigDillabaugh
Feb 06, 2015
CraigDillabaugh
Feb 06, 2015
Chris
Feb 06, 2015
CraigDillabaugh
Feb 06, 2015
Adam D. Ruppe
Feb 09, 2015
Derix
Feb 07, 2015
Arjan
February 06, 2015
So, I set sail to transform a bunch of HTML files with D. This, of course, will happen with the std.xml library.

There is this nice example:
http://dlang.org/phobos/std_xml.html#.DocumentParser
which I have already put to some use. However, some of the basics seem to escape me, especially in lines like

    xml.onEndTag["author"]       = (in Element e) { book.author      = e.text(); };

OK, we're doing some event-based parsing, reacting with a lambda function on encountering such-and-such a tag, à la SAX (are we?).

What I don't quite grasp is the construct (in Element e), especially the *in* part.

Is it the *in* of the InExpression (http://dlang.org/expression.html#InExpression)? If so, I fail to see which associative array is involved.

It's more likely a way to further qualify the argument e we're passing to the λ-function; could someone elaborate on that?

Of course, it is entirely possible that I'm completely missing the point and overlooking some fundamentals; if so, have mercy and help me find my way back to the righteous path ;-)


Thxxx
February 06, 2015
On Friday, 6 February 2015 at 09:15:54 UTC, Derix wrote:
> So, I set sail to transform a bunch of HTML files with D. This, of course, will happen with the std.xml library.

The documentation says:

"Warning: This module is considered out-dated and not up to Phobos' current standards. It will remain until we have a suitable replacement, but be aware that it will not remain long term."

My advice is not to use it. I used it a while back, but it slowed down my system (why, I still don't know), and it has been perpetually "soon to be deprecated".

If you wanna use D for XML parsing, see if you can find a solid 3rd party library in D (have a look at Adam's github page: https://github.com/adamdruppe/, he has some DOM and HTML stuff up there).

There is a new xml module in the review queue, but nobody seems to care. I _think_ the reason why nobody really cares is that most people in the D community don't like XML.
February 06, 2015
On Friday, 6 February 2015 at 11:39:32 UTC, Chris wrote:
> If you wanna use D for XML parsing, see if you can find a solid 3rd party library in D (have a look at Adam's github page: https://github.com/adamdruppe/, he has some DOM and HTML stuff up there).

Another place to look is http://code.dlang.org/ , which contains packages usable with DUB. There you can find KXML, for example:
http://code.dlang.org/packages/kxml

>
> There is a new xml module in the review queue, but nobody seems to care. I _think_ the reason why nobody really cares is that most people in the D community don't like XML.

:-P

I think the reason is simply that someone has to do the actual work of pushing things forward. And to make matters worse, std.xml2 is marked as abandoned, so it would first have to be brought back into shape before it could even be submitted.
February 06, 2015
On Friday, 6 February 2015 at 09:15:54 UTC, Derix wrote:
> OK, we're doing some event-based parsing, reacting with a lambda function on encountering such-and-such a tag, à la SAX (are we?).

yeah
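To make the event-based part concrete: the `xml.onEndTag["author"] = ...` syntax suggests an associative array of callbacks keyed by tag name, which the parser invokes when it reaches that end tag. The sketch below imitates only that shape. `Element`, `ToyParser`, `fireEndTag` and `Book` are hypothetical stand-ins, not the std.xml API, and no real XML is parsed; the "events" are fired by hand.

```d
import std.stdio : writeln;

// Hypothetical stand-in for an XML element: a tag name plus its text content.
class Element
{
    string name;
    string content;

    this(string name, string content)
    {
        this.name = name;
        this.content = content;
    }

    // Marked const so it can be called through an `in` (const) reference.
    string text() const { return content; }
}

// Callbacks receive a read-only Element.
alias ElementHandler = void delegate(in Element);

// Toy dispatcher: it does no parsing, it only routes hand-made "end tag" events.
struct ToyParser
{
    ElementHandler[string] onEndTag; // tag name -> callback

    void fireEndTag(in Element e)
    {
        string tag = e.name;
        // Here `in` is the associative-array lookup: a pointer to the handler, or null.
        if (auto handler = tag in onEndTag)
            (*handler)(e);
    }
}

struct Book { string author; string title; }

void main()
{
    Book book;
    ToyParser xml;
    xml.onEndTag["author"] = (in Element e) { book.author = e.text(); };
    xml.onEndTag["title"]  = (in Element e) { book.title  = e.text(); };

    xml.fireEndTag(new Element("author", "Jane Doe"));
    xml.fireEndTag(new Element("title", "Some Book"));
    xml.fireEndTag(new Element("price", "9.99")); // no handler registered: ignored

    writeln(book.author, " / ", book.title); // prints "Jane Doe / Some Book"
}
```

Tags without a registered handler are simply skipped, which is what makes this style cheap when you only care about a few elements.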

> What I don't quite grasp is the construct (in Element e), especially the *in* part.

Function parameters in D can optionally be qualified with storage classes such as in or out:

http://dlang.org/function.html#parameters

(in Element e) means you are taking an argument of type Element that you only intend to take in to look at. An "in" parameter is const and you are not supposed to store a reference to it.

So basically, `in` on a function parameter means "look, don't touch".
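A small sketch contrasting the two uses of the keyword. On a parameter, `in` makes the argument `const` inside the function (depending on compiler version and flags such as `-preview=in`, it may also imply `scope`). In an expression like `key in aa`, it is an unrelated binary operator that looks up a key in an associative array and yields a pointer to the value, or null. The `Element` class and `describe` function are hypothetical, made up for illustration.

```d
import std.stdio : writeln;

// Hypothetical element type, just enough to have a field to (not) modify.
class Element
{
    string name;
    this(string name) { this.name = name; }
}

// `in` as a parameter storage class: e is const inside the function.
string describe(in Element e)
{
    // e.name = "changed";  // would not compile: e is const here
    return "element <" ~ e.name ~ ">";
}

void main()
{
    writeln(describe(new Element("author"))); // prints "element <author>"

    // `in` as an expression: membership test on an associative array.
    int[string] counts = ["author": 1];
    if (auto p = "author" in counts)          // p is an int*, null if the key is absent
        writeln("author count: ", *p);        // prints "author count: 1"
    assert(("title" in counts) is null);
}
```

So in `(in Element e)` no associative array is involved at all; the word just happens to be reused.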
February 06, 2015
On Friday, 6 February 2015 at 11:39:32 UTC, Chris wrote:
> If you wanna use D for XML parsing, see if you can find a solid 3rd party library in D (have a look at Adam's github page: https://github.com/adamdruppe/, he has some DOM and HTML stuff up there).

Yeah, if you're used to DOM work in JavaScript, my dom.d works in a familiar way: it offers similar attributes and methods, lets you use CSS selector syntax if you want, etc. You can download just that one file, then build your program with "dmd yourfile.d dom.d" and it should just work; it has no outside dependencies.

Mine can handle almost any XML, but the out-of-the-box experience is focused on HTML. Combined with my characterencodings.d from the same repo, it can handle most web pages too, making it useful for scraping HTML sites.

> There is a new xml module in the review queue, but nobody seems to care. I _think_ the reason why nobody really cares is that most people in the D community don't like XML.

I don't have a problem with xml.... but my own lib Works For Me (tm) so I don't personally care much about what is or isn't in phobos....
February 06, 2015
On Friday, 6 February 2015 at 11:39:32 UTC, Chris wrote:
> There is a new xml module in the review queue, but nobody seems to care. I _think_ the reason why nobody really cares is that most people in the D community don't like XML.

I added XML to the GSoC ideas page (see the Phobos section), but it still needs a mentor. Are you busy this summer?

http://wiki.dlang.org/GSOC_2015_Ideas#Phobos:_D_Standard_Library

February 06, 2015
On Friday, 6 February 2015 at 14:09:51 UTC, CraigDillabaugh wrote:
> I added XML to the GSoC ideas page (see the Phobos section), but it still needs a mentor. Are you busy this summer?
>
> http://wiki.dlang.org/GSOC_2015_Ideas#Phobos:_D_Standard_Library

Just for the record, I hate XML too, but it is VERY widely used, so good XML support is essential ... like it or not!
February 06, 2015
On Friday, 6 February 2015 at 14:11:19 UTC, CraigDillabaugh wrote:
> On Friday, 6 February 2015 at 14:09:51 UTC, CraigDillabaugh wrote:
>> I added XML to the GSoC ideas page (see the Phobos section), but it still needs a mentor. Are you busy this summer?
>>
>> http://wiki.dlang.org/GSOC_2015_Ideas#Phobos:_D_Standard_Library
>
> Just for the record, I hate XML too, but it is VERY widely used, so good XML support is essential ... like it or not!

You're right of course. It is widely (and wildly) used. I for my part have changed my input files from XML to a simpler custom format.

PS: I am busy this summer. But maybe Adam's dom.d could be used as the basis for a new module; unlike std.xml2, it's not abandoned.
February 06, 2015
On Friday, 6 February 2015 at 14:15:44 UTC, Chris wrote:
> You're right of course. It is widely (and wildly) used. I for my part have changed my input files from XML to a simpler custom format.
>
> PS: I am busy this summer. But maybe Adam's dom.d could be used as the basis for a new module; unlike std.xml2, it's not abandoned.

Thanks for the tip.  I may add a reference there!
February 07, 2015
On Friday, 6 February 2015 at 09:15:54 UTC, Derix wrote:
> So, I set sail to transform a bunch of HTML files with D. This, of course, will happen with the std.xml library.

If you're on Windows, you could use MSXML 6 (msxml6) through COM.
It gives you DOM, SAX, XPath 1.0 and XSLT.
