Thread overview | |||||||||
---|---|---|---|---|---|---|---|---|---|
|
May 14, 2009 [Issue 2979] New: Xml tags with only attributes return as without attributes ElementParser.parse | ||||
---|---|---|---|---|
| ||||
http://d.puremagic.com/issues/show_bug.cgi?id=2979 Summary: Xml tags with only attributes return as without attributes ElementParser.parse Product: D Version: 2.030 Platform: All OS/Version: All Status: NEW Keywords: patch Severity: minor Priority: P4 Component: Phobos AssignedTo: bugzilla@digitalmars.com ReportedBy: y0uf00bar@gmail.com CC: y0uf00bar@gmail.com Minor functionality problem in std.xml in Phobos. This is an easy patch fix. Consider that XML <mytag attr1="value1"/> is the effectively same thing as <mytag attr1="value1"/></mytag> The first version reads better, as if that is an xml issue. The onStartText delagate call back argument ElementParser will have a tag object with no attributes for the first version, and will have the attributes for the second. In the first the private Tag constructor parses the attributes and reports the Tag as TagType.EMPTY, in the sense that there is no content. Slight fix: In ElementParser.parse() at line 1945: else if (tag_.isEmpty) { Tag startTag = new Tag(tag_.name); //++ add these 2 lines if (tag_.attr.length > 0) foreach(tn,tv; tag_.attr) startTag.attr[tn]=tv; //so the returned Tag in the element. Alternately change the Tag constructor to report the Tag as START if it has attributes. But this will be a bigger change code flow design and efficiency. Either way, the onStartTag call returns a Tag with START -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
May 14, 2009 [Issue 2979] Xml tags with only attributes return as without attributes ElementParser.parse | ||||
---|---|---|---|---|
| ||||
Posted in reply to y0uf00bar@gmail.com | http://d.puremagic.com/issues/show_bug.cgi?id=2979 Sobirari Muhomori <maxmo@pochta.ru> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |maxmo@pochta.ru --- Comment #1 from Sobirari Muhomori <maxmo@pochta.ru> 2009-05-14 06:36:02 PDT --- why does the code use new Tag instead of tag_ ? > Alternately change the Tag constructor to report the Tag as START if it has attributes. But this will be a bigger change code flow design and efficiency. Either way, the onStartTag call returns a Tag with START It's valid for EMPTY tag to have attributes and as I see Tag constructor parses empty tag with attributes and sets type to EMPTY. What's wrong with this? BTW found lack of support for ampersand-quoted attributes: (line 974) --- reqc(s,'='); munch(s,whitespace); string val; if(optc(s,'"')){ val = encode(munch(s,"^\"")); reqc(s,'"'); } else { reqc(s,'\''); val = encode(munch(s,"^'")); reqc(s,'\''); } munch(s,whitespace); attr[key] = val; --- -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
May 14, 2009 [Issue 2979] Xml tags with only attributes return as without attributes ElementParser.parse | ||||
---|---|---|---|---|
| ||||
Posted in reply to y0uf00bar@gmail.com | http://d.puremagic.com/issues/show_bug.cgi?id=2979 --- Comment #2 from Sobirari Muhomori <maxmo@pochta.ru> 2009-05-14 06:40:32 PDT --- WTF? Why it's encode instead of decode? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
May 15, 2009 [Issue 2979] Xml tags with only attributes return as without attributes ElementParser.parse | ||||
---|---|---|---|---|
| ||||
Posted in reply to y0uf00bar@gmail.com | http://d.puremagic.com/issues/show_bug.cgi?id=2979 --- Comment #3 from hed010gy <y0uf00bar@gmail.com> 2009-05-15 06:11:32 PDT --- (In reply to comment #1) > why does the code use new Tag instead of tag_ ? > Because thats in the original code! And I yet do not know the problem | D-code 2.0 | modules integral design well enough to say whether or not its best thing. It seems to work. My current position is that of a user of the code, playing around to see if I can do anything with the module. I noticed in parsing and debugging my own code in the onStartTag delegate callback, that if the tag was Empty but with attributes, the ElementParser xml object returned has a Tag object attached, but with no attributes. That is not what I want in an XML parser. I do not wish to extensively revise the module, only to make it work for what I want it to do, otherwise I will be using an Expat parser. Writing a good parser takes a long time. The original Author(s) design is to be judged at the moment by its external interface usability (D-friendlyness, similar to the idea of Pythonesque code) and correctness of result, and and ability to be enhanced or fix bugs in the behaviour, without breaking the rest of it. So it does parse the EMPTY tag and the attributes OK, but then creates a new Tag using only the name from the parsed tag, as if it was really, really empty without any attributes at all. Empty means no markup content, but attributes allowed, hence I add here the lines to copy any attributes found. Because the new tag defaults to START, all the Tags to the onStartTag callback are marked as START even if it is empty. This seems depriving the module user of useful information, as why bother setting up for more work in the callback if the Tag is actually empty. So why a new Tag? I know this makes fore more cpu work. But module writer has had make some robust assumptions, especially in an early draft in a strange new language. (To me lots of it is still strange). The passed ElementParser object is not a const object, because the onStartTag call back likes to set various properites and delagates. Perhaps the user can modify the passed Tag at this point, and so the module functions are partly protected by passing objects whose modification will not hurt the its code. Before making more of this, I personally need to have to learn to create D unit test cases. All that fine contract - assertion stuff , doc comments, unit tests takes time and learning. My coding efforts usually stop as soon as the code appears to do what the test cases require, so more test cases is good. Having a more complete validation suite of files would be essential to bringing the parser up to a reasonable state. I found modules pretty output will not create the Empty Tag style. Possibly thats why it wasn't tested to read it either. I found I could improve the pretty function to make the output look like what I call pretty. The isEmptyXML for Tag, which is used by pretty always returns false. The function could instead do a (items.length == 0) ? true : false; There is some argument has to whether a space is necessary before the /> of an Empty tag. Such EmptyTags with or without attributes are a compatibility hazard with SGML and HTML variant parsers. Output style needs to be modified depending on the intended consumer of the file. I am an XML abuser who happens to like empty tag style, and uses XML in a loose way not liked by markup purists and SGML, HTML and XHTML overs. So customizable bits for pretty for me. Emit Empty Tags ? true | false Space before /> in empty tag? true | false Having a few different or a customizable single output pretty formatters would be nice. Its easy enough to write another as well. Now I shall go back to using it for my code, using my slightly modified version, and get on with my own project, for I am far more interesting in abusing XML than writing and fixing parsers. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
May 15, 2009 [Issue 2979] Xml tags with only attributes return as without attributes ElementParser.parse | ||||
---|---|---|---|---|
| ||||
Posted in reply to y0uf00bar@gmail.com | http://d.puremagic.com/issues/show_bug.cgi?id=2979 --- Comment #4 from hed010gy <y0uf00bar@gmail.com> 2009-05-15 06:50:39 PDT --- Now that I think about it a little, passing a copied tag back is very,very important. The user call back can hold references to all the Elements and Tag objects that can be assumed not to be further modified by the parser. Make them and drop them freely and let the GC do its business. A new tag needs to be created with every element anyway. I did try once the idea of making a parser that kept a dictionary of elements, so that there was only actual real copy of the element string name, and all element tags referenced it. Each time a new element was parsed, a look up was done on the table, and the reference returned , or a new entry made. Too much work. The concept of having multiple copies of the same element string in the XML DOM seems a waste, but I have learned to ignore it, and there is always more memory. Another memory / time / code tradeoff. The compressibility of XML by generic compression tools like 7zip is amazing. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
May 15, 2009 [Issue 2979] Xml tags with only attributes return as without attributes ElementParser.parse | ||||
---|---|---|---|---|
| ||||
Posted in reply to y0uf00bar@gmail.com | http://d.puremagic.com/issues/show_bug.cgi?id=2979 --- Comment #5 from hed010gy <y0uf00bar@gmail.com> 2009-05-15 07:15:09 PDT --- Oh no! another thought. A sax-like parser throws out Tag and Element objects and does not need to bother to put them into a DOM. A consumer of the objects can create a DOM or filter-transform. Well, its all been tried/done before by lots of code. So the std.xml module might be better factored to support a more versatile usage. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
August 28, 2009 [Issue 2979] Xml tags with only attributes return as without attributes ElementParser.parse | ||||
---|---|---|---|---|
| ||||
Posted in reply to y0uf00bar@gmail.com | http://d.puremagic.com/issues/show_bug.cgi?id=2979 Andrei Alexandrescu <andrei@metalanguage.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED CC| |andrei@metalanguage.com Resolution| |FIXED --- Comment #6 from Andrei Alexandrescu <andrei@metalanguage.com> 2009-08-28 09:17:31 PDT --- I have integrated hed010gy's first (small) fix but nothing else. We need to rewrite xml, so fixing it thoroughly first would be a bad investment. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- |
Copyright © 1999-2021 by the D Language Foundation