January 11, 2020 How to parse epub content

How would someone approach parsing epub files in D? Are there any libraries to parse XHTML?

January 11, 2020 Re: How to parse epub content

Posted in reply to Adnan
On Saturday, 11 January 2020 at 12:38:38 UTC, Adnan wrote:
> How would someone approach parsing epub files in D? Are there any libraries to parse XHTML?
XHTML is XML. There are libraries to parse XML, from std.xml in the standard library to libraries like dxml in the package repository.
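
As a rough illustration of the dxml route, here is a minimal sketch that pulls the text out of every `<p>` element using `dxml.dom`. It assumes dxml has been added as a DUB dependency; the helper names `textOf`, `collectParagraphs` and `paragraphsOf` are made up for this example and are not part of dxml. Text is returned as it appears in the source, so entity references such as `&amp;` are not decoded here.

```d
import std.conv : to;
import std.stdio : writeln;
import dxml.dom : parseDOM;
import dxml.parser : EntityType, simpleXML;

/// Concatenates every text node found beneath `node` (hypothetical helper).
string textOf(E)(E node)
{
    if (node.type == EntityType.text)
        return node.text.to!string;

    string result;
    // Only start tags have children; other entity types are skipped.
    if (node.type == EntityType.elementStart)
        foreach (child; node.children)
            result ~= textOf(child);
    return result;
}

/// Walks the tree and appends the text of each <p> element (hypothetical helper).
void collectParagraphs(E)(E node, ref string[] paragraphs)
{
    if (node.type != EntityType.elementStart)
        return;
    if (node.name == "p")
    {
        paragraphs ~= textOf(node);
        return;
    }
    foreach (child; node.children)
        collectParagraphs(child, paragraphs);
}

/// Parses an XHTML string and returns the text of its paragraphs.
string[] paragraphsOf(string xhtml)
{
    string[] paragraphs;
    collectParagraphs(parseDOM!simpleXML(xhtml), paragraphs);
    return paragraphs;
}

void main()
{
    // Sample markup invented for this sketch.
    auto xhtml = "<html><body><p>One</p><p>Two</p></body></html>";
    foreach (p; paragraphsOf(xhtml))
        writeln(p);
}
```

Checking `type` before touching `children` or `text` matters here, since dxml only allows those properties on the matching entity types.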

January 11, 2020 Re: How to parse epub content

Posted in reply to Adnan
On Saturday, 11 January 2020 at 12:38:38 UTC, Adnan wrote:
> How would someone approach parsing epub files in D? Are there any libraries to parse XHTML?

I've done it before with my dom.d easily enough.

The epub itself is a zip file. You can simply unzip it ahead of time, or use std.zip to access the contents (basic zip file support is in Phobos).

Once you get inside, there are XHTML files, which again are easy enough to parse. With my dom.d it is as simple as:

    import std.stdio; // for writeln
    import arsd.dom;

    // the true, true here tells it to use strict xml mode for xhtml
    // it isn't really necessary though, so it is ok to leave it off
    auto document = new Document(string_holding_xml, true, true);
    foreach(ele; document.querySelectorAll("p"))
        writeln(ele.innerText);

The API there is similar to JavaScript's DOM, if you're familiar with that.
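
For the std.zip step mentioned above, a minimal sketch using only Phobos might look like the following. The helper names `readMembers` and `makeSampleArchive`, and the member path `OEBPS/chapter1.xhtml`, are invented for illustration; the sample archive is built in memory only so the example runs without a real epub on disk.

```d
import std.algorithm.searching : endsWith;
import std.stdio : writeln;
import std.string : representation;
import std.zip : ArchiveMember, CompressionMethod, ZipArchive;

/// Returns the decompressed text of every archive member whose name
/// ends in `suffix`, keyed by member name (hypothetical helper).
string[string] readMembers(void[] zipBytes, string suffix)
{
    string[string] result;
    auto archive = new ZipArchive(zipBytes);
    foreach (name, member; archive.directory)
    {
        if (!name.endsWith(suffix))
            continue;
        // expand() decompresses the member and returns its raw bytes.
        auto bytes = archive.expand(member);
        result[name] = (cast(const(char)[]) bytes).idup;
    }
    return result;
}

/// Builds a tiny in-memory zip standing in for a real epub (illustrative only).
void[] makeSampleArchive()
{
    auto chapter = new ArchiveMember();
    chapter.name = "OEBPS/chapter1.xhtml";
    chapter.expandedData("<html><body><p>Hello</p></body></html>".dup.representation);
    chapter.compressionMethod = CompressionMethod.deflate;

    auto archive = new ZipArchive();
    archive.addMember(chapter);
    return archive.build();
}

void main()
{
    foreach (name, text; readMembers(makeSampleArchive(), ".xhtml"))
        writeln(name, ": ", text);
}
```

With a real book, the bytes would instead come from something like `std.file.read("book.epub")` (file name assumed), and each returned string could then be handed to an XML or DOM parser.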