commonmark-d: A fast CommonMark and Github Flavoured Markdown parser, translation of MD4C

Sep 30, 2019

Guillaume Piolat

Oct 01, 2019

Mike Parker

Oct 01, 2019

Oct 02, 2019

Oct 01, 2019

Oct 02, 2019

Oct 02, 2019

Oct 02, 2019

Oct 03, 2019

Oct 05, 2019

Hello,

commonmark-d is a D translation of MD4C, a fast SAX-like Markdown parser. MD4C achieves remarkable parsing speed by not building an AST and by using memory carefully.

The translation route was chosen because parsing Markdown is much more involved than first thought. The D translation largely preserves the speed benefits of MD4C.

Usage:

    // Parse CommonMark, generate HTML
    import commonmarkd;
    string html = convertMarkdownToHTML(markdown);

Key Performance Numbers:
- commonmark-d compiles 3x faster than dmarkdown and 40x faster than hunt-markdown.
- commonmark-d parses Markdown 2x faster than dmarkdown and 15x faster than hunt-markdown (see GitHub for benchmark details).

I haven't measured memory usage at either compile time or run time, but I suspect it's also better.

Available now on DUB: http://code.dlang.org/packages/commonmark-d
GitHub page: https://github.com/p0nce/commonmark-d

On Monday, 30 September 2019 at 23:06:42 UTC, Guillaume Piolat wrote:
> commonmark-d is a D translation of MD4C, a fast SAX-like Markdown parser.
> [...]

This is really nice. The examples show only conversion to HTML. Is there an easy way to get the intermediate output and convert it to PDF through LaTeX, to org-mode, etc., or to change the HTML conversion? One use case that is easy with Pandoc is copying just the code from the Markdown into its own source file, as a simple form of literate programming.

On Tuesday, 1 October 2019 at 11:37:00 UTC, Dennis wrote:
> Cool!
>
> On Monday, 30 September 2019 at 23:06:42 UTC, Guillaume Piolat wrote:
>> Key Performance Numbers:
>
> Have you compared it with the original C code from MD4C?

No. It's entirely possible that there is a small difference; however, most of the code is nothrow @nogc and only uses the GC to allocate the output buffer (the growth strategy might matter there). I don't expect much difference, but yeah, I haven't tested :)
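To illustrate why the growth strategy of a GC-allocated output buffer can matter, here is a minimal sketch of a buffer that grows geometrically. It is hypothetical: `OutputBuffer`, `used` and `growths` are illustrative names, not commonmark-d's actual code. With doubling, appending N bytes triggers only O(log N) growth steps, so the amortized cost per byte stays constant.

```d
// Hypothetical sketch, not commonmark-d's actual output buffer.
// A GC-backed buffer that grows geometrically (doubling), so that
// appending N bytes costs amortized O(1) per byte and triggers only
// O(log N) growth steps.
struct OutputBuffer
{
    char[] storage;   // backing array, allocated by the GC
    size_t used;      // number of bytes written so far
    size_t growths;   // number of times the capacity was increased

    void put(const(char)[] s)
    {
        immutable size_t needed = used + s.length;
        if (needed > storage.length)
        {
            // Start at 64 bytes, then double until the new data fits.
            size_t newCap = storage.length ? storage.length : 64;
            while (newCap < needed)
                newCap *= 2;
            // The GC may extend the block in place or copy it elsewhere.
            storage.length = newCap;
            growths++;
        }
        storage[used .. needed] = s[];
        used = needed;
    }

    // View of the bytes written so far.
    const(char)[] data() const
    {
        return storage[0 .. used];
    }
}
```

For example, 1000 one-byte writes grow this buffer only five times (to 64, 128, 256, 512 and 1024 bytes); a fixed-increment strategy would instead reallocate in proportion to the output size.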

October 02, 2019


Posted by Guillaume Piolat
in reply to bachmeier


On Tuesday, 1 October 2019 at 16:02:47 UTC, bachmeier wrote:
> On Monday, 30 September 2019 at 23:06:42 UTC, Guillaume Piolat wrote:
>> [...]
>
> This is really nice. The examples show only conversion to html. Is there an easy way to get the intermediate output and convert to PDF through latex, to org-mode, etc., or to change the html conversion? One use case that is easy with Pandoc is to copy just the code from markdown into its own source file as a simple form of literate programming.

MD4C is a push parser without an AST, so you have to give it callbacks to generate any kind of intermediate output. You'd have to make md_parse public in commonmark-d; it is a C-style API.
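To make the push-parser idea concrete for the literate-programming use case, here is a toy sketch in D. Everything in it (`BlockType`, `Callbacks`, `toyParse`, `extractCode`) is hypothetical and is neither MD4C's `md_parse` interface nor commonmark-d's. The toy parser only recognizes triple-backtick fenced code blocks, but it reports them through enter/leave/text callbacks the way a SAX-like parser does instead of building a tree, and the consumer keeps only the code.

```d
// Hypothetical sketch: NOT the MD4C or commonmark-d API.
// A toy push parser that only recognizes triple-backtick fenced code
// blocks and reports them through callbacks instead of building an AST.
import std.algorithm.searching : startsWith;
import std.array : appender;
import std.string : lineSplitter;

enum BlockType { paragraph, code }

// The consumer supplies these callbacks; the parser pushes events into them.
struct Callbacks
{
    void delegate(BlockType) enterBlock;
    void delegate(BlockType) leaveBlock;
    void delegate(const(char)[]) text;
}

void toyParse(const(char)[] markdown, Callbacks cb)
{
    bool inCode = false;
    foreach (line; markdown.lineSplitter)
    {
        if (line.startsWith("```"))
        {
            // A fence line opens or closes a code block.
            if (inCode) cb.leaveBlock(BlockType.code);
            else cb.enterBlock(BlockType.code);
            inCode = !inCode;
        }
        else if (inCode)
        {
            cb.text(line);
            cb.text("\n");
        }
        // Lines outside fences are ignored by this toy parser.
    }
    if (inCode) cb.leaveBlock(BlockType.code); // unterminated fence
}

// Consumer: keep only the contents of code blocks (literate-programming style).
string extractCode(const(char)[] markdown)
{
    auto output = appender!string();
    bool inCode = false;
    Callbacks cb;
    cb.enterBlock = delegate(BlockType t) { if (t == BlockType.code) inCode = true; };
    cb.leaveBlock = delegate(BlockType t) { if (t == BlockType.code) inCode = false; };
    // A real parser would also emit text for paragraphs, so the consumer
    // tracks which block it is in and filters accordingly.
    cb.text = delegate(const(char)[] s) { if (inCode) output.put(s); };
    toyParse(markdown, cb);
    return output.data;
}
```

Swapping in a different set of callbacks (for example, ones that emit LaTeX commands) would give another output format without any intermediate tree, which is the flexibility/speed trade-off of the push design.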

My long-term goal is indeed very fast conversion of Markdown to PDF. Now that we have the CommonMark parser and the PDF generation, I just need the time to handle layout. Possibly writing a minimal browser is a better route, dunno.

On Monday, 30 September 2019 at 23:06:42 UTC, Guillaume Piolat wrote:
> Hello,
>
> I haven't measured memory usage of either compile time or run time, but I feel like it's also better.

Thanks, I like this project. Because hunt-markdown is strictly abstract in design, its performance is not particularly good :)

On Wednesday, 2 October 2019 at 09:33:03 UTC, zoujiaqing wrote:
> On Monday, 30 September 2019 at 23:06:42 UTC, Guillaume Piolat wrote:
>> Hello,
>>
>> I haven't measured memory usage of either compile time or run time, but I feel like it's also better.
>
> Thanks, I like this project.
>
> Because hunt-markdown is strictly abstract in design, the performance is not particularly good:)

I wanted to use hunt-markdown but thought it could use a bit less RAM :) Translations tend to look like their originals. I'd be very happy if you would consider commonmark-d for your use case. Having no AST is less flexible, but it has nice properties.

On Monday, 30 September 2019 at 23:06:42 UTC, Guillaume Piolat wrote:
> [...]
> Key Performance Numbers:
> - commonmark-d compiles 3x faster than dmarkdown and 40x faster than hunt-markdown.
> - commonmark-d parses Markdown 2x faster than dmarkdown and 15x faster than hunt-markdown (see GitHub for benchmark details)
> [...]

dmarkdown was actually extracted from vibe-d a few years ago, mostly for a piece of software called "harbored-mod", to add Markdown support to DDOC comments, so vibe-d's Markdown module should still be in the same order of magnitude of "sub-optimality". For Markdown-to-HTML conversion in a static context (i.e. not a server), I'd just use Pandoc. dmarkdown had some bugs; maybe they are fixed in the newest vibe-d, since the fork you compared against was basically stillborn.
