Best practices for parsing files

Jan 25, 2007

lurker

Jan 25, 2007

BCS

Jan 26, 2007

Hi.

I'm new to D but not to programming. I would like to write a small scripting engine using the great D programming language, but I'm undecided on what techniques I should use to parse source files.

Since slices seem to be a central feature of D, I was thinking of reading the whole file into memory and using slices to build the syntax tree.

Does anyone have examples of parsing files using this method?

Any other methods I should consider?

Thanks.

Reply to lurker,

> Hi.
>
> I'm new to D but not to programming. I would like to write a small
> scripting engine using the great D programming language but I'm
> undecided on what techniques I should use to parse source files.
>
> Since slices seem to be a central feature of D I was thinking of
> reading the whole file into memory and using slices to build the
> syntax tree.
>
> Does anyone have examples of parsing files using this method?
>
> Any other methods I should consider?
>
> Thanks.
>

Enki would be my choice if you don't mind using a code generator:

http://www.dsource.org/projects/ddl/wiki/Enki

If you are feeling adventurous you can try dparse:

http://www.dsource.org/projects/scrapple/browser/trunk/dparser/dparse.d

It's not very mature, but it's kind of fun to play with.

(Full disclosure: I wrote dparse.)

Both suggestions are very interesting and I'll be evaluating them, but what I was hoping for was something more along the lines of DMD's parser (which is insanely fast): a hand-written parser. We also thought of translating it to D just as an exercise to learn how it works.

You see, one of my concerns (and the primary reason to use D) is parsing speed: I'm going to parse lots and lots of those files, and memory consumption is hardly an issue since we have plenty of it.

Also, the tasks will be executed on a thread pool and we don't want to face locking problems with code generated by some tool. At least if we write the code we'll know who to blame. :D

Thanks.

lurker wrote:

> Both suggestions are very interesting and I'll be evaluating them, but
> what I was hoping for was something more along the lines of DMD's
> parser (which is insanely fast): a hand-written parser. We also thought
> of translating it to D just as an exercise to learn how it works.

Somebody did that already. It hasn't been updated for a couple of months, though:

http://www.dsource.org/projects/dparser

January 25, 2007

Re: Best practices for parsing files

Posted by BCS
in reply to lurker

Reply to lurker,

> Both suggestions are very interesting and I'll be evaluating them, but
> what I was hoping for was something more along the lines of DMD's
> parser (which is insanely fast): a hand-written parser. We also thought
> of translating it to D just as an exercise to learn how it works.
>
> You see, one of my concerns (and the primary reason to use D) is
> parsing speed: I'm going to parse lots and lots of those files, and
> memory consumption is hardly an issue since we have plenty of it.
>

Ah, then I guess you won't want an LL parser. 

> 
> Also, the tasks will be executed on a thread pool and we don't want to
> face locking problems with code generated by some tool. At least if we
> write the code we'll know who to blame. :D
> 

Both should be thread safe (if you stick to one thread per file).


As far as slicing goes, I'm working on a parser that reads a file into memory (I guess it could mmap it in as well) and converts it to an array of token structs. A parser will then walk the array. If you allocate a big array of structs in advance and have your lexer write directly into it (slicing out of the file where the text is important), that should be fairly fast.

That's my 2 cents, I'm not sure how much help this will be (my parser is /not/ performance driven) but I hope it might help.
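A minimal sketch of the approach described above, written for current D rather than taken from any code in this thread. The names `Token`, `TokKind`, and `lex` are hypothetical, and the token rules are deliberately simplistic. The token array is allocated once up front, and each token's text is a slice into the source rather than a copy. In real use the source would come from something like `std.file.readText`.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;

enum TokKind { identifier, number, symbol }

// Hypothetical token struct: `text` is a slice of the source, not a copy.
struct Token
{
    TokKind kind;
    string text;
}

// Lex `src` into `buf`, which the caller allocated in advance.
// Returns the filled part of `buf`.
Token[] lex(string src, Token[] buf)
{
    size_t n = 0; // tokens written so far
    size_t i = 0; // current position in src
    while (i < src.length)
    {
        if (isWhite(src[i])) { ++i; continue; }
        immutable start = i;
        TokKind kind;
        if (isAlpha(src[i]) || src[i] == '_')
        {
            kind = TokKind.identifier;
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
        }
        else if (isDigit(src[i]))
        {
            kind = TokKind.number;
            while (i < src.length && isDigit(src[i])) ++i;
        }
        else
        {
            kind = TokKind.symbol; // any other single character
            ++i;
        }
        buf[n++] = Token(kind, src[start .. i]);
    }
    return buf[0 .. n];
}

void main()
{
    string src = "x = foo + 42;";
    // Every token is at least one character long, so src.length slots suffice.
    auto buf = new Token[src.length];
    auto toks = lex(src, buf);
    assert(toks.length == 6);
    assert(toks[2].text == "foo");
    assert(toks[2].text.ptr == src.ptr + 4); // same memory as the source
}
```

Sizing the array at `src.length` trades memory for a single allocation, which fits the stated situation where memory is plentiful and speed matters.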

BCS wrote:

> As far as slicing goes, I'm working on a parser that reads a file into
> memory (I guess it could mmap it in as well) and converts it to an
> array of token structs. A parser will then walk the array. If you
> allocate a big array of structs in advance and have your lexer write
> directly into it (slicing out of the file where the text is
> important), that should be fairly fast.

Excellent! Is any of your code available? We really would like to take a look at it (if possible). We are a little lost right now, and from your description it seems very much like what we want to build.

Thanks

lurker wrote:

> Hi.
>
> I'm new to D but not to programming. I would like to write a small
> scripting engine using the great D programming language but I'm
> undecided on what techniques I should use to parse source files.
>
> Since slices seem to be a central feature of D I was thinking of
> reading the whole file into memory and using slices to build the
> syntax tree.
>
> Does anyone have examples of parsing files using this method?

The DMD lexer works pretty much this way, and it's available in every DMD distribution :-)

> Any other methods I should consider?

This is the method I've used in the past, even in C++. It seems to make for cleaner code than the allocate/copy method, and it's faster to boot.

Sean
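To make the slice-based syntax tree concrete, here is a small hand-written recursive-descent sketch in current D. It is not DMD's lexer or parser; `Node` and `Parser` are hypothetical names, and the grammar (operands joined by `+` or `-`) is an assumption chosen only to keep the example short. Each node stores a slice of the source, so building the tree copies no text.

```d
import std.ascii : isAlphaNum, isWhite;

// Hypothetical AST node: `text` is a slice of the source, never a copy.
class Node
{
    string text;
    Node left, right;

    this(string text, Node left = null, Node right = null)
    {
        this.text = text;
        this.left = left;
        this.right = right;
    }
}

struct Parser
{
    string rest; // unparsed remainder of the source, itself a slice

    void skipSpace()
    {
        while (rest.length && isWhite(rest[0])) rest = rest[1 .. $];
    }

    // operand := one or more alphanumeric characters
    Node operand()
    {
        skipSpace();
        size_t n = 0;
        while (n < rest.length && isAlphaNum(rest[n])) ++n;
        if (n == 0) throw new Exception("operand expected");
        auto node = new Node(rest[0 .. n]);
        rest = rest[n .. $];
        return node;
    }

    // expr := operand (('+' | '-') operand)*, left-associative
    Node expr()
    {
        auto lhs = operand();
        for (;;)
        {
            skipSpace();
            if (rest.length == 0 || (rest[0] != '+' && rest[0] != '-'))
                return lhs;
            auto op = rest[0 .. 1];
            rest = rest[1 .. $];
            lhs = new Node(op, lhs, operand());
        }
    }
}

void main()
{
    string src = "a + 12 - b";
    auto p = Parser(src);
    auto root = p.expr();
    assert(root.text == "-");
    assert(root.left.text == "+");
    assert(root.left.right.text == "12");
    assert(root.right.text.ptr == src.ptr + 9); // points into src
}
```

Because the parser's only state is the `rest` slice, each thread can run its own `Parser` on its own file without sharing anything.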

Reply to lurker,

> BCS wrote:
>
>> As far as slicing goes, I'm working on a parser that reads a file into
>> memory (I guess it could mmap it in as well) and converts it to an
>> array of token structs. A parser will then walk the array. If you
>> allocate a big array of structs in advance and have your lexer write
>> directly into it (slicing out of the file where the text is
>> important), that should be fairly fast.
>>
> Excellent! Is any of your code available? We really would like to take
> a look at it (if possible). We are a little lost right now, and from
> your description it seems very much like what we want to build.
>
> Thanks
>

That isn't how my lexer works (at the moment); I was just saying I think it could be done. In fact, my app copies everything to make sure that it doesn't stomp on itself.

OTOH, it wouldn't be too hard to port it to what I described above, and I plan on posting the code when I get a bit closer to done.
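The "stomping" concern can be illustrated with a short sketch, assuming current D semantics (where `string` is `immutable(char)[]`); it is not taken from the lexer discussed here. A slice of a mutable buffer shares memory with it, so later writes to the buffer show through the slice, while a `.dup` copy is unaffected. Holding the source as an immutable `string` is one way to keep slices safe without copying.

```d
void main()
{
    // Mutable buffer standing in for file contents that might be modified.
    char[] buffer = "alpha beta".dup;

    char[] sliced = buffer[0 .. 5];     // shares memory with buffer
    char[] copied = buffer[0 .. 5].dup; // owns its own memory

    buffer[0] = 'A';                    // "stomp" on the buffer

    assert(sliced == "Alpha"); // the slice sees the change
    assert(copied == "alpha"); // the copy does not

    // With an immutable source, the same mistake is rejected at compile time.
    string text = "alpha beta";
    string tok = text[0 .. 5];
    // text[0] = 'A'; // would not compile: cannot modify immutable data
    assert(tok == "alpha");
    assert(tok.ptr == text.ptr); // still no copy was made
}
```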
