May 14, 2012
On Monday, 14 May 2012 at 17:05:17 UTC, Dmitry Olshansky wrote:
> On 14.05.2012 19:10, Roman D. Boiko wrote:
>> (Subj.) I'm in doubt which to choose for my case, but this is a generic
>> question.
>>
>> http://forum.dlang.org/post/odcrgqxoldrktdtarskf@forum.dlang.org
>>
>> Cross-posting here. I would appreciate any feedback. (Whether to reply
>> in this or that thread is up to you.) Thanks
> On 14.05.2012 19:10, Roman D. Boiko wrote:
> Oops, sorry I meant to post to NG only :) Repost:
>
> Clearly you are puting too much pressure on Token.
>
> In my mind it should be real simple:
>
> struct Token{
>     uint col, line;
>     uint flags;//indicated info about token, serves as both type tag and flag set;
> //indicates proper type once token was cooked (like "31.415926" -> 3.145926e1) i.e. values are calculated
>     union {
>         string     chars;
>         float     f_val;
>         double     d_val;
>         uint     uint_val;
>         long     long_val;
>         ulnog     ulong_val;
>         //... anything else you may need (8 bytes are plenty)
>     }//even then you may use up to 12bytes
>     //total size == 24 or 20
> };
>
> Where:
>     Each raw token at start has chars == slice of characters in text (or if not UTF-8 source = copy of source). Except for keywords and operators.
>     Cooking is a process of calculating constant values and such (say populating symbols table will putting symbol id into token instead of leaving string slice). Do it on the fly or after the whole source - let the user choose.
>
> Value types have nice property of being real fast, I suggest you to do at least some syntetic tests before going with ref-based stuff. Pushing 4 word is cheap, indirection never is. Classes also have hidden mutex _monitor_ field so using 'class' can be best described as suicide.
>
> Yet you may go with freelist of tokens (leaving them as structs). It's an old and proven way.
>
>
> About row/col - if this stuff is encoded into Finite Automation (and you sure want to do something like DFA) it comes out almost at no cost.
> The only disadvantage is complicating DFA tables with some irregularities of "true Unicode line ending sequences". It's more of nuisance then real problem though.
>
> P.S. if you real bend on performance, I suggest to run sparate Aho-Corassic-style thing for keywords, it would greatly simplify (=speed up) DFA structure if keywords are not hardcoded into automation.

Thanks Dmitry, I think I'll need to contact you privately about some details later.
May 14, 2012
On 14.05.2012 21:20, Roman D. Boiko wrote:
> On Monday, 14 May 2012 at 16:41:39 UTC, Jonathan M Davis wrote:
>> On Monday, May 14, 2012 17:10:23 Roman D. Boiko wrote:
>>> (Subj.) I'm in doubt which to choose for my case, but this is a
>>> generic question.
>>>
>>> http://forum.dlang.org/post/odcrgqxoldrktdtarskf@forum.dlang.org
>>>
>>> Cross-posting here. I would appreciate any feedback. (Whether to
>>> reply in this or that thread is up to you.) Thanks
>>
>> For the question in general, there are primarily 2 questions to consider:
>>
>> 1. Do you need inheritance/polymorphism?
>> don't have inheritance.
>>
>> 2. Do you need a value type or a reference type?
>>
>> If you need inheritance/polymorphism, you have to use classes, because
>> structs
>>
>> If you want a value type, you have to use structs, because classes are
>> reference types.
>>
>> If you want a reference type, then you can use either a class or a
>> struct, but
>> it's easier with classes, because classes are always reference types,
>> whereas
>> structs are naturally value types, so it can take more work to make them
>> reference types depending on what its member variables are.
>>
>> One other thing to consider is deterministic destruction. The only way
>> to get
>> deterministic destruction is to have a struct on the stack. Value
>> types will
>> naturally get that, but if you want a reference type with determinstic
>> destruction, then you're probably going to have to use a ref-counted
>> struct.
>>
>> In the vast majority of cases, that should be enough to decide whether
>> you
>> need a class or a struct. The main case that it doesn't is the case
>> where you
>> want a reference type and don't necessarily want a class, and that
>> decision
>> can get complicated.
>>
>> In the case of a token in a lexer, I would definitely expect that to
>> be a value
>> type, which means using a struct.
>>
>> - Jonathan M Davis
> Semantics is clearly of value type. But it would be nice to
> minimize size of instance to be copied. (I could simply split it
> into two, if some part is not used in most scenarios.)
>
> Just curious: how to implement a struct as ref type? I could add
> indirection (but then why not just use a class?), or simply pass
> by ref into all methods. Are there any other options?
>

Use the damn pointer ;)

In a few words (lines) overwrite struct contents (first 4-8 bytes) with pointer to next free token:

//setup:
void get_some_memory()
{
	Token pool = new Token[block_size];
	for(size_t i=0;i<pool.length-1; i++)
		*(void**)(tok.ptr+i) =   tok.ptr+i+1;
	*(void**)&tok[$-1]  = null; //end of list
}
Token* head = pool.ptr;//this guy in TLS

//usage:
Token* new_tok(){
	return head ? get_some_memory(), head : head;
}

void free_tok(Token* tok){
	*(void**)tok =  head;
	head = tok;
}

Untested but trustworthy ;)

> Thanks to everybody for your feedback, it looks like I'll stay
> with struct.


-- 
Dmitry Olshansky
May 14, 2012
On 14.05.2012 21:33, Dmitry Olshansky wrote:
> On 14.05.2012 21:20, Roman D. Boiko wrote:
>> On Monday, 14 May 2012 at 16:41:39 UTC, Jonathan M Davis wrote:
>>> On Monday, May 14, 2012 17:10:23 Roman D. Boiko wrote:
>>>> (Subj.) I'm in doubt which to choose for my case, but this is a
>>>> generic question.
>>>>
>>>> http://forum.dlang.org/post/odcrgqxoldrktdtarskf@forum.dlang.org
>>>>
>>>> Cross-posting here. I would appreciate any feedback. (Whether to
>>>> reply in this or that thread is up to you.) Thanks
>>>
>>> For the question in general, there are primarily 2 questions to
>>> consider:
>>>
>>> 1. Do you need inheritance/polymorphism?
>>> don't have inheritance.
>>>
>>> 2. Do you need a value type or a reference type?
>>>
>>> If you need inheritance/polymorphism, you have to use classes, because
>>> structs
>>>
>>> If you want a value type, you have to use structs, because classes are
>>> reference types.
>>>
>>> If you want a reference type, then you can use either a class or a
>>> struct, but
>>> it's easier with classes, because classes are always reference types,
>>> whereas
>>> structs are naturally value types, so it can take more work to make them
>>> reference types depending on what its member variables are.
>>>
>>> One other thing to consider is deterministic destruction. The only way
>>> to get
>>> deterministic destruction is to have a struct on the stack. Value
>>> types will
>>> naturally get that, but if you want a reference type with determinstic
>>> destruction, then you're probably going to have to use a ref-counted
>>> struct.
>>>
>>> In the vast majority of cases, that should be enough to decide whether
>>> you
>>> need a class or a struct. The main case that it doesn't is the case
>>> where you
>>> want a reference type and don't necessarily want a class, and that
>>> decision
>>> can get complicated.
>>>
>>> In the case of a token in a lexer, I would definitely expect that to
>>> be a value
>>> type, which means using a struct.
>>>
>>> - Jonathan M Davis
>> Semantics is clearly of value type. But it would be nice to
>> minimize size of instance to be copied. (I could simply split it
>> into two, if some part is not used in most scenarios.)
>>
>> Just curious: how to implement a struct as ref type? I could add
>> indirection (but then why not just use a class?), or simply pass
>> by ref into all methods. Are there any other options?
>>
>
> Use the damn pointer ;)
>
> In a few words (lines) overwrite struct contents (first 4-8 bytes) with
> pointer to next free token:
>
> //setup:
> void get_some_memory()
> {
> Token pool = new Token[block_size];
> for(size_t i=0;i<pool.length-1; i++)
> *(void**)(tok.ptr+i) = tok.ptr+i+1;
> *(void**)&tok[$-1] = null; //end of list
> }
> Token* head = pool.ptr;//this guy in TLS
>
> //usage:
> Token* new_tok(){
	if(!head)
		get_some_memory();
	Token* r = head;
	head = *(token**)head;
	return r;
> }

Not so trustworthy :o)
But hopefully you get the idea. See something simillar in std.regex, though pointer in there also serves for intrusive linked-list.

>
> void free_tok(Token* tok){
> *(void**)tok = head;
> head = tok;
> }
>
> Untested but trustworthy ;)
>
>> Thanks to everybody for your feedback, it looks like I'll stay
>> with struct.
>
>


-- 
Dmitry Olshansky
May 14, 2012
On Monday, 14 May 2012 at 17:37:02 UTC, Dmitry Olshansky wrote:
> But hopefully you get the idea. See something simillar in std.regex, though pointer in there also serves for intrusive linked-list.
Thanks, most likely I'll go your way.
May 15, 2012
On Monday, 14 May 2012 at 18:00:42 UTC, Roman D. Boiko wrote:
> On Monday, 14 May 2012 at 17:37:02 UTC, Dmitry Olshansky wrote:
>> But hopefully you get the idea. See something simillar in std.regex, though pointer in there also serves for intrusive linked-list.

> Thanks, most likely I'll go your way.

 If you don't mind doing some lite informational reading, read up on 'lex and yacc'. ISBN: 1-56592-000-7; Hope I got that right. I'm not telling you to use C (or the other tools mind you), but it does go into handling tokens of multiple types as well as managing them using pointers and unions. A good portion of that can carry over for your lexer and project.
May 15, 2012
On Monday, 14 May 2012 at 18:00:42 UTC, Roman D. Boiko wrote:
> On Monday, 14 May 2012 at 17:37:02 UTC, Dmitry Olshansky wrote:
>> But hopefully you get the idea. See something simillar in std.regex, though pointer in there also serves for intrusive linked-list.

> Thanks, most likely I'll go your way.

 If you don't mind doing some lite informational reading, read up on 'lex and yacc'. ISBN: 1-56592-000-7; Hope I got that right. I'm not telling you to use C (or the other tools mind you), but it does go into handling tokens of multiple types as well as managing them using pointers and unions. A good portion of that can carry over for your lexer and project.
May 15, 2012
On Tuesday, 15 May 2012 at 06:17:31 UTC, Era Scarecrow wrote:
> On Monday, 14 May 2012 at 18:00:42 UTC, Roman D. Boiko wrote:
>> On Monday, 14 May 2012 at 17:37:02 UTC, Dmitry Olshansky wrote:
>>> But hopefully you get the idea. See something simillar in std.regex, though pointer in there also serves for intrusive linked-list.
>
>> Thanks, most likely I'll go your way.
>
>  If you don't mind doing some lite informational reading, read up on 'lex and yacc'. ISBN: 1-56592-000-7; Hope I got that right. I'm not telling you to use C (or the other tools mind you), but it does go into handling tokens of multiple types as well as managing them using pointers and unions. A good portion of that can carry over for your lexer and project.
I've got flex & bison edition (ISBN: 0596155972), will check there, thanks :)
May 15, 2012
On Tuesday, 15 May 2012 at 06:17:31 UTC, Era Scarecrow wrote:
>  If you don't mind doing some lite informational reading, read up on 'lex and yacc'. ISBN: 1-56592-000-7; Hope I got that right. I'm not telling you to use C (or the other tools mind you), but it does go into handling tokens of multiple types as well as managing them using pointers and unions. A good portion of that can carry over for your lexer and project.
Looks like I can introduce several lex/flex concepts in my lexer,
thank you!
I've created a task for myself to do this investigation:
https://github.com/roman-d-boiko/dct/issues/42
1 2
Next ›   Last »