August 02, 2012
On Thursday, 2 August 2012 at 07:11:36 UTC, Jacob Carlborg wrote:
> If you change Token to a struct it takes 64 bytes on a LP64 platform. I don't know if that is too big to be passed around by value.

That's why I moved Token to a class in the first place.
It became far too big and you had to pass it around by
reference, which I thought defeated the purpose.

https://github.com/bhelyer/std.d.lexer/

Gonna spend some time massaging this into a
Walter-Approved (tm) lexer. It's got some ways to go.

August 02, 2012
In my dev work I've shaved some bytes off of Token.
I removed the filename from Location, as we don't assume
the input is a file anymore, and I've changed to tracking
line and column numbers as uint instead of size_t.

I don't know what kind of number I _should_ be aiming for,
but I'd imagine I'm not gonna get it that small.
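Here is a rough look at what those two changes save. This minimal sketch uses Python's ctypes to mimic C/D struct layout. It assumes the old Location held the filename as a D string (a pointer plus a length) alongside size_t line and column fields. The class names are hypothetical, not the actual std.d.lexer types.

```python
import ctypes

# Hypothetical "before" layout: a D string filename (pointer + length)
# plus size_t line/column counters.
class LocationBefore(ctypes.Structure):
    _fields_ = [
        ("filename_ptr", ctypes.c_char_p),  # 8 bytes on LP64
        ("filename_len", ctypes.c_size_t),  # 8 bytes on LP64
        ("line", ctypes.c_size_t),          # 8 bytes on LP64
        ("column", ctypes.c_size_t),        # 8 bytes on LP64
    ]

# Hypothetical "after" layout: no filename, 32-bit (uint) counters.
class LocationAfter(ctypes.Structure):
    _fields_ = [
        ("line", ctypes.c_uint32),
        ("column", ctypes.c_uint32),
    ]

if __name__ == "__main__":
    print("before:", ctypes.sizeof(LocationBefore))  # 32 on LP64
    print("after: ", ctypes.sizeof(LocationAfter))   # 8
```

Under these assumptions that is about 24 bytes saved per token on an LP64 platform.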
August 02, 2012
On 2012-08-02 09:26, Bernard Helyer wrote:
> In my dev work I've shaved some bytes off of Token.
> I removed the filename from Location, as we don't assume
> the input is a file anymore, and I've changed to tracking
> line and column numbers as uint instead of size_t.
>
> I don't know what kind of number I _should_ be aiming for,
> but I'd imagine I'm not gonna get it that small.

I think the source location is calculated on demand based on that offset.
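Computing the location on demand means a token only has to store its offset. The sketch below is minimal and makes a few assumptions. It handles only `\n` line endings, and it treats offsets as indices into a Python str rather than UTF-8 byte offsets. SourceMap and location are hypothetical names.

```python
import bisect

class SourceMap:
    """Maps an offset into the source text to a 1-based (line, column).

    Line starts are computed once; each lookup is then a binary search,
    so tokens can carry just an offset instead of a full location.
    """

    def __init__(self, source: str) -> None:
        # Offset of the first character of every line (line 1 starts at 0).
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)

    def location(self, offset: int) -> tuple[int, int]:
        # The last line start <= offset identifies the (0-based) line.
        line = bisect.bisect_right(self.line_starts, offset) - 1
        column = offset - self.line_starts[line]
        return line + 1, column + 1

if __name__ == "__main__":
    src = "int x;\nint y;\n"
    print(SourceMap(src).location(src.index("y")))  # (2, 5)
```

The line table costs one pass over the source. After that, a location is only computed when something actually needs it, such as an error message.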

You can probably shave off a couple of bytes by using a (u)short or (u)byte instead of TokenKind. The TokenKind takes 32 bits, which is way more than what's actually needed.

-- 
/Jacob Carlborg
August 02, 2012
On Thursday, 2 August 2012 at 07:42:05 UTC, Jacob Carlborg wrote:
> You can probably shave off a couple of bytes by using a (u)short or (u)byte instead of TokenKind. The TokenKind takes 32 bits, which is way more than what's actually needed.

Good point. I think there are 180-ish at the moment, so we can
get away with a ubyte, barring a Cambrian explosion of new
keywords. :P
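One caveat on those bytes: because of alignment padding, narrowing the kind field by itself may not shrink the struct at all. This ctypes sketch compares three hypothetical layouts. The field set is an assumption for illustration, not the real Token.

```python
import ctypes

class TokenIntKind(ctypes.Structure):
    _fields_ = [
        ("text_ptr", ctypes.c_void_p),  # slice pointer
        ("text_len", ctypes.c_size_t),  # slice length
        ("line", ctypes.c_uint32),
        ("column", ctypes.c_uint32),
        ("kind", ctypes.c_int),         # 32-bit enum
    ]

class TokenByteKind(ctypes.Structure):
    _fields_ = [
        ("text_ptr", ctypes.c_void_p),
        ("text_len", ctypes.c_size_t),
        ("line", ctypes.c_uint32),
        ("column", ctypes.c_uint32),
        ("kind", ctypes.c_uint8),       # ~180 kinds fit in 0..255
    ]

class TokenPacked(ctypes.Structure):
    _fields_ = [
        ("text_ptr", ctypes.c_void_p),
        ("text_len", ctypes.c_size_t),
        ("line", ctypes.c_uint32),
        ("column", ctypes.c_uint16),    # assumes columns < 65536
        ("kind", ctypes.c_uint8),
    ]

if __name__ == "__main__":
    for t in (TokenIntKind, TokenByteKind, TokenPacked):
        print(t.__name__, ctypes.sizeof(t))  # 32, 32, 24 on LP64
```

On LP64 the first two layouts are both 32 bytes, because the ubyte kind just leaves three bytes of trailing padding. The savings only appear once other fields shrink too, as in the third layout.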

August 02, 2012
On Thursday, 2 August 2012 at 05:36:37 UTC, Walter Bright wrote:
> Using a class implies an extra level of indirection, and the other issue is the only point to using a class is if you're going to derive from it and override its methods. I don't see that for a Token.
>
> Use pass-by-ref for the Token.

You'll always have an extra layer of indirection if you aim not to pass by value. By only exposing a pointer/class reference you make it impossible to do the wrong thing by implicitly copying the struct; and if we have a struct which is only ever meant to be used through a pointer, we're better off using a class.

Of course, if we can trim the size of the struct sufficiently, I'm all for using a value type; then we would also lose the two-word overhead all classes have (which they really shouldn't need: the monitor should only be on synchronized classes, and we wouldn't have to force a vtable pointer if Object weren't bloated, but that's an argument for another day).
August 02, 2012
On 02/08/2012 07:35, Walter Bright wrote:
> Using a class implies an extra level of indirection, and the other issue
> is the only point to using a class is if you're going to derive from it
> and override its methods. I don't see that for a Token.
>
> Use pass-by-ref for the Token.
>

The fact that ref for classes and ref for everything else work differently doesn't really help here.
August 02, 2012
On 8/2/2012 12:04 AM, David Nadlinger wrote:
> On Thursday, 2 August 2012 at 05:36:37 UTC, Walter Bright wrote:
>> Using a class implies an extra level of indirection, […]
>> Use pass-by-ref for the Token.
>
> How is pass-by-ref not an extra level of indirection?

If you have a "Lexer" instance that contains the current Token by value, then you deref once to get to "Lexer". If Token is a class, then you deref to get to "Lexer" and then deref again to get to the current Token.
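The difference in hops can be modelled with ctypes, using a pointer field as a stand-in for a class reference. This sketch covers memory layout only and says nothing about what code a D compiler actually generates. Token, LexerByValue and LexerByPointer are hypothetical names.

```python
import ctypes

class Token(ctypes.Structure):
    _fields_ = [("kind", ctypes.c_uint8), ("offset", ctypes.c_uint32)]

# Token embedded by value: lexer -> current.kind is a single hop.
class LexerByValue(ctypes.Structure):
    _fields_ = [("pos", ctypes.c_size_t), ("current", Token)]

# Token behind a pointer, as with a class reference:
# lexer -> pointer -> token.kind is two hops.
class LexerByPointer(ctypes.Structure):
    _fields_ = [("pos", ctypes.c_size_t), ("current", ctypes.POINTER(Token))]

def kind_by_value(lexer: LexerByValue) -> int:
    return lexer.current.kind

def kind_by_pointer(lexer: LexerByPointer) -> int:
    return lexer.current.contents.kind  # .contents is the extra dereference

if __name__ == "__main__":
    tok = Token(kind=7, offset=42)
    a = LexerByValue(pos=0, current=tok)                  # copies tok
    b = LexerByPointer(pos=0, current=ctypes.pointer(tok))  # shares tok
    tok.kind = 9
    print(kind_by_value(a), kind_by_pointer(b))  # 7 9
```

The by-value lexer holds its own copy of the token, while the pointer version shares it. This is the implicit-copy trade-off raised earlier in the thread.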


August 02, 2012
On 8/2/2012 12:22 AM, Bernard Helyer wrote:
> Gonna spend some time massaging this into a
> Walter-Approved (tm) lexer. It's got some ways to go.


Thank you. I've got a lot of experience writing lexers and using them heavily in professional work, but I'm still finding ways to make them better: faster and more encapsulated.

Since std.d.lexer won't be a third-party module but one that comes with a compiler, it must pass muster as a top-quality one and be as good as the one in the compiler.
August 03, 2012
On Thursday, 2 August 2012 at 20:05:59 UTC, Walter Bright wrote:
> On 8/2/2012 12:22 AM, Bernard Helyer wrote:
>> Gonna spend some time massaging this into a
>> Walter-Approved (tm) lexer. It's got some ways to go.
>
>
> Thank you. I've got a lot of experience writing lexers and using them heavily in professional work, but I'm still finding ways to make them better: faster and more encapsulated.
>
> Since std.d.lexer won't be a third-party module but one that comes with a compiler, it must pass muster as a top-quality one and be as good as the one in the compiler.

I will make it my mission to kick your
(metaphorical, performance-based) ass, sir.
August 03, 2012
On 8/2/2012 7:21 PM, Bernard Helyer wrote:
> I will make it my mission to kick your
> (metaphorical, performance-based) ass, sir.

I am looking forward to a good ass-kicking lexer!