Strings and Slices
February 18, 2021
Hi All,

I'm rebuilding a C++ project in D, and its first part is a lexer that uses strings and string_view.

Are slices comparable to string_view?

The lexer has a single-layer (non-)FSM architecture: basically a main loop that checks the first character at the current input position and then calls a function/lambda to continue from there.

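import std.ascii : isAlpha;
import std.range.primitives : empty, front, popFront;

string lex_identifier(ref string input) {
  ...
}

while (!input.empty) {
  if (isAlpha(input.front)) {
    auto tmp = lex_identifier(input);
  } else {
    input.popFront(); // placeholder: other token kinds are not shown here
  }
}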

I'm passing it by ref because I want to iterate over the string and produce a slice of the lexeme. That needs a mark at a given point plus a current iteration point, and logically these marks are the start and end of the slice.

Is there a better way to make these slices?
Also, ref doesn't work well with unittests that pass in a literal; is there an easier way than creating a temporary variable for the input?

In the body of lex_identifier I am using drop(), but it doesn't seem to do what I thought it did. I want to take a slice from the beginning of the ref slice up to a given mark, and also move the beginning of that ref slice to that mark.

What is the best way to achieve this?

Kind regards,
Mikey



February 18, 2021
On Thursday, 18 February 2021 at 20:47:33 UTC, Mike Brown wrote:
> Is slices comparable to a string_view?

My C++ is rusty af, but yes, I think so.

A D slice is `struct slice { size_t length; T* ptr; }`, so when in doubt, just think about what that struct would do.
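To make the string_view comparison concrete, here is a small sketch (variable names are arbitrary) showing that slicing a string copies nothing: the new slice shares the original's memory and is just a pointer plus a length.

```d
import std.stdio : writeln;

void main()
{
    string s = "hello world";
    string word = s[6 .. $];           // no copy: word points into s's memory

    assert(word == "world");
    assert(word.ptr == s.ptr + 6);     // same buffer, offset by 6 bytes
    assert(word.length == 5);

    // A slice is exactly two machine words: a length and a pointer.
    static assert(string.sizeof == 2 * size_t.sizeof);

    writeln(word); // prints "world"
}
```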

> string lex_identifier(ref string input) {

And that makes this ref iffy. Since the slice is already passed as a ptr+length pair, you rarely need ref on it: only where you'd use a char** in C, that is, when you reassign the pointer itself and want the caller to see that change (e.g. you are appending to it). If you're just looking at the data, there's no need for ref.
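A minimal sketch of that difference (the function names are hypothetical): without ref, the callee only moves its own copy of the ptr+length pair; with ref, the caller sees the reassignment. It also shows why a literal can't be passed straight to a ref parameter in a unittest.

```d
// Hypothetical helpers: both drop the first character of their argument.
void skipOneByValue(string s) { s = s[1 .. $]; }   // reassigns a local copy only
void skipOneByRef(ref string s) { s = s[1 .. $]; } // reassigns the caller's slice

void main()
{
    string input = "abc";

    skipOneByValue(input);
    assert(input == "abc"); // unchanged: only the copy moved

    skipOneByRef(input);
    assert(input == "bc");  // the caller's slice was advanced

    // skipOneByRef("abc"); // does not compile: a literal is an rvalue and can't bind to ref
}
```

So when a function really does need to advance the caller's slice, a unittest has to pass a local variable rather than a literal.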

> In the body of the lex_identifier, i am using drop(). This doesn't seem to do what I thought it did. I want to create a slice from the beginning of a ref slice upto a given mark, and move the beginning point of that ref slice to that mark also.

I do it in two steps:

piece = input[0 .. mark]; // get piece out
input = input[mark .. $]; // advance the original slice


Note that such operations only adjust the pointer and length (advancing is just `ptr += mark; length -= mark;`), so they are very cheap.
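Applied to the original question, the two steps fit inside a lexer function roughly like this. It is a sketch, not code from the thread: the body of lex_identifier and the rule that identifiers are ASCII letters, digits and underscores are assumptions.

```d
import std.ascii : isAlphaNum;

// Sketch of lex_identifier: split a leading identifier off `input` and advance past it.
// Assumes identifiers are ASCII letters, digits and '_' (checked byte by byte).
string lex_identifier(ref string input)
{
    size_t mark = 0;
    while (mark < input.length && (isAlphaNum(input[mark]) || input[mark] == '_'))
        ++mark;

    string piece = input[0 .. mark]; // get piece out (no copy)
    input = input[mark .. $];        // advance the original slice
    return piece;
}

void main()
{
    string src = "foo42 + bar";            // a local variable, so it can bind to ref
    assert(lex_identifier(src) == "foo42");
    assert(src == " + bar");
}
```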

February 20, 2021
On Thursday, 18 February 2021 at 21:08:45 UTC, Adam D. Ruppe wrote:
> [...]
> I do it in two steps:
>
> piece = input[0 .. mark]; // get piece out
> input = input[mark .. $]; // advance the original slice
> [...]

Thank you. Is there a standard type to use for "mark"? Should it be size_t, or is a normal int suitable?
February 20, 2021
On Saturday, 20 February 2021 at 19:28:00 UTC, Mike Brown wrote:
> On Thursday, 18 February 2021 at 21:08:45 UTC, Adam D. Ruppe wrote:
>> [...]
>
> Thank you. Is there a standardised type to make "mark"? size_t or is a normal integer suitable?

Also, what's the recommended way to iterate over a slice using a mark? Can I get the current iteration point from a foreach loop?
February 20, 2021
On 2/20/21 2:31 PM, Mike Brown wrote:
> On Saturday, 20 February 2021 at 19:28:00 UTC, Mike Brown wrote:
>> On Thursday, 18 February 2021 at 21:08:45 UTC, Adam D. Ruppe wrote:
>>> [...]
>>
>> Thank you. Is there a standardised type to make "mark"? size_t or is a normal integer suitable?
> 
> Ah, and whats the recommended way to iterate over a slice using a mark? Can I get the current iteration point from a foreach loop?

ints work fine as slice endpoints; they are implicitly converted to size_t when used for slicing.
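On the foreach part of the question: foreach over a string can also yield the index (a size_t), which can be used directly as a mark. A small sketch; findSpace is a hypothetical helper.

```d
// Hypothetical helper: index of the first space, or input.length if there is none.
size_t findSpace(string input)
{
    foreach (i, char c; input) // i is the byte index of c, typed size_t
    {
        if (c == ' ')
            return i;
    }
    return input.length;
}

void main()
{
    string input = "foo bar";

    int mark = 3;                         // an int works as a slice endpoint
    assert(input[0 .. mark] == "foo");

    size_t m = findSpace(input);          // a mark taken from foreach
    assert(input[m + 1 .. $] == "bar");
}
```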

If you are going to keep a mark valid, you shouldn't slice away the input, because afterwards index 0 is the point where the mark was.
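A short illustration of that pitfall (the string and positions are arbitrary): after advancing the slice to the mark, index 0 is the mark's old position, and the stored mark itself no longer points at the same character.

```d
void main()
{
    string input = "let x";
    size_t mark = 4;            // position of 'x' in the original input
    assert(input[mark] == 'x');

    input = input[mark .. $];   // advance the slice to the mark
    assert(input[0] == 'x');    // index 0 is now where the mark was
    assert(input.length == 1);  // the stored mark (4) is now out of bounds
}
```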

Typically with slices you don't store a position (well, sometimes you do); you just divvy up the slice into the pieces you care about.
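For the lexer in the original question, divvying up the slice might look like the following sketch: each iteration carves a token off the front of input, so no separate position is stored. The name tokenize and the token rules (ASCII only, identifiers are a letter followed by letters/digits, whitespace skipped, every other byte a one-character token) are assumptions for illustration.

```d
import std.ascii : isAlpha, isAlphaNum, isWhite;

// Hypothetical mini-lexer under the ASCII assumptions described above.
string[] tokenize(string input)
{
    string[] tokens;
    while (input.length)
    {
        if (isWhite(input[0]))
        {
            input = input[1 .. $];        // skip whitespace
            continue;
        }
        size_t n = 1;                     // length of the token at the front
        if (isAlpha(input[0]))
        {
            while (n < input.length && isAlphaNum(input[n]))
                ++n;
        }
        tokens ~= input[0 .. n];          // the token is a slice of the source
        input = input[n .. $];            // advance; no separate position stored
    }
    return tokens;
}

void main()
{
    string src = "x = foo1;";
    auto toks = tokenize(src);
    assert(toks == ["x", "=", "foo1", ";"]);
    assert(toks[2].ptr == src.ptr + 4);   // "foo1" points into src, not a copy
}
```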

It all depends on what information is important.

-Steve