Thread overview
converting to/from char[]/string
Mar 05, 2020
mark
Mar 05, 2020
drug
Mar 05, 2020
mark
Mar 05, 2020
Dennis
Mar 05, 2020
mark
Mar 05, 2020
mark
Mar 05, 2020
Adam D. Ruppe
Mar 05, 2020
mark
March 05, 2020
I want to use the Porter stemming algorithm.
There's a D implementation here: https://tartarus.org/martin/PorterStemmer/d.txt

The main public function's signature is:

char[] stem(char[] p, int i, int j)

But I work entirely in terms of strings (containing individual words), so I want to add another function with this signature:

string stem(string word)

I've tried this without success:

    public string stem(string word) {
        import std.conv: to;

        char[] chars = word.to!char[];
        int end = chars.length.to!int;
        return stem(chars, 0, end).to!string;
    }

Here are just a few of the errors:

src/porterstemmer.d(197,13): Error: cannot implicitly convert expression s.length of type ulong to int
src/porterstemmer.d(222,9): Error: cannot implicitly convert expression cast(ulong)this.m_j + s.length of type ulong to int
src/porterstemmer.d(259,12): Error: function porterstemmer.PorterStemmer.ends(char[] s) is not callable using argument types (string)
src/porterstemmer.d(259,12):        cannot pass argument "sses" of type string to parameter char[] s
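
A note on the conversion itself: `!` without parentheses takes only the next single token as the template argument, so `word.to!char[]` is read as `word.to!(char)` followed by `[]`. Writing `word.to!(char[])` or simply `word.dup` gives a mutable copy. The following is a minimal sketch of the wrapper, not the tartarus.org code: the three-argument `stem` here is a hypothetical stand-in that only strips a trailing 's', so that the wrapper has something to call.

```d
import std.conv : to;

// Hypothetical stand-in with the same signature as the old stemmer's entry
// point. It only drops a trailing 's'; it is NOT the Porter algorithm.
char[] stem(char[] p, int i, int j)
{
    if (j - i > 1 && p[j - 1] == 's')
        --j;
    return p[i .. j];
}

// string -> char[] -> string wrapper.
string stem(string word)
{
    char[] chars = word.dup;          // mutable copy; word.to!(char[]) also works
    int end = chars.length.to!int;    // checked narrowing from size_t to int
    return stem(chars, 0, end).idup;  // immutable copy of the result slice
}

unittest
{
    assert(stem("cats") == "cat");
    assert(stem("dog") == "dog");
}
```

The `.dup` and `.idup` calls each copy the data, which keeps the caller's string untouched even if the stemmer writes into its buffer.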

March 05, 2020
On 3/5/20 2:03 PM, mark wrote:
> I want to use the Porter stemming algorithm.
> There's a D implementation here: https://tartarus.org/martin/PorterStemmer/d.txt
> 
> The main public function's signature is:
> 
> char[] stem(char[] p, int i, int j)
> 
> But I work entirely in terms of strings (containing individual words), so I want to add another function with this signature:
> 
> string stem(string word)
> 
> I've tried this without success:
> 
>      public string stem(string word) {
>          import std.conv: to;
> 
>          char[] chars = word.to!char[];
>          int end = chars.length.to!int;
>          return stem(chars, 0, end).to!string;
>      }
> 
> Here are just a few of the errors:
> 
> src/porterstemmer.d(197,13): Error: cannot implicitly convert expression s.length of type ulong to int
> src/porterstemmer.d(222,9): Error: cannot implicitly convert expression cast(ulong)this.m_j + s.length of type ulong to int
> src/porterstemmer.d(259,12): Error: function porterstemmer.PorterStemmer.ends(char[] s) is not callable using argument types (string)
> src/porterstemmer.d(259,12):        cannot pass argument "sses" of type string to parameter char[] s
> 
Your code and the errors don't seem to be related.
March 05, 2020
On Thursday, 5 March 2020 at 11:12:24 UTC, drug wrote:
> On 3/5/20 2:03 PM, mark wrote:
[snip]
> Your code and errors seem to be not related.

OK, it's probably because the D stemmer is 19 years old!

I've now got Martin Porter's own Java version, so I'll have a go at porting that to D myself.
March 05, 2020
On Thursday, 5 March 2020 at 11:31:43 UTC, mark wrote:
> I've now got Martin Porter's own Java version, so I'll have a go at porting that to D myself.

I don't think that's necessary, the errors seem easy to fix.

> src/porterstemmer.d(197,13): Error: cannot implicitly convert expression s.length of type ulong to int
> src/porterstemmer.d(222,9): Error: cannot implicitly convert expression cast(ulong)this.m_j + s.length of type ulong to int

These errors are probably because the code was only compiled on 32-bit targets where .length is of type `uint`, but you are compiling on 64-bit where .length is of type `ulong`.
A quick fix is to simply cast the result like `cast(int) s.length` and `cast(int) (this.m_j + s.length)`, though a proper fix would be to change the types of variables to `long`, `size_t`, `auto` or `const` (depending on which is most appropriate).
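
The options above can be sketched side by side. The function names are made up for illustration and do not come from the stemmer source:

```d
import std.conv : to;

// Quick fix: explicit cast. Compiles on 32- and 64-bit targets, but silently
// truncates if the length does not fit in an int.
int lengthAsInt(const(char)[] s)
{
    return cast(int) s.length;
}

// Checked alternative: std.conv.to throws ConvOverflowException on overflow.
int lengthChecked(const(char)[] s)
{
    return s.length.to!int;
}

// Keeping the native index type avoids the conversion altogether.
size_t lengthNative(const(char)[] s)
{
    return s.length;
}

unittest
{
    assert(lengthAsInt("sses") == 4);
    assert(lengthChecked("sses") == 4);
    assert(lengthNative("sses") == 4);
}
```

For single words the cast is harmless in practice; the checked form is just a cheap way to make an unexpected overflow visible.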

> src/porterstemmer.d(259,12): Error: function porterstemmer.PorterStemmer.ends(char[] s) is not callable using argument types (string)
> src/porterstemmer.d(259,12):        cannot pass argument "sses" of type string to parameter char[] s

These errors are because `string` is `immutable(char)[]`, meaning the characters may not be modified, while the function accepts a `char[]` which is allowed to mutate the characters.
I don't think the functions actually do that, so you can simply change `char[]` into `const(char)[]` so a string can be passed to those functions.
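
As a small illustration of why `const(char)[]` helps: both `char[]` and `string` convert implicitly to it. The helper below is hypothetical, written in the spirit of the stemmer's `ends()` but not copied from it:

```d
// Hypothetical read-only helper: it never writes to its arguments, so both
// parameters can be const(char)[].
bool hasSuffix(const(char)[] word, const(char)[] suffix)
{
    return word.length >= suffix.length
        && word[$ - suffix.length .. $] == suffix;
}

unittest
{
    char[] buf = "caresses".dup;        // mutable char[]
    string lit = "caresses";            // immutable(char)[]
    assert(hasSuffix(buf, "sses"));     // char[] converts to const(char)[]
    assert(hasSuffix(lit, "sses"));     // and so does string
    assert(!hasSuffix("cat", "sses"));
}
```

This only works for functions that really do not modify their input; anything that writes into the buffer still needs a mutable `char[]`.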
March 05, 2020
I changed int to size_t and used const(char[]) etc. as suggested.
It ran but crashed. Each crash was a range violation, so for each one I put in a guard so instead of

if ( ... m_b[m_k])

I used

if (m_k < m_b.length && ... m_b[m_k])

I did this kind of fix in three places.

The result is that it does some but not all of the stemming!

Anyway, I'll compare it with the Python version and see if I can spot the problem(s).

Thanks.
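
The thread does not establish what caused the range violations. One possibility, offered only as a guess, is that changing `int` to `size_t` made index variables unsigned: code that steps an index below zero, or tests it with `< 0`, then behaves differently. A minimal sketch of that effect (function names are made up for illustration):

```d
// With a signed index, stepping below zero gives -1 and a `< 0` test works.
bool signedStepIsNegative()
{
    int k = 0;
    k -= 1;
    return k < 0;
}

// With size_t the same step wraps around to size_t.max, so the value can
// never be negative and is no longer a valid index.
bool unsignedStepWraps()
{
    size_t k = 0;
    k -= 1;
    return k == size_t.max;
}

unittest
{
    assert(signedStepIsNegative());
    assert(unsignedStepWraps());
}
```

If that is what happened, a guard such as `m_k < m_b.length` would stop the crash but skip the branch, which would fit the "some but not all" result.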
March 05, 2020
I suspect the problem is using .length rather than some other size property.
March 05, 2020
On Thursday, 5 March 2020 at 11:03:30 UTC, mark wrote:
> I want to use the Porter stemming algorithm.
> There's a D implementation here: https://tartarus.org/martin/PorterStemmer/d.txt

I think I (or ketmar and I stole it from him) ported that very same file before:

https://github.com/adamdruppe/adrdox/blob/master/stemmer.d

By just adding `const` where appropriate it becomes compatible with string and you can slice to take care of the size thing.

https://github.com/adamdruppe/adrdox/blob/master/stemmer.d#L512

is that stem function as a const slice
March 05, 2020
On Thursday, 5 March 2020 at 13:31:14 UTC, Adam D. Ruppe wrote:
> On Thursday, 5 March 2020 at 11:03:30 UTC, mark wrote:
>> I want to use the Porter stemming algorithm.
>> There's a D implementation here: https://tartarus.org/martin/PorterStemmer/d.txt
>
> I think I (or ketmar and I stole it from him) ported that very same file before:
>
> https://github.com/adamdruppe/adrdox/blob/master/stemmer.d
>
> By just adding `const` where appropriate it becomes compatible with string and you can slice to take care of the size thing.
>
> https://github.com/adamdruppe/adrdox/blob/master/stemmer.d#L512
>
> is that stem function as a const slice

I thought the problem was using char[] rather than dchar[], but evidently not.

I downloaded yours and it "just works": I didn't have to change anything. (dscanner gives a couple of const/immutable hints which I'll fix, but still.)

It might be good to ask them to add yours to https://tartarus.org/martin/PorterStemmer/ since it works and the old one doesn't.

Thank you!