Thread overview
Port of Python's difflib.SequenceMatcher class
Dec 02, 2006  Michael Butscher
Dec 02, 2006  Walter Bright
Dec 04, 2006  Pragma
Dec 06, 2006  Michael Butscher
Dec 07, 2006  Bill Baxter
Dec 07, 2006  Oskar Linde
Dec 07, 2006  Oskar Linde
Dec 08, 2006  Bill Baxter
December 02, 2006
Hi,

a D port (version 0.175) of Python's difflib.SequenceMatcher class to generate diffs is available at

  http://www.mbutscher.de/snippets/difflib_d20061202.zip

It might need some cleaning up yet but the translated doctests pass (except one I couldn't make compile in D, but "in theory" it passes as well).
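
For readers who have not used the Python original, here is a minimal sketch of what difflib.SequenceMatcher produces. The helper name describe_edits and the sample strings are chosen only for illustration and are not part of the port.

```python
from difflib import SequenceMatcher


def describe_edits(a, b):
    """Return the opcodes that turn sequence a into sequence b."""
    # The first argument is the optional "junk" predicate; None disables it.
    matcher = SequenceMatcher(None, a, b)
    # Each opcode is a 5-tuple (tag, i1, i2, j1, j2) meaning:
    # a[i1:i2] is 'equal' to, replaced by, deleted before, or has
    # b[j1:j2] inserted at that position.
    return matcher.get_opcodes()


if __name__ == "__main__":
    a, b = "qabxcd", "abycdf"
    for tag, i1, i2, j1, j2 in describe_edits(a, b):
        print(tag, repr(a[i1:i2]), "->", repr(b[j1:j2]))
```

The opcodes are the 5-tuples that a port has to represent somehow, for example as a small struct.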

Comments, critique?



Michael
December 02, 2006
Michael Butscher wrote:
> a D port (version 0.175) of Python's difflib.SequenceMatcher class to generate diffs is available at
> 
>   http://www.mbutscher.de/snippets/difflib_d20061202.zip
> 
> It might need some cleaning up yet but the translated doctests pass (except one I couldn't make compile in D, but "in theory" it passes as well).
> 
> Comments, critique?

Yes: please put up a web page about it! See http://www.digitalmars.com/d/howto-promote.html
December 04, 2006
Michael Butscher wrote:
> Hi, 
> 
> a D port (version 0.175) of Python's difflib.SequenceMatcher class to generate diffs is available at
> 
>   http://www.mbutscher.de/snippets/difflib_d20061202.zip
> 
> It might need some cleaning up yet but the translated doctests pass (except one I couldn't make compile in D, but "in theory" it passes as well).
> 
> Comments, critique?

I agree with Walter that you should throw this up on a page somewhere. I'm curious, but rarely have time to sift through source code unless I'm in need of something specific - I develop using SVN 99% of the time, which does .diff output for me already.

But I *am* curious about how the porting went, what the pitfalls were, and how you worked around Python idioms and tuple types.  Also, I'm wondering if the D version brings any extra perks like better performance, or less/clearer code?

-- 
- EricAnderton at yahoo
December 06, 2006
Pragma wrote:
> Michael Butscher wrote:
> > Hi,
> > 
> > a D port (version 0.175) of Python's difflib.SequenceMatcher class to generate diffs is available at
> > 
> >   http://www.mbutscher.de/snippets/difflib_d20061202.zip
> > 
> > It might need some cleaning up yet but the translated doctests pass (except one I couldn't make compile in D, but "in theory" it passes as well).
> > 
> > Comments, critique?
> 
> I agree with Walter that you should throw this up on a page somewhere.

At least I have mentioned it on the page

  http://www.mbutscher.de/software.html

as a "snippet" (it isn't much more, I think).



> I'm curious, but rarely have time to sift through source code unless I'm in need of something specific - I develop using SVN 99% of the time, which does .diff output for me already.

I will need it later for a project written in Python (a kind of personal wiki without a server) so that it can store different versions of a wiki page.

When the time comes, I will add a small C interface for a DLL whose main job is to create some sort of binary diff of two arbitrary byte blocks and to apply that diff to the first block to recreate the second.
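
On the Python side, the binary-diff idea could look roughly like the sketch below. It only illustrates the concept and is not the planned C interface: make_delta, apply_delta and the tuple-based delta format are invented for this example.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Build a list of instructions that rebuilds `new` from `old`."""
    # autojunk=False turns off the popularity heuristic, which is
    # usually unwanted for arbitrary binary data.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged range: store only the offsets into `old`.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # 'replace' or 'insert': store the new bytes literally.
            delta.append(("data", new[j1:j2]))
        # 'delete' needs no instruction: the bytes are simply not copied.
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Apply a delta produced by make_delta to `old`."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Storing only "copy" offsets for unchanged ranges is what makes the delta smaller than the second block when the two versions are similar.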


> But I *am* curious about how the porting went, what the pitfalls were, and how you worked around Python idioms and tuple types.

- The frequently used "self" was simply translated to "this", so the code looks a bit odd in D, e.g.:


    void set_seq2(ST b)
    {
        if (b is this.b)
            return;
        this.b = b;
        this.matching_blocks = null;
        this.opcodes = null;
        this.fullbcount = null;
        this.chain_b();
    }


- One thing I really missed in D was the get() method for Python dictionaries with a default argument. Therefore I created inner functions like

        IndexType j2lenget(IndexType i, IndexType def)
        {
            IndexType* result = i in j2len;
            if (result)
                return *result;
            else
                return def;
        }

Probably this can be done more elegantly, but I personally think that get() should be a standard method of AAs.



- The class used only two types of tuples which had clear purposes, so they were translated into structs without much harm.
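
To show why get() with a default matters for the j2len lookup mentioned above, here is a simplified Python sketch of a longest-common-run search built around that idiom. It is written for illustration, leaves out the junk handling of the real class, and the function name longest_common_run is an assumption, not something taken from difflib or from the port.

```python
def longest_common_run(a, b):
    """Return (i, j, k) with a[i:i+k] == b[j:j+k] and k maximal."""
    # Map each element of b to the list of positions where it occurs.
    b2j = {}
    for j, elt in enumerate(b):
        b2j.setdefault(elt, []).append(j)

    best_i, best_j, best_size = 0, 0, 0
    # j2len[j] = length of the common run ending at a[i-1] and b[j].
    j2len = {}
    for i, elt in enumerate(a):
        new_j2len = {}
        for j in b2j.get(elt, ()):
            # get() with a default: a missing key means "no run so far".
            k = new_j2len[j] = j2len.get(j - 1, 0) + 1
            if k > best_size:
                best_i, best_j, best_size = i - k + 1, j - k + 1, k
        j2len = new_j2len
    return best_i, best_j, best_size
```

Without a default-returning lookup, each j2len.get(j - 1, 0) would need an explicit membership test, which is what the D helper function above spells out by hand.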



> Also, I'm wondering if the D version brings any extra perks like better performance, or less/clearer code?

I have not yet done any benchmarks, but I just assume that D is much faster.


The D code is a bit longer and IMHO a bit less readable than Python, but I'm much more used to Python than D.


Michael
December 07, 2006
Michael Butscher wrote:

> - One thing I really missed in D was the get() method for Python dictionaries with a default argument. Therefore I created inner functions like
> 
>         IndexType j2lenget(IndexType i, IndexType def)
>         {
>             IndexType* result = i in j2len;
>             if (result)
>                 return *result;
>             else
>                 return def;
>         }
> 
> Probably this can be done more elegantly, but I personally think that
> get() should be a standard method of AAs.

+1.  Me too.

If IFTI were smarter, something like this would do the trick:

V get(V,K)(V[K] dict, K key, V def = V.init)
{
    V* ptr = key in dict;
    return ptr? *ptr: def;
}

The property trick works for AAs too, so taking one concrete instance of that:

char[] get(char[][int] dict, int key, char[] def = null)
{
    char[]* ptr = key in dict;
    return ptr? *ptr: def;
}

you can do:

    char[][int] i2s;
    i2s[1] = "Hello";
    i2s[5] = "There";

    writefln( i2s.get(1, "yeh") );
    writefln( i2s.get(2, "default") );
    writefln( i2s.get(1) );
    writefln( i2s.get(2) );

Too bad the template version doesn't work.
D doesn't seem to be able to pick out the V and K from an associative array argument.

--bb
December 07, 2006
Bill Baxter wrote:
> Michael Butscher wrote:
> 
>> - One thing I really missed in D was the get() method for Python dictionaries with a default argument. Therefore I created inner functions like
>>
>>         IndexType j2lenget(IndexType i, IndexType def)
>>         {
>>             IndexType* result = i in j2len;
>>             if (result)
>>                 return *result;
>>             else
>>                 return def;
>>         }
>>
>> Probably this can be done more elegantly, but I personally think that
>> get() should be a standard method of AAs.
> 
> +1.  Me too.
> 
> If IFTI were smarter, something like this would do the trick:
> 
> V get(V,K)(V[K] dict, K key, V def = V.init)
> {
>     V* ptr = key in dict;
>     return ptr? *ptr: def;
> }

And what compiler do you use? The above code works perfectly. :)

The following two get functions have been part of my own standard imports for quite a while and I find them very handy.

T get(T,U)(T[U] aa, U key) {
        T* ptr = key in aa;
        return ptr ? *ptr : T.init;
}

bool get(T,U,int dummy=1)(T[U] aa, U key, out T val) {
        T* ptr = key in aa;
        if (!ptr)
                return false;
        val = *ptr;
        return true;
}

/Oskar
December 07, 2006
Bill Baxter wrote:

> V get(V,K)(V[K] dict, K key, V def = V.init)
> {
>     V* ptr = key in dict;
>     return ptr? *ptr: def;
> }
> 
[snip]
>     char[][int] i2s;
>     i2s[1] = "Hello";
>     i2s[5] = "There";
> 
>     writefln( i2s.get(1, "yeh") );
>     writefln( i2s.get(2, "default") );
>     writefln( i2s.get(1) );
>     writefln( i2s.get(2) );
> 
> Too bad the template version doesn't work.
> D doesn't seem to be able to pick out the V and K from an associative array argument.

Sorry, I missed this part. The compiler is confused by not being able to tell if V should be char[] or char[3].

writefln( i2s.get(1, "yeh"[]) );
writefln( i2s.get(2, "default"[]) );

both work. So you are right. IFTI could perhaps be improved by figuring out that both V argument types are implicitly convertible to the same type.

/Oskar
December 08, 2006
Oskar Linde wrote:
> Bill Baxter wrote:
> 
>> V get(V,K)(V[K] dict, K key, V def = V.init)
>> {
>>     V* ptr = key in dict;
>>     return ptr? *ptr: def;
>> }
>>
> [snip]
>>     char[][int] i2s;
>>     i2s[1] = "Hello";
>>     i2s[5] = "There";
>>
>>     writefln( i2s.get(1, "yeh") );
>>     writefln( i2s.get(2, "default") );
>>     writefln( i2s.get(1) );
>>     writefln( i2s.get(2) );
>>
>> Too bad the template version doesn't work.
>> D doesn't seem to be able to pick out the V and K from an associative array argument.
> 
> Sorry, I missed this part. The compiler is confused by not being able to tell if V should be char[] or char[3].
> 
> writefln( i2s.get(1, "yeh"[]) );
> writefln( i2s.get(2, "default"[]) );
> 
> both work. So you are right. IFTI could perhaps be improved by figuring out that both V argument types are implicitly convertible to the same type.
> 
> /Oskar

Oh, ok.  So I was right, but for the wrong reason.  :-)  The compiler message wasn't very specific about what it didn't like, just "no match" was all it was willing to divulge.

These char[] vs. char[N] conversion issues are rather annoying.


--bb