Port of Python's difflib.SequenceMatcher class

Dec 02, 2006

Michael Butscher

Dec 02, 2006

Walter Bright

Dec 04, 2006

Dec 06, 2006

Dec 07, 2006

Dec 07, 2006

Dec 07, 2006

Dec 08, 2006

Hi,

a D port (version 0.175) of Python's difflib.SequenceMatcher class to generate diffs is available at

  http://www.mbutscher.de/snippets/difflib_d20061202.zip

It might need some cleaning up yet, but the translated doctests pass (except one I couldn't make compile in D, but "in theory" it passes as well).

Comments, critique?

Michael

Michael Butscher wrote:
> a D port (version 0.175) of Python's difflib.SequenceMatcher class to generate diff's is available at
>
>   http://www.mbutscher.de/snippets/difflib_d20061202.zip
>
> It might need some cleaning up yet but the translated doctests pass (except one I couldn't make compile in D, but "in theory" it passes as well).
>
> Comments, critique?

Yes: please put up a web page about it! See http://www.digitalmars.com/d/howto-promote.html

Michael Butscher wrote:
> Hi,
>
> a D port (version 0.175) of Python's difflib.SequenceMatcher class to generate diff's is available at
>
>   http://www.mbutscher.de/snippets/difflib_d20061202.zip
>
> It might need some cleaning up yet but the translated doctests pass (except one I couldn't make compile in D, but "in theory" it passes as well).
>
> Comments, critique?

I agree with Walter that you should throw this up on a page somewhere.

I'm curious, but rarely have time to sift through source code unless I'm in need of something specific - I develop using SVN 99% of the time, which does .diff output for me already.

But I *am* curious about how the porting went, what the pitfalls were, and how you worked around Python idioms and tuple types.

Also, I'm wondering if the D version brings any extra perks like better performance, or less/clearer code?

--
- EricAnderton at yahoo

December 06, 2006

Re: Port of Python's difflib.SequenceMatcher class

Posted by Michael Butscher
in reply to Pragma


Pragma wrote:
> Michael Butscher wrote:
> > Hi,
> > 
> > a D port (version 0.175) of Python's difflib.SequenceMatcher class to generate diff's is available at
> > 
> >   http://www.mbutscher.de/snippets/difflib_d20061202.zip
> > 
> > It might need some cleaning up yet but the translated doctests pass (except one I couldn't make compile in D, but "in theory" it passes as well).
> > 
> > Comments, critique?
> 
> I agree with Walter that you should throw this up on a page somewhere.

At least I have mentioned it on the page

  http://www.mbutscher.de/software.html

as a "snippet" (it isn't much more, I think).



> I'm curious, but rarely have time to sift through sourcecode unless I'm in need of something specific - I develop using SVN 99% of the time, which does .diff output for me already.

I will need it later for a project written in Python (a kind of personal wiki without a server) so that it can store different versions of a wiki page.

When the time comes, I will add a small C interface for a DLL whose main job is to create some sort of binary diff of two arbitrary byte blocks and to apply that diff to the first block to recreate the second.
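
The idea can be sketched in Python with the original difflib class. This is only an illustration under stated assumptions, not the planned DLL interface: the names make_delta and apply_delta and the tuple-based delta format are made up for this example.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal byte runs (hypothetical format)."""
    delta = []
    # autojunk=False keeps the matcher from treating frequent bytes as junk.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged range: only the offsets into `old` are stored.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": store the new bytes literally.
            delta.append(("data", new[j1:j2]))
        # "delete" contributes nothing to the output, so it is skipped.
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the second block from the first block and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old = b"The quick brown fox"
new = b"The quick red fox jumps"
assert apply_delta(old, make_delta(old, new)) == new
```

Only the "copy" offsets and the literal new bytes are kept, so the delta stays small when the two blocks share long runs.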


> But I *am* curious about how the porting went, what the pitfalls were, and how you worked around Python idioms and tuple types.

- The often-used "self" was simply translated to "this", so the code looks a bit weird in D, e.g.:


    void set_seq2(ST b)
    {
        if (b is this.b)
            return;
        this.b = b;
        this.matching_blocks = null;
        this.opcodes = null;
        this.fullbcount = null;
        this.chain_b();
    }


- One thing I really missed in D was the get() method for Python dictionaries with a default argument. Therefore I created inner functions like

        IndexType j2lenget(IndexType i, IndexType def)
        {
            IndexType* result = i in j2len;
            if (result)
                return *result;
            else
                return def;
        }

Probably this can be done more elegantly, but I personally think that get() should be a standard method of AAs.
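
For comparison, the Python idiom being emulated looks roughly like this (a small sketch; the dictionary contents are invented for illustration, and the hand-written function only mirrors the D helper above):

```python
# Hypothetical contents; in the matcher this maps an index to a match length.
j2len = {3: 2}

# dict.get returns the stored value if the key exists, otherwise the default.
assert j2len.get(3, 0) == 2
assert j2len.get(7, 0) == 0


# What the D inner function spells out by hand, written in Python.
def j2lenget(i, default):
    if i in j2len:
        return j2len[i]
    return default


assert j2lenget(3, 0) == j2len.get(3, 0)
assert j2lenget(7, 0) == j2len.get(7, 0)
```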



- The class used only two types of tuples which had clear purposes, so they were translated into structs without much harm.



> Also, I'm wondering if the D version brings any extra perks like better performance, or less/clearer code?

I have not yet done any benchmarks, but I just assume that D is much faster.


The D code is a bit longer and IMHO a bit less readable than Python, but I'm much more used to Python than D.


Michael

Michael Butscher wrote:
> - One thing I really missed in D was the get() method for Python dictionaries with a default argument. Therefore I created inner functions like
>
>         IndexType j2lenget(IndexType i, IndexType def)
>         {
>             IndexType* result = i in j2len;
>             if (result)
>                 return *result;
>             else
>                 return def;
>         }
>
> Probably this can be done more elegantly, but I personally think that get() should be a standard method of AAs.

+1. Me too.

If IFTI were smarter, something like this would do the trick:

    V get(V,K)(V[K] dict, K key, V def = V.init)
    {
        V* ptr = key in dict;
        return ptr? *ptr: def;
    }

The property trick works for AAs too, so taking one concrete instance of that:

    char[] get(char[][int] dict, int key, char[] def = null)
    {
        char[]* ptr = key in dict;
        return ptr? *ptr: def;
    }

you can do:

    char[][int] i2s;
    i2s[1] = "Hello";
    i2s[5] = "There";

    writefln( i2s.get(1, "yeh") );
    writefln( i2s.get(2, "default") );
    writefln( i2s.get(1) );
    writefln( i2s.get(2) );

Too bad the template version doesn't work.
D doesn't seem to be able to pick out the V and K from an associative array argument.

--bb

Bill Baxter wrote:
> Michael Butscher wrote:
>
>> - One thing I really missed in D was the get() method for Python dictionaries with a default argument. Therefore I created inner functions like
>>
>>         IndexType j2lenget(IndexType i, IndexType def)
>>         {
>>             IndexType* result = i in j2len;
>>             if (result)
>>                 return *result;
>>             else
>>                 return def;
>>         }
>>
>> Probably this can be done more elegantly, but I personally think that get() should be a standard method of AAs.
>
> +1. Me too.
>
> If IFTI were smarter, something like this would do the trick:
>
>     V get(V,K)(V[K] dict, K key, V def = V.init)
>     {
>         V* ptr = key in dict;
>         return ptr? *ptr: def;
>     }

And what compiler do you use? The above code works perfectly. :)

The following two get functions have been part of my own standard imports for quite a while and I find them very handy.

    T get(T,U)(T[U] aa, U key)
    {
        T* ptr = key in aa;
        return ptr ? *ptr : T.init;
    }

    bool get(T,U,int dummy=1)(T[U] aa, U key, out T val)
    {
        T* ptr = key in aa;
        if (!ptr)
            return false;
        val = *ptr;
        return true;
    }

/Oskar

Bill Baxter wrote:
> V get(V,K)(V[K] dict, K key, V def = V.init)
> {
>     V* ptr = key in dict;
>     return ptr? *ptr: def;
> }
>
[snip]
> char[][int] i2s;
> i2s[1] = "Hello";
> i2s[5] = "There";
>
> writefln( i2s.get(1, "yeh") );
> writefln( i2s.get(2, "default") );
> writefln( i2s.get(1) );
> writefln( i2s.get(2) );
>
> Too bad the template version doesn't work.
> D doesn't seem to be able to pick out the V and K from an associative array argument.

Sorry, I missed this part. The compiler is confused by not being able to tell if V should be char[] or char[3].

    writefln( i2s.get(1, "yeh"[]) );
    writefln( i2s.get(2, "default"[]) );

both work. So you are right. The IFTI could perhaps be improved by figuring out that both V argument types are implicitly convertible to the same type.

/Oskar

Oskar Linde wrote:
> Bill Baxter wrote:
>
>> V get(V,K)(V[K] dict, K key, V def = V.init)
>> {
>>     V* ptr = key in dict;
>>     return ptr? *ptr: def;
>> }
>>
> [snip]
>> char[][int] i2s;
>> i2s[1] = "Hello";
>> i2s[5] = "There";
>>
>> writefln( i2s.get(1, "yeh") );
>> writefln( i2s.get(2, "default") );
>> writefln( i2s.get(1) );
>> writefln( i2s.get(2) );
>>
>> Too bad the template version doesn't work.
>> D doesn't seem to be able to pick out the V and K from an associative array argument.
>
> Sorry, i missed this part. The compiler is confused by not being able to tell if V should be char[] or char[3].
>
>     writefln( i2s.get(1, "yeh"[]) );
>     writefln( i2s.get(2, "default"[]) );
>
> both works. So you are right. The IFTI could perhaps be improved by figuring out that both V argument types are implicitly convertible to the same type.
>
> /Oskar

Oh, ok. So I was right, but for the wrong reason. :-)

The compiler message wasn't very specific about what it didn't like, just "no match" was all it was willing to divulge.

These char[] char[N] conversion issues are rather annoying.

--bb
