D source code revision system idea
August 19, 2004
I'm not sure if this is the right place to throw up an idea like this, but there seem to be an astonishing number of competent developers here to offer insightful feedback, so I'll go ahead and toss it up ;).  Feel free to respond and bounce ideas back off me!

What would you think of a source code revision system that works not on line-by-line code differences, but rather on semantic differences?  This would be mainly targeted at D source code, since it lends itself well to this type of revision system.  The lack of a pre-processor combined with the concept of modules makes this language an ideal target.

Pros:
1)  ***More robust patching ability***  (Not line-based, so no "fuzz" needed)
2)  Easy merging of codebases (trunks)
3)  Easy conflict detection during merges (check function call parameters, etc.)
4)  Could spot possible compile errors
5)  Code can be regenerated to conform to a formatting standard
6)  Accepts only correct code (possible con...)

Cons:
1)  Maintaining comments and their positions in the code becomes difficult, since they are not compilable elements
2)  Somewhat difficult implementation

A new patch/diff toolset would need to be created to accommodate this new semantic revision control system as well.

Please, let me know what you think!

James Dunne
August 19, 2004
Not a bad idea.  Would this be a stand-alone project, or something added to an existing product, like Subversion or CVS?

The only thing that comes to mind is: how would you even attempt to define semantic merging and versioning in any language?  Are you talking about making sure that merged sources compile okay, or is it something deeper than a unittest?

- Pragma


August 19, 2004
Let's see...

Upon first design, this could just be a simple stand-alone project implemented for the D language, consisting of a defined patch-format and a patch/diff-like toolset.  After all, we've got the front-end source to D already!  That could *possibly* make this simpler to implement, as it contains all the data structures necessary to parse, analyze, and possibly re-create the code with.

How I see the "diff" tool working:
1)  Lex & parse the source files
2)  Create semantic tree representation of the original & new code
3)  Compare new code's semantic tree with original code's semantic tree
4)  Output a series of simple, defined operations to transform the original code's semantic tree into the new code's semantic tree.
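
As a rough illustration of the four diff steps above, here is a minimal sketch. It rests on several assumptions: it uses Python and Python's standard `ast` module as a stand-in for the D front-end, it only compares top-level definitions, and the operation names (`add`, `remove`, `replace`) and the `tree_diff` helper are hypothetical, not a format proposed in this thread.

```python
import ast


def top_level_defs(source):
    """Steps 1-2: parse the source and index top-level definitions by name."""
    tree = ast.parse(source)
    return {
        node.name: node
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.ClassDef))
    }


def tree_diff(old_source, new_source):
    """Steps 3-4: compare the two trees and emit simple operations.

    ast.dump() ignores line numbers by default, so changes that only
    affect whitespace or layout produce no operation at all.
    ast.unparse() needs Python 3.9 or later.
    """
    old = top_level_defs(old_source)
    new = top_level_defs(new_source)
    ops = []
    for name in sorted(old.keys() - new.keys()):
        ops.append(("remove", name))
    for name in sorted(new.keys() - old.keys()):
        ops.append(("add", name, ast.unparse(new[name])))
    for name in sorted(old.keys() & new.keys()):
        if ast.dump(old[name]) != ast.dump(new[name]):
            ops.append(("replace", name, ast.unparse(new[name])))
    return ops


if __name__ == "__main__":
    before = "def f(x):\n    return x+1\n\ndef g():\n    pass\n"
    after = "def f(x):\n\n    return x + 1\n\ndef h():\n    return 2\n"
    for op in tree_diff(before, after):
        print(op[0], op[1])
```

In this sketch `f` is reformatted but structurally unchanged, so no operation is emitted for it; only the removal of `g` and the addition of `h` are reported. A real tool would presumably descend into function bodies instead of replacing whole definitions.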

And the "patch" tool would do basically the inverse of the diff tool:
1)  Lex & parse the target source file
2)  Create semantic tree representation of the target source code
3)  Apply defined operations on the semantic tree
4)  Rebuild the target code from the modified semantic tree, possibly conforming to a given formatting standard, or using hints provided by the diff tool to recreate the formatting of the original file.
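
The patch steps above could look roughly like the following. Again this is only a sketch under assumptions: Python's `ast` module stands in for the D front-end, and the operation tuples (`add`, `remove`, `replace`) and the helpers `find_index` and `apply_ops` are hypothetical names chosen for illustration.

```python
import ast


def find_index(tree, name):
    """Return the position of the top-level definition called `name`, or None."""
    for i, node in enumerate(tree.body):
        if getattr(node, "name", None) == name:
            return i
    return None


def apply_ops(source, ops):
    """Parse the target, apply each operation to the tree, regenerate code."""
    tree = ast.parse(source)                      # steps 1-2: lex, parse, build tree
    for kind, name, *payload in ops:              # step 3: apply the operations
        index = find_index(tree, name)
        if kind == "add":
            if index is not None:
                raise ValueError(f"conflict: {name} already exists")
            tree.body.extend(ast.parse(payload[0]).body)
        elif kind in ("remove", "replace"):
            if index is None:
                raise ValueError(f"conflict: {name} not found")
            replacement = ast.parse(payload[0]).body if kind == "replace" else []
            tree.body[index:index + 1] = replacement
        else:
            raise ValueError(f"unknown operation: {kind}")
    # Step 4: rebuild the source from the tree (Python 3.9+). The output
    # follows the unparser's formatting, and comments are lost.
    return ast.unparse(tree)


if __name__ == "__main__":
    # A whole module built from "add" operations on an empty tree.
    print(apply_ops("", [("add", "h", "def h():\n    return 2")]))
```

Two things fall out of the sketch. A patch whose target definition is missing fails loudly rather than being applied with fuzz, and regenerating code from the tree drops comments, which is the first con listed in the original post.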

This type of patch/diff toolset could handle the creation of an entire module, simply defined by "create" operations on an "empty" semantic tree.

Let me know what you all think of this.  Thanks for your input, Pragma!


August 19, 2004
Jaymz wrote:
> Let's see...
> 
> Upon first design, this could just be a simple stand-alone project implemented
> for the D language, consisting of a defined patch-format and a patch/diff-like
> toolset.  After all, we've got the front-end source to D already!  That could
> *possibly* make this simpler to implement, as it contains all the data
> structures necessary to parse, analyze, and possibly re-create the code with.
> 
> How I see the "diff" tool working:
> 1)  Lex & parse the source files
> 2)  Create semantic tree representation of the original & new code
> 3)  Compare new code's semantic tree with original code's semantic tree
> 4)  Output a series of simple, defined operations to transform the original
> code's semantic tree into the new code's semantic tree.
> 
> And the "patch" tool would do basically the inverse of the diff tool:
> 1)  Lex & parse the target source file
> 2)  Create semantic tree representation of the target source code
> 3)  Apply defined operations on the semantic tree
> 4)  Rebuild the target code from the modified semantic tree, possibly conforming
> to a given formatting standard, or using hints provided by the diff tool to
> recreate the formatting of the original file.
> 
> This type of patch/diff toolset could handle the creation of an entire module,
> simply defined by "create" operations on an "empty" semantic tree.
> 
> Let me know what you all think of this.  Thanks for your input, Pragma!
> 

If you start by getting the diff/patch utilities working properly, with
a format compatible with the unix diff/patch utilities, then you could
specify it as the diff/patch util for the CVS or SVN repos.  That would
be the only real level of integration you need.

I will say this: (ir)Rational ClearCase tries to use this technique as
much as possible with abysmal results.  From what I understand, the
PowerBuilder integration works decently, but the XML diff tool is worse
than their line diff tool (which still randomizes things).

If you get the diff/patch utility right, I will be very impressed.  Just
be careful to focus only on diff/patch and not try to have a tool that
does a whole bunch of stuff.  KISS
August 19, 2004
In article <cg2rcg$290b$1@digitaldaemon.com>, Berin Loritsch says...
>
>Jaymz wrote:
>> Let's see...
>> 
>> Upon first design, this could just be a simple stand-alone project implemented for the D language, consisting of a defined patch-format and a patch/diff-like toolset.  After all, we've got the front-end source to D already!  That could *possibly* make this simpler to implement, as it contains all the data structures necessary to parse, analyze, and possibly re-create the code with.
>> 
>> How I see the "diff" tool working:
>> 1)  Lex & parse the source files
>> 2)  Create semantic tree representation of the original & new code
>> 3)  Compare new code's semantic tree with original code's semantic tree
>> 4)  Output a series of simple, defined operations to transform the original
>> code's semantic tree into the new code's semantic tree.
>> 
>> And the "patch" tool would do basically the inverse of the diff tool:
>> 1)  Lex & parse the target source file
>> 2)  Create semantic tree representation of the target source code
>> 3)  Apply defined operations on the semantic tree
>> 4)  Rebuild the target code from the modified semantic tree, possibly conforming
>> to a given formatting standard, or using hints provided by the diff tool to
>> recreate the formatting of the original file.
>> 
>> This type of patch/diff toolset could handle the creation of an entire module, simply defined by "create" operations on an "empty" semantic tree.
>> 
>> Let me know what you all think of this.  Thanks for your input, Pragma!
>> 
>
>If you start by getting the diff/patch utilities working properly, with a format compatible with the unix diff/patch utilities, then you could specify it as the diff/patch util for the CVS or SVN repos.  That would be the only real level of integration you need.
>
>I will say this: (ir)Rational ClearCase tries to use this technique as much as possible with abysmal results.  From what I understand, the PowerBuilder integration works decently, but the XML diff tool is worse than their line diff tool (which still randomizes things).
>
>If you get the diff/patch utility right, I will be very impressed.  Just be careful to focus only on diff/patch and not try to have a tool that does a whole bunch of stuff.  KISS

Unfortunately, I don't see how a semantic tree-based modification scheme could produce a format compatible with the unix diff/patch utilities, which are line-based.  The format would have to be entirely different.  I could, however, make my toolset support the command-line arguments of the original diff/patch utilities, ignoring those that no longer make sense, which would be the best way to go.

This would not be a necessarily bad thing for SVN use ... if you make the decision to use my diff/patch utilities from the start, as the new patch format wouldn't be compatible with the unix diff/patch utilities patch format.  SVN really doesn't care what the diff/patch format that it stores in its database is, AFAIK.  It simply relies on correct operation from diff/patch to do its work.

And sorry, I haven't used any of the products you mentioned: ClearCase or PowerBuilder.  Could you post an example of "abysmal results" so we can see what NOT to produce?  :-)  I do like to develop tools that produce quality results -- this is probably due to my delusion that I have unlimited project development time, and that a project is never quite "done" ;).

BTW, what's w/ the KISS?  Thanks for your comments!

James Dunne
August 19, 2004
If it is defined over a tree, i imagine it fairly unstable. Not that it couldn't be done, but i'm somewhat sceptical, also considering the extensible syntax which might come in 2.x. Are there any good tree diff/merge tools already? Any open-source ones? If there are such tools for XML, one could define some mapping between D and XML.

If you define it over a stream of lexemes, it will be wonderfully robust, but i don't imagine it being too useful. It will at most take care of formatting issues (different contributors prefer different formatting), but projects now use some kind of an auto-formatter with certain settings, which also provides an (admittedly much cruder) solution.
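
To make the lexeme-stream variant concrete, here is a minimal sketch, written in Python for brevity. The regular-expression lexer is a toy assumption, not D's real lexer (it does not handle comments, string literals, or multi-character operators), and `lexemes` and `lexeme_diff` are hypothetical helper names. The comparison itself uses `difflib.SequenceMatcher` from the standard library.

```python
import difflib
import re

# Toy lexer: identifiers/keywords, integer literals, or any single
# non-whitespace character. Whitespace is discarded entirely.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lexemes(source):
    """Split C-like source text into a flat list of lexemes."""
    return TOKEN.findall(source)


def lexeme_diff(old_source, new_source):
    """Return the non-equal opcodes between the two lexeme streams."""
    a, b = lexemes(old_source), lexemes(new_source)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    compact = "int add(int a,int b){return a+b;}"
    spaced = "int add(int a, int b)\n{\n    return a + b;\n}"
    print(lexeme_diff(compact, spaced))   # formatting only
    print(lexeme_diff(compact, "int add(int a, int b) { return a - b; }"))
```

Because whitespace never reaches the comparison, two differently formatted copies of the same function compare as identical, while a changed operator shows up as a one-lexeme replacement. That matches the point made above: robust against formatting, but it knows nothing about structure.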

What i would think of being more valuable for now, would be a documentation system and code formatter written completely in D.

-eye
August 19, 2004
In article <cg2qa4$28dh$1@digitaldaemon.com>, Jaymz says...
>
>Let's see...
>
>Upon first design, this could just be a simple stand-alone project implemented for the D language, consisting of a defined patch-format and a patch/diff-like toolset.  After all, we've got the front-end source to D already!  That could *possibly* make this simpler to implement, as it contains all the data structures necessary to parse, analyze, and possibly re-create the code with.
>
>How I see the "diff" tool working:
>1)  Lex & parse the source files
>2)  Create semantic tree representation of the original & new code
>3)  Compare new code's semantic tree with original code's semantic tree
>4)  Output a series of simple, defined operations to transform the original
>code's semantic tree into the new code's semantic tree.
>
>And the "patch" tool would do basically the inverse of the diff tool:
>1)  Lex & parse the target source file
>2)  Create semantic tree representation of the target source code
>3)  Apply defined operations on the semantic tree
>4)  Rebuild the target code from the modified semantic tree, possibly conforming
>to a given formatting standard, or using hints provided by the diff tool to
>recreate the formatting of the original file.
>
>This type of patch/diff toolset could handle the creation of an entire module, simply defined by "create" operations on an "empty" semantic tree.
>
>Let me know what you all think of this.  Thanks for your input, Pragma!


I can see the merit in a stand-alone server, but this may be the wrong way to start out.  Honestly, I think an add-on module to an existing source control tool might prove much more useful than an outright replacement.  Take dsource.org for example: an entire website dedicated to D programming that is backed on Subversion.  IMO an extension to Subversion would be far more useful (and easier to implement) to the D community as a whole.

All the same, please look at Mango over on dsource if you're going to write a stand-alone server.  The I/O and socket portions of that library may give you a good head-start.

That aside, I like where you're going with this, especially with 'creating a semantic tree' of the code.  I gather this would be some form of pseudocode or XML?  I can see this becoming useful for increasing performance if you keep the current semantic tree version on hand at all times.  That way one can compare the tree against their own local source to make sure they're not altering other portions of the application too badly (i.e. trying not to violate contracts across a whole project)

Another thing, a lot of the spirit of what you're proposing here is captured in D's in/out/body and unittest contracting system.  Have you considered incorporating these statements in particular to deepen the semantic meaning of code when you assess it?  :)

- Pragma


August 19, 2004
In article <cg2tk8$2amn$1@digitaldaemon.com>, Ilya Minkov says...
>
>If it is defined over a tree, i imagine it fairly unstable. Not that it couldn't be done, but i'm somewhat sceptical, also considering the extensible syntax which might come in 2.x. Are there any good tree diff/merge tools already? Any open-source ones? If there are such tools for XML, one could define some mapping between D and XML.
>
>If you define it over a stream of lexemes, it will be wonderfully robust, but i don't imagine it being too useful. It will at most take care of formatting issues (different contributors prefer different formatting), but projects now use some kind of an auto-formatter with certain settings, which also provides an (admittedly much cruder) solution.
>
>What i would think of being more valuable for now, would be a documentation system and code formatter written completely in D.
>
>-eye


Well, it would have to be defined with something a bit more complex than just a tree structure.  A tree-based structure, like a DOM, would be ideal.  I don't see how that'd be unstable.  It should be defined over a stream of lexemes, of course.  That's what the DOM will hold.

I'm not too keen on having this be another implementation of a source code re-formatter.  It's merely just a different way of patching source code using the assumption that we're reading SOURCE CODE, not just arbitrary lines of text. The code re-formatting comes out of the need to reproduce the code from the DOM.

A documentation system for D written entirely in D?  Just a few simple changes to Doxygen it sounds like, minus the initial work of porting to D ;).

This could be a whole different pile of monkeys if class meta-data support was in D *WINK WINK*.  I saw a few threads of discussion on meta-data, but it didn't seem  to end up anywhere.  Gr.  I don't see what the big issue is, the symbol table doesn't take up *that* much room.  I personally would like a bit more flexibility at the cost of the executable size being bumped up a few KB.

BTW, could you elaborate a bit on your skepticism?  I'm a bit confused here. Thanks!

James Dunne
August 19, 2004
In article <cg2vh6$2c5t$1@digitaldaemon.com>, pragma <EricAnderton at yahoo dot com> says...
<<snip>>
>
>I can see the merit in a stand-alone server, but this may be the wrong way to start out.  Honestly, I think an add-on module to an existing source control tool might prove much more useful than an outright replacement.  Take dsource.org for example: an entire website dedicated to D programming that is backed on Subversion.  IMO an extension to Subversion would be far more useful (and easier to implement) to the D community as a whole.
>
>All the same, please look at Mango over on dsource if you're going to write a stand-alone server.  The I/O and socket portions of that library may give you a good head-start.
>
>That aside, I like where you're going with this, especially with 'creating a semantic tree' of the code.  I gather this would be some form of pseudocode or XML?  I can see this becoming useful for increasing performance if you keep the current semantic tree version on hand at all times.  That way one can compare the tree against their own local source to make sure they're not altering other portions of the application too badly (i.e. trying not to violate contracts across a whole project)
>
>Another thing, a lot of the spirit of what you're proposing here is captured in D's in/out/body and unittest contracting system.  Have you considered incorporating these statements in particular to deepen the semantic meaning of code when you assess it?  :)
>
>- Pragma
>

Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy of the code on hand.  I do like the system and am using it personally, just never cared to see its code ;).  But if you say it is easier to extend, then I will believe you.  An SVN extension is definitely a possible direction in the future for this project, assuming the proof-of-concept diff/patch toolset works. After all, I didn't really have any ambition to create a new stand-alone server in the first place.

You're saying I could gather contract information from the in, out, invariant, etc. constructs that D provides and make sure the coder isn't going to violate them with the code commit?  Wow, that takes balls.

Actually I don't think that's possible.  How are you to know at compile time if the coder is violating any contracts?  Where do you get your values to test against the contracts?  And finally, HOW do you represent a contract in an evaluative way, assuming you magically have values provided by the committed code to test against the contracts?  I don't think the contract information would be too useful in a revision control system, and it wouldn't be very language-independent either.  But it's a cool idea, nonetheless.

Er, anyway... The real intent behind building the semantic tree of the module is to have a uniform way of accessing functions, structures, classes in order to compare them and change them easily.  I could foresee this being defined by a relatively large inter-related class hierarchy of things like expressions, statements, etc...

.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all.  My bad...

Well, a semantic tree is really an extension of a syntactic tree, isn't it?  Oh God, my head...  Someone clarify myself for me.
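
For what it's worth, the class hierarchy of expressions and statements described above might be sketched like this. It is written in Python purely for illustration; the node classes (`Name`, `BinOp`, `Return`, `Function`) are hypothetical and far smaller than anything a real D front-end would define. As noted, this is a syntactic tree: it records structure only, with no types or resolved symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Base class for expression nodes."""


@dataclass(frozen=True)
class Name(Expr):
    ident: str


@dataclass(frozen=True)
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Stmt:
    """Base class for statement nodes."""


@dataclass(frozen=True)
class Return(Stmt):
    value: Expr


@dataclass(frozen=True)
class Function:
    name: str
    params: tuple   # parameter names, in order
    body: tuple     # Stmt nodes, in order


if __name__ == "__main__":
    # Dataclasses generate field-by-field equality, so comparing two
    # functions compares their whole subtrees.
    first = Function("add", ("a", "b"), (Return(BinOp("+", Name("a"), Name("b"))),))
    second = Function("add", ("a", "b"), (Return(BinOp("-", Name("a"), Name("b"))),))
    print(first == second)
```

Structural equality comes for free here, which is the "uniform way of accessing and comparing" part. A semantic tree would add information on top of these nodes, such as the type of each expression or which declaration a `Name` refers to.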

James Dunne
August 19, 2004
Jaymz schrieb:

> Well, it would have to be defined with something a bit more complex than just a
> tree structure.  A tree-based structure, like a DOM, would be ideal.  I don't
> see how that'd be unstable.  It should be defined over a stream of lexemes, of
> course.  That's what the DOM will hold.

That might work... Although i'd like it somehow independent from most language constructs, and being able to handle new syntax constructs gracefully... more or less like a highlighting editor with "levels" recognition does. Perhaps even some extensibility?

> I'm not too keen on having this be another implementation of a source code
> re-formatter.  It's merely just a different way of patching source code using
> the assumption that we're reading SOURCE CODE, not just arbitrary lines of text.
> The code re-formatting comes out of the need to reproduce the code from the DOM.

On the other hand the DIFF will not be very human-readable.

> A documentation system for D written entirely in D?  Just a few simple changes
> to Doxygen it sounds like, minus the initial work of porting to D ;).

Hr hr. :)

> This could be a whole different pile of monkeys if class meta-data support was
> in D *WINK WINK*.  I saw a few threads of discussion on meta-data, but it didn't
> seem  to end up anywhere.  Gr.  I don't see what the big issue is, the symbol
> table doesn't take up *that* much room.  I personally would like a bit more
> flexibility at the cost of the executable size being bumped up a few KB.

The metadata was already there in DLI, and was incomplete, and only the DLI version of Phobos ever used it. The topic must be raised again in the post-1.0 era. For now, the consensus was that a parser and some custom code generators would have to do the work for the others, and relieve Walter from something unnecessary to do right now. Besides, the metadata was only intended to be used in a program itself.

I wonder whether i find some time to bake a D version of my favorite parser gen (COCO/R) and a corresponding D grammar... It would be a great help on creating tools. I started to port the Java version, but after seeing the C version i have come to dislike that for Java and will probably first hack up a C version which outputs D code, then someone else could finish porting it. I am sure that the tool can cope perfectly with D syntax, and the generated code is efficient.

> BTW, could you elaborate a bit on your skepticism?  I'm a bit confused here.
> Thanks!

I don't know, i'm totally new to the matter... That means i'm confused and skeptical. Still, are there any tree diffs out there?

One point to consider is that Bill Cox and i have raised the question of an extensible language, where libraries could introduce new syntax, like in OpenC++ and similar. Walter promised to consider this again in the post-1.0 era.

-eye