August 19, 2004
In article <cg31on$2ega$1@digitaldaemon.com>, Jaymz says...
>
>Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy of the code on hand.  I do like the system and am using it personally, just never cared to see its code ;).  But if you say it is easier to extend, then I will believe you.  An SVN extension is definitely a possible direction in the future for this project, assuming the proof-of-concept diff/patch toolset works.

Well, I haven't done it personally, but word has it that it has an event model of some kind that was written with extensibility in mind. :)

>
>You're saying I could gather contract information from the in, out, invariant, etc. constructs that D provides and make sure the coder isn't going to violate them with the code commit?  Wow, that takes balls.

Um, thank you?  I wasn't aware of that statement being all that out there, but in retrospect it's pretty bogus.

It's probably all this going back and forth between ColdFusion (work) and D (here
in the NG).  But I am on the same page now and will restrain from making any
future "ballsy" comments. ;)

>Actually I don't think that's possible.  How are you to know at compile time if the coder is violating any contracts?  Where do you get your values to test against the contracts?  And finally, HOW do you represent a contract in an evaluative way, assuming you magically have values provided by the committed code to test against the contracts?  I don't think the contract information would be too useful in a revision control system, and it wouldn't be very language-independent either.  But it's a cool idea, nonetheless.

Okay, I see where you're coming from now.  I was thinking more at the compilation and unittest level, where testing DBC *really* comes into play. You're right: you can't use that kind of information when you're just looking at how the code is put together.

Of course there's no reason why you couldn't get a nightly or on-demand build to do some analysis when an assert or static assert fires in a unittest. After all that processing, wouldn't it be pretty easy to correlate a line number and error message with a particular change ... especially since your semantic pass will know how everything is interrelated?

>
>Er, anyway... The real intent behind building the semantic tree of the module is to have a uniform way of accessing functions, structures, classes in order to compare them and change them easily.  I could foresee this being defined by a relatively large inter-related class hierarchy of things like expressions, statements, etc...

Gotcha.  So if the revision system can actually "understand" the code it's processing, then it'll be less prone to screwups and possibly catch developer mistakes as well...
>
>.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all.  My bad...
>
>Well, a semantic tree is really an extension of a syntactic tree, isn't it?  Oh God, my head...  Someone clarify myself for me.

I'll take a stab at that one.

I've always understood the semantics of a program to be derived from the syntax used.  Yes, it's almost 1-for-1 between meaning and the syntax used, especially in D.  The difference lies in how one can do some things in more than one way, like using "?" instead of "if()" and so on: both have the same semantic meaning, but the syntax is totally different.
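A minimal sketch of that "?" versus "if()" point, in D.  The function names maxIf and maxCond are made up for illustration; the two bodies have different syntax trees but the same meaning:

```d
import std.stdio;

// Written with an if statement.
int maxIf(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

// Written with the conditional operator; same meaning, different syntax.
int maxCond(int a, int b)
{
    return a > b ? a : b;
}

void main()
{
    // Both forms agree on every input we try here.
    assert(maxIf(3, 7) == maxCond(3, 7));
    assert(maxIf(9, 2) == maxCond(9, 2));
    writeln(maxIf(3, 7), " ", maxCond(3, 7));
}
```

A purely syntactic diff would report these two functions as different, while a semantic comparison could in principle treat them as equivalent.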

- Pragma


August 19, 2004
On Thu, 19 Aug 2004 19:01:07 +0000 (UTC), Jaymz <jdunne4@bradley.edu> wrote:

> In article <cg2rcg$290b$1@digitaldaemon.com>, Berin Loritsch says...
>>
>> Jaymz wrote:
>>> Let's see...
>>>
>>> Upon first design, this could just be a simple stand-alone project implemented
>>> for the D language, consisting of a defined patch-format and a patch/diff-like
>>> toolset.  After all, we've got the front-end source to D already!  That could
>>> *possibly* make this simpler to implement, as it contains all the data
>>> structures necessary to parse, analyze, and possibly re-create the code with.
>>>
>>> How I see the "diff" tool working:
>>> 1)  Lex & parse the source files
>>> 2)  Create semantic tree representation of the original & new code
>>> 3)  Compare new code's semantic tree with original code's semantic tree
>>> 4)  Output a series of simple, defined operations to transform the original
>>> code's semantic tree into the new code's semantic tree.
>>>
>>> And the "patch" tool would do basically the inverse of the diff tool:
>>> 1)  Lex & parse the target source file
>>> 2)  Create semantic tree representation of the target source code
>>> 3)  Apply defined operations on the semantic tree
>>> 4)  Rebuild the target code from the modified semantic tree, possibly conforming
>>> to a given formatting standard, or using hints provided by the diff tool to
>>> recreate the formatting of the original file.
>>>
>>> This type of patch/diff toolset could handle the creation of an entire module,
>>> simply defined by "create" operations on an "empty" semantic tree.
>>>
>>> Let me know what you all think of this.  Thanks for your input, Pragma!
>>>
>>
>> If you start by getting the diff/patch utilities working properly, with
>> a format compatible with the unix diff/patch utilities, then you could
>> specify it as the diff/patch util for the CVS or SVN repos.  That would
>> be the only real level of integration you need.
>>
>> I will say this: (ir)Rational ClearCase tries to use this technique as
>> much as possible with abysmal results.  From what I understand, the
>> PowerBuilder integration works decently, but the XML diff tool is worse
>> than their line diff tool (which still randomizes things).
>>
>> If you get the diff/patch utility right, I will be very impressed.  Just
>> be careful to focus only on diff/patch and not try to have a tool that
>> does a whole bunch of stuff.  KISS
>
> Unfortunately, I don't see how I could create a format compatible with the unix
> diff/patch utilities which are line-based, using a semantic tree-based
> modification scheme.  The format would have to be entirely different.  I could,
> however, make my toolset support the command-line arguments of the original
> diff/patch utilities, ignoring now senseless ones, which would be the best way
> to go.
>
> This would not be a necessarily bad thing for SVN use ... if you make the
> decision to use my diff/patch utilities from the start, as the new patch format
> wouldn't be compatible with the unix diff/patch utilities patch format.  SVN
> really doesn't care what the diff/patch format that it stores in its database
> is, AFAIK.  It simply relies on correct operation from diff/patch to do its
> work.
>
> And sorry, I haven't used any of the products to which you made mention:
> ClearCase or PowerBuilder.  Could you post an example of "abysmal results" so we
> can see what NOT to produce?  :-)  I do like to develop tools that produce
> quality results -- this is probably due to my delusion that I have unlimited
> project development time, and that a project is never quite "done" ;).
>
> BTW, what's w/ the KISS?  Thanks for your comments!

KISS == Keep It Simple Stupid.

And before you take any offense, none was intended (I assume).  It's a somewhat common acronym meaning simply that you should attempt not to *over* complicate things.

Regan

p.s. I think your idea is great.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
August 19, 2004
In article <cg358p$2hho$1@digitaldaemon.com>, pragma <EricAnderton at yahoo dot com> says...
>
>In article <cg31on$2ega$1@digitaldaemon.com>, Jaymz says...
>>
>>Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy of the code on hand.  I do like the system and am using it personally, just never cared to see its code ;).  But if you say it is easier to extend, then I will believe you.  An SVN extension is definitely a possible direction in the future for this project, assuming the proof-of-concept diff/patch toolset works.
>
>Well, I haven't done it personally, but word has it that it has an event model of some kind that was written with extensibility in mind. :)
>
>>
>>You're saying I could gather contract information from the in, out, invariant, etc. constructs that D provides and make sure the coder isn't going to violate them with the code commit?  Wow, that takes balls.
>
>Um, thank you?  I wasn't aware of that statement being all that out there, but in retrospect it's pretty bogus.
>
>It's probably all this going back and forth between ColdFusion (work) and D (here
>in the NG).  But I am on the same page now and will restrain from making any
>future "ballsy" comments. ;)
>
>>Actually I don't think that's possible.  How are you to know at compile time if the coder is violating any contracts?  Where do you get your values to test against the contracts?  And finally, HOW do you represent a contract in an evaluative way, assuming you magically have values provided by the committed code to test against the contracts?  I don't think the contract information would be too useful in a revision control system, and it wouldn't be very language-independent either.  But it's a cool idea, nonetheless.
>
>Okay, I see where you're coming from now.  I was thinking more at the compilation and unittest level, where testing DBC *really* comes into play. You're right: you can't use that kind of information when you're just looking at how the code is put together.
>
>Of course there's no reason why you couldn't get a nightly or on-demand build to do some analysis when an assert or static assert fires in a unittest. After all that processing, wouldn't it be pretty easy to correlate a line number and error message with a particular change ... especially since your semantic pass will know how everything is interrelated?
>

I thought you were gonna *restrain* from making future "ballsy" comments? ;).

What type of algorithm could be developed based on a commit and a static assert firing to lead to a possible collection of offending line numbers?  Now THAT would be really interesting, and is technically feasible!


>>
>>Er, anyway... The real intent behind building the semantic tree of the module is to have a uniform way of accessing functions, structures, classes in order to compare them and change them easily.  I could foresee this being defined by a relatively large inter-related class hierarchy of things like expressions, statements, etc...
>
>Gotcha.  So if the revision system can actually "understand" the code it's processing, then it'll be less prone to screwups and possibly catch developer mistakes as well...

Well, the project's scope certainly has escalated from a simple syntactic tree based revision control system to an intelligent learning machine that'll automagically fix your mistakes and know what you *really* want to do.  LOL. Not to pick on you, Pragma. ;)

Hey!  Why don't we just build a neural network of a few billion nodes and train it on D grammar and semantics 'til it's sick?  Oh wait, we already got a couple hundred of 'em walkin around.. Dammit.  lol.

>>
>>.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all.  My bad...
>>
>>Well, a semantic tree is really an extension of a syntactic tree, isn't it?  Oh God, my head...  Someone clarify myself for me.
>
>I'll take a stab at that one.
>
>I've always understood the semantics of a program to be derived from the syntax used.  Yes, it's almost 1-for-1 between meaning and the syntax used, especially in D.  The difference lies in how one can do some things in more than one way, like using "?" instead of "if()" and so on: both have the same semantic meaning, but the syntax is totally different.
>
>- Pragma

<not snobby>I do realize the difference between syntax and semantics</not snobby>.  And in general, a language's syntax will strongly reflect its semantics (unless you complain of such silly things as READABILITY ... damn VB coders).

Regardless, should the revision control system be based on a /syntactic/ or /semantic/ tree representation of the code?  To contradict myself, as I always do, I don't see much benefit now in a /semantic/ tree for a simple revision control system.  :)

I do hope I've successfully confused everyone now.  The master of deception and contradiction will be back tomorrow morning.

James Dunne
August 19, 2004
Regan Heath wrote:

> On Thu, 19 Aug 2004 19:01:07 +0000 (UTC), Jaymz <jdunne4@bradley.edu> wrote:
> 
>> And sorry, I haven't used any of the products to which you made mention:
>> ClearCase or PowerBuilder.  Could you post an example of "abysmal results" so we
>> can see what NOT to produce?  :-)  I do like to develop tools that produce
>> quality results -- this is probably due to my delusion that I have unlimited
>> project development time, and that a project is never quite "done" ;).
>>
>> BTW, what's w/ the KISS?  Thanks for your comments!
> 
> 
> KISS == Keep It Simple Stupid.
> 
> And before you take any offense, none was intended (I assume).  It's a somewhat common acronym meaning simply that you should attempt not to *over* complicate things.
> 
> Regan
> 
> p.s. I think your idea is great.

That was its intention (how did you get this message and I didn't?).

Anyway for an example of bad integration, ClearCase's XML merge is
perfect.

If the XML is not properly formatted, the tool will choke beyond
reason (this applies to original or new XML documents).  By that
I mean the tool will attempt to treat the whole block as one element.
For example:

OLD:

<!-- parse error here -->
<element
  <embedded type="element"/>
</element>

NEW:

<!-- parse error fixed -->
<element>
  <embedded type="element"/>
</element>

CONFLICT:

The element "element" conflicts with "element       <embedded type="element"/>      "
August 20, 2004
That was odd, the posts got out of order and I didn't see Regan's post initially... Anyways...

Yeah, I'm definitely a fan of KISS... Heh, punny punny... But seriously, I'd like to keep this revision control system on the ground:  simple and reliable, yet very powerful.  It seems as though after a nice evening of playing Doom 3, I have no will to be near a computer until at least tomorrow morning... Jesus Christ, that game ... wow ...  Then, this weekend is gonna be crazy, moving back into house at skool.

If anyone, in the down-time here, would like to poke thru the D front-end parser/analyzer code and possibly produce some nice D code to achieve the same effect, that'd be sweet.  If not, that's cool too, I'll just do it once I'm at skool.

On the train ride home from work today I was jottin' down some ideas on how to do a tree-diff operation.  I started writing out D code in an XML-like format just to see how I could process a given module as a syntactic tree and rearrange, add, and remove parts of it.  I came up with a quickie example XML-like tree: (some declarations are useless, but exist for example's sake)

D source module:

module addition;

import std.c.stdio;

alias int myInt;

int add(int a, int b)
out (result) {
    // 'result' names the return value inside the out contract
    assert(a + b == result);
}
body {
    return (a + b);
}

Corresponding XML tree definition:

<module name="addition">
  <import name="std.c.stdio"/>
  <alias type="d:int" name="myInt"/>
  <function name="add" return="d:int">
    <param type="d:int" modifier="in" name="a"/>
    <param type="d:int" modifier="in" name="b"/>
    <out>
      <assert>
        <opEquals>
          <left>
            <opAdd>
              <left><ref-param name="a"/></left>
              <right><ref-param name="b"/></right>
            </opAdd>
          </left>
          <right>
            <return-value/>
          </right>
        </opEquals>
      </assert>
    </out>
    <body>
      <return>
        <paren>
          <opAdd>
            <left><ref-param name="a"/></left>
            <right><ref-param name="b"/></right>
          </opAdd>
        </paren>
      </return>
    </body>
  </function>
</module>

As you can see, it's pretty much a syntactic representation of the D module.  It looks similar to a CodeDOM structure, if you've ever used that from .NET.  Of course, this is easily extendible to Java, C#, VB.Net, etc.  We could add tags like <linecomment>, <blockcomment>, <nestcomment>, <blankline>, etc to preserve spacing and comments.  All of the D operators as tags should be defined by their corresponding op* names.  Feel absolutely free to rip on my definition schema here, I just made it up without *much* thought.  Admittedly, there was *some* thought.
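A rough sketch of how such a tree might be held in memory, in D.  The Node class and the render function are hypothetical names invented here, not anything taken from the DMD front-end; attributes are printed in sorted order only to keep the output deterministic:

```d
import std.algorithm : sort;
import std.stdio;

// Hypothetical node type: a tag name, attributes, and ordered children.
class Node
{
    string tag;
    string[string] attrs;
    Node[] children;

    this(string tag, string[string] attrs = null, Node[] children = null)
    {
        this.tag = tag;
        this.attrs = attrs;
        this.children = children;
    }
}

// Render a node and its subtree in the XML-like notation used above.
string render(Node n, string indent = "")
{
    string s = indent ~ "<" ~ n.tag;
    auto keys = n.attrs.keys;
    sort(keys); // associative arrays have no defined order, so sort the keys
    foreach (k; keys)
        s ~= " " ~ k ~ "=\"" ~ n.attrs[k] ~ "\"";
    if (n.children.length == 0)
        return s ~ "/>\n";
    s ~= ">\n";
    foreach (c; n.children)
        s ~= render(c, indent ~ "  ");
    return s ~ indent ~ "</" ~ n.tag ~ ">\n";
}

void main()
{
    auto tree = new Node("module", ["name": "addition"], [
        new Node("import", ["name": "std.c.stdio"]),
        new Node("alias", ["type": "d:int", "name": "myInt"]),
    ]);
    write(render(tree));
}
```

A real implementation would more likely use one class per construct (expressions, statements, declarations) rather than a single generic node; the generic form is just the shortest way to show the shape.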

Now to the real meat...

Defining the operations to ADD and REMOVE sections is easy enough: just treat the tree in an in-order-traversal manner and linearly add/remove tags (start and end tags must be matched, of course).  Process the two trees just as diff processes two files, trying to match them up tag-by-tag wherever possible.  This is a huge benefit for simple changes that have a major impact on the formatting of a document.

For example, if you indent a block of code (say, by wrapping it in an if-statement), the unix diff/patch utilities include every affected line in the diff.  With /my/ utility, only the start and end tags of the if-statement are created and the internal code block is left completely alone, making the diff much more compact.  Here we win against the unix diff utility; the worst case would be a draw.
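Here is a minimal sketch of that tag-by-tag matching, in D, using a plain longest-common-subsequence diff over the linearized tags.  The names Node, flatten and diffTokens are assumptions made up for this example, and LCS is just one possible matching strategy:

```d
import std.algorithm : max;
import std.stdio;

// Hypothetical minimal tree node: a tag name plus ordered children.
class Node
{
    string tag;
    Node[] kids;
    this(string tag, Node[] kids = null) { this.tag = tag; this.kids = kids; }
}

// Linearize the tree into start/end tag tokens in document order.
string[] flatten(Node n)
{
    string[] toks = ["<" ~ n.tag ~ ">"];
    foreach (k; n.kids)
        toks ~= flatten(k);
    toks ~= "</" ~ n.tag ~ ">";
    return toks;
}

// LCS-based diff: "  tok" is unchanged, "+ tok" is an ADD, "- tok" a REMOVE.
string[] diffTokens(string[] a, string[] b)
{
    // lcs[i][j] = length of the LCS of a[i .. $] and b[j .. $]
    auto lcs = new size_t[][](a.length + 1, b.length + 1);
    foreach_reverse (i; 0 .. a.length)
        foreach_reverse (j; 0 .. b.length)
            lcs[i][j] = a[i] == b[j]
                ? lcs[i + 1][j + 1] + 1
                : max(lcs[i + 1][j], lcs[i][j + 1]);

    string[] ops;
    size_t ia = 0, ib = 0;
    while (ia < a.length && ib < b.length)
    {
        if (a[ia] == b[ib]) { ops ~= "  " ~ a[ia]; ++ia; ++ib; }
        else if (lcs[ia + 1][ib] >= lcs[ia][ib + 1]) { ops ~= "- " ~ a[ia]; ++ia; }
        else { ops ~= "+ " ~ b[ib]; ++ib; }
    }
    for (; ia < a.length; ++ia) ops ~= "- " ~ a[ia];
    for (; ib < b.length; ++ib) ops ~= "+ " ~ b[ib];
    return ops;
}

void main()
{
    // Old: body { return a + b; }   New: the same return wrapped in an if.
    auto oldTree = new Node("body", [new Node("return", [new Node("opAdd")])]);
    auto newTree = new Node("body",
        [new Node("if", [new Node("return", [new Node("opAdd")])])]);

    foreach (line; diffTokens(flatten(oldTree), flatten(newTree)))
        writeln(line);
}
```

In this run only the start and end tags of the if show up as additions; the return and opAdd tags inside it are matched and left alone, which is the win over a line-based diff described above.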

To try to complicate things, defining a MOVE operation without falling back to an ADD and REMOVE operation should be considered.  Of course, in the initial implementation it could very well be left undefined, and we could rely on ADD/REMOVE, just as the unix diff/patch utilities do.  However, in the future this could be a major source of improvement.
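One possible way to get MOVE later without changing the diff itself is a post-pass that pairs a REMOVE with an ADD of the identical subtree.  This is only a sketch in D; the Op struct, the Kind enum and foldMoves are hypothetical, and subtrees are compared as serialized strings purely for brevity:

```d
import std.stdio;

enum Kind { add, remove, move }

// Hypothetical edit operation on a serialized subtree.
struct Op
{
    Kind kind;
    string subtree; // serialized subtree the operation applies to
    size_t from;    // position in the old tree (remove/move)
    size_t to;      // position in the new tree (add/move)
}

// Fold each REMOVE that has a matching ADD into a single MOVE.
Op[] foldMoves(Op[] ops)
{
    // partner[i] is the index of the op paired with ops[i], or -1 if unpaired.
    auto partner = new ptrdiff_t[ops.length];
    partner[] = -1;
    foreach (i, op; ops)
    {
        if (op.kind != Kind.remove)
            continue;
        foreach (j, other; ops)
        {
            if (other.kind == Kind.add && partner[j] == -1
                && other.subtree == op.subtree)
            {
                partner[i] = cast(ptrdiff_t) j;
                partner[j] = cast(ptrdiff_t) i;
                break;
            }
        }
    }

    Op[] result;
    foreach (i, op; ops)
    {
        if (partner[i] == -1)
            result ~= op; // unpaired ADD or REMOVE stays as it is
        else if (op.kind == Kind.remove)
            result ~= Op(Kind.move, op.subtree, op.from, ops[partner[i]].to);
        // a paired ADD is absorbed into the MOVE emitted for its REMOVE
    }
    return result;
}

void main()
{
    Op[] ops = [
        Op(Kind.remove, "<function name=\"add\">", 2, 0),
        Op(Kind.add, "<alias name=\"myInt\">", 0, 1),
        Op(Kind.add, "<function name=\"add\">", 0, 5),
    ];
    foreach (op; foldMoves(ops))
        writeln(op.kind, " ", op.subtree, " from=", op.from, " to=", op.to);
}
```

The pairing here is quadratic and exact-match only; hashing subtrees or tolerating small edits inside a moved block would be natural refinements.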

Let me know what you all think!

James Dunne
August 20, 2004
Jaymz wrote:
> That was odd, the posts got out of order and I didn't see Regan's post
> initially... Anyways...
> 
...
> On the train ride home from work today I was jottin' down some ideas on how to
> do a tree-diff operation.  I started writing out D code in an XML-like format
> just to see how I could process a given module as a syntactic tree and
> rearrange, add, and remove parts of it.  I came up with a quickie example
> XML-like tree: (some declarations are useless, but exist for example's sake)
> 
> D source module:
> 
> module addition;
> 
> import std.c.stdio;
> 
> alias int myInt;
> 
> int add(int a, int b)
> out (result) {
> assert(a + b == result);
> }
> body {
> return (a + b);
> }
> 
> Corresponding XML tree definition:
> 
> <module name="addition">
> <import name="std.c.stdio"/>
> <alias type="d:int" name="myInt"/>
<snip>

This discussion reminds me of the DML idea that was mentioned a while back (I think it was brought up 2 or 3 years ago):

http://jdanielsmith.org/DML/

I don't know how similar this is to what you're thinking, but it is XML-based.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
August 20, 2004
In article <cg3qps$2qcg$1@digitaldaemon.com>, J C Calvarese says...
>
>Jaymz wrote:
>> That was odd, the posts got out of order and I didn't see Regan's post initially... Anyways...
>> 
>...
>> On the train ride home from work today I was jottin' down some ideas on how to do a tree-diff operation.  I started writing out D code in an XML-like format just to see how I could process a given module as a syntactic tree and rearrange, add, and remove parts of it.  I came up with a quickie example XML-like tree: (some declarations are useless, but exist for example's sake)
>> 
>> D source module:
>> 
>> module addition;
>> 
>> import std.c.stdio;
>> 
>> alias int myInt;
>> 
>> int add(int a, int b)
>> out (result) {
>> assert(a + b == result);
>> }
>> body {
>> return (a + b);
>> }
>> 
>> Corresponding XML tree definition:
>> 
>> <module name="addition">
>> <import name="std.c.stdio"/>
>> <alias type="d:int" name="myInt"/>
><snip>
>
>This discussion reminds me of the DML idea that was mentioned a while back (I think it was brought up 2 or 3 years ago):
>
>http://jdanielsmith.org/DML/
>
>I don't know how similar this is to what you're thinking, but it is XML-based.
>
>-- 
>Justin (a/k/a jcc7)
>http://jcc_7.tripod.com/d/

Thanks for your comment, but I was just trying to convey the idea of the syntactic tree using an XML-like form.  I'm not going to *actually use* any form of XML or DML in the syntactic tree's definition.  I'll be keeping that all in memory in a DOM structure.

Which reminds me, I did do a little poring over the DMD front-end code last night, and it looks very clean and easy to port over to D for just the syntactic analysis.  The data structures used are pretty clear and can easily be suited to this project.

Really, now, the only design problem that I can see is how to define the patch format, possibly in a human-readable way.  I'm leaning towards an extensible binary format using chunks (like EBML does), since a text-based patch format would get rather lengthy.
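To make the chunk idea concrete, here is a small sketch in D.  It is NOT EBML's actual encoding (EBML uses variable-length IDs and sizes); it assumes a made-up layout of a 1-byte chunk id, a 4-byte big-endian length, then the payload.  The names writeChunk, readChunks and Chunk are invented for the example:

```d
import std.stdio;

// Hypothetical chunk: an id byte plus an opaque payload.
struct Chunk
{
    ubyte id;
    const(ubyte)[] payload;
}

// Encode one chunk as: id (1 byte), length (4 bytes, big-endian), payload.
ubyte[] writeChunk(ubyte id, const(ubyte)[] payload)
{
    ubyte[] buf;
    buf ~= id;
    auto n = cast(uint) payload.length;
    buf ~= cast(ubyte)(n >> 24);
    buf ~= cast(ubyte)(n >> 16);
    buf ~= cast(ubyte)(n >> 8);
    buf ~= cast(ubyte) n;
    foreach (b; payload)
        buf ~= b;
    return buf;
}

// Decode a sequence of chunks; stops at the first truncated chunk.
Chunk[] readChunks(const(ubyte)[] data)
{
    Chunk[] chunks;
    size_t pos = 0;
    while (pos + 5 <= data.length)
    {
        ubyte id = data[pos];
        uint n = (cast(uint) data[pos + 1] << 24) | (cast(uint) data[pos + 2] << 16)
               | (cast(uint) data[pos + 3] << 8) | data[pos + 4];
        pos += 5;
        if (pos + n > data.length)
            break; // length runs past the end of the data
        chunks ~= Chunk(id, data[pos .. pos + n]);
        pos += n;
    }
    return chunks;
}

void main()
{
    // Ids 1 and 2 stand in for ADD and REMOVE; the values are arbitrary.
    auto blob = writeChunk(1, cast(const(ubyte)[]) "<if>")
              ~ writeChunk(2, cast(const(ubyte)[]) "<paren>");
    foreach (c; readChunks(blob))
        writeln("id=", c.id, " len=", c.payload.length);
}
```

Because every chunk carries its own length, a reader that does not recognize an id can skip over it, which is where the extensibility comes from.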

Does anyone know if SVN needs the patch data to be in ASCII text format, or does it not care?

James Dunne