Re: Volunteer for research project?
February 22, 2013
On Wed, Feb 20, 2013 at 11:02:51PM -0800, Brad Roberts wrote:
> Would any of you be interested in helping out (read that as "doing") a research / data mining project for us?  I'd love to take all of the regressions this year (or for the last year, or whatever period of time can be reasonably accomplished) and track them back to which commit introduced each of them (already done for some of them).  From there, I'd like to see what sort of correlations can be found.  Is there a particular area of code that's responsible for them.  Is there a particular feature (spread across a lot of files, maybe) that's responsible.  Etc.
> 
> Maybe it's all over the map.  Maybe it will highlight one or a few areas to take a harder look at.
> 
> Anyone interested?
[...]

I'm surprised nobody offered to help, seeing as there are many complaints about DMD bugs.

Well, I'd love to help, but I can't promise I'll have the time to do a lot.  But I'm reasonably comfortable with running git bisect to isolate the offending commits, so if you'll send me a list of issues, I could try to work through it at whatever pace I can manage and send you the results. I hope it won't be just me, though, 'cos I probably won't have the time to do a lot, but if there's a team of people working on it, I'd love to chip in.

Don't know how much help I'll be in the correlation part, though. But I suppose that will have to come from comparing offending commits to look for patterns.


T

-- 
"Real programmers can write assembly code in any language. :-)" -- Larry Wall
February 22, 2013
On Fri, Feb 22, 2013 at 06:51:53AM +0100, Maxim Fomin wrote:
> On Thursday, 21 February 2013 at 07:03:08 UTC, Brad Roberts wrote:
> >Would any of you be interested in helping out (read that as "doing") a research / data mining project for us?  I'd love to take all of the regressions this year (or for the last year, or whatever period of time can be reasonably accomplished) and track them back to which commit introduced each of them (already done for some of them).  From there, I'd like to see what sort of correlations can be found.  Is there a particular area of code that's responsible for them.  Is there a particular feature (spread across a lot of files, maybe) that's responsible.  Etc.
> >
> >Maybe it's all over the map.  Maybe it will highlight one or a few areas to take a harder look at.
> >
> >Anyone interested?
> >
> >Thanks,
> >Brad
> 
> It sounds interesting, but what are you expecting to find? And how sure are you that you can find something? I would expect that code which fixes some feature often breaks the same feature in another aspect of its functioning, which is quite obvious. Sometimes one piece of code relies implicitly on the functioning of other code, so when you change the latter, the former stops working correctly. You give the example of a feature spread across several files - how does knowing this help in reducing regressions?

I would think he's referring to issues that are filed in the bugtracker. Obviously, we have no way of knowing if a code change broke something if nobody found any bug afterwards!

So I'm thinking it's probably a matter of going through the regression bugs in the bugtracker, making test cases to reproduce them, and then using git bisect to figure out which commit introduced the problem.


T

-- 
Public parking: euphemism for paid parking. -- Flora
February 22, 2013
On Friday, 22 February 2013 at 06:02:20 UTC, H. S. Teoh wrote:
> I would think he's referring to issues that are filed in the bugtracker.
> Obviously, we have no way of knowing if a code change broke something if
> nobody found any bug afterwards!

Yes, it is obvious that he refers to bugzilla issues.

> So I'm thinking it's probably a matter of going through the regression
> bugs in the bugtracker, and making test cases to reproduce them, and
> then use git bisect to figure out which commit introduced the problem.
>
>
> T

This is also obvious. The question is what to do with such information next, how to analyze it and interpret the results.

For example, see http://d.puremagic.com/issues/show_bug.cgi?id=9406 (it names the commit which introduced the regression). What can you infer from fixed regressions (http://d.puremagic.com/issues/buglist.cgi?query_format=advanced&bug_severity=regression&bug_status=RESOLVED&resolution=FIXED) that would be useful in fighting the non-closed ones?

P.S. There is something wrong either with the forum or with how you are replying. The discussion in my mailbox is a single piece, but in the forum it is split into two threads. Posting a message in one thread in answer to a reply in another is strange. Do you use email or the forum for answering?
February 22, 2013
On 2/21/2013 10:00 PM, H. S. Teoh wrote:
> On Fri, Feb 22, 2013 at 06:51:53AM +0100, Maxim Fomin wrote:
>> On Thursday, 21 February 2013 at 07:03:08 UTC, Brad Roberts wrote:
>>> Would any of you be interested in helping out (read that as "doing") a research / data mining project for us?  I'd love to take all of the regressions this year (or for the last year, or whatever period of time can be reasonably accomplished) and track them back to which commit introduced each of them (already done for some of them).  From there, I'd like to see what sort of correlations can be found.  Is there a particular area of code that's responsible for them.  Is there a particular feature (spread across a lot of files, maybe) that's responsible.  Etc.
>>>
>>> Maybe it's all over the map.  Maybe it will highlight one or a few areas to take a harder look at.
>>>
>>> Anyone interested?
>>>
>>> Thanks,
>>> Brad
>>
>> It sounds interesting, but what are you expecting to find? And how sure are you that you can find something? I would expect that code which fixes some feature often breaks the same feature in another aspect of its functioning, which is quite obvious. Sometimes one piece of code relies implicitly on the functioning of other code, so when you change the latter, the former stops working correctly. You give the example of a feature spread across several files - how does knowing this help in reducing regressions?
> 
> I would think he's referring to issues that are filed in the bugtracker. Obviously, we have no way of knowing if a code change broke something if nobody found any bug afterwards!
> 
> So I'm thinking it's probably a matter of going through the regression bugs in the bugtracker, and making test cases to reproduce them, and then use git bisect to figure out which commit introduced the problem.
> 
> 
> T
> 

Pretty much that.  (Nearly) every bug comes with a test case already.  The part that will be work is taking that test case and finding the exact commit that broke it.  By definition, a regression once worked and something changed that broke it.  My hope is that one or more people can spend some time going through each regression report in bugzilla and tracking down the exact commit for each.

What will be uncovered by the effort?  Who knows.  It's better to not try to anticipate or predict since that can bias the analysis.  The entire point of the exercise is to find out.  If there are one or more obvious or detectable clusters, that gives us some interesting data.  It might well point out a part of the code that's particularly sensitive to change.  Or is very poorly covered by the test suite.  Or is flawed in some other way.  Regardless, if there are clusters, it's worth some study and pondering to consider what can be done to make it/them NOT hotbeds of regressions.
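
As a rough idea of what the cluster hunt could look like once the offending commits are known, here is a minimal Python sketch. It assumes someone has already collected, for each regression, the list of files touched by the commit that introduced it (for example with `git diff-tree --no-commit-id --name-only -r <sha>`). The function name and the input layout are made up for illustration, not an existing tool.

```python
from collections import Counter

def files_by_regression_count(offenders):
    """Rank files by how many regressions were introduced by commits touching them.

    offenders: dict mapping an issue id to the list of file paths changed by
    the commit that introduced that regression (hypothetical input layout).
    Returns a list of (path, count) pairs, most frequent first.
    """
    counts = Counter()
    for paths in offenders.values():
        # Count each file at most once per regression, even if listed twice.
        counts.update(set(paths))
    return counts.most_common()
```

Files that keep showing up near the top of the list would be the first candidates for a harder look. The same counting could be done per directory or per feature instead of per file.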

It's a research project.  It might turn out to yield nothing useful.  That's certainly a risk.  I suspect it won't turn out to be fruitless.

To seed the effort, here are all the regression bugs that have changed since the beginning of the year:

http://d.puremagic.com/issues/buglist.cgi?chfieldto=Now&query_format=advanced&chfieldfrom=2013-01-01&bug_severity=regression&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED


February 22, 2013
On 2013-02-22 06:09, H. S. Teoh wrote:

> I'm surprised nobody offered to help, seeing as there are many
> complaints about DMD bugs.
>
> Well, I'd love to help, but I can't promise I'll have the time to do a
> lot.  But I'm reasonably comfortable with running git bisect to isolate
> the offending commits; so if you'll send me a list of issues, I could
> try to work through it at whatever pace I can manage and send you the
> results. I hope it won't be just me, though, 'cos I probably won't have
> the time to do a lot, but if there's a team of people working on it,
> I'll love to chip in.
>
> Don't know how much help I'll be in the correlation part, though. But I
> suppose that will have to come from comparing offending commits to look
> for patterns.

You do know that "git bisect" has a subcommand for automatically running a test suite/command at each step when bisecting? This way a "git bisect" can be handled completely automatically.

http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html#_bisect_run

Perhaps we can set up something that uses this to automatically find the breaking commits.
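
A minimal sketch of what such a test script might look like, in Python. The only fixed part is the exit-code convention of "git bisect run": 0 marks the commit as good, 125 asks bisect to skip it, and any other code from 1 to 127 marks it as bad. The build and test commands are placeholders that would have to be filled in per bug, and the function names are made up for illustration.

```python
import subprocess

SKIP = 125  # exit code that tells "git bisect run" the commit cannot be tested

def bisect_exit_code(build_ok, test_ok):
    """Translate build/test outcomes into an exit code for "git bisect run"."""
    if not build_ok:
        return SKIP              # compiler did not build here: skip the commit
    return 0 if test_ok else 1   # 0 = good commit, 1 = bad commit

def succeeded(cmd):
    """Run a command (list of arguments) and report whether it exited with 0."""
    return subprocess.run(cmd).returncode == 0

def main(build_cmd, test_cmd):
    """Build first, run the test case only if the build worked."""
    build_ok = succeeded(build_cmd)
    test_ok = build_ok and succeeded(test_cmd)
    return bisect_exit_code(build_ok, test_ok)

# To use this as a script, end the file with something like
#   sys.exit(main(BUILD_CMD, TEST_CMD))
# where BUILD_CMD and TEST_CMD are the (hypothetical) commands for the bug
# under study, then start the bisection with "git bisect run <script>".
```

Returning 125 for a failed build matters because otherwise a commit that simply does not compile would be counted as "bad" and could send the bisection to the wrong commit.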

-- 
/Jacob Carlborg
February 22, 2013
On 2/22/13, Jacob Carlborg <doob@me.com> wrote:
> Perhaps we can setup something that uses this for automatically finding the breaking commits.

We would also need this script to automatically check out commits for DMD+Druntime+Phobos which are known to work together. Sometimes you're reducing a Phobos bug, but an older version of Phobos might not compile with a newer version of the compiler, so the 3 components need to be kept relatively in sync (the commits of all 3 components should be as close as possible to the date of a specific commit).
February 22, 2013
On Fri, Feb 22, 2013 at 06:40:34PM +0100, Andrej Mitrovic wrote:
> On 2/22/13, Jacob Carlborg <doob@me.com> wrote:
> > Perhaps we can setup something that uses this for automatically finding the breaking commits.
> 
> We would also need this script to automatically checkout commits for DMD+Druntime+Phobos which are known to work together. Sometimes you're reducing a Phobos bug, but an older version of Phobos might not compile with a newer version of the compiler, so the 3 components need to be kept relatively in sync (commits of all 3 components should be as close to the date of a specific commit).

Actually, a script would be necessary in any case, because otherwise you'd have to repeatedly do cd dmd; git checkout ...; cd ../druntime; git checkout ...; cd ../phobos; git checkout ...; then rebuild each one, then test, etc.

A script for checking out druntime/phobos at some particular date (based on what git bisect selected in dmd, say), would ease a lot of this tedium.
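
Something along these lines might be a starting point, sketched in Python. It assumes the date of the dmd commit picked by git bisect is already known, and that "newest commit not after that date" is a good enough notion of "in sync". The branch name "master" and both function names are assumptions for illustration.

```python
import bisect

def rev_list_command(iso_date, branch="master"):
    """Build the git command that prints the newest commit on `branch`
    whose commit date is not after `iso_date` (run it inside the repo)."""
    return ["git", "rev-list", "-n", "1", "--before=" + iso_date, branch]

def newest_not_after(history, when):
    """Pure-Python version of the same lookup.

    history: list of (unix_time, sha) pairs sorted by time, oldest first.
    when: unix time of the dmd commit being tested.
    Returns the sha of the newest commit at or before `when`, or None.
    """
    times = [t for t, _ in history]
    i = bisect.bisect_right(times, when)  # number of commits with time <= when
    return history[i - 1][1] if i else None
```

The sha printed by the command from rev_list_command would then be passed to git checkout in druntime and in phobos before rebuilding all three.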


T

-- 
"I'm not childish; I'm just in touch with the child within!" - RL