September 22, 2013
On 9/22/2013 5:24 AM, Timon Gehr wrote:
>[...]

Your proposal has a serious shortcoming:

The fast path through the compiler is the one where people generate symbolic debug info. This requires file/line numbers for everything. Adding an arbitrarily expensive computation to get them makes for slow compiles.
September 22, 2013
On 9/22/2013 7:50 AM, Andrej Mitrovic wrote:
> I run into these all the time, and I have to spend half a minute
> scanning left and right trying to figure out where the damn missing
> brace went. Marginal benefit my ass.


I programmed into my editor (MicroEmacs) the F3 command. F3, when sitting on one of ()[]{}<> will find the matching character, even if nested. If it is not on one of those characters, it finds the next occurrence of that character.

As a bonus, if it is on the # character at the beginning of the line, it will find the matching #if, #ifdef, #elif, #else, or #endif :-)

I find it very, very handy.

It's based on an elithp macro written for Emacs back in the 1980's. If your editor has any ability to have user-written extensions, this is a simple one, and I highly recommend it. Here's the corresponding MicroEmacs implementation:

-----------------------------------------------------------------------
/*********************************
 * Examine line at '.'.
 * Returns:
 *	HASH_xxx
 *	0	anything else
 */

#define HASH_IF		1
#define HASH_ELIF	2
#define HASH_ELSE	3
#define HASH_ENDIF	4

static int ifhash(clp)
LINE *clp;
{
    register int len;
    register int i,h;
    static char *hash[] = {"if","elif","else","endif"};

    len = llength(clp);
    if (len < 3 || lgetc(clp,0) != '#')
	goto ret0;
    for (i = 1; ; i++)
    {
	if (i >= len)
	    goto ret0;
	if (!isspace(lgetc(clp,i)))
	    break;
    }
    for (h = 0; h < arraysize(hash); h++)
	if (len - i >= strlen(hash[h]) &&
	    memcmp(&clp->l_text[i],hash[h],strlen(hash[h])) == 0)
	    return h + 1;
ret0:
    return 0;
}


/*********************************
 * Search for the next occurrence of the character at '.'.
 * If character is a (){}[]<>, search for matching bracket.
 * If '.' is on #if, #elif, or #else search for next #elif, #else or #endif.
 * If '.' is on #endif, search backwards for corresponding #if.
 */

int search_paren(f, n)
    {
    register LINE *clp;
    register int cbo;
    register int len;
    register int i;
    char chinc,chdec,ch;
    int count;
    int forward;
    int h;
    static char bracket[][2] = {{'(',')'},{'<','>'},{'[',']'},{'{','}'}};

    clp = curwp->w_dotp;		/* get pointer to current line	*/
    cbo = curwp->w_doto;		/* and offset into that line	*/
    count = 0;

    len = llength(clp);
    if (cbo >= len)
	chinc = '\n';
    else
	chinc = lgetc(clp,cbo);

    if (cbo == 0 && (h = ifhash(clp)) != 0)
    {	forward = h != HASH_ENDIF;
    }
    else
    {
	if (inword())
	{   // Search for word the cursor is currently on
	    int s;
	    do
		s = backchar(FALSE, 1);
	    while (s && inword());

	    if (s && forwchar(FALSE, 1))
	    {
		int start = curwp->w_doto;
		if (word_forw(FALSE, 1))
		{
		    cbo = curwp->w_doto;
		    int i;
		    for (i = 0; i < NPAT - 1 && start + i < cbo; i++)
		    {
			pat[i] = lgetc(clp, start + i);
		    }
		    pat[i] = 0;
		    if (Dsearchagain(f, n))
			return backchar(FALSE, 1);
		}
	    }
	    mlwrite("Not found");
	    return FALSE;
	}
	forward = TRUE;			/* forward			*/
	h = 0;
	chdec = chinc;
	for (i = 0; i < 4; i++)
	    if (bracket[i][0] == chinc)
	    {	chdec = bracket[i][1];
		break;
	    }
	for (i = 0; i < 4; i++)
	    if (bracket[i][1] == chinc)
	    {	chdec = bracket[i][0];
		forward = FALSE;	/* search backwards		*/
		break;
	    }
    }

    while (1)				/* while not end of buffer	*/
    {
	if (forward)
	{
	    if (h || cbo >= len)
	    {
		clp = lforw(clp);
		if (clp == curbp->b_linep)	/* if end of buffer	*/
		    break;
		len = llength(clp);
		cbo = 0;
	    }
	    else
		cbo++;
	}
	else /* backward */
	{
	    if (h || cbo == 0)
	    {
		clp = lback(clp);
		if (clp == curbp->b_linep)
		    break;
		len = llength(clp);
		cbo = len;
	    }
	    else
		--cbo;
	}

	if (h)
	{   int h2;

	    cbo = 0;
	    h2 = ifhash(clp);
	    if (h2)
	    {	if (h == HASH_ENDIF)
		{
		    if (h2 == HASH_ENDIF)
			count++;
		    else if (h2 == HASH_IF)
		    {	if (count-- == 0)
			    goto found;
		    }
		}
		else
		{   if (h2 == HASH_IF)
			count++;
		    else
		    {	if (count == 0)
			    goto found;
			if (h2 == HASH_ENDIF)
			    count--;
		    }
		}
	    }
	}
	else
	{
	    ch = (cbo < len) ? lgetc(clp,cbo) : '\n';
	    if (eq(ch,chdec))
	    {   if (count-- == 0)
		{
		    /* We've found it	*/
		found:
		    curwp->w_dotp  = clp;
		    curwp->w_doto  = cbo;
		    curwp->w_flag |= WFMOVE;
		    return (TRUE);
		}
	    }
	    else if (eq(ch,chinc))
		count++;
	}
    }
    mlwrite("Not found");
    return (FALSE);
}

September 22, 2013
On 09/22/2013 07:51 PM, Walter Bright wrote:
> ...
> The fast path through the compiler is the one where people generate
> symbolic debug info. This requires file/line numbers for everything.

Typically the file names of AST nodes that are processed consecutively coincide, and testing whether a location is in a given file is easy, so it is possible that this is not an issue at all.

Line numbers can be stored without increasing the memory footprint in comparison with the current scheme.
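
A minimal sketch in C of one way to read the two points above. It assumes (this is not taken from any existing compiler) that every AST node stores a single 32-bit offset into a virtual concatenation of all source buffers, which is no larger than a 32-bit line number. All names here (Loc, SourceFile, FileTable, file_of_loc) are hypothetical. The file lookup remembers the last match, so consecutive nodes from the same file only cost one range check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Loc;		/* hypothetical: global byte offset, 4 bytes */

typedef struct {
    const char *name;
    Loc begin;			/* first global offset belonging to the file */
    Loc end;			/* one past the last offset of the file */
} SourceFile;

typedef struct {
    const SourceFile *files;	/* non-overlapping ranges */
    size_t count;
    size_t last;		/* index of the most recently matched file */
} FileTable;

/* Return the file containing loc, or NULL if no file does. */
const SourceFile *file_of_loc(FileTable *ft, Loc loc)
{
    const SourceFile *f;
    size_t i;

    assert(ft->count > 0 && ft->last < ft->count);

    /* Fast path: same file as the previous query, one range test. */
    f = &ft->files[ft->last];
    if (loc >= f->begin && loc < f->end)
	return f;

    /* Slow path: scan the table and remember the hit. */
    for (i = 0; i < ft->count; i++)
    {
	f = &ft->files[i];
	if (loc >= f->begin && loc < f->end)
	{
	    ft->last = i;
	    return f;
	}
    }
    return NULL;
}
```

Whether the slow path ever matters would have to be measured; with many files, the linear scan could be swapped for a binary search over the sorted ranges.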

> Adding an arbitrarily expensive computation to get them makes for slow
> compiles.

Tracking line numbers is likely worth it. I don't believe that providing column numbers in error messages necessitates a slowdown though. (Probably we should just implement/optimize and measure.)


September 22, 2013
On 09/22/2013 05:24 PM, Peter Alexander wrote:
> On Sunday, 22 September 2013 at 12:24:03 UTC, Timon Gehr wrote:
>> You could also get rid of linnum by splitting the source file buffer
>> into lines on demand and using binary search.
>
> Does that work with #line?

Well, yes. E.g., collect their location/line number pairs into an array while lexing and then process that during line splitting.
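
A rough sketch of that scheme in C, under stated assumptions: the source buffer stays in memory, locations are byte offsets into it, and the lexer has recorded each #line directive as a pair (offset of the first line it applies to, line number it assigns). The types and function names (LineTable, LineDirective, line_of_offset, reported_line) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t *starts;	/* starts[k] = offset of the first byte of line k+1 */
    size_t count;
} LineTable;

typedef struct {
    size_t offset;	/* start of the first line the directive applies to */
    size_t line;	/* line number the directive assigns to that line */
} LineDirective;

/* Split the buffer into lines on demand: two passes over the text. */
LineTable line_table_build(const char *src, size_t len)
{
    LineTable t;
    size_t i, n = 1;

    for (i = 0; i < len; i++)
	if (src[i] == '\n')
	    n++;
    t.starts = malloc(n * sizeof *t.starts);
    assert(t.starts != NULL);
    t.count = 0;
    t.starts[t.count++] = 0;
    for (i = 0; i < len; i++)
	if (src[i] == '\n')
	    t.starts[t.count++] = i + 1;
    return t;
}

/* 1-based physical line of offset off: last line start <= off. */
size_t line_of_offset(const LineTable *t, size_t off)
{
    size_t lo = 0, hi = t->count;

    while (hi - lo > 1)
    {
	size_t mid = lo + (hi - lo) / 2;
	if (t->starts[mid] <= off)
	    lo = mid;
	else
	    hi = mid;
    }
    return lo + 1;
}

/* Line number to report, honoring the directives d[0..nd), sorted by offset. */
size_t reported_line(const LineTable *t, const LineDirective *d, size_t nd,
		     size_t off)
{
    size_t phys = line_of_offset(t, off);
    size_t lo = 0, hi = nd;

    while (lo < hi)		/* count directives with offset <= off */
    {
	size_t mid = lo + (hi - lo) / 2;
	if (d[mid].offset <= off)
	    lo = mid + 1;
	else
	    hi = mid;
    }
    if (lo == 0)
	return phys;		/* no directive in effect yet */
    return d[lo - 1].line + (phys - line_of_offset(t, d[lo - 1].offset));
}
```

The table is only built when a line number is first needed, and each lookup after that is a logarithmic search.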
September 22, 2013
On 9/22/2013 11:43 AM, Timon Gehr wrote:
> Tracking line numbers is likely worth it. I don't believe that providing column
> numbers in error messages necessitates a slowdown though.

Please consider that:

     IT ISN'T JUST FOR ERROR MESSAGES

It would go in the symbolic debug info, too, where it will be required everywhere and will be right there on the fast path through the lexer/compiler.

Now consider the lexer doing a fast skip over comment text (this ranks fairly high in the profile). This operation gets a lot slower if you're also keeping track of column number. Please note that:

     COLUMN NUMBER ISN'T THE OFFSET FROM THE START OF THE LINE

because of tabs and UTF-8 sequences.

Any proposal for column number tracking must take these issues into account.
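
To make the tab/UTF-8 point concrete, here is a minimal C sketch of computing a display column from a byte offset. It assumes a fixed tab width of 8 and one column per UTF-8 code point (so it ignores double-width and combining characters); display_column and TAB_WIDTH are illustrative names, not anything from dmd.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_WIDTH 8	/* assumption: tab stops every 8 columns */

/* 1-based display column of byte offset off within line[0..len). */
size_t display_column(const char *line, size_t len, size_t off)
{
    size_t i, col = 0;		/* col = columns consumed so far */

    assert(off <= len);
    for (i = 0; i < off; i++)
    {
	unsigned char c = (unsigned char)line[i];

	if (c == '\t')
	    col += TAB_WIDTH - col % TAB_WIDTH;	/* jump to next tab stop */
	else if ((c & 0xC0) != 0x80)
	    col++;		/* skip UTF-8 continuation bytes */
    }
    return col + 1;
}
```

Note that this needs every byte from the start of the line, which is why the column cannot be derived from the offset alone once the source text is gone.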

Also note that g++ and clang are hardly noted for their fast compile speeds. Also note that g++'s compile speed has dropped significantly lately - I don't know why, but it also added column number support recently (!).
September 22, 2013
Walter Bright:

> Any proposal for column number tracking must take these issues into account.
>
> Also note that g++ and clang are hardly noted for their fast compile speeds. Also note that g++'s compile speed has dropped significantly lately - I don't know why, but it also added column number support recently (!).

If someone writes a front-end patch to show column numbers we'll have to benchmark it and decide how complex/buggy it is and how much extra memory/time it needs.

Bye,
bearophile
September 22, 2013
On Sunday, September 22, 2013 09:36:45 eles wrote:
> On Sunday, 22 September 2013 at 00:33:32 UTC, Walter Bright wrote:
> > On 9/21/2013 5:11 PM, Sean Kelly wrote:
> > someone's entire source file is on one line, and on it goes.
> 
> OTOH, this is exactly the showcase where such a feature would be *really* useful.

True, but it's also a use case which is completely unreasonable and which it would make no sense to even help out with, let alone optimize for. Anyone who writes source like that deserves to have problems finding where in the code an error is.

- Jonathan M Davis
September 22, 2013
On 9/22/2013 1:30 PM, bearophile wrote:
> If someone writes a front-end patch to show column numbers we'll have to
> benchmark it and decide how complex/buggy it is and how much extra
> memory/time it needs.

Sure.

September 22, 2013
On Saturday, 21 September 2013 at 21:17:13 UTC, Paulo Pinto wrote:
> Am 21.09.2013 23:03, schrieb eles:
>> On Saturday, 21 September 2013 at 19:47:08 UTC, Walter Bright wrote:
>>> On 9/21/2013 12:38 PM, eles wrote:
>>>> On Saturday, 21 September 2013 at 18:55:46 UTC, Walter Bright wrote:
>>>>> On 9/21/2013 11:03 AM, Maxim Fomin wrote:
> The funny thing is that this has already been supported for a few years in the form of colorgcc.

Yes, but gcc only recently started to improve its error messages, with 4.9.

OTOH, the column was displayed even before that, but in plain text.

For those using dmd or gdc and looking for something along the lines of colorgcc (albeit a coarser approach), check this piece of software:

https://github.com/dmoulding/hilite/blob/master/hilite.c
September 22, 2013
On Sunday, 22 September 2013 at 00:33:32 UTC, Walter Bright wrote:
> On 9/21/2013 5:11 PM, Sean Kelly wrote:
> as well as having to have the source file
> buffer stay around throughout the compile (to compute column number you need the source in order to account for tabs & Unicode).

A trade-off: print the offending line as seen in the analyzed file (after processing tabs & Unicode), and report the column relative to its beginning.

Then display a caret on the next line.

Even if it is not identical to the line in the original file, at least it's better than nothing.
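
A small C sketch of that idea, with some assumptions: tabs are expanded to 8-column stops, each UTF-8 code point is taken to occupy one column, and the error position is a byte offset into the line. print_with_caret is a hypothetical helper, not an existing compiler function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TAB_WIDTH 8	/* assumption: tab stops every 8 columns */

/*
 * Print line[0..len) with tabs expanded to spaces, then a second line
 * with '^' under the byte at offset off.
 * Returns the 0-based column at which the caret was printed.
 */
size_t print_with_caret(FILE *out, const char *line, size_t len, size_t off)
{
    size_t i, col = 0, caret = 0;

    assert(off <= len);
    for (i = 0; i < len; i++)
    {
	unsigned char c = (unsigned char)line[i];

	if (i == off)
	    caret = col;
	if (c == '\t')
	{
	    do			/* pad with spaces up to the next tab stop */
	    {
		fputc(' ', out);
		col++;
	    } while (col % TAB_WIDTH != 0);
	}
	else
	{
	    fputc(c, out);
	    if ((c & 0xC0) != 0x80)
		col++;		/* continuation bytes take no column */
	}
    }
    if (off == len)
	caret = col;		/* error at end of line */
    fputc('\n', out);
    for (i = 0; i < caret; i++)
	fputc(' ', out);
    fputs("^\n", out);
    return caret;
}
```

Because the line is re-rendered by the compiler, the caret lines up with what is printed even when the user's editor uses a different tab width.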