August 07, 2010
I wrote these two trivial utilities for the purpose of canonicalizing source code before checkins and to deal with FreeBSD's inability to deal with CRLF line endings, and because I can never figure out the right settings for git to make it do the canonicalization.

tolf - converts LF, CR, and CRLF line endings to LF.

detab - converts all tabs to the correct number of spaces. Assumes tabs are 8 column tabs. Removes trailing whitespace from lines.

Posted here just in case someone wonders what they are.
---------------------------------------------------------
/* Replace tabs with spaces, and remove trailing whitespace from lines.
 */

import std.file;
import std.path;

int main(string[] args)
{
    foreach (f; args[1 .. $])
    {
        auto input = cast(char[]) std.file.read(f);
        auto output = filter(input);
        if (output != input)
            std.file.write(f, output);
    }
    return 0;
}


char[] filter(char[] input)
{
    char[] output;
    size_t j;

    int column;
    for (size_t i = 0; i < input.length; i++)
    {
        auto c = input[i];

        switch (c)
        {
            case '\t':
                while ((column & 7) != 7)
                {   output ~= ' ';
                    j++;
                    column++;
                }
                c = ' ';
                column++;
                break;

            case '\r':
            case '\n':
                while (j && output[j - 1] == ' ')
                    j--;
                output = output[0 .. j];
                column = 0;
                break;

            default:
                column++;
                break;
        }
        output ~= c;
        j++;
    }
    while (j && output[j - 1] == ' ')
        j--;
    return output[0 .. j];
}
-----------------------------------------------------
/* Replace line endings with LF
 */

import std.file;
import std.path;

int main(string[] args)
{
    foreach (f; args[1 .. $])
    {
        auto input = cast(char[]) std.file.read(f);
        auto output = filter(input);
        if (output != input)
            std.file.write(f, output);
    }
    return 0;
}


char[] filter(char[] input)
{
    char[] output;
    size_t j;

    for (size_t i = 0; i < input.length; i++)
    {
        auto c = input[i];

        switch (c)
        {
            case '\r':
                c = '\n';
                break;

            case '\n':
                if (i && input[i - 1] == '\r')
                    continue;
                break;

            case 0:
                continue;

            default:
                break;
        }
        output ~= c;
        j++;
    }
    return output[0 .. j];
}
------------------------------------------
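For readers puzzling over the `(column & 7) != 7` loop in detab above: it emits `7 - column % 8` spaces and then one more, so a tab at zero-based column c expands to `8 - c % 8` spaces. A minimal sketch of that arithmetic follows. The name `spacesToNextStop` is made up for this illustration and is not part of the posted utilities.

```d
import std.stdio : writefln;

/// Number of spaces a tab expands to at zero-based `column`, assuming
/// fixed tab stops every `tabSize` columns (hypothetical helper).
size_t spacesToNextStop(size_t column, size_t tabSize = 8)
{
    return tabSize - column % tabSize;
}

void main()
{
    // A tab at the start of a line fills a whole stop;
    // a tab at column 7 becomes a single space.
    immutable size_t[] columns = [0, 3, 7, 8, 12];
    foreach (column; columns)
        writefln("column %2d -> %d space(s)", column, spacesToNextStop(column));
}
```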
August 07, 2010
On 08/06/2010 08:34 PM, Walter Bright wrote:
> I wrote these two trivial utilities for the purpose of canonicalizing
> source code before checkins and to deal with FreeBSD's inability to deal
> with CRLF line endings, and because I can never figure out the right
> settings for git to make it do the canonicalization.
>
> tolf - converts LF, CR, and CRLF line endings to LF.
>
> detab - converts all tabs to the correct number of spaces. Assumes tabs
> are 8 column tabs. Removes trailing whitespace from lines.
>
> Posted here just in case someone wonders what they are.
[snip]

Nice, though they don't account for multiline string literals.

A good exercise would be rewriting these tools in idiomatic D2 and assessing the differences.


Andrei
August 07, 2010
Or improve your google-fu by finding some existing tools that do the job right. :)

I'm pretty sure Uncrustify is good at most of these issues, not to mention it's a very nice source-code "prettifier/indenter". There's a front-end called UniversalIndentGUI, which has about a dozen integrated versions of source-code prettifiers (including Uncrustify, and for many languages). It has various settings on the left and a togglable *Live* preview mode that you can view on the right.

I invite you guys to try it out sometime:

http://universalindent.sourceforge.net/

(+ you can save different settings which is neat when you're coding for different projects that have different "code design & look" standards)

On Sat, Aug 7, 2010 at 3:50 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 08/06/2010 08:34 PM, Walter Bright wrote:
>
>> I wrote these two trivial utilities for the purpose of canonicalizing source code before checkins and to deal with FreeBSD's inability to deal with CRLF line endings, and because I can never figure out the right settings for git to make it do the canonicalization.
>>
>> tolf - converts LF, CR, and CRLF line endings to LF.
>>
>> detab - converts all tabs to the correct number of spaces. Assumes tabs are 8 column tabs. Removes trailing whitespace from lines.
>>
>> Posted here just in case someone wonders what they are.
>>
> [snip]
>
> Nice, though they don't account for multiline string literals.
>
> A good exercise would be rewriting these tools in idiomatic D2 and assessing the differences.
>
>
> Andrei
>


August 07, 2010
Andrej Mitrovic wrote:
> Or improve your google-fu by finding some existing tools that do the job right. :)

Sure, but I suspect it's faster to write the utility! After all, they are trivial.
August 07, 2010
Andrei Alexandrescu wrote:
> A good exercise would be rewriting these tools in idiomatic D2 and assessing the differences.

Some D2-fu would be cool. Any takers?
August 07, 2010
What does idiomatic D mean?

On Fri, 06 Aug 2010 20:50:52 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 08/06/2010 08:34 PM, Walter Bright wrote:
>> I wrote these two trivial utilities for the purpose of canonicalizing
>> source code before checkins and to deal with FreeBSD's inability to deal
>> with CRLF line endings, and because I can never figure out the right
>> settings for git to make it do the canonicalization.
>>
>> tolf - converts LF, CR, and CRLF line endings to LF.
>>
>> detab - converts all tabs to the correct number of spaces. Assumes tabs
>> are 8 column tabs. Removes trailing whitespace from lines.
>>
>> Posted here just in case someone wonders what they are.
> [snip]
>
> Nice, though they don't account for multiline string literals.
>
> A good exercise would be rewriting these tools in idiomatic D2 and assessing the differences.
>
>
> Andrei


August 07, 2010
On 08/06/2010 09:33 PM, Yao G. wrote:
> What does idiomatic D mean?

At a quick glance - I'm thinking two elements would be using string and possibly byLine.

Andrei
August 07, 2010
"Yao G." <nospamyao@gmail.com> wrote in message news:op.vg1qpcjfxeuu2f@miroslava.gateway.2wire.net...
>
> What does idiomatic D mean?
>

"idiomatic D" -> "In typical D style"


August 08, 2010
On Friday 06 August 2010 18:50:52 Andrei Alexandrescu wrote:
> On 08/06/2010 08:34 PM, Walter Bright wrote:
> > I wrote these two trivial utilities for the purpose of canonicalizing source code before checkins and to deal with FreeBSD's inability to deal with CRLF line endings, and because I can never figure out the right settings for git to make it do the canonicalization.
> > 
> > tolf - converts LF, CR, and CRLF line endings to LF.
> > 
> > detab - converts all tabs to the correct number of spaces. Assumes tabs are 8 column tabs. Removes trailing whitespace from lines.
> > 
> > Posted here just in case someone wonders what they are.
> 
> [snip]
> 
> Nice, though they don't account for multiline string literals.
> 
> A good exercise would be rewriting these tools in idiomatic D2 and assessing the differences.
> 
> 
> Andrei

I didn't try and worry about multiline string literals, but here are my more idiomatic solutions:



detab:

/* Replace tabs with spaces, and remove trailing whitespace from lines.
  */

import std.conv;
import std.file;
import std.stdio;
import std.string;

void main(string[] args)
{
    const int tabSize = to!int(args[1]);
    foreach(f; args[2 .. $])
        removeTabs(tabSize, f);
}


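void removeTabs(int tabSize, string fileName)
{
    auto file = File(fileName);
    string[] output;

    foreach(line; file.byLine())
    {
        while(true)
        {
            // indexOf returns ptrdiff_t, which does not fit in int on 64-bit targets.
            const tab = line.indexOf('\t');

            if(tab == -1)
                break;

            // Earlier tabs are already expanded, so the index is also the column.
            const numSpaces = tabSize - tab % tabSize;

            // std.string.repeat is gone from current Phobos, so build the padding by hand.
            auto spaces = new char[numSpaces];
            spaces[] = ' ';

            line = line[0 .. tab] ~ spaces ~ line[tab + 1 .. $];
        }

        // byLine reuses its buffer, so keep a copy of each finished line.
        output ~= line.idup;
    }

    // Close the read handle before overwriting the same file.
    file.close();
    std.file.write(fileName, output.join("\n"));
}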

-------------------------------------------

The three differences between mine and Walter's are that mine takes the tab size as the first argument, it doesn't put a newline at the end of the file, and it writes the file even if nothing changed (you could test for that, but when using byLine(), it's a bit harder). Interestingly enough, from the few tests that I ran, mine seems to be somewhat faster. I also happen to think that the code is clearer (it's certainly shorter), though that might be up for debate.
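
One way to make the "did anything change?" test easy is to do the whole transformation in memory and only then decide whether to write. The following is a minimal sketch of that idea, not a drop-in replacement. The names `detabLine` and `detab` are hypothetical, the default tab size of 8 is borrowed from Walter's version, and the input is assumed to be ASCII text that already uses LF line endings (for example after running tolf). It also strips trailing whitespace, which the header comment promises.

```d
import std.algorithm.iteration : map;
import std.array : join, replicate, split;
import std.string : indexOf, stripRight;

/// Expands tabs in one line and drops trailing whitespace (hypothetical helper).
string detabLine(string line, size_t tabSize)
{
    for (auto tab = line.indexOf('\t'); tab != -1; tab = line.indexOf('\t'))
    {
        // Earlier tabs are already expanded, so the byte index equals the
        // column (true for ASCII text only).
        const numSpaces = tabSize - tab % tabSize;
        line = line[0 .. tab] ~ replicate(" ", numSpaces) ~ line[tab + 1 .. $];
    }
    return line.stripRight();
}

/// Returns the detabbed text; the caller decides whether to write it back.
string detab(string text, size_t tabSize = 8)
{
    // Splitting and re-joining on "\n" preserves a final newline if there was one.
    return text.split("\n")
               .map!(line => detabLine(line, tabSize))
               .join("\n");
}

void main(string[] args)
{
    import std.file : readText, write;

    foreach (f; args[1 .. $])
    {
        auto input = readText(f);
        auto output = detab(input);
        if (output != input)      // leave unchanged files untouched
            write(f, output);
    }
}
```

Because `detab` maps a string to a string, the comparison before writing is a single `!=`, at the cost of holding the whole file in memory twice.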

-------------------------------------------



tolf:

/* Replace line endings with LF
  */

import std.file;
import std.string;

void main(string[] args)
{
    foreach(f; args[1 .. $])
        fixEndLines(f);
}

void fixEndLines(string fileName)
{
    auto fileStr = std.file.readText(fileName);
    auto result = fileStr.replace("\r\n", "\n").replace("\r", "\n");

    std.file.write(fileName, result);
}

-------------------------------------------

This version is ludicrously simple. And it was also faster than Walter's in the few tests that I ran. In either case, I think that it is definitely clearer code.
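
The same two `replace` calls can be wrapped in a small function so that the file is only rewritten when something actually changed, as Walter's original does. This is a sketch under that assumption, and `toLF` is a made-up name. Unlike Walter's tolf, it does not drop NUL bytes.

```d
import std.array : replace;

/// Converts CRLF and lone CR line endings to LF (hypothetical helper).
string toLF(string text)
{
    // CRLF has to be handled first; otherwise each CRLF would turn into two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n");
}

void main(string[] args)
{
    import std.file : readText, write;

    foreach (f; args[1 .. $])
    {
        auto input = readText(f);
        auto output = toLF(input);
        if (output != input)      // skip the write when nothing changed
            write(f, output);
    }
}
```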


I would have thought that being more idiomatic would have resulted in slower code than what Walter did, but interestingly enough, both programs are faster with my code. They might take more memory though. I'm not quite sure how to check that. In any case, you wanted some idiomatic D2 solutions, so there you go.

- Jonathan M Davis
August 08, 2010
Jonathan M Davis:
> I would have thought that being more idiomatic would have resulted in slower code than what Walter did, but interestingly enough, both programs are faster with my code. They might take more memory though. I'm not quite sure how to check that. In any case, you wanted some idiomatic D2 solutions, so there you go.

Your code looks better.

My (probably controversial) opinion on this is that the idiomatic D solution for those text "scripts" is to use a scripting language, such as Python :-)

In this case a Python version is more readable, shorter and probably faster too because reading the lines of a _normal_ text file is faster in Python compared to D (because Python is more optimized for such purposes. I can show benchmarks on request).

On the other hand D2 is in its debugging phase, so it's good to use it even for purposes it's not the best language for, to catch bugs or performance bugs. So I think it's positive to write such scripts in D2, even if in a real-world setting I want to use Python to write them.

Bye,
bearophile