August 07, 2010
tolf and detab
I wrote these two trivial utilities for the purpose of canonicalizing source 
code before checkins and to deal with FreeBSD's inability to deal with CRLF line 
endings, and because I can never figure out the right settings for git to make 
it do the canonicalization.

tolf - converts LF, CR, and CRLF line endings to LF.

detab - converts all tabs to the correct number of spaces. Assumes tabs are 8 
column tabs. Removes trailing whitespace from lines.

Posted here just in case someone wonders what they are.
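For reference, the 8-column tab-stop rule comes down to one formula: a tab at zero-based column c becomes 8 - c % 8 spaces. The `(column & 7) != 7` loop in detab below produces the same count one space at a time. Here is a minimal sketch that checks the two agree. The helper names `spacesForTab` and `spacesForTabLoop` are hypothetical and are not part of the original utilities:

```d
import std.stdio : writeln;

/// Spaces that a tab at zero-based `column` expands to, assuming tab
/// stops every `tabSize` columns (8 for detab). Hypothetical helper.
size_t spacesForTab(size_t column, size_t tabSize = 8)
{
    return tabSize - column % tabSize;
}

/// The same count computed the way detab's loop does it: pad until
/// (column & 7) == 7, then the tab itself becomes one final space.
/// Only valid for 8-column tabs.
size_t spacesForTabLoop(size_t column)
{
    size_t n;
    while ((column & 7) != 7)
    {
        ++n;
        ++column;
    }
    return n + 1;
}

void main()
{
    foreach (size_t col; 0 .. 20)
        assert(spacesForTab(col) == spacesForTabLoop(col));

    writeln(spacesForTab(0), " ", spacesForTab(3), " ", spacesForTab(7)); // prints "8 5 1"
}
```

The bitmask form only works because 8 is a power of two. The modulo form works for any tab width.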
---------------------------------------------------------
/* Replace tabs with spaces, and remove trailing whitespace from lines.
 */

import std.file;
import std.path;

int main(string[] args)
{
    foreach (f; args[1 .. $])
    {
        auto input = cast(char[]) std.file.read(f);
        auto output = filter(input);
        if (output != input)
            std.file.write(f, output);
    }
    return 0;
}


char[] filter(char[] input)
{
    char[] output;
    size_t j;

    int column;
    for (size_t i = 0; i < input.length; i++)
    {
        auto c = input[i];

        switch (c)
        {
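            case '\t':
                // Pad with spaces up to the next 8-column tab stop; the
                // tab itself becomes the last space, appended after the switch.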
                while ((column & 7) != 7)
                {   output ~= ' ';
                    j++;
                    column++;
                }
                c = ' ';
                column++;
                break;

            case '\r':
            case '\n':
                // End of line: drop trailing spaces before the terminator.
                while (j && output[j - 1] == ' ')
                    j--;
                output = output[0 .. j];
                column = 0;
                break;

            default:
                column++;
                break;
        }
        output ~= c;
        j++;
    }
    // Trim trailing spaces at the end of a file with no final newline.
    while (j && output[j - 1] == ' ')
        j--;
    return output[0 .. j];
}
-----------------------------------------------------
/* Replace line endings with LF
 */

import std.file;
import std.path;

int main(string[] args)
{
    foreach (f; args[1 .. $])
    {
        auto input = cast(char[]) std.file.read(f);
        auto output = filter(input);
        if (output != input)
            std.file.write(f, output);
    }
    return 0;
}


char[] filter(char[] input)
{
    char[] output;
    size_t j;

    for (size_t i = 0; i < input.length; i++)
    {
        auto c = input[i];

        switch (c)
        {
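            case '\r':
                // A CR (alone or as part of CRLF) becomes LF.
                c = '\n';
                break;

            case '\n':
                // The LF of a CRLF pair: its CR was already emitted as LF.
                if (i && input[i - 1] == '\r')
                    continue;
                break;

            case 0:
                // NUL bytes are dropped.
                continue;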

            default:
                break;
        }
        output ~= c;
        j++;
    }
    return output[0 .. j];
}
------------------------------------------
August 07, 2010
Re: tolf and detab
On 08/06/2010 08:34 PM, Walter Bright wrote:
> I wrote these two trivial utilities for the purpose of canonicalizing
> source code before checkins and to deal with FreeBSD's inability to deal
> with CRLF line endings, and because I can never figure out the right
> settings for git to make it do the canonicalization.
>
> tolf - converts LF, CR, and CRLF line endings to LF.
>
> detab - converts all tabs to the correct number of spaces. Assumes tabs
> are 8 column tabs. Removes trailing whitespace from lines.
>
> Posted here just in case someone wonders what they are.
[snip]

Nice, though they don't account for multiline string literals.
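To make the objection concrete: in D, a string literal can span several lines. That means whitespace at the end of a source line may be part of a literal's value. The sketch below uses a hypothetical helper, `stripTrailingBlanks`, which mimics only the trailing-whitespace part of detab. It shows such a literal being silently altered:

```d
import std.stdio : writeln;

/// Remove spaces and tabs just before each '\n' and at the end of the
/// text: roughly the trailing-whitespace part of detab (hypothetical helper).
string stripTrailingBlanks(string text)
{
    char[] output;
    foreach (char c; text)
    {
        if (c == '\n')
        {
            while (output.length && (output[$ - 1] == ' ' || output[$ - 1] == '\t'))
                output = output[0 .. $ - 1];
        }
        output ~= c;
    }
    while (output.length && (output[$ - 1] == ' ' || output[$ - 1] == '\t'))
        output = output[0 .. $ - 1];
    return output.idup;
}

void main()
{
    // Contents of a hypothetical source file. The string literal spans two
    // lines, and the three spaces after "alpha" are part of its value.
    string source = "auto s = \"alpha   \nbeta\";\n";

    string cleaned = stripTrailingBlanks(source);

    // The literal now holds "alpha\nbeta" instead of "alpha   \nbeta".
    writeln(cleaned == source ? "unchanged" : "literal changed"); // prints "literal changed"
}
```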

A good exercise would be to rewrite these tools in idiomatic D2 and
assess the differences.


Andrei
August 07, 2010
Re: tolf and detab
Or improve your google-fu by finding some existing tools that do the job
right. :)

I'm pretty sure Uncrustify is good at most of these issues, not to mention
it's a very nice source-code "prettifier/indenter". There's a front-end
called UniversalIndentGUI, which has about a dozen integrated versions of
source-code prettifiers (including uncrustify, and for many languages). It
has various settings on the left, and a toggleable *Live* preview mode which you
can view on the right.

I invite you guys to try it out sometime:

http://universalindent.sourceforge.net/

(+ you can save different settings which is neat when you're coding for
different projects that have different "code design & look" standards)

On Sat, Aug 7, 2010 at 3:50 AM, Andrei Alexandrescu <
SeeWebsiteForEmail@erdani.org> wrote:

> [snip]
August 07, 2010
Re: tolf and detab
Andrej Mitrovic wrote:
> Or improve your google-fu by finding some existing tools that do the job 
> right. :)

Sure, but I suspect it's faster to write the utility! After all, they are trivial.
August 07, 2010
Re: tolf and detab
Andrei Alexandrescu wrote:
> A good exercise would be to rewrite these tools in idiomatic D2 and
> assess the differences.

Some D2-fu would be cool. Any takers?
August 07, 2010
Re: tolf and detab
What does idiomatic D mean?

On Fri, 06 Aug 2010 20:50:52 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> [snip]


August 07, 2010
Re: tolf and detab
On 08/06/2010 09:33 PM, Yao G. wrote:
> What does idiomatic D mean?

At a quick glance - I'm thinking two elements would be using string and 
possibly byLine.
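Here is a minimal sketch of what those two elements look like together. It assumes a hypothetical helper `readLines` and a scratch file name chosen only for the demo. `File.byLine` yields `char[]` slices of a buffer that it reuses on the next iteration. A line therefore has to be copied with `idup` into an immutable `string` before it can be kept:

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

/// Read every line of `fileName` as an immutable string (hypothetical helper).
/// byLine yields char[] slices of a buffer that is reused on the next
/// iteration, so each line is copied with .idup before it is stored.
string[] readLines(string fileName)
{
    string[] lines;
    foreach (line; File(fileName).byLine())
        lines ~= line.idup;
    return lines;
}

void main()
{
    // Scratch file created only for this demonstration.
    auto path = buildPath(tempDir(), "byline_demo.txt");
    write(path, "first\nsecond\nthird\n");
    scope (exit) remove(path);

    assert(readLines(path) == ["first", "second", "third"]);
}
```

If the lines were stored without `idup`, every stored slice could end up pointing into the same reused buffer.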

Andrei
August 07, 2010
Re: tolf and detab
"Yao G." <nospamyao@gmail.com> wrote in message 
news:op.vg1qpcjfxeuu2f@miroslava.gateway.2wire.net...
>
> What does idiomatic D mean?
>

"idiomatic D" -> "In typical D style"
August 08, 2010
Re: tolf and detab
On Friday 06 August 2010 18:50:52 Andrei Alexandrescu wrote:
> On 08/06/2010 08:34 PM, Walter Bright wrote:
> > I wrote these two trivial utilities for the purpose of canonicalizing
> > source code before checkins and to deal with FreeBSD's inability to deal
> > with CRLF line endings, and because I can never figure out the right
> > settings for git to make it do the canonicalization.
> > 
> > tolf - converts LF, CR, and CRLF line endings to LF.
> > 
> > detab - converts all tabs to the correct number of spaces. Assumes tabs
> > are 8 column tabs. Removes trailing whitespace from lines.
> > 
> > Posted here just in case someone wonders what they are.
> 
> [snip]
> 
> Nice, though they don't account for multiline string literals.
> 
> A good exercise would be to rewrite these tools in idiomatic D2 and
> assess the differences.
> 
> 
> Andrei

I didn't try and worry about multiline string literals, but here are my more 
idiomatic solutions:



detab:

/* Replace tabs with spaces, and remove trailing whitespace from lines.
 */

import std.array;
import std.conv;
import std.file;
import std.stdio;
import std.string;

void main(string[] args)
{
   // First argument: tab size; the rest: files to rewrite in place.
   const int tabSize = to!int(args[1]);
   foreach(f; args[2 .. $])
       removeTabs(tabSize, f);
}


void removeTabs(int tabSize, string fileName)
{
   string[] output;

   {
       auto file = File(fileName);

       foreach(line; file.byLine())
       {
           // Replace each tab with enough spaces to reach the next tab stop.
           for(auto tab = line.indexOf('\t'); tab != -1; tab = line.indexOf('\t'))
           {
               const numSpaces = tabSize - tab % tabSize;
               line = line[0 .. tab] ~ replicate(" ", numSpaces) ~ line[tab + 1 .. $];
           }

           // Drop trailing whitespace; byLine reuses its buffer, so copy with idup.
           output ~= line.stripRight().idup;
       }
   } // file is closed here, before it is overwritten

   std.file.write(fileName, output.join("\n"));
}

-------------------------------------------

The three differences between mine and Walter's are that mine takes the tab size 
as the first argument, it doesn't put a newline at the end of the file, and it 
writes the file even if nothing changed (you could test for that, but when using 
byLine(), it's a bit harder). Interestingly enough, from the few tests that I 
ran, mine seems to be somewhat faster. I also happen to think that the code is 
clearer (it's certainly shorter), though that might be up for debate.
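Two of those differences go away if the whole file is read with `std.file.readText` rather than with `byLine()`. The final newline is simply copied through, and the result can be compared with the input before anything is written. What follows is a minimal sketch along those lines. The names `detabText` and `detabFile` are hypothetical, and like Walter's filter it counts one column per byte:

```d
import std.conv : to;
import std.file : readText, write;
import std.stdio : writeln;

/// Expand tabs to stops every `tabSize` columns and drop spaces at the end
/// of each line. Line terminators, including a final newline, are kept.
/// Like Walter's detab, it counts one column per byte (no UTF-8 awareness).
string detabText(string text, size_t tabSize = 8)
{
    char[] output;
    size_t column;

    foreach (char c; text)
    {
        if (c == '\t')
        {
            // Pad to the next multiple of tabSize.
            const n = tabSize - column % tabSize;
            foreach (_; 0 .. n)
                output ~= ' ';
            column += n;
            continue;
        }
        if (c == '\n' || c == '\r')
        {
            // End of line: remove the spaces just before the terminator.
            while (output.length && output[$ - 1] == ' ')
                output = output[0 .. $ - 1];
            column = 0;
        }
        else
            ++column;
        output ~= c;
    }
    // Trailing spaces on a last line without a newline.
    while (output.length && output[$ - 1] == ' ')
        output = output[0 .. $ - 1];
    return output.idup;
}

/// Rewrite `fileName` only if detabbing changes it; returns true if written.
/// readText throws if the file is not valid UTF-8.
bool detabFile(string fileName, size_t tabSize)
{
    auto input = readText(fileName);
    auto output = detabText(input, tabSize);
    if (output == input)
        return false;
    write(fileName, output);
    return true;
}

void main(string[] args)
{
    // Same command line as above: tab size first, then the file names.
    if (args.length < 2)
    {
        writeln("usage: detab tabsize files...");
        return;
    }
    const tabSize = args[1].to!size_t;
    foreach (f; args[2 .. $])
        if (detabFile(f, tabSize))
            writeln("rewrote ", f);
}
```

Run with a tab size followed by file names, it rewrites only the files whose contents actually change.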

-------------------------------------------



tolf:

/* Replace line endings with LF
 */

import std.file;
import std.string;

void main(string[] args)
{
   foreach(f; args[1 .. $])
       fixEndLines(f);
}

void fixEndLines(string fileName)
{
   auto fileStr = std.file.readText(fileName);
   auto result = fileStr.replace("\r\n", "\n").replace("\r", "\n");

   std.file.write(fileName, result);
}

-------------------------------------------

This version is ludicrously simple. And it was also faster than Walter's in the 
few tests that I ran. In either case, I think that it is definitely clearer code.
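The order of the two `replace` calls is what makes this version correct. CRLF pairs have to be collapsed before lone CRs are converted; otherwise every CRLF turns into two LFs. Here is a small check of that, with `toLF` as a hypothetical name for the same expression:

```d
import std.array : replace;
import std.stdio : writeln;

/// Normalize CRLF and lone CR line endings to LF (hypothetical helper).
/// "\r\n" is rewritten first; doing "\r" first would turn every CRLF
/// into "\n\n".
string toLF(string text)
{
    return text.replace("\r\n", "\n").replace("\r", "\n");
}

void main()
{
    auto mixed = "dos\r\nmac\runix\n";
    assert(toLF(mixed) == "dos\nmac\nunix\n");

    // The wrong order: CR first, after which no CRLF is left to collapse.
    auto wrongOrder = mixed.replace("\r", "\n").replace("\r\n", "\n");
    assert(wrongOrder == "dos\n\nmac\nunix\n");

    writeln("ok");
}
```

Unlike Walter's tolf, neither this sketch nor the version above drops NUL bytes.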


I would have thought that being more idiomatic would have resulted in slower code 
than what Walter did, but interestingly enough, both programs are faster with my 
code. They might take more memory though. I'm not quite sure how to check that. 
In any case, you wanted some idiomatic D2 solutions, so there you go.

- Jonathan M Davis
August 08, 2010
Re: tolf and detab
Jonathan M Davis:
> I would have thought that being more idiomatic would have resulted in slower code 
> than what Walter did, but interestingly enough, both programs are faster with my 
> code. They might take more memory though. I'm not quite sure how to check that. 
> In any case, you wanted some idiomatic D2 solutions, so there you go.

Your code looks better.

My (probably controversial) opinion on this is that the idiomatic D solution for those text "scripts" is to use a scripting language, as Python :-)

In this case a Python version is more readable, shorter, and probably faster too, because reading the lines of a _normal_ text file is faster in Python than in D (Python is more optimized for such purposes; I can show benchmarks on request).

On the other hand, D2 is in its debugging phase, so it's good to use it even for purposes it's not the best language for, to catch bugs or performance bugs. So I think it's positive to write such scripts in D2, even if in a real-world setting I'd use Python to write them.

Bye,
bearophile