Thread overview
[OT] Is this more readable, or just way too verbose?
Aug 09, 2010
simendsjo
Aug 09, 2010
bearophile
Aug 10, 2010
Lutger
Aug 10, 2010
simendsjo
Aug 10, 2010
Lutger
August 09, 2010
I took splitlines from std.string, which is a simple, short method.

S[] splitlines(S)(S s)
{
    size_t istart;
    auto result = Appender!(S[])();

    foreach (i; 0 .. s.length)
    {
        immutable c = s[i];
        if (c == '\r' || c == '\n')
        {
            result.put(s[istart .. i]);
            istart = i + 1;
            if (c == '\r' && i + 1 < s.length && s[i + 1] == '\n')
            {
                i++;
                istart++;
            }
        }
    }
    if (istart != s.length)
    {
        result.put(s[istart .. $]);
    }

    return result.data;
}


I guess it takes less than 30 seconds to fully understand this method. Then I rap.. I mean refactored it to this:

S[] mysplitlines(S)(S s)
{
    const CR = '\r';
    const LF = '\n';

    size_t istart;

    auto result = Appender!(S[])();

    foreach (i; 0 .. s.length)
    {
        immutable c = s[i];

        immutable isCR = (c == CR);
        immutable isLF = (c == LF);

        if (isCR || isLF)
        {
            auto line = s[istart .. i];
            result.put(line);

            istart = i + 1;

            // Might be CRLF. In that case we need to consume LF too
            if (isCR)
            {
                immutable hasMoreCharacters = (i + 1 < s.length);
                immutable nextIsLF = hasMoreCharacters && (s[i + 1] == LF);
                immutable isCRLF = isCR && nextIsLF;

                if (isCRLF)
                {
                    i++;
                    istart++;
                }
            }
        }
    }

    immutable lineNotEmpty = (istart != s.length);
    if (lineNotEmpty)
    {
        auto lastLine = s[istart .. $];
        result.put(lastLine);
    }

    return result.data;
}


Yes, I'm reading some books now and don't have much experience :)

It went from a small 26 line, very readable function to 48 lines (85% increase!) with many more temporary variables..

So... Do you think this kind of code is (more) readable, or just way too verbose and doing more harm than good?

And will the compiler generate slower code, or should the optimizer be able to inline the temporaries?
August 09, 2010
simendsjo:
> And will the compiler generate slower code, or should the optimizer be able to inline the temporaries?

Write a good enough benchmark and run it.

Bye,
bearophile
August 10, 2010
simendsjo wrote:

> I took splitlines from std.string, which is a simple, short method.
> 
> S[] splitlines(S)(S s)
> {
>      size_t istart;
>      auto result = Appender!(S[])();
> 
>      foreach (i; 0 .. s.length)
>      {
>          immutable c = s[i];
>          if (c == '\r' || c == '\n')
>          {
>              result.put(s[istart .. i]);
>              istart = i + 1;
>              if (c == '\r' && i + 1 < s.length && s[i + 1] == '\n')
>              {
>                  i++;
>                  istart++;
>              }
>          }
>      }
>      if (istart != s.length)
>      {
>          result.put(s[istart .. $]);
>      }
> 
>      return result.data;
> }
> 
> 
> I guess it takes less than 30 seconds to fully understand this method. Then I rap.. I mean refactored it to this:
> 
> S[] mysplitlines(S)(S s)
> {
>      const CR = '\r';
>      const LF = '\n';
> 
>      size_t istart;
> 
>      auto result = Appender!(S[])();
> 
>      foreach (i; 0 .. s.length)
>      {
>          immutable c = s[i];
> 
>          immutable isCR = (c == CR);
>          immutable isLF = (c == LF);
> 
>          if (isCR || isLF)
>          {
>              auto line = s[istart .. i];
>              result.put(line);
> 
>              istart = i + 1;
> 
>              // Might be CRLF. In that case we need to consume LF too
>              if (isCR)
>              {
>                  immutable hasMoreCharacters = (i + 1 < s.length);
>                  immutable nextIsLF = hasMoreCharacters && (s[i + 1] == LF);
>                  immutable isCRLF = isCR && nextIsLF;
> 
>                  if (isCRLF)
>                  {
>                      i++;
>                      istart++;
>                  }
>              }
>          }
>      }
> 
>      immutable lineNotEmpty = (istart != s.length);
>      if (lineNotEmpty)
>      {
>          auto lastLine = s[istart .. $];
>          result.put(lastLine);
>      }
> 
>      return result.data;
> }
> 
> 
> Yes, I'm reading some books now and don't have much experience :)
> 
> It went from a small 26 line, very readable function to 48 lines (85% increase!) with many more temporary variables..
> 
> So... Do you think this kind of code is (more) readable, or just way too verbose and doing more harm than good?

The CR and LF constants are a bit too much, probably because they don't really abstract over the literals which I can actually parse faster. The isCR and isLF are nice however. Taking it a step further:

bool canSplit = inPattern(c,"\r\n");
if (canSplit)
{
  ...

You have increased the nesting of ifs to 3 inside a for-loop. Personally I don't read deep nesting very well. To go for readability I would use a small function for the entire expression:

if( s[i..$].startsWithCRLF() ) // same as startsWithCRLF(s[i..$])
{
  i++;
  istart++;
}

or use std.algorithm: if ( s[i..$].startsWith("\r\n") )

> And will the compiler generate slower code, or should the optimizer be able to inline the temporaries?

I don't think it matters much, but you can only tell by testing. There is a benchmark function (called benchmark iirc) somewhere in phobos to start with and dmd has a builtin profiler too.
August 10, 2010
Lutger wrote:
> simendsjo wrote:
> 
(...)
> 
> The CR and LF constants are a bit too much, probably because they don't really abstract over the literals which I can actually parse faster. The isCR and isLF are nice however. Taking it a step further:
> 
> bool canSplit = inPattern(c,"\r\n");
> if (canSplit)
> {
>   ...
> 
> You have increased the nesting of ifs to 3 inside a for-loop.Personally I don't read deep nesting very well. To go for readability I would use a small function for the entire expression: 
> 
> if( s[i..$].startsWithCRLF() ) // same as startsWithCRLF(s[i..$])
> {
>   i++;
>   istart++;
> }
> 
> or use std.algorithm: if ( s[i..$].startsWith("\r\n") )
>  
(...)
Nice. I didn't increase the if nesting though.

Something like this then?

S[] mysplitlines(S)(S s)
{
    size_t istart;

    auto result = Appender!(S[])();

    foreach (i; 0 .. s.length)
    {
        immutable c = s[i];
        immutable isEOL = inPattern(c, "\r\n");
        if (isEOL)
        {
            auto beforeEOL = s[istart .. i];
            result.put(beforeEOL);

            auto rest = s[i .. $];
            immutable isCRLF = rest.startsWith("\r\n");

            istart = i + 1; // consume first EOL character
            if (isCRLF) // skip \n too
            {
                i++;
                istart++;
            }
        }
    }

    // The last line might not end with EOL
    immutable lineNotEmpty = (istart != s.length);
    if (lineNotEmpty)
    {
        auto lastLine = s[istart .. $];
        result.put(lastLine);
    }

    return result.data;
}
August 10, 2010
simendsjo wrote:

> Lutger wrote:
...
>  I didn't increase the if nesting though.

I count 2 nested if-statements inside of the foreach loop in the original, you
have 3 nested if-statements.

> Something like this then?

Looks good to me, yes.