palindrome function

Aug 07, 2010

Aug 08, 2010

Hello! I'm learning D and I was wondering on what can be optimized of my implementation of a palindrome function: bool isPalindrome(int num) { string s = text(num); int len = s.length; for (int i = 0; i < len / 2; ++i) { if (s[i] != s[len - 1 - i]) { return false; } } return true; } 1) Is std.conv.text() the best function to call here to convert a number to string? 2) Does using a variable to store the string length is more efficient than calling the length method? 3) This example use numeric characters but if the string were UTF8 the [] operator would be OK for multibyte characters? Not related: 4) Is some way to declare the nested functions at the end, this way I could create nice "routines" of a function. 5) I could not find in the documentation if when I declare a variable inside a block (a loop for example) that variable creation is optimized by the compiler.

Sorry I forg to paste my last version" bool isPalindrome(int num) { string s = text(num); int length = s.length; int limit = length / 2; for (int i = 0; i < limit; ++i) { if (s[i] != s[$ - 1 - i]) { return false; } } return true; }

On Saturday 07 August 2010 16:56:08 ex wrote: > Hello! > I'm learning D and I was wondering on what can be optimized of my > implementation of a palindrome function: > > bool isPalindrome(int num) { > string s = text(num); > int len = s.length; > for (int i = 0; i < len / 2; ++i) { > if (s[i] != s[len - 1 - i]) { > return false; > } > } > return true; > } > > 1) Is std.conv.text() the best function to call here to convert a number to > string? Use to!string(num) instead of text(num). > 2) Does using a variable to store the string length is more efficient than calling the length method? length is a property, not a method. I find it highly unlikely that storing length in a variable is going to make it any faster. > 3) This example use numeric characters but if the string were UTF8 the [] operator would be OK for multibyte characters? Using [] with strings is a _bad_ idea. [] gives you the chars (or wchars in the case of wstring). chars are UTF-8 code units, and wchars are UTF-16 code units. If you want to use [] with a string, use a dstring, since dchars are UTF-32 code units, and unlike UTF-8 and UTF-16, a code unit and a code point are the same size. A code point represents what you would usually consider a character, while a code unit is one portion of a character's encoding (which could be the whole thing if the character is only one code unit which ASCII characters would be). For instance, if you're looping over a string or wstring, you need to use dchar even if though the array itself is char or wchar: foreach(dchar c; str) { //... } The compiler understands the conversion and loops over the code points rather than the code units. [] isn't going to work with strings and wstrings if you want code points (characters) rather than code units. So, in your function, you'd want to use dstring rather than string. So, you'd do auto s = to!dstring(num) to get your string. > > Not related: > > 4) Is some way to declare the nested functions at the end, this way I could create nice "routines" of a function. You can declare nested functions, but I think that they're like local variables in that they have to be declared before you can use them. Otherwise, the symbol isn't in scope yet when you try and use it. > 5) I could not find in the documentation if when I declare a variable inside a block (a loop for example) that variable creation is optimized by the compiler. Whether the compiler optimizes stuff or not is not likely to be in the documentation. That's not part of the D spec and is entirely implementation- dependent. You're going to have to look at the generated assembly code if you want to know what it actually, does. As for a better way to do what you're doing, I'd do this: bool isPalindrome(int num) { auto s = to!dstring(num); while(!s.empty) { if(s.length == 1) return true; if(s.front != s.back) return false; s.popFront(); s.popBack(); } return true; } It would be more typical to use strings as ranges in this manner. Also, in this case, it doesn't matter whether you use string, wstring, or dstring, since front, back, popFront(), and popBack() all deal with the unicode properly. front and back return dchar in all cases, and popFront() and popBack() always pop a code point rather than a code unit. It's probably better to use a dstring in this case because then no conversions have to take place from UTF-8 or UTF-16 to UTF-32, but it probably doesn't matter unless you're calling the function a lot. In addition, this version of the code is easier to read than having to deal with array indicies. - Jonathan M Davis

As a side note, I'd suggest making your function take a string of some kind, if not outright making it a template to deal with multiple string types. That way, you can check for a palindrome regardless of whether you're dealing with numbers, and if you need to do it with a number, you just convert it to a string when you call isPalidrome. So, you're function (without changing anything other than the signature) would become something like bool isPalindrome(S)(in S s) { int length = s.length; int limit = length / 2; for (int i = 0; i < limit; ++i) { if (s[i] != s[$ - 1 - i]) { return false; } } return true; } and my version would look like this: bool isPalindrome(S)(in S s) { while(!s.empty) { if(s.length == 1) return true; if(s.front != s.back) return false; s.popFront(); s.popBack(); } return true; } So, calls to it with a number would look like isPalindrome(to!string(num)) - Jonathan M Davis

August 08, 2010

Re: palindrome function

Posted by bearophile
in reply to ex

Permalink

bearophile

Posted in reply to ex

Permalink

ex:
Welcome to D :-) Few more comments beside the ones written by Jonathan M Davis.

> I'm learning D and I was wondering on what can be optimized of my implementation of a palindrome function:

See below.


> 2) Does using a variable to store the string length is more efficient than calling the length method?

Nope. When you have similar questions you can write a small benchmark and you can take a look at the produced assembly code.


> 4) Is some way to declare the nested functions at the end, this way I could create nice "routines" of a function.

I don't understand this question. But inside a function the names need to be written in their natural visibility order, so you have to define nested functions before you use them.


> 5) I could not find in the documentation if when I declare a variable inside a block (a loop for example) that variable creation is optimized by the compiler.

With DMD sometimes the compiler is able to pull it out of the loop, and some other times it's not able to do it. Generally in the simple cases DMD is able to do it. But I have seen were updating an object field inside a method was not pulled out of the loop and was not replaced by a temporary variable by the DMD compiler. LDC (D1) compiler is usually better in this regard. So if you have a performance critical routine it's better to manually move all things out of the loop :-)


> Sorry I forg to paste my last version"
> 
> bool isPalindrome(int num) {
>     string s = text(num);
>     int length = s.length;
>     int limit = length / 2;
>     for (int i = 0; i < limit; ++i) {
>         if (s[i] != s[$ - 1 - i]) {
>             return false;
>         }
>     }
>     return true;
> }

D is a multi-level language, so usually there are always ways to further optimize code. At the end you can even translate your routine in inlined assembly. Often you don't need to write the faster code, a more natural implementation is enough. Idiomatic D style is often not the most efficient way to write your code, so if you look for max efficiency, you end with code that is often not idiomatic.


This is written in C-style, and seems faster:

import std.c.stdio: sprintf;
import std.c.stdlib: exit, EXIT_FAILURE;

size_t isPalindrome(int num) {
    if (num == int.min) // because you can't negate int.min
        return 0;
    if (num < 0)
        num = -num;

    char[20] s = void;
    char* inf = s.ptr;
    int size = sprintf(inf, "%d", num); // atoi is faster
    if (size < 0)
        exit(EXIT_FAILURE); // or you can return 2 as error code
    char* sup = &(s[size - 1]);

    while (inf < sup)
        // you can add a loop invariant test here
        if (*inf++ != *sup--)
            return 0;
    return 1;
}

void main() {
    assert(isPalindrome(0));
    assert(!isPalindrome(10));
    assert(isPalindrome(101));
    assert(!isPalindrome(1010));
    assert(isPalindrome(-1));
    assert(isPalindrome(-101));
    assert(isPalindrome(1234554321));
    assert(isPalindrome(123454321));
}


This function allocates the string on the stack to avoid heap activity, returns a word because it's sometimes more efficient than fiddling with a byte type (the bool).

But it's not the fastest possible you can write, even if you don't touch assembly, because sprintf() is not the most efficient here, you can write your own faster atoi in D:


size_t isPalindrome(int num) {
    if (num == int.min) // because you can't negate int.min
        return 0;
    if (num < 0)
        num = -num;

    int[20] snum = void;
    int* inf = snum.ptr;
    int* sup = inf;

    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
    if (!num) goto END; *sup++ = num % 10; num /= 10;
  END:
    sup--;

    while (inf < sup)
        if (*inf++ != *sup--)
            return 0;
    return 1;
}

void main() {
    assert(isPalindrome(0));
    assert(isPalindrome(5));
    assert(!isPalindrome(10));
    assert(isPalindrome(101));
    assert(!isPalindrome(1010));
    assert(isPalindrome(-1));
    assert(isPalindrome(-101));
    assert(isPalindrome(1234554321));
    assert(isPalindrome(123454321));
    assert(!isPalindrome(int.max));
}


But DMD is not good at coverting %10 and /10 in shifts and the like as better compilers are able to do, so such code fiddly code ends slower than the version with sprintf(). If compiled with LDC this latest version can be faster than the version with sprintf.

In normal D programs you never write code like that unless your profiler has shown you finding the palindrome is time-critical in your program, and even then you may start thinking about 1/2 GB of RAM to pre-compute all palindromes of 32-bit inputs (keeping one bit for each answer), but this is not nice for the CPU caches, so tests and timings are needed.

Bye,
bearophile

Forums