June 21, 2011
On 6/21/11, bearophile <bearophileHUGS@lycos.com> wrote:
> After translating so much code have you seen some recurring patterns in your code conversion bugs?

A lot of C code is missing default labels. I've used the github 2.054 clone to compile all examples in order to catch missing labels at compile time. I've missed many of them before this feature became available.

Float/Doubles are initialized to NaN in D, however I don't know what they're initialized to in C. But it seems none of the examples had floating point variables initialized. This causes halts/exceptions in D, and without ddbg it's really hard to track down what went wrong. I don't think the stack traces helped me in this regard. Generally forgetting to initialize floats seems to be common in other C code as well.

C struct initializer syntax is used for static arrays of structs in the C code. I think this could easily be a source of bugs if you add or remove fields from a struct. I'm not sure if that syntax works in D but I've used explicit ctor calls instead, such as:

struct Foo { int x, y; }
Foo[5] foo = [Foo(1, 2), Foo(1, 2), Foo(1, 2), Foo(1, 2), Foo(1, 2)];

It's verbose but it's safer.

In some code a variable would be defined as an int, and returned as a short. There are other cases where it's defined as an int, and then casted to a ubyte (BYTE is an alias for ubyte there). I have to fix some of these.

I did notice an annoying thing with D, for example:
ubyte val = min(0, 128);
Error: cannot implicitly convert expression (min(0,128)) of type int to ubyte

The problem with this is I have to add a cast or a safer to!() call.
to!() can only do a runtime check:
ubyte val = to!ubyte(min(1024, 2048));  // overflow exception at runtime

And a cast is completely unsafe:
ubyte val = cast(ubyte)(min(1024, 2048));  // no error, bugs introduced!

Of course the compiler doesn't know what min does, only that it returns an int, so I don't know whether it could ever figure out that a function would never return a value that overflows the assignment variable (maybe this feature could be implemented for pure functions..).

C Globals become an instant source of bugs as soon as you fire up a
new thread. This is due to D's thread-local storage by default
feature. I like D's approach, and C globals should be replaced with
messaging or some other technique. I've used __gshared in various
modules to make the examples compile and work properly, but this
should be replaced.
This isn't unique to these examples. I've seen globals used everywhere
for messaging purposes in C code, e.g. some audio processing examples
in a different library.

A weird thing I've seen in the examples was a pointer to a static variable was passed to a newly spawned thread's function. And via this static variable the original thread and the work thread communicated together. How this can work is beyond me, but it can't work in D. Luckily D's spawn() function does offer protection for this kind of code.

These examples used implicit fallbacks a lot, with only a comment
explicitly mentioning that there is a fallback in play. I think it's
good that we've abandoned implicit fallbacks, they could easily cause
bugs.
But, on the other hand someone might get an error about an implicit
fallback and think they forgot to add a "break" when porting C code.
So they add that, but now they've introduced a bug because the
fallback was actually necessary. I had this happen to me in one
example because the fallback wasn't explicitly mentioned.

There are issues with calling format() and writef(). This is not just
for this translation project. Most C code uses some form of
printf/wsprintf to either store a string to a variable or print it
out. But the format specifiers often do not translate into D. I think
what is probably causing this is that C types are not the same as D
types, so when you see this in C code:
printf("Note Count %ld, (long) notescount);

This might not translate into D and you could easily get an exception at runtime (a pity that this doesn't get caught at compile-time). The root of the problem is that the format specifiers explicitly name the type, while you inadvertently change the actual type in D (maybe you were fixing a conversion bug), and then you randomly get an exception thrown at runtime.

The #1 problem with these exceptions is that you get no error line
whatsoever. You just get back "Invalid format specifier" or something
like that. This is of no help, and I always have to fire up ddbg or
manually hunt down each format specifier and replace it with D's safe
"%s", which works great.
The exceptions with no error line problem is common for most Phobos
functions, I absolutely hate getting back an error message such as:
std.exception.ErrnoException@std\stdio.d(286): Cannot open file
`blablabla' in mode `r' (No such file or directory)

No stack trace, no error line except the error line in Phobos. I *don't care* about error lines in Phobos, the error is in my code not in Phobos, and I really think this needs to improve.

Many functions assume ASCII or USC2, both of which are wrong to assume
these days. I have some examples I still have to fix, because I've
used code like:
    string buffer;
    TextOut(buffer.toUTF16z, buffer.length);
This is incorrect code. TextOut takes the number of characters, not
code units when issuing the call. The code will compile and run fine
for the majority of cases because most characters will fit into
wchar's code point, and you will have a 1to1 code-point to code-unit
ratio. But add a Japanese character and you will see garbage being
printed out to the screen. The correct workaround for this is to
either:
1) Store everything as a dchar[]:
    dstring buffer;
    TextOut(buffer.toUTF16z, buffer.length);  // correct as long as
UTF never has more than 4 code units
2) Use some Phobos code-point counting function
    string buffer;
    TextOut(buffer.toUTF16z, std.utf.count(buffer));

The second option might cause performance issues, while using UTF32 dchars might cause excessive use of memory. Other solutions might be to wrap strings and cache the count of code points. It's a balancing act between memory and performance.

Some API functions take char* as a parameter when expecting some pointer to a data buffer which can but doesn't necessarily store text. I don't know why char* is used instead of void*, perhaps char* are more often used with these functions so they've decided to type it like that.

There are many cases of implicit 0 to void* casts, but D catches this at compile time.

A big issue with porting is that D's character arrays are always initialized to invalid values. This wreaks havoc in code, in particular with static arrays. For example:

    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(buffer[]);  // <- buggy code

The problem is twofold. First, the API call might not work because it
could easily expect a null terminator to be present as a first
character:
    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);  // Might not work!

So I've had to do this:
    char[100] buffer;
    buffer[] = '\0';  // or buffer[0] = '\0', depending on the function
    APIWriteToBuffer(buffer.ptr);

Second, after the call you will have a buffer that has some valid text stored into it, and then either null terminators to the length of the array (if you did buffer[] = '\0'), OR you will have a null terminator followed by a series of char.init invalid UTF values.

So when you finally call a D function with such a static array, it
could print out garbage or throw an exception:
    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(buffer[]);  // what will this function do when
stumbles on '\0' or char.init?

Behind the scense that D function might in turn call a C function, so
continued from the last example:
    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(buffer[]);

    void myDStringFunction(char[] buffer)
    {
        // huge mess, the API function might write nonsense like ['f',
'o', 'o', '\0', char.init, char.init..]
        APIPrintToScreen(buffer.ptr, buffer.length);
    }

So you have to do conversion and duplication all the time. You have to do:

    char[100] buffer;
    APIWriteToBuffer(buffer.ptr);  // buffer might now be ['f', 'o',
'o', '\0', char.init, char.init..];
    myDStringFunction(to!string(buffer.ptr.toStringz));

How it works: First, you need to treat the buffer as a C null-terminated char array, hence using .ptr. Then, you have to call a Phobos function so it can retrieve a string until the first null terminator.

But it gets worse when UTF is in play because in v2.043 conv.to doesn't know how to convert wchar*'s. I've made a request on github and I think this feature will be incorporated, but for now you can't do this:

    wchar[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(to!string(buffer.ptr));  // error, to!() can't
convert wchar*

So I've had to create my own function which converts a wchar* to wchar[], which can then be used for whatever purpose:

    wchar[100] buffer;
    APIWriteToBuffer(buffer.ptr);
    myDStringFunction(to!string(fromWStringz(buffer.ptr)));

So generally static arrays of characters and string handling between C and D can be painful.
June 21, 2011
On 6/21/2011 12:35 PM, Andrej Mitrovic wrote:
> [...]

This is interesting and valuable information. Please add it to your repository as a readme, or even as a blog post!
June 21, 2011
On 6/21/2011 12:35 PM, Andrej Mitrovic wrote:
> Float/Doubles are initialized to NaN in D, however I don't know what
> they're initialized to in C.

In C they're initialized to 0 for globals, and garbage for locals and allocated data.


> A big issue with porting is that D's character arrays are always
> initialized to invalid values. This wreaks havoc in code, in
> particular with static arrays. For example:
>
>      char[100] buffer;
>      APIWriteToBuffer(buffer.ptr);
>      myDStringFunction(buffer[]);  //<- buggy code

In C buffer is not initialized to 0, if it is not global, it is initialized to garbage.

> The problem is twofold. First, the API call might not work because it
> could easily expect a null terminator to be present as a first
> character:
>      char[100] buffer;
>      APIWriteToBuffer(buffer.ptr);  // Might not work!

Fix:

       char[100] buffer = 0;

> So when you finally call a D function with such a static array, it
> could print out garbage or throw an exception:
>      char[100] buffer;
>      APIWriteToBuffer(buffer.ptr);
>      myDStringFunction(buffer[]);  // what will this function do when
> stumbles on '\0' or char.init?

Fix:
	char[100] buffer;
	APIWriteToBuffer(buffer.ptr);
        auto p = buffer[0 .. strlen(buffer.ptr)];
	myDStringFunction(p);
June 21, 2011
On 6/21/11, Walter Bright <newshound2@digitalmars.com> wrote:
> Fix:
>
>         char[100] buffer = 0;

I didn't know about this syntax! I've been under a rock apparently.

> Fix:
> 	char[100] buffer;
> 	APIWriteToBuffer(buffer.ptr);
>          auto p = buffer[0 .. strlen(buffer.ptr)];
> 	myDStringFunction(p);
>

Yeah I was going to suggest that. My only problem was I didn't know where to look for an strlen equivalent for UTF, but I found it in std.utf and it's called stride().

I'll fix these examples soon. I'll give a thought or two about that blogging idea as well, thanks.
June 21, 2011
On 2011-06-21 13:46, Andrej Mitrovic wrote:
> On 6/21/11, Walter Bright <newshound2@digitalmars.com> wrote:
> > Fix:
> > char[100] buffer = 0;
> 
> I didn't know about this syntax! I've been under a rock apparently.
> 
> > Fix:
> > char[100] buffer;
> > APIWriteToBuffer(buffer.ptr);
> > 
> > auto p = buffer[0 .. strlen(buffer.ptr)];
> > 
> > myDStringFunction(p);
> 
> Yeah I was going to suggest that. My only problem was I didn't know where to look for an strlen equivalent for UTF, but I found it in std.utf and it's called stride().

stride isn't a UTF version of strlen. stride returns how many code units it is to the beginning of the next code point. You can use it to get the number of characters in a string by calling it in a loop, but it isn't a UTF version of strlen.

- Jonathan M Davis
June 21, 2011
On 6/21/11, Jonathan M Davis <jmdavisProg@gmx.com> wrote:
> stride isn't a UTF version of strlen.

Brain misshap, I meant std.utf.count.
June 21, 2011
Walter, would it be possible to make .map file generation follow the -od flag? It's odd when a  build script fills a directory with map files while executables and object files are properly in their own directories as specified via -od and -of flags.

Btw, I have just pushed a changeset and made the build script
parallel. I turned this:
foreach (dir; dirs)

into this:
foreach (dir; taskPool.parallel(dirs, 1))

And added other small cosmetic changes, but the above is the only real change.

Here's my timings:
Serial build: 1min, 3sec
Parallel build: 22sec
June 21, 2011
On 6/20/11 11:14 PM, Andrej Mitrovic wrote:
> This is a translation project of Charles Petzold's Programming Windows
> (5th edition) book code samples.
>
> Currently over 120 code samples have been translated into D, with only
> a few modules remaining.
>
> Everything else you need to know is in the Readme file:
> https://github.com/AndrejMitrovic/DWindowsProgramming
>
> The examples were tested on fresh installs of XP and Win7 with the
> only dependency being DMD v2.053 and an NT operating system. I hope
> everyone will be able to build these examples without too much
> trouble. *crosses fingers*

Now on reddit:

http://www.reddit.com/r/programming/comments/i5m0c/petzolds_windows_programming_5th_ed_code_samples/


Andrei
June 21, 2011
Am 21.06.2011, 20:32 Uhr, schrieb Andrej Mitrovic <andrej.mitrovich@gmail.com>:
> I don't know whether it is planned to incorporate the WinAPI bindings
> into Phobos/Druntime, there's at least 2.5 megs of code in prototypes
> and I'm not sure about the license (it's based on the MinGW headers).

That is indeed long overdue!
June 21, 2011
Am 21.06.2011 06:14, schrieb Andrej Mitrovic:
> This is a translation project of Charles Petzold's Programming Windows (5th edition) book code samples.
> 
> Currently over 120 code samples have been translated into D, with only a few modules remaining.
> 
> Everything else you need to know is in the Readme file: https://github.com/AndrejMitrovic/DWindowsProgramming
> 
> The examples were tested on fresh installs of XP and Win7 with the only dependency being DMD v2.053 and an NT operating system. I hope everyone will be able to build these examples without too much trouble. *crosses fingers*

The README mentions 2.043 instead of 2.053.

Cheers,
- Daniel