March 22, 2006
Rory, I've put together some code that I got to work. You can download it in the following zip file: http://spottedtiger.tripod.com/Downloads/SampleDLL.zip

David L.

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
-------------------------------------------------------------------

MKoD: http://spottedtiger.tripod.com/D_Language/D_Main_XP.html
March 22, 2006
In article <dvqi0d$29mi$1@digitaldaemon.com>, David L. Davis says...
>
>Rory, I've put together some code that I got to work. You can download it in the following zip file: http://spottedtiger.tripod.com/Downloads/SampleDLL.zip
>
>David L.

Thanks.  I was going to ask for this.

Tom J


March 22, 2006
James Dunne wrote:
> Rory Starkweather wrote:
> 
>> ...
> 
> As much as I'd love to help you out, I'm also hopelessly lost.  Can you ZIP up your project and document your test cases and post them on the NG or e-mail them to me personally?  The e-mail I post with on the newsgroups is my valid e-mail.
> 

The working code is attached, along with a sample Excel spreadsheet containing embedded VBA test code.

VB uses the equivalent of D's wchar* strings, so you cannot easily use D's Phobos string-handling functions: most of them expect char[] arguments, not wchar[].

My conclusion is that we really need wchar[] and dchar[] string-handling functions in Phobos, or at least need to be assured that all the string-handling functions which accept char[] arguments assume UTF-8 encoding, not just ASCII.

I had to look at the find() function in the std.string module to see whether it accepts ASCII or UTF-8, and I couldn't figure out which!  The code is:

int find(char[] s, dchar c)
{
    char* p;

    if (c <= 0x7F)
    {   // Plain old ASCII
        p = cast(char*)memchr(s, c, s.length);
        if (p)
            return p - cast(char *)s;
        else
            return -1;
    }

    // c is a universal character
    foreach (int i, dchar c2; s)
    {
        if (c == c2)
            return i;
    }
    return -1;
}

It doesn't make a lick of sense to me that one can iterate over a char[] with foreach and expect dchars outside the ASCII range to come out of it...  Is there something going on under the hood that I'm not aware of?  Is this code implying that the char[] is magically being treated as UTF-8?

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/MU/S d-pu s:+ a-->? C++++$ UL+++ P--- L+++ !E W-- N++ o? K? w--- O
M--@ V? PS PE Y+ PGP- t+ 5 X+ !R tv-->!tv b- DI++(+) D++ G e++>e
h>--->++ r+++ y+++
------END GEEK CODE BLOCK------

James Dunne
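
As a rough sketch of the workaround described above, one option is to transcode the UTF-16 text to UTF-8 and then search it. This assumes a current D2 compiler and Phobos, where std.utf.toUTF8 does the conversion and std.string.indexOf plays the role that find() plays in the code quoted above. The helper name findInWide and the sample text are made up for illustration.

```d
import std.stdio : writeln;
import std.string : indexOf;
import std.utf : toUTF8;

// Hypothetical helper: search UTF-16 text by converting it to UTF-8 first.
// Returns the UTF-8 code unit offset of the first match, or -1 if absent.
ptrdiff_t findInWide(const(wchar)[] text, dchar needle)
{
    string utf8 = toUTF8(text);    // transcode UTF-16 to UTF-8
    return indexOf(utf8, needle);  // search the UTF-8 copy
}

void main()
{
    // U+00EF occupies two code units in UTF-8, so 'v' lands at offset 4.
    writeln(findInWide("na\u00EFve"w, 'v'));  // prints 4
    writeln(findInWide("abc"w, 'z'));         // prints -1
}
```

Note that the result is an offset into the converted UTF-8 text, so it cannot be used directly as an index into the original wchar data.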


March 23, 2006
On Thu, 23 Mar 2006 02:59:09 +1100, James Dunne <james.jdunne@gmail.com> wrote:



> I had to look at the std.string.d module's find() method to see if it is
> accepting ASCII or UTF-8, and I couldn't figure out which!  The code is:
>
> int find(char[] s, dchar c)
> {
>      char* p;
>
>      if (c <= 0x7F)
>      {	// Plain old ASCII
> 	p = cast(char*)memchr(s, c, s.length);
> 	if (p)
> 	    return p - cast(char *)s;
> 	else
> 	    return -1;
>      }
>
>      // c is a universal character
>      foreach (int i, dchar c2; s)
>      {
> 	if (c == c2)
> 	    return i;
>      }
>      return -1;
> }
>
> This doesn't make a lick of sense to me why one can iterate over a
> char[] with foreach, expecting dchars to come out of it outside the
> range of ASCII...  is there something going on under the hood that I'm
> not aware of?

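foreach has a mode of operation that automatically decodes UTF encodings one character at a time: if the loop variable is declared as a dchar, each iteration yields one decoded character from the char[].  Thus foreach (dchar c; "abcdef"c) is valid code.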

> Is this code trying to imply that the char[] is being
> treated as UTF-8 magically?

char[] *is* UTF-8; it is not ASCII.  No magic here.

-- 
Derek Parnell
Melbourne, Australia
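
A minimal sketch of that decoding behaviour, assuming a current D2 compiler. The function names are hypothetical. The same UTF-8 text is walked twice: once with a char loop variable, which yields raw code units, and once with a dchar loop variable, which yields decoded characters.

```d
import std.stdio : writeln;

// Counts iterations when foreach yields raw UTF-8 code units.
size_t countCodeUnits(const(char)[] s)
{
    size_t n = 0;
    foreach (char unit; s)
        ++n;
    return n;
}

// Counts iterations when foreach decodes the text into dchars.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // Three characters (U+00E5, U+00E4, U+00F6), two UTF-8 code units each.
    string s = "\u00E5\u00E4\u00F6";
    writeln(countCodeUnits(s));   // prints 6
    writeln(countCodePoints(s));  // prints 3
}
```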
March 23, 2006
Derek Parnell wrote:
> On Thu, 23 Mar 2006 02:59:09 +1100, James Dunne <james.jdunne@gmail.com>  wrote:
> 
> 
> 
>> I had to look at the std.string.d module's find() method to see if it is
>> accepting ASCII or UTF-8, and I couldn't figure out which!  The code is:
>>
>> int find(char[] s, dchar c)
>> {
>>      char* p;
>>
>>      if (c <= 0x7F)
>>      {    // Plain old ASCII
>>     p = cast(char*)memchr(s, c, s.length);
>>     if (p)
>>         return p - cast(char *)s;
>>     else
>>         return -1;
>>      }
>>
>>      // c is a universal character
>>      foreach (int i, dchar c2; s)
>>      {
>>     if (c == c2)
>>         return i;
>>      }
>>      return -1;
>> }
>>
>> This doesn't make a lick of sense to me why one can iterate over a
>> char[] with foreach, expecting dchars to come out of it outside the
>> range of ASCII...  is there something going on under the hood that I'm
>> not aware of?
> 
> 
> The foreach() has a mode of operation that automatically converts UTF  encodings one character at a time.  Thus  foreach(dchar c; "abcdef"c) is  valid code.
> 

Okay, that's where I was confused.  Thanks!

>> Is this code trying to imply that the char[] is being
>> treated as UTF-8 magically?
> 
> 
> char[] *is* utf-8 it is not ASCII.  No magic here.
> 

I understand that; I just didn't know about the hidden foreach magic.

-- 
James Dunne
March 23, 2006
 I guess I would like to ask why I shouldn't do this at the same time I ask how to do it.

 I've been looking at a piece of code like:

    foreach (int i, dchar c; theString)
    {
        if (c == searchChar)
            return i + 1;
    }
    return 0;
}

 I understand the reason for doing this, but prefer to do things like this:

    int iPointer;

    iPointer = 0;
    foreach (int i, dchar c; theString)
    {
        if (c == searchChar)
            iPointer = i + 1;
            // ?? break;
    }
    return (iPointer);
}

 I realize that the extra integer takes up a little memory space, but . . .

 My questions are:
Will 'break' work here?
Why not do it this way?
March 24, 2006
On Thu, 23 Mar 2006 17:51:09 -0600, Rory Starkweather wrote:

>   I guess I would like to ask why I shouldn't do this at the same time I
> ask how to do it.
> 
>   I've been looking at a piece of code like:
> 
> 	foreach (int i, dchar c; theString)
> 	{
>                  if (c == searchChar)
> 		return i + 1;
> 	}
> 	return 0;
> }
> 
>   I understand the reason for doing this, but prefer to do things like this:
> 
> 	int iPointer;
> 
> 	iPointer = 0;
> 	foreach (int i, dchar c; theString)
> 	{
>                  if (c == searchChar)
> 		iPointer =  i + 1;
> 		// ?? break;
> 	}
> 	return (iPointer);
> }
> 
>   I realize that the extra integer takes up a little memory space, but . . .
> 
>   My questions are:
> Will 'break' work here?
Yes, it will, though it should be coded with braces so that the break runs only when a match is found:

 iPointer = 0;
 foreach (int i, dchar c; theString)
 {
   if (c == searchChar)
   {
     iPointer =  i + 1;
     break;
   }
 }
 return (iPointer);


> Why not do it this way?

It is just a coding-style issue. People code to different standards.

BTW, using the foreach this way can be misleading. The pointer value returned represents the number of dchars examined and *not* an index into theString. This is significant if theString is not a dchar[].

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
24/03/2006 11:07:48 AM
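
For comparison, here is a minimal sketch of both loop variants as complete functions, assuming a current D2 compiler. The function names are hypothetical, and the function signature is a guess, since the fragments above do not show it. Without the break, the loop keeps running, so the result is the position of the last match rather than the first.

```d
import std.stdio : writeln;

// With break: stops at the first match. Returns a 1-based position, or 0.
int firstPosition(const(char)[] theString, dchar searchChar)
{
    int iPointer = 0;
    foreach (i, dchar c; theString)
    {
        if (c == searchChar)
        {
            iPointer = cast(int) i + 1;
            break;
        }
    }
    return iPointer;
}

// Without break: every later match overwrites the earlier one.
int lastPosition(const(char)[] theString, dchar searchChar)
{
    int iPointer = 0;
    foreach (i, dchar c; theString)
    {
        if (c == searchChar)
            iPointer = cast(int) i + 1;
    }
    return iPointer;
}

void main()
{
    writeln(firstPosition("hello", 'l'));  // prints 3
    writeln(lastPosition("hello", 'l'));   // prints 4
    writeln(firstPosition("hello", 'z'));  // prints 0
}
```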
March 24, 2006
Derek Parnell wrote:

> BTW, using the foreach this way can be misleading. The pointer value returned represents the number of dchars examined and *not* an index into theString. This is significant if theString is not a dchar[].

That is not correct. The index returned is an index into the char[] array (the offset of the code unit where the character starts), not the number of dchars processed:

import std.stdio;

void main() {
        foreach(size_t ix, dchar c; "åäö"c)
                writefln("c = %s, ix = %s",c,ix);
}

Prints:

c = å, ix = 0
c = ä, ix = 2
c = ö, ix = 4

/Oskar
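
If a character count is wanted instead of an array offset, it can be derived from the offset. The following is only a sketch, assuming a current D2 compiler; offsetOf and codePointsBefore are hypothetical names, not Phobos functions.

```d
import std.stdio : writefln;

// Code unit offset of the first occurrence of needle, or -1 if absent.
ptrdiff_t offsetOf(const(char)[] s, dchar needle)
{
    foreach (i, dchar c; s)
    {
        if (c == needle)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

// Number of characters that precede code unit offset ix.
// ix must fall on a character boundary, otherwise decoding the slice fails.
size_t codePointsBefore(const(char)[] s, size_t ix)
{
    size_t n = 0;
    foreach (dchar c; s[0 .. ix])
        ++n;
    return n;
}

void main()
{
    string s = "\u00E5\u00E4\u00F6";       // three two-byte characters
    auto ix = offsetOf(s, '\u00F6');        // 4: offset into the char array
    if (ix >= 0)
        writefln("offset = %s, characters before it = %s",
                 ix, codePointsBefore(s, cast(size_t) ix));
}
```

For pure ASCII text the two numbers coincide, which is why the difference is easy to miss.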
March 24, 2006
Derek Parnell wrote:
> On Thu, 23 Mar 2006 17:51:09 -0600, Rory Starkweather wrote:
> 
> 
>>  I guess I would like to ask why I shouldn't do this at the same time I 
>>ask how to do it.
>>
>>  I've been looking at a piece of code like:
>>
>>	foreach (int i, dchar c; theString)
>>	{
>>                 if (c == searchChar)
>>		return i + 1;
>>	}
>>	return 0;
>>}
>>
>>  I understand the reason for doing this, but prefer to do things like this:
>>
>>	int iPointer;
>>	
>>	iPointer = 0;
>>	foreach (int i, dchar c; theString)
>>	{
>>                 if (c == searchChar)
>>		iPointer =  i + 1;
>>		// ?? break;
>>	}
>>	return (iPointer);
>>}
>>
>>  I realize that the extra integer takes up a little memory space, but . . .
>>
>>  My questions are:
>>Will 'break' work here?
> 
> Yes it will, though it should be coded ...

 Understood. The // ?? was added for emphasis.

> 
>  iPointer = 0;
>  foreach (int i, dchar c; theString)
>  {
>    if (c == searchChar)
>    {
>      iPointer =  i + 1;
>      break;
>    }
>  }
>  return (iPointer);
> 
> 
> 
>>Why not do it this way?
> 
> 
> It is just a coding-style issue. People code to different standards.
> 
> BTW, using the foreach this way can be misleading. The pointer value
> returned represents the number of dchars examined and *not* an index into
> theString. This is significant if theString is not a dchar[].

 Good point. Thanks for mentioning it. I hadn't really considered that. Another option that has been suggested is using 'ifind' after suitable conversions. 'ifind' is pretty much guaranteed to give me a pointer to the actual character I want, isn't it?
March 24, 2006
Oskar Linde wrote:
> Derek Parnell wrote:
> 
> 
>>BTW, using the foreach this way can be misleading. The pointer value
>>returned represents the number of dchars examined and *not* an index into
>>theString. This is significant if theString is not a dchar[].
> 
> 
> That is not correct. The index returned is an index into the char[] array,
> not the number of dchars processed:
> 
> void main() {
>         foreach(uint ix, dchar c; "åäö"c)
>                 writefln("c = %s, ix = %s",c,ix);
> }
> 
> Prints:
> 
> c = å, ix = 0
> c = ä, ix = 2
> c = ö, ix = 4
> 
> /Oskar


 I'm having some trouble understanding this implementation of the 'foreach' construct. From the definition of the 'foreach' expression, the purpose of the 'c' in . . .; "åäö"c) is not clear to me. Does this implicitly declare an array of items with the same data type as 'c'? In other words, three dchars in this case? From Oskar's comment that seems unlikely.

 For me a large part of the problem is the variable naming convention in the documentation, which seems a little ambiguous, although reading the entries carefully usually clarifies things. I think I am just not used to one-letter variable names yet, a style I have never been comfortable with. Oddly enough, I am also trying to learn Oberon-2 now, and Wirth uses the same convention.
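
On the literal syntax: in "åäö"c the trailing c is a string literal suffix and has nothing to do with the loop variable named c. The suffixes c, w, and d select UTF-8, UTF-16, and UTF-32 encoding for the literal. The sketch below assumes a current D2 compiler, where the resulting types are string, wstring, and dstring; the function names are made up for illustration.

```d
import std.stdio : writeln;

// The same three characters (U+00E5, U+00E4, U+00F6) in each encoding.
static assert(is(typeof("\u00E5\u00E4\u00F6"c) == string));   // UTF-8
static assert(is(typeof("\u00E5\u00E4\u00F6"w) == wstring));  // UTF-16
static assert(is(typeof("\u00E5\u00E4\u00F6"d) == dstring));  // UTF-32

// .length counts code units of the chosen encoding, not characters.
size_t unitsUtf8()  { return "\u00E5\u00E4\u00F6"c.length; }
size_t unitsUtf16() { return "\u00E5\u00E4\u00F6"w.length; }
size_t unitsUtf32() { return "\u00E5\u00E4\u00F6"d.length; }

void main()
{
    writeln(unitsUtf8());   // prints 6
    writeln(unitsUtf16());  // prints 3
    writeln(unitsUtf32());  // prints 3
}
```

So the literal in Oskar's example is a six-element char array, and the dchar loop variable decodes it into three characters starting at offsets 0, 2, and 4.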