November 18, 2006
Alexander Panek wrote:
> PDF would be great, too.
>
> Tydr Schnubbis wrote:
>> Daniel Keep wrote:
>>> Here's a draft of an article which, hopefully, will explain some of the details of how text in D works.  Any constructive criticism is welcomed, along with edits or corrections.
>>>
>> Any chance of an .rtf, .doc, or even .txt? :)

I used the .odt since I wanted people to be able to make modifications to it directly, if they wanted.

I really don't like .rtf or .doc (long, painful history with those two), and .txt would probably destroy all formatting.  I usually write stuff in reStructuredText, but just didn't on this occasion.

Finally, the OOo-produced .pdf is kinda big (an order of magnitude bigger).

So here is an .xhtml version, and I will continue to supply this with any updates.  If someone needs it in something else, I'll do that as necessary.  No point in continually converting it when I'm still updating it :P

> If you change the license you can put it in the Wiki4D ?

I've dual-licensed it under CC Attribution-ShareAlike and the FDL, but WOW, the FDL is bad. Reading it is like trying to swim through tar.  Also, I think I may be violating the license by distributing it as ODT... but I'm not entirely sure.

I've also got some moral objections to a few parts of the license, but I suppose they're not enough to stop me using it.  The problem is that GNU specifically states that the CC Attribution-ShareAlike license is not compatible with the FDL.  Bloody hippies :3

> I would avoid the term "Unicode character" like the plague... If you must have something similar, then use "code point" ? It's OK to have it in the casual text, like "ASCII character, BMP character, Unicode character" but better not in the lists.

I've changed references to "characters" to "code points", but it now seems very cumbersome.  I read the Wikipedia article, but I'm still not 100% sure where the distinction lies.

So: what *precisely* is a "character", and when is it appropriate to use the word?
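To make the distinction concrete: Unicode separates *code units* (the bytes of UTF-8, or the 16-bit words of UTF-16), *code points* (the numbered entries in the Unicode table), and what a reader sees as one character, which Unicode calls a *grapheme cluster*. The sketch below counts one visible "é" at all three levels, once as the precomposed code point U+00E9 and once as "e" followed by the combining accent U+0301. It assumes a current D2 compiler and Phobos (where `string` is `immutable(char)[]` and `std.uni.byGrapheme` exists), not the D1 of this thread; `levels` is a hypothetical helper.

```d
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme;

// Hypothetical helper: count the same text as UTF-8 code units,
// as code points, and as user-perceived characters (graphemes).
size_t[3] levels(string s)
{
    return [s.length,                  // UTF-8 code units (bytes)
            s.walkLength,              // code points: Phobos iterates strings as dchar
            s.byGrapheme.walkLength];  // grapheme clusters
}

void main()
{
    string precomposed = "\u00E9";  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
    string combining   = "e\u0301"; // U+0065 + U+0301 COMBINING ACUTE ACCENT

    writefln("precomposed: %s", levels(precomposed)); // [2, 1, 1]
    writefln("combining:   %s", levels(combining));   // [3, 2, 1]
}
```

Both strings render as the same "é", yet they differ in code units and code points; only the grapheme count matches what a reader would call "one character".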

> It also has an example of why printf("Hello, World!\n"); doesn't work. But it does, since string *literals* are all NUL-terminated. However, when you then try to extend that to a string variable, and that variable contains a slice...

I've changed it to say "statements like the above", and added a note that, yeah, ok, the example actually *does* work, but you really shouldn't count on that.

Apart from the "character" -> "code point" changes, I've tried to mark all changes by highlighting them in yellow.

	-- Daniel

-- 
Unlike Knuth, I have neither proven nor tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/


November 18, 2006

Max Samuha wrote:
> On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek <a.panek@brainsware.org> wrote:
> 
>> PDF would be great, too.
>>
>> Tydr Schnubbis wrote:
>>> Daniel Keep wrote:
>>>> Here's a draft of an article which, hopefully, will explain some of the details of how text in D works.  Any constructive criticism is welcomed, along with edits or corrections.
>>>>
>>> Any chance of an .rtf, .doc, or even .txt? :)
> For those who are still on Windows :), there is a free and compact doc viewer that supports the OpenOffice format: http://www.officeviewers.com/

Hey, *I'm* still on Windows :P

	-- Daniel

November 18, 2006
Max Samuha wrote:
> For those who are still on Windows :), there is a free and compact doc
> viewer that supports the open office format
> http://www.officeviewers.com/ 

Thanks for the link, Max.

Daniel, I like it.  Seems quite clear to me.

One minor thing.  In one section you recommend just using dchar[] everywhere as a way to avoid slicing characters in the middle.  But then in the next section you recommend std.string as a comprehensive solution for manipulating strings.  Unfortunately, std.string really only deals with char[] strings.  So you might want to point out explicitly the dilemma this poses for the developer:  if you go with dchar[] and have to do a lot of string munging, you're likely to find lots of toUTF8's and toUTF32's popping up in your code.  If you go with char[], you've got to remember that mystring[1..$] may not mean what you think it means: the indices count UTF-8 code units, not code points, so the slice can cut a multi-byte character in half.
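A sketch of both sides of the dilemma, assuming a current D2 compiler and Phobos (so `string`/`dstring` rather than D1's `char[]`/`dchar[]`; `stride`, `validate`, `toUTF32` and `toUTF8` are `std.utf` functions). `dropFirstCodePoint` and `isValidUtf8` are hypothetical helpers: the first asks `std.utf.stride` how many code units the first code point occupies, instead of assuming it is one.

```d
import std.stdio : writeln;
import std.utf : stride, toUTF32, toUTF8, validate, UTFException;

// Hypothetical helper: drop the first code point (not the first code unit).
string dropFirstCodePoint(string s)
{
    return s.length ? s[stride(s, 0) .. $] : s;
}

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string s = "\u00E9t\u00E9";  // "été": 3 code points, 5 UTF-8 code units

    // Naive slice by code units: starts in the middle of the first 'é'.
    assert(!isValidUtf8(s[1 .. $]));

    // Code-point-aware slice that stays in UTF-8.
    assert(dropFirstCodePoint(s) == "t\u00E9");

    // The dchar route: one element per code point, paid for with conversions.
    dstring d = toUTF32(s);
    assert(d.length == 3);
    writeln(toUTF8(d[1 .. $]));  // té
}
```

The UTF-8 route keeps you compatible with char-based library code but makes every index a code-unit index; the UTF-32 route makes indexing safe at the cost of a conversion in each direction.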

--bb
November 18, 2006
"Daniel Keep" <daniel.keep.lists@gmail.com> wrote in message news:ejn63u$1v79$1@digitaldaemon.com...

> Very true.  I suppose I *should* say that literals are NUL-terminated, but I want to make it perfectly clear that relying on this is a bad idea; is it accepted practice to simply treat all strings as if they were possibly not NUL-terminated?

Is null-termination of string literals even part of the D spec?  Or is it entirely up to the implementation?  If the latter, then I'd put something in there saying that it can't be relied on at all.


November 18, 2006
Daniel Keep wrote:

> Very true.  I suppose I *should* say that literals are NUL-terminated,
> but I want to make it perfectly clear that relying on this is a bad
> idea; is it accepted practice to simply treat all strings as if they
> were possibly not NUL-terminated?

I'm not sure if the text primarily wants to discuss Unicode encodings,
or if it wants to discuss strings and text in D in general, but....

The main problem with printf is that you see a line like printf("foo")
and think that all strings are allowed. If literals didn't work either,
it wouldn't be as tempting to try it. But your conclusion/practice is OK:
you shouldn't use printf with D strings without a *good* reason
(chances are that the C library will choke on the UTF-8 format anyway?)

Even the good ole "%.*s" hack (passing a D array directly as the argument
for %.*s) is not portable to all possible platforms: it depends on how
parameters are passed, and I think it breaks on Solaris...
toStringz is the safest, even if you probably need to couple it with a
call to an encoding conversion if the local platform isn't using UTF-8?
But then you are on your own; the D library doesn't do such conversions.
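A minimal sketch of both safe options, assuming a current D2 compiler (where `printf` and `strlen` come from `core.stdc`) and a UTF-8 terminal, so no encoding conversion is attempted. `cLength` is a hypothetical helper that measures what C sees after `toStringz`. Note that passing the length and pointer as two separate arguments is ordinary C and does not rely on how D passes arrays, unlike the old hack.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the length a C function sees after toStringz.
size_t cLength(const(char)[] s)
{
    return strlen(toStringz(s));
}

void main()
{
    string greeting = "Hello, World!";
    string word = greeting[0 .. 5];  // "Hello": a slice, the next byte is ','

    // Option 1: toStringz returns a pointer to a NUL-terminated version.
    printf("%s\n", toStringz(word));

    // Option 2: give printf the length explicitly, as two plain C arguments.
    printf("%.*s\n", cast(int) word.length, word.ptr);

    assert(cLength(word) == 5);
}
```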

Even simple D programs such as:
    import std.stdio;
    void main(char[][] args)
    {
      foreach(char[] arg; args)
        writefln("%s", arg);
    }

will break down if you run them on a platform without UTF-8 support,
since "args" will contain invalid UTF-8 (and writefln will throw).
As a workaround you can cast them to ubyte[], translate them from the
local encoding to UTF-8, and cast them back into (now valid) char[]...
But I would hardly characterize that as language "support" for
legacy platforms; it's better to say D *requires* Unicode support?
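A sketch of that workaround under a loud assumption: the legacy encoding is taken to be ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point, so the "translation" is trivial. Real code would need a converter for whatever encoding the platform actually uses. `ensureUtf8` and `latin1ToUtf8` are hypothetical names, and the code targets D2, where `args` is `string[]`.

```d
import std.stdio : writefln;
import std.utf : validate, UTFException;

// ASSUMPTION: the local encoding is Latin-1, so byte value == code point.
string latin1ToUtf8(const(ubyte)[] raw)
{
    string result;
    foreach (b; raw)
        result ~= cast(dchar) b;  // appending a dchar encodes it as UTF-8
    return result;
}

// Pass valid UTF-8 through untouched; reinterpret anything else as Latin-1.
string ensureUtf8(string arg)
{
    try
    {
        validate(arg);
        return arg;
    }
    catch (UTFException)
    {
        return latin1ToUtf8(cast(const(ubyte)[]) arg);
    }
}

void main(string[] args)
{
    foreach (arg; args)
        writefln("%s", ensureUtf8(arg));
}
```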


You might also want to touch briefly on copy-on-write (COW), mutability,
and how writing to string literals can cause segfaults. Or not... :)
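A small illustration, assuming a current D2 compiler, where the type system settles part of this: string literals are `immutable`, so writing to one is a compile-time error rather than a possible segfault. The function shows the copy-on-write discipline (return the input untouched when nothing changes, copy before writing otherwise); `capitalizeFirstAscii` is a hypothetical, ASCII-only helper.

```d
import std.exception : assumeUnique;

// Hypothetical copy-on-write helper: capitalize an ASCII first letter.
string capitalizeFirstAscii(string s)
{
    if (s.length == 0 || s[0] < 'a' || s[0] > 'z')
        return s;                    // nothing to change: same slice, no copy
    char[] copy = s.dup;             // about to write: take a private copy
    copy[0] = cast(char)(copy[0] - ('a' - 'A'));
    return assumeUnique(copy);       // no one else references `copy`
}

void main()
{
    string lit = "hello";
    // lit[0] = 'H';  // D2: compile error, literals are immutable.
    //                // D1 accepted this, and it could segfault at run time.

    string upper = capitalizeFirstAscii(lit);
    assert(upper == "Hello");
    assert(lit == "hello");                    // original untouched
    assert(capitalizeFirstAscii(upper) is upper); // unchanged input, no copy
}
```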

--anders
November 18, 2006
On Sun, 19 Nov 2006 02:43:10 +1100, Daniel Keep <daniel.keep.lists@gmail.com> wrote:

>
>
>Max Samuha wrote:
>> For those who are still on Windows :), there is a free and compact doc viewer that supports the OpenOffice format: http://www.officeviewers.com/
>
>Hey, *I'm* still on Windows :P
>
>	-- Daniel

Daniel, I didn't intend to offend you, really. Sorry if I did.

The article is great and useful. I would add a note for those coming from C# (and Java?) that D strings are mutable and doing the following is a bad idea:

class BlackBox
{
	private char[] _text;

	this()
	{
		_text = "object state";
	}

	char[] text()
	{
		// Should be 'return _text.dup;' if you don't want
		// the user of the object to change the internal _text.
		return _text;
	}
}

Or something like that.
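The same class on a current D2 compiler, as a sketch: the literal now has to be `.dup`ed because D2 literals are immutable, and besides the defensive `.dup` there is a second option, handing out a `const` view that cannot be written through. `leakyText` and `view` are hypothetical names added only for the comparison.

```d
class BlackBox
{
    private char[] _text;

    this()
    {
        _text = "object state".dup;  // D2: copy the immutable literal into a mutable buffer
    }

    // Hands out the internal buffer itself: callers can change our state.
    char[] leakyText() { return _text; }

    // Defensive copy: callers get their own array to modify.
    char[] text() { return _text.dup; }

    // Read-only view without copying: const forbids writes through it.
    const(char)[] view() const { return _text; }
}

void main()
{
    auto box = new BlackBox;

    box.text()[0] = 'X';       // modifies a copy only
    assert(box.view() == "object state");

    box.leakyText()[0] = 'X';  // modifies the object's internal state
    assert(box.view() == "Xbject state");

    // box.view()[0] = 'X';    // compile error: cannot modify const data
}
```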
November 18, 2006
Daniel Keep wrote:

> Here's a draft of an article which, hopefully, will explain some of the
> details of how text in D works.  Any constructive criticism is welcomed,
> along with edits or corrections.
> 

As someone who hasn't coded in D beyond trying it out every so often, I find:

- the discussion of Unicode and its support in D clear and useful
- the description of the use of printf with strings confusing:

You wrote::

   Back before D had the std.stdio.writefln method, most examples used
   the old C function printf. This worked fine until you tried to output
   a string::

      printf(“Hello, World!\n”);

   The above statement was very likely to print out garbage that left
   many people scratching their heads. The reason is that C uses
   NUL-terminated strings, whereas D uses true arrays. In other words:

   - Strings in C are a pointer to the first character. A string ends at
     the first NUL character.
   - Strings in D are a pointer to the first character, followed by a
     length. There is no terminating character.

   And that's the problem: printf is looking for a terminator that
   doesn't necessarily exist.


That would lead me to believe that I could not use printf to print a string literal.  But then I just wrote and compiled the following D code::

  int
  main()
  {
     printf("Hello!\n");
     printf("Bye!\n");
     return 1;
  }

But it prints just fine.  So, something must be missing in your explanation or my understanding.  I'll have to read more about D to understand.
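What is going on, as a sketch (assuming a current D2 compiler, with `printf` and `strlen` from `core.stdc`): compilers store string literals with a hidden trailing NUL, which is why the program above works, while a slice of a literal is followed by whatever comes next in memory. As far as I know, the current D spec guarantees that terminator for literals only, not for strings in general. `byteAfter` is a hypothetical helper; it deliberately reads one byte past a string, which is only safe here because that byte is known to lie inside the literal or at its terminator.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;

// Hypothetical helper: the byte just past the end of a D string.
// Only safe when that byte is known to be readable memory.
char byteAfter(const(char)[] s)
{
    return s.ptr[s.length];
}

void main()
{
    string literal = "Hello, World!";
    string slice = literal[0 .. 5];

    assert(byteAfter(literal) == '\0');  // hidden terminator after the literal
    assert(byteAfter(slice) == ',');     // a slice has no terminator of its own

    // C only knows about the NUL, so it sees the whole literal.
    assert(strlen(slice.ptr) == literal.length);
    printf("%s\n", slice.ptr);           // prints "Hello, World!", not "Hello"
}
```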

Just my 2 cents,

--
P.R.


November 18, 2006
Pierre Rouleau wrote:

> [...]
> That would lead me to believe that I could not use printf to print a string literal.  [...]

And BTW, the line::

  printf(“Hello, World!\n”);

does not compile because of the non-ASCII (curly) quotation marks.

So other questions come to mind:

- Can D source code contain Unicode characters freely?
- If so, how is it done?
- If not, how can we define a Unicode string literal?
- Does D have a Unicode string type like, say, Python's, or does it specify string encodings more precisely?
- How do we handle internationalization of presentation strings in D?
- gettext support...
- Do we have to use text codecs (as in Python for example)?


This information would fit quite nicely in an article describing text in D.
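Some of these questions have short answers that code can show. The D spec accepts source files in ASCII, UTF-8, UTF-16 or UTF-32, so non-ASCII text may appear directly inside literals, although the quotes delimiting a literal must still be the plain ASCII `"`. `\u` escapes keep a file pure ASCII, and a `c`, `w` or `d` postfix selects a literal's encoding. A sketch assuming a current D2 compiler, where `string`, `wstring` and `dstring` are the immutable UTF-8, UTF-16 and UTF-32 array types:

```d
import std.stdio : writefln;

void main()
{
    // The same text in D's three string types; .length counts code units.
    string  s8  = "caf\u00E9";   // UTF-8:  'é' takes 2 code units
    wstring s16 = "caf\u00E9"w;  // UTF-16: postfix w
    dstring s32 = "caf\u00E9"d;  // UTF-32: postfix d

    writefln("%s %s %s", s8.length, s16.length, s32.length);  // 5 4 4
}
```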
November 18, 2006
Daniel Keep wrote:
> 
> Max Samuha wrote:
> 
>>For those who are still on Windows :), there is a free and compact doc
>>viewer that supports the open office format
>>http://www.officeviewers.com/ 
> 
> 
> Hey, *I'm* still on Windows :P
> 
> 	-- Daniel
> 

Same here -- for the most part.  Luckily I'm an OOo fanboy.  ;)  As for making the PDF, I have also noticed the bloat of OOo's PDF output, but you might try CutePDF and see if it gives you better results.  (It's a virtual printer that outputs to PDF, so it's usable with anything that supports printing.)

-- Chris Nicholson-Sauls
November 18, 2006
Daniel Keep wrote:
> I really don't like .rtf or .doc (long, painful history with those two),
> and .txt would probably destroy all formatting.  I usually write stuff
> in reStructuredText, but just didn't on this occasion.

I usually send articles around for review in .txt format, that way everyone can read them. After all the reviews are done, then I format it into html (using Ddoc) and put up the web page.

The problems with attaching articles to postings in a non-text format are:

1) the discussions always seem to focus on how to read the files, rather than their content

2) when the posting gets archived, the content in the non-text format becomes inaccessible (it isn't searched by Google, either)

That said, I think it's great you're working on a good article on strings in D. It'll be very helpful.