Text in D article (page 3) - D Programming Language Discussion Forum


November 18, 2006

Re: Text in D article

Posted by Serg Kovrov
in reply to Daniel Keep

Hi Daniel,

You may want to give Google Docs a try: http://docs.google.com/
It seems your case is exactly what it is for.


-- 
serg.

November 19, 2006

Re: Text in D article

Posted by Daniel Keep
in reply to Max Samuha


Max Samuha wrote:
> On Sun, 19 Nov 2006 02:43:10 +1100, Daniel Keep <daniel.keep.lists@gmail.com> wrote:
> 
>>
>> Max Samuha wrote:
>>> On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek <a.panek@brainsware.org> wrote:
>>>
>>>> PDF would be great, too.
>>>>
>>>> Tydr Schnubbis wrote:
>>>>> Daniel Keep wrote:
>>>>>> Here's a draft of an article which, hopefully, will explain some of the details of how text in D works.  Any constructive criticism is welcomed, along with edits or corrections.
>>>>>>
>>>>> Any chance of an .rtf, .doc, or even .txt? :)
>>> For those who are still on Windows :), there is a free and compact doc viewer that supports the OpenOffice format: http://www.officeviewers.com/
>> Hey, *I'm* still on Windows :P
>>
>> 	-- Daniel
> 
> Daniel, I didn't intend to offend you, really. Sorry, if I did.

None taken at all.  Hence the ":P" -- OpenOffice.org *does* work on Windows quite nicely :)

> The article is great and useful. I would add a note for those coming from C# (and Java?) that D strings are mutable and doing the following is a bad idea:
> 
> class BlackBox
> {
> 	private char[] _text;
> 
> 	this()
> 	{
> 		_text = "object state";
> 	}
> 
> 	char[] text()
> 	{
> 		// should be 'return _text.dup' if you don't want the user
> 		// of the object to change the internal _text
> 		return _text;
> 	}
> }
> 
> Or something like that.

Perhaps.  This was basically written to be a quick look at all the things people expect to work, but don't.  To be honest, I've never had this problem since strings are arrays and arrays are passed by reference and thus can be mutated.  But then, maybe not everyone catches that first time :P

I'll definitely give it some thought.
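
For anyone following along, here is a minimal sketch of the aliasing issue Max describes. It assumes a current D2 compiler, where string literals are immutable, so the constructor needs a .dup that the 2006 code above did not. The method names textShared and textCopy are made up for illustration and are not from the article.

```d
import std.stdio;

class BlackBox
{
    private char[] _text;

    this()
    {
        // .dup gives the object its own mutable copy of the literal
        _text = "object state".dup;
    }

    // Hands out the internal array itself: callers can change our state.
    char[] textShared() { return _text; }

    // Hands out a copy: callers can only change their own array.
    char[] textCopy() { return _text.dup; }
}

void main()
{
    auto box = new BlackBox;

    auto copy = box.textCopy();
    copy[0] = 'X';                 // modifies the caller's copy only
    writeln(box.textShared());     // object state

    auto shared_ = box.textShared();
    shared_[0] = 'X';              // writes straight into the private field
    writeln(box.textShared());     // Xbject state
}
```

Returning the copy costs an allocation per call, which is the trade-off for keeping the internal state private.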

	-- Daniel

-- 
Unlike Knuth, I have neither proven or tried the above; it may not even make sense.

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

November 19, 2006

Re: Text in D article

Posted by Daniel Keep
in reply to Chris Nicholson-Sauls


Chris Nicholson-Sauls wrote:
> Daniel Keep wrote:
>>
>> Max Samuha wrote:
>>
>>> On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek <a.panek@brainsware.org> wrote:
>>>
>>>
>>>> PDF would be great, too.
>>>>
>>>> Tydr Schnubbis wrote:
>>>>
>>>>> Daniel Keep wrote:
>>>>>
>>>>>> Here's a draft of an article which, hopefully, will explain some
>>>>>> of the
>>>>>> details of how text in D works.  Any constructive criticism is
>>>>>> welcomed,
>>>>>> along with edits or corrections.
>>>>>>
>>>>>
>>>>> Any chance of an .rtf, .doc, or even .txt? :)
>>>
>>> For those who are still on Windows :), there is a free and compact doc viewer that supports the OpenOffice format: http://www.officeviewers.com/
>>
>>
>> Hey, *I'm* still on Windows :P
>>
>>     -- Daniel
>>
> 
> Same here -- for the most part.  Luckily I'm an OOo fanboy.  ;)  As for making the PDF, I have also noticed the bloat of OOo's PDF output, but you might try CutePDF and see if it gives you better results.  (It's a virtual printer that outputs to a PDF, so it's usable with anything supporting printers.)
> 
> -- Chris Nicholson-Sauls

I actually have... oh, what's it called?  PDFCreator or somesuch.  That doesn't usually do that much better than OOo.  I actually had to zip the ODT and XHTML files since the newsgroup said they were too large together.  I doubt I'd even be able to post the PDF at all :P

	-- Daniel

November 19, 2006

Re: Text in D article

Posted by Daniel Keep
in reply to Walter Bright

Walter Bright wrote:
> Daniel Keep wrote:
>> I really don't like .rtf or .doc (long, painful history with those two), and .txt would probably destroy all formatting.  I usually write stuff in reStructuredText, but just didn't on this occasion.
> 
> I usually send articles around for review in .txt format, that way everyone can read them. After all the reviews are done, then I format it into html (using Ddoc) and put up the web page.
> 
> The problems with sending around text files in non-text format attached to postings are:
> 
> 1) the discussions always seem to focus on how to read the files, rather than their content
> 
> 2) when the posting gets archived, the content of the non-text format
> becomes inaccessible (it isn't searched by google, either)
> 
> That said, I think it's great you're working on a good article on strings in D. It'll be very helpful.

Usually I write up stuff in reStructuredText which is basically plain text with markup that can be read without running it through a formatter.  In this case I didn't because... I'm not really sure why.  I think it was just because OOo has a better spell-checker than Vim :P

I might try dumping it out to a text file and see what happens...

Also, thanks for the response.  Let me know if you think there's anything I should include :)

	-- Daniel

November 19, 2006

Re: Text in D article

Posted by Daniel Keep
in reply to Serg Kovrov

Serg Kovrov wrote:
> Hi Daniel,
> 
> You may want to give Google Docs a try: http://docs.google.com/ It seems your case is exactly what it is for.

Blech.  No offense, but I hate web apps.  Dialup makes these things slow as molasses to use.  I've made a website with Google Pages before, and it was not a fun experience.

*click a button*  *wait* ... ... ... ... *page loads*

In an ideal world, I could edit in OOo or GVim and have the files mirrored over FTP or somesuch.  I really ought to try that one of these days...

	-- Daniel

November 19, 2006

Re: Text in D article

Posted by Daniel Keep
in reply to Pierre Rouleau


Pierre Rouleau wrote:
> Pierre Rouleau wrote:
> 
>> Daniel Keep wrote:
>>
>>> Here's a draft of an article which, hopefully, will explain some of the details of how text in D works.  Any constructive criticism is welcomed, along with edits or corrections.
>>>
>>
>> As someone who has not been coding in D except for trying out some D every so often, I find:
>>
>> - the discussion of Unicode and its support of D clear and useful
>> - the description of the use of printf and string confusing:
>>
>> You wrote::
>>
>>    Back before D had the std.stdio.writefln method, most examples used
>>    the old C function printf. This worked fine until you tried to output
>>    a string::
>>
>>       printf(“Hello, World!\n”);
>>
>>    The above statement was very likely to print out garbage that left
>>    many people scratching their heads. The reason is that C uses
>>    NUL-terminated strings, whereas D uses true arrays. In other words:
>>
>>    - Strings in C are a pointer to the first character. A string ends at
>>      the first NUL character.
>>    - Strings in D are a pointer to the first character, followed by a
>>      length. There is no terminating character.
>>
>>    And that's the problem: printf is looking for a terminator that
>>    doesn't necessarily exist.
>>
>>
>> That would lead me to believe that I could not use printf to print a string literal.  But then I just wrote and compiled the following D code::
>>
>>   int
>>   main()
>>   {
>>      printf("Hello!\n");
>>      printf("Bye!\n");
>>      return 1;
>>   }
>>
>> But it prints just fine.  So, something must be missing in your explanation or my understanding.  I'll have to read more about D to understand.
>>
>> Just my 2 cents,
>>
>> -- 
>> P.R.

Read down a little bit further: it points out that you want to use std.string.toStringz to ensure that the NUL terminator exists.

It also admits that the example actually DOES work, simply because dmd sticks the NUL terminator on the end of all string literals.  But as someone already pointed out, if what you're dealing with is NOT a string literal: a slice of another string, or something read from disk, then it won't be there and the code will choke.

I should probably reorganise the section to be clearer on this.  I used that (wrong) example because an example that actually fails would be somewhat longer, and probably make people think "Ok, so why can't I use slices to C functions?  Are they not really strings?"
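
Something like the following is roughly what a failing example might look like. This is only a sketch, assuming a current D2 compiler and Phobos (core.stdc.stdio, std.string.toStringz); the helper cLength is a made-up name used here to show what a C function sees.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Length as a C function sees it: bytes up to the first NUL.
// (Hypothetical helper, for illustration only.)
size_t cLength(const(char)[] s)
{
    return strlen(toStringz(s));
}

void main()
{
    string text = "Hello, World!";
    const(char)[] slice = text[0 .. 5];   // "Hello": pointer + length, no NUL of its own

    // Risky: slice.ptr points into `text`, so C would keep reading past
    // "Hello" until it happens to find a NUL somewhere.
    // printf("%s\n", slice.ptr);

    // Safe: toStringz returns a NUL-terminated string.
    printf("%s\n", toStringz(slice));                      // Hello

    // Also safe: tell printf the length explicitly.
    printf("%.*s\n", cast(int) slice.length, slice.ptr);   // Hello
}
```

So slices really are strings; they just don't carry the terminator that C expects.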

> 
> And BTW, the line::
> 
>   printf(“Hello, World!\n”);
> 
> does not compile because of the non ASCII characters used for quoting.

Damnit... every time I go to write prose that option's off, and every time I write code examples it's ON.  I swear OOo is out to get me >_<

> So other questions come to mind:

Off the top of my head:

> - Can D source code contain Unicode characters freely?

- Yup, you betcha!

> - If so, how is it done?

- Use a text editor that supports saving files in UTF-8.  I'm not sure off the top of my head if UTF-16 and UTF-32 are supported directly...

> - If not, how can we define a Unicode string literal?

- If you don't have access to a Unicode-enabled editor, you can use
escape sequences with \uXXXX (or \UXXXXXXXX for higher Unicode code points.)

> - Does D have a Unicode string type like, say Python, or is it better at specifying them?

- That's *all* D has.  Remember, char, wchar and dchar correspond to UTF-8, UTF-16 and UTF-32 which are the three main ways of storing Unicode text.  Internally, Python uses UTF-16.
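
A small sketch of the two answers above, assuming a current D2 compiler and std.utf from Phobos. It uses \u and \U escapes so the source file stays pure ASCII, and counts how many code units the same text takes in each of the three encodings. The function codeUnits is a made-up name for illustration.

```d
import std.stdio;

// Code units needed to store the same text as UTF-8, UTF-16 and UTF-32.
// (Hypothetical helper, for illustration only.)
size_t[3] codeUnits(string s)
{
    import std.utf : toUTF16, toUTF32;
    return [s.length, toUTF16(s).length, toUTF32(s).length];
}

void main()
{
    string  a = "\u00e9";        // e-acute as UTF-8
    wstring b = "\u00e9"w;       // same character as UTF-16
    dstring c = "\U0001D11E"d;   // musical G clef, above U+FFFF, as UTF-32

    writeln(a.length);   // 2
    writeln(b.length);   // 1
    writeln(c.length);   // 1

    writeln(codeUnits("\U0001D11E"));   // [4, 2, 1]
}
```

Note that .length counts code units, not characters, which is why the same text gives three different numbers.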

> - How do we handle internationalization of presentation strings in D?
> - gettext support...

I don't know if gettext would work in D, simply because I've never seen it tried.  D doesn't have any *direct* support for this, tho.

(Then again, I'm yet to see *any* programming language that does.)

> - Do we have to use text codecs (as in Python for example)?

D has no built-in support for converting between code pages, as far as I know.  You need to download and use a conversion library like iconv to convert between code pages.

> This information would fit quite nicely in an article describing text in D.

I may have to restructure it into two sections: a "What the... it's a borken!" section and a "Q&A" section.

Thanks for the feedback.

	-- Daniel

November 19, 2006

Re: Text in D article

Posted by Daniel Keep
in reply to Bill Baxter


Bill Baxter wrote:
> Max Samuha wrote:
>> On Sat, 18 Nov 2006 15:59:33 +0100, Alexander Panek <a.panek@brainsware.org> wrote:
>>
>>> PDF would be great, too.
>>>
>>> Tydr Schnubbis wrote:
>>>> Daniel Keep wrote:
>>>>> Here's a draft of an article which, hopefully, will explain some of
>>>>> the
>>>>> details of how text in D works.  Any constructive criticism is
>>>>> welcomed,
>>>>> along with edits or corrections.
>>>>>
>>>> Any chance of an .rtf, .doc, or even .txt? :)
>> For those who are still on Windows :), there is a free and compact doc viewer that supports the OpenOffice format: http://www.officeviewers.com/
> 
> Thanks for the link, Max.
> 
> Daniel, I like it.  Seems quite clear to me.
> 
> One minor thing.  In one section you recommend just using dchar[] everywhere as a solution for not slicing characters in the middle.  But then in the next section you recommend using std.string as a comprehensive solution for manipulating strings.  Unfortunately std.string really only deals with char[] strings.  So you might want to point out explicitly the dilemma that poses to the developer:  If you go with dchar[] and have to do a lot of string munging, you're likely to find lots of toUTF8's and toUTF32's popping up in your code.  If you go with char[] you've got to remember that mystring[1..$] may not mean what you think it means.
> 
> --bb

You are, of course, right.

"OK; if you're doing array indexing or slicing, stick to dchar; if you're going to be using std.string, stick to char."

Doesn't really sound good.  It implies that either the standard library has a hole in it or that indexing and slicing on char[] and wchar[] *should* work as expected.
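
To make the dilemma concrete, here is a rough sketch, assuming a current D2 compiler and Phobos (std.utf.stride, std.utf.validate, std.conv.to). The helpers dropFirstChar and isValidUtf8 are made-up names for illustration, not anything from std.string.

```d
import std.conv : to;
import std.stdio;
import std.utf : stride, validate, UTFException;

// Drop the first *character* of a UTF-8 string, not just its first byte.
string dropFirstChar(string s)
{
    return s.length ? s[stride(s, 0) .. $] : s;
}

// True if the array holds well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try { validate(s); return true; }
    catch (UTFException) { return false; }
}

void main()
{
    string s = "\u00e9t\u00e9";   // three characters, five UTF-8 code units

    // Slicing by code unit cuts the first character in half.
    writeln(isValidUtf8(s[1 .. $]));   // false

    // Stepping by the character's stride keeps the text intact.
    writeln(isValidUtf8(dropFirstChar(s)));   // true

    // With dchar, one code unit is one code point, so plain slicing is fine.
    dstring d = s.to!dstring;
    writeln(d[1 .. $].length);   // 2
}
```

Either approach works; the cost is an explicit stride lookup on char[] or a conversion step to and from dchar[].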

I think I'll change the article so that it's correct, but here's a question for Walter:

  Is std.string going to support wchar[]s and dchar[]s?  If not, why?

Heh, they say the best way to learn something is to teach it.  Guess I'm still learning :P

	-- Daniel

November 19, 2006

Re: Text in D article

Posted by Walter Bright
in reply to Daniel Keep

Daniel Keep wrote:
> Also, thanks for the response.  Let me know if you think there's
> anything I should include :)

To tell the truth, I haven't read it yet, because I am reluctant to download viewers and install them.

November 19, 2006

Re: Text in D article

Posted by Daniel Keep
in reply to Walter Bright

Walter Bright wrote:
> Daniel Keep wrote:
>> Also, thanks for the response.  Let me know if you think there's anything I should include :)
> 
> To tell the truth, I haven't read it yet, because I am reluctant to download viewers and install them.

Ah, well, the latest zip contains an XHTML version which should open in just about any browser.  Don't tell me you don't even browse your own website :3

	-- Daniel

November 19, 2006

Re: Text in D article

Posted by Pierre Rouleau
in reply to Daniel Keep

Daniel Keep wrote:

> 
> Pierre Rouleau wrote:
> 
>>Pierre Rouleau wrote:
>>
>>
>>>Daniel Keep wrote:
>>>
>>>
>>>>Here's a draft of an article which, hopefully, will explain some of the
>>>>details of how text in D works.  Any constructive criticism is welcomed,
>>>>along with edits or corrections.
>>>>
>>>
>>>As someone who has not been coding in D except for trying out some D
>>>every so often, I find:
>>>
>>>- the discussion of Unicode and its support of D clear and useful
>>>- the description of the use of printf and string confusing:
>>>
>>>You wrote::
>>>
>>>   Back before D had the std.stdio.writefln method, most examples used
>>>   the old C function printf. This worked fine until you tried to output
>>>   a string::
>>>
>>>      printf(“Hello, World!\n”);
>>>
>>>   The above statement was very likely to print out garbage that left
>>>   many people scratching their heads. The reason is that C uses
>>>   NUL-terminated strings, whereas D uses true arrays. In other words:
>>>
>>>   - Strings in C are a pointer to the first character. A string ends at
>>>     the first NUL character.
>>>   - Strings in D are a pointer to the first character, followed by a
>>>     length. There is no terminating character.
>>>
>>>   And that's the problem: printf is looking for a terminator that
>>>   doesn't necessarily exist.
>>>
>>>
>>>That would lead me to believe that I could not use printf to print a
>>>string literal.  But then I just wrote and compiled the following D
>>>code::
>>>
>>>  int
>>>  main()
>>>  {
>>>     printf("Hello!\n");
>>>     printf("Bye!\n");
>>>     return 1;
>>>  }
>>>
>>>But it prints just fine.  So, something must be missing in your
>>>explanation or my understanding.  I'll have to read more about D to
>>>understand.
>>>
>>>Just my 2 cents,
>>>
>>>-- 
>>>P.R.
> 
> 
> Read down a little bit further: it points out that you want to use
> std.string.toStringz to ensure that the NUL terminator exists.
> 

I saw that.  My point was that the article should be a little clearer as to why you would want to use it.  In an introduction to text processing in D that covers the different string formats (NUL-terminated or length-based), a newbie needs to learn the implications of the code he writes and the effect of transformations such as slices.


> It also admits that the example actually DOES work, simply because dmd
> sticks the NUL terminator on the end of all string literals.  But as
> someone already pointed out, if what you're dealing with is NOT a string
> literal: a slice of another string, or something read from disk, then it
> won't be there and the code will choke.
> 
> I should probably reorganise the section to be clearer on this.  I used
> that (wrong) example because an example that actually fails would be
> somewhat longer, and probably make people think "Ok, so why can't I use
> slices to C functions?  Are they not really strings?"

> 
> 
>>And BTW, the line::
>>
>>  printf(“Hello, World!\n”);
>>
>>does not compile because of the non ASCII characters used for quoting.
> 
> 
> Damnit... every time I go to write prose that option's off, and every
> time I write code examples it's ON.  I swear OOo is out to get me >_<

I also like reStructuredText myself...  but writing extra symbols is a little trickier...

> 
>>So other questions come to mind:
> Off the top of my head:
>>- Can D source code contain Unicode characters freely?
> - Yup, you betcha!
>>- If so, how is it done?
> - Use a text editor that supports saving files in UTF-8.  I'm not sure
> off the top of my head if UTF-16 and UTF-32 are supported directly...

Readers might be interested to know that they can use these in the source code file.  They may also wonder whether non-ASCII characters are acceptable in things such as variable names.


>>- If not, how can we define a Unicode string literal?
> - If you don't have access to a Unicode-enabled editor, you can use
> escape sequences with \uXXXX (or \UXXXXXXXX for higher Unicode code points.)
>>- Does D have a Unicode string type like, say Python, or is it better at
>>specifying them?
> - That's *all* D has.  Remember, char, wchar and dchar correspond to
> UTF-8, UTF-16 and UTF-32 which are the three main ways of storing
> Unicode text.  Internally, Python uses UTF-16.
> 
> 
>>- How do we handle internationalization of presentation strings in D?
>>- gettext support...
> 
> 
> I don't know if gettext would work in D, simply because I've never seen
> it tried.  D doesn't have any *direct* support for this, tho.

I can't see why it would not.  Can we have a function named '_()' in D?
Since the gettext philosophy is to write all presentation strings in English, the code can be written in ASCII-only files, and since the strings are Unicode, the translated strings can contain any symbol at runtime.

One aspect is the string formatting.  Does D support string formatting similar to Python's dictionary-based formatting like:

a_dict = {'person_name': 'Daniel'}
a_string = 'Hello %(person_name)s ! How are you?' % a_dict

Python dictionaries are very useful for that purpose.  Translating presentation strings works better when the entire string context is available to the person doing the natural language translation.  As far as I am concerned, this is an important feature for a programming language used to write (client-side) applications.
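
As a rough illustration of what I have in mind, here is a sketch that assumes a current D2 compiler and the positional arguments (%1$s, %2$s) of std.format in Phobos. It is not a gettext binding: the catalogue table, the '_' function and greet are all hypothetical stand-ins, and a real implementation would load translations from a .mo file.

```d
import std.format : format;
import std.stdio;

// Hypothetical in-memory message catalogue (stand-in for a .mo file).
string[string] catalogue;

// '_' is an ordinary identifier in D, so the gettext-style shorthand works.
string _(string msgid)
{
    if (auto p = msgid in catalogue)
        return *p;
    return msgid;   // no translation: fall back to the English text
}

string greet(string name, int count)
{
    // Positional arguments let a translation reorder the values.
    return format(_("Hello %1$s! You have %2$s new messages."), name, count);
}

void main()
{
    writeln(greet("Daniel", 3));

    // A translation that puts the count first.
    catalogue["Hello %1$s! You have %2$s new messages."] =
        "%2$s neue Nachrichten, %1$s!";
    writeln(greet("Daniel", 3));
}
```

Positional arguments are not as readable for translators as Python's named keys, but they do keep the whole sentence in one string.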


> 
> (Then again, I'm yet to see *any* programming language that does.)
> 
Support for gettext does not have to be built into the language.  It is enough that the language does not preclude using gettext.

> 
>>- Do we have to use text codecs (as in Python for example)?
> 
> 
> D has no built-in support for converting between code pages, as far as I
> know.  You need to download and use a conversion library like iconv to
> convert between code pages.
> 


> 
> Thanks for the feedback.
> 

You're welcome.

--

Pierre

Copyright © 1999-2021 by the D Language Foundation