February 06, 2016

On 2/6/2016 2:28 PM, Walter Bright wrote:
> On 2/2/2016 8:46 PM, Михаил Страшун wrote:
> > When it comes to encoding, there is also the issue of how lacking the
> > current support for non-UTF encodings in Phobos is.
>
> This is a deliberate choice. Phobos is designed so that UTF is the only encoding supported. Other encodings are expected to:
>
>    other => toUTF => usePhobos => toOther
>
> i.e. translate to UTF, do the processing, and then translate back to whatever encoding is desired.
>
> I have experience with other encodings, like Shift-JIS. May they all burn in hell. If you think people forgetting they are dealing with UTF is a problem, imagine all the other encodings, and their peculiar weirdnesses that **** up every piece of code that is not explicitly set up to handle them.


As events of the last 15 years unfolded, it appears to have been a mistake that we even support wchar and dchar.
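The `other => toUTF => usePhobos => toOther` pipeline above can be sketched with Phobos' `std.encoding.transcode`, which converts between the encodings Phobos knows about. This is an illustrative round trip through Latin-1, not code from the thread:

```d
import std.encoding : Latin1String, transcode;
import std.uni : toUpper;

void main()
{
    // Pretend this Latin-1 buffer arrived from outside the program.
    string utf8Source = "café";
    Latin1String latin1;
    transcode(utf8Source, latin1);

    // other => toUTF: bring the foreign encoding into UTF-8.
    string utf8;
    transcode(latin1, utf8);

    // usePhobos: ordinary Phobos string processing on UTF-8.
    auto processed = utf8.toUpper;

    // toUTF => toOther: translate back to the original encoding.
    Latin1String back;
    transcode(processed, back);
}
```

The point of the shape is that all the encoding-specific logic lives at the boundaries; everything in the middle is plain UTF code.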
_______________________________________________
Dlang-study mailing list
Dlang-study@puremagic.com
http://lists.puremagic.com/cgi-bin/mailman/listinfo/dlang-study

February 06, 2016
On 02/06/2016 07:47 PM, Михаил Страшун wrote:
> If reference count is stored inside allocator metadata, no cast becomes
> necessary as relevant memory will never be allocated as immutable
> (allocator is in full control on how its own metadata is stored).

I understand. For now I can say a cautious "seems interesting and doable". When we get to the implementation I'll see whether and how it can be made to work. Thanks! -- Andrei

February 08, 2016
On 02/07/2016 05:39 AM, Andrei Alexandrescu wrote:
> On 02/06/2016 07:47 PM, Михаил Страшун wrote:
>> If reference count is stored inside allocator metadata, no cast becomes necessary as relevant memory will never be allocated as immutable (allocator is in full control on how its own metadata is stored).
> 
> I understand. For now I can say a cautious "seems interesting and doable". When we get to the implementation I'll see whether and how it can be made to work. Thanks! -- Andrei

Thanks!

I don't have full confidence that it will work, but it feels like something that should be explored (and possibly discarded in practice) before considering even small tweaks to the language specification.
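The allocator-metadata idea discussed above can be sketched with Phobos' `AffixAllocator`, which prefixes each allocation with a value the allocator itself owns. Storing the reference count there means the payload can be handed out as immutable while the count stays mutable, with no cast. This is only a sketch of the direction, not the implementation under discussion:

```d
import std.experimental.allocator.building_blocks.affix_allocator
    : AffixAllocator;
import std.experimental.allocator.mallocator : Mallocator;

// Each allocation carries a uint reference count in allocator metadata.
alias RCAlloc = AffixAllocator!(Mallocator, uint);

void main()
{
    void[] buf = RCAlloc.instance.allocate(16);

    // The count lives in the allocator's prefix, not in the payload,
    // so the payload itself could be typed immutable by the caller.
    RCAlloc.instance.prefix(buf) = 1;   // initial reference
    RCAlloc.instance.prefix(buf) += 1;  // retain

    if (--RCAlloc.instance.prefix(buf) > 0)
    {
        // still referenced elsewhere
    }

    RCAlloc.instance.deallocate(buf);
}
```

A real refcounted wrapper would tie the retain/release calls to copy construction and destruction; the sketch only shows where the count would live.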



February 08, 2016
On 02/07/2016 01:47 AM, Andrei Alexandrescu wrote:
>>> D uses UTF for strings. Vivid anecdotes aside, we really can't be everything to everyone. Your friend could have written a translator to UTF in a few lines. The DNA optimization points at performance bugs in Phobos that, as far as I know, have been fixed or are fixable by rote. I think this non-UTF requirement would just stretch things too far and smacks of solving the wrong problem.
>>
>> From a purely technical point of view you are perfectly right. But does
>> that make it any better that potential users leave disappointed?
> 
> What greener pastures do they leave for? We should take a page from the languages that support multiple encodings seamlessly.

He has switched to a Haskell parser generator (https://wiki.haskell.org/Parsec) - but I don't know anything about it or how it exposes encoding support. My personal understanding is that he was so frustrated by debugging mysterious parsing failures, with neither the language nor the library even slightly hinting at what could be wrong, that he moved to a different solution even though it wasn't strictly superior.

My gut feeling is that D is right in making UTF-8 the default and main supported option - but the problem is that it assumes everything is UTF-8 too silently, without making either library writers or developers recognize it soon enough.

To be more specific, consider this canonical D example:

auto processed_text =
    File("something.txt", "r")
    .byLineCopy()
    .doSomeProcessing();

It is an easy and natural thing to do, so there is a high chance someone will write it without remembering that doSomeProcessing() will do UTF-8 decoding internally. And a very simple addition can improve it a lot:

auto processed_text =
    File("something.txt", "r")
    .assumeUTF8() // or validateUTF8() to do early validation
                  // with throwing
    .byLineCopy()
    .doSomeProcessing();

This changes nothing in functionality or in the actual support of different encodings. Yet it changes two important things:

1) Serves as a visual reminder: "Hey, this assumes UTF-8, maybe you
should consider using `File.rawRead` instead?"
2) Allows making a choice between eagerly enforcing that input is valid
UTF-8 and simply assuming it.
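To make the proposal concrete, the eager-validation variant could look roughly like the sketch below: a range adapter that forwards chunks of bytes while checking each one with `std.utf.validate`, which throws `UTFException` on malformed input. The name `validateUTF8` is the proposal from this message, not an existing Phobos API:

```d
import std.utf : validate;

// Hypothetical adapter: wraps a range of ubyte[] chunks (e.g. from
// File.byChunk) and validates each chunk as UTF-8 before yielding it.
auto validateUTF8(R)(R chunks)
{
    struct Result
    {
        R source;

        bool empty() { return source.empty; }

        const(char)[] front()
        {
            auto chunk = cast(const(char)[]) source.front;
            validate(chunk); // throws UTFException early on bad input
            return chunk;
        }

        void popFront() { source.popFront(); }
    }
    return Result(chunks);
}
```

One caveat a production version would have to handle: a chunk boundary can split a multi-byte sequence, so validation would need to carry partial code units over to the next chunk rather than reject them.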

I won't insist if this topic looks completely out of line - it is not that important. But, as I have already mentioned, this is likely the last chance to change anything about it, if it is ever to be changed.


