February 09, 2014
On Sunday, 9 February 2014 at 05:00:15 UTC, Marco Leise wrote:
>
> https://yourlogicalfallacyis.com/black-or-white

Off topic, but that is a fantastic web site.  I wish I had known
about it before.
February 09, 2014
On 2/8/14, 9:00 PM, Marco Leise wrote:
> Am Sat, 08 Feb 2014 14:01:12 -0800
> schrieb Walter Bright <newshound2@digitalmars.com>:
>
>> On 2/7/2014 8:40 AM, Dmitry Olshansky wrote:
>>> Meh. If exceptions are such a liability we'd better make them (much) faster.
>>
>> They can be made faster by slowing down non-exception code.
>>
>> This has been debated at length in the C++ community, and the generally accepted
>> answer is that non-exception code performance is preferred and exception
>> performance is thrown under the bus in order to achieve it.
>>
>> I think it's quite a reasonable conclusion.
>
> https://yourlogicalfallacyis.com/black-or-white
>
> The reasons for slow exceptions in D could be the generation
> of stack trace strings or the garbage collector instead of
> inherent trade offs to keep the successful code path fast.

This thread is about memory allocation, not exceptions being slow.

> And static allocation isn't an exactly appealing option...
>
>    throw staticException ? staticException : (staticException =
>    new SomethingException("Don't do this at home kids!"));
>

Function calls could do that.
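
As a rough illustration of that idea, a function can hide the lazy allocation so each throw site stays a one-liner. This is only a sketch: the class definition and the helper name `somethingException` are assumptions made for the example, not code from this thread or from Phobos.

```d
import std.stdio : writeln;

// Hypothetical exception type standing in for the quoted SomethingException.
class SomethingException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Hypothetical helper: allocates on the first call, then hands back the
// same thread-local instance on every later call.
SomethingException somethingException()
{
    static SomethingException cached; // static locals are thread-local in D
    if (cached is null)
        cached = new SomethingException("Don't do this at home kids!");
    return cached;
}

void mayFail()
{
    throw somethingException(); // no allocation after the first throw
}

void main()
{
    foreach (i; 0 .. 3)
    {
        try
        {
            mayFail();
        }
        catch (SomethingException e)
        {
            assert(e is somethingException()); // same object each time
        }
    }
    writeln("threw the same instance three times");
}
```

The usual caveat applies: a reused exception object carries the file and line of its first construction, and it should not be thrown again while a previous throw of the same object is still being handled.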


Andrei

February 11, 2014
On 2/9/2014 2:17 AM, Dmitry Olshansky wrote:
> If you can show me how a single unconditional jump propagates error code 4 calls
> up the stack I'm sold.
>
> I do understand it's slow, but it's not slow enough to make a difference in the
> discussed case. It's all about jumping to the wrong conclusions.
>
> To put it in one pitch: it should be possible to throw/catch in excess of 100k
> exceptions per second no problem at all (assuming a single core of some run of
> the mill modern CPU).
>
> Nobody is asking to optimize it better than the normal flow.

It's the table lookup that's inherently slow.

February 17, 2014
Am Sun, 9 Feb 2014 22:24:21 +1100
schrieb "Daniel Murphy" <yebbliesnospam@gmail.com>:

> "Dmitry Olshansky"  wrote in message news:ld7dla$pdg$1@digitalmars.com...
> 
> > > gedit does in fact throw an error message at you
> > > saying "My bad, it's broken UTF-8, I'm giving up!".
> >
> > I know and it's a piece of junk :)
> > Seriously, it doesn't even have regular expressions for search and replace!
> 
> That would be a luxury, gedit doesn't even have auto-indent.

You can talk about missing features in gedit all day, but from
my point of view an editor is broken when it doesn't throw an
error message at you. By silently replacing incorrect UTF-8
they change the original text.
0xFFFD should probably be used only when error messages are
out of the question, like when displaying/printing text only.

-- 
Marco

February 17, 2014
Am Sun, 09 Feb 2014 12:18:41 +0400
schrieb Dmitry Olshansky <dmitry.olsh@gmail.com>:

> 09-Feb-2014 09:35, Marco Leise пишет:
> > That's neither an improvement over calling "validate" nor does that deal with distinguishing between invalid UTF and
> 
> Means text is broken but wasn't ever read...
> >\uFFFD
> > in the input.
> ...means text was broken sometime before.
> 
> Hardly makes any difference to most applications. Normal text doesn't contain \uFFFD.

Of course it does. It is a valid symbol and a lot of websites describing the "Specials" Unicode block make use of it, like the one on Wikipedia: http://en.wikipedia.org/wiki/Specials_(Unicode_block)

With your definition, pulling such a document from the web and parsing it in D would mean playing on broken strings.
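
The distinction can be checked with `std.utf.validate`, which throws `UTFException` on malformed input. A minimal sketch, with sample strings made up for illustration: text that merely contains U+FFFD validates cleanly, while a stray 0xFF byte does not. Once a decoder has substituted U+FFFD for the bad byte, the two cases can no longer be told apart.

```d
import std.exception : assertNotThrown, assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // Well-formed UTF-8 that happens to contain the replacement character.
    string legit = "The replacement character is \uFFFD.";
    assertNotThrown!UTFException(validate(legit));

    // Malformed UTF-8: 0xFF can never appear in a valid sequence.
    immutable(char)[] broken = ['a', cast(char) 0xFF, 'b'];
    assertThrown!UTFException(validate(broken));
}
```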

> >> [...]
> >> Every single text editor out there seems to disagree with you: they do
> >> show you partially substituted text, not a dialog box "My bad, it's
> >> broken UTF-8, I'm giving up!".

> > gedit does in fact throw an error message at you
> > saying "My bad, it's broken UTF-8, I'm giving up!".

> I know and it's a piece of junk :)
> Seriously, it doesn't even have regular expressions for search and replace!

https://yourlogicalfallacyis.com/no-true-scotsman :p

-- 
Marco

February 17, 2014
"Marco Leise"  wrote in message news:20140217030525.67a21dfc@org.homedns.org...

> 0xFFFD should probably be used only when error messages are
> out of the question, like when displaying/printing text only.

What do you use for displaying text, if not a text editor? 

February 18, 2014
Am Tue, 18 Feb 2014 01:01:53 +1100
schrieb "Daniel Murphy" <yebbliesnospam@gmail.com>:

> "Marco Leise"  wrote in message news:20140217030525.67a21dfc@org.homedns.org...
> 
> > 0xFFFD should probably be used only when error messages are out of the question, like when displaying/printing text only.
> 
> What do you use for displaying text, if not a text editor?

That was directed at D development. Or programming with Unicode encodings in general. If you load a text file and replace broken UTF-8 with \uFFFD or ? as Sublime 3 does, you lose information. I think that smells and asks for a big red message box. gedit is an editor that works this way: it reports the error instead of substituting.

What I meant by displaying text is static UI elements, since there is no risk of propagating the error. Everything else that can notify the user of the incorrect encoding or loss of information should do so.

-- 
Marco

February 18, 2014
17-Feb-2014 06:19, Marco Leise пишет:
> Am Sun, 09 Feb 2014 12:18:41 +0400
> schrieb Dmitry Olshansky <dmitry.olsh@gmail.com>:
>
>> 09-Feb-2014 09:35, Marco Leise пишет:
>>> That's neither an improvement over calling "validate" nor does
>>> that deal with distinguishing between invalid UTF and
>>
>> Means text is broken but wasn't ever read...
>>> \uFFFD
>>> in the input.
>> ...means text was broken sometime before.
>>
>> Hardly makes any difference to most applications.
>> Normal text doesn't contain \uFFFD.
>
> Of course it does. It is a valid symbol and a lot of websites
> describing the "Specials" Unicode block make use of it, like
> the one on Wikipedia:
> http://en.wikipedia.org/wiki/Specials_(Unicode_block)
>
> With your definition, pulling such a document from the web and
> parsing it in D would mean playing on broken strings.

In a sense, \uFFFD means broken encoding. What about lone surrogates? Private use symbols that must not occur in transmission? They are all displayed in various Unicode listings. As for 'playing on broken strings', that is, ignoring broken or partially broken strings: I specifically think that's what most users and use cases want.

A more useful and sensible default for decoding is to substitute on broken encoding. And it's a standard procedure. It's particularly better for displaying text.

As a reminder: since it's only a decode, you are still in control of the original text - in fact you may re-test what bytes are there IF you want.

The "throw on bad encoding" approach could be useful, but I hardly see it as what you want by default.

I'm wary of breaking code that relies on throwing. For the moment I think the best course of action would be to introduce xdecode or some such that will do substitution on failure, see how it floats and then change ranges/foreach etc to use xdecode.
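
To make the proposal concrete, here is a minimal sketch of what such an `xdecode` might look like. It assumes the function simply wraps `std.utf.decode`, and that on failure it skips a single code unit and returns U+FFFD. The name, signature and resynchronization rule are all assumptions for illustration, not an existing Phobos API.

```d
import std.utf : UTFException, decode;

// Hypothetical xdecode: like std.utf.decode, but substitutes U+FFFD
// instead of throwing when the input is malformed.
dchar xdecode(const(char)[] str, ref size_t index)
{
    immutable start = index;
    try
    {
        return decode(str, index);
    }
    catch (UTFException)
    {
        index = start + 1; // assumed policy: skip one code unit, then retry
        return '\uFFFD';
    }
}

void main()
{
    // 'a', an invalid 0xFF byte, then 'b'.
    const(char)[] text = ['a', cast(char) 0xFF, 'b'];

    dchar[] decoded;
    size_t i = 0;
    while (i < text.length)
        decoded ~= xdecode(text, i);

    assert(decoded == "a\uFFFDb"d);

    // Only the decoded view was substituted; the original bytes are
    // untouched and can still be inspected.
    assert(text[1] == 0xFF);
}
```

Code that relies on throwing would keep calling `decode` unchanged, which is the point of introducing the substituting variant under a separate name first.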

>>>> [...]
>>>> Every single text editor out there seems to disagree with you: they do
>>>> show you partially substituted text, not a dialog box "My bad, it's
>>>> broken UTF-8, I'm giving up!".
>
>>> gedit does in fact throw an error message at you
>>> saying "My bad, it's broken UTF-8, I'm giving up!".
>
>> I know and it's a piece of junk :)
>> Seriously, it doesn't even have regular expressions for search and replace!
>
> https://yourlogicalfallacyis.com/no-true-scotsman :p

Well, gedit is a nice example of why just throwing an exception is not good enough for many apps (editors in particular). The fact that it's a piece of junk might be irrelevant ;)

-- 
Dmitry Olshansky
February 18, 2014
On 2/18/14, Dmitry Olshansky <dmitry.olsh@gmail.com> wrote:
> Well, gedit is a nice example of why just throwing an exception is not good enough for many apps (editors in particular). The fact that it's a piece of junk might be irrelevant ;)

OT: Considering how many big-budget events (World Cup / Olympics) do such a poor job at displaying any kind of Unicode text (e.g. they frequently display č/ć/đ as c/c/dj), the only thing that could be worse is a big red dialog box, lol!

February 18, 2014
Am Tue, 18 Feb 2014 12:14:58 +0400
schrieb Dmitry Olshansky <dmitry.olsh@gmail.com>:

> In a sense, \uFFFD means broken encoding.

In a sense yes, in another no. It is a defined code point and it has a symbol: � a diamond with a question mark inside.

> What about lone surrogates?

Those are actually broken encoding.

> Private use symbols that must not occur in transmission?

Then that "transmission" seems to exclude private symbols. It may also exclude special characters like \uFFFD. That's part of the particular protocol and should be handled there.

> They are all displayed in various Unicode listings. As for 'playing on broken strings', that is, ignoring broken or partially broken strings: I specifically think that's what most users and use cases want.
> 
> A more useful and sensible default for decoding is to substitute on broken encoding. And it's a standard procedure. It's particularly better for displaying text.

Correct. I just don't agree that displaying text should be the one true use case, and I prefer exceptions over silent loss of information as the default.

> As a reminder: since it's only a decode, you are still in control of the original text - in fact you may re-test what bytes are there IF you want.
> 
> The "throw on bad encoding" approach could be useful, but I hardly see it as what you want by default.
> 
> I'm wary of breaking code that relies on throwing. For the moment I think the best course of action would be to introduce xdecode or some such that will do substitution on failure, see how it floats and then change ranges/foreach etc to use xdecode.

We won't convince each other. Let's just stop here.

-- 
Marco
