Why UTF-8/16 character encodings? (page 10) - D Programming Language Discussion Forum


May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by Joakim
in reply to Marcin Mstowski


On Sunday, 26 May 2013 at 21:08:40 UTC, Marcin Mstowski wrote:
> On Sun, May 26, 2013 at 9:42 PM, Joakim <joakim@airpost.net> wrote:
>> Also, one of the first pages talks about representations of floating point
>> and integer numbers, which are outside the purview of the text encodings
>> we're talking about.
>
>
> They are outside the scope of CDRA too. At least read the picture description
> before making out-of-context assumptions.
Which picture description did you have in mind?  They all seem fairly generic.  I do see now that one paragraph does say that CDRA only deals with graphical characters and that they were only talking about numbers earlier to introduce the topic of data representation.

>> If you can show that it is materially similar to my single-byte encoding
>> idea, it might be worth looking into.
>>
>
> Spending ~15 min to read the Introduction isn't worth your time, so why should
> I waste my time showing you anything?
You claimed that my encoding was reinventing the wheel, therefore the onus is on you to show which of the multiple encodings CDRA uses that I'm reinventing.  I'm not interested in delving into the docs for some dead IBM format to prove _your_ point.  More likely, you are just dead wrong and CDRA simply uses code pages, which are not the same as the single-byte encoding with a header idea that I've sketched in this thread.

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by John Colvin
in reply to Joakim


On Monday, 27 May 2013 at 06:11:20 UTC, Joakim wrote:
> You claimed that my encoding was reinventing the wheel, therefore the onus is on you to show which of the multiple encodings CDRA uses that I'm reinventing.  I'm not interested in delving into the docs for some dead IBM format to prove _your_ point.

It's your idea and project. Showing that it is original / doing your research on previous efforts is probably something that *you* should do, whether or not it's someone else's "point".

> More likely, you are just dead wrong and CDRA simply uses code pages
Based on what?

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by Joakim
in reply to John Colvin


On Monday, 27 May 2013 at 12:25:06 UTC, John Colvin wrote:
> On Monday, 27 May 2013 at 06:11:20 UTC, Joakim wrote:
>> You claimed that my encoding was reinventing the wheel, therefore the onus is on you to show which of the multiple encodings CDRA uses that I'm reinventing.  I'm not interested in delving into the docs for some dead IBM format to prove _your_ point.
>
> It's your idea and project. Showing that it is original / doing your research on previous efforts is probably something that *you* should do, whether or not it's someone else's "point".
Sure, some research is necessary.  However, software is littered with past projects that never really got started or bureaucratic efforts, like CDRA appears to be, that never went anywhere.  I can hardly be expected to go rummaging through all these efforts in the hopes that what, someone else has already written the code?  If you have a brain, you can look at the currently popular approaches, which CDRA isn't, and come up with something that makes more sense.  I don't much care if my idea is original, I care that it is better.

>> More likely, you are just dead wrong and CDRA simply uses code pages
> Based on what?
Based on the fact that his link lists EBCDIC and several other antiquated code page encodings in its list of proposed encodings.  If Marcin believes one of those is similar to my scheme, he should say which one, otherwise his entire line of argument is irrelevant.  It's not up to me to prove _his_ point.

Without having looked at any of the encodings in detail, I'm fairly certain he's wrong.  If he feels otherwise, he can pipe up with which one he had in mind.  The fact that he hasn't speaks volumes.

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by H. S. Teoh
in reply to Wyatt


On Mon, May 27, 2013 at 04:17:06AM +0200, Wyatt wrote:
> On Sunday, 26 May 2013 at 21:23:44 UTC, H. S. Teoh wrote:
> >I have been thinking about this idea of a "reprogrammable keyboard", in that the keys are either a fixed layout with LCD labels on each key, or perhaps the whole thing is a long touchscreen, that allows arbitrary relabelling of keys (or, in the latter case, complete dynamic reconfiguration of layout). There would be some convenient way to switch between layouts, say a scrolling sidebar or roller dial of some sort, so you could, in theory, type Unicode directly.
> >
> >I haven't been able to refine this into an actual, implementable idea, though.
> >
> I've given this domain a fair bit of thought, and from my
> perspective you want to throw hardware at a software problem.  Have
> you ever used a Japanese input method?  They're sort of a good
> exemplar here, wherein you type a sequence and then hit space to
> cycle through possible ways of writing it.  So "ame" can become
> あめ, 雨, 飴, etc.  Right now, in addition to my learning, I also
> use it for things like α (アルファ) and Δ (デルタ).  It's limited,
> but...usable, I guess.  Sort of.
> 
> The other end of this is TeX, which was designed around the idea of composing scientific texts with a high degree of control and flexibility.  Specialty characters are inserted with backslash-escapes, like \alpha, \beta, etc.
> 
> Now combine the two:  An input method that outputs as usual, until
> you enter a character code which is substituted in real time to what
> you actually want.
> Example:
> "values of \beta will give rise to dom!" composes as
> "values of β will give rise to dom!"
> 
> No hardware required; just a smarter IME.  Like maybe this one: http://www.andonyar.com/rec/2008-03/mathinput/ (I'm honestly not yet sure how mature or usable that one is as I'm a UIM user, but it does serve as a proof of concept).

I like this idea. It's certainly more feasible than reinventing the Optimus Maximus keyboard. :) I can write code for free, but engineering custom hardware is a bit beyond my abilities (and means!).
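
For illustration, the substitution Wyatt describes might look something like the D sketch below. It is only a rough sketch under stated assumptions: the names `lookup` and `compose` and the three-entry table are made up for this example, and a real IME would hook into the input stack rather than rewrite a finished string.

```d
import std.stdio : writeln;
import std.ascii : isAlpha;

/// Hypothetical lookup table; a real input method would ship a far larger one.
string lookup(string name)
{
    switch (name)
    {
        case "alpha": return "α";
        case "beta":  return "β";
        case "Delta": return "Δ";
        default:      return null; // unknown name
    }
}

/// Replaces each known \name in `input`; unknown names are left untouched.
string compose(string input)
{
    string result;
    size_t i = 0;
    while (i < input.length)
    {
        if (input[i] != '\\')
        {
            // Copy ordinary bytes verbatim (this keeps multi-byte UTF-8 intact).
            result ~= input[i];
            ++i;
            continue;
        }
        // Collect the ASCII letters that follow the backslash.
        size_t j = i + 1;
        while (j < input.length && isAlpha(input[j]))
            ++j;
        string sub = lookup(input[i + 1 .. j]);
        result ~= (sub is null) ? input[i .. j] : sub;
        i = j;
    }
    return result;
}

void main()
{
    // prints: values of β will give rise to dom!
    writeln(compose(`values of \beta will give rise to dom!`));
}
```

Leaving unknown names untouched is a deliberate choice here, so that a mistyped sequence stays visible instead of silently vanishing.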

If we go the software route, then one possible strategy might be:

- Have a default mode that is whatever your default keyboard layout is
  (the usual 100+-key layout, or Dvorak, or whatever).

- Assign one or two escape keys (not to be confused with the Esc key,
  which is something else) that allow you to switch modes.

   - Under the 1-key scheme, you'd use it to begin sequences like \beta,
     except that instead of the backslash \, you're using a dedicated
     key. These sequences can include individual characters (e.g.
     <ESC>beta == β) or allow you to change the current input mode (e.g.
     <ESC>grk to switch to a Greek layout that takes effect from that
     point onwards until you enter, say, <ESC>eng). For convenience, the
     sequence <ESC><ESC> can be shorthand for switching back to whatever
     the default layout is, so that if you mistype an escape sequence
     and end up in some strange unexpected layout mode, hitting <ESC>
     twice will reset it back to the default.

   - Under the 2-key scheme, you'd have one key dedicated for the
     occasional foreign character (<ESC1>beta == β), and the second key
     dedicated for switching layouts (thus allowing shorter sequences
     for switching between languages without fear of conflicting with
     single-character sequences, e.g., <ESC2>g for Greek).

Perhaps the 1-key scheme is the simplest to implement. The capslock key is a good candidate, being conveniently located where your left little finger is, and having no real useful function in this day and age.
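
A minimal D sketch of the 1-key scheme follows, just to make the state handling concrete. Everything in it is an assumption made for illustration: the `ESC` stand-in byte, the tiny Greek table, the names `greekKey`, `charFor` and `interpret`, and in particular the rule that a sequence ends at the first non-letter key (the description above does not say how sequences terminate).

```d
import std.stdio : writeln;
import std.ascii : isAlpha;

enum char ESC = '\x01';   // stand-in for the dedicated escape key (assumption)
enum Layout { english, greek }

/// Hypothetical, deliberately tiny Greek layout; unmapped keys pass through.
string greekKey(char c)
{
    switch (c)
    {
        case 'a': return "α";
        case 'b': return "β";
        case 'g': return "γ";
        case 'd': return "δ";
        default:  return [c].idup;
    }
}

/// Hypothetical single-character sequences such as <ESC>beta.
string charFor(string name)
{
    switch (name)
    {
        case "alpha": return "α";
        case "beta":  return "β";
        default:      return null;
    }
}

/// Interprets a stream of key presses under the 1-key scheme.
string interpret(string keys)
{
    Layout layout = Layout.english;
    string output;
    size_t i = 0;
    while (i < keys.length)
    {
        if (keys[i] != ESC)
        {
            output ~= (layout == Layout.greek) ? greekKey(keys[i])
                                               : keys[i .. i + 1];
            ++i;
            continue;
        }
        ++i; // skip the escape key itself
        if (i < keys.length && keys[i] == ESC)
        {
            layout = Layout.english; // <ESC><ESC>: back to the default layout
            ++i;
            continue;
        }
        // Assumption: a sequence runs until the first non-letter key,
        // and that key is then handled as ordinary input.
        size_t j = i;
        while (j < keys.length && isAlpha(keys[j]))
            ++j;
        string name = keys[i .. j];
        i = j;
        if (name == "grk")
            layout = Layout.greek;
        else if (name == "eng")
            layout = Layout.english;
        else
        {
            string s = charFor(name);
            if (s !is null)
                output ~= s; // unknown sequences are dropped in this sketch
        }
    }
    return output;
}

void main()
{
    writeln(interpret("x = " ~ ESC ~ "beta + 1"));               // x = β + 1
    writeln(interpret(ESC ~ "grk abg " ~ ESC ~ ESC ~ "abg"));    // " αβγ abg"
}
```

The 2-key scheme would differ mainly in having two sentinel values, one routed to `charFor` and one to the layout switch, which removes the need for both kinds of sequence to share one namespace.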

The only drawback is no custom key labels. But perhaps that can be alleviated by hooking an escape sequence to toggle an on-screen visual representation of the current layout. Maybe <ESC>? can be assigned to invoke a helper utility that renders the current layout on the screen.


T

-- 
Don't get stuck in a closet---wear yourself out.

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by Vladimir Panteleev
in reply to Wyatt


On Monday, 27 May 2013 at 02:17:08 UTC, Wyatt wrote:
> No hardware required; just a smarter IME.

Perhaps something like the compose key?

http://en.wikipedia.org/wiki/Compose_key

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by H. S. Teoh
in reply to Vladimir Panteleev


On Mon, May 27, 2013 at 09:59:52PM +0200, Vladimir Panteleev wrote:
> On Monday, 27 May 2013 at 02:17:08 UTC, Wyatt wrote:
> >No hardware required; just a smarter IME.
> 
> Perhaps something like the compose key?
> 
> http://en.wikipedia.org/wiki/Compose_key

I'm already using the compose key. But it only goes so far (I don't think compose key sequences cover all of Unicode). Besides, it's impractical to use compose key sequences to write large amounts of text in some given language; a method of temporarily switching to a different layout is necessary.

T

-- 
Тише едешь, дальше будешь.

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by Vladimir Panteleev
in reply to H. S. Teoh


On Monday, 27 May 2013 at 21:24:15 UTC, H. S. Teoh wrote:
> Besides, it's impractical to use compose key sequences to write large amounts of text in some given language; a method of temporarily switching to a different layout is necessary.

I thought the topic was typing the occasional Unicode character to use as an operator in D programs?

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by H. S. Teoh
in reply to Vladimir Panteleev


On Tue, May 28, 2013 at 12:04:52AM +0200, Vladimir Panteleev wrote:
> On Monday, 27 May 2013 at 21:24:15 UTC, H. S. Teoh wrote:
> >Besides, it's impractical to use compose key sequences to write large amounts of text in some given language; a method of temporarily switching to a different layout is necessary.
> 
> I thought the topic was typing the occasional Unicode character to use as an operator in D programs?

Well, D *does* support non-English identifiers, y'know... for example:

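	import std.stdio;

	void main(string[] args) {
		int число = 1;        // "число" is Russian for "number"
		foreach (и; 0..100)   // и takes the values 0 through 99
			число += и;
		writeln(число);       // prints 4951
	}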

Of course, whether that's a good practice is a different story. :)

But for operators, you still need enough compose key sequences to cover all of the Unicode operators -- and there are a LOT of them -- which I don't think is currently done anywhere. You'd have to make your own compose key maps to do it.

T

-- 
Freedom: (n.) Man's self-given right to be enslaved by his own depravity.

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by Simen Kjaeraas


On Tue, 28 May 2013 00:18:31 +0200, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Tue, May 28, 2013 at 12:04:52AM +0200, Vladimir Panteleev wrote:
>> On Monday, 27 May 2013 at 21:24:15 UTC, H. S. Teoh wrote:
>> >Besides, it's impractical to use compose key sequences to write
>> >large amounts of text in some given language; a method of
>> >temporarily switching to a different layout is necessary.
>>
>> I thought the topic was typing the occasional Unicode character to
>> use as an operator in D programs?
>
> Well, D *does* support non-English identifiers, y'know... for example:
>
> 	void main(string[] args) {
> 		int число = 1;
> 		foreach (и; 0..100)
> 			число += и;
> 		writeln(число);
> 	}
>
> Of course, whether that's a good practice is a different story. :)
>
> But for operators, you still need enough compose key sequences to cover
> all of the Unicode operators -- and there are a LOT of them -- which I
> don't think is currently done anywhere. You'd have to make your own
> compose key maps to do it.


The Fortress programming language has some 900 or so operators:

https://java.net/projects/projectfortress/sources/sources/content/Specification/fortress.1.0.pdf?rev=5558

Appendix C, and

https://java.net/projects/projectfortress/sources/sources/content/Documentation/Specification/fortress.pdf?rev=5558

chapter 14


-- 
Simen

May 27, 2013

Re: Why UTF-8/16 character encodings?

Posted by Walter Bright
in reply to H. S. Teoh


On 5/27/2013 3:18 PM, H. S. Teoh wrote:
> Well, D *does* support non-English identifiers, y'know... for example:
>
> 	void main(string[] args) {
> 		int число = 1;
> 		foreach (и; 0..100)
> 			число += и;
> 		writeln(число);
> 	}
>
> Of course, whether that's a good practice is a different story. :)

I've recently come to the opinion that that's a bad idea, and D should not support it.


Copyright © 1999-2021 by the D Language Foundation