January 18, 2011
On 2011-01-18 01:16:13 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> On 1/17/11 9:48 PM, Michel Fortin wrote:
>> On 2011-01-17 17:54:04 -0500, Michel Fortin <michel.fortin@michelf.com>
>> said:
>> 
>>> More seriously, you have four choices:
>>> 
>>> 1. code unit
>>> 2. code point
>>> 3. grapheme
>>> 4. require the client to state explicitly which kind of 'character' he
>>> wants; 'character' being an overloaded word, it's reasonable to ask
>>> for disambiguation.
>> 
>> This makes me think of what I did with my XML parser after you made code
>> points the element type for strings. Basically, the parser now uses
>> 'front' and 'popFront' whenever it needs to get the next code point, but
>> most of the time it uses 'frontUnit' and 'popFrontUnit' instead (which I
>> had to add) when testing for or skipping an ASCII character is
>> sufficient. This way I avoid a lot of unnecessary decoding of code points.
>> 
>> For this to work, the same range must let you skip either a unit or a
>> code point. If I were using a separate range with a call to toDchar or
>> toCodeUnit (or toGrapheme if I needed to check graphemes), it wouldn't
>> have helped much because the new range would essentially become a new
>> slice independent of the original, so you can't interleave "I want to
>> advance by one unit" with "I want to advance by one code point".
>> 
>> So perhaps the best interface for strings would be to provide multiple
>> range-like interfaces that you can use at the level you want.
>> 
>> I'm not sure if this is a good idea, but I thought I should at least
>> share my experience.
> 
> Very insightful. Thanks for sharing. Code it up and make a solid proposal!

What I use right now is this (see below). I'm not sure what would be a good name for it though. The expectation is that I'll get either an ASCII char or something out of ASCII range if it isn't ASCII.

The abstraction doesn't seem very 'solid' to me, in the sense that I can't see how it'd apply to ranges other than strings, so it's only useful for strings (the character array kind), and it's only useful as a workaround since you made ElementType!(char[]) a dchar. Well, any range returning char, wchar, or dchar could map frontUnit to front and popFrontUnit to popFront to keep things working, but that makes the optimization rather pointless. I don't really have an idea where to go from here.


char frontUnit(string input) {
	assert(input.length > 0);
	return input[0];
}
wchar frontUnit(wstring input) {
	assert(input.length > 0);
	return input[0];
}
dchar frontUnit(dstring input) {
	assert(input.length > 0);
	return input[0];
}

void popFrontUnit(ref string input) {
	assert(input.length > 0);
	input = input[1..$];
}
void popFrontUnit(ref wstring input) {
	assert(input.length > 0);
	input = input[1..$];
}
void popFrontUnit(ref dstring input) {
	assert(input.length > 0);
	input = input[1..$];
}

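version (unittest) {
	// front and popFront for arrays live in std.range.primitives
	// (also reachable through std.array), not in std.string
	import std.range.primitives : front, popFront;
}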

unittest {
	string test = "été";
	assert(test.length == 5);
	
	string test2 = test;
	assert(test2.front == 'é');
	test2.popFront();
	assert(test2.length == 3); // removed "é" which is two UTF-8 code units
	
	string test3 = test;
	assert(test3.frontUnit == "é"c[0]);
	test3.popFrontUnit();
	assert(test3.length == 4); // removed the first half of "é", which is one UTF-8 code unit
}
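
The six overloads above differ only in the string type, so they could probably be collapsed into two templates. This is only a sketch, assuming `std.traits.isSomeString` is an acceptable constraint; the names `frontUnit` and `popFrontUnit` are kept from the code above.

```d
import std.range.primitives : front, popFront;
import std.traits : isSomeString;

/// Returns the first code unit of any built-in string type, without decoding.
auto frontUnit(S)(S input) if (isSomeString!S)
{
    assert(input.length > 0);
    return input[0];
}

/// Drops exactly one code unit, even if that splits a multi-unit code point.
void popFrontUnit(S)(ref S input) if (isSomeString!S)
{
    assert(input.length > 0);
    input = input[1 .. $];
}

void main()
{
    // In UTF-16, 'é' is a single code unit.
    wstring w = "été"w;
    assert(w.frontUnit == 'é');
    w.popFrontUnit();
    assert(w.length == 2);

    // In UTF-8, 'é' is two code units, so two pops are needed to reach 't'.
    string s = "été";
    s.popFrontUnit();
    s.popFrontUnit();
    assert(s.front == 't');
}
```

The caller is still responsible for never decoding from the middle of a code point; the templates do not check for that.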


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 18, 2011
On 01/18/2011 03:52 AM, Andrei Alexandrescu wrote:
> On 1/17/11 5:13 PM, spir wrote:
>> On 01/17/2011 07:57 PM, Andrei Alexandrescu wrote:
>>> * Line 130: representing a text as a dchar[][] has its advantages but
>>> major efficiency issues. To be frank I think it's a disaster. I think a
>>> representation building on UTF strings directly is bound to be vastly
>>> better.
>>
>> I don't understand your point. Where is the difference with D's builtin
>> types, then?
>
> Unfortunately I won't have much time to discuss all these points, but
> this is a simple one: using dchar[][] wastes memory and time. You need
> to build on a flatter representation. Don't confuse the abstraction you
> are building with its underlying representation. The difference between
> your abstraction and char[]/wchar[]/dchar[] (which I strongly recommend
> you to build on) is that the abstractions offer different, higher-level
> primitives that the representation doesn't.

I think the following needs repeating: Text in my view (or whatever variant solution for working correctly with universal text) is _not_ intended as a basic string type, even less as the default.
If programmers can guarantee that all their app's input will only ever hold single-codepoint characters, _or_ if they just pass pieces of text around without manipulating them, then such a tool is overkill.

It has a time cost at Text construction time, which I consider an investment. It also has some space & time cost for operations, which should matter only slightly compared to the speed gained from the simple fact that routines can then operate just (actually, nearly) as they do with historic charsets.
Indexing is just normal O(1) indexing, possibly plus producing the result. Not O(n) across the source, building piles along the way. (1000X slower, 1000000X slower?)
Counting is just O(n) with mini-array compares, not building & normalising piles across the whole code sequence. (10X, 100X slower?)
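
To make the indexing trade-off concrete, here is a rough sketch. It assumes `std.uni.byGrapheme` and `std.uni.Grapheme`, which Phobos gained after this thread, and uses an array of `Grapheme` as a stand-in for the "piles" representation; it is not the Text type discussed here.

```d
import std.array : array;
import std.range : drop;
import std.uni : byGrapheme, Grapheme;

void main()
{
    // "été" in decomposed form: 'e' + U+0301, 't', 'e' + U+0301
    string s = "e\u0301te\u0301";
    assert(s.length == 7); // 7 UTF-8 code units

    // Flat representation: reaching the third grapheme walks the string, O(n).
    auto third = s.byGrapheme.drop(2).front;
    assert(third.length == 2); // two code points in this grapheme

    // Pre-segmented representation: pay O(n) once, then index in O(1).
    Grapheme[] piles = s.byGrapheme.array;
    assert(piles.length == 3);
    assert(piles[1].length == 1 && piles[1][0] == 't');
    assert(piles[2][0] == 'e' && piles[2][1] == '\u0301');
}
```

The sketch says nothing about the constant factors or memory overhead of either form; those would have to be measured.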

> Let me repeat again: if anyone in this community wants to put work in a
> forward range that iterates one grapheme at a time, that work would be
> very valuable because it will allow us to experiment with graphemes in a
> non-disruptive way while benefiting of a host of algorithms. ByGrapheme
> and friends will help more than defining new string types.

Right. I understand your point of view, especially "non-disruptive".
But then, how do we avoid the possibly huge inefficiency mentioned above? We have no real perf numbers yet for any alternative to Text's approach. For the same reason, we should not speak loosely about this approach's space & time costs either. Compared to what?


Denis
_________________
vita es estrany
spir.wikidot.com

January 18, 2011
On 1/18/11 1:58 AM, Steven Wawryk wrote:
> On 18/01/11 16:46, Andrei Alexandrescu wrote:
>> On 1/17/11 9:48 PM, Michel Fortin wrote:
>>> On 2011-01-17 17:54:04 -0500, Michel Fortin <michel.fortin@michelf.com>
>>> said:
>>>
>>>> More seriously, you have four choices:
>>>>
>>>> 1. code unit
>>>> 2. code point
>>>> 3. grapheme
>>>> 4. require the client to state explicitly which kind of 'character' he
>>>> wants; 'character' being an overloaded word, it's reasonable to ask
>>>> for disambiguation.
>>>
>>> This makes me think of what I did with my XML parser after you made code
>>> points the element type for strings. Basically, the parser now uses
>>> 'front' and 'popFront' whenever it needs to get the next code point, but
>>> most of the time it uses 'frontUnit' and 'popFrontUnit' instead (which I
>>> had to add) when testing for or skipping an ASCII character is
>>> sufficient. This way I avoid a lot of unnecessary decoding of code
>>> points.
>>>
>>> For this to work, the same range must let you skip either a unit or a
>>> code point. If I were using a separate range with a call to toDchar or
>>> toCodeUnit (or toGrapheme if I needed to check graphemes), it wouldn't
>>> have helped much because the new range would essentially become a new
>>> slice independent of the original, so you can't interleave "I want to
>>> advance by one unit" with "I want to advance by one code point".
>>>
>>> So perhaps the best interface for strings would be to provide multiple
>>> range-like interfaces that you can use at the level you want.
>>>
>>> I'm not sure if this is a good idea, but I thought I should at least
>>> share my experience.
>>
>> Very insightful. Thanks for sharing. Code it up and make a solid
>> proposal!
>>
>> Andrei
>
> How does this differ from Steve Schveighoffer's string_t, subtract the
> indexing and slicing of code-points, plus a bidirectional grapheme range?

There's no string, only range...

Andrei

January 18, 2011
On 1/18/11 7:17 AM, Michel Fortin wrote:
> On 2011-01-18 01:16:13 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> said:
>
>> On 1/17/11 9:48 PM, Michel Fortin wrote:
>>> On 2011-01-17 17:54:04 -0500, Michel Fortin <michel.fortin@michelf.com>
>>> said:
>>>
>>>> More seriously, you have four choices:
>>>>
>>>> 1. code unit
>>>> 2. code point
>>>> 3. grapheme
>>>> 4. require the client to state explicitly which kind of 'character' he
>>>> wants; 'character' being an overloaded word, it's reasonable to ask
>>>> for disambiguation.
>>>
>>> This makes me think of what I did with my XML parser after you made code
>>> points the element type for strings. Basically, the parser now uses
>>> 'front' and 'popFront' whenever it needs to get the next code point, but
>>> most of the time it uses 'frontUnit' and 'popFrontUnit' instead (which I
>>> had to add) when testing for or skipping an ASCII character is
>>> sufficient. This way I avoid a lot of unnecessary decoding of code
>>> points.
>>>
>>> For this to work, the same range must let you skip either a unit or a
>>> code point. If I were using a separate range with a call to toDchar or
>>> toCodeUnit (or toGrapheme if I needed to check graphemes), it wouldn't
>>> have helped much because the new range would essentially become a new
>>> slice independent of the original, so you can't interleave "I want to
>>> advance by one unit" with "I want to advance by one code point".
>>>
>>> So perhaps the best interface for strings would be to provide multiple
>>> range-like interfaces that you can use at the level you want.
>>>
>>> I'm not sure if this is a good idea, but I thought I should at least
>>> share my experience.
>>
>> Very insightful. Thanks for sharing. Code it up and make a solid
>> proposal!
>
> What I use right now is this (see below). I'm not sure what would be a
> good name for it though. The expectation is that I'll get either an
> ASCII char or something out of ASCII range if it isn't ASCII.
>
> The abstraction doesn't seem very 'solid' to me, in the sense that I
> can't see how it'd apply to ranges other than strings, so it's only
> useful for strings (the character array kind), and it's only useful as a
> workaround since you made ElementType!(char[]) a dchar. Well, any range
> returning char,dchar,wchar could map frontUnit to front and popFrontUnit
> to popFront to keep things working, but it makes the optimization rather
> pointless. I don't really have an idea where to go from here.
[snip]

I was thinking along the lines of:

struct Grapheme
{
    private string support_;
    ...
}

struct ByGrapheme
{
    private string iteratee_;
    bool empty();
    Grapheme front();
    void popFront();
    // Additional funs
    dchar frontCodePoint();
    void popFrontCodePoint();
    char frontCodeUnit();
    void popFrontCodeUnit();
    ...
}

// helper function
ByGrapheme byGrapheme(string s);

// usage
string s = ...;
size_t i;
foreach (g; byGrapheme(s))
{
    writeln("Grapheme #", i, " is ", g);
    ++i; // advance the counter, otherwise every line prints #0
}

We need this range in Phobos.
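
The "additional funs" part of the outline can be sketched without any grapheme logic. The struct below is hypothetical (the name `ByCodePointOrUnit` is made up for illustration) and only covers the code point and code unit levels, using `std.utf.decode` and `std.utf.stride`; a grapheme-level `front`/`popFront` would sit on top of the same `iteratee_`.

```d
import std.utf : decode, stride;

/// Hypothetical range-like wrapper: one position, two stepping granularities.
struct ByCodePointOrUnit
{
    private string iteratee_;

    @property bool empty() { return iteratee_.length == 0; }

    /// Decodes the code point at the front without consuming it.
    @property dchar frontCodePoint()
    {
        size_t index = 0;
        return decode(iteratee_, index);
    }

    /// Skips one whole code point (1 to 4 UTF-8 code units).
    void popFrontCodePoint()
    {
        iteratee_ = iteratee_[stride(iteratee_, 0) .. $];
    }

    /// Returns the first code unit with no decoding at all.
    @property char frontCodeUnit() { return iteratee_[0]; }

    /// Skips exactly one code unit.
    void popFrontCodeUnit() { iteratee_ = iteratee_[1 .. $]; }
}

void main()
{
    auto r = ByCodePointOrUnit("<é>");
    assert(r.frontCodeUnit == '<');   // ASCII test, no decoding
    r.popFrontCodeUnit();
    assert(r.frontCodePoint == 'é');  // decode only where needed
    r.popFrontCodePoint();
    assert(r.frontCodeUnit == '>');
    r.popFrontCodeUnit();
    assert(r.empty);
}
```

Because both levels advance the same slice, calls at different granularities can be interleaved freely.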


Andrei

January 18, 2011
On 1/18/11 7:25 AM, spir wrote:
> On 01/18/2011 03:52 AM, Andrei Alexandrescu wrote:
>> On 1/17/11 5:13 PM, spir wrote:
>>> On 01/17/2011 07:57 PM, Andrei Alexandrescu wrote:
>>>> * Line 130: representing a text as a dchar[][] has its advantages but
>>>> major efficiency issues. To be frank I think it's a disaster. I think a
>>>> representation building on UTF strings directly is bound to be vastly
>>>> better.
>>>
>>> I don't understand your point. Where is the difference with D's builtin
>>> types, then?
>>
>> Unfortunately I won't have much time to discuss all these points, but
>> this is a simple one: using dchar[][] wastes memory and time. You need
>> to build on a flatter representation. Don't confuse the abstraction you
>> are building with its underlying representation. The difference between
>> your abstraction and char[]/wchar[]/dchar[] (which I strongly recommend
>> you to build on) is that the abstractions offer different, higher-level
>> primitives that the representation doesn't.
>
> I think the following needs repeating: Text in my view (or whatever
> variant solution for working correctly with universal text) is _not_
> intended as a basic string type, even less as the default.
> If programmers can guarantee that all their app's input will only ever
> hold single-codepoint characters, _or_ if they just pass pieces of text
> around without manipulating them, then such a tool is overkill.
>
> It has a time cost at Text construction time, which I consider an
> investment. It also has some space & time cost for operations, which
> should matter only slightly compared to the speed gained from the simple
> fact that routines can then operate just (actually, nearly) as they do
> with historic charsets.
> Indexing is just normal O(1) indexing, possibly plus producing the
> result. Not O(n) across the source, building piles along the way.
> (1000X slower, 1000000X slower?)
> Counting is just O(n) with mini-array compares, not building &
> normalising piles across the whole code sequence. (10X, 100X slower?)

You don't provide O(n) indexing.

Andrei

January 18, 2011
On 2011-01-18 11:38:45 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:

> On 1/18/11 7:17 AM, Michel Fortin wrote:
>> On 2011-01-18 01:16:13 -0500, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> said:
>> 
>>> On 1/17/11 9:48 PM, Michel Fortin wrote:
>>>> On 2011-01-17 17:54:04 -0500, Michel Fortin <michel.fortin@michelf.com>
>>>> said:
>>>> 
>>>>> More seriously, you have four choices:
>>>>> 
>>>>> 1. code unit
>>>>> 2. code point
>>>>> 3. grapheme
>>>>> 4. require the client to state explicitly which kind of 'character' he
>>>>> wants; 'character' being an overloaded word, it's reasonable to ask
>>>>> for disambiguation.
>>>> 
>>>> This makes me think of what I did with my XML parser after you made code
>>>> points the element type for strings. Basically, the parser now uses
>>>> 'front' and 'popFront' whenever it needs to get the next code point, but
>>>> most of the time it uses 'frontUnit' and 'popFrontUnit' instead (which I
>>>> had to add) when testing for or skipping an ASCII character is
>>>> sufficient. This way I avoid a lot of unnecessary decoding of code
>>>> points.
>>>> 
>>>> For this to work, the same range must let you skip either a unit or a
>>>> code point. If I were using a separate range with a call to toDchar or
>>>> toCodeUnit (or toGrapheme if I needed to check graphemes), it wouldn't
>>>> have helped much because the new range would essentially become a new
>>>> slice independent of the original, so you can't interleave "I want to
>>>> advance by one unit" with "I want to advance by one code point".
>>>> 
>>>> So perhaps the best interface for strings would be to provide multiple
>>>> range-like interfaces that you can use at the level you want.
>>>> 
>>>> I'm not sure if this is a good idea, but I thought I should at least
>>>> share my experience.
>>> 
>>> Very insightful. Thanks for sharing. Code it up and make a solid
>>> proposal!
>> 
>> What I use right now is this (see below). I'm not sure what would be a
>> good name for it though. The expectation is that I'll get either an
>> ASCII char or something out of ASCII range if it isn't ASCII.
>> 
>> The abstraction doesn't seem very 'solid' to me, in the sense that I
>> can't see how it'd apply to ranges other than strings, so it's only
>> useful for strings (the character array kind), and it's only useful as a
>> workaround since you made ElementType!(char[]) a dchar. Well, any range
>> returning char,dchar,wchar could map frontUnit to front and popFrontUnit
>> to popFront to keep things working, but it makes the optimization rather
>> pointless. I don't really have an idea where to go from here.
> [snip]
> 
> I was thinking along the lines of:
> 
> struct Grapheme
> {
>      private string support_;
>      ...
> }
> 
> struct ByGrapheme
> {
>      private string iteratee_;
>      bool empty();
>      Grapheme front();
>      void popFront();
>      // Additional funs
>      dchar frontCodePoint();
>      void popFrontCodePoint();
>      char frontCodeUnit();
>      void popFrontCodeUnit();
>      ...
> }
> 
> // helper function
> ByGrapheme byGrapheme(string s);
> 
> // usage
> string s = ...;
> size_t i;
> foreach (g; byGrapheme(s))
> {
>      writeln("Grapheme #", i, " is ", g);
> }
> 
> We need this range in Phobos.

Yes, we need a grapheme range.

But that's not what my thing was about. It was about shortcutting code point decoding when it isn't necessary while still keeping the ability to decode to code points when iterating on the same range. For instance, here's a simple made-up example:

	string s = "<hello>";
	if (!s.empty && s.frontUnit == '<')
		s.popFrontUnit(); // skip
	while (!s.empty && s.frontUnit != '>')
		s.popFront(); // do something with each code point
	if (!s.empty && s.frontUnit == '>')
		s.popFrontUnit(); // skip
	assert(s.empty);

Here, since I know I'm testing and skipping for '<', an ASCII character, decoding the code point is wasted time, so I skip that decoding. The problem is that this optimization can't happen with a range that abstracts things at the code point level. I can do it with strings because strings still allow you to access code units through the indexing operators, but this can't really apply to ranges of code points in general.
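
For reference, a self-contained variant of the example that actually does something with each code point could look like the sketch below. The helper `tagName` is made up for illustration; `frontUnit` and `popFrontUnit` are the `string` overloads from the earlier post.

```d
import std.range.primitives : empty, front, popFront;

char frontUnit(string input)
{
    assert(input.length > 0);
    return input[0];
}

void popFrontUnit(ref string input)
{
    assert(input.length > 0);
    input = input[1 .. $];
}

/// Hypothetical helper: collects the code points between '<' and '>'.
dstring tagName(string s)
{
    dstring name;
    if (!s.empty && s.frontUnit == '<')
        s.popFrontUnit();              // ASCII: skip without decoding
    // Safe at the code unit level: the byte for '>' never occurs inside
    // a multi-byte UTF-8 sequence.
    while (!s.empty && s.frontUnit != '>')
    {
        name ~= s.front;               // decode one full code point
        s.popFront();
    }
    return name;
}

void main()
{
    assert(tagName("<héllo>") == "héllo"d);
}
```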

And parsing with a range of code units would also be a pain, because even if I'm testing for '<' as the first character, sometimes I really need to advance by code point and test for code points.

One thing that might be interesting is benchmarking my XML parser by replacing every instance of frontUnit and popFrontUnit with front and popFront. That won't change the results, but it'd give us an idea of the overhead of unnecessarily decoding code points.
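
Short of benchmarking the whole parser, a micro-benchmark along these lines could give a first impression. It is only a sketch: the workload (counting '<' in a repeated sample string) is made up, it uses `std.datetime.stopwatch.benchmark` from current Phobos, and no particular timings are claimed.

```d
import std.array : replicate;
import std.datetime.stopwatch : benchmark;
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

string sample; // made-up input, filled in main
size_t sink;   // accumulates results so the calls are not optimized away

/// Counts '<' by looking at raw code units only.
size_t countByUnit(string s)
{
    size_t n;
    while (s.length)
    {
        if (s[0] == '<') ++n;
        s = s[1 .. $];
    }
    return n;
}

/// Counts '<' by decoding every code point first.
size_t countByCodePoint(string s)
{
    size_t n;
    while (!s.empty)
    {
        if (s.front == '<') ++n;
        s.popFront();
    }
    return n;
}

void runByUnit() { sink += countByUnit(sample); }
void runByCodePoint() { sink += countByCodePoint(sample); }

void main()
{
    sample = "<été>".replicate(10_000);
    // Both strategies must agree before timing them.
    assert(countByUnit(sample) == countByCodePoint(sample));

    auto times = benchmark!(runByUnit, runByCodePoint)(100);
    writeln("code units:  ", times[0]);
    writeln("code points: ", times[1]);
}
```

A real parser mixes both kinds of step, so the ratio seen here would at best be an upper bound on what the parser could gain.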


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

January 18, 2011
On 01/18/2011 06:14 PM, Michel Fortin wrote:
> On 2011-01-18 11:38:45 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> said:
>> I was thinking along the lines of:
>>
>> struct Grapheme
>> {
>>     private string support_;
>>     ...
>> }
>>
>> struct ByGrapheme
>> {
>>     private string iteratee_;
>>     bool empty();
>>     Grapheme front();
>>     void popFront();
>>     // Additional funs
>>     dchar frontCodePoint();
>>     void popFrontCodePoint();
>>     char frontCodeUnit();
>>     void popFrontCodeUnit();
>>     ...
>> }
>>
>> // helper function
>> ByGrapheme byGrapheme(string s);
>>
>> // usage
>> string s = ...;
>> size_t i;
>> foreach (g; byGrapheme(s))
>> {
>>     writeln("Grapheme #", i, " is ", g);
>> }
>>
>> We need this range in Phobos.
>
> Yes, we need a grapheme range.
>
> But that's not what my thing was about. It was about shortcutting code
> point decoding when it isn't necessary while still keeping the ability
> to decode to code points when iterating on the same range. For instance,
> here's a simple made up example:
>
> 	string s = "<hello>";
> 	if (!s.empty && s.frontUnit == '<')
> 		s.popFrontUnit(); // skip
> 	while (!s.empty && s.frontUnit != '>')
> 		s.popFront(); // do something with each code point
> 	if (!s.empty && s.frontUnit == '>')
> 		s.popFrontUnit(); // skip
> 	assert(s.empty);
>
> Here, since I know I'm testing and skipping for '<', an ASCII character,
> decoding the code point is wasted time, so I skip that decoding. The
> problem is that this optimization can't happen with a range that
> abstracts things at the code point level. I can do it with strings
> because strings still allow you to access code units through the
> indexing operators, but this can't really apply to ranges of code points
> in general.
>
> And parsing with a range of code units would also be a pain, because even
> if I'm testing for '<' as the first character, sometimes I really need
> to advance by code point and test for code points.

This means a single string type that exposes various _synchronized_ range levels (code unit, code point, grapheme), doesn't it? As opposed to Andrei's approach of ranges being structures external to string types, IIUC, which thus advance independently?

> One thing that might be interesting is benchmarking my XML parser by
> replacing every instance of frontUnit and popFrontUnit with front and
> popFront. That won't change the results, but it'd give us an idea of
> the overhead of unnecessarily decoding code points.

Yes, would you have time to do it? I would be interested in such perf measurements. (--> your idea about a Text variant, for which I would like to know whether it's still worth decoding systematically.)

Denis
_________________
vita es estrany
spir.wikidot.com


January 19, 2011
On 19/01/11 02:40, Andrei Alexandrescu wrote:
> On 1/18/11 1:58 AM, Steven Wawryk wrote:
>> On 18/01/11 16:46, Andrei Alexandrescu wrote:
>>> On 1/17/11 9:48 PM, Michel Fortin wrote:
>>>> This makes me think of what I did with my XML parser after you made
>>>> code
>>>> points the element type for strings. Basically, the parser now uses
>>>> 'front' and 'popFront' whenever it needs to get the next code point,
>>>> but
>>>> most of the time it uses 'frontUnit' and 'popFrontUnit' instead
>>>> (which I
>>>> had to add) when testing for or skipping an ASCII character is
>>>> sufficient. This way I avoid a lot of unnecessary decoding of code
>>>> points.
>>>>
>>>> For this to work, the same range must let you skip either a unit or a
>>>> code point. If I were using a separate range with a call to toDchar or
>>>> toCodeUnit (or toGrapheme if I needed to check graphemes), it wouldn't
>>>> have helped much because the new range would essentially become a new
>>>> slice independent of the original, so you can't interleave "I want to
>>>> advance by one unit" with "I want to advance by one code point".
>>>>
>>>> So perhaps the best interface for strings would be to provide multiple
>>>> range-like interfaces that you can use at the level you want.
>>>>
>>>> I'm not sure if this is a good idea, but I thought I should at least
>>>> share my experience.
>>>
>>> Very insightful. Thanks for sharing. Code it up and make a solid
>>> proposal!
>>>
>>> Andrei
>>
>> How does this differ from Steve Schveighoffer's string_t, subtract the
>> indexing and slicing of code-points, plus a bidirectional grapheme range?
>
> There's no string, only range...

Which is exactly what I asked you about.  I understand that you must be very busy, but how do I get you to look at the actual technical content of something?  Is there something in the way I phrase things that makes you dismiss my introductory motivation without looking into the content?

I don't mean this as a criticism.  I really want to know because I'm considering a proposal on a different topic but wasn't sure it's worth it as there seems to be a barrier to getting things considered.

January 19, 2011
On 1/18/11 6:00 PM, Steven Wawryk wrote:
> On 19/01/11 02:40, Andrei Alexandrescu wrote:
>> On 1/18/11 1:58 AM, Steven Wawryk wrote:
>>> On 18/01/11 16:46, Andrei Alexandrescu wrote:
>>>> On 1/17/11 9:48 PM, Michel Fortin wrote:
>>>>> This makes me think of what I did with my XML parser after you made
>>>>> code
>>>>> points the element type for strings. Basically, the parser now uses
>>>>> 'front' and 'popFront' whenever it needs to get the next code point,
>>>>> but
>>>>> most of the time it uses 'frontUnit' and 'popFrontUnit' instead
>>>>> (which I
>>>>> had to add) when testing for or skipping an ASCII character is
>>>>> sufficient. This way I avoid a lot of unnecessary decoding of code
>>>>> points.
>>>>>
>>>>> For this to work, the same range must let you skip either a unit or a
>>>>> code point. If I were using a separate range with a call to toDchar or
>>>>> toCodeUnit (or toGrapheme if I needed to check graphemes), it wouldn't
>>>>> have helped much because the new range would essentially become a new
>>>>> slice independent of the original, so you can't interleave "I want to
>>>>> advance by one unit" with "I want to advance by one code point".
>>>>>
>>>>> So perhaps the best interface for strings would be to provide multiple
>>>>> range-like interfaces that you can use at the level you want.
>>>>>
>>>>> I'm not sure if this is a good idea, but I thought I should at least
>>>>> share my experience.
>>>>
>>>> Very insightful. Thanks for sharing. Code it up and make a solid
>>>> proposal!
>>>>
>>>> Andrei
>>>
>>> How does this differ from Steve Schveighoffer's string_t, subtract the
>>> indexing and slicing of code-points, plus a bidirectional grapheme
>>> range?
>>
>> There's no string, only range...
>
> Which is exactly what I asked you about. I understand that you must be
> very busy, but how do I get you to look at the actual technical content
> of something? Is there something in the way I phrase things that makes
> you dismiss my introductory motivation without looking into the content?
>
> I don't mean this as a criticism. I really want to know because I'm
> considering a proposal on a different topic but wasn't sure it's worth
> it as there seems to be a barrier to getting things considered.

One simple fact is that I'm not the only person who needs to look at a design. If you want to propose something for inclusion in Phobos, please put the code in good shape, document it properly, and make a submission in this newsgroup following the Boost model. I get one vote and everyone else gets a vote.

Looking back at our exchanges in search of a perceived dismissive attitude on my part (apologies if it seems that way - it was unintentional), I infer your annoyance stems from my answer to this:

>>> How does this differ from Steve Schveighoffer's string_t,
>>> subtract the indexing and slicing of code-points, plus a
>>> bidirectional grapheme range?

I happen to have discussed at length my beef with Steve's proposal. Now in one sentence you change the proposed design on the fly without fleshing out the consequences, add to it again without substantiation, and presumably expect me to come up with a salient analysis of the result. I don't think it's fair to characterize my answer to that as dismissive, nor to pressure me into expanding on it.

Finally, let me say again what I have already said a few times: in order to experiment with grapheme-based processing, we need a byGrapheme range. There is no need for a new string class. We need a range over the existing string types. That would allow us to play with graphemes, assess their efficiency and ubiquity, and would ultimately put us in a better position when it comes to deciding whether it makes sense to make grapheme a character type or the default character type.
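
As a small illustration of a range over the existing string types, the sketch below counts the same `string` at all three levels. It assumes `std.uni.byGrapheme`, which was added to Phobos after this thread; the sample text is arbitrary.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

void main()
{
    // "noël" written as 'e' followed by U+0308 (combining diaeresis)
    string s = "noe\u0308l";

    assert(s.length == 6);                // code units (UTF-8 bytes)
    assert(s.walkLength == 5);            // code points
    assert(s.byGrapheme.walkLength == 4); // graphemes
}
```

The string itself is never converted or copied; only the way of walking it changes.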


Andrei

January 19, 2011
On 19/01/11 11:37, Andrei Alexandrescu wrote:
> On 1/18/11 6:00 PM, Steven Wawryk wrote:
>> Which is exactly what I asked you about. I understand that you must be
>> very busy, but how do I get you to look at the actual technical content
>> of something? Is there something in the way I phrase things that makes
>> you dismiss my introductory motivation without looking into the content?
>>
>> I don't mean this as a criticism. I really want to know because I'm
>> considering a proposal on a different topic but wasn't sure it's worth
>> it as there seems to be a barrier to getting things considered.
>
> One simple fact is that I'm not the only person who needs to look at a
> design. If you want to propose something for inclusion in Phobos, please
> put the code in good shape, document it properly, and make a submission
> in this newsgroup following the Boost model. I get one vote and everyone
> else gets a vote.

Ok, thanks for this suggestion.  But if developing a proposal as concrete code is a lot of work that may be rejected, is there a way to sound out the idea first before deciding to commit to developing it?


> Looking back at our exchanges in search of a perceived dismissive
> attitude on my part (apologies if it seems that way - it was
> unintentional), I infer your annoyance stems from my answer to this:
>
>>>> How does this differ from Steve Schveighoffer's string_t,
>>>> subtract the indexing and slicing of code-points, plus a
>>>> bidirectional grapheme range?

No, this was just a summary.  Here is the post that you answered dismissively: news://news.digitalmars.com:119/ih030g$1ok1$1@digitalmars.com

>
> In the interest of moving this on, would it become acceptable to you if:
>
> 1. indexing and slicing of the code-point range were removed?
> 2. any additional ranges are exposed to the user according to decisions
> made about graphemes, etc?
> 3. other constructive criticisms were accommodated?
>
> Steve
>
>
> On 15/01/11 03:33, Andrei Alexandrescu wrote:
>> On 1/14/11 5:06 AM, Steven Schveighoffer wrote:
>>> I respectfully disagree. A stream built on fixed-sized units, but with
>>> variable length elements, where you can determine the start of an
>>> element in O(1) time given a random index absolutely provides
>>> random-access. It just doesn't provide length.
>>
>> I equally respectfully disagree. I think random access is defined as
>> accessing the ith element in O(1) time. That's not the case here.
>>
>> Andrei
>
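
The property Steven Schveighoffer describes in the quoted exchange can be sketched for UTF-8 as follows. The function name `codePointStart` is made up for illustration, and the code assumes valid UTF-8. Note that it finds the code point containing a given code unit index, which is not the same as finding the i-th code point, and that distinction is what the disagreement is about.

```d
/// Steps back from code unit index i to the first code unit of the code
/// point containing it. With valid UTF-8 this takes at most three steps,
/// independent of the length of s.
size_t codePointStart(const(char)[] s, size_t i)
{
    assert(i < s.length);
    while (i > 0 && (s[i] & 0xC0) == 0x80) // 10xxxxxx marks a continuation byte
        --i;
    return i;
}

void main()
{
    string s = "été"; // bytes: C3 A9 74 C3 A9
    assert(codePointStart(s, 1) == 0); // middle of the first 'é'
    assert(codePointStart(s, 2) == 2); // 't' starts at its own index
    assert(codePointStart(s, 4) == 3); // middle of the second 'é'
}
```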


> I happen to have discussed at length my beef with Steve's proposal. Now
> in one sentence you change the proposed design on the fly without
> fleshing out the consequences, add to it again without substantiation,
> and presumably expect me to come up with a salient analysis of the result.
> I don't think it's fair to characterize my answer to that as dismissive,
> nor to pressure me into expanding on it.

Sorry, I could have given more context.  But you didn't discuss what I asked, based on the observation that your detailed criticisms of Steve's proposal all related to a single aspect of it.

Steve