What type does byGrapheme() return? - D Programming Language Discussion Forum

Thread overview

What type does byGrapheme() return?
Dec 27, 2019 Robert M. Münch
Dec 27, 2019 Steven Schveighoffer
Dec 29, 2019 Robert M. Münch
Dec 27, 2019 H. S. Teoh
Dec 29, 2019 Robert M. Münch
Dec 30, 2019 H. S. Teoh
Dec 30, 2019 H. S. Teoh
Dec 31, 2019 Steven Schveighoffer
Dec 31, 2019 H. S. Teoh
Dec 31, 2019 Steven Schveighoffer
Dec 31, 2019 H. S. Teoh
Dec 31, 2019 Steven Schveighoffer
Dec 31, 2019 H. S. Teoh
Jan 04, 2020 Robert M. Münch
Jan 05, 2020 H. S. Teoh
Jan 06, 2020 Robert M. Münch
Jan 06, 2020 Alex
Dec 30, 2019 H. S. Teoh
Dec 30, 2019 Alexandru Ermicioi

December 27, 2019

What type does byGrapheme() return?

Posted by Robert M. Münch

I love these documentation lines in the D docs:

	auto byGrapheme(Range)(Range range)

How should I know what auto is? Why not write the explicit type so that I know what to expect? When declaring a variable as a class/struct member I can't use auto; I need the explicit type...

I used typeof() but that doesn't help a lot:

	gText = [Grapheme(53, 0, 0, 72057594037927936, [83, ...., 1)]Result!string

I want to iterate a string byGrapheme so that I can add, delete, change graphemes.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

December 27, 2019

Re: What type does byGrapheme() return?

Posted by Steven Schveighoffer
in reply to Robert M. Münch

On 12/27/19 12:26 PM, Robert M. Münch wrote:
> I love these documentation lines in the D docs:
> 
>      auto byGrapheme(Range)(Range range)
> 
> How should I know what auto is? Why not write the explicit type so that I know what to expect? When declaring a variable as class/struct member I can't use auto but need the explicit type...
> 
> I used typeof() but that doesn't help a lot:
> 
> gText = [Grapheme(53, 0, 0, 72057594037927936, [83, ...., 1)]Result!string
> 
> I want to iterate a string byGrapheme so that I can add, delete, change graphemes.
> 

This is the rub with ranges. You need to use typeof. There's no other way to do it, because the type returned by byGrapheme depends on the type of Range.

If you know what type Range is, it would be:

import std.uni : byGrapheme;

struct S
{
   typeof(string.init.byGrapheme()) gText;
}

// or, naming the type once with an alias:
struct S2
{
   alias GRange = typeof(string.init.byGrapheme());
   GRange gText;
}

Substitute your real range type for `string`. Or, if it's the result of a bunch of adapters, use the whole call chain with typeof.

Why not just declare the real range type? Because it's going to be ugly, especially if your underlying range is the result of other range algorithms. And really, typeof is going to be the better mechanism, even if it's not the best looking thing.
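A self-contained sketch of the typeof approach, with a constructor that fills the member. `TextView` and `graphemes` are hypothetical names chosen for illustration; the test string is "noël" written with a combining diaeresis, so it has 5 code points but 4 graphemes.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

// Hypothetical example type: the member's type is spelled with typeof,
// because byGrapheme's return type depends on the range passed in.
struct TextView
{
    alias GRange = typeof(string.init.byGrapheme());
    GRange graphemes;

    this(string s)
    {
        graphemes = s.byGrapheme;
    }
}

void main()
{
    // "noël" with a combining diaeresis: 5 code points, 4 graphemes.
    auto tv = TextView("noe\u0308l");
    assert(tv.graphemes.walkLength == 4);
}
```

Because the alias is computed from `string.init`, it only matches when the constructor is also given a `string`; a `wstring` or an adapter chain would need its own typeof expression.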

-Steve

December 27, 2019

Re: What type does byGrapheme() return?

Posted by H. S. Teoh
in reply to Robert M. Münch

On Fri, Dec 27, 2019 at 06:26:58PM +0100, Robert M. Münch via Digitalmars-d-learn wrote:
> I love these documentation lines in the D docs:
> 
> 	auto byGrapheme(Range)(Range range)
> 
> How should I know what auto is? Why not write the explicit type so that I know what to expect? When declaring a variable as class/struct member I can't use auto but need the explicit type...
> 
> I used typeof() but that doesn't help a lot:
> 
> gText = [Grapheme(53, 0, 0, 72057594037927936, [83, ...., 1)]Result!string
> 
> I want to iterate a string byGrapheme so that I can add, delete, change graphemes.
[...]

Since graphemes are variable-length in terms of code points, you can't exactly *edit* a range of graphemes -- you can't replace a 1-codepoint grapheme with a 6-codepoint grapheme, for example, since there's no space in the underlying string to store that.
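A small sketch that makes the variable length visible: it maps each grapheme to its `length` (the number of code points it holds). The sample string is an assumption for illustration.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : map;
import std.uni : byGrapheme;

void main()
{
    // "noël" with a combining diaeresis on the 'e'.
    string text = "noe\u0308l";

    // Code points per grapheme: the third grapheme spans two of them.
    auto perGrapheme = text.byGrapheme.map!(g => g.length);
    assert(perGrapheme.equal([1, 1, 2, 1]));
}
```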

If you want to add/delete/change graphemes, what you *really* want is to use an array of Graphemes:

	Grapheme[] editableGraphs;

You can then splice it, insert stuff, delete stuff, whatever.

When you're done with it, convert it back to string with something like this:

	string result = editableGraphs.map!(g => g[]).joiner.to!string;
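A minimal sketch of the array-of-Graphemes approach: change, delete and insert graphemes by ordinary array operations, then rebuild a string. The helper `toText` is hypothetical; it converts back with a plain `foreach` by `ref` rather than `map`, so each Grapheme stays in place in the array while its code points are read.

```d
import std.array : array;
import std.uni : byGrapheme, Grapheme;

// Hypothetical helper: rebuild a string from graphemes. Iterating by ref
// keeps each Grapheme in the array while g[] is being read.
string toText(Grapheme[] graphs)
{
    string result;
    foreach (ref g; graphs)
        foreach (dchar c; g[])
            result ~= c;  // appending a dchar to a string encodes it as UTF-8
    return result;
}

void main()
{
    Grapheme[] gs = "noe\u0308l".byGrapheme.array;  // [n, o, ë, l]

    gs[0] = Grapheme("N");                                 // change: n -> N
    gs = gs[0 .. 2] ~ gs[3 .. $];                          // delete: ë
    gs = gs[0 .. 2] ~ [Grapheme("e\u0301")] ~ gs[2 .. $];  // insert: é (2 code points)

    assert(gs.length == 4);
    assert(toText(gs) == "Noe\u0301l");
}
```

Slicing and concatenation allocate new arrays; that is fine for a sketch, but heavy editing would benefit from a more careful buffer strategy.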

T

-- 
The irony is that Bill Gates claims to be making a stable operating system and Linus Torvalds claims to be trying to take over the world. -- Anonymous

December 29, 2019

Re: What type does byGrapheme() return?

Posted by Robert M. Münch
in reply to Steven Schveighoffer

On 2019-12-27 17:54:28 +0000, Steven Schveighoffer said:

> This is the rub with ranges. You need to use typeof. There's no other way to do it, because the type returned by byGrapheme depends on the type of Range.

Hi, ok, thanks a lot. IMO this is fundamentally important information for people using D (beginners, casual programmers, etc.) to understand how things fit together.

> If you know what type Range is, it would be:
> 
> struct S
> {
>     typeof(string.init.byGrapheme()) gText;
>     // or
>     alias GRange = typeof(string.init.byGrapheme());
>     GRange gText;
> }

Ah... I didn't know that I can use a basic type "string" combined with ".init" to manually build the type. Neat...

> Subbing in whatever your real range for `string`. Or if it's the result of a bunch of adapters, use the whole call chain with typeof.

Ok, and these are good candidates for alias definitions to avoid re-typing it many times.
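A sketch of an alias over a whole call chain, as Steve describes. `notSpace`, `VisibleGraphemes` and `Label` are hypothetical names. A named predicate is used so that the typeof expression and the constructor instantiate `filter` with the very same function.

```d
import std.algorithm.iteration : filter;
import std.range.primitives : walkLength;
import std.uni : byGrapheme, Grapheme;

// Hypothetical predicate: true unless the grapheme is a single space.
bool notSpace(Grapheme g)
{
    return !(g.length == 1 && g[0] == ' ');
}

// Spell the whole call chain once and reuse the alias.
alias VisibleGraphemes = typeof(string.init.byGrapheme.filter!notSpace);

struct Label
{
    VisibleGraphemes glyphs;

    this(string s)
    {
        glyphs = s.byGrapheme.filter!notSpace;
    }
}

void main()
{
    auto label = Label("a b");
    assert(label.glyphs.walkLength == 2);
}
```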

> Why not just declare the real range type? Because it's going to be ugly, especially if your underlying range is the result of other range algorithms. And really, typeof is going to be the better mechanism, even if it's not the best looking thing.

I think I got it... thanks a lot.


December 29, 2019

Re: What type does byGrapheme() return?

Posted by Robert M. Münch
in reply to H. S. Teoh

On 2019-12-27 19:44:59 +0000, H. S. Teoh said:

> Since graphemes are variable-length in terms of code points, you can't
> exactly *edit* a range of graphemes -- you can't replace a 1-codepoint
> grapheme with a 6-codepoint grapheme, for example, since there's no
> space in the underlying string to store that.

Hi, my idea was that a grapheme range would abstract away the fact that graphemes consist of varying numbers of code points. And the docs at https://dlang.org/phobos/std_uni.html#byGrapheme show an example using this kind of range:

auto gText = text.byGrapheme;

gText.take(3);
gText.drop(3);

But maybe I need to get a better understanding of the ranges stuff too...
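A sketch showing what `take` and `drop` do here: they return new lazy views over copies of the grapheme range, so neither the range nor the underlying string is edited. The assertions mirror the style of the std.uni documentation example; the final checks are added for illustration.

```d
import std.algorithm.comparison : equal;
import std.range : drop, take;
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

void main()
{
    string text = "noe\u0308l";  // "noël" with a combining diaeresis
    auto gText = text.byGrapheme;

    // take and drop produce new lazy views over copies of gText.
    assert(gText.take(3).equal("noe\u0308".byGrapheme));
    assert(gText.drop(3).equal("l".byGrapheme));

    // gText itself was not advanced, and the string was never modified.
    assert(gText.walkLength == 4);
    assert(text == "noe\u0308l");
}
```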

> If you want to add/delete/change graphemes, what you *really* want is to
> use an array of Graphemes:
> 
> 	Grapheme[] editableGraphs;
> 
> You can then splice it, insert stuff, delete stuff, whatever.
> 
> When you're done with it, convert it back to string with something like
> this:
> 
> 	string result = editableGraphs.map!(g => g[]).joiner.to!string;

I played around with this approach...

string r1 = "Robert M. Münch";
	// Code-Units  = 16
	// Code-Points = 15
	// Graphemes   = 15

Grapheme[] gr1 = r1.byGrapheme.array;
writeln(" Text = ", gr1.map!(g => g[]).joiner.to!string);
	//  Text = obert M. Münch
writeln("wText = ", gr1.map!(g => g[]).joiner.to!wstring);
	//  wText = obert M. Münch
writeln("dText = ", gr1.map!(g => g[]).joiner.to!dstring);
	//  dText = obert M. Münch

Why is the first letter missing? Is this a bug?


December 30, 2019

Re: What type does byGrapheme() return?

Posted by Alexandru Ermicioi
in reply to Robert M. Münch

On Friday, 27 December 2019 at 17:26:58 UTC, Robert M. Münch wrote:
> ...

There is a set of range interfaces that can be used to mask the range type. See https://dlang.org/library/std/range/interfaces/input_range.html as a starting point, and https://dlang.org/library/std/range/interfaces/input_range_object.html for wrapping any range in those interfaces.

Note: the resulting wrapped range is an object with reference semantics; beware of passing it directly to other range algorithms, as they can consume your range.
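A sketch of the interface approach, also showing the reference semantics. `Doc` and `makeDoc` are hypothetical names. As an assumption made to keep the element type simple, each grapheme is first copied into its own `string` (while the by-value `g` is still alive), so the member is an `InputRange!string`.

```d
import std.algorithm.iteration : map;
import std.conv : to;
import std.range.interfaces : InputRange, inputRangeObject;
import std.uni : byGrapheme;

// Hypothetical holder: the member is typed by the InputRange interface,
// so it no longer depends on the concrete type returned by byGrapheme/map.
struct Doc
{
    InputRange!string clusters;
}

// Hypothetical helper: each grapheme is copied into its own string.
Doc makeDoc(string s)
{
    return Doc(inputRangeObject(s.byGrapheme.map!(g => g[].to!string)));
}

void main()
{
    auto d = makeDoc("noe\u0308l");
    auto other = d;                   // copies the class reference, not the range
    assert(other.clusters.front == "n");
    other.clusters.popFront();        // advances the shared range object...
    assert(d.clusters.front == "o");  // ...so d sees the same position
}
```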

Best regards,
Alexandru.

December 30, 2019

Re: What type does byGrapheme() return?

Posted by H. S. Teoh
in reply to Robert M. Münch

On Sun, Dec 29, 2019 at 01:19:09PM +0100, Robert M. Münch via Digitalmars-d-learn wrote:
> On 2019-12-27 19:44:59 +0000, H. S. Teoh said:
[...]
> > If you want to add/delete/change graphemes, what you *really* want is to use an array of Graphemes:
> > 
> > 	Grapheme[] editableGraphs;
> > 
> > You can then splice it, insert stuff, delete stuff, whatever.
> > 
> > When you're done with it, convert it back to string with something like this:
> > 
> > 	string result = editableGraphs.map!(g => g[]).joiner.to!string;
> 
> I played around with this approach...
> 
> string r1 = "Robert M. Münch";
> 	// Code-Units  = 16
> 	// Code-Points = 15
> 	// Graphemes   = 15
> 
> Grapheme[] gr1 = r1.byGrapheme.array;
> writeln(" Text = ", gr1.map!(g => g[]).joiner.to!string);
> 	//  Text = obert M. Münch
> writeln("wText = ", gr1.map!(g => g[]).joiner.to!wstring);
> 	//  wText = obert M. Münch
> writeln("dText = ", gr1.map!(g => g[]).joiner.to!dstring);
> 	//  dText = obert M. Münch
> 
> Why is the first letter missing? Is this a bug?
[...]

I suspect there's a scope-related bug/issue somewhere here.  I did some experiments and discovered that using foreach to iterate over a Grapheme[] is OK, but somehow when using Grapheme[] with .map to slice over each one, I get random UTF-8 encoding errors and missing characters.

I suspect the cause is that whatever Grapheme.opSlice returns is going out-of-scope when used with .map, that's why it's malfunctioning. The last time I looked at the Grapheme code, there's a bunch of memory-related stuff involving dtors that's *probably* the cause of this problem.

Please file a bug for this.

T

-- 
Life is complex. It consists of real and imaginary parts. -- YHL

December 30, 2019

Re: What type does byGrapheme() return?

Posted by H. S. Teoh

On Mon, Dec 30, 2019 at 03:09:58PM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> I suspect the cause is that whatever Grapheme.opSlice returns is going out-of-scope when used with .map, that's why it's malfunctioning.
[...]

Haha, it's actually right there in the Grapheme docs for the opSlice overloads:

        Random-access range over Grapheme's $(CHARACTERS).

        Warning: Invalidates when this Grapheme leaves the scope,
        attempts to use it then would lead to memory corruption.

Looks like when you use .map over the Grapheme, it gets copied into a temporary, which gets invalidated when map.front returns.  Somewhere we're missing a 'scope' qualifier...


T

-- 
War doesn't prove who's right, just who's left. -- BSD Games' Fortune

December 30, 2019

Re: What type does byGrapheme() return?

Posted by H. S. Teoh

On Mon, Dec 30, 2019 at 03:31:31PM -0800, H. S. Teoh via Digitalmars-d-learn wrote:
> On Mon, Dec 30, 2019 at 03:09:58PM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> > I suspect the cause is that whatever Grapheme.opSlice returns is going out-of-scope when used with .map, that's why it's malfunctioning.
> [...]
> 
> Haha, it's actually right there in the Grapheme docs for the opSlice overloads:
> 
>         Random-access range over Grapheme's $(CHARACTERS).
> 
>         Warning: Invalidates when this Grapheme leaves the scope,
>         attempts to use it then would lead to memory corruption.
> 
> Looks like when you use .map over the Grapheme, it gets copied into a temporary, which gets invalidated when map.front returns.  Somewhere we're missing a 'scope' qualifier...
[...]

Indeed, compiling with dmd -dip1000 produces this error message:

	test.d(15): Error: returning g.opSlice() escapes a reference to parameter g, perhaps annotate with return
	/usr/src/d/phobos/std/algorithm/iteration.d(499):        instantiated from here: MapResult!(__lambda1, Grapheme[])
	test.d(15):        instantiated from here: map!(Grapheme[])

Not the most helpful message (the annotation has to go in Phobos code, not in user code), but it does at least point to the cause of the problem.


T

-- 
What doesn't kill me makes me stranger.

December 31, 2019

Re: What type does byGrapheme() return?

Posted by Steven Schveighoffer
in reply to H. S. Teoh

On 12/30/19 6:31 PM, H. S. Teoh wrote:
> On Mon, Dec 30, 2019 at 03:09:58PM -0800, H. S. Teoh via Digitalmars-d-learn wrote:
> [...]
>> I suspect the cause is that whatever Grapheme.opSlice returns is going
>> out-of-scope when used with .map, that's why it's malfunctioning.
> [...]
> 
> Haha, it's actually right there in the Grapheme docs for the opSlice
> overloads:
> 
>          Random-access range over Grapheme's $(CHARACTERS).
> 
>          Warning: Invalidates when this Grapheme leaves the scope,
>          attempts to use it then would lead to memory corruption.
> 
> Looks like when you use .map over the Grapheme, it gets copied into a
> temporary, which gets invalidated when map.front returns.  Somewhere
> we're missing a 'scope' qualifier...
> 

Then the original example should be fixable by putting "ref" in for all the lambdas.

But this is kind of disturbing. Why does the grapheme do this? The original data is not scoped.

e.g.:

writeln(" Text = ", gr1.map!((ref g) => g[]).joiner.to!string);

-Steve
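A complete version of Robert's round trip with the `ref` lambda Steve suggests, as a sketch assuming `map` passes each array element to the lambda by reference (arrays return `front` by `ref`), so `g[]` slices the Grapheme stored in `gr1` rather than a temporary copy.

```d
import std.algorithm.iteration : joiner, map;
import std.array : array;
import std.conv : to;
import std.uni : byGrapheme, Grapheme;

void main()
{
    string r1 = "Robert M. Münch";
    Grapheme[] gr1 = r1.byGrapheme.array;

    // With `ref g` the lambda receives the Grapheme stored in gr1 itself,
    // so the slice returned by g[] stays valid while joiner walks it.
    string text = gr1.map!((ref g) => g[]).joiner.to!string;
    assert(text == r1);
}
```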


Copyright © 1999-2021 by the D Language Foundation