Best approach to handle accented letters

Oct 28, 2016

Alfred Newman

Oct 28, 2016

Chris

Hello,

I'm having some trouble replacing the accented letters in a given string with their unaccented counterparts.

Let's say I have the following input string "très élégant" and I need to create a function that returns just "tres elegant". Considering we need to take care of Unicode characters, what is the best way to write D code to handle that?

Cheers

On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
> Hello,
>
> I'm having some trouble replacing the accented letters in a given string with their unaccented counterparts.
>
> Let's say I have the following input string "très élégant" and I need to create a function that returns just "tres elegant". Considering we need to take care of Unicode characters, what is the best way to write D code to handle that?
>
> Cheers

You could try something like this. It works for the accents listed in the table. I haven't tested it on other characters yet.

import std.stdio;
import std.algorithm;
import std.array;
import std.conv;

enum
{
    dchar[dchar] _accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i', 'ó':'o', 'ú':'u', 'Á':'A', 'É':'E', 'Í':'I', 'Ó':'O', 'Ú':'U']
}

void main()
{
    auto str = "très élégant";
    auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));
    writeln(removed); // prints "tres elegant"
}

On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
> Hello,
>
> I'm having some trouble replacing the accented letters in a given string with their unaccented counterparts.
>
> Let's say I have the following input string "très élégant" and I need to create a function that returns just "tres elegant". Considering we need to take care of Unicode characters, what is the best way to write D code to handle that?
>
> Cheers

import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

void main()
{
    auto str = "très élégant";
    immutable accents = unicode.Diacritic;
    auto removed = str
        .normalize!NFD
        .filter!(c => !accents[c])
        .to!string;
    writeln(removed); // prints "tres elegant"
}

This first decomposes all characters into base and diacritic, and then removes the latter.

On Friday, 28 October 2016 at 12:52:04 UTC, Marc Schütz wrote:
> On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
>> [...]
>
> import std.stdio;
> import std.algorithm;
> import std.uni;
> import std.conv;
>
> void main()
> {
>     auto str = "très élégant";
>     immutable accents = unicode.Diacritic;
>     auto removed = str
>         .normalize!NFD
>         .filter!(c => !accents[c])
>         .to!string;
>     writeln(removed); // prints "tres elegant"
> }
>
> This first decomposes all characters into base and diacritic, and then removes the latter.

Cool. That looks pretty neat and it should cover all cases.

On Friday, 28 October 2016 at 11:40:37 UTC, Chris wrote:
> On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
>> Hello,
>>
>> I'm having some trouble replacing the accented letters in a given string with their unaccented counterparts.
>>
>> Let's say I have the following input string "très élégant" and I need to create a function that returns just "tres elegant". Considering we need to take care of Unicode characters, what is the best way to write D code to handle that?
>>
>> Cheers
>
> You could try something like this. It works for the accents listed in the table. I haven't tested it on other characters yet.
>
> import std.stdio;
> import std.algorithm;
> import std.array;
> import std.conv;
>
> enum
> {
>     dchar[dchar] _accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i', 'ó':'o', 'ú':'u', 'Á':'A', 'É':'E', 'Í':'I', 'Ó':'O', 'Ú':'U']
> }
>
> void main()
> {
>     auto str = "très élégant";
>     auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));
>     writeln(removed); // prints "tres elegant"
> }

@Chris

As a new guy in the D community, I am not sure, but I think the line below is something like Python's lambda, right?

auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));

Can you please rewrite the line in a more didactic way? Sorry, but I'm still learning the basics.

Thanks in advance

On Friday, 28 October 2016 at 13:50:24 UTC, Alfred Newman wrote:
> On Friday, 28 October 2016 at 11:40:37 UTC, Chris wrote:
>> [...]
>
> @Chris
>
> As a new guy in the D community, I am not sure, but I think the line below is something like Python's lambda, right?
>
> auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));
>
> Can you please rewrite the line in a more didactic way? Sorry, but I'm still learning the basics.
>
> Thanks in advance

It boils down to something like:

if (c in _accent)
  return _accent[c];
else
  return c;

Just a normal lambda (condition true) ? yes : no;

I'd recommend you to use Marc's approach, though.
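
For readers who want the one-liner spelled out even further, here is a minimal sketch of the same lookup written as an explicit loop. The function name `replaceAccents` is made up for illustration, and the table is a shortened copy of `_accent` from the earlier post, so it only handles the letters listed there. It also assumes the accented letters are stored as single precomposed code points.

```d
import std.stdio : writeln;

// Hypothetical helper (not from the thread): the map/lambda line as a plain loop.
string replaceAccents(string input)
{
    // Shortened copy of the `_accent` table; far from complete.
    dchar[dchar] accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i', 'ó':'o', 'ú':'u'];

    string result;
    foreach (dchar c; input)        // decode the UTF-8 string one code point at a time
    {
        if (auto p = c in accent)   // `in` yields a pointer to the value, or null
            result ~= *p;           // known accent: append the replacement
        else
            result ~= c;            // anything else: append unchanged
    }
    return result;
}

void main()
{
    writeln(replaceAccents("très élégant")); // prints "tres elegant"
}
```

The `map` version does the same thing per character, except that it builds the result lazily and leaves the conversion to `to!string`.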

October 28, 2016

Re: Best approach to handle accented letters

Posted by Chris
in reply to Chris

On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
> On Friday, 28 October 2016 at 13:50:24 UTC, Alfred Newman wrote:
>
> It boils down to something like:
>
> if (c in _accent)
>   return _accent[c];
> else
>   return c;
>
> Just a normal lambda (condition true) ? yes : no;
>
> I'd recommend you to use Marc's approach, though.

What you basically do is you pass the logic on to `map` and `map` applies it to each item in the range (cf. [1]):

map!(myLogic)(range);

or (more idiomatic)

range.map!(myLogic);

This is true of a lot of functions, or rather templates, in the Phobos standard library, especially functions in std.algorithm (like find [2], canFind, filter etc.). In this way, instead of writing for-loops with if-else statements, you pass the logic to be applied within the `!()`-part of the template.
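
As a rough illustration of both points (the two call styles, and a loop with an if statement versus logic passed as a template argument), something like the following should work. The helper names `evensWithLoop` and `evensWithFilter` are invented for this sketch.

```d
import std.algorithm : equal, filter, map;
import std.stdio : writeln;

// Hypothetical helper: hand-written loop with an if statement.
int[] evensWithLoop(int[] numbers)
{
    int[] result;
    foreach (n; numbers)
        if (n % 2 == 0)
            result ~= n;
    return result;
}

// Hypothetical helper: the same logic passed to `filter` in the `!()` part.
auto evensWithFilter(int[] numbers)
{
    return numbers.filter!(n => n % 2 == 0);
}

void main()
{
    auto numbers = [1, 2, 3, 4];

    // Classic call syntax and the more idiomatic member-style call
    // build the same lazy range.
    auto doubledA = map!(x => x * 2)(numbers);
    auto doubledB = numbers.map!(x => x * 2);
    assert(equal(doubledA, doubledB));

    assert(equal(evensWithLoop(numbers), evensWithFilter(numbers)));

    writeln(doubledB);                 // prints [2, 4, 6, 8]
    writeln(evensWithFilter(numbers)); // prints [2, 4]
}
```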

// Filter the letter 'l'
auto result = "Hello, world!".filter!(a => a != 'l'); // returns "Heo, word!"

However, what is returned is not a string. So this won't work:

`writeln("Result is " ~ result);`

// Error: incompatible types for (("Result is ") ~ (result)): 'string' and
// 'FilterResult!(__lambda2, string)'

It returns a `FilterResult`.

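To fix this, you can write:
`
import std.conv;
auto result = "Hello, world!".filter!(a => a != 'l').to!string;
`
which converts it into a string. Then

`
writeln("Result is " ~ result);
`
works.

Alternatively, you can do this:

`
import std.array;
auto result = "Hello, world!".filter!(a => a != 'l').array;
`

which collects the elements eagerly. Bear in mind that for a string input this yields a `dchar[]`, not a `string`, so concatenating it to a string with `~` still fails. Print it with `writeln("Result is ", result);` or convert it with `.to!string` first.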
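
A small sketch to check the types involved, assuming a reasonably recent Phobos. The variable names are only for illustration. The `static assert` lines are verified at compile time, so the program will not build if the assumptions about the result types are wrong.

```d
import std.algorithm : filter;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // Lazy range; nothing is copied yet.
    auto lazyResult = "Hello, world!".filter!(a => a != 'l');

    auto asString = lazyResult.to!string; // eager, UTF-8 string
    auto asArray  = lazyResult.array;     // eager, array of decoded code points

    static assert(is(typeof(asString) == string));
    static assert(is(typeof(asArray) == dchar[]));

    writeln("Result is " ~ asString);           // concatenation works with a string
    writeln("Result is ", asArray);             // pass the dchar[] as a separate argument
    writeln("Result is " ~ asArray.to!string);  // or convert it before concatenating
}
```

All three `writeln` calls should print `Result is Heo, word!`.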

Just bear that in mind, because you will get the above error sometimes. Marc's example is idiomatic D and you should become familiar with it asap.

import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

void main()
{
    auto str = "très élégant";
    immutable accents = unicode.Diacritic;
    auto removed = str
        // decompose each character into its base letter plus combining marks (NFD)
        .normalize!NFD
        // drop the combining diacritics, keeping only the base letters
        .filter!(c => !accents[c])
        // collect the remaining characters of the FilterResult into a string
        .to!string;
    writeln(removed);  // prints "tres elegant"
}
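
If you need this in more than one place, the pipeline can be wrapped in a function. This is only a sketch: the name `removeDiacritics` is made up here, and the body is Marc's code unchanged. Letters that have no canonical decomposition (for example 'ø') would presumably pass through untouched, since there is no separate diacritic to filter out.

```d
import std.algorithm : filter;
import std.conv : to;
import std.stdio : writeln;
import std.uni;

// Hypothetical wrapper around the normalize/filter pipeline shown above.
string removeDiacritics(string input)
{
    immutable accents = unicode.Diacritic;
    return input
        .normalize!NFD              // split letters into base + combining marks
        .filter!(c => !accents[c])  // drop the marks
        .to!string;                 // collect the rest into a string
}

unittest
{
    assert(removeDiacritics("très élégant") == "tres elegant");
    assert(removeDiacritics("plain text") == "plain text");
}

void main()
{
    writeln(removeDiacritics("très élégant")); // prints "tres elegant"
}
```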

[1] http://dlang.org/phobos/std_algorithm_iteration.html#.map
[2] http://dlang.org/phobos/std_algorithm_searching.html#.find

On Friday, 28 October 2016 at 15:08:59 UTC, Chris wrote:
> On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
>> [...]
>
> What you basically do is you pass the logic on to `map` and `map` applies it to each item in the range (cf. [1]):
>
> [...]

Life is beautiful! Thx.
