Best approach to handle accented letters
October 28, 2016
Hello,

I'm having some trouble replacing the accented letters in a given string with their unaccented counterparts.

Let's say I have the input string "très élégant" and I need a function that returns just "tres elegant". Considering that we need to handle Unicode characters, what is the best way to write D code for that?

Cheers
October 28, 2016
On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
> [...]

You could try something like this. It works for the accented letters listed in the lookup table. I haven't tested it on other characters yet.

import std.stdio;
import std.algorithm;
import std.array;
import std.conv;

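// Lookup table: accented letter -> unaccented counterpart.
// Only the letters listed here are replaced.
enum
{
  dchar[dchar] _accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i', 'ó':'o', 'ú':'u', 'Á':'A', 'É':'E', 'Í':'I', 'Ó':'O', 'Ú':'U']
}

void main()
{
  auto str = "très élégant";
  // For each character, use its table entry if there is one, else keep the character.
  auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));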
  writeln(removed);  // prints "tres elegant"
}
October 28, 2016
On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
> [...]

import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

void main()
{
    auto str = "très élégant";
    immutable accents = unicode.Diacritic;
    auto removed = str
        .normalize!NFD
        .filter!(c => !accents[c])
        .to!string;
    writeln(removed);  // prints "tres elegant"
}

This first decomposes all characters into base and diacritic, and then removes the latter.
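
To see what the decomposition step does on its own, here is a small sketch. The helper name `decomposedLength` is made up for illustration; `normalize`, `NFD` and `walkLength` come from Phobos.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni;

/// Hypothetical helper: number of code points in `s` after NFD decomposition.
size_t decomposedLength(string s)
{
    return s.normalize!NFD.walkLength;
}

void main()
{
    string precomposed = "\u00E9";          // 'é' stored as a single code point
    writeln(precomposed.walkLength);        // prints 1
    writeln(decomposedLength(precomposed)); // prints 2: 'e' followed by U+0301
}
```

After NFD the accent is a separate code point, which is what makes it possible to filter it out.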
October 28, 2016
On Friday, 28 October 2016 at 12:52:04 UTC, Marc Schütz wrote:
> [...]

Cool. That looks pretty neat and it should cover all cases.
October 28, 2016
On Friday, 28 October 2016 at 11:40:37 UTC, Chris wrote:
> [...]

@Chris

As a new guy in the D community, I am not sure, but I think the line below is something like a Python lambda, right?

auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));

Can you please rewrite the line in a more didactic way? Sorry, but I'm still learning the basics.

Thanks in advance
October 28, 2016
On Friday, 28 October 2016 at 13:50:24 UTC, Alfred Newman wrote:
> [...]

It boils down to something like:

if (c in _accent)
  return _accent[c];
else
  return c;

It's just a normal lambda whose body uses the ternary operator: (condition) ? valueIfTrue : valueIfFalse;
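
Spelled out as a plain loop, the same logic could look roughly like the sketch below. The function name `replaceAccents` and the shortened table are only for illustration; they are not part of the code above.

```d
import std.stdio : writeln;

/// Hypothetical helper: the map/lambda one-liner written as an explicit loop.
string replaceAccents(string input)
{
    // Same idea as the `_accent` table, shortened for the example.
    dchar[dchar] accent = ['é':'e', 'è':'e', 'á':'a'];

    string result;
    foreach (dchar c; input)       // iterate by code point, not by byte
    {
        if (auto p = c in accent)  // `in` gives a pointer to the value, or null
            result ~= *p;          // found: append the replacement
        else
            result ~= c;           // not found: keep the character as it is
    }
    return result;
}

void main()
{
    writeln(replaceAccents("très élégant")); // prints "tres elegant"
}
```

Using the pointer returned by `in` avoids looking the key up twice.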

I'd recommend you use Marc's approach, though.
October 28, 2016
On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
> [...]

What you basically do is you pass the logic on to `map` and `map` applies it to each item in the range (cf. [1]):

map!(myLogic)(range);

or (more idiomatic)

range.map!(myLogic);
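
A minimal sketch of the two spellings side by side. `timesTwo` stands in for `myLogic`, and the wrapper function names are made up for the example.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

int timesTwo(int x) { return x * 2; }  // stands in for `myLogic`

// Classic call syntax: the logic goes in `!()`, the range in `()`.
int[] classicCall(int[] range) { return map!(timesTwo)(range).array; }

// Same call written with UFCS: the range comes first.
int[] ufcsCall(int[] range) { return range.map!(timesTwo).array; }

void main()
{
    writeln(classicCall([1, 2, 3])); // prints [2, 4, 6]
    writeln(ufcsCall([1, 2, 3]));    // prints [2, 4, 6]
}
```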

This is true of a lot of functions, or rather templates, in the Phobos standard library, especially those in std.algorithm (like find [2], canFind, filter, etc.). Instead of writing for-loops with if-else statements, you pass the logic to be applied as a template argument in the `!()` part.

// Filter out the letter 'l'
auto result = "Hello, world!".filter!(a => a != 'l'); // lazily yields "Heo, word!"

However, what is returned is not a string. So this won't work:

`writeln("Result is " ~ result);`

// Error: incompatible types for (("Result is ") ~ (result)): 'string' and
// 'FilterResult!(__lambda2, string)'

It returns a `FilterResult`, a lazy range that has not been turned into a string yet.

To fix this, you can either write:
`
import std.conv;
auto result = "Hello, world!".filter!(a => a != 'l').to!string;
`
which converts it into a string.

or you do this:

`
import std.array;
auto result = "Hello, world!".filter!(a => a != 'l').array;
`

Then you have an array again (strictly speaking a `dchar[]` rather than a `string`, because ranges over strings yield `dchar`s) and

`
writeln("Result is " ~ result);
`
works.
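
A small sketch of the difference between the two fixes. The function names `dropL` and `dropLArray` are invented for the example; the declared return types show what each conversion gives you.

```d
import std.algorithm : filter;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;

// `.to!string` collects the lazy range into a UTF-8 string.
string dropL(string s) { return s.filter!(a => a != 'l').to!string; }

// `.array` collects it into an array of code points instead.
dchar[] dropLArray(string s) { return s.filter!(a => a != 'l').array; }

void main()
{
    auto viaTo = dropL("Hello, world!");
    auto viaArray = dropLArray("Hello, world!");

    static assert(is(typeof(viaTo) == string));
    static assert(is(typeof(viaArray) == dchar[]));

    writeln("Result is " ~ viaTo);              // prints "Result is Heo, word!"
    writeln("Result is " ~ viaArray.to!string); // same text, converted explicitly
}
```

If you need a `string` for later code, `.to!string` is the more direct choice.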

Just bear that in mind, because you will get the above error sometimes. Marc's example is idiomatic D and you should become familiar with it asap.

import std.stdio;
import std.algorithm;
import std.uni;
import std.conv;

void main()
{
    auto str = "très élégant";
    immutable accents = unicode.Diacritic;
    auto removed = str
        // decompose each character into its base letter plus combining marks (NFD)
        .normalize!NFD
        // drop the diacritics, keeping only the base letters
        .filter!(c => !accents[c])
        // collect the lazy FilterResult back into a string
        .to!string;
    writeln(removed);  // prints "tres elegant"
}
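
Since the original question asked for a function, the same pipeline could be wrapped up roughly like this. The name `removeDiacritics` is hypothetical, not something Phobos provides.

```d
import std.algorithm : filter;
import std.conv : to;
import std.stdio : writeln;
import std.uni;

/// Hypothetical helper name; not part of Phobos.
string removeDiacritics(string input)
{
    auto diacritics = unicode.Diacritic;  // code points with the Diacritic property
    return input
        .normalize!NFD                    // split letters into base + combining marks
        .filter!(c => !diacritics[c])     // keep everything that is not a diacritic
        .to!string;                       // collect the lazy range into a string
}

void main()
{
    writeln(removeDiacritics("très élégant")); // prints "tres elegant"
}
```

Two caveats worth testing against your own data: letters without a canonical decomposition (for example 'ø' or 'ß') pass through unchanged, and, as far as I know, the Unicode Diacritic property also covers a few standalone characters such as '^' and '`', which would be filtered out as well.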

[1] http://dlang.org/phobos/std_algorithm_iteration.html#.map
[2] http://dlang.org/phobos/std_algorithm_searching.html#.find
October 28, 2016
On Friday, 28 October 2016 at 15:08:59 UTC, Chris wrote:
> On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
>> [...]
>
> What you basically do is you pass the logic on to `map` and `map` applies it to each item in the range (cf. [1]):
>
> [...]

Life is beautiful!
Thx.