Thread overview
October 28, 2016 Best approach to handle accented letters
Hello,

I'm having some trouble replacing the accented letters in a given string with their unaccented counterparts.

Let's say I have the input string "très élégant" and I need a function that returns just "tres elegant". Given that we need to take care of Unicode characters, what is the best way to write D code to handle that?

Cheers

October 28, 2016 Re: Best approach to handle accented letters

Posted in reply to Alfred Newman
On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
> [...]
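You could try something like this. It works for the accented letters listed in the table. I haven't tested it on other characters yet.

    import std.stdio;
    import std.algorithm;
    import std.conv;

    // Lookup table from accented letters to their unaccented counterparts.
    // Note: an enum AA is a manifest constant, so each use of _accent builds a new AA at run time.
    enum
    {
        dchar[dchar] _accent = ['á':'a', 'é':'e', 'è':'e', 'í':'i', 'ó':'o', 'ú':'u', 'Á':'A', 'É':'E', 'Í':'I', 'Ó':'O', 'Ú':'U']
    }

    void main()
    {
        auto str = "très élégant";
        // Replace every character found in _accent; keep all other characters unchanged.
        auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));
        writeln(removed); // prints "tres elegant"
    }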
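Phobos also ships a table-driven replacement function, `std.string.translate`, which accepts a `dchar[dchar]` table like the one above. The following is a minimal sketch of the same lookup-table idea built on it, not Chris's code: `removeAccentsTable` is a hypothetical name, and the table is illustrative and far from complete, so any letter missing from it passes through unchanged.

```d
import std.stdio : writeln;
import std.string : translate;

// Hypothetical helper: replaces accented letters using a lookup table.
// Only the letters listed in the table are handled.
string removeAccentsTable(string str)
{
    // Built at run time on each call; the entries are illustrative, not complete.
    dchar[dchar] table = [
        'á': 'a', 'à': 'a', 'â': 'a', 'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
        'í': 'i', 'î': 'i', 'ó': 'o', 'ô': 'o', 'ú': 'u', 'ù': 'u', 'û': 'u',
        'ç': 'c', 'Á': 'A', 'É': 'E', 'È': 'E', 'Í': 'I', 'Ó': 'O', 'Ú': 'U',
        'Ç': 'C'
    ];
    // translate returns a copy of str with every character found in the table substituted.
    return str.translate(table);
}

void main()
{
    writeln(removeAccentsTable("très élégant")); // prints "tres elegant"
}
```

Compared with `map` plus `to!string`, `translate` keeps the input's string type and does the copying itself; the trade-off of any table approach remains that you have to list every letter you care about.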

October 28, 2016 Re: Best approach to handle accented letters

Posted in reply to Alfred Newman
On Friday, 28 October 2016 at 11:24:28 UTC, Alfred Newman wrote:
> [...]
    import std.stdio;
    import std.algorithm;
    import std.uni;
    import std.conv;

    void main()
    {
        auto str = "très élégant";
        immutable accents = unicode.Diacritic;
        auto removed = str
            .normalize!NFD
            .filter!(c => !accents[c])
            .to!string;
        writeln(removed); // prints "tres elegant"
    }
This first decomposes each character into its base character and any combining diacritics (Unicode normalization form NFD), and then removes the diacritics.
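To reuse Marc's approach, it can be wrapped in a function. This is a minimal sketch: `stripDiacritics` is a hypothetical name, and the body is Marc's pipeline unchanged apart from the comments.

```d
import std.algorithm : filter;
import std.conv : to;
import std.stdio : writeln;
import std.uni;

// Hypothetical helper wrapping Marc's approach: decompose, then drop diacritics.
string stripDiacritics(string str)
{
    // Set of all code points that have the Unicode "Diacritic" property.
    immutable accents = unicode.Diacritic;
    return str
        .normalize!NFD              // e.g. "é" becomes "e" followed by U+0301 COMBINING ACUTE ACCENT
        .filter!(c => !accents[c])  // keep every code point that is not a diacritic
        .to!string;                 // collect the lazy range back into a string
}

void main()
{
    writeln(stripDiacritics("Crème Brûlée")); // prints "Creme Brulee"
}
```

Letters such as 'ø' or 'ł' have no canonical decomposition, so NFD does not split off a separate mark and the filter leaves them as they are; a small lookup table like Chris's could cover those. The Unicode Diacritic property also covers more than combining marks (it includes some spacing modifier characters), so it is worth checking the result against your own input.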

October 28, 2016 Re: Best approach to handle accented letters

Posted in reply to Marc Schütz
On Friday, 28 October 2016 at 12:52:04 UTC, Marc Schütz wrote:
> [...]
Cool. That looks pretty neat, and it should cover every character that decomposes into a base letter plus combining marks.

October 28, 2016 Re: Best approach to handle accented letters

Posted in reply to Chris
On Friday, 28 October 2016 at 11:40:37 UTC, Chris wrote:
> [...]
@Chris

As a newcomer to the D community I'm not sure, but I think the line below is something like a Python lambda, right?

    auto removed = to!string(str.map!(a => (a in _accent) ? _accent[a] : a));

Can you please rewrite the line in a more didactic way? Sorry, I'm still learning the basics.

Thanks in advance

October 28, 2016 Re: Best approach to handle accented letters

Posted in reply to Alfred Newman
On Friday, 28 October 2016 at 13:50:24 UTC, Alfred Newman wrote:
> [...]
It boils down to something like:

    if (c in _accent)
        return _accent[c];
    else
        return c;

It's just a normal lambda whose body uses the conditional (ternary) operator: (condition) ? yes : no.

I'd recommend Marc's approach, though.
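For the more didactic version Alfred asked for, here is the same logic with the lambda written out as a named nested function. It is a sketch, not Chris's code: `removeAccents` and `unaccent` are hypothetical names, and the table is a local variable (built on each call) instead of Chris's `enum`.

```d
import std.algorithm : map;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: Chris's one-liner with the lambda written out in full.
string removeAccents(string str)
{
    // Same letters as Chris's _accent table (only a few are covered).
    dchar[dchar] table = ['á': 'a', 'é': 'e', 'è': 'e', 'í': 'i', 'ó': 'o', 'ú': 'u',
                          'Á': 'A', 'É': 'E', 'Í': 'I', 'Ó': 'O', 'Ú': 'U'];

    // Named equivalent of the lambda `a => (a in table) ? table[a] : a`.
    dchar unaccent(dchar c)
    {
        if (c in table)      // is c a key of the table?
            return table[c]; // yes: return its unaccented replacement
        else
            return c;        // no: keep the character unchanged
    }

    // map calls unaccent on every character; to!string turns the lazy result into a string.
    return str.map!unaccent.to!string;
}

void main()
{
    writeln(removeAccents("très élégant")); // prints "tres elegant"
}
```

`map!unaccent` passes the function itself as a template argument, just as `map!(a => ...)` passes the lambda; D allows this even though `unaccent` is nested and reads `table` from the enclosing function.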

October 28, 2016 Re: Best approach to handle accented letters

Posted in reply to Chris
On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
> [...]

What you basically do is pass the logic on to `map`, and `map` applies it to each item in the range (cf. [1]):

    map!(myLogic)(range);

or (more idiomatic)

    range.map!(myLogic);

This is true of a lot of functions, or rather templates, in the Phobos standard library, especially the ones in std.algorithm (like find [2], canFind, filter etc.). Instead of writing for loops with if-else statements, you pass the logic to be applied within the `!()` part of the template.

    // Filter out the letter 'l'
    auto result = "Hello, world!".filter!(a => a != 'l'); // lazily yields "Heo, word!"

However, what is returned is not a string, so this won't work:

    writeln("Result is " ~ result);
    // Error: incompatible types for (("Result is ") ~ (result)): 'string' and
    // 'FilterResult!(__lambda2, string)'

It returns a `FilterResult`, which is a lazy range. To turn it into a string, write:

    import std.conv;
    auto result = "Hello, world!".filter!(a => a != 'l').to!string;

You can also collect the range with `array`:

    import std.array;
    auto result = "Hello, world!".filter!(a => a != 'l').array;

but note that this gives you a `dchar[]`, not a `string`: iterating over a string decodes it into `dchar`s, so `"Result is " ~ result` still won't compile. Use `.to!string` when you need a `string`.

Just bear that in mind, because you will run into the above error now and then. Marc's example is idiomatic D and you should become familiar with it as soon as possible:

    import std.algorithm;
    import std.conv;
    import std.stdio;
    import std.uni;

    void main()
    {
        auto str = "très élégant";
        immutable accents = unicode.Diacritic;
        auto removed = str
            // decompose each character into its base character and combining marks (NFD)
            .normalize!NFD
            // drop every code point that is a diacritic, keeping the base characters
            .filter!(c => !accents[c])
            // convert the filtered range of dchars back to a string
            .to!string;
        writeln(removed); // prints "tres elegant"
    }

[1] http://dlang.org/phobos/std_algorithm_iteration.html#.map
[2] http://dlang.org/phobos/std_algorithm_searching.html#.find
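A small self-contained program that checks the points about `map` and `filter` made in this post: both spellings of `map` produce the same result, `.array` on a filtered string gives `dchar[]`, and `.to!string` gives a `string` that can be concatenated. `twice` and `withoutL` are hypothetical helpers introduced only for illustration.

```d
import std.algorithm : equal, filter, map;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical logic to pass to map: double a number.
int twice(int x)
{
    return 2 * x;
}

// Hypothetical helper: filter out 'l' and return a real string.
string withoutL(string s)
{
    // filter is lazy; to!string collects the remaining characters into a string.
    return s.filter!(a => a != 'l').to!string;
}

void main()
{
    auto nums = [1, 2, 3];
    // Both spellings make the same call; the second uses UFCS.
    assert(map!twice(nums).equal([2, 4, 6]));
    assert(nums.map!twice.equal([2, 4, 6]));

    // Filtering a string iterates over decoded dchars, so .array yields dchar[], not string.
    auto lazyResult = "Hello, world!".filter!(a => a != 'l');
    static assert(is(typeof(lazyResult.array) == dchar[]));

    writeln("Result is " ~ withoutL("Hello, world!")); // prints "Result is Heo, word!"
}
```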

October 28, 2016 Re: Best approach to handle accented letters

Posted in reply to Chris
On Friday, 28 October 2016 at 15:08:59 UTC, Chris wrote:
> On Friday, 28 October 2016 at 14:31:47 UTC, Chris wrote:
>> [...]
>
> What you basically do is you pass the logic on to `map` and `map` applies it to each item in the range (cf. [1]):
>
> [...]
Life is beautiful!
Thx.
Copyright © 1999-2021 by the D Language Foundation