GREETINGS FROM iSTANBUL
Salih Dincer
August 01, 2021

Greetings from istanbul...

In our language, the capital of the letter 'i' keeps its dot ('İ'), just like the lower case. But in this example:

// D 2.0.83

import std.stdio, std.uni;

void main()
{
  auto message = "Greetings from istanbul"d;

  message.asUpperCase.writeln; // GREETINGS FROM ISTANBUL

  /* D is very talented at this,
   * except for one letter: 'i'
   * ref: https://en.m.wikipedia.org/wiki/Istanbul
   */
}

I have to code a custom solution. Is it possible to solve the problem from within std.uni?

We are discussing the issue in our own community. I also saw: https://forum.dlang.org/post/vxnnykllgxsghlludpqv@forum.dlang.org

Thanks...

rikki cattermole
August 02, 2021
It appears you are using the wrong lowercase character.

https://en.wikipedia.org/wiki/Dotted_and_dotless_I

From a quick experiment, it appears std.uni is treating the upper case dotted I's lower case as a grapheme. Which it probably shouldn't be as there is an actual character for that.

We might need to update our unicode database... or something.
Paul Backus
August 01, 2021
On Sunday, 1 August 2021 at 17:56:00 UTC, rikki cattermole wrote:
> It appears you are using the wrong lowercase character.
>
> https://en.wikipedia.org/wiki/Dotted_and_dotless_I
>
> From a quick experiment, it appears std.uni is treating the upper case dotted I's lower case as a grapheme. Which it probably shouldn't be as there is an actual character for that.
>
> We might need to update our unicode database... or something.

It's not the wrong lower-case character. Turkish uses U+0069 (a.k.a. ASCII 'i') for lower-case dotted I, but has a non-default case mapping that pairs U+0069 with U+0130 ('İ') rather than U+0049 (ASCII 'I'). Phobos' std.uni uses the default case mapping for its toUpper function, so it does not produce the correct result for Turkish text.

Source: https://www.unicode.org/faq/casemap_charprop.html#1

A common solution to this in other languages is to have a version of toUpper that takes a locale as an argument. Some examples:

- JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleUpperCase
- Go: https://pkg.go.dev/strings#ToUpperSpecial
- Java: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#toUpperCase(java.util.Locale)
- C#: https://docs.microsoft.com/en-US/dotnet/api/system.string.toupper?view=net-5.0
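
For D, the same idea could be approximated by hand until Phobos offers something locale-aware. The following is only a minimal sketch, assuming the text is known to be Turkish: `upperTrChar` and `upperTr` are hypothetical helper names, not part of std.uni, and they special-case only the two Turkish letters before falling back to the default `std.uni.toUpper` mapping.

```d
import std.array : appender;
import std.stdio : writeln;
import std.uni : toUpper;

/// Hypothetical helper (not in Phobos): upper-cases one code point using
/// the Turkish pairing i -> İ (U+0130) and ı (U+0131) -> I.
/// Every other code point goes through the default std.uni mapping.
dchar upperTrChar(dchar c)
{
    switch (c)
    {
        case 'i':      return '\u0130'; // dotted lower-case i -> dotted capital İ
        case '\u0131': return 'I';      // dotless lower-case ı -> ASCII capital I
        default:       return toUpper(c);
    }
}

/// Hypothetical helper: applies upperTrChar to every code point of s.
string upperTr(string s)
{
    auto result = appender!string();
    foreach (dchar c; s)              // decode the UTF-8 input into code points
        result.put(upperTrChar(c));   // put() re-encodes each code point as UTF-8
    return result.data;
}

void main()
{
    writeln(upperTr("istanbul"));
    assert(upperTr("istanbul") == "\u0130STANBUL");
}
```

Note that this maps every 'i' it sees, so it should only be applied to text known to be Turkish; an English word such as "Greetings" would also get a dotted capital İ.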
Salih Dincer
August 01, 2021

On Sunday, 1 August 2021 at 18:22:05 UTC, Paul Backus wrote:
> On Sunday, 1 August 2021 at 17:56:00 UTC, rikki cattermole wrote:
>> It appears you are using the wrong lowercase character.

I think so too, here's the proof:

import std.string, std.stdio;

void main()
{
  auto istanbul = "\u0131stanbul";
  enum capitalized = "Istanbul";
  assert(istanbul.capitalize == capitalized);
  assert("istanbul".capitalize == capitalized);
}

Different characters, but the same seamless result...

Salih Dincer
August 01, 2021

On Sunday, 1 August 2021 at 18:22:05 UTC, Paul Backus wrote:
> A common solution to this in other languages is to have a version of toUpper that takes a locale as an argument. Some examples:

I did not know that; it is exactly what I wanted to talk about. Such clean code...

Thank you Paul.