Unicode combining mark / std.uni question

December 05, 2017
Posted by ikod
Hello,

I have to create a very basic IDNA (Internationalized Domain Names in Applications) library. IDNA has two parts: checks on user input, and Punycode encoding/decoding.

The Punycode part is already complete, and now I have to add the input checks, but I'm weak in Unicode and can't find a proper way to express these tests using std.uni.

Here is the list of prohibited domain labels (from https://tools.ietf.org/html/rfc5891):

   o  Labels whose first character is a combining mark (see The Unicode
      Standard, Section 2.11 [Unicode]).

   o  Labels containing prohibited code points, i.e., those that are
      assigned to the "DISALLOWED" category of the Tables document
      [RFC5892].

   o  Labels containing code points that are identified in the Tables
      document as "CONTEXTJ", i.e., requiring exceptional contextual
      rule processing on lookup, but that do not conform to those rules.
      Note that this implies that a rule must be defined, not null: a
      character that requires a contextual rule but for which the rule
      is null is treated in this step as having failed to conform to the
      rule.

   o  Labels containing code points that are identified in the Tables
      document as "CONTEXTO", but for which no such rule appears in the
      table of rules.  Applications resolving DNS names or carrying out
      equivalent operations are not required to test contextual rules
      for "CONTEXTO" characters, only to verify that a rule is defined
      (although they MAY make such tests to provide better protection or
      give better information to the user).

   o  Labels containing code points that are unassigned in the version
      of Unicode being used by the application, i.e., in the UNASSIGNED
      category of the Tables document.
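
For the first rule, a minimal sketch might look like the one below. It assumes that "combining mark" in the RFC corresponds to the Unicode general category Mark (Mn, Mc, Me), which is what std.uni.isMark tests. The helper name startsWithCombiningMark is made up for illustration, and I am not sure this is the right reading of the rule.

```d
import std.range.primitives : empty, front;
import std.uni : isMark;

/// Hypothetical helper (name is an assumption, not part of std.uni):
/// returns true if the first code point of `label` is a combining mark.
/// Assumes "combining mark" means general category Mn, Mc or Me.
bool startsWithCombiningMark(const(char)[] label)
{
    // `front` on a UTF-8 string decodes the first code point as a dchar.
    return !label.empty && isMark(label.front);
}

unittest
{
    assert(!startsWithCombiningMark("example"));
    assert(startsWithCombiningMark("\u0301abc"));  // U+0301 COMBINING ACUTE ACCENT first
    assert(!startsWithCombiningMark("a\u0301bc")); // mark present, but not first
    assert(!startsWithCombiningMark(""));          // empty label has no first character
}
```

The remaining rules (DISALLOWED, CONTEXTJ, CONTEXTO, UNASSIGNED) depend on the derived property tables in RFC 5892, which as far as I can tell are not available from std.uni directly.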

Can anybody help with this task?

Thanks!