On Sunday, 3 July 2022 at 20:28:18 UTC, rikki cattermole wrote:
> We only support UTF-16/UTF-32 for the target endian.
Text input comes from many sources: stdin, files and, say, the windowing system are three common ones that do not make any such guarantees.
Well, then the application author will use an external Unicode library anyway. If you support UTF-16 or UTF-32, there might not be a BOM, so you might need heuristics to figure out whether the data is little-endian (LE) or big-endian (BE).
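To illustrate the kind of heuristic I mean, here is a rough Python sketch. The function name `guess_utf16_byte_order` is made up for illustration, and it assumes the buffer really is UTF-16 and holds mostly Latin text; a real Unicode library would do something far more robust.

```python
import codecs


def guess_utf16_byte_order(data: bytes) -> str:
    """Return 'le', 'be' or 'unknown' for a buffer assumed to hold UTF-16."""
    # A BOM, when present, settles the question.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "be"

    # No BOM: fall back to a heuristic. In mostly Latin text the high byte
    # of each 16-bit code unit is zero, so count zeros at each position.
    even_zeros = data[0::2].count(0)  # high bytes if the data is big-endian
    odd_zeros = data[1::2].count(0)   # high bytes if the data is little-endian

    if odd_zeros > even_zeros:
        return "le"
    if even_zeros > odd_zeros:
        return "be"
    return "unknown"  # e.g. empty input or text with no zero bytes
```

This guesses wrong or gives up on text dominated by non-Latin scripts, which is exactly why I would rather lean on an existing library.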
For things like gzip, PNG, crypto and Unicode, there are most likely faster and better-tested open source alternatives than a small community can come up with. Maybe just use whatever Chromium or Clang uses?
What I never liked about C++ is the string mess: char, signed char, unsigned char, char8_t, char16_t, char32_t, wchar_t, string, wstring, u8string, u16string, u32string, pmr::string, pmr::wstring, pmr::u8string, pmr::u16string, pmr::u32string… And this doesn't even account for endianness!! This is what happens over time as new needs pop up. One of the best things about Python 3 and JavaScript is that there is one commonly used string type that is well supported.
Having one common string representation is a good thing for API authors.
(But make sure to have a maintained binding to a versatile C Unicode library.)