April 08, 2012 Encodings

For most of the string processing I do, I read/write text in UTF-8 and convert it to UTF-32 for processing (with std.utf), so I don't have to worry about encoding. Is this a good or bad paradigm? Is there a better way to do this? What method do all of you use?

Just curious, NMS

April 08, 2012 Re: Encodings

Posted in reply to Nathan M. Swan
On Sunday, April 08, 2012 23:36:23 Nathan M. Swan wrote:
> For most of the string processing I do, I read/write text in UTF-8 and convert it to UTF-32 for processing (with std.utf), so I don't have to worry about encoding. Is this a good or bad paradigm? Is there a better way to do this? What method do all of you use?
>
> Just curious, NMS
It depends on what you're doing. Depending on the functions that you use and your memory requirements, either UTF-8 or UTF-32 may be faster. A UTF-32 string (dstring) has the advantage of being a random-access range, so it works with a number of functions that a UTF-8 string won't work with, since Phobos treats a UTF-8 string as a range of code points that has to be decoded as it is traversed. But UTF-32 also takes considerably more memory (up to four times as much if most of your characters are ASCII), which can be a problem.
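
To make the trade-off concrete, here is a minimal sketch, assuming a reasonably recent Phobos. The helper name encodedSizes is made up for illustration and is not part of any library.

```d
import std.conv : to;
import std.range : isBidirectionalRange, isRandomAccessRange;

// Phobos iterates a string (UTF-8) by code point, so it cannot offer
// constant-time indexing by character; a dstring (UTF-32) can.
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(isRandomAccessRange!dstring);

/// Hypothetical helper: bytes used by the UTF-8 and UTF-32 forms of `text`.
size_t[2] encodedSizes(string text)
{
    dstring wide = text.to!dstring;
    return [text.length * char.sizeof, wide.length * dchar.sizeof];
}

unittest
{
    // Pure ASCII: one byte per character in UTF-8, four in UTF-32.
    assert(encodedSizes("hello") == [5, 20]);

    // Indexing a dstring yields whole code points.
    dstring wide = "h\u00E9llo"d;
    assert(wide[1] == '\u00E9');
}
```

For text that is mostly non-ASCII the gap narrows, because UTF-8 then needs two to four bytes per code point as well.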
I think that the most common approach is to just operate on UTF-8 unless another encoding is needed (e.g. UTF-32 because random access is required). In plenty of cases, you end up operating on generic ranges anyway if you use range-based functions on strings and don't call std.array.array on the results.
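
As a rough illustration of staying in UTF-8, the sketch below runs range-based functions directly on a string and never materializes an array. The function name countLetters is hypothetical, and the choice of std.uni.isAlpha as the predicate is just an assumption for the example.

```d
import std.algorithm.iteration : filter;
import std.range : walkLength;
import std.uni : isAlpha;

/// Hypothetical helper: counts alphabetic code points in UTF-8 text.
/// The string is decoded lazily while filter walks it, so no UTF-32
/// copy and no intermediate array are allocated.
size_t countLetters(string text)
{
    return text.filter!isAlpha.walkLength;
}

unittest
{
    // "naïve café 42": five letters, four letters, then digits.
    assert(countLetters("na\u00EFve caf\u00E9 42") == 9);
}
```

Calling std.array.array on the filtered range would produce a dchar[], which is effectively the UTF-32 conversion again.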
You're going to have to profile your code to see whether working primarily in UTF-8 or in UTF-32 is more efficient for your string processing.
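
One possible way to get a first measurement is std.datetime.stopwatch.benchmark, sketched below. The names letterCount and compareEncodings and the letter-counting workload are assumptions for illustration; substitute the operations your program actually performs, and note that the one-time conversion to UTF-32 is deliberately left out of the timing here.

```d
import core.time : Duration;
import std.algorithm.iteration : filter;
import std.conv : to;
import std.datetime.stopwatch : benchmark;
import std.range : walkLength;
import std.uni : isAlpha;

/// Hypothetical workload: the same range-based code for either encoding.
size_t letterCount(R)(R text)
{
    return text.filter!isAlpha.walkLength;
}

/// Times the workload on the UTF-8 and UTF-32 forms of the same text.
/// Element 0 is the total for UTF-8, element 1 the total for UTF-32.
Duration[2] compareEncodings(string utf8, uint runs)
{
    dstring utf32 = utf8.to!dstring; // conversion cost is not measured
    size_t sink; // accumulates results so the work is not discarded

    void onUtf8() { sink += letterCount(utf8); }
    void onUtf32() { sink += letterCount(utf32); }

    return benchmark!(onUtf8, onUtf32)(runs);
}

unittest
{
    auto timings = compareEncodings("hello w\u00F6rld", 100);
    assert(timings.length == 2);
}
```

The numbers only mean something on input that resembles your real data and with optimizations enabled, so treat this as a starting point rather than a verdict.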
- Jonathan M Davis
Copyright © 1999-2021 by the D Language Foundation