D-ish way to work with strings?

Dec 22, 2019

Robert M. Münch

Dec 23, 2019

Dec 27, 2019

Dec 27, 2019

Dec 22, 2019

Dec 27, 2019

I want to do all the basic mutating things with strings: append, insert, replace.

What is the D-ish way to do that, since string is aliased to immutable(char)[]?

Using arrays, using the ~ operator, always copying, changing, combining my strings into a new one? Does it make sense to think about reducing GC pressure?

I'm a bit lost in the possibilities and can't find any "that's the way to do it".

--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

Want to add I'm talking about unicode strings.

Wouldn't it make sense to handle everything as UTF-32 so that iteration is simple because code-point = code-unit?

And later on, convert to UTF-16 or UTF-8 on demand?

--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster
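As a rough illustration of the "UTF-32 internally, convert on demand" idea, here is a minimal sketch using the std.utf conversion functions. The function name codeUnitCounts is hypothetical and only serves the example; it shows how many code units the same text occupies in each encoding.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

/// Hypothetical helper: code-unit counts of the same text
/// in UTF-8, UTF-16 and UTF-32, in that order.
size_t[3] codeUnitCounts(string s)
{
    dstring wide = s.toUTF32;    // one code unit per code point
    wstring utf16 = wide.toUTF16; // convert on demand
    string utf8 = wide.toUTF8;    // convert on demand
    return [utf8.length, utf16.length, wide.length];
}
```

For the four code points in "a\u00E4\u20AC\U0001F600" this returns [10, 5, 4]: only the UTF-32 form has one code unit per code point, at the cost of four bytes for every one of them.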

On 12/22/19 9:15 AM, Robert M. Münch wrote:
> I want to do all the basic mutating things with strings: append, insert, replace
>
> What is the D-ish way to do that since string is aliased to immutable(char)[]?

Switch to using char[]. Unfortunately, there's a lot of code out there that accepts string instead of const(char)[], which is more usable. I think many people don't realize the purpose of the string type. It's meant to be something that is heap-allocated (or a global), and NEVER goes out of scope. Many things are shoehorned into string which shouldn't be.

> Using arrays, using ~ operator, always copying, changing, combining my strings into a new one? Does it make sense to think about reducing GC pressure?

It really depends on your use cases. Strings are great precisely because they don't change. Slicing makes huge sense there.

> I'm a bit lost in the possibilities and don't find any "that's the way to do it".

Again, use char[] if you are going to be rearranging strings. And you have to take care not to cheat and cast to string. Always use idup if you need one.

If you find Phobos functions that unnecessarily take string instead of const(char)[], please post to bugzilla.

-Steve
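A minimal sketch of the advice above, assuming std.array's insertInPlace and replace are acceptable for the insert and replace steps. The function name buildGreeting is made up for the example. It mutates a char[] buffer, accepts const(char)[] so that both string and char[] arguments work, and uses idup rather than a cast to get a string at the end.

```d
import std.array : insertInPlace, replace;

/// Hypothetical example: edits a mutable buffer, then returns an
/// immutable copy. `name` is const(char)[] so callers can pass
/// either a string or a char[].
string buildGreeting(const(char)[] name)
{
    char[] buf = "Hello".dup;              // mutable copy of the literal
    buf ~= "!";                            // append
    buf.insertInPlace(5, ", ", name);      // insert at code-unit index 5
    buf = buf.replace("Hello", "Goodbye"); // replace returns a new array
    return buf.idup;                       // immutable copy, no cast
}
```

Called as buildGreeting("World"), this yields "Goodbye, World!". Note that the index passed to insertInPlace counts UTF-8 code units, so it has to fall on a code-point boundary when the text is not pure ASCII.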

On Sun, Dec 22, 2019 at 06:27:03PM +0100, Robert M. Münch via Digitalmars-d-learn wrote:
> Want to add I'm talking about unicode strings.
>
> Wouldn't it make sense to handle everything as UTF-32 so that iteration is simple because code-point = code-unit?
>
> And later on, convert to UTF-16 or UTF-8 on demand?
[...]

Be careful that code point != "character" the way most people understand the word "character". The word you're looking for is "grapheme". Which, unfortunately, is rather complex and very slow to handle in Unicode. See std.uni.byGrapheme.

Usually you want to just stick with UTF-8, or UTF-16 for Windows and Java interop. UTF-32 wastes a lot of space, and *still* doesn't give you what you think you want, and Grapheme[] is just dog slow because of the amount of decoding/recoding needed to manipulate it.

What are you planning to do with your strings?

IME, using ~ occasionally doesn't add *too* much GC pressure, and slicing is usually the idiomatic way of working with strings in D (it can result in faster code than C because you don't have to keep strcpy()'d stuff all over the place). If you're appending strings a LOT, you might want to consider using std.array.appender in your inner loops to alleviate some of the cost of using ~ too much. Or use lazy evaluation and ranges to defer actually constructing the string until the end, when it's ready to be stored.

Still, this all depends on what you're trying to do with your strings. Elaborate a bit more about your use case, and we might be able to give better advice.

T

--
Nobody is perfect. I am Nobody. -- pepoluan, GKC forum
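To make the two points above concrete, here is a small sketch. The helper names graphemeCount and repeatWithSeparator are hypothetical and chosen only for illustration. The first counts graphemes with std.uni.byGrapheme; the second builds a string in a loop with std.array.appender instead of repeated ~.

```d
import std.array : appender;
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: number of user-perceived characters
/// (grapheme clusters) in s.
size_t graphemeCount(const(char)[] s)
{
    return s.byGrapheme.walkLength;
}

/// Hypothetical helper: repeats `item` n times, separated by `sep`,
/// appending into one growing buffer instead of using ~ each time.
string repeatWithSeparator(const(char)[] item, size_t n, const(char)[] sep)
{
    auto buf = appender!string();
    foreach (i; 0 .. n)
    {
        if (i > 0)
            buf.put(sep);
        buf.put(item);
    }
    return buf.data;
}
```

For "e\u0301" (an 'e' followed by a combining acute accent), .length is 3 UTF-8 code units and the text holds 2 code points, yet graphemeCount returns 1. Converting to UTF-32 would still leave two elements for that single visible character. repeatWithSeparator("ab", 3, "-") gives "ab-ab-ab".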

On 2019-12-22 18:45:52 +0000, Steven Schveighoffer said:
> switch to using char[]. Unfortunately, there's a lot of code out there that accepts string instead of const(char)[], which is more usable. I think many people don't realize the purpose of the string type. It's meant to be something that is heap-allocated (or as a global), and NEVER goes out of scope.

Hi Steve, thanks for the feedback. Makes sense to me.

> It really depends on your use cases. strings are great precisely because they don't change. slicing makes huge sense there.

My "strings" change a lot, so not really a good fit to use string.

> Again, use char[] if you are going to be rearranging strings. And you have to take care not to cheat and cast to string. Always use idup if you need one.

Will do.

> If you find Phobos functions that unnecessarily take string instead of const(char)[] please post to bugzilla.

Ok, will keep an eye on it.

--
Robert M. Münch
http://www.saphirion.com
smarter | better | faster