September 02, 2017 string to character code hex string
I need to convert a string of characters to a string of their hex representations: "AAA" -> "414141". This seems like something that would be in the std lib, but I can't find it. Does it exist? Thanks
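For comparison, the transformation being asked for is a plain bytes-to-hex encoding. A minimal Python sketch (illustrative only, not D) of the same mapping:

```python
# Hex-encode the bytes of a string: each byte becomes two lowercase hex digits.
def to_hex(s: str) -> str:
    return s.encode("utf-8").hex()

print(to_hex("AAA"))  # 414141
```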
September 02, 2017 Re: string to character code hex string
Posted in reply to bitwise
On Saturday, 2 September 2017 at 15:53:25 UTC, bitwise wrote:
> [...]
This seems to work well enough.
---
string toAsciiHex(string str)
{
    import std.array : appender;
    import std.format : format; // needed for format!"%x"

    auto ret = appender!string(null);
    ret.reserve(str.length * 2);
    foreach(c; str) ret.put(format!"%x"(c));
    return ret.data;
}
---
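One caveat with the `"%x"` spec above: byte values below 0x10 produce a single hex digit, so the output width varies per character. A small Python illustration (not D; assumes the goal is a fixed two digits per byte) of unpadded versus zero-padded formatting:

```python
# "%x"-style (no padding) vs "%02x"-style (fixed two digits per byte)
unpadded = format(0x0A, "x")    # -> "a": ambiguous once concatenated
padded   = format(0x0A, "02x")  # -> "0a": unambiguous fixed width

print(unpadded, padded)
```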
September 02, 2017 Re: string to character code hex string
Posted in reply to bitwise
On 09/02/2017 09:23 AM, bitwise wrote:
> On Saturday, 2 September 2017 at 15:53:25 UTC, bitwise wrote:
>> [...]
>
> This seems to work well enough.
>
> string toAsciiHex(string str)
> {
>     import std.array : appender;
>
>     auto ret = appender!string(null);
>     ret.reserve(str.length * 2);
>     foreach(c; str) ret.put(format!"%x"(c));
>     return ret.data;
> }

Lazy version, which the user can easily generate a string from by appending .array:

---
import std.stdio;

auto hexString(R)(R input) {
    import std.conv : text;
    import std.string : format;
    import std.algorithm : map, joiner;
    return input.map!(c => format("%02x", c)).joiner;
}

void main() {
    writeln("AAA".hexString);
}
---

To generate string:

---
import std.range : array;
writeln("AAA".hexString.array);
---

Ali
September 02, 2017 Re: string to character code hex string
Posted in reply to Ali Çehreli
On Saturday, 2 September 2017 at 16:52:17 UTC, Ali Çehreli wrote:
> On 09/02/2017 09:23 AM, bitwise wrote:
>> On Saturday, 2 September 2017 at 15:53:25 UTC, bitwise wrote:
>>> [...]
>>
>> This seems to work well enough.
>>
>> string toAsciiHex(string str)
>> {
>> import std.array : appender;
>>
>> auto ret = appender!string(null);
>> ret.reserve(str.length * 2);
>> foreach(c; str) ret.put(format!"%x"(c));
>> return ret.data;
>> }
>>
>
> Lazy version, which the user can easily generate a string from by appending .array:
>
> import std.stdio;
>
> auto hexString(R)(R input) {
> import std.conv : text;
> import std.string : format;
> import std.algorithm : map, joiner;
> return input.map!(c => format("%02x", c)).joiner;
> }
>
> void main() {
> writeln("AAA".hexString);
> }
>
> To generate string:
>
> import std.range : array;
> writeln("AAA".hexString.array);
>
> Ali
Please correct me if I'm wrong, but I think this has issues regarding Unicode.
"ö…" becomes "f62026", which, interpreted as UTF-8, is a control character ~ " &", so you either need to add padding or use .byCodeUnit so it becomes "c3b6e280a6" (correct UTF-8) instead.
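The ambiguity can be made concrete in a few lines of Python (used here only to illustrate the byte-level behavior, not as D code): hexing decoded code points yields "f62026" with varying widths, while hexing the UTF-8 code units yields "c3b6e280a6", and only the latter is trivially reversible:

```python
s = "ö…"

# Per-code-point hex (what auto-decoding plus "%x" produces): widths vary.
codepoint_hex = "".join(format(ord(c), "x") for c in s)

# Per-UTF-8-byte hex (the .byCodeUnit equivalent): two digits per byte.
byte_hex = s.encode("utf-8").hex()

print(codepoint_hex)  # f62026
print(byte_hex)       # c3b6e280a6
assert bytes.fromhex(byte_hex).decode("utf-8") == s  # round-trips cleanly
```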
September 02, 2017 Re: string to character code hex string
Posted in reply to lithium iodate
On 09/02/2017 10:07 AM, lithium iodate wrote:
>> Lazy version, which the user can easily generate a string from by
>> appending .array:
>>
>> import std.stdio;
>>
>> auto hexString(R)(R input) {
>>     import std.conv : text;
>>     import std.string : format;
>>     import std.algorithm : map, joiner;
>>     return input.map!(c => format("%02x", c)).joiner;
>> }
>>
>> void main() {
>>     writeln("AAA".hexString);
>> }
>>
>> To generate string:
>>
>> import std.range : array;
>> writeln("AAA".hexString.array);
>>
>> Ali
>
> Please correct me if I'm wrong, but I think this has issues regarding
> Unicode.
> "ö…" becomes "f62026", which, interpreted as UTF-8, is a control
> character ~ " &", so you either need to add padding or use .byCodeUnit
> so it becomes "c3b6e280a6" (correct UTF-8) instead.

You're right, but I think there is no intention of interpreting the result as UTF-8. "f62026" is just to be used as "f62026", which can be converted byte-by-byte back to "ö…". That's how I understand the requirement anyway.

Ali
September 02, 2017 Re: string to character code hex string
Posted in reply to bitwise
On Saturday, 2 September 2017 at 16:23:57 UTC, bitwise wrote:
> On Saturday, 2 September 2017 at 15:53:25 UTC, bitwise wrote:
>> [...]
>
> This seems to work well enough.
>
> string toAsciiHex(string str)
> {
> import std.array : appender;
>
> auto ret = appender!string(null);
> ret.reserve(str.length * 2);
> foreach(c; str) ret.put(format!"%x"(c));
> return ret.data;
> }
Note: Each of those format calls is going to allocate a new string, followed by put copying that new string's content over into the appender, leaving you with Θ(str.length) tiny memory chunks that aren't used anymore for the GC to eventually collect.
If this (unnecessary waste) is of concern to you (and from the fact that you used ret.reserve I assume it is), then the easy fix is to use `sformat` instead of `format`:
---
string toHex(string str)
{
import std.format : sformat;
import std.exception: assumeUnique;
auto ret = new char[str.length * 2];
size_t len;
foreach (c; str)
{
auto slice = sformat!"%x"(ret[len..$], c);
//auto slice = toHex(ret[len..$], c);
assert (slice.length <= 2);
len += slice.length;
}
return ret[0..len].assumeUnique;
}
---
If you want to cut out the format import entirely, notice the `auto slice = toHex...` line, which can be implemented like this (always returns two chars):
---
char[] toHex(char[] buf, char c)
{
import std.ascii : lowerHexDigits;
assert (buf.length >= 2);
buf[0] = lowerHexDigits[(c & 0xF0) >> 4];
buf[1] = lowerHexDigits[c & 0x0F];
return buf[0..2];
}
---
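The nibble-lookup idea translates directly to other languages. A Python sketch (illustrative, not D) of the same two-table-lookups-per-byte scheme:

```python
LOWER_HEX = "0123456789abcdef"

def byte_to_hex(b: int) -> str:
    # High nibble first, then low nibble, mirroring the D toHex above.
    return LOWER_HEX[(b & 0xF0) >> 4] + LOWER_HEX[b & 0x0F]

print(byte_to_hex(0x41))  # 41
print(byte_to_hex(0x0A))  # 0a
```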
September 02, 2017 Re: string to character code hex string
Posted in reply to Ali Çehreli
On Saturday, 2 September 2017 at 17:41:34 UTC, Ali Çehreli wrote:
>
> You're right, but I think there is no intention of interpreting the result as UTF-8. "f62026" is just to be used as "f62026", which can be converted byte-by-byte back to "ö…". That's how I understand the requirement anyway.
>
> Ali
My intention is to compute the mangling of a D template function that takes a string as a template parameter, without having the symbol available. I think that means that converting each byte of the string to hex and tacking it on would suffice.
September 02, 2017 Re: string to character code hex string
Posted in reply to Ali Çehreli
On Saturday, 2 September 2017 at 17:41:34 UTC, Ali Çehreli wrote:
> You're right, but I think there is no intention of interpreting the result as UTF-8. "f62026" is just to be used as "f62026", which can be converted byte-by-byte back to "ö…". That's how I understand the requirement anyway.
>
> Ali
That is not possible, because you cannot know whether "f620" and "26" or "f6" and "2026" (or any other combination) should form a code point each. Additional padding to constant width (8 hex chars) is needed.
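One way to see the fixed-width fix: pad each code point to 8 hex digits so the decoder can split the string unambiguously. A Python sketch (hypothetical helper names, for illustration only):

```python
def encode_codepoints(s: str) -> str:
    # 8 hex digits per code point: wide enough for any Unicode scalar value.
    return "".join(format(ord(c), "08x") for c in s)

def decode_codepoints(h: str) -> str:
    # Fixed width makes the split trivial: one code point per 8-char slice.
    return "".join(chr(int(h[i:i + 8], 16)) for i in range(0, len(h), 8))

print(encode_codepoints("ö…"))  # 000000f600002026
assert decode_codepoints(encode_codepoints("ö…")) == "ö…"
```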
September 02, 2017 Re: string to character code hex string
Posted in reply to Moritz Maxeiner
On Saturday, 2 September 2017 at 17:45:30 UTC, Moritz Maxeiner wrote:
>
> If this (unnecessary waste) is of concern to you (and from the fact that you used ret.reserve I assume it is), then the easy fix is to use `sformat` instead of `format`:
>
Yes, thanks. I'm going to go with a variation of your approach:
private
string toAsciiHex(string str)
{
import std.ascii : lowerHexDigits;
import std.exception: assumeUnique;
auto ret = new char[str.length * 2];
int i = 0;
foreach(c; str) {
ret[i++] = lowerHexDigits[(c >> 4) & 0xF];
ret[i++] = lowerHexDigits[c & 0xF];
}
return ret.assumeUnique;
}
I'm not sure how the compiler would mangle UTF-8, but I intend to use this on one specific function (actually the hundreds of instantiations of it). It will be predictably named, though.
Thanks!
September 02, 2017 Re: string to character code hex string
Posted in reply to bitwise
On Saturday, 2 September 2017 at 18:07:51 UTC, bitwise wrote:
> On Saturday, 2 September 2017 at 17:45:30 UTC, Moritz Maxeiner wrote:
>>
>> If this (unnecessary waste) is of concern to you (and from the fact that you used ret.reserve I assume it is), then the easy fix is to use `sformat` instead of `format`:
>>
>
> Yes, thanks. I'm going to go with a variation of your approach:
>
> private
> string toAsciiHex(string str)
> {
>     import std.ascii : lowerHexDigits;
>     import std.exception: assumeUnique;
>
>     auto ret = new char[str.length * 2];
>     int i = 0;
>
>     foreach(c; str) {
>         ret[i++] = lowerHexDigits[(c >> 4) & 0xF];
>         ret[i++] = lowerHexDigits[c & 0xF];
>     }
>
>     return ret.assumeUnique;
> }

If you never need the individual character function, that's probably the best in terms of readability, though with a decent compiler, that and the two-function version should result in the same opcodes (except for the bitshift and bitmask swap).

> I'm not sure how the compiler would mangle UTF-8, but I intend to use this on one specific function (actually the hundreds of instantiations of it).

In UTF-8:

--- utfmangle.d ---
void fun_ༀ() {}
pragma(msg, fun_ༀ.mangleof);
-------------------

---
$ dmd -c utfmangle.d
_D6mangle7fun_ༀFZv
---

Only universal character names for identifiers are allowed, though, as per [1].

[1] https://dlang.org/spec/lex.html#identifiers
Copyright © 1999-2021 by the D Language Foundation