April 08, 2016 [Issue 15382] std.uri has an incorrect set of reserved characters

https://issues.dlang.org/show_bug.cgi?id=15382

Eugene Wissner <belka@caraus.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |belka@caraus.de

--- Comment #1 from Eugene Wissner <belka@caraus.de> ---
Look at "2.4. When to Encode or Decode" of RFC 3986:

"the only time when octets within a URI are percent-encoded is during the
process of producing the URI from its component parts. This is when an
implementation determines which of the reserved characters are to be used as
subcomponent delimiters and which can be safely used as data."

So reserved characters may be percent-encoded, but they do not have to be.
Only characters used as delimiters in a particular URL scheme must be encoded.
Wikipedia distinguishes between reserved characters with and without a
reserved meaning. I tested it quickly in Firefox, and Firefox doesn't seem to
encode characters like * or ().

The behavior of encodeComponent is actually exactly the same as that of
encodeURIComponent from JavaScript. The behavior described in the issue is how
PHP's urlencode works, which encodes all reserved characters.
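The behavior described in this comment can be checked with a short D program. This is only a sketch: the sample strings are made up for illustration, and the expected results assume the character sets in std/uri.d are the ones discussed in this issue (letters, digits and the characters -_.!~*'() are left unescaped by encodeComponent).

```d
import std.stdio : writeln;
import std.uri : encodeComponent;

void main()
{
    // Letters, digits and the characters -_.!~*'() are passed through
    // unchanged, matching JavaScript's encodeURIComponent.
    assert(encodeComponent("a-b_c.d!e~f*g'h(i)j") == "a-b_c.d!e~f*g'h(i)j");

    // Other characters, such as '&', '=' and the space, are percent-encoded.
    assert(encodeComponent("a&b=c d") == "a%26b%3Dc%20d");

    writeln(encodeComponent("name=(x*y)!"));
}
```

PHP's urlencode would additionally escape characters such as ! and ( in the same input, which is the difference the issue is about.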
December 23, 2019 [Issue 15382] std.uri has an incorrect set of reserved characters

https://issues.dlang.org/show_bug.cgi?id=15382

berni44 <bugzilla@d-ecke.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla@d-ecke.de
         Resolution|---                         |INVALID

--- Comment #2 from berni44 <bugzilla@d-ecke.de> ---
This is more a question about how std.uri works than a bug report. Please use
the forum [1] for such questions in the future.

[1] https://forum.dlang.org/group/learn
January 24, 2021 [Issue 15382] std.uri has an incorrect set of reserved characters

https://issues.dlang.org/show_bug.cgi?id=15382

Stefan <kdevel@vogtner.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |kdevel@vogtner.de
         Resolution|INVALID                     |---

--- Comment #3 from Stefan <kdevel@vogtner.de> ---
According to § 2.2 and § 2.3 of RFC 3986 [1] there are the following character
classes:

   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved    = gen-delims / sub-delims
   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

The code in phobos/std/uri.d references these character classes instead:

   uflags['#'] |= URI_Hash;
   ...
   uflags[c] |= URI_Alpha;
   uflags[c + 0x20] |= URI_Alpha; // lowercase letters
   ...
   foreach (c; '0' .. '9' + 1) uflags[c] |= URI_Digit;
   foreach (c; ";/?:@&=+$,") uflags[c] |= URI_Reserved;
   foreach (c; "-_.!~*'()") uflags[c] |= URI_Mark;

If encodeComponent is used, URI_Encode is invoked with

   unescapedSet = URI_Alpha | URI_Digit | URI_Mark.

This leads to some reserved characters not being encoded, e.g. ! or (.

The notion of mark characters stems from the obsoleted RFC 2396 [2]. RFC 3986
explains the changes in its Appendix D.2 [3].

[1] https://tools.ietf.org/html/rfc3986#section-2
[2] https://tools.ietf.org/html/rfc2396#section-2.3
[3] https://tools.ietf.org/html/rfc3986#appendix-D.2
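For comparison, an encoder that follows the RFC 3986 "unreserved" class strictly could look roughly like the sketch below. The function name encodeStrict is hypothetical and not part of Phobos; it only illustrates the difference and is not a proposed patch for std.uri.

```d
import std.array : appender;
import std.ascii : isAlphaNum;
import std.format : formattedWrite;
import std.stdio : writeln;

/// Hypothetical helper (not in Phobos): percent-encode every UTF-8 code unit
/// that is not in the RFC 3986 "unreserved" class
/// (ALPHA / DIGIT / "-" / "." / "_" / "~").
string encodeStrict(string s)
{
    auto buf = appender!string();
    foreach (char c; s) // iterate over code units, not code points
    {
        if (isAlphaNum(c) || c == '-' || c == '.' || c == '_' || c == '~')
            buf.put(c);
        else
            buf.formattedWrite("%%%02X", cast(ubyte) c);
    }
    return buf.data;
}

void main()
{
    // The sub-delims that encodeComponent leaves alone are escaped here.
    assert(encodeStrict("!()*'") == "%21%28%29%2A%27");

    // Unreserved characters pass through unchanged.
    assert(encodeStrict("a-b.c_d~e") == "a-b.c_d~e");

    writeln(encodeStrict("name=(x*y)!"));
}
```

Under this sketch every gen-delim and sub-delim is escaped, which corresponds to the PHP urlencode-like behavior mentioned in comment #1, except that a space becomes %20 rather than +.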