Uri class and parser - D Programming Language Discussion Forum

October 23, 2012

Uri class and parser

Posted by Mike van Dongen

Hi all!

I've been working on a URI parser which takes a string, separates the parts, and puts them in the correct properties.
If a valid URI was provided, the (static) parser will return an instance of Uri.

I've commented all relevant lines of code and tested it using unittests.

Now what I'm wondering is whether it meets the Phobos requirements and standards.
And of course whether you think I should do a pull request on GitHub!

My code can be found here, at the bottom of the already existing file uri.d:
https://github.com/MikevanDongen/phobos/blob/uri-parser/std/uri.d


Thanks,

Mike van Dongen.

October 24, 2012

Re: Uri class and parser

Posted by Jacob Carlborg
in reply to Mike van Dongen

On 2012-10-23 22:47, Mike van Dongen wrote:
> Hi all!
>
> I've been working on an URI parser which takes a string and then
> separates the parts and puts them in the correct properties.
> If a valid URI was provided, the (static) parser will return an instance
> of Uri.
>
> I've commented all relevant lines of code and tested it using unittests.
>
> Now what I'm wondering is if it meets the phobos requirements and
> standards.
> And of course if you think I should do a pull request on GitHub!
>
> My code can be found here, at the bottom of the already existing file
> uri.d:
> https://github.com/MikevanDongen/phobos/blob/uri-parser/std/uri.d
>
>
> Thanks,
>
> Mike van Dongen.

I would have expected a few additional components, like:

* Domain
* Password
* Username
* Host
* Hash

A way to build a URI based on its components would also be useful.
It would be nice if there were methods for getting/setting the path component as an array, and also methods for getting/setting the query component as an associative array.

A few stylistic issues: there are a lot of places where you haven't indented the code, at least that is how it looks on GitHub.

I wouldn't put the private methods at the top.

-- 
/Jacob Carlborg

October 24, 2012

Re: Uri class and parser

Posted by Adam D. Ruppe
in reply to Mike van Dongen

On Tuesday, 23 October 2012 at 20:47:26 UTC, Mike van Dongen wrote:
> https://github.com/MikevanDongen/phobos/blob/uri-parser/std/uri.d

If you want to take any of the code from mine, feel free. It is struct Uri in my cgi.d:

https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff/blob/master/cgi.d#L1615

My thing includes relative linking and some more parsing too. The ctRegex in there is cool when it works, but if there's an error in some *other* part of the code (another module that doesn't even call it, something completely unrelated such as a typo in a local variable name), the compiler spews like 20 errors about ctRegex.

That's annoying. But the bug is in the compiler and only makes other errors uglier so I'm just ignoring it for now.

October 24, 2012

Re: Uri class and parser

Posted by Adam D. Ruppe
in reply to Jacob Carlborg

On Wednesday, 24 October 2012 at 07:38:58 UTC, Jacob Carlborg wrote:
> It would be nice if there were methods for getting/setting the path component as an array. Also methods for getting/setting the query component as an associative array.

BTW don't forget that this is legal:

?value&value=1&value=2

The appropriate type for the AA is

string[][string]


This is why my cgi.d has two functions, decodeVariables and decodeVariablesSingle, and two members, get and getArray (in the Cgi class; I didn't add them to the Uri struct).

decodeVariables returns the complete string[][string]

and the single versions only keep the last element of the string[], which gives a string[string] for convenience.
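
A minimal sketch of the two shapes described above, assuming the query string is passed without its leading "?". The names decodeQuery and decodeQuerySingle are hypothetical; this is not the code from cgi.d, only an illustration of why string[][string] is needed when a key repeats. Note that std.uri.decodeComponent does not turn "+" into a space.

```d
import std.algorithm.searching : findSplit;
import std.array : split;
import std.uri : decodeComponent;

/// Hypothetical helper: keeps every value seen for each key.
string[][string] decodeQuery(string query)
{
    string[][string] result;
    foreach (pair; query.split("&"))
    {
        if (pair.length == 0)
            continue; // skip empty pieces such as in "a=1&&b=2"

        // parts[0] is the key; parts[2] is the value ("" when there is no '=')
        auto parts = pair.findSplit("=");
        result[decodeComponent(parts[0])] ~= decodeComponent(parts[2]);
    }
    return result;
}

/// Hypothetical helper: keeps only the last value seen for each key.
string[string] decodeQuerySingle(string query)
{
    string[string] result;
    foreach (key, values; decodeQuery(query))
        result[key] = values[$ - 1];
    return result;
}

// Build with: dmd -unittest -main
unittest
{
    auto multi = decodeQuery("value&value=1&value=2");
    assert(multi == ["value": ["", "1", "2"]]);

    auto single = decodeQuerySingle("value&value=1&value=2");
    assert(single == ["value": "2"]);
}
```

The single-value form is convenient for the common case, but it silently drops the earlier values, which is why both forms are worth having.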

October 24, 2012

Re: Uri class and parser

Posted by ponce
in reply to Jacob Carlborg

On Wednesday, 24 October 2012 at 07:38:58 UTC, Jacob Carlborg wrote:
>
> I would have expected a few additional components, like:
>
> * Domain
> * Password
> * Username
> * Host
> * Hash
>
> A way to build an URI base on the components.
> It would be nice if there were methods for getting/setting the path component as an array. Also methods for getting/setting the query component as an associative array.

I have a public domain URI parser here:
http://github.com/p0nce/gfm/blob/master/common/uri.d

October 24, 2012

Re: Uri class and parser

Posted by Mike van Dongen
in reply to Jacob Carlborg

On Wednesday, 24 October 2012 at 07:38:58 UTC, Jacob Carlborg wrote:
> I would have expected a few additional components, like:
>
> * Domain
> * Password
> * Username
> * Host
> * Hash
>
> A way to build an URI base on the components.
> It would be nice if there were methods for getting/setting the path component as an array. Also methods for getting/setting the query component as an associative array.

Thanks for the suggestions!
I've added many, if not all, of them to the repo:

- Identifying/separating the username and password (together the userinfo), the domain and the port number from the authority.
- The hash can now also be read and set, and the same goes for the data in the query.


On Wednesday, 24 October 2012 at 12:47:15 UTC, Adam D. Ruppe wrote:
> On Wednesday, 24 October 2012 at 07:38:58 UTC, Jacob Carlborg wrote:
>> It would be nice if there were methods for getting/setting the path component as an array. Also methods for getting/setting the query component as an associative array.
>
> BTW don't forget that this is legal:
>
> ?value&value=1&value=2
>
> The appropriate type for the AA is
>
> string[][string]

It does not yet take into account the fact that multiple query elements can have the same name. I'll be working on that next.


On Wednesday, 24 October 2012 at 07:38:58 UTC, Jacob Carlborg wrote:
> A few stylistic issues. There are a lot of places where you haven't indented the code, at least how it looks like on github.
>
> I wouldn't put the private methods at the top.

As for the indentations, I use tabs with the size of 4 spaces.
Viewing the code on Github (in Chromium) you'll see tabs of 8 spaces.
I'm not sure what the Phobos standard is.

As all my code is part of a single class and the file std/uri.d already existed, I decided to 'just' append my code to the file. Should I perhaps put it in another file as the private methods you mentioned are not relevant to my code?


You may be able to see the new getters by checking out this unittest:

uri = Uri.parse("foo://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal&novalue#nose");
assert(uri.scheme == "foo");
assert(uri.authority == "username:password@example.com:8042");
assert(uri.path == "over/there/index.dtb");
assert(uri.pathAsArray == ["over", "there", "index.dtb"]);
assert(uri.query == "type=animal&name=narwhal&novalue");
assert(uri.queryAsArray == ["type": "animal", "name": "narwhal", "novalue": ""]);
assert(uri.fragment == "nose");
assert(uri.host == "example.com");
assert(uri.port == 8042);
assert(uri.username == "username");
assert(uri.password == "password");
assert(uri.userinfo == "username:password");
assert(uri.queryAsArray["type"] == "animal");
assert(uri.queryAsArray["novalue"] == "");
assert("novalue" in uri.queryAsArray);
assert(!("nothere" in uri.queryAsArray));

October 24, 2012

Re: Uri class and parser

Posted by Jacob Carlborg
in reply to Mike van Dongen

On 2012-10-24 20:22, Mike van Dongen wrote:

> Thanks for the suggestions!
> I've added many, if not all, of them to the repo:
>
> - Identifying/separating the username, password (together the userinfo),
> the domain and the port number from the authority.
> - The hash now also can be get/set and the same thing goes for the data
> in the query

> As for the indentations, I use tabs with the size of 4 spaces.
> Viewing the code on Github (in Chromium) you'll see tabs of 8 spaces.
> I'm not sure what the phobos standard is?

OK, I'm using Firefox and it doesn't look particularly good on GitHub. The Phobos standard is to indent with spaces rather than tabs, 4 per level.

> As all my code is part of a single class and the file std/uri.d already
> existed, I decided to 'just' append my code to the file. Should I
> perhaps put it in another file as the private methods you mentioned are
> not relevant to my code?

If some methods aren't used by the URI parser, you should remove them. If they are used, I would suggest you move them further down in the code, possibly to the bottom.

> You may be able to see the new getters by checking out this unittest:

Cool. It would be nice to have a way to set the query and path as an (associative) array as well.

Just a suggestion: I don't really see a point in having getters and setters that just forward to the instance variables. Just use public instance variables. The only reason to use getters and setters would be to be able to subclass and override them. But I think you could just make Uri a final class.

About path and query: I wonder whether it's best to return an (associative) array or a string by default. I would think it's more useful to return an (associative) array and then provide rawPath() and rawQuery(), which would return strings.
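
A rough sketch of that shape, under the assumption that the class is final, exposes its components as public fields, and stores the path as an array. UriSketch is a hypothetical name and only the path is shown; rawQuery() would follow the same pattern.

```d
import std.array : join, split;

/// Hypothetical illustration only; not the class from the pull request.
final class UriSketch
{
    // Plain public fields instead of trivial getters and setters.
    string scheme;
    string host;
    string[] path;

    /// The path as a single string, built from the array on demand.
    @property string rawPath()
    {
        return path.join("/");
    }

    /// Setting the raw string re-splits it into the array.
    @property void rawPath(string value)
    {
        path = value.split("/");
    }
}

// Build with: dmd -unittest -main
unittest
{
    auto uri = new UriSketch;
    uri.path = ["over", "there", "index.dtb"];
    assert(uri.rawPath == "over/there/index.dtb");

    uri.rawPath = "a/b";
    assert(uri.path == ["a", "b"]);
}
```

Only the derived string needs accessor functions here; the array itself stays an ordinary field.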

A nitpick: I'm not really an expert on URIs, but is "fragment" really the correct name for what I would call the "hash"? That would be "nose" in the example below.

> uri =
> Uri.parse("foo://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal&novalue#nose");
>
> assert(uri.scheme == "foo");
> assert(uri.authority == "username:password@example.com:8042");
> assert(uri.path == "over/there/index.dtb");
> assert(uri.pathAsArray == ["over", "there", "index.dtb"]);
> assert(uri.query == "type=animal&name=narwhal&novalue");
> assert(uri.queryAsArray == ["type": "animal", "name": "narwhal",
> "novalue": ""]);
> assert(uri.fragment == "nose");
> assert(uri.host == "example.com");
> assert(uri.port == 8042);
> assert(uri.username == "username");
> assert(uri.password == "password");
> assert(uri.userinfo == "username:password");
> assert(uri.queryAsArray["type"] == "animal");
> assert(uri.queryAsArray["novalue"] == "");
> assert("novalue" in uri.queryAsArray);
> assert(!("nothere" in uri.queryAsArray));

-- 
/Jacob Carlborg

October 24, 2012

Re: Uri class and parser

Posted by Adam D. Ruppe
in reply to Jacob Carlborg

On Wednesday, 24 October 2012 at 19:54:54 UTC, Jacob Carlborg wrote:
> A nitpick, I'm not really an expert on URI's but is "fragment" really the correct name for that I would call the "hash"? That would be "nose" in the example below.

Yes, that's the term in the standard.

http://en.wikipedia.org/wiki/Fragment_identifier

JavaScript calls it the hash, though, and it is slightly different: the # symbol itself is not part of the fragment according to the standard.

But JavaScript's location.hash does return it.

URL: example.com/
>>> location.hash
""

>>> location.hash = "test"
"test"

URL changes to: example.com/#test

>>> location.hash;
"#test"

The fragment would technically just be "test" there.
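
The same distinction as a small D sketch, assuming the URI is handled as a plain string. Both function names are hypothetical: fragmentOf returns the fragment without the "#", while locationHash mimics what JavaScript's location.hash reports.

```d
import std.string : indexOf;

/// Hypothetical helper: the fragment as the standard defines it, without '#'.
string fragmentOf(string uri)
{
    immutable pos = uri.indexOf('#');
    return pos < 0 ? "" : uri[pos + 1 .. $];
}

/// Hypothetical helper: the value in the style of JavaScript's location.hash,
/// which includes the '#' unless the fragment is empty.
string locationHash(string uri)
{
    auto fragment = fragmentOf(uri);
    return fragment.length ? "#" ~ fragment : "";
}

// Build with: dmd -unittest -main
unittest
{
    assert(fragmentOf("example.com/") == "");
    assert(fragmentOf("example.com/#test") == "test");
    assert(locationHash("example.com/#test") == "#test");
}
```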

October 25, 2012

Re: Uri class and parser

Posted by Jacob Carlborg
in reply to Adam D. Ruppe

On 2012-10-24 22:36, Adam D. Ruppe wrote:

> Yes, that's the term in the standard.
>
> http://en.wikipedia.org/wiki/Fragment_identifier
>
> Javascript calls it the hash though, but it is slightly different: the #
> symbol itself is not part of the fragment according to the standard.
>
> But javascript's location.hash does return it.
>
> URL: example.com/
>>>> location.hash
> ""
>
>>>> location.hash = "test"
> "test"
>
> URL changes to: example.com/#test
>
>>>> location.hash;
> "#test"
>
>
>
> The fragment would technically just be "test" there.

I've obviously done too much JavaScript :). Thanks for the clarification.

-- 
/Jacob Carlborg

October 25, 2012

Re: Uri class and parser

Posted by Mike van Dongen
in reply to Adam D. Ruppe

On Wednesday, 24 October 2012 at 20:36:51 UTC, Adam D. Ruppe wrote:
> On Wednesday, 24 October 2012 at 19:54:54 UTC, Jacob Carlborg wrote:
>> A nitpick, I'm not really an expert on URI's but is "fragment" really the correct name for that I would call the "hash"? That would be "nose" in the example below.
>
> Yes, that's the term in the standard.
>
> http://en.wikipedia.org/wiki/Fragment_identifier

The only reason I used "fragment" was that both the RFC and the Wikipedia page call it that. I hate to break protocol ;)

> Cool. It would be nice to have a way to set the query and path as an (associative) array as well.

Now it allows you to create/edit a URI. You can do so by using an array or a string, whichever you prefer.
I also added a toString() method and fixed the indentation to 4 spaces, instead of 1 tab.

uri = new Uri();
uri.scheme = "foo";
uri.username = "username";
uri.password = "password";
uri.host = "example.com";
uri.port = 8042;
uri.path = ["over", "there", "index.dtb"];
uri.query = ["type": "animal", "name": "narwhal", "novalue": ""];
uri.fragment = "nose";
assert(uri.toString() == "foo://username:password@example.com:8042/over/there/index.dtb?novalue=&name=narwhal&type=animal#nose");
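
One thing to keep in mind with an assertion like the one above: D does not specify the iteration order of associative arrays, so the order of the query elements in the output may differ between compilers or runtime versions. A minimal sketch of one way to get a stable order, assuming it is acceptable to sort by key; buildQuery is a hypothetical name, and percent-encoding (for example with std.uri.encodeComponent) is left out to keep it short.

```d
import std.algorithm.iteration : map;
import std.algorithm.sorting : sort;
import std.array : join;

/// Hypothetical helper: serialises the query with its keys in sorted order.
string buildQuery(string[string] params)
{
    auto keys = params.keys; // a fresh array, safe to sort in place
    sort(keys);
    return keys.map!(k => k ~ "=" ~ params[k]).join("&");
}

// Build with: dmd -unittest -main
unittest
{
    auto query = ["type": "animal", "name": "narwhal", "novalue": ""];
    assert(buildQuery(query) == "name=narwhal&novalue=&type=animal");
}
```

Sorting changes the order the caller wrote the elements in; keeping insertion order instead would need an array of key/value pairs rather than an associative array.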
