April 12, 2016 Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Hi all,
I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises.
The tools are here: https://github.com/eBay/tsv-utils-dlang
They are likely of interest primarily to people regularly working with large files, though others might find the performance benchmarks of interest as well (included in the README).
I'd welcome any feedback, either on the apps or the code. Intention is that the code be reasonable example programs. And, I may write a blog post about my D explorations at some point, they'd be referenced in such an article.
--Jon
|
April 12, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon D | On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote:
> Hi all,
>
> I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises.
>
> [...]
Hmm, benchmarks are nice, someone post to reddit?
|
April 12, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon D | On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote:
> Hi all,
>
> I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises.
>
> [...]
Interesting, I have large csv files, and this lib will be useful.
Can you put it onto code.dlang.org so that we could use it with dub?
|
April 12, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Puming | On Tuesday, 12 April 2016 at 06:22:55 UTC, Puming wrote:
> On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote:
>> Hi all,
>>
>> I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises.
>>
>> [...]
>
> Interesting, I have large csv files, and this lib will be useful.
> Can you put it onto code.dlang.org so that we could use it with dub?
I'd certainly like to make it available via dub, but I wasn't sure how to set it up. There are two issues. One is that the package builds multiple executables, which dub doesn't seem to support easily. More problematic is that quite a bit of the test suite is run against the executables, which I could automate using make, but didn't see how to do it with dub.
If there are suggestions for setting this up in dub that'd be great. An example project doing something similar would be really helpful.
--Jon
|
April 12, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon D | On Tuesday, 12 April 2016 at 07:17:05 UTC, Jon D wrote:
>
> I'd certainly like to make it available via dub, but I wasn't sure how to set it up. There are two issues. One is that the package builds multiple executables, which dub doesn't seem to support easily. More problematic is that quite a bit of the test suite is run against the executables, which I could automate using make, but didn't see how to do it with dub.
>
> If there are suggestions for setting this up in dub that'd be great. An example project doing something similar would be really helpful.
Dub is indeed not ideal for building multiple executables. You can either use subConfigurations or subPackages. In your case I would probably go the subPackages route, with the root dub file depending on all the executables. Never done that before though, so not exactly sure if that would work. If it works though then I'd think dub test in the root would run the tests for each subPackage.
|
April 12, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon D | On 04/11/2016 05:50 PM, Jon D wrote:
> The tools are here: https://github.com/eBay/tsv-utils-dlang
> --Jon
Congratulations Jon. Really cool stuff! :)
Ali
|
April 13, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon D | On Tuesday, 12 April 2016 at 07:17:05 UTC, Jon D wrote:
> On Tuesday, 12 April 2016 at 06:22:55 UTC, Puming wrote:
>> On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote:
>>> Hi all,
>>>
>>> I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises.
>>>
>>> [...]
>>
>> Interesting, I have large csv files, and this lib will be useful.
>> Can you put it onto code.dlang.org so that we could use it with dub?
>
> I'd certainly like to make it available via dub, but I wasn't sure how to set it up. There are two issues. One is that the package builds multiple executables, which dub doesn't seem to support easily. More problematic is that quite a bit of the test suite is run against the executables, which I could automate using make, but didn't see how to do it with dub.
>
> If there are suggestions for setting this up in dub that'd be great. An example project doing something similar would be really helpful.
>
> --Jon
Here is what I know of it, using subPackages:
Say you have a project named myapp, and you need three executables, app1, app2 and app3, which all depend on a common code base that you name common.
Using dub, you can have a parent project myapp, that does nothing but is a container of the three apps and their common code.
dub.sdl in myapp dir:
```
name "myapp"
dependency ":common" version="*"
subPackage "./common/"
dependency ":app1" version="*"
subPackage "./app1/"
dependency ":app2" version="*"
subPackage "./app2/"
dependency ":app3" version="*"
subPackage "./app3/"
```
The leading colon in the dependency name ":common" is shorthand for "myapp:common".
now use `dub init common` and the like to create subdirectories.
change dub.sdl in the subdirectory common so that it becomes a library type:
```
name "common"
targetType "library"
```
change dub.sdl in the app* subdirectories to depend on common:
```
name "app1"
targetType "executable"
dependency "myapp:common" version="*"
```
Note that here you need the full dependency name, including the root project name: "myapp:common".
Then you should register your whole project in the local dub repo, so that the subpackages can find their dependencies when building:
in the project root directory:
dub add-local .
Now you can build each executable with:
dub build :app1
dub build :app2
dub build :app3
Unfortunately dub does not build all subpackages at once when you run dub in the root directory.
But I think there might be a better way to handle multiple executables?
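One possible workaround is a small shell loop around the per-package build commands above. This is only a minimal sketch: the function name `build_all` and the `DUB` override variable are made up for illustration, and it assumes `dub` is on the PATH and that the project has already been registered with `dub add-local .`.

```shell
#!/bin/sh
# build_all: run `dub build :<name>` for every subpackage name given.
# Stops at the first failing build and returns a non-zero status.
# DUB is a hypothetical override so a different dub binary can be used.
build_all() {
    dub_cmd="${DUB:-dub}"
    for pkg in "$@"; do
        if ! "$dub_cmd" build ":$pkg"; then
            echo "build failed for subpackage: $pkg" >&2
            return 1
        fi
    done
}
```

From the project root this would be called as `build_all app1 app2 app3`. Stopping at the first failure keeps a broken subpackage from being hidden by later successful builds.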
|
April 13, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Puming | On Wed, Apr 13, 2016 at 3:41 AM, Puming via Digitalmars-d-announce < digitalmars-d-announce@puremagic.com> wrote:
> Here is what I know of it, using subPackages:
>
> [...]
>
> Unfortunately dub does not build all sub packages at once when you dub in the root directory.
>
> But I think there might be a better way to handle multiple executables?
Just tried your suggestion and it works. I just added the below to the parent project to get the apps build:
void main() {
    import std.process : executeShell;
    executeShell(`dub build :app1`);
    executeShell(`dub build :app2`);
    executeShell(`dub build :app3`);
}
|
April 13, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jon D | On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote:
> Hi all,
>
> I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises.
>
> The tools are here: https://github.com/eBay/tsv-utils-dlang
>
> They are likely of interest primarily to people regularly working with large files, though others might find the performance benchmarks of interest as well (included in the README).
>
> I'd welcome any feedback, either on the apps or the code. Intention is that the code be reasonable example programs. And, I may write a blog post about my D explorations at some point, they'd be referenced in such an article.
>
> --Jon
I rarely need TSV files, but I deal with CSV files every day.
- It would be nice to test your implementation against std.csv (it can use TAB as separator). Did you try to compare the two?
|
April 13, 2016 Re: Command line utilities for tab-separated value files | ||||
---|---|---|---|---|
| ||||
Posted in reply to Rory McGuire | On Wednesday, 13 April 2016 at 07:34:11 UTC, Rory McGuire wrote:
> On Wed, Apr 13, 2016 at 3:41 AM, Puming via Digitalmars-d-announce < digitalmars-d-announce@puremagic.com> wrote:
>
>>> On Tuesday, 12 April 2016 at 06:22:55 UTC, Puming wrote:
>> Here is what I know of it, using subPackages:
>>
>
> Just tried your suggestion and it works. I just added the below to the
> parent project to get the apps build:
> void main() {
> import std.process : executeShell;
> executeShell(`dub build :app1`);
> executeShell(`dub build :app2`);
> executeShell(`dub build :app3`);
> }
Thanks Rory, Puming. I'll look into this and see how best to make it fit. I'm also realizing there's one additional capability it'd be nice to have in dub for tools like this, which is an option to install the executables somewhere that can easily be put on the path. Still, even without this there'd be benefit to having them fetched via dub.
--Jon
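Until dub has such an option, the install step could be done by hand with a few lines of shell. This is a rough sketch, not a dub feature: the function name `install_tools` is hypothetical, and the executable paths passed to it are assumptions that depend on where each build actually places its output.

```shell
#!/bin/sh
# install_tools DEST EXE...
# Copy already-built executables into DEST, creating DEST if needed.
# Fails on the first argument that is not an executable file.
install_tools() {
    dest="$1"
    shift
    mkdir -p "$dest" || return 1
    for exe in "$@"; do
        if [ ! -x "$exe" ]; then
            echo "not an executable: $exe" >&2
            return 1
        fi
        cp "$exe" "$dest/" || return 1
    done
}
```

For example, `install_tools "$HOME/bin" app1/app1 app2/app2` would copy two binaries into `$HOME/bin`, assuming the builds left them at those paths and that `$HOME/bin` is on the PATH.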
|