Faster Command Line Tools in D (page 3) - D Programming Language Discussion Forum

May 26, 2017

Re: Faster Command Line Tools in D

Posted by Nick Sabalausky (Abscissa)
in reply to xtreak

On 05/25/2017 08:30 AM, xtreak wrote:
> 
> There are repeated references over usage of D at Netflix for machine learning. It will be a very helpful boost if someone comes up with any reference or a post regarding how D is used at Netflix and addition of Netflix to https://dlang.org/orgs-using-d.html will be amazing.
> 

I've used Netflix. If its "suggestion" features are any indication, I'm not sure such a thing would be a feather in D's cap ;)

May 28, 2017

Re: Faster Command Line Tools in D

Posted by cym13
in reply to Jonathan M Davis

On Thursday, 25 May 2017 at 16:19:16 UTC, Jonathan M Davis wrote:
> I wouldn't expect any of the split-related functions to be going away. We often have a function that operates on arrays or strings and another which operates on more general ranges. It may mainly be for historical reasons, but removing the array-based functions would break existing code, and we'd get a whole other set of complaints about folks not understanding that you need to slap array() on the end of a call to splitter to get the split that they were looking for (especially if they're coming from another language and don't understand ranges yet). And ultimately, the array-based functions continue to serve as a way to have simpler code when you don't care about (or you actually need) the additional memory allocations.

I don't know if people coming from other languages would really mind. Of course it would have to be taught once, as everything does, but many languages (I have Python especially in mind) have been making their standard libraries lazier for years now. I think consistency is what leads to fewer questions, not a variety of options where only one matches what the programmer wants. They'll ask about the difference anyway.
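For readers who have not met the two families yet, here is a minimal sketch of the difference under discussion. The input string and variable names are made up for illustration; `split` comes from `std.array` and `splitter` from `std.algorithm`.

```d
import std.algorithm : splitter;
import std.array : array, split;

void main()
{
    string line = "alpha\tbeta\tgamma";

    // Eager: allocates a string[] holding every field up front.
    string[] eager = line.split("\t");

    // Lazy: a range that yields one slice of `line` at a time, no array allocated.
    auto lazyFields = line.splitter('\t');
    assert(lazyFields.front == "alpha");

    // Appending .array to the lazy range gives the same result as split.
    assert(lazyFields.array == eager);
}
```

The lazy form is the one that needs `array()` on the end when an actual array is wanted, which is the step newcomers tend to miss.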

> Also, splitLines/lineSplitter can't actually be written in terms of split/splitter, because split/splitter does not have a way to provide multiple delimiters (let alone multiple delimiters where one includes the other, which is what you get with "\n" and "\r\n"). So, that distinction isn't going away. It's also a common enough operation that having a function for it rather than having to pass all of the delimiters to a more general function is arguably worth it, just like having the overload of split/splitter which takes no delimiter and then splits on whitespace is arguably worth it over having a more general function where you have to feed it every variation of whitespace.
>
> - Jonathan M Davis
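A small sketch of the point about overlapping delimiters, assuming an in-memory string with mixed line endings (the sample text is invented for illustration). `lineSplitter` treats both "\n" and "\r\n" as terminators, while `splitter` with a single "\n" delimiter leaves the carriage return attached.

```d
import std.algorithm : equal, splitter;
import std.string : lineSplitter;

void main()
{
    string text = "a\r\nb\nc";

    // lineSplitter recognises "\r\n" and "\n" as line terminators.
    assert(text.lineSplitter.equal(["a", "b", "c"]));

    // splitter only knows the one delimiter it was given, so "\r" stays in the field.
    assert(text.splitter("\n").equal(["a\r", "b", "c"]));
}
```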

May 30, 2017

Re: Faster Command Line Tools in D

Posted by Steven Schveighoffer
in reply to John Colvin

On 5/26/17 10:41 AM, John Colvin wrote:
> On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote:
>> Some of you may remember Jon Degenhardt's talk from one of the Silicon
>> Valley D meetups, where he described the performance improvements he
>> saw when he rewrote some of eBay's command line tools in D. He has now
>> put the effort into crafting a blog post on the same topic, where he
>> takes a D version of a command-line tool written in Python and
>> incrementally improves its performance.
>>
>> The blog:
>> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>>
>> Reddit:
>> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/
>>
>
> I spent some time fiddling with my own manual approaches to making this
> as fast, wasn't satisfied and so decided to try using Steven's iopipe
> (https://github.com/schveiguy/iopipe) instead. Results were excellent.
>
> https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242

nice! hm....

/** something vaguely like this should be in iopipe, users shouldn't need to write it */
auto ref runWithEncoding(alias process, FileT, Args...)(FileT file, auto ref Args args)

stealing for iopipe, thanks :) I'll need to dedicate another slide to you...

>
> On my machine:
> python takes a little over 20s, pypy wobbles around 3.5s, v1 from the
> blog takes about 3.9s, v4b took 1.45s, a version of my own that is
> hideous* manages 0.78s at best, the above version with iopipe hits below
> 0.67s most runs.
>
> Not bad for a process that most people would call "IO-bound" (code for
> "I don't want to have to write fast code & it's all the disk's fault").
>
> Obviously this version is a bit more code than is ideal, iopipe is
> currently quite "barebones", but I don't see why with some clever
> abstractions and wrappers it couldn't be the default thing that one does
> even for small scripts.

The idea behind iopipe is to give you the building blocks to create exactly the pipeline you need, without a lot of effort. Once you have those blocks, then you make higher level functions out of it. Like you have above :)

BTW, there is a byLineRange function that handles slicing off the newline character inside iopipe.textpipe.

-Steve

May 30, 2017

Re: Faster Command Line Tools in D

Posted by Steven Schveighoffer
in reply to John Colvin

On 5/26/17 11:20 AM, John Colvin wrote:
> On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
>> I spent some time fiddling with my own manual approaches to making
>> this as fast, wasn't satisfied and so decided to try using Steven's
>> iopipe (https://github.com/schveiguy/iopipe) instead. Results were
>> excellent.
>>
>> https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242
>
> This version also has the advantage of being (discounting any bugs in
> iopipe) correct for arbitrary unicode in all common UTF encodings.

I worked a lot on making sure this works properly. However, it's possible that there are some lingering issues.

I also did not spend much time optimizing these paths (whereas I spent a ton of time getting the utf8 line parsing as fast as it could be). Partly because finding things other than utf8 in the wild is rare, and partly because I have nothing to compare it with to know what is possible :)

-Steve

May 30, 2017

Re: Faster Command Line Tools in D

Posted by Patrick Schluter
in reply to Steven Schveighoffer

On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote:
> On 5/26/17 11:20 AM, John Colvin wrote:
>> On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
>>> [...]
>>
>> This version also has the advantage of being (discounting any bugs in
>> iopipe) correct for arbitrary unicode in all common UTF encodings.
>
> I worked a lot on making sure this works properly. However, it's possible that there are some lingering issues.
>
> I also did not spend much time optimizing these paths (whereas I spent a ton of time getting the utf8 line parsing as fast as it could be). Partly because finding things other than utf8 in the wild is rare, and partly because I have nothing to compare it with to know what is possible :)
>
> -Steve

If you want UCS-2 (aka UTF-16 without surrogates) data I can give you gigabytes of files in tmx format.

May 30, 2017

Re: Faster Command Line Tools in D

Posted by Steven Schveighoffer
in reply to Patrick Schluter

On 5/30/17 5:57 PM, Patrick Schluter wrote:
> On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote:
>> On 5/26/17 11:20 AM, John Colvin wrote:
>>> On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
>>>> [...]
>>>
>>> This version also has the advantage of being (discounting any bugs in
>>> iopipe) correct for arbitrary unicode in all common UTF encodings.
>>
>> I worked a lot on making sure this works properly. However, it's
>> possible that there are some lingering issues.
>>
>> I also did not spend much time optimizing these paths (whereas I spent
>> a ton of time getting the utf8 line parsing as fast as it could be).
>> Partly because finding things other than utf8 in the wild is rare, and
>> partly because I have nothing to compare it with to know what is
>> possible :)
>
> If you want UCS-2 (aka UTF-16 without surrogates) data I can give you
> gigabytes of files in tmx format.

The data I can generate (and have generated) from UTF-8 data. I have tested my byLine parser to make sure it properly splits on "interesting" code points in all widths. UTF-16 data without surrogates should probably work fine. I haven't tuned it though like I tuned the UTF-8 version. Is there a memchr for wide characters? ;)
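On the memchr question: C does provide `wmemchr` in `<wchar.h>`, but it operates on `wchar_t`, whose width depends on the platform (commonly 32 bits on Linux and 16 bits on Windows), so it is not a direct fit for UTF-16 everywhere. A plain, untuned baseline for searching 16-bit code units might look like the sketch below. The function name `indexOfCodeUnit` is hypothetical and is not part of iopipe or Phobos.

```d
import std.utf : toUTF16;

/// Hypothetical helper: index of the first code unit equal to `needle`, or -1.
ptrdiff_t indexOfCodeUnit(const(wchar)[] haystack, wchar needle)
{
    // Naming the element type as wchar iterates code units, with no decoding.
    foreach (i, wchar unit; haystack)
        if (unit == needle)
            return i;
    return -1;
}

void main()
{
    // Test data generated from a UTF-8 literal.
    wstring text = "first line\nsecond line".toUTF16;
    assert(indexOfCodeUnit(text, '\n') == 10);
    assert(indexOfCodeUnit(text, '\u2028') == -1);
}
```

This is only a correctness baseline. A tuned version would presumably process several code units per step, which is what makes memchr fast for bytes.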

What I really haven't done is compared my line parsing code with multi-code-unit delimiters against one that can do the same thing. I know Phobos and C FILE * really can't do it. I haven't really looked at all in C++, so I should probably look there before giving up.

-Steve

May 31, 2017

Re: Faster Command Line Tools in D

Posted by Patrick Schluter
in reply to Steven Schveighoffer

On Tuesday, 30 May 2017 at 22:31:50 UTC, Steven Schveighoffer wrote:
> On 5/30/17 5:57 PM, Patrick Schluter wrote:
>> On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote:
>>> On 5/26/17 11:20 AM, John Colvin wrote:
>>>> On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
>>>>> [...]
>>>>
>>>> This version also has the advantage of being (discounting any bugs in
>>>> iopipe) correct for arbitrary unicode in all common UTF encodings.
>>>
>>> I worked a lot on making sure this works properly. However, it's
>>> possible that there are some lingering issues.
>>>
>>> I also did not spend much time optimizing these paths (whereas I spent
>>> a ton of time getting the utf8 line parsing as fast as it could be).
>>> Partly because finding things other than utf8 in the wild is rare, and
>>> partly because I have nothing to compare it with to know what is
>>> possible :)
>>
>> If you want UCS-2 (aka UTF-16 without surrogates) data I can give you
>> gigabytes of files in tmx format.
>
> The data I can (and have) generated from UTF-8 data. I have tested my byLine parser to make sure it properly splits on "interesting" code points in all widths. UTF-16 data without surrogates should probably work fine. I haven't tuned it though like I tuned the UTF-8 version. Is there a memchr for wide characters? ;)
>
> What I really haven't done is compared my line parsing code with multi-code-unit delimiters against one that can do the same thing. I know Phobos and C FILE * really can't do it. I haven't really looked at all in C++, so I should probably look there before giving up.
>
> -Steve

In any case, you can download the dataset from [1] if you like. There are several 100 Mb big zip files containing a collection of tmx files (translation memory exchange) with European Legislation. The files contain multi-alignment texts in up to 24 languages. The files are encoded in UCS-2 little-endian. I know for a fact (because I compiled the data) that they don't contain characters outside of the BMP. The data is public and can be used freely (as in beer).
When I get some time, I will try to port the Java app that is distributed with it to D (partially done already).

[1]: https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory
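To illustrate what "UCS-2 little-endian" and "no characters outside of the BMP" mean at the byte level, here is a minimal sketch. The helper names `decodeUtf16Le` and `isBmpOnly` are hypothetical, the byte arrays are hand-written samples rather than data from the tmx files, and a leading byte-order mark is not handled.

```d
/// Hypothetical helper: turns little-endian byte pairs into 16-bit code units.
wchar[] decodeUtf16Le(const(ubyte)[] bytes)
{
    assert(bytes.length % 2 == 0, "expected an even number of bytes");
    auto units = new wchar[bytes.length / 2];
    foreach (i, ref unit; units)
        unit = cast(wchar)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    return units;
}

/// Hypothetical helper: true if no code unit is a surrogate (0xD800 to 0xDFFF),
/// which is what makes UTF-16 data valid UCS-2.
bool isBmpOnly(const(wchar)[] units)
{
    foreach (wchar unit; units)
        if (unit >= 0xD800 && unit <= 0xDFFF)
            return false;
    return true;
}

void main()
{
    // U+00E9 followed by U+000A, low byte first.
    ubyte[] bmp = [0xE9, 0x00, 0x0A, 0x00];
    assert(decodeUtf16Le(bmp) == "\u00E9\n"w);
    assert(isBmpOnly(decodeUtf16Le(bmp)));

    // U+1F600 lies outside the BMP and is encoded as the surrogate pair D83D DE00.
    ubyte[] astral = [0x3D, 0xD8, 0x00, 0xDE];
    assert(!isBmpOnly(decodeUtf16Le(astral)));
}
```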

May 31, 2017

Re: Faster Command Line Tools in D

Posted by Steven Schveighoffer
in reply to Patrick Schluter

On 5/31/17 1:09 AM, Patrick Schluter wrote:
> In any case, you can download the dataset from [1] if you like. There
> are several 100 Mb big zip files containing a collection of tmx files
> (translation memory exchange) with European Legislation. The files
> contain multi-alignment texts in up to 24 languages. The files are
> encoded in UCS-2 little-endian. I know for a fact (because I compiled
> the data) that they don't contain characters outside of the BMP. The
> data is public and can be used freely (as in beer).
> When I get some time, I will try to port the Java app that is
> distributed with it to D (partially done already).
>
> [1]:
> https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory

Thanks, I'll bookmark it for later use.

-Steve

August 08, 2017

Re: Faster Command Line Tools in D

Posted by Joakim
in reply to Mike Parker

On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote:
> Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes a D version of a command-line tool written in Python and incrementally improves its performance.
>
> The blog:
> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>
> Reddit:
> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/

Heh, happened to notice that this blog post now has 21 comments, with people posting links to versions in Go, C++, and Kotlin up till this week, months after the post went up! :D

August 08, 2017

Re: Faster Command Line Tools in D

Posted by bachmeier
in reply to Joakim

On Tuesday, 8 August 2017 at 21:51:30 UTC, Joakim wrote:
> On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote:
>> Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes a D version of a command-line tool written in Python and incrementally improves its performance.
>>
>> The blog:
>> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>>
>> Reddit:
>> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/
>
> Heh, happened to notice that this blog post now has 21 comments, with people posting links to versions in Go, C++, and Kotlin up till this week, months after the post went up! :D

There was also a Haskell version on Reddit.
