May 25, 2017
On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote:
> Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command-line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes a D version of a command-line tool originally written in Python and incrementally improves its performance.
>
> The blog:
> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>
> Reddit:
> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/

There are repeated references to the use of D at Netflix for machine learning. It would be a very helpful boost if someone could provide a reference or a post describing how D is used at Netflix, and adding Netflix to https://dlang.org/orgs-using-d.html would be amazing.

References:

https://news.ycombinator.com/item?id=14064012
https://news.ycombinator.com/item?id=14413546
May 25, 2017
On 5/25/17 6:27 AM, Wulfklaue wrote:

> - Also wondering why one needed std.algorithm's splitter, when you would
> expect string split to be the fastest. Even the fact that you need to
> import std.array to split a string simply felt strange.

Because split allocates on every call. In many cases in D, the key to increasing performance is avoiding allocations. It has been that way for as long as I can remember.
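
To make the difference concrete, here is a minimal sketch (mine, not from the article): std.array.split eagerly allocates a new array of slices on every call, while std.algorithm.splitter returns a lazy range over the original buffer and allocates nothing up front.

import std.algorithm : splitter;
import std.array : split;

void main()
{
    string line = "red\tgreen\tblue";

    // Eager: allocates a fresh string[] on every call.
    string[] parts = line.split("\t");
    assert(parts == ["red", "green", "blue"]);

    // Lazy: a range of slices into `line`, nothing is allocated until
    // you explicitly convert it to an array.
    auto lazyParts = line.splitter("\t");
    assert(lazyParts.front == "red");
}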

Another possibility for "fixing" this problem is to use split with an allocator that allocates from some predefined stack space. This is very similar to what v3 does with the Appender. Unfortunately, std.experimental.allocator is still experimental, so split doesn't support using it.
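
As a rough sketch of that idea (not the article's actual v3 code), you can keep a single Appender alive across lines and clear it each iteration, so the split-style result is gathered without a fresh allocation per line:

import std.algorithm : splitter;
import std.array : appender;
import std.stdio : stdin, writeln;

void main()
{
    // One Appender reused for every line; clear() keeps the backing
    // memory, so only the first few iterations actually allocate.
    auto fields = appender!(char[][])();
    foreach (line; stdin.byLine)
    {
        fields.clear();
        foreach (field; line.splitter('\t'))
            fields.put(field);
        writeln(fields.data.length);
    }
}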

> - So much effort for relatively little gain (after the v2 splitter). The
> time spent on finding a faster solution is, in a business sense, not worth
> it. But not finding a faster way is simply wasting performance, just on
> this simple function.

The answer is always "it depends". If you're processing hundreds of these files in tight loops, it probably makes sense to optimize the code. If not, then it may make sense to focus efforts elsewhere. The point of the article is, this is how to do it if you need performance there.

> - Started to wonder if Python's PyPy is so optimized that, without any
> effort, you're even faster than D. What other idiomatic D functions are slow?

split didn't actually seem that slow. I'll note that you could opt for just the AA optimization (converting char[] to string only when storing a new hash entry is a big win, and not that cumbersome), leave the code for split alone, and probably still beat the Python code.
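
The AA optimization being referred to looks roughly like this (a sketch, not the post's exact code): look up the hash table with the reused char[] buffer, and only idup it into an immutable string when the key is seen for the first time.

import std.stdio : stdin;

void main()
{
    size_t[string] counts;
    foreach (line; stdin.byLine)        // `line` is a reused char[] buffer
    {
        if (auto p = line in counts)    // lookup works with the char[] slice
            ++*p;
        else
            counts[line.idup] = 1;      // copy to string only for new keys
    }
}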

> Off-topic:
>
> Yesterday I was struggling with split, but for a whole different reason.
> Take into account that I am new to D.
>
> Needed to split a string. Simple, right? Search Google for "split string
> dlang". Land on the https://dlang.org/phobos/std_string.html page.
>
> After seeing splitLines, I started experimenting with it. Half an hour
> later I realized that I had used the wrong function and needed to import
> std.array's split function.
>
> Call it an issue with the documentation or my own stupidity, but the fact
> that split was only listed as an imported function, in this mass of text,
> totally sent me in the wrong direction.
>
> As stated above, I expected split to be part of std.string, because
> I am manipulating a string, not to have to import it from std.array,
> which is what it came down to in the end.

std.string, std.array, and std.algorithm all have cross-pollination when it comes to array operations. It has to do with the history of when the modules were introduced.

> I simply find the documentation confusing with its wall of text. When I
> search for string split, I expect to arrive on a string.split page.
> Not only that, the split examples use split as a standalone function,
> when I was looking for variable.split().

There is a search field at the top of the page, which helps narrow down the available choices.

> Veteran D programmers are probably going to laugh at me for this, but
> one does feel a bit salty after that.

I understand your pain. I work with Swift often, and sometimes it's very frustrating trying to find the right tool for the job, as I'm not thoroughly immersed in Apple's SDK on a day-to-day basis. I don't know that any programming language gets this perfect.

-Steve
May 25, 2017
> std.string, std.array, and std.algorithm all have cross-pollination when it comes to array operations. It has to do with the history of when the modules were introduced.

Is there any plan to deprecate all the splitters and consolidate them into a single one? Because, as I understand it, we now have four functions that perform the same task.


May 25, 2017
On Thursday, May 25, 2017 08:46:17 Steven Schveighoffer via Digitalmars-d-announce wrote:
> std.string, std.array, and std.algorithm all have cross-pollination when it comes to array operations. It has to do with the history of when the modules were introduced.

Not only that, but over time, there has been a push to generalize functions. So, something that might have originally gotten put in std.string (because you'd normally think of it as a string function) got moved to std.array, because it could easily be generalized to work on arrays in general and not just string operations (I believe that split is an example of this). And something which was in std.array or std.string might have been generalized for ranges in general, in which case, we ended up with a new function in std.algorithm (hence, we have splitter in std.algorithm but split in std.array).

The end result tends to make sense if you understand that functions that only operate on strings go in std.string, functions that operate on dynamic arrays in general (but not ranges) go in std.array, and functions which could have gone in std.string or std.array except that they operate on ranges in general go in std.algorithm. But if you don't understand that, it tends to be quite confusing, and even if you do, it's often the case that when you want to find a function to operate on a string, you're going to need to look in std.string, std.array, and std.algorithm.
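
A small illustration of how that plays out in current Phobos, showing which module each flavor comes from (a quick example of my own):

// Same family of operations, three homes:
import std.string : strip;          // string-only helpers live here
import std.array : split;           // eager, array-returning versions here
import std.algorithm : splitter;    // lazy, range-based generalizations here

void main()
{
    auto line = "  foo bar baz  ";
    assert(line.strip.split(" ") == ["foo", "bar", "baz"]);
    assert(line.strip.splitter(" ").front == "foo");
}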

So, in part, it's an evolution thing, and in part, it's often just plain hard to find stuff when you're focused on a specific use case, and the library writer is focused on making the function that you need as general as possible.

- Jonathan M Davis

May 25, 2017
On Thursday, May 25, 2017 14:17:27 Suliman via Digitalmars-d-announce wrote:
> > std.string, std.array, and std.algorithm all have cross-pollination when it comes to array operations. It has to do with the history of when the modules were introduced.
>
> Is there any plan to deprecate all the splitters and consolidate them into a single one? Because, as I understand it, we now have four functions that perform the same task.

I wouldn't expect any of the split-related functions to be going away. We often have a function that operates on arrays or strings and another which operates on more general ranges. It may mainly be for historical reasons, but removing the array-based functions would break existing code, and we'd get a whole other set of complaints about folks not understanding that you need to slap array() on the end of a call to splitter to get the split that they were looking for (especially if they're coming from another language and don't understand ranges yet). And ultimately, the array-based functions continue to serve as a way to have simpler code when you don't care about (or you actually need) the additional memory allocations.
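
For reference, the idiom in question is just this (a tiny sketch): tacking array() onto a splitter call produces the same result as the eager split.

import std.algorithm : splitter;
import std.array : array, split;

void main()
{
    string csv = "1,2,3";
    assert(csv.split(",") == ["1", "2", "3"]);
    assert(csv.splitter(",").array == csv.split(","));
}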

Also, splitLines/lineSplitter can't actually be written in terms of split/splitter, because split/splitter does not have a way to provide multiple delimiters (let alone multiple delimiters where one includes the other, which is what you get with "\n" and "\r\n"). So, that distinction isn't going away. It's also a common enough operation that having a function for it, rather than having to pass all of the delimiters to a more general function, is arguably worth it, just like having the overload of split/splitter which takes no delimiter and then splits on whitespace is arguably worth it over having a more general function where you have to feed it every variation of whitespace.
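
Both conveniences in one small example (my sketch): splitLines handling mixed line endings, and the delimiter-less split collapsing runs of whitespace.

import std.array : split;
import std.string : splitLines;

void main()
{
    // splitLines treats both "\n" and "\r\n" as line terminators,
    // something a single-delimiter split/splitter can't express directly.
    assert("one\r\ntwo\nthree".splitLines == ["one", "two", "three"]);

    // The no-delimiter overload of split splits on runs of whitespace.
    assert("  a \t b\n c ".split == ["a", "b", "c"]);
}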

- Jonathan M Davis

May 25, 2017
On 05/24/2017 06:39 AM, Mike Parker wrote:

> Reddit:
> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/
>

Inspired Nim version, found on Reddit:


https://www.reddit.com/r/programming/comments/6dct6e/faster_command_line_tools_in_nim/

Ali

May 26, 2017
On Thursday, 25 May 2017 at 22:04:36 UTC, Ali Çehreli wrote:
> On 05/24/2017 06:39 AM, Mike Parker wrote:
>
>> Reddit:
>> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/
>>
>
> Inspired Nim version, found on Reddit:
>
>
> https://www.reddit.com/r/programming/comments/6dct6e/faster_command_line_tools_in_nim/
>
> Ali

Wow, the D blog post opened Pandora's box.
May 26, 2017
On Friday, 26 May 2017 at 06:05:11 UTC, Basile B. wrote:
> On Thursday, 25 May 2017 at 22:04:36 UTC, Ali Çehreli wrote:
>> On 05/24/2017 06:39 AM, Mike Parker wrote:
>>
>>> Reddit:
>>> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/
>>>
>>
>> Inspired Nim version, found on Reddit:
>>
>>
>> https://www.reddit.com/r/programming/comments/6dct6e/faster_command_line_tools_in_nim/
>>
>> Ali
>
> Wow, the D blog post opened Pandora's box.

I guess programmers will do comparisons of language speed regardless of whether it makes sense for the problem at hand.
May 26, 2017
On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote:
> Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command-line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes a D version of a command-line tool originally written in Python and incrementally improves its performance.
>
> The blog:
> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>
> Reddit:
> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/

I spent some time fiddling with my own manual approaches to making this as fast as possible, wasn't satisfied, and so decided to try using Steven's iopipe (https://github.com/schveiguy/iopipe) instead. The results were excellent.

https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242

On my machine:
- Python: a little over 20s
- PyPy: wobbles around 3.5s
- v1 from the blog: about 3.9s
- v4b: 1.45s
- a version of my own that is hideous*: 0.78s at best
- the above version with iopipe: below 0.67s on most runs

Not bad for a process that most people would call "IO-bound" (code for "I don't want to have to write fast code & it's all the disk's fault").

Obviously this version is a bit more code than is ideal, and iopipe is currently quite "bare-bones", but I don't see why, with some clever abstractions and wrappers, it couldn't be the default approach even for small scripts.

*using byChunk and manually managing line splits across chunks, very nasty.
May 26, 2017
On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
> I spent some time fiddling with my own manual approaches to making this as fast, wasn't satisfied and so decided to try using Steven's iopipe (https://github.com/schveiguy/iopipe) instead. Results were excellent.
>
> https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242

This version also has the advantage of being (discounting any bugs in iopipe) correct for arbitrary Unicode in all common UTF encodings.