May 26, 2017
On 05/25/2017 08:30 AM, xtreak wrote:
> 
> There are repeated references over usage of D at Netflix for machine learning. It will be a very helpful boost if someone comes up with any reference or a post regarding how D is used at Netflix and addition of Netflix to https://dlang.org/orgs-using-d.html will be amazing.
> 

I've used netflix. If its "suggestion" features are any indication, I'm not sure such a thing would be a feather in D's cap ;)
May 28, 2017
On Thursday, 25 May 2017 at 16:19:16 UTC, Jonathan M Davis wrote:
> I wouldn't expect any of the split-related functions to be going away. We often have a function that operates on arrays or strings and another which operates on more general ranges. It may mainly be for historical reasons, but removing the array-based functions would break existing code, and we'd get a whole other set of complaints about folks not understanding that you need to slap array() on the end of a call to splitter to get the split that they were looking for (especially if they're coming from another language and don't understand ranges yet). And ultimately, the array-based functions continue to serve as a way to have simpler code when you don't care about (or you actually need) the additional memory allocations.

I don't know if people coming from other languages would really mind. Of course it would have to be taught once, as everything does, but many languages (and I have Python especially in mind) have been lazifying their standard libraries for years now. I think consistency is what leads to fewer questions, not diversity where one of the possibilities corresponds to what the programmer wants. They'll ask about the difference anyway.

> Also, splitLines/lineSplitter can't actually be written in terms of split/splitter, because split/splitter does not have a way to provide multiple delimiters (let alone multiple delimiters where one includes the other, which is what you get with "\n" and "\r\n"). So, that distinction isn't going away. It's also a common enough operation that having a function for it rather than having to pass all of the delimiters to a more general function is arguably worth it, just like having the overload of split/splitter which takes no delimiter and then splits on whitespace is arguably worth it over having a more general function where you have to feed it every variation of whitespace.
>
> - Jonathan M Davis
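
To make the distinction discussed above concrete, here is a minimal sketch using Phobos. The sample strings are invented for illustration; the functions are `std.array.split`, `std.algorithm.iteration.splitter`, and `std.string.lineSplitter`/`splitLines`.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;
import std.array : array, split;
import std.string : lineSplitter, splitLines;

void main()
{
    string csv = "a,b,c"; // illustrative input, not from the thread

    // split is eager: it allocates and returns a string[].
    string[] eager = csv.split(",");
    assert(eager == ["a", "b", "c"]);

    // splitter is lazy: it yields slices of the input on demand.
    auto lazyParts = csv.splitter(",");
    assert(lazyParts.equal(["a", "b", "c"]));

    // Appending array() to splitter gives the same result as split.
    assert(csv.splitter(",").array == eager);

    // lineSplitter/splitLines accept both "\n" and "\r\n" as terminators.
    string text = "one\r\ntwo\nthree";
    assert(text.lineSplitter.equal(["one", "two", "three"]));
    assert(text.splitLines == ["one", "two", "three"]);

    // A single-delimiter split on "\n" leaves the '\r' attached.
    assert(text.split("\n") == ["one\r", "two", "three"]);
}
```

The last assertion shows why a dedicated line-splitting function is convenient: a single-delimiter split cannot treat "\r\n" and "\n" uniformly.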
May 30, 2017
On 5/26/17 10:41 AM, John Colvin wrote:
> On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote:
>> Some of you may remember Jon Degenhardt's talk from one of the Silicon
>> Valley D meetups, where he described the performance improvements he
>> saw when he rewrote some of eBay's command line tools in D. He has now
>> put the effort into crafting a blog post on the same topic, where he
>> takes a D version of a command-line tool written in Python and
>> incrementally improves its performance.
>>
>> The blog:
>> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>>
>> Reddit:
>> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/
>>
>
> I spent some time fiddling with my own manual approaches to making this
> as fast, wasn't satisfied and so decided to try using Steven's iopipe
> (https://github.com/schveiguy/iopipe) instead. Results were excellent.
>
> https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242

nice! hm....

/** something vaguely like this should be in iopipe, users shouldn't need to write it */
auto ref runWithEncoding(alias process, FileT, Args...)(FileT file, auto ref Args args)

stealing for iopipe, thanks :) I'll need to dedicate another slide to you...

>
> On my machine:
> python takes a little over 20s, pypy wobbles around 3.5s, v1 from the
> blog takes about 3.9s, v4b took 1.45s, a version of my own that is
> hideous* manages 0.78s at best, the above version with iopipe hits below
> 0.67s most runs.
>
> Not bad for a process that most people would call "IO-bound" (code for
> "I don't want to have to write fast code & it's all the disk's fault").
>
> Obviously this version is a bit more code than is ideal, iopipe is
> currently quite "barebones", but I don't see why with some clever
> abstractions and wrappers it couldn't be the default thing that one does
> even for small scripts.

The idea behind iopipe is to give you the building blocks to create exactly the pipeline you need, without a lot of effort. Once you have those blocks, you make higher-level functions out of them. Like you have above :)

BTW, there is a byLineRange function that handles slicing off the newline character inside iopipe.textpipe.

-Steve
May 30, 2017
On 5/26/17 11:20 AM, John Colvin wrote:
> On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
>> I spent some time fiddling with my own manual approaches to making
>> this as fast, wasn't satisfied and so decided to try using Steven's
>> iopipe (https://github.com/schveiguy/iopipe) instead. Results were
>> excellent.
>>
>> https://gist.github.com/John-Colvin/980b11f2b7a7e23faf8dfb44bd9f1242
>
> This version also has the advantage of being (discounting any bugs in
> iopipe) correct for arbitrary unicode in all common UTF encodings.

I worked a lot on making sure this works properly. However, it's possible that there are some lingering issues.

I also did not spend much time optimizing these paths (whereas I spent a ton of time getting the utf8 line parsing as fast as it could be). Partly because finding things other than utf8 in the wild is rare, and partly because I have nothing to compare it with to know what is possible :)

-Steve
May 30, 2017
On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote:
> On 5/26/17 11:20 AM, John Colvin wrote:
>> On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
>>> [...]
>>
>> This version also has the advantage of being (discounting any bugs in
>> iopipe) correct for arbitrary unicode in all common UTF encodings.
>
> I worked a lot on making sure this works properly. However, it's possible that there are some lingering issues.
>
> I also did not spend much time optimizing these paths (whereas I spent a ton of time getting the utf8 line parsing as fast as it could be). Partly because finding things other than utf8 in the wild is rare, and partly because I have nothing to compare it with to know what is possible :)
>
> -Steve

If you want UCS-2 (aka UTF-16 without surrogates) data I can give you gigabytes of files in tmx format.
May 30, 2017
On 5/30/17 5:57 PM, Patrick Schluter wrote:
> On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote:
>> On 5/26/17 11:20 AM, John Colvin wrote:
>>> On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
>>>> [...]
>>>
>>> This version also has the advantage of being (discounting any bugs in
>>> iopipe) correct for arbitrary unicode in all common UTF encodings.
>>
>> I worked a lot on making sure this works properly. However, it's
>> possible that there are some lingering issues.
>>
>> I also did not spend much time optimizing these paths (whereas I spent
>> a ton of time getting the utf8 line parsing as fast as it could be).
>> Partly because finding things other than utf8 in the wild is rare, and
>> partly because I have nothing to compare it with to know what is
>> possible :)
>
> If you want UCS-2 (aka UTF-16 without surrogates) data I can give you
> gigabytes of files in tmx format.

I can generate (and have generated) the data from UTF-8 data. I have tested my byLine parser to make sure it properly splits on "interesting" code points in all widths. UTF-16 data without surrogates should probably work fine. I haven't tuned it the way I tuned the UTF-8 version, though. Is there a memchr for wide characters? ;)

What I really haven't done is compare my line parsing code with multi-code-unit delimiters against one that can do the same thing. I know Phobos and C FILE * really can't do it. I haven't looked at C++ at all, so I should probably look there before giving up.

-Steve
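
On the memchr question: C does have `wmemchr` in `<wchar.h>`, but `wchar_t` is 32 bits on most Unix-like systems, so it is not a UTF-16 scanner there. As a rough illustration only (this is not iopipe's implementation, and the helper names `firstNewline` and `newlineCount` are hypothetical), a code-unit scan over UTF-16 can be sketched in D as follows. It relies on the fact that the code unit 0x000A never occurs inside a surrogate pair, so scanning code units is safe with or without surrogates.

```d
import std.algorithm.searching : count, countUntil;
import std.string : representation;

/// Index of the first '\n' code unit, or -1 if there is none.
/// representation() exposes the raw ushort code units, so no decoding happens.
ptrdiff_t firstNewline(const(wchar)[] text)
{
    return text.representation.countUntil(cast(ushort) '\n');
}

/// Number of '\n' code units in the text.
size_t newlineCount(const(wchar)[] text)
{
    return text.representation.count(cast(ushort) '\n');
}

void main()
{
    // BMP-only text: one code unit per code point.
    wstring bmp = "h\u00E9llo\nw\u00F6rld\n"w;
    assert(firstNewline(bmp) == 5);
    assert(newlineCount(bmp) == 2);

    // U+1F600 needs a surrogate pair, so '\n' sits at code unit index 3.
    wstring astral = "a\U0001F600\nb"w;
    assert(firstNewline(astral) == 3);
    assert(newlineCount(astral) == 1);
}
```

This is a plain linear scan; it says nothing about how fast a tuned version could be.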
May 31, 2017
On Tuesday, 30 May 2017 at 22:31:50 UTC, Steven Schveighoffer wrote:
> On 5/30/17 5:57 PM, Patrick Schluter wrote:
>> On Tuesday, 30 May 2017 at 21:18:42 UTC, Steven Schveighoffer wrote:
>>> On 5/26/17 11:20 AM, John Colvin wrote:
>>>> On Friday, 26 May 2017 at 14:41:39 UTC, John Colvin wrote:
>>>>> [...]
>>>>
>>>> This version also has the advantage of being (discounting any bugs in
>>>> iopipe) correct for arbitrary unicode in all common UTF encodings.
>>>
>>> I worked a lot on making sure this works properly. However, it's
>>> possible that there are some lingering issues.
>>>
>>> I also did not spend much time optimizing these paths (whereas I spent
>>> a ton of time getting the utf8 line parsing as fast as it could be).
>>> Partly because finding things other than utf8 in the wild is rare, and
>>> partly because I have nothing to compare it with to know what is
>>> possible :)
>>
>> If you want UCS-2 (aka UTF-16 without surrogates) data I can give you
>> gigabytes of files in tmx format.
>
> I can generate (and have generated) the data from UTF-8 data. I have tested my byLine parser to make sure it properly splits on "interesting" code points in all widths. UTF-16 data without surrogates should probably work fine. I haven't tuned it the way I tuned the UTF-8 version, though. Is there a memchr for wide characters? ;)
>
> What I really haven't done is compare my line parsing code with multi-code-unit delimiters against one that can do the same thing. I know Phobos and C FILE * really can't do it. I haven't looked at C++ at all, so I should probably look there before giving up.
>
> -Steve

In any case, you can download the dataset from [1] if you like. There are several 100 MB zip files containing a collection of tmx files (translation memory exchange) with European legislation. The files contain multi-alignment texts in up to 24 languages. The files are encoded in UCS-2 little-endian. I know for a fact (because I compiled the data) that they don't contain characters outside of the BMP. The data is public and can be used freely (as in beer).
When I get some time, I will try to port the Java app that is distributed with it to D (partially done already).

[1]: https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory
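
For readers unfamiliar with the encoding, here is a minimal sketch of turning raw UCS-2 little-endian bytes into D `wchar` code units. The function name `decodeUcs2LE` is hypothetical, and the BOM handling is an assumption: whether the files in the dataset start with a byte order mark is not stated above.

```d
import std.exception : enforce;

/// Converts raw UCS-2 little-endian bytes to wchar code units.
/// Throws if the length is odd or a surrogate code unit is found.
wchar[] decodeUcs2LE(const(ubyte)[] bytes)
{
    enforce(bytes.length % 2 == 0, "odd number of bytes");

    // Assumption: skip a leading little-endian BOM (FF FE) if present.
    if (bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        bytes = bytes[2 .. $];

    auto result = new wchar[bytes.length / 2];
    foreach (i, ref unit; result)
    {
        // Little-endian: low byte first, then high byte.
        unit = cast(wchar)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        // UCS-2 text has no surrogates (nothing outside the BMP).
        enforce(unit < 0xD800 || unit > 0xDFFF, "surrogate found: not UCS-2");
    }
    return result;
}

void main()
{
    // BOM followed by "Hi" encoded as UCS-2 LE.
    immutable(ubyte)[] raw = [0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00];
    assert(decodeUcs2LE(raw) == "Hi"w);
}
```

Because the data contains only BMP characters, each two-byte unit maps to exactly one code point, which is what makes this one-pass conversion sufficient.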
May 31, 2017
On 5/31/17 1:09 AM, Patrick Schluter wrote:
> In any case, you can download the dataset from [1] if you like. There
> are several 100 MB zip files containing a collection of tmx files
> (translation memory exchange) with European legislation. The files
> contain multi-alignment texts in up to 24 languages. The files are
> encoded in UCS-2 little-endian. I know for a fact (because I compiled
> the data) that they don't contain characters outside of the BMP. The
> data is public and can be used freely (as in beer).
> When I get some time, I will try to port the Java app that is
> distributed with it to D (partially done already).
>
> [1]:
> https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory

Thanks, I'll bookmark it for later use.

-Steve
August 08, 2017
On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote:
> Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes a D version of a command-line tool written in Python and incrementally improves its performance.
>
> The blog:
> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>
> Reddit:
> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/

Heh, happened to notice that this blog post now has 21 comments, with people posting links to versions in Go, C++, and Kotlin up till this week, months after the post went up! :D
August 08, 2017
On Tuesday, 8 August 2017 at 21:51:30 UTC, Joakim wrote:
> On Wednesday, 24 May 2017 at 13:39:57 UTC, Mike Parker wrote:
>> Some of you may remember Jon Degenhardt's talk from one of the Silicon Valley D meetups, where he described the performance improvements he saw when he rewrote some of eBay's command line tools in D. He has now put the effort into crafting a blog post on the same topic, where he takes a D version of a command-line tool written in Python and incrementally improves its performance.
>>
>> The blog:
>> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
>>
>> Reddit:
>> https://www.reddit.com/r/programming/comments/6d25mg/faster_command_line_tools_in_d/
>
> Heh, happened to notice that this blog post now has 21 comments, with people posting links to versions in Go, C++, and Kotlin up till this week, months after the post went up! :D

There was also a Haskell version on Reddit.