October 17, 2015
On Saturday, 17 October 2015 at 08:20:33 UTC, Daniel N wrote:
> On Saturday, 17 October 2015 at 08:07:57 UTC, Martin Nowak wrote:
>> On Wednesday, 14 October 2015 at 07:35:49 UTC, Marco Leise wrote:
>>>   - Data size limited by available contiguous virtual memory
>>
>> Mmapping files for sequential reading is a very debatable choice, because the common use case is to read a file only once. You should at least compare the numbers with drop_caches between each run.
>>
>
> It's a sensible choice together with appropriate madvise().

Mmap is very expensive, as it affects all cores; you need a realistic multithreaded async benchmark on smaller files to see the effect.
October 17, 2015
Am Sat, 17 Oct 2015 09:27:46 +0200
schrieb Sönke Ludwig <sludwig@rejectedsoftware.com>:

> Am 16.10.2015 um 17:11 schrieb Marco Leise:
> > Am Thu, 15 Oct 2015 18:17:07 +0200
> > schrieb Sönke Ludwig <sludwig@rejectedsoftware.com>:
> >
> >> (...)
> >> Do you have a test case for your error?
> >
> > Well, it is not an error. Rory originally wrote about conversions between "1" and 1 happening on the browser side. Supporting that would mean adding a quirks mode to any well-behaving JSON parser, in this case: "read numbers as strings". Hence I was asking if the data on the client could be fixed, e.g. by turning the JSON number into a string before serialization.
> >
> 
> Okay, I obviously misread that as a once familiar issue. Maybe it indeed makes sense to add a "JavaScript" quirks mode that behaves exactly like a JavaScript interpreter would.

Ok, but remember: https://www.youtube.com/watch?v=20BySC_6HyY And then think again. :D

-- 
Marco

October 17, 2015
Am Sat, 17 Oct 2015 08:29:24 +0000
schrieb Ola Fosheim Grøstad
<ola.fosheim.grostad+dlang@gmail.com>:

> On Saturday, 17 October 2015 at 08:20:33 UTC, Daniel N wrote:
> > On Saturday, 17 October 2015 at 08:07:57 UTC, Martin Nowak wrote:
> >> On Wednesday, 14 October 2015 at 07:35:49 UTC, Marco Leise wrote:
> >>>   - Data size limited by available contiguous virtual memory
> >>
> >> Mmapping files for sequential reading is a very debatable choice, because the common use case is to read a file only once. You should at least compare the numbers with drop_caches between each run.

The results are:
* The memory usage is then fixed at slightly more than the
  file size. (Whereas it often stays below that when the disk
  cache is used.)
* It would still be faster than copying the whole
  thing to a separate memory block.
* Depending on whether the benchmark system uses an HDD or SSD,
  the numbers may be rendered meaningless by a 2-second wait
  on I/O.
* Common case yes, but it is possible that you read JSON that
  had just been saved.

> > It's a sensible choice together with appropriate madvise().

Obviously agreed :). Just that in practice (on my HDD system)
it never made a difference for I/O-bound sequential reads, so I
removed posix_madvise.

> Mmap is very expensive, as it affects all cores; you need a realistic multithreaded async benchmark on smaller files to see the effect.

That's valuable information. It is trivial to read into an allocated block when the file size is below a threshold; I would just need a rough value for it. Are you talking about 4 KiB pages or megabytes? 64 KiB maybe?

-- 
Marco

October 17, 2015
Am 17.10.2015 um 13:16 schrieb Marco Leise:
> Am Sat, 17 Oct 2015 09:27:46 +0200
> schrieb Sönke Ludwig <sludwig@rejectedsoftware.com>:
>> Okay, I obviously misread that as a once familiar issue. Maybe it indeed
>> makes sense to add a "JavaScript" quirks mode that behaves exactly like
>> a JavaScript interpreter would.
>
> Ok, but remember: https://www.youtube.com/watch?v=20BySC_6HyY
> And then think again. :D
>

What about just naming it SerializationMode.WAT?
October 17, 2015
On Saturday, 17 October 2015 at 09:30:47 UTC, Marco Leise wrote:
> It is trivial to read into an allocated block when the file size is below a threshold; I would just need a rough value for it. Are you talking about 4 KiB pages or megabytes? 64 KiB maybe?

Maybe, I guess you could just focus on what you think is the primary usage patterns for your library and benchmark those for different parameters.

If you want to test processing of many small files combined with computationally/memory-intensive tasks, you could try to construct a simple benchmark where you iterate over memory (M × the L3 cache size) using a "realistic" pattern like Brownian motion in N threads, and also repeatedly/concurrently load JSON code for different file sizes, so that the CPU's page-table mechanisms are stressed by mmap, cache misses and (possibly) page faults.

October 17, 2015
Am Sat, 17 Oct 2015 11:12:08 +0000
schrieb Ola Fosheim Grøstad
<ola.fosheim.grostad+dlang@gmail.com>:

> […] you could try to construct a simple benchmark where you iterate over memory (M × the L3 cache size) using a "realistic" pattern like Brownian motion in N threads, and also repeatedly/concurrently load JSON code for different file sizes, so that the CPU's page-table mechanisms are stressed by mmap, cache misses and (possibly) page faults.

O.O   Are you kidding me? Just give me the correct value
already.

-- 
Marco

October 17, 2015
On Friday, 16 October 2015 at 10:08:06 UTC, Andrei Alexandrescu wrote:
> On 10/15/15 10:40 PM, Jacob Carlborg wrote:
>> On 2015-10-15 14:51, Johannes Pfau wrote:
>>
>>> Doesn't the GPL force everybody _using_ fast.json to also use the GPL
>>> license?
>>
>> Yes, it does have that enforcement.
>
> Then we'd need to ask Marco if he's willing to relicense the code with Boost. -- Andrei

I've just crossed my fingers.

Piotrek
October 17, 2015
On Saturday, 17 October 2015 at 13:09:45 UTC, Marco Leise wrote:
> Am Sat, 17 Oct 2015 11:12:08 +0000
> schrieb Ola Fosheim Grøstad
> <ola.fosheim.grostad+dlang@gmail.com>:
>
>> […] you could try to construct a simple benchmark where you iterate over memory (M × the L3 cache size) using a "realistic" pattern like Brownian motion in N threads, and also repeatedly/concurrently load JSON code for different file sizes, so that the CPU's page-table mechanisms are stressed by mmap, cache misses and (possibly) page faults.
>
> O.O   Are you kidding me? Just give me the correct value
> already.

:-P
October 17, 2015
If this is the benchmark I'm remembering, the bulk of the time is spent parsing the floating point numbers. So it isn't a test of JSON parsing in general so much as the speed of scanf.
October 17, 2015
On 10/17/15 6:43 PM, Sean Kelly wrote:
> If this is the benchmark I'm remembering, the bulk of the time is spent
> parsing the floating point numbers. So it isn't a test of JSON parsing
> in general so much as the speed of scanf.

In many cases the use of scanf can be replaced with drastically faster methods, as I discuss in my talks on optimization (including Brasov recently). I hope they'll release the videos soon. -- Andrei