May 02, 2022 D for BigData: the first BetterC library by Tamediadigital | ||||
---|---|---|---|---|
| ||||
https://forum.dlang.org/post/hngfeheyklalzoxkyuwq@forum.dlang.org

On Saturday, 25 February 2017 at 14:32:00 UTC, Ilya Yaroshenko wrote:
> HyperLogLog++ is an advanced cardinality estimation algorithm with normal and compressed sparse representations. It can be used to estimate the approximate number of unique elements in an unordered set.
>
> hll-d [1, 2] is written in D. It can be used as a betterC library without linking with DRuntime. hll-d has a C header and a C example.
>
> Its implementation is based on Mir Algorithm [3]:
> 1. mir.ndslice.topology.bitpack is used for arrays composed of packed 6-bit integers.
> 2. mir.ndslice.sorting.sort is used for betterC sorting.
>
> [1] Git: https://github.com/tamediadigital/hll-d
> [2] Dub: http://code.dlang.org/packages/hll-d
> [3] Mir Algorithm: https://github.com/libmir/mir-algorithm
>
> Best regards,
> Ilya

Thanks for the great work.

I checked the C API but could not figure out how to get the count for a single element.

For example, if I use it as an IP counter, is there a way to know how many times one IP has been added to the set?
|
May 02, 2022 Re: D for BigData: the first BetterC library by Tamediadigital | ||||
---|---|---|---|---|
| ||||
Posted in reply to test123 | On Monday, 2 May 2022 at 05:22:07 UTC, test123 wrote:
> Thanks for the great work.
>
> I checked the C API but could not figure out how to get the count for a single element.
>
> For example, if I use it as an IP counter, is there a way to know how many times one IP has been added to the set?
No, that's not what this is for. HyperLogLog is useful if you have a big dataset that may contain duplicates and you want to know how many unique items it holds (within a reasonable margin of error). For example, a website can use it to estimate how many distinct visitors it has without storing every single IP address to check for duplicates on each new connection. The tradeoff is that it's probabilistic: you don't need to store every address, so you need much less space and time to get a count of unique IPs, but you have to accept a margin of error on that result, and you can't recover what the IPs were in the first place, just how many of them there are.
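
To make the tradeoff concrete, here is a minimal sketch of a plain HyperLogLog in Python. It is not the hll-d implementation (that one is HyperLogLog++ in D, with a sparse representation and packed 6-bit registers) and it does not use the hll-d C API. The class name `TinyHLL`, the helper `fake_ip`, the choice of SHA-1 as the hash, and `p=12` are all assumptions made for illustration.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog (plain variant, NOT HyperLogLog++ and NOT hll-d)."""

    def __init__(self, p=12):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers (4096 for p=12)
        self.registers = [0] * self.m   # one small integer per register

    def add(self, item: str) -> None:
        # 64-bit hash taken from SHA-1; the hash choice is an assumption of this sketch.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Only the maximum rank is kept, so re-adding an item changes nothing.
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)         # small-range (linear counting) correction
        return raw


def fake_ip(i: int) -> str:
    """Hypothetical helper: turn an integer into a distinct dotted address."""
    return f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"


hll = TinyHLL()
for i in range(100_000):        # 100,000 distinct addresses
    hll.add(fake_ip(i))
before = hll.estimate()

for i in range(50_000):         # the same addresses again: pure duplicates
    hll.add(fake_ip(i))
after = hll.estimate()

print(f"estimated distinct addresses: {after:.0f}")
print(f"registers kept: {hll.m}")
```

The sketch keeps only 4096 small registers no matter how many addresses are added, and the duplicates leave the estimate unchanged. Nothing in those registers records which address was seen or how often, which is why a per-IP count cannot be read back out.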
|
May 02, 2022 Re: D for BigData: the first BetterC library by Tamediadigital | ||||
---|---|---|---|---|
| ||||
Posted in reply to Cym13 | On Monday, 2 May 2022 at 06:17:17 UTC, Cym13 wrote:
> No, that's not what this is for. HyperLogLog is useful if you have a big dataset that may contain duplicates and you want to know how many unique items it holds (within a reasonable margin of error). For example, a website can use it to estimate how many distinct visitors it has without storing every single IP address to check for duplicates on each new connection. The tradeoff is that it's probabilistic: you don't need to store every address, so you need much less space and time to get a count of unique IPs, but you have to accept a margin of error on that result, and you can't recover what the IPs were in the first place, just how many of them there are.
Thanks for the quick answer.
So you mean that with HyperLogLog I cannot get the count for each IP, only the number of distinct IPs that have been added to the set?
|