Coding for solid state drives (page 2)

On Saturday, 25 April 2015 at 14:19:31 UTC, Laeeth Isharc wrote:
> On Saturday, 25 April 2015 at 11:34:22 UTC, ketmar wrote:
>> On Fri, 24 Apr 2015 01:27:15 -0700, Walter Bright wrote:
>>
>>> if there are any
>>> modifications we should make to std.stdio to work better with SSDs?
>>> (Such as changing the buffer sizes.)
>>
>> yes: don't do anything. it's OS task to cope with that.
>
> well beyond the area I know, but it seems like given the relative structure of costs for random seeks for SSDs you often want to process files in parallel, whereas the opposite is true for spinning platters. The OS can't help you here.

Well, actually, it should. In theory, all you need to do is to queue as many reads/writes as you can - using threads, fibers, async I/O calls, etc. This is not the same as sequentially reading/writing random blocks. The OS I/O scheduler should reorder the operations so that the accessed blocks are in order and physically close to each other.
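To make "queue as many reads/writes as you can" concrete, here is a minimal sketch using threads, written in Python rather than D purely for brevity. The function names, the 16KB block size and the worker count are illustrative assumptions, not anything proposed in the thread, and `os.pread` is only available on Unix-like systems.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_block(fd, offset, size):
    # os.pread reads at an absolute offset and does not move the shared
    # file position, so several threads can use one descriptor at once.
    return os.pread(fd, size, offset)


def read_blocks_parallel(path, offsets, size=16 * 1024, workers=32):
    """Read `size` bytes at each offset with up to `workers` reads in flight."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() hands every request to the pool and yields the
            # results in the same order as `offsets`.
            return list(pool.map(lambda off: read_block(fd, off, size), offsets))
    finally:
        os.close(fd)
```

The point of the design is only that the requests are outstanding at the same time, which gives the OS scheduler (and the drive) something to reorder. Whether that helps on a given machine is something to measure, not assume.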

On Saturday, 25 April 2015 at 16:10:11 UTC, ketmar wrote:
> On Sat, 25 Apr 2015 14:19:30 +0000, Laeeth Isharc wrote:
>
>> But surely, it would be a start to make it easy for the user to know so
>> she can shape her approach accordingly.
>
> i believe that this must be controlled with `version` or cli arg, and it
> belongs to application logic, not standard library.

I defer to your greater expertise.

But I should have thought that if csv parsing belongs in a standard library (something that is easy for a user to write himself) then detecting whether a path is on an SSD might perhaps too. (Bearing in mind it's more of a system thing not so easy for every user to write himself in a platform independent way).

Laeeth.
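For a sense of what "detecting whether a path is on an SSD" involves, here is a Linux-only sketch in Python. It relies on the sysfs `queue/rotational` attribute; the function name and the `sysfs` parameter are hypothetical, and other platforms would need entirely different code, which is the portability problem raised above.

```python
import os


def is_rotational(path, sysfs="/sys"):
    """Return True/False from Linux sysfs, or None when the answer is unknown."""
    try:
        dev = os.stat(path).st_dev
        node = os.path.realpath(
            os.path.join(sysfs, "dev", "block", f"{os.major(dev)}:{os.minor(dev)}")
        )
        # A partition (e.g. sda1) has no queue/ directory; its parent disk does.
        for candidate in (node, os.path.dirname(node)):
            flag = os.path.join(candidate, "queue", "rotational")
            if os.path.isfile(flag):
                with open(flag) as f:
                    return f.read().strip() == "1"
    except OSError:
        pass
    return None
```

A `None` result is common in practice (network mounts, FUSE, some virtual devices), and a `False` result only says the kernel considers the device non-rotational, not that it performs like an SSD.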

On 4/24/2015 10:26 PM, Vladimir Panteleev wrote:
> On Friday, 24 April 2015 at 19:35:08 UTC, Walter Bright wrote:
>> Things are configurable in std.stdio. But most people will just use the
>> default settings. The default settings should be optimized for SSDs, not
>> spinning drives.
>
> That would be unwise - as HDDs are much slower (and still much more common),
> optimizing for SSDs at the expense of HDD performance will cause overall
> performance to be much worse until HDDs become rare.
>
> I mean, assuming that such optimizations aren't just theoretical.

Hard disks are dead today for anyone who cares about performance.

I still use them, but only for secondary storage.

April 25, 2015

Re: Coding for solid state drives

Posted by Walter Bright
in reply to Vladimir Panteleev

On 4/24/2015 10:24 PM, Vladimir Panteleev wrote:
> On Friday, 24 April 2015 at 08:27:06 UTC, Walter Bright wrote:
>> http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives/
>>
>>
>> An interesting article. Anyone want to see if there are any modifications we
>> should make to std.stdio to work better with SSDs? (Such as changing the
>> buffer sizes.)
>
> This article seems to target operating system authors more than application
> programmers, as OS caches will invalidate most application-side changes.
>
> The HN comments are also mostly dismissive of this article:
> https://news.ycombinator.com/item?id=9431571


"The high-level optimizations are important:
* Choose a good SSD
* Read and Write in "page" multiples and "page" aligned
* Use lots of parallel IOs (high queue depth)
* Do not put unrelated data in the same "page"

A page used to be 4KB, SSDs are now switching to 8KB and will switch to 16KB later on. Just pick a reasonable size around that (16KB if you can do it will last you a while). Don't sweat the page multiples too much, the SSDs will most likely have to handle 4KB pages for a long while due to databases and such so they will keep some optimization around that size anyhow, it will make it easier for them if you use a larger size.

I wouldn't heed any of the advice on single-threading, the biggest performance boost comes from parallelism and writes are anyway buffered by the SSD (a good SSD has a super-cap to have a good sized write cache)."
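The "page multiples, page aligned" advice in the quoted comment comes down to some simple arithmetic, sketched below in Python. The 16KB constant follows the comment's suggestion; the function names are hypothetical, and the OS page cache may well hide the effect for ordinary buffered files.

```python
PAGE = 16 * 1024  # assumed page size, per the 16KB suggestion quoted above


def align_range(offset, length, page=PAGE):
    """Expand [offset, offset + length) to the smallest enclosing page-aligned range."""
    start = (offset // page) * page
    end = -(-(offset + length) // page) * page  # ceiling division
    return start, end - start


def read_aligned(f, offset, length, page=PAGE):
    """Read whole pages from binary file `f`, then return only the requested bytes."""
    start, size = align_range(offset, length, page)
    f.seek(start)
    block = f.read(size)
    return block[offset - start : offset - start + length]
```

For example, a request for 100 bytes at offset 5000 becomes a single read of the first 16KB page, and a request that straddles a page boundary becomes a read of two whole pages.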

On Saturday, 25 April 2015 at 20:12:55 UTC, Walter Bright wrote:
> Hard disks are dead today for anyone who cares about performance.
>
> I still use them, but only for secondary storage.

For anybody who wants to buy 4TB of storage for $100, hard drives are still very much alive. Not to mention USB flash drives and SD cards which don't have the performance characteristics of SSDs.

Let's not be so hasty. Until SSDs truly replace all other forms of storage, it's best that we don't optimize D and Phobos for one type of storage only.

On 4/25/2015 1:42 PM, Xinok wrote:
> On Saturday, 25 April 2015 at 20:12:55 UTC, Walter Bright wrote:
>> Hard disks are dead today for anyone who cares about performance.
>>
>> I still use them, but only for secondary storage.
>
> For anybody who wants to buy 4TB of storage for $100, hard drives are still very
> much alive.

I presume what sensible people wanting speed do is what I do - I have a 256GB SSD for my primary drive, and a 4TB drive as secondary.

> Not to mention USB flash drives and SD cards which don't have the
> performance characteristics of SSDs.

They wouldn't behave like spinning disks do, either.

> Let's not be so hasty. Until SSDs truly replace all other forms of storage, it's
> best that we don't optimize D and Phobos for one type of storage only.

Um, it's currently optimized for HDs. But those aren't what people who want fast IO use.

On Sat, 25 Apr 2015 16:40:51 +0000, Laeeth Isharc wrote:
> On Saturday, 25 April 2015 at 16:10:11 UTC, ketmar wrote:
>> On Sat, 25 Apr 2015 14:19:30 +0000, Laeeth Isharc wrote:
>>
>>> But surely, it would be a start to make it easy for the user to know so she can shape her approach accordingly.
>>
>> i believe that this must be controlled with `version` or cli arg, and it belongs to application logic, not standard library.
>
> I defer to your greater expertise.
>
> But I should have thought that if csv parsing belongs in a standard library (something that is easy for a user to write himself) then detecting whether a path is on an SSD might perhaps too. (Bearing in mind it's more of a system thing not so easy for every user to write himself in a platform independent way).

and now something more serious: trying to detect what storage the program is using is completely unreliable. you can't optimise for all cases, and you can't even detect all cases. big raid which can be faster than SSD with "SSD pattern"? ah, ok, nobody cares, we detected it as HDD. virtual drive, which can be anything at all? fuse mount point? i can think of a lot of those.

that's why operational mode should be controlled by cli switch. if user *really* cares about performance, he *will* know what HW he has and how to make program fully utilize it. and in other cases let OS i/o scheduler do its work without trying to needlessly "help" it.
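The "controlled by cli switch" approach could look roughly like the Python sketch below. The `--storage` flag, the profile names and every number in `PROFILES` are hypothetical placeholders, not measured or recommended values.

```python
import argparse

# Hypothetical presets: the numbers are placeholders, not tuned values.
PROFILES = {
    "default": {"block_size": 64 * 1024, "workers": 1},
    "hdd": {"block_size": 1024 * 1024, "workers": 1},
    "ssd": {"block_size": 16 * 1024, "workers": 32},
}


def parse_io_profile(argv=None):
    """Let the user, not the program, state what kind of storage is in use."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--storage",
        choices=sorted(PROFILES),
        default="default",
        help="I/O tuning profile; 'default' leaves scheduling to the OS",
    )
    args = parser.parse_args(argv)
    return PROFILES[args.storage]
```

Running the program with `--storage ssd` selects the small-block, many-workers profile; with no flag it falls back to the conservative default and lets the OS I/O scheduler do its work.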
