April 30, 2018
On Sat, Apr 28, 2018 at 04:47:54AM +0000, Meta via Digitalmars-d wrote: [...]
> Unfortunately, I think the Chinese have us beat; they can construct redundant sentences far beyond anything we could ever imagine, and thus I predict that within 50 years Chinese will be the new international language of science, commerce, and politics:
> 
> Shíshì shīshì Shī Shì, shì shī, shì shí shí shī. Shì shíshí shì shì shì shī.
> Shí shí, shì shí shī shì shì. Shì shí, shì Shī Shì shì shì. Shì shì shì shí
> shī, shì shǐ shì, shǐ shì shí shī shìshì. Shì shí shì shí shī shī, shì shíshì.
> Shíshì shī, Shì shǐ shì shì shíshì. Shíshì shì, Shì shǐ shì shí shì shí shī.
> Shí shí, shǐ shí shì shí shī shī, shí shí shí shī shī. Shì shì shì shì.
> 
> 石室诗士施氏,嗜狮,誓食十狮。氏时时适市视狮。十时,适十狮适市。 是时,适施氏适市。氏视是十狮,恃矢势,使是十狮逝世。氏拾是十狮尸,适石室。石室湿,氏使侍拭石室。石室拭,氏始试食是十狮尸。食时,始识是十狮,实十石狮尸。试释是事。

As a native Chinese speaker, I find contortions of this kind mildly amusing but mostly ridiculous, because this is absolutely NOT how the language works.  It is carrying an ancient scribal ivory-tower ideal of one syllable per word to ludicrous extremes, an ideal that's mostly unattained, because most so-called monosyllabic "words" in the language are in fact multi-consonantal clusters retroactively analysed as monosyllables. Isolated syllables taken out of their context have no real meaning of their own (except perhaps in writing, which again is an invention of the scribes that doesn't fully reflect the spoken reality [*]). Actually pronouncing the atrocity above might as well be speaking reverse-encrypted Klingon as far as comprehensibility by a native speaker is concerned.

[*] The fact that the written ideal doesn't really line up with the vernacular is evidenced by the commonplace exchange where native speakers have to explain to each other which written word they mean (especially where names are involved), and this is done by -- you guessed it -- quoting the multiconsonantal cluster from which said monosyllabic "word" was derived, thus reconstructing the context required to actually make sense of what would otherwise be an ambiguous, unintelligible monosyllable.  Foreigners, of course, have little idea about this, and gullibly believe the fantasy that the language actually works on monosyllables.  It does not.


T

-- 
Mediocrity has been pushed to extremes.
April 30, 2018
On Monday, 30 April 2018 at 16:20:38 UTC, H. S. Teoh wrote:
> As a native Chinese speaker, I find contortions of this kind mildly amusing but mostly ridiculous, because this is absolutely NOT how the language works.  It is carrying an ancient scribal ivory-tower ideal of one syllable per word to ludicrous extremes, an ideal that's mostly unattained, because most so-called monosyllabic "words" in the language are in fact multi-consonantal clusters retroactively analysed as monosyllables. Isolated syllables taken out of their context have no real meaning of their own (except perhaps in writing, which again is an invention of the scribes that doesn't fully reflect the spoken reality [*]). Actually pronouncing the atrocity above might as well be speaking reverse-encrypted Klingon as far as comprehensibility by a native speaker is concerned.

Oh yes, I'm well aware that there's a lot of semantic contortion required here, and that as spoken, this sounds like complete gibberish. I don't know where the monosyllable meme came from, either; it's readily apparent from learning even basic vocabulary that it's wrong. 今天 ("today"), 马上 ("immediately"), 故事 ("story") -- hell, even 中国 ("China") is a compound word.
April 30, 2018
On Mon, Apr 30, 2018 at 06:10:35PM +0000, Meta via Digitalmars-d wrote: [...]
> Oh yes, I'm well aware that there's a lot of semantic contortion required here, and that as spoken, this sounds like complete gibberish. I don't know where the monosyllable meme came from, either; it's readily apparent from learning even basic vocabulary that it's wrong. 今天 ("today"), 马上 ("immediately"), 故事 ("story") -- hell, even 中国 ("China") is a compound word.

AFAICT, the monosyllable thing is something that came from the ancient scribes who were (at least partly) responsible for the writing system. It's an ideal that gives a nice 1-to-1 mapping between glyphs and "words", providing a philosophically elegant way to rationalize everything into monosyllabic units.  Sort of like representing everything with 1's and 0's. :-D  But, like all ideals, it's also somewhat detached from reality, despite having permeated Chinese thinking to the point that people have come to equate "word" with "syllable", even though many such "words" clearly only exist as compounds that cannot be separated without destroying their meaning.

There's also another, currently not-fully-understood factor, and that is that pronunciation has changed over time, and there is some evidence that certain features of the written language may reflect inserted consonants that formed consonant clusters in the ancient language, sounds that have since become silent. It makes one wonder if some of these "monosyllables" may have been, in some distant past, actually polysyllabic words in their own right, and the monosyllable thing may have been merely a side-effect of the shift in pronunciation over many centuries.  (This pronunciation shift is evidenced by the "alternate readings" of certain glyphs that are adopted when reading ancient poetry, clearly a retroactive effort to compensate for the divergence in rhyme caused by sound change since the time said poetry was written. Who knows what other compensatory measures have been taken over time that may have, consciously or not, contributed to the monosyllabic illusion?)


T

-- 
Debian GNU/Linux: Cray on your desktop.
May 01, 2018
On Friday, 27 April 2018 at 00:03:34 UTC, H. S. Teoh wrote:
> On Thu, Apr 26, 2018 at 07:14:17PM -0400, Nick Sabalausky (Abscissa) via Digitalmars-d wrote:
>> On 04/26/2018 06:47 PM, H. S. Teoh wrote:
>> > 
>> > If "less is more" were universally true, we'd be programming in BF instead of D.  :-O  (Since, after all, it's Turing-complete, which is all anybody really needs. :-P)
>> > 
>> 
>> Yea. Speaking of which, I wish more CS students were taught the inherent limitations of "Turing-complete" vs (for example) "Big-O". There's faaaar too many people being taught "Turing-complete means it can do anything" which, of course, is complete and total bunk in more (important) ways than one.
>
> Actually, Turing-complete *does* mean it can do anything... well, anything that can be done by a machine, that is.  There are inherently unsolvable problems that no amount of Turing-completeness will help you with.
>
> The problem, however, lies in how *practical* a particular form of Turing-completeness is.  You wouldn't want to write a GUI app with lambda calculus, for example, even though in theory you *could*. (It would probably take several lifetimes and an eternity of therapy afterwards, but hey, we're talking theory here. :-P)  Just like you *could* write basically anything in machine language, but it's simply not practical in this day and age.
>
>
> And actually, speaking of Big-O, one thing that bugs me all the time is that the constant factor in front of the Big-O term is rarely considered.  When it comes to performance, the constant factor *does* matter.  You can have O(log n) for your algorithm, but if the constant in front is 1e+12, then my O(n^2) algorithm with a small constant in front will still beat yours by a mile for small to medium sized use cases.  The O(log n) won't mean squat unless the size of the problem you're solving is commensurate with the constant factor in the front. You can sort 5 integers with an on-disk B-tree and rest assured that you have a "superior" algorithm, but my in-memory bubble sort will still beat yours any day.  The size of the problem matters.
>
> Not to mention constant-time setup costs, which are even more frequently disregarded when it comes to algorithm analysis.  You may have an O(n^1.9) algorithm that's supposedly superior to my O(n^2) algorithm, but if it takes 2 days to set up the data structures required to run your O(n^1.9) algorithm, then my O(n^2) algorithm is still superior (until the problem size becomes large enough that it takes more than 2 days to compute).  And if your O(n^1.9) algorithm has a setup time of 10 years, then it might as well be O(2^n) for all I care; it's pretty much useless in practice, theoretical superiority be damned.
>
> And that's not even beginning to consider practical factors like the hardware you're running on, and why the theoretically-superior O(1) hash is in practice inferior to supposedly inferior algorithms that nevertheless run faster because they are cache-friendly, whereas hashing essentially throws caching out the window.  Thankfully, there has recently been a slew of papers on cache-aware and cache-oblivious algorithms that reflect reality more closely than ivory-tower Big-O analyses that disregard it.
>
>
>> I see the same thing in other areas of CS, too, like parser theory. The formal CS material makes it sound as if LR parsing is more or less every bit as powerful as LL (and they often straight-up say so in no uncertain terms), but then they all gloss over the fact that: That's ONLY true for "detecting whether an input does or doesn't match the grammar", which is probably the single most UNIMPORTANT characteristic to consider when ACTUALLY PARSING.  Outside of the worthless "does X input satisfy Y grammar: yes or no" bubble, LL-family is vastly more powerful than LR-family, but you'd never know it going by CS texts (and certainly not from those legendary-yet-overrated Dragon texts).
>
> Well, LR parsing is useful for writing compilers that tell you "congratulations, you have successfully written a program without syntax errors!".  What's that?  Where's the executable?  Sorry, I don't know what that word means.  And what?  Which line did the syntax error occur in?  Who knows!  That's your problem, my job is just to approve or reject the program in its entirety! :-P
>
> (And don't get me started on computability theory courses where the sole purpose is to explore the structure of the hierarchy of unsolvable problems.  I mean, OK, it's kinda useful to know when something is unsolvable (i.e., when not to waste your time trying to do something that's impossible), but seriously, what is even the point of the tons of research that has gone into discerning entire *hierarchies* of unsolvability?!  I recall taking a course where the entire term consisted of proving things about the length of proofs. No, we weren't actually *writing* proofs. We were proving things *about* proofs, and I might add, the majority of the "proofs" we worked with were of infinite length.  I'm sure that it will prove (ha!) to be extremely important in the next iPhone release, which will also cure world hunger and solve world peace, but I'm a bit fuzzy on the details, so don't quote me on that.  :-P)
>
>
> T

The point of big O is to capture the most dominant rate of growth (asymptotic behavior). In mathematics, one only cares about n as it approaches infinity, so any constant term will eventually be dwarfed. So technically, in the theory, it is "incorrect" to include any extraneous terms.

In CS, it is used to approximate the running time for large n so that different algorithms can be compared in the "long run". Since computers cannot process infinite n, n will be finite and generally relatively small (e.g., less than 10^100, which is quite small compared to infinity).

So the failure you are pointing out is really due to the application. In some cases the constant term matters and in other cases it doesn't. Since everything depends on n, and we cannot use the simplification that n is infinite, it matters what n actually is. This is why it is also important to know which algorithms do better for small n: if n is small during program use, one might otherwise be using the wrong algorithm.
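
As a rough illustration, the little D program below compares two invented cost models: a "cheap" O(n^2) one and an O(n log n) one burdened with a large constant. The cost functions and constants are completely made up, purely to show where the crossover can sit.

import std.math : log2;
import std.stdio : writefln;

void main()
{
    // Two invented cost models: the constants are illustrative, not measured.
    static double cheapQuadratic(double n) { return 2.0 * n * n; }
    static double fancyLogLinear(double n) { return 50_000.0 * n * log2(n + 1); }

    foreach (n; [10.0, 100.0, 1_000.0, 10_000.0, 100_000.0, 1_000_000.0])
    {
        auto winner = cheapQuadratic(n) < fancyLogLinear(n) ? "O(n^2)" : "O(n log n)";
        writefln("n = %9.0f   O(n^2) cost = %16.0f   O(n log n) cost = %16.0f   cheaper: %s",
                 n, cheapQuadratic(n), fancyLogLinear(n), winner);
    }
}

With these made-up constants, the quadratic model stays cheaper until somewhere between n = 100,000 and n = 1,000,000; the asymptotically better algorithm only wins once n passes the crossover.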

But once one goes down this road, one has to count not ideal cycles but real cycles, and include all the other factors involved. Big O was not meant to give real-world estimates, because it lives in a mathematical domain. It may or may not work well depending on how it is used (or misused), sort of like statistics. Generally, though, it is such a great simplification tool that it works for many well-behaved processes.

Ideally it would be better to count the exact number of cycles used by an algorithm and normalize them to some standard cycle that could be compared across different architectures. Machines can do the accounting easily. Even then, a lot of anomalous behavior will still be seen, but it would be more accurate, although possibly not much more informative, than big O.



May 01, 2018
On Tue, May 01, 2018 at 05:13:13PM +0000, IntegratedDimensions via Digitalmars-d wrote: [...]
> The point of big O is to capture the most dominant rate of growth (asymptotic behavior). In mathematics, one only cares about n as it approaches infinity, so any constant term will eventually be dwarfed. So technically, in the theory, it is "incorrect" to include any extraneous terms.

Well, yes.  Of course the whole idea behind big O is asymptotic behaviour, i.e., behaviour as n becomes arbitrarily large. Unfortunately, as you point out below, this is not an accurate depiction of the real world:


> In CS, it is used to approximate the running time for large n so that different algorithms can be compared in the "long run". Since computers cannot process infinite n, n will be finite and generally relatively small (e.g., less than 10^100, which is quite small compared to infinity).

Exactly, and therefore the model does not quite match reality, and so when the scale of n matters in reality, then the conclusions you draw from big O will be inaccurate at best, outright wrong at worst.


> So the failure you are pointing out is really due to the application. In some cases the constant term matters and in other cases it doesn't. Since everything depends on n, and we cannot use the simplification that n is infinite, it matters what n actually is. This is why it is also important to know which algorithms do better for small n: if n is small during program use, one might otherwise be using the wrong algorithm.

Exactly.  Yet this important point is often overlooked / neglected to be mentioned when big O is taught to CS students.

I'm not saying big O is useless -- it has its uses, but people need to be aware of its limitations rather than blindly assuming (or worse, being told by an authority) that it's a magical pill that will solve everything.


> But once one goes down this road, one has to count not ideal cycles but real cycles, and include all the other factors involved. Big O was not meant to give real-world estimates, because it lives in a mathematical domain. It may or may not work well depending on how it is used (or misused), sort of like statistics. Generally, though, it is such a great simplification tool that it works for many well-behaved processes.

I agree that big O is a wonderful simplification in many cases.  But this comes with caveats, and my complaint was that said caveats are more often than not overlooked or neglected.

To use a concrete example: traditional big O analysis says a hashtable is fastest, being O(1), especially when the hash function minimizes collisions. Minimal collisions means short linear search chains when multiple entries fall into the same bucket, i.e., we stay close to O(1) instead of the worst case of O(n) (or O(log n), depending on how you implement your buckets). In certain situations, however, it may actually be advantageous to *increase* collisions with a locality-sensitive hash function, because it increases the likelihood that the next few lookups will already be in cache and therefore won't incur the cost of yet another cache miss and RAM/disk roundtrip.  The buckets are bigger, and according to big O analysis "slower", because each lookup incurs the cost of an O(n) or O(log n) search within a bucket.  However, in practice it's faster, because it's expensive to load a bucket into the cache (or incur disk I/O to read a bucket from disk).  If lookups are clustered around similar keys and end up in the same few buckets, then once those buckets are cached, any subsequent lookups become very cheap. Large buckets actually work better because instead of having to incur the cost of loading k small buckets, you just pay once for one large bucket that contains many entries you will access in the near future. (And larger buckets are more likely to contain entries you will need soon.)
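
To make the locality idea concrete, here's a toy D sketch -- the integer keys, the division-based bucket function, and the bucket width of 64 are arbitrary choices for illustration, not a real design. Nearby keys deliberately land in the same bucket, so a run of lookups with similar keys keeps touching the same few arrays (and, one hopes, the same few cache lines):

import std.stdio : writeln;

struct LocalityMap
{
    enum bucketWidth = 64;            // keys 0..63 -> bucket 0, 64..127 -> bucket 1, ...
    private long[][long] buckets;     // bucket index -> unsorted list of keys

    void insert(long key)
    {
        // Appending to a missing key starts the bucket from an empty array.
        buckets[key / bucketWidth] ~= key;
    }

    bool contains(long key) const
    {
        auto b = key / bucketWidth in buckets;
        if (b is null) return false;
        foreach (k; *b)               // linear scan inside the (cache-friendly) bucket
            if (k == key) return true;
        return false;
    }
}

void main()
{
    LocalityMap m;
    foreach (k; 0 .. 1_000) m.insert(k * 3);
    writeln(m.contains(300));   // true
    writeln(m.contains(301));   // false
}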

Also, doing an O(n) linear search within small-ish buckets may actually be faster than fancy O(log n) binary trees, due to the CPU's hardware prefetcher.  A linear scan is easy for the prefetcher to recognize and load as a series of consecutive cache lines, thus amortizing away the RAM roundtrip costs, whereas with a fancy binary tree the subsequent memory accesses are hard to predict (or may have no predictable pattern), so the prefetcher can't help you, and you have to pay for the RAM roundtrips.  When n gets large, of course, the binary tree will overtake the performance of the linear search.  But the way big O is taught in CS courses gives the wrong impression that O(n) linear search is always inferior and therefore bad and to be avoided.  Students need to be told that this is not always the case, and that there are times when O(n) is actually better than O(log n).
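
A quick-and-dirty benchmark along these lines is sketched below. The working-set size, key pattern, and iteration count are arbitrary, and the outcome depends on hardware, compiler, and flags (use -O -release); for a few dozen keys the linear scan is often at least competitive with the tree:

import std.container.rbtree : redBlackTree;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

bool linearContains(const(int)[] haystack, int needle)
{
    foreach (x; haystack)
        if (x == needle) return true;
    return false;
}

void main()
{
    enum n = 32;                       // deliberately small working set
    int[] keys;
    foreach (i; 0 .. n) keys ~= i * 7;
    auto tree = redBlackTree(keys);
    immutable needle = 7 * (n - 1);    // worst case for the linear scan

    int hits;                          // sink so the lookups aren't optimised away
    void scanLookup() { hits += linearContains(keys, needle); }
    void treeLookup() { hits += (needle in tree) ? 1 : 0; }

    auto results = benchmark!(scanLookup, treeLookup)(1_000_000);
    writefln("linear scan over %s ints: %s", n, results[0]);
    writefln("red-black tree lookup   : %s", results[1]);
    writefln("(hits: %s)", hits);
}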


> Ideally it would be better to count the exact number of cycles used by an algorithm and have them normalized to some standard cycle that could be compared across different architectures. Machines can do the accounting easily. Even then, many anomalous behavior will generally be seen but it would be more accurate, although possibly not much more informative, than Big O.
[...]

Sorry, counting cycles does not solve the problem.  That may have worked back in the days of the 8086, but CPU architectures have moved on a long way since then.  These days, cache behaviour is arguably far more important than minimizing cycles, because your cycle-minimized algorithm will do you no good if every other cycle the instruction pipeline has to be stalled because of branch hazards, or because of a cache miss that entails a RAM roundtrip.
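
A simple demonstration: the two nested loops below execute essentially the same instructions and touch exactly the same elements, yet on most machines the strided version is several times slower purely because of cache behaviour. The matrix size is arbitrary.

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

void main()
{
    enum rows = 2048, cols = 2048;
    auto data = new double[](rows * cols);
    data[] = 1.0;

    double sumRowMajor()
    {
        double s = 0;
        foreach (r; 0 .. rows)
            foreach (c; 0 .. cols)
                s += data[r * cols + c];   // sequential walk: prefetcher-friendly
        return s;
    }

    double sumColMajor()
    {
        double s = 0;
        foreach (c; 0 .. cols)
            foreach (r; 0 .. rows)
                s += data[r * cols + c];   // stride of cols doubles per step: cache-hostile
        return s;
    }

    auto sw = StopWatch(AutoStart.yes);
    immutable a = sumRowMajor();
    immutable tRow = sw.peek();
    sw.reset();
    immutable b = sumColMajor();
    immutable tCol = sw.peek();

    writefln("row-major   : %s  (sum = %s)", tRow, a);
    writefln("column-major: %s  (sum = %s)", tCol, b);
}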

Furthermore, due to the large variety of cache structures out there, it's unrealistic to expect a single generalized cycle-counting model to work for all CPU architectures. You'd be drowning in nitty-gritty details instead of getting useful results from your analysis.  CPU instruction pipelines, out-of-order execution, speculative execution, etc., will complicate the analysis so much that a unified model that works across all CPUs would pretty much be impossible.

A more promising approach that has been pursued in recent years is the cache-oblivious model, where the algorithm is designed to take advantage of a cache hierarchy, but *without* depending on any specific one. I.e., it is assumed that linear access of N elements sequentially across blocks of size B will be faster than N random accesses, but the algorithm is designed in such a way that it does not depend on specific values of B and N, and it does not need to be tuned to any specific value of B and N.  This model has shown a lot of promise in algorithms that may in theory have "poor" big O behaviour, but in practice operate measurably faster because they take advantage of the modern CPU cache hierarchy.
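
The textbook example of the style is recursive matrix transposition: keep splitting the larger dimension until the tile is tiny, and the recursion automatically bottoms out in blocks that fit whatever caches the machine happens to have, without any block size B appearing in the code. Here's a rough, untuned sketch; the base-case cutoff and matrix sizes are arbitrary.

import std.stdio : writeln;

void transpose(const double[] src, double[] dst, size_t srcRows, size_t srcCols,
               size_t r0, size_t c0, size_t nr, size_t nc)
{
    enum baseCase = 16;              // small cutoff; the exact value is not important
    if (nr <= baseCase && nc <= baseCase)
    {
        foreach (r; r0 .. r0 + nr)
            foreach (c; c0 .. c0 + nc)
                dst[c * srcRows + r] = src[r * srcCols + c];
    }
    else if (nr >= nc)               // split the larger dimension
    {
        transpose(src, dst, srcRows, srcCols, r0,          c0, nr / 2,      nc);
        transpose(src, dst, srcRows, srcCols, r0 + nr / 2, c0, nr - nr / 2, nc);
    }
    else
    {
        transpose(src, dst, srcRows, srcCols, r0, c0,          nr, nc / 2);
        transpose(src, dst, srcRows, srcCols, r0, c0 + nc / 2, nr, nc - nc / 2);
    }
}

void main()
{
    enum rows = 300, cols = 200;     // deliberately non-square, non-power-of-two
    auto a = new double[](rows * cols);
    auto b = new double[](rows * cols);
    foreach (i, ref x; a) x = i;

    transpose(a, b, rows, cols, 0, 0, rows, cols);

    assert(b[5 * rows + 7] == a[7 * cols + 5]);   // spot check: b[c][r] == a[r][c]
    writeln("transpose ok");
}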

As an added bonus, the cache hierarchy analysis naturally extends to include secondary storage like disk I/O, and the beauty of cache-oblivious algorithms is that they can automatically take advantage of this without needing a massive redesign, unlike the earlier situation where you had to know beforehand whether your code was going to be CPU-bound or disk-bound, and might have to use completely different algorithms in either case, or hand-tune the parameters of your algorithm (such as optimizing for specific disk block sizes). A cache-oblivious algorithm can Just Work(tm) without further ado, and without sudden performance hits.


T

-- 
One reason that few people are aware there are programs running the internet is that they never crash in any significant way: the free software underlying the internet is reliable to the point of invisibility. -- Glyn Moody, from the article "Giving it all away"
May 01, 2018
On Tuesday, 1 May 2018 at 18:46:20 UTC, H. S. Teoh wrote:
>
> Well, yes.  Of course the whole idea behind big O is asymptotic behaviour, i.e., behaviour as n becomes arbitrarily large. Unfortunately, as you point out below, this is not an accurate depiction of the real world:
>
> [snip]

The example I like to use is parallel computing. Sure, throwing 8 cores at a problem might be the most efficient approach with a huge amount of data, but with a small array there's so much overhead that it's way slower than a single-threaded algorithm.
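
Something like this sketch shows the effect (array sizes and iteration counts are picked arbitrarily, and the crossover point is entirely machine-dependent):

import std.datetime.stopwatch : benchmark;
import std.parallelism : taskPool;
import std.stdio : writefln;

double serialSum(double[] a)
{
    double s = 0;
    foreach (x; a) s += x;
    return s;
}

double parallelSum(double[] a)
{
    // taskPool.reduce splits the range across the worker threads.
    return taskPool.reduce!"a + b"(0.0, a);
}

void main()
{
    auto small = new double[](100);
    auto large = new double[](10_000_000);
    small[] = 1.0;
    large[] = 1.0;

    double sink = 0;                       // keep the results "used"
    void smallSerial()   { sink += serialSum(small); }
    void smallParallel() { sink += parallelSum(small); }
    void largeSerial()   { sink += serialSum(large); }
    void largeParallel() { sink += parallelSum(large); }

    auto rSmall = benchmark!(smallSerial, smallParallel)(1_000);
    auto rLarge = benchmark!(largeSerial, largeParallel)(10);

    writefln("100 elements : serial %s, parallel %s", rSmall[0], rSmall[1]);
    writefln("10M elements : serial %s, parallel %s", rLarge[0], rLarge[1]);
    writefln("(sink: %s)", sink);
}

Typically the parallel version loses badly on the tiny array; whether it wins clearly on the large one depends on how memory-bandwidth-bound the machine is.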
May 02, 2018
On Thursday, 26 April 2018 at 23:26:30 UTC, Walter Bright wrote:
> I posit that redundancy is something programmers learn to appreciate as they gain experience, and that eliminating redundancy is something new programmers think is a new idea :-)
>

Not just 'new programmers', but even old programmers! (i.e. think 'Go'..)

How did they get 'Go'... so wrong?

Such 'smart' 'experienced' people .. came up with such an awful language... I don't get it.

'For' is Go's 'while' - wtf!@!!!#!

There's no substitution for taste... some have it, some don't.

'Experience' is irrelevant.

May 01, 2018
On 05/01/2018 10:51 PM, TheDalaiLama wrote:
> 
> There's no substitution for taste... some have it, some don't.
> 
> 'Experience' is irrelevant.
> 

Honestly, there's a lot of truth to this. People can certainly learn, of course (well, at least some people can), but experience definitely does not imply learning has actually occurred.

Contrary to popular belief (especially common HR belief), experience is NOT a consistent, commoditized thing. There's such a thing as quality of experience, and that has far more impact than quantity.
May 02, 2018
On Wed, 2018-05-02 at 02:51 +0000, TheDalaiLama via Digitalmars-d wrote:
> 
[…]
> How did they get 'Go'... so wrong?

They didn't. A lot of people out there are using Go very effectively and thoroughly enjoying it. True, it is a language by Google for Google, but it has massive traction outside Google. Go got a lot right.

> Such 'smart' 'experienced' people .. came up with such an awful language... I don't get it.

Just because you think it is awful doesn't make it awful. Indeed, Go got a lot right exactly because smart, experienced people were involved in its development. People who have spent 40+ years designing languages generally do language design quite well.

> 'For' is Go's 'while' - wtf!@!!!#!

It works very well for those people who use it as it is meant to be used.

> There's no substitution for taste... some have it, some don't.

Taste is a personal thing, not an objective measure. That you dislike Go so much is fine; that is your opinion. It does not make it objective fact. Others really like Go; that is their opinion, and it is equally valid. It also is not objective fact.

What can be shown consistently, though, is that the process (preferably lightweight) and channel model avoids much, if not all, of the unpleasantness of shared-memory multithreaded programming. Go has this built in, and it makes it a very appealing language. Rust has channels but only heavyweight processes, at least at present in release form. D has heavyweight processes with message passing, but I couldn't seem to make it work for Me TV; I just got no message passing and lots of segfaults.
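
For reference, the basic shape of D's std.concurrency message passing is roughly the sketch below, a single spawned thread doubling integers. This is only the minimal API, and says nothing about how it holds up in a real application like Me TV:

import std.concurrency : ownerTid, receiveOnly, send, spawn;
import std.stdio : writeln;

// The worker doubles every int it receives and sends the result back to the
// thread that spawned it; a negative value tells it to stop.
void worker()
{
    for (;;)
    {
        immutable msg = receiveOnly!int();
        if (msg < 0) break;
        ownerTid.send(msg * 2);
    }
}

void main()
{
    auto tid = spawn(&worker);

    foreach (i; 1 .. 4)
    {
        tid.send(i);
        writeln("doubled: ", receiveOnly!int());
    }
    tid.send(-1);   // ask the worker to exit
}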

> 'Experience' is irrelevant.

Now that is just a rubbish statement. Experience can be crucial; cf. Roman architecture. Even people who claim to have no experience are in fact usually drawing on knowledge gained through the experience of others.

-- 
Russel.
==========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk


May 02, 2018
On Tue, 2018-05-01 at 23:54 -0400, Nick Sabalausky (Abscissa) via Digitalmars-d wrote:
> On 05/01/2018 10:51 PM, TheDalaiLama wrote:
> > 
> > There's no substitution for taste... some have it, some don't.
> > 
> > 'Experience' is irrelevant.
> > 
> 
> Honestly, there's a lot of truth to this. People can certainly learn, of course (well, at least some people can), but experience definitely does not imply learning has actually occurred.
> 
> Contrary to popular belief (especially common HR belief), experience is NOT a consistent, commoditized thing. There's such a thing as quality of experience, and that has far more impact than quantity.

Agreed that hours claimed as experience are not a measure of knowledge. But conversely, experience coupled with learning is critical to future development. Thus statements such as "experience is irrelevant" are dangerous in most contexts.

-- 
Russel.
==========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk