May 05, 2016
On Thu, May 05, 2016 at 09:28:01AM +0000, Chris via Digitalmars-d wrote: [...]
> There was a spelling reform in Germany in the 1990s. Portuguese spelling has been reformed several times (and there are two major spelling systems, Brazilian and European Portuguese)[1], and in Spanish there has also been some to and fro (Latin America vs. Spain). All these languages have produced a vast body of literature too, and still spelling reforms have been pushed through successfully. So quantity is not an argument. First, most people don't have problems reading texts in older spellings, and second, it only takes one or only half a generation of school children to make the new spelling feel "natural".

You're quite right, of course.  A lot of it has to do with inertia. People just prefer what they're used to, rather than what's objectively better.  And children will just grow up liking whatever they were taught, so the key to success really just lies with education. :-)

In the early days of the introduction of Simplified Chinese writing, there was a lot of resistance from older educated folk, especially Chinese immigrants overseas (such as the significant population in Southeast Asia) who perceived it as a "denigration" of the old writing system. The older, more complex system preserves some of the arguably flagrant shenanigans by ancient Chinese scribes who went overboard with the whole derivation from radicals idea and invented some of the most ridiculously complex characters that nobody uses. This was perceived to be superior because, well, it was more "literary" (whatever that means!), and it shows the clear derivation of the character from ancient constructs -- you could guess at the meaning of an unknown character just by extrapolation from its various intricate components, obviously very useful when you encounter an unknown obscure overly-complex character that nobody actually uses (much less pronounces!).  Plus, it just *looked* more artistic, never mind the fact that the sheer number of strokes made the writing laughably inefficient in today's impatient world.

Well, fast-forward a couple o' decades, and now almost all overseas Chinese populations have adopted the new system, and the present situation is well illustrated by one instance when a child piped up one day in class, saying that the teacher had made a mistake in her writing. Afterwards, the teacher had to explain that it was actually not a mistake, but an older way of writing the same character. The youngster, of course, knows nothing but the new system, and has no reason to regard the old system as anything other than a "mistake". Which, perhaps, it is. :-D


[...]
> But hey, it's just a coding convention. We shouldn't be too attached to spellings, especially if reforms make it easier to spell (i.e. to spell out a word as you hear it in your head) and parse text. It's a code to communicate, not a religion.
[...]

It's a falsehood that you can just spell out a word "as you hear it in your head". No writing system actually does that, even though some come pretty close. Almost all writing systems are compromises, balancing etymology, grammatical marking, ease of use, and closeness to actual pronunciation -- the latter of which is actually an extremely thorny issue due to the existence of myriads of dialects and personal pronunciation peculiarities. If you're merely talking about what's spoken in the Queen's court, then there's no issue, but it's a big problem when applied to the diverse regional English dialects across the globe. The way a Texan spells will be incomprehensible to a Briton, for example.  (But perhaps that would actually be an advantage of sorts, in recognizing that Texan is actually a different language, contrary to popular belief. :-P)  Or, for that matter, American vs. Australian.  It would cause a splintering of dialects.  Even across different persons within the same dialectal community, there are bound to be subtle differences that would make a difference in a pure spell-it-as-you-say-it system.

Chinese writing is actually an ironic illustration of the last point, in fact. Thousands of years ago everybody spoke the same ancestral tongue, but since then, the original ancient Chinese language has splintered into what's commonly called "dialects" today, but in actuality are completely different languages on their own. The distance between, say, Mandarin and Cantonese is far greater than between Spanish and Portuguese, for example, yet for some unfathomable reason we regard the latter as separate languages whereas the former are somehow still mere "dialects".  But in spite of that, the one thing they all have in common is a writing system understood by all -- thanks to the writing *not* being phonetic, which is something usually regarded as a bad thing. Since the writing isn't phonetic, it has survived as a common system of communication in spite of thousands of years of sound change and language drift, which in any other community would have caused complete breakdown in communication. (Of course, it's not a *perfect* common system of communication, because "dialectal" differences are in some cases big enough that one "dialect" would use characters that don't exist in other dialects, or some words can't be represented at all. But still, you can at least understand each other to a workable extent just by having pen and paper handy, which is a lot more than can be said for, say, an Englishman trying to communicate with a Russian, having no common writing system at all, even though thousands of years ago their respective ancestors spoke the same proto-Indo-European tongue.)

So you see, "write as you say it" isn't quite the panacea as it may first appear to be. Neither is "keep the ancestral spelling even though nobody actually talks that way anymore, just so we can communicate with the Russians in writing in spite of having completely mutually unintelligible pronunciation".  All real-life writing systems are compromises between conflicting goals. (Reminds one of programming language design, doesn't it? :-P)


T

-- 
May you live all the days of your life. -- Jonathan Swift
May 05, 2016
On Thursday, 5 May 2016 at 14:52:00 UTC, H. S. Teoh wrote:

>
> [...]
>> But hey, it's just a coding convention. We shouldn't be too attached to spellings, especially if reforms make it easier to spell (i.e. to spell out a word as you hear it in your head) and parse text. It's a code to communicate, not a religion.
> [...]
>
> It's a falsehood that you can just spell out a word "as you hear it in your head". No writing system actually does that, even though some come pretty close. Almost all writing systems are compromises, balancing etymology, grammatical marking, ease of use, and closeness to actual pronunciation -- the latter of which is actually an extremely thorny issue due to the existence of myriads of dialects and personal pronunciation peculiarities. If you're merely talking about what's spoken in the Queen's court, then there's no issue, but it's a big problem when applied to the diverse regional English dialects across the globe. The way a Texan spells will be incomprehensible to a Briton, for example.  (But perhaps that would actually be an advantage of sorts, in recognizing that Texan is actually a different language, contrary to popular belief. :-P)  Or, for that matter, American vs. Australian.  It would cause a splintering of dialects.  Even across different persons within the same dialectal community, there are bound to be subtle differences that would make a difference in a pure spell-it-as-you-say-it system.
>
> Chinese writing is actually an ironic illustration of the last point, in fact. Thousands of years ago everybody spoke the same ancestral tongue, but since then, the original ancient Chinese language has splintered into what's commonly called "dialects" today, but in actuality are completely different languages on their own. The distance between, say, Mandarin and Cantonese is far greater than between Spanish and Portuguese, for example, yet for some unfathomable reason we regard the latter as separate languages whereas the former are somehow still mere "dialects".  But in spite of that, the one thing they all have in common is a writing system understood by all -- thanks to the writing *not* being phonetic, which is something usually regarded as a bad thing. Since the writing isn't phonetic, it has survived as a common system of communication in spite of thousands of years of sound change and language drift, which in any other community would have caused complete breakdown in communication. (Of course, it's not a *perfect* common system of communication, because "dialectal" differences are in some cases big enough that one "dialect" would use characters that don't exist in other dialects, or some words can't be represented at all. But still, you can at least understand each other to a workable extent just by having pen and paper handy, which is a lot more than can be said for, say, an Englishman trying to communicate with a Russian, having no common writing system at all, even though thousands of years ago their respective ancestors spoke the same proto-Indo-European tongue.)
>
> So you see, "write as you say it" isn't quite the panacea as it may first appear to be. Neither is "keep the ancestral spelling even though nobody actually talks that way anymore, just so we can communicate with the Russians in writing in spite of having completely mutually unintelligible pronunciation".  All real-life writing systems are compromises between conflicting goals. (Reminds one of programming language design, doesn't it? :-P)
>
>
> T

I knew I'd regret it, when I wrote "as you hear it in your head". :) The ideal is phonetic spelling (Spanish comes quite close to it). This does not mean that you have a letter for each sound, or that you write allophones or every little local nuance. However, it is important to be consistent, even if the spelling system does not 100% reflect the spoken reality (which is the next best thing to phonetic spelling). If in English you wrote "nite" (instead of night), the grapheme <ite> would be identifiable as the phonemes /ait/, bite, fite, lite, tite, although the -e is silent.

In Irish, due to the differences between local dialects, the spelling is somewhat conservative and doesn't reflect the phonetic reality of each dialect; however, it is quite consistent and everybody can read it using their respective pronunciation.
May 05, 2016
On Thu, May 05, 2016 at 04:03:46PM +0000, Chris via Digitalmars-d wrote: [...]
> I knew I'd regret it, when I wrote "as you hear it in your head". :)

:-)


> The ideal is phonetic spelling (Spanish comes quite close to it). This does not mean that you have a letter for each sound, or that you write allophones or every little local nuance. However, it is important to be consistent, even if the spelling system does not 100% reflect the spoken reality (which is the next best thing to phonetic spelling). If in English you wrote "nite" (instead of night), the grapheme <ite> would be identifiable as the phonemes /ait/, bite, fite, lite, tite, although the -e is silent.

Point taken, though I think the correct term is "phonemic spelling". ;-) Even then, there are still compromises, because not all dialects share the same set of phonemes (though they are close, at least as far as English is concerned), and some dialects may assign certain words different phonemes than another dialect does.

Another issue is that the Latin alphabet, with its dearth of vowel
letters, is really inadequate for representing the extensive English
vowel system.  Modern English has far more vowels than there are letters
to represent them, and in an ideal writing system you'd have a distinct
symbol for each of them. In current writing these vowels are
contextually represented, mostly in their historic forms, hence the
proliferation of silent e's everywhere. These were actually pronounced
as separate vowels way back when, but since then they have been dropped,
leaving behind their trace of modifying the quality of the previous
vowel. Hence in writing, these silent e's have come to represent that
modification of preceding vowel quality, rather than an actual vowel. (A
similar thing happens in old Russian orthography, with those ъ's and ь's
everywhere, coloring the previous consonant, and, by modern times, also
the preceding vowel.) This contextual representation is one of the
reasons why English spelling is so atrocious -- you're basically
replicating about 400-500 years' worth of sound change when you write
<ate> to represent [eːt] (or [ejt], depending on dialect) as opposed to
<at> [æt]. But, as any historical linguist knows, many sound changes tend
to be contextual, so not all final e's are silent, and not all silent
e's have the same effect on the preceding vowel. Hence the inscrutable
list of unending exceptions to English spelling "rules".


> In Irish, due to the differences between local dialects, the spelling is somewhat conservative and doesn't reflect the phonetic reality of each dialect; however, it is quite consistent and everybody can read it using their respective pronunciation.

Present-day English dialects are probably still close enough that a common representation of phonemes is possible, barring some minor exceptions. Of course, good luck convincing people to adopt whatever system you come up with. :-P  I think there has been no shortage of good ideas in spelling reform proposals; the main obstacle is the inertia of the status quo.


T

-- 
What are you when you run out of Monet? Baroque.
May 05, 2016
On Thursday, 5 May 2016 at 16:28:58 UTC, H. S. Teoh wrote:

>
> Point taken, though I think the correct term is "phonemic spelling". ;-)

Yep. "phonemic spelling", you're right.

> Another issue is that the Latin alphabet, with its dearth of vowel
> letters, is really inadequate for representing the extensive English
> vowel system.  Modern English has far more vowels than there are letters
> to represent them, and in an ideal writing system you'd have a distinct
> symbol for each of them.

What about combining existing vowel graphemes? In German you write <au> for the diphthong /au/, and <ai> or <ei> for /ai/, why wouldn't you be able to do the same thing in English?

Mai father was aut and abaut.

There would be nothing wrong with keeping <ou> as long as it represents only /au/ and not also /u:/ (as in "through"), among other sounds.

Consistency is important. Spelling should at least serve as a template:

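import std.stdio : writeln;

struct RP {}
struct HibernoEnglish {}
alias Grapheme = string;
alias Sound = string;

// Look the grapheme up in a per-dialect table (illustrative one-entry tables).
Sound convertGrapheme(T)(Grapheme gr)
{
  static if (is(T == RP))
    return ["ate": "/eit/"].get(gr, "?");
  else static if (is(T == HibernoEnglish))
    return ["ate": "/e:t/"].get(gr, "?");
  else
    return "Bahhh!";
}

void main()
{
  writeln(convertGrapheme!RP("ate"));             // prints /eit/
  writeln(convertGrapheme!HibernoEnglish("ate")); // prints /e:t/
}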


May 05, 2016
As a side note, there are those who say that letter-to-sound systems should never be rule-based; they should be based purely on machine learning. The proponents of this are usually native English speakers. For English you do need machine learning. For Spanish, not so much. If you can feed the computer the rule "ch" = /tʃ/, why would you want to train it? :)
May 05, 2016
On Thu, May 05, 2016 at 05:20:00PM +0000, Chris via Digitalmars-d wrote:
> As a side note, there are those who say that letter-to-sound systems should never be rule-based; they should be based purely on machine learning.  The proponents of this are usually native English speakers. For English you do need machine learning. For Spanish, not so much. If you can feed the computer the rule "ch" = /tʃ/, why would you want to train it? :)

Rule-based letter-to-sound systems don't work too well for English precisely because you have to basically reproduce 500 years' worth of sound change plus all the exceptions introduced by words borrowed from other contemporaneous languages across the centuries. A rule-based system possibly could work, provided the rules were extensive enough (and multi-layered, to account for borrowed exceptions and other oddities). But there comes a point where even the most industrious programmer would throw up his hands and say, forget this exercise in futility, let's just have the machine teach itself instead.
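
To make the layering concrete, here is a minimal D sketch, not a real letter-to-sound engine: a whole-word exception list is consulted first, and an ordered rule table (longer graphemes first) handles everything else. The names `Rule`, `rules` and `toPhonemes`, the handful of Spanish-like rules, and the "pizza" exception entry are all made up for illustration.

```d
import std.stdio : writeln;
import std.utf : stride;

// One spelling-to-sound rule: a letter sequence and the phonemes it maps to.
struct Rule { string grapheme; string phoneme; }

// Hypothetical, deliberately tiny rule set. Longer graphemes come first so
// that "ch" is tried before "c". The "c" rule is oversimplified (it ignores
// the ce/ci case).
immutable Rule[] rules = [
    Rule("ch", "tʃ"),
    Rule("qu", "k"),
    Rule("ll", "ʝ"),
    Rule("c",  "k"),
];

string toPhonemes(string word, string[string] exceptions)
{
    // Layer 1: irregular words are looked up whole.
    if (auto hit = word in exceptions)
        return *hit;

    // Layer 2: scan left to right, applying the first rule that matches.
    string result;
    outer: while (word.length)
    {
        foreach (r; rules)
        {
            immutable n = r.grapheme.length;
            if (word.length >= n && word[0 .. n] == r.grapheme)
            {
                result ~= r.phoneme;
                word = word[n .. $];
                continue outer;
            }
        }
        // No rule matched: copy one (possibly multi-byte) character through.
        immutable len = stride(word, 0);
        result ~= word[0 .. len];
        word = word[len .. $];
    }
    return result;
}

void main()
{
    auto exceptions = ["pizza": "pitsa"]; // hypothetical loanword entry
    writeln(toPhonemes("chico", exceptions)); // tʃiko
    writeln(toPhonemes("queso", exceptions)); // keso
    writeln(toPhonemes("pizza", exceptions)); // pitsa
}
```

For a Spanish-like orthography the rule table stays short; for English the exception list would presumably swallow most of the dictionary, which is the point being made above.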

Rule-based systems work better for Spanish because the orthography is much closer to actual pronunciation, and other parameters such as stress are more predictable.  I'd venture to guess that rule-based systems might not work as well for Russian, in spite of the orthography being almost 1-to-1 with actual pronunciation, because of unpredictable stress positions which can fundamentally alter vowel values. At best, you'd need a database of stress patterns for various words so that the accent would fall in the correct places. Plus a set of exceptions for certain archaic word combinations that have unusual stress.  If you had a database of English stress positions, I think half the battle would already be won.
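
As a rough illustration of how predictable Spanish stress is, the textbook rule fits in a few lines of D: a written acute accent marks the stressed syllable; otherwise words ending in a vowel, "n" or "s" are stressed on the penultimate syllable, and all other words on the last. The `Stress` enum and `spanishStress` function are names invented for this sketch, which assumes lowercase input and ignores monosyllables and other edge cases.

```d
import std.algorithm.searching : canFind;
import std.range.primitives : back;
import std.stdio : writeln;

// Where the stress falls; "marked" means the spelling itself shows it.
enum Stress { marked, penultimate, last }

Stress spanishStress(string word)
{
    assert(word.length, "expects a non-empty word");

    // A written acute accent overrides the default rules.
    foreach (dchar c; word)
        if ("áéíóú"d.canFind(c))
            return Stress.marked;

    // Default: final vowel, n or s -> penultimate syllable; otherwise last.
    return "aeiouns"d.canFind(word.back) ? Stress.penultimate : Stress.last;
}

void main()
{
    writeln(spanishStress("carta"));   // penultimate
    writeln(spanishStress("hablan"));  // penultimate
    writeln(spanishStress("hablar"));  // last
    writeln(spanishStress("canción")); // marked
}
```

Nothing comparable falls out of the spelling in Russian or English, hence the need for a per-word stress database there.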

French would have the same problem as English, except that you could just do as a first approximation:

	if (rand() > someFactor)
		word = word[0 .. $/2];

and then touch it up with a small set of exceptions.  :-P


T

-- 
English is useful because it is a mess. Since English is a mess, it maps well onto the problem space, which is also a mess, which we call reality. Similarly, Perl was designed to be a mess, though in the nicest of all possible ways. -- Larry Wall
May 06, 2016
On Wednesday, 27 April 2016 at 03:59:04 UTC, Seb wrote:
> On Wednesday, 27 April 2016 at 02:57:47 UTC, Walter Bright wrote:
>> To prepare for a week in Berlin, a few German phrases is all you'll need to fit in, get around, and have a great time:
>>
>> 1. Ein Bier bitte!
>> 2. Noch ein Bier bitte!
>> 3. Wo ist der WC!
>
> nitpick: Wo ist _das_ WC?
> In German we have definite articles, and as a WC can be used by both sexes, it is neuter (disclaimer: not a rule).
> However it's more common to say "Wo ist die nächste Toilette?"

Sorry, WC is neuter, but this has nothing to do with its being used by both sexes. If you want a short explanation of where the different (linguistic) genders come from, have a look at
http://www.belleslettres.eu/print/genus-gendersprech-v1.pdf (German) p. 3
In a nutshell: Connecting gender with sex is wrong. Correlation is not causation.

Sorry for being a smartass. I just have to.
May 06, 2016
On Thursday, 5 May 2016 at 23:47:15 UTC, H. S. Teoh wrote:
>
> Rule-based letter-to-sound systems don't work too well for English precisely because you have to basically reproduce 500 years' worth of sound change plus all the exceptions introduced by words borrowed from other contemporaneous languages across the centuries. A rule-based system possibly could work, provided the rules were extensive enough (and multi-layered, to account for borrowed exceptions and other oddities). But there comes a point where even the most industrious programmer would throw up his hands and say, forget this exercise in futility, let's just have the machine teach itself instead.

It's not just sound changes; English is just weird from a non-native speaker's point of view. As Kurt Tucholsky, one of the best German writers ever, once said, English is a simple and a difficult language at the same time. It consists of foreign words that are pronounced wrongly. English pronunciation makes any speaker of a Latin language cringe. In many European languages, and certainly in Latin languages, the letter-to-sound correspondence is more or less one-to-one: <a> is /a/, <e> is /e/ etc. In English it's often /ei/ and /i:/. <i> is often /ai/ (oh, for f**k's sake!): "emeritus", a Latin word, is pronounced /e.'me(:).ri.tus/, in English it's /em@.'rai.d@s/. This just makes you cringe. Native speakers of English often don't realize how weird their pronunciation sounds to those who natively speak the language they borrowed the words from (around 60% of the words). Makes me laugh when I hear English speakers who say "Oh, there is no Irish word for 'afterhours'!?" - Well, what's the English for "restaurant", "evict", "condone", "depot", "deposit" ... and what's the English for "language"?

> Rule-based systems work better for Spanish because the orthography is much closer to actual pronunciation, and other parameters such as stress are more predictable.  I'd venture to guess that rule-based systems might not work as well for Russian, in spite of the orthography being almost 1-to-1 with actual pronunciation, because of unpredictable stress positions which can fundamentally alter vowel values. At best, you'd need a database of stress patterns for various words so that the accent would fall in the correct places. Plus a set of exceptions for certain archaic word combinations that have unusual stress.  If you had a database of English stress positions, I think half the battle would already be won.
>
> French would have the same problem as English, except that you could just do as a first approximation:
>
> 	if (rand() > someFactor)
> 		word = word[0 .. $/2];
>
> and then touch it up with a small set of exceptions.  :-P
>
>
> T

Are Russian stress-rules based on context? Long vs. short vowels, palatalized vs. velarized consonants etc.? If yes, you can program rules.
May 06, 2016
We've had several remarks at DConf that the traffic on this forum makes it intractable. There's good information, but it's drowned by the immense off-topic discussions.

We plan to create one more forum to address that, but one thing we could all do to contribute is to refrain from continuing off-topic comments, or at least mark them with [OT] in the title.


Thanks,

Andrei



May 06, 2016
A beautiful example of how loanwords are twisted around and how natural languages work:

https://en.wiktionary.org/wiki/crayfish