December 17, 2011
On 2011-12-17 13:09:35 +0000, Stewart Gordon <smjg_1998@yahoo.com> said:

> Strange.  I don't recall ever seeing <!DOCTYPE html> before HTML5 came along.
> 
> But I am made to wonder why.  What will happen when HTML6 comes out?  Or have they decided that validators are just going to update themselves to the new standard rather than keeping separate HTML5/HTML6 DTDs (or whatever the HTML5+ equivalent of a DTD is)?

Thing is, if they could have removed the doctype completely they would have done so. The doctype doesn't tell a browser anything meaningful, except that today's browsers use the presence of a doctype to switch between quirks mode and standards mode. <!DOCTYPE html> was the shortest thing that'd make every browser use standards mode.

The problem was that forcing everyone to specify one HTML version or another is just an exercise in pointlessness. Most people get the doctype wrong, either initially or over time when someone updates the site to add some new content. If you're interested in validating your web page, you'll likely know which version you want to validate against and you can tell the validator.

>> Stuff like improperly closed tags or bad entity
>> encoding can break, but that's pretty well independent
>> of doctype validation. That's simply a matter of the
>> document being well-formed.
> 
> No, because in order to determine whether it's well-formed, one must know whether it's meant to be in SGML-based HTML, HTML5 or XHTML.

Perhaps it matters for validation if you don't say which spec to validate against, but validating against a spec doesn't always reflect reality either. There is no SGML-based-HTML-compliant parser used by a browser out there. Browsers have two parsers: one for HTML and one for XML (and sometimes the HTML parser behaves slightly differently in quirks mode, but that's not part of any spec).

And whether a browser uses the HTML or the XML parser has nothing to do with the doctype at the top of the file: it depends on the MIME type given in the Content-Type HTTP header, or on the file extension if it is a local file. HTML 5 doesn't change that.
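
To make that selection rule concrete, here is a rough Python sketch. The function name choose_parser and the two small tables are hypothetical and deliberately incomplete; real browsers apply more elaborate rules. The only point is that the doctype is never consulted.

```python
import os

# Hypothetical, deliberately incomplete tables for illustration only.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}
XML_EXTENSIONS = {".xhtml", ".xht", ".xml"}


def choose_parser(content_type=None, filename=None):
    """Return "xml" or "html" based on the Content-Type header or, for a
    local file, the file extension. The doctype plays no part."""
    if content_type is not None:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        mime = content_type.split(";", 1)[0].strip().lower()
        return "xml" if mime in XML_MIME_TYPES else "html"
    if filename is not None:
        ext = os.path.splitext(filename)[1].lower()
        return "xml" if ext in XML_EXTENSIONS else "html"
    # Nothing to go on: assume the forgiving HTML parser.
    return "html"
```

Under these assumptions, a page with an XHTML doctype served as "text/html; charset=utf-8" still ends up with the HTML parser.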

Almost all web pages declared as XHTML out there are actually parsed using the HTML parser because they are served with the text/html content type and not application/xhtml+xml. A lot of them are not well-formed XML and wouldn't be viewable anyway if parsed according to their doctype.
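
The difference is easy to see with Python's standard library, used here only as a stand-in: xml.etree.ElementTree is a strict XML parser, while html.parser is a lenient tokenizer and not the parsing algorithm any browser implements. The sample markup and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XHTML-looking markup with an unclosed <br> and misnested <b>/<i>.
SOUP = "<html><body><p>One<br>Two <b><i>three</b></i></p></body></html>"


def is_well_formed_xml(text):
    """True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer that records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(text):
    """Feed the text to the lenient tokenizer and return the start tags."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags
```

Here is_well_formed_xml(SOUP) is False, because the XML parser gives up at the first mismatched tag, while html_start_tags(SOUP) walks through the whole fragment without complaint.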



-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

December 18, 2011
On 17/12/2011 21:06, Michel Fortin wrote:
> On 2011-12-17 13:09:35 +0000, Stewart Gordon <smjg_1998@yahoo.com> said:
<snip>
>> No, because in order to determine whether it's well-formed, one must know whether it's
>> meant to be in SGML-based HTML, HTML5 or XHTML.
>
> Perhaps it matters for validation if you don't say which spec to validate against, but
> validating against a spec doesn't always reflect reality either. There is no
> SGML-based-HTML-compliant parser used by a browser out there. Browsers have two parsers:
> one for HTML and one for XML (and sometimes the HTML parser behaves slightly differently in
> quirks mode, but that's not part of any spec).

But there is a subset of HTML that is likely to be parsed correctly by browsers' HTML parsers, and this subset is all the HTML you're likely to need to use most of the time. On the other hand, the interpretation of tag soup is undefined and liable to vary from browser to browser.  So validation certainly helps you out here.

> And whether a browser uses the HTML or the XML parser has nothing to do with the doctype
> at the top of the file: it depends on the MIME types given in the Content-Type HTTP header
> or the file extension if it is a local file. HTML 5 doesn't change that.
>
> Almost all web pages declared as XHTML out there are actually parsed using the HTML parser
> because they are served with the text/html content type and not application/xhtml+xml. A
> lot of them are not well formed XML and wouldn't be viewable anyway if parsed according to
> their doctype.

But does any pre-HTML5 spec stipulate that HTML parsers accept tag soup in the first place?  ISTM this is all down to a tendency of browser/engine authors to implement fallback for malformed HTML but not for malformed XML.

Stewart.

December 18, 2011
On 17/12/2011 18:09, Nick Sabalausky wrote:
> "Stewart Gordon"<smjg_1998@yahoo.com>  wrote in message
> news:jci2bj$225s$1@digitalmars.com...
<snip>
>> <em>  isn't really an old-school example.  It's the proper semantic markup
>> for emphasis.
>>
>
> Ok. It was a dedicated HTML tag instead of a span/div with class attribute.
> Seems like most of those are non-kosher these days.
<snip>

I think half these tags just fell out of fashion when somebody invented the likes of <b> and <i>.  It was probably for a combination of reasons:

- fewer characters to type

- it's just one tag to remember for bold, and one for italics, rather than lots of different ones for emphasis, terms being defined, book titles, addresses, variables in mathematical expressions, biological taxa, foreign words/phrases, etc.

- no discrete set of semantic elements can be sure of covering _all_ possible things bold and italics may be used to denote.

- people wanted, in a time before CSS, to be able to "force" certain rendering, as opposed to the potentially application-dependent rendering of semantic elements.

And so it stuck.  It's perhaps as a concession to these that HTML 4.01 and XHTML 1.0 have kept <b> and <i> even in strict mode.

I recall reading somewhere that in HTML5, they are redefined along the lines of "stuff that is typically printed in bold" and "stuff that is typically printed in italics".  But just looking at the current working draft:

http://www.w3.org/TR/html5/text-level-semantics.html#the-i-element

"The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, or a ship name in Western texts."

"The b element represents a span of text to which attention is being drawn for utilitarian purposes without conveying any extra importance and with no implication of an alternate voice or mood, such as key words in a document abstract, product names in a review, actionable words in interactive text-driven software, or an article lede."

Stewart.