A document that uses polyglot markup is a stream of bytes that parses into identical document trees (with some exceptions, as noted in the Introduction) whether it is processed as HTML or as XML. Polyglot markup that meets a well-defined set of constraints is interpreted as compatible, regardless of whether it is processed as HTML or as XHTML, per the HTML5 specification. Polyglot markup uses a specific DOCTYPE, namespace declarations, and a specific case (normally lower case but occasionally camel case) for element and attribute names. Polyglot markup uses lower case for certain attribute values. Further constraints include those on void elements, named entity references, and the use of scripts and style.

This specification summarizes design guidelines for authors who wish their XHTML or HTML documents to be conforming whether parsed as HTML or as XML. The document is intended to be useful to web authors, in particular those who want to serve receivers without concern for whether they have XML or HTML parsers available. Such concerns may, for instance, arise in content syndication or when receivers are on legacy systems. HTML polyglots facilitate migration to and from XHTML, including transition from XML 1.x to HTML5, and this document serves to accurately specify the requirements of a UTF-8 based profile for such documents.

No recommendation is made in this document or by the W3C regarding whether or not to publish polyglot content. In general, authors are encouraged to publish HTML content using HTML5 syntax and media types (either HTML syntax and text/html, or XHTML syntax and application/xhtml+xml).

This document is not a specification for user agents and creates no obligations on user agents. Note that this document does not define how HTML5-conforming user agents should process HTML documents. Nor does it define the meaning of the Internet Media Type text/html. For user agent guidance and for these definitions, see [[!HTML5]] and [[!RFC2854]].

Please submit bugs for this document by using the W3C's public bug database (http://www.w3.org/Bugs/Public/) with the product set to HTML WG and the component set to HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff). If you cannot access the bug database, submit comments by email to the mailing list noted below.

Introduction

It is sometimes valuable to be able to serve HTML5 documents that are also well-formed XML documents. An author may, for example, use XML tools to generate a document, and they and others may process the document using XML tools. The language used to create documents that can be parsed by both HTML and XML parsers is called polyglot markup. Polyglot markup is the overlap language of documents that are both HTML5 documents and XML documents. It is recommended that these documents be served as either text/html (if the content is transmitted to an HTML-aware user agent) or application/xhtml+xml (if the content is transmitted to an XHTML-aware user agent). Other permissible MIME types are text/xml, application/xml, and any MIME type whose subtype ends with the four characters "+xml". [[!XML-MT]]

Scope

Polyglot markup is a robust (but entirely optional) profile of the HTML vocabulary. Not all web content needs to be authored in polyglot markup; it is primarily an option for authors who want increased robustness in their documents. Polyglot markup works best, and can be a beneficial option, in controlled environments and for authoring tools.

Polyglot markup is ideal for publishing when there's a strong desire to serve both HTML and XML tool chains without simultaneously having to maintain dual copies of the content: one in HTML and a second in XHTML. In addition, a single polyglot markup output requires less infrastructure than producing both HTML and XHTML output for the same content. Polyglot markup is also beneficial when lightweight processes (such as quick testing or even hand-authoring) are applied to content intended to be published both as HTML and XHTML, especially if that content is not sent through a tool chain.

XML-based HTML tools or systems intended for the most general contexts of use cannot depend on polyglot input: for maximum flexibility, such tools should use an HTML parser that produces an XML-compatible DOM or event stream.

Robustness

The goal of polyglot markup is a syntax that is robust in the way the Web Content Accessibility Guidelines (WCAG) 2.0 describes it: "Maximize compatibility with current and future user agents, including assistive technologies." [[WCAG20]]

Authors need not understand the benefits of robustness in order to benefit from the syntax of polyglot markup. However, in order to promote its benefits, it is necessary to understand that polyglot markup does not add semantics, and as such is not any more or less semantic than other flavors of HTML. Polyglot markup does, however, work to preserve semantics, including during the authoring process. Polyglot markup also does not ensure accessibility, as it does not add any accessibility requirements that other relevant specifications have not already added. But polyglot markup can work to preserve accessibility through adherence to required practices.

Polyglot markup approaches robustness by defining constraints on the serialization of a DOM tree in a manner that is likely to retain semantics when that serialization is reparsed using a variety of parsers, be they full-featured and bug-free HTML5 parsers, somewhat HTML-aware parsers, or even XML parsers.

For the most part, polyglot markup is just a pure deduction of the validity constraints and syntax requirements that HTML and XHTML each dictate, many of which took "polyglotness" into consideration when they were added to HTML5. However, for reasons of robustness, this specification sometimes goes further than the principle of the lowest common denominator would have required.

For instance, included in the set of constraints on the serialization is the requirement to use the UTF-8 encoding. While not the only theoretical possibility, the choice of UTF-8 as the sole option is justified by the underlying principle of robustness. For example, if someone opted to use the KOI8-R encoding, then, as a side-effect of HTML-conformance and XML well-formedness requirements, the author would be forced to rely on a higher protocol (such as MIME Content-Type) in order to support XML parsers. By requiring UTF-8, that side-effect is avoided.

Using robust syntax can enable documents to be parsed more reliably in less capable parsers. But even if the document can be expected to be parsed and validated by tools that fully conform to HTML5, polyglot markup adds robustness. As an example, when serialized as HTML, the closing tag for the p element is entirely optional and will be inferred if not present. Inclusion of closing tags, as required by XML and thus by polyglot markup, causes no harm beyond a minor increase in transfer size (an increase often mitigated by compression), and it allows validators to detect situations where the implicit closing rules don't match what the author intended.

Note that XML-based polyglot markup syntax is not the only way to increase robustness. For instance, an HTML validator or an authoring tool could require all tags to be closed even if this is not required by the HTML syntax.

Syntax

Principles

Polyglot markup results in:

Polyglot Markup specifies a Robust Syntax, by which it is meant a syntax that maximizes support and minimizes authoring choice.

Support is maximized:

Authoring choices are minimized:

Polyglot markup is not constrained:

Polyglot markup is scripted according to the rules of XML (does not use document.write, for example) and excludes HTML elements that are impossible to replicate in an XML parser (does not use the noscript element, for example). Polyglot markup triggers non-quirks mode in HTML parsers, as non-quirks mode is closest to XML-mode rendering, in regard to both DOM and CSS. Polyglot markup results in the same encoding and the same language in both HTML-mode and XML-mode.

Polyglot markup, itself being valid HTML5, supports extensibility as it is defined in Section 2.2.3 Extensibility of HTML5, so long as the extension does not violate the rules of polyglot markup. [[!HTML5]] In addition, being well-formed XML, polyglot markup can be extended when it is served as application/xhtml+xml.

Writing HTML documents

Processing instructions and the XML declaration

Processing instructions and the XML declaration are both forbidden in polyglot markup.

Specifying a document’s character encoding

Polyglot markup uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support. HTML requires UTF-8 to be explicitly declared to avoid fallback to a legacy encoding. [[!HTML5]]

For XML, UTF-8 is an encoding default. Documents served with an XML content type therefore do not need to use any of the HTML encoding declaration methods, although if the document might be interpreted as text/html it SHOULD do so.

Polyglot markup declares the UTF-8 character encoding in the following ways, which may be used separately or in combination (but note that there can only be a single HTML encoding declaration):

Both XML and HTML parsers are required to support the byte order mark. The HTML encoding declaration has no effect in XML. When the HTML encoding declaration is the only encoding declaration, the encoding default from XML makes XML parsers treat content as UTF-8.
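As a rough illustration of the two in-document declarations mentioned above (the byte order mark and the HTML encoding declaration), the following Python sketch inspects a byte stream for each. The helper name utf8_declarations and the regular expression are illustrative assumptions rather than anything defined by this specification, and the pattern only recognises the simple meta charset form written as an empty-element tag.

```python
import codecs
import re

# Hypothetical pattern (an assumption for this sketch): the simple
# <meta charset="UTF-8"/> form of the HTML encoding declaration.
META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']utf-8["\']\s*/>', re.IGNORECASE)


def utf8_declarations(data: bytes) -> dict:
    """Report which in-document UTF-8 declarations a byte stream carries."""
    # Raises UnicodeDecodeError if the stream is not valid UTF-8 at all.
    data.decode("utf-8")
    return {
        "bom": data.startswith(codecs.BOM_UTF8),
        "meta_charset": META_CHARSET.search(data) is not None,
    }


doc = codecs.BOM_UTF8 + (
    b'<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">'
    b'<head><meta charset="UTF-8"/><title></title></head><body></body></html>'
)
print(utf8_declarations(doc))
```

A stream carrying neither declaration is still treated as UTF-8 by an XML parser, but an HTML parser may fall back to a legacy encoding, which is why a check of this kind can be useful in a publishing pipeline.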

The W3C Internationalization (i18n) Group recommends that one always include a visible encoding declaration in an HTML document, because it helps developers, testers, or translation production managers to check the encoding of a document visually.

The DOCTYPE

Polyglot markup uses a document type declaration (DOCTYPE) specified by section 8.1.1 of [[!HTML5]]. In addition, the DOCTYPE conforms to the following rules:

For valid XML the document element named in the document type declaration must exactly match the top-level element of the document, including in case. This rule is relaxed for well-formed, rather than valid, XML documents. Because XHTML requires a lower-case html element, Polyglot documents SHOULD use lower-case html for the element named in the DOCTYPE declaration. Bear in mind that a customized XHTML DTD with element and entity declarations inside the document type definition subset within the document, or one that points to an alternate DTD, may have special case requirements.

Note that using about:legacy-compat in XML may yield unpredictable parsing results, depending on the XML processing pipeline.

Polyglot markup does not use document type declarations for HTML4, HTML3, or HTML2, regardless of whether they contain a URI or not and regardless of their effect in HTML5 parsers, as these document type declarations are not compatible with XHTML.

Namespaces

The following rules apply to namespaces used in polyglot markup.

Element-level namespaces

[[!HTML5]] introduces undeclared (native) default namespaces for the root HTML element, html, the root SVG element, svg, and the root MathML element, math. Polyglot markup declares the following default namespaces, when the markup languages are included in the document, to maintain XML compatibility [[!XML10]]:

  • <html xmlns="http://www.w3.org/1999/xhtml">
  • <math xmlns="http://www.w3.org/1998/Math/MathML">
  • <svg xmlns="http://www.w3.org/2000/svg">

Polyglot markup declares the default namespaces on the root HTML element, html, the root SVG element, svg, and the root MathML element, math, and on any HTML elements used as children of SVG or MathML elements. Polyglot markup does not declare any other default or prefixed element namespace, because [[!HTML5]] does not natively support the declaring of any other default or prefixed element namespace.

Attribute-level namespaces

[[!HTML5]] introduces undeclared (native) support for attributes in the XLink namespace and with the prefix xlink:. To maintain XML-compatibility, polyglot markup explicitly declares the XLink namespace (xmlns:xlink="http://www.w3.org/1999/xlink"). [[!XML10]]

For conformance with the HTML specification’s conformance rules, the declaration has to take place in each foreign content section where it is used, typically on such a section’s root element (e.g. on the svg start tag for an SVG section and on the math start tag for a MathML section), since the declaration must occur before using any of the xlink: prefixed attributes:

  • xlink:actuate
  • xlink:arcrole
  • xlink:href
  • xlink:role
  • xlink:show
  • xlink:title
  • xlink:type

The xml: namespace prefix used in xml:base, xml:lang, xml:space, and xml:id does not need to be declared in XML documents, and therefore polyglot markup does not declare these prefixes via xmlns. The prefixes are implicitly declared in XML and are automatically applied to the appropriate attributes in HTML. See CSS namespaces [[!CSS3NAMESPACE]] for how to use CSS selectors with these attributes.
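The XML side of these namespace rules can be observed with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the sample markup and variable names are illustrative assumptions. It shows that an xlink: attribute parses once the prefix is declared, that xml:lang needs no declaration, and that an undeclared xlink: prefix makes the document fail to parse as XML.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Illustrative SVG fragment: xlink is declared on the svg root, xml is not.
declared = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en">'
    '<a xlink:href="#target"></a></svg>'
)
# Same fragment without the xmlns:xlink declaration.
undeclared = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<a xlink:href="#target"></a></svg>'
)

root = ET.fromstring(declared)
link = root[0]
href = link.get("{%s}href" % XLINK)    # namespaced attribute lookup
lang = root.get("{%s}lang" % XML_NS)   # the xml: prefix is built in

try:
    ET.fromstring(undeclared)
    undeclared_parses = True
except ET.ParseError:
    # A namespace-aware XML parser rejects the unbound xlink: prefix.
    undeclared_parses = False

print(href, lang, undeclared_parses)
```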

For more about the issues related to attribute selectors and namespaces, with and without prefixes, see the section on Scripting and styling polyglot markup.

Element syntax

Polyglot markup conforms to the following rules regarding elements.

Required elements and tags

Polyglot markup does not employ optional tags. HTML5’s concept of optional tags (missing start tags and/or end tags) covers elements that the HTML parser itself automatically adds to the DOM if the code doesn’t contain the tags for them. Because XML has no feature that adds missing start and/or end tags to the DOM, omitting a tag in polyglot markup is equivalent to producing a document that is not well-formed or, if both tags are omitted, equivalent to not adding the element at all.

The fact that polyglot markup doesn’t operate with optional tags may create surprises for an author not used to adding the tbody tags in their markup, for example, or for someone accustomed to omitting the end tag of the p element. However, the requirement to be well-formed with regard to tags is a key feature of polyglot markup that makes the code robust against subpar parsers and authoring surprises.

A minimal HTML document

Every polyglot markup document therefore contains an html, head, title, and body element. The html element is the root element. The head and body elements are children of the html element. The title element is a child of the head element. Therefore, the following is the most basic polyglot markup document.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
  <head>
    <title></title>
  </head>
  <body>
  </body>
</html>
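One way to sanity-check the XML half of this structure is to parse the document with an XML parser and look for the four required elements. The following Python sketch does so with the standard xml.etree.ElementTree module; the function name has_required_elements is a hypothetical helper, and the check covers only well-formedness and element presence, not HTML5 validity.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

MINIMAL = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
  <head>
    <title></title>
  </head>
  <body>
  </body>
</html>
"""


def has_required_elements(source: str) -> bool:
    """Return True if html, head, title, and body are present as expected."""
    root = ET.fromstring(source)  # raises ET.ParseError if not well-formed
    if root.tag != XHTML + "html":
        return False
    head = root.find(XHTML + "head")
    body = root.find(XHTML + "body")
    return (
        head is not None
        and body is not None
        and head.find(XHTML + "title") is not None
    )


print(has_required_elements(MINIMAL))
```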
		

Required element examples

Whenever it uses a tr element, polyglot markup always wraps the tr element inside a tbody, thead, or tfoot element. In HTML, if a group of one or more adjacent tr elements is not explicitly wrapped inside a tbody, thead, or tfoot element, the HTML parser creates and wraps a new tbody element around the tr elements. XML parsers do not create the tbody element, thus offering the potential for creating different DOMs.

Correct:

<table>
<tbody>
<tr>...

Incorrect:

<table>
<tr>...

Whenever it uses col elements within a table element, polyglot markup explicitly uses a colgroup element surrounding groups of the col elements. In HTML, if a group of one or more adjacent col elements is not explicitly wrapped inside a colgroup element, the HTML parser creates and wraps a new colgroup element around the col elements. XML parsers do not create the colgroup element, thus offering the potential for creating different DOMs.

Correct:

<table>
<colgroup>
<col/>...

Incorrect:

<table>
<col/>...
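Because an XML parser never inserts the implied tbody or colgroup, a missing wrapper is directly visible in the XML tree. The Python sketch below walks such a tree and reports tr and col elements whose parent is not one of the wrappers named above. The helper name misplaced_children and the sample tables are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Child element name -> parent names that polyglot markup requires.
REQUIRED_PARENTS = {
    "tr": {"tbody", "thead", "tfoot"},
    "col": {"colgroup"},
}


def misplaced_children(source: str) -> list:
    """List (parent, child) pairs where a tr or col lacks its wrapper."""
    root = ET.fromstring(source)
    problems = []
    for parent in root.iter():
        parent_name = parent.tag.replace(XHTML, "")
        for child in parent:
            child_name = child.tag.replace(XHTML, "")
            allowed = REQUIRED_PARENTS.get(child_name)
            if allowed is not None and parent_name not in allowed:
                problems.append((parent_name, child_name))
    return problems


good = (
    '<table xmlns="http://www.w3.org/1999/xhtml"><colgroup><col/></colgroup>'
    '<tbody><tr><td></td></tr></tbody></table>'
)
bad = (
    '<table xmlns="http://www.w3.org/1999/xhtml"><col/>'
    '<tr><td></td></tr></table>'
)
print(misplaced_children(good))
print(misplaced_children(bad))
```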

Excluded elements

Polyglot markup does not use the noscript element, because the noscript element cannot be used in XML documents. [[!HTML5]]

Polyglot markup should not use any elements excluded from HTML, XHTML, or both. For example, including any of the elements listed in Non-conforming features within a document increases the risk of that document not being polyglot markup.

Case-sensitivity

The following apply to any usage of element names, attribute names, or attribute values in markup, script, or CSS. Polyglot markup uses lower case letters for all ASCII letters. For non-ASCII letters (such as Greek, Cyrillic, or non-ASCII Latin letters) polyglot markup respects case sensitivity as it is called for.

Element names

Polyglot markup uses lowercase letters for all HTML element names; all MathML element names; and all SVG element names except the following SVG element names, for which polyglot markup uses mixed case:

  • altGlyph
  • altGlyphDef
  • altGlyphItem
  • animateColor
  • animateMotion
  • animateTransform
  • clipPath
  • feBlend
  • feColorMatrix
  • feComponentTransfer
  • feComposite
  • feConvolveMatrix
  • feDiffuseLighting
  • feDisplacementMap
  • feDistantLight
  • feFlood
  • feFuncA
  • feFuncB
  • feFuncG
  • feFuncR
  • feGaussianBlur
  • feImage
  • feMerge
  • feMergeNode
  • feMorphology
  • feOffset
  • fePointLight
  • feSpecularLighting
  • feSpotLight
  • feTile
  • feTurbulence
  • foreignObject
  • glyphRef
  • linearGradient
  • radialGradient
  • textPath

Attribute names

Polyglot markup uses lowercase letters in attribute names for all HTML elements; all MathML elements except the lowercase definitionurl, which polyglot markup changes to the mixed case definitionURL; and all SVG attributes except the following SVG attributes, for which polyglot markup uses mixed case:

  • attributeName
  • attributeType
  • baseFrequency
  • baseProfile
  • calcMode
  • clipPathUnits
  • contentScriptType
  • contentStyleType
  • diffuseConstant
  • edgeMode
  • externalResourcesRequired
  • filterRes
  • filterUnits
  • glyphRef
  • gradientTransform
  • gradientUnits
  • kernelMatrix
  • kernelUnitLength
  • keyPoints
  • keySplines
  • keyTimes
  • lengthAdjust
  • limitingConeAngle
  • markerHeight
  • markerUnits
  • markerWidth
  • maskContentUnits
  • maskUnits
  • numOctaves
  • pathLength
  • patternContentUnits
  • patternTransform
  • patternUnits
  • pointsAtX
  • pointsAtY
  • pointsAtZ
  • preserveAlpha
  • preserveAspectRatio
  • primitiveUnits
  • refX
  • refY
  • repeatCount
  • repeatDur
  • requiredExtensions
  • requiredFeatures
  • specularConstant
  • specularExponent
  • spreadMethod
  • startOffset
  • stdDeviation
  • stitchTiles
  • surfaceScale
  • systemLanguage
  • tableValues
  • targetX
  • targetY
  • textLength
  • viewBox
  • viewTarget
  • xChannelSelector
  • yChannelSelector
  • zoomAndPan

Attribute values

For characters in attribute values, polyglot markup maintains case consistency between markup, DOM APIs, and CSS when these attributes are used on HTML elements.

Polyglot markup maintains case consistency for values on the following attributes, which take MIME types, language tags, charsets, booleans, media queries, and keywords as values. Though not required, an easy way to maintain case consistency is to use only lower case values for these attributes. Polyglot markup maintains case consistency for these values because, for the purpose of selector matching, attribute values in XML are all treated case-sensitively, whereas HTML treats the values of these attributes as case-insensitive (see 4.14.1 Case-sensitivity, in the HTML5 specification). [[!HTML5]]

  • accept
  • accept-charset
  • charset
  • checked
  • defer
  • dir
  • direction
  • disabled
  • enctype
  • hreflang
  • http-equiv
  • media
  • method
  • multiple
  • readonly
  • rel (for values that do not contain a colon)
  • scope
  • selected
  • shape
  • target (keywords only; browsing context names are case-sensitive)
  • type (on a, link, object, script, or style elements)
  • type (on input)

Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes.

Also note that because XML processors don't recognize lang as containing language information, polyglot markup uses both the lang and the xml:lang attributes (see Language attributes); however, the CSS3 Selectors specification stipulates that language attributes, including xml:lang, are matched in a case-insensitive way. [[!SELECT]]

Element content

For the different kinds of elements that HTML documents contain, polyglot markup conforms to the following content rules.

Void elements

In the HTML syntax, void elements are elements that are always empty and never have an end tag. In polyglot markup, all elements listed as void in the HTML specification or in an extension spec MUST have the syntactic form of an XML empty-element tag (<foo/>). Other elements MUST NOT use the XML empty-element tag syntax. The void elements of the HTML specification are: [[!HTML5]]

area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track, wbr

Example: Polyglot markup uses the empty-element tag syntax for void elements, e.g. <br/>, and does not use <br></br>.

Example: Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph) polyglot markup does not use the empty-element tag syntax. That is, the document uses <p></p> and not <p/>.

Elements in foreign content, such as MathML and SVG elements, may either use the empty-element tag syntax or contain content.
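The rule for empty HTML elements reduces to a lookup against the list of void elements. The Python sketch below shows one way a serializer might apply it; the function name serialize_empty is hypothetical, attributes are ignored for brevity, and the set covers only the void elements listed above (not those of any extension spec or foreign content).

```python
# Void elements as listed above for the HTML specification.
VOID_ELEMENTS = frozenset(
    "area base br col embed hr img input keygen link meta param source track wbr".split()
)


def serialize_empty(name: str) -> str:
    """Serialize an empty HTML element the way polyglot markup requires."""
    if name in VOID_ELEMENTS:
        # Void elements use the XML empty-element tag syntax.
        return "<%s/>" % name
    # Any other empty element keeps explicit start and end tags.
    return "<%s></%s>" % (name, name)


print(serialize_empty("br"))
print(serialize_empty("p"))
```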

Raw text elements (script and style)

In polyglot markup, the contents of all elements listed as raw text elements in the HTML specification or in an extension spec MUST conform to the extra requirements defined in this section.

HTML5 defines the following raw text elements:

script, style

In HTML, the content of the script and style elements is treated as if it were CDATA, so that & and < are not special except when they occur as the end tag to close the element. In XHTML, however, the content of the same elements is parsed as markup: tags, character references, CDATA sections, and so on.

Overview of the differences in how HTML and XML parse raw text elements

Ambiguous string | Info | HTML interpretation | XML interpretation, if inside <![CDATA[section]]> | XML interpretation, if outside <![CDATA[section]]>
< | LESS-THAN SIGN | uninterpreted (but see the </script and </style rows) | uninterpreted | interpreted (commences tags, comments, CDATA)
& | AMPERSAND | uninterpreted | uninterpreted | interpreted (commences character reference or entity)
<!-- | start of comment | partly uninterpreted | uninterpreted | interpreted
--> | end of comment | partly uninterpreted | uninterpreted | interpreted
<![CDATA[ | start of CDATA declaration | uninterpreted | uninterpreted | interpreted (begins CDATA block)
]]> | end of CDATA declaration | uninterpreted | uninterpreted | interpreted (ends CDATA block)
cdata content | the content of CDATA sections | uninterpreted | |
</script | if occurring inside script element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F) | terminates parent | uninterpreted | terminates parent
</style | if occurring inside style element and followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F) | terminates parent | uninterpreted | terminates parent
<foo></bar> | all other tags, well-formed or not | uninterpreted | uninterpreted | interpreted subject to normal parsing rules
&#foo; | character references | uninterpreted | uninterpreted | interpreted subject to normal parsing rules
none of the above strings | any other string | uninterpreted | uninterpreted | uninterpreted

Syntactically, the polyglot subset is found by

  • either limiting the content to safe text content, that is, text that gets interpreted the same way in HTML and in XML,
  • or evening out the differences in constraints by wrapping the contents in a CDATA section. The CDATA code is then seen as text by the HTML parser (and can thus interfere with the scripting or styling language!), while the XML parser sees the content as text without markup semantics.

Limiting the contents to safe text content requires more planning and control over the code, but can be said to be more robust than the CDATA option, as it requires no extra, potentially breakable code to make the scripting or styling language work. The CDATA option, on the other hand, gives more freedom and robustness against various errors that can happen because the author isn’t aware of the safe text content limitations or because the code is inserted by a tool that is unable to guarantee that the content is safe.

Options for delivering safe text content

Polyglot markup can deliver safe text content both externally and internally.

  • External safe text content. Polyglot markup can include scripts or stylesheets by linking to external files rather than including the code in-line. External files are parsed as the respective script or stylesheet and are thus not limited by the same restrictions as safe text content.
    Examples of linking to external scripts or stylesheets
    <!-- Ways to link to external scripts or stylesheets -->
    <script src="external.js" ></script>
    <link href="external.css" rel="stylesheet"/>
    <style>@import "external.css";</style>
  • Inline safe text content. Polyglot markup does not use characters or constructs that are interpreted differently in HTML and XML. This means not using the characters < and & as well as the CDATA end mark string, ]]>. Polyglot markup is agnostic as to whether one uses character entities or numeric character references, so long as they are valid. That is, for polyglot markup, there is no difference between &amp; and &#x26;.
    Examples of content that is not safe text content
    <!-- Unsafe content: < and & are not escaped.
    This code is not XML well-formed. -->
    <style>q::before{content:"<";}</style>
    <script>var a = "&";</script>
    <!-- Unsafe content: < and & are escaped at markup language level.
    This code means different things in HTML vs XML -->
    <style>q::before{content:"&lt;";}</style>
    <script>var a = "&amp;";</script>
    <!-- Safe content: < and & escaped at scripting/stylesheet level -->
    <style>q::before{content:"\00003c";}</style>
    <script>var a = "\u0026";</script>

    For CSS, the inline safe text content option works well most of the time, as < and & are not key parts of CSS and are not often used. In JavaScript, however, & and < are key operators of the language, so one soon runs into trouble; there it is better to use external safe text content.

Inline content containing no ambiguous strings
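The inline safe text content rule amounts to a simple string test, sketched below in Python. The function name is_safe_text is a hypothetical helper; it checks only for the three strings named above and does not attempt to validate the script or stylesheet itself.

```python
# Strings that HTML and XML interpret differently inside script and style.
UNSAFE_STRINGS = ("<", "&", "]]>")


def is_safe_text(content: str) -> bool:
    """Return True if inline script or style content avoids all unsafe strings."""
    return not any(s in content for s in UNSAFE_STRINGS)


print(is_safe_text('var a = "\\u0026";'))  # escaped at the scripting level
print(is_safe_text('var a = "&";'))        # contains a raw ampersand
```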
<!-- The following example of inline script is polyglot markup because there are no ambiguous strings within the script element. -->
<script>document.body.appendChild(document.createElement("div"));</script>

A workaround for using ambiguous strings is to include the properly escaped characters inside the src attribute of style or script tags.

Safe CDATA content

Polyglot markup accepts raw text content wrapped in a CDATA section; however, instead of permitting any content (except the CDATA end mark string itself, ]]>), only the subset that corresponds to the particular raw text element’s HTML constraints is permitted. See the "HTML interpretation" column in the parsing differences table above: all the cells with the text "uninterpreted" are also uninterpreted as CDATA and thus constitute the safe subset of CDATA.

Wrapping raw text in a CDATA section introduces a new problem: when consumed as HTML, the start and end marks of the CDATA section are seen by the script or stylesheet interpreter and can thus cause syntax errors or even halt the script and stylesheet execution. A solution is to comment out the CDATA start and end marks by using the comment methods of the script or stylesheet language. Additionally, for instance when script is used as a coding block container, it may be necessary to comment out even the scripting/styling comments by hiding them inside an XML comment.

Safe rules for CDATA use

These rules assume that CDATA is of limited use for CSS.

General rules:

  • The CDATA section is subject to HTML’s restrictions on <script> and <style>.
  • There can be only one CDATA section per raw text element.
  • A CDATA section must appear at the start of its containing element, and hence be the first child of that element.
    • Before the CDATA section there can only be content that creates one node - preferably only one line of code - which may consist of whitespace, an XML comment, or a construct of the scripting/styling language (usually a comment of the scripting/styling language).
    • After the CDATA section there can only be content that creates one node - preferably only one line of code - which may consist of whitespace, an XML comment, or a construct of the scripting/styling language (usually a comment of the scripting/styling language).

The statement that a "CDATA section must appear at the start of its containing element, and hence be the first child of that element," is due to how parsers may create DOM nodes based on characters and whitespace. The following script element, because it contains no whitespace outside the CDATA node, has one node, whether parsed as HTML or as XML:

<script><![CDATA[foo]]></script>

Because an author may need to comment out the CDATA "start tag" and "end tag," polyglot markup allows for one node before and after the CDATA section. The following example has three nodes: one text node before the CDATA section, one for the CDATA section itself, and one after the CDATA section:

CDATA section that is commented out, resulting in a total of three DOM nodes.
<script>/*<![CDATA[*/
    foo 
    /*]]>*/</script>

The ]]> string:

  • is always commented out if <![CDATA[ is commented out.
  • is never commented out if <![CDATA[ is not commented out.
  • <script> //<![CDATA[ Foo; //]]>  </script>

The <![CDATA[ string can be handled in 3 ways:

  1. <![CDATA[ - without commenting it out.
    <script type="not-CSS-and-not-JS"><![CDATA[foo]]></script>

    Using the <![CDATA[ block without commenting it out is not conforming as type="text/css" or type="text/javascript" content when parsed as HTML.

  2. //<![CDATA[ - using scripting language comments for the entire block.
    <script>//<![CDATA[ FOO; //]]></script>

    Note that the comment starts in the node before the CDATA section.

  3. <!--//--><![CDATA[ - Same as 2, but the scripting comment is hidden inside an XML comment.
    <script><!--//--><![CDATA[ FOO; //]]></script>

    Note that the scripting language must accept <!-- as syntactically legal. JavaScript does, but other scripting languages may not.

    This approach is compatible with CSS; however, rule 2 above prevents validity.
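To see what an XML parser makes of the three variants above, the following Python sketch parses each one with the standard xml.etree.ElementTree module and returns the text content of the script element. The helper name script_text_in_xml is an assumption, and the sketch shows only the XML interpretation; an HTML parser would instead hand the CDATA marks themselves to the script engine, which is why they are commented out.

```python
import xml.etree.ElementTree as ET


def script_text_in_xml(markup: str) -> str:
    """Return the text an XML parser assigns to a script element."""
    return ET.fromstring(markup).text or ""


# 1. CDATA marks not commented out.
plain = script_text_in_xml("<script><![CDATA[foo]]></script>")
# 2. CDATA marks hidden behind scripting-language comments.
commented = script_text_in_xml("<script>//<![CDATA[ FOO; //]]></script>")
# 3. As 2, with the opening scripting comment inside an XML comment.
hidden = script_text_in_xml("<script><!--//--><![CDATA[ FOO; //]]></script>")

for text in (plain, commented, hidden):
    print(repr(text))
```

In each case the CDATA marks disappear from the XML text content, while the scripting comment characters that sit outside any XML comment remain part of the script text.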

Comment syntax in script

Polyglot markup does not place the opening <script> tag inside comments within a script element. When the HTML parser encounters an opening <script> tag inside comments within a script element, it does not close the element on the next </script> end tag unless a closing comment string (-->) occurs first, for compatibility-related reasons. Alternatively, if the parser doesn’t see any comment end first, the element will be closed on the second </script> end tag. If neither a comment end nor a second </script> end tag is found, the rest of the document is commented out. Note that this behavior does not occur with the style element.

Escapable raw text elements

Escapable raw text elements are elements in which character references are permitted but where the HTML parser treats elements as text rather than as markup. For polyglot markup, escapable raw text elements are:

  • title
  • textarea

Polyglot markup uses the same rules of safe text content for escapable raw text elements, except that character entities are permitted for escapable raw text elements.

Foreign elements

The exact rules for foreign content elements are defined by the respective specifications.

Special elements

Unless otherwise specified, elements have no special restrictions other than those that apply to all polyglot markup.

The iframe element has restrictions in polyglot markup, because the HTML specification sets special restraints on iframe in XML documents. [[!HTML5]]

Text

Newlines in textarea and pre elements

When polyglot markup uses either a textarea or pre element, the text within the element should not begin with a newline. This is because HTML and SGML-based systems delete the initial newline on parsing, while XML parsers do not.
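
As a minimal sketch of how an author might look for this problem, the following Python uses the standard library's XML parser, which keeps the initial newline, to flag pre and textarea elements whose text starts with one. The function name starts_with_newline and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def starts_with_newline(markup):
    """Return local names of pre/textarea elements whose text starts with a newline."""
    flagged = []
    for el in ET.fromstring(markup).iter():
        local = el.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if local in ("pre", "textarea") and (el.text or "").startswith("\n"):
            flagged.append(local)
    return flagged

# The XML parser preserves the leading newline inside <pre>.
sample = "<div><pre>\nfoo</pre><textarea>bar</textarea></div>"
print(starts_with_newline(sample))
```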

Attributes

Polyglot markup surrounds all attribute values with quotation marks. Polyglot markup surrounds attribute values with either single quotation marks or with double quotation marks.

Polyglot markup does not use directly typed newline characters within an attribute.

Within an attribute's value, polyglot markup represents tabs, line feeds, and carriage returns as numeric character references rather than by using literal characters. For example, within an attribute's value, polyglot markup uses &#x9; for a tab rather than the verbatim string literal, \t. This is because of attribute normalization in XML [[!XML10]]. Note, too, that JavaScript and CSS in attribute values are affected by attribute value normalization, because a comment ends up commenting out not to the end of the source line but to the end of the entire attribute value.

The following example uses numeric character references for the line feed and tab characters, and the &lt; entity reference for the less-than character, within a srcdoc attribute.

<iframe srcdoc="&lt;p>Hello &#x0A; &#x09; world!&lt;/p>" src="demo_iframe_srcdoc.htm"></iframe>
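
The effect of attribute-value normalization can be observed with Python's standard library. This is a rough sketch: ElementTree stands in for an XML parser, and html.parser is only an HTML tokenizer, not a full HTML5 tree builder. The class and function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class AttrCollector(HTMLParser):
    """Collect attribute values as the HTML tokenizer reports them."""
    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        self.values.update(attrs)

def xml_attr(markup, name):
    """Attribute value as seen after an XML parse."""
    return ET.fromstring(markup).get(name)

def html_attr(markup, name):
    """Attribute value as seen by the HTML tokenizer."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.values.get(name)

literal = '<p title="a\tb\nc"></p>'        # directly typed tab and newline
escaped = '<p title="a&#x9;b&#xA;c"></p>'  # numeric character references

# Literal whitespace: the XML parser normalizes it to spaces, the HTML side does not.
print(repr(xml_attr(literal, "title")), repr(html_attr(literal, "title")))
# Character references: both sides report the same value.
print(repr(xml_attr(escaped, "title")), repr(html_attr(escaped, "title")))
```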

Because of attribute-value normalization in XML [[!XML10]], polyglot markup does not use newline characters within an attribute. Practically speaking, for source code with newlines within attributes, DOMs generated via XML and HTML will be different; however, whitespace differences have no behavioral impact on the page unless:

  • explicitly examined by JavaScript, rendering the differences of small consequence.
  • used in attributes whose content is rendered visually, such as the content of @alt.

Note that directly typed newline characters are overtly not allowed in any attribute containing a URI.

See also Attribute values.

Disallowed attributes

The following attributes are not allowed in HTML or XHTML within polyglot markup. These attributes have effects in documents parsed as XML but do not have effects in documents parsed as text/html. The HTML5 spec therefore defines them as invalid in text/html documents. [[!HTML5]]

  • xml:space
  • xml:base

Note that the xml:space and xml:base attributes are allowed on SVG and MathML elements. The attributes may therefore appear in polyglot markup when they appear within SVG or MathML as foreign content.

Language attributes

When specifying the language mapping of an element, polyglot markup uses both the lang and the xml:lang attributes. Neither attribute is to be used without the other, and polyglot markup maintains identical values for both lang and xml:lang.
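
A minimal checker for this rule might look like the following Python sketch. It parses the document as XML and reports elements where lang and xml:lang are not both present with the same value. The function name lang_mismatches is an assumption, and values are compared as exact strings, which is stricter than the case-insensitive matching of language tags.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def lang_mismatches(markup):
    """Return tags of elements where lang and xml:lang are not both present and equal."""
    problems = []
    for el in ET.fromstring(markup).iter():
        lang, xml_lang = el.get("lang"), el.get(XML_LANG)
        if (lang is not None or xml_lang is not None) and lang != xml_lang:
            problems.append(el.tag)
    return problems

good = '<html lang="en" xml:lang="en"><p lang="fr" xml:lang="fr">Bonjour</p></html>'
bad = '<html lang="en"><p lang="fr" xml:lang="de">Bonjour</p></html>'

print(lang_mismatches(good))
print(lang_mismatches(bad))
```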

The root element SHOULD always specify the language, or else HTML’s fallback language effect may step in and cause the language to vary depending on whether the document is consumed as XML (where the fallback language is not required to work) or consumed via file URI (where fallback language via external HTTP Content-Language would not work). Note that the internal http-equiv="Content-Language" meta element is non-conforming in HTML5. For more, see e.g. HTML5’s language determination rules.

Attributes with special considerations

The following attributes or their considerations require exceptions to the general rules for polyglot markup.

The id attribute

Polyglot markup does not contain any space characters within the value of an id attribute. This is because values for the id attribute may not contain space characters in HTML5. [[!HTML5]]

Polyglot markup ensures that every id attribute value is unique within the document and is a legal XML name, starting with a letter. [[!XML10]]
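
The id rules can be approximated with a short Python sketch. The pattern below is a deliberately simplified assumption (an ASCII letter followed by ASCII letters, digits, ".", "-", or "_"); the XML Name production permits many more characters. The function name id_problems is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified assumption: ASCII-only subset of the XML Name production.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._-]*\Z")

def id_problems(markup):
    """Return (value, reason) pairs for ids that are duplicated or not simple XML names."""
    seen, problems = set(), []
    for el in ET.fromstring(markup).iter():
        value = el.get("id")
        if value is None:
            continue
        if not ID_PATTERN.match(value):
            problems.append((value, "not a simple XML name"))
        if value in seen:
            problems.append((value, "duplicate"))
        seen.add(value)
    return problems

sample = '<div id="main"><p id="main"/><p id="two words"/><p id="9lives"/></div>'
print(id_problems(sample))
```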

Named entity references

Polyglot markup uses only the following named entity references:

  • &amp;
  • &lt;
  • &gt;
  • &apos;
  • &quot;

For entities beyond the previous list, polyglot markup uses character references. For example, polyglot markup uses &#xA0; instead of &nbsp;. Note that polyglot markup may use decimal values for escape characters (such as &#160; in the previous example); however, the Character Model for the World Wide Web recommends that content SHOULD use the hexadecimal form of character escapes rather than the decimal form when both are available. [[!CHARMOD]]
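
As an illustration of the substitution described above, the following Python sketch rewrites HTML named entity references as hexadecimal character references. It assumes the permitted set is the five entities predefined by XML 1.0 (amp, lt, gt, apos, quot) and leaves those untouched. The function name to_hex_references is hypothetical, and the lookup table is the HTML 4 entity list from the standard library, so newer HTML5-only names are left as they are.

```python
import re
from html.entities import name2codepoint

# Assumption: these five XML-predefined entities are the ones allowed by name.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

def to_hex_references(markup):
    """Replace known named entity references with hexadecimal character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep permitted or unknown references unchanged
        return "&#x{:X};".format(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)

print(to_hex_references("polyglot&nbsp;markup &amp; caf&eacute;"))
```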

Polyglot markup always uses character references for the less than sign (<) and ampersand (&) when they are used as characters, however for CDATA inside foreign content, strings within comments, and for safe CDATA, the following rules apply:

Comments

Polyglot markup begins a comment with either "<!" or "<!--". Polyglot markup does not begin a comment with either ">" or "->".

Scripting and styling polyglot markup

When applying JavaScript and CSS to polyglot markup, the goal is to get the same result whether consumed as HTML or as XML. It is therefore important to be aware of scripting and styling features that give different results in HTML vs XML. These issues come in addition to the polyglot usage rules for raw text elements.

JavaScript: innerHTML vs document.write()

Although document.write() and document.writeln() work in HTML, neither function works in XHTML. The polyglot alternative is the innerHTML property, which works for both HTML and XHTML.

The innerHTML property takes a string. However, XML parsers will parse that string as XML in XHTML, while HTML parsers will parse that string as HTML in HTML. Because of this difference in parsing, the code that innerHTML inserts must follow the guidelines for polyglot markup so that the DOM generated by the XML parser does not differ from the DOM generated by the HTML parser.
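
One necessary condition for a string to be safe for innerHTML in XHTML is that it is well-formed XML. The following Python sketch checks only that condition, by wrapping the fragment in a dummy element and parsing it with the standard library. The function name is_well_formed_fragment and the wrapper element are assumptions, and passing this check does not establish that the remaining polyglot rules are met.

```python
import xml.etree.ElementTree as ET

def is_well_formed_fragment(fragment):
    """Return True if the fragment parses as XML when wrapped in a dummy element."""
    try:
        ET.fromstring("<wrapper>" + fragment + "</wrapper>")
        return True
    except ET.ParseError:
        return False

print(is_well_formed_fragment("<p>one<br/>two</p>"))  # self-closed void element
print(is_well_formed_fragment("<p>one<br>two</p>"))   # unclosed br breaks the XML parse
print(is_well_formed_fragment("<p>a&nbsp;b</p>"))     # nbsp is not defined in plain XML
```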

CSS: Attribute selectors that require a namespace prefix

CSS enables authors to select elements by referencing their attributes using attribute selectors: [attr]{property:value}. Generally speaking, attribute selectors can be used freely since polyglot markup relies on default namespaces, which do not affect attributes.

However, some of the attributes required by polyglot markup are namespaced. Some are namespaced by default, such as the xmlns attribute. Some attributes are namespaced by a prefix that is namespaced by default, such as xml:, xmlns:, and xlink:. In addition, extension specs may allow namespaced attributes other than those defined by the HTML specification. As a result, a selector such as [xmlns]{rule:foo} will not work in XHTML, where the attribute has an associated namespace. The same is true for prefixed attributes. Even if one escapes the colon ([xml\:lang]{rule:foo}), such selectors will only work in HTML. The exception is the namespace declaration for the xlink: prefix, which is namespaced in XML and in HTML and must thus be selected in a namespaced way in both syntaxes.

To be able to select namespaced attributes in XML, the attribute selector must include a namespace prefix. [[!SELECT]]

For the unprefixed, namespaced attribute xmlns, a polyglot selector that works in both HTML and XML can be created by using the asterisk (*) for the namespace prefix, indicating that the selector is to match all attribute names without regard to the attribute's namespace:

[*|xmlns]{color:lime}

For prefixed attributes, because the rules of polyglot markup as well as the HTML specification itself dictate that the presence of an xml:lang="foo" attribute must be accompanied by a corresponding lang="foo" attribute, one can, in a conforming polyglot document, use the same approach as for the xmlns attribute.

[*|lang]{color:lime}

However, the requirement of polyglot markup to use both xml:lang="foo" and lang="foo" means that even [lang]{color:lime} would work, in both XML parsers and HTML parsers.

When it comes to the xmlns:xlink attribute, which is required for polyglot svg elements, it is namespaced even in HTML because, in contrast to xml:lang, it belongs to a foreign content element in HTML/XHTML. Hence, the only way, in HTML as well as in XML, to use this attribute as a selector is by declaring the namespace of the xmlns: prefix in CSS:


             @namespace xmlns "http://www.w3.org/2000/xmlns/";
             [xmlns|xlink]{border:dashed lime 3px}

In cases where the user agent does not support namespaces in CSS and/or in markup, it is necessary to use more than one selector. This could happen if the author declares prefixes (default or prefixed) which an extension specification permits, or if the user agent does not support attribute selectors with a CSS namespace prefix.


            /*Selector for legacy user agents without support for namespace prefixed attribute selector:*/
            [xmlns],
            /*Selector for user agents with support for namespace prefixed attribute selector:*/
            [*|xmlns]
            {color:lime}

Example document

The following example code acts as polyglot markup and validates as either XHTML or as HTML. You can view the page live served as HTML, at http://dev.w3.org/html5/html-xhtml-author-guide/SamplePage.html and the same bytes served as XHTML, at http://dev.w3.org/html5/html-xhtml-author-guide/SamplePage.xhtml.

The example document is served as 'text/html'. Some legacy user agents do not support SVG when served as 'text/html', as it is in this example. The example page could also be served as 'application/xhtml+xml' instead, with the file extension .html, maintaining adherence to polyglot markup and enabling the rendering of the SVG.

<!DOCTYPE html>

<html id="SampleDoc" xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

  <head>
    <title>A Sample Page Using Polyglot Markup</title>
    <meta charset='utf-8' />
        <!-- The HTML encoding declaration (meta element with the charset
             attribute) is used to declare the encoding for HTML parsers, in line with the section on
             Specifying a document’s character encoding -->
	<!-- The link element is self-closing as described in the section on Void Elements -->
	<!-- Style commands are included by linking to an external file rather than including them in-line,
	     as described in the section on The safe text content option for script and style elements.  -->
	<link type="text/css" rel="stylesheet" href="Sample.css"/>
  </head>

  <body>
<nav><p><strong>NB:</strong> These bytes are available served as <a href="SamplePage.xhtml">XHTML</a>
             and as <a href="SamplePage.html">HTML</a></p></nav>

    <h1>Sample Page Using Polyglot Markup</h1>
    <p>
      The source code for <a href="#SampleDoc">this document</a> uses <dfn id="sampleDef">polyglot markup</dfn>,
      a document that is a stream of bytes that parses into identical document trees
      (with the exception of the xmlns attribute on the root element) when processed as HTML and when processed as XML.
      The source code for this document also contains additional comments about the use of
      <a href="#sampleDef">polyglot markup</a>.
    </p>

    <h2>Foreign Elements</h2>
    <p>
      The following shapes use SVG elements.
      <a href="#sampleDef">Polyglot markup</a> introduces undeclared (native) default namespaces
      for the root SVG element (<code>svg</code>) and respects the mixed-case element names and values
      when appropriate, as described in the section on Element-Level Namespaces, the section on Element Names
      and the section on Attribute values.
    </p>

    <!-- <a href="#sampleDef">Polyglot markup</a> declares the xlink: namespace on the <svg> element to maintain XML-compatibility  -->
    <svg width="350" height="250" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
      <g>
        <title>Three SVG shapes</title>
        <desc>
          This SVG image contains an ellipse filled with a gradient that goes from white to blue as it moves outward from the center.
          A yellow rectangle with a black border overlaps the ellipse in the upper-left quadrant,
          and a red spiral on a white background overlaps the ellipse in the bottom-right quadrant.
          The red spiral is also a link to the example code for that SVG shape.
        </desc>
        <defs>
          <!-- Note that "radialGradient" and "myGradient" respect mixed-case values. -->
          <radialGradient id="myGradient" cx="50%" cy="50%" r="50%" fx="50%" fy="50%">
            <stop offset="0%" style="stop-color:rgb(200,200,200); stop-opacity:0"/>
            <stop offset="100%" style="stop-color:rgb(0,0,255); stop-opacity:1"/>
          </radialGradient>
        </defs>
      <ellipse cx="50%" cy="50%" rx="50%" ry="42%" style="fill:url(#myGradient)"/>
      <rect x="0" y="0" width="100" height="100" style="fill: yellow; stroke: black;"/>
      <a xlink:href="http://www.example.org/foo">
        <!--
          Note that the following attribute contains newlines which will produce a different DOM,
          but will not affect the way in which SVG functions in the least.
        -->
        <path transform="translate(60, -175)"
                 d="M153 334 C153 334 151 334 151 334 C151 339 153 344 156 344 C164 344 171 339 171 334
                    C171 322 164 314 156 314 C142 314 131 322 131 334 C131 350 142 364 156 364
                    C175 364 191 350 191 334 C191 311 175 294 156 294 C131 294 111 311 111 334
                    C111 361 131 384 156 384 C186 384 211 361 211 334 C211 300 186 274 156 274"
                 style="fill:white;stroke:red;stroke-width:2"/>
        </a>
      </g>
    </svg>
    <h2>Void Elements</h2>
    <!-- Given an empty instance of an element whose content model is not EMPTY (in this case, an empty paragraph)
    <a href="#sampleDef">polyglot markup</a> does not use the minimized form, as described in Section 6.4 Void Elements -->
    <p></p>
    <p>
      There is an empty <code>p</code> element before this paragraph.
      <a href="#sampleDef">Polyglot markup</a> uses <code>&lt;p>&lt;/p></code> and not <code>&lt;p/></code>.
    </p>
    <p>
      <a href="#sampleDef">Polyglot markup</a> treats certain elements as self-closing,
      void elements, such as the following <code>img</code> element.
    </p>
    <img height="48" width="72" alt="W3C" src="http://www.w3.org/Icons/w3c_home"/>
    <p>
      For more information, see the Void Elements section.
    </p>


    <h2>Required Elements</h2>
    <p>
      The following table uses the required <code>tbody</code> element, as described in the
      Required elements and tags section.
    </p>
    <table>
      <tbody>
        <tr>
          <th>Column One</th>
          <th>Column Two</th>
        </tr>
        <tr>
          <td>Row 1, Column 1</td>
          <td>Row 1, Column 2</td>
        </tr>
        <tr>
          <td>Row 2, Column 1</td>
          <td>Row 2, Column 2</td>
        </tr>
        <tr>
          <td>Row 3, Column 1</td>
          <td>Row 3, Column 2</td>
        </tr>
      </tbody>
    </table>

    <p>
      The following table makes use of the <code>col</code> element and therefore uses the
        then required <code>colgroup</code> element as a wrapper for the <code>col</code> elements,
        as described in the Required elements and tags section.
    </p>
    <table>
      <colgroup>
        <col style="background-color:silver"/>
        <col style="background-color:gray"/>
        <col style="background-color:yellow"/>
      </colgroup>
      <tbody>
        <tr>
          <th>ISBN</th>
          <th>Title</th>
          <th>Price</th>
        </tr>
        <tr>
          <td>3476896</td>
          <td>My first HTML</td>
          <td>$53</td>
        </tr>
        <tr>
          <td>1234567</td>
          <td>Intermediate Polyglot</td>
          <td>$49</td>
        </tr>
      </tbody>
    </table>

    <h2>Named Entity References</h2>
    <p>
      The paragraph you now read, uses the string <code>&amp;amp;</code> for ampersands (“&amp;”) and uses,
      as described in the section on Named entity references, the string <code>&amp;#xA0;</code>
      for a non-breaking space between the following two words: <i>“<a href="#sampleDef">polyglot&#xA0;markup</a>”</i>.
    </p>
  </body>
</html>

Acknowledgements

Many thanks to Robin Berjon, David Carlisle, Daniel Glazman, Richard Ishida, Tony Ross, Sam Ruby, Jonas Sicking, Henri Sivonen, Manu Sporny, and Philip Taylor. Special thanks to the W3C TAG and the W3C Internationalization (i18n) Core Working Group.