Popular validators such as the WDG and the W3C validator unfortunately still accept various kinds of broken URIs, including but not limited to unencoded IRIs, as "valid". For the W3C validator that's a known bug.
Admittedly it's almost impossible to fix this bug based on a DTD. For DTD validators, renaming %URI; (the counterpart of anyURI in the related XML schema) to %IRI; in the DTD has the same effect as renaming it to %FOO;: the datatype is still CDATA, or in other words (almost) anything goes.
Hopefully even DTD validators will be fixed really soon to check URIs. Broken URLs are abused for attacks. Ironically, that was a side effect of better URI tests: several applications failed to check the generic RFC 3986 syntax. All valid URIs match this generic syntax; the scheme-specific syntaxes are proper subsets of it. URI "producers" including MediaWiki as well as URI "consumers" including validators have to get this right, otherwise bad things happen.
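To give an idea of what even a cheap generic check catches, here is a minimal Python sketch. It is not a full RFC 3986 parser and not what any of the validators mentioned here actually do; the function name `looks_like_uri` is made up for this example. It only tests two necessary conditions for an absolute URI: the string starts with a scheme, and every character is either in the RFC 3986 repertoire (unreserved, gen-delims, sub-delims) or part of a well-formed percent-encoded triplet. Passing this test does not prove that a URI is valid, and relative references are rejected on purpose.

```python
import re

# A scheme is a letter followed by letters, digits, "+", "-" or ".", then ":".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")

# Every character must be unreserved, a gen-delim, a sub-delim,
# or part of a percent-encoded triplet such as %20.
_CHARS = re.compile(r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*")


def looks_like_uri(text):
    """Return True if text passes two necessary (not sufficient) URI checks."""
    if _SCHEME.match(text) is None:
        return False  # no scheme: at best a relative reference
    return _CHARS.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "http://example.org/a%20b",   # percent-encoded space
        "http://example.org/a b",     # raw space
        "http://b\u00fccher.example/",  # unencoded IRI
        "http://example.org/100%",    # stray percent sign
    ]
    for sample in samples:
        print(sample, looks_like_uri(sample))
```

A raw space, an unencoded non-ASCII character, or a stray "%" already fails this test, which is exactly the kind of input a CDATA attribute in a DTD lets through.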
The folks at validome hope to get this right soon; schema validators have an advantage here. They already identify the IDNwiki and its E-mail test as invalid, a big oops for accessibility tests (IANAL).
Update: The IDNwiki pages were fixed 2007-11-21.