xyzzy: 2007

2007-12-23

RFC 5064 Archived-At

Great, RFC 5064 about the Archived-At mail and news message header field got its number, just in time to consider this as an early xmas gift ;-)

2007-12-20

Fresh opensearch & google gadgets

New Opensearch descriptions:

Google Code custom search
en×de translations by LEO
es×de translations by LEO
fr×de translations by LEO

New googlets:

About Flash googlet (version, links, search)
Atomic clock googlet (Java applet of the PTB)
Tiny map search googlet (local search)

The tiny map location can be set, default 53.55, 9.99.

2007-11-08

RFC errata

Various pending RFC errata have been published recently, among others for RFC 2069, 2822, and 4408. The RFC editor might soon offer a Web form for submissions as outlined in an Internet Draft.

The 2069 erratum resulted in an editorial update of my MD5 test suite.
The 2822 erratum was already covered in the 2822upd drafts.
The 4408 erratum is actually a link to the OpenSPF errata page.

2007-11-07

Sorbian and Frisian use the Latin script

The language subtag registry defined in BCP 47 was updated. It now contains the new region codes BL and MF, and Suppress-Script: Latn for the Frisian languages frr, frs, and fy, for the Sorbian languages dsb and hsb, and for the Low and Swiss German languages nds and gsw.

The mis entry was updated some months ago in ISO 639, its description is now "uncoded languages". I've created new experimental XML versions of the registry; for other formats check out the Language Tags site.

Lever dood as slaav ;-)

2007-11-05

ABNF, Archived-At, News, and NNTP

RFC 4234bis was approved, we'll soon see a new Internet Standard (STD) about ABNF, the syntax used in many RFCs. An RFC defining the Archived-At header field in e-mail and news was approved earlier while I was in essence incommunicado after a system crash... :-(

I've submitted new versions of the news and nntp URI draft, adding an appendix with a detailed example about the relations between Archived-At, Message-ID, Xref, news-, and nntp-URLs. Two typos in version 06 are fixed. This draft has reached a point where it's easier to spoil it than to improve it.

2007-11-02

Broken validators

Popular validators like the WDG and the W3C validator unfortunately still accept various kinds of broken URIs not limited to unencoded IRIs as "valid". For the W3C validator that's a known bug.

Admittedly it's almost impossible to fix this bug based on a DTD. Renaming %URI; (anyURI in the related XML schema) to %IRI; in the DTD has the same effect for DTD-validators as renaming it to %FOO;: the datatype is still CDATA, or in other words (almost) anything goes.

Hopefully even DTD-validators will be fixed really soon to check URIs. Broken URLs are abused for attacks, ironically that was a side effect of better URI tests, several applications failed to check the generic RFC 3986 syntax. All valid URIs match this generic syntax, scheme specific URIs are proper subsets of the generic syntax. URI "producers" including MediaWiki as well as URI "consumers" including validators have to get this right, otherwise bad things happen.
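
A minimal sketch of a check that a CDATA datatype cannot express: does an attribute value use only characters that the generic RFC 3986 syntax permits, with well-formed %XX escapes? The function name is hypothetical, and passing this test is necessary but not sufficient for a valid URI (it does not check the scheme or the structure of the components).

```python
import re

# Characters RFC 3986 allows to appear literally in a URI reference:
# unreserved, gen-delims and sub-delims. "%" is handled separately.
_ALLOWED = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]")
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def has_only_uri_characters(value: str) -> bool:
    """Return True if value holds only URI characters and valid %XX escapes."""
    i = 0
    while i < len(value):
        if value[i] == "%":
            # A percent sign must start a two-digit hexadecimal escape.
            if not _PCT.match(value, i):
                return False
            i += 3
        elif _ALLOWED.match(value, i):
            i += 1
        else:
            # Spaces, non-ASCII (unencoded IRI) characters, etc.
            return False
    return True
```

An unencoded IRI such as one containing a literal "ä" fails this test, while its percent-encoded form passes.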

The folks at validome hope that they'll get this right soon, schema validators have an advantage. They already identify the IDNwiki and its E-mail test as invalid, big oops for accessibility tests (IANAL).

Update: The IDNwiki pages were fixed 2007-11-21.

2007-10-22

New TLDs BL, MF, and TEL

Version 2.1 of rxwhois knows the new TLDs BL, MF, and TEL. The corresponding whois servers are not yet running, but whois.iana.org already supports these TLDs. The corresponding region codes BL and MF will be added to the IANA language subtag registry in the next weeks.

Now rxwhois also supports the eleven IDN test domains. The test started 2007-10-15, but it took me until yesterday to fix a stupid bug.

The worst issue so far from my POV is that popular XHTML validators like the W3C validator don't check the URL syntax in attributes like href="URI", see bug 4916. Many users will be misled to create invalid pages with "unencoded" IRIs in document types like HTML 4.01 and XHTML 1.0, where that's not allowed.

2007-09-28

RFC 2617 vs. 2831 md5-sess

Good news first, RFC 4590bis (approved, still waiting for its number) will fix the Digest-MD5 examples in RFC 4590. I've updated the MD5 test suite using the fixed examples.

While I was at it I've also updated the RFC 3797 code to work for the NOMCOM 2007 case. The entropy limit 30 was too restrictive, 38 is good enough for MD5, 10^38 < 2^128.
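
The inequality can be checked with a few lines of Python. This is only a sketch; reading the limit as a count of decimal digits is my assumption based on the 10^38 < 2^128 remark, and the function name is hypothetical.

```python
def max_decimal_digits(bits: int) -> int:
    """Largest d such that 10**d is still below 2**bits."""
    digits = 0
    while 10 ** (digits + 1) < 2 ** bits:
        digits += 1
    return digits


# MD5 yields 128 bits, so this is the bound relevant for an MD5-based selection.
print(max_decimal_digits(128))
print(10 ** 38 < 2 ** 128 < 10 ** 39)
```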

Now the bad news, the issue with two md5-sess examples in draft smith-sipping-auth-examples might be in fact precisely what RFC 2617 says, as reported in a semi-official erratum. If that's correct the md5-sess in RFC 2831 would be different. Hopefully draft melnikov-digest-to-historic will shed some light on this before it moves RFC 2831 to historic. For more about this see the IETF SASL WG mailing list.

For now the MD5 test suite still uses only the binary x2c(HA1) form instead of the hexadecimal HA1 form in its md5-sess calculation.
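
To make the difference between the two forms concrete, here is a hedged Python sketch of an md5-sess HA1 calculation with a switch between them. The function and parameter names are hypothetical, and the sketch takes no position on which form either RFC actually requires.

```python
import hashlib


def md5_sess_ha1(user, realm, password, nonce, cnonce, binary_inner):
    """Session HA1 = MD5( inner ':' nonce ':' cnonce ).

    binary_inner=True  uses the 16 raw octets of the inner MD5 (x2c(HA1) form),
    binary_inner=False uses its 32 hexadecimal characters (hex HA1 form).
    """
    inner = hashlib.md5(f"{user}:{realm}:{password}".encode("utf-8"))
    if binary_inner:
        prefix = inner.digest()
    else:
        prefix = inner.hexdigest().encode("ascii")
    tail = f":{nonce}:{cnonce}".encode("utf-8")
    return hashlib.md5(prefix + tail).hexdigest()


# Placeholder credentials, not taken from any RFC example.
args = ("user", "example.org", "secret", "abc123", "xyz789")
print(md5_sess_ha1(*args, binary_inner=True))
print(md5_sess_ha1(*args, binary_inner=False))
```

The two results differ for the same input, which is why a test suite has to pick one form per example.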

2007-09-23

HTML PUBLIC "-//IETF//DTD HTML i18n//EN"

Admittedly RFC 2070 is old, and its status is historic. But it was the first HTML specification with I18N based on UNICODE, and the last HTML specification published by the IETF.

So far its DTD had to be extracted manually from the RFC, now IANA hosts an official master copy with the public identifier urn:ietf:params:xml:pi:-:IETF:DTD+HTML+i18N:EN. The urn:ietf:params:xml:pi registry was created by RFC 3688 for DTDs developed by the IETF. Of course HTML i18n is still SGML, not XML, but its DTD is now the first registered IETF DTD.

If you have old HTML i18n documents you can use a URL of this DTD as system identifier like this:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML i18n//EN" "http://www.iana.org/assignments/xml-registry/publicid/html2070.dtd">

2007-08-23

ftpsynch.rex

Many OS/2 SAAREXX programs work almost as is under WindowsNT ooRexx: the RexxUtil functions are similar, the WindowsNT CMD shell is similar, and the RxSock interface is almost identical. Some OS/2 RexxUtil functions are not yet or no longer supported under WindowsNT, e.g. SysGetMessage, SysProcessType, and SysQueryProcessCodePage.

Scripts I've tested under W2K after renaming *.cmd to *.rex include popstop2.cmd, dir2html.cmd, and sitemap.cmd.

Unfortunately ooRexx doesn't come with RxFTP.dll, but it offers the RxFTP.cls class. I've created a new ftpsynch.rex based on ftpsynch.cmd for ooRexx, for details see ftpsynch.htm.

2007-08-21

rxwhois 2.0.5

Version 2.0.5 of rxwhois.cmd now also works as a WindowsNT ooRexx script, just rename it to rxwhois.rex. I've adopted additional local character sets from utf-8.cmd including codepage 923 (ISO 8859-15, Latin 9) and 878 (KOI8-R), but I've only tested 858 (pc-multilingual-850+euro) and 1252 (windows-1252).

As always some whois-servers for ccTLDs had to be updated, for details see the source. After a system crash of my OS/2 box in June I was unable to check anything beyond the whois servers already known by rxwhois.cmd, whois.iana.org, and whois-servers.net. Just for fun I've added the eleven IDN TLDs for the test beginning in September 2007:

Arabic xn--kgbechtv
Persian xn--hgbk6aj7f53bba
Chinese, simplified xn--0zwm56d
Chinese, traditional xn--g6w251d
Russian xn--80akhbyknj4f
Hindi xn--11b5bs3a9aj6g
Greek xn--jxalpdlp
Korean xn--9t4b11yi5a
Yiddish xn--deba0ad
Japanese xn--zckzah
Tamil xn--hlcj6aya9esc7a
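
The xn-- labels above are Punycode. As a small sketch, Python's built-in punycode codec can turn such a label back into its Unicode form; the helper names are hypothetical, and no nameprep or IDNA validity checks are done here.

```python
ACE_PREFIX = "xn--"


def decode_label(a_label: str) -> str:
    """Decode one xn-- label to Unicode; other labels are returned unchanged."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def encode_label(u_label: str) -> str:
    """Encode a Unicode label as xn-- plus Punycode (no nameprep applied)."""
    if u_label.isascii():
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


# Round trip for one of the test TLDs listed above.
label = "xn--zckzah"
assert encode_label(decode_label(label)) == label
```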

Known issue: rxwhois.cmd expects UTF-8 as charset of whois servers, but whois.iana.org uses Latin-1 for at least one TLD entry, ht (Haiti). The IANA folks told me that they intend to use ASCII data for the eleven IDN test TLDs.

2007-07-27

OpenSearch descriptions for Google CSEs

It's quite simple to create opensearch descriptions for any existing Google CSE. Here's an example using the mozillaZine KB CSE:

This CSE is identified by cx=003258325049489668794:ru2dpahviq8. The &cx=-parameter is used in links and anything else related to this CSE. The left hand side 003258325049489668794 is related to the Google account and the up to 5000 annotations (e.g. sites and URL patterns) associated with this account. The right hand side ru2dpahviq8 is related to the actual CSE context including details of its layout, references to the associated annotations also known as background labels, etc.

It's not my CSE, I can ignore most technical details only relevant for the CSE creator. One detail is probably important, this CSE uses FORID:1 unlike my own CSEs with FORID:0. The value is visible in the monstrous URL of search results, it's a part of the &cof= parameter.

Most other layout details noted in &cof= are set by Google on the fly based on the CSE definition a.k.a. context. For my own CSEs I force LP:0 and AH:center with &cof=FORID%3A0%3BLP%3A0%3BAH%3Acenter, but that's arguably pointless, opensearch only works with Firefox 2, IE7, or better, and these browsers have no issues with the default LP:1 logo position and AH:left aligned header on result pages.
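
The percent-encoded &cof= value is just the key:value pairs joined with semicolons and then URL-encoded. A short Python sketch (the helper name is hypothetical):

```python
from urllib.parse import quote, unquote


def encode_cof(settings: dict) -> str:
    """Join KEY:VALUE pairs with ';' and percent-encode ':' and ';'."""
    raw = ";".join(f"{key}:{value}" for key, value in settings.items())
    return quote(raw, safe="")


cof = encode_cof({"FORID": 0, "LP": 0, "AH": "center"})
print("&cof=" + cof)   # &cof=FORID%3A0%3BLP%3A0%3BAH%3Acenter
print(unquote(cof))    # FORID:0;LP:0;AH:center
```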

CSEs refuse to return &output=xml or &output=xml-no-dtd results, therefore the opensearch description needs only one type="text/html" template. Just in case I added...

<SyndicationRight> limited </SyndicationRight> <Attribution> Google CSE by Jason Kersey </Attribution>

...anyway, after all the search results are Google results, in this case filtered and rearranged as defined by the CSE creator Jason Kersey. With up to three searched sites in a CSE Google allegedly also shows its supplemental results.

Putting it all together I arrived at this mozillazine opensearch description. I've no clue how and where Firefox or IE7 might use the Tags or Description, most likely these details are irrelevant for search results on ordinary type="text/html" pages. The validator wants a Query example as specified by opensearch.org, just for fun I picked about%3Aconfig.

One last detail, the icon, fortunately kb.mozillazine.org has a type="image/vnd.microsoft.icon" 16×16 favicon needing less than 10 KB, so this should work as is (http:-URL instead of data:-URL) for Firefox. It's tricky to get the icon right with *.googlepages.com, the Google Page Creator won't let you have your own favicon.ico. Just use another name.
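
For illustration, the pieces discussed so far can be assembled into an OpenSearch 1.1 description with a few lines of Python. This is a sketch, not the description file I actually use: the search URL template (http://www.google.com/cse), the favicon path, and the Description text are assumptions, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://a9.com/-/spec/opensearch/1.1/"


def build_description(short_name, cx, icon_url, attribution):
    """Return an OpenSearch description with a single text/html template."""
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}OpenSearchDescription")

    def add(tag, text=None, **attrs):
        element = ET.SubElement(root, f"{{{NS}}}{tag}", attrs)
        element.text = text
        return element

    add("ShortName", short_name)
    add("Description", "Google CSE search")  # placeholder text
    add("Image", icon_url, height="16", width="16",
        type="image/vnd.microsoft.icon")
    # Assumed result URL; the "&" is escaped as "&amp;" on serialization.
    add("Url", type="text/html",
        template="http://www.google.com/cse?cx=" + cx + "&q={searchTerms}")
    add("Query", role="example", searchTerms="about%3Aconfig")
    add("SyndicationRight", "limited")
    add("Attribution", attribution)
    return ET.tostring(root, encoding="unicode")


xml_text = build_description(
    "mozillaZine KB",
    "003258325049489668794:ru2dpahviq8",
    "http://kb.mozillazine.org/favicon.ico",  # assumed icon location
    "Google CSE by Jason Kersey",
)
print(xml_text)
```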

One way to use opensearch descriptions is to add a link in the header of (X)HTML pages. The title in the link should match the ShortName in the description, otherwise browsers won't know if the corresponding search is already installed. I've done that here in my blogger-template:

<link rel="search" href="xyzzy.a9.xml" type="application/opensearchdescription+xml" title="xyzzy" />

Another way is the window.external.AddSearchProvider function, Firefox 2 users can then simply click on the link to copy an opensearch description to their browser. I haven't tested IE7, maybe it uses the same method, i.e. "copy description". Last step, test this OpenSearch description with Firefox 2 or better.

For another example see my googlets page.

2007-06-11

Simple REXX mailto script

Yet another mailto command line tool, rxmailto.cmd v0.1 can so far send one text mail to one receiver via a Mail Submit Agent (MSA) at port 587 supporting SMTP AUTH with CRAM-MD5 and 8bitMIME. That's the minimum I could get away with after the spam flood finally drowned my old mailbox.
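
The CRAM-MD5 part of SMTP AUTH boils down to one HMAC-MD5 over the server challenge. Here is a sketch in Python rather than REXX, following RFC 2195; the function name is hypothetical and this is not the rxmailto.cmd code.

```python
import base64
import hashlib
import hmac


def cram_md5_response(user: str, secret: str, challenge_b64: str) -> str:
    """Answer a base64 CRAM-MD5 challenge with base64("user hexdigest")."""
    challenge = base64.b64decode(challenge_b64)
    digest = hmac.new(secret.encode("utf-8"), challenge, hashlib.md5).hexdigest()
    return base64.b64encode(f"{user} {digest}".encode("utf-8")).decode("ascii")


# Placeholder challenge; a real one arrives in the server's "334" reply.
challenge = base64.b64encode(b"<12345.67890@mail.example.org>").decode("ascii")
print(cram_md5_response("user", "password", challenge))
```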

Various details are far from perfect, e.g. if a run of words with non-ASCII characters in the subject is longer than 56 octets the subject encoder will emit a folded line longer than 76 characters, and that's not permitted by RFC 2047. On the other hand the script won't break UTF-8 characters in the subject for platforms with UTF-8 as local charset. You get what you pay for, less than 30 KB. ;-)

2007-06-09

MD5 test suite 1.2

The MD5 test suite version 1.2 finally supports streaming and bit string input:

   hash = MD5( bytes )          ==> MD5 of an octet string
   ctxt = MD5( bytes, '' )      ==> init.  new MD5 context
   ctxt = MD5( bytes, ctxt )    ==> update old MD5 context
   hash = MD5( /**/ , ctxt )    ==> finalize   MD5 context
   hash = MD5( bytes, /**/, n ) ==> MD5 of n zero-fill bits
   ctxt = MD5( bytes, ''  , n ) ==> init.  MD5 bit context
   ctxt = MD5( bytes, ctxt, n ) ==> update MD5 bit context
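
For comparison, the same init / update / finalize pattern in Python's hashlib, which is handy for cross-checking octet string results. Note that hashlib works on whole octets only, so the bit string variants have no counterpart in this sketch; the function name is hypothetical.

```python
import hashlib


def md5_streaming(chunks) -> str:
    """Hash an iterable of byte strings piece by piece."""
    context = hashlib.md5()      # init a new MD5 context
    for chunk in chunks:
        context.update(chunk)    # update the context
    return context.hexdigest()   # finalize the context


# Streaming and one-shot hashing give the same result.
print(md5_streaming([b"a", b"bc"]) == hashlib.md5(b"abc").hexdigest())  # True
```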

Also added: APR1 can determine the hashed passwords used by BSD and Apache htpasswd. This is a function also offered by openssl passwd -1 and openssl passwd -apr1, for details see a manual of the openssl command line tool.

2007-05-20

sitemap.cmd 0.3

FWIW I've added the schema magic to sitemap.cmd (0.3), adjusting the documentation of the REXX ftpsynch wannabe-content management system.

Apart from being a bit longer and overwriting siteold.xml the new version now passes XML schema validation. Caveat, don't use its buggy text/html output at the moment.
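
A sketch of what a schema-aware sitemap looks like, in Python instead of REXX. I'm assuming here that the "schema magic" means the xsi:schemaLocation attribute for the sitemap 0.9 schema; the function name and the sample date are hypothetical.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def build_sitemap(entries) -> str:
    """entries: iterable of (loc, lastmod) pairs, lastmod as YYYY-MM-DD."""
    ET.register_namespace("", SM)
    ET.register_namespace("xsi", XSI)
    root = ET.Element(f"{{{SM}}}urlset")
    # Point schema validators at the published sitemap schema.
    root.set(f"{{{XSI}}}schemaLocation", f"{SM} {SM}/sitemap.xsd")
    for loc, lastmod in entries:
        url = ET.SubElement(root, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = loc
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")


sitemap_xml = build_sitemap([("http://www.xyzzy.claranet.de/", "2007-05-20")])
print(sitemap_xml)
```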

Unrelated, Google's page creator rewrites an uploaded sitemap 0.90 automagically into a sitemap 0.84 removing all <lastmod> elements. Or rather it did that last week for e.g. this sitemap, maybe it's one of the experimental features.

md5.cmd 1.1: Auth Digest + Digest-MD5

The IETF SASL WG recently decided to drop the RFC 2831bis draft from their agenda. Therefore I've removed the code handling <quoted-pair> (backslashes) from the MD5 test suite 1.0 (REXX script).

RFC 4590 contains four examples for Auth Digest. That's in essence the same as Digest-MD5 defined in RFC 2831, only based on the older RFC 2617. The examples were apparently copied as is to RFC 4590bis drafts. I've added the 2*2 (INVITE+rspauth, GET+rspauth) examples to md5.cmd (1.1).

The RFC 4590 examples still fail in my MD5 test suite, or rather my attempt to guess the used password failed. There's also an oddity in these examples not yet supported by the REXX script:

RFC 2617 states that a client sending any qop= parameter, for the RFC 4590 examples that's qop=auth, MUST also send a cnonce= (client nonce) together with a NC= (nonce counter). In the RFC 4590 examples the client doesn't do that, causing a trap in my REXX script.

There are two plausible ways to fix this, either use the RFC 2069 fallback algorithm, or simply omit the missing NC and CNONCE. In simplified REXX the second solution would be:

 return MD5( HA1 || ':' || NONCE || ':auth:' || MD5( XURL ))

The first (2069) solution would use a colon : instead of :auth:. The "official" RFC 2617 string instead of :auth: is:

 ':' || NC || ':' || CNONCE || ':' || QOP || ':'

Other variants of what RFC 4590 actually wants could be to use an empty CNONCE with a dummy NC in the direction of :00000001::auth:. As always Digest-MD5 is messy.
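
The variants discussed above differ only in the string between NONCE and the hashed method:URI part. A hedged Python sketch, with hypothetical function names, and HA2 taken as MD5(method ':' uri) for qop=auth:

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(ha1, nonce, ha2, qop=None, nc=None, cnonce=None):
    """Compute the request digest for the three variants discussed above."""
    if qop is None:
        middle = ":"                        # RFC 2069 fallback
    elif nc is None or cnonce is None:
        middle = f":{qop}:"                 # omit the missing NC and CNONCE
    else:
        middle = f":{nc}:{cnonce}:{qop}:"   # "official" RFC 2617 string
    return md5_hex(ha1 + ":" + nonce + middle + ha2)


# Values of the well-known RFC 2617 example request.
ha1 = md5_hex("Mufasa:testrealm@host.com:Circle Of Life")
ha2 = md5_hex("GET:/dir/index.html")
nonce = "dcd98b7102dd2f0e8b11d0f600bfb0c093"
print(digest_response(ha1, nonce, ha2, "auth", "00000001", "0a4f113b"))
print(digest_response(ha1, nonce, ha2, "auth"))   # NC and CNONCE omitted
print(digest_response(ha1, nonce, ha2))           # RFC 2069 style
```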

Related, an old 2069-erratum still rots in the pending errata mbox. I'm now confident that the 2069-code in md5.cmd works at least with the IETF tools server. I've not yet submitted an erratum for RFC 2983, three out of four 2983-examples are fine.

2007-05-14

LP Logo Position, AH Align Header

Some troubles with Google's custom search engines (CSE):

The watermark "branding" fails miserably with Javascript 1.1. The code doesn't check for this old version, resulting in garbage with browsers still using it. I always disable JS 1.1, but I can't ask visitors of pages using my "xyzzy" CSE to disable JS 1.1 beforehand, they wouldn't know what it is. Now I use the ordinary "branding" with one of Google's six CSE logos.
The default position of the search form on the result page is "left" set by AH:left. Fans of the old free site search form know this "Align Header" parameter, it's (kind of) documented on my lab page. It's also straightforward to modify it: add ;AH:center to the cof=FORID:0 parameter, where 0 might be something else depending on the used form.
The free site search result pages show my logo immediately above the search form. The used parameters L: for the logo URL and S: for the site URL are still the same for CSEs, it's just no longer necessary to specify all these odd values as part of the cof= parameter, Google inserts them on the fly. Unfortunately it also uses some CSS magic (style sheets) to get this right, failing miserably with browsers not supporting CSS. After some experiments I found the culprit: There's a new LP:1 "Logo Position", this has to be disabled by LP:0 to get the desired effect.

Putting it all together I ended up with this form input:

 <input type="hidden" name="cof" value="FORID:0;AH:center;LP:0" />

For an example see this form, I'll update the other forms later. Of course I'll need my own "googlet" for this CSE. But the normal "add to Google" CSE gadget is anyway far too big for my taste.

2007-05-04

Inline googlets

I've moved leo-dict.xml and leo-gghp.xml to Web space hosted by Google. Both offer a simple form for a service provided by LEO, English to German (or vice versa) translations.

leo-dict.xml uses Content type html for Googlets with target="_top" as required by LEO, resulting in an <iframe> on an iGoogle-page or wherever it's used. Of course this only works with browsers and devices supporting <iframe>.

leo-gghp.xml uses Content type html-inline without target="_top", resulting in an ordinary search form on an iGoogle-page working with any browser, at least it works for me. There are various disadvantages of html-inline, e.g., users are asked if they really trust that this Googlet won't screw up the layout of their iGoogle-page (it could with some JavaScript magic). On the other hand html-inline could also work on PDAs.

Fictitious u+1E9E character endorsed by German Home Office

A misrepresentation of some ß uses in upper case head lines, on books, and on tombstones resulted in a proposal to add an "upper case ß" to Unicode as code point u+1E9E (PDF).

I've got a warning that this PDF might crash some PDF viewers, but exceptionally AcroReader 3 survived this attack.

The next likely step could be demands to permit ß in I18N domain labels because it suddenly got an upper case variant.

Of course there's also the "minor" problem of upgrading fonts and software for a fictitious "upper case ß" worldwide, as far as they support German.

2007-04-15

Googlet

The Google Gadgets are a cute idea: they allow encapsulating Web forms in pieces of XML, which can then be added to personal Google start pages and similar services.

A bit like the old favlet/bookmarklet concept based on javascript-URLs, but with more features like user preferences. I'm not yet sure where they store user preferences, on their servers or in cookies.

The name add.gif for the link icon isn't very intuitive, I renamed my copies to googlet.gif. My second experiment after the xyzzy "custom search engine" is a LEO dict search form, but so far Google claims that it's unavailable or empty when I try to add it.

sitemap.cmd

For users of sitemap.cmd, a REXX script generating sitemaps, the improved sitemap specification doesn't require an update.

The location of the sitemap can now be given in a robots.txt file, see their protocol page, example:

 Sitemap: http://www.xyzzy.claranet.de/sitemap.xml
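
A crawler can pick up this line with very little code. A minimal Python sketch (hypothetical function name), treating the field name as case-insensitive and "#" as the start of a comment:

```python
def sitemap_urls(robots_txt: str) -> list:
    """Return all URLs given in Sitemap: lines of a robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()     # drop comments
        field, sep, value = line.partition(":")  # split at the first colon only
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


example = "User-agent: *\nDisallow:\nSitemap: http://www.xyzzy.claranet.de/sitemap.xml\n"
print(sitemap_urls(example))   # ['http://www.xyzzy.claranet.de/sitemap.xml']
```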

Any services beyond finding the sitemap, like Google's webmaster tools, still require some form of submission proving that the submitter has write access to the submitted site.

Syndic8

Some history: this isn't my first attempt with a "mail2blog" service, for two years mailbucket.org did what I wanted.

Unfortunately guessing the submit address for a mailbucket feed was simple, and the spammers found it. Other services I tested (including blogger) had no "mail2blog" feature OR required a version of SSL not supported by my browser(s) OR required a version of JavaScript not supported by my browser(s). Therefore I was forced to drop my old (pseudo-) blog redirecting to an error page without proper HTTP error code.

Now that it works again I found that Syndic8 never gave up polling the feed URL for its FeedID 62125.

Validation issues

The Blogger help pages are messy, many links are broken, SSL login fails miserably with old browsers, the feedback link without login doesn't work, etc. If you're looking for a blog hoster try to find a better service. I want the "blog-by-mail" feature available here, so it's my own fault.

Known issue: the application/atom+xml media type for the atom feed apparently degenerates into text/html after the first access. This might be a cache issue, resulting in a warning by the W3C validator.

2007-04-09

HTML test

Here's my OpenSPF page, it offers a customized search engine provided by Google.

