Ticket #47 (new defect)
Practical length limits on IRIs or components thereof
Reported by: firstname.lastname@example.org
Owned by: email@example.com
This issue results from a split of issue #37.
The question is whether we need any practical length limits and implementation advice on IRIs or components thereof.
I haven't found any such advice in RFC 3987 yet (but I haven't looked too closely).
URI producers should use names
that conform to the DNS syntax, even when use of DNS is not
immediately apparent, and should limit these names to no more than
255 characters in length.
Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 252 characters
(Unicode code points).
This should most probably read "up to 252 *octets*".
This suggests that we may need some advice for implementers that IRIs in general need more storage than ASCII-only URIs: first, because non-ASCII characters usually need more octets than ASCII characters, and second, because converting from UTF-8 to percent-encoded form expands each non-ASCII octet to three octets.
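
To make the storage growth concrete, here is a minimal Python sketch that measures one string three ways: in Unicode code points, in UTF-8 octets, and in octets after percent-encoding. The helper name `iri_lengths`, the `SAFE` character set, and the example IRI are illustrative assumptions, not taken from any specification. The conversion uses the standard library's `urllib.parse.quote`, which only approximates the IRI-to-URI mapping of RFC 3987.

```python
from typing import NamedTuple
from urllib.parse import quote

# Assumption: leave URI reserved characters and "%" untouched so that
# existing delimiters and escapes are not re-encoded.
SAFE = "/:?#[]@!$&'()*+,;=%"


class Lengths(NamedTuple):
    code_points: int
    utf8_octets: int
    percent_encoded_octets: int


def iri_lengths(iri: str) -> Lengths:
    """Return the size of `iri` in code points, UTF-8 octets, and
    octets after percent-encoding every non-ASCII UTF-8 octet."""
    escaped = quote(iri, safe=SAFE)  # encodes as UTF-8 by default
    return Lengths(len(iri), len(iri.encode("utf-8")), len(escaped))


print(iri_lengths("http://example.org/日本語"))
# Lengths(code_points=22, utf8_octets=28, percent_encoded_octets=46)
```

In this example the three CJK characters take 9 octets in UTF-8 and 27 after percent-encoding, so a buffer sized for the code-point count would be too small in both cases.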