
Ticket #47 (new defect)

Opened 4 years ago

Last modified 2 years ago

Practical length limits on IRIs or components thereof

Reported by: duerst@it.aoyama.ac.jp
Owned by: masinter@adobe.com
Priority: major
Milestone:
Component: 3987bis
Version:
Severity: -
Keywords:
Cc:

Description

This issue results from a split of issue #37.
The question is whether we need any practical length limits and implementation advice on IRIs or components thereof.

I haven't found any such advice in RFC 3987 yet (but I haven't looked too closely).

At http://tools.ietf.org/html/rfc3986#section-3.2.2, RFC 3986 says:

URI producers should use names that conform to the DNS syntax, even
when use of DNS is not immediately apparent, and should limit these
names to no more than 255 characters in length.

At http://tools.ietf.org/html/rfc5890#section-4.2, RFC 5890 says:

Because A-labels (the form actually used in the DNS) are potentially
much more compressed than UTF-8 (and UTF-8 is, in general, more
compressed that UTF-16 or UTF-32), U-labels that obey all of the
relevant symmetry (and other) constraints of these documents may be
quite a bit longer, potentially up to 252 characters (Unicode code
points).

This should most probably read "up to 252 *octets*".
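
To make the difference between the units concrete, here is a minimal Python sketch that measures one label three ways: in Unicode code points, in UTF-8 octets, and as an A-label. The label "bücher" is only an illustration. The sketch assumes that Python's built-in "punycode" codec is an adequate stand-in for the A-label conversion; it performs only the raw Punycode step and none of the IDNA mapping or validation.

```python
# Illustrative U-label; \u00fc is "ü" (a single precomposed code point).
u_label = "b\u00fccher"

# Length in Unicode code points.
code_points = len(u_label)

# Length in octets when the label is stored as UTF-8.
utf8_octets = len(u_label.encode("utf-8"))

# Raw Punycode step only (no IDNA mapping or validation), plus the ACE prefix.
a_label = "xn--" + u_label.encode("punycode").decode("ascii")

print(code_points, utf8_octets, a_label, len(a_label))
```

For this label the three counts all differ, so a limit stated in "characters" is ambiguous unless the unit is named.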

What this seems to point out is that we may need some implementer advice that IRIs in general need more storage than ASCII-only URIs: first, because non-ASCII characters usually need more octets than ASCII characters, and second, because converting from UTF-8 to percent-encoded form expands each non-ASCII octet into three octets (%XX).
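
The storage growth can be sketched in a few lines of Python. The helper name percent_encode_non_ascii and the example IRI are hypothetical. The function covers only the percent-encoding of non-ASCII octets and leaves out the other parts of an IRI-to-URI conversion, such as handling of the host component.

```python
def percent_encode_non_ascii(iri: str) -> str:
    """Percent-encode every non-ASCII UTF-8 octet; leave ASCII untouched."""
    return "".join(
        chr(octet) if octet < 0x80 else f"%{octet:02X}"
        for octet in iri.encode("utf-8")
    )

# Example IRI with three CJK characters in the path (3 UTF-8 octets each).
iri = "http://example.org/\u65e5\u672c\u8a9e"
uri = percent_encode_non_ascii(iri)

iri_code_points = len(iri)               # 22 code points
iri_utf8_octets = len(iri.encode("utf-8"))  # 28 octets
uri_octets = len(uri)                    # 46 octets, all ASCII

print(uri)
print(iri_code_points, iri_utf8_octets, uri_octets)
```

Here a 22-character IRI needs 28 octets in UTF-8 and 46 octets once percent-encoded, so a buffer sized in characters for the IRI would be too small for either form.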

Change History

comment:1 Changed 3 years ago by masinter@adobe.com

I remember we had a long discussion about this, but don't remember any recommendation to do anything about imposing a limit.

Perhaps something of the form "This document does not specify any limits on length of IRIs or any components of them. Generators of IRIs should be aware of the potential length limits downstream processors might have for IRIs, parsed IRI components, and the results of translation to parsed URI components or reconstructed URIs."

Leaving open for now.

comment:2 Changed 2 years ago by masinter@adobe.com

  • Owner set to masinter@adobe.com