
Ticket #39 (closed defect: wontfix)

Opened 4 years ago

Last modified 3 years ago

Warn about mistaken conversion for non-BMP characters

Reported by: duerst@it.aoyama.ac.jp
Owned by:
Priority: minor
Milestone:
Component: 3987bis
Version:
Severity: -
Keywords:
Cc:

Description

Find a place to note that some older software transcoding to UTF-8 may produce illegal output for some input, in particular for characters outside the BMP (Basic Multilingual Plane). As an example, for the IRI with non-BMP characters (in XML Notation):
"http://example.com/&#x10300;&#x10301;&#x10302;"
which contains the first three letters of the Old Italic alphabet,
the correct conversion to a URI is
"http://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C%82"
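The difference between the correct conversion and the kind of mistake this ticket has in mind can be sketched as follows. This is a minimal illustration, not text from the specification: the function names `iri_to_uri` and `mistaken_iri_to_uri` are hypothetical, every non-ASCII character is percent-encoded (a simplification of the actual mapping rules), and the "mistaken" variant assumes the faulty software encodes each UTF-16 surrogate separately instead of encoding the code point itself.

```python
def percent_encode(data: bytes) -> str:
    """Render each byte as an uppercase %HH triplet."""
    return "".join(f"%{b:02X}" for b in data)


def iri_to_uri(iri: str) -> str:
    """Correct sketch: percent-encode the UTF-8 bytes of each non-ASCII character."""
    return "".join(
        ch if ord(ch) < 0x80 else percent_encode(ch.encode("utf-8"))
        for ch in iri
    )


def mistaken_iri_to_uri(iri: str) -> str:
    """Assumed failure mode: split non-BMP characters into UTF-16
    surrogates and encode each surrogate as if it were a character."""
    out = []
    for ch in iri:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
            continue
        if cp > 0xFFFF:
            v = cp - 0x10000
            units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
        else:
            units = [cp]
        for unit in units:
            # "surrogatepass" lets Python emit the (illegal) 3-byte form.
            out.append(percent_encode(chr(unit).encode("utf-8", "surrogatepass")))
    return "".join(out)


# The first three letters of the Old Italic alphabet (U+10300..U+10302).
iri = "http://example.com/\U00010300\U00010301\U00010302"
correct = iri_to_uri(iri)
mistaken = mistaken_iri_to_uri(iri)
print(correct)
print(mistaken)
```

The first line printed matches the correct URI given above; the second uses six percent-encoded octets per character, a byte sequence that is not legal UTF-8.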

Change History

comment:1 Changed 3 years ago by masinter@adobe.com

  • Status changed from new to closed
  • Resolution set to wontfix

I'm reluctant to add a warning about a situation whose likelihood is unclear without more evidence that mis-implementations of UTF-8 are still deployed in practice and causing problems.

If there's any evidence that we actually need to do this, please re-open the ticket.
