
Ticket #20 (closed design: fixed)

Opened 7 years ago

Last modified 2 years ago

Default charsets for text media types

Reported by: mnot@pobox.com Owned by: fielding@gbiv.com
Priority: normal Milestone: 14
Component: p3-payload Severity: Active WG Document
Keywords: Cc:
Origin: http://www.w3.org/mid/B6C10798-0A18-4D37-AEC7-E93E8C0F102A@yahoo-inc.com

Description

RFC 2616, Section 3.7.1 states:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.

However, many, if not all, of the text/* media types define their own defaults; text/plain (RFC2046), for example, defaults to ASCII, as does text/xml (RFC3023).

How do these format-specific defaults interact with HTTP's default? Is HTTP really overriding them?

I'm far from the first to be confused by this text, and I'm sure it's been asked before, but I haven't been able to find a definitive answer. If errata are still being considered, perhaps removing or modifying this line would be a good start...
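The conflict described above can be made concrete with a minimal sketch. It is an illustration only, not text from any specification: the function names, the policy labels "http-2616" and "media-type", and the small defaults table are hypothetical, and the table covers just the two media types named in the description.

```python
# Hypothetical illustration of the two competing defaults; not normative.
HTTP_2616_DEFAULT = "ISO-8859-1"  # RFC 2616, Section 3.7.1
MEDIA_TYPE_DEFAULTS = {           # per-format defaults cited in this ticket
    "text/plain": "US-ASCII",     # RFC 2046
    "text/xml": "US-ASCII",       # RFC 3023
}


def split_content_type(value):
    """Return (media_type, charset or None) from a Content-Type value."""
    media_type, *params = [part.strip() for part in value.split(";")]
    charset = None
    for param in params:
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset":
            charset = val.strip().strip('"')
    return media_type.lower(), charset


def effective_charset(content_type, policy):
    """Pick a charset under one of two assumed policies."""
    media_type, charset = split_content_type(content_type)
    if charset is not None:
        return charset  # an explicit label wins under either policy
    if not media_type.startswith("text/"):
        return None
    if policy == "http-2616":
        return HTTP_2616_DEFAULT
    if policy == "media-type":
        return MEDIA_TYPE_DEFAULTS.get(media_type)
    raise ValueError(f"unknown policy: {policy}")


print(effective_charset("text/xml", "http-2616"))
print(effective_charset("text/xml", "media-type"))
```

For an unlabeled text/xml response the two policies give different answers, which is the ambiguity this ticket asks about.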

Attachments

i20.diff (2.9 KB) - added by julian.reschke@gmx.de 3 years ago.
proposed patch, *removing* those sections (this may be too drastic)
i20.2.diff (4.6 KB) - added by julian.reschke@gmx.de 3 years ago.
new patch, now also removing the special case for Accept-Encoding

Change History

comment:1 Changed 7 years ago by mnot@pobox.com

  • version set to d00
  • Component set to payload
  • Milestone set to unassigned

comment:2 Changed 7 years ago by julian.reschke@gmx.de

From [146]:

Add directory for test cases, starting with encoding tests (addresses #20)

comment:3 Changed 7 years ago by julian.reschke@gmx.de

From [147]:

Set mime types for test files (addresses #20)

comment:4 follow-up: ↓ 5 Changed 7 years ago by mnot@pobox.com

  • Milestone changed from unassigned to 02

Resolution:

  1. remove <http://tools.ietf.org/id/draft-ietf-httpbis-p3-payload-01.txt>, section 2.3.1, the entire fourth paragraph (i.e., the last one in that section).
  2. From 2.1.1: Move """HTTP/1.1 recipients MUST respect the charset label provided by the sender; and those user agents that have a provision to "guess" a charset MUST use the charset from the content-type field if they support that charset, rather than the recipient's preference, when initially displaying a document. """ to the end of 2.3.1, removing the rest of 2.1.1.
  3. Add text to Security Considerations explaining UTF-7 vulnerability in browsers and exclude such charsets from the guessing algorithm. (see http://www.w3.org/mid/B412EABE-8E69-455F-A00B-A1ED1F386440@gbiv.com)
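As a rough illustration of why UTF-7 is singled out in the last item: a payload made only of ASCII bytes contains no markup when read as ASCII, yet decodes to markup if a recipient guesses UTF-7. The sketch below uses Python's built-in "utf-7" codec; the payload is a made-up example, not taken from the linked message.

```python
# Made-up payload: every byte is plain ASCII.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Read as ASCII, the text contains no angle brackets.
as_ascii = payload.decode("ascii")

# Read as UTF-7, the "+ADw-" and "+AD4-" runs become "<" and ">".
as_utf7 = payload.decode("utf-7")

print(as_ascii)
print(as_utf7)
```

A filter that inspects the ASCII reading would see nothing to escape, which is why the resolution asks for such charsets to be excluded from guessing.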

comment:5 in reply to: ↑ 4 Changed 7 years ago by julian.reschke@gmx.de

  1. Add text to Security Considerations explaining UTF-7 vulnerability in browsers and exclude such charsets from the guessing algorithm. (see http://www.w3.org/mid/B412EABE-8E69-455F-A00B-A1ED1F386440@gbiv.com)

I'll be happy to apply the changes if somebody proposes the exact text to be added to the security considerations...

comment:6 Changed 7 years ago by julian.reschke@gmx.de

From [209]:

Remove character set defaulting for text media types (to be done: add security considerations WRT charset sniffing); relates to #20.

comment:7 Changed 7 years ago by julian.reschke@gmx.de

From [211]:

Back out change [209], see discussion around <http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0233.html>; relates to #20.

comment:8 Changed 7 years ago by mnot@pobox.com

  • Milestone changed from 02 to 03

comment:9 Changed 6 years ago by mnot@pobox.com

  • Milestone changed from 03 to unassigned

comment:10 Changed 6 years ago by julian.reschke@gmx.de

From Dublin meeting minutes (http://jabber.ietf.org/logs/httpbis/2008-07-29.txt):

[09:20:47] <Thomas Roessler> julian: default character set for media types (text/*)?
[09:20:51] <Thomas Roessler> aleksey: oh gosh
[09:20:54] <Thomas Roessler> mnot: wary of this
[09:21:58] <Thomas Roessler> mnot: <grepping RFC 2616 for ISO-8859-1 occurrences>
[09:22:08] <Thomas Roessler> mnot: well, the issue is...
[09:22:25] <Thomas Roessler> julian: the issue is when you look in different RFCs for default encoding of text/*, you get different answers
[09:22:33] <Thomas Roessler> ... text registration, text/xml registration, HTTP text
[09:22:37] <Thomas Roessler> ... wouldn't know which one is normative ...
[09:22:59] <Thomas Roessler> ... were close to getting rid of ISO-8859-1, but then Roy stepped in ...
[09:23:11] <Thomas Roessler> ... if we can't make normative change, might be useful to phrase this in a way that makes clear what's going on ...
[09:23:15] <Thomas Roessler> mnot: issue-20
[09:23:47] <Thomas Roessler> ... proposed text suggests that we override defaults ...
[09:24:00] <Thomas Roessler> ... relationship between the two isn't clear -- which takes precedence ...
[09:24:04] <Thomas Roessler> ... this is confusing people ...
[09:24:08] <Thomas Roessler> ... we had a proposal that we backed out ...
[09:24:26] <Thomas Roessler> mnot: roy, did you have a proposal for this that you remember?
[09:24:32] <roy.fielding> It was a deliberate decision to override MIME.  Lots of discussion way back then.
[09:24:42] <Thomas Roessler> barry: <channeling roy>
[09:24:48] <roy.fielding> not that I can remember .. will search
[09:25:25] <Thomas Roessler> julian: If it was deliberate discussion to override MIME, should we now override text/...?
[09:25:44] <Thomas Roessler> mnot: remember there were historical reasons for iso-8859-1
[09:25:51] <roy.fielding> right, Mosaic puked on charset parameter
[09:26:06] <Thomas Roessler> julian: problem is that default is harmful for formats that carry their own charset info
[09:26:23] <Thomas Roessler> ... at least for text/xml, should document what's implemented in practice ...
[09:26:40] <Thomas Roessler> mnot: document 
[09:26:55] <Thomas Roessler> ACTION: mnot to research previous discussion, and restate so we can get going again

comment:11 Changed 6 years ago by julian.reschke@gmx.de

Remove our own default, but point out that the MIME default doesn't apply either.

comment:12 Changed 6 years ago by julian.reschke@gmx.de

comment:13 Changed 5 years ago by julian.reschke@gmx.de

  • Priority set to urgent

comment:14 Changed 5 years ago by mnot@pobox.com

  • Priority changed from urgent to normal

Latest summary at:

http://www.w3.org/mid/5565932F-C73D-4183-A09B-46993DD63F88@mnot.net

Discussed at Stockholm editors' meeting; inclination is to define default as 8859-1 and allow sniffing (perhaps just when not declared), but not to allow sniffing to UTF-7 (i.e., only a superset of ascii).
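One way to read the Stockholm inclination is sketched below. This is an assumption-laden illustration, not proposed spec text: the function name, the idea of passing in an already-sniffed label, and the contents of the allow-list are all hypothetical. The sketch takes the "perhaps just when not declared" option, so a declared charset is never overridden.

```python
# Hypothetical allow-list of ASCII-superset charsets; the real set would
# need to be agreed on by the WG. UTF-7 is deliberately absent.
ASCII_SUPERSETS = {"us-ascii", "iso-8859-1", "utf-8", "windows-1252"}

FALLBACK = "ISO-8859-1"  # default under the Stockholm inclination


def choose_charset(declared, sniffed):
    """Pick a charset from a declared label and a sniffed guess.

    declared: charset parameter from Content-Type, or None if absent.
    sniffed:  label guessed from the content, or None if no guess.
    """
    if declared is not None:
        return declared  # sniffing only applies when nothing is declared
    if sniffed is not None and sniffed.lower() in ASCII_SUPERSETS:
        return sniffed
    return FALLBACK      # a sniffed UTF-7 (or no guess) falls back here


print(choose_charset(None, "UTF-7"))
print(choose_charset(None, "utf-8"))
```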

comment:15 Changed 4 years ago by mnot@pobox.com

  • Priority changed from normal to later
  • Severity set to Active WG Document

comment:16 Changed 4 years ago by lmm@acm.org

Text this refers to is currently:

http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-11.html#rfc.section.2.3.1

see

http://tools.ietf.org/html/draft-masinter-mime-web-info also for mention of this as possible change to MIME rather than HTTP.

comment:17 Changed 3 years ago by mnot@pobox.com

  • Owner set to fielding@gbiv.com

Prague 2011 editor discussion: proposal is to remove any default (i.e., default is not ascii as in mime, not iso-8859-1 as in 2616) and allow sniffing for text charset. Update linked "Missing Charset" as a result (as well as any other refs to iso-8859-1).

comment:18 Changed 3 years ago by mnot@pobox.com

  • Priority changed from later to normal

comment:19 Changed 3 years ago by julian.reschke@gmx.de

Accept-Charset still special-cases ISO-8859-1; do we want to get rid of this, too?

Changed 3 years ago by julian.reschke@gmx.de

proposed patch, *removing* those sections (this may be too drastic)

Changed 3 years ago by julian.reschke@gmx.de

new patch, now also removing the special case for Accept-Encoding

comment:20 Changed 3 years ago by julian.reschke@gmx.de

From [1240]:

Remove ISO-8859-1 default for text/*, remove special-case for ISO-8859-1 from Accept-Charset (see #20)

comment:21 Changed 3 years ago by julian.reschke@gmx.de

  • Status changed from new to closed
  • Resolution set to incorporated

comment:22 Changed 3 years ago by mnot@pobox.com

  • Milestone changed from unassigned to 14

comment:23 Changed 3 years ago by mnot@pobox.com

  • Status changed from closed to reopened
  • Resolution incorporated deleted

comment:24 Changed 3 years ago by mnot@pobox.com

  • Status changed from reopened to closed
  • Resolution set to fixed

comment:25 Changed 2 years ago by julian.reschke@gmx.de

From [1657]:

say what the historic default charset was so that the change is easier to find (see #20)
