Ticket #426 (closed editorial: incorporated)

Opened 2 years ago

Last modified 2 years ago

p2 editorial feedback

Reported by: mnot@pobox.com
Owned by:
Priority: normal
Milestone: 22
Component: p2-semantics
Severity: Active WG Document
Keywords:
Cc:
Origin: http://lists.w3.org/Archives/Public/ietf-http-wg/2012OctDec/0279.html

Description

3.1. Representation Metadata

| Expires | Section 7.3 of [Part6] |

If "Expires" is considered "representation metadata", then it seems like "ETag" and "Last-Modified" should be as well. But I think it would make more sense to just remove "Expires" from the list; it's clearly the odd man out here.

3.1.1.2. Character Encodings (charset)

Implementers need to be aware of IETF character set requirements [RFC3629] [RFC2277].

It's not clear what requirements this is referring to; RFC 2277 places requirements on protocol authors, not on implementors, and RFC 3629 is just the definition of UTF-8. If the requirement is "implementations MUST support UTF-8" then we should say that.

3.1.1.4. Multipart Types

In general, HTTP treats a multipart message body no differently than any other media type: strictly as payload. HTTP does not use the multipart boundary as an indicator of message body length. In all other respects, an HTTP user agent SHOULD follow the same or similar behavior as a MIME user agent would upon receipt of a multipart type.

That last part seems completely wrong; a web browser is not expected to handle multipart/alternative or multipart/related in the way a mail reader would. (This requirement came from RFC 2616, but... it was wrong then too.)

The MIME header fields within each body-part of a multipart message body do not have any significance to HTTP beyond that defined by their MIME semantics.

This is not true of multipart/byteranges; in RFC 2616 that was explained separately, but that explanation got lost in httpbis rewrites at some point.

Suggested rewrite for the second and third paragraphs:

In general, HTTP treats a multipart message body no differently than any other media type: strictly as payload. The one exception is the "multipart/byteranges" type (Appendix A of [Part5]) when it appears in a 206 (Partial Content) response. In all other cases, the MIME header fields within each body-part of a multipart message body do not have any significance at the HTTP level; they are just part of the representation data.

(This drops the newly-added "HTTP does not use the multipart boundary as an indicator of message body length", but that is already implied by the removal of 2616's prohibition on epilogue data; if the multipart is allowed to have an epilogue, then the final boundary doesn't indicate the end of the body anyway. It also drops the "unrecognized multipart subtype" text, which was already irrelevant given the "strictly as payload" rule anyway.)

3.1.3.1. Language Tags

In summary, a language tag is composed of one or more parts: A primary language subtag followed by a possibly empty series of subtags:

language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

Kinda weird... the text sets you up to expect an actual grammar for language-tag, but then you just get a cross-reference. I'd rearrange stuff to:

... HTTP uses language tags within the Accept-Language and Content-Language fields.

language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

A language tag is composed of one or more parts: A primary language subtag followed by a possibly empty series of subtags. White space is not allowed within the tag and all tags are case-insensitive. Example tags include:

en, en-US, es-419, az-Arab, x-pig-latin, man-Nkoo-GN

See [RFC5646] for further information.

(also dropping the language-subtag-registry ref, since that's covered by the "See [RFC5646]")
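
A minimal Python sketch of the two properties the suggested text calls out (no white space inside a tag, and case-insensitive comparison). The helper names are hypothetical, and the sketch does not validate tags against the RFC 5646 grammar or registry.

```python
from typing import List


def has_whitespace(tag: str) -> bool:
    """Return True if the tag contains any white space, which is not allowed."""
    return any(ch.isspace() for ch in tag)


def same_tag(a: str, b: str) -> bool:
    """Compare two language tags case-insensitively."""
    return a.lower() == b.lower()


def subtags(tag: str) -> List[str]:
    """Split a tag into its primary subtag and any following subtags."""
    return tag.split("-")


# Example tags taken from the suggested text above.
examples = ["en", "en-US", "es-419", "az-Arab", "x-pig-latin", "man-Nkoo-GN"]
for tag in examples:
    print(tag, subtags(tag), has_whitespace(tag))
```

Splitting on "-" is only enough to show the "primary subtag followed by a possibly empty series of subtags" shape; real matching would follow RFC 5646 and RFC 4647.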

3.4. Content Negotiation

(such as when many different formats are supported by a user-agent),

no hyphen

3.4.1. Proactive Negotiation

If the selection of the best representation for a response is made by an algorithm located at the server, it is called proactive negotiation.

That text doesn't motivate the new name. How about:

If the selection of the best representation for a response is made by the server based on preferences indicated by the user agent in its initial request for the resource, it is called proactive negotiation.

  1. It might limit a public cache's ability to use the same response for multiple user's requests.

users' not user's

For example, the origin server might not implement proactive negotiation, or it might decide that sending a response that doesn't conform to them is better than sending a 406 (Not Acceptable) response.

Not clear what "them" is. "...that doesn't conform to the user agent's preferences..."

3.4.2. Reactive Negotiation

This specification defines the 300 (Multiple Choices) and 406 (Not Acceptable) status codes for enabling reactive negotiation when the server is unwilling or unable to provide a varying response using proactive negotiation.

406 doesn't really "enable reactive negotiation". It just fails to do proactive negotiation.

Also, should we mention how reactive negotiation is *actually* done?

This specification defines the 300 (Multiple Choices) status code for enabling reactive negotiation. However, in practice, Web sites wanting to do reactive negotiation will just return a successful response containing a "default" (or proactively negotiated) representation of the resource, which includes within it links that the user can follow to reach other representations.

4. Product Tokens

By convention, the products are listed in order of their significance for identifying the application.

"...in *decreasing* order of...", or something like that. (likewise in the description of User-Agent in 6.5.3 and Server in 8.4.2)

5.2.2. Idempotent Methods

Section 6.2.2.1 of Part1 implies that the concept of "idempotent sequences of request methods" (as opposed to merely "idempotent methods") will be discussed here, but it's not. I'm not sure if it should be added here or there.

5.3.1. GET

The semantics of the GET method change to a "partial GET" if the request message includes a Range header field ([Part5]).

"a Range or If-Range header field"

5.3.6. CONNECT

Though obvious, it seems like for consistency's sake, this should end with:

Responses to the CONNECT method are not cacheable.

5.3.7. OPTIONS

If no payload body is included, the response MUST include a Content-Length field with a field-value of "0".

Does this actually mean to prohibit servers from using chunked encoding (or "Connection: close" with no Content-Length) in that case? Or is it just supposed to be a reminder that "empty message body" is different from "no message body"?

(Section 9.1.2 has basically the same text.)

If no Max-Forwards field is present in the request, then the forwarded request MUST NOT include a Max-Forwards field.

"If no Max-Forwards field is present in the upstream request, then the downstream request MUST NOT include a Max-Forwards field."
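
The upstream/downstream wording can be sketched in Python as follows. This is an illustration only: the function name and the plain-dict header model (exact-case keys) are assumptions, and the decrement and zero handling come from the general definition of Max-Forwards rather than from the sentence quoted here.

```python
from typing import Dict, Optional


def downstream_headers(upstream: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Build the headers for the forwarded (downstream) request.

    Returns None when the request must not be forwarded any further.
    """
    headers = dict(upstream)
    raw = headers.get("Max-Forwards")
    if raw is None:
        # Absent upstream: the downstream request must not gain the field.
        return headers
    remaining = int(raw)
    if remaining == 0:
        # The recipient answers as the final recipient instead of forwarding.
        return None
    # Present and greater than zero: forward with a decremented value.
    headers["Max-Forwards"] = str(remaining - 1)
    return headers
```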

6.2. Conditionals

The HTTP/1.1 conditional request mechanisms are defined in [Part4].

"and [Part5]" (If-Range)

6.3. Content Negotiation

6.1 and 6.2 had some introductory text before the table, and it seems weird to not have that here.

(6.4 and 6.5 have the same problem)

6.3.1. Quality Values

Should this section be called "Weight" now?

6.3.5. Accept-Language

would mean: "I prefer Danish, but will accept British English and other types of English". (see also Section 2.3 of [RFC4647])

Capitalize "See"
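
For readers unfamiliar with how weights produce that reading, here is a rough Python sketch. The header value "da, en-gb;q=0.8, en;q=0.7" is assumed to be the example the quoted sentence refers to, the function name is hypothetical, and the parser ignores error handling and language-range matching.

```python
from typing import List, Tuple


def parse_accept_language(value: str) -> List[Tuple[str, float]]:
    """Return (language-range, weight) pairs, highest weight first."""
    prefs = []
    for position, item in enumerate(value.split(",")):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        weight = 1.0  # a range without a q parameter has weight 1
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                weight = float(raw.strip())
        prefs.append((parts[0], weight, position))
    # Sort by descending weight; equal weights keep their listed order.
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(tag, weight) for tag, weight, _ in prefs]


print(parse_accept_language("da, en-gb;q=0.8, en;q=0.7"))
```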

7. Response Status Codes

The status-code element is a 3-digit integer result code of the attempt to understand and satisfy the request.

"...a 3-digit integer code giving the result of the attempt..."

o 2xx (Successful): The action was successfully received, understood, and accepted

"The *request* was successfully..."

7.1. Overview of Status Codes

The reason phrases listed here are only recommendations -- they can be replaced by local equivalents without affecting the protocol.

That suggests you can/should translate them into other languages, which isn't really what they're for and kind of contradicts p1 3.1.2's "A client SHOULD ignore the reason-phrase content."

| 415 | Unsupported Media Type | Section 7.5.13 |
| 416 | Requested range not satisfiable | Section 3.2 of [Part5] |
| 417 | Expectation Failed | Section 7.5.14 |

The capitalization of "Requested range not satisfiable" is inconsistent with the rest of the table.

7.2. Informational 1xx

A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect a 100 (Continue) status message.

No reason to call out 100 Continue specifically here... "A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect one."
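
In client terms, the proposed wording amounts to something like the Python sketch below. It models a response stream as a plain sequence of status codes, which is an assumption made for illustration, and the function name is hypothetical. It also does not treat 101 (Switching Protocols) specially.

```python
from typing import Iterable, List, Tuple


def read_final_status(statuses: Iterable[int]) -> Tuple[int, List[int]]:
    """Skip any number of interim 1xx responses and return the final one."""
    interim = []
    for code in statuses:
        if 100 <= code <= 199:
            # Interim response: accept it whether or not it was expected.
            interim.append(code)
            continue
        return code, interim
    raise ValueError("stream ended before a final response arrived")
```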

7.3.2. 201 Created

If the newly created resource's URI is the same as the Effective Request URI, this information can be omitted

"effective request URI" is not capitalized like that anywhere else. (Well, except for once more later on in this section which should also be fixed.)

If the action cannot be carried out immediately, the server SHOULD respond with 202 (Accepted) response instead.

"with *a* 202 (Accepted) response"

8.1.1.2. Date

  1. If the response status code is 100 (Continue) or 101 (Switching Protocols), the response MAY include a Date header field, at the server's option.

Is that really supposed to be limited to 100 and 101, and not other 1xx codes?

8.1.3. Retry-After

This field MAY also be used with any 3xx (Redirection) response to indicate the minimum time the user-agent is asked to wait

No hyphen in "user agent"

8.4.1. Allow

Allow = #method

Should that be 1#method? If not, it should explain what an empty "Allow" header means.
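
The difference between the two rules is easiest to see with a small parser. The Python sketch below is illustrative only and the function name is hypothetical. Under "#method" an empty field value is syntactically valid and yields an empty list, whereas "1#method" would require at least one method.

```python
from typing import List


def parse_allow(value: str) -> List[str]:
    """Split an Allow field value into method names, ignoring empty elements."""
    return [m.strip() for m in value.split(",") if m.strip()]


def satisfies_one_or_more(value: str) -> bool:
    """Check the stricter 1#method reading: at least one method is listed."""
    return len(parse_allow(value)) >= 1


print(parse_allow("GET, HEAD, PUT"))
print(parse_allow(""))
```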

9.1.1. Procedure

HTTP method registrations MUST include the following fields:

Should "cacheability" be an explicit field (rather than just a required part of the specification text)?

9.3. Header Field Registry

It seems weird to have this in p2 since p1 defines headers too...

9.3.1. Considerations for New Header Fields

o Whether it is appropriate to list the field-name in the Connection header field (i.e., if the header field is to be hop-by-hop, see Section 6.1 of [Part1]).

should have a semicolon rather than comma after "hop-by-hop". (So that it doesn't read like it's telling you to only follow the xref if the header field is hop-by-hop.)

10.1. Transfer of Sensitive Information

Four header fields are worth special mention in this context: Server, Via, Referer and From.

"Via" is in p1 though, so the Via bits should be moved to p1's Security Considerations? (Or maybe if we end up with a p0, all of the security considerations should be consolidated there.)

The information sent in the From field might conflict with the user's privacy interests or their site's security policy, and hence it SHOULD NOT be transmitted without the user being able to disable, enable, and modify the contents of the field. The user MUST be able to set the contents of this field within a user preference or application defaults configuration.

Do any browsers actually ever send the "From" header? If not, should we just say "From is for robots, not browsers"?

Appendix C. Changes from RFC 2616

Remove base URI setting semantics for "Content-Location" due to poor implementation support, which was caused by too many broken servers emitting bogus Content-Location header fields, and also the potentially undesirable effect of potentially breaking relative links in content-negotiated resources. (Section 3.1.4.2)

That would parse better if the "which was..." clause was parenthesized rather than just set off by commas.

Failed to consider that there are many other request methods that are safe to automatically redirect, and further that the user agent is able to make that determination based on the request method semantics.

This is written in the opposite style from the rest of the list (it describes the problem with 2616 rather than the solution in httpbis). Should be something like:

Allow automatic redirection of all "safe" methods, not just GET and HEAD, and give the user agent more latitude in redirecting unsafe methods. (Section 7.4)

Change History

comment:1 Changed 2 years ago by fielding@gbiv.com

From [2113]:

(editorial) make section on language tags more concise, since we already delegate the definition to RFC5646; partly addresses #426

comment:2 Changed 2 years ago by fielding@gbiv.com

From [2114]:

(editorial) improve description of 300 and 406 in reactive negotiation; partly addresses #426

comment:3 Changed 2 years ago by fielding@gbiv.com

From [2115]:

(editorial) product tokens listed in decreasing order; partly addresses #426

comment:4 Changed 2 years ago by fielding@gbiv.com

3.1. Representation Metadata

Expires | Section 7.3 of [Part6] |

If "Expires" is considered "representation metadata", then it seems like "ETag" and "Last-Modified" should be as well. But I think it would make more sense to just remove "Expires" from the list; it's clearly the odd man out here.

Moved to control data in [2092].

3.1.1.2. Character Encodings (charset)

Implementers need to be aware of IETF character set requirements [RFC3629] [RFC2277].

It's not clear what requirements this is referring to; RFC 2277 places requirements on protocol authors, not on implementors, and RFC 3629 is just the definition of UTF-8. If the requirement is "implementations MUST support UTF-8" then we should say that.

Removed in [1975].

3.1.1.4. Multipart Types

In general, HTTP treats a multipart message body no differently than any other media type: strictly as payload. HTTP does not use the multipart boundary as an indicator of message body length. In all other respects, an HTTP user agent SHOULD follow the same or similar behavior as a MIME user agent would upon receipt of a multipart type.

That last part seems completely wrong; a web browser is not expected to handle multipart/alternative or multipart/related in the way a mail reader would. (This requirement came from RFC 2616, but... it was wrong then too.)

It was right back in the days of Mosaic for X. It isn't implemented by browsers today. Removed in [2050].

The MIME header fields within each body-part of a multipart message body do not have any significance to HTTP beyond that defined by their MIME semantics.

This is not true of multipart/byteranges; in RFC 2616 that was explained separately, but that explanation got lost in httpbis rewrites at some point.

Suggested rewrite for the second and third paragraphs:

In general, HTTP treats a multipart message body no differently than any other media type: strictly as payload. The one exception is the "multipart/byteranges" type (Appendix A of [Part5]) when it appears in a 206 (Partial Content) response. In all other cases, the MIME header fields within each body-part of a multipart message body do not have any significance at the HTTP level; they are just part of the representation data.

(This drops the newly-added "HTTP does not use the multipart boundary as an indicator of message body length", but that is already implied by the removal of 2616's prohibition on epilogue data; if the multipart is allowed to have an epilogue, then the final boundary doesn't indicate the end of the body anyway. It also drops the "unrecognized multipart subtype" text, which was already irrelevant given the "strictly as payload" rule anyway.)

A similar rewrite was done in [2050].

3.1.3.1. Language Tags

In summary, a language tag is composed of one or more parts: A primary language subtag followed by a possibly empty series of subtags:

language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

Kinda weird... the text sets you up to expect an actual grammar for language-tag, but then you just get a cross-reference. I'd rearrange stuff to:

... HTTP uses language tags within the Accept-Language and Content-Language fields.

language-tag = <Language-Tag, defined in [RFC5646], Section 2.1>

A language tag is composed of one or more parts: A primary language subtag followed by a possibly empty series of subtags. White space is not allowed within the tag and all tags are case-insensitive. Example tags include:

en, en-US, es-419, az-Arab, x-pig-latin, man-Nkoo-GN

See [RFC5646] for further information.

(also dropping the language-subtag-registry ref, since that's covered by the "See [RFC5646]")

Done in [2113].

3.4. Content Negotiation

(such as when many different formats are supported by a user-agent),

no hyphen

Fixed already (and then rewritten later in [2050]).

3.4.1. Proactive Negotiation

If the selection of the best representation for a response is made by an algorithm located at the server, it is called proactive negotiation.

That text doesn't motivate the new name. How about:

If the selection of the best representation for a response is made by the server based on preferences indicated by the user agent in its initial request for the resource, it is called proactive negotiation.

Rewritten in [2050].

  1. It might limit a public cache's ability to use the same response for multiple user's requests.

users' not user's

Rewritten in [2050].

For example, the origin server might not implement proactive negotiation, or it might decide that sending a response that doesn't conform to them is better than sending a 406 (Not Acceptable) response.

Not clear what "them" is. "...that doesn't conform to the user agent's preferences..."

Done in [2050].

3.4.2. Reactive Negotiation

This specification defines the 300 (Multiple Choices) and 406 (Not Acceptable) status codes for enabling reactive negotiation when the server is unwilling or unable to provide a varying response using proactive negotiation.

406 doesn't really "enable reactive negotiation". It just fails to do proactive negotiation.

Fixed in [2114].

Also, should we mention how reactive negotiation is *actually* done?

This specification defines the 300 (Multiple Choices) status code for enabling reactive negotiation. However, in practice, Web sites wanting to do reactive negotiation will just return a successful response containing a "default" (or proactively negotiated) representation of the resource, which includes within it links that the user can follow to reach other representations.

I have mentioned other patterns in the parent section and within the 300 code.

4. Product Tokens

By convention, the products are listed in order of their significance for identifying the application.

"...in *decreasing* order of...", or something like that. (likewise in the description of User-Agent in 6.5.3 and Server in 8.4.2)

Fixed in [2115].

... more later ...

comment:5 Changed 2 years ago by fielding@gbiv.com

From [2116]:

rewrite the sections on retrying requests and pipelining to resolve nonsense about non-idempotent sequences; partly addresses #426

comment:6 Changed 2 years ago by fielding@gbiv.com

From [2118]:

reorder paragraphs in method descriptions for consistency; note that CONNECT is not cacheable; partly addresses #426

comment:7 Changed 2 years ago by fielding@gbiv.com

From [2119]:

Accept-Language: clean up prose and note descending order of priority for equal weights (as defined in RFC4647 and original HTTP); partly addresses #426

comment:8 Changed 2 years ago by fielding@gbiv.com

From [2120]:

(editorial) add section intros; partly addresses #426

comment:9 Changed 2 years ago by fielding@gbiv.com

5.2.2. Idempotent Methods

Section 6.2.2.1 of Part1 implies that the concept of "idempotent sequences of request methods" (as opposed to merely "idempotent methods") will be discussed here, but it's not. I'm not sure if it should be added here or there.

Rewritten there in p1 [2116].

5.3.1. GET

The semantics of the GET method change to a "partial GET" if the request message includes a Range header field ([Part5]).

"a Range or If-Range header field"

No, If-Range has no meaning without Range.

5.3.6. CONNECT

Though obvious, it seems like for consistency's sake, this should end with:

Responses to the CONNECT method are not cacheable.

*sigh* [2118].

5.3.7. OPTIONS

If no payload body is included, the response MUST include a Content-Length field with a field-value of "0".

Does this actually mean to prohibit servers from using chunked encoding (or "Connection: close" with no Content-Length) in that case? Or is it just supposed to be a reminder that "empty message body" is different from "no message body"?

(Section 9.1.2 has basically the same text.)

Yes, they were designed to require a specific indicator of no body for the sake of persistent connections.

If no Max-Forwards field is present in the request, then the forwarded request MUST NOT include a Max-Forwards field.

"If no Max-Forwards field is present in the upstream request, then the downstream request MUST NOT include a Max-Forwards field."

Already rephrased in [2064].

6.2. Conditionals

The HTTP/1.1 conditional request mechanisms are defined in [Part4].

"and [Part5]" (If-Range)

That is noted in Part4.

6.3. Content Negotiation

6.1 and 6.2 had some introductory text before the table, and it seems weird to not have that here.

(6.4 and 6.5 have the same problem)

Fixed in prior edits and [2020].

6.3.1. Quality Values

Should this section be called "Weight" now?

I don't think so, mostly for historical reasons.

6.3.5. Accept-Language

would mean: "I prefer Danish, but will accept British English and other types of English". (see also Section 2.3 of [RFC4647])

Capitalize "See"

Led to a larger rewrite in [2119].

... more later ...

comment:10 Changed 2 years ago by fielding@gbiv.com

From [2122]:

(editorial) explain empty Allow field for 405; misc typos; partly addresses #426

comment:11 Changed 2 years ago by fielding@gbiv.com

7. Response Status Codes

The status-code element is a 3-digit integer result code of the attempt to understand and satisfy the request.

"...a 3-digit integer code giving the result of the attempt..."

o 2xx (Successful): The action was successfully received, understood, and accepted

"The *request* was successfully..."

Both fixed by Julian [1964].

7.1. Overview of Status Codes

The reason phrases listed here are only recommendations -- they can be replaced by local equivalents without affecting the protocol.

That suggests you can/should translate them into other languages, which isn't really what they're for and kind of contradicts p1 3.1.2's "A client SHOULD ignore the reason-phrase content."

They can be (and often are) localized in practice. The client SHOULD ignore them, yes, but that doesn't mean servers don't have to respect local requirements regarding their own language use.

| 415 | Unsupported Media Type | Section 7.5.13 |
| 416 | Requested range not satisfiable | Section 3.2 of [Part5] |
| 417 | Expectation Failed | Section 7.5.14 |

The capitalization of "Requested range not satisfiable" is inconsistent with the rest of the table.

Fixed by Julian [1964]. I've shortened it to Range Not Satisfiable.

7.2. Informational 1xx

A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect a 100 (Continue) status message.

No reason to call out 100 Continue specifically here... "A client MUST be prepared to accept one or more 1xx status responses prior to a regular response, even if the client does not expect one."

Yep, fixed in [2122].

7.3.2. 201 Created

If the newly created resource's URI is the same as the Effective Request URI, this information can be omitted

"effective request URI" is not capitalized like that anywhere else. (Well, except for once more later on in this section which should also be fixed.)

Fixed in [2105].

If the action cannot be carried out immediately, the server SHOULD respond with 202 (Accepted) response instead.

"with *a* 202 (Accepted) response"

Fixed by Julian [1964].

8.1.1.2. Date

  1. If the response status code is 100 (Continue) or 101 (Switching Protocols), the response MAY include a Date header field, at the server's option.

Is that really supposed to be limited to 100 and 101, and not other 1xx codes?

No, already rewritten to fix that.

8.1.3. Retry-After

This field MAY also be used with any 3xx (Redirection) response to indicate the minimum time the user-agent is asked to wait

No hyphen in "user agent"

Fixed by Julian.

8.4.1. Allow

Allow = #method

Should that be 1#method? If not, it should explain what an empty "Allow" header means.

Yes, explained in [2122].

9.1.1. Procedure

HTTP method registrations MUST include the following fields:

Should "cacheability" be an explicit field (rather than just a required part of the specification text)?

We discussed this in another issue and decided that it was too complex an issue for a simple checkmark.

9.3. Header Field Registry

It seems weird to have this in p2 since p1 defines headers too...

A registry is primarily for linking from name to semantics.

9.3.1. Considerations for New Header Fields

o Whether it is appropriate to list the field-name in the Connection header field (i.e., if the header field is to be hop-by-hop, see Section 6.1 of [Part1]).

should have a semicolon rather than comma after "hop-by-hop". (So that it doesn't read like it's telling you to only follow the xref if the header field is hop-by-hop.)

Fixed by Julian [1964].

10.1. Transfer of Sensitive Information

Four header fields are worth special mention in this context: Server, Via, Referer and From.

"Via" is in p1 though, so the Via bits should be moved to p1's Security Considerations? (Or maybe if we end up with a p0, all of the security considerations should be consolidated there.)

I think it belongs here.

The information sent in the From field might conflict with the user's privacy interests or their site's security policy, and hence it SHOULD NOT be transmitted without the user being able to disable, enable, and modify the contents of the field. The user MUST be able to set the contents of this field within a user preference or application defaults configuration.

Do any browsers actually ever send the "From" header? If not, should we just say "From is for robots, not browsers"?

I rewrote this in [2054].

Appendix C. Changes from RFC 2616

Remove base URI setting semantics for "Content-Location" due to poor implementation support, which was caused by too many broken servers emitting bogus Content-Location header fields, and also the potentially undesirable effect of potentially breaking relative links in content-negotiated resources. (Section 3.1.4.2)

That would parse better if the "which was..." clause was parenthesized rather than just set off by commas.

Fixed by Julian and then rewritten again by me in [2083].

Failed to consider that there are many other request methods that are safe to automatically redirect, and further that the user agent is able to make that determination based on the request method semantics.

This is written in the opposite style from the rest of the list (it describes the problem with 2616 rather than the solution in httpbis). Should be something like:

Allow automatic redirection of all "safe" methods, not just GET and HEAD, and give the user agent more latitude in redirecting unsafe methods. (Section 7.4)

Rewritten in [2083].

comment:12 Changed 2 years ago by fielding@gbiv.com

  • Status changed from new to closed
  • Resolution set to incorporated

Thanks for your detailed comments; all have been addressed or explained above.

comment:13 Changed 2 years ago by fielding@gbiv.com

  • Milestone changed from unassigned to 22