Moving text from HTML's web storage into the Storage Standard #95

Open
annevk opened this issue May 29, 2020 · 14 comments

Comments

@annevk
Member

annevk commented May 29, 2020

As some of the advice seems a bit dated, I'm opening this issue to track and discuss what kind of changes we might be making. The relevant PRs are whatwg/html#5560 and #93.

@annevk
Member Author

annevk commented May 29, 2020

> User agents should not expire data from a browsing context's session storage areas, but may do so when the user requests that such data be deleted, or when the UA detects that it has limited storage space, or for security reasons. User agents should always avoid deleting data while a script that could access that data is running. When a top-level browsing context is destroyed (and therefore permanently inaccessible to the user) the data stored in its session storage areas can be discarded with it, as the API described in this specification provides no way for that data to ever be subsequently retrieved.

I think this mostly follows from the requirements we already have around UI, storage pressure, and tying storage to specific objects, except for the script running thingie.

> User agents should expire data from the local storage areas only for security reasons or when requested to do so by the user. User agents should always avoid deleting data while a script that could access that data is running.

This doesn't list sufficient reasons, e.g., storage pressure, but fortunately the Storage Standard covers that already. Script running is important though.

annevk added a commit that referenced this issue May 29, 2020
@annevk
Member Author

annevk commented May 29, 2020

> Disk space

> User agents should limit the total amount of space allowed for storage areas, because hostile authors could otherwise use this feature to exhaust the user's available disk space.

> User agents should guard against sites storing data under their origin's other affiliated sites, e.g., storing up to the limit in a1.example.com, a2.example.com, a3.example.com, etc, circumventing the main example.com storage limit.

> User agents may prompt the user when quotas are reached, allowing the user to grant a site more space. This enables sites to store many user-created documents on the user's computer, for instance.

> User agents should allow users to see how much space each domain is using.

> A mostly arbitrary limit of five megabytes per origin is suggested. Implementation feedback is welcome and will be used to update this suggestion in the future.

> For predictability, quotas should be based on the uncompressed size of data stored.

https://www.w3.org/Bugs/Public/show_bug.cgi?id=21319#c3 is linked in the source for the last paragraph, which makes me question how much buy-in it has.

As part of Storage Standard discussions we have discussed origin vs site as well, and my recollection is that in general we don't really want to put sites on a pedestal and instead encourage mitigations that also work against a bad actor that has 10k to buy some registrable domains (or uses github.io or some such).

I think the other requirements are already captured by the existing text. The one exception is the 5 MiB limit. We might well want to keep that for localStorage/sessionStorage, but the current infrastructure doesn't cleanly allow for it. #69 might help with this I suppose.

@inexorabletash
Member

Refactoring things out of the HTML monolith? 🎉

> Script running is important though.

Yep... but those two sentences are somewhat in conflict, so probably deserve more nuance. If the user clears data, it should happen even if script is running. To do so properly may require stopping script from running, so it's not wrong, just complicated.

Re: compression - yeah, I think that ship has sailed. Chrome reports actual quota usage regardless of compression or overhead (e.g., write-ahead logs, etc., prior to compaction).

Re: origin vs. site - there's also the challenges of preventing cross-origin information leakage which we probably need to document in Storage at some point.

Re: 5 MiB limit - FWIW, I think Chrome is at 10 MiB (@mkruisselbrink ?). Agreed that we should keep the limit for localStorage/sessionStorage. I think it would be fine for any storage endpoint to impose its own bottle size limit in addition to a bucket limit.
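To make "a bottle size limit in addition to a bucket limit" concrete, here is a minimal TypeScript sketch. It is a toy model, not the Storage Standard's algorithm or any browser's implementation: `ToyBucket`, `ToyBottle`, and `QuotaExceeded` are hypothetical names, and it assumes usage is measured as uncompressed UTF-16 (2 bytes per code unit of key plus value), which is not how Chrome accounts for usage per the comment above.

```typescript
// Toy model of a bucket whose bottles may each carry their own quota.
// All names here (ToyBucket, ToyBottle, QuotaExceeded) are hypothetical.

class QuotaExceeded extends Error {
  constructor(message: string) {
    super(message);
    this.name = "QuotaExceeded";
  }
}

// Assumption: an entry costs its uncompressed UTF-16 size,
// i.e. 2 bytes per code unit of the key plus the value.
function entryCost(key: string, value: string): number {
  return 2 * (key.length + value.length);
}

class ToyBottle {
  private readonly entries = new Map<string, string>();

  constructor(private readonly bucket: ToyBucket, readonly quota: number) {}

  usage(): number {
    let total = 0;
    this.entries.forEach((value, key) => {
      total += entryCost(key, value);
    });
    return total;
  }

  // A write must fit both the bottle's own limit and the bucket's limit.
  set(key: string, value: string): void {
    const previous = this.entries.get(key);
    const delta =
      entryCost(key, value) - (previous === undefined ? 0 : entryCost(key, previous));
    if (this.usage() + delta > this.quota) {
      throw new QuotaExceeded("bottle quota exceeded");
    }
    if (this.bucket.usage() + delta > this.bucket.quota) {
      throw new QuotaExceeded("bucket quota exceeded");
    }
    this.entries.set(key, value);
  }
}

class ToyBucket {
  private readonly bottles = new Map<string, ToyBottle>();

  constructor(readonly quota: number) {}

  // Returns the named bottle, creating it with the given quota if needed.
  bottle(name: string, quota: number): ToyBottle {
    let b = this.bottles.get(name);
    if (b === undefined) {
      b = new ToyBottle(this, quota);
      this.bottles.set(name, b);
    }
    return b;
  }

  // Bucket usage is the sum of its bottles' usage.
  usage(): number {
    let total = 0;
    this.bottles.forEach((b) => {
      total += b.usage();
    });
    return total;
  }
}
```

For example, with an illustrative 5 MiB `localStorage` bottle inside an 8 MiB bucket, a value of 3 Mi code units is rejected by the bottle limit even though the bucket as a whole still has room. Checking the bottle first and the bucket second is an arbitrary ordering choice in this sketch.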

annevk added a commit that referenced this issue Jun 4, 2020
And use this to port HTML's 5 MiB suggestion.

See #95.
@domenic
Member

domenic commented Jun 5, 2020

In general I find the text in HTML much clearer and more detailed than what is in Storage today. So let me try to raise some specific concerns in this issue, before we proceed with merging whatwg/html#5560.

> User agents should not expire data from a browsing context's session storage areas, but may do so when the user requests that such data be deleted, or when the UA detects that it has limited storage space, or for security reasons. User agents should always avoid deleting data while a script that could access that data is running. When a top-level browsing context is destroyed (and therefore permanently inaccessible to the user) the data stored in its session storage areas can be discarded with it, as the API described in this specification provides no way for that data to ever be subsequently retrieved.

> I think this mostly follows from the requirements we already have around UI, storage pressure, and tying storage to specific objects, except for the script running thingie.

I have bolded a sentence for which I cannot find a counterpart in Storage (after #93).

> User agents should limit the total amount of space allowed for storage areas, because hostile authors could otherwise use this feature to exhaust the user's available disk space.

> User agents should guard against sites storing data under their origin's other affiliated sites, e.g., storing up to the limit in a1.example.com, a2.example.com, a3.example.com, etc, circumventing the main example.com storage limit.

> User agents may prompt the user when quotas are reached, allowing the user to grant a site more space. This enables sites to store many user-created documents on the user's computer, for instance.

> User agents should allow users to see how much space each domain is using.

> A mostly arbitrary limit of five megabytes per origin is suggested. Implementation feedback is welcome and will be used to update this suggestion in the future.

> For predictability, quotas should be based on the uncompressed size of data stored.

> I think the other requirements are already captured by the existing text.

I have bolded sentences for which I cannot find counterparts in Storage (after #93). It sounds like maybe you disagree with sentence 2, and I believe Chrome does too (our most-starred bug is about guarding against this "attack", and I think we've decided to not address it). So that one perhaps should be dropped. But the others seem valuable.

@annevk
Member Author

annevk commented Jun 5, 2020

  1. This is covered by https://whatpr.org/storage/93.html#management and being much more explicit about when data is created. The same rules as elsewhere apply. Unless anything suggests it's allowed, it's not.
  2. What it calls storage areas are storage bottles and they are still restricted as per Add the ability for bottles to have a quota #97. Making that advice more general would warrant more discussion. In general I like the balance between https://whatpr.org/storage/93.html#usage-and-quota and https://whatpr.org/storage/93.html#storage-pressure giving user agents more ability to let applications get hold of more storage if they need it (as well as removing things if they don't).
  3. I covered this upthread.
  4. I don't think we should be prompting the user and this model isn't really workable for a site that wants to store a lot of data. It gets an exception, then the user gets prompted, but how does the site know?
  5. Allowing users to clear sites is covered by https://whatpr.org/storage/93.html#ui-guidelines and I think that's good enough. I'm not convinced we should make the UI requirements that specific.
  6. Discussed above.

annevk added a commit that referenced this issue Jun 5, 2020
And use this to port HTML's 5 MiB suggestion.

See #95.
annevk added a commit that referenced this issue Jun 5, 2020
"legacy-clone a browsing session storage shed" can be used by HTML to define creation of auxiliary browsing contexts, as part of whatwg/html#5560.

"obtain a storage key" can be used by APIs that share keying logic with storage, such as BroadcastChannel and shared workers. See whatwg/html#3054. It's potentially also useful for Indexed DB as discussed in w3c/IndexedDB#334.

Also helps a bit with #95 by reorganizing and adding some more detail to how a user agent is supposed to manage storage.

Closes #92.
@domenic
Member

domenic commented Jun 9, 2020

> 1. This is covered by https://whatpr.org/storage/93.html#management and being much more explicit about when data is created. The same rules as elsewhere apply. Unless anything suggests it's allowed, it's not.

I can't find anything in Storage that says user agents are allowed to clear session storage upon user request or for security reasons. I can only find storage pressure.

> 2. What it calls storage areas are storage bottles and they are still restricted as per Add the ability for bottles to have a quota #97. Making that advice more general would warrant more discussion. In general I like the balance between https://whatpr.org/storage/93.html#usage-and-quota and https://whatpr.org/storage/93.html#storage-pressure giving user agents more ability to let applications get hold of more storage if they need it (as well as removing things if they don't).

The sentence (2) talks about a limit on the total size consumed by all bottles. I cannot find a counterpart in Storage.

> 3. I covered this upthread.

Are you referring to

> As part of Storage Standard discussions we have discussed origin vs site as well, and my recollection is that in general we don't really want to put sites on a pedestal and instead encourage mitigations that also work against a bad actor that has 10k to buy some registrable domains (or uses github.io or some such).

If so, I'd like to get general confirmation that we want to remove this restriction. As I said above, it seems like Chrome does not intend to implement the restriction, and I guess maybe you're saying Firefox doesn't either?

Additionally, if we're really moving to a world where "Unless anything suggests it's allowed, it's not", then it'd be good to ensure that nobody is doing this today, since such a mitigation would become disallowed.

> 4. I don't think we should be prompting the user and this model isn't really workable for a site that wants to store a lot of data. It gets an exception, then the user gets prompted, but how does the site know?

This seems reasonable, but I'd like us to note it as a normative change, since we're disallowing user agents from doing something. We should also ensure that no user agents are currently doing it.

> 5. Allowing users to clear sites is covered by https://whatpr.org/storage/93.html#ui-guidelines and I think that's good enough. I'm not convinced we should make the UI requirements that specific.

I'm hesitant to remove user-friendly normative text from the specification.

> 6. Discussed above.

Should we add the opposite statement? This seems important for interoperability... otherwise the 5 MiB quota is pretty meaningless.

@annevk
Member Author

annevk commented Jul 6, 2020

1:

> Instead user agents should offer users the ability to clear all storage for each group of schemelessly same site origins.

2:

> The storage quota of a storage shelf is a conservative estimate of the total amount of bytes it can hold. This amount should be less than the total available storage space on the device to give users some wiggle room.

I guess I could add implementation-defined to that: #108.

3: Firefox currently has this restriction, but we are thinking of removing it. As quotas are implementation-defined as per 2, I don't think we would be really disallowing anything. We're just not putting specific solutions in the specification that might not be ideal.

4: Does it matter if implementers agree it's a bad model? Exhaustively testing this across all APIs feels like a waste of time if we all want to move past it.

5: How is requiring particular UI user-friendly? I'm not sure we even know what the best UI would be here in our future partitioned world.

6: Unless we define the cost of each operation exactly we're not going to get interoperability, so I'd rather have it be somewhat vague as I'm not sure we want to commit on how to serialize code units and such.
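To illustrate why "the cost of each operation" depends on serialization choices, the following TypeScript sketch computes two plausible sizes for the same stored string: one counts UTF-16 code units at 2 bytes each, the other counts UTF-8 bytes. Neither rule is claimed to be what any browser does; `utf16Bytes` and `utf8Bytes` are hypothetical helpers.

```typescript
// Hypothetical cost rule A: uncompressed UTF-16, 2 bytes per code unit.
function utf16Bytes(s: string): number {
  return 2 * s.length; // JS string length counts UTF-16 code units
}

// Hypothetical cost rule B: length of the UTF-8 encoding.
// Lone surrogates are counted as U+FFFD (3 bytes), as a UTF-8 encoder replaces them.
function utf8Bytes(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1; // ASCII
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair -> one supplementary code point
        i++;
      } else {
        bytes += 3; // lone high surrogate -> U+FFFD
      }
    } else {
      bytes += 3; // remaining BMP code points, or a lone surrogate -> U+FFFD
    }
  }
  return bytes;
}
```

Under these two rules, `"hello"` costs 10 versus 5 bytes, `"\u20ac"` (the euro sign) costs 2 versus 3 bytes, and `"a\u00e9\ud83d\ude00"` costs 8 versus 7 bytes. Neither rule is uniformly cheaper, so two engines that pick different rules will hit the same nominal quota at different points.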

annevk added a commit that referenced this issue Jul 6, 2020
…e space

Also make it clear usage and quota for storage shelves are ultimately implementation-defined.

Helps with #95 and #70.

Closes #106.
annevk added a commit that referenced this issue Jul 6, 2020
Also make it clear usage and quota for storage shelves are ultimately implementation-defined.

Helps with #95 and #70.

Closes #106.
@domenic
Member

domenic commented Jul 6, 2020

1

The text you quoted loses the mention of why user agents should offer this, namely upon user request or for security reasons.

2

Agreed, thanks for pointing that out

> 3: Firefox currently has this restriction, but we are thinking of removing it. As quotas are implementation-defined as per 2, I don't think we would be really disallowing anything. We're just not putting specific solutions in the specification that might not be ideal.

I think this comes to the crux of the disagreement here. I don't think of this as putting things into the spec. I'm approaching this from the perspective of removing things from the spec. (At least, for whatwg/html#5560.) I'm not very comfortable removing things from the spec without some kind of confirmation, preferably multi-implementer.

> 4: Does it matter if implementers agree it's a bad model? Exhaustively testing this across all APIs feels like a waste of time if we all want to move past it.

Again, I'd love to see that implementer agreement expressed, if we're going to make normative removals. And I'd like to see tests, at least informal ones, for the specific API for which you're removing the spec text (localStorage/sessionStorage).

> 5: How is requiring particular UI user-friendly? I'm not sure we even know what the best UI would be here in our future partitioned world.

The text in question does not require a particular UI. It suggests (should) that a UI exist at all.

Again, it's a removal of a normative should statement, and I'd like to see multi-implementer agreement before removing normative statements.

> 6: Unless we define the cost of each operation exactly we're not going to get interoperability, so I'd rather have it be somewhat vague as I'm not sure we want to commit on how to serialize code units and such.

I still think adding the opposite statement would be better than leaving this ambiguous, especially if we're making a normative change in the spec. But I agree we're not going to get interop, so it's not a big deal.

@annevk
Member Author

annevk commented Jul 7, 2020

I think the problem is that you are putting a lot of weight on text that had no implementer commitments or tests to begin with. I agree we should have implementer agreement though, so maybe @inexorabletash can chime in.

1: I suppose we could allow removal for "security reasons", but it seems rather arbitrary. Do we actually have a concrete scenario where a user agent does this?

3: Consider Firefox confirmed.

4: Consider Firefox confirmed. We also only ever had it for IDB as far as I'm aware. I guess it should be fairly easy to write a demo for these APIs though so I'll give that a shot. Edit: https://software.hixie.ch/utilities/js/live-dom-viewer/?saved=8251 and https://software.hixie.ch/utilities/js/live-dom-viewer/?saved=8252. Interestingly enough Safari seems to have a much higher limit for sessionStorage and throws a RangeError when it's reached... Filed https://bugs.webkit.org/show_bug.cgi?id=214037 on that. Nobody shows UI though.

5: It does require a particular UI: it suggests that the UI should show storage size. The new text simply suggests you should be able to clear sites, not what you have to display about them.

@annevk
Member Author

annevk commented Jul 8, 2020

@pwnall @inexorabletash it would be nice to be able to move forward with this. Could you please share your thoughts?

annevk added a commit that referenced this issue Jul 8, 2020
Also make it clear usage and quota for storage shelves are ultimately implementation-defined.

Helps with #95 and #70.

Closes #106.
@pwnall
Contributor

pwnall commented Jul 10, 2020

Sorry for the delay here!

(FWIW, tracking multiple points across multiple replies has been difficult for me. Thank you @annevk for introducing a numbering scheme!)

  1. Eviction unit: I think that "Whenever a storage bucket is cleared by the user agent, it must be cleared in its entirety." (from Add hooks for HTML #93) says that the smallest unit of eviction is the bucket. This matches Chrome's perspective -- we think that more fine-grained eviction wouldn't be Web-compatible, and bucket-scoped eviction is Web-compatible. IOW, we think that the current phrasing is the most permissive we can get away with, and we like that.

  2. Total quota restrictions: Chrome agrees with Make it clear that quota is not supposed to be a function of availabl… #108 in that it's not currently safe to expose the amount of free space to applications. We think that web applications should not be more restricted than native applications, so we're happy with the current recommendation that total quota should be less than the total disk space. We don't see a good reason to be more restrictive.

  3. Per-origin / per-site quota restrictions: We don't have a good answer here. (I think that sharing quota usage across origins (even in the same site / eTLD+1 / whatever) would go against the Same Origin Policy, as it would allow an origin to learn information about another origin. At the same time, I think that having per-eTLD+1 quota would be attractive.) We don't see a good reason for the specification to say that implementations "should" mitigate a problem, given that we haven't found a solid mitigation yet. I think we should acknowledge this as an open problem in a spec issue.

  4. Prompts for quota increase: Chrome UX currently thinks that there is no good way to get an informed user decision on increasing quota via a prompt -- we haven't found a message that we'd be happy with. Therefore, we are not prompting, and don't have any plans to implement prompting. We are not opposed to leaving room in the spec for other implementations to prompt -- some experimentation might be good here. From a different angle, all storage APIs with non-trivial quotas are async, so browsers can theoretically prompt at any point. We don't see a good reason to prohibit that behavior in a spec.

  5. Storage management UI: Chrome currently happens to meet the requirement that "User agents should allow users to see how much space each domain is using." That being said, domain seems overly specific -- I don't have a good understanding of why "domain" is better than "origin", or "host", or "(first-party origin, third-party origin)". Chrome is not willing to commit to following this recommendation in the future. I think that the golden age of Web storage is yet to come, so I don't think we have the experience to be prescriptive around the right granularity for a storage management UI.

  6. Measuring quota usage: Chrome currently measures the space used by the underlying storage implementation. This isn't even guaranteed to be consistent across Chrome versions, as we may switch between internal representations, adopt better compression, etc. We think the spec should currently document this reality -- usage is implementation-specific. At the same time, we recognize the benefits of achieving interoperability here -- we received many developer complaints about the lack of predictability in this area. We would be willing to invest in a predictable, cross-browser quota measurement system, if the other engines would be likely to adopt it as well. This would be a significant undertaking for us, and it's only worthwhile if we achieve cross-browser interop.

I tried to separate clear Chrome position (Chrome / we) from the parts that may only represent personal opinion (I). Sorry for the shifts in voice resulting from this. Please don't hesitate to ask follow-up questions if my answers are not clear, or if I misunderstood the points being discussed here.
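The eviction-unit point (1), that a bucket is the smallest unit the user agent clears, can be pictured with a small TypeScript sketch in which whole buckets are cleared and a bucket is never partially emptied. This is a toy model only: the names are hypothetical, and the policy of skipping "persistent" buckets and clearing "best-effort" buckets least recently used first is an assumption made for illustration, not something stated in this thread.

```typescript
// Toy model of bucket-scoped eviction. All names are hypothetical.
type BucketMode = "best-effort" | "persistent";

interface EvictableBucket {
  name: string;
  mode: BucketMode;
  lastUsed: number; // assumed access timestamp, used only for ordering
  bottles: Map<string, Map<string, string>>; // bottle name -> key/value entries
}

function makeBucket(name: string, mode: BucketMode, lastUsed: number): EvictableBucket {
  return { name, mode, lastUsed, bottles: new Map() };
}

// Assumed cost: uncompressed UTF-16, 2 bytes per code unit of key plus value.
function bucketUsage(bucket: EvictableBucket): number {
  let total = 0;
  bucket.bottles.forEach((entries) => {
    entries.forEach((value, key) => {
      total += 2 * (key.length + value.length);
    });
  });
  return total;
}

// Clearing always removes every bottle in the bucket, never a subset.
function clearBucket(bucket: EvictableBucket): void {
  bucket.bottles.clear();
}

// Clears whole best-effort buckets, least recently used first, until total
// usage is at or below `target`. Returns the names of the cleared buckets.
function relieveStoragePressure(buckets: EvictableBucket[], target: number): string[] {
  let total = buckets.reduce((sum, b) => sum + bucketUsage(b), 0);
  const candidates = buckets
    .filter((b) => b.mode === "best-effort")
    .sort((a, b) => a.lastUsed - b.lastUsed);
  const cleared: string[] = [];
  for (const bucket of candidates) {
    if (total <= target) break;
    total -= bucketUsage(bucket);
    clearBucket(bucket);
    cleared.push(bucket.name);
  }
  return cleared;
}
```

Because `clearBucket` offers no way to drop a single bottle or key, partial eviction cannot occur in this model, which mirrors the "cleared in its entirety" requirement quoted in point 1.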

@annevk
Member Author

annevk commented Jul 10, 2020

Thanks, that helps a lot. I think for 4, browsers aren't really prevented from doing anything if it's not mentioned, as browser UI isn't covered. It also seems somewhat weird to mention it given that nobody does it currently, nor is anybody planning to.

5: Very good point, Firefox will most likely also move away from this and probably will also not meet "Instead user agents should offer users the ability to clear all storage for each group of schemelessly same site origins." from the current draft. Maybe we should make it "User agents should offer users the ability to clear storage for websites they visit and may indicate usage for each website." which is sufficiently vague that implementations can do whatever, but still gives some guarantees to users (and coupled with the bucket is the smallest unit requirement also some guarantees to developers).

6: I recall that we discussed this at TPAC and I'm similarly interested in tackling this medium/long term. But yeah, specification-wise that would also be a pretty significant undertaking.

@annevk
Member Author

annevk commented Jul 10, 2020

5 is now #109. 6 is #110. Feel free to open an issue on 3, not entirely sure what it would say. Anything still concerning @domenic?

@domenic
Member

domenic commented Jul 10, 2020

No, I think this covers it. Please re-ping me on anything that needs approval; I've lost track.

Labels
None yet
Development

No branches or pull requests

4 participants