Saved queries to improve SelectURL budget efficiency #140

Open
jkarlin opened this issue Mar 22, 2024 · 11 comments

@jkarlin
Collaborator

jkarlin commented Mar 22, 2024

selectURL() has three budgets at the moment.

  1. A per-page-load cap of 6 bits per ad-tech, decremented each time selectURL is called by the ad-tech on the page load
  2. A per-page-load cap of 12 bits per page load across all ad-techs
  3. A per-ad-tech, cross-page budget of 12 bits per day, decremented only when a fenced frame is clicked on (this is meant to be the main budget once fenced frames are enforced)

The first two budgets make the API hard to use before fenced frames are enforced, because they deduct budget regardless of whether the user clicks on the frame (since the frame is not a fenced frame). This means that selectURL can only be called a few times per ad-tech per page load (how many depends on the number of URLs in each call).
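
To make the arithmetic concrete, here is a rough sketch of how quickly the per-ad-tech page cap is used up. It assumes that a call with N candidate URLs costs log2(N) bits, which is not stated in this thread; the constant and function names are made up for illustration and are not part of the API.

```javascript
// Illustrative only: models budget 1 (6 bits per ad-tech per page load).
// Assumption: a selectURL call with N candidate URLs costs log2(N) bits.
const PER_AD_TECH_PAGE_BUDGET_BITS = 6;

function bitsForCall(urlCount) {
  return Math.log2(urlCount);
}

// Returns how many calls with `urlCount` URLs fit within the per-page cap.
function maxCallsPerPageLoad(urlCount, budgetBits = PER_AD_TECH_PAGE_BUDGET_BITS) {
  const cost = bitsForCall(urlCount);
  if (cost === 0) return Infinity; // a single URL reveals nothing
  return Math.floor(budgetBits / cost);
}

console.log(maxCallsPerPageLoad(8)); // 2 (3 bits per call)
console.log(maxCallsPerPageLoad(2)); // 6 (1 bit per call)
```

Under that assumption, an ad-tech that passes 8 URLs per call runs out after two calls on a page.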

Here is one idea on how to make the per-page budget less onerous, increasing utility while being privacy neutral:

There are some situations in which the caller might ask the same question multiple times, e.g., "Is this user in group A or B? Return index 0 if A, and index 1 if B". Additional queries of the same nature do not leak additional information about the user, so they should not be charged against the budget, but today they are.

My proposal is the following:

Provide an option for a per-document saved query that can be run on multiple sets of URLs and will return the URL at the same index each time. The budget will be deducted at most once per saved query.

It would look something like this:

// This first usage of the savedQuery will deduct budget, and the browser will
// remember the association of "control_or_experiment" with the index number
// returned from the worklet.
const config = await sharedStorage.selectURL('experiment', urls1, {savedQuery: "control_or_experiment", resolveToConfig: true});
document.getElementById("my-fenced-frame1").config = config;

// The second use of savedQuery will return immediately, and not impact the budget.
const config2 = await sharedStorage.selectURL('experiment', urls2, {savedQuery: "control_or_experiment", resolveToConfig: true});
document.getElementById("my-fenced-frame2").config = config2;

The only difference is a savedQuery parameter in selectURL that names the query. On the first call with a given savedQuery, budget is deducted as usual from budgets 1 and 2, and the resulting index number is associated with the name of the saved query. On subsequent calls, no worklet is created; instead, the URL at the same index as in the first call is returned. Therefore, you could call the saved query as many times as you like on the page, and each call will consistently return the same index, while only paying the budget once.
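
The behavior described above can be modeled with a small toy class. This is a sketch of the proposed semantics only, not how a browser would implement it: the class, its fields, and the `runOperation` callback (standing in for the worklet operation) are hypothetical, and the log2 cost per call and the fallback to the first URL when the budget is exhausted are assumptions.

```javascript
// Toy model of the proposed per-document saved-query behavior.
// Not a browser implementation; every name here is hypothetical.
class SavedQueryDocument {
  constructor(budgetBits = 6) {
    this.remainingBits = budgetBits; // models the per-ad-tech page budget
    this.savedIndexes = new Map();   // savedQuery name -> index from the first run
    this.workletRuns = 0;
  }

  // `runOperation` stands in for the worklet operation: it receives the
  // URL list and returns an index into it.
  selectURL(runOperation, urls, { savedQuery } = {}) {
    if (savedQuery !== undefined && this.savedIndexes.has(savedQuery)) {
      // Repeat use: no worklet run, no budget charge, same index.
      return urls[this.savedIndexes.get(savedQuery)];
    }
    const cost = Math.log2(urls.length); // assumed cost of one call
    if (cost > this.remainingBits) {
      return urls[0]; // assumption: fall back to the first (default) URL
    }
    this.remainingBits -= cost;
    this.workletRuns += 1;
    const index = runOperation(urls);
    if (savedQuery !== undefined) this.savedIndexes.set(savedQuery, index);
    return urls[index];
  }
}

const pageDoc = new SavedQueryDocument();
const inExperiment = () => 1; // pretend the user is in group B
const first = pageDoc.selectURL(inExperiment, ["a1", "b1"], { savedQuery: "control_or_experiment" });
const second = pageDoc.selectURL(inExperiment, ["a2", "b2"], { savedQuery: "control_or_experiment" });
console.log(first, second, pageDoc.remainingBits); // b1 b2 5
```

The sketch leaves open what should happen if a later URL list is shorter than the saved index; the proposal above does not specify that case.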

@SpaceGnome
Contributor

Thanks for the proposal! Is there a TTL on savedQuery or an option to have such a TTL?

@jkarlin
Collaborator Author

jkarlin commented Mar 25, 2024

I was thinking of scoping it to a single page, or a single document on a page, to start with for simplicity. Were you thinking of using it across page loads or even across sites?

@bvattikonda

We think caching the query will be a great addition to the Shared Storage API and address concerns around limitations imposed by the bit budget of the API. Caching the calls will allow more than one call per page without running into errors. But, the proposed caching mechanism does not address the case where a user might repeatedly refresh the page/visit many pages with the same embedded site and therefore still consume the long term budget. Note that we are assuming that the long term budget will be charged if the fenced frame were to be navigated to the selected URL as opposed to requiring an explicit click as you mention in the initial comment. Please correct me if this interpretation is wrong.

Would the following changes address these concerns?

  • Allow the caller to specify a TTL for which the query is cached.
  • Make the cache available across page loads for the same site.

With these changes, if a caller sets a TTL of 1 day, the site will consume the entropy budget only once a day, allowing it to stay under the 12-bit daily per-site long-term budget.
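
As a rough illustration of the TTL idea, a saved-query cache with expiry could look like the sketch below. Nothing here is an existing or proposed API surface: the class and method names are hypothetical, and the clock is injected only so that the example can be run deterministically.

```javascript
// Hypothetical sketch of a saved-query cache with a caller-chosen TTL.
// The clock is injected so that expiry can be exercised without waiting.
class TtlSavedQueryCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map(); // name -> { index, expiresAt }
  }

  // Returns the saved index, or undefined if it is missing or expired.
  get(name) {
    const entry = this.entries.get(name);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(name);
      return undefined;
    }
    return entry.index;
  }

  set(name, index, ttlMs) {
    this.entries.set(name, { index, expiresAt: this.now() + ttlMs });
  }
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;
let fakeTime = 0;
const ttlCache = new TtlSavedQueryCache(() => fakeTime);
ttlCache.set("control_or_experiment", 1, ONE_DAY_MS);

fakeTime = ONE_DAY_MS - 1;
console.log(ttlCache.get("control_or_experiment")); // 1 (still cached)

fakeTime = ONE_DAY_MS;
console.log(ttlCache.get("control_or_experiment")); // undefined (expired)
```

Once an entry has expired, the next call would have to run the worklet again and be charged again, which is where the once-a-day figure comes from.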

@jkarlin
Collaborator Author

jkarlin commented Apr 15, 2024

> Note that we are assuming that the long term budget will be charged if the fenced frame were to be navigated to the selected URL as opposed to requiring an explicit click as you mention in the initial comment. Please correct me if this interpretation is wrong.

Sorry, I think that this is a misunderstanding. The long-term budget is only charged when the fenced frame performs a top-level navigation (e.g., a new tab or the entire page, as opposed to the frame, navigates).

Given this, do you still think it's important to have cross-page saved queries?

@bvattikonda

With this interpretation of budget, probably not. But, before I confirm, could you please help us understand a few things?

Currently our plan is to use a shared key to join events in the embedding context with the URL returned by selectURL. The embedding context incorporates the shared key into the URLs passed to selectURL. The key -- along with 3 bits of entropy -- is sent to the server when the fenced frame is rendered. If the budget (both short and long term) is only deducted when the user navigates to the fenced frame, what prevents the embedding context from making repeated calls to selectURL and exfiltrating an unlimited number of bits?
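
For concreteness, the setup described above might look roughly like the following sketch. The host, path, and query parameter names are placeholders, the function name is made up, and the `{ url }` object shape for list entries is an assumption; 8 candidates are used because that corresponds to the 3 bits mentioned above.

```javascript
// Hypothetical: builds 8 candidate URLs that all carry the same shared key.
// The host, path, and parameter names are placeholders.
function buildCandidateUrls(sharedKey, variantCount = 8) {
  const urls = [];
  for (let variant = 0; variant < variantCount; variant++) {
    const url = new URL("https://adtech.example/render");
    url.searchParams.set("key", sharedKey);
    url.searchParams.set("variant", String(variant));
    urls.push({ url: url.toString() });
  }
  return urls;
}

const candidates = buildCandidateUrls("abc123");
console.log(candidates.length, Math.log2(candidates.length)); // 8 3
console.log(candidates[5].url); // https://adtech.example/render?key=abc123&variant=5
```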

@jkarlin
Collaborator Author

jkarlin commented Apr 29, 2024

> Currently our plan is to use a shared key to join events in the embedding context with the URL returned by selectURL. The embedding context incorporates the shared key into the URLs passed to selectURL. The key -- along with 3 bits of entropy -- is sent to the server when the fenced frame is rendered. If the budget (both short and long term) is only deducted when the user navigates to the fenced frame, what prevents the embedding context from making repeated calls to selectURL and exfiltrating an unlimited number of bits?

The short term budget prevents repeated calls on the same page load (up to 6 bits worth of selectURL calls per page load). Budget is deducted per call to selectURL.

The long term budget is only deducted on click. The idea is that, in the long term, there shouldn't be an immediate network leak when rendering the fenced frame, because the fenced frame will have some protection against that.

@SpaceGnome
Contributor

Ah thanks for pointing out the long term protections for Fenced frames. For addressing the long term network leak, I see https://github.com/WICG/fenced-frame/blob/master/explainer/use_cases.md mentioning:

> The network access being unrestricted is an ongoing technical challenge due to the issue of network timing side channel (described in the explainer https://github.com/WICG/fenced-frame/blob/master/explainer/network_side_channel.md) and we are considering what a long-term solution for this would look like in fenced frames. For the opaque-ads use case, the considerations are either 1) denying any network access (e.g., loaded via navigable web bundles) or 2) network access only allowed to some trusted caching service that promises to only log aggregate data.

For non-ads use cases, do the above two considerations also apply?

Thanks!

@jkarlin
Collaborator Author

jkarlin commented May 1, 2024

Yes.

@sanjalijha

Hi Josh, this proposal to enable caching on Shared Storage queries would be extremely useful to us. What is the timeline for implementing it?

Additionally, is it possible to implement "savedQuery" to work across selectURL calls with a different set of "URLs"?

@sanjalijha

Hi Josh, our work with Shared Storage is blocked on this proposal. We want to expand our usage to other products that would involve calling selectURL multiple times per page load (> 2), and the use of savedQuery would be essential. Can we get a sense of the timeline for caching to be enabled?

@jkarlin
Collaborator Author

jkarlin commented Jul 10, 2024

Hey. Yeah, I really think this proposal is important while we have the per-page budget as it's difficult to use the API otherwise. Will get to working on it but I can't yet say when we might expect it to be available.
