FLEDGE: restricted reporting, training of bidding models #93

Closed
jonasz opened this issue Jan 28, 2021 · 12 comments

@jonasz
Contributor

jonasz commented Jan 28, 2021

Hi Michael,

Thank you for sharing FLEDGE, the proposal looks very promising and we are committed to taking part in the proposed experimentation. After initial analysis, there is one important issue that we would like to raise.

FLEDGE proposes a temporary event-level reporting mechanism based on report_win, a js function supplied by the buyer.

We are concerned about our ability to optimize our bidding models (for example click-through) in the proposed specification, which restricts the kind of data that may be reported. (For example: report_win has no access to user_bidding_signals and to prev_wins.)

To effectively train our models, we need a reporting mechanism that is based on all the signals that are used by our bidding function (relevant issue: #54).

The explainer mentions that:

> Once we have a trusted reporting mechanism, we can consider allowing the reports to be influenced by the interest group's user_bidding_signals.

From our perspective, such a reporting mechanism with access to user_bidding_signals (and all other signals) is critical before third-party cookies are phased out. Otherwise we will have no way of training our models, which would negate the usefulness of FLEDGE.

I was wondering, is it safe to assume such a reporting mechanism will be available before third party cookies are dropped? Or perhaps should we look for some ways to extend the temporary reporting mechanism?

Best regards,
Jonasz

@michaelkleber
Collaborator

Hi Jonasz,

This is an interesting question. In the relevant section of your original Outcome-based TURTLEDOVE write-up, you said:

> • userSignals: We propose to allow the Bidder to store custom bidding signals during a call to joinAdInterestGroup.
>
> We stress that these signals would be kept browser side and used solely for bidding. They would never be shared with publishers or included in network requests, and are in line with the TURTLEDOVE's current privacy guarantees.

What ideas did you have for addressing this optimization problem in the Outcome-based approach, if the user signals are kept browser side, used solely for bidding, and never included in network requests?

Since the actual bid is available at reporting time, there is of course some ability to approximate what the hidden on-browser signals could have been. But of course any user-specific data that we add creates a tracking risk.

@jonasz
Contributor Author

jonasz commented Jan 29, 2021

Hi Michael,

At the time the problem of optimizing models was an open question (and it still very much is). We cannot say we had a clear solution in mind for that.

In recent months we were hoping the multi-browser aggregation service with access to all bidding signals could be the solution that would allow us to optimize models in a safe, private way (let's call this approach "unrestricted aggregates").

So the intent behind the original issue is: "if we are heading towards unrestricted aggregates, but with an interim restricted-reporting phase, this is a huge issue for us". We would likely have to rethink and rebuild significant parts of our system just for this interim phase.

To clarify, we still don't need to and don't want to share userSignals with anyone - what we really need is to optimize models. While we were hoping to use the general aggregate reporting mechanism for that, this need not be the case. Model optimization is quite different from other reporting use cases (like accounting), and is open to techniques like browser-side sampling, adding noise, browser-side aggregation, and potentially more.
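
As one illustration of how browser-side sampling and added noise could still support click-through estimation, here is a minimal sketch. Randomized response is swapped in as a concrete noise technique; it is not something proposed in this thread, and browser_report, estimate_ctr, sample_rate and truth_prob are hypothetical names.

```javascript
// Hypothetical sketch: none of these functions exist in FLEDGE.
// The random source is passed in so the behaviour can be checked deterministically.

// Browser side: decide what (if anything) to report for one impression.
function browser_report(clicked, opts, random) {
  // Browser-side sampling: unsampled impressions are never reported.
  if (random() >= opts.sample_rate) return null;
  // Randomized response: report the true label with probability truth_prob,
  // otherwise report a fair coin flip.
  if (random() < opts.truth_prob) return clicked;
  return random() < 0.5;
}

// Buyer side: recover a click-through estimate from the noisy reports.
// observed = truth_prob * ctr + (1 - truth_prob) / 2, solved for ctr.
function estimate_ctr(reports, truth_prob) {
  const kept = reports.filter((r) => r !== null);
  if (kept.length === 0) return null;
  const observed = kept.filter((r) => r === true).length / kept.length;
  return (observed - (1 - truth_prob) / 2) / truth_prob;
}
```

With sample_rate = 1 and truth_prob = 1 this degenerates to plain event-level reporting; lowering either value trades estimation accuracy for privacy.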

To prepare for FLEDGE, it is important for us that we understand:

  • Is the approach of unrestricted aggregates still the end goal?
  • If so, would it be available before 3p cookies are dropped?
  • If not, what other mechanism could we design to facilitate model optimization? Do we need separate mechanisms for optimization and for other reporting use cases?
  • Do we need to have a simpler, interim solution? Could we adapt it to allow better model optimization?

That's a lot of questions, and I understand the timeline is narrow and the problem is complex. Perhaps this would be a good topic to discuss in a video conference?

Best regards,
Jonasz

@michaelkleber
Collaborator

I agree, we should talk about this in a live meeting, preferably one where @csharrison is present as well.

Could you explain what you mean by "the approach of unrestricted aggregates" though? Are you referring to something like SPURFOWL, or a different idea?

@jonasz
Contributor Author

jonasz commented Feb 1, 2021

In my view SPURFOWL is orthogonal to this question. By "unrestricted" I mean "with no restrictions as to which signals can be used to build a report from within bidding_fn". In my original understanding, such "unrestricted reports" would later be available as aggregates, thus "unrestricted aggregates".

> I agree, we should talk about this in a live meeting, preferably one where @csharrison is present as well.

I was wondering, should we use the scheduled FLEDGE call slot (#88), or perhaps organize a separate call?

@jonasz
Contributor Author

jonasz commented Feb 4, 2021

Hi Michael,

Thanks for the brief clarification during the FLEDGE call - as I understand, the target reporting mechanism is still an open question, and so is the timeline.

We'd like to propose a minor yet powerful extension to the temporary event-level reporting:

  • Extend the specification of generate_bid to also return a custom string for reporting (custom_reporting_token).
  • The browser would only allow the custom_reporting_token to be reported if it meets a certain popularity threshold. It could be reported separately from all other information currently described (interest group name, contextual information), and it could be reported with a delay, if needed (exact details TBD).

Some thoughts:

  • This approach would be a flexible and powerful tool for model optimization (conditional on FLEDGE: reporting clicks #99) and ad hoc / custom reporting. The custom_reporting_token would be dynamically created in generate_bid, and would allow us to optimize the usage of all bidding signals. From our perspective, this would be a critical improvement over the current spec.
  • The report is gated behind a popularity threshold, and potentially decorrelated from other reported signals, in line with TD privacy goals.
  • From the technical perspective it should be a minor extension - the popularity check could use the same infrastructure as the interest group check.

This is just a basic high-level idea; the final design would require some more work. (For example, it may be useful to specify fallback tokens in case custom_reporting_token is too rare, or to allow for reporting multiple tokens in an uncorrelated fashion.)
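
To make the idea concrete, here is a rough sketch of how a token with fallbacks might be produced and gated. It is only an illustration under assumptions: the return shape of generate_bid, the plural custom_reporting_tokens list, the days_since_last_visit field, the placement of prev_wins, and select_reportable_token are all hypothetical and not part of the explainer.

```javascript
// Hypothetical sketch of the proposal; the return shape and every name other
// than generate_bid, user_bidding_signals and prev_wins is invented here.

// Buyer side: the bidding function buckets its signals into a coarse token.
function generate_bid(interest_group, browser_signals) {
  const signals = interest_group.user_bidding_signals;
  const recency = signals.days_since_last_visit <= 7 ? "recent" : "stale";
  // Cap the win count so that tokens stay coarse enough to be popular.
  const wins = Math.min(browser_signals.prev_wins.length, 3);
  return {
    bid: recency === "recent" ? 1.5 : 0.5,
    // Most specific token first, then progressively coarser fallbacks.
    custom_reporting_tokens: [`${recency}|wins=${wins}`, recency, "any"],
  };
}

// Browser side: report the first token that meets the popularity threshold.
// `popularity` maps a token to the number of browsers that produced it.
function select_reportable_token(tokens, popularity, threshold) {
  for (const token of tokens) {
    if ((popularity.get(token) || 0) >= threshold) return token;
  }
  return null; // no token is popular enough, so nothing is reported
}
```

Ordering the tokens from specific to coarse lets the browser degrade gracefully: a rare combination of signals still yields some reportable, less detailed token.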

Please let me know what you think!

Best regards,
Jonasz

@michaelkleber
Collaborator

Ah, interesting thought!

It seems like the k-anonymity constraint would need to be applied to the complete set of information that can flow from the bidding function to the reporting environment. So if the whole tuple (render_url, interest_group_name, custom_reporting_token) is shared by enough different people, we could allow it to flow into reporting.

Does that match your thinking?

From the implementation point of view, this is a somewhat different use of the k-anonymity infrastructure since it's more real-time. But in principle it makes sense to me.
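
A minimal sketch of a threshold applied to the whole tuple, assuming a simple in-memory counter as a stand-in for the k-anonymity infrastructure. KAnonymityCounter and tuple_key are hypothetical names, and a real system would presumably not see raw browser identifiers like this.

```javascript
// Hypothetical in-memory stand-in for the browser's k-anonymity infrastructure.
class KAnonymityCounter {
  constructor(k) {
    this.k = k;
    this.browsers_by_key = new Map(); // key -> Set of browser ids
  }

  // Note that one browser produced this key; repeats are counted once.
  record(key, browser_id) {
    if (!this.browsers_by_key.has(key)) this.browsers_by_key.set(key, new Set());
    this.browsers_by_key.get(key).add(browser_id);
  }

  // True once at least k distinct browsers have produced the key.
  is_k_anonymous(key) {
    const browsers = this.browsers_by_key.get(key);
    return browsers !== undefined && browsers.size >= this.k;
  }
}

// The threshold applies to the whole tuple, not to each component separately.
function tuple_key(render_url, interest_group_name, custom_reporting_token) {
  return JSON.stringify([render_url, interest_group_name, custom_reporting_token]);
}
```

Because the key covers all three components, a token that is popular on its own can still be blocked when it is paired with a rare render_url or interest group name.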

@jonasz
Contributor Author

jonasz commented Feb 10, 2021

We were thinking about keeping the current specification of report_win unchanged, and reporting custom_reporting_tokens completely separately.

Conceptually, that could be thought of as two independent reporting channels:

  • report_win, for campaign monitoring and budget controls. (Synchronous, event level.)
  • custom_reporting_token for model optimization. (Gated on popularity threshold, reported in isolation, potentially delayed and with noise.)

This way, we are hoping to have a popularity threshold on custom_reporting_token alone, not on the tuple (render_url, interest_group_name, custom_reporting_token).

> From the implementation point of view, this is a somewhat different use of the k-anonymity infrastructure since it's more real-time. But in principle it makes sense to me.

Note that custom_reporting_token may be sent with a delay (this way it would also be uncorrelated from information available through report_win).
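
The two channels could be sketched roughly as follows. This is an illustration only: schedule_reports, the report objects and the delay parameter are hypothetical, and the popularity check is assumed to have happened elsewhere.

```javascript
// Hypothetical sketch; the report shapes and parameter names are invented.
// `random` is injected so the delay can be made deterministic when testing.
function schedule_reports(win, token, token_is_popular, now_ms, max_delay_ms, random) {
  const reports = [
    {
      channel: "report_win", // synchronous, event level
      send_at_ms: now_ms,
      payload: {
        interest_group_name: win.interest_group_name,
        render_url: win.render_url,
      },
    },
  ];
  if (token_is_popular) {
    reports.push({
      channel: "custom_reporting_token", // isolated and delayed
      send_at_ms: now_ms + Math.floor(random() * max_delay_ms),
      // Deliberately nothing else, so it cannot be joined with report_win.
      payload: { custom_reporting_token: token },
    });
  }
  return reports;
}
```

The token report carries no interest group name, render URL or precise timestamp, which is what would let the popularity threshold apply to the token alone.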

@michaelkleber
Collaborator

Ah! Got it, I definitely misunderstood at first. Much easier to offer this signal if it's not joined with the event-level data.

It sounds to me like you want the custom_reporting_tokens to essentially pass through the Aggregated Reporting mechanism that we've been designing. This is essentially the long-term plan: the logging code should get access to all the on-device data as input, and should be constrained to only use Aggregated Reporting as output.
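
As a rough sketch of that shape (all on-device data as input, aggregate-only output), assuming a made-up sink object. This is not the Aggregated Reporting API; make_aggregate_sink, contribute, read_histogram and log_outcome are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch; this is not the Aggregated Reporting API.
function make_aggregate_sink() {
  const histogram = new Map();
  return {
    // The only capability handed to logging code: add a value to a bucket.
    sink: Object.freeze({
      contribute(bucket, value) {
        histogram.set(bucket, (histogram.get(bucket) || 0) + value);
      },
    }),
    // Held by the aggregation side only; returns a copy of the totals.
    read_histogram: () => new Map(histogram),
  };
}

// Logging code may read every on-device signal, but can only write via the sink.
function log_outcome(on_device_data, sink) {
  const recency = on_device_data.user_bidding_signals.recency;
  sink.contribute(`recency=${recency}|clicked=${on_device_data.clicked}`, 1);
}
```

The logging code never receives a network handle or the histogram itself, so whatever it computes from user_bidding_signals can only leave the device as a bucket contribution.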

@csharrison and @shivanigithub, FYI for worklets using aggregation.

@jonasz
Contributor Author

jonasz commented Feb 12, 2021

Do you think it is feasible to support such a custom_reporting_token before 3p cookies are dropped?

This is a critical question for us, and if the Agg Service is not ready by that time, perhaps FLEDGE could temporarily reuse the popularity-counting infrastructure to provide (basic) support for custom_reporting_token?

@michaelkleber
Collaborator

Yes! I do think we will make this available before 3p cookies go away.

@Pl-Mrcy

Pl-Mrcy commented Feb 25, 2021

Tying this conversation to other statements: I am now confused about the timeline, particularly regarding what is going to be available for reporting.

On Feb 17, during the WICG call, @michaelkleber, you said:

> There is a lot of discussions to still have about the Aggregate Reporting API. Won’t be in place by the time 3rd party cookies go away.

Yet, in this issue, you said:

> Yes! I do think we will make this available before 3p cookies go away.

What "this" refers to is quite unclear to me.

My understanding is the following:

  1. Until the end of third-party cookie support in Chrome: we can use third-party cookies to run advertising as usual.
    Before this deadline, FLEDGE will enter test mode. We can test FLEDGE with event-level reporting, using report_win and potentially report_loss, as detailed in the explainer. At this stage the user_bidding_signals are not accessible in any way (they are not inputs to report_win).
    At some point during this phase, Chrome will implement and release something close to what was proposed by @jonasz: a separate report with a custom_reporting_token.
  2. After the deadline, we get to use FLEDGE at full scale in the state described just above.
  3. Later on, at a yet undisclosed date, Chrome will release its reporting APIs. This would mean the end of the event-level reporting for FLEDGE. As of this date, report_win would access all on-device data available and the output would be reported to the advertiser through the reporting APIs. At this stage, the custom_reporting_token will thus not be useful anymore.

If my understanding is correct, this means that advertisers would have to build a different bidding model optimization based on custom_reporting_token that would serve for the intermediary phase 2 only. It would be interesting to know how long this phase is going to last.

Could you clarify the timeline with distinct periods and what will be available for reporting during each period?

@JensenPaul
Collaborator

custom_reporting_token support was added to the explainer in #558.
I believe the timeline for the various reporting mechanisms was clarified in https://developer.chrome.com/docs/privacy-sandbox/fledge-api/feature-status/
If you have further questions, feel free to reopen this issue or file another.
