Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft spec #93

Merged
merged 27 commits into from
Feb 28, 2023
Merged

Draft spec #93

merged 27 commits into from
Feb 28, 2023

Conversation

xyaoinum
Copy link
Collaborator

No description provided.

@xyaoinum
Copy link
Collaborator Author

@domenic PTAL. Thanks!

Copy link

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of work to do here, it looks like! I'll try to do a fast review turnaround time from now on, as this is just a first round of comments and I think we will have many more rounds to go.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
};
</pre>

A <dfn>taxonomy</dfn> comprises a list of advertising <dfn>topics</dfn>. A [=taxonomy=] is identified by a <dfn>taxonomy version</dfn>, and a [=topic=] is identified by a <dfn>topic id</dfn>.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to be more specific about the structure of any of these strings? E.g. are the versions numeric and increasing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can require the versions to be integers (in string format/type).

The term “increasing” seems ambiguous to me. Do you mean increase with the increase of browser versions? But regardless, I think things would be more flexible to not require this. e.g. the browser could experiment with multiple versions, and settle on one version at a later time.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
1. If the iframe's embeding document is not in [=secure context=]:
1. Return false.
1. For each feature |F| in {"browsing-topic", "interest-cohort"}:
1. Run the <a href="https://www.w3.org/TR/permissions-policy-1/#define-inherited-policy-in-container">inherited policy for feature in container at origin</a> algorithm with input arguments |feature|: |F|, |origin|: the navigation URL origin, <var ignore=''>container</var>: the iframe container. If the algorithm returns "Disabled":
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Bikeshed cross-linking instead of manual <a>.

Call algorithms using the usual syntax https://infra.spec.whatwg.org/#algorithm-params , not this colon-separated thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't figure out how to cross-link to a header section without doing it manually. Could you elaborate?

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
@xyaoinum
Copy link
Collaborator Author

Hi @domenic , I've addressed / responded to your comment. Some of them are obsolete (e.g. comments regarding navigations, as we tentatively decided to not send topics headers in navigation requests). PTAL. Thanks!

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
@xyaoinum xyaoinum requested a review from domenic January 13, 2023 05:56
Copy link

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the rigor! Biggest issues are around time handling.

spec.bs Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
@xyaoinum
Copy link
Collaborator Author

@domenic: I addressed / responded to your comments. PTAL again. Thanks!

@xyaoinum xyaoinum requested a review from domenic January 25, 2023 01:33
@domenic
Copy link

domenic commented Jan 27, 2023

Heya, I'm sorry to say I wasn't able to get to this over the last two days, and I'm about to head out on a two-week vacation. If there's some urgency in getting something landed before then, please feel free to merge and I can do a review after getting back. Otherwise, I'll try to keep this near the top of my queue then.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
1. Let |now| be the [=monotonic clock=]'s [=monotonic clock/unsafe current time=].
1. If either user agent's [=user agent/model=] or [=user agent/taxonomy=] isn't available:
1. Let |epoch| be an [=epoch=] struct with default initial field values.
1. Set |epoch|'s [=epoch/time=] to |now|.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type error: the time is a "DOMHighResTimeStamp", but |now| is an unsafe moment. If you change the epoch's time, you'll have to push that through to several places that compare it against a duration or that concatenate it with strings. If you change |now|'s type, you'll need to note what duration the epoch's time is measuring, and again make sure that's consistent in all the places you're using it.
You can use 'coarsen' directly without having an environment settings object, if you don't care about the sub-millisecond resolution that it removes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If you change |now|'s type, you'll need to note what duration the epoch's time is measuring". Could you clarify? I changed |now| to the "coarsen" of wall clock’s current time, and then it seems all the time references are consistent & comparable. I didn't see any notion of "time origin" in those process (or are they implicitly from "Unix epoch"?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured I just shouldn’t use any of "the document’s relevant settings object's current relative timestamp" as those times aren’t performance related and they won't be compared within the page. Instead, I'm using a single consistent way to get the current time, and keep using Unix epoch as the time origin. PTAL again.

spec.bs Outdated Show resolved Hide resolved
@xyaoinum
Copy link
Collaborator Author

xyaoinum commented Feb 6, 2023

@jyasskin: I've addressed your comments. PTAL again. Thanks!
@domenic: there's no urgency in getting this landed. So I'll wait for your feedback :)

Copy link

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Mostly editorial feedback. I'll give this the green checkmark, but also feel free to request re-review.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Show resolved Hide resolved
<h2 id="user-agent-associated-state-header">User agent associated state</h2>
Each [=user agent=] has an associated [=browsing topics types/user topics state=] <dfn for="user agent">user topics state</dfn> with [=user topics state/epochs=] initially empty, and [=user topics state/hmac key=] initially a randomly generated 128 bit number.

Each [=user agent=] has an associated <dfn for="user agent">topics history storage</dfn> to store the information about the visited pages that are needed for topics calculation. It is a [=list=] of [=topics history entries=], initially empty.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but based on how I've seen this used in several algorithms, it seems like it'd be better as a [=map=] from document ID strings to topics history entries, and in that case I don't think you need the document ID to be stored inside the topics history entry struct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we want to have 2 indexes: on "document id" and on "time" respectively, to allow quick search based on each field. The current way I put it, it ignores the performance aspect, and my intent is for the browser vendor to figure out how to optimize things.

Letting it be a map from "document id" to "topics history entry" will make searching-based-on-document-id faster, but it still won’t improve the performance when searching based on a time-range.

Perhaps we should use the database terminologies to describe the storage format & indexes? If possible, I’d still prefer keeping things simple and leave the performance optimization to the browser vendor.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't focused on performance, just on how it would make the spec prose a bit simpler. But yeah, it's fine to leave it as a flat list if you think that reflects the underlying model better.

spec.bs Outdated
<div algorithm>
To <dfn>append or modify a request \`<code>Sec-Browsing-Topics</code>\` header</dfn>, given a [=request=] |request|, run these steps:
1. If |request|'s [=request/send browsing topics header boolean=] is not true, then return.
1. [=header list/Delete=] \`<code>Sec-Browsing-Topics</code>\` from |request|’s [=header list=].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this should be necessary. No other header-setting spec does this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Client Hints spec has similar mechanism https://wicg.github.io/client-hints-infrastructure/#abstract-opdef-remove-client-hints-from-redirect-if-needed. Basically the header is handled on a per-(redirected)-request level. Pervious state shouldn’t persist across redirects.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the intent is that if https://example.com/a redirects to https://example.com/b, then fetch("https://example.com/a", { browsingTopics: true }) should send different Sec-Browsing-Topics: ... values to /a vs. /b, based on variations like the timestamp, random 5% chance, etc.?

If so, including that context as a <p class="note"> would be helpful.

1. Let |topicItem| be a Structured Fields <a href="https://www.rfc-editor.org/rfc/rfc8941.html#name-integers">Integer</a> with value |topic|["{{BrowsingTopic/topic}}"].
1. Let |topicParameters| be an empty Structured Fields <a href="https://www.rfc-editor.org/rfc/rfc8941.html#name-parameters">Parameters</a>.
1. Set |topicParameters|["<code>version</code>"] to |topic|["{{BrowsingTopic/version}}"].
1. Set |topicParameters|["<code>config_version</code>"] to |topic|["{{BrowsingTopic/configVersion}}"].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure about this snake_case capitalization. I opened an issue to see if we can get some clarity and documentation for the future: w3ctag/design-principles#422

No need to change anything now, but something we should keep an eye on as the spec proceeds.

spec.bs Outdated Show resolved Hide resolved
spec.bs Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
@xyaoinum
Copy link
Collaborator Author

@domenic Thanks for the feedback. I've addressed and responded to your comment. PTAL again. Thanks!

Copy link

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only potentially-substantive issue left is around my understanding of the header and redirects.

Other issues are editorial nits. Note that in most cases I just pointed out one instance of the general issue, so just accepting the change will not address it; you'd need to do more global find-replaces.

spec.bs Outdated

<div algorithm>
A {{BrowsingTopic}} dictionary |a| is <dfn id=browsing-topics-dictionary-less-than-comparator for="browsing-topic">code unit less than</dfn> a {{BrowsingTopic}} dictionary |b| if the following steps return true:
1. If |a|["{{BrowsingTopic/topic}}"] < |b|["{{BrowsingTopic/topic}}"], then return true.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. If |a|["{{BrowsingTopic/topic}}"] < |b|["{{BrowsingTopic/topic}}"], then return true.
1. If |a|["{{BrowsingTopic/topic}}"] is [=/code unit less than=] |b|["{{BrowsingTopic/topic}}"], then return true.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The {{BrowsingTopic/topic}} isn't a string. Can we still say [=/code unit less than=]?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right, this is my bad. Please ignore.

spec.bs Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
<div algorithm>
To <dfn>append or modify a request \`<code>Sec-Browsing-Topics</code>\` header</dfn>, given a [=request=] |request|, run these steps:
1. If |request|'s [=request/send browsing topics header boolean=] is not true, then return.
1. [=header list/Delete=] \`<code>Sec-Browsing-Topics</code>\` from |request|’s [=header list=].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the intent is that if https://example.com/a redirects to https://example.com/b, then fetch("https://example.com/a", { browsingTopics: true }) should send different Sec-Browsing-Topics: ... values to /a vs. /b, based on variations like the timestamp, random 5% chance, etc.?

If so, including that context as a <p class="note"> would be helpful.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated

In today's web, people’s interests are typically inferred based on observing what sites or pages they visit, which relies on tracking techniques like third-party cookies or less-transparent mechanisms like device fingerprinting. It would be better for privacy if interest-based advertising could be accomplished without needing to collect a particular individual’s browsing history.

This specification provides an API to enable ad-targeting based on the people’s general browsing interest, without exposing the exact browsing history.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a person's general browsing interests

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the exact browsing history/their exact browsing history/

Copy link
Collaborator

@jkarlin jkarlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good, here are some thoughts:

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated

The [=user agent/taxonomy=] and [=user agent/model=] may be shipped to the browser asynchronous w.r.t. the browser release, and may be unavailable at a given point. They must be updated atomically w.r.t. algorithms that access them (e.g. the [=calculate user topics=] algorithm).

Note: The <a href="https://github.com/patcg-individual-drafts/topics/blob/main/taxonomy_v1.md">initial taxonomy</a> for Chrome experimentation (i.e. with [=user agent/taxonomy version=] "1") includes hundreds of topics. The taxonomy to be used long term is TBD (<a href="https://github.com/patcg-individual-drafts/topics/issues/3">github issue</a>).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird to mention Chrome in a spec. Perhaps instead say that the initial taxonomy used by the API is taxonomy_v1.md and the expectation is that it will change over time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated
</section>

<section>
<h2 id="meeting-the-privacy-goals-header">Meeting the privacy goals</h2>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to "Privacy Considerations"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

spec.bs Outdated

<section>
<h2 id="meeting-the-privacy-goals-header">Meeting the privacy goals</h2>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my suggested text for this section:

The Topics API attempts to provide just enough relevant interest information for advertisers to be able to personalize their ads for the user while maintaining user privacy. Some privacy safeguards include: usage in secure contexts only, topic limitation to a human curated taxonomy, different topics given to different sites in the same epoch to prevent cross-site reidentification, noised topics, a limited number of topics provided per epoch, user opt outs, site opt outs, and a suggestion that user agents provide UX to give users choice in which Topics are returned.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@jkarlin jkarlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm for initial landing. Didn't have a chance to look over the algorithms yet but can do so once it lands as it's easier to review there.

@xyaoinum xyaoinum merged commit d2aa0b8 into patcg-individual-drafts:main Feb 28, 2023
github-actions bot added a commit that referenced this pull request Feb 28, 2023
SHA: d2aa0b8
Reason: push, by xyaoinum

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants