KV Data Loading: Logging #56

Open
thegreatfatzby opened this issue May 31, 2024 · 2 comments
Comments

@thegreatfatzby

For production use, will the application do full/normal logging specifically for data loading? I would think this would be OK from a privacy perspective, as the entity loading the data is the ad tech, so they can't gain any new information from data load metrics. From an operational perspective, data loads will fail for odd reasons, and you'll want to get the "failed rows", failure reasons, etc.

@peiwenhu
Collaborator

Yes. Logic that processes requests after they are decrypted requires certain protections.

Logic unrelated to processing requests is considered "safe", and its logs/metrics can be exported as-is.

Btw for data loading failures, we're interested to hear what you think the requirements are for handling row failures: other than skipping the row and logging/recording a metric, do you expect other error handling behaviors such as only committing a whole file or a group of rows in the file if all rows are successfully read?
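To make the "skip the row and log/record a metric" behavior concrete, here is a minimal sketch. It is not the server's actual loader: the `key,value` row format, the `parse_row` and `load_skipping_bad_rows` helpers, the `LoadReport` type, and the logger name are all assumptions made for illustration.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("kv_data_loading")  # hypothetical logger name


@dataclass
class LoadReport:
    """Counts loaded rows and records (row_number, reason) for each skipped row."""
    loaded: int = 0
    failures: list = field(default_factory=list)


def parse_row(line):
    """Parse a hypothetical 'key,value' row; raise ValueError on a bad row."""
    key, sep, value = line.partition(",")
    if not sep:
        raise ValueError("missing ',' separator")
    if not key:
        raise ValueError("empty key")
    return key, value


def load_skipping_bad_rows(lines, store):
    """Write every valid row into `store`; skip, log, and record invalid rows."""
    report = LoadReport()
    for row_number, line in enumerate(lines, start=1):
        try:
            key, value = parse_row(line)
        except ValueError as err:
            report.failures.append((row_number, str(err)))
            logger.warning("skipped row %d: %s", row_number, err)
            continue
        store[key] = value
        report.loaded += 1
    return report
```

The returned report doubles as the metric source: `report.loaded` and `len(report.failures)` could be exported as counters, and the failure list gives the row numbers and reasons.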

@thegreatfatzby
Author

thegreatfatzby commented May 31, 2024

@peiwenhu interesting question indeed. I'd say there's no one right answer there for a generic tool. Allowing skipping and logging of bad rows will likely be important, I think, and some configurability seems warranted.

Without logging it takes quite the Jedi to find bad rows. Given this data is onboarded by the ad tech, telling them which rows were rejected seems safe to me.

For stopping the entire file or some other type of atomicity, you can definitely see both cases (i.e. for some data sets, if a row is bad you want to move on and not stop the train; for others it's really important to get some changeset atomically). In theory, if you only support skip, clients can adjust to that at some cost.
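As a rough illustration of the two cases, the sketch below stages rows in memory and exposes the choice as a flag. Everything here is hypothetical (the `load_file` name, the `atomic` flag, the `key,value` row format, and the dict-backed store); it only shows how one loader could offer both behaviors.

```python
def load_file(lines, store, atomic):
    """Load 'key,value' rows into `store`.

    atomic=True:  commit nothing if any row in the file is bad.
    atomic=False: commit the good rows and skip the bad ones.
    Returns (committed, bad_row_numbers).
    """
    staged = {}
    bad_rows = []
    for row_number, line in enumerate(lines, start=1):
        key, sep, value = line.partition(",")
        if not sep or not key:
            bad_rows.append(row_number)  # remember which rows were rejected
            continue
        staged[key] = value
    if atomic and bad_rows:
        return False, bad_rows  # leave the store untouched
    store.update(staged)  # commit all staged rows at once
    return True, bad_rows
```

Staging the rows before committing is what makes the all-or-nothing mode possible; the trade-off is holding the whole file (or group of rows) in memory until validation finishes.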

Also going to ping some other experts here @swapnilpandit and @truemike and others once I find their handles.


2 participants