KV Data Loading: Commit Time #55

thegreatfatzby · 2024-05-31T04:04:11Z

Will the commit time field be required for updates or deletes?

peiwenhu · 2024-05-31T13:54:50Z

It'll be required for every operation, including updates and deletes.

For reasons such as 1. pubsub-based data delivery is not in order, and 2. internal optimizations for file-based data reading may read records out of order, we need to rely on a client-defined time to determine a deterministic order, rather than on some run-time decision made by the server.
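A minimal sketch of why a per-row, client-supplied commit time makes the final state independent of delivery order. It assumes last-writer-wins semantics per key and that deletes are kept as tombstones, so an older update that arrives late cannot bring a deleted key back. The `Mutation` and `KVStore` names are hypothetical and are not the KV server's actual data format or API. It also assumes commit times for the same key are distinct; with equal times the outcome would again depend on arrival order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Mutation:
    # Hypothetical record shape: every row carries its own commit time.
    key: str
    commit_time: int        # client-defined logical commit time
    value: Optional[str]    # None means "delete"


class KVStore:
    """Last-writer-wins store ordered by client commit time, not arrival order."""

    def __init__(self) -> None:
        # key -> (commit_time, value); a value of None is a tombstone for a delete.
        self._entries: Dict[str, Tuple[int, Optional[str]]] = {}

    def apply(self, m: Mutation) -> None:
        current = self._entries.get(m.key)
        # Ignore anything that is not newer than what is already held for this key.
        # Assumes distinct commit times per key; ties would make arrival order matter.
        if current is not None and m.commit_time <= current[0]:
            return
        self._entries[m.key] = (m.commit_time, m.value)

    def apply_all(self, mutations: Iterable[Mutation]) -> None:
        for m in mutations:
            self.apply(m)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        return None if entry is None else entry[1]

    def snapshot(self) -> Dict[str, str]:
        # Live keys only; tombstones are hidden from readers.
        return {k: v for k, (_, v) in self._entries.items() if v is not None}


updates = [
    Mutation("k1", 10, "a"),
    Mutation("k1", 20, None),  # delete at t=20
    Mutation("k1", 15, "b"),   # stale update that happens to arrive late
    Mutation("k2", 5, "x"),
]

in_order = KVStore()
in_order.apply_all(sorted(updates, key=lambda m: m.commit_time))

shuffled = KVStore()
shuffled.apply_all(reversed(updates))  # simulate out-of-order delivery

print(in_order.snapshot() == shuffled.snapshot())  # True
print(shuffled.snapshot())                          # {'k2': 'x'}
```

Keeping the tombstone is what lets the delete at t=20 win over the update at t=15 regardless of which one is read first; a real system would also need some policy for eventually dropping tombstones, which this sketch omits.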

thegreatfatzby · 2024-05-31T20:44:46Z

I see. So, a micro and a macro question:

Will this be required per row? Or could it somehow be inferred per batch, like from a file, a file name, or other metadata?
Will this type of loading be the only type supported? Not all caching or other data storage solutions require this... some of them sort of "replicate" that (pun intended) internally but don't require it as part of the API, and leave some of that ordering to the client, which is not unreasonable.

thegreatfatzby · 2024-06-02T00:04:09Z

Also @truemike and @swapnilpandit

peiwenhu · 2024-06-03T18:50:20Z

Will this be required per row? Or could it somehow be inferred per batch, like from a file, a file name, or other metadata?
Yes. It is required per row.

Technically, it could also be inferred from elsewhere, but we try to keep things simple unless there is a strong reason not to. Given that we already have 2 ways to ingest data (pubsub, fs) and 2 data formats (Avro, Riegeli), and may add more in the future, we want to keep the feature matrix as simple as possible.

Will this type of loading be the only type supported?

We're open to suggestions, but this is the only type supported as of now. We design within the constraints of the TEE, which does not persist data across machine restarts. That makes it really hard to make the KV server the source of truth for the data, because 1. decisions made by the KV server cannot persist across restarts without great complexity, and 2. a consensus algorithm to make such decisions is also hard under these constraints, so it's much easier for each server to operate independently. Therefore it's much cleaner to let the client control this aspect. I suspect other caching/storage solutions don't need to worry about this as much as we do, but it's always nice to find some inspiration.
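To illustrate why client-controlled ordering lets each server operate independently: if a server derives its state only from the client's records and their commit times, then two servers that read the same records in different orders, or one server that restarts with no persisted data and reloads everything, end up with the same contents without exchanging any messages. This is a sketch under the same assumed per-key last-writer-wins rule; `load` and the record tuples are hypothetical, and a seeded shuffle stands in for out-of-order pubsub and file reads.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (key, commit_time, value); a value of None means delete.
Record = Tuple[str, int, Optional[str]]


def load(records: List[Record]) -> Dict[str, str]:
    """Rebuild a server's state from scratch, as after a restart with nothing persisted."""
    latest: Dict[str, Tuple[int, Optional[str]]] = {}
    for key, commit_time, value in records:
        # Keep only the newest mutation per key; arrival order does not matter.
        if key not in latest or commit_time > latest[key][0]:
            latest[key] = (commit_time, value)
    # Drop keys whose newest mutation is a delete.
    return {k: v for k, (_, v) in latest.items() if v is not None}


source: List[Record] = [
    ("k1", 1, "a"), ("k1", 3, "c"), ("k1", 2, "b"),
    ("k2", 1, "x"), ("k2", 4, None),
    ("k3", 7, "z"),
]

# Each "server" (or each restart) sees the same records in its own arbitrary order.
replicas = []
for seed in range(5):
    shuffled = source[:]
    random.Random(seed).shuffle(shuffled)
    replicas.append(load(shuffled))

print(all(r == replicas[0] for r in replicas))  # True
```

No coordination between the replicas is needed here because the only input that decides the outcome, the commit time, comes from the client rather than from any server-side decision.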
