Dataframe v2: support for filtered_index_values #7589

Merged
teh-cmc merged 3 commits into main from cmc/dataframev2_filtered_index_values on Oct 7, 2024

Member

@teh-cmc teh-cmc commented Oct 4, 2024

Title.

Checklist

  • I have read and agree to the Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • I have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@teh-cmc teh-cmc added the enhancement, 🔍 re_query, do-not-merge, and include in changelog labels Oct 4, 2024
Base automatically changed from cmc/dataframev2_tests to main October 4, 2024 10:08
@teh-cmc teh-cmc force-pushed the cmc/dataframev2_filtered_index_values branch from be7eaaf to 12852da October 4, 2024 10:09
@teh-cmc teh-cmc removed the do-not-merge label Oct 4, 2024
@@ -527,6 +527,13 @@ impl QueryHandle<'_> {
            .min_by_key(|streaming_state| streaming_state.index_value)
            .map(|streaming_state| streaming_state.index_value)?;

        if let Some(filtered_index_values) = self.query.filtered_index_values.as_ref() {
            if !filtered_index_values.contains(&cur_index_value) {
                self.increment_cursors_at_index_value(cur_index_value);
Member

I know we're not optimizing yet, but it seems like increment_cursors_at_index_value() could take an optional next_index_value, which we can easily look up from filtered_index_values. The inner increment could then keep advancing for as long as the cursor is below next_index_value.

Additionally, if we made increment_cursors_at_index_value() return the minimum value across all the chunks, we could loop here and basically guarantee that the call to next_row() will end up with a value included in filtered_index_values.
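
For readers following along, the suggestion above could be sketched roughly as follows. This is a simplified illustration, not the re_query implementation: ChunkCursor, advance_cursors_until and next_filtered_index_value are hypothetical names, index values are modeled as plain i64, and filtered_index_values is assumed to be a BTreeSet so that the next allowed value can be looked up cheaply.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for one chunk's streaming state:
/// index values sorted in ascending order, plus a cursor into them.
struct ChunkCursor {
    index_values: Vec<i64>,
    cursor: usize,
}

impl ChunkCursor {
    /// The index value under the cursor, or `None` once the chunk is exhausted.
    fn current(&self) -> Option<i64> {
        self.index_values.get(self.cursor).copied()
    }
}

/// Advances every cursor past all values strictly below `next_index_value`,
/// then returns the new minimum index value across all chunks (if any remain).
fn advance_cursors_until(chunks: &mut [ChunkCursor], next_index_value: i64) -> Option<i64> {
    for chunk in chunks.iter_mut() {
        while chunk.current().is_some_and(|v| v < next_index_value) {
            chunk.cursor += 1;
        }
    }
    chunks.iter().filter_map(ChunkCursor::current).min()
}

/// Loops until the minimum index value across chunks is one the filter allows.
/// Returns `None` if either the chunks or the filter run out first.
fn next_filtered_index_value(
    chunks: &mut [ChunkCursor],
    filtered_index_values: &BTreeSet<i64>,
) -> Option<i64> {
    let mut cur = chunks.iter().filter_map(ChunkCursor::current).min()?;
    loop {
        if filtered_index_values.contains(&cur) {
            return Some(cur);
        }
        // Smallest allowed value at or after `cur`; strictly greater here,
        // since `cur` itself is not in the set.
        let next = *filtered_index_values.range(cur..).next()?;
        cur = advance_cursors_until(chunks, next)?;
    }
}
```

In this sketch the caller would still have to step the cursors past a returned value after emitting its row, for example with advance_cursors_until(chunks, value + 1). How that interacts with index sampling and clears is exactly the kind of question the reply below defers.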

Member Author

Index sampling and clear/tombstone support first -- I don't want to optimize myself into a corner because of weird interactions with other features...

@teh-cmc teh-cmc merged commit bed792e into main Oct 7, 2024
33 of 34 checks passed
@teh-cmc teh-cmc deleted the cmc/dataframev2_filtered_index_values branch October 7, 2024 08:11
Labels
enhancement, include in changelog, 🔍 re_query
2 participants