Explore Skip List Algorithm #19
Comments
Preliminary results:
A fast algorithm will generally use a linear or circular buffer for the input values, with pointers or indices into the sorted structure, together with a cache-optimized sorted structure that supports fast O(log n) insertion. The data structure should also support quickly finding the next or previous element relative to the current median, so that the new median can be determined from the insertion position of the new value and the removal position of the old value. Based on this, I suggest using a cache-optimized B-Tree as the data structure. A Julian implementation might look like this:
The hope is that by moving to Arrays, linear searches inside nodes become fast due to high memory locality. Keeping the data structure in such a memory-local state adds overhead (splitting and merging operations), but there should be enough flexibility in node length to avoid such rebalancing most of the time with typical data distributions. A quick implementation will have to be made, followed by benchmarks and profiles against the baseline, to check whether these assumptions hold.
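To make the buffer-plus-sorted-structure idea concrete, here is a minimal sketch. It is not the implementation proposed in this thread: a single sorted `Vector` stands in for the B-Tree, so each update is O(n) because of element shifting rather than O(log n). The names `RunningMedianSketch`, `push_value!` and `current_median` are hypothetical, and the sketch assumes the inputs contain no `NaN`.

```julia
# Hypothetical sketch: a circular buffer remembers arrival order,
# and a sorted Vector (standing in for the B-Tree) holds the same values in order.
mutable struct RunningMedianSketch
    window::Vector{Float64}   # circular buffer of raw inputs
    sorted::Vector{Float64}   # the same values in ascending order
    next::Int                 # slot in `window` that the next value overwrites
    capacity::Int             # window length
end

RunningMedianSketch(capacity::Int) =
    RunningMedianSketch(Float64[], Float64[], 1, capacity)

function push_value!(s::RunningMedianSketch, x::Real)
    v = Float64(x)
    if length(s.window) < s.capacity
        # Window still filling up: nothing to evict yet.
        push!(s.window, v)
    else
        # Evict the oldest value from both structures.
        old = s.window[s.next]
        deleteat!(s.sorted, searchsortedfirst(s.sorted, old))
        s.window[s.next] = v
    end
    s.next = mod1(s.next + 1, s.capacity)
    # Binary search finds the insertion position; insert! shifts the tail.
    insert!(s.sorted, searchsortedfirst(s.sorted, v), v)
    return s
end

function current_median(s::RunningMedianSketch)
    n = length(s.sorted)
    n == 0 && throw(ArgumentError("window is empty"))
    mid = (n + 1) ÷ 2
    return isodd(n) ? s.sorted[mid] : (s.sorted[mid] + s.sorted[mid + 1]) / 2
end
```

A B-Tree with Array nodes would replace `sorted`, keeping the same two operations (remove the old value, insert the new one) while limiting shifts to a single node.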
I created a prototype BTree implementation. After removing type instabilities, but without optimizing the algorithm, it performs about 4x slower than the baseline (FastRunningMedian) and scales worse. The main slowdown comes from:
Searching in the tree is less of a problem, whereas it was the main problem with SkipLists. BTrees are known to be relatively good at reading and worse at writing than some other data structures. For possible improvements with modified data structures, I plan to read this paper: http://supertech.csail.mit.edu/papers/BenderFaJa15.pdf
EDIT: I have a new idea for optimizing: let's make insertion and deletion lazy by letting leaf nodes be unsorted.
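One possible reading of the unsorted-leaf idea, as a rough sketch only: insertion appends to the leaf, deletion swaps the removed value with the last element, and ordering is computed only when a rank is actually queried. `UnsortedLeaf`, `lazy_insert!`, `lazy_delete!` and `kth_smallest` are hypothetical names, and how such leaves would attach to the inner nodes of the tree is left open here.

```julia
# Hypothetical leaf type: values are kept in arrival order, not sorted.
struct UnsortedLeaf
    values::Vector{Float64}
end

UnsortedLeaf() = UnsortedLeaf(Float64[])

# Insertion appends at the end, so no elements are shifted.
function lazy_insert!(leaf::UnsortedLeaf, x::Real)
    push!(leaf.values, Float64(x))
    return leaf
end

# Deletion finds the value by linear scan, overwrites it with the last
# element, and drops the last slot. Returns false if the value is absent.
function lazy_delete!(leaf::UnsortedLeaf, x::Real)
    i = findfirst(==(x), leaf.values)
    i === nothing && return false
    leaf.values[i] = leaf.values[end]
    pop!(leaf.values)
    return true
end

# Ordering is paid for only on reads: partialsort works on a copy and
# returns the k-th smallest value without sorting the leaf in place.
kth_smallest(leaf::UnsortedLeaf, k::Integer) = partialsort(leaf.values, k)
```

The trade-off is that writes become cheap while every rank query inside a leaf costs a scan of that leaf, so whether this helps would depend on leaf size and on how often the median falls inside a freshly modified leaf.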
See https://stackoverflow.com/a/10696252
Is there literature on that? Could it be faster? Does it allow for arbitrary percentiles like SortFilters?