-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add virtualStorageFabricEvictImmediately config. #18871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The purpose of this config is to enable using SegmentLocalCacheManager for loading segments on MSQ worker tasks, where segments are not assigned by load/drop rules, and where there is not generally a specific maxSize configured for the local cache. We need to evict segments immediately so local disks don't fill up. The main changes: 1) In StorageLocation, update the releaseHold runnable to check for evictImmediately. If it is set, unmount the cache entry if all holds have been released. 2) In SegmentLocalCacheManager, when evictImmediately is set, "mount" sets an onUnmount handler to delete the info file.
This patch integrates MSQ with virtual storage. It also refactors how MSQ reads inputs to give stages more control over how inputs are read and merged. In particular, stages are now able to fully control merging logic. The main changes: 1) Integrate with virtual storage: merge the two DataSegmentProvider impls (Dart and Task) into DataSegmentProviderImpl that relies on SegmentManager. 2) Give stages control over input merging: rework InputSliceReader to return ReadablePartitions directly, without embedding any merging logic. Break out StandardPartitionReader as a separate class. Other changes: 1) Move ReadableInput to the querykit package. It is no longer specific to the MSQ framework. 2) Remove StandardStageProcessor, refactoring dependent code to not require it. 3) Remove ExternalColumnSelectorFactory wrapper. Type casting is now handled directly by RowBasedColumnSelectorFactory. 4) Include full query context in worker context, rather than just a subset. Includes apache#18871.
| // Futhermore, if evictImmediately is set, evict on release if all holds are gone. | ||
| final boolean isMounted = weakCacheEntry.cacheEntry.isMounted(); | ||
| if ((isNewEntry && !isMounted) | ||
| || (evictImmediately && !weakCacheEntry.isHeld())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess the implication here is that segments can only be used once per download (or like, concurrently with multiple holds) if evictImmediately is set, this feels worth a comment somewhere (or rebranding slightly like suggested in other comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the idea is that segments are evicted as soon as they have no holds. I'll do the rebrand.
| /** | ||
| * Current total size of files in bytes, including weak entries. | ||
| */ | ||
| private volatile boolean evictImmediately = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe call this evictImmediatelyOnHoldRelease or like.. burnAfterReading or something to make it more obvious what the behavior is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought evictImmediately was clear, since what else could "immediately" mean beyond "as soon as holds are released"? Will do the rebrand though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think i was mostly worried about this causing confusion with reclaim which is an entirely separate eviction process to drop weak references when something else needs the space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, how about 'ephemeral'? maybe something like areWeakEntriesEphemeral or something? also this is fine too though, naming is the worst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me they seem about as good as each other, but if one seems a lot better to you, lmk and I can change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll merge this but LMK if you want to change it (or just do change it).
| private int virtualStorageLoadThreads = 2 * runtimeInfo.getAvailableProcessors(); | ||
|
|
||
| @JsonProperty("virtualStorageFabricEvictImmediately") | ||
| private boolean virtualStorageFabricEvictImmediately = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if this should this just be set automatically by a module or something instead of a user facing config that needs to be set manually, but this is fine for now we can worry about that later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #18873, which uses this, it is set automatically for tasks. I don't expect an admin to ever want or need to configure this. Will add a comment about that.
The purpose of this config is to enable using SegmentLocalCacheManager for loading segments on MSQ worker tasks, where segments are not assigned by load/drop rules, and where there is not generally a specific maxSize configured for the local cache. We need to evict segments immediately so local disks don't fill up.
The main changes:
In StorageLocation, update the
releaseHoldrunnable to check forevictImmediately. If it is set, unmount the cache entry if all holds have been released.In SegmentLocalCacheManager, when
evictImmediatelyis set,mountsets anonUnmounthandler to delete the info file.