WIP distributed Fate #6119

Draft
keith-turner wants to merge 17 commits into apache:main from keith-turner:dist-fate

@keith-turner commented Feb 6, 2026

It has a lot of rough edges, but this branch has a working distributed Fate where multiple processes can run fate operations. The following are the highlights.

  • ManagerWorker : a new server class that runs fate operations. These are currently only started by the new IT. This could be folded into the existing manager.
  • FateManager : This class partitions fate work across ManagerWorker processes. It's currently only run by the new IT, but would eventually be run by the manager.
  • MultipleManagerIT : probably the only IT that works. It spins up multiple processes to run fate operations and then starts many threads that do table operations. The table operations are working and fate operations are running in many processes.
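
To illustrate the idea of partitioning fate work across worker processes, here is a minimal sketch. It is not the partitioning scheme FateManager uses in this branch; it assumes fate operation ids can be treated as UUIDs and that each worker is identified by an address string. The class name `FatePartitionSketch` and the method `ownerOf` are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch, not code from this PR.
final class FatePartitionSketch {

  // Maps a fate operation id to exactly one of the given workers.
  // Sorting first makes the result independent of the order in which
  // the caller happens to list the workers.
  static String ownerOf(UUID fateId, List<String> workers) {
    if (workers.isEmpty()) {
      throw new IllegalArgumentException("no workers available");
    }
    List<String> sorted = new ArrayList<>(workers);
    Collections.sort(sorted);
    // floorMod keeps the index non-negative even for negative hash codes.
    int idx = Math.floorMod(fateId.hashCode(), sorted.size());
    return sorted.get(idx);
  }
}
```

With this kind of modulo assignment, most ids change owner when a worker joins or leaves, so a real implementation would likely want something more stable, such as contiguous ranges of the id space.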

The foundation here is fairly solid so far. The TODOs are

  • Integrate w/ existing manager code, it's all fairly standalone ATM. Currently this disables the manager process from running fate operations. Would need to clean up the existing fate code in the manager.
  • get tests working
  • decide how these new processes should be run
  • Need an RPC broadcast notification mechanism. The manager currently has an in-memory notification system where a fate operation can cause the TGW to take action and vice versa. Could do something w/ one-way RPCs. This could be an independent set of changes. Not having it just makes things slower.
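
A rough sketch of what a best-effort broadcast could look like follows. It is an in-process stand-in, not an RPC implementation: each `Consumer<String>` plays the role of a one-way RPC stub for a remote process, and `BroadcastNotifierSketch`, `register`, and `broadcast` are hypothetical names. It assumes a lost notification is acceptable because, per the note above, missing notifications only make things slower.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch, not code from this PR.
final class BroadcastNotifierSketch {

  // Each endpoint stands in for a one-way RPC stub to another process.
  private final List<Consumer<String>> endpoints = new CopyOnWriteArrayList<>();

  void register(Consumer<String> endpoint) {
    endpoints.add(endpoint);
  }

  // Sends the event to every endpoint and returns how many accepted it.
  // A failing endpoint is skipped so it cannot block the others.
  int broadcast(String event) {
    int delivered = 0;
    for (Consumer<String> endpoint : endpoints) {
      try {
        endpoint.accept(event);
        delivered++;
      } catch (RuntimeException e) {
        // Best effort: assume the receiver will find the work later on its own.
      }
    }
    return delivered;
  }
}
```

A real version would send asynchronously so a slow receiver does not delay the sender; this sketch calls endpoints synchronously only to stay short.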

Need to determine how running these processes will work out. Seems like we could do one of the following.

  1. Have every manager process that is started attempt to become the primary manager and also start up manager worker services. So if five manager processes were started, then one would be primary and all five would run fate (possibly other stuff in the future, like distributed compaction coordination or TGW).
  2. Keep the manager and secondary manager the same as they currently are. Have a new manager worker process that users can start.

I am leaning towards #1 because it seems like a simpler user experience. It also handles starting a single manager process better and avoids a secondary manager process that is doing nothing. Although if pursuing #1, I would like to keep the code clean w/ strong encapsulation between functional pieces.
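
As a toy model of option 1, the sketch below has every process start its fate worker role unconditionally and then compete for the primary role. The `PrimaryLock` class is a hypothetical in-memory stand-in for whatever cluster-wide lock the manager uses; it is not Accumulo's lock API, and all names here are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not code from this PR.
final class ManagerStartupSketch {

  // In-memory stand-in for a cluster-wide lock; only the first caller wins.
  static final class PrimaryLock {
    private final AtomicReference<String> holder = new AtomicReference<>();

    boolean tryAcquire(String processId) {
      return holder.compareAndSet(null, processId);
    }
  }

  final String processId;
  boolean primary;
  boolean runningFateWorker;

  ManagerStartupSketch(String processId) {
    this.processId = processId;
  }

  void start(PrimaryLock lock) {
    // Every process runs fate, whether or not it becomes primary.
    runningFateWorker = true;
    // At most one process can win the primary role.
    primary = lock.tryAcquire(processId);
  }
}
```

Keeping the two roles as separate fields mirrors the goal of strong encapsulation: the worker role does not depend on the outcome of the primary election.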

Separated out functionality in the manager Splitter class that was only
used by the fate split operation.  This avoids having to expose code
used by TGW to Fate, making it easier to execute fate operations outside
the manager.

@keith-turner

Merged #6115 into this to get splits working in distributed fate.
