env server/client #744
base: multi-env-eval+dataset-builder
Conversation
Force-pushed from 8d79a1d to d8cf093.
    self,
    input: RolloutInput,
    client: AsyncOpenAI,
    client_config: ClientConfig,
wondering if we shouldn't have some kind of backward compatibility and still accept `client` as input, but fail at runtime in the cases where that can't work?
yea, we could support this but it would always come at the cost of the EnvClient not being a drop-in replacement for an Environment because we cannot mirror the API with client as it's not serializable
fair
would prefer at least initially not breaking the rollout API -- some integrations (e.g. Tinker) use this directly, we'll need to do some other changes here around non-OpenAI client types like Anthropic, which can get subsumed by a generic Client / ClientConfig type eventually.
It's nice for users to be able to play around with envs in scripts/notebooks where the rollout method can be used directly, and IMO we should still support this with a generic OpenAI client.
My preference would be to handle it the same way we're doing DatasetBuilder, where we have a union type + keep the old var name for back-compatibility, and check the type where relevant. We can allow certain code paths (prime-rl orchestrator, vf-eval) to fail if a generic client is used.
yeah, I was also thinking about a Union type here so that we can do both. We can also introduce a new method that only takes a client config and keep this one for backward compatibility.
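A minimal sketch of the union-type idea discussed here, with a hypothetical `ClientConfig` and a simplified rollout signature (illustrative only, not the PR's actual code):

```python
# Sketch: keep the old `client` parameter name, but accept either a live
# AsyncOpenAI client (back-compat) or a serializable ClientConfig.
from dataclasses import dataclass

from openai import AsyncOpenAI


@dataclass
class ClientConfig:
    """Hypothetical serializable stand-in for an OpenAI-compatible client."""
    base_url: str = "http://localhost:8000/v1"
    api_key: str = "EMPTY"


async def rollout(prompt: list[dict], client: AsyncOpenAI | ClientConfig, model: str) -> str:
    if isinstance(client, ClientConfig):
        # Code paths that require serializability (env server, vf-eval) take this
        # branch; other callers keep passing a live client directly.
        client = AsyncOpenAI(base_url=client.base_url, api_key=client.api_key)
    response = await client.chat.completions.create(model=model, messages=prompt)
    return response.choices[0].message.content or ""
```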
@@ -0,0 +1,3 @@
from verifiers.workers.client.zmq_env_client import ZMQEnvClient
wondering if the client name isn't misleading given the OpenAI client. Do we expect users to instantiate a client, or should that be handled under the hood by `load_environment`?
we would expect users to instantiate a client/server. imo naming is pretty clear here, it's called exactly what it is: a client to interface with an environment
env servers wrap load_environment (and hence Environment), clients are used to interface with those env servers
okay sg
which users here? IMO the spawning of clients/servers should always be automated by entrypoints like vf-eval / orchestrator. both can pull in info from configs, create their own clients, and spawn + connect to as many servers as needed
ah yea, im considering myself a user of verifiers when developing prime-rl haha. i agree, the user who is only running commands is blind to this and should never have to spawn an env server themselves
)


class EnvClient(ABC):
I would use Protocol here over ABC unless we have a lot of shared code. Not a deal breaker though, we can also stick with ABC.
i will check if possible. i think this might work for the client, but unlikely for the server
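For comparison, a rough sketch of the `Protocol` variant, using the mirrored method names from the description with simplified signatures (an assumption, not the PR's definition):

```python
# Sketch: structural typing via Protocol. Any object implementing these methods
# satisfies the EnvClient type without inheriting from a common base class.
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class EnvClient(Protocol):
    async def run_rollout(self, input: dict[str, Any]) -> dict[str, Any]: ...

    async def run_group(self, inputs: list[dict[str, Any]]) -> list[dict[str, Any]]: ...

    async def generate(self, inputs: list[dict[str, Any]]) -> list[dict[str, Any]]: ...

    async def evaluate(self) -> dict[str, Any]: ...
```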
Description
This PR introduces the `EnvClient` and `EnvServer`, which are a drop-in replacement for executing environments in a separate process (pool). This is especially useful for multi-env training (e.g. in `prime-rl`) and evals (e.g. via `vf-eval` or in online evals during training). A couple of notes on design decisions:

- `EnvClient` mirrors the in-process `Environment` for the most common public-facing methods, such as `run_rollout`, `run_group`, `generate`, and `evaluate` (see the sketch below)
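As a sketch of the drop-in property (the call below assumes a simplified, hypothetical `evaluate` signature):

```python
# Sketch: code written against the shared public surface doesn't care whether
# it receives an in-process Environment or an EnvClient backed by an env server.
async def report(env) -> None:
    results = await env.evaluate()  # same call for Environment and EnvClient
    print(results)
```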
Example

The env server pattern is integrated into `vf-eval` to sidecar an env server.

Design
EnvServer
An `EnvServer` is initialized like a regular environment with an `env_id` and `env_args`.
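Roughly, starting a server could look like the sketch below; the module path and keyword arguments are assumptions based on this description, not the actual API:

```python
# Hypothetical sketch: serve an environment identified by env_id/env_args,
# the same inputs load_environment would receive, on a fixed address.
from verifiers.workers.server.zmq_env_server import ZMQEnvServer  # assumed path

ZMQEnvServer.run_server(
    env_id="gsm8k",                  # example environment id
    env_args={},                     # forwarded to the environment constructor
    address="tcp://127.0.0.1:5555",  # where clients connect
)
```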
EnvClient

An `EnvClient` communicates with an env server over the configured `address`.
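On the client side, roughly (the import path comes from this PR's diff; the constructor argument and method signature are assumptions):

```python
# Sketch: connect to a running env server and call the mirrored Environment API.
import asyncio

from verifiers.workers.client.zmq_env_client import ZMQEnvClient


async def main() -> None:
    client = ZMQEnvClient(address="tcp://127.0.0.1:5555")  # assumed constructor arg
    results = await client.evaluate()  # simplified; mirrors Environment.evaluate
    print(results)


asyncio.run(main())
```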
Sidecar Pattern

To sidecar an env server (e.g. from `vf-eval`), simply wrap the `run_server` class method in a `Process` and connect the client to the same address.
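A sketch of that flow, reusing the assumed names from the snippets above:

```python
# Sketch: sidecar an env server in a separate process, then point a client at it.
from multiprocessing import Process

from verifiers.workers.client.zmq_env_client import ZMQEnvClient
from verifiers.workers.server.zmq_env_server import ZMQEnvServer  # assumed path

ADDRESS = "tcp://127.0.0.1:5555"

server_proc = Process(
    target=ZMQEnvServer.run_server,  # the class method mentioned above
    kwargs={"env_id": "gsm8k", "env_args": {}, "address": ADDRESS},  # assumed kwargs
    daemon=True,
)
server_proc.start()

client = ZMQEnvClient(address=ADDRESS)  # same address as the server
# ... run rollouts / evals through the client ...

server_proc.terminate()
server_proc.join()
```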
Breaking

- `client_config: ClientConfig` is now passed to public-facing methods instead of `client: AsyncOpenAI`. This is because clients are not serializable, so there is no way to mirror the `Environment` API otherwise (see the sketch below).
- `AsyncContextManager`s are not serializable. We can probably define env-level gen + score concurrency limits that are enforced across all calls, but it's still breaking for the user in `run_rollout` and `run_group`.
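To illustrate the serializability point (field names are illustrative, not the PR's `ClientConfig`):

```python
# Sketch: a plain-data config crosses the process boundary cleanly; the live
# AsyncOpenAI client (with its connection/event-loop state) does not, so the
# env server rebuilds the client locally from the config.
import pickle
from dataclasses import dataclass

from openai import AsyncOpenAI


@dataclass
class ClientConfig:
    base_url: str = "http://localhost:8000/v1"
    api_key: str = "EMPTY"
    timeout: float = 600.0


config = ClientConfig()
restored: ClientConfig = pickle.loads(pickle.dumps(config))  # plain data round-trips

# Server-side reconstruction of the actual client:
client = AsyncOpenAI(base_url=restored.base_url, api_key=restored.api_key, timeout=restored.timeout)
```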
TODOs

Type of Change
Testing
Ran `uv run pytest` locally.

Checklist
Additional Notes