
@mikasenghaas
Member

@mikasenghaas mikasenghaas commented Jan 19, 2026

Description

WIP. Based on #734 and #739.

This PR introduces the EnvClient and EnvServer, which together provide a drop-in replacement for executing environments in a separate process (or process pool). This is especially useful for multi-env training (e.g. in prime-rl) and evals (e.g. via vf-eval or in online evals during training). A couple of notes on design decisions:

  • The EnvClient mirrors the in-process Environment for the most common public-facing methods, such as run_rollout, run_group, generate and evaluate.
  • The client/server pattern is protocol-agnostic, i.e. the client may communicate with the server via any protocol (gRPC/ZMQ/HTTP/...). For now, only ZMQ is implemented.
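One way to keep the client protocol-agnostic is to isolate the transport behind a single abstract method, so a ZMQ (or gRPC/HTTP) client only overrides that method. The sketch below assumes this structure; `_request`, `LoopbackEnvClient`, `echo_handler`, and the dict payload format are hypothetical and not taken from this PR.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable


class EnvClient(ABC):
    """Sketch: mirrors a subset of the Environment API; subclasses supply only the transport."""

    @abstractmethod
    async def _request(self, method: str, payload: dict[str, Any]) -> dict[str, Any]:
        """Send one request to the env server and return its (serializable) response."""

    async def run_rollout(self, input: dict[str, Any], client_config: dict[str, Any]) -> dict[str, Any]:
        return await self._request("run_rollout", {"input": input, "client_config": client_config})

    async def evaluate(self, client_config: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        return await self._request("evaluate", {"client_config": client_config, **kwargs})


class LoopbackEnvClient(EnvClient):
    """Hypothetical transport that calls a handler in-process (handy for tests)."""

    def __init__(self, handler: Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]):
        self._handler = handler

    async def _request(self, method: str, payload: dict[str, Any]) -> dict[str, Any]:
        return await self._handler(method, payload)


async def echo_handler(method: str, payload: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for a real env server: echoes back what it received.
    return {"method": method, "payload": payload}
```

For example, `asyncio.run(LoopbackEnvClient(echo_handler).evaluate({"model": "m"}, num_examples=5))` goes through exactly the same public method a ZMQ-backed subclass would expose.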

Example

The env server pattern is integrated into vf-eval, which can sidecar an env server:

uv run vf-eval gsm8k -n5 -r3 --use-env-server

Design

EnvServer

An EnvServer is initialized like a regular environment, with an env_id and env_args:

server = ZMQEnvServer(
    env_id=args.env_id,
    env_args=args.env_args,
    address=address,
)

try:
    await server.run()
finally:
    await server.close()
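On the server side, the core job is to turn a (method name, payload) request into a call on the wrapped Environment and return a serializable result. The sketch below assumes an allow-list of mirrored method names and uses a stand-in `FakeEnvironment`; neither it nor `handle_request` comes from this PR, and a real ZMQ server additionally has to manage sockets, message framing, and serialization.

```python
import asyncio
from typing import Any

# Only these methods are exposed over the wire (mirrors the public Environment API).
ALLOWED_METHODS = {"run_rollout", "run_group", "generate", "evaluate"}


class FakeEnvironment:
    """Stand-in for the Environment an env server would build via load_environment."""

    async def run_rollout(self, input: dict[str, Any], client_config: dict[str, Any]) -> dict[str, Any]:
        return {"prompt": input["prompt"], "model": client_config["model"], "reward": 1.0}


async def handle_request(env: Any, request: dict[str, Any]) -> dict[str, Any]:
    """Dispatch one decoded request to the wrapped environment."""
    method = request.get("method")
    if method not in ALLOWED_METHODS or not hasattr(env, method):
        return {"ok": False, "error": f"unsupported method: {method!r}"}
    try:
        result = await getattr(env, method)(**request.get("payload", {}))
    except Exception as exc:  # report failures to the client instead of crashing the server
        return {"ok": False, "error": repr(exc)}
    return {"ok": True, "result": result}
```

Returning errors as data rather than raising keeps a single bad request from taking down a server shared by many clients.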

EnvClient

An EnvClient communicates with an env server over the configured address:

from verifiers.workers.client.zmq_env_client import ZMQEnvClient

env = ZMQEnvClient(address=address)

await env.run_rollout(...) # same as Environment.run_rollout
await env.run_group(...) # same as Environment.run_group
await env.evaluate(...) # same as Environment.evaluate

Sidecar Pattern

To sidecar an env server (e.g. from vf-eval), wrap the run_server class method in a Process and connect the client to the same address:

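from multiprocessing import Process

from verifiers.workers.client.zmq_env_client import ZMQEnvClient

env_server = Process(
    target=ZMQEnvServer.run_server,
    args=(config.env_id, config.env_args),
    kwargs=dict(address=address),
)
env_server.start()
env = ZMQEnvClient(address=address)

try:
    results = await env.evaluate(...)
finally:
    # try a graceful shutdown first, then force-kill if the server does not exit
    env_server.terminate()
    env_server.join(timeout=5)
    if env_server.is_alive():
        env_server.kill()
        env_server.join()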
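A `Process` target has to be a plain synchronous callable, which is presumably why the server exposes a `run_server` class method in addition to the async `run()`. A minimal sketch of such an entrypoint, assuming it constructs the server, drives it with `asyncio.run`, and always closes it; `SketchEnvServer` and its internals are hypothetical, not the PR's ZMQEnvServer.

```python
import asyncio
from typing import Any


class SketchEnvServer:
    """Hypothetical server with the same run()/close() shape as the EnvServer example above."""

    def __init__(self, env_id: str, env_args: dict[str, Any], address: str):
        self.env_id = env_id
        self.env_args = env_args
        self.address = address
        self.events: list[str] = []

    async def run(self) -> None:
        # A real server would bind `address` here and serve requests until stopped.
        self.events.append(f"run {self.env_id} @ {self.address}")

    async def close(self) -> None:
        # A real server would release its sockets and other resources here.
        self.events.append("close")

    @classmethod
    def run_server(cls, env_id: str, env_args: dict[str, Any], **kwargs: Any) -> "SketchEnvServer":
        """Synchronous entrypoint, usable as multiprocessing.Process(target=...)."""
        server = cls(env_id, env_args, **kwargs)

        async def _main() -> None:
            try:
                await server.run()
            finally:
                await server.close()

        asyncio.run(_main())  # the child process gets its own event loop
        return server
```

When used as `Process(target=SketchEnvServer.run_server, args=(env_id, env_args), kwargs=dict(address=address))`, the return value is discarded; it only matters when calling the entrypoint directly, e.g. in a test.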

Breaking

  • Public-facing methods now take client_config: ClientConfig instead of client: AsyncOpenAI. Clients are not serializable, so there is no other way to mirror the Environment API.
  • Concurrency can only be limited for an entire rollout for now, because AsyncContextManagers are not serializable. We could probably define env-level gen+score concurrency limits that are enforced across all calls, but this would still be a breaking change for users of run_rollout and run_group.
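Both breaking changes share a root cause: anything sent to the server must survive serialization. The sketch below assumes a hypothetical `ClientConfig` dataclass (its fields are illustrative, not the PR's) carrying an int concurrency limit. The config round-trips through `pickle`, a stand-in object holding a lock does not, and the server rebuilds the `asyncio.Semaphore` on its own side so one limit is enforced across all calls.

```python
import asyncio
import pickle
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical plain-data description of how to build a client on the server."""
    base_url: str
    model: str
    max_concurrent: int = 4  # sent as an int; the semaphore itself is built server-side


class LiveClientStandIn:
    """Stands in for a live client object that holds unpicklable resources."""
    def __init__(self) -> None:
        self._lock = threading.Lock()


def is_picklable(obj: object) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


class ServerSideLimiter:
    """Enforces one concurrency limit across all requests handled by this server."""
    def __init__(self, config: ClientConfig) -> None:
        self._sem = asyncio.Semaphore(config.max_concurrent)
        self.in_flight = 0
        self.peak = 0

    async def call(self, delay: float = 0.01) -> None:
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            await asyncio.sleep(delay)  # stand-in for a model request
            self.in_flight -= 1


async def demo() -> int:
    # Simulate sending the config over the wire, then limiting 10 concurrent calls.
    wire = pickle.dumps(ClientConfig("http://localhost:8000/v1", "m", max_concurrent=2))
    limiter = ServerSideLimiter(pickle.loads(wire))
    await asyncio.gather(*(limiter.call() for _ in range(10)))
    return limiter.peak
```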

TODOs

  • Typed inputs/outputs (this requires lazily creating clients and removing them from the public-facing API)
  • Integrate with multi-env evals
  • Logging
  • Graceful server termination
  • Unit tests
  • Multi-process environment execution

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

@mikasenghaas mikasenghaas changed the base branch from main to eval-tui January 19, 2026 13:57
@mikasenghaas mikasenghaas changed the base branch from eval-tui to multi-env-eval+dataset-builder January 19, 2026 17:47
     self,
     input: RolloutInput,
-    client: AsyncOpenAI,
+    client_config: ClientConfig,
Member

wondering if we shouldn't have some kind of backward compatibility and still accept client as input, but fail at runtime?

Member Author

yea, we could support this, but it would always come at the cost of the EnvClient not being a drop-in replacement for an Environment, because we cannot mirror the API with client since it's not serializable

Member

fair

Member

would prefer, at least initially, not breaking the rollout API. Some integrations (e.g. Tinker) use it directly, and we'll need to make some other changes here around non-OpenAI client types like Anthropic, which can eventually be subsumed by a generic Client / ClientConfig type.

It's nice for users to be able to play around with envs in scripts/notebooks where the rollout method can be used directly, and IMO we should still support this with a generic OpenAI client.

My preference would be to handle it the same way we're doing DatasetBuilder: use a union type, keep the old variable name for backward compatibility, and check the type where relevant. We can allow certain code paths (prime-rl orchestrator, vf-eval) to fail if a generic client is used.
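A sketch of the union-type approach described above. It assumes a hypothetical `ClientConfig` and uses a `FakeAsyncOpenAI` stand-in so the snippet does not depend on the openai package; `resolve_client` and `require_client_config` are illustrative names. In-process paths accept either type under the old `client` name, while serializing entry points (env server, orchestrator, vf-eval) reject live clients.

```python
from dataclasses import dataclass
from typing import Callable, Union


class FakeAsyncOpenAI:
    """Stand-in for openai.AsyncOpenAI (a live, non-serializable client)."""
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


@dataclass
class ClientConfig:
    """Hypothetical serializable client description."""
    base_url: str
    model: str


# Keeps the old parameter name `client`, but widens its type.
ClientLike = Union[FakeAsyncOpenAI, ClientConfig]


def resolve_client(client: ClientLike, build: Callable[[ClientConfig], FakeAsyncOpenAI]) -> FakeAsyncOpenAI:
    """In-process path: accept a live client or a config and return a live client."""
    if isinstance(client, ClientConfig):
        return build(client)  # lazily create the client from its config
    return client


def require_client_config(client: ClientLike) -> ClientConfig:
    """Out-of-process path: only a serializable config can be sent to an env server."""
    if not isinstance(client, ClientConfig):
        raise TypeError(f"expected ClientConfig for out-of-process execution, got {type(client).__name__}")
    return client
```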

Member

yeah, I was also thinking about a Union type here so that we can do both. We could also introduce a new method that only takes a client config and keep this one for backward compatibility

@@ -0,0 +1,3 @@
from verifiers.workers.client.zmq_env_client import ZMQEnvClient
Member

wondering if the client name isn't misleading, given the OpenAI client. Do we expect users to instantiate a client, or should it be handled under the hood by load_environment?

Member Author

we would expect users to instantiate a client/server. imo the naming is pretty clear here: it's called exactly what it is, a client to interface with an environment

Member Author

env servers wrap load_environment (and hence Environment); clients are used to interface with those env servers

Member

okay sg

Member

which users here? IMO the spawning of clients/servers should always be automated by entrypoints like vf-eval / orchestrator. both can pull in info from configs, create their own clients, and spawn + connect to as many servers as needed

Member Author

ah yea, im considering myself a user of verifiers when developing prime-rl haha. i agree, the user who is only running commands is blind to this and should never have to spawn an env server themselves

)


class EnvClient(ABC):
Member

I would use Protocol here over ABC unless we have a lot of shared code. But it's not a deal breaker; we can also stick with ABC.
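For comparison, a `typing.Protocol` version of the client interface as suggested here: implementations need no common base class and are checked structurally. This is a sketch with a reduced, hypothetical method set; note that `@runtime_checkable` only checks that the methods exist, not their signatures.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class EnvClientProtocol(Protocol):
    """Structural interface: anything with these async methods counts as an env client."""

    async def run_rollout(self, input: dict[str, Any], client_config: dict[str, Any]) -> dict[str, Any]: ...

    async def evaluate(self, client_config: dict[str, Any], **kwargs: Any) -> dict[str, Any]: ...


class InMemoryEnvClient:
    """No inheritance needed; it satisfies the protocol by shape alone."""

    async def run_rollout(self, input: dict[str, Any], client_config: dict[str, Any]) -> dict[str, Any]:
        return {"input": input, "model": client_config["model"]}

    async def evaluate(self, client_config: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
        return {"model": client_config["model"], **kwargs}


async def rollout_once(env: EnvClientProtocol) -> dict[str, Any]:
    # Type checkers accept any structurally compatible object here.
    return await env.run_rollout({"prompt": "hi"}, {"model": "m"})
```

This also illustrates the trade-off raised in the reply: a Protocol carries no shared implementation, so a server with common socket-handling logic may still be better served by an ABC.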

Member Author

i will check if possible. i think this might work for the client, but unlikely for the server

@mikasenghaas mikasenghaas changed the title from "env worker" to "env server/client" Jan 20, 2026