[WIP] Refactor Model Tests #12822
base: main
Conversation
Excellent stuff! Some general comments (a rough sketch of these suggestions follows below):

- Normalize the model outputs to a common format before they go to `torch.allclose()`.
- Initialize the input dict anew before passing it to a new initialization of the model with `torch.manual_seed(0)`. This is because some autoencoder models take a `generator` input.
- Use fixtures wherever possible to reduce boilerplate and take advantage of `pytest` features. One particular session-level fixture could be `base_output`. It should help reduce test time quite a bit.
- Use `pytest.mark.parametrize` where possible.

Okay for me to do in a future PR, but:

- Should also account for the attention backends.
- Should we also do a cross between CP and attention backends?
- How about the caching mixins?

Some nits:

- Use `torch.no_grad()` as a decorator on the whole test function as opposed to using it inside the functions.
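A minimal sketch of how the fixture / `pytest.mark.parametrize` / `torch.no_grad()` suggestions could fit together; `DummyModel`, the input shapes, and the tolerances are made-up stand-ins rather than the PR's actual test helpers:

```python
import pytest
import torch


class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)

    def forward(self, hidden_states):
        return self.proj(hidden_states)


@pytest.fixture(scope="session")
def base_output():
    # Computed once per test session and reused, which should cut down test time.
    torch.manual_seed(0)
    model = DummyModel()
    inputs = {"hidden_states": torch.randn(1, 4, 8)}
    with torch.no_grad():
        return model(**inputs)


@torch.no_grad()  # decorator form instead of a context manager inside the body
def test_deterministic_output(base_output):
    torch.manual_seed(0)  # re-seed and rebuild the inputs for every fresh model init
    model = DummyModel()
    inputs = {"hidden_states": torch.randn(1, 4, 8)}
    output = model(**inputs)
    assert torch.allclose(output, base_output, atol=1e-5, rtol=1e-5)


@pytest.mark.parametrize("batch_size", [1, 2])
@torch.no_grad()
def test_output_shape(batch_size):
    model = DummyModel()
    output = model(hidden_states=torch.randn(batch_size, 4, 8))
    assert output.shape == (batch_size, 4, 8)
```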
| - get_dummy_inputs(): Returns dict of inputs to pass to the model forward pass | ||
| Pytest mark: attention | ||
| Use `pytest -m "not attention"` to skip these tests |
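For reference, the custom mark would presumably need to be registered (e.g. in a `conftest.py`) so that `pytest -m "not attention"` deselects cleanly without an unknown-marker warning; a minimal sketch, with the marker description being an assumption:

```python
# conftest.py (sketch)
def pytest_configure(config):
    # Register the custom "attention" marker so `pytest -m "not attention"`
    # works without PytestUnknownMarkWarning.
    config.addinivalue_line("markers", "attention: tests that exercise attention backends")
```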
How do we implement it in an individual model testing class? For example, say we want to skip it for model X where its attention class doesn't inherit from AttentionModuleMixin?
Ideally, any model using attention also uses AttentionModuleMixin. The options here are:
- Do not add tests from attention Mixin to a module file.
- Add a decorator that makes AttentionModuleMixin a requirement for running attention tests.
But there are important classes like Autoencoders that don't use the Attention mixins.
Let's do this?
Add a decorator that makes AttentionModuleMixin a requirement for running attention tests.
> But there are important classes like Autoencoders that don't use the Attention mixins.

The check is for `AttentionModuleMixin`, not `AttentionMixin`, and Autoencoders do use it:
ModelMixin, AttentionMixin, AutoencoderMixin, ConfigMixin, FromOriginalModelMixin, PeftAdapterMixin
Oh, I see your point!
So, as long as there is an attention module, this class should apply.
Maybe at the beginning of each of the tests we could check whether the model's attention classes inherit from
| if isinstance(module, AttentionModuleMixin): |
and skip if that's not the case (see the sketch below).
Otherwise, I think it could be cumbersome to check which model tests should and shouldn't use this class because attention is a common component.
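One possible shape for that check, as a rough sketch; the `AttentionModuleMixin` import path and the `model_class` / `get_init_dict` helpers are assumptions rather than the PR's actual API:

```python
import pytest
import torch

# Import path is an assumption; adjust to wherever AttentionModuleMixin lives.
from diffusers.models.attention import AttentionModuleMixin


def uses_attention_module_mixin(model: torch.nn.Module) -> bool:
    # Walk all submodules and check whether any of them inherits the mixin.
    return any(isinstance(module, AttentionModuleMixin) for module in model.modules())


class AttentionTesterMixin:
    def test_attention_backend(self):
        # `model_class` / `get_init_dict` are placeholder helpers assumed to exist
        # on the concrete test class.
        model = self.model_class(**self.get_init_dict())
        if not uses_attention_module_mixin(model):
            pytest.skip(f"{self.model_class.__name__} has no AttentionModuleMixin submodules")
        # ... actual attention-backend assertions would go here ...
```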
| Expected class attributes to be set by subclasses: | ||
| - model_class: The model class to test | ||
| - base_precision: Tolerance for floating point comparisons (default: 1e-3) |
This feels unnecessarily restrictive. I'd rather rely on method-specific `rtol` and `atol` values because we have seen that they vary from method to method.
Holdover from the initial draft. All precisions will be configurable.
@DN6 this still seems to be the case. We should make the precision-related arguments `atol` and `rtol` configurable instead of relying on a `base_precision`.
| def test_gguf_quantized_layers(self): | ||
| self._test_quantized_layers({"compute_dtype": torch.bfloat16}) | ||
Where do we include:
| class QuantCompileTests: |
| @@ -0,0 +1,489 @@ | |||
| #!/usr/bin/env python | |||
I guess it cannot currently generate the dummy input and init dicts? I would understand if so because inferring those is quite non-trivial.
One thing I think we should do is get a before/after coverage report to see whether we're missing anything. If we are, then we should likely be able to explain why that's the case.
sayakpaul left a comment
Looking much better. Thanks a TON for working through all my comments.
I have left a couple of comments, which are mostly minor.
My major feedback now is about the rewrite of the quantization tests. I think we should do a before-and-after coverage comparison and see if we're missing something important.
LMK if anything is unclear.
| from ...testing_utils import ( | ||
| assert_tensors_close, | ||
| is_attention, | ||
| torch_device, | ||
| ) |
| from ...testing_utils import ( | |
| assert_tensors_close, | |
| is_attention, | |
| torch_device, | |
| ) | |
| from ...testing_utils import assert_tensors_close, is_attention, torch_device |
| # Create modified inputs for second pass (vary hidden_states to simulate denoising) | ||
| inputs_dict_step2 = inputs_dict.copy() | ||
| if "hidden_states" in inputs_dict_step2: | ||
| inputs_dict_step2["hidden_states"] = inputs_dict_step2["hidden_states"] + 0.1 |
Some models might not have `"hidden_states"` in their `forward()`:
| x: Union[List[torch.Tensor], List[List[torch.Tensor]]], |
I agree that this is a special case, but while we're at it, would it make sense to (a rough sketch follows below):
- Make `"hidden_states"` configurable at the class level.
- 0.1 might not make as big a difference as expected. Maybe `inputs_dict_step2["hidden_states"] + torch.randn_like(inputs_dict_step2["hidden_states"])`?
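A rough sketch of both suggestions; `main_input_name` and the helper name are hypothetical, not the PR's actual attributes:

```python
import torch


class CacheTesterMixin:
    # Subclasses whose forward() takes something other than `hidden_states`
    # (e.g. a list input named `x`) can override this attribute.
    main_input_name = "hidden_states"

    def _get_modified_inputs(self, inputs_dict):
        inputs_dict_step2 = inputs_dict.copy()
        key = self.main_input_name
        if key in inputs_dict_step2 and torch.is_tensor(inputs_dict_step2[key]):
            # A random perturbation changes the input far more than adding a
            # constant 0.1, which makes cache-related differences easier to catch.
            inputs_dict_step2[key] = inputs_dict_step2[key] + torch.randn_like(inputs_dict_step2[key])
        return inputs_dict_step2
```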
| # Test cache_context works without error | ||
| with model.cache_context("test_context"): | ||
| pass |
Should we test for anything else here as well? For example, whether `_set_context()` is working as expected?
| model.disable_cache() | ||
| def _test_reset_stateful_cache(self): |
For consistency:
| def _test_reset_stateful_cache(self): | |
| @torch.no_grad() | |
| def _test_reset_stateful_cache(self): |
| @is_group_offload | ||
| class GroupOffloadTesterMixin: |
Same as above.
tests/models/testing_utils/memory.py (outdated)
| """ | ||
| @require_group_offload_support | ||
| def test_group_offloading(self, record_stream=False): |
@DN6 this seems to be unresolved:
diffusers/tests/models/test_modeling_common.py, line 1766 in 1cdb872:
| @parameterized.expand([False, True]) |
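Presumably the intent is to keep both `record_stream` variants; in the new pytest style that could look roughly like the following (placeholder body):

```python
import pytest


class GroupOffloadTesterMixin:
    @pytest.mark.parametrize("record_stream", [False, True])
    def test_group_offloading(self, record_stream):
        # The real test would exercise group offloading with and without
        # record_stream; this is only a structural sketch.
        ...
```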
| from ...testing_utils import ( | ||
| is_context_parallel, | ||
| require_torch_multi_accelerator, | ||
| ) |
| from ...testing_utils import ( | |
| is_context_parallel, | |
| require_torch_multi_accelerator, | |
| ) | |
| from ...testing_utils import is_context_parallel, require_torch_multi_accelerator |
| @is_context_parallel | ||
| @require_torch_multi_accelerator | ||
| class ContextParallelTesterMixin: | ||
| base_precision = 1e-3 |
Let's remove `base_precision` and give the underlying methods `atol` and `rtol` arguments for finer control.
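For illustration, one way the tolerance could move from a class attribute into method arguments (the names and the helper are placeholders, not the PR's API):

```python
import torch


class ContextParallelTesterMixin:
    def _compare_outputs(self, reference, parallel_output, atol=1e-3, rtol=1e-3):
        # Individual test methods (or subclasses) can pass tighter or looser
        # tolerances instead of relying on one class-wide base_precision.
        torch.testing.assert_close(parallel_output, reference, atol=atol, rtol=rtol)

    def test_context_parallel_matches_single_device(self):
        reference, parallel_output = self._run_single_and_parallel()  # hypothetical helper
        self._compare_outputs(reference, parallel_output, atol=1e-2, rtol=1e-2)
```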
| @require_accelerator | ||
| class QuantizationTesterMixin: |
For sanity checking, I think it would be better to obtain the coverage of the current https://github.com/huggingface/diffusers/blob/main/tests/quantization/ tests and compare it with the coverage after the changes from this PR.
This will give us a good idea of what we might be missing and whether that needs further fixing.
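A rough way to collect the two coverage numbers, assuming `pytest-cov` is installed; the paths are illustrative and each invocation would normally run in its own process:

```python
import pytest

# Coverage of the existing quantization tests on main:
pytest.main(["tests/quantization/", "--cov=src/diffusers/quantizers", "--cov-report=term-missing"])

# Coverage of the refactored tests from this PR (path is an assumption):
pytest.main(["tests/models/", "--cov=src/diffusers/quantizers", "--cov-report=term-missing"])
```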
What does this PR do?
Following the plan outlined for Diffusers 1.0.0, this PR introduces changes to our model testing approach in order to reduce the overhead involved in adding comprehensive tests for new models and standardize tests across all models.
Changes include:
- A script (`generate_model_tests.py`) to automatically generate tests based on the model file. It also provides a flag that allows us to include any optional features to test e.g. (we can turn this into a bot down the line).
- The script will now generate a template test file that can be populated with the necessary config information (an illustrative sketch of such a template follows below).

I've only made changes to Flux to make this PR easy to review. I'll open follow-ups in phases for the other models once this is approved.
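Purely as an illustration of the idea (not the script's actual output), a generated template might look roughly like this, with the config bits left for the author to fill in; the `ModelTesterMixin` import location is an assumption:

```python
import torch

from diffusers import FluxTransformer2DModel

from ..testing_utils import ModelTesterMixin  # assumed location of the shared mixin


class TestFluxTransformer2DModel(ModelTesterMixin):
    model_class = FluxTransformer2DModel

    def get_init_dict(self):
        # TODO: fill in a small config for the model under test
        return {}

    def get_dummy_inputs(self):
        # TODO: fill in dummy inputs for the forward pass
        return {"hidden_states": torch.randn(1, 4, 16)}
```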
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.