@jholveck
Contributor

This will detect if a cat is on the screen. By which I mean displayed on the screen, not sitting on your laptop.

This is meant as a simple demo of using MSS for AI. It works as-is, but needs to be documented, and there are some bits that could do with cleanup.

There are a lot of additional features that could be added, such as showing a window with bounding boxes, but that's probably more complexity than is called for here.

Changes proposed in this PR

Fixes #
(...)

  • Tests added/updated
  • Documentation updated
  • Changelog entry added
  • ./check.sh passed

@BoboTiG
Owner

BoboTiG commented Jan 22, 2026

I like it, great inspiration!

Contributor

@halldorfannar halldorfannar left a comment

Excellent PR. Well documented, enjoyable to read. Just some minor improvements suggested.

# identify what it's seeing on its cameras.
#
# For this demo, we want to tell if a cat is anywhere on the screen, not if the whole screen is a picture of a cat.
# That means that we want to use an detector, not a classifier.
Contributor

Suggested change
# That means that we want to use an detector, not a classifier.
# That means that we want to use a detector, not a classifier.
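
To illustrate the detector/classifier distinction that the comment under review draws, here is a minimal sketch. All of the data is made up, and the helper names `classifier_sees` and `detector_sees` are hypothetical; it only shows why a single whole-image label can miss a cat that occupies a small part of the screen.

```python
# Hypothetical outputs for one screenshot; labels, scores, and boxes are made up.

# A classifier gives one answer for the whole image.
classifier_output = {"label": "laptop", "score": 0.81}

# A detector gives one entry per object it finds, each with its own box
# (x1, y1, x2, y2 in pixels), so a small cat in a corner still shows up.
detector_output = [
    {"label": "laptop", "score": 0.88, "box": (0, 0, 1900, 1000)},
    {"label": "cat", "score": 0.93, "box": (1200, 300, 1500, 620)},
]


def classifier_sees(output, wanted):
    """True if the single whole-image label is the one we want."""
    return output["label"] == wanted


def detector_sees(detections, wanted):
    """True if any detected object carries the label we want."""
    return any(d["label"] == wanted for d in detections)


print(classifier_sees(classifier_output, "cat"))
print(detector_sees(detector_output, "cat"))
```

Torchvision's detection models return the same kind of information per image as a dict of parallel `boxes`, `labels`, and `scores` tensors; the list-of-dicts layout above is only for readability.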

# Performance
# ===========
#
# The biggest determinant of performance is whether the model runs on a GPU or on the CPU. GPUs are extremely
Contributor

Maybe we should mention here, right away, that this particular model will work on both? I know this becomes clearer at the end of this section, when GPU vs CPU performance comparisons are discussed.
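
For readers following the CPU/GPU point, a common way to let the same script run on either is to pick the device at startup. This is a minimal sketch, not necessarily how the demo does it; `pick_device` is a hypothetical helper, and it only considers CUDA GPUs, not other accelerator backends.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Guard only so this sketch runs without PyTorch installed;
        # the demo itself needs PyTorch.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"The model would run on: {device}")
```

In a PyTorch script, the chosen string is then typically passed to `model.to(device)`, and each input tensor is moved the same way, so that the model and its inputs live on the same device.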

# ===========
#
# The first time you run this demo, Torchvision will download a 167 MByte DNN. This is cached in
# ~/.cache/torch/hub/checkpoints on Unix. I'm not sure where it's cached on other platforms, but it will tell you.
Contributor

Third person was used earlier in the text. I would avoid switching to first person here. Perhaps just go with this:

Suggested change
# ~/.cache/torch/hub/checkpoints on Unix. I'm not sure where it's cached on other platforms, but it will tell you.
# ~/.cache/torch/hub/checkpoints on Unix. If you want to know where the cache is stored on other platforms, this information will be displayed after downloading the DNN.
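
On the question of where the cache lives on other platforms: the `torch.hub` documentation describes the hub directory as `$TORCH_HOME/hub`, falling back to `$XDG_CACHE_HOME/torch/hub`, and then to `~/.cache/torch/hub`. The sketch below mirrors that documented lookup using only the standard library. `default_checkpoint_dir` is a hypothetical helper, and it assumes nothing has called `torch.hub.set_dir()`; in a real session, `torch.hub.get_dir()` is the authoritative answer.

```python
import os


def default_checkpoint_dir(environ=os.environ):
    """Mirror the documented torch.hub lookup order for the checkpoint cache.

    Assumption: the hub directory has not been overridden in code.
    """
    xdg_cache = environ.get("XDG_CACHE_HOME", "~/.cache")
    torch_home = environ.get("TORCH_HOME", os.path.join(xdg_cache, "torch"))
    return os.path.join(os.path.expanduser(torch_home), "hub", "checkpoints")


print(default_checkpoint_dir())
```

Because `~` is expanded by `os.path.expanduser`, the same rule gives a path under the user's home directory on platforms other than Unix as well.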

import torchvision.models.detection
import torchvision.transforms.v2

# You'll also need to "pip install mss pillow".
Contributor

Rather than assume the user is using pip (I'm a fan of uv), I would suggest the more general:

Suggested change
# You'll also need to "pip install mss pillow".
# You'll also need to install mss and pillow.

This also aligns with the earlier text, where pip is suggested but the specific installation method is left up to the user.

# If an image is too small, then it's got a pretty decent chance of being a false positive: it's hard to tell if a
# Discord or Slack reaction icon is a cat or something different. We ignore any results that are too small to be
# reliable. Here, this cutoff is 0.1% of the whole monitor (about 1.5 cm square on a 27" monitor, the diameter of a
# AA battery). Like the score threshold, this is just something you try and see what the model seems to be able to
Contributor

Suggested change
# AA battery). Like the score threshold, this is just something you try and see what the model seems to be able to
# AA battery). Like the score threshold, this is just something you try and see what the model is able to
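
The size cutoff described in this comment can be sketched as a simple area test. The helper names and the example resolution are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` in pixels, and the physical-size check assumes a 16:9 monitor, which the source does not state.

```python
import math

MIN_AREA_FRACTION = 0.001  # 0.1% of the monitor, as in the comment above


def is_large_enough(box, screen_width, screen_height, min_fraction=MIN_AREA_FRACTION):
    """True if the box covers at least min_fraction of the screen area."""
    x1, y1, x2, y2 = box
    box_area = (x2 - x1) * (y2 - y1)
    return box_area >= min_fraction * screen_width * screen_height


def square_side_cm(diagonal_inches, aspect=(16, 9), fraction=MIN_AREA_FRACTION):
    """Side of a square covering `fraction` of a monitor with the given diagonal."""
    aspect_w, aspect_h = aspect
    diagonal_cm = diagonal_inches * 2.54
    aspect_diagonal = math.hypot(aspect_w, aspect_h)
    width_cm = diagonal_cm * aspect_w / aspect_diagonal
    height_cm = diagonal_cm * aspect_h / aspect_diagonal
    return math.sqrt(fraction * width_cm * height_cm)


# A 32x32 reaction icon versus a 100x100 thumbnail on a hypothetical 2560x1440 screen.
print(is_large_enough((0, 0, 32, 32), 2560, 1440))
print(is_large_enough((0, 0, 100, 100), 2560, 1440))
print(round(square_side_cm(27), 2))
```

Under the 16:9 assumption, the last line comes out at roughly 1.4 cm, which is consistent with the "about 1.5 cm square" figure in the comment.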

preprocess = weights.transforms()

# The labels ("what type of object is this") that the model gives us are just integers; for this model, they're
# from 0 to 90. The English words describing them ("cat") are in a list, stored in the weight's metadata.
Contributor

Suggested change
# from 0 to 90. The English words describing them ("cat") are in a list, stored in the weight's metadata.
# from 0 to 90. The English words describing them (like "cat") are in a list, stored in the weight's metadata.
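
As a rough illustration of the integer-label lookup this comment describes: Torchvision exposes the category names as `weights.meta["categories"]`. The sketch below uses a short stand-in list instead, so the indices, scores, and the `SCORE_THRESHOLD` value are all hypothetical.

```python
# Stand-in for the list stored in the weights' metadata. The real list is
# longer and ordered differently, so these indices are hypothetical.
categories = ["__background__", "person", "cat", "dog"]

# Stand-in for one image's results: parallel lists, one entry per detection.
labels = [1, 2, 3, 2]
scores = [0.97, 0.91, 0.55, 0.12]

SCORE_THRESHOLD = 0.5  # hypothetical value; tune it by experiment

# Look the integer up once, then compare integers in the hot path.
cat_label = categories.index("cat")

# Translate every integer label back to its English name.
names = [categories[label] for label in labels]

# Keep only the cat detections the model is reasonably sure about.
confident_cats = [
    score
    for label, score in zip(labels, scores)
    if label == cat_label and score >= SCORE_THRESHOLD
]

print(names)
print(confident_cats)
```

Looking up the integer for "cat" once and comparing integers afterwards avoids translating every detection to a string on every frame.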
