New demo: cat detector #465
base: main
This will detect if a cat is on the screen. By that I mean a cat displayed on the screen, not one sitting on your laptop. This is meant as a simple demo of using MSS for AI. It works as-is, but it needs to be documented, and there are some bits that could do with cleanup. There are a lot of additional features that could be added, such as showing a window with bounding boxes, but that's probably more complexity than is called for here.
I like it, great inspiration!
halldorfannar left a comment
Excellent PR. Well documented, enjoyable to read. Just some minor improvements suggested.
# identify what it's seeing on its cameras.
#
# For this demo, we want to tell if a cat is anywhere on the screen, not if the whole screen is a picture of a cat.
# That means that we want to use an detector, not a classifier.
Suggested change:
- # That means that we want to use an detector, not a classifier.
+ # That means that we want to use a detector, not a classifier.
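A classifier gives one verdict for the whole image, whereas a detector reports each object it finds with its own box, label, and score, so "is there a cat anywhere?" becomes "is any single detection a confident cat?". The sketch below contrasts the two with plain Python stand-ins: the detector output mimics the per-image `boxes`/`labels`/`scores` dict that torchvision detection models return, while the helper names, the label id, and all the numbers are hypothetical.

```python
CAT_LABEL = 17  # hypothetical label id for "cat", for illustration only


def classifier_says_cat(class_scores: dict[str, float]) -> bool:
    # A classifier judges the whole image: only its single best guess counts.
    return max(class_scores, key=class_scores.__getitem__) == "cat"


def detector_sees_cat(detection: dict, score_threshold: float) -> bool:
    # A detector lists every object it found; one confident cat box is enough.
    return any(
        label == CAT_LABEL and score >= score_threshold
        for label, score in zip(detection["labels"], detection["scores"])
    )


# Hypothetical screen: a large editor window plus a small cat photo in a corner.
whole_image_guess = {"screenshot": 0.80, "cat": 0.15, "keyboard": 0.05}
detection = {
    "boxes": [[0, 0, 1900, 1000], [1700, 900, 1900, 1060]],  # (x1, y1, x2, y2)
    "labels": [1, CAT_LABEL],  # the first box is some non-cat object
    "scores": [0.91, 0.88],
}

print(classifier_says_cat(whole_image_guess))  # False: "screenshot" wins overall
print(detector_sees_cat(detection, score_threshold=0.8))  # True: the corner box is a cat
```

The classifier misses the cat because it only asks what the screen as a whole looks like; the detector finds it because the small corner box is judged on its own.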
# Performance
# ===========
#
# The biggest determinant of performance is whether the model runs on a GPU or on the CPU. GPUs are extremely
Maybe we should mention here, right away, that this particular model works on both? I know this becomes clearer at the end of this section, where GPU and CPU performance are compared.
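Because the same torchvision weights run on either device, the choice can be made once at startup. A minimal sketch, assuming PyTorch as the backend: `torch.cuda.is_available()` is the real PyTorch check, while `pick_device` is a hypothetical helper, and the `ImportError` fallback exists only so the snippet also runs on a machine without PyTorch.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all, so there is no GPU path to take.
        return "cpu"
    # The detector's weights are identical on both devices; only the speed differs.
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running the detector on {device}")
# In the demo, the model and each input tensor would then be moved there with
# model.to(device) and tensor.to(device).
```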
# ===========
#
# The first time you run this demo, Torchvision will download a 167 MByte DNN. This is cached in
# ~/.cache/torch/hub/checkpoints on Unix. I'm not sure where it's cached on other platforms, but it will tell you.
Third person was used earlier in the text. I would avoid switching to first person here. Perhaps just go with this:
Suggested change:
- # ~/.cache/torch/hub/checkpoints on Unix. I'm not sure where it's cached on other platforms, but it will tell you.
+ # ~/.cache/torch/hub/checkpoints on Unix. If you want to know where the cache is stored on other platforms, this information will be displayed after downloading the DNN.
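On the cache location itself: PyTorch documents the hub directory returned by `torch.hub.get_dir()` as `$TORCH_HOME/hub`, where `TORCH_HOME` defaults to `$XDG_CACHE_HOME/torch` and `XDG_CACHE_HOME` defaults to `~/.cache`; downloaded weights go in its `checkpoints` subdirectory. The sketch below reproduces that lookup order in plain Python, assuming `torch.hub.set_dir()` has not been called; `default_checkpoint_dir` is a hypothetical name, and with PyTorch installed, `torch.hub.get_dir()` is the authoritative answer. On Windows, `~` expands to the user's profile directory, so the same rule applies there.

```python
import os


def default_checkpoint_dir() -> str:
    # Same lookup order that torch.hub.get_dir() documents, assuming
    # torch.hub.set_dir() was never called.
    torch_home = os.environ.get("TORCH_HOME")
    if torch_home is None:
        cache_root = os.environ.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
        torch_home = os.path.join(cache_root, "torch")
    # Weights downloaded through torch.hub are stored under <hub dir>/checkpoints.
    return os.path.join(os.path.expanduser(torch_home), "hub", "checkpoints")


print(default_checkpoint_dir())
```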
import torchvision.models.detection
import torchvision.transforms.v2

# You'll also need to "pip install mss pillow".
Rather than assume the user is using pip (I'm a fan of uv), I would suggest the more general:
Suggested change:
- # You'll also need to "pip install mss pillow".
+ # You'll also need to install mss and pillow.
This also aligns with the earlier text, where pip is suggested but the exact installation method is left up to the user.
# If an image is too small, then it's got a pretty decent chance of being a false positive: it's hard to tell if a
# Discord or Slack reaction icon is a cat or something different. We ignore any results that are too small to be
# reliable. Here, this cutoff is 0.1% of the whole monitor (about 1.5 cm square on a 27" monitor, the diameter of a
# AA battery). Like the score threshold, this is just something you try and see what the model seems to be able to
Suggested change:
- # AA battery). Like the score threshold, this is just something you try and see what the model seems to be able to
+ # AA battery). Like the score threshold, this is just something you try and see what the model is able to
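The size cutoff reduces to comparing each box's pixel area with a fixed fraction of the monitor's pixel area, which keeps it independent of resolution. A sketch under stated assumptions: boxes use the `(x1, y1, x2, y2)` pixel corners that torchvision detectors return, `big_enough` is a hypothetical helper, and the physical-size check assumes a 16:9 panel. On a 27-inch monitor the side of the cutoff square comes out at roughly 1.4 cm, consistent with the comment's estimate; at 2560x1440 it is about 61 pixels.

```python
import math

MIN_AREA_FRACTION = 0.001  # 0.1% of the monitor, the cutoff from the demo's comment


def big_enough(box, screen_w, screen_h, min_fraction=MIN_AREA_FRACTION):
    # box holds pixel corners (x1, y1, x2, y2); compare its area with the screen's.
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1) >= min_fraction * screen_w * screen_h


# A small icon versus a cat photo on a 2560x1440 monitor.
print(big_enough((10, 10, 42, 42), 2560, 1440))  # 32x32 icon: False
print(big_enough((100, 100, 400, 350), 2560, 1440))  # 300x250 photo: True

# Physical side of the cutoff square on a 27-inch 16:9 monitor.
diagonal_cm = 27 * 2.54
width_cm = diagonal_cm * 16 / math.hypot(16, 9)
height_cm = diagonal_cm * 9 / math.hypot(16, 9)
side_cm = math.sqrt(MIN_AREA_FRACTION * width_cm * height_cm)
print(f"cutoff square: {side_cm:.2f} cm per side")
```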
preprocess = weights.transforms()

# The labels ("what type of object is this") that the model gives us are just integers; for this model, they're
# from 0 to 90. The English words describing them ("cat") are in a list, stored in the weight's metadata.
Suggested change:
- # from 0 to 90. The English words describing them ("cat") are in a list, stored in the weight's metadata.
+ # from 0 to 90. The English words describing them (like "cat") are in a list, stored in the weight's metadata.
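Turning the integer labels back into words is plain list indexing. The sketch below uses a hypothetical, much shorter list in place of `weights.meta["categories"]` (the real one has an entry for every label from 0 to 90, and "cat" sits at a different index); `describe` and the sample outputs are hypothetical too. Looking the cat label up by name is one way to avoid hard-coding its number.

```python
# Hypothetical, shortened stand-in for weights.meta["categories"]:
# entry i is the English name for integer label i.
categories = ["__background__", "person", "bird", "cat", "dog"]

# Find the cat label by name once, instead of hard-coding its number.
CAT_LABEL = categories.index("cat")


def describe(labels, scores):
    # Pair each integer label with its English name and its rounded score.
    return [(categories[label], round(score, 2)) for label, score in zip(labels, scores)]


labels = [4, 3, 1]  # hypothetical model output for one screenshot
scores = [0.42, 0.93, 0.77]
print(describe(labels, scores))  # [('dog', 0.42), ('cat', 0.93), ('person', 0.77)]
print(CAT_LABEL in labels)  # True
```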
Changes proposed in this PR
Fixes #
(...)
./check.sh passed