Mike Burr - log

[comp] Zoom In; Enhance

To keep it in practical terms that are fresh in my mind: Imagine you've got "just a bunch of" microcontrollers (it probably wants to be a perfect square, say 25) that can be put to use for image analysis purposes ("DL" or otherwise, I guess). I was looking at esp32 stuff, so imagine: 25 esp32 systems all wired to some central system (whatever you like; another esp32, an rpi...).

You are allowed to say to any one of these, or maybe to some team of them: Analyze this image and mark any cats.

Your image analysis software always works in squares. You've got a square image, which can be partitioned, n⨉n, into more squares, each of which will get the same treatment. Further, your algorithm expects a fixed input resolution; to keep it tangible, let's invent a spec: 100 ⨉ 100 pixels.
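To pin down the "everything is a square" contract, here's a minimal sketch (Python just for illustration; the names `Square`, `ANALYSIS_SIZE`, and `quarters` are mine, not anything real) of a square tracked in full-frame pixel coordinates, plus the 2 ⨉ 2 partition used further down:

```python
from dataclasses import dataclass

ANALYSIS_SIZE = 100  # every square gets resampled to 100 x 100 before analysis


@dataclass(frozen=True)
class Square:
    x: int     # left edge, in full-resolution camera pixels
    y: int     # top edge, in full-resolution camera pixels
    side: int  # width == height, in full-resolution camera pixels

    def quarters(self) -> list["Square"]:
        """Partition this square 2 x 2 into four equal sub-squares."""
        half = self.side // 2
        return [
            Square(self.x,        self.y,        half),
            Square(self.x + half, self.y,        half),
            Square(self.x,        self.y + half, half),
            Square(self.x + half, self.y + half, half),
        ]
```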

Some other layer of software is responsible for handing you a 100 ⨉ 100 image as input. If, for example, you have a 1000 ⨉ 1000 pixel camera and you decide to start with "full size" and then analyze higher-resolution squares successively, this "layer" will take the full-size image, average 10 ⨉ 10 squares of pixels across and down, and give you your expected 100 ⨉ 100 image for analysis. This averaging is cheap, I am assuming.
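That averaging layer is basically block-mean downsampling. A sketch under the assumptions above (square, grayscale numpy frame whose side divides evenly by 100, e.g. 1000 ⨉ 1000):

```python
import numpy as np


def average_down(frame: np.ndarray, size: int = 100) -> np.ndarray:
    """Block-average a (k*size, k*size) frame down to (size, size).
    E.g. 1000 x 1000 -> 100 x 100 by averaging each 10 x 10 block of pixels."""
    k = frame.shape[0] // size
    assert frame.shape[0] == frame.shape[1] == k * size, "square frame, evenly divisible"
    return frame.reshape(size, k, size, k).mean(axis=(1, 3))
```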

If you are beyond the limit of your camera's resolution, you can just stop there; no more zoom in/enhance.

So you could, for example, start with the full frame, get your 100 ⨉ 100, not see anything interesting, and then chop the image into quarters and farm out the analysis of those four squares to available microcontrollers. When they're done, they can go back into the "available" pool.

You could then choose to quarter each of those squares, and do more farming out.

You can keep doing this until you've reached the limit of the camera's resolution (the next resolution you 'call for' cannot be met by the actual pixels available); at that point you drop that "branch" and return the microcontrollers to the pool.
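Gluing that together, here's a rough sketch of what the central coordinator might do. None of this is real: the thread pool stands in for the 25 esp32s, a variance check stands in for the actual detector, and `Square`/`ANALYSIS_SIZE` come from the sketch above.

```python
import numpy as np
from multiprocessing.pool import ThreadPool  # stand-in for the pool of 25 esp32s

CAMERA_SIDE = 1000  # assumed full sensor resolution


def analyze_square(square: Square, frame: np.ndarray) -> bool:
    """One microcontroller's job: resample the square's pixels to ~100 x 100 and
    run the detector. Decimation and a variance check are crude stand-ins here."""
    crop = frame[square.y:square.y + square.side, square.x:square.x + square.side]
    step = max(1, square.side // ANALYSIS_SIZE)
    small = crop[::step, ::step]      # the real layer would block-average instead
    return small.std() > 20           # dummy "anything interesting in here?" answer


def zoom_in_enhance(frame: np.ndarray) -> None:
    with ThreadPool(25) as pool:                 # workers return here when done
        pending = [Square(0, 0, CAMERA_SIDE)]    # start with the whole frame
        while pending:
            results = pool.map(lambda s: (s, analyze_square(s, frame)), pending)
            pending = []
            for square, interesting in results:
                # Only descend while a quarter can still supply a genuine
                # 100 x 100 of actual camera pixels; otherwise drop the branch.
                if interesting and square.side // 2 >= ANALYSIS_SIZE:
                    pending.extend(square.quarters())
```

The level-by-level loop is just one choice; you could equally push quarters onto a queue and let workers grab them whenever they go back into the "available" pool.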

It seems arbitrary and wasteful to start off by averaging away perfectly good pixels, but this keeps the system simple, and in the end you do get analysis of your whole image. If a cat takes up the entire frame and your 100 ⨉ 100 image is too fuzzy for your algorithm to see the cat well... that sucks, I guess. Maybe it is somehow wasteful. It might depend on the type of cats, and/or this might apply more naturally to a different application.

Imagine your algorithm identifies cars. If the algorithm can start with a potato image of a car, guess that it's probably a blue Mercedes, and also identify "about where the manufacturer's badge should be", your algorithm could suspend its disbelief, hand the location of the hypothetical badge over to the "zoom in/enhance" layer, and wait for the badge results to come back. (Oh, and the potato-resolution pass on the car in question is already done at this point, so those microcontrollers could be investigating another possible badge location on the same car, or maybe jump to identifying another car -- whatever you desire.)
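A sketch of that "suspend disbelief" hand-off, again purely hypothetical: `request_refinement`, `identify_badge`, and `badge_square` are invented names, and a `ThreadPoolExecutor` plays the part of the esp32 pool so the coarse-pass code can keep moving while the badge result is pending.

```python
from concurrent.futures import Future, ThreadPoolExecutor

workers = ThreadPoolExecutor(max_workers=25)  # plays the role of the esp32 pool


def request_refinement(square: Square, frame, detector) -> Future:
    """Ask the zoom-in/enhance layer for one more 100 x 100 look at a sub-square.
    Returns a future, so the caller can go do other work until the answer lands."""
    return workers.submit(detector, square, frame)


# Coarse pass already finished: "probably a blue Mercedes". Guess where the badge
# should be and follow up, without tying up the workers that found the car.
#
#   badge_future = request_refinement(badge_square, frame, identify_badge)
#   ...go inspect another badge location, or another car entirely...
#   badge = badge_future.result()   # block only once we actually need the answer
```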

This could apply to other things, maybe.

Recursively refined mass spectrometry → Recursively better understanding of sample weight distributions → A kind of computer "smell", as we zoom in on the exact components that make this air so stinky...?