All those parts look fine. A good old DRV8825 or A4988 would be easy to use. TMC drivers take a bit more work, unless you configure them in standalone mode.
The steppers would get a higher voltage, like 12V or at least 9V, I would guess. The ESP32 board only takes 5V (and the chip itself runs at 3.3V), so you will need to power it from USB or get a DC-DC buck converter. Be careful when you’re poking around on the breadboard to keep the 12V where it belongs. It won’t arc or anything, but if you accidentally short 12V to one of the ESP32 pins, the board will be toast.
Doing some kind of automatic bird following is tough. Even tracking something that is easier to recognize, like a bright light in the dark, would be tough. Figuring out what in the image is a bird requires more horsepower. I would hope someone has a library that is pretty much done already, finds birds using a neural network, and is relatively quick on a Pi. If you have low resolution or too much latency, it becomes a hard problem fast.
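
To give a feel for the "easier" case, here is a rough sketch of the bright-light version: find the brightest pixel in a frame and report how far it is from the image centre, which is the error a pan/tilt loop would try to drive to zero. The function name and the plain list-of-rows frame format are made up for illustration, and this ignores noise, latency and everything else that makes the real thing hard.

```python
def brightest_offset(frame):
    """Return (dx, dy): offset of the brightest pixel from the image centre.

    `frame` is a hypothetical grayscale image given as a list of rows,
    each row a list of brightness values (e.g. 0-255).
    """
    best_value = None
    best_x = best_y = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_x, best_y = value, x, y

    # Centre of the image in pixel coordinates.
    centre_x = (len(frame[0]) - 1) / 2
    centre_y = (len(frame) - 1) / 2

    # Positive dx means the light is right of centre, positive dy means below.
    return best_x - centre_x, best_y - centre_y


# A tiny 3x3 "frame" with the light in the top-right corner.
demo_frame = [
    [0, 0, 255],
    [0, 10, 0],
    [0, 0, 0],
]
dx, dy = brightest_offset(demo_frame)
```

A bird detector would replace the "brightest pixel" part with a neural network, and that part is where the horsepower goes.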
You could do what I did, though: set the start/stop points, define a duration, and have it slowly pan around to take some fun timelapses.
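
The timelapse pan boils down to spreading the steps evenly over the duration. Here is a minimal sketch of that arithmetic, not my actual code: `pan_schedule`, `run_pan` and the `step_fn` callback are hypothetical names, and `step_fn` stands in for whatever pulses the STEP/DIR pins on your driver.

```python
import time


def pan_schedule(start_step, stop_step, duration_s):
    """Return (direction, step_count, delay_s) for a constant-speed pan."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    delta = stop_step - start_step
    step_count = abs(delta)
    direction = 1 if delta >= 0 else -1
    # Evenly space the steps across the whole duration.
    delay_s = duration_s / step_count if step_count else 0.0
    return direction, step_count, delay_s


def run_pan(start_step, stop_step, duration_s, step_fn, sleep_fn=time.sleep):
    """Move from start to stop, calling step_fn(direction) once per step.

    step_fn is an assumed callback that moves the motor one step;
    sleep_fn is injectable so the loop can be tried without waiting.
    """
    direction, step_count, delay_s = pan_schedule(start_step, stop_step, duration_s)
    position = start_step
    for _ in range(step_count):
        step_fn(direction)
        position += direction
        sleep_fn(delay_s)
    return position


# Dry run: record the steps instead of driving a motor, and skip the sleeps.
taken = []
final_position = run_pan(100, 90, 60, taken.append, sleep_fn=lambda s: None)
```

With 3200 steps over 640 seconds this works out to one step every 0.2 seconds, which is slow enough that timing accuracy is not a big concern.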