Hi,
No, that's not what I meant.
My wireframe design compared what you are proposing to do with captions (left) with what would be nice to have: a media gallery with content below each image (right).
Here is another example of what the end result of a Top X list looks like:
http://meresetcie.com/10-choses-a-faire-avant-votre-accouchement/
Notice the back-and-forth navigation, and that each pane has an image with text below it. This is exactly like your media block, except that there is related text below each image. Simply adding captions, as you propose, comes close, but it doesn't quite allow this, because each pane needs more than a short line of caption text. This is why I proposed doing what you mentioned with the text field in the EB management interface, but also allowing full HTML in it.
See attached screenshot ;-)
Danny