If one of the two is the top window (i.e. the one that's actually
visible), then we should use that one,

Detecting whether "one of the two is the top window (i.e. the one that's
actually visible)" requires access to the Z order of windows.

Not exactly: since this should be done by the C code, it can ask the
window-system which window is currently under the mouse pointer.
The result depends on Z ordering, but Emacs doesn't need to know that
Z ordering.
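
To illustrate the point, here is a minimal sketch in C. It is not Emacs
code and not any real window-system API: `struct gui_window`,
`struct window_system` and `window_under_pointer` are hypothetical names
made up for this example. The stacking order lives only inside the
"window system" side; the caller passes pointer coordinates and gets a
window back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a top-level GUI window (not an Emacs type). */
struct gui_window {
    int id;
    int x, y, width, height;
};

/* Hypothetical window-system state.  Windows are stored bottom-to-top,
   so the last entry is the topmost one.  This ordering is private to
   the window system; callers never inspect it. */
struct window_system {
    const struct gui_window *stack;
    size_t count;
};

static int
contains_point (const struct gui_window *w, int px, int py)
{
    return px >= w->x && px < w->x + w->width
        && py >= w->y && py < w->y + w->height;
}

/* Return the id of the topmost window containing (px, py), or -1 if the
   pointer is over no window.  The answer depends on the Z order, but the
   caller only ever sees the resulting window. */
int
window_under_pointer (const struct window_system *ws, int px, int py)
{
    assert (ws != NULL);
    /* Walk from the top of the stack down; the first hit wins. */
    for (size_t i = ws->count; i > 0; i--) {
        const struct gui_window *w = &ws->stack[i - 1];
        if (contains_point (w, px, py))
            return w->id;
    }
    return -1;
}
```

With two overlapping windows, a point in the overlap yields the upper
window, and the caller never has to compare stacking positions itself.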

The event structure can only contain what we put into it. Which window
would you put into this structure after you leave the frame containing
the window where the start event occurred?

The GUI window that's under the mouse pointer (according to the
window-system).