Great question! It was my mother's birthday last week, and the family got together from four different locations over a Google Hangout to sing "When I'm 64" to her :-) I called it before we even started the hangout: the latency (delay) would make us all keep slowing down to let each other catch up, then realize that everybody was falling further behind, so we would all speed up and skip ahead, then slow down again, and so on. Sure enough, every few seconds we had synchronization issues. It made the whole thing a lot funnier, but it wouldn't work for your situation at all!
In the general case, this is not solvable, for the same reason that Einstein said that all simultaneity is relative: when it takes a non-zero amount of time to send information from point A to point B and back again, it's impossible for A and B to agree on a global concept of "now". You simply cannot reduce the latency of a network connection to zero, much less the latency of a complex streamed application like a Hangout running over that connection. And the further apart you are in the world, the greater the expected latency.
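To put a rough number on the "further apart means more latency" point, here is a minimal Python sketch of the physical floor on one-way delay. It assumes the signal travels in a straight line through optical fiber at about two-thirds of the speed of light; the function name and the example distances are mine, purely for illustration. Real Hangout latency will be much higher than this floor, because routing, encoding, and buffering all add delay on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FACTOR = 2 / 3               # assumption: light in fiber travels at roughly 2/3 of c


def min_one_way_delay_ms(distance_km: float) -> float:
    """Lower bound on one-way delay, in milliseconds, over a straight fiber path."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR) * 1000.0


if __name__ == "__main__":
    # Example distances are arbitrary, chosen only to show the trend.
    for distance_km in (100, 1_000, 10_000):
        delay_ms = min_one_way_delay_ms(distance_km)
        print(f"{distance_km:>6} km: at least {delay_ms:.1f} ms one way")
```

Even this best case never reaches zero, and it grows in direct proportion to distance.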
The way that this has been solved in the past (e.g. by that massive virtual orchestra / virtual choir project that has been run over YouTube a couple of times) was to pre-record the music and have each performer listen to it in their headphones while singing or playing. Each performer recorded their video separately and sent it to someone who mixed them all down into a single track, offline, after everyone had finished recording. In other words, they simply avoided the problem entirely by not performing simultaneously :-)
If I were you, I would simply experiment with performing simultaneously. Maybe you can practice having one of you (the one on the recording end) sing exactly in time with what they hear from the other person, while the other person plays/sings exactly 2x the one-way delay ahead of what they hear coming back. The trick would be to have the performer who is playing ahead (the one not on the recording end) set the tempo and basically pay no attention to the person on the recording end (i.e. don't try to slow down to let them catch up). As long as the person on the recording end stays on time and keeps up with the one who is leading the piece, nobody listening to the recording will know about the synchronization issues.
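To get a feel for how far "2x the one-way delay" is in musical terms, here is a small Python sketch that converts it into beats at a given tempo. The function name, the 150 ms delay, and the 120 bpm tempo are made-up values for illustration only; you would need to measure your own delay.

```python
def lead_in_beats(one_way_delay_ms: float, tempo_bpm: float) -> float:
    """How far ahead the leader is, in beats, of what they hear coming back.

    The leader's sound takes one one-way delay to reach the recording end,
    and the follower's response takes another one to come back, so the
    leader hears the follower a full round trip late.
    """
    round_trip_ms = 2 * one_way_delay_ms
    beat_ms = 60_000 / tempo_bpm  # duration of one beat in milliseconds
    return round_trip_ms / beat_ms


if __name__ == "__main__":
    # Hypothetical numbers: 150 ms one-way delay, song at 120 beats per minute.
    beats = lead_in_beats(one_way_delay_ms=150, tempo_bpm=120)
    print(f"The leader hears the follower about {beats:.2f} beats late")
```

With those example numbers the follower sounds more than half a beat late to the leader, which is why the leader has to ignore what they hear and just hold the tempo.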
If you're not recording locally, but rather broadcasting the Hangout live, you both need to split the delay equally, so that each of you sings/plays exactly 1x the one-way delay ahead of the other person (that is, ahead of what you hear coming out of your speakers). Actually, you probably need to play 0.5x the latency ahead of what you hear coming out of your speakers, because each connection is routed through Google's servers and then back out to the other person, and it's at Google's servers that the two video signals are mixed and then broadcast out to the rest of the world.
I hope this makes sense. There's really no way around this for live hangouts though! (But you might be able to make it work for recordings.)