Robots that provide companionship are becoming increasingly important, but a serious barrier is the lack of natural human-robot interaction. Our ongoing work aims to improve the interaction abilities of music-playing robots in particular.
Indeed, musicians often face the following problem: they have a score that requires two or more players, but no one with whom to practice. A music-playing robot would fit this role perfectly, yet current score-playing music robots cannot synchronize with fellow players. In other words, if the human speeds up their playing, the robot should increase its speed accordingly, which today's robots cannot do.
In this paper, we present a first step towards giving these accompaniment abilities to a music robot. We introduce a new beat-tracking paradigm that relies on two types of sensory input, visual and audio, combining our own visual cue recognition system with state-of-the-art acoustic onset detection techniques.
Specifically, we first formalize the non-verbal language of visual cues for the case of flutists, such as moving the flute to signal “start now”, “stop now” and “beat cue”. We then fuse this information with acoustic note onset detection to detect tempo changes in a robust manner.
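To make the fusion step concrete, the following is a minimal sketch of how visual beat-cue timestamps could be combined with acoustic onset times to estimate tempo. The function name, the pairing rule (matching each visual cue to the nearest onset within a small window), and the median inter-beat-interval estimate are illustrative assumptions, not the actual algorithm used by the system.

from statistics import median

def fuse_tempo(visual_cue_times, onset_times, max_pair_gap=0.1):
    """Estimate tempo (BPM) from visual beat cues confirmed by audio onsets.

    visual_cue_times: seconds at which a "beat cue" gesture was detected
    onset_times:      seconds of detected acoustic note onsets
    max_pair_gap:     max allowed gap (s) between a cue and its matching onset
    (Hypothetical fusion rule for illustration only.)
    """
    beats = []
    for cue in visual_cue_times:
        # Confirm each visual cue with the closest acoustic onset, if any.
        nearest = min(onset_times, key=lambda t: abs(t - cue), default=None)
        if nearest is not None and abs(nearest - cue) <= max_pair_gap:
            beats.append(0.5 * (cue + nearest))  # average the two estimates
        else:
            beats.append(cue)  # fall back on the visual cue alone

    if len(beats) < 2:
        return None  # need at least two beats to measure an interval

    intervals = [b - a for a, b in zip(beats, beats[1:])]
    return 60.0 / median(intervals)  # beats per minute

if __name__ == "__main__":
    cues = [0.02, 0.52, 1.01, 1.49]          # visual "beat cue" times (s)
    onsets = [0.00, 0.25, 0.50, 1.00, 1.50]  # acoustic onset times (s)
    print(f"Estimated tempo: {fuse_tempo(cues, onsets):.1f} BPM")

In this toy setting the cues and onsets agree on a roughly 0.5 s beat period, so the estimate comes out near 120 BPM; the point is only that visual and acoustic evidence can be cross-checked before committing to a tempo.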
Initial experiments with our Thereminist robot following one flutist show detection rates above 83% for the three types of visual cues. Additionally, by coupling visual cues with acoustic beat detection, the robot can extract the tempo in less than half a second.