Scientists are teaching computers about the complex factors that make music expressive. Might computers outperform humans someday?

The digital age has been a mixed blessing for musicians. Digital electronics can supplant live performers, and illegal copying of recordings has decimated royalty payments. Soundtrack composers can now draw on vast archives of digitally sampled instruments and voices – such as the Vienna Symphonic Library – to cobble together credible orchestrated compositions on a computer without hiring the players themselves.

Composers may soon face high-tech competition as well. It’s already possible, for example, for a computer to write a sonata in the style of, say, Mozart.

Now, however, Google’s Magenta project is developing artificial intelligence that can learn to generate original music, not merely compose according to pre-programmed stylistic parameters.
Just how far advanced are these technologies really? And where are they leading us?

Above: Seeded with just four notes, Google’s Magenta machine-learning program created its first 90-second original tune in June 2016. The drums were added by humans.

Music from the can
Austrian researchers are using computers to analyze the expressive elements of musical performance and the way ensemble musicians interact with each other. Their goal is to get musicians to be even more creative, rather than to supplant them with machines. “Musically gratifying interaction with computers at a high level of musical quality [is] the ultimate goal,” wrote Gerhard Widmer, head of the Intelligent Music Processing and Machine Learning Group (IMP/ML) at the Austrian Research Institute for Artificial Intelligence (OFAI), whose latest project, Con Espressione, was recently awarded a €2.3 million grant from the European Research Council.

Gerhard Widmer, a leader in the field of music information research, received a €2.3 million ERC Advanced Grant to develop machines that are aware of certain dimensions of musical expressivity.

A leading figure in Music Information Research (MIR) for nearly 20 years, Widmer, together with his team, has been devising methods to extract data from live and recorded performances, feed it into a machine-learning program and create applications based on the findings.

One of his first major studies (1999) investigated whether computers could distinguish the idiosyncrasies of a particular piano performance – the tempo, timing, dynamics and articulation. Using the 1989 Konzerthaus recordings of Chopin by Russian pianist Nikita Magaloff on a sensor-equipped Bösendorfer SE, he was able to measure every keystroke, hammer velocity and pedal movement.
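
For a sense of what such captured data looks like, here is a minimal sketch of how per-note measurements might be organized. The field names are illustrative guesses that mirror the quantities named above; they are not the Bösendorfer SE’s actual data format.

```python
# A hypothetical record for one captured note, covering timing, dynamics,
# articulation and pedalling. Illustrative only, not a real file format.
from dataclasses import dataclass

@dataclass
class NoteEvent:
    pitch: int              # MIDI note number
    onset: float            # seconds from the start of the performance
    duration: float         # key-down time, a rough proxy for articulation
    hammer_velocity: float  # the piano's sensor reading, a proxy for loudness
    sustain_pedal: float    # pedal depth at onset, 0.0 (up) to 1.0 (down)
```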

Mistakes make the music
Magaloff’s brilliant performances were not perfect; the study recorded “3.67% insertion errors, 3.50% omission errors and 1.55% substitution errors,” capturing a critical element of expressive performance in a form computers could reproduce. From this, Widmer and his associates developed an artificial intelligence to implement their discoveries, and it won top awards at the 2008 Rencon Performance Rendering Contest in Hashida, Japan, where the system convinced the technical jury and wowed composers and the public with its near-human expressiveness.
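
How might such error rates be computed? One standard approach is to align the performed note sequence against the score using edit-distance operations and count the insertions, omissions and substitutions. The sketch below does exactly that on toy data; it is illustrative only, not the study’s actual method, and the variable names are invented.

```python
# Illustrative only: align a performance against the score with classic
# edit-distance operations, then report error rates like those cited above.

def align_counts(score, performance):
    """Return (insertions, omissions, substitutions) from a minimal
    edit-distance alignment of two note sequences (e.g. MIDI pitches)."""
    m, n = len(score), len(performance)
    # dp[i][j] = minimal edit cost aligning score[:i] with performance[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i      # all score notes omitted
    for j in range(1, n + 1):
        dp[0][j] = j      # all performed notes are insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diff = score[i - 1] != performance[j - 1]
            dp[i][j] = min(dp[i - 1][j - 1] + diff,  # match / substitution
                           dp[i - 1][j] + 1,         # omission
                           dp[i][j - 1] + 1)         # insertion
    # Trace back through the table to count each operation type.
    ins = om = sub = 0
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + (score[i - 1] != performance[j - 1])):
            sub += score[i - 1] != performance[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            om += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ins, om, sub

# Toy usage: one wrong note (62 -> 61) and one extra note (a doubled 64).
score = [60, 62, 64, 65, 67]
performance = [60, 61, 64, 64, 65, 67]
ins, om, sub = align_counts(score, performance)
print(f"insertions: {ins / len(score):.2%}, "
      f"omissions: {om / len(score):.2%}, "
      f"substitutions: {sub / len(score):.2%}")
```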

Since then, Widmer and the OFAI have improved their tools for capturing performance data. Computers can now “listen” to a performance, identify its score and synchronize the two in real time, allowing them to compare varying interpretations of a piece.
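
To make the idea concrete, here is a deliberately naive sketch of a score follower. It assumes the score and the performance have already been reduced to comparable feature vectors (for instance, chroma frames); real systems, such as those based on online dynamic time warping, are far more robust. All names here are illustrative.

```python
# Toy score follower: advance a position pointer through the score as
# performance frames arrive, by matching each live frame against the
# next few score frames (roughly enforcing forward motion).
import numpy as np

def follow(score_feats, frame_stream, search_width=8):
    """Yield an estimated score index for each incoming performance frame."""
    pos = 0
    for frame in frame_stream:
        window = score_feats[pos : pos + search_width]
        costs = [np.linalg.norm(frame - s) for s in window]
        pos += int(np.argmin(costs))
        yield pos

# Toy usage: random vectors stand in for real audio features, and a
# "performance" that plays every other frame (i.e. at double speed).
rng = np.random.default_rng(0)
score = rng.random((100, 12))
positions = list(follow(score, score[::2]))
print("final score position:", positions[-1])
```

Knowing the current score position at every instant is what lets a computer line up two recordings of the same piece – or, eventually, accompany a live player.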

Werner Goebl of the University of Music and Performing Arts Vienna uses motion-capture technology to teach computers about the non-verbal communication between musicians. (Photo: Alex Mayer)

One of Widmer’s early collaborators was Werner Goebl, a professor at Vienna’s University of Music and Performing Arts, who studies how ensemble musicians communicate and synchronize with each other. Based on his research models, he developed an interactive, game-like exhibit called “Tapping Friend” (currently on a world tour), in which users tap out a rhythm in sync with another player and a computer “maestro,” which can either dictate the tempo or adapt it to the players’ performance in real time.
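
The “adapt” mode can be pictured with one standard model from the tapping literature, linear phase correction: the machine shifts each upcoming click by a fraction of how early or late the player’s last tap was. The sketch below is a guess at the principle, not the exhibit’s actual code; the function name and the correction gain are invented.

```python
# A minimal sketch of adaptive timing, loosely based on the linear
# phase-correction model used in sensorimotor-synchronization research.

def adaptive_metronome(player_onsets, period=0.5, alpha=0.4):
    """Yield click times that nudge toward the player's taps.

    period : initial inter-click interval in seconds
    alpha  : correction gain (0 = rigid metronome, 1 = full correction)
    """
    click = 0.0
    for tap in player_onsets:
        yield click
        asynchrony = tap - click               # player early (<0) or late (>0)
        click += period + alpha * asynchrony   # correct a fraction of the gap

# Toy usage: a player who drifts slightly later on every tap.
taps = [0.02, 0.55, 1.10, 1.66, 2.23]
print([round(c, 3) for c in adaptive_metronome(taps)])
```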

The more serious side of his research involves analyzing how ensemble performers use audio and visual cues to communicate. One study measured how an accompanist could follow a videotaped soloist using only audio cues, only visual cues, or both. Goebl now uses 3D motion-capture technology to record players’ specific gestures while performing.

A string quartet’s non-verbal communication is analyzed using motion-capture technology
Photo courtesy Werner Goebl

As in the Magaloff study, Goebl’s data can reveal which gestures yield which performance characteristics – information that can help musicians improve their ensemble technique. As the research progresses and the computers “learn,” it may become possible for a computer to accompany a soloist in perfect synchrony. It’s not everything – balance, tone, warmth and touch may take longer – but it’s a lot.

The more you research, the more you understand how complex the phenomena are and how tiny your point of view was.

Werner Goebl, Professor at Vienna’s University of Music and Performing Arts

Searching for the unknowable?
The application of MIR to musical performance is still in its infancy. “The more you research,” said Goebl, “the more you understand how complex the phenomena are and how tiny your point of view was.” As complex as piano performance is, its variables are far fewer than those of strings and winds, let alone the human voice.

And what makes white Europeans synchronize their clapping on the “strong” first and third beats of a 4/4 tune, while African-Americans favor the syncopated second and fourth? Goebl shrugs: “In Vienna, we have just three beats.”


Harry Connick Jr. was able to “fix” the unfunky clapping of his audience with a little 5/4 trickery: by slipping in a single extra beat, he shifted the crowd’s claps from beats one and three onto the backbeats. Let’s see if a computer could think that up on the fly!

It will take an incalculable amount of data, gathered, analyzed, understood and translated into code, before a computer can pass the Turing test of its “humanity,” much less win a seat in the Vienna Philharmonic.