N-gram Music 
Generative music inspired by linguistics. 

A final project for 6.S083: Computation, Cognitive Science, and Language 
taught by Prof. Bob Berwick. 

paper, code 



In this project I developed a relatively simple method that takes one or more input audio files and outputs new music "in the style" of the inputs. It is based on n-grams, the Markov-style generators used in linguistics. Results are at the end of this page. 



A brief introduction to n-grams 

An n-gram model is used in linguistics to generate novel sentences from a large corpus of text. An n-gram is any contiguous sequence of n words. For example, the previous sentence contains the 3-grams "An n-gram is", "n-gram is any", "is any contiguous", etc. If we assume that language is made of an independent stream of n-grams, then we can generate the next word $x_i$ in a stream by randomly sampling from the distribution $P(x_i \mid x_{i-(n-1)} \ldots x_{i-1})$. 

For example, suppose our corpus is composed of the sentences: 

    [START] to be or not to be [STOP] 
    [START] to die to sleep to sleep perchance to dream [STOP] 

Some sentences generated by 2-grams could be: 

    [START] to be [STOP] 
    [START] to be or to sleep to sleep to sleep perchance to be [STOP] 
    [START] to die or not to die to be [STOP] 
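
As a concrete sketch, here is a minimal implementation of such a generator in Python (the whitespace tokenization and corpus format are my own simplification):

    import random
    from collections import defaultdict

    def train_ngrams(sentences, n=2):
        # Map each (n-1)-word history to the list of words that followed it.
        successors = defaultdict(list)
        for sentence in sentences:
            words = ["[START]"] * (n - 1) + sentence.split() + ["[STOP]"]
            for i in range(len(words) - n + 1):
                history = tuple(words[i:i + n - 1])
                successors[history].append(words[i + n - 1])
        return successors

    def generate(successors, n=2):
        # Sample the next word from P(x_i | x_{i-(n-1)} ... x_{i-1})
        # until [STOP] is drawn.
        history = ("[START]",) * (n - 1)
        output = []
        while True:
            word = random.choice(successors[history])
            if word == "[STOP]":
                return " ".join(output)
            output.append(word)
            history = history[1:] + (word,)

    corpus = ["to be or not to be",
              "to die to sleep to sleep perchance to dream"]
    print(generate(train_ngrams(corpus)))

Because repeated continuations are stored multiple times in the successor lists, random.choice samples them in proportion to their counts, which is exactly the conditional distribution above.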

The results are okay but not great: if n is too small the output is nonsensical, and if n is too large the output is identical to the input. But what would happen if we applied this to music? 



Applying n-grams to music 

My goal was to generate music using n-grams on raw audio (not MIDI). To do that, I first perform transient detection on the input audio via [1] to break a song up into individual "words". The detected split points between these words (called onsets) are plotted below for the Aria of Bach's Goldberg Variations: 

image:https://github.com/sportdeath/N-Gram-Music/raw/master/Paper/sheetMusic.png 
image:https://github.com/sportdeath/N-Gram-Music/raw/master/Paper/onsetPlot.png 
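The transient detection itself comes from [1]; as a rough stand-in, librosa's off-the-shelf onset detector illustrates the same splitting step (the filename is a placeholder):

    import librosa

    # Load the song as a mono float signal (path is a placeholder).
    audio, sr = librosa.load("aria.wav", sr=None, mono=True)

    # Detect onsets, i.e. the split points between "words", as sample indices.
    onsets = librosa.onset.onset_detect(y=audio, sr=sr, units="samples")

    # Slice the audio at the onsets to obtain the individual words.
    bounds = [0] + list(onsets) + [len(audio)]
    words = [audio[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
    print(f"split the song into {len(words)} words")
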
Then I perform harmonic analysis on each word to determine a 12-dimensional vector that describes the prevalence of each of the 12 notes in that word. To do this I perform frequency estimation [2] on the spectrum of the word and then wrap those frequencies logarithmically. The two chords below, which contain the same notes C-E-G, have the same harmonic profile when played on a piano despite having different voicings: 

image:https://c2.staticflickr.com/2/1965/45557676212_985df5eaeb_o.jpg 
image:https://github.com/sportdeath/N-Gram-Music/raw/master/Paper/sameChord.png 
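The implementation estimates frequencies with [2] and wraps them itself; a rough approximation of the same 12-dimensional profile can be computed with librosa's chroma features, which likewise wrap spectral energy onto the 12 pitch classes:

    import numpy as np
    import librosa

    def harmonic_profile(word, sr):
        # Wrap the word's spectral energy onto the 12 pitch classes and
        # average over time, giving one 12-dimensional vector per word.
        chroma = librosa.feature.chroma_stft(y=word, sr=sr)
        profile = chroma.mean(axis=1)
        return profile / np.linalg.norm(profile)

    # One profile per word, using the words sliced out above.
    profiles = [harmonic_profile(w, sr) for w in words]
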
These words are then clustered based on their cosine distance, so that words with similar harmonic content are considered the same. Now we can simply apply the n-gram streaming algorithm described above to generate music! Some special care needs to be taken when stitching words together to prevent clipping, and I decided to run an independent n-gram model on volume to avoid sharp dynamic changes. 
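
As an illustrative sketch (the paper's actual clustering and stitching differ in the details), a greedy threshold clustering on cosine distance turns the profiles into the discrete labels the n-gram model needs, and a short crossfade keeps the seams between words from clicking; the 0.1 threshold and 256-sample overlap are made-up tuning knobs:

    import numpy as np

    def cluster_words(profiles, threshold=0.1):
        # Greedily assign each unit-normalized profile to the first cluster
        # whose representative is within the cosine-distance threshold,
        # otherwise start a new cluster; returns one integer label per word.
        reps, labels = [], []
        for p in profiles:
            dists = [1.0 - float(np.dot(p, r)) for r in reps]
            if dists and min(dists) < threshold:
                labels.append(int(np.argmin(dists)))
            else:
                reps.append(p)
                labels.append(len(reps) - 1)
        return labels

    def stitch(words, overlap=256):
        # Crossfade adjacent words over a short overlap so the output
        # has no clicks; assumes every word is longer than the overlap.
        fade = np.linspace(0.0, 1.0, overlap)
        out = words[0].copy()
        for w in words[1:]:
            out[-overlap:] = (1 - fade) * out[-overlap:] + fade * w[:overlap]
            out = np.concatenate([out, w[overlap:]])
        return out

With the labels in hand, the generator sketched in the n-gram section runs unchanged on the label sequence, and each sampled label is replaced by one of the audio words in its cluster before stitching.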

The code can be found on GitHub, and a more in-depth description can be found in the paper. 



Results  

For songs containing a single instrument, the resulting music is actually pretty good. If you're not listening closely, this process could generate an infinite stream of impressionist music: 

 

But other times the result is just abysmal (this one is particularly funny): 

 

And just for fun, here is a variation of every Goldberg variation: