ECE 4760 Final Project: Chord Identifier

Ariana Haghighi (ah677), Jeff Nan (jjn48), Tiffany Chou (tlc234)

For our final project, we created a chord identifier on the Raspberry Pi Pico (RP2040) microcontroller. It performs frequency analysis on a microphone's audio signal to determine the chord played on an instrument and displays the result on a VGA screen and a PuTTY serial interface.

Demo Video

Introduction

The aim of our final project was to design a chord identifier. Our project combined our knowledge from previous ECE courses such as circuits, signal analysis, and digital logic with our experience from past labs in this course. Throughout the four weeks we worked on this project, we were able to use an iterative design process and improve upon the initial design week by week.

First, we implemented a note identifier. We set up the microphone the same way as in Lab 1 and played pure tones. Using the FFT code provided in Lab 1, we performed an FFT on the audio samples and found the frequency with the maximum magnitude. From this frequency, we used a formula to identify the note name and scale (e.g. A4, D3, etc.). We output the maximum-magnitude frequency and the note name on the serial interface.

Building upon the note identifier, we used the same microphone and serial interface setup, but rather than looking for the single strongest frequency, we identified the three strongest frequencies in the FFT. The note identifier labels each of these frequencies with a note, giving the three most powerful notes. Applying the foundations of music theory, we then determine which chord was played; we distinguish major, minor, augmented, and diminished triads.

Once the basic chord identification functionality was implemented, the next step was to improve performance by implementing a low pass filter to limit the range of the frequencies picked up by the microphone. This filter allowed us to change the parameters of the sampling for our FFT algorithm, which increased the efficiency and accuracy of our chord identification. Finally, a Protothreads C program was implemented to make use of the two cores on the RP2040. The serial interface and VGA were implemented on one core, and the FFT was implemented on the other.

High Level Design

Rationale for Project Idea

Originally we wanted to build an instrument identifier, due to our shared interest in music. If you play an A4 on a piano and an A4 on a guitar, you are playing the same note, but you can still tell which one comes from the piano and which comes from the guitar. This distinction comes from a musical property called timbre, akin to tone color or tone quality. Words like bright, dull, brassy, reedy, harsh, and soft are used to describe timbre; while this property seems largely abstract and hard to quantify, these unique timbre characteristics can be measured by looking at the frequency spectrum. When a piano and a guitar play an A4, the strongest frequency component is the A4 fundamental; however, due to the shape of the instrument, there are other resonant frequencies. These additional harmonics define an instrument's timbre. We wanted to build an instrument identifier that analyzed each instrument's frequency spectrum (computed by an FFT) to predict which instrument was being played, a task that requires some amount of machine learning. There are many data sets available online that match an audio file to an instrument; however, to train our system, we needed data sets with FFT data matched to an instrument, and processing each audio file through the Pico's FFT was infeasible.

Ultimately, we decided to change projects but keep the musical and signal processing flavor, so we opted to implement a chord identifier instead.

Background Math

Implementing and understanding our chord identifier requires understanding some background math including frequency-to-note conversion, an FFT, and a Sallen-Key low pass filter.

Frequency-to-note Conversion

When A4 is tuned to 440 Hz, a frequency f (in Hz) is converted to a note number n as follows, with n rounded to the nearest integer:

$$ n = 12 \log_2\left(\frac{f}{440}\right) + 49 $$

From the note number, a lookup table is used to convert note number to note name.
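As a quick check of this conversion, here is a minimal Python sketch for offline experimentation (it is not the Pico code). It assumes the standard piano-key relation n = 12 log2(f / 440) + 49, rounded to the nearest integer, which places A4 at note number 49; the function name `freq_to_note_number` is hypothetical.

```python
import math

A4_FREQ_HZ = 440.0    # tuning reference
A4_NOTE_NUMBER = 49   # A4 is the 49th key on an 88-key piano


def freq_to_note_number(freq_hz):
    """Return the nearest piano key number for a frequency in Hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    semitones_from_a4 = 12.0 * math.log2(freq_hz / A4_FREQ_HZ)
    return int(round(semitones_from_a4)) + A4_NOTE_NUMBER


print(freq_to_note_number(440.0))   # 49 (A4)
print(freq_to_note_number(261.63))  # 40 (C4, middle C)
print(freq_to_note_number(27.5))    # 1 (A0, the lowest piano key)
```

Rounding matters in practice because an FFT bin center rarely lands exactly on a note's frequency.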

Fast Fourier Transform

We used a Fast Fourier Transform (FFT) to quickly collect frequency information. One of the main concepts in signal processing is that any periodic signal can be represented as a sum of sinusoids, its Fourier series. By Euler’s formula, we can represent any sinusoid using complex exponentials, each of which is defined by a magnitude and a phase.

By the Shannon-Nyquist sampling theorem, a band-limited analog signal can be digitally sampled and reconstructed as long as the sampling rate is greater than twice its highest frequency component. In signal processing, analog signals are typically thought of as continuous, whereas digital signals are thought of as discrete. Using both of these facts together, we can take an analog signal, sample it ‘fast enough,’ and represent it as a sum of discrete sinusoids. Note that the Fourier series is, in general, an infinite sum of sinusoids.
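To make 'fast enough' concrete, the sketch below computes where a pure tone appears after sampling. It is an illustrative Python helper, not part of the project code; the name `aliased_frequency` is hypothetical, and the 5 kHz rate is the final sampling rate reported later in this write-up.

```python
def aliased_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency (Hz) of a real tone after sampling at sample_rate_hz."""
    folded = freq_hz % sample_rate_hz             # fold into [0, fs)
    return min(folded, sample_rate_hz - folded)   # mirror into [0, fs/2]


print(aliased_frequency(1000.0, 5000.0))  # 1000.0 (below fs/2, unchanged)
print(aliased_frequency(3000.0, 5000.0))  # 2000.0 (above fs/2, aliased)
```

A 3 kHz component sampled at 5 kHz is indistinguishable from a 2 kHz tone, which is why content above half the sampling rate has to be removed before the ADC.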

Rather than give a computer an infinite sum (and an infinite number of computations), we instead work with a finite block of N samples. So rather than computing the Discrete-Time Fourier Transform (DTFT), we compute the Discrete Fourier Transform (DFT). The DFT is very useful, but computing it directly is slow, requiring on the order of N^2 operations; the Fast Fourier Transform (FFT) was created to optimize it. The FFT recursively splits the sum of discrete sinusoids into smaller and smaller components. If the number of samples is a power of two, then we can split the DFT all the way down to summations of length one. If the number of samples isn’t a power of two, we can zero-extend the signal to make its length a power of two. For example, if we have a signal that has 253 terms in it, three terms with values of zero can be appended to bring it to 256. This zero-padding adds no new frequency content to the signal, but allows us to use the optimized algorithm. In summary, the FFT computes the frequency content of a discretely sampled analog signal; it gives the same result as the DFT, but uses significantly less computation, on the order of N log N operations.
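The recursive splitting can be sketched in a few lines of Python. This is a textbook recursive radix-2 FFT written for illustration, not the fixed-point Lab 1 routine the project actually runs; the function names are hypothetical. It is checked against a direct DFT on a zero-padded input.

```python
import cmath


def dft(x):
    """Direct DFT: N output bins, each a sum over N samples (order N^2 work)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def zero_pad_to_power_of_two(x):
    """Append zeros until the length is a power of two."""
    n = 1
    while n < len(x):
        n *= 2
    return list(x) + [0.0] * (n - len(x))


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]          # a length-one sum is just the sample
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])                 # DFT of the even-indexed samples
    odd = fft(x[1::2])                  # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


samples = [1.0, 2.0, 3.0, 4.0, 5.0]          # 5 samples: not a power of two
padded = zero_pad_to_power_of_two(samples)   # padded with zeros to length 8
error = max(abs(a - b) for a, b in zip(fft(padded), dft(padded)))
print(len(padded), error < 1e-9)
```

Both routines return the same spectrum up to floating-point error; only the amount of work differs.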

Low Pass Filter Using Sallen-Key Topology

In general, a low pass filter allows signals with frequencies below some cutoff to pass through the system and attenuates signals with frequencies above that cutoff. This cutoff frequency (f0) is defined by the poles of the transfer function. When designing an amplifier, the transfer function is used to relate the output signal of the system to the input signal. For the Sallen-Key filter design, the corner frequency is defined as follows:

$$ f_0 = \frac{1}{2\pi\sqrt{R_1 R_2 C_1 C_2}} $$

where R1 and R2 are resistor values in Ohms and C1 and C2 are capacitor values in Farads.
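As a worked example, the sketch below evaluates the standard Sallen-Key corner frequency expression f0 = 1 / (2π sqrt(R1 R2 C1 C2)) in Python. The component values are hypothetical, chosen only because they land near the 1200 Hz cutoff mentioned in this report; they are not the values used on the board.

```python
import math


def sallen_key_corner_hz(r1_ohm, r2_ohm, c1_farad, c2_farad):
    """Corner frequency f0 = 1 / (2*pi*sqrt(R1*R2*C1*C2)) in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(r1_ohm * r2_ohm * c1_farad * c2_farad))


# Hypothetical values: two 13.3 kOhm resistors and two 10 nF capacitors.
f0 = sallen_key_corner_hz(13.3e3, 13.3e3, 10e-9, 10e-9)
print(round(f0))  # a little under 1200 Hz
```

With equal resistors and equal capacitors the expression reduces to 1 / (2πRC), which makes it easy to pick parts for a target cutoff.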

Hardware and Software Tradeoffs

Originally, the hardware for this design was fairly simple: the output of the microphone was directly connected to an ADC pin on the Pico. However, due to the broad spectrum of frequencies detected, the initial design required a large FFT computation, which was very slow. To improve the speed of the FFT computation, we added a more complex hardware stage. We chose to implement a Sallen-Key low pass filter built around a unity gain non-inverting amplifier, which eliminated the higher frequency terms and in turn reduced the range of frequencies passed into the Pico. By eliminating the higher frequency terms, we could also decrease our sampling rate from 10 kHz to 5 kHz. For the same number of samples, halving the sampling rate halves the width of each FFT bin, so fewer possible frequencies fall in each bin and notes can be found more accurately. If the sampling rate had been decreased without a low pass filter, the higher frequency terms would have aliased, which would have severely impacted the accuracy of the chord identifier. Ultimately, this tradeoff was necessary to make our chord identifier faster and more accurate.
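The precision gain can be illustrated with a little arithmetic. The Python sketch below compares the FFT bin width at the two sampling rates with the spacing between neighboring semitones. The FFT length of 1024 samples is an assumption made for illustration (the report does not state it), and the helper names are hypothetical.

```python
def bin_width_hz(sample_rate_hz, num_samples):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / num_samples


def semitone_gap_hz(freq_hz):
    """Distance in Hz from a note to the next semitone above it."""
    return freq_hz * (2.0 ** (1.0 / 12.0) - 1.0)


NUM_SAMPLES = 1024  # assumed FFT length, not taken from the report

for sample_rate in (10_000.0, 5_000.0):
    print(sample_rate, bin_width_hz(sample_rate, NUM_SAMPLES))

# Spacing between A3 (220 Hz) and the semitone above it, about 13 Hz.
print(semitone_gap_hz(220.0))
```

Under that assumption, a bin is just under 10 Hz wide at 10 kHz, so neighboring notes around A3 sit barely more than one bin apart; at 5 kHz the bins are just under 5 Hz wide and the same notes are separated by more than two bins.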

Existing Patents

There exists a patented Chord Identifier; however, it is quite different from, and much more robust than, our design. The patent is for a “computer implemented method, a computer system, and a User Experience (UX) Interface.” Its computer system has a touch screen device, which we do not have. The patent identifies chords using digital audio files, whereas our design takes audio samples in real time from a microphone. Also, its UX is highly interactive and configurable, whereas our UX consists of displaying information on a VGA screen and a PuTTY Serial Interface. The patent describes a software application that determines which key the chords are played in and stores the chord data it collects so the user can access it later. Our VGA display only displays the chord identified. Our PuTTY Serial Interface displays the three frequencies with the highest power and their corresponding note numbers and note names. The design described in the patent has far more features than our design and is implemented completely differently.

Hardware Design

The hardware design of the chord identifier is twofold; first, there is the detection of audio from the environment, and secondly, there is the display of the chord identified. To detect audio, we use the Electret Microphone Amplifier, and pass the signal through a low pass filter, which is then connected to an ADC (analog-to-digital) GPIO pin on the RP2040. To display the chord identified, we provide two displays, a VGA screen, which shows the chord identified, and a UART serial interface that displays the three maximum amplitude frequencies detected by the FFT, the note names, and the chord identified. A diagram of the final hardware design is shown below.

Audio Detection and Filtering

Audio input is gathered from a microphone connected to a GPIO pin configured as an ADC input. A DMA channel transfers the converted samples into an array in memory for use in the FFT. When this DMA channel finishes a block of samples, it triggers a second DMA channel, which restarts the sampling channel, allowing for continuous sampling.

After implementing our program, we realized that we needed to significantly reduce the sample rate used for the FFT in order to increase the precision of the note and chord identification. Since we only need a certain range of frequencies to interpret musical notes, we decided to implement an analog low pass filter on the microphone input in hardware.

First, we implemented an RC circuit buffered by a unity gain, non-inverting op amp (MCP6242). However, its response started to drop off too early and did not fall fast enough, so this implementation reduced the power of frequencies we cared about without sufficiently reducing the power of the higher frequencies that we wanted to get rid of. We remedied this problem by implementing a Sallen-Key circuit. This design rapidly decreased the power of frequencies beyond our desired cutoff frequency of 1200 Hz. Below is an image of the Sallen-Key topology, which acts as a low pass filter with unity, non-inverting gain.

Since the Sallen-Key topology requires a strictly positive input, an additional AC coupling stage was placed at its input, using the other op-amp in the MCP6242 package, to give the input a DC offset at half of the rail voltage (which is given by the 3.3V supply from the Pico). Below is an image of our final circuit with the AC coupling stage.

Circuit Analysis of Low Pass Filter

After implementing the low pass filter, we simulated the circuit in PLECS and obtained a Bode plot showing attenuation beginning around 1200 Hz. Additionally, we tested the filter using a frequency sweep, with the output shown on an oscilloscope, to verify that our circuit was behaving as expected.

Below is our PLECS model of the Sallen-Key filter with AC coupling stage.

Below is the Bode plot of the simulated Sallen-Key filter, with the cursor indicating the -6 dB corner at around 1200 Hz and a roll-off of approximately -40 dB/dec. As frequency increases past the corner frequency, the response becomes increasingly attenuated.
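These two figures can be cross-checked against the ideal second-order low-pass response. The Python sketch below is not the PLECS model; it evaluates the textbook magnitude response of a unity-gain second-order section. The quality factor Q = 0.5 is an assumption, chosen because it is the value for which the gain at the corner is about -6 dB; the function name is hypothetical.

```python
import math


def lowpass2_gain_db(freq_hz, f0_hz, q):
    """Gain in dB of an ideal unity-gain second-order low-pass section."""
    ratio = freq_hz / f0_hz
    magnitude = 1.0 / math.sqrt((1.0 - ratio ** 2) ** 2 + (ratio / q) ** 2)
    return 20.0 * math.log10(magnitude)


F0_HZ = 1200.0  # corner frequency from the report
Q = 0.5         # assumed quality factor (gives about -6 dB at the corner)

for freq in (120.0, 1200.0, 12000.0, 120000.0):
    print(freq, round(lowpass2_gain_db(freq, F0_HZ, Q), 1))

# Slope over the decade from 12 kHz to 120 kHz, close to -40 dB/dec.
slope = lowpass2_gain_db(120000.0, F0_HZ, Q) - lowpass2_gain_db(12000.0, F0_HZ, Q)
print(round(slope, 1))
```

Well below the corner the gain is essentially 0 dB, and each decade above it costs roughly another 40 dB, which matches the shape described for the simulated plot.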

Finally, this video shows the frequency sweep of signals going through the low pass filter. At low frequencies, the signal is not attenuated. As the frequencies increase, the signal gets significantly attenuated.

VGA and Serial Monitor Displays

As in previous labs, we used provided interfaces with the VGA screen and the UART serial monitor connection. A VGA screen is used to display the identified chord. Connecting the board to the VGA screen requires three 330 ohm resistors on the breadboard between the RP2040 output pins and the color inputs (RGB) on the screen connector. The VGA screen and the RP2040 must have a common ground, so a ground wire is connected to the VGA.

The UART serial connection is used to display detailed note and chord identification information on the serial monitor. A USB-to-serial adapter is connected to GPIO pins 0 and 1, which serve as UART0 TX and RX, respectively. The adapter is plugged into the computer, which allows the user to type inputs on the computer's keyboard. Additionally, the ground from the USB adapter is connected to the ground on the RP2040. To set up the serial interface, we use PuTTY with a baud rate of 115200.

Below is an image of the final setup of the VGA and serial monitor displays for the chord identifier.

Software Design

On Core 0, a looping thread samples the ADC input, performs the FFT, and determines notes and chords from the resulting frequency information.

Sampling and FFT were adapted from code provided in Lab 1. As described in Hardware, sampling is done through DMA channels, which get the specified number of samples necessary for FFT and store them into an array.

When the FFT has been computed and the power spectrum derived from it, the 3 frequencies with the highest power are found. To do this, we iterate through the power spectrum, finding the maximum-power frequency bin, noting it, and setting the magnitude at that frequency and neighboring frequencies to 0. Then we iterate through the updated spectrum again to find the second and third highest-power frequencies. This process is illustrated in Fig. x. We remove neighboring frequencies from the spectrum in addition to the maximum in order to remove potential errors -- some frequency bins have higher power due to being close to the max frequency, and we do not want to include these in the calculations.
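The peak search described above can be sketched as follows. This is an illustrative Python version, not the project's C code; the function name `top_peaks` and the guard width of two bins on either side are assumptions, since the report does not say how many neighboring bins are cleared.

```python
def top_peaks(power, sample_rate_hz, num_peaks=3, guard_bins=2):
    """Return the frequencies (Hz) of the strongest peaks, strongest first.

    power holds the one-sided power spectrum, i.e. bins 0 .. N/2 - 1 of an
    N-point FFT, so bin k corresponds to k * sample_rate_hz / N.
    """
    spectrum = list(power)               # work on a copy
    num_samples = 2 * len(spectrum)
    peaks = []
    for _ in range(num_peaks):
        best = max(range(len(spectrum)), key=spectrum.__getitem__)
        peaks.append(best * sample_rate_hz / num_samples)
        # Zero the peak and its neighbors so leakage is not picked next.
        low = max(0, best - guard_bins)
        high = min(len(spectrum), best + guard_bins + 1)
        for i in range(low, high):
            spectrum[i] = 0.0
    return peaks


# Synthetic spectrum: three peaks plus leakage next to the strongest one.
power = [0.0] * 512
power[45] = 10.0   # strongest peak
power[46] = 9.0    # leakage beside it; would be chosen second without the guard
power[57] = 8.0
power[67] = 7.0
peaks = top_peaks(power, 5000.0)
print(peaks)
```

With the guard in place the leakage bin at index 46 is skipped, and the three reported frequencies come from bins 45, 57, and 67.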

Once we have the three relevant frequencies, which we expect to correspond to the three notes that were played, we translate these frequencies to note numbers, which simply correspond to a note’s position on a piano keyboard (e.g. note number 1 corresponds to A0, the lowest piano note; A4 is note number 49).

Once we have a note number, the following functions are used for translation into more well-known names:

  • getLetter takes a note number n as a parameter and returns its letter name (e.g. A, A#, B, etc.) by looking it up in a table of note names. The table is 12 entries long (for the 12 possible note letter names) and lookup is done by using the index

$$ i = (n - 1) \bmod 12 $$

and mapping it to a note name in the following list:

A, A#, B, C, C#, D, D#, E, F, F#, G, G#

  • getName is given a note number n and returns the full name of a note, which is made up of its letter (found as in getLetter), as well as its scale (octave) number, which is found by the formula

$$ \text{scale} = \left\lfloor \frac{n + 8}{12} \right\rfloor $$

(For example, say we input note number 49: then the letter is found at index 0, which gives A, and the scale is found to be 4, giving the note A4.)

  • getChord takes a list of note numbers (sorted in increasing order) and determines the corresponding chord. This is done by finding the differences between each note in the chord; so, finding the gap between the lowest and middle notes, and the gap between the middle and highest. A chord is characterized by these gaps according to the table below:

The root note is determined along with the chord type: the root may be the 1st, 2nd, or 3rd note in the chord, and its letter is found by calling getLetter on that note.
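The three helpers can be sketched together in Python. This is an illustration under stated assumptions rather than the project's C implementation: the letter index (n - 1) mod 12 and the octave formula floor((n + 8) / 12) are chosen to agree with the worked example of note number 49 giving A4, and the gap table follows standard triad theory, including inversions, rather than the authors' own table. The snake_case names mirror getLetter, getName, and getChord.

```python
NOTE_LETTERS = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

# (lower gap, upper gap) in semitones -> (chord type, index of the root note).
# Assumed table based on standard triad theory.
CHORD_TABLE = {
    (4, 3): ("major", 0),        # root position
    (3, 5): ("major", 2),        # first inversion, root on top
    (5, 4): ("major", 1),        # second inversion, root in the middle
    (3, 4): ("minor", 0),
    (4, 5): ("minor", 2),
    (5, 3): ("minor", 1),
    (3, 3): ("diminished", 0),
    (3, 6): ("diminished", 2),
    (6, 3): ("diminished", 1),
    (4, 4): ("augmented", 0),    # symmetric, so the lowest note is assumed
}


def get_letter(note_number):
    """Letter name of a piano key number (1 = A0)."""
    return NOTE_LETTERS[(note_number - 1) % 12]


def get_name(note_number):
    """Full note name, e.g. 49 -> 'A4'; the octave changes at each C."""
    return get_letter(note_number) + str((note_number + 8) // 12)


def get_chord(note_numbers):
    """Name the triad formed by three note numbers sorted in increasing order."""
    low, mid, high = note_numbers
    entry = CHORD_TABLE.get((mid - low, high - mid))
    if entry is None:
        return "unknown"
    chord_type, root_index = entry
    return get_letter(note_numbers[root_index]) + " " + chord_type


print(get_name(49))              # A4
print(get_chord([40, 44, 47]))   # C major (C4, E4, G4)
print(get_chord([44, 47, 52]))   # C major, first inversion (E4, G4, C5)
print(get_chord([37, 40, 44]))   # A minor (A3, C4, E4)
```

Keying the table on the pair of gaps keeps the lookup independent of which octave the chord is played in.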

On Core 1, information is outputted on the VGA and serial monitor, each of which uses a different thread.

During testing, the VGA was used to display a visualization of the FFT output (as in Lab 1), to observe the frequency magnitudes detected from the audio input. In the final iteration of the program, the VGA displays the detected chord, updating very quickly for fast response.

The serial monitor displays more detailed information about the inputted sound: the top 3 detected frequencies, their associated note numbers, the note names, and the detected chord. This thread is scheduled only once per second.

Results of the Design

Our design successfully identified 3-note chords played up to quarter note = 100 bpm. We took advantage of the Pico’s dual core to compute the FFT and display on the Serial/VGA separately and ran the clock at 48 MHz. We ran the serial line at a baud rate of 115200 bits per second.

Here is the demo video of our final results. This video shows the chord identifier working in real time. It identifies a series of chords played by a sine wave synthesizer, a simulated guitar, and a simulated piano.

The first rounds of testing involved playing notes and chords on a sine-wave synthesizer (we used the Helm synthesizer software) and observing the response of the program through a visualization of the FFT on the VGA screen, and the determined notes on the serial monitor. Once the system could identify chords to sufficient accuracy, we tested speed by encoding a chord progression into the Ableton DAW, and playing them at increasing tempos, with each chord being a quarter note. The system was able to reasonably catch every chord in the progression up to about 100 bpm, which corresponds to a played chord being identified in about 0.6 seconds.

Finally, we tested playing chords using different instrument sounds, including piano, guitar, and violin. Unlike the sine-wave synth, these sounds have additional harmonics that could be misidentified by the system as notes, and their volume decreases over time, which means that the system may not be able to determine the chord before the notes become too soft to identify. String sounds, which do not have this latter constraint, performed decently well, but ‘struck’ instruments such as piano and guitar were difficult for the system to parse. Since we structured the code such that each core had different functionality, we did not have to guard shared data or implement spin locks, because the two cores never update the same values.

This system could be used in a couple of different ways:

  • First, with better behavior on different instruments, it could be used to determine the chords in a played song; so, you could learn how to play a song by feeding it through the program.

  • In addition, the system would be good for learning music theory; if you have experience playing a keyboard but don’t have the theory to know exactly what you are playing, the system could tell you the names of chords, so that you could learn.

Conclusions

In general, the system was quite good at identifying 3-note chords played by pure sine waves, and slightly less effective for other instruments.

For further steps, we might first look into signal processing techniques to better identify notes played on real instruments. For example, when a note is played on a sine-wave synthesizer, the volume of the note is held constant the whole time the note is played. However, when a note is played on an actual instrument (like a piano), the volume of the note drops off drastically. To refine our design, we could research various signal processing techniques to deal with this issue.

In terms of increasing speed, we considered overclocking the Pico in order to process FFTs faster and therefore get the chord results sooner, allowing for faster chord recognition and possibly more accurate results at higher tempos. We also considered adding 4-note chord functionality; there is a much wider variety of 4-note chords in music, so this step simply comes down to encoding all of these types of chords (in a similar way to the 3-note chords) in the program.

Finally, we considered tracking the timing and length of played chords, so that we could tell the musical length (i.e. eighth, quarter, half notes) of chords. This feature would better serve the goal of teaching music theory or telling a player how long to play a given chord in a song.

Intellectual Property Considerations

We reused code from Lab 1, specifically Hunter Adams’ VGA display and FFT code. The latter cites the Danielson-Lanczos FFT, code adapted from Tom Roberts 11/8/89 and Malcolm Slaney 12/15/94, as well as bit reversal code developed by Sean Eron Anderson. Outside of this code, we did not use code in the public domain. We did not reverse-engineer a pre-existing design. We did not have to sign a non-disclosure agreement to get a sample part.

It is unlikely that there are patent opportunities for our project. In order to qualify for a patent, the invention must be novel, useful, and nonobvious. While our design is useful, it is not novel and it is obvious. To make the design novel and nonobvious, we would have to add far more features than just 3-note chord identification. It may be novel in the sense that this specific project has not been done with our exact components, but it seems that there are better chord identifiers that already exist, albeit with different implementations.

Appendices

Appendix A: Permissions

The group approves this report for inclusion on the course website. The group approves the video for inclusion on the course YouTube channel.

Appendix B: Division of Work

Throughout the course of the project, we all worked together, mostly in-person during open lab hours as well as our assigned weekly lab.

After deciding we wanted to work on a musical project, we all researched different potential project ideas, including instrument identification and making a digital instrument such as a violin on a breadboard. Ariana wrote up the final project proposal, and Jeff and Tiffany helped to edit it before submission.

During Week 1, we all worked together to implement the note identifier. During Week 2, we expanded the project to include chord identification. Jeff worked on the chord identification mapping outside of lab, since they had the best knowledge of music theory. During Week 3, Jeff finished up the chord identification code, while Tiffany and Ariana designed the low pass filter to improve the FFT. During Week 4, the three of us worked together to finish up the project.

For the final report, the sections were split up across the three of us. Tiffany worked on putting the website together.

Appendix C: References Used

Appendix D: Code

This is a link to the code for our chord identifier in our Github repository.