Overview

Inspiration

We chose this project as an exploration into the field of digital signal processing. Initially, we expected it to be a software-heavy, hardware-light project, which we preferred over something more hardware-involved. It ended up being heavy on everything, but we learned a lot.

Modern synths come in a variety of forms, from purely software implementations to analog / modular hardware. To keep our project well-balanced between hardware and software components, we chose to add hardware input / output on top of the digital sound synthesis.

When deciding how to build the hardware interface, we turned to Moog synthesizers from the late 20th century, with their arrays of knobs and buttons, for ideas on how to organize our own interface.

Structure
Our project was split into two main parts: the audio synthesis / FFT component and the hardware interfacing component. Predefining the set of variables that the audio side would need to read from the hardware interfacing side let both parts be worked on concurrently. We also separated the code into logical sections (FFT with FFT, rotary encoder with rotary encoder, and so on), which made it easier to debug and keep track of where things were in our large, single file.
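As a rough sketch of this interface (the variable names here are illustrative, not our actual identifiers), the hardware side writes a small set of shared globals that the audio side reads:

```c
#include <stdbool.h>

// shared state: written by the hardware-interfacing code,
// read by the audio-synthesis / FFT code
volatile int  selected_waveform;   // set by the rotary encoder
volatile int  selected_parameter;  // which variable the slider currently adjusts
volatile int  slider_value;        // set by the potentiometer slider read
volatile bool key_pressed[12];     // set by the capacitive-touch key scan
```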
Patents, Copyrights, Trademarks

Commercial synths on the market are more robust and usually have more features than what we built here, but reverse-engineering the basic components of one was a fruitful learning experience. We do not see any copyright or trademark issues arising from this project: we did not reference commercial code or inspect outside hardware, and instead worked from the basic ideas all synths share.

Tradeoffs

To keep the workload reasonable for a month-long project, we chose to limit features to the basic components one would find in a synth. Our sound design has four main features: waveshapes, in-place FFT filtering, an LFO, and ADSR envelopes. We decided to implement in-place FFT filtering rather than building a filtering circuit, which let us maximize the space available on our breadboard for user-interactive hardware.

Physical hardware aspects of our project required:

- Four buttons (TFT menu selection)
- One potentiometer slider (adjusting variables)
- One rotary encoder (selecting variables to be adjusted)
- Twelve piano “keys” connected via wires to a capacitive touch sensor module

We had initially considered including LEDs to display amplitude levels and using two oscillators. However, we did not have enough GPIO pins left by the end of the project to include the LEDs, and we compromised on one oscillator rather than one per core to save time, hardware space, and code.

Direct digital synthesis is only one of many methods of audio synthesis. A quick list of other approaches can be found here.

Background
Fixed Point Arithmetic (Fix15)

Almost all of the calculations in this project use fixed-point arithmetic, which is much faster than doing the same math with floating-point numbers on our hardware. Specifically, we used a signed int with the fixed point between the 14th and 15th bits (fix15 in the code). A more thorough explanation of this is here.
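For reference, here is a representative set of fix15 definitions, similar to the macros used in the course labs (the exact names in our code may differ):

```c
// signed 32-bit value with 15 fractional bits (binary point between bits 15 and 14)
typedef signed int fix15;

#define multfix15(a, b) ((fix15)((((signed long long)(a)) * ((signed long long)(b))) >> 15))
#define float2fix15(a)  ((fix15)((a) * 32768.0))   // 2^15 = 32768
#define fix2float15(a)  ((float)(a) / 32768.0)
#define int2fix15(a)    ((fix15)((a) << 15))
#define fix2int15(a)    ((int)((a) >> 15))
```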

Direct Digital Synthesis (DDS)

We used direct digital synthesis to produce four different wave shapes: sinusoidal, square, triangle, and sawtooth. The provided sinusoidal wavetable from Lab 1 was used as a model for how to generate the other waveforms.
Four wavetables, one per waveform, were created with the same length; each array holds one period of its waveform. A single for-loop iterates over the indices and assigns each wavetable an amplitude value computed as a function of the index, so all four tables are populated in one pass.
We used the maximum waveform amplitude possible before overflow errors occur; this was the default amplitude of the provided sinusoidal wavetable, which we reused for the others. Note that the generated waveforms are shifted so that their minimum values lie at or above 0: samples must be non-negative to avoid output errors and to be audible.
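A sketch of that population loop, assuming a 256-entry table, a 12-bit DAC range (0-4095), and illustrative table names:

```c
#include <math.h>

#define TABLE_SIZE 256              // one period per table (illustrative size)
#define TWO_PI     6.283185307f

fix15 sine_table[TABLE_SIZE];
fix15 square_table[TABLE_SIZE];
fix15 triangle_table[TABLE_SIZE];
fix15 saw_table[TABLE_SIZE];

void populate_wavetables(void) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        // sine: offset upward so every sample is non-negative
        sine_table[i] = float2fix15(2047.0f * (1.0f + sinf(TWO_PI * i / TABLE_SIZE)));
        // square: high for the first half-period, low for the second
        square_table[i] = int2fix15(i < TABLE_SIZE / 2 ? 4094 : 0);
        // triangle: ramp up for half a period, then back down
        triangle_table[i] = float2fix15(i < TABLE_SIZE / 2
                                ? 4094.0f * i / (TABLE_SIZE / 2)
                                : 4094.0f * (TABLE_SIZE - i) / (TABLE_SIZE / 2));
        // sawtooth: linear ramp across the whole period
        saw_table[i] = float2fix15(4094.0f * i / TABLE_SIZE);
    }
}
```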
The output code samples the wavetables using a phase incrementer and sends those values to the DAC, producing the desired wave shape as sound.
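Continuing the sketch above, the sampling step uses a 32-bit phase accumulator whose top bits index the wavetable; `dac_write()` is a hypothetical stand-in for the actual SPI write to the DAC:

```c
extern void dac_write(int value);        // hypothetical DAC output call

volatile unsigned int phase_accum = 0;   // 32-bit phase accumulator
volatile unsigned int phase_incr  = 0;   // added once per sample

// phase_incr = f_out * 2^32 / F_s, where F_s is the audio sample rate
void set_note_frequency(float f_out, float f_s) {
    phase_incr = (unsigned int)(f_out * 4294967296.0f / f_s);
}

// called once per sample period, e.g. from a timer interrupt
void output_sample(void) {
    phase_accum += phase_incr;
    int idx = phase_accum >> 24;             // top 8 bits index the 256-entry table
    dac_write(fix2int15(sine_table[idx]));
}
```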

Fast Fourier Transform (FFT)

Every time-domain signal has a frequency-domain counterpart: any complex signal, no matter how strangely shaped, can be decomposed into a sum of sinusoidal waves at different frequencies. A Fourier transform takes a time-domain signal as input and outputs its frequency-domain representation, i.e. the frequencies of the sinusoids that make it up. Frequencies that have a larger effect on the time-domain signal than others appear as larger amplitudes in the frequency domain.
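For example, an ideal square wave of frequency $f$ is built entirely from its odd harmonics:

$$x(t) = \frac{4}{\pi} \sum_{k = 1, 3, 5, \dots} \frac{\sin(2\pi k f t)}{k}$$

so removing the higher harmonics with a low-pass filter visibly rounds the square wave off.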
Applying the FFT to our wavetables lets us modify the intensity of the frequencies in that wave, effectively allowing us to filter them.

Inverse FFT

After modifying the frequency domain output of the FFT, we needed to convert that back to the time domain in order for it to be output through the DAC. To do this, we sent the modified FFT output into an inverse FFT (iFFT) function.
The output of the iFFT needed some amplitude scaling before we could use it as our new, filtered wavetable. By printing the waveform over serial before the FFT and again after the iFFT and comparing the two, we determined that the waveform needs to be scaled up by a factor of four after the iFFT.
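A sketch of that rescaling step, assuming the iFFT leaves its real output in a fixed-point array `fr[]` (an illustrative name):

```c
// scale the iFFT output back up by a factor of 4 (left shift by 2)
// before using it as the playback wavetable
void rescale_after_ifft(fix15 fr[], fix15 filtered_table[], int n) {
    for (int i = 0; i < n; i++) {
        filtered_table[i] = fr[i] << 2;
    }
}
```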
More info on FFTs can be found here.

Filtering

In-between FFT and iFFT, we perform frequency-domain filtering. Three filters were implemented: low-pass, high-pass, and band-pass.
- A low-pass filter only “passes” low frequencies (below a cutoff frequency) that compose a time-domain signal, and “blocks” all other frequencies.
- A high-pass filter only “passes” high frequencies (above a cutoff frequency) that compose a time-domain signal.
- A band-pass filter only “passes” frequencies within a middle range (between two cutoff frequencies) that compose a time-domain signal.
FFT bins within our desired frequency range were left alone, while FFT bins outside of it were either zeroed out or, if they lay on the cutoff edge, reduced by a factor of 2.
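A minimal low-pass sketch of that rule, assuming the FFT output lives in fixed-point arrays `fr[]` (real) and `fi[]` (imaginary) of length `N`; because the input is real, each bin `k` above `N/2` mirrors bin `N - k`, so both halves are handled together:

```c
void lowpass_bins(fix15 fr[], fix15 fi[], int N, int cutoff_bin) {
    for (int k = 1; k < N / 2; k++) {
        if (k == cutoff_bin) {
            // on the cutoff edge: reduce by a factor of 2
            fr[k] >>= 1;      fi[k] >>= 1;
            fr[N - k] >>= 1;  fi[N - k] >>= 1;
        } else if (k > cutoff_bin) {
            // outside the passband: zero the bin and its mirror
            fr[k] = fi[k] = 0;
            fr[N - k] = fi[N - k] = 0;
        }
    }
}
```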
Filtering changes the shape of the time-domain signal, so filtered audio sounds different from its unfiltered counterpart: low-passed signals sound “softer”, high-passed signals sound “sharper”, and band-passed signals fall somewhere in between. User input determines which kind of filter, if any, is applied and what its cutoff frequency is.

Low Frequency Oscillator (LFO)

A low-frequency oscillator (LFO) signal is used to modulate the cutoff frequencies of the filters. The effect is that the frequency content of the output changes over time as the cutoff frequencies grow and decay with the LFO. The LFO only has an effect when filtering is applied; otherwise it does nothing.
LFO wavetables (sine, square, triangle, and sawtooth) were populated in the same for-loop as the sound wavetables. When the LFO is active, the cutoff frequencies are recalculated in real time by multiplying their base (user-input) values by a periodically sampled LFO wavetable. Thus, the cutoff frequencies change with time according to the selected LFO waveshape.
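A sketch of that real-time calculation, assuming an LFO wavetable holding fix15 values in [0, 1] and illustrative names throughout:

```c
#define LFO_TABLE_SIZE 256
fix15 lfo_table[LFO_TABLE_SIZE];   // filled in the same loop as the sound wavetables

volatile unsigned int lfo_phase = 0;
unsigned int lfo_incr;             // set from the LFO rate, exactly as in DDS

int modulated_cutoff(int base_cutoff_bin) {
    lfo_phase += lfo_incr;                        // advance the LFO
    fix15 lfo_val = lfo_table[lfo_phase >> 24];   // sample its wavetable
    // scale the user's base cutoff by the current LFO value
    return fix2int15(multfix15(int2fix15(base_cutoff_bin), lfo_val));
}
```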

Envelopes (ADSR)

Each time a user touches a key and the associated note frequency is selected, an ADSR (Attack-Decay-Sustain-Release) envelope is applied. During “attack”, the amplitude of the output note is quickly ramped up to a peak. Once the peak is reached, “attack” ends and “decay” begins: the output amplitude quickly decreases to a desired sustain level. Following “decay”, the output amplitude is “sustained” for as long as the user keeps touching the same key. After the user releases the key, “release” slowly decrements the output amplitude until it reaches zero (i.e. the note is no longer played). ADSR is implemented by checking which key the user has “selected” and applying a series of increments and decrements to that note’s amplitude.
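A sketch of that state machine, with illustrative state names and fixed-point increments (not our exact parameter values):

```c
#include <stdbool.h>

typedef enum { OFF, ATTACK, DECAY, SUSTAIN, RELEASE } adsr_state_t;

static adsr_state_t state = OFF;
static fix15 amplitude    = 0;

static const fix15 peak_amp     = float2fix15(1.0);    // attack target
static const fix15 sustain_amp  = float2fix15(0.6);    // level held while touched
static const fix15 attack_incr  = float2fix15(0.01);   // fast rise
static const fix15 decay_incr   = float2fix15(0.005);  // fast fall to sustain
static const fix15 release_incr = float2fix15(0.0005); // slow fade to silence

// called once per audio sample with the state of the selected key;
// the returned value scales the wavetable sample before it goes to the DAC
fix15 adsr_update(bool key_held) {
    switch (state) {
    case OFF:
        if (key_held) state = ATTACK;
        break;
    case ATTACK:
        amplitude += attack_incr;                  // ramp quickly to the peak
        if (amplitude >= peak_amp) state = DECAY;
        break;
    case DECAY:
        amplitude -= decay_incr;                   // fall to the sustain level
        if (amplitude <= sustain_amp) state = SUSTAIN;
        break;
    case SUSTAIN:
        if (!key_held) state = RELEASE;            // hold while the key is touched
        break;
    case RELEASE:
        amplitude -= release_incr;                 // fade slowly to silence
        if (amplitude <= 0) { amplitude = 0; state = OFF; }
        break;
    }
    return amplitude;
}
```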

Thin Film Transistor Display (TFT)

The TFT was used to display sound parameters so the user has an indication of what they are changing and what value they are setting it to. Functions for writing to the TFT were written first and linked up with their respective variables later, which kept the two sides decoupled during the coding process. The TFT library we used came from a previous student’s project; the link to it is here.

Capacitive Touch

The capacitive touch library we used can be found here. We modified the sample code for our purposes by increasing the number of activated sensors to all 12 and placing all of their needed parameters into an array.
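A sketch of that arrangement (the struct fields and the `touch_read()` call are assumptions for illustration, not the library’s actual interface):

```c
#include <stdbool.h>

extern unsigned int touch_read(unsigned int pin);  // hypothetical raw-read call

#define NUM_KEYS 12

typedef struct {
    unsigned int pin;        // GPIO wired to this key's touch pad
    unsigned int threshold;  // calibrated touch threshold for this pad
    bool         pressed;    // latest scan result
} touch_key_t;

touch_key_t keys[NUM_KEYS];

// scan all 12 keys in one loop instead of handling each sensor separately
void scan_keys(void) {
    for (int i = 0; i < NUM_KEYS; i++) {
        keys[i].pressed = (touch_read(keys[i].pin) > keys[i].threshold);
    }
}
```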