Identifying multiple sources of signals in a continuous analog timeseries is a challenge faced by several fields (e.g. neuroscience: identifying neuron spikes against background noise in electrode recordings; audio: identifying sounds of interest against background noise).

Generally, these approaches involve multiple steps. The first is to improve the signal-to-noise ratio by filtering out noise and amplifying the signals of interest. This usually means applying low- and high-pass filters to cut out frequencies outside the band of interest; sometimes notch filters are added to remove specific noise sources (like 50/60 Hz AC power-line interference in electrode recordings). Once that is done, methods such as threshold-based detection (used for spike detection) or the short-time Fourier transform and wavelet transforms (used for detecting sounds of interest) pick out the signals of interest.

In a sense, all of these problems are versions of the cocktail party problem: the aim is to pick out one voice (or one neuron's activity) from a sea of continuously changing background noise. Traditional approaches require first identifying the characteristics of the signal of interest and then using those characteristics to extract it.
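One classical instance of "identify the characteristics, then use them to extract the signal" is the matched filter: if the waveform of interest is known, cross-correlating the recording against that template concentrates its energy into a sharp peak. A minimal sketch on synthetic data (the template shape and noise levels here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 1000
t = np.arange(0, 2.0, 1 / fs)

# Known "characteristic" of the signal of interest: a short
# Gaussian-windowed 80 Hz tone burst (the template).
tt = np.arange(-0.05, 0.05, 1 / fs)
template = np.exp(-(tt / 0.02) ** 2) * np.cos(2 * np.pi * 80 * tt)

# Recording: the template buried at sample 700 in strong white noise.
x = rng.standard_normal(t.size)
x[700:700 + template.size] += 3 * template

# Matched filter: cross-correlate and take the peak as the estimated onset.
score = np.correlate(x, template, mode="valid")
onset = int(np.argmax(score))
print(onset)  # estimated onset; the burst was injected at sample 700
```

The weakness this note points at is visible here: the matched filter only works because the template is known in advance, which is exactly what a changing, unknown source denies us.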

Could this be done in a semi-supervised way using the DNN advances behind LLMs? Specifically, could Transformer networks (built on attention mechanisms) be trained quickly to pull individual signals out of a combined/summed signal whose characteristics may be changing over time?
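To make the idea concrete, the core Transformer operation (scaled dot-product attention) can be applied to a timeseries by framing it into overlapping windows that play the role of tokens. The sketch below only shows the mechanism with random, untrained projection matrices; a real separator would learn `Wq`, `Wk`, `Wv` (and much else) from data, and the window/hop sizes here are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def attention(x, Wq, Wk, Wv):
    """Scaled dot-product attention. x: (seq_len, d_model) window embeddings."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

# Frame a summed 1-D signal into overlapping windows -> a "token" sequence.
sig = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 512)) + 0.1 * rng.standard_normal(512)
frames = np.lib.stride_tricks.sliding_window_view(sig, 64)[::16]  # (29, 64)

d_model = 64
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) * 0.05 for _ in range(3))
out = attention(frames, Wq, Wk, Wv)
print(out.shape)  # (29, 64): one attention-mixed representation per window
```

The appeal for source separation is that each window's output is a learned, content-dependent mixture of every other window, so the network can in principle track a source whose characteristics drift over the recording instead of relying on a fixed template.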