Scalable software and models for large-scale extracellular recordings
Hurwitz, Cole Lincoln
The brain represents information about the world through the electrical activity of populations of neurons. By placing an electrode near a neuron that is firing (spiking), it is possible to detect the resulting extracellular action potential (EAP) that is transmitted down an axon to other neurons. In this way, it is possible to monitor the communication of a group of neurons to uncover how they encode and transmit information. As the number of recorded neurons continues to increase, however, so do the data processing and analysis challenges. It is crucial that scalable software and analysis tools are developed and made available to the neuroscience community to keep up with the large amounts of data that are already being gathered. This thesis comprises three pieces of work that I developed to better process and analyze large-scale extracellular recordings. My work spans all stages of extracellular analysis, from the processing of raw electrical recordings to the development of statistical models that reveal underlying structure in neural population activity. In the first work, I focus on developing software to improve the comparison and adoption of different computational approaches for spike sorting. When analyzing neural recordings, most researchers are interested in the spiking activity of individual neurons, which must be extracted from the raw electrical traces through a process called spike sorting. Much development has been directed towards improving the performance and automation of spike sorting. This continuous development, while essential, has contributed to an over-saturation of new, incompatible tools that hinders rigorous benchmarking and complicates reproducible analysis. To address these limitations, I develop SpikeInterface, an open-source Python framework designed to unify preexisting spike sorting technologies into a single toolkit and to facilitate straightforward benchmarking of different approaches.
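As an illustration of what benchmarking two spike sorters against each other can involve, the following minimal sketch scores how well two spike trains (one unit from each sorter) agree. It is not SpikeInterface's API: the function names, the greedy matching rule, the 0.4 ms tolerance, and the score definition (matched spikes divided by the size of the union of both trains) are all assumptions made for this example.

```python
import numpy as np


def count_matches(times_a, times_b, tolerance):
    """Greedily pair spikes from two trains that lie within `tolerance`.

    Hypothetical helper: each spike is used at most once, and both
    trains are sorted before a single two-pointer pass.
    """
    a = np.sort(np.asarray(times_a, dtype=float))
    b = np.sort(np.asarray(times_b, dtype=float))
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            # Close enough to count as the same spike; consume both.
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too early to match any remaining spike in b
        else:
            j += 1  # b[j] is too early to match any remaining spike in a
    return matches


def agreement_score(times_a, times_b, tolerance=0.4):
    """Return matches / (n_a + n_b - matches), a value in [0, 1].

    Assumed definition for this sketch; times and tolerance share one
    unit (milliseconds here).
    """
    n_a, n_b = len(times_a), len(times_b)
    if n_a == 0 and n_b == 0:
        return 0.0
    m = count_matches(times_a, times_b, tolerance)
    return m / (n_a + n_b - m)


# Spike times in milliseconds for one unit from each of two sorters.
sorter_1_unit = [10.0, 20.0, 30.0, 40.0]
sorter_2_unit = [10.2, 20.1, 35.0]
score = agreement_score(sorter_1_unit, sorter_2_unit)
```

A score near 1 means the two sorters found essentially the same unit, while a score near 0 means the units share almost no spikes. A consensus approach could, under these assumptions, keep only units whose best match in another sorter's output exceeds a chosen threshold.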
With this framework, I demonstrate that modern, automated spike sorters have low agreement when analyzing the same dataset, i.e. they find different numbers of neurons with different activity profiles. This result holds true for a variety of simulated and real datasets. I also demonstrate that utilizing a consensus-based approach to spike sorting, where the outputs of multiple spike sorters are combined, can dramatically reduce the number of falsely detected neurons. In the second work, I focus on developing an unsupervised machine learning approach for determining the source location of individually detected spikes that are recorded by high-density microelectrode arrays. By localizing the source of individual spikes, my method is able to determine the approximate position of the recorded neurons in relation to the microelectrode array. To allow my model to work with large-scale datasets, I utilize deep neural networks, a family of machine learning algorithms that can be trained to approximate complicated functions in a scalable fashion. I evaluate my method on both simulated and real extracellular datasets, demonstrating that it is more accurate than other commonly used methods. I also show that location estimates for individual spikes can be utilized to improve the efficiency and accuracy of spike sorting. After training, my method allows for localization of one million spikes in approximately 37 seconds on a TITAN X GPU, enabling real-time analysis of massive extracellular datasets. In my third and final presented work, I focus on developing an unsupervised machine learning model that can uncover patterns of activity from neural populations associated with a behaviour being performed. Specifically, I introduce Targeted Neural Dynamical Modelling (TNDM), a statistical model that jointly models the neural activity and any external behavioural variables. TNDM decomposes neural dynamics (i.e.
temporal activity patterns) into behaviourally relevant and behaviourally irrelevant dynamics; the behaviourally relevant dynamics constitute all activity patterns required to generate the behaviour of interest while behaviourally irrelevant dynamics may be completely unrelated (e.g. other behavioural or brain states), or even related to behaviour execution (e.g. dynamics that are associated with behaviour generally but are not task specific). Again, I implement TNDM using a deep neural network to improve its scalability and expressivity. On synthetic data and on real recordings from the premotor (PMd) and primary motor cortex (M1) of a monkey performing a center-out reaching task, I show that TNDM is able to extract low-dimensional neural dynamics that are highly predictive of behaviour without sacrificing its fit to the neural data.