TOMORI233/mysort

1. Introduction

This sorting method is simply based on PCA and K-means.

For now, overlapping spikes are ignored.

mysort works on TDT Block data in .mat format. The data should contain at least one of the fields streams and snips. streams should contain a field named Wave, which in turn contains the fields data (an m*n matrix of the entire recorded waves, with channels along rows), fs (sampling rate, in Hz) and channel (an m*1 vector specifying channel numbers). snips should contain the fields data (an m*p matrix of spike waveforms, with one waveform per row and waveform points along columns), fs, and chan (an m*1 vector specifying the channel number of each waveform).
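As an illustration of the layout described above, the following sketch builds a small synthetic struct with both fields. The sizes, the sampling rate and the noise amplitude are made-up values for the example, and the snippet only follows the field names listed in this README; real TDT exports may carry additional fields.

```matlab
% Hypothetical example: 2 channels, 1 s of noise sampled at 24414 Hz.
fs = 24414;                 % sampling rate in Hz (assumed value)
nChannels = 2;
nSamples = fs;              % 1 second of data

data = struct();
data.streams.Wave.data = 1e-5 * randn(nChannels, nSamples); % m*n, channels along rows
data.streams.Wave.fs = fs;
data.streams.Wave.channel = (1:nChannels)';                  % m*1 channel numbers

% Optional snips field: 5 spike waveforms of 30 points each (made-up sizes).
nSpikes = 5;
nPoints = 30;
data.snips.data = 1e-5 * randn(nSpikes, nPoints); % m*p, one waveform per row
data.snips.fs = fs;
data.snips.chan = ones(nSpikes, 1);               % channel number of each waveform

disp(size(data.streams.Wave.data));
```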

2. Instructions

Before using mysort, download MATLABUtils and add it to your MATLAB path.

git clone git@github.com:TOMORI233/MATLABUtils.git

The latest supported version is MATLAB 2019b.

2.1 Mysort

See mysort.m for more detailed information.

  1. To add mysort to your MATLAB path, type the following in the MATLAB command line:
>> addpath(genpath('your root path/mysort'))
>> savepath
  1. To sort your single-channel TDT block data

You can first use:

sortResult = mysort(data, [], "origin"); % If K is not specified, an optimum K will be used

This sorts the spike waveforms of your original block data, using an optimum K determined by the gap statistic for K-means.

If you are not satisfied with the original spike waveform length, you can specify waveLength in mysort.m and use:

sortResult = mysort(data, [], "origin-reshape");

If you want to reselect a threshold for spikes, use:

sortResult = mysort(data, [], "reselect");

Or use

sortResult = mysort(data); % default, same as mysort(data, [], "reselect")

This plots up to 100 seconds of the wave over time for preview. Once you enter a threshold (in volts) in the MATLAB command line, sorting continues. For multi-channel sorting, a threshold th is required for every channel.
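To make the thresholding step concrete, here is a minimal sketch on a synthetic wave. It is not taken from mysort: it assumes that spikes are detected as upward crossings of a positive threshold and cut out with a fixed window, and the names pre, post and spikeOnsets are hypothetical.

```matlab
rng(0);
fs = 24414;                             % Hz (assumed value)
wave = 1e-6 * randn(1, fs);             % 1 s of low-amplitude noise
spikeOnsets = [5000, 12000, 20000];     % hypothetical spike positions (samples)
for s = spikeOnsets
    % Add a half-sine bump of 10 samples peaking near 5e-5 V.
    wave(s:s + 9) = wave(s:s + 9) + 5e-5 * sin(pi * (0:9) / 9);
end

th = 2e-5;                              % threshold, same unit as wave

% Upward crossings: samples above th whose predecessor is not above th.
above = wave > th;
crossIdx = find(above & ~[false, above(1:end - 1)]);

% Cut a fixed window around each crossing (window size is an assumption).
pre = 10;
post = 20;
crossIdx = crossIdx(crossIdx > pre & crossIdx + post <= numel(wave));
waveforms = zeros(numel(crossIdx), pre + post + 1);
for i = 1:numel(crossIdx)
    waveforms(i, :) = wave(crossIdx(i) - pre:crossIdx(i) + post);
end
spikeTimes = crossIdx / fs;             % crossing times in seconds
fprintf('Detected %d spikes\n', numel(crossIdx));
```

Each row of waveforms then plays the role of one spike waveform for the PCA and K-means steps.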

Still not satisfied? You can adjust CVCRThreshold (cumulative variance contribution rate threshold, default: 0.9), which controls how many PC dimensions are kept, or the convergence condition for K-means in mKmeans.m (usually a minimum ratio of relative cluster center shift in Euclidean distance, default: 0.1).
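A rough sketch of how a cumulative variance contribution rate threshold can select the number of PCs is shown below. It uses an SVD on synthetic data and is only an illustration of the idea; mPCA may compute or apply the threshold differently.

```matlab
rng(1);
% Synthetic "waveforms": 200 rows of 30 points, dominated by 2 latent components.
nSpikes = 200;
nPoints = 30;
t = linspace(0, 1, nPoints);
basis = [sin(2 * pi * t); cos(2 * pi * t)];
waveforms = randn(nSpikes, 2) * basis + 0.05 * randn(nSpikes, nPoints);

CVCRThreshold = 0.9;

% PCA via SVD of the mean-centered data.
centered = waveforms - mean(waveforms, 1);
[~, S, V] = svd(centered, 'econ');
latent = diag(S).^2 / (nSpikes - 1);   % variance carried by each PC
CVCR = cumsum(latent) / sum(latent);   % cumulative variance contribution rate

% Keep the smallest number of PCs whose cumulative rate reaches the threshold.
nPC = find(CVCR >= CVCRThreshold, 1);
pcaData = centered * V(:, 1:nPC);      % spikes projected onto the retained PCs
fprintf('Retained %d PCs (CVCR = %.3f)\n', nPC, CVCR(nPC));
```

Raising the threshold keeps more dimensions, which preserves more waveform detail at the cost of clustering in a higher-dimensional space.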

You can also use MATLAB's built-in pca and kmeans functions instead of mPCA and mKmeans in spikeSorting.m.
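The K-means convergence condition mentioned above can be pictured with the following toolbox-free sketch. It assumes that "relative cluster center shift" means the Euclidean distance a center moves in one iteration divided by the norm of its previous position; the actual definition in mKmeans.m may differ, and the data and initialization here are made up.

```matlab
rng(2);
% Two well-separated synthetic clusters in a 2-D "PCA space".
X = [randn(100, 2) + [5, 5]; randn(100, 2) - [5, 5]];
K = 2;
tol = 0.1;          % threshold on the relative center shift (assumed meaning)
maxIter = 100;

% Simple initialization for this example: first and last data points.
C = X([1, end], :);

for iter = 1:maxIter
    % Assignment step: squared Euclidean distance of every point to every center.
    D = zeros(size(X, 1), K);
    for k = 1:K
        D(:, k) = sum((X - C(k, :)).^2, 2);
    end
    [~, idx] = min(D, [], 2);

    % Update step: move each center to the mean of its assigned points.
    Cnew = C;
    for k = 1:K
        if any(idx == k)
            Cnew(k, :) = mean(X(idx == k, :), 1);
        end
    end

    % Relative shift of each center; stop when every center has settled.
    shift = sqrt(sum((Cnew - C).^2, 2)) ./ max(sqrt(sum(C.^2, 2)), eps);
    C = Cnew;
    if max(shift) < tol
        break;
    end
end
fprintf('Stopped after %d iteration(s)\n', iter);
```

A smaller tol forces more iterations before stopping, which matters mostly when clusters are close together.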

  1. To sort your multi-channel TDT block data

You can specify channels to sort with the second parameter of mysort. Usually channels is a vector containing the channel numbers to be sorted one by one. If it is left empty, all channels will be sorted.

% e.g. there are 32 channels of waves in your data
% sort specified channels only with raw wave
channels = [1, 2, 14, 20];
sortResult = mysort(data, channels, "reselect");

% sort all channels with raw wave
sortResult = mysort(data, [], "reselect");

% sort all channels with original spike waveforms
sortResult = mysort(data, [], "origin-reshape");
  1. If you consider some clusters redundant, you can specify K like this:
% with user-specified K
sortResult = mysort(data, channels, "reselect", K);

% default: use gap statistic to find an optimum K
sortResult = mysort(data, channels, "reselect", "gap");

% use elbow method to find an optimum K
sortResult = mysort(data, channels, "reselect", "elbow");

% use gap statistic but also return results of elbow method
sortResult = mysort(data, channels, "reselect", "both");

% preview 3-D PCA data and input a K
sortResult = mysort(data, channels, "reselect", "preview");
  1. For more detailed settings, pass your own sortOpts as the last input argument of mysort. See defaultConfig.m in the config folder for more information about sortOpts.
sortResult = mysort(..., sortOpts);
  1. To view the results, use:
% to select an optimum K
plotSSEorGap(sortResult);

% view clusters in 3-D PCA space. You can also pass a 2-element vector as the second parameter
% to show clusters in 2-D PCA space (default: [1 2]).
plotPCA(sortResult, [1, 2, 3]);

% view waves and templates of different clusters
plotWave(sortResult);

% spike amplitude distribution histogram
plotSpikeAmp(sortResult);

% histogram of normalized SSE of each template on each cluster
plotNormalizedSSE(sortResult);
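The "elbow" option and plotSSEorGap both revolve around how the clustering error changes with K. The sketch below shows one common way to build such a curve and pick an elbow. It relies on the kmeans function from the Statistics and Machine Learning Toolbox, uses synthetic data, and picks the elbow by the largest second difference of the SSE curve, which is an assumption rather than the rule used by mysort.

```matlab
rng(5);
% Three synthetic clusters in 2-D, placed at the corners of a triangle.
centers = [0, 0; 10, 0; 5, 8.66];
X = [randn(100, 2) + centers(1, :);
     randn(100, 2) + centers(2, :);
     randn(100, 2) + centers(3, :)];

KList = 1:6;
SSE = zeros(size(KList));
for i = 1:numel(KList)
    % sumd holds the within-cluster sums of squared distances to each centroid.
    [~, ~, sumd] = kmeans(X, KList(i), 'Replicates', 5);
    SSE(i) = sum(sumd);
end

% Elbow heuristic (an assumption): largest second difference of the SSE curve.
d2 = diff(SSE, 2);          % d2(j) corresponds to KList(j + 1)
[~, j] = max(d2);
bestK = KList(j + 1);
fprintf('Elbow at K = %d\n', bestK);
```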

2.2 Template Matching

Template matching is based on the sum of squared errors (SSE) between normalized spike PCA data and template PCA data. Given a critical value (cv) at prominence level p, waveforms with SSE > cv are classified as noise.

In some cases, you have several recordings made under different protocols from the same cell. The spike waveforms are usually the same across files, so you can sort one file using mysort and apply templateMatching to the other files.

To sort a long recording, you can also sort a small part of it first and apply templateMatching to the rest, assuming the recorded cells do not change over the recording.

% 1. Sort data0 with mysort
sortResult0 = mysort(data0);

% 2. Match templates of data0 in data1
sortResult1 = templateMatching(data1, sortResult0);

2.3 Sorting other data structures

batchSorting is for waves or waveforms from any recording platform. For multi-channel data, it sorts the channels one by one in a loop, treating each channel as independent.

% Specify your own sorting options
run('defaultConfig.m');
sortOpts = defaultSortOpts;
sortOpts.th = 1e-5; % same unit as your wave data; only used when sorting raw waves
sortOpts.fs = 24.43e3; % Hz; only required when using batchSorting
% For other parameters, see BATCHSORTING

% 1. Use raw wave data
% waves is an m*n matrix, with channels along row and sampling points along column
% channels is an m*1 column vector, which specifies the channel number of each wave sample
result = batchSorting(waves, channels, sortOpts, "raw_wave");

% Or
% 2. Use extracted waveforms
% waveforms is an m*n matrix, with channels along row and waveform points along column
% channels is an m*1 column vector, which specifies the channel number of each waveform
result = batchSorting(waveforms, channels, sortOpts, "spike_wave");
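The per-channel loop described in this section can be illustrated as follows. This is only a sketch of the idea with made-up data; it does not call batchSorting and simply shows how rows can be grouped by channel number before each channel is sorted on its own.

```matlab
% Hypothetical input: 6 waveforms of 30 points recorded on channels 1, 2 and 5.
waveforms = rand(6, 30);
channels = [1; 1; 2; 2; 2; 5];

chList = unique(channels);              % channel numbers present, sorted
nPerChannel = zeros(numel(chList), 1);
for cIndex = 1:numel(chList)
    % Rows belonging to the current channel; each channel is handled independently.
    chWaveforms = waveforms(channels == chList(cIndex), :);
    nPerChannel(cIndex) = size(chWaveforms, 1);
    fprintf('Channel %d: %d waveforms to sort\n', chList(cIndex), nPerChannel(cIndex));
end
```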

2.4 Re-cluster

recluster is for merging and splitting clusters in a 2-D PCA view. Draw a polygon to select the spikes whose cluster index you want to change. After selecting, right-click the PCA view and click confirm to continue. Alternatively, you can preview the waveforms, PCA, spike amplitude and SSE of the selected region, and redo the selection if you are not satisfied.

result = mysort(data);
v = validateInput(["non-negative", "integer"], "Please input a cluster number for reclustering: ");
% 1 - Exclude noise from selected points
selectedIdx = recluster(result, [1, 2]);
result.clusterIdx(selectedIdx & ~logical(result.noiseClusterIdx)) = v;
% 2 - Multi-dimension selection
selectedIdx1 = recluster(result, [1, 2]);
selectedIdx2 = recluster(result, [2, 3]);
result.clusterIdx(selectedIdx1 & selectedIdx2 & ~logical(result.noiseClusterIdx)) = v;
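The polygon selection itself can be sketched without the GUI using MATLAB's inpolygon. The example below uses synthetic 2-D PCA scores and fixed polygon vertices in place of mouse clicks; it illustrates the idea and is not the code of recluster.

```matlab
rng(3);
% Hypothetical 2-D PCA scores: two clouds, both currently labelled cluster 1.
pcaData = [randn(50, 2); randn(50, 2) + [10, 0]];
clusterIdx = ones(100, 1);

% Polygon vertices around the right-hand cloud (would come from mouse clicks).
xv = [5, 15, 15, 5];
yv = [-5, -5, 5, 5];

% Logical index of the points inside or on the edge of the polygon.
selectedIdx = inpolygon(pcaData(:, 1), pcaData(:, 2), xv, yv);
clusterIdx(selectedIdx) = 2;    % reassign the selected spikes to cluster 2
fprintf('%d spikes moved to cluster 2\n', sum(selectedIdx));
```

Combining two such logical vectors with &, as in the multi-dimension example above, keeps only the spikes that fall inside both polygons.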

3. Algorithm

3.1 PCA and K-means

See docs for detailed information about the PCA and K-means algorithms.

3.2 Template matching

Template matching is based on the sum of squared errors (SSE) between normalized spike PCA data and template PCA data. Given a critical value (cv) at prominence level p, waveforms with SSE > cv are classified as noise. The procedure has five steps.

First, extract spikes and their waveforms from the raw data using the sorting options stored in sortResult0, the sorting result returned by mysort.

Second, apply PCA to [Waveforms; templates], where Waveforms is the waveform data to sort and templates holds the template waveforms from sortResult0.

Third, normalize the PCA result column by column (along PCs).

Fourth, compute the SSE of each spike with respect to each cluster.

Fifth, find the smallest-SSE cluster index for each spike.

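% SSE definition
% pcaData_norm is the normalized PCA data (one spike per row).
% C_norm holds the normalized cluster centers in PCA space (one center per row).
SSE_norm = zeros(size(pcaData_norm, 1), K); % preallocate: spikes x clusters
for kIndex = 1:K
    SSE_norm(:, kIndex) = sum((pcaData_norm - C_norm(kIndex, :)).^2, 2);
end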
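Steps two to five can be strung together in a short sketch on synthetic data. Several choices here are assumptions made for illustration only: PCA is done with an SVD, two PCs are kept, normalization is a per-column z-score, and cv is an arbitrary value rather than one derived from a prominence level p. The templateMatching function may differ in all of these.

```matlab
rng(4);
nPoints = 30;
t = linspace(0, 1, nPoints);
% Three hypothetical template waveforms (K = 3).
templates = [sin(pi * t); -0.5 * sin(2 * pi * t); 0.8 * sin(3 * pi * t)];
K = size(templates, 1);

% Waveforms to sort: 40 noisy copies of each template.
trueIdx = [ones(40, 1); 2 * ones(40, 1); 3 * ones(40, 1)];
Waveforms = templates(trueIdx, :) + 0.05 * randn(numel(trueIdx), nPoints);

% Step 2: PCA (via SVD) on the stacked matrix [Waveforms; templates].
allData = [Waveforms; templates];
centered = allData - mean(allData, 1);
[~, ~, V] = svd(centered, 'econ');
nPC = 2;                                % number of PCs kept (assumed)
score = centered * V(:, 1:nPC);

% Step 3: normalize column by column (z-score here, as an assumption).
scoreNorm = (score - mean(score, 1)) ./ std(score, 0, 1);
pcaData_norm = scoreNorm(1:end - K, :);
C_norm = scoreNorm(end - K + 1:end, :);

% Step 4: SSE of each spike with respect to each template.
SSE_norm = zeros(size(pcaData_norm, 1), K);
for kIndex = 1:K
    SSE_norm(:, kIndex) = sum((pcaData_norm - C_norm(kIndex, :)).^2, 2);
end

% Step 5: smallest-SSE template per spike; spikes with SSE > cv become noise.
[minSSE, clusterIdx] = min(SSE_norm, [], 2);
cv = 1;                                 % hypothetical critical value
clusterIdx(minSSE > cv) = 0;            % 0 marks noise in this sketch
fprintf('%d of %d spikes labelled as noise\n', sum(clusterIdx == 0), numel(clusterIdx));
```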
