Decoding toolbox to analyze MEG/EEG evoked responses (beta version)

Matlab codes described in the manuscript Decoding proprioceptive and tactile stimuli from MEG signals: a feasibility study and comparison of methods.

Download toolbox from here. The main script to perform analysis is called decoding_main.m.

Note: the code is dependent on and includes two other packages: glmnet_matlab and mne-matlab-master. Please check the web pages to see how to reference them.

Spectrospatial decoding toolbox

Spectrospatial Decoding Toolbox (SpeDeBox) is a Matlab toolbox designed for the spectrospatial analysis of multichannel biomedical signals, acquired, e.g., using magnetoencephalography (MEG), electroencephalography (EEG) or electromyography (EMG). The version 2.0 of the toolbox has been published in September 2015. The methods implemented in toolbox are described in the following publications:

Kauppi, J-P., Hahne, J., Müller, K-R., and Hyvärinen, A. Three-way analysis of spectrospatial electromyography data: classification and interpretation, PLOS ONE 10(6), e0127231, 2015. Available here

Kauppi, J-P., Parkkonen, L., Hari, R., and Hyvärinen, A. Decoding magnetoencephalographic rhythmic activity using spectrospatial information, NeuroImage 83:921-936, 2013. Available here

Download SpeDeBox v.2.0 from here

Contents

Overview

SpeDeBox combines unsupervised and supervised learning methods to gain insights into interesting spectrospatial structures in data. SpeDeBox implements three classifiers:
The implementation of the classifiers rely on the GLMNET and L1General toolboxes when estimating sparse logistic regression models. The method also uses Fourier-ICA as a preprocessing tool. All these packages are provided together with the SpeDeBox.

The toolbox performs classification on the basis of epochs (time-windows) extracted from multichannel data. Thus, to carry out the analysis, category labels for each time-point must be provided together with an experimental data. For instance, different stimuli applied during experiments can serve as distinct categories. The analysis provides both the accuracy of the classification and the visualization of relevant spectrospatial patterns. Binary classification problem is a special case, where one set of discriminative spectrospatial patterns is provided (category 1 versus category 2). In a multicategory case, each category is characterized by its own set of discriminative spectrospatial patterns.

Input data format

To perform data analysis with the SpeDeBox, two variables must be available in the Matlab workspace: 1) the variable containing the actual multichannel data, and 2) the variable containing category labels of the data. Category labels must be provided as discrete numeric values values: {1,2,...,K}, where K is the total number of categories. Both data and class label variables can be stored e.g. as a mat-file with a specific variable name (such as Data and Labels), and then loaded into the Matlab's workspace using the load-command. Example analysis 2 demonstrates how to load the two variables, Data and Labels into the Matlab. It is assumed that the types of the variables are either double or single.

There are three alternative ways to provide multichannel data for the SpeDeBox:
  1. Raw format, where an input data set is a C by M real-valued matrix, where C is the number of sensors and M is the length of the time-series (i.e., a total number of time-points). In this case, category labels must be provided as an M-dimensional vector, i.e., each time-point must have a category label.  Example analysis 2 below uses this data representation.
  2. Epoch format, where an input data set is a C by N by T real-valued tensor, where C is the number of sensors, N is the is the number of epochs and T is the length of the epoch (i.e., time-window). In this case, category labels must be provided as an N-dimensional vector, i.e., each epoch must have a category label available. Example analysis 1 below uses this data representation.
  3. Spectral format, where an input data sets is a C by N by F complex-valued tensor, where C is the number of sensors, N is the is the number of epochs and F is the number of frequency bins in the epoch. In this case, category labels must be provided as an N-dimensional vector, i.e., each epoch must have a category label available. This representation is like Epoch format, but each epoch has been transformed into the Fourier-domain. 
Note: The actual classifiers implemented in the SpeDeBox are designed to process data sets in the Spectral format. Thus, if a user provides data sets for the toolbox in the Epoch format, the toolbox transforms them automatically into the Spectral format. In addition, if the data sets are provided for the SpeDeBox in the Raw format, the toolbox transforms the data sets first into the Epoch format and subsequently into the Spectral format. This preprocessing takes place in the function preprocess_data. The function automatically checks if the data sets are provided as real-valued matrices (Raw format), real-valued tensors (Epoch format), or complex-valued tensors (Spectral format), and processes the data accordingly.

Note: For the final analysis, the input data set is further split into a training, validation, and test data set. See examples below how this can be performed.

Citations

Please cite the following papers if you use the toolbox in your research.

Spectral LDA classifier:

Kauppi, J-P., Parkkonen, L., Hari, R., and Hyvärinen, A. Decoding magnetoencephalographic rhythmic activity using spectrospatial information, NeuroImage 83:921-936, 2013.

Friedman, J, Hastie, T, and Tibshirani, R. Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software 33(1),:1-22, 2010.

A. Hyvärinen, P. Ramkumar, L. Parkkonen and R. Hari, Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis, NeuroImage 49(1):257-271, 2010

Sparse bilinear classifier:

Kauppi, J-P., Hahne, J., Müller, K-R., and Hyvärinen, A. Three-way analysis of spectrospatial electromyography data: classification and interpretation, PLOS ONE 10(6), e0127231, 2015

Schmidt, M, Fung, G, Rosales, R. Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches, European Conference on Machine Learning (ECML), 2007

A. Hyvärinen, P. Ramkumar, L. Parkkonen and R. Hari, Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis, NeuroImage 49(1):257-271, 2010

Three-way DA classifier:

Kauppi, J-P., Hahne, J., Müller, K-R., and Hyvärinen, A. Three-way analysis of spectrospatial electromyography data: classification and interpretation, PLOS ONE 10(6), e0127231, 2015

Friedman, J, Hastie, T, and Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software 33(1),:1-22, 2010

A. Hyvärinen, P. Ramkumar, L. Parkkonen and R. Hari, Independent component analysis of short-time Fourier transforms for spontaneous EEG/MEG analysis, NeuroImage 49(1):257-271, 2010

If you use the example EMG data set provided in the analysis example 2 below, please cite:

Hahne, J. M., Graimann, B., & Müller, K. R. Spatial filtering for robust myoelectric control, IEEE Transactions on Biomedical Engineering 59(5): 1436-1443, 2012.

Examples

Here, two example Matlab scripts are presented to illustrate the analysis with the SpeDeBox. The first script is an example analysis with two-category synthetic data, without using Fourier ICA or visualization functions. The second script is an example analysis with the real six-category EMG hand-movement data, including the Fourier ICA and the visualization of the most discriminative spectrospatial patterns. You can directly copy and paste these scripts to the Matlab's editor or command line. Alternatively, these same scripts can be found from the m-files analysis_example1 and analysis_example2 of the SpeDeBox.
            This example illustrates the synthetic data analysis with the SpeDeBox through the following steps:
	
% ***********************************************************************
% Set SpeDeBox to Matlab's search path
% ***********************************************************************

% Before performing the analysis, make sure that the toolbox
% and all its subfolders are set to Matlab's path. It is also highly recommended
% to set the directory of the toolbox as the Matlab's current directory.
% If these steps have not been already carried out, below is a script
% to perform this. Verify that the path is correct in case you
% have several copies of the toolbox available in your file system.

z = which('analysis_example1.m'); % Find full filename to this m-file
       f = fileparts(z); % separate filename and pathname
disp('Verify that the toolbox is in the following path:')
disp(f) % visualize pathname
        addpath(genpath(f)) % add path and all its subfolders to Matlab's path
cd(f) % set toolbox directory as current Matlab's directory
 

% ***********************************************************************
% Generate synthetic data
% ***********************************************************************

T = 50; % total number of time-points in each epoch
C = 64; % total number of sensors
N = 500; % total number of epochs
K = 2; % total number of categories

% Generate category labels:
labels = 1 + floor(K*rand(1,N));

% Generate random data in Epoch format:
data = rand(C,N,T);

% add class-specific information to randomly selected channels:

for m = 1:K
rnCol = randperm(size(data,1));
data(rnCol(1:4),labels == m,:) = 0.1*m + data(rnCol(1:4),labels == m,:);
end


% ***********************************************************************

% Set preprocessing parameters
% ***********************************************************************


% Select preprocessing options:

% Set parameters of the STFT:
params.general.samplfreq = 1000; % sampling frequency in Hz
params.general.minfreq = 1; % minimum frequency of interest in Hz
params.general.maxfreq = 500; % maximum frequency of interest in Hz

params.general.do_ica = 0; % do not run Fourier-ICA


% ***********************************************************************
% Set classifier parameters
% ***********************************************************************

% set regularization sequences for the Sparse bilinear classifier:
params.sparse_bilinear.lambdas_comp = [10 20 30]; % regularization sequence of spatial coefficients
params.sparse_bilinear.lambdas_freq = [0]; % regularization sequence of frequency coefficients
% Note: Forcing higher values for lambdas_comp lead to more easily interpretable models
% with less number of relevant components.


% set rank (number of coefficient vectors) of the classifiers for the Sparse bilinear
% and the Three-way DA classifiers:

params.sparse_bilinear.modelRank = 1;
params.three_way_DA.modelRank = 3;


% ***********************************************************************
% Split data into training, validation and test data sets

% ***********************************************************************

% Split data set such that:
% the first 1/3 of the data is used for classifier training
% the second 1/3 of the data is used for validation of the classifier hyperparameters
% the last 1/3 of the data is used for evaluating the classifier performance for unseen data

% Note: This procedure for classifier testing is not comprehensive and could be improved
% e.g. by using a 10-fold cross-validation.


% number of samples used for training, validation and testing:

nr_samples_train = round(N/3);
nr_samples_validation = round(N/3);
nr_samples_test = round(N/3);

% train, validation and test data set indices:
samples_train = 1:nr_samples_train;
samples_validation = (nr_samples_train+1):(nr_samples_train+nr_samples_validation);
samples_test = (nr_samples_train+nr_samples_validation+1):N;

% obtain train, validation and test data sets:
data_train = data(:,samples_train,:); % training data set (first 1/3 of the data)
labels_train = labels(samples_train); % training labels

data_validation = data(:,samples_validation,:); % validation data set (second 1/3 of the data)
labels_validation = labels(samples_validation); % validation labels

data_test = data(:,samples_test,:); % test data set (last 1/3 of the data)
labels_test = labels(samples_test); % test labels


% ***********************************************************************
% Preprocess data using STFT
% ***********************************************************************


% The following function performs the selected preprocessing options (STFT, ICA).
% Outputs are Fourier-transformed complex-valued epochs, class labels of the epoch,
% and the estimated complex-valued mixing matrix (A).
[data_train,data_validation,data_test,labels_train,labels_validation,...
labels_test,A] = preprocess_data(data_train,data_validation,...
data_test,labels_train,labels_validation,labels_test,params);

% Take the absolute value of the Fourier coefficients as the classifier inputs:
data_train = abs(data_train);
data_validation = abs(data_validation);
data_test = abs(data_test);


% ***********************************************************************
% Classifier training and testing
% ***********************************************************************

% Train and validate Sparse bilinear classifier:

[Classifier_final1,resu1] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'sparse_bilinear',params);
% Train and validate Spectral LDA classifier:
[Classifier_final2,resu2] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'spectral_LDA',params);
% Train and validate Three-way DA classifier:
[Classifier_final3,resu3] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'three_way_DA',params);


% ***********************************************************************
% Display classification accuracies

% ***********************************************************************

% Classification performance is saved in the structs:
% resu1 Sparse bilinear
% resu2 Spectral LDA
% resu3 Three-way DA
% Plot classification accuracy of the test data with the accuracy of two decimals:


disp(' ')
disp('Classification accuracy for the test data set:')
disp(['Sparse bilinear: ' num2str(round(resu1.acc*1e4)/1e2) '%'])
disp(['Spectral LDA : ' num2str(round(resu2.acc*1e4)/1e2) '%'])
disp(['Three-way DA : ' num2str(round(resu3.acc*1e4)/1e2) '%'])
disp(' ')

% ***********************************************************************

          This example illustrates the real EMG data analysis with SpeDeBox.
           
          To begin with, load the real EMG data set from here and save the file under the data-subfolder of the SpeDeBox.
          After this, follow the steps listed below to carry out the analysis using the Matlab:


% ***********************************************************************
% Set SpeDeBox to Matlab's search path
% ***********************************************************************

% Before performing the analysis, make sure that the toolbox
% and all its subfolders are set to Matlab's path. It is also highly recommended
% to set the directory of the toolbox as the Matlab's current directory.
% If these steps have not been already carried out, below is a script
% to perform this. Verify that the path is correct in case you
% have several copies of the toolbox available in your file system.

z = which('analysis_example2.m'); % Find full filename to this m-file
       f = fileparts(z); % separate filename and pathname
disp('Verify that the toolbox is in the following path:')
disp(f) % visualize pathname
        addpath(genpath(f)) % add path and all its subfolders to Matlab's path
cd(f) % set toolbox directory as current Matlab's directory 
	
% ***********************************************************************
        % Load real EMG data set
% ***********************************************************************


% Give the mat-file name containg C by N data matrix and class label vector of length(N):
filename = 'EMG_preproc_hand_subj1.mat';

% Note! Before performing this part, copy the real EMG data through the link
% above to your computer.
 It is assumed that the mat-file is saved under the
% SpeDeBox -folder or one of its subfolders (e.g. subfolder "data") which are
% under the Matlab's search path. If the file is not under the Matlab's search
% path, you must provide the string containing a full pathname to the
% file:
filename = '/.../.../EMG_preproc_hand_subj1.mat';

load(filename) % load the EMG data to Matlab's workspace.
%
Data is a C by N matrix containing the actual data.
% Labels contains class labels {1,2,3,4,5,6} for each time-point.


% For simplicity of visualization in this demonstration, we concentrate on
% 2-category classification and remove rest of the data:

Data(:,Labels>=3)=[];
Labels(Labels>=3)=[];

% initialize random number generator to make re-generation of the results possible:
rng(3,'twister')


% ***********************************************************************
% Set preprocessing parameters
% ***********************************************************************


% Set parameters of the STFT:
params.general.samplfreq = 2500; % sampling frequency in Hz
params.general.minfreq = 20; % minimum frequency of interest in Hz
params.general.maxfreq = 60; % maximum frequency of interest in Hz


% Because the data sets are here provided in the "Raw format", epoch length and
% epoch overlap must be given:
params.general.windowlength_sec = 0.2; % time-window length in seconds
params.general.overlapfactor = 1; % step size of the time-window
% overlapfactor = 1 means step size of windowlength
% overlapfactor = 2 means step size of windowlength/2
% overlapfactor = 3 means step size of windowlength/3, and so on.

% We also apply Fourier-ICA:
params.general.do_ica = true; % run Fourier-ICA

% Set parameters of the ICA:

params.ica.pcadim = 30; % PCA dimension (corresponds here to the number of independent components)
params.ica.complexmixing = true; % complex-valued (true) or real-valued (false) mixing matrix
  params.ica.maxiter = 2000; % maximum number of iterations


% ***********************************************************************
% Set classifier parameters
% ***********************************************************************

% set regularization sequences for the Sparse bilinear classifier:
params.sparse_bilinear.lambdas_comp = [10 20 30]; % regularization sequence of spatial coefficients
params.sparse_bilinear.lambdas_freq = [0]; % regularization sequence of frequency coefficients
% Note: Forcing higher values for lambdas_comp lead to more easily interpretable models
% with less number of relevant independent components.


% set rank (number of coefficient vectors) of the classifiers for the Sparse bilinear and the Three-way DA classifiers:
params.sparse_bilinear.modelRank = 1;
params.three_way_DA.modelRank = 3;


% ***********************************************************************
% Split data into training, validation and test data sets
% ***********************************************************************

% Split data set such that:
% the first 1/3 of the data is used for classifier training
% the second 1/3 of the data is used for validation of the classifier hyperparameters
% the last 1/3 of the data is used for evaluating the classifier performance for unseen data

% Note: This procedure for classifier testing is not comprehensive and could be improved
% e.g. by using a 10-fold cross-validation.


% number of samples used for training, validation and testing:
nr_samples_train = round(size(Data,2)/3);
nr_samples_validation = round(size(Data,2)/3);
nr_samples_test = round(size(Data,2)/3);

% train, validation and test data set indices:
samples_train = 1:nr_samples_train;
samples_validation = (nr_samples_train+1):(nr_samples_train+nr_samples_validation);
samples_test = (nr_samples_train+nr_samples_validation+1):size(Data,2);

% obtain train, validation and test data sets:
data_train = Data(:,samples_train); % training data set (first 1/3 of the data)
labels_train = Labels(samples_train);

data_validation = Data(:,samples_validation); % validation data set (second 1/3 of the data)
labels_validation = Labels(samples_validation);

data_test = Data(:,samples_test); % test data set (third 1/3 of the data)
labels_test = Labels(samples_test);


% ***********************************************************************
% Preprocess data using Fourier-ICA
% ***********************************************************************


% The following function performs the selected preprocessing options (STFT, ICA).
% Outputs are Fourier-transformed complex-valued epochs, class labels of the epoch,
% and the estimated complex-valued mixing matrix (A).
[data_train,data_validation,data_test,labels_train,labels_validation,...
labels_test,A] = preprocess_data(data_train,data_validation,...
data_test,labels_train,labels_validation,labels_test,params);

% Take the absolute value of the Fourier coefficients as the classifier inputs:
data_train = abs(data_train);
data_validation = abs(data_validation);
data_test = abs(data_test);


% ***********************************************************************
% Classifier training and testing
% ***********************************************************************

% Train and validate Sparse bilinear classifier:

[Classifier_final1,resu1] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'sparse_bilinear',params);
% Train and validate Spectral LDA classifier:
[Classifier_final2,resu2] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'spectral_LDA',params);
% Train and validate Three-way DA classifier:
[Classifier_final3,resu3] = train_and_test_classifier(data_train,data_validation,data_test,labels_train,labels_validation,labels_test,'three_way_DA',params);


% ***********************************************************************
% Visualize spectrospatial patterns of the final classifiers
% ***********************************************************************

% Here, we visualize spectrospatial characteristics captured by Spectral LDA and Sparse and bilinear classifiers.


% Generate a position map of the EMG 4*12 channel electrode array:
resolution = 0.008;
nrRows = 4;
nrColumns = 12;
p1 = repmat((0:resolution:(resolution*(nrColumns-1)))',nrRows,1);
p2 = repmat((0:resolution:(resolution*(nrRows-1)))',1,nrColumns)';
pos_map = [p1 p2(:)];

% Pick one of the two electrode arrays for visualization:
secondElectrode = 49:96; % visualize second electrode
  spatialPatterns = A(secondElectrode,:);
% Or visualize first electrode:
  % firstElectrode = 1:48;
% spatialPatterns = A(firstElectrode,:);

phasemap_th = 0.5; % magnitude threshold used in the visualization of the phase pattern
max_nr_components = inf; % The maximum number of spatial patterns visualized
% Set this value to inf to plot the entire information corresponding to all nonzero coefficients.
Note: the maximum number of components can be set lower value to allow easier visualization for complex models.

% visualize spectral LDA classifier:
relevantCoefficients2 = visualize_patterns_spectral_LDA(Classifier_final2,spatialPatterns,pos_map,phasemap_th,max_nr_components,data_train,labels_train,params);
% Visualize sparse bilinear classifier:
relevantCoefficients1 = visualize_patterns_sparse_bilinear(Classifier_final1,spatialPatterns,params.sparse_bilinear,pos_map,phasemap_th,max_nr_components,params);

% Both classifiers provide information about class-discriminative independent components:
%
% Spectral LDA classifier visualization includes:
% 1) Spatial maps of the discriminative independent components, including both magnitude and phase maps
% 2) Spectral filter of these components, denoting class-discriminative frequencies estimated by LDA
% 3) Actual spectra of these components, averaged across epochs
%
% Sparse and bilinear visualization includes:
% 1) Spatial maps of the discriminative independent components, including both magnitude and phase maps
% 2) Category-specific spectral filter(s) denoting class-discriminative frequencies
%
% See our papers for more details.



% ***********************************************************************
% Display classification accuracies
% ***********************************************************************

% Classification performance is saved in the structs:
% resu1 Sparse bilinear
% resu2 Spectral LDA
% resu3 Three-way DA
% Plot classification accuracy of the test data with the accuracy of two decimals:


disp(' ')
disp('Classification accuracy for the test data set:')
disp(['Sparse bilinear: ' num2str(round(resu1.acc*1e4)/1e2) '%'])
disp(['Spectral LDA : ' num2str(round(resu2.acc*1e4)/1e2) '%'])
disp(['Three-way DA : ' num2str(round(resu3.acc*1e4)/1e2) '%'])
disp(' ')

% ***********************************************************************

Contact

If you find a bug or need more help with the use of the toolbox, please contact either Jukka-Pekka Kauppi jukka-pekka.kauppi@jyu.fi or Prof. Aapo Hyvärinen aapo.hyvarinen@helsinki.fi

License

Copyright 2015 Jukka-Pekka Kauppi

Spectrospatial decoding toolbox (SpeDeBox) is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

SpeDeBox is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

History

The first version of the toolbox was published in 2013, consisting of Spectral LDA classifier with a synthetic data analysis example. Version 2.0 was published in September  2015, including additionally Sparse bilinear and Three-way DA classifiers as well as a real data example. Most source code was re-written because the original code contained a high number of user parameters and it was difficult to set all these parameters correctly. The current version requires less user parameters but slightly more manual customization (such as splitting of training, validation and test data sets), having the advantage that the code is easier to follow and adapt for different needs.