Introduction

Machine learning can be divided into two broad categories: supervised learning and unsupervised learning (see Fig1).

Unsupervised Learning

  • In unsupervised learning, the input consists only of the independent variables (no labels are provided).
  • Unsupervised methods (not further considered here) attempt to find structure in the data without using knowledge of the experimental tasks (e.g. ICA).

Supervised Learning

  • In supervised learning, the input contains both the independent variables and the corresponding response (target) variables. Fig2

    Learning phase.

  • The training set is used in the “learning” or “classifier training” phase. In this phase, the training data are fed to the classifier to estimate a function over the feature space that maps each brain activation pattern (input) to a class label (target); this also yields a decision boundary that separates the different classes. As an example, consider a study presenting different images of object categories, such as “houses” and “faces”. All brain activation patterns in the training set evoked when subjects looked at houses would be labeled as “house class”, those patterns evoked when subjects looked at faces would be labeled as “face class” and so on.

    Testing phase

  • After training, the learning machine should be able to discriminate not only the activity patterns it was trained on, but also new activity patterns from the same classes (test data). To determine the performance of the trained classifier, it has to be tested on new samples; the data used for this purpose are called the test set. A standard assumption of classification learning is that the training and test sets are independently and identically distributed, so a classifier that has truly captured the relationship between features and classes should accurately predict new patterns it has never seen but that belong to the same classes (i.e. the test samples).
  • Measures.
    1. A common criterion for evaluating a classifier is its prediction accuracy, i.e. the proportion of all test samples that the classifier predicts correctly. A classifier that has learned all input-target pairs perfectly but discriminates new stimuli only at chance level is useless. Poor generalization performance indicates overfitting, i.e. the classifier might have learned the training exemplars “too well”. Some neural network classifiers suffer from this problem. The support vector machine (SVM), on the other hand, is built with generalization in mind: by maximizing the margin it reduces the risk of overfitting the data.
    2. Also note that accuracy (ACC) alone is not accepted in clinical settings. At minimum, AUC, sensitivity, and specificity should be reported as well, together with 95% confidence intervals. Adding reproducibility analyses, such as test-retest or other reproducibility checks, makes the evaluation much more complete (see the sketch below).
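
As an illustration of these measures, here is a minimal Python sketch (scikit-learn plus a simple bootstrap over the test set; the inputs y_true, y_score and the 0.5 threshold are hypothetical placeholders, not from the original text):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

def summarize_performance(y_true, y_score, threshold=0.5, n_boot=2000, seed=0):
    """Report accuracy, AUC, sensitivity, specificity with bootstrap 95% CIs.

    y_true: binary labels (0/1); y_score: classifier scores or probabilities.
    """
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)

    def metrics(t, s):
        pred = (s >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(t, pred, labels=[0, 1]).ravel()
        return {"ACC": accuracy_score(t, pred),
                "AUC": roc_auc_score(t, s),
                "sensitivity": tp / (tp + fn),
                "specificity": tn / (tn + fp)}

    point = metrics(y_true, y_score)
    boots = {k: [] for k in point}
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample test set with replacement
        if len(np.unique(y_true[idx])) < 2:              # AUC needs both classes present
            continue
        for k, v in metrics(y_true[idx], y_score[idx]).items():
            boots[k].append(v)

    for k, v in point.items():
        lo, hi = np.percentile(boots[k], [2.5, 97.5])
        print(f"{k}: {v:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```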

Cross-Validation

Fig3

  • In supervised learning, the simplest way to validate a model is to split the whole dataset into a training set and a test set. This simple approach, however, has two drawbacks:
    1. The final model and its parameters depend heavily on how the data happen to be split into training and test sets.
    2. Only part of the data is used to train the model.
  • Cross-validation was proposed to address exactly these problems.
    1. Cross-validation, sometimes called rotation estimation, is a practical statistical technique that partitions the data sample into smaller subsets. The analysis is first performed on one subset, while the remaining subsets are used to confirm and validate that analysis. The initial subset is called the training set; the other subsets are called the validation set or test set.
    2. Using cross-validation avoids the problems above and gives a more reliable estimate of model performance.
  • When using k-fold cross-validation, one has to decide how to choose k so that the model is most effective for the classification problem at hand, i.e. how to find the best trade-off between bias and variance (see the sketch below).
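
A minimal scikit-learn sketch of k-fold cross-validation with a linear SVM; the arrays X (samples x voxels) and y (condition labels) are simulated placeholders:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))   # 60 samples x 500 voxels (simulated patterns)
y = np.repeat([0, 1], 30)            # two conditions, 30 samples each
X[y == 1] += 0.3                     # small mean difference between the conditions

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

# Stratified k-fold keeps the class proportions equal in every fold; the choice
# of k trades off bias (small k) against variance (large k) of the estimate.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(f"fold accuracies: {np.round(scores, 2)}, mean = {scores.mean():.2f}")
```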

Application in affective neuroscience

EEG and fMRI measure, directly or indirectly, neural responses of the central nervous system; their signals contain both spatial and temporal information and are essentially four-dimensional. In contrast, measures such as skin resistance (SR), respiration, body temperature, and heart rate reflect responses of the autonomic nervous system, and their signals contain only temporal information.

Sources of Physiological Signals

from http://biomedikal.in/2011/05/important-physiological-signals-in-the-body/

fMRI Multi-Voxel Pattern Analysis (MVPA)

  • Activation-based analysis versus pattern-information analysis (see Table1): two differences must be understood:
    1. Statistical principle: activation-based analysis is univariate, whereas MVPA is multivariate.
    2. Unit of analysis: the former analyzes the fluctuation of the BOLD response in single voxels, whereas the latter analyzes the spatial pattern across multiple voxels.
  • Traditional neuroimaging analysis operates mainly at the millimeter-to-centimeter scale, using cognitive subtraction to identify the brain regions associated with different mental processes (activation-based analysis). A growing number of studies show, however, that the fMRI signal contains more fine-grained information at smaller scales. Pattern classifiers can exploit this information by mapping different brain activation patterns onto the corresponding mental states. For example, MVPA revealed sensitivity to orientation in human visual area V1, orientation information that cannot be recovered with traditional fMRI analysis.
    Table1
  • Methodologically, MVPA (multi-voxel pattern analysis) relies mainly on pattern classification techniques from machine learning, and its main purpose (or task) is to discriminate the neural response patterns of different mental activities (classifying data). The core principle is to train a classifier (or learning machine) on the spatial patterns formed by multiple voxel signals under different cognitive states (classes), and then to test the classifier's performance on independent experimental data (testing data). Concretely, if the experimental conditions can be discriminated from brain activity patterns with above-chance accuracy, the activity patterns carry information about the experimental conditions. This approach has been called brain reading or decoding (Cox and Savoy, 2003).
  • In neuroimaging studies, the classifier combines different features (each voxel of the fMRI data can serve as a feature) into a feature space (e.g. the multidimensional space spanned by multiple voxel variables). Different values in the feature space correspond to different spatial patterns, called samples (in fMRI data, the spatial pattern formed by multiple voxel signals under a particular cognitive state is one sample). The available data are divided into a “training” and a “test” set.
  • Many classifiers can be used to link brain activity patterns to mental states, e.g. linear discriminant analysis (LDA) and support vector machines (SVM). In terms of brain coverage, MVPA can be performed within a particular region of interest (ROI) or applied to whole-brain patterns (searchlight mapping approach; Kriegeskorte, Goebel, & Bandettini, 2006). A minimal ROI-based decoding sketch follows below.
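
The following is an assumed minimal sketch of ROI-based decoding with a linear SVM and leave-one-run-out cross-validation (the data are simulated and the variable names are illustrative, not from the cited papers):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_runs, trials_per_run, n_voxels = 6, 20, 300
X = rng.standard_normal((n_runs * trials_per_run, n_voxels))  # one ROI pattern per trial
y = np.tile(np.repeat([0, 1], trials_per_run // 2), n_runs)   # condition labels ("house"/"face")
runs = np.repeat(np.arange(n_runs), trials_per_run)           # run index for each trial
X[y == 1] += 0.3                                              # simulated condition effect

# Leave-one-run-out cross-validation: train on n-1 runs, test on the held-out run,
# so that training and test data stay independent.
scores = cross_val_score(SVC(kernel="linear"), X, y, groups=runs, cv=LeaveOneGroupOut())
print(f"accuracy per held-out run: {np.round(scores, 2)} (chance = 0.5)")
```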

fMRI Representational Similarity Analysis (RSA)

  • RSA can be viewed as an extension of MVPA. MVPA requires you to specify the condition labels and then trains a classifier to see whether the conditions can be distinguished from the brain activity patterns; its dependent measure is classification accuracy (whether it exceeds chance). RSA instead directly computes the representational dissimilarity between the brain patterns of different conditions, without training a classifier; its dependent measure is the representational dissimilarity itself (the activity patterns themselves). What the two approaches share is that the unit of analysis is the multi-voxel pattern.
  • The principle of RSA is simple: first extract, within a particular ROI and for each condition, the voxel activation pattern evoked by each stimulus (see the step4 Figure); then compute, for every pair of stimuli in that condition, the spatial correlation of their response patterns (across voxels), giving a correlation matrix per condition; a simple transformation then yields the representational dissimilarity matrix (RDM, also called the representational distance matrix). An RDM is a square symmetric matrix in which each entry is a dissimilarity value (typically 1 minus the correlation) describing how dissimilar the response patterns of two experimental stimuli (trials or blocks) are. The RDMs of the different experimental conditions are then compared (a minimal sketch of the RDM computation follows after this list).
  • RSA is applicable to many experimental designs but is best suited to condition-rich experiments (Kriegeskorte et al. 2008, 2009). For example, to compare object representations in inferior temporal cortex between humans and monkeys, Kriegeskorte et al. (2008) measured the two species' brain responses to many stimuli with different methods (fMRI vs. electrophysiology) and computed, for each pair of stimuli, the correlation between the representations (humans vs. monkeys). This effectively abstracts species- and technique-specific data into data encoded as similarities between stimuli, making it possible to compare the representational structure (expressed as a correlation or similarity matrix) across species and stimuli. They found a high correlation, suggesting that human and monkey inferior temporal cortex may use similar visual coding schemes.
  • Because RSA is conceptually simpler than SVM, its results are easier to interpret, and it is more flexible for exploring relationships between different representations. Another advantage is that the correlation matrix does not require repeated presentations of each stimulus, so RSA can examine the representational properties of stimuli presented only once. On the other hand, RSA is less sensitive than the more sophisticated algorithms. Fig4
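
A minimal sketch of the RDM computation described above, using simulated patterns and defining dissimilarity as 1 minus the Pearson correlation across voxels (variable names are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_stimuli, n_voxels = 12, 300
patterns = rng.standard_normal((n_stimuli, n_voxels))   # one ROI pattern per stimulus

# RDM: 1 minus the Pearson correlation between every pair of stimulus patterns.
rdm = 1.0 - np.corrcoef(patterns)    # square, symmetric, zeros on the diagonal
print(rdm.shape, np.allclose(np.diag(rdm), 0.0))

# To compare two RDMs (e.g. two conditions, regions, or species), correlate their
# off-diagonal entries, typically with a rank correlation (Spearman).
other_rdm = 1.0 - np.corrcoef(rng.standard_normal((n_stimuli, n_voxels)))
iu = np.triu_indices(n_stimuli, k=1)
rho, p = spearmanr(rdm[iu], other_rdm[iu])
print(f"RDM comparison: Spearman rho = {rho:.2f}")
```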

    Figure Computation of a representational dissimilarity matrix. For each pair of experimental stimuli, the response patterns elicited in a brain region or model representation are compared to determine the stimuli’s representational dissimilarity. The dissimilarity between two patterns can be measured as 1 minus the correlation (0 for perfect correlation, 1 for no correlation, 2 for perfect anticorrelation). These dissimilarities for all pairs of stimuli are assembled in the representational dissimilarity matrix (RDM). Each cell of the matrix then compares the response patterns elicited by two images and the matrix is symmetric about a diagonal of zeros. To visualize the representation, we can arrange the stimuli according to response-pattern dissimilarity, such that stimuli are placed close together if they elicited similar response patterns and far apart if they elicited dissimilar response patterns. The color of each connection line here indicates whether the response-pattern difference was significant (red: p < 0.01; light gray: p ≥ 0.05).

References

Kriegeskorte N, Mur M, Bandettini P (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 4-10.

Nili H, Wingfield C, Walther A, Su L, Marslen-Wilson W, Kriegeskorte N (2014). A toolbox for representational similarity analysis. PLoS Comput Biol 10(4): e1003553.

EEG Spatiotemporal Neural Pattern Similarity

Lu, Y., et al. (2015). “Spatiotemporal neural pattern similarity supports episodic memory.” Current Biology 25(6): 780-785.

Signals from autonomic nervous system

Reason-Results

  • When a person faces a stressful event, the autonomic nervous system responds to the change in situation. For example, during excitement or tension, blood circulation accelerates markedly, seen as an increased heart rate and rising blood pressure and blood-vessel volume; breathing becomes deeper, faster, and heavier; sweating and piloerection occur; the pupils dilate; and so on. Any single emotional state may change several physiological characteristics at once.
  • These physiological characteristics are governed by the autonomic nervous system and the endocrine system and are rarely under voluntary control, which makes them comparatively genuine and objective.
  • Using the changes in these physiological signals to infer a person's level of emotional arousal, to recognize their internal affective state, and even to provide useful affective feedback and guidance, has important practical applications.
  • Commonly used physiological signals of the autonomic nervous system:
    1. Heart Rate (HR)
    2. Galvanic Skin Response (GSR)
    3. Skin Resistance (SR)
    4. Electrocardiography (ECG)
    5. Respiration (RSP)
  • see 《人体生理信号的情感计算方法》 (Affective Computing Methods for Human Physiological Signals) for more details!

Emotion recognition from text, speech, and body language

not discussed here

Where should I start MVPA?

Design considerations for fMRI studies using MVPA

  • Both event-related and block designs can be used in combination with pattern-information analysis. The choice will largely be based on similar considerations as for studies using activation-based analyses.
  • Block designs yield a higher functional contrast-to-noise ratio than event-related designs. This holds both for constant inter-stimulus-interval (ISI) event-related designs (Bandettini and Cox, 2000) and jittered rapid event-related designs (Birn et al., 2002). This implies that block designs will generally yield better estimates of the average response pattern (i.e. the centroid) than event-related designs. This is especially useful for discriminating a small number of conditions (e.g. Haxby et al., 2001).
  • However, event-related designs can be preferable for psychological reasons as they are less predictable and can reduce habituation effects. Moreover, event-related designs can accommodate a larger number of conditions (Kriegeskorte et al., 2008b). Another advantage of particular importance to information-based analysis is that they yield more independent data points than block designs and can therefore yield a better estimate of the shape of each condition’s multivariate response distribution. This can improve classification performance and, thus, increase sensitivity in detecting pattern information. On the other hand, the condition mean pattern estimates (centroids) will typically be somewhat noisier. It should also be noted that rapid event-related designs involve temporally overlapping hemodynamic responses. The effects of temporal overlap can be accounted for using the same design optimization techniques that have proven useful for activation-based studies.

Software

There are a few software packages specifically for MVPA, though many people (myself included) use quite a bit of their own code. A nice comparison of packages is towards the end of Hebart2015.

  1. PRoNTO (works with SPM)
  2. pyMVPA
  3. Princeton MVPA toolbox
  4. parts of BrainVoyager.
  5. The Decoding Toolbox (TDT), also MATLAB-based and works with SPM

Workshops

  1. Pattern Recognition in Neuroimaging (PRNI) workshop, sort-of co-located with OHBM
  2. Machine Learning and Interpretation in Neuroimaging (MLINI), NIPS workshop

Publicly-available fMRI datasets

  • The 1 January 2016 issue of NeuroImage is devoted to public neuroimaging data repositories.
  • I (Jo Etzel) have two datasets up at the Open Science Foundation. Both of these are fully preprocessed (not raw images), with R code to reproduce the papers’ MVPA results. My 2015 Cerebral Cortex paper is at doi:10.1093/cercor/bhu327, and the dataset for Etzel JA, Valchev N, Gazzola V, Keysers C (2016), “Is Brain Activity during Action Observation Modulated by the Perceived Fairness of the Actor?” (PLOS ONE), is at osf.io/dg63m.
  • OpenfMRI, a formal effort organized by Russ Poldrack, with 32 datasets as of 4 February 2015.
  • the Human Connectome Project. Enormous dataset, in terms of number of people, size of files, and assortment of modalities. I’ve written some posts about working with this data; start with this one.
  • CRCNS hosts datasets in various modalities, from fMRI to single-neuron.
  • Kendrick Kay shares multiple datasets (including fMRI) and code on his site.

Literature

  1. Pereira, F., Mitchell, T., Botvinick, M., 2009. Machine learning classifiers and fMRI: A tutorial overview. Neuroimage 45, S199-S209.
  2. Etzel, J.A., Gazzola, V., Keysers, C., 2009. An introduction to anatomical ROI-based fMRI classification analysis. Brain Research 1282, 114-125.
  3. Mur, M., Bandettini, P.A., Kriegeskorte, N., 2009. Revealing representational content with pattern-information fMRI–an introductory guide. Soc Cogn Affect Neurosci, nsn044.
  4. Haynes JD. 2015. A Primer on Pattern-Based Approaches to fMRI: Principles, Pitfalls, and Perspectives. Neuron, 87 (2), 257-70. doi: 10.1016/j.neuron.2015.05.025 PMID: 26182413
  5. Mitchell, T.M., Hutchinson, R., Niculescu, R.S., Pereira, F., Wang, X., 2004. Learning to Decode Cognitive States from Brain Images. Machine Learning 57, 145-175.
  6. Norman, K.A., Polyn, S.M., Detre, G.J., Haxby, J.V., 2006. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends In Cognitive Sciences 10, 424-430.

Algorithm

  • Because the main purpose of MVPA is to discriminate the neural signal patterns of different cognitive states (pattern classification), the procedure requires a discriminant function (variously called a classifier or an algorithm) to separate the spatial patterns formed under different cognitive states. Choosing an appropriate classifier for your data is a key step in MVPA.
  • With two voxels, each pattern can be viewed as a point in a plane, with each voxel's activity determining one coordinate. A common classification approach is to construct a line that separates the points belonging to condition A from those belonging to condition B. If the patterns of the two conditions genuinely differ, points on one side of the line can be classified as condition A and points on the other side as condition B. With more than two voxels, the plane becomes a high-dimensional space and the decision line becomes a linear decision boundary (also called a decision hyperplane).
  • Depending on the kind of relationship they learn, classifiers can be divided into linear and nonlinear classifiers.
  • Most current MVPA studies use linear classifiers (Mur, 2009), mainly of three kinds:
    1. the minimum distance classifier (e.g. Haxby et al., 2001),
    2. Fisher linear discriminant analysis (FLDA) (e.g. Carlson et al., 2003)
    3. linear support vector machine (SVM) (e.g. Cox and Savoy, 2003).

      Figure. Linear classification methods all define a linear decision boundary, but the boundary is placed slightly differently. This is shown for a given set of hypothetical activity patterns. The blue dots represent activity patterns for one experimental condition (e.g. the speech sound /ra/), the red dots represent activity patterns for a second condition (e.g. a different speech sound). For simplicity, the displayed activity patterns are based on activity of only two voxels. Nevertheless, the classification methods generalize to higher dimensional voxel spaces. From Mur (2009 SCAN)

The three methods place the decision boundary slightly differently, but studies suggest that in practice they perform about equally well (see Ku et al., 2008; Mourao-Miranda et al., 2005). Used correctly, all three can provide valid statistics for pattern information. Note, however, that Misaki et al. (2010) found that although these linear classifiers perform comparably (with SVM slightly ahead), the measure used to quantify the brain activity patterns affects classifier performance: for SVM, defining the activity patterns with t-values rather than beta values gives better performance. A comparison sketch of the three classifiers follows below.
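
As an illustration, here is a small sketch comparing the three linear classifiers on simulated patterns (nearest-centroid standing in for the minimum distance classifier; the scikit-learn classes and simulated data are assumptions, not the setups used in the cited studies):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import NearestCentroid
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.standard_normal((80, 100))   # 80 patterns x 100 voxels (simulated)
y = np.repeat([0, 1], 40)
X[y == 1] += 0.3                     # condition B shifted slightly

classifiers = {
    "minimum distance": NearestCentroid(),                                # nearest class centroid
    "FLDA": LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),  # regularized LDA
    "linear SVM": SVC(kernel="linear"),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=cv).mean()
    print(f"{name:>16s}: mean accuracy = {acc:.2f}")
```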

  • Of course, nonlinear classification algorithms can also be applied to pattern-information analysis (e.g. Cox and Savoy, 2003; LaConte et al., 2005). These algorithms can produce more complex decision boundaries than linear classifiers, but they are also more prone to overfitting the data.

Support Vector Machine (SVM)

  • The support vector machine (SVM) is the most widely used classifier in MVPA. Their “fame” mainly rests on two key properties: 1) SVMs find solutions of classification problems that have “generalization in mind”, and 2) they are able to find non-linear solutions efficiently using the “kernel trick”.
  • First, SVMs perform well with high-dimensional data and small training sets. This is especially important for fMRI data, because we usually have a very large number of features (voxels) but relatively few trials per class (or experimental condition). While this SVM property is useful to reduce the “curse of dimensionality” problem by reducing the risk of over-fitting the training data, it is still important to reduce the number of voxels as much as possible. In machine learning, this feature reduction step is referred to as feature selection. One way of doing feature selection is to restrict the voxels to those in anatomically or functionally defined regions-of-interest (ROIs).
  • Second, the SVM is a kernel-based algorithm. Kernel-based algorithms map the input data into a higher-dimensional vector space in which some nonlinear classification problems become easier to solve. A binary (i.e. two-class) classification problem is called linearly separable if a “hyper-plane” can be positioned in such a way that all exemplars of one class fall on one side and all exemplars of the other class fall on the other side. In the case of two feature dimensions (e.g. two voxels), the “hyper-plane” corresponds simply to a line; in the case of three features, the hyper-plane corresponds to an actual 2D plane. The term “linear” means that the separating entity is “straight”, i.e. it may not be curved. In difficult classification problems, optimal solutions may only be found if curved (non-linear) separating entities are allowed. In SVMs, non-linear solutions can be found efficiently by using the “kernel trick”: the data are mapped into a high-dimensional space in which the problem becomes linearly separable. The “trick” is that this is only done “virtually”, by calculating kernel functions. For problems with high-dimensional feature vectors and a relatively small number of training patterns, non-linear kernels are usually not required. This is important for fMRI applications, because only linear classifiers allow the obtained weight values (one per voxel) to be interpreted in a meaningful way for visualization and feature selection (see below). A brief linear-vs-kernel sketch follows after this list.
  • Other commonly used kernel-based methods include the radial basis function (RBF) kernel and kernelized variants of linear discriminant analysis (LDA).
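
A brief, assumed sketch of the linear vs. non-linear (kernel) distinction on a toy XOR-like problem that is not linearly separable (scikit-learn; the data are simulated):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 2))                          # two features ("voxels")
y = (np.sign(X[:, 0]) == np.sign(X[:, 1])).astype(int)     # XOR-like labels: not linearly separable

for kernel in ("linear", "rbf"):
    acc = cross_val_score(SVC(kernel=kernel, C=1.0), X, y, cv=5).mean()
    print(f"{kernel:>6s} kernel: accuracy = {acc:.2f}")
# The linear SVM stays near chance while the RBF kernel separates the classes,
# illustrating the kernel trick. For high-dimensional fMRI features a linear
# kernel is usually sufficient and keeps the voxel weights interpretable.
```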

SVM Over-fitting and Under-fitting

  1. an SVM classifier is unstable on a small-sized training set;
  2. SVM’s optimal hyper-plane may be biased when there are far fewer positive samples than negative samples;
  3. overfitting can happen when the number of feature dimensions is much higher than the size of the training set.
    To avoid overfitting, cross-validation is used to evaluate the fit provided by each parameter setting tried during the grid or pattern search process (see the sketch below).
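
A minimal sketch of parameter tuning by grid search inside cross-validation, with an outer loop to estimate generalization (nested cross-validation); the parameter grid and simulated data are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.standard_normal((80, 200))   # 80 samples x 200 features (simulated)
y = np.repeat([0, 1], 40)
X[y == 1] += 0.25                    # small class difference

# Inner CV (inside GridSearchCV) picks the parameters; the outer CV estimates
# generalization on data never used for parameter selection, guarding against
# optimistic, overfitted accuracy estimates.
param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10], "gamma": ["scale"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
outer_scores = cross_val_score(search, X, y, cv=5)
print(f"nested CV accuracy: {outer_scores.mean():.2f} +/- {outer_scores.std():.2f}")
```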

Resources

  • Cross-validation http://www.cnblogs.com/bourneli/archive/2013/03/11/2954060.html
  • Cross-validation https://zhuanlan.zhihu.com/p/24825503?refer=rdatamining
  • MVPA Start http://mvpa.blogspot.com/2013/02/where-to-start-with-mvpa.html