
[Machine Learning] Nonlinear Independent Component Analysis - Aapo Hyvärinen
Linear independent component analysis (ICA)

$$x_i(k) = \sum_{j=1}^{n} a_{ij}\, s_j(k) \quad \text{for all } i = 1 \ldots n,\ k = 1 \ldots K$$

$x_i(k)$ is the $i$-th observed signal at sample point $k$ (possibly time); the $a_{ij}$ are constant parameters describing the "mixing". Assuming independent, non-Gaussian latent "sources" $s_j$, ICA is identifiable, i.e. well-defined: observing only the $x_i$, we can recover both the $a_{ij}$ and the $s_j$. This is the fundamental difference between ICA and PCA: PCA doesn't find the original coordinates, ICA does.
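As a concrete illustration of the linear model above, here is a minimal sketch using scikit-learn's FastICA. The Laplacian sources, the mixing matrix, and all sizes are arbitrary choices made for the example, not anything from the lecture.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
K, n = 5000, 3

# Independent, non-Gaussian sources s_j(k) (Laplace-distributed here).
S = rng.laplace(size=(K, n))

# Unknown constant mixing parameters a_ij.
A = rng.normal(size=(n, n))

# Observations x_i(k) = sum_j a_ij * s_j(k).
X = S @ A.T

# FastICA recovers the sources up to permutation and scaling.
S_hat = FastICA(n_components=n, random_state=0).fit_transform(X)

# Cross-correlation between true and estimated sources: each row should
# contain exactly one entry close to +/-1 (the permutation/sign ambiguity).
print(np.round(np.corrcoef(S.T, S_hat.T)[:n, n:], 2))
```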

PCA and Gaussian factor analysis are not identifiable: any orthogonal rotation is equivalent, i.e. $s' = Us$ has the same distribution.
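A small numerical check of this rotation argument (an illustrative sketch; the distributions and sizes are my own choices): rotating white Gaussian data leaves the marginals unchanged, while rotating non-Gaussian sources changes them, which is exactly why non-Gaussianity makes linear ICA identifiable.

```python
import numpy as np
from scipy.stats import kurtosis, ortho_group

rng = np.random.default_rng(0)
U = ortho_group.rvs(2, random_state=0)   # random orthogonal matrix, U^T U = I

s_gauss = rng.normal(size=(100_000, 2))                            # white Gaussian sources
s_unif = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(100_000, 2))   # non-Gaussian, unit variance

for name, s in [("Gaussian", s_gauss), ("uniform", s_unif)]:
    rotated = s @ U.T
    print(name,
          "excess kurtosis before:", np.round(kurtosis(s), 2),
          "after rotation:", np.round(kurtosis(rotated), 2))

# The Gaussian marginals stay Gaussian after rotation (s and Us are
# indistinguishable), whereas the uniform marginals lose their
# characteristic excess kurtosis of -1.2.
```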

Nonlinear ICA is an unsolved problem

Extend ICA to the nonlinear case to get general disentanglement?

Unfortunately, “basic” nonlinear ICA is not identifiable:

If we define the nonlinear ICA model for random variables $x_i$ as

$$x_i = f_i(s_1, \ldots, s_n), \quad i = 1 \ldots n$$

we cannot recover the original sources (Darmois, 1952; Hyvärinen & Pajunen, 1999).

Darmois construction

Darmois (1952) showed the impossibility of nonlinear ICA:

For any $x_1, x_2$, one can always construct $y = g(x_1, x_2)$ independent of $x_1$ as

$$g(\xi_1, \xi_2) = P(x_2 < \xi_2 \mid x_1 = \xi_1)$$
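A quick numerical sanity check of this construction, assuming for illustration that $(x_1, x_2)$ is jointly Gaussian with correlation $\rho$ (a toy choice, not from the lecture): the conditional CDF is then available in closed form, and the constructed $y$ is uniform and uncorrelated with $x_1$.

```python
import numpy as np
from scipy.stats import norm, pearsonr

rng = np.random.default_rng(0)
rho, N = 0.8, 200_000

x1 = rng.normal(size=N)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=N)   # corr(x1, x2) = rho

# Darmois construction: conditional CDF of x2 given x1 (Gaussian case).
y = norm.cdf((x2 - rho * x1) / np.sqrt(1 - rho**2))

# y is Uniform(0, 1) and independent of x1, even though it was built from the
# dependent pair (x1, x2): independence alone cannot pin down the true sources.
print("corr(y, x1):    ", round(pearsonr(y, x1)[0], 3))        # ~ 0
print("corr(y^2, x1^2):", round(pearsonr(y**2, x1**2)[0], 3))  # ~ 0
```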

Independence alone is too weak for identifiability:

We could take $x_1$ as an independent component, which is absurd.

Looking at non-Gaussianity is equally absurd:

A scalar transform $h(x_1)$ can give any distribution.

Time-contrastive learning

Observe an $n$-dimensional time series $x(t)$. Divide $x(t)$ into $T$ segments (e.g., bins of equal size) and train an MLP to tell which segment a single data point comes from:

The number of classes is $T$, with labels given by the index of the segment (multinomial logistic regression). In its hidden layer $h$, the network should learn to represent the nonstationarity (= the differences between segments).

Could this really do nonlinear ICA? Assume the data follow the nonlinear ICA model $x(t) = f(s(t))$ with a smooth, invertible nonlinear mixing $f: \mathbb{R}^n \rightarrow \mathbb{R}^n$, and components $s_i(t)$ that are nonstationary, e.g., in their variances. Assume we apply time-contrastive learning (TCL) to $x(t)$ using an MLP with hidden layer $h(x(t))$, where $\dim(h) = \dim(x)$.

Then TCL will find $s(t)^2 = A\, h(x(t))$ for some linear mixing matrix $A$ (the squaring is element-wise). That is, TCL demixes the nonlinear ICA model up to a linear mixing (which can be estimated by linear ICA) and up to squaring. This is a constructive proof of identifiability: imposing independence within every segment adds more constraints, leading to a unique solution. These added constraints are what guarantee identifiability.

In other words: an MLP is trained by self-supervised classification (which time segment does a given data point come from?). The MLP thereby represents the differences between signals across time segments, and the squared original signals $s^2$ can then be expressed as a linear combination of the MLP's hidden-layer representation of the observations $x$.
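A minimal TCL sketch in PyTorch following the recipe above. The data generator (segment-wise variances, a toy smooth mixing), the network sizes, and the training settings are all illustrative assumptions, not the exact setup from the talk.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
n, T, seg_len = 2, 20, 500            # source dim, number of segments, points per segment

# Nonstationary sources: the variance changes from segment to segment.
scales = rng.uniform(0.2, 2.0, size=(T, n))
s = np.concatenate([scales[t] * rng.normal(size=(seg_len, n)) for t in range(T)])
labels = np.repeat(np.arange(T), seg_len)

# Smooth, invertible nonlinear mixing f (a toy choice for illustration).
A = rng.normal(size=(n, n)) + 2 * np.eye(n)
x = (np.tanh(s) + 0.1 * s) @ A.T

X = torch.tensor(x, dtype=torch.float32)
y = torch.tensor(labels, dtype=torch.long)

# MLP classifier; the last hidden layer h has dim(h) = dim(x) = n.
net = nn.Sequential(nn.Linear(n, 64), nn.ReLU(),
                    nn.Linear(64, n),        # feature layer h(x)
                    nn.Linear(n, T))         # multinomial logistic regression over segments
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Self-supervised training: predict which segment each point came from.
for step in range(500):
    opt.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    opt.step()

# TCL theory: s(t)^2 should be a linear mixture of the learned features h(x(t)).
with torch.no_grad():
    h = net[:3](X).numpy()
for i in range(n):
    corr = np.corrcoef(np.column_stack([s[:, i] ** 2, h]).T)[0, 1:]
    print(f"corr(s_{i}^2, h):", np.round(corr, 2))
```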

Deep Latent Variable Models

General framework with observed data vector $x$ and latent $s$:

$$p_\theta(x, s) = p_\theta(x \mid s)\, p_\theta(s), \qquad p_\theta(x) = \int p_\theta(x, s)\, ds$$

where $\theta$ is a vector of parameters, e.g., in a neural network.

In variational autoencoders (VAE):

Define the prior so that $s$ is white Gaussian (thus all the $s_i$ are independent). Define the decoder $p(x \mid s)$ so that $x = f(s) + n$.
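A tiny sketch of this generative model (assuming PyTorch; the decoder architecture and noise level are illustrative choices), including the rotation ambiguity discussed just below:

```python
import torch
import torch.nn as nn

dim_s, dim_x, noise_std = 2, 5, 0.1
f = nn.Sequential(nn.Linear(dim_s, 32), nn.Tanh(), nn.Linear(32, dim_x))   # decoder f

s = torch.randn(10_000, dim_s)                            # prior: s ~ N(0, I), components independent
x = f(s) + noise_std * torch.randn(10_000, dim_x)         # x = f(s) + n

# Rotation ambiguity: for any orthogonal M, the pair (f o M^T, M s) generates
# data with exactly the same distribution, so the latents are only defined up
# to an orthogonal rotation.
M, _ = torch.linalg.qr(torch.randn(dim_s, dim_s))         # random orthogonal matrix
s_rot = s @ M.T                                           # rotated latents, same N(0, I) distribution
x_same = f(s_rot @ M) + noise_std * torch.randn(10_000, dim_x)   # same generative distribution as x
```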

This looks like nonlinear ICA, but it is not identifiable:

By Gaussianity, any orthogonal rotation is equivalent: $s' = Ms$ has exactly the same distribution whenever $M^T M = I$.

Conditioning DLVMs by another variable

Identifiability can be restored by introducing an additional variable $u$. For example, when looking for relations between video and audio, the time $t$ can serve as the auxiliary variable. The model is then solved through conditional independence of the components given $u$.
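A hedged sketch of the conditioning idea (an illustrative construction, not the exact iVAE model): the prior over $s$ is made conditionally factorial given the auxiliary variable $u$, e.g., with component variances that depend on $u$, which breaks the rotation invariance above.

```python
import torch
import torch.nn as nn

n, n_values_of_u = 2, 10
log_var = nn.Embedding(n_values_of_u, n)     # lambda(u): per-value log-variances of the components

def sample_prior(u):
    # p(s | u) = prod_i N(0, exp(log_var(u)_i)): the components are independent
    # given u, but the prior is no longer rotation-invariant, which is what
    # restores identifiability.
    std = torch.exp(0.5 * log_var(u))
    return std * torch.randn(u.shape[0], n)

u = torch.randint(0, n_values_of_u, (5,))    # e.g., a segment index or binned time stamp
s = sample_prior(u)
```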

Conclusion

Typical deep learning needs class labels, or some targets

If no class labels: unsupervised learning

Independent component analysis is a principled approach

can be made nonlinear

Identifiable: Can recover components that actually created the data (unlike PCA, VAE etc)

Special assumptions needed for identifiability, one of:

Nonstationarity ("time-contrastive learning")
Temporal dependencies ("permutation-contrastive learning")
Existence of an auxiliary (conditioning) variable (e.g., "iVAE")

Self-supervised methods are easy to implement

Connection to DLVM’s can be made → iVAE

Principled framework for “disentanglement”

In summary, linear ICA is solvable, while nonlinear ICA requires additional assumptions before it becomes solvable (i.e., before the original signals can be separated). The ideas behind nonlinear ICA can also be applied to other deep learning models.

Reference: https://www.youtube.com/watch?v=_cBLSNRWt8c