
n133 eigenvalue/eigenvector: Principal Component Analysis (PCA)

by kiimy 2021. 5. 27.
๋ฐ˜์‘ํ˜•
from sklearn.preprocessing import StandardScaler, Normalizer

from sklearn.decomposition import PCA

fit() : computes the mean μ and standard deviation σ

transform() : applies the standardization (normalization)

fit_transform() : fit() + transform()

 

fit() - the method that learns the statistics of the data
transform() - the method that applies what was learned

So, never fit() on the test data set!
= Otherwise the scaler discards the statistics it fit on the training data, computes new mean and variance values from the test data, and ends up learning from the test data as well. The test data is the set held out for validation.
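A minimal sketch of the correct pattern (X_train and X_test here are hypothetical arrays): fit the scaler on the training data only, then reuse those statistics on the test data.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit() learns mean/std from TRAIN only
X_test_scaled = scaler.transform(X_test)        # transform() reuses the train statistics; no new fit()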

Eigenvector / Eigenvalue

An eigenvector is a vector that, under a given transformation, changes only in magnitude and not in direction.

That change in magnitude can only be a scalar, and this particular scalar value is the eigenvalue.

# The eigenvalue says by how much the vector space is stretched in the direction of the eigenvector.

== For example, think of a 3-D rotation such as the Earth's spin: the eigenvector left unchanged by this rotation is the rotation-axis vector, and its eigenvalue is 1.

- the rotation axis (or vector) unaffected by the transformation
(= tells us the direction of the variance)

- the change in magnitude can only be a scalar, and this particular scalar value is the eigenvalue
(= the amount of explainable variance)
*A vector transformation ultimately transforms the data.
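A quick numpy check of the Earth's-rotation example above: for a rotation about the z-axis, the only real eigenvalue is 1, and its eigenvector is the rotation axis (up to sign).

import numpy as np

theta = np.pi / 2                       # rotate 90 degrees about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])

values, vectors = np.linalg.eig(R)
real = np.isclose(values.imag, 0)       # the other two eigenvalues are complex
print(values[real].real)                # [1.]
print(vectors[:, real].real)            # the rotation axis [0, 0, 1]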

https://angeloyeo.github.io/2019/07/17/eigen_vector.html

 

Eigenvalues and Eigenvectors - Angelo's Math Notes (angeloyeo.github.io)

np.linalg.norm / inv / eig ## vector magnitude (|v|), inverse matrix, eigenvalues and eigenvectors

( values, vectors = np.linalg.eig(A) )


np.dot / np.matmul ## dot/matrix product. Note: identical for 2-D arrays, but they differ for arrays of 3 or more dimensions

np.multiply ## element-wise product (scalar multiplication applied entrywise)

The explained variance ratio is the ratio of the variance of the data projected

onto the axis formed by each principal component vector, == the ratio of each eigenvalue to the total.

<values, vectors = np.linalg.eig()>

*Since the eigenvectors come back as an array, each eigenvector must be read top to bottom: vectors[:, 0] is the 0th column across all rows.

array([[-0.9057736 , -0.85343697],
       [ 0.42376194, -0.52119606]])
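A small self-contained sketch of the numpy calls above, using a made-up 2x2 matrix; note that each eigenvector is a column of the returned array.

import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])
v = np.array([3.0, 4.0])

print(np.linalg.norm(v))       # 5.0 -> vector magnitude |v|
print(np.linalg.inv(A))        # the inverse matrix of A
print(np.dot(v, v))            # 25.0 -> dot product (same as matmul for 2-D arrays)
print(np.multiply(v, 2.0))     # [6. 8.] -> element-wise product

values, vectors = np.linalg.eig(A)
print(values)                  # [5. 2.]
print(vectors[:, 0])           # the eigenvector paired with values[0], read top to bottom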

 

Principal Component Analysis (PCA)

<Concepts to know>
PCA -> covariance -> matrix -> vector dot product (projection), eigenvectors, eigenvalues
  • A technique for analyzing high-dimensional data effectively
  • Dimensionality reduction to a lower dimension
  • Effective visualization + clustering of high-dimensional data
  • Find the vector that preserves as much of the original high-dimensional data's information (variance) as possible, and (linearly) project the data onto that vector
    • Preserves the original meaning best when the data's dimensionality is reduced
If we reduce the dimensionality by orthogonally projecting the data,
onto which vector should we project so that the original data structure is preserved best?
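A small illustration of that question with synthetic data (the numbers are assumptions): projecting onto a unit vector u gives scores X @ u, and the direction with the larger score variance preserves more of the data's structure.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=500)

u1 = np.array([1.0, 0.0])      # candidate projection direction
u2 = np.array([0.0, 1.0])
print(np.var(X @ u1))          # ~3 -> keeps more of the spread
print(np.var(X @ u2))          # ~1 -> loses more information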

์ •์‚ฌ์˜์ด๋ž€??

https://cord-ai.tistory.com/12

 

n132 Covariance and correlation cov, corr / vector

Variance and standard deviation * Variance measures how spread out the data is = sum the squared (observation minus mean) values and divide by the number of observations = the mean of the squared differences * the process of computing variance (cord-ai.tistory.com)

๋ถ„์„๊ณผ์ • 

- scikit-learn's PCA is computed using singular value decomposition (SVD)
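A minimal sketch of that relationship, assuming X is a (samples x features) data array: the right singular vectors of the centered data are the principal axes, and the squared singular values give the eigenvalues.

import numpy as np

Xc = X - X.mean(axis=0)                    # center the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                            # rows play the role of pca.components_
explained_variance = S**2 / (len(X) - 1)   # matches the covariance eigenvalues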

https://darkpgmr.tistory.com/106

 

[Linear Algebra #4] Applications of Singular Value Decomposition (SVD) (darkpgmr.tistory.com)

1. The principal components are found in order of largest variance.

= When setting the number of principal components (n_components), it is hard to judge arbitrarily how many components will be advantageous.

Therefore, if you pass a real number between 0 and 1, the minimum number of principal components that preserves that fraction of the variance is selected automatically.

# lose 5% of the information (variance)
pca = PCA(n_components=0.95)
Reducing the dimensionality of multi-dimensional data is the main purpose of PCA;
then, down to which dimension is it reasonable to reduce high-dimensional data?

EX) For N-dimensional data, N eigenvalues can be computed; when the cumulative (= np.cumsum()) sum of the eigenvalue ratios
reaches about 90% or more, it is reasonable to reduce to that many dimensions.
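A tiny worked example with made-up eigenvalues for a 5-dimensional dataset:

import numpy as np

eigenvalues = np.array([4.0, 2.0, 1.0, 0.5, 0.5])   # hypothetical, sorted descending
cum = np.cumsum(eigenvalues / eigenvalues.sum())
print(cum)                          # [0.5    0.75   0.875  0.9375 1.    ]
print(np.argmax(cum >= 0.90) + 1)   # 4 -> reducing to 4 dimensions keeps >= 90%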

2. The PCs (principal components) are mutually orthogonal (= because they are eigenvectors of the covariance matrix) == correlation coefficient 0 / meaning they carry no information about each other.

 

3. Unit scaling is required => standardization / mean 0, variance 1.

 

4. Build the variance-covariance matrix.

Since an eigenvector indicates the direction of a principal axis along which the matrix acts, the eigenvectors of the

covariance matrix show in which directions the data is spread out.

 

'Onto which vector should we project (take the dot product of) the data vectors to get the optimal result?'

5. Compute the eigenvalues and eigenvectors (= once standardized, the total variance equals the number of variables: with 2 variables, the total variance is 2).

Sorting the eigenvectors in order of decreasing eigenvalue yields the principal components in order of importance.

Two eigenvectors can be seen (in the plot), and the length of each vector represents its eigenvalue.
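A minimal sketch of steps 4-5, assuming Z is the standardized data matrix (rows = samples, columns = variables):

import numpy as np

cov = np.cov(Z, rowvar=False)            # 4. the variance-covariance matrix
values, vectors = np.linalg.eig(cov)     # 5. eigenvalues and eigenvectors (columns)

order = np.argsort(values)[::-1]         # sort by eigenvalue, largest first
values, vectors = values[order], vectors[:, order]
# vectors[:, 0] is the first principal axis (direction of largest variance);
# for standardized data the eigenvalues sum to (approximately) the number of variables
print(values.sum())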

 

6. Examine the correlation between the resulting PC scores and the variables (= eigenvalue / number of variables ==> the correlation with each variable).

(First, by looking at the correlations between the variables, assume a label such as 'this principal component represents X'.)

 

7. Examine the projected values, paying attention to the observations with small projection values.

https://angeloyeo.github.io/2019/07/27/PCA.html

 

์ฃผ์„ฑ๋ถ„ ๋ถ„์„(PCA) - ๊ณต๋Œ์ด์˜ ์ˆ˜ํ•™์ •๋ฆฌ๋…ธํŠธ (Angelo's Math Notes)

 

angeloyeo.github.io

Code

import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler, Normalizer
from sklearn.decomposition import PCA

# First standardize the data (= unit scaling); data_f is the feature DataFrame
scaler = StandardScaler()
Z = scaler.fit_transform(data_f)
print("\n Standardized Data: \n", Z)

pca = PCA(2) # n_components == number of principal components
pca.fit(Z) # fit the model to Z

# pca.components_ == the eigenvectors
print("\n Eigenvectors: \n", pca.components_)

# explained_variance_ == the eigenvalues
print("\n Eigenvalues: \n", pca.explained_variance_)

ratio = pca.explained_variance_ratio_
print(ratio)
# principal component ratios
# [0.68633893 0.19452929]

## cumsum computes the cumulative sum of the array elements along the given axis.
'''
a = np.array([[1,2,3], [4,5,6]])
print(np.cumsum(a)) # (axis=0, 1) ==> cumulative sum along that axis
[ 1  3  6 10 15 21]
'''
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1
print('Number of dimensions to select:', d)


 Standardized Data: 
 [[-0.89604189  0.7807321  -1.42675157 -0.56847478]
 [-0.82278787  0.11958397 -1.06947358 -0.50628618]
 [-0.67627982  0.42472926 -0.42637319 -1.1903608 ]
 ...
 [ 1.17338426 -0.74499437  1.50292796  1.91906927]
 [ 0.22108196 -1.20271231  0.78837197  1.23499466]
 [ 1.08181673 -0.54156417  0.85982757  1.48374906]]

 Eigenvectors: 
 [[ 0.45375317 -0.39904723  0.576825    0.54967471]
 [ 0.6001949   0.79616951  0.00578817  0.07646366]]
'''
Principal component ratios ==> the sum for pc1 and pc2 is above 0.88, i.e., the two components explain about 88% of the total variance.
Therefore, since adding further principal components would barely increase the explained variance,
it is reasonable to settle on two principal components.
'''


# pca.transform(df) ## the projected values (= the principal component scores) (= np.matmul(matrix, vector))
B = pca.transform(Z)
# PCA projection to 2D
D = pd.DataFrame(data=B, columns=['pc1', 'pc2'])

# correlation between the principal component scores
D.corr() # correlation between PC scores = 0 ==> because the axes are orthogonal

Scree Plot

## Plot
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10,8))

plt.plot(pca.explained_variance_, 'o-') ## check the scree plot

'''
Reduce the dimensionality down to the point where the curve suddenly bends (the elbow).
Scree plots are used in many methods besides PCA.
'''

The x-axis shows the dimensions, and the y-axis shows the eigenvalue of each dimension.

Feature Selection:

Feature selection is a method of removing the less important features from a dataset.

Feature Extraction

  • Using the existing features, or new features built by combining them

For Selection:

  • Pros: the selected features are easy to interpret.
  • Cons: the correlations between features must be considered.
  • Examples: LASSO, genetic algorithms, etc.

For Extraction:

  • Pros: correlations between features are taken into account; the number of features can be reduced substantially.
  • Cons: the features are hard to interpret.
  • Examples: PCA, autoencoders, etc. (a sketch contrasting the two follows below)
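A minimal sketch contrasting the two approaches on the iris data (the choice of LASSO here is just one example of a selection method):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_iris(return_X_y=True)

# Selection: keep a subset of the ORIGINAL features (easy to interpret)
selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
X_sel = selector.transform(X)

# Extraction: build NEW combined features (compact, but harder to interpret)
X_ext = PCA(n_components=2).fit_transform(X)

print(X_sel.shape, X_ext.shape)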
