Throughout this text we are talking about supervised learning. The dataset used to train the model is called the training set. The independent variables are called features, and the dependent variable is called the target variable. I assume these terms are familiar to the reader.
Feature selection
Why is feature selection needed at all? There are two main reasons. First, when there are many features, the classifier takes longer to run. If the goal is to test several classifiers and pick the best one, the computation time can become simply enormous. In addition, the data (the training set) may no longer fit in RAM, in which case the classification algorithms have to be modified. Even a single row of the set may not fit, although that is a rare case.
The second reason is the more important one: as the number of features grows, prediction accuracy often drops, especially when the data contains many garbage features (features that barely correlate with the target variable). This phenomenon is called overfitting.
Feature selection methods fall into three categories: filter methods, wrapper methods, and embedded methods. These are the names used for the three categories below.
Filter methods
These are based on statistical techniques and, as a rule, consider each feature independently. They let you score and rank features by importance, where importance is taken to be the degree of correlation between the feature and the target variable. Let's look at a few examples.
Information gain
One example of a filter method is information gain. It is closely tied to the concept of information entropy. The entropy formula is quite simple:
$$H(X) = -\sum_i p(x_i) \log_2 p(x_i)$$
Here p(x_i) is the probability that the variable X takes the value x_i. In our setting this probability is computed as the number of records (examples) for which X = x_i, divided by the total number of records.
To get a better feel for this measure, consider two simple examples. First, toss a coin for which heads and tails are equally likely. In this case the entropy computed by the formula equals 1. If the coin always lands heads up, the entropy equals 0.
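The two coin examples can be checked with a few lines of Python. This is a minimal sketch, assuming base-2 logarithms (which is what gives an entropy of 1 for the fair coin); the function name `entropy` is chosen here for illustration only.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    # Outcomes with p == 0 are skipped: they contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy([0.5, 0.5])    # heads and tails equally likely
rigged_coin = entropy([1.0, 0.0])  # always lands heads up
print(fair_coin, rigged_coin)
```

The more evenly the probability is spread over the outcomes, the higher the entropy.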
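To measure the correlation between variables we need to define a few more quantities. The first is the specific conditional entropy:
$$H(Y \mid X = x_i)$$
that is, the entropy H(Y) computed only over the records for which X = x_i.
The conditional entropy is then computed as:
$$H(Y \mid X) = \sum_i p(x_i)\, H(Y \mid X = x_i)$$
This value is interesting not in itself but through its difference from the ordinary entropy of Y. That difference measures how much more ordered the variable Y becomes once we know the value of X, or, put more simply, whether X and Y are correlated and how strongly. This is what the information gain expresses:
$$IG(Y \mid X) = H(Y) - H(Y \mid X)$$
The larger the IG value, the stronger the correlation. So we can easily compute the information gain of every feature and discard the features that have little effect on the target variable. This, first, shortens the model's computation time and, second, reduces the risk of overfitting.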
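The definitions above translate almost line by line into code. The following is a small sketch, not a production implementation: probabilities are estimated as relative frequencies, exactly as described in the text, and the toy lists `target`, `informative`, and `useless` are made-up data for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Entropy (bits) of a list of discrete values, using relative frequencies."""
    total = len(values)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(values).values())

def conditional_entropy(xs, ys):
    """H(Y|X): entropy of Y inside each group X = x_i, weighted by p(x_i)."""
    total = len(xs)
    result = 0.0
    for x_value, count in Counter(xs).items():
        ys_given_x = [y for x, y in zip(xs, ys) if x == x_value]
        result += (count / total) * entropy(ys_given_x)
    return result

def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)

# Hypothetical toy data: one feature copies the target, the other is unrelated to it.
target = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
useless = [0, 1, 0, 1]
print(information_gain(informative, target))
print(information_gain(useless, target))
```

Ranking all features by this score and keeping the top ones is the whole filter.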
Confusion between mutual information and information gain
On Wikipedia the formula above is called mutual information, while information gain is used as a synonym for Kullback-Leibler divergence. In most cases, however, information gain and mutual information are used as different names for the same thing, so you may encounter the formula above under either name. I will call this measure information gain simply because that is what I am used to.
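Chi-square
Let's look at another popular feature filtering method, the chi-square test. To understand it we need to recall a couple of formulas from probability theory. The first is the formula for the probability of the intersection (product) of events, i.e. the probability that events A and B both occur:
$$P(AB) = P(A \mid B)\, P(B)$$
Here P(A | B), sometimes written P(A / B), is the probability that event A occurs given that B has already occurred. If the events are independent (the occurrence of one does not affect the probability of the other), then:
$$P(AB) = P(A)\, P(B)$$
Based on this formula we can compute the expected probability of A and B occurring together under the assumption that they are independent, and then measure how far reality deviates from that expectation. The chi-square formula is:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where O_i is the observed count in cell i and E_i is the count expected under independence.
Let's see how it is used on an example. Suppose we want to study the effect of a certain exposure on the occurrence of a certain disease. The table of statistics looks like this: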
Exposure | Disease: yes | Disease: no | Total |
---|---|---|---|
Yes | 37 | 13 | 50 |
No | 17 | 53 | 70 |
Total | 54 | 66 | 120 |
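The cell at the intersection of the first row and the first column holds the number of people who were exposed and got the disease; the first row and the second column holds the number who were exposed but did not get the disease, and so on.
Compute the expected value for the first cell (exposed and got the disease):
$$E_{11} = \frac{50}{120} \cdot \frac{54}{120} \cdot 120 = 22.5$$
The other cells are handled in the same way. Then we compute chi-square by the formula (here it equals 29.1).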
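The calculation for the exposure/disease table can be reproduced with a short function. This is a sketch under the assumption that the input is a plain contingency table of counts (a list of rows); the name `chi_square` is illustrative.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column events were independent:
            # P(row) * P(column) * N
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic

# Rows: exposed / not exposed; columns: disease / no disease.
table = [[37, 13],
         [17, 53]]
chi2_value = chi_square(table)
print(round(chi2_value, 1))  # 29.1
```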
So for independent events the chi-square value is zero (or close to it). To know exactly how likely such a picture is for two independent events, one more concept is introduced: degrees of freedom. It is defined as:
(#variable_values_1 - 1) * (#variable_values_2 - 1)
Here #variable_values_1 is the number of values that variable 1 can take (in our case, degrees of freedom = 1).
To turn a chi-square value and a number of degrees of freedom into a probability, there are special tables (such as this one: https://www.easycalculation.com/statistics/chisquare-table.php).
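For the special case of one degree of freedom, which is the case in the example, the table lookup can be replaced by a one-line formula. The sketch below relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable; it does not cover other degrees of freedom, and the function name is made up for illustration.

```python
import math

def chi2_p_value_1dof(statistic):
    """P(chi-square >= statistic) for exactly one degree of freedom."""
    # With 1 degree of freedom the statistic is the square of a standard
    # normal variable, so the tail probability reduces to erfc.
    return math.erfc(math.sqrt(statistic / 2.0))

p_value = chi2_p_value_1dof(29.1)  # statistic from the exposure/disease example
print(p_value < 0.001)             # True: independence is very unlikely
```

A value of about 3.84 corresponds to the familiar 5% threshold for one degree of freedom.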
We now have an idea of how the algorithm works. In practice, of course, you do not need to write this algorithm yourself, let alone compute the statistics by hand. The scikit-learn library for Python lets you forget about the implementation details.
```python
from sklearn.feature_selection import SelectKBest, chi2

# train_data_features (feature matrix) and train["sentiment"] (labels)
# are assumed to be defined elsewhere.
# Keep the 50 features with the highest chi-square score.
select = SelectKBest(chi2, k=50)
X_new = select.fit_transform(train_data_features, train["sentiment"])
```
In my previous article you can find an example of how effective chi-square statistics are for solving an NLP problem.
mRmR
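Separately, I want to briefly describe a more complex filter method that takes into account not only the correlation between the features and the target variable, but also the redundancy among the features: mRmR (minimum redundancy, maximum relevance). The method tries to maximize the following expression, given here in its commonly cited form, where I denotes mutual information (the information gain discussed above):
$$\max_{S} \left[ \frac{1}{|S|} \sum_{f_i \in S} I(f_i; Y) - \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j) \right]$$
Here the first term is responsible for maximizing the correlation between the selected feature set S and the target variable Y (as in the information gain method), and the second term for minimizing the correlation among the features themselves. As a result, the selected features are not only relevant but also duplicate one another as little as possible. In this method features are added to the set one at a time, choosing the best one at each step.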
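The greedy procedure can be sketched as follows. This is an illustrative simplification, not a reference implementation: it uses the common incremental variant (relevance minus average redundancy with the already selected features), works only with discrete features, and all names and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits for two equally long lists of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mrmr(features, target, k):
    """Greedily pick k column indices: high relevance, low redundancy."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            relevance = mutual_information(features[j], target)
            if not selected:
                return relevance
            redundancy = sum(mutual_information(features[j], features[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: the target encodes two independent bits a and b.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
target = [2 * ai + bi for ai, bi in zip(a, b)]
features = [a, list(a), b]  # column 1 duplicates column 0
print(mrmr(features, target, k=2))  # [0, 2]
```

A plain top-2 ranking by relevance could just as well return the two duplicate columns; the redundancy term is what steers the second pick to column 2.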
Pros and cons of filter methods
What is good about this class of methods? Their computational cost is low and depends linearly on the total number of features. They are much faster than wrapper and embedded methods. They also work well even when the number of features exceeds the number of examples in the training set, which methods from the other categories cannot always claim.
What are their drawbacks? They consider each feature in isolation. Because of this, finding the top N most correlated features does not, in general, give you the subset on which prediction accuracy is highest. Consider a simple example.
Suppose there is an array of features that includes X1 and X2. The target variable depends on them as follows:
$$Y = X_1 \oplus X_2$$
(the logical XOR function).
The truth table looks like this (in case anyone has forgotten):
X1 | X2 | Y |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
Looking at this table, let's build the table of statistics for the variable X1 and compute its correlation with the variable Y using the chi-square formula (for X2 the result is the same).
X1 | Y = 1 | Y = 0 | Total |
---|---|---|---|
1 | 1 | 1 | 2 |
0 | 1 | 1 | 2 |
Total | 2 | 2 | 4 |
It is easy to calculate that chi-square equals 0, i.e. there is no correlation between the feature and the target variable.
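This can be verified directly from the XOR truth table. The sketch below builds the 2x2 table of counts for each feature and applies the chi-square formula from earlier; the helper names are illustrative.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic

def contingency(rows, feature_index):
    """2x2 table of counts: rows are feature values, columns are Y values."""
    table = [[0, 0], [0, 0]]
    for row in rows:
        table[row[feature_index]][row[2]] += 1
    return table

xor_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # (X1, X2, Y)
chi2_x1 = chi_square(contingency(xor_rows, 0))
chi2_x2 = chi_square(contingency(xor_rows, 1))
print(chi2_x1, chi2_x2)  # 0.0 0.0
```

Each feature on its own scores zero, even though the pair determines Y completely.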
The example is exaggerated, but it shows clearly that filter methods cannot capture the combined effect of several features on the target variable.
Wrapper methods
The essence of this category is that the classifier is run on different subsets of the features of the original training set. The subset of features that gives the best results on the training set is then selected, and after that it is evaluated on the test set (the test set takes no part in choosing the optimal subset).
There are two approaches within this class: forward selection and backward elimination. The first starts from an empty subset, to which features are added step by step (choosing the best addition at each step). The second starts from a subset equal to the original feature set and removes features step by step, recomputing the classifier each time.
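Forward selection can be sketched independently of any particular classifier. In the sketch below, `evaluate` is a hypothetical callback standing in for "train the classifier on this subset and measure its quality"; the stopping rule (stop when no addition improves the score) and the toy scoring function are assumptions for illustration.

```python
def forward_selection(n_features, evaluate, max_features):
    """Greedy forward selection.

    evaluate(subset) must return a quality score (higher is better) for a
    tuple of feature indices, e.g. the validation accuracy of a classifier
    trained on those features only.
    """
    selected = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        scored = [(evaluate(tuple(selected + [j])), j) for j in candidates]
        score, feature = max(scored)
        if score <= best_score:  # no candidate improves the subset: stop
            break
        selected.append(feature)
        best_score = score
    return selected, best_score

# Hypothetical stand-in for a real model: features 0 and 2 are useful,
# feature 1 is garbage, and every feature used costs a small penalty.
WEIGHTS = {0: 0.5, 1: 0.0, 2: 0.3}

def toy_score(subset):
    return sum(WEIGHTS[j] for j in subset) - 0.01 * len(subset)

chosen, final_score = forward_selection(3, toy_score, max_features=3)
print(chosen, final_score)
```

Backward elimination is the mirror image: start from all features and drop the one whose removal hurts the score least.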
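One example of such a method is recursive feature elimination. As the name suggests, it belongs to the algorithms that gradually eliminate features from the common pool. In Python, an implementation of this algorithm is available in the scikit-learn library. The method requires you to choose a classifier that will evaluate the features, for example linear regression: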
```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

data = load_data()  # placeholder loader from the original; not defined here
X = data["data"]
Y = data["target"]

lr = LinearRegression()
# Select the 5 most informative features.
rfe = RFE(lr, n_features_to_select=5)
selector = rfe.fit(X, Y)
```
Elimination methods track the relationships between features better, but they are computationally much more expensive. That said, all wrapper methods require far more computation than filter methods. In addition, with a large number of features and a small training set, these methods run the risk of overfitting.
Embedded methods
Finally, there are embedded methods, which do not separate feature selection from classifier training but perform the selection inside the model fitting process. These algorithms also require less computation than wrapper methods (although more than filter methods).
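The main technique in this category is regularization. It comes in several varieties, but the basic principle is shared. If we consider how a classifier works without regularization, it builds the model that is best tuned to predict every point of the training set.
For example, if the classification algorithm is linear regression, it selects the coefficients of a polynomial that approximates the dependence between the features and the target variable. The root mean square error (RMSE) serves as the measure of quality of the selected coefficients. In other words, the parameters are chosen so that the total deviation (more precisely, the total squared deviation) of the points predicted by the classifier from the real points is minimal.
The idea of regularization is to build an algorithm that minimizes not only the error but also the number of variables used.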
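Tikhonov regularization (ridge regression)
Let's look at this using linear regression as the example. If the training set gives us a feature matrix A and a target variable vector b, we are looking for a solution of the form Ax = b. While the algorithm runs, the following expression is minimized:
$$\|Ax - b\|^2 + \alpha \|x\|^2$$
Here the first term is simply the squared error, and the second term is the regularization operator (the sum of the squares of all coefficients, multiplied by alpha). As the algorithm runs, the magnitudes of the coefficients end up proportional to the importance of the corresponding variables, and the coefficients of variables that contribute least to reducing the error become close to zero.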
A few words about the alpha parameter. It lets you tune how much the regularization operator contributes to the total sum. With it we can set the priority: model accuracy or a minimal number of variables used.
If you want to use linear regression with Tikhonov regularization, there is no need to reinvent the wheel. The scikit-learn library has a model called Ridge regression that includes this type of regularization.
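```python
from sklearn.linear_model import Ridge

data = load_data()  # placeholder loader from the original; not defined here
X = data["data"]
y = data["target"]

clf = Ridge(alpha=1.0)  # alpha sets the strength of the regularization
clf.fit(X, y)
```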
Note that the alpha parameter can be tuned manually.
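LASSO
This method is similar to the previous one except for the regularization operator: it is not the sum of squares but the sum of the absolute values of the coefficients. Despite the small difference, the properties differ. In ridge regression, as alpha grows all coefficients move closer and closer to zero, but usually they do not vanish. In LASSO, as alpha grows more and more coefficients become exactly zero and stop contributing to the model altogether. This gives us genuine feature selection: the more important features keep nonzero coefficients, and the less important ones are zeroed out. You can read more about these properties and look at the graphs in, for example, this lecture (which also covers Elastic Net, a topic I will not go into here).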
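The difference between the two penalties is easiest to see with a single coefficient. The sketch below is a deliberately simplified one-dimensional model, not what scikit-learn computes internally: it assumes the objectives (x - c)^2 + alpha * x^2 for ridge and (x - c)^2 + alpha * |x| for LASSO, where c is the hypothetical unregularized coefficient, and uses their closed-form minimizers.

```python
def ridge_1d(c, alpha):
    """Minimizer of (x - c)**2 + alpha * x**2."""
    return c / (1.0 + alpha)

def lasso_1d(c, alpha):
    """Minimizer of (x - c)**2 + alpha * abs(x) (soft thresholding)."""
    shrunk = max(abs(c) - alpha / 2.0, 0.0)
    return shrunk if c >= 0 else -shrunk

c = 0.4  # hypothetical unregularized coefficient of a weak feature
for alpha in (0.1, 0.5, 1.0, 2.0):
    print(alpha, ridge_1d(c, alpha), lasso_1d(c, alpha))
```

The ridge coefficient shrinks but stays nonzero for every alpha, while the LASSO coefficient hits exactly zero once alpha / 2 exceeds |c|.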
Using this method in the scikit-learn library is identical to the previous one, except that Ridge is replaced with Lasso.
Regularization is thus a kind of penalty for excessive model complexity, and it protects against overfitting when there are garbage features in the set. Do not assume that regularization exists only in linear models: boosting and neural networks have their own regularization methods.
Among the drawbacks of regularization, note that the model is still built on the entire array of features, so it does not speed up the classifier. In general, though, this approach captures the interdependencies between variables better than filter methods do.
In conclusion
I will not draw conclusions here about the criteria for choosing one method or another in a particular situation. In most cases embedded methods are the easiest and most convenient to use. If you need interpretability, filter methods can provide it. The rest, I think, is a matter of practice.
I would be glad to hear your comments. If you think there is an inaccuracy in the text, something is missing, something is unclear, or you want to share your practical observations, write.
References
stats.stackexchange.com/questions/13389/information-gain-mutual-information-and-related-measures
www.coursera.org/course/mmds
www.cs.binghamton.edu/~lyu/publications/Gulgezen-etal09ECML.pdf
habrahabr.ru/company/mailru/blog/254897
machinelearningmastery.com/an-introduction-to-feature-selection
ocw.jhsph.edu/courses/fundepiii/pdfs/lecture17.pdf
blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side
ai.stanford.edu/~ronnyk/wrappersPrint.pdf
www.math.kent.edu/~reichel/publications/modtikh.pdf
scikit-learn.org/stable/modules/linear_model.html