My acquaintance with machine learning began with ML Boot Camp III in February 2017, and some idea of how to approach such tasks is only now starting to take shape. Much of what I did in the fifth contest came, first of all, from studying a collection of articles about Kaggle, the discussions there, and the code examples from it. Below is a slightly revised report on what it took to win third place.
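Task data
The dataset is built from 100,000 real clinical examinations. It gives age, height, weight, gender, systolic and diastolic blood pressure (ap_hi and ap_lo), cholesterol and blood glucose.
In addition, there are "subjective" features: what patients reported about themselves when answering questions about smoking, alcohol consumption and physical activity. This part of the data was additionally corrupted by the organizers, so I did not have much hope for it.
The original data contained clearly unrealistic values: adult patients with a height of 50 cm, blood pressure values such as 16020, and negative pressures. This was explained by errors made when the original data was entered by hand.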
Tools
The task was solved in Python using the standard libraries for this kind of work:
- pandas: reading, writing and processing tabular data (it can do much more, but the rest was not needed here);
- NumPy: operations on numeric arrays;
- scikit-learn: a machine learning toolkit with the basic ML algorithms, data splitting, validation and so on;
- XGBoost: one of the most popular implementations of gradient boosting;
- LightGBM: an alternative to XGBoost;
- TensorFlow + Keras: a library for training and using neural networks, and a wrapper for it;
- Hyperopt: a library for optimizing a function over a given argument space.
CSV and pickle
To save data during long computations I used csv at first, until I needed to store structures more complex than separate tables together. The pickle module turned out to be excellent: all the data I need is saved or read back in two lines of code. Later I started saving to compressed files:
import gzip
import pickle
with gzip.open('../run/local/pred_1.pickle.gz', 'wb') as f:
    pickle.dump((x, y), f)
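Repository
All the competition code is on GitHub. The old scripts are tucked away in the repository under old/. They have no real value and are kept only because the results of their work were submitted for checking. Because of bugs in the code, the intermediate results of those runs turned out to be unusable later, so this part of the code did not affect the final solution.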
The first two weeks
For the first two weeks I cleaned the data and fed it to models left over from previous competitions, without much success. For every new submission, all the code of one of the existing scripts was copied wholesale into a new script and edited there. As a result, by the end of the second week I could no longer tell what the latest scripts were doing, which pieces of them were actually used, and which were merely being run. The code was unwieldy and hard to read, and it could work for several hours and then crash without having saved anything useful.
The next two weeks
Two weeks in, when copying old scripts and tweaking them had become too painful, I had to start a complete rewrite of the code. It was split into common parts: base classes and their concrete implementations.
The general idea of the new code organization is a pipeline: data → features → level 1 models → level 2 models. Each stage is implemented as a separate script file that, when launched, performs all the necessary computations and saves its intermediate results and data. The script of each subsequent stage imports the code of the previous stage and gets its data for processing from that stage's methods. The idea behind all this is that I can run the script for one of the final models, and it runs the scripts for the lower-level models, which call the required feature generators, which in turn run the required data cleaning variant. The job of each script is to check whether the file where it should store its results exists and, if it does not, to perform the necessary computations and save the data.
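The "check for the file, otherwise compute and save" rule can be sketched as a small helper. This is only an illustration under assumptions: the function names `cached`, `clean_data` and `features`, the file names, and the toy rows are all hypothetical and are not taken from the contest repository.

```python
import gzip
import os
import pickle
import tempfile


def cached(path, compute):
    """Load the pickled result at `path`, or compute and save it if missing."""
    if os.path.exists(path):
        with gzip.open(path, 'rb') as f:
            return pickle.load(f)
    result = compute()
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with gzip.open(path, 'wb') as f:
        pickle.dump(result, f)
    return result


# Hypothetical stages; each one requests its input from the previous stage.
def clean_data(workdir):
    # Toy rows of (age, ap_hi, ap_lo) standing in for a cleaned dataset.
    return cached(os.path.join(workdir, 'clean.pickle.gz'),
                  lambda: [[50, 160, 80], [42, 120, 70]])


def features(workdir):
    def compute():
        rows = clean_data(workdir)
        # Append one derived column: ap_hi - ap_lo.
        return [row + [row[1] - row[2]] for row in rows]
    return cached(os.path.join(workdir, 'features.pickle.gz'), compute)


workdir = tempfile.mkdtemp()
first = features(workdir)    # computes and saves both stages
second = features(workdir)   # only reads features.pickle.gz
```

Calling the last stage is enough: the earlier stages run only when their files are absent, which is what makes an interrupted multi-hour run cheap to resume.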
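Behind this was a plan to work out, on a suitable task, a way of organizing models and data that could be reused later instead of being thrown away. In fact, this solution is the most important result of the contest for me, and it is gradually growing into a small library meant to make life easier in subsequent competitions of this kind.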
General plan
From the start I planned two levels of models, with as many different models as possible at the first level. One way to get there is to prepare as many differently processed versions of the data as possible and train the same models on them. Preparing data takes a long time, though. Working with the data is the key to success (with enough meaningful features added, the simplest models will do), but it takes more time than I could afford. The alternative is a brute-force solution: relatively moderate data processing and maximum computation time.
The simplest thing to do with this approach is to process the data in several ways, come up with several sets of additional features, and use their combinations. The result is a slight variation on the random subspace method. Unlike the proper method, it is not random at all, and features are selected in whole groups. This way a small number of additional feature groups yields hundreds of variants of processed data (in fact, number of cleaning methods × 2^(number of feature groups)). The assumption was that simple models trained on different subsets of features would produce sufficiently different solutions, which would improve the quality of the level 2 models.
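The count given above can be checked with a short enumeration. The variant and group names below are placeholders chosen for illustration, and the sizes (3 cleaning methods, 4 feature groups) are deliberately small; they are not the numbers used in the contest.

```python
from itertools import combinations

# Hypothetical names, for illustration only.
cleanings = ['raw', 'restored', 'pressure_cleaned']
feature_groups = ['simple', 'text', 'target_mean', 'kmeans']


def all_variants(cleanings, feature_groups):
    """Yield (cleaning, tuple_of_groups) for every subset of feature groups."""
    for cleaning in cleanings:
        for k in range(len(feature_groups) + 1):
            for subset in combinations(feature_groups, k):
                yield cleaning, subset


variants = list(all_variants(cleanings, feature_groups))
print(len(variants))  # 3 * 2**4 = 48
```

Each pair is one candidate input dataset for a level 1 model: exactly one cleaning variant plus zero or more feature groups.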
Data preparation
The fact that the original data was dirty had to be dealt with somehow. The main approaches are either to discard all clearly impossible values or to try to restore the original data in some way. Since the source of these distortions remained mostly unknown until the end, I had to prepare the data in several ways and then train different models on the results.
Each data processing variant is implemented as a class that, when called, returns the complete dataset with the corresponding changes. Since processing at this stage is very fast, intermediate results were saved only for the comparatively slow variant (2), which restores the subjective features using xgboost. The rest of the data was generated on request.
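Processing variants:
1. The original data, with the corrupted values in the subjective part of the test set replaced by 0.0001 so that they are numeric yet distinguishable from intact ones.
2. The corrupted subjective features are replaced: alcohol consumption with 0, activity with 1. Smoking is then restored from the remaining data columns.
3. On the data with restored subjective features, extreme pressure values are cleaned.
4. On the data with restored subjective features (from item 2), extreme values of pressure, weight and height are cleaned.
5. On the data with only pressure cleaned (from item 3), weight, height and pressure are cleaned further.
6. The data with cleaned pressure is transformed further: every implausible individual value of height, weight or pressure is replaced with NaN.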
Features
Additional features were generated from the processed data. Few of them were meaningful: the body mass index, the expected blood pressure for a given gender, weight and age according to some old formulas, and so on. Many more data columns were obtained automatically, in fairly simple ways.
Additional features were generated from different variants of the processed data, but often in the same way. Since some features can take too long to recompute, the feature columns were saved separately. Feature computation in the scripts is implemented like data cleaning: each script defines a method that returns the additional feature columns.
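As an illustration of such a method, here is a minimal sketch that returns a few simple feature columns: BMI, pulse pressure, and weighted averages of the two pressures. The function name, the argument names `height_cm` and `weight_kg`, the units, and the particular weights in `xs` are assumptions made for the example; only `ap_hi` and `ap_lo` are names that appear in the text.

```python
import numpy as np


def simple_features(height_cm, weight_kg, ap_hi, ap_lo, xs=(1, 2, 3)):
    """Return a dict of additional feature columns as NumPy arrays."""
    height_cm = np.asarray(height_cm, dtype=float)
    weight_kg = np.asarray(weight_kg, dtype=float)
    ap_hi = np.asarray(ap_hi, dtype=float)
    ap_lo = np.asarray(ap_lo, dtype=float)
    cols = {
        # Body mass index: weight in kg divided by squared height in metres.
        'bmi': weight_kg / (height_cm / 100.0) ** 2,
        'pulse_pressure': ap_hi - ap_lo,
    }
    # Weighted pressure averages (ap_hi + x * ap_lo) / (x + 1).
    for x in xs:
        cols['ap_mean_%d' % x] = (ap_hi + x * ap_lo) / (x + 1)
    return cols


cols = simple_features([170, 160], [70, 64], [120, 140], [80, 90])
```

With x = 2 the weighted average is the usual approximation of mean arterial pressure; other values of x simply give the models differently weighted views of the same two measurements.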
Groups of additional features:
1. The simplest meaningful columns: BMI, pulse pressure, and averaged pressure of the form $\frac{ap\_hi + x \cdot ap\_lo}{x + 1}$ for different values of $x$. In addition, approximate formulas of the form $ap\_X = a + b \cdot age + c \cdot weight$ are fitted for pressure as a function of age and weight, and the expected pressure is computed for each patient. Computed from the raw values.
2. Same as item 1, but in addition an attempt is made to recover the patient's weight from the available pressure values. For each feature predicted this way, the difference from the "actual" value is added. Computed from the raw values.
3. The text representation of the raw data columns split into characters, first left-aligned and then right-aligned. Characters are replaced by their numeric codes (ord()). Where a string was too short to fill all the columns, -1 was used.
4. Same as item 3, but the resulting columns are binary encoded (one-hot encoding).
5. The data from item 4 passed through PCA, a heavy legacy of the then-recent Mercedes competition on Kaggle.
6. For every raw source column except age, the mean of the target column is computed. To do this, the pressure, height and weight values are first divided by 10 and rounded, which turns them into categorical features. The data is then split into 10 folds, and for each fold the weighted mean of the target column (sick / not sick) is computed per category from the other 9 folds. Where there was nothing to compute a mean from, I simply used the global mean.
7. Same as item 6, but the means are also computed for the features from item 2.
8. Same as item 7, but the data cleaned according to variant 5 is used as the source.
9. Same as item 7, but the data cleaned according to variant 3 is used as the source.
10. The raw data is clustered with k-means, with an arbitrarily chosen number of clusters (2, 5, 10, 15, 25). The cluster number in each of these cases is one-hot encoded.
11. Same as item 10, but using data cleaned according to variant 3.
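The target-mean features described in the list above can be sketched with NumPy alone. This is a simplified illustration, not the contest code: it uses a plain (unweighted) mean, assigns rows to folds by a seeded permutation, and falls back to the global mean of the other folds for unseen categories. The function name, the toy `ap_hi` and `cardio` arrays, and the fold count in the demo are assumptions.

```python
import numpy as np


def oof_target_mean(values, target, n_folds=10, seed=0):
    """Out-of-fold mean of `target` per category of `values`.

    Each row gets the target mean of its category computed on the other
    folds only; categories unseen there fall back to the global mean of
    those folds.
    """
    values = np.asarray(values)
    target = np.asarray(target, dtype=float)
    rng = np.random.RandomState(seed)
    fold_of_row = rng.permutation(len(values)) % n_folds
    encoded = np.empty(len(values))
    for fold in range(n_folds):
        held_out = fold_of_row == fold
        train_values, train_target = values[~held_out], target[~held_out]
        global_mean = train_target.mean()
        means = {c: train_target[train_values == c].mean()
                 for c in np.unique(train_values)}
        encoded[held_out] = [means.get(v, global_mean)
                             for v in values[held_out]]
    return encoded


# Turn a raw column into categories as in the text: divide by 10 and round.
ap_hi = np.array([118, 121, 124, 139, 141, 160, 119, 122, 140, 138, 161, 120])
cardio = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0])
category = np.round(ap_hi / 10).astype(int)
encoded = oof_target_mean(category, cardio, n_folds=4)
```

Computing each row's value from the other folds keeps a row's own target out of its feature, which is the reason for splitting the data before averaging.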
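Models
Since models can run for a very long time (tens of hours), fail with an error, or be interrupted on purpose, intermediate data has to be saved as well as the final result. For this, each model is given a base name. The name of the file holding a piece of data is then derived from the model name and the name assigned to that data. All saving and loading goes through methods of the base model class, which keeps the storage of intermediate data uniform. A plan for the future is to store the data in a database rather than in files. The drawback of the current implementation is that when copying a model it is easy to forget to update the name and end up with an undefined mix of data from the original model and its copy.
If a model has its computed results saved, all that remains is to read them and return them to the caller. If only intermediate results are available, at least those do not have to be recomputed. This saves a lot of time, especially when the computation takes many hours.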
The data saved by models is divided mainly by its lifetime. Each such group of data has its own base path for storage. There are three groups in total:
- temporary data that will not be used on the next run, for example the best weights of a neural network for individual folds;
- model data that will be needed on the next run: almost everything else;
- globally useful data that serves several models, such as additional features.
All models share a common interface: each can be run on its own as an ordinary script or loaded as a Python module. If a model needs the results of other models, it loads and runs them. As a result, the description of each level 2 model boiled down to a list of names of the models whose results are to be combined and a flag indicating whether features should be selected by the greedy algorithm.
Some of the models are neural networks, which can output a very confident 0 or 1, or values very close to those extremes. When such a prediction is wrong, log loss penalizes the confidence very heavily, so on saving, the values of every model were clipped to stay at least 1e-5 away from 0 and 1. The simplest way turned out to be adding np.clip(z, 1e-5, 1 - 1e-5) and forgetting about it. As a result, the data of all models was clipped, although most of them already produced results in the range of roughly 0.1 to 0.93.
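The effect of clipping on log loss is easy to see on a toy example. The `log_loss` helper below is a plain NumPy implementation of mean binary cross-entropy written for this illustration, and the four predictions are made up; only the np.clip call comes from the text.

```python
import numpy as np


def log_loss(y_true, p):
    """Mean binary cross-entropy; assumes 0 < p < 1."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


y = np.array([1, 0, 1, 0])
z = np.array([0.9, 0.2, 1e-12, 0.3])   # third prediction is confidently wrong

clipped = np.clip(z, 1e-5, 1 - 1e-5)
print(log_loss(y, z), log_loss(y, clipped))
```

A single overconfident mistake dominates the unclipped score, while the clipped predictions cap the penalty of any one row at -log(1e-5).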
Hyperopt
To tune model parameters I used hyperopt. The results improved, but for a long time I set the number of trials to about 20, especially for slow models. Only two days before the end did I come across a mention in an article of hyperopt's bootstrapping: by default, the first 20 model runs use random parameters, which can be checked in the source. Several models had to be re-tuned in a hurry.
Level 1 models
The choice of input data for each model is handled by code common to all level 1 models: always exactly one cleaning variant of the source data and zero or more feature groups. Assembling the data and features into a single dataset is also implemented in code shared by the models. This reduced the code of an individual model to specifying the particular source data and additional features.
I did not have enough time to write shared code for the optimization, so the individual base level 1 models still largely copy one another. In total there were two kinds:
- neural networks (keras);
- trees (XGBoost, LightGBM, rf, et).
The main difference of the neural network models is the absence of hyperparameter fitting. For the other models, hyperopt was used.
Neural networks
I did not choose the neural network parameters seriously, so their results were worse than those of boosting. Later, someone in the chat mentioned a network layout that gave relatively decent results: something like 64-64 with leaky ReLU activation and dropout of 1 to 5 neurons on each layer.
My own neural networks looked roughly like this:
- input;
- a few hundred neurons (usually 256);
- some nonlinearity and, in places, dropout (I assumed there were too many parameters and the network was overfitting, so I took values on the order of 0.7); if the model diverges into NaNs during training, add batch normalization;
- a hundred or two neurons (64-128);
- nonlinearity;
- a dozen or two neurons (16);
- nonlinearity;
- one output neuron with the classic sigmoid activation.
This layout migrated from previous competitions almost unchanged. On their own the neural networks did not perform very well, but they were kept so that their results could be used by the level 2 models.
Choosing the activation function for the inner layers was simple: from the available set I excluded all sigmoid variants (because of near-zero gradients close to the boundary values) and plain ReLU (because neurons that start outputting 0 drop out of training), and took something from what was left. At first it was Parametric ReLU; in the latest models I started using Scaled Exponential Linear Units. I could not notice any significant difference from the substitution.
As with the other models, the data for the neural networks was split into folds with KFold from sklearn. For training on each split the model had to be built anew, since I did not find a quick way to reinitialize the layer weights without rebuilding the model.
Each network was trained for as long as the quality of its predictions on the data set aside for validation kept improving. Every time the validation result improved, the network weights were saved. This was done with the standard Keras callbacks: saving the network state at the best validation result, stopping training early if the validation result has not improved for a certain number of passes over the training sample, and reducing the learning rate if the result has not improved for several passes.
So if training stalled (in a local minimum) and the validation result did not improve for several passes, the learning rate was reduced, and if that did not help, training stopped a few passes later. After training, the best network weights seen over the whole training run were loaded.
I noticed rather late a problem with reusing the same set of callback instances when training several networks. In that case the state of the callbacks is not automatically reset to the initial one when training of a new network starts. As a result, the learning rate of each new network kept getting lower, and the best result of a network was not saved unless it was better than everything obtained earlier by all the networks that used the same callbacks.
Tree-based models
I used two bagging-based model variants, random forest and extra trees, and two gradient boosting implementations, XGBoost and LightGBM. Both random forest variants performed poorly on cross-validation and on the public leaderboard while consuming a lot of computer time. They were kept only in the hope that they would be useful when combining model results. LightGBM and XGBoost performed considerably better, and most of the first-level predictions came from them.
After the parameters were fitted, the same model was computed for several (usually three) initial states of the random number generator. All these results were saved separately for use in the level 2 models. The prediction of the level 1 model itself was taken from the result for the last RNG state used.
LightGBM and XGBoost can stop training when the validation quality has not improved for a given number of iterations. This made it possible to request 10,000 training steps and stop as soon as the validation result stopped improving. As a result, the number of trees did not have to be tuned when choosing parameters for these models. Random forest and extra trees from sklearn have no such facility, so the choice of the number of trees had to be handed to hyperopt. It would have been possible to train them one step at a time, checking the validation quality myself each time, but laziness got in the way.
A little about seeds
The result of an individual model depends heavily on the state of the random number generator. To reduce this dependence, level 1 models were trained with several seeds, and the result for each seed was saved separately. It turned out after the contest had ended that the result for the last seed was what got stored as the level 1 model's own result. The remaining results were still saved and used in the level 2 models.
Level 2 models
Since each level 1 model produced between one and four predictions, the level 2 data contained up to 190 columns. The original data and features did not reach this level, only the predicted probabilities. Each level 2 model combined some subset of the level 1 models (some also used the results of earlier level 2 models).
All level 2 models are arranged in roughly the same way: the models are loaded by module name, their results are collected, the columns to use are optionally selected greedily, and the parameters of a regression that combines the predictions are fitted.
What did not work was adding one more level that combined the predictions of all level 2 models. The public leaderboard result of such a combination was so bad that I did not even consider a second attempt.
To select the subset of predictions that works best in combination, a "greedy" algorithm was used: first the single best column among those available is chosen, then in a loop each remaining column is tried in addition to the ones already selected, and the best one is added. Columns kept being added for as long as the validation result of the level 2 model improved. To save time, BayesianRidge was used as the model during selection; its results were only slightly worse than those of Ridge with well-tuned parameters. This selection usually left about 20 data columns.
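The greedy selection loop can be sketched as follows. To keep the example dependent on NumPy only, a closed-form ridge regression stands in for sklearn's BayesianRidge, and a single train/validation split stands in for the cross-validation used in the contest; the synthetic "level 1 predictions" and all function names are assumptions made for the illustration.

```python
import numpy as np


def log_loss(y, p, eps=1e-5):
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def ridge_predict(X_train, y_train, X_valid, alpha=1.0):
    """Closed-form ridge; the intercept is regularized too, for brevity."""
    A = np.hstack([X_train, np.ones((len(X_train), 1))])
    B = np.hstack([X_valid, np.ones((len(X_valid), 1))])
    w = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y_train)
    return B @ w


def greedy_select(X_train, y_train, X_valid, y_valid):
    """Add one column at a time while the validation log loss improves."""
    selected, best = [], np.inf
    remaining = list(range(X_train.shape[1]))
    while remaining:
        scored = []
        for j in remaining:
            cols = selected + [j]
            pred = ridge_predict(X_train[:, cols], y_train, X_valid[:, cols])
            scored.append((log_loss(y_valid, pred), j))
        score, j = min(scored)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected, best


# Synthetic data: columns 2 and 3 are informative, the rest are noise.
rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=400).astype(float)
good = np.clip(0.2 + 0.6 * y[:, None] + rng.normal(0, 0.2, (400, 2)), 0, 1)
noise = rng.uniform(0, 1, (400, 4))
X = np.hstack([noise[:, :2], good, noise[:, 2:]])
selected, best = greedy_select(X[:300], y[:300], X[300:], y[300:])
```

Each round refits the combiner once per remaining column, so the cost grows roughly quadratically with the number of columns, which is consistent with the author's choice of a fast model for the selection step.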
For the final computation, I initially ran hyperopt over all available regressors, but only BayesianRidge and Ridge from sklearn showed more or less decent results, so the code eventually degenerated into combining with a BayesianRidge model and then trying to improve on its result by fitting the parameters of Ridge.
Validation
At first I validated on 10 folds. Some models then started showing 0.534-0.535 on CV while scoring 0.543-0.544 or worse on the public leaderboard. Towards the end of the contest, to bring the validation and public results closer together, I increased the number of folds to 30. The number 30 was dictated by processor capacity: I chose the largest value for which computing one model took less than 10 hours.
Even so, some models still validated at 0.535-0.536, which, against public results of around 0.543, cast doubt on the adequacy of the validation scheme. About three days before the end of the competition, I added to the 30 folds another 30 random splits of the training data into 0.7 and 0.3 parts. The number 30 was chosen for the same reason as for CV: processor capacity. All splits were controlled through random_state. After that, the best validation result was about 0.537.
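The combined scheme, 30 folds plus 30 random 0.7/0.3 splits with fixed seeds, can be sketched in NumPy as below. The helper names and the dataset size are illustrative assumptions; scikit-learn's KFold and ShuffleSplit with a fixed random_state produce splits of the same kind.

```python
import numpy as np


def kfold_splits(n_rows, n_folds, seed):
    """Yield (train_idx, valid_idx) for a shuffled K-fold split."""
    order = np.random.RandomState(seed).permutation(n_rows)
    for valid_idx in np.array_split(order, n_folds):
        yield np.setdiff1d(order, valid_idx), valid_idx


def random_splits(n_rows, n_splits, valid_share, seed):
    """Yield independent random train/validation splits."""
    rng = np.random.RandomState(seed)
    n_valid = int(round(n_rows * valid_share))
    for _ in range(n_splits):
        order = rng.permutation(n_rows)
        yield order[n_valid:], order[:n_valid]


n_rows = 1000   # illustrative size, not the contest data
splits = list(kfold_splits(n_rows, 30, seed=0))
splits += list(random_splits(n_rows, 30, valid_share=0.3, seed=1))
```

In the K-fold part every row is validated exactly once, while the random splits validate on larger 30% samples that may overlap, so the two halves give somewhat different estimates of the same score.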
This was still far from what I wanted, but only enough days were left to wait for the last models to finish computing, so I had to stop there. In the end I chose two submissions whose results were below 0.543 on public and below 0.538 on validation. As it turned out later, of the 12 such submissions, 7 would have taken third place, so nothing better was missed.