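* farm (from the English "farming"): the long, tedious repetition of particular in-game actions for a specific purpose (gaining experience, gathering resources, and so on).
Introduction
Recently (on October 1) a new session of an excellent DS / ML course started (I strongly recommend it as a first course to anyone who wants to, as it is now called, "get into" DS). And as usual, once a course ends, the graduates ask the same question: where do I now get practical experience to consolidate theoretical knowledge that is still raw? Ask this on any specialized forum and the answer will most likely be the same: go solve Kaggle. Kaggle, yes, but where do you start, and how do you use the platform most effectively to build practical skills? In this article the author tries to answer these questions from his own experience, and to point out where the main rakes lie on the field of competitive DS, so that you can level up faster and have fun doing it.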
A few words about the course creators:
mlcourse.ai is one of the large-scale activities of the OpenDataScience community. @yorko and company (about 60 people) demonstrate that great skills can be acquired outside a university, and even completely free of charge. The main idea of the course is an optimal combination of theory and practice. On the one hand, the basic concepts are not presented without mathematics; on the other hand, the many homework assignments, Kaggle Inclass competitions and projects will, given a certain investment of effort on your part, give you excellent machine learning skills. The competitive nature of the course is hard to miss: an overall rating of students is maintained, which is strongly motivating. The course also stands out because it takes place in a truly lively community.
The course includes two Kaggle Inclass competitions. Both are very interesting, and feature engineering works well in them. The first is identifying a user from the sequence of sites visited. The second is predicting the popularity of an article on Medium. The main benefit comes from two homework assignments in which you have to be resourceful and beat the baselines in these competitions.
Having paid tribute to the course and its creators, we continue our story...
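I remember myself a year and a half ago. I had completed Andrew Ng's course (still the first version), finished the specialization from the Moscow Institute of Physics and Technology, and read a mountain of books. My head was full of theory, but any attempt to solve a basic real-world task ended in a stupor. I knew how to solve the problem and which algorithms to apply, but writing the code was very hard, with a peek at the sklearn / pandas help every minute. The reason: no accumulated pipelines and no feel for the code "at my fingertips".
This will not do, the author thought, and went to Kaggle. Starting straight away with a live competition was scary, so the first step was the Getting started competition "House Prices: Advanced Regression Techniques", and it was while working through it that the approach to effective leveling described in this article took shape.
There is no know-how in what follows: all the techniques, methods and tricks are obvious and predictable, but that does not make them less effective. At the very least, by following them the author earned the Kaggle Competition Master badge in six months and three competitions in solo mode and, at the time of writing, is in the top 200 of the Kaggle world ranking. That, incidentally, answers the question of why the author allowed himself the nerve to write an article like this.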
Kaggle in a nutshell
Kaggle is one of the best-known platforms for hosting data science competitions. In each competition the organizers publish a description of the problem, the data for solving it, and the metric by which solutions will be evaluated, and set the deadlines and prizes. Participants get 3 to 5 attempts per day (at the organizers' discretion) to "submit" (send in their version of the solution).
The data is split into a training set (train) and a test set (test). For the training part the value of the target variable is known; for the test part it is not. The participants' task is to build a model that, trained on the training part, gives the best possible result on the test part.
Each participant makes predictions for the test set and sends the result to Kaggle; a robot (which knows the target variable for the test set) then scores the submission, and the score is shown on the leaderboard.
But it is not quite that simple: the test data is in turn split in a certain proportion into a public part and a private part. During the competition a submitted solution is scored, by the metric the organizers chose, on the public part of the data and posted on the leaderboard (the so-called public leaderboard), which lets participants gauge the quality of their models. The final solutions (usually two, chosen by the participant) are scored on the private part of the test data, and the result goes to the private leaderboard. It becomes available only after the competition ends, and it is the one that determines the final ranking and the distribution of prizes and medals.
So during the competition participants only know how their model behaved (what result, or score, it showed) on the public part of the test data. If, as with a spherical horse in a vacuum, the private part matches the public part in distribution and statistics, all is well. If not, a model that did well on the public part may not work on the private part, that is, it overfits. This is where what the jargon calls a "flight" happens: people who were around 10th place on the public leaderboard drop on the private one, because the model they chose overfitted and could not deliver the required accuracy on new data.
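The "flight" can be illustrated with a small simulation. This is only a sketch under invented assumptions (200 equally good submissions, each right 70% of the time, a public part of 1,000 rows and a private part of 10,000); none of the numbers come from a real competition. It shows that the submission that looks best on the public part owes much of its lead to noise, and that the lead disappears on the private part.

```python
import numpy as np

rng = np.random.default_rng(0)

n_models = 200        # candidate submissions, all equally good by construction
n_public = 1_000      # rows in the public part of the test set (assumed)
n_private = 10_000    # rows in the private part (assumed)
true_acc = 0.70       # every model is right 70% of the time (assumed)

# True = correct prediction, drawn independently per model and per row
public_hits = rng.random((n_models, n_public)) < true_acc
private_hits = rng.random((n_models, n_private)) < true_acc

public_acc = public_hits.mean(axis=1)
private_acc = private_hits.mean(axis=1)

# Pick the submission that looks best on the public leaderboard
best = int(np.argmax(public_acc))

print(f"public score of the pick:  {public_acc[best]:.3f}")
print(f"private score of the pick: {private_acc[best]:.3f}")
```

The smaller the public part and the more submissions you choose among, the larger this gap tends to be, which is one reason to rely on your own validation rather than on the public score.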
How do you avoid this? First of all you need to build a correct validation scheme, which is taught in the first lessons of almost every DS course. If your model cannot give a correct prediction on data it has never seen, then no matter how sophisticated the technique you use or how complex the neural networks you build, such a model cannot be released to production, because its results are worth nothing.
For each competition Kaggle creates a separate page with a data section, a description of the metric and, most interesting for us, a forum and kernels.
A forum is a forum, on Kaggle as anywhere: people write, discuss and share ideas. Kernels are more interesting. In essence, a kernel lets you run your own code with direct access to the competition data in the Kaggle cloud (an analogue of Amazon's AWS, Google's GCE, and so on). Each kernel is allocated limited resources, so if there is not much data you can work with it right in the browser on the Kaggle site: write the code, run it, submit the result. Two years ago Kaggle was acquired by Google, so it is no surprise that this functionality uses Google Cloud Engine "under the hood".
Moreover, there have been several competitions (a recent one is Mercari) where the data could only be worked with through kernels. It is a very interesting format that levels out differences in participants' hardware and forces you to switch your brain on to optimize code and approaches, since the kernels naturally had hard resource limits, at that time 4 cores / 16 GB RAM / 60 minutes of run time / 1 GB of scratch and output disk space. While working on this competition the author learned more about optimizing neural networks than from any theoretical course. He fell just short of gold, finishing 23rd solo, but gained plenty of experience and enjoyment...
Taking this opportunity, I would like to thank my ods.ai colleagues Arthur Stepanenko (arthur), Konstantin Lopukhin (kostia) and Sergey Fironov (sergeif) for their advice and support in this competition. There were many interesting moments in general: Konstantin Lopukhin (kostia), who took first place together with Paweł Jankiewicz, later posted what the chat called the "reference humiliation in 75 lines", a kernel of 75 lines of code that scores in the gold zone of the leaderboard. That, of course, has to be seen :)
Anyway, we digress. People write code and post kernels with solutions, interesting ideas and so on. Usually, a couple of weeks into each competition, one or two excellent EDA (exploratory data analysis) kernels appear, with a detailed description of the dataset, its statistics, characteristics and so on. There are also a few baselines (basic solutions), which of course do not show the best result on the leaderboard but can be used as a starting point for your own solution.
Why Kaggle?
In fact it does not matter which platform you play on. Kaggle is simply one of the first and most heavily promoted, with an excellent community and a fairly comfortable environment (I hope they improve the stability and performance of kernels, because many people remember the hell that went on in Mercari). Overall, though, the platform is very convenient and self-sufficient, and its badges are still valued.
A small digression on competitive DS in general. Very often, in articles, conversations and elsewhere, you hear the idea that this is all nonsense, that competition experience has nothing to do with real tasks, and that people there are busy tuning the fifth decimal place, which is absurd and detached from reality. Let us look at this a little more closely.
As practicing DS specialists, unlike academia and science, we must solve business problems in our work. That is (this is a reference to CRISP-DM), to solve a task you need to:
- understand the business task
- assess whether the data could contain the answer to this business task
- collect additional data if the existing data is not enough to get an answer
- choose the metric that most closely approximates the business goal
- and only after that choose a model, convert the data to suit the chosen model and "stack xgboosts". (C)
The first four items on this list are not taught anywhere (correct me if such courses have appeared; I will sign up without hesitation); here you can only learn from the experience of colleagues working in the industry. But the last item, from model selection onwards, can and should be trained in competitions.
In any competition the organizers have done most of the work for us. We already have a described business goal, a chosen approximating metric and collected data, and our task is to build a working pipeline out of all this Lego. This is where the skills get trained: how to handle missing values, how to prepare data for neural networks and for trees (and why neural networks need a special approach), how to build validation correctly, how not to overfit, how to choose hyperparameters, and a dozen or two more "how"s, whose competent execution is what distinguishes a good specialist from people just passing through our profession.
What you can "farm" on Kaggle
Basically, and reasonably, all newcomers come to Kaggle to gain and build practical experience, but do not forget that there are at least two other goals besides that:
- farming medals and badges
- farming reputation in the Kaggle community
The main thing to remember is that these three goals are completely different, they require different approaches, and you should not mix them, especially at the initial stage!
The words "at the initial stage" are emphasized for a reason: once you have leveled up, the three goals merge into one and are pursued in parallel, but while you are just starting out, do not mix them! That way you will avoid pain, disappointment and resentment at this unfair world.
Let us go briefly through the goals from the bottom up:
- Reputation is built by writing good posts (and even comments) on the forum and creating useful kernels, for example EDA kernels (see above) or posts describing non-standard techniques.
- Medals are a very controversial, flame-war topic, because they are earned by blending public kernels (*), by joining a team with lopsided experience, and, of course, by building your own top pipeline.
- Experience is built by analyzing solutions and working on your mistakes.
(*) Blending public kernels is a medal-farming technique: you pick the published kernels with the highest score on the public leaderboard, average (blend) their predictions, and submit the result. As a rule this leads to hard overfitting and a flight on the private leaderboard, but sometimes it gets a submission almost into silver. The author does not recommend this approach at the initial stage (read about the belt and trousers below).
I recommend choosing "experience" as your first goal and sticking with it until you feel ready to work on two or three goals at once.
Two more points are worth mentioning (thanks to Vladimir Iglovikov (ternaus) for the reminder).
The first is converting the effort invested in Kaggle into a new, more interesting and/or better-paid job. However much Kaggle badges get devalued, for people who understand, a "Kaggle Competition Master" line in a résumé, and other achievements, are still worth something.
To illustrate this point, there are two interviews (1, 2) with our colleagues Sergey Mushinsky (cepera_ang) and Alexander Buslaev (albu).
And also the opinion of Valery Babushkin (venheads):
Valery Babushkin, Head of Data Science at X5 Retail Group (current headcount 30 people + 20 vacancies from 2019)
Head of the Yandex Advisor analytics group
Kaggle Competition Master is an excellent proxy metric for evaluating a future team member. Of course, given recent events in the form of 30-person teams and undisguised "locomotives", a profile now needs slightly more careful study than before, but that is still a matter of a few minutes. Someone who has earned the Master title very likely knows how to write code of at least average quality, is reasonably well versed in machine learning, and knows how to clean data and build robust solutions. If there is no Master title to boast of, the fact of participation is still a plus: at the very least the candidate knows Kaggle exists and was not too lazy to spend time mastering it. And if something other than a public kernel was run, and the resulting solution beat it (which is quite easy to check), that is an occasion for a detailed conversation about technical details, which is far better and more interesting than classic theory questions, whose answers tell you less about how the person will cope with the job. The only thing to be wary of, and which I have run into, is that some people think DS work is roughly like Kaggle, which is fundamentally wrong. Many also think that DS = ML, which is wrong too.
The second point is that solutions to many problems can be written up as preprints or articles. On the one hand this keeps the knowledge that the collective mind produced during the competition from dying in the depths of the forum; on the other, it adds another line to the authors' portfolios and +1 to visibility, which in any case is good for both career and citation index.
Authors (in alphabetical order):
Andrei O., Ilya, albu, aleksart, alex.radionov, almln, alxndrkalinin, cepera_ang, dautovri, davydov, fartuk, golovanov, ikibardin, kes, mpavlov, mvakhrushev, n01z3, rakhlin, rauf, resutut, sitator, sitator, sitator, sitator, snikolenko, ternaus, twoleggedeye, vs, vicident, zfturbo
How to avoid the pain of losing medals
Stop worrying about it!
Let me explain. In almost every competition, close to the end, someone publishes a kernel with a solution that shifts the whole leaderboard up, and you, with your solution, correspondingly down. And every time, the PAIN begins on the forum! How can this be? I had a silver-medal solution and now I do not even make bronze. What is going on? Put everything back!
Remember: Kaggle is competitive DS. Your place on the leaderboard depends on you and only you. Not on the person who posted the kernel, not on whether the stars aligned, but only on how much effort you put into your solution and whether you used every possible way to improve it.
If a public kernel knocks you out of your place on the leaderboard, it was not your place.
Instead of pouring out your pain at the injustice of the world, thank that person. Seriously: a public kernel with a better solution than yours means you missed something in your pipeline. Find out exactly what, improve your pipeline, and overtake the whole crowd of hamsters sitting on the same score. Remember, to get back to your place you only need to be a little better than that public kernel.
How this used to upset me in my first competitions; I just wanted to give up. One moment you are in silver, and the next you are... at the bottom of the leaderboard. Never mind. You just have to pull yourself together, work out where and what you missed, redo your solution, and get back to your place.
Moreover, this will only be an issue in the early stage of your competitive career. The more experienced you become, the less published kernels and the stars will affect you. In one recent competition (Talking Data, where our team took 8th place) such a kernel was also posted, but it earned just one line in our team chat, from Pavel Pleskov (ppleskov): "I blended it with our solution, it only got worse, so we are throwing it out." In other words, all the useful signal that kernel extracted from the data was already being extracted by our models.
And about medals, remember:
"A belt without technique is only needed to hold up your trousers." (C)
Where, in what and how to write code
My recommendation here is Python 3.6 in a jupyter notebook on ubuntu. Python has long been the de facto standard in DS, given the huge number of libraries and the community; jupyter, especially with jupyter_contrib_nbextensions, is very convenient for rapid prototyping, analysis and data processing; ubuntu is convenient in its own right, and some data processing is sometimes easier to do in bash :)
After installing jupyter_contrib_nbextensions I recommend enabling right away:
- Collapsible headings (a great help in organizing blocks of code)
- Code folding (same)
- Split cells (rarely needed, but handy when you have to debug something in parallel)
And your life will become much easier and more pleasant.
As soon as your pipelines become more or less stable, I recommend moving the code into separate modules right away. Believe me, you will rewrite it not twice and not five times, but many more. And that is normal.
There is also the opposite approach, where participants use jupyter notebook as rarely as possible and only when necessary, preferring to write pipelines as scripts from the start. (One adherent of this option is Vladimir Iglovikov (ternaus).)
And some people try to combine jupyter with an IDE such as pycharm.
Each approach has a right to exist, and each has its pros and cons; as the saying goes, "felt-tip pens differ in taste and color". Choose what you are comfortable working in.
But whichever option you choose, make it a rule to save the code for every submit / OOF (see below).
(*) OOF (out of folds) is a technique for obtaining model predictions for the training part of the dataset using cross-validation. It is indispensable for later assembling several solutions into an ensemble. Again, it is taught in courses and is easy to google.
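As an illustration of OOF, here is a minimal sketch in plain numpy. It is not the author's code: the helper names (fit_ridge, predict_ridge, make_oof), the closed-form ridge model and the synthetic data are all assumptions made for the example. Each training row gets a prediction from a model that never saw it, and the test prediction is the average over the fold models.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict_ridge(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def make_oof(X, y, X_test, n_folds=5, alpha=1.0, seed=0):
    """Return out-of-fold predictions for train and fold-averaged ones for test."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.zeros(len(y))
    test_pred = np.zeros(X_test.shape[0])
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False          # hold this fold out of training
        w = fit_ridge(X[train_mask], y[train_mask], alpha)
        oof[val_idx] = predict_ridge(w, X[val_idx])
        test_pred += predict_ridge(w, X_test) / n_folds
    return oof, test_pred

# Synthetic data, invented for the example
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=500)
X_test = rng.normal(size=(100, 5))

oof, test_pred = make_oof(X, y, X_test)
oof_rmse = float(np.sqrt(np.mean((oof - y) ** 2)))
print(f"OOF RMSE: {oof_rmse:.3f}")
```

Saving `oof` and `test_pred` side by side for every model is what later makes it possible to fit a second-level model on the OOF columns and apply it to the test columns.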
How? There are at least three options:
- For each competition, create a separate repository on github or bitbucket, and commit the code for each submit with a comment containing the score obtained, the model parameters, and so on.
- After each submit, collect the code into a separate archive whose file name carries all the meta-information about the submit (score obtained, parameters, and so on).
- Use a version control system built specifically for DS / ML tasks, for example https://dvc.org.
In general the community is gradually moving to the third option, because both the first and the second have drawbacks, but they are simple and reliable and, frankly, quite sufficient for Kaggle.
One more thing about python, for those who are not programmers: do not be afraid of it. Your task is to understand the basic code structures and the essence of the language, so that you can make sense of other people's kernels and write your own libraries. There are many good beginner courses on the web; perhaps the comments will suggest where exactly.
[Passage not recoverable from the source. Surviving terms: Kaggle; ResNet/VGG; Camera Identification; [ ods.ai ]; 46; 300.]
The pipeline stages (as jupyter notebooks + modules):
- EDA (exploratory data analysis)
- Data Cleaning
- Data preparation
    - (…, …, FM/FFM)
    - text (Vectorizers, TF-IDF, Embeddings)
- Models
    - Linear models
    - Tree models
    - Neural Networks
    - Exotic (FM/FFM)
- Feature selection
- Hyperparameters search
- Ensemble
[Passage not recoverable from the source. Surviving terms: CSV, feather/pickle/hdf; TalkingData, memmap, lgb; hdf/feather, CSV; Getting started (House Prices: Advanced Regression Techniques).]
…, 100%:
- EDA (…)
- Data Cleaning (fillna, …)
- Data preparation
    - (… label/ohe/frequency, …)
    - (…)
- Models
    - Linear models (… ridge/logistic)
    - Tree models (lgb)
- Feature selection
- grid/random search
- Ensemble
    - Regression / lgb
[Passage not recoverable from the source. Surviving terms: Mercedes, Santander; "How to Win a Data Science Competition: Learn from Top Kagglers"; (EDA / Preparation / Model / Ensemble / Feature selection / Hyperparameters search / ...).]
You decide.
Why wait 5 days instead of reading right away, when you could be asking questions on the forum? At this stage (in my opinion) it is better to read threads where the discussion of solutions has already taken shape. Any questions you might have have either already been asked by someone, or are better not asked at all but answered by searching yourself.
Why do all this? Once again: the task at this stage is to build up a base of solutions, methods and approaches. A combat-ready base. So that in the next competition you do not waste time but say right away: aha, mean target encoding will work here, and by the way, I have correct code for it, done through folds within folds. Or: oh, I remember that an ensemble via scipy.optimize worked there, and by the way, the code is already prepared.
Something like that...
Switching to working mode
In this mode you solve a few competitions. Each time you notice that there are fewer notes on your sheets and more code in your modules. Gradually the analysis task comes down to simply reading the descriptions of solutions and adding one or two new spells or approaches to your piggy bank.
After that, the mode changes to working on mistakes. Your base is ready; now it just has to be applied correctly. After each competition, read the descriptions of the solutions and look at what you did not do, what could have been done better, what you missed, or where you flat-out went wrong, as I did in Toxic. I was doing quite well, sitting just below gold, and then flew down 1,500 positions on the private leaderboard. It hurt to the point of tears... but I calmed down, found the mistake, wrote a post in the slack, and learned the lesson.
A sign that you have finally reached working mode is that one of the write-ups of the top solutions is written under your nickname.
What, roughly, should your pipelines contain by the end of this stage?
- All kinds of preprocessing and creation of numerical features: projections, ratios
- Various ways of handling categories: mean target encoding done the right way, frequencies, label / ohe
- Various text embedding schemes (Glove, Word2Vec, Fasttext)
- Various text vectorization schemes (Count, TF-IDF, Hash)
- Several validation schemes (N*M for standard cross-validation, time-based, by group)
- Bayesian optimization / hyperopt / something else for choosing hyperparameters
- Shuffle / Target permutation / Boruta / RFE for feature selection
- Linear models, in a uniform style over one dataset
- LGB / XGB / Catboost, in a uniform style over one dataset
The author wrote separate metaclasses for linear and tree-based models, with a single external interface, to level out the API differences between models. In return, a model can now be run in one line in a uniform way, for example LGB or XGB over one processed dataset.
- Several neural networks for all occasions (not touching images for now): embeddings / CNN / RNN for text, RNN for sequences, feed-forward for everything else. It is good to understand autoencoders and be able to build them.
- Ensembles based on lgb / regression / scipy, for regression and classification tasks
- It is good to already know genetic algorithms; sometimes they work well
Summing up
Any sport, and competitive DS is also a sport, means a lot of sweat and a lot of work. That is neither good nor bad; it is a fact. Taking part in competitions (if you approach the process properly) trains technical skills very well and, on top of that, builds some sporting spirit, for when you really do not feel like doing anything and everything is breaking down, but you get up to the laptop anyway, redo the model and launch the calculation, to claw out that wretched fifth decimal place.
So go solve Kaggle: farm experience, medals and fun!
A few words about the author's pipelines
In this section I will try to describe the main ideas behind the pipelines and modules collected over a year and a half. Again, this approach does not claim to be universal or unique, but it may suddenly help someone.
- All feature engineering code, except mean target encoding, lives in a separate module as functions. I tried assembling it through objects, but that turned out cumbersome and, in this case, unnecessary.
- All feature engineering functions are written in the same style, with a single call and return signature:
    def do_cat_dummy(data, attrs, prefix_sep='_ohe_', params=None):
        # do something
        return _data, new_attrs
The input is the dataset, the attributes to work on, a prefix for the new attributes, and additional parameters. The output is a new dataset with the new attributes, plus the list of those attributes. This new dataset is then saved to a separate pickle / feather.
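To make the signature concrete, here is a hedged sketch of how two such functions could be filled in with pandas. Only the function names and argument lists come from the article; the bodies, the choice to return only the new columns, and the toy column cat67 are assumptions made for the illustration, not the author's implementation.

```python
import pandas as pd

def do_cat_cnt(data, attrs, params=None, prefix='cnt_'):
    """Frequency encoding: replace each category by how often it occurs."""
    _data = pd.DataFrame(index=data.index)
    new_attrs = []
    for attr in attrs:
        new_attr = prefix + attr
        counts = data[attr].value_counts()
        _data[new_attr] = data[attr].map(counts).fillna(0).astype(int)
        new_attrs.append(new_attr)
    return _data, new_attrs

def do_cat_dummy(data, attrs, prefix_sep='_ohe_', params=None):
    """One-hot encoding with the attribute name as the column prefix."""
    _data = pd.get_dummies(data[attrs], columns=attrs, prefix_sep=prefix_sep)
    return _data, list(_data.columns)

# Toy input, invented for the example
df = pd.DataFrame({'cat67': ['a', 'b', 'a', 'c', 'a']})

cnt, cnt_attrs = do_cat_cnt(df, ['cat67'])
ohe, ohe_attrs = do_cat_dummy(df, ['cat67'])
print(cnt_attrs, ohe_attrs)

# Each block could then be stored on its own, e.g. cnt.to_pickle('cat67_freq.pkl')
# (file name hypothetical), and loaded again when a training set is assembled.
```

Because every function returns the same pair (a frame of new columns and their names), blocks produced by different functions can be concatenated in any combination without special cases.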
This makes it possible to assemble a training dataset quickly from pre-generated building blocks. For example, for categories we do three kinds of processing at once (Label Encoding / OHE / Frequency), save them to three separate feathers, and then at the modeling stage simply play with these blocks, creating different training datasets in one elegant move.
    pickle_list = [
        'attrs_base',
        'cat67_ohe',
        # 'cat67_freq',
    ]
    short_prefix = 'base_ohe'
    _attrs, use_columns, data = load_attrs_from_pickle(pickle_list)
    cat_columns = []
If another dataset needs to be built, change pickle_list, rerun, and work with the new dataset.
The main set of functions for tabular data (real-valued and categorical) includes various encodings of categories, projection of numerical attributes onto categorical ones, and various transformations.
    def do_cat_le(data, attrs, params=None, prefix='le_'):
    def do_cat_dummy(data, attrs, prefix_sep='_ohe_', params=None):
    def do_cat_cnt(data, attrs, params=None, prefix='cnt_'):
    def do_cat_fact(data, attrs, params=None, prefix='bin_'):
    def do_cat_comb(data, attrs_op, params=None, prefix='cat_'):
    def do_proj_num_2cat(data, attrs_op, params=None, prefix='prj_'):
A universal Swiss-army knife for combining attributes: we pass it a list of source attributes and a list of transformation functions, and get back, as usual, a dataset and a list of new attributes.
    def do_iter_num(data, attrs_op, params=None, prefix='comb_'):
Plus various additional specific converters.
For text data there is a separate module with various methods for preprocessing, tokenization, lemmatization / stemming, conversion to a frequency table, and so on. It is all standard, using sklearn, nltk and keras.
Time series are also handled by a separate module, with functions that convert the source dataset both for ordinary tasks (regression / classification) and for sequence-to-sequence ones. Thanks to François Chollet for finishing keras to the point where building seq-2-seq models no longer resembles a voodoo ritual for summoning demons.
The same module, incidentally, has functions for ordinary statistical analysis of series: stationarity checks, STL decomposition, and so on. This helps a lot at the initial analysis stage, to "feel out" the series and see what it actually is.
Functions that cannot be applied to the whole dataset at once, but have to be used inside the folds during cross-validation, live in a separate module:
- mean target encoding
- upsampling / downsampling
They are passed into the model class at the training stage (read about models below).
    _fpreproc = fpr_target_enc
    _fpreproc_params = fpr_target_enc_params
    _fpreproc_params.update(**{
        'use_columns' : cat_columns,
    })
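Why mean target encoding has to run inside the folds can be shown with a small sketch. The function below is hypothetical (the article does not show the body of fpr_target_enc); it assumes pandas and a simple additive smoothing towards the global mean. The key property is that the statistics come only from the training part of the fold, so the validation rows never see their own target.

```python
import pandas as pd

def target_encode_fold(train, valid, col, target, smoothing=10.0):
    """Encode `col` in both frames using target statistics from `train` only."""
    prior = train[target].mean()
    stats = train.groupby(col)[target].agg(['mean', 'count'])
    # Shrink rare categories towards the global mean
    smooth = (stats['mean'] * stats['count'] + prior * smoothing) / (stats['count'] + smoothing)
    enc_train = train[col].map(smooth)
    enc_valid = valid[col].map(smooth).fillna(prior)  # unseen category -> prior
    return enc_train, enc_valid

# Toy fold, invented for the example
train = pd.DataFrame({'city': ['a', 'a', 'b', 'b', 'b'], 'y': [1, 0, 1, 1, 1]})
valid = pd.DataFrame({'city': ['a', 'c']})

enc_train, enc_valid = target_encode_fold(train, valid, 'city', 'y', smoothing=0.0)
print(enc_valid.tolist())
```

Note that the training rows are still encoded with statistics that include their own target; the "folds within folds" variant mentioned earlier in the article would repeat the same split once more inside the training part to remove that leak as well.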
- For modeling, a metaclass was created that generalizes the notion of a model, with abstract methods: fit / predict / set_params / etc. For each specific library (LGB, XGB, Catboost, SKLearn, RGF, ...) an implementation of this metaclass was written.
That is, to work with LGB we create the model
    model_to_use = 'lgb'
    model = KudsonLGB(task='classification')
For XGB:
    model_to_use = 'xgb'
    metric_name = 'auc'
    task = 'classification'
    model = KudsonXGB(task=task, metric_name=metric_name)
And all functions from then on operate on model.
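The idea of one interface over different libraries can be sketched with an abstract base class. Everything here is illustrative: BaseModel, MeanModel, RidgeModel and holdout_rmse are hypothetical names, and the two toy models stand in for the LGB / XGB wrappers, whose code the article does not show.

```python
from abc import ABC, abstractmethod
import numpy as np

class BaseModel(ABC):
    """Common interface; concrete wrappers hide each library's own API."""

    def __init__(self, task='regression', **params):
        self.task = task
        self.params = dict(params)

    def set_params(self, **params):
        self.params.update(params)
        return self

    @abstractmethod
    def fit(self, X, y): ...

    @abstractmethod
    def predict(self, X): ...

class MeanModel(BaseModel):
    """Baseline: always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

class RidgeModel(BaseModel):
    """Closed-form ridge regression without an intercept."""
    def fit(self, X, y):
        alpha = self.params.get('alpha', 1.0)
        A = X.T @ X + alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(A, X.T @ y)
        return self

    def predict(self, X):
        return X @ self.coef_

def holdout_rmse(model, X_tr, y_tr, X_va, y_va):
    """Validation code that only relies on the shared fit / predict interface."""
    pred = model.fit(X_tr, y_tr).predict(X_va)
    return float(np.sqrt(np.mean((pred - y_va) ** 2)))

# Synthetic data, invented for the example
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

models = {'mean': MeanModel(), 'ridge': RidgeModel(alpha=1.0)}
scores = {name: holdout_rmse(m, X[:150], y[:150], X[150:], y[150:])
          for name, m in models.items()}
print(scores)
```

The validation function never checks which model it received, which is the property the article describes: swapping the library means swapping one constructor call.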
For validation, several functions were written: some compute both the prediction and the OOF for several seeds at once during cross-validation, and a separate one does ordinary validation via train_test_split. All validation functions operate through the meta-model's methods, which gives model-independent code and makes it easy to plug any other library into the pipeline.
    res = cv_make_oof(
        model, model_params, fit_params, dataset_params,
        XX_train[use_columns], yy_train, XX_Kaggle[use_columns],
        folds,
        scorer=scorer, metric_name=metric_name,
        fpreproc=_fpreproc, fpreproc_params=_fpreproc_params,
        model_seed=model_seed, silence=True
    )
    score = res['score']
For feature selection there is nothing interesting: standard RFE, plus my favorite shuffle permutation in every possible variant.
For hyperparameter search, Bayesian optimization is mainly used, again in a unified form so that the search can be run for any model (through the cross-validation module). This block lives in the same notebook as the modeling.
For ensembles, several functions were written, unified for regression and classification tasks and based on Ridge / Logreg, LGB, neural networks, and my favorite scipy.optimize.
A brief explanation: each model in the pipeline produces two files as its result, sub_xxx and oof_xxx, which are the prediction for test and the OOF prediction for train. The ensemble module then loads the prediction pairs from all models in a given directory into two dataframes, df_sub / df_oof. After that we look at the correlations, pick the best, build level-2 models on df_oof and apply them to df_sub.
Sometimes a genetic-algorithm search is a good way to find the best subset of models (the author uses this library). In the simplest cases, standard regressions and scipy.optimize work well.
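One possible shape of the scipy.optimize variant is sketched below. It is an assumption-laden illustration, not the author's module: the function name fit_blend_weights, the MSE objective, the SLSQP method and the synthetic OOF columns are all chosen for the example. Weights are fitted on OOF predictions and constrained to be non-negative and to sum to one.

```python
import numpy as np
from scipy.optimize import minimize

def fit_blend_weights(oof_preds, y):
    """oof_preds: (n_samples, n_models). Returns non-negative weights summing to 1."""
    n_models = oof_preds.shape[1]

    def mse(w):
        return np.mean((oof_preds @ w - y) ** 2)

    result = minimize(
        mse,
        x0=np.full(n_models, 1.0 / n_models),   # start from a plain average
        method='SLSQP',
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}],
    )
    return result.x

# Synthetic OOF predictions: a good model, a noisy one, and pure noise
rng = np.random.default_rng(3)
y = rng.normal(size=1000)
oof_preds = np.column_stack([
    y + rng.normal(scale=0.1, size=1000),
    y + rng.normal(scale=1.0, size=1000),
    rng.normal(size=1000),
])

weights = fit_blend_weights(oof_preds, y)
blend_mse = float(np.mean((oof_preds @ weights - y) ** 2))
equal_mse = float(np.mean((oof_preds.mean(axis=1) - y) ** 2))
print(np.round(weights, 3), blend_mse, equal_mse)
```

The same weights would then be applied to the matching test predictions (the df_sub side); fitting them on OOF rather than on in-sample predictions is what keeps the blend from simply rewarding the most overfitted model.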
Neural networks live in a separate module. The author uses keras in the functional style; yes, it is not as flexible as pytorch, but it is enough for now. Again, universal training functions have been written that are invariant to the type of network.
This pipeline was tested once more in the recent Home Credit competition: careful and accurate use of all the blocks and modules brought 94th place.
The author is in general ready to voice the thought that, for tabular data and a properly built pipeline, the final submission in any competition should land in the top 100 of the leaderboard. Naturally there are exceptions, but on the whole the statement seems to hold.
On teamwork
Whether to tackle Kaggle in a team or solo is not an easy call; a lot depends on the person (and on the team). But my advice to those just starting out is to begin solo. Why? Let me explain my point of view:
- First, you will understand your strengths, see your weaknesses, and in general be able to assess your potential as a DS practitioner.
- Second, even when working in a team (unless it is an established team with separated roles), people will still expect a ready-made, complete solution from you, so you should already have working pipelines. ("Submit or it didn't happen") (C)
- And third, it is best when the players on a team are at roughly the same level (and a fairly high one); then you can learn something really useful and high-level. In weak teams (nothing derogatory here, I am talking about the level of training and experience on Kaggle) it is, in my opinion, very hard to learn anything; you are better off chewing through the forum and kernels. Yes, you can farm medals that way, but see above about goals and the belt for holding up trousers.
Useful tips from Captain Obvious, and the promised map of rakes :)
These tips reflect the author's experience; they are not dogma and can (and should) be checked with your own experiments.
Always start by building sound validation. Without it, all your other efforts will go into the furnace. Take another look at the Mercedes leaderboard.
The author is genuinely pleased that in this competition he built a stable cross-validation scheme (3x10 folds), which held the score and brought a well-deserved 42nd place.
If sound validation is in place, always trust your validation results. If your models' score improves on validation but gets worse on the public leaderboard, it is more reasonable to trust validation. When analyzing, simply treat the piece of data behind the public leaderboard as one more fold. You would not overfit your model to a single fold, would you?
If the model and the scheme allow it, always make OOF predictions and keep them next to the model. At the ensembling stage you never know what will pay off.
Always keep the code that produced a result / OOF next to it. It does not matter where: github, locally, anywhere. Twice it turned out that the best model in the ensemble was one built two weeks earlier out of the box, and whose code had not been saved. Pain.
Forget about picking the "right" seed for cross-validation; the author sinned with this himself at first. Better to pick any three and run 3xN cross-validation. The result will be more stable and easier to work with.
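A 3xN scheme can be sketched in a few lines of numpy. This is an illustration under assumptions (a closed-form ridge model, RMSE as the metric, three arbitrary seeds and synthetic data), not the author's validation module. The same N-fold split is repeated with three different shuffles and the scores are averaged, so no single lucky or unlucky split decides the outcome.

```python
import numpy as np

def kfold_rmse(X, y, n_folds, seed, alpha=1.0):
    """Mean RMSE over one shuffled K-fold split, using closed-form ridge."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for val_idx in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[val_idx] = False
        A = X[mask].T @ X[mask] + alpha * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[mask].T @ y[mask])
        scores.append(np.sqrt(np.mean((X[val_idx] @ w - y[val_idx]) ** 2)))
    return float(np.mean(scores))

def repeated_cv(X, y, n_folds=10, seeds=(11, 22, 33)):
    """3xN cross-validation: any three seeds, scores averaged."""
    per_seed = [kfold_rmse(X, y, n_folds, s) for s in seeds]
    return float(np.mean(per_seed)), per_seed

# Synthetic data, invented for the example
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=300)

score, per_seed = repeated_cv(X, y)
print(f"3x10 CV RMSE: {score:.4f}  per seed: {[round(s, 4) for s in per_seed]}")
```

The spread between the per-seed scores is also a useful yardstick: an improvement smaller than that spread is probably not a real improvement.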
Do not chase the number of models in the ensemble: fewer is better, but more diverse, in terms of models, preprocessing and datasets. In the worst case, diverse in parameters, for example one deep tree with hard regularization and one shallow tree.
Use shuffle / boruta / RFE for feature selection, and keep in mind that feature importance in various tree-based models is a metric measured in "parrots", that is, in arbitrary units.
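The shuffle approach can be sketched as follows. This is a minimal numpy illustration with a hypothetical helper name (shuffle_importance), a least-squares model and synthetic data; it is not the author's code. A feature's importance is the increase in validation error after its column is randomly permuted, which breaks its link to the target while keeping its distribution.

```python
import numpy as np

def shuffle_importance(predict, X_val, y_val, n_repeats=5, seed=0):
    """Mean increase in validation MSE when each column is shuffled in turn."""
    rng = np.random.default_rng(seed)

    def mse(pred):
        return float(np.mean((pred - y_val) ** 2))

    base = mse(predict(X_val))
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(mse(predict(X_perm)) - base)
        importances[j] = np.mean(increases)
    return importances

# Synthetic data: feature 0 matters a lot, feature 2 a little, feature 1 not at all
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=600)
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]

w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)   # plain least squares as the model
imp = shuffle_importance(lambda A: A @ w, X_va, y_va)
print(np.round(imp, 4))
```

Unlike split-count or gain importances from a tree library, these numbers are in units of the validation metric itself, so they are comparable across models. scikit-learn provides a ready-made version as sklearn.inspection.permutation_importance.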
The author's personal opinion (which may not coincide with the reader's): Bayesian optimization > random search > hyperopt for choosing hyperparameters. (">" == better)
A kernel posted in public that tears up the leaderboard is best handled as follows:
- There is time: look at what is new in it and build it into your own solution
- Little time: redo it under your validation, make an OOF, and attach it to the ensemble
- No time at all: blend it with your best solution and check the score
How to choose the two final submissions: by intuition, naturally. But seriously, people usually practice one of the following approaches:
- A conservative submit (on robust models) / a risky submit
- Best on OOF / best on the public leaderboard
Remember: everything is a number, and the ways of processing it depend only on your imagination. Use classification instead of regression, treat sequences as images, and so on.
And finally:
- Join ods.ai :) Chat, and enjoy DS and life!
Useful links
General
http://ods.ai/ - for those who want to join the best DS community :)
https://mlcourse.ai/ - the ods.ai course website
https://www.Kaggle.com/general/68205 - a post about the course on Kaggle
In general, I strongly recommend watching the mltrainings video series in the same mode as described in this article; there are many interesting approaches and techniques in it.
Video
- A very good video on how to become a grandmaster :) by Pavel Pleskov (ppleskov)
- A video about hacking, non-standard approaches and mean target encoding, using the BNP Paribas competition as an example, by Stanislav Semenov (stasg7)
- Another video with Stanislav, "What Kaggle teaches"
Courses
You can learn more about the methods and approaches to solving problems on Kaggle from the second course of the specialization, "How to Win a Data Science Competition: Learn from Top Kagglers".
Extracurricular reading:
- Laurae++, XGBoost / LightGBM parameters
- FastText - embeddings for text from Facebook
- WordBatch / FTRL / FM-FTRL - a set of libraries from @anttip
- Another implementation of FTRL
- Bayesian Optimization - a library for choosing hyperparameters
- Regularized Greedy Forest (RGF) library - another tree-based method
- A Kaggler's Guide to Model Stacking in Practice
- ELI5 - an excellent library for visualizing model weights, from Konstantin Lopukhin (kostia)
- Feature selection: target permutations - and follow the links inside
- Feature Importance Measures for Tree Models
- Feature selection with null importance
- Briefly about autoencoders
- A presentation about Kaggle on slideshare
- And another one
- And in general there is a lot of interesting material here
- Winning solutions of kaggle competitions
- Data Science Glossary on Kaggle
Conclusion
The topic of data science in general, and competitive data science in particular, is as inexhaustible as the atom (C). In this article the author has only slightly opened up the topic of building practical skills with competitive platforms. If it caught your interest, join in, look around, gain experience, and write your own articles. The more good content there is, the better for all of us!
Anticipating the question: no, the author's pipelines and libraries have not yet been made publicly available.
Many thanks to colleagues from ods.ai: Vladimir Iglovikov (ternaus), Yuri Kashnitsky (yorko), Valery Babushkin (venheads), Alexei Pronkin (pronkin_alexey), Dmitry Petrov (dmitry_petrov), Arthur Kuzin (n01z3), and everyone else who read the article before publication, for their edits and reviews.
Special thanks to Nikita Zavgorodnoy (njz) for the final proofreading.
Thank you for your attention; I hope this article is useful to someone.
My nickname on Kaggle / ods.ai: kruegger