Creating data sources
Let's open ML Studio:
We will create one new experiment and two new data sources (datasets) that represent the input data (the features). In Azure ML terms, an experiment is a complete solution to a problem, from reading the input data to receiving the answer, and it can later be converted into a web service. Download the training sample files (x_train.csv and y_train.csv) from the ML Boot Camp website. To add a data source, select the [Datasets] item in the left-hand menu and click [New] in the lower-left corner. A dialog window appears.
Specify the path to the x_train.csv file and name this data source x_train. Create the y_train data source in the same way. Both data sources now appear on the [Datasets] tab.
Creating the experiment and selecting features
Create the experiment. To do this, select "Experiments" in the left-hand menu, click "New" in the lower-left corner, and choose "Blank Experiment". You can give the experiment a suitable name in the top line. The result is an empty canvas for data science operations.
As you can see, the menu on the left lists every operation that can be added to an experiment: data input and output, column selection, various regression methods, classification, and so on. All of them are added to the experiment by drag and drop, and different operations are then connected to one another.
Next, we need to specify what will be used as the input of the task. In the left-hand menu, select the topmost item, "Saved Datasets", then "My Datasets". In the list, pick the data sources you created, "x_train" and "y_train", and drag them onto the experiment workspace.
All Azure ML methods work with a single table (data frame), so the columns of these two data sources have to be combined; within that table we will later point out which column holds the value to train on. The Add Columns module does this. Hint: the module search box lets you find a module by keyword, or confirm that no such module exists yet. Drag the "Add Columns" operation onto the workspace and connect its two upper data-input ports to the x_train and y_train data sources respectively. This operation has no parameters, so no further configuration is needed. The experiment now consists of the two datasets feeding one Add Columns module.
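As a rough illustration of what a column-wise join like the one described above does, here is a minimal Python sketch. It is not the Azure ML implementation: the function name `add_columns`, the tiny in-memory CSV contents, and the choice to raise an error on a row-count mismatch are all assumptions made for this example.

```python
import csv
import io

def add_columns(left_rows, right_rows):
    """Join two tables column-wise, pairing rows by position (no key matching).

    Raising on a row-count mismatch is a choice of this sketch, not a claim
    about how the Add Columns module behaves.
    """
    if len(left_rows) != len(right_rows):
        raise ValueError("both tables must have the same number of rows")
    return [left + right for left, right in zip(left_rows, right_rows)]

# Tiny in-memory stand-ins for x_train.csv and y_train.csv (made-up values).
x_csv = "1,2,3\n4,5,6\n"
y_csv = "0.5\n1.5\n"

x_rows = list(csv.reader(io.StringIO(x_csv)))
y_rows = list(csv.reader(io.StringIO(y_csv)))
combined = add_columns(x_rows, y_rows)
print(combined)
```

Each row of the result is the feature row followed by its target value, which is the single-table shape the later steps expect.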
Let's see what the data looks like. Click the [Run] button in the bottom bar to run the experiment. Once the experiment has finished successfully, click the output of the "Add Columns" operation and choose the "Visualize" action.
In the properties window you can inspect each feature's type, its first rows, mean, median, histogram, and so on. The table has 952 columns (features), from which we need to pick the important ones, that is, the ones that help solve the problem. Feature selection is one of the most complex and least deterministic operations in data science, so to keep things simple we will pick a few features that look important at first glance. The module for this is called Select Columns in Dataset. Add it to the workspace and connect it to the "Add Columns" operation. Then specify which features to keep: select the "Select Columns in Dataset" module and click "Launch column selector" in the Properties pane on the right.
Now add the names of the columns you want to keep (this is not an optimal choice of columns). Don't forget to add the "time" column.
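The column selection above can be pictured as a simple projection of the table onto a list of names. The sketch below is only an illustration: `select_columns` is a hypothetical helper, and every column name except `time` is invented, since the article does not list the columns it keeps.

```python
def select_columns(header, rows, keep):
    """Return (header, rows) restricted to the named columns, in `keep` order."""
    missing = [name for name in keep if name not in header]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    idx = [header.index(name) for name in keep]
    return list(keep), [[row[i] for i in idx] for row in rows]

# Hypothetical column names; only "time" comes from the article.
header = ["m", "k", "n", "cpu_cores", "time"]
rows = [
    [10, 20, 30, 4, 0.12],
    [50, 60, 70, 8, 1.90],
]

sel_header, sel_rows = select_columns(header, rows, ["m", "k", "n", "time"])
print(sel_header)
print(sel_rows)
```

Keeping `time` in the list matters because it is the target column that training needs later.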
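Run the experiment again and make sure that only the selected columns remain in the resulting table. The last data preparation step is to split the data into a training sample and a test sample in a 70:30 ratio. To do this, find the "Split Data" module, place it on the workspace, and set "Fraction of rows in the first output dataset" to 0.7 in its settings. The experiment now ends with a Split Data module.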
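For readers who want to see the 70:30 split spelled out, here is a minimal sketch under stated assumptions: it shuffles the rows with a fixed seed before cutting, and the names `split_rows`, `fraction`, and `seed` are invented for this example. Whether the Split Data module shuffles depends on its own settings, which the article does not discuss.

```python
import random

def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle a copy of `rows` and cut it into (first, second) parts.

    `fraction` is the share of rows that goes to the first output.
    """
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))             # stand-in for ten table rows
train, test = split_rows(rows, fraction=0.7, seed=42)
print(len(train), len(test))
```

The first part plays the role of the left output (training sample) and the second part the right output (test sample).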
Using an algorithm
We are finally ready to use a regression method. The methods are listed in the left-hand menu under Machine Learning, Initialize Model, Regression.
First, let's try the decision forest method, "Decision Forest Regression". Add it to the workspace together with the "Train Model" module. This module has two inputs: one is connected to the algorithm (in this example "Decision Forest Regression") and the other to the training sample (the left output of the "Split Data" module). The experiment now has the algorithm and the left output of Split Data both feeding Train Model.
The red circle on the "Train Model" module indicates a required parameter that has not been set: we need to specify the feature we are trying to predict (here it is time). Click "Launch column selector" and add the single column time. Note that the method itself has default settings, so it can be started without manual configuration. To get good results, of course, you will need to try different combinations of the parameters specific to each method. You can now run the experiment; the forest of trees is built and can be viewed through the already familiar Visualize window.
After training the model, it is worth testing it on the test (validation) sample, which makes up 30% of the initial data. To do this, use the "Score Model" module: connect its first input to the output of the "Train Model" module (the trained model) and its second input to the second output of the "Split Data" module. Score Model then sits below both Train Model and Split Data.
Run the experiment again and inspect the output of "Score Model".
Two new columns have been added: "Scored Label Mean" (the mean of the predicted value) and "Scored Label Standard Deviation" (the standard deviation of the predicted value). You can also build a scatter plot of the predicted values against the actual values. Now let's check the accuracy using the "Evaluate Model" module connected to the "Score Model" module.
The output of the Evaluate Model module contains information about the accuracy of the method on the test data, including absolute and relative errors.
This method is certainly not perfect, but we have not tuned it at all yet.
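To make "absolute and relative errors" concrete, the sketch below computes two textbook metrics: mean absolute error and relative absolute error. It assumes the standard definitions; the article does not say exactly which formulas Evaluate Model uses, and the sample values are made up.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all rows."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def relative_absolute_error(actual, predicted):
    """Total absolute error divided by the total absolute error of
    always predicting the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum(abs(a - p) for a, p in zip(actual, predicted))
    denominator = sum(abs(a - mean_actual) for a in actual)
    return numerator / denominator

# Made-up values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(actual, predicted)
rae = relative_absolute_error(actual, predicted)
print(mae, rae)
```

A relative absolute error below 1 means the model does better than a constant prediction of the mean.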
Adding a new method and comparing methods
Let's try another method based on decision trees, "Boosted Decision Tree Regression". As with the first method, add "Train Model" and "Score Model" modules, run the experiment, and look at the output of the new method's "Score Model" module. Note that only one column has been added, "Scored Labels", which holds the predicted value. You can build a scatter plot for it.
次ã«ãæ¢ã«è¿œå ãããŠããã¢ãã«ã®è©äŸ¡ã¢ãžã¥ãŒã«ã䜿çšããŠãããã2ã€ã®ã¡ãœããã®ç²ŸåºŠãæ¯èŒããŸãããã®ããã2çªç®ã®ã¡ãœããã®ã¹ã³ã¢ã¢ãã«ã®åºåã«æ£ããå ¥åãæ¥ç¶ããŸãã ãã®çµæã次ã®äžé£ã®æäœãååŸããŸãã
Evaluate Modelã¢ãžã¥ãŒã«ã®åºåãèŠãŠã¿ãŸãããã
ããã§ãã¡ãœãããäºãã«æ¯èŒããïŒã¿ã¹ã¯ã«å¿ èŠãªæå³ã§ïŒç²ŸåºŠãé«ãã¡ãœãããéžæã§ããŸãã
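The closing call of this article mentions a MAPE error, so one plausible reading of "accuracy in the sense that matters for the task" is the mean absolute percentage error. The sketch below assumes the standard definition, expressed as a fraction rather than a percentage; the predictions for both methods are made-up numbers, not results from the experiment.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.1 means 10%).

    Assumes no actual value is zero.
    """
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up values for illustration only.
actual = [2.0, 4.0, 8.0]
forest_pred = [1.0, 4.0, 8.0]
boosted_pred = [2.0, 4.5, 7.0]

scores = {
    "Decision Forest Regression": mape(actual, forest_pred),
    "Boosted Decision Tree Regression": mape(actual, boosted_pred),
}
best = min(scores, key=scores.get)   # lower MAPE is better
print(scores, best)
```

Because MAPE divides by the actual value, errors on small targets weigh more than the same absolute error on large ones, which can change which method looks best.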
Solving the problem on real data
We have trained the methods and know how accurate they are, so it is time to test them in battle. Download the x_test.csv file, which contains the data for which the matrix multiplication time has to be predicted. To use a trained method, we need to do the following:
- Add a new data source named x_test with the data from the x_test.csv file.
- Drag the new x_test data source onto the experiment workspace.
- Keep only the columns that took part in training: copy the "Select Columns in Dataset" module and remove the "time" column from its column list (the test data does not contain it).
- Run the trained method on the prepared data. To do this, add a "Score Model" operation, connect its first input to the output of the Train Model module of the Boosted Decision Tree Regression method, and connect its second input to the output of the Select Columns in Dataset module you just added.
- All that remains is to get the data into a form that can be uploaded as a solution to the ML Boot Camp website. To do this, add another "Select Columns in Dataset" module in which only the predicted "Scored Labels" value is selected, and attach a "Convert to CSV" module to its output.
This completes the experiment.
Click the output of the "Convert to CSV" module and choose "Download" to download the resulting csv file. Remove the first line (the one with the column name) from the resulting csv and upload the file to the ML Boot Camp website. It works! The accuracy, however, is poor.
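The header-removal step above is easy to get wrong by hand, so here is a small sketch of it. The helper name `drop_header` and the sample file contents are assumptions for illustration; the real download will contain one prediction per test row.

```python
def drop_header(csv_text):
    """Return the CSV text without its first line (the column name)."""
    lines = csv_text.splitlines(keepends=True)
    return "".join(lines[1:])

# Made-up stand-in for the file downloaded from the Convert to CSV module.
downloaded = "Scored Labels\n0.12\n1.9\n"
submission = drop_header(downloaded)
print(submission)
```

In practice you would read the downloaded file, pass its text through the helper, and write the result to a new file before uploading it.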
Further optimization
Here are a few modules that can help improve the accuracy of the regression.
- Try the different methods available in the left-hand menu.
- The Filter Based Feature Selection module helps with feature selection: it tries to pick the features with the greatest predictive power, using one of several methods that you choose in its properties. This module is added in place of the Select Columns in Dataset module.
- To assess which features are most useful in an already trained model, use the Permutation Feature Importance module, which takes a trained model and a set of test data as its inputs.
- The "Tune Model Hyperparameters" module helps choose the method's parameters: it runs the given method a specified number of times with different parameter sets and shows the accuracy of each run.
- Finally, you can use your own R and Python scripts through the "Execute R Script" and "Execute Python Script" modules.
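To give a feel for the idea behind permutation-based feature importance mentioned in the list above, here is a minimal sketch. It is not the Azure ML module: the toy "trained model", the data, and the names `mean_abs_error` and `permutation_importance` are all assumptions, and mean absolute error is chosen arbitrarily as the metric.

```python
import random

def mean_abs_error(model, rows, targets):
    """Mean absolute error of `model` over the given rows."""
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, seed=0):
    """For each column, shuffle its values across rows and report how much
    the error grows compared with the unshuffled data."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        values = [r[col] for r in rows]
        rng.shuffle(values)
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, values)]
        importances.append(mean_abs_error(model, permuted, targets) - baseline)
    return importances

# Toy "trained model" for illustration: it depends only on column 0.
def model(row):
    return 2.0 * row[0]

rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [model(r) for r in rows]

scores = permutation_importance(model, rows, targets, seed=1)
print(scores)
```

A column whose shuffling leaves the error unchanged contributes nothing to this model, which is the kind of signal you would use to prune the column list.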
Conclusion
I like Azure ML. It lets you quickly prototype a solution to a problem and then dig deeper into customizing and optimizing that solution.
The experiment is published in the gallery and is available to all participants at: gallery.cortanaintelligence.com/Experiment/ML-Boot-Camp-from-Mail-ru-1
Join the contest! If you manage to get a MAPE error below 0.1, please write; the author will be glad to hear about it.