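Thanks to Kirk Malev of Merku for writing this article. Cyril has spent more than three years putting machine learning into practice on data sets of various sizes. His company solves problems in customer churn prediction and natural language processing, and pays close attention to commercializing the results. He graduated from Bauman University and NSTU.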
Today we will show how to use the Azure cloud platform in practice to solve machine learning problems. As the example we use the well-known task of predicting which Titanic passengers survived.
We all remember the famous "how to draw an owl" picture, which skips every step in between, so in this article we comment on every step in detail. If any step is unclear, feel free to ask in the comments.
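According to a recent Gartner report, big data and machine learning are already approaching the "plateau of productivity". This means the market understands well enough how to apply big data processing technologies, and people have grown used to the topic, unlike, say, 3D printing of organs or the colonization of Mars.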
Articles on using the Azure cloud platform for machine learning have already been published on Habrahabr. They describe how everything works, how the platform can be used, and what its strengths and weaknesses are. We will add two practical examples to this collection.
Both examples use Kaggle, a competition platform for data scientists. On Kaggle, companies publish data and define a problem to be solved, and engineers from all over the world then compete to solve it most accurately. As a rule, the winners with the best results receive cash prizes once they disclose their solutions.
Besides commercial competitions, Kaggle also hosts training problems with data provided for practice. Their prize is listed simply as "Knowledge", and these competitions never end.
We begin our introduction to Azure ML with the most famous of these, Titanic: Machine Learning from Disaster. We will upload the data and analyze it with an R model and Azure ML built-in functions. Then we will retrieve the results and submit them to Kaggle.
We will go from simple to complex. First we obtain the analysis results with R code, and then we use the analysis tools built into the platform itself. The full R solution is given at the end of the article. It is based on the official Kaggle training materials, so beginners can learn how to obtain exactly the same result from scratch.
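The main goal of this article is to solve a classification problem using the capabilities of a cloud solution. If the syntax or content of the R code is unclear, there are two options. You can read Trevor Stevens' blog for a short introduction to solving data analysis problems in R using the Titanic problem. Or you can take the interactive one-hour course on DataCamp. Questions about the model itself are welcome in the comments.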
Getting started with the cloud
To use Azure you need a Microsoft account; registration takes about five minutes. If you are not yet registered with Azure, also look at Microsoft BizSpark, a startup support program that offers extended sign-up bonuses. If you do not meet the program's conditions, you get a $200 trial credit at registration that can be spent on Azure cloud resources. That is more than enough to try out Azure ML and reproduce this article.
Once you have an account, you can open the Azure ML section.
In the list of Azure services on the left, choose "Machine Learning". You will be asked to create a workspace, which holds your models and project files (called "experiments"). You also need to connect an existing storage account or create a new one. It is used to store intermediate data processing results and exported results.
Once the workspace is created, you can create projects in it by clicking the "Create" button.
Here, those who like to learn a new platform or framework from examples get a nice gift: ready-made sample projects that show quickly how the service works. If you run into problems, the convenient online documentation answers most questions. In our experience working with Azure, all the information you need is there.
The library of ready-made experiments includes examples like this one, which runs a model in R:
The blocks connected by arrows represent the data analysis process in a particular project (in Azure ML terms, an experiment). This visualization makes data analysis, and the CRISP-DM methodology in particular, much easier to understand.
The essence of the methodology is as follows:
- The data is loaded into the environment.
- Useful features are selected or created from the data.
- A model is trained on the selected features.
- The model's quality is evaluated on a separate data set.
- If the quality is insufficient, steps 2 to 4 are repeated; if it is satisfactory, the model is put to its intended use.
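The five steps above can be sketched in a few lines of plain R. This is an illustration under stated assumptions, not the article's code:
- It uses R's built-in `Titanic` contingency table, expanded to one row per person, instead of the Kaggle CSV files.
- It uses a logistic regression instead of the random forest used later.
- The `AdultMale` feature and the 0.75 acceptance threshold are hypothetical.

```r
# Sketch of the CRISP-DM loop on R's built-in Titanic table (a stand-in for the Kaggle data).

# 1. Load the data: expand the 4-way contingency table into one row per person.
tab <- as.data.frame(Titanic)                      # columns: Class, Sex, Age, Survived, Freq
people <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]
rownames(people) <- NULL

# 2. Select / create features: a hypothetical "adult male" flag.
people$AdultMale <- people$Sex == "Male" & people$Age == "Adult"

# Keep a separate set aside for evaluation.
set.seed(111)
train_idx <- sample(nrow(people), size = round(0.7 * nrow(people)))
train <- people[train_idx, ]
holdout <- people[-train_idx, ]

# 3. Train a model on the selected features (logistic regression for P(Survived == "Yes")).
fit <- glm(Survived ~ Class + AdultMale, data = train, family = binomial)

# 4. Evaluate quality on the held-out set.
prob <- predict(fit, newdata = holdout, type = "response")
accuracy <- mean((prob > 0.5) == (holdout$Survived == "Yes"))

# 5. Accept the model or go back to steps 2-4 (hypothetical threshold).
target <- 0.75
if (accuracy < target) message("Quality too low: revisit features or model") else message("Model accepted")
```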
In the experiment creation dialog, choose the first option, "Blank Experiment". Then drag two blocks onto the workspace: "Execute R Script" from the "R Language Modules" section on the left, and "Writer" from "Data Input and Output".
The resulting experiment looks more modest than the one in the first example.
In the "Execute R Script" block we placed code that downloads the data, creates new object features, trains a model and makes predictions. The only change compared with running the script on a local machine is replacing the line
write.csv(my_solution,file="my_solution.csv" , row.names=FALSE)
with
maml.mapOutputPort("my_solution")
so that the solution is saved in the cloud and can be downloaded (more on this below).
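You may want one copy of the script to work both locally and inside the "Execute R Script" block. One option is to check whether `maml.mapOutputPort` is defined. This sketch assumes that function exists only in the Azure ML runtime. The `my_solution` values are placeholders, not real predictions.

```r
# Placeholder predictions standing in for the real model output (hypothetical values).
my_solution <- data.frame(PassengerId = 892:894, Survived = c(0L, 1L, 0L))

# Assumption: maml.mapOutputPort() is only defined inside Azure ML's "Execute R Script"
# module, so its absence means we are running locally and should write a CSV instead.
if (exists("maml.mapOutputPort", mode = "function")) {
  maml.mapOutputPort("my_solution")
} else {
  write.csv(my_solution, file = "my_solution.csv", row.names = FALSE)
}
```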
At the end of the R code there is a line, varImpPlot(my_forest), that plots the importance of the different variables. The script's graphical output is available at the block's second output (marked "2" in the screenshot) through the "Visualize" menu item.
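One of randomForest's importance measures is the drop in accuracy when a variable's values are randomly permuted. The idea can be sketched in base R as permutation importance. This is a swapped-in setup: a logistic regression on R's built-in `Titanic` table, not the article's random forest on the Kaggle data, so the numbers will differ from varImpPlot.

```r
# Permutation importance sketch (logistic regression on the built-in Titanic table).
tab <- as.data.frame(Titanic)
people <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]
set.seed(111)
idx <- sample(nrow(people), size = round(0.7 * nrow(people)))
train <- people[idx, ]
holdout <- people[-idx, ]

fit <- glm(Survived ~ Class + Sex + Age, data = train, family = binomial)

# Accuracy of the fitted model on a data set, using a 0.5 probability cut-off.
acc <- function(d) mean((predict(fit, d, type = "response") > 0.5) == (d$Survived == "Yes"))
baseline <- acc(holdout)

# Importance of a variable = accuracy lost after shuffling that variable in the holdout set.
importance <- sapply(c("Class", "Sex", "Age"), function(v) {
  shuffled <- holdout
  shuffled[[v]] <- sample(shuffled[[v]])
  baseline - acc(shuffled)
})
print(sort(importance, decreasing = TRUE))
```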
The code works, and we can see which variables matter most for a Titanic passenger's survival. Now, how do we get the result and submit it to Kaggle?
That is what the block's first output is for. The predictions are sent to it by the line
maml.mapOutputPort("my_solution")
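This redirects the output of the R code to the Writer object, which writes the data set to storage. In the Writer settings we specified two things. The first is the name of the storage account used in the experiment (habrahabrdata1). The second is the path in the container where the result is written: saved-datasets/kaggle-R-titanic-dataset.csv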
For convenience we created this separate storage account in advance. That way the results do not get lost among the Azure ML service data, which can be viewed in the experiment's storage. Note, by the way, that storage account names cannot contain underscores ("_") or uppercase letters.
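Naming rules like this are easy to check before creating the storage. The helpers below are a sketch based on our understanding of Azure's naming rules:
- Storage account names: 3 to 24 characters, lowercase letters and digits only.
- Container names: 3 to 63 characters, lowercase letters, digits and hyphens, with no leading, trailing or doubled hyphens.

Check the current Azure documentation before relying on these rules. The function names are hypothetical.

```r
# Hypothetical helper: storage account names must be 3-24 lowercase letters or digits.
is_valid_storage_account <- function(name) grepl("^[a-z0-9]{3,24}$", name)

# Hypothetical helper: container names are 3-63 characters of lowercase letters, digits
# and hyphens; they start and end with a letter or digit and never contain "--".
is_valid_container <- function(name) {
  nchar(name) >= 3 & nchar(name) <= 63 &
    grepl("^[a-z0-9]([a-z0-9-]*[a-z0-9])?$", name) &
    !grepl("--", name, fixed = TRUE)
}

is_valid_storage_account(c("habrahabrdata1", "habra_data", "HabraData"))
is_valid_container(c("saved-datasets", "saved--datasets", "Saved-datasets"))
```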
The model's predictions can then be downloaded straight from the cloud storage service in CSV format. Submitting them to Kaggle gave a score of 0.78469.
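Before submitting, it can be worth checking the downloaded file's shape. Below is a hypothetical `validate_submission()` helper. It assumes the format produced by the R solution: exactly the columns PassengerId and Survived, one row for each of the 418 test passengers (rows 892 to 1309), and labels 0 or 1.

```r
# Hypothetical check of a Titanic submission; returns a character vector of problems
# (empty when the data frame looks valid).
validate_submission <- function(sub) {
  stopifnot(is.data.frame(sub))
  problems <- character(0)
  if (!identical(names(sub), c("PassengerId", "Survived")))
    problems <- c(problems, "columns must be exactly PassengerId, Survived")
  if (nrow(sub) != 418)
    problems <- c(problems, sprintf("expected 418 rows, got %d", nrow(sub)))
  if (!all(sub$Survived %in% c(0, 1)))
    problems <- c(problems, "Survived must contain only 0 or 1")
  if (anyDuplicated(sub$PassengerId) > 0)
    problems <- c(problems, "duplicate PassengerId values")
  problems
}

# Usage: validate_submission(read.csv("kaggle-R-titanic-dataset.csv"))
```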
Using Azure Machine Learning tools
Now we know the Azure ML interface a little and have made sure everything works. Next we can use more of the cloud's built-in functionality to work with the data.
First, upload the training and evaluation data to the cloud. To do this, go to the "Datasets" section and upload the .csv files downloaded earlier.
The resulting data sets look like this:
Now we can rewrite the script that previously did all the work. It downloaded the data, preprocessed it, split it into training and test sets, trained the model and evaluated it on the test set.
Again we go from simple to complex: the R code will now handle only data processing. The data is uploaded, and the cloud is used to split it into training and evaluation sets.
To do this, add two lines at the beginning of the script:
train <- maml.mapInputPort(1) # class: data.frame
test <- maml.mapInputPort(2) # class: data.frame
After the code that handles missing values, end the script with:
maml.mapOutputPort("all_data")
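While debugging, it can be convenient to run the same script outside Azure ML. One way is to define stand-ins for the two port functions only when they are missing. The stubs and the tiny in-memory data frames are hypothetical. The median imputation is a simplification of the rpart-based Age model in the full solution.

```r
# Hypothetical local stand-ins for the Azure ML port functions; inside "Execute R Script"
# the real functions already exist and this block is skipped.
if (!exists("maml.mapInputPort", mode = "function")) {
  local_inputs <- list(
    data.frame(PassengerId = 1:3, Survived = c(0L, 1L, 1L), Age = c(22, NA, 38)),  # toy train
    data.frame(PassengerId = 4:5, Age = c(NA, 40))                               # toy test
  )
  maml.mapInputPort <- function(port) local_inputs[[port]]
  # Look the named object up in the caller's environment and return it invisibly.
  maml.mapOutputPort <- function(name) invisible(get(name, envir = parent.frame()))
}

train <- maml.mapInputPort(1)
test <- maml.mapInputPort(2)
test$Survived <- NA
all_data <- rbind(train, test)   # rbind matches data frame columns by name

# Simplified missing-value handling: fill missing Age with the median of known ages.
all_data$Age[is.na(all_data$Age)] <- median(all_data$Age, na.rm = TRUE)

maml.mapOutputPort("all_data")
```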
The basic layout of the experiment now looks like this:
The condition for splitting the data into training and test sets is:
Output 1 then receives the test set (the rows that satisfy the split condition), and output 2 receives the training set.
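Here is the equivalent split in plain R. It assumes the condition selects the rows whose Survived value is missing; the R solution sets `test$Survived <- NA` before combining the two sets. The toy `all_data` frame is made up.

```r
# Toy combined data (hypothetical): test rows carry Survived = NA, as in the R solution.
all_data <- data.frame(
  PassengerId = 1:6,
  Survived    = c(0L, 1L, 1L, 0L, NA, NA),
  Age         = c(22, 38, 26, 35, 30, 40)
)

# Assumed split condition: a row belongs to the test set when Survived is missing.
is_test <- is.na(all_data$Survived)
test_set <- all_data[is_test, ]     # corresponds to output 1
train_set <- all_data[!is_test, ]   # corresponds to output 2
```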
We are now ready to compare the binary classification algorithms built into the cloud using the AUC metric. As the basis for this experiment we took the example that compares binary classifiers.
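AUC is the area under the ROC curve. It equals the probability that a randomly chosen positive example (a survivor) gets a higher score than a randomly chosen negative one. A minimal rank-based sketch in R follows; it uses the Mann-Whitney formulation and counts ties as one half.

```r
# Rank-based AUC: (sum of positive ranks - n_pos(n_pos + 1)/2) / (n_pos * n_neg).
# labels are 0/1; rank() gives tied scores their average rank, so ties count as 1/2.
auc <- function(scores, labels) {
  labels <- as.integer(labels)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Of the 4 positive/negative pairs, 3 are ordered correctly, so AUC = 0.75.
auc(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))
```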
Testing a single algorithm looks like this:
Each algorithm takes part of the kaggle_titanic_train sample as input and selects its best parameters. The parameters are enumerated with the Sweep Parameters block (see the article on parameter sweeping for details). The block can try all parameter values within a given range, the entire grid of parameter combinations, or a random sweep. The Sweep Parameters settings also let you choose the evaluation metric; we set it to AUC as the most suitable one.
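To illustrate what a parameter sweep does, here is a small sketch in R. It scores parameter combinations for an rpart tree by validation AUC, either every combination (an "entire grid") or a random subset of them (a "random sweep"). This is not Azure ML's implementation. The rpart model, the parameter ranges and R's built-in `Titanic` data are stand-ins chosen for illustration.

```r
library(rpart)  # recommended package shipped with R; also used in the R solution

# Rank-based AUC (Mann-Whitney formulation); labels are 0/1.
auc <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels == 1); n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Built-in Titanic table, one row per person (stand-in for kaggle_titanic_train).
tab <- as.data.frame(Titanic)
people <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]
set.seed(111)
idx <- sample(nrow(people), size = round(0.7 * nrow(people)))
fit_part <- people[idx, ]     # used to fit each candidate model
valid_part <- people[-idx, ]  # used to score each candidate by AUC

# Fit one parameter combination and return its validation AUC.
score_params <- function(maxdepth, minsplit) {
  fit <- rpart(Survived ~ Class + Sex + Age, data = fit_part, method = "class",
               control = rpart.control(maxdepth = maxdepth, minsplit = minsplit, cp = 0))
  prob_yes <- predict(fit, valid_part, type = "prob")[, "Yes"]
  auc(prob_yes, as.integer(valid_part$Survived == "Yes"))
}

# Entire grid: all 12 combinations. Random sweep: 5 combinations drawn from the grid.
grid <- expand.grid(maxdepth = 1:4, minsplit = c(5, 20, 100))
random_sweep <- grid[sample(nrow(grid), 5), ]

grid$auc <- mapply(score_params, grid$maxdepth, grid$minsplit)
random_sweep$auc <- mapply(score_params, random_sweep$maxdepth, random_sweep$minsplit)

best_grid <- grid[which.max(grid$auc), ]
best_random <- random_sweep[which.max(random_sweep$auc), ]
print(best_grid)
print(best_random)
```

The random sweep evaluates fewer models, so it is cheaper but can miss the best combination; the full grid is exhaustive over the listed values.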
After the parameters are chosen, the best resulting model is evaluated on a different part of the sample. The results of the models with the best parameters are shown at the end of the whole experiment.
Clicking the first output of the last "Execute R Script" block shows the results:
The best result was obtained with Two-Class SVM. You can see its best parameters by clicking the output of the Sweep Parameters block.
We can then run the model with these best parameters to predict whether each passenger in the test data survived.
With the experiment for choosing the best model in place, the new experiment is very simple.
It uses the same blocks as all the previous experiments. The model receives the entire kaggle_titanic_train data set at the Train Model input. The Score Model block then scores (predicts) the kaggle_titanic_test data set, with all the required attributes computed in R. Next, only the PassengerId column and the prediction of whether each passenger survived are kept from the full set. The result is saved to Blob storage.
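The column selection at the end of this experiment can be mirrored in R. This sketch assumes two things about the Score Model output: it keeps the input columns, and it appends `Scored Labels` and `Scored Probabilities` columns. The values in `scored` are made up.

```r
# Toy stand-in for the Score Model output (hypothetical values and assumed column names).
scored <- data.frame(
  PassengerId = 892:895,
  Pclass = c(3L, 3L, 2L, 1L),
  `Scored Labels` = c(0L, 1L, 0L, 1L),
  `Scored Probabilities` = c(0.12, 0.71, 0.33, 0.94),
  check.names = FALSE   # keep the column names with spaces as they are
)

# Keep only the passenger ID and the predicted label, renamed to the Kaggle format.
submission <- data.frame(PassengerId = scored$PassengerId,
                         Survived = scored[["Scored Labels"]])
write.csv(submission, file = "kaggle_submission.csv", row.names = FALSE)
```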
Submitting this model's predictions to Kaggle gives a score of 0.69856. That is lower than the 0.78469 we got with the random forest when doing all the work in R.
However, consider training a similar algorithm in Azure ML, Two-Class Decision Forest, with similar parameters (number of trees: 100). Its Kaggle score is 0.00957 higher, reaching 0.79426.
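For reference, the three Kaggle scores reported in this article can be compared directly. The difference between the Azure ML decision forest and the R random forest is the 0.00957 quoted above.

```r
# Public Kaggle scores reported in this article.
scores <- c(
  r_random_forest       = 0.78469,  # all work done in R
  azure_swept_model     = 0.69856,  # best model from the Azure ML parameter sweep
  azure_decision_forest = 0.79426   # Two-Class Decision Forest, 100 trees
)
improvement <- scores[["azure_decision_forest"]] - scores[["r_random_forest"]]
round(improvement, 5)
sort(scores, decreasing = TRUE)
```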
So the "magic" automatic parameter sweep does not replace a more thorough manual search, or the work of an expert who can achieve better results.
Conclusion
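We have looked at two ways the Azure ML cloud environment can be used for Kaggle data analysis competitions. The first uses it as an environment for running R code. The second makes partial use of the built-in cloud tools, leaving the primary processing and generation of new features to R.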
This environment can be used for machine learning applications, especially when the analysis results need to stay on a Hadoop cluster (Microsoft offers its own implementation) or be published as a web service.
If the response is positive, we will post an equally detailed example of publishing a machine learning model as a web service on this blog.
Complete solution in R
# All data, both training and test set
# stringsAsFactors = TRUE keeps the pre-R 4.0 behavior this code was written for
# Assign the training set
train <- read.csv(url("http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/train.csv"), stringsAsFactors = TRUE)
# Assign the testing set
test <- read.csv(url("http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/test.csv"), stringsAsFactors = TRUE)
test$Survived <- NA
all_data <- rbind(train, test)
# Passengers on rows 62 and 830 do not have a value for embarkment.
# Since many passengers embarked at Southampton, we give them the value S.
# We code all embarkment codes as factors.
all_data$Embarked[c(62, 830)] <- "S"
all_data$Embarked <- factor(all_data$Embarked)
# Passenger on row 1044 has an NA Fare value. Let's replace it with the median fare value.
all_data$Fare[1044] <- median(all_data$Fare, na.rm = TRUE)
# Getting the passenger title from the Name ("Surname, Title. First names")
all_data$Name <- as.character(all_data$Name)
all_data$Title <- sapply(all_data$Name, FUN = function(x) {strsplit(x, split = '[,.]')[[1]][2]})
all_data$Title <- sub(' ', '', all_data$Title)
all_data$Title[all_data$Title %in% c('Mme', 'Mlle')] <- 'Mlle'
all_data$Title[all_data$Title %in% c('Capt', 'Don', 'Major', 'Sir')] <- 'Sir'
all_data$Title[all_data$Title %in% c('Dona', 'Lady', 'the Countess', 'Jonkheer')] <- 'Lady'
all_data$Title <- factor(all_data$Title)
all_data$FamilySize <- all_data$SibSp + all_data$Parch + 1
library(rpart)
# How to fill in missing Age values?
# We predict a passenger's Age from the other variables with a decision tree model.
# method = "anova" because we are predicting a continuous variable.
predicted_age <- rpart(Age ~ Pclass + Sex + SibSp + Parch + Fare + Embarked + Title + FamilySize,
                       data = all_data[!is.na(all_data$Age), ], method = "anova")
all_data$Age[is.na(all_data$Age)] <- predict(predicted_age, all_data[is.na(all_data$Age), ])
# Split the data back into a train set and a test set
train <- all_data[1:891, ]
test <- all_data[892:1309, ]
library(randomForest)
# Inspect the train set and test set
str(train)
str(test)
# Set seed for reproducibility
set.seed(111)
# Apply the Random Forest algorithm
my_forest <- randomForest(as.factor(Survived) ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked + FamilySize + Title,
                          data = train, importance = TRUE, ntree = 1000)
# Make your prediction using the test set
my_prediction <- predict(my_forest, test, "class")
# Create a data frame with two columns: PassengerId & Survived. Survived contains your predictions
my_solution <- data.frame(PassengerId = test$PassengerId, Survived = my_prediction)
# Write your solution to a csv file named my_solution.csv
write.csv(my_solution, file = "my_solution.csv", row.names = FALSE)
# Plot variable importance
varImpPlot(my_forest)
References
- Introduction to machine learning and a quick start with Azure ML
- Azure Machine Learning for data scientists