Who this article is for
Anyone who wants to dig into the history in search of new facts, or who has wondered at least once "how does this machine learning thing actually work", will find an answer to their question here. As for experienced readers, in most cases the software part…
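In numbers
Every year the need to study big data grows, both for companies and for enthusiasts. Large companies such as Yandex and Google increasingly use data-exploration tools such as the R programming language and Python libraries (the examples in this article are written for Python 3). According to Moore's law (Moore himself is in the picture), the number of transistors on an integrated circuit doubles every 24 months. This means that our computers become more powerful every year, so the previously unreachable boundaries of knowledge "shift to the right" once again. That opens up room for studying big data, which is what the creation of a "science of big data" is mainly about. It became possible largely thanks to machine learning algorithms that were described earlier and could only be tested half a century later. Perhaps in a few years we will be able to describe, for example, the various forms of fluid motion with absolute precision.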
Is data analysis easy?
Yes. And interesting too. Besides the particular importance of studying big data for humanity as a whole, it is relatively easy to study it on your own and to apply the "answer" you get (from one enthusiast to others). A huge number of resources exist today for solving classification problems. Setting most of them aside, we can use the tools of the scikit-learn library (sklearn). Let's create our first trainable machine:
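    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier()
    clf.fit(X, y)  # X: feature matrix, y: class labels (both are built later in the article)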
And with that we have created the simplest machine capable of predicting (or classifying) values from their features.
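For anyone who wants to run those two lines right away, here is a minimal self-contained sketch. The toy data (two made-up features, two classes) and the random_state argument are my assumptions, added only for illustration; they do not come from the article's dataset.

```python
from sklearn.ensemble import RandomForestClassifier

# Made-up toy data: each row is one object described by two features.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],   # objects of class 0
     [1.0, 0.9], [0.8, 1.0], [0.9, 0.7]]   # objects of class 1
y = [0, 0, 0, 1, 1, 1]

# random_state is fixed only to make the sketch reproducible.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)

# Ask the trained machine about two objects it has not seen before.
predictions = clf.predict([[0.1, 0.1], [0.9, 0.9]])
print(predictions)
```

The first new object lies close to the class 0 rows and the second close to the class 1 rows, so the forest should label them accordingly.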
- If it's all so simple, why isn't everyone already predicting, say, currency prices?
The article could have ended on those words, but…
Getting to the point
- So I won't be able to make money on this right away?
Right. We are still a long way from solving problems with $100,000 prizes, but everyone started with something simple.
So, today we will need:
- Python 3 (with pip3 installed)
- Jupyter
- scikit-learn (sklearn), NumPy, pandas and matplotlib
If something is missing: install everything in 5 minutes
To begin, download and install Python 3 (if you downloaded the Windows installer, remember to install pip and add Python to PATH during setup). After that, for convenience, I used the Anaconda package, which includes more than 150 libraries for Python. It is convenient for working with Jupyter and the numpy, scikit-learn and matplotlib libraries, and it simplifies installing all of them. After installation, start Jupyter Notebook from the Anaconda control panel or from the command line (terminal): "jupyter notebook".
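To check that everything is in place before going further, a quick sketch like the following can be run in a notebook cell. The helper name installed_versions is hypothetical (it is not part of any library), and the list of import names is my assumption about what this article needs.

```python
import importlib

def installed_versions(names):
    """Return {module name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

# Import names of the libraries used in this article
# (scikit-learn is imported under the name "sklearn").
report = installed_versions(["numpy", "pandas", "sklearn", "matplotlib"])
for name, version in report.items():
    print(name, version if version is not None else "MISSING")
```

Anything reported as MISSING can then be installed with pip3 or through Anaconda.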
What follows requires some knowledge of Python syntax and its capabilities from the reader (links to useful resources, including "Python 3 basics", are given at the end of the article).
As usual, we import the libraries needed for the work:
    import numpy as np
    from pandas import read_csv as read
- OK, NumPy is clear enough. But why do we need pandas, and read_csv on top of that?
Sometimes it helps to "visualize" the data you have; it then becomes easier to work with. Besides, most datasets on the popular Kaggle service are compiled by users in CSV format.
And this is what a dataset visualized by pandas looks like:
Here the Activity column shows whether the reaction takes place (1 for yes, 0 for no). The remaining columns are sets of features and their corresponding values (the percentages of various substances in the reaction, their states of aggregation, and so on).
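A table of that kind can be reproduced in a few lines. This is only a sketch: the Activity column name comes from the text, while the feature column names D1 and D2 and every number are invented for illustration.

```python
import io
import pandas as pd

# A tiny made-up CSV in the same spirit: Activity first, then feature columns.
# D1, D2 and all the numbers are invented for illustration.
csv_text = """Activity,D1,D2
1,0.00,0.50
0,0.37,0.61
1,0.03,0.48
"""

# read_csv accepts a file path or any file-like object, so StringIO works too.
table = pd.read_csv(io.StringIO(csv_text), delimiter=",")
print(table.head())   # the first rows, rendered as a table
print(table.shape)    # (number of rows, number of columns)
```

In a Jupyter notebook, leaving table as the last expression of a cell renders it as a formatted table.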
- I recall you used the word "dataset". What is that?
A dataset is a sample of data, usually in the form "a set of feature sets" → "some values" (which may be, for example, housing prices, or the index of one class out of a set of classes), where X is the set of features and y is those values. Determining the correct class index for each object is a classification task, while finding target values (such as a price or the distance to an object) is a regression task. You can read more about the kinds of machine learning in the articles and publications linked, as promised, at the end of the article.
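In code, the "features → values" shape might look like the sketch below. All numbers are made up, and the variable names y_class and y_price are only illustrative. Predicting a class index is classification; predicting a continuous quantity such as a price is usually called regression.

```python
import numpy as np

# Made-up table: the first column is the value to predict, the rest are features.
table = np.array([
    [1, 5.1, 0.2],
    [2, 6.3, 1.5],
    [1, 4.9, 0.3],
    [3, 7.0, 2.4],
])

X = table[:, 1:]                    # one row of features per object
y_class = table[:, 0].astype(int)   # classification: y holds class indices

# For a regression task the target is a continuous quantity (made-up prices).
y_price = np.array([120.5, 310.0, 118.2, 455.9])

print(X.shape, y_class.shape)
print(sorted(set(y_class.tolist())))
```

Each row of X lines up with one element of y, which is the layout scikit-learn's fit(X, y) expects.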
Getting to know the data
The proposed dataset is available for download. A link to the source data and a description of the features is at the end of the article. Given the listed parameters, we are asked to determine which grade a particular wine belongs to. Now we can work out what is going on in there:
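    path = "% %/wine.csv"  # replace "% %" with the folder that contains wine.csv
    data = read(path, delimiter=",")  # read is pandas.read_csv, imported above
    data.head()  # show the first five rows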
Working in a Jupyter notebook, we get the following answer:
This means the data is now available for analysis. The Grade values in the first column show which grade each wine belongs to, and the remaining columns are the features by which the wines can be told apart. Try typing just data instead of data.head(): now you can view more than just the "top" of the dataset.
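A simple implementation of a classification task
Now for the main part of the article: solving the classification problem. Everything in order:
- create a training sample
- train the machine on randomly selected parameters and their corresponding classes
- measure the quality of the resulting machine
Let's look at the implementation (each code excerpt is a separate cell in the notebook):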
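    X = data.values[:, 1:14]  # features: columns 1 through 13
    y = data.values[:, 0]     # classes: column 0, as a 1-D array

    # sklearn.cross_validation was removed in scikit-learn 0.20; the function now lives in model_selection
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.6)

    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X_train, y_train)

    clf.score(X_test, y_test)  # accuracy on the test sample, between 0 and 1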
We create the arrays: X holds the features (columns 1 to 13) and y holds the classes (column 0). Then, to assemble the test and training samples from the source data, we use the convenient train_test_split function implemented in scikit-learn. With the samples ready, we move on and import RandomForestClassifier from sklearn.ensemble. This class contains all the methods needed to train and test the machine. We assign an instance of RandomForestClassifier to the variable clf (classifier) and then train the machine by calling its fit() method, where X_train holds the features of the categories in y_train. Now we can use the score metric built into the class to measure the accuracy of the categories predicted for X_test against their true values y_test. This metric returns an accuracy from 0 to 1, where 1 corresponds to 100%. Done!
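The steps described above can be put together into one self-contained sketch. I am assuming here that the wine data bundled with scikit-learn (load_wine, 13 features) is an acceptable stand-in for wine.csv; whether it matches the article's file exactly is not something the text confirms. The random_state arguments are also my addition, only for reproducibility. The last lines recompute the accuracy by hand to show what score() reports.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled wine data stands in for the article's wine.csv.
X, y = load_wine(return_X_y=True)

# random_state is fixed only so that reruns give the same split and the same forest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=0)

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)

# score() is the mean accuracy: the share of test objects whose class was predicted correctly.
predicted = clf.predict(X_test)
manual_accuracy = np.mean(predicted == y_test)
print(clf.score(X_test, y_test), manual_accuracy)
```

Both printed numbers should be identical, since score() for a classifier is exactly this share of correct predictions.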
About RandomForestClassifier and the train_test_split method
When initializing clf as a RandomForestClassifier, we set n_estimators=100 and n_jobs=-1. The first is the number of trees in the forest, and the second is the number of processor cores involved in the work (-1 uses all cores; by default one core is used). Since we are working with this one dataset and have nowhere else to get a test sample, we use train_test_split to split the data "smartly" into a training sample and a test sample. You can learn more about any class or method by selecting it and pressing Shift+Tab in the Jupyter environment.
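To get a feel for what such a split does, here is a rough pure-Python sketch of the idea: shuffle the row indices, then cut them in two. The function simple_split is hypothetical and is not scikit-learn's implementation, which handles more cases (several arrays at once, stratification and so on).

```python
import random

def simple_split(rows, test_size, seed=None):
    """Shuffle row indices and cut them into a training part and a test part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test

rows = list(range(10))            # ten made-up objects
train, test = simple_split(rows, test_size=0.6, seed=0)
print(len(train), len(test))      # 4 training objects, 6 test objects
```

Note that test_size=0.6, as used in this article, sends 60% of the rows to the test sample and leaves only 40% for training.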
-粟床ãè¯ãã ãã€ããããªæãïŒ
åé¡ã®åé¡ã解決ããããã®éèŠãªèŠå ã¯ãã«ããŽãªã®ãã¬ãŒãã³ã°ãµã³ãã«ã«æé©ãªãã©ã¡ãŒã¿ãŒãéžæããããšã§ãã ããå€ãã®ãããè¯ãã ããããåžžã«ã§ã¯ãããŸããïŒãã ããããã«ã€ããŠã¯ã€ã³ã¿ãŒãããã§ã詳ããèªãããšãã§ããŸããããããããåå¿è åãã«èšèšãããå¥ã®èšäºãæžããŠããŸãïŒã
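One way to probe "more is better, but not always" is to vary a single setting and compare the results. The sketch below varies the number of trees; reading the remark as being about this particular setting is my assumption, as is the use of scikit-learn's bundled wine data in place of wine.csv. No particular outcome is claimed: the numbers are there to be looked at.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)   # assumption: stand-in for the article's wine.csv

results = {}
for n_trees in (1, 10, 100):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    # 5-fold cross-validation: train on 4/5 of the data, test on the remaining 1/5, five times.
    scores = cross_val_score(clf, X, y, cv=5)
    results[n_trees] = scores.mean()

for n_trees, mean_score in results.items():
    print(n_trees, round(mean_score, 3))
```

Averaging over several folds makes the comparison less dependent on one lucky or unlucky split.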
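- Too easy. More meat!
To visualize the result of training on this dataset, we can take the following example: keeping only two parameters so that they can be placed in two-dimensional space, we plot the trained sample (you will get something like this graph; it depends on the training):
Yes, as the number of features decreases, recognition accuracy drops too. The plot did not turn out especially pretty either, but that is not decisive for a simple analysis: you can clearly see how the machine picked out the training sample (points) and compared it with the predicted values (fill).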
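The effect of dropping features can be checked directly with a sketch like this one. It again assumes scikit-learn's bundled wine data as a stand-in for wine.csv, and the helper accuracy_with is hypothetical. The exact numbers depend on the split and on the forest, so none are quoted here.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)   # assumption: stand-in for the article's wine.csv
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, random_state=0)

def accuracy_with(n_features):
    """Train on the first n_features columns only and return the test accuracy."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train[:, :n_features], y_train)
    return clf.score(X_test[:, :n_features], y_test)

accuracy_two = accuracy_with(2)     # only the two features used for the plot
accuracy_all = accuracy_with(13)    # all thirteen features
print(accuracy_two, accuracy_all)
```

Comparing the two printed values shows how much information the remaining eleven features carry.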
The plot is implemented here:
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap
    from sklearn.preprocessing import StandardScaler

    # Keep only the first two features; fit the scaling on the training data
    # and reuse the same scaling for the test data
    scaler = StandardScaler().fit(X_train[:, 0:2])
    X_train_draw = scaler.transform(X_train[:, 0:2])
    X_test_draw = scaler.transform(X_test[:, 0:2])

    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X_train_draw, np.ravel(y_train))

    # Build a grid that covers the training points with a margin of 1
    x_min, x_max = X_train_draw[:, 0].min() - 1, X_train_draw[:, 0].max() + 1
    y_min, y_max = X_train_draw[:, 1].min() - 1, X_train_draw[:, 1].max() + 1
    h = 0.02  # grid step
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # Predict a class for every grid node and reshape the result back into the grid
    pred = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    pred = pred.reshape(xx.shape)

    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

    plt.figure()
    plt.pcolormesh(xx, yy, pred, cmap=cmap_light)  # fill: predicted class
    plt.scatter(X_train_draw[:, 0], X_train_draw[:, 1],
                c=np.ravel(y_train), cmap=cmap_bold)  # points: training sample
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("Score: %.0f percent" % (clf.score(X_test_draw, y_test) * 100))
    plt.show()
I leave it to the reader to work out why and how this works.
Final word
I hope this article has helped you get at least a little comfortable with building simple machine learning in Python. This knowledge is enough to go on to an intensive course of further study in Big Data and machine learning. The main thing is to move from the simple to the advanced gradually. And here, as promised, are the useful resources and articles:
Materials that inspired the author to write this article
Historical essays:
- An article on why machine learning is needed today and what drives that need
- Oddly enough, a rather involved Wikipedia article on machine learning in general
More on machine learning:
- On machine learning and data science
- Another interesting talk from the same list
Learning Python, or before working with data:
- A very useful course from excellent teachers at ITMO and SPbAU
- A good Russian-language resource for people learning Python, aimed at those who are less comfortable with English-language resources
However, to get the most out of the sklearn library, a knowledge of English helps: this source contains everything you need to know (since it is the API reference).
A deeper study of machine learning in Python has become possible, and easier, thanks to teachers from Yandex: this course has all the tools needed to explain how the whole system works and covers the types of machine learning in detail.
The file for today's dataset was taken from the source data and slightly modified.
Where to get data, or a "dataset repository": a large amount of data collected from various sources. Training on real data is very useful.
I will be grateful for help in improving this article and am ready for constructive criticism of any kind.