Mobile operators provide a wide range of services and accumulate a huge amount of statistical data along the way. I represent a department that implements subscriber traffic management systems, which generate hundreds of gigabytes of statistics per day while an operator is running. I became interested in how to extract as much useful information as possible from this big data. It is no accident that one of the V's in the definition of big data stands for value, that is, additional revenue.
I took on this task without being a data mining specialist, and many questions arose right away. Which technical tools should be used for the analysis? How much mathematics and statistics is enough? Which machine learning methods do you need to know? Or is it better to start by learning a language designed for data analysis, such as R or Python?
In my experience, not that much is needed at the entry level of data analysis. What I could not find, however, was a simple example that clearly walks through the complete algorithm of a data study. In this article we go through that whole path on the Fisher's Iris example and then apply what we have learned to real telecom operator data. Readers who are already familiar with data mining can skip straight to the Telecom chapter.
Overview
Let's start by defining the subject of the study. The terms artificial intelligence, machine learning and deep learning are often used as synonyms nowadays, but in fact they form a well-defined hierarchy:
- Artificial intelligence covers every task in which a machine performs intellectual work: playing checkers or chess, assistants that recognize speech and answer questions, robots of all kinds.
- Machine learning is a narrower concept. It refers to the class of tasks in which a computer is trained to perform a specific action, given examples or correct answers: for example, classifying objects by a set of attributes or recommending music and films.
- Deep learning refers to tasks that are solved with neural networks and big data, such as pattern recognition or text translation.
This article deals with machine learning. There are two kinds of learning:
- supervised (with a teacher) and
- unsupervised (without a teacher).
Supervised learning applies when data with the correct answers is available. An algorithm is trained on that data set and can then be used for prediction. Such algorithms include classification and regression. Classification assigns an object to a particular class according to a set of features: for example, licence plate recognition, diagnosing diseases in medicine, or credit scoring in banking. Regression predicts a real-valued variable, such as a share price.
Unsupervised learning (self-learning) looks for patterns hidden in the data. Clustering belongs to this group. For example, every large retail chain looks for patterns in its customers' purchases and tries to work with target groups of customers rather than with the general public.
Regression, classification and clustering are the main algorithms of data analysis, so these are the ones we will look at.
Data mining
A data study follows a definite sequence of steps. The set of steps may vary with the task and the data available, but the general direction is always the same:
- collecting and cleaning the data. In practice this stage can take up to 90% of the entire analysis effort;
- visual analysis of the data, its distributions and statistics;
- analysis of the dependencies (correlations) between variables (features);
- selecting and constructing the features used to build the model;
- splitting the data into a training set and a test set;
- building the model on the training data and evaluating the result on the test data;
- interpreting the resulting model and visualizing the results.
With the algorithm laid out, which tool should we use for the analysis? There are plenty, from Excel to specialized tools such as MATLAB. We will use Python with its specialized libraries. There is no need to be afraid of difficulties, it is all quite simple:
- download Python and all the mathematical packages in a single distribution, Anaconda;
- install it on Linux, which goes without problems: bash Anaconda2-4.4.0-Linux-x86_64.sh
- run: jupyter notebook
- a browser window opens automatically right away;
- check that the application works by typing: print("HelloWorld!")
- press Ctrl + Enter and make sure everything is fine.
For self-study there is plenty of material about IPython Notebook on the Internet, for example the short introduction "Ipython Notebook 2.0 overview".
And now we start the study!
Data collection and cleaning
In the Iris example all the data has already been collected and prepared for us. We only have to load it and take a look:
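    # Import the libraries
    import numpy as np
    import pandas as pd
    from sklearn import datasets
    from sklearn import linear_model
    from sklearn.cluster import KMeans
    # Note: cross_validation (and grid_search, used later) exist only in
    # scikit-learn < 0.20; newer releases provide sklearn.model_selection instead.
    from sklearn import cross_validation
    from sklearn import metrics
    from pandas import DataFrame
    %pylab inline
Next:
    # Load the data set
    iris = datasets.load_iris()
    # Feature names
    print(iris.feature_names)
    # The data itself, first 10 rows
    print(iris.data[:10])
    # Target variable: class names and values
    print(iris.target_names)
    print(iris.target)
We can see that the data set consists of the length and width of two parts of the iris flower, the sepal and the petal (don't ask how to tell them apart on an iris). The target variable is the iris species: 0 is Setosa, 1 is Versicolor, 2 is Virginica. Our task, then, is to find the relationship between the sepal and petal measurements and the iris species from the available data.
For convenience when working with the data, we build a DataFrame from it: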
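    iris_frame = DataFrame(iris.data)
    # Name the columns after the features
    iris_frame.columns = iris.feature_names
    # Add a column with the target variable
    iris_frame['target'] = iris.target
    # For clarity, add a column with the species names
    iris_frame['name'] = iris_frame.target.apply(lambda x: iris.target_names[x])
    # Take a look at the result
    iris_frame
It looks like we got what we wanted:
Descriptive statistics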
    # Histograms of every feature for every species
    pyplot.figure(figsize=(20, 24))
    plot_number = 0
    for feature_name in iris['feature_names']:
        for target_name in iris['target_names']:
            plot_number += 1
            pyplot.subplot(4, 3, plot_number)
            pyplot.hist(iris_frame[iris_frame.name == target_name][feature_name])
            pyplot.title(target_name)
            pyplot.xlabel('cm')
            pyplot.ylabel(feature_name[:-4])
An experienced researcher can draw the first conclusions from histograms like these right away. All I can see is that the distributions of some variables look normal. Let's try something more visual: a table of pairwise dependencies between the features, with the points coloured by iris species:
    import seaborn as sns
    # Pairwise scatter plots of the four features, coloured by species
    sns.pairplot(iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)', 'name']], hue='name')
Here even an inexperienced researcher can see a strong dependency between "petal width (cm)" and "petal length (cm)": the points stretch along a single line. In principle a classification can be built on the same two features, because the points are grouped very compactly by colour. With "sepal width (cm)" and "sepal length (cm)", on the other hand, a good classification cannot be built, because the points of the Versicolor and Virginica species are mixed together.
Dependencies between variables
Now let's look at the dependencies in numerical terms:
    # Pairwise correlation coefficients of the four features
    iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']].corr()
For a more visual form, let's build a heat map of the feature dependencies:
    import seaborn as sns
    corr = iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']].corr()
    # Mask the upper triangle so that each pair is shown only once
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = True
    with sns.axes_style("white"):
        ax = sns.heatmap(corr, mask=mask, square=True, cbar=False, annot=True, linewidths=.5)
The values of the correlation coefficient are interpreted as follows:
- up to 0.2: very weak correlation
- up to 0.5: weak
- up to 0.7: moderate
- up to 0.9: high
- above 0.9: very high
Indeed, we see a very strong dependency of 0.96 between "petal length (cm)" and "petal width (cm)".
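To make the number behind `.corr()` less of a black box, here is a minimal standard-library sketch of the Pearson correlation coefficient (the method pandas uses by default) together with the verbal scale listed above. The function names `pearson` and `interpret` and the sample measurements are made up for illustration; they are not part of the original notebook.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def interpret(r):
    """Map |r| onto the verbal scale used in the article."""
    strength = abs(r)
    if strength <= 0.2:
        return "very weak"
    if strength <= 0.5:
        return "weak"
    if strength <= 0.7:
        return "moderate"
    if strength <= 0.9:
        return "high"
    return "very high"

# Made-up petal measurements (cm), NOT taken from the Iris data set
petal_length = [1.4, 1.5, 4.5, 4.7, 5.9, 6.1]
petal_width = [0.2, 0.3, 1.5, 1.4, 2.1, 2.5]
r = pearson(petal_length, petal_width)
print(round(r, 3), interpret(r))
```

The sign of the coefficient only tells the direction of the dependency, which is why the scale is applied to its absolute value.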
Feature selection and construction
As a first approximation we can simply include all the variables in the model and see what happens. After that we can think about which features to remove and which to create.
Training and test data
We split the data into training data and test data. Samples are usually split into training and test parts in a ratio of 66/33, 70/30 or 80/20; other splits are possible depending on the data. In our example we assign 30% of the whole sample to the test data (the parameter test_size = 0.3):
    train_data, test_data, train_labels, test_labels = cross_validation.train_test_split(
        iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']],
        iris_frame['target'], test_size=0.3, random_state=0)
    # Take a look at the resulting splits
    print(train_data)
    print(test_data)
    print(train_labels)
    print(test_labels)
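What such a split does can be sketched with the standard library alone: shuffle the row indices with a fixed seed and cut off the requested share for testing. The helper `split_train_test` and the dummy rows are hypothetical; this is not the scikit-learn implementation, only the idea behind it.

```python
import random

def split_train_test(rows, labels, test_size=0.3, seed=0):
    """Shuffle the row indices and cut off a test_size share for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_rows = [rows[i] for i in train_idx]
    test_rows = [rows[i] for i in test_idx]
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_rows, test_rows, train_labels, test_labels

# 150 dummy rows stand in for the 150 iris measurements
rows = [[float(i)] for i in range(150)]
labels = [i // 50 for i in range(150)]
train_rows, test_rows, train_labels, test_labels = split_train_test(rows, labels)
print(len(train_rows), len(test_rows))
```

Fixing the seed plays the same role as `random_state = 0` in the article: the same rows end up in the test part on every run.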
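Model building cycle - evaluating the result
Now for the most interesting part.
Linear regression - LinearRegression
How can linear regression be visualized? For a dependency between two variables, it amounts to drawing a line so that the vertical distances from the line to the points are, in total, minimal. The most common way to optimize it is to minimize the mean squared error with the gradient descent algorithm. Gradient descent is explained in many places, for example in the section "What is gradient descent". You can skip that reading, though, and simply think of linear regression as an abstract algorithm that finds the line lying closest to the direction in which the objects are distributed. Just to get a feel for it, let's build a model on the variables with a strong dependency, "petal length (cm)" and "petal width (cm)":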
    from scipy import stats
    # Least-squares fit of petal width against petal length
    fit_output = stats.linregress(iris_frame['petal length (cm)'], iris_frame['petal width (cm)'])
    slope, intercept, r_value, p_value, slope_std_error = fit_output
    print(slope, intercept, r_value, p_value, slope_std_error)
Let's look at the quality indicators of the model:
    (0.41641913228540123, -0.3665140452167277, 0.96275709705096657, 5.7766609884916033e-86, 0.009612539319328553)
The most interesting of these is r_value, the correlation coefficient between the variables, which equals 0.96275709705096657. We have already seen it, and here we are convinced of it once again. Let's plot the points together with the regression line:
    import matplotlib.pyplot as plt
    # Scatter plot of the data with the fitted regression line on top
    plt.plot(iris_frame['petal length (cm)'], iris_frame['petal width (cm)'], 'o', label='Data')
    plt.plot(iris_frame['petal length (cm)'], intercept + slope * iris_frame['petal length (cm)'], 'r', linewidth=3, label='Linear regression line')
    plt.ylabel('petal width (cm)')
    plt.xlabel('petal length (cm)')
    plt.legend()
    plt.show()
We can see that the regression line we found follows the direction of the point distribution well. Now, if we have, say, the length of an iris petal, we can determine its width fairly accurately!
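For a single input variable the least-squares line does not even need gradient descent: the slope and intercept that minimize the squared error have a closed form. The sketch below computes them, along with the correlation coefficient, using only the standard library. It is an illustration under that assumption, not the code scipy runs internally; `fit_line`, `predict_width` and the data are invented for the example.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def predict_width(length, slope, intercept):
    """Apply the fitted line to a new value."""
    return intercept + slope * length

# Made-up, perfectly linear data: y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept, r = fit_line(xs, ys)
print(slope, intercept, r)
print(predict_width(5.0, slope, intercept))
```

These three values correspond to the first three numbers that `stats.linregress` reports (slope, intercept and r_value).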
Classification
How can classification be pictured intuitively? Take the task of splitting objects with two features into two classes (say, separating apples from bananas when their sizes are known). Classification then comes down to drawing a line on the plane that divides the objects into two classes. If there are more classes, several lines are drawn. For objects with three variables, picture a three-dimensional space and the task of drawing planes. With N variables, just imagine a hyperplane in N-dimensional space.
So, we take the most famous algorithm for training a classifier, stochastic gradient descent. We have already met gradient descent in linear regression; "stochastic" means that, for the sake of speed, it uses a random portion of the sample rather than the whole sample at every step. We apply it to the SVM (support vector machine) classification method.
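The scikit-learn call that follows hides the mechanics, so here is a deliberately small sketch of the idea: a linear classifier trained on the hinge loss (the loss of a linear SVM), updated from one randomly ordered sample at a time. It assumes two classes labelled -1 and +1 and leaves out regularization; `train_sgd_hinge`, `predict_label` and the toy points are hypothetical and much simpler than `SGDClassifier`.

```python
import random

def train_sgd_hinge(points, labels, learning_rate=0.1, epochs=1000, seed=0):
    """Train a linear classifier sign(w . x + b) with SGD on the hinge loss.

    labels must be -1 or +1. Regularization is omitted to keep the sketch short.
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in random order
        for i in order:
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # hinge loss is non-zero: step against its gradient
                w = [wj + learning_rate * y * xj for wj, xj in zip(w, x)]
                b += learning_rate * y
    return w, b

def predict_label(x, w, b):
    """Side of the separating line on which the point x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Two well-separated toy groups, e.g. small and large fruit sizes
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_sgd_hinge(points, labels)
print([predict_label(p, w, b) for p in points])
```

Each update touches a single sample, which is what makes the method fast on large data sets; the price is a noisier path to the solution than full gradient descent.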
    train_data, test_data, train_labels, test_labels = cross_validation.train_test_split(
        iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']],
        iris_frame['target'], test_size=0.3, random_state=0)
    # Linear classifier trained with stochastic gradient descent
    # (the default hinge loss makes it a linear SVM)
    model = linear_model.SGDClassifier(alpha=0.001, n_iter=100, random_state=0)
    model.fit(train_data, train_labels)
    model_predictions = model.predict(test_data)
    print(metrics.accuracy_score(test_labels, model_predictions))
    print(metrics.classification_report(test_labels, model_predictions))
Let's look at the quality metrics of the model.
In fact, a model can be evaluated without really understanding what the metric values mean: if accuracy, precision and recall are above 0.85, it is a good model, and above 0.95 an excellent one.
Briefly, the metrics used in this example reflect the following:
- accuracy is the main metric and shows the share of correct answers given by the model. Its value equals the number of correct answers divided by the total number of objects. It does not fully reflect the quality of the model, however, which is why precision and recall are introduced.
These metrics are reported both for the recognition quality of each class (iris species) and as overall values. We look at the overall values:
- precision shows how far the model can be trusted, that is, how many "false positives" it produces. Its value equals the number of answers the model considered correct that really were correct (the "true positives") divided by the sum of the true positives and the number of objects the model considered correct but that in fact were not (the "false positives"). As a formula: precision = true positives / (true positives + false positives)
- recall (completeness) shows how well the model finds the correct answers in general, that is, how many "false negatives" (misses) it produces. Its value equals the number of answers the model considered correct that really were correct divided by the number of all correct answers in the sample. As a formula: recall = true positives / all positives, where all positives = true positives + false negatives
- f1-score (f-measure) combines precision and recall into one number, their harmonic mean
- support is simply the number of objects of the given class in the sample
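The definitions above fit in a few lines of plain Python. The sketch below computes them for one class treated as "positive"; the classification report does the same for every class in turn. The helper `binary_metrics` and the label vectors are invented for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and support for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": tp + fn,  # number of objects that really belong to the class
    }

# Made-up true labels and model answers
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

In this toy case the model makes one false positive and one miss, so precision and recall happen to coincide; on real data they usually pull in opposite directions.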
There are other important model metrics as well, PR-AUC and ROC-AUC; you can read about them, for example, in the article "Metrics in machine learning problems".
So the metric values in our example turn out to be very good. Let's look at the charts. For clarity we draw the sample in two coordinates and colour it by class.
First, the test sample as it actually is:
Then, as the model predicted it. We can see that the points on the boundary (circled in red) were classified incorrectly:
But most of the objects were predicted correctly!
Cross-validation
Still, the result looks suspiciously good... What could be wrong? For example, we may simply have been lucky with the random split into training and test samples. To remove this randomness, so-called cross-validation is used: the data is split into training and test samples several times and the results of the algorithm are averaged.
Let's check how the algorithm behaves on 10 random splits:
    train_data, test_data, train_labels, test_labels = cross_validation.train_test_split(
        iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']],
        iris_frame['target'], test_size=0.3)
    model = linear_model.SGDClassifier(alpha=0.001, n_iter=100, random_state=0)
    # Average accuracy over 10 cross-validation splits
    scores = cross_validation.cross_val_score(model, train_data, train_labels, cv=10)
    print(scores.mean())
Let's look at the result. As expected, it got worse: 0.860909090909
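The averaging idea can be shown without any library. The sketch below splits the sample into k folds, trains on k-1 of them, scores on the remaining one and repeats for every fold. It is a plain shuffled k-fold with a toy one-dimensional "nearest class mean" classifier, both chosen here purely for illustration; scikit-learn's own splitting strategy and the SGD model are not reproduced.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices and deal them into k folds of (almost) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def nearest_mean_fit(xs, ys):
    """Toy 1-D classifier: remember the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def nearest_mean_predict(model, x):
    """Pick the class whose mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def k_fold_scores(xs, ys, k, seed=0):
    """Accuracy on each held-out fold after training on the other folds."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = nearest_mean_fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        hits = sum(1 for i in test_idx if nearest_mean_predict(model, xs[i]) == ys[i])
        scores.append(hits / len(test_idx))
    return scores

# Made-up, well-separated one-dimensional data
xs = [1.0, 1.2, 0.9, 1.1, 1.3, 5.0, 5.2, 4.9, 5.1, 5.3]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = k_fold_scores(xs, ys, k=5)
print(sum(scores) / len(scores))
```

Because every object is used for testing exactly once, a lucky or unlucky single split can no longer dominate the estimate.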
Choosing the optimal algorithm parameters
What else can be done to optimize the algorithm? We can try tuning the parameters of the algorithm itself. As you can see, alpha = 0.001 and n_iter = 100 are passed to it. Let's find the optimal values for them:
    from sklearn import grid_search
    train_data, test_data, train_labels, test_labels = cross_validation.train_test_split(
        iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']],
        iris_frame['target'], test_size=0.3)
    # Grid of candidate values for the two parameters
    parameters_grid = {
        'n_iter': range(5, 100),
        'alpha': np.linspace(0.0001, 0.001, num=10),
    }
    classifier = linear_model.SGDClassifier(random_state=0)
    cv = cross_validation.StratifiedShuffleSplit(train_labels, n_iter=10, test_size=0.3, random_state=0)
    grid_cv = grid_search.GridSearchCV(classifier, parameters_grid, scoring='accuracy', cv=cv)
    grid_cv.fit(train_data, train_labels)
    print(grid_cv.best_estimator_)
The output is the model with the optimal parameters:
    SGDClassifier(alpha=0.00089999999999999998, average=False, class_weight=None,
           epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15,
           learning_rate='optimal', loss='hinge', n_iter=96, n_jobs=1,
           penalty='l2', power_t=0.5, random_state=0, shuffle=True, verbose=0,
           warm_start=False)
We can see that the best values are alpha = 0.0009 and n_iter = 96. Let's substitute them into the model:
    train_data, test_data, train_labels, test_labels = cross_validation.train_test_split(
        iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']],
        iris_frame['target'], test_size=0.3)
    model = linear_model.SGDClassifier(alpha=0.0009, n_iter=96, random_state=0)
    scores = cross_validation.cross_val_score(model, train_data, train_labels, cv=10)
    print(scores.mean())
It got a little better: 0.915505050505
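Under the hood a grid search is nothing more than an exhaustive loop over all parameter combinations that keeps the best-scoring one. The sketch below shows that loop with the standard library. The scoring function `toy_score` is a contrived stand-in for "cross-validated accuracy of the model with these parameters" and was shaped by hand to peak at the values found in the article, so only the search mechanics are meaningful here.

```python
import itertools

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(alpha, n_iter):
    """Hypothetical score surface, NOT a real model evaluation."""
    return 1.0 - 1e5 * (alpha - 0.0009) ** 2 - 1e-5 * (n_iter - 96) ** 2

param_grid = {
    "alpha": [round(0.0001 * i, 4) for i in range(1, 11)],  # 0.0001 ... 0.001
    "n_iter": list(range(5, 100)),
}
best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

With 10 values of alpha and 95 values of n_iter the loop evaluates 950 combinations, and in the real search each of them is cross-validated, which is why grid searches become slow as the grid grows.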
Feature selection and construction
It is time to experiment with the features. Let's remove the less significant ones, "sepal length (cm)" and "sepal width (cm)", from the model and run it again:
    train_data, test_data, train_labels, test_labels = cross_validation.train_test_split(
        iris_frame[['petal length (cm)', 'petal width (cm)']],
        iris_frame['target'], test_size=0.3)
    model = linear_model.SGDClassifier(alpha=0.0009, n_iter=96, random_state=0)
    scores = cross_validation.cross_val_score(model, train_data, train_labels, cv=10)
    print(scores.mean())
It got a little better again: 0.937727272727
To illustrate the approach, let's create a new feature, the petal area, and see what happens:
    # New feature: petal length multiplied by petal width (computed for all rows at once)
    iris_frame['petal_area'] = iris_frame['petal length (cm)'] * iris_frame['petal width (cm)']
Substitute it into the model:
    train_data, test_data, train_labels, test_labels = cross_validation.train_test_split(
        iris_frame[['petal_area']], iris_frame['target'], test_size=0.3)
    model = linear_model.SGDClassifier(alpha=0.0009, n_iter=96, random_state=0)
    scores = cross_validation.cross_val_score(model, train_data, train_labels, cv=10)
    print(scores.mean())
Funny as it is, in this example the petal area (or rather the product of length and width, which is not even an area, since a petal is not a rectangle) predicts the iris species most accurately: 0.942373737374
This can probably be explained by the fact that "petal length (cm)" and "petal width (cm)" already separate the irises into classes quite well, and their product additionally "stretches" the classes along a line.
We have now covered the main ways of optimizing a model. Next we look at a clustering algorithm, an example of unsupervised machine learning.
Clustering - K-means
The essence of clustering is very simple: the existing objects have to be split into groups so that similar objects end up in the same group. This time we have no correct answers to train the model on, so the algorithm itself has to group the objects by how "close" they are to one another.
As an example, let's take the most famous algorithm, K-means. It is not called K-means for nothing: the method looks for K cluster centres such that the mean distance from each centre to the objects belonging to its cluster is minimal. First the algorithm picks K arbitrary centres, then every object is assigned to the nearest centre, which gives K clusters of objects. Next, the centre of each cluster is recomputed as the mean of its objects, and the objects are redistributed again. The algorithm runs until the cluster centres stop shifting by more than a certain delta.
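The loop described above is short enough to write out by hand. The sketch below follows it step by step with the standard library; as a simplifying assumption the first K points serve as the "arbitrary" initial centres, so that the run is reproducible, and `tol` plays the role of the delta. The function `k_means` and the toy points are illustrative, not the scikit-learn implementation.

```python
import math

def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, tol=1e-6, max_iter=100):
    """Return (centres, labels) found by the basic K-means iteration."""
    centres = [tuple(p) for p in points[:k]]  # arbitrary but reproducible start
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centre
        labels = [min(range(k), key=lambda c: distance(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its points
        new_centres = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centres.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre in place
        shift = max(distance(old, new) for old, new in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:  # centres have stopped moving
            break
    return centres, labels

# Two made-up groups of points
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centres, labels = k_means(points, k=2)
print(labels)
print(centres)
```

Note that the cluster numbers carry no meaning of their own: which group is called 0 and which 1 depends only on the starting centres.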
    train_data, test_data, train_labels, test_labels = cross_validation.train_test_split(
        iris_frame[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']],
        iris_frame['target'], test_size=0.3)
    # Look for three clusters, one per iris species
    model = KMeans(n_clusters=3)
    model.fit(train_data)
    model_predictions = model.predict(test_data)
    print(metrics.accuracy_score(test_labels, model_predictions))
    print(metrics.classification_report(test_labels, model_predictions))
Let's look at the results.
Even with the default parameters it turns out quite well: accuracy, precision and recall are above 0.9. Let's check in the picture. The results are decent, although not always exact. Keep in mind that K-means numbers its clusters arbitrarily, so comparing them with the class labels like this is only meaningful when the cluster numbers happen to coincide with the class numbers.
The algorithm has a drawback: it needs to be told how many clusters to look for, and if that number is chosen badly, the results are useless. Let's see what happens if we set the number of clusters to, say, 5.
Indeed, the result is of no practical use. There are algorithms for determining the optimal number of clusters, but we will not go into them in this article.
Conclusions of the Iris study
So, using the Iris example, we have looked at the three main machine learning methods: regression, classification and clustering. We optimized the algorithm and visualized the results. The results are very good, but that was to be expected on a specially prepared data set.
The complete Python notebook is available on Github. Now let's move on to telecom.
Telecom
In telecom, data analysis helps to solve the same tasks as in other industries (banking, insurance, retail):
- predicting subscriber churn (Churn Prevention);
- fraud prevention;
- identifying similar subscribers (subscriber base segmentation);
- cross-selling (Cross-Sale) and increasing sales volume (Up-Sale);
- identifying subscribers who strongly influence those around them (Alpha subscribers);
- predicting subscribers' consumption of network resources: traffic volume, number of calls, SMS;
- studying subscriber movements in order to optimize the network.
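An operator's data comes from several sources:
- the billing system stores data on subscribers' payments and expenses, tariffs and personal data;
- data on the sites that subscribers visit is extracted from DPI equipment;
- base stations provide geodata, including subscribers' locations;
- service equipment generates data on subscribers' consumption of telecom services.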
My goal was to determine which tasks can be solved with the data produced by a subscriber traffic management system. For the billing system to charge for a subscriber's traffic correctly, it has to know who consumed how much traffic, of what type, where and when. This information is obtained from the equipment in the form of so-called CDR (Call Data Record) files. The IMSI and MSISDN subscriber identifiers, the CELL ID (which gives the location to within a base station), the IMEI identifier of the subscriber's device, the session timestamps and information about the consumed service are written to these files in csv format.
To preserve confidentiality, all the research data has been anonymized and replaced with random values that keep the original format. Let's take a look at the data:
Which machine learning algorithms can be applied to this data? For example, we can aggregate the consumption of different traffic types per subscriber over a period of time and run a clustering. The result should look something like this:
That is, if the clustering shows, for example, that subscribers fall into groups that use YouTube, social networks or music streaming in different ways, tariffs can be designed around their interests. I suppose operators already do this when they release tariff plans with separate charging for different traffic types.
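The aggregation step mentioned above might look roughly like the following sketch. Everything specific in it is an assumption made for illustration: the column names (`imsi`, `service`, `bytes` and so on), the service categories and the sample records are invented, since the real CDR layout is not given here. Only the idea is taken from the text: sum the traffic per subscriber and per traffic type, then turn each subscriber into a fixed-order vector that a clustering algorithm can consume.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CDR extract: column names and values are invented for this sketch
SAMPLE_CDR = """imsi,msisdn,cell_id,imei,timestamp,service,bytes
250010000000001,79000000001,1001,356938035643809,2017-06-01T10:00:00,youtube,5000
250010000000001,79000000001,1001,356938035643809,2017-06-01T11:00:00,social,1200
250010000000002,79000000002,1002,490154203237518,2017-06-01T10:30:00,music,3000
250010000000001,79000000001,1003,356938035643809,2017-06-02T09:00:00,youtube,7000
250010000000002,79000000002,1002,490154203237518,2017-06-02T12:00:00,social,800
"""

def traffic_by_subscriber(csv_text):
    """Sum the consumed bytes per subscriber (IMSI) and per service type."""
    totals = defaultdict(lambda: defaultdict(int))
    for record in csv.DictReader(io.StringIO(csv_text)):
        totals[record["imsi"]][record["service"]] += int(record["bytes"])
    return {imsi: dict(services) for imsi, services in totals.items()}

def to_feature_vector(services, service_names):
    """Fixed-order vector of traffic volumes, ready for a clustering algorithm."""
    return [services.get(name, 0) for name in service_names]

totals = traffic_by_subscriber(SAMPLE_CDR)
service_names = ["youtube", "social", "music"]
for imsi in sorted(totals):
    print(imsi, to_feature_vector(totals[imsi], service_names))
```

In practice the volumes would usually be scaled, for example to shares of each subscriber's total traffic, before clustering, so that heavy and light users with the same interests land in the same group.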
What else can be analysed in the available data? There are several use cases around subscriber devices. The operator knows the model of a subscriber's device and can, for example, offer certain services to Samsung users only. Or, if the coordinates of the base stations are known, a heat map of the distribution of Samsung phones can be drawn (we had no real coordinates, so the map has nothing to do with reality):
It may turn out that the percentage is higher in some areas than in others. That information could then be offered to Samsung for running promotions or opening smartphone stores. Next, we can look at the top device models that subscribers use to access the Internet:
An outdated IMEI database was used in order to hide the real situation, but this does not change the essence of the approach. The list shows that most devices are Apple, modems and Samsung, with Meizu, Micromax and Xiaomi appearing at the end.
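A ranking like this can be sketched in a few lines. The first eight digits of an IMEI form the TAC (Type Allocation Code), which identifies the device model, so a lookup table from TAC to vendor is enough to count devices. The table `TAC_TO_VENDOR` and the IMEI values below are made up for illustration and do not come from a real IMEI database.

```python
from collections import Counter

# Hypothetical lookup from TAC (first 8 digits of an IMEI) to a device vendor
TAC_TO_VENDOR = {
    "35693803": "Apple",
    "49015420": "Samsung",
    "86723403": "Xiaomi",
}

def top_vendors(imeis, limit=3):
    """Count devices per vendor and return the most common ones."""
    vendors = Counter(TAC_TO_VENDOR.get(imei[:8], "unknown") for imei in imeis)
    return vendors.most_common(limit)

# Invented IMEI values, one per active device
imeis = [
    "356938035643809", "356938035643817", "356938035643825",
    "490154203237518", "490154203237526",
    "867234031234567",
]
ranking = top_vendors(imeis)
print(ranking)
```

Grouping the same counts by cell identifier instead of over the whole network would give the per-area percentages mentioned above.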
Basically, these are all the applications of the source data that could be found in a short time. Of course, the data can also be used to study various statistics and time series, to analyse outliers and so on, but as for using machine learning to reveal dependencies in it... unfortunately, I have not yet found a way to do that.
So the conclusion from the telecom data study is this: a full-fledged solution to a telecom operator's tasks requires data from all the available information systems, because models can only be built effectively with access to all of the data.
General conclusions
- There is no magic in entry-level data analysis: everything rests on a few simple algorithms that can be understood and applied at an intuitive level.
- But of course there remain complex tasks that can only be solved with experience and in-depth knowledge of statistics, machine learning and programming.