A Complete Machine Learning Walkthrough in Python: Part 2
Putting together all the parts of a machine learning project can be difficult. In this series of articles, we walk through every stage of implementing a machine learning process on real data to see how the various techniques fit together.
In the first article, we cleaned and structured the data, performed exploratory analysis, assembled a set of features for the model, and established a baseline for evaluating results. In this article, we will implement and compare several machine learning models in Python, perform hyperparameter tuning to optimize the best model, and evaluate the final model's performance on the test set.
All of the project code is on GitHub, and the second notebook corresponds to this article. Feel free to use and modify the code as needed!
Model evaluation and selection
Reminder: we are working on a supervised regression task. Using energy data for New York City buildings, we want to build a model that predicts the Energy Star Score a given building will receive. We care about both the accuracy of the predictions and the interpretability of the model.
There are many machine learning models to choose from, and the sheer number can be overwhelming. There are comparison reviews online that help navigate the choice of algorithm, but I prefer to try several in practice and see which one works best. Machine learning is still driven mostly by empirical rather than theoretical results, and it is almost impossible to know in advance which model will be the most accurate.
It is generally recommended to start with simple, interpretable models such as linear regression and, if the results are unsatisfactory, move on to more complex but usually more accurate methods. This chart (highly unscientific) shows the relationship between accuracy and interpretability for several algorithms:
Interpretability versus accuracy (source).
We will evaluate five models of varying complexity:
- Linear regression
- K-nearest neighbors
- Random forest
- Gradient boosting
- Support vector machine
We will focus on implementing these models rather than on the theory behind them. If you are interested in the theory, see An Introduction to Statistical Learning (available for free) or Hands-On Machine Learning with Scikit-Learn and TensorFlow. Both books explain the theory thoroughly and show how to use the methods in R and Python, respectively.
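Imputing missing values
When we cleaned the data, we dropped the columns with more than half of their values missing, but many values are still missing. Machine learning models cannot handle missing data, so we need to fill these values in, a process known as imputation.
First, we read in the data and remind ourselves what it looks like: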
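    import pandas as pd
    import numpy as np

    # Read in data into dataframes
    train_features = pd.read_csv('data/training_features.csv')
    test_features = pd.read_csv('data/testing_features.csv')
    train_labels = pd.read_csv('data/training_labels.csv')
    test_labels = pd.read_csv('data/testing_labels.csv')

    # Display the sizes of the data
    print('Training Feature Size:', train_features.shape)
    print('Testing Feature Size:', test_features.shape)
    print('Training Labels Size:', train_labels.shape)
    print('Testing Labels Size:', test_labels.shape)

    Training Feature Size: (6622, 64)
    Testing Feature Size: (2839, 64)
    Training Labels Size: (6622, 1)
    Testing Labels Size: (2839, 1)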
Each NaN value is a missing entry in the data. There are many ways to fill them in; we will use a fairly simple method, median imputation, which replaces each missing value with the median of its column.
In the code below, we create a Scikit-Learn imputer object with the median strategy. We then train it on the training data (using imputer.fit) and apply it to fill in the missing values in both the training and test sets (using imputer.transform). This means that missing values in the test data are filled in with the corresponding median from the training data.
We fit the imputer on the training data only, rather than on all of the data, to avoid test data leakage, where information from the test set spills over into training.
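    from sklearn.impute import SimpleImputer

    # Create an imputer object with a median filling strategy
    # (SimpleImputer replaces the older Imputer class, removed in scikit-learn 0.22)
    imputer = SimpleImputer(strategy='median')

    # Train on the training features
    imputer.fit(train_features)

    # Transform both training data and testing data
    X = imputer.transform(train_features)
    X_test = imputer.transform(test_features)

    # Confirm that no missing values remain
    print('Missing values in training features:', np.sum(np.isnan(X)))
    print('Missing values in testing features:', np.sum(np.isnan(X_test)))

    Missing values in training features: 0
    Missing values in testing features: 0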
All of the values are now filled in, and no gaps remain.
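To make the fit/transform split concrete, here is a minimal NumPy sketch of median imputation done by hand. The arrays are made-up toy values and `fill_with` is a hypothetical helper, not part of the project code; the point is only that the medians are computed from the training rows and then reused for the test rows.

```python
import numpy as np

# Toy data (hypothetical values, not the building dataset)
train = np.array([[1.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, np.nan],
                  [5.0, 40.0]])
test = np.array([[np.nan, 30.0],
                 [2.0, np.nan]])

# "Fit": compute each column's median from the training data only
train_medians = np.nanmedian(train, axis=0)

def fill_with(values, medians):
    """Return a copy of values with NaNs replaced by the given column medians."""
    filled = values.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = medians[cols]
    return filled

# "Transform": the same training medians fill both sets
train_filled = fill_with(train, train_medians)
test_filled = fill_with(test, train_medians)
print(train_medians)
print(test_filled)
```

Because the test rows never contribute to `train_medians`, nothing about the test set leaks into the preprocessing step.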
Feature scaling
Scaling is the general process of changing the range of a feature. It is necessary because features are measured in different units and therefore cover different ranges. This can heavily distort the results of algorithms that take into account the distance between observations, such as support vector machines and k-nearest neighbors, and scaling avoids the problem. Methods such as linear regression and random forest do not require feature scaling, but it is still best not to skip this step when comparing several algorithms.
We will scale each feature to the range 0 to 1: from every value of a feature we subtract the feature's minimum and then divide by the difference between the maximum and the minimum (the range). This method is often called normalization; the other main approach is standardization.
Although this process is easy to implement by hand, we will use Scikit-Learn's MinMaxScaler object. The code is identical to the imputation code, except that a scaler is used instead of an imputer. Remember that we fit only on the training set and then transform all of the data.
    from sklearn.preprocessing import MinMaxScaler

    # Create the scaler object with a range of 0-1
    scaler = MinMaxScaler(feature_range=(0, 1))

    # Fit on the training data
    scaler.fit(X)

    # Transform both the training and testing data
    X = scaler.transform(X)
    X_test = scaler.transform(X_test)
Each feature now has a minimum of 0 and a maximum of 1 on the training data. Missing value imputation and feature scaling are two steps required in nearly every machine learning process.
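The scaling arithmetic itself can be sketched by hand in a few lines of NumPy. The matrices below are invented toy values, and the variable names are illustrative only; the sketch assumes no feature is constant in the training data (otherwise the range would be zero).

```python
import numpy as np

# Toy feature matrices (hypothetical values)
X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [4.0, 1000.0]])
X_new = np.array([[3.0, 600.0],
                  [5.0, 100.0]])

# "Fit": the minimum and the range come from the training data only
col_min = X_train.min(axis=0)
col_range = X_train.max(axis=0) - col_min

# "Transform": subtract the minimum, then divide by the range
X_train_scaled = (X_train - col_min) / col_range
X_new_scaled = (X_new - col_min) / col_range
print(X_train_scaled)
print(X_new_scaled)
```

Note that values in new data can land outside the 0 to 1 interval when they fall outside the range seen during fitting, which is expected behavior for this kind of scaling.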
Implementing machine learning models in Scikit-Learn
After all of the preparation, creating, training, and running models is relatively simple. We will use the Scikit-Learn library for Python, which is well documented and has a consistent syntax for building models. Once you know how to create one model in Scikit-Learn, you can quickly implement all kinds of algorithms.
We will illustrate the process of creating a model, training it (.fit), and testing it (.predict) using gradient boosting:
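    from sklearn.ensemble import GradientBoostingRegressor

    # Convert the label dataframes to 1-D arrays
    # (assumed step; y and y_test are not defined in the snippets above)
    y = np.array(train_labels).reshape((-1,))
    y_test = np.array(test_labels).reshape((-1,))

    # Create the model
    gradient_boosted = GradientBoostingRegressor()

    # Fit the model on the training data
    gradient_boosted.fit(X, y)

    # Make predictions on the test data
    predictions = gradient_boosted.predict(X_test)

    # Evaluate the model using the mean absolute error
    mae = np.mean(abs(predictions - y_test))
    print('Gradient Boosted Performance on the test set: MAE = %0.4f' % mae)

    Gradient Boosted Performance on the test set: MAE = 10.0132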
Creating, training, and testing the model each take a single line of code. To build the other models, we use the same syntax and change only the name of the algorithm.
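As a rough sketch of what "same syntax, different algorithm" looks like for the five models listed earlier, the loop below trains each one through a small helper. The data is synthetic (generated with `make_regression`), `fit_and_evaluate` is an illustrative helper name, and all models use default hyperparameters apart from fixed random seeds, so the numbers it prints say nothing about the building dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption; the article uses the building dataset)
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

def fit_and_evaluate(model):
    """Train the model on the training split and return its MAE on the held-out split."""
    model.fit(X_tr, y_tr)
    predictions = model.predict(X_te)
    return np.mean(np.abs(predictions - y_te))

models = {
    'Linear Regression': LinearRegression(),
    'K-Nearest Neighbors': KNeighborsRegressor(),
    'Random Forest': RandomForestRegressor(random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(random_state=42),
    'Support Vector Machine': SVR(),
}

# Only the estimator object changes between runs
results = {name: fit_and_evaluate(model) for name, model in models.items()}
for name, model_mae in results.items():
    print('%s: MAE = %0.4f' % (name, model_mae))
```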
To put these numbers in perspective, we computed a naive baseline using the median of the target and obtained an error of 24.5. Our models do far better than that, so machine learning is applicable to this problem.
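The baseline calculation is simple enough to sketch with the standard library alone. The scores below are invented for illustration, and `mae` is a hypothetical helper; the idea is that the "model" always predicts the median of the training target, and its mean absolute error on the test target becomes the number to beat.

```python
from statistics import median

# Hypothetical scores standing in for the training and test targets
y_train_demo = [10, 35, 50, 66, 80, 95, 100]
y_test_demo = [20, 60, 70, 100]

# The naive "model": always predict the median of the training target
baseline_guess = median(y_train_demo)

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

baseline_mae = mae(y_test_demo, [baseline_guess] * len(y_test_demo))
print('Baseline guess = %d, baseline MAE = %0.2f' % (baseline_guess, baseline_mae))
```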
In this case, gradient boosting (MAE = 10.013) turned out to be slightly better than random forest (MAE = 10.014). These results are not an entirely fair comparison, because we mostly used the default values for the hyperparameters. Model performance depends heavily on these settings, especially for the support vector machine. Nevertheless, based on these results we will select gradient boosting and start optimizing it.
Hyperparameter optimization of the model
Once a model has been selected, we can optimize it for the task at hand by tuning its hyperparameters.
First, though, let's understand what hyperparameters are and how they differ from ordinary parameters:
- Model hyperparameters can be thought of as settings of the algorithm that we choose before training begins. Examples are the number of trees in a random forest or the number of neighbors in k-nearest neighbors.
- Model parameters are what the model learns during training, such as the weights in a linear regression.
Controlling the hyperparameters affects model performance by shifting the balance between underfitting and overfitting. Underfitting occurs when the model is not complex enough (it has too few degrees of freedom) to learn the mapping from features to target. An underfit model has high bias, which can be corrected by making the model more complex.
Overfitting occurs when the model essentially memorizes the training data. An overfit model has high variance, which can be corrected by limiting the model's complexity through regularization. Both underfit and overfit models fail to generalize well to the test data.
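A small synthetic experiment can make underfitting and overfitting visible. The sketch below is an illustration under stated assumptions, not part of the project: it fits polynomials of increasing degree to noisy samples of a sine curve, with the degree playing the role of the complexity hyperparameter. Training error can only go down as the degree grows, while error on fresh data typically stops improving and may get worse.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def noisy_sine(n):
    """Sample n points from a sine curve with Gaussian noise (synthetic data)."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)

x_fit, y_fit = noisy_sine(20)      # small training sample
x_new, y_new = noisy_sine(200)     # fresh data the fit never sees

def mse(poly, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((poly(x) - y) ** 2))

errors = {}
for degree in (1, 3, 9):
    # Least-squares fit; a higher degree means a more flexible model
    poly = Polynomial.fit(x_fit, y_fit, degree)
    errors[degree] = (mse(poly, x_fit, y_fit), mse(poly, x_new, y_new))
    print('degree %d: train MSE = %.4f, new-data MSE = %.4f' % (degree, *errors[degree]))
```

The straight line (degree 1) is too rigid to follow the curve, which is the high-bias case; the degree 9 fit has enough freedom to chase noise in the 20 training points, which is the high-variance case.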
The difficulty in choosing the right hyperparameters is that the optimal set is different for every task. The only way to find the best settings is therefore to try a variety of combinations on each new dataset. Fortunately, Scikit-Learn has a number of methods for evaluating hyperparameters efficiently. In addition, projects such as TPOT try to optimize the hyperparameter search using approaches such as genetic programming. In this article, we will stick to Scikit-Learn.
Random search with cross-validation
Let's implement a hyperparameter tuning method called random search with cross-validation:
- Random search is the technique used to select hyperparameters. We define a grid and then randomly sample different combinations from it, in contrast to grid search, which tries every combination in turn. Random search performs nearly as well as grid search while running much faster.
- Cross-validation is the method used to evaluate a selected combination of hyperparameters. Rather than splitting the training data into separate training and validation sets, which reduces the amount of data available for training, we use k-fold cross-validation. The training data is divided into k folds, and we run an iterative process in which the model is first trained on k-1 of the folds and then evaluated on the kth fold. The process is repeated k times, and the average error across the iterations is taken as the final measure.
Below is an illustration of k-fold cross-validation with k = 5:
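The fold bookkeeping described above can be sketched in plain Python. This is a simplified illustration: the folds are contiguous and unshuffled, the "model" just predicts the mean of its training folds, and the target values are made up. `k_fold_indices` and `cross_validated_mae` are hypothetical helper names.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

def cross_validated_mae(y, k):
    """Average validation MAE of a predict-the-training-mean model over k folds."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        # "Train" on k - 1 folds: the prediction is the mean of their targets
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        # Evaluate on the held-out fold
        fold_errors.append(sum(abs(y[i] - prediction) for i in val_idx) / len(val_idx))
    return sum(fold_errors) / k

scores = [12, 45, 50, 66, 70, 81, 90, 93, 99, 100]  # hypothetical targets
print('5-fold CV MAE: %.2f' % cross_validated_mae(scores, k=5))
```

Every sample is used for validation exactly once and for training k - 1 times, which is why no data has to be permanently set aside.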
The whole process of random search with cross-validation looks like this:
- Set up a grid of hyperparameters.
- Randomly sample a combination of hyperparameters.
- Create a model with that combination.
- Evaluate the model using k-fold cross-validation.
- Decide which hyperparameters give the best result.
Of course, we do not do any of this by hand; it is all handled by Scikit-Learn's RandomizedSearchCV.
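For intuition only, the steps listed above can be sketched with the standard library. Everything here is a stand-in: `sample_combinations`, `random_search`, and `toy_error` are hypothetical names, and `toy_error` replaces what would really be a cross-validated model error. Unlike RandomizedSearchCV with a grid of lists, this simple version samples with replacement, so the same combination may be drawn twice.

```python
import random

def sample_combinations(grid, n_iter, seed=42):
    """Randomly draw n_iter hyperparameter combinations from a grid of lists."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in grid.items()}
            for _ in range(n_iter)]

def random_search(grid, score_fn, n_iter):
    """Return the sampled combination with the lowest score, and that score."""
    candidates = sample_combinations(grid, n_iter)
    scored = [(score_fn(params), params) for params in candidates]
    best_score, best_params = min(scored, key=lambda pair: pair[0])
    return best_params, best_score

demo_grid = {'n_estimators': [100, 500, 900], 'max_depth': [2, 3, 5, 10]}

def toy_error(params):
    # Stand-in for a cross-validated error; a real search would train a model here
    return abs(params['n_estimators'] - 500) / 100 + abs(params['max_depth'] - 5)

best_params, best_score = random_search(demo_grid, toy_error, n_iter=8)
print(best_params, best_score)
```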
A brief aside: gradient boosting
We are using a gradient boosted regression model. This is an ensemble method, meaning that the model is built from many "weak learners", in this case individual decision trees. Whereas bagging algorithms such as random forest train the learners in parallel and combine their predictions by voting (or, for regression, averaging), boosting algorithms such as gradient boosting train the learners sequentially, with each one concentrating on the mistakes of its predecessors.
Boosting algorithms have become popular in recent years and frequently win machine learning competitions. Gradient boosting is one implementation that uses gradient descent to minimize the cost function. The Scikit-Learn implementation of gradient boosting is generally considered less efficient than other libraries such as XGBoost, but it works well on small datasets and gives quite accurate predictions.
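To illustrate the "each learner concentrates on its predecessors' mistakes" idea, here is a deliberately stripped-down boosting loop for squared error, written with NumPy only. It is a teaching sketch under simplifying assumptions (one feature, depth-1 trees, synthetic data) and not how Scikit-Learn or XGBoost implement the algorithm; the helper names are hypothetical.

```python
import numpy as np

def fit_stump(x, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left, right = residuals[x <= threshold], residuals[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    """Predict the left or right leaf value depending on the threshold."""
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Fit stumps sequentially, each one to the residuals left by its predecessors."""
    prediction = np.full_like(y, y.mean())
    history = [np.mean((y - prediction) ** 2)]
    for _ in range(n_rounds):
        # For squared error, the residuals are the negative gradient of the loss
        stump = fit_stump(x, y - prediction)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        history.append(np.mean((y - prediction) ** 2))
    return prediction, history

rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, 200)                    # synthetic feature
y_demo = np.sin(x_demo) + rng.normal(0, 0.1, 200)   # synthetic target
_, mse_history = boost(x_demo, y_demo)
print('training MSE: %.4f -> %.4f' % (mse_history[0], mse_history[-1]))
```

Each round adds a small correction in the direction that reduces the current training error, which is the sense in which the procedure resembles gradient descent.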
Back to hyperparameter tuning
A gradient boosted regressor has many hyperparameters to tune; see the Scikit-Learn documentation for details. We will optimize the following:
- loss: the loss function to minimize;
- n_estimators: the number of weak learners (decision trees) to use;
- max_depth: the maximum depth of each decision tree;
- min_samples_leaf: the minimum number of samples required at a leaf node of a decision tree;
- min_samples_split: the minimum number of samples required to split a decision tree node;
- max_features: the maximum number of features considered when splitting a node.
I am not sure anyone truly understands how all of these interact, and the only way to find the best combination is to try different options.
In the code below, we build a hyperparameter grid, then create a RandomizedSearchCV object and search over 25 different hyperparameter combinations using 4-fold cross-validation:
    from sklearn.model_selection import RandomizedSearchCV

    # Loss function to be optimized
    # (scikit-learn 1.0 renamed 'ls' and 'lad' to 'squared_error' and 'absolute_error')
    loss = ['ls', 'lad', 'huber']

    # Number of trees used in the boosting process
    n_estimators = [100, 500, 900, 1100, 1500]

    # Maximum depth of each tree
    max_depth = [2, 3, 5, 10, 15]

    # Minimum number of samples per leaf
    min_samples_leaf = [1, 2, 4, 6, 8]

    # Minimum number of samples to split a node
    min_samples_split = [2, 4, 6, 10]

    # Maximum number of features to consider for making splits
    # ('auto' is no longer accepted by recent scikit-learn releases; it behaved like None here)
    max_features = ['auto', 'sqrt', 'log2', None]

    # Define the grid of hyperparameters to search
    hyperparameter_grid = {'loss': loss,
                           'n_estimators': n_estimators,
                           'max_depth': max_depth,
                           'min_samples_leaf': min_samples_leaf,
                           'min_samples_split': min_samples_split,
                           'max_features': max_features}

    # Create the model to use for hyperparameter tuning
    model = GradientBoostingRegressor(random_state=42)

    # Set up the random search with 4-fold cross validation
    random_cv = RandomizedSearchCV(estimator=model,
                                   param_distributions=hyperparameter_grid,
                                   cv=4, n_iter=25,
                                   scoring='neg_mean_absolute_error',
                                   n_jobs=-1, verbose=1,
                                   return_train_score=True,
                                   random_state=42)

    # Fit on the training data
    random_cv.fit(X, y)

After performing the search, we can inspect the RandomizedSearchCV object to find the best model:

    # Find the best combination of settings
    random_cv.best_estimator_

    GradientBoostingRegressor(loss='lad', max_depth=5, max_features=None,
                              min_samples_leaf=6, min_samples_split=6,
                              n_estimators=500)
We can use these results to inform a grid search by choosing grid parameters close to these optimal values. Further tuning, however, is unlikely to improve the model significantly. As a general rule, good feature engineering has a much larger effect on model accuracy than even the most expensive hyperparameter tuning. This is the law of diminishing returns applied to machine learning: feature engineering delivers most of the gain, and hyperparameter tuning provides only a small additional benefit.
To show the effect of a single setting, we can run an experiment that varies the number of estimators (decision trees) while holding the other hyperparameters fixed. The implementation is available in the project code; the results are as follows:
As the number of trees used by the model increases, both the training and the testing error decrease. However, the training error decreases much faster, and as a result the model overfits: it performs very well on the training data but worse on the test data.
We always expect some drop in accuracy on the test data (after all, the model has seen the correct answers for the training set), but a large drop indicates overfitting. The problem can be addressed by getting more training data or by reducing the complexity of the model through the hyperparameters. We will leave the hyperparameters as they are here, but I recommend always watching out for overfitting.
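One inexpensive way to look at training versus held-out error as trees are added is the `staged_predict` method of `GradientBoostingRegressor`, which yields predictions after each boosting stage. The sketch below swaps in this technique and synthetic data for illustration; it is not the cross-validated experiment described above, and the printed numbers are unrelated to the building dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the building dataset)
X_demo, y_demo = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

model = GradientBoostingRegressor(n_estimators=300, max_depth=3, random_state=42)
model.fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n_estimators trees
train_mae = [np.mean(np.abs(pred - y_tr)) for pred in model.staged_predict(X_tr)]
test_mae = [np.mean(np.abs(pred - y_te)) for pred in model.staged_predict(X_te)]

for n_trees in (1, 50, 100, 300):
    print('%3d trees: train MAE = %.2f, held-out MAE = %.2f'
          % (n_trees, train_mae[n_trees - 1], test_mae[n_trees - 1]))
```

A widening gap between the two columns as the number of trees grows is the overfitting signature described in the text.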
For the final model, we will use 800 estimators, because that gave the lowest error in cross-validation. Now it is time to test the model!
Evaluation on the test data
As responsible machine learning engineers, we made sure the model had no access to the test data during training. We can therefore use its performance on the test data as an indicator of how the model would perform on a real-world task.
We feed the test data to the model and compute the error. Below is a comparison of the default gradient boosting model and the tuned model:
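    # Make predictions on the test set using default and final model
    # (both models are assumed to have been fit on the training data already)
    default_pred = default_model.predict(X_test)
    final_pred = final_model.predict(X_test)

    # Report the mean absolute error of each model
    print('Default model performance on the test set: MAE = %0.4f.' % np.mean(abs(default_pred - y_test)))
    print('Final model performance on the test set: MAE = %0.4f.' % np.mean(abs(final_pred - y_test)))

    Default model performance on the test set: MAE = 10.0118.
    Final model performance on the test set: MAE = 9.0446.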
Hyperparameter tuning improved the accuracy of the model by about 10%. Depending on the situation, this can be a very significant improvement, but it comes at a considerable cost in time.
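The "about 10%" figure is the relative reduction in mean absolute error between the two models. As a quick worked check, using the two MAE values reported in the output above:

```python
default_mae = 10.0118  # default model, from the reported output
final_mae = 9.0446     # tuned model, from the reported output

# Relative improvement = reduction in error divided by the original error
relative_improvement = (default_mae - final_mae) / default_mae
print('Relative improvement: %.1f%%' % (100 * relative_improvement))
```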
We can compare the training time of the two models using the Jupyter Notebook magic command %timeit. First, we time the default model:
    %%timeit -n 1 -r 5
    default_model.fit(X, y)

    1.09 s ± 153 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)
About one second to train is very reasonable. The tuned model, however, is not as fast:
    %%timeit -n 1 -r 5
    final_model.fit(X, y)

    12.1 s ± 1.33 s per loop (mean ± std. dev. of 5 runs, 1 loop each)
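Outside a notebook, a similar measurement can be taken with the standard library's `timeit` module. In this sketch `train_stand_in` is a placeholder workload so that the snippet runs on its own; in practice the call being timed would be something like the model's fit method.

```python
import timeit

def train_stand_in():
    """Placeholder workload; in the article this would be a call such as default_model.fit(X, y)."""
    return sum(i * i for i in range(100_000))

# Rough equivalent of `%%timeit -n 1 -r 5`: 5 repeats of 1 call each
run_times = timeit.repeat(train_stand_in, repeat=5, number=1)
mean_time = sum(run_times) / len(run_times)
print('%.4f s per loop (mean of %d runs)' % (mean_time, len(run_times)))
```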
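This situation illustrates a fundamental aspect of machine learning: everything is a trade-off. We constantly have to balance accuracy against interpretability, bias against variance, accuracy against run time, and so on. The right combination depends entirely on the task at hand. In our case, a roughly 11-fold increase in run time (from about 1.1 s to 12.1 s) is large in relative terms but not significant in absolute terms.
Now that we have the final predictions, let's analyze them to see whether there are any noticeable deviations. On the left is a density plot of the predicted and actual values, and on the right is a histogram of the errors:
The model's predictions follow the distribution of the actual values fairly well, although the peak in the predicted density lies closer to the median of the training data (66) than to the true density peak (about 100). The errors are nearly normally distributed, although there are a few large negative values where the model's predictions differ greatly from the actual data. In the next article, we will look more closely at interpreting the results.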
Conclusion
In this article, we covered several stages of solving a machine learning problem:
- Imputing missing values and scaling features.
- Evaluating and comparing the results of several models.
- Hyperparameter tuning using random search and cross-validation.
- Evaluating the best model on the test data.
The results show that machine learning can be used to predict the Energy Star Score from the available data. With gradient boosting, we achieved a mean absolute error of about 9.0 points on the test data (MAE = 9.0446). Hyperparameter tuning can improve the results noticeably, but at the cost of a considerable slowdown. This is one of the many trade-offs to consider in machine learning.
In the next article, we will try to understand how our model works and look at the main factors that influence the Energy Star Score. Now that we know the model is accurate, we will try to understand why it makes the predictions it does and what this tells us about the problem itself.