
Course outline:
- 10 articles on Habr (and the same articles in English on Medium)
- 10 lectures (a YouTube channel in Russian plus recent lectures in English); a detailed description of each topic is given in the articles
- Reproducible materials (Jupyter notebooks) in the mlcourse.ai repository and as a Kaggle Dataset (all you need is a browser)
- Great Kaggle Inclass competitions (about feature engineering, not about "stacking xgboosts")
- Homework assignments for each topic (demo versions of the assignments are listed in the repository)
- A participant rating for motivation, lots of live communication, and quick feedback from the authors
 
The current session of the course starts on October 1, 2018, and is held in English (link to the participation survey, to be filled out in English). Follow the announcements in the VK group and join the OpenDataScience community.
List of articles in the series
- Primary data analysis with Pandas
- Visual data analysis with Python
- Classification, decision trees, and the nearest neighbors method
- Linear classification and regression models
- Ensembles: bagging, random forest
- Feature engineering and feature selection; applications to text, image, and geodata tasks
- Unsupervised learning: PCA, clustering
- Learning on gigabytes with Vowpal Wabbit
- Time series analysis with Python
- Gradient boosting
 
Article outline
- About the course
- Course homework
- Demonstration of the main Pandas methods
- First attempt at predicting churn
- Homework #1
- Overview of useful resources
 
1. About the course
We are not trying to build yet another comprehensive introductory course on machine learning or data analysis. This course does not replace the Yandex and MIPT specialization, the HSE continuing education program, or other fundamental online and offline programs and books. The purpose of this series of articles is to help you quickly refresh your knowledge or find topics for further study. The approach is similar to that of the authors of the Deep Learning book, which opens with a review of the basics of mathematics and machine learning: short, as dense as possible, and with plenty of links to sources.
If you plan to take the course, be aware of our assumptions. When selecting and preparing the materials, we assume that students know mathematics at the level of a second-year student at a technical university and can at least program in Python. These are not strict prerequisites, only recommendations. You can enroll even without knowing math or Python and catch up along the way:
- Basic mathematics (calculus, linear algebra, optimization, probability theory, and statistics) can be reviewed with these Yandex and MIPT notes (shared with permission). For a short version in Russian, that is all you need. For more depth: calculus (Kudryavtsev), linear algebra (Kostrikin), optimization (Boyd, in English), and probability theory and statistics (Gmurman). There are also excellent online courses by MIPT and HSE on Coursera.
- For Python, a short interactive tutorial on Datacamp or this repository on Python and basic algorithms and data structures will be enough. For something more advanced, see, for example, the course by the Computer Science Center in St. Petersburg.
- For machine learning, there is the classic (though somewhat dated) Andrew Ng course (Stanford, Coursera). In Russian, there is the excellent Yandex and MIPT specialization "Machine Learning and Data Analysis". And, of course, there are the best books: "Pattern Recognition and Machine Learning" (Bishop), "Machine Learning: A Probabilistic Perspective" (Murphy), "The Elements of Statistical Learning" (Hastie, Tibshirani, Friedman), and "Deep Learning" (Goodfellow, Bengio, Courville). Goodfellow's book opens with a clear and interesting review of mathematics, of machine learning, and of the inner workings of its algorithms. There is also a book on deep learning in Russian: "Deep Learning: Immersion in the World of Neural Networks" (Nikolenko S. I., Kadurin A. A., Arkhangelskaya E. O.).
 
The course is also described in this announcement.
What software you need
To complete the course, you will need a number of Python packages. Most of them are included in the Anaconda distribution with Python 3.6. Later on, you will need other libraries as well, and we will cover them as they come up. The full list is in the Dockerfile.
You can also use a Docker container with all the required software already installed. Details are on the repository's Wiki page.
How to join the course
No formal registration is required. You can join the course at any time, even after it starts (October 1, 2018), but the homework deadlines are strict.
However, so that we can learn more about you:
- Fill out the survey, using your real name.
- Join the OpenDataScience community; the course is discussed in the mlcourse.ai channel.
 
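2. Course homework
- Each article comes with a homework assignment in the form of a Jupyter notebook. You fill in the missing code and then, based on its output, choose the correct answers in a Google Form.
- Solutions are sent to everyone who has submitted answers via the form.
- At the end of the series, the results are summarized in a rating of participants.
- Examples of homework assignments are given at the end of the articles in the series.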
 
3. Demonstration of the main Pandas methods
All the code can be reproduced in this Jupyter notebook.
Pandas is a Python library that provides extensive data analysis capabilities. Data scientists often work with data stored in tables, for example in .csv, .tsv, or .xlsx files. With Pandas, such tabular data is very convenient to load, process, and analyze using SQL-like queries. In combination with the Matplotlib and Seaborn libraries, Pandas also offers broad opportunities for visual analysis of tabular data.
The main data structures in Pandas are the Series and DataFrame classes. The first is a one-dimensional indexed array of data of a fixed type. The second is a two-dimensional data structure: a table in which each column contains data of the same type. You can think of it as a dictionary of Series objects. The DataFrame structure is well suited for representing real data: rows correspond to the feature descriptions of individual objects, and columns correspond to features.
    # import Pandas and NumPy
    import pandas as pd
    import numpy as np
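Below is a minimal sketch of the two structures described above. It uses a tiny hypothetical dataset: the column names mimic the churn data, but the values and the index labels u1, u2, u3 are invented for illustration. Each Series is a one-dimensional labeled array of a single dtype. A DataFrame built from a dict of Series that share an index turns every key into a column.

```python
import pandas as pd

# Two Series: one-dimensional labeled arrays, each holding a single dtype
# (hypothetical values, not from telecom_churn.csv)
day_minutes = pd.Series([120.5, 98.0, 210.25], index=['u1', 'u2', 'u3'])
churn = pd.Series([False, True, False], index=['u1', 'u2', 'u3'])

# A DataFrame viewed as a dict of Series sharing one index:
# each key becomes a column (feature), each index label a row (object)
df_toy = pd.DataFrame({'Total day minutes': day_minutes, 'Churn': churn})

print(df_toy.dtypes)  # every column keeps its own dtype
print(df_toy.shape)   # (3, 2): 3 objects, 2 features
```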
Let's demonstrate the main methods in action by analyzing a dataset on customer churn of a telecom operator (there is no need to download it, since it is in the repository). We read the data with the read_csv method and look at the first 5 rows with the head method:

    df = pd.read_csv('../../data/telecom_churn.csv')
    df.head()

In Jupyter notebooks, Pandas DataFrames are rendered as nicely formatted tables like this one, whereas the output of print(df.head()) looks worse.
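Because telecom_churn.csv is not available here, the sketch below feeds read_csv an in-memory CSV string through io.StringIO. read_csv accepts any file-like object. The CSV text is hypothetical, with a few columns named after the real ones. head() returns the first 5 rows by default, and head(n) returns the first n.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for '../../data/telecom_churn.csv'
csv_text = """State,Account length,Churn
AA,100,False
BB,50,True
CC,75,False
DD,20,False
EE,130,True
FF,64,False
"""

df_toy = pd.read_csv(io.StringIO(csv_text))  # file-like objects work like paths
first_rows = df_toy.head()                   # first 5 rows by default
print(first_rows)
print(df_toy.head(2))                        # or an explicit number of rows
```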
By default, Pandas displays only 20 columns and 60 rows, so if your DataFrame is larger, use the set_option function:

    pd.set_option('display.max_columns', 100)
    pd.set_option('display.max_rows', 100)
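set_option changes the display settings for the rest of the session. If you only need the larger limits temporarily, pd.option_context applies the same options inside a with-block and then restores the previous values. The sketch below checks this with pd.get_option.

```python
import pandas as pd

# Remember the current settings so we can compare afterwards
before_cols = pd.get_option('display.max_columns')
before_rows = pd.get_option('display.max_rows')

# The options apply only inside the with-block
with pd.option_context('display.max_columns', 100, 'display.max_rows', 100):
    inside_cols = pd.get_option('display.max_columns')
    inside_rows = pd.get_option('display.max_rows')

after_cols = pd.get_option('display.max_columns')
after_rows = pd.get_option('display.max_rows')
print(inside_cols, inside_rows)  # 100 100
```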
Each row represents one client, the object of our study.
Columns are the features of the object.
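| Name | Description | Type |
|---|---|---|
| State | State abbreviation (two-letter code) | Categorical |
| Account length | How long the client has been served by the company | Numerical |
| Area code | Phone number prefix | Numerical |
| International plan | International roaming (on/off) | Binary |
| Voice mail plan | Voice mail (on/off) | Binary |
| Number vmail messages | Number of voice mail messages | Numerical |
| Total day minutes | Total duration of daytime calls | Numerical |
| Total day calls | Total number of daytime calls | Numerical |
| Total day charge | Total charge for daytime services | Numerical |
| Total eve minutes | Total duration of evening calls | Numerical |
| Total eve calls | Total number of evening calls | Numerical |
| Total eve charge | Total charge for evening services | Numerical |
| Total night minutes | Total duration of nighttime calls | Numerical |
| Total night calls | Total number of nighttime calls | Numerical |
| Total night charge | Total charge for nighttime services | Numerical |
| Total intl minutes | Total duration of international calls | Numerical |
| Total intl calls | Total number of international calls | Numerical |
| Total intl charge | Total charge for international calls | Numerical |
| Customer service calls | Number of calls to customer service | Numerical |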
Target variable: Churn, a binary churn indicator (1 means the client was lost, i.e., churned). Later we will build models that predict this feature from the others, which is why we call it the target.
Let's look at the dimensions of the data, the feature names, and the feature types.
    print(df.shape)

    (3333, 20)

The table contains 3333 rows and 20 columns. Let's print the column names:
    print(df.columns)

    Index(['State', 'Account length', 'Area code', 'International plan',
           'Voice mail plan', 'Number vmail messages', 'Total day minutes',
           'Total day calls', 'Total day charge', 'Total eve minutes',
           'Total eve calls', 'Total eve charge', 'Total night minutes',
           'Total night calls', 'Total night charge', 'Total intl minutes',
           'Total intl calls', 'Total intl charge', 'Customer service calls',
           'Churn'],
          dtype='object')

To see general information about the DataFrame and all of its features, use the info method:
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3333 entries, 0 to 3332
    Data columns (total 20 columns):
    State                     3333 non-null object
    Account length            3333 non-null int64
    Area code                 3333 non-null int64
    International plan        3333 non-null object
    Voice mail plan           3333 non-null object
    Number vmail messages     3333 non-null int64
    Total day minutes         3333 non-null float64
    Total day calls           3333 non-null int64
    Total day charge          3333 non-null float64
    Total eve minutes         3333 non-null float64
    Total eve calls           3333 non-null int64
    Total eve charge          3333 non-null float64
    Total night minutes       3333 non-null float64
    Total night calls         3333 non-null int64
    Total night charge        3333 non-null float64
    Total intl minutes        3333 non-null float64
    Total intl calls          3333 non-null int64
    Total intl charge         3333 non-null float64
    Customer service calls    3333 non-null int64
    Churn                     3333 non-null bool
    dtypes: bool(1), float64(8), int64(8), object(3)
    memory usage: 498.1+ KB

bool, int64, float64, and object are the feature types. One feature is logical (bool), 3 features are of type object, and 16 features are numeric. The info method is also a quick way to check the data for gaps. Here there are none: every column has 3333 observations.
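The same two facts that info() reports can also be obtained programmatically: how many columns have each type, and how many non-missing values each column holds. This is a sketch on a hypothetical frame with one deliberately missing value. The calls used are dtypes, value_counts, and notna.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in a numeric column
df_toy = pd.DataFrame({
    'State': ['AA', 'BB', 'CC'],
    'Account length': [100, 50, 75],
    'Total day minutes': [120.5, np.nan, 210.25],
    'Churn': [False, True, False],
})

# Counts per type, like the "dtypes:" line of info()
type_counts = df_toy.dtypes.astype(str).value_counts()

# Non-missing values per column, like the "non-null" counts of info()
non_null = df_toy.notna().sum()
print(type_counts)
print(non_null)
```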
You can change the type of a column with the astype method. Let's apply it to the Churn feature to convert it to int64:

    df['Churn'] = df['Churn'].astype('int64')
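On a hypothetical boolean target, astype('int64') maps True to 1 and False to 0. After the conversion, the mean of the column is simply the share of ones. The article relies on this later when it computes the churn rate with mean().

```python
import pandas as pd

churn = pd.Series([False, True, False, True])  # hypothetical boolean target
churn_int = churn.astype('int64')              # True -> 1, False -> 0

print(churn_int.tolist())  # [0, 1, 0, 1]
print(churn_int.mean())    # 0.5, the share of churned users
```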
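The describe method shows the main statistical characteristics of each numerical feature (int64 and float64 types): the number of non-missing values, the mean, the standard deviation, the range (min and max), the median, and the 0.25 and 0.75 quartiles.

    df.describe()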
To see statistics on non-numerical features, you need to specify the types of interest explicitly in the include parameter:

    df.describe(include=['object', 'bool'])

| | State | International plan | Voice mail plan |
|---|---|---|---|
| count | 3333 | 3333 | 3333 |
| unique | 51 | 2 | 2 |
| top | WV | No | No |
| freq | 106 | 3010 | 2411 |
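This sketch runs describe on a hypothetical frame. It shows that count is the number of non-missing values (the NaN is not counted) and that non-numerical columns get count, unique, top, and freq instead. As an alternative to listing types in include, it uses exclude=[np.number], which selects every non-numeric column, booleans included.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one value in the numeric column is missing
df_toy = pd.DataFrame({
    'Total day minutes': [120.0, np.nan, 200.0, 80.0],
    'International plan': ['No', 'Yes', 'No', 'No'],
    'Churn': [False, True, False, False],
})

num_stats = df_toy.describe()                     # numeric columns only
cat_stats = df_toy.describe(exclude=[np.number])  # all non-numeric columns

print(num_stats)  # count is 3: the NaN is skipped
print(cat_stats)  # rows: count, unique, top, freq
```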
For categorical (type object) and boolean (type bool) features, you can use the value_counts method. Let's look at the distribution of the data by our target variable, Churn:

    df['Churn'].value_counts()

    0    2850
    1     483
    Name: Churn, dtype: int64

2850 of the 3333 users are loyal; their Churn value is 0.
 倿°Area code
      
      
        
        
        
      
    
        
        
        
      
      
        
        
        
      
    
    ã«ãããŠãŒã¶ãŒã®ååžãèŠãŠã¿ãŸãããã ãã©ã¡ãŒã¿ãŒã®å€normalize=True
      
      
        
        
        
      
    
        
        
        
      
      
        
        
        
      
    
    ãæå®ããŠãçµ¶å¯Ÿåšæ³¢æ°ã§ã¯ãªãçžå¯Ÿåšæ³¢æ°ã衚瀺ããŸãã 
 df['Area code'].value_counts(normalize=True)
      
      
        
        
        
      
    
        
        
        
      
      
        
        
        
      
    
      
       415 0.496550 510 0.252025 408 0.251425 Name: Area code, dtype: float64
      
      
        
        
        
      
    
        
        
        
      
      
        
        
        
      
    
      
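A small check of what normalize=True does, using hypothetical area codes. Without it, value_counts returns counts. With it, each count is divided by the number of values, so the shares sum to 1.

```python
import pandas as pd

area = pd.Series([415, 415, 510, 408])      # hypothetical area codes
counts = area.value_counts()                # absolute frequencies
shares = area.value_counts(normalize=True)  # relative frequencies

print(counts)
print(shares)
```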
Sorting
A DataFrame can be sorted by the value of any feature. In our case, for example, we can sort by Total day charge (ascending=False sorts in descending order):

    df.sort_values(by='Total day charge', ascending=False).head()
You can also sort by several columns:

    df.sort_values(by=['Churn', 'Total day charge'], ascending=[True, False]).head()

Thanks to makkos for the comment about the deprecated sort method.
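A sketch on four hypothetical rows makes the multi-column order concrete. The frame is sorted first by Churn in ascending order. Within each Churn value, rows are sorted by Total day charge in descending order, because each entry of ascending applies to the matching entry of by.

```python
import pandas as pd

# Hypothetical rows, not from the churn dataset
df_toy = pd.DataFrame({
    'Churn': [1, 0, 0, 1],
    'Total day charge': [30.0, 45.5, 20.0, 50.0],
})

# Descending by a single column
by_charge = df_toy.sort_values(by='Total day charge', ascending=False)

# Churn ascending, then charge descending within each Churn group
by_both = df_toy.sort_values(by=['Churn', 'Total day charge'],
                             ascending=[True, False])
print(by_both)
```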
Indexing and retrieving data
A DataFrame can be indexed in many ways. Using a few simple questions as examples, let's look at various ways of indexing a DataFrame and extracting the data we need.
To get a single column, use the DataFrame['Name'] construction. Let's use it to answer the question: what is the proportion of churned users in our DataFrame?

    df['Churn'].mean()  # 0.14491449144914492

14.5% is quite bad for a company; with such a churn rate, it could go bankrupt.
Boolean indexing of a DataFrame by a single column is very convenient. It looks like df[P(df['Name'])], where P is a logical condition checked for each element of the Name column. The result is a DataFrame consisting only of the rows that satisfy condition P on the Name column.
Let's use it to answer the question: what are the average values of the numerical features among churned users? (numeric_only=True is needed in recent pandas versions, because the frame still contains object columns.)

    df[df['Churn'] == 1].mean(numeric_only=True)

    Account length            102.664596
    Number vmail messages       5.115942
    Total day minutes         206.914079
    Total day calls           101.335404
    Total day charge           35.175921
    Total eve minutes         212.410145
    Total eve calls           100.561077
    Total eve charge           18.054969
    Total night minutes       205.231677
    Total night calls         100.399586
    Total night charge          9.235528
    Total intl minutes         10.700000
    Total intl calls            4.163561
    Total intl charge           2.889545
    Customer service calls      2.229814
    Churn                       1.000000
    dtype: float64
Combining the two previous types of indexing, let's answer the question: how much time, on average, do churned users spend on the phone during the day?

    df[df['Churn'] == 1]['Total day minutes'].mean()  # 206.91407867494823
What is the maximum length of international calls among loyal users (Churn == 0) who do not use the international roaming service ('International plan' == 'No')?

    df[(df['Churn'] == 0) & (df['International plan'] == 'No')]['Total intl minutes'].max()  # 18.9
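The three questions above are answered below on a hypothetical five-row frame. The sketch uses df.loc[mask, column], which selects the same values as df[mask][column] but avoids chained indexing. Conditions are combined with & (and) or | (or), and each comparison must be wrapped in parentheses.

```python
import pandas as pd

# Hypothetical users, not from telecom_churn.csv
df_toy = pd.DataFrame({
    'Churn': [0, 1, 0, 1, 0],
    'International plan': ['No', 'Yes', 'Yes', 'No', 'No'],
    'Total day minutes': [100.0, 250.0, 180.0, 210.0, 90.0],
    'Total intl minutes': [10.0, 12.5, 15.0, 8.0, 13.2],
})

churn_rate = df_toy['Churn'].mean()  # share of churned users
mask = df_toy['Churn'] == 1          # boolean Series, one value per row
churned_day_minutes = df_toy.loc[mask, 'Total day minutes'].mean()

# Both conditions must hold; note the parentheses around each comparison
loyal_no_intl = (df_toy['Churn'] == 0) & (df_toy['International plan'] == 'No')
max_intl = df_toy.loc[loyal_no_intl, 'Total intl minutes'].max()
print(churn_rate, churned_day_minutes, max_intl)
```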
A DataFrame can be indexed by column or row names or by their ordinal numbers. The loc method is used for indexing by name, and iloc for indexing by number.
In the first case, we say "give us the values of the rows with index 0 to 5 and the columns from State to Area code". In the second, we say "give us the values of the first five rows in the first three columns".
A note for the reader: when a slice is passed to iloc, the DataFrame is sliced as usual, with the end of the slice excluded. With loc, however, both the start and the end of the slice are included (link to the documentation; thanks to arkane0906 for the comment).
    df.loc[0:5, 'State':'Area code']

| | State | Account length | Area code |
|---|---|---|---|
| 0 | KS | 128 | 415 |
| 1 | OH | 107 | 415 |
| 2 | NJ | 137 | 415 |
| 3 | OH | 84 | 408 |
| 4 | OK | 75 | 415 |
| 5 | AL | 118 | 510 |
    df.iloc[0:5, 0:3]

| | State | Account length | Area code |
|---|---|---|---|
| 0 | KS | 128 | 415 |
| 1 | OH | 107 | 415 |
| 2 | NJ | 137 | 415 |
| 3 | OH | 84 | 408 |
| 4 | OK | 75 | 415 |
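The difference between the two outputs above (6 rows versus 5) comes from slice semantics. A sketch on a hypothetical eight-row frame checks it directly: label slices in loc include both ends, and position slices in iloc exclude the end.

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..7
df_toy = pd.DataFrame({
    'State': list('ABCDEFGH'),
    'Account length': range(8),
    'Area code': [415] * 8,
    'Churn': [0] * 8,
})

by_label = df_toy.loc[0:5, 'State':'Area code']  # labels: both ends included
by_position = df_toy.iloc[0:5, 0:3]              # positions: end excluded
print(by_label.shape, by_position.shape)         # (6, 3) (5, 3)
```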
If we need the first or the last row of the DataFrame, we can use the df[:1] or df[-1:] construction:

    df[-1:]
Applying functions to cells, columns, and rows
Applying a function to each column: apply

    df.apply(np.max)

    State                        WY
    Account length              243
    Area code                   510
    International plan          Yes
    Voice mail plan             Yes
    Number vmail messages        51
    Total day minutes         350.8
    Total day calls             165
    Total day charge          59.64
    Total eve minutes         363.7
    Total eve calls             170
    Total eve charge          30.91
    Total night minutes         395
    Total night calls           175
    Total night charge        17.77
    Total intl minutes           20
    Total intl calls             20
    Total intl charge           5.4
    Customer service calls        9
    Churn                      True
    dtype: object
The apply method can also apply a function to each row. To do this, specify axis=1.
Applying a function to each cell of a column: map
For example, the map method can replace the values in a column when you pass it a dictionary of the form {old_value: new_value}:

    d = {'No': False, 'Yes': True}
    df['International plan'] = df['International plan'].map(d)
    df.head()
The same operation can be performed with the replace method:

    df = df.replace({'Voice mail plan': d})
    df.head()
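Here are apply (by column and by row), map, and replace side by side on a hypothetical frame. Lambdas are used as the applied functions for clarity. One difference worth knowing: map with a dict turns values missing from the dict into NaN, while replace leaves unmatched values untouched.

```python
import pandas as pd

# Hypothetical data, not from the churn dataset
df_toy = pd.DataFrame({
    'Total day calls': [100, 80, 120],
    'Total eve calls': [90, 115, 70],
    'International plan': ['No', 'Yes', 'No'],
    'Voice mail plan': ['Yes', 'No', 'No'],
})
calls = df_toy[['Total day calls', 'Total eve calls']]

col_max = calls.apply(lambda col: col.max())          # once per column (axis=0)
row_sum = calls.apply(lambda row: row.sum(), axis=1)  # once per row

d = {'No': False, 'Yes': True}
df_toy['International plan'] = df_toy['International plan'].map(d)  # cell by cell
df_toy = df_toy.replace({'Voice mail plan': d})                    # same mapping
print(col_max.tolist(), row_sum.tolist())
```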
Grouping data
In general, grouping data in Pandas looks like this:

    df.groupby(by=grouping_columns)[columns_to_show].function()

- First, the groupby method splits the data by grouping_columns (a feature or a set of features).
- Then the columns of interest (columns_to_show) are selected.
- Finally, one or several functions are applied to the resulting groups.
Let's group the data by the value of the Churn feature and display statistics for three columns in each group:

    columns_to_show = ['Total day minutes', 'Total eve minutes', 'Total night minutes']
    df.groupby(['Churn'])[columns_to_show].describe(percentiles=[])
Let's do the same thing slightly differently, by passing a list of functions to agg:

    columns_to_show = ['Total day minutes', 'Total eve minutes', 'Total night minutes']
    df.groupby(['Churn'])[columns_to_show].agg(['mean', 'std', 'min', 'max'])
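A sketch of the split, select, and apply pattern on a hypothetical frame. The result of agg with a list of function names has MultiIndex columns of the form (feature, statistic), so a single value is addressed with a tuple.

```python
import pandas as pd

# Hypothetical users split into two Churn groups
df_toy = pd.DataFrame({
    'Churn': [0, 0, 1, 1, 1],
    'Total day minutes': [100.0, 140.0, 200.0, 260.0, 230.0],
    'Total eve minutes': [180.0, 220.0, 210.0, 190.0, 200.0],
})
columns_to_show = ['Total day minutes', 'Total eve minutes']

stats = df_toy.groupby('Churn')[columns_to_show].agg(['mean', 'min', 'max'])
print(stats)
print(stats.loc[1, ('Total day minutes', 'mean')])  # a single cell
```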
Pivot tables
Suppose we want to see how the observations in our sample are distributed across two features, Churn and International plan. To do so, we can build a contingency table using the crosstab method:

    pd.crosstab(df['Churn'], df['International plan'])

| International plan | False | True |
|---|---|---|
| Churn | | |
| 0 | 2664 | 186 |
| 1 | 346 | 137 |
    pd.crosstab(df['Churn'], df['Voice mail plan'], normalize=True)

| Voice mail plan | False | True |
|---|---|---|
| Churn | | |
| 0 | 0.602460 | 0.252625 |
| 1 | 0.120912 | 0.024002 |

We see that most users are loyal and do not use the additional services (international roaming / voice mail).
Advanced Excel users will probably recall pivot tables. In Pandas, pivot tables are handled by the pivot_table method, which takes the following parameters:
- values: a list of variables for which to compute statistics;
- index: a list of variables by which to group the data;
- aggfunc: what to compute for each group, such as the sum, mean, maximum, minimum, or something else.
Let's look at the average number of day, evening, and night calls for each area code:

    df.pivot_table(values=['Total day calls', 'Total eve calls', 'Total night calls'],
                   index=['Area code'], aggfunc='mean').head(10)

| | Total day calls | Total eve calls | Total night calls |
|---|---|---|---|
| Area code | | | |
| 408 | 100.496420 | 99.788783 | 99.039379 |
| 415 | 100.576435 | 100.503927 | 100.398187 |
| 510 | 100.097619 | 99.671429 | 100.601190 |
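Here are crosstab (counts, margins, shares) and pivot_table on a hypothetical six-row frame, so every number can be checked by hand. With margins=True, an "All" row and column hold the totals. normalize=True divides every cell by the grand total. pivot_table returns one row per index value, with the aggregated values column.

```python
import pandas as pd

# Hypothetical users, not from the churn dataset
df_toy = pd.DataFrame({
    'Churn': [0, 0, 1, 0, 1, 0],
    'International plan': ['No', 'No', 'Yes', 'Yes', 'No', 'No'],
    'Area code': [415, 408, 415, 408, 415, 510],
    'Total day calls': [100, 90, 110, 120, 80, 95],
})

counts = pd.crosstab(df_toy['Churn'], df_toy['International plan'], margins=True)
shares = pd.crosstab(df_toy['Churn'], df_toy['International plan'], normalize=True)

mean_calls = df_toy.pivot_table(values='Total day calls', index='Area code',
                                aggfunc='mean')
print(counts)
print(mean_calls)
```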
DataFrame transformations
As with many other things in Pandas, there are several ways to add columns to a DataFrame.
For example, let's count the total number of calls for every user. We create a Series object total_calls and insert it into the DataFrame:

    total_calls = df['Total day calls'] + df['Total eve calls'] + \
                  df['Total night calls'] + df['Total intl calls']
    # loc is the position at which the new column is inserted;
    # len(df.columns) puts the Series at the very end
    df.insert(loc=len(df.columns), column='Total calls', value=total_calls)
    df.head()
A column can also be added from existing columns without creating an intermediate Series:

    df['Total charge'] = df['Total day charge'] + df['Total eve charge'] + \
                         df['Total night charge'] + df['Total intl charge']
    df.head()
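To delete columns or rows, use the drop method. Pass it the labels to delete and the appropriate axis value: axis=1 to delete columns, or axis=0 (the default, so it can be omitted) to delete rows:

    # get rid of the columns we have just created
    df = df.drop(['Total charge', 'Total calls'], axis=1)
    # and this is how rows can be deleted (the result is not saved here)
    df.drop([1, 2]).head()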
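Both ways of adding a column, and both uses of drop, are shown below on a hypothetical frame. Note that insert modifies the frame in place, while drop returns a new DataFrame and leaves the original unchanged unless you assign the result.

```python
import pandas as pd

# Hypothetical data, not from the churn dataset
df_toy = pd.DataFrame({
    'Total day calls': [100, 80, 120],
    'Total eve calls': [90, 115, 70],
    'Total day charge': [30.5, 20.0, 40.25],
    'Total eve charge': [15.0, 18.5, 10.0],
})

# Option 1: build a Series and insert it at a chosen position (here: the end)
total_calls = df_toy['Total day calls'] + df_toy['Total eve calls']
df_toy.insert(loc=len(df_toy.columns), column='Total calls', value=total_calls)

# Option 2: assign a new column directly
df_toy['Total charge'] = df_toy['Total day charge'] + df_toy['Total eve charge']

# drop returns new frames; df_toy itself still has all 6 columns and 3 rows
without_totals = df_toy.drop(['Total charge', 'Total calls'], axis=1)
without_rows = df_toy.drop([1, 2])
print(df_toy.columns.tolist())
```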
4. First attempt at predicting churn
Let's see how churn is related to the International plan feature (whether international roaming is enabled). We'll do this with a crosstab contingency table and also illustrate it with Seaborn (how to build such plots and analyze data with them is covered in the next article).

    pd.crosstab(df['Churn'], df['International plan'], margins=True)
| International plan | False | True | All |
|---|---|---|---|
| Churn | | | |
| 0 | 2664 | 186 | 2850 |
| 1 | 346 | 137 | 483 |
| All | 3010 | 323 | 3333 |

With roaming enabled, the churn rate is much higher (137 of 323 users, about 42%, versus 346 of 3010, about 11.5%), which is an interesting observation! Perhaps large and poorly controlled roaming charges upset customers, make them dissatisfied with the operator, and thus lead to churn.
Next, let's look at another important feature, the number of calls to the service center (Customer service calls). Again, we'll build a summary table and a plot.
    pd.crosstab(df['Churn'], df['Customer service calls'], margins=True)
| Customer service calls | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Churn | | | | | | | | | | | |
| 0 | 605 | 1059 | 672 | 385 | 90 | 26 | 8 | 4 | 1 | 0 | 2850 |
| 1 | 92 | 122 | 87 | 44 | 76 | 40 | 14 | 5 | 1 | 2 | 483 |
| All | 697 | 1181 | 759 | 429 | 166 | 66 | 22 | 9 | 2 | 2 | 3333 |

This may not be obvious from the table (and scanning rows of numbers is tedious), but the plot clearly shows that the churn rate rises sharply starting from 4 calls to the service center.
Let's add a binary feature to our DataFrame: the result of the comparison Customer service calls > 3. Once again, let's see how it relates to churn.

    df['Many_service_calls'] = (df['Customer service calls'] > 3).astype('int')
    pd.crosstab(df['Many_service_calls'], df['Churn'], margins=True)
| Churn | 0 | 1 | All |
|---|---|---|---|
| Many_service_calls | | | |
| 0 | 2721 | 345 | 3066 |
| 1 | 129 | 138 | 267 |
| All | 2850 | 483 | 3333 |

Let's combine the two conditions above and build a summary table of this combination against churn.
    pd.crosstab(df['Many_service_calls'] & df['International plan'], df['Churn'])
| Churn | 0 | 1 |
|---|---|---|
| row_0 | | |
| False | 2841 | 464 |
| True | 9 | 19 |

Suppose we predict churn when the number of calls to the service center exceeds 3 and roaming is enabled, and predict loyalty otherwise. Then we can expect about 85.8% correct guesses: the rule is wrong only 464 + 9 = 473 times out of 3333. This 85.8%, obtained with very simple reasoning, is a good baseline for the machine learning models we will build later.
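The 85.8% figure can be checked directly from the four counts in the table above. The rule is right when it says False and the client stayed (Churn 0), or when it says True and the client left (Churn 1).

```python
# Counts copied from the crosstab above (rows: rule output, columns: Churn)
rule_false_churn0, rule_false_churn1 = 2841, 464
rule_true_churn0, rule_true_churn1 = 9, 19
total = rule_false_churn0 + rule_false_churn1 + rule_true_churn0 + rule_true_churn1

# Correct predictions: rule False & Churn 0, plus rule True & Churn 1
correct = rule_false_churn0 + rule_true_churn1
accuracy = correct / total
print(total, correct, round(accuracy, 3))  # 3333 2860 0.858
```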
In general, before machine learning came along, the data analysis process looked roughly like this. To sum up:
- The share of loyal customers in the sample is 85.5%. The most naive model, which always answers "the customer is loyal", will be right in about 85.5% of cases on such data. So the share of correct answers (accuracy) of any subsequent model should be no lower than this number, and hopefully considerably higher.
- A simple forecast that can be written as the rule "International plan = True & Customer service calls > 3 => Churn = 1, else Churn = 0" yields an expected accuracy of 85.8%, slightly above 85.5%. Later we will discuss decision trees and learn how to find such rules automatically, based only on the input data.
- Without any machine learning, we got two baselines that will serve as starting points for subsequent models. If, after a lot of effort, we improve the share of correct answers by only 0.5% overall, we are probably doing something wrong, and the simple two-condition model may be enough.
- Before training complex models, it is a good idea to explore the data a little and check simple assumptions. Moreover, in business applications of machine learning, one usually starts with simple solutions and only then experiments with more complex ones.
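The two baselines from the summary can be written as tiny prediction functions and scored with a shared accuracy helper. This is a sketch under these assumptions:
- International plan has already been mapped to bool, as was done earlier with map.
- The eight-row sample is hypothetical.
- The names constant_baseline, rule_baseline, and accuracy are illustrative helpers, not part of pandas or any other library.

```python
import pandas as pd

def constant_baseline(df: pd.DataFrame) -> pd.Series:
    """Naive model (hypothetical helper): predict that everyone stays, Churn = 0."""
    return pd.Series(0, index=df.index)

def rule_baseline(df: pd.DataFrame) -> pd.Series:
    """Hand-written rule (hypothetical helper): roaming on and > 3 service calls."""
    return (df['International plan'] & (df['Customer service calls'] > 3)).astype(int)

def accuracy(y_true: pd.Series, y_pred: pd.Series) -> float:
    """Share of correct answers."""
    return float((y_true == y_pred).mean())

# Hypothetical sample, not taken from telecom_churn.csv
df_toy = pd.DataFrame({
    'International plan': [False, True, True, False, False, True, False, False],
    'Customer service calls': [1, 5, 2, 4, 0, 4, 1, 2],
    'Churn': [0, 1, 0, 1, 0, 1, 0, 0],
})

acc_const = accuracy(df_toy['Churn'], constant_baseline(df_toy))
acc_rule = accuracy(df_toy['Churn'], rule_baseline(df_toy))
print(acc_const, acc_rule)  # 0.625 0.875
```

Any later model can be scored with the same accuracy helper and compared against these two numbers.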
 
5. Homework #1
From now on, the course is held in English (there are also articles on Medium). The next launch is on October 1, 2018.
As a warm-up, we suggest analyzing demographic data with Pandas. You need to fill in the missing code in the Jupyter template and choose the correct answers in the web form (the solution is also available there).
6. Overview of useful resources
- English translation of this article on Medium
- Video of the lecture based on this article
- First of all, of course, the official Pandas documentation; in particular, we recommend the short introduction "10 minutes to pandas"
- The book "Learning pandas" and a Russian translation of its repository
- A PDF cheat sheet for the library
- Presentation by Alexander Dyakonov, "Getting to know Pandas"
- A series of posts, "Modern Pandas" (in English)
- Pandas exercises on GitHub and another useful repository, "Effective Pandas" (in English)
- scipy-lectures.org: a tutorial on working with pandas, numpy, matplotlib, and scikit-learn
- Pandas From The Ground Up: a video from PyCon 2015
 
This article was co-written with yorko (Yuri Kashnitsky).