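Today we will touch on a fascinating topic: natural language. A great deal of money is now being invested in this field, and it is being used to solve many different problems. It attracts attention not only from industry but also from the scientific community.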
Can machines think?
Researchers tie natural language analysis to a fundamental question: can machines think? The famous philosopher René Descartes gave an unambiguously negative answer, which is hardly surprising given the level of technology in the 17th century. Descartes believed that a machine does not know how to think and will never learn to. A machine will never be able to communicate with a person in natural speech. Even if we explain to it how to use and pronounce words, the result will still be memorized phrases and standard answers; the machine will never go beyond them.
The Turing test
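Many years have passed since then, technology has changed a great deal, and in the 20th century the question became relevant again. In 1950 the famous scientist Alan Turing, doubting the claim that machines cannot think, proposed his well-known test as a way to check it.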
According to legend, the idea of the test comes from a game played at student parties. Two people from the group, a man and a woman, went into separate rooms, and the others communicated with them by passing notes. The players' task was to guess who they were dealing with: the man or the woman. The man and the woman, for their part, pretended to be each other in order to mislead the players. Turing made a fairly simple modification: he replaced one of the hidden players with a computer and asked the participants to work out whether they were talking to a person or to a machine.
The Turing test was invented more than half a century ago. Programmers have repeatedly announced that their creation has passed it, and each time the claim has met with objections and doubts about whether that is really so. There is no official, reliable account of anyone passing the main Turing test. Some of its variations, however, have indeed been passed.
The Georgetown experiment
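In 1954 the Georgetown experiment took place. It demonstrated a system that automatically translated 60 sentences from Russian into English. The organizers were confident that in just three years they would reach the overall goal and solve the problem of machine translation completely. They failed miserably. Twelve years later the program was shut down; nobody had managed to solve the problem.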
From a modern point of view, the main problem was the small number of sentences. In that setting the task is almost impossible to solve. Had the experimenters worked with 60 thousand or, better, 6 million sentences, they would have had a chance.
The first chatbots
In the 1960s the first chatbots appeared. They were very primitive and mostly rephrased what the other person had said. Modern chatbots have not moved far from their ancestors. Even the famous chatbot Eugene Goostman (Zhenya Gustman), which is considered to have passed one version of the Turing test, did so not thanks to clever algorithms. Acting helped far more: the authors had thought its persona through carefully.
Formal ontologies and Chomsky's theory of grammar
Then came the era of formal methods. It was a worldwide trend: scientists tried to formalize everything, building formal models, ontologies, concepts, relations, general rules of syntactic parsing, and a universal grammar. This is when Chomsky's theory of grammar appeared. It all looked very elegant, but it never reached real practical application because it required a great deal of painstaking manual work. So in the 1980s attention shifted to a different class of systems: machine learning algorithms and so-called corpus linguistics.
Machine learning and corpus linguistics
What is the main idea of corpus linguistics? We assemble a corpus, that is, a sufficiently large collection of documents, and then use machine learning and statistical analysis to try to build a system that solves our problem.
In the 1990s the field received a very strong boost from the growth of the World Wide Web, with its huge volume of loosely structured text that had to be searched and catalogued. In the 2000s natural language analysis began to be applied not only to web search but to a wide range of other problems. Large text datasets appeared, along with many different tools, and companies began investing heavily in the field.
Current trends
What is happening now? Several main trends can be seen in natural language analysis. Unsupervised learning models are being used actively; they reveal the structure of a text or corpus without predefined rules. Many large corpora of varying quality, both annotated and unannotated, have become publicly available. Crowdsourcing-based models have appeared: rather than only trying to understand something with a machine, we bring in people who, for a small fee, will determine, say, which language a text is written in. In a sense the idea of using formal ontologies has begun to revive, but ontologies now revolve around crowdsourced knowledge bases, in particular those built on Linked Open Data. This is a whole set of knowledge bases centered on DBpedia, a machine-readable version of Wikipedia that is itself filled by a crowdsourcing model: people all over the world can add to it.
Until about six years ago, NLP (natural language processing) mostly absorbed techniques and methods from other fields, but over time it began to export them. Methods developed for natural language analysis are now being applied successfully elsewhere. And, of course, where would we be without deep learning? Deep neural networks are now also being used in natural language analysis, so far with mixed success.
What is NLP? It cannot be called a single specific task. NLP is a huge range of tasks at different levels. By level of granularity, for example, they can be divided as follows.
At the signal level we need to convert the input signal, which may be speech, handwriting, or printed and scanned text, into a record made up of characters that a machine can work with.
Next comes the word level. Our job is to recognize that there is a word at all, carry out its morphological analysis, and correct errors, if any. The collocation level is slightly higher: here we deal with parts of speech, which have to be determined, and with named entity recognition. In some languages even isolating the words is non-trivial. In German, for example, words are not necessarily separated by spaces, and we need to be able to pick the individual words out of a long string.
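Collocations form sentences. Sentences have to be isolated; sometimes they need to be parsed syntactically, for instance so that an answer can be formulated when the sentence is a question, and word ambiguity has to be resolved where necessary.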
Note that these tasks go in two directions: analysis and generation. In particular, once we have found the answer to a question, we need to generate a sentence that reads naturally to a person and answers the question.
Sentences are grouped into paragraphs. Here we face the problems of resolving references and establishing relations between objects mentioned in different sentences.
With paragraphs we can tackle new problems: analyzing the sentiment of a text and determining the language in which it is written.
Paragraphs form a document, and this is the level with the most interesting tasks: semantic analysis (what is the document about?), automatic annotation and summarization, and translation and generation of documents. You have probably heard of SCIgen, the well-known generator of scientific papers that produced "Rooter: A Methodology for the Typical Unification of Access Points and Redundancy". SCIgen regularly puts the editorial boards of scientific journals to the test.
There are also tasks that concern the corpus as a whole, in particular deduplicating a huge volume of documents and searching it for information.
An example task: who should see a post on OK.RU?
For example, in the OK.RU project, also known as Odnoklassniki, we have the task of ranking content in the feed. Your friends and groups publish posts, and as a rule there are a great many of them, especially once your friends' notes are counted. From all of these we need to pick the ones that suit you best. What are the challenges and opportunities?
We have a large dataset: more than 2 billion posts already, with millions of new ones appearing every day. The posts are written in about 40 languages, some of them rather poorly studied. The documents are very noisy. These are not news stories or scientific papers; they are written by ordinary people, so they contain errors, typos, slang, spam, copy-paste and duplicates. One might ask why we should analyze content at all when there are already many methods based on collaborative filtering. In a feed, however, such recommendations work poorly, because we are always in a cold-start situation. A new item appears. We know almost nothing yet about who has interacted with it and how, but we already have to decide whom to show it to and whom not to. So we apply the traditional workaround for cold start and build a content-based recommendation system: we try to teach the machine to understand what a post is about.
The problem from a distance
Let us look at the problem from a bird's-eye view and pick out the main building blocks. First, the corpus is multilingual, so the first step is to determine the language of each document. User-written text contains typos, so we need some kind of typo corrector to bring the text into a canonical form. To work with documents further we must be able to vectorize them. The corpus contains many duplicates, so deduplication is unavoidable. Most interesting of all, we want to know what a post is about, so we need semantic analysis methods. Finally, we want to understand the author's attitude to the object or topic, and this is where sentiment analysis helps.
Language identification
Let us take these in order, starting with language identification. Here standard supervised machine learning is used: we build a corpus labeled by language and train a classifier on it. As a rule, simple statistical classifiers work very well. Their features are usually N-grams, that is, sequences of N (for example, three) consecutive characters. A histogram of these sequences is built for the document, and the language is determined from it. More advanced models can use N-grams of several different lengths, and among recent developments it is worth noting variable-length N-grams, or "infinigrams", as their authors call them.
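The trigram-histogram idea can be sketched in a few lines of Python. This is a toy illustration rather than the classifier used at OK.RU: the two-sentence training corpus, the function names and the choice of cosine similarity between histograms are all assumptions made for the example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Histogram of character n-grams; the text is padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram histograms."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_texts):
    """Build one aggregated trigram profile per language label."""
    profiles = {}
    for lang, text in labeled_texts:
        profiles.setdefault(lang, Counter()).update(char_ngrams(text))
    return profiles


def detect(text, profiles):
    """Return the label whose profile is closest to the text's histogram."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(grams, profiles[lang]))


# Hypothetical miniature corpus; a real one needs far more text per language.
TOY_CORPUS = [
    ("en", "the quick brown fox jumps over the lazy dog and then the cat"),
    ("de", "der schnelle braune fuchs springt über den faulen hund und die katze"),
]
PROFILES = train(TOY_CORPUS)
```

Calling `detect("der hund und die katze", PROFILES)` picks the German profile. On inputs of only a word or two the histogram becomes too sparse to be reliable, which is the weakness of trigram methods on short texts.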
The task is fairly old, so there are many ready-made, working tools. In particular, there are Apache Tika, the Japanese language-detection library, and one of the latest developments, the Python package Ldig, which works precisely on infinigrams.
These methods work well on sufficiently long texts. Given a paragraph, or at least five sentences, the language is determined with better than 99% accuracy. But when the text is short, a single sentence or a few words, the classical trigram-based approach is wrong very often. Infinigrams can improve the situation, but this is a new area, and trained, ready-to-use classifiers are not yet available for every language.
Converting to canonical form
We have determined the language of the text. Now it has to be brought into canonical form. Why? One of the key objects in text analysis is the dictionary, and the complexity of algorithms often depends on its size. Take every word used in the corpus: most likely there will be millions of them, perhaps even hundreds of millions. But on closer inspection many of these are not really distinct words but word forms or misspelled words. To reduce the size of the dictionary (and with it the computational cost) and to improve the quality of many models, we bring words into a canonical form.
First we correct errors and typos. There are two approaches. The first is based on so-called phonetic matching. Its main idea is this: why do people make mistakes? Because they write a word the way they hear it. If we take the correct word and the misspelled one and write down how each of them sounds, we get the same representation, so the error no longer affects the analysis.
The other approach is the so-called edit distance, which we use to find the most similar words in the dictionary. The edit distance is the minimum number of edit operations needed to turn one word into another. The fewer operations are required, the more similar the words are.
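As an illustration, here is a minimal sketch of the classic Levenshtein edit distance (insertions, deletions and substitutions, each costing one operation) and a brute-force dictionary lookup built on it. The text does not say which variant of edit distance or which lookup strategy is used in practice, so both are assumptions; scanning the whole dictionary is only reasonable for small vocabularies.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def closest_word(word, dictionary):
    """Dictionary entry with the smallest edit distance to `word`."""
    return min(dictionary, key=lambda candidate: edit_distance(word, candidate))
```

For example, `closest_word("langauge", ["language", "luggage", "landscape"])` returns `"language"`, because swapping two adjacent letters costs only two substitutions.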
So the errors are corrected. Even so, in Russian, for instance, a word can have a huge number of correct forms with different endings, prefixes and suffixes. This blows up the dictionary considerably. We need to reduce each word to its base form, and there are two concepts for doing so.
The first is stemming, which tries to find the stem of a word. Linguists might argue, but one can think of it as the root. It uses an affix-stripping approach: the word is trimmed piece by piece from the end and from the beginning, removing endings, prefixes and suffixes until only the main part remains. There is a well-known implementation, the so-called Porter stemmer, or the Snowball project. The main problem with this approach is that the stemmer's rules are written by linguists, and this is hard work: adding a new language requires linguistic research first.
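To make the affix-stripping idea concrete, here is a deliberately crude suffix stripper. It is not the Porter stemmer and not part of Snowball: the suffix list and the minimum stem length are assumptions chosen only for this English example, whereas real stemmers use much richer, language-specific rule sets.

```python
# Hypothetical suffix list, ordered from longest to shortest so that the
# longest matching suffix is removed first.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def strip_suffix(word, min_stem=3):
    """Remove the longest known suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word
```

With this sketch "jumping", "jumped" and "jumps" all collapse to "jump", which shrinks the dictionary but also discards the grammatical information carried by the ending.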
There are various other approaches as well: you can look words up in a dictionary, train models without supervision, build probabilistic models based on hidden Markov chains, or train a neural network that converts a word into its reduced form.
Stemming has been in use for a long time; Google has used it since the early 2000s. Probably the most widely used tool is the implementation in the Apache Lucene package. But stemming has a drawback: when we cut a word down to its stem, we lose some information. Since only the root is left, we may lose the information about whether the word was an adjective or a noun, and sometimes that matters for later tasks.
The second concept, an alternative to stemming, is lemmatization. It tries to reduce a word not to its stem or root but to its base dictionary form, the lemma: for a verb, for example, the infinitive. There are many implementations, and the subject is well studied. For noisy user-generated text, however, reduction to canonical form remains difficult and is not yet fully solved.
Vectorization
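The texts have been brought into canonical form. Now we map them into a vector space, because almost all mathematical models work in high-dimensional vector spaces. The basic approach used by many models is the bag-of-words method. For each document we form a vector in a space whose dimension equals the size of the dictionary. Each word has its own dimension, and for a document we record a measure of how often that word is used in it. The result is a vector. There are many ways of computing that measure, and the dominant one is so-called TF-IDF. Term frequency (TF) is defined in different ways. It can be the raw count of the word's occurrences; or a flag showing whether the word was seen at all; or something slightly trickier, such as a logarithmically smoothed count. And here is the most interesting part. Having determined TF in a document, we multiply it by the inverse document frequency (IDF). IDF is usually computed as the logarithm of the number of documents in the corpus divided by the number of documents in which the word occurs. Here is an example: suppose we come across a word that occurs in every single document of the corpus. The logarithm is obviously zero, so such a word contributes nothing: it carries no information, because it is in every document.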
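The computation just described fits in a few lines. The sketch below uses the raw count as TF and the unsmoothed logarithm as IDF; these are only one of the combinations mentioned above, and the function name and input format (already tokenized documents) are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {word: weight} dict each.

    TF is the raw count of the word in the document; IDF is
    log(number of documents / number of documents containing the word).
    """
    n_docs = len(documents)
    doc_freq = Counter(word for doc in documents for word in set(doc))
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}
    return [
        {word: count * idf[word] for word, count in Counter(doc).items()}
        for doc in documents
    ]
```

A word that appears in every document gets IDF = log(1) = 0, so its weight vanishes regardless of how often it is repeated, exactly as in the example above.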
What is the advantage of the bag-of-words approach? It is simple to implement. But it loses some information, including word order. People are still breaking many lances over how important word order really is. There is a famous example: Master Yoda, who puts the words of a sentence in seemingly random order. Yoda's speech is unusual, yet we understand it without difficulty, which means that the human brain recovers the information quite easily even when the order is lost.
Sometimes, however, this information matters. In sentiment analysis, for example, it is quite important which words "good" or "not" refer to. Then, alongside the bag of words, a bag of N-grams helps: we add not only words but also phrases to the dictionary. We do not add every phrase, since that would lead to a combinatorial explosion, but we can add frequently used, statistically significant pairs or pairs that correspond to named entities, and this improves the quality of the final model.
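A minimal sketch of a bag of N-grams restricted to frequent word pairs follows. A plain frequency threshold stands in here for a proper statistical significance test, which the text mentions but does not specify; the threshold value and the function names are assumptions.

```python
from collections import Counter


def frequent_bigrams(documents, min_count=2):
    """Word pairs that occur at least `min_count` times across the corpus."""
    counts = Counter(
        (a, b) for doc in documents for a, b in zip(doc, doc[1:])
    )
    return {pair for pair, count in counts.items() if count >= min_count}


def bag_of_ngrams(doc, kept_bigrams):
    """Bag of words extended with the selected bigrams, joined by '_'."""
    bag = Counter(doc)
    bag.update(
        f"{a}_{b}" for a, b in zip(doc, doc[1:]) if (a, b) in kept_bigrams
    )
    return bag
```

Only pairs that pass the filter enter the dictionary, so the vocabulary grows by a handful of useful phrases such as "not_good" rather than by every adjacent pair.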
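Another situation in which a bag of words loses information and may degrade the results is synonyms, or words with several different meanings (for example, "lock"). To some extent these cases can be handled by methods that build vector representations of words, such as the famous word2vec or the more fashionable skip-gram model.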
Deduplication
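The documents are vectorized. Now we need to clean the corpus of duplicates. The principle is clear: we have vectors in a vector space, and we can measure how close they are by taking the cosine. Other proximity metrics can be used, but usually it is the cosine. Documents whose cosine is close to 1 are merged into one group.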
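For reference, the cosine between two sparse document vectors, and the naive all-pairs duplicate search built on it, might look as follows. The dict-based vector format and the 0.95 threshold are assumptions for this sketch; the exhaustive loop is quadratic in the number of documents and is shown only to make the principle concrete.

```python
import math


def cosine_similarity(u, v):
    """u, v: sparse vectors stored as {dimension: weight} dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def near_duplicates(vectors, threshold=0.95):
    """Index pairs (i, j), i < j, whose cosine is at least `threshold`."""
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine_similarity(vectors[i], vectors[j]) >= threshold
    ]
```

Because the cosine ignores vector length, a document and the same text pasted twice come out as duplicates with a cosine of 1.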
It all seems simple and clear, but there is a catch: we have 2 billion documents. Two billion times two billion cosines is more than we will ever finish computing. We need an optimization that quickly selects candidates for the cosine computation and lets us avoid exhaustive search. This is where locality-sensitive hashing helps. A standard hash function scatters data uniformly over the hash space. A locality-sensitive hash places objects that are close in the object space close together in the hash space, and with some probability it will even give them the same hash.
There are many techniques for computing locality-sensitive hashes for different similarity metrics. For the cosine, the random projection method is often used. We choose a random basis of random vectors. We compute the cosine between the document and the first basis vector: if it is greater than zero, we write a one; if it is less than or equal to zero, we write a zero. Then we compare the document with the second basis vector and get another zero or one. The hash has as many bits as there are vectors in the basis.
What is the advantage, and why does it work? If two documents are close in cosine, then with high probability they lie on the same side of each basis vector. Similar documents are therefore likely to get the same hash. Outliers still occur, and to deal with them we repeat the procedure. In practice we usually do two passes. First we compute a 24-bit hash and remove the bulk of the nearly identical documents. Then we compute another hash on a different basis, this time 16 bits long, and clean out the remaining duplicates. After that either no copies remain or there are so few that they cannot significantly affect the quality of the models.
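The random projection scheme can be sketched as below. Since only the sign of the cosine matters, the code compares the dot product with zero instead of normalizing. The Gaussian sampling of basis vectors, the dense list representation and the function names are assumptions for this example; the 24-bit and 16-bit sizes from the text would simply be passed as `n_bits`.

```python
import random


def random_basis(n_bits, dim, seed=0):
    """`n_bits` random vectors of dimension `dim` with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_hash(vector, basis):
    """One bit per basis vector: 1 if the dot product is positive, else 0."""
    bits = 0
    for plane in basis:
        dot = sum(x * p for x, p in zip(vector, plane))
        bits = (bits << 1) | (1 if dot > 0 else 0)
    return bits


def bucket(vectors, basis):
    """Group vector indices by hash; only bucket-mates need a real cosine."""
    buckets = {}
    for index, vector in enumerate(vectors):
        buckets.setdefault(lsh_hash(vector, basis), []).append(index)
    return buckets
```

Exact cosines are then computed only inside each bucket. Running a second pass with a different `seed` gives near-duplicates that were split by an unlucky hyperplane another chance to meet.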
Semantic analysis
We are gradually getting to the most interesting part. How do we understand what a document is about? Semantic analysis is a very old problem. The old-school approach goes like this: build an ontology in advance, parse the text rigorously, map the nodes of the syntax tree onto the concepts of the ontology, write a lot of hand-crafted rules, and so obtain the semantics. All of this is beautiful in theory but does not work in practice: wherever there are many hand-written rules, the system is hard to keep working.
The modern approach is unsupervised semantic analysis, which is why it is called latent (hidden) semantic analysis. This method, or rather family of methods, works well on large corpora; indeed, it only makes sense to look for hidden semantics in a large corpus. As a rule it has relatively few parameters to tune, unlike the sheets of rules in the old-school approach, and ready-made tools exist: take them and use them.
Latent semantic indexing
Historically, the first approach to latent semantic analysis was latent semantic indexing. The idea is very simple: we reuse the matrix factorization techniques that have already proved themselves in collaborative recommendation.
What is the essence of factorization? In recommendations we had a large user-item matrix (how much each user liked each item). We decompose it into a product of smaller matrices, obtaining a user-factor matrix and a factor-item matrix. If we then multiply these two matrices back together, we get a new user-item matrix, and if the factorization was done properly, it matches the original matrix as closely as possible. The same can be done with documents. We take the document-word (or word-document) matrix and decompose it into a product of two matrices, document-factor and factor-word. This is simple, and ready-made tools exist. The approach automatically accounts for synonyms and for words with several meanings. If the corpus contains many typos, the model also recognizes that a misspelled word points to the same hidden factor.
Minimal parameters, ready-made tools: these techniques have been in use since the early 1990s. There is one catch: the resulting semantics is too well hidden. Looking at the document-factor and word-factor vectors, it is very hard to say anything about the corpus. In collaborative recommendation this is not such a problem, but in natural language analysis things are different for many tasks: a mathematical model that we cannot interpret gives us no new knowledge. So people started looking for alternatives.
Probabilistic latent semantic indexing
[Most of the text of this section, and of the section on Latent Dirichlet Allocation (LDA) that follows it, is missing from the source; only punctuation and a few isolated terms survive. The surviving terms are: TF-IDF; an EM algorithm; the quantities γ_ijk together with the counters N_ik, N_jk, N_ij and N_k; the LDA parameters α and β; domain-specific themes; an example involving the name John Snow and the word "snow"; a comparison of LDA with PLSA; and the figures 1000 and 250, whose context is lost.]
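Of course, there are many ways to develop these models further. In particular, the topic of "additive regularizers" is actively moving forward in Russia right now. New additive terms are introduced into the formulas computed during the iterative updates, and each of them models some process. Some remove part of the topics as their weight becomes small. Others smooth out certain background topics or, conversely, make the main topics sparser.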
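There are also approaches that aim not merely to add regularizers but to make the generative model itself more complex. For example, new entities are added, such as tags, authors or readers of a document, each of which can have its own topic distribution, and a common distribution is built on top of them.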
There are attempts to combine this unsupervised LDA technique with labeled corpora: the prior distribution of topics is then taken not from an abstract Dirichlet distribution but from the distribution observed in the labeled corpus.
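Interestingly, all these probabilistic generative models are in fact matrix factorization techniques. Unlike singular value decomposition, however, they lend themselves to some form of interpretation. For this reason LDA has started to be used in other areas: image matrices are factorized with it, and collaborative recommendations are built on it. In social network and graph analysis there is the topic of stochastic block modeling. As far as I understand, it developed more or less independently, but in essence it too is matrix factorization through a probabilistic generative model. In other words, LDA and everything that grew out of it is something that has been exported from natural language analysis to other fields.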
A few words about the technology. The process itself is simple, but building a topic model on a large corpus takes time, and we have no time: we need to determine the topic of a document the moment it appears.
We use the following approach: the topic model is prepared in advance. The model is based on the topic-word matrix. With a ready, cached topic-word matrix we can fit the document-topic distribution for a particular post at the moment it appears. There is a general topic model that is updated regularly and computed with standard MapReduce, and there is a continuous stream of new posts. We process the posts with streaming analysis tools and determine their topics on the fly from the precomputed topic-word matrix. This is a typical scheme. Machine learning algorithms in production usually work this way: the hard part is prepared offline, and the easy part runs online.
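One common way to fit a document-topic distribution against a fixed topic-word matrix is a few EM iterations in which only the document's topic weights are updated (often called "fold-in"). The text does not say which procedure is actually used online, so the sketch below is an assumption, as are the function name, the dict-based matrix format and the iteration count.

```python
def infer_topics(doc_words, topic_word, n_iter=50):
    """Estimate p(topic | document) with the topic-word matrix held fixed.

    topic_word: list with one {word: p(word | topic)} dict per topic,
    assumed to have been computed offline.
    """
    n_topics = len(topic_word)
    theta = [1.0 / n_topics] * n_topics  # start from a uniform mixture
    # Words unknown to every topic carry no evidence and are skipped.
    words = [w for w in doc_words if any(w in topic for topic in topic_word)]
    if not words:
        return theta
    for _ in range(n_iter):
        counts = [0.0] * n_topics
        for word in words:
            # E-step: share of this word occurrence attributed to each topic.
            weights = [theta[t] * topic_word[t].get(word, 0.0)
                       for t in range(n_topics)]
            total = sum(weights)
            if total == 0:
                continue
            for t in range(n_topics):
                counts[t] += weights[t] / total
        norm = sum(counts)
        if norm == 0:
            break
        # M-step: new topic weights are the normalized expected counts.
        theta = [count / norm for count in counts]
    return theta
```

Each call touches only the words of one post, so the cost is independent of the corpus size, which is what makes the online part cheap compared with rebuilding the model.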
We have not yet talked about sentiment analysis. Fine: we understand what a text is about and have determined its probability distribution over topics. But how do we tell whether the author's attitude to the topic is positive or negative?
As a rule, supervised methods still dominate here. We need a labeled corpus of texts with positive and negative sentiment, on which a classifier is trained. Bag-of-words approaches often give disappointing results: emotions are sometimes expressed with the same words, and context matters. So a bag of N-grams is often used instead of a bag of words. Alternatively, for each ordinary word one tries to work out whether a particle (for example, "not") applies to it: while examining the word, one checks whether the particle "not" occurs before it in the sentence and at what distance. In addition, attention is paid to extra features that indicate whether the person was nervous, angry or happy when writing the text: lots of exclamation marks, or lots of non-letter characters inside words (these mask obscenities). A classifier is trained on all of this.
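The two kinds of features mentioned here, words marked by a preceding negation and surface signals such as exclamation marks, can be extracted roughly as follows. The negation list, the two-word window, the set of masking characters and the feature names are all assumptions made for the sketch; the resulting features would then be fed to whatever classifier is trained on the labeled corpus.

```python
NEGATIONS = {"not", "no", "never"}  # hypothetical English negation list
MASK_CHARS = set("*#@$%")           # characters often used to mask obscenities


def mark_negation(tokens, window=2):
    """Prefix with 'NOT_' every token within `window` words after a negation."""
    marked, remaining = [], 0
    for token in tokens:
        if token in NEGATIONS:
            marked.append(token)
            remaining = window
        elif remaining > 0:
            marked.append("NOT_" + token)
            remaining -= 1
        else:
            marked.append(token)
    return marked


def surface_features(text):
    """Counts of exclamation marks and of words containing masking characters."""
    return {
        "exclamations": text.count("!"),
        "masked_words": sum(
            1 for word in text.split() if MASK_CHARS & set(word)
        ),
    }
```

After marking, "good" and "NOT_good" become different dictionary entries, so even a plain bag-of-words classifier can tell "good" from "not very good".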
This can work quite well, especially when the classifier is trained for a specific subject area. Given a set of movie reviews, it is quite feasible to train a sentiment classifier. The problem is that this classifier will most likely stop working on restaurant reviews, where attitudes are often expressed in other words. So far, successful solutions for sentiment analysis have mostly been domain-specific.
The size of the text matters too, because sentiment often changes within it. A paragraph or several sentences may carry one emotional message, and the sentences that follow another. In a review, for example, a person may write about what they liked and what they did not. So it is worth splitting the document into such regions.
As a result, sentiment is best determined on texts of medium size. If the text is too small, there is a risk that it does not contain enough information; if it is too long, the result is too blurred.
The SentiStrength library is very popular; it also has a web service where you can submit a sentence or a text and have its sentiment determined. It should be said that the classification task here is not binary: as a rule these methods do not simply say "positive" or "negative" but "positive with such-and-such strength". This is probably one of the least mature tasks in the whole stack, and there is a lot of room for development here.
Finally, let me go over a few tasks that are still unsolved.
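To begin with, there is bringing user-generated text into canonical form. We know how to correct typos and how to stem, but when we try to combine all of this, the results are often terrible. For short texts we need approaches based on infinigrams, but there is no proper production implementation yet, so it is unclear whether they will work. Topic modeling of short texts is also difficult: the fewer the words, the harder it is to understand what the text is about.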
Another task that we have not discussed. Suppose we have worked out which topics a document relates to. But there are also users. The goal is to build their semantic profiles, and to combine semantics with sentiment: it is not enough to know that certain topics and certain emotions are present; we need to know which topics evoke which emotions.
It is interesting to study topic models over time: how they transform, how new topics emerge, how the vocabulary of existing topics changes. Deduplication works well on texts that contain plain copies, but it can break down on texts whose copies have been deliberately distorted, that is, in the anti-spam setting. In short, this is a huge field with many different solutions and even more unsolved problems. If you are interested in machine learning and in working on real practical tasks, you are welcome to join us.
Afterword
Recently, much attention in text analysis has been devoted to methods based on artificial neural networks. They have not achieved the overwhelming success seen in image analysis, mainly because of the models' low interpretability, which matters far more for text. Still, there have been successes. Let us look at a few popular approaches.
"Meaning vectors". A 2013 study from Google proposed using a two-layer neural network to predict a word from its context (the reverse variant, predicting the context from a word, appeared later). The main result was not the prediction itself but the vector representations obtained for the words. According to the authors, they carry information about the meaning of the word. In these vector representations one can find interesting examples of "algebraic operations" on words, for example "king − man + woman ≈ queen". In addition, vectorized words turned out to be a convenient form for passing text data to other machine learning algorithms, which largely ensured the popularity of the word2vec model.
One important limitation of approaches based on word meaning vectors is that they capture the meaning of a word, not of a document. For short texts an adequate "aggregated" meaning can be obtained by averaging the vectors of the words in the text, but for long texts this no longer works. Various modifications have been proposed to get around the limitation (sentence2vec, paragraph2vec, doc2vec), but they have not become as widespread as the base model.
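Averaging word vectors is simple enough to show in full. The lookup table of word vectors is assumed to come from a pretrained model such as word2vec; here it is just a hypothetical dict of short lists, and the function name is likewise made up for the example.

```python
def average_vector(tokens, word_vectors):
    """Component-wise mean of the vectors of all known tokens.

    word_vectors: {word: list of floats}, assumed to be pretrained.
    Returns None when no token has a vector.
    """
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

The mean treats every word equally, so in a long text the few words that define the topic are drowned out by the rest, which is one way to see why the approach stops working as texts grow.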
Recurrent neural networks. Many "classical" methods of working with text are based on the bag-of-words approach, which loses the information about word order in a sentence. For many tasks this is not very important (semantic analysis, for example), but in some it noticeably degrades the results (for example, sentiment analysis and machine translation). Approaches based on recurrent neural networks (RNNs) avoid this limitation. An RNN can take word order into account by considering, along with the current word, the output of the same network for the previous word (and sometimes, in the opposite direction, for the next word).
One of the most successful RNN architectures is built from LSTM (Long Short-Term Memory) blocks. Such blocks can remember a piece of information "for a long time" and, when a new signal arrives, produce an output that takes the stored information into account. The approach was gradually refined, and in 2014 the GRU (Gated Recurrent Units) model was proposed, which in many cases achieves the same (and sometimes better) quality with fewer parameters.
Treating a text as a sequence of words, each represented by a vector (usually a word2vec vector), has proved very successful for short-text classification, machine translation (the sequence-to-sequence approach) and chatbot development. On long texts, however, the memory of the recurrent blocks often runs out, and the network's "output" is determined mainly by the end of the text.
Generator networks. As with images, neural networks are used to generate new texts. So far the output of such networks is mostly for fun, but they improve every year. Progress in this area can be followed, for example, by looking at the Yandex.Autopoet system (developed in 2013) and listening to the albums of the groups Neuron Defense (2016) and Neurona (2017).
Networks based on N-grams and characters. Building neural network inputs from words involves a number of difficulties: there can be a great many words, and they can contain errors and typos, so the resulting vector representation of words is noisy. For this reason, approaches based on characters and/or N-grams (sequences of several, usually three, characters) have become increasingly popular in recent years.
For example, character-based recurrent networks (Char-RNN) are quite successful at generating both words (names, for instance) and sentences. Moreover, with enough data a network can be made not only to "learn" words and parts of speech but also to "memorize" the basic rules of declension and conjugation.
For short texts, the "bag of trigrams" approach gives good results in many tasks. A document is mapped to a sparse vector with 20 to 40 thousand dimensions (each possible trigram is assigned its own position), which is then processed by several dense layers, usually of gradually decreasing dimension. In this form the system is robust to many kinds of errors and typos and can successfully solve both classification and matching tasks (for example, in question-answering systems).
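The input side of such a model can be sketched as follows. Where the text assigns every possible trigram its own position, this example swaps in the hashing trick, mapping each trigram to a position with CRC32 modulo the vector size, so that no trigram dictionary has to be built; that substitution, the `#` boundary marker and the size of 30,000 are assumptions. The dense layers that would consume the vector are not shown.

```python
import zlib
from collections import Counter


def trigram_vector(text, dim=30000):
    """Sparse bag of character trigrams as {position: count}.

    Positions come from a CRC32 hash of the trigram modulo `dim`,
    so distinct trigrams may occasionally share a position.
    """
    padded = f"#{text.lower()}#"
    vector = Counter()
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vector[zlib.crc32(trigram.encode("utf-8")) % dim] += 1
    return dict(vector)
```

A typo changes only the few trigrams that overlap it: "language" and "langauge" still share the trigrams at both ends of the word, which is where the robustness to errors comes from.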
Signal-level networks. The exceptional role of neural networks in the analysis of raw signals should also be noted. In modern systems speech is, as a rule, recognized using recurrent networks. In handwriting analysis, convolutional networks are used to recognize individual characters, and recurrent networks to segment the characters in the stream.
Questions and answers
Question: Is there an Ldig tool for Russian?
Answer: As far as I know, there is none for Russian. It is a Python package, and its set of languages is very limited. It was developed at Cybozu Labs. The author seems to have switched to topic models, and language identification is apparently no longer of interest, so nobody is developing Ldig at the moment. We are trying to take some steps ourselves, but it all comes down to preparing a good labeled corpus. If we get results, we will probably publish them. For now, though, infinigrams and Ldig cover very few languages, unlike LangDetect, which has 90.
Question: Are there open tools for PLSA?
Answer: If the corpus is relatively small, there is the BigARTM library, which is being developed in Moscow under the guidance of Konstantin Vorontsov, the author of robust LDA. It is open source, can be downloaded, is written in C++, and is fast and parallel.
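There are several implementations built on distributed systems, such as Mr. LDA. Different packages have their own implementations: there is one in Spark and one in Vowpal Wabbit, and I believe there was something in Mahout as well. If you want to work with a corpus that fits in the memory of a single machine, you can use BigARTM or Python modules; as far as I know, there is an LDA implementation for Python.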
Question: Another question about PLSA. Are there convergence guarantees for the ML algorithm?
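Answer: There is a mathematical analysis of convergence, and there are guarantees. In practice I have never seen it fail to converge. More precisely, it may not converge exactly but oscillate around a distribution that describes what we observe more or less well. That is, the document distributions may start to oscillate while the dictionary has already settled. Usually we stop iterating once perplexity stops decreasing.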
Question: How is the presence of topics in a document determined?
Answer: Through an iterative process. We have estimates of the probability that a particular word in a particular document was brought in by a given topic. Based on them, we update the topic counters for the document, recompute everything, and obtain new values of the per-topic word counters for the document. In the end we obtain the distribution.
Question: Are deep learning models used to extract information from text?
Answer: Yes, they are, with one caveat. What is best known under the deep learning label is word2vec, doc2vec and sentence2vec. Strictly speaking, these are not really deep learning, but genuinely deep networks exist too, and people are trying to apply them. My own experience with such networks is mixed. It may well turn out that, for real practical problems with a lot of noise, the game is not worth the candle. But that is my personal opinion. People keep trying.
Question: Are there established open-source libraries for determining the topics and sentiment of a document?
Answer: I recommend BigARTM and Vorontsov's publications on it. Those who are in Moscow can probably also attend his seminar. That covers semantics. Sentiment is harder. In particular, there is SentiStrength, which can provide its source code under an academic license. As a rule, though, the main value in such tasks is not the code but the labeled corpus on which you can experiment and train. Without a corpus, the code is of no use: you then have to either use a ready-made, already trained model (such models exist) or build a corpus yourself.
Question: Which books on NLP would you recommend?
Answer: On topic models it makes sense to read Vorontsov's articles; they give a very good overview. On NLP in general there is the Handbook of Natural Language Processing. It is rather survey-like but covers almost every topic.
Question: What interesting NLP products or companies are there?
Answer: I have not studied the question, but there probably are some. Who uses these techniques in their work? Mostly search engines (Google and the like) and companies with large text corpora. I think Facebook is probably one of them.
Question: Is it realistic for a small team to build a competitive product?
Answer: It is. There are many unsolved questions. Even if you look at the solutions available now, especially in new areas, they are often technologically immature: they are built by research labs and students, are full of kludges, and are inefficient. If you take a good engineer and put them to work on optimizing a finished academic product, you can get something excellent. But academic expertise and strong engineering skills rarely go together.
Question: How does language constrain thinking?
Answer: Only when it cannot express a thought. If a language cannot express a thought, it is usually extended. Language is alive. Why did I name the evolution of topic models among the unsolved tasks? Because we often see new words appear for new social phenomena. Language is a communication tool: when it stops solving the problems of communication, it gets improved.