Hello, Habrahabr!
Given the popularity of the article on graph theory in Game of Thrones, I translated a tutorial by Erik Germani. Erik Germani is the one who extracted the social link graph from the first five books of the Song of Ice and Fire series that the above article was built on. This piece contains no detailed explanation of machine learning methods; instead, it shows how to use existing tools to find the speakers of dialogue in a text. Warning: wall of text ahead! Let's go.
This tutorial is aimed at machine learning beginners, which is exactly what I was when I started this project a year ago. (And who am I now? Still far from the smartest guy in the room.) We will work with George R. R. Martin's A Song of Ice and Fire. For this we will use CRF (conditional random fields) and the wonderful CRFsuite utility by Naoaki Okazaki. For text processing we will use Python 2.7 and NLTK (the Natural Language Toolkit).
I will try to explain everything in as much detail as I can. As I describe each step of what I did, I hope you will be able to pull out new tools and methods that are useful for your own projects. The code is explained at a beginner's level: for a reader who understands Python syntax and knows about list comprehensions, but not much more. If my code explanations drag for you, feel free to skip them.
Important: if you came here for the theory behind conditional random fields, this material is not for you. To me, CRFsuite is a beautiful black box that I poke with a stick. I spent a while trying to raise the model's accuracy, and you will see that those were misguided attempts. But so you don't get confused, keep in mind:
- Right out of the box, CRFsuite achieves good results (~75% accuracy).
- There will be no LaTeX here.
The game plan is simple. As with other machine learning algorithms, we need to prepare data for training and validation. Then we pick the features the algorithm will use for classification. After processing the text with those features, we feed the result to CRFsuite and celebrate a job well done. (Or take on the tedious work of checking the machine's guesses.)
Let's get started.
Downloading the text
First of all, you need to find a copy of the source text. Whether you pay hard-earned money for it is up to you.
If this is your first time doing natural language processing, you may underestimate how tricky sourcing a text can be. Every .txt file has an encoding that determines how its characters are represented. ASCII, the format I grew up reading Ocarina of Time walkthroughs in, has been supplanted by UTF-8, which can handle all the special characters. (ASCII can represent only 128 characters!) My copy of ASOIAF (A Song of Ice and Fire) was in UTF-8, which is slightly less convenient, but actually a bonus.
Let's load this text into NLTK and play with it. NLTK can do a great many things, and it's how I learned Python, so if it sounds interesting to you, check out the online book. For our purposes, we'll use the toolkit to tokenize the text: to split it into words and punctuation, as is commonly done in natural language processing projects.
import nltk
nltk.word_tokenize("NLTK is ready to go.")
['NLTK', 'is', 'ready', 'to', 'go', '.']
NLTK does not come with the saga preloaded; we'll have to load it ourselves.
I pasted the ASOIAF text files into a folder I created. The books are large: the published text comes to nearly 10 MB. That's not ideal for search-and-replace work. I split the text into books; a more thorough expert would split each book into chapters and number them in order.
But why complicate things for now? Once the text is in the folder, we can run:
corpus = nltk.corpus.PlaintextCorpusReader(r'corpus', 'George.*\.txt', encoding = 'utf-8')
Here r marks a raw string, in which backslashes are not treated as escape characters. It doesn't matter in this case. I access the corpus folder directly; if your folder lives somewhere less convenient, don't forget to adjust the path.
The second argument is a regular expression telling NLTK to pick up every file in the folder whose name contains "George" and has the .txt extension.
The encoding parameter is very important: if the encoding you specify doesn't match the text's, you'll get errors.
NLTK's corpus interface is very convenient and lets you pull information out of the text at different levels:
corpus.words("George RR Martin - 01 - A Game Of Thrones.txt")[-5:]
[u'the', u'music', u'of', u'dragons', u'.']
corpus.words()[0]
u'PROLOGUE'
corpus.sents()[1][:6]
[u'\u201c', u'We', u'should', u'start', u'back', u',\u201d']
These are the words of doomed Gared from the prologue of A Game of Thrones, and our first look at how Python represents Unicode characters. Every Unicode string starts with u, and you can see they contain special characters: \u201c is a left quotation mark, \u201d a right one. Remember I said UTF-8 is better? Here's why. Let's see what happens if we open the same file without specifying the encoding:
bad_corpus = nltk.corpus.PlaintextCorpusReader(r'corpus', '.*\.txt')
bad_corpus.sents()[1][:9]
['\xe2', '\x80\x9c', 'We', 'should', 'start', 'back', ',', '\xe2', '\x80\x9d']
Just as \u marks a Unicode string, \x marks hex bytes. NLTK has been handed three hex bytes (\xe2, \x80, \x9c) and tries to split on them; it clearly has no idea what to do with them.
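In fact those three bytes are just the UTF-8 encoding of a single left curly quote. A minimal sketch (Python 3 here; the article itself uses Python 2, where the literal would need a u prefix):

```python
# The bytes \xe2\x80\x9c are one UTF-8 code point, not three characters.
raw = b'\xe2\x80\x9c'
decoded = raw.decode('utf-8')
print(decoded == '\u201c')  # True: it is the left quotation mark
```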
Since we'll be dealing with paragraphs, let's look at one of them:
print corpus.paras()[1]
[[u'\u201c', u'We', u'should', u'start', u'back', u',\u201d', u'Gared', u'urged', u'as', u'the', u'woods', u'began', u'to', u'grow', u'dark', u'around', u'them', u'.'], [u'\u201c', u'The', u'wildlings', u'are', u'dead', u'.\u201d']]
Now you can see how NLTK structures the data: a sentence is a list of tokens, and a paragraph is a list of sentences. Simple!
Tags
Next we need to prepare data for training, and to do that we need to decide which labels to use. When an algorithm parses text, it recognizes that tokens belong to particular categories, and each category has its own label: JJ for adjectives, NN for nouns, IN for prepositions. The choice of labels plays an important role in how much we can trust the model. The Penn Treebank (a text-annotation project) defines 36 such labels.
What should our tags be? The simplest option is character names. But that fails for several reasons:
- ASOIAF contains over a thousand characters. That is far too many choices for our poor model. To have any hope of classifying correctly with unremarkable features, we need to trim the tag set as much as possible.
- Characters are referred to in different ways. Joffrey may be "Joffrey", "Joff", "Prince", or simply "he".
- If character names are used as labels, they must appear in the training data. Otherwise the model won't know they exist and will never predict them.
- All the characters sound the same. (I know this thanks to another machine learning experiment, in which I tried to tell characters apart by their vocabulary.) A few have catchphrases, like Varys's "grievous" or Hodor's "Hodor", but those are rare. Besides, many characters simply don't get enough time talking to anyone.
Identifying speakers by character name is quite tempting, but let's abandon that idea and think instead about the process that happens in a reader's head when solving the same problem.
Take the nearest book, open a random page, and work out who is speaking there. How do you do it? You look at the proper noun closest to the dialogue:
"Will saw them," Gared replied.
[...]
Ser Waymar Royce glanced at the sky with disinterest. "Night falls at the same time every day. Does the dark strip you of your courage, Gared?"
However, not every line of dialogue is marked like that. Look at this:
"Did you notice the position of the bodies?"
We look at the paragraphs above and below. Two paragraphs up:
"And the weapons?"
"A few swords and bows. One had an axe, a heavy double-bladed thing... a cruel piece of iron. It lay on the ground beside the man, right by his hand."
Not a hint of the speaker. And two paragraphs below:
è©ãããããŸãã ãäžäººã¯åŽã®è¿ãã«åº§ã£ãŠããã æ®ãã¯å°é¢ãäœãã«ãããŸãããã
"Or sleeping," Royce suggested.
Will knows he didn't say it, so he can rule himself out as the author of that line; and since many dialogues stretch across several paragraphs, we assume the author of the first line is Royce.
This scheme will help us mark up data for the model. We'll teach it to spot a name next to the text, and, if there is no name, to examine the nearby paragraphs. The tags will therefore be:
PS ±2, FN ±2, NN ±2, and OTHER.
PS stands for "post-speech": the name after the dialogue. A paragraph labeled PS -2 means the passage naming the speaker is two paragraphs up. FN is "first name": FN 1 means the first name of the next paragraph. NN is "nearest name": NN 0 means at least two names precede the dialogue, and we want the one nearest to it.
We also define ADR ±2 for the character addressed inside the dialogue text.
Markup
Now let's prepare the training data. Sublime Text helped me here. I opened the text of A Game of Thrones, selected a left quotation mark, chose Find -> Quick Find All, and pressed Home twice. Now there was a cursor near the start of every paragraph containing dialogue. Then I typed "{}": the text contains no curly braces, so I could use them later to leave notes for future use.
Using the regular expression (?<=\{)(?=\}) we can jump between the braces. If you haven't met this construct before, these are called lookbehind and lookahead assertions. The first parenthesized expression makes Sublime Text start a match right after an opening brace (escaped with a backslash); the second stops the match where a closing brace follows. As you can see, both expressions are built on the (?=) construct, and only the first contains <.
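The same zero-width match can be tried in Python's re module (a sketch; Sublime Text uses its own regex engine, but the lookaround syntax is identical):

```python
import re

line = '{} \u201cWe should start back,\u201d Gared urged.'
# Lookbehind demands "{" before the position, lookahead demands "}" after it,
# so the match is the empty position between the two braces.
match = re.search(r'(?<=\{)(?=\})', line)
print(match.start())  # 1
```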
Pressing F3 jumps between the brace pairs: that's the Find Next hotkey in Sublime Text on Windows. This kind of optimization matters when you're about to tag roughly a thousand dialogues. At least, that's what I told myself; it turned out not to be as hard or as time-consuming as I feared. (Though maybe I'm kidding myself, since I only finished a year later.)
Before we begin, one remark: think about whether to use positional labels (PS, FN, NN) or the name strings themselves. I've already said we won't use names as classes, but note also that positional labels tie the training data to one particular model. If you label Jon's dialogue "Jon" instead, you can later convert the labels to positional ones automatically, or reuse them for a different labeling scheme.
I don't think there is a single right answer. Last year I tagged by character name. That required extra preliminary work: if Eddard's name appears both two paragraphs up and one paragraph down, which do I choose? This directly affects the model's behavior, and doing it by hand makes the process even more error-prone. So I'm not sure what to advise. From the standpoint of character tags, writing the character's name is easier; from the standpoint of automation, positional tags are far more convenient.
Extracting features
So, we've tagged part of the text. I applaud your commitment to natural language processing. All we need now is to write a few functions that take a paragraph as an argument and mark it up with the features we're interested in.
Which features should we report? The workhorses responsible for the model's accuracy are these: whether PS, FN, or NN is present in the current paragraph or the adjacent ones.
Finding names
The first feature is finding proper names. We could do this with part-of-speech tagging:
sentence = corpus.paras()[33][0]
print " ".join(sentence)
print nltk.pos_tag(sentence)
â Such eloquence , Gared ,â Ser Waymar observed .
[(u'\u201c', 'NN'), (u'Such', 'JJ'), (u'eloquence', 'NN'), (u',', ','), (u'Gared', 'NNP'), (u',\u201d', 'NNP'), (u'Ser', 'NNP'), (u'Waymar', 'NNP'), (u'observed', 'VBD'), (u'.', '.')]
The NNP next to Ser and Waymar means they are proper nouns. But there are drawbacks:
- It makes mistakes. Notice that the closing quotation mark has become a proper noun.
- Part-of-speech tagging takes time:
%timeit nltk.pos_tag(sentence)
100 loops, best of 3: 8.93 ms per loop
asoiaf_sentence_count = 143669
(asoiaf_sentence_count * 19.2) / 1000 / 60
45.974079999999994
ASOIAF has a great many paragraphs to process, and 45+ minutes is long enough to conclude that POS tagging would bog down the whole test-and-refactor cycle. Of course, we could parse everything once and then keep querying the result. But that means dealing with yet another data structure, and the parse would have to be redone every time the source text changed. (And that is unavoidable!)
Fortunately, we don't need part-of-speech tags to identify character names. This is one of the advantages of choosing ASOIAF for analysis: there is already a mass of collected data about it. Let's take some of it.
Existing information
Our thanks go to A Wiki of Ice and Fire. By literally copying the page containing the list of character names, I got a nearly exhaustive list of them. You can find the result here. If that's enough for you, skip ahead to the next section. For those curious how to extract data from pages automatically, here are a few methods you can reuse in other projects.
Wget
A very simple, excellent utility for when you need to walk a set of known links. You don't have to think about how to traverse them; just create a file containing the list and pass it with the -i flag:
wget -i list_of_links.txt
Requests
Python has the requests library, well suited to working with individual pages:
import requests
r = requests.get("http://awoiaf.westeros.org/index.php/List_of_characters")
html = r.text
print html[:100]
<!DOCTYPE html> <html lang="en" dir="ltr" class="client-nojs"> <head> <meta charset="UTF-8" /> <title
Parsing
Having downloaded the html, we need to strip the page of unneeded tags to get at the links. BeautifulSoup is an HTML parser that fetches links without any fuss. After installing it and parsing the page, you can find all the links just by running:
parsed_html.find_all("a")
You can read more about it here.
I want to describe another method, using the lxml library. With it you can work with XPath. If this is your first encounter with XPath: it is a powerful way of navigating tree structures.
import lxml.html
tree = lxml.html.fromstring(html)
character_names = tree.xpath("//ul/li/a[1]/@title")
print character_names[:5]
['Abelar Hightower', 'Addam', 'Addam Frey', 'Addam Marbrand', 'Addam Osgrey']
Breaking down the XPath expression above:
tree.xpath("//ul      # every unordered list in the document
            /li       # each list item inside it
            /a[1]     # the first link in the item
            /@title   # and that link's title attribute
           ")
Next we need to pick the names out of the result and drop the hits that aren't really names. Just glancing at the ASOIAF page, you'll notice elements like "of Myr". We don't want the model matching the particle "of" against the dialogue.
NLTK helps here. It has a corpus of "bad" words, stopwords: words so common that featurizing text with them is pointless.
particles = ' '.join(character_names).split(" ")
print len(set(particles))
stopwords = nltk.corpus.stopwords.words('english')
print stopwords[:5]
particles = set(particles) - set(stopwords)
print len(particles)
# Kings like Aegon I keep their Roman numeral: the stopword list is
# lowercased, so the capital "I" survives the subtraction.
"I" in particles
2167
['i', 'me', 'my', 'myself', 'we']
2146
True
Finally, we need to add the nicknames the list overlooks: Ned, Blackfish, Joff, and so on. Once satisfied with the list of names, save it to a file for future use.
Finding names. Part 2
Having abandoned the idea of finding names via part-of-speech tags, and now armed with a list of names, we'll extract runs of tokens and check whether they appear in that list. At last, some code:
import itertools
from operator import itemgetter

particles = [particle.rstrip('\n') for particle in open('asoiaf_name_particles.txt')]
tokens = [u'\u201c', u'Such', u'eloquence', u',', u'Gared', u',\u201d',
          u'Ser', u'Waymar', u'observed', u'.']

def roll_call(tokens, particles):
    speakers = {}
    particle_indices = [i for (i, w) in enumerate(tokens) if w in particles]
    for k, g in itertools.groupby(enumerate(particle_indices), lambda (i, x): i - x):
        index_run = map(itemgetter(1), g)
        speaker_name = ' '.join(tokens[i] for i in index_run)
        speakers[min(index_run)] = speaker_name
    return speakers
This function uses a lambda expression, something I couldn't manage when I did this project last year. The script I used back then was horribly unreadable, and I never wanted to publish it. Besides, I think beginners can learn something new from this one, so let me explain it in a bit more detail.
Itertools deserves attention. I've used it for things like flattening nested lists. Here we need its groupby function, which I much prefer to the dropwhile and takewhile I used to chain together before.
When writing it, I decided the roll_call function should know the positions of the names it finds, so it keeps all the name indices. You can see that in the third line of the function:
particle_indices = [i for (i, w) in enumerate(tokens) if w in particles]
enumerate helped me a lot when I was getting to know Python. It takes a list and, for each element, returns a running index together with the element itself.
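A quick illustration of what enumerate yields:

```python
# enumerate pairs each element with its running index.
tokens = ['Ser', 'Waymar', 'observed']
print(list(enumerate(tokens)))  # [(0, 'Ser'), (1, 'Waymar'), (2, 'observed')]
```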
The fourth line is the hardest piece of code in this whole writeup, and I didn't write it: it comes straight from the library documentation:
for k, g in itertools.groupby(enumerate(particle_indices), lambda (i, x): i - x):
groupby walks through a list and groups elements according to the result of a lambda function. A lambda is an anonymous function: unlike roll_call, it doesn't need to be defined in advance. It lives right inside the code, taking arguments and returning values. In this case it simply subtracts the value from its running index.
Let's see how that works:
print tokens
particle_indices = [i for (i, w) in enumerate(tokens) if w in particles]
print particle_indices
for index, location in enumerate(particle_indices):
    lambda_function = index - location
    print "{} - {} = {}".format(index, location, lambda_function)
[u'\u201c', u'Such', u'eloquence', u',', u'Gared', u',\u201d', u'Ser', u'Waymar', u'observed', u'.']
[4, 6, 7]
0 - 4 = -4
1 - 6 = -5
2 - 7 = -5
This is the groupby trick: the running indices are consecutive, so as we move through consecutive items in the list, the lambda's result stays the same.
groupby sees -4 and puts token index 4 into its own group. The 6th and 7th elements both produce -5, so they are grouped together.
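The same consecutive-run grouping can be sketched in Python 3, where tuple-unpacking lambdas no longer exist, so the pair is indexed explicitly:

```python
import itertools

# Consecutive indices share the same index-minus-value key,
# so groupby splits [4, 6, 7] into the runs [4] and [6, 7].
particle_indices = [4, 6, 7]
runs = [[x for _, x in group]
        for _, group in itertools.groupby(enumerate(particle_indices),
                                          lambda pair: pair[0] - pair[1])]
print(runs)  # [[4], [6, 7]]
```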
Now we know where the multi-token names are, and we need to use that. What does groupby return? The key (the lambda's result) and the group itself: a grouper object. We then use the map function to apply itemgetter(1), which pulls an element out of a tuple, to every element of the group. That recovers the runs of name tokens from the original token list.
After groupby, all that remains is to pull out the names we found and store them in the speakers dictionary:
roll_call(tokens, particles)
{4: u'Gared', 6: u'Ser Waymar'}
Optimization
Let's compare the speed of this function with the part-of-speech approach:
%timeit roll_call(tokens, particles)
100 loops, best of 3: 3.85 ms per loop
Not bad: five or six times faster. But we can improve the result by using a set. A set checks almost instantly whether an item belongs to it, unlike a list.
set_of_particles = set(particle.rstrip('\n') for particle in open('asoiaf_name_particles.txt'))
%timeit roll_call(tokens, set_of_particles)
10000 loops, best of 3: 22.6 µs per loop
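The gap comes from how membership tests work: a list is scanned linearly, while a set does a single hash lookup. A small sketch with the standard timeit module (sizes here are arbitrary):

```python
import timeit

haystack_list = list(range(10000))
haystack_set = set(haystack_list)

# A list membership test scans the list; a set does one hash lookup.
list_time = timeit.timeit(lambda: 9999 in haystack_list, number=1000)
set_time = timeit.timeit(lambda: 9999 in haystack_set, number=1000)
print(set_time < list_time)  # True
```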
When the units turn into Greek letters, you know you're doing something right.
Finding speakers
Now we need to write a program that calls the function above at the right places: to find the first name before the dialogue text, inside it, and after it. We'll wrap all of this in a class that, given our complete list of character names, extracts the features and hands them on to be fed to CRFsuite.
But first, let's get the data in order.
XML parser
After the success of the one-line XPath command, I decided to write an XML parser for the text files. There were plenty of reasons to choose this format. ASOIAF is built out of paragraphs; the chapters contain a great deal of dialogue-bearing text, all of which has to be marked up carefully. If I hadn't converted the text to XML (and at first I didn't), the labels would have littered the text itself.
I'd prefer to pass over the script below in silence: it reminds me of my first steps in Python, with its enormous functions, wastefulness, and long variable names.
from lxml import etree
import codecs
import re

def ASOIAFtoXML(input):
    # input is a list of dicts: {"title": ..., "contents": path to the text file}.
    root = etree.Element("root")
    for item in input:
        title = item["title"]
        current_book = etree.Element("book", title=item["title"])
        root.append(current_book)
        with codecs.open(item["contents"], "r", encoding="utf-8") as book_file:
            # A dummy chapter, in case text appears before the first chapter title.
            current_chapter = etree.Element("chapter", title="Debug")
            for paragraph in book_file:
                paragraph = paragraph.strip()
                if paragraph != "":
                    title_match = re.match("\A[A-Z\W ]+\Z", paragraph)
                    if title_match:
                        current_chapter = etree.Element("chapter", title=title_match.group())
                        current_book.append(current_chapter)
                    else:
                        current_graf = etree.SubElement(current_chapter, "paragraph")
                        speaker_name = None
                        while paragraph != "":
                            current_dialogue = current_graf.xpath('./dialogue[last()]')
                            speaker_match = re.search("(\{(.*?)\} )", paragraph)
                            if speaker_match:
                                speaker_tag = speaker_match.group(1)
                                speaker_name = speaker_match.group(2)
                                paragraph = paragraph.replace(speaker_tag, "")
                            open_quote = paragraph.find(u"\u201c")
                            if open_quote == -1:
                                if current_dialogue:
                                    current_dialogue[0].tail = paragraph
                                else:
                                    current_graf.text = paragraph
                                paragraph = ""
                            elif open_quote == 0:
                                current_dialogue = etree.SubElement(current_graf, "dialogue")
                                if speaker_name:
                                    current_dialogue.attrib["speaker"] = speaker_name
                                close_quote = paragraph.find(u"\u201d") + 1
                                if close_quote == 0:
                                    # find returns -1 when there is no closing quote,
                                    # so close_quote would be 0. The dialogue must
                                    # then run to the end of the paragraph.
                                    close_quote = len(paragraph)
                                current_dialogue.text = paragraph[open_quote: close_quote]
                                paragraph = paragraph[close_quote:]
                            else:
                                if current_dialogue:
                                    current_dialogue[0].tail = paragraph[:open_quote]
                                else:
                                    current_graf.text = paragraph[:open_quote]
                                paragraph = paragraph[open_quote:]
    return root

tree = ASOIAFtoXML([{"title": "AGOT", "contents": "corpus/train_asoiaf_tagged.txt"}])

# To save the tree to disk:
# et = etree.ElementTree(tree)
# et.write(codecs.open("asoiaf.xml", "w", encoding="utf-8"), pretty_print=True)
The essence of the code above: it builds a tree with lxml, reading the text one line at a time. If a line is recognized as a chapter title (capital letters, punctuation, spaces), a new chapter is appended to the current book. Otherwise the line is a paragraph; another regular expression reads through it, decides whether dialogue is being spoken, and appends each stretch of dialogue to the paragraph's node, along with the speaker attribute, which of course must already have been tagged.
A curious note about XML: it is a hierarchical structure, and by its nature it demands a strict partition of the document from top to bottom. But prose isn't like that: in prose, dialogue lives inside the text. lxml offers a solution: text and tail. An XML node stores its text, and that text is interrupted when the next child node is appended.
markup = '''<paragraph>Worse and worse, Catelyn thought in despair. My brother is a fool. Unbidden, unwanted, tears filled her eyes. <dialogue speaker="Catelyn Stark"> âIf this was an escape,â</dialogue> she said softly, <dialogue speaker="Catelyn Stark">âand not an exchange of hostages, why should the Lannisters give my daughters to Brienne?â</dialogue></paragraph>''' graf = lxml.etree.fromstring(markup) print graf.text
Worse and worse, Catelyn thought in despair. My brother is a fool. Unbidden, unwanted, tears filled her eyes.
print graf[0].text
âIf this was an escape,â
And what happens to the remainder, "she said softly,"? It is saved in the node's tail:
print graf[0].tail
she said softly,
And so on: the remaining text attaches to each dialogue node in turn.
As a result, finding the dialogue's author becomes much simpler whenever we need it. And now, the features!
class feature_extractor_simple:
    """Analyze dialogue features of a paragraph.
    Paragraph should be an lxml node."""
    def __init__(self, paragraph_node, particles, tag_distance=0):
        self.paragraph = paragraph_node
        self.particles = set(particles)
        self.tag_distance = tag_distance
        self.raw = ''.join(t for t in self.paragraph.itertext())
        self.tokens = self.tokenize(self.raw)

    def tokenize(self, string):
        return nltk.wordpunct_tokenize(string)

    def find_speakers(self, tokens):
        speakers = {}
        particle_indices = [i for (i, w) in enumerate(tokens) if w in self.particles]
        for k, g in itertools.groupby(enumerate(particle_indices), lambda (i, x): i - x):
            index_run = map(itemgetter(1), g)
            speaker_name = ' '.join(tokens[i] for i in index_run)
            speakers[min(index_run)] = speaker_name
        return speakers

    def pre_speak(self, prior_tag="FN", near_tag="NN"):
        # Names that appear before the first line of dialogue.
        features = {}
        if self.paragraph.text is not None:
            speakers = self.find_speakers(self.tokenize(self.paragraph.text))
            if len(speakers) > 0:
                features.update({"{} {}".format(prior_tag, self.tag_distance): speakers.values()[0]})
            if len(speakers) > 1:
                features.update({"{} {}".format(near_tag, self.tag_distance): speakers[max(speakers.keys())]})
        return features

    def dur_speak(self, tag="ADR"):
        # Names addressed inside the dialogue itself.
        features = {}
        for dialogue in self.paragraph.itertext("dialogue", with_tail=False):
            tokens = self.tokenize(dialogue)
            named = self.find_speakers(tokens)
            addressed = {k: v for (k, v) in named.items()
                         if tokens[k-1] == "," or tokens[k + 1 + v.count(" ")].startswith(",")}
            if len(addressed) > 0:
                features.update({"{} {}".format(tag, self.tag_distance): addressed[max(addressed.keys())]})
        return features

    def post_speak(self, tag="PS"):
        features = {}
        # The first name right after a line of dialogue.
        tails = [line.tail for line in self.paragraph.iterfind("dialogue") if line.tail is not None]
        for tail in tails:
            tokens = self.tokenize(tail)
            speakers = {k: v for (k, v) in self.find_speakers(tokens).items() if k <= 1}
            if len(speakers) > 0:
                features.update({"{} {}".format(tag, self.tag_distance): speakers[min(speakers.keys())]})
                break
        return features
A few words about these functions.
If this is your first time with Python, don't be afraid of classes. Just write ordinary functions and pass self as an argument: it tells Python which object the function is currently working on. A class is something like a clone factory: the objects are the clones, and all clones share the same DNA, its methods and variables, though like people their personalities differ with experience. In this context, the experience is the data passed in.
A class has a special function, __init__, which initializes the object's variables.
After that you can relax. The data rests in the class's careful hands, and since you've abstracted its behavior away, you only have to prod the object to get the processed information back.
paragraph = tree.xpath(".//paragraph")[32]
example_extractor = feature_extractor_simple(paragraph, particles)
print example_extractor.raw
print example_extractor.pre_speak()
print example_extractor.dur_speak()
print example_extractor.post_speak()
âSuch eloquence, Gared,â Ser Waymar observed. âI never suspected you had it in you.â
{}
{'ADR 0': u'Gared'}
{'PS 0': 'Ser Waymar'}
If you're puzzled by how some of these features work, I'll describe them briefly below. If everything above was clear and you already know what to do next, see you in the next section.
There is one subtlety in handling dictionaries: in Python they are unordered. It reminds me of the feeling of leaving the house believing the keys are in your pocket without having checked. Whenever we need the first or the last name found, we have to check explicitly, looking at the key values and picking the min/max.
pre_speak
As I said above, the text attribute contains all the text up to the first line of dialogue. We just find the character names in it.
dur_speak
If names occur inside the body of a dialogue that spans several lines, we need to check them all:
for dialogue in self.paragraph.itertext("dialogue", with_tail=False)
lxml's itertext feature lets us pull all the text out of a node. By setting the flag with_tail=False, we restrict the search to nodes without their tails, which means dialogue text only.
To find the characters being addressed, we select only the names set off by commas. That catches appeals like "Ned, promise me." / "Promise me, Ned."
My gut tells me that a name addressed in dialogue is very likely to answer in the next paragraph, so I overwrite earlier finds with the last name mentioned.
post_speak
This function needs only the first name after the dialogue, so it breaks out of the loop as soon as it finds one.
The function examines only the first two tokens after the closing quotation mark. That way it catches dialogue like:
"I don't know," Jon said.
A tip for beginner programmers: you can call a function while building a list comprehension:
tails = [line.tail for line in self.paragraph.iterfind("dialogue") if line.tail is not None]
This fetches the dialogue tails in a single line. (And by adding a condition, we discard all results without tails!)
CRFsuite
This is probably the part you find most interesting, and yet all we do is launch a command-line tool wrapping conditional random fields, with no way to peek at how it works inside.
But in fact CRFsuite is simple, and that is the beauty of it. While writing this material I discovered it has a Python library, but let's not complicate things: we'll use the executable from the command line.
(I plan to update the model when the next book, The Winds of Winter, sees the light of day, but there are still years before that happens!) All CRFsuite needs is text with tab-separated features, like this:
FN 0	Graf Sent Len=4	FN 1=True	FN -2=True	FN 0=True	NN 1=True
This is the training data format. The first field is the correct answer; everything after it is a feature. They may look alike, but don't use colons in feature names: a colon introduces a feature weight, so it can be misinterpreted.
Open a command line where crfsuite.exe lives and type:
crfsuite learn -m asoiaf.model train.txt
This creates the model, our shared brainchild. You can call it whatever you like; I called mine asoiaf. To check the model's accuracy, type:
crfsuite tag -qt -m asoiaf.model test.txt
To actually run the model for tagging, type:
crfsuite tag -m asoiaf.model untagged.txt
untagged.txt should look like train.txt, but without the correct-answer field at the start; that is, something like this:
NN -1=True	FN 0=True	FN 2=True	FN -1=True	NN 0=True
You can check the details here.
There are many features that could improve the model's accuracy. Let's start with the simplest: boolean values recording where the positional labels occur within the paragraph and its neighbors.
At the risk of repeating myself: here is the feature-extraction class again, now with several new methods added.
class feature_extractor:
    """Analyze dialogue features of a paragraph.
    Paragraph should be an lxml node."""
    def __init__(self, paragraph_node, particles, tag_distance=0):
        self.paragraph = paragraph_node
        self.particles = set(particles)
        self.tag_distance = tag_distance
        self.raw = ''.join(t for t in self.paragraph.itertext())
        self.tokens = self.tokenize(self.raw)
        self.speaker = self.xpath_find_speaker()

    def features(self):
        features = {}
        features.update(self.pre_speak())
        features.update(self.dur_speak())
        features.update(self.post_speak())
        return features

    def local_features(self):
        # Features computed from the paragraph itself.
        features = []
        if self.tokens.count(u"\u201c") == 0:
            features.append("NoQuotes=True")
        prior = self.paragraph.getprevious()
        try:
            last_dialogue = list(prior.itertext("dialogue", with_tail=False))[-1].lower()
            hits = [w for w in ['who', 'you', 'name', '?'] if w in last_dialogue]
            if len(hits) > 2:
                features.append("Who Are You?=True:10.0")
        except (AttributeError, IndexError):
            pass
        try:
            dialogue = list(self.paragraph.itertext("dialogue", with_tail=False))[0].lower()
            for token in ['name', 'i am', u'i\u2019m']:
                if token in dialogue:
                    features.append("My Name=True:10.0")
                    break
        except (AttributeError, IndexError):
            pass
        if self.tokens[0] in self.particles:
            features.append("FirstSpeakerIndex0=True")
        if self.paragraph.text is not None:
            name_precount = len(self.find_speakers(self.tokenize(self.paragraph.text)))
            if name_precount > 2:
                features.append("Many Names Before=True")
            conjunctions = set([w.lower() for w in self.tokenize(self.paragraph.text)]).intersection(
                set(['and', 'but', 'while', 'then']))
            if len(conjunctions) > 0 and self.paragraph.find("dialogue") is not None:
                features.append("Conjunction in Head=True")
        short_threshold = 10
        if len(self.tokens) <= short_threshold:
            features.append("Short Graf=True")
        dialogue_length = sum(map(len, self.paragraph.xpath(".//dialogue/text()")))
        dialogue_ratio = float(dialogue_length) / len(self.raw)
        if dialogue_ratio == 1:
            features.append("All Talk=True")
        elif dialogue_ratio >= 0.7:
            features.append("Mostly Talk=True")
        elif dialogue_ratio < 0.3 and not self.tokens < short_threshold:
            # (In Python 2 a list never compares less than an int,
            # so the second clause is always True.)
            features.append("Little Talk=True")
        return features

    def feature_booleans(self):
        bool_features = []
        for tag in ["PS", "FN", "NN", "ADR", ]:
            label = "{} {}".format(tag, self.tag_distance)
            if label in self.features().keys():
                bool_features.append("{}=True".format(label))
            else:
                bool_features.append("{}=False".format(label))
        return bool_features

    def tokenize(self, string):
        return nltk.wordpunct_tokenize(string)

    def find_speakers(self, tokens):
        speakers = {}
        particle_indices = [i for (i, w) in enumerate(tokens) if w in self.particles]
        for k, g in itertools.groupby(enumerate(particle_indices), lambda (i, x): i - x):
            index_run = map(itemgetter(1), g)
            speaker_name = ' '.join(tokens[i] for i in index_run)
            speakers[min(index_run)] = speaker_name
        return speakers

    def xpath_find_speaker(self):
        speakers = self.paragraph.xpath(".//@speaker")
        if speakers == []:
            return "NULL"
        else:
            return speakers[0]

    def pre_speak(self, prior_tag="FN", near_tag="NN"):
        # Names before the first line of dialogue.
        features = {}
        if self.paragraph.text is not None:
            speakers = self.find_speakers(self.tokenize(self.paragraph.text))
            if len(speakers) > 0:
                features.update({"{} {}".format(prior_tag, self.tag_distance): speakers.values()[0]})
            if len(speakers) > 1:
                features.update({"{} {}".format(near_tag, self.tag_distance): speakers[max(speakers.keys())]})
        return features

    def dur_speak(self, tag="ADR"):
        # Names addressed inside the dialogue.
        features = {}
        for dialogue in self.paragraph.itertext("dialogue", with_tail=False):
            tokens = self.tokenize(dialogue)
            named = self.find_speakers(tokens)
            addressed = {k: v for (k, v) in named.items()
                         if tokens[k-1] == "," or tokens[k + 1 + v.count(" ")].startswith(",")}
            if len(addressed) > 0:
                features.update({"{} {}".format(tag, self.tag_distance): addressed[max(addressed.keys())]})
        return features

    def post_speak(self, tag="PS"):
        features = {}
        # The first name right after a line of dialogue.
        tails = [line.tail for line in self.paragraph.iterfind("dialogue") if line.tail is not None]
        for tail in tails:
            tokens = self.tokenize(tail)
            speakers = {k: v for (k, v) in self.find_speakers(tokens).items() if k <= 1}
            if len(speakers) > 0:
                features.update({"{} {}".format(tag, self.tag_distance): speakers[min(speakers.keys())]})
                break
        return features

paragraph = tree.xpath(".//paragraph")[-1]
example_extractor = feature_extractor(paragraph, particles)
print example_extractor.raw
print example_extractor.features()
print example_extractor.local_features()
print example_extractor.feature_booleans()
And in their hands, the daggers.
{}
['NoQuotes=True', 'Short Graf=True', 'Little Talk=True']
['PS 0=False', 'FN 0=False', 'NN 0=False', 'ADR 0=False']
One late night, in a fit of undocumented machine-learning madness, I tried to improve many of the features. Below is the publishable part of that drudgery.
Option 1: true positional booleans only
Label    Count    Recall
PS 0     207      0.9949
FN 0     185      0.95
NULL     118      0.3492
OTHER    56       0.3939
PS -2    44       0.5238
Item accuracy: 430 / 678 (0.6342)
You will meet many statistics like these below, so let's work out what they mean.
Imagine we sit people-watching and ask a die-hard believer in conspiracy theories to judge whether each random passerby is Illuminati. He happily starts tagging passersby as Illuminati, by criteria we won't go into here.
Precision reflects type I errors: in other words, how often our friend mistakenly counts ordinary people among the Illuminati.
Recall measures how many of the validation-data labels the model identified correctly.
F1 combines the two. You can see that classifying everyone as Illuminati guarantees maximum recall and terrible precision.
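For concreteness, here is a toy per-label precision/recall/F1 calculation; the gold and predicted tag lists are made up for the example:

```python
# Toy gold labels vs. model predictions for five paragraphs.
gold = ["PS 0", "NULL", "FN 0", "PS 0", "NULL"]
pred = ["PS 0", "PS 0", "FN 0", "PS 0", "NULL"]
label = "PS 0"

tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)  # true positives
fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)  # false positives
fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)  # false negatives

precision = tp / float(tp + fp)   # 2/3: one NULL was wrongly tagged PS 0
recall = tp / float(tp + fn)      # 1.0: every true PS 0 was found
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```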
Because our validation set is fully labeled, we aren't much interested in the model's raw accuracy alone; we need recall and precision.
The first version of the features used only true boolean values. That is, for the paragraphs above, every feature set was of the form "ADR 0=True" and "PS 0=True", and the accuracy (item accuracy) was 63.4%.
Is 63.4% any good? Considering that NULL, PS 0, and FN 0 make up three quarters of the test data and come naturally, we can surely do better. Next, let's add the false values of the remaining positional booleans.
Option 2: all positional booleans
Label    Count    Recall
NULL     254      0.9048
PS 0     204      0.9899
FN 0     149      0.975
OTHER    24       0.2273
PS -2    19       0.2857
Item accuracy: 515 / 678 (0.7596)
Now the model nails the simple cases, and we get decent accuracy. 75% means that just by tagging the first book, A Game of Thrones, plus a third of A Clash of Kings, we get a model that can tag the remaining three quarters by itself. It did take some hours of work, naturally.
Still, there's no reason we shouldn't pin down the NULL tag with 98%+ recall, so let's add features aimed at it.
Option 3: quotes!
Label    Count    Recall
PS 0     218      0.9907
NULL     180      0.9119
FN 0     167      0.9118
OTHER    63       0.3784
PS 2     25       0.5
Item accuracy: 550 / 710 (0.7746)
We count the number of opening quotation marks in the paragraph.
I must say I'm surprised NULL didn't become more accurate; we'll have to work on that. I'd also like to improve FN 0.
Option 4: head index!
Label    Count    Recall
PS 0     218      0.9907
NULL     183      0.9057
FN 0     157      0.8971
OTHER    68       0.4189
PS -2    23       0.5484
Item accuracy: 551 / 710 (0.7761)
This feature set adds the index of the first name in the paragraph.
Hmm... that's probably too complicated; let's go back to booleans.
Option 5: Name at index 0 (+ redundancy)
Label   Count   Recall
PS 0    216     0.986
FN 0    166     0.9265
NULL    160     1
OTHER    85     0.5811
PS 2     32     0.7143

Item accuracy: 578 / 710 (0.8141)
Incidentally: I hadn't been counting opening quotation marks correctly, which spoiled the earlier results.
With that fixed, NULL is now determined perfectly — but the easy ways to improve the model are gone. Now I really have to get inventive to push the results any further. Let's see whether this works...
Option 6: A speaker within two paragraphs (PS ±2)
If a speaker appears within two paragraphs above or below, we use a boolean here. In theory, this should lift the PS -2 results.
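A sketch of that boolean, assuming each paragraph record carries an optional speaker field (the record layout is invented for illustration):

```python
def speaker_nearby(paragraphs, index, window=2):
    """True if any paragraph within `window` of `index` names a speaker."""
    lo = max(0, index - window)
    hi = min(len(paragraphs), index + window + 1)
    return any(p["speaker"] is not None
               for i, p in enumerate(paragraphs[lo:hi], lo) if i != index)

paras = [{"speaker": "Arya"}, {"speaker": None}, {"speaker": None},
         {"speaker": None}, {"speaker": None}]
print(speaker_nearby(paras, 2))  # "Arya" speaks two paragraphs above
print(speaker_nearby(paras, 4))  # no speaker within two paragraphs
```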
Label   Count   Recall
PS 0    216     0.986
FN 0    166     0.9265
NULL    160     1
OTHER    84     0.5676
PS 2     32     0.7143

Item accuracy: 578 / 710 (0.8141)
No effect!
Option 7: Sequences??
Label   Count   Recall
PS 0    217     0.986
FN 0    168     0.9265
NULL    160     1
OTHER    82     0.5541
PS 2     30     0.6429

Item accuracy: 576 / 710 (0.8113)
Instance accuracy: 56 / 142 (0.3944)
Wait! It turns out CRFs can handle sequences — in fact, that's the whole point of them. Ignore the instance accuracy value up to this point: it was always 0/1, meaning the model treated the entire text as one long dialogue.
Sorry, I need to collect myself. Supposing this improves accuracy — an open question — how do I use the capability? I tried marking every five paragraphs as a sequence, but that didn't feel right.
Instead: whenever two consecutive NULLs occur, assume the conversation has ended, and treat that stretch as a sequence.
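The cutting rule can be sketched like this, with paragraphs reduced to bare labels for brevity:

```python
def split_conversations(labels):
    """Cut the label stream into sequences after two consecutive NULLs."""
    sequences, current = [], []
    for label in labels:
        current.append(label)
        if len(current) >= 2 and current[-1] == current[-2] == "NULL":
            sequences.append(current)
            current = []
    if current:
        sequences.append(current)
    return sequences

labels = ["PS 0", "NULL", "PS 0", "NULL", "NULL", "FN 0", "PS 0"]
print(split_conversations(labels))
```

Each resulting sequence would then be one CRFsuite instance instead of feeding the whole book as a single instance.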
Having gotten this far, I still couldn't build a model that operates on conversations. As I understand it, that needs many separate transition weights tied to position within the sequence, so that the model makes different decisions at the start, middle, and end of a conversation depending on where we are.
But nothing in the model's behavior suggests any of that is happening. I'll play with other features in the near future; for now, let's look at the script that generates the training and test data. It is not optimized: it computes each paragraph's features five times. I'll leave it that way here, but note that it could be sped up with one pass that caches each paragraph's boolean features and another that appends them to the neighboring paragraphs.
import codecs

tree = ASOIAFtoXML([{"title": "ASOIAF",
                     "contents": "corpus/train_asoiaf_pos_tagged.txt"}])
paragraphs = tree.xpath(".//paragraph")

def prep_test_data(paragraphs):
    max_index = len(paragraphs)
    results = []
    for index, paragraph in enumerate(paragraphs):
        extractor = feature_extractor(paragraph, set_of_particles)
        all_features = extractor.local_features() + extractor.feature_booleans()
        # Pull in the boolean features of the two paragraphs on either side.
        for n in [-2, -1, 1, 2]:
            if 0 <= n + index < max_index:
                neighbor_features = feature_extractor(paragraphs[index + n],
                                                      set_of_particles,
                                                      tag_distance=n).feature_booleans()
                if neighbor_features:
                    all_features += neighbor_features
        all_features.insert(0, extractor.speaker)
        results.append("\t".join(all_features))
    return results

results = prep_test_data(paragraphs)

max_index = len(results)
with codecs.open(r"new_test.txt", "w", "utf-8") as output:
    for line in results[:int(max_index / 2)]:
        output.write(line + '\n')
with codecs.open(r"new_train.txt", "w", "utf-8") as output:
    for line in results[int(max_index / 2):]:
        output.write(line + '\n')
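The speed-up would look roughly like this — a hypothetical, self-contained sketch, not the article's code: cache each paragraph's boolean features once, then assemble every row from the cache. The `@offset` suffix is an invented naming scheme for neighbor features.

```python
def assemble_features(boolean_features, offsets=(-2, -1, 1, 2)):
    """Build each paragraph's row from cached booleans: one pass to cache,
    one pass to assemble, instead of recomputing everything five times."""
    n = len(boolean_features)
    rows = []
    for i, own in enumerate(boolean_features):
        row = list(own)
        for offset in offsets:
            j = i + offset
            if 0 <= j < n:
                # Tag the cached neighbor feature with its relative position.
                row += ["%s@%d" % (f, offset) for f in boolean_features[j]]
        rows.append(row)
    return rows

cached = [["PS 0=True"], ["NULL=True"], ["FN 0=True"]]
print(assemble_features(cached)[1])
```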
Other features
I tried a few other features:
- Counting the number of names before the first line of dialogue. In theory this is where NN appears most often. No result.
- A flag for whether the paragraph is entirely or only partly dialogue. This might have helped with PS -2 and FN -2, but the difference turned out to be negligible.
- Short versus long paragraphs, and a few other odds and ends.
- Properties of the text before the dialogue (an attempt to target the neglected NN 0).
I thought the last one was a rather clever move, but while it worked, it never got accuracy past 81%.
When I swapped the training and validation data, I got 84%. But you shouldn't spend a lot of time tuning features against specific data: that leads to overfitting. Mixing the training and test data together, on the other hand, is a good idea. I hadn't mixed them because I thought it would destroy the sequences — but since I'm not using sequences, why not? Let's mix them.
Lightly shuffled data
I got 82%.
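Mixing before splitting takes only a few lines (a sketch; `rows` stands in for the tab-separated feature lines the script above produces):

```python
import random

def shuffled_split(rows, train_fraction=0.5, seed=42):
    """Shuffle labeled rows, then split them into two halves."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

rows = ["row%d" % i for i in range(10)]
test_half, train_half = shuffled_split(rows)
print(len(test_half))
print(len(train_half))
```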
Oh well! I think I've hit the ceiling of my skills.
Where to go from here?
Let me sum up what could be done next:
- Label more data. So far I've labeled 700 paragraphs out of roughly 40,000 — about 1.7% — for 80% accuracy (80% with shuffled data, 75% without). Would 10,000 labeled paragraphs help? Rare labels such as ADR, which barely occur among the 700, certainly stand to gain.
- Study CRFsuite and its settings more thoroughly.
- Invent better features.
- Get better at Python.
- Do something about the OTHER label: right now OTHER is a catch-all for everyone else, and the model handles it poorly.
- Resolve characters referred to by epithets: many speakers are named only by a «nickname», and mapping those aliases to characters would help.
In conclusion
That's it! I hope this proves useful to someone. Thanks for reading, and if you'd like to get in touch, I'm on Twitter.
I'll also note that all of the above was done for a large-scale critical study of Game of Thrones. If you're a fan of the books and want to read the analysis the dialogue labels made possible, I'll be publishing it all there.