Digital Transformation series
Technical articles:
1. The beginning.
2. Blockchain in banking.
3. Teaching machines to understand human genes.
4. Machine learning and chocolate.
A series of interviews with Dmitry Zavalishin on the DZ Online channel:
1. Alexander Lozhechkin, Microsoft: Will we still need developers in the future?
2. Alexey Kostarev, Robot Vera: How can an HR specialist be replaced by a robot?
3. Fedor Ovchinnikov, Dodo Pizza: How can a restaurant manager be replaced by a robot?
4. Andrei Golub, ELSE Corp Srl: How can we stop wasting huge amounts of time on shopping?
We recently partnered with Miroculus, a young and promising medical research company. Miroculus develops inexpensive tests for rapid blood-based diagnostics. In this project we focused on identifying links between microRNAs and genes by analyzing the scientific and medical literature, and arrived at a solution that can be applied in many other fields.
The problem
The Miroculus system's task is to identify relationships between individual microRNAs and specific genes or diseases. Tools built on this data, and continuously improved, let researchers quickly find associations between microRNAs, genes, and disease types (for example, tumors).
The medical literature contains many studies on the interdependencies between individual microRNAs, genes, and diseases, but there is no single centralized database that holds this information in an ordered, structured form.
Relationships between microRNAs and genes can be of various types, but because data were scarce, we reduced the extraction problem to binary classification: the goal is simply to determine whether a relationship between a microRNA and a gene exists.
Identifying relationships between entities in unstructured text is called relation extraction.
Strictly speaking, the task takes unstructured text and a group of entities as input and outputs a set of triples of the form "entity 1, entity 2, relation type". In other words, it is a subtask of the larger task of information extraction.
Because we are dealing with binary classification, we need a classifier that takes a sentence and a pair of entities and outputs a score between 0 and 1 reflecting the likelihood that the two entities are related.
For example, given the sentence "mir-335 regulates BRCA1" and the entity pair (mir-335, BRCA1), the classifier returns a score such as 0.9.
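As an illustration of this scoring interface, here is a minimal sketch assuming scikit-learn: a logistic regression classifier whose `predict_proba` output serves as the 0-to-1 relation score. The training sentences, labels, and the helper name `relation_score` are hypothetical; the real model is trained on the distantly supervised dataset and the features described below.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data (entities already replaced by placeholders)
train_sentences = [
    "ENTITY1 was found to regulate ENTITY2",
    "ENTITY1 directly targets ENTITY2",
    "ENTITY1 and ENTITY2 were measured in separate cohorts",
    "ENTITY1 levels were reported alongside ENTITY2 in a review",
]
train_labels = [1, 1, 0, 0]  # 1 = relation present, 0 = no relation

vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(train_sentences)
clf = LogisticRegression().fit(X, train_labels)


def relation_score(sentence):
    """Return the estimated probability (0..1) that ENTITY1 and ENTITY2 are related."""
    features = vectorizer.transform([sentence])
    # Column 1 of predict_proba corresponds to class 1 (clf.classes_ == [0, 1])
    return float(clf.predict_proba(features)[0, 1])


print(relation_score("ENTITY1 was shown to regulate ENTITY2"))
```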
The source code for this project is available on its project page.
Creating the dataset
We used the text of medical articles from two data sources: PMC and PubMed.
The text of the documents downloaded from these sources was split into sentences with the TextBlob library.
Each sentence was then passed to the GNAT entity recognition tool, which extracted the names of the microRNAs and genes it contains.
One of the hardest problems in relation extraction (as in many basic machine learning tasks) is the availability of labeled data. In our project, no such data was available. Fortunately, we could use distant supervision.
Distant supervision
The term "distant supervision" was introduced by Mintz et al. in the paper "Distant supervision for relation extraction without labeled data". The method builds a labeled dataset from a database of known relationships between entities and a database of articles in which those entities are mentioned.
For every pair of entities that has a relationship in the relationship database, every sentence in the article database that mentions both entities is labeled with that relationship.
To generate negative examples (no relationship), we randomly selected sentences containing entity pairs that do not appear in the relationship database. Note that the main criticism of distant supervision concerns these negative examples: a randomly sampled pair may in fact be related (the database is simply incomplete), so some negatives can be false.
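The labeling scheme above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the knowledge base `known_relations`, the `mentions` list (sentences with an already recognized microRNA/gene pair), and the helper `distant_labels` are all hypothetical. The sampled negative for (mir-21, BRCA1) is "negative" only because the toy database does not list it, which is exactly the weakness described above.

```python
import random

# Hypothetical database of known (microRNA, gene) relationships
known_relations = {("mir-335", "BRCA1"), ("mir-21", "PTEN")}

# Hypothetical sentences, each with the entity pair recognized in it
mentions = [
    ("mir-335 was found to regulate BRCA1", "mir-335", "BRCA1"),
    ("mir-21 suppresses PTEN expression", "mir-21", "PTEN"),
    ("mir-335 and TP53 were both measured", "mir-335", "TP53"),
    ("mir-21 levels did not correlate with BRCA1", "mir-21", "BRCA1"),
]


def distant_labels(mentions, known_relations, num_negatives, seed=0):
    """Label mentions of known pairs as 1; randomly sample unknown pairs as 0."""
    positives = [(s, m, g, 1) for s, m, g in mentions if (m, g) in known_relations]
    candidates = [(s, m, g, 0) for s, m, g in mentions if (m, g) not in known_relations]
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
    return positives + negatives


dataset = distant_labels(mentions, known_relations, num_negatives=2)
for sentence, mirna, gene, label in dataset:
    print(label, sentence)
```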
Text transformation
After building the labeled training set, we created the relation classifier using the scikit-learn Python library and several Python libraries for natural language processing (NLP). We experimented with several different features and classifiers.
Before testing methods and features, we transformed the text in the steps described below.
Entity replacement
The idea is that the model should learn from the structure of the text rather than from the names of particular entities.
Example:
miRNA-335 was found to regulate BRCA1
is transformed into:
ENTITY1 was found to regulate ENTITY2
In practice, we took every pair of entities from each sentence and, for each pair, replaced the entities of interest with placeholders. ENTITY1 always replaces the microRNA and ENTITY2 the gene. We also used a separate special placeholder, OTHER_ENTITY, to mark entities that appear in the sentence but are not part of the relationship in question.
So for the sentence:
High levels of expression of miRNA-335 and miRNA-342 were found together with low levels of BRCA1
we obtained the following set of transformed sentences:
High levels of expression of ENTITY1 and OTHER_ENTITY were found together with low levels of ENTITY2
High levels of expression of OTHER_ENTITY and ENTITY1 were found together with low levels of ENTITY2
For this step, Python's str.replace() method can be used to replace the entities, and itertools.combinations or itertools.product can be used to enumerate all possible pairs.
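A minimal sketch of the replacement step, assuming the entity names have already been recognized (for example by GNAT) and are passed in as plain lists. The function name `replace_entities` is hypothetical. It uses `itertools.product` to enumerate microRNA/gene pairs and `str.replace` for substitution; plain substring replacement would misfire if one entity name is a prefix of another (say, miRNA-34 and miRNA-342), so a token-level replacement would be safer on real data.

```python
from itertools import product


def replace_entities(sentence, mirnas, genes):
    """Yield one transformed sentence per (microRNA, gene) pair.

    The pair under consideration becomes ENTITY1/ENTITY2; every other
    recognized entity becomes OTHER_ENTITY.
    """
    for mirna, gene in product(mirnas, genes):
        text = sentence
        for other in mirnas + genes:
            if other not in (mirna, gene):
                text = text.replace(other, "OTHER_ENTITY")
        yield text.replace(mirna, "ENTITY1").replace(gene, "ENTITY2")


sentence = ("High levels of expression of miRNA-335 and miRNA-342 "
            "were found together with low levels of BRCA1")
for transformed in replace_entities(sentence, ["miRNA-335", "miRNA-342"], ["BRCA1"]):
    print(transformed)
```

Run on the example sentence, this prints the two transformed sentences listed above.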
Tokenization
Tokenization is the process of splitting a sequence of text into smaller segments; here, we split each sentence into words.
We used the nltk library for this:
```python
import nltk

# word_tokenize relies on NLTK's Punkt models (fetch them once with nltk.download)
tokens = nltk.word_tokenize(sentence)
```
Trimming
Following recommendations in the scientific literature, we trimmed each sentence to a smaller segment containing the words between the two entities plus a few words before and after them. The purpose of this trimming is to drop the parts of the sentence that do not matter for relation extraction.
To do this, we sliced the list of words produced by tokenization in the previous step, computing the corresponding indices:
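```python
WINDOW_SIZE = 3

# Positions of the two placeholders (either entity may come first)
first = min(tokens.index("ENTITY1"), tokens.index("ENTITY2"))
last = max(tokens.index("ENTITY1"), tokens.index("ENTITY2"))

# Clamp with max/min so the window does not run past either end of the list
FIRST_INDEX = max(first - WINDOW_SIZE, 0)
SECOND_INDEX = min(last + WINDOW_SIZE + 1, len(tokens))  # +1: slice end is exclusive
trimmed_tokens = tokens[FIRST_INDEX:SECOND_INDEX]
```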
Normalization
We normalized the sentences simply by converting all characters to lowercase. Preserving letter case can be useful for tasks such as analyzing numbers or sentiment, but not for relation extraction: here we care about the information and structure of the text, not about emphasis placed on individual words in the sentence.
Removing stop words and numbers
Here we used the standard process of removing stop words. Stop words are very frequent words, for example the prepositions "in", "to", and "on". Because these words are so common in sentences, they carry no semantic weight regarding the relationship between the entities in a sentence.
For the same reason, we removed tokens consisting only of digits and tokens shorter than two characters.
Stemming
Stemming is the process of reducing a word to its stem (root).
This reduces the number of distinct word forms the model has to deal with and lets it focus on the meaning of the words themselves.
In practice, this step did not noticeably improve accuracy. For that reason, and because it is relatively slow (in terms of running time), stemming was not included in the final model.
Normalization, word removal, and stemming are performed by iterating over the tokens of each tokenized and trimmed sentence: each token is normalized, and removed when necessary.
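```python
import nltk
from nltk.corpus import stopwords

# The stop-word list must be downloaded once: nltk.download("stopwords")
stop_words = set(stopwords.words("english"))  # build once instead of per token
porter = nltk.PorterStemmer()

processed_tokens = []
for t in trimmed_tokens:
    normalized = t.lower()
    # Skip stop words, pure numbers and tokens shorter than two characters
    if normalized in stop_words or normalized.isdigit() or len(normalized) < 2:
        continue
    processed_tokens.append(porter.stem(normalized))
```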
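Feature extraction
Once the transformations above were complete, we experimented with different kinds of features.
We used three kinds of features: bag of words, syntactic features, and word vector representations.
Bag of words
The bag-of-words (BoW) model, which treats a text as a multiset of its words, is a common way to convert text into a numeric vector space in natural language processing (NLP) tasks.
In the BoW model, each word in the dictionary is assigned a unique numeric identifier. Each sentence is then converted into a vector whose length equals the size of the dictionary: the position with a given identifier holds 1 if that word occurs in the text and 0 otherwise. Alternatively, each element of the vector can hold the number of times the corresponding word occurs in the text. An example is available here.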
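To make the two variants concrete, here is a small pure-Python sketch. The function `bag_of_words` is hypothetical and splits on whitespace only; it assigns each word an id in order of first appearance and builds both the binary and the count vectors:

```python
def bag_of_words(sentences):
    """Return (vocabulary, binary vectors, count vectors) for a list of sentences."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))  # give each new word the next free id

    binary, counts = [], []
    for sentence in sentences:
        vec = [0] * len(vocab)
        for word in sentence.split():
            vec[vocab[word]] += 1  # count occurrences
        counts.append(vec)
        binary.append([1 if c > 0 else 0 for c in vec])  # presence / absence
    return vocab, binary, counts


vocab, binary, counts = bag_of_words(
    ["entity1 regulates entity2", "entity1 entity1 binds entity2"]
)
print(vocab, binary, counts)
```

Here the vocabulary is entity1=0, regulates=1, entity2=2, binds=3, so the second sentence gets the binary vector [1, 0, 1, 1] and the count vector [2, 0, 1, 1].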
However, this model ignores the order of words in the sentence and considers only which words occur. To take word order into account, we used the common N-gram model, which considers sequences of N consecutive words and treats each such sequence as a single word (feature).
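A minimal sketch of N-gram extraction on an already tokenized sentence. The helper `ngrams` is hypothetical; scikit-learn's CountVectorizer produces equivalent space-joined word N-grams when `ngram_range` is set.

```python
def ngrams(tokens, n):
    """Return the contiguous n-word sequences of a token list, each joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "entity1 was found to regulate entity2".split()
print(ngrams(tokens, 3))
```

For this six-token sentence the function yields four trigrams: "entity1 was found", "was found to", "found to regulate", and "to regulate entity2".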
For more on using N-grams in text analysis, see "Feature representation in text analysis: 1-grams, 2-grams, trigrams... how many?".
Conveniently, scikit-learn implements both the BoW and the N-gram models in its CountVectorizer class.
The following example converts text into binary (1/0) BoW vectors using a trigram model:
```python
from sklearn.feature_extraction.text import CountVectorizer

# binary=True gives 1/0 presence features; ngram_range=(3, 3) uses trigrams only
vectorizer = CountVectorizer(analyzer="word", binary=True, ngram_range=(3, 3))

# 'samples' must be a list/iterable of strings, so the processed tokens
# may need to be joined back into sentences with " ".join(...)
data_features = vectorizer.fit_transform(samples)
```
Syntactic features
We used two kinds of syntactic features: part-of-speech (POS) tags and dependency parse trees.
We chose spaCy (spacy.io) to extract both the POS tags and the dependency graphs, because it outperformed the other available Python libraries in speed and accuracy and was comparable to other NLP systems.
The following code snippet gets the POS tags for a given sentence:
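```python
import spacy

# Load an English pipeline (install it once with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
parsed = nlp(" ".join(processed_tokens))
pos_tags = [token.pos_ for token in parsed]  # coarse-grained tags such as "NOUN", "VERB"
```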
After all sentences have been transformed, they can be converted into a numeric vector space with the CountVectorizer class shown above, using a bag-of-words model over the POS tags.
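As a sketch of that step, assume each sentence's tags have already been collected into a list (the tag sequences below are hypothetical, written in the style of spaCy's coarse-grained POS tags). The lists are joined into strings and passed to CountVectorizer with trigrams of tags; `token_pattern=r"\S+"` keeps every tag as one token, and `lowercase=False` preserves the tag names.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical POS tag sequences, e.g. collected with [t.pos_ for t in parsed]
pos_sequences = [
    ["PROPN", "AUX", "VERB", "PART", "VERB", "PROPN"],
    ["PROPN", "VERB", "PROPN"],
]

pos_vectorizer = CountVectorizer(
    analyzer="word",
    binary=True,
    ngram_range=(3, 3),   # trigrams of tags
    token_pattern=r"\S+",  # every whitespace-separated tag is a token
    lowercase=False,       # keep "PROPN", "VERB", ... as written
)
pos_features = pos_vectorizer.fit_transform([" ".join(tags) for tags in pos_sequences])
print(sorted(pos_vectorizer.vocabulary_))
```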
In a similar way, we processed dependency parse tree features, taking for each sentence the dependency relations found between its two entities; these were also vectorized with the CountVectorizer class.
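The exact construction of the dependency features is not spelled out here; one plausible reading, sketched below under that assumption, is to collect the dependency labels along the tree path connecting the two entities. The helper `dependency_path` is hypothetical. It expects the parse as two parallel lists of head indices and arc labels, which is the information spaCy exposes as `token.head.i` and `token.dep_` (spaCy marks the root by making it its own head). The parse of the example sentence is hand-written for illustration, not actual spaCy output.

```python
def dependency_path(heads, deps, start, end):
    """Return the dependency labels on the tree path from token `start` to token `end`.

    heads[i] is the index of token i's head (the root is its own head);
    deps[i] is the label of the arc from token i to its head.
    """
    def ancestors(i):
        # Walk from token i up to the root, collecting indices
        chain = [i]
        while heads[i] != i:
            i = heads[i]
            chain.append(i)
        return chain

    up, down = ancestors(start), ancestors(end)
    common = next(i for i in up if i in down)  # lowest common ancestor
    # Arc labels climbing from start to the common ancestor...
    path_up = [deps[i] for i in up[:up.index(common)]]
    # ...then descending from the common ancestor to end
    path_down = [deps[i] for i in down[:down.index(common)]]
    return path_up + path_down[::-1]


# Hand-written (hypothetical) parse of "ENTITY1 was found to regulate ENTITY2"
tokens = ["ENTITY1", "was", "found", "to", "regulate", "ENTITY2"]
heads = [2, 2, 2, 4, 2, 4]
deps = ["nsubjpass", "auxpass", "ROOT", "aux", "xcomp", "dobj"]

path = dependency_path(heads, deps, tokens.index("ENTITY1"), tokens.index("ENTITY2"))
print(" ".join(path))  # string form, ready for CountVectorizer
```

For this parse the path is nsubjpass, xcomp, dobj; joined with spaces, such strings can be vectorized with CountVectorizer just like the POS tags.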
Word vector representations
Vector representations of words (word embeddings) have recently become very popular for solving NLP problems. The essence of the method is to use a neural model to map words into a feature space in which similar words are represented by vectors that lie close to each other.
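Closeness between word vectors is usually measured with cosine similarity. A minimal sketch with NumPy; the three-dimensional vectors below are made up for illustration, whereas learned embeddings typically have hundreds of dimensions:

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy "embeddings", invented for this example only
vectors = {
    "regulate": np.array([0.9, 0.1, 0.0]),
    "inhibit": np.array([0.8, 0.3, 0.1]),
    "patient": np.array([0.0, 0.2, 0.9]),
}

print(cosine_similarity(vectors["regulate"], vectors["inhibit"]))
print(cosine_similarity(vectors["regulate"], vectors["patient"]))
```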
For more on word vector representations, see the following blog posts.
We applied the approach described in the Paragraph Vector paper: embedding sentences (or documents) in a high-dimensional feature space. We used the Doc2Vec implementation in the Gensim library. For details, see this tutorial.
Both the parameters and the output vector size follow the recommendations in the Paragraph Vector paper and the Gensim tutorial.
In addition to the labeled data, we trained the Doc2Vec model on a large set of unlabeled sentences to give it more context and to broaden the vocabulary and features available during training.
Once the model is trained, each sentence is represented by a vector with more than 200 dimensions, which can be used as input to a classifier.
Evaluating the classification models
With the text transformed and the features extracted, we can move on to the next step: selecting and evaluating classification models.
We used the logistic regression algorithm for classification. We also tested other algorithms, such as support vector machines and random forests, but logistic regression gave the best results in terms of both speed and accuracy.
Before evaluating the accuracy of the method, the dataset must be split into a training set and a test set. For this we use the train_test_split function:
```python
from sklearn.model_selection import train_test_split

# Hold out 25% of the labeled examples as the test set
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.25)
```
This function splits the dataset randomly: 75% of the data goes to the training set and 25% to the test set.
To train a classifier based on logistic regression, we used the scikit-learn LogisticRegression class. To evaluate the classifier's performance, we used the classification_report function, which outputs the precision, recall, and F1 score for each class.
The following code trains the logistic regression classifier and prints the classification report:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# C=1e5 means very weak regularization (C is the inverse of its strength)
clf = LogisticRegression(C=1e5)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print(classification_report(y_test, y_pred))
```
Example output of the code above:
```text
             precision    recall  f1-score   support

          0       0.82      0.88      0.85      1415
          1       0.89      0.83      0.86      1660

avg / total       0.86      0.85      0.85      3075
```
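In this example the parameter C (the inverse of the regularization strength; smaller values mean stronger regularization) was chosen arbitrarily; it should be tuned with cross-validation, as shown below.
Results
We combined all of the methods and techniques above and compared different features and transformations to choose the best model.
We used the LogisticRegressionCV class to build binary classifiers with cross-validated parameters, and evaluated the models' performance on a separate test set.
To test different parameters for different features easily and conveniently, you can use the GridSearchCV class.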
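A minimal sketch of cross-validated tuning of C with LogisticRegressionCV, assuming scikit-learn. Synthetic data from `make_classification` stands in for the real feature matrix, and the search settings (ten candidate values of C, five folds, F1 scoring) are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for the real feature matrix and labels
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Try 10 values of C with 5-fold cross-validation, scoring each by F1
clf = LogisticRegressionCV(Cs=10, cv=5, scoring="f1", max_iter=1000)
clf.fit(X, y)
print("best C:", clf.C_[0])
```

With an integer `Cs`, scikit-learn tries that many values on a logarithmic grid between 1e-4 and 1e4 (stored in `Cs_`) and keeps the best one in `C_`.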
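In current scikit-learn this class lives in `sklearn.model_selection`. A minimal sketch, assuming a pipeline of CountVectorizer and LogisticRegression so that feature settings (here `ngram_range`) and classifier settings (`C`) are searched together; the sentences, labels, and grid values are hypothetical:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical transformed sentences and labels (1 = relation, 0 = none)
samples = [
    "entity1 was found to regulate entity2",
    "entity1 directly targets entity2",
    "entity1 represses entity2 translation",
    "entity1 binds the utr of entity2",
    "entity1 and entity2 were measured separately",
    "entity1 and other_entity were reported with entity2",
    "levels of entity1 and entity2 in a cohort",
    "entity1 entity2 appear in a table",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer(binary=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Parameter names use the "<step>__<parameter>" convention
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2), (1, 3)],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1")
search.fit(samples, labels)
print(search.best_params_, search.best_score_)
```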
The following table summarizes the main results of comparing the different features.
We used the F1 score to measure model accuracy, because it accounts for both the precision and the recall of the model.
Features | F1 score |
---|---|
Unigrams + trigrams (bag of words) | 0.87 |
Unigrams + trigrams (bag of words) and trigrams (POS tags) | 0.87 |
Unigrams + trigrams (bag of words) and Doc2Vec | 0.87 |
Unigrams (bag of words) | 0.80 |
Bigrams (bag of words) | 0.85 |
Trigrams (bag of words) | 0.83 |
Doc2Vec | 0.65 |
Trigrams (POS tags) | 0.62 |
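The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A small sketch (the helper name is hypothetical) reproduces the per-class F1 values of the classification report shown earlier from its precision and recall columns:

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_from_precision_recall(0.82, 0.88), 2))  # class 0 -> 0.85
print(round(f1_from_precision_recall(0.89, 0.83), 2))  # class 1 -> 0.86
```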
Overall, a bag of words built from unigrams and trigrams appears to give the best accuracy of all the methods we compared.
The Doc2Vec model is notable for its strong performance at judging word similarity, but it did not give good results for relation extraction.
Use cases
In this article, we looked at the methods used to process relationships between microRNAs and genes and to build a classifier that extracts those relationships.
Although the problem and examples in this article come from biology, the solutions and methods studied can be applied in other fields to build relationship graphs from unstructured text data.
You can try Azure for free.
If you want to try new technologies in your project but have not yet had the chance, apply to the Microsoft Tech Acceleration program. Its main purpose is to work with you to choose the technology stack you need, help you implement a pilot, and, if it succeeds, do everything possible to let the whole market know about you.
P.S. We thank Kostya Kichinsky (Quantum Quintum) for illustrating this article.