What sentiment analysis is has been explained in detail in the article "Teaching a computer feelings (sentiment analysis in Russian)".
The goal is to build a web service that takes a text as input and returns 1 if the text is positive and -1 if it is negative. Microsoft Azure Machine Learning is (almost) ideally suited to this task: it has built-in support for publishing computation results as a web service, and it supports the R language. That removes the need to reinvent the wheel and to configure a virtual machine and web server yourself; in short, you get all the usual benefits of cloud computing. In addition, it was recently announced that anyone can try Azure ML without an Azure account or a credit card; a Microsoft account is all you need.
The whole process comes down to two steps:
- creating and training the model
- using the resulting model
Training the model
To recognize sentiment we will use a naive Bayes classifier. Training requires a labeled sample: a set of texts together with their ratings. From this set a document-term matrix is built, in which rows correspond to documents and columns correspond to the terms that occur in them. Each cell holds the number of times the term occurs in the corresponding document. So for two documents, "The weather is good today" and a second, longer sentence about feeling unwell in which "weather" also occurs, the document-term matrix looks like this:
document | today | good | weather | I | not | very | well | myself | feel | overeat |
doc1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
doc2 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Note that the table contains two forms of the same word (rendered here as "good" and "well"). This can be avoided with stemming, that is, cutting off word endings according to fixed rules, but the results may get worse. For more on terms and on N-gram weighting, see the article linked at the beginning.
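As a rough illustration of the matrix described above, here is a minimal base R sketch. The two English sentences are assumed stand-ins, not the original example documents, and the tokenization (lower-casing and splitting on non-letters) is a simplification of what a text-mining package would do.

```r
# Toy English stand-ins for the two example documents (assumed, not the originals).
docs <- c(doc1 = "the weather is good today",
          doc2 = "the weather is bad and I feel bad")

# Lower-case each document and split it on anything that is not a letter.
tokens <- strsplit(tolower(docs), "[^a-z]+")

# The vocabulary is the sorted set of all distinct terms.
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document, then put documents in rows.
counts <- vapply(tokens,
                 function(tk) as.integer(table(factor(tk, levels = vocab))),
                 integer(length(vocab)))
dtm <- t(counts)
dimnames(dtm) <- list(names(docs), vocab)

print(dtm)
```

Each row is a document and each cell is a term count, so a word repeated within one document (here "bad" in doc2) gets a count above 1.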
Once this matrix has been built, it can be used directly to train the naive Bayes classifier. And with that, let's finally move from theory to practice.
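To make the training step more concrete, the following base R sketch trains a small multinomial naive Bayes model with add-one smoothing on a made-up document-term matrix. This is an illustrative variant written for this example, not the e1071 implementation used later in the article; the function names `train_nb` and `predict_nb` and all counts are hypothetical.

```r
# Made-up document-term matrix: rows are documents, columns are terms.
dtm <- matrix(c(2, 0, 1,
                1, 0, 0,
                0, 2, 1,
                0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("good", "bad", "weather")))
labels <- factor(c(1, 1, -1, -1))

train_nb <- function(dtm, labels, alpha = 1) {
  classes <- levels(labels)
  # Log prior: share of training documents in each class.
  log_prior <- log(as.numeric(table(labels)[classes]) / length(labels))
  # Per-class term totals with add-alpha smoothing, as log-probabilities.
  log_lik <- t(sapply(classes, function(cl) {
    counts <- colSums(dtm[labels == cl, , drop = FALSE]) + alpha
    log(counts / sum(counts))
  }))
  list(classes = classes, log_prior = log_prior, log_lik = log_lik)
}

predict_nb <- function(model, newdata) {
  # Score = log prior + sum over terms of count * log P(term | class).
  scores <- newdata %*% t(model$log_lik)
  scores <- sweep(scores, 2, model$log_prior, "+")
  model$classes[max.col(scores, ties.method = "first")]
}

model <- train_nb(dtm, labels)
new_docs <- matrix(c(1, 0, 1,
                     0, 1, 0),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("new1", "new2"), colnames(dtm)))
pred <- predict_nb(model, new_docs)
print(pred)
```

The sketch assumes that new documents use exactly the same columns, in the same order, as the training matrix.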
Practice
We will evaluate the sentiment of posts on VK walls, so to train the classifier we need a sample of posts labeled with their sentiment. The sample can be downloaded from the link. Note that it was labeled during the "Big Data Hackathon" (by me, ASTAPP and MKulikow); the labeling was done in bulk and in a hurry, so it is not very accurate and may contain many errors. The sample contains about 3,500 random posts from random VKontakte walls, of which 341 are positive and 115 are negative. Posts were rated on a scale from -10 to 10.
Now let's create an experiment in Azure ML and train the classifier. Go to the ML home page and click [New] -> [Experiment] -> [Blank Experiment]. An empty canvas for the new experiment appears. You can change the name at the top to something more suitable, for example habr_article_sentiment.
Next, the dataset has to be uploaded to Azure. In theory you do this by clicking [New] -> [Dataset] -> [From local file] and choosing "Generic CSV file with a header" in the "Select a type for the new dataset:" list. There is a problem, though: if a row contains a newline character (\n), the import fails even when the value is properly quoted, and VK wall posts certainly contain newlines. To work around this bug, load the CSV file into a database and read the data with the Reader block from the Data Input and Output section. Drag this block onto the canvas, configure the database connection, and specify the SQL query that selects the data (do not forget to check the "Accept any server certificate (insecure)" box). The SQL query looks like this:
SELECT score AS grade, text FROM tmp.big_data_hack
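As a possible alternative to the database detour, the embedded newlines could be flattened locally before uploading the CSV file. The sketch below only shows the flattening step in base R; the sample rows are invented, the column names follow the query above, and whether the Azure ML importer then accepts the file was not verified here.

```r
# Invented sample rows using the column names from the query above.
posts <- data.frame(
  grade = c(5, -3),
  text  = c("first line\nsecond line", "single line"),
  stringsAsFactors = FALSE
)

# Replace every run of CR/LF characters with a single space.
posts$text <- gsub("[\r\n]+", " ", posts$text)

# Write a CSV file with a header; each record now sits on one physical line.
csv_path <- tempfile(fileext = ".csv")
write.csv(posts, csv_path, row.names = FALSE)
```

The trade-off is that line breaks inside posts are lost, which should not matter for a bag-of-words model.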
Now you can run the experiment and check the output of the Reader block. To do so, click [Run] at the bottom of the page and, once the run has finished, right-click the output port and choose [Visualize] to inspect the loaded rows and their statistics.
The panel on the right shows some statistics. The maximum value is 70, which is clearly a typo made during labeling, and some rows have no grade at all (posts with a neutral tone).
Next, we remove the empty rows and turn the grade from the -10 to 10 scale into a categorical grade of -1, 0 or 1. For this we use the Missing Value Scrubber block and the Clip Values block. Find the Missing Value Scrubber block with the search box in the blocks panel, drag it onto the canvas, and connect its input to the output of the Reader block.
Configure the block so that rows with missing values are removed; the settings are self-explanatory.
Then drag in the Clip Values block. This block is meant for detecting and replacing outliers, and it fits our purpose perfectly: just set the minimum and maximum thresholds to -1 and 1 respectively.
Note that this block has a column selector: you have to choose which columns to process. The default is all numeric columns. Launch the column selector and select the grade column.
Let's run the experiment and see what happens: click [Run] and visualize the output of the Clip Values block.
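For readers who prefer to see the two cleaning steps as code, this base R sketch mimics them on invented grades. It assumes the scrubber is set to drop incomplete rows, and it yields exactly -1, 0 and 1 only because the grades are whole numbers.

```r
# Invented grades, including a missing value and the mistyped 70.
posts <- data.frame(grade = c(-10, -2, 0, NA, 7, 70),
                    text  = c("a", "b", "c", "d", "e", "f"),
                    stringsAsFactors = FALSE)

# Missing Value Scrubber step: drop rows that have no grade.
posts <- posts[!is.na(posts$grade), ]

# Clip Values step: cap grades at -1 from below and at 1 from above.
posts$grade <- pmin(pmax(posts$grade, -1), 1)

print(posts$grade)
```

The mistyped 70 is handled by the same clipping: it simply becomes 1.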
Hooray! This is exactly what we need, so we can move straight on to training the classifier. Azure ML supports running arbitrary R scripts, so we will use the naive Bayes classifier from the e1071 package for R. Drag an Execute R Script block onto the canvas and connect the output of Clip Values to its Dataset1 input port.
One thing needs to be pointed out here. Ideally, training a model and using it afterwards works like this: you create an experiment, choose the model to use, train it, and check its accuracy. Then you right-click the model output and choose "Save as Trained Model". After that the trained model is stored in the blocks section and can be used at any time. To publish a web service, you create a new experiment, drag the trained model into it, and set the input and output points. All very simple and clear.
At the moment, however, a trained model cannot be saved from a block of type "Execute R Script". I really hope this will be fixed soon (please vote for it).
There is still a way to save and reuse a model from an R script: serialize the object into a sequence of bytes and send it to the block's output, after converting that sequence into a single-column DataFrame (only DataFrames can be sent to the output). When the experiment has finished, right-click the output point and choose [Save as Dataset]. In later experiments you can pick this dataset, connect it to an input of an R script block, load it and deserialize it. The method is clumsy, but it works :)
If you have R installed locally, it can be done a little more simply: train the model, save it to .RData, pack it into a zip archive, upload the archive to the datasets section, and connect it to the third input of the R script block, "Script Bundle (Zip)". In principle a .RData file can be uploaded to the datasets section directly, but I did not find a single block that it could then be connected to; connecting it to the R script block did not work either.
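The serialization trick described above can be tried locally with base R alone. In this sketch an ordinary list stands in for the trained classifier; the column name `clsfr` is the one used in the article's script.

```r
# Any R object can stand in for the trained classifier here.
model <- list(name = "toy model", weights = c(a = 0.5, b = -1.25))

# Serialize to a raw vector, then store the bytes as integers
# in a one-column data frame (the only type a block can output).
bytes  <- serialize(model, connection = NULL)
output <- data.frame(clsfr = as.integer(bytes))

# Later, in another experiment: turn the integers back into raw
# bytes and restore the original object.
restored <- unserialize(as.raw(output$clsfr))

stopifnot(identical(restored, model))
```

Every byte becomes one row of the data frame, so a large model produces a long but otherwise ordinary dataset.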
Based on the above, the R code looks like this:
    library("RTextTools")
    library("stringr")
    library("tm")
    library("e1071")
    # Read the dataset connected to input port 1
    data <- maml.mapInputPort(1)
    # Keep only positive and negative posts; drop the neutral ones
    data <- data[data$grade != 0,]
    # Build the document-term matrix
    dtm <- create_matrix(data$text
                        , language = "russian"
                        , minWordLength = 2
                        , maxWordLength = 10
                        , stemWords = FALSE
                        , removeNumbers = TRUE
                        , removeSparseTerms = 0)
    mat = as.matrix(dtm)
    # Train the naive Bayes classifier on the matrix
    classifier = naiveBayes(mat, as.factor(data$grade))
    # Serialize the classifier into a vector of bytes
    serClsf <- serialize(classifier, connection = NULL)
    # Only a DataFrame can be sent to the output, so wrap the bytes in one column
    output <- data.frame(clsfr = as.integer(serClsf))
    maml.mapOutputPort("output");
Now the experiment can be run. It takes less than a minute to complete. If everything went well, right-click the output port afterwards and save the classifier as a new dataset.
Model training is now complete, and we can move on to the next part: using the resulting model to create and publish a web service.
Using the model
Create a new experiment and give it a name such as habr_article_sentiment_use. Drag an Execute R Script block onto the canvas and connect the previously saved classifier to its second port.
To the first port, connect a single-column text file with a single row: a test sentence for checking the model. It is needed for two reasons. First, it lets us see that the classifier actually works. Second, and more importantly, it tells Azure Machine Learning what the input data of the published web service looks like, that is, its parameters. The text file might look like this:
"text"
" . , , , . ."
The experiment then consists of the saved classifier and the test file, both connected to the Execute R Script block.
Click [Visualize] on the output of this dataset and make sure it has exactly one column, named "text".
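The one-column test file described above could be produced and checked locally with a few lines of base R. The sentence here is a placeholder and the file name is arbitrary; substitute a real test sentence before uploading.

```r
# Placeholder sentence; replace it with a real test sentence.
sample_input <- data.frame(text = "placeholder test sentence",
                           stringsAsFactors = FALSE)

# Write a CSV file with a quoted header and one quoted value.
path <- file.path(tempdir(), "test_input.csv")
write.csv(sample_input, path, row.names = FALSE)

# Read it back to confirm the structure: one column named "text", one row.
check <- read.csv(path, stringsAsFactors = FALSE)
print(names(check))
print(nrow(check))
```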
Next, let's write the R script that uses the classifier.
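    library("RTextTools")
    library("stringr")
    library("tm")
    library("e1071")
    # Port 1 carries the text to classify, port 2 the serialized classifier
    data <- maml.mapInputPort(1)
    serializedObj <- maml.mapInputPort(2)
    # Restore the classifier from its byte representation
    classifier <- unserialize(as.raw(serializedObj$clsfr))
    # Build the document-term matrix for the incoming text
    # (note: the training script used minWordLength = 2)
    doc <- data$text
    dtm <- create_matrix(doc
                        , language = "russian"
                        , minWordLength = 4
                        , maxWordLength = 10
                        , stemWords = FALSE
                        , removeNumbers = TRUE
                        , removeSparseTerms = 0)
    mat = as.matrix(dtm)
    # Classify the text
    predicted <- predict(classifier, mat)
    # Wrap the prediction in a DataFrame
    result <- as.data.frame(predicted)
    # Send the result to the output port
    maml.mapOutputPort("result");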
Run the experiment and visualize the result. The test text is, on the whole, positive, yet I got -1. This suggests that the training sample is of poor quality or that a more sophisticated approach is needed. At the hackathon the accuracy was about 72%.
Next, the entry points of the web service have to be set. Click the first input of the R script block and choose [Set as Input]. Set the output the same way: click the output point for the result dataset and choose "Set as Output". Now the web service can finally be published: click [Publish Web Service] in the bottom panel (if the button is disabled, just run the experiment; it becomes active after the run). After you confirm, you are taken to the page of the newly published web service.
From here you can open the help page of the generated web service: click the API help page link in the REQUEST/RESPONSE row. That page contains comprehensive information on using the web service, including code samples in several languages. Let's make our first request: using your favorite REST client, send the following JSON to the service:
{
  "Id": "score00001",
  "Instance": {
    "FeatureVector": {
      "text": " , ... , , ... "
    },
    "GlobalParameters": {}
  }
}
In response we receive the following:
["-1"]
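The request body and the response shown above are simple enough to handle with base R string functions. The helpers below, `build_body` and `parse_label`, are hypothetical names introduced for this sketch. Actually sending the request needs an HTTP client plus the endpoint URL and API key from the API help page, none of which are shown here.

```r
# Build a request body in the shape shown above.
# This sketch does no JSON escaping, so it refuses input that would need it.
build_body <- function(text, id = "score00001") {
  if (grepl("[\"\\\\[:cntrl:]]", text)) {
    stop("text contains characters that need JSON escaping")
  }
  sprintf(paste0('{"Id": "%s", "Instance": {"FeatureVector": ',
                 '{"text": "%s"}, "GlobalParameters": {}}}'),
          id, text)
}

# Extract the numeric label from a response such as ["-1"].
parse_label <- function(response) {
  as.integer(gsub("[^0-9-]", "", response))
}

body  <- build_body("some text to score")
label <- parse_label('["-1"]')
print(body)
print(label)
```

For real traffic a JSON library would be the safer choice, since it handles quotes, backslashes and newlines inside the text.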
Conclusion
That's all! As you can see, Azure Machine Learning is very easy to use, although it currently has a few problems. Like Azure as a whole, though, Azure ML is developing very quickly, so I hope all these workarounds will soon become unnecessary and the bugs will disappear.
To finish, here are two more useful links:
- Azure Machine Learning quick start: http://habrahabr.ru/company/microsoft/blog/236823/
- Machine Learning Center for Azure: http://azure.microsoft.com/en-us/documentation/services/machine-learning/