"Did you mean…?", query completion and other hints (are you really looking for Vasya?)
Put your favorite song on repeat!
Let's start with something simple: query completion. The user starts typing into the search box, and as soon as they do, we want to show them hints (so that, like everyone else, they search for "Valenok" rather than who knows what), possibly even with result counts. How do we get that data, and where from?
There are roughly two options: suggest whole user queries, or suggest individual keywords.
Suggesting keywords is especially easy. The indexing program, unexpectedly named indexer, has two equally unexpected switches, --buildstops and --buildfreqs, which together build a frequency dictionary. With them, indexer lists the 10,000 most frequent words in the document collection along with their counts:
$ indexer myindex --buildstops dict.txt 10000 --buildfreqs
The result is a file like this:
i 9533843
and 5427048
to 5418872
the 5371581
a 4282225
you 2877338
...
The rest is a matter of technique. Create an SQL table, write a 12-line import script, use a plain LIKE for the hints, and sort the results by word frequency. LIKE over an index will be fairly fast, because it searches for the beginning of the key. For one-letter queries it would be good to precompute and cache the results, so as not to churn through thousands of rows on every keystroke. Then again, the MySQL query cache, if enabled, should take care of that.
CREATE TABLE keywords
(
    keyword VARCHAR(255) NOT NULL,
    freq INTEGER NOT NULL,
    INDEX(keyword, freq)
);

SELECT * FROM keywords WHERE keyword LIKE 'valen%' ORDER BY freq DESC LIMIT 10
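For comparison, here is what that LIKE query does, spelled out as a minimal in-memory PHP sketch: keep the words that start with the prefix, sort them by frequency descending, and take the top N. The function name suggest_prefix and the sample data are hypothetical, not part of Sphinx. Note that it compares case-sensitively (MySQL's usual collations do not) and scans every word, which is exactly what the (keyword, freq) index lets the database avoid.

```php
<?php
// Hypothetical in-memory equivalent of:
//   SELECT * FROM keywords WHERE keyword LIKE 'prefix%' ORDER BY freq DESC LIMIT n
// $freqs maps keyword => frequency (e.g. loaded from the --buildfreqs output).
function suggest_prefix(array $freqs, string $prefix, int $limit = 10): array
{
    $matches = [];
    foreach ($freqs as $keyword => $freq) {
        $keyword = (string)$keyword;  // numeric-looking words become int keys in PHP
        // strncmp on the first strlen($prefix) bytes acts as a case-sensitive prefix LIKE
        if (strncmp($keyword, $prefix, strlen($prefix)) === 0) {
            $matches[$keyword] = $freq;
        }
    }
    arsort($matches);  // highest frequency first
    return array_slice($matches, 0, $limit, true);
}
```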
Queries are suggested in almost the same way: the same table, the same LIKE. Only now we need whole query strings instead of single words. We take them from the query log and compute their frequencies. The log needs a little filing down first: for search purposes, "vasya pupkin" is no different from "Vasya, Pupkin!", and treating them as different queries is not great. That is handled by the BuildKeywords() method of the Sphinx API: it takes an arbitrary query string and splits it into keywords (with case folding and so on), from which we reassemble a normalized query string. All of this is best done after opening a persistent connection with the Open() method; otherwise it runs many times slower. After that it's the familiar routine: log, filing, sort, uniq -c, import script, SQL table, LIKE, profit. The filing itself looks roughly like this:
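require ( "sphinxapi.php" );                           // Sphinx PHP API client (api/ directory of the archive)
$log = file ( "query.log", FILE_IGNORE_NEW_LINES );    // hypothetical log file: one query per line

$cl = new SphinxClient ();
$cl->Open ();   // persistent connection, so each BuildKeywords() call does not reconnect
foreach ( $log as $entry )
{
    $keywords = $cl->BuildKeywords ( $entry, "myindex", false );
    if ( $keywords === false )
        continue;   // skip entries the server could not process
    foreach ( $keywords as $keyword )
        print $keyword["tokenized"] . " ";
    print "\n";
}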
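BuildKeywords() applies the index's own tokenizing settings, which is why it is the right tool here. Purely to illustrate what the normalization step achieves, here is a crude stand-in that lowercases the query and splits it on anything that is not a letter or digit. normalize_query is a hypothetical name, and this function is not a substitute for BuildKeywords(): it knows nothing about the index's charset_table.

```php
<?php
// Illustration only: a crude stand-in for BuildKeywords()-based normalization.
// Lowercases the query and splits on runs of non-letters/non-digits, so
// "Vasya, Pupkin!" and "vasya pupkin" collapse into the same string.
function normalize_query(string $query): string
{
    // mb_strtolower handles non-ASCII letters; strtolower is an ASCII-only fallback
    $lower = function_exists('mb_strtolower')
        ? mb_strtolower($query, 'UTF-8')
        : strtolower($query);
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $lower, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', $tokens);
}
```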
The script that imports a two-field text file into the SQL database is left to the reader as homework.
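One possible start on that homework, as a sketch: the output of sort | uniq -c puts a right-aligned count before each normalized query, so every line splits into a (query, count) pair. parse_uniq_c is a hypothetical helper, and inserting the pairs through a prepared statement (for example with PDO) is only suggested in a comment, not shown.

```php
<?php
// Parse `sort | uniq -c` output lines such as "     12 vasya pupkin"
// into [query, count] pairs, ready for an INSERT into a (hypothetical) queries table.
// Inserting via a prepared statement (e.g. PDO) avoids manual quoting.
function parse_uniq_c(array $lines): array
{
    $rows = [];
    foreach ($lines as $line) {
        // optional leading spaces, the count, one space, then the query itself
        if (preg_match('/^\s*(\d+) (.+?)\s*$/', $line, $m)) {
            $rows[] = [$m[2], (int)$m[1]];
        }
    }
    return $rows;
}
```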
That's it for hints.
Now on to correcting errors. As everyone knows, the name "Britney" can be mistyped in almost 600 different ways. And she has a sister, too. And neither of them can be found! A query with a typo finds nothing. The results page comes up empty. AdSense / Ydirect / Unameit show poor-quality ads. Nobody clicks. The startup runs out of money. Nobody buys commercial Sphinx services, and the project dies. This is unacceptable: the keywords must be corrected, urgently.
Of course, there is always the option of plugging in ispell, aspell, hunspell, or whatever *spell is currently in fashion. But then everything hinges on the quality of the xxxspell dictionary, or on whether a dictionary for your language exists at all. And it obviously won't help with neologisms (preved), special terms (acetylsalicylic acid), geographic names, and so on. We want more than that. Besides, this blog is about Sphinx, not ispell, so we have to manage on our own.
Once again, we need a frequency dictionary, only a bigger one. It has to go well beyond the top 10,000 keywords, because otherwise a rare but correct word would be missing from it and get "corrected" into a frequent similar one, which is not worth doing. Usually a 1-million-word dictionary is enough, and 10 million is plenty. The indexer command becomes indexer --buildstops dict.txt 10000000 --buildfreqs MYINDEXNAME (incidentally, on a C2D E8500 it runs at over 20 MB/sec). Unlike with hints, an SQL search over such a dictionary won't help: there is more data, and the kind of query is different. But Sphinx will.
The main idea is this. For every word in the dictionary, we generate a set of trigrams, i.e. runs of 3 consecutive characters, and index those trigrams with Sphinx. To find replacement candidates, we build the trigrams of the misspelled word and look them up in that index. That yields several candidates. The more trigrams match, the smaller the difference in word length, and the more frequent the candidate, the better. Now let's walk through all of this in detail on a real example.
The dictionary produced by indexer --buildstops still looks like this (I picked a different slice so that the words are more realistic and the example clearer):
...
deal 32431
created 32429
light 32275
needed 32252
mood 32185
death 32140
behind 32136
usually 32113
action 32053
line 32052
pissed 32043
...
For each word we need to assign a unique ID, store the word itself and its frequency, build its trigrams, and save it all to the database. If the indexed data itself contains typos, it also makes sense to drop words that are too rare: chances are, those are typos.
CREATE TABLE suggest
(
    id INTEGER PRIMARY KEY AUTO_INCREMENT NOT NULL,
    keyword VARCHAR(255) NOT NULL,
    trigrams VARCHAR(255) NOT NULL,
    freq INTEGER NOT NULL
);

INSERT INTO suggest VALUES
...
(735, 'deal', '__d _de dea eal al_ l__', 32431),
(736, 'created', '__c _cr cre rea eat ate ted ed_ d__', 32429),
(737, 'light', '__l _li lig igh ght ht_ t__', 32275),
(738, 'needed', '__n _ne nee eed ede ded ed_ d__', 32252),
(739, 'mood', '__m _mo moo ood od_ d__', 32185),
(740, 'death', '__d _de dea eat ath th_ h__', 32140),
(741, 'behind', '__b _be beh ehi hin ind nd_ d__', 32136),
(742, 'usually', '__u _us usu sua ual all lly ly_ y__', 32113),
(743, 'action', '__a _ac act cti tio ion on_ n__', 32053),
(744, 'line', '__l _li lin ine ne_ e__', 32052),
(745, 'pissed', '__p _pi pis iss sse sed ed_ d__', 32043),
(746, 'bye', '__b _by bye ye_ e__', 32012),
...
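The trigrams column above can be produced by padding the word with two underscores on each side and sliding a three-character window across it. A minimal sketch, with build_trigrams as a hypothetical name: it splits the word into characters with preg_split and the /u modifier rather than substr, because substr counts bytes and would cut multibyte UTF-8 letters (Cyrillic, for instance) in half.

```php
<?php
// Pad the keyword as "__keyword__" and emit every 3-character window,
// e.g. "deal" -> "__d _de dea eal al_ l__".
// Characters are split with the /u modifier so UTF-8 letters stay intact.
function build_trigrams(string $keyword): string
{
    $chars = preg_split('//u', "__{$keyword}__", -1, PREG_SPLIT_NO_EMPTY);
    $trigrams = [];
    for ($i = 0; $i + 2 < count($chars); $i++) {
        $trigrams[] = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
    }
    return implode(' ', $trigrams);
}
```

A word of N characters always yields N + 2 trigrams, which is easy to check against the rows above.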
Only the trigrams field needs to be full-text indexed, but to rank the candidates (that is, to pick the best correction) we also need each word's length and its frequency in the collection:
sql_query = SELECT id, trigrams, freq, LENGTH(keyword) AS len FROM suggest
sql_attr_uint = freq
sql_attr_uint = len
We identify "suspicious" words from the results of the search query itself. When there are too few results (or none at all), we look at the $result["words"] section of the response, which gives the number of documents for each word. If a word matches few documents, we try to correct it. For example, for the query "green liight" on my test index, "green" occurs 34421 times, while "liight" occurs only rarely. Which one needs correcting is obvious right away. The specific threshold for "few" varies a lot between collections of documents and queries, so look at your dictionary and query log and pick a magic constant.
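A minimal sketch of that check, assuming $result["words"] is a map of word => ["docs" => ..., "hits" => ...] as returned by the PHP API. find_suspicious is a hypothetical helper, and $minDocs is the magic constant from the text; the counts are cast to int defensively in case the API returns them as numeric strings.

```php
<?php
// Pick the words that look like typos: those matching fewer than $minDocs documents.
// $words is shaped like $result["words"]: word => ["docs" => ..., "hits" => ...].
function find_suspicious(array $words, int $minDocs): array
{
    $suspicious = [];
    foreach ($words as $word => $stats) {
        if ((int)$stats['docs'] < $minDocs) {
            $suspicious[] = (string)$word;
        }
    }
    return $suspicious;
}
```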
We build the trigrams and run a query against the special trigram index. Since the word was mistyped, it is unlikely that all of its trigrams will match. On the other hand, a candidate that matches just one trigram is not very interesting either: that only happens when the three middle letters match (and nothing else), or the first letter does (and nothing else). So we use the quorum operator, which does exactly what we need: it returns all documents that match at least 2 trigrams. We also add a length restriction, assuming the correct variant is at most 2 characters longer or shorter.
$len = strlen ( "liight" );
$cl->SetFilterRange ( "len", $len-2, $len+2 );
$cl->Query ( '"__l _li lii iig igh ght ht_ t__"/2', 'suggest' );
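Both parts of that request, the quorum query string and the length window, can be derived from the misspelled word. A sketch with build_suggest_query as a hypothetical helper that returns the values you would pass to Query() and SetFilterRange(). It uses strlen, like the code above, because MySQL's LENGTH() in the sql_query also counts bytes; for non-ASCII data both would need to switch to character counts (CHAR_LENGTH on the MySQL side).

```php
<?php
// Build the quorum query and length window for a suspicious word.
// "t1 t2 ... tN"/2 matches documents containing at least 2 of the trigrams;
// the window keeps candidates within +-$maxLenDiff of the input length.
function build_suggest_query(string $word, int $quorum = 2, int $maxLenDiff = 2): array
{
    $chars = preg_split('//u', "__{$word}__", -1, PREG_SPLIT_NO_EMPTY);
    $trigrams = [];
    for ($i = 0; $i + 2 < count($chars); $i++) {
        $trigrams[] = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
    }
    $len = strlen($word);  // bytes, matching MySQL LENGTH()
    return [
        'query'   => '"' . implode(' ', $trigrams) . '"/' . $quorum,
        'min_len' => $len - $maxLenDiff,
        'max_len' => $len + $maxLenDiff,
    ];
}
```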
Now we need to sort through the candidates we found and pick the best one. Recall the factors we have:
- the more trigrams match, the better;
- the smaller the difference in word length, the better;
- the more frequent the candidate word, the better.
Recent versions of Sphinx can compute all of these factors and sort by them entirely on the server side. The number of matched trigrams can be computed with the SPH_RANK_WORDCOUNT ranker (in our special search, each trigram acts as a separate keyword). The difference in word length is abs(len-$len), and the frequency is stored in the freq attribute. We compute the factors, combine them, and pick the best:
$cl->SetMatchMode ( SPH_MATCH_EXTENDED2 );
$cl->SetRankingMode ( SPH_RANK_WORDCOUNT );
$cl->SetSelect ( "*, @weight+2-abs(len-$len) AS myrank" );
$cl->SetSortMode ( SPH_SORT_EXTENDED, "myrank DESC, freq DESC" );
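To see what this server-side setup computes, here is a client-side emulation over a few hand-made candidates. It assumes @weight equals the number of distinct query trigrams a candidate shares (an approximation of what SPH_RANK_WORDCOUNT counts), applies the same quorum and length filters, and sorts by myrank DESC, freq DESC. trigram_set, rank_candidates and the sample data are hypothetical.

```php
<?php
// Hypothetical client-side emulation of the server-side ranking.
// Assumption: @weight ~ number of distinct query trigrams the candidate shares.

// Padded 3-character windows of a word, deduplicated.
function trigram_set(string $word): array
{
    $chars = preg_split('//u', "__{$word}__", -1, PREG_SPLIT_NO_EMPTY);
    $set = [];
    for ($i = 0; $i + 2 < count($chars); $i++) {
        $set[] = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
    }
    return array_values(array_unique($set));
}

// $candidates maps keyword => freq. Returns keywords sorted by
// myrank DESC, freq DESC, with myrank = weight + 2 - abs(len - $len).
function rank_candidates(string $typo, array $candidates): array
{
    $query = trigram_set($typo);
    $len = strlen($typo);
    $rows = [];
    foreach ($candidates as $keyword => $freq) {
        $keyword = (string)$keyword;
        $diff = abs(strlen($keyword) - $len);
        $weight = count(array_intersect($query, trigram_set($keyword)));
        if ($weight < 2 || $diff > 2) {
            continue;  // rejected by the "/2" quorum or by SetFilterRange()
        }
        $rows[] = ['keyword' => $keyword, 'freq' => $freq, 'myrank' => $weight + 2 - $diff];
    }
    usort($rows, fn($a, $b) => [$b['myrank'], $b['freq']] <=> [$a['myrank'], $a['freq']]);
    return array_column($rows, 'keyword');
}
```

For "liight", "light" shares 6 trigrams and differs by one character, so its myrank is 6 + 2 - 1 = 7, ahead of any candidate sharing fewer trigrams.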
Hooray! The word "liight" is successfully corrected to "light". (More precisely, Sphinx finds the ID, and the "light" row is then fetched from the database.)
That is how the demo shipped with Sphinx 0.9.9-rc2 works (see the misc/suggest directory inside the archive); you can try it on your own data right away, without writing any extra code :-)
The demo is, naturally, incomplete and could use some more filing. (Sorry, couldn't resist!) There is a risk that parts of it won't work with Russian out of the box, because the PHP code uses substr, which counts bytes rather than UTF-8 characters. You will almost certainly need to tune FREQ_THRESHOLD, the minimum number of occurrences for a word to be considered not a typo and make it into the special index: lower it for small data collections and raise it for large ones. For the same reason (so that rare junk is not ranked above frequent good words), the myrank formula may need tweaking too, for example by adding an extra point to the matched-trigram count for every 1000× difference in frequency:
$cl->SetSelect ( "*, @weight+2-abs(len-$len)+ln(freq)/ln(1000) AS myrank" );
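As a worked check of the adjusted formula, take the candidate "light" for the query "liight". Assuming @weight equals the number of shared trigrams (six of them: __l, _li, igh, ght, ht_, t__) and using the frequency 32275 from the dictionary slice above:

$$
\mathrm{myrank} = 6 + 2 - |5 - 6| + \frac{\ln 32275}{\ln 1000} = 7 + \log_{1000} 32275 \approx 7 + 1.50 = 8.50
$$

The last term is simply the base-1000 logarithm of the frequency, so each thousandfold increase in frequency adds exactly one point: a word seen 1,000 times gets +1, one seen 1,000,000 times gets +2.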
Besides, the trigram approach, while effective, is very simple, so a lot is left out. The order of trigrams is ignored, though for words of reasonable length that is generally not a problem. More interesting is that it ignores how people actually make mistakes: swapping two adjacent letters changes as many trigrams as replacing those letters with any others (!); keyboard proximity of letters is not considered at all; phonetic similarity (actually/akshully) is not considered at all either. A fairly obvious idea for improving correction quality in a few lines of code: instead of the single best candidate, fetch 10-20, then compute the Levenshtein distance on the client and adjust the ranking accordingly. And if you don't mind writing more code, you can use other spelling-correction algorithms to pick the best one or two of those candidates.
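The Levenshtein post-processing step fits in a few lines using PHP's built-in levenshtein() function. A sketch, assuming you have already fetched 10-20 candidates with their frequencies; rerank_by_levenshtein is a hypothetical name, and breaking ties by frequency is a design choice of this sketch, not something the demo does. Keep in mind that levenshtein() works on bytes, so for UTF-8 text each non-ASCII letter counts as more than one edit.

```php
<?php
// Re-rank candidates from the trigram index by edit distance to the typo,
// breaking ties by higher frequency. levenshtein() operates on bytes.
function rerank_by_levenshtein(string $typo, array $candidates, int $keep = 2): array
{
    $scored = [];
    foreach ($candidates as $keyword => $freq) {
        $keyword = (string)$keyword;
        // [distance, -freq, keyword]: sorting ascending puts the closest, most frequent first
        $scored[] = [levenshtein($typo, $keyword), -$freq, $keyword];
    }
    sort($scored);
    return array_slice(array_column($scored, 2), 0, $keep);
}
```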
All in all, the demo is usable as is. But it is still a demo, which leaves plenty of room for creativity. Write in and send us your inventions!