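In short, datrie.Trie is a dict-like structure that, under certain conditions (the keys are strings), consumes less memory, retrieves a single element at comparable speed, and supports additional operations (getting all prefixes of a given string, getting all strings that start with a given string, and so on) that run about as fast as ordinary dictionary operations.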
It works on Python 2.6-3.3, supports Unicode, and is licensed under the LGPL.
datrie is a Cython wrapper for the libdatrie library. libdatrie implements a variant of the prefix tree in which the state machine of transitions between nodes is stored in two special arrays. It also implements suffix compression: branches that do not fork are stored in a separate "tail" array. This is not the most cutting-edge kind of trie (HAT-trie and similar structures should be faster), but it is quite fast and efficient and has a good ready-made implementation (and in practice an implementation can kill an algorithm).
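To make the "two arrays" idea concrete, here is a minimal pure-Python sketch of a double-array trie. It is an illustration only, not libdatrie's code: the names `build_double_array` and `contains`, the `END` marker, and the first-fit placement strategy are assumptions made for this example, and suffix ("tail") compression is left out.

```python
import string

END = 1  # code reserved for the end-of-key marker (hypothetical convention)


def build_double_array(keys, alphabet):
    """Pack a trie into two parallel arrays, `base` and `check`."""
    code = {ch: i + 2 for i, ch in enumerate(alphabet)}  # character codes start at 2
    # Step 1: build an ordinary nested-dict trie keyed by character codes.
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(code[ch], {})
        node.setdefault(END, {})
    # Step 2: place every node into the arrays. State 1 is the root.
    base, check = [0, 0], [-1, -1]  # check[t] == -1 means slot t is free
    pending = [(1, root)]
    while pending:
        state, node = pending.pop()
        if not node:
            continue
        codes = sorted(node)
        b = 1
        while True:  # first-fit search for an offset where all children fit
            top = b + codes[-1]
            while len(check) <= top:
                base.append(0)
                check.append(-1)
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1
        base[state] = b
        for c in codes:
            check[b + c] = state  # slot b + c now belongs to `state`
        pending.extend((b + c, node[c]) for c in codes)
    return base, check, code


def contains(base, check, code, key):
    """Follow transitions t = base[s] + c, valid only if check[t] == s."""
    state = 1
    for c in [code.get(ch) for ch in key] + [END]:
        if c is None:  # character outside the alphabet
            return False
        t = base[state] + c
        if t >= len(check) or check[t] != state:
            return False
        state = t
    return True


base, check, code = build_double_array(["foo", "foobar", "bar"], string.ascii_lowercase)
print(contains(base, check, code, "foo"))  # True
print(contains(base, check, code, "fo"))   # False
```

Each lookup step is one addition and one comparison, and no per-node Python objects are needed once the arrays are built, which is where the memory savings of this layout come from.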
The existing options for trie-like structures in Python did not suit me. Pure-Python implementations inevitably consume a lot of memory, so I ruled them out immediately. Other implementations of trie-like structures for Python:
- trie.c from biopython. It does not support Unicode (which is fine) and does not work on 3.x (to make it work there, the wrapper would have to be properly rewritten, because the library is implemented as a "bare" C extension).
- buriy's github.com/buriy/python-chartrie. It is very fast (__getitem__ ends up faster than in datrie), needs a lot of memory (fixing that would probably mean changing the data structure, that is, the foundation of the library), is functional, and does not support Unicode (though that is not a problem).
- www.dalkescientific.com/Python/PyJudy.html. Old and complex, and it is unclear whether it still works (it was written to support Python 2.3 and 2.4).
There may be something else out there, but in any case I could not find an option that "works well and suits everyone". So, under the pretext of wanting to learn C and try out Cython, I wrote my own trie implementation for Python.
Installation (as usual when installing extensions, a compiler is required):
pip install datrie
Creation:
import string
import datrie
trie = datrie.Trie(string.ascii_lowercase)
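When creating a trie, you have to say up front which keys can be used with it, by explicitly specifying an alphabet or character ranges. This is a libdatrie limitation, and it is what lets the library store the state machine efficiently while still supporting Unicode: first the ranges of valid Unicode characters are declared, and then Unicode keys are translated into a more compact internal representation.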
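The translation from declared ranges to a compact representation can be sketched roughly as follows. This is only an illustration of the idea: `build_alpha_map` and `encode` are hypothetical names, and the exact numbering libdatrie uses internally may differ.

```python
def build_alpha_map(ranges):
    """Return a function that maps a key to a tuple of small integer codes.

    `ranges` is a list of (first_char, last_char) pairs, both inclusive.
    Codes are assigned consecutively across ranges, starting at 1.
    """
    table = []
    next_code = 1
    for first, last in ranges:
        lo, hi = ord(first), ord(last)
        table.append((lo, hi, next_code))
        next_code += hi - lo + 1

    def encode(key):
        codes = []
        for ch in key:
            cp = ord(ch)
            for lo, hi, first_code in table:
                if lo <= cp <= hi:
                    codes.append(first_code + cp - lo)
                    break
            else:
                # Assumption for this sketch: reject characters outside the alphabet.
                raise KeyError("character %r is outside the declared ranges" % ch)
        return tuple(codes)

    return encode


# Latin a-z plus Cyrillic U+0430..U+044F: 58 codes instead of the full Unicode space.
encode = build_alpha_map([("a", "z"), ("\u0430", "\u044f")])
print(encode("az"))  # (1, 26)
```

With the alphabet squeezed into a small contiguous range of codes, the transition arrays stay dense even when the keys mix several scripts.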
In practice, I think it would be possible to get by with a single-byte encoding such as cp1251 and achieve roughly the same functionality and efficiency, but the character-range approach also works well, and that is how libdatrie does it. This is why I wrote "does not support Unicode, which is fine" above: for a trie, a single-byte encoding is a perfectly workable option.
After that, you can work with the trie as if it were a dictionary:
>>> trie[u'foo'] = 5
>>> trie[u'foobar'] = 10
>>> trie[u'bar'] = 'bar value'
>>> trie.setdefault(u'foobar', 15)
10
>>> u'foo' in trie
True
>>> trie[u'foo']
5
Keys must be Unicode :) In Python 3.x this is perfectly natural; in 2.x the examples need the letter u, sorry about that. Values can be arbitrary Python objects, which is unusual for C trie implementations (values are usually integers). In fact, "inside" the values really are integers, but datrie.Trie uses them as indices into an array of the "real" values. For those who do not need this feature (for example, when the values are of no interest at all), datrie has datrie.BaseTrie, which is slightly faster but can store only numbers.
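The "integers inside, real values in an array" trick can be sketched like this. It is not datrie's actual code: `ObjectValueMap` is a hypothetical name, and a plain dict stands in for the integer-only trie.

```python
class ObjectValueMap:
    """Store arbitrary objects on top of a mapping that only holds integers."""

    def __init__(self, int_map=None):
        # Stand-in for an integer-only structure; here simply a dict.
        self._index = {} if int_map is None else int_map
        self._values = []  # the "real" values, addressed by integer index

    def __setitem__(self, key, value):
        if key in self._index:
            # Existing key: overwrite the slot, the stored integer stays the same.
            self._values[self._index[key]] = value
        else:
            # New key: append the value and remember its position.
            self._index[key] = len(self._values)
            self._values.append(value)

    def __getitem__(self, key):
        return self._values[self._index[key]]

    def __contains__(self, key):
        return key in self._index


m = ObjectValueMap()
m["foo"] = 5
m["bar"] = "bar value"
m["foo"] = [1, 2]  # reuses the slot allocated for "foo"
print(m["foo"])    # [1, 2]
```

The extra indirection is the price of storing arbitrary objects, which is consistent with the integer-only variant being slightly faster.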
A little about speed. All measurements were made on a trie with 100 thousand unique Russian and English words (50/50) and the int value 1. Other measurements (on 1 million URLs) are also available, or, better still, you can run them on your own data. I measured speed only to get a general idea of the orders of magnitude and to track regressions in the library, so treat the numbers accordingly. All source code and data are in the repository. I am also not writing anything here about the asymptotic complexity of the various operations, because I have not investigated it. In theory it should match what Wikipedia says for a trie (for example, getting an element or searching by prefix is O(m), where m is the key length), but implementation details (of both libdatrie and the wrapper) can change everything. If someone produces proper graphs, I will of course be grateful.
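A rough way to take order-of-magnitude measurements of this kind on your own data is sketched below. The helper `ops_per_second` is a hypothetical name, not part of datrie or of the benchmarks in the repository; it works with any mapping that supports `mapping[key]`, and a plain dict is used here as the baseline.

```python
import timeit


def ops_per_second(mapping, keys, repeat=3):
    """Estimate how many successful lookups per second `mapping` sustains."""

    def lookup_all():
        for key in keys:
            mapping[key]

    # Take the best of several runs to reduce noise from other processes.
    best = min(timeit.repeat(lookup_all, number=1, repeat=repeat))
    return len(keys) / best


# Synthetic keys for illustration; replace them with your own words or URLs.
words = ["word%d" % i for i in range(100000)]
baseline = dict.fromkeys(words, 1)
print("dict lookups/sec: %.0f" % ops_per_second(baseline, words))
```

Passing a trie built from the same keys instead of `baseline` gives a directly comparable number.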
Getting an element, checking membership (in), and updating an element are on average 2-3 times slower than with a standard dict (that is, on the trie described above, 1 to 3 million operations per second). The exception is inserting a new value into the trie, which is much slower (about 50 thousand operations per second on the same trie of Russian and English words). On the other hand, on this kind of data a trie takes far less RAM: 3 to 5 MB (depending on the interpreter) versus 20+ MB for a regular dict (I measured memory crudely and cannot vouch for the specific numbers).
So datrie.Trie can be used as a replacement for dict when you have a large number of not-very-long strings (words, URLs, and so on), the data is used mostly in read-only (or "update-only") mode, and you are willing to trade a 2-3 times slowdown in access for savings in RAM.
To make this pseudo-read-only nature less of a nuisance, a trie can be saved to a file and loaded from a file:
>>> trie.save('my.trie')
>>> trie2 = datrie.Trie.load('my.trie')
Another feature of a trie is that some operations which, for a dictionary (or any other hash table), require either a full scan or a lot of memory for additional indexes run in a prefix tree at about the same speed as simply getting a value (or even faster).
You can check whether the trie contains any elements whose keys start with a given prefix (this is even faster than a simple membership check):
>>> trie.has_keys_with_prefix(u'fo') True
You can find all prefixes of a given string that are present in the trie (according to the tests this is slower: 500-600 thousand operations per second):
>>> trie.prefixes(u'foobarbaz')
[u'foo', u'foobar']
>>> trie.prefix_items(u'foobarbaz')
[(u'foo', 5), (u'foobar', 10)]
You can find all elements whose keys start with a given string:
>>> trie.keys(u'fo')
[u'foo', u'foobar']
>>> trie.items(u'fo')
[(u'foo', 5), (u'foobar', 10)]
>>> trie.values(u'foob')
[10]
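In the last example, most of the time is spent building the result rather than searching, and this can probably be optimized. Even so, the speed is about 150-300 thousand values per second (for example, with a prefix of length 7 and an average of 3 values per query, that comes to about 70 thousand operations per second).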
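For comparison, here is roughly what the same three prefix queries look like on a plain dict. The function names are hypothetical stand-ins for this sketch, not datrie methods. Two of the three have to scan every key, which is exactly the cost a trie avoids.

```python
data = {"foo": 5, "foobar": 10, "bar": "bar value"}


def has_keys_with_prefix(d, prefix):
    # Full scan in the worst case: every key may have to be inspected.
    return any(key.startswith(prefix) for key in d)


def prefixes(d, s):
    # No scan needed, but one slice and one hash lookup per prefix of s.
    return [s[:i] for i in range(1, len(s) + 1) if s[:i] in d]


def keys_with_prefix(d, prefix):
    # Always a full scan; sorted to give a stable order.
    return sorted(key for key in d if key.startswith(prefix))


print(has_keys_with_prefix(data, "fo"))  # True
print(prefixes(data, "foobarbaz"))       # ['foo', 'foobar']
print(keys_with_prefix(data, "fo"))      # ['foo', 'foobar']
```

On a dict with 100 thousand keys, the two scanning functions touch all 100 thousand keys per query, whereas a trie only walks the characters of the prefix and then the matching subtree.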
datrie.Trie has a number of other methods. You can read more about them, and see additional speed measurements, in the README in the repository.
Under pypy, everything ran on Debian (10 times slower than under CPython). On a Mac it did not run under pypy (under regular Python it should work on Mac, Linux, and Windows). C API extensions are always slow under pypy because they go through the cpyext crutch, and Cython generates code for the C extension API. The wrapper could be written with ctypes, but then it would be slow under regular Python (and it is not a given that it would be fast under pypy), and distributing ctypes extensions is inconvenient. The pypy developers are looking at cffi and promise that it will be fast (under both CPython and pypy). Cython may also learn someday to generate cffi extensions instead of C API extensions. Maybe then happiness will come. In the meantime, I do not know what to do about pypy, and most likely I will do nothing. Apparently it all works on Linux, and that is good enough.
During implementation I ran into the utf_32_le codec being slow in Python. There is a bug report for this (bugs.python.org/issue15027), but the fix has not been committed yet. Initially all datrie operations were 10 times slower, but in one place it turned out to be possible to avoid encoding the string with the standard Python utf_32_le codec, and everything sped up. The codec is still used in a few places, so I think speeding it up could give up to a 2x speedup for some operations in the trie.
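To show what this codec produces, here is a small sketch. The assumption (not stated above) is that the C side wants each character as a fixed-width 32-bit code unit; `to_code_units` is a hypothetical helper, not part of datrie.

```python
import struct


def to_code_units(key):
    """Encode a str as UTF-32 little-endian and unpack it into integers."""
    raw = key.encode("utf_32_le")  # 4 bytes per character, no BOM
    return struct.unpack("<%dI" % (len(raw) // 4), raw)


print(len("foo".encode("utf_32_le")))  # 12
print(to_code_units("foo"))            # (102, 111, 111)

# One way to get the same numbers without going through the codec at all:
print(tuple(map(ord, "foo")) == to_code_units("foo"))  # True
```

The last line only illustrates that the encoded form is just the sequence of code points; it is not a claim about how datrie itself bypasses the codec.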
Iterating over the tree is currently not as efficient as it could be, which is tied to limitations of the libdatrie interface. However, the author of libdatrie is a great guy and intends to fix everything, so the outlook is not bad.
As always, bug reports, ideas, benchmarks, pull requests, and so on are very welcome!
github / bitbucket, whichever is more convenient.