Unfortunately, in production pcre still turned out to be the bottleneck, and the rule set is slow on large messages. For example, on a megabyte-sized message pcre turned out to check about 1 gigabyte (!) of text. Various tricks, such as limiting the amount of text passed to the regular expressions, hurt the rules. Optimizing pcre by aggressively using the JIT fast path via pcre_jit_exec turned out to be quite dangerous: some old expressions that, for example, "bite" UTF-8 characters in half produced reproducible bugs that corrupted the program stack. However, at the HighLoad conference I talked to Vyacheslav Olkhovchenkov, and he advised me to look at Hyperscan. Below I describe the key points and what came of it.
Hyperscan in brief
Hyperscan is a project with a fairly long history; it was created to sell Deep Packet Inspection (DPI) solutions. Last October, however, Intel decided to open the source code, and under a BSD license at that. The project is written in C++ and uses Boost quite heavily, which causes certain portability problems (more on that later). Internally, Hyperscan is a non-backtracking regular expression engine based on nondeterministic finite automata (NFA). As a rule, all modern performance-oriented regular expression engines are built on the same theory and give up backtracking entirely, which is quite reasonable if you want to guarantee linear search time. For me, the most important Hyperscan feature was the ability to run many regular expressions over a text simultaneously. The naive way to do the same with pcre is to join several expressions with | into a kind of "sausage":
(?:re1)|(?:re2)|...|(?:reN)
Unfortunately, this approach does not work, because the task is to find all occurrences of every expression in the set, not just the first expression that happens to match. Hyperscan works differently: for each expression it finds, it calls a callback function with the position in the text (only the position of the last character of the occurrence, though). This is shown in the figure.
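The difference between the "sausage" and the callback model can be sketched with nothing but the standard library. This is only an emulation under stated assumptions: `scan_all`, `count_matches` and `MatchCallback` are hypothetical names, `std::regex` stands in for the real engine, and each pattern is run separately, whereas a real multi-pattern engine makes a single pass over the text.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical callback type: (pattern id, end offset of the occurrence).
using MatchCallback = std::function<void(std::size_t id, std::size_t end)>;

// Naive emulation of "report every occurrence of every pattern".
// Each pattern is run separately here; a real multi-pattern engine
// would produce the same events in one pass over the text.
void scan_all(const std::vector<std::regex>& patterns,
              const std::string& text,
              const MatchCallback& on_match) {
    for (std::size_t id = 0; id < patterns.size(); ++id) {
        auto it = std::sregex_iterator(text.begin(), text.end(), patterns[id]);
        for (; it != std::sregex_iterator(); ++it) {
            // position() is the start of the occurrence; adding length()
            // gives the end offset, which is all the callback receives.
            on_match(id, static_cast<std::size_t>(it->position() + it->length()));
        }
    }
}

// Counts how many times each pattern fired.
std::vector<int> count_matches(const std::vector<std::string>& sources,
                               const std::string& text) {
    std::vector<std::regex> patterns;
    for (const auto& s : sources) patterns.emplace_back(s);
    std::vector<int> counts(sources.size(), 0);
    scan_all(patterns, text, [&](std::size_t id, std::size_t) { ++counts[id]; });
    return counts;
}
```

With the patterns `foo`, `o+` and `bar`, every pattern gets its own count, while a single alternation `(?:foo)|(?:o+)|(?:bar)` would never attribute anything to `o+` on the text "foo bar foo", because `foo` consumes those characters first.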
Hyperscan architecture
Hyperscan consists of two parts: the compiler and the engine proper. The compiler is a very large part of the library. It takes a set of expressions, builds an NFA for them, and translates that NFA into low-level code that uses vector instructions such as AVX2. This task is hard and resource-intensive, so the compiler can serialize the resulting code for later use. The engine is a small library that actually runs the NFA over a given text. Applications that only use precompiled expression sets can link against the separate libhs_runtime library, which is much smaller than libhs, the combined compiler + engine library. First I tried to build Hyperscan. This turned out to be quite easy, provided the system has a reasonably recent Boost (minimum version 1.57). Perhaps the only remark is that a build with debug symbols is huge: about 200 MB on my MacBook. And since only static libraries are built by default, binaries linked against it also end up at around 200 MB. If you do not need debug symbols, I recommend building Hyperscan with -DCMAKE_BUILD_TYPE=MinSizeRel passed to cmake at the configuration stage.
Test code
To begin with, I wrote a very rough prototype that compares Hyperscan and pcre; you can see it under the spoiler (be warned: this is prototype code, written in a hurry and with no claim to quality).
pcre and Hyperscan comparison code
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <stdexcept>
#include <algorithm>
#include <set>
#include <cstdlib>
#include "pcre.h"
#include "hs.h"
#include <time.h>
#ifdef __APPLE__
#include <mach/mach_time.h>
#endif

using namespace std;

// Monotonic clock in seconds
double get_ticks(void)
{
    double res;
#if defined(__APPLE__)
    // Assumes one mach tick is one nanosecond
    res = mach_absolute_time() / 1000000000.;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    res = (double)ts.tv_sec + ts.tv_nsec / 1000000000.;
#endif
    return res;
}

// A compiled and JIT-studied pcre pattern
struct pcre_regexp {
    pcre* re;
    pcre_extra* extra;

    pcre_regexp(const string& pattern)
    {
        const char* err = NULL;
        int err_off;
        re = pcre_compile(pattern.c_str(), PCRE_NEWLINE_ANYCRLF, &err, &err_off, NULL);
        if (re == NULL) {
            throw invalid_argument(string("cannot compile: '") + pattern + "' error: " + err + " at offset: " + to_string(err_off));
        }
        extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &err);
        // pcre_study may return NULL without an error; only a set err is a failure
        if (extra == NULL && err != NULL) {
            throw invalid_argument(string("cannot study: '") + pattern + "' error: " + err);
        }
    }
};

struct cb_context {
    set<int> approx_re;            // ids compiled with HS_FLAG_PREFILTER
    vector<pcre_regexp> pcre_vec;  // pcre versions of all patterns, indexed by id
};

struct cb_data {
    struct cb_context* ctx;
    vector<int> matched;           // match counter per pattern id
    const std::string* str;        // the scanned text
};

// Tries to compile a single pattern with Hyperscan. Patterns that only
// compile in prefilter mode are remembered in ctx->approx_re.
// Returns true if the pattern cannot be compiled at all.
bool remove_uncompileable(const string& s, int id, struct cb_context* ctx)
{
    hs_compile_error_t* hs_errors = NULL;
    hs_database_t* hs_db = NULL;

    if (hs_compile(s.c_str(), HS_FLAG_ALLOWEMPTY, HS_MODE_BLOCK, NULL, &hs_db, &hs_errors) != HS_SUCCESS) {
        cout << "pattern: '" << s << "', error: " << hs_errors->message << endl;
        hs_free_compile_error(hs_errors);
        if (hs_compile(s.c_str(), HS_FLAG_ALLOWEMPTY | HS_FLAG_PREFILTER, HS_MODE_BLOCK, NULL, &hs_db, &hs_errors) != HS_SUCCESS) {
            cout << "completely bad pattern: '" << s << "', error: " << hs_errors->message << endl;
            hs_free_compile_error(hs_errors);
            return true;
        }
        else {
            ctx->approx_re.insert(id);
            hs_free_database(hs_db);
        }
    }
    else {
        hs_free_database(hs_db);
    }
    return false;
}

// Counts non-overlapping pcre matches of re in [begin, begin + sz)
static int count_pcre_matches(const pcre_regexp& re, const char* begin, size_t sz)
{
    int ovec[3], n = 0;
    const char* p = begin;

    // A return value of 0 still means a match (ovec was just too small for the captures)
    while (pcre_exec(re.re, re.extra, p, (int)(sz - (p - begin)), 0, 0, ovec, 3) >= 0) {
        // ovec[1] is the end of the match relative to p; step at least one
        // byte so that an empty match cannot loop forever
        p += ovec[1] > 0 ? ovec[1] : 1;
        n++;
        if ((size_t)(p - begin) >= sz) {
            break;
        }
    }
    return n;
}

// Hyperscan match callback
int match_cb(unsigned int id, unsigned long long from, unsigned long long to, unsigned int flags, void* context)
{
    auto cbdata = (struct cb_data*)context;
    auto& matched = cbdata->matched;

    if (cbdata->ctx->approx_re.find(id) != cbdata->ctx->approx_re.end()) {
        // Prefiltered pattern: the hit may be a false positive, confirm it with pcre
        const auto& re = cbdata->ctx->pcre_vec[id];
        matched[id] += count_pcre_matches(re, cbdata->str->data(), cbdata->str->size());
    }
    else {
        matched[id]++;
    }
    return 0;
}

int main(int argc, char** argv)
{
    if (argc < 3) {
        cerr << "usage: " << argv[0] << " <regexp file> <input file>" << endl;
        return 1;
    }

    ifstream refile(argv[1]);
    vector<string> re_vec;
    double t1, t2, total_ticks = 0;
    struct cb_context ctx;
    int ls;

    pcre_config(PCRE_CONFIG_LINK_SIZE, &ls);
    cout << ls << endl;

    // One regular expression per line
    for (std::string line; std::getline(refile, line);) {
        re_vec.push_back(line);
    }

    string re_pipe;
    const char** pats = new const char*[re_vec.size()];
    unsigned int i = 0, *ids = new unsigned int[re_vec.size()];
    //re_vec.erase(remove_if(re_vec.begin(), re_vec.end(), remove_uncompileable), re_vec.end());

    for (i = 0; i < re_vec.size(); i++) {
        const auto& r = re_vec[i];
        remove_uncompileable(r, i, &ctx);
        pats[i] = r.c_str();
        ids[i] = i;
        re_pipe = re_pipe + string("(") + r + string(")|");
    }
    // Last |
    if (!re_pipe.empty()) {
        re_pipe.erase(re_pipe.size() - 1);
    }

    // Compile every pattern with pcre
    total_ticks = 0;
    for (const auto& r : re_vec) {
        t1 = get_ticks();
        ctx.pcre_vec.emplace_back(r);
        t2 = get_ticks();
        total_ticks += t2 - t1;
    }
    cout << "PCRE compile time: " << total_ticks << endl;

    // Read the whole input file into memory
    ifstream input(argv[2]);
    std::string in_str((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());

    hs_compile_error_t* hs_errors;
    hs_database_t* hs_db;
    hs_platform_info_t plt;
    hs_populate_platform(&plt);

    unsigned int* flags = new unsigned int[re_vec.size()];
    for (i = 0; i < re_vec.size(); i++) {
        if (ctx.approx_re.find(i) != ctx.approx_re.end()) {
            flags[i] = HS_FLAG_PREFILTER;
        }
        else {
            flags[i] = 0;
        }
    }

    // Compile the whole set with Hyperscan
    t1 = get_ticks();
    if (hs_compile_multi(pats, flags, ids, re_vec.size(), HS_MODE_BLOCK, &plt, &hs_db, &hs_errors) != HS_SUCCESS) {
        // expression is negative when the error is not tied to one pattern
        if (hs_errors->expression >= 0) {
            cout << "BAD pattern: '" << re_vec[hs_errors->expression] << "', ";
        }
        cout << "error: " << hs_errors->message << endl;
        return -101;
    }
    t2 = get_ticks();
    cout << "Hyperscan compile time: " << (t2 - t1) << "; approx re: " << ctx.approx_re.size() << "; total re: " << re_vec.size() << endl;

    // Serialize and deserialize the database to measure the cost of caching
    char* bytes = NULL;
    size_t bytes_len;
    t1 = get_ticks();
    if (hs_serialize_database(hs_db, &bytes, &bytes_len) != HS_SUCCESS) {
        cout << "BAD" << endl;
        return -101;
    }
    t2 = get_ticks();
    cout << "Hyperscan serialize time: " << (t2 - t1) << "; size: " << bytes_len << " bytes" << endl;

    hs_database_t* hs_db1 = NULL;
    t1 = get_ticks();
    if (hs_deserialize_database(bytes, bytes_len, &hs_db1) != HS_SUCCESS) {
        cout << "BAD1" << endl;
        return -101;
    }
    t2 = get_ticks();
    cout << "Hyperscan deserialize time: " << (t2 - t1) << "; size: " << bytes_len << " bytes" << endl;

    // Baseline: run every pcre expression over the text, one by one
    auto matches = 0;
    total_ticks = 0;
    for (const auto& re : ctx.pcre_vec) {
        t1 = get_ticks();
        matches += count_pcre_matches(re, in_str.data(), in_str.size());
        t2 = get_ticks();
        total_ticks += t2 - t1;
    }
    //cout << re_pipe << endl;
    cout << "Time for individual re: " << total_ticks << "; matches: " << matches << endl;
    //cout << "Time for piped re: " << (t2 - t1) << endl;

    // Hyperscan: one pass over the text for the whole set
    hs_scratch_t* scratch = NULL;
    int rc;
    if ((rc = hs_alloc_scratch(hs_db1, &scratch)) != HS_SUCCESS) {
        cout << "bad scratch: " << rc << endl;
        return -102;
    }

    struct cb_data cbdata;
    cbdata.ctx = &ctx;
    cbdata.matched = vector<int>(re_vec.size(), 0);
    cbdata.str = &in_str;

    t1 = get_ticks();
    if ((rc = hs_scan(hs_db1, in_str.data(), in_str.size(), 0, scratch, match_cb, &cbdata)) != HS_SUCCESS) {
        cout << "bad scan: " << rc << endl;
        return -103;
    }
    t2 = get_ticks();

    matches = 0;
    for_each(cbdata.matched.begin(), cbdata.matched.end(), [&matches](int elt) { matches += elt; });
    cout << "Time for hyperscan re: " << (t2 - t1) << "; matches: " << matches << endl;

    hs_free_scratch(scratch);
    hs_free_database(hs_db1);
    hs_free_database(hs_db);
    free(bytes);
    delete[] flags;
    delete[] ids;
    delete[] pats;

    return 0;
}
The results were quite impressive. On a one-megabyte spam message and a set of about 1000 regular expressions I got the following:
PCRE compile time: 0.0138553
Hyperscan compile time: 4.94309; approx re: 191; total re: 971
Hyperscan serialize time: 0.00312218; size: 5242956 bytes
Hyperscan deserialize time: 0.00359501; size: 5242956 bytes
Time for individual re: 0.440707; matches: 7
Time for hyperscan re: 0.0770988; matches: 7
Prefilter
One of the most impressive Hyperscan features is its ability to work as a prefilter. This mode is useful when an expression contains unsupported constructs, for example that same backtracking. In this mode Hyperscan builds an expression that is a guaranteed superset of the given unsupported expression. In other words, the new expression is guaranteed to fire in every case where the original one fires, but it may also fire in other cases and produce false positives. The result therefore has to be confirmed with a traditional regular expression engine such as pcre (although in that case there is no need to run it over the whole text, only from the start of the text up to the position of the occurrence). This is shown clearly in the figure.
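A minimal sketch of the "prefilter, then confirm" idea, using only `std::regex`. Everything here is an assumption made for illustration: `PrefilteredPattern` and `confirmed_match` are hypothetical names, the loose pattern is written by hand rather than derived automatically, and `std::regex` plays the role of both engines.

```cpp
#include <regex>
#include <string>

// Hypothetical pair: an exact pattern that needs backtracking (here a
// backreference) and a looser pattern that matches a superset of it.
struct PrefilteredPattern {
    std::regex exact;  // stands in for the pcre expression
    std::regex loose;  // stands in for the derived superset expression
};

// Returns true only if the exact pattern really matches. The loose
// pattern is tried first; the expensive exact check runs only when the
// loose pattern fired, and only on the text from the beginning up to
// the end of that hit.
bool confirmed_match(const PrefilteredPattern& p, const std::string& text) {
    auto from = text.begin();
    std::smatch m;
    while (from != text.end() && std::regex_search(from, text.end(), m, p.loose)) {
        auto hit_end = m[0].second;
        if (std::regex_search(text.begin(), hit_end, p.exact)) {
            return true;
        }
        // Restart one character after the start of the hit, so that
        // overlapping hits of the loose pattern are not skipped.
        from = m[0].first + 1;
    }
    return false;
}
```

For example, with the exact pattern `(\w+) \1` (a repeated word) and the loose pattern `\w+ \w+` (any two words), the text "hello world" triggers the loose pattern but is rejected by the confirmation step, while "hello hello world" is accepted.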
Compilation problems
Unfortunately, two unpleasant things came to light. The first is compile time: compared with pcre it is unrealistically long. The second is that some expressions simply hang when compiled in prefilter mode. The simplest "nasty" expression is, for example, this one:
<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\#]{8,29}[^>"'\#:\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}
As a result, before compiling the set, every prefilter expression was first checked in a separate fork of the process. If compiling an expression took too long, that process was killed and the expression was marked as hopeless, meaning that pcre is always used for it. About 4000 of these expressions were found. All of them come from my beloved SpamAssassin and are a very characteristic product of the disease known as "Perl on the brain". After some communication with the Intel engineers they fixed the infinite compilation, but the regular expression above still takes about a minute to compile, which is unacceptable for practical purposes.
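The "compile in a fork and kill it if it takes too long" check can be sketched with plain POSIX calls. This is an illustration under assumptions, not the rspamd implementation: `finishes_in_time` is a hypothetical helper, the job is an arbitrary callable standing in for the expression compile, and the polling interval is arbitrary.

```cpp
#include <chrono>
#include <csignal>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `job` in a forked child and reports whether it finished
// successfully within `limit`. If the deadline passes, the child is
// killed and false is returned, so the caller can mark the expression
// as hopeless and keep it on the slow path.
bool finishes_in_time(const std::function<bool()>& job,
                      std::chrono::milliseconds limit) {
    pid_t pid = fork();
    if (pid < 0) {
        return false;                      // fork failed: treat as not compilable
    }
    if (pid == 0) {
        _exit(job() ? 0 : 1);              // child: the exit code carries the result
    }
    auto deadline = std::chrono::steady_clock::now() + limit;
    int status = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;
        }
        if (r < 0) {
            return false;                  // waitpid error
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(pid, SIGKILL);            // took too long: kill the child
            waitpid(pid, &status, 0);      // reap it to avoid a zombie
            return false;
        }
        usleep(5000);                      // poll every 5 ms
    }
}
```

Doing the work in a child process rather than a thread means a compile that never returns can be terminated cleanly, without leaving the parent in an inconsistent state.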
To make Hyperscan work, it turned out that the set of expressions has to be split into so-called "classes", where a class is the type of input text that the expressions in the set are checked against. These classes are shown in the figure.
The following approach is used for compilation:
- In a separate process, start checking the expression classes and, for each class, see whether a valid compiled file already exists.
- If there is no such file, or it contains the wrong set of expressions (checked by a hash of the expression patterns), compile it, checking the compile time of the prefilter expressions.
- Once all expressions have been compiled into the cache, send a message to all scanner processes; on receiving it, the scanners load the cached expressions and switch from pcre to Hyperscan.
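The cache validity check from the steps above can be sketched as follows. The source only says that a hash of the expression patterns is used; FNV-1a is chosen here purely because it is tiny and dependency-free, and `class_digest` and `cache_is_valid` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a, 64 bit. An assumption for illustration: the article does not
// say which hash function is actually used.
std::uint64_t fnv1a(std::uint64_t h, const std::string& s) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Digest of a whole expression class: every pattern followed by a
// terminator byte, so that {"ab", "c"} and {"a", "bc"} hash differently.
std::uint64_t class_digest(const std::vector<std::string>& patterns) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (const auto& p : patterns) {
        h = fnv1a(h, p);
        h = fnv1a(h, std::string(1, '\0'));
    }
    return h;
}

// A cached file is usable only if the digest stored with it matches
// the digest of the patterns that are configured right now.
bool cache_is_valid(std::uint64_t stored_digest,
                    const std::vector<std::string>& patterns) {
    return stored_digest == class_digest(patterns);
}
```

Storing the digest next to the serialized database means that any change to a rule, however small, invalidates the file and triggers a recompile of that class only.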
With this approach there is no need to wait for the long and expensive Hyperscan compilation: mail checking starts right after startup (using pcre), and the scanner switches from pcre to Hyperscan as soon as compilation is finished, on the spot, so to speak. In addition, because bitsets are used for the checked and the matched regular expressions, the switch does not disturb messages that are already being scanned.
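One way to read the bitset remark is sketched below. It is an interpretation, not the actual rspamd data structure: `ScanState` and `check_once` are hypothetical names, and the backend is modelled as an arbitrary callable.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Per-message state: which expressions have already been evaluated and
// which of them matched. All names are hypothetical.
struct ScanState {
    std::vector<bool> checked;
    std::vector<bool> matched;
    explicit ScanState(std::size_t n) : checked(n, false), matched(n, false) {}
};

// Evaluates expression `id` at most once per message. `engine` is
// whichever backend is active at the moment of the call. Because the
// result is remembered in the bitsets, a backend switch between two
// calls for the same message cannot change an answer already given.
bool check_once(ScanState& st, std::size_t id,
                const std::function<bool(std::size_t)>& engine) {
    if (!st.checked[id]) {
        st.matched[id] = engine(id);
        st.checked[id] = true;
    }
    return st.matched[id];
}
```

A message that started under pcre keeps the results it already has, and only the expressions not yet checked are evaluated by whatever backend is active afterwards.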
Conclusion
Running rspamd + Hyperscan in production gave the following.
Before:
len: 610591, time: 2492.457ms real, 882.251ms virtual
regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
After:
len: 610591, time: 654.596ms real, 309.785ms virtual
regexp statistics: 34 pcre regexps scanned, 41 regexps matched, 8.41M bytes scanned using pcre, 9.56M bytes scanned total
The larger number of matched regular expressions in the Hyperscan version comes from the syntax tree optimizations that are performed for pcre but are useless with Hyperscan, since all expressions are checked simultaneously.
The Hyperscan version is already in production and is included in the new version of rspamd. I can confidently recommend Hyperscan for performance-critical projects that check regular expressions (DPI, proxies and so on), as well as for applications that search for static strings in text.
For the latter task I compared Hyperscan with the Aho-Corasick algorithm, which is traditionally used for this purpose. I took the fastest implementation I know of, by Mischa Sandberg.
Comparison results for 10,000 static strings on a megabyte message that contains many of those strings (that is, the worst case for Aho-Corasick, whose complexity is O(M + N), where M is the number of strings found):
ac-trie compile time: 0.0743811
Hyperscan compile time: 0.1547; approx re: 0; total re: 7400
Hyperscan serialize time: 0.000178297; size: 1250856 bytes
Hyperscan deserialize time: 0.000312379; size: 1250856 bytes
Time for hyperscan re: 0.117938; matches: 3001024
Time for ac-trie: 0.100427; matches: 3000144
Unfortunately, the match counts do not agree because of a bug in the ac-trie code, whereas Hyperscan made no mistakes.
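For readers unfamiliar with the baseline, here is a compact textbook Aho-Corasick automaton that counts all occurrences of a set of static strings. It is a plain teaching sketch, not the ac-trie implementation used in the benchmark, and it trades memory for simplicity by giving every node a full 256-entry transition table.

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

class AhoCorasick {
public:
    explicit AhoCorasick(const std::vector<std::string>& words) {
        nodes_.emplace_back();  // root
        for (const auto& w : words) insert(w);
        build_links();
    }

    // Counts all (possibly overlapping) occurrences of all words.
    // One table lookup per input byte, so the scan is linear in the text.
    std::size_t count(const std::string& text) const {
        std::size_t total = 0;
        int state = 0;
        for (unsigned char c : text) {
            state = nodes_[state].next[c];
            total += nodes_[state].hits;
        }
        return total;
    }

private:
    struct Node {
        std::array<int, 256> next;
        int fail = 0;
        std::size_t hits = 0;  // words ending at this state or at any of its suffixes
        Node() { next.fill(-1); }
    };
    std::vector<Node> nodes_;

    void insert(const std::string& w) {
        if (w.empty()) return;  // empty words are ignored
        int cur = 0;
        for (unsigned char c : w) {
            if (nodes_[cur].next[c] < 0) {
                nodes_[cur].next[c] = static_cast<int>(nodes_.size());
                nodes_.emplace_back();
            }
            cur = nodes_[cur].next[c];
        }
        ++nodes_[cur].hits;
    }

    // Breadth-first pass that fills in failure links and turns the trie
    // into a complete automaton (every state has all 256 transitions).
    void build_links() {
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int v = nodes_[0].next[c];
            if (v < 0) nodes_[0].next[c] = 0;
            else q.push(v);  // depth-1 nodes fail to the root
        }
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            nodes_[u].hits += nodes_[nodes_[u].fail].hits;
            for (int c = 0; c < 256; ++c) {
                int v = nodes_[u].next[c];
                int via_fail = nodes_[nodes_[u].fail].next[c];
                if (v < 0) {
                    nodes_[u].next[c] = via_fail;
                } else {
                    nodes_[v].fail = via_fail;
                    q.push(v);
                }
            }
        }
    }
};
```

With the words "he", "she", "his" and "hers", scanning "ushers" reports three occurrences ("she", "he" and "hers"), found in a single pass over the text.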
The material in this article is also available as a presentation. The code can be found in the rspamd project on GitHub.