
To satisfy that requirement, most distributed systems guarantee at-least-once delivery. The techniques for achieving at-least-once delivery usually amount to "retry, retry, and retry": a message is never considered delivered until a firm acknowledgement comes back from the consumer.
But as a user, at-least-once delivery isn't really what I want. I want messages to be delivered once. And only once.
Unfortunately, achieving anything close to exactly-once delivery requires a bulletproof design. Every failure case has to be considered carefully as part of the architecture; it can't simply be bolted onto an existing implementation after the fact. And even then, it is practically impossible to build a system in which messages are only ever delivered once.
Over the past three months we built an entirely new de-duplication system to get as close as possible to exactly-once delivery across a wide variety of failure modes.
The new system can track 100x as many messages as the old one, with better reliability and at a fraction of the cost. Here's how.
The problem
Most of Segment's back-end systems handle failures gracefully using retries, message redelivery, locking, and two-phase commits. But there is one notable exception: clients that send data directly to our public API.
Clients (particularly mobile clients) frequently suffer network interruptions. They may manage to send data but then miss the response from our API.
Imagine you're riding a bus, booking a room in the HotelTonight app on your iPhone. The app starts uploading data to Segment's servers, but the bus suddenly enters a tunnel and you lose connectivity. Some of the events you sent have already been processed, but the client never receives a response from the server.
In these cases the client retries, re-sending the same events to the Segment API even though the server technically already received those exact messages.
According to our server metrics, roughly 0.6% of the events we ingest within a four-week window are duplicates of messages we have already received.
That error rate may sound insignificant. But for an e-commerce app generating billions of dollars in revenue, a 0.6% discrepancy can mean the difference between a profit and a loss of millions of dollars.
De-duplicating messages
So we understand the heart of the problem: we have to remove duplicate messages sent to the API. But how?
At a conceptual level, the high-level API for any de-duplication system looks simple. In Python (aka pseudo-pseudocode), we could express it as follows:
```python
def dedupe(stream):
    for message in stream:
        if has_seen(message.id):
            discard(message)
        else:
            publish_and_commit(message)
```
For each message in the stream, we first check whether we have seen that particular message before. The check is keyed by the message's ID, which we assume to be unique. If we have seen it, we discard it. If it is new, we republish the message and commit it atomically.
To avoid storing every message forever, we keep a "de-duplication window." This is how long we store keys before expiring them. Keys for messages that do not reappear within the window are expired. The guarantee we want is that only one message with a given ID is ever sent within the window.
The behavior is easy to describe, but two aspects need special attention: read/write performance and correctness.
The system must de-duplicate the billions of events passing through our data stream, and it must do so with low latency and at low cost.
What's more, it must record durably which events it has seen, so that it can recover from a crash. It must also never emit duplicate messages in its output.
Architecture
To achieve this, we built a "two-phase" architecture. It reads data off Kafka and removes duplicate events already recorded within a four-week window.

The high-level de-duplication architecture
Kafka topology
To understand how this architecture works, we'll first look at the Kafka stream topology. Every incoming API call is split into individual messages, which are read off a Kafka input topic.
First, each incoming message is tagged with a unique messageId generated on the client side. In most cases this is a UUIDv4 (though we are considering switching to ksuids). If a client does not supply a messageId, we assign one automatically at the API layer.
We don't use vector clocks or sequence numbers because we want to keep client-side complexity low. Almost every major programming language supports UUIDs, so using them lets anyone send data to our API easily.
```json
{
  "messageId": "ajs-65707fcf61352427e8f1666f0e7f6090",
  "anonymousId": "e7bd0e18-57e9-4ef4-928a-4ccc0b189d18",
  "timestamp": "2017-06-26T14:38:23.264Z",
  "type": "page"
}
```
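The sketch below shows the API-layer fallback described above. It keeps a client-supplied messageId and otherwise generates a random version-4 UUID. `Message`, `newUUIDv4`, and `ensureMessageID` are hypothetical names, and only a few fields of the payload are modeled.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// Message mirrors a few fields of the JSON payload above.
type Message struct {
	MessageID   string `json:"messageId"`
	AnonymousID string `json:"anonymousId"`
	Type        string `json:"type"`
}

// newUUIDv4 builds a random UUID with the version-4 and RFC 4122 variant bits set.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// ensureMessageID keeps a client-supplied ID and assigns one only when it is missing.
func ensureMessageID(m *Message) error {
	if m.MessageID != "" {
		return nil
	}
	id, err := newUUIDv4()
	if err != nil {
		return err
	}
	m.MessageID = id
	return nil
}
```

A server-assigned ID is different on every retry. Only IDs generated by the client let a retried request be recognized as a duplicate.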
Individual messages are logged to Kafka for durability and replayability. They are partitioned by messageId, so we can be sure that the same messageId always arrives at the same consumer.
This is an important detail for our data processing. We don't search a central database of keys spanning billions of messages. Instead, we route each lookup to the right partition, which narrows the search space by orders of magnitude.
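Routing by messageId can be sketched as "hash the key, then take it modulo the partition count." The sketch uses FNV-1a from Go's standard library purely for illustration. Kafka's Java client uses murmur2 for keyed messages, so the partition numbers here would not match a real cluster. `partitionFor` is a hypothetical helper.

```go
package main

import "hash/fnv"

// partitionFor maps a messageId to one of numPartitions partitions.
// FNV-1a is an illustrative stand-in for Kafka's murmur2-based partitioner.
func partitionFor(messageID string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(messageID)) // writes to a hash.Hash never fail
	return int(h.Sum32() % uint32(numPartitions))
}
```

Whatever the hash, the property that matters is determinism. Retries of the same messageId land on the same partition, so each worker only ever consults its own local keys.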
The dedupe worker is a Go program that reads off the Kafka input partitions. It reads messages and checks whether they are duplicates. If a message is new, the worker sends it to the Kafka output topic.
In our experience, the worker and the Kafka topology are both very easy to manage. We no longer have a set of large Memcached instances that need failover replicas. Instead, we use embedded RocksDB databases. They need no coordination at all and give us persistent storage at very low cost. More on that below.
The RocksDB worker
Each worker stores a local RocksDB database on its local EBS hard drive. RocksDB is an embedded key-value store developed at Facebook and optimized for very high performance.
Whenever an event is consumed from an input partition, the consumer queries RocksDB to find out whether it has seen that event's messageId before.
If the message is not in RocksDB, the worker adds the key to RocksDB and then publishes the message to the Kafka output topic.
If the message is already in RocksDB, the worker does not publish it to the output topic. It only advances the offset of the input partition to acknowledge that the message has been processed.
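The worker's per-message logic might look roughly like the Go sketch below. `Store`, `handle`, and the `publish`/`commit` callbacks are hypothetical stand-ins for RocksDB, the Kafka output topic, and the input offset commit. None of them is a real client API.

```go
package main

// Store is a hypothetical stand-in for the worker's local RocksDB.
type Store interface {
	Has(id string) bool
	Put(id string) error
}

// mapStore is an in-memory Store used only for this illustration.
type mapStore map[string]bool

func (s mapStore) Has(id string) bool { return s[id] }

func (s mapStore) Put(id string) error {
	s[id] = true
	return nil
}

// Message is one event read from an input partition.
type Message struct {
	ID     string
	Offset int64
}

// handle processes one message. publish and commit are hypothetical callbacks
// for the Kafka output topic and the input-partition offset commit.
func handle(m Message, db Store, publish func(Message) error, commit func(int64) error) error {
	if !db.Has(m.ID) {
		// New message: record the key first, then publish it.
		if err := db.Put(m.ID); err != nil {
			return err
		}
		if err := publish(m); err != nil {
			return err
		}
	}
	// New or duplicate, the input offset advances only after handling.
	return commit(m.Offset)
}
```

The order matters here. The key is recorded before publishing, and the offset is committed last. A crash between those steps is the gap that the output-topic check described later in this post closes.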
Performance
To get high performance out of the database, we have to serve three kinds of queries for every event that passes through:
- Detecting whether random incoming keys exist, even though most of them are probably not in the database. These keys can fall anywhere in the keyspace.
- Writing new keys at high throughput.
- Aging out old keys that have fallen outside the "de-duplication window".
As a result, we have to continuously scan the entire database, append new keys, and expire old ones. Ideally, all of this happens within the same data model.

Our database has to satisfy three very different types of queries.
Generally speaking, most of the performance gains come from the database itself. It's worth understanding the RocksDB internals that make it so fast.
RocksDB is a log-structured merge-tree (LSM tree). It continuously appends new keys to a write-ahead log on disk. It also keeps the sorted keys in memory as part of a memtable.

Keys are sorted in memory as part of a memtable
Writing keys is a very fast process. New items are appended to the log on disk, for immediate persistence and crash recovery. The data entries are also sorted in memory, which provides both fast lookups and batched writes.
Whenever enough entries have accumulated in the memtable, it is persisted to disk as an SSTable (sorted string table). Because the entries are already sorted in memory, they can be flushed directly to disk.
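The Go code below is a toy model of the write path just described. It assumes string keys and values. Each write is appended to a log and inserted into a sorted memtable. When the memtable reaches a size limit, it is flushed unchanged as an immutable run. `toyLSM` is hypothetical and leaves out everything else RocksDB does, such as skiplists, real files, and concurrency.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

type entry struct{ key, value string }

// toyLSM models the LSM write path for illustration only.
type toyLSM struct {
	wal      bytes.Buffer // stands in for the on-disk write-ahead log
	memtable []entry      // always kept sorted by key
	limit    int          // flush threshold (number of entries)
	level0   [][]entry    // flushed runs, never modified after creation
}

func (t *toyLSM) Put(key, value string) {
	fmt.Fprintf(&t.wal, "%s\t%s\n", key, value) // 1. append-only log for recovery
	i := sort.Search(len(t.memtable), func(i int) bool { return t.memtable[i].key >= key })
	if i < len(t.memtable) && t.memtable[i].key == key {
		t.memtable[i].value = value // still in memory, so overwrite in place
	} else {
		t.memtable = append(t.memtable, entry{})
		copy(t.memtable[i+1:], t.memtable[i:])
		t.memtable[i] = entry{key, value} // 2. sorted insert into the memtable
	}
	if len(t.memtable) >= t.limit {
		t.flush()
	}
}

// flush writes the memtable out as-is. It is already sorted, so no sort is needed.
func (t *toyLSM) flush() {
	run := make([]entry, len(t.memtable))
	copy(run, t.memtable)
	t.level0 = append(t.level0, run)
	t.memtable = t.memtable[:0]
	t.wal.Reset() // flushed entries no longer need the log for recovery
}
```

Resetting the log after a flush mirrors the "Try to delete WAL files" step in the production log below.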

The current memtable is flushed to disk as a level-0 SSTable
Here is an example of such a flush from our production logs:
[JOB 40] Syncing log #655020
[default] [JOB 40] Flushing memtable with next log file: 655022
[default] [JOB 40] Level-0 flush table #655023: started
[default] [JOB 40] Level-0 flush table #655023: 15153564 bytes OK
[JOB 40] Try to delete WAL files size 12238598, prev total WAL file size 24346413, number of live WAL files 3.
Each SSTable is immutable: once created, it is never changed. This is what makes writing new keys so fast. No files need to be updated in place, so a write never rewrites existing data. Instead, multiple SSTables at the same "level" are merged into a new file during an out-of-band compaction phase.

When SSTables from one level are compacted, their keys are merged and the new file is promoted to a higher level. Our production logs show an example of such a compaction job. Here, job 41 compacts 4 level-0 files together with 4 level-1 files (the "4@0 + 4@1" in the log). It writes the merged keys out as larger level-1 tables.
/data/dedupe.db$ head -1000 LOG | grep "JOB 41"
[JOB 41] Compacting 4@0 + 4@1 files to L1, score 1.00
[default] [JOB 41] Generated table #655024: 1550991 keys, 69310820 bytes
[default] [JOB 41] Generated table #655025: 1556181 keys, 69315779 bytes
[default] [JOB 41] Generated table #655026: 797409 keys, 35651472 bytes
[default] [JOB 41] Generated table #655027: 1612608 keys, 69391908 bytes
[default] [JOB 41] Generated table #655028: 462217 keys, 19957191 bytes
[default] [JOB 41] Compacted 4@0 + 4@1 files to L1 => 263627170 bytes
Once compaction finishes, the newly merged SSTables become the definitive set of database records, and the old SSTables are unlinked.
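At its core, compaction is a merge of sorted runs. The Go sketch below assumes runs ordered oldest to newest, each holding a key at most once. It merges the runs pairwise and lets the newest value win when a key appears more than once. `kv`, `compact`, and `mergeTwo` are hypothetical names. Real RocksDB compaction also picks files per level and writes multiple output tables.

```go
package main

// kv is one record in a sorted run.
type kv struct{ key, value string }

// compact merges sorted runs, given oldest first, into one sorted run.
// When a key appears in several runs, the value from the newest run wins.
func compact(runs [][]kv) []kv {
	var out []kv
	for _, run := range runs {
		out = mergeTwo(out, run)
	}
	return out
}

// mergeTwo merges two runs sorted by key. On equal keys the newer record wins.
func mergeTwo(older, newer []kv) []kv {
	out := make([]kv, 0, len(older)+len(newer))
	i, j := 0, 0
	for i < len(older) && j < len(newer) {
		switch {
		case older[i].key < newer[j].key:
			out = append(out, older[i])
			i++
		case older[i].key > newer[j].key:
			out = append(out, newer[j])
			j++
		default: // same key: keep the newer value, drop the older one
			out = append(out, newer[j])
			i++
			j++
		}
	}
	out = append(out, older[i:]...)
	return append(out, newer[j:]...)
}
```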
On a production instance, we can see the write-ahead log being updated. We can also see the individual SSTables being written, read, and merged.

The log and the most recent SSTables account for most of the I/O
The SSTable statistics from a production server show four "levels" of files, with file sizes growing at each higher level.
```
** Compaction Stats [default] **
Level  Files  Size(MB)  Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0       1/0     14.46    0.2       0.0     0.0       0.0        0.1       0.1        0.0    0.0       0.0      15.6          7          8     0.925      0        0
L1       4/0    194.95    0.8       0.5     0.1       0.4        0.5       0.1        0.0    4.7      20.9      20.8         26          2    12.764    12M       40
L2      48/0   2551.71    1.0       1.4     0.1       1.3        1.4       0.1        0.0   10.7      19.4      19.4         73          2    36.524    34M       14
L3     351/0  21735.77    0.8       2.0     0.1       1.9        1.9      -0.0        0.0   14.3      18.1      16.9        112          2    56.138    52M    3378K
Sum    404/0  24496.89    0.0       3.9     0.4       3.5        3.9       0.3        0.0   34.2      18.2      18.1        218         14    15.589    98M    3378K
Int      0/0      0.00    0.0       3.9     0.4       3.5        3.9       0.3        0.0   34.2      18.2      18.1        218         14    15.589    98M    3378K
```
RocksDB stores the indexes and Bloom filters for each SSTable inside the SSTable itself, and loads them into memory. To find a particular key, these filters and indexes are consulted first. The full SSTable is then loaded into memory as part of an LRU cache.
In the vast majority of cases we see new messages. That makes our de-duplication system a textbook use case for Bloom filters.
A Bloom filter tells us whether a key is "possibly in the set" or "definitely not in the set". For every element it has seen, the filter sets one bit per hash function. If all of the bits a key hashes to are set, the filter returns "possibly in the set".

Querying the Bloom filter for w when the set contains only {x, y, z}. One of the bits is not set, so the filter returns "not in the set".
If the answer is "possibly in the set", RocksDB can read the underlying data from the SSTables to determine whether the item really exists in the set. In most cases, though, the filter answers "definitely not in the set", so we can usually avoid querying the SSTables at all.
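A minimal Bloom filter in Go shows the "possibly / definitely not" asymmetry. It is a sketch under stated assumptions. `bloom`, `Add`, and `MayContain` are hypothetical names. Bit positions are derived from two FNV hashes by double hashing, which is not how RocksDB computes its filter bits.

```go
package main

import "hash/fnv"

// bloom is a minimal Bloom filter: m bits and k positions per key.
type bloom struct {
	bits []bool
	k    int
}

func newBloom(m, k int) *bloom { return &bloom{bits: make([]bool, m), k: k} }

// positions derives k bit positions from two FNV hashes (double hashing).
func (b *bloom) positions(key string) []int {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	h2 := fnv.New64()
	h2.Write([]byte(key))
	a, step := h1.Sum64(), h2.Sum64()
	pos := make([]int, b.k)
	for i := range pos {
		pos[i] = int((a + uint64(i)*step) % uint64(len(b.bits)))
	}
	return pos
}

// Add sets every bit the key hashes to.
func (b *bloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// MayContain is false only when the key is definitely not in the set.
func (b *bloom) MayContain(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false // one unset bit: definitely not in the set
		}
	}
	return true // every bit set: possibly in the set
}
```

Bits are only ever set, so a key that was added can never be reported as absent. The trade-off is occasional false positives, which RocksDB resolves by reading the SSTable.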
When we query RocksDB, we issue a MultiGet request for all of the relevant messageIds we want to look up. We issue these as a batch, both for performance and to avoid many concurrent locking operations. Batching also lets us group the data coming from Kafka and generally favor sequential writes over random ones.
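Batching can be sketched as one lookup per batch of messageIds. `multiGetter` is a hypothetical interface modeled loosely on the idea of MultiGet: one call returns one "found" flag per key. It is not a specific RocksDB binding, and `memDB` is an in-memory stand-in.

```go
package main

// multiGetter is a hypothetical batched lookup: one call, one found flag per key.
type multiGetter interface {
	MultiGet(keys []string) []bool
}

// memDB is an in-memory stand-in used only for this illustration.
type memDB map[string]bool

func (m memDB) MultiGet(keys []string) []bool {
	found := make([]bool, len(keys))
	for i, k := range keys {
		found[i] = m[k]
	}
	return found
}

// newInBatch issues a single batched lookup for a batch of messageIds read
// from Kafka and returns the IDs that are new, in order. It also drops
// repeats inside the batch, which the store lookup alone would not catch.
func newInBatch(db multiGetter, ids []string) []string {
	found := db.MultiGet(ids)
	seen := make(map[string]bool, len(ids))
	var fresh []string
	for i, id := range ids {
		if found[i] || seen[id] {
			continue
		}
		seen[id] = true
		fresh = append(fresh, id)
	}
	return fresh
}
```

A per-batch lookup leaves one case open on its own. Two copies of the same messageId inside one batch both look new to the store, so the sketch also de-duplicates within the batch.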
That explains how the read/write workload gets good performance. It leaves open the question of how stale data is expired.
Deletion: bounded by size, not by time
In our de-duplication process, we had to choose between two bounds: a strict "de-duplication window", or the total size of the database on disk.
We chose to bound by size rather than by a fixed time window. That way a load surge can't bring the whole system down, and with it de-duplication for every customer. This lets us set a maximum size for each RocksDB instance and absorb sudden spikes or increases in load. The side effect is that the effective window can shrink to under 24 hours, at which point the on-call engineer is paged.
We periodically age out old keys from RocksDB to keep it from growing without bound. To do this, we keep a secondary index of keys ordered by sequence number, so the oldest keys can be deleted first.
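The size bound and the sequence-number index can be modeled together. This Go sketch assumes a limit on key count rather than on bytes on disk. Every new key gets an increasing sequence number, and exceeding the limit evicts the lowest sequence number first. `boundedSet` and its fields are hypothetical.

```go
package main

// boundedSet remembers at most limit keys (a hypothetical, in-memory model).
// Each new key gets an increasing sequence number. bySeq is the secondary
// index used to find and evict the oldest keys first.
type boundedSet struct {
	limit   int
	nextSeq uint64            // sequence number for the next new key
	oldest  uint64            // lowest sequence number still stored
	keys    map[string]uint64 // key -> sequence number
	bySeq   map[uint64]string // sequence number -> key
}

func newBoundedSet(limit int) *boundedSet {
	return &boundedSet{limit: limit, keys: map[string]uint64{}, bySeq: map[uint64]string{}}
}

// Add stores key if it is new and reports whether it was new.
func (s *boundedSet) Add(key string) bool {
	if _, ok := s.keys[key]; ok {
		return false
	}
	s.keys[key] = s.nextSeq
	s.bySeq[s.nextSeq] = key
	s.nextSeq++
	for len(s.keys) > s.limit && s.oldest < s.nextSeq {
		s.evictOldest()
	}
	return true
}

// evictOldest deletes both the data key and its sequence-number entry.
// Sequence numbers stay contiguous here, so oldest always points at a live key.
func (s *boundedSet) evictOldest() {
	key := s.bySeq[s.oldest]
	delete(s.bySeq, s.oldest)
	delete(s.keys, key)
	s.oldest++
}
```

Eviction here is driven by size, so a burst of new keys shortens how long old keys are remembered rather than growing the store. That is the trade-off described above.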
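We don't use RocksDB's TTL feature, which would require fixing the TTL when the database is opened. Instead, we delete objects ourselves using the sequence number of each inserted key.
The sequence numbers are stored as a secondary index, so we can query them quickly and "mark" them as deleted. Here is our deletion function. It is passed n, the maximum number of keys to delete in one pass: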
```go
func (d *DB) delete(n int) error {
	// open a connection to RocksDB
	ro := rocksdb.NewDefaultReadOptions()
	defer ro.Destroy()

	// find our offset to seek through for writing deletes
	hint, err := d.GetBytes(ro, []byte("seek_hint"))
	if err != nil {
		return err
	}

	it := d.NewIteratorCF(ro, d.seq)
	defer it.Close()

	// seek to the first key, this is a small
	// optimization to ensure we don't use `.SeekToFirst()`
	// since it has to skip through a lot of tombstones.
	if len(hint) > 0 {
		it.Seek(hint)
	} else {
		it.SeekToFirst()
	}

	seqs := make([][]byte, 0, n)
	keys := make([][]byte, 0, n)

	// look through our sequence numbers, counting up
	// append any data keys that we find to our set to be
	// deleted
	for it.Valid() && len(seqs) < n {
		k, v := it.Key(), it.Value()
		key := make([]byte, len(k.Data()))
		val := make([]byte, len(v.Data()))

		copy(key, k.Data())
		copy(val, v.Data())
		seqs = append(seqs, key)
		keys = append(keys, val)

		it.Next()
		k.Free()
		v.Free()
	}

	wb := rocksdb.NewWriteBatch()
	wo := rocksdb.NewDefaultWriteOptions()
	defer wb.Destroy()
	defer wo.Destroy()

	// preserve next sequence to be deleted.
	// this is an optimization so we can use `.Seek()`
	// instead of letting `.SeekToFirst()` skip through lots of tombstones.
	if len(seqs) > 0 {
		hint, err := strconv.ParseUint(string(seqs[len(seqs)-1]), 10, 64)
		if err != nil {
			return err
		}

		buf := []byte(strconv.FormatUint(hint+1, 10))
		wb.Put([]byte("seek_hint"), buf)
	}

	// we not only purge the keys, but the sequence numbers as well
	for i := range seqs {
		wb.DeleteCF(d.seq, seqs[i])
		wb.Delete(keys[i])
	}

	// finally, we persist the deletions to our database
	err = d.Write(wo, wb)
	if err != nil {
		return err
	}

	return it.Err()
}
```
To keep write speeds high, RocksDB doesn't go back and delete a key right away (remember, SSTables are immutable!). Instead, it appends a "tombstone" for the key, and the key is removed later during compaction. As a result, we can age out records quickly with sequential writes and avoid thrashing memory while removing old items.
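Tombstones can be illustrated with an append-only log of records. The hypothetical `tombstoneStore` below is a simplification that assumes a single level. `Delete` appends a marker rather than touching earlier records, and lookups let the newest record win. `Compact` is where deleted keys and their tombstones finally disappear.

```go
package main

// record is one entry in the append-only log. deleted marks a tombstone.
type record struct {
	key     string
	deleted bool
}

// tombstoneStore is a single-level illustration. Records are never rewritten in place.
type tombstoneStore struct {
	log []record
}

func (s *tombstoneStore) Put(k string)    { s.log = append(s.log, record{key: k}) }
func (s *tombstoneStore) Delete(k string) { s.log = append(s.log, record{key: k, deleted: true}) }

// Has scans newest to oldest. The latest record for a key decides.
func (s *tombstoneStore) Has(k string) bool {
	for i := len(s.log) - 1; i >= 0; i-- {
		if s.log[i].key == k {
			return !s.log[i].deleted
		}
	}
	return false
}

// Compact keeps only the latest live record per key and drops tombstones.
// Dropping them is safe here only because this toy store has no older levels.
func (s *tombstoneStore) Compact() {
	latest := map[string]bool{}
	var order []string
	for _, r := range s.log {
		if _, ok := latest[r.key]; !ok {
			order = append(order, r.key)
		}
		latest[r.key] = !r.deleted
	}
	var out []record
	for _, k := range order {
		if latest[k] {
			out = append(out, record{key: k})
		}
	}
	s.log = out
}
```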
Correctness
We've covered how the system provides speed, scale, and cheap lookups across billions of messages. One piece remains: how we keep the data correct in the face of various failures.
EBS snapshots and attachments
We periodically take snapshots of each hard drive. This protects our RocksDB instances from corruption caused by a bad code push or an EBS malfunction. EBS is already replicated under the hood, but snapshots also guard against corruption introduced by some upstream mechanism. If we need to cycle an instance, we pause the consumer. Then we detach the associated EBS drive and reattach it to the new instance. As long as the partition IDs stay the same, reattaching the drive remains a simple procedure that still guarantees correctness.
If a worker crashes, we rely on RocksDB's built-in write-ahead log to make sure no messages are lost. Messages from the input topic are not committed until we have a guarantee that RocksDB has durably written them to its log.
Reading the output topic
You may have noticed that up to this point there is no single "atomic" step that lets us guarantee a message is delivered exactly once. The worker could crash at any moment: while writing to RocksDB, while publishing to the output topic, or while acknowledging the input messages.
We need an atomic "commit" point that covers the transaction across all of these separate systems. Our data needs some "source of truth".
That's where reading from the output topic comes in.
Suppose the dedupe worker crashes for any reason, or hits an error from Kafka and restarts. Its first step is to check the "source of truth", the output topic, for whether an event was published.
If a message is found in the output topic but not in RocksDB (or vice versa), the dedupe worker makes the necessary repairs to bring the two back in sync. In essence, we use the output topic as both our write-ahead log and our ultimate source of truth, with RocksDB checkpointing and verifying it.
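One way to picture the restart repair is as a comparison of two sets of messageIds. This Go sketch is hypothetical, including its repair rules. It assumes the worker can list two things: the messageIds in the tail of its output partition, and the keys its store wrote over the same span. Keys that were published but not recorded are added. Keys that were recorded but never published are removed, so that replaying the uncommitted input publishes them.

```go
package main

// reconcile compares the output topic (the source of truth) with the local
// store and returns the repairs to apply. Both the inputs and the rules are
// assumptions made for illustration:
//   - published but not recorded locally -> add the key, so a replay is dropped
//   - recorded locally but never published -> remove the key, so the replay
//     of the uncommitted input offset publishes it
func reconcile(outputIDs []string, recentKeys []string) (add, remove []string) {
	inOutput := make(map[string]bool, len(outputIDs))
	for _, id := range outputIDs {
		inOutput[id] = true
	}
	inStore := make(map[string]bool, len(recentKeys))
	for _, k := range recentKeys {
		inStore[k] = true
	}
	for _, id := range outputIDs {
		if !inStore[id] {
			add = append(add, id)
		}
	}
	for _, k := range recentKeys {
		if !inOutput[k] {
			remove = append(remove, k)
		}
	}
	return add, remove
}
```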
In production
Our de-duplication system has now been running in production for three months, and we are very pleased with the results. By the numbers:
- 1.5 TB of keys stored on disk in RocksDB
- A four-week de-duplication window before old keys are aged out
- Roughly 60 billion keys stored across our RocksDB instances
- 200 billion messages passed through the de-duplication system
The system as a whole is fast, efficient, and fault-tolerant, while remaining a very simple architecture.
In particular, the second version of the system has a number of advantages over our old de-duplication system.
Previously, we stored all of our keys in Memcached. We used Memcached's atomic CAS (check-and-set) operator to set keys that didn't yet exist. Memcached served as the commit point and the source of "atomicity" for publishing keys.
That scheme worked well enough, but it needed a large amount of memory to hold all of our keys. On top of that, we had a hard choice. We could accept the occasional Memcached failure, or we could double our spend on memory-hungry failover replicas.
The Kafka/RocksDB approach gives us almost all of the benefits of the old system, with better reliability. To sum up the biggest wins:
Data stored on disk: keeping the full set of keys, or a full index, in memory was prohibitively expensive. We moved more of the data to disk and relied on multiple levels of files and indexes. This cut the cost of our bookkeeping by a wide margin. We can also fail over to cold storage (EBS) instead of running additional hot failover instances.
Partitioning: to narrow the search space and avoid loading too many indexes into memory, we need a guarantee that a given message is always sent to the right worker. Partitioning upstream in Kafka lets us consistently route these messages to the right place, so we can cache and query far more efficiently.
Explicit age-out: with Memcached, we set a TTL on each key to expire it and relied on the Memcached process to handle evictions. For large batches of data, the sheer number of evictions caused memory pressure and CPU spikes. Because the client now deletes keys itself, that problem goes away; under pressure, we simply shorten the de-duplication window.
Kafka as the source of truth: to truly avoid duplicates in the face of multiple commit points, we need a source of truth shared by all of our downstream consumers. Using Kafka as that "source of truth" has worked remarkably well. In most failure cases (apart from failures of Kafka itself), a message is either written to Kafka or it isn't. Kafka also delivers published messages in order and replicates them on disk across multiple machines, without our having to keep much data in memory.
Batched reads and writes: by batching our I/O calls to Kafka and RocksDB, we got far better performance from sequential reads and writes. Memcached gave us random access. Now we get much higher throughput by leaning on disk performance and keeping only the indexes in memory.
Overall, we are very happy with the guarantees our de-duplication system provides. Kafka and RocksDB are becoming more and more standard as building blocks for streaming applications, and we're excited to keep building new distributed applications on top of them.