
Why did the question of optimizing processes inside the kernel come up at all? It all started when one of our customers, who makes heavy use of containers and memory control groups (memcg), drew our attention to strange peaks in CPU consumption that occurred from time to time. The normal system load was about 50%, but during a peak 100% of processor time was in use, and almost all of it was consumed by the kernel (sys time).
The node itself was multi-user and ran about 200 OpenVZ containers. Analysis showed that many users had created nested Docker containers and multi-level hierarchies of memory control groups. Each top-level user container held about 20 mount points and 20 memory control groups (memcg) created by systemd. On top of that came the mount points and control groups created by the aforementioned Docker. Put simply, the node was heavily loaded, far more than the average across all our other customers. We wanted to find the cause of these peaks, because the same problem could occur, less visibly, on less loaded machines (for example, as peaks of +5% sys time that still degrade performance).
Using perf, we managed to catch a peak and capture a trace. It turned out that most of the processor time was spent clearing SLAB caches, namely the superblock cache:
    - 100,00%  0,00%  kswapd0  [kernel.vmlinux]  [k] kthread
       - 99,31% balance_pgdat
          - 82,11% shrink_zone
             - 61,69% shrink_slab
                - 58,29% super_cache_count
                   + 54,56% list_lru_count_one
This problem deserves a detailed explanation. Everyone knows that the kernel caches unused data for a while before finally freeing the memory. The kernel applies this principle widely. For example, the page cache holds pages of file data, which greatly speeds up repeated reads of those pages (there is no need to go to disk again). In our case, the problem arose in the superblock metadata caches, which are kept in two LRU lists: s_dentry_lru and s_inode_lru.
LRU (Least Recently Used)
struct list_lru points to an array of linked lists, and each active memcg corresponds to one element of this array (list_lru_one). When a SLAB object is no longer used by the kernel, the kernel adds it to one of the linked lists in the array (which one depends on the memcg the object belongs to or, roughly speaking, on the memcg of the process that was running when the object was created). The array itself is described as follows (list_lru::node::memcg_lrus):
    struct list_lru_memcg {
        struct rcu_head rcu;
        /* array of per cgroup lists, indexed by memcg_cache_id */
        struct list_lru_one *lru[0];   /* one list per memcg */
    };

    struct list_lru_one {
        struct list_head list;         /* linked list of objects */
        /* may become negative during memcg reparenting */
        long nr_items;                 /* number of objects in the list */
    };
lru[0] points to the list of objects associated with the memcg with ID 0;
lru[1] points to the list of objects associated with the memcg with ID 1;
...
lru[n] points to the list of objects associated with the memcg with ID n.
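The per-memcg indexing described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the names `lru_model`, `lru_one`, `obj`, `lru_add`, `lru_count`, and the `MAX_MEMCG` limit are hypothetical, and list ordering is ignored (objects are simply pushed at the head).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MEMCG 4              /* hypothetical limit for this model */

struct obj {
    struct obj *next;
    int memcg_id;                /* memcg the object was charged to */
};

struct lru_one {
    struct obj *head;            /* unused objects of one memcg */
    long nr_items;
};

struct lru_model {
    struct lru_one lru[MAX_MEMCG];   /* indexed by memcg ID */
};

/* Put an unused object on the list of the memcg it belongs to. */
static void lru_add(struct lru_model *l, struct obj *o)
{
    assert(o->memcg_id >= 0 && o->memcg_id < MAX_MEMCG);
    struct lru_one *one = &l->lru[o->memcg_id];

    o->next = one->head;
    one->head = o;
    one->nr_items++;
}

/* Number of cached objects charged to one memcg. */
static long lru_count(const struct lru_model *l, int memcg_id)
{
    assert(memcg_id >= 0 && memcg_id < MAX_MEMCG);
    return l->lru[memcg_id].nr_items;
}
```

In this model, a memcg that never created an object on a given LRU simply has an empty `lru[id]` entry, which is the case that matters later in the article.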
The LRU lists that figure in our problem are s_dentry_lru and s_inode_lru. As the names suggest, they contain unused dentry and inode filesystem objects.
Later, when the system or a particular memcg runs short of memory, some of the list entries are finally freed. This is done by a special mechanism called the shrinker.
Shrinker
When the kernel needs to allocate memory pages but there is no free memory on the NUMA node or in the system, a reclaim mechanism kicks in. It tries to write to disk or discard a certain amount of: 1) pages of file contents from the page cache; 2) pages of anonymous memory, which go to swap; 3) cached SLAB objects (the problem we ran into concerns these).
Discarding some of the cached SLAB objects does not directly free pages: the objects are, as a rule, much smaller than a page, and a single page holds hundreds of them. When some objects are freed, gaps of free memory appear in SLAB pages, and these can be used to create other SLAB objects. This algorithm was adopted in the kernel deliberately: it is simple and quite efficient. The interested reader can find the formula that selects the portion of objects to clean in the do_shrink_slab() function.
This function performs the actual cleanup of some of the objects, guided by the description passed to it in struct shrinker:
    static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
                                        struct shrinker *shrinker, int priority)
    {
        ...
        /* Count the objects that can be freed */
        freeable = shrinker->count_objects(shrinker, shrinkctl);
        if (freeable == 0)
            return 0;

        /* Pseudocode: only some fraction of them is scanned */
        total_scan = SOME_FRACTION(freeable);

        while (total_scan >= batch_size) {
            /* Free one batch of objects */
            ret = shrinker->scan_objects(shrinker, shrinkctl);
            total_scan -= shrinkctl->nr_scanned;
        }
        ...
    }
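To make the count/scan contract concrete, here is a minimal user-space model of the loop. It is a sketch under stated assumptions, not the kernel's implementation: `shrinker_model`, `shrink_ctl`, `model_count`, `model_scan`, and `shrink_model` are hypothetical names, and a plain `freeable / divisor` stands in for the kernel's real formula, which this article does not reproduce.

```c
#include <assert.h>

struct shrink_ctl {
    unsigned long nr_to_scan;    /* objects to look at in this batch */
    unsigned long nr_scanned;    /* objects actually looked at */
};

struct shrinker_model {
    unsigned long (*count_objects)(struct shrinker_model *, struct shrink_ctl *);
    unsigned long (*scan_objects)(struct shrinker_model *, struct shrink_ctl *);
    unsigned long cached;        /* objects currently held in the cache */
};

/* Report how many objects could be freed. */
static unsigned long model_count(struct shrinker_model *s, struct shrink_ctl *sc)
{
    (void)sc;
    return s->cached;
}

/* Free up to nr_to_scan objects and report how many were freed. */
static unsigned long model_scan(struct shrinker_model *s, struct shrink_ctl *sc)
{
    unsigned long n = sc->nr_to_scan < s->cached ? sc->nr_to_scan : s->cached;

    s->cached -= n;
    sc->nr_scanned = n;
    return n;
}

/* Scan roughly 1/divisor of the freeable objects, one batch at a time. */
static unsigned long shrink_model(struct shrinker_model *s,
                                  unsigned long divisor, unsigned long batch)
{
    struct shrink_ctl sc = { 0, 0 };
    unsigned long freed = 0, freeable, total_scan;

    assert(divisor > 0 && batch > 0);

    freeable = s->count_objects(s, &sc);
    if (freeable == 0)
        return 0;                /* nothing cached: the call was wasted */

    total_scan = freeable / divisor;
    while (total_scan >= batch) {
        sc.nr_to_scan = batch;
        sc.nr_scanned = 0;
        freed += s->scan_objects(s, &sc);
        if (sc.nr_scanned == 0)
            break;               /* no progress, stop early */
        total_scan -= sc.nr_scanned;
    }
    return freed;
}
```

Note that even when the cache is empty, the caller still pays for the `count_objects` call before it can find that out.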
For the superblock shrinker, these functions are implemented as follows. Each superblock maintains its own s_dentry_lru and s_inode_lru lists of the unused objects associated with it:
    struct super_block {
        ...
        struct shrinker s_shrink;   /* per-sb shrinker handle */
        ...
        struct list_lru s_dentry_lru;
        struct list_lru s_inode_lru;
        ...
    };
The .count_objects method returns the number of objects:
    static unsigned long super_cache_count(struct shrinker *shrink,
                                           struct shrink_control *sc)
    {
        ...
        total_objects += list_lru_shrink_count(&sb->s_dentry_lru, sc);
        total_objects += list_lru_shrink_count(&sb->s_inode_lru, sc);

        /* Scale the result according to vfs_cache_pressure */
        total_objects = vfs_pressure_ratio(total_objects);
        return total_objects;
    }
The .scan_objects method actually frees the objects:
    static unsigned long super_cache_scan(struct shrinker *shrink,
                                          struct shrink_control *sc)
    {
        ...
        /* Free some of the objects from s_dentry_lru */
        prune_dcache_sb(sb, sc);

        /* Free some of the objects from s_inode_lru */
        prune_icache_sb(sb, sc);
        ...
    }
The number of objects to free is passed in the sc parameter. It also specifies the memcg whose objects should be evicted from the LRU:
    struct shrink_control {
        int nid;                    /* NUMA node ID */
        unsigned long nr_to_scan;   /* number of objects to scan */
        struct mem_cgroup *memcg;   /* memcg whose objects are to be freed */
    };
Thus, prune_dcache_sb() selects a linked list from the array struct list_lru_memcg::lru[] and works with it. prune_icache_sb() does the same.
The old shrinker traversal algorithm
With the standard approach, "evicting" objects from SLAB when memory runs short in sc->target_mem_cgroup happens as follows:
    shrink_node()
    {
        ...
        struct mem_cgroup *root = sc->target_mem_cgroup;

        /* Loop over all descendants of sc->target_mem_cgroup */
        memcg = mem_cgroup_iter(root, NULL, &reclaim);
        do {
            ...
            shrink_slab(memcg, ...);
            ...
        } while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
        ...
    }
We walk over all child memcgs and call shrink_slab() for each of them. Then, in shrink_slab(), we walk over all shrinkers and call do_shrink_slab() for each of them:
    static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
                                     struct mem_cgroup *memcg, int priority)
    {
        list_for_each_entry(shrinker, &shrinker_list, list) {
            struct shrink_control sc = {
                .nid = nid,
                .memcg = memcg,
            };

            ret = do_shrink_slab(&sc, shrinker, ...);
        }
    }
Recall that every superblock adds its own shrinker to this list. Let's count how many times do_shrink_slab() is called when there are 200 containers with 20 memcgs and 20 mount points each. In total we have 200 * 20 mount points and 200 * 20 control groups. When the topmost memcg runs short of memory, we are forced to walk all of its child memcgs (that is, essentially all of them) and, for each one, call every shrinker in shrinker_list. The kernel therefore makes 200 * 20 * 200 * 20 = 16,000,000 calls to do_shrink_slab().
The overwhelming majority of these calls are useless: containers are normally isolated from one another, and the chance that CT1 uses super_block2, which was created in CT2, is generally low. In other words, if memcg1 is a control group from CT1, the corresponding element of the array super_block2->s_dentry_lru->node->memcg_lrus->lru[memcg1_id] is an empty list, and there is no point in calling do_shrink_slab() for it.
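The arithmetic above can be checked with a few lines of C. This is only an illustrative sketch: the function names are hypothetical, and `calls_useful_max` rests on the simplifying assumption (not stated in the article) that containers are fully isolated, so a memcg can only have objects on superblocks mounted in its own container.

```c
#include <assert.h>

/* Old scheme: every memcg visits every registered superblock shrinker. */
static unsigned long long calls_old(unsigned long long containers,
                                    unsigned long long memcgs_per_ct,
                                    unsigned long long mounts_per_ct)
{
    unsigned long long memcgs = containers * memcgs_per_ct;
    unsigned long long shrinkers = containers * mounts_per_ct;

    assert(containers > 0);
    return memcgs * shrinkers;
}

/*
 * Upper bound on calls that can find anything to free, ASSUMING full
 * isolation: each memcg only touches the mounts of its own container.
 */
static unsigned long long calls_useful_max(unsigned long long containers,
                                           unsigned long long memcgs_per_ct,
                                           unsigned long long mounts_per_ct)
{
    return containers * memcgs_per_ct * mounts_per_ct;
}
```

For 200 containers with 20 memcgs and 20 mount points each, `calls_old` gives 16,000,000 while `calls_useful_max` gives 80,000. Under this assumption, at least 199 out of every 200 calls cannot find anything to free.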
The problem can be modeled with a simple bash script (the data here comes from the patchset that was later submitted to the kernel):
    $echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
    $mkdir /sys/fs/cgroup/memory/ct
    $echo 4000M > /sys/fs/cgroup/memory/ct/memory.kmem.limit_in_bytes
    $for i in `seq 0 4000`; do
        mkdir /sys/fs/cgroup/memory/ct/$i;
        echo $$ > /sys/fs/cgroup/memory/ct/$i/cgroup.procs;
        mkdir -p s/$i; mount -t tmpfs $i s/$i;
        touch s/$i/file; done
Let's see what happens if we call the cache-drop procedure five times in a row:
$time echo 3 > /proc/sys/vm/drop_caches
The first iteration takes 14 seconds, because cached objects really are in memory: 0.00user 13.78system 0:13.78elapsed 99%CPU
The second iteration takes 5 seconds, even though there are no objects left: 0.00user 5.59system 0:05.60elapsed 99%CPU
The third iteration takes 5 seconds: 0.00user 5.48system 0:05.48elapsed 99%CPU
The fourth iteration takes 8 seconds: 0.00user 8.35system 0:08.35elapsed 99%CPU
The fifth iteration takes 8 seconds: 0.00user 8.34system 0:08.35elapsed 99%CPU
It became clear that the shrinker traversal algorithm used in the vanilla kernel is not optimal and has to be changed for the sake of scalability.
The new shrinker traversal algorithm
From the new algorithm I wanted the following:
- eliminate the shortcomings of the old one;
- add no new locks, and call do_shrink_slab() only when it makes sense (that is, when the corresponding linked list in the s_dentry_lru or s_inode_lru array is not empty), without touching the linked-list memory directly.
It was clear that this could only be provided by a new data structure sitting on top of the heterogeneous shrinkers (there are shrinkers not only for superblocks but also for other data objects not covered in this article; the reader can find them by searching the kernel code for the keyword prealloc_shrinker()). The new data structure has to encode two states: "it makes sense to call do_shrink_slab()" and "it makes no sense to call do_shrink_slab()".
Data structures of the IDA type were rejected because they use locks internally. A bitfield, on the other hand, fits this role perfectly: individual bits can be modified atomically, and in combination with memory barriers this makes it possible to build an efficient algorithm without locks.
Each shrinker gets its own unique ID (shrinker::id), and each memcg gets a bitmap large enough to hold the largest ID currently registered. When the first element is added to the list s_dentry_lru->node->memcg_lrus->lru[memcg_id], bit number shrinker->id is set to 1 in the bitmap of the corresponding memcg. The same goes for s_inode_lru.
Now the loop in shrink_slab() can be optimized to walk only the shrinkers that are needed:
    unsigned long shrink_slab()
    {
        ...
        for_each_set_bit(i, map, shrinker_nr_max) {
            ...
            shrinker = idr_find(&shrinker_idr, i);
            ...
            do_shrink_slab(&sc, shrinker, priority);
            ...
        }
    }
(Bit clearing is also implemented, for when a shrinker goes into the "it makes no sense to call do_shrink_slab()" state. See the commit on GitHub for details.)
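The idea of a per-memcg bitmap indexed by shrinker ID can be sketched in user space with C11 atomics. This is a simplified model, not the kernel code: the names `memcg_map`, `map_init`, `map_set`, `map_clear`, `collect_set_bits`, and the `MAX_SHRINKERS` limit are hypothetical, and the default sequentially consistent atomics used here stand in for the kernel's own bit operations and memory barriers, which are not reproduced.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define MAX_SHRINKERS 128    /* hypothetical limit for this model */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS ((MAX_SHRINKERS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bitmap per memcg: bit N set means "shrinker N may have objects". */
struct memcg_map {
    atomic_ulong word[NWORDS];
};

static void map_init(struct memcg_map *m)
{
    for (unsigned w = 0; w < NWORDS; w++)
        atomic_init(&m->word[w], 0UL);
}

/* Called when the first object lands on this memcg's list; no lock taken. */
static void map_set(struct memcg_map *m, unsigned id)
{
    assert(id < MAX_SHRINKERS);
    atomic_fetch_or(&m->word[id / BITS_PER_WORD], 1UL << (id % BITS_PER_WORD));
}

/* Called when the shrinker has nothing left for this memcg. */
static void map_clear(struct memcg_map *m, unsigned id)
{
    assert(id < MAX_SHRINKERS);
    atomic_fetch_and(&m->word[id / BITS_PER_WORD],
                     ~(1UL << (id % BITS_PER_WORD)));
}

/*
 * Collect the IDs whose bit is set, in ascending order. In the real loop
 * each such ID would lead to one do_shrink_slab() call; here they are
 * only recorded in out[]. Returns the number of IDs stored.
 */
static unsigned collect_set_bits(struct memcg_map *m, unsigned *out, unsigned cap)
{
    unsigned n = 0;

    for (unsigned w = 0; w < NWORDS; w++) {
        unsigned long bits = atomic_load(&m->word[w]);

        for (unsigned b = 0; bits != 0 && n < cap; b++, bits >>= 1)
            if (bits & 1UL)
                out[n++] = (unsigned)(w * BITS_PER_WORD + b);
    }
    return n;
}
```

With this layout, the traversal only visits shrinkers whose bit is set rather than every registered shrinker, and it never touches the linked lists themselves.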
If we repeat the cache-drop test, the new algorithm shows substantially better results:
$time echo 3 > /proc/sys/vm/drop_caches
First iteration: 0.00user 1.10system 0:01.10elapsed 99%CPU
Second iteration: 0.00user 0.00system 0:00.01elapsed 64%CPU
Third iteration: 0.00user 0.01system 0:00.01elapsed 82%CPU
Fourth iteration: 0.00user 0.00system 0:00.01elapsed 64%CPU
Fifth iteration: 0.00user 0.01system 0:00.01elapsed 82%CPU
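Iterations two through five take 0.01 seconds each, which is 548 times faster than before.
Since a similar cache-dropping action takes place every time the machine runs short of memory, this optimization substantially improves the behavior of machines with many containers and memory control groups. The patchset (17 patches) has been accepted into the vanilla kernel, where it can be found starting with version 4.19.
During review of the patches, an engineer from Google turned up, and it emerged that they had the same problem. As a result, the patches were additionally tested under a different type of load.
In the end, the patchset was accepted on its ninth iteration, and getting it into the vanilla kernel took about four months. As of today, the patchset is also included in our own Virtuozzo 7 kernel, starting with version vz7.71.9.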