Kubernetes requires every container in a cluster to have a unique, routable IP address. Kubernetes does not assign IP addresses itself; that task is left to third-party solutions.
The goal of this study was to find the solution with the lowest latency, the highest throughput, and the lowest setup cost. Because our workload is latency-sensitive, we measure high-percentile latency under a fairly active network load. In particular, we focus on performance at about 30 to 50% of the maximum load, because this best reflects the typical situation of a system that is not overloaded.
Docker with the --net=host option
This is the reference setup. All other options were compared against it.
The --net=host option means that containers inherit the IP address of the host machine, so no network containerization is involved.
A priori, the absence of network containerization performs better than any implementation of it, which is why we used this setup as the reference.
Flannel
Flannel is a virtual network solution maintained by the CoreOS project. It is well tested and production-ready, so its setup cost is minimal.
When a machine running flannel is added to the cluster, flannel does three things:
- Allocates a subnet to the new machine using etcd.
- Creates a virtual bridge interface on the machine (the docker0 bridge).
- Configures a packet-forwarding backend:
  - aws-vpc: registers the machine's subnet in the Amazon AWS instance table. The number of entries in this table is limited to 50, so a cluster that uses flannel with aws-vpc cannot have more than 50 machines. In addition, this backend works only on Amazon AWS.
  - host-gw: creates IP routes to the subnets via the IP addresses of the remote machines. It requires direct L2 connectivity between the hosts running flannel.
  - vxlan: creates a virtual VXLAN interface.
Because flannel forwards packets through a bridge interface, each packet passes through two network stacks on its way from one container to another.
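To make the host-gw idea concrete, the sketch below prints the kind of routes such a backend maintains: one route per remote machine, pointing that machine's container subnet at its host IP. This is only an illustration, not flannel's actual code; the function name, subnets, and host addresses are hypothetical.

```shell
#!/bin/sh
# Dry run: print one "ip route add" command per remote machine.
# stdin: lines of "<container subnet> <host IP>" (hypothetical values below).
print_host_gw_routes() {
    while read -r subnet host_ip; do
        printf 'ip route add %s via %s\n' "$subnet" "$host_ip"
    done
}

printf '%s\n' \
    '10.1.2.0/24 192.168.0.12' \
    '10.1.3.0/24 192.168.0.13' | print_host_gw_routes
```

Since the next hop is the remote host itself, such routes only work when the hosts share an L2 segment, which matches the requirement stated for host-gw.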
IPvlan
IPvlan is a Linux kernel driver that lets you create virtual interfaces with unique IP addresses without using a bridge interface.
To assign an IP address to a container with IPvlan, you need to:
- Create a container without any network interface.
- Create an ipvlan interface in the default network namespace.
- Move the interface into the container's network namespace.
IPvlan is a relatively new solution, so there are no ready-made tools to automate this process. Deploying IPvlan across many machines and containers is therefore more complicated, which means a higher setup cost. However, IPvlan needs no bridge interface and forwards packets directly from the NIC to the virtual interface, so we expected it to perform better than flannel.
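The three steps above could look roughly like the following. The sketch only prints the commands (they need root and a running Docker daemon), and every concrete value is an assumption made for illustration: the container name, image, parent NIC, address, the ipvl- interface prefix, and the choice of l2 mode are not taken from the original setup.

```shell
#!/bin/sh
# Dry run: print the commands for attaching an ipvlan interface to a container.
# $1 container name, $2 image, $3 parent NIC, $4 address in CIDR form.
print_ipvlan_steps() {
    name=$1; image=$2; parent=$3; addr=$4
    cat <<EOF
docker run -d --net=none --name $name $image
pid=\$(docker inspect -f '{{.State.Pid}}' $name)
ip link add link $parent name ipvl-$name type ipvlan mode l2
ip link set dev ipvl-$name netns \$pid
nsenter -t \$pid -n ip addr add $addr dev ipvl-$name
nsenter -t \$pid -n ip link set dev ipvl-$name up
EOF
}

print_ipvlan_steps web1 nginx eth0 10.1.1.2/24
```

A real deployment would also need routes inside the container and some way to allocate addresses across machines, which is part of the setup cost mentioned above.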
Load testing scenario
For each option we performed the following steps:
- Set up networking on two physical machines.
- Started tcpkali in a container on one machine and configured it to send requests at a constant rate.
- Started nginx in a container on the other machine and configured it to respond with a fixed-size file.
- Collected system metrics and tcpkali results.
We ran the test at request rates from 50,000 to 450,000 requests per second (RPS).
For each request, nginx responded with a static file of fixed size: 350 bytes (100 bytes of content and 250 bytes of headers) or 4 KB.
Results
- IPvlan shows the lowest latency and the highest maximum throughput. Flannel with host-gw and aws-vpc follows it with close figures, and host-gw performs better under maximum load.
- Flannel with vxlan showed the worst results in all tests. However, we suspect that its exceptionally poor 99.999 percentile is caused by a bug.
- The results for the 4 KB response are similar to those for 350 bytes, with two notable differences:
  - The maximum RPS is much lower, because with 4 KB responses it takes only about 270,000 RPS to fully load a 10 Gbit NIC.
  - As the bandwidth limit is approached, IPvlan gets very close to --net=host.
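The 270,000 RPS figure can be sanity-checked with simple arithmetic. The sketch below is a rough upper bound only: it assumes 4 KB means 4096 bytes, and in the second call assumes the same 250 bytes of headers as in the small-response case; TCP/IP and Ethernet framing are ignored, and they push the real limit lower.

```shell
#!/bin/sh
# Upper bound on responses per second for a given link.
# $1: link speed in bit/s, $2: bytes sent per response.
max_rps() {
    echo $(( $1 / 8 / $2 ))
}

max_rps 10000000000 4096   # payload only: prints 305175
max_rps 10000000000 4346   # 4096 B body + 250 B headers: prints 287620
```

Both bounds are of the same order as the measured value of about 270,000 RPS, which is consistent with the NIC, not the networking solution, being the bottleneck for 4 KB responses.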
Our current choice is flannel with host-gw. It has few dependencies (in particular, it requires neither AWS nor a new Linux kernel version), it is easy to set up compared with IPvlan, and it delivers sufficient performance. IPvlan is our fallback. If flannel gains IPvlan support at some point, we will switch to that option.
Although aws-vpc performed slightly better than host-gw, its 50-machine limit and its tight binding to Amazon AWS were the decisive factors against it.
50,000 RPS, 350 bytes
At 50,000 requests per second, all candidates showed acceptable performance. The main trend is already visible: IPvlan shows the best results, host-gw and aws-vpc follow closely, and vxlan is the worst.
150,000 RPS, 350 bytes
Latency percentiles at 150,000 RPS (about 30% of maximum RPS), ms
IPvlan is slightly better than host-gw and aws-vpc, but it has the worst 99.99 percentile. host-gw performs slightly better than aws-vpc.
250,000 RPS, 350 bytes
This load is expected to be common in production, so these results are particularly important.
Latency percentiles at 250,000 RPS (about 50% of maximum RPS), ms
IPvlan again shows the best performance, but aws-vpc has the best 99.99 and 99.999 percentiles. host-gw outperforms aws-vpc at the 95th and 99th percentiles.
350,000 RPS, 350 bytes
In most cases the latency is close to the results for 250,000 RPS (350 bytes), but it grows rapidly after the 99.5 percentile, which means we are approaching the maximum RPS.
450,000 RPS, 350 bytes
Interestingly, host-gw performs much better than aws-vpc.
500,000 RPS, 350 bytes
At a load of 500,000 RPS only IPvlan still works, and it even outperforms --net=host, but the latency is so high that we cannot call it acceptable for latency-sensitive applications.
50,000 RPS, 4 KB
A larger response (4 KB versus the 350 bytes tested previously) produces a higher network load, but the leaderboard stays practically unchanged.
Latency percentiles at 50,000 RPS (about 20% of maximum RPS), ms
150,000 RPS, 4 KB
host-gw has a surprisingly poor 99.999 percentile, but it still shows good results at lower percentiles.
Latency percentiles at 150,000 RPS (about 60% of maximum RPS), ms
250,000 RPS, 4 KB
This is the maximum RPS with the large response (4 KB). Unlike the small-response case (350 bytes), aws-vpc significantly outperforms host-gw.
vxlan was once again excluded from the graph.
Test environment
Background
To better understand this article and reproduce the test environment, you should be familiar with the basics of high-performance networking.
These articles contain useful information on the topic:
- How to receive a million packets per second, by CloudFlare.
- How to achieve low latency with 10Gbps Ethernet, by CloudFlare.
- Scaling in the Linux Networking Stack, from the Linux kernel documentation.
Machines
- We used two c4.8xlarge instances on Amazon AWS EC2 running CentOS 7.
- Enhanced networking is enabled on both machines.
- Each machine is a NUMA system with 2 processors; each processor has 9 cores and each core has 2 threads (hyper-threads), which effectively allows 36 threads to run on each machine.
- Each machine has a 10 Gbps network interface card (NIC) and 60 GB of RAM.
- To support enhanced networking and IPvlan, we installed Linux kernel 4.3.0 with the Intel ixgbevf driver.
Setup
Modern NICs provide Receive Side Scaling (RSS) through multiple interrupt request (IRQ) lines. EC2 offers only two such lines in its virtualized environment, so we tested several configurations with RSS and Receive Packet Steering (RPS) and arrived at the following settings, partly recommended by the Linux kernel documentation.
- IRQ. The first core of each of the two NUMA nodes is configured to receive interrupts from the NIC. To map CPUs to NUMA nodes, use lscpu:
$ lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-8,18-26
NUMA node1 CPU(s):     9-17,27-35
This is done by writing 0 and 9 to /proc/irq/<num>/smp_affinity_list, where the IRQ numbers are obtained with grep eth0 /proc/interrupts:
$ echo 0 > /proc/irq/265/smp_affinity_list
$ echo 9 > /proc/irq/266/smp_affinity_list
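The IRQ numbers 265 and 266 are specific to the test machines. A minimal sketch of how the lookup and pinning could be scripted is shown below; the function names are hypothetical, the sample text only imitates the /proc/interrupts layout, and the script prints the echo commands without executing them.

```shell
#!/bin/sh
# List IRQ numbers whose /proc/interrupts line mentions the interface.
# $1: interface name, $2: input file (defaults to /proc/interrupts, "-" = stdin).
irqs_for_iface() {
    awk -v name="$1" 'index($0, name) > 0 { sub(/:$/, "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# Pair each IRQ read from stdin with the next CPU from "$1" and print the command.
print_affinity_cmds() {
    set -- $1
    while read -r irq; do
        [ $# -gt 0 ] || break
        printf 'echo %s > /proc/irq/%s/smp_affinity_list\n' "$1" "$irq"
        shift
    done
}

# Made-up sample in /proc/interrupts format, so the sketch runs anywhere.
sample=' 265:  1000     0  PCI-MSI-edge  eth0-TxRx-0
 266:     0  2000  PCI-MSI-edge  eth0-TxRx-1
  30:     5     5  IO-APIC       timer'

printf '%s\n' "$sample" | irqs_for_iface eth0 - | print_affinity_cmds "0 9"
```

On a real machine the second argument of irqs_for_iface would be omitted, and the printed commands would be run as root.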
- Receive Packet Steering (RPS). Several RPS combinations were tested. To reduce latency, we offloaded the processors that handle IRQs by using only CPUs 1-8 and 10-17. Unlike smp_affinity for IRQs, the rps_cpus sysfs file has no _list counterpart, so bitmasks are used to list the CPUs to which RPS may forward traffic (for details see the Linux kernel documentation: RPS Configuration):
$ echo "00000000,0003fdfe" > /sys/class/net/eth0/queues/rx-0/rps_cpus
$ echo "00000000,0003fdfe" > /sys/class/net/eth0/queues/rx-1/rps_cpus
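Writing such bitmasks by hand is error-prone, so here is a small sketch that derives the mask from a CPU list. The function name is hypothetical, and the sketch keeps the whole mask in one 64-bit shell integer, so it only covers CPU numbers below 63.

```shell
#!/bin/sh
# Convert a CPU list such as "1-8,10-17" into the comma-separated
# 32-bit hex groups expected by rps_cpus / xps_cpus.
cpus_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        first=${range%-*}          # "1-8" -> 1, "5" -> 5
        last=${range#*-}           # "1-8" -> 8, "5" -> 5
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$(( mask | (1 << cpu) ))
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    printf '%08x,%08x\n' $(( (mask >> 32) & 4294967295 )) $(( mask & 4294967295 ))
}

cpus_to_mask 1-8,10-17   # prints 00000000,0003fdfe
```

The result matches the value written to rps_cpus above: bits 1-8 and 10-17 are set, and bits 0 and 9 (the IRQ-handling CPUs) are left clear.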
- Transmit Packet Steering (XPS). All NUMA 0 processors (including hyper-threads, that is CPUs 0-8 and 18-26) were assigned to tx-0, and all NUMA 1 processors (9-17, 27-35) to tx-1 (for details see the Linux kernel documentation: XPS Configuration):
$ echo "00000000,07fc01ff" > /sys/class/net/eth0/queues/tx-0/xps_cpus
$ echo "0000000f,f803fe00" > /sys/class/net/eth0/queues/tx-1/xps_cpus
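To double-check which CPUs a mask actually covers, it can be decoded back into a CPU list. The helper below is a hypothetical sketch with the same limitation as before: it handles CPU numbers 0-62 only, because the mask is held in a single 64-bit shell integer.

```shell
#!/bin/sh
# Decode a mask such as "0000000f,f803fe00" into a CPU list such as "9-17,27-35".
mask_to_cpus() {
    hex=$(printf '%s' "$1" | tr -d ',')
    value=$(( 0x$hex ))
    out=""; start=""; cpu=0
    while [ "$cpu" -le 63 ]; do
        if [ $(( (value >> cpu) & 1 )) -eq 1 ]; then
            if [ -z "$start" ]; then start=$cpu; fi
            end=$cpu
        elif [ -n "$start" ]; then
            # A run of set bits just ended: emit it as "a-b" or as a single CPU.
            if [ "$start" -eq "$end" ]; then item=$start; else item="$start-$end"; fi
            out="${out:+$out,}$item"
            start=""
        fi
        cpu=$(( cpu + 1 ))
    done
    printf '%s\n' "$out"
}

mask_to_cpus 00000000,07fc01ff   # prints 0-8,18-26
mask_to_cpus 0000000f,f803fe00   # prints 9-17,27-35
```

The decoded lists agree with the NUMA layout reported by lscpu: node 0 holds CPUs 0-8 and 18-26, node 1 holds CPUs 9-17 and 27-35.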
- Receive Flow Steering (RFS). We planned to use 60,000 persistent connections, and the official documentation recommends rounding this number up to the nearest power of two:
$ echo 65536 > /proc/sys/net/core/rps_sock_flow_entries
$ echo 32768 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
$ echo 32768 > /sys/class/net/eth0/queues/rx-1/rps_flow_cnt
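The two numbers follow from the connection count: 60,000 rounded up to a power of two gives 65536, and splitting that evenly over the two receive queues gives 32768 per queue. A short sketch of the arithmetic, with a hypothetical helper name:

```shell
#!/bin/sh
# Smallest power of two that is greater than or equal to $1.
next_pow2() {
    n=$1; p=1
    while [ "$p" -lt "$n" ]; do
        p=$(( p * 2 ))
    done
    echo "$p"
}

connections=60000
queues=2
entries=$(next_pow2 "$connections")
echo "rps_sock_flow_entries = $entries"                  # 65536
echo "rps_flow_cnt per queue = $(( entries / queues ))"  # 32768
```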
- Nginx. Nginx used 18 worker processes, each pinned to its own CPU (0-17). This is configured with worker_cpu_affinity:
worker_processes 18;
worker_cpu_affinity 1 10 100 1000 10000 ...;
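Each worker_cpu_affinity argument is a binary mask with a single bit set, read from the right: 1 is CPU 0, 10 is CPU 1, and so on. The full list for 18 workers is tedious to type, so it could be generated with a small helper like this hypothetical one:

```shell
#!/bin/sh
# Print $1 one-bit binary masks: "1 10 100 ...", one per worker, for CPUs 0..$1-1.
affinity_masks() {
    n=$1; i=0; mask=1; out=""
    while [ "$i" -lt "$n" ]; do
        out="${out:+$out }$mask"
        mask="${mask}0"            # shift the single bit one CPU to the left
        i=$(( i + 1 ))
    done
    printf '%s\n' "$out"
}

echo "worker_cpu_affinity $(affinity_masks 18);"
```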
- Tcpkali. Tcpkali has no built-in support for binding to specific CPUs. To take advantage of RFS, we ran tcpkali under taskset and tuned the scheduler so that thread migrations happen rarely:
$ echo 10000000 > /proc/sys/kernel/sched_migration_cost_ns
$ taskset -ac 0-17 tcpkali --threads 18 ...
This configuration spreads the interrupt load evenly across the processor cores and achieves higher throughput at the same latency as the other configurations we tested.
Cores 0 and 9 handle only NIC interrupts and do not process packets, yet they remain the busiest.
We also used Red Hat's tuned with the network-latency profile.
To minimize the influence of nf_conntrack, NOTRACK rules were added.
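The exact rules are not given in the article. As an assumption-laden sketch, NOTRACK rules for a server listening on one TCP port could look like the output of the following dry run; the port number 80, the function name, and the choice of chains are all hypothetical.

```shell
#!/bin/sh
# Dry run: print raw-table rules that exempt one TCP service from connection tracking.
# $1: TCP port of the benchmarked service (80 below is an assumption).
print_notrack_rules() {
    port=$1
    # Incoming requests to the service.
    printf 'iptables -t raw -A PREROUTING -p tcp --dport %s -j NOTRACK\n' "$port"
    # Locally generated replies from the service.
    printf 'iptables -t raw -A OUTPUT -p tcp --sport %s -j NOTRACK\n' "$port"
}

print_notrack_rules 80
```

Rules like these live in the raw table because it is evaluated before connection tracking takes place.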
The sysctl configuration was tuned to support a large number of TCP connections:
fs.file-max = 1024000
net.ipv4.ip_local_port_range = "2000 65535"
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_low_latency = 1
Translator's note: Many thanks to our colleagues at Machine Zone, Inc. for these tests! They helped us, so we wanted to share them with others.
P.S. You may also be interested in our article "Container Networking Interface (CNI): Network Interface and Standard for Linux Containers".