Igor Panteleev, Software Developer, DataArt
Many services have been created to recognize human speech; think of Pocketsphinx or the Google Speech API. They can convert a phrase recorded as an audio file into printed text with good quality. However, none of these applications can sort out the different sounds captured by a microphone. What exactly was recorded: human speech, animal cries, or music? We were faced with the need to answer this question, so we decided to build a pilot project that classifies sounds using machine learning algorithms. This article describes the tools we chose, the problems we ran into, how we trained the model with TensorFlow, and how to run our open source solution. The recognition results can also be uploaded to the DeviceHive IoT platform for use in cloud services for third-party applications.
Choosing tools and a model for classification
First, we needed to choose software for working with neural networks. The first solution that seemed suitable to us was the Python Audio Analysis library (pyAudioAnalysis).
The main problem in machine learning is finding a good dataset. There are plenty of such sets for speech recognition and music classification. For classifying arbitrary sounds things are not as good, but we did find a dataset of urban sounds.
During testing we ran into the following problems:
- pyAudioAnalysis is not flexible enough. It works with a narrow range of parameters, and some of them are calculated on the fly. For example, the number of training cycles is derived from the number of samples and cannot be changed.
- The dataset we chose contains only 10 classes, and all of them belong to the group of urban sounds.
The next solution was Google AudioSet. It is based on labeled fragments of YouTube videos and can be downloaded in two formats:
- CSV files containing the following information for each fragment: the ID of the video on YouTube, the start and end times of the fragment, and one or more labels assigned to the fragment.
- Extracted audio features stored as TensorFlow record (.tfrecord) files.
These audio features are compatible with the YouTube-8M models. This solution also recommends using the TensorFlow VGGish model to extract features from an audio stream. It met most of our requirements, so we chose it.
Training the model
The next task was to figure out how the YouTube-8M interface works. It is designed for video, but fortunately it can work with audio as well. The library is quite flexible, but the number of classes is hardcoded, so we made a few changes to pass the number of classes as a parameter. YouTube-8M can process two kinds of data: aggregated features and frame-level features (features for each fragment). Google AudioSet provides its data as frame-level features. Next, we had to choose a model to train.
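To make the difference between the two kinds of data concrete, here is a minimal sketch that collapses frame-level features into a single aggregated vector by averaging. The function name `aggregate_frames` and the toy values are hypothetical; mean pooling is assumed here as the aggregation, and the real embeddings are 128-dimensional rather than 3-dimensional.

```python
from typing import List


def aggregate_frames(frames: List[List[float]]) -> List[float]:
    """Mean-pool frame-level feature vectors into one clip-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    size = len(frames[0])
    if any(len(frame) != size for frame in frames):
        raise ValueError("all frames must have the same length")
    count = len(frames)
    # Average each feature dimension over all frames of the clip.
    return [sum(frame[i] for frame in frames) / count for i in range(size)]


# Two toy frames with 3 features each (illustrative values only).
frames = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
clip_vector = aggregate_frames(frames)
```

A frame-level model sees the whole sequence in `frames`, while a model for aggregated features would only see `clip_vector`.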
Resources, time, and accuracy
Graphics processing units (GPUs) are better suited to machine learning than central processing units (CPUs). We will not go into the details here and move straight to our setup. For the experiments we used a PC with one NVIDIA GTX 970 graphics card with 4 GB of memory.
In our case, training time was not that important. One to two hours of training was enough to make an initial decision about the chosen model and its accuracy.
Of course, we wanted the best accuracy possible. However, training a more complex model (which should give higher accuracy) requires more RAM (video card memory when a GPU is used).
Choosing the model
A full list of YouTube-8M models with descriptions is available in the YouTube-8M documentation. Because our training data consists of frame-level features, we had to use the matching frame-level models. Google AudioSet provides a dataset split into three parts: balanced training, unbalanced training, and evaluation. The AudioSet documentation describes them in more detail.
For training and evaluation we used a modified version of YouTube-8M, which is publicly available.
Balanced training
In this case, the training command looks like this:
python train.py --train_data_pattern=/path_to_data/audioset_v1_embeddings/bal_train/*.tfrecord --num_epochs=100 --learning_rate_decay_examples=400000 --feature_names=audio_embedding --feature_sizes=128 --frame_features --batch_size=512 --num_classes=527 --train_dir=/path_to_logs --model=ModelName
For LstmModel, we changed the base learning rate to 0.001, as the documentation suggests. We also changed the value of lstm_cells to 256 because we did not have enough RAM.
Let's look at the training results:
Model name | Training time | Last-step training score | Average evaluation score |
---|---|---|---|
Logistic | 14 min 3 s | 0.5859 | 0.5560 |
Dbof | 31 min 46 s | 1.0000 | 0.5220 |
Lstm | 1 h 45 min 53 s | 0.9883 | 0.4581 |
We got fairly good results at the training stage, but that does not mean the same figures will be reached on the full evaluation.
Unbalanced training
The unbalanced dataset contains far more samples, so we set the number of training epochs to 10 (5 would have been a better choice, because training takes a long time).
Model name | Training time | Last-step training score | Average evaluation score |
---|---|---|---|
Logistic | 2 h 4 min 14 s | 0.8750 | 0.5125 |
Dbof | 4 h 39 min 29 s | 0.8848 | 0.5605 |
Lstm | 9 h 42 min 52 s | 0.8691 | 0.5396 |
Training logs
If you want to examine the log files, you can download and extract the log archive. Once it is unpacked, run tensorboard --logdir /path_to_train_logs/ and open the address that TensorBoard prints.
More about training
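YouTube-8M accepts many parameters, and many of them affect the training process.
For example, you can tune the learning rate and the number of epochs, which changes the training process considerably. There are also three functions for computing the loss, and many other useful variables that can be tuned and changed to improve the results.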
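As an illustration of how the learning-rate parameters interact, the sketch below models a staircase exponential decay driven by the number of examples seen. It is an approximation under stated assumptions, not the library's code: the base rate of 0.001 and the 400000 decay examples come from the settings mentioned in this article, while the batch size of 512 and the decay factor of 0.95 are assumed values, and `decayed_learning_rate` is a hypothetical helper name.

```python
def decayed_learning_rate(step, base_learning_rate=0.001, batch_size=512,
                          decay_examples=400000, decay_factor=0.95):
    """Staircase exponential decay driven by the number of examples seen.

    batch_size and decay_factor are assumptions made for this sketch.
    """
    examples_seen = step * batch_size
    # The rate drops once for every full block of `decay_examples` examples.
    completed_decays = examples_seen // decay_examples
    return base_learning_rate * decay_factor ** completed_decays


# Print the rate at a few training steps to see the staircase shape.
for step in (0, 781, 782, 1600):
    print(step, decayed_learning_rate(step))
```

With these numbers the first drop happens at step 782, the first step at which more than 400000 examples have been processed. A smaller decay_examples value makes the rate fall faster.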
Using the trained model with audio capture devices
Now that the models are trained, it is time to add some code to interact with them.
Capturing audio from the microphone
We need some way to get audio data from the microphone. We will use the PyAudio library, which has a simple interface and works on most platforms.
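A minimal capture sketch with PyAudio might look like the following. It is not the project's own capture code: the function names, the chunk size, and the choice of 16-bit samples at 16 kHz mono (picked to match the input format VGGish works with) are assumptions, and the default input device must support that format.

```python
import struct

SAMPLE_RATE = 16000   # Hz; assumed, chosen to match 16 kHz mono input
CHUNK_FRAMES = 1024   # frames read per call; an arbitrary illustrative value


def record_seconds(seconds: float) -> bytes:
    """Record mono 16-bit PCM from the default input device."""
    import pyaudio  # imported lazily so the helper below works without PyAudio

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                        input=True, frames_per_buffer=CHUNK_FRAMES)
    try:
        chunks = []
        for _ in range(int(SAMPLE_RATE / CHUNK_FRAMES * seconds)):
            chunks.append(stream.read(CHUNK_FRAMES))
        return b"".join(chunks)
    finally:
        # Always release the device, even if reading fails.
        stream.stop_stream()
        stream.close()
        audio.terminate()


def pcm16_to_floats(data: bytes) -> list:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(data) // 2
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    return [sample / 32768.0 for sample in samples]
```

Calling `pcm16_to_floats(record_seconds(5))` would give five seconds of normalized samples ready for feature extraction.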
Preparing the sound
As mentioned above, we use the TensorFlow VGGish model as the feature extraction tool. Here is a short description of the conversion process:
For visualization we used the "Dog bark" sample from the UrbanSound dataset.
- The audio is converted to 16 kHz mono.
- A spectrogram is computed from the magnitudes of the short-time Fourier transform (STFT) with a window size of 25 ms, a hop of 10 ms, and a periodic Hann window.
- A mel spectrogram is computed by mapping the spectrogram onto 64 mel bins.
- A stabilized log mel spectrogram is computed as log(mel-spectrum + 0.01), where the offset is used to avoid taking the logarithm of zero.
- These features are split into non-overlapping examples of 0.96 seconds; each example covers 64 mel bands and 96 frames of 10 ms each.
- The resulting examples are then fed into the VGGish model, which turns them into embedding vectors.
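The arithmetic behind these steps can be sketched in plain Python. This is only an illustration of the window, hop, and framing numbers given above, not a full STFT or the VGGish code; the function names are hypothetical.

```python
import math

SAMPLE_RATE = 16000        # Hz, mono
WINDOW_SECONDS = 0.025     # STFT window size
HOP_SECONDS = 0.010        # STFT hop
FRAMES_PER_EXAMPLE = 96    # 96 frames x 10 ms = 0.96 s
LOG_OFFSET = 0.01          # keeps log() away from zero


def periodic_hann(length):
    """Periodic Hann window: the cosine period equals the window length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def stabilized_log(mel_values):
    """Apply log(mel + offset) to one frame of mel energies."""
    return [math.log(value + LOG_OFFSET) for value in mel_values]


def count_examples(num_samples):
    """Number of non-overlapping 0.96 s examples obtained from a clip."""
    window = int(round(SAMPLE_RATE * WINDOW_SECONDS))   # 400 samples
    hop = int(round(SAMPLE_RATE * HOP_SECONDS))         # 160 samples
    if num_samples < window:
        return 0
    num_frames = 1 + (num_samples - window) // hop
    return num_frames // FRAMES_PER_EXAMPLE


window = periodic_hann(400)
examples_in_ten_seconds = count_examples(10 * SAMPLE_RATE)
```

Under these numbers a ten-second clip yields 998 spectrogram frames, which gives ten complete examples.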
Classification
Finally, we need an interface for passing the data to the neural network and getting the results.
We took the YouTube-8M interface as a basis and modified it to remove the serialization/deserialization step.
The result of our work is available as an open source project. Let's take a closer look at it.
Installation
PyAudio uses libportaudio2 and portaudio19-dev, so these packages must be installed for it to work.
Several Python libraries are also required. They can be installed with pip: pip install -r requirements.txt
You also need to download the archive with the saved models and extract it into the project root.
Running
The project offers three interfaces to choose from.
Prerecorded audio file
Run python parse_file.py path_to_your_file.wav, and the terminal will show something like Speech: 0.75, Music: 0.12, Inside, large room or hall: 0.03
The result depends on the input data. The values are derived from the predictions of the neural network. The higher the value, the more likely it is that the input belongs to that class.
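To show how such a line can be produced from per-class scores, here is a small sketch that sorts the scores and prints the top few. The helper `format_top_predictions` is hypothetical, and the "Dog" entry is a made-up value added only to show that lower-ranked classes are cut off.

```python
def format_top_predictions(scores, top_k=3):
    """Return the top_k classes as 'label: score' pairs, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ", ".join("%s: %.2f" % (label, score) for label, score in ranked[:top_k])


# Illustrative scores; a real model returns one value per known class.
scores = {
    "Music": 0.12,
    "Speech": 0.75,
    "Inside, large room or hall": 0.03,
    "Dog": 0.01,
}
line = format_top_predictions(scores)
print(line)
```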
Capturing and processing microphone data
python capture.py starts a process that continuously captures data from the microphone. Every 5 to 7 seconds (by default) it sends the data for classification. The results are displayed as in the previous example. You can run it with the --save_path=/path_to_samples_dir/ parameter; in that case all captured data is saved to the specified folder in WAV format. This is useful if you want to try different models on the same samples. Use the --help option for details.
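Saving captured audio as WAV needs nothing beyond the standard library. The sketch below is an assumption about how this could be done, not the project's implementation; `save_wav` is a hypothetical helper, and the 16 kHz, mono, 16-bit settings are assumed defaults.

```python
import os
import tempfile
import wave


def save_wav(path, pcm_bytes, sample_rate=16000, channels=1, sample_width=2):
    """Write raw PCM bytes to a WAV file (16-bit mono at 16 kHz by default)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)   # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


# One second of silence stands in for captured microphone data.
silence = b"\x00\x00" * 16000
sample_path = os.path.join(tempfile.mkdtemp(), "sample.wav")
save_wav(sample_path, silence)
```

A file saved this way can later be passed to parse_file.py to compare different models on exactly the same recording.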
Web interface
The python daemon.py command runs a simple web interface, available at http://127.0.0.1:8000 by default. It uses the same code as the previous example. The events page shows the ten most recent predictions.
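Keeping only the latest ten predictions is a natural fit for a bounded queue. The following sketch shows one way to do it with collections.deque; the class name `RecentPredictions` and the shape of the stored items are hypothetical and not taken from the project.

```python
from collections import deque


class RecentPredictions:
    """Keep only the most recent predictions, dropping the oldest ones."""

    def __init__(self, limit=10):
        self._items = deque(maxlen=limit)

    def add(self, prediction):
        # When the deque is full, appending discards the oldest entry.
        self._items.append(prediction)

    def latest(self):
        """Return stored predictions, newest first."""
        return list(reversed(self._items))


history = RecentPredictions()
for index in range(12):
    history.add({"id": index})
```

After twelve additions only the entries with ids 2 through 11 remain, so memory use stays constant however long the process runs.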
IoT integration
The last, and very important, point is integration with the IoT infrastructure. When you start the web interface described in the previous section, the main page shows the status of the DeviceHive client connection and its settings. As long as the client is connected, predictions are sent to the specified device as notifications.
Conclusion
TensorFlow is a very flexible tool that is useful in many machine learning applications, such as image and sound recognition. Using it together with an IoT platform makes it possible to build intelligent solutions with great potential. In smart cities they can be used for security, for example by recognizing the sound of breaking glass or a gunshot. Even in a rainforest such a solution could be used to track the routes of wild animals and birds by analyzing their calls. The IoT platform can be configured to send notifications about sounds within range of the microphone. The solution can be installed on local devices (it can also be deployed as a cloud system) to minimize traffic and cloud computing costs, and customized to send only notifications without attaching the raw audio. Keep in mind that this is an open source project, so you are free to use it to build your own services.