ZFS: testing reliability on bad drives

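ZFS is famous for its reliability, and I happened to have several half-dead disks on hand. Let's build a raidz1 on them, write a fair amount of data to it, then check the data's integrity to see how well ZFS holds up in such conditions.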





The drives are attached to an Adaptec RAID 2805 as "simple volumes", with a raidz1 deployed on top using the default settings (as created by the FreeBSD 11.1 installer).
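For readers unfamiliar with raidz1: it keeps one parity block per stripe, so any single missing or corrupt block in a stripe can be recomputed from the others. The Python sketch below illustrates only that XOR-parity idea; it is a simplification (real RAID-Z uses variable-width stripes and relies on block checksums to decide which copy is bad), and the names `xor_blocks` and `rebuild_missing` are made up for this example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recompute the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

if __name__ == "__main__":
    # Three data blocks plus one parity block, as on a 4-disk raidz1 stripe.
    data = [b"ZFS!", b"raid", b"z1ok"]
    parity = xor_blocks(data)
    # Pretend the second block became unreadable and rebuild it.
    print(rebuild_missing([data[0], data[2]], parity))
```

With two failing disks in one raidz1, this only helps while no stripe has bad blocks on both of them at once.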

ZFS settings
zfs get all zroot
NAME   PROPERTY              VALUE            SOURCE
zroot  type                  filesystem       -
zroot  creation              . 26 15:40 2018  -
zroot  used                  75,5G            -
zroot  available             10,1T            -
zroot  referenced            128K             -
zroot  compressratio         1.01x            -
zroot  mounted               yes              -
zroot  quota                 none             default
zroot  reservation           none             default
zroot  recordsize            128K             default
zroot  mountpoint            /zroot           local
zroot  sharenfs              off              default
zroot  checksum              on               default
zroot  compression           lz4              local
zroot  atime                 off              local
zroot  devices               on               default
zroot  exec                  on               default
zroot  setuid                on               default
zroot  readonly              off              default
zroot  jailed                off              default
zroot  snapdir               hidden           default
zroot  aclmode               discard          default
zroot  aclinherit            restricted       default
zroot  canmount              on               default
zroot  xattr                 off              temporary
zroot  copies                1                default
zroot  version               5                -
zroot  utf8only              off              -
zroot  normalization         none             -
zroot  casesensitivity       sensitive        -
zroot  vscan                 off              default
zroot  nbmand                off              default
zroot  sharesmb              off              default
zroot  refquota              none             default
zroot  refreservation        none             default
zroot  primarycache          all              default
zroot  secondarycache        all              default
zroot  usedbysnapshots       0                -
zroot  usedbydataset         128K             -
zroot  usedbychildren        75,5G            -
zroot  usedbyrefreservation  0                -
zroot  logbias               latency          default
zroot  dedup                 off              default
zroot  mlslabel                               -
zroot  sync                  standard         default
zroot  refcompressratio      1.00x            -
zroot  written               128K             -
zroot  logicalused           75,8G            -
zroot  logicalreferenced     11,5K            -
zroot  volmode               default          default
zroot  filesystem_limit      none             default
zroot  snapshot_limit        none             default
zroot  filesystem_count      none             default
zroot  snapshot_count        none             default
zroot  redundant_metadata    all              default
      
      







SMART data for the drives



aacd0p4
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68WT0N0
Serial Number:    WD-WCC4E6PN673U
LU WWN Device Id: 5 0014ee 20d6399e3
Firmware Version: 82.00A82
User Capacity:    4 000 787 030 016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 26 15:25:30 2018 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: Input/output error
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       296
  3 Spin_Up_Time            0x0027   210   208   021    Pre-fail  Always       -       6483
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7986
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       186
194 Temperature_Celsius     0x0022   112   111   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       17
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       55
      
      







aacd1p4
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68WT0N0
Serial Number:    WD-WCC4E7LPL4AH
LU WWN Device Id: 5 0014ee 2b8213774
Firmware Version: 82.00A82
User Capacity:    4 000 787 030 016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 26 15:26:19 2018 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: Input/output error
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6043
  3 Spin_Up_Time            0x0027   207   190   021    Pre-fail  Always       -       6616
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       803
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7964
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       212
194 Temperature_Celsius     0x0022   108   108   000    Old_age   Always       -       44
196 Reallocated_Event_Count 0x0032   049   049   000    Old_age   Always       -       151
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       101
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       232
      
      







aacd2p4
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD40EZRX-22SPEB0
Serial Number:    WD-WCC4E4KAK52T
LU WWN Device Id: 5 0014ee 2b6900646
Firmware Version: 80.00A80
User Capacity:    4 000 787 030 016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 26 15:26:43 2018 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: Input/output error
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   180   173   021    Pre-fail  Always       -       7983
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       9777
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       11
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   138   138   000    Old_age   Always       -       186920
194 Temperature_Celsius     0x0022   110   109   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
      
      







aacd3p4
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD40EZRX-00SPEB0
Serial Number:    WD-WCC4E5ALXUHC
LU WWN Device Id: 5 0014ee 20c16eacf
Firmware Version: 80.00A80
User Capacity:    4 000 787 030 016 bytes [4,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jan 26 15:27:03 2018 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: Input/output error
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       394
  3 Spin_Up_Time            0x0027   194   179   021    Pre-fail  Always       -       7258
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       16
  5 Reallocated_Sector_Ct   0x0033   195   195   140    Pre-fail  Always       -       160
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7584
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       11
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       678600
194 Temperature_Celsius     0x0022   114   112   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   121   121   000    Old_age   Always       -       79
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       44
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       319
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   199   199   000    Old_age   Offline      -       530
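To compare the four dumps at a glance, the rows that usually point at media trouble can be pulled out with a few lines of Python. This is a minimal sketch, not the tooling used for this test: the function name `flag_suspect_attributes` and the watch list are assumptions chosen for illustration, and the parser expects one attribute row per line. The sample rows are copied from the aacd1p4 dump.

```python
# Attributes whose non-zero raw value commonly indicates media problems.
# This watch list is an assumption for illustration, not a smartmontools rule.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def flag_suspect_attributes(smart_text, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    suspects = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # An attribute row has 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) > 0:
            suspects[name] = int(raw)
    return suspects

# Rows copied from the aacd1p4 dump.
AACD1_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       803
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7964
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       101
"""

if __name__ == "__main__":
    print(flag_suspect_attributes(AACD1_ROWS))
```

Run against all four dumps, a filter like this singles out aacd1 and aacd3 as the worst drives, aacd0 with 17 offline-uncorrectable sectors, and aacd2 with nothing flagged.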
      
      







The road to suffering



As test data I copied about 71 GB of multimedia files to the pool and got a bunch of read errors:
aacd3: hard error cmd=read 40789032-40789207
aacd3: hard error cmd=read 40789032-40789207
aacd1: hard error cmd=read 40790712-40790967
aacd3: hard error cmd=read 40789560-40789735
aacd1: hard error cmd=read 40794064-40794319
aacd1: hard error cmd=read 40795208-40795287
aacd1: hard error cmd=read 40795824-40796079
aacd1: hard error cmd=read 40796088-40796343
aacd1: hard error cmd=read 40797240-40797423
aacd3: hard error cmd=read 21743840-21744015
aacd1: hard error cmd=read 28502624-28502799
aacd1: hard error cmd=read 28597680-28597855
aacd1: hard error cmd=read 28635368-28635623
aacd1: hard error cmd=read 37340776-37340951
aacd1: hard error cmd=read 37342712-37342887
aacd1: hard error cmd=read 37347808-37348063
aacd1: hard error cmd=read 37348072-37348327
aacd1: hard error cmd=read 37352168-37352343
aacd1: hard error cmd=read 37359472-37359647
aacd1: hard error cmd=read 37365576-37365831
aacd1: hard error cmd=read 37372960-37373215
aacd1: hard error cmd=read 37373488-37373743
aacd1: hard error cmd=read 37380608-37380863
aacd1: hard error cmd=read 37381136-37381391
aacd1: hard error cmd=read 37382984-37383239
aacd1: hard error cmd=read 57577976-57577999
aacd1: hard error cmd=read 4606480-4606495
aacd1: hard error cmd=read 7811867664-7811867679
aacd1: hard error cmd=read 7811868176-7811868191
aac0: COMMAND 0xfffffe0000e97690 (TYPE 502) TIMEOUT AFTER 137 SECONDS
aac0: COMMAND 0xfffffe0000e91650 (TYPE 502) TIMEOUT AFTER 137 SECONDS
aac0: COMMAND 0xfffffe0000e92d10 (TYPE 502) TIMEOUT AFTER 137 SECONDS
aac0: WARNING! Controller is no longer running! code= 0xbcc90100
aacd3: hard error cmd=read 40785088-40785343
aacd3: hard error cmd=read 40785352-40785607
aacd3: hard error cmd=read 40785616-40785871
aacd3: hard error cmd=read 40788240-40788495
aacd3: hard error cmd=read 40783592-40783847
aacd3: hard error cmd=read 40784648-40784903
aacd3: hard error cmd=read 40785176-40785431
aacd3: hard error cmd=read 40785440-40785695
aacd3: hard error cmd=read 21743928-21744103
aacd1: hard error cmd=read 25407280-25407535
aacd1: hard error cmd=read 28507712-28507967
aacd1: hard error cmd=read 37322056-37322311
aacd1: hard error cmd=read 37344208-37344383
aacd1: hard error cmd=read 37348160-37348415
aacd1: hard error cmd=read 37373488-37373743
aacd1: hard error cmd=read 37380696-37380951
aacd1: hard error cmd=read 37383072-37383327
aacd1: hard error cmd=read 37383776-37384031
aacd1: hard error cmd=read 37395312-37395487
aacd1: hard error cmd=read 37426368-37426623
aacd1: hard error cmd=read 40682424-40682679
aacd1: hard error cmd=read 40702816-40703071
aacd1: hard error cmd=read 40725472-40725647
aacd1: hard error cmd=read 40760224-40760479
aacd1: hard error cmd=read 40761280-40761535
aacd1: hard error cmd=read 40764536-40764711
aacd1: hard error cmd=read 40772144-40772399
aacd1: hard error cmd=read 40774520-40774775
aacd1: hard error cmd=read 40778304-40778559
aacd3: hard error cmd=read 40783592-40783847
aacd3: hard error cmd=read 40784648-40784903
aacd3: hard error cmd=read 40785176-40785431
aacd3: hard error cmd=read 40785440-40785695
aacd1: hard error cmd=read 40785792-40785879
aacd3: hard error cmd=read 40785792-40785871
aacd1: hard error cmd=read 40785792-40785879
aacd1: hard error cmd=read 40790624-40790879
aacd3: hard error cmd=read 40790000-40790175
aacd1: hard error cmd=read 40799280-40799535
aacd3: hard error cmd=read 41121032-41121287
aacd1: hard error cmd=read 44290824-44290999
aacd1: hard error cmd=read 44301408-44301583
aacd1: hard error cmd=read 44315680-44315935
aacd1: hard error cmd=read 44330184-44330359
aacd1: hard error cmd=read 44337224-44337399
aacd1: hard error cmd=read 44344472-44344727
aacd1: hard error cmd=read 51561672-51561927
aacd1: hard error cmd=read 51571528-51571783
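A log like this is easier to judge once it is tallied per device. The following Python sketch counts the `hard error` messages and the distinct block ranges they mention; the regular expression is written against the message format seen in this log only, and the function name `summarize_hard_errors` is invented for the example.

```python
import re
from collections import Counter

# Matches messages such as "aacd1: hard error cmd=read 40790712-40790967".
HARD_ERROR = re.compile(r"(aacd\d+): hard error cmd=(\w+) (\d+)-(\d+)")

def summarize_hard_errors(log_text):
    """Return {device: (error_count, distinct_range_count)} for hard errors."""
    counts = Counter()
    ranges = {}
    for device, _cmd, start, end in HARD_ERROR.findall(log_text):
        counts[device] += 1
        ranges.setdefault(device, set()).add((int(start), int(end)))
    return {dev: (counts[dev], len(ranges[dev])) for dev in counts}

# The first messages of the log, plus one controller timeout that must be ignored.
SAMPLE = """\
aacd3: hard error cmd=read 40789032-40789207
aacd3: hard error cmd=read 40789032-40789207
aacd1: hard error cmd=read 40790712-40790967
aacd3: hard error cmd=read 40789560-40789735
aac0: COMMAND 0xfffffe0000e97690 (TYPE 502) TIMEOUT AFTER 137 SECONDS
"""

if __name__ == "__main__":
    for device, (errors, distinct) in sorted(summarize_hard_errors(SAMPLE).items()):
        print(device, errors, "errors,", distinct, "distinct ranges")
```

Counting distinct ranges separately shows how many of the messages are retries of a range that already failed.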
      
      







During the test the Adaptec RAID 2805 controller struggled badly and kept crashing in a panic on the problem disks, so the test took six runs. Even so, the test completed, and I did not manage to lose any data.

Pool status
  pool: zroot
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub in progress since Fri Jan 26 16:05:37 2018
        67,0G scanned out of 101G at 71,3M/s, 0h8m to go
        2,63M repaired, 66,05% done
config:

        NAME         STATE     READ WRITE CKSUM
        zroot        DEGRADED     0     0     0
          raidz1-0   DEGRADED     0     0     0
            aacd0p4  ONLINE       0     0     0
            aacd1p4  FAULTED     40    93     7  too many errors  (repairing)
            aacd2p4  ONLINE       0     0     0
            aacd3p4  ONLINE       0     0     0

  pool: zroot
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 4,54M in 0h18m with 0 errors on Fri Jan 26 16:45:35 2018
config:

        NAME         STATE     READ WRITE CKSUM
        zroot        ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            aacd0p4  ONLINE       0     0     0
            aacd1p4  ONLINE       1     0    11
            aacd2p4  ONLINE       0     0     0
            aacd3p4  ONLINE       0     0     2

errors: No known data errors
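The READ, WRITE and CKSUM columns are the part of this output worth watching over time. As a rough sketch of how they could be extracted for monitoring, the Python below parses the config table of a `zpool status` listing; it assumes the usual one-row-per-line layout and plain integer counters, and `parse_error_counters` is a name made up here. The sample is the second (post-scrub) table from this test.

```python
def parse_error_counters(status_text):
    """Return {name: (read, write, cksum)} from a `zpool status` config table."""
    counters = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True       # header row: the table starts on the next line
        elif in_table and not fields:
            in_table = False      # a blank line ends the table
        elif in_table and len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            counters[fields[0]] = tuple(int(f) for f in fields[2:5])
    return counters

AFTER_SCRUB = """\
config:

        NAME         STATE     READ WRITE CKSUM
        zroot        ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            aacd0p4  ONLINE       0     0     0
            aacd1p4  ONLINE       1     0    11
            aacd2p4  ONLINE       0     0     0
            aacd3p4  ONLINE       0     0     2

errors: No known data errors
"""

if __name__ == "__main__":
    noisy = {name: c for name, c in parse_error_counters(AFTER_SCRUB).items() if any(c)}
    print(noisy)
```

Here only aacd1p4 and aacd3p4 carry non-zero counters, the same two drives that dominate the controller's error log.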
      
      







As a secondary integrity check I used torrents: rechecking them against the test data set found no errors.
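A torrent recheck works by hashing fixed-size pieces of the data and comparing them with the hashes stored in the torrent metadata (SHA-1 in BitTorrent v1). The same idea can be sketched in Python for a single file. The function names `piece_hashes` and `find_corrupt_pieces` and the 256 KiB piece size are illustrative choices, not details of the setup used in this test.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # illustrative piece size; real torrents choose their own

def piece_hashes(path, piece_size=PIECE_SIZE):
    """Return the SHA-1 hex digest of each fixed-size piece of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            digests.append(hashlib.sha1(piece).hexdigest())
    return digests

def find_corrupt_pieces(path, expected, piece_size=PIECE_SIZE):
    """Return the indexes of pieces whose current hash differs from `expected`."""
    actual = piece_hashes(path, piece_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # A length mismatch means pieces were truncated or appended.
    shorter, longer = sorted((len(actual), len(expected)))
    bad.extend(range(shorter, longer))
    return bad
```

Recording `piece_hashes(path)` before copying a file to the pool and calling `find_corrupt_pieces(path, recorded)` afterwards gives an integrity check that is independent of the checksums ZFS keeps itself.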



Summary



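ZFS performed well even under clearly bad conditions. I set it up on worn-out disks, plus a controller whose firmware crashed because of the disk problems. Notably, writes to the disks were very fast (they saturated the whole gigabit network card), while reads that ran into errors took a very long time. What to do with this is up to you ;)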


