Differences

This shows you the differences between two versions of the page.

--- doc:appunti:hardware:raspberrypi_nas_smart_hard_disk [2021/03/10 10:39] – niccolo
+++ doc:appunti:hardware:raspberrypi_nas_smart_hard_disk [2022/01/07 14:04] (current) – [Letting the hard drive to remain in standby mode] niccolo
@@ Line 1: / Line 1: @@
 ====== Raspberry Pi NAS: Hard disk management ======
-===== smartctl and hdparm =====
+This page is about configuring the **hard disk power management** on a Raspberry Pi, to be used as NAS and mediacenter. The hard disk is connected via a **[[raspberrypi_nas_x835_sata_board|SupTronics X835 shield]]**. See the main index at **[[raspberrypi_nas]]**.
+There are two tools to manage SATA disks in GNU/Linux. **smartctl** is designed to interact with the **[[wp>S.M.A.R.T.]]** capabilities of a disk. **hdparm** is a command line interface to the Linux kernel SATA subsystem, including Advanced Power Management.
+===== smartctl =====
 The **smartctl** tool from the **smartmontools** package requires the **%%-d sat%%** option to access the hard disk through the USB bridge with the right protocol.
@@ Line 16: / Line 20: @@
 # Disable DEVICESCAN, which does not work in our environment.
 #DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
-# Send an email test to <root@localhost> on daemon start.
-/dev/sda -d sat -m root@localhost -M test
 # Use the suggested subset of checks, instead of the '-a'.
+# NOTICE: We are running smartd with option --interval=3600 instead of the
+# 1800 default, i.e. device polling occurs every 1 hour instead of 30 minutes.
+# The number of skipped checks (option -n) must be multiplied by that value
+# to obtain the maximum time that checks will be skipped:
+# 336 * 3600 seconds = 14 days.
 /dev/sda -d sat \
     -H \                # Check the health with the SMART RETURN STATUS command
@@ Line 27: / Line 33: @@
     -f \                # Check for 'failure' of any Usage Attributes
     -n standby,336 \    # Skip smartd checks during standby (max 336 times, add ',q' for quiet)
-    -W 0,50,60 \        # Report Temperature Celsius WARN=50, CRIT=60 (SMART attribute 194)
+    -W 0,50,60 \        # Temperature (SMART att. 194): WARN=50 (log), CRIT=60 (mail)
     -s S/../../1/23 \   # Schedule a short Self-Test at 23:00 every Monday.
+    -s O/../.././21 \   # Schedule an Offline Immediate Test every day at 21:00.
     -m root@localhost \ # Send a warning email on failures and errors
     -M daily \          # Repeat email warnings daily
@@ Line 42: / Line 49: @@
 Restart the **smartd.service** and verify that the **smartd** program is running with that parameter (3600 seconds, i.e. one hour, instead of 30 minutes).
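The arithmetic in the NOTICE comment of the configuration above can be checked with a short Python sketch. The function name `max_skip_time` is made up for this illustration; 3600 and 336 are the `--interval` and `-n standby,336` values used on this page.

```python
from datetime import timedelta


def max_skip_time(interval_seconds: int, max_skipped_checks: int) -> timedelta:
    """Longest time smartd may keep skipping checks while the disk is in standby.

    interval_seconds:   polling interval passed to smartd (--interval=N)
    max_skipped_checks: the number after 'standby,' in the -n directive
    """
    return timedelta(seconds=interval_seconds * max_skipped_checks)


if __name__ == "__main__":
    # Values used on this page: hourly polling, at most 336 skipped checks.
    print(max_skip_time(3600, 336))  # 14 days, 0:00:00
```

With the default interval of 1800 seconds the same `-n standby,336` setting would cover only 7 days, which is why the two values have to be chosen together.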
-To set the **Advanced Power Management** level use the **%%-B%%** option, the maximum performance level which **permits spin-down** of the drive is 127:
+At each cycle, all the **%%-s%%** options are checked for a match; the first match is executed and the remaining ones are ignored.
+The syntax for the **%%-s%%** option (test scheduling) is as follows:
 <code>
-hdparm -B 127 /dev/sda
+-s T/MM/DD/d/HH
+   |  |  | |  |
+   |  |  | |  |
+   |  |  | |  \-- Hour of day, 2 digits
+   |  |  | \----- Day of week, 1 is Monday
+   |  |  \------- Day of the month, 2 digits
+   |  \---------- Month, 2 digits
+   \------------- (L)ong Self-Test, (S)hort Self-Test,
+                  (C)onveyance Self-Test, (O)ffline Immediate Test
 </code>
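To see how a schedule pattern relates to a given date, the following Python sketch builds the `T/MM/DD/d/HH` string for a timestamp and matches a pattern against it. It is only an approximation: it assumes that the pattern must match the whole 12-character string, and it uses Python's `re` module as a stand-in for the POSIX extended regular expressions used by smartd. The function names `schedule_string` and `is_scheduled` are hypothetical.

```python
import re
from datetime import datetime


def schedule_string(test_type: str, when: datetime) -> str:
    """Build the T/MM/DD/d/HH string for one test type and one point in time."""
    # isoweekday() returns 1 for Monday ... 7 for Sunday, as in the syntax above.
    return f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"


def is_scheduled(pattern: str, test_type: str, when: datetime) -> bool:
    """Return True if the -s pattern matches the whole string for this time."""
    return re.fullmatch(pattern, schedule_string(test_type, when)) is not None


if __name__ == "__main__":
    monday_evening = datetime(2022, 1, 3, 23, 0)  # 3 January 2022 was a Monday
    print(schedule_string("S", monday_evening))   # S/01/03/1/23
    print(is_scheduled("S/../../1/23", "S", monday_evening))
    print(is_scheduled("O/../.././21", "O", datetime(2022, 1, 7, 21, 30)))
```

Each dot in the pattern matches any single character, so `S/../../1/23` means "short test, any month, any day of the month, Monday, 23:00".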
-Regardless of the APM level, we can set the hard disk **Standby timer** (spindown) after 30 minutes of inactivity. See **man hdparm** for explanation of the number following the **%%-S%%** option (it seems that there is not way to know the timeout once you have set it):
+We should use the **background test mode**, which does not interrupt normal disk activity.
+  * **Short Self-Test**: runs some tests on electrical properties, mechanical properties and read/verify. It is short (typically 2 minutes, under ten minutes in any case) and can be run during normal system operation.
+  * **Long Self-Test**: like the short test, but with no time restriction, and the read/verify step spans the entire disk. It takes from tens of minutes to several hours to complete and can be run during normal system operation.
+  * **Conveyance Test** (ATA only): checks for damage incurred during transport of the device. It requires a few minutes to complete.
+  * **Offline Immediate Test**: the result of this test is the data collection reflected in the values of the SMART Attributes. Thus, if problems or errors are detected, the values of these Attributes will go below their failure thresholds; some types of errors may also appear in the SMART error log. These are visible with the ''-A'' and ''-l error'' options respectively.
+To display the log of the executed self-tests, run:
 <code>
-hdparm -S 241 /dev/sda
+smartctl -d sat -l selftest /dev/sda
 </code>
-**WARNING**: If both **APM** and the **Standby timer** are set, then the device shall go to the Standby state when the timer expires or the device’s APM algorithm indicates that the Standby state should be entered. If you want a time exact spin-down, use the -S Standby timer, because APM algorithm (including its spin-down timeout) is device dependant and you cannot control it. See p.19 of **[[https://web.archive.org/web/20160701095638/https://www.t13.org/documents/UploadedDocuments/docs2011/d2015r7-ATAATAPI_Command_Set_-_2_ACS-2.pdf|ATA/ATAPI Command Set - 2 (ACS-2)]]**.
+The output is something like this:
-Enable the **most quiet acoustic management** (it is **not supported** in my case: 4 Tb Seagate IronWolf):
 <code>
-hdparm -M 128 /dev/sda
+=== START OF READ SMART DATA SECTION ===
+SMART Self-test log structure revision number 1
+Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
+# 1  Short offline       Completed without error       00%       525         -
+# 2  Short offline       Completed without error       00%       377         -
+# 3  Short offline       Completed without error       00%       252         -
+# 4  Conveyance offline  Completed without error       00%       252         -
 </code>
-**Check** whether the drive is in **standby** mode, without waking it up:
+===== hdparm =====
+To set the **Advanced Power Management** level use the **%%-B%%** option. The highest performance level that still **permits spin-down** of the drive is 127:
 <code>
-smartctl -d sat --nocheck=standby,3 /dev/sda
+hdparm -B 127 /dev/sda
 </code>
-The meaning of **standby,3** is: **do not check** the disk if it is in **SLEEP** or **STANDBY** mode (not spinning), return **exit code 3** in this case (you can choose your custom exit code). Beside the exit code, these are the output messages:
+Regardless of the APM level, we can set the hard disk **Standby timer** so that it spins down after 30 minutes of inactivity. See **man hdparm** for an explanation of the number following the **%%-S%%** option (up to 20 minutes the granularity of the timer is 5 seconds; above that threshold it is 30 minutes). It seems that there is no way to read back the timeout once you have set it.
 <code>
-Device is in ACTIVE or IDLE mode
+hdparm -S 241 /dev/sda
 </code>
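The encoding of the **%%-S%%** value is easy to get wrong, so here is a small Python sketch that decodes it into seconds, following the rules described in **man hdparm**. The function name `standby_timeout_seconds` is made up for this example; it only does the arithmetic and does not talk to the disk.

```python
from typing import Optional


def standby_timeout_seconds(value: int) -> Optional[int]:
    """Decode an 'hdparm -S' value into a timeout in seconds.

    Returns None when the timer is disabled (value 0).
    Raises ValueError for values without a fixed timeout.
    """
    if not 0 <= value <= 255:
        raise ValueError("the -S value must be between 0 and 255")
    if value == 0:
        return None                       # standby timer disabled
    if value <= 240:
        return value * 5                  # multiples of 5 seconds, up to 20 minutes
    if value <= 251:
        return (value - 240) * 30 * 60    # 1 to 11 units of 30 minutes
    if value == 252:
        return 21 * 60                    # 21 minutes
    if value == 255:
        return 21 * 60 + 15               # 21 minutes plus 15 seconds
    # 253 is a vendor-defined period, 254 is reserved.
    raise ValueError(f"value {value} has no fixed timeout")


if __name__ == "__main__":
    print(standby_timeout_seconds(241))   # 1800, i.e. 30 minutes
```

The value 241 used above is therefore the first step of the coarse range: one unit of 30 minutes.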
+**WARNING**: If both **APM** and the **Standby timer** are set, then the device shall go to the Standby state when the timer expires or the device’s APM algorithm indicates that the Standby state should be entered. If you want the spin-down to occur after an exact time, use the -S Standby timer, because the APM algorithm (including its spin-down timeout) is device dependent and you cannot control it. See p.19 of **[[https://web.archive.org/web/20160701095638/https://www.t13.org/documents/UploadedDocuments/docs2011/d2015r7-ATAATAPI_Command_Set_-_2_ACS-2.pdf|ATA/ATAPI Command Set - 2 (ACS-2)]]**.
+Enable the **quietest acoustic management** setting (it is **not supported** in my case: 4 TB Seagate IronWolf):
 <code>
-Device is in STANDBY mode, exit(3)
+hdparm -M 128 /dev/sda
 </code>
-**WARNING**: Reading SMART attributes using the **smartctl** command will reset the Standby timer, even if using the **%%--nocheck=standby%%** option; i.e. that option will prevent exiting from the standby mode, but if ''smartctl'' is executed when standby is not yet active, it will reset the timer and this may prevent entering the standby mode, regardless of that option.
 Debian provides the file **/etc/hdparm.conf**, which is handled by **udev** on system start to set the required parameters. You can set something like this:
@@ Line 86: / Line 119: @@
 # It is advisable to disable write cache in this case.
 # See man hdparm(8), -S and -W options.
+# The granularity of the spindown_time up to 20 minutes is 5
+# seconds, above that threshold it is 30 minutes.
 /dev/sda {
         write_cache = off
@@ Line 97: / Line 132: @@
 DEVNAME=/dev/sda /lib/udev/hdparm
 </code>
+===== Querying the disk status =====
+**Check** whether the drive is in **standby** mode, without waking it up:
+<code>
+smartctl -d sat --nocheck=standby,3 /dev/sda
+</code>
+The meaning of **standby,3** is: **do not check** the disk if it is in **SLEEP** or **STANDBY** mode (not spinning), and return **exit code 3** in this case (you can choose your own exit code). Besides the exit code, these are the output messages:
+<code>
+Device is in ACTIVE or IDLE mode
+</code>
+<code>
+Device is in STANDBY mode, exit(3)
+</code>
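If the check has to be scripted, the exit code and the two messages above can be combined as in the following Python sketch. The names `classify` and `query_power_state` are hypothetical, and the logic assumes that the messages are printed exactly as shown on this page; any other combination is reported as unknown.

```python
import subprocess

# Assumption: the same custom exit code passed to --nocheck=standby,3 above.
STANDBY_EXIT_CODE = 3


def classify(returncode: int, output: str) -> str:
    """Map the smartctl exit code and output text to a power state label."""
    if returncode == STANDBY_EXIT_CODE and "STANDBY mode" in output:
        return "standby"
    if "ACTIVE or IDLE mode" in output:
        return "active_or_idle"
    return "unknown"


def query_power_state(device: str = "/dev/sda") -> str:
    """Run the smartctl command shown above and classify its result."""
    result = subprocess.run(
        ["smartctl", "-d", "sat",
         f"--nocheck=standby,{STANDBY_EXIT_CODE}", device],
        capture_output=True,
        text=True,
    )
    return classify(result.returncode, result.stdout)


if __name__ == "__main__":
    # Needs smartmontools installed and sufficient privileges (usually root).
    print(query_power_state())
```

Keeping `classify` separate from the `subprocess` call makes the decision logic testable without a disk attached.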
+**WARNING**: Reading SMART attributes using the **smartctl** command will reset the Standby timer, even if using the **%%--nocheck=standby%%** option; i.e. that option will prevent exiting from the standby mode, but if ''smartctl'' is executed when standby is not yet active, it will reset the timer and this may prevent entering the standby mode, regardless of that option.
 **WARNING**: When **smartctl** reports the device as being in **ACTIVE or IDLE mode**, the disk may actually not be spinning.
@@ Line 105: / Line 160: @@
   * **updatedb** - Executed by the **/etc/cron.daily/mlocate** cronjob, it updates the database of files stored on the hard disk. Add the directories to be skipped to **/etc/updatedb.conf**.
-  * **smartctl** - Reading SMART attributes (e.g. disk temperature, errors log) awake the disk. Hou may have periodic checks executed by **snmpd**, etc.
+  * **smartctl** - Reading SMART attributes (e.g. disk temperature, error log) wakes up the disk. You may have periodic checks executed by **snmpd**, etc.
 ===== Web References =====
   * **[[https://www.thomas-krenn.com/en/wiki/SMART_tests_with_smartctl|SMART tests with smartctl]]**
+  * **[[https://serverfault.com/questions/275364/get-drive-power-state-without-waking-it-up/|Get drive power state without waking it up]]**
+  * **[[https://unix.stackexchange.com/questions/366438/hard-disk-not-going-to-standby-automatically|Hard disk not going to standby automatically]]**
+  * **[[https://serverfault.com/questions/305847/how-to-determine-disc-spindown-time|How to determine disc spindown time]]**