You can replace a hard disk on an Exadata cell node by performing the following checks and steps.
Hard disks on a cell node are dropped automatically when they fail, and the corresponding ASM disks are dropped from their disk groups as well. A rebalance starts automatically in the affected ASM disk groups after the disk is dropped, and again after it is replaced.
Find the Damaged Disk
```
CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status LIKE ".*failure.*" DETAIL
         name:                   20:11
         deviceId:               8
         diskType:               HardDisk
         enclosureDeviceId:      20
         errMediaCount:          980
         errOtherCount:          0
         foreignState:           false
         luns:                   0_11
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       0A25
         physicalInsertTime:     2011-11-29T13:40:05+02:00
         physicalInterface:      sas
         physicalSerial:         E1EY5Z
         physicalSize:           558.9109999993816G
         slotNumber:             11
         status:                 predictive failure
```
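To script this check instead of reading the output by eye, the DETAIL output can be split into one record per disk. The following is a minimal Python sketch, assuming the output has already been captured as text (for example by running the same command with `cellcli -e` on the cell). The helper names `parse_detail` and `failed_disks` are hypothetical, not part of any Oracle tool.

```python
def parse_detail(text):
    """Split CellCLI DETAIL output into one dict per object.

    A new record starts at every 'name:' line. Each line is split at the
    first colon only, because values such as '20:11' contain colons.
    """
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip the echoed command, blank lines and anything without a key.
        if not line or line.startswith("CellCLI>") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "name":
            records.append({})
        if records:
            records[-1][key] = value
    return records


def failed_disks(records):
    """Return (name, slotNumber, status) for disks whose status mentions 'failure'."""
    return [
        (r["name"], r.get("slotNumber"), r.get("status"))
        for r in records
        if "failure" in r.get("status", "")
    ]


sample = """\
CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status LIKE ".*failure.*" DETAIL
         name:                   20:11
         diskType:               HardDisk
         slotNumber:             11
         status:                 predictive failure
"""
print(failed_disks(parse_detail(sample)))  # [('20:11', '11', 'predictive failure')]
```

Splitting only at the first colon is the important design choice here: both the disk name (`20:11`) and timestamps such as `physicalInsertTime` contain colons of their own.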
```
CellCLI> list alerthistory
         1_1     2017-09-09T15:41:00+03:00       critical        "Hard disk status changed to predictive failure. Status : PREDICTIVE FAILURE Manufacturer : SEAGATE Model Number : ST360057SSUN600G Size : 600GB Serial Number : E1EY5Z Firmware : 0A25 Slot Number : 11 Cell Disk : CD_11_orclcel08 Grid Disk : RECO_ORCL_CD_11_orclcel08, DBFS_DG_CD_11_orclcel08, DATA_ORCL_CD_11_orclcel08"
```
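The alert text already names the cell disk and the grid disks that sit on the failed drive, and these are the names used in the following steps. A small Python sketch to pull them out, assuming the alert message has been copied as a string and that its `Cell Disk :` and `Grid Disk :` labels appear as in the output above; `disks_from_alert` is a hypothetical helper.

```python
import re


def disks_from_alert(message):
    """Extract the cell disk and grid disk names from a hard disk alert.

    Assumes the 'Cell Disk : ...' and 'Grid Disk : a, b, c' layout shown in
    the alert above; the grid disk list ends at a closing double quote or at
    the end of the string.
    """
    cell = re.search(r"Cell Disk\s*:\s*(\S+)", message)
    grid = re.search(r'Grid Disk\s*:\s*([^"]+)', message)
    celldisk = cell.group(1) if cell else None
    griddisks = [g.strip() for g in grid.group(1).split(",") if g.strip()] if grid else []
    return celldisk, griddisks


alert = ('"Hard disk status changed to predictive failure. Slot Number : 11 '
         'Cell Disk : CD_11_orclcel08 Grid Disk : RECO_ORCL_CD_11_orclcel08, '
         'DBFS_DG_CD_11_orclcel08, DATA_ORCL_CD_11_orclcel08"')
celldisk, griddisks = disks_from_alert(alert)
print(celldisk)
print(griddisks)
```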
Check the Rebalance Operation in ASM Disk Groups
Do not replace the disk while a rebalance operation is running; wait for it to finish. When no rebalance is running, the query on GV$ASM_OPERATION returns no rows.
```
[root@orcldb01 ~]# su - oracle
[oracle@orcldb01 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[oracle@orcldb01 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Mon Sep 11 15:46:58 2017

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from gv$asm_operation;

no rows selected

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
```
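If you would rather wait in a script than re-run the query by hand, a polling loop is enough. This is a minimal Python sketch under stated assumptions: `fetch_operations` is a hypothetical callable you supply that runs `select * from gv$asm_operation` and returns the rows as a list (with python-oracledb, for example, it could execute the query on a cursor and return `cursor.fetchall()`); the poll interval and timeout are arbitrary defaults, not recommendations.

```python
import time


def wait_for_rebalance(fetch_operations, poll_seconds=60,
                       timeout_seconds=4 * 3600, sleep=time.sleep):
    """Poll until no ASM operation is reported, or give up after a timeout.

    fetch_operations: caller-supplied function returning the rows of
    gv$asm_operation (hypothetical; not part of any Oracle library).
    Returns True when no operation is running, False if the timeout expired.
    """
    waited = 0
    while True:
        if not fetch_operations():  # same as "no rows selected": nothing is running
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with a fake query: placeholder rows twice, then no rows.
responses = [[("REBAL", "RUN")], [("REBAL", "RUN")], []]
print(wait_for_rebalance(lambda: responses.pop(0), poll_seconds=1,
                         sleep=lambda s: None))  # True
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; in normal use the default `time.sleep` is fine.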
Check the Status of the Disk to be Replaced
```
CellCLI> list celldisk where lun=0_11
         CD_11_orclcel08         proactive failure

CellCLI> list griddisk where celldisk=CD_11_orclcel08 attributes name,size,status,asmmodestatus
         DATA_ORCL_CD_11_orclcel08       423G            proactive failure       DROPPED
         DBFS_DG_CD_11_orclcel08         29.125G         proactive failure       DROPPED
         RECO_ORCL_CD_11_orclcel08       105.6875G       proactive failure       DROPPED
```
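Before pulling the disk, every grid disk on the failed cell disk should show `DROPPED` in the `asmmodestatus` column, as in the output above. A minimal Python sketch of that check, assuming the `list griddisk ... attributes name,size,status,asmmodestatus` output has been captured as text; the helper names are hypothetical. Because the status column can contain a space (`proactive failure`), the name and size are read from the left and `asmmodestatus` from the right.

```python
def griddisk_rows(text):
    """Parse 'attributes name,size,status,asmmodestatus' output into dicts."""
    rows = {}
    for raw in text.splitlines():
        fields = raw.split()
        # Skip blank lines, echoed commands and lines with too few columns
        # (for example the two-column celldisk listing).
        if len(fields) < 4 or fields[0] == "CellCLI>":
            continue
        name, size, asmmode = fields[0], fields[1], fields[-1]
        rows[name] = {"size": size,
                      "status": " ".join(fields[2:-1]),
                      "asmmodestatus": asmmode}
    return rows


def all_dropped(rows):
    """True only if at least one grid disk was parsed and all show DROPPED."""
    return bool(rows) and all(r["asmmodestatus"] == "DROPPED" for r in rows.values())


output = """\
CellCLI> list griddisk where celldisk=CD_11_orclcel08 attributes name,size,status,asmmodestatus
         DATA_ORCL_CD_11_orclcel08       423G            proactive failure       DROPPED
         DBFS_DG_CD_11_orclcel08         29.125G         proactive failure       DROPPED
         RECO_ORCL_CD_11_orclcel08       105.6875G       proactive failure       DROPPED
"""
rows = griddisk_rows(output)
print(rows["DATA_ORCL_CD_11_orclcel08"]["status"])  # proactive failure
print(all_dropped(rows))  # True
```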
Make Sure the ASM Disks on the Damaged Disk Are Removed from the Disk Groups
Use the following query to confirm that no ASM disks associated with the damaged disk remain in the disk groups; it should return no rows. The values in the NAME column of the V$ASM_DISK view are stored in upper case, so write the search pattern in upper case.
```
SQL> select * from v$asm_disk where name like '%CD_11_ORCLCEL08%';
```
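When this check is run from a script, it is easy to pass the cell disk name in the mixed case used on the cell (`CD_11_orclcel08`) and get no rows even though ASM disks still exist. A minimal Python sketch that upper-cases the name before building the pattern; the function name and the named bind `:pattern` (python-oracledb style) are assumptions, and the selected columns are only a subset of `V$ASM_DISK`.

```python
def asm_disk_query(celldisk_name):
    """Return the V$ASM_DISK lookup and its bind values for one cell disk.

    ASM stores disk names in upper case, so the LIKE pattern is upper-cased.
    Note that '_' is a single-character wildcard in LIKE; for names such as
    CD_11_ORCLCEL08 this only makes the match slightly broader.
    """
    sql = ("select name, path, mode_status, state "
           "from v$asm_disk where name like :pattern")
    return sql, {"pattern": "%" + celldisk_name.upper() + "%"}


sql, binds = asm_disk_query("CD_11_orclcel08")
print(binds)  # {'pattern': '%CD_11_ORCLCEL08%'}
```

With python-oracledb the pair would be passed as `cursor.execute(sql, binds)` followed by `cursor.fetchall()`; an empty result corresponds to the "no rows" outcome described above.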
Although this step is not strictly necessary, it is useful to remove the disk completely from the system with the following command. On some cell software versions this command may return a syntax error. However, if the query above returned any rows, you must run it.
```
CellCLI> alter physicaldisk 20:11 drop for replacement
Physical disk 20:11 was dropped for replacement.

CellCLI>
CellCLI> list physicaldisk 20:11 detail
         name:                   20:11
         deviceId:               8
         diskType:               HardDisk
         enclosureDeviceId:      20
         errMediaCount:          980
         errOtherCount:          0
         foreignState:           false
         luns:                   0_11
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       0A25
         physicalInsertTime:     2011-11-29T13:40:05+02:00
         physicalInterface:      sas
         physicalSerial:         E1EY5Z
         physicalSize:           558.9109999993816G
         slotNumber:             11
         status:                 warning - predictive failure - dropped for replacement
```
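The rule in this step can be written down as a small decision helper. A minimal Python sketch under these assumptions: `asm_rows` is the result of the `V$ASM_DISK` query above, the physical disk name is the `name` attribute from `LIST PHYSICALDISK` (here `20:11`), and all helper names are hypothetical. It only builds the CellCLI command string; it does not run it.

```python
def drop_command(physicaldisk_name):
    """CellCLI command that drops a physical disk before it is pulled."""
    return f"alter physicaldisk {physicaldisk_name} drop for replacement"


def drop_required(asm_rows):
    """The drop is optional in general, but required while ASM disks on
    the damaged disk are still listed in V$ASM_DISK."""
    return len(asm_rows) > 0


def ready_to_pull(physicaldisk_status):
    """True once the CellCLI status includes 'dropped for replacement'."""
    return "dropped for replacement" in physicaldisk_status


print(drop_command("20:11"))  # alter physicaldisk 20:11 drop for replacement
print(ready_to_pull("warning - predictive failure - dropped for replacement"))  # True
```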
Change the Damaged Disk
You can now replace the disk. To eject it, press the release button, pull the disk out slightly, and wait until its lights have turned off completely. Then remove the disk fully from the slot and insert the new one.
After the replacement, check the following to confirm that everything is back to normal.
After the disk is replaced, a rebalance starts automatically in the ASM disk groups. To shorten it, you can increase the rebalance power value; see the article titled “How To Increase ASM Rebalance Processing Speed” for details.
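The power is set per disk group with `ALTER DISKGROUP ... REBALANCE POWER n`. A minimal Python sketch that builds those statements; the disk group names are assumed from the grid disk prefixes in this example (`DATA_ORCL`, `RECO_ORCL`, `DBFS_DG`), and the value 8 is only an illustration. The valid upper limit depends on the ASM version and the disk group's compatibility setting, so the sketch checks only that the value is a non-negative integer.

```python
def rebalance_statements(diskgroups, power):
    """Build one ALTER DISKGROUP ... REBALANCE POWER statement per disk group.

    Power 0 halts a rebalance; higher values use more I/O to finish sooner.
    """
    if isinstance(power, bool) or not isinstance(power, int) or power < 0:
        raise ValueError("power must be a non-negative integer")
    return [f"alter diskgroup {dg} rebalance power {power};" for dg in diskgroups]


# Disk group names assumed from the grid disk prefixes shown earlier.
for stmt in rebalance_statements(["DATA_ORCL", "RECO_ORCL", "DBFS_DG"], 8):
    print(stmt)
```

The statements would be run in SQL*Plus connected as SYSASM, and progress can be followed in `gv$asm_operation` as in the earlier check.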
Make Sure Everything Is Normal After the Change
```
CellCLI> list physicaldisk 20:11 detail
         name:                   20:11
         deviceId:               8
         diskType:               HardDisk
         enclosureDeviceId:      20
         errMediaCount:          0
         errOtherCount:          0
         luns:                   0_11
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       E5AFNP
         physicalInsertTime:     2017-09-11T13:35:43+02:00
         physicalInterface:      sas
         physicalSerial:         L89VNM
         physicalSize:           558.9109999993816G
         slotNumber:             11
         status:                 normal

CellCLI> list celldisk where lun=0_11 detail
         name:                   CD_11_orclcel08
         comment:
         creationTime:           2017-09-11T13:35:43+02:00
         deviceName:             /dev/sdl
         devicePartition:        /dev/sdl
         diskType:               HardDisk
         errorCount:             0
         freeSpace:              0
         id:                     51a96b02-82d6-4d1e-bc82-a8d6e8305c28
         interleaving:           none
         lun:                    0_11
         raidLevel:              0
         size:                   557.859375G
         status:                 normal

CellCLI> list griddisk where celldisk=CD_11_orclcel08 attributes name,size,status,asmmodestatus
         DATA_ORCL_CD_11_orclcel08       423G            active  ONLINE
         DBFS_DG_CD_11_orclcel08         29.125G         active  ONLINE
         RECO_ORCL_CD_11_orclcel08       105.6875G       active  ONLINE
```
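The three checks above can be combined into one pass/fail test. A minimal Python sketch, assuming each command's output has been captured as a separate string; the helper names are hypothetical, and "healthy" here means only the conditions visible in the output above: physical disk and cell disk status `normal`, no media errors, and every grid disk `active` and `ONLINE`.

```python
def detail_values(text, key):
    """Return every value of `key` in CellCLI DETAIL output, in order."""
    values = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(key + ":"):
            values.append(line.split(":", 1)[1].strip())
    return values


def griddisks_online(text):
    """True if every grid disk line ends with the columns 'active ONLINE'."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.strip().startswith("CellCLI>")]
    return bool(rows) and all(r[-2:] == ["active", "ONLINE"] for r in rows)


def replacement_healthy(physicaldisk_text, celldisk_text, griddisk_text):
    """Combine the physical disk, cell disk and grid disk checks."""
    return (detail_values(physicaldisk_text, "status") == ["normal"]
            and detail_values(physicaldisk_text, "errMediaCount") == ["0"]
            and detail_values(celldisk_text, "status") == ["normal"]
            and griddisks_online(griddisk_text))


pd = "name: 20:11\nerrMediaCount: 0\nstatus: normal\n"
cd = "name: CD_11_orclcel08\nstatus: normal\n"
gd = """\
DATA_ORCL_CD_11_orclcel08 423G active ONLINE
DBFS_DG_CD_11_orclcel08 29.125G active ONLINE
RECO_ORCL_CD_11_orclcel08 105.6875G active ONLINE
"""
print(replacement_healthy(pd, cd, gd))  # True
```

Right after the swap, while the grid disks are being added back, `asmmodestatus` may show a transitional value (for example SYNCING), so a `False` result at that point may simply mean it is too early to check.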
You can also check every slot at the RAID controller level with MegaCli on the cell; each slot, including the replaced slot 11, should report "Online, Spun Up".
```
[root@orclcel08 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | egrep 'Slot Number|Firmware state'
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Online, Spun Up
Slot Number: 7
Firmware state: Online, Spun Up
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Online, Spun Up
```
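With twelve slots this output is easy to read by eye, but the same `egrep` output can also be checked in a script. A minimal Python sketch, assuming the output above has been captured as text; it pairs each `Slot Number` with the `Firmware state` line that follows it and reports any slot that is not `Online, Spun Up`. The helper names are hypothetical.

```python
def slot_states(megacli_output):
    """Map slot number -> firmware state from 'egrep'-filtered MegaCli output."""
    states = {}
    slot = None
    for raw in megacli_output.splitlines():
        line = raw.strip()
        if line.startswith("Slot Number:"):
            slot = int(line.split(":", 1)[1])
        elif line.startswith("Firmware state:") and slot is not None:
            states[slot] = line.split(":", 1)[1].strip()
            slot = None
    return states


def slots_not_online(states):
    """Return only the slots whose state is not 'Online, Spun Up'."""
    return {s: st for s, st in states.items() if st != "Online, Spun Up"}


output = "\n".join(f"Slot Number: {n}\nFirmware state: Online, Spun Up" for n in range(12))
states = slot_states(output)
print(len(states), slots_not_online(states))  # 12 {}
```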