Hardware faults on Exadata database nodes and cell nodes sometimes have to be detected and cleared manually. Even when the underlying condition was temporary, the hardware fault light stays on and a fault record is still displayed when the server is checked via ILOM.
It is possible to connect to the servers via SSH and to view and clear the related faults.
You can connect to a server through its ILOM IP address as follows.
    [root@exadb01 ~]# ssh 10.124.20.14
    Password:

    Oracle(R) Integrated Lights Out Manager

    Version 3.1.2.20.b r82465

    Copyright (c) 2013, Oracle and/or its affiliates. All rights reserved.
Once we connect, we can display the current errors as follows.
    -> show faulty
    Target              | Property               | Value
    --------------------+------------------------+---------------------------------
    /SP/faultmgmt/0     | fru                    | /SYS/MB/RISER1/PCIE4/F20CARD
    /SP/faultmgmt/0/    | class                  | fault.chassis.device.esm.eol.exc
     faults/0           |                        | eeded
    /SP/faultmgmt/0/    | sunw-msg-id            | SPX86-8002-S3
     faults/0           |                        |
    /SP/faultmgmt/0/    | component              | /SYS/MB/RISER1/PCIE4/F20CARD
     faults/0           |                        |
    /SP/faultmgmt/0/    | uuid                   | 9e8b85cb-6f07-c007-d592-c19a4618
     faults/0           |                        | 244e
    /SP/faultmgmt/0/    | timestamp              | 2017-10-17/18:21:16
     faults/0           |                        |
    /SP/faultmgmt/0/    | system_component_seria | 1038FMM112
     faults/0           | l_number               |
    /SP/faultmgmt/0/    | system_component_part_ | 602-4982-01
     faults/0           | number                 |
    /SP/faultmgmt/0/    | system_component_name  | SUN FIRE X4270 M2 SERVER
     faults/0           |                        |
    /SP/faultmgmt/0/    | system_component_manuf | Oracle Corporation
     faults/0           | acturer                |
    /SP/faultmgmt/0/    | chassis_serial_number  | 1038FMM112
     faults/0           |                        |
    /SP/faultmgmt/0/    | chassis_part_number    | 602-4982-01
     faults/0           |                        |
    /SP/faultmgmt/0/    | chassis_name           | SUN FIRE X4270 M2 SERVER
     faults/0           |                        |
    /SP/faultmgmt/0/    | chassis_manufacturer   | Oracle Corporation
     faults/0           |                        |
    /SP/faultmgmt/0/    | system_serial_number   | 1038FMM112
     faults/0           |                        |
    /SP/faultmgmt/0/    | system_part_number     | 602-4982-01
     faults/0           |                        |
    /SP/faultmgmt/0/    | system_name            | SUN FIRE X4270 M2 SERVER
     faults/0           |                        |
    /SP/faultmgmt/0/    | system_manufacturer    | Oracle Corporation
     faults/0           |                        |
    /SP/faultmgmt/0/    | fru_name               | Aura1
     faults/0           |                        |
    /SP/faultmgmt/0/    | fru_manufacturer       | Celestica Apodaca,Nuevo Leon,
     faults/0           |                        | Mexico
    /SP/faultmgmt/0/    | fru_serial_number      | 0111APO-1029AU007Q
     faults/0           |                        |
    /SP/faultmgmt/0/    | fru_rev_level          | 50
     faults/0           |                        |
    /SP/faultmgmt/0/    | fru_dash_level         | 02
     faults/0           |                        |
    /SP/faultmgmt/0/    | fru_part_number        | 511-1500
     faults/0           |                        |
When more than one fault is present, the additional faults are listed as /SP/faultmgmt/1, /SP/faultmgmt/2, and so on.
To clear a fault, we use the component shown in the Value column of its first row (the fru property). For the fault listed above under "/SP/faultmgmt/0", the component is "/SYS/MB/RISER1/PCIE4/F20CARD". We can clear the fault on this component as follows.
    -> set /SYS/MB/RISER1/PCIE4/F20CARD clear_fault_action=true
    Are you sure you want to clear /SYS/MB/RISER1/PCIE4/F20CARD (y/n)? y
    Set 'clear_fault_action' to 'true'
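When several faults are listed, picking the fru value out of each first row by hand gets tedious. The following is a minimal Python sketch of that lookup, assuming the show faulty output has already been saved as text with one table row per line. The names faulted_components and clear_commands are hypothetical helpers, not part of ILOM. The sketch only builds the set command strings; running them on the ILOM prompt, and answering the y/n confirmation, is still done by hand.

```python
import re

# Matches only the first row of each fault, e.g.
# "/SP/faultmgmt/0     | fru     | /SYS/MB/RISER1/PCIE4/F20CARD".
# Later rows use the target "/SP/faultmgmt/0/" and are skipped on purpose.
FRU_ROW = re.compile(r"^\s*/SP/faultmgmt/(\d+)\s*\|\s*fru\s*\|\s*(\S+)\s*$")


def faulted_components(show_faulty_output):
    """Return {fault index: FRU path} parsed from saved 'show faulty' text."""
    components = {}
    for line in show_faulty_output.splitlines():
        match = FRU_ROW.match(line)
        if match:
            components[int(match.group(1))] = match.group(2)
    return components


def clear_commands(show_faulty_output):
    """Build one 'set <fru> clear_fault_action=true' line per listed fault."""
    found = faulted_components(show_faulty_output)
    return [f"set {fru} clear_fault_action=true" for _, fru in sorted(found.items())]


# Sample text copied from the listing in this post (assumed saved to a string).
sample = """\
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/RISER1/PCIE4/F20CARD
/SP/faultmgmt/0/    | class                  | fault.chassis.device.esm.eol.exc
 faults/0           |                        | eeded
"""

for command in clear_commands(sample):
    print(command)
```

For the sample above, this prints the same set command that was typed manually in this post.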
When we run show faulty again afterwards, no faults are listed.
    -> show faulty
    Target              | Property               | Value
    --------------------+------------------------+---------------------------------
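If this check is repeated across many nodes, the "empty table" test can be expressed in a few lines. This is a rough Python sketch, assuming the show faulty output was captured as text; has_faults is a hypothetical helper name, and the rule it applies (any row whose target starts with /SP/faultmgmt/ means a fault is still present) is inferred from the listings in this post rather than from ILOM documentation.

```python
def has_faults(show_faulty_output):
    """Return True if any table row targets /SP/faultmgmt/ (a fault is listed)."""
    return any(
        line.lstrip().startswith("/SP/faultmgmt/")
        for line in show_faulty_output.splitlines()
    )


# Output after the fault was cleared: header and separator only, no rows.
clean = """\
-> show faulty
Target              | Property               | Value
--------------------+------------------------+---------------------------------
"""

print(has_faults(clean))
```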