Oracle VM Server stuck in 'Starting' status indefinitely

Recently I came across unusual behaviour on an Oracle VM Server (installed on bare metal): Oracle VM Manager could connect to it, but when I tried to start the server it stayed in the 'Starting' state indefinitely. Below are the environment, a problem summary, and the solution that worked for us.


Environment-
  • Oracle Linux 7.5
  • Architecture- x86-64
  • Oracle VM Server 3.4 installed on Bare Metal
  • Oracle VM Manager 3.4
Problem Summary-
Some network changes in the environment caused the UUID of the affected blade servers to change. As a result, when we tried to start the blade servers they stayed in the 'Starting' state indefinitely and never actually started.
On closer inspection it turned out that the UUID (Universally Unique Identifier) of the bare metal server had changed. The same showed up in the log when I tried to 'Refresh' the server via OVM:

LOG in OVM
-------------------------------------------------------------------------------------
OVMAPI_6000E Internal Error: OVMAPI_4021E Server discover conflict at IP address: 10.0.0.10. The manager already has a server: DC-BS-009, at this IP address, with SMBIOS UUID: d0:67:26:c7:01:00:d0:67:26:c7:01:08:ff:ff:ff:ff. But the server now being discovered: unknown, at that same IP address, has a different SMBIOS UUID: d0:67:26:c7:01:00:d0:67:26:c7:01:00:ff:ff:ff:ff. This can happen in these cases: 1) This server has the same IP address as another server. Please correct that on the servers. Or, 2) The SMBIOS UUID of this server has changed due to a server motherboard change. Please delete the server from the manager and then re-discover it. Or, 3) The SMBIOS UUID has changed due to moving the blade in the chassis and there is an incorrect blade chassis SMBIOS UUID setting which allows the UUID of the server to change with the slot. Please update the blade chassis's SMBIOS UUID settings and re-discover. [Wed Oct 24 17:58:33 IST 2018]
-------------------------------------------------------------------------------------
This was a serious problem for us, because the bare metal servers could not start. The log message clearly states both the problem and the possible fixes.
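When the message is buried in a long log, it helps to pull out just the two UUIDs and compare them. The snippet below is a minimal sketch of that idea, not an Oracle tool: `extract_smbios_uuids` is a hypothetical helper name, the sample text is shortened from the OVMAPI_4021E message above, and it assumes a `grep` that supports `-o` (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical helper: print every SMBIOS UUID (16 colon-separated hex bytes)
# found in the text read from stdin, one per line, in order of appearance.
extract_smbios_uuids() {
    grep -oE '([0-9A-Fa-f]{2}:){15}[0-9A-Fa-f]{2}'
}

# Sample text shortened from the OVMAPI_4021E message in the log above.
msg='The manager already has a server: DC-BS-009, at this IP address, with SMBIOS UUID: d0:67:26:c7:01:00:d0:67:26:c7:01:08:ff:ff:ff:ff. But the server now being discovered: unknown, at that same IP address, has a different SMBIOS UUID: d0:67:26:c7:01:00:d0:67:26:c7:01:00:ff:ff:ff:ff.'

# The first UUID is the one the manager has stored, the second is the one
# the server reports now.
known=$(printf '%s\n' "$msg" | extract_smbios_uuids | sed -n '1p')
found=$(printf '%s\n' "$msg" | extract_smbios_uuids | sed -n '2p')

if [ "$known" != "$found" ]; then
    echo "UUID mismatch: manager has $known, server reports $found"
fi
```

In this log the two values differ only in the 12th byte (08 versus 00), which is easy to miss when reading the raw message.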

Cause-
As the log message states, this can happen in the following cases:
  1. This server has the same IP address as another server. 
  2. The SMBIOS UUID of this server has changed due to a server motherboard change. 
  3. The SMBIOS UUID has changed due to moving the blade in the chassis and there is an incorrect blade chassis SMBIOS UUID setting which allows the UUID of the server to change with the slot. 
How to detect a changed UUID?
The following symptoms point to a changed UUID:
  • The server is unable to join the cluster after a reboot.
  • "Unable to send notification" messages appear on the console.
  • The OVM Manager fails to rediscover the server after a reboot (it tries 5 times and gives up).
  • The OVM Server goes into the ERROR state in OVM Manager, with an error like: "The server has changed IP or is unreachable".
Solution that worked-
We need to pin the UUID of the OVM Server so that it does not change, regardless of any network changes:
1. Get the OVM Server UUID from OVM Manager, under the Advanced section, by choosing "Info" as the perspective.
2. On the Oracle VM Server, add this UUID to the line starting with "fakeuuid" in /etc/ovs-agent/agent.ini, since no UUID was set there:
# cat /etc/ovs-agent/agent.ini
[server]
fakeuuid= 44:45:4c:4c:54:00:10:4e:80:48:c3:c0:4f:4a:48:31
3. Restart the ovs-agent service on the OVM Server:
# service ovs-agent restart
4. Refresh the server via the OVM CLI-
 4.1 Log in to the CLI from the manager server
     # ssh admin@localhost -p 10000
 4.2 List the servers and then refresh the affected one
     OVM> list server
     OVM> refresh server name=<Name of server found by "list server" command>

The status of the server will change from 'Starting' to "Running".
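Step 2 above is a manual edit, and a typo in the UUID is easy to make. Below is a minimal sketch of how the edit could be scripted with a format check and a backup. `set_fakeuuid` is a hypothetical helper, not something shipped with Oracle VM; it assumes the file already contains a line starting with "fakeuuid" (as ours did) and it writes the value without a space after "=". The demonstration runs against a scratch file rather than the live /etc/ovs-agent/agent.ini.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle VM): set the fakeuuid value in an
# agent.ini-style file. Usage: set_fakeuuid <file> <uuid>
set_fakeuuid() {
    file=$1
    uuid=$2

    # Accept only 16 colon-separated hex bytes, the format OVM Manager shows.
    if ! printf '%s\n' "$uuid" | grep -qE '^([0-9A-Fa-f]{2}:){15}[0-9A-Fa-f]{2}$'; then
        echo "invalid UUID: $uuid" >&2
        return 1
    fi

    # Refuse to guess where the key belongs if no fakeuuid line exists.
    if ! grep -q '^fakeuuid' "$file"; then
        echo "no fakeuuid line in $file" >&2
        return 1
    fi

    # Keep a backup, then rewrite the fakeuuid line from the backup copy.
    cp -p "$file" "$file.bak" || return 1
    sed "s/^fakeuuid.*/fakeuuid=$uuid/" "$file.bak" > "$file"
}

# Demonstration on a scratch file that mimics an agent.ini with an empty value.
demo=$(mktemp)
printf '[server]\nfakeuuid=\n' > "$demo"
set_fakeuuid "$demo" "44:45:4c:4c:54:00:10:4e:80:48:c3:c0:4f:4a:48:31"
cat "$demo"
```

On a real server the function would be called as root with /etc/ovs-agent/agent.ini as the first argument, followed by the ovs-agent restart from step 3. The original file stays available as agent.ini.bak if the change needs to be rolled back.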
