2 Down CMMs on 61000 chassis

From cpwiki.net
Check Point Professional Services

problem description

symptoms

1. "asg stat" reports 2 CMMs down, with 0 / 2 CMMs up on chassis 1, as seen here:

[Expert@61k_fw-ch01-01]# asg stat
--------------------------------------------------------------------------
| System Status                                                          |
--------------------------------------------------------------------------
| Up time                   | 1 year, 38 days, 23:14:58 hours            |
--------------------------------------------------------------------------
| Current CPUs load average | 1 %                                        |
| Concurrent connections    | 5176                                       |
| Health                    | CMMs                        2 Down         |
--------------------------------------------------------------------------
| Chassis 1                 | STANDBY                    UP / Required   |
|                           |   SGMs                     12 / 12         |
|                           |   Ports                     2 / 2          |
|                           |   Fans                      6 / 6          |
|                           |   SSMs                      2 / 2          |
|                           |   CMMs                      0 / 2   (!)    |
|                           |   Power Supplies            5 / 5          |

2. The bay 1 (bottom) CMM has a red status light.

The bay 1 CMM had a red status light. I am not sure which LED (act, ctr, pwe, mjr, hs, mnr) was red; I did not get that detail from the onsite person helping me. The bay 2 LEDs were normal, with none red.

3. Internal (CIN) network connectivity to the CMMs is failing.

There were no console cables plugged into the CMM cards on this device, so no troubleshooting could be done over the console. The active CMM was not reachable at the CMM IPs 198.51.100.33 or 198.51.100.233. These are the IPs used on all 61000 devices for communicating with the CMMs. The packet captures below show the CMM not responding to ARP requests.

Listing of the chassis interfaces used for CMM connectivity:

[Expert@61k_fw-ch01-01]# ifconfig -a |grep -A 1 CIN
eth1-CIN    Link encap:Ethernet  HWaddr 00:1C:7F:20:14:7C  
           inet addr:198.51.100.1  Bcast:198.51.100.127  Mask:255.255.255.128
eth2-CIN    Link encap:Ethernet  HWaddr 00:1C:7F:20:14:7D  
           inet addr:198.51.100.201  Bcast:198.51.100.255  Mask:255.255.255.128

Packet captures taken on the CMM networks show ARP requests but no replies. Neither CMM appears to be responding on its network connection.

[Expert@61k_fw-ch01-01]# tcpdump -i eth2-CIN
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2-CIN, link-type EN10MB (Ethernet), capture size 96 bytes
05:37:25.865315 arp who-has 198.51.100.233 tell 198.51.100.204
05:37:26.864981 arp who-has 198.51.100.233 tell 198.51.100.204
05:37:27.864873 arp who-has 198.51.100.233 tell 198.51.100.204
05:37:29.467760 arp who-has 198.51.100.233 tell 198.51.100.204
...

[Expert@61k_fw-ch01-01]# tcpdump -i eth1-CIN
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1-CIN, link-type EN10MB (Ethernet), capture size 96 bytes
05:38:01.206437 arp who-has 198.51.100.33 tell 198.51.100.1
05:38:02.206092 arp who-has 198.51.100.33 tell 198.51.100.1
05:38:04.143829 arp who-has 198.51.100.33 tell 198.51.100.1
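
To put a number on the unanswered ARP requests instead of eyeballing the capture, the tcpdump text output can be summarized per target IP. This is a minimal sketch, assuming a hypothetical Bash helper named arp_summary (not a Check Point tool) and assuming the capture is printed either in the older "arp who-has X tell Y" form shown above or in the newer "ARP, Request who-has ... / ARP, Reply ..." form.

```shell
# arp_summary TARGET_IP  (hypothetical helper, not part of Gaia or asg)
# Reads tcpdump text output on stdin and prints "requests=N replies=M"
# for ARP packets that ask about, or answer for, TARGET_IP.
# Assumed formats:
#   older tcpdump: "arp who-has 198.51.100.33 tell 198.51.100.1"
#                  "arp reply 198.51.100.33 is-at 00:1c:..."
#   newer tcpdump: "ARP, Request who-has 198.51.100.33 tell 198.51.100.1, length 28"
#                  "ARP, Reply 198.51.100.33 is-at 00:1c:..., length 46"
arp_summary() {
  awk -v t="$1" '
    {
      $0 = tolower($0)   # normalize "ARP, Reply" and "arp reply" alike; re-splits fields
      for (i = 1; i < NF; i++) {
        # A request names the target right after "who-has"
        if ($i == "who-has" && $(i + 1) == t) req++
        # A reply names the target right after "reply"
        if ($i == "reply" && $(i + 1) == t) rep++
      }
    }
    END { printf "requests=%d replies=%d\n", req, rep }
  '
}
```

For example, `tcpdump -n -l -c 10 -i eth1-CIN arp 2>/dev/null | arp_summary 198.51.100.33` would summarize the next 10 ARP packets seen on eth1-CIN. A CMM that is answering would be expected to show some replies; in the captures above, every request went unanswered.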

problem resolution

Because there were no CMM console cables and no telnet/SSH connectivity, we resorted to physically resetting the cards one at a time by pulling each one out and re-inserting it: first bay 1, then bay 2. Resetting bay 1 produced no change in status. After bay 2 was reset, the red status light on the bay 1 CMM went green, and the CMM status changed from 0/2 up to 2/2 up, as seen below.

[Expert@61k_fw-ch01-01]# asg stat | grep -i -E "chassis|cmms"
| Chassis 1                 | STANDBY                    UP / Required   |
|                           |   CMMs                      2 / 2          |
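
When reseating cards one at a time, it can help to poll "asg stat" until every chassis reports all CMMs up, rather than re-running it by hand after each step. The helpers below are a hypothetical sketch, not part of the asg tooling: they assume the "CMMs  up / required" row layout shown in the outputs above, and STATUS_CMD is an illustrative override so the logic can be tried without a chassis.

```shell
# Hypothetical helpers (not Check Point tools). They assume each chassis
# section of "asg stat" contains a row such as
#   |                           |   CMMs                      0 / 2   (!)    |

# cmm_counts: read "asg stat" text on stdin and print "UP REQUIRED" for every
# chassis-level CMMs row. The Health row ("CMMs   2 Down") has no "N / M"
# pair, so it is skipped.
cmm_counts() {
  awk '
    /CMMs/ && match($0, /[0-9]+ *\/ *[0-9]+/) {
      pair = substr($0, RSTART, RLENGTH)
      gsub(/ /, "", pair)            # "0 / 2" -> "0/2"
      split(pair, n, "/")
      print n[1], n[2]
    }
  '
}

# cmms_all_up: succeed only if at least one CMMs row was found and every
# row reports UP equal to REQUIRED.
cmms_all_up() {
  cmm_counts | awk '
    { rows++; if ($1 + 0 != $2 + 0) bad++ }
    END { if (rows > 0 && bad == 0) exit 0; exit 1 }
  '
}

# wait_for_cmms [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Re-runs the status command (default "asg stat", overridable through the
# hypothetical STATUS_CMD variable) until all CMMs rows are up or time runs out.
wait_for_cmms() {
  local timeout="${1:-600}" interval="${2:-15}" waited=0
  local cmd="${STATUS_CMD:-asg stat}"
  while [ "$waited" -lt "$timeout" ]; do
    # $cmd is left unquoted on purpose so "asg stat" splits into command + argument
    if $cmd 2>/dev/null | cmms_all_up; then
      echo "CMMs up after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "CMMs still not all up after ${timeout}s" >&2
  return 1
}
```

For example, running `wait_for_cmms 600 15` after re-inserting a card would poll every 15 seconds for up to 10 minutes. The parser keys on the "N / M" pair rather than fixed column positions, so trailing markers such as "(!)" do not affect it.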

root cause

The root cause was not determined.