IPSec Tunnel stops working

Locust76 Registered User regular
edited November 2010 in Help / Advice Forum
I've got a Fortigate firewall connected over the internet, via public IPs, to a Bintec router (or whatever the hell it's called) via IPSec. It's for a customer of ours, so they can access a server at one of their partner companies.

The tunnel is actually active and stays up, but after the Phase 2 timeout (or close to the point where it would time out), data traffic ceases. I've confirmed this by running a steady ping from the Fortigate (using its internal trusted interface as the source) to the destination server. Since a new ping gets sent every second, it basically counts how long data flows over the tunnel.
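
A minimal sketch of how that per-second ping count could be turned into up/down durations, assuming the ping results have been recorded as a list of booleans. The function name `summarize_runs` and the sample data are made up for illustration and are not output from the Fortigate:

```python
from itertools import groupby


def summarize_runs(results, interval_s=1):
    """Collapse per-ping results into (state, duration_seconds) runs.

    results: iterable of booleans, True if the ping got a reply.
    interval_s: seconds between pings (1 in the test described above).
    """
    runs = []
    for ok, group in groupby(results):
        count = sum(1 for _ in group)
        runs.append(("up" if ok else "down", count * interval_s))
    return runs


# Hypothetical sample: 5 replies, then 3 timeouts, then 2 replies.
sample = [True] * 5 + [False] * 3 + [True] * 2
print(summarize_runs(sample))
```

The first "up" run is the number to compare against the Phase 2 key lifetime.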

I've contacted the techs on the distant end and they seem to have more experience with IPSec tunnels, but they don't have any concrete answers either. We raised the key lifetime from 1,800 seconds to 18,000 seconds. Sure enough, 16,300 seconds later, the tunnel died again, whereas it used to die after about 1,600 seconds.
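
A quick back-of-the-envelope check of those two numbers, as a sketch. Both failures land at roughly the same fraction of the key lifetime, which hints (this is only my reading, nothing confirmed) that the trouble starts around the rekey rather than at a fixed clock time. The helper name `failure_fraction` is made up for illustration:

```python
# Configured Phase 2 key lifetime and observed time until traffic stopped,
# both in seconds, taken from the two tests described above.
observations = [
    (1_800, 1_600),
    (18_000, 16_300),
]


def failure_fraction(key_life_s, failed_after_s):
    """Return the point of failure as a fraction of the key lifetime."""
    return failed_after_s / key_life_s


for key_life_s, failed_after_s in observations:
    frac = failure_fraction(key_life_s, failed_after_s)
    print(f"key life {key_life_s:>6} s: traffic stopped at {frac:.1%} of the lifetime")
```

Since the observed times are approximate, the two fractions (about 89% and 91%) are close enough to treat as the same point in the key lifetime.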

If I click the option "Bring Down" to kill the tunnel, the tunnel suddenly starts working again without actually going down.

I have the option for "Autokey Keepalive" on the Fortigate activated, but the other guys don't have any such option that I could ascertain. Also, Dead Peer Detection is activated in Phase 1.

TL;DR: Anyone have any ideas as to why my IPSec tunnel suddenly stops passing traffic, yet stays online, and instantly resumes sending traffic as soon as I reinitialize the tunnel by hand?

Locust76 on

Posts

  • Locust76 Registered User regular
    edited November 2010
    First of all: Bump.

    Second of all: Since I'm testing this connection with never-ending pings, could it be that the tunnel reactivates with TCP or UDP traffic and "ignores" ICMP? I ask because I've noticed that even though the tunnel goes down regularly, it comes back up on its own, suggesting that maybe an initiated TCP session brought the tunnel back up or something...

    Locust76 on
  • twmjr Registered User regular
    edited November 2010
    Locust76 wrote: »
    First of all: Bump.

    Second of all: Since I'm testing this connection with never-ending pings, could it be that the tunnel reactivates with TCP or UDP traffic and "ignores" ICMP? I ask because I've noticed that even though the tunnel goes down regularly, it comes back up on its own, suggesting that maybe an initiated TCP session brought the tunnel back up or something...

    In general, the devices will bring up the IPSEC tunnel when "interesting traffic" is observed as defined by the firewall device. So the answer to your question is: it depends. I'm not terribly familiar with the equipment being used (I'm primarily a Cisco guy), but I would expect the tunnel to go down if there were no traffic traversing it.

    Is the tunnel going down actually causing a problem? Or does the session get built every time the client goes to access the server? Of course their first attempt might fail if the tunnel isn't built and doesn't get built before the application times out, but a retry should work just fine in that instance.

    If you want the tunnel established permanently, you need to keep "interesting traffic" flowing across it. The way we do this for one of our clients is by running a routing protocol across the link (EIGRP in our case since it's pretty chatty).

    twmjr on
  • Locust76 Registered User regular
    edited November 2010
    Well, the tunnel is staying up, and so far I'm the only one complaining about it, even though the customers insisted that it be up and running by Tuesday. They would have probably said something if it wasn't to their liking. Then again, I'm pretty sure I already tried my "what if they attempt to make a connection from their side" theory and it didn't work. I'll have to follow up on that.

    I just checked it again, and the tunnel is back up and passing traffic. We set it up at around 10 this morning and it stayed up for 4.5 hours, then died for probably about the same amount of time, and now that I'm at home, it's up again. Really weird.

    I don't really have access to the internal devices on either side, so I can't tell whether the firewall rebuilds the connection after idling it when something comes across. I don't even know if traffic originating from the firewall itself is considered "interesting" enough to rebuild the connection, and I can't really test TCP-based connections because I don't have the option to define a source address or interface when using the telnet or ssh commands on the Fortigate (though I do have the option to choose a source when pinging).

    Somehow I think the tunnel is timing out, staying down for the next timeout period and then coming back up for the third timeout. Really strange shit.

    Locust76 on
  • harry.timbershaft Registered User regular
    edited November 2010
    Problems like this suck when you're working with IPSec, because something implementation-specific could be breaking it... especially when working with devices of two different brands. That being said, try setting up the tunnel to use IKE in aggressive mode and see what happens. Make sure everything is configured identically on both ends before changing the configuration in this manner. Also, if this works, make sure you know what level of protection the endpoints get with IKE operating in main mode vs. aggressive mode.

    harry.timbershaft on
  • Locust76 Registered User regular
    edited November 2010
    We went through line for line and configured both sides identically (as much as we could, anyways, because both ends are different brands). IKE in aggressive mode is set up already.

    I looked at the logs this morning and this is what I found:
    6	2010-11-18	07:17:43	notice	 	negotiate	Initiator: tunnel THEIR IP, transform=ESP_AES, HMAC_MD5
    7	2010-11-18	07:17:43	notice	 	negotiate	Initiator: sent THEIR IP quick mode message #2 (DONE)
    8	2010-11-18	07:17:43	notice	 	tunnel_up	IPsec tunnel to THEIR IP:500 is up
    9	2010-11-18	07:17:43	notice	 	install_sa	Initiator: tunnel THEIR IP install ipsec sa
    10	2010-11-18	07:17:43	notice	 	negotiate	Initiator: sent THEIR IP quick mode message #1 (OK)
    11	2010-11-18	07:17:30	information	ssh(ADMIN IP)	login	Administrator admin logged in successfully from ssh(ADMIN IP)
    12	2010-11-18	02:24:41	notice	 	negotiate	Initiator: sent THEIR IP aggressive mode message #2 (DONE)
    13	2010-11-18	02:24:40	notice	 	negotiate	Initiator: sent THEIR IP aggressive mode message #1 (OK)
    14	2010-11-18	02:24:25	notice	 	negotiate	Initiator: sent THEIR IP aggressive mode message #1 (OK)
    15	2010-11-18	02:24:10	notice	 	negotiate	Initiator: sent THEIR IP aggressive mode message #1 (OK)
    16	2010-11-18	02:23:55	notice	 	negotiate	Initiator: sent THEIR IP aggressive mode message #1 (OK)
    17	2010-11-18	02:23:40	notice	 	negotiate	Initiator: sent THEIR IP aggressive mode message #1 (OK)
    18	2010-11-18	02:23:25	notice	 	negotiate	Initiator: sent THEIR IP aggressive mode message #1 (OK)
    

    The repeated aggressive mode message #1 entries show up every 15 seconds for about 45 minutes, back to where the log file ends. I was sleeping, so I obviously don't know if the tunnel was passing data, but entry 11 shows my login this morning, and when I pinged from that point on the tunnel was passing data... So far the tunnel has been up for about an hour according to the timeout, and I logged in probably about an hour and 10 minutes ago for the first time today...
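
    A small sketch of how the retry interval can be read out of those entries instead of eyeballing it. The excerpt is copied from the log above with the tabs collapsed to spaces; the function name `message1_gaps` is made up for illustration:

```python
import re
from datetime import datetime

# Log excerpt copied from the entries above (newest first, tabs collapsed).
LOG = """\
12 2010-11-18 02:24:41 notice negotiate Initiator: sent THEIR IP aggressive mode message #2 (DONE)
13 2010-11-18 02:24:40 notice negotiate Initiator: sent THEIR IP aggressive mode message #1 (OK)
14 2010-11-18 02:24:25 notice negotiate Initiator: sent THEIR IP aggressive mode message #1 (OK)
15 2010-11-18 02:23:10 placeholder
"""

# Replace the placeholder line with the real entries 15-18 from the log.
LOG = LOG.replace(
    "15 2010-11-18 02:23:10 placeholder\n",
    "15 2010-11-18 02:24:10 notice negotiate Initiator: sent THEIR IP aggressive mode message #1 (OK)\n"
    "16 2010-11-18 02:23:55 notice negotiate Initiator: sent THEIR IP aggressive mode message #1 (OK)\n"
    "17 2010-11-18 02:23:40 notice negotiate Initiator: sent THEIR IP aggressive mode message #1 (OK)\n"
    "18 2010-11-18 02:23:25 notice negotiate Initiator: sent THEIR IP aggressive mode message #1 (OK)\n",
)

STAMP = re.compile(r"(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})")


def message1_gaps(log_text):
    """Return gaps in seconds between consecutive 'message #1' entries, oldest first."""
    times = []
    for line in log_text.splitlines():
        if "aggressive mode message #1" not in line:
            continue
        match = STAMP.search(line)
        if match:
            stamp = " ".join(match.groups())
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    times.sort()
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(message1_gaps(LOG))
```

    A steady gap with no reply in between is what you'd expect from an initiator retransmitting its first message to a peer that isn't answering.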

    Locust76 on
  • harry.timbershaft Registered User regular
    edited November 2010
    Yeah, looks like their end stopped responding during the aggressive mode exchange, as you've noted. So just out of curiosity, for completeness' sake, have you tried IKE main mode, or is there a requirement for running in aggressive mode? Are the logs on the other endpoint showing any errors? Are both endpoints running the newest versions of code available? IPSec is such a horribly complicated protocol; so many things can break, and the breakage can almost always be attributed to something implementation-specific. I've had Cisco ASAs on two different dot releases not be able to reliably keep a tunnel established because of goofiness in the implementation of IPSec between revisions. It sucks, but there might not be a clean resolution to this one.

    Also, I'm not sure if you meant to, but you left a valid IP addy listed in the provided log entries.

    EDIT: Even though the average lifespan of an SA is typically pretty short in practice, I still recommend not using MD5 for HMAC; use SHA1 or, even better, SHA256 if it's available. This is especially important if you're doing U.S. federal government work (and even though you're in Germany this still may be the case :) ).

    harry.timbershaft on
  • Locust76 Registered User regular
    edited November 2010
    Whoops... corrected :)

    I haven't bothered to talk with the other guys yet, as I want to be able to thoroughly test my side to see what works and what doesn't. I set the Phase 2 key life all the way down to two minutes (their side is still sitting at 18,000 seconds), and the connection has been up for about 15,000 seconds, with my side timing out every two minutes and renewing (that's what it does after a timeout, right?).

    These settings are all compliments of the customer, so I'm not gonna get into which encryption method is better with them ;)

    Other than that, I haven't tried running main mode, but I will do that as the next step if things continue to crap out...

    Locust76 on
  • harry.timbershaft Registered User regular
    edited November 2010
    Yes, it should attempt to establish a new SA after the timeout period expires.

    Is PFS enabled on these endpoints? It occurs to me that if one end is and one end isn't, it could appear to cause the completion of phase 2 to "hang" if the unexpected extra step were being handled inappropriately by the endpoint not configured to use it. Something else to check out. I mean really we could go on and on and on about this kind of little stuff. Have you grabbed a packet capture of the negotiation?
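
    A rough sketch of the kind of side-by-side check that would catch a PFS mismatch. The parameter names and values below are hypothetical placeholders, not the real settings of either device, and `diff_proposals` is a made-up helper:

```python
def diff_proposals(local, remote):
    """Return {parameter: (local_value, remote_value)} for every mismatch."""
    mismatches = {}
    for key in sorted(set(local) | set(remote)):
        if local.get(key) != remote.get(key):
            mismatches[key] = (local.get(key), remote.get(key))
    return mismatches


# Hypothetical Phase 2 settings written down by hand from each admin GUI.
fortigate = {"encryption": "AES", "auth": "MD5", "pfs": True, "dh_group": 5}
bintec = {"encryption": "AES", "auth": "MD5", "pfs": False, "dh_group": 5}

for name, (ours, theirs) in diff_proposals(fortigate, bintec).items():
    print(f"{name}: local={ours!r} remote={theirs!r}")
```

    Writing both sides down in one place like this is mostly useful because the two brands label the same settings differently.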

    In any event, good luck, and if you don't mind posting here or PMing me with what you do find as a resolution, I would appreciate it. I'm curious. :)

    harry.timbershaft on
  • Locust76 Registered User regular
    edited November 2010
    I'm still testing. I set the Phase 2 Keylife at 1 hour, and now it stays up for an hour, dies for an hour and comes back up after an hour. Gonna set it back to two minutes and see if it does the same thing in two minute intervals.
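
    A toy model of that pattern, just to make the prediction for the two-minute test explicit. It assumes the tunnel passes traffic only during every other key-life window, which is a guess based on what has been observed so far and not a known behavior of either device; `expected_state` is a made-up name:

```python
def expected_state(seconds_since_up, key_life_s):
    """Toy model: traffic flows only during even-numbered key-life windows."""
    window = seconds_since_up // key_life_s
    return "up" if window % 2 == 0 else "down"


# One-hour key life: up for an hour, down for an hour, up again.
for t in (0, 1800, 3600, 5400, 7200):
    print(t, expected_state(t, 3600))

# Two-minute key life: the same pattern should repeat every 120 seconds.
for t in (0, 60, 120, 180, 240):
    print(t, expected_state(t, 120))
```

    If the two-minute test does not alternate like this, the model is wrong and the key life is probably not the whole story.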

    Locust76 on
  • nasbot Registered User new member
    Hi Locust76, did you find a solution?
    We're facing the same problem here. We've tried everything (debugging, etc.) but have had no success yet. Please share what you know if you manage to solve the problem.

This discussion has been closed.