Since September we have had a new upstream network architecture in testing. The new setup provides additional redundancy (from N+1 to N+3 for network links), additional control over route selection (giving us more opportunities to respond quickly to networking problems), double the bandwidth, and the ability to easily increase bandwidth and diversity.
I am now confident that we can start moving production traffic over to the routers and network connections. These changes will occur Tuesday through Thursday nights, January 16 through 18 between 10pm and 2am.
On Tuesday, we will move one block that largely contains our own servers, though some older (primarily virtual) servers will also be moved.
On Wednesday we will move another block, which also contains a number of tummy servers but this time includes more customer servers.
On Thursday we will move the remainder of our networking blocks over.
The old networking setup will remain in place, providing the ability for us to migrate back to the old setup if necessary.
November 27, 2006
On Wednesday, November 29, from 11pm to 2am Mountain time, one of our upstream carriers will be performing network maintenance to increase the capacity of a line. This is preemptive maintenance; there is no current impact requiring the upgrade. We may also, at this time, be testing a new router and Internet connection. There may be brief periods of packet loss as traffic reconverges onto the different network lines in response to the maintenance.
October 30, 2006
On Wednesday night between 10pm and midnight we will be making some small changes to our network infrastructure. We will be bringing down the "B" side (backup only to the A side) of one of our network connections, to reconnect it with a more direct path to that upstream backup network.
Because this line is not currently passing traffic, we expect no impact. However, we strive to keep you updated about maintenance which may impact you or your customers. Please let us know if you notice any problems related to this change.
October 27, 2006
Between 3:00pm and 3:30pm today (Friday, October 27) our network experienced high latency and packet loss due to a Denial of Service attack on one of our providers. Normal traffic flow resumed around 3:30pm.
Note that we are in the late testing stages of another connection, which we will use to isolate us from this sort of issue, which, while not directed at us, has impacted our network connectivity twice in the last year.
It was scheduled to be deployed this week for final testing, but we found a problem which we are currently working on. This is why we do extensive testing. :-)
Everything seems to have been back to normal for the last 20 minutes.
August 8-9, 2006
Tonight between 11pm and midnight, we will be doing a minor software upgrade on our backup router. Tomorrow night, after the backup router has been running for a day, we will upgrade the primary router during the same window. No network impact is expected; in the past there has been zero packet loss during a fail-over and fail-back to the backup node.
July 6, 2006
From 11pm to 2am, one of our upstream carriers will be upgrading equipment on their network to provide enhanced network performance. During the installation and testing, there may be some network latency and route instability as routes change between the upstream connections.
June 13, 2006
The networking problems are resolved. The cause was apparently memory problems on one of the upstream links, which caused large packets to be mangled and have to be retransmitted; sometimes they would get through, sometimes not. However, the failure was not of a kind that BGP could route around without explicit intervention.
We have failed over to the other link, and things have been fine for the last 25 minutes. Further diagnostics with the router at the other upstream will be done and we expect to switch the other connection back on tomorrow.
June 13, 2006
Starting around 6:50pm (16 minutes ago) we have been seeing slowness, particularly with web connections. It seems to be happening over both upstream connections equally. We are working with our providers to track this down. More information as we have it.
June 7, 2006
One of our upstream connections to the Internet will be brought off-line while an interface card is upgraded. Our upstream has determined that this card upgrade is necessary. Brief outages (90 seconds) and increased latency may occur during the hour from 23:00 to midnight.
May 31, 2006
Starting at 11pm through Thursday at 2am we are opening a maintenance window for the installation of a network routing optimization engine. This system proactively optimizes routing paths based on enhanced information about BGP routing paths, with the goal of providing not only better performance, but also better routing around network paths that are having problems.
During the 11pm to 2am window, we expect to be up the majority of the time, but there may be some outages or increased latency as routes reconverge between upstreams.
February 17, 2006
We are currently seeing high packet loss on the Internet between Time Warner and AT&T. We've had several users on cable modems report packet loss and high latency, which we've tracked to this problem. We are working right now on dropping our association with Time Warner, to try to force traffic to go around this problem, and will open a ticket with Time Warner about it.
January 12, 2006
Tonight between 16:30 and 18:30 Mountain Time, machines on our network experienced heavy loss and latency due to a distributed denial of service attack on several IPs related to our upstream Internet connectivity. Redundant connectivity didn't help because the DDoS would follow the switch-over to another link. We will be working with the upstreams and conducting a service outage analysis to determine what countermeasures we can put in place to prevent this from happening again, and to reduce the impact should it happen again in the future. Note that this was different from what we have seen in the past because it was not directed at servers on our network, so the countermeasures we previously put in place could not help.
August 14, 2005
Our network experienced extreme latency and packet loss starting around 4:50 this morning and lasting until 5:20am. This was due to a Denial of Service attack, which we have been able to mitigate. This is similar to the outage we had a few months ago, and we will be working to refine our abilities to detect and more quickly stop these sorts of problems in the future.
August 6-8, 2005
One of our upstreams will be performing maintenance on their routers this weekend on Saturday, Sunday, and Monday between midnight and 6am Mountain time. This is August 6 through 8 in the early morning hours. Minimal if any impact is expected on connectivity, though intermittent loss may occur during this time. Either their systems or ours should route around any problems should any sort of extensive reachability issues occur.
August 5, 2005
Our immediate upstreams have done an evaluation of the routers and other networking gear we connect to and believe it to not be vulnerable to the "IPv6 Crafted Packet" buffer overflow and subsequent exploit. We are discussing with them whether anything needs to be done for future buffer overflow issues to prevent the extended problems that were announced last week at Defcon.
August 2, 2005
As you may have heard, a researcher has recently found a number of vulnerabilities in Cisco-based networking equipment. Details at this point are pretty scarce. However, our network is 100% Cisco-free, so we aren't anticipating any problems with our own connections. That's the good news. The bad news is that I'm fairly sure that Cisco gear is used extensively in the network connections upstream from us. I have started a dialog with our upstreams to see what action they are taking to protect your networking. I just wanted to let you know the status of it and what our response so far has been. I'll pass on additional information as it becomes available.
August 1, 2005
I have successfully completed converting the backup router to the newer software revisions and running tests on the fail-overs. In the case of fail-overs for maintenance, there is little if any packet loss, and the fail-over definitely takes less than a second. In the event of a hardware or software failure on one of the routers, the fail-over may take 15 to 30 seconds while routing protocols detect the failure and reconverge.
July 31, 2005
I will be working on migrating our backup router to the new software that has been running so successfully on the primary router for the last week. For about the first 30 to 90 minutes I will be re-imaging the router and preparing to bring it up into the high availability router cluster. No down-time should be involved during this. Over the next hour or two I expect to run a number of test fail-overs to ensure that it is operating as expected. During this time there may be a number of small (10 to 90 second) outages as the routing protocols reconverge. This will likely be the last of the work on converting the routers to the new software platform.
July 17-20, 2005
Sunday through Wednesday night from 11pm to 2am Mountain Time we will be doing some work to improve our Internet infrastructure. Some periodic interruptions in service may occur as routing tables re-converge, but in general the network will be up and available during this time.
July 19, 2005
This morning around 2am we completed the migration to the new router configuration on the primary router. This involved changing pretty much everything about the setup, so if you notice any problems please don't hesitate to let us know and we'll look into them. We have a clear backout procedure: we can just fail back to the standby router, which is running all the previous configurations.
On another note, a success story: last night around 9pm Mountain Time, the fiber transceiver which connects the primary router to the primary upstream failed. The routers detected this and failed over to the secondary router without any problems and with little interruption in service. We went through some diagnostics and eventually settled on replacing the transceiver, which resolved the problem.
Again, let us know if there are any problems you're noticing with our new connection.