Tip & How-To about Computers & Internet
Let's look at the most common causes of network downtime and what can be done about them.
Issue: Hardware single points of failure
Solution: Try to minimize single points of failure. When they're unavoidable, make sure broken hardware can be replaced quickly.
If servers can't be connected to multiple switches, make sure a broken switch can be replaced with another one that has an identical configuration. Either use the switches in their default port/VLAN configuration, or set up a spare switch with an appropriate basic configuration in advance, so that simply replacing the hardware is enough to restore connectivity.
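One way to confirm that a spare switch really matches the one it would replace is to compare their port-to-VLAN assignments. The following is a minimal sketch, assuming each configuration has already been exported into a plain dictionary; the port names, VLAN numbers, and the function name `config_differences` are hypothetical and not tied to any vendor's tooling.

```python
def config_differences(production, spare):
    """Return the ports whose VLAN assignment differs between two switches.

    Both arguments map a port name to a VLAN ID. A port missing from one
    side is reported with None for that side.
    """
    ports = sorted(set(production) | set(spare))
    return {
        port: (production.get(port), spare.get(port))
        for port in ports
        if production.get(port) != spare.get(port)
    }


# Hypothetical example data: port names and VLAN IDs are made up.
production_switch = {"Gi0/1": 10, "Gi0/2": 10, "Gi0/3": 20}
spare_switch = {"Gi0/1": 10, "Gi0/2": 20}

for port, (live_vlan, spare_vlan) in config_differences(
    production_switch, spare_switch
).items():
    print(f"{port}: production VLAN {live_vlan}, spare VLAN {spare_vlan}")
```

An empty result means the spare can be swapped in without touching its configuration, at least as far as the port/VLAN mapping is concerned.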
Issue: Routing protocol problems
Sometimes packets are lost somewhere along a path, but BGP doesn't notice that there's a problem, so traffic keeps flowing towards the black hole rather than being rerouted.
Solution: One way to detect this problem is to monitor the reachability of key remote services. Another is to watch total traffic, which will be much lower than usual in the presence of a black hole. Once detected, recovering from a routing black hole affecting one ISP is very simple: shut down the BGP session towards that ISP until they've fixed the problem.
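The two detection signals above can be combined in a small monitoring script. This is a rough sketch rather than a finished monitor: the TCP connect test, the 50% thresholds, and all function names are assumptions chosen for illustration, and the traffic figures would have to come from your own measurements (for example, interface counters).

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, DNS failures
        return False


def traffic_is_low(current_bps, baseline_bps, min_fraction=0.5):
    """Return True if current traffic is below min_fraction of the baseline.

    min_fraction=0.5 is an assumed threshold, not a recommended value.
    """
    if baseline_bps <= 0:
        return False  # no usable baseline, so no verdict
    return current_bps < min_fraction * baseline_bps


def suspect_black_hole(reachable_flags, current_bps, baseline_bps,
                       max_unreachable_fraction=0.5):
    """Flag a possible black hole if either signal looks abnormal.

    reachable_flags is a list of booleans, one per monitored remote service.
    """
    too_many_unreachable = False
    if reachable_flags:
        unreachable = sum(1 for ok in reachable_flags if not ok)
        too_many_unreachable = (
            unreachable / len(reachable_flags) > max_unreachable_fraction
        )
    return too_many_unreachable or traffic_is_low(current_bps, baseline_bps)
```

In use, `reachable_flags` would be built by calling `is_reachable` on a handful of remote services reached through the ISP in question. A positive result is only a hint that the BGP session towards that ISP may need to be shut down; it does not prove the ISP is at fault.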
Issue: Routing problems caused by software bugs
Solution: It is useful to have equipment from different vendors, so that if one device is affected by a bug, the other one isn't.
Issue: Power supply problems
Solution: Having some kind of backup power is key. In addition, all network components must have redundant power supplies connected to different circuits, so they can keep running when there's a failure or maintenance on one feed. Make sure one circuit can provide adequate power by itself, and that the circuits have as few components in common as possible. If redundant power supplies are not possible, connect any two components that back each other up to different power circuits, so that losing one circuit doesn't take out both devices.
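Both checks described here, whether one feed can carry the whole load and whether backup pairs sit on separate circuits, are easy to automate from an inventory list. The sketch below assumes a simple inventory of device power draw and circuit assignments; the device names, wattages, and the 80% headroom factor are hypothetical values for illustration only.

```python
def feed_can_carry_load(device_watts, feed_capacity_watts, headroom=0.8):
    """Return True if one feed alone can power every listed device.

    headroom is the assumed fraction of the feed's rated capacity that
    may be used; 0.8 is an illustrative value, not a standard.
    """
    total = sum(device_watts.values())
    return total <= headroom * feed_capacity_watts


def pairs_sharing_circuit(backup_pairs, circuit_of):
    """Return the backup pairs whose two devices are on the same circuit."""
    return [
        (a, b) for a, b in backup_pairs if circuit_of[a] == circuit_of[b]
    ]


# Hypothetical inventory.
device_watts = {"switch-a": 150, "switch-b": 150, "router-a": 300}
circuit_of = {"switch-a": "A", "switch-b": "A", "router-a": "B"}
backup_pairs = [("switch-a", "switch-b")]

print(feed_can_carry_load(device_watts, feed_capacity_watts=1000))
print(pairs_sharing_circuit(backup_pairs, circuit_of))
```

Any pair returned by `pairs_sharing_circuit` would go down together if that circuit failed, so one of the two devices should be moved to a different feed.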
Posted by Noction on Apr 07, 2013
A network consists of many hardware and software components, and it can fail when one (or even several) of those components fails. Ranging from the largest to the smallest and from hardware to software, network failures can be divided into the following categories:
Single failure and multiple failures: in general, a network failure means a single failure, because network failures seldom occur. In some situations, however, more than one failure can occur in a network at the same time. This is called a multiple failure, for example a dual failure or a triple failure. Planning for full recovery from multiple failures generally requires much higher protection capacity than planning for a single failure alone.
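To see why protecting against multiple failures costs so much more capacity, consider a deliberately simplified model: a fixed traffic demand spread evenly over a number of identical parallel links, where the surviving links must carry the full demand after any failures. The model, the numbers, and the function names are assumptions made for illustration; real networks have more complex topologies.

```python
def per_link_capacity(demand_gbps, links, failures):
    """Capacity each link needs so the survivors can carry the full demand."""
    if failures >= links:
        raise ValueError("at least one link must survive")
    return demand_gbps / (links - failures)


def spare_capacity(demand_gbps, links, failures):
    """Total capacity provisioned beyond the demand itself."""
    return links * per_link_capacity(demand_gbps, links, failures) - demand_gbps


# Hypothetical example: 30 Gbps of demand over 4 parallel links.
for failures in (0, 1, 2):
    print(failures, spare_capacity(30, 4, failures))
```

In this toy model, surviving a single failure needs 10 Gbps of spare capacity, while surviving a dual failure needs 30 Gbps, three times as much for one additional failure.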