Network faults: A Disaster Recovery

We recently visited a client who was having major network issues after a power cut.  The client had recently upgraded their phone system from ISDN to VOIP and were struggling to access internal or external systems.  The issues were intermittent, which caused them several days of pain.

The power outage had blown one of the Netgear POE switches in the IT cupboard.  The owner replaced it and, without realising, connected several of the routers and Wi-Fi access points together on the same network, which caused multiple network faults.

Creating order from chaos

After speaking with the client and trying several options to connect remotely, I visited their office to find a mess of cables in their IT cupboard and little or no documentation.

The client had rapidly grown over the past 8-9 years and had several IT staff who had added to the network and then left the company.  As a result, the network had multiple routers and switches in their IT cupboard as well as smaller Power Over Ethernet (POE) switches at the banks of desks.

The first task when trying to locate a network fault is to document what should be working and what could be causing the issues.  We could see several routers, including Cisco, Netgear, MikroTik and a BT Business Hub.  We could also see several Netgear switches and an HP Aruba switch, as well as an Avaya IP500, 2 small servers, a NAS and a POE adaptor going somewhere into the ceiling.

Looking around the desks we could see more Netgear switches.  These connected to POE phones and desktop PCs. 

Checking the PCs, we could see that devices were receiving addresses from one of several DHCP servers on the network, and only one of those servers allowed Internet connectivity.

DHCP (Dynamic Host Configuration Protocol) is a service which is normally set up on routers and servers to provide addresses for each device on the network.

Each time we disconnected/connected or released/renewed the PC, we got an address from a different DHCP server.  There are some cases where you would run more than one DHCP server, but in a simple network you would normally ensure that each DHCP server provides addresses from a different range and hands out the same or similar gateway and DNS entries.
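As a minimal sketch of that check, the Python below compares the address pools configured on each DHCP server and reports any that overlap.  The server names and pool boundaries are hypothetical and only there for illustration; in practice you would copy them from each device's DHCP settings.

```python
import ipaddress
from itertools import combinations


def ip_range(start: str, end: str) -> tuple[int, int]:
    """Convert a DHCP pool's first and last address to an integer pair."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    if first > last:
        raise ValueError(f"pool start {start} is after pool end {end}")
    return first, last


def overlapping_pools(pools: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """Return pairs of server names whose address pools overlap."""
    ranges = {name: ip_range(*bounds) for name, bounds in pools.items()}
    clashes = []
    for (a, (a_lo, a_hi)), (b, (b_lo, b_hi)) in combinations(ranges.items(), 2):
        # Two closed intervals overlap when each starts before the other ends.
        if a_lo <= b_hi and b_lo <= a_hi:
            clashes.append((a, b))
    return clashes


# Hypothetical pools for illustration only.
pools = {
    "cisco-router": ("192.168.1.100", "192.168.1.199"),
    "tp-link-ap": ("192.168.1.150", "192.168.1.250"),
    "voip-server": ("192.168.2.10", "192.168.2.200"),
}
print(overlapping_pools(pools))  # [('cisco-router', 'tp-link-ap')]
```

An empty list means no two servers can hand out the same address, although it does not tell you whether the gateway and DNS settings agree.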

Using an advanced IP scanner to confirm the manufacturer of each default gateway, we found that there were 4 devices giving out DHCP addresses.  The correct one was the Cisco router; one of the others was a TP-Link device that we could not see in the IT cupboard at all.
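The same tallying can be sketched in a few lines of Python.  This assumes you have noted the DHCP server reported by the PC after each release/renew (on Windows it appears as "DHCP Server" in the output of `ipconfig /all`); the addresses below are hypothetical, and `summarise_dhcp_servers` is simply a name chosen for this example.

```python
from collections import Counter


def summarise_dhcp_servers(leases, authorised):
    """Count leases per DHCP server, split into authorised and unexpected."""
    counts = Counter(lease["server"] for lease in leases)
    expected = {srv: n for srv, n in counts.items() if srv in authorised}
    unexpected = {srv: n for srv, n in counts.items() if srv not in authorised}
    return expected, unexpected


# Hypothetical observations noted after each release/renew on one PC.
leases = [
    {"server": "192.168.1.1", "gateway": "192.168.1.1"},
    {"server": "192.168.0.254", "gateway": "192.168.0.254"},
    {"server": "192.168.1.1", "gateway": "192.168.1.1"},
    {"server": "192.168.8.1", "gateway": "192.168.8.1"},
]

expected, unexpected = summarise_dhcp_servers(leases, authorised={"192.168.1.1"})
for server, count in unexpected.items():
    print(f"Unexpected DHCP server {server} answered {count} time(s)")
```

Any server in the unexpected list is a candidate for tracking down by its IP address.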

We were able to connect to the TP-Link via the IP address it provided and found that it was a Wireless Access Point and not a Router.  After asking the client, we learned that this device was in a training room.  It had recently been moved and someone had reset it to factory settings, which left it acting as a DHCP server.  We disabled this function via the control panel, which allowed us to continue.

Chasing Cables

Most small clients have a simple network, but once you have more than 5-10 desks, cable management is generally required.  This client’s network was closer to 30 users, and even with structured cabling and cable management in place, they had continued to grow beyond the number of users it was installed for.  As a result, the colour coding of patch cables and the documentation had been lost many years ago.

Network chaos

Each port on the network switch had to be chased and checked to see if it was in use and by which type of device.  Some patch cables were short, while others were longer and cable-tied together, which made the task of finding a network fault harder.
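One way to record the results of this kind of port-by-port check is a simple spreadsheet or CSV file.  The sketch below assumes a hypothetical layout with one row per switch port (the column names and entries are invented for illustration) and summarises which device types are connected and which ports are free.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass
class PortRecord:
    switch: str
    port: int
    device_type: str  # e.g. "phone", "pc", "router"; empty if the port is unused
    connected_to: str


def load_port_map(csv_text: str) -> list[PortRecord]:
    """Parse the CSV text into one PortRecord per switch port."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        PortRecord(row["switch"], int(row["port"]), row["device_type"], row["connected_to"])
        for row in reader
    ]


def summarise(records):
    """Return a count of devices by type and a list of unused (switch, port) pairs."""
    by_type = Counter(r.device_type for r in records if r.device_type)
    unused = [(r.switch, r.port) for r in records if not r.device_type]
    return by_type, unused


# Hypothetical entries for illustration only.
PORT_MAP = """switch,port,device_type,connected_to
cupboard-sw1,1,router,Cisco router
cupboard-sw1,2,phone,Desk 4 handset
cupboard-sw1,3,,
desk-sw2,1,pc,Desk 7 PC
desk-sw2,2,phone,Desk 7 handset
"""

by_type, unused = summarise(load_port_map(PORT_MAP))
print(dict(by_type))
print(unused)
```

Keeping the file alongside the network documentation means the next person to open the cupboard does not have to chase every cable again.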

What we found was that when the network switch had failed, several cables had been connected to the wrong switches.  This resulted in loops and, in this case, the wrong DHCP servers providing addresses for the VOIP network instead of the Data network.

Network loops on a managed network switch can be handled using Spanning Tree Protocol, which detects loops and blocks the redundant links to prevent the broadcast storms that will cripple your network.  New switches do not always have this enabled by default, so it is worth checking their manual.
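To illustrate the idea, the sketch below takes a list of cables between switches and works out which ones close a loop.  It is a simplified stand-in, not the real protocol: Spanning Tree elects a root bridge and compares path costs, whereas this example uses a plain breadth-first search from a root you choose.  The switch names are hypothetical.

```python
from collections import deque


def redundant_links(links, root):
    """Build a loop-free tree from `root` and return the links left over.

    Simplified illustration only. Assumes every switch can be reached
    from `root`; links are (switch_a, switch_b) pairs.
    """
    neighbours = {}
    for index, (a, b) in enumerate(links):
        neighbours.setdefault(a, []).append((b, index))
        neighbours.setdefault(b, []).append((a, index))

    visited = {root}
    tree = set()  # indexes of the links kept in the loop-free tree
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer, index in neighbours.get(node, []):
            if peer not in visited:
                visited.add(peer)
                tree.add(index)
                queue.append(peer)

    # Any cable not in the tree closes a loop and would need to be blocked.
    return [link for index, link in enumerate(links) if index not in tree]


# Hypothetical cabling: the last link creates a loop between the desk switches.
links = [
    ("core", "desk-a"),
    ("core", "desk-b"),
    ("desk-a", "desk-b"),
]
print(redundant_links(links, root="core"))  # [('desk-a', 'desk-b')]
```

On a switch without Spanning Tree enabled, a cable like the last one would let broadcast traffic circulate indefinitely.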

Have you tried switching it off and on again?

OK, this has become quite a standing joke in IT, courtesy of The IT Crowd, but the power in the IT cupboard was also in a mess.

The client’s cabinet was quite small and was probably suitable when they started.  As they expanded, the cabinet soon became unsuitable.  The top of the cabinet was used as a shelf for the Routers, the (retired) Avaya phone system and the NAS.

Two small Uninterruptible Power Supplies (UPS) were sitting on top of a power extension lead, and when it was moved, the power to the whole IT cupboard went off.  Unsurprisingly, the Servers also went off, as they were connected to the surge protection side of the UPS and not the battery back-up side.  Luckily, the servers restarted quickly and there was no loss of data.

It is worth noting that some routers or switches do not automatically save their configuration, so it’s important to back up their configurations on a regular basis.  These can often be saved to a PC, Server, or the cloud (in which case a copy should also be kept locally in case you cannot access the internet).  Network faults can be caused by device failure, but human error is generally the most common cause.  If you change something, it should be documented so you can retrace your steps or restore a backup of the configuration.
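As a rough sketch of a local backup routine, the Python below stores a device's configuration as a timestamped file and skips the save when nothing has changed since the last backup.  How you export the configuration depends on the manufacturer, so that step is left out; the function name, folder layout and `.cfg` extension are assumptions made for this example.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def backup_config(device: str, config_text: str, backup_dir: Path) -> Optional[Path]:
    """Save config_text as a timestamped file unless it matches the latest backup."""
    device_dir = backup_dir / device
    device_dir.mkdir(parents=True, exist_ok=True)

    data = config_text.encode("utf-8")
    new_digest = hashlib.sha256(data).hexdigest()

    # File names start with a fixed-width UTC timestamp, so sorting by name
    # also sorts by age and the last entry is the most recent backup.
    previous = sorted(device_dir.glob("*.cfg"))
    if previous:
        latest_digest = hashlib.sha256(previous[-1].read_bytes()).hexdigest()
        if latest_digest == new_digest:
            return None  # Nothing changed since the last backup.

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = device_dir / f"{stamp}.cfg"
    target.write_bytes(data)
    return target
```

Calling `backup_config("switch-1", exported_text, Path("backups"))` after every change leaves a dated history of configurations that you can compare or restore from.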

Downtime

We often think that our network equipment will last forever.  Network Routers have so often been bundled into our home broadband packages that we forget they are probably quite expensive.  If the last time you changed your network switch was when you moved into your office 5, 10 or 20 years ago, then there may be some advances in network equipment which have occurred since.  Most of us may have gigabit networking.  10 gigabit switches are ideal for larger businesses, and you can get smart switches that monitor your traffic and report statistics via an app on your mobile.

The client had noticed a red fan warning light on their second network switch, so they had ordered a spare.

The network downtime caused by these issues for the client may have cost them some project delays or even some lost sales, as well as the cost of the new equipment and our consultancy fees.  While we have been working from home for the last year, we would recommend you review your current IT to ensure you have maintained your equipment in the same way you would service your car each year to ensure it lasts.

Simplifying the network – keep it simple and document.

At some point in the client’s network, routers were added, and the old equipment had not been decommissioned; this had caused a few of the faults we discovered in the network.

If you have multiple sections or layers within a network (normally called VLANs) that are used to segregate users or devices, then the best method is to keep the number of VLANs to a minimum.

There are many manufacturers (Ubiquiti and TP-Link, to name a couple) who provide network equipment that can be managed easily using a website or even a mobile app.  These work well if the equipment is all from the same manufacturer.  The client already used a Ubiquiti Wireless Access Point, and this provided a stable platform to extend.  We added a further Wi-Fi access point in the training room to replace the TP-Link device which had caused some of the issues, and installed a Ubiquiti Switch to provide network management of ports.  Once we had installed these devices, we could see the traffic on the network and use the UniFi platform to help simplify it.  We were also able to add Guest Wi-Fi to the client’s network securely, ensuring that the servers were protected from guests.

The UniFi software also helps monitor for rogue devices such as 3rd party Wi-Fi Access Points which could be used to hack your main network.

Once we were finished, the UniFi software allowed us to easily document the current network, provide a report of the devices on it and create a baseline for performance.  It also gives the client the ability to expand the Wi-Fi infrastructure easily and to replace a device quickly should one fail in the future.

Should you require assistance with your network, or would like to discuss any of the network faults or issues we have mentioned in this article, please contact our team on 0333 332 6600.