[geeks] Network Nightmare

Grant Taylor gtaylor at tnetconsulting.net
Mon Jul 20 13:14:45 CDT 2020


On 7/20/20 4:33 AM, md.benson at gmail.com wrote:
> Hi,

Hi,

> We are having a major issue at $work with our LAN just completely
> flaking out or slowing up to a crawl.

Please try to articulate which it is: completely flaking out, or just
slowing to a crawl.  Or does it slow to the point that it is considered
to be not working?  "Slowing" can be caused by a number of things, each
with different symptoms.  "Not working (at all)" is usually something
different.

> It usually happens in the morning when people start coming into work
> and booting up PCs (which means I'm not usually on site to analyse
> it).

This hints at an old school broadcast storm, possibly or probably
NetBIOS related.

> Usually disconnecting one of the Area switches from the Central switch
> will calm it down, but it can reoccur several times.

That's an old hack to divide a big broadcast storm into multiple smaller
broadcast storms such that they have a better chance of working themselves
out.

I didn't see any indication as to what OS the machines run or what
networking protocols you're using.  Is this by chance a network of PCs
that are running Windows?  If so, are they on a domain?  Or are they
simply in a workgroup?

What you're describing seems quite typical for Windows PCs going through
a NetBIOS browse master election storm.  When each and every computer
turns on, it tries to be the browse master by declaring itself as such.
Then an election occurs of all the currently online computers to
determine who will actually be the browse master.  This process can take
anywhere from 90 seconds to 5+ minutes each time.  Each and every
computer that turns on during that window restarts the process.  The
more systems you have involved in the process, the longer that it will take.
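
To put rough numbers on that, here is a toy model in Python.  It is only
a sketch under a stated assumption: every boot starts, or restarts, an
election that needs a fixed quiet period to finish.  The function name
and the 90 second default are illustrative, not measured values.

```python
def election_busy_seconds(boot_times, election_seconds=90):
    """Total seconds spent in elections for the given boot times (seconds).

    Toy model (assumption): each boot starts an election, or restarts the
    one in progress, and an election only finishes after
    `election_seconds` with no further boots.
    """
    busy = 0
    window_start = None
    window_end = None
    for t in sorted(boot_times):
        if window_end is None or t > window_end:
            # The previous election (if any) finished before this boot.
            if window_end is not None:
                busy += window_end - window_start
            window_start = t
        # Every boot pushes the end of the current election out again.
        window_end = t + election_seconds
    if window_end is not None:
        busy += window_end - window_start
    return busy


# Twenty PCs booting one minute apart keep a single election running
# from the first boot until 90 seconds after the last one.
print(election_busy_seconds(range(0, 1200, 60)))  # 1230
```

In this model the same twenty boots spread five minutes apart would only
ever cause twenty separate 90 second elections, which is why the morning
rush is the interesting window.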

Usually, during such browse master elections, anything trying to start
accessing the network will have significant delay.  Conversely,
established and ongoing connections will usually be okay and not suffer
much.

> We have one central (unmanaged) switch that has the 2 main area
> switches (also unmanaged) spurred off it as well as a WiFi AP and a
> switch on a Fibre line (via copper to fibre media converters). The
> main switch then connects to a Firebox T35 via a single line to the
> main switch.

I'm not hearing anything that would be causing a loop.  Nor does your
picture indicate such.

There's also the fact that a loop would likely exist all the time and as
such cause problems all the time, not just during the inrush as people
get into the office in the morning.

> The T35, as well as vetting our LAN traffic, routes off-site traffic
> to 3 WANs depending if it's for a specific Cloud service (uses RDP
> Terminal services via a permanent VPN over a dedicated VDSL line), VoIP
> (uses its own VDSL line to prevent internet traffic compromising
> call quality) or general Internet (has a general use VDSL line of
> its own).

I think that it should be fairly easy to identify if it's Internet or
LAN problems.  If it's the Internet, things on the LAN should be
unaffected.  Maybe hardware LAN problems could cause the Internet to
appear problematic while the LAN appears problematic too.  But I suspect
you could test this with a notebook plugged directly into the T35.

> We have 2 servers (in different places, one in a rack with a dedicated
> switch), about 20 client PCs, 2 WiFi APs (with minimal traffic as they
> are only accessible through equipment I allow). We will be introducing
> 18 VoIP phones also in the next week or so so as you can imagine LAN
> outages or blackouts are going to be a major headache.

I would strongly suggest that you get an understanding of what the
problem is before introducing VoIP phones into the mix.  They will do
nothing but cause more stress on the existing configuration.  Plus, they
will likely be a high visibility point.  I would not want to invite such
while dealing with an ongoing problem.

> The infrastructure wiring is all Cat6a (STP) and is less than 3
> years old.

Even old school 100BaseT on Cat5 should be perfectly fine.  This sounds
much more like a software issue on top of the network to me.

> Here's a link to a diagram:
>
> https://www.dropbox.com/s/j4qf3evxqy31xcb/network-2020.png?dl=0

I don't see anything indicative of a loop in the diagram.

If it was someone tidying up cables and plugging two station runs
together, the problem would exist all the time, not just in the mornings.

There is a slim chance that the problem does exist and that you are
running very close to the limit and the morning load just pushes things
over the top.  But, I doubt that.

> I understand the situation is very vague but I am at a loss to know
> how to begin diagnosis of this kind of issue. I can't just pull up
> logs for something at link level with a setup like this, and I lack the
> expertise to know the go-to tools or methods in a situation like this.

If this happens (almost) every morning, the very first thing I would do
is plug in a notebook with a sizeable drive (possibly external) and have
it record the traffic that comes into its network card overnight and
into the morning when the problem starts.
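
One way to set that up is with tcpdump, if the notebook runs Linux or a
BSD.  The sketch below only builds the command line.  The interface name
"eth0", the output directory, and the file name pattern are assumptions;
substitute your own.  Capturing usually needs root.

```python
import os
import shlex


def build_capture_command(interface, out_dir, rotate_seconds=3600, snaplen=128):
    """Build a tcpdump command for a long-running, rotating capture.

    The interface and directory are placeholders (assumptions), not
    values taken from the network being discussed.
    """
    # With -G, tcpdump expands strftime fields in the -w file name, so
    # each rotation gets its own time-stamped file.
    pattern = os.path.join(out_dir, "lan-%Y%m%d-%H%M%S.pcap")
    return [
        "tcpdump",
        "-i", interface,            # capture interface
        "-n",                       # do not resolve addresses to names
        "-s", str(snaplen),         # keep only the first bytes of each frame
        "-G", str(rotate_seconds),  # start a new file every N seconds
        "-w", pattern,              # write raw packets to file
    ]


# Print the command so it can be checked before running it as root.
print(shlex.join(build_capture_command("eth0", "/mnt/capture")))
```

Truncating each frame with -s keeps the files small; the headers are
enough to tell who is broadcasting and how often.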

If it is a broadcast storm / browse master election, it should be quite
obvious in the packet capture.
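
Wireshark will show this interactively, but a quick count of broadcast
frames per second makes a storm stand out even without it.  The
following is a minimal sketch, assuming a classic pcap file (what tcpdump
writes, not pcapng) with microsecond timestamps, captured on Ethernet.
The function name is mine, and it reads the whole file into memory, so
feed it one rotated file at a time.

```python
import struct
from collections import Counter

BROADCAST_MAC = b"\xff" * 6


def broadcast_frames_per_second(pcap_bytes):
    """Return a Counter mapping epoch second -> Ethernet broadcast frames."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    # The last field of the 24 byte global header is the link type.
    (linktype,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if linktype != 1:
        raise ValueError("only Ethernet captures (link type 1) are handled")

    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: seconds, microseconds, captured and original length.
        ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        # The destination MAC is the first six bytes of an Ethernet frame.
        if frame[:6] == BROADCAST_MAC:
            counts[ts_sec] += 1
    return counts
```

Typical use would be to read one capture file with open(path, "rb"),
pass the bytes in, and look at counts.most_common(10) to find the
busiest seconds.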

> My only possible notion thus far is we might be overwhelming the
> Firewall (we are near its recommended limits) but the situation
> seems more like something in the LAN is getting stuck in a loop or
> a bunfight or something.

Hypothetically, if the firewall is routing between multiple subnets on
the LAN, it might be a problem, especially if clients and servers are on
opposing subnets and having to go through the firewall.

But we'd need to know a lot more about the LAN, workstations, and
servers to be able to speculate further.

> All help and suggestions gratefully received.

Share more about the network; client & server OS, protocols, subnet
information, etc.

Start a packet capture overnight.



--
Grant. . . .
unix || die
