Or, How Compaq and Netware Nearly Killed Me.
Long ago, in the spring of 1999, my department decided that it was time to move from
our distributed file storage system to a centralized file storage system. At the time we
had about thirty Netware servers of varying flavors. Each one was an MPR as well as
providing file and print services for the local users. Most of these Netware servers were
resident on Compaq Proliant 500s. When purchased, these boxes were fairly stout and more
than met the requirements to perform their job. That was a long time ago though, and 2 gigs
of storage just wasn't cutting the mustard anymore. Most users had more storage space on
their desktop PCs but still persisted in the belief that the network drives were virtual
bags of holding.
It was decided that what we really needed was one server. One server to rule them
all, One client to find them, One NOS to bring them all and in the darkness bind them...
Ahem. So, there were a number of emerging technologies that we could take advantage of that
would make a central file server available for the entire campus and reliable enough to
support several thousand users. Or so we thought.
We went through all the state financing bureaucratic hoops and jumps. We requested money for
a honking big server, a honking big tape backup system, and the software to run it. We had
regional sales reps from Novell come to our site and pitch their wares. We traveled to
HP's home office to review their hardware. Compaq wooed us with stories of 99.99% uptime
and limitless terabytes of data storage.
In the end we decided to go with Compaq and Netware 5.1. We were familiar with Netware and
had great hopes for the new build. We had great success with Compaq servers in the past, and
they seemed a logical choice, especially when they revealed that they had designed a
cluster system to work specifically with Netware 5.x. Towards that end we hired MicroAge to
design and build a system for us. We wanted a turnkey system. They would put it together,
drop it off and it would just work. What we got was a nightmare.
What we specifically got were two Proliant 6400Rs. The two were identical: each had two
nine-gig drives, dual PIII 550 Xeon processors, and two gigs of RAM. They were each umbilicaled
with fibre channel to three RAID arrays, for a massive 550 gigs of storage with the
potential to grow by many terabytes as our demands grew. Total price tag: $160,000.
The server was assembled, and delivered... to a room with no available power. We had
requested it be set up in the campus's main computer room, where a generator was maintained
and we would never have to suffer from a power blackout. The only problem was the
electricians couldn't be bothered to process our work request and provide the necessary power
for the system to use. 160,000 dollars of University hardware sat in the corner, dormant
for six months before the power was activated.
The purchasing department used this time, and then some, to bicker with Novell about the
purchase price for an unlimited licensing agreement. The purchasing department insisted on
taking the purchase request out to bid, as if somehow, someone else would be able to underbid
Novell on the purchase of a license agreement that only they sell.
Time passed. Novell, surprisingly, won the bid for the license. Now all we needed was to
get the signatures of practically every person involved with University administration
on the purchase agreement. We waited almost two months to get these signatures, only for it
to be lost en route. The whole signature process was repeated. Total price tag: $60,000.
Sometime in March of 2000 we had completed the purchase agreement and had received our
software. MicroAge, as part of the purchase agreement, sent out a technician to install the
software on the servers. When it came time to install the cluster services he requested the
appropriate software. A few phone calls later we realized that the Netware clustering service
was not included in our unlimited license. Another, separate, bid process was started to
acquire the Netware clustering software. A license to connect two servers to one clustered
array cost us another $10,000 and 30 more days. Well, it would have been 30 days if they had
sent us the right software. We received NCS 1.0 for Netware 5.0, not NCS 1.1 for Netware
5.1. The two are incompatible.
So about May of 2000 the server was finally set up and the clusters were enabled. There was
great rejoicing, for a very short amount of time. It didn't quite work right. We called
Novell, only to discover that our Unlimited License agreement did not cover support either.
Another $7,000 and another round of signatures solved that. Netware Premium Technical Support
told us that our problem was definitely hardware related. We contacted Compaq Technical
Support. After we assured them that we had installed all of the latest ROMPAQs and that yes,
we had reseated the memory, they insisted that it was a software problem.
So what was the problem? The cluster was designed so that two servers could share the same
data volume. If one server went down, then the other came up and took over hosting duties.
The switch would take about 20 seconds, and as far as the average user could tell, Windows
just had a little hemorrhage and went on its merry way. It was a brilliant system, and it
worked too. Each time one server went down, the other came up, like magic. The only real
problem was that it did it about a half dozen times a day. Every hour and a half or so the
server would just stop and restart itself, failing over to the other server.
After some research we found that a Compaq utility running on the server was reporting a
strange error at a specific Netware memory address. Of course, Compaq claimed it was a
Netware issue because the utility specifically named Netware. Netware claimed it was a Compaq
issue because it was a Compaq utility reporting the error. This was very frustrating.
It was during this time that I learned about Compaq Technical Support's most asinine
procedure. Every time I called them I talked to a different technician, even when I called on
the same incident number. Every time I called I had to explain the entire problem to a
new first-level technician. They made me go through the same ridiculous procedure every time.
Had I updated the ROMPAQs? Had I reseated the memory? Had I tried reseating
the processor? Had I talked to Novell Technical Support? One day I had finally had
enough. I wanted to talk to the same technician; I didn't want to have to re-explain
everything. The operator claimed that I could not directly contact any support technicians.
I threatened, yelled, and complained; eventually the operator relented and transferred me to
the technician I had previously spoken with. When she answered I explained who I was and she
was incredulous. "How did you get this number?" she asked. I explained what I had done and
she asked for the name of the operator. I told her I couldn't remember who it was and asked
why it mattered. "I have to report this to my supervisor. It's against our policy to take
the same call twice."
In July of 2000, fed up with dealing with first-tier technical support, I called our Compaq
sales agent and told her that we were sick of this server. I told her to take the damn thing
back; we were never going to buy another Compaq. The next day I received a call from Compaq
Gold Technical Support. Three months after my first call to technical support, my case had
been upgraded. We then spent several months going over the details. Technicians came out
and investigated our power and our system. Lots of people had ideas; nobody had a solution.
By this point the server would crash any time you attempted to copy data down from it. At
Compaq's request we boxed up one of the servers and sent it to them whole.
Presumably, while in their custody it was subjected to a battery of tests. They found no
problems and sent it back, whereupon it immediately failed again.
On a hunch the Compaq Support ninja asked what type of NIC we were using in our servers. We
had installed 3com 100fx fiber NICs. At the time of purchase Compaq had stopped
manufacturing their popular and dependable Netflex series of NICs and had yet to produce
their less popular and much more expensive Netelligent NICs. MicroAge had recommended
the 3com NIC.
The technician suggested that perhaps that was the problem; he had no fiber plant in his
test lab and had used the EISA UTP card that shipped with the Proliant server to test the
machine we had sent back. He told me he would try to find a Compaq fiber NIC and ship
it to me. So, in October we replaced the 3com NIC with an old Compaq Netflex fiber NIC...
It worked.
Now, before you go assaulting my intelligence, I had replaced virtually every piece of
hardware in this machine, including the NICs, with known-good, new, out-of-the-box hardware.
Of course, I had always replaced the NICs with the same brand and model. No one had even
suggested, in all the months we cooperated with both Compaq and Netware, that maybe we
should try a different brand of NIC.
There was a brief celebration, followed by great cursing of Netware and Compaq.
Almost two years after our first request for funds, we are just now starting to
implement our great scheme. I am convinced that as we upgrade 7500 client machines all over
campus we will encounter many more problems. I am also convinced that by the time we
get the system installed and functional, it will be obsolete. Towards that end we have
begun the purchase and implementation of hardware routers. Unfortunately, some of us didn't
quite learn our lesson last time. It has been decided that instead of using standard Cisco
routers, we will use routers made by Enterasys, formerly Cabletron.
I need a new job.