
computer crash analysis




Some calming words of wisdom from the Ensign List. Howard is the main
programmer and support man for Ensign.

Best regards

Walter

===========================

EnsignList - http://www.ensignsoftware.com
Hi Bill,

Thanks for the information about the error message you are getting.   Sorry
that this is happening to you.  I know it is very frustrating for one's
computer and software not to be operating rock solid.   While you are having
a problem, others are not.   In fact, here is a comment from an Oct 12th
e-mail from another user:

"On another topic ... The Oct 7 version of DTN Ensign, like the other recent
versions seems especially solid.  If my system is the equivalent of a canary
in a coal mine, it is saying there are no minute noxious gases about.   :)
I've seen no access errors ... nothing odd, whatsoever, in any of the
processes here.  Even running 96 charts with several alerts (non ESPL) did
not cause any problems.  This is something I've not done in the past.  But,
have wanted to try for awhile."

So, let's work together to try to make progress for your system, and I will
dig into my experience with others to pose ideas:

1) If the bug is triggered by processing the data feed, then usually
several customers get hit with the problem at once.   These users use the
program differently, but they all digest the same data feed, so the feed
triggers the same problem for each of them.   For this type of problem, we
immediately notice a ground swell of telephone support calls reporting the
same type of error, even the same error address.

2)  If there is a design flaw with the program, it usually is something that
can be retriggered by exercising the buggy section of code.   Sometimes the
bug requires the user to manipulate the program in a certain way or sequence
that is unusual, and the majority of users don't use the program in that
manner.   Thus, all have the bug, but only one or two exercise it.   For
this type of problem, I get the most benefit by having the user describe in
detail their use of the program.  Sometimes I have them send me layout files
or parameter files as e-mail attachments.   As soon as I can duplicate the
problem, I am usually able to correct the design flaw or add code to prevent
the problem.  Through evolution as these types of problems are corrected,
the program becomes more stable for everyone, and it gets harder and harder
for users to find new bugs.

3) Another type of problem is one unique to a user's computer.   The
'problem' computer might have an unreliable hardware component, such as
memory or a CPU that misbehaves when it is hot.   I just went through this
frustration with a new computer that I bought.  It would not run reliably
for even a half hour.   The error messages and addresses were varied and the
crashes occurred 'out of the blue' at random times.   Fortunately the
computer was under warranty and a local technician serviced the computer.
Because of the random crashes and varied addresses, we were looking for a
temperature related problem.  The technician replaced the following
components in this order, trying to find the unreliable part:

    a) replaced the power supply and power supply fan   (hoping for low
voltage or poor air flow)
    b) replaced the memory        (hoping for an unreliable chip)
    c) replaced the cooling fan on the CPU    (hoping the CPU was running too
hot)
    d) finally replaced the CPU    (finally found the weak link and source
of the unreliability)

He proceeded to each next step, only after proving that the step just taken
failed to improve the machine's reliability.

4)  Another type of problem unique to a 'problem' computer might be some of
the foundation software that is loaded on the computer.   My brother had a
computer like this, and he kept thinking Ensign Windows was the culprit
because Ensign Windows was the software he used on the computer all day
long.   Yet, he had another computer running the same version of Ensign
Windows that was rock solid.   So, with the same version of Ensign Windows,
he experienced two different levels of reliability.   Fortunately, his case
has a happy ending.   He upgraded his computer by downloading from Microsoft
a newer version of Internet Explorer 5.0.   Whatever DLLs, and other files,
the upgrade replaced, one of them must have been involved in the previous
unreliability.   His problem machine after the upgrade became rock solid
reliable.   Ensign Windows did not change.  It was Internet Explorer that
changed, and the 'problem' suddenly healed.

5)  A related type of problem can be a virus.   Customers in the past have
reported to us how their machines would crash frequently and unexpectedly.
Then they discovered they had a virus, removed it with some anti-virus program
like Norton's or McAfee, and then the 'problem' suddenly healed.   Again,
Ensign Windows did not change.   It was the removal of the virus that healed
the computer of the problem it was having.

From the clues you have provided so far, I think we can rule out category
#1.   Other DTN users are running day after day just fine, myself included.
Because of the varied addresses, and the fact the problem can occur during
the night when you are not doing anything, I want to rule out category #2.
I don't think the problem is being exercised by using a buggy section of
code in Ensign Windows.

Your clue that frequently the whole computer is crashed, locked-up, makes me
favor a category 3 hardware problem over a category 4 software or category 5
virus problem.  Usually problems of type 4 or 5 have a more repeatable
symptom, address, or behavior.  The most random would be temperature related
hardware problems as described in category 3.
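
The elimination above can be written down as a small rule table.  This is only
a rough sketch of the reasoning in this e-mail, not a diagnostic tool: the
class name, field names, and function name are made up for illustration, and
the rules are no more than the rules of thumb stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrashClues:
    """Hypothetical summary of what the user has reported so far."""
    other_users_affected: bool    # several users report the same error
    addresses_vary: bool          # error address differs from crash to crash
    happens_when_idle: bool       # crashes occur with nobody using the program
    whole_machine_locks_up: bool  # the entire computer freezes, not just one program


def likely_categories(clues: CrashClues) -> List[int]:
    """Return the remaining categories (1-5), most favored first."""
    candidates = [1, 2, 3, 4, 5]
    # Category 1 (data feed bug) normally hits several users at once.
    if not clues.other_users_affected:
        candidates.remove(1)
    # Category 2 (design flaw) is exercised by using a section of code,
    # so varied addresses on an idle machine argue against it.
    if clues.addresses_vary and clues.happens_when_idle:
        candidates.remove(2)
    # Random addresses plus full lock-ups favor category 3 (hardware)
    # over categories 4 and 5, which tend to be more repeatable.
    if clues.addresses_vary and clues.whole_machine_locks_up:
        candidates.remove(3)
        candidates.insert(0, 3)
    return candidates


bills_machine = CrashClues(
    other_users_affected=False,
    addresses_vary=True,
    happens_when_idle=True,
    whole_machine_locks_up=True,
)
print(likely_categories(bills_machine))
```

With the clues reported so far, categories 1 and 2 drop out and category 3 is
listed first, which matches the conclusion above.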

So, here are my suggestions for today:

1)  Keep a written paper log of every crash.   Include the date, time, error
message, error address, locked up status, recovery steps, the last action
taken, what was showing on the screen, and all applications running at
the time.

2)  If you have any virus protection software such as McAfee, please run it
and have it do its most thorough scan.   If you have not downloaded the
latest library of viruses it can check for, please obtain an upgrade for the
software.   Usually they have new libraries for download from the Internet
to bring their virus scanning product more current.

3)   Check the temperature of the air flowing out of the computer at the
power supply.   Use a thermometer for a reading.   Take a new
temperature reading when the computer crashes.   Try a test of running with
the computer cover off.  This will change the air flow dynamics.   All I am
looking for is a change in the frequency of the crash.  If the problem is
temperature related, having the cover off might make the crash occur more
readily, or it might work the other way and make it less frequent.  Also
check that you have the expected air flow from the exhaust fan on the power
supply and that the cooling fan on the CPU is blowing normally.

4)  Write down a list of all the software items that are running.  You think
Ensign Windows is the only thing running.  Not so.   Press the Ctrl-Alt-Del key
combination one time to see a list of all the stuff that is running.
Probably half the list is excess baggage that can be eliminated by
unchecking items in the start-up profile.    I once had a printer reminder
task auto-scheduled and it was a source of unreliability.  Glad I figured
out how to eliminate it.   Anyway, what is your list, and in another e-mail
we can try the process of removing unneeded tasks from the start-up profile.
This type of step was covered on the Discussion Group back in February.
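
If you would rather keep the crash log from suggestion 1 in a file as well as
on paper, something like the following would do.  This is only a sketch: the
record layout simply mirrors the items listed in suggestion 1, and the names
CrashRecord and summarize are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CrashRecord:
    """One crash log entry, with the items listed in suggestion 1."""
    date: str
    time: str
    error_message: str
    error_address: str
    locked_up: bool          # True if the whole computer froze
    recovery_steps: str
    last_action: str
    on_screen: str
    running_apps: List[str]


def summarize(records: List[CrashRecord]) -> Dict[str, object]:
    """Count crashes, lock-ups, and how much the error addresses vary."""
    addresses = Counter(r.error_address for r in records)
    return {
        "crashes": len(records),
        "lockups": sum(1 for r in records if r.locked_up),
        "distinct_addresses": len(addresses),
        # Addresses seen more than once hint at a repeatable problem.
        "repeated_addresses": sorted(a for a, n in addresses.items() if n > 1),
    }
```

Many distinct addresses with few repeats would point toward the random,
hardware type of problem, while one address showing up again and again would
point toward something more repeatable.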

Sincerely,
Howard Arrington
http://www.ensignsoftware.com/