
RE: Correlation Math Help




>These results suggest that comparing two data sets one of which remains
>always positive while the other does not is, in fact, OK and does not
>introduce any error into the results.  Any comments?

To find the correlation between two sequences of numbers, each sequence is first converted into a sequence of z-scores.  The two resulting sets of z-scores are then multiplied pairwise, and the products are averaged.  The z-score process removes the mean from each set and then scales the values so that the standard deviation of the set is 1.
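
As a rough sketch of that recipe in Python (the function names zscores and correlation are only illustrative, and I am assuming the population standard deviation, i.e. dividing by N rather than N-1):

```python
import math

def zscores(xs):
    """Subtract the mean, then divide by the (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]

def correlation(xs, ys):
    """Average of the pairwise products of the two z-score sequences."""
    zx, zy = zscores(xs), zscores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)
```

For two sequences that rise together in a straight line, such as [1, 2, 3, 4] and [2, 4, 6, 8], this gives 1 up to rounding error; reversing one of them gives -1.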

When you added a constant to the 2nd sine wave, the z-score process took it right out.  When you scaled the 2nd sine wave, the z-score process "unscaled" it. So in either case, you get the same result as the original test.
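
The experiment can be reproduced with a few lines of Python. This is only a sketch: the sine wave length, the offset of 2.0, and the scale factor of 3.0 are values I picked for illustration, not the ones used in the original test.

```python
import math

def pearson(xs, ys):
    """Correlation coefficient computed from mean-removed sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A sine wave that oscillates above and below zero (4 cycles of 50 samples).
sine = [math.sin(2 * math.pi * i / 50) for i in range(200)]

# Same wave displaced upward by double its amplitude, so it is all positive.
shifted = [s + 2.0 for s in sine]

# Same phase and wavelength, different amplitude.
scaled = [3.0 * s for s in sine]

r_shifted = pearson(sine, shifted)
r_scaled = pearson(sine, scaled)
print(r_shifted, r_scaled)
```

Both coefficients come out as 1 to within floating-point rounding.  Note that this holds for a positive scale factor; multiplying by a negative number flips the sign of the coefficient.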

The big problem with cross-correlation is that a few relatively large numbers that coincide between the two sets will dominate the overall result.  Correlation gives too much power to large numbers at the expense of all the smaller ones.  Rank correlation avoids that problem, although it has its own idiosyncrasies.
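
Here is a small made-up illustration in Python.  The data are invented: eight small values that move in opposite directions, plus one large pair that coincides.  The rank correlation is computed as the ordinary correlation of the ranks (Spearman's method), and the simple ranks helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Ordinary correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(xs):
    """Rank of each value, 1 = smallest. Assumes no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def rank_correlation(xs, ys):
    """Spearman-style rank correlation: correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Eight small values that run in opposite directions, plus one large
# coincident pair at the end.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 100]
ys = [8, 7, 6, 5, 4, 3, 2, 1, 100]

r_pearson = pearson(xs, ys)
r_rank = rank_correlation(xs, ys)
print(r_pearson, r_rank)
```

The ordinary coefficient comes out close to +1 because the single large pair swamps everything else, while the rank correlation is negative, reflecting the fact that eight of the nine points move against each other.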

- Mark Jurik





  



----------
From: 	rudolf stricker
Sent: 	Monday, August 26, 2002 10:34 AM
To: 	omega-list@xxxxxxxxxx
Subject: 	Re: Correlation Math Help

On Sun, 25 Aug 2002 11:56:14 -0700, "carrslem" <carrslem@xxxxxxx>
wrote:

>Before receiving your answer, I conducted an experiment.  I generated two
>data sets, one a simple sine wave (which oscillates above and below zero)
>and the other the same sine wave but displaced upward by about double its
>amplitude so that it was all positive.  I then generated a correlation
>coefficient and, voila! - it was exactly 1.00.  I then modified one of the
>sets so that both still had the same phase and wavelength but different
>amplitudes.  Again the coefficient was exactly 1.00.
>
>These results suggest that comparing two data sets one of which remains
>always positive while the other does not is, in fact, OK and does not
>introduce any error into the results.  Any comments?
>
Imo, this does not match the situation we normally have to deal with.
In reality, we have a sequence of positive (e.g. wins) and negative
values (e.g. losses) of rather different absolute size, and another
sequence of the same kind that comes from our "prediction model" or
"trading system". The question is how well the system correlates
with reality. For those cases, my statement holds:

>Using correlation coefficient might be very misleading, when
>correlating variables ranging from positive to negative values. This
>is because rather small values around zero might show even negative
>correlation, even if the absolute errors are very small. Moving to
>_rank correlation_  gives a much better insight into the quality of
>correlation in those cases.

mfg rudolf stricker
--
| Disclaimer: The views of this user are strictly his own.