# (Bug?) What definition of AutoCorrelation operator is in valueSeries plugin?

Hello statistical friends,

I examined the code of

[tt]rapidminer\operator\valueseries\transformations\basis\AutoCorrelation.java[/tt]

so that I could understand the meaning of the three input parameters factor, start, end.

Here is the relevant excerpt from [tt]AutoCorrelation.java[/tt] v5.3.000.

[code]
for (int i = start; i < end; i++) {
    double differences = 0.0d;
    int numberOfValues = 0;
    for (int j = 0; j < series.length(); j++) {
        int lag = (int) ((double) factor / (double) i);
        if ((j + lag) >= series.length())
            break;
        numberOfValues++;
        double difference = series.getValue(j) - series.getValue(j + lag);
        differences += (difference * difference);
    }
    differences /= numberOfValues;
    displacements[i - start] = i;
    result[i - start] = new Vector(differences);
}
[/code]

The function appears to calculate an estimate that converges to

[tt]result( i ) = 2 Variance( x ) - 2 Covariance( x( j ), x( j+factor/i ) )[/tt].
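To show how I arrive at that expression, here is a short derivation sketch. It assumes the series is weakly stationary (constant mean and variance, covariance depending only on the lag), which is my assumption and not something the code states. The symbol L is my own shorthand for the truncated lag computed in the loop.

```latex
% Assumption: x is weakly stationary; L = \lfloor \mathrm{factor}/i \rfloor for positive factor and i.
\begin{align*}
\mathbb{E}\!\left[(x_j - x_{j+L})^2\right]
  &= \operatorname{Var}(x_j) + \operatorname{Var}(x_{j+L})
     - 2\operatorname{Cov}(x_j, x_{j+L})
     + \left(\mathbb{E}[x_j] - \mathbb{E}[x_{j+L}]\right)^2 \\
  &= 2\operatorname{Var}(x) - 2\operatorname{Cov}(x_j, x_{j+L}).
\end{align*}
```

The loop averages the squared differences over all valid j, so under this assumption the average should approach the right-hand side as the series gets longer.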

The term "factor/i" is unfamiliar to me.

To calculate an auto-covariance function of sequence x, I would have expected to see [tt]Cov(x( j ), x( j+factor * i ))[/tt]. There, the purpose of factor is to enable the user to control the computational effort by sparsely sampling the lag axis.
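To make the difference concrete, here is a minimal standalone sketch that contrasts the lag as it is computed in the excerpt with the lag I would have expected. It does not use any RapidMiner classes. The class name LagSketch and all of its method names are hypothetical, chosen only for this illustration, and the sine test series is made up.

```java
public final class LagSketch {

    /** Lag as computed in the excerpt: factor / i, truncated to an int. */
    public static int lagAsCoded(int factor, int i) {
        return (int) ((double) factor / (double) i);
    }

    /** Lag I would have expected (my assumption): factor * i. */
    public static int lagAsExpected(int factor, int i) {
        return factor * i;
    }

    /** Mean of squared differences at a lag, the quantity the excerpt accumulates. */
    public static double meanSquaredDifference(double[] x, int lag) {
        checkLag(x, lag);
        double sum = 0.0;
        int count = x.length - lag;
        for (int j = 0; j < count; j++) {
            double d = x[j] - x[j + lag];
            sum += d * d;
        }
        return sum / count;
    }

    /** Sample autocovariance at a lag, averaged over the available pairs. */
    public static double autoCovariance(double[] x, int lag) {
        checkLag(x, lag);
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double sum = 0.0;
        int count = x.length - lag;
        for (int j = 0; j < count; j++) {
            sum += (x[j] - mean) * (x[j + lag] - mean);
        }
        return sum / count;
    }

    private static void checkLag(double[] x, int lag) {
        if (lag < 0 || lag >= x.length) {
            throw new IllegalArgumentException("lag must be in [0, length)");
        }
    }

    public static void main(String[] args) {
        // Made-up test series: a sine wave with a period of 16 samples.
        double[] x = new double[64];
        for (int j = 0; j < x.length; j++) {
            x[j] = Math.sin(2.0 * Math.PI * j / 16.0);
        }
        int factor = 10;
        for (int i = 1; i <= 4; i++) {
            int coded = lagAsCoded(factor, i);
            int expected = lagAsExpected(factor, i);
            System.out.printf("i=%d codedLag=%d msd=%.4f expectedLag=%d autocov=%.4f%n",
                    i, coded, meanSquaredDifference(x, coded),
                    expected, autoCovariance(x, expected));
        }
    }
}
```

With factor = 10 and i = 1..4, the coded lag runs 10, 5, 3, 2, while the lag I expected runs 10, 20, 30, 40. So as i grows, the coded version shrinks the lag instead of stepping along the lag axis.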

A few questions arise for me:

1. Is this a bug? Or is "autocorrelation transformation" something mathematically distinct from the autocovariance of the sequence?

2. Suppose the output was [tt]result(lag) = 2 Var(x) - 2 Cov( x( j ), x( j+lag ) )[/tt]. Is there a reason in machine learning why that expression is more useful than just [tt]result(lag) = Cov( x( j ), x( j+lag ) )[/tt] ?

3. Where is the public repository for ValueSeries plugin so that I can be sure that my comments are relevant to the latest code?

Thanks and regards,

Owen
