
# "[SOLVED] Bug: Distinct Values in Advanced Charts"

Hello,

I think there might be a bug in the new Advanced Charts, more precisely in "Grouping: Distinct Values".

Let's assume I have a dataset of 1000 examples and I want to create a histogram of a certain attribute a1:

- I drag a1 to the "domain" and the "range" dimension

- I select "grouping: distinct values" in the domain dimension

- I select "aggregation: count" in the range dimension

When I now sum up all the count values for the attribute, I get a total that is far less than 1000.
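To make the expectation concrete, here is a minimal sketch in plain Python (not RapidMiner itself). The list `a1` is a hypothetical stand-in for the attribute, and it assumes that grouping by distinct values with a count aggregation behaves like an ordinary frequency table:

```python
from collections import Counter

# Hypothetical stand-in for attribute a1: 1000 examples, no missing values.
a1 = [i % 7 for i in range(1000)]

# Grouping by distinct values and aggregating with "count" is,
# under this assumption, just a frequency table.
counts = Counter(a1)

# Every example falls into exactly one group, so the counts
# should add up to the number of examples.
total = sum(counts.values())
print(total == len(a1))
```

If the chart's counts do not add up like this and no values are missing, something other than simple counting is going on.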

Is this a bug, or did I misunderstand "grouping: distinct values"?

If you want, I can either post a process or pictures showing this (or both of course).

Cheers Q-Dog

## Answers

Does a1 contain missing values? Those are not counted, so the total count can be less than the number of examples.

If you don't have missing values, we would be very interested in your process and the data, so that we can reproduce the problem.
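The effect of missing values can be sketched in plain Python. This is only an illustration: the data is made up, and representing a missing value as `None` is an assumption, not how RapidMiner stores it.

```python
from collections import Counter

# Hypothetical a1 in which missing values are represented as None.
a1 = [0, 0, 1, None, 2, None, 1, 0]

# Count only the non-missing values, mirroring the behaviour described above.
counts = Counter(v for v in a1 if v is not None)
n_missing = sum(1 for v in a1 if v is None)

# The counted total falls short of the dataset size by exactly
# the number of missing values.
counted = sum(counts.values())
print(counted, n_missing, len(a1))
```

Comparing the shortfall with the number of missing values is a quick way to tell whether missing values alone explain the gap.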

Best regards,

Marius

No, a1 does not contain any missing values. Is it somehow possible to attach the ExampleSet so that you can view it directly in RapidMiner (without importing the logfile first)?

Will the ".ioo" file do the job?

Anyway, here is a screenshot of my problem:

The example set has 17639 examples, but the counts in the plot add up to far fewer.

The values on the x-axis are 0-163, i.e. 164 distinct values. Even if you assume that each count on the y-axis is 100 (which clearly is not the case), you end up with 164*100 = 16400 < 17639.
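The same bound can be checked with a few lines of Python. The numbers come from the post; the variable names are illustrative only:

```python
import math

n_examples = 17639  # size of the example set, from the post
n_bins = 164        # distinct x-axis values 0..163

# Total if every bar had a count of 100.
total_if_all_100 = n_bins * 100

# If the bars did sum to n_examples, the tallest bar would have to be
# at least the average count per bar, rounded up.
min_tallest_bar = math.ceil(n_examples / n_bins)

print(total_if_all_100, min_tallest_bar)
```

In other words, a correct plot would need at least one bar above 100, so bars that all stay below 100 cannot account for every example.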

Cheers Q-Dog

// Edit

I just checked: for example, the value 0 appears 177 times in my example set, but in the plot the count for 0 is only 45.

Best,

Marius

This looks far better, thanks a lot!

Best, Marius