"[SOLVED] Bug: Distinct Values in Advanced Charts"

Q-Dog Member Posts: 32 Contributor II
edited June 2019 in Help
Hello,

I think there might be a bug in the new Advanced Charts, more precisely in "Grouping: Distinct Values".

Let's assume I have a dataset of 1000 examples and I want to create a histogram of a certain attribute a1:

- I drag a1 to the "domain" and the "range" dimension
- I select "grouping: distinct values" in the domain dimension
- I select "aggregation: count" in the range dimension

When I now sum up all the count values for the attribute, I get a total that is far less than 1000.

Is this a bug, or did I misunderstand "grouping: distinct values"?

If you want, I can either post a process or pictures showing this (or both of course).


Cheers Q-Dog
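The expectation behind the question can be written down as a small check. This is only a sketch in plain Python, not RapidMiner code: the list `a1` is a hypothetical stand-in for the attribute, filled with made-up integer values. Grouping by distinct values and counting should give per-value counts that add up to the number of examples, provided there are no missing values.

```python
import random
from collections import Counter

random.seed(0)
# Hypothetical stand-in for attribute a1: 1000 examples, integer values 0..9.
a1 = [random.randint(0, 9) for _ in range(1000)]

# "Grouping: distinct values" plus "aggregation: count": one count per distinct value.
counts = Counter(a1)

# With no missing values, the per-value counts must add up to the example count.
total = sum(counts.values())
print(total == len(a1))
```

If the sum falls short of the example count, some rows were dropped before the counting step.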

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    Does a1 contain missing values? Those are not counted, so the total count can be less than the number of examples.

    If you don't have any missing values, we would be very interested in your process and the data so that we can reproduce the problem.

    Best regards,
    Marius
  • Q-Dog Member Posts: 32 Contributor II
    Hi Marius,

    No, a1 does not contain any missing values. Is it possible to attach the ExampleSet so that you can view it directly in RapidMiner (without importing the logfile first)?
    Will the ".ioo" file do the job?

    Anyway, here is a screenshot of my problem:
    image

    The example set has 17639 examples, but the counts in the plot add up to far fewer.
    The values on the x-axis range from 0 to 163. If you assume that each value on the y-axis is 100 (which is clearly not the case), you end up with 164*100 = 16400 < 17639.


    Cheers Q-Dog

    // Edit
    I just checked: for example, 0 appears 177 times in my example set, but in the plot the count for 0 is only 45.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hm, the plotters reduce the number of data points by sampling, because drawing an example set with a large number of data points would otherwise be very slow. However, when aggregation and grouping are used, the plotter *should* not sample. Anyway, can you please try increasing the property rapidminer.gui.plotter.rows.maximum in the Gui tab of Tools->Properties in RapidMiner to a value greater than 17000?

    Best,
    Marius
  • Q-Dog Member Posts: 32 Contributor II
    image

    This looks much better, thanks a lot! :)
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    We have now also fixed it in the code: if any of the grouping functions is set for a Plot, no sampling is applied to that Plot. The fix didn't make it into yesterday's release, though.

    Best, Marius
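The effect discussed in this thread can be reproduced with a toy model. The sketch below is not RapidMiner's implementation: `plot_counts`, its parameters, the row limit of 5000, and the random data are all hypothetical. It only illustrates why sampling rows before grouping shrinks the counts, and why either raising the row limit or skipping the sampling step when a grouping is set restores them.

```python
import random
from collections import Counter

def plot_counts(values, max_rows, grouping=False, skip_sampling_when_grouped=False):
    """Hypothetical model of a plotter that may sample rows before counting."""
    rows = values
    too_many = len(values) > max_rows
    exempt = grouping and skip_sampling_when_grouped
    if too_many and not exempt:
        # Sampling happens before the group-by, so every group loses rows.
        rows = random.Random(0).sample(values, max_rows)
    return Counter(rows)

rng = random.Random(1)
# Made-up data with the same shape as in the thread: 17639 examples, values 0..163.
a1 = [rng.randint(0, 163) for _ in range(17639)]

# Row limit below the example count (5000 is an assumed value): counts are too small.
sampled = plot_counts(a1, max_rows=5000, grouping=True)
# Workaround: raise the row limit above the number of examples.
raised = plot_counts(a1, max_rows=20000, grouping=True)
# Fix: never sample when a grouping function is set.
fixed = plot_counts(a1, max_rows=5000, grouping=True, skip_sampling_when_grouped=True)

sampled_total = sum(sampled.values())
raised_total = sum(raised.values())
fixed_total = sum(fixed.values())
print(sampled_total, raised_total, fixed_total)
```

In this model the sampled counts add up to the row limit rather than to the number of examples, which matches the symptom of a per-value count that is much lower than the true frequency.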