Visualizing Data Set with tons of small values.

gerby Member Posts: 2 Newbie
edited December 2023 in Help
Hi all, I'm new to RapidMiner and just started using it today. I have an extremely large dataset that I am trying to visualize. However, most of the attributes contain lots of zeroes, so when I visualize them using a histogram, it ends up looking like this:
If I turn on the logarithmic scale in the y-axis it still looks pretty weird to me.

So my question is, is there anything I can do to make the data look better? I thought of removing outliers, but because there are so many small values, the "outliers" end up being most of the data points with higher values. I tried splitting zeroes from non-zeroes, but since most of the values are small and not just zero, it ends up looking pretty much the same. Thanks in advance!

Best Answer

  • rjones13 Member Posts: 189 Unicorn
    Solution Accepted
    Hi @gerby,

    Unfortunately, this is the sort of question where the answer really depends on what your end goal for visualizing the data is. If it's just to produce a visual overview of the data, then I think the logarithmic scale does a reasonable job of this. You'll see each line corresponds to a power of 10, so moving one tick mark up corresponds to a 10x increase. I'd also potentially filter out values above 10k to get more granularity in the remaining range. What do you think?

    Best,

    Roland 
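
For anyone who wants to try the two suggestions above (a log scale and a cap at 10k) outside RapidMiner, here is a minimal Python sketch using NumPy. It is only an illustration, not how RapidMiner builds its histograms: the function name `log_histogram` and its parameters are made up for this example, and it assumes the attribute is non-negative. It counts zeroes separately (they have no logarithm), drops values above the cap, and bins the remaining values on power-of-10 edges.

```python
import numpy as np

def log_histogram(values, upper=10_000, bins_per_decade=1):
    """Hypothetical helper: separate zeroes, cap at `upper`, bin the rest on log-spaced edges."""
    values = np.asarray(values, dtype=float)
    kept = values[values <= upper]          # filter out extreme values (e.g. above 10k)
    zeros = int(np.sum(kept == 0))          # zeroes cannot go on a log axis, count them apart
    positive = kept[kept > 0]               # assumption: data is non-negative
    if positive.size == 0:
        return zeros, np.array([]), np.array([], dtype=int)
    lo = np.floor(np.log10(positive.min())) # lowest decade that contains data
    hi = np.ceil(np.log10(positive.max()))  # highest decade that contains data
    if hi == lo:                            # all values sit on one power of 10
        hi = lo + 1
    n_bins = int((hi - lo) * bins_per_decade)
    edges = np.logspace(lo, hi, n_bins + 1) # bin edges at powers of 10
    counts, edges = np.histogram(positive, bins=edges)
    return zeros, edges, counts

# Small made-up sample: mostly zeroes and small values, one extreme value.
zeros, edges, counts = log_histogram([0, 0, 0, 0.5, 1.5, 2, 5, 20, 300, 50_000])
print("zeroes:", zeros)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:g}, {right:g}): {n}")
```

Raising `bins_per_decade` gives finer bins within each power of 10, which may help if most of the data sits in one or two decades.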

Answers

  • gerby Member Posts: 2 Newbie
    I see, thanks for the input. I think filtering extreme values seems to be my best bet. Thanks once again!
  • nataliarelish Member Posts: 4 Learner I
    Thank you for sharing valuable insights.