How to turn a colored histogram into a stacked bar chart?

apoloduvalis Member Posts: 3 Learner I
edited February 2020 in Help
I want to create a visualization from data with the columns SEXO (sex, with 1 for male and 2 for female) and INGLABO (income, binned into four ranges), with 24,984 examples. My goal is to show one column for male and another for female, where each segment of the bar is the count of examples of that sex within the corresponding income range.
My first choice was a histogram, with SEXO (1, 2) as the value column and INGLABO as the color (the ranges are in the legend at the bottom of the chart). There should be 13,545 records for male (1) and 11,439 for female (2), displayed in two single columns. Instead, each color (the count for one income range within one sex) is drawn as a separate column, so you only see the most frequent group rather than the groups of the same sex stacked on top of each other.

Is this a bug? Or am I doing something wrong? The histogram seems to work fine without color.

I got closer to the visualization I need with a TurboPrep "Histogram color" chart, but it downsampled the data from 24,984 records to 5,000, so it is of little use.

Any ideas? Thanks in advance,
Andrés

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    hi @apoloduvalis I'm sorry no one has chimed in here. Can you post some data so I can try to see how a chart would look?
  • apoloduvalis Member Posts: 3 Learner I
    edited March 2019
    Hi @sgenzer , thank you for showing up!
    The colored histogram I am getting shows each segment in a different color, but drawn one behind the other instead of stacked on top of each other. When I hover the pointer over the legend entry of the last segment (0, the black one), as if I were going to hide it, I can see the other segments behind it because they turn semi-transparent. I would like to attach an image, but since I am still a newbie the system does not allow me to do it.

    I started to think that the histogram not stacking the different colored segments was not a bug but a feature of RM. However, I got closer to the visualization I need with a TurboPrep "Histogram color" chart, which managed to show a histogram with stacked segments from the same data. Sadly, it downsampled the data from 24,984 records to 5,000 (perhaps a limitation of the free version?), so there are not enough examples to include data for the red color (0) or to get the right proportions of the distribution.

    I have attached the data I am using as source for the histogram as a .csv file. I hope you can reproduce this behavior and figure out what is going on.
    Kind regards,
    Andrés
  • apoloduvalis Member Posts: 3 Learner I
    Well, it seems my second post upgraded me from newbie to learner, so now I can attach images. My histogram looks like this:



    And the TurboPrep histogram color chart looks like this:


  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    Currently you would need to pivot the data manually first, and then use a "Column" chart with stacking. Histograms in the new visualizations deliberately do not stack the data: they work on numerical data, and the bins of the different colors usually do not come anywhere close to a 100% match.
    At some point in the future we will add a "Color Group" option for Line/Bar/Column/Area charts, which will let you split the data into different groups (one per category value). At that point you will be able to do what you are currently trying to achieve very easily.

    Turbo Prep still uses the old charts, which is why it behaves differently there.

    Regards,
    Marco
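
For readers who want to see what the manual pivot suggested above produces, here is a minimal sketch in plain Python rather than RapidMiner. It assumes the data is available as (SEXO, INGLABO) pairs; the sample rows, the bin labels 0 to 3, and the helper name `pivot_counts` are hypothetical and only serve the illustration. The result has one entry per sex and one count per income bin, which is the shape a stacked "Column" chart needs.

```python
from collections import Counter

# Hypothetical sample rows as (SEXO, INGLABO) pairs.
# The real data set in the thread has 24,984 examples.
rows = [(1, 0), (1, 1), (1, 1), (2, 0), (2, 2), (1, 3), (2, 3), (2, 3)]


def pivot_counts(rows):
    """Return {sexo: {inglabo: count}}, with every bin present for every sex.

    Missing (sex, bin) combinations get a count of 0 so that all bars
    share the same set of segments.
    """
    pair_counts = Counter(rows)
    sexes = sorted({sexo for sexo, _ in rows})
    bins = sorted({inglabo for _, inglabo in rows})
    return {s: {b: pair_counts[(s, b)] for b in bins} for s in sexes}


pivot = pivot_counts(rows)

# One line per bar: the segments to stack and the bar's total height.
for sexo, segments in pivot.items():
    print(sexo, segments, "total:", sum(segments.values()))
```

Each bar's total equals the number of examples for that sex, so with the full data the two bars should reach 13,545 and 11,439, and no sampling is involved.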