
How can I see multicollinearity?

soheepark Member Posts: 3 Learner I
edited May 2022 in Help
Hi, I'm a beginner.

I have 17,379 rows of data in total.
I tried to open the scatter matrix and the heatmap because I wanted to check the relationships between the variables.

But neither the scatter matrix nor the heatmap was displayed.
Instead, the following messages were shown.

<heatmap>
Plot Heatmap does only support more than 2,000 rows if aggregation is enabled.

<scatter matrix>
Plot Scatter Matrix does not support more than 10,000 rows with the current configuration.

My data is a time series covering 2011-2012,
so it is not clear how I could cut it down to about 2,000 rows.

In this case, what should I do?

Additionally, how can the VIF value be calculated in RapidMiner?

I would appreciate an answer.
Thank you.




Best Answer

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi!

    In the Preferences (Settings => Preferences => User Interface) there's a setting "Visualizations row limit modifier". You can input higher values there if you are confident that your computer should be powerful enough to process and visualize more data. This is a safety limit to avoid overwhelming older computers.

    With higher limits you should be able to get the charts you need.

    About the VIF: RapidMiner is not a classical statistics application. It doesn't do regression analysis the way those programs do.
    That said, the VIF could be calculated in a process according to the formula in https://www.statisticshowto.com/variance-inflation-factor/: loop through the attributes, run a regression with the current attribute as the label and the remaining attributes as predictors, take the R² value, and calculate VIF = 1 / (1 - R²).

    Regards,
    Balázs
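
The loop described in the answer (regress each attribute on all the others, take R², then compute VIF = 1 / (1 - R²)) can be sketched outside RapidMiner in plain Python. This is only an illustration, not a RapidMiner process or API: the function names `solve`, `r_squared` and `vif`, and the synthetic columns `a`, `b` and `c`, are made up for the example. It assumes purely numeric attributes with no missing values, no constant columns, and no exactly collinear columns (those would make the linear system singular).

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def r_squared(X, y):
    """R² of a least-squares fit of y on the rows of X, with an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations: (X'X) coef = X'y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(XtX, Xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """columns: dict mapping attribute name -> list of numeric values."""
    names = list(columns)
    n = len(columns[names[0]])
    result = {}
    for target in names:
        # The current attribute plays the role of the label.
        others = [m for m in names if m != target]
        X = [[columns[m][i] for m in others] for i in range(n)]
        r2 = r_squared(X, columns[target])
        result[target] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return result


# Synthetic demo data (hypothetical): c is almost a copy of a, b is unrelated.
random.seed(0)
a = [random.gauss(0, 1) for _ in range(500)]
b = [random.gauss(0, 1) for _ in range(500)]
c = [ai + 0.1 * random.gauss(0, 1) for ai in a]
vifs = vif({"a": a, "b": b, "c": c})
print(vifs)
```

In this demo `a` and `c` should come out with large VIF values and `b` with a value close to 1, since only `a` and `c` are strongly related. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity.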