
## Answers

As usual, you cannot say how things will work before you have tested them. One mostly applies cross-validation (X-Validation) to get a performance estimate if it's a classification or regression task; clusterings can be evaluated using cluster measures. You can optimize the number of dimensions by iterating over this parameter and testing every value. There are operators for this, all starting with Optimize Parameters.

But anyway, I doubt that SVD will work very well on text datasets, simply because computing the Singular Value Decomposition of the huge matrices that frequently occur in text mining might take far too long.

Greetings,

Sebastian
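The parameter-optimization idea above (iterate over the number of dimensions and cross-validate each setting) can be sketched outside RapidMiner as well. This is an illustrative Python/scikit-learn version, not the RapidMiner operator itself; the synthetic dataset, the candidate dimension counts, and the logistic-regression learner are all placeholder choices.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Placeholder dataset standing in for a real example set.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

pipe = Pipeline([
    ("svd", TruncatedSVD(random_state=0)),    # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),
])

# Try several dimension counts; 5-fold cross-validation estimates the
# classification performance for each setting, as Optimize Parameters
# plus X-Validation would in RapidMiner.
search = GridSearchCV(pipe, {"svd__n_components": [2, 5, 10, 20]}, cv=5)
search.fit(X, y)
print(search.best_params_["svd__n_components"])
```

The winning dimension count is whichever candidate achieved the best cross-validated accuracy.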

There seems to be a lot of literature about the use of SVD with text, but indeed the time might be prohibitive. Is there a way in RM to get the singular values themselves? (I have read that one can plot their squares and see where they level off to determine the best number of dimensions.)

Thanks!
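The "plot the squared singular values and see where they level off" heuristic can be sketched with NumPy; the toy matrix below (a low-rank signal plus a little noise, standing in for a term-document matrix) and the 99% energy threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy matrix with a rank-8 signal plus small noise.
A = rng.standard_normal((100, 8)) @ rng.standard_normal((8, 60))
A += 0.01 * rng.standard_normal((100, 60))

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
energy = s**2 / np.sum(s**2)             # fraction of squared-value mass

# Keep enough components to cover, say, 99% of the squared singular
# values; plotting np.cumsum(energy) shows the same elbow visually.
k = int(np.searchsorted(np.cumsum(energy), 0.99)) + 1
print(k)
```

With a clean low-rank signal the chosen `k` lands at or below the true rank, which is exactly the level-off point one would read off the plot.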

I'm not sure about this, but aren't they displayed in the model of the Singular Value Decomposition?

Greetings,

Sebastian

As I see now, all this information is discarded in RapidMiner and hence currently cannot be shown in the visualization of the preprocessing model. If you look at the result of the PCA, it is indeed possible to use these values for display. I could probably add that relatively easily, but this week and next there won't be time for it; I'm very busy with customer projects, which, by the way, are about text mining. So I'm curious about the runtime of SVD with many features and the amount of memory it needs. As far as I can see, it needs the complete matrix, so it will crash with my roughly 40,000 word attributes. Do you have any experience with that? And what about the classification performance: is it worth implementing a special SVD for sparse matrices?

Greetings,

Sebastian

Regarding your question, "is it worth implementing a special SVD for sparse matrices?":

I think absolutely YES, it is worth it for sure. A couple of days ago I tested the SVD operator on my text dataset with 23,000 features on a relatively high-performance machine, and the algorithm took 10 hours to finish!

As far as I know, the LSA algorithm has tackled this problem: it works with a low-rank approximation of the term-document matrix, which is much smaller than the original matrix. So I kindly suggest that the RM team try to embed an LSA operator in RM.

cheers.
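A sparse, truncated SVD of the kind LSA relies on can be sketched with SciPy: only the top-k singular triplets are computed and the dense matrix is never formed, which is what makes tens of thousands of word attributes tractable. The matrix size, density, and k below are illustrative assumptions, not measurements from the dataset discussed above.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Sparse random "term-document" matrix: 1,000 documents x 20,000 terms,
# ~0.1% non-zeros, standing in for a bag-of-words example set.
X = sp.random(1000, 20000, density=0.001, random_state=0, format="csr")

k = 50                      # number of latent dimensions to keep
U, s, Vt = svds(X, k=k)     # computes only the top-k singular triplets
docs_reduced = U * s        # documents projected into the k-dim LSA space

print(docs_reduced.shape)   # (1000, 50)
```

Because `svds` uses an iterative Lanczos-style method on the sparse matrix, memory stays proportional to the non-zeros plus the k factors, rather than to the full dense matrix that a standard SVD would require.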