## Answers

2,531 Unicorn

As usual, you cannot say how well something will work before you have tested it. For a classification or regression task, one usually applies X-Validation to get a performance estimate. Clusterings can be evaluated with cluster measures. You can optimize the number of dimensions by iterating over this parameter and testing every value. There are operators for this, all starting with Optimize Parameters.

But anyway, I doubt that SVD will work very well on text datasets, simply because it might take far too long to compute the singular value decomposition of the huge matrices that frequently occur in text mining.

Greetings,

Sebastian
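To make the "iterate over the number of dimensions and cross-validate each value" idea concrete outside of RapidMiner, here is a minimal NumPy sketch. It is not the Optimize Parameters operator; the synthetic data, the nearest-centroid classifier, the candidate list, and the helper name `cv_accuracy` are all assumptions chosen only for illustration.

```python
import numpy as np

def cv_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean k-fold accuracy of a nearest-centroid classifier on k SVD dimensions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        # Fit the projection on the training fold only, to avoid leakage.
        mean = X[train].mean(axis=0)
        _, _, vt = np.linalg.svd(X[train] - mean, full_matrices=False)
        basis = vt[:k].T
        z_train = (X[train] - mean) @ basis
        z_test = (X[test] - mean) @ basis
        # Nearest-centroid classification in the reduced space.
        classes = np.unique(y[train])
        centroids = np.stack([z_train[y[train] == c].mean(axis=0) for c in classes])
        dists = ((z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = classes[dists.argmin(axis=1)]
        scores.append((pred == y[test]).mean())
    return float(np.mean(scores))

# Synthetic two-class data (made up for illustration): 200 rows, 30 features.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 30))
X[:, :3] += 2.0 * y[:, None]  # only the first three features carry signal

candidates = [1, 2, 5, 10, 20]  # hypothetical grid of dimension counts
results = {k: cv_accuracy(X, y, k) for k in candidates}
best_k = max(results, key=results.get)
print(results, "best:", best_k)
```

The projection is refit inside every fold, so the estimate for each candidate is not flattered by information from the held-out rows.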

72 Contributor II

There seems to be a lot of literature about the use of SVD with text, but indeed the time might be prohibitive. Is there a way in RM to get the singular values themselves? I have read that one can plot their squares and see where they level off to determine the best number of dimensions.

Thanks!
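For readers who want to try the "squared singular values level off" heuristic outside of RM, a small NumPy sketch follows. The matrix here is synthetic (a hypothetical rank-3 signal plus noise), not real text data, and nothing in it reflects how RapidMiner computes its SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical low-rank-plus-noise matrix: 100 documents x 40 terms, true rank 3.
low_rank = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 40))
A = low_rank + 0.1 * rng.normal(size=(100, 40))

# compute_uv=False returns only the singular values, in descending order.
s = np.linalg.svd(A, compute_uv=False)
squared = s ** 2
explained = np.cumsum(squared) / squared.sum()

# Print the first few values; the squares are what one would plot.
for i in range(6):
    print(f"{i + 1}: sigma^2 = {squared[i]:.2f}, cumulative share = {explained[i]:.3f}")
```

On data like this, the squared values drop sharply after the third one, which is the kind of elbow the heuristic looks for. Real term-document matrices usually show a much less clear-cut decay.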

2,531 Unicorn

I'm not sure about this, but aren't they displayed in the model of the Singular Value Decomposition?

Greetings,

Sebastian

2,531 Unicorn

As I now see, all this information is discarded in RapidMiner and hence currently cannot be shown in the visualization of the preprocessing model. If you take a look at the result of the PCA, it is indeed possible to use these values for display. I guess I could add that relatively easily, but there won't be time for it this week or next. I'm very busy with customer projects, which by the way are about text mining. So I'm curious about the runtime of SVD with many features and the amount of memory it needs. As far as I can see, it needs the complete matrix, so it will crash with my roughly 40,000 word attributes. Do you have any experience with that? What about the classification performance: is it worth implementing a special SVD for sparse matrices?

Greetings,

Sebastian
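A rough back-of-the-envelope calculation shows why holding the complete matrix is a problem at this scale. Only the 40,000 attributes come from the post above; the document count (10,000), the density (0.1% non-zero entries), and the CSR-style storage layout are assumptions picked purely for illustration.

```python
def dense_bytes(n_rows, n_cols, bytes_per_value=8):
    """Memory for a dense matrix of 64-bit floats."""
    return n_rows * n_cols * bytes_per_value

def csr_bytes(n_rows, n_cols, density, bytes_per_value=8, bytes_per_index=4):
    """Approximate memory for compressed sparse row storage:
    one value and one column index per non-zero, plus one row pointer per row."""
    nnz = round(n_rows * n_cols * density)
    return nnz * (bytes_per_value + bytes_per_index) + (n_rows + 1) * bytes_per_index

n_docs, n_terms = 10_000, 40_000   # n_docs is a hypothetical corpus size
density = 0.001                    # hypothetical share of non-zero entries

print(f"dense:  {dense_bytes(n_docs, n_terms) / 1e9:.2f} GB")
print(f"sparse: {csr_bytes(n_docs, n_terms, density) / 1e6:.2f} MB")
```

Under these assumptions the dense matrix needs about 3.2 GB before the decomposition allocates any working storage, while the sparse representation needs under 5 MB.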

98 Contributor II

Regarding your question: "is it worth implementing a special SVD for sparse matrices?"

I think absolutely YES, it is worth it for sure. A couple of days ago I tested the SVD operator on my text dataset with 23,000 features on a relatively high-performance machine. The algorithm only finished after 10 hours!

As far as I know, the LSA algorithm tackles this problem. It just uses an approximation of the term-document matrix (which is much smaller than the original matrix). So I kindly suggest that the RM team try to embed an LSA operator in RM.

cheers.
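As an illustration of the approximation idea behind LSA, here is a sketch of a randomized truncated SVD in NumPy. This is one possible technique for getting only the top k singular triplets, not necessarily what an LSA operator in RM would use. It touches the matrix only through products with `A` and `A.T`, which is what makes the approach attractive for sparse storage; a dense array with mostly zero entries stands in here for simplicity. The function name `randomized_svd`, its parameters, and the synthetic count data are all assumptions.

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, n_power_iter=4, seed=0):
    """Approximate the top-k singular triplets of A using random projections."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(size=(A.shape[1], k + n_oversample))
    # Orthonormal basis Q that approximately spans the dominant column space of A.
    Q, _ = np.linalg.qr(A @ omega)
    for _ in range(n_power_iter):
        # Power iterations sharpen the basis when the spectrum decays slowly.
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Small (k + n_oversample) x n_cols matrix; a full SVD of it is cheap.
    B = (A.T @ Q).T
    u_small, s, vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ u_small)[:, :k], s[:k], vt[:k]

# Synthetic term counts (made up): 300 documents, 200 terms, 3 topics.
rng = np.random.default_rng(1)
rates = np.full((300, 200), 0.02)
for t in range(3):
    # Each topic's 100 documents favour their own block of 20 terms.
    rates[t * 100:(t + 1) * 100, t * 20:(t + 1) * 20] = 2.0
A = rng.poisson(rates).astype(float)

U, s, Vt = randomized_svd(A, k=3)
print("approximate top singular values:", np.round(s, 2))
print("share of zero entries:", round(float((A == 0).mean()), 3))
```

The full decomposition is only ever computed on the small matrix `B`, so the cost is dominated by a handful of matrix products rather than a factorization of the whole term-document matrix.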