# What does Singular Value Decomposition exactly do?

Dear all,

I have a question regarding the dimension reduction technique called Singular Value Decomposition in RapidMiner. I am using it in the context of text mining and I want to know what it actually does. I have searched everywhere for an answer, including this forum, but I couldn't find one.

I did some experiments to find out what SVD does, and in my experience it decomposes the (term-versus-document) matrix into three matrices: USV*. It then replaces the original (term-versus-document) matrix with the matrix U and applies the dimension reduction to this matrix. Is this correct, and if so, why is the original matrix replaced with the matrix U? Is there some explanation or theory behind this?

I hope that you can help me out with this. Maybe there is some file that describes how SVD works in RapidMiner; if there is such a file, maybe you can pass me a link to it.

Thanx in advance.

Greetings,

Textminer

## Answers

Can someone help me out, please? I really need the answer. What about the moderators, doesn't any of you know the answer to my question (what about you, mierswa)? I find this very strange. Is it such a difficult question?

I hope to hear some reactions.

Bye,

Textminer

Thanx for your link, but I know what Singular Value Decomposition is and what it does. I actually want to know what RapidMiner does. Which actions does it perform on the term-by-document matrix? If you select SVDReduction in RapidMiner, it only states "a dimensionality reduction method based on singular value decomposition". What does this mean in practice? I can't find this anywhere.

Greetings,

Textminer

I have sympathy with your problem, because it concerns not the "what" but the "how" of the RM SVD implementation, and that means getting down and dirty with the source and Eclipse, or being very nice to Ingo, he of the very pointy head.

However, I take the more general point, namely that RM documentation could be improved. Being an automaton myself, I too can learn from supervised examples, so it would be nice if you could drill down directly from the "new operator" tab to examples, forum articles, and outside links, as you can with other IDEs.

That being said, you have to go with what you've got, which is in its own way a world leader, actively supported and developed by some of the most qualified, able, and enthusiastic minds you are likely to meet.

Happy coding - I took a squint and survived the experience!

I would like to thank Haddock for the kind words - I must admit that I always look forward to your answers and comments, since they are always a pleasure to read (have you considered working as an author? I would surely read your books / articles / editorials / ...).

About the documentation issue in general:

We really would like to have more resources for improving the documentation, and we have actually already started on this. But this of course takes a lot of time. And then again, this is one of the major advantages of using open-source software: you can check out the concrete details yourself. As a developer, I always stick to the following two rules:

1.) Don't write comments which are likely to become wrong sometime

2.) Don't write code which is less clear than a comment

If you work like this, it has two consequences: fewer comments, but clearly written code which can often be read even by non-developers (of course it is easier for developers, or at least for people with some mathematical background).

About the SVD:

You do not have to be a developer and work with all these developer tools to get insight into the code. A simple web browser is enough. The following link leads to the base of all source code of RapidMiner:

http://yale.cvs.sourceforge.net/yale/

And here you can find the concrete source for the SVD:

http://yale.cvs.sourceforge.net/yale/yale/src/com/rapidminer/operator/features/transformation/SVDReduction.java?view=markup

One of the most important lines here is

```java
import Jama.SingularValueDecomposition;
```

meaning that we do not compute it ourselves but ask a library (Jama) for this. Since this is again open source, you could check there for more details.

Cheers,

Ingo

Thanx for your answer about SVD. I wasn't aware that you could check out the source code of RapidMiner online. Thanx for pointing that out. The important lines are indeed:

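```java
// Submatrix of U: rows 0..es.size()-1, columns 0..dimensions-1
// (i.e. the first `dimensions` left singular vectors).
Matrix u = svd.getU().getMatrix(0, es.size() - 1, 0, dimensions - 1);

return u;
```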

In the Jama package you have a class called SingularValueDecomposition which computes: "For an m-by-n matrix A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'."
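As a sketch of what that definition implies once only some columns are kept (my own notation, not taken from the Jama or RapidMiner documentation; $k$ stands for the `dimensions` parameter, and $U_k$, $S_k$, $V_k$ denote the parts belonging to the first $k$ singular values $\sigma_1, \dots, \sigma_k$):

$$
A = U S V^{\top}, \qquad A_k = U_k S_k V_k^{\top}, \qquad A V_k = U_k S_k .
$$

The last identity follows because the columns of $V$ are orthonormal. So row $i$ of $U_k$ holds the coordinates of row $i$ of $A$ along the first $k$ right singular vectors, with coordinate $j$ divided by $\sigma_j$ (assuming $\sigma_j > 0$).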

This class has a method getU() which returns the left singular vectors (in other words, it returns the columns of matrix U). It is this method that is called in the important lines. So this means that I was right in the first place (see my first post).
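To see what "keep the first columns of U" amounts to without pulling in Jama, here is a minimal sketch in plain Java. It is not the RapidMiner or Jama code: the class name `LeftSingularVectorsSketch` and the method `firstLeftSingularVectors` are made up for illustration, and the singular vectors are approximated with power iteration plus deflation, which is a different algorithm from the one Jama uses. It only assumes that the matrix has one row per example.

```java
import java.util.Arrays;

/** Sketch only: approximates the first k columns of U by power iteration with deflation. */
final class LeftSingularVectorsSketch {

    /** Returns an m-by-k matrix whose columns approximate the first k left singular vectors of a. */
    static double[][] firstLeftSingularVectors(double[][] a, int k, int iterations) {
        int m = a.length, n = a[0].length;
        double[][] work = new double[m][];
        for (int i = 0; i < m; i++) work[i] = a[i].clone();    // copy of A that gets deflated
        double[][] u = new double[m][k];
        for (int c = 0; c < k; c++) {
            double[] v = new double[n];
            Arrays.fill(v, 1.0 / Math.sqrt(n));                // deterministic start vector
            double[] av = multiply(work, v);
            for (int it = 0; it < iterations; it++) {
                v = normalize(multiplyTransposed(work, av));   // v <- normalised A^T (A v)
                av = multiply(work, v);
            }
            double sigma = norm(av);                           // estimate of the singular value
            if (sigma < 1e-12) break;                          // rank exhausted: remaining columns stay zero
            for (int i = 0; i < m; i++) u[i][c] = av[i] / sigma;
            // Deflate: subtract sigma * u * v^T (note av = sigma * u) so the next pass finds the next vector.
            for (int i = 0; i < m; i++)
                for (int j = 0; j < n; j++)
                    work[i][j] -= av[i] * v[j];
        }
        return u;
    }

    /** Computes a * v. */
    static double[] multiply(double[][] a, double[] v) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < v.length; j++)
                out[i] += a[i][j] * v[j];
        return out;
    }

    /** Computes a^T * w. */
    static double[] multiplyTransposed(double[][] a, double[] w) {
        double[] out = new double[a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < out.length; j++)
                out[j] += a[i][j] * w[i];
        return out;
    }

    /** Euclidean length of x. */
    static double norm(double[] x) {
        double sum = 0;
        for (double value : x) sum += value * value;
        return Math.sqrt(sum);
    }

    /** Returns a unit-length copy of x (or an unchanged copy if x is the zero vector). */
    static double[] normalize(double[] x) {
        double len = norm(x);
        double[] out = x.clone();
        if (len > 0) for (int i = 0; i < out.length; i++) out[i] /= len;
        return out;
    }

    public static void main(String[] args) {
        // Toy matrix: 3 examples (rows) by 2 attributes (columns), reduced to 1 dimension.
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        for (double[] row : firstLeftSingularVectors(a, 1, 200)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each returned row is the reduced representation of one example, which is the role the submatrix of U plays in the lines quoted above. The sign of a singular vector is arbitrary, so a library may return the same column multiplied by -1.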

But in this first post I also wondered whether there is some theory or explanation behind this. Because you are one of the authors of the code, I ask you, Ingo, this question: why do you select the columns of the matrix U? Does this have some connection with LSI (latent semantic indexing)? I hope you can help me out with this question. I would appreciate it very much.

Thanx in advance,

Greetings,

Textminer

Cheers,

Ingo