"PCA Operator taking too much time"

waqaskhan343 Member Posts: 11 Contributor I
edited June 2019 in Help

Hello, I am performing sentiment analysis on text data consisting of 1,700 tweets. After preprocessing the data, I want to visualize it using PCA to check the relationship between the different classes. After generating the TF-IDF vectors, I use the PCA operator with a fixed number of components (2), but it takes a very long time, approximately 2 to 3 hours. I even put a Normalize operator before PCA, but that did not help.

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Did you apply any pruning when you generated your word vector?  If not, then you probably have thousands of attributes, many of which have extremely low values, and that is why PCA is taking so long!  You should definitely prune your wordlist first, since tokens that have only a handful of occurrences are not going to be meaningful, but they are causing a lot of computational effort on the part of the PCA operator.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    What @Telcontar120 said. Work on your wordlist first before you put it into PCA. Even just 50 attributes could chew up runtime if you don't have a computer with a lot of memory.
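
The pruning step suggested in the answers above can be illustrated outside RapidMiner with a small Python sketch. This is not the RapidMiner process itself: the function name `prune_vocabulary`, the thresholds `min_docs` and `max_doc_fraction`, and the four sample tweets are all assumptions made up for illustration. The idea is only to show how dropping tokens that occur in very few (or nearly all) documents shrinks the attribute set before it reaches PCA.

```python
from collections import Counter


def prune_vocabulary(tokenized_docs, min_docs=2, max_doc_fraction=0.9):
    """Return the sorted tokens that survive document-frequency pruning.

    Hypothetical helper: keeps a token only if it appears in at least
    `min_docs` documents and in at most `max_doc_fraction` of all documents.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # Count each token once per document (document frequency).
        doc_freq.update(set(tokens))
    return sorted(
        tok
        for tok, df in doc_freq.items()
        if df >= min_docs and df / n_docs <= max_doc_fraction
    )


# Made-up sample data standing in for the preprocessed tweets.
tweets = [
    "great phone love the camera",
    "terrible battery hate the phone",
    "love the battery great value",
    "the screen cracked terrible service",
]
tokenized = [t.split() for t in tweets]

full_vocab = sorted({tok for doc in tokenized for tok in doc})
pruned_vocab = prune_vocabulary(tokenized, min_docs=2, max_doc_fraction=0.9)

print(len(full_vocab), "attributes before pruning")
print(len(pruned_vocab), "attributes after pruning:", pruned_vocab)
```

On a real corpus of about 1,700 tweets the thresholds would need tuning, but the same effect applies: tokens seen in only one or two documents usually make up most of the word list, so removing them can cut the number of attributes PCA has to process considerably.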
