The role of dimensionality reduction with regard to Clustering approaches

Muhammed_Fatih_ Member Posts: 93 Maven
Hello Community, 

I plan to evaluate several clustering techniques on a TF-IDF bag-of-words representation, on which I have already run a feature selection to reduce the number of dimensions of my vector space. I have read that, when clustering algorithms are applied afterwards, feature extraction/transformation approaches give better results for dimensionality reduction than feature selection approaches. First of all, how do you see this claim from a theoretical point of view?

Secondly, as explained, I have already executed feature selection. Would it be correct to additionally execute feature extraction on the remaining dimensions that came out of the feature selection? Or should feature extraction for efficient clustering be applied to the initial raw dataset?

I thank you all for the participation and for the answers! 

Best regards!


Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited February 2020
    "I've read that Feature Extraction/Transformation approaches get better results with regard to dimensionality reduction in comparison to Feature Selection ones if Clustering algorithms will be applied afterward"

    Based on your question, I assume that you are talking about techniques like PCA, ICA, or other transformations related to your data (n-grams etc.). One of the major drawbacks of dimensionality reduction with methods like PCA is the loss of interpretability. If you want to explain or interpret the results, feature selection is the way to go, as it preserves the original features. If your focus is purely on dimensionality reduction, feature extraction is fine; you can use it where interpretation is not highly important.

    I think the two (extraction and selection) seem similar, but they serve different purposes. I am not sure it is always correct to say that feature extraction works better than selection.

    "Secondly, as explained, I have already executed Feature Selection. Would it be correct to additionally execute Feature Extraction based on the remaining dimensions which were derived from Feature Selection?"
    Yes, you can do both. I generally apply feature extraction first and then do a feature selection. There is nothing wrong with that as far as I know.
    Regards,
    Varun
    https://www.varunmandalapu.com/

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    Be careful with feature selection for clustering, though: if you simply optimize for things like the Davies-Bouldin index (DB-Index) without multi-objective optimization, you will end up with trivial solutions where the data space collapses and the clusters no longer have any meaning. I recommend checking out some of the papers I wrote about this ages ago. They are still relevant, though. I am sure you can find them online somewhere:

    Mierswa, Ingo and Wurst, Michael. Information Preserving Multi-Objective Feature Selection for Unsupervised Learning. In Maarten Keijzer and Mike Cattolico and Dirk Arnold and Vladan Babovic and Christian Blum and Peter Bosman and Martin V. Butz and Carlos Coello Coello and Dipankar Dasgupta and Sevan G. Ficici and James Foster and Arturo Hernandez-Aguirre and Greg Hornby and Hod Lipson and Phil McMinn and Jason Moore and Guenther Raidl and Franz Rothlauf and Conor Ryan and Dirk Thierens (editors), GECCO '06: Proceedings of the 8th annual conference on Genetic and evolutionary computation, pages 1545--1552, New York, NY, USA, ACM Press, 2006.

    Or you can just go with the full PhD thesis, which covers a lot of related topics, too:


    There is a PDF of it as well...

    Cheers,
    Ingo
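
The collapse described above can be reproduced on a toy dataset. The following is a minimal sketch in plain Python, not a RapidMiner process: the data, the column layout (two "real" features plus an unrelated 0/1 flag) and the helper names `kmeans` and `davies_bouldin` are all assumptions made for illustration. It scores every feature subset by the Davies-Bouldin index alone and picks the minimum.

```python
import math
from itertools import combinations

def kmeans(points, k=2, iters=50):
    """Plain Lloyd's k-means with a deterministic farthest-first start."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: lower means tighter, better separated clusters."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        scatter.append(sum(math.dist(p, centroids[c]) for p in members) / len(members))
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Hypothetical data: columns 0 and 1 carry two real groups,
# column 2 is a 0/1 flag that has nothing to do with those groups.
ROWS = [
    [1.0, 1.2, 0], [1.5, 0.8, 1], [0.8, 1.9, 0], [2.0, 1.5, 1], [1.2, 0.5, 0], [1.8, 2.1, 1],
    [4.0, 4.2, 0], [4.6, 3.8, 1], [3.7, 4.9, 0], [5.0, 4.4, 1], [4.3, 3.5, 0], [4.8, 5.1, 1],
]

# Score every non-empty feature subset by the DB index only.
scores = {}
for size in range(1, 4):
    for subset in combinations(range(3), size):
        projected = [[row[i] for i in subset] for row in ROWS]
        labels, centroids = kmeans(projected)
        scores[subset] = davies_bouldin(projected, labels, centroids)

best = min(scores, key=scores.get)
print(best, scores[best])  # prints: (2,) 0.0
```

The single-objective search selects only the flag column: every point sits exactly on its centroid, so the index is a perfect 0.0, yet the two groups visible in columns 0 and 1 are thrown away. That is the trivial solution the warning refers to.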
  • Muhammed_Fatih_ Member Posts: 93 Maven
    Hi @IngoRM

    thank you for the literature recommendation!

    However, you wrote that one should be careful when combining feature selection and clustering. Do you know of other alternatives for efficient dimensionality reduction and subsequent clustering if you want to interpret the clustering results afterwards, as @varunm1 mentioned? I don't see any other way besides topic modeling approaches like LDA.

    Thank you in advance for your answer!
  • Muhammed_Fatih_ Member Posts: 93 Maven
    Hi @mschmitz

    Interesting approach. So you cluster on the PCA values and afterwards try to make sense of the detected clusters by using a Decision Tree, right?

    Best regards! 
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,280 RM Data Scientist
    Hi,
    Pretty much, yes. The trick is that you can do the interpretation in the original feature space, not the PCA-ed one.

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
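
The workflow confirmed above (cluster on the reduced representation, then explain the clusters with a tree trained on the original features) can be sketched as follows. This is an illustration under several assumptions, not the RapidMiner process discussed in the thread: the term names and weights are made up, power iteration on the covariance matrix stands in for a PCA operator (first component only), and a one-level decision stump stands in for a full Decision Tree.

```python
import math

def first_component_scores(rows, iters=100):
    """Project centered rows onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] + [0.0] * (d - 1)  # start vector; assumed not orthogonal to the component
    for _ in range(iters):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return [sum(c * v for c, v in zip(row, vec)) for row in centered]

def two_means_1d(values, iters=50):
    """Two-cluster k-means on a single column of scores."""
    lo, hi = min(values), max(values)
    labels = []
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, l in zip(values, labels) if l == 0]
        group1 = [v for v, l in zip(values, labels) if l == 1]
        if not group0 or not group1:
            break
        lo, hi = sum(group0) / len(group0), sum(group1) / len(group1)
    return labels

def best_stump(rows, labels, names):
    """Find the single original feature and threshold that best separates the clusters."""
    best = None
    for f, name in enumerate(names):
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [l for row, l in zip(rows, labels) if row[f] <= threshold]
            right = [l for row, l in zip(rows, labels) if row[f] > threshold]
            correct = max(left.count(0), left.count(1)) + max(right.count(0), right.count(1))
            accuracy = correct / len(rows)
            if best is None or accuracy > best[0]:
                best = (accuracy, name, threshold)
    return best

# Hypothetical TF-IDF-like weights for three terms in eight documents.
NAMES = ["goal", "election", "today"]
DOCS = [
    [0.9, 0.1, 0.3], [0.8, 0.0, 0.4], [0.7, 0.2, 0.2], [0.9, 0.1, 0.5],
    [0.1, 0.8, 0.3], [0.0, 0.9, 0.4], [0.2, 0.7, 0.2], [0.1, 0.9, 0.5],
]

scores = first_component_scores(DOCS)   # step 1: reduce
labels = two_means_1d(scores)           # step 2: cluster in the reduced space
accuracy, feature, threshold = best_stump(DOCS, labels, NAMES)  # step 3: explain
print(f"clusters split on '{feature}' at about {threshold:.2f} (accuracy {accuracy:.2f})")
```

The clustering never sees the term names, but the rule that comes out of the last step is phrased in terms of an original term weight, which is what makes the result readable. With a real library, each step would normally be a ready-made operator rather than hand-written code.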
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    Martin's approach works. 
    "But do you have other alternatives with regard to efficient dimensionality reduction and subsequent Clustering if you want to interpret the Clustering results"
    The other alternative is to use multi-objective optimization for feature selection in the original space.  HOWEVER, you need to maximize the number of features, not minimize it.  More details can be found in the paper I have mentioned above.
    Cheers,
    Ingo
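
To make the two-objective idea concrete, here is a small sketch of the selection step only: keeping the feature subsets that are not dominated when the number of features is maximized and the DB index is minimized. It is not the evolutionary search from the cited paper; the `Candidate` type, the feature names and the DB values are illustrative placeholders, not measurements.

```python
from typing import NamedTuple, Tuple

class Candidate(NamedTuple):
    features: Tuple[str, ...]
    db_index: float  # lower is better

def dominates(a, b):
    """a dominates b if it keeps at least as many features with a DB index
    that is no worse, and is strictly better on at least one of the two."""
    no_worse = len(a.features) >= len(b.features) and a.db_index <= b.db_index
    better = len(a.features) > len(b.features) or a.db_index < b.db_index
    return no_worse and better

def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Placeholder scores for five hypothetical feature subsets.
candidates = [
    Candidate(("f2",), 0.00),             # collapsed single-feature solution
    Candidate(("f0", "f1"), 0.35),
    Candidate(("f0",), 0.40),             # fewer features and worse than (f0, f1)
    Candidate(("f0", "f1", "f2"), 0.50),
    Candidate(("f1", "f2"), 0.60),        # same size as (f0, f1) but worse
]

front = pareto_front(candidates)
for c in front:
    print(len(c.features), c.features, c.db_index)
```

The collapsed one-feature solution still appears on the front, but it no longer wins by default: subsets that keep more of the original space survive next to it, and the choice among them becomes an explicit trade-off.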