The role of dimensionality reduction with regard to Clustering approaches

Muhammed_Fatih_Muhammed_Fatih_ Member Posts: 93 Maven
Hello Community, 

I plan to evaluate several Clustering techniques on a TF-IDF bag-of-words representation on which I've already executed a Feature Selection to reduce the number of dimensions of my vector space. In this context, I've read that Feature Extraction/Transformation approaches get better results with regard to dimensionality reduction than Feature Selection ones if Clustering algorithms are applied afterwards. First of all, how do you see this claim from a theoretical point of view?

Secondly, as explained, I've already executed Feature Selection. Would it be correct to additionally execute Feature Extraction on the remaining dimensions which were derived from Feature Selection? Or should the Feature Extraction for efficient Clustering be applied to the initial raw dataset?

I thank you all for the participation and for the answers! 

Best regards!


Answers

  • Options
    varunm1varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited February 2020
     I've read that Feature Extraction/Transformation approaches get better results with regard to dimensionality reduction in comparison to Feature Selection ones if Clustering algorithms will be applied afterward

    Based on your question, I assume that you are talking about techniques like PCA or ICA, or other transformations related to your data (n-grams etc.). One of the major drawbacks of dimensionality reduction with techniques like PCA is the loss of interpretability. If you want to explain or interpret the results, feature selection is the way to go, as it preserves the original features. If your focus is purely on dimensionality reduction, feature extraction can be used. You can use it where interpretation is not highly important.

    I think the two (extraction and selection) seem similar, but they serve different purposes. I am not sure it is always correct to say that feature extraction works better than selection.

    Secondly, as explained I've still executed Feature Selection. Would it be correct to additionally execute Feature Extraction based on the remaining dimensions which were derived from Feature Selection?
    Yes, you can do both. I generally apply feature extraction first and then do a feature selection. There is nothing wrong with that as far as I know.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Options
    IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    Be careful with feature selection for clustering though: if you simply optimize for things like the DB-Index (Davies-Bouldin index) without multi-objective optimization, you will end up with trivial solutions where the data space collapses and the clusters no longer have any meaning. I recommend checking out some of the papers I wrote about this ages ago. They are still relevant though. I am sure you can find them online somewhere:

    Mierswa, Ingo and Wurst, Michael. Information Preserving Multi-Objective Feature Selection for Unsupervised Learning. In Maarten Keijzer and Mike Cattolico and Dirk Arnold and Vladan Babovic and Christian Blum and Peter Bosman and Martin V. Butz and Carlos Coello Coello and Dipankar Dasgupta and Sevan G. Ficici and James Foster and Arturo Hernandez-Aguirre and Greg Hornby and Hod Lipson and Phil McMinn and Jason Moore and Guenther Raidl and Franz Rothlauf and Conor Ryan and Dirk Thierens (editors), GECCO '06: Proceedings of the 8th annual conference on Genetic and evolutionary computation, pages 1545--1552, New York, NY, USA, ACM Press, 2006.

    Or you can just go with the full PhD thesis, which covers a lot of related topics, too:


    There is a PDF of it as well...

    Cheers,
    Ingo
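
The collapse Ingo warns about can be reproduced on a toy example. The sketch below is only an illustration under stated assumptions, not the method from the paper: the six data rows are made up, the k-means routine is a minimal hand-written one with a deterministic farthest-first start, and the Davies-Bouldin index is computed with mean Euclidean distance to the centroid as the scatter term. It clusters every feature subset and picks the subset with the lowest index.

```python
from itertools import combinations
from math import dist

def kmeans(points, k=2, max_iter=50):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index (lower is 'better'); assumes no cluster is empty."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(dist(p, centroids[j]) for p in members) / len(members))
    return sum(
        max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k

# Made-up data: feature 0 takes only two values, features 1 and 2 vary within groups.
rows = [
    (0.0, 1.0, 3.0), (0.0, 2.0, 1.0), (0.0, 3.0, 2.0),
    (10.0, 1.0, 2.0), (10.0, 2.0, 3.0), (10.0, 3.0, 1.0),
]

scores = {}
for size in range(1, 4):
    for subset in combinations(range(3), size):
        projected = [tuple(r[i] for i in subset) for r in rows]
        labels, centroids = kmeans(projected)
        scores[subset] = davies_bouldin(projected, labels, centroids)

best = min(scores, key=scores.get)
for subset, value in sorted(scores.items(), key=lambda item: item[1]):
    print(subset, round(value, 3))
```

On this data the single-feature subset `(0,)` wins with an index of exactly 0, because both clusters shrink to a single point each. Optimizing the index alone therefore rewards throwing away every feature that carries within-cluster variation, which is the trivial solution described above.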
  • Options
    Muhammed_Fatih_Muhammed_Fatih_ Member Posts: 93 Maven
    Hi @IngoRM

    thank you for the literature recommendation!

    However, you wrote that one should be careful when combining Feature Selection and Clustering. Do you have other alternatives for efficient dimensionality reduction and subsequent Clustering if you want to interpret the Clustering results afterwards, as @varunm1 mentioned? I don't see any other way besides Topic Modeling approaches like LDA.

    Thank you in advance for your answer!
  • Options
    Muhammed_Fatih_Muhammed_Fatih_ Member Posts: 93 Maven
    Hi @mschmitz

    Interesting approach. So you start by clustering on the PCA values and then try to make sense of the detected clusters afterwards by using a Decision Tree, right?

    Best regards! 
  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Hi,
    Pretty much, yes. The trick is that you can do the interpretation in the original feature space, not the PCA-ed one.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
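
The workflow discussed here (cluster on the PCA values, then explain the clusters on the original features) can be sketched in a few lines. Everything below is an assumption made for illustration: the six rows are made up, the first principal component is found by power iteration instead of a library PCA, the clustering is a 1-D 2-means, and a one-level decision tree (a stump) stands in for a full Decision Tree. In practice you would use the corresponding RapidMiner operators or a library such as scikit-learn.

```python
from math import sqrt

# Made-up data with three original features.
rows = [
    (0.0, 1.0, 5.0), (1.0, 2.0, 5.5), (0.5, 1.5, 4.5),
    (10.0, 1.0, 5.0), (11.0, 2.0, 5.5), (10.5, 1.5, 4.5),
]
n_features = len(rows[0])

# 1) Centre the data.
means = [sum(col) / len(rows) for col in zip(*rows)]
centred = [[x - m for x, m in zip(r, means)] for r in rows]

# 2) First principal component via power iteration on the scatter matrix X^T X.
scatter = [[sum(r[i] * r[j] for r in centred) for j in range(n_features)]
           for i in range(n_features)]
vec = [1.0] * n_features
for _ in range(100):
    nxt = [sum(scatter[i][j] * vec[j] for j in range(n_features))
           for i in range(n_features)]
    norm = sqrt(sum(v * v for v in nxt))
    vec = [v / norm for v in nxt]
pc_scores = [sum(x * v for x, v in zip(r, vec)) for r in centred]

# 3) Cluster in the reduced space: 2-means on the 1-D scores, started at min and max.
lo, hi = min(pc_scores), max(pc_scores)
for _ in range(20):
    labels = [0 if abs(s - lo) <= abs(s - hi) else 1 for s in pc_scores]
    lo = sum(s for s, l in zip(pc_scores, labels) if l == 0) / labels.count(0)
    hi = sum(s for s, l in zip(pc_scores, labels) if l == 1) / labels.count(1)

# 4) Interpret in the ORIGINAL space: find the single feature/threshold split
#    that best reproduces the cluster labels.
best = None
for f in range(n_features):
    values = sorted({r[f] for r in rows})
    for a, b in zip(values, values[1:]):
        thr = (a + b) / 2
        pred = [0 if r[f] <= thr else 1 for r in rows]
        acc = sum(p == l for p, l in zip(pred, labels)) / len(rows)
        acc = max(acc, 1 - acc)  # either side of the split may be cluster 0
        if best is None or acc > best[0]:
            best = (acc, f, thr)
accuracy, feature, threshold = best
print(f"clusters are explained by feature {feature} <= {threshold} (accuracy {accuracy})")
```

The clusters were found on the projected scores, but the rule that describes them is stated on an original feature, which keeps the result readable even though the clustering itself ran in the transformed space.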
  • Options
    IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    Martin's approach works. 
    But do you have other alternatives with regard to efficient dimensionality reduction and subsequent Clustering if you want to interpret the Clustering results
    The other alternative is to use multi-objective optimization for feature selection in the original space. HOWEVER, you need to maximize the number of features, not minimize it. More details can be found in the paper I mentioned above.
    Cheers,
    Ingo
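
Why the direction of the second objective matters can be shown with a small Pareto filter. This is a hedged sketch, not the algorithm from the paper: the candidate subsets and their index values are illustrative numbers, not measurements, and `pareto_front` is a hypothetical helper that keeps every candidate not dominated on the pair (number of features, DB index).

```python
def pareto_front(candidates, maximize_features=True):
    """Return names of candidates that no other candidate dominates.

    Each candidate is (name, n_features, db_index). A lower DB index is
    always better; whether more or fewer features is better is switchable.
    """
    front = []
    for name, n, db in candidates:
        def dominates(n2, db2):
            size_ok = n2 >= n if maximize_features else n2 <= n
            size_better = n2 > n if maximize_features else n2 < n
            return size_ok and db2 <= db and (size_better or db2 < db)
        if not any(dominates(n2, db2) for _, n2, db2 in candidates):
            front.append(name)
    return front

# Illustrative values only (hypothetical feature subsets and DB indices).
candidates = [
    ("f0", 1, 0.00),
    ("f0+f1", 2, 0.13),
    ("f1+f2", 2, 0.69),
    ("f0+f1+f2", 3, 0.23),
]

front_max = pareto_front(candidates, maximize_features=True)
front_min = pareto_front(candidates, maximize_features=False)
print("maximizing features:", front_max)
print("minimizing features:", front_min)
```

When the number of features is minimized, the two objectives point the same way and the front collapses to the single trivial subset. When it is maximized, the objectives conflict, so the front offers a range of trade-offs between cluster quality and preserved information to choose from.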