"Brand new user - text mining basics"

GizaJack
GizaJack New Altair Community Member
edited November 2024 in Community Q&A
Hello. If anyone is willing to help point a newbie in the right direction, I'd appreciate it very much. I am working on a personal project to get a feel for the software and concepts, which will lead into a school project. I have an Excel file with the lyrics and some other basic information for several hundred songs. I want to look for interesting relationships between word usage and things like the gender of the artist, the year the song was written, and/or whether the song was a hit.
For my first go I thought I'd try focusing on just the decade (70s, 80s, 90s) and the lyrics. Maybe certain words didn't appear until a certain timeframe or there are some interesting cultural references. I can import the data and get the word frequency lists and understand on a basic level how to use the association operators. However, I'm not sure what I need to do so that RapidMiner groups the text by years/decades. Will I be able to see easily that in different years/decades certain words appear together or at all? What operators should I use and what should my data be like? Is an Excel file with a separate row for each song sufficient?
Do you think this is even a good starter project, or will nothing interesting/useful come out of it?

Thanks in advance for any advice or pointers in the right direction.

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Hi GizaJack,

    First of all: any task in Data Mining needs some work and a rough knowledge of some Data Mining concepts, and there is almost no problem which you can solve "easily". To get an overview of how to work with RapidMiner, you should have a look at our video tutorials, which you can find on our webpage: http://rapid-i.com/content/view/189/212/lang,en/
    There you can also find an introduction to text mining with RapidMiner.

    Generally, an Excel file with one row per song should be fine. You presumably have a column for each feature of a song, e.g. decade, maybe genre, and one big column for the complete lyrics of each song. If that is the case, you should be fine. It may be a good idea to import the data only once and store it in the RapidMiner repository for easier access.

    To filter examples with a certain value in one attribute, you can use the "Filter Examples" operator.
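
To make the data layout and the filtering idea concrete, here is a rough sketch in plain Python rather than RapidMiner. It is not how RapidMiner implements anything; it only mirrors the one-row-per-song layout and the kind of filter "Filter Examples" applies. The sample songs, the column names (`title`, `decade`, `lyrics`) and the helper functions are all made-up placeholders.

```python
import re
from collections import Counter, defaultdict

# Made-up placeholder rows: one dict per song, like one Excel row per song.
songs = [
    {"title": "Song A", "decade": "70s", "lyrics": "Dancing all night, dancing till dawn"},
    {"title": "Song B", "decade": "80s", "lyrics": "Call me on the video phone tonight"},
    {"title": "Song C", "decade": "80s", "lyrics": "Video dreams and neon night"},
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts_by_decade(rows):
    """Build one word-frequency Counter per decade."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["decade"]].update(tokenize(row["lyrics"]))
    return counts


def words_unique_to(decade, counts):
    """Return the words that occur in `decade` and in no other decade."""
    seen_elsewhere = set()
    for other_decade, counter in counts.items():
        if other_decade != decade:
            seen_elsewhere.update(counter)
    return sorted(set(counts[decade]) - seen_elsewhere)


# Equivalent in spirit to filtering examples on one attribute value.
eighties = [song for song in songs if song["decade"] == "80s"]

counts = word_counts_by_decade(songs)
print(len(eighties), "songs from the 80s")
print(counts["80s"].most_common(3))
print(words_unique_to("80s", counts))
```

With real lyrics you would also want to drop very common words ("the", "and", ...) before comparing decades, otherwise they dominate the frequency lists.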

    For anything else please have a look at the tutorials. If you have any further questions, just ask!

    Cheers, Marius
