"Brand new user - text mining basics"

GizaJack Member Posts: 1 Learner III
edited June 2019 in Help
Hello. If anyone is willing to help point a newb in the right direction, I'd appreciate it very much. I am working on a personal project to get a feel for the software and concepts, which will lead into a school project. I have an Excel file with the lyrics and some other basic information for several hundred songs. I wanted to look for interesting relationships between word usage and, for example, the gender of the artist, the year the song was written, and/or whether the song was a hit.
For my first go I thought I'd try focusing on just the decade (70s, 80s, 90s) and the lyrics. Maybe certain words didn't appear until a certain timeframe, or there are some interesting cultural references. I can import the data and get the word frequency lists, and I understand on a basic level how to use the association operators. However, I'm not sure what I need to do so that RapidMiner groups the text by years/decades. Will I be able to see easily whether certain words appear together, or at all, in different years/decades? What operators should I use and what should my data look like? Is an Excel file with a separate row for each song sufficient?
Do you think this is even a good starter project, or will nothing interesting/useful come out of it?

Thanks in advance for any advice or pointers in the right direction.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi GizaJack,

    First of all: any task in Data Mining needs some work and a rough knowledge of some Data Mining concepts, and there is almost no problem which you can solve "easily". To get an overview of how to work with RapidMiner, you should have a look at our video tutorials, which you can find on our webpage: http://rapid-i.com/content/view/189/212/lang,en/
    There you can also find an introduction to text mining with RapidMiner.

    Generally, an Excel file with one row per song should be fine. You presumably have a column for each feature of a song, e.g. decade, maybe genre, and one big column for the complete lyrics of each song. If that is the case, you should be fine. It may be a good idea to import the data only once and store it in the RapidMiner repository for easier access.

    To filter examples with a certain value in one attribute, you can use the "Filter Examples" operator.

    For anything else please have a look at the tutorials. If you have any further questions, just ask!

    Cheers, Marius
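
To make the "one row per song, grouped by decade" idea concrete outside of RapidMiner, here is a minimal Python sketch using only the standard library. It is not how RapidMiner does it internally and not part of the advice above. The toy rows, the `year` and `lyrics` field names, and all function names are assumptions made up for illustration. It counts words per decade and lists the words that occur in one decade only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy rows standing in for the Excel sheet: one row per song.
songs = [
    {"year": 1974, "lyrics": "dancing under the disco lights"},
    {"year": 1979, "lyrics": "disco nights and city lights"},
    {"year": 1985, "lyrics": "video dreams on the radio"},
    {"year": 1992, "lyrics": "dreams on the radio tonight"},
]

def decade_of(year):
    """Map a year to its decade label, e.g. 1974 -> '1970s'."""
    return f"{year // 10 * 10}s"

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts_by_decade(rows):
    """Return {decade: Counter of words} over all songs in that decade."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[decade_of(row["year"])].update(tokenize(row["lyrics"]))
    return dict(counts)

def words_unique_to(decade, counts):
    """Sorted list of words that occur in `decade` but in no other decade."""
    others = set()
    for other_decade, counter in counts.items():
        if other_decade != decade:
            others.update(counter)
    return sorted(set(counts[decade]) - others)

counts = word_counts_by_decade(songs)
print(counts["1970s"]["disco"])            # 2
print(words_unique_to("1980s", counts))    # ['video']
```

With real data, the same grouping would run over the rows read from the Excel file, and a stop-word filter would probably be needed before the "unique words" list becomes interesting.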