Facebook Political ads
deutschland2k
Member Posts: 3 Learner II
Hello,
I am a student at a Portuguese university. For a data mining project I chose the theme "Political ads on Facebook" and the problem "what makes a Facebook ad political in nature". I obtained a dataset of 160,000 lines of Facebook ads; some are political, some are not. My professor told me this is a classification problem, so I began cleaning the data. I believe the attribute "Message" is probably the gold here, because I think the solution lies there rather than in correlations among the other attributes (I don't know if I'm correct). What would my next step be?
Please forgive me if my question is too vague or if I have not provided enough information; I have read through the manual and searched the forums but cannot find an answer.
I am pretty much lost in this.
Thanks for your time.
Rúben
Best Answer
BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi Rúben,
For a classification problem you need "labeled" data. In your case this means you'll need to manually label (assign a class to) a few hundred messages. You can do it in a spreadsheet, just by creating a new column "class" (or whatever you'd like to call it) and filling it with "political" and "not political" (or whatever applies to your problem).
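One way to prepare such a spreadsheet is to draw a random sample of messages and write them out with an empty "class" column to fill in by hand. The following is only a rough sketch in plain Python, outside RapidMiner; the function names, the sample size of 300, and the fixed seed are assumptions made for illustration, and the column name "Message" is taken from the question.

```python
import csv
import random


def sample_for_labeling(rows, n=300, seed=42, text_field="Message"):
    """Pick up to n random rows and attach an empty 'class' column.

    rows is a list of dicts, e.g. as produced by csv.DictReader.
    The seed is fixed only so that the same sample can be drawn again.
    """
    rng = random.Random(seed)
    picked = rng.sample(rows, min(n, len(rows)))
    return [{text_field: r[text_field], "class": ""} for r in picked]


def write_labeling_sheet(rows, path, text_field="Message"):
    """Write the sampled rows to a CSV file that opens in a spreadsheet."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[text_field, "class"])
        writer.writeheader()
        writer.writerows(rows)
```

Sampling at random, rather than taking the first few hundred rows, makes it more likely that both classes show up in the labeled subset.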
Then you can try AutoModel on the data with text mining ("Extract text information") switched on.
There are also multiple videos on the Academy on classification and text mining.
Regards,
Balázs
Answers
Thank you for your advice and your response.
So basically I need to manually classify a few hundred messages so I can train my model? Is that the idea behind this? And if so, should I save two files, one with classified examples (political and not political) and one without classified examples?
I checked multiple videos and understood the logic but I am still stuck somehow.
Rúben
In data mining you typically use one data set and mark the label attribute. In RapidMiner you do this by setting that attribute's role to "label".
The data set should have instances of all classes you're trying to predict.
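A quick check of this before training could look like the sketch below. It is plain Python rather than a RapidMiner process; the helper names are hypothetical, and the column name "class" and the two class values are the ones suggested earlier in this thread.

```python
from collections import Counter


def class_counts(rows, label_field="class"):
    """Count how many labeled rows exist per class, ignoring empty labels."""
    labels = ((r.get(label_field) or "").strip() for r in rows)
    return Counter(label for label in labels if label)


def missing_classes(rows, expected=("political", "not political"),
                    label_field="class"):
    """Return the expected classes that have no labeled example yet."""
    counts = class_counts(rows, label_field)
    return [c for c in expected if counts[c] == 0]
```

If `missing_classes` returns anything, more examples of those classes need to be labeled before a model can learn to predict them.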
If your data aren't labeled yet, you need to label them somehow. If you can find a corpus of labeled political and non-political messages, you could try building a model from that and applying it to your messages; it might work.
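To make the idea of "build a model from labeled text, then apply it to unlabeled messages" concrete, here is a minimal sketch of a bag-of-words Naive Bayes classifier in plain Python. This is not what RapidMiner's AutoModel does internally; it is a simple technique swapped in for illustration, and the tokenizer, function names, and add-one smoothing are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled):
    """Build word counts per class from (text, label) pairs."""
    doc_counts = Counter()              # number of messages per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for text, label in labeled:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Class prior plus add-one smoothed word likelihoods, in log space.
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A model trained this way on an external labeled corpus can be applied to each "Message" value with `predict(model, message)`. Whether it transfers well depends on how similar that corpus is to the Facebook ads, so it is worth checking the predictions against a small hand-labeled sample.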
Best regards
Balázs