How do I sort out various unstandardised data from a single cell?

ETMZ Member Posts: 7 Contributor I
edited September 20 in Help

Hello everyone, 

For context, I'm trying to find out which marketing medium is the most effective for Starbucks to use.

Attached is the dataset used and the column pertaining to my part is question 19. 


As you can see, the data were retrieved from survey forms, so the format of the answers varies: some people chose only 'Social Media' as how they hear about promotions, whereas others chose multiple methods (e.g. Starbucks Website/Apps;Social Media;Through friends and word of mouth). So my question is: is there any way I can separate out the answers with multiple methods so that I can accurately count which medium is the most favoured?


Thank you!


Best Answer

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted
    You need the Split operator, not Split Data. Use the operator search and it will come up. 
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You can easily separate the individual items using the Split operator. It looks like the semicolon is your split pattern. Then you can count the distinct values using Aggregate. The sample tutorial processes in the software should show you how to set it up.
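For readers who want to see the split-then-count idea outside RapidMiner, here is a minimal sketch in plain Python. The sample answers are made up for illustration (modelled on the examples in the question), and `count_media` is a hypothetical helper name, not a RapidMiner function.

```python
from collections import Counter

# Hypothetical survey answers, modelled on the examples in the question.
responses = [
    "Social Media",
    "Starbucks Website/Apps;Social Media;Through friends and word of mouth",
    "Social Media;Through friends and word of mouth",
]


def count_media(cells, separator=";"):
    """Split each cell on the separator and count every medium once per respondent."""
    counts = Counter()
    for cell in cells:
        # A set guards against a respondent listing the same medium twice.
        media = {part.strip() for part in cell.split(separator) if part.strip()}
        counts.update(media)
    return counts


counts = count_media(responses)
for medium, n in counts.most_common():
    print(f"{medium}: {n}")
```

Counting each medium once per respondent keeps the totals comparable to the number of survey forms; whether that is the right definition of "most favoured" is a judgment call for the analysis.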
  • ETMZ Member Posts: 7 Contributor I
    edited September 20
    @Telcontar120 thank you so much for responding! However, I do not understand the ratio part. How do I split it using semicolons?
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    The Split operator has a parameter called "split pattern" where you specify the character(s) to split on. Were you able to open the tutorial process for the Split operator under the Help menu, as I suggested? That should make it clear.
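To illustrate what a split pattern does, here is a rough Python analogue using `re.split`. It assumes the pattern is interpreted as a regular expression, and the sample cell (with stray spaces around one semicolon) is invented for the example rather than taken from the attached dataset.

```python
import re

# Invented sample cell with inconsistent spacing around the separators.
cell = "Starbucks Website/Apps ; Social Media;Through friends and word of mouth"

# Pattern: a semicolon with optional whitespace on either side.
split_pattern = r"\s*;\s*"

# Each piece between matches of the pattern becomes its own value.
parts = re.split(split_pattern, cell.strip())
print(parts)
```

A bare `;` would also work as the pattern when the answers contain no extra spaces; the whitespace handling only matters if the export is inconsistent.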
  • ETMZ Member Posts: 7 Contributor I
    @Telcontar120 are you referring to this?



  • ETMZ Member Posts: 7 Contributor I
    @Telcontar120 please excuse my stupidity, haha.
    Anyway, I have managed to split the data; however, the marketing method now looks as below. What do you suggest to tidy things up so that I can count the values using Aggregate?

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I think this very much depends on exactly how you want to represent the data, given that any individual example can have several marketing methods. You can of course simply aggregate and count examples for each value of your first three marketing methods. But you may want to do something more complicated, in which case you will probably want to look at the text processing operators, which let you take the original data (before you split it), tokenize it, and then count the total occurrences of specific tokens of interest. That is probably more complex than anyone can offer guidance on in a forum like this.
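One possible way to tidy the split columns before counting is to stack them into a single column (wide to long) and count that, so a method is tallied no matter which column it landed in. The sketch below shows the idea in plain Python; the column names `method_1` to `method_3` and the rows are hypothetical stand-ins for the split output, with `None` marking cells that were left empty.

```python
from collections import Counter

# Hypothetical result of the split: one row per respondent, one column per position.
rows = [
    {"id": 1, "method_1": "Social Media", "method_2": None, "method_3": None},
    {"id": 2, "method_1": "Starbucks Website/Apps", "method_2": "Social Media",
     "method_3": "Through friends and word of mouth"},
    {"id": 3, "method_1": "Through friends and word of mouth",
     "method_2": "Social Media", "method_3": None},
]
method_columns = ["method_1", "method_2", "method_3"]

# Stack the split columns into (respondent id, method) pairs, skipping empty cells.
long_rows = [
    (row["id"], row[col])
    for row in rows
    for col in method_columns
    if row[col] is not None
]

# Count mentions per method regardless of which column the value landed in.
mentions = Counter(method for _, method in long_rows)

# Share of respondents who mentioned each method.
share = {method: n / len(rows) for method, n in mentions.items()}

for method, n in mentions.most_common():
    print(f"{method}: {n} mentions ({share[method]:.0%} of respondents)")
```

Aggregating each split column separately would count "Social Media" in the first column and "Social Media" in the second column as different groups, which is why the stacking step comes first in this sketch.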
  • hainaha4 Member Posts: 1 Newbie
    Hello,
    I am new here from India, here to share some thoughts with you all.
