Append Example Sets of Varying Structure/Schema

AustinT RapidMiner Certified Analyst, Member Posts: 12 Contributor II
edited November 2018 in Help

Forum,

 

I am reading Tweets from S3 using the "Loop Amazon S3" operator. I have it working well (thanks, Marcel!), but the output of that process is several example sets. I could use the Append operator to combine them all, but due to the nature of Twitter's API, the Tweets can have varying structure/schema.

The question is: can I use an RM operator (like "Collect") to understand the distinct structures of my source data (which is JSON coerced into a kind of CSV format, by the way) and then treat them differently? Ideally I would understand their structures, manipulate them, and then combine the example sets based on a subset of common attributes in the data.
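
To make the goal concrete, here is a minimal sketch of the logic in plain Python rather than RapidMiner operators. It assumes each example set is a list of dict rows; the attribute names (id, text, lang, retweet_count) and the helper names are hypothetical and only for illustration. It groups the example sets by their distinct schemas, then appends them using only the attributes they all share.

```python
from collections import defaultdict


def schema_of(rows):
    """Return the set of attribute names seen in one example set."""
    return frozenset().union(*(row.keys() for row in rows))


def group_by_schema(example_sets):
    """Group example sets by their distinct attribute sets."""
    groups = defaultdict(list)
    for rows in example_sets:
        groups[schema_of(rows)].append(rows)
    return dict(groups)


def append_on_common(example_sets):
    """Append all example sets, keeping only the attributes common to all."""
    schemas = [schema_of(rows) for rows in example_sets]
    common = sorted(frozenset.intersection(*schemas)) if schemas else []
    combined = [
        {name: row[name] for name in common}
        for rows in example_sets
        for row in rows
    ]
    return common, combined


# Hypothetical example sets standing in for the per-file outputs of the loop.
tweets_a = [{"id": 1, "text": "hello", "lang": "en"}]
tweets_b = [{"id": 2, "text": "hi", "retweet_count": 3}]
tweets_c = [{"id": 3, "text": "hey", "lang": "de"}]

example_sets = [tweets_a, tweets_b, tweets_c]
groups = group_by_schema(example_sets)
common, combined = append_on_common(example_sets)
```

Here tweets_a and tweets_c fall into the same schema group, tweets_b forms its own, and the combined set keeps only id and text.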

 

Thanks for the help!


Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    The Cartesian join operator will at least let you look at everything together from sources with different structures. Be careful with it, though: it multiplies your example count (the result has one row for every pairing of rows from the two inputs), so you may want to filter or sample first to get a reasonably sized dataset before you try it.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
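
The caution in the answer above about a Cartesian join multiplying the example count can be illustrated with a small sketch in plain Python (not RapidMiner itself). The attribute names a_id and b_id, the set sizes, and the cartesian_join helper are assumptions made up for the illustration.

```python
from itertools import product


def cartesian_join(left, right):
    """Pair every row of `left` with every row of `right`."""
    return [{**l, **r} for l, r in product(left, right)]


# Two hypothetical example sets with different attributes.
left = [{"a_id": i} for i in range(1000)]
right = [{"b_id": j} for j in range(500)]

# A full join would contain len(left) * len(right) rows.
full_size = len(left) * len(right)

# Sampling first (here simply taking the leading rows) keeps the result small.
joined = cartesian_join(left[:10], right[:5])
sampled_size = len(joined)
```

With these sizes the full join would hold 500,000 rows, while the sampled join holds 50, which is why filtering or sampling before the join matters.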