Append Example Sets of Varying Structure/Schema
Forum,
I am reading Tweets from S3 using the "Loop Amazon S3" operator. I have it working well (thanks, Marcel!), but the output of that process is several example sets. I could use the Append operator to combine them all, but due to the nature of Twitter's API, the Tweets can have varying structures/schemas.
My question is: can I use an RM operator (like "Collect") to identify the distinct structures in my source data (which is JSON coerced into a kind of CSV format, by the way) and then treat them differently? Ideally I would identify their structures, manipulate them, and then combine the example sets based on a subset of common attributes in the data.
Thanks for the help!
Answers
The Cartesian join operator will at least allow you to look at everything together from sources with different file structures. But you need to be careful using this operator because it multiplies your example count: the output contains one example for every pairing of input examples. You might want to filter or sample first to get a reasonably sized dataset to work with before you try it.
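To make the idea in the question concrete, here is a rough sketch in plain Python rather than a RapidMiner process. It groups example sets by their attribute names, appends them on the attributes they all share, and shows how a Cartesian product multiplies the row count. The sample rows, attribute names, and helper functions (`schema_of`, `group_by_schema`, `append_on_common`) are all made up for illustration, and the sketch assumes every row within one example set has the same attributes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical example sets: each is a list of rows (dicts) sharing one schema.
example_sets = [
    [{"id": 1, "text": "hello", "lang": "en"}],
    [{"id": 2, "text": "hola", "lang": "es"},
     {"id": 3, "text": "ciao", "lang": "it"}],
    [{"id": 4, "text": "hi", "retweet_count": 7}],
]


def schema_of(example_set):
    """Return the attribute names of an example set as a hashable key."""
    return frozenset(example_set[0].keys()) if example_set else frozenset()


def group_by_schema(sets):
    """Pool the rows of all example sets that have identical attribute names."""
    groups = defaultdict(list)
    for example_set in sets:
        groups[schema_of(example_set)].extend(example_set)
    return dict(groups)


def append_on_common(sets):
    """Append all rows, keeping only attributes present in every example set."""
    non_empty = [s for s in sets if s]
    if not non_empty:
        return [], []
    common = set.intersection(*(set(schema_of(s)) for s in non_empty))
    columns = sorted(common)
    rows = [{c: row[c] for c in columns} for s in non_empty for row in s]
    return columns, rows


groups = group_by_schema(example_sets)
common_attributes, combined = append_on_common(example_sets)

# A Cartesian product pairs every row of one set with every row of another,
# so the row counts multiply (here 1 * 2).
cartesian_size = len(list(product(example_sets[0], example_sets[1])))

for schema, rows in groups.items():
    print(sorted(schema), len(rows))
print(common_attributes, len(combined), cartesian_size)
```

In this sketch, the first two example sets fall into one group because they share the same attributes, and the combined result keeps only `id` and `text`. Whether dropping the non-shared attributes is acceptable depends on what you need from the Tweets downstream.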
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts