Radoop Open File Operator

Jugi RapidMiner Certified Analyst, Member Posts: 12 Contributor II
edited December 2018 in Product Feedback - Resolved

It would be handy to have an operator that can read files from HDFS without having to define a schema in Hive.

It should then provide the file the same way the Open File operator does for local files, URLs and repository blob entries.

This new operator should support HDFS security features such as the user setting and Kerberos.

One application would be the processing of XML or JSON files from a cluster.

This would be useful for process pushdown, because various file types could then be processed inside the cluster.

9 votes

Declined

From PM: Radoop fundamentally relies on Hive tables to process ExampleSets, or rather ExampleSet-like, row-based units of data. In other words, the unit of data on which splitting and parallelization are based is the table row, not the file. Changing the code so that it could also work at the file level would be a rather expensive endeavor, but we are laying down the concepts of the next iteration of Radoop and will build it in a way that supports file-based operation. PROD-761

Comments

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Not just for simple JSON or XML, but also for image files.

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi,


    I agree; that would make RM much more useful and a real centerpiece of the architecture. Of course, a Write File (HDFS) operator should be added, too :)


    Greetings,

      Sebastian

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager