It would be handy to have an operator that can read files from HDFS without defining a schema in Hive.
It should then provide the file the same way the Open File operator does for local files, URLs, and Repository Blob Entries.
The new operator should support HDFS security features such as user and Kerberos authentication.
One application would be the processing of XML or JSON files from a cluster.
This would be useful for process pushdown, because various file types could be processed inside the cluster: not just simple JSON or XML, but also image files.
I agree; that would make RM much more useful and a real centerpiece of the architecture. Of course, a Write File (HDFS) operator should be added, too.