"JSON to data operator"

fmehraliyev Member Posts: 2 Contributor I
edited May 2019 in Help

Hello. I am really new to RapidMiner.

Basically, I have a dataset that has a .json extension.

I was advised to use the JSON to Data operator to be able to work with the dataset.

Unfortunately, I could not use the operator. Nowhere does it say where in the operator to specify which dataset I intend to work with. I basically cannot figure out how to use this operator.

P.S. I have read some responses, but they seem too advanced, with XML code. Do I need to specify the name of the file somewhere in the XML code? Thanks in advance.

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @fmehraliyev, welcome to the community. To use the "JSON to Data" operator with a local JSON file, just place a "Read Document" operator before it:

     

    [Screenshot attached: Screen Shot 2018-10-16 at 7.14.53 PM.png]

     

    As for sharing XML, that's the way we RapidMiner users share our processes with one another. You can read about how to do this here:

     

    https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-can-I-share-processes-without-RapidMiner-Server/ta-p/37047

     

    Scott

     

     

  • fmehraliyev Member Posts: 2 Contributor I

    Thank you very much!

    One step is done. Now the second problem.

    The JSON to Data operator could not transform the file into a dataset properly.

    This is what my JSON file looks like:

     

    {"review_id":"x7mDIiDB3jEiPGPHOmDzyw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id":"iCQpiavjjPzJ5_3gPD5Ebg","stars":2,"date":"2011-02-25","text":"The pizza was okay. Not the best I've had. I prefer Biaggio's on Flamingo \/ Fort Apache. The chef there can make a MUCH better NY style pizza. The pizzeria @ Cosmo was over priced for the quality and lack of personality in the food. Biaggio's is a much better pick if youre going for italian - family owned, home made recipes, people that actually CARE if you like their food. You dont get that at a pizzeria in a casino. I dont care what you say...","useful":0,"funny":0,"cool":0}
    {"review_id":"dDl8zu1vWPdKGihJrwQbpw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id":"pomGBqfbxcqPv14c3XH-ZQ","stars":5,"date":"2012-11-13","text":"I love this place! My fiance And I go here atleast once a week. The portions are huge! Food is amazing. I love their carne asada. They have great lunch specials... Leticia is super nice and cares about what you think of her restaurant. You have to try their cheese enchiladas too the sauce is different And amazing!!!","useful":0,"funny":0,"cool":0}

    It is a Yelp dataset, and as I understand it, each {...} object should represent one row.

    However, the operator transforms everything into one row, and all attribute names start with businessid (screenshot.jpg file attached).

    The XML code follows.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:read_document" compatibility="8.1.000" expanded="true" height="68" name="Read Document" width="90" x="45" y="85">
    <parameter key="file" value="C:\Users\Fuad\Documents\yelp_dataset\yelp_academic_dataset_checkin.json"/>
    </operator>
    <operator activated="true" class="text:json_to_data" compatibility="8.1.000" expanded="true" height="82" name="JSON To Data" width="90" x="313" y="85"/>
    <connect from_op="Read Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
    <connect from_op="JSON To Data" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    I have been reading a lot about the JSON extension in RapidMiner and hope this can be fixed. Thanks, everybody.

     

     

     

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research

    Hi,

     

    To process the JSON document into an example set, you need to group the entries in a collection.

    This can be easily done with the Split Document into Collection Operator from the Text Processing Extension.

     

    If the document looks exactly like your sample, then the split string is "\n" to indicate a new line (entered as the parameter value, without the quotes).

     

    See this sample process as an example:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document (3)" width="90" x="45" y="85">
    <parameter key="text" value="{&quot;review_id&quot;:&quot;x7mDIiDB3jEiPGPHOmDzyw&quot;,&quot;user_id&quot;:&quot;msQe1u7Z_XuqjGoqhB0J5g&quot;,&quot;business_id&quot;:&quot;iCQpiavjjPzJ5_3gPD5Ebg&quot;,&quot;stars&quot;:2,&quot;date&quot;:&quot;2011-02-25&quot;,&quot;text&quot;:&quot;The pizza was okay. Not the best I've had. I prefer Biaggio's on Flamingo \/ Fort Apache. The chef there can make a MUCH better NY style pizza. The pizzeria @ Cosmo was over priced for the quality and lack of personality in the food. Biaggio's is a much better pick if youre going for italian - family owned, home made recipes, people that actually CARE if you like their food. You dont get that at a pizzeria in a casino. I dont care what you say...&quot;,&quot;useful&quot;:0,&quot;funny&quot;:0,&quot;cool&quot;:0}&#10;{&quot;review_id&quot;:&quot;dDl8zu1vWPdKGihJrwQbpw&quot;,&quot;user_id&quot;:&quot;msQe1u7Z_XuqjGoqhB0J5g&quot;,&quot;business_id&quot;:&quot;pomGBqfbxcqPv14c3XH-ZQ&quot;,&quot;stars&quot;:5,&quot;date&quot;:&quot;2012-11-13&quot;,&quot;text&quot;:&quot;I love this place! My fiance And I go here atleast once a week. The portions are huge! Food is amazing. I love their carne asada. They have great lunch specials... Leticia is super nice and cares about what you think of her restaurant. You have to try their cheese enchiladas too the sauce is different And amazing!!!&quot;,&quot;useful&quot;:0,&quot;funny&quot;:0,&quot;cool&quot;:0}"/>
    </operator>
    <operator activated="true" class="operator_toolbox:split_document_into_collection" compatibility="1.5.000" expanded="true" height="82" name="Split Document into Collection" width="90" x="313" y="85">
    <parameter key="split_string" value="\n"/>
    </operator>
    <operator activated="true" class="text:json_to_data" compatibility="8.1.000" expanded="true" height="82" name="JSON To Data" width="90" x="581" y="85"/>
    <connect from_op="Create Document (3)" from_port="output" to_op="Split Document into Collection" to_port="document"/>
    <connect from_op="Split Document into Collection" from_port="collection" to_op="JSON To Data" to_port="documents 1"/>
    <connect from_op="JSON To Data" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
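
    For readers who want to check the idea outside RapidMiner, the same split-then-parse logic can be sketched in Python. This is only an illustration under assumptions: the two records are shortened, made-up stand-ins for the Yelp reviews, and the function name json_lines_to_rows is hypothetical.

```python
import json


def json_lines_to_rows(text):
    """Split newline-delimited JSON into a list of dicts, one per line."""
    rows = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. after a trailing newline
        rows.append(json.loads(line))
    return rows


# Shortened, made-up records in the same shape as the sample in this thread.
sample = (
    '{"review_id": "a1", "stars": 2, "text": "The pizza was okay."}\n'
    '{"review_id": "b2", "stars": 5, "text": "I love this place!"}\n'
)

rows = json_lines_to_rows(sample)
for row in rows:
    print(row["review_id"], row["stars"])
```

    Each line becomes its own record, which is the role the collection plays before JSON To Data in the process above.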

     

  • kayman Member Posts: 662 Unicorn

    Another way to deal with this would be to cut your JSON first, using a JSONPath expression to get rid of the 'root'. This would 'flatten' your tree so that the resulting example set is more in line with your expectations. This is quite common with JSON, specifically when you call web services, as the actual JSON is still encapsulated in a result node.

    So: load your JSON -> cut the root with JSONPath (something simple like $.. could already do miracles) -> JSON to Data
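
    The effect of cutting the root can be illustrated in plain Python, without a JSONPath library. This is a hedged sketch, not what RapidMiner does internally: the wrapper key "result", the payload, and the function name unwrap_root are all hypothetical, standing in for a web service response that nests the records one level down.

```python
import json

# Hypothetical web service response: the records sit under a "result" root.
payload = '{"result": [{"id": 1, "stars": 2}, {"id": 2, "stars": 5}]}'


def unwrap_root(text, root_key):
    """Parse the JSON text and return the records stored under root_key."""
    data = json.loads(text)
    records = data[root_key]
    if not isinstance(records, list):
        records = [records]  # a single object becomes a one-row list
    return records


records = unwrap_root(payload, "result")
for record in records:
    print(record["id"], record["stars"])
```

    Once the wrapper is gone, each remaining object maps naturally to one row.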

  • tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research

    @David_A wrote:

    Hi,

     

    to process the JSON document into an example set, you need to group the entries in a collection.

    This can be easily done with the Split Document into Collection Operator from the Text Processing Extension.

     

    Hi, 

     

    Just a small addition, the Split Document into Collection operator is from the Operator Toolbox Extension. But as @David_A says, it is probably exactly what you need.

     

    Best regards,
    Fabian

     
