Detecting written text language in text mining using DetectLanguage API

sgenzer, Community Manager, Posts: 2,130
edited December 2018 in Knowledge Base

 Hello RapidMiners -

 

Yet another nice, easy-to-use API that you can use to enrich your text mining processes if you have text in a variety of languages.  Thanks to user @tibi for the idea!

 

Super easy to get started:

 

1. Go to https://detectlanguage.com, sign up, and get an API key

2. Input your "foreign" language text and run it through the Encode URLs operator (to URL-encode it as UTF-8 so it can be passed safely in the query string)

3. Use our classic "Enrich Data by Webservice" operator or "Get Page" operator with your API key to query the API and get the JSON response.

4. Parse the JSON using any of the usual methods (the process in this post uses JSONPath queries inside Enrich Data by Webservice).
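
For anyone who wants to see what steps 2 and 3 amount to outside RapidMiner, here is a minimal Python sketch. It only builds the request URL and does not call the API. The endpoint is the one used in the process XML in this post and may have changed since; the function name `build_detect_url` and the `YOUR_API_KEY` placeholder are made up for illustration, and the exact encoding the Encode URLs operator produces (for example, how it writes spaces) may differ from Python's.

```python
from urllib.parse import quote

# Endpoint copied from the process XML in this post (assumption: still valid).
DETECT_ENDPOINT = "http://ws.detectlanguage.com/0.2/detect"


def build_detect_url(message: str, api_key: str) -> str:
    """Percent-encode the message as UTF-8 and build the request URL."""
    # safe="" forces every reserved character (spaces, &, ?, ...) to be escaped.
    encoded_message = quote(message, safe="", encoding="utf-8")
    encoded_key = quote(api_key, safe="")
    return f"{DETECT_ENDPOINT}?q={encoded_message}&key={encoded_key}"


# YOUR_API_KEY is a placeholder; use the key from your own account.
url = build_detect_url("buenos dias señor", "YOUR_API_KEY")
print(url)
```

The point of the encoding step is that characters such as spaces and "ñ" cannot go into a query string as-is, so they are replaced by their UTF-8 bytes in %XX form before the request is sent.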

 

[Screenshot: RapidMiner process using Enrich Data by Webservice]

 

[Screenshot: message whose language is to be detected]

[Screenshot: parse the JSON response]

[Screenshot: nice example set to be used in text mining!]
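
The JSONPath queries in the process (`$..language`, `$..isReliable`, `$..confidence`) pick up every value stored under that key at any depth of the response. A rough Python equivalent is sketched here, just to show the idea. The helper `find_all` is hypothetical, and the sample response is invented for illustration: only the three field names come from the process, while the nesting and the values are assumptions.

```python
import json


def find_all(node, key):
    """Collect every value stored under `key` at any depth, like JSONPath `$..key`."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending: the same key may appear deeper in the structure.
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Hypothetical response body (nesting and values are assumed, not from the API docs).
sample = '{"data": {"detections": [{"language": "es", "isReliable": true, "confidence": 10.0}]}}'
doc = json.loads(sample)

# One list per attribute, mirroring the three JSONPath queries in the process.
row = {name: find_all(doc, name) for name in ("language", "isReliable", "confidence")}
print(row)
```

Because the search is recursive, it does not depend on exactly where the detection fields sit in the response, which is also why the `$..` form is convenient in the operator.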

XML process is below.  Enjoy!

 

Scott

 

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="generate_data_user_specification" compatibility="8.0.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
<list key="attribute_values">
<parameter key="message" value="&quot;buenos dias señor&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" height="82" name="Encode URLs" width="90" x="179" y="34">
<parameter key="url_attribute" value="message"/>
<parameter key="encoding" value="UTF-8"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="language" value="$..language"/>
<parameter key="isReliable" value="$..isReliable"/>
<parameter key="confidence" value="$..confidence"/>
</list>
<parameter key="url" value="http://ws.detectlanguage.com/0.2/detect?q=&lt;%message%&gt;&amp;key=[enter-your-key-here]"/>
<list key="request_properties"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Encode URLs" to_port="example set input"/>
<connect from_op="Encode URLs" from_port="example set output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

Comments

  • sgenzer, Community Manager

    @s242936 and others - please note that you MUST CREATE YOUR OWN API KEY TO USE THIS PROCESS. See step 1 above. :)

     

    Scott

     

  • sgenzer, Community Manager

    Second note: if you are using Process Documents as an input to this process, you may need to use a Set Role operator to set your text attribute to "regular".
