Detecting written text language in text mining using DetectLanguage API
Hello RapidMiners -
Yet another nice, easy-to-use API that you can use to enrich your text mining processes if you have text in a variety of languages. Thanks to user @tibi for the idea!
Super easy to get started:
1. Go to https://detectlanguage.com, sign up, and get an API key
2. Input your "foreign" language text and run it through the Encode URLs operator (to percent-encode it as UTF-8 so it can safely go into the query URL)
3. Use our classic "Enrich Data by Webservice" operator or "Get Page" operator with your credentials to query the API and get the JSON response.
4. Parse the JSON using any of the usual methods (the process below uses JSONPath queries).
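For anyone who wants to see what steps 2 and 3 amount to outside of RapidMiner, here is a minimal Python sketch of building the request URL. The endpoint and the `q` and `key` parameters are copied from the process XML in this post; the function name `build_detect_url` and the placeholder key are made up for illustration, and the exact escaping may differ slightly from what the Encode URLs operator produces.

```python
from urllib.parse import quote

# Endpoint as it appears in the process XML of this post.
DETECT_ENDPOINT = "http://ws.detectlanguage.com/0.2/detect"


def build_detect_url(message: str, api_key: str) -> str:
    """Percent-encode the message as UTF-8 and place it in the query URL.

    Hypothetical helper: it only builds the URL, it does not send a request.
    """
    encoded_message = quote(message, safe="", encoding="utf-8")
    encoded_key = quote(api_key, safe="")
    return f"{DETECT_ENDPOINT}?q={encoded_message}&key={encoded_key}"


# Placeholder key: you must sign up and use your own (see step 1).
url = build_detect_url("buenos dias señor", "enter-your-key-here")
print(url)
```

The non-ASCII "ñ" becomes the two UTF-8 bytes `%C3%B1`, which is why the text has to be encoded before it goes into the URL.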
[Screenshots: the RapidMiner process using Enrich Data by Webservice; the input message whose language is to be detected; the parsed JSON response; the resulting example set, ready to be used in text mining.]
XML process is below. Enjoy!
Scott
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="generate_data_user_specification" compatibility="8.0.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
<list key="attribute_values">
<parameter key="message" value="&quot;buenos dias señor&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" height="82" name="Encode URLs" width="90" x="179" y="34">
<parameter key="url_attribute" value="message"/>
<parameter key="encoding" value="UTF-8"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="language" value="$..language"/>
<parameter key="isReliable" value="$..isReliable"/>
<parameter key="confidence" value="$..confidence"/>
</list>
<parameter key="url" value="http://ws.detectlanguage.com/0.2/detect?q=&lt;%message%&gt;&amp;key=[enter-your-key-here]"/>
<list key="request_properties"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Encode URLs" to_port="example set input"/>
<connect from_op="Encode URLs" from_port="example set output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
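The three JSONPath queries in the process (`$..language`, `$..isReliable`, `$..confidence`) use the recursive-descent operator `..`, which picks up a key wherever it sits in the response. A rough Python sketch of that lookup is below. The helper `find_all` is hypothetical, and the sample JSON is only an assumed shape for illustration, not a recorded API response.

```python
import json


def find_all(node, key):
    """Collect every value stored under `key` at any depth (like `$..key`)."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_all(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Assumed response shape, for illustration only.
sample = (
    '{"data": {"detections": '
    '[{"language": "es", "isReliable": true, "confidence": 6.62}]}}'
)

doc = json.loads(sample)
# One entry per JSONPath query configured in the operator.
row = {name: find_all(doc, name) for name in ("language", "isReliable", "confidence")}
print(row)
```

Because the lookup ignores nesting, the same three queries keep working even if the fields are wrapped in extra objects or arrays.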
Comments
@s242936 and others - please note that you MUST CREATE YOUR OWN API KEY TO USE THIS PROCESS. See step 1 above.
Scott
Second note: if you are using Process Documents as an input to this process, you may need to use a Set Role operator to set your text attribute to "regular".