Indico Text and Image Analysis
This is the second of several articles to help people use external APIs from within RapidMiner. Here I will show how to access the Indico APIs (indico.io), a large collection of tools for text and image analysis:
Text Analysis: Text Input Format, Sentiment, Sentiment HQ, Text Tags, Language Predictor, Political Analysis, Keywords, People, Places, Organizations, Twitter Engagement, Personality, Personas, Text Features, Relevance, Emotion, Intersections, Analyze Text, and Sentence Splitting
Image Analysis: Image Input Format, Facial Emotion Recognition, Image Features, Facial Features, Facial Localization, Content Filtering, Image Recognition, and Analyze Image
Below I show an example of one text analysis API, Text Tags, and one image analysis API, Facial Emotion Recognition; you should be able to adapt these easily to any of the others. The full API documentation is here: https://indico.io/docs
INDICO TEXT TAGS
Here I am going to show how to use the Indico.io API “Text Tags” to take text and estimate the likelihood that it covers each of 111 possible topics (tags). You can of course change this to whatever you want. I then add a short RapidMiner process to reduce this down to the top three tags.
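As a rough illustration of the "top three tags" reduction, here is a minimal Python sketch. It is not part of the RapidMiner process; the sample process does a similar reduction with the De-Pivot, Sort, and Filter Example Range operators. The function name top_tags and the example scores are made up for illustration and are not real API output.

```python
from typing import Dict, List, Tuple


def top_tags(scores: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n highest-scoring (tag, probability) pairs, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical scores for illustration only; not real API output.
example_scores = {
    "anime": 0.01,
    "business": 0.12,
    "left_politics": 0.55,
    "history": 0.20,
    "beer": 0.02,
}
top3 = top_tags(example_scores)
print(top3)
```

Sorting in decreasing order and keeping the first n rows is the same idea as Sort followed by Filter Example Range.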
1. You will need to create a free Indico account to get an API key. You do this at https://indico.io/. The key should look like a long string of alphanumeric characters. Keep this key secure, as it is how Indico authenticates you and allocates billing. As of Dec 2016, Indico’s “Pay-as-you-Go” account allows up to 10,000 free API calls per month. After that it is $0.006 per call up to 250,000 calls, and so forth (see https://indico.io/dashboard/plans for more info on pricing).
2. If you have not already done so, download the Web Mining extension in RapidMiner Studio.
3. Build a process that sends a text attribute (called “text”) to the Enrich Data by Webservice operator (found in the Web Mining extension) and then connect to the results. I have included a sample process below if you want to use mine as a starting point (you will need to insert your own API key).
4. The only hard part here (and the only thing that changes from API to API) is how you set the “Enrich Data by Webservice” operator. This is very similar to the Google Cloud API set-up (see previous post) but with the following changes:
query type: JSON path
attribute type: Numerical
JSONpath queries:
Anime $..anime
Anthropology $..anthropology
etc…
[There are 111 of these tags if you want all of them. If you grab the XML from the sample process, you can save yourself a lot of work typing them all in manually.]
Request method: POST
Body: {"data":"<%text%>"}
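URL: https://apiv2.indico.io/texttags
Request properties: X-ApiKey, set to your own API key (the sample process uses a placeholder value)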
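For readers who want to see what this request amounts to outside RapidMiner, here is a minimal Python sketch that builds (but does not send) the same POST request. The URL and body shape come from the settings above and the X-ApiKey header name comes from the sample process; the Content-Type header and the function name build_text_tags_request are my assumptions.

```python
import json
import urllib.request

TEXT_TAGS_URL = "https://apiv2.indico.io/texttags"  # URL from the settings above


def build_text_tags_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one text value."""
    # json.dumps escapes quotes and newlines inside the text, which a raw
    # string substitution into {"data":"..."} would not do.
    body = json.dumps({"data": text}).encode("utf-8")
    headers = {
        "X-ApiKey": api_key,  # header name taken from the sample process
        "Content-Type": "application/json",  # assumption, not stated in the article
    }
    return urllib.request.Request(
        TEXT_TAGS_URL, data=body, headers=headers, method="POST"
    )


request = build_text_tags_request('She said "hello" to the crowd.', "YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Note that the example text contains double quotes. If your own text attribute contains quotes, substituting it as-is into the body template may produce invalid JSON, so you may need to clean the text first.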
That’s it. Results should look like this:
INDICO IMAGE FACIAL EMOTION RECOGNITION
Here I am going to show how to use the Indico.io API “Facial Emotion Recognition” to take an image (containing a human face) and estimate the likelihood that it shows each of six possible emotions: happy, sad, angry, fear, surprise, neutral. You can of course change this to whatever you want.
1. You will need to create a free Indico account to get an API key and get the Web Mining Extension (see above).
2. Build a process that sends an image URL text attribute (called “URL”) to the Enrich Data by Webservice operator.
3. Parameters for Enrich Data by Webservice:
query type: JSON path
attribute type: Numerical
JSONpath queries:
Happy $..happy
Sad $..sad
Angry $..angry
Fear $..fear
Surprise $..surprise
Neutral $..neutral
Request method: POST
Body: {"data":"<%URL%>"}
URL: https://apiv2.indico.io/fer
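The $..name queries above use JSONPath recursive descent: they match a key with that name at any depth of the response. Here is a minimal Python sketch of the same idea. The helper find_key and the sample response body are hypothetical, and the real key names and nesting returned by the API may differ.

```python
import json
from typing import Any, List


def find_key(node: Any, key: str) -> List[Any]:
    """Collect every value stored under `key` at any depth, like `$..key`."""
    matches: List[Any] = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                matches.append(value)
            matches.extend(find_key(value, key))
    elif isinstance(node, list):
        for item in node:
            matches.extend(find_key(item, key))
    return matches


# Hypothetical response body; the real key names and nesting may differ.
sample = json.loads('{"results": {"happy": 0.7, "sad": 0.05, "neutral": 0.25}}')
emotions = ["happy", "sad", "angry", "fear", "surprise", "neutral"]
# Take the first match for each emotion, or None when the key is absent.
scores = {name: (find_key(sample, name) or [None])[0] for name in emotions}
print(scores)
```

Key matching in this sketch is case-sensitive, so a query for "Happy" finds nothing in a response that uses "happy". If a query comes back empty in RapidMiner, checking the case of the key names in the raw response is a reasonable first step.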
That’s it. If you use this image (https://pbs.twimg.com/profile_images/7962438846365
<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000-BETA">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.4.000-BETA" expanded="true" name="Process">
<process expanded="true">
<operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess" width="90" x="112" y="34">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
<parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
</operator>
<operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data" width="90" x="179" y="34">
<parameter key="text_attribute" value="text"/>
<parameter key="add_meta_information" value="false"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="Libertarian" value="$..Libertarian"/>
<parameter key="Green" value="$..Green"/>
<parameter key="Liberal" value="$..Liberal"/>
<parameter key="Conservative" value="$..Conservative"/>
</list>
<parameter key="request_method" value="POST"/>
<parameter key="service_method" value="foo"/>
<parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
<parameter key="url" value="https://apiv2.indico.io/political"/>
<list key="request_properties">
<parameter key="X-ApiKey" value="foo"/>
</list>
</operator>
<operator activated="true" class="parse_numbers" compatibility="7.4.000-BETA" expanded="true" height="82" name="Parse Numbers (2)" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="[A-Z].*"/>
</operator>
<operator activated="true" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (4)" width="90" x="581" y="34">
<process expanded="true">
<operator activated="true" class="de_pivot" compatibility="7.4.000-BETA" expanded="true" height="82" name="De-Pivot (2)" width="90" x="45" y="34">
<list key="attribute_name">
<parameter key="Probability" value="[A-Z].*"/>
</list>
<parameter key="index_attribute" value="Politics"/>
<parameter key="create_nominal_index" value="true"/>
</operator>
<operator activated="true" class="sort" compatibility="7.4.000-BETA" expanded="true" height="82" name="Sort (2)" width="90" x="179" y="34">
<parameter key="attribute_name" value="Probability"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="7.4.000-BETA" expanded="true" height="82" name="Filter Example Range (2)" width="90" x="313" y="34">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="1"/>
</operator>
<connect from_port="in 1" to_op="De-Pivot (2)" to_port="example set input"/>
<connect from_op="De-Pivot (2)" from_port="example set output" to_op="Sort (2)" to_port="example set input"/>
<connect from_op="Sort (2)" from_port="example set output" to_op="Filter Example Range (2)" to_port="example set input"/>
<connect from_op="Filter Example Range (2)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">choose highest probability</description>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
<connect from_op="Documents to Data" from_port="example set" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_op="Parse Numbers (2)" to_port="example set input"/>
<connect from_op="Parse Numbers (2)" from_port="example set output" to_op="Subprocess (4)" to_port="in 1"/>
<connect from_op="Subprocess (4)" from_port="out 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Politics</description>
</operator>
<operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (2)" width="90" x="112" y="187">
<process expanded="true">
<operator activated="false" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (3)" width="90" x="45" y="34">
<parameter key="text" value="Je m'appelle Scott."/>
</operator>
<operator activated="false" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="103" name="Documents to Data (2)" width="90" x="179" y="34">
<parameter key="text_attribute" value="text"/>
<parameter key="add_meta_information" value="false"/>
</operator>
<operator activated="false" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (2)" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="Spanish" value="$..Spanish"/>
<parameter key="French" value="$..French"/>
<parameter key="English" value="$..English"/>
<parameter key="Portuguese" value="$..Portuguese"/>
</list>
<parameter key="request_method" value="POST"/>
<parameter key="service_method" value="foo"/>
<parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
<parameter key="url" value="https://apiv2.indico.io/language"/>
<list key="request_properties">
<parameter key="X-ApiKey" value="foo"/>
</list>
</operator>
<operator activated="false" class="parse_numbers" compatibility="7.4.000-BETA" expanded="true" height="82" name="Parse Numbers" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="[A-Z].*"/>
</operator>
<operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (3)" width="90" x="581" y="34">
<process expanded="true">
<operator activated="true" class="de_pivot" compatibility="7.4.000-BETA" expanded="true" height="82" name="De-Pivot" width="90" x="45" y="34">
<list key="attribute_name">
<parameter key="Probability" value="[A-Z].*"/>
</list>
<parameter key="index_attribute" value="Language"/>
<parameter key="create_nominal_index" value="true"/>
</operator>
<operator activated="true" class="sort" compatibility="7.4.000-BETA" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
<parameter key="attribute_name" value="Probability"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="7.4.000-BETA" expanded="true" height="82" name="Filter Example Range" width="90" x="313" y="34">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="1"/>
</operator>
<connect from_port="in 1" to_op="De-Pivot" to_port="example set input"/>
<connect from_op="De-Pivot" from_port="example set output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">choose highest probability</description>
</operator>
<connect from_op="Create Document (3)" from_port="output" to_op="Documents to Data (2)" to_port="documents 1"/>
<connect from_op="Documents to Data (2)" from_port="example set" to_op="Enrich Data by Webservice (2)" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice (2)" from_port="ExampleSet" to_op="Parse Numbers" to_port="example set input"/>
<connect from_op="Parse Numbers" from_port="example set output" to_op="Subprocess (3)" to_port="in 1"/>
<connect from_op="Subprocess (3)" from_port="out 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Language Detection</description>
</operator>
<operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (5)" width="90" x="246" y="34">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (2)" width="90" x="45" y="34">
<parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
</operator>
<operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data (3)" width="90" x="179" y="34">
<parameter key="text_attribute" value="text"/>
<parameter key="add_meta_information" value="false"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (3)" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="SentimentScore" value="$..results"/>
</list>
<parameter key="request_method" value="POST"/>
<parameter key="service_method" value="foo"/>
<parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
<parameter key="url" value="https://apiv2.indico.io/sentiment"/>
<list key="request_properties">
<parameter key="X-ApiKey" value="foo"/>
</list>
</operator>
<operator activated="true" class="parse_numbers" compatibility="7.4.000-BETA" expanded="true" height="82" name="Parse Numbers (3)" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="SentimentScore"/>
<parameter key="regular_expression" value="[A-Z].*"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.4.000-BETA" expanded="true" height="82" name="Generate Attributes" width="90" x="581" y="34">
<list key="function_descriptions">
<parameter key="Sentiment" value="if(SentimentScore&gt;0.67,&quot;Positive&quot;, if(SentimentScore&lt;0.33,&quot;Negative&quot;,&quot;Neutral&quot;))"/>
</list>
<description align="center" color="transparent" colored="false" width="126">Sentiment</description>
</operator>
<connect from_op="Create Document (2)" from_port="output" to_op="Documents to Data (3)" to_port="documents 1"/>
<connect from_op="Documents to Data (3)" from_port="example set" to_op="Enrich Data by Webservice (3)" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice (3)" from_port="ExampleSet" to_op="Parse Numbers (3)" to_port="example set input"/>
<connect from_op="Parse Numbers (3)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Sentiment</description>
</operator>
<operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (6)" width="90" x="380" y="34">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (4)" width="90" x="45" y="34">
<parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
</operator>
<operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data (4)" width="90" x="179" y="34">
<parameter key="text_attribute" value="text"/>
<parameter key="add_meta_information" value="false"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (4)" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="SentimentScore" value="$..results"/>
</list>
<parameter key="request_method" value="POST"/>
<parameter key="service_method" value="foo"/>
<parameter key="body" value="{&quot;data&quot;:&quot;&lt;%text%&gt;&quot;}"/>
<parameter key="url" value="https://apiv2.indico.io/sentimenthq"/>
<list key="request_properties">
<parameter key="X-ApiKey" value="foo"/>
</list>
</operator>
<operator activated="true" class="parse_numbers" compatibility="7.4.000-BETA" expanded="true" height="82" name="Parse Numbers (4)" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="SentimentScore"/>
<parameter key="regular_expression" value="[A-Z].*"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.4.000-BETA" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="581" y="34">
<list key="function_descriptions">
<parameter key="Sentiment" value="if(SentimentScore&gt;0.67,&quot;Positive&quot;, if(SentimentScore&lt;0.33,&quot;Negative&quot;,&quot;Neutral&quot;))"/>
</list>
<description align="center" color="transparent" colored="false" width="126">Sentiment</description>
</operator>
<connect from_op="Create Document (4)" from_port="output" to_op="Documents to Data (4)" to_port="documents 1"/>
<connect from_op="Documents to Data (4)" from_port="example set" to_op="Enrich Data by Webservice (4)" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice (4)" from_port="ExampleSet" to_op="Parse Numbers (4)" to_port="example set input"/>
<connect from_op="Parse Numbers (4)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Sentiment High Quality</description>
</operator>
<operator activated="false" class="subprocess" compatibility="7.4.000-BETA" expanded="true" height="82" name="Subprocess (7)" width="90" x="246" y="187">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document (5)" width="90" x="45" y="34">
<parameter key="text" value="Democratic candidate Hillary Clinton is excited for the upcoming election."/>
</operator>
<operator activated="true" class="text:documents_to_data" compatibility="7.3.000" expanded="true" height="82" name="Documents to Data (5)" width="90" x="179" y="34">
<parameter key="text_attribute" value="text"/>
<parameter key="add_meta_information" value="false"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Indico API Text Tags" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<parameter key="attribute_type" value="Numerical"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="Anime" value="$..anime"/>
<parameter key="Anthropology" value="$..anthropology"/>
<parameter key="Archery" value="$..archery"/>
<parameter key="Architecture" value="$..architecture"/>
<parameter key="Art" value="$..art"/>
<parameter key="Astronomy" value="$..astronomy"/>
<parameter key="Atheism" value="$..atheism"/>
<parameter key="Aviation" value="$..aviation"/>
<parameter key="Baseball" value="$..baseball"/>
<parameter key="Beer" value="$..beer"/>
<parameter key="Bicycling" value="$..bicycling"/>
<parameter key="Biology" value="$..biology"/>
<parameter key="Books" value="$..books"/>
<parameter key="Boxing" value="$..boxing"/>
<parameter key="Buddhism" value="$..buddhism"/>
<parameter key="Business" value="$..business"/>
<parameter key="Cars" value="$..cars"/>
<parameter key="Christianity" value="$..christianity"/>
<parameter key="Climbing" value="$..climbing"/>
<parameter key="Comedy" value="$..comedy"/>
<parameter key="Comics" value="$..comics"/>
<parameter key="Conspiracy" value="$..conspiracy"/>
<parameter key="Cooking" value="$..cooking"/>
<parameter key="Crafts" value="$..crafts"/>
<parameter key="Cricket" value="$..cricket"/>
<parameter key="Design" value="$..design"/>
<parameter key="Dieting" value="$..dieting"/>
<parameter key="Diy" value="$..diy"/>
<parameter key="Drugs" value="$..drugs"/>
<parameter key="Economic_Discussion" value="$..economic_discussion"/>
<parameter key="Education" value="$..education"/>
<parameter key="Electronics" value="$..electronics"/>
<parameter key="Energy" value="$..energy"/>
<parameter key="Entertainment_News" value="$..entertainment_news"/>
<parameter key="Environmental" value="$..environmental"/>
<parameter key="Fashion" value="$..fashion"/>
<parameter key="Fiction" value="$..fiction"/>
<parameter key="Film" value="$..film"/>
<parameter key="Fishing" value="$..fishing"/>
<parameter key="Fitness" value="$..fitness"/>
<parameter key="Gaming" value="$..gaming"/>
<parameter key="Gardening" value="$..gardening"/>
<parameter key="Gender_Issues" value="$..gender_issues"/>
<parameter key="General_Food" value="$..general_food"/>
<parameter key="Golf" value="$..golf"/>
<parameter key="Guns" value="$..guns"/>
<parameter key="Health" value="$..health"/>
<parameter key="History" value="$..history"/>
<parameter key="Hockey" value="$..hockey"/>
<parameter key="Hunting" value="$..hunting"/>
<parameter key="Individualist_Politics" value="$..individualist_politics"/>
<parameter key="Investment" value="$..investment"/>
<parameter key="Islam" value="$..islam"/>
<parameter key="Jobs" value="$..jobs"/>
<parameter key="Judaism" value="$..judaism"/>
<parameter key="Left_Politics" value="$..left_politics"/>
<parameter key="Lgbt" value="$..lgbt"/>
Comments
@sgenzer, great article! This looks quite helpful and I wanted to try this out and was looking for the sample process referenced, but I could not find it. Did it get attached to this post, or should I find it elsewhere?
Also the link to indico.io above actually seems to direct to cloud.google.com.
I'm trying to use this, for images, but I get "cannot connect to https://". Same error I got when trying to call Azure!!
WTF is going on? Any ideas?
I don't think this process was designed for Azure, so it may need to be modified beyond just adding a URL. With respect to the Create Document operator, it's not designed to load images, just txt, pdf, xml, and html files.