
Cannot retrieve data with "Enrich Data by Webservice"

rachel_lomasky Member Posts: 52 Maven
edited November 2018 in Help

Hi,

 

I've downloaded the Web Mining extension and would like to use it to connect to a Google-provided web service. I've constructed a GET URL, and it works fine when I paste it into a browser (a bunch of JSON is returned). However, when I run it with "Enrich Data by Webservice", I get:

Dec 3, 2016 10:31:57 AM SEVERE: Process failed: Cannot retrieve data from the specified URL 'https://www.googleapis.com/analytics/v3/data/ga'.
Dec 3, 2016 10:31:57 AM SEVERE: Here:
Dec 3, 2016 10:31:57 AM SEVERE: Process[1] (Process)
Dec 3, 2016 10:31:57 AM SEVERE: subprocess 'Main Process'
Dec 3, 2016 10:31:57 AM SEVERE: +- Retrieve questions[1] (Retrieve)
Dec 3, 2016 10:31:57 AM SEVERE: ==> +- Enrich Data by Webservice[1] (Enrich Data by Webservice)

Two questions:

1. Why doesn't it work?

2. Is there a way that I can see the query string to do debugging?

 

Thank you,

Rachel

Best Answer

  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted

    Here's a sample process (it uses RM 7.3):

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="7.3.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
    <list key="attribute_values">
    <parameter key="foo" value="0"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="179" y="34">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="foo2" value=".*"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    <parameter key="url" value="https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXX&amp;start-date=30daysAgo&amp;end-date=yesterday&amp;metrics=ga:sessions&amp;access_token=XXXXXX"/>
    <list key="request_properties"/>
    </operator>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
    <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process> 

    I just tested this with my own Google API account and it works.

     

    Scott 
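In a process XML file, each `&` inside the `url` value has to be written as `&amp;`. If the XML has passed through an HTML page on the way, it can end up escaped twice, and the operator then sees a literal `&amp;` in the URL. The sketch below is a way to check what URL an XML parser actually reads from a `parameter` element. It uses Python's standard library rather than anything from RapidMiner, and the `example.com` URLs and the `url_from_parameter` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

def url_from_parameter(xml_fragment):
    """Return the 'value' attribute as an XML parser sees it (entities decoded once)."""
    return ET.fromstring(xml_fragment).get("value")

# Hypothetical fragments: the first is escaped once, the second twice.
single = '<parameter key="url" value="https://example.com/api?a=1&amp;b=2"/>'
double = '<parameter key="url" value="https://example.com/api?a=1&amp;amp;b=2"/>'

print(url_from_parameter(single))  # the separator is a plain ampersand
print(url_from_parameter(double))  # a literal "&amp;" is left inside the URL
```

If the second form shows up in a pasted process, replacing each `&amp;amp;` with `&amp;` before importing should restore the intended query string.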

Answers

  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi... I use the Google API all the time with this operator, and it is quite tricky to get all the settings right. First guess: did you encode your URL? Can you share your parameter settings (without your key, of course)?

    The answer to your second question is no: RM does not give you the same verbose output as you would get in a terminal. Sometimes when I can't get it right, I run cURL at the command line, get that to work, and then go back to RM.

    Scott
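Since the operator does not show the query string it sends, one workaround in the same spirit as the cURL approach is to take the URL that works in the browser and split it into its parts outside RapidMiner. The sketch below does that with Python's standard library. The `describe_url` helper, the ID `12345`, and the token `TOKEN` are placeholders for illustration, not values from the thread.

```python
from urllib.parse import urlsplit, parse_qsl

def describe_url(url):
    """Split a URL into its endpoint and its decoded (name, value) query pairs."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True so that empty parameters are still listed
    params = parse_qsl(parts.query, keep_blank_values=True)
    return endpoint, params

# Placeholder ID and token, not real credentials.
working_url = (
    "https://www.googleapis.com/analytics/v3/data/ga"
    "?ids=ga%3A12345&start-date=30daysAgo&end-date=yesterday"
    "&metrics=ga%3Asessions&access_token=TOKEN"
)

endpoint, params = describe_url(working_url)
print(endpoint)
for name, value in params:
    print(f"{name} = {value}")
```

Comparing this list with the operator's settings makes it easier to spot a parameter that is missing or spelled differently.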

  • rachel_lomasky Member Posts: 52 Maven

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.003">
    <operator activated="true" class="retrieve" compatibility="7.2.003" expanded="true" height="68" name="Retrieve questions" width="90" x="45" y="85">
    <parameter key="repository_entry" value="../../data/import/questions"/>
    </operator>
    </process>
    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.003">
    <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.2.001" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="246" y="85">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <parameter key="attribute_type" value="Nominal"/>
    <list key="regular_expression_queries"/>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <parameter key="ignore_CDATA" value="true"/>
    <parameter key="assume_html" value="true"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    <parameter key="request_method" value="GET"/>
    <parameter key="service_method" value="reportRequests"/>
    <parameter key="url" value="https://www.googleapis.com/analytics/v3/data/ga"/>
    <parameter key="delay" value="0"/>
    <list key="request_properties">
    <parameter key="ids" value="ga:myids"/>
    <parameter key="start-date" value="30daysAgo"/>
    <parameter key="end-date" value="yesterday"/>
    <parameter key="metrics" value="ga:sessions"/>
    <parameter key="access_token" value="my access token"/>
    </list>
    <parameter key="encoding" value="SYSTEM"/>
    </operator>
    </process>

  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi, OK, thanks. That XML was hard to follow (it's from version 7.2 and there's some strange cut and paste in there), but I think I know what you're doing. I have not used the Google Analytics API before, but for a GET request I would first try putting all the parameters in the URL rather than in "request properties". Don't ask me why this makes a difference, but in my experience it does. Try something like this in the URL:

     

    https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3A<your number here>&start-date=30daysAgo&end-date=yesterday&metrics=ga%3Asessions&access_token=<your access token>

     

    I also don't see anything in your String Matching (called "Machting" in the XML!) query, so you'll need to tell RapidMiner what you want to do with the response. I would recommend just using Regular Expression with .* for now, just to ensure you're getting a response.

     

    Scott
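The advice above amounts to building one fully encoded GET URL, with `:` written as `%3A`. A minimal sketch of how such a URL could be generated outside RapidMiner and then pasted into the operator's `url` field is shown below. It uses Python's `urllib.parse.urlencode`. The `build_get_url` helper, the ID `12345`, and the token `TOKEN` are illustrative placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/analytics/v3/data/ga"

def build_get_url(base_url, params):
    """Append percent-encoded query parameters to base_url."""
    # urlencode escapes reserved characters, so ':' becomes '%3A'
    return f"{base_url}?{urlencode(params)}"

# Placeholder values; substitute your own view ID and access token.
params = {
    "ids": "ga:12345",
    "start-date": "30daysAgo",
    "end-date": "yesterday",
    "metrics": "ga:sessions",
    "access_token": "TOKEN",
}

url = build_get_url(BASE_URL, params)
print(url)
```

When the resulting URL is stored in process XML, each `&` still has to be written as `&amp;`.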

     

  • rachel_lomasky Member Posts: 52 Maven

    Thank you, this works.  Now to figure out how to parse the response...

  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    <grin> It should not be too bad. There are a variety of tools you can use. Post if you need more help.

     

    Scott
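For checking what the response contains before wiring up a query in RapidMiner, a small script outside the tool can help. The sketch below parses a made-up, trimmed-down JSON body and pulls out a session total. The field names (`columnHeaders`, `rows`, `totalsForAllResults`) are an assumption about the shape of the response and should be checked against what the service actually returns. The `total_sessions` helper and the value `1234` are invented for illustration.

```python
import json

# Hypothetical, trimmed-down response body. The field names are an assumption
# about the response shape, and the numbers are invented.
sample_body = """
{
  "columnHeaders": [{"name": "ga:sessions"}],
  "rows": [["1234"]],
  "totalsForAllResults": {"ga:sessions": "1234"}
}
"""

def total_sessions(body):
    """Return the ga:sessions total from a JSON response body as an int."""
    data = json.loads(body)
    # In this sample the totals are strings, so convert explicitly
    return int(data["totalsForAllResults"]["ga:sessions"])

print(total_sessions(sample_body))
```

Once the path to the value is known, the same path could be expressed in whichever query type the operator is set to use.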


  • rachel_lomasky Member Posts: 52 Maven

    It ain't pretty, but I got it working :).

  • khairulnizam Member Posts: 1 Learner III

    Hi, I have the same problem with "Enrich Data by Webservice". I already tried the same parameters using curl, and it works. Here is my process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="45" y="136">
    <parameter key="text" value="I love hotdogs. Hotdogs are the greatest. They are hot and delicious."/>
    <parameter key="add label" value="false"/>
    <parameter key="label_type" value="nominal"/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="7.4.001" expanded="true" height="82" name="Documents to Data" width="90" x="179" y="136">
    <parameter key="text_attribute" value="text"/>
    <parameter key="add_meta_information" value="true"/>
    <parameter key="datamanagement" value="double_sparse_array"/>
    </operator>
    <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="136">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <parameter key="attribute_type" value="Nominal"/>
    <list key="regular_expression_queries">
    <parameter key="all" value=".*"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <parameter key="ignore_CDATA" value="true"/>
    <parameter key="assume_html" value="true"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    <parameter key="request_method" value="POST"/>
    <parameter key="body" value="text=&lt;%text%&gt;"/>
    <parameter key="url" value="https://twinword-sentiment-analysis.p.mashape.com/analyze/"/>
    <parameter key="delay" value="0"/>
    <list key="request_properties">
    <parameter key="X-Mashape-Key" value="QhBpo6d9YgmsherFsSBVfycN0czjp1rf0HIjsnooes2EdNYmao"/>
    <parameter key="Content-Type" value="application/x-www-form-urlencoded"/>
    <parameter key="Accept" value="application/json"/>
    </list>
    <parameter key="encoding" value="SYSTEM"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_op="Enrich Data by Webservice" to_port="Example Set"/>
    <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I think there's a problem with your API key. I tried your XML code and got a JSON response that says:

    {"message":"Missing Mashape application key. Go to http:\/\/docs.mashape.com\/api-keys to learn how to get your API application key."}
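To narrow down whether the key header or the form body is at fault, it can help to assemble the same POST request outside RapidMiner and inspect it before sending. The sketch below builds the request with Python's standard library and prints it without contacting the server. The `build_sentiment_request` helper and the `YOUR_KEY` placeholder are illustrative. The URL and header names are copied from the process above.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_sentiment_request(url, api_key, text):
    """Build (but do not send) a form-encoded POST carrying the API key header."""
    # Form-encode the body to match the declared Content-Type
    body = urlencode({"text": text}).encode("utf-8")
    headers = {
        "X-Mashape-Key": api_key,
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")

req = build_sentiment_request(
    "https://twinword-sentiment-analysis.p.mashape.com/analyze/",
    "YOUR_KEY",  # placeholder, not a real key
    "I love hotdogs.",
)

print(req.get_method(), req.full_url)
print(req.data)
# urllib stores header names with only the first letter capitalized;
# HTTP header names are case-insensitive, so this does not change the request.
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would send it. If the service accepts this request but rejects the one from the operator, the difference is likely in how the headers are entered in "request properties".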

      

  • rachel_lomasky Member Posts: 52 Maven

    My problem was that I was quoting parameters. Everything should be non-quoted.
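One plausible reading of the quoting problem is that quotation marks typed around a value are sent as part of the value. The sketch below shows the effect using Python's `urlencode` as a stand-in for whatever encoding the operator applies, which is an assumption: the quotes become `%22` and the server receives a different string.

```python
from urllib.parse import urlencode

# The value as intended
unquoted = urlencode({"metrics": "ga:sessions"})

# The same value with literal double quotes typed around it
quoted = urlencode({"metrics": '"ga:sessions"'})

print(unquoted)
print(quoted)
```

The second query string carries `%22` on both sides of the value, so the service would be asked for a metric whose name includes the quote characters.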
