Not following links

Dazzerman Member Posts: 4 Contributor I
edited November 2018 in Help
Hi,

I have adapted a Crawl Web process that worked elsewhere, but it is not working in my latest example. All I have changed is the starting URL and the crawling rules.

Does anyone know why this might be happening?

Thanks.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="web:crawl_web" compatibility="5.3.001" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="165">
        <parameter key="url" value="http://www.heatingspareparts.com/index.asp"/>
        <list key="crawling_rules">
          <parameter key="store_with_matching_url" value=".+\&amp;gc=.+"/>
          <parameter key="follow_link_with_matching_url" value=".+suppliername.+|.+\&amp;gc=.+"/>
        </list>
        <parameter key="output_dir" value="C:\scratch\RapidMiner\Gas Council"/>
        <parameter key="extension" value="html"/>
        <parameter key="max_depth" value="3"/>
        <parameter key="max_page_size" value="10000"/>
        <parameter key="user_agent" value="Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; MDDRJS)"/>
      </operator>
      <connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
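
One way to narrow this down is to test the two crawling rules against links copied from the start page. The following is only a rough sketch in Python, not part of the process itself. It assumes the crawler requires a rule to match the whole URL (hence `re.fullmatch`), and the `list.asp` and `part.asp` URLs are made-up examples that should be replaced with real links from the site. Note that `&amp;` in the XML is just the escaped form of `&`, so the actual pattern contains `\&gc=`.

```python
import re

# Rules copied from the process XML above, with &amp; unescaped to &.
STORE_RULE = r".+\&gc=.+"
FOLLOW_RULE = r".+suppliername.+|.+\&gc=.+"


def classify(url):
    """Return (follow, store) for a URL, assuming whole-URL matching."""
    follow = re.fullmatch(FOLLOW_RULE, url) is not None
    store = re.fullmatch(STORE_RULE, url) is not None
    return follow, store


# Hypothetical URLs for illustration only (apart from the start URL);
# replace them with links copied from the site's HTML source.
samples = [
    "http://www.heatingspareparts.com/index.asp",
    "http://www.heatingspareparts.com/list.asp?suppliername=Example",
    "http://www.heatingspareparts.com/part.asp?id=1&gc=4100000",
    "http://www.heatingspareparts.com/part.asp?gc=4100000",
]

for url in samples:
    follow, store = classify(url)
    print(f"follow={follow!s:5} store={store!s:5} {url}")
```

If none of the links on the start page come out with `follow=True`, the crawl has nothing to follow and would stop at the first page. In this sketch, a URL where `gc=` comes directly after `?` rather than after `&` matches neither rule.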

Answers

  • Dazzerman Member Posts: 4 Contributor I
    I have adapted this process to work correctly on yet another website, but still do not understand why it is not working for the details posted here.

    Does anyone know what might be preventing this process from producing results?

    Thanks.