Not following links
Hi,
I have adapted a Crawl Web process that worked elsewhere, but it is not working in my latest example. All I have changed is the starting URL and the crawling rules.
Anyone know why this might be happening?
Thanks.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="web:crawl_web" compatibility="5.3.001" expanded="true" height="60" name="Crawl Web" width="90" x="45" y="165">
<parameter key="url" value="http://www.heatingspareparts.com/index.asp"/>
<list key="crawling_rules">
<parameter key="store_with_matching_url" value=".+\&amp;gc=.+"/>
<parameter key="follow_link_with_matching_url" value=".+suppliername.+|.+\&amp;gc=.+"/>
</list>
<parameter key="output_dir" value="C:\scratch\RapidMiner\Gas Council"/>
<parameter key="extension" value="html"/>
<parameter key="max_depth" value="3"/>
<parameter key="max_page_size" value="10000"/>
<parameter key="user_agent" value="Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; MDDRJS)"/>
</operator>
<connect from_op="Crawl Web" from_port="Example Set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
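One way to narrow this down is to check the two crawling rules against a few URLs outside RapidMiner. The snippet below is only a rough sketch: it assumes the crawler requires a rule to match the whole URL (I have not confirmed this for the Crawl Web operator), and the sample URLs other than the start page are made up for illustration, not taken from the site.

```python
import re

# The two rule patterns from the process (as plain regex text).
STORE_RULE = r".+\&gc=.+"
FOLLOW_RULE = r".+suppliername.+|.+\&gc=.+"


def matches_rule(rule: str, url: str) -> bool:
    # Assumption: a rule must match the entire URL, not just part of it.
    return re.fullmatch(rule, url) is not None


# Hypothetical URLs for illustration; only the first one appears in the process.
SAMPLE_URLS = [
    "http://www.heatingspareparts.com/index.asp",
    "http://www.heatingspareparts.com/parts.asp?suppliername=Example",
    "http://www.heatingspareparts.com/parts.asp?suppliername=Example&gc=12345",
    "http://www.heatingspareparts.com/parts.asp?gc=12345",
]


def report(urls):
    # For each URL, record whether it would be followed and whether it would be stored.
    return [
        (url, matches_rule(FOLLOW_RULE, url), matches_rule(STORE_RULE, url))
        for url in urls
    ]


if __name__ == "__main__":
    for url, follow, store in report(SAMPLE_URLS):
        print(f"follow={follow!s:5} store={store!s:5} {url}")
```

Under that assumption, a URL where `gc` is the first query parameter (`?gc=...` rather than `&gc=...`) matches neither rule, so it may be worth comparing the patterns with the actual link targets in the page source.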
Answers
Does anyone know what might be preventing this process from producing results?
Thanks.