Problem with the Get Pages operator in combination with Read CSV (duplicate attribute)
informatist
I just started using RapidMiner and I have a problem with the "Get Pages" operator. When I run my process, RapidMiner reports "Process failed. Duplicate attribute name: URL". I'm starting with a CSV file that has names in the first column. The second column is called "URL" and is set to the "file path" type in the first operator (Read CSV). It contains links that I want to open with the Get Pages operator. In Get Pages, I selected URL as the link attribute. I hope you can help me. The whole error message is:

Exception: java.lang.IllegalArgumentException
Message: Duplicate attribute name: URL
Stack trace:
  com.rapidminer.example.SimpleAttributes.register(SimpleAttributes.java:124)
  com.rapidminer.example.SimpleAttributes.add(SimpleAttributes.java:203)
  com.rapidminer.example.AbstractAttributes.addRegular(AbstractAttributes.java:94)
  com.rapidminer.operator.web.features.construction.RetrievePagesOperator.doWork(RetrievePagesOperator.java:124)
  com.rapidminer.operator.Operator.execute(Operator.java:1002)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:76)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:811)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:806)
  java.security.AccessController.doPrivileged(Native Method)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:806)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:392)
  com.rapidminer.operator.Operator.execute(Operator.java:1002)
  com.rapidminer.Process.run(Process.java:1195)
  com.rapidminer.Process.run(Process.java:1091)
  com.rapidminer.Process.run(Process.java:1044)
  com.rapidminer.Process.run(Process.java:1039)
  com.rapidminer.Process.run(Process.java:1029)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)
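Reading the trace from the top, `RetrievePagesOperator.doWork` (the Get Pages operator) calls `AbstractAttributes.addRegular`, which ends in `SimpleAttributes.register` throwing the exception. That suggests Get Pages tries to add a new regular attribute named URL to an example set that already contains one. The Python sketch below is only a model of that check, not RapidMiner's Java code: the class `AttributeSet` and its method `add_regular` are hypothetical, and a `ValueError` stands in for Java's `IllegalArgumentException`.

```python
class AttributeSet:
    """Hypothetical model of an attribute container that rejects duplicate names."""

    def __init__(self, names=None):
        self._names = []
        for name in names or []:
            self.add_regular(name)

    def add_regular(self, name):
        # Modelled on the error message: adding a second attribute with an
        # existing name raises instead of silently replacing the first one.
        if name in self._names:
            raise ValueError(f"Duplicate attribute name: {name}")
        self._names.append(name)

    @property
    def names(self):
        return list(self._names)


# Column names configured in Read CSV in the posted process.
example_set = AttributeSet(["Link", "URL"])

try:
    # Assumption drawn from the trace: Get Pages adds an attribute called "URL".
    example_set.add_regular("URL")
except ValueError as err:
    print(err)  # Duplicate attribute name: URL
```

In this model, starting from `AttributeSet(["Link", "link"])` instead lets the same `add_regular("URL")` call succeed, which is the idea behind renaming the input column.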
My XML process:
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="7.2.003" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
        <parameter key="csv_file" value="/Users/test/agrar Kopie.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Link.true.polynominal.attribute"/>
          <parameter key="1" value="URL.true.file_path.base_value"/>
        </list>
        <parameter key="datamanagement" value="float_array"/>
      </operator>
      <operator activated="true" class="filter_example_range" compatibility="7.2.003" expanded="true" height="82" name="Filter Example Range" width="90" x="246" y="34">
        <parameter key="first_example" value="-2"/>
        <parameter key="last_example" value="-1"/>
      </operator>
      <operator activated="true" class="web:retrieve_webpages" compatibility="7.2.001" expanded="true" height="68" name="Get Pages" width="90" x="514" y="34">
        <parameter key="link_attribute" value="URL"/>
        <parameter key="random_user_agent" value="true"/>
        <parameter key="accept_cookies" value="all"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
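In the process above, Read CSV stores each column as an entry in its `data_set_meta_data_information` list. The following is a minimal Python sketch, assuming each value has the form name.selected.type.role (as the two entries in the process suggest), that lists the configured column names and the Get Pages `link_attribute`, so they can be compared with the name reported in the exception. `column_names` and `link_attribute` are hypothetical helpers, and the embedded XML is a trimmed copy of the posted process containing only the parts the sketch reads.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted process: only the elements this sketch reads.
PROCESS_XML = """<process version="7.2.003">
  <operator class="process" name="Process">
    <process>
      <operator class="read_csv" name="Read CSV">
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Link.true.polynominal.attribute"/>
          <parameter key="1" value="URL.true.file_path.base_value"/>
        </list>
      </operator>
      <operator class="web:retrieve_webpages" name="Get Pages">
        <parameter key="link_attribute" value="URL"/>
      </operator>
    </process>
  </operator>
</process>"""


def column_names(root):
    """Return the attribute names configured in Read CSV's meta data list.

    Assumes each entry looks like name.selected.type.role; rsplit keeps
    names that themselves contain dots intact.
    """
    names = []
    for op in root.iter("operator"):
        if op.get("class") != "read_csv":
            continue
        for lst in op.iter("list"):
            if lst.get("key") == "data_set_meta_data_information":
                for param in lst.findall("parameter"):
                    names.append(param.get("value").rsplit(".", 3)[0])
    return names


def link_attribute(root):
    """Return the link_attribute parameter of the Get Pages operator, if any."""
    for op in root.iter("operator"):
        if op.get("class") == "web:retrieve_webpages":
            for param in op.findall("parameter"):
                if param.get("key") == "link_attribute":
                    return param.get("value")
    return None


root = ET.fromstring(PROCESS_XML)
print(column_names(root))    # ['Link', 'URL']
print(link_attribute(root))  # URL
```

With the posted configuration, the column that Get Pages reads its links from carries exactly the name that the exception reports as a duplicate.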
I hope someone can help me with this problem.
Thank you in advance!
Answers
What happens if you change the name of the second column in the raw CSV file to something else, like "link"? Does the error still occur?
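The suggestion above can be tried without editing the file by hand. This is a minimal Python sketch, assuming the first row of the CSV holds the column names; `rename_column` is a hypothetical helper and the sample data is made up in the shape described in the question.

```python
import csv
import io


def rename_column(csv_text, old_name, new_name, delimiter=","):
    """Return csv_text with the header cell old_name renamed to new_name.

    Assumes the first row of the file holds the column names; data rows
    are copied unchanged.
    """
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if not rows:
        return csv_text
    rows[0] = [new_name if cell == old_name else cell for cell in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical two-column sample: names first, links second.
sample = "Name,URL\nExample farm,https://example.com/\n"
print(rename_column(sample, "URL", "link"), end="")
# Name,link
# Example farm,https://example.com/
```

For the real file, read it with `open(path, encoding="utf-8", newline="")` and write the result to a new file. The thread does not say whether Read CSV takes the name from the file header or from its stored column configuration, so the column name there and the `link_attribute` parameter of Get Pages may need to be changed to match as well.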
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts