
Problem with the Get Pages operator in combination with Read CSV (duplicate attribute)

informatist Member Posts: 1 Learner III
edited June 2019 in Help

I just started using RapidMiner and I have a problem with the "Get Pages" operator. When I run my process, it fails with "Process failed. Duplicate attribute name: URL".

My input is a CSV file. The first column contains names. The second column, called "URL", contains the links I want to open with "Get Pages". In the first operator (Read CSV) this column is classified as a "file path" attribute, and in the "Get Pages" operator I selected URL as the link attribute. I hope you can help me. The full error message is:

Exception: java.lang.IllegalArgumentException
Message: Duplicate attribute name: URL
Stack trace:
  com.rapidminer.example.SimpleAttributes.register(SimpleAttributes.java:124)
  com.rapidminer.example.SimpleAttributes.add(SimpleAttributes.java:203)
  com.rapidminer.example.AbstractAttributes.addRegular(AbstractAttributes.java:94)
  com.rapidminer.operator.web.features.construction.RetrievePagesOperator.doWork(RetrievePagesOperator.java:124)
  com.rapidminer.operator.Operator.execute(Operator.java:1002)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:76)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:811)
  com.rapidminer.operator.ExecutionUnit$3.run(ExecutionUnit.java:806)
  java.security.AccessController.doPrivileged(Native Method)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:806)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:392)
  com.rapidminer.operator.Operator.execute(Operator.java:1002)
  com.rapidminer.Process.run(Process.java:1195)
  com.rapidminer.Process.run(Process.java:1091)
  com.rapidminer.Process.run(Process.java:1044)
  com.rapidminer.Process.run(Process.java:1039)
  com.rapidminer.Process.run(Process.java:1029)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)

My XML process:

<?xml version="1.0" encoding="UTF-8"?>
<process version="7.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="7.2.003" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
        <parameter key="csv_file" value="/Users/test/agrar Kopie.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Link.true.polynominal.attribute"/>
          <parameter key="1" value="URL.true.file_path.base_value"/>
        </list>
        <parameter key="datamanagement" value="float_array"/>
      </operator>
      <operator activated="true" class="filter_example_range" compatibility="7.2.003" expanded="true" height="82" name="Filter Example Range" width="90" x="246" y="34">
        <parameter key="first_example" value="-2"/>
        <parameter key="last_example" value="-1"/>
      </operator>
      <operator activated="true" class="web:retrieve_webpages" compatibility="7.2.001" expanded="true" height="68" name="Get Pages" width="90" x="514" y="34">
        <parameter key="link_attribute" value="URL"/>
        <parameter key="random_user_agent" value="true"/>
        <parameter key="accept_cookies" value="all"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


I hope someone can help me with this problem.

Thank you in advance!

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    What happens if you rename the second column in the raw CSV to something else, like "link"? Does the error still occur?

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
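A minimal sketch of how the same test could be run inside the process instead of editing the raw CSV: insert a Rename operator between Filter Example Range and Get Pages, and point the link_attribute of Get Pages at the new name. This assumes the standard Rename operator (class "rename" with the old_name/new_name parameters) is available in your RapidMiner version and accepts the URL column even though Read CSV assigns it a special role. The new name "page_url" and the Rename operator's position values are hypothetical choices (a name other than "link" is used here only to avoid confusion with the existing "Link" column). Whether renaming actually avoids the duplicate-attribute error is exactly what the question above is meant to find out.

```xml
<!-- Hypothetical: rename the "URL" attribute before Get Pages processes the example set. -->
<operator activated="true" class="rename" compatibility="7.2.003" expanded="true" height="82" name="Rename" width="90" x="380" y="34">
  <parameter key="old_name" value="URL"/>
  <parameter key="new_name" value="page_url"/>
</operator>
<!-- Get Pages now reads its links from the renamed attribute. -->
<operator activated="true" class="web:retrieve_webpages" compatibility="7.2.001" expanded="true" height="68" name="Get Pages" width="90" x="514" y="34">
  <parameter key="link_attribute" value="page_url"/>
  <parameter key="random_user_agent" value="true"/>
  <parameter key="accept_cookies" value="all"/>
</operator>
<!-- Rewired chain: Filter Example Range -> Rename -> Get Pages. -->
<connect from_op="Filter Example Range" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Get Pages" to_port="Example Set"/>
```

These elements would replace the existing Get Pages operator and the direct Filter Example Range to Get Pages connection in the posted process; the Read CSV operator and the remaining connections stay as they are.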