CSV with uncommon header can't be processed correctly
mugicagonzalez_
Member Posts: 14 Contributor I
Hi all,
I am using the "Read CSV" operator to read a CSV file with multiple lines. The first few lines contain technical information that is not in valid CSV format, so I annotated them as Comment. But then only the first column of the last row, which holds the values, is read.
Is this a common error? I suspect it happens because the file has several lines with different numbers of columns, but since I annotated those lines as Comment, I don't understand why it doesn't work.
This is my operator for "TEST_Jette.csv":
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="read_csv" compatibility="8.1.003" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34"> <parameter key="csv_file" value="/Users/pello/Downloads/TEST_Jette.csv"/> <parameter key="skip_comments" value="true"/> <parameter key="parse_numbers" value="false"/> <parameter key="decimal_character" value=","/> <parameter key="first_row_as_names" value="false"/> <list key="annotations"> <parameter key="0" value="Comment"/> <parameter key="1" value="Comment"/> <parameter key="2" value="Comment"/> <parameter key="3" value="Comment"/> <parameter key="4" value="Comment"/> <parameter key="5" value="Comment"/> <parameter key="6" value="Comment"/> <parameter key="7" value="Comment"/> <parameter key="8" value="Comment"/> <parameter key="9" value="Comment"/> <parameter key="10" value="Comment"/> <parameter key="11" value="Comment"/> <parameter key="12" value="Comment"/> <parameter key="13" value="Comment"/> <parameter key="14" value="Comment"/> <parameter key="15" value="Comment"/> <parameter key="16" value="Comment"/> <parameter key="17" value="Name"/> </list> <parameter key="encoding" value="UTF-8"/> <parameter key="read_all_values_as_polynominal" value="true"/> <list key="data_set_meta_data_information"> <parameter key="0" value="timestamp.true.polynominal.attribute"/> </list> </operator> <connect from_op="Read CSV" from_port="output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Thanks in advance
Pello
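One way to check the suspicion about differing column counts, outside RapidMiner, is to count the delimiter-separated fields in each row of the file. Below is a minimal Python sketch, assuming ";" is the column separator; the function name and the sample text are hypothetical and not taken from TEST_Jette.csv.

```python
import csv
import io


def fields_per_row(text, delimiter=";"):
    """Return (row_index, field_count) pairs as parsed by Python's csv module."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [(index, len(row)) for index, row in enumerate(reader)]


# Hypothetical sample: two free-form preamble lines, then a header and one data row.
sample = "Device report\nserial: 123\ntimestamp;value;unit\n2018-01-01;1,5;V\n"

for index, count in fields_per_row(sample):
    print(index, count)
```

Rows whose field count differs from the header row are the ones a CSV reader is likely to stumble over, so the output hints at where the real table starts.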
Best Answer
mugicagonzalez_ Member Posts: 14 Contributor I
SOLVED! Thanks to jczgalla (can't post link to thread)!
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process"> <process expanded="true"> <operator activated="true" class="open_file" compatibility="8.1.003" expanded="true" height="68" name="Open File" width="90" x="45" y="34"> <parameter key="filename" value="/Users/pello/Downloads/TEST_Jette.csv"/> </operator> <operator activated="true" class="text:read_document" compatibility="8.1.000" expanded="true" height="68" name="Read Document" width="90" x="179" y="34"> <parameter key="extract_text_only" value="false"/> </operator> <operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document" width="90" x="313" y="34"> <parameter key="query_type" value="Regular Expression"/> <list key="string_machting_queries"/> <list key="regular_expression_queries"> <parameter key="text" value="((?:[^&quot;]+?|&quot;(.|\n)*?&quot;|)*?)\n"/> </list> <list key="regular_region_queries"/> <list key="xpath_queries"/> <list key="namespaces"/> <list key="index_queries"/> <list key="jsonpath_queries"/> <process expanded="true"> <operator activated="true" class="text:remove_document_parts" compatibility="8.1.000" expanded="true" height="68" name="Remove Document Parts" width="90" x="45" y="34"> <parameter key="deletion_regex" value="&quot;"/> </operator> <connect from_port="segment" to_op="Remove Document Parts" to_port="document"/> <connect from_op="Remove Document Parts" from_port="document" to_port="document 1"/> <portSpacing port="source_segment" spacing="0"/> <portSpacing port="sink_document 1" spacing="0"/> <portSpacing port="sink_document 2" spacing="0"/> </process> </operator> <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="447" y="34"> <parameter key="text_attribute" value="text"/> </operator> 
<operator activated="true" class="select_attributes" compatibility="8.1.003" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="text"/> </operator> <operator activated="true" class="filter_example_range" compatibility="8.1.003" expanded="true" height="82" name="Filter Example Range" width="90" x="715" y="34"> <parameter key="first_example" value="18"/> <parameter key="last_example" value="19"/> </operator> <operator activated="true" class="split" compatibility="8.1.003" expanded="true" height="82" name="Split" width="90" x="849" y="34"> <parameter key="split_pattern" value=";"/> </operator> <operator activated="true" class="rename_by_example_values" compatibility="8.1.003" expanded="true" height="82" name="Rename by Example Values" width="90" x="983" y="34"/> <connect from_op="Open File" from_port="file" to_op="Read Document" to_port="file"/> <connect from_op="Read Document" from_port="output" to_op="Cut Document" to_port="document"/> <connect from_op="Cut Document" from_port="documents" to_op="Documents to Data" to_port="documents 1"/> <connect from_op="Documents to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/> <connect from_op="Filter Example Range" from_port="example set output" to_op="Split" to_port="example set input"/> <connect from_op="Split" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/> <connect from_op="Rename by Example Values" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
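For readers who want to follow the logic of the process above without opening it in RapidMiner, here is a rough Python sketch of what the operator chain appears to do: cut the file into lines, strip the double quotes, keep only examples 18 to 19, split on ";", and use the first kept row as column names. This is an illustration under assumptions, not a translation of the operators: the function name and sample data are hypothetical, and the sketch splits on plain line breaks, whereas the regular expression in Cut Document also seems intended to tolerate line breaks inside quoted fields.

```python
def parse_table(text, first=18, last=19, delimiter=";"):
    """Keep rows first..last (1-based, inclusive) and use the first kept row as header."""
    # Cut the text into lines and drop double quotes
    # (roughly Cut Document plus Remove Document Parts).
    lines = [line.replace('"', "") for line in text.splitlines()]
    # Keep only the requested range (roughly Filter Example Range).
    kept = lines[first - 1:last]
    # Split each remaining line into fields (roughly Split).
    rows = [line.split(delimiter) for line in kept]
    if not rows:
        return []
    # Treat the first kept row as column names (roughly Rename by Example Values).
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Hypothetical sample: 17 preamble lines, then a header row and one data row.
preamble = [f"info {i}" for i in range(17)]
sample = "\n".join(preamble + ['"timestamp";"value"', '"2018-01-01";"1,5"'])

print(parse_table(sample))
```

The row range is hard-coded here just as it is in the process, so both break if the number of preamble lines changes.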
Answers
[Helpful hint from community manager - if you just "like" a few posts or mark something as solution or practically anything else, you will gain points and move way beyond Newbie quickly!!]
Scott
RapidMiner Studio 9.1 will feature a better way of skipping lines and defining the header row, together with the structural changes that come with it. So the workaround above will soon no longer be necessary.
Regards,
Marco