Missing rows with ExampleSource
I am trying to import a large dataset into RM. My source is a CSV file with about 200 rows and approx. 250 columns.
(ExampleCSVSource gives an error complaining that there are different columns in line...)
Using the ExampleSource and the ExampleSource Wizard, I can see in the lower part of the window that there are 189 rows and 251 columns to import, so I click the Finish button.
When I click on the Edit... button to see my dataset, I get a table with all 251 columns, but only 19 examples.
Where are my missing rows? Any help is welcome!
BTW: I am still using version 4.1
Answers
In the AttributeEditor, you can define which rows should be shown and press the Update button in the panel on the left. You could of course also simply load the data and see if all of it is there. Just run the process and check the meta data view and the data view.
If this data set has missing values at the end of the lines, I would suggest upgrading to RapidMiner 4.2, since previous versions had a bug that ignored missing values at the end of lines in CSV files.
Cheers,
Ingo
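To check whether a file has missing values at the end of lines at all, a small script outside RapidMiner can help. This is only a rough sketch in Python and not part of RapidMiner; the function name `rows_with_trailing_missing` and the comma delimiter are assumptions made for illustration, so adjust them to your file.

```python
import csv
import io

def rows_with_trailing_missing(text, delimiter=","):
    """Return the line numbers of rows whose last field is empty.

    Hypothetical helper for illustration; it assumes fields do not
    contain embedded line breaks, so line numbers match rows.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    hits = []
    for row in reader:
        # An empty (or whitespace-only) last field means the line
        # ends with a missing value.
        if row and row[-1].strip() == "":
            hits.append(reader.line_num)
    return hits

# Small made-up sample: lines 3 and 4 end with a missing value.
sample = "a,b,c\n1,2,3\n4,5,\n7,,\n"
print(rows_with_trailing_missing(sample))  # prints [3, 4]
```

For a real file, read its contents first, for example with `open(path, newline="").read()`, and pass the resulting string to the function.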
Another funny thing is that if I import my data into OpenOffice, export it as an XLS file, and load this file with ExcelExampleSource, I get all columns and rows. I have read about that bug in other threads and have switched to version 4.2.
Thanks for your help,
Markus
The attribute editor stops reading if anything goes wrong. So I assume that there is something unusual with line 19 or 20. Probably there is a problem with quoting, or the column separator definition does not match your data format. If you like, and the data is not too sensitive, you could post an excerpt of your data and I could have a look at what the problem might be.
Cheers,
Ingo
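A quick way to locate such a line without reading 250 columns by eye is to count the fields per row and compare them with the header. The following is a minimal sketch in Python, not a RapidMiner feature; the name `find_irregular_rows`, the comma separator, and the double-quote character are assumptions that have to match the actual file.

```python
import csv
import io

def find_irregular_rows(text, delimiter=",", quotechar='"'):
    """Return (line_number, field_count) for rows whose field count
    differs from that of the first row (assumed to be the header).

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text),
                        delimiter=delimiter, quotechar=quotechar)
    expected = None
    irregular = []
    for row in reader:
        if not row:
            continue  # skip completely empty lines
        if expected is None:
            expected = len(row)  # the first row defines the column count
        elif len(row) != expected:
            irregular.append((reader.line_num, len(row)))
    return irregular

# Made-up sample: line 3 contains an unquoted comma inside a name,
# so it splits into 4 fields instead of 3.
sample = (
    "id,name,score\n"
    '1,"Smith, J",3.5\n'
    "2,Jones, A,4.0\n"
    "3,Lee,2.0\n"
)
print(find_irregular_rows(sample))  # prints [(3, 4)]
```

If the reported line is close to where the import stops (here, around line 19 or 20), an unquoted separator or an unbalanced quote on that line is a likely candidate.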
I am pretty sure that your assumption is right. I'll go through the CSV file with a text editor to check the commas and the columns.
Greets,
Markus