read csv file skip first n lines
Telcontar120
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
The Read CSV operator should be given a parameter option to skip the first n lines (often header lines).
There is already an option to skip comment lines, but if the lines to be skipped carry no comment indicator, users have to edit each file manually to add one, which is not efficient for automated processing of large numbers of files.
Instead, if the operator could automatically skip the first n lines, take the header from row n+1, and read all data normally thereafter, it would drastically improve the efficiency of working with CSV files.
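To make the requested behavior concrete, here is a minimal Python sketch of it, outside RapidMiner. The function name `read_csv_skipping` and the parameter `n_skip` are made up for illustration and are not an existing Read CSV parameter. It assumes the skipped lines are plain preamble lines, one per physical line.

```python
import csv
import io
from itertools import islice


def read_csv_skipping(handle, n_skip, delimiter=","):
    """Skip the first n_skip lines, take the header from the next row,
    and return the remaining rows as dictionaries."""
    # Consume the first n_skip raw lines without parsing them.
    for _ in islice(handle, n_skip):
        pass
    # DictReader uses the next row (row n_skip + 1) as the header.
    reader = csv.DictReader(handle, delimiter=delimiter)
    return list(reader)


# Two preamble lines with no comment character, then header and data.
sample = io.StringIO(
    "Exported by some tool\n"
    "Generated 2024-01-01\n"
    "id,value\n"
    "1,10\n"
    "2,20\n"
)
rows = read_csv_skipping(sample, n_skip=2)
```

Because the function takes an open file handle rather than a path, the same call can be reused unchanged for every file in a batch.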
Comments
Brian,
you can set the first n lines to "Comment" in the import wizard.
That should do the trick.
Dortmund, Germany
@mschmitz Thanks, this actually pointed me to the answer. I wanted to avoid running the wizard, since the operator was going to sit in a loop using the "file" input rather than pointing to a specific file. I think you can still accomplish the same thing without the wizard by using the "Annotations" parameter and setting the first lines to comment.
I was getting hung up before because there is a separate parameter for a comment character, which I didn't want to have to add manually, but I tested using this method and it appears to work, starting the import on the specified line and taking the correct number of columns from that. So thanks for the pointer!
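For readers who want to see the idea behind the workaround, here is a rough Python analogy. It is not RapidMiner's implementation. The labels "Comment" and "Name" and the function `read_with_annotations` are assumptions chosen for this sketch: each row index can be annotated, annotated comment rows are dropped, and the row annotated as the name row supplies the column names.

```python
import csv
import io


def read_with_annotations(handle, annotations):
    """Read CSV rows, using a {row_index: label} mapping to decide
    which rows to drop and which row holds the column names."""
    header = None
    records = []
    for index, row in enumerate(csv.reader(handle)):
        role = annotations.get(index)
        if role == "Comment":
            continue  # annotated as comment: ignore the row entirely
        if role == "Name":
            header = row  # annotated as name row: use it as the header
            continue
        if header is None:
            raise ValueError("data row reached before a 'Name' row")
        records.append(dict(zip(header, row)))
    return records


# The same annotations are applied to every file in the loop,
# so no file has to be edited by hand.
files = {
    "a.csv": "report A\nunits: mm\nid,value\n1,10\n",
    "b.csv": "report B\nunits: mm\nid,value\n2,20\n",
}
annotations = {0: "Comment", 1: "Comment", 2: "Name"}
results = {
    name: read_with_annotations(io.StringIO(text), annotations)
    for name, text in files.items()
}
```

The point of the sketch is that the annotation is tied to the row position, not to a comment character in the file, which is why it works on untouched files.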
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
workaround available