How to filter lines with a regexp in RapidMiner?
Hello!
I have ten million txt files in a folder (about 100 KB per file), and I would like to extract particular lines from these files.
In UltraEdit I use this regexp:
<strong class="name".*-id-.*
My problem is the large number of files: UltraEdit fails on them...
How can I filter them? Can RapidMiner do it?
My process is this:
1. Filter the lines matching this regexp from the ten million txt files:
<strong class="name".*-id-.*
2. Write the filtered lines to a new txt file...
Can you help me solve my problem?
Thanks,
Attila
Answers
You can use the Text Processing extension to filter the files. Please have a look at the attached process: inside the Process Documents operator, the Tokenize operator splits each document into separate lines, and the next operator, Filter Tokens, keeps only the lines containing the word "hallo". To match your lines instead, replace that word with your own condition.
Best, Marius
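
For anyone who would rather script the same two steps outside RapidMiner, here is a minimal Python sketch. It is not the attached process, only an illustration of the idea: read each file line by line, keep the lines that match the regexp from the question, and append them to one output file. The function names, the UTF-8 encoding, and the ".txt" suffix check are assumptions made for the example.

```python
import os
import re
from typing import Iterator

# Pattern copied from the question; everything else here is illustrative.
LINE_PATTERN = re.compile(r'<strong class="name".*-id-.*')


def iter_matching_lines(folder: str, pattern: re.Pattern) -> Iterator[str]:
    """Yield every line in the folder's .txt files that matches the pattern."""
    # os.scandir streams directory entries instead of building a full list,
    # which helps when the folder holds millions of files.
    with os.scandir(folder) as entries:
        for entry in entries:
            if not (entry.is_file() and entry.name.endswith(".txt")):
                continue
            # Assumption: files are UTF-8; undecodable bytes are replaced.
            with open(entry.path, encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    if pattern.search(line):
                        yield line.rstrip("\n")


def filter_folder(folder: str, output_path: str,
                  pattern: re.Pattern = LINE_PATTERN) -> int:
    """Write all matching lines to output_path and return how many were written."""
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for line in iter_matching_lines(folder, pattern):
            out.write(line + "\n")
            count += 1
    return count
```

Calling filter_folder("input_folder", "filtered/result.txt") would process the folder in one pass (both paths are placeholders, and the output directory must already exist). Keep the output file outside the input folder, otherwise it may be picked up as one of the input files while it is being written.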