Search for specific pattern within strings of characters
komal_chenthama
Member Posts: 3 Learner III
Hi,
I have a table with thousands of rows, each containing a long string of characters without spaces. An example of a single row is below:
"MKFFAAAALFATSAMAAVCPDGGLFSNPLCCSSILLEAVGLDCTTPTAPVVTAGLFQANCASIGKQPACCVAPLAGQGILCNNPAGT"
I would like to filter out all the rows in my table that have the following pattern: C...CC...C..C..CC..C, where "." represents any character any number of times. Could anyone kindly suggest an operator or combination of operators for this task?
Answers
Hi @komal_chenthama
The Filter Examples operator lets you match rows against a regular expression. I can build one to give you an example. In your question, does C mean the literal character "C", or can it be any character?
@sgenzer (do you mind if I write a tutorial on regular expressions with RapidMiner, to be included in the RapidMiner documentation?)
All the best,
Rodrigo.
Hi @komal_chenthama,
Filter Examples with expression and matches(..) is the way to go. Attached is an example process.
~Martin
Dortmund, Germany
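One thing to watch for with this approach: as far as I know, matches(..) in the expression language behaves like Java's String.matches, meaning the regular expression has to cover the whole value, not just part of it. Treat that as an assumption and check it on your own data. Here is a small Python sketch of the difference, using the row from the question. The helper name matches_whole is made up for illustration, and the pattern reads each run of dots in the question as "zero or more characters".

```python
import re

# Sample row copied from the question.
ROW = ("MKFFAAAALFATSAMAAVCPDGGLFSNPLCCSSILLEAVGLDCTTPTAPVVTAGLFQANCASIGKQPACC"
       "VAPLAGQGILCNNPAGT")

# The question's pattern, with every gap read as "any characters, zero or more".
CORE = r"C.*CC.*C.*C.*CC.*C"


def matches_whole(regex: str, value: str) -> bool:
    """Whole-string match (the semantics assumed here for matches(..))."""
    return re.fullmatch(regex, value) is not None


# The bare pattern fails because the row neither starts nor ends with C.
print(matches_whole(CORE, ROW))                # False
# Allowing arbitrary text before and after the pattern makes it succeed.
print(matches_whole(".*" + CORE + ".*", ROW))  # True
```

If matches(..) does work this way, wrapping the pattern in a leading and trailing .* inside Filter Examples should give the "contains this pattern" behaviour you are after.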
Hi @rfuentealba
Thanks for the quick response. In my example, I mean particularly the character "C".
Cheers,
Komal
PS: I would very much appreciate a tutorial on regular expressions with RapidMiner.
Hi @komal_chenthama
The regular expression you are looking for is pretty simple:
C(.+)CC(.+)C(.+)C(.+)CC(.+)C
Here is a screenshot of how it works when used in the Replace operator.
See? It matches only the pattern you want.
Hope it helps.
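For anyone who wants to try the pattern outside RapidMiner, here is a rough Python sketch that splits rows into matching and non-matching ones. It also shows one subtlety: .+ needs at least one character in every gap, while the question says "any number of times", which could include zero (that would be .*). The function name split_rows and the second and third rows are invented for illustration; only the first row comes from the question.

```python
import re

STRICT = re.compile(r"C.+CC.+C.+C.+CC.+C")  # at least one character in each gap
LOOSE = re.compile(r"C.*CC.*C.*C.*CC.*C")   # gaps are allowed to be empty


def split_rows(rows, pattern):
    """Return (matching, non_matching), searching anywhere inside each row."""
    hits = [row for row in rows if pattern.search(row)]
    misses = [row for row in rows if not pattern.search(row)]
    return hits, misses


rows = [
    # Row from the question.
    "MKFFAAAALFATSAMAAVCPDGGLFSNPLCCSSILLEAVGLDCTTPTAPVVTAGLFQANCASIGKQPACC"
    "VAPLAGQGILCNNPAGT",
    "CCCCCCCC",         # hypothetical row: eight C's with empty gaps
    "MKFFAAAALFATSAM",  # hypothetical row: no C at all
]

strict_hits, strict_misses = split_rows(rows, STRICT)
loose_hits, loose_misses = split_rows(rows, LOOSE)

print(len(strict_hits), len(strict_misses))  # 1 2
print(len(loose_hits), len(loose_misses))    # 2 1
```

Whether you keep the hits or the misses depends on what "filter out" means for your table. In Filter Examples that would correspond to inverting the condition.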