Search Keywords from a file
7amritaarora7
Hi
I'm working on a project in which I have to search for a predetermined set of keywords. The list of keywords is updated regularly and is stored as a column in a database. I can search for each keyword individually using regex, but is there a way to search for all the keywords in the file together?
Thanks in advance
Amrita
Answers
Quick way, misusing operators:
take your list of regexes, aggregate them together using concat, and apply all of them at once.
Or use a Loop Values operator.
~Martin
Dortmund, Germany
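To illustrate the "concat" idea outside of RapidMiner, here is a minimal Python sketch. It assumes the keywords are plain strings (not regexes) and joins them into a single alternation pattern. The keyword list and the function name `build_keyword_pattern` are made up for illustration.

```python
import re

def build_keyword_pattern(keywords):
    """Join plain-text keywords into one case-insensitive alternation pattern."""
    # Escape each keyword so characters such as '+' or '.' are matched literally.
    # Longer keywords come first so a longer keyword wins over a shorter prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    escaped = [re.escape(k) for k in ordered]
    return re.compile("|".join(escaped), re.IGNORECASE)

# Hypothetical keyword list, e.g. read from the database column.
keywords = ["RapidMiner", "Hadoop", "Spark"]
pattern = build_keyword_pattern(keywords)

# findall returns every match in order of appearance, as written in the text.
matches = pattern.findall("I like RapidMiner and Spark, but not hadoop.")
print(matches)
```

Because the pattern is rebuilt from the list each time, an updated keyword list only requires re-running `build_keyword_pattern`.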
Use "Filter Stopwords (Dictionary)" in the Text Processing extension? You may need to invert it depending on what you want to do.
Hi @mschmitz @sgenzer
Thanks for your replies.
I tried your solutions. This is the status now:
1. Concat
It shows the results for all attributes together, but I need to know exactly which keyword was found and its frequency, ideally in a separate column. Any help there?
+ This is a temporary solution, because the keywords are updated dynamically. So, any idea how to search for all the keywords stored in a file?
2. Filter Stopwords (Dictionary)
This is the first solution I thought about too, but this operator has no option to invert the selection. Is there any other solution?
3. I'm trying the Loop Values operator, but need further help with that.
Thanks in advance
Amrita
hmm...looking for an elegant solution. I know this sounds weird, but maybe try this:
- take your text and Split it (by space or whatever delimiter fits). This will create a ton of attributes.
- Transpose this mess so that your text is listed word by word in one attribute with a ton of examples...sort of like this:
I
am
Scott
and
I
like
RapidMiner
- Do a Join (inner) with your keyword database list to see overlap
- Aggregate if desired to see frequencies
I do this more and more - create master "lookup" data sets and then join. It's quite versatile.
Scott
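The Split / Transpose / Join / Aggregate steps above can be sketched in plain Python as follows. This is only an illustration of the lookup-and-join idea, not a RapidMiner process. The function name `keyword_frequencies` is hypothetical, and matching is exact and case-sensitive, with no punctuation stripping.

```python
from collections import Counter

def keyword_frequencies(text, keywords):
    """Count how often each keyword appears as a whole token in the text."""
    # "Split" + "Transpose": one token per entry, split on whitespace.
    tokens = text.split()
    # "Join (inner)": keep only tokens that also appear in the keyword list.
    lookup = set(keywords)
    matched = [token for token in tokens if token in lookup]
    # "Aggregate": count occurrences per keyword.
    return Counter(matched)

freq = keyword_frequencies(
    "I am Scott and I like RapidMiner",
    ["RapidMiner", "Hadoop", "Spark"],
)
print(freq)
```

Keywords that never occur are simply absent from the result, which mirrors what an inner join does.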
Thanks for your help and sorry for my late reply.
When I received your reply, I was already working along the same lines. With your help and some trial and error, part of my process is done.
The last part I am stuck on is searching for the contents of one attribute (i.e. Keyword) in another attribute (i.e. Text). I tried the Generate Attributes and Filter Examples operators, but didn't get the required results. The regex functions based on matches and contains are not working either, since they search for a fixed word in an attribute, not for the value of one attribute inside another.
Any help here would be greatly appreciated!
Thanks in advance
Amrita
My initial thought to attacking this problem is this:
1. Read Database for the keywords and then use an Extract Macro set to Data Value. Give this macro a name "keyword_macro" and then this will extract the keyword list and associate it
2. Use a Loop with a Filter Examples embedded inside. Loop over the keywords and drop the macro value into the Filter Examples. Use the "contains" filter and set it to %{keyword_macro}.
3. Then outside the loop use an Append operator to Append all the matching results.
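As a rough Python analogue of the three steps above (loop over keywords, filter with "contains", append the matches), assuming the texts are plain strings: the variable `keyword` plays the role of %{keyword_macro}, and the function name `match_rows` is made up for this sketch.

```python
def match_rows(texts, keywords):
    """Return (keyword, text) pairs for every text that contains the keyword."""
    matched = []
    for keyword in keywords:        # the Loop over keyword values
        for text in texts:          # Filter Examples with a "contains" condition
            if keyword in text:
                # Append: keep the keyword next to the matching text,
                # so the result shows which keyword caused the match.
                matched.append((keyword, text))
    return matched

texts = ["RapidMiner runs on Hadoop", "Nothing to see here"]
rows = match_rows(texts, ["RapidMiner", "Hadoop", "Spark"])
print(rows)
```

Note that "contains" is a substring test, so it returns the matching texts rather than counts, and it also matches keywords embedded in longer words.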
Ok, try this. Just create a text file with a single column (see below), then import it and save it in a repository.
Keywords
RapidMiner
Hadoop
Spark
Update: Just swap out the Search Twitter operator for the data store of the strings you want to search.
Hi @Thomas_Ott
I tried this with my database, but I'm not getting the required results. I'm getting parts of the text as the result. What I need is which keyword appears in the text and how many times.
The other way I think might work is to use the Loop Values operator to extract all values of the keyword attribute from the database. Then, inside Loop Values, add a Generate Attributes operator with a regex that searches for the keyword macro within the text attribute. Is this the correct way? If yes, I need some help with the regex.
Or is there any other way?
Thanks in advance
Regards
Amrita
Hi @7amritaarora7,
I did something like this for a customer once, but I can't seem to find the process now. I think the attached process is getting close; we just have to figure out how to properly select the columns.
In this particular process I added the Process Documents from Data operator and set it to Term Occurrences. It definitely gets all the occurrences of RapidMiner/Hadoop/Spark, but the problem arises when I have the term "stratahadoop." Maybe you can take it from here and experiment; I'm tied up for the rest of the week.
Update: Do a search on the forum for this. http://community.rapidminer.com/t5/RapidMiner-Studio/SOLVED-Simple-word-count-of-wordlist-from-document/m-p/25054#M18530
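The "stratahadoop" problem is the difference between substring matching and whole-word matching. A small Python sketch of that difference, assuming a regex engine with `\b` word boundaries (the helper name `count_whole_word` is hypothetical):

```python
import re

def count_whole_word(text, keyword):
    """Count case-insensitive occurrences of keyword as a whole word."""
    # \b only matches between a word character and a non-word character,
    # so 'hadoop' inside 'stratahadoop' is not counted.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

text = "Hadoop is used by stratahadoop and hadoop users"

# Plain substring counting also picks up the embedded occurrence.
substring_hits = text.lower().count("hadoop")
whole_word_hits = count_whole_word(text, "hadoop")
print(substring_hits, whole_word_hits)
```

Java-style regexes, which RapidMiner uses, support `\b` as well, so the same pattern idea should carry over to a regex built from the keyword macro, although that is not tested here.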
Hi @Thomas_Ott
Thanks a lot for this process. This seems really close to the results that I want. I'll work on it from here and let you know, when the perfect solution is found.
Regards
Amrita
@Thomas_Ott, @mschmitz & @sgenzer
Hi
Good news! I was able to build the keyword search process by regularly updating the database with the Execute SQL operator and then going through all keywords with the Loop Values operator.
Some sad news! Because I am using an older version of RapidMiner, the process suggested by @Thomas_Ott for the frequency count of keywords does not work. So I'm trying to write a script that finds the keyword frequency and run it with the Execute Script operator. The issue is an error regarding the output transfer: Execute Script is unable to show its output when connected to the results.
Is this the right way, or is there any other way?
P.S.: I'm using RapidMiner version 5.3
Thanks in advance
Regards
Amrita
Dear Amrita,
if you can provide an example process, I would be happy to have a look at your Execute Script.
~Martin
Dortmund, Germany
Hi Martin
I'm attaching the process that I'm testing with Execute Script.
Please have a look at it. When I run this process, this is the error I get in the log:
WARNING: Unknown result: class java.lang.String: Occurence of 'here' in String is 1
Regards
Amrita
Hi Amrita,
Couldn't you just upgrade to version 7.2? Version 5.3 is really, really old and you are missing A LOT of great new performance/operator/extension enhancements.
Hi @Thomas_Ott
Yes, 7.2 is a much better version and I use it too. But for the server, I only have complete access to version 5.3. The newest server has some limitations unless it's purchased. So, until my company purchases it, I have to use the older version.
If there's a way to connect RapidMiner 7.2 to the RapidAnalytics server, do let me know.
Regards
Amrita
Hi Amrita,
Unfortunately, you can't connect Studio 5.3 to Server 7.2. We offer a FREE version of RapidMiner Server now, but it is limited to 1000 API calls, 2 GB of memory, and 1 logical core. If your employer is using RapidAnalytics and gets value ($$$) from it, it would be great if you guys upgraded so we can continue to innovate!
Hi @Thomas_Ott
We have already planned to buy that within the next few months, but for now I'm looking for an interim solution.
Regards
Amrita
Amrita,
the attached code works for me. I do not know exactly what I changed, apart from adding the LogService. Those messages are visible in the Log panel of RM. If this is what you would like to have, I would build it with two example set inputs (1. texts, 2. list of keywords) and one example set output (input + count_XXX).
Is our sales team already in contact with you? If not, please reach out to me at mschmitz at rapidminer dot com. We might simply help you get your use case implemented to convince the business.
~Martin
Dortmund, Germany
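The attached script is not reproduced here. As a rough sketch of the shape described (texts plus a keyword list in, the input plus one count_XXX column per keyword out), here is a Python version that assumes rows are plain dictionaries with a "text" field. It is not the Execute Script code itself, and the function name `add_keyword_counts` is made up for illustration.

```python
import re

def add_keyword_counts(rows, keywords, text_field="text"):
    """Return copies of the rows with one count_<keyword> column per keyword."""
    result = []
    for row in rows:
        out = dict(row)  # keep all input attributes unchanged
        for keyword in keywords:
            # Whole-word, case-insensitive match of the literal keyword.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            hits = re.findall(pattern, row[text_field], flags=re.IGNORECASE)
            out["count_" + keyword] = len(hits)
        result.append(out)
    return result

# Input 1: texts. Input 2: list of keywords. Both are invented sample data.
rows = [
    {"id": 1, "text": "Spark and Hadoop, then Spark again"},
    {"id": 2, "text": "Nothing here"},
]
counted = add_keyword_counts(rows, ["Hadoop", "Spark"])
for row in counted:
    print(row)
```

Returning a table instead of writing to the log is what lets the result be stored in a database or passed on to other operators.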
Hi Martin
Thanks a lot for this. This works great. I'll take it from here and join it to the main process. A small clarification: the results shown in the log can be transferred to a database or any other form of output, right?
And yes, I'm in contact with your sales team.
Thanks again
Regards
Amrita
Amrita,
sure. I will have a look tomorrow. It takes a few minutes to get it into an RM example set.
~Martin
Dortmund, Germany
Amrita,
attached is a process with this script, which now has proper input and output ports. It's not yet commented. If you need any help understanding it, just post here.
~Martin
Dortmund, Germany
Hi Martin
Thanks a ton. This is exactly what I was trying to do.
Thanks again
Regards
Amrita
Hi,
I had the same problem and your solution worked well for me! Thanks!
In my case I do not want to find only standalone keywords, as some keywords appear inside hashtags, e.g. #ILikeThatKeyword.
Is there a way to find those matches as well?
Thanks!
Simon
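One possible direction for the hashtag case, sketched in Python under the assumption that a hashtag is `#` followed by word characters: extract the hashtags first, then look for each keyword as a case-insensitive substring inside them. The function name `keywords_in_hashtags` and the sample text are made up for illustration.

```python
import re

def keywords_in_hashtags(text, keywords):
    """Return (hashtag, keyword) pairs where the keyword occurs inside a hashtag."""
    hits = []
    # Capture the part after '#', e.g. 'ILikeRapidMiner' from '#ILikeRapidMiner'.
    for tag in re.findall(r"#(\w+)", text):
        for keyword in keywords:
            # Substring test, because words inside a hashtag have no separators.
            if keyword.lower() in tag.lower():
                hits.append((tag, keyword))
    return hits

hits = keywords_in_hashtags(
    "Great talk #ILikeRapidMiner #BigData",
    ["RapidMiner", "Hadoop"],
)
print(hits)
```

Restricting the substring test to hashtags keeps whole-word matching for the rest of the text, but inside a hashtag a short keyword can still match part of a longer word.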