How to create a process for already prepared data (NLP, NN)
Hi!
I've run into a big problem in RapidMiner. I have a dataset of word vectors that are categorized into 5 labels. I'd like to train a neural network to associate particular combinations (in this case, the appearance of words in sentences) with one of the 5 labels. The problem is that I do not know how to design a useful process. I've watched a lot of RapidMiner tutorials, but unfortunately none of them is similar enough to my project.
Could you please help me? This is my engineering thesis and I'd really love to finish it properly ))
Thanks in advance!
PS: the 9.csv file is an output, and the arxiv(...).txt file is an example input.
Answers
This is the simplified process:
Thank you for your help! But this process doesn't filter out special words like "CITATION" and "NUMBER", and I see that it shows "al" as a word, which is an error. Is it possible to work with Python code?
The program should determine whether the neural network assigns a sentence to the correct label. So I would need a loop that goes through the already preprocessed output (the preprocessing is written in Python) and learns that certain combinations of words go with certain labels... Is it even possible to do this?
I'd really appreciate your help. It feels like I've gone through a million websites and I still don't see any answer.
Thanks in advance
You could, for instance, pretty easily filter out the special words, as they are the only ones in capitals. The Filter Tokens operator can do that for you: just filter out all words written entirely in capitals.
The reason 'al' shows up is that it was in your original text (it's mentioned in 'sandroni et al'), so it is correct that it shows up. You can filter such tokens out by creating your own dictionary.
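If the cleaning ends up being done on the Python side instead, the two filters described above could look roughly like this. It is only a sketch: the function name `clean_tokens`, the `custom_stopwords` set, and the sample sentence are made up for illustration and are not taken from the attached process.

```python
import re

# Hypothetical custom dictionary of tokens to drop (e.g. the "al" from "et al").
custom_stopwords = {"et", "al"}

def clean_tokens(text, stopwords=custom_stopwords):
    """Split text into word tokens, dropping all-caps placeholders and stopwords."""
    tokens = re.findall(r"[A-Za-z]+", text)
    kept = []
    for token in tokens:
        # Placeholders such as CITATION or NUMBER are written entirely in capitals.
        # Single letters like "A" are kept, since they are ordinary words.
        if len(token) > 1 and token.isupper():
            continue
        if token.lower() in stopwords:
            continue
        kept.append(token.lower())
    return kept

sample = "Sandroni et al CITATION report NUMBER experiments"
print(clean_tokens(sample))  # ['sandroni', 'report', 'experiments']
```

Extending `custom_stopwords` plays the same role as the custom dictionary in RapidMiner, so both workflows can be kept in sync by sharing one word list.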
Note that this process is actually trying to do the same thing as what your output (9.csv) shows: simply generating a vector set from a document. So to get the same results, the same cleaning and optimizing needs to be done as in your Python workflow.
If I understand it correctly, however, this is what you already have, and you need to get the keywords by label from this file (9.csv), correct?
The attached example shows the 'improved' processes: one where I created a 9.csv-like file from your source (arxiv), and one where I use your 9.csv to create a word list by label. The results are pretty similar, so this might work for you as well. But I have to admit I'm not really sure what you want to achieve, so it could also be utterly useless.
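For comparison, the "word list by label" idea can be sketched in plain Python as well. Note that the table layout here is an assumption: I am pretending the file has one row per sentence, a `label` column, and one numeric column per word. The real 9.csv may be organized differently, and the sample values are invented.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for a file like 9.csv: one row per sentence, a "label"
# column, and one numeric column per word. The real layout may differ.
sample_csv = """label,model,data,network
method,1,0,2
result,0,3,0
method,2,1,1
"""

def words_by_label(handle, label_column="label"):
    """Sum each word column per label and return {label: [(word, total), ...]}."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(handle):
        label = row.pop(label_column)
        for word, value in row.items():
            totals[label][word] += float(value)
    # Sort words by descending total, dropping words that never occur for the label.
    return {
        label: sorted(
            ((w, t) for w, t in words.items() if t > 0),
            key=lambda item: item[1],
            reverse=True,
        )
        for label, words in totals.items()
    }

word_lists = words_by_label(io.StringIO(sample_csv))
for label, words in word_lists.items():
    print(label, words)
```

To run it on a real file, the `io.StringIO(sample_csv)` argument would be replaced by an opened file handle, and `label_column` adjusted to whatever the label column is actually called.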
You can indeed work in Python without any problem. The Python operator allows you to work directly with Python libraries, as long as your dataset is 'pandas friendly', as this is the default way RapidMiner communicates with Python. But if you can do it in RM, you may not need the Python part at all.
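To give an idea of what such a script might look like: as far as I know, the Python operator calls a function named `rm_main` and passes each input as a pandas DataFrame. The sketch below trains a small neural network with scikit-learn's `MLPClassifier` and flags which rows it labels correctly. The column name `label`, the network settings, and the tiny example table are all assumptions, not something taken from your files.

```python
import pandas as pd
from sklearn.neural_network import MLPClassifier

def rm_main(data):
    """Entry point that RapidMiner's Python operator calls with a pandas DataFrame."""
    # Assumption: a "label" column holds the class, every other column is a word feature.
    features = data.drop(columns=["label"])
    labels = data["label"]

    # A small feed-forward network; the layer size and iteration count are arbitrary.
    model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    model.fit(features, labels)

    # Return the input with the network's prediction and a correct/incorrect flag.
    result = data.copy()
    result["prediction"] = model.predict(features)
    result["correct"] = result["prediction"] == result["label"]
    return result

# Tiny made-up table, only to show the call outside RapidMiner.
example = pd.DataFrame({
    "model": [1, 0, 2, 0],
    "data": [0, 3, 1, 2],
    "label": ["method", "result", "method", "result"],
})
print(rm_main(example))
```

Be aware that this sketch predicts on the same rows it was trained on, so the `correct` column is overly optimistic. For a real evaluation you would hold out part of the data, for example with scikit-learn's `train_test_split` or with a cross validation around the model in RapidMiner.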