"read from Excel/CSV"
CaptainChaos
Member Posts: 17 Contributor II
Hi Guys,
Can somebody explain to me how I can tell RapidMiner to treat each row under "A" as a separate document and each row under "B" as its ID?
I would like to add a Data to Similarity operator after it, but for that each row has to be classified as a document. Does anybody know an operator that can do this?
Thanks
Answers
Did you try the wizards in the Read Excel/Read CSV operators? There you can define the role of each column, so you can set the ID role for column B. Hope this helps; if not, please tell me exactly what a "document" in your files looks like.
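For anyone who wants to see the idea outside the import wizard, here is a minimal Python sketch of what assigning column roles amounts to: column A is kept as the text and column B becomes the ID. This is not RapidMiner code. The sample rows and the function name `read_documents` are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: column A holds the text, column B the ID.
SAMPLE_CSV = """A,B
first short text,doc-1
second short text,doc-2
another row of text,doc-3
"""

def read_documents(csv_text, text_column="A", id_column="B"):
    """Return a dict that maps each row's ID to that row's text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row[text_column] for row in reader}

documents = read_documents(SAMPLE_CSV)
for doc_id, text in documents.items():
    print(doc_id, "->", text)
```

Each row ends up as its own entry keyed by the value from column B, which is roughly what an ID role gives you inside an example set.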
Cheers,
Marius
I tried all the wizards, but they don't help me do what I want. I know I can choose the attribute for a column there, but that doesn't help me so far.
At the moment I have just one column in Excel, column "A" (I changed it).
Each row of "A" contains some text. I would just like RapidMiner to treat each of them as its own document.
Thanks
Regards
There's probably a simpler way, but you could do it by converting into XML and then back again.
For example:
I created a CSV file called test csv with the following structure: Then I made the following process to convert it to XML in the following structure: The process then reads in the XML file and changes it into data.
Probably not at all what you were after, but it was a fun process to build & might be useful for other tasks.
Best regards,
JEdward.
It seems hard to understand what you're after... If you have an example set, each row is an example, and this is usually the correct format for most operators. If you want to do something with each single example, then the "Loop Examples" operator is probably the right tool. You can use IDs for examples either by creating new ones via "Generate ID" or by setting an existing column to the ID role using "Set Role".
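As a rough illustration of those two ideas (not RapidMiner code), the Python sketch below pairs every row with a generated ID and then handles each row on its own. The names `generate_ids` and `loop_examples`, the sample rows, and the choice to start numbering at 1 are all assumptions made for the example.

```python
# Hypothetical rows, standing in for the single text column of an example set.
rows = ["Text1 ...", "Text2 ...", "Text3 ..."]

def generate_ids(examples, start=1):
    """Pair every example with a sequential ID (starting at 1 is an assumption)."""
    return [(index, example) for index, example in enumerate(examples, start=start)]

def loop_examples(examples_with_ids, handler):
    """Apply handler to each example in isolation and collect results by ID."""
    return {example_id: handler(text) for example_id, text in examples_with_ids}

with_ids = generate_ids(rows)

# Any per-example processing can go here; len is only a placeholder.
lengths = loop_examples(with_ids, len)
print(lengths)
```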
When talking about documents, this usually refers to the document data type of the Text Processing extension, which is only used in text and web mining contexts.
I am not familiar with the "Data to Similarity" operator, but it requires an example set as input, so your data should already be in the right format. If you want to process a single example in isolation from all the others, use "Loop Examples" and put the processing inside that operator.
For further support, it would help if you posted your process as far as you have built it and described where things are not working and what you would like to do differently.
Regards
Matthias
P.S. Please don't post similar questions to other forums if they are not answered immediately. Specific questions like yours in particular should be posted here rather than in the general data mining forum.
Look, I have an Excel file with data only in column A (A1:A3000).
The structure looks like this:
A
Text1........
Text2..........
Text3.......
..
...
Text3000
I know that I can loop through the file, but when I want to work with the data later on, the problem is that the operator takes the whole text of one row and compares it against another row (as if it were a single term). What I want is for each row to be recognized as a single document, so that the words inside one row/document can be compared to those of another row/document. At the moment my Process Documents operator just takes the whole row as one term and compares it against another row.
I hope I have made it a bit clearer what I want. I am posting my code here; maybe one of you can then understand what my problem is.
Thanks again. It seems that you all have a hard time with me :P
Try adding the "Tokenize" operator inside the "Process Documents" operator. Otherwise the word vector consists of only one word (the whole text). You can also add other preprocessing operators at this point, e.g. "Transform Cases" or "Filter Stopwords".
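To show why tokenizing matters, here is a small Python sketch (not RapidMiner code) that compares two rows with and without tokenization. The sample sentences, the tiny stopword list, and the use of cosine similarity on raw term counts are assumptions for illustration; the similarity measure in your process may differ.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, far shorter than a real one.
STOPWORDS = {"the", "a", "is", "on", "of"}

def tokenize(text):
    """Lowercase the text, split it into runs of letters, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]

def cosine(counts_a, counts_b):
    """Cosine similarity between two term-count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    norm_a = math.sqrt(sum(value * value for value in counts_a.values()))
    norm_b = math.sqrt(sum(value * value for value in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

row_1 = "The cat sat on the mat"
row_2 = "The cat is on the sofa"

# Without tokenizing, each row is one single term, so different rows share nothing.
untokenized = cosine(Counter([row_1]), Counter([row_2]))

# With tokenizing, the rows are compared word by word.
tokenized = cosine(Counter(tokenize(row_1)), Counter(tokenize(row_2)))

print(untokenized, tokenized)
```

Without tokenization, two rows only match if their full text is identical. Once the rows are split into words, the shared word "cat" gives them a nonzero similarity.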
Hope this is what you are looking for...
Regards
Matthias