Any Ideas?
Hi All,
I am trying to convert the following output from
Link cluster able adsl adsl_faceplate alarms
http://test1 cluster_2 .0 .0 .0 .0
http://test2 cluster_2 .0 .0 .0 .0
http://test3 cluster_0 .1 .0 .0 .0
http://test4 cluster_2 .0 .0 .0 .0
http://test5 cluster_1 .0 .1 .0 .0
http://test6 cluster_1 .0 .0 .0 .0
http://test7 cluster_0 .0 .0 .0 .0
http://test8 cluster_2 .0 .0 .0 .0
http://test9 cluster_1 .0 .0 .0 .0
http://test10 cluster_0 .1 .0 .0 .0
to
Link Cluster Word Score
http://test1 cluster_2 able .0
http://test2 cluster_2 able .0
http://test3 cluster_0 able .1
http://test4 cluster_2 able .0
http://test5 cluster_1 able .0
http://test6 cluster_1 able .0
http://test7 cluster_0 able .0
http://test8 cluster_2 able .0
http://test9 cluster_1 able .0
http://test10 cluster_0 able .1
http://test1 cluster_2 adsl .0
http://test2 cluster_2 adsl .0
http://test3 cluster_0 adsl .0
http://test4 cluster_2 adsl .0
http://test5 cluster_1 adsl .1
http://test6 cluster_1 adsl .0
http://test7 cluster_0 adsl .0
http://test8 cluster_2 adsl .0
http://test9 cluster_1 adsl .0
Any ideas how this could be done?
There are thousands of rows and columns
Thanks
S
Answers
Maybe if you describe the rules used for the conversion, it will be easier to help you, because I don't see any. Look at the operators for generating attributes (Generate Attributes, Generate Aggregation, ...).
Cheers,
Vaclav
Sorry, I will explain a bit more.
I use the k-means clustering operator to cluster text from a web crawl that has been pre-processed (split into tokens, stop words removed, etc.).
The cluster set result consists of 3500 examples, each giving the URL, the cluster assignment, and the 8500 attributes extracted from the text. It looks like this:
Link cluster able adsl adsl_faceplate alarms .......................(8500)...............z
http://test1 cluster_2 .0 .0 .0 .0 .....................................0
http://test2 cluster_2 .0 .0 .0 .0 .......................................0
http://test3 cluster_0 .1 .0 .0 .0 ...................................0
http://test4 cluster_2 .0 .0 .0 .0 ......................................0
http://test5 cluster_1 .0 .1 .0 .0 ......................................0
http://test6 cluster_1 .0 .0 .0 .0 ......................................0
http://test7 cluster_0 .0 .0 .0 .0 ......................................0
http://test8 cluster_2 .0 .0 .0 .0 ......................................0
http://test9 cluster_1 .0 .0 .0 .0 ......................................0
http://test10 cluster_0 .1 .0 .0 .0 ......................................0
....
....
....
(3500)
...
...
http://test3500 cluster_0 .1 .0 .0 .0 ......................................0
I am looking to try and get the data into the following format.
Link Cluster Word TF-IDF Score
http://test1 cluster_2 able .0
http://test1 cluster_2 adsl .0
http://test1 cluster_2 adsl_faceplate .0
http://test1 cluster_2 alarms .0
http://test1 cluster_2 ....... .0
http://test1 cluster_2 z .0
http://test2 cluster_2 able .0
http://test2 cluster_2 adsl .0
http://test2 cluster_2 adsl_faceplate .0
http://test2 cluster_2 alarms .0
http://test2 cluster_2 ....... .0
http://test2 cluster_2 z .0
http://test3 cluster_0 able .1
http://test3 cluster_0 adsl .0
http://test3 cluster_0 adsl_faceplate .0
http://test3 cluster_0 alarms .0
http://test3 cluster_0 ....... .0
http://test3 cluster_0 z .0
....
....
http://test3500 cluster_0 able .1
http://test3500 cluster_0 adsl .0
http://test3500 cluster_0 adsl_faceplate .0
http://test3500 cluster_0 alarms .0
http://test3500 cluster_0 ....... .0
http://test3500 cluster_0 z .0
Does this make a bit more sense?
Thanks
Scott
You can use the operators "Pivot" and "De-Pivot" for tasks like this. You can find examples on myexperiment.org:
http://www.myexperiment.org/search?filter=TYPE_ID%28%2262%22%29&query=pivoting
Simply install the Community Extension for RapidMiner to access and directly download the processes uploaded there (search the forum for more information about the Community Extension).
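To make the idea concrete, here is a rough sketch of what a de-pivot does to the table from the post above, written in plain Python rather than as a RapidMiner process. It is not the De-Pivot operator itself: the function name `depivot` and the parameter `n_id_columns` are made up for illustration, the input is assumed to be whitespace-separated text, and the three sample rows are copied from the post.

```python
import io

# Three sample rows from the thread (wide format, one column per word).
WIDE = """\
Link cluster able adsl adsl_faceplate alarms
http://test1 cluster_2 .0 .0 .0 .0
http://test3 cluster_0 .1 .0 .0 .0
http://test5 cluster_1 .0 .1 .0 .0
"""

def depivot(lines, n_id_columns=2):
    """Yield one (id columns..., word, score) tuple per word column of each row.

    The first n_id_columns columns (here Link and cluster) are repeated on
    every output row; every other column name becomes a value in a Word column.
    """
    rows = (line.split() for line in lines if line.strip())
    header = next(rows)
    words = header[n_id_columns:]
    for row in rows:
        ids = row[:n_id_columns]
        for word, score in zip(words, row[n_id_columns:]):
            # Scores are kept as the original strings (".0", ".1").
            yield (*ids, word, score)

long_rows = list(depivot(io.StringIO(WIDE)))

print("Link Cluster Word Score")
for r in long_rows:
    print(*r)
```

Because `depivot` is a generator, it can also be consumed row by row instead of being collected into a list, which may matter at 3500 examples times 8500 attributes (about 29.75 million output rows). The output is ordered by link; sorting on the Word column afterwards would give the word-by-word ordering shown in the first post.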
Cheers,
Ingo
Thanks for the advice. Maybe you could point me to the example that is closest to what I am trying to do? The examples look similar, but I think the output I am after is very different.
I suspect de-pivot is somehow involved.
Many Thanks
Scott