The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
[SOLVED] Clustering(K-Means) data from database
Hi all,
Sorry if this problem was already solved, but I’m a newbie and I was not able to locate a similar one.
My problem is the following:
I’ve a table with the following columns: doc_id; term; weight. Basically, for each document there are several terms occurrences and a weight associated to each term. This means that, each document is categorized by a set of pair attributes (term, weight)
Example:
Doc_id term weight
Doc1 color 0,45
Doc1 height 0,22
Doc1 weight 0,05
Doc2 altitude 0,04
Doc2 weight 0,35
I intend to perform a clustering analysis using k-means in order to check which documents are more similar against a predefined k clusters.
When I connect the "read database" operator to the "clustering" operator an error message appears saying that clustering doesn’t accept polynomial attributes. It’s not my intention to change both “doc_id” and “term” attributes to nominal ones. The result that I'm expecting should be somthing similar to:
Cluster_0 (Doc1, Doc32, Docx,...), Cluster_1(Doc_2, Doc45, Docy,...), etc.
Does anyone came across such problem?
Thank you for your support.
Best regards,
Sorry if this problem was already solved, but I’m a newbie and I was not able to locate a similar one.
My problem is the following:
I’ve a table with the following columns: doc_id; term; weight. Basically, for each document there are several terms occurrences and a weight associated to each term. This means that, each document is categorized by a set of pair attributes (term, weight)
Example:
Doc_id term weight
Doc1 color 0,45
Doc1 height 0,22
Doc1 weight 0,05
Doc2 altitude 0,04
Doc2 weight 0,35
I intend to perform a clustering analysis using k-means in order to check which documents are more similar against a predefined k clusters.
When I connect the "read database" operator to the "clustering" operator an error message appears saying that clustering doesn’t accept polynomial attributes. It’s not my intention to change both “doc_id” and “term” attributes to nominal ones. The result that I'm expecting should be somthing similar to:
Cluster_0 (Doc1, Doc32, Docx,...), Cluster_1(Doc_2, Doc45, Docy,...), etc.
Does anyone came across such problem?
Thank you for your support.
Best regards,
0
Answers
first of all you have to De-Pivot your data with the equally named operator to get a dataset which contains exactly one document per row, like this: Then define Doc_id as Id with Set Role, and apply the clustering. That's it
Best, Marius
I've used the PIVOT operator instead of the DE-PIVOT.
Regards,
Happy Mining!
~Marius