mystery science data mining problem 3000
MBA_Data_Miner
Member Posts: 21 Contributor II
In the not too distant future-
I have an idea I would like to try out ... but I have no idea what operators could accomplish this:
What I would like to do is scan a database (currently a flat file) and find any records with matching fields.
Then assign a "relationship ID" to each field to help find relationships in the data.
(Later I would also like to include fuzzy matching above a certain match threshold, using Jaccard similarity or something similar.)
Any thoughts?
Best regards, J.
Answers
Just thought I'd revive this topic. Fuzzy matching is possible with the Cross Distances operator: break the field or fields you want to compare into n-grams, then compute the distance from each record to every other record in your dataset and keep the closest ones.
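To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of n-gram matching with Jaccard similarity, plus one possible way to hand out the "relationship ID" from the original question. It is not what the Cross Distances operator does internally. The function names, the trigram size, and the 0.4 threshold are all my own assumptions for illustration, and the all-pairs comparison only suits a small flat file.

```python
from itertools import combinations


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lower-cased, whitespace-normalised string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union (0.0 if both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_relationship_ids(values, threshold=0.4, n=3):
    """Give records whose field values are similar enough the same ID (hypothetical helper)."""
    grams = [char_ngrams(v, n) for v in values]
    parent = list(range(len(values)))  # union-find forest, one node per record

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair of records: quadratic, so only for small data.
    for i, j in combinations(range(len(values)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    # Relabel group roots as consecutive IDs in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(values))]


names = ["Jon Smith", "John Smith", "Acme Corp", "ACME Corp.", "Zed"]
print(assign_relationship_ids(names))  # [0, 0, 1, 1, 2]
```

Setting `threshold=1.0` reduces this to exact matching after case and whitespace normalisation. Note that groups are transitive: if A matches B and B matches C, all three share an ID even when A and C fall below the threshold on their own.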
Regarding programmatically discovering relationships: I just stumbled upon this fantastic-sounding project that RapidMiner is working on together with the University of Mannheim.
"Key idea
Analysts increasingly have the problem that they know that some data which they need for a project is available somewhere on the Web or in the corporate intranet, but they are unable to find the data. The goal of the 'Data Search for Data Mining' (DS4DM) project is to extend the data mining platform RapidMiner with data search and data integration functionalities which enable analysts to find relevant data in potentially very large data corpora, and to semi-automatically integrate the discovered data with existing local data."
You want entity relationships back to your database? How about all of Wikipedia?
ds4dm.de/en/about/
http://ub-madoc.bib.uni-mannheim.de/40718/1/DataSearchDemo.pdf