Data to Similarity - how to define the control group
christina_dehme
Member Posts: 3 Contributor I
Hi everyone,
I have a large number of documents in two folders, "auditor report" and "audit committee report" (AC), and want to compare them. The "Data to Similarity" operator compares every file with every other file, but I only want to compare files whose names match.
The documents in folder 1 ("auditor report") are named: year_company name
and the documents in folder 2 ("audit committee report") are named: AC_year_company name
So instead of comparing each document with every document from the other folder, I just want to compare the matching documents (= same year and company name in the document name).
Many thanks in advance!!!
Christina
Answers
Assuming the time stamps match (i.e. yyyy in one file and yyyy in the other file), just use a Join operator first to join the two files together, matching on your timestamp. Then use the similarity measures.
Hi Thomas,
Thanks for the quick reply. I tried the Join operator before testing for similarity. I chose join type "inner" and used "metadata_file" as the key attribute on both the left and the right side. But somehow it didn't work out as I was expecting.
For example:
AC_2015_A.G.Barr PLC,GB00B6XZKY75
should be matched, before I use the similarity operator, with
2015_A.G.Barr PLC,GB00B6XZKY75
so that the similarity test runs only between those two files (almost the same name, once with and once without AC in the document name) instead of comparing each document with every other one.
This is what I've got:
Thanks a lot in advance
Christina
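A note on why the inner join above probably returns nothing: the two names differ by the "AC_" prefix, so an exact match on "metadata_file" never succeeds. Outside RapidMiner, the idea can be sketched in plain Python as below. This is only an illustration, not the RapidMiner process itself: the helper names pairing_key and match_documents are made up for this sketch, and every file name except the A.G.Barr pair from the post is hypothetical. In RapidMiner, a comparable prefix-free key attribute would presumably have to be generated before the Join.

```python
def pairing_key(file_name: str, prefix: str = "AC_") -> str:
    """Return the file name without the leading prefix, if it has one."""
    return file_name[len(prefix):] if file_name.startswith(prefix) else file_name


def match_documents(auditor_names, committee_names, prefix="AC_"):
    """Inner-join two lists of file names on their prefix-free key."""
    committee_by_key = {pairing_key(name, prefix): name for name in committee_names}
    return [
        (name, committee_by_key[pairing_key(name, prefix)])
        for name in auditor_names
        if pairing_key(name, prefix) in committee_by_key
    ]


# The first name in each list comes from the post; the others are hypothetical.
auditor_reports = ["2015_A.G.Barr PLC,GB00B6XZKY75", "2016_Example Co"]
committee_reports = ["AC_2015_A.G.Barr PLC,GB00B6XZKY75", "AC_2014_Other Co"]

pairs = match_documents(auditor_reports, committee_reports)
print(pairs)
```

Only the 2015 A.G.Barr pair survives the join; the two unmatched names are dropped, which is the behaviour an inner join on the cleaned key should give.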
Dear Christina,
I do not think there is any way to do this without a loop. Probably something like Loop Values, Filter Examples for the value, a left join with the other table, and then Data to Similarity.
In RM 7 we added a Group into Collection operator in the Operator Toolbox extension. That would make it a bit nicer.
Best,
Martin
Dortmund, Germany
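The loop idea above (visit each document, pick out its counterpart, compare only that pair) can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not the RapidMiner process: cosine similarity on raw word counts is chosen here only as an example measure, since the thread does not say which measure Data to Similarity was set to, and the function names and toy texts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_per_match(auditor_docs, committee_docs, prefix="AC_"):
    """Compare each auditor document only with its prefixed counterpart."""
    results = {}
    for name, text in auditor_docs.items():
        partner_text = committee_docs.get(prefix + name)
        if partner_text is not None:  # skip documents without a counterpart
            results[name] = cosine_similarity(text, partner_text)
    return results


# Hypothetical toy texts keyed by file name.
auditor_docs = {
    "2015_A.G.Barr PLC,GB00B6XZKY75": "revenue recognition and goodwill impairment",
    "2016_Unmatched Co": "pension liabilities",
}
committee_docs = {
    "AC_2015_A.G.Barr PLC,GB00B6XZKY75": "goodwill impairment and revenue recognition",
}

scores = similarity_per_match(auditor_docs, committee_docs)
print(scores)
```

With this approach the number of comparisons equals the number of matched pairs rather than the number of all possible document pairs.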
Dear Martin,
I installed the new version of RM. I still get the same results and don't see how to solve my problem of matching samples. I have two files, and the program should read the name of each document and check only the matching ones for similarity. It is still comparing all documents with each other. As I have over 400 documents in total, the program does not run with so many.
Thanks in advance
Christina
Here you can see which matches I want to have. So my question is: which operator do I have to use? In Excel it would work with =A2="AC_"&B2
Dear Christina,
I thought about something along the lines of the attached process. Not too handsome, but working. 7.5 has a slightly different loop interface, but its loops are parallelized and therefore way faster.
Best,
Martin
Dortmund, Germany