
Compare job ads with a given set of terms

Jimpix Member Posts: 1 Learner I
edited December 2018 in Help

Dear Community,

 

For a recent research paper, I plan to perform the following, for which I'd kindly ask for your advice.

 

I have obtained a set of a few thousand job ads. I now want to analyse whether and how these job ads include 'content' that another research paper has previously specified as individual 'categories'. More precisely, there are about 15 existing categories, each explained by a description of 2-4 sentences.

 

Now I want to understand which, and how many, job ads cover the aspects described in each of the 15 categories. A result could be, for example, that job ad #1 contains content that matches (or comes close to) the descriptions of categories 2, 5 and 8, but lacks content that would allow any reference to the remaining categories.

 

If you have any references or advice on how to approach this task, please let me know. I suspect that the best approach would be supervised learning.

 

Best,

Jimpix.

Answers

  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hi @Jimpix!

     

    I think you should share a sample of your data, because from this description I can picture about 4 different RapidMiner processes.

     

    All the best,

     

    Rod.

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi Jimpix,

     

    I think the LDA technique, whose operator belongs to the Operator Toolbox extension, fits the problem quite well.

     

    It will require some manual work though, because you have to map the labels to LDA topics and then find an adequate threshold to decide whether the confidence is high enough to assign any given category to the ad. The operator also works as a one class clusterer/classifier, which is not what you want and should be ignored.

     

    I look forward to seeing some progress as an attached process :D

     

    Regards,

    Sebastian
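
The manual step described in the LDA suggestion above (mapping topics to categories, then choosing a confidence threshold) might look roughly like the following Python sketch. It assumes the per-ad topic confidences have already been exported from whatever LDA step is used. The function name `assign_categories`, the topic-to-category mapping, the confidence values, and the threshold of 0.2 are all hypothetical placeholders, not taken from RapidMiner or from the thread.

```python
from typing import Dict, List, Sequence


def assign_categories(
    topic_confidences: Sequence[float],
    topic_to_category: Dict[int, str],
    threshold: float,
) -> List[str]:
    """Return the categories whose mapped topic reaches the threshold.

    An ad may receive several categories, or none at all, so this is a
    multi-label decision rather than a pick-the-best-topic decision.
    """
    assigned: List[str] = []
    for topic_index, confidence in enumerate(topic_confidences):
        # Topics that were not mapped by hand to a category are skipped.
        category = topic_to_category.get(topic_index)
        if category is None:
            continue
        if confidence >= threshold and category not in assigned:
            assigned.append(category)
    return assigned


# Hypothetical hand-made mapping from LDA topic index to category label.
topic_to_category = {0: "category 2", 1: "category 5", 2: "category 8"}

# Hypothetical per-ad topic confidences (one value per LDA topic).
ad_confidences = {
    "job ad #1": [0.45, 0.30, 0.20, 0.05],
    "job ad #2": [0.05, 0.10, 0.05, 0.80],
}

# Assumed threshold; in practice it would need tuning on labelled examples.
THRESHOLD = 0.2

results = {
    ad: assign_categories(confidences, topic_to_category, THRESHOLD)
    for ad, confidences in ad_confidences.items()
}

for ad, categories in results.items():
    print(ad, "->", categories if categories else "no category")
```

With these made-up numbers, job ad #1 ends up with categories 2, 5 and 8, while job ad #2 gets none because its dominant topic was never mapped to a category. Checking a handful of ads by hand at several threshold values is probably the simplest way to pick a sensible cut-off.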
