How to analyze negative attribute values

atul_kotwale · October 2018

Hi,

I am trying to build a prediction model that predicts the category of a case from its description. I have two training data sets. The first contains a case ID, a description and a category.

ID  Description    Category
1   "some txt"     A
2   "some text 2"  B

The second data set contains rows like the following. It tells me which category a case should not fall into.

ID  Description          Category
1   "some other txt"     notA
2   "some other text 2"  notB

I want to train my model using both data sets. I am having trouble feeding the second data set to my model. I want to feed it in such a way that it gives the model correct information. Any help would be great. Thanks!

atul_kotwale · October 2018

Hi @kypexin

Thanks for the reply. I am also considering not including the negative results, but I have one more thought: what if I convert the negative data set to the format below, assigning 0 to the category that is not possible and 1 to every category that is still possible?

ID  Description          A  B  C
1   "some other txt"     0  1  1
2   "some other text 2"  1  0  1

and similarly convert the positive data set to the format below:

ID  Description    A  B  C
1   "some txt"     1  0  0
2   "some text 2"  0  1  0

If I feed the above data to my model, would it confuse the model?

Thanks
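The conversion described in this post could be sketched in plain Python as below. This is only an illustration, not something posted in the thread: the helper name `to_indicator_row` and the assumption that the full category list is exactly A, B and C are hypothetical.

```python
# Assumption: the complete set of categories is A, B and C.
CATEGORIES = ["A", "B", "C"]


def to_indicator_row(label, categories=CATEGORIES):
    """Turn a label such as 'A' or 'notA' into 0/1 flags per category."""
    if label.startswith("not"):
        excluded = label[len("not"):]
        if excluded not in categories:
            raise ValueError(f"unknown category in label: {label!r}")
        # Negative label: 0 for the excluded category, 1 for all others.
        return {c: 0 if c == excluded else 1 for c in categories}
    if label not in categories:
        raise ValueError(f"unknown category in label: {label!r}")
    # Positive label: 1 for the named category, 0 for all others.
    return {c: 1 if c == label else 0 for c in categories}


positive = [(1, "some txt", "A"), (2, "some text 2", "B")]
negative = [(1, "some other txt", "notA"), (2, "some other text 2", "notB")]

for case_id, text, label in positive + negative:
    print(case_id, text, to_indicator_row(label))
```

Rows that come from the negative data set end up with more than one 1, so they no longer name a single category.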

MartinLiebig · October 2018

Hi @atul_kotwale,

That's one way of doing it, yes.

~Martin

atul_kotwale · October 2018

Thanks @mschmitz.

kypexin · October 2018

Hi @atul_kotwale

I am afraid you have to think about reformulating the task. You cannot have 'negative' labels like the ones you described.

For example, if "some other text 2" = notB, then it is either A or notA, and in the latter case it must be the third category, C.

On the other hand, "some txt" = A is also obviously notB.

So you can only have examples that belong to some category; you cannot label an example as not belonging to a category.

kypexin · October 2018

Hi @atul_kotwale

Yes.

Your first example is marked as both B and C, which again is not possible in terms of ML data.

There should be only one "1" in each row if you want to predict category A, B or C for any given description.

But this is a bit different task from your initial thoughts: this way you just categorize each text separately, and not much more; for example, both "some other text 2" and "some txt" are from category A (as I understood, that's not what you want to achieve).

More generally speaking, you cannot feed the model two different data sets whose categories have different meanings.

The model should still work with a single data set, in our case this one, where all examples are actually different:

ID  Description          A  B  C
1   "some other txt"     0  1  1
2   "some other text 2"  1  0  1
3   "some txt"           1  0  0
4   "some text 2"        0  1  0
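As a rough check of the "only one 1 in each row" rule against the combined table, something like the following plain-Python sketch could be used. The helper `is_single_label` is a hypothetical name, and the rows are copied from the table above.

```python
# Rows copied from the combined table: (ID, description, flags per category).
rows = [
    (1, "some other txt", {"A": 0, "B": 1, "C": 1}),
    (2, "some other text 2", {"A": 1, "B": 0, "C": 1}),
    (3, "some txt", {"A": 1, "B": 0, "C": 0}),
    (4, "some text 2", {"A": 0, "B": 1, "C": 0}),
]


def is_single_label(flags):
    """True when exactly one category flag is set to 1."""
    return sum(flags.values()) == 1


# IDs of rows that do not name exactly one category.
ambiguous_ids = [case_id for case_id, _, flags in rows if not is_single_label(flags)]
print(ambiguous_ids)
```

Here rows 1 and 2, the ones that came from the negative data set, are the rows that break the rule.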

atul_kotwale · October 2018

@kypexin Thanks. I got it now.

MartinLiebig · October 2018

Hi @atul_kotwale,

One idea for using it is to build a "Not_A model". You then score the other data set with it and use confidence(not_a) as a new variable for further modelling.

BR,

Martin
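A minimal sketch of this idea outside RapidMiner, in plain Python, might look like the following. Everything in it is an assumption made for illustration: the toy texts are invented, `train_binary_nb` and `confidence` are hypothetical helpers, and the tiny Naive Bayes classifier merely stands in for whatever learner is actually used.

```python
import math
from collections import Counter


def train_binary_nb(texts, labels):
    """Fit a tiny multinomial Naive Bayes model (labels are True/False)."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def confidence(model, text):
    """Return the model's estimate of P(label is True | text)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_score = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # Add-one smoothing so unseen (word, label) pairs are not zero.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_score[label] = score
    top = max(log_score.values())
    weights = {k: math.exp(v - top) for k, v in log_score.items()}
    return weights[True] / (weights[True] + weights[False])


# Invented toy data: the negative set trains the "Not_A model".
negative = [
    ("printer will not print", "notA"),
    ("printer jams on every page", "notA"),
    ("password reset link expired", "notB"),
    ("cannot log in with password", "notB"),
]
positive = [("invoice total is wrong", "A"), ("printer out of toner", "B")]

not_a_model = train_binary_nb(
    [text for text, _ in negative],
    [label == "notA" for _, label in negative],
)

# Score the positive set and keep confidence(not_a) as an extra column.
enriched = [
    (text, label, confidence(not_a_model, text)) for text, label in positive
]
for row in enriched:
    print(row)
```

The third value in each row plays the role of confidence(not_a). The same steps could be repeated for each negative label that appears in the data, and the resulting columns added before training the final model.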

atul_kotwale · October 2018

Hi @mschmitz,

Thanks for the reply. If I understand correctly, you mean I should build a model using the negative data set and then apply it to the positive data set. The output will contain three new columns (confidence(not_a), confidence(not_b), confidence(not_c)), and I should include these new columns for further training?
