Non-nominal label: the label attribute must be nominal
rgavankeulen
Member Posts: 1 Learner II
Hi,
I encountered a problem that I'm unable to solve.
When I do cross validation for KNN, I get an error pop-up for the operator Performance (Classification): "Non-nominal label: the label attribute must be nominal"
I don't understand why I get this pop-up, since the data is all nominal before the KNN operator.
Also, when I use the operator "polynominal to binominal", I get the error below: "Numerical label not supported"
Kind regards,
k
Answers
Hi,
Can you please post the XML of your process here? If possible including the data. Hard to tell what is going on otherwise...
Best,
Ingo
I am having the exact same problem, and I came here looking for a solution in previous questions, but our colleague here either threw in the towel or found another way and didn't share it with us.
Anyway... Please help unicorns T-T @IngoRM
Here is the 2nd part of my process (after the data was cleaned, as I posted separately in process 1 below). I have already tried the operators 'nominal to text', 'text to nominal'... I don't know what else to do.
Below is process 1, from which I got the datasets, cleaned them and such... After this process was done, I saved the result and retrieved it in process 2, trying to perform Cross Validation with SVM.
@alinebora Can you also post a zip with the WordNet dictionary you are using in this? It's required to complete your initial process, and I want to make sure I replicate the same analysis in order to correctly troubleshoot process #2. Thanks.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
@Telcontar120 I tried to attach the zip file here but it didn't work.
But I easily downloaded it from the link below, "WordNet 3.1 DATABASE FILES ONLY".
https://wordnet.princeton.edu/download/current-version
@alinebora I was able to discover what your problem was (and it was likely the problem with the OP as well).
In your original dataset, for whatever reason you have the "status-id" field set as the role of label. In RapidMiner, the role of label is for the thing you are trying to predict, so the correct role for the "status-id" would never be label. It should in fact be the role id.
When you then later switch the role of label to the attribute "sentiment" then that leaves the status-id as a regular attribute, which RapidMiner tries to use for modeling, but it has the type polynominal. That's what is causing your error message. So you can make that go away by changing the role of status-id to "id" first. Take a look at the simplified process below.
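Outside RapidMiner, the effect of roles can be illustrated with a minimal Python sketch. It is not the RapidMiner process itself; the rows, the word columns "good" and "delay", and the helper name split_by_role are hypothetical. It only shows that a column left without a special role ends up among the predictors.

```python
# Hypothetical rows: "status-id" is a nominal identifier, the word columns
# are numeric text features, and "sentiment" is the thing to predict.
rows = [
    {"status-id": "t_001", "good": 0.7, "delay": 0.0, "sentiment": "positive"},
    {"status-id": "t_002", "good": 0.0, "delay": 0.9, "sentiment": "negative"},
]


def split_by_role(rows, label_name, id_name=None):
    """Split rows into (ids, predictors, labels), similar to assigning roles.

    Any column that is neither the label nor the id stays a regular
    attribute and is therefore handed to the learner as a predictor.
    """
    special = {label_name}
    if id_name is not None:
        special.add(id_name)
    predictors = [{k: v for k, v in row.items() if k not in special} for row in rows]
    labels = [row[label_name] for row in rows]
    ids = [row[id_name] for row in rows] if id_name is not None else None
    return ids, predictors, labels


# With the id role set, only the numeric word columns remain as predictors.
ids, predictors, labels = split_by_role(rows, "sentiment", id_name="status-id")

# Without it, the nominal "status-id" leaks into the predictors.
_, leaky_predictors, _ = split_by_role(rows, "sentiment")
```

Here predictors contains only "good" and "delay", while leaky_predictors also carries the text-valued "status-id", which is the kind of column a numeric-only learner rejects.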
When I reviewed your processes, I had a few other suggestions for you. Your original dataset contains over 12k attributes that you generated from the original text processing. Many of these have hardly any occurrences, and many are also meaningless (such as single letters or two letters). You can add some additional operators to your original text processing task such as "Filter Token by Length" to get rid of these, and you can also turn on the pruning option in the "Process Documents from Data" operator to remove low occurrence values that will not be useful in modeling because they occur too infrequently. This is highly recommended.
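As a rough illustration of what length filtering and pruning do to the vocabulary, here is a small Python sketch. It is not how the RapidMiner operators are implemented; the whitespace tokenizer, the function names, and the thresholds (3 to 25 characters, 50% document frequency) are assumptions chosen only for the example.

```python
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def filter_by_length(tokens, min_chars=3, max_chars=25):
    """Drop tokens that are too short (single letters etc.) or too long."""
    return [t for t in tokens if min_chars <= len(t) <= max_chars]


def prune_vocabulary(docs_tokens, min_doc_fraction=0.05):
    """Keep only tokens that occur in at least min_doc_fraction of documents."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each token once per document
    return {t for t, count in doc_freq.items() if count / n_docs >= min_doc_fraction}


# Hypothetical documents standing in for the tweets.
docs = ["The flight was late", "A late flight again", "I am ok"]
docs_tokens = [filter_by_length(tokenize(d)) for d in docs]
vocabulary = prune_vocabulary(docs_tokens, min_doc_fraction=0.5)
```

With these toy inputs, only "flight" and "late" survive: the one- and two-letter tokens are removed by the length filter, and words seen in a single document fall below the pruning threshold.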
You are also trying to predict the numerical sentiment score from WordNet, which is likely to be quite difficult. You may be better off recoding this as a simple positive/negative nominal attribute and then predicting that, at least to start. My example process above shows how to do that as well.
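The recoding idea can be sketched in a few lines of Python. The cut-off of 0.0, the choice to count a score of exactly 0.0 as positive, and the example scores are all assumptions for illustration; the thread does not state which threshold the sample process uses.

```python
def recode_sentiment(score, threshold=0.0):
    """Map a numeric sentiment score to a two-class nominal label.

    Assumption: scores at or above the threshold count as positive.
    """
    return "positive" if score >= threshold else "negative"


# Hypothetical numeric scores standing in for the WordNet sentiment values.
scores = [0.4, -0.2, 0.0, -1.5]
labels = [recode_sentiment(s) for s in scores]
```

The resulting label has exactly two nominal values, which is what a binominal classification performance measure expects.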
Finally, in your process #2, you seem to be taking the same dataset through both branches of your process, the first to build the score and the 2nd to apply that score. That's extra work that isn't needed, since you can output the scored records directly from the cross-validation, as I show in the sample process above. If you have a separate dataset to score, then your 2nd branch would be needed.
I hope you find this all helpful with your project. Don't hesitate to come back and ask more questions if needed.
Dear Brian @Telcontar120 thank you so much for your reply! :catvery-happy::catvery-happy:
Your suggestions were very useful. I applied both the 'filter by token' operator (process 2) and the prune method in 'Process documents' (process 1).
However, I noticed that you replaced the 'Support Vector Machine' operator with 'Classification by regression'. Is the SVM operator no longer applicable to this process? Could you explain why you used one instead of the other?
Also, when I run the process, it does not give me the performance accuracy stats that I'd like to see, like the example in the picture, which I'd like to describe in my analysis. Could you advise me on that?
I attached my latest process in XML again.
Thank you again :cathappy::cathappy::cathappy:
@alinebora I'm not sure what you mean here. When I look at my process in RapidMiner Studio 9.0, I see the SVM model in the training side of the Cross-Validation inner process. You can also see the operator name clearly in the raw XML code I posted here as well.
And I also see the Performance (Binominal Classification) operator there, which outputs the confusion matrix as shown in your picture, as long as you output the "per" port from the Cross Validation (which is already configured to do in my process). Are you certain you pasted my code and didn't modify it before you attempted to run it? As long as you paste the code into your XML panel and then press the green "check" mark, it should render the process as I have created it. If you are doing that, then I have no idea what might be going on.
Assuming you do get the code running, remember also that SVM has a number of parameters that probably need to be tuned (like kernel type, C, and gamma) to get the best model. Reviewing the training videos regarding model optimization would be a good starting point for that.
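To make the idea of tuning concrete, here is a generic grid-search sketch in Python. It is not RapidMiner's optimization operator; the function name grid_search, the candidate values in the grid, and the evaluate callback are all assumptions. In practice evaluate would run a cross-validation with the given parameters and return its average performance.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every parameter combination and return (best_params, best_score).

    evaluate is a caller-supplied function that takes a dict of parameters
    and returns a score where higher is better.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values; sensible ranges depend on the data.
grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1.0, 10.0],
    "gamma": [0.01, 0.1],
}
```

This grid has 2 x 3 x 2 = 12 combinations, so the number of model fits grows quickly as more values are added.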
Happy data mining!
Dear Brian @Telcontar120, I copied and ran your XML process again. The SVM operator there was correct (I think I did something before, sorry about that). Now the process runs smoothly, but the results only show the ExampleSet (no performance measure). One interesting fact about your process is that the run button changes as well: the first time I click on it, it shows me the ExampleSet with just the sentiment values, and the second time it shows the sentiment attribute as negative or positive. (It's the first time I've seen that, which is why I mention it.) Below is the result I get running your process.
@alinebora Those are just the breakpoints in the process, which are there so you can see what happens at each step. Keep pressing the play button after each pause and the process will complete normally (until you get the normal triangle again).
As long as you connect all the outputs from the last Cross Validation operator, you will also see the performance statistics. You want the "per" and "mod" and "tes" ports connected. They are connected in my process already but sometimes if you switch the data input source then connections are dropped.
Dear Brian @Telcontar120
The connections are all there for Cross Validation; the only change I made was to locate the data to run the process. If I keep pressing the button as you suggested, I get this error message:
@alinebora
That's the same error as before, which originally was being thrown because of the polynominal attribute status-id. The problem is that SVM cannot handle nominal attributes as predictors, only numerical. Did you do something else upstream that could have added more polynominal attributes to the dataset?
I know that running that process that I posted works on the original dataset, which was based on your original input files. Perhaps you can post your current version of the process that you are running? Also check your data file for any polynominal attributes. You could add a "Select Attributes" right before the Cross Validation and filter for only attributes of type "numerical" and that would probably also fix the issue.
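The check for leftover non-numeric attributes can be sketched in Python as follows. This is only an illustration of the filtering idea, not the "Select Attributes" operator; the rows, the extra "source" column, and the helper names are hypothetical.

```python
from numbers import Real


def numeric_columns(rows, keep=()):
    """Return column names whose values are all numeric, plus names in keep.

    keep lets the label survive the filter even though it is nominal.
    Booleans are not treated as numeric here.
    """
    selected = []
    for name in rows[0]:
        all_numeric = all(
            isinstance(row[name], Real) and not isinstance(row[name], bool)
            for row in rows
        )
        if name in keep or all_numeric:
            selected.append(name)
    return selected


def select(rows, names):
    """Project every row onto the given column names."""
    return [{name: row[name] for name in names} for row in rows]


# Hypothetical rows with two nominal columns besides the label.
rows = [
    {"status-id": "t_001", "source": "web", "good": 0.7, "delay": 0, "sentiment": "positive"},
    {"status-id": "t_002", "source": "app", "good": 0.0, "delay": 1, "sentiment": "negative"},
]
kept = numeric_columns(rows, keep=("sentiment",))
filtered = select(rows, kept)
```

Comparing kept with the full column list shows directly which attributes would have been passed to the learner as nominal predictors.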
EDITED: updated sample process attached---this definitely works on the input data, so if it doesn't work for you it is because you have changed something else.
Dear Brian @Telcontar120 ever since the very first time I ran your process here, I did not get any %Performance results :catsad:
Nevertheless, I must say that afterwards I did make a small change in my data preparation (process 1): :catwink:
In 'Select attributes' I selected the attribute 'created_at' (instead of another one I had, called 'followers count'). I renamed it to "Date" with the 'Rename' operator and ran the process again (just because I wanted to see the sentiment per day in a graph afterwards)... Even though I did this, the new 'Date' attribute now shows me only '?'.
I didn't see any change in the ExampleSet result, to be honest.
It could be that this had some impact, I'm not sure... This is the result I got running the latest process you suggested. I attached the XML of process 1 again (with the small change mentioned).
I just did this here (attached XML), but it remains the same. Then again... ever since the very first time I ran the process, the %Performance stat has never appeared for me. T-T :catsad:
By the way, is there a way that I can visualize the Date again (which was previously the attribute "created_at")?
@alinebora the latest version of the process you posted appears to be missing the entire section where all the text is processed from the documents and the sentiment score from WordNet is appended. What happened to that? It seems like you keep changing things around here and it is making it a bit hard to track what is happening. No wonder the modeling process is throwing errors for this.
Basically, you need to make sure that everything is still being done in order. Originally, your process #1 prepared the data from the Excel files, did the text processing, appended the sentiment scoring from WordNet, and stored the result in the repository. The process I supplied here was supposed to be used on the output from that process and effectively replaced your entire process #2. This process looks like it is some kind of hybrid of your original process #1, process #2, and the process I posted??
Just take the stored output from your original process #1 (which you may have modified slightly by adding the Filter Token by Length and pruning options), then retrieve it and run it through the process I posted, and see what happens.
@Telcontar120 Dear Brian,
I did the process several times, deleted everything, took the original data and started everything over, but it wasn't working. I had just given up on Cross Validation, but now I found that I need to include it in my analysis, which is why I'm back to bother you again, if you don't mind :cattongue:
Back then, I had identified 2 constraints:
All that said, I have decided to take a sample from my data and perform my Cross Validation there (it's allowed because even with the stratified sample I have enough rows for my analysis).
However, when I apply the Cross Validation operator, the subprocess within SVM presents 2 errors:
Could you please check my process and identify what I am doing wrong? I added the dataset again to save you from going back to look for it. It's the same one.
Thanks in advance.
“Keep trying no matter how many times you have failed.
If you fail, try, try again.
Never stop trying.
Your success will come unexpected.” -Lailah GiftyAkita
@alinebora I wondered what happened after I never heard back after the last exchange!
The process contained in this post starts from the Excel file that you provided in the last post (airlinex.xlsx). It takes that file, does the text processing (which is required for modeling) and adds pruning, which was still not being done. Without pruning it was generating too many useless attributes. You may want to play around with the pruning threshold in Process Documents from Data parameters, but I do not suggest the option "none" when pruning.
It then selects out the non-text features and transforms the sentiment score into a simple negative/positive binominal attribute, which is required for the SVM model (and the source of your original errors). The role is then set for the id and the transformed sentiment is the label, and the dataset is sampled down to 500 of the positive class and 500 of the negative class (and you could adjust this further if you need to based on your memory constraints). Finally it performs the SVM cross validation and outputs the performance as well as the scored records and the model.
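The sampling and cross-validation steps described above can be sketched in plain Python. This is an illustration under assumptions, not the RapidMiner implementation: the function names, the fixed seed, and the simple non-stratified fold assignment are all choices made for the example.

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_name, per_class, seed=42):
    """Draw up to per_class rows from each class, then shuffle the result."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_name]].append(row)
    sample = []
    for label in sorted(by_class):
        group = by_class[label]
        sample.extend(rng.sample(group, min(per_class, len(group))))
    rng.shuffle(sample)
    return sample


def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for plain k-fold cross validation.

    Every row lands in exactly one test fold; folds are not stratified.
    """
    indices = list(range(n_rows))
    for fold in range(k):
        test = indices[fold::k]
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# Hypothetical data: 30 positive and 12 negative rows, sampled to 10 each.
rows = [{"sentiment": "positive"} for _ in range(30)]
rows += [{"sentiment": "negative"} for _ in range(12)]
sample = balanced_sample(rows, "sentiment", per_class=10)
folds = list(k_fold_indices(len(sample), k=5))
```

Lowering per_class is the simplest lever when memory is tight, at the cost of training each fold on fewer examples.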
This model is not optimized at all so the performance is mediocre at best. But at least you should have a working process here (just change the path of the Excel file in the first operator) to get started!
@Telcontar120 Dear Brian,
I just ran the process here and it worked! I cannot believe it! I'm so happy!
Thank you so much for your help - Unicorn's magic :cathappy: