Correct ARFF Format?
Legacy User
Hi all,
I'm running a Naive Bayes classifier on a set of keyword/keyphrases and then using the produced model to predict the label attribute for an unclassified set of keywords/keyphrases. However, I'm running into some strange problems where the result of my applied model shows a ? if I have a space between keywords. I'm thinking that I may be formatting my ARFFs incorrectly?
Here is my training set:
@RELATION c_training
@ATTRIBUTE keywords STRING
@ATTRIBUTE change {up,down,neutral}
@DATA
'delay acquisition',down
'facing the same conundrum',down
'restructuring',down
'delay acquisition',up
'divestiture',down
'profit dissipated',down
'delay acquisition',up
'profits up', up
'profits down', down
'delay acquisition',up
'delay acquisition',up
'delay acquisition',up
'delay acquisition',up

And here is my test set:

@RELATION c_test
@ATTRIBUTE keywords STRING
@DATA
'profit dissipated'

Any help would be appreciated.
Thank you.
Answers
As far as I remember, ARFF uses double quotes (") instead of single quotes ('). Could that be the reason?
Cheers,
Ingo
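For anyone who wants to try the double-quote suggestion without retyping the data, here is a minimal Python sketch that generates the ARFF text. The helper names `arff_quote` and `build_arff` are made up for illustration, and the backslash escaping is an assumption about how the reader treats embedded quotes; whether RapidMiner's ARFF import actually requires double quotes is not confirmed here.

```python
def arff_quote(value, quote='"'):
    """Wrap a string value in quotes, escaping backslashes and the quote character.

    Assumption: the ARFF reader accepts backslash-escaped quotes.
    """
    escaped = value.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return f"{quote}{escaped}{quote}"


def build_arff(relation, attributes, rows):
    """Return ARFF text.

    `attributes` is a list of (name, type) pairs; only values that belong
    to STRING attributes are quoted, nominal values are written as-is.
    """
    lines = [f"@RELATION {relation}"]
    lines += [f"@ATTRIBUTE {name} {atype}" for name, atype in attributes]
    lines.append("@DATA")
    for row in rows:
        cells = [
            arff_quote(cell) if atype.upper() == "STRING" else cell
            for cell, (_, atype) in zip(row, attributes)
        ]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Two rows taken from the training set in the question.
training = build_arff(
    "c_training",
    [("keywords", "STRING"), ("change", "{up,down,neutral}")],
    [("delay acquisition", "down"), ("profits up", "up")],
)
print(training)
```

Switching `quote` to `'` in `arff_quote` reproduces the single-quoted style of the original files, which makes it easy to compare how the two variants are read.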
With the above training and test data (using double quotes), the end result is still not the expected one. But the problem is strange: if in the test data we change "profit dissipated" to "profit  dissipated" (with 2 spaces), it works fine.
So a correct Arff for training would look like and for testing accordingly
Please use the meta data view to verify that everything was loaded correctly. Instead of using ARFF, you could also use the Attribute Editor of RM if you do not want to type in the different values yourself. Alternatively, you could load the data from ARFF using a string attribute and write it out with the ExampleSetWriter (both the meta data file .aml and the data file .dat). Then you could use the same basic .aml file for your test data.
Cheers,
Ingo
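The point about reusing the same meta data for training and test data can also be checked outside RM. The following is only a rough Python sketch, not part of RapidMiner: the functions `arff_attributes` and `header_mismatches` are hypothetical, and the parser assumes simple attribute names without spaces or quotes.

```python
def arff_attributes(text):
    """Return {name: type} for each @ATTRIBUTE line in an ARFF header.

    Assumption: attribute names contain no spaces and are not quoted.
    """
    attrs = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0].upper() == "@ATTRIBUTE":
            attrs[parts[1]] = parts[2]
        elif parts and parts[0].upper() == "@DATA":
            break  # header ends where the data section starts
    return attrs


def header_mismatches(train_text, test_text):
    """List attributes present in both files whose declared types differ."""
    train = arff_attributes(train_text)
    test = arff_attributes(test_text)
    return [name for name in train if name in test and train[name] != test[name]]


train_text = """@RELATION c_training
@ATTRIBUTE keywords STRING
@ATTRIBUTE change {up,down,neutral}
@DATA
'profit dissipated',down
"""

test_text = """@RELATION c_test
@ATTRIBUTE keywords STRING
@DATA
'profit dissipated'
"""

print(header_mismatches(train_text, test_text))
```

An empty list means every attribute the two files share is declared with the same type; it says nothing about how the values themselves are parsed.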