Integrating RapidMiner with Java Application

nelze · February 2014

I'm trying to integrate my RapidMiner process into a Java application. I need to supply the test data dynamically, while the training set is already part of the RapidMiner process. (This is my first time integrating RapidMiner, so I need some help.) But when I run the code, I keep getting this error:

[Fatal Error] :1:1: Premature end of file.
Exception in thread "main" java.lang.NullPointerException
at NaiveClassifier.main(NaiveClassifier.java:34)

Lines 33 and 34 are:
Operator op = process.getOperator("Read Excel");
op.setParameter(CSVExampleSource.PARAMETER_FILENAME, myData);

where myData is just the file path of the test data. I used CSVExampleSource because my file is a .csv file, but I'm not sure I used it correctly. I also can't find a list of the names getOperator accepts, so I left it at the default "Read Excel".

When I run the process in RapidMiner (putting the test data into the process manually), it produces the classification result.

My test data is unlabeled; I don't know whether that is what causes the error. Also, the operator's port in RapidMiner is connected to the process input port on the left side, since I assumed that is what I need if the test data is supplied dynamically. My test data has many rows, so RapidMiner needs to return something like an array of classification results.

I used the code that the admin posted in this forum some time ago.

Any help will be appreciated.

Marco_Boeck · February 2014

Hi,

a couple of hints:

1) When you connect the process input port on the left side of your process to an operator, you can supply input IOObjects when calling process.run(new IOContainer(...)).
2) process.getOperator("operator_name"); gets the operator whose name matches the name displayed in RapidMiner. If your operator is called "Read Excel" in the RapidMiner GUI, you can get it this way.
3) Only use parameter constants from the matching operator implementation class. If you are using an Excel operator, ExcelExampleSource.PARAMETER_XYZ is valid; CSVExampleSource.PARAMETER_XYZ is not.

Regards,
Marco
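A note on hint 2: the stack trace in the question ends in NaiveClassifier.main at the op.setParameter(...) line, which indicates that getOperator returned null because no operator had the requested name. A small guard turns that into a readable error at the lookup itself. This is a minimal sketch; OperatorLookup and requireByName are hypothetical names, and the helper takes a java.util.function.Function so it runs without the RapidMiner jars.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

final class OperatorLookup {
    private OperatorLookup() {}

    /**
     * Looks up an element by name and fails with a descriptive message instead of
     * handing back null, so a wrong name is reported where the lookup happens.
     */
    static <T> T requireByName(Function<String, T> lookup, String name) {
        T found = lookup.apply(name);
        if (found == null) {
            throw new NoSuchElementException(
                    "No operator named \"" + name + "\"; check the name shown in the RapidMiner GUI");
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a process: operator names mapped to placeholder objects.
        Map<String, String> operators = Map.of("Read CSV", "csv-reader", "Naive Bayes", "learner");
        String reader = requireByName(operators::get, "Read CSV");
        System.out.println(reader);
        try {
            requireByName(operators::get, "Read Excel");
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With RapidMiner on the classpath, the lookup argument would be process::getOperator, e.g. Operator op = OperatorLookup.requireByName(process::getOperator, "Read CSV"); (same hypothetical helper).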

nelze · February 2014

Thank you for the reply!

1) What goes inside the IOContainer in process.run(new IOContainer(...))? Is it the test data? When I tried to put the test data in it, I got "Premature end of file" and then a "no absolute path" error.

I'm using Read CSV, so I changed the operator name to that. This is what the process looks like; all settings are at their defaults except the separator, which I changed to a comma.

Marco_Boeck · February 2014

Hi,

RapidMiner processes work with IOObjects. This is the interface for the data coming in and out of operator ports. In your specific case, you can remove the connection from the process input port to the Read CSV operator, because the Read CSV operator is capable of reading a .csv file directly from the file system.

Regards,
Marco
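Because Read CSV reads the file itself, the Java side only has to supply a valid path string. Given the "no absolute path" message reported above, it may help to normalize and check the path before setting the parameter. A plain-Java sketch; CsvPathCheck and toAbsoluteCsvPath are hypothetical names:

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CsvPathCheck {
    private CsvPathCheck() {}

    /**
     * Turns a user-supplied path into an absolute, normalized path and verifies
     * that it points to a readable regular file before it is handed to Read CSV.
     */
    static String toAbsoluteCsvPath(String userPath) throws FileNotFoundException {
        Path p = Paths.get(userPath).toAbsolutePath().normalize();
        if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
            throw new FileNotFoundException("Not a readable file: " + p);
        }
        return p.toString();
    }
}
```

The returned string would then be passed to op.setParameter(...) together with the file parameter constant of the Read CSV implementation class.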

nelze · February 2014

This is what the Java code looks like:


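import java.io.File;
import java.io.IOException;

import com.rapidminer.Process;
import com.rapidminer.RapidMiner;
import com.rapidminer.example.ExampleSet;
import com.rapidminer.operator.IOContainer;
import com.rapidminer.operator.IOObject;
import com.rapidminer.operator.Operator;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.operator.io.CSVExampleSource;
import com.rapidminer.repository.IOObjectEntry;
import com.rapidminer.repository.RepositoryException;
import com.rapidminer.repository.RepositoryLocation;
import com.rapidminer.tools.XMLException;

public class NaiveClassifier {

	public static void main(String[] args) {
		try {
			// Initialize RapidMiner without the GUI
			RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
			RapidMiner.init();

			// Load the process definition (.rmp) and point Read CSV at the test data
			Process process = new Process(new File("C:\\Users\\nelze\\.RapidMiner5\\repositories\\WalkingRepo\\NaiveClassify.rmp"));
			Operator op = process.getOperator("Read CSV");
			op.setParameter(CSVExampleSource.PARAMETER_FILENAME, "C:\\Users\\nelze\\unlabeled.csv");

			// Look up the test data as a repository entry and pass it to the process input port
			RepositoryLocation loc = new RepositoryLocation("C:\\Users\\nelze\\unlabeled.csv");
			IOObjectEntry entry = (IOObjectEntry) loc.locateEntry();
			IOObject myIOObject = entry.retrieveData(null);
			IOContainer ioInput = new IOContainer(new IOObject[] {myIOObject});
			IOContainer ioResult = process.run(ioInput);

			ExampleSet resultSet1 = (ExampleSet) ioResult.getElementAt(0);
			System.out.println(resultSet1);
		} catch (IOException | XMLException | OperatorException ex) {
			ex.printStackTrace();
		} catch (RepositoryException e) {
			e.printStackTrace();
		}
	}
}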

I'm not sure whether what I passed to RepositoryLocation is correct (I just hard-coded the file path of the test data for now).

nelze · February 2014

So in the RapidMiner process, should I remove the line going from inp to Read CSV completely?

Marco_Boeck · February 2014

Hi,

No, remove only the connection between the process input port and the Read CSV operator, and leave the others as they are. Also, your code will not work because you do this:
RepositoryLocation loc = new RepositoryLocation("C:\\Users\\nelze\\unlabeled.csv");
which is not possible. A RepositoryLocation is a location in the RapidMiner repository, not a file on your hard disk. You need to configure your "Read CSV" operator so that the process works once com.rapidminer.operator.nio.CSVExampleSource.PARAMETER_CSV_FILE has been set to the desired .csv file on your hard disk.

Put simply: "Read xyz" operators read data from an outside source (hard disk, database, web, ...) into the internal format (IOObject) used by RapidMiner. Usually this will be an ExampleSet, at least for data that can be represented as a table. Most operators then work with this internal representation. At the end of the process you either save the results in a repository (if you want to reuse them in another RapidMiner process), or you use a "Write xyz" operator, which transforms the data from the internal representation back into something for the outside world (e.g. a file on the hard disk, a database table, ...).

Edit: updated to correct operator and parameter.

Regards,
Marco

nelze · February 2014

Hi, I apologize for the many questions, but I am quite new to this.

1. So my process now looks like this. Is this correct?

2. I changed it to RepositoryLocation loc = new RepositoryLocation("//WalkingRepo//");
But then it says that the IOObject is null.

WalkingRepo is where my .rmp file is, as is the training data, and I also put the test data there for good measure.

3. How do I configure Read CSV? The only setting I found for the csv file (in RapidMiner, in the parameters panel on the right) makes me select a file on disk. Only the sample file is on disk, but in reality I need to get the file dynamically (the user enters the path to their own file), so I'm not sure how to configure Read CSV for that.

I only put the hard drive file location in the code as a placeholder.

4. I need to store the classification results in the database, but I was planning to take the resulting ExampleSet directly and send it to the web, since I don't need a local copy. If I do need to store the results, what's the sample code for that?

Thank you!

nelze · February 2014

I edited my code to this:


RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
RapidMiner.init();

Process process = new Process(new File("C:\\Users\\nelze\\.RapidMiner5\\repositories\\WalkingRepo\\NaiveClassify.rmp"));
Operator op = process.getOperator("Read CSV");
op.setParameter(CSVExampleSource.PARAMETER_FILENAME, "C:\\Users\\nelze\\unlabeled.csv");
IOContainer ioResult = process.run();
ExampleSet resultSet1 = (ExampleSet) ioResult.getElementAt(0);
System.out.println(resultSet1);

And it finally printed the ExampleSet, but all it showed were the attributes. How do I obtain the classification results? My unlabeled .csv file has multiple rows, so I expected an array with one prediction per row, like the "prediction" column shown when I run the process in RapidMiner.

Marco_Boeck · February 2014

Hi,

When you print an ExampleSet with System.out.println, you only get whatever its toString() method returns. An ExampleSet is basically a table; you can use


ExampleSet exampleSet = (ExampleSet) ioResult.getElementAt(0); // the ExampleSet returned by process.run()
Iterator<Attribute> allAttributes = exampleSet.getAttributes().allAttributes();
while (allAttributes.hasNext()) {
	Attribute att = allAttributes.next();
	for (Example example : exampleSet) {
		if (att.isNominal()) {
			System.out.println(example.getNominalValue(att));
		} else if (att.isDateTime()) {
			System.out.println(example.getDateValue(att));
		} else {
			System.out.println(example.getValue(att));
		}
	}
}

to iterate over the whole table structure.

Regards,
Marco
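Note that the loop above visits the table column by column, so each attribute's values are printed in a separate run. If the goal is one prediction per row, as asked above, it is enough to walk the rows once and read only the prediction column. The sketch shows that access pattern in plain Java so it runs without RapidMiner; the row model, class name, and column name are hypothetical, and the RapidMiner 5 calls in the comment (Attributes.getPredictedLabel() in particular) are an assumption to verify against your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PredictionColumn {
    private PredictionColumn() {}

    /**
     * Collects the value of one column from every row, in row order.
     * Each row is modeled as a column-name -> value map standing in for an Example.
     *
     * RapidMiner 5 equivalent (assumption, not exercised here):
     *   Attribute predicted = exampleSet.getAttributes().getPredictedLabel();
     *   for (Example example : exampleSet) values.add(example.getNominalValue(predicted));
     */
    static List<String> column(List<Map<String, String>> rows, String columnName) {
        List<String> values = new ArrayList<>(rows.size());
        for (Map<String, String> row : rows) {
            values.add(row.get(columnName));
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical scored rows: one input attribute plus the prediction column
        List<Map<String, String>> rows = List.of(
                Map.of("att1", "0.5", "prediction(label)", "1"),
                Map.of("att1", "0.7", "prediction(label)", "2"));
        System.out.println(column(rows, "prediction(label)"));
    }
}
```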

nelze · February 2014

Thank you very much! I was able to get the prediction results. I have another question though...

How do I prevent the first row of my test data from becoming a column name? I just noticed that my test data has 50 rows, but the prediction only gives 49 values.

Marco_Boeck · February 2014

Hi,

you can configure that in the Read Excel or Read CSV operator. See the "first row as names" parameter on them.

Regards,
Marco
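The 50 rows vs. 49 predictions symptom is what you get when a file without a header line is read with its first line treated as column names. A plain-Java sketch of that behavior (HeaderRows and dataRows are hypothetical names; no RapidMiner involved):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HeaderRows {
    private HeaderRows() {}

    /**
     * Splits comma-separated lines into data rows. When firstRowAsNames is true,
     * the first line is treated as column names and is not returned as data.
     */
    static List<List<String>> dataRows(List<String> lines, boolean firstRowAsNames) {
        List<List<String>> rows = new ArrayList<>();
        int start = firstRowAsNames ? 1 : 0;
        for (int i = start; i < lines.size(); i++) {
            // limit -1 keeps trailing empty fields instead of dropping them
            rows.add(Arrays.asList(lines.get(i).split(",", -1)));
        }
        return rows;
    }
}
```

For a 50-line file with no header, dataRows(lines, true) yields 49 rows and dataRows(lines, false) yields 50.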

nelze · February 2014

Good day! I was adding the dynamic file input, but now I get this error:

com.rapidminer.operator.UserError: No input file was defined.

Any hints? I continued from my last post; f is the path of the file obtained dynamically:

Process process = new Process(new File("C:\\Users\\nelze\\.RapidMiner5\\repositories\\WalkingRepo\\NaiveClassify.rmp"));
Operator op = process.getOperator("Read CSV");
op.setParameter(CSVExampleSource.PARAMETER_FILENAME, f);
IOContainer ioResult = process.run();
ExampleSet resultSet1 = (ExampleSet) ioResult.getElementAt(0);
Iterator<Attribute> allAttributes = resultSet1.getAttributes().allAttributes();
while (allAttributes.hasNext()) {
	Attribute att = allAttributes.next();
	for (Example example : resultSet1) {
		if (att.isNominal()) {
			System.out.println(example.getNominalValue(att));
		}
	}
}

My RapidMiner process is still the same.
It also keeps telling me that CSVExampleSource is deprecated. When I run this code with Excel instead of CSV (after converting all my csv files to Excel as well), it works properly. However, I can't use Excel because it requires importing more libraries in Java. Is there an updated class for CSV?

Marco_Boeck · February 2014

Hi,

Sorry, my bad. com.rapidminer.operator.nio.CSVExampleSource is the correct one; com.rapidminer.operator.io.CSVExampleSource is outdated and no longer used. You can also see this with the debugger: add a breakpoint after calling process.getOperator("Read CSV"); and you will notice that op is an instance of the new version. This also changes the file parameter to com.rapidminer.operator.nio.CSVExampleSource.PARAMETER_CSV_FILE.

Regards,
Marco
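Besides the debugger, printing the runtime class right after the lookup shows which implementation is in use, since the declared type (Operator) does not tell you. A plain-Java illustration with a hypothetical helper; in the real program the equivalent line would be System.out.println(op.getClass().getName()); right after process.getOperator("Read CSV").

```java
import java.util.LinkedList;
import java.util.List;

final class RuntimeType {
    private RuntimeType() {}

    /** Fully qualified name of the object's runtime class, or "null". */
    static String implementationOf(Object o) {
        return o == null ? "null" : o.getClass().getName();
    }

    public static void main(String[] args) {
        // Declared as the interface type, like Operator op = process.getOperator(...)
        List<String> list = new LinkedList<>();
        System.out.println(implementationOf(list)); // java.util.LinkedList
    }
}
```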

nelze · February 2014

Thank you very much!

I was also wondering whether there's a way to pass string input to RapidMiner. I was planning to do the classification in real time, so writing a text file every time would take up a lot of space.

Marco_Boeck · March 2014

Hi,

You can replace the "Read XYZ" operator with a direct connection from the process input port on the far left to your other operators. That way the ExampleSet is not created by the "Read XYZ" operator but has to be supplied as a parameter of the run() method when running the process. You can create an ExampleSet manually and set the values of all cells yourself.
See here for an example: http://rapid-i.com/rapidforum/index.php/topic,5531.msg19614.html#msg19614

Regards,
Marco
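For the real-time case, the values can stay in memory instead of going through a temporary file. The sketch parses comma-separated numeric text held in a String into double rows; it is plain Java and InMemoryRows is a hypothetical name. Turning those rows into an ExampleSet is a RapidMiner step: the comment names RapidMiner 5 classes commonly used for it (MemoryExampleTable, DoubleArrayDataRow), but treat those calls as an assumption and follow the linked example for the exact code.

```java
import java.util.ArrayList;
import java.util.List;

final class InMemoryRows {
    private InMemoryRows() {}

    /**
     * Parses numeric CSV text held in memory (one row per line) into double arrays,
     * so no temporary file has to be written.
     *
     * Each double[] would then become one row of a manually built ExampleSet. With
     * RapidMiner 5 this is typically done by adding new DoubleArrayDataRow(values)
     * to a MemoryExampleTable, calling createExampleSet(), and passing the result via
     * process.run(new IOContainer(exampleSet)). Those calls are assumptions and are
     * not exercised here; nominal columns would additionally need a value mapping.
     */
    static List<double[]> parse(String csvText) {
        List<double[]> rows = new ArrayList<>();
        for (String line : csvText.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            double[] values = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                values[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(values);
        }
        return rows;
    }
}
```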

nelze · March 2014

Hi! Unfortunately, when I use com.rapidminer.operator.nio.CSVExampleSource with PARAMETER_CSV_FILE, it still gives com.rapidminer.operator.UserError: No input file was defined.

I use it as
Operator op = process.getOperator("Read CSV");
op.setParameter(CSVExampleSource.PARAMETER_CSV_FILE, f);

where f is the filename

Marco_Boeck · March 2014

Hi,

can you please post the full relevant code and the full error log?

Regards,
Marco

nelze · March 2014

try {
	RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
	RapidMiner.init();

	Process process = new Process(new File("C:\\Users\\nelze\\.RapidMiner5\\repositories\\WalkingRepo\\NaiveClassify.rmp"));

	Operator op = process.getOperator("Read CSV");
	op.setParameter(CSVExampleSource.PARAMETER_CSV_FILE, "C:\\Users\\nelze\\Desktop\\unlabeled.csv");

	IOContainer ioResult = process.run();
	ExampleSet resultSet1 = (ExampleSet) ioResult.getElementAt(0);
	Iterator<Attribute> allAttributes = resultSet1.getAttributes().allAttributes();
	while (allAttributes.hasNext()) {
		Attribute att = allAttributes.next();
		for (Example example : resultSet1) {
			if (att.isNominal()) {
				System.out.println(example.getNominalValue(att));
			}
		}
	}

} catch (IOException | XMLException | OperatorException ex) {
	ex.printStackTrace();
}

The stack trace is as follows:

INFO: Process C:\Users\nelze\.RapidMiner5\repositories\WalkingRepo\NaiveClassify.rmp starts
com.rapidminer.operator.UserError: No input file was defined.
at com.rapidminer.operator.nio.model.CSVResultSet.openStream(CSVResultSet.java:137)
at com.rapidminer.operator.nio.model.CSVResultSet.open(CSVResultSet.java:79)
at com.rapidminer.operator.nio.model.CSVResultSet.<init>(CSVResultSet.java:73)
at com.rapidminer.operator.nio.model.CSVResultSetConfiguration.makeDataResultSet(CSVResultSetConfiguration.java:114)
at com.rapidminer.operator.nio.model.AbstractDataResultSetReader.createExampleSet(AbstractDataResultSetReader.java:127)
at com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:52)
at com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:36)
at com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:126)
at com.rapidminer.operator.Operator.execute(Operator.java:866)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:375)
at com.rapidminer.operator.Operator.execute(Operator.java:866)
at com.rapidminer.Process.run(Process.java:949)
at com.rapidminer.Process.run(Process.java:873)
at com.rapidminer.Process.run(Process.java:832)
at com.rapidminer.Process.run(Process.java:827)
at com.rapidminer.Process.run(Process.java:817)
at NaiveClassifier.main(NaiveClassifier.java:34)

nelze · March 2014

Never mind, it was just a problem with the port. However, I'm seeing some strange behavior...

If I input the data dynamically, the result differs from when I set the file on my PC manually (I usually compare results to validate them), even though the values are the same. For example, the dynamically supplied file returns predictions of 1,1,3,4, but when I run the file from my PC, it gives 1,2,3,4, even though it uses the same model and exactly the same values.

Marco_Boeck · March 2014

Hi,

Store both ExampleSets in your repository (one loaded manually, one loaded dynamically), then compare them (column value types and actual data values). If they are 100% identical, they should produce the same result (unless the operators used rely on different random seeds).

Regards,
Marco
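A sketch of such a comparison in plain Java, assuming both ExampleSets have been exported to text cells with the header row first (TableDiff and this export format are hypothetical). It reports the first mismatch. It compares cell text literally, so "2" and "2.0" count as different, which can itself hint at a value type mismatch.

```java
import java.util.List;
import java.util.Objects;

final class TableDiff {
    private TableDiff() {}

    /**
     * Returns a description of the first difference between two tables
     * (row 0 = header, then data rows), or null if they are identical.
     */
    static String firstDifference(List<List<String>> a, List<List<String>> b) {
        if (a.size() != b.size()) {
            return "row count " + a.size() + " vs " + b.size();
        }
        for (int r = 0; r < a.size(); r++) {
            List<String> rowA = a.get(r);
            List<String> rowB = b.get(r);
            if (rowA.size() != rowB.size()) {
                return "row " + r + ": column count " + rowA.size() + " vs " + rowB.size();
            }
            for (int c = 0; c < rowA.size(); c++) {
                if (!Objects.equals(rowA.get(c), rowB.get(c))) {
                    return "row " + r + ", column " + c + ": \"" + rowA.get(c) + "\" vs \"" + rowB.get(c) + "\"";
                }
            }
        }
        return null;
    }
}
```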

nelze · March 2014

How do I prevent the seed from changing? I already put the initialization in an if statement like this:

if (!rapidMinerInited) {
	RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
	RapidMiner.init();

	System.out.println(f);
	process = new Process(new File("C:\\Users\\nelze\\.RapidMiner5\\repositories\\WalkingRepo\\NaiveClassify.rmp"));
	rapidMinerInited = true;
}

But the results are still randomized. And yes, the two ExampleSets are exactly the same.
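Guarding RapidMiner.init() with a flag avoids initializing twice, but it does not fix any random seed; seeds are a property of the process and its operators. The principle behind "same seed, same result" is shown below with java.util.Random in plain Java (SeedDemo is a hypothetical name). In RapidMiner itself the seed is set through parameters; in RapidMiner 5 the root Process operator has a random seed parameter and some operators offer a local random seed option, but check the exact parameter names in your version.

```java
import java.util.Random;

final class SeedDemo {
    private SeedDemo() {}

    /** Draws n integers in [0, bound) from a generator created with the given seed. */
    static int[] draw(long seed, int n, int bound) {
        Random random = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = random.nextInt(bound);
        }
        return out;
    }
}
```

Two calls with the same seed return identical arrays; a generator without an explicit seed gives no such guarantee between runs.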

Marco_Boeck · March 2014

Hi,

In that case I would also like to take a look at the process itself; can you please post its XML?
And just to clarify: the only difference is that one time you set the csv file directly inside the process and just execute it from Java, and the other time you do not set the csv file in the operator but set it in Java before executing the process?

Regards,
Marco

nelze · April 2014

OK, I will look for the XML file, and yes, that is the situation.

I also wanted to ask: how do I set up RapidMiner on a server? Do I just copy all the jar files to the server machine?

Marco_Boeck · April 2014

Hi,

you just install RapidAnalytics (now RapidMiner Server) via the provided installer and follow the instructions in it.

Regards,
Marco
