"[MOSTLY SOLVED] Trying to build the tutorial's extension"
I'm trying to follow the "How to Extend RapidMiner 5" white paper and having trouble.
At section 7.1 "The Extension Bundle", it talks of "the tutorial extension that comes with this guide. As all RapidMiner extensions it comes as a single jar file." No such jar came with the white paper or its accompanying zips (it's not in the Tutorial, Template or Unuk projects). The contents of the manifest which are given make it look like the jar is produced by the Tutorial, but section 7.2 "The ant Build File" contradicts this by clearly using the Template as the source.
The first confusion occurs in chapter 4 "Creating your own Extension", on p. 20, where it reads "If you are going to deploy your Extension to RapidMiner for testing purpose, you might execute the install target of the ant file build.xml." This should specify which build.xml: Template, Tutorial, or either.
Next, section 5.1 "Our First Operator" should indicate which (of Template or Tutorial) to use as the basis for the new class. I've used the Template.
Small correction: on p. 23 (section 5.2 "Adding Ports"), the 'exampleSetInput.getData()' call is deprecated, and should be replaced with 'exampleSetInput.getData(ExampleSet.class)'.
On p. 25 (section 5.3 "Declaring operators to RapidMiner"), the OperatorsTemplate.xml shown doesn't match the one that came with the Template project. It is currently loaded with a 'generate_extract' operator named 'com.rapidminer.operator.features.construction.TextInformationExtractionOperator'; this should be replaced with what is shown in the white paper (i.e. <key>numerical_to_date</key> <class>com.rapidminer.operator.preprocessing.transformation</class> <replaces>Numerical2Date</replaces>).
Once the OperatorsTemplate.xml is updated and the Numerical2DateOperator.java has been written in the Template's source, running the ant install fails. All 74 errors are along the lines of "error: package com.sun.javadoc does not exist", and clearly are a consequence of the javadoc's tools.jar not being in the JRE (it's in the JDK instead).
It's possible to add tools.jar manually to the Unuk project; in order for the .classpath addition to be a relative path, one could add tools.jar to /lib/. However, this is pointless as it will not solve the ant install failure.
It turns out one needs to set the JRE to the JDK's when running the ant install. From the Template's build.xml, choose Run as: Ant Build...: Edit Configuration: JRE: Separate JRE: jdk, then Run.
But my ant install still fails. The edited log concludes this post, below. The last error message is in French (my OS is French) and basically complains about "unauthorized content in prologue". The problem seems to be that the ant install run goes through all the projects in my Eclipse project repository, including the .metadata folder. Note that the error apparently concerns the very first file it looked at. How do I fix that?
Buildfile: C:\Users\username\Documents\Eclipse\RapidMiner_Extension_Template\build.xml
Trying to override old definition of task get
Trying to override old definition of task rpm
Trying to override old definition of task post
clean:
[echo] Cleaning...
[delete] Deleting directory C:\Users\username\Documents\Eclipse\RapidMiner_Extension_Template\build
[delete] Deleting directory C:\Users\username\Documents\Eclipse\RapidMiner_Extension_Template\javadoc
[mkdir] Created dir: C:\Users\username\Documents\Eclipse\RapidMiner_Extension_Template\build
[mkdir] Created dir: C:\Users\username\Documents\Eclipse\RapidMiner_Extension_Template\javadoc
version.get:
[echo] Long version: ${extension.version}.${extension.revision}.${extension.update}; short version: ${extension.version}.${extension.revision}
init.setEncoding:
init:
Trying to override old definition of task post
Trying to override old definition of task post
init.setEncoding:
copy-resources:
[echo] Copying resources...
[copy] Copying 9 files to C:\Users\username\Documents\Eclipse\RapidMiner_Extension_Template\build
build:
Trying to override old definition of task post
Trying to override old definition of task post
build.rm:
Trying to override old definition of task post
Trying to override old definition of task post
build.dependencies.prepare:
[echo] Dependencies of Template:
[echo] C:\Users\username\Documents\Eclipse\.metadata\.bak_0.log
...
(thousands of echo lines going through all projects in C:\Users\username\Documents\Eclipse)
...
[echo] C:\Users\username\Documents\Eclipse\RapidMiner_Unuk\svn.project
build.dependencies:
[echo] Building plugin dependencies of Template...
BUILD FAILED
C:\Users\username\Documents\Eclipse\RapidMiner_Unuk\build_extension.xml:139: The following error occurred while executing this line:
C:\Users\username\Documents\Eclipse\RapidMiner_Unuk\build_extension.xml:191: The following error occurred while executing this line:
C:\Users\username\Documents\Eclipse\RapidMiner_Unuk\build_extension.xml:196: The following error occurred while executing this line:
C:\Users\username\Documents\Eclipse\.metadata\.bak_0.log:1: Contenu non autorisé dans le prologue.
Total time: 8 seconds
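For readers unfamiliar with the last message in the log: in English it reads "Content is not allowed in prolog", which is what the JDK's XML parser reports when the input it is handed does not begin like an XML document. That fits the build trying to process .bak_0.log as if it were XML. The following is only a minimal sketch reproducing the symptom outside of ant; the class name, method name, and sample strings are made up for illustration and are not part of RapidMiner or its build files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class PrologErrorDemo {

    /** Returns the line number of the first fatal XML error in the text, or -1 if it parses. */
    static int firstErrorLine(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(text)));
            return -1;
        } catch (SAXParseException e) {
            // For plain text this is the "content not allowed in prolog" error.
            return e.getLineNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the first line of a plain-text log file.
        System.out.println(firstErrorLine("session log, not XML"));
        // A well-formed document parses without error.
        System.out.println(firstErrorLine("<project name=\"demo\"/>"));
    }
}
```

The error points at line 1 because the parser gives up as soon as it meets text where only an XML declaration, a comment, or the root element may appear.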
Answers
My ant install log concludes this post. Now I'll try to get rid of the warnings.
The solution (found here: http://rapid-i.com/rapidforum/index.php/topic,2356.msg9362.html#msg9362) is to create a file named "build.properties" (at the project's root) with this content (modifying the values as required):
Next! This is a harmless warning that is expected since ant 1.8. To fix it, add to the top-level build.xml (in this case the Template's build.xml) the following lines, right after the <description> element:
You will henceforth receive the "Trying to override old definition of task javac" message instead (just before the "Trying to override old definition of task get" message).
Next! The solution is to fix this in the Unuk build_extension.xml, because it redefines the javac task at line 144. Simply change the 'target="1.6"' part into 'source="1.6" target="1.6"'. Like so:
Next! This problem occurs because the META-INF/MANIFEST.MF of lib/jdbc/jtds-1.2.2.jar contains a "Class-Path: jcifs.jar" statement.
One solution is to rebuild the offending jar with a MANIFEST.MF that does not have a Class-Path entry. Another is allegedly to pass the -Xlint:-path argument to javac. So, in Unuk's build_extension.xml, in the block we just modified, right after <compilerarg value="${compiler.arguments}" />, we add this:
Well, it's supposed to work but does not. Maybe this is an ant 1.8 bug? I've tried every occurrence of javac in both build.xml files and the build_extension.xml, with no luck.
Next! One solution is to provide the path to the rt.jar of the correct source/target version, e.g. <javac ... bootclasspath="/opt/sun-jdk-1.5.0.22/jre/lib/rt.jar" source="1.5" target="1.5" />.
Failing this, we go back to the build.dependencies bit of build_extension.xml and add, right after <compilerarg value="${compiler.arguments}" />, this:
Unlike <compilerarg value="-Xlint:-path" />, this works.
Minor: As part of its startup, RapidMiner emits this troubling message:
Apparently this is a "normal" SAXException occurring when processing an input XML file, either because the file is missing or because one is trying to read it a second time (see e.g. http://www.danielschneller.com/2008/01/saxparseexception-1-1-premature-end-of.html). Anyone know where in RapidMiner this exception is being thrown and caught? It'd be nice to get rid of it cleanly.
Major:
The Template manages to correctly run its ant build and produces an apparently well-formed jar in RapidMiner's plugins directory. But when I run it (from within Eclipse), the plug-in is not being loaded. My own plugin is derived from the Template, so of course it doesn't load either. What is missing?
About the issue with the xml file: does it occur only when your extension is activated (so is it really related to your extension)? Do your operators.xml and operatorsDocumentation.xml etc. have correct syntax?
Regards,
Marius
com.rapidminer.operator.OperatorCreationException: Operator cannot be constructed: 'read_ltf(com.rapidminer.operator.io.LTFDataReader)': tried to access method com.rapidminer.operator.io.AbstractDataReader$CacheResetParameterObserver.<init>(Lcom/rapidminer/operator/io/AbstractDataReader;Ljava/lang/String;)V from class com.rapidminer.operator.io.LTFDataReader
Which occurs at the "getParameters().addObserver(new CacheResetParameterObserver(PARAMETER_LTF_DIR), false);" line in the constructor.
build.properties simply specifies the project version:
The RapidMiner_Extension_LTF project was prepared by copying the RapidMiner_Extension_Template project and first changing the *Template* file names into *LTFReader*.
# com.rapidminer.PluginInitLTFReader doesn't do anything
# com.rapidminer.operator.io.LTFDataReader is as given above
# resources com.rapidminer.resources.i18n.ErrorsLTFReader.properties is empty
# resources com.rapidminer.resources.i18n.GUILTFReader.properties is empty
# resources com.rapidminer.resources.i18n.UserErrorMessagesLTFReader.properties is empty
# resources com.rapidminer.resources.GroupsLTFReader.properties is empty
# resources com.rapidminer.resources.i18n.OperatorsDocLTFReader.xml is as follows:
# resources com.rapidminer.resources.ioobjectsLTFReader.xml is essentially empty:
# resources com.rapidminer.resources.parserulesLTFReader.xml is also essentially empty:
# resources com.rapidminer.resources.OperatorsLTFReader.xml is as follows:
# build.xml, finally, is as follows:
I suspect the error is in build.xml, or lies in some missing file, maybe even a missing Java class.
com.rapidminer.RapidMiner.init(), line 534: "DatabaseConnectionService.init"
com.rapidminer.tools.jdbc.connection.DatabaseConnectionService.init, line 94: "Tools.readTextFile(xmlConnectionsFile)"
com.rapidminer.tools.Tools.readTextFile, line 776: "Document processXmlDocument = documentBuilder.parse(inStream)"
This happens because the file "/.RapidMiner5/connections.xml" exists but is empty (length zero). javax.xml.parsers.DocumentBuilder throws a SAXException and also writes "[Fatal Error] :1:1: Premature end of file" to stderr. There seems to be no way around this behaviour short of changing the Java source code.
The simplest workaround is to add a line at the beginning of Tools.readTextFile (just before line 763), where file stands for the method's File argument (its actual name may differ):
if (file.length() <= 0) return "";
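A minimal, standalone sketch of that guard follows. It does not use RapidMiner's Tools class; the class name, method name, and the <connections/> root element are placeholders chosen for illustration. It also shows a second option that I believe works with the JDK parser: installing an org.xml.sax ErrorHandler on the DocumentBuilder replaces the default handler that prints "[Fatal Error]" lines to stderr, so the exception is still thrown but nothing is written to the console.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EmptyXmlGuard {

    /** Parses an XML file, or returns null when the file is missing or zero-length. */
    static Document parseIfNotEmpty(File file)
            throws IOException, SAXException, ParserConfigurationException {
        // A zero-length file is not well-formed XML, so skip the parser entirely.
        if (!file.isFile() || file.length() == 0) {
            return null;
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(file);
    }

    public static void main(String[] args) throws Exception {
        // Temporary file standing in for an empty connections.xml.
        File file = File.createTempFile("connections", ".xml");
        file.deleteOnExit();
        System.out.println(parseIfNotEmpty(file) == null);

        // Placeholder content; the real file's root element is not assumed here.
        Files.write(file.toPath(), "<connections/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseIfNotEmpty(file).getDocumentElement().getTagName());
    }
}
```

Returning null (or an empty string, as in the one-line workaround) pushes the "nothing to read" case onto the caller, so the caller has to be prepared for it.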
I also get an annoying "No file exists" message at the end of the RapidMiner startup. This comes from:
com.rapid_i.deployment.update.client.UpdateManager.checkForPurchasedNotInstalled, line 817: "UserCredential authentication = Wallet.getInstance().getEntry(updateServerURI);"
com.rapidminer.gui.security.Wallet.readCache(), line 80: "System.err.println("No file exists");"
Clearly this System.err call should be rewritten to use LogService instead (a String will need to be added to resources/com/rapidminer/resources/i18n/LogMessages.properties). The message should be made more meaningful as well, something like "No stored credentials file exists".
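To illustrate the kind of change meant here, this is a rough sketch using java.util.logging as a stand-in. I am not reproducing RapidMiner's LogService or Wallet API; the class name, method name, and message constant below are hypothetical, and in RapidMiner the message text would live in LogMessages.properties rather than in a constant.

```java
import java.io.File;
import java.util.logging.Level;
import java.util.logging.Logger;

final class WalletCacheCheck {

    private static final Logger LOGGER = Logger.getLogger(WalletCacheCheck.class.getName());

    // Hypothetical message; {0} is replaced by the path of the missing file.
    static final String NO_CREDENTIALS_MESSAGE = "No stored credentials file exists: {0}";

    /** Returns true if the cache file exists; otherwise logs at INFO level and returns false. */
    static boolean cacheFileAvailable(File cacheFile) {
        if (!cacheFile.isFile()) {
            // Goes through the logging framework instead of System.err.println.
            LOGGER.log(Level.INFO, NO_CREDENTIALS_MESSAGE, cacheFile.getPath());
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file name, expected not to exist in the working directory.
        System.out.println(cacheFileAvailable(new File("no-such-credentials-file.xml")));
    }
}
```

Routing the message through a logger lets the log level decide whether it is shown at all, which suits a condition that is not really an error.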
I have this type of data set, stored in a peculiar binary format in the form of a set of files in a directory. I know how to read the format, which actually starts with its own metadata description. So, for each record I can readily tell what the fields will be, and what their data types will be. The records come in a variety of types, each with its own (typically fairly small) set of fields. About the only field they will all have in common is a timestamp.
The data sets can be very, very large (gigabytes and more), which means they can be sampled or scanned but certainly can't be all read into memory. Filtering by record type (or by timestamp) is a possibility.
My problem is how do I get this into RapidMiner? We first thought to have an operator similar to the "Read CSV" operator, with maybe some parameters to achieve filtering. But the family of CSV-related classes leaves me confused. There is CSVDataReader, used by CSVImportWizard, on the one hand, and the CSVFileReader on the other hand. The two don't seem to have much to do with each other at all. CSVFileReader is used only by SimpleExampleSource, which is deprecated, so probably not a good example to follow. But CSVImportWizard seems to try to do one thing: import data into a repository. Is the size of the data sets going to be a problem in this paradigm?
I'd appreciate guidance.
1) Why is your connections.xml empty? It is created non-empty when it does not yet exist, as I just verified.
2) Useless error message as it is no error. I removed it.
3) See the classes in com.rapidminer.operator.nio; com.rapidminer.operator.io is outdated. To see CSV examples, look at CSVExampleSource and CSVResultSetConfiguration.
4) Well, data in a repository is stored in a file on a hard disk, so if your IOObject is huge, the file will be huge, and currently it will be read into memory as a whole. If that's out of the question, you are better off getting your data into a database or splitting it up into smaller chunks.
Regards,
Marco
4) I guess this means I should set up my source component with built-in filtering. This could be exposed as input ports so a process could then chunk its way through. Or manage the ExampleTable so that it becomes a window into the data set (with older rows scrolling off as new rows come in), or have a fully virtual ExampleTable. I think I'll first try to see if I can correctly create a new source operator that works just like the csv one.
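To make the two ideas concrete for myself (filtered chunking, and a window whose older rows scroll off), here is a plain-Java sketch. It is independent of RapidMiner: ChunkedSource, SlidingWindow and their methods are names I made up, and integers stand in for records. Hooking something like this up to an ExampleTable is not shown and would need the real RapidMiner classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

final class ChunkedSource {

    /** Pulls rows lazily from the source and returns at most maxRows rows that pass the filter. */
    static <T> List<T> nextChunk(Iterator<T> source, Predicate<? super T> filter, int maxRows) {
        List<T> chunk = new ArrayList<>();
        while (chunk.size() < maxRows && source.hasNext()) {
            T row = source.next();
            if (filter.test(row)) {
                chunk.add(row);
            }
        }
        return chunk;
    }

    public static void main(String[] args) {
        // Integers stand in for records; the filter stands in for "record type" or "timestamp".
        Iterator<Integer> records = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).iterator();
        System.out.println(nextChunk(records, n -> n % 2 == 0, 3)); // [2, 4, 6]
        System.out.println(nextChunk(records, n -> n % 2 == 0, 3)); // [8, 10]

        SlidingWindow<Integer> window = new SlidingWindow<>(3);
        for (int i = 1; i <= 5; i++) {
            window.add(i);
        }
        System.out.println(window.snapshot()); // [3, 4, 5]
    }
}

/** Keeps only the most recent rows; the oldest row scrolls off when the window is full. */
final class SlidingWindow<T> {

    private final int capacity;
    private final Deque<T> rows = new ArrayDeque<>();

    SlidingWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    void add(T row) {
        if (rows.size() == capacity) {
            rows.removeFirst();
        }
        rows.addLast(row);
    }

    List<T> snapshot() {
        return new ArrayList<>(rows);
    }
}
```

Either way only a bounded number of rows is held in memory at once, which is the point given the size of the data sets.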
Do I understand correctly that the Data:Import:Read CSV operator seen in RapidMiner is implemented by the CSVExampleSource class?
com.rapidminer.operator.nio.CSVExampleSource, yes.
Regards,
Marco