Mistake in OperatorDocLoader.loadSelectedOperatorDocuLocally?

Urhixidur · December 2012

I've been looking into how RapidMiner defines operator documentation bundles in XML (resources/com/rapidminer/resources/i18n/OperatorsCoreDocumentation.xml) but then substitutes html in the GUI's help window. For instance, the stream_database operator's XML doc bundle (an element of OperatorsCoreDocumentation.xml) is not used: rather, RapidMiner substitutes /resources/com/rapidminer/resources/doc/core/stream_database.html

This is not hinted at anywhere within the operator documentation; there is nothing in the operator's description XML that links it to the html documentation (unlike the xml documentation, which is explicitly identified in the extension's OperatorsSHORTNAME.xml, 'operators' element, 'docbundle' attribute). It took me a while to find that it happens in OperatorDocLoader.loadSelectedOperatorDocuLocally, which (if online) first looks for a page on the RapidMiner wiki, and then falls back to an html file. Only if this also fails is the XML doc bundle actually used.

Fine, I'm now ready to write an html page similar to those describing the core operators and include it in my operator plugin...But loadSelectedOperatorDocuLocally constructs the name like this:

com/rapidminer/resources/doc/NAMESPACE/SHORTNAME:NAME.html

...which is not a legal file name under some file systems (Windows, for one) because of the colon. The culprit is line 496:

String documentationResource = "/" + RESOURCE_SUB_DIR + "/" + namespace + "/" + opDesc.getKey() + ".html";

I have a feeling it should instead be:

String documentationResource = "/" + RESOURCE_SUB_DIR + "/" + namespace + "/" + opDesc.getKeyWithoutPrefix() + ".html";

Am I right?

(Note also how the 'key' used here is NOT the 'key' defined in the operator's XML description: it is instead the 'shortName' element prefixed to the 'name' element (with substitution of underscores for spaces); this is a little confusing)
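For illustration, here is a minimal Java sketch of the difference between the two lines quoted above. It is not RapidMiner code: `keyWithoutPrefix` is a hypothetical stand-in for `opDesc.getKeyWithoutPrefix()` that assumes the prefix is everything up to the first colon, and the value of `RESOURCE_SUB_DIR` is inferred from the path shown earlier in this post.

```java
final class DocResourcePathSketch {

    // Assumed value, inferred from the path quoted in the post.
    static final String RESOURCE_SUB_DIR = "com/rapidminer/resources/doc";

    /** Hypothetical stand-in for getKeyWithoutPrefix(): drops everything up to the first colon. */
    static String keyWithoutPrefix(String key) {
        int colon = key.indexOf(':');
        return colon < 0 ? key : key.substring(colon + 1);
    }

    /** Builds the HTML resource path the same way the quoted line does. */
    static String resourcePath(String namespace, String key) {
        return "/" + RESOURCE_SUB_DIR + "/" + namespace + "/" + key + ".html";
    }

    public static void main(String[] args) {
        String key = "SHORTNAME:NAME";
        // With the full key, the colon ends up in the file name.
        System.out.println(resourcePath("NAMESPACE", key));
        // With the prefix stripped, the file name contains no colon.
        System.out.println(resourcePath("NAMESPACE", keyWithoutPrefix(key)));
    }
}
```

The second path is the kind of name that can actually be created as a file on any platform and packaged in a plugin.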

Skirzynski · December 2012

Yes, you are right. We will fix that.

We are currently changing the documentation system. Loading help from the HTML files is (or will be in the near future) deprecated. For greater flexibility we will use XML files, which is already done for many of our operators. If you take a look at the current code in the SVN repository, you will notice a file "rm_doc.jar" in the lib directory. All documentation is stored there in a directory structure induced by the keys in the "OperatorsCore.xml" file. Every documentation entry is now an XML file, and the issue with the prefix is already fixed for this mechanism, so it should work for your extension.

To write your own XML documentation, just create a directory with the name of your plugin namespace under the "resources" directory, and then further subdirectories to match the group keys in your OperatorsXXX.xml file for a given operator. Then name an XML file after the key of your operator and you are done.
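As a rough sketch of the layout just described (not RapidMiner code), the location of an operator's XML documentation file could be computed as below. The names `my_namespace` and `my_operator` are hypothetical, and the assumption that a dotted group key such as "import.data" maps to nested directories is mine; check it against the contents of rm_doc.jar.

```java
final class XmlDocLocationSketch {

    /**
     * Builds resources/NAMESPACE/GROUP/SUBGROUP/KEY.xml.
     * Assumes a dotted group key corresponds to nested subdirectories.
     */
    static String xmlDocPath(String namespace, String groupKey, String operatorKey) {
        String groupDirs = groupKey.isEmpty() ? "" : groupKey.replace('.', '/') + "/";
        return "resources/" + namespace + "/" + groupDirs + operatorKey + ".xml";
    }

    public static void main(String[] args) {
        // Hypothetical extension namespace and operator key.
        System.out.println(xmlDocPath("my_namespace", "import.data", "my_operator"));
        // An operator that sits directly in the top-level group.
        System.out.println(xmlDocPath("my_namespace", "", "my_operator"));
    }
}
```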

Use the XML files in the "rm_doc.jar" file as a template, e.g. the "Retrieve" operator:


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="../../../documentation2html.xsl"?>
<p1:documents xmlns:p1="http://rapid-i.com/schemas/documentation/reference/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://rapid-i.com/schemas/documentation/reference/1.0                http://rapid-i.com/schemas/documentation/reference/1.0/documentation.xsd">

	<!-- each operator should have a key that consists of "operator." plus the operator's key. -->
	<operator key="operator.retrieve" locale="en" version="5.1.012">

		<title>Retrieve</title>

		<synopsis>This operator reads an object from the data repository.</synopsis>

		<text>
			<paragraph>This operator can be used to access the repositories. It should replace all file access, since it provides full meta data processing, which eases the usage of RapidMiner a lot. In contrast to accessing a raw file, it provides the complete meta data of the data, so all meta data transformations are possible.</paragraph>
			<paragraph>An easier way to load an object from the repository is to drag and drop the required object from the Repositories View. This will automatically insert a Retrieve operator with correct path of the desired object.</paragraph>
			<paragraph>This operator has no input port. All it requires is a valid value in <em>repository entry</em> parameter.</paragraph>
		</text>

		<outputPorts>
			<port name="output" type="com.rapidminer.example.ExampleSet">It returns the object whose path was specified in <em>repository entry</em> parameter.</port>
		</outputPorts>
		<parameters>

			<!-- description of the parameters and the corresponding values -->
			<parameter key="repository_entry" type="string">
				<paragraph>A valid path should be specified here in order to load an object. This parameter references an entry in the repository which will be returned as the output of this operator. Repository locations are resolved relative to the repository folder containing the current process. Folders in the repository are separated by a forward slash (/), a &quot;..&quot; references the parent folder. A leading forward slash references the root folder of the repository containing the current process. A leading double forward slash is interpreted as an absolute path starting with the name of a repository.</paragraph>
				<paragraph>
					<ul>
						<li>&apos;MyData&apos; looks up an entry &apos;MyData&apos; in the same folder as the current process.</li>
						<li>&apos;../Input/MyData&apos; looks up an entry &apos;MyData&apos; located in a folder &apos;Input&apos; next to the folder containing the current process.</li>
						<li>&apos;/data/Model&apos; looks up an entry &apos;Model&apos; in a top-level folder &apos;data&apos; in the repository holding the current process</li>
						<li>&apos;//Samples/data/Iris&apos; looks up the Iris data set in the &apos;Samples&apos; repository.</li>
					</ul>
				</paragraph>
			</parameter>

			<!-- ... -->
		</parameters>

		<relatedDocuments>
			<!-- ... -->
		</relatedDocuments>

		<tutorialProcesses>
			<tutorialProcess key="process.retrieve.retrieve_golf" title="Retrieving Golf from Repository">
				<description>
					<paragraph>The Example Process loads Golf data set from repository. <em>Repository entry</em> parameter is provided with path &apos;//Samples/data/Golf&apos;, thus Golf data set is returned from Samples repository. As it can be seen in Results Workspace, Retrieve operator loads both data and meta data.</paragraph>
					<!-- tutorialProcess description: What is done and shown here? You can use formatted text here -->
				</description>
				<!-- Copy process from RapidMiner's XML view to here -->

				<process version="5.1.011">
					<context>
						<input/>
						<output/>
						<macros/>
					</context>
					<operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
						<process expanded="true" height="377" width="681">
							<operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve" width="90" x="380" y="30">
								<parameter key="repository_entry" value="//Samples/data/Golf"/>
							</operator>
							<connect from_op="Retrieve" from_port="output" to_port="result 1"/>
							<portSpacing port="source_input 1" spacing="0"/>
							<portSpacing port="sink_result 1" spacing="0"/>
							<portSpacing port="sink_result 2" spacing="0"/>
						</process>
					</operator>
				</process>
			</tutorialProcess>
		</tutorialProcesses>
	</operator>
</p1:documents>

Skirzynski · December 2012

Urhixidur wrote:

(Note also how the 'key' used here is NOT the 'key' defined in the operator's XML description: it is instead the 'shortName' element prefixed to the 'name' element (with substitution of underscores for spaces); this is a little confusing)

Why? Can you explain your assumption, please? I am not that familiar with this code, but the constructor of "OperatorDescription" says:

key = XMLTools.getTagContents(element, "key", true);

Urhixidur · December 2012

Marcin wrote:

We are currently changing the documentation system. Loading help from the HTML files is (or will be in the near future) deprecated. For greater flexibility we will use XML files, which is already done for many of our operators. If you take a look at the current code in the SVN repository, you will notice a file "rm_doc.jar" in the lib directory. All documentation is stored there in a directory structure induced by the keys in the "OperatorsCore.xml" file. Every documentation entry is now an XML file, and the issue with the prefix is already fixed for this mechanism, so it should work for your extension.

To write your own XML documentation, just create a directory with the name of your plugin namespace under the "resources" directory, and then further subdirectories to match the group keys in your OperatorsXXX.xml file for a given operator. Then name an XML file after the key of your operator and you are done.

Fascinating. I thought the html took precedence over the xml, because that's what is currently displayed in the GUI (as of rev 734).

What about /resources/com/rapidminer/resources/i18n/OperatorsCoreDocumentation.xml? It has an 'operator' element with sub-elements 'name' ("Retrieve"), 'synopsis' and 'help' (containing a rudimentary description). By comparison, /resources/com/rapidminer/resources/OperatorsCore.xml has an 'operator' element with sub-elements 'key' ("retrieve"), 'class' ("com.rapidminer.operator.io.RepositorySource"), 'icon', and 'replaces'. At its root, OperatorsCore.xml sets docbundle to OperatorsCoreDocumentation. Meanwhile, core/repository_access/retrieve.xml opens with an 'operator' element whose key attribute (not a sub-element, mind you) is "operator.retrieve".

I also note the stream_database operator is not in lib/rm_doc.jar (there is no com.import.data in there), which explains why its html help is being displayed.

Urhixidur · December 2012

Urhixidur wrote:

(Note also how the 'key' used here is NOT the 'key' defined in the operator's XML description: it is instead the 'shortName' element prefixed to the 'name' element (with substitution of underscores for spaces); this is a little confusing)

I was talking about the key used by loadSelectedOperatorDocuFromWiki and got confused along the way. My apologies.

Here are the relevant fragments of my extension. In its build.xml, I have:


<project name="RapidMiner_Extension_LTF">
	<property name="extension.name" value="LTFReader" />
   <!-- extension.name.long must be "RapidMiner " + extension.name + " Extension" -->
	<property name="extension.name.long" value="RapidMiner LTFReader Extension" />
	<property name="extension.namespace" value="LTFDataReader" />
	<property name="extension.operatorDefinition" value="/com/rapidminer/resources/OperatorsLTFReader.xml" />

In OperatorsLTFReader.xml:


<operators name="LTFDataReader" version="5.0" docbundle="com/rapidminer/resources/i18n/OperatorsDocLTFReader">
   <group key="">
      <group key="import">
         <group key="data">
            <operator>
               <key>stream_trace</key>
               <class>com.rapidminer.operator.io.CachedLTFExampleSource</class>

In OperatorsDocLTFReader.xml:


   <operator>
      <key>stream_trace</key>
      <name>Stream LTF Trace</name>
      <shortName>LTFDataReader</shortName>

When com.rapidminer.gui.OperatorDocLoader.loadOperatorDocumentation is invoked for my extension (when I select it in the Operators palette of the GUI), it receives an OperatorDescription which has fullyQualifiedGroupKey == "import.data"; key == "stream_trace"; provider.extensionId == "rmx_LTFDataReader"; provider.name == "LTFReader"; and provider.prefix == "LTFDataReader".

Being online, loadOperatorDocumentation first tries loadSelectedOperatorDocuFromWiki, which builds
operatorWikiName = opDesc.getName().replace(" ", "_") thus "Stream_LTF_Trace" (it's the operator name, not the extension name). Then it gets the prefix from the provider:


String prefix = opDesc.getProvider().getPrefix();
prefix = Character.toUpperCase(prefix.charAt(0)) + prefix.substring(1);
operatorWikiName = prefix + ":" + operatorWikiName;

So it asks the wiki for the document "LTFDataReader:Stream_LTF_Trace". I get the warning:

WARNING: Could not open http://rapid-i.com/wiki/index.php?title=LTFDataReader:Stream_LTF_Trace: http://rapid-i.com/wiki/index.php?title=LTFDataReader:Stream_LTF_Trace

(It is not clear why some operators in the wiki have a "prefix:name" entry instead of a plain "name" entry)

This naturally fails so it falls back on OperatorDocLoader.loadSelectedOperatorDocuLocally which considers namespace to be the provider's extensionId and then builds
String documentationResource = "/" + RESOURCE_SUB_DIR + "/" + namespace + "/" + opDesc.getKeyWithoutPrefix() + ".html"
which yields /com/rapidminer/resources/doc/rmx_LTFDataReader/stream_trace.html.

Opening this stream also fails of course, so loadSelectedOperatorDocuLocally falls back on makeOperatorDocumentation which extracts the operatorDocBundle (com/rapidminer/resources/i18n/OperatorsDocLTFReader).

Most of this is now kind of moot, as I'll try and follow the rm_doc.jar approach instead.
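For anyone who wants to reproduce the wiki lookup name described in this post, here is a small self-contained sketch. It mirrors the three quoted lines from loadSelectedOperatorDocuFromWiki but takes plain strings instead of an OperatorDescription; the class and method names are made up for illustration, and the URL pattern is copied from the warning message above.

```java
final class WikiNameSketch {

    /** Replaces spaces with underscores and prepends the capitalized provider prefix. */
    static String wikiName(String operatorName, String providerPrefix) {
        String operatorWikiName = operatorName.replace(" ", "_");
        String prefix = Character.toUpperCase(providerPrefix.charAt(0)) + providerPrefix.substring(1);
        return prefix + ":" + operatorWikiName;
    }

    /** URL pattern taken from the warning message in the post. */
    static String wikiUrl(String operatorName, String providerPrefix) {
        return "http://rapid-i.com/wiki/index.php?title=" + wikiName(operatorName, providerPrefix);
    }

    public static void main(String[] args) {
        // Values from the extension described in the post.
        System.out.println(wikiName("Stream LTF Trace", "LTFDataReader"));
        System.out.println(wikiUrl("Stream LTF Trace", "LTFDataReader"));
    }
}
```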

Urhixidur · December 2012

An interesting aspect of the xml documentation in rm_doc.jar is the tutorialProcess element. I want to set one up, but this requires sample data to be included. Is there a way to add sample data to the RapidMiner repositories (a new Samples:data entry) as part of extension installation?

Oh, I see here (http://rapid-i.com/rapidforum/index.php/topic,5696.0.html) that it can't be done with the RapidMiner sample data repositories. Is there a way my extension could install its own little data repository?

Skirzynski · December 2012

Urhixidur wrote:

Fascinating. I thought the html took precedence over the xml, because that's what is currently displayed in the GUI (as of rev 734).

What about /resources/com/rapidminer/resources/i18n/OperatorsCoreDocumentation.xml? It has an 'operator' element with sub-elements 'name' ("Retrieve"), 'synopsis' and 'help' (containing a rudimentary description). By comparison, /resources/com/rapidminer/resources/OperatorsCore.xml has an 'operator' element with sub-elements 'key' ("retrieve"), 'class' ("com.rapidminer.operator.io.RepositorySource"), 'icon', and 'replaces'. At its root, OperatorsCore.xml sets docbundle to OperatorsCoreDocumentation. Meanwhile, core/repository_access/retrieve.xml opens with an 'operator' element whose key attribute (not a sub-element, mind you) is "operator.retrieve".

I also note the stream_database operator is not in lib/rm_doc.jar (there is no com.import.data in there), which explains why its html help is being displayed.

We are currently rewriting the operator help/description, since it really needs an update. As part of this, the documentation system has been reworked, and as long as the documentation is not ready, we have several fallbacks, which of course leads to confusion; sorry for that. But we appreciate your work on good documentation for your extension.

Anyway, the planned order in which documentation will be used is:

1. XML files (for the core these are stored in the rm_doc.jar file)
2. Wiki
3. OperatorsDocXXX.xml
4. HTML files
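The planned order above amounts to a first-match fallback chain. The following is only a sketch of that idea, not the actual OperatorDocLoader implementation: each source is a stub that knows a single operator key, and all class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class DocFallbackSketch {

    /** Tries the sources in list order and returns the first documentation found. */
    static Optional<String> firstAvailable(List<Function<String, Optional<String>>> sources, String operatorKey) {
        for (Function<String, Optional<String>> source : sources) {
            Optional<String> doc = source.apply(operatorKey);
            if (doc.isPresent()) {
                return doc;
            }
        }
        return Optional.empty();
    }

    /** Stub source that only has documentation for one key; stands in for a real loader. */
    static Function<String, Optional<String>> stub(String label, String knownKey) {
        return key -> key.equals(knownKey) ? Optional.of(label + " for " + key) : Optional.empty();
    }

    /** Stub source that never finds anything, e.g. a wiki with no matching page. */
    static Function<String, Optional<String>> unavailable() {
        return key -> Optional.empty();
    }

    /** Sources in the planned order: XML files, wiki, OperatorsDocXXX.xml, HTML files. */
    static List<Function<String, Optional<String>>> plannedOrder() {
        return List.of(
                stub("XML file", "retrieve"),
                unavailable(),
                stub("OperatorsDocXXX.xml entry", "stream_trace"),
                stub("HTML file", "stream_database"));
    }

    public static void main(String[] args) {
        for (String key : List.of("retrieve", "stream_trace", "stream_database", "unknown")) {
            System.out.println(key + " -> " + firstAvailable(plannedOrder(), key).orElse("no documentation"));
        }
    }
}
```

Keeping the sources in an ordered list means the precedence can be changed in one place while the rework is still in progress.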

Skirzynski · December 2012

Urhixidur wrote:

An interesting aspect of the xml documentation in rm_doc.jar is the tutorialProcess element. I want to set one up, but this requires sample data to be included. Is there a way to add sample data to the RapidMiner repositories (a new Samples:data entry) as part of extension installation?

Oh, I see here (http://rapid-i.com/rapidforum/index.php/topic,5696.0.html) that it can't be done with the RapidMiner sample data repositories. Is there a way my extension could install its own little data repository?

Yes, there is a way, but it is broken in the current branch. I will write to you in this thread when I know more. Additionally, we are considering a mechanism to add your own entries to the samples repository. Stay tuned! 8)
