How to set DTD parameter in FeatureExtraction (RapidMiner UI)
How can I set the DTD location for FeatureExtraction in the RapidMiner UI? I keep getting an IOException thrown from FeatureExtraction:
Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
Regards,
skarab
Answers
I'm sorry, but what exactly are you doing? The easiest thing would be to post the process with a short explanation. And to motivate other users to answer your questions, it could be a smart move to add something like "hello" at the start of your message...
Greetings,
Sebastian
<operator name="FeatureExtraction" class="FeatureExtraction" breakpoints="before,within,after">
<list key="texts">
<parameter key="tmp_file" value="%{parent_path}\tmp%{file_name}\%{file_name}"/>
</list>
<parameter key="default_content_type" value="html"/>
<parameter key="default_content_encoding" value="UTF-8"/>
<parameter key="default_content_language" value="pl"/>
<parameter key="use_content_attributes" value="true"/>
<parameter key="id_attribute_type" value="long"/>
<list key="attributes">
<parameter key="html" value="/h:html"/>
</list>
<list key="namespaces">
<!-- I tried to set it in namespaces -->
<parameter key="html" value="C:\\workspace-rapidminer\xhtml1-transitional.dtd"/>
</list>
</operator>
I don't think the namespace is needed, and it isn't defined correctly either. So the easiest solution would be to delete this parameter...
Anyway, it is only used for XPath queries on more complicated XML objects... I have never had to use it for HTML...
Greetings,
Sebastian
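To illustrate the point about namespaces: in XPath, a namespace entry maps a prefix to a namespace URI, not to a DTD file. Below is a minimal Java sketch outside RapidMiner, assuming the prefix h in /h:html is meant to stand for the XHTML namespace http://www.w3.org/1999/xhtml. The class and method names are made up for illustration, and how FeatureExtraction handles prefixes internally is not shown in this thread.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of RapidMiner.
class XhtmlNamespaceSketch {
    // The XHTML namespace URI; a prefix must map to a URI like this one.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Returns the text of /h:html/h:head/h:title for a well-formed XHTML string.
    static String title(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "h" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                            ? Collections.singletonList("h").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            return xpath.evaluate("/h:html/h:head/h:title", doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // The sample has no DOCTYPE, so the parser does not try to fetch a DTD.
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Test</title></head><body/></html>";
        System.out.println(title(page));
    }
}
```

Seen this way, a value such as a local DTD path in the namespaces list cannot influence where the parser looks for the DTD; it would only change what a prefix resolves to.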
Defining the namespace does not matter in my case; I still get this exception... I am using Java 1.6.0.16 on Vista.
Regards
Skarab
I solved the problem.
First I removed the existing declaration, matching it with the regex
<!DOCTYPE html PUBLIC [^>]*>
using TextCleaner.
After that I prepended a DOCTYPE that points to a local copy of the DTD:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "C:\workspace-rapidminer\xhtml1-transitional.dtd" >
using SingleTextObjectInput with this text:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "C:\workspace-rapidminer\xhtml1-transitional.dtd" >%{loop_value}
Here is my brute-force solution (I get an HTML page as a TextObject):
<operator name="TextCleaner" class="TextCleaner">
<parameter key="deletion_regex" value="&lt;!DOCTYPE html PUBLIC [^&gt;]*&gt;"/>
</operator>
<operator name="TextObject2ExampleSet" class="TextObject2ExampleSet">
<parameter key="keep_text_object" value="true"/>
<parameter key="text_attribute" value="my_doc_text"/>
<parameter key="label_attribute" value="my_doc_label"/>
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
<parameter key="attribute" value="my_doc_text"/>
<operator name="SingleTextObjectInput" class="SingleTextObjectInput">
<parameter key="text" value="&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot; &quot;C:\workspace-rapidminer\xhtml1-transitional.dtd&quot; &gt;%{loop_value}"/>
</operator>
</operator>
Regards,
Wojtek
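For readers who want to try the two steps outside RapidMiner, here is a minimal Java sketch of the same idea: strip the original DOCTYPE with the regex from the post, then prepend a DOCTYPE that points to the local DTD copy. The class and method names are hypothetical, and this is not how TextCleaner or SingleTextObjectInput are implemented; it only mirrors the string manipulation described above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of RapidMiner.
class DoctypeRewriteSketch {
    // Same regex as the deletion_regex parameter in the post.
    static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE html PUBLIC [^>]*>");

    // Same declaration as in the post, pointing to the local DTD copy.
    static final String LOCAL_DOCTYPE =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"C:\\workspace-rapidminer\\xhtml1-transitional.dtd\" >";

    // Step 1: remove any existing public XHTML DOCTYPE declaration.
    static String stripDoctype(String html) {
        return DOCTYPE.matcher(html).replaceAll("");
    }

    // Step 2: put the local declaration in front of the stripped page.
    static String withLocalDoctype(String html) {
        return LOCAL_DOCTYPE + stripDoctype(html);
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html></html>";
        System.out.println(withLocalDoctype(page));
    }
}
```

Note that the pattern stops at the first ">" after the keyword, so it assumes the DOCTYPE has no internal subset.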