"[SOLVED] Accessing Text Data in Blob in MySQL"

Datadude · December 2012

I'm trying to access text stored in a Blob field in MySQL. However, as I'm putting my job together, I'm getting an error message:

The example set must contain at least one text attribute.

It doesn't seem like RapidMiner understands that there might be text data in that binary field, and I'm trying to figure out how to do the conversion so that it does. I'm attempting to connect my Read Database operator to my Process Documents from Data operator so that Process Documents from Data can work on the text residing in the binary field. Is there a way to do this conversion in RapidMiner?

awchisholm · December 2012

Hello,

It's probably because you need to change the type of the attribute you're interested in to text. Use the "Nominal to Text" operator.

Regards,

Andrew

Datadude · December 2012

OK, so I did that and I seem to be getting to the next step. Thanks. But now I'm getting another exception:

Dec 16, 2012 11:39:00 PM SEVERE: Process failed: operator cannot be executed (java.lang.String cannot be cast to org.jdom.Text). Check the log messages...
Dec 16, 2012 11:39:00 PM SEVERE: Here: Process[1] (Process)
subprocess 'Main Process'
+- Read Database[1] (Read Database)
+- Nominal to Text[1] (Nominal to Text)
+- Process Documents from Data[1] (Process Documents from Data)
subprocess 'Vector Creation'
==> +- Extract Information[1] (Extract Information)

Is there something I need to do before I can process an XML string with XPath?

awchisholm · December 2012

The best thing to do is to post your process so we can see the details.

Andrew

Datadude · December 2012

Here it is:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <parameter key="logfile" value="/Users/wardloving/Documents/Data Mining/log.out"/>
    <parameter key="resultfile" value="/Users/wardloving/Documents/Data Mining/results.out"/>
    <process expanded="true" height="116" width="614">
      <operator activated="true" class="read_database" compatibility="5.2.008" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
        <parameter key="connection" value="Local MySQL Nutch"/>
        <parameter key="query" value="SELECT content&#10;FROM webpage where Id = 'org.episcopalchurch.www:http/parish/all-saints-episcopal-church-vista-ca'"/>
        <enumeration key="parameters"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="5.2.008" expanded="true" height="76" name="Nominal to Text" width="90" x="246" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="content"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Data" width="90" x="514" y="30">
        <parameter key="create_word_vector" value="false"/>
        <list key="specify_weights"/>
        <process expanded="true" height="252" width="1095">
          <operator activated="true" class="text:extract_information" compatibility="5.2.004" expanded="true" height="60" name="Extract Information" width="90" x="112" y="75">
            <parameter key="query_type" value="XPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries">
              <parameter key="Name" value="substring-before(//title, ',')"/>
              <parameter key="Staff Name" value="substring-before(//*[@class = 'field field-type-text field-field-clergy']/div/div/node()[not(self::div)],',')"/>
              <parameter key="Staff Title" value="substring-after(//*[@class = 'field field-type-text field-field-clergy']/div/div/node()[not(self::div)], ',')"/>
              <parameter key="Address Line 1" value="//*[@class = 'street-address']"/>
              <parameter key="City" value="//*[@class = 'locality']"/>
              <parameter key="State" value="//*[@class = 'region']"/>
              <parameter key="Zip" value="//*[@class = 'postal-code']"/>
              <parameter key="Email" value="//*[@class = 'field field-type-text field-field-email']/div/div/node()[not(self::div)]"/>
              <parameter key="Phone" value="//*[@class = 'field field-type-text field-field-phone']/div/div/node()[not(self::div)]"/>
              <parameter key="Fax" value="//*[@class = 'field field-type-text field-field-fax']/div/div/node()[not(self::div)]"/>
              <parameter key="URL" value="//*[@class = 'field field-type-text field-field-fax']/div/div/node()[not(self::div)]"/>
              <parameter key="Twitter" value="//*[@class = 'field field-type-text field-field-twitter']/div/div/node()[not(self::div)]"/>
            </list>
            <list key="namespaces"/>
            <list key="index_queries"/>
          </operator>
          <connect from_port="document" to_op="Extract Information" to_port="document"/>
          <connect from_op="Extract Information" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="36"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

awchisholm · December 2012

Hello

I tried with some local data, and it seems the output from the Read Database operator is already of the right type, so the conversion is not necessary. So I've learned something.

Try unchecking "assume html" on the "extract information" operator.
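
For anyone editing the process XML directly rather than using the checkbox, a rough sketch of how that setting might appear on the operator is shown below. The key name assume_html is an assumption inferred from the checkbox label, not something shown in the process posted above, so compare it against your own exported XML before relying on it. Only one of the XPath queries is repeated here to keep the fragment short.

```xml
<operator activated="true" class="text:extract_information" compatibility="5.2.004" expanded="true" name="Extract Information">
  <parameter key="query_type" value="XPath"/>
  <!-- Assumed key for the "assume html" checkbox; verify against your own exported process -->
  <parameter key="assume_html" value="false"/>
  <list key="xpath_queries">
    <!-- Same query as in the posted process; the remaining queries are unchanged -->
    <parameter key="Name" value="substring-before(//title, ',')"/>
  </list>
</operator>
```

With the option turned off, the XPath queries run against the stored content as-is, so the content needs to be well-formed for them to match.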

Andrew

Datadude · December 2012

Thanks for the tip, awchisholm. Once I unchecked that option, the error stopped showing up, which is good. Unfortunately, it also revealed that the HTML content in my database has been truncated or corrupted, which makes parsing with XPath difficult. Sometimes you just can't win, but at least I know what is going on.
