Scientific Notation for very small numbers 1E-12

dragoljub · June 2010

I have imported some data from a CSV file using the AML operator. Some of the columns contain very small values, on the order of 1E-12.

I noticed that in the results view all very small numbers are displayed as zeros. Even in the meta data view the statistics are all zero. However, when you copy and paste an entry you can see that the correct E-12 number is stored there.

Does RapidMiner use these numbers (in the E-10 to E-12 range) correctly, or do the processing operators treat them as zero? I suppose I could scale them up by some constant, but is that necessary?

Also, is there any way to show scientific notation in the results view? ;D

Thanks,
-Gagi

dragoljub · June 2010

I have also noticed that this could be problematic when using the 'Remove Useless' operator. It seems that for very small numbers the statistics are not calculated correctly, since the values are always interpreted as zero rather than as normalized values.

-Gagi

haddock · June 2010

Hi there,

In Rapido, reals are really reals; they are only rounded for display, according to the 'fractiondigits.number' preference setting. As for imposing scientific notation, or other formats, try this process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="206" width="681">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="111" y="67">
        <parameter key="attributes_lower_bound" value="-1.0E-100"/>
        <parameter key="attributes_upper_bound" value="1.0E-100"/>
      </operator>
      <operator activated="true" class="format_numbers" expanded="true" height="76" name="Format Numbers" width="90" x="313" y="75">
        <parameter key="format_type" value="pattern"/>
        <parameter key="pattern" value="0.###E0"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Format Numbers" to_port="example set input"/>
      <connect from_op="Format Numbers" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
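
As an illustration outside RapidMiner, the following Python sketch shows the difference between a stored value and its displayed form. The sample value and the three-digit precision are assumptions picked for the example, not RapidMiner defaults, and the helper names are hypothetical.

```python
def show_fixed(value: float, digits: int = 3) -> str:
    """Format with a fixed number of fraction digits (display rounding)."""
    return f"{value:.{digits}f}"


def show_scientific(value: float, digits: int = 3) -> str:
    """Format the same value in scientific notation."""
    return f"{value:.{digits}e}"


# Hypothetical cell value in the E-12 range.
value = 3.7e-12

print(show_fixed(value))       # 0.000
print(show_scientific(value))  # 3.700e-12
print(value == 0.0)            # False: the stored number is not zero
```

The stored number is the same in both cases; only the chosen format decides whether it appears as zero.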

land · June 2010

Hi,
in addition to what haddock said: the Remove Useless operator uses the standard deviation of an attribute's values to decide whether the attribute is useless. If your numbers are very small, you will have to lower the threshold accordingly.
I think it would be smarter to use a threshold weighted by the mean, but in any case the Remove Useless operator should, if possible, be avoided for attributes whose values differ at all. A learner-based attribute selection is far preferable.

Greetings,
Sebastian
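
The threshold effect described above can be sketched in Python. This is not RapidMiner's implementation: the function names, the threshold values, and the use of the population standard deviation are assumptions made for illustration. The second function is one possible reading of a "mean weighted" threshold, dividing the deviation by the mean absolute value.

```python
import statistics


def is_useless_absolute(values, min_deviation):
    """Flag a column whose standard deviation is at or below an absolute threshold."""
    return statistics.pstdev(values) <= min_deviation


def is_useless_relative(values, min_relative_deviation):
    """Flag a column whose deviation is small relative to its mean magnitude."""
    mean_abs = statistics.fmean([abs(v) for v in values])
    if mean_abs == 0.0:
        return True  # all values are zero, so there is no variation at all
    return statistics.pstdev(values) / mean_abs <= min_relative_deviation


# Hypothetical column with values in the E-12 range that clearly vary.
column = [1.0e-12, 2.0e-12, 3.0e-12, 4.0e-12]

print(is_useless_absolute(column, 1e-6))   # True: threshold far above the data scale
print(is_useless_absolute(column, 1e-13))  # False: threshold lowered to the data scale
print(is_useless_relative(column, 0.01))   # False: the relative spread is large
```

With an absolute threshold, the same column is kept or dropped depending only on how the threshold compares with the scale of the data, which is why it has to be lowered for very small numbers. The relative check does not depend on that scale.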
