[SOLVED] Data To JSON operator issue with array greater than 10 items

mrmikev · May 2015

The result of our process gets saved as an array of arrays in MongoDB. When the data is run through the Data to JSON operator, here's what we get:

{ "inputs" : [ { "matrix" : [ { "amount" : 0 },
            { "amount" : 10 },
            { "amount" : 20 },
            { "amount" : 30 },
            { "amount" : 40 },
            { "amount" : 50 },
            { "amount" : 60 },
            { "amount" : 70 },
            { "amount" : 80 },
            { "amount" : 90 }
          ],
        "matrix[10]" : { "amount" : 100 },
        "matrix[11]" : { "amount" : 110 },
        "matrix[12]" : { "amount" : 120 }
      },
      { "matrix" : [ { "amount" : 0 },
            { "amount" : 10 },
            { "amount" : 20 },
            { "amount" : 30 },
            { "amount" : 40 },
            { "amount" : 50 },
            { "amount" : 60 },
            { "amount" : 70 },
            { "amount" : 80 },
            { "amount" : 90 }
          ],
        "matrix[10]" : { "amount" : 100 },
        "matrix[11]" : { "amount" : 110 },
        "matrix[12]" : { "amount" : 120 }
      }
    ] }

Here's what we expect:

{ "inputs" : [ { "matrix" : [ { "amount" : 0 },
            { "amount" : 10 },
            { "amount" : 20 },
            { "amount" : 30 },
            { "amount" : 40 },
            { "amount" : 50 },
            { "amount" : 60 },
            { "amount" : 70 },
            { "amount" : 80 },
            { "amount" : 90 },
            { "amount" : 100 },
            { "amount" : 110 },
            { "amount" : 120 }
          ] },
      { "matrix" : [ { "amount" : 0 },
            { "amount" : 10 },
            { "amount" : 20 },
            { "amount" : 30 },
            { "amount" : 40 },
            { "amount" : 50 },
            { "amount" : 60 },
            { "amount" : 70 },
            { "amount" : 80 },
            { "amount" : 90 },
            { "amount" : 100 },
            { "amount" : 110 },
            { "amount" : 120 }
          ] }
    ] }

In fact, if we start with the expected outcome data in a Create Document operator, run it through JSON to Data (all looks good so far!), then directly through Data to JSON, we still get the malformed results. I've attached a sample process that demonstrates as much:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="6.1.000" expanded="true" height="60" name="Create Document" width="90" x="112" y="120">
        <parameter key="text" value="{&quot;inputs&quot;:[{&quot;matrix&quot;:[{&quot;amount&quot;:0},{&quot;amount&quot;:10},{&quot;amount&quot;:20},{&quot;amount&quot;:30},{&quot;amount&quot;:40},{&quot;amount&quot;:50},{&quot;amount&quot;:60},{&quot;amount&quot;:70},{&quot;amount&quot;:80},{&quot;amount&quot;:90},{&quot;amount&quot;:100},{&quot;amount&quot;:110},{&quot;amount&quot;:120}]},{&quot;matrix&quot;:[{&quot;amount&quot;:0},{&quot;amount&quot;:10},{&quot;amount&quot;:20},{&quot;amount&quot;:30},{&quot;amount&quot;:40},{&quot;amount&quot;:50},{&quot;amount&quot;:60},{&quot;amount&quot;:70},{&quot;amount&quot;:80},{&quot;amount&quot;:90},{&quot;amount&quot;:100},{&quot;amount&quot;:110},{&quot;amount&quot;:120}]}]}"/>
      </operator>
      <operator activated="true" class="text:json_to_data" compatibility="6.1.000" expanded="true" height="76" name="JSON To Data" width="90" x="246" y="120"/>
      <operator activated="true" class="text:data_to_json" compatibility="6.1.000" expanded="true" height="76" name="Data To JSON" width="90" x="380" y="120"/>
      <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
      <connect from_op="JSON To Data" from_port="example set" to_op="Data To JSON" to_port="example set 1"/>
      <connect from_op="Data To JSON" from_port="documents" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

It appears second level array items with an array index greater than one digit do not get interpreted appropriately.
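For anyone who wants to check their own output for this symptom, here is a minimal Python sketch that walks a parsed JSON document and reports object keys that look like stray array indices, such as "matrix[10]". The function name find_stray_index_keys and the path notation are made up for illustration; they are not part of RapidMiner.

```python
import json
import re

# Keys such as "matrix[10]" suggest an array element was emitted as an object field.
STRAY_INDEX_KEY = re.compile(r"^.+\[\d+\]$")

def find_stray_index_keys(node, path="$"):
    """Recursively collect the paths of object keys that end in [digits]."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if STRAY_INDEX_KEY.match(key):
                found.append(child)
            found.extend(find_stray_index_keys(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_stray_index_keys(item, f"{path}[{i}]"))
    return found

# Shortened version of the malformed output shown in this post.
malformed = json.loads(
    '{"inputs": [{"matrix": [{"amount": 0}], "matrix[10]": {"amount": 100}}]}'
)
print(find_stray_index_keys(malformed))  # ['$.inputs[0].matrix[10]']
```

An empty list means no such keys were found, which is what the expected output above should produce.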

Thank you in advance for your help!

Thank you for getting this resolved promptly with the 6.4.1 Text Mining Extension release.

MichaelKnopf · May 2015

Thank you for reporting this issue. Your description and example process have been very helpful to reproduce this problem. It is definitely a bug in our implementation.

Unfortunately, I see no way to work around it with the current release of the Text Processing extension.

We will try to fix this as soon as possible. I'll keep you updated.

MichaelKnopf · May 2015

"It appears second level array items with an array index greater than one digit do not get interpreted appropriately."

Spot on. It was caused by two leftover brackets in a regular expression.
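To illustrate how two stray brackets can produce exactly this symptom, here is a small Python sketch. The patterns BUGGY and FIXED are guesses made up for this example, not the extension's actual source code. The assumption is that the index was meant to be matched with \d+ and that an extra pair of square brackets turned it into a character class, which matches only a single character.

```python
import re

# Hypothetical patterns for illustration only, NOT the extension's real code.
# In BUGGY the extra [ ] turn \d+ into a character class that matches
# exactly one character (a digit or a literal '+').
BUGGY = re.compile(r"^(\w+)\[[\d+]\]$")
FIXED = re.compile(r"^(\w+)\[(\d+)\]$")

def is_array_segment(pattern, segment):
    """Return True if the segment is recognised as name[index]."""
    return pattern.match(segment) is not None

for segment in ("matrix[9]", "matrix[10]", "matrix[12]"):
    print(segment, is_array_segment(BUGGY, segment), is_array_segment(FIXED, segment))
# matrix[9] True True
# matrix[10] False True
# matrix[12] False True
```

With a pattern like BUGGY, indices 0 to 9 are recognised as array elements, while anything from index 10 upward falls through and would be treated as an ordinary field name, which matches the "matrix[10]" keys in the report.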

We hope to release an update of the extension by the end of this week or the beginning of next week.

MichaelKnopf · May 2015

And it's out!

mrmikev · May 2015

Great! I'll download it, run it through the paces, then mark this as solved.

Thank you for the prompt turn-around on this!
