Converting text file containing articles into solr index json format

Vicky Member Posts: 3 Learner I
edited November 2019 in Help
Hi Folks,
I'm creating a chatbot to retrieve content from articles. I have about 10 text files. When I tried using Solr, I found that it accepts JSON/XML input in a key/value pair format.
How do I convert the text to this format?

Please help.

Answers

  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @Vicky, yes, that can be done pretty easily. Do you have the JSON format / sample JSON that Solr is looking for?
  • Vicky Member Posts: 3 Learner I
    I don't have one. It's a general blog article collection. Wondering how that can be converted.

    One sample I see in the Solr examples is:

    [
      {
        "id" : "978-0641723445",
        "cat" : ["book","hardcover"],
        "name" : "The Lightning Thief",
        "author" : "Rick Riordan",
        "series_t" : "Percy Jackson and the Olympians",
        "sequence_i" : 1,
        "genre_s" : "fantasy",
        "inStock" : true,
        "price" : 12.50,
        "pages_i" : 384
      }
    ]
  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Yep, so you just build it. Let me see if I can build this example for you so you can see...


  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    OK, this is everything except for the 'cat' field, which can be built in a similar way once you understand what I'm doing here:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000-BETA4">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.5.000-BETA4" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="utility:create_exampleset" compatibility="9.5.000-BETA4" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="187">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="America/New_York"/>
            <parameter key="input_csv_text" value="id,name,author,series_t,sequence_i,genre_s,inStock,price,pages_i&#10;978-0641723445,The Lightning Thief,Rick Riordan,Percy Jackson and the Olympians,1,fantasy,true,12.50,384"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
            <description align="center" color="transparent" colored="false" width="126">everything except cat</description>
          </operator>
          <operator activated="true" class="text:data_to_json" compatibility="8.2.000" expanded="true" height="82" name="Data To JSON" width="90" x="179" y="187">
            <parameter key="ignore_arrays" value="false"/>
            <parameter key="generate_array" value="false"/>
            <parameter key="include_missing_values" value="false"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document" width="90" x="179" y="34">
            <parameter key="text" value="["/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
            <description align="center" color="transparent" colored="false" width="126">[</description>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="8.2.000" expanded="true" height="68" name="Create Document (2)" width="90" x="179" y="340">
            <parameter key="text" value="]"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
            <description align="center" color="transparent" colored="false" width="126">]</description>
          </operator>
          <operator activated="true" class="text:combine_documents" compatibility="8.2.000" expanded="true" height="124" name="Combine Documents" width="90" x="313" y="136"/>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Data To JSON" to_port="example set 1"/>
          <connect from_op="Data To JSON" from_port="documents" to_op="Combine Documents" to_port="documents 2"/>
          <connect from_op="Create Document" from_port="output" to_op="Combine Documents" to_port="documents 1"/>
          <connect from_op="Create Document (2)" from_port="output" to_op="Combine Documents" to_port="documents 3"/>
          <connect from_op="Combine Documents" from_port="document" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    



    Scott

    [PS nice choice of book - love Percy Jackson!]
  • Vicky Member Posts: 3 Learner I
    Thanks. About to travel for some hours. I'll check it out.
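
For the original question (turning a folder of plain-text articles into the JSON array format shown in the Solr sample), the same idea can also be sketched outside RapidMiner in a few lines of Python. This is only a minimal sketch under stated assumptions: the field names `title_s` and `content_t` are made up here, following the suffix style of the sample (`genre_s`, `series_t`), and must match whatever your Solr schema actually defines. The function names, the default `"blog"` category, and the rule that the first line of each file is its title are likewise hypothetical.

```python
import json
from pathlib import Path


def text_files_to_solr_docs(folder, categories=("blog",)):
    """Build one Solr-style document (a dict) per .txt file in `folder`."""
    docs = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        lines = text.splitlines()
        # Assumption: the first line of each file is the article title;
        # fall back to the file name for empty files.
        title = lines[0].strip() if lines else path.stem
        docs.append({
            "id": path.stem,            # file name without extension as the unique key
            "cat": list(categories),    # multi-valued field, serialized as a JSON array
            "title_s": title,           # hypothetical field name
            "content_t": text,          # hypothetical field name
        })
    return docs


def write_solr_json(docs, out_path):
    """Write the documents as a single JSON array, like the sample above."""
    payload = json.dumps(docs, indent=2, ensure_ascii=False)
    Path(out_path).write_text(payload, encoding="utf-8")
```

Calling `write_solr_json(text_files_to_solr_docs("articles"), "articles.json")` (with `articles` standing in for your own folder) would produce one file containing a JSON array of documents, which is the same shape as the sample posted earlier in the thread.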