
Reading data using field name

robin Member Posts: 100 Guru
edited November 2018 in Help
I am reading a file into RM that has no header row; instead, each field carries its name inside the field value.

So where a typical CSV file would be:
ice_cream,chocolate,candy
1,4,5
6,4,2

My files look like:
"ice_cream"="1","chocolate"="4","candy"="5"
"ice_cream"="6","chocolate"="4","candy"="2"

Various other data mining programs offer a "retain name" function. How does one deal with this in RapidMiner?

The problem I face is that these files are large: reading them in with the field names intact and stripping the names later with an operator uses more memory than the system has available.


Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi @robin ,

    this format looks very weird. Why is it being used? It produces a ton of storage overhead.

    Anyway, is the field order always the same? If so, you can just read the columns in as polynominals and strip the names with Replace.

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
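Martin's suggestion depends on every line listing its fields in the same order. A minimal sketch for checking that before committing to the approach, assuming a POSIX shell with `awk` is available and that values never contain a double quote (neither is confirmed in the thread). The function name `check_field_order` is hypothetical. It reads one line at a time, so memory use does not grow with the file size.

```shell
# Hypothetical helper: report lines whose field-name order differs from line 1.
# Assumes values contain no double quotes. Reads a file argument or stdin.
# Exit status is 0 if all lines match, 1 otherwise.
check_field_order() {
  awk '
    {
      # Strip every ="value" part, leaving only the quoted names,
      # e.g. "ice_cream","chocolate","candy"
      names = $0
      gsub(/="[^"]*"/, "", names)
      if (NR == 1) {
        first = names
      } else if (names != first) {
        printf "line %d: field order differs\n", NR
        bad = 1
      }
    }
    END { exit bad }
  ' "$@"
}
```

Usage: `check_field_order sweets.txt` (file name hypothetical). No output and a zero exit status mean the order is consistent.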
  • robin Member Posts: 100 Guru
    Yes, it is very heavy; it makes the file enormous. It is so large that I am unable to read the entire file into RM for processing, so I never get to the point of using the Replace operator.

    Other programs can read this part of the value in as a field name. Can one do this in RM?
  • robin Member Posts: 100 Guru
    On Linux I would use the stream editor (sed) to strip the names:

    sed -E 's/"[^"]*"="([^"]*)"/\1/g'

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="productivity:execute_program" compatibility="9.0.002" expanded="true" height="103" name="Execute Program" width="90" x="246" y="136">
            <parameter key="command" value="sed -E 's/&quot;[^&quot;]*&quot;=&quot;([^&quot;]*)&quot;/\1/g'"/>
            <parameter key="working_directory" value="/Users/robinmeisel/sweets/sweets.flatfile.1"/>
            <list key="env_variables"/>
          </operator>
          <connect from_op="Execute Program" from_port="out" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    But I am working on a Windows machine.
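A sketch that completes the sed idea above and also rebuilds a header row from the first line, so the output is an ordinary CSV that Read CSV could load. It assumes a POSIX shell with `head` and a `sed` that supports `-E` (GNU and BSD sed both do), a fixed field order on every line, and no double quotes inside values; whether such a shell is available on the Windows machine is not stated in the thread. The function `to_csv_fixed_order` and the file names are hypothetical. Because sed streams its input, the file never has to fit in memory.

```shell
# Hypothetical helper: turn "name"="value" lines into plain CSV with a header.
# Assumes every line has the same field order and values contain no '"'.
to_csv_fixed_order() {
  src=$1
  # Header: keep only the names from the first line, e.g. ice_cream,chocolate,candy
  head -n 1 "$src" | sed -E 's/"([^"]*)"="[^"]*"/\1/g'
  # Body: replace each "name"="value" pair with just its value, line by line
  sed -E 's/"[^"]*"="([^"]*)"/\1/g' "$src"
}
```

Usage: `to_csv_fixed_order sweets.txt > sweets.csv`. On the two sample lines from the question this prints `ice_cream,chocolate,candy`, then `1,4,5` and `6,4,2`.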
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,

    you would need to read the file in completely using Read CSV and then parse it with Replace. There is currently no way to process a file line by line, though it's not too hard to write such a step yourself.

    BR,
    Martin
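One possible shape for the line-by-line processing mentioned above, sketched in awk outside RapidMiner under the same assumptions (POSIX shell, no double quotes inside values). Unlike the sed version, it places each value under its field name, so it still works if the order varies between lines. The header comes from the first line, names that only appear later are dropped, and a missing field becomes an empty cell; these are choices of this sketch, not RapidMiner behavior. `to_csv_by_name` is a hypothetical name.

```shell
# Hypothetical helper: order-independent "name"="value" -> CSV conversion.
# Header order is taken from line 1; reads a file argument or stdin.
to_csv_by_name() {
  awk '
    {
      split("", val)            # clear values left over from the previous line
      rest = $0
      # Walk through every "name"="value" pair on the line.
      while (match(rest, /"[^"]*"="[^"]*"/)) {
        pair = substr(rest, RSTART + 1, RLENGTH - 2)   # name"="value
        sep = index(pair, "\"=\"")
        name = substr(pair, 1, sep - 1)
        val[name] = substr(pair, sep + 3)
        if (NR == 1) cols[++ncols] = name               # remember header order
        rest = substr(rest, RSTART + RLENGTH)
      }
      if (NR == 1) {
        for (i = 1; i <= ncols; i++) printf "%s%s", cols[i], (i < ncols ? "," : "\n")
      }
      # Emit values in header order; a missing field becomes an empty cell.
      for (i = 1; i <= ncols; i++) printf "%s%s", val[cols[i]], (i < ncols ? "," : "\n")
    }
  ' "$@"
}
```

Only one line plus the list of column names is held in memory at a time, which is the property the original question needs.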