Importing data and using custom python code to read non-csv/txt file -> graph
Hello,
I am in the process of learning RM and I have a working piece of Python code that I'd like to run and produce a graph with, but I am having a hard time setting everything up and running it properly. The dataset is neither a plain-text nor a CSV file, so I can't upload it through the user interface, but it can be read with a Python module called nmrglue. I have moved all the necessary files into the local repository and have checked that the Python extension is set up properly (it is up to date and matching). However, the imports between my classes do not seem to be picked up, no matter how I connect the inputs and outputs in the process tree.
I have attached the raw Python script and the related file, and would appreciate visual instructions on how to import it properly.
I just need help setting it up; I think I am missing something obvious in the process.
Thanks,
Answers
Hi,
I quickly looked into it. Where can I find your rm_main function?
~Martin
Dortmund, Germany
As @mschmitz pointed out, to use Python inside RapidMiner you need to wrap your code in a function named rm_main (see example image below) and import pandas. Pandas is needed because data is passed between RM and Python as DataFrames.
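A minimal sketch of such an entry point, assuming the operator hands the connected Example Set to rm_main as a pandas DataFrame. The row_count column is a made-up processing step, only there so the output visibly differs from the input.

```python
import pandas as pd


def rm_main(data):
    # 'data' is assumed to be the pandas DataFrame built from the
    # Example Set connected to the operator's first input port.
    result = data.copy()
    # Hypothetical processing step: store the number of rows in a new column.
    result["row_count"] = len(result)
    # Return a DataFrame so it can be converted back into an Example Set.
    return result
```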
Hello,
I added the function by moving my 'main' class under rm_main(data), but I keep running into a whitespace error.
This is the input:
import nmrglue as ng
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats
import numpy as np
import os
import pylab
def rm_main(data):
def load(path):
dic, data = ng.fileio.bruker.read_pdata(path)
udic = ng.bruker.guess_udic(dic,data)
for k in udic[0].items():
print(k)
[udic[n]["size"] for n in range(udic["ndim"])]
spectrum = data[:]
# store them as float
CAR = float(udic[n]["car"])
SW = float(udic[n]["sw"])
OBS = float(udic[n]["obs"])
num_points = float(len(spectrum))
# need to divide car by obs to get the carrier in ppm
freq_max = (.5) /float(OBS/SW)+float(CAR/OBS);
freq_min = (.5-((num_points-1)/num_points))/float(OBS/SW)+float(CAR/OBS);
step = (freq_max-freq_min)/(num_points-1)
domain = []
spectrum_flip = []
for i in range(len(spectrum)):
domain.append(freq_min+i*step)
spectrum_flip.append(spectrum[len(spectrum)-i-1])
return domain, spectrum_flip
Essentially, I want to run my Python code and turn the resulting data into a usable matrix that RapidMiner's analysis tools can work with. I could merge the main, load, draw, etc. modules into one process by combining all the Python functions, but I really just want the program to run after the import glob step, looping over the first 'block'.
Thanks,
Hi,
You need to return pandas DataFrames, not lists. That should do it.
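As a sketch of what that could look like for the axis calculation posted above: the helper below reuses the same frequency formulas but collects the result in one DataFrame instead of two lists. The name spectrum_to_frame and the column names ppm and intensity are hypothetical. It assumes at least two data points and that car, sw and obs are already plain numbers.

```python
import pandas as pd


def spectrum_to_frame(spectrum, car, sw, obs):
    """Return a DataFrame with a ppm axis and the reversed intensities."""
    n = float(len(spectrum))
    carrier_ppm = car / obs  # carrier frequency expressed in ppm
    freq_max = 0.5 / (obs / sw) + carrier_ppm
    freq_min = (0.5 - (n - 1) / n) / (obs / sw) + carrier_ppm
    step = (freq_max - freq_min) / (n - 1)
    # Evenly spaced axis from freq_min to freq_max.
    ppm = [freq_min + i * step for i in range(len(spectrum))]
    # Same reversal as the spectrum_flip loop in the original code.
    flipped = list(spectrum)[::-1]
    return pd.DataFrame({"ppm": ppm, "intensity": flipped})
```

Inside rm_main, the last line could then be something like `return spectrum_to_frame(spectrum, CAR, SW, OBS)`.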
~Martin
Hello,
I replaced all my list comprehensions with pandas.DataFrame objects, using append() to mirror the append() calls in the original code. However, I still keep getting this error:
IndentationError: expected an indented block (line 13, 17, etc.)
I have cut out the white space and tried virtually every combination of indentation below each iteration, but to no avail.
Hey,
I guess you already understood the main concepts, but let me repeat some things to potentially fill in missing information before providing a potential solution.
Using the "Execute Python" operator, you can execute Python code within RapidMiner. It uses the Python version specified under "Settings" -> "Preferences..." -> "Python Scripting" -> "Path to Python executable"
Define Python Version to use
The screenshot shows an example path for Python from an Anaconda installation on Windows. You can now use the libraries installed for this Python version in the operator by importing them as you normally would in Python.
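To confirm which interpreter the operator actually runs and whether it can see a given library, a small check like the following can help. It is only a sketch: environment_report is a hypothetical helper, it relies on importlib.util (so Python 3), and it only handles top-level module names.

```python
import importlib.util
import sys


def environment_report(module_names):
    """Return the interpreter path, its version, and module availability."""
    report = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
    }
    for name in module_names:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Calling `print(environment_report(["nmrglue", "pandas"]))` at the top of rm_main is one way to use it.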
The Python code in the operator uses 4 spaces per indentation level. So if you receive indentation errors, make sure each line is indented by 4 spaces times the desired indentation level. For example, when I copy the code of your "rm_main", it contains a mixture of tabs and spaces, as well as indentations of only 2 spaces. Some editors (Sublime Text, for example) can display whether tabs or spaces are used.
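If the editor does not make this visible, a short script can flag the suspicious lines. This is a rough sketch, not a full linter: find_indentation_problems is a hypothetical helper, and it will also flag continuation lines that are deliberately aligned inside parentheses.

```python
def find_indentation_problems(source, indent_width=4):
    """Return (line_number, message) pairs for suspicious indentation."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        # Leading run of spaces and tabs on this line.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        elif len(indent) % indent_width != 0:
            problems.append((number, "indent of %d spaces" % len(indent)))
    return problems
```

Passing the script text, for example `find_indentation_problems(open("script.py").read())` with script.py standing in for your own file, lists the lines to fix.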
After dealing with the indentation error, make sure to build a proper pandas DataFrame object. I looked into the "nmrglue" library, and the "fileio.bruker.read_pdata" method seems to already return a dictionary of the given data. Fortunately, pandas DataFrames accept a dictionary as input, so you might be able to create a DataFrame directly from the returned object. This even has the advantage that the columns are properly named from the beginning.
Once you have your pandas DataFrame instance, you can hand it over in the return statement of your "rm_main" function. The "Execute Python" operator then converts the DataFrame to an Example Set (which RapidMiner uses to manage matrix-like data). You can access this Example Set at the operator's output port. The first DataFrame returned is delivered at the topmost output port, and so on.
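To illustrate the port order described above, here is a small sketch in which rm_main returns two DataFrames. It assumes the operator maps them to the output ports from top to bottom; the contents of the second frame are made up for the example.

```python
import pandas as pd


def rm_main(data):
    # First return value: expected at the topmost output port.
    values = data.copy()
    # Second return value: expected at the next output port.
    # Hypothetical content: the dimensions of the input table.
    shape = pd.DataFrame({"rows": [len(data)], "columns": [len(data.columns)]})
    return values, shape
```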
Here is some example code, where you only need to adjust the path to the file you want to read:
Notes:
Edit:
If you are using Windows, make sure to escape backslashes when providing the path. This means that you need to write 2 backslashes; I added this to my code example above.
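A short illustration of the escaping issue, using a made-up Windows path. A raw string or os.path.join are common alternatives to doubling every backslash.

```python
import os

# Hypothetical Windows path to a processed data folder.
escaped = "C:\\nmr\\pdata\\1"  # every backslash doubled
raw = r"C:\nmr\pdata\1"        # raw string: backslashes are taken literally
broken = "C:\nmr"              # here "\n" is read as a newline character

# os.path.join inserts the separator of the current operating system.
portable = os.path.join("nmr", "pdata", "1")
```

The first two spellings produce the same string, while the third silently contains a line break instead of a backslash followed by "n".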
Again, I have set the directory correctly and the code works just fine outside RapidMiner. I am also using Linux, and I copy-pasted your code with the parameters in mind. I suspected I might have to combine all the class objects into a single block, but that does not work either. I also use gedit and Sublime Text to keep track of indentation, and I removed all unnecessary spaces and tabs.
Do I have to move the Python files into the module folders of my Python installation? For some reason, it is not able to detect the class attached next in sequence, to its right.
The script could not be parsed.
Please check your Python script: Import Error: No module named bin_spectrum
bin_spectrum is the name of my other class with its own set of functions to be called on.
Have you tried adding the folder containing your lib files to the PYTHONPATH?
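There is an environment variable, PYTHONPATH (ref.: pythonpath), that Python uses to look up modules. Single .py files in a folder listed there can be imported directly by name. If you want to import a folder itself as a package, it needs to contain an `__init__.py` file. The file can be empty, but it has to be inside that folder, and the folder's parent has to be on the PYTHONPATH.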
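If changing the environment variable is inconvenient, extending sys.path from within the script has a similar effect for that one run. A minimal sketch, assuming Python 3: import_from_folder is a hypothetical helper, and the folder is whichever directory holds bin_spectrum.py.

```python
import importlib
import os
import sys


def import_from_folder(folder, module_name):
    """Put 'folder' on the module search path, then import 'module_name'."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    # Make sure files created after interpreter start are noticed.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

For example, `bin_spectrum = import_from_folder("/path/to/your/scripts", "bin_spectrum")`, where the path is a placeholder for your own folder.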
Another option is to create an installable module out of your code using a `setup.py` (ref.: creating a setup file). This allows for installation via pip. If you choose this solution, you might want to use the `-e` option during installation. It lets you keep working on the Python files without having to reinstall the module over and over again. A possible installation call is `pip install -e folder_containing_the_setup_py`.