How to Analyze Time Data Per Person

n_alkassab · November 2018

Hey there,

I am very new to RapidMiner and have the following task to do:
I have data collected from activity trackers for different individuals. The trackers record step count, heart rate, and blood pressure every second. I want to use the step count data to predict blood pressure using different machine learning models. However, I am struggling to set up the data because each person has millions of time-stamped records (I have 20 people in total). Any suggestions?

rfuentealba · November 2018

Hi @n_alkassab, and welcome to the RapidMiner Community!

To help you, let's first break this problem down into a few steps:

Getting the repository prepared.
Adding data to the repository.
Training your models with time series data.

Why? Because every person is different, so a single model for everyone is unlikely to fit well, and I would train one model per person instead. That is the approach I used, at least.

Getting the repository prepared.

To begin, you need your data split into two example sets: one for the people in your study and the other for the measurements. The plan is to iterate over the patient example set, read the measurements example set, filter by the patient ID, and train one model for that patient.

I would create a new repository with this shape:

Figure 1: Data, Processes and Models, because we will have one model per person.

Once you have these, you can import your data. I created a simple CSV with Patient ID, Patient Name, Date, Systolic, Diastolic, Pulse. You can find that example attached to this answer. Of course, that's not the same data you have, but it will help us set up the rest of the example.

Adding data to the repository.

I imported my data to the repository under the name of Original Patient Data. You can use the Read CSV or Read Excel operators, but for this little example, I wanted my data inside the RapidMiner repository.

Then you should obtain a list of patients and a list of measurements separately. I built a process for this, named it Processes/01 Prepare Patient Data and saved it.

Figure 2: How to prepare data. The process is called "01 Prepare Patient Data" and is also attached.
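For readers who want to see the same split outside RapidMiner, here is a minimal Python sketch of the idea. It is not the attached process: the sample rows and the function name split_patients_and_measurements are made up for illustration, and only the column names come from the CSV described above.

```python
import csv
import io

# Hypothetical sample rows using the column layout of the example CSV.
SAMPLE_CSV = """Patient ID,Patient Name,Date,Systolic,Diastolic,Pulse
1,Alice,2018-11-01 08:00:00,120,80,70
1,Alice,2018-11-01 08:00:01,121,81,71
2,Bob,2018-11-01 08:00:00,130,85,64
"""


def split_patients_and_measurements(rows):
    """Split combined rows into a patient list and a measurement list."""
    names_by_id = {}
    measurements = []
    for row in rows:
        pid = row["Patient ID"]
        # Keep one entry per patient, in order of first appearance.
        names_by_id.setdefault(pid, row["Patient Name"])
        # Measurements keep only the ID, the timestamp and the readings.
        measurements.append({
            "Patient ID": pid,
            "Date": row["Date"],
            "Systolic": int(row["Systolic"]),
            "Diastolic": int(row["Diastolic"]),
            "Pulse": int(row["Pulse"]),
        })
    patients = [
        {"Patient ID": pid, "Patient Name": name}
        for pid, name in names_by_id.items()
    ]
    return patients, measurements


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
patients, measurements = split_patients_and_measurements(rows)
```

The patient list is what the loop iterates over later, and the measurement list is what gets filtered by Patient ID inside the loop.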

Training your models

Finally, to train your models, you should make use of the Loop Examples operator in combination with the Extract Macro operator. Here is a picture:

Inside the Loop Examples operator, I have this:

Basically, I extract the Patient ID and Patient Name using macros, read all the measurements, filter the examples for each patient, select only the data I need for my model, train my model with that data, and store the results. In this case, I save each cluster model visualization generated from clustering the data. I wouldn't want to take from you the joy of building stuff.

This process I made is called 02 Train Models, and it saves the model for each patient in the Models directory of your newly created repository.
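If it helps to see the loop written out, here is a rough Python sketch of the same pattern: iterate over patients, filter the measurements by Patient ID, train, and store one model per patient. Everything in it is an assumption made for illustration. The data is invented, fit_line is a hand-rolled least-squares line standing in for whatever learner you pick (it is not the clustering used in the attached process), predicting Systolic from Pulse is just a convenient pairing from the example columns, and the temporary folder with JSON files stands in for the Models directory.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical inputs: a patient list and a measurement list.
patients = [
    {"Patient ID": "1", "Patient Name": "Alice"},
    {"Patient ID": "2", "Patient Name": "Bob"},
]
measurements = [
    {"Patient ID": "1", "Pulse": 60, "Systolic": 110},
    {"Patient ID": "1", "Pulse": 70, "Systolic": 120},
    {"Patient ID": "1", "Pulse": 80, "Systolic": 130},
    {"Patient ID": "2", "Pulse": 60, "Systolic": 125},
    {"Patient ID": "2", "Pulse": 80, "Systolic": 135},
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def train_models(patients, measurements, models_dir):
    """Train and store one model per patient; return them keyed by ID."""
    models = {}
    for patient in patients:
        pid = patient["Patient ID"]
        # Filter the measurements down to the current patient only.
        own = [m for m in measurements if m["Patient ID"] == pid]
        model = fit_line(
            [m["Pulse"] for m in own],
            [m["Systolic"] for m in own],
        )
        # Store the result under the patient's ID.
        (models_dir / f"{pid}.json").write_text(json.dumps(model))
        models[pid] = model
    return models


models_dir = Path(tempfile.mkdtemp())  # stand-in for the Models folder
models = train_models(patients, measurements, models_dir)
```

Filtering the full list inside the loop is fine for a toy example, but with millions of rows per person you would probably want to group or index the measurements by patient first.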

From this, you should be able to train your own model and make whatever corrections you need, but at least you have a working sample. I attached the repository too, so you can see how things work there.

Hope this helps,

Rodrigo.

rfuentealba · November 2018

Hi @n_alkassab,

BTW, further adjustments you can make:

Store your data once it is filtered, so you don't have to work with millions of records every time, just the ones you need at any given moment.
Store your data in a relational database so you don't have to redo everything every single time. I always recommend PostgreSQL for these things.
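As a rough illustration of both suggestions, here is a small Python sketch that reduces the data once and keeps the result in a relational table. Please read it as assumptions rather than a recipe: SQLite (from the Python standard library) is used only so the example runs anywhere and stands in for PostgreSQL, averaging per minute is just one possible way to shrink per-second readings, and the table, column and function names are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical per-second readings: (patient_id, timestamp, systolic).
measurements = [
    ("1", "2018-11-01 08:00:00", 120),
    ("1", "2018-11-01 08:00:01", 122),
    ("1", "2018-11-01 08:01:00", 130),
    ("2", "2018-11-01 08:00:00", 110),
]


def per_minute_means(rows):
    """Average the readings of each patient within each minute."""
    buckets = defaultdict(list)
    for pid, timestamp, systolic in rows:
        minute = timestamp[:16]  # keep "YYYY-MM-DD HH:MM"
        buckets[(pid, minute)].append(systolic)
    return [
        (pid, minute, sum(values) / len(values))
        for (pid, minute), values in sorted(buckets.items())
    ]


# In-memory SQLite database as a stand-in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement_minutes ("
    "patient_id TEXT, minute TEXT, systolic_mean REAL)"
)
# An index on the patient ID keeps per-patient lookups cheap.
conn.execute(
    "CREATE INDEX idx_minutes_patient ON measurement_minutes (patient_id)"
)

reduced = per_minute_means(measurements)
conn.executemany(
    "INSERT INTO measurement_minutes VALUES (?, ?, ?)", reduced
)
conn.commit()


def load_patient(conn, patient_id):
    """Fetch the stored, already reduced rows for one patient."""
    cursor = conn.execute(
        "SELECT minute, systolic_mean FROM measurement_minutes "
        "WHERE patient_id = ? ORDER BY minute",
        (patient_id,),
    )
    return cursor.fetchall()


patient_1 = load_patient(conn, "1")
```

Once the reduced table exists, each training run only needs the per-patient query instead of re-reading and re-filtering the raw data.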

This process I built for you looks complex, but there are a dozen modifications you can make to adapt it properly to your data. I encourage you to experiment with these!

All the best,

Rodrigo.

n_alkassab · November 2018

Thank you so much! You saved me a ton of time. I really appreciate it.
