[Solved] Range Join: Joining two tables --

uenge-san · May 2017

Hello Rapidminer Community,

I'm trying to join two tables: one contains a start- and an end-date_time, the other only a timestamp.

The goal is, to classify, between which timestamp the event in table 2 has been

Example:

Table 1

ID # shift # start # end

1 ; "A" ; 2017-01-01 00:00:00 ; 2017-01-03 23:59:59

2 ; "B" ; 2017-01-04 00:00:00 ; 2017-01-05 23:59:59

3 ; "C" ; 2017-01-06 00:00:00 ; 2017-02-01 23:59:59

Table 2

ID # attr1 # timestamp

1; "error 1" ; 2017-01-01 07:00

2; "error 2" ; 2017-01-02 09:00

3; "error 4" ; 2017-01-06 07:00

Expected joined table

ID # attr1 # timestamp # shift (Table1)

1; "error 1" ; 2017-01-01 07:00 ; "A"

2; "error 2" ; 2017-01-02 09:00 ; "A"

3; "error 4" ; 2017-01-06 07:00 ; "C"

Any hint, how to get to the expected joined table??

Thanks in advance

BalazsBarany · May 2017

Hi,

you could try using my generic join solution in a Groovy script:

https://datascientist.at/2016/06/generic-joins-in-rapidminer/#english

There's also a template building block:

http://community.rapidminer.com/t5/RapidMiner-Building-Blocks/Generic-join-script/m-p/33908#U33908

You would extend the script to compare for the timestamp between the start and the end times.

Regards,

Balázs

uenge-san · May 2017

Amazing,

never heard about Groovy Script, but got it working in less than 30 minutes (and most of the time was consumed regarding a bad typo during copying your script into notepad++ ...)

Many, many thanks for your advice!

I copied the adjusted script below and highlighted the changes with bold fonts

/**
 * Configuration
 * es1AttName: join attribute name in ExampleSet 1
 * es2AttName: join attribute name in ExampleSet 2
 * joinFunc: Closure (function) to call on both arguments
 */

es1AttName1 = "start";
es1AttName2 = "end";
es2AttName = "TIMESTAMP";

def joinFunc = { e11, e12, e2 -> 
	(e11 <= e2 && e12 >= e2) || e2 == null
}

/**
 * End of configuration
 */

//RapidMiner data structures
import com.rapidminer.tools.Ontology;

//Determines if an attribute is nominal
def isNominal = { att ->
	type = att.getValueType();
	type == Ontology.NOMINAL || type == Ontology.BINOMINAL || type == Ontology.POLYNOMINAL;
}

//Fetches the correct value type from the example, depending on the attribute type 
def getExampleValue = { ex, att ->
	if (isNominal(att)) {
		ret = ex.getNominalValue(att);
	} else {
		ret = ex.getValue(att);
	}	
	ret;
}

// First input example set 
ExampleSet es1 = input[0];
Attributes es1Attributes = es1.getAttributes();
es1att1 = es1Attributes.get(es1AttName1);
es1att2 = es1Attributes.get(es1AttName2);
// Second input example set
ExampleSet es2 = input[1];
Attributes es2Attributes = es2.getAttributes();
es2att = es2Attributes.get(es2AttName);

int fields = es1Attributes.size() + es2Attributes.size();

//Arrays for attributes and field metadata
Attribute[] attributes= new Attribute[fields];

//Field counter
int fld = 0;

//Copy each attribute from ExampleSets 1 and 2 to the result attribute list
es1Attributes.each{f ->
	attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
	fld++;
}
es2Attributes.each{f ->
	attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
	fld++;
}

MemoryExampleTable table = new MemoryExampleTable(attributes);
DataRowFactory rowFactory = new DataRowFactory(0);

// Loop ExampleSet 1
for ( Example e1 : es1 ) {

	//Attribute value from ExampleSet 1, current example
e1val1 = getExampleValue(e1, es1att1);
	e1val2	= getExampleValue(e1, es1att2);

	for (Example e2: es2) {

		e2val = getExampleValue(e2, es2att);

		//Join condition - function configured at the top of the script
if (joinFunc(e1val1, e1val2, e2val)) {

			data = new Object[fields];
			fld = 0;

			// Copy values from both example sets into the new record
			es1Attributes.each{f ->
				if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
					data[fld] = e1.getNominalValue(f);
				} else {
					data[fld] = e1.getValue(f);
				}
				fld++;
			}
			es2Attributes.each{f ->
				if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
					data[fld] = e2.getNominalValue(f);
				} else {
					data[fld] = e2.getValue(f);
				}
				fld++;
			}

			DataRow row = rowFactory.create(data, attributes); 
     		table.addDataRow(row);	
		}
	}
}

ExampleSet exampleSet = table.createExampleSet();

return(exampleSet);

Thomas_Ott · May 2017

Make sure that the date-times are recongized as a date-time in RapidMiner. Then you can select Start Time in Table 1 to join with Timestamp in Table 2 in the Join operator. Just toggle of "Use attribute ID as key" option.

uenge-san · May 2017

Hello T-Bone,

thanks for your reply.

Trying to solve the problem with your suggested solution I found out, that there was an import problem with the date_time in table 1 --> solved

so trying to use the Join Operator after this I only get one result out of the joining, which is the event, where the timestamp and the start date matches exactly.

But unfortunately didn't come to my expected table...

Any other hints??

BR

Thomas_Ott · May 2017

Are you using the right join type? Right, Left, Inner, or Outer?

uenge-san · May 2017

Hi T-bone

I think so, the join type INNER should be the correct one, isn't it?

Thank you very much

BalazsBarany · May 2017

Hi uenge-san,

glad it's working for you!

This will be a good reference for everyone trying to do range joins in RapidMiner.

Regards,

Balázs

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

[Solved] Range Join: Joining two tables --

Best Answers

Answers