[Solved] Range Join: Joining two tables --
Hello Rapidminer Community,
I'm trying to join two tables: one contains a start- and an end-date_time, the other only a timestamp.
The goal is, to classify, between which timestamp the event in table 2 has been
Example:
Table 1
ID # shift # start # end
1 ; "A" ; 2017-01-01 00:00:00 ; 2017-01-03 23:59:59
2 ; "B" ; 2017-01-04 00:00:00 ; 2017-01-05 23:59:59
3 ; "C" ; 2017-01-06 00:00:00 ; 2017-02-01 23:59:59
Table 2
ID # attr1 # timestamp
1; "error 1" ; 2017-01-01 07:00
2; "error 2" ; 2017-01-02 09:00
3; "error 4" ; 2017-01-06 07:00
Expected joined table
ID # attr1 # timestamp # shift (Table1)
1; "error 1" ; 2017-01-01 07:00 ; "A"
2; "error 2" ; 2017-01-02 09:00 ; "A"
3; "error 4" ; 2017-01-06 07:00 ; "C"
Any hint, how to get to the expected joined table??
Thanks in advance
Best Answers
-
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi,
you could try using my generic join solution in a Groovy script:
https://datascientist.at/2016/06/generic-joins-in-rapidminer/#english
There's also a template building block:
http://community.rapidminer.com/t5/RapidMiner-Building-Blocks/Generic-join-script/m-p/33908#U33908
You would extend the script to compare for the timestamp between the start and the end times.
Regards,
Balázs
1 -
uenge-san Member Posts: 12 Contributor II
Amazing,
never heard about Groovy Script, but got it working in less than 30 minutes (and most of the time was consumed regarding a bad typo during copying your script into notepad++ ...)
Many, many thanks for your advice!
I copied the adjusted script below and highlighted the changes with bold fonts
/**
* Configuration
* es1AttName: join attribute name in ExampleSet 1
* es2AttName: join attribute name in ExampleSet 2
* joinFunc: Closure (function) to call on both arguments
*/
es1AttName1 = "start";
es1AttName2 = "end";
es2AttName = "TIMESTAMP";
def joinFunc = { e11, e12, e2 ->
(e11 <= e2 && e12 >= e2) || e2 == null
}
/**
* End of configuration
*/
//RapidMiner data structures
import com.rapidminer.tools.Ontology;
//Determines if an attribute is nominal
def isNominal = { att ->
type = att.getValueType();
type == Ontology.NOMINAL || type == Ontology.BINOMINAL || type == Ontology.POLYNOMINAL;
}
//Fetches the correct value type from the example, depending on the attribute type
def getExampleValue = { ex, att ->
if (isNominal(att)) {
ret = ex.getNominalValue(att);
} else {
ret = ex.getValue(att);
}
ret;
}
// First input example set
ExampleSet es1 = input[0];
Attributes es1Attributes = es1.getAttributes();
es1att1 = es1Attributes.get(es1AttName1);
es1att2 = es1Attributes.get(es1AttName2);
// Second input example set
ExampleSet es2 = input[1];
Attributes es2Attributes = es2.getAttributes();
es2att = es2Attributes.get(es2AttName);
int fields = es1Attributes.size() + es2Attributes.size();
//Arrays for attributes and field metadata
Attribute[] attributes= new Attribute[fields];
//Field counter
int fld = 0;
//Copy each attribute from ExampleSets 1 and 2 to the result attribute list
es1Attributes.each{f ->
attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
fld++;
}
es2Attributes.each{f ->
attributes[fld] = AttributeFactory.createAttribute(f.name, f.getValueType());
fld++;
}
MemoryExampleTable table = new MemoryExampleTable(attributes);
DataRowFactory rowFactory = new DataRowFactory(0);
// Loop ExampleSet 1
for ( Example e1 : es1 ) {
//Attribute value from ExampleSet 1, current example
e1val1 = getExampleValue(e1, es1att1);
e1val2 = getExampleValue(e1, es1att2);
for (Example e2: es2) {
e2val = getExampleValue(e2, es2att);
//Join condition - function configured at the top of the script
if (joinFunc(e1val1, e1val2, e2val)) {
data = new Object[fields];
fld = 0;
// Copy values from both example sets into the new record
es1Attributes.each{f ->
if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
data[fld] = e1.getNominalValue(f);
} else {
data[fld] = e1.getValue(f);
}
fld++;
}
es2Attributes.each{f ->
if (f.getValueType() == Ontology.NOMINAL || f.getValueType() == Ontology.BINOMINAL || f.getValueType() == Ontology.POLYNOMINAL) {
data[fld] = e2.getNominalValue(f);
} else {
data[fld] = e2.getValue(f);
}
fld++;
}
DataRow row = rowFactory.create(data, attributes);
table.addDataRow(row);
}
}
}
ExampleSet exampleSet = table.createExampleSet();
return(exampleSet);3
Answers
Make sure that the date-times are recongized as a date-time in RapidMiner. Then you can select Start Time in Table 1 to join with Timestamp in Table 2 in the Join operator. Just toggle of "Use attribute ID as key" option.
Hello T-Bone,
thanks for your reply.
Trying to solve the problem with your suggested solution I found out, that there was an import problem with the date_time in table 1 --> solved
so trying to use the Join Operator after this I only get one result out of the joining, which is the event, where the timestamp and the start date matches exactly.
But unfortunately didn't come to my expected table...
Any other hints??
BR
Are you using the right join type? Right, Left, Inner, or Outer?
Hi T-bone
I think so, the join type INNER should be the correct one, isn't it?
Thank you very much
Hi uenge-san,
glad it's working for you!
This will be a good reference for everyone trying to do range joins in RapidMiner.
Regards,
Balázs