Get and set roles from a reference data set
christos_karras
I have a data set in which various features have been excluded by assigning them a role (they were not removed because they can be useful for reference, even though they should be excluded from most operators). Now I would like to apply the same roles to another data set that has the same columns but on which no roles have been set.
In Python, I was able to do this to achieve my objective:
def rm_main(data, refdata):
    data.rm_metadata = refdata.rm_metadata
    return data, refdata
However, this can be slow for large data sets because the whole data set is passed back and forth between Python and RapidMiner, which is unnecessary when all I want to do is manipulate the column metadata.
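For reference, a slightly more defensive variant of the script above could copy only the role of each shared column instead of replacing the whole metadata object. This is a minimal sketch, assuming `rm_metadata` behaves like a dict that maps a column name to a `(type, role)` tuple; the helper name `copy_roles` is hypothetical. It still transfers the full data set to Python, so it does not address the performance concern.

```python
def copy_roles(columns, data_meta, ref_meta):
    """Return metadata for `columns` with roles taken from `ref_meta`.

    Assumption: both metadata arguments map column name -> (type, role).
    `data_meta` may be None when no metadata has been set yet.
    """
    merged = dict(data_meta or {})
    for name in columns:
        if name not in ref_meta:
            continue  # column has no entry in the reference metadata
        ref_type, ref_role = ref_meta[name]
        if ref_role is None:
            continue  # regular attribute in the reference: nothing to copy
        # Keep the type already known for this column, else reuse the reference type.
        own_type = merged[name][0] if name in merged else ref_type
        merged[name] = (own_type, ref_role)
    return merged


def rm_main(data, refdata):
    # Assumption: `data` may not carry an rm_metadata attribute at all.
    current = getattr(data, "rm_metadata", None)
    data.rm_metadata = copy_roles(list(data.columns), current, refdata.rm_metadata)
    return data, refdata
```

Columns that exist only in the reference data are ignored, and columns without a special role in the reference keep whatever metadata they already had.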
Is there a native way to do something similar with RapidMiner operators (or with an extension that adds such an operator)?
Otherwise, would the Groovy scripting operator be usable for this? I tried experimenting with it but could not find something that works.
Example (not functional, all attributes are seen to have a "null" role):
ExampleSet inputData = input[0];
ExampleSet referenceData = input[1];
ExampleSetMetaData inputMetaData = operator.getInputPorts().getPortByIndex(0).getMetaData();
ExampleSetMetaData referenceMetaData = operator.getInputPorts().getPortByIndex(1).getMetaData();
for (Attribute attribute : referenceData.getAttributes()) {
    AttributeMetaData referenceAttributeMetaData = referenceMetaData.getAttributeByName(attribute.getName());
    String referenceRole = referenceAttributeMetaData.getRole();
    LogService.root.log(Level.INFO, "Role for " + attribute.getName() + ": " + referenceRole);
}
Answers
- Filter removes all rows from the "reference dataset": a dataset where the columns have the roles I want to set
- First input of the Append operator is the "reference dataset", second input is the actual data, with the same columns but without any role set
The resulting dataset will use the metadata of the first input (with the roles), but will include all rows from the actual data.
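The idea behind this workaround can be illustrated with a toy model in plain Python. This is only a sketch of the logic, not RapidMiner code: the dict layout and the function name `append_like` are hypothetical, and the rule "metadata comes from the first input" is taken from the description above.

```python
def append_like(first, second):
    """Toy model of the append step: rows of both inputs, roles of the first."""
    if list(first["columns"]) != list(second["columns"]):
        raise ValueError("both inputs must have the same columns")
    return {
        "columns": list(first["columns"]),
        "roles": dict(first["roles"]),          # metadata of the first input wins
        "rows": first["rows"] + second["rows"],  # rows are concatenated
    }


# Reference data set: carries the roles we want to reuse.
reference = {
    "columns": ["id", "value"],
    "roles": {"id": "id"},
    "rows": [[1, 0.5], [2, 0.7]],
}

# Actual data: same columns, no roles set.
actual = {
    "columns": ["id", "value"],
    "roles": {},
    "rows": [[10, 1.5], [11, 2.5]],
}

# "Filter" step: keep the reference metadata but drop every row.
empty_reference = dict(reference, rows=[])

result = append_like(empty_reference, actual)
```

In this model `result` holds only the rows of `actual`, while its roles are those of the reference data set.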