"Java iterate over Parameters and write out clusterModels"
Hi all,
I want to evaluate different parameters for a clustering algorithm (DBScan with epsilon and minPoints).
I wrote a small Java program with two nested for loops, one per parameter.
Outside the loops I initialize RapidMiner and load the operators from a process file.
From the process I get a reference to the DBScan operator.
Inside the loops I change the parameters epsilon and minPoints on this reference.
First I set minPoints, then I loop over the different epsilon values.
From the resulting ClusterModel I need the average number of examples per cluster and the number of clusters generated.
I write these values to a file (for plotting in Excel), one line per epsilon value.
Then a new file is created for the next minPoints value and I loop over the epsilons again.
My problem is that all the data I write out is exactly the same.
Do I have to reinitialize RapidMiner or clear a cache inside the loop in order to get fresh results with no leftovers?
Here is my code:
Am I accessing the RapidMiner API or some of its objects incorrectly? Are these not direct references to the objects, so that changing a parameter on them does not change the parameters in the process?
// Method
private void runClusterNumberTest(String processFile) {
    RapidMiner.initRM();
    Process dbScanRootProcess = setProcessFile(processFile);

    // Look up the operators by the names they have in the process file
    ArffExampleSource arffSource = (ArffExampleSource) dbScanRootProcess
            .getOperator("ArffExampleSource");
    DBScan dbscanClusterAlg = (DBScan) dbScanRootProcess
            .getOperator("DBScanClustering");
    ClusterModelWriter clusterModWriter = (ClusterModelWriter) dbScanRootProcess
            .getOperator("ClusterModelWriter");

    // Do the clustering for just a small subset
    arffSource.setParameter("sample_ratio", Double
            .toString(this.percentageOfData));

    // Loop over all epsilons and minPoints that need to be evaluated
    for (double mPts = this.minPtsStart; mPts <= this.minPtsMax; mPts += this.stepmPts) {
        // Get a new file for writing the data into
        BufferedWriter resultWriter = setupResultFile(this.outputFolder,
                mPts);
        // Set the min points parameter
        dbscanClusterAlg.setParameter("min_Points", Double.toString(mPts));

        for (double eps = this.epsilonStart; eps <= this.epsilonMax; eps += this.stepEps) {
            System.out.println("Processing configuration with Eps=" + eps + " and mPts=" + mPts);
            resultWriter.append(eps + ";");
            dbscanClusterAlg.setParameter("epsilon", Double.toString(eps));

            String clusModelOutpf = this.outputFolder + File.separator
                    + "ClusterModels" + File.separator + "cmDBSCAN_Eps_"
                    + eps + "mPts_" + mPts + "_PercData_" + this.percentageOfData + ".clm";
            clusterModWriter.setParameter("cluster_model_file",
                    clusModelOutpf);

            // Run the process
            System.out.println("Calling RM clustering process...");
            IOContainer rootIOContainer = dbScanRootProcess.run();
            ClusterModel clusterModel = rootIOContainer.get(ClusterModel.class);

            System.out.println("Processing data from clustering...");
            int clustercount = clusterModel.getNumberOfClusters();
            Collection<Cluster> clusters = clusterModel.getClusters();
            //... read only data from clusters then the new loop begins
Thanks in advance.
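The two statistics described in the question (the number of clusters and the average number of examples per cluster) and the semicolon-separated output line can be sketched independently of RapidMiner. This is only an illustration: the class name `ClusterStats`, both helper methods, and the sample cluster sizes are hypothetical and stand in for whatever is read from the ClusterModel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ClusterStats {

    /** Average number of examples per cluster; 0.0 if there are no clusters. */
    static double averageClusterSize(List<Integer> clusterSizes) {
        if (clusterSizes.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int size : clusterSizes) {
            total += size;
        }
        return (double) total / clusterSizes.size();
    }

    /** One semicolon-separated line: epsilon;clusterCount;averageSize */
    static String resultLine(double eps, List<Integer> clusterSizes) {
        // Locale.ROOT forces '.' as the decimal separator regardless of system locale
        return String.format(Locale.ROOT, "%.3f;%d;%.2f",
                eps, clusterSizes.size(), averageClusterSize(clusterSizes));
    }

    public static void main(String[] args) {
        // Hypothetical cluster sizes, standing in for the sizes read from a ClusterModel
        List<Integer> sizes = Arrays.asList(40, 25, 10);
        System.out.println(resultLine(0.5, sizes)); // prints 0.500;3;25.00
    }
}
```

Writing one such line per epsilon value makes it easy to spot at a glance whether successive runs really produce identical models or only identical summary numbers.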
Answers
I also tried to do the parameter iteration in the GUI using the ParameterIteration operator, and I want to write out the cluster models.
But when I try to set the model_file value, I cannot use an absolute path containing a colon (C:\test\test.clm);
the system throws a parsing error for the string.
Is there a way to get the current parameter values into the filename of the ClusterModelWriter,
for example ClusterModel_Param1_Param1value.clm?
Your code should work. Perhaps you should test it with another parameter variation, such as the number of clusters. DBScan only changes its behavior within a very small window of possible parameter values. That might be the reason why all cluster models are the same.
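To illustrate how DBSCAN results can stay identical across a wide range of epsilon values, here is a minimal sketch. It is a textbook DBSCAN written for this example, not RapidMiner's implementation, and the class name, the eight 2D points, and the chosen epsilon values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DbscanPlateauDemo {
    static final int UNVISITED = 0;
    static final int NOISE = -1;

    // Two tight groups of four points each, far apart from one another (made-up data)
    static final double[][] POINTS = {
        {0.0, 0.0}, {0.1, 0.0}, {0.0, 0.1}, {0.1, 0.1},
        {10.0, 10.0}, {10.1, 10.0}, {10.0, 10.1}, {10.1, 10.1}
    };

    /** Indices of all points within eps of point i (including i itself). */
    static List<Integer> neighbors(double[][] pts, int i, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            double dx = pts[i][0] - pts[j][0];
            double dy = pts[i][1] - pts[j][1];
            if (Math.sqrt(dx * dx + dy * dy) <= eps) {
                result.add(j);
            }
        }
        return result;
    }

    /** Runs DBSCAN and returns the number of clusters found. */
    static int countClusters(double[][] pts, double eps, int minPts) {
        int[] label = new int[pts.length]; // 0 = unvisited, -1 = noise, >0 = cluster id
        int clusters = 0;
        for (int i = 0; i < pts.length; i++) {
            if (label[i] != UNVISITED) {
                continue;
            }
            List<Integer> seeds = neighbors(pts, i, eps);
            if (seeds.size() < minPts) {
                label[i] = NOISE; // may later be relabeled as a border point
                continue;
            }
            clusters++;
            label[i] = clusters;
            // The seed list grows while we walk it, so iterate by index
            for (int k = 0; k < seeds.size(); k++) {
                int q = seeds.get(k);
                if (label[q] == NOISE) {
                    label[q] = clusters; // border point joins the cluster
                }
                if (label[q] != UNVISITED) {
                    continue;
                }
                label[q] = clusters;
                List<Integer> qNeighbors = neighbors(pts, q, eps);
                if (qNeighbors.size() >= minPts) {
                    seeds.addAll(qNeighbors); // q is a core point, keep expanding
                }
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[] epsilons = {0.05, 0.5, 1.0, 2.0, 5.0, 20.0};
        for (double eps : epsilons) {
            System.out.println("eps=" + eps + " -> "
                    + countClusters(POINTS, eps, 3) + " clusters");
        }
    }
}
```

On this toy data the cluster count is 0 at eps=0.05, stays at 2 for every epsilon from 0.5 to 5.0, and drops to 1 at eps=20.0. A parameter sweep that happens to lie entirely inside such a plateau would produce the same model on every run.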
As far as I know, there is no way to get the parameter values into the filename unless you use macros. Although you could simply define macros and use their values as parameters, this can become unwieldy when numerical values are needed, because macro definitions are always nominal and every value has to be entered by hand.
There is the predefined macro %{a}, which is filled with the apply count of the current operator. Perhaps this is enough in combination with a process log, which could save a table translating the number into parameter values.
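The idea of a table that translates a run number back into parameter values can be sketched outside of RapidMiner as follows. Everything here is an assumption made for illustration: the class and method names are hypothetical, the counter is assumed to start at 1, and the combinations are assumed to be visited in nested-loop order (minPoints outer, epsilon inner). The actual order and starting value should be checked against the process log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class RunLookupTable {

    /** Builds "run;minPoints;epsilon" lines, numbering runs in nested-loop order. */
    static List<String> buildLookup(int[] minPtsValues, double[] epsValues, int firstRun) {
        List<String> lines = new ArrayList<>();
        int run = firstRun; // assumed starting value of the apply counter
        for (int minPts : minPtsValues) {
            for (double eps : epsValues) {
                lines.add(String.format(Locale.ROOT, "%d;%d;%.2f", run, minPts, eps));
                run++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical parameter grid
        int[] minPtsValues = {3, 5};
        double[] epsValues = {0.1, 0.2, 0.3};
        System.out.println("run;minPoints;epsilon");
        for (String line : buildLookup(minPtsValues, epsValues, 1)) {
            System.out.println(line);
        }
    }
}
```

With this grid, run 4 maps to minPoints=5 and epsilon=0.10, so a model file numbered 4 can be traced back to its parameters without encoding them in the filename.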
Greetings,
Sebastian
I wrote out the data using the iteration macro %{a} and set the parameters in the filename manually (luckily there were only a few).
I have seen that DBScan behaves this way. It is very difficult to determine good values for epsilon and minPoints when the input dimension is high.