Inferential Statistics - R, Python or Extension
michaelgloven
RapidMiner Certified Analyst, Member Posts: 46 Guru
As a partner, I am looking to use RapidMiner to integrate inferential statistical methods such as hypothesis testing, confidence intervals, chi-square tests, etc. as part of a client implementation. I see there is a paid extension for this, but given how simple these methods are, and the unwanted burden of managing a paid subscription for only occasional use, is there a free library of operators available, or do I need to use R or Python and build my own? We only need a few methods occasionally, so I'd like to know whether there are options besides R, Python or the paid extension. Thanks!
Best Answer
michaelgloven
I normally calculate the z test statistic as the sample mean minus the value under the null hypothesis (what I'm testing), divided by the standard error, relying on the central limit theorem. For the standard error I usually use the sample standard deviation divided by the square root of the sample size. I then compare the result with the critical z value (about 1.645 for a one-tailed test at a 5% significance level) to decide whether to reject the null hypothesis or fail to reject it. The math is quite simple; I was just looking for an operator to automate the work, given how important testing our data and results is to our particular use cases. I believe I can make all of this work with your suggestions above.
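For anyone wanting to script this outside a dedicated operator (for example in an Execute Python step), here is a minimal sketch of the one-sample z test described above. It assumes a right-tailed test (H0: mu <= mu0 vs H1: mu > mu0) and uses only the Python standard library; the function name `one_sample_z_test` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu0, alpha=0.05):
    """Right-tailed one-sample z test (assumed direction: H1 is mu > mu0)."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)          # standard error = s / sqrt(n)
    z = (mean(sample) - mu0) / se              # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)          # right-tail p-value
    reject = z > z_crit                        # reject H0 if z exceeds z_crit
    return z, z_crit, p_value, reject


# Hypothetical data, tested against a null value of 5.0
sample = [5.1, 4.9, 5.3, 5.8, 5.2, 5.0, 5.4, 5.6]
z, z_crit, p_value, reject = one_sample_z_test(sample, mu0=5.0)
print(f"z={z:.3f} z_crit={z_crit:.3f} p={p_value:.4f} reject={reject}")
```

With only a handful of observations the t distribution is usually more appropriate than the normal one; the z version is shown because it is the procedure described in the post.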
Answers
Dortmund, Germany
For each selected attribute, a Tukey Test confidence is calculated. This confidence is defined as the distance between the current value and the median, divided by the distance between the lower/upper 'Tukey Test boundary' and the median.
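The confidence definition above can be sketched in plain Python. This is not RapidMiner's implementation: it assumes the boundaries are the classic Tukey fences Q1 - k*IQR and Q3 + k*IQR with k = 1.5, and that quartiles follow Python's default `statistics.quantiles` method; the operator may use a different multiplier or quartile rule. The function name `tukey_confidence` is hypothetical.

```python
from statistics import median, quantiles


def tukey_confidence(values, x, k=1.5):
    """|x - median| divided by |boundary - median|, where the boundary is the
    upper Tukey fence for values above the median and the lower one otherwise.
    k=1.5 is an assumed multiplier, not necessarily the operator's default."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    med = median(values)
    lower, upper = q1 - k * iqr, q3 + k * iqr
    boundary = upper if x >= med else lower
    span = abs(boundary - med)
    if span == 0:
        # Degenerate case (e.g. constant data): any deviation counts as infinite
        return 0.0 if x == med else float("inf")
    return abs(x - med) / span


data = list(range(1, 10))             # 1..9: median 5, Q1 2.5, Q3 7.5
print(tukey_confidence(data, 15))     # value sitting exactly on the upper fence
print(tukey_confidence(data, 6))      # value close to the median
```

A confidence above 1 means the value lies beyond the corresponding fence, which is the usual Tukey criterion for flagging an outlier.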
So instead of the mean and std_dev we use the interquartile range and the median. The median is more robust to outliers than the mean, so I and many statisticians prefer it.
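A small illustration of the robustness point, using made-up numbers: replacing the largest of eleven values with a gross outlier drags the mean and standard deviation upward, while the median and interquartile range stay put. The data and the `summary` helper are hypothetical.

```python
from statistics import mean, median, quantiles, stdev


def summary(xs):
    """Location/spread pairs: mean/std_dev versus median/IQR."""
    q1, _, q3 = quantiles(xs, n=4)
    return {"mean": mean(xs), "std_dev": stdev(xs),
            "median": median(xs), "iqr": q3 - q1}


clean = list(range(10, 21))      # 10, 11, ..., 20
dirty = clean[:-1] + [200]       # the largest value replaced by an outlier

for name, xs in (("clean", clean), ("dirty", dirty)):
    s = summary(xs)
    print(name, {key: round(value, 2) for key, value in s.items()})
```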
Can you have a look at the Tukey Test? We could write the same thing using mean and std_dev if that's what you need.
Cheers,
Martin
Dortmund, Germany
Thank you in advance
Dortmund, Germany
In the KS test, the KS statistic and the p-value are returned, as Dr. Martin mentioned above. What significance level do you usually use in practice?
KS test (Smile library): http://haifengl.github.io/api/java/smile/stat/hypothesis/KSTest.html
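To show what the KS statistic actually measures, here is a minimal two-sample sketch in plain Python. It is not the Smile implementation linked above. D is the largest vertical gap between the two empirical CDFs. The p-value uses the large-sample Kolmogorov approximation with effective size n1*n2/(n1+n2), so it will differ from exact small-sample p-values. The function name `ks_two_sample` is hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample KS test."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    d = 0.0
    # The ECDF gap can only reach its maximum at one of the observed values
    for x in a + b:
        f1 = bisect_right(a, x) / n1
        f2 = bisect_right(b, x) / n2
        d = max(d, abs(f1 - f2))
    # Asymptotic p-value: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    lam = math.sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.2:
        return d, 1.0   # Q(lam) is indistinguishable from 1 here
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


d, p = ks_two_sample([0.1, 0.4, 0.5, 0.9, 1.2], [0.8, 1.1, 1.5, 1.9, 2.4])
print(f"D={d:.3f} p~{p:.4f}")
```

The p-value is then compared against whatever significance level you choose (for example 0.05) in the same way as for any other test.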
Hope it helps.
YY
My problem is that I was trying to automate the steps of the T test and F test, and I need more than the p-value: I also need the T and F statistics and the critical region.
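The statistics themselves are easy to compute directly. The sketch below assumes a two-sample t test with pooled variance (equal variances assumed; Welch's version would differ) and an F test on the ratio of the two sample variances. The function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def variance_ratio_f(a, b):
    """F statistic s_a^2 / s_b^2; returns (F, numerator df, denominator df)."""
    return variance(a) / variance(b), len(a) - 1, len(b) - 1


group_a = [2, 4, 6]
group_b = [1, 2, 3]
print(pooled_t_statistic(group_a, group_b))
print(variance_ratio_f(group_a, group_b))
```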
Is there any way to calculate columns using the F and T distributions, like in Excel?
Thank you!
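For the critical region, the inverse T and F distributions are needed (the kind of values Excel's inverse T and F functions return). Below is a stdlib-only Python sketch. Both CDFs are written in terms of the regularized incomplete beta function, evaluated with the standard continued-fraction (Lentz) method, and are inverted by bisection. All function names are hypothetical, and the accuracy is meant for practical significance levels rather than extreme tails.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    tail = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return 1.0 - tail if t >= 0 else tail


def f_cdf(f, d1, d2):
    return 0.0 if f <= 0 else reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def _invert(cdf, p, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection for x with cdf(x) = p, assuming cdf(lo) < p."""
    while cdf(hi) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2.0


def t_critical(alpha, df, two_tailed=True):
    p = 1 - alpha / 2 if two_tailed else 1 - alpha
    return _invert(lambda t: t_cdf(t, df), p)


def f_critical(alpha, d1, d2):
    """Right-tail critical value: reject H0 if F > f_critical."""
    return _invert(lambda f: f_cdf(f, d1, d2), 1 - alpha)


print(t_critical(0.05, 10), f_critical(0.05, 2, 10))
```

If SciPy is available in the Python environment, `scipy.stats.t.ppf` and `scipy.stats.f.ppf` return the same quantiles with less code. The hand-rolled version avoids the extra dependency. The critical region is then |t| > t_critical for a two-tailed T test and F > f_critical for a right-tailed F test.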