You are performing a filter-based feature selection for a dataset to build a multi-class classifier by using Azure Machine Learning Studio.
The dataset contains categorical features that are highly correlated to the output label column.
You need to select the appropriate feature scoring statistical method to identify the key predictors.
Which method should you use?
A. Kendall correlation
B. Spearman correlation
C. Chi-squared
D. Pearson correlation
Correct Answer: C
Explanation/Reference:
Explanation:
The features in this scenario are categorical, so chi-squared is the appropriate scoring method. The two-way chi-squared test is a statistical method that measures how close expected values are to actual results. For feature selection, it compares the observed count of each feature value and label combination with the count expected if the feature and the label were independent; a larger statistic indicates a stronger association with the label.
Incorrect Answers:
D: Pearson’s correlation statistic, or Pearson’s correlation coefficient, is also known in statistical models as the r value. It measures the strength and direction of the linear relationship between two continuous variables and is based on covariance, so it is not suited to categorical features.
A, B: Kendall and Spearman are rank correlation methods, intended for ranked (ordinal) data rather than unordered categorical values.
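As a rough illustration of the chi-squared description under option C, the sketch below scores two categorical features against a label by comparing observed counts with the counts expected under independence. It is plain Python and not the Azure Machine Learning Studio module itself; the function name chi_squared_score and the toy columns colour, size and label are hypothetical and chosen only for this example.

```python
from collections import Counter


def chi_squared_score(feature, label):
    """Two-way chi-squared statistic between two categorical columns."""
    n = len(feature)
    observed = Counter(zip(feature, label))  # joint counts per (feature value, label)
    feature_totals = Counter(feature)        # row totals
    label_totals = Counter(label)            # column totals

    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: a three-class label and two categorical features.
label = ["a", "a", "b", "b", "c", "c"]
colour = ["red", "red", "green", "green", "blue", "blue"]  # tracks the label exactly
size = ["s", "l", "s", "l", "s", "l"]                      # unrelated to the label

scores = {
    "colour": chi_squared_score(colour, label),
    "size": chi_squared_score(size, label),
}

# Rank features from most to least associated with the label.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

In this toy data, colour tracks the label exactly and receives a high score, while size is independent of the label and scores zero, so a filter-based selection would keep colour first.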
Reference:
https://www.statisticssolutions.com/pearsons-correlation-coefficient/
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/filter-based-feature-selection
Correct Answer: C