Machine Learning Interview Questions

Machine Learning Engineers play an important role in today’s computing industry. With many graduates applying for highly coveted Machine Learning roles, a quick reference is handy when preparing for interviews. Here are some frequently asked Machine Learning interview questions:

What are precision and recall?

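Precision = TP / (TP + FP): of all the observations the model predicted as positive, the fraction that are actually positive.

Recall = TP / (TP + FN): of all the observations that are actually positive, the fraction the model found.

Here TP, FP and FN are the counts of true positives, false positives and false negatives.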
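
As a minimal sketch, precision and recall can be computed directly from the true and predicted labels. The function name `precision_recall` and the sample labels are made up for illustration and assume a binary problem where 1 is the positive class.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against division by zero when nothing is predicted/actually positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative labels: 3 true positives, 2 false positives, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

With these labels the model is right on 3 of its 5 positive predictions and finds 3 of the 4 actual positives.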

What do you mean by information gain in Machine Learning?

Information gain is defined as the expected reduction in entropy obtained by partitioning the data on an attribute. Ideally, in a tree, we keep partitioning until each node is as pure as possible. It can be defined as:

Information gain = Entropy(parent) - sum(weight * Entropy(child))

where weight = (number of observations in a child) / (total observations in all child nodes)
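
The formula can be sketched in a few lines of Python. This is an illustration under assumptions: the helper names `entropy` and `information_gain` and the yes/no labels are hypothetical, and entropy is taken in base 2.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of its children."""
    total = sum(len(child) for child in children)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Illustrative split: a 50/50 parent divided into two mostly pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4
gain = information_gain(parent, [left, right])
print(round(gain, 4))
```

A split that separates the classes perfectly would give a gain equal to the full parent entropy, while a split that leaves both children at 50/50 would give a gain of zero.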

Compare the terms Model Accuracy and Model Performance

Among Machine Learning interview questions, this is one of the most commonly asked. Model accuracy: Model accuracy deals with the output of the model. It is defined as follows:

(classifications a model predicts correctly)/(total predictions).

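Model accuracy is a single metric: the share of predictions the model gets right.

Model performance: This is the broader term. It describes how well the model does its job overall and can be judged with several metrics, and in some settings also by practical factors such as speed. Model accuracy is one of the metrics that can be used to assess the model’s performance.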
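
A small sketch shows why accuracy alone does not describe performance. The data is invented for illustration: 95 negatives, 5 positives, and a hypothetical model that always predicts the negative class.

```python
def accuracy(y_true, y_pred):
    """(classifications predicted correctly) / (total predictions)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Imbalanced, made-up data and a model that never predicts the positive class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print(acc, positives_found)
```

The accuracy looks high even though the model finds none of the positive cases, which is why other metrics are usually reported alongside it.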

What do you understand by Entropy in Decision Tree?

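Entropy measures the impurity of the data in a node. For fully homogeneous data (a single class) the entropy is 0, and for a two-class sample that is divided equally it is 1, the maximum for two classes. When building a decision tree, the attribute whose split reduces entropy the most, that is, the one with the highest information gain, is chosen at each node.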
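
The boundary values of entropy can be checked with a short sketch. The helper name `entropy` and the labels are illustrative, and base-2 logarithms are assumed; with more than two classes the maximum rises above 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


pure = entropy(["a"] * 8)                  # one class only
balanced = entropy(["a"] * 4 + ["b"] * 4)  # two classes, split equally
three_way = entropy(["a", "b", "c"] * 3)   # three classes, split equally
print(pure, balanced, three_way)
```

The three-class case equals log2(3), which shows that the upper bound of 1 applies only to two-class problems.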

What does Gini index mean in Decision tree algorithms?

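The Gini index (Gini impurity) is a measure of misclassification: how often a randomly chosen observation would be labeled wrongly if it were labeled at random according to the class distribution in a node. It can be applied to both binary and multi-class classification problems. It is relatively faster to calculate than entropy because it needs no logarithm, and for the same class distribution its value is never higher than the entropy: for two classes it peaks at 0.5, whereas entropy peaks at 1.

In hierarchical structures such as decision trees, splitting criteria like the Gini index are used to decide which attribute is placed at each node.

Gini = 1 - ∑ (pi)^2, summed over i = 1 to n

where n is the number of classes and pi is the probability of an object being classified to class i. The attribute whose split gives the lowest weighted Gini index is taken as the root node.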
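
The choice of root attribute can be sketched as follows. The helper names `gini` and `weighted_gini` and the two candidate splits "A" and "B" are hypothetical; weighting each child by its share of the observations is the usual convention and is assumed here.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(children):
    """Gini index of a split: child impurities weighted by child size."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


# Two made-up candidate attributes and the child nodes each split produces.
splits = {
    "A": [["yes"] * 4 + ["no"] * 1, ["yes"] * 1 + ["no"] * 4],
    "B": [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3],
}
scores = {name: weighted_gini(children) for name, children in splits.items()}
root = min(scores, key=scores.get)  # attribute with the lowest Gini index
print(scores, root)
```

Attribute "A" produces purer children, so it has the lower weighted Gini index and would be picked as the root.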

How to find outliers?

This question is covered in Top 20 Data Science/Data Analytics Interview Questions - Part 1.

Define collinearity and multicollinearity

Collinearity is defined as a linear association between two explanatory variables. Multicollinearity arises in multiple regression and refers to linear associations among more than two explanatory variables, so that one of them can be approximated by a linear combination of the others.
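
One quick check for collinearity between two explanatory variables is their Pearson correlation. The sketch below is illustrative only: the function name `pearson` and the sample variables are invented, and pairwise correlation will not catch every case of multicollinearity among three or more variables.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up explanatory variables.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_in = [h / 2.54 for h in height_cm]  # exact linear function of height_cm
shoe_size = [36.0, 39.0, 41.0, 44.0, 46.0]  # strongly but not perfectly related

r_exact = pearson(height_cm, height_in)
r_strong = pearson(height_cm, shoe_size)
print(r_exact, r_strong)
```

A correlation close to 1 or -1 between two predictors signals that they carry nearly the same information, and one of them is a candidate for removal.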

What is the relation between NumPy and SciPy?

In short, the NumPy library is part of the SciPy ecosystem, and SciPy is built on top of NumPy. NumPy provides the array type and array operations such as indexing, reshaping and sorting, along with basic algebraic functions and transforms. SciPy contains more scientific modules and functions, such as optimization, integration, statistics and signal processing, along with more advanced linear algebra functions. Which of the two to choose depends on the type of application; for high-level scientific applications it is handy to keep both NumPy and SciPy.
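
A short sketch of the division of labor, assuming both libraries are installed. The arrays and the integrand are made up; the calls used (`np.arange`, `reshape`, `np.sort`, `scipy.linalg.solve`, `scipy.integrate.quad`) are standard parts of the two libraries.

```python
import numpy as np
from scipy import integrate, linalg

# NumPy: array creation, reshaping, indexing and sorting.
a = np.arange(6).reshape(2, 3)          # 2 rows, 3 columns
first_column = a[:, 0]                  # column indexing
ordered = np.sort(np.array([3, 1, 2]))  # ascending sort

# SciPy: higher-level numerical routines that operate on NumPy arrays.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)                  # solve the linear system A @ x = b

# Numerically integrate t**2 from 0 to 1; quad returns (value, error estimate).
area, abs_error = integrate.quad(lambda t: t ** 2, 0.0, 1.0)
print(first_column, ordered, x, area)
```

Note that the SciPy routines take and return NumPy arrays, which is why the two libraries are normally used together.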

How to tackle a model with low bias and high variance?

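Low bias together with high variance means the model is over-fitting: it fits the training data closely but generalizes poorly. Ensemble learning methods such as bagging, which average many models trained on resampled data, can be used to tackle this kind of model.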

Compare L1 and L2 regularizations.

There are mainly two types of regularization techniques, namely Lasso Regression (L1) and Ridge Regression (L2). Both techniques shrink the model coefficients to reduce over-fitting. The major difference is in the penalty term each adds to the loss function.

L1 or Lasso (Least Absolute Shrinkage and Selection Operator):

It adds the absolute value of the coefficient magnitudes as a penalty term in the cost function.
It can shrink some coefficients exactly to zero, so it works well when high-dimensional or sparse data is available.

L2 or Ridge:

It adds the squared magnitude of the coefficients as a penalty term in the cost function.
It shrinks coefficients toward zero without eliminating them, so it is mostly used when we need non-sparse outputs.
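
The difference between the two penalties can be sketched on a deliberately simplified one-coefficient problem. Everything here is an assumption made for illustration: the function names, the weights, the value of `lam`, and the one-dimensional objectives written in the docstrings, for which the minimizers have simple closed forms.

```python
def l1_penalty(weights, lam):
    """Lasso penalty: lam times the sum of absolute coefficient values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge penalty: lam times the sum of squared coefficient values."""
    return lam * sum(w * w for w in weights)


def lasso_1d(w_ols, lam):
    """Minimizer of 0.5 * (w - w_ols)**2 + lam * abs(w) (soft-thresholding)."""
    if w_ols > lam:
        return w_ols - lam
    if w_ols < -lam:
        return w_ols + lam
    return 0.0


def ridge_1d(w_ols, lam):
    """Minimizer of 0.5 * (w - w_ols)**2 + lam * w**2 (proportional shrinkage)."""
    return w_ols / (1.0 + 2.0 * lam)


weights = [3.0, -0.5, 0.2]  # made-up unregularized coefficients
lam = 0.5
lasso_weights = [lasso_1d(w, lam) for w in weights]
ridge_weights = [ridge_1d(w, lam) for w in weights]
print(l1_penalty(weights, lam), l2_penalty(weights, lam))
print(lasso_weights, ridge_weights)
```

In this toy setting the L1 penalty sets the two small coefficients exactly to zero, while the L2 penalty only scales every coefficient down, which matches the sparse versus non-sparse behavior described above.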

How to tackle high variance?

Some ways to tackle high variance include:

Performing regularization.
Reducing the number of features.
Increasing the size of the training data.
Fitting a simpler, less flexible model.

Mean or Median? Which is bigger in left skew?

In a left-skewed (negatively skewed) distribution, the median is bigger than the mean. The long tail of low values pulls the mean down the most, so the mean reflects the skew to the greatest degree. The typical ordering is mean < median < mode: the mean is the smallest and the mode is the largest.
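
A quick check with Python's standard `statistics` module illustrates the usual ordering in a left-skewed sample. The data is made up: most values sit near the top and a single low value forms the left tail.

```python
import statistics

# Made-up left-skewed sample: a cluster of high values and a low-value tail.
data = [2, 6, 7, 8, 8, 9, 9, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
print(mean, median, mode)
```

Here the single low value drags the mean below the median, while the mode stays at the most frequent high value.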

How to do pattern analysis in Machine Learning?

Pattern recognition is the process of recognizing patterns in data by using machine learning algorithms. The core idea in pattern analysis is to classify events based on previously available historical data, statistical information and so on. Techniques and algorithms such as neural networks, Naive Bayes, decision trees, Support Vector Machines and clustering are frequently used in pattern analysis of data.

What is the ROC curve and how is it used?

Read Here

What are the different performance metrics that can be used for classification and regression?

Some important performance metrics are mentioned below:

F1 Score: The F1 score is the harmonic mean of the precision and recall values for a classification problem. It summarizes a model’s quality on a given dataset in a single number and is high only when both precision and recall are high.

F1 Score = 2 * precision * recall / (precision + recall)
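
A minimal sketch of the F1 score as the harmonic mean, 2 * precision * recall / (precision + recall). The function name `f1_score` and the input values are chosen only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced_case = f1_score(0.6, 0.75)  # precision and recall are similar
lopsided_case = f1_score(1.0, 0.1)   # perfect precision, very poor recall
print(round(balanced_case, 4), round(lopsided_case, 4))
```

In the lopsided case the arithmetic mean would be 0.55, but the harmonic mean stays close to the weaker of the two values, which is the reason F1 is preferred over a plain average.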

MSE (Mean Squared Error): The mean squared error describes how close a regression line is to a set of data points. Squaring is used to remove any negative signs, and it also gives more weight to larger errors. It’s called the mean squared error because you’re finding the average of a set of squared errors. The lower the MSE, the better the prediction. Mathematically it can be defined as:

MSE = (1/n) * Σ(actual - predicted)^2

R-squared (coefficient of determination): R-squared is a statistical measure of how close the data are to the fitted regression line. It is the proportion of the variation in the target that the model explains:

R-squared = Explained variation / Total variation

MAE (Mean Absolute Error): MAE is defined as the average magnitude of all the errors in a set of predictions, without regard to their direction. It is a loss function used in regression. Because the errors are not squared, it is less sensitive to outliers than MSE and can be used in cases where the influence of outliers needs to be reduced.
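
The three regression metrics above can be sketched together. The function names and the sample values are illustrative. R-squared is computed here as 1 - (residual variation / total variation), an assumption that matches explained / total variation for a least-squares linear fit with an intercept.

```python
def mse(actual, predicted):
    """Mean squared error: average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination as 1 - residual / total variation."""
    mean_actual = sum(actual) / len(actual)
    residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - residual / total


# Made-up targets and predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]

print(mse(actual, predicted), mae(actual, predicted), r_squared(actual, predicted))
```

Because MSE squares each error, the single error of 1.0 contributes most of its value, whereas MAE weights all errors in proportion to their size.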

So, these were 15 of the Machine Learning interview questions you are most likely to face in your first interaction for an ML job. If you have any additions to these, we would be glad to extend our list.

admin