You have read the Top 10 Data Analytics/Data Science Interview Questions in our earlier post. Here are the next 10 most important Data Analytics/Data Science Interview Questions, with brief, to-the-point answers.
1. What do you mean by A/B testing?
A/B testing is a statistical hypothesis testing technique for randomized experiments that compare two variants, A and B, of a single variable (for example, two versions of a web page). It lets a business measure whether a proposed change actually improves the outcome of a business process before adopting it. It is also used to find the best promotional or marketing strategy for a business unit.
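As a rough illustration, one common way to analyse an A/B test on conversion rates is a two-proportion z-test. The sketch below uses only the Python standard library; the function name and the visitor counts are hypothetical, and the normal approximation it relies on assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 120 of 1000 visitors converted on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be due to chance alone; the threshold (often 0.05) is a choice the analyst makes before running the experiment.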
2. Explain Hypothesis Testing
In hypothesis testing, the analyst tests an assumption about a population parameter using sample data. It is vital for drawing conclusions about a population from a sample. There are generally two hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis is often an initial claim based on previous analyses or specialized knowledge, and the alternative hypothesis is what you believe to be true or hope to prove true.
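To make the null/alternative idea concrete, here is a minimal sketch of a one-sample z-test. It assumes the population standard deviation is known (an assumption made only to keep the example short), and the sample values, `mu0` and `sigma` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: population mean == mu0 against H1: population mean != mu0."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_null = p_value < alpha
    return z, p_value, reject_null


# Hypothetical sample; H0 claims the population mean is 50, with sigma = 4.
sample = [52, 55, 49, 58, 54, 56, 53, 57]
z, p_value, reject_null = one_sample_z_test(sample, mu0=50, sigma=4)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

When the population standard deviation is not known, a t-test is the usual substitute.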
3. What is the difference between Underfitting and Overfitting?
Underfitting and overfitting are indicators of poor machine learning model performance. Both are very common in the process of machine learning.
Overfitting takes place when a model has been trained to fit the training data too closely, noise included. This keeps the model from performing well when it is given new data. The analyst can identify it when the model appears highly accurate on the training data but performs poorly in prediction.
Underfitting takes place when a model has not been trained sufficiently. This could be due to short training time, too few parameters or even a lack of data. An underfit model will perform poorly on the training data as well as on new data.
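The contrast can be sketched with a toy k-nearest-neighbours regressor, chosen here only because it fits in a few lines of standard-library Python (it is an illustrative choice, not something this post prescribes). With `k=1` the model memorises the training set, and with `k` equal to the training set size it always predicts the overall mean. The data is synthetic and the seed is arbitrary.

```python
import math
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def make_data(n):
    """Synthetic data: a noisy sine wave."""
    points = []
    for _ in range(n):
        x = random.uniform(0, 2 * math.pi)
        points.append((x, math.sin(x) + random.gauss(0, 0.3)))
    return points


random.seed(0)
train, test = make_data(60), make_data(60)

# k=1 memorises the training data, k=len(train) ignores x entirely.
for k in (1, 7, len(train)):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  "
          f"test MSE={mse(train, test, k):.3f}")
```

The `k=1` model has zero training error but a non-zero error on new data (the overfitting signature), while the `k=len(train)` model has a high error on both sets (the underfitting signature).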
4. What is cluster sampling and Systematic sampling?
These methods are used to draw a sample from a population. In cluster sampling, the population is divided into clusters (groups), some clusters are selected at random, and the units in the selected clusters form the sample. In systematic sampling, the population is ordered and units are taken at a fixed interval, for example every k-th record after a random starting point.
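A minimal sketch of both methods, assuming the common one-stage form of cluster sampling in which whole clusters are drawn at random and every member of a drawn cluster is kept. The function names and the store data are hypothetical.

```python
import random


def systematic_sample(population, step, start=None):
    """Take every `step`-th element, beginning at a random (or given) start."""
    if start is None:
        start = random.randrange(step)
    return population[start::step]


def cluster_sample(clusters, n_clusters):
    """Randomly choose whole clusters and keep every member of the chosen ones."""
    chosen = random.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Systematic sampling: every 5th record, starting from index 2.
records = list(range(20))
print(systematic_sample(records, step=5, start=2))  # [2, 7, 12, 17]

# Cluster sampling: customers grouped by (hypothetical) store, two stores drawn.
stores = {"north": [1, 2], "south": [3, 4], "east": [5], "west": [6, 7]}
print(cluster_sample(stores, n_clusters=2))
```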
5. What do you understand by univariate, bivariate and multivariate analysis?
Univariate analysis techniques involve a single variable at a time. For example, pie charts, histograms and bar charts can be used to describe univariate data. Some examples of univariate data are age, height and weight.
Bivariate analysis is used when a statistical technique compares two variables. Scatterplots and regression analysis can be used in bivariate analysis.
Multivariate analysis is a more complex form of statistical analysis, used when there are more than two variables in the dataset. Principal component analysis (PCA) can be used in multivariate analysis.
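As a small illustration of the first two cases, the sketch below summarises one variable on its own (univariate) and then measures how two variables move together with a Pearson correlation (bivariate). The height and weight values are made up, and `pearson_r` is a hand-written helper rather than a library call. A multivariate technique such as PCA needs linear algebra support and is left out of this sketch.

```python
from math import sqrt
from statistics import mean, median, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical measurements.
heights_cm = [150, 160, 165, 172, 180, 185]
weights_kg = [52, 58, 63, 70, 80, 84]

# Univariate: describe one variable on its own.
print("mean:", mean(heights_cm), "median:", median(heights_cm),
      "stdev:", round(stdev(heights_cm), 2))

# Bivariate: describe the relationship between two variables.
print("correlation:", round(pearson_r(heights_cm, weights_kg), 3))
```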
6. What is sensitivity and how to calculate it in any statistical project?
Sensitivity or Recall or True positive rate = (True Positive)/(True Positive + False Negative).
The higher the sensitivity, the more true positives and the fewer false negatives the model produces. The lower the sensitivity, the fewer true positives and the more false negatives. For example, in financial analysis techniques, models with high sensitivity are often preferred over others.
Sensitivity is plotted against the false positive rate to draw the ROC curve, and it is a vital metric for checking the efficacy of a machine learning model.
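The formula above can be computed directly from the actual and predicted labels. This is a minimal sketch; the function name and the label lists are hypothetical, and `1` is assumed to mark the positive class.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (recall, true positive rate) = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos + false_neg == 0:
        raise ValueError("y_true contains no positive cases")
    return true_pos / (true_pos + false_neg)


# Hypothetical labels: 5 actual positives, of which 3 are predicted correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
print(sensitivity(y_true, y_pred))  # 0.6
```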
7. What are the steps in any data analytics project?
The life cycle of an analytics project generally consists of the following steps:
Business problem understanding
> Involves understanding of business requirements and expectations which are used to model the data.
Data understanding and preparation
> Concerns understanding the first-hand data and preparing it for exploratory analysis. The analyst can check for omitted values, null values, duplicate data, etc. to make sense of the data available.
Data Modeling and Exploratory Analysis
> This phase is used for building models and testing them against the data to get optimal results as per the business requirements. Several statistical modeling methods can be tried to determine the best model for the data, for example linear regression or decision trees.
> It also checks whether the data is correct or needs further cleansing to deliver the information the business requires.
Data Visualization and reports generation
> Visualize the findings of the analysis process carried out in the previous steps.
8. Why is data cleaning important / what is the use of data cleaning in analysis of data?
Data cleaning is helpful in data analysis because:
> It increases the accuracy of the model.
> It helps in getting a standard format for the data, which is useful when the data comes in real time or from different sources.
> It helps any analyst or data scientist understand the data.
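A minimal sketch of what such cleaning can look like in practice: rows with missing values are dropped, text and numeric fields are brought to one standard format, and duplicates are removed. The record layout (`name`, `age`) and the sample rows are hypothetical.

```python
def clean_records(records):
    """Drop incomplete rows and duplicates, and standardise field formats."""
    cleaned, seen = [], set()
    for row in records:
        name, age = row.get("name"), row.get("age")
        if name is None or age is None:  # skip rows with missing values
            continue
        name = name.strip().title()      # one standard text format
        age = int(age)                   # one standard numeric type
        key = (name, age)
        if key in seen:                  # skip duplicates
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical raw data merged from different sources.
raw = [
    {"name": " alice ", "age": "30"},
    {"name": "Alice", "age": 30},     # duplicate once formats are standardised
    {"name": "bob", "age": None},     # missing value
    {"name": "CAROL", "age": "41"},
]
print(clean_records(raw))
# [{'name': 'Alice', 'age': 30}, {'name': 'Carol', 'age': 41}]
```

Note that the duplicate is only detected after standardising the format, which is one reason the cleaning steps are usually applied together.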
9. What is Data mining?
Data mining is the process of extracting information from large amounts of data. The information returned to the analyst takes the form of knowledge. It is also used to analyse the patterns present in data. Some of the techniques used in data mining are cluster analysis and regression analysis. Some applications of data mining are stock market analysis and insurance.
10. What is principal component analysis in Machine Learning?
Read all about this question in detail here
So these are the top 20 Data Analytics/Data Science Interview Questions. We hope that you got your queries answered. Do let us know any other questions on this topic that have been bothering you. We would be glad to help.
If you have some other Data Analytics/Data Science Interview Questions and their answers that have not been mentioned in these two posts, do share them for our and our readers’ benefit.