Statistics is a mathematical discipline that deals with the collection, organization, analysis, interpretation, and presentation of data. It is central to data science: it underpins machine learning algorithms, assessments of data robustness, and informed decision-making. In tech interviews, questions about statistics often evaluate a candidate’s ability to draw relevant insights from data, comprehend probability theory, implement statistical testing, and understand the various distributions and their implications.
Basic Statistical Concepts
- 1.
What is the difference between descriptive and inferential statistics?
Answer: Descriptive statistics summarize and present the features of a given dataset, while inferential statistics use sample data to make estimates or test hypotheses about a larger population.
Descriptive Statistics
Descriptive statistics describe the key aspects or characteristics of a dataset:
- Measures of Central Tendency: Identify central or typical values in the dataset, typically using the mean, median, or mode.
- Measures of Spread or Dispersion: Indicate the variability or spread around the central value, often quantified by the range, standard deviation, or variance.
- Data Distribution: Categorizes the data distribution as normal, skewed, or otherwise and assists in visual representation.
- Shape of Data: Describes whether the data is symmetrical or skewed and the extent of that skewness.
- Correlation: Measures the strength and direction of the relationship between two variables, or indicates that there is none.
- Text Statistics: Summarizes verbal or written data using word frequencies, readability scores, etc.
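The measures listed above can be sketched with Python's standard library. This is a minimal illustration, not a full analysis: the two lists are made-up values, and `pearson` is a helper written here for the example rather than a library function.

```python
import math
import statistics

# Hypothetical data, invented for illustration only.
values = [12, 15, 15, 18, 20, 22, 35]
other = [30, 34, 33, 40, 44, 47, 70]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of spread
data_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(values)      # square root of the sample variance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


corr = pearson(values, other)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} variance={variance:.2f} std_dev={std_dev:.2f}")
print(f"correlation={corr:.3f}")
```

A mean noticeably larger than the median, as in this made-up data, is one informal hint that the values are skewed to the right.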
Inferential Statistics
In contrast, inferential statistics extends findings from a subset of data (the sample) to make inferences about an entire population.
- Hypothesis Testing: Assesses whether the observed data are consistent with an assumed (null) hypothesis, indicating whether a finding could plausibly be due to chance alone.
- Confidence Intervals: Provide a range of plausible values for the true population parameter at a stated confidence level, such as 95%.
- Regression Analysis: Predicts the values of dependent variables using one or more independent variables.
- Probability: Helps measure uncertainty and likelihood, forming the basis for many inferential statistical tools.
- Sampling Techniques: Guides researchers in selecting appropriate samples to generalize findings to a wider population.
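As a rough sketch of two of the tools above, the following computes a confidence interval for a mean and a two-sided p-value for a test of that mean. It assumes a normal approximation (a z-interval and z-test) so that it runs on the standard library alone; for a sample this small a t-based method would usually be preferred. The sample values, the null value of 10, and the function names are all invented for illustration.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Critical value, e.g. about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


def z_test_p_value(sample, null_mean):
    """Two-sided p-value for H0: population mean == null_mean (z approximation)."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z_stat = (statistics.mean(sample) - null_mean) / std_err
    return 2 * (1 - statistics.NormalDist().cdf(abs(z_stat)))


# Hypothetical measurements drawn from some larger population
sample = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.9, 10.6, 10.3, 9.7]

low, high = mean_confidence_interval(sample, confidence=0.95)
p_value = z_test_p_value(sample, null_mean=10)

print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
print(f"p-value for H0: mean == 10: {p_value:.3f}")
```

Raising the confidence level (for example to 0.99) widens the interval, which reflects the trade-off between confidence and precision.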
Visual Representation
Descriptive statistics are often visually presented through:
- Histograms
- Box plots
- Bar charts
- Scatter plots
- Pie charts
Inferential statistics might lead to more abstract visualizations like:
- Confidence interval plots
- Probability distributions
- Forest plots
- Receiver operating characteristic (ROC) curves
Code Example: Descriptive vs. Inferential Stats
Here is the Python code:
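    import pandas as pd
    from scipy import stats

    # Load example data ('example_data.csv' is a placeholder file name)
    data = pd.read_csv('example_data.csv')

    # Descriptive statistics: summarize the dataset itself
    print(data.describe())

    # Inferential statistics: draw a random sample of 30 rows and test
    # whether the population mean equals 10.
    # 'value' is a hypothetical numeric column; the file needs at least 30 rows.
    sample = data['value'].sample(30, random_state=0)
    t_stat, p_val = stats.ttest_1samp(sample, popmean=10)
    print(f'T-statistic: {t_stat:.3f}, p-value: {p_val:.3f}')
- 2.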
Define and distinguish between population and sample in statistics.
Answer:
- 3.
Explain what a “distribution” is in statistics, and give examples of common distributions.
Answer:
- 4.
What is the Central Limit Theorem and why is it important in statistics?
Answer:
- 5.
Describe what a p-value is and what it signifies about the statistical significance of a result.
Answer:
- 6.
What does the term “statistical power” refer to?
Answer:
- 7.
Explain the concepts of Type I and Type II errors in hypothesis testing.
Answer:
- 8.
What is the significance level in a hypothesis test and how is it chosen?
Answer:
- 9.
Define confidence interval and its importance in statistics.
Answer:
- 10.
What is a null hypothesis and an alternative hypothesis?
Answer:
Probability Theory and Probability Distributions
- 11.
What is Bayes’ Theorem, and how is it used in statistics?
Answer:
- 12.
Describe the difference between discrete and continuous probability distributions.
Answer:
- 13.
Explain the properties of a Normal distribution.
Answer:
- 14.
What is the Law of Large Numbers, and how does it relate to statistics?
Answer:
- 15.
What is the role of the Binomial distribution in statistics?
Answer: