Here's a definition and an example for each of the terms you've listed:
Statistical Question
A statistical question is one that anticipates variability in the data and can be answered by collecting data.
- Example: "What is the average height of students in a high school?" This question implies that heights will vary among students.
Biased vs Unbiased
- Biased: A biased sample is one that is not representative of the population from which it is drawn. It leads to systematic errors, with some groups consistently over- or under-represented.
  - Example: Surveying only college students about their study habits to draw conclusions about the study habits of all adults.
- Unbiased: An unbiased sample accurately reflects the population and avoids systematic errors, for example by giving every member of the population an equal chance of being selected.
  - Example: Randomly selecting adults from the whole adult population to survey about their study habits.
Probabilistic vs Deterministic Interpretation
- Probabilistic Interpretation: Involves uncertainty and variability; outcomes are random and are described using probabilities.
  - Example: A weather forecast of a 50% chance of rain tomorrow; the outcome cannot be known in advance, only its likelihood.
- Deterministic Interpretation: Involves certainty; a given input always produces the same, predictable outcome with no variability.
  - Example: If you drop a ball from a known height, you can calculate exactly how long it will take to hit the ground, assuming no air resistance.
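To make the contrast concrete, here is a minimal Python sketch. The free-fall time uses the standard formula t = √(2h/g) with g ≈ 9.81 m/s², so the same height always gives the same answer. The rain part simulates days that each rain with probability 0.5, so the count varies from run to run unless the random seed is fixed. The function names and the seed are illustrative choices, not part of the original text.

```python
import math
import random

G = 9.81  # gravitational acceleration near Earth's surface, in m/s^2


def fall_time(height_m: float) -> float:
    """Deterministic: the same height always yields the same time (no air resistance)."""
    return math.sqrt(2 * height_m / G)


def simulate_rain_days(p_rain: float, days: int, seed: int = 0) -> int:
    """Probabilistic: each simulated day rains with probability p_rain."""
    rng = random.Random(seed)  # fixed seed only so the demo is reproducible
    return sum(rng.random() < p_rain for _ in range(days))


print(round(fall_time(20.0), 2))       # 2.02 seconds, every time
print(simulate_rain_days(0.5, 1000))   # roughly 500, but varies with the seed
```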
Extrapolation Using a Scatter Plot
Extrapolation involves estimating values beyond the range of the observed data by assuming the observed trend continues. The further an estimate lies from the data, the less reliable it is.
- Example: If you have data on the sales of a product over the last five years plotted on a scatter plot and you see a positive trend, you can extend that trend line to predict sales for the next few years.
Making Inferences from a Scatter Plot
Making inferences from a scatter plot involves drawing conclusions about the relationship or correlation between two variables depicted on the plot.
- Example: Observing that as study time increases, grades tend to increase, suggesting a positive correlation.
Making Predictions from a Scatter Plot
Predictions are made by using the trend line (or line of best fit) to estimate values for new observations based on established data.
- Example: If a scatter plot shows the relationship between hours studied (X-axis) and exam scores (Y-axis), you could use the trend line to predict the score for a student who studies for 10 hours.
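A minimal sketch of that prediction, assuming a hypothetical trend line score = 4.5 × hours + 40 (the slope and intercept are made up for illustration, not taken from real data):

```python
# Hypothetical trend line read off a scatter plot of hours studied (x)
# versus exam score (y). These constants are illustrative assumptions.
SLOPE = 4.5
INTERCEPT = 40.0


def predict_score(hours: float) -> float:
    """Plug a new x-value into the trend-line equation y = mx + b."""
    return SLOPE * hours + INTERCEPT


print(predict_score(10))  # 85.0
```

If 10 hours falls within the range of the plotted data, this is interpolation; if it lies outside that range, it is extrapolation and deserves more caution.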
Qualitative Variable
Qualitative variables are non-numeric categories or attributes used to represent characteristics or traits.
- Example: Colors (red, blue, green), types of cuisine (Italian, Mexican, Chinese).
Quantitative Variable
Quantitative variables are numeric and can be measured or counted.
- Example: Height (170 cm), weight (75 kg), age (30 years).
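The distinction determines which summaries make sense. A small sketch with hypothetical survey responses: counts and the most common category work for a qualitative variable, while an average requires a quantitative one.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses from five people.
favorite_cuisine = ["Italian", "Mexican", "Italian", "Chinese", "Italian"]  # qualitative
ages = [30, 25, 41, 35, 29]  # quantitative, in years

# For categories, count how often each one occurs (the mode is the most common).
print(Counter(favorite_cuisine).most_common(1))

# Arithmetic such as a mean is only meaningful for numeric measurements.
print(mean(ages))
```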
Trend Line of Best Fit
A trend line of best fit (or regression line) is a straight line that best represents the data on a scatter plot, indicating the general direction of the data points.
- Example: A line drawn through a scatter plot of home prices vs. square footage. The least-squares regression line is the one that minimizes the sum of the squared vertical distances (residuals) between the data points and the line.
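For reference, these are the standard least-squares formulas for a line ŷ = mx + b fitted to n points (xᵢ, yᵢ), where x̄ and ȳ are the means of the x- and y-values:

```latex
\min_{m,\,b} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2
\quad\Longrightarrow\quad
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
```

The intercept formula means the fitted line always passes through the point (x̄, ȳ).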
Negative vs Positive Association
- Negative Association: As one variable increases, the other variable tends to decrease.
  - Example: The relationship between hours spent playing video games and grades in school, where increased gaming correlates with lower grades.
- Positive Association: As one variable increases, the other variable also tends to increase.
  - Example: The relationship between the number of hours studied and test scores.
No Association
No association indicates that there is no discernible relationship between two variables; changes in one variable do not predict changes in the other.
- Example: Adults' shoe sizes and their scores on a history test; a scatter plot of these two variables would show no clear pattern.
Linear vs Nonlinear Relationship
- Linear Relationship: The relationship between two variables can be described with a straight line, indicating a constant rate of change.
  - Example: The relationship between temperature in Celsius and Fahrenheit (F = 9/5 × C + 32): every 1 °C increase adds exactly 1.8 °F.
- Nonlinear Relationship: The rate of change between the variables is not constant, so the relationship cannot be described with a straight line; the points follow a curve.
  - Example: The relationship between a car's speed and its fuel efficiency: efficiency typically improves up to a moderate speed and then declines, so the points curve rather than fall along a straight line.
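For data with equally spaced x-values, one quick check for linearity is whether the change in y between consecutive points stays constant. The sketch below compares the Celsius-to-Fahrenheit conversion with y = x², a hypothetical stand-in for a nonlinear relationship:

```python
def first_differences(values):
    """Change in y between consecutive, equally spaced x-values."""
    return [b - a for a, b in zip(values, values[1:])]


celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]  # F = (9/5)C + 32, linear
squares = [c ** 2 for c in celsius]              # y = x^2, illustrative nonlinear case

print(first_differences(fahrenheit))  # [18.0, 18.0, 18.0, 18.0]: constant, so linear
print(first_differences(squares))     # [100, 300, 500, 700]: changing, so nonlinear
```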
Bivariate Data
Bivariate data involves two variables that can be analyzed to determine relationships between them.
- Example: The height and weight of a group of individuals.
Cluster
A cluster refers to a group of data points in a scatter plot that are closely packed together, indicating a concentration of values in that area.
- Example: In a scatter plot of test scores versus hours studied, one cluster might represent students who studied between 5 and 10 hours.
Outlier
An outlier is a data point that is significantly different from other observations, often appearing distant from the main group of data in a plot.
- Example: In a scatter plot of students' heights and weights, a very tall or extremely short person may appear as an outlier if they do not fit the general pattern of the other data points.
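The text does not specify a rule for deciding what counts as "significantly different". One common rule of thumb, swapped in here as an assumption, is Tukey's 1.5 × IQR fences: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged. The heights below are hypothetical, and `statistics.quantiles` uses its default "exclusive" method, so other software may compute slightly different quartiles.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Flag values outside the fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical student heights in cm; 211 is the unusually tall student.
heights_cm = [158, 162, 165, 167, 168, 170, 172, 175, 178, 211]
print(iqr_outliers(heights_cm))  # [211]
```

This checks one variable at a time. In a scatter plot, a point can also be an outlier because it lies far from the overall pattern even when each coordinate is typical on its own, such as a person who is tall but unusually light.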