
QMS 102 Homework Help

Chapter 1

Quantitative data are data values that can be expressed numerically. These values can be either continuous or discrete.
Discrete variables have numerical values that arise from a counting process. Ex.: the number of premium channels subscribed to.
Continuous variables produce numerical responses that arise from a measuring process. Ex.: the time you wait at a bank for a teller.
Quantitative continuous data are data values that fall within some reasonable range. Ex.: "How much time (in hours) do you usually spend studying per week?"
Quantitative discrete data are data in whole numbers.
Qualitative data consist of data values that describe the characteristics or features of an item; they are non-numerical in nature. Ex.: rate a teacher: good, bad, neutral.
Nominal data have no particular order or ranking in their categories. They may be numerical or non-numerical, but are still qualitative. Ex.: Postal Code: L3S 1B7, N5X 0M3; or Sport: Tennis, Golf, Soccer.
Ordinal data are similar to nominal data but have a natural order. Ex.: ranking, grades, rating, medals.
Interval data are quantitative data that are numeric and are also distinguished by having units of measurement. They can be either discrete or continuous. Ex.: weather: 0°C, 11°C; or calendar scale: the difference between Feb 2 and Feb 8.
Ratio data have the characteristics of interval data, but the "0" value does mean the absence of the characteristic being measured (0 = nothing). They can be continuous or discrete. Ex.: discrete: number of vacations taken within 10 years; continuous

QMS 102 Recap

Calculator: Casio fx-9750GII
• Make sure that the batteries are loaded
• Press the AC/ON key to turn on the calculator
• Press the MENU key and then use the cursor key to highlight the STAT icon
• Then press the EXE key, which acts like the Return key on a computer
• Enter the numbers 15, 24, 20, 35 & 60 under List 1

Statistics
• Variable – name chosen to describe the data collected
• Population – includes all the items or persons in your study or research
• Census – a set of data that includes all members of a population
• Sample – a subset of a population
• Sample size – the number of items/persons in a sample; the sample size is denoted as n
• Population size – the number of items/persons in a population; the population size is denoted as N

Statistics
• Qualitative data can be classified into two scales: nominal & ordinal
• Quantitative data can be classified into two scales: interval & ratio
  – Nominal data has no particular order or ranking
  – Ordinal data has a natural order to the categories
  – Interval data have units of measurement
  – Ratio data has a zero indicating the absence of the characteristic being measured
  – Downgrading a scale means transforming a higher-level scale to a lower-level scale

Stem and leaf
• Stem-and-leaf plots
  – Summarise the distribution (shape) of data
  – Retain the values of the data
  – Stems are the numbers on the left (8 in 84)
  – Leaves are the numbers on the right (4 in 84)
  – Appearance is like a horizontal bar chart
  – Usually the chart has a header to indicate what the data is about
  – Below is a header indicating how much each stem is worth

Stem-and-Leaf
• Objective: minimise perception biases and have an aesthetic appearance
• Stems:
  – Between 6 and 13 stems
  – Consecutive or repeated numbers
  – Indicate stem units (or they will be at face value)
  – At least one leaf associated with the first and last stem respectively
  – Some intermediate stem values may not have a leaf

Stem-and-Leaf
• Leaves:
  – Next single digit after the stems
  – When stems are repeated twice, the leaf values for the first repetition are 0 to 4; for the second repetition they are 5 to 9
  – When stems are repeated five times, the leaf values are 0 & 1, 2 & 3, 4 & 5, 6 & 7, 8 & 9
  – Order is reversed for negative stems
  – No rounding of leaf values
  – Leaf values written in ascending order when positive, descending order when negative
  – Even spacing, no punctuation between leaves

Stem-and-leaf plot
• A sample of 18 students provide their age in years: 16, 19, 22, 17, 19, 25, 17, 20, 27, 18, 20, 32, 18, 21, 38, 18, 22, 42
• Sort the ages from lowest to highest: 16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42
• Draw a stem-and-leaf plot by using the tens digit as the stems and the units digits as the leaves

    1  67788899
    2  0012257
    3  28
    4  2

• Based on the guidelines discussed earlier, what is wrong with this?

Stem-and-leaf plot
• There should be between 6 and 13 stems
• We can either split the leaves associated with a stem into two (0 – 4 & 5 – 9) groups or five (0&1, 2&3, 4&5, 6&7, 8&9) groups

    1  67788899
    2  00122
    2  57
    3  2
    3  8
    4  2

• Looking at it, what conclusions can we draw?

Calculator
• How do we sort data using the calculator? In STAT mode enter 16, 19, 22, 17, 19, 25, 17, 20, 27, being sure to press EXE after every number
• Press F6 to go to the next screen so you see the option TOOL above F1
• Press F1 to select TOOL and then press F1 again to select SRT-A or SORT
• Input 1 when it asks how many lists and input the list number when it asks which list
• Press EXE to get 16 17 17 19 19 20 22 25 27

Back to Stem-and-leaf Plot
9.8, 9.49, 8.71, 8.71, 6.87, 6.8, 6.04, 2.19, 0.9, 0.79
• First, sort the data to get the minimum (0.79) and maximum (9.8) values
• Next, count the number of stems that will result if we use the hundredths as our leaves (the lowest stem is 0.7 and the stems increase by tenths to 9.8, resulting in 92 stems)
  – Use the tenths as the leaves, with no rounding

Back to Stem-and-leaf Plot
9.8, 9.49, 8.71, 8.71, 6.87, 6.8, 6.04, 2.19, 0.9, 0.79

    0  79
    1
    2  1
    3
    4
    5
    6  088
    7
    8  77
    9  48

Back to Stem-and-leaf Plot
0.0, -3.12, -1.12, -2.51, -1.76, -1.36, 1.38, -0.71, -0.3, 1.65
• Maximum is 1.65, minimum is -3.12
• Using hundredths as our leaves will give too many stems
• Use tenths as the leaves; use stems -0 and 0 for values between -1 and 1

    -3  1
    -2  5
    -1  731
    -0  73
     0  0
     1  36

Frequency Distributions
• Used for quantitative data to indicate the distribution by showing how many data points are in each class interval
  – Aim for five to ten classes
  – For the upper limit use the words "and under"
    • "10 and under 15" includes 10, 11, 12, 13, 14, NOT 15
    • The same can be "10 to 14"
  – Close-ended classes have lower & upper boundaries
  – Open-ended classes are missing ONE of the boundaries
  – Class width is the difference between lower & upper boundaries

Frequency Distributions
• Classes should not overlap, and there should be no gaps between them
• All classes must have the same width
• The lowest and highest classes cannot be empty
• Boundaries must look like the data (same number of digits)
• Boundaries must be multiples of class width

Class Widths
• Find the difference between the minimum and the maximum
• Divide by the minimum number of classes
• This will yield the estimated class width
  – Magic numbers 1, 2, 2.5 (if there are decimals), 5
  – Multiply each by 10 and multiples of 10
  – Divide by 10 and multiples of 10
  – Identify the two numbers between which your estimated class width lies
  – Use either of those two to construct your class width

Class Width
    1, 2, 2.5, 5
    10, 20, 25, 50
    100, 200, 250, 500

• Max is 1093, min is 684: Est CW = (1093 – 684)/5 = 81.8, which lies between 50 and 100
• Max is 53.9, min is 22.5: Est CW = (53.9 – 22.5)/5 = 6.28, which lies between 5 and 10

Class Width
• For the first class pick the multiple of the class width which is just below the minimum
• Then keep adding the class width to get successive upper boundaries
• Stop when your upper boundary is just above the maximum

Cumulative Distributions & Ogive
• An ogive is a line graph of a cumulative frequency distribution
  – Starts from zero in the bottom left-hand corner
  – Finishes at 100% in the top right-hand corner
• Cumulative frequency is the sum of the current frequency and all previous frequencies
  – It is easier to understand by using percentages

Cumulative Distributions followed by Ogives

    Salary ('000s)    Comp 1   Cumu 1   Comp 2   Cumu 2
    20 & under 25       15       15       11       11
    25 & under 30       26       41       26       37
    30 & under 35       28       69       35       72
    35 & under 40       18       87       16       88
    40 & under 45        9       96        9       97
    45 & under 50        4      100        2      100
    TOTAL              100               100

[Figure: ogives of Cumu 1 and Cumu 2, cumulative percentage axis from 0 to 100]

Ogives and Percentiles
• A percentile is the value below which a certain percent of the observations fall
  – The kth percentile is the value such that at most k% of the data is lower than the value and at most (100 – k)% of the data is higher than the value

Percentiles
• Arrange the data into an ascending data array
• Calculate the rank of the kth percentile
  – r = (n*k/100 + ½), rounded to the nearest half, where n is the number of observations in the dataset and k is the percent of observations less than or equal to the needed percentile
  – Round .25 and .75 down if k < 50
  – Round .25 and .75 up if k > 50
  – If k = 50 it is the rank for the median
• Quartiles are the 25th (1st), 50th (2nd) [also called the median] and 75th (3rd) percentiles

Figuring out quartiles (percentiles)
Data: 4 6 8 10 16
• 25th percentile: rank = (5 * 25/100) + ½ = 1.75, rounded down to 1.5. 1.5 is between 1 and 2, so take the average of the first two values = (4 + 6) / 2 = 5
• 50th percentile: rank = (5 * 50/100) + ½ = 3. The value at the 3rd rank is 8
• 75th percentile: rank = (5 * 75/100) + ½ = 4.25, rounded up to 4.5. 4.5 is between 4 and 5, so take the average of the last two values = (10 + 16) / 2 = 13

[Figure: volume in 2000 ml bottle, axis from 1999.5 to 2003.5]

Measure of Central Tendency
• A single value to represent the dataset
• Mean, median & mode
  – Mode is the value that appears most frequently
  – Median is the middle value in a dataset that is arranged in order
  – Mean is usually the arithmetic mean, or "average" in lay terms

Mean, Median & Mode
• Nominal – only mode can be found
• Ordinal – mode and median can be found
• Interval & Ratio – mode, median and mean can be found

Mean
• x̄ = Σx / n (sample); µ = Σx / N (population)
• Add all the numbers in the dataset and divide by the number (count) of numbers
  – Sample is x̄ (English for sample statistic)
  – Population is µ (Greek for population parameter)

Grouped Data Mean
• If a frequency table is provided
  – Find the midpoint of each class
  – Multiply the midpoint by the frequency (occurrence)
  – Add all the products and divide by the total frequency

    Price of Parking     No. of Cars
    $2 and under $4          20
    $4 and under $6          37
    $6 and under $10         29
    $10 and under $15        15
    $15 and under $20         8

Mean = $[(3*20)+(5*37)+(8*29)+(12.5*15)+(17.5*8)] / (20+37+29+15+8) = $7.38

Using the Calculator
• Power on the calculator with AC/ON
• Press MENU and select the STAT mode by using the EXE key
• Enter the numbers 3, 5, 8, 12.5 & 17.5, pressing the EXE key after each
• Press the key under CALC
• Press the key under 1VAR to get the statistics for the list of numbers

More of Using the Calculator
• Enter the frequencies under List 2, making sure that each frequency is on the same line as the relevant number in List 1

    List 1   List 2
    3        20
    5        37
    8        29
    12.5     15
    17.5      8

• Press the button under the word CALC
• Press the button under the word SET
• Press the button under the word LIST
• Specify that 1Var XList is 1
• Specify that 1Var Freq is 2
• Press EXIT and then press the button under 1VAR to get the grouped data mean

Mean with Relative Frequencies

    Price of Parking     No. of Cars   Relative Frequency
    $2 and under $4          20             0.18
    $4 and under $6          37             0.34
    $6 and under $10         29             0.27
    $10 and under $15        15             0.14
    $15 and under $20         8             0.07
    TOTAL                   109             1

• Multiply the mid-point of each class by its relative frequency = (3*0.18)+(5*0.34)+(8*0.27)+(12.5*0.14)+(17.5*0.07) = 7.38
• Also called Expected Value

Median
• Median is the middle number when the data is arranged in order (for an odd number of numbers)
• Use the formula i = (n+1)/2 to find the position of the median
• In case the formula results in x.5, use the average of the number positioned x and the number positioned x+1
  – In other words, when there is an even count of numbers use the average of the two in the middle

Mode
• The mode is the value that appears most frequently
  – There may not be any mode in case each number appears an equal number of times in the data
  – There may be multiple modes in case the most frequently occurring number appears as frequently as another number

Measures of Variability
• A single number to describe how spread out the data is
  – Range R = Maximum – Minimum
  – Interquartile Range IQR = 75th percentile – 25th percentile
  – Variance is σ² = Σ (x – µ)² / N
  – Standard deviation is the square root of variance
  – Sample variance is s² = Σ (x – x̄)² / (n – 1)
  – Sample standard deviation is the square root of sample variance

Standard deviation
Data: 1, 2, 3, 4, 5. Mean is 3

    Observation   Difference    Squared diff
    1             1 – 3 = -2    -2*-2 = 4
    2             2 – 3 = -1    -1*-1 = 1
    3             3 – 3 = 0     0
    4             4 – 3 = 1     1*1 = 1
    5             5 – 3 = 2     2*2 = 4

Sum of squares = 4+1+0+1+4 = 10
Variance = 10/5 = 2
Standard deviation = √2 = 1.414

Coefficient of Variation
• Coefficient of variation is a relative measure of variability
  – It is the standard deviation divided by the mean, multiplied by 100
  – Expressed as a percentage
  – (σ/µ * 100)% for population
  – (s/x̄ * 100)% for sample
  – Previous example is (1.414/3)*100 = 47.13%

Shape of Data
• Distribution of data values along the x-axis
  – Symmetrical means the data above the mean are distributed exactly the same as the data below the mean (mirror image)
  – Skewed is the opposite
• Mean < Median is negative or left skewed
• Mean > Median is positive or right skewed
• Mean = Median is symmetrical or zero skewness

                          Left-Skewed      Symmetric        Right-Skewed
    Mean vs. Median       Mean < Median    Mean = Median    Mean > Median
    Skewness Statistic    < 0              0                > 0

Box-Whisker Plot
• You need five numbers:
  – (i) Minimum (ii) First quartile (iii) Median (iv) Third quartile (v) Maximum
  – Also known as the five-number summary
• Use the first quartile and third quartile as the ends of the box
  – The median is drawn as a line inside the box
  – Connect the first quartile and third quartile with top and bottom lines to form the box
• Finally connect the minimum to the first quartile with a line & connect the maximum to the third quartile with a line, giving you the whiskers
• Indicate where the mean is with a + sign

Box-Whisker Plot
• Draw the fences as follows
  – IQR is Inter Quartile Range = Third quartile – First quartile
  – Right Inner Fence = Third quartile + (1.5 x IQR)
  – Right Outer Fence = Right Inner Fence + (1.5 x IQR)
  – Left Inner Fence = First Quartile – (1.5 x IQR)
  – Left Outer Fence = Left Inner Fence – (1.5 x IQR)
  – Suspect outliers are between the inner and outer fences & are plotted with an o
  – Outliers are beyond the outer fences and are plotted with an *

[Figure: box-whisker plots marking Xsmallest, Q1, Median (Q2), Q3 and Xlargest for left-skewed, symmetric and right-skewed data]

Scatter Plots and Correlation
[Figure: scatter plots of Y against X for r = -1, r = -.6, r = 0, r = +.3 and r = +1]

Descriptive Statistics, Probability, Inferential Statistics
• Probability is the numeric value representing the chance, likelihood or possibility that a particular event will occur
  – Varies from 0 (impossible) to 1 (certain)
  – Outcome of a random process is called an event
  – Collection of all possible events is called a sample space

Types of Probability
• Probability = No. of favourable outcomes ÷ Total No. of outcomes
• IN THE LONG RUN
• A priori (from earlier) – based on prior knowledge
• Empirical – based on observed data
• Subjective – based on individual opinions

Example

                             Plan to Purchase
                             Yes     No      TOTAL
    Actually Purchase Yes    200     100      300
    Actually Purchase No      50     650      700
    TOTAL                    250     750     1000

(i) Probability of Actual Purchase given that there was a Plan to Purchase = 200/250
(ii) Probability of Plan to Purchase = 250/1000
(iii) Probability of Actual Purchase = 300/1000
(iv) Probability of No Purchase given that there was a Plan to Purchase = 50/250
(v) Probability of Actual Purchase given that there was No Plan to Purchase = 100/750

More Definitions
• Mutually exclusive events are events that cannot happen simultaneously
  – Heads and Tails in a toss of a coin are mutually exclusive
  – 1 and 2 in a roll of a die are mutually exclusive
• Collectively exhaustive events are the complete set of events that can occur
  – Heads and Tails are collectively exhaustive
  – 1 and 2 are not collectively exhaustive
  – 1, 2, 3, 4, 5, 6 are collectively exhaustive in a roll of a die

Computing Joint and Marginal Probabilities
• The probability of a joint event, A and B:
  P(A and B) = (number of outcomes satisfying A and B) ÷ (total number of elementary outcomes)
• Computing a marginal (or simple) probability:
  P(A) = P(A and B1) + P(A and B2) + … + P(A and Bk)
• Where B1, B2, …, Bk are k mutually exclusive and collectively exhaustive events

Marginal & Joint Probabilities In A Contingency Table

    Event    B1              B2              Total
    A1       P(A1 and B1)    P(A1 and B2)    P(A1)
    A2       P(A2 and B1)    P(A2 and B2)    P(A2)
    Total    P(B1)           P(B2)           1

The inner cells are the joint probabilities; the row and column totals are the marginal (simple) probabilities.

Computing Conditional Probabilities
• A conditional probability is the probability of one event given that another event has occurred:
  P(A | B) = P(A and B) ÷ P(B), the conditional probability of A given that B has occurred
  P(B | A) = P(A and B) ÷ P(A), the conditional probability of B given that A has occurred
  Where P(A and B) = joint probability of A and B, P(A) = marginal or simple probability of A, P(B) = marginal or simple probability of B

Bayes Theorem
• Bayes Theorem is useful to calculate probabilities based on new information
  – Calculate posterior probabilities using prior probabilities
  – Probability of A given B is equal to the Probability of B given A multiplied by the Probability of A; the whole thing divided by the probability of B

Bayes Theorem 2
P (A|B) = (P (B|A) x P (A)) ÷ ((P (B|A) x P (A)) + (P (B|A`) x P (A`)))
The probability of event A occurring given that event B has occurred is a fraction. The numerator is the probability of event B occurring given that event A has occurred, multiplied by the probability of event A. The denominator is that numerator added to the probability of event B occurring given that event A has not occurred, multiplied by the probability of event A not occurring.

Example of Bayes Theorem
• The probability that someone has a disease is 0.03
• The probability of a positive result given that someone has the disease is 0.90
• The probability of a positive result given that someone doesn`t have the disease is 0.02
• What is the probability that someone has the disease given a positive result?
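As a cross-check on the disease-testing question above, here is a minimal Python sketch of the Bayes Theorem formula. The function name `posterior` and its parameter names are illustrative assumptions, not part of the course material.

```python
def posterior(prior, p_pos_given_a, p_pos_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|A`) using Bayes Theorem."""
    # Numerator: P(B|A) x P(A)
    numerator = p_pos_given_a * prior
    # Denominator: numerator + P(B|A`) x P(A`), where P(A`) = 1 - P(A)
    denominator = numerator + p_pos_given_not_a * (1 - prior)
    return numerator / denominator


# Values taken from the example: P(A) = 0.03, P(B|A) = 0.90, P(B|A`) = 0.02
p_disease_given_positive = posterior(0.03, 0.90, 0.02)
print(round(p_disease_given_positive, 4))  # prints 0.5819
```

Even with a 90% detection rate, the low prior (3%) keeps the posterior probability under 60%, because false positives from the much larger healthy group make up a sizeable share of all positive results.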
Worked-out Example
• P (B|A) = 0.90, P (B|A`) = 0.02, P (A) = 0.03
• Formula is P (A|B) = (P (B|A) x P (A)) ÷ ((P (B|A) x P (A)) + (P (B|A`) x P (A`)))
  = (0.90 x 0.03) ÷ ((0.90 x 0.03) + (0.02 x (1 - 0.03)))
  = 0.027 ÷ (0.027 + 0.0194)
  = .5819 = 58.19%

Worked-out using Frequencies
• P (B|A) = 0.90, P (B|A`) = 0.02, P (A) = 0.03
• Suppose a random sample of 1,000 people is tested
  – 3% have the disease, or 30 people
  – (1000 – 30) = 970 people don`t have the disease
  – 90% of 30 people test positive = 27 people
  – 2% of 970 people test positive = 19.4 people
  – Probability of someone having the disease given a positive test = 27 ÷ (27 + 19.4) = 27 ÷ 46.4 = .5819

Decision Tree Approach

    1000 People
      Have disease: 30 People
        Test positive: 27 People
        Test negative: 3 People
      Do not have disease: 970 People
        Test positive: 19.4 People
        Test negative: 950.6 People

P (Disease|Positive) = 27 ÷ (27 + 19.4) = .5819 = 58.19%

Counting Rules
• The number of outcomes when any of k mutually exclusive and collectively exhaustive events can occur on each of n trials is k^n
  – A die has six faces and is rolled twice: 6^2 = 36
  – A coin with two faces is tossed five times: 2^5 = 32
• In general, if there are differing numbers of possible events
  – k1 x k2 x k3 x …

Counting Rules
• Number of ways that you can arrange n things in order
  – First slot you have n things
  – Second slot you have (n – 1) things
  – Third slot you have (n – 2) things
• Resulting in n x (n – 1) x (n – 2) x …
• This is called n! or n factorial

Counting Rules
• Permutations is the number of ways of arranging things when order matters
  – nPx = n! ÷ (n – x)!, with n things to be arranged x things at a time
• Combinations is the number of ways of selecting things when order doesn`t matter
  – nCx = n! ÷ (x!*(n – x)!), with n things to be selected x things at a time

Discrete Random Variable
• Usually the result of a counting (rather than measurement) process
• Can take a limited number of values (rather than an infinite number of values)
  – A probability distribution for a discrete random variable is a list of mutually exclusive and collectively exhaustive outcomes along with the probability of each outcome

Expected Value of Discrete Random Variable
• Multiply each possible outcome by its corresponding probability and add them (only for numbers that are at least interval)
  – Expected value is the same as the mean
  – µ = E (X) = Σ Xi * P (Xi), summed over i = 1 to N
  – The numbers on a die can be nominal
  – The numbers on a die can also be ordinal or interval

Mean, Variance and Standard Deviation
• Mean: µ = E (X) = Σ Xi * P (Xi)
• Variance: σ² = Σ (Xi – µ)² * P (Xi)
  – Find the difference between every X and the mean
  – Square the differences and multiply by the probability associated with the X
  – Add all the results
• Standard deviation: σ = √[Σ (Xi – µ)² * P (Xi)]
• The square root of the variance is the standard deviation

Discrete Random Variable Variance
• Expected Value: E (X) = Σ (Xi * P (Xi))
• Variance: σ² = Σ (Xi – E(X))² * P (X = Xi)
• Standard deviation: σ = √σ²

Binomial Distribution
• n number of trials (sample of n trials)
• Two outcomes – success/failure
• Probability of success is π; probability of failure is (1 – π)
• Random variable is X = number of successes
• Probability of x successes: P (X = x) = n! ÷ (x!*(n – x)!) * π^x * (1 – π)^(n – x)

Binomial Answer
• X = number of (successes), n = (number of trials), π = (probability of success)
• P (number of successes) = P (X = number) = 0._ _ _ _
• Mean of binomial: µ = E (X) = n*π
• Standard deviation: σ = √(n*π*(1 – π))

Poisson Distribution
• Period of time or amount of space
• Event or no event
• Average per unit of time (space) is λ
• e is the base of the natural logarithm, e = 2.71828
• P (X = x) = (e^(–λ) * λ^x) ÷ x!
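The binomial and Poisson probability formulas above can be sketched in Python as follows. This is an illustration only: the function names and the input values (5 trials with π = 0.5, and λ = 2) are assumptions chosen for the example, not figures from the course.

```python
from math import comb, exp, factorial, sqrt


def binomial_pmf(x, n, pi):
    """P(X = x) for n trials with success probability pi."""
    # comb(n, x) is n! / (x! * (n - x)!)
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) when the average number of events per unit is lam."""
    return exp(-lam) * lam**x / factorial(x)


# Hypothetical inputs: 2 successes in 5 trials with pi = 0.5
print(binomial_pmf(2, 5, 0.5))  # prints 0.3125

# Mean and standard deviation of the same binomial: n*pi and sqrt(n*pi*(1 - pi))
n, pi = 5, 0.5
print(n * pi, sqrt(n * pi * (1 - pi)))

# Hypothetical input: probability of zero events when lam = 2
print(poisson_pmf(0, 2.0))
```

Summing `binomial_pmf` over x = 0 to n gives 1, which is a quick way to confirm that the outcomes are collectively exhaustive.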
• For the Poisson distribution, the characteristic (mean) is λ, the variance is λ, and the standard deviation is the square root of λ

Continuous numerical variable
• A continuous variable can take on an infinite number of values (between any two values)
• A probability density function defines the distribution of values for a continuous random variable

The Normal Distribution
• 'Bell Shaped'
• Symmetrical
• Mean, Median and Mode are equal
• Location is determined by the mean, μ
• Spread is determined by the standard deviation, σ
• The random variable has an infinite theoretical range: –∞ to +∞
[Figure: normal curve f(X) against X, centred at μ with spread σ; Mean = Median = Mode]

Normal Distribution
• Bell-shaped curve whose shape is dependent on µ and σ
• Symmetrical, therefore mean = median
  – Typically a table only has the probability for the right half
• Can be used as an approximation for discrete probability distributions
• Tails go on till infinity
• Exact probability for a particular value is zero
  – We can only find out the probability for a range of values
• IQR is 1.33 standard deviations
  – The middle half of the data lies between about 2/3 of a standard deviation above and below the mean

Normal Distribution
By varying the parameters μ and σ, we obtain different normal distributions

Normal Distribution & z scores
• Many random variables are approximately normally distributed
  – To find out the probability for a particular value "x" it has to be transformed so that it can be compared against a standard normal distribution
  – The standard normal distribution has µ = 0 and σ = 1
  – Transform the particular value using the mean and standard deviation for that data: z = (x - µ) ÷ σ

Using z scores
• Transforming a variable into a z-score enables comparison to a standard normal distribution
  – How many standard deviations from the mean is the particular value?
  – The cumulative standardised normal distribution tables enable you to find the probability from negative infinity to the particular value
  – To find the probability for a range of values, convert both to z-scores, find the associated probabilities and find the difference in probabilities

Example
• If X is distributed normally with a mean of $100 and a standard deviation of $50, the Z score for X = $200 is
  Z = (X – μ) ÷ σ = ($200 – $100) ÷ $50 = 2.0
• This says that X = $200 is two standard deviations (2 increments of $50 units) above the mean of $100.

Finding Normal Probabilities
• Probability is measured by the area under the curve: P (a ≤ X ≤ b) = P (a < X < b) (note that the probability of any individual value is zero)
[Figure: area under f(X) between a and b]

Probability as Area Under the Curve
• The total area under the curve is 1.0, and the curve is symmetric, so half is above the mean and half is below
  – P(–∞ < X < μ) = 0.5
  – P(μ < X < ∞) = 0.5
  – P(–∞ < X < ∞) = 1.0

The Standardized Normal Table
• The Cumulative Standardized Normal table in the textbook (Appendix table E.2) gives the probability less than a desired value of Z (i.e., from negative infinity to Z)
• Example: P(Z < 2.00) = 0.9772
[Figure: standard normal curve with area 0.9772 to the left of Z = 2.00]

Using the fx-9750GII
• Clear the memory
  – Press MENU, select System, select Reset, clear Data memory & Main memory
• Make sure that the calculator is in STAT mode
  – Press MENU, select STAT
  – Press F5 under the DIST label
  – Press F1 under the NORM label
  – Select Ncd for Normal Cumulative Distribution
  – Select Var to indicate that you will enter variables

Using the fx-9750GII
• Assume the mean weight is 170 and the standard deviation is 25. What is the probability that somebody weighs between 150 and 190?
  – Leave the Data as Variable
  – For Lower enter 150
  – For Upper enter 190
  – For σ enter 25
  – For µ enter 170
  – Press EXE or calculate to get 0.5763 or 57.63%

Using the fx-9750GII
• What is the probability that somebody weighs less than 140?
  – Use a value of (µ - 6*σ) for minus infinity and (µ + 6*σ) for plus infinity
  – 11.51%
• What is the probability that a person weighs at least 160?
  – 65.54%

Using the fx-9750GII
• Using the calculator to find a z score
• Make sure that the calculator is in STAT mode
  – Press MENU, select STAT
  – Press F5 under the DIST label
  – Press F1 under the NORM label
  – Select InvN for Inverse Normal
  – Select Var to indicate that you will enter variables

Empirical Rules
• ~68% of the area under the curve lies within one standard deviation of the mean
  – 68% of the area under the curve lies within µ ± σ
• ~95% of the area under the curve lies within two standard deviations of the mean
  – 95% of the area under the curve lies within µ ± 2σ
• ~99.7% of the area under the curve lies within three standard deviations of the mean
  – 99.7% of the area under the curve lies within µ ± 3σ
• Chebyshev's rule states that for any dataset the percentage of values found within k standard deviations of the mean is at least (1 – 1/k²) x 100%

Quantile-Quantile Normal Probability Plot Interpretation (continued)
[Figure: normal probability plots of X (30 to 90) against Z (-2 to 2) for left-skewed, right-skewed and rectangular data]
• Nonlinear plots indicate a deviation from normality

Samples & Populations
• Populations are all people or all items that are of interest
  – A census measures something in the entire population, yielding parameters (µ, σ: Greek letters)
• Samples are portions of the population that are selected for measurement and analysis
  – Yield statistics that are denoted by English letters (x̄, s)

Types of Samples
• Non-Probability Samples
  – Judgment
  – Convenience
• Probability Samples
  – Simple Random
  – Stratified
  – Systematic
  – Cluster

Survey Worth
• What is the purpose of the survey?
• What is the type of sample?
• What could be the sources of error?
Types of Survey Errors (continued)
• Coverage error: excluded from frame
• Nonresponse error: follow up on nonresponses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question

Developing a Sampling Distribution
• Assume there is a population …
• Population size N = 4
• Random variable, X, is the age of individuals
• Values of X: 18, 20, 22, 24 (years) for individuals A, B, C, D

Developing a Sampling Distribution (continued)
Summary Measures for the Population Distribution:
  μ = ΣXi / N = (18 + 20 + 22 + 24) / 4 = 21
  σ = √[Σ(Xi – μ)² / N] = 2.236
[Figure: P(x) against x for A (18), B (20), C (22), D (24); uniform distribution]

Developing a Sampling Distribution (continued)
Now consider all possible samples of size n = 2

    1st Obs    2nd Observation
               18       20       22       24
    18         18,18    18,20    18,22    18,24
    20         20,18    20,20    20,22    20,24
    22         22,18    22,20    22,22    22,24
    24         24,18    24,20    24,22    24,24

16 possible samples (sampling with replacement)

16 Sample Means

    1st Obs    2nd Observation
               18    20    22    24
    18         18    19    20    21
    20         19    20    21    22
    22         20    21    22    23
    24         21    22    23    24

Developing a Sampling Distribution (continued)
Sampling Distribution of All Sample Means
[Figure: P(X̄) against X̄ for the 16 sample means, X̄ from 18 to 24; no longer uniform]

Developing a Sampling Distribution (continued)
Summary Measures of this Sampling Distribution:
  μ_X̄ = (18 + 19 + 19 + … + 24) / 16 = 21
  σ_X̄ = √[((18 – 21)² + (19 – 21)² + … + (24 – 21)²) / 16] = 1.58
Note: Here we divide by 16 because there are 16 different samples of size 2.
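The sampling-distribution example above can be reproduced by enumerating all 16 samples. The following is a minimal Python sketch using only the standard library; the variable names are illustrative assumptions.

```python
from itertools import product
from statistics import mean, pstdev

# Population from the example: ages of individuals A, B, C, D
population = [18, 20, 22, 24]

# All possible samples of size n = 2, sampling with replacement
samples = list(product(population, repeat=2))
sample_means = [mean(sample) for sample in samples]

# Population summary measures (pstdev divides by N, not N - 1)
mu = mean(population)
sigma = pstdev(population)

# Summary measures of the sampling distribution (divide by 16 samples)
mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)

print(len(samples))          # prints 16
print(mu, round(sigma, 3))   # prints 21 2.236
print(mu_xbar, round(sigma_xbar, 2))  # prints 21 1.58
```

`pstdev` is used rather than `stdev` because both calculations in the example divide by the full count (4 values, then 16 sample means) instead of the count minus one.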
Comparing the Population Distribution to the Sample Means Distribution
• Population (N = 4): μ = 21, σ = 2.236
• Sample Means Distribution (n = 2): μ_X̄ = 21, σ_X̄ = 1.58
[Figure: P(X) against X for the population (A 18, B 20, C 22, D 24) beside P(X̄) against X̄ for the sample means (18 to 24)]

Sampling Distribution of The Mean: Standard Error of the Mean
• Different samples of the same size from the same population will yield different sample means
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean (this assumes that sampling is with replacement, or that sampling is without replacement from an infinite population):
  σ_X̄ = σ / √n
• Note that the standard error of the mean decreases as the sample size increases

Sampling Distribution of The Mean: If the Population is Normal
• If a population is normal with mean μ and standard deviation σ, the sampling distribution of X̄ is also normally distributed with
  μ_X̄ = μ and σ_X̄ = σ / √n

Z-value for Sampling Distribution of the Mean
• Z-value for the sampling distribution of X̄:
  Z = (X̄ – μ_X̄) / σ_X̄ = (X̄ – μ) / (σ / √n)
  where: X̄ = sample mean, μ = population mean, σ = population standard deviation, n = sample size

Sampling Distribution Properties (continued)
• As n increases, σ_X̄ decreases
[Figure: sampling distributions of X̄ centred on μ for a larger sample size and a smaller sample size]

Determining An Interval Including A Fixed Proportion of the Sample Means
Find a symmetrically distributed interval around µ that will include 95% of the sample means when µ = 368, σ = 15, and n = 25.
  – Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval.
  – Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be below the lower limit.
  – From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96 and the Z score with 2.5% (0.0250) above it is 1.96.
Determining An Interval Including A Fixed Proportion of the Sample Means (continued)
• Calculating the lower limit of the interval:
  X̄_L = μ + Z * (σ / √n) = 368 + (–1.96) * (15 / √25) = 362.12
• Calculating the upper limit of the interval:
  X̄_U = μ + Z * (σ / √n) = 368 + (1.96) * (15 / √25) = 373.88
• 95% of all sample means of sample size 25 are between 362.12 and 373.88

Casio fx-9750GII
• Make sure that the memories are cleared and the calculator is in STAT mode
  – Under DIST select NORM
  – Select InvN for Inverse Normal
  – Select Var for inputting variables
  – Select CNTR since you want the middle 95%
  – Input .95 for Area
  – For standard deviation input the standard error
    • (15 ÷ √25) = 15 / 5 = 3
  – For mean input 368
  – Press EXE to execute

Sampling Distribution of The Mean: If the Population is not Normal
• We can apply the Central Limit Theorem:
  – Even if the population is not normal,
  – …sample means from the population will be approximately normal as long as the sample size is large enough.
• Properties of the sampling distribution: μ_X̄ = μ and σ_X̄ = σ / √n

Central Limit Theorem
• The Central Limit Theorem states that as the sample size (the number of values in each sample) gets large enough, the sampling distribution of the mean is approximately normally distributed, regardless of the shape of the distribution ...
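The 95% interval for the sample means worked out above (µ = 368, σ = 15, n = 25) can be cross-checked without a calculator or table. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the InvN function; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25

# Standard error of the mean: sigma / sqrt(n) = 15 / 5 = 3
standard_error = sigma / sqrt(n)

# Sampling distribution of the mean: normal with mean mu and sd = standard error
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# Middle 95%: 2.5% below the lower limit and 2.5% above the upper limit
lower = sampling_dist.inv_cdf(0.025)
upper = sampling_dist.inv_cdf(0.975)

print(round(lower, 2), round(upper, 2))  # prints 362.12 373.88
```

`inv_cdf` works from the exact Z value (about 1.95996) rather than the rounded 1.96 from the table, so the limits agree with the hand calculation only after rounding to two decimals.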