Wednesday, January 30, 2008

Research class,


Left off talking about measurement
1. Validity - measures what it is supposed to measure
2. Reliability - reproducible
3. Objectivity - there is a clear right and wrong answer, independent of who scores it
4. Usability - practical to administer and score

Validity v. Reliability

Validity = appropriateness, correctness, meaningfulness and usefulness of inferences made from the instruments used in a study

Reliability = consistency of scores obtained across individuals, administrators and sets of items.

Relationship between Reliability and Validity

Suppose I have a faulty measuring tape and I use it to measure each student's height. It can be reliably wrong - the answers will be wrong, but consistently wrong. Every time I measure a specific student, he will be the same height, even if it is the wrong height.

How about a measuring tape on an elastic ribbon? I could get 3 or 4 different heights each time I measure the same student.

An unreliable measure is always invalid.
A measure can be valid and reliable, invalid and reliable, or unreliable and invalid, but never unreliable and valid.

CONTENT VALIDITY
do the contents of the measurement match the contents of the curriculum?

CRITERION VALIDITY
how well do two measures correlate with each other - how well does your test correlate with some other measure of learning or performance?
..................Predictive Validity - how well does ACT predict future performance in college
..................Concurrent Validity - does CRT correlate with their grades?
.............................convergent validity (evidence that it measures the same thing as another measure) v. discriminant validity (it does not correlate with something else; they measure different things).

CONSTRUCT VALIDITY
vague - does the test measure what it is supposed to be measuring? Old IQ tests - did they measure intelligence, or were they culturally biased, measuring how close your culture is to WASP culture?

INTERNAL VALIDITY - how well the study is designed: procedures controlled, subjects selected, overall design

threats to internal validity
subject characteristics
attrition
location
instrumentation
data collectors
testing
attitude of subjects
implementation
history
maturation
regression threat - when you retest someone who scored far out in the extremes, their new score tends to regress toward the mean of the distribution

Ways that threats to internal validity can be minimized
· Standardize study conditions
· Obtain more information on individuals in the sample
· Obtain more information about details of the study
· Choice of appropriate design
Is the study well controlled?
Reliability Checks
· Test-retest (stability) – are you getting consistent measurements between time 1 and time 2, 3, 4, etc.?
· Equivalent forms – forms A, B, etc. If I test the same person on those forms, will their scores be close?
· Internal consistency - items in the same test are consistent.
o Split-half (most common), never compare first half to second half – compare odds to evens.
o Kuder Richardson – statistical measure
o Cronbach's alpha – statistical measure
· Inter-Rater (Agreement) – harsh grader v. easy grader – how consistent are the people rating the test instrument?
….reported as: reliability was high, r = .95
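As a sketch of how one of those internal-consistency statistics is computed, here is Cronbach's alpha from its usual formula; the item scores below are made-up illustration data, not from any real test:

```python
from statistics import variance

def cronbach_alpha(item_scores):
    # item_scores: one list per test item, aligned across the same respondents
    k = len(item_scores)
    totals = [sum(resp) for resp in zip(*item_scores)]  # each respondent's total score
    item_var_sum = sum(variance(item) for item in item_scores)
    # alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# three items answered by five respondents (made-up data)
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 1],
    [5, 5, 2, 4, 2],
]
print(round(cronbach_alpha(items), 2))  # → 0.92
```

Items that rise and fall together across respondents push alpha toward 1; unrelated items push it toward 0.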

Analyzing Data
1. graphs and charts
2. descriptive statistics - describe the sample (socioeconomic status, race, family situation)
3. Inferential statistics - computed on a sample and used to make inferences about a larger (target) population

Measures of central tendency
--mean (average) - most stable measure from sample to sample
--median = middle score - fluctuates from sample to sample
--mode = most frequent score - fluctuates even more than the median

Measures of variability
--range = highest score minus lowest score
--standard deviation = average deviation from the mean
--variance = standard deviation squared
--standard error of measurement = range in which the "true score" is likely to fall
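All of these descriptive statistics can be computed with Python's standard library; the scores below are made-up:

```python
from statistics import mean, median, mode, stdev, variance

scores = [70, 75, 75, 80, 85, 90, 95]  # made-up test scores

print(mean(scores))               # average
print(median(scores))             # middle score → 80
print(mode(scores))               # most frequent score → 75
print(max(scores) - min(scores))  # range → 25
print(stdev(scores))              # standard deviation
print(variance(scores))           # standard deviation squared
```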


standard deviation is best measure of variability of samples


inferential statistics make inference from several different descriptive statistics


BACK TO THE NORMAL DISTRIBUTION
the mean is always down the middle; in a normal distribution the mean, median and mode are all the same.
z-score = how many standard deviations away from the mean a score falls
---------------z = (raw score - mean) / standard deviation
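The formula above in one line of Python (the numbers are made-up):

```python
def z_score(raw, mean, sd):
    # how many standard deviations the raw score falls from the mean
    return (raw - mean) / sd

# a score of 85 on a test with mean 75 and SD 5 (made-up numbers)
print(z_score(85, 75, 5))  # → 2.0
```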


mean to ±1 SD = 34% (on each side)
±1 SD to ±2 SD = 14%
±2 SD to ±3 SD = 2%
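Those band percentages can be checked from the standard normal CDF, which the math module's error function gives us without any external stats library (a sketch):

```python
from math import erf, sqrt

def normal_cdf(z):
    # cumulative probability of the standard normal at z, via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

for lo, hi in [(0, 1), (1, 2), (2, 3)]:
    band = normal_cdf(hi) - normal_cdf(lo)
    print(f"{lo} to {hi} SD: {band:.1%}")  # ~34.1%, ~13.6%, ~2.1%
```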

CORRELATION COEFFICIENTS
- "r" can range from -1 to +1
- negative correlation = as one variable increases the other decreases
- positive correlation = as one variable increases, the other increases also
- zero correlation = no relationship between the two variables

The closer r is to 0, the less the predictive ability; the further r gets from 0 (in either direction), the more predictive ability. .9 = high predictability, -.9 = high predictability, .11 = very low predictability.

professor: .8 or above (in absolute value) is good predictability
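A minimal Pearson r computed from its definitional formula; the study-hours and score numbers are made-up:

```python
from math import sqrt

def pearson_r(x, y):
    # Pearson correlation coefficient: covariance over product of deviations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]       # made-up hours of study
score = [55, 60, 70, 75, 90]  # made-up test scores
print(round(pearson_r(hours, score), 2))  # → 0.98 (strong positive correlation)
```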

HYPOTHESIS TESTING
WE NEVER PROVE ANYTHING, so instead we try to prove something wrong.

Null Hypothesis (H0) = set up to state that there is no effect. We say that X will not improve test scores, and then go and try to prove ourselves wrong.

Alternative Hypothesis (H1) = set up to state that there is an effect

These two hypotheses must be :
-mutually exclusive
-exhaustive

we always test the null hypothesis

Test by using statistics to determine the probability that the result was due to chance: we want the probability that the results were due to chance to be low - less than 5%.
- if the probability that the result was due to chance is greater than 5%, the null hypothesis cannot be rejected.

ALWAYS WANT P < .05 (the probability the result is due to chance (P) is less than 5%). P < .05 = significant effect; P > .05 = non-significant effect



5% level => alpha level => .05 is the stated acceptable P level.

Alpha Level - the prestated acceptable probability level, the goal you set for yourself before the start of the study.



.............................................Null is True................Null is False

Fail to Reject the Null.......Correct Decision.........Type II Error

Reject the Null...................Type I Error...............Correct Decision (power)



Ways to increase the power (the chance of rejecting the null when that is the correct decision):

increase sample size (n)

control the study really well



1. Research Question: What is the effect of a new notetaking software on the number of lecture units recorded correctly?

2. Null Hypothesis : Software will not have any effect

....Alternative Hypothesis : Software will have an effect.

3. alpha level = .05 (the chance I'm willing to take that I am wrong)

4. I conduct my study and find that the software significantly increases the number of lecture units recorded correctly, t(31) = 4.56, p = .001, and I reject my null hypothesis (i.e., I say the null is false).
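Getting a p-value requires the t distribution, which isn't in Python's standard library, but the t statistic itself is easy to sketch. Here is a one-sample t on made-up gain scores (an illustration, not the study's actual data):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    # t = (sample mean - hypothesized mean) / (s / sqrt(n))
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# made-up gains: lecture units recorded with the software minus without
gains = [3, 5, 2, 4, 6, 3, 5, 4]
t = one_sample_t(gains, 0)  # null hypothesis: the mean gain is 0
print(round(t, 2))  # → 8.64
```

A t this large with df = n - 1 = 7 would correspond to a very small p, so the null would be rejected.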



Significant effect - Reject the null: either correct or making a Type I error

Inconclusive - Fail to reject the null: either correct or making a Type II error



Ways to increase power

1. increase sample size

2. control for extraneous variables (confounds)

3. increase the strength of the treatment

4. use a one-tailed test when justifiable (directional) - testing for an effect in a specific direction (ruling out that it could actually make people worse)



Effect Sizes = tell you the magnitude of the effect. P only tells you whether there is an effect (significant or not), not how big it is.

-Cohen's d - most common method of reporting

-eta-squared or partial eta-squared

-Coefficient of determination (R-squared)



if d < .2, the effect is small/negligible.

d between .3 and .5 is a medium effect.

d > .5 is a large effect.

d can be greater than 1, but that would be a HUGE effect.
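Cohen's d for two independent groups, using the pooled standard deviation; the group scores below are made-up, and with these numbers d comes out above 1, i.e., a huge effect:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group1, group2):
    # pooled-standard-deviation version of Cohen's d for two independent groups
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled

treatment = [78, 82, 85, 88, 90]  # made-up post-test scores
control = [70, 74, 75, 80, 81]
print(round(cohens_d(treatment, control), 2))  # → 1.85
```

d expresses the group difference in standard-deviation units, which is what makes it comparable across studies (and averageable in a meta-analysis).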



Meta-analysis = the average effect size over many studies.
