Wednesday, January 30, 2008

01/30/08

Research class,


Left off talking about measurement
1. Validity -
2. Reliability - reproducible
3. Objectivity - right and wrong answer
4. Usable -

Validity v. Reliability

Validity = appropriateness, correctness, meaningfulness and usefulness of the inferences researchers make based on the data collected with the instruments used in a study

Reliability = consistency of scores obtained across individuals, administrators and sets of items.

Relationship between Reliability and Validity

Suppose I have a faulty measuring tape and I use it to measure each student's height. It can be reliably wrong: the answers will be wrong, but they will be consistently wrong. Every time I measure a specific student, he will be the same height, even if it is the wrong height.

How about a measuring tape printed on an elastic ribbon? I could get 3 or 4 different heights each time I measure the same student.

An unreliable measure is always invalid.
A measure can be valid and reliable, invalid and reliable, or unreliable and invalid, but never unreliable and valid.

CONTENT VALIDITY
do the contents of the measurement match the contents of the curriculum?

CRITERION VALIDITY
how well do two measures correlate with each other - how well does your test correlate with some other measure of learning or performance?
..................Predictive Validity - how well does ACT predict future performance in college
..................Concurrent Validity - does CRT correlate with their grades?
.............................convergent validity (evidence that it measures the same thing as the other measure) v. discriminant validity (it does not correlate with something else; they measure different things).

CONSTRUCT VALIDITY
vague - does the test measure what it is supposed to be measuring? e.g., old IQ tests - did they measure intelligence, or were they culturally biased, measuring how close your culture is to WASP culture?

INTERNAL VALIDITY - how well the study is designed: procedures controlled, subjects selected, the design chosen

threats to internal validity
subject characteristics
attrition
location
instrumentation
data collectors
testing
attitude of subjects
implementation
history
maturation
regression threat - refers to the fact that when you retest someone who was way out in the extremes, their new scores tend to regress towards the mean of the distribution

Ways that threats to internal validity can be minimized
· Standardize study conditions
· Obtain more information on individuals in the sample
· Obtain more information about details of the study
· Choice of appropriate design
Is the study well controlled
Reliability Checks
· Test-retest (stability) – are you getting consistent measurements between time 1 and times 2, 3, 4, etc.?
· Equivalent forms – forms A, B, etc. If I test the same person on those forms, will their scores be close?
· Internal consistency - items in the same test are consistent.
o Split-half (most common), never compare first half to second half – compare odds to evens.
o Kuder Richardson – statistical measure
o Cronbach's Alpha – statistical measure
· Inter-Rater (Agreement) - hard-ass vs. easy grader – how consistent are the people rating the test instrument?
….reliability was high, r = .95
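The internal-consistency measures above can be illustrated numerically. Here is a minimal Python sketch of Cronbach's alpha; the formula is the standard one, but the data and function name are hypothetical examples, not from the lecture:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    scores: one row per respondent, one column per item.
    """
    k = len(scores[0])                 # number of items
    items = list(zip(*scores))         # regroup scores by item
    item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Four respondents answering a three-item scale (hypothetical data)
data = [[4, 5, 4],
        [3, 3, 2],
        [5, 5, 5],
        [2, 3, 3]]
print(round(cronbach_alpha(data), 2))   # 0.95
```

An alpha near .95, like the r = .95 above, would be reported as high reliability.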

Analyzing Data
1. graphs and charts
2. descriptive statistics - describe the sample (socio economic status, race, family situation)
3. Inferential statistics - describe a sample and are inferred to a larger (target) population

Measures of central tendency
--mean (average) - most stable measure from sample to sample
--median - middle score - fluctuates from sample to sample
--mode = most frequent score - fluctuates even more than the median
--range = highest score minus lowest score
--standard deviation = average deviation from the mean
--variance = standard deviation squared
-- standard error of measurement = range in which the "true score" is likely to fall
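A quick way to keep these definitions straight is to compute them on a small made-up sample (the scores below are hypothetical):

```python
from statistics import mean, median, mode, stdev, variance

scores = [70, 75, 80, 80, 85, 90, 95]      # hypothetical test scores

print(mean(scores))                        # 82.14... (most stable measure)
print(median(scores))                      # 80 (middle score)
print(mode(scores))                        # 80 (most frequent score)
print(max(scores) - min(scores))           # 25 (range = highest - lowest)
print(round(stdev(scores), 2))             # 8.59 (sample standard deviation)
print(round(variance(scores), 2))          # 73.81 (= standard deviation squared)
```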


standard deviation is best measure of variability of samples


inferential statistics make inference from several different descriptive statistics


BACK TO THE NORMAL DISTRIBUTION
the mean is always down the middle; also, the mean, median, and mode are all the same.
z-score is how many standard deviations away from the mean the score falls.
---------------z = (raw score - mean) divided by standard deviation


mean to 1 SD = ~34% (on each side)
1 to 2 SD = ~14%
2 to 3 SD = ~2%
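The z formula and those percentage bands can be checked with Python's built-in statistics.NormalDist. The raw score, mean, and SD below are made-up numbers:

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """z = (raw score - mean) / standard deviation"""
    return (raw - mean) / sd

# A student scores 85 on a test with mean 75 and SD 5 (hypothetical numbers)
print(z_score(85, 75, 5))   # 2.0 -> two standard deviations above the mean

# Area under the standard normal curve between successive SDs
nd = NormalDist()           # standard normal: mean 0, SD 1
print(round((nd.cdf(1) - nd.cdf(0)) * 100))   # 34 (% between mean and 1 SD)
print(round((nd.cdf(2) - nd.cdf(1)) * 100))   # 14 (% between 1 and 2 SD)
print(round((nd.cdf(3) - nd.cdf(2)) * 100))   # 2  (% between 2 and 3 SD)
```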

CORRELATION COEFFICIENTS
- "r" can range from -1 to +1
- negative correlation = as one variable increases the other decreases
- positive correlation = as one variable increases, the other increases also
- zero correlation = no relationship between the two variables

The closer r is to 0, the less the predictive ability; the further from 0 r gets, the greater the predictive ability. .9 = high predictability, -.9 = high predictability, .11 = very low predictability.

professor = .8 or above is good predictability
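Pearson's r can be computed directly from deviation scores. A small sketch with hypothetical data (study hours vs. test scores are my example, not the professor's):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance over the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours  = [1, 2, 3, 4, 5]        # hypothetical hours studied
scores = [65, 70, 68, 80, 85]   # hypothetical test scores
print(round(pearson_r(hours, scores), 2))   # 0.93 -> strong positive correlation
```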

HYPOTHESIS TESTING
WE NEVER PROVE ANYTHING, so instead we try to prove something wrong.

Null Hypothesis (H0) = set up to state that there is no effect. We say that x will not improve test scores, and then go and prove ourselves wrong.

Alternative Hypothesis (H1) = set up to state that there is an effect

These two hypotheses must be :
-mutually exclusive
-exhaustive

always testing the null hypothesis

Test by doing statistics to determine the probability that the result was due to chance: we want the probability that the results were due to chance to be low, less than 5%.
- if the probability that the result was due to chance is > 5%, the null hypothesis cannot be rejected.

ALWAYS WANT P<.05 (the probability the result is due to chance (P) is less than 5%). P<.05 = significant effect; P>.05 = nonsignificant effect



5% level => alpha level => .05, the stated acceptable P level.

Alpha Level - the pre-stated acceptable probability of a Type I error, the goal you set for yourself before the start of the study.



.............................................Null is True................Null is False

Fail to Reject the Null.......Correct Decision.........Type II Error

Reject the Null...................Type I Error...............Correct Decision (power)



Ways to increase the power (the chance of rejecting the null when rejecting it is the correct decision):

increase sample size (n)

control study really well
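Both ideas show up in a quick simulation. This sketch is an approximation (it uses the normal critical value 1.96 rather than the exact t cutoff), and all the numbers are hypothetical:

```python
import random
from math import sqrt
from statistics import mean, stdev

def estimated_power(n, effect=0.5, critical=1.96, trials=2000):
    """Fraction of simulated studies that reject the null, given a true
    effect of `effect` SDs. Rough sketch only: uses the normal critical
    value instead of the exact t cutoff."""
    random.seed(1)                    # reproducible runs
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(effect, 1) for _ in range(n)]
        t = mean(sample) / (stdev(sample) / sqrt(n))
        if abs(t) > critical:
            rejections += 1
    return rejections / trials

# A bigger sample catches the same true effect far more often
print(estimated_power(10) < estimated_power(40))   # True
```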



1. Research Question: What is the effect of a new notetaking software on the number of lecture units recorded correctly?

2. Null Hypothesis : Software will not have any effect

....Alternative Hypothesis : Software will have an effect.

3. alpha level = .05 (the chance I'm willing to take that I am wrong).

4. I conduct my study and find that the software significantly increases the number of lecture units recorded correctly, t(31)=4.56, p=.001, and I reject my null hypothesis (i.e., I say the null is false).
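A t statistic like the one reported above can be computed by hand. A minimal one-sample sketch in Python; the data, the comparison mean of 20 units, and the sample size of 8 are all hypothetical (the study above had n = 32):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / standard error; df = n - 1."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1

# Hypothetical: lecture units recorded correctly by 8 students using the
# software, tested against a known mean of 20 units without it.
units = [24, 27, 22, 30, 25, 28, 23, 26]
t, df = one_sample_t(units, 20)
print(round(t, 2), df)   # 5.96 7
```

The t value would then be compared against the critical value for df = 7 at the chosen alpha level.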



Significant effect - Reject the null : either correct or make Type I error

Inconclusive - Fail to reject the Null : either correct or making a Type II error



Ways to increase power

1. Increase sample size

2. Control for extraneous variables (confounds)

3. Increase the strength of the treatment

4. Use a one-tailed test when justifiable (directional) - testing for an effect in a specific direction (rules out the possibility that it could actually make people worse)



Effect Sizes = tell you the magnitude of the effect. P tells you whether there is an effect, but not how big that effect is.

-Cohen's d - most common method of reporting

-eta-squared or partial eta-squared

-Coefficient of determination (R-squared)



if d is around .2, the effect is small.

d around .5 is a medium effect.

d of .8 or above is a large effect.

d can be greater than 1, but that would be a HUGE effect.
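Cohen's d is just the mean difference divided by a pooled standard deviation. A sketch with hypothetical treatment and control scores:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled

treatment = [25, 27, 24, 30, 26]   # hypothetical scores with the software
control   = [21, 23, 20, 24, 22]   # hypothetical scores without it
print(round(cohens_d(treatment, control), 2))   # 2.23 -> a HUGE effect
```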



Meta Analysis = average effect size over many studies.

Wednesday, January 23, 2008

01/23/08

Research class. Yeah, no technology class because of Martin Luther King Jr. Day.


Sampling and Measurement


I. Sampling

A. Samples v. Populations


Sample = group of people participating in the study - is supposed to be representative of whole population


Population = group of people to whom you want to generalize your results.


- Target Population : who do you really want to generalize your results to.

- Accessible Population : who you can actually generalize your results to.

ie: want to study improving test scores in Utah, but can only survey 100 teachers in Salt Lake County

target population is all teachers in Utah

accessible population is teachers in Salt Lake County, because teacher issues in rural areas are different than urban areas.

Target and Accessible populations depend on how well your sample really represents your TP. If your sample represents your TP well, then the AP and TP are the same. If the sample does not represent the TP well, then the AP is who the sample really represents.

.

Two types of Sampling

.

1. Probability Sample = take a random sample from the population - each member of the population has the same chance of being selected as every other member. (Simple or Straight Random Sample)

.

2. Non-Probability Sample - or non-random sample = members of the sample are chosen in a way such that not every member has an equal chance of being chosen.

.

Probability Sampling Methods

.

1. Stratified Random Sampling - selected subsets of the population are chosen to represent the same proportions as in the general population. ie: x% male - y% female, or by race x% white, y% Hispanic, z% Asian etc..... Make sure the sample has the same percentages as the general population.

.

2. Clustered Random Sample - select existing groups of participants instead of creating subgroups. ie: making sure that you have a sample from both lower and higher socio-economic levels, from naturally forming groups (east v. west side schools), without forming the groups, without worrying about % representation.

.

3. Two Stage Random Sampling = combines stratified and clustered sampling. First you pick the naturally occurring groups to sample from, and then instead of using the entire population of the groups you take a sample from the clusters that were chosen.
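Stratified selection (method 1 above) can be sketched in a few lines: draw the same fraction from each stratum so the sample keeps the population's proportions. The helper and roster below are hypothetical:

```python
import random

def stratified_sample(population, key, fraction):
    """Draw the same fraction from each stratum, so the sample keeps the
    population's proportions (hypothetical helper)."""
    strata = {}
    for person in population:
        strata.setdefault(key(person), []).append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(random.sample(members, k))
    return sample

# Hypothetical roster of 30 people: 20 "F", 10 "M"
roster = [("p%d" % i, "F" if i % 3 else "M") for i in range(30)]
picked = stratified_sample(roster, key=lambda p: p[1], fraction=0.2)
print(len(picked))   # 6 -> 4 F and 2 M, the same 2:1 ratio as the roster
```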

.

Non-Probability Sampling Methods

.

1. Systematic Sampling = every nth individual in a population is selected for participation in the study. e.g., polling every 4th person that leaves the voting booth. Sampling Interval = the n in "every nth person."

.

2. Convenience Sampling = select a group of individuals who are conveniently available to participate in the study. ie: everyone at the Chevron. OK if you are studying people who shop at convenience stores at the given time - but not really a good sample of the entire population of Salt Lake City.

.


3. Purposive Sampling = researchers use past knowledge or own judgement to select a sample that he/she thinks is representative of the population.


.


Sampling in Qualitative Research


.


- Purposive Sampling - ie: teacher burnout - what do you mean by teachers, what do you mean by burnout, and what do you mean by teacher burnout? - select those individuals that the researcher thinks represent the desired population.


.


- Case Analysis


.....-typical - what typical teacher burnout looks like - ie any teacher sometime in April or May


.....-extreme - the teacher who burned out so bad they quit and went to work for Blockbuster


.....-critical - critical characteristics of burnout - same lessons, notes laminated, don't care anymore, just waiting for retirement.


.


- Maximum Variation - sample represents maximum extremes of your population. ie: from Hickman to


.


- Snowball Sampling - start out with small group and keep adding on as the study continues.


Sampling and Validity


What size is appropriate?

Descriptive = 100

Correlational = 50

Experimental = 30


How generalizable is the sample?

external validity = the results should be generalizable beyond the conditions of the study. Or are they only valid for the study sample?

1. Population generalizability - degree to which results can be extended to other populations

2. Ecological generalizability - degree to which results can be extended to other settings or conditions


What is Measurement

-Measurement - just the gathering of information

-Evaluation - making judgements from the collected data

-Where does assessment fit in?


What kind of scale is the measurement based on?

-Nominal - categorical variables - gender, eye color, type of car - the only scale that is qualitative, not quantitative

-Ordinal - ranked - 1st, second, third etc - no info on how much distance between 2nd and 3rd,etc...

-Interval - no absolute zero - degrees Fahrenheit

-Ratio - there is an absolute zero - temperature in kelvins

interval and ratio tend to look the same


Types of Educational Measures

- Cognitive (how much did someone learn?) v. Non-cognitive

- Commercial (standardized tests developed for commercial purposes; good, tested, with norming evidence, but not tailored to the study) vs. Non-commercial (developed by the researcher for their study)

- Direct (participants themselves are giving the information) v. Indirect (information about students from teachers)


Sample Cognitive Measures

-Standardized Tests

---achievement tests - what have people already learned?

---aptitude tests - potential for future learning

-Behavioral Measures

---naming time

---response time

---reading time

------wpm

------eyetracking measures - exactly where they are looking on a computer screen, how long they look there, where they go next and if they come back - images v. text

---number of fixations on Areas Of Interest

---transitions between AOI

---Duration of individual fixations or combinations of fixations on AOI

---regressions in or out of AOI

---rereading of AOI

---pupil diameter - a measure of cognitive load - when we are interested, scared, aroused, or working hard on something, the pupil becomes bigger.

.

Non-Cognitive Measures

---surveys and questionnaires

---observations

---interviews

.

.

How is an individual's score interpreted?

---Norm-referenced - grading on a curve; an individual's score is based on comparison to peer scores

---Criterion-referenced

.

Interpreting scores

Different ways to present scores

1. Raw scores - number of items answered correctly, number of times behavior is tallied

2. Derived scores - scores changed into a more meaningful unit

---age/grade equivalent scores -

---percentile ranking - ranking of score compared to all other individuals who took the test.

---standard scores (z-score) - how far scores are from a reference point; usually best to use in research.

mean 560 compared to

mean 140 and students score is 132

a z-score is basically expressed in standard deviation units. a z-score of 3 is 3 standard deviations from the mean.
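A percentile ranking can be sketched as the percent of scores falling at or below a given score (a simplified definition; the scores are made up):

```python
def percentile_rank(score, all_scores):
    """Percent of scores at or below the given score (simple sketch)."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical test scores
print(percentile_rank(80, scores))   # 60.0 -> scored at or above 60% of test-takers
```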


Important Characteristics of Measure

---must be objective

---have to be useable - if they are insanely difficult to use then they are no good.

---validity - does measure actually measure what it is supposed to measure?

---reliability - do I get consistent measures over time?

Wednesday, January 16, 2008

01/16/08

Research Class

Research questions, variables and hypothesis.
  1. What are research questions? example: US census, is just a big research project.

Research problems vs. Research Questions.

  • Research Problem : problem to be solved, area of concern, general question, etc....
  • eg We want to increase the use of technology in K-3 classrooms in Utah
  • Research Question: a clarification of the research problem which is the focus of the research and drives the methodology chosen
  • eg Does integration of technology into teaching in K-3 lead to higher standardized achievement scores than traditional teaching methods alone?
  • nature of the question is driving the methodology.

Researchable Research Questions - questions that can be addressed with research

  • experimenter interests
  • application issues
  • replication issues, do these results replicate in different situations

Do they focus on a product or process, or neither?
Are the questions researchable or unresearchable?

  • Researchable Questions contain empirical referents - something that can be observed and/or quantified in some way. eg the Pepsi Challenge - which soda do people prefer more, Coca-Cola or Pepsi? (Coke is always a couple of degrees warmer)
  • Unresearchable questions do not contain empirical referents, involve value judgements. eg should prayer be allowed in schools?

Essential characteristics of Good Research Questions

  1. they are feasible.
  2. they are clear - a. conceptual or constitutive definition = all terms in the question must be well defined and understood; should be defined somewhere in the research statement, not necessarily in one sentence. b. operational definition = specify how the dependent variable will be measured. operationalize = how are we going to measure it.
  3. they are significant - they address some fundamental, important issue.
  4. they are ethical - protect participants from harm - ensure confidentiality - should subjects be deceived? if so, subjects should be debriefed afterwards

Variables: Quantitative vs. categorical

  1. quantitative variables are numerical variables - continuous or discontinuous (discrete)
  2. categorical variables - cannot be given a number - political affiliation, college major, religious affiliation

Can look for a relationship among

  1. two quantitative variables - height and weight
  2. two categorical variables - religion and political affiliation
  3. one of each - age and occupation
  4. quantitative made categorical - age 0-5, 6-10, 11-20, 21-30, etc.; same with income, $15,000-30,000, etc.

Independent vs. Dependent variables

  • independent variable - the variable that we are manipulating in the experiment, the variable we have control over. manipulated or selected. (eg gender is selected but not manipulated)
  • dependent variable - what we are studying, the variable that we are measuring.
  • extraneous variable - or the confound. an uncontrolled factor that affects the dependent variable - the things that mess up, or could mess up, our study

Quantitative Research Hypotheses

  • they should be stated in declarative form - make a statement, not ask a question
  • they should be based on facts/research/theory
  • they should be testable
  • they should be clear and concise
  • if possible, they should be directional. (non-directional: "females' GRE scores are DIFFERENT than males'") directional: "females' GRE scores are better than males'" or "females' GRE scores are worse than males'"

Qualitative Research Questions

  • they are written about a central phenomenon instead of a prediction
  • not too general, not too specific.
  • amenable to change as data collection progresses.
  • unbiased by researcher's assumptions or hoped findings

Group Assignment

  • Anxiety and test-taking

Does higher anxiety in a student produce lower test scores?

Identifying Research Articles

- What type of source is it?

  1. Primary Source - original research article
  2. Secondary Source - reviews, summarizes or discusses research conducted by others
  3. Tertiary Source - summary of a basic topic rather than summaries of individual studies

We are supposed to always look for Primary Sources.

Is it peer reviewed?

  1. Refereed journals - editors v. reviewers - blind reviews - level of journal in field
  2. Non-refereed journals - summary journals, practitioner magazines, rarely see primary source articles in them

Why peer review?

  • Importance of verification before dissemination - once the media disseminate the information, it is hard to undo the damage - the scientist arguing that autism resulted from the MMR vaccine never published his results in a scientific journal - the claim of the first human baby clone was based only on the company's statement
  • the greater the significance of the finding, the more important it is to ensure that the finding is valid.

Is peer review an insurance policy? NOPE!, just a check.

  • not exactly - some fraudulent (or incorrect) claims may still make it through publication - Korean scientist who fabricated data supporting the landmark claim in 2004 that he created the world's first stem cells from a cloned human embryo
  • peer review is another source of information for: funding allocation - quality of research/publication in scientific journals - quality of research institutions (both on departmental and university levels) - policy decisions

http://web.ebscohost.com/ehost/search?vid=1&hid=16&sid=11ec807d-8aef-4abd-b66d-5e0075f5f56c%40sessionmgr9

always choose "Journal Articles" and "Researchers" and check "Full Text"

  • 2 weeks to find article
  • have it approved by me by 1/30
  • Initial analysis due 2/6

Hansen et al (2004a)

  • More Experimental Design
  • 2/20
  • we are leading the discussion

Monday, January 14, 2008

01/04/08

Technology
Course
.Grade
..Standard
...Objective
....Intended Learning Outcome - to be paraphrased when citing what your outcome is for any given project in this class.

First national core standards - A Nation at Risk, 1983 under Ronald Reagan. NCTM, National Council of Teachers of Mathematics was the first national core standards.

software.utah.edu

penultimate - next to last.

wikipedia - Martin Luther King Day, Utah trivia

American Rhetoric

Monday, January 7, 2008

01/07/08

First day of Educational Technology.

1. Name
2. Contact Info
3. 3 most important things to learn in this class
4. 3 most important things to be teaching our students
5. Favorite ice cream flavor.

How to integrate technology into classroom.

U.E.N. - internet service provider for education in the state of Utah

Went over syllabus - look at hard copy for notes.

The Net Generation - the Millennials - the kids that are currently in our classrooms.

Election websites
NPR & CBS are a couple of good ones.

Tuesday, January 1, 2008

Grades are in.

Instructional Design : A-
Learning Theory : B

Yeah, I'm happy with that. Even though Gibb one upped me on both classes......