Title: PowerPoint Presentation
1. Relationship Between Quantitative Data (Regression and Correlation Analysis)
2. One-dimensional statistics: we assess one variable (statistical character) in different sets of data (samples, populations) and analyse the differences between two sets of data (methods: statistical tests).
Two-dimensional statistics: we assess two variables in one set of data and try to quantify and describe the relationship between them, one being the independent variable and the other the dependent variable predicted by it (methods: regression and correlation analyses).
3. Relationships between two variables:
- functional
- statistical (correlative)
Functional Relationship (Mathematics, Physics)
- The magnitude of the dependent variable is determined by the magnitude of the independent variable: each value of the independent variable (xi) corresponds to exactly one value of the dependent variable (yi).
- Description: an exact equation (formula), e.g. circle radius (r) and circumference ($y = 2\pi r$).
4. Graphical description: circle radius (r) and circumference ($y = 2\pi r$)
[Figure: straight line; x axis: xi (r), independent variable (input); y axis: yi ($2\pi r$), dependent variable (outcome).]
Strictly causal relationship, not affected by randomness.
5. Statistical Relationship (Correlative) (Biology)
- A loose relation: the magnitude of one variable probably changes as the magnitude of the second variable changes.
- Each value of xi corresponds to several random values of yi, and the reverse is also possible. (In such a case it is not reasonable to speak of an independent and a dependent variable, e.g. fore- and hindleg lengths in animals, human height and weight.)
6. Graphical description: display the data points in a correlation chart (scatter diagram). Each point has a value on both axes (xi, yi), forming a correlation pair.
[Figure: scatter diagram; x axis: xi (Height); y axis: yi (Weight).]
7. Different patterns of scatter charts → different types of relationships
A: A relationship between the two data sets exists (tight direct relation).
B: A relationship between the two data sets exists (tight inverse relation).
C: Evidence of a poor or no significant relation.
8. Description of a correlative relationship
The aim is to estimate the best-fit function that can express the relationship and to determine its equation (approximation → smooth diagram).
According to the pattern of the scatter diagram:
a) linear correlative relation
b) non-linear correlative relation
9. A) Linear Correlative Relation
- Empirical curve (describes the relation in a sample set)
- When the same value of xi occurs several times → several values of yi → take their mean. Joining the means gives the empirical curve (an estimate of the best-fit linear function).
[Figure: scatter diagram with the empirical curve; x axis: xi (Height); y axis: yi (Weight).]
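The construction of the empirical curve can be sketched in a few lines of Python. This is only an illustration: the function name `empirical_curve` and the height/weight numbers are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from statistics import mean


def empirical_curve(pairs):
    """Return sorted (x, mean of the y values observed at that x) points."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    # One point per distinct x: the mean of all y values sharing that x.
    return [(x, mean(ys)) for x, ys in sorted(groups.items())]


# Hypothetical height (cm) / weight (kg) pairs, for illustration only.
pairs = [(160, 55), (160, 59), (170, 63), (170, 67), (170, 65), (180, 78)]
for x, y_mean in empirical_curve(pairs):
    print(x, y_mean)
```

Joining the printed points in order of x gives the empirical curve for the sample.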
10. Theoretical regression line (describes the relation in a population): $y = a + bx$
Method: regression analysis (linear regression)
Characteristics of the line:
- a (intercept): the point where the line crosses the y axis
- b (slope): $b = \tan\alpha$, where $\alpha$ is the angle between the line and the x axis
11. Linear Regression (computes the parameters of the function $y = a + bx$)
Sample: n = number of members, i.e. correlation pairs (xi, yi)
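The formulas for a and b did not survive in this transcript. As a hedged sketch, the following assumes the usual least-squares estimates, b = Sxy / Sxx and a = mean(y) - b * mean(x). The function name `linear_regression` and the sample data are hypothetical.

```python
def linear_regression(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical height (cm) / weight (kg) sample, for illustration only.
heights = [160, 165, 170, 175, 180]
weights = [56, 61, 65, 72, 76]
a, b = linear_regression(heights, weights)
print(f"y = {a:.2f} + {b:.2f} x")
```

With a and b known, any two x values give two points (x, a + b*x) that fix the line.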
12. Two points are enough to construct the regression line:
- we choose any x1 → $y_1 = a + b x_1$
- we choose any x2 → $y_2 = a + b x_2$
[Figure: theoretical regression line drawn through the points (x1, y1) and (x2, y2); axes: xi, yi.]
13. Correlation Analysis determines the level of association of X and Y (closeness of the relation).
Correlation coefficient r: a quantitative expression of the strength of the relationship between X and Y (the cluster of points around the line in the scatter diagram may be loose or tight).
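14. $r \in \langle -1, 1 \rangle$
r > 0: direct correlation (X and Y increase together); the closer r is to 1, the closer the relation
r < 0: inverse correlation (X increases, Y decreases); the closer r is to -1, the closer the relation
r = 0: no correlation
r = 1: functional direct relationship
r = -1: functional inverse relationship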
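The slides do not preserve a formula for r. The sketch below assumes the usual product-moment (Pearson) coefficient, r = Sxy / sqrt(Sxx * Syy). The function name `pearson_r` is hypothetical. The circle example from the functional relationship is reused to show the limiting cases.

```python
from math import pi, sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


radii = [1.0, 2.0, 3.0, 4.0]
circumferences = [2 * pi * r for r in radii]   # functional direct relation
inverted = [-c for c in circumferences]        # functional inverse relation
print(round(pearson_r(radii, circumferences), 6))
print(round(pearson_r(radii, inverted), 6))
```

For real biological data the points scatter around the line, so r falls strictly between -1 and 1.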
15. Significance of the Correlation Coefficient
The correlation coefficient r is only an estimate of the actual correlation coefficient in the population (denoted $\rho$). Is there in fact any correlation in the population?
- We test the hypothesis of independence ($H_0$: $\rho = 0$) using a t-test.
Test statistic: $t = r / s_r$, where $s_r$ is the SD of the correlation coefficient, with $\nu = n - 2$ degrees of freedom.
If $t > t_\alpha(\nu)$ → $H_0$ is rejected: a correlation between X and Y exists in the population (r is significant).
If $t \le t_\alpha(\nu)$ → $H_0$ is not rejected: no correlation between X and Y has been demonstrated (r is insignificant).
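As a hedged sketch of this test, the code below assumes the standard error $s_r = \sqrt{(1 - r^2)/(n - 2)}$, which is the usual choice but is not spelled out in the transcript. The critical value must be read from a t table and passed in. The function names and the values r = 0.6, n = 18 are hypothetical.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two correlation pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1 (s_r would be zero)")
    s_r = sqrt((1 - r ** 2) / (n - 2))  # assumed SD of the correlation coef.
    return r / s_r


def is_significant(r, n, t_critical):
    """Compare |t| with a tabulated critical value t_alpha(nu)."""
    return abs(correlation_t_statistic(r, n)) > t_critical


# Hypothetical sample: r = 0.6 from n = 18 pairs, so nu = 16.
# 2.120 is the two-tailed critical t for alpha = 0.05 and nu = 16.
t = correlation_t_statistic(0.6, 18)
print(round(t, 3), is_significant(0.6, 18, t_critical=2.120))
```

The absolute value is used so that a negative r (inverse correlation) is tested in the same way as a positive one.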
16. B) Non-linear Correlative Relation
[Figure: scatter diagram.]
Non-linear regression equations are difficult to compute → computer polynomial regression → different regression models (curves). E.g. the most common is the quadratic $y = a + b_1 x + b_2 x^2$ (second-order polynomial), which requires calculating the coefficients a, b1 and b2.
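To illustrate what the computer does for the quadratic model, here is a minimal sketch that assumes an ordinary least-squares fit obtained by solving the 3x3 normal equations. The slides do not say which algorithm is meant, and the helper names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is not modified.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; need 3 or more distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def quadratic_regression(xs, ys):
    """Least-squares coefficients (a, b1, b2) of y = a + b1*x + b2*x**2."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]         # sum x^k
    moment_sums = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal_matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    a, b1, b2 = solve_linear_system(normal_matrix, moment_sums)
    return a, b1, b2


# Points generated from y = 1 + 2x + 3x^2, so the fit should recover 1, 2, 3.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
print([round(c, 6) for c in quadratic_regression(xs, ys)])
```

In practice a statistics package would be used instead. The sketch only shows that the coefficients come from a small linear system.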
17. Another method for the analysis of a non-linear relation is the Spearman Rank Correlation
- Non-parametric method, used if either or both variables (dependent or independent) are skewed (non-normal)
- Can be used more generally than the parametric correlation coefficient (in both linear and non-linear correlation), but is not as precise
- Only the ranks of the measurements are used in the calculation (instead of the observed values xi, yi)
Sample: n = number of members, i.e. correlation pairs (xi, yi)
18. The values of X and Y are ranked separately:
x2 < x4 < x1 < x5 < x3 < x8 < x6 < x7 ...
1    2    3    4    5    6    7    8  ... n
y3 < y1 < y5 < y2 < y4 < y8 < y7 < y9 ...
1    2    3    4    5    6    7    8  ... n
Di: difference between the ranks of xi and yi ($D_1 = 3 - 2$, $D_2 = 1 - 4$, $D_3 = 5 - 1$, ...)
$r_{Sp} \in \langle -1, 1 \rangle$
$r_{Sp} > r(\alpha, n)$ → significant correlation between X and Y
$r_{Sp} \le r(\alpha, n)$ → insignificant correlation between X and Y
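The formula for the coefficient itself is missing from the transcript. Assuming the standard rank-difference form of the Spearman coefficient is intended, it reads:

```latex
r_{Sp} = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n\,(n^2 - 1)}
```

Here $D_i$ is the rank difference defined above and n is the number of correlation pairs. The expression is exact when there are no tied ranks and is commonly used as an approximation when a few ties occur.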
20. Example: the Spearman rank correlation coefficient, computed for the relation between wing and tail lengths among birds of a particular species.
21.
No.  Wing length X (cm)  Rank of X  Tail length Y (cm)  Rank of Y  Di     Di²
1    10.2                1.5        7.1                 1          0.5    0.25
2    10.2                1.5        7.2                 2.5        -1     1
3    10.3                3          7.4                 5          -2     4
4    10.4                4          7.4                 5          -1     1
5    10.5                5          7.2                 2.5        2.5    6.25
6    10.6                6          7.8                 9.5        -3.5   12.25
7    10.7                7          7.4                 5          2      4
8    10.8                8.5        7.6                 7          1.5    2.25
9    10.8                8.5        7.8                 9.5        -1     1
10   11.1                10         7.9                 11         -1     1
11   11.2                11         7.7                 8          3      9
12   11.4                12         8.3                 12         0      0
n = 12, $\sum D_i^2 = 42.00$
Critical value $r_{Sp}(0.01, 12) = 0.727$ → the correlation between wing and tail lengths is statistically highly significant (it really exists in the population).
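The computed coefficient is not preserved in the transcript. The sketch below recomputes it from the tabulated wing and tail lengths, assuming the standard rank-difference formula $r_{Sp} = 1 - 6 \sum D_i^2 / (n(n^2 - 1))$ and average ranks for ties (as the table uses). The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Return (sum of squared rank differences, r_Sp) for paired samples."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    r_sp = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, r_sp


# Wing and tail lengths (cm) from the table above.
wing = [10.2, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.8, 11.1, 11.2, 11.4]
tail = [7.1, 7.2, 7.4, 7.4, 7.2, 7.8, 7.4, 7.6, 7.8, 7.9, 7.7, 8.3]
sum_d2, r_sp = spearman(wing, tail)
print(sum_d2, round(r_sp, 3))  # prints: 42.0 0.853
```

Under that assumption the coefficient is about 0.853, which exceeds the critical value 0.727 and agrees with the conclusion stated above.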