T-test analysis

董一航 · posted 2022-12-1 19:13:42

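This post covers applications of hypothesis testing, which I have been studying recently:
1. The statistics behind the t-test
2. How to run a t-test in R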

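Step one: what is a t-test?
The following is partly excerpted from Baidu Baike; for the full explanation see the entry t检验_百度百科.
What I want to learn from it:
1. What a t-test is
2. What problems it can help us solve
3. The formulas and what their parameters mean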

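What is a t-test

A t-test uses the theory of the t distribution to infer how probable an observed difference is.

What problem it solves

It is used to judge whether the difference between two means is statistically significant.

Formulas and parameter meanings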

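T-tests are divided into one-sample tests and two-sample tests.
A one-sample t-test checks whether a sample mean differs significantly from a known population mean. When the population is normally distributed, the population standard deviation is unknown, and the sample size is below 30, the standardized deviation of the sample mean from the population mean follows a t distribution. (A deviation is the difference between an individual value and the mean.)
The one-sample t statistic is: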
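In common textbook notation the one-sample statistic can be written as below. This is a sketch under stated assumptions: s denotes the sample standard deviation with divisor n − 1, and \hat{\sigma} (a symbol introduced here only for illustration) denotes the standard deviation with divisor n. The two denominators that appear in different textbooks are algebraically identical.

```latex
% \bar{x}: sample mean, \mu_0: hypothesized population mean, n: sample size
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}},
\qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\nu = n - 1

% Equivalent denominator, because s^2 = \frac{n}{n-1} \hat{\sigma}^2:
\frac{s}{\sqrt{n}} = \frac{\hat{\sigma}}{\sqrt{n-1}},
\qquad
\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}
```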

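A two-sample t-test checks whether the means of the two populations represented by two samples differ significantly. There are two cases: the independent-samples t-test and the paired-samples t-test.
The independent-samples t statistic is: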
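The classical equal-variance (pooled) form is sketched below. Which variant is intended here is an assumption: the Welch form, which R's t.test() uses by default, replaces the pooled term with s_1^2/n_1 + s_2^2/n_2 and adjusts the degrees of freedom.

```latex
% \bar{x}_k, s_k^2, n_k: mean, variance (divisor n_k - 1) and size of sample k
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}},
\qquad
s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2},
\qquad
\nu = n_1 + n_2 - 2
```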

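Conditions for use
(1) A population mean is known;
(2) A sample mean and the standard deviation of that sample are available;
(3) The sample comes from a normal or approximately normal population.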
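The t distribution
When the population is normally distributed, σ is unknown, and only a small sample is available, the standardized sample mean (x̄ − μ)/(s/√n) follows a t distribution.
Properties of the t distribution:
1. It is smooth
2. It is a symmetric curve
The exact shape depends on the sample size; when the sample is large, the t distribution looks very much like the normal curve.
When the sample is small, the curve is flatter, with two heavy tails.
It has a single parameter ν = n − 1, where n is the sample size and ν is the degrees of freedom.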
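The shape properties above can be checked numerically. The sketch below compares two-sided 95% critical values of the t distribution with the standard normal value, using base R's qt(), qnorm(), pt() and pnorm(); the chosen degrees of freedom are arbitrary illustration values.

```r
# Two-sided 95% critical values for several degrees of freedom,
# next to the standard normal critical value.
df <- c(2, 5, 10, 30, 100)
t_crit <- qt(0.975, df)
z_crit <- qnorm(0.975)

print(data.frame(df = df, t_crit = round(t_crit, 3)))
print(round(z_crit, 3))

# Heavier tails: probability below -2 under t (df = 5) versus under N(0, 1).
tail_t <- pt(-2, df = 5)
tail_z <- pnorm(-2)
print(c(t_df5 = tail_t, normal = tail_z))
```

As the degrees of freedom grow, the critical value shrinks toward the normal one, which is the "looks like the normal curve" behaviour described above.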

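Calculation method
1. Compute the standardized score under the t distribution

2. Choose a confidence level
The confidence level states how confident we are in the claim that the confidence interval contains the population parameter; it determines how wide the interval has to be

3. Compute the lower and upper limits of the confidence interval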
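The three steps can be sketched in R as follows. The data (group 1 of R's built-in sleep data set) and the hypothesized mean mu0 = 0 are illustrative assumptions, not values from the post; the last line cross-checks the hand calculation against t.test().

```r
# Illustrative data: extra hours of sleep for group 1 of the built-in `sleep` data.
x <- sleep$extra[sleep$group == 1]
n <- length(x)
x_bar <- mean(x)
se <- sd(x) / sqrt(n)            # standard error of the mean

# Step 1: standardized score against a hypothesized mean (mu0 is an assumption).
mu0 <- 0
t_score <- (x_bar - mu0) / se

# Step 2: choose the confidence level and look up the matching t quantile.
conf_level <- 0.95
t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)

# Step 3: lower and upper limits of the confidence interval.
ci <- c(lower = x_bar - t_star * se, upper = x_bar + t_star * se)
print(ci)

# Cross-check against the built-in implementation.
ref <- t.test(x, mu = mu0, conf.level = conf_level)
```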

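T-test steps
Using the one-sample t-test as an example:
Problem: n = 35 babies from difficult births have a mean birth weight of x̄ = 3.42 with S = 0.40; the birth weight of babies in general is μ0 = 3.30 (from a large-scale survey). Are the two the same?
Solution: 1. State the hypotheses and choose the significance level α
H0: μ = μ0 (null hypothesis)
H1: μ ≠ μ0 (alternative hypothesis)
Two-sided test, significance level α = 0.05
2. Compute the test statistic
t = (x̄ − μ0) / (S / √n) = (3.42 − 3.30) / (0.40 / √35) ≈ 1.77, with ν = n − 1 = 34
3. Look up the critical value, determine the P value, and draw a conclusion
From Appendix Table 1, t(0.05/2, 34) = 2.032, so the rejection region is |t| > 2.032. The computed t = 1.77 is not in the rejection region, so H0 cannot be rejected: the data do not show a significant difference between the two means.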
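The same worked example can be reproduced from the summary statistics alone. This is a minimal sketch using base R's qt() and pt() in place of the printed table; the variable names are chosen here for illustration.

```r
# Summary statistics from the worked example.
n <- 35
x_bar <- 3.42
s <- 0.40
mu0 <- 3.30
alpha <- 0.05

t_stat <- (x_bar - mu0) / (s / sqrt(n))
df <- n - 1
t_crit <- qt(1 - alpha / 2, df)          # two-sided critical value
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
reject_h0 <- abs(t_stat) > t_crit

cat(sprintf("t = %.3f, critical value = %.3f, p = %.4f, reject H0: %s\n",
            t_stat, t_crit, p_value, reject_h0))
```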

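For the t-test in R, see Chapter 7, Section 7.4 (t-tests)
T-test: an independent-samples t-test on two groups can be used to test the hypothesis that two population means are equal
The function to use is t.test()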
Description

Performs one and two sample t-tests on vectors of data.

Usage

t.test(x, ...)

## Default S3 method:
t.test(x, y = NULL,
   alternative = c("two.sided", "less", "greater"),
   mu = 0, paired = FALSE, var.equal = FALSE,
   conf.level = 0.95, ...)

## S3 method for class 'formula'
t.test(formula, data, subset, na.action, ...)
Arguments

x
a (non-empty) numeric vector of data values.
y
an optional (non-empty) numeric vector of data values.
alternative
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
mu
a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
paired
a logical indicating whether you want a paired t-test.
var.equal
a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
conf.level
confidence level of the interval.
formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups.
data
an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
subset
an optional vector specifying a subset of observations to be used.
na.action
a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
...
further arguments to be passed to or from methods.
Details

The formula interface is only applicable for the 2-sample tests.

alternative = "greater" is the alternative that x has a larger mean than y.

If paired is TRUE then both x and y must be specified and they must be the same length. Missing values are silently removed (in pairs if paired is TRUE). If var.equal is TRUE then the pooled estimate of the variance is used. By default, if var.equal is FALSE then the variance is estimated separately for both groups and the Welch modification to the degrees of freedom is used.

If the input data are effectively constant (compared to the larger of the two means) an error is generated.

Value

A list with class "htest" containing the following components:

statistic
the value of the t-statistic.
parameter
the degrees of freedom for the t-statistic.
p.value
the p-value for the test.
conf.int
a confidence interval for the mean appropriate to the specified alternative hypothesis.
estimate
the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.
null.value
the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test.
alternative
a character string describing the alternative hypothesis.
method
a character string indicating what type of t-test was performed.
data.name
a character string giving the name(s) of the data.
See Also

prop.test

Examples

require(graphics)

t.test(1:10, y = c(7:20))    # P = .00001855
t.test(1:10, y = c(7:20, 200)) # P = .1245 -- NOT significant anymore

## Classical example: Student's sleep data
plot(extra ~ group, data = sleep)
## Traditional interface
with(sleep, t.test(extra[group == 1], extra[group == 2]))
## Formula interface
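t.test(extra ~ group, data = sleep)

By default, t.test() does not assume equal variances (it uses the Welch approximation).
The default alternative hypothesis is two-sided: the means are unequal, with no direction specified.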
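Both defaults can be overridden through the var.equal and alternative arguments listed in the help page above. A short sketch on the built-in sleep data, chosen here only because the help page examples already use it:

```r
# Welch test (the default): variances are not assumed equal.
welch <- t.test(extra ~ group, data = sleep)

# Classical pooled-variance test: assumes equal variances.
pooled <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

# One-sided alternative: group 1 has a smaller mean than group 2.
one_sided <- t.test(extra ~ group, data = sleep, alternative = "less")

# Degrees of freedom differ between Welch and pooled tests.
print(c(welch = unname(welch$parameter), pooled = unname(pooled$parameter)))
# The one-sided p-value is smaller when the data point in that direction.
print(c(two_sided = welch$p.value, less = one_sided$p.value))
```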

> library(MASS)
> t.test(Prob~So,data = UScrime)

Welch Two Sample t-test

data:  Prob by So
t = -3.8954, df = 24.925, p-value = 0.0006506
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.03852569 -0.01187439
sample estimates:
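mean in group 0 mean in group 1
     0.03851265      0.06371269

The t value is -3.8954
df: 24.925 degrees of freedom
p-value = 0.0006506. If the p-value is below 5%, reject the null hypothesis; otherwise (above 5%) the null hypothesis cannot be rejected.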
The following sentence, taken from the web, describes how the p-value relates to 5%:
The p-value is greater than 0.05, then we can accept the hypothesis H0 of equality of the averages.
H0: the mean of Prob is the same for both values of So
H1: the mean of Prob differs between the two values of So
Because the p-value < 0.05, H0 is rejected
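Instead of reading the numbers off the printed output, the components of the returned "htest" object can be used directly. A minimal sketch, assuming the MASS package is installed; the 0.05 threshold and the variable names are illustrative choices.

```r
library(MASS)  # provides the UScrime data set

res <- t.test(Prob ~ So, data = UScrime)

# An "htest" object is a list; its components can be read by name.
t_value <- unname(res$statistic)
df <- unname(res$parameter)
p_value <- res$p.value
ci <- as.numeric(res$conf.int)

alpha <- 0.05
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
cat(sprintf("t = %.4f, df = %.3f, p = %.7f -> %s\n",
            t_value, df, p_value, decision))
```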

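米田田 · posted 2022-12-1 19:14:32

It is clear that the author prepared this answer carefully. Thank you for the effort!
I would like to add a few points.
First, it should be stated that the t-test applies to measurement (continuous) data and is generally used when the sample size is fairly small; is n < 60 the usual threshold? As for what the t-test is used for, I personally find your wording a bit technical. Could one instead say: "It is mainly used for hypothesis tests on certain kinds of measurement data, including the one-sample t-test and the two-sample t-test. The paired t-test is in essence the same as the one-sample t-test." Hypothesis testing concerns the population: sample statistics are used in the calculation to decide whether the difference between samples is statistically significant, and the conclusion is then extended to the population, because both the null hypothesis and the alternative hypothesis are statements about the population.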
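The remark that a paired t-test is in essence a one-sample t-test can be checked directly: a paired test on x and y gives the same result as a one-sample test on the differences x − y. A sketch on the built-in sleep data (the same 10 subjects measured under two drugs), used here as an illustration only:

```r
# Paired measurements from the built-in `sleep` data.
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

# Paired t-test on the two vectors.
paired <- t.test(x, y, paired = TRUE)

# One-sample t-test on the within-subject differences.
one_sample <- t.test(x - y, mu = 0)

print(c(paired = unname(paired$statistic),
        one_sample = unname(one_sample$statistic)))
print(c(paired = paired$p.value, one_sample = one_sample$p.value))
```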

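I have been reviewing statistics recently too, so I am not sure whether what I said is right. I am very interested in how you are learning R and have wanted to learn it for a long time, but I can never find the time. May I ask a rather naive question: is R easy to learn? I believe any statistical software can only be learned on a foundation of statistical theory and statistical thinking, but I have never studied anything like programming, and I am quite anxious about it. I look forward to your reply. Thank you!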

股海 · posted 2022-12-1 19:15:24

This answer is excellent!!

朝暮青丝 · posted 2022-12-1 19:15:56

One question: why is the denominator s/√(n−1) in the formula further up, but s/√n in the hypothesis test further down?

沈老师 · posted 2022-12-1 19:16:32

Is the one further down wrong?
