

title: "Notes on statistics done wrong"
categories: Statistics
updated:
comments: true

mathjax: true

I came across this book on Akuna Capital's careers page, under the QUANT tab of the FAQs section.

What can I read to prepare for the job/industry?

Many of our Quants have read and strongly recommend "Clean Code: A Handbook of Agile Software Craftsmanship" by Robert C. Martin, "Statistics Done Wrong: The Woefully Complete Guide" by Alex Reinhart, and "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and Ipython" by Wes McKinney.

<!-- more -->

Ch. 1 statistical significance

Ch. 2 statistical power

Ch. 4 base rate fallacy

Chapter 3 is skipped.

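An example: test 100 drugs, of which only 10 actually work. With a significance level of 0.05 and a power of 0.8, about 8 of the 10 effective drugs come out statistically significant, and about 5 of the 90 ineffective drugs (4.5, to be exact) come out significant as well. The false discovery rate is therefore 5/13 ≈ 0.38, and it is this high because the base rate, 10/100, is so low. So if a given experiment is statistically significant, there is still roughly a 38% chance that the drug is actually ineffective. Mistaking the significance level for that probability is the base rate fallacy. Seen another way, this is just the simplest textbook exercise in applying Bayes' formula.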
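The arithmetic in the drug example can be checked with a few lines of Python. This is only a sketch: the function name `false_discovery_rate` and its parameters are made up for illustration, and it works with expected counts rather than simulating experiments.

```python
def false_discovery_rate(n_total, n_effective, alpha, power):
    """Expected share of significant results that are false positives.

    Hypothetical helper for these notes; works with expected counts only.
    """
    n_ineffective = n_total - n_effective
    true_positives = power * n_effective       # effective drugs that reach significance
    false_positives = alpha * n_ineffective    # ineffective drugs significant by chance
    return false_positives / (true_positives + false_positives)


fdr = false_discovery_rate(n_total=100, n_effective=10, alpha=0.05, power=0.8)
print(f"false discovery rate: {fdr:.2f}")
```

With the unrounded 4.5 false positives this gives 4.5/12.5 = 0.36. The 0.38 quoted above comes from rounding 4.5 up to 5 drugs, that is, 5/13.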

Another topic is multiple testing: when many tests are run at the same time, what matters is the overall false positive rate.
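To see how quickly the overall false positive rate grows, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true. The function names are hypothetical, and the Bonferroni correction is just one common remedy, named here for illustration rather than taken from the notes above.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the overall rate at or below alpha."""
    return alpha / n_tests


n_tests = 20
uncorrected = familywise_error_rate(0.05, n_tests)
corrected = familywise_error_rate(bonferroni_alpha(0.05, n_tests), n_tests)
print(f"uncorrected: {uncorrected:.3f}, Bonferroni-corrected: {corrected:.3f}")
```

With 20 tests at a per-test level of 0.05, the chance of at least one false positive is already about 64%. Dividing the threshold by the number of tests brings it back under 5%, at the cost of lower power for each individual test.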

Ch. 5 bad judges of significance

We compared treatments A and B with a placebo. Treatment A showed a significant benefit over placebo, while treatment B had no statistically significant benefit. Therefore, treatment A is better than treatment B.

This passage has several problems.
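One of the problems can be shown numerically: "A is significant, B is not" does not mean that A differs significantly from B. The sketch below uses made-up effect estimates and standard errors, a normal approximation, and the simplifying assumption that the two estimates are independent (for example, each treatment has its own placebo arm). The helper `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(estimate, std_err):
    """Two-sided p-value for a z-test of estimate against zero."""
    z = estimate / std_err
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical effect sizes versus placebo, with their standard errors.
effect_a, se_a = 2.5, 1.0
effect_b, se_b = 1.5, 1.0

p_a = two_sided_p(effect_a, se_a)   # A vs. placebo
p_b = two_sided_p(effect_b, se_b)   # B vs. placebo

# Direct A vs. B comparison; variances add because the estimates
# are assumed independent.
p_diff = two_sided_p(effect_a - effect_b, math.sqrt(se_a**2 + se_b**2))

print(f"A vs placebo: p = {p_a:.3f}")
print(f"B vs placebo: p = {p_b:.3f}")
print(f"A vs B:       p = {p_diff:.3f}")
```

Here A clears the 0.05 threshold and B does not, yet the direct comparison of A and B is far from significant. Claiming that A is better than B requires testing that difference itself.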

Ch. 6 double-dipping

Ch. 7-8 regression

Ch. 9-12

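Skipped. Open source is the way to go, and there need to be mechanisms that encourage it, such as a standard convention for citing datasets.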

References