Biostatistics (8): Probability and Probability Distributions

3.3.7 Poisson distribution

Siméon Denis Poisson published a paper entitled "Researches on the probability of criminal and civil verdicts" in 1837. In it he arrived at what is now called the Poisson distribution, as an approximation to the binomial distribution when the number of trials is large and the probability of success tends to zero.

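Suppose the random variable X follows a binomial distribution, X ~ B(n, p). Its probability mass function is:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n$$
Let λ = np. If n tends to infinity and p tends to 0 in such a way that λ = np stays constant, then p can be written as λ/n.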
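Substituting p = λ/n and letting n tend to infinity, the probability mass function of X becomes:
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$$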
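The limit can be checked numerically. The sketch below holds λ = np fixed and lets n grow, comparing the binomial and Poisson probabilities for one count. The values λ = 3 and k = 2 and the function names are illustrative assumptions, not taken from the original article.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # illustrative mean, held fixed while n grows
k = 2      # illustrative count

for n in (10, 100, 1000, 10000):
    p = lam / n  # p shrinks as n grows so that n * p stays equal to lam
    print(n, binomial_pmf(k, n, p), poisson_pmf(k, lam))
```

As n increases, the binomial column should move steadily closer to the Poisson value.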
PMF and CDF of the Poisson distribution
[Figure: plots of the PMF and CDF of the Poisson distribution]
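Since the plots are not reproduced here, the following sketch tabulates the PMF and CDF directly from the Poisson formula. The choice λ = 3 and the helper names are assumptions made for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k), obtained by summing the PMF from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 3.0  # illustrative mean
for k in range(8):
    print(k, round(poisson_pmf(k, lam), 6), round(poisson_cdf(k, lam), 6))
```

The PMF column peaks near λ, and the CDF column increases towards 1.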
Mean and variance of the Poisson distribution
Mean: E(X) = np = λ
Variance: Var(X) = np(1 - p) = λ(1 - p), which is approximately λ when p is very small
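A quick numerical check of both statements, as a sketch: the mean and variance are computed by summing over a truncated PMF, and the binomial variance np(1 - p) is compared with λ. The truncation point `k_max`, the value λ = 3, and the pair n = 1000, p = 0.003 are assumptions chosen for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_mean_var(lam, k_max=100):
    """Mean and variance from a PMF truncated at k_max (the tail beyond is negligible for small lam)."""
    mean = sum(k * poisson_pmf(k, lam) for k in range(k_max + 1))
    second_moment = sum(k * k * poisson_pmf(k, lam) for k in range(k_max + 1))
    return mean, second_moment - mean**2

lam = 3.0
mean, var = poisson_mean_var(lam)
print(mean, var)  # both should be very close to lam

# Binomial variance with the same mean: n * p * (1 - p) is slightly below lam
n, p = 1000, 0.003
binomial_var = n * p * (1 - p)
print(binomial_var)
```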
Real-world examples of the Poisson distribution
The Poisson distribution is a distribution for counts: if events happen at a constant rate over time or space (an interval), it gives the probability that X events occur in an interval.
Empirically, the following kinds of events follow a Poisson distribution:
1. The number of deaths by horse kick in the Prussian army (the first application)
2. Birth defects and genetic mutations
3. Rare diseases (like leukemia, but not AIDS, because it is infectious and so cases are not independent)
4. Car accidents during a given time period
5. The number of typing errors on a page
Understanding the Poisson distribution in depth
1. The Poisson distribution is generated by processes in which a large number of intervals (time periods, regions of space, etc.) are hit by a relatively small number of events.
2. The occurrence or non-occurrence of a hit in an interval is independent of any other occurrence or non-occurrence in the same interval, and the occurrences of hits in any two intervals are independent of each other as well.
3. The probability of observing two or more hits in a small interval is nearly zero.
4. In practice, since n is very large and p is very small, we only need to care about the mean number of occurrences in a given time interval, which alone determines the corresponding Poisson distribution.
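The process described in the list above can be sketched as a simulation: each trial contains many small intervals, each hit independently with a small probability, and the number of hits per trial is recorded. The parameters (200 intervals, hit probability 0.015, 2000 trials, seed 0) and the function name are arbitrary assumptions for illustration.

```python
import random

def simulate_counts(n_intervals, p_hit, n_trials, seed=0):
    """Count hits per trial when each small interval is hit independently with probability p_hit."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = []
    for _ in range(n_trials):
        hits = sum(1 for _ in range(n_intervals) if rng.random() < p_hit)
        counts.append(hits)
    return counts

counts = simulate_counts(n_intervals=200, p_hit=0.015, n_trials=2000)
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / len(counts)

# For a Poisson-like process both should be near n_intervals * p_hit = 3
print(sample_mean, sample_var)
```

The sample mean and sample variance coming out close to each other is the signature of a Poisson-distributed count.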
Example 1 of a Poisson distribution
A life insurance salesman sells on average 3 life insurance policies per week. What is the probability that he will sell
(1) at least one policy in a week?
(2) 2 or more policies but fewer than 5 policies in a week?
(3) exactly one policy on a given day, assuming that there are 5 working days per week?
A:
(1) The probability that he does not sell any policy in a week is
$$P(X = 0) = \frac{3^0 e^{-3}}{0!} = e^{-3} \approx 0.049787$$
Therefore, the probability that he sells at least one policy is 1 - P(X = 0) = 0.950213.
(2) The probability that he sells 2 to 4 policies is P(X = 2) + P(X = 3) + P(X = 4) = 0.2240418 + 0.2240418 + 0.1680314 = 0.6161150.
(3) If he works five days a week, the average number of policies he sells per day is 3/5 = 0.6. Accordingly, the probability that he sells exactly one policy on a given day is
$$P(X = 1) = \frac{0.6^1 e^{-0.6}}{1!} \approx 0.3293$$
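The three answers in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The variable names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

weekly_rate = 3.0  # average policies sold per week

# At least one policy in a week
p_at_least_one = 1 - poisson_pmf(0, weekly_rate)

# 2 or more but fewer than 5 policies in a week
p_two_to_four = sum(poisson_pmf(k, weekly_rate) for k in (2, 3, 4))

# Exactly one policy on a given day, with 5 working days per week
daily_rate = weekly_rate / 5
p_one_in_a_day = poisson_pmf(1, daily_rate)

print(round(p_at_least_one, 6))  # 0.950213
print(round(p_two_to_four, 6))   # 0.616115
print(round(p_one_in_a_day, 6))  # 0.329287
```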
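Example 2 of a Poisson distribution
(1) If there are 3 × 10^9 base pairs in the human genome and the mutation rate per generation per base pair is 10^-9, what is the mean number of new mutations that a child will have, what is the variance of this number, and what will the distribution look like?
A: Mean = 3 × 10^9 × 10^-9 = 3. With n this large and p this small, the number of new mutations is approximately Poisson distributed with λ = 3, so the variance is also approximately 3.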
(2) Generating random sequence fragments is a necessary step in second-generation sequencing. Suppose the average length of randomly generated fragments is 500 bp. If we have a cDNA of 1.5 kb and repeat the random fragmentation 200 times, in how many of those repetitions do we expect to see exactly two fragments?

A: The average number of fragments for a cDNA of 1.5 kb is 1500/500 = 3.
The average number of cuts on a cDNA of 1.5 kb is therefore 2, so λ = 2. Observing two fragments corresponds to exactly one cut:
P(x=1, λ=2) = 2e^-2 = 0.2706706
Out of 200 repetitions, the expected number of times we observe two fragments is 200 × 0.2706706 = 54.13 ≈ 54.
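The fragmentation calculation can be sketched as follows, treating the number of cuts as Poisson distributed with a mean of 2, so that two fragments means exactly one cut. The variable names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

cdna_length = 1500      # bp
mean_fragment_len = 500 # bp
repetitions = 200

mean_fragments = cdna_length / mean_fragment_len  # 3 fragments on average
mean_cuts = mean_fragments - 1                    # 2 cuts on average

p_one_cut = poisson_pmf(1, mean_cuts)  # one cut gives two fragments
expected_times = repetitions * p_one_cut

print(round(p_one_cut, 7))       # 0.2706706
print(round(expected_times, 2))  # 54.13
```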
Example 3 of a Poisson distribution
Cole (1946) Arthropoda data:
Cole scattered boards on a forest floor and periodically counted the arthropod forms under them. A number of questions can be asked: if 102 spiders are distributed at random among 240 pieces of cover, how many of the pieces of cover would we expect to find empty, with one spider, two spiders, ..., n spiders?
A: If spiders arrive at and stay under boards without respect to the number of spiders already under a board, then their occurrences are independent of each other and the counts follow a Poisson distribution.
The average number of spiders per board is 102/240 = 0.425 (both the mean and the variance of the Poisson distribution).
The probability that a given board has no spider is
$$P(X = 0) = \frac{0.425^0 e^{-0.425}}{0!} = e^{-0.425} \approx 0.6537698$$
Out of 240 boards, we would expect to see 240 × 0.6537698 = 156.9 ≈ 157 empty boards.
Similarly, the expected number of boards with three spiders is
$$240 \times P(X = 3) = 240 \times \frac{0.425^3 e^{-0.425}}{3!} \approx 240 \times 0.008365 \approx 2.0$$
These are the expected numbers derived from the Poisson distribution. How do they compare with the observed numbers?
[Table: expected and observed numbers of boards by spider count (Real data.png)]
Whether a spider decides to stay under a board is not affected by the presence or absence of other spiders under that board.
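The expected number of boards in each class for the spider example can be tabulated in one pass. The sketch below covers 0 to 4 spiders per board. That cutoff and the variable names are assumptions for illustration, and the observed counts are not reproduced here.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

n_boards = 240
n_spiders = 102
lam = n_spiders / n_boards  # 0.425 spiders per board on average

# Expected number of boards holding exactly k spiders, for k = 0..4
expected_boards = {k: n_boards * poisson_pmf(k, lam) for k in range(5)}

for k, expected in expected_boards.items():
    print(k, round(expected, 1))
```

The classes beyond 4 spiders contribute far less than one board in expectation, so they are left out of this sketch.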
How about other species?
Similar experiments were done on sowbugs (woodlice).
[Table: expected and observed numbers of boards by sowbug count (Sowbugs.png)]
Sowbugs are social. There are more "zero" boards than expected because sowbugs leave "one"-class boards for more social conditions, which removes boards from the "one" class and places them in the "zero" class.
