Problem Set #1
Due on January 22nd by 1:50 pm
Please email your pset answers, together with last week’s lab answers, to me (emcintire@uchicago.edu) as a single PDF.
- (15 points) In December last year, 109 babies were born at the Motherland hospital. Among those, 51 were girls. Assume a binomial distribution for the number of girls, with parameter θ (the rate of female birth at this hospital):
- a) write down the likelihood for this data
- b) plot the likelihood as a function of θ (some of the code at https://hakyimlab.github.io/hgen471/L1-binomial-parameter-posterior.html may be helpful)
- c) find the maximum likelihood estimate of θ
- d) calculate and plot the unnormalized posterior density for θ using a uniform prior for θ (ask the TA or the instructor if you don’t know how to do this)
- e) what is the relationship between the MLE and the posterior distribution of θ when you use a uniform prior?
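As a starting point for parts b) and d), the likelihood can be evaluated on a grid of θ values. This is only a minimal sketch: the variable names and the grid spacing of 0.001 are illustrative choices, not something prescribed by the problem, and the linked course code may take a different approach.

```r
# Data from the problem statement
n_births <- 109
n_girls  <- 51

# Grid of candidate values for theta (the spacing is an arbitrary choice)
theta <- seq(0, 1, by = 0.001)

# Binomial likelihood of the observed count at each value of theta
likelihood <- dbinom(n_girls, size = n_births, prob = theta)

# Unnormalized posterior = prior * likelihood; a uniform prior is constant in theta
prior <- rep(1, length(theta))
unnorm_posterior <- prior * likelihood

plot(theta, likelihood, type = "l",
     xlab = expression(theta), ylab = "Likelihood")
```

Calling `theta[which.max(likelihood)]` gives a grid-based check on the analytical answer to part c).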
- (10 points) Early onset Alzheimer’s disease (AD) is very rare; for illustrative purposes, assume its prevalence is 0.1% among adults aged 30-60. Rare variants in 3 genes, APP, PSEN1 and PSEN2, have been identified as causing early onset AD in a dominant fashion, with P(AD | any of the three variants) = 1. Early onset AD can also be caused by head injury, and many other non-genetic factors have been suggested. In a series of 101 cases of early onset AD, only 7 (approximately 7%) were found to have these variants in APP, PSEN1 or PSEN2; that is, the attributable risk due to the three rare variants is low. For simplicity, assume that variants in these 3 genes are so rare that P(no variant in any gene) ≈ 1. Let the disease allele D symbolize a variant in any one of the three genes, let d symbolize no variant, and let Y=1 mean that AD is present. Estimate the probability of a phenocopy, P(Y=1 | dd) (also known as the phenocopy rate), for these genes combined, using the data given and Bayes’ rule.
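One possible way to organize the Bayes' rule calculation is sketched below. It assumes P(dd | Y=1) is estimated by the fraction of the 101 cases without a variant, and treats P(dd) as 1 per the simplification in the problem; the variable names are illustrative only.

```r
# Quantities taken from the problem statement
p_ad       <- 0.001  # P(Y = 1): assumed prevalence among adults aged 30-60
n_cases    <- 101    # early onset AD cases in the series
n_carriers <- 7      # cases with a variant in APP, PSEN1 or PSEN2
p_dd       <- 1      # P(dd): taken as approximately 1, per the problem

# P(dd | Y = 1), estimated from the case series (an assumption of this sketch)
p_dd_given_ad <- (n_cases - n_carriers) / n_cases

# Bayes' rule: P(Y = 1 | dd) = P(dd | Y = 1) * P(Y = 1) / P(dd)
phenocopy_rate <- p_dd_given_ad * p_ad / p_dd
print(phenocopy_rate)
```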
- (10 points) Suppose we are dealing with a quantitative recessive trait, which is distributed as N(μ,1) when there are two variants, and N(0,1) otherwise. Calculate the probability that a randomly selected person with two variants has a higher trait value than a randomly selected person with one or no variants, when μ=0.5 and when μ=2.
- Hint: remember that the difference of two independent normal r.v.s is another normal r.v. with mean = difference of the means and variance = sum of the variances. Also, the event {Z2 > Z1} is equivalent to {Z2 - Z1 > 0}. In R, pnorm(x, mean, sd) gives the probability that a normal r.v. is <= x; note that the third argument is the standard deviation, not the variance.
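The hint can be turned into a short helper. This is a sketch under the assumption that the two people are sampled independently; `prob_higher` is a hypothetical name introduced here for illustration.

```r
# P(X2 > X1) for independent X2 ~ N(mu, 1) and X1 ~ N(0, 1)
prob_higher <- function(mu) {
  diff_mean <- mu - 0       # mean of X2 - X1
  diff_sd   <- sqrt(1 + 1)  # variances add; pnorm expects a standard deviation
  # P(X2 - X1 > 0) = 1 - P(X2 - X1 <= 0)
  1 - pnorm(0, mean = diff_mean, sd = diff_sd)
}

prob_higher(0.5)
prob_higher(2)
```

A quick sanity check is that `prob_higher(0)` should equal 0.5, since the two distributions are then identical.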
- (15 points) Consider a population of 2000 chromosomes: 1000 carry an A allele at a marker and 1000 carry a. Now suppose a disease mutation (+) arises on one chromosome bearing an A allele, and all the other chromosomes carry - at that location.
- a) What are the marginal allele frequencies at the marker and at the Disease Susceptibility Locus (DSL for short; it refers to the location of the mutation)?
- b) Fill in the 2×2 table of marker and disease mutation haplotypes.
- c) What is the LD coefficient, D, for this table?
- d) What is the correlation between the marker and DSL?
- e) Repeat the questions above, now assuming only 100 chromosomes, one mutation on the same haplotype as an A allele, and a 50/50 split of A and a alleles.
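For parts c) to e), the two scenarios can be checked with one small function. The sketch below assumes the usual definitions D = P(A+) - P(A)P(+) and r = D / sqrt(P(A)(1 - P(A))P(+)(1 - P(+))); `ld_stats` and its argument names are hypothetical, chosen only for this illustration.

```r
# LD coefficient D and correlation r from haplotype counts
#   n_total  : total number of chromosomes
#   n_A      : chromosomes carrying marker allele A
#   n_plus   : chromosomes carrying the disease mutation (+)
#   n_A_plus : chromosomes carrying both A and +
ld_stats <- function(n_total, n_A, n_plus, n_A_plus) {
  p_A      <- n_A / n_total
  p_plus   <- n_plus / n_total
  p_A_plus <- n_A_plus / n_total
  D <- p_A_plus - p_A * p_plus
  r <- D / sqrt(p_A * (1 - p_A) * p_plus * (1 - p_plus))
  c(D = D, r = r)
}

ld_stats(2000, 1000, 1, 1)  # scenario in parts a) to d)
ld_stats(100, 50, 1, 1)     # scenario in part e)
```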