Problem Set #1

Due on January 22nd by 1:50 pm

Please email your pset answers, together with last week’s lab answers, to me (emcintire@uchicago.edu) as a single PDF.

  1. (15 points) In December last year, 109 babies were born at the Motherland hospital. Among those, 51 were girls. Assume a binomial distribution for the number of girls, with parameter θ (the rate of female births at this hospital):
  • a) write down the likelihood for this data
  • b) plot the likelihood as a function of θ (some of the code here may be helpful https://hakyimlab.github.io/hgen471/L1-binomial-parameter-posterior.html)
  • c) find the maximum likelihood estimate of θ
  • d) calculate and plot the unnormalized posterior density for θ using a uniform prior for θ (ask the TA or the instructor if you don’t know how to do this)
  • e) what’s the relationship between the MLE and the posterior distribution of θ when you have a uniform prior?
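The grid approach behind parts (b) through (d) can be sketched in R as follows. This is only an illustration under stated assumptions: the counts (7 successes in 20 trials) are made up rather than taken from the hospital data, and the variable names are hypothetical.

```r
# Hypothetical data for illustration only (not the counts in the problem)
n <- 20   # number of trials
k <- 7    # number of successes

# Grid of candidate values for theta on (0, 1)
theta <- seq(0.001, 0.999, by = 0.001)

# Binomial likelihood evaluated at each grid point
lik <- dbinom(k, size = n, prob = theta)

# Unnormalized posterior = prior density * likelihood (uniform prior on [0, 1])
prior <- dunif(theta, min = 0, max = 1)
unnorm_post <- prior * lik

# Grid value at which the likelihood is largest
theta_hat <- theta[which.max(lik)]

# Plot the likelihood and mark the grid maximizer
plot(theta, lik, type = "l", xlab = expression(theta), ylab = "likelihood")
abline(v = theta_hat, lty = 2)
```

A finer grid gives a more precise maximizer; swapping in your own `n` and `k` is the only change needed to reuse the sketch.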
  2. (10 points) Early onset Alzheimer’s disease (AD) is very rare; for illustrative purposes, assume its prevalence is 0.1% among adults aged 30-60. Rare variants in 3 genes, APP, PSEN1 and PSEN2, have been identified as causing early onset AD in a dominant fashion, with P(AD | any of the three variants) = 1. Early onset AD can also be caused by head injury, and many other non-genetic factors have been suggested. In a series of 101 cases of early onset AD, only 7 (or approximately 7%) were found to have these variants in APP, PSEN1 or PSEN2; that is, the attributable risk due to the three rare variants is low. For simplicity, assume that variants in these 3 genes are so rare that P(no variant in any gene) ≈ 1. Let the disease allele D symbolize a variant in any one of the three genes, let d denote no variant, and let Y=1 mean that AD is present. Estimate the probability of a phenocopy, P(Y=1 | dd) (also known as the phenocopy rate), for these genes combined, using the data given and Bayes’ rule.

  3. (10 points) Suppose we are dealing with a quantitative recessive trait, which is distributed as N(μ,1) when there are two variants, and N(0,1) otherwise. Calculate the probability that a randomly selected person with two variants has a higher trait value than a randomly selected person with one or no variants, when μ=0.5 and when μ=2.

    • Hint: remember that the difference of two independent normal r.v.s is another normal r.v. with mean = difference of the means and variance = sum of the variances. Also, the event {Z2 > Z1} is equivalent to {Z2 - Z1 > 0}. In R, pnorm(x, mean, sd) gives the probability that a normal r.v. is <= x; note that the third argument is the standard deviation, not the variance.
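As an illustration of the hint, here is a minimal R sketch with made-up distributions, X ~ N(1, 4) and W ~ N(0, 9), which are not the ones in the problem. The names `mu_x`, `var_x`, and so on are hypothetical. The point to notice is that `pnorm` expects a standard deviation, so the summed variance has to be square-rooted first.

```r
# Hypothetical, independent normal variables (not the ones in the problem)
mu_x <- 1; var_x <- 4   # X ~ N(1, 4)
mu_w <- 0; var_w <- 9   # W ~ N(0, 9)

# X - W is normal with mean = difference of means, variance = sum of variances
mu_diff  <- mu_x - mu_w
var_diff <- var_x + var_w

# pnorm takes the standard deviation, so convert the variance
sd_diff <- sqrt(var_diff)

# P(X > W) = P(X - W > 0) = 1 - P(X - W <= 0)
p_greater <- 1 - pnorm(0, mean = mu_diff, sd = sd_diff)

# Equivalent form using the upper tail directly
p_greater_alt <- pnorm(0, mean = mu_diff, sd = sd_diff, lower.tail = FALSE)

print(p_greater)
```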
  4. (15 points) Consider a population of 2000 chromosomes; 1000 carry an A allele at a marker and 1000 carry a. Now suppose a disease mutation (+) arises on one chromosome bearing an A allele, and all the rest of the chromosomes have - at that location.

  • a) What are the marginal frequencies of the marker and the Disease Susceptibility Locus (DSL for short; it refers to the location of the mutation)?
  • b) Fill in the 2×2 table of marker and disease mutation haplotypes.
  • c) What is the LD coefficient, D, for this table?
  • d) What is the correlation between the marker and DSL?
  • e) Repeat the questions above, now assuming only 100 chromosomes, one mutation on the same haplotype as an A allele, and a 50/50 split of A and a alleles.
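The quantities asked for in this problem can be computed from any 2×2 haplotype table along the following lines. This is a sketch under assumptions: the counts are made up (they are not the table from the problem), the names are hypothetical, and it uses the common definitions D = P(A+) - P(A)P(+) and r = D / sqrt(P(A)P(a)P(+)P(-)); check these against the sign convention used in class.

```r
# Hypothetical haplotype counts (rows: marker allele, columns: DSL allele)
hap <- matrix(c(40, 10,
                20, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(marker = c("A", "a"), dsl = c("+", "-")))

# Convert counts to haplotype frequencies
freq <- hap / sum(hap)

# Marginal allele frequencies at the marker and at the DSL
p_A    <- sum(freq["A", ])
p_plus <- sum(freq[, "+"])

# LD coefficient: observed A+ haplotype frequency minus the product of marginals
D <- freq["A", "+"] - p_A * p_plus

# Correlation between the marker and the DSL
r <- D / sqrt(p_A * (1 - p_A) * p_plus * (1 - p_plus))

print(c(D = D, r = r))
```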

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.
