Bioinformatics Module Project 3

Introduction

The purpose of this project is to help you gain familiarity with aspects of sequence homology, including DNA and protein alignments, the importance of choosing an appropriate scoring matrix, and the use of the BLAST search tool.[1]

Presented below is a sequence of exploratory steps you should undertake. Each step is accompanied by a question. This is an individual project: each of you will submit, and be graded on, your own answers to the questions. Some of the questions are marked as General Questions. These are unrelated to the specific steps of the exercise and require independent research on your part.

Assignment

1. Basics: Download the myoglobin protein sequences from the pig, chicken and snake. Since these are full protein sequences, we will use the global sequence alignment method (the Needleman-Wunsch algorithm). The MATLAB function that performs this alignment is nwalign. Here is an example of its usage, in which sequences seq1 and seq2 are aligned using the PAM250 scoring matrix. The function returns the score and the alignment.

[score, align] = nwalign(seq1, seq2, 'ScoringMatrix', 'PAM250')

Perform the global alignments using PAM-10 and PAM-250 and complete the similarity matrix shown in Fig. 1. Note that the similarity matrix is symmetric, so only the upper half needs to be computed.

Complete the following matrix:

PAM-10
             Pig        Chicken      Snake
Pig
Chicken
Snake

PAM-250
             Pig        Chicken      Snake
Pig
Chicken
Snake
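
One way to fill in both matrices is to loop over the sequence pairs and the two scoring matrices. The sketch below assumes the Bioinformatics Toolbox is installed. The three short strings are arbitrary toy placeholders (not myoglobin) and the variable names are illustrative; replace the strings with the sequences you downloaded.

```matlab
% Toy placeholder strings, NOT real myoglobin sequences.
% Replace them with the downloaded sequences (for example, read with fastaread).
names    = {'Pig', 'Chicken', 'Snake'};
seqs     = {'MAKLTWVDEGHRLS', 'MAKITWVEEGRLS', 'MSKLSWADDGHKIT'};
matrices = {'PAM10', 'PAM250'};

n = numel(seqs);
for m = 1:numel(matrices)
    S = nan(n);                      % similarity matrix for this scoring matrix
    for i = 1:n
        for j = i:n                  % upper triangle only: the matrix is symmetric
            S(i, j) = nwalign(seqs{i}, seqs{j}, 'ScoringMatrix', matrices{m});
        end
    end
    fprintf('%s\n', matrices{m});
    disp(array2table(S, 'RowNames', names, 'VariableNames', names));
end
```

The diagonal entries (each sequence aligned with itself) are computed as well, which gives a useful upper bound when comparing the off-diagonal scores.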

  • Is a snake more like a chicken or more like a pig?
  • Is a chicken more like a pig or more like a snake?
  • Are the results consistent across the PAM-10 and PAM-250 matrices?
  • Why are the results with PAM-10 higher than PAM-250?

2. The H and N Proteins: Next, we will look at the efficacy of flu vaccines. Influenza, commonly known as the flu, is caused by a virus that attacks the respiratory tract (i.e., the nose, the throat and the lungs). Cold and dry weather allows the virus to survive longer outside the body than warm weather does.

There are three types of influenza virus: A, B and C. Type A can infect humans, other mammals and birds and can spread fast. Types B and C affect only humans; and type C causes only a mild infection. Type A viruses are sub-typed based on proteins on the surface of the virus – specifically proteins hemagglutinin (abbreviated as H or HA) and neuraminidase (abbreviated as N or NA). Types A and B viruses continually evolve genetically with changes being made to the amino acid sequence of the H and N proteins and thus keep eluding the host’s immune system.

Each year's influenza vaccine is designed around the strains that were prevalent in previous years. Generally, the vaccine contains three virus strains: two type A strains and one type B strain. Type C is not included since it causes only a mild infection. Each year in February, the World Health Organization (WHO) picks the three influenza virus strains to be included in the vaccine for that year.

To find information on the strains in a given season’s influenza vaccine, you will need to visit the Centers for Disease Control and Prevention (CDC) site http://www.cdc.gov/flu/weekly/fluactivitysurv.htm and look at “Past Weekly Surveillance Reports” to select the previous influenza season. We will be investigating the 2009-10 influenza season, so you will need to look at the 2008-09 influenza season summary and read the section titled “Composition of the 2009-2010 Influenza Vaccine.”

You will find that WHO recommended that the 2009–10 Northern Hemisphere influenza vaccine contain (1) A/Brisbane/59/2007-like (H1N1), (2) A/Brisbane/10/2007-like (H3N2), and (3) B/Brisbane/60/2008-like (B/Victoria lineage) viruses.

Antigenic Characterization for 2009 - 2010: CDC antigenically characterized 2 seasonal influenza A (H1N1), 14 influenza A (H3N2), 43 influenza B, and 1,904 pandemic H1N1 viruses with 99.5% of pandemic H1N1 viruses being related to the A/California/07/2009 (H1N1) reference virus.

The 2 seasonal influenza A (H1N1) viruses tested were related to A/Brisbane/59/2007. The 14 influenza A (H3N2) viruses tested were related to A/Brisbane/10/2007, and were antigenically related to A/Perth/16/2009. Of the 43 influenza B viruses tested, 38 (88.4%) were related to B/Brisbane/60/2008. Five (11.6%) viruses tested belonged to the B/Yamagata lineage.

  • Compare the sequences of the viruses in the 2009-10 vaccine to the A/California/07/2009 virus, which required the separate 2009 pandemic H1N1 vaccine. Specifically, do pairwise comparisons of the hemagglutinin proteins and of the neuraminidase proteins. Overall, how similar are the pandemic virus's hemagglutinin and neuraminidase sequences to those of the viruses used in that season's vaccine?
  • Based on sequence similarity, how well do you think the vaccine protected a vaccinated person from the different strains of the 2009–2010 season?
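
A pairwise comparison can be summarized as a percent identity taken from the nwalign output. The following is a minimal sketch, assuming the Bioinformatics Toolbox; the two strings are arbitrary toy placeholders rather than hemagglutinin or neuraminidase sequences, and the choice of BLOSUM62 is an assumption, not something the assignment prescribes.

```matlab
% Arbitrary toy strings, NOT real viral protein sequences.
seqVaccine  = 'MAKLTWVDEGHRLSAAK';
seqPandemic = 'MAKITWVEEGRLSAVK';

[score, aln] = nwalign(seqVaccine, seqPandemic, 'ScoringMatrix', 'BLOSUM62');

% aln is a 3-row character array. Row 2 marks identical residues with '|'
% and positive-scoring substitutions with ':'.
nIdentical  = sum(aln(2, :) == '|');
nColumns    = size(aln, 2);                 % aligned columns, gaps included
pctIdentity = 100 * nIdentical / nColumns;

fprintf('Score %.1f, identity %.1f%% over %d aligned columns\n', ...
        score, pctIdentity, nColumns);
```

Dividing by the number of aligned columns (gaps included) is one of several common conventions; dividing by the shorter sequence length gives a different percentage, so state which one you use.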

3. BLAST: We have cloned a new mouse cDNA, and it has an open reading frame. We want to use BLAST (https://blast.ncbi.nlm.nih.gov/) to characterize the protein that this mRNA encodes:

MKRKFVGAAIGGALAVAGAPVALSAVGFTGAGIAAGSIAAKMMSAAAIANGGGIAAGGLVATLQSVGVLGL

STITNIILVAVGTATGARAEGSMGASREQESGPQDPPQELQEPQEPPSCKKQDLNLGKFVGAAIGGALAVA

GAPIALSAVGFTGAGIAAGSIAAKMMSAAAIANGGGIAAGGLVATLQSVGILGLSTSTNIILGAVGAATGA

TAAGAMGACREQEPGLQDLQQEPKEPQEPQELQKQQEPQEPQELQKQQETQETQETQELQKTQEPPSYEK

Configure BLAST to run a search using all of the following parameters: the NR database, the blastp algorithm, the filter on, and the BLOSUM62 matrix.

  • What is the purpose of running the search with the filter on?
  • From the graphic output, what conserved domain is in our protein, and what region does it span?
  • How many hits are there? How many hits (from the list) have an E-value < 0.05? How many hits (from the list) have 0.05 < E-value < 1.00?
  • Provide the accession numbers, descriptions and E-values for the 5 best target sequences.
  • Review the taxonomy report. To which organism do the hits belong?
  • Now run the search without the filter. How do the answers to the above questions change?
  • Look at the top hit, and open its RefSeq entry. What is its accession number? What species is it from? How long is it? What is its name? According to the GenBank record, from where to where is the domain that BLAST found?
  • Download this GenBank RefSeq record to a local file, and compute its alignment with our sequence using the MATLAB nwalign function. Include the alignment with your answer. Is this a true hit? Why or why not?
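
The last step could be sketched as follows, assuming the Bioinformatics Toolbox and assuming the record was saved in GenPept (protein) format. The file name top_hit.gp is hypothetical, and BLOSUM62 is chosen only to match the BLAST search settings.

```matlab
% Query protein from the assignment, joined from its four lines.
query = ['MKRKFVGAAIGGALAVAGAPVALSAVGFTGAGIAAGSIAAKMMSAAAIANGGGIAAGGLVATLQSVGVLGL' ...
         'STITNIILVAVGTATGARAEGSMGASREQESGPQDPPQELQEPQEPPSCKKQDLNLGKFVGAAIGGALAVA' ...
         'GAPIALSAVGFTGAGIAAGSIAAKMMSAAAIANGGGIAAGGLVATLQSVGILGLSTSTNIILGAVGAATGA' ...
         'TAAGAMGACREQEPGLQDLQQEPKEPQEPQELQKQQEPQEPQELQKQQETQETQETQELQKTQEPPSYEK'];

recordFile = 'top_hit.gp';   % hypothetical name for the saved GenPept record

if isfile(recordFile)
    hit = genpeptread(recordFile);           % struct with a Sequence field
    [score, aln] = nwalign(query, hit.Sequence, 'ScoringMatrix', 'BLOSUM62');
    fprintf('Global alignment score: %.1f\n', score);
    disp(aln);
else
    fprintf('Save the top hit as %s before running the alignment.\n', recordFile);
end
```

If the record you saved is a nucleotide (mRNA) entry instead, genbankread and the translated coding sequence would be needed in place of genpeptread.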

4. Dynamic Programming Grid: You are given five coding sequence fragments. These are thought to encode homologous proteins in different species. The sequences begin with the start codon:

  1. ATGCCGGCGGGCATGACGAAGCATGGCTCCCGCTCCACCAGCTCG
  2. ATGCCCGGGTGGATGAATAAGCATGGATCTCGATCGACTACCTCG
  3. ATGCCGGCGGGCATGACGAAGCATGGCTCGCGCTCCACCAGCTCG
  4. ATGGTCGGCGAACGCGACAGGGACCGTGAGGCGGTACGCTGGGCA
  5. ATGGTCGGCGAACGCGACAGGGACCGATGAGGCGGATACGCTGGG
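
Because each fragment begins at the start codon, the reading frame is fixed: codon k occupies nucleotides 3k-2 through 3k. A minimal MATLAB sketch of that indexing, using sequence 1 from the list above (the variable names are illustrative, and no toolbox is needed):

```matlab
seq1 = 'ATGCCGGCGGGCATGACGAAGCATGGCTCCCGCTCCACCAGCTCG';

% Split the fragment into codons: each column of the reshaped array is one codon.
codons = cellstr(reshape(seq1, 3, [])');

firstCodon = 10;
lastCodon  = 12;
region = seq1(3*firstCodon - 2 : 3*lastCodon);   % nucleotides 28 through 36

fprintf('Codons %d-%d: %s\n', firstCodon, lastCodon, ...
        strjoin(codons(firstCodon:lastCodon)', ' '));
```

The same indexing applies to the other fragments, provided no insertion or deletion has shifted their reading frame.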

Answer the following:

  • Construct the dynamic programming grid to compare codon 10 through codon 12 of sequence 1, which is the human gene, to the same region of sequences 2 through 4, using the scoring: match +2, mismatch -1, indel -1.
  • For each case, write out the optimal alignment and give its score.
  • What can you say about the relatedness of the species? Which species is likely a better model of the human version: 2, 3 or 4?
  • Sequence 5 is derived from sequence 4, but now has a mutation with potentially rather severe consequences. Use the MATLAB alignment programs to find this mutation, and then use the genetic code to explain why this mutation is so severe. Can you identify the phenotype of the mutation? Hint: Use BLAST to find the gene to which these sequences belong, and OMIM to track down the disease.
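
To check a hand-built grid, the recurrence can be written in a few lines of plain MATLAB. This is a sketch under stated assumptions: it uses the scoring given above (match +2, mismatch -1, indel -1), the two strings are toy examples rather than the assignment fragments, and ties in the traceback are broken in the order diagonal, up, left, so other equally optimal alignments may exist.

```matlab
a = 'GATTACA';          % toy sequences, not the assignment fragments
b = 'GCATGCA';
matchScore = 2; mismatchScore = -1; indel = -1;

n = numel(a); m = numel(b);
F = zeros(n + 1, m + 1);            % F(i+1, j+1): best score for a(1:i) vs b(1:j)
F(:, 1) = (0:n)' * indel;           % first column: a aligned against gaps only
F(1, :) = (0:m)  * indel;           % first row: b aligned against gaps only
for i = 1:n
    for j = 1:m
        if a(i) == b(j), s = matchScore; else, s = mismatchScore; end
        F(i+1, j+1) = max([F(i, j) + s, F(i, j+1) + indel, F(i+1, j) + indel]);
    end
end

% Traceback from the bottom-right cell to recover one optimal alignment.
i = n; j = m; top = ''; bot = '';
while i > 0 || j > 0
    if i > 0 && j > 0
        if a(i) == b(j), s = matchScore; else, s = mismatchScore; end
    end
    if i > 0 && j > 0 && F(i+1, j+1) == F(i, j) + s
        top = [a(i) top]; bot = [b(j) bot]; i = i - 1; j = j - 1;   % diagonal
    elseif i > 0 && F(i+1, j+1) == F(i, j+1) + indel
        top = [a(i) top]; bot = ['-' bot]; i = i - 1;               % gap in b
    else
        top = ['-' top]; bot = [b(j) bot]; j = j - 1;               % gap in a
    end
end

disp(F);
disp(top);
disp(bot);
fprintf('Optimal score: %d\n', F(end, end));
```

Rows of F correspond to prefixes of a and columns to prefixes of b, which is the same layout as a grid drawn by hand with a down the side and b across the top.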

References

[1] Gautam B. Singh. Fundamentals of Bioinformatics and Computational Biology, Chapter 5. Springer, 2015.