University of Texas, Arlington

                                    Computer Science and Engineering

                                     CSE5370   Bioinformatics


                         Instructor:    Jean Gao
                             Office:    538 Engineering Research Building, phone: 817-272-3628
                       Office Hours:    Monday and Wednesday, 12:00 - 1:00pm, or by appointment
                 Course Information:    Tue & Thu, 12:30 - 1:50pm
                          Classroom:    ERB 129

                                 TA:    Mingon Kang
                             Office:    544 Engineering Research Building
                       Office Hours:    Tuesday and Thursday, 11:00am - 12:00pm


Course Description:  (class schedule)

Biological sciences are undergoing a revolution in how they are practiced. In the last decade, a vast amount of data (DNA sequences,
protein sequences, etc.) has become available, and computational methods are playing a fundamental role in transforming this data into scientific understanding.

    Bioinformatics involves developing and applying computational methods for managing and analyzing information about the sequence, structure, and function of biological molecules and systems.  Topics will include the evolutionary organization of genes (genomics), the structure and function of gene products (proteomics), and the dynamics of gene expression in biological processes (transcriptomics).


    The course aims to give students an understanding of the fundamental computational problems in molecular biology and genomics, and of a core set of widely used algorithms in computational biology.  It is intended to give students a working knowledge of a variety of publicly available data and computational tools important in bioinformatics, and a grasp of the underlying principles of contemporary bioinformatics.


          1. A background in biology is not required, but students should be interested in catching up quickly on relevant topics.
          2. Half of the homework assignments are programming assignments.  Students are free to choose any language they are comfortable with.
              For those who prefer MATLAB, the MATLAB Bioinformatics Toolbox is installed in the public labs at Ransom Hall and Nedderman Hall.


            N. Jones & P. Pevzner, "An Introduction to Bioinformatics Algorithms," 2004, ISBN 0262101068.


          --  D. W. Mount, "Bioinformatics: Sequence and Genome Analysis," Cold Spring Harbor Laboratory Press,
                 Cold Spring Harbor, N.Y., 2001, xii, 564 pp., ISBN 0879696087.
                 (The UTA library has an electronic version of this book.)

          --  R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, "Biological Sequence Analysis: Probabilistic Models of
                 Proteins and Nucleic Acids," Cambridge University Press, 1998.

          --  A. Malcolm Campbell and Laurie Heyer, "Discovering Genomics, Proteomics and Bioinformatics,"
                 Benjamin Cummings, 2003, ISBN 0-8053-4722-4.

          --  L. Stryer, "Biochemistry," 5th ed., W. H. Freeman and Co.


          Homework & Projects                     50%
                 -- There will be about 5 homework assignments.
                 -- Most are programming projects; some are written exercises.

          Exam                                    30%
                 -- There will be one exam, two weeks before finals week.

          Class Presentation                      15%
                 -- Each student will give a class presentation on one of a set of selected topics and write
                     a critique of the presented topic.
                 -- Grading will be based on clarity of presentation, preparedness, understanding of the problem, and
                      quality of the slides.

          Class Attendance                         5%

          Homework Policy:
                -- All assignments are due at class time on the due date.  Hard copies, including source code, should be turned in at class.
                    Source code should also be emailed to the TA before then.
                -- Emails and phone calls about an assignment will not be answered within 24 hours of its due time.
                -- Late submissions will be penalized 10% of the assignment score per 24 hours late.

      Academic Misconduct:

          All homework assignments must be done individually.  Cheating and plagiarism will result in a default "F" grade for this course.
          Code for programming assignments must NOT be developed in groups, nor should it be shared.  Discussions with peers or the TA about
          approaches and techniques are encouraged, but not at the level of implementation details.

     Tentative Topics:

        1. Introduction
            -- Introduction to bioinformatics
            -- Whirlwind tour of Chem/MolBio/BioChem
            -- Primer on probability theory

        2. Genomics
            -- Tools for sequence alignment and database searches
            -- Pairwise sequence alignment
            -- Sequence database search
            -- Multiple sequence alignment
            -- Hidden Markov Models (HMM)
            -- Gene Finding

        3. Proteomics
            -- Protein structure and its prediction
            -- Structure alignment
            -- Phylogenetic inference
            -- Molecular modeling (mechanics & dynamics)
            -- Protein threading

         4. Functional Genomics and Proteomics
           -- Construction and use of microarrays
           -- Statistical analysis of microarray data: clustering methods
           -- Microarray analysis: dimensionality reduction
           -- Systems biology and pathway study
           -- Mass spectrometry for proteomics: protein selection and identification