Du Lab Research
Computational Mass Spectrometry

Location

Our lab is located on the North Carolina Research Campus (NCRC). This campus is home to research labs from eight universities across North Carolina. In addition to experimental capabilities in individual labs, the David H. Murdock Research Institute on the campus provides a broad range of multidisciplinary technologies. We are affiliated with the Department of Bioinformatics & Genomics at the University of North Carolina at Charlotte.

Research

Our current research focuses on developing novel algorithms to analyze and visualize large-scale mass spectrometry-based proteomics and metabolomics data and to extract the maximum amount of biologically meaningful information from the data. Our long-term research goal is to integrate heterogeneous types of -omics data and to develop predictive models of biological systems that contribute to nutrigenomics research and applications. Current and past projects include:

  • Proteomics Data Management System

  • Computational Platforms for GC-MS and LC-MS Metabolomics Data Analysis

  • A Computational Platform for Ion Mobility Mass Spectrometry Data Analysis

  • Identification of Chemically Crosslinked Peptides using Tandem Mass Spectrometry

    Chemical crosslinking combined with mass spectrometry is a powerful method for identifying protein-protein interactions and for studying the structure of proteins and protein complexes. Crosslinking covalently links two proteins, or two sites within one protein, using a crosslinking reagent. After proteolytic cleavage, the crosslinked peptides are identified using tandem mass spectrometry.

    We have developed Xlink-Identifier, a comprehensive data analysis platform for the label-free experimental approach. It can identify inter-peptide, intra-peptide, and deadend crosslinks as well as regular peptides. All four types of peptides are scored with a single, universal scoring mechanism, which makes it possible to select the best candidate when peptides of several types match a single spectrum. Xlink-Identifier was designed for general-purpose analyses: the crosslinker can be amine-reactive or specific to any other amino acid(s). It streamlines data pre-processing, peptide scoring, and visualization, and provides an overall data analysis strategy for studying protein-protein interactions and protein structure using mass spectrometry.

    The software has been tested on both commercial and custom-synthesized crosslinkers, the latter featuring an enrichment tag. The distance constraints obtained from the identified crosslinks are consistent with the crystal structure of the protein under investigation. Xlink-Identifier has the potential to identify protein-protein interactions on a large scale using tandem mass spectrometry.

    Link to paper
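
    As a rough illustration of why the four peptide types above can compete for the same spectrum, the sketch below computes the theoretical neutral mass of each type. It is not Xlink-Identifier's scoring mechanism. The function names are hypothetical, and the crosslinker mass shifts are assumed example values for a DSS/BS3-type amine-reactive reagent, which the text above does not name.

```python
# Monoisotopic amino acid residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# ASSUMPTION: example mass shifts for a DSS/BS3-type crosslinker.
LINK_SHIFT = 138.06808     # both reactive ends attached (inter- or intra-peptide)
DEADEND_SHIFT = 156.07864  # one end attached, the other end hydrolyzed


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def candidate_masses(pep_a, pep_b=None):
    """Theoretical masses of the peptide types that involve pep_a.

    The inter-peptide entry is included only when a second peptide is given.
    """
    mass_a = peptide_mass(pep_a)
    masses = {
        "regular": mass_a,
        "deadend": mass_a + DEADEND_SHIFT,
        "intra-peptide": mass_a + LINK_SHIFT,
    }
    if pep_b is not None:
        masses["inter-peptide"] = mass_a + peptide_mass(pep_b) + LINK_SHIFT
    return masses


if __name__ == "__main__":
    # Hypothetical peptides chosen only for illustration.
    for kind, mass in candidate_masses("GK", "AK").items():
        print(f"{kind:14s} {mass:.5f}")
```

    A search engine would compare such masses with the measured precursor mass within a tolerance before scoring fragment ions; that later step is outside this sketch.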

  • Estimation of the False Discovery Rate for Phosphopeptide Identifications

    The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similar high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. We have developed a data analysis pipeline designed to address these issues. The first step is to reanalyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the peptide scores. A linear discriminant analysis is then performed to determine how to optimally combine peptide scores into a discriminant score that possesses the maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated.

    Link to paper
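
    The last step of the pipeline, turning per-identification p-values into q-values, can be sketched as follows. The text above does not say which estimator is used, so this sketch assumes the Benjamini-Hochberg step-up adjustment as a stand-in. The function name and the example p-values are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, used here as q-values.

    ASSUMPTION: the source does not specify this estimator; it is a
    common choice shown only for illustration.
    """
    n = len(p_values)
    # Indices of the identifications, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never decrease
    # as p-values increase.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical p-values for four phosphopeptide identifications.
    p = [0.01, 0.04, 0.03, 0.20]
    for p_val, q_val in zip(p, bh_q_values(p)):
        print(f"p = {p_val:.2f}  q = {q_val:.4f}")
```

    Accepting all identifications with a q-value at or below a chosen threshold then gives a set whose estimated FDR is at most that threshold, under the assumptions of the estimator.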

  • A Computational Strategy to Analyze Label-Free Temporal Bottom-Up Proteomics Data

    Biological systems are in a continual state of flux, which necessitates an understanding of the dynamic nature of protein abundances. The study of protein abundance dynamics has become feasible with recent improvements in mass spectrometry-based quantitative proteomics. However, extracting biological information from dynamic proteomics data remains challenging because of extraneous variability, missing abundance values, and the difficulty of identifying significant temporal patterns. We have developed a strategy to address these issues.

    Link to paper


Copyright @ 2009-2010 by Xiuxia Du. All rights reserved. Last modified on 11-21-2008.