IC4R017-RNA-Seq-2014-24518221

From RiceWiki

Project Title

  • Time-series RNA-seq analysis package (TRAP) and its application to the analysis of rice, Oryza sativa L. ssp. Japonica, upon drought stress


The Background of This Project

  • High-throughput RNA sequencing (RNA-seq) has advantages over microarrays, such as a lower noise level and more reproducible results. For this reason, the number of time-series RNA-seq data sets in the public domain has increased dramatically over the past few years.
  • The main issue with RNA-seq is handling sequencing data that are much larger than microarray data. The researchers developed a comprehensive package, Time-series RNA-seq Analysis Package (TRAP), for analyzing time-series transcriptome data. Many packages have been developed for analyzing time-series gene expression data, and the most widely used technique is to identify differentially expressed genes (DEGs). Packages for finding DEGs from time-series data include SAM, LIMMA, EDGE, maSigPro and BETR. However, most of them were developed for microarray data and assume that gene expression follows a normal distribution. Users with RNA-seq data must therefore either perform an additional conversion step or use a specific distribution (e.g., Poisson or Binomial), which needs further validation.
  • Finding DEGs and clustering them is the first step in identifying genes that may play an important role in relation to phenotypes. However, the analysis needs to go at least one step further to extract biological implications from the gene list. The most widely used method for this additional step is pathway analysis.
  • This article describes the Time-series RNA-seq Analysis Package (TRAP), which integrates all necessary tasks for time-series data: finding DEGs, clustering and pathway analysis.


Plant Culture & Treatment

  • The analysis starts with selecting DEGs. The researchers define DEGs as genes whose expression values differ significantly between the control and treated samples. For the expression values X(g) and Y(g) of gene g in two different samples X and Y, the log fold change of FPKM values is defined as DE(g) = log(Y(g)/X(g)). DEGs are chosen either by Cuffdiff, which tests the log fold change against the null hypothesis, or by picking genes above a user-defined cutoff value.
  • Given a set of gene expression values from Lt time points, TRAP extends two pathway analysis algorithms, ORA and SPIA, to time-series ORA and time-series SPIA for the analysis of time-series RNA-seq data. The output is a list of DEGs with annotations and a list of pathways with P-values from the two analysis methods.
  • Time-series clustering finds groups of genes with similar expression patterns. Given a set of gene expression values from Lt time points, TRAP labels each gene by its expression pattern and clusters the genes that have the same label vector. This procedure is called gene expression change labeling. Consider two expression values of a gene g, Xt(g) and Yt(g), at the same time point t. If g is up-regulated in the Y sample and the log fold change DEt(g) = log(Yt(g)/Xt(g)) is above the threshold, the gene is labeled U at that time point. The labels D and C indicate down-regulated and constant, respectively.
  • TRAP produces the pathway analysis result in both text and image forms. The text result includes a list of pathways, the size of each pathway, the number of DEGs, their P-values (PNDE, PPERT, PG) and FDR-corrected P-values. The image file contains an undirected graph in which nodes are pathways, node size represents the number of genes in a pathway, and node color indicates the P-value of a pathway.
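
The log fold change and the gene expression change labeling described above can be illustrated with a minimal Python sketch. It is not TRAP's actual code. The base-2 logarithm, the small pseudocount that avoids division by zero, the symmetric cutoff used for the D label, and the gene names and FPKM values are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def log_fold_change(x, y, pseudocount=1e-9):
    """DE(g) = log(Y(g)/X(g)) for FPKM values x (sample X) and y (sample Y).

    Assumptions: base-2 logarithm and a tiny pseudocount to avoid log(0).
    """
    return math.log2((y + pseudocount) / (x + pseudocount))


def label(de, threshold=1.0):
    """Map one log fold change to U (up), D (down) or C (constant).

    Assumption: D uses the mirrored cutoff -threshold.
    """
    if de > threshold:
        return "U"
    if de < -threshold:
        return "D"
    return "C"


def cluster_by_labels(control, treated, threshold=1.0):
    """Group genes whose label vectors over all time points are identical."""
    clusters = defaultdict(list)
    for gene, x_series in control.items():
        y_series = treated[gene]
        vector = "".join(
            label(log_fold_change(x, y), threshold)
            for x, y in zip(x_series, y_series)
        )
        clusters[vector].append(gene)
    return dict(clusters)


# Hypothetical FPKM values at three time points (not taken from the study).
control = {
    "geneA": [10.0, 10.0, 10.0],
    "geneB": [8.0, 8.0, 8.0],
    "geneC": [5.0, 20.0, 5.0],
}
treated = {
    "geneA": [10.0, 40.0, 80.0],
    "geneB": [8.0, 40.0, 64.0],
    "geneC": [5.0, 4.0, 5.0],
}

clusters = cluster_by_labels(control, treated)
print(clusters)  # {'CUU': ['geneA', 'geneB'], 'CDC': ['geneC']}
```

Genes geneA and geneB end up in the same cluster because they share the label vector CUU, even though their absolute expression levels differ.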


Illumina sequencing

  • To evaluate TRAP, the researchers used rice (Oryza sativa L. ssp. Japonica, cultivar Nipponbare) mRNA-seq data generated by Illumina sequencing. The dataset compares drought-resistant AP2/EREBP transgenic rice samples with normal non-transgenic rice samples, sequenced at three time points (0, 1, and 6 h) after applying drought stress.


Research Findings

  • The researchers used TRAP to perform pathway analysis on the rice samples at each individual time point (0, 1, and 6 h) and on the whole time-point vector. Table 2 summarizes the pathway analysis results at the 5% cutoff of the FDR-adjusted P-value.
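
The FDR cutoff mentioned above can be sketched as follows. This page does not say which FDR procedure TRAP applies, so the Benjamini-Hochberg adjustment used here is an assumption. The pathway names and P-values are hypothetical placeholders, not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical pathways and raw P-values (illustration only).
raw = {
    "pathway_1": 0.01,
    "pathway_2": 0.04,
    "pathway_3": 0.03,
    "pathway_4": 0.20,
}

names = list(raw)
adjusted = benjamini_hochberg([raw[name] for name in names])

# Keep pathways whose adjusted P-value is below the 5% cutoff.
significant = [name for name, q in zip(names, adjusted) if q < 0.05]
print(significant)  # ['pathway_1']
```

Note that pathway_2 and pathway_3 have raw P-values below 0.05 but fail the cutoff after adjustment.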


'Table 2 TRAP pathway analysis results on the rice dataset. PNDE and PPERT are the P-values from ORA analysis and SPIA analysis, respectively. The two P-values are combined into PG by Eq. 6. PNDE-FDR and PG-FDR are P-values adjusted using the false discovery rate. A lower PNDE implies that the ratio of DEGs in the pathway is higher than the ratio of DEGs among all genes. A lower PPERT indicates a greater signaling effect than when gene expression values are randomly assigned. The status of a pathway is denoted as activated or inhibited, which is a relative status in AP2 samples compared to Nip samples. Three pathways were found to be activated in both the single-time-point and the time-series analysis results, and two additional pathways were found only in the time-series analysis result, demonstrating the effectiveness of the time-series analysis methods. The four pathways in bold in the time-series pathway analysis result have been shown in previous studies to be relevant to drought resistance in plants. The Plant hormone signal transduction and Diterpenoid biosynthesis pathways include genes related to stress response and stomatal closure. The relevance of the Biosynthesis of unsaturated fatty acids pathway to drought stress is supported by previous studies that show a relationship between the unsaturation level of fatty acids and water deficit conditions. Note that the Plant–pathogen interaction pathway is removed from the time-series result because of its lack of signaling effect.'
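
The interpretation of PNDE in the caption (a low value means DEGs are over-represented in the pathway) can be illustrated with a one-sided hypergeometric test, a common choice for ORA. The exact test used by TRAP is not stated on this page, so this is a sketch under that assumption. The function name and the counts in the example are hypothetical.

```python
from math import comb


def ora_p_value(n_genes, n_degs, pathway_size, degs_in_pathway):
    """One-sided hypergeometric P-value for over-representation.

    Probability of seeing at least `degs_in_pathway` DEGs in a pathway of
    `pathway_size` genes when `n_degs` DEGs are drawn at random from
    `n_genes` genes.
    """
    total = comb(n_genes, n_degs)
    upper = min(pathway_size, n_degs)
    tail = sum(
        comb(pathway_size, k) * comb(n_genes - pathway_size, n_degs - k)
        for k in range(degs_in_pathway, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes in total, 3 DEGs, a pathway of 4 genes,
# 2 of which are DEGs.
p_nde = ora_p_value(n_genes=10, n_degs=3, pathway_size=4, degs_in_pathway=2)
print(round(p_nde, 4))  # 0.3333
```

With larger gene sets the same function applies unchanged, because Python integers do not overflow in the binomial coefficients.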


  • The PG-FDR of the Plant–pathogen interaction pathway is below 0.05 at 1 h but not in the time-series analysis result because of its increased PPERT, which indicates that its signaling impact is much weaker when all time points are considered. Removing this pathway is reasonable because drought stress is an abiotic stress, and its high PNDE might result from the high ratio of DEGs it shares with the Plant hormone signal transduction pathway. The 41 DEGs found in the five pathways are listed in Table 3 with descriptions from gene annotation data.


'Table 3 The 41 DEGs found in the time-series pathway analysis, with descriptions from RAP-DB gene annotation.'


  • Fig. 4 is the graphical representation of the pathway analysis result. Because of the size of the network, only pathways with adjusted P-value above 0.5 are included. The four pathways in green have unknown status, and the one pathway in red is the Plant hormone signal transduction pathway, which SPIA found to be activated. Although the five significant pathways share no genes, the graph shows where the significant pathways and clusters are located in the pathway network.


'Fig. 4. The graphical representation of the pathway analysis result, including pathways with adjusted P-value above 0.5. The four pathways in green have unknown status, and the one pathway in red is the Plant hormone signal transduction pathway, which SPIA found to be activated. No genes are common to the five pathways.'


  • Four of the 27 possible clusters had significant pathways in the clustering analysis with a threshold of 1.0 (Table 4). Two of these pathways, Isoquinoline alkaloid biosynthesis and Diterpenoid biosynthesis, were previously found in the pathway analysis using ORA. However, the status of these pathways (i.e., the direction of gene expression change) was unknown because the ORA method estimates the significance of pathways only from the number of DEGs. The pathway status can instead be estimated from the labels of the clusters.
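
The count of 27 possible clusters follows from the three labels (U, D, C) at the three time points of the rice dataset, which the short Python sketch below enumerates. The `guess_status` function is a hypothetical reading rule added only to illustrate how a direction might be read from a cluster label. It is not the rule used by TRAP.

```python
from itertools import product

LABELS = "UDC"      # up-regulated, down-regulated, constant
N_TIME_POINTS = 3   # 0, 1 and 6 h in the rice dataset

# Every possible label vector over the three time points.
label_vectors = ["".join(v) for v in product(LABELS, repeat=N_TIME_POINTS)]
print(len(label_vectors))  # 27


def guess_status(label_vector):
    """Hypothetical rule: read a direction from a cluster label.

    Only U (plus C) suggests activation, only D (plus C) suggests
    inhibition, and anything else is left as unknown.
    """
    has_up = "U" in label_vector
    has_down = "D" in label_vector
    if has_up and not has_down:
        return "activated"
    if has_down and not has_up:
        return "inhibited"
    return "unknown"


print(guess_status("CUU"))  # activated
```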


'Table 4 Clustering results on rice dataset. Pathways already found in the pathway analysis are bold-faced. The estimation of pathway status is possible from the label of the clusters.'


Labs working on this Project

  • Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
  • Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
  • Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  • Department of Biomedical Sciences, Sunmoon University, Asan 336-708, Republic of Korea


Corresponding Author

  • Sun Kim: sunkim.bioinfo@snu.ac.kr