Syndicated from PubMed RSS Feeds
Detection of splicing events and multiread locations from RNA-seq data based on a geometric-tail (GT) distribution of intron length.
BMC Bioinformatics. 2011;12 Suppl 5:S2
Authors: Lou SK, Li JW, Qin H, Yim AK, Lo LY, Ni B, Leung KS, Tsui SK, Chan TF
Abstract
BACKGROUND: RNA sequencing (RNA-seq) measures gene expression levels and permits splicing analysis. Many existing aligners are capable of mapping millions of sequencing reads onto a reference genome. For reads that can be mapped to multiple positions along the reference genome (multireads), these aligners may either randomly assign them to a location, or discard them altogether. Either way could bias downstream analyses. Meanwhile, challenges remain in the alignment of reads spanning across splice junctions. Existing splicing-aware aligners that rely on the read-count method in identifying junction sites are inevitably affected by sequencing depths.
RESULTS: The distance between the aligned positions of paired-end (PE) reads, or between the two parts of a spliced read, depends on the experimental protocol and on gene structure. We propose a new method that employs an empirical geometric-tail (GT) distribution of intron lengths to make a rational choice in multiread selection and splice-site detection, according to the aligned distances from PE and spliced reads.
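The abstract does not give the exact form of the GT distribution or the scoring rule, so the following Python is only a minimal sketch under stated assumptions: an empirical histogram below a cutoff, a geometric decay from the cutoff onward, and a candidate score equal to alignment similarity times the GT probability of the aligned distance. The function names, the cutoff, and the product score are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import Counter


def fit_geometric_tail(distances, cutoff):
    """Fit an empirical head plus a geometric tail (assumed model form).

    Distances below `cutoff` keep their observed relative frequencies.
    The remaining probability mass decays geometrically from `cutoff` on.
    """
    n = len(distances)
    counts = Counter(distances)
    head = {d: c / n for d, c in counts.items() if d < cutoff}
    tail = [d for d in distances if d >= cutoff]
    tail_mass = len(tail) / n
    # Maximum-likelihood estimate for a geometric distribution on 0, 1, 2, ...
    # with pmf p * (1 - p) ** k is p = 1 / (1 + mean).
    mean_excess = sum(d - cutoff for d in tail) / len(tail) if tail else 0.0
    p = 1.0 / (1.0 + mean_excess)
    return head, tail_mass, p, cutoff


def gt_probability(model, d):
    """Probability of an aligned distance `d` under the fitted model."""
    head, tail_mass, p, cutoff = model
    if d < cutoff:
        return head.get(d, 0.0)
    return tail_mass * p * (1.0 - p) ** (d - cutoff)


def pick_location(model, candidates):
    """Choose among candidate placements of a multiread or spliced read.

    Each candidate is (label, aligned_distance, similarity). The product
    score used here is an assumption made for this sketch.
    """
    return max(candidates, key=lambda c: c[2] * gt_probability(model, c[1]))
```

For example, after fitting on distances observed from uniquely mapped reads, `pick_location(model, [("locus A", 100, 0.95), ("locus B", 40000, 0.95)])` would prefer whichever locus implies the more probable distance when the similarities tie. Unlike a read-count criterion, this score does not depend on how many reads cover the junction.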
CONCLUSIONS: GT models that combine sequence similarity from alignment with the probability given by the length distribution could accurately determine the locations of both multireads and spliced reads.
PMID: 21988959 [PubMed - in process]