  • strand-specificity in paired-end data

    Dear All,

    I have a question about paired-end Illumina HiSeq100 data. My FASTQ files have normal average quality, but Bowtie maps only about 10% of the reads, whereas TopHat maps around 80%. Does anyone know why this happens?

    Also, when I display TopHat BAM files in the UCSC browser, the left-hand reads are colored blue and the right-hand reads red. The same color scheme is used for single-end reads to show strand specificity, but paired-end reads in the UCSC browser have no visible strand indication. Or is it just that I do not see it?
    When reads are assembled into transcripts by Cufflinks, each transcript is assigned to either the "plus" or the "minus" strand, and this generally agrees with the reference annotation. But some transcripts get the "antisense" class code (x), and I would like to check how many reads actually support them.
    Does anyone know how to make paired-end reads look strand-specific?

  • #2
    Originally posted by arabidopsis
    Also, when I display TopHat BAM files in the UCSC browser, the left-hand reads are colored blue and the right-hand reads red. The same color scheme is used for single-end reads to show strand specificity, but paired-end reads in the UCSC browser have no visible strand indication. Or is it just that I do not see it?
    ...
    Does anyone know how to make paired-end reads look strand-specific?
    I'm not familiar with the UCSC browser settings, but you could try another BAM viewer for a 'second opinion', e.g. IGV or Tablet. I've used Tablet to show paired end reads with their green/blue strand specific colour scheme.
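
Independent of any viewer, the strand question can also be checked from the FLAG field of each BAM/SAM record. The sketch below is only an illustration, assuming the library really is of the "FR First Strand" type selected in the TopHat settings posted in this thread (so that mate 1 is antisense to the transcript and mate 2 is sense). The flag bit values come from the SAM specification; the function names `transcript_strand` and `count_strands` are hypothetical.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_READ1 = 0x40
FLAG_READ2 = 0x80


def transcript_strand(flag, library_type="fr-firststrand"):
    """Return '+', '-' or None: the transcript strand implied by one read.

    Assumption: for fr-firststrand, mate 1 is antisense to the transcript
    and mate 2 is sense; fr-secondstrand is the mirror image.
    """
    if flag & FLAG_UNMAPPED:
        return None
    if flag & FLAG_PAIRED:
        if flag & FLAG_READ1:
            is_first = True
        elif flag & FLAG_READ2:
            is_first = False
        else:
            return None  # paired read without mate information
    else:
        is_first = True  # treat a single-end read like mate 1

    if library_type == "fr-firststrand":
        read_is_sense = not is_first
    elif library_type == "fr-secondstrand":
        read_is_sense = is_first
    else:
        raise ValueError("unsupported library type: " + library_type)

    read_on_plus = not (flag & FLAG_REVERSE)
    if read_is_sense:
        return "+" if read_on_plus else "-"
    return "-" if read_on_plus else "+"


def count_strands(flags, library_type="fr-firststrand"):
    """Tally the implied transcript strand over an iterable of FLAG values."""
    return Counter(transcript_strand(f, library_type) for f in flags)
```

Feeding this the FLAG values of the reads that overlap one Cufflinks transcript gives a rough count of how many reads support each strand. Both mates of a properly oriented pair should vote for the same strand.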


    • #3
      Originally posted by arabidopsis
      Dear All,

      Bowtie maps only about 10% of the reads, whereas TopHat maps around 80%. Does anyone know why this happens?
      Could you specify exactly what you did? I ask because TopHat uses Bowtie as its mapper.


      • #4
        Originally posted by schönblick
        Could you specify exactly what you did? I ask because TopHat uses Bowtie as its mapper.
        TopHat settings:

        RNA-Seq FASTQ file 94: Clip on Read_LL
        Conditional (refGenomeSource) 0
        Select a reference genome /galaxy/data/hg18/bowtie_index/hg18
        Conditional (singlePaired) 1
        RNA-Seq FASTQ file 97: Clip on Read_RR
        Mean Inner Distance between Mate Pairs 200
        Conditional (pParams) 1
        Library Type FR First Strand
        Std. Dev for Distance between Mate Pairs 20
        Anchor length (at least 3) 8
        Maximum number of mismatches that can appear in the anchor region of spliced alignment 0
        The minimum intron length 70
        The maximum intron length 500000
        Conditional (indel_search) 1
        Max insertion length. 3
        Max deletion length. 3
        Maximum number of alignments to be allowed 20
        Minimum intron length that may be found during split-segment (default) search 50
        Maximum intron length that may be found during split-segment (default) search 500000
        Number of mismatches allowed in the initial read mapping 2
        Number of mismatches allowed in each segment alignment for reads mapped independently 2
        Minimum length of read segments 25
        Conditional (own_junctions) 1
        Conditional (closure_search) 1
        Conditional (coverage_search) 0
        Minimum intron length that may be found during coverage search 50
        Maximum intron length that may be found during coverage search 20000
        Use Microexon Search No

        Bowtie settings:
        Conditional (refGenomeSource) 0
        Select a reference genome /galaxy/data/hg18/bowtie_index/hg18
        Conditional (singlePaired) 1
        Forward FASTQ file 19: P0037_N038-02_CGATGT_L004_R1_001.fastq
        Reverse FASTQ file 18: P0037_N038-02_CGATGT_L004_R2_001.fastq
        Maximum insert size for valid paired-end alignments (-X) 1000
        The upstream/downstream mate orientation for valid paired-end alignment against the forward reference strand (--fr/--rf/--ff) FR (for Illumina)
        Conditional (pParams) 1
        Skip the first n pairs (-s) 0
        Only align the first n pairs (-u) -1
        Trim n bases from high-quality (left) end of each read before alignment (-5) 5
        Trim n bases from low-quality (right) end of each read before alignment (-3) 20
        Maximum number of mismatches permitted in the seed (-n) 3
        Maximum permitted total of quality values at mismatched read positions (-e) 70
        Seed length (-l) 28
        Whether or not to round to the nearest 10 and saturating at 30 (--nomaqround) Round to nearest 10
        Number of mismatches for SOAP-like alignment policy (-v) -1
        Minimum insert size for valid paired-end alignments (-I) 0
        Maximum number of attempts Bowtie will make to match an alignment for one mate with an alignment for the opposite mate (--pairtries) 100
        Choose whether or not to attempt to align the forward reference strand (--nofw) Align against the forward reference strand
        Choose whether or not to align against the reverse-complement reference strand (--norc) Align against the reverse-complement reference strand
        Whether or not to try as hard as possible to find valid alignments when they exist (-y) Do not try hard
        Report up to n valid alignments per pair (-k) 1
        Whether or not to report all valid alignments per pair (-a) Do not report all valid alignments
        Suppress all alignments for a pair if more than n reportable alignments exist (-m) -1
        Write all reads with a number of valid alignments exceeding the limit set with the -m option to a file (--max) False
        Write all reads that could not be aligned to a file (--un) True
        Conditional (pBestOption) 0
        Maximum number of backtracks permitted when aligning a read (--maxbts) 125
        Override the offrate of the index to n (-o) -1
        Seed for pseudo-random number generator (--seed) -1
        Suppress the header in the output SAM file False

        I used the Bowtie and TopHat versions available on the Galaxy web platform. And the fact that TopHat is based on Bowtie makes the matter even more confusing...


        • #5
          Originally posted by arabidopsis
          Dear All,

          I have a question about paired-end Illumina HiSeq100 data. My FASTQ files have normal average quality, but Bowtie maps only about 10% of the reads, whereas TopHat maps around 80%. Does anyone know why this happens?
          Because using Bowtie to map RNA-Seq data directly to a genomic reference is not appropriate. RNA-Seq reads do not (generally) map contiguously to their reference genome. They have to be "split" to span the introns which have been spliced out. Bowtie won't map split reads; TopHat does. The results you describe are not surprising at all. You need to use the correct tool for the job.
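
The point about split reads can be illustrated with a toy example. This is only a sketch using exact string matching on a made-up sequence, not a description of how Bowtie or TopHat actually work internally; the sequences and the function names `contiguous_hit` and `split_hit` are invented for illustration.

```python
def contiguous_hit(genome, read):
    """Position of an exact, ungapped match of the read, or -1 if none."""
    return genome.find(read)


def split_hit(genome, read, min_anchor=3):
    """Find an exact match of the read split into two pieces.

    Returns (left_start, cut, right_start) for the first split found,
    where read[:cut] matches at left_start and read[cut:] matches at
    right_start with at least one skipped base in between, else None.
    """
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        left_start = genome.find(left)
        while left_start != -1:
            # the right piece must start after a gap of at least 1 base
            right_start = genome.find(right, left_start + len(left) + 1)
            if right_start != -1:
                return left_start, cut, right_start
            left_start = genome.find(left, left_start + 1)
    return None


# Toy "gene": two exons separated by a 14-base intron
exon1 = "ACGTTGCA"
intron = "GTAAGTCCCTTTAG"
exon2 = "TTGACCGA"
genome = exon1 + intron + exon2

# A read from the spliced transcript, spanning the exon-exon junction
read = exon1[-4:] + exon2[:4]

ungapped = contiguous_hit(genome, read)  # no contiguous match exists
split = split_hit(genome, read)          # the two halves match separately
```

Here the junction-spanning read has no contiguous match in the genome, but its two halves match on either side of the intron, which is the situation a spliced aligner is designed to handle.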


          • #6
            Originally posted by kmcarr
            Because using Bowtie to map RNA-Seq data directly to a genomic reference is not appropriate. RNA-Seq reads do not (generally) map contiguously to their reference genome. They have to be "split" to span the introns which have been spliced out. Bowtie won't map split reads; TopHat does. The results you describe are not surprising at all. You need to use the correct tool for the job.
            kmcarr,

            What you say is right, but when I map single-end reads with Bowtie it works fine, and TopHat gives only 10-15% more mapped reads. For the current problem I also looked at the splice junction file produced by TopHat. It contains only about 10000 regions. That cannot account for a 10-fold increase in the number of mapped reads, can it?


            • #7
              Originally posted by arabidopsis
              kmcarr,

              What you say is right, but when I map single-end reads with Bowtie it works fine, and TopHat gives only 10-15% more mapped reads. For the current problem I also looked at the splice junction file produced by TopHat. It contains only about 10000 regions. That cannot account for a 10-fold increase in the number of mapped reads, can it?
              When working with paired-end data, Bowtie considers the pair as a whole when determining the validity of an alignment. Part of that consideration is the relative distance between the two reads when aligned to the genome. The limits for a pair to be considered valid are set by the -X and -I options, and in your example the maximum distance between paired reads was 1000 bp. If the forward and reverse reads map to different exons, they could be separated by a much greater distance than this.
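
The effect of the -I/-X limits can be sketched with a little arithmetic. The coordinates below are invented for illustration, and the function names are hypothetical; the only values taken from the thread are the posted limits of -I 0 and -X 1000. The sketch assumes the insert size is measured as the outer span of the pair on the genome, from the first base of the leftmost mate to the last base of the rightmost mate.

```python
def genomic_span(left_start, right_end):
    """Outer span of a pair on the genome (1-based, inclusive coordinates)."""
    return right_end - left_start + 1


def is_valid_pair(left_start, right_end, min_insert=0, max_insert=1000):
    """True if the pair's genomic span lies within the -I/-X style limits."""
    return min_insert <= genomic_span(left_start, right_end) <= max_insert


# Both 100 bp mates inside one exon, from a 300 bp fragment
same_exon = is_valid_pair(10_001, 10_300)

# Same 300 bp fragment, but a 5000 bp intron lies between the mates:
# mate 1 covers 10,001-10,100 and mate 2 covers 15,201-15,300
across_intron = is_valid_pair(10_001, 15_300)
span_across_intron = genomic_span(10_001, 15_300)
```

Under these assumptions the first pair passes the limit and the second does not, even though both come from fragments of the same length in the transcript. A pair rejected this way would not be reported as a valid paired-end alignment, which is consistent with the low paired-end mapping rate described above.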
