Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • TopHat, trimmed PE reads, and SAM flags

    Hi All,
    I used cutadapt to trim my PE reads and TopHat to map the trimmed ones. TopHat could map most of them. Then I wanted to use HTSeq-Count to count the reads. However it complained like this:
    Error occured when processing SAM input (line 184429 of file .../accepted_hits_unique_sorted.sam):
    'pair_alignments' needs a sequence of paired-end alignments
    [Exception type: ValueError, raised in __init__.py:612]
    I checked lines 184429 and 184430 of my input file. Here they are:
    DCNW5DQ1:262:C4804ACXX:6:2316:1266:24754 0 14 38508069 50 20M * 0 0 TTAATCTAGAATCCGCATAC ;?F<GHJJIJBHIG0D:6?/ AS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:8T11 YT:Z:UUNH:i:1
    DCNW5DQ1:262:C4804ACXX:6:2316:1266:24754 153 14 38508069 50 20M * 0 0 TTAATCTAGAATCCGCATAC F=/.=.4F=;?(68*9HDHE AS:i:-4 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:8T11 YT:Z:UUNH:i:1

    So they are paired but the flag column of read 1 is 0. This makes no sense. I don't either know if the flag value 153 of read 2 is correct or not.
    Also, I've had a similar problem when using Trimmomatic-trimmed PE reads with TopHat. It was also a flag problem but happened in unpaired reads only. Does anybody have any idea?

    Thanks,
    Sylvia

  • #2
    Flag zero means that it was considered single-ended when mapping. Flag 153 means it was considered paired when mapping, but its mate was unmapped. So, that doesn't make much sense, unless you merged two sets of mapped data (one single-ended and one paired), perhaps.

    Can you share the exact Cutadapt/Tophat/Trimmomatic commands you are using?

    Comment


    • #3
      Originally posted by Brian Bushnell View Post
      Flag zero means that it was considered single-ended when mapping. Flag 153 means it was considered paired when mapping, but its mate was unmapped. So, that doesn't make much sense, unless you merged two sets of mapped data (one single-ended and one paired), perhaps.

      Can you share the exact Cutadapt/Tophat/Trimmomatic commands you are using?
      Thanks for your reply. I didn't merge two sets of mapped data. I used cutadapt separately for left and right read files to trim off some adapter sequences, low quality bases, and too-short reads, then sent the two trimmed files to Tophat as input. The two files obviously had different numbers of reads, some were paired, some were not, and I don't know if they were ordered, but Tophat could map most of the paired and unpaired reads. I don't know why the flags of tophat-mapped paired reads contradict with each other.

      My reads had to be trimmed at both ends and some had to be discard when not consisting a specific sequence, so it would look confusing if I paste the commands here.

      For Trimmomatic, I just used the example command for PE reads, as shown on its website. Here it is:
      java -jar trimmomatic-0.30.jar PE --phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
      Then I called tophat using the 4 output files of trimmomatic in this order:
      output_forward_paired.fq.gz output_reverse_paired.fq.gz,output_forward_unpaired.fq.gz output_reverse_unpaired.fq.gz

      Comment


      • #4
        Originally posted by zzhao2 View Post
        I used cutadapt separately for left and right read files to trim off some adapter sequences, low quality bases, and too-short reads, then sent the two trimmed files to Tophat as input.
        That would certainly cause a problem. You can use my repair.sh tool to fix the two files:

        repair.sh in1=read1.fq in2=read2.fq out1=fixed1.fq out2=fixed2.fq

        ...or you could use my reformat.sh program to redo the trimming process paired. I can't directly advise you on the Trimmomatic command line, but perhaps a Trimmomatic user will chime in.

        Comment


        • #5
          Originally posted by Brian Bushnell View Post
          That would certainly cause a problem. You can use my repair.sh tool to fix the two files:

          repair.sh in1=read1.fq in2=read2.fq out1=fixed1.fq out2=fixed2.fq

          ...or you could use my reformat.sh program to redo the trimming process paired. I can't directly advise you on the Trimmomatic command line, but perhaps a Trimmomatic user will chime in.
          Thank you so much! I'll try them.

          Comment


          • #6
            Originally posted by zzhao2 View Post
            Thank you so much! I'll try them.
            Hi Brian, I just tried the link of repair.sh but it was BBMap, and I didn't seem to see that file in the source file list?

            Comment


            • #7
              Sylvia,

              repair.sh should be in the latest release of BBMap. You can't see it in the source file list because I only submit a single gzipped/tarred file, but it will be there once you unzip it.

              Comment


              • #8
                Originally posted by Brian Bushnell View Post
                Sylvia,

                repair.sh should be in the latest release of BBMap. You can't see it in the source file list because I only submit a single gzipped/tarred file, but it will be there once you unzip it.
                Downloaded. Thank you!

                Comment


                • #9
                  Hi, Using Tophat v 2.0.11 for my paired-end data, I ran the default option of num allowed mismatches=2, but when going over my sam file and fetching the NM tags for each alignment, I see cases where there are mismatches of more than 2 (even though these numbers are much smaller than those alignments with 0,1 and 2 mismatches respectively).

                  Why is this happening? I might be misunderstanding what the parameter really means, but all it says in the manual is "Final read alignments having more than these many mismatches are discarded. The default is 2.". So techncally, I shouldn't be seeing those alignments

                  Thanks

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Techniques and Challenges in Conservation Genomics
                    by seqadmin



                    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                    Avian Conservation
                    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                    03-08-2024, 10:41 AM
                  • seqadmin
                    The Impact of AI in Genomic Medicine
                    by seqadmin



                    Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                    02-26-2024, 02:07 PM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 03-14-2024, 06:13 AM
                  0 responses
                  32 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-08-2024, 08:03 AM
                  0 responses
                  71 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-07-2024, 08:13 AM
                  0 responses
                  80 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 03-06-2024, 09:51 AM
                  0 responses
                  68 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X