Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Probelem with SAM file created by Bowtie2

    Hello everyone,

    I created a bowtie index from a fasta file which was merged from 4 individual fasta file. One is from miRBase, two from RFAM coressponding to tRNA and other non coding RNA and the final one is repeat masker fasta sequences downloaded from UCSC.

    Then i aligned the reads using bowtie2 with index created from above fasta files under local mode with the following options:

    bowtie2 --local -N 1 -L 20 -p 20 -x ~/Downloads/index/superfasta.index -f -U input_E1.fa -S output.sam

    After the run i got the alignment rate as follow:

    2599066 reads; of these:
    2599066 (100.00%) were unpaired; of these:
    1730189 (66.57%) aligned 0 times
    265716 (10.22%) aligned exactly 1 time
    603161 (23.21%) aligned >1 times
    33.43% overall alignment rate


    Then i converted the sam file to bam using

    samtools view -bS output_E1.sam > output_E1.bam

    Then i checked the falgstats using:

    samtools flagstat output_E1.bam'

    and got the following results

    0 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 secondary
    0 + 0 supplementary
    0 + 0 duplicates
    0 + 0 mapped (N/A : N/A)
    0 + 0 paired in sequencing
    0 + 0 read1
    0 + 0 read2
    0 + 0 properly paired (N/A : N/A)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (N/A : N/A)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    The flagstat output says that none of the sequences mapped, whereas it should be 33.43% as shown in the first output.

    Then, i also checked the number of sequences that mapped using this command
    samtools view -F 4 output.bam | wc -l
    and again i got 0.

    When i used tail command for the SAM file, everything seems fine i can see the flag value of 0 or 4 or 16. But why isn't the samtools showing weird results. is there an issue with the SAM file? I actually used another genome with another file of reads and i when i analysed using SAM tools, everything is fine and all the numbers are concordant. So the only likely explanation is that there must be a problem with the reference index or the fasta file that is used to create the reference index. There were some white spaces in the header of few fasta sequecnes and i removed them, then tried it again with the new index and the problem still persists. What could be the problem behind this? Whether the SAM file has a malformed line? or could be something else. If someone could suggest a solution, that would be awesome. I have an fast approaching deadline to complete the data analysis and i am stuck here. if anyone could help find a solution, that would be great.

    thanks!
    Last edited by newfie; 09-21-2016, 09:25 AM. Reason: one output file name was not mentioned correctly

  • #2
    Hi,

    did you got any error messages, while doing:
    Originally posted by newfie View Post
    samtools view -bS output_E1.sam > output_E1.bam
    ?
    You forgot to include the header which should lead to a non-functional bam file.
    Please try
    Code:
    samtools view -bS[B]h[/B] output_E1.sam > output_E1.bam
    this should produce a working bam file.

    Comment


    • #3
      Originally posted by Michael.Ante View Post
      Hi,

      did you got any error messages, while doing:

      ?
      You forgot to include the header which should lead to a non-functional bam file.
      Please try
      Code:
      samtools view -bS[B]h[/B] output_E1.sam > output_E1.bam
      this should produce a working bam file.
      I tried it again with samtools view -bSh output_E1.sam > output_E1.bam , but the problem is not solved, it just keeps on giving the same output

      Comment


      • #4
        Maybe you have also the wrong file:

        bowtie2 --local -N 1 -L 20 -p 20 -x ~/Downloads/index/superfasta.index -f -U input_E1.fa -S output.sam
        vs.
        samtools view -bSh output_E1.sam > output_E1.bam

        Comment


        • #5
          Originally posted by Michael.Ante View Post
          Maybe you have also the wrong file:


          vs.
          Actually there are two output files output.sam and output_E1.sam in the home directory. both of them gives the same output as i had mentioned earlier. Out of curiosity, i tired with fastq files instead of fasta files and now the sam files are okay, the statistics number are all now concordant. It seems that there was a problem when fastq files were converted to fasta format. But thank you very much for your time and help that you have rendered

          Michael Ante, i wanted to ask a favor. Do you know how to subset the sam files based on the fasta headers of the reference fasta file?. In my case, there are four different types of fasta headers (as i used a merged fasta file from four different ones) and actually i can see them in the third tab (chromosome). if there is an alignment, it inputs which fasta header the read aligns to. if there is no alignment there is * there. Already, i sorted out my sam files based on reads that mapped and which didn't map. Now i want to further sort the mapped ones based on which fasta header they align to. If you know, how to do it. could you please provide me the code?

          Thanks in advance

          Comment


          • #6
            @newfie: You can use some of the suggestions in this thread to split your sam file https://www.biostars.org/p/46327/

            Comment

            Latest Articles

            Collapse

            • seqadmin
              Strategies for Sequencing Challenging Samples
              by seqadmin


              Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
              03-22-2024, 06:39 AM
            • seqadmin
              Techniques and Challenges in Conservation Genomics
              by seqadmin



              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

              Avian Conservation
              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
              03-08-2024, 10:41 AM

            ad_right_rmr

            Collapse

            News

            Collapse

            Topics Statistics Last Post
            Started by seqadmin, 03-27-2024, 06:37 PM
            0 responses
            12 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-27-2024, 06:07 PM
            0 responses
            11 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-22-2024, 10:03 AM
            0 responses
            53 views
            0 likes
            Last Post seqadmin  
            Started by seqadmin, 03-21-2024, 07:32 AM
            0 responses
            68 views
            0 likes
            Last Post seqadmin  
            Working...
            X