Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BWA - unmapped reads

    Hello everyone,

    I'm trying to combine the use of bwa and blat but I have a few difficulties to extract the unmapped reads from bwa.
    It appears they are in the first .sam file created using bwa bwasw.
    I used the command below to filter the SAM file and extract unaligned reads into the file unalntest.bam:

    $ samtools view -f 4 /home/biopuce11/Documents/Adam/bwa-0.5.8/aln/aln.sam -t -b -S -o /home/biopuce11/Documents/Adam/bwa-0.5.8/aln/unalntest.bam
    The problem is that when I flagstat it to see how many reads it contains, the soft tells me that there is no read in it.

    By the way, I have this message using "samtools view..." (above):
    [bam_header_read] EOF marker is absent.
    [bam_flagstat_core] Truncated file? Continue anyway.
    Could someone help me, please? I'm turning mad!

    Thanks.

  • #2
    Originally posted by Adamo View Post
    Hello everyone,

    I'm trying to combine the use of bwa and blat but I have a few difficulties to extract the unmapped reads from bwa.
    It appears they are in the first .sam file created using bwa bwasw.
    I used the command below to filter the SAM file and extract unaligned reads into the file unalntest.bam:

    The problem is that when I flagstat it to see how many reads it contains, the soft tells me that there is no read in it.
    What about:

    Code:
    $ samtools view -b -S -f 4 input.sam > output.bam
    By the way, I have this message using "samtools view..." (above):
    Could someone help me, please? I'm turning mad!
    Read this:

    http://seqanswers.com/forums/showpos...72&postcount=2
    -drd

    Comment


    • #3
      I have resolved the problem concerning the error message, thanks.
      But for the other one, I'm still try to figure out where the problem is.
      I've tried your command (btw, I didn't paste perfectly mine ("-f 4" is missing...) in my first post), but the problem remains...
      Any idea?
      Last edited by Adamo; 06-21-2010, 06:49 AM.

      Comment


      • #4
        if you type
        Code:
        samtools view -?
        you'll see that to convert sam to bam you'll need the reference fasta file

        Code:
        2. SAM->BAM conversion: `samtools view -bT ref.fa in.sam.gz'.
        your command should probabaly be
        samtools view -bT ref.fa -f 4 input.sam > output.bam
        or
        samtools view -bt ref.fa.fai -f 4 input.sam > output.bam

        Comment


        • #5
          I think the command's ok but still, I have no read in the output file. I don't get it!
          I have only about 2% of mapped reads in my .sam file according to flagstat...
          I've tried both commands you suggest.
          Is there something I have forgotten?

          Comment


          • #6
            It should work ...
            Can you paste your commands and the first few lines of your sam file

            Comment


            • #7
              The first lines of my sam file are:

              @SQ SN:Clostridium_botulinum_chromosome LN:3886916
              @SQ SN:Clostridium_botulinum_plasmid LN:16344
              SRR037154.1271 0 Clostridium_botulinum_chromosome 9824 0 35S229M3S * 0 0 CAGCTTAGGCTGCTAAAGCGCACGCAGGCGGTCTGTTAAGTCAGATGTGAAATCCCCGGGCTTAACCTGGGAACTGCATTTGAAACTGGCAGGCTTGAGTCTTGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGATAAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCG FFFFFFFFFFFFFI@@@GIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFG;55555CGEGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBBGGFFFGGGGFFFFFFFFFFFFFFFFFFFFFFFFGA@@EE@7777@@@EEEEEEEEBBBGGIIIAAAAAAAAAAAAAA@@@@@@@@ AS:i:73 XS:i:73 XF:i:1 XE:i:0 XN:i:0
              My command is:

              $ samtools view -bT /home/db.fa -f 4 /home/alncomplet.sam > /home/unalncomplet.bam
              I also tried -bt instead of -bT (replacing db.fa by db.fa.fai (I indexed my database with samtools faidx)) but the result is just the same...

              Comment


              • #8
                I'm wondering if you have unmapped read in your sam file to beggin with.

                try
                Code:
                samtools view -bT ref.fa sam_file | samtools sort - bam_file 
                (with no .bam extension)
                then
                 samtools flagstat bam_file
                This will tell you what you have in your bam file.
                Then you can extract what you want from the bam file with
                Code:
                samtools view bam_file -f 4 | less
                to see the unmapped reads
                or which ever flag you want

                but if your original file does not contain any unmapped read then this is not going to help

                Comment


                • #9
                  Well, I'm a little confused.
                  My sam file contains only the reads mapped. The thing is that I read on a topic a post from lh3 (who know what he speaks about, I think!) telling someone that all the informations concerning unmapped reads where in the sam file. I assumed he talked about the one created using:
                  $bwa bwasw ...
                  That's why I'm trying to extract them from this sam file.
                  Otherwise, which one was lh3 speaking about? Or which one are you speaking about?

                  Comment


                  • #10
                    I'm sure Heng li knows what he's talking about so you should check your commands and log messages, parameters used, maybe rerun the alignment command (and keep the command you're running).
                    If it really does not output the unmapped read then send a email to the bwa mailing list.
                    and try to be a bit more details on what you've done and what you have otherwise you won't get any help.

                    Comment


                    • #11
                      I'll try the mailing list, thanks for the tip.
                      I've already run bwa several times, with default parameters and followed the steps suggested by the readme. Nothing more, nothing less...

                      I'm quite a newbie here, excuse me for not being precise enough.
                      Last edited by Adamo; 06-22-2010, 01:59 AM.

                      Comment


                      • #12
                        After further researches, it seems like bwasw doesn't include any unmapped read in the output file, unlike bwa aln.
                        I think I might have to write a script which could handle such a job.

                        Thanks for your help anyway!

                        Comment


                        • #13
                          I confirm that I find that 'bwa samse/sampe' does output the unaligned reads in the sam file while 'bwa bwasw' does not. This difference in behavior is quite odd to me. Is there a reason for it? This is not even mentioned in the bwa manual. But then, bwa is still at version 0.5.8a, so I should not expect too much. I would be really interested in seeing what is being left after aligning all the reads with samse and bwasw, using blat or blast.

                          Comment


                          • #14
                            bwasw patch

                            This simple patch:
                            Code:
                            --- bwtsw2_aux.c
                            +++ bwtsw2_aux.c
                            @@ -520,8 +520,10 @@
                             			bsw2_resolve_query_overlaps(b[0], opt.mask_level);
                             		} else b[1] = 0;
                             		// generate CIGAR and print SAM
                            -		gen_cigar(&opt, l, seq, pac, b[0]);
                            -		print_hits(bns, &opt, p, b[0]);
                            +		if(b[0]->n > 0) {
                            +		    gen_cigar(&opt, l, seq, pac, b[0]);
                            +		    print_hits(bns, &opt, p, b[0]);
                            +		}
                             		// free
                             		free(seq[0]);
                             		bsw2_destroy(b[0]);
                            @@ -583,6 +585,7 @@
                             	for (i = 0; i < _seq->n; ++i) {
                             		bsw2seq1_t *p = _seq->seq + i;
                             		if (p->sam) printf("%s", p->sam);
                            +		else printf("%s\t4\t*\t0\t0\t*\t*\t0\t0\t%s\t%s\n", p->name, p->seq, p->qual);
                             		free(p->name); free(p->seq); free(p->qual); free(p->sam);
                             		p->tid = -1; p->l = 0;
                             		p->name = p->seq = p->qual = p->sam = 0;
                            will make bwasw spit out also the unmapped reads, though it only works with .fastq files, so use it carefully. To use it, create a file contained the above code, like "bwtsw2_aux.diff", enter the bwa source directory, and issue the command "patch < bwtsw2_aux.diff". (PS You need to be careful to not make the tabs become spaces, or the patch will not work)
                            Last edited by freeseek; 10-07-2010, 03:09 PM.

                            Comment


                            • #15
                              This thread contains a perl script to get reads that are in a .fq file but not in a sam file. It was written with TopHat in mind but it may work for you too.
                              I'm not sure if it helps.
                              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X