  • Convert BAM file to FASTQ

    After a quick search I found these:

    Hydra
    Picard (SamToFastq)
    HudsonAlpha
    Possibly EMBOSS

    Any comments on these? Any other options for BAM-to-FASTQ conversion?

    Basically I want to recover all paired-end reads (both R1 and R2) that were fed into the alignment that produced the BAM file, whether they mapped successfully or not.
    Last edited by malachig; 09-28-2010, 01:04 PM.

  • #2
    I've used Picard and it works fine for me.



    • #3
      You may want to filter the BAM file to remove any non-primary mappings (otherwise you could get duplicate entries in the FASTQ file). The tools may do that for you.

      You may also want to append /1 and /2 to the forward and reverse read names (this information isn't currently stored in SAM/BAM format but there is a proposed tag for the read name suffix in the draft standard update).

      Also double check that any reads mapped to the reverse strand get reverse complemented when writing the FASTQ file, since you want to recover the input sequences.

      There are also DIY approaches, for example BAM to SAM and then a Perl/Python script. I have some experimental code for Biopython to do this too.

      There was a thread on this on the samtools-help mailing list in August 2010, "BAM to fastq how?"
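To make the three checks above concrete, here is a minimal DIY sketch in plain Python. It assumes the BAM has already been dumped to SAM text (for example with `samtools view`), and the function name `sam_line_to_fastq` is made up for illustration. It skips non-primary records, restores the original orientation of reverse-strand reads, and appends /1 or /2 from the flag bits. It does not handle hard-clipped reads, so treat it as a starting point rather than a finished converter.

```python
# Sketch only: turns one SAM text line into a FASTQ record.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

FLAG_REVERSE = 0x10         # read is stored reverse complemented
FLAG_READ1 = 0x40           # first read of the pair
FLAG_READ2 = 0x80           # second read of the pair
FLAG_SECONDARY = 0x100      # not the primary alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary (e.g. chimeric) alignment


def sam_line_to_fastq(line):
    """Return a 4-line FASTQ record, or None for headers and non-primary records."""
    if line.startswith("@"):
        return None  # SAM header line
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return None  # would otherwise duplicate the read in the FASTQ
    if flag & FLAG_REVERSE:
        # Undo the aligner's reverse complement to get back the sequenced read.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & FLAG_READ1:
        name += "/1"
    elif flag & FLAG_READ2:
        name += "/2"
    return "@{}\n{}\n+\n{}\n".format(name, seq, qual)
```

Calling it on every line of the SAM text and writing out the non-None results gives a single FASTQ stream; splitting into two files is a separate step.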



      • #4
        Bamtools (http://github.com/pezmaster31/bamtools) can convert BAM to FASTQ.

        bamtools convert -in file1.bam -in file2.bam ... -format fastq >reads.fq



        • #5
          Hi,

          For BAM-to-FASTQ conversion I use Bamtools.
          But when I try to convert one of my BAM files to FASTQ, I get the following error message:
          "BGZF ERROR: read block failed - could not read data from block"
          The problem is that bamtools exits after this step. Is it possible to avoid that, for example by telling bamtools to skip such a block and continue? Or, as in Picard, is there a VALIDATION_STRINGENCY option that can be set to LENIENT or SILENT?
          Just to mention, these BAM files contain unmapped PE reads.
          Thanks
          Last edited by ElMichael; 11-15-2010, 10:44 AM.
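One possible cause of a block read failure is a truncated file, although the error can have other causes too. The SAM/BAM specification defines a fixed 28-byte empty BGZF block as an end-of-file marker, so a quick check is to see whether the file ends with it. The sketch below does only that; the function name `has_bgzf_eof` is hypothetical, and a missing marker is a hint rather than proof, since older writers did not always add one.

```python
import os

# The 28-byte empty BGZF block used as the end-of-file marker in the SAM/BAM spec.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

If this returns False, copying or regenerating the BAM is probably a better first step than trying to make the converter skip blocks.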



          • #6
            On Picard, my service provider mentioned this:

            "Using picard tools directly has one significant drawback. Picard tools will read in sequence from the BAM line by line and cache it until it has both reads. Once it has both reads it will print them out and free the memory. Unfortunately this means that every read which doesn't have the pairs near each other will take memory. In the example above it took 2.5GB of memory for 120GB of sequence but this is not guaranteed and will get worse on larger builds."

            Sounds terrible to me.

            Fortunately there's a second method:

            "You can specify samtools memory usage (it'll use temporary files) so if you sort the BAM by name prior to running picard tools on it you guarantee the reads are next to each other and picard tools will barely use any memory."
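The effect described in the two quotes can be illustrated with a toy model. This is not Picard's actual code, only a sketch of the caching behaviour as the service provider describes it: each read waits in a dictionary until its mate shows up, so the peak cache size depends on how far apart mates are in the file. The function name `pair_reads` and the tuple layout are assumptions for the example.

```python
def pair_reads(records):
    """Pair up mates from (name, sequence) tuples given in file order.

    Returns the list of pairs and the peak number of reads that had to be
    held in memory while waiting for their mates.
    """
    waiting = {}
    pairs = []
    peak = 0
    for name, seq in records:
        if name in waiting:
            # Mate already seen: emit the pair and free the cached read.
            pairs.append((name, waiting.pop(name), seq))
        else:
            waiting[name] = seq
            peak = max(peak, len(waiting))
    return pairs, peak


# Mates far apart, as can happen in a coordinate-sorted file.
scattered = [("a", "A1"), ("b", "B1"), ("c", "C1"),
             ("a", "A2"), ("b", "B2"), ("c", "C2")]
# The same reads after sorting by name: mates are adjacent.
by_name = sorted(scattered, key=lambda rec: rec[0])

print(pair_reads(scattered)[1])  # peak cache size with scattered mates
print(pair_reads(by_name)[1])    # peak cache size after name sorting
```

With name-sorted input the cache never holds more than one read at a time, which is the reason name sorting first keeps memory use low.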



            Side question: was there anything in the original FASTQ that one might want to keep but cannot find in the sorted BAMs? I am inclined to retrieve the original FASTQ files, but data storage might be a problem for me.
            http://kevin-gattaca.blogspot.com/



            • #7
              I've used Picard on .bams generated by bwa/samtools, and it definitely keeps the unmapped reads. But that's because the .bam has them. If you used an aligner that tossed them, or put them in another .bam (didn't bowtie used to do that by default?), then there's nothing any software can do about that.

              I've never tried to get them back out as paired reads. I assume that it uses the flag to know which is read 1 and which is read 2, but it might not know to order them properly. If your .bam has all the reads sorted by name, and you haven't filtered out any single reads, I bet the FASTQs would be in the right order.
              Last edited by swbarnes2; 11-16-2011, 10:41 AM.
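A rough sketch of the idea in the last paragraph: if the records are name-sorted and restricted to primary alignments, the read 1 and read 2 flag bits (0x40 and 0x80) are enough to write two lists in matching order. The function name `split_mates` and the (name, flag, sequence) tuple layout are assumptions for illustration; singletons are dropped here so the two outputs stay in sync.

```python
from itertools import groupby

FLAG_READ1 = 0x40  # first read of the pair
FLAG_READ2 = 0x80  # second read of the pair


def split_mates(records):
    """Split name-sorted (name, flag, sequence) tuples into R1 and R2 lists.

    Assumes primary alignments only. Reads whose mate is missing are skipped
    so that entry i of both lists always belongs to the same pair.
    """
    read1, read2 = [], []
    for name, group in groupby(records, key=lambda rec: rec[0]):
        first = second = None
        for _, flag, seq in group:
            if flag & FLAG_READ1:
                first = seq
            elif flag & FLAG_READ2:
                second = seq
        if first is not None and second is not None:
            read1.append((name, first))
            read2.append((name, second))
    return read1, read2
```

Note that the order of the two mates within a name group does not matter, only that they sit next to each other.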



              • #8
                Try using bam2fastq from HudsonAlpha at http://www.hudsonalpha.org/gsl/software/bam2fastq.php. It is very quick: it processed my 8 BAM files, ranging from 0.5 to 4 GB in size, in less than 10 minutes on a standard 2-core Linux machine.



                • #9
                  Help using bamtools

                  I'm new to this and looking for help too. When I use bamtools to convert my .bam file to FASTQ, I only get one output file. Is it possible to split paired-end reads into two output files? Can someone suggest a method?
                  Many thanks,
                  Johnny.



                  • #10
                    With Picard you just specify two different output files, like:

                    java -jar picard-tools/SamToFastq.jar I=Input.bam F=seq1_1.fastq F2=seq1_2.fastq

                    You can also split these by read groups using additional command line arguments.
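If you already have a single FASTQ file and would rather split that than rerun the conversion, a small script can do it. The sketch below assumes 4-line records whose names end in /1 or /2; whether your converter writes those suffixes depends on the tool and its options, so check a few headers first. The function name `deinterleave` is made up for this example.

```python
def deinterleave(lines):
    """Split FASTQ lines into (read1, read2, unpaired) lists of 4-line records.

    Records are routed by the /1 or /2 suffix on the read name; anything
    without a suffix goes to the unpaired list.
    """
    read1, read2, unpaired = [], [], []
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        name = record[0].split()[0]  # header up to the first whitespace
        if name.endswith("/1"):
            read1.append(record)
        elif name.endswith("/2"):
            read2.append(record)
        else:
            unpaired.append(record)
        record = []
    return read1, read2, unpaired
```

Passing an open file handle as `lines` works, since the function only iterates over it; writing each list back out with newline-joined records gives the two files.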



                    • #11
                      TopHat

                      The bam2fastx utility that ships with TopHat can convert BAM to FASTQ (with basic settings):

                      bam2fastx -q -Q -A -o output.fastq input.bam

                      For more options:

                      bam2fastx [--fasta|-a|--fastq|-q] [--color] [-Q] [--sam|-s|-t]
                      [-M|--mapped-only|-A|--all] [-o <outfile>] [-P|--paired] [-N] <in.bam>

                      Note: By default, reads flagged as not passing quality controls are
                      discarded; the -Q option can be used to ignore the QC flag.

                      Use the -N option if the /1 and /2 suffixes should be appended to
                      read names according to the SAM flags



                      • #12
                        I second abhinay's bam2fastx suggestion above.



                        • #13
                          Hi, I am new here. Can anyone tell me what script you use to convert BAM files to FASTQ in Picard? Thanks.






                          • #14
                            Code:
                            java -jar /usr/local/tools/picard-tools-1.114/SamToFastq.jar \
                            VALIDATION_STRINGENCY=SILENT \
                            INPUT=HI.1965.007.Index_1.FL_K562-110k-A.bam \
                            FASTQ=HI.1965.007.Index_1.FL_K562-110k-A_R1.fastq \
                            SECOND_END_FASTQ=HI.1965.007.Index_1.FL_K562-110k-A_R2.fastq \
                            &> bamtofastq.sh.log



                            • #15
                              Found this thread and decided to revive it.
                              Has anyone tried to get back to the original FASTQ pairs (R1 and R2) after several pairs were merged into one BAM file? The alignment was done with bwa mem and the merging with biobambam.
                              Three separately sequenced lanes were the input.
                              Right now I use Picard for the BAM-to-FASTQ conversion. Are there any other feasible options?
                              And do I really get back FASTQ files that are 100% identical to the original input?
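One way to approach the last question is to test it directly on a small sample. The sketch below compares two FASTQ files as unordered collections of (name, sequence, quality) records. It deliberately ignores record order and anything after the first whitespace in the header, since that part may not survive alignment; name suffixes such as /1 may also need normalising before the comparison. The function names are hypothetical, and everything is held in memory, so it is only suited to subsets.

```python
from collections import Counter


def fastq_records(lines):
    """Group FASTQ lines into (name, sequence, quality) tuples (4-line records)."""
    stripped = [line.rstrip("\n") for line in lines]
    records = []
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _, qual = stripped[i:i + 4]
        # Drop the leading "@" and any comment after the first whitespace.
        records.append((header[1:].split()[0], seq, qual))
    return records


def same_reads(original_lines, recovered_lines):
    """Return True if both inputs hold the same reads, ignoring order."""
    return Counter(fastq_records(original_lines)) == Counter(fastq_records(recovered_lines))
```

A True result means the reads, bases and qualities match; it does not mean the files are byte-identical, because order and header comments are not compared.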
