  • Which bam file is what I need to count reads?

    Hi everyone,
    After converting my Bowtie SAM files to BAM, I'm not sure which BAM file I should use to count reads, because there are two of them: a .bam file and a sorted one. Could anyone help me out?

    Thanks a lot!!

    Richard

  • #2
    I assume you mean counting with htseq-count or something similar. In that case, use whichever file is sorted by name (rather than by coordinate). Bowtie produces name-sorted output (unlike TopHat, which coordinate-sorts its output by default, though you can disable this).

    • #3
      I don't quite understand your question. Could you post the command you used to run Bowtie? I use Bowtie2, and I believe that when it finishes aligning it reports how many reads aligned uniquely, aligned more than once, or did not align.

      • #4
        I'll add that, if you need to see how a file was sorted, just run
        Code:
        samtools view -H file.bam | grep "@HD"
        and check whether it says "unsorted", "queryname", or "coordinate". In practice, "unsorted" is usually sufficient, and you likely don't need "queryname" there (I'm sure some aligner or other doesn't keep mates next to each other, but I've never seen it).
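A minimal sketch of the same check as a reusable shell function, assuming header text is piped in (for example from `samtools view -H file.bam`). The name `sam_sort_order` is hypothetical, not a samtools command. The SAM specification allows SO values of `unknown`, `unsorted`, `queryname`, and `coordinate`, with `unknown` as the default, so the sketch prints `unknown` when no SO tag is present.

```shell
# Hypothetical helper: print the SO (sort order) tag from SAM header text on stdin.
sam_sort_order() {
  awk -F'\t' '
    /^@HD/ {
      # Scan the tab-separated tags on the @HD line for one starting with "SO:".
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "SO:") {
          print substr($i, 4)
          found = 1
          exit
        }
      }
    }
    # No SO tag found: the SAM spec treats a missing SO as "unknown".
    END { if (!found) print "unknown" }
  '
}
```

Usage: `samtools view -H file.bam | sam_sort_order` prints a single word such as `coordinate`.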

        • #5
          Thanks, Devon!
          Yes, I will count them with htseq-count and bedtools multicov. If I understand you correctly, the .bam file is what I need. Does that mean these two tools automatically use the sorted BAM file and the index internally?

          • #6
            HTSeq-count doesn't perform random access, so it won't use the index (and you can't index a BAM file that isn't coordinate-sorted anyway). I've never used bedtools multicov, so I don't know what input it expects.

            • #7
              If the BAM file has already been sorted by running samtools sort, why is a separate sorted BAM file also kept, and what is the purpose of storing it?

              • #8
                Sorry, Devon!
                I get it now: the BAM file is sorted so that it can be indexed.

                • #9
                  Originally posted by wmseq
                  If the BAM file has already been sorted by running samtools sort, why is a separate sorted BAM file also kept, and what is the purpose of storing it?
                  samtools sort does not sort in place, so it writes a new sorted file. Use that file for htseq-count, but remember to sort it by name (the -n flag), since htseq-count needs a name-sorted file.

                  • #10
                    crazyhottommy,
                    Do you mean that I need the "name_sorted.bam" file for htseq-count, not the "name.bam" file from the samtools view command?

                    • #11
                      Sorting is just for sorting. If you sort by coordinate, you can then create an index that lets tools quickly jump to a given region of the file. You can also sort by name, which is really the ideal input for htseq-count. A name-sorted BAM file can't be indexed (I assume trying throws an error). You can also have a plain unsorted file. Those normally work fine with htseq-count; you just need the two mates of a pair to be next to each other.

                      If you have single-end reads, then any BAM file (sorted or not) will work for htseq-count.
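Whether a file meets that "mates next to each other" requirement can be checked roughly with awk. This is a sketch under stated assumptions: `qnames_grouped` is a hypothetical helper (not part of samtools or HTSeq), and it uses a simple proxy, testing that all records sharing a read name (QNAME, column 1) form one contiguous block. It skips header lines starting with `@`.

```shell
# Hypothetical helper: succeed if every read name (QNAME, column 1) occupies a
# single contiguous run of records, as in name-sorted or typical aligner output.
# Reads SAM text on stdin; header lines starting with "@" are skipped.
# Prints "grouped" (exit 0) or "split: <name>" for the first offending name (exit 1).
qnames_grouped() {
  awk -F'\t' '
    /^@/ { next }
    $1 != prev {
      seen[prev] = 1          # the run for the previous name has ended
      if ($1 in seen) { print "split: " $1; bad = 1; exit }
      prev = $1
    }
    END {
      if (bad) exit 1
      print "grouped"
    }
  '
}
```

Usage: `samtools view file.bam | head -n 100000 | qnames_grouped`. The awk array remembers every read name it has seen, so checking only the first part of a large file with `head` keeps memory use bounded.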

                      • #12
                        Devon,
                        After running the following commands, I got three outputs. Two of them (0_1Q_3.sam and 0_1Q_3_sorted.bam) are folders containing a file named 0_1Q_3 and a file named 0_1Q_3_sorted, respectively; the third is a file named 0_1Q_3_sorted.bam.bai. That is why I am not sure which file I need.

                        Code:
                        /home/wenfu/bin/samtools import /media/wenfu/LaCie/my_rnaseq_dat/Amhg45.fa 0_1Q_3.sam 0_1Q_3.bam
                        /home/wenfu/bin/samtools sort 0_1Q_3.bam 0_1Q_3_sorted
                        /home/wenfu/bin/samtools index 0_1Q_3_sorted.bam

                        • #13
                          For htseq-count, 0_1Q_3.bam would work and the sorted file wouldn't, since you coordinate-sorted it (as I mentioned earlier, if you have single-end reads, both will work). htseq-count needs the mates of a pair to be next to each other in the file, so if you feed it a coordinate-sorted file (e.g., 0_1Q_3_sorted.bam) with paired-end reads, you'll get a lot of warnings and incorrect counts. By the way, in the future just do this:

                          Code:
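                          samtools view -bS 0_1Q_3.sam | samtools sort - 0_1Q_3.sorted
                          samtools index 0_1Q_3.sorted.bam
                          # samtools 1.3 and later dropped the output-prefix form of sort; there, the first line becomes:
                          # samtools view -b 0_1Q_3.sam | samtools sort -o 0_1Q_3.sorted.bam -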
                          Just give htseq-count the SAM file and then delete it. There's no need to use the old import command, which is just an alias for the "view" command and probably needs an indexed fasta file.
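To see why a coordinate-sorted file trips up htseq-count with paired-end reads, here is a toy illustration using plain `sort` on four made-up, SAM-like records. The read names, positions, and the truncated four-column layout are hypothetical, and a text sort only mimics the ordering: `samtools sort -n` uses its own name comparison. In coordinate order the two mates of each pair end up apart; in name order they sit together.

```shell
# Four made-up paired-end records: QNAME, FLAG, RNAME, POS (other SAM columns omitted).
records=$(printf 'r1\t99\tchr1\t100\nr1\t147\tchr1\t300\nr2\t99\tchr1\t150\nr2\t147\tchr1\t350\n')
tab=$(printf '\t')

# Coordinate order: by reference name, then numerically by position.
coord_order=$(printf '%s\n' "$records" | sort -t "$tab" -k3,3 -k4,4n | cut -f1 | paste -s -d ' ' -)

# Name order: by QNAME (a plain text sort, standing in for samtools sort -n).
name_order=$(printf '%s\n' "$records" | sort -t "$tab" -k1,1 | cut -f1 | paste -s -d ' ' -)

echo "coordinate: $coord_order"   # coordinate: r1 r2 r1 r2
echo "name:       $name_order"    # name:       r1 r1 r2 r2
```

Newer HTSeq releases also accept `-r pos` (`--order pos`) for coordinate-sorted paired-end input, buffering reads in memory until their mates turn up (see `--max-reads-in-buffer`); check the documentation for the version you have installed.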

                          • #14
                            Thanks a lot, Devon!
                            Is the "-" after sort and before 0_1Q_3.sorted necessary?

                            • #15
                              Yes, it means "standard input", which is needed for the pipe to work.
