Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • kaps
    Member
    • Jan 2015
    • 71

    Calculating percentage of reads aligning to a subject

    I have done a blast on virus contigs and obtained hits matching to virus (strains/isolates) in the database. I would like to calculate the percentage of reads that are aligning to the viruses in the database. Can someone guide me on how to do this?
  • Michael.Ante
    Senior Member
    • Oct 2011
    • 127

    #2
    You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
    After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).

    Comment

    • kaps
      Member
      • Jan 2015
      • 71

      #3
      Thanks Michael,

      I have not used Bowtie/ samtools before. How do I start off?

      Comment

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        1. What format are your blast results in (html, xml, text)? You may be able to parse that result file if all you want to know is how many sequences hit a "virus".

        2. If you did the blast locally do you have a sequence file with all "virus" sequences available? You will be able to use that file as an input for bowtie2 and follow the path @Michael.Ante suggested.

        3. Are you comfortable using command line (e.g. linux) applications?

        Comment

        • kaps
          Member
          • Jan 2015
          • 71

          #5
          1. blast results are in txt format
          2. yes. sequence file is available ( database file?)
          3. am fairly comfortable with command line
          4. I would prefer the command prompt option as opposed to logging on the cluster (my internet is erratic)
          Last edited by kaps; 04-11-2015, 05:05 AM.

          Comment

          • Michael.Ante
            Senior Member
            • Oct 2011
            • 127

            #6
            Originally posted by kaps View Post
            Thanks Michael,

            I have not used Bowtie/ samtools before. How do I start off?
            Hi Kaps,

            you should have a look at the Bowtie2 homepage. There, it is explained in detail how the programs work. At the end of the manual is a "Lambda phage example", which has quite an overlap to your problem. It also has a SAMtools downstream section...

            Cheers,
            Michael

            Comment

            • kaps
              Member
              • Jan 2015
              • 71

              #7
              Hi, Michael

              Thanks for pointing this to me!

              Comment

              • kaps
                Member
                • Jan 2015
                • 71

                #8
                Originally posted by Michael.Ante View Post
                You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                Hello Michael, when I try samtools idxstats,
                I am getting a comment as below;

                samtools idxstats lib4seq.sorted.bam
                [bam_idxstats] fail to load the index.

                what could be the problem?

                Comment

                • Michael.Ante
                  Senior Member
                  • Oct 2011
                  • 127

                  #9
                  Have you created the index with
                  Code:
                  samtools index lib4seq.sorted.bam
                  ?
                  If yes how does your bam-dile header looks like?
                  Code:
                   samtools view -H lib4seq.sorted.bam

                  Comment

                  • kaps
                    Member
                    • Jan 2015
                    • 71

                    #10
                    I had not created the index,
                    I can now see the statistics in the index file!

                    Thanks

                    Comment

                    • kaps
                      Member
                      • Jan 2015
                      • 71

                      #11
                      Originally posted by Michael.Ante View Post
                      You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                      After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                      In a case where my index file has several sequences for different strains/isolates of the same virus which may be treated as duplicates, how do I restrict bowtie2 to do the alignment once?

                      Comment

                      • kaps
                        Member
                        • Jan 2015
                        • 71

                        #12
                        Originally posted by Michael.Ante View Post
                        You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                        After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                        Hello,

                        After getting the samtools idxstats (on number of mapped vs unmapped reads), is it possible to extract/select reads that mapped from the raw read files/query? how is it done?

                        Comment

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          If you had used the "--un-conc and --al-conc" options (http://bowtie-bio.sourceforge.net/bo...output-options) the unmapped reads could have been written to separate files when you did the alignment.


                          1. You could repeat bowtie2 alignment with above parameters added to your original list (easier) OR
                          2. Identify read ID's of sequences that mapped and use a tool like seqtk to extract the mapped reads (e.g. seqtk subseq in.fq name.lst > out.fq)


                          Use @Michael.Ante's easy suggestion below
                          Last edited by GenoMax; 05-12-2015, 04:34 AM.

                          Comment

                          • Michael.Ante
                            Senior Member
                            • Oct 2011
                            • 127

                            #14
                            You can use samtools view to extract the mapped/unmapped reads by filtering the 'unmapped' flag:

                            Code:
                            samtools view -F 4 -bh lib4seq.sorted.bam > lib4seq.sorted.mapped.bam
                            samtools view -f 4 -bh lib4seq.sorted.bam > lib4seq.sorted.unmapped.bam
                            Samtools view will help a lot; just have a look at some tutorials, for instance Dave's wiki

                            Comment

                            • kaps
                              Member
                              • Jan 2015
                              • 71

                              #15
                              if i want to convert lib4seq.sorted.mapped.bam to a fastq file (creating 2 files for paired end) do i need to sort this bam file again?
                              Last edited by kaps; 05-19-2015, 02:39 AM. Reason: error correction

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                                Here are nine questions we think about, in roughly the order they matter, before...
                                Today, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 06:09 AM
                              0 responses
                              16 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              37 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              42 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              49 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...