Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating percentage of reads aligning to a subject

    I have done a blast on virus contigs and obtained hits matching to virus (strains/isolates) in the database. I would like to calculate the percentage of reads that are aligning to the viruses in the database. Can someone guide me on how to do this?

  • #2
    You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
    After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).

    Comment


    • #3
      Thanks Michael,

      I have not used Bowtie/ samtools before. How do I start off?

      Comment


      • #4
        1. What format are your blast results in (html, xml, text)? You may be able to parse that result file if all you want to know is how many sequences hit a "virus".

        2. If you did the blast locally do you have a sequence file with all "virus" sequences available? You will be able to use that file as an input for bowtie2 and follow the path @Michael.Ante suggested.

        3. Are you comfortable using command line (e.g. linux) applications?

        Comment


        • #5
          1. blast results are in txt format
          2. yes. sequence file is available ( database file?)
          3. am fairly comfortable with command line
          4. I would prefer the command prompt option as opposed to logging on the cluster (my internet is erratic)
          Last edited by kaps; 04-11-2015, 05:05 AM.

          Comment


          • #6
            Originally posted by kaps View Post
            Thanks Michael,

            I have not used Bowtie/ samtools before. How do I start off?
            Hi Kaps,

            you should have a look at the Bowtie2 homepage. There, it is explained in detail how the programs work. At the end of the manual is a "Lambda phage example", which has quite an overlap to your problem. It also has a SAMtools downstream section...

            Cheers,
            Michael

            Comment


            • #7
              Hi, Michael

              Thanks for pointing this to me!

              Comment


              • #8
                Originally posted by Michael.Ante View Post
                You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                Hello Michael, when I try samtools idxstats,
                I am getting a comment as below;

                samtools idxstats lib4seq.sorted.bam
                [bam_idxstats] fail to load the index.

                what could be the problem?

                Comment


                • #9
                  Have you created the index with
                  Code:
                  samtools index lib4seq.sorted.bam
                  ?
                  If yes how does your bam-dile header looks like?
                  Code:
                   samtools view -H lib4seq.sorted.bam

                  Comment


                  • #10
                    I had not created the index,
                    I can now see the statistics in the index file!

                    Thanks

                    Comment


                    • #11
                      Originally posted by Michael.Ante View Post
                      You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                      After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                      In a case where my index file has several sequences for different strains/isolates of the same virus which may be treated as duplicates, how do I restrict bowtie2 to do the alignment once?

                      Comment


                      • #12
                        Originally posted by Michael.Ante View Post
                        You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                        After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                        Hello,

                        After getting the samtools idxstats (on number of mapped vs unmapped reads), is it possible to extract/select reads that mapped from the raw read files/query? how is it done?

                        Comment


                        • #13
                          If you had used the "--un-conc and --al-conc" options (http://bowtie-bio.sourceforge.net/bo...output-options) the unmapped reads could have been written to separate files when you did the alignment.


                          1. You could repeat bowtie2 alignment with above parameters added to your original list (easier) OR
                          2. Identify read ID's of sequences that mapped and use a tool like seqtk to extract the mapped reads (e.g. seqtk subseq in.fq name.lst > out.fq)


                          Use @Michael.Ante's easy suggestion below
                          Last edited by GenoMax; 05-12-2015, 04:34 AM.

                          Comment


                          • #14
                            You can use samtools view to extract the mapped/unmapped reads by filtering the 'unmapped' flag:

                            Code:
                            samtools view -F 4 -bh lib4seq.sorted.bam > lib4seq.sorted.mapped.bam
                            samtools view -f 4 -bh lib4seq.sorted.bam > lib4seq.sorted.unmapped.bam
                            Samtools view will help a lot; just have a look at some tutorials, for instance Dave's wiki

                            Comment


                            • #15
                              if i want to convert lib4seq.sorted.mapped.bam to a fastq file (creating 2 files for paired end) do i need to sort this bam file again?
                              Last edited by kaps; 05-19-2015, 02:39 AM. Reason: error correction

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM
                              • seqadmin
                                The Impact of AI in Genomic Medicine
                                by seqadmin



                                Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                                02-26-2024, 02:07 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-14-2024, 06:13 AM
                              0 responses
                              33 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-08-2024, 08:03 AM
                              0 responses
                              72 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-07-2024, 08:13 AM
                              0 responses
                              80 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-06-2024, 09:51 AM
                              0 responses
                              68 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X