Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating percentage of reads aligning to a subject

    I have done a blast on virus contigs and obtained hits matching to virus (strains/isolates) in the database. I would like to calculate the percentage of reads that are aligning to the viruses in the database. Can someone guide me on how to do this?

  • #2
    You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
    After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).

    Comment


    • #3
      Thanks Michael,

      I have not used Bowtie/ samtools before. How do I start off?

      Comment


      • #4
        1. What format are your blast results in (html, xml, text)? You may be able to parse that result file if all you want to know is how many sequences hit a "virus".

        2. If you did the blast locally do you have a sequence file with all "virus" sequences available? You will be able to use that file as an input for bowtie2 and follow the path @Michael.Ante suggested.

        3. Are you comfortable using command line (e.g. linux) applications?

        Comment


        • #5
          1. blast results are in txt format
          2. yes. sequence file is available ( database file?)
          3. am fairly comfortable with command line
          4. I would prefer the command prompt option as opposed to logging on the cluster (my internet is erratic)
          Last edited by kaps; 04-11-2015, 05:05 AM.

          Comment


          • #6
            Originally posted by kaps View Post
            Thanks Michael,

            I have not used Bowtie/ samtools before. How do I start off?
            Hi Kaps,

            you should have a look at the Bowtie2 homepage. There, it is explained in detail how the programs work. At the end of the manual is a "Lambda phage example", which has quite an overlap to your problem. It also has a SAMtools downstream section...

            Cheers,
            Michael

            Comment


            • #7
              Hi, Michael

              Thanks for pointing this to me!

              Comment


              • #8
                Originally posted by Michael.Ante View Post
                You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                Hello Michael, when I try samtools idxstats,
                I am getting a comment as below;

                samtools idxstats lib4seq.sorted.bam
                [bam_idxstats] fail to load the index.

                what could be the problem?

                Comment


                • #9
                  Have you created the index with
                  Code:
                  samtools index lib4seq.sorted.bam
                  ?
                  If yes how does your bam-dile header looks like?
                  Code:
                   samtools view -H lib4seq.sorted.bam

                  Comment


                  • #10
                    I had not created the index,
                    I can now see the statistics in the index file!

                    Thanks

                    Comment


                    • #11
                      Originally posted by Michael.Ante View Post
                      You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                      After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                      In a case where my index file has several sequences for different strains/isolates of the same virus which may be treated as duplicates, how do I restrict bowtie2 to do the alignment once?

                      Comment


                      • #12
                        Originally posted by Michael.Ante View Post
                        You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
                        After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats).
                        Hello,

                        After getting the samtools idxstats (on number of mapped vs unmapped reads), is it possible to extract/select reads that mapped from the raw read files/query? how is it done?

                        Comment


                        • #13
                          If you had used the "--un-conc and --al-conc" options (http://bowtie-bio.sourceforge.net/bo...output-options) the unmapped reads could have been written to separate files when you did the alignment.


                          1. You could repeat bowtie2 alignment with above parameters added to your original list (easier) OR
                          2. Identify read ID's of sequences that mapped and use a tool like seqtk to extract the mapped reads (e.g. seqtk subseq in.fq name.lst > out.fq)


                          Use @Michael.Ante's easy suggestion below
                          Last edited by GenoMax; 05-12-2015, 04:34 AM.

                          Comment


                          • #14
                            You can use samtools view to extract the mapped/unmapped reads by filtering the 'unmapped' flag:

                            Code:
                            samtools view -F 4 -bh lib4seq.sorted.bam > lib4seq.sorted.mapped.bam
                            samtools view -f 4 -bh lib4seq.sorted.bam > lib4seq.sorted.unmapped.bam
                            Samtools view will help a lot; just have a look at some tutorials, for instance Dave's wiki

                            Comment


                            • #15
                              if i want to convert lib4seq.sorted.mapped.bam to a fastq file (creating 2 files for paired end) do i need to sort this bam file again?
                              Last edited by kaps; 05-19-2015, 02:39 AM. Reason: error correction

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              18 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              22 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              16 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              47 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X