  • Filtering bam files by index

    Dear all,

    Is there any way to filter BAM files by index?
    I've created a txt file in R with the indexes of the sequences I need, and now I have to extract those sequences and write them to a new BAM file.

    Does anybody know if this is possible, preferably in R or with samtools?

    If this is not possible, is there any way to create a pileup file with Rsamtools, similar to samtools mpileup?

    Thanks for any response.

  • #2
    Are you aware that 'samtools view' can be called with region information to get a sub-file (SAM or BAM format) containing just the reads mapped in that region? That sounds like what you are asking for.



    • #3
      Thank you for your reply; unfortunately, this is not what we are looking for.

      Maybe we should make clearer what kind of data we have and what we want to accomplish.

      We sequenced a short plasmid fragment (50 bp) with a GAx2. We used 100K copies of that plasmid to generate a library and got approx. 1M reads, which we interpret as each plasmid copy yielding about 10 clusters on the flow cell that were detected by the sequencer. When we analyzed the reads that mapped to our 50 bp reference sequence, we found ~190,000 different read variants, ~170,000 of which occur 10 or fewer times. We want to remove those low-frequency reads from the BAM file, since we think they are PCR artifacts, and then create an mpileup for later use with VarScan to detect reliable variants of the plasmid.
      So far we have written an R script that builds an index of those low-frequency reads. How can we write a new BAM file without those reads using Rsamtools?

      Thank you
      Stephan
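      The low-frequency step Stephan describes can be sketched in a few lines. This is a minimal sketch, not the poster's R script: it assumes the input is SAM text (for example the output of `samtools view`), assumes a "variant" means an identical SEQ string, and the function name `low_frequency_read_names` is hypothetical. Because SAM stores SEQ on the forward reference strand, identical variants mapped to either strand are counted together.

```python
from collections import Counter


def low_frequency_read_names(sam_lines, max_count=10):
    """Return QNAMEs of mapped reads whose exact sequence occurs <= max_count times.

    Assumes sam_lines are SAM text records (e.g. the output of `samtools view`).
    Treating identical SEQ strings as one "variant" is an assumption.
    """
    records = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # skip unmapped reads (FLAG bit 0x4)
        records.append((fields[0], fields[9]))  # (QNAME, SEQ)

    # Count how many mapped reads carry each distinct sequence.
    seq_counts = Counter(seq for _, seq in records)
    # Keep the names of reads whose sequence is rare (<= max_count copies).
    return {name for name, seq in records if seq_counts[seq] <= max_count}
```

      The default `max_count=10` mirrors the "10 or less times" cut-off from the post; the returned set of names is exactly what the one-pass filter suggested in the next reply needs.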



      • #4
        So essentially you have generated a list of about 170,000 read names you want to remove from the BAM file?

        In python I'd create a set object of the unwanted read names, then iterate over the SAM/BAM file in one pass filtering using this set. Python sets are much faster than the typically used list data structure because they use a hash for membership testing, rather than scanning the entire list. You could easily do this with pysam (the Python samtools API wrapper).

        I presume the same basic approach would work just as well in R, but I cannot give you any specific advice there.
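        The set-based, single-pass filter described above might look like this. It is a sketch under stated assumptions: it works on SAM text rather than calling pysam, keeps header lines so the output stays valid SAM, and the function name `filter_sam_lines` and the one-name-per-line input file are hypothetical.

```python
import sys


def filter_sam_lines(sam_lines, unwanted_names):
    """Yield header lines and every record whose QNAME is not in unwanted_names."""
    unwanted = set(unwanted_names)  # hash-based membership test, O(1) on average
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        qname = line.split("\t", 1)[0]  # QNAME is the first SAM column
        if qname not in unwanted:
            yield line


def main():
    # Hypothetical usage: argv[1] is a text file with one unwanted read name per line.
    with open(sys.argv[1]) as handle:
        names = {line.strip() for line in handle if line.strip()}
    sys.stdout.writelines(filter_sam_lines(sys.stdin, names))


if __name__ == "__main__":
    main()
```

        One way to run it (file names are placeholders): `samtools view -h input.bam | python filter_reads.py unwanted.txt | samtools view -b -o filtered.bam -`. Current samtools detects SAM on stdin automatically; old 0.1.x releases needed `-S`. With pysam the same loop would iterate over a `pysam.AlignmentFile` opened with `"rb"`, test `read.query_name` against the set, and write kept reads to an `AlignmentFile` opened with `"wb"` and `template=` set to the input file.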



        • #5
          Thank you for your response; we'll try Python.
