  • Extract mapped reads at certain position from paired end RNA-seq data

    Hello everyone,

    I am having problems extracting the mapped reads at a certain position from a BAM alignment file. I hope that you can help me.

    I have paired-end RNA-seq reads that I mapped to the human reference genome using TopHat. For a predefined list of SNP positions, I now want to extract the reads that mapped to each position. I am currently using samtools view for this, for instance:

    Code:
    samtools view reads_unique_hits.sorted.bam chr2:96858914-96858914 > extracted_reads.sam
    For most positions this works fine, but for some it goes wrong. When the SNP position lies in the insert region between the two reads of a pair, samtools returns the reads of that pair even though no read is mapped at that exact location. For these locations I just want an empty file.

    Is there any filtering I can apply so that it only returns reads that actually map at the given location? Or is there another tool I can use for this? I tried bedtools (intersectBed), but it behaves the same as samtools view. Any help is appreciated.

    Thanks in advance!

  • #2
    intersectBed should work just fine for that purpose, assuming you have converted your reads to BEDPE. I just tested it, and it did not report reads in the insert region.

    bedtools intersect -a test.bed -b test.gff -wa

    I just put a single position into the GFF file; you could put your SNP or SNP list in instead.
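
    Another option is to filter the samtools view output by CIGAR string. The following is a minimal sketch, not a tested pipeline. It assumes the unwanted records are reads whose reference span crosses the position only through a gap (for example an `N` skipped region in a spliced alignment), which nobody has confirmed in this thread. The function names `covers_position` and `filter_sam_lines` are made up for illustration.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
# Operations in which a read base is actually aligned to a reference base.
ALIGNED = set("M=X")

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def covers_position(pos, cigar, target):
    """Return True if the read has an aligned base at 1-based `target`.

    `pos` is the 1-based leftmost mapping position (SAM column 4) and
    `cigar` is the CIGAR string (SAM column 6). Positions that fall in a
    deletion (D) or a skipped region (N) are treated as not covered.
    """
    if cigar == "*":
        return False
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in REF_CONSUMING:
            if op in ALIGNED and ref <= target < ref + length:
                return True
            ref += length
    return False


def filter_sam_lines(lines, chrom, target):
    """Yield SAM lines (headers included) whose read covers chrom:target."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] == chrom and covers_position(int(fields[3]), fields[5], target):
            yield line
```

    To use it, pipe the SAM text from samtools view into a small script that calls `filter_sam_lines(sys.stdin, "chr2", 96858914)` and writes the surviving lines to the output file. Positions with no truly overlapping read then give an empty result.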

    • #3
      Hi RedMary,

      Did you inspect those regions in a genome browser to make sure no reads were mapped there? If possible, please share a screenshot.

      Best regards,
      Douglas
