  • [Help] How to get the reads containing a specified SNP?

    Hi all,

    I am new to bioinformatics.

    After SNP calling with GATK/freebayes, we usually get a list of SNPs. I now have some SNP sites of interest. Does anyone know how to identify the reads that contain these SNPs?

    Please note that these SNPs might be heterozygous. I have mapped the reads to a reference and have a sorted BAM file.

    Could anyone tell me how to do this in detail, or just share your thoughts and any tools that might be helpful?

  • #2
    Assuming you have mapped your reads and now have a SAM/BAM file (this is the usual case), the samtools program with the 'view' command will pull out the reads in a region of your choice. For region queries, the sorted BAM file must first be indexed with samtools index.
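samtools view returns every read that overlaps the region, not only the reads carrying the SNP allele. At a heterozygous site, both alleles will be present among those reads. Below is a minimal Python sketch of the follow-up step. It reads SAM text on standard input and walks each read's CIGAR string to find the read base aligned to the SNP position. It then prints the names of reads whose base matches the allele you ask for. The chromosome name, position, allele, script name (snp_reads.py) and the mapping-quality cutoff of 20 are hypothetical placeholders. Skipping unmapped and secondary alignments is an assumption, and base quality is ignored.

```python
import re
import sys

# One CIGAR operation = a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def base_at(pos, cigar, seq, ref_pos):
    """Return the read base aligned to 1-based reference position ref_pos,
    or None if the read does not cover it or has a deletion/skip there."""
    if cigar == "*" or seq == "*":
        return None
    r = pos  # reference coordinate of the next aligned base (1-based, like SAM POS)
    q = 0    # index into SEQ (0-based)
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes read and reference
            if r <= ref_pos < r + n:
                return seq[q + ref_pos - r].upper()
            r += n
            q += n
        elif op in "IS":     # consumes read only (insertion, soft clip)
            q += n
        elif op in "DN":     # consumes reference only (deletion, skip)
            if r <= ref_pos < r + n:
                return None
            r += n
        # H and P consume neither read nor reference
    return None

def reads_with_allele(sam_lines, chrom, ref_pos, allele, min_mapq=20):
    """Yield QNAMEs of alignments on chrom whose base at ref_pos equals allele."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        f = line.rstrip("\r\n").split("\t")
        if len(f) < 11:
            continue
        qname, flag, rname = f[0], int(f[1]), f[2]
        pos, mapq, cigar, seq = int(f[3]), int(f[4]), f[5], f[9]
        # Hypothetical filter: skip unmapped (0x4), secondary (0x100), low-MAPQ reads.
        if rname != chrom or flag & 0x104 or mapq < min_mapq:
            continue
        if base_at(pos, cigar, seq, ref_pos) == allele.upper():
            yield qname

if __name__ == "__main__" and len(sys.argv) == 4:
    # e.g. samtools view sorted.bam chr1:12345-12345 | python snp_reads.py chr1 12345 T
    for name in reads_with_allele(sys.stdin, sys.argv[1], int(sys.argv[2]), sys.argv[3]):
        print(name)
```

Running it once with the alternate allele and once with the reference allele splits the reads at the site into two groups.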



    • #3
      I might be misunderstanding you, but you can pull out all the matching reads plus their header and quality lines with
      grep -B 1 -A 2 GCCTATCGCAGATACACTCC sample.fastq > SNVreads.fastqish
      (the nucleotide string contains your SNP)

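      Then remove the "--" separator lines that grep prints between groups of matches (-x matches whole lines only, so quality strings that happen to contain "--" are kept):
      grep -v -x -e '--' SNVreads.fastqish > SNVreads.fastq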

      You might have to tweak the length of your grep nucleotide pattern for specificity and to avoid other SNPs (I don't know what you are sequencing). One cross-platform visualization tool is Ugene.
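The grep commands above work line by line, and they only search the forward strand. Reads from the opposite strand carry the SNP as the reverse complement of the pattern, so grep misses them. Below is a rough Python sketch of the same idea. It works on whole FASTQ records, so no separator lines need removing and quality lines are never searched. It also tests the reverse complement of the pattern. It assumes unwrapped FASTQ (exactly four lines per record), and the script name probe_fastq.py is hypothetical.

```python
import sys

# Complement table for DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def fastq_records(handle):
    """Return an iterator of (header, seq, plus, qual) from 4-line FASTQ records."""
    lines = (line.rstrip("\r\n") for line in handle)
    # zip over the same iterator four times groups consecutive lines into records;
    # an incomplete trailing record is silently dropped.
    return zip(lines, lines, lines, lines)

def records_containing(handle, probe):
    """Yield FASTQ records whose sequence contains probe on either strand."""
    fwd = probe.upper()
    rev = revcomp(fwd)
    for record in fastq_records(handle):
        seq = record[1].upper()
        if fwd in seq or rev in seq:
            yield record

if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python probe_fastq.py GCCTATCGCAGATACACTCC < sample.fastq > SNVreads.fastq
    for record in records_containing(sys.stdin, sys.argv[1]):
        sys.stdout.write("\n".join(record) + "\n")
```

The same caveat about pattern length applies here. The probe should be long enough to be unique, but short enough not to overlap neighbouring SNPs.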

      Hope this is what you are looking for.

      Earl
      --Please take everything I say with a grain of salt, because, if grad school has taught me anything, it's that I'm an idiot--



      • #4
        reference -----------------------------------------------------------
        read1 ----------T-------------
        read2 -------------------------
        read3 ------T------------------
        read4 --------------------------

        I want to extract the IDs of all reads that carry the T SNP.



        • #5
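          If your read file looks like that, you can use the following (-e is needed because the pattern starts with a dash, which grep would otherwise read as an option):

          [your/Directory]$ grep -e '-T-' YourReadFile.txt > YourSNPReadFile.txt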

          output:
          [your/Directory]$ more YourSNPReadFile.txt
          read1 ----------T-------------
          read3 ------T------------------

          _________________________________________________________________________
          If you have a .fastq file, all you need is the first line of each record (the read name), which comes just before the nucleotide string, like:

          @M01472:34:000000000-A40FG:1:1101:17765:1645 1:N:0:9
          NTTCCAGCGAGGTTCTGAGTTCTTAGTCTGGTGTCGGCGTACCCACACGGTG
          +
          #>>>ABFFB?DBGGGGGCEGGGHHHGHHHHHFAGHEEGGGGGGHHGFDEEFG


          just use:

          [your/Directory]$ grep -B 1 GCCTATCGCAGATACACTCC YourSample.fastq > NamesAndReads.txt
          # where "-B 1" also prints the line before each matching line (the read name)
          #and the pattern "GCCTATCGCAGATACACTCC" contains the SNP somewhere in the middle.

          [your/Directory]$ grep @M01472 NamesAndReads.txt > Names.txt
          # "@M01472" is a string found in all the read names but in none of the sequences
          # for instance, if your read names are actually read1, read2, read3 and read4, you could use "read"

          #output for my command
          [your/Directory]$ more Names.txt
          @M01472:34:000000000-A40FG:1:1101:17765:1645 1:N:0:9
          @M01472:34:000000000-A40FG:1:1101:18453:1656 1:N:0:9
          @M01472:34:000000000-A40FG:1:1101:16266:1658 1:N:0:9
          --More--(0%)

          NOTE: this is a quick solution. If your genome is repetitive, or if the SNP is in a duplicated region, this approach might not be the best method. In that case, something a little more involved working from a .sam file might be necessary.
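One way to check where the grep-selected reads actually map is to look up their names in the alignment. This addresses the repetitive or duplicated region concern. Below is a rough Python sketch. It assumes Names.txt holds FASTQ header lines, as produced above. It also assumes the aligner wrote the part of each header before the first space as the SAM QNAME. This is usual for Illumina 1.8+ style names, but check your own files. The script name pick_reads.py is hypothetical. Each printed alignment line shows the chromosome, position and MAPQ, so reads that map ambiguously or to an unexpected location can be spotted.

```python
import sys

def read_ids(header_lines):
    """Turn FASTQ header lines into bare read IDs: drop the leading '@' and
    everything after the first whitespace (e.g. the ' 1:N:0:9' comment)."""
    ids = set()
    for line in header_lines:
        line = line.strip()
        if line.startswith("@") and len(line) > 1:
            ids.add(line[1:].split()[0])
    return ids

def alignments_for(sam_lines, ids):
    """Yield SAM alignment lines whose QNAME (first column) is in ids."""
    for line in sam_lines:
        if not line.startswith("@") and line.split("\t", 1)[0] in ids:
            yield line

if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. samtools view sorted.bam | python pick_reads.py Names.txt
    with open(sys.argv[1]) as names:
        wanted = read_ids(names)
    for line in alignments_for(sys.stdin, wanted):
        sys.stdout.write(line)
```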

          hope that helps

