
**westerman** · 07-28-2014, 07:28 AM

Assuming you have already mapped your reads and now have a SAM/BAM file (this is the usual case), the samtools program with the 'view' command will pull out the reads in the region of your choice.
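A rough sketch of what that could look like. The file name aln.bam, the contig chr1 and the position 12345 are placeholders, not values from this thread. A region query needs a coordinate-sorted, indexed BAM, and it returns every read overlapping the position whether or not it carries the variant base, so a further filter on the base itself is still needed.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the thread): aln.bam is a
# coordinate-sorted BAM and the SNP sits at position 12345 on contig chr1.
bam="aln.bam"
region="chr1:12345-12345"

if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    # Region queries need an index next to the BAM file.
    samtools index "$bam"
    # All reads overlapping the SNP position, as SAM text.
    samtools view "$bam" "$region" > snp_region.sam
    # Column 1 of a SAM record is the read name.
    cut -f 1 snp_region.sam | sort -u > overlapping_read_ids.txt
else
    echo "samtools or $bam not found; nothing done" >&2
fi
```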

**BurlEarl** · 07-29-2014, 09:32 AM

I might not be understanding you, but you can pull out all the reads plus their info with
grep -B 1 -A 2 GCCTATCGCAGATACACTCC sample.fastq > SNVreads.fastqish
(the nuc string contains your SNP)

You then need to remove the -- separator lines that grep prints between reads. Matching the whole line avoids also dropping quality lines that happen to contain --
grep -v '^--$' SNVreads.fastqish > SNVreads.fastq

You might have to tweak the length of your grep nuc pattern for specificity and to avoid other SNPs (I don't know what you are sequencing). Ugene is one cross-platform visualization tool.
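The same extraction can be sketched as a single awk pass that reads the FASTQ four lines at a time, so no separator lines appear and there is nothing to clean up afterwards. This assumes strict 4-line records (no wrapped sequence lines); the function name is made up for illustration.

```shell
#!/bin/sh
# Print complete 4-line FASTQ records whose sequence line contains a motif.
# Assumes strict 4-line records (sequence and quality are not line-wrapped).
fastq_records_with_motif() {
    # $1 = motif, $2 = FASTQ file
    awk -v motif="$1" '
        NR % 4 == 1 { header = $0 }    # line 1 of a record: read name
        NR % 4 == 2 { seq = $0 }       # line 2: bases
        NR % 4 == 3 { plus = $0 }      # line 3: the "+" line
        NR % 4 == 0 {                  # line 4: qualities, record is complete
            if (index(seq, motif) > 0)
                print header "\n" seq "\n" plus "\n" $0
        }
    ' "$2"
}

# Example call with the file names used above:
# fastq_records_with_motif GCCTATCGCAGATACACTCC sample.fastq > SNVreads.fastq
```

Because the record is chosen by line position rather than by pattern, quality strings containing dashes or @ characters are left alone.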

Hope this is what you are looking for.

Earl

**Phoenix_ICE** · 07-30-2014, 01:36 AM

reference -----------------------------------------------------------
read1 ----------T-------------
read2 -------------------------
read3 ------T------------------
read4 --------------------------

I want to extract the IDs of all the reads that have the T SNP.

**BurlEarl** · 08-04-2014, 09:34 AM

If your read file looks like that then you can use

[your/Directory]$ grep -e '-T-' YourReadFile.txt > YourSNPReadFile.txt
#"-e" is needed because the pattern starts with a dash, which grep would otherwise read as an option

output:
[your/Directory]$ more YourSNPReadFile.txt
read1 ----------T-------------
read3 ------T------------------
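Since the question asks for the read IDs only, here is a small sketch that prints just the first column. It assumes the file has two whitespace-separated columns (ID, then the alignment string) as in the example above, and it matches the base anywhere in the string rather than at one fixed column. The function name is hypothetical.

```shell
#!/bin/sh
# Print only the read ID (column 1) for lines whose alignment string
# (column 2) contains the given base. Assumes whitespace-separated columns.
ids_with_base() {
    # $1 = base (for example T), $2 = alignment-style text file
    awk -v base="$1" 'index($2, base) > 0 { print $1 }' "$2"
}

# Example call:
# ids_with_base T YourReadFile.txt > YourSNPReadIDs.txt
```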

_________________________________________________________________________
If you have a .fastq file, all you need is the first line of each record (the read name), which comes just before the nuc string, like:

@M01472:34:000000000-A40FG:1:1101:17765:1645 1:N:0:9
NTTCCAGCGAGGTTCTGAGTTCTTAGTCTGGTGTCGGCGTACCCACACGGTG
+
#>>>ABFFB?DBGGGGGCEGGGHHHGHHHHHFAGHEEGGGGGGHHGFDEEFG

just use:

[your/Directory]$ grep -B 1 GCCTATCGCAGATACACTCC YourSample.fastq > NamesAndReads.txt
#where "-B 1" prints the line before the pattern
#and the pattern "GCCTATCGCAGATACACTCC" contains the SNP somewhere in the middle.

[your/Directory]$ grep @M01472 NamesAndReads.txt > Names.txt
# "@M01472" is something in all the names but not in any reads
# for instance if your read names are actually read1, read2, read3, and read4 you could use "read"

#output for my command
[your/Directory]$ more Names.txt
@M01472:34:000000000-A40FG:1:1101:17765:1645 1:N:0:9
@M01472:34:000000000-A40FG:1:1101:18453:1656 1:N:0:9
@M01472:34:000000000-A40FG:1:1101:16266:1658 1:N:0:9
--More--(0%)
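A possible one-pass variant of the two grep steps, sketched in awk. It picks the name line by its position in the record, so no name prefix like "@M01472" is needed. It also checks the reverse complement of the pattern, because in an unaligned .fastq a read from the opposite strand carries the pattern reverse-complemented and a plain grep would miss it. Assumes strict 4-line records and an uppercase A/C/G/T pattern; the function name is made up.

```shell
#!/bin/sh
# Print the names of reads whose sequence contains the motif or its
# reverse complement. Assumes strict 4-line FASTQ records and a motif
# made only of uppercase A, C, G, T.
read_names_with_motif() {
    # $1 = motif, $2 = FASTQ file
    awk -v motif="$1" '
        BEGIN {
            comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
            rc = ""
            # Walk the motif backwards, complementing each base.
            for (i = length(motif); i >= 1; i--)
                rc = rc comp[substr(motif, i, 1)]
        }
        NR % 4 == 1 { name = $0 }      # remember the current read name
        NR % 4 == 2 && (index($0, motif) > 0 || index($0, rc) > 0) { print name }
    ' "$2"
}

# Example call:
# read_names_with_motif GCCTATCGCAGATACACTCC YourSample.fastq > Names.txt
```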

NOTE: this is a quick solution. If your genome is repetitive, or if the SNP is in a duplicated region, this approach might not be the best method. In that case, something a little more involved, working from a .sam file, might be necessary.
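To give an idea of what "from a .sam file" could mean, here is a deliberately simplified awk sketch, not a complete method. It uses the mapped position (column 4) to find which base of the read (column 10) sits on the SNP, and prints the read name if that base is the allele you want. It only handles reads whose CIGAR is a single match block like 50M; reads with indels or clipping are skipped, so a real analysis would need proper CIGAR handling. The contig name, position and function name are placeholders.

```shell
#!/bin/sh
# Print names of reads that show a given allele at a given reference position.
# Simplification: only reads whose CIGAR is a single "<n>M" block are examined.
reads_with_allele() {
    # $1 = contig, $2 = 1-based SNP position, $3 = allele, $4 = SAM text file
    awk -F '\t' -v chr="$1" -v pos="$2" -v alt="$3" '
        /^@/ { next }                     # skip SAM header lines
        $3 != chr { next }                # read mapped to another contig
        $6 !~ /^[0-9]+M$/ { next }        # skip indels, clipping, unmapped (*)
        {
            off = pos - $4 + 1            # offset of the SNP inside the read
            if (off >= 1 && off <= length($10) && substr($10, off, 1) == alt)
                print $1
        }
    ' "$4"
}

# Example call (placeholder values):
# reads_with_allele chr1 12345 T aln.sam > SNPReadIDs.txt
```

Because the test is tied to the mapped coordinate, a copy of the same sequence elsewhere in the genome does not produce a hit, as long as the mapper placed the reads correctly.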

hope that helps


[Help] How to get those reads containing specified SNP?
