SEQanswers

08-12-2014, 07:43 AM   #1
Schelarina
Member
 
Location: P

Join Date: Apr 2014
Posts: 18
extracting identical reads from bam file

Hello,
I have an alignment file of small RNAs in BAM format.
How can I extract all the identical reads from this file, collapsed into a single sequence, while keeping the read count?
Thank you for the help!
08-12-2014, 08:44 AM   #2
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

Are you trying to match a specific sequence?

samtools view x.bam | awk '{print $10}' | grep TTTTCTGCCTGTTGGGCTGGAG | sort | uniq -c
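A stricter sketch of the same count, assuming you want reads whose SEQ (column 10) equals the target exactly rather than merely containing it. `count_exact_seq` is a hypothetical helper name; it reads tab-separated SAM text on stdin, as produced by `samtools view`. Keep in mind that reverse-strand reads are stored reverse complemented in a BAM file, so they only match the target in that orientation.

```shell
# Hypothetical helper: count reads whose SEQ field (column 10) is exactly
# the given sequence. Reads tab-separated SAM text on stdin, e.g.
#   samtools view x.bam | count_exact_seq TTTTCTGCCTGTTGGGCTGGAG
# Matches in other columns, and longer reads that merely contain the
# sequence, are not counted (unlike a plain grep on the whole line).
count_exact_seq() {
  awk -F '\t' -v target="$1" '$10 == target { n++ } END { print n + 0 }'
}
```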


To find all reads whose sequence is shared with at least one other read, try this:

samtools view x.bam | awk '{print $10}' | sort --buffer-size=20G | uniq -c | awk '{if ($1!=1) print $0}'


where x.bam is your BAM file.
Tune sort's --buffer-size parameter to the memory available on your machine.
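An alternative sketch that tallies sequences in an awk hash table instead of sorting every read. It holds one entry per distinct sequence in memory, which suits highly redundant small-RNA libraries but can exhaust memory when most sequences are unique. `collapse_reads` and its optional minimum-count argument are hypothetical; it assumes SAM text on stdin. If the BAM contains secondary or supplementary alignments, filtering them first (for example `samtools view -F 0x900 x.bam`) avoids counting one read more than once.

```shell
# Hypothetical helper: collapse identical read sequences from SAM text on
# stdin and print "count<TAB>sequence" for each sequence seen at least MIN
# times (default 2, like the awk '$1 != 1' filter in the command above).
collapse_reads() {
  local min="${1:-2}"
  awk -F '\t' -v min="$min" '
    /^@/ { next }          # skip header lines, e.g. from samtools view -h
    { count[$10]++ }       # one hash entry per distinct SEQ string
    END {
      for (s in count)
        if (count[s] >= min) printf "%d\t%s\n", count[s], s
    }' | LC_ALL=C sort     # awk iterates in arbitrary order; sort for stable output
}
```

Usage: `samtools view x.bam | collapse_reads`, or `collapse_reads 1` to keep sequences seen only once.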
08-12-2014, 10:12 AM   #3
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

It would take longer, but swap the second awk in Richard's command line with

Code:
sort -nr
to get the most common reads in order, starting with the most abundant. Tack on

Code:
| head -n 100
to get only the top 100.
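The ranking pipeline can be wrapped in a small function, sketched here under the assumption of SAM text on stdin; `top_reads` is a hypothetical name, and N defaults to 100 as in the post.

```shell
# Hypothetical helper: list distinct read sequences from SAM text on stdin,
# most abundant first, keeping only the top N (default 100).
top_reads() {
  local n="${1:-100}"
  # The first sort is required: uniq -c only collapses ADJACENT duplicates.
  awk -F '\t' '!/^@/ { print $10 }' | sort | uniq -c | sort -nr | head -n "$n"
}
```

Usage: `samtools view x.bam | top_reads 20`. The second `sort -nr` only has to order the distinct sequences, not every read.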
08-20-2014, 08:11 AM   #4
Schelarina
Member
 
Location: P

Join Date: Apr 2014
Posts: 18

Thank you! It worked!
Can I do the same with a FASTQ file?
08-20-2014, 08:47 AM   #5
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

Yes. A standard FASTQ record spans four lines, with the sequence on the second, so you might want to use awk to print only every 4th line, starting from line 2.
Note 1) the modulo operator (%) and 2) the built-in line-count variable NR in awk.
Perl, Python, C or Java can also handle filtering for only the sequence, if you prefer.
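A minimal sketch of that advice, assuming an uncompressed FASTQ with exactly one sequence line per record (no line wrapping); `collapse_fastq` is a hypothetical name. The test `NR % 4 == 2` selects lines 2, 6, 10, ..., which hold the sequences.

```shell
# Hypothetical helper: collapse identical sequences from a 4-line-per-record
# FASTQ on stdin, printing "count sequence" with the most abundant first.
collapse_fastq() {
  awk 'NR % 4 == 2' | sort | uniq -c | sort -nr
}
```

Usage: `collapse_fastq < reads.fastq`, or `gzip -dc reads.fastq.gz | collapse_fastq` for gzipped input (file names are placeholders).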
08-24-2014, 06:32 AM   #6
Schelarina
Member
 
Location: P

Join Date: Apr 2014
Posts: 18

After extracting the identical reads as single sequences from the BAM file, I aligned them to the genome again. When I visualize the new alignment in IGV, all the sequences map in the sense orientation, even those that are supposed to be antisense to the genome. Why is that?
08-24-2014, 01:32 PM   #7
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

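The read sequence stored in a BAM file may be reverse complemented so that it aligns to the forward strand of the reference.
Such reads are supposed to be marked as reversed in the bitwise FLAG field (bit 0x10, decimal 16) of each SAM/BAM entry. Because column 10 was printed exactly as stored, your antisense reads were collapsed in their already reverse-complemented form, so they now align to the forward strand.

You may wish to check this flag and reverse complement those reads back before collapsing them.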
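One way to interrogate the flag, sketched under the assumption that you want every sequence back in its original sequencing orientation before collapsing and realigning. `original_orientation` is a hypothetical helper reading SAM text on stdin. POSIX awk has no bitwise AND, so bit 0x10 is tested arithmetically; only A, C, G, T and N are complemented, and any other character passes through unchanged.

```shell
# Hypothetical helper: print each read's SEQ in its original orientation.
# Reverse-strand reads (FLAG bit 0x10) are stored reverse complemented in
# SAM/BAM, so they are reverse complemented back here. The bit is tested as
# int(FLAG / 16) % 2 because POSIX awk lacks a bitwise AND.
original_orientation() {
  awk -F '\t' '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N" }
    /^@/ { next }                      # skip header lines if present
    {
      seq = $10
      if (int($2 / 16) % 2 == 1) {     # read maps to the reverse strand
        rc = ""
        for (i = length(seq); i >= 1; i--) {
          b = substr(seq, i, 1)
          rc = rc ((b in comp) ? comp[b] : b)   # other codes kept as-is
        }
        seq = rc
      }
      print seq
    }'
}
```

Usage: `samtools view x.bam | original_orientation | sort | uniq -c`.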

All times are GMT -8.