#1 |
Member
Location: P Join Date: Apr 2014
Posts: 18
Hello,
I have an alignment file of small RNAs in BAM format. How can I collapse all the identical reads in this file into a single sequence while keeping the read count? Thank you for the help!
#2 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
Are you trying to match a specific sequence?
Code:
samtools view x.bam | grep TTTTCTGCCTGTTGGGCTGGAG | awk '{print $10}' | sort | uniq -c
For all reads with the same sequence as another read, try this:
Code:
samtools view x.bam | awk '{print $10}' | sort --buffer-size=20G | uniq -c | awk '{if ($1!=1) print $0}'
where x.bam is your BAM file. uniq -c only counts adjacent duplicates, so the sequences have to be sorted first. Tune sort's --buffer-size parameter to fit the memory you have available.
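If the goal is one record per distinct sequence with its count attached, the same idea can be sketched as a small shell function. This is only an illustration: the function name collapse_reads and the ">seqN_xCOUNT" header layout are assumptions made here, not a standard, and the function expects SAM text on standard input.

```shell
# collapse_reads: read SAM records on stdin and print each distinct
# sequence once as FASTA, most frequent first, with the read count
# in the header. The ">seqN_xCOUNT" layout is an illustrative choice.
collapse_reads() {
    awk -F'\t' '
        /^@/ { next }                   # skip SAM header lines, if present
        $10 != "*" { count[$10]++ }     # column 10 is SEQ; "*" means not stored
        END { for (s in count) print count[s] "\t" s }
    ' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2 |
    awk -F'\t' '{ printf ">seq%d_x%d\n%s\n", NR, $1, $2 }'
}
```

It would be fed the same way as the pipelines above, for example samtools view x.bam | collapse_reads > collapsed.fa, where collapsed.fa is a hypothetical output name. Counting in an awk array avoids the large sort, at the cost of holding every distinct sequence in memory.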
#3 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
It would take longer, but swap the second awk in Richard's command line with
Code:
sort -nr
to list the sequences by descending count, and append
Code:
| head -n 100
to keep only the 100 most frequent.
#4 |
Member
Location: P Join Date: Apr 2014
Posts: 18
Thank you! It worked!
Can I do the same from a FASTQ file?
#5 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
Yes. You might want to use awk to print only one line out of every four: the sequence, which is the second line of each four-line FASTQ record.
Note 1) the "mod" operator (%) and 2) the "line count" built-in variable (NR) in awk. If you prefer, Perl, Python, C, or Java can also handle filtering for only the sequence.
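A minimal sketch of that suggestion, assuming plain four-line FASTQ records with no wrapped sequence lines. The function name count_fastq_seqs is made up for illustration. It reads FASTQ on standard input and prints a count and a sequence per line, most frequent first.

```shell
# count_fastq_seqs: read FASTQ on stdin and print "COUNT<TAB>SEQUENCE"
# for every distinct read sequence, most frequent first.
# NR is the awk line counter; NR % 4 == 2 selects line 2 of each
# four-line record, which holds the sequence.
# The second awk only normalises the padding that uniq -c adds.
count_fastq_seqs() {
    awk 'NR % 4 == 2' |
    LC_ALL=C sort |
    uniq -c |
    awk '{ print $1 "\t" $2 }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

For a compressed file this could be run as zcat reads.fastq.gz | count_fastq_seqs, where reads.fastq.gz is a placeholder file name.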
#6 |
Member
Location: P Join Date: Apr 2014
Posts: 18
After I extracted the identical reads as single sequences from the BAM file, I aligned them to the genome again. When I visualize the new alignment in IGV, all the sequences map in the sense orientation, even those that are supposed to be antisense to the genome. Why is that?
#7 |
Senior Member
Location: bethesda Join Date: Feb 2009
Posts: 700
The sequence of a read in a BAM file may be stored reverse complemented so that it aligns to the reference.
Such reads are supposed to be marked as reversed in the bitwise FLAG field (bit 0x10, decimal 16) of the SAM/BAM entry. You may wish to check this flag and handle those reads specially.
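This is why sequences taken straight from column 10 all realign in the sense orientation: they are already written in reference orientation. A rough sketch of checking the flag and restoring the orientation in which each read was sequenced is below. The function name read_orientation is hypothetical, it expects SAM text on standard input, and mapping any base other than A, C, G, T to N is a simplification.

```shell
# read_orientation: read SAM records on stdin and print
# "QNAME<TAB>STRAND<TAB>SEQUENCE", where SEQUENCE is restored to the
# orientation in which the read was sequenced.
read_orientation() {
    awk -F'\t' '
        BEGIN { comp["A"]="T"; comp["C"]="G"; comp["G"]="C"; comp["T"]="A" }
        /^@/ { next }                   # skip SAM header lines, if present
        {
            seq = $10
            strand = "+"
            # FLAG bit 0x10 (decimal 16) marks a read stored reverse complemented
            if (int($2 / 16) % 2 == 1) {
                strand = "-"
                rc = ""
                for (i = length(seq); i >= 1; i--) {
                    base = toupper(substr(seq, i, 1))
                    rc = rc ((base in comp) ? comp[base] : "N")
                }
                seq = rc
            }
            print $1 "\t" strand "\t" seq
        }
    '
}
```

For example, samtools view x.bam | read_orientation | awk '$2 == "-"' would list only the reads that aligned to the reverse strand. Integer division is used for the bit test because bitwise functions are not part of POSIX awk.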