SEQanswers

Old 05-25-2012, 03:10 AM   #1
FiReaNG3L
Member
 
Location: Paris

Join Date: Apr 2012
Posts: 17
Trim BAM file reads from 5' ends

For a specific application I need to display only the 5' ends of reads in a genome browser. So far I haven't found an easy way to trim every read in a BAM file down to its first few base pairs. I obviously need to retain the mapping information, so I can't just trim the reads in the FASTQ.
Old 05-25-2012, 03:37 AM   #2
arvid
Senior Member
 
Location: Berlin

Join Date: Jul 2011
Posts: 156

Define "retain mapping information": do you need the information provided by the CIGAR strings? If you don't, the trimming is straightforward with a simple script that chops off the bases and qualities. Pipe "samtools view -h" into a script that passes the header lines through unchanged and, for each alignment line, cuts columns 10 and 11 (SEQ and QUAL) down to the desired length and replaces the CIGAR field (column 6) with a *. Then pipe the result into "samtools view -bS" to get a BAM file again.
If you need to retain the CIGAR information properly, it gets messier, as you'd have to decide how to handle soft clipping and indels.
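
A minimal sketch of the script described above, in Python. The function names `trim_sam_line` and `trim_stream`, the `keep` parameter, and the demo record are made up for illustration; they are not from any existing tool. Note that it always keeps the left end of SEQ as stored in the file, which is the read's 5' end only for reads aligned to the forward strand.

```python
import io


def trim_sam_line(line, keep):
    """Trim one SAM line: keep the first `keep` bases and qualities, blank the CIGAR."""
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return line  # not a full alignment record; leave it alone
    fields[5] = "*"  # column 6 (CIGAR) no longer matches the trimmed read
    if fields[9] != "*":
        fields[9] = fields[9][:keep]    # column 10: SEQ
    if fields[10] != "*":
        fields[10] = fields[10][:keep]  # column 11: QUAL
    return "\t".join(fields) + "\n"


def trim_stream(infile, outfile, keep):
    """Apply trim_sam_line to every line of a SAM text stream."""
    for line in infile:
        outfile.write(trim_sam_line(line, keep))


# Demo on a made-up header line and a made-up forward-strand record.
demo_in = io.StringIO(
    "@HD\tVN:1.0\tSO:coordinate\n"
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIHHHHH\n"
)
demo_out = io.StringIO()
trim_stream(demo_in, demo_out, keep=4)
print(demo_out.getvalue(), end="")
```

To use it in the pipe, the demo at the bottom would be replaced by a call such as `trim_stream(sys.stdin, sys.stdout, keep)` (with `import sys`), and the script placed between `samtools view -h` and `samtools view -bS`.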

Last edited by arvid; 05-25-2012 at 03:38 AM. Reason: typo
Old 05-25-2012, 03:47 AM   #3
FiReaNG3L
Member
 
Location: Paris

Join Date: Apr 2012
Posts: 17

Ok, so something along the lines of what's suggested in http://seqanswers.com/forums/showpos...53&postcount=2.

I don't think SeqMonk cares about the CIGAR information, so it should be fine.
Old 05-25-2012, 04:01 AM   #4
arvid
Senior Member
 
Location: Berlin

Join Date: Jul 2011
Posts: 156

Yes, basically. I'd really replace the CIGAR with a * though, so the SAM/BAM file stays consistent with the spec. You might also want to check how to tell which end of the read sequence in the BAM is 5' and which is 3' (in the read sense, not the alignment sense, as I understand the SAM spec); I hadn't thought about that.
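
On the 5'/3' question, a hedged sketch in Python. In the SAM format, a read aligned to the reverse strand has FLAG bit 0x10 set and its SEQ and QUAL stored reversed, so its original 5' end sits at the right-hand end of the stored sequence. The helper names `reference_span` and `trim_to_five_prime` are hypothetical. Shifting POS for reverse-strand reads is an addition of this sketch, not something proposed in the thread, and it ignores soft clips and indels inside the kept bases, so positions can be off by a few bases for such reads.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def trim_to_five_prime(fields, keep):
    """Return a copy of one SAM record (list of columns) trimmed to the read's 5' end."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    out = list(fields)
    flag, pos, cigar = int(out[1]), int(out[3]), out[5]
    seq, qual = out[9], out[10]
    if flag & 0x10:
        # Reverse strand: the read's 5' end is the right-hand end of SEQ.
        if cigar != "*":
            end = pos + reference_span(cigar) - 1   # rightmost aligned base
            out[3] = str(max(1, end - keep + 1))    # new leftmost position
        if seq != "*":
            out[9] = seq[-keep:]
        if qual != "*":
            out[10] = qual[-keep:]
    else:
        # Forward strand: the read's 5' end is the left-hand end of SEQ.
        if seq != "*":
            out[9] = seq[:keep]
        if qual != "*":
            out[10] = qual[:keep]
    out[5] = "*"  # the old CIGAR no longer describes the trimmed read
    return out


# Made-up reverse-strand record: a 10 bp read aligned at chr1:100-109.
record = "r2\t16\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIHHHHH".split("\t")
print("\t".join(trim_to_five_prime(record, keep=3)))
```

Whether a given browser places a record with a * CIGAR the way you expect is worth checking on a handful of reads before converting a whole file.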
All times are GMT -8.