SEQanswers

Old 10-18-2012, 01:49 PM   #1
ugolino
Member
 
Location: maryland, usa

Join Date: Oct 2011
Posts: 14
convert sorted bam to sorted sam for htseq-count

Hi there,

I have a bowtie2 alignment of PE non-stranded RNA-seq reads from a bacterial species (used option -k 1; 96.20% of pairs aligned concordantly exactly 1 time), and would like to use htseq-count to get count data across genes. I am having trouble keeping the reads sorted when converting a sorted bam to sam format (htseq-count needs a sorted sam for PE reads).

These are my attempts and error messages:

# sorting reads
$ samtools sort myalignment.bam myalignment.sorted

# convert back to sam
$ samtools view -h myalignment.sorted.bam > out.sorted.sam

# check (truncated output) - note that the @HD line still says 'unsorted'?
$ head out.sorted.sam
@HD VN:1.0 SO:unsorted
@SQ SN:NC_017656.1 LN:5212843
@PG ID:bowtie2 PN:bowtie2 VN:2.0.0-beta7
HWUSI-EAS1615L:13:FC64RB1AAXX:4:23:7604:14541_1 99 NC_017656.1 156 255 68M = 206 118 CAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCA IIIIIIEIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIHGIIIIGHIIHIHH AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:68 YS:i:0 YT:Z:CP
HWUSI-EAS1615L:13:FC64RB1AAXX:4:70:9040:11393_1 99 NC_017656.1 164 255 68M = 207 111 TAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATT IIIIFIIIIBHHHIFIIIIIIIIIHIGIHIIIIIFHIIIIBIHGHIGHIHHICIHCHF;FEDEFDEH< AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:68 YS:i:0 YT:Z:CP

# tried htseq-count (truncated output)
$ htseq-count -s no -t gene -i ID out.sorted.sam ../reference.gff
4965 GFF lines processed.
Warning: Read HWUSI-EAS1615L:13:FC64RB1AAXX:4:23:7604:14541_1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWUSI-EAS1615L:13:FC64RB1AAXX:4:70:9040:11393_1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)
Warning: Read HWUSI-EAS1615L:13:FC64RB1AAXX:4:62:8731:7335_1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)


Your insight is much appreciated!
Old 10-18-2012, 04:09 PM   #2
ugolino
Member
 
Location: maryland, usa

Join Date: Oct 2011
Posts: 14

I think I figured it out. By default, Bowtie2 writes the two mates of each pair next to each other, so its output is already grouped by read name. The offending part is the _1 / _2 suffix at the end of the read names. Removing those suffixes (in vim) fixed the problem, and htseq-count now works without any sorting needed. Only after a couple of hours of staring at the reads did I come across this thread, which explains an identical issue. Feeling slow...

http://seqanswers.com/forums/showthread.php?t=23087
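
For anyone who would rather not do this by hand in vim, here is a minimal sketch of the same fix as a shell function. It assumes the mate suffix is exactly _1 or _2 at the very end of the read name (first column), as in the reads shown above. The function name strip_mate_suffix and the file names are made up for illustration, and this is not necessarily how the edit was done in the post.

```shell
# Strip a trailing _1 or _2 from the read name (column 1) of every
# alignment line. Header lines (starting with @) pass through unchanged.
# Assumption: the suffix is always exactly _1 or _2 at the end of the name.
strip_mate_suffix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@/  { print; next }
               { sub(/_[12]$/, "", $1); print }'
}

# Example call (hypothetical file names):
# strip_mate_suffix < myalignment.sam > myalignment.renamed.sam
```

After this, both mates of a pair carry the same read name, which appears to be what htseq-count relies on to match them up. Read order is left untouched, so mates that were adjacent stay adjacent.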

thanks
Old 10-19-2012, 05:02 AM   #3
ThePresident
Member
 
Location: Sherbrooke / Canada

Join Date: Jun 2012
Posts: 72

Just out of curiosity (I've also done RNA-seq on some bacterial species): why did you choose paired-end sequencing for your study?

TP
Old 10-19-2012, 06:30 AM   #4
ugolino
Member
 
Location: maryland, usa

Join Date: Oct 2011
Posts: 14

To align reads with greater confidence, as these strains have many phages and IS elements (some of which are chromosomal in certain strains and plasmid-borne in others), and their genomes have not been sequenced yet (so I also sequenced the genomes).
All times are GMT -8.