SEQanswers

Old 06-18-2013, 11:58 AM   #1
prs321
Member
 
Location: US

Join Date: Jun 2013
Posts: 96
Why don't my SAM files list the chromosomes?

I used the latest version of BWA and ran it four different ways on the same paired-end data set to see which gives the best alignment quality.

The first way used bwa mem. I ran it on the paired-end reads with the adaptor sequences trimmed off, then also trimmed the poor-quality bases from those same reads and ran bwa mem again.

The second way used bwa aln and sampe, again on both versions of the reads (adaptor-trimmed only, then adaptor- and quality-trimmed).

After this, I ran samtools on each SAM file produced: I converted it to BAM, sorted the BAM file, indexed the sorted BAM file, and finally ran idxstats to get the stats.
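The samtools steps described above can be sketched as a small Python helper that only builds the command lines and runs nothing. It assumes samtools 1.x option syntax (the 0.1.x releases current in 2013 used `samtools sort in.bam out_prefix` and needed `-S` for SAM input), and the file name is hypothetical.

```python
import shlex
from pathlib import Path


def samtools_steps(sam_path):
    """Return argv lists for SAM -> BAM -> sorted BAM -> index -> idxstats.

    Assumes samtools 1.x syntax; the commands are only built, not executed.
    """
    sam = Path(sam_path)
    bam = sam.with_suffix(".bam")
    sorted_bam = sam.with_suffix(".sorted.bam")
    return [
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],    # SAM -> BAM
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],   # coordinate sort
        ["samtools", "index", str(sorted_bam)],                  # writes the .bai index
        ["samtools", "idxstats", str(sorted_bam)],               # per-reference counts
    ]


steps = samtools_steps("mem_trimmed.sam")  # hypothetical file name
for argv in steps:
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the file names match your own data.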

My questions:

1. After using bwa to align/map and then samtools to sort and index, I checked each final BAM file by converting it back to SAM and viewing it in the terminal.

I couldn't seem to find the chromosome, which I think should be in the third column. Why?


Example from SAM file:
Code:
M00532:8:000000000-A17VF:1:1101:16380:1451      83      Serratia        3298780 29      229M1S  =       3298620 -389    TGTCGTTCGCCAACTTCAGCGTGCTCTGGACCTCAATGGCCTTTNTGCTCGCCGCGCCGCCGTTCAACTATTCCGAGGGAGTGATCGGGCTGTTCGGCCTGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCANCTGGCGGACAAAGGCAAGGCCGGNCTGACNACCACCGTCGGCCTGGTGTTNCTGCTGCTGTCCTGGATCCCTATCGCGTTCGCCAAN  D>ED>4'8?1*1*?*AEED>FEA?A1*A???:??A?8A8)8800#;DDDDDDDD?D8D;ECECA?E?C?CC;EDFEEEFFFEDDDDEE?:DDDDDA8)0)0.#####################################?44#[email protected]?4#HFD?5#HHHEHHHHHHHIHIHHFEA5#[email protected]@???<5#  XT:A:M  NM:i:49 SM:i:29 AM:i:29 XM:i:7  XO:i:0  XG:i:0  MD:Z:44C34G22T0G0G0G0C0G0C0C0G0C0C0G0G0G0G0C0G0C0T0G0G0C0C0G0C0T0T0C0G0C0G0C0G0C0C0G0G3T14T2A5G0T4C20G0T1A26A5

2. What does the last line mean after running idxstats?

Serratia 5113802 307778 2900
* 0 0 155004


And just for clarification, the columns of the first line are reference sequence name, sequence length, # of mapped reads, and # of unmapped reads?
Old 06-18-2013, 12:29 PM   #2
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

This comes down to how you built the index for BWA. What FASTA file(s) did you use? If you didn't build the index from FASTA sequences that are full chromosome references then you won't get alignments in terms of chromosomes.

Also that last line of idxstats is probably just the number of unaligned reads. Typically unmapped reads have an '*' in the third column.
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Old 06-18-2013, 08:39 PM   #3
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

Did you read the SAM format description?

Yes, the third column of a sam file has the chromosome name.
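To make the column layout concrete, here is a minimal Python sketch that splits a SAM record into the eleven mandatory fields and decodes the FLAG bits. The first nine fields are copied from the record in the first post; SEQ and QUAL are replaced by `*` to keep the example short, and the helper names are just illustrative.

```python
# The eleven mandatory SAM columns, in order; RNAME (column 3) is the reference name.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# FLAG bit values from the SAM specification; the short names are illustrative.
FLAG_BITS = [
    (0x1, "paired"), (0x2, "proper_pair"), (0x4, "unmapped"),
    (0x8, "mate_unmapped"), (0x10, "reverse"), (0x20, "mate_reverse"),
    (0x40, "read1"), (0x80, "read2"), (0x100, "secondary"),
    (0x200, "qc_fail"), (0x400, "duplicate"), (0x800, "supplementary"),
]


def parse_sam_line(line):
    """Split one tab-delimited alignment line into named fields plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    record["TAGS"] = fields[11:]
    return record


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# Abbreviated copy of the record from the first post (SEQ/QUAL shortened to "*").
line = "\t".join([
    "M00532:8:000000000-A17VF:1:1101:16380:1451", "83", "Serratia",
    "3298780", "29", "229M1S", "=", "3298620", "-389", "*", "*",
    "NM:i:49",
])
rec = parse_sam_line(line)
print(rec["RNAME"])
print(decode_flag(int(rec["FLAG"])))
```

For this record the third column is `Serratia`, which is the name of the single reference sequence in the index, so the "chromosome" is there; it is just named after the FASTA header.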

You've done something very wrong, though.

Quote:
MD:Z:44C34G22T0G0G0G0C0G0C0C0G0C0C0G0G0G0G0C0G0C0T0G0G0C0C0G0C0T0T0C0G0C0G0C0G0C0C0G0G3T14T2A5G0T4C20G0T1A26A5
That means you used the wrong fastq file in the sampe step.
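For anyone who wants to inspect an MD tag like the one quoted above, here is a rough Python sketch that tallies matched bases, mismatches, deleted bases, and the longest run of back-to-back mismatches. The function name and the returned keys are made up for illustration; the grammar follows the MD definition in the SAM optional-fields spec.

```python
import re

# MD grammar: [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*
MD_TOKEN = re.compile(r"(\d+)|\^([A-Z]+)|([A-Z])")


def summarize_md(md):
    """Tally an MD string (the part after 'MD:Z:')."""
    matched = mismatches = deleted = 0
    run = longest_run = 0
    for number, deletion, mismatch in MD_TOKEN.findall(md):
        if number:
            matched += int(number)
            if int(number) > 0:      # a real stretch of matches ends a mismatch run
                run = 0
        elif deletion:
            deleted += len(deletion)
            run = 0
        else:                        # single reference base at a mismatch
            mismatches += 1
            run += 1
            longest_run = max(longest_run, run)
    return {"matched": matched, "mismatches": mismatches,
            "deleted": deleted, "longest_mismatch_run": longest_run}


# MD string copied from the quoted record.
THREAD_MD = ("44C34G22T0G0G0G0C0G0C0C0G0C0C0G0G0G0G0C0G0C0T0G0G0C0C0G0C0T0T0C0G"
             "0C0G0C0G0C0C0G0G3T14T2A5G0T4C20G0T1A26A5")
summary = summarize_md(THREAD_MD)
print(summary)
```

As a sanity check, the mismatch count should agree with the record's `NM:i:49` (there are no indels here), and matched plus mismatched bases should add up to the 229 aligned bases in the CIGAR string `229M1S`.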
Old 06-19-2013, 07:23 AM   #4
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

Also I recommend mem over the aln/sampe pipeline. It's simpler and it works better.
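To show why mem is the simpler of the two, here is a hedged Python sketch that only builds the command lines for both routes. `db11.fasta` is the reference named in this thread; the FASTQ file names are hypothetical, and the sketch assumes the index was already built with `bwa index db11.fasta`.

```python
import shlex


def bwa_mem_cmd(ref, reads1, reads2):
    """One step: bwa mem <index prefix> <read 1 FASTQ> <read 2 FASTQ>."""
    return ["bwa", "mem", ref, reads1, reads2]


def bwa_aln_sampe_cmds(ref, reads1, reads2):
    """Three steps: one 'aln' per FASTQ, then 'sampe' to pair them up.

    sampe expects: <prefix> <in1.sai> <in2.sai> <in1.fq> <in2.fq>,
    so each .sai must be followed by the FASTQ it was produced from.
    """
    sai1, sai2 = reads1 + ".sai", reads2 + ".sai"
    return [
        ["bwa", "aln", "-f", sai1, ref, reads1],
        ["bwa", "aln", "-f", sai2, ref, reads2],
        ["bwa", "sampe", ref, sai1, sai2, reads1, reads2],
    ]


ref = "db11.fasta"                          # reference named in the thread
r1, r2 = "reads_1.fastq", "reads_2.fastq"   # hypothetical file names

mem_cmd = bwa_mem_cmd(ref, r1, r2)
sampe_cmds = bwa_aln_sampe_cmds(ref, r1, r2)
print(shlex.join(mem_cmd))
for argv in sampe_cmds:
    print(shlex.join(argv))
```

Both `bwa mem` and `bwa sampe` write SAM to standard output, so in practice you would redirect that to a file. With aln/sampe there are more places to pass the wrong file, which is one practical reason to prefer mem.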
Old 06-19-2013, 08:17 AM   #5
prs321
Member
 
Location: US

Join Date: Jun 2013
Posts: 96

Quote:
Originally Posted by sdriscoll View Post
This comes down to how you built the index for BWA. What FASTA file(s) did you use? If you didn't build the index from FASTA sequences that are full chromosome references then you won't get alignments in terms of chromosomes.

Also that last line of idxstats is probably just the number of unaligned reads. Typically unmapped reads have an '*' in the third column.
I used db11.fasta

I did build the index.

And the last part of what you said makes no sense to me, because the first row gives the name, sequence length, # of mapped reads, and # of unmapped reads. How does the second row (* 0 0 32694) describe the # of unmapped reads when the first row already lists the # of unmapped reads?
Old 06-19-2013, 08:18 AM   #6
prs321
Member
 
Location: US

Join Date: Jun 2013
Posts: 96

Quote:
Originally Posted by swbarnes2 View Post
Did you read SAM format description?

Yes, the third column of a sam file has the chromosome name.

You've done something very wrong, though.



Means that you used the wrong fastq file in the sampe step.
Could this have anything to do with the fact that Serratia marcescens is a bacterium with only 1 chromosome?
Old 06-19-2013, 08:43 AM   #7
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

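A read can be flagged as unmapped and still be associated with a chromosome, for example if it hangs off the edge of the reference, or if it is the unmapped mate of a mapped read and is given its mate's position. You have 2900 such reads. The rest of the unmapped reads didn't map at all; that's the 155004.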
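Here is a small Python sketch of that bookkeeping, using the two idxstats lines from the first post. Real idxstats output is tab-delimited; `split()` handles tabs and spaces alike. The variable names are just illustrative.

```python
def parse_idxstats(text):
    """Parse idxstats output into (name, length, mapped, unmapped) tuples."""
    rows = []
    for line in text.strip().splitlines():
        name, length, mapped, unmapped = line.split()
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows


# The two lines posted in the first message of this thread.
IDXSTATS = """\
Serratia 5113802 307778 2900
* 0 0 155004
"""

rows = parse_idxstats(IDXSTATS)
mapped = sum(r[2] for r in rows)
# Unmapped reads that still carry a reference name and position.
placed_unmapped = sum(r[3] for r in rows if r[0] != "*")
# Unmapped reads with no reference at all (RNAME is "*").
unplaced_unmapped = sum(r[3] for r in rows if r[0] == "*")
total_unmapped = placed_unmapped + unplaced_unmapped
print(mapped, placed_unmapped, unplaced_unmapped, total_unmapped)
```

So the two rows don't double count: the fourth column of the `Serratia` row and the fourth column of the `*` row are separate groups of unmapped reads, and their sum is the total.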

I use bwa and samtools on single-chromosome bacterial references all the time. You messed up your sampe command; that's why you have that nonsense MD part. That's the only mistake you appear to have made. Everything else looks normal, so I'm not sure what you think the problem is.