SEQanswers

Old 07-09-2015, 04:42 PM   #1
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367
Unfamiliar SAM file format outputted by Rockhopper program

Hi,

I'm using Rockhopper to analyze E. coli RNA-Seq data.
http://cs.wellesley.edu/~btjaden/Roc.../download.html
I'm not familiar with the SAM format output by Rockhopper.
Has anyone seen this format before, or does anyone have ideas on how to convert it to the traditional format, which I could then view in IGV or on the UCSC Genome Browser? I'm quite comfortable with both Python and R, but I really don't understand the current format, so I'm unable to convert it.
The data is paired-end.

Here are the first fifteen lines of the SAM file (three header lines and twelve alignment records).
I've put more lines in the attached file.

Code:
[blancha@lg-1r14-n04 samFiles]$ samtools view -h -f 2 IK_21C-EM9-1_R1.sam | more
@HD	VN:1.0	SO:unsorted
@SQ	SN:gi|556503834|ref|NC_000913.3|	LN:4641652	SP:Escherichia coli str. K-12 substr. MG1655
@PG	ID:Rockhopper	PN:Rockhopper	VN:2.03
D69F08P1:403:C6Y8VACXX:5:1101:1436:2236 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	2527763	255	50M	=	2527927	213	TGGCAAATGGCATCCCGATGGCAAACATTCTGTTCCCCACATCGGTGATC	BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIII
+	131	gi|556503834|ref|NC_000913.3|	2527927	255	49M	=	2527763	-213	CGCAACTGGTCCAGCCCCTGAAGCGTCCGCTTTAAGCTTTATCGGCGCT	BBBFFFFFFFFFFIIIIIIIFIIIIFIIIIIIIIIIIIIIIIIIIIIFF
D69F08P1:403:C6Y8VACXX:5:1101:1606:2216 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3441734	255	50M	=	3441811	126	CGACAACCGTTATGAGGGATCGGAGTCACATCAGTAATGTTAGTGATGCG	BBBFBFF<F0<FFIIIIIF7FFFFFIIIIIIFFFBFFFF<FFFB7B7B<F
+	131	gi|556503834|ref|NC_000913.3|	3441811	255	49M	=	3441734	-126	GAATCTGGAAGTTATGGTTAAAGGTCCGGGTCCAGGCCGCGAAACTACT	BBBBFBFFFBF<FFFIIB<FFFIBFFFFF7BBFFFFFIFFIFF<FFFFB
D69F08P1:403:C6Y8VACXX:5:1101:1955:2210 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3471221	255	50M	=	3471324	152	CCCGTACGGTGGTGATTGCAGCGGTCAGAGTAGTTTTACCGTGGTCAACG	BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFFFIIIIFFIIIIIIIIIII
+	131	gi|556503834|ref|NC_000913.3|	3471324	255	49M	=	3471221	-152	GCTCTCTCCTGAAGGGGAGAGCACTATAGTAAGGAATATAGCCGTGTCT	BBBFFFFFFFFFFIIIIIIIIIIIIIIFIFFIIIIIIIIIIIIIFIIII
D69F08P1:403:C6Y8VACXX:5:1101:2133:2203 1:N:0:AGTCAAC	115	gi|556503834|ref|NC_000913.3|	1719838	255	50M	=	1719872	83	AAGAGACAGACCTACCATTGAAACAACCAATACGCGTTTAATCATTGAAA	BBBFFFFFFFFFFIIIIIIFFIIIFIIIIIIFFFBFBFFFIIIFFFFFFB
+	179	gi|556503834|ref|NC_000913.3|	1719872	255	49M	=	1719838	-83	GCTTGCGTGGCGTTTCATGGTGAACAGGAGATTTTTCAATGATTAAACG	BBBFFFFFFFFFFFFFIIIIBFBFFIIIFFBFFFIIIIBFFBFIFBBFB
D69F08P1:403:C6Y8VACXX:5:1101:1916:2222 1:N:0:AGTCAAC	67	gi|556503834|ref|NC_000913.3|	3444439	255	50M	=	3444490	100	CCCACGACCACCGGTTTTACCGAGGCCAGAACCGATACCACGACCCAGGC	BBBFFFFFFFFFFFFFFFFIFFII<BBFFFFIIFFIF<<<BF<BBFBF7B
+	131	gi|556503834|ref|NC_000913.3|	3444490	255	49M	=	3444439	-100	TGCGTTTAAATACTCTGTCTCCGGCCGAAGGCTCCAAAAAGGCGGGTAA	BB<FFFFFFFFFFFBFFBBBFBFFFFFFFB7BFFIBFFFBFB<BBB0<B
D69F08P1:403:C6Y8VACXX:5:1101:2117:2249 1:N:0:AGTCAAC	115	gi|556503834|ref|NC_000913.3|	639393	255	50M	=	639501	157	GGCGACGCCAACGCCGCTATGGCGTGAAAGACGAAGGAAATTTAGATTTT	<BBFBFFFBBFBFFFIFFBFFIIIIIFBFFIIIIF7<BF<BBBBBBBBB<
+	179	gi|556503834|ref|NC_000913.3|	639501	255	49M	=	639393	-157	GTAAAATCAAAGCAGCACAGTACGTAGCTTCTCACCCAGGTGAAGTTTG	B<BFFFFFFFFFFFBFFFFBBFFFFFFIIIFFFIFFBFFFFIBFFBFFF
Thank you for your help.
Attached Files
File Type: txt rockhopper_sample_sam.txt (9.2 KB, 5 views)

Last edited by blancha; 07-09-2015 at 05:11 PM. Reason: Put lines from SAM file in Code box
Old 07-09-2015, 05:13 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

It mostly looks like a normal sam file; the specification is here: https://samtools.github.io/hts-specs/SAMv1.pdf

However, the second alignment line of each pair has "+" as the read name, which is odd to say the least. Can you run head on the input FASTQ file to show the first 8 lines?

Edit - looking at the attachment, it appears that either you have an odd fastq file with read2 always named "+" or that Rockhopper has a bug causing it to incorrectly report the read name.
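
If the "+" does turn out to stand in for the mate's read name, a repair could look something like the sketch below. It is only an illustration under assumptions: that every "+" record directly follows its mate, and that dropping the text after the first space in the read name is acceptable (the SAM specification does not allow whitespace in QNAME). The function name `repair_qnames` is made up for this example, and the records in the demo are shortened stand-ins, not real Rockhopper output.

```python
def repair_qnames(lines):
    """Yield SAM lines with placeholder '+' read names replaced by the
    name of the record immediately before them."""
    previous_name = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            # Header lines (and blank lines) pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if fields[0] == "+" and previous_name is not None:
            # Assumption: '+' means "same read name as the previous record".
            fields[0] = previous_name
        else:
            # Keep only the part before the first space, because QNAME
            # may not contain whitespace.
            fields[0] = fields[0].split(" ", 1)[0]
            previous_name = fields[0]
        yield "\t".join(fields)


# Shortened, made-up records in the same layout as the sample in post #1.
demo = [
    "@HD\tVN:1.0\tSO:unsorted",
    "D69F08P1:403:C6Y8VACXX:5:1101:1436:2236 1:N:0:AGTCAAC\t67\tref\t100\t255\t4M\t=\t200\t104\tACGT\tIIII",
    "+\t131\tref\t200\t255\t4M\t=\t100\t-104\tTTGA\tIIII",
]
for fixed in repair_qnames(demo):
    print(fixed)
```

To apply it to a real file, the same generator can be fed an open file handle and its output written to a new SAM file, which can then be sorted and indexed with samtools before loading it in IGV.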
Old 07-18-2015, 06:19 AM   #3
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

Thank you, Brian.
You are correct in pointing out that the only problem with the format is the + sign on every other line.
The + just corresponds to the paired FASTQ read.
If this were the only issue I had with Rockhopper, I would be happy.

The main problem I have is that when I view the alignments in IGV, at least half of the reads consist mostly of mismatches relative to the reference genome.
I've tried all four orientation settings: fr, ff, rf, and rr.
I cannot figure out why Rockhopper insists on aligning reads in what appears to be the wrong location.
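
One thing that might help when checking the orientation is to decode the FLAG column of the records. The sketch below spells out the bits defined in the FLAG table of the SAM specification for the four values that appear in the sample from post #1. The short labels and the function name `decode_flag` are invented for this example.

```python
# Bit values come from the FLAG table in the SAM specification;
# the short labels are informal names chosen for this example.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# The four FLAG values seen in the sample output.
for flag in (67, 131, 115, 179):
    print(flag, decode_flag(flag))
```

In the sample, 67 and 131 have neither strand bit set, while 115 and 179 have both set, so the two mates of each pair are reported on the same strand. Many aligners report opposite strands for the mates of a properly paired forward/reverse library, so this may be worth comparing against the orientation setting, although it may also be unrelated to the mismatches seen in IGV.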

I think I'll just give up on the software, even though it appears to be widely used in respected publications for E. coli RNA-Seq analysis.