#1 |
Senior Member
Location: Montreal Join Date: May 2013
Posts: 367
Hi,
I'm using Rockhopper to analyze E. coli RNA-Seq data. http://cs.wellesley.edu/~btjaden/Roc.../download.html

I'm not familiar with the SAM format output by Rockhopper. Has anyone seen this format before, or have any ideas on how to convert it to the traditional format, which I could then view in IGV or on the UCSC Genome Browser? I'm quite comfortable with both Python and R, but I really don't understand the current format, so I'm unable to convert it.

The data is paired-end. Here are the first fourteen lines from the SAM file. I've put more lines in the attached file.

Code:
[blancha@lg-1r14-n04 samFiles]$ samtools view -h -f 2 IK_21C-EM9-1_R1.sam | more
@HD VN:1.0 SO:unsorted
@SQ SN:gi|556503834|ref|NC_000913.3| LN:4641652 SP:Escherichia coli str. K-12 substr. MG1655
@PG ID:Rockhopper PN:Rockhopper VN:2.03
D69F08P1:403:C6Y8VACXX:5:1101:1436:2236 1:N:0:AGTCAAC 67 gi|556503834|ref|NC_000913.3| 2527763 255 50M = 2527927 213 TGGCAAATGGCATCCCGATGGCAAACATTCTGTTCCCCACATCGGTGATC BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIII
+ 131 gi|556503834|ref|NC_000913.3| 2527927 255 49M = 2527763 -213 CGCAACTGGTCCAGCCCCTGAAGCGTCCGCTTTAAGCTTTATCGGCGCT BBBFFFFFFFFFFIIIIIIIFIIIIFIIIIIIIIIIIIIIIIIIIIIFF
D69F08P1:403:C6Y8VACXX:5:1101:1606:2216 1:N:0:AGTCAAC 67 gi|556503834|ref|NC_000913.3| 3441734 255 50M = 3441811 126 CGACAACCGTTATGAGGGATCGGAGTCACATCAGTAATGTTAGTGATGCG BBBFBFF<F0<FFIIIIIF7FFFFFIIIIIIFFFBFFFF<FFFB7B7B<F
+ 131 gi|556503834|ref|NC_000913.3| 3441811 255 49M = 3441734 -126 GAATCTGGAAGTTATGGTTAAAGGTCCGGGTCCAGGCCGCGAAACTACT BBBBFBFFFBF<FFFIIB<FFFIBFFFFF7BBFFFFFIFFIFF<FFFFB
D69F08P1:403:C6Y8VACXX:5:1101:1955:2210 1:N:0:AGTCAAC 67 gi|556503834|ref|NC_000913.3| 3471221 255 50M = 3471324 152 CCCGTACGGTGGTGATTGCAGCGGTCAGAGTAGTTTTACCGTGGTCAACG BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFFFIIIIFFIIIIIIIIIII
+ 131 gi|556503834|ref|NC_000913.3| 3471324 255 49M = 3471221 -152 GCTCTCTCCTGAAGGGGAGAGCACTATAGTAAGGAATATAGCCGTGTCT BBBFFFFFFFFFFIIIIIIIIIIIIIIFIFFIIIIIIIIIIIIIFIIII
D69F08P1:403:C6Y8VACXX:5:1101:2133:2203 1:N:0:AGTCAAC 115 gi|556503834|ref|NC_000913.3| 1719838 255 50M = 1719872 83 AAGAGACAGACCTACCATTGAAACAACCAATACGCGTTTAATCATTGAAA BBBFFFFFFFFFFIIIIIIFFIIIFIIIIIIFFFBFBFFFIIIFFFFFFB
+ 179 gi|556503834|ref|NC_000913.3| 1719872 255 49M = 1719838 -83 GCTTGCGTGGCGTTTCATGGTGAACAGGAGATTTTTCAATGATTAAACG BBBFFFFFFFFFFFFFIIIIBFBFFIIIFFBFFFIIIIBFFBFIFBBFB
D69F08P1:403:C6Y8VACXX:5:1101:1916:2222 1:N:0:AGTCAAC 67 gi|556503834|ref|NC_000913.3| 3444439 255 50M = 3444490 100 CCCACGACCACCGGTTTTACCGAGGCCAGAACCGATACCACGACCCAGGC BBBFFFFFFFFFFFFFFFFIFFII<BBFFFFIIFFIF<<<BF<BBFBF7B
+ 131 gi|556503834|ref|NC_000913.3| 3444490 255 49M = 3444439 -100 TGCGTTTAAATACTCTGTCTCCGGCCGAAGGCTCCAAAAAGGCGGGTAA BB<FFFFFFFFFFFBFFBBBFBFFFFFFFB7BFFIBFFFBFB<BBB0<B
D69F08P1:403:C6Y8VACXX:5:1101:2117:2249 1:N:0:AGTCAAC 115 gi|556503834|ref|NC_000913.3| 639393 255 50M = 639501 157 GGCGACGCCAACGCCGCTATGGCGTGAAAGACGAAGGAAATTTAGATTTT <BBFBFFFBBFBFFFIFFBFFIIIIIFBFFIIIIF7<BF<BBBBBBBBB<
+ 179 gi|556503834|ref|NC_000913.3| 639501 255 49M = 639393 -157 GTAAAATCAAAGCAGCACAGTACGTAGCTTCTCACCCAGGTGAAGTTTG B<BFFFFFFFFFFFBFFFFBBFFFFFFIIIFFFIFFBFFFFIBFFBFFF

Last edited by blancha; 07-09-2015 at 05:11 PM. Reason: Put lines from SAM file in Code box
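For anyone reading the excerpt above, the number right after the read name (67, 131, 115, 179) is the SAM FLAG field, a bitwise combination of properties. Below is a minimal Python sketch that decodes it. The bit values come from the SAM specification; the short label strings and the `decode_flag` helper are just illustrative names, not part of any library.

```python
# Bit values follow the FLAG table in the SAM specification.
# The label strings are informal names chosen for this sketch.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# The four FLAG values that appear in the excerpt.
for flag in (67, 131, 115, 179):
    print(flag, decode_flag(flag))
```

In the excerpt, 67 and 131 have neither strand bit set, while 115 and 179 have both the "reverse" and "mate_reverse" bits set.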
#2 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
It mostly looks like a normal SAM file; the specification is here: https://samtools.github.io/hts-specs/SAMv1.pdf
However, the second line has "+" for the read name, which is odd to say the least. Can you run head on the input FASTQ file to show the first 8 lines?

Edit - looking at the attachment, it appears that either you have an odd FASTQ file with read 2 always named "+", or Rockhopper has a bug that causes it to report the read name incorrectly.
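Since the original poster mentioned being comfortable with Python, the same check (the first 8 lines are the first two FASTQ records) could be sketched as below. The `first_fastq_records` helper and the file name in the usage comment are hypothetical, and the sketch assumes standard four-line FASTQ records, optionally gzip-compressed.

```python
import gzip
from itertools import islice


def first_fastq_records(path, n_records=2):
    """Return the first n_records FASTQ records as 4-tuples of lines.

    Assumes each record spans exactly four lines:
    header, sequence, separator, qualities.
    """
    # Pick gzip.open for .gz files, plain open otherwise.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        lines = [line.rstrip("\n") for line in islice(handle, 4 * n_records)]
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


# Hypothetical usage; replace the file name with the real read 2 file:
# for header, seq, sep, qual in first_fastq_records("reads_R2.fastq.gz"):
#     print(header)
```

If the header lines of the read 2 file look normal, the "+" names were presumably introduced by the aligner rather than by the input.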
#3 |
Senior Member
Location: Montreal Join Date: May 2013
Posts: 367
Thank you, Brian.
You are correct in pointing out that the only problem with the format is the + sign on every other line. The + just corresponds to the paired FASTQ read. If this were the only issue I had with Rockhopper, I would be happy.

The main problem I have is that when I view the alignments in IGV, at least half the reads are mostly composed of mismatches relative to the reference genome. I've tried all the different orientation settings: fr, ff, rf, and rr. I cannot figure out why Rockhopper insists on aligning reads in what appears to be the wrong location. I think I'll just give up on the software, even though it appears to be widely used in respected publications for E. coli RNA-Seq analysis.
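For the read name issue alone, a minimal Python sketch of a fix is to copy the preceding read's name onto each "+" record. This addresses only the naming, not the mismatches seen in IGV. It assumes the fields are tab-separated, that each "+" record directly follows its mate, and that the part of the name after the space (e.g. "1:N:0:AGTCAAC") is the FASTQ comment and can be dropped. `repair_mate_names` and the file names in the usage comment are hypothetical.

```python
def repair_mate_names(lines):
    """Yield SAM lines with '+' read names replaced by the previous read name.

    Assumes tab-separated fields and that every '+' record immediately
    follows the record of its mate.
    """
    previous_name = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Drop anything after the first space (assumed FASTQ comment).
        name = fields[0].split(" ")[0]
        if name == "+":
            if previous_name is None:
                raise ValueError("'+' record with no preceding read name")
            name = previous_name
        else:
            previous_name = name
        fields[0] = name
        yield "\t".join(fields) + "\n"


# Hypothetical usage; both file names are placeholders:
# with open("rockhopper.sam") as src, open("fixed.sam", "w") as dst:
#     dst.writelines(repair_mate_names(src))
```

Treating every line that starts with "@" as a header is safe here because the SAM specification does not allow "@" in read names.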