SEQanswers

Go Back   SEQanswers > Sequencing Technologies/Companies > SOLiD



Similar Threads
Thread Thread Starter Forum Replies Last Post
how to convert BLAT output (pslx or maf) to gff, sam, wig to load in browser MQ-BCBB Bioinformatics 11 01-16-2013 06:20 AM
Dindel SAM/BAM format - viewing with IGV EHC General 0 10-06-2011 11:27 AM
IGV reverse strand gff NicoBxl Bioinformatics 1 04-01-2011 03:16 AM
IGV gff file NicoBxl Bioinformatics 1 03-18-2011 05:03 AM
Convert Solid BAM/SAM to Solid GFF? ioannis Bioinformatics 0 07-20-2010 08:06 AM

Reply
 
Thread Tools
Old 09-01-2009, 05:09 AM   #1
hingamp
Member
 
Location: Marseille, France

Join Date: Sep 2009
Posts: 27
Default Convertion of SOLiD3 gff to SAM/BAM for IGV browser

Hi,

In a nutshell: a recipe to show SOLID3 reads in IGV

We wish to visualize SOLiD3 reads on the human genome, but we've been unable to use ABI's SAB (performance issues). To instead use IGV (https://www.broadinstitute.org/igv/) which - in our hands - is much more scalable, we needed to convert the ABI v2 flavor gff to SAM/BAM (the alignment optimized BAM format might by itself explain some of the performance superiority of IGV over SAB).

We therefore tried to install the ABI "GffToSam" utility of the "SOLiD BaseQV Tool" package.

First, to compile GffToSam on debian based Linux, one has to add the following include line to the c++ source code:
Code:
#include <cstdlib>
Second, when run against SOLiD3 gff alignment files, the GffToSam utility crashes with:

Code:
GffToSam -i alignement_chr17_ASG05_200000.v2.gff -id S1 -sm ASG05bis -o test.sam
Total number of gff records: 199920
Starting frag gff to sam conversion..
Total number of chromosomes: 1
Chr Indexes:     0
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::at
Abandon

After much time debugging, it appears that the SOLID3 gff files contain color_seq and color_qual definitions that are mostly longer than the sequence definition for each read. It seems the GffToSam utility never anticipated this, leading to "out_of_range" errors. Maybe the gff file specs have changed in SOLID3?

Anyhow, here's a code hack of GffToSam.cpp around line 1220 that has allowed me to successfully produce SAM format files for SOLID3. Has anyone encountered this issue, as I'd like to make sure that at least the approach is valid (ie shortening color_seq and color_qual to seq length):

Code:
            /* Read Gff attributes */
            while (getline(linestream, token, ';')) // Attributes
            {
                if (token.substr(0,2).compare("b=")==0)
                {
                    seq = token.substr(2);
                    seq = toUpperCase (seq);
                    bvalid = true;
                }
                else if (token.substr(0,2).compare("g=")==0)
                {
                    color_seq = token.substr(2);
                    //hack start, force  color_seq same length as seq
                    if (strand.compare("-")==0)
                    {
                        start_pos += color_seq.length() - seq.length();
                    }
                    color_seq = color_seq.substr(0,seq.size());//hack end
                    gvalid = true;
                }
                else if (token.substr(0,2).compare("q=")==0)
                {    //hack start, force  color_qual same length as seq
                    color_qual = token.substr(2);
                    istringstream linestream2 (color_qual);
                    string token2;
                    int nbqv = 0;
                    string color_qual2 = "";
                    while (getline(linestream2, token2, ',')) // QV's
                    {
                        nbqv++;
                        if (nbqv <= seq.size()) color_qual2 += ',' + token2;
                    }
                    color_qual = color_qual2.substr(1);//hack end
                    //color_qual = token.substr(2);
                    qvalid = true;
                }
                else if (token.substr(0,2).compare("s=")==0)
                {
                    annotations = token.substr(2);
                }
                else if (token.substr(0,2).compare("r=")==0)
                {
                    ref_changes = token.substr(2);
                }
                else if (token.substr(0,2).compare("c=")==0)
                {
                    mates_att = token.substr(2);
                    cvalid = true;
                }
                else if (token.substr(0,2).compare("i=")==0)
                {
                    ref_index = token.substr(2);
                    rvalid = true;
                }
            }
Nitty gritty details of gff to bam conversion:

-install SOLiD_BaseQV_Tool_1.0.0 from http://solidsoftwaretools.com/gf/project/sam/frs/

-edit GffToSam.cpp as described above, compile

-convert SOLID gff:
Code:
GffToSam -i alignement_chr17_ASG05.v2.gff -o alignement_chr17_ASG05.sam -id S1 -sm ASG05
-correct indexed chromosome names, eg with a fix.pl perl script:
Code:
#!/usr/bin/perl
while ($line = <>){
	$line =~ s/^(\S+\s\d+)\s\S+/$1\tchr17/;
	print $line;
}
Code:
./fix.pl alignement_chr17_ASG05.sam > alignement_chr17_ASG05_fixed17.sam
-install "samtools" from http://samtools.sourceforge.net/samtools.shtml

-define chromosome reference sequence (here chr17) from FASTA file:
Code:
samtools faidx chr17.fa
-convert SAM file to BAM:
Code:
samtools import chr17.fa.fai alignement_chr17_ASG05_fixed17.sam alignement_chr17_ASG05_fixed17.sam.bam
-sort BAM file:
Code:
samtools sort alignement_chr17_ASG05_fixed17.sam.bam alignement_chr17_ASG05_fixed.sorted
-index BAM file:
Code:
samtools index alignement_chr17_ASG05_fixed17.sam.sorted.bam
-load in IGV with File->Load "alignement_chr17_ASG05_fixed17.sam.sorted.bam"

There must be a better way, but this works:
Attached Images
File Type: jpg Capture-IGV-2.jpg (18.1 KB, 116 views)
hingamp is offline   Reply With Quote
Old 09-27-2009, 11:45 PM   #2
polyhedron
Member
 
Location: China

Join Date: Aug 2009
Posts: 12
Default

Hello!

I'm also trying to view the reads. I obtained .bam files from Aspera, and made the .bam.bai . This file can be read using ./samtools tview , however, it doesn't work under IGV. (since tview is not so convenient to use as the graphic interface tools.) In IGV, although I can load the .bam file, but only the reference is displayed, but I can't see the reads. Does anyone know why? Or which other freely available tool can be used for viewing .bam or .sam files? Thanks!
polyhedron is offline   Reply With Quote
Old 09-27-2009, 11:55 PM   #3
hingamp
Member
 
Location: Marseille, France

Join Date: Sep 2009
Posts: 27
Default

Hi,

Make sure you zoom in sufficiently (by default zoom level of less than 30kb) or else alignments are not displayed.

You can change the 30kb default value in the "Alignments" tab of the Preferences window of IGV.

Pascal
hingamp is offline   Reply With Quote
Old 09-28-2009, 12:27 AM   #4
polyhedron
Member
 
Location: China

Join Date: Aug 2009
Posts: 12
Default

I'm sure that I've zoomed enough, since I can see the colorful sequences of the reference. And I can see the reads with tview, but in IGV, when I scroll to the same position, I can see nothing.
polyhedron is offline   Reply With Quote
Old 09-28-2009, 02:38 AM   #5
hingamp
Member
 
Location: Marseille, France

Join Date: Sep 2009
Posts: 27
Default

Indeed if you see the nucleotides, then you should see the reads... I've not used .bam files outside the ones produced from solid files.
hingamp is offline   Reply With Quote
Old 11-27-2009, 05:29 PM   #6
sgusev
Junior Member
 
Location: nyc

Join Date: Nov 2009
Posts: 2
Default

I am also having trouble seeing the reads though I followed through the guide completely. I can see them in the GFF files using apollo, and even though IGV loads without error it shows nothing regardless of where I zoom? Any suggestions? Is there another bamfile viewer that I could use?
sgusev is offline   Reply With Quote
Old 11-30-2009, 11:40 PM   #7
hingamp
Member
 
Location: Marseille, France

Join Date: Sep 2009
Posts: 27
Default

If the samtools commands have been able to work with the BAM files, I don't understand why IGV can't display the reads. Have you tried viewing the reads with samtools tview? Have you changed the 30kb default window size value in the "Alignments" tab of the Preferences window of IGV? No other viewer we have tried is able to efficiently handle the amount of data produced by SOLiD.

Last edited by hingamp; 11-30-2009 at 11:44 PM.
hingamp is offline   Reply With Quote
Old 12-03-2009, 09:45 AM   #8
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

As another option, we used BFAST to align our whole genome SOLiD data and it generates BAM files as output. I have visualized this output successfully in IGV without a hitch.
Michael.James.Clark is offline   Reply With Quote
Old 03-03-2010, 02:45 PM   #9
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

I just used a sorted indexed bam file and was able to view it in IGV. I have the bai index file in the same folder (guess IGV requires that).

It needed a fair bit of zooming to be able to see the coverage and read info. These are paired reads and gotta play around a bit more to comment on how useful it really is..

anyone has suggestions to see a nice coverage plot for PE Illumina exome data?
__________________
--
bioinfosm
bioinfosm is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:24 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO