Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Convertion of SOLiD3 gff to SAM/BAM for IGV browser

    Hi,

    In a nutshell: a recipe to show SOLID3 reads in IGV

    We wish to visualize SOLiD3 reads on the human genome, but we've been unable to use ABI's SAB (performance issues). To instead use IGV (https://www.broadinstitute.org/igv/) which - in our hands - is much more scalable, we needed to convert the ABI v2 flavor gff to SAM/BAM (the alignment optimized BAM format might by itself explain some of the performance superiority of IGV over SAB).

    We therefore tried to install the ABI "GffToSam" utility of the "SOLiD BaseQV Tool" package.

    First, to compile GffToSam on debian based Linux, one has to add the following include line to the c++ source code:
    Code:
    #include <cstdlib>
    Second, when run against SOLiD3 gff alignment files, the GffToSam utility crashes with:

    Code:
    GffToSam -i alignement_chr17_ASG05_200000.v2.gff -id S1 -sm ASG05bis -o test.sam
    Total number of gff records: 199920
    Starting frag gff to sam conversion..
    Total number of chromosomes: 1
    Chr Indexes:     0
    terminate called after throwing an instance of 'std::out_of_range'
      what():  basic_string::at
    Abandon

    After much time debugging, it appears that the SOLID3 gff files contain color_seq and color_qual definitions that are mostly longer than the sequence definition for each read. It seems the GffToSam utility never anticipated this, leading to "out_of_range" errors. Maybe the gff file specs have changed in SOLID3?

    Anyhow, here's a code hack of GffToSam.cpp around line 1220 that has allowed me to successfully produce SAM format files for SOLID3. Has anyone encountered this issue, as I'd like to make sure that at least the approach is valid (ie shortening color_seq and color_qual to seq length):

    Code:
                /* Read Gff attributes */
                while (getline(linestream, token, ';')) // Attributes
                {
                    if (token.substr(0,2).compare("b=")==0)
                    {
                        seq = token.substr(2);
                        seq = toUpperCase (seq);
                        bvalid = true;
                    }
                    else if (token.substr(0,2).compare("g=")==0)
                    {
                        color_seq = token.substr(2);
                        //hack start, force  color_seq same length as seq
                        if (strand.compare("-")==0)
                        {
                            start_pos += color_seq.length() - seq.length();
                        }
                        color_seq = color_seq.substr(0,seq.size());//hack end
                        gvalid = true;
                    }
                    else if (token.substr(0,2).compare("q=")==0)
                    {    //hack start, force  color_qual same length as seq
                        color_qual = token.substr(2);
                        istringstream linestream2 (color_qual);
                        string token2;
                        int nbqv = 0;
                        string color_qual2 = "";
                        while (getline(linestream2, token2, ',')) // QV's
                        {
                            nbqv++;
                            if (nbqv <= seq.size()) color_qual2 += ',' + token2;
                        }
                        color_qual = color_qual2.substr(1);//hack end
                        //color_qual = token.substr(2);
                        qvalid = true;
                    }
                    else if (token.substr(0,2).compare("s=")==0)
                    {
                        annotations = token.substr(2);
                    }
                    else if (token.substr(0,2).compare("r=")==0)
                    {
                        ref_changes = token.substr(2);
                    }
                    else if (token.substr(0,2).compare("c=")==0)
                    {
                        mates_att = token.substr(2);
                        cvalid = true;
                    }
                    else if (token.substr(0,2).compare("i=")==0)
                    {
                        ref_index = token.substr(2);
                        rvalid = true;
                    }
                }
    Nitty gritty details of gff to bam conversion:

    -install SOLiD_BaseQV_Tool_1.0.0 from http://solidsoftwaretools.com/gf/project/sam/frs/

    -edit GffToSam.cpp as described above, compile

    -convert SOLID gff:
    Code:
    GffToSam -i alignement_chr17_ASG05.v2.gff -o alignement_chr17_ASG05.sam -id S1 -sm ASG05
    -correct indexed chromosome names, eg with a fix.pl perl script:
    Code:
    #!/usr/bin/perl
    while ($line = <>){
    	$line =~ s/^(\S+\s\d+)\s\S+/$1\tchr17/;
    	print $line;
    }
    Code:
    ./fix.pl alignement_chr17_ASG05.sam > alignement_chr17_ASG05_fixed17.sam
    -install "samtools" from http://samtools.sourceforge.net/samtools.shtml

    -define chromosome reference sequence (here chr17) from FASTA file:
    Code:
    samtools faidx chr17.fa
    -convert SAM file to BAM:
    Code:
    samtools import chr17.fa.fai alignement_chr17_ASG05_fixed17.sam alignement_chr17_ASG05_fixed17.sam.bam
    -sort BAM file:
    Code:
    samtools sort alignement_chr17_ASG05_fixed17.sam.bam alignement_chr17_ASG05_fixed.sorted
    -index BAM file:
    Code:
    samtools index alignement_chr17_ASG05_fixed17.sam.sorted.bam
    -load in IGV with File->Load "alignement_chr17_ASG05_fixed17.sam.sorted.bam"

    There must be a better way, but this works:
    Attached Files

  • #2
    Hello!

    I'm also trying to view the reads. I obtained .bam files from Aspera, and made the .bam.bai . This file can be read using ./samtools tview , however, it doesn't work under IGV. (since tview is not so convenient to use as the graphic interface tools.) In IGV, although I can load the .bam file, but only the reference is displayed, but I can't see the reads. Does anyone know why? Or which other freely available tool can be used for viewing .bam or .sam files? Thanks!

    Comment


    • #3
      Hi,

      Make sure you zoom in sufficiently (by default zoom level of less than 30kb) or else alignments are not displayed.

      You can change the 30kb default value in the "Alignments" tab of the Preferences window of IGV.

      Pascal

      Comment


      • #4
        I'm sure that I've zoomed enough, since I can see the colorful sequences of the reference. And I can see the reads with tview, but in IGV, when I scroll to the same position, I can see nothing.

        Comment


        • #5
          Indeed if you see the nucleotides, then you should see the reads... I've not used .bam files outside the ones produced from solid files.

          Comment


          • #6
            I am also having trouble seeing the reads though I followed through the guide completely. I can see them in the GFF files using apollo, and even though IGV loads without error it shows nothing regardless of where I zoom? Any suggestions? Is there another bamfile viewer that I could use?

            Comment


            • #7
              If the samtools commands have been able to work with the BAM files, I don't understand why IGV can't display the reads. Have you tried viewing the reads with samtools tview? Have you changed the 30kb default window size value in the "Alignments" tab of the Preferences window of IGV? No other viewer we have tried is able to efficiently handle the amount of data produced by SOLiD.
              Last edited by hingamp; 11-30-2009, 11:44 PM.

              Comment


              • #8
                As another option, we used BFAST to align our whole genome SOLiD data and it generates BAM files as output. I have visualized this output successfully in IGV without a hitch.
                Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
                Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
                Projects: U87MG whole genome sequence [Website] [Paper]

                Comment


                • #9
                  I just used a sorted indexed bam file and was able to view it in IGV. I have the bai index file in the same folder (guess IGV requires that).

                  It needed a fair bit of zooming to be able to see the coverage and read info. These are paired reads and gotta play around a bit more to comment on how useful it really is..

                  anyone has suggestions to see a nice coverage plot for PE Illumina exome data?
                  --
                  bioinfosm

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Current Approaches to Protein Sequencing
                    by seqadmin


                    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                    04-04-2024, 04:25 PM
                  • seqadmin
                    Strategies for Sequencing Challenging Samples
                    by seqadmin


                    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                    03-22-2024, 06:39 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, 04-11-2024, 12:08 PM
                  0 responses
                  25 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 10:19 PM
                  0 responses
                  29 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 09:21 AM
                  0 responses
                  25 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-04-2024, 09:00 AM
                  0 responses
                  52 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X