  • Alta-Cyclic

    Has anyone played with Alta-Cyclic? It claims to be an improved base caller for the Illumina GA.

    Nick

  • #2
    Just saw the paper: http://seqanswers.com/forums/showthread.php?t=515

    Darn NewsBot needs an upgrade, he missed it!



    • #3
      As I understood it, it only works on paired-end data, nothing on single reads!
      --
      bioinfosm



      • #4
        We tried it here. Not noticing a marked improvement (5%) over Illumina's software, but we only ran it on a subset of one lane. Paired-end definitely not required. The great thing is that we wouldn't need to run phi X and waste a lane...



        • #5
          Originally posted by ucpete View Post
          We tried it here. Not noticing a marked improvement (5%) over Illumina's software, but we only ran it on a subset of one lane. Paired-end definitely not required. The great thing is that we wouldn't need to run phi X and waste a lane...
          My understanding was that Altacyclic needs to be trained from the PhiX lane of each run. Also wouldn't you still need the PhiX lane for calibrating the quality scores?



          • #6
            We've not been able to persuade GA Pipeline 1.0 to give decently calibrated base call quality scores for our 45bp reads. I think this is due to the use of ELAND, and to eland_extended not being great for reads longer than about 36bp (I can't remember whether the limit is 32 or 36). Calibration at positions 36-45 looks wrong.

            Anyone else seen this? Have we done something stupid?

            david



            • #7
              Originally posted by new300 View Post
              My understanding was that Altacyclic needs to be trained from the PhiX lane of each run. Also wouldn't you still need the PhiX lane for calibrating the quality scores?
              You don't necessarily need the phi X lane, rather any reference genome against which to align your reads. In our case, we're doing metagenomic studies and can use the host genome as our reference.



              • #8
                Originally posted by ucpete View Post
                You don't necessarily need the phi X lane, rather any reference genome against which to align your reads. In our case, we're doing metagenomic studies and can use the host genome as our reference.
                That sounds like a neat experiment! I've not tried it but can the Illumina pipeline not use other genomes as a reference for calibration?



                • #9
                  Originally posted by new300 View Post
                  That sounds like a neat experiment! I've not tried it but can the Illumina pipeline not use other genomes as a reference for calibration?
                  Yes, technically. According to Illumina, you can use any genome as a reference for calibration as long as it has 50% GC content. They claim also that this is a very strict requirement, i.e. it can't sway by more than 0.5%. The crappy part about their error rate calculations is that it's only based on those reads that actually align to the reference genome, so if you have a read with > 2 mismatches it won't even align by ELAND to phi X and won't be considered in the error calculations...
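For anyone wanting to check a candidate reference against the GC criterion described above, here is a minimal sketch. The function names are made up for illustration and are not part of the Illumina pipeline. It assumes "0.5%" means half a percentage point either side of 50%, which the post does not spell out.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    # Ambiguous bases such as N are ignored rather than counted as AT.
    return gc / (gc + at)


def meets_gc_requirement(seq: str, target: float = 0.50, tolerance: float = 0.005) -> bool:
    """True if GC content is within `tolerance` of `target`.

    Assumption: the tolerance is absolute (50% +/- 0.5 percentage points).
    """
    return abs(gc_fraction(seq) - target) <= tolerance
```

To use it on a real reference, concatenate the sequence lines of the FASTA file (skipping the `>` header lines) and pass the result to `meets_gc_requirement`.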



                  • #10
                    Originally posted by ucpete View Post
                    Yes, technically. According to Illumina, you can use any genome as a reference for calibration as long as it has 50% GC content. They claim also that this is a very strict requirement, i.e. it can't sway by more than 0.5%.
                    Yep, I guess what everyone wants ideally is a fixed calibration table. I'm surprised it makes that much of a difference though.

                    Originally posted by ucpete View Post
                    The crappy part about their error rate calculations is that it's only based on those reads that actually align to the reference genome, so if you have a read with > 2 mismatches it won't even align by ELAND to phi X and won't be considered in the error calculations...
                    I think that should only make a difference if highly errored reads have a different error source than reads with one or two errors. The fraction of errors within a bin associated with a given feature will still be the same if you look at reads with few errors or many.

                    What I think looking at aligned reads does for you is discard contamination. This is useful, as contaminant reads aren't really errors. For my own calibrator I found that letting reads with about 5 errors through was the sweet spot. So in general I think discarding reads that clearly don't come from the reference genome during calibration is a good thing.

                    IIRC the Alta cyclic paper doesn't assess the quality scores they assign, do you find the quality scores assigned by Alta cyclic accurate?



                    • #11
                      Originally posted by new300 View Post
                      For my own calibrator I found that letting reads with about 5 errors through was the sweet spot.
                      Nav,

                      Am interested:

                      1. sweet spot=5 errors, but in what read length - 36bp, 45bp, 70bp etc ?

                      2. Did you remove homopolymer, and "low base quality across entire read" reads first, or rely on the alignment for this?

                      david



                      • #12
                        Originally posted by dvh View Post
                        Nav,

                        Am interested:

                        1. sweet spot=5 errors, but in what read length - 36bp, 45bp, 70bp etc ?

                        2. Did you remove homopolymer, and "low base quality across entire read" reads first, or rely on the alignment for this?

                        david
                        I was looking at 36bp reads, just filtered by alignment. I was just using phiX so anything low complexity like homopolymers should get filtered out by alignment. IIRC phix is unique at around 12bp so even with 5 errors you're unlikely to mis-align a 36bp read. Making sure I excluded SNP positions had more of an effect, but that's probably down to the fact I was using a really naive algorithm...
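The filtering described above can be sketched roughly as follows. This is not the poster's calibrator, only an illustration of the idea: drop reads with more than a chosen number of mismatches, ignore known SNP positions, then compare each reported quality value with the mismatch rate actually observed. The input layout (one list of `(ref_pos, reported_q, is_mismatch)` tuples per aligned read) and all names are assumptions made for the example.

```python
import math
from collections import defaultdict


def empirical_quality_table(reads, max_errors=5, skip_positions=frozenset()):
    """Map each reported quality value to the empirically observed Phred score.

    reads: iterable of aligned reads; each read is a list of
           (ref_pos, reported_q, is_mismatch) tuples (hypothetical layout).
    max_errors: reads with more mismatches than this are treated as
           contamination or misalignment and skipped.
    skip_positions: reference positions to ignore, e.g. known SNPs.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [bases, mismatches]
    for read in reads:
        # Drop bases at excluded positions before counting errors.
        kept = [(q, mm) for pos, q, mm in read if pos not in skip_positions]
        if sum(mm for _, mm in kept) > max_errors:
            continue
        for q, mm in kept:
            counts[q][0] += 1
            counts[q][1] += int(mm)

    table = {}
    for q, (n_bases, n_mismatches) in sorted(counts.items()):
        if n_mismatches == 0:
            table[q] = float("inf")  # no errors seen in this bin
        else:
            table[q] = -10.0 * math.log10(n_mismatches / n_bases)
    return table
```

If the assigned scores are well calibrated, each key in the returned table should be close to its value. Bins with few bases will be noisy, so the counts are worth checking before reading much into a discrepancy.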



                        • #13
                          Originally posted by ucpete View Post
                          Yes, technically. According to Illumina, you can use any genome as a reference for calibration as long as it has 50% GC content. They claim also that this is a very strict requirement, i.e. it can't sway by more than 0.5%. The crappy part about their error rate calculations is that it's only based on those reads that actually align to the reference genome, so if you have a read with > 2 mismatches it won't even align by ELAND to phi X and won't be considered in the error calculations...
                          Originally Solexa provided two aligners for precisely this reason. One was Eland, with a 2 error limit. The other was PhageAlign, with no limit. You were thus able to force-align all of the reads and count all of the errors. This was done deliberately and as policy during the development of the tech, and was transferred as part of the pipeline. There's nothing to stop you doing this. I think the real issue is the speed of PhageAlign. It's very, very slow, so you probably only want to run it on a chosen sub-sample of tiles rather than a whole lane.

                          In fact on 'normal' runs Eland only 'discards' about 3-5% of reads (last time I looked - things may have changed) - some of which will be truly erroneous - some will be contaminants that slipped filters and other oddness caused for example by imaging artifacts.

                          I still say, if you are getting a significant percentage of reads with more than two errors then something is seriously awry with your system, because you'd be looking at error rates in the high single to double percentages.
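The link between the per-base error rate and the fraction of reads exceeding Eland's 2-mismatch limit can be estimated with a simple binomial model. This is a rough sketch, assuming errors are independent and equally likely at every cycle. Real runs do not behave that way (errors tend to pile up in later cycles), so treat the result as an order-of-magnitude guide only. The function name is made up for the example.

```python
from math import comb


def prob_more_than_k_errors(read_length: int, per_base_error: float, k: int = 2) -> float:
    """P(a read has more than k errors) under an independent, uniform error model."""
    p = per_base_error
    p_at_most_k = sum(
        comb(read_length, i) * p**i * (1.0 - p) ** (read_length - i)
        for i in range(k + 1)
    )
    return 1.0 - p_at_most_k


if __name__ == "__main__":
    # Fraction of 36bp reads expected to fail a 2-mismatch limit at various error rates.
    for rate in (0.005, 0.01, 0.02, 0.05, 0.10):
        frac = prob_more_than_k_errors(36, rate)
        print(f"per-base error {rate:.3f}: {frac:.1%} of reads with >2 errors")
```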



                          • #14
                            Hello everybody,

                            Has anybody tried to compare Alta-Cyclic, the Illumina Pipeline (GAP 1.4.0) and Ibis (http://genomebiology.com/2009/10/8/R83)?
                            Last edited by yvan.wenger; 10-07-2009, 01:08 AM. Reason: added Illumina pipeline version



                            • #15
                              Update: I tried Ibis and it performed slightly better than the GA Pipeline 1.4 on 3 lanes, ~60x10⁶ raw reads, 76 bp. Great tool. For the comparison I tested raw reads (Ibis) vs raw reads (GAP).

                              Does anyone know the exact criteria that GERALD uses to decide which low-quality reads to discard?
