  • Michael.James.Clark
    Senior Member
    • Apr 2009
    • 207

    Nature Methods AOP: CREST maps somatic structural variation in cancer genomes...



    Yet another SV detection program. Of course, this one claims to be the best.

    I think this may be best used in conjunction with algorithms that use discordant read pairs, like my own BreakWay or BreakDancer, or with other algorithms like BreakSeq and Pindel. They do seem to make a case that the best results come from looking for events identified by multiple methods.

    I also want to point out that some breakpoints have been identified at nucleotide resolution in next-gen data before. I think the paper's statement about this is a bit misleading, though perhaps it holds for the couple of examples they present. Still, it's not the first time it's been done, nor is CREST the only program capable of it (though CREST may be the best at it, and they make a strong case for that).

    Regardless, an interesting paper and yet another tool to implement in the murky world of SV detection!
    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
    Projects: U87MG whole genome sequence [Website] [Paper]
  • aquinom85
    Research Bioinformaticist
    • Dec 2011
    • 19

    #2
    Have you used it on your own data at all?

    After almost a full day of installing it and getting all the dependencies running, I was able to run it on their sample files, but when I tried to generate the inputs with extractSClip.pl, I came back the next morning to a nice
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).

    but the header looked fine to me:
    aquinom@ubuntu:~/Crest$ samtools view -H ~/.gvfs/workorders\ on\ kcifs03/0135/seqdata/SS6002602/Assembly/genome/bam/SS6002602.bam
    @HD VN:1.0 SO:coordinate
    @PG ID:CASAVA VN:CASAVA-1.8.0a19 CL:/illumina/development/casava/CASAVA-1.8.0a19_NMNM_BAMFIX/bin/configureBuild.pl --targets all bam --inSampleDir=../Aligned/B0205ACXX/Sample_SS6002602 --outDir=/isilon/RUO/Projects/Knome/SS6002602/Assembly --samtoolsRefFile=/isilon/Genomes/FASTA_UCSC/HumanNCBI37_UCSC/HumanNCBI37_UCSC_XY.fa --jobsLimit=40 --variantsPrintUsedAlleleCounts --variantsWriteRealigned --sortKeepAllReads --bamChangeChromLabels=OFF --sgeQueue=all.q --tempDir=/state/partition1
    @SQ SN:chr1 LN:249250621
    @SQ SN:chr2 LN:243199373
    @SQ SN:chr3 LN:198022430
    @SQ SN:chr4 LN:191154276
    @SQ SN:chr5 LN:180915260
    @SQ SN:chr6 LN:171115067
    @SQ SN:chr7 LN:159138663
    @SQ SN:chrX LN:155270560
    @SQ SN:chr8 LN:146364022
    @SQ SN:chr9 LN:141213431
    @SQ SN:chr10 LN:135534747
    @SQ SN:chr11 LN:135006516
    @SQ SN:chr12 LN:133851895
    @SQ SN:chr13 LN:115169878
    @SQ SN:chr14 LN:107349540
    @SQ SN:chr15 LN:102531392
    @SQ SN:chr16 LN:90354753
    @SQ SN:chr17 LN:81195210
    @SQ SN:chr18 LN:78077248
    @SQ SN:chr20 LN:63025520
    @SQ SN:chrY LN:59373566
    @SQ SN:chr19 LN:59128983
    @SQ SN:chr22 LN:51304566
    @SQ SN:chr21 LN:48129895
    @SQ SN:chrM LN:16571

    Has anyone had any luck with this?
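Both messages come from samtools' BAM header reader: one says the end-of-file block was not found, the other that the data did not begin like a BAM. A quick way to test a specific file for both conditions is sketched below in Python. It assumes only what the SAM/BAM specification states: a BAM is BGZF (gzip-compatible) data whose decompressed stream starts with the magic `BAM\1`, and a complete file ends with a fixed 28-byte empty BGZF block. The script and function names are illustrative; this is not part of CREST or samtools.

```python
import gzip
import os
import sys

# Empty BGZF block that the SAM/BAM spec says terminates every BAM file (28 bytes).
BGZF_EOF = bytes.fromhex("1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00") + bytes(8)
# First four bytes of the decompressed BAM stream.
BAM_MAGIC = b"BAM\x01"


def has_bgzf_eof(path):
    """True if the file's last 28 bytes are the BGZF end-of-file block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


def has_bam_magic(path):
    """True if the file decompresses (as gzip/BGZF) to a stream starting with BAM\\1."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(len(BAM_MAGIC)) == BAM_MAGIC
    except (OSError, EOFError):  # not gzip at all, or truncated before the magic
        return False


if __name__ == "__main__":
    for p in sys.argv[1:]:
        magic = "ok" if has_bam_magic(p) else "BAD"
        eof = "ok" if has_bgzf_eof(p) else "MISSING"
        print(f"{p}: magic={magic} eof={eof}")
```

If the file passes both checks and `samtools view -H` reads it cleanly, it may be worth looking at how the path reaches samtools inside the script, since this one contains spaces.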


    • ben.weisburd
      Junior Member
      • Oct 2010
      • 9

      #3
      I've run CREST successfully before, and the only difference I can see is that my BAM also had a read group:
      @RG ID:1 PL:illumina PU:1 LB:1 SM:Sample_11p

      Not sure if that's the issue, but Picard's AddOrReplaceReadGroups is a good tool for adding one.
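To see whether a BAM already carries a read group before running Picard, one can look for @RG lines in its header. A minimal Python sketch, assuming the header text comes from `samtools view -H` and is tab-separated as SAM requires (the forum display above renders the tabs as spaces). The function name is illustrative and not part of CREST.

```python
import subprocess
import sys


def read_group_ids(header_text):
    """Return the ID value of every @RG line in SAM header text."""
    ids = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Fields after the record type are TAG:VALUE; split on the first colon only.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        ids.append(tags.get("ID"))
    return ids


if __name__ == "__main__":
    header = subprocess.run(["samtools", "view", "-H", sys.argv[1]],
                            capture_output=True, text=True, check=True).stdout
    ids = read_group_ids(header)
    print("read groups:", ", ".join(map(str, ids)) if ids else "none")
```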


      • aquinom85
        Research Bioinformaticist
        • Dec 2011
        • 19

        #4
        Thanks, I'll try that if I get the error again. Did it take you a very long time to run ./extractSClip.pl on your BAMs?


        • ben.weisburd
          Junior Member
          • Oct 2010
          • 9

          #5
          No, not long: about 1 hr for a 7 GB BAM.


          • aquinom85
            Research Bioinformaticist
            • Dec 2011
            • 19

            #6
            Heh, WGS gives 180-200 GB BAMs, so I guess ~26-29 h, but if I run it in parallel maybe ~7 h on a quad core. Alas, I'm using BAMs that were aligned with CASAVA, so I don't think they have any soft-clipped reads. Are those only available if a genome is aligned with bwa sampe?
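The runtime estimate can be sketched in Python under two stated assumptions: that extractSClip.pl's runtime scales linearly with BAM size from the ~1 h per 7 GB reported above, and that the work can be split by chromosome into independent jobs (for example by running on per-chromosome BAMs; whether extractSClip.pl itself can be limited to a region is not covered here). The "longest chromosome first" greedy balancing is a common heuristic, not something CREST provides. All names are illustrative.

```python
import heapq


def estimate_hours(bam_gb, ref_gb=7.0, ref_hours=1.0, workers=1):
    """Linear-scaling runtime estimate from one reference measurement (an assumption)."""
    return bam_gb / ref_gb * ref_hours / workers


def balance_chromosomes(lengths, workers):
    """Greedy longest-first split of chromosomes into `workers` groups.

    lengths: dict of chromosome name -> length (e.g. the @SQ LN: values).
    Returns a list of (total_length, [names]), one entry per worker.
    """
    heap = [(0, i, []) for i in range(workers)]  # (load, worker index, names)
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        load, i, names = heapq.heappop(heap)  # currently least-loaded worker
        names.append(name)
        heapq.heappush(heap, (load + length, i, names))
    return [(load, names) for load, _, names in sorted(heap, key=lambda t: t[1])]


if __name__ == "__main__":
    for gb in (180, 200):
        print(f"{gb} GB: ~{estimate_hours(gb):.1f} h serial, "
              f"~{estimate_hours(gb, workers=4):.1f} h on 4 cores")
    # A few @SQ lengths from the header posted above, for illustration only.
    sq = {"chr1": 249250621, "chr2": 243199373, "chr3": 198022430, "chr4": 191154276,
          "chr5": 180915260, "chr6": 171115067, "chr7": 159138663, "chrX": 155270560}
    for load, names in balance_chromosomes(sq, 4):
        print(load, names)
```

With these assumptions, 180-200 GB works out to roughly 25.7-28.6 h serially and about 6.4-7.1 h split four ways, ignoring I/O contention between the parallel jobs.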


            • ben.weisburd
              Junior Member
              • Oct 2010
              • 9

              #7
              Yeah, I was using bwa sampe on whole-exome samples. Not sure about CASAVA and soft clipping.
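Whether a particular BAM contains soft-clipped reads can be checked directly instead of inferred from the aligner: a soft clip shows up as an `S` operation in the CIGAR string (column 6 of each SAM record). A minimal Python sketch that reads `samtools view` output on stdin; the script and function names are illustrative.

```python
import re
import sys

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Total soft-clipped bases in a CIGAR string ('*' means no alignment)."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def count_soft_clipped(sam_lines):
    """Return (records with any soft clip, total records), skipping header lines."""
    total = clipped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a full SAM record
        total += 1
        if soft_clipped_bases(fields[5]) > 0:
            clipped += 1
    return clipped, total


if __name__ == "__main__":
    clipped, total = count_soft_clipped(sys.stdin)
    print(f"{clipped} of {total} records have soft clipping")
```

For a large BAM, sampling is usually enough, e.g. `samtools view sample.bam | head -n 1000000 | python count_softclips.py`.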


              • aquinom85
                Research Bioinformaticist
                • Dec 2011
                • 19

                #8
                Is there any documentation that explains how to read the alignment created by bam2html? I couldn't find anything on their website or in the README that comes with the software.


                • ben.weisburd
                  Junior Member
                  • Oct 2010
                  • 9

                  #9
                  Has anyone written scripts to convert CREST output to a format that can be visualized in Circos or another SV viewer?
                  What tools are people using to visualize the SVs?
                  Thanks
                  -Ben
                  Last edited by ben.weisburd; 04-11-2012, 12:51 AM.
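For the Circos question, a rough conversion sketch in Python. Circos link data files take one link per line as `chr start end chr start end`, with chromosome names matching the karyotype file (the bundled human karyotypes use `hs1`, `hs2`, ...). The column layout assumed here for CREST's tab-separated predicted-SV output (left chromosome and position in columns 1-2, right chromosome and position in columns 5-6) is an assumption and should be checked against the CREST README before use. Contigs outside 1-22, X and Y are skipped; the function names are illustrative.

```python
import sys

# Chromosomes present in the standard Circos human karyotype (as hs1..hs22, hsX, hsY).
KARYOTYPE = {str(i) for i in range(1, 23)} | {"X", "Y"}


def to_circos_chrom(name):
    """Map 'chr21' or '21' to 'hs21'; return None for contigs not in the karyotype."""
    base = name[3:] if name.lower().startswith("chr") else name
    return "hs" + base if base in KARYOTYPE else None


def crest_to_circos_links(lines):
    """Convert CREST SV lines to Circos link lines (ASSUMED column layout, see above)."""
    links = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 6:
            continue
        # Assumed: f[0]=left chr, f[1]=left pos, f[4]=right chr, f[5]=right pos.
        a, b = to_circos_chrom(f[0]), to_circos_chrom(f[4])
        if a is None or b is None:
            continue
        links.append(f"{a} {f[1]} {f[1]} {b} {f[5]} {f[5]}")
    return links


if __name__ == "__main__":
    print("\n".join(crest_to_circos_links(sys.stdin)))
```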


                  • bw.
                    Member
                    • Mar 2012
                    • 21

                    #10
                    I'm running CREST on exome-seq samples.
                    On 8 of the approximately 50 samples, the tool hangs partway through the step where it prints:

                    Output is in /tmp/486391.1.all.q/6IGDfMyBGM/tiw652Qy6Q.fa.clip.fa.psl
                    21 38520525 - 1
                    Output is in /tmp/486391.1.all.q/6IGDfMyBGM/oZqu2DckND.fa.clip.fa.psl
                    GL000211.1 156726 + 1
                    Output is in /tmp/486391.1.all.q/6IGDfMyBGM/m9Sp2Shk_H.fa.clip.fa.psl
                    7 2768145 - 3
                    Output is in /tmp/486391.1.all.q/6IGDfMyBGM/_eFzYi5xLz.fa.cap.contigs.clip.fa.psl

                    ...

                    After that it produces no more output and never terminates.
                    It never reaches the part where it says "SV filter starting...."

                    This makes the tool unusable for these samples, and I can't see what differentiates them from the other samples, where CREST completes normally and outputs the table of structural variants.
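Until the cause is found, one way to keep a batch moving is to run each sample under a wall-clock limit and, for samples that time out, note the last coordinate line printed, which may point at the region where it stalls. The Python sketch below assumes the progress lines have the `chromosome position strand count` shape seen in the log above; that reading of the log, the helper names, and whatever command you pass in are assumptions, not CREST documentation.

```python
import re
import subprocess
import sys

# Progress lines like "21 38520525 - 1" or "GL000211.1 156726 + 1"
# (assumed meaning: chromosome, position, strand, read count).
PROGRESS = re.compile(r"^(\S+)\s+(\d+)\s+([+-])\s+(\d+)\s*$")


def run_with_timeout(cmd, timeout_s, log_path=None):
    """Run cmd with a wall-clock limit; return 'ok', 'failed' or 'timeout'.

    stdout and stderr go to log_path (or are discarded). On timeout only the
    direct child is killed; helper processes it spawned may keep running.
    """
    log = open(log_path, "w") if log_path else subprocess.DEVNULL
    try:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    finally:
        if log_path:
            log.close()
    return "ok" if proc.returncode == 0 else "failed"


def last_progress_line(lines):
    """Return the last line that looks like a breakpoint progress line, or None."""
    last = None
    for line in lines:
        if PROGRESS.match(line.strip()):
            last = line.strip()
    return last


if __name__ == "__main__":
    # Usage: python watchdog.py TIMEOUT_SECONDS LOG_FILE command [args...]
    timeout_s, log_path, cmd = float(sys.argv[1]), sys.argv[2], sys.argv[3:]
    status = run_with_timeout(cmd, timeout_s, log_path)
    print(status)
    if status == "timeout":
        with open(log_path) as fh:
            print("last progress line:", last_progress_line(fh))
```

One caveat: Perl block-buffers STDOUT when it is redirected to a file unless the script enables autoflush, so if the progress lines go to STDOUT, the log may lag behind what the program has actually reached.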

