
  • #16
    Originally posted by michaelb28
    Quick question: has anybody tried running CONTRA on a VM? I am running it in an Ubuntu Linux environment, and I'm trying to find out whether that may be the source of my problem. I always receive this error, even though all of my scripts are in the correct directory.

    Traceback (most recent call last):
    File "contra.py", line 569, in <module>
    main()
    File "contra.py", line 514, in main
    get_genome(params.TEST, genomeFile)
    File "/home/michaelb/Documents/Contra-II/CONTRA.v2.0.2/scripts/get_chr_length.py",
    line 31, in get_genome
    raw_header = subprocess.Popen(args, stdout =
    subprocess.PIPE).communicate()[0]
    File "/usr/local/lib/python2.6/subprocess.py", line 595, in __init__
    errread, errwrite)
    File "/usr/local/lib/python2.6/subprocess.py", line 1106, in _execute_child
    raise child_exception
    OSError: [Errno 2] No such file or directory
    I realize this thread is over a year old, so this answer is probably not relevant to anyone who posted here. However, given that this is the only discussion of get_chr_length.py I could find anywhere online, I thought I should post my solution. It turns out that error is produced when samtools is missing.

    If any CONTRA developers are reading this, I'd like to suggest adding a quick check at the beginning for all the prerequisites. It seems like most of the errors are due to either missing prerequisites or incorrect versions.
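A prerequisite check of the kind suggested above could look roughly like the sketch below. The tool list is an assumption pieced together from this thread (samtools, the two BEDTools programs that show up in CONTRA's debug output, and Rscript for the R step); it is not taken from CONTRA's own documentation. Note that `shutil.which` needs Python 3.3 or later, so this would not run on the Python 2.6 shown in the traceback.

```python
import shutil

# Assumed prerequisite list, gathered from this thread rather than from CONTRA itself.
REQUIRED_TOOLS = ["samtools", "genomeCoverageBed", "fastaFromBed", "Rscript"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `find_missing_tools()` before launching the pipeline and stopping with a clear message when the list is non-empty would replace the bare `OSError: [Errno 2]` with something readable.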

    Comment


    • #17
      CONTRA on whole exome - problem with large bam file

      Hello,

      I just tried CONTRA for CNV detection on a paired tumor/normal sample.

      I am using Ubuntu, Python 2.7.3, and R 2.15.

      CONTRA worked well with the test sample included in the installation package.

      However, when I try it with my own BAM files, it fails.
      The process is stopped after "Getting the Log Ratio ...".

      Here is the whole command line and stdout:

      time /home/cecile/Documents/appli/CONTRA/CONTRA.v2.0.4/contra.py -f hg19.fa -c suite_analyse/GL_patient_GCCAAT_L006.reorder_contig.sort_by_coordinate.add_or_replace_group.mark_duplicates.indel_realignment.recal.bam -s suite_analyse/DIAG_patient_CTTGTA_L006.reorder_contig.sort_by_coordinate.add_or_replace_group.mark_duplicates.indel_realignment.recal.bam -t /home/cecile/Documents/data/capture/Agilent_SureSelectAllExonHumanv4/S03723314_Regions.bed -p -o output_CONTRA_param --sampleName WEA_DIAG --nomultimapped --minControlRdForCall 10 --minTestRdForCall 10 -l
      target : /home/cecile/Documents/data/capture/Agilent_SureSelectAllExonHumanv4/S03723314_Regions.bed
      test : suite_analyse/DIAG_patient_CTTGTA_L006.reorder_contig.sort_by_coordinate.add_or_replace_group.mark_duplicates.indel_realignment.recal.bam
      control : suite_analyse/GL_patient_GCCAAT_L006.reorder_contig.sort_by_coordinate.add_or_replace_group.mark_duplicates.indel_realignment.recal.bam
      fasta : hg19.fa
      outfolder : output_CONTRA_param
      numBin : [20]
      minreaddepth : 10
      minNBases : 10
      sam : False
      pval : 0.05
      sampleName : WEA_DIAG
      nomultimapped : True
      plot : True
      bedInput : False
      minExon : 2000
      largeDeletion : True
      Creating Output Folder : Done.
      Removing multi-mapped reads
      Multi mapped reads removed.
      Multi mapped reads removed.
      Converting TEST Sample...
      DEBUG 123 genomeCoverageBed -ibam output_CONTRA_param/buf/test_reliable.BAM -bga -g output_CONTRA_param/buf/sample.Genome
      Converting CONTROL Sample...
      DEBUG 123 genomeCoverageBed -ibam output_CONTRA_param/buf/control_reliable.BAM -bga -g output_CONTRA_param/buf/sample.Genome
      Getting targeted regions DOC...
      chr1
      chr10
      chr11
      chr12
      chr13
      chr14
      Getting targeted regions DOC...
      chr1
      chr15
      chr16
      chr10
      chr17
      chr11
      chr18
      chr19
      chr12
      chr2
      chr13
      chr14
      chr15
      chr20
      chr16
      chr21
      chr22
      chr17
      chr3
      chr18
      chr19
      chr4
      chr5
      chr2
      chr6
      chr7
      chr20
      chr21
      chr22
      chr8
      chr3
      chr9
      chrX
      chr4
      chrY
      Targeted regions pre-processing: Done
      chr5
      chr6
      chr7
      chr8
      chr9
      chrX
      chrY

      Targeted regions pre-processing: Done
      Test file read depth = 8701473580
      Control file read depth = 8850452831
      Pre-processing Completed.
      Getting the Log Ratio ...
      Processus arrêté  (French: "Process stopped")


      real 296m18.786s
      user 153m41.800s
      sys 11m19.754s


      Do you have any idea what causes this error? Are my BAM files (~18 GB) too large?

      Thank you for your help !!!

      Comment


      • #18
        Looks like CONTRA locked up when it tried to calculate the log ratios. You may want to try increasing your available RAM, reducing your file size, or adjusting your input so that CONTRA doesn't have to read as many rows. That may involve increasing the minimum size acceptable for a read.

        Can you post your computer specs?

        Comment


        • #19
          Here are my computer specs:

          4 processors, each like this:
          processor : 0
          vendor_id : GenuineIntel
          cpu family : 6
          model : 45
          model name : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
          stepping : 7
          microcode : 0x70b
          cpu MHz : 1200.000
          cache size : 10240 KB

          MemTotal: 65902720 kB

          My BAM files contain about 181,000,000 and 157,000,000 reads. When I remove duplicates, I can eliminate 17% of them. I can try again with those files.
          In general, how do you deal with large files? Do you perform a per-chromosome analysis?

          Thank you wolfpack14 !!

          Comment


          • #20
            Interesting. You would think 64GB of RAM would be enough for calculating log ratios on 180MM reads, but I guess not. CONTRA isn't exactly the most efficient piece of software.

            We are in the process of putting samples through the algorithm to compare to other CNV packages. I will keep you posted if we run into issues.

            There are quite a few CNV packages, including ExomeCNV, PennCNV, ADTEx, CNV-Seq, CNVer and CoNIFER. I would explore all of your options before settling on one algorithm to implement. That said, CONTRA is nice because it is well integrated into NGS hardware and output formatting, but that won't stop us from using a better algorithm.

            Comment


            • #21
              Found another bug. If CONTRA runs and forces your data into 1 bin, you must launch CONTRA with the option --numBin 1. If you don't, the application will fail.

              It is a bug in how cn_analysis.v3.R handles a bin count of 1. It doesn't carry over the actual number of bins used for your data, but rather the number specified at application launch (20 by default).
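For anyone scripting their runs, the workaround can be folded into a small command builder like the sketch below. Only flags that appear in this thread are used (-t, -s, -c, -f, -o, --numBin); the function name and the example file names are made up for illustration, and nothing is executed here.

```python
def build_contra_command(target, test_bam, control_bam, fasta, outdir, num_bin=None):
    """Build the argument list for a CONTRA run without executing it."""
    cmd = ["contra.py", "-t", target, "-s", test_bam, "-c", control_bam,
           "-f", fasta, "-o", outdir]
    if num_bin is not None:
        # Pass the bin count explicitly, e.g. 1 when the data ends up in a single bin.
        cmd += ["--numBin", str(num_bin)]
    return cmd
```

For example, `build_contra_command("targets.bed", "test.bam", "control.bam", "ref.fa", "out", num_bin=1)` returns a list ending in `"--numBin", "1"`, which can be handed to `subprocess.run(cmd, check=True)`.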

              Comment


              • #22
                cecile,

                I think your BAM files are too large. I am successfully running CONTRA on BAM files that are about 200-300MB. How big are the bins in your target BED file? That may also make a difference in performance.

                Comment


                • #23
                  Hi wolfpack14,

                  Indeed, my BAM files are ~18 GB.
                  The intervals in my target BED file vary from 200 bp to 1000 bp. I didn't specify the --numBin option, so I ran CONTRA with the default (20).

                  I ran it after splitting my BAM files by chromosome (using Bamtools split), and it also failed. The error was about the BAM file, which seemed to be malformed.

                  I did another test by splitting the file with samtools view -bh file.bam chr${num} and it worked.
                  But there seems to be no significant result (nothing in CNATable.10rd.10bases.20bins.DetailsFILTERED), although there were 6439 targets.
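The per-chromosome split with samtools described above can be scripted along these lines. This is only a sketch: the helper name and the output naming scheme are hypothetical, the chromosome names follow the chr1..chrY style seen in the log earlier in the thread, and region queries with `samtools view` need the BAM to be coordinate-sorted and indexed first.

```python
def split_commands(bam_path, chromosomes, out_prefix):
    """Return one `samtools view` argument list per chromosome (nothing is run)."""
    commands = []
    for chrom in chromosomes:
        out_path = f"{out_prefix}.{chrom}.bam"
        # -b: write BAM, -h: keep the header, -o: output file
        commands.append(["samtools", "view", "-bh", "-o", out_path, bam_path, chrom])
    return commands


# chr1..chr22 plus chrX and chrY, matching the names printed in the CONTRA log
CHROMOSOMES = [f"chr{n}" for n in range(1, 23)] + ["chrX", "chrY"]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, so a failed split stops the script instead of feeding a truncated file to CONTRA.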

                  Comment


                  • #24
                    Have you tried playing around with the minExon setting? We have ours set at 100 so we can make appropriate bin sizes. If we left it at 2000 (the default), CONTRA had a tough time splitting the data up into enough bins.

                    We also forced our input BED intervals to width 20. We are getting pretty granular results from this combination.
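The post does not say how the intervals were forced to width 20, so the following is just one plausible way to do it: cut every BED interval into consecutive 20 bp windows, letting the last window be shorter when the interval length is not a multiple of 20. Function names are made up, and any columns beyond the first three are dropped.

```python
def split_interval(chrom, start, end, width=20):
    """Yield consecutive windows of at most `width` bp covering [start, end)."""
    for win_start in range(start, end, width):
        yield chrom, win_start, min(win_start + width, end)


def split_bed_lines(lines, width=20):
    """Turn tab-separated BED lines into fixed-width window lines."""
    out = []
    for line in lines:
        # Skip blank lines and the usual BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        for c, s, e in split_interval(chrom, start, end, width):
            out.append(f"{c}\t{s}\t{e}")
    return out
```

A 50 bp interval such as chr1:100-150 becomes three lines (100-120, 120-140, 140-150).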

                    Comment


                    • #25
                      Just solved another issue we were having with using multiple input BED files on a CONTRA-built baseline.

                      You have to build a new baseline for each input bed you want to use. CONTRA will only work if the input bed file matches the one used to create the original baseline. I think it has to do with non-matching base pair intervals. I kept getting "list out of index" errors on the control sample.
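A quick sanity check before running against an existing baseline is to compare the two BED files interval by interval. The sketch below assumes plain tab-separated BED and treats the files as matching only if the same intervals appear in the same order; whether mismatched intervals really are the cause of the error is the guess made in this post, not something confirmed.

```python
def read_intervals(lines):
    """Parse (chrom, start, end) tuples from tab-separated BED lines."""
    intervals = []
    for line in lines:
        # Skip blank lines and the usual BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def beds_match(lines_a, lines_b):
    """True if both BED inputs list exactly the same intervals in the same order."""
    return read_intervals(lines_a) == read_intervals(lines_b)
```

With real files this would be called as `beds_match(open("baseline_targets.bed"), open("new_targets.bed"))`, where both file names are placeholders.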
                      Last edited by wolfpack14; 01-30-2014, 10:45 AM. Reason: Grammar

                      Comment


                      • #26
                        Cecile,

                        I think you forgot to specify that your input file is in BED format. I can tell because your output has this:

                        bedInput : False
                        Use the --bed parameter when you launch CONTRA. I think it will solve your problem.

                        If it doesn't solve your issue, try running without the -l option (CBS Large Variation detection). That is probably what is taking so long.
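Since CONTRA echoes its settings as "name : value" lines at the start of a run (as in the output posted earlier in the thread), those lines can be parsed to double-check what a run actually used. This sketch assumes that format and keeps only single-word names, so status lines such as "Creating Output Folder : Done." are ignored; the function name is made up.

```python
def parse_contra_params(log_text):
    """Collect the 'name : value' settings echoed at the top of CONTRA's output."""
    params = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition(" : ")
        key = key.strip()
        # Keep single-word names only; multi-word prefixes are status messages.
        if sep and key and " " not in key:
            params[key] = value.strip()
    return params
```

Checking `parse_contra_params(log)["bedInput"]` or `["largeDeletion"]` then shows at a glance which options were in effect for a given run.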
                        Last edited by wolfpack14; 01-30-2014, 10:53 AM. Reason: Added Info

                        Comment


                        • #27
                          Hi,

                          I'm trying to get CONTRA working.
                          However, using the test files, I get the following error:
                          DEBUG 266b

                          fastaFromBed -fi reference/human_g1k_v37.fasta -bed testfix2/buf/CNATable.10rd.10bases.20bins.BED -fo testfix2/buf/CNATable.10rd.10bases.20bins.fastaOut.txt -name
                          Error: The requested bed file (testfix2/buf/CNATable.10rd.10bases.20bins.BED) could not be opened. Exiting!
                          Creating VCF file ...
                          testfix2/table/CNATable.10rd.10bases.20bins.vcf created.
                          Done...
                          Command used:
                          python CONTRA.v2.0.4/contrafix.py -t test_files/0247401_D_BED_20090724_hg19_MERGED.bed -s test_files/P0667T_GATKrealigned_duplicates_marked.bam -c test_files/P0667N_GATKrealigned_duplicates_marked.bam -f reference/human_g1k_v37.fasta -o testfix2

                          I was wondering if someone could help me to solve this error.

                          I am planning to determine CNV on IonTorrent PGM data.
                          Do you think CONTRA would be suitable for this?
                          Or do you suggest another program?

                          Nevertheless, I hope someone can help me to fix this!

                          Thanks in advance

                          Comment


                          • #28
                            Are there still people using CONTRA?

                            Comment


                            • #29
                              Hi nielsk,

                              are you still running into this error?

                              If so, I am just trying out CONTRA for the first time so this may be of no help to you.

                              It seems CONTRA cannot find your .bed file in the testfix2/buf/ directory. Can you check that your paths are correct? You seem to be executing CONTRA from your home directory or something similar, and your test files and your reference are also in that same directory.

                              From my initial tests, CONTRA copies your target BED file to the buf directory and renames it target.BED.

                              But in your case CONTRA seems to be looking for the original name of the file. I wonder if copying the file manually into the buf directory will do the trick.

                              Anyway, have you managed to get any CNV results from your PGM data? It would be interesting to know if it works on single-end reads with a targeted resequencing experiment.

                              HTH

                              Dave
                              Last edited by dnusol; 11-13-2014, 07:52 AM.

                              Comment


                              • #30
                                Hi Dr. Li,

                                I have some problems running CONTRA, and I have posted a thread here:
                                Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                                Could you please help me on that?

                                Thanks a lot!

                                -J

                                Comment
