  • #16
    Hello,
    If I can resurrect this thread briefly: I've been having a little trouble with ERANGE 3.1. When I try either to annotate the genes or to use the getfasta.py script to get the FASTA sequences for MEME, I always end up with a blank file. The annotation script gives no errors, and the getfasta.py script only reports a problem for each peak. Everything is "installed" properly and my $PATH variable is set, so that isn't the issue. Looking through the code, I see a call to "Genome", but I haven't been able to locate it anywhere in the code. This might be more an issue with cistematic than with ERANGE, or I may just be missing an obvious fix. Either way, any suggestions would be wonderful. I've already written Perl scripts for some of this, but I'd like to save time and use the built-in scripts if possible. Thanks so much!

    Daniel

    EDIT: It was caused by a difference in the input file. The eland files I was using were slightly different from the usual format; everything works as it should now!
    Last edited by dnewkirk; 06-10-2009, 11:24 AM.
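Since the problem turned out to be a slight difference in the eland input files, one quick way to spot that kind of deviation is to tabulate how many tab-separated fields each line has; a file in a consistent format should be dominated by a single count. A minimal sketch, assuming tab-delimited input; `field_count_histogram` is a hypothetical helper, not part of ERANGE:

```python
from collections import Counter


def field_count_histogram(path, sep="\t"):
    """Count how many lines in `path` have each number of `sep`-separated fields.

    Hypothetical helper (not part of ERANGE): a file in a consistent format
    should be dominated by a single field count.
    """
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            counts[len(line.split(sep))] += 1
    return counts
```

For example, `sorted(field_count_histogram("s_1_eland_extended.txt").items())` lists each field count next to the number of lines that have it (the file name is only illustrative).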



    • #17
      ERANGE problem

      Hello Everyone,

      I am trying to run ERANGE 3.1 (http://woldlab.caltech.edu/rnaseq/) with my own ChIP-Seq dataset. I have eland_extended files for lanes 1 & 2. I convert them serially to RDS using the makerdsfromeland2.py script. Then I run ERANGE with the following command:

      python ./findall.py EXP2vsEXP1 EXP2.rds ./output/EXP2vsEXP1.txt -control EXP1.rds -listPeak -revbackground -ratio 3 -nodirectionality

      It runs for about 10 minutes and finally succeeds. The EXP2vsEXP1.txt file has the following information.

      #enriched sample: EXP2.rds (7.8 M reads)
      #control sample: EXP1.rds (7.1 M reads)
      #enforceDirectionality=False listPeak=True nomulti=False cache=False
      #spacing<50 minimum>4.0 ratio>3.0 minPeak=0.5 trimmed=10%
      #minPlus=0.25 maxPlus=0.75 leftPlus=0.40 shift=0 pvalue=back
      #regionID chrom start stop RPM fold multi% peakPos peakHeight pValue
      EXP2vsEXP11 ref_chr9 83452778 83452862 7.3 3.1 57.3 83452819 7.0 2.5e-17
      #stats: 7.3 RPM in 1 regions
      #8 regions (37.6 RPM) found in background (FDR = 100.00 percent)
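To compare the number of regions findall.py reports against another peak list, the output above can be parsed directly. A minimal sketch, assuming '#'-prefixed comment lines, a '#regionID ...' header naming the columns, and whitespace-separated region lines as shown; `read_findall_regions` is a hypothetical helper, not part of ERANGE:

```python
def read_findall_regions(lines):
    """Parse region lines from a findall.py-style output file into dicts.

    Assumes '#'-prefixed comment lines, a '#regionID ...' header naming the
    columns, and whitespace-separated region lines. The header is read rather
    than hard-coded because the column set changes with the options used
    (some runs add plus% and leftPlus% columns).
    """
    columns = None
    regions = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#regionID"):
            columns = line.lstrip("#").split()
        elif line.startswith("#"):
            continue  # other comment/statistics lines
        elif columns is None:
            raise ValueError("region line found before the #regionID header")
        else:
            regions.append(dict(zip(columns, line.split())))
    return regions
```

With the file open, `len(read_findall_regions(fh))` gives the region count, and each dict exposes fields such as `chrom`, `start`, `stop` and `pValue` for comparison.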


      But I am fairly sure there is more than one peak in this data set (judging from the peak files I received from the contractor company). Could someone help me figure out what I am doing wrong?

      Regards,
      PJ
      Last edited by joshibros; 09-18-2009, 07:49 AM. Reason: Naming Issues



      • #18
        @!$#@%@#%#^%#@@^@
        Last edited by mcknowable; 11-11-2009, 06:04 AM. Reason: delete



        • #19
          empty splicefa file

          Does anyone happen to know why I would end up with an empty file after running the "getsplicefa.py" script? I have the knownGene.txt file and used the correct length minus 4, but still no results are produced.

          Thanks in advance.
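One thing worth ruling out when the splice FASTA comes out empty is the annotation file itself: if knownGene.txt is not in the expected layout, there are no junctions to extract. A minimal sketch that counts transcripts and exon-exon junctions, assuming the standard UCSC knownGene column order; it does not reproduce what getsplicefa.py does internally, and the helper name is hypothetical:

```python
def count_transcripts_and_junctions(lines):
    """Count transcripts and exon-exon junctions in UCSC knownGene-style lines.

    Assumes the standard UCSC knownGene column order (name, chrom, strand,
    txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...),
    tab-separated. A transcript with n exons has n - 1 junctions.
    """
    transcripts = junctions = 0
    for line in lines:
        if line.startswith("#"):
            continue  # header line, if present
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 10:
            continue  # not a knownGene-style record
        exon_count = int(fields[7])
        transcripts += 1
        junctions += max(exon_count - 1, 0)
    return transcripts, junctions
```

If this reports zero transcripts or zero junctions for your file, the annotation input is the first place to look.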



          • #20
            Hello,

            I have a question related to "cistematic" for RNA-seq analysis. The reads were aligned using bowtie. After that I built the RDS file using makerdsfrombowtie.py. Then I tried running geneMrnaCounts.py but received the error below.

            #Run script
            python geneMrnaCounts.py Mouse mm9.mybowtie.rds mm9.mybowtie.uniqs.count -markNM

            #Error
            Traceback (most recent call last):
            File "geneMrnaCounts.py", line 16, in <module>
            from cistematic.genomes import Genome
            File "cistematic/cistematic/genomes/__init__.py", line 907, in <module>
            geneDB[genome] = eval(genome+'.geneDB')
            File "<string>", line 1, in <module>
            AttributeError: 'module' object has no attribute 'geneDB'
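The traceback shows cistematic's genomes/__init__.py evaluating `genome + '.geneDB'` for each genome module, so the AttributeError means at least one of those modules defines no `geneDB` attribute. A minimal sketch of the same kind of lookup written with `getattr`, so that modules lacking the attribute are reported instead of aborting the import; this is not cistematic's code, and `collect_gene_dbs` and the module names are hypothetical:

```python
import types


def collect_gene_dbs(modules):
    """Map genome name -> geneDB value, listing modules that lack geneDB.

    `modules` maps a genome name to an already-imported module object.
    Returns (gene_dbs, missing) instead of raising AttributeError.
    """
    gene_dbs, missing = {}, []
    for name, module in modules.items():
        gene_db = getattr(module, "geneDB", None)
        if gene_db is None:
            missing.append(name)  # this module defines no geneDB attribute
        else:
            gene_dbs[name] = gene_db
    return gene_dbs, missing


if __name__ == "__main__":
    mouse = types.ModuleType("mouse")
    mouse.geneDB = "/hypothetical/path/mouse.genedb"
    partial = types.ModuleType("partial")  # stands in for a module without geneDB
    print(collect_gene_dbs({"mouse": mouse, "partial": partial}))
    # -> ({'mouse': '/hypothetical/path/mouse.genedb'}, ['partial'])
```

Running the equivalent check over the genome modules in your installation would show which one is missing its `geneDB` setting.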

            Does anyone have any insight? Thank you very much for your help!
            Best regards,
            C C
            Last edited by ww236c; 12-17-2009, 01:47 PM. Reason: about Cistematic



            • #21
              Hi,

              I have a traceback error as well but it's not related to Cistematic.

              python geneMrnaCountsWeighted.py human bowtie_1205_2_hg18.rds bowtie_1205_2_hg18.expanded.rpkm bowtie_1205_2_hg18.multi.count -accept bowtie_1205_2_hg18.accepted.rpkm -multi -cache 1


              returning 37279 regions
              human cached
              caching ....
              dataset bowtie_1205_2_hg18.rds
              metadata:
              bowtie_mapped True
              dataType RNA
              paired True
              rdsVersion 1.1
              readsize 40

              19380512 unique reads, 35893 spliced reads and 1531861 multireads
              default cache size is 100000 pages
              found index

              1 read 100000 read 200000
              10 read 300000
              10_random
              11 read 400000 read 500000
              12 read 600000 read 700000
              13 read 800000
              13_random
              14 read 900000
              15
              15_random read 1000000
              16
              16_random
              17 read 1100000
              17_random
              18 read 1200000
              19
              1_random
              2 read 1300000 read 1400000 read 1500000
              20
              21
              21_random read 1600000
              22
              22_h2_hap1
              Traceback (most recent call last):
              File "geneMrnaCountsWeighted.py", line 119, in <module>
              for (tagStart, tagReadID) in hitDict[fullchrom]:
              KeyError: u'chr22_h2_hap1'
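The failing line indexes `hitDict[fullchrom]` directly, so any chromosome name in the gene models with no matching key among the reads (here, apparently, haplotype and random contigs such as chr22_h2_hap1 and chr16_random) raises a KeyError. A minimal sketch of the same loop written defensively, assuming `hitDict` maps a chromosome name to a list of (tagStart, tagReadID) pairs as the traceback suggests; the function is hypothetical, and skipping is only appropriate if those contigs are meant to contribute no reads:

```python
def count_tags_per_chromosome(hit_dict, chromosomes):
    """Count tags per chromosome, skipping chromosomes absent from hit_dict.

    Assumes hit_dict maps a chromosome name (e.g. "chr22") to a list of
    (tagStart, tagReadID) pairs, mirroring the loop in the traceback above.
    Returns (counts, skipped) so that missing chromosomes are reported
    instead of raising KeyError.
    """
    counts = {}
    skipped = []
    for fullchrom in chromosomes:
        if fullchrom not in hit_dict:
            skipped.append(fullchrom)  # no reads loaded for this chromosome
            continue
        counts[fullchrom] = sum(1 for _tag_start, _tag_read_id in hit_dict[fullchrom])
    return counts, skipped
```

Printing `skipped` after a run shows exactly which contigs had no reads, which is more informative than a crash on the first one.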



              I get the same error with another RDS file I was running, but that one failed on:

              "KeyError: u'chr16_random'"



              Thanks for any help. I'm really not sure what this error relates to and how to fix it!

              Kasycas



              • #22
                findall.py works for 1M records; fails silently for 28M

                Hello --

                I am trying to process an Illumina ChIP-seq experiment with a background: two 4-5GB files with about 30M mappings apiece (ELAND_MULTI files). Using a sample of 1M records, I got credible-looking results. For example, the output for chromosome 4 in the sample run output looks like:
                chromosome chr4
                calculating background...
                5 36.0783776167
                Poisson n=114533, p=0.280103
                #regionID chrom start stop RPM fold multi% plus% leftPlus% peakPos peakHeight pValue
                s_2-s_19 chr4 80274411 80274599 6.1 4.8 43.9 33.0 67.0 80274448 3.0 5.1e-07
                ...
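A 1M-record test file of the kind described here can be made by copying the first N lines of each ELAND_MULTI file. A minimal sketch (the helper name is hypothetical; note that the first N lines are not a random sample of the run):

```python
from itertools import islice


def write_head(src_path, dst_path, n_records):
    """Copy the first n_records lines of src_path to dst_path.

    Returns the number of lines actually written, which is smaller than
    n_records if the source file is shorter.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in islice(src, n_records):
            dst.write(line)
            written += 1
    return written
```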

                However, with the full files (ca. 30M records apiece), findall.py runs for many hours, then fails with no error message. All the chromosome backgrounds are 0, no peaks are reported, and the chip_regions.txt file is not created.

                Procedure:
                Following Alli's helpful responses to earlier posts, I converted two ELAND_MULTI files to RDS using this syntax:
                python2.6 ~/bin/ERANGE3.1/commoncode/mkrds.py $1 ~/test/$1_eland_multi.txt ~/test/$1.rds -index -cache 2000000
                (where $1 is the unique part of each file name).

                Then I ran findall.py as follows:
                python2.6 ~/bin/ERANGE3.1/commoncode/findall.py $1-$2 ~/test/$1.rds ~/test/$1_chip_regions.txt -control ~/test/$2.rds -listPeak -revbackground

                I don't believe there were any different parameters, etc.; the difference was just creating the two RDS files with 1M source lines each, or about 30M each. Is anybody aware of a limit in the program or python (2.6.4) that might cause such a silent failure based solely on the data size? It was a 64-bit computer with 32GB of total RAM; other processes took some of that, but I don't see any obvious external limitations.
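One way to tell whether a long findall.py run exits on its own or is terminated from outside (for example by the Linux out-of-memory killer, which sends SIGKILL) is to look at its exit status. In Python's subprocess module, a negative `returncode` of -N means the child was killed by signal N on POSIX systems. A minimal sketch; `run_and_report` is a hypothetical wrapper:

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd` (a list of arguments) and describe how the process ended."""
    result = subprocess.run(cmd)
    code = result.returncode
    if code < 0:
        # POSIX only: a negative return code -N means "killed by signal N".
        return f"killed by signal {-code}"
    return f"exited with status {code}"


if __name__ == "__main__":
    # Replace this harmless command with the real findall.py command line,
    # given as a list of arguments.
    print(run_and_report([sys.executable, "-c", "pass"]))
```

A result of "killed by signal 9" would point to an external kill, commonly memory exhaustion, rather than a bug in the script; on Linux, `dmesg` usually records OOM-killer activity.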

                Any suggestions will be much appreciated.

                Thanks!
                Howie Goodell



                • #23
                  further issues

                  Dear all,

                  I'm still having a great deal of trouble with ERANGE. Whether I use paired-end or single-read data, I get the following error on the very last script of the runStandardAnalysis.sh pipeline:


                  /usr/local/commoncode/geneMrnaCountsWeighted.py: version 3.7
                  merged 0 times
                  returning 0 regions
                  dataset posStrand_read1_paired_hg19_spliceCore_ALL1205_10m_1mm_hits.rds
                  metadata:
                  bowtie_mapped True
                  dataType RNA
                  paired True
                  rdsVersion 1.1
                  readsize 40

                  11744838 unique reads, 0 spliced reads and 0 multireads
                  default cache size is 100000 pages
                  found index

                  1
                  Traceback (most recent call last):
                  File "/usr/local/commoncode/geneMrnaCountsWeighted.py", line 128, in <module>
                  for (tagStart, tagReadID) in hitDict[fullchrom]:
                  KeyError: u'chr1'
                  /usr/local/commoncode/normalizeFinalExonic.py: version 3.5
                  reporting fractional contribution of multireads
                  dataset posStrand_read1_paired_hg19_spliceCore_ALL1205_10m_1mm_hits.rds
                  metadata:
                  bowtie_mapped True
                  dataType RNA
                  paired True
                  rdsVersion 1.1
                  readsize 40
                  default cache size is 100000 pages
                  found index
                  returned 29469 genes
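Here the lookup fails on the very first chromosome ('chr1') and the dataset reports 0 spliced reads and 0 multireads, so one thing worth ruling out is a mismatch between the chromosome names used by the gene models and those stored with the reads (for example '1' versus 'chr1'). A minimal sketch that compares two lists of names and flags those that differ only by a 'chr' prefix; it does not read RDS files, so both name lists must be obtained separately, and the function name is hypothetical:

```python
def compare_chromosome_names(annotation_chroms, read_chroms):
    """Return (missing, prefix_only) for annotation names absent from the reads.

    `missing` lists annotation chromosome names with no exact match among the
    read chromosome names; `prefix_only` is the subset that would match if a
    leading "chr" were added or removed.
    """
    reads = set(read_chroms)
    missing = sorted(set(annotation_chroms) - reads)
    prefix_only = [
        name for name in missing
        if (name[3:] if name.startswith("chr") else "chr" + name) in reads
    ]
    return missing, prefix_only
```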



                  Has anyone come across this? I'm going mad, as I can't see how it could be the data I'm using or the way I'm running it, since this is fairly straightforward...

                  Thanks,

                  Kas



                  • #24
                    cistematic.core error

                    I realize that this has been posted before, but I still get a cistematic error:

                    >python2.5 $ERANGEPATH/getallgenes.py hsapiens chip_regions.7_vs_6>.nodirection.txt outfile

                    >'import site' failed; use -v for traceback

                    >psyco not running

                    >Traceback (most recent call last):
                    > File "/usr/local/bioinf/ERANGE3.1/commoncode/getallgenes.py", line 6, in <module>
                    >   from cistematic.core import genesIntersecting, featuresIntersecting, cacheGeneDB, uncacheGeneDB
                    > ImportError: No module named cistematic.core

                    I'm on a 64-bit Linux machine, so I don't think I need psyco. I've exported the paths for both python2.5 and cistematic.
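Before changing the environment, it can help to confirm which directory Python actually resolves `cistematic` from. A minimal sketch for Python 3 using `importlib.util.find_spec` (under the python2.5 from this post, `imp.find_module("cistematic")` serves a similar purpose); the helper name is hypothetical:

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file a top-level module or package would be loaded from.

    Returns None if the name cannot be found on sys.path.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("cistematic ->", locate_package("cistematic"))
    # To import a package, the directory that *contains* it must be on sys.path.
    for entry in sys.path:
        print(entry)
```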

                    Any other ideas?

                    Thanx



                    • #25
                      What directory have you exported? I had the same error when I mistakenly exported the path including the cistematic directory itself (cistematic has a subpackage called stat, which clashes with the standard Python module of the same name in this case and causes the "import site" failure).
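The clash described above can be checked for directly: if a directory that comes before the standard library on sys.path (as PYTHONPATH entries do) contains a package or module named like a standard-library module, such as a `stat` package, it can be imported in place of the standard one. A minimal sketch that lists such names for a given path entry; `sys.stdlib_module_names` only exists from Python 3.10, so the fallback set is an illustrative assumption, and the helper is hypothetical:

```python
import os
import sys

# Available from Python 3.10; the fallback is a small illustrative set.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", {"stat", "os", "types"})


def shadowed_stdlib_modules(path_entry, stdlib_names=STDLIB_NAMES):
    """List stdlib names that path_entry provides as name.py or name/__init__.py."""
    hits = []
    for name in sorted(stdlib_names):
        if (os.path.isfile(os.path.join(path_entry, name + ".py"))
                or os.path.isfile(os.path.join(path_entry, name, "__init__.py"))):
            hits.append(name)
    return hits


if __name__ == "__main__":
    for entry in sys.path:
        clashes = shadowed_stdlib_modules(entry)
        if clashes:
            print(entry, "shadows", clashes)
```

Pointing this at the directory you exported would show a `stat` clash if the cistematic directory itself, rather than its parent, is on the path.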



                      • #26
                        Thanks, I'll give that a try.

