  • Transcriptome reconstruction from RNA-seq data using Scripture.

    Hi all,

    I'm trying to reconstruct the transcriptome from RNA-seq data using Scripture, and I get some errors. Below are the commands I've been using and the error log.

    I have paired-end data from Illumina. I run TopHat on each read file as:
    Code:
    $nohup tophat -o tophat_out_SRR065506_1 --GTF ../ucsc.gtf ../hg19index/hg19-25chr SRR065506_1.fastq
    
    $nohup tophat -o tophat_out_SRR065506_2 --GTF ../ucsc.gtf ../hg19index/hg19-25chr SRR065506_2.fastq
    I get the standard output files. The accepted_hits.bam is converted into accepted_hits.sam as:

    Code:
    $ samtools view -h -o tophat_out_SRR065506_1/accepted_hits.sam tophat_out_SRR065506_1/accepted_hits.bam
    
    $ samtools view -h -o tophat_out_SRR065506_2/accepted_hits.sam tophat_out_SRR065506_2/accepted_hits.bam
    I now take off the header as:

    Code:
    $ sed '1,2d' tophat_out_SRR065506_1/accepted_hits.sam | sort > tophat_out_SRR065506_1/accepted_hits.sorted.sam
    
    $ sed '1,2d' tophat_out_SRR065506_2/accepted_hits.sam | sort > tophat_out_SRR065506_2/accepted_hits.sorted.sam
    There is only one apparent change in the file. I attach screenshots of a small part of the two files.





    I then use Scripture's -task makePairedFile as:

    Code:
    $java -jar scripture.jar -task makePairedFile -pair1 tophat_out_SRR065506_1/accepted_hits.sorted.sam -pair2 tophat_out_SRR065506_2/accepted_hits.sorted.sam -out postTophat/SRR065506.scripturePaired.sam -sorted
    On completion, I use IGVtools to sort and index the paired and alignment files as:

    Code:
    $cat tophat_out_SRR065506_1/accepted_hits.sorted.sam tophat_out_SRR065506_2/accepted_hits.sorted.sam > postTophat/all_tophat_alignments.sam
    $igvtools sort postTophat/all_tophat_alignments.sam all_tophat_alignments.sorted.sam

    $igvtools sort postTophat/SRR065506.scripturePaired.sorted.sam
    This completes successfully and I get the .sai files.

    I then use Scripture as:

    Code:
    $ java -jar scripture.jar -alignment all_tophat_alignments.sorted.sam -out scriptureResults/chr1.segment -sizeFile ../hg19/hg19.chrom.sizes2 -chr chr1 -chrSequence ../hg19/chr1.fa -pairedEnd SRR065506.scripturePaired.sorted.sam
    This is when I get the error. Log of the error...

    Code:
    Using Version VPaperR3
    Computing weights..... upweighting? false weight: 1.0
    Computing alignment global stats for chromosome chr1
    Computing alignment global stats for chromosome chr10
    Computing alignment global stats for chromosome chr11
    Computing alignment global stats for chromosome chr12
    Computing alignment global stats for chromosome chr13
    Computing alignment global stats for chromosome chr14
    Computing alignment global stats for chromosome chr15
    Computing alignment global stats for chromosome chr16
    Computing alignment global stats for chromosome chr17
    Computing alignment global stats for chromosome chr18
    Computing alignment global stats for chromosome chr19
    Computing alignment global stats for chromosome chr2
    Computing alignment global stats for chromosome chr20
    Computing alignment global stats for chromosome chr21
    Computing alignment global stats for chromosome chr22
    Computing alignment global stats for chromosome chr3
    Computing alignment global stats for chromosome chr4
    Computing alignment global stats for chromosome chr5
    Computing alignment global stats for chromosome chr6
    Computing alignment global stats for chromosome chr7
    Computing alignment global stats for chromosome chr8
    Computing alignment global stats for chromosome chr9
    Computing alignment global stats for chromosome chrM
    Computing alignment global stats for chromosome chrX
    Computing alignment global stats for chromosome chrY
    Has pairs: true
    Has upweighting turned on: false
    Computing weights..... upweighting? false weight: 1.0
    AlignmentDataModel loaded, initializing model stats
    Computing alignment global stats for chromosome chr1
    Computing alignment global stats for chromosome chr10
    Computing alignment global stats for chromosome chr11
    Computing alignment global stats for chromosome chr12
    Computing alignment global stats for chromosome chr13
    Computing alignment global stats for chromosome chr14
    Computing alignment global stats for chromosome chr15
    Computing alignment global stats for chromosome chr16
    Computing alignment global stats for chromosome chr17
    Computing alignment global stats for chromosome chr18
    Computing alignment global stats for chromosome chr19
    Computing alignment global stats for chromosome chr2
    Computing alignment global stats for chromosome chr20
    Computing alignment global stats for chromosome chr21
    Computing alignment global stats for chromosome chr22
    
    Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Not enough fields; Line 510022
    Line: @SQ	SN:chr15	LN:102531392
    	at net.sf.samtools.SAMTextReader.reportFatalErrorParsingLine(SAMTextReader.java:169)
    	at net.sf.samtools.SAMTextReader.access$400(SAMTextReader.java:40)
    	at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:268)
    	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:232)
    	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:196)
    	at org.broad.igv.sam.reader.SamQueryTextReader$SAMQueryIterator.next(SamQueryTextReader.java:197)
    	at org.broad.igv.sam.reader.SamQueryTextReader$SAMQueryIterator.next(SamQueryTextReader.java:141)
    	at broad.pda.seq.segmentation.GenericAlignmentDataModel.getCountsPerAlignment(GenericAlignmentDataModel.java:257)
    	at broad.pda.seq.segmentation.GenericAlignmentDataModel.getCountsPerAlignment(GenericAlignmentDataModel.java:196)
    	at broad.pda.seq.segmentation.GenericAlignmentDataModel.getCountsPerAlignment(GenericAlignmentDataModel.java:1661)
    	at broad.pda.seq.segmentation.AlignmentDataModelStats.getNumberOfReadsByChr(AlignmentDataModelStats.java:178)
    	at broad.pda.seq.segmentation.AlignmentDataModelStats.computeDataStats(AlignmentDataModelStats.java:133)
    	at broad.pda.seq.segmentation.AlignmentDataModelStats.computeGlobalStats(AlignmentDataModelStats.java:122)
    	at broad.pda.seq.segmentation.AlignmentDataModelStats.<init>(AlignmentDataModelStats.java:86)
    	at broad.pda.seq.segmentation.AlignmentDataModelStats.<init>(AlignmentDataModelStats.java:67)
    	at broad.pda.seq.segmentation.ContinuousDataAlignmentModel.main(ContinuousDataAlignmentModel.java:2277)
    I don't think I've made a major mistake here. My best guess is that something went wrong in the format conversion (BAM to SAM). I'm working on that right now.

    Any insights or help would be greatly appreciated.

    Thanks!
    Last edited by Joker!sAce; 08-06-2011, 06:22 AM.
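The "Not enough fields" message suggests the parser hit a line with fewer than the 11 mandatory tab-separated columns of a SAM alignment record. A minimal sketch for locating such lines before handing a file to Scripture follows; the function name `short_sam_lines` is hypothetical, and the check assumes the file is meant to contain alignment records only.

```shell
# Report every line that has fewer than the 11 mandatory tab-separated
# SAM fields, prefixed with its line number.
# Leftover header lines such as "@SQ<TAB>SN:...<TAB>LN:..." are caught
# because they have only a few fields.
short_sam_lines() {
    awk -F '\t' 'NF < 11 { print NR ": " $0 }' "$1"
}
```

For example, `short_sam_lines postTophat/all_tophat_alignments.sam | head` would list the first offending lines, if there are any.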

  • #2
    Hi

    I got exactly the same error as you: "Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Not enough fields". I don't know what I am doing wrong. Let me know if you find a solution to this problem. However, there is another way of running the Scripture analysis, which I tried and which worked: just follow the Scripture manual at this link (http://www.molecularevolution.org/re...pture_activity). Good luck!



    • #3
      Hi,

      Thanks for the reply.

      I have 2 questions on this link...

      1. In the IGVtools count command, what is the .genome file, and how do you get it? I tried the .fa reference genome, but that did not work.
      2. In Scripture, the makePairedFile task requires two files. In the link, they've done it in a different way. I unzipped the scripture and scripture_beta versions, but I cannot find the file scripture_alpha2.jar. What is this file, and where can I find it?

      Thanks!
      Last edited by Joker!sAce; 08-07-2011, 02:33 AM.



      • #4
        I was using IGVtools without the genome files. Now that I'm downloading them, it should hopefully work. Q2 still stands.
        Last edited by Joker!sAce; 08-07-2011, 02:52 AM.



        • #5
          Hi,
          I guess the workshop material for the Scripture analysis was run on an old version (alpha), while the version currently available for download from the Broad Institute is newer (beta). You can try the beta version and see if it works; otherwise I will send you the old version.



          • #6
            Hey,

            That'd be awesome. My e-mail is [email protected]

            I've tried the three versions of Scripture they've put on the website; none of them has this older function. Maybe they should include the older functions in the next release too. I'm going to leave a comment there.

            Thanks!



            • #7
              Why don't you use Cufflinks?



              • #8
                Everyone in my department already does. We wanted to test out scripture. I know that using cufflinks would probably be less challenging given the great documentation.
                Last edited by Joker!sAce; 08-08-2011, 01:32 AM.



                • #9
                  @ Upendra

                  Thanks a lot for the mail. It all works well now.

                  I have one small question.

                  In the bed file there are some transcripts with a * instead of +/- for orientation. What does this mean?



                  • #10
                    What's the difference between Scripture and Cufflinks?

                    I also have a question: Scripture reconstructs the transcriptome one chromosome at a time. How can I run it on the whole genome? I don't see a command for that. And where is the whole-genome size file?



                    • #11
                      A size file is just a file with each chromosome's name and its length. I've attached 2 text files. The chrom2 is the one you should probably use; the other has the lengths of all hg19 chromosomes (including alternative haplotypes).
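If the attached files are not at hand, a file with the same two-column layout (name, tab, length) can be derived from the reference FASTA itself. The sketch below is one way to do that with awk; the function name `fasta_sizes` is hypothetical, and it assumes a plain-text FASTA with Unix line endings.

```shell
# Print "name<TAB>length" for every sequence in a FASTA file.
# The name is the first word after '>' on each header line.
fasta_sizes() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len   # flush the previous sequence
            name = substr($1, 2)                  # drop the leading ">"
            len = 0
            next
        }
        { len += length($0) }                     # add up the sequence lines
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

For example, `fasta_sizes ../hg19/chr1.fa` would print a single line for chr1; running it on a whole-genome FASTA would give one line per sequence.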

                      The difference between Scripture and Cufflinks is in how they reconstruct the transcriptome. Read the papers linked on the official sites. Each tool also has its pluses and minuses. Cufflinks can handle the whole genome, differential analysis, and basic stats, but it applies aggressive filtering (and more that I don't know of). Scripture does not filter as aggressively, but Scripture pipelines are a bit difficult to manage.

                      You cannot run Scripture on the whole genome in one go. What you can do is write a bash script that runs it for each chromosome, which automates running the same command over and over again. Then concatenate all the .bed files, sort the result, and convert it to a .gtf file.
                      Attached Files
                      Last edited by Joker!sAce; 08-17-2011, 04:47 AM.
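A rough sketch of the per-chromosome loop described in this post is shown here. The Scripture options are the ones from the command in the first post; the function names, the `.segment` output naming, and the default paths are assumptions to adapt to your own layout. The first function only prints the commands so they can be reviewed before anything is run.

```shell
# Sketch only: the defaults below are copied from the first post and are
# assumptions about your layout; override them in the environment as needed.
ALIGNMENT=${ALIGNMENT:-all_tophat_alignments.sorted.sam}
PAIRED=${PAIRED:-SRR065506.scripturePaired.sorted.sam}
GENOME_DIR=${GENOME_DIR:-../hg19}

# Print one Scripture command per chromosome named in a size file
# (column 1 = chromosome name, column 2 = length).
scripture_commands() {
    size_file=$1
    out_dir=$2
    cut -f1 "$size_file" | while read -r chr; do
        printf 'java -jar scripture.jar -alignment %s -out %s/%s.segment -sizeFile %s -chr %s -chrSequence %s/%s.fa -pairedEnd %s\n' \
            "$ALIGNMENT" "$out_dir" "$chr" "$size_file" "$chr" "$GENOME_DIR" "$chr" "$PAIRED"
    done
}

# Concatenate the per-chromosome BED output and sort it by chromosome name,
# then numerically by start coordinate.
merge_bed() {
    cat "$1"/*.segment | sort -k1,1 -k2,2n
}
```

One possible workflow: `scripture_commands ../hg19/hg19.chrom.sizes2 scriptureResults > run_all.sh`, check the file, run it with `sh run_all.sh`, then `merge_bed scriptureResults > all_chromosomes.bed`. The BED to GTF conversion is not covered by this sketch.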



                      • #12
                        The issue from the first post was caused by creating the SAM files with the "-h" option, which writes the header into the output. The headers were then not removed completely, so leftover header lines ended up in the merged files.
                        ...
                        Talking from experience here.

                        Cheers,

                        Nenad
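To illustrate the point about headers: `sed '1,2d'` removes only the first two lines, whereas a SAM header has one @SQ line per reference sequence, so most of it survives. A minimal sketch of a more robust cleanup is to drop every line that starts with "@" (SAM read names cannot begin with that character). The function name `strip_sam_header` is hypothetical. Alternatively, running `samtools view` without `-h` writes no header in the first place.

```shell
# Remove all SAM header lines (those starting with '@'), however many
# there are, and sort the remaining alignment records.
strip_sam_header() {
    grep -v '^@' "$1" | sort
}
```

For example: `strip_sam_header tophat_out_SRR065506_1/accepted_hits.sam > tophat_out_SRR065506_1/accepted_hits.sorted.sam`.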



                        • #13
                          Hi all,

                          I am following the guidelines in the Scripture walk-through example, but the command

                          java -Xmx4000m -jar scripture.jar -task makePairedFile -pair1 tophat_out_SRR039999_1/accepted_hits.sorted.sam -pair2 tophat_out_SRR039999_2/accepted_hits.sorted.sam -out SRR039999.paired.sam -sorted

                          is not giving me any output. Can anyone help me out?

                          Thank you in advance
                          Deepak



                          • #14
                            Originally posted by upendra_35
                            Hi,
                            I guess the workshop material for the Scripture analysis was run on an old version (alpha), while the version currently available for download from the Broad Institute is newer (beta). You can try the beta version and see if it works; otherwise I will send you the old version.
                            Hi, I want to use the old version of Scripture to convert a BED file to a GFF file, because the beta version cannot do that. My email is [email protected] Would you please send me a copy of the old version?
