  • Cuffdiff won't open TopHat accepted_hits.bam

    I'm following the Tuxedo protocol with 100-cycle, single-directional Illumina HiSeq 2000 data. I used TopHat and Bowtie to align my reads to the Arabidopsis genome, then used Cufflinks to search for splice variants and Cuffmerge to merge the resulting assemblies. I now want to use Cuffdiff to examine differential gene and isoform expression, but I get the following error message:

    File ./home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM... Error: cannot open alignment file

    This implies that it can't open the alignment files, but I don't know why. When I examined the BAM files produced by TopHat, I noticed their icons look like they have been zipped (right-click > Properties also supports this theory); however, I can't unzip them with gunzip, using either the command line or the GUI.
    Any ideas?
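A BAM file is compressed with BGZF, a block-based variant of gzip, so it starts with the gzip magic bytes 1f 8b and file managers often show it as an archive. The bash sketch below uses two hypothetical helpers ('gzip_magic' and 'bam_magic', not part of samtools) to check the raw magic bytes and the first decompressed bytes, which for a BAM file should read 'BAM'. It uses 'gzip -dc' because plain 'gunzip' typically refuses a file that lacks a .gz suffix.

```bash
# Print the first two bytes of a file in hex; gzip and BGZF files give "1f8b".
gzip_magic() {
    head -c 2 "$1" | od -An -v -tx1 | tr -d ' \n'
}

# Print the first three bytes after decompression; a BAM file gives "BAM".
# gzip -dc decompresses regardless of the .bam suffix.
bam_magic() {
    gzip -dc "$1" 2>/dev/null | head -c 3
}

# Example: gzip_magic accepted_hits.bam; bam_magic accepted_hits.bam
```

If both checks pass, the "zipped" appearance is expected and not a sign of a broken file.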

  • #2
    Can you read the BAM file with 'samtools'? If not, then the file is indeed not valid.

    Code:
    samtools view  ./home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam | more



    • #3
      Thanks for the advice. When I used 'samtools view' the file seemed to open fine, and there appears to be normal sequence information. However, there's also no tx at the beginning; I have to scroll down to see the sequence info. Is that normal?
      So I assume the file is fine and the zipped appearance may be a red herring. Any other suggestions?



      • #4
        I am not sure what you mean by 'tx at the beginning'. When I look at a recently generated accepted_hits.bam file then I get the reads right at the start.

        Aside from that I have no concrete suggestions. Perhaps you can give us your full cuffdiff command line? That could help in debugging.



        • #5
          My cuffdiff command line is:

          cuffdiff -o cuffdiff_0 -b TAIR10_chr_all.fas -p 8 -L 9-0GS,10-0MS -u /home/richard/RNA_seq_analysis/run296_cuffmerge/run296_merged_asm/merged.gtf ./home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam ./home/richard/RNA_seq_analysis/run296_tophat/10-0MS_AGTTCC_run296_tophat/accepted_hits.bam

          What I mean by 'no tx at the beginning' is that the window produced by 'samtools view' is blank unless I scroll down; about 20% of the way down, text containing the sequencing information begins.



          • #6
            I tried using samtools to convert my BAM to SAM (samtools view -h -o 9-0GS_run296.sam accepted_hits.bam) to see if that makes any difference, and got the following error:
            [bam_header_read] EOF marker is absent. The input is probably truncated.
            Why is there no EOF marker? I used TopHat with normal/basic settings.
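The warning refers to the empty 28-byte BGZF block that the SAM/BAM specification says should end every BAM file; a file without it was most likely cut short, for example by an interrupted or still-running write. The bash sketch below (the helper 'has_bgzf_eof' is hypothetical, not a samtools command) compares the last 28 bytes of a file with that marker using only standard tools.

```bash
# The 28-byte BGZF end-of-file block from the SAM/BAM specification, in hex.
BGZF_EOF=1f8b08040000000000ff0600424302001b0003000000000000000000

# Succeeds (exit status 0) only if the file ends with the BGZF EOF block.
has_bgzf_eof() {
    [ "$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')" = "$BGZF_EOF" ]
}

# Example: has_bgzf_eof accepted_hits.bam && echo present || echo missing
```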



            • #7
              I am thinking that your file name is incorrect. You have a dot before '/home'. Can you do a
              Code:
              ls ./home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam
              versus
              Code:
              ls /home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam
              Last edited by westerman; 08-14-2012, 07:51 AM. Reason: Forgot 'ls' in second code section
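The two commands differ because a path beginning with './' is resolved relative to the current working directory, whereas a path beginning with '/' is absolute. Run from a directory such as ~/RNA_seq_analysis/run297_cuffdiff, './home/richard/...' would be looked up under ~/RNA_seq_analysis/run297_cuffdiff/home/richard/..., which presumably does not exist. The hypothetical bash helper 'check_path' below reports how a path will be interpreted and whether it exists.

```bash
# Report whether a path is absolute or relative, and whether it exists.
check_path() {
    local kind
    case "$1" in
        /*) kind="absolute" ;;
        *)  kind="relative to $PWD" ;;
    esac
    if [ -e "$1" ]; then
        echo "$kind: found"
    else
        echo "$kind: missing"
    fi
}

# Example: check_path ./home/richard/RNA_seq_analysis   (note the leading dot)
```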



              • #8
                Thanks, I think that helped, as I now have a different issue.
                I now seem to have a problem with the number of labels vs. samples (see below).
                I'm trying to compare 2 files/treatments (for my first run, at least).

                richard@ubuntu:~/RNA_seq_analysis/run297_cuffdiff$ cuffdiff -o cuffdiff_0 -b TAIR10_chr_all.fas -p 8 -L 9-0GS,10-0MS -u /home/richard/RNA_seq_analysis/run296_cuffmerge/run296_merged_asm/merged.gtf ls ./home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam ls ./home/richard/RNA_seq_analysis/run296_tophat/10-0MS_AGTTCC_run296_tophat/accepted_hits.bam
                You are using Cufflinks v2.0.2, which is the most recent release.
                Error: number of labels must match number of conditions
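cuffdiff expects the merged GTF file followed by one argument per condition (replicates of one condition joined with commas), and '-L' takes one comma-separated label per condition. In the command above, the two stray 'ls' words therefore count as conditions, giving four conditions for two labels. The bash sketch below ('check_labels' is a hypothetical pre-flight check, not part of Cufflinks) makes that count explicit.

```bash
# Compare the number of -L labels with the number of condition arguments.
# $1: the comma-separated label string; remaining args: one per condition.
check_labels() {
    local n_labels
    n_labels=$(printf '%s\n' "$1" | awk -F, '{ print NF }')
    shift
    if [ "$n_labels" -eq "$#" ]; then
        echo "ok: $n_labels labels, $# conditions"
    else
        echo "mismatch: $n_labels labels, $# conditions"
        return 1
    fi
}

# Example: check_labels 9-0GS,10-0MS ls a.bam ls b.bam
# prints "mismatch: 2 labels, 4 conditions"
```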



                • #9
                  I adjusted the command after looking at other posts about the number of conditions, which brought me back to an invalid BAM file.

                  richard@ubuntu:~/RNA_seq_analysis/run297_cuffdiff$ cuffdiff -o cuffdiff_0 -b TAIR10_chr_all.fas -p 8 -L 9-0GS,10-0MS -u /home/richard/RNA_seq_analysis/run296_cuffmerge/run296_merged_asm/merged.gtf ls ./home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam,./home/richard/RNA_seq_analysis/run296_tophat/10-0MS_AGTTCC_run296_tophat/accepted_hits.bam
                  You are using Cufflinks v2.0.2, which is the most recent release.
                  open: No such file or directory
                  File ls doesn't appear to be a valid BAM file, trying SAM...
                  Error: cannot open alignment file ls for reading



                  • #10
                    I still think that your files are named incorrectly. A dot in front of '/home' will almost never give the right file name. Plus, you have now put an 'ls' in there, which is obviously not a file name either. In my previous post I simply wanted you to run the 'ls' command (or 'dir' on Windows, but I suspect you are running Linux or MacOS) to prove to me that ./home/etc. was the correct syntax for your files. The complete command line that you should run is as follows. Ignore any weird spacing that the forum may introduce.


                    Code:
                    cuffdiff -o cuffdiff_0 -b TAIR10_chr_all.fas -p 8 -L 9-0GS,10-0MS -u /home/richard/RNA_seq_analysis/run296_cuffmerge/run296_merged_asm/merged.gtf /home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam,/home/richard/RNA_seq_analysis/run296_tophat/10-0MS_AGTTCC_run296_tophat/accepted_hits.bam



                    • #11
                      I removed the 'ls' and the dot before the path and ran the command again. I then tried copying and pasting the command above, just in case I had made a mistake, but got the same response each time. However, this is a new error!

                      richard@ubuntu:~/RNA_seq_analysis/run297_cuffdiff$ cuffdiff -o cuffdiff_0 -b TAIR10_chr_all.fas -p 8 -L 9-0GS,10-0MS -u /home/richard/RNA_seq_analysis/run296_cuffmerge/run296_merged_asm/merged.gtf /home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam,/home/richard/RNA_seq_analysis/run296_tophat/10-0MS_AGTTCC_run296_tophat/accepted_hits.bam
                      You are using Cufflinks v2.0.2, which is the most recent release.
                      Error: cuffdiff requires at least 2 SAM files



                      • #12
                        Well, you do need two different conditions. According to the cuffdiff help, separating your BAM files with commas means the following:

                        Supply replicate SAMs as comma separated lists for each condition
                        Even if 9-0GS and 10-0MS are replicates of the same condition, at least for a test, remove the comma so that they are not treated as replicates.
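To see how cuffdiff groups its inputs, the bash sketch below (the helper 'describe_conditions' is hypothetical, not part of Cufflinks) treats each argument after the GTF file as one condition and counts its comma-separated replicates. With a comma between the two BAM paths it reports a single condition with two replicates, which is why cuffdiff saw only one condition; with a space it reports two conditions.

```bash
# Print one line per condition with its number of comma-separated replicates.
describe_conditions() {
    local i=1 arg n
    for arg in "$@"; do
        n=$(printf '%s\n' "$arg" | awk -F, '{ print NF }')
        echo "condition $i: $n replicate(s)"
        i=$((i + 1))
    done
}

# Example: describe_conditions a.bam,b.bam   vs.   describe_conditions a.bam b.bam
```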



                        • #13
                          I removed the comma between the 2 input files, replaced it with a space, and it now appears to be working! The final command is below. Thanks for your help Rick

                          cuffdiff -o cuffdiff_0 -b TAIR10_chr_all.fas -p 8 -L 9-0GS,10-0MS -u /home/richard/RNA_seq_analysis/run296_cuffmerge/run296_merged_asm/merged.gtf /home/richard/RNA_seq_analysis/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam /home/richard/RNA_seq_analysis/run296_tophat/10-0MS_AGTTCC_run296_tophat/accepted_hits.bam
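For reruns, the working command can be kept in a small bash script. The sketch below is one possible layout, not part of the Tuxedo protocol: the variable names are hypothetical, the paths are copied from the command above, and the script only checks that the inputs exist and prints the command (a dry run). Keeping each condition's BAM path in its own array element makes it harder to join two conditions with a comma by accident.

```bash
base=/home/richard/RNA_seq_analysis
gtf=$base/run296_cuffmerge/run296_merged_asm/merged.gtf
bam_9_0GS=$base/run296_tophat/9-0GS_AGTCAA_run296_tophat/accepted_hits.bam
bam_10_0MS=$base/run296_tophat/10-0MS_AGTTCC_run296_tophat/accepted_hits.bam

# One array element per argument: the GTF, then one BAM per condition.
cmd=(cuffdiff -o cuffdiff_0 -b TAIR10_chr_all.fas -p 8 -L 9-0GS,10-0MS -u
     "$gtf" "$bam_9_0GS" "$bam_10_0MS")

# Warn about inputs that cannot be found (absolute paths, no leading dot).
for f in "$gtf" "$bam_9_0GS" "$bam_10_0MS"; do
    [ -e "$f" ] || echo "warning: missing $f" >&2
done

# Dry run: print the command; replace these two lines with "${cmd[@]}" to run it.
printf '%s ' "${cmd[@]}"
echo
```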
