  • Viewing Tophat results in IGV

    Hello everybody,

    I have 120 sequence read files (76 bp Illumina reads) that I have converted to FASTQ format. I wish to align them to the human reference genome with TopHat and then view the alignments in the Integrative Genomics Viewer (IGV).

    I have access to a high-performance computing facility, so I was going to submit my 120 FASTQ files separately (so they can run in parallel) and have them write to separate folders. If I do this, will I be able to simply concatenate the 3 main output files from each TopHat run (i.e. junctions.bed, coverage.wig and accepted_hits.sam) to get 3 master output files for all 120 runs?

    As a test, before I run all 120 files, I have run a single FASTQ file. I have then used SAMtools to:
    • Convert .sam to .bam [samtools view -bt hg18.fa.fai accepted_hits.sam > accepted_hits.bam]
    • Sort my .bam file [samtools sort accepted_hits.bam accepted_hits.sorted]
    • Create a bam index (.bai) file [samtools index accepted_hits.sorted.bam]
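The three steps above use the older samtools 0.1.x syntax, where `sort` takes an output prefix. As a rough sketch only, assuming a samtools 1.x install (where `sort` writes to the file given with `-o`), the same conversion could be wrapped like this. The function name `sam_to_sorted_bam` is made up for illustration; the file names are the ones from the post.

```shell
# Hypothetical helper: convert a TopHat SAM file to a sorted, indexed BAM.
# Assumes samtools 1.x; 0.1.x releases took an output prefix for `sort`.
sam_to_sorted_bam() {
    local sam="$1"               # e.g. accepted_hits.sam
    local fai="$2"               # e.g. hg18.fa.fai, used when the SAM has no @SQ lines
    local prefix="${sam%.sam}"   # strip the .sam extension

    # Each step runs only if the previous one succeeded.
    samtools view -b -t "$fai" -o "${prefix}.bam" "$sam" &&
    samtools sort -o "${prefix}.sorted.bam" "${prefix}.bam" &&
    samtools index "${prefix}.sorted.bam"
}
```

Calling `sam_to_sorted_bam accepted_hits.sam hg18.fa.fai` would leave `accepted_hits.sorted.bam` and its `.bai` index next to the input file.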


    Finally, I opened IGV and loaded hg18 before loading accepted_hits.sorted.bam. The viewer informs me that it has located the .bai file and loaded it automatically, but I see no alignments in the genome viewer. I have tried zooming in on regions where, I believe, the .sam file is telling me there should be an alignment, but I still see nothing.

    Any help would be much appreciated. I have looked for 2 days to try and find the answer so I'm really sorry if I've missed a relevant post but I'm at my wit's end.

    Cheers
    Last edited by SEQquestions; 02-02-2010, 11:26 PM.

  • #2
    concatenating

    If I do this, will I be able to simply concatenate the 3 main output files from each Tophat run (i.e. junctions.bed, coverage.wig and accepted_hits.sam) to gain 3 master output files for all 120 runs?
    I suspect you might have a problem concatenating the wig files, because I don't think you can have overlapping regions and they have to be sorted.

    I think you should be fine concatenating the sam files and bed files, but take out the bed file's first line.
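A minimal sketch of the concatenation suggested above, assuming each run wrote its files to its own folder. Both function names are hypothetical. The BED helper keeps the `track` line from the first file only; the SAM helper does the same for header lines (those starting with `@`), on the assumption that every run used the same reference.

```shell
# Hypothetical helper: concatenate per-run junctions.bed files, keeping the
# "track" header line from the first file only.
merge_junction_beds() {
    awk 'FNR == 1 && /^track/ { if (NR == 1) print; next } { print }' "$@"
}

# Hypothetical helper: concatenate SAM files, keeping header lines (starting
# with "@") from the first file only. Assumes identical headers across runs.
merge_sams() {
    awk 'FNR == NR || !/^@/' "$@"
}
```

For example, `merge_junction_beds run_*/junctions.bed > all_junctions.bed` (the `run_*` folder names are an assumption). The merged output is not sorted, and a junction found in several runs appears once per run rather than being combined. For sorted BAM files, `samtools merge out.bam in1.bam in2.bam` is an alternative to text concatenation.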



    • #3
      Thank you, mgogol. I have been able to locate my alignments now, so I will try concatenating as you suggest.

      Regards
      SEQquestions



      • #4
        I'm having trouble with the exact same thing, but I don't have any .wig files (I'm guessing Tophat has been updated since this last post). I sorted my accepted_hits.bam file, and then created an index, and I put these two files into a folder. I then loaded IGV, and loaded the sorted accepted_hits.bam file, and it loads, but there is nothing there. I've selected the right genome (hg19). Any help would be greatly, greatly appreciated.



        • #5
          Did you use the same genome for the alignment and the visualization?



          • #6
            I would look into three aspects

            1. Whether the genome file you used to map your reads is the same genome file you have uploaded into IGV.

            2. TopHat can give you output directly in BAM format (1.4.0), so why are you using an additional step to convert from SAM to BAM?

            3. Try sorting your BAM files with Picard tools:
            java -Xmx10000m -jar picard-tools-1.58/SortSam.jar I=accepted_hits.bam O=sorted.bam SO=coordinate

            Hope one of these solves your problem.
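Point 1 can be checked from the command line. As a sketch under stated assumptions: the helper below compares the reference names in a BAM header (saved with `samtools view -H`) against the first column of the `.fai` index of the genome FASTA. The function name and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: list reference names that appear in a BAM header but
# not in the FASTA index of the genome loaded in the viewer.
# $1: text file holding the output of `samtools view -H sorted.bam`
# $2: .fai index of the genome FASTA (first column = sequence name)
names_missing_from_genome() {
    awk -F '\t' '
        NR == FNR { known[$1] = 1; next }   # first file read: the .fai
        /^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)    # text after "SN:"
                    if (!(name in known)) print name
                }
        }
    ' "$2" "$1"
}
```

Usage would look like `samtools view -H accepted_hits.sorted.bam > header.txt` followed by `names_missing_from_genome header.txt hg19.fa.fai`. Any name it prints (for example `1` where the genome uses `chr1`) is worth checking before looking elsewhere.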



            • #7
              Thanks for the responses. Yes, I used the same genome file (hg19). And I didn't convert from SAM to BAM; that was the original poster. I just took my output file, accepted_hits.bam, and sorted it with SAMtools. Do you think SAMtools is the problem, and that I should use Picard?



              • #8
                SAMtools is a problem only if one has to use Scripture for downstream analysis. I would suggest, though, trying Picard tools once, and keeping your fingers crossed :-).



                • #9
                  Confused about viewing results

                  Hi All,

                  I am getting slightly confused when I view my results from TopHat-Fusion in IGV. Any help will be appreciated!

                  - I downloaded the Bowtie1 index and ran TopHat-Fusion on it.
                  - I converted this index to a FASTA file and loaded it into IGV.
                  - I view my sorted accepted_hits in IGV. However, only very small tracks are shown, even though I know there is high coverage.
                  - Also, I cannot see the gene names (as there is no annotation file).

                  It might be my error in understanding the basics. However, ideally, I would like to:
                  - Align reads using TopHat-Fusion (which also gives candidates for fusion genes). I am assuming I would be able to see all the alignments (and maybe I have to search manually for the fusion regions).
                  - View them in IGV with the ideogram and gene tracks (maybe using the default genome that is already loaded).

                  It seems very simple. How do I achieve this?

                  Thanks a lot,
                  K



                  • #10
                    Originally posted by billstevens
                    Thanks for the responses. Yes, I used the same genome file (hg19). And I didn't convert from SAM to BAM; that was the original poster. I just took my output file, accepted_hits.bam, and sorted it with SAMtools. Do you think SAMtools is the problem, and that I should use Picard?
                    I was originally having this problem as well. Go to File->run igvtools and run a count of your .bam files against the genome that you are currently working with (it will automatically load the genome for you). igvtools will then write a .tdf file. When you right-click on your .bam track, you can load this coverage data and get both the alignment histograms and the specific gene alignments from your .bam file. Some of my .bam files wouldn't show anything until this step. I hope this helps.
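The same count can be run outside the GUI. This is only a sketch, assuming the `igvtools` command-line script is on the `PATH` and takes the input file, output file and genome in that order; the wrapper name and the output naming are made up for illustration.

```shell
# Hypothetical wrapper around `igvtools count`.
# $1: sorted, indexed BAM   $2: genome ID or .genome file, e.g. hg19
make_coverage_tdf() {
    local bam="$1"
    local genome="$2"
    # Write the coverage track next to the BAM, swapping .bam for .tdf.
    igvtools count "$bam" "${bam%.bam}.tdf" "$genome"
}
```

For example, `make_coverage_tdf accepted_hits.sorted.bam hg19` would write `accepted_hits.sorted.tdf`, which can then be loaded as the coverage track described above.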



                    • #11
                      Hi guys,

                      I really need some advice on this. I first ran three conditions on one lane. I obtained very good results, and I followed up by running two replicates of each condition on one lane. However, I have been looking at the results, and something is very off. I've attached a snapshot of the file in IGV. My WT Samples 2 and 3 and Control Sample 3 are wildly different from everything else. These three conditions are very similar, so I really shouldn't be able to detect much difference at all between them in IGV, but on every chromosome, these three samples are very different from the other six. Additionally, Samples 2 and 3 were extracted from different experiments a week apart, but both Samples 2 and 3 of the WT and my Control Sample 3 are very similar to each other and very different from everything else.

                      Is such variability common? If it is, then why do these three look so similar to each other? I emailed the people who ran it and they said it doesn't seem bad to them, but to me, it looks ridiculous. Any thoughts would be very, very, very appreciated!


                      • #12
                        There's not much you can really tell from such a broad view. Do some scatterplots and linear modeling between samples, then you'll get a better idea of the variability.
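A numeric companion to the scatterplots suggested above, as a sketch only: the Pearson correlation of log2(x + 1) expression values between two samples. It assumes a tab-delimited table with one header row and one column per sample; that layout, the function name and the file name in the usage line are all assumptions rather than anything from the thread.

```shell
# Hypothetical helper: Pearson correlation of log2(x + 1) values between two
# columns of a tab-delimited expression table with one header row.
# $1: table file   $2, $3: 1-based column numbers of the two samples
sample_correlation() {
    awk -F '\t' -v a="$2" -v b="$3" '
        NR > 1 {
            x = log($a + 1) / log(2)
            y = log($b + 1) / log(2)
            n++; sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            num = n * sxy - sx * sy
            den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
            if (den == 0) { print "NA"; exit }
            printf "%.3f\n", num / den
        }
    ' "$1"
}
```

Running `sample_correlation expression.tsv 2 3` for each pair of samples gives a quick table of which replicates agree. It does not replace looking at the plots, since a single correlation value hides patterns such as points piled along one axis.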



                        • #13
                          Thanks, yes, I should have posted that too. Take a look. This is not between samples (I'll run that next), but between the conditions.

                          See that branching?


                          • #14
                            Hi guys,

                            So I've done the scatterplots between the samples. Please, please take a look.

                            WT is the wild type, LuxS is the mutant, and RPMI is the control. Also, I've attached what the samples from my first run looked like compared to each other. The first run is what I was expecting for everything else, but as you can see, RPMI3 seems really off from the other two RPMIs (both have the same shift). Also, WT2 and WT3 seem really different from WT1 (lots of dots along the axis), although they are very similar to each other.

                            What do you guys think? My first run showed such good agreement between the different conditions. This is my first time using RNA-Seq so I don't have a feel for what seems normal and not. I'd really love anyone sharing what they think.

                            The csDensity plots all look exactly the same, for whatever that's worth.

                            Thanks so much!


                            • #15
                              Unless you have some idea of what to expect from the different conditions, it's hard to be sure if what you are seeing is normal variability or represents some problem with the samples. Just based on the scatterplots though, there's nothing that would make me very concerned. You could try the analysis with and without the odd samples and see which matches better to your expectations or some sort of additional confirmation like qPCR.
