SEQanswers

Old 02-02-2010, 03:42 AM   #1
SEQquestions
Member
 
Location: Leeds

Join Date: Jan 2010
Posts: 13
Default Viewing Tophat results in IGV

Hello everybody,

I have 120 sequence read files (76 bp Illumina reads) that I have converted to FASTQ format. I want to align them to the human reference genome with TopHat and then view the alignments in the Integrative Genomics Viewer (IGV).

I have access to a high-performance computing facility, so I was going to submit my 120 FASTQ files separately (so they can run in parallel) and have them write to separate folders. If I do this, will I be able to simply concatenate the three main output files from each TopHat run (i.e. junctions.bed, coverage.wig and accepted_hits.sam) to get three master output files for all 120 runs?

As a test, before I run all 120 files, I have run a single FASTQ file. I then used SAMtools to:
  • Convert .sam to .bam [samtools view -bt hg18.fa.fai accepted_hits.sam > accepted_hits.bam]
  • Sort my .bam file [samtools sort accepted_hits.bam accepted_hits.sorted]
  • Create a bam index (.bai) file [samtools index accepted_hits.sorted.bam]

Finally, I opened IGV and loaded hg18 before loading accepted_hits.sorted.bam. The viewer informs me that it has located the .bai file and loaded it automatically, but I then see no alignments in the genome viewer. I have tried zooming in on regions where, I believe, the .sam file is telling me there should be an alignment, but I still see nothing.
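
One way to find out where to look in IGV is to summarise the SAM file itself. The following is a minimal sketch, not part of the original post: the function name `sam_alignment_summary` is made up for illustration, and it only assumes the standard SAM layout (reference name in column 3, leftmost position in column 4).

```shell
# Sketch: summarise where the alignments in a SAM file actually are.
# Prints one line per reference sequence:
#   name <TAB> aligned records <TAB> smallest start <TAB> largest start
# $1: path to a SAM text file
sam_alignment_summary() {
    awk -F'\t' '
        /^@/      { next }            # skip header lines
        $3 == "*" { next }            # skip unmapped records
        {
            pos = $4 + 0              # force a numeric comparison
            n[$3]++
            if (!($3 in lo) || pos < lo[$3]) lo[$3] = pos
            if (!($3 in hi) || pos > hi[$3]) hi[$3] = pos
        }
        END {
            for (c in n) printf "%s\t%d\t%d\t%d\n", c, n[c], lo[c], hi[c]
        }
    ' "$1" | LC_ALL=C sort
}
```

Running `sam_alignment_summary accepted_hits.sam` gives a reference name and a coordinate range that can be typed into the IGV locus box (for example `chr1:50-100`), which avoids guessing where the reads are.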

Any help would be much appreciated. I have looked for 2 days to try and find the answer so I'm really sorry if I've missed a relevant post but I'm at my wit's end.

Cheers

Last edited by SEQquestions; 02-02-2010 at 11:26 PM.
Old 02-02-2010, 12:26 PM   #2
mgogol
Senior Member
 
Location: Kansas City

Join Date: Mar 2008
Posts: 197
Default concatenating

Quote:
If I do this, will I be able to simply concatenate the 3 main output files from each Tophat run (i.e. junctions.bed, coverage.wig and accepted_hits.sam) to gain 3 master output files for all 120 runs?
I suspect you might have a problem concatenating the wig files, because I don't think you can have overlapping regions and they have to be sorted.

I think you should be fine concatenating the sam files and bed files, but take out the first line (the track line) of each bed file.
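
The concatenation described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested TopHat recipe: the function names are made up, every BED line starting with `track` is dropped, and SAM header lines (starting with `@`) are kept from the first file only, which assumes all runs were aligned against the same reference.

```shell
# Sketch: merge per-run TopHat outputs. File paths are passed as arguments.

# Concatenate BED files, dropping every "track" line.
merge_beds() {
    awk '!/^track/' "$@"
}

# Concatenate SAM files, keeping header lines from the first file only.
merge_sams() {
    awk '
        FNR == 1 { file++ }                       # count input files
        /^@/     { if (file == 1) print; next }   # header: first file only
                 { print }                        # alignment records: always
    ' "$@"
}
```

For example, `merge_sams run*/accepted_hits.sam > all_hits.sam`. The merged SAM is not coordinate-sorted, so it still needs the sort and index steps before IGV can display it.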
Old 02-02-2010, 11:27 PM   #3
SEQquestions
Member
 
Location: Leeds

Join Date: Jan 2010
Posts: 13
Default

Thank you mgogol. I have been able to locate my alignments now, so I will try concatenating as you suggest.

Regards
SEQquestions
Old 04-02-2012, 03:22 PM   #4
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

I'm having trouble with the exact same thing, but I don't have any .wig files (I'm guessing Tophat has been updated since this last post). I sorted my accepted_hits.bam file, and then created an index, and I put these two files into a folder. I then loaded IGV, and loaded the sorted accepted_hits.bam file, and it loads, but there is nothing there. I've selected the right genome (hg19). Any help would be greatly, greatly appreciated.
Old 04-02-2012, 11:47 PM   #5
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 264
Default

Did you use the same genome for the alignment and the visualization?
Old 04-03-2012, 03:58 AM   #6
swaraj
Member
 
Location: Naples, Italy

Join Date: Feb 2012
Posts: 50
Default

I would look into three things:

1. Whether the genome file you used to map your reads is the same genome file you have loaded into IGV.

2. TopHat can write its output directly in BAM format (version 1.4.0), so why are you using an additional step to convert from SAM to BAM?

3. Try sorting your BAM file with Picard tools:
java -Xmx10000m -jar picard-tools-1.58/SortSam.jar I=accepted_hits.bam O=sorted.bam SO=coordinate

I hope one of these solves your problem.
Old 04-03-2012, 07:08 AM   #7
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

Thanks for the responses. Yes, I used the same genome file (hg19). And I didn't convert from SAM to BAM; that was the original poster. I just took my output file, accepted_hits.bam, and sorted it with SAMtools. Do you think SAMtools is the problem, and that I should use Picard instead?
Old 04-03-2012, 07:58 AM   #8
swaraj
Member
 
Location: Naples, Italy

Join Date: Feb 2012
Posts: 50
Default

SAMtools is a problem only if you have to use Scripture for downstream analysis. I would still suggest trying Picard tools once, and keep your fingers crossed :-).
Old 04-20-2012, 08:31 AM   #9
Khanjan
Member
 
Location: Atlanta

Join Date: Jan 2010
Posts: 20
Default Confused about viewing results

Hi All,

I am getting slightly confused when I view my results from TopHat-Fusion in IGV. Any help will be appreciated!

- I downloaded the Bowtie1 index and ran TopHat-Fusion on it.
- I converted this index to a FASTA file and loaded it into IGV.
- I view my sorted accepted_hits in IGV. However, only very small tracks are shown, even though I know the coverage is high.
- Also, I cannot see the gene names (as there is no annotation file).

I may be misunderstanding something basic. Ideally, though, I would like to:
- Align reads using TopHat-Fusion (which also gives candidates for fusion genes). I am assuming I would be able to see all the alignments (and maybe I have to search for the fusion regions manually).
- View them in IGV with the ideogram and gene tracks (maybe using the default genome that IGV has loaded).

It seems very simple. How do I achieve this?

Thanks a lot,
K
Old 06-21-2012, 02:58 PM   #10
wmyashar
Junior Member
 
Location: San Diego

Join Date: Jun 2012
Posts: 4
Default

Quote:
Originally Posted by billstevens View Post
Thanks for the responses. Yes, I used the same genome file (hg19). And I didn't convert from sam to bam, that was the original poster. I just took my output file, acceptedhits.bam, and sorted with SAMtools. Do you think Samtools is the problem, and I should use Picard?
I originally had this problem as well. Go to File->run igvtools and run a count of your .bam files against the genome you are currently working with (it will load the genome for you automatically). igvtools will then produce a .tdf file. When you right-click on your .bam track, you can load this coverage data and get both the alignment histograms and the specific gene alignments from your .bam file. Some of my .bam files wouldn't show anything until this step. I hope this helps.
Old 07-05-2012, 09:04 PM   #11
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

Hi guys,

I really need some advice on this. I first ran three conditions on one lane and obtained very good results, so I followed up by running two replicates of each condition on one lane. However, I have been looking at the results, and something is very off. I've attached a snapshot of the file in IGV. My WT Samples 2 and 3 and Control Sample 3 are wildly different from everything else. These three conditions are very similar, so I really shouldn't be able to detect much difference at all between them in IGV, but in every chromosome these three samples are very different from the other six. Additionally, Samples 2 and 3 were extracted from different experiments a week apart, but both Samples 2 and 3 of the WT and my Control Sample 3 are very similar to each other and very different from everything else.

Is such variability common? If it is, then why do these three look so similar to each other? I emailed the people who ran it and they said it doesn't seem bad to them, but to me it looks ridiculous. Any thoughts would be very, very, very appreciated!
Attached Images
File Type: jpeg IGV Snapshot.jpeg (141.8 KB, 78 views)
Old 07-06-2012, 04:58 AM   #12
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

There's not much you can really tell from such a broad view. Do some scatterplots and linear modeling between samples, then you'll get a better idea of the variability.
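
As a rough starting point for comparing samples numerically, a pairwise correlation can be computed on the command line. This is only a sketch under assumptions: the input is a hypothetical tab-separated table with a header row, a gene identifier in column 1 and one expression column per sample, and the values are compared as log2(value + 1). The function name `sample_correlation` is made up for illustration.

```shell
# Sketch: Pearson correlation between two samples on log2(value + 1).
#   $1: tab-separated expression table with a header row (assumed layout)
#   $2, $3: column numbers of the two samples to compare
sample_correlation() {
    awk -F'\t' -v a="$2" -v b="$3" '
        NR == 1 { next }                          # skip the header row
        {
            x = log($a + 1) / log(2)
            y = log($b + 1) / log(2)
            n++; sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            v = (n * sxx - sx * sx) * (n * syy - sy * sy)
            if (n < 2 || v <= 0) { print "NA"; exit }
            printf "%.4f\n", (n * sxy - sx * sy) / sqrt(v)
        }
    ' "$1"
}
```

For example, `sample_correlation fpkm_table.tsv 2 3` compares the samples in columns 2 and 3. Replicates of the same condition would normally be expected to correlate more strongly with each other than with other conditions, although a single number is no substitute for looking at the scatterplots.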
Old 07-06-2012, 09:10 AM   #13
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

Thanks, yes, I should have posted that too. Take a look. This is not between samples (I'll run that next), but between the conditions.

See that branching?
Attached Images
File Type: bmp WT_LuxS_Scatter_Altered.bmp (107.7 KB, 47 views)
Old 07-16-2012, 01:28 PM   #14
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

Hi guys,

So I've done the scatterplots between the samples. Please, please take a look.

WT is the wild type, LuxS is the mutant, and RPMI is the control. I've also attached what the samples from my first run looked like compared to each other. The first run is what I was expecting for everything else, but as you can see, RPMI3 seems really off from the other two RPMIs (both have the same shift). Also, WT2 and WT3 seem really different from WT1 (lots of dots along the axis), although they are very similar to each other.

What do you guys think? My first run showed such good agreement between the different conditions. This is my first time using RNA-Seq, so I don't have a feel for what seems normal and what doesn't. I'd really love for anyone to share what they think.

The csDensity plots all look exactly the same, for whatever that's worth.

Thanks so much!
Attached Images
File Type: jpg WildType.jpg (76.2 KB, 16 views)
File Type: jpg LuxS.jpg (70.8 KB, 11 views)
File Type: jpg RPMI.jpg (87.9 KB, 10 views)
File Type: jpg FirstRun.jpg (62.8 KB, 11 views)
Old 07-17-2012, 05:25 AM   #15
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

Unless you have some idea of what to expect from the different conditions, it's hard to be sure if what you are seeing is normal variability or represents some problem with the samples. Just based on the scatterplots though, there's nothing that would make me very concerned. You could try the analysis with and without the odd samples and see which matches better to your expectations or some sort of additional confirmation like qPCR.
Old 07-18-2012, 12:26 PM   #16
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

Thanks, I'll give that a try.
Old 10-31-2012, 02:04 AM   #17
jiewencai
Junior Member
 
Location: China

Join Date: Oct 2012
Posts: 3
Default

It's so hard for me to understand those answers.
Old 10-17-2014, 10:10 AM   #18
Gonza
Member
 
Location: Ithaca, NY

Join Date: Mar 2013
Posts: 78
Default

Hi All,
I created a .bai index for my .bam file using
$ samtools index test_sorted.bam test_sorted.bai (and now I have both in the same folder). When I load the .bam file into IGV it seems to work, but I see no alignment of the reads against the genome (I do see the genome in blue).
I am using the TAIR10 genome.

What could this be? Any ideas?
Many thanks
G

Last edited by Gonza; 10-17-2014 at 10:14 AM.
Old 10-17-2014, 10:39 AM   #19
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,079
Default

Chromosome names need to match between your BAM and the genome in IGV (I assume you are using the built-in genome) for IGV to be able to display your data. The other possibility is that you need to zoom in or out, or scroll sideways, depending on where the IGV display currently is, to see the mapped reads.
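
The name check can be sketched on the command line. This is an illustration under assumptions: the BAM header has been saved to a text file (for example with `samtools view -H test_sorted.bam > header.txt`), and a FASTA index (.fai) is available for the same genome that is loaded in IGV. The function name `names_missing_from_genome` is made up.

```shell
# Sketch: list reference names that appear in a BAM/SAM header but not in
# the genome index. Both inputs are plain text files.
#   $1: header text containing @SQ lines
#   $2: FASTA index (.fai) of the genome loaded in IGV
names_missing_from_genome() {
    awk -F'\t' '
        NR == FNR { genome[$1] = 1; next }   # first file (.fai): column 1 is the name
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)     # strip the "SN:" prefix
                    if (!(name in genome)) print name
                }
        }
    ' "$2" "$1"
}
```

No output means every name in the header also exists in the genome index. Names that are printed (for example `2` when the genome uses `chr2`) are candidates for the kind of mismatch described above.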
Old 10-17-2014, 12:12 PM   #20
Gonza
Member
 
Location: Ithaca, NY

Join Date: Mar 2013
Posts: 78
Default

Thanks, I just realized that you need to start scrolling to the side to see where the reads are. It wasn't really intuitive... Sorry for the silly question.
Tags
.bai, igv, tophat, viewer

All times are GMT -8.