**Heisman** · 03-29-2012, 04:50 AM

Originally posted by Fedster View Post

Hi,

I am trying to use bwa and samtools to see if in the CEU transcriptome there are multiple splice variants of a couple of genes.

I got the CEU transcriptome from here:

http://jungle.unige.ch/rnaseq_CEU60/

(I used the fastq files), and I went on to use bwa to

1) align the fastq reads to my reference file with the possible splices
2) create a SAM file from (1)

I have now converted the SAM files to BAM files, and sorted and indexed them.

What I want to do now is the following: see a (m)pileup of my reference splices against the bam files (161 bam files in total).

My problem is that I am running everything on a cluster, so I cannot run samtools mpileup to give me an interactive view of the alignment.

What I'd like to do is to get out of samtools some output text file that tells me, for every bam file, if there is anything aligning to my splices, and some sort of read depth/other quality score/p-value/whatever.

Any idea on how to do that? I am running out of ideas (I self taught bwa and samtools in the last 3 days, so I feel I am running out of my intuition).

You can generate a consensus like this:

Code:

/samtools-0.1.18/samtools mpileup -q 5 -Q 15 -l [Interval_File] -uABf [reference_sequence.fa] [aligned_file.bam] | /samtools-0.1.18/bcftools/bcftools view -bcg - > [intermediate_file.bcf] &

/samtools-0.1.18/bcftools/bcftools view [intermediate_file.bcf] > [consensus.txt]

(The trailing & runs the first command in the background, so wait for it to finish before running the second one.)

Your [Interval_File] must be in a format where single positions are denoted as:

chr1 3301721
chr1 3313108
chr1 3319339

and intervals are denoted as:

chr1 2985720 2985880
chr1 3102667 3103058
chr1 3160629 3160721
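If the reference is a FASTA file of splice sequences rather than whole chromosomes, the names in the interval file have to match the FASTA headers instead of chr1, chr6 and so on. A minimal sketch for building one interval per reference sequence from the index that `samtools faidx` writes is below. The function name `fai_to_intervals` and the file names are hypothetical, and the start coordinate of 0 assumes three-column lines are read BED-style (0-based start); adjust it if your samtools version behaves differently.

```shell
# Build an interval file with one line per reference sequence.
# Input: a FASTA index (.fai) as written by `samtools faidx`.
# .fai columns: name, length, offset, bases per line, bytes per line.
fai_to_intervals() {
    # Assumption: three-column intervals are BED-style, so the whole
    # sequence is "name <TAB> 0 <TAB> length".
    awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, 0, $2 }' "$1"
}
```

Something like `samtools faidx splices.fa` followed by `fai_to_intervals splices.fa.fai > splices.intervals` would then give a file that could be passed to `-l`.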

**Fedster** · 03-29-2012, 05:14 AM

Originally posted by Heisman View Post

Your [Interval_File] must be in a format where single positions are denoted as:

chr1 3301721
chr1 3313108
chr1 3319339

and intervals are denoted as:

chr1 2985720 2985880
chr1 3102667 3103058
chr1 3160629 3160721

Excellent, thanks! Just a quick question: I have only two genes, but 15 possible splices in total. Should my interval file be

chr6 3301721
chr19 3301721
chr6 2985720 2985880
chr19 2985720 2985880

(the positions/intervals I just used are random), or should it have a position/interval for each splice?

Many thanks!

**Heisman** · 03-29-2012, 05:17 AM

Originally posted by Fedster View Post

Excellent, thanks! Just a quick question: I have only two genes, but 15 possible splices in total. Should my interval file be

chr6 3301721
chr19 3301721
chr6 2985720 2985880
chr19 2985720 2985880

(the positions/intervals I just used are random), or should it have a position/interval for each splice?

Many thanks!

You can do either; using one interval for each splice site region would yield a smaller output file that is a lot easier to look through visually.

**Fedster** · 03-29-2012, 05:59 AM

Originally posted by Heisman View Post

You can do either; using one interval for each splice site region would yield a smaller output file that is a lot easier to look through visually.

Thanks a lot! Final questions: do I need to change the suffix of my fasta file to .fa, and can I run mpileup on all the bam files at once? (I just want to know if, as a population, the CEU show more than one possible splice; I don't care about any specific individual.)

Again, many thanks!

**Heisman** · 03-29-2012, 06:02 AM

Originally posted by Fedster View Post

Thanks a lot! Final questions: do I need to change the suffix of my fasta file to .fa, and can I run mpileup on all the bam files at once? (I just want to know if, as a population, the CEU show more than one possible splice; I don't care about any specific individual.)

Again, many thanks!

You don't need to change the suffix of your files.

You can specify multiple bam files, see this: http://samtools.sourceforge.net/samtools.shtml

Alternatively, you could merge all of the bam files together and then run mpileup on the merged bam file. I don't know if one would be faster than the other.
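For a population-level text summary, one option is to write plain pileup text (mpileup without -u or -g) and collapse it per reference sequence. The sketch below assumes the default text layout, where the first three columns are sequence name, position and reference base, followed by one depth/bases/qualities triple per BAM file; options that add extra columns (such as -s) would break that assumption. The function name `summarize_pileup` is hypothetical.

```shell
# Summarize mpileup text read from stdin. For each reference sequence,
# print: name, number of positions reported, mean depth summed over all BAMs.
summarize_pileup() {
    awk -F'\t' '
        {
            total = 0
            # Depth columns are 4, 7, 10, ... (one triple per BAM file).
            for (i = 4; i <= NF; i += 3) total += $i
            if (!($1 in n)) order[++k] = $1   # remember first-seen order
            n[$1]++
            sum[$1] += total
        }
        END {
            for (j = 1; j <= k; j++) {
                name = order[j]
                printf "%s\t%d\t%.2f\n", name, n[name], sum[name] / n[name]
            }
        }'
}
```

A call along the lines of `samtools mpileup -f splices.fa a.bam b.bam | summarize_pileup > depth_summary.txt` would run non-interactively on a cluster; splice sequences with no aligned reads will most likely be missing from the summary rather than listed with zero depth.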

saving samtools mpileup output from a cluster
