Seqanswers Leaderboard Ad

**k-gun12** · 05-17-2011, 09:24 AM

It turns out that a rather nice one is shipped with EVidence Modeler:

http://evidencemodeler.sourceforge.net/

.. in the utilities directory.

**adarob** · 05-17-2011, 02:16 PM

There is a new utility included with cufflinks now, called "gffread", which should be able to extract spliced transcript sequences from a GFF/GTF. This is now documented on this page (scroll down to the "Extracting transcript sequences" section at the end):

404 Not Found

http://cufflinks.cbcb.umd.edu/gff.html

**Anelda** · 10-25-2011, 01:56 AM

Originally posted by adarob View Post

There is a new utility included with cufflinks now, called "gffread", which should be able to extract spliced transcript sequences from a GFF/GTF. This is now documented on this page (scroll down to the "Extracting transcript sequences" section at the end):

http://cufflinks.cbcb.umd.edu/gff.html

The problem with this script is that it returns the reference DNA sequence (from the genome fasta file used for mapping) for the transcripts and not the assembly from the mapped reads.

Does anyone know of a tool that can give the assembled transcript sequence from the RNA-seq data? Or can someone help me if I'm misunderstanding GFFRead?

Thanks!

**pkerrwall** · 11-23-2011, 09:15 AM

Originally posted by Anelda View Post

The problem with this script is that it returns the reference DNA sequence (from the genome fasta file used for mapping) for the transcripts and not the assembly from the mapped reads.

Does anyone know of a tool that can give the assembled transcript sequence from the RNA-seq data? Or can someone help me if I'm misunderstanding GFFRead?

Here is the process that I am using:

samtools mpileup -uf ref.fa accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq | fq2fa.pl > new_ref.fa
gffread -w transcripts.fa -g new_ref.fa transcripts.gtf

where fq2fa.pl is a bioperl script to convert from fq to fasta

I also have an email into the cufflinks developers to see if there is a way that the gffread utility can be enhanced to get the consensus sequence from the bam file and not the reference genome.

**huangjun** · 12-11-2011, 05:14 PM

the cufflinks only give us the rebuild assembled sequence based on the reference genome, but i want to know how to get unmapped contigs also , only that we could get some new transcripts.

**Wallysb01** · 12-11-2011, 06:09 PM

Originally posted by huangjun View Post

the cufflinks only give us the rebuild assembled sequence based on the reference genome, but i want to know how to get unmapped contigs also , only that we could get some new transcripts.

Its my understanding that nothing tophat/cufflinks will do this. You'll something like trinity, ABySS/trans-ABySS, or Velvet/Oases to do de novo transcriptome assembly. Tophat and cufflinks require the reference sequence.

**kenietz** · 09-12-2012, 11:27 PM

Hi guys,
this thread was really useful for creating my script for extracting consensus sequences from the bam file. I tried to use the new samtools mpileup but it didnt do the job that i wanted. I tried to use vcf-tools's vcf-consensus but it can only extract the whole chromosome. I couldnt make it extract only a region. So went back to samtools-0.1.16 'pileup -c' option. I didnt use the reference sequnce tho cos is more complicated. So i just parse the result of the command:

samtools pileup -c bam.file

I take the transcript.gtf from cufflinks and for every transcript extract the consensus from the pileup. Well this task by it self sounds simple enough but its not. Cos cufflinks doesnt not provide the starts and ends as found in the bam for each found transcript but the start and end found in the annotation. So one has to search for the 'real' coordinates of the transcripts in the pile up first then extractions is easy. Well i have to admit i didnt try to run cufflinks without reference and then see the output. Maybe then transcripts.gtf will contain the true coordinates.

Now my script is working but is slow for big bams cos then the pileup is also big and parsing it takes time. Reading about multi-threading at the moment. Or using PDL's PP programming to make it a bit faster.

**figo1019** · 09-27-2012, 09:02 AM

Originally posted by kenietz View Post

Hi guys,
this thread was really useful for creating my script for extracting consensus sequences from the bam file. I tried to use the new samtools mpileup but it didnt do the job that i wanted. I tried to use vcf-tools's vcf-consensus but it can only extract the whole chromosome. I couldnt make it extract only a region. So went back to samtools-0.1.16 'pileup -c' option. I didnt use the reference sequnce tho cos is more complicated. So i just parse the result of the command:

samtools pileup -c bam.file

I take the transcript.gtf from cufflinks and for every transcript extract the consensus from the pileup. Well this task by it self sounds simple enough but its not. Cos cufflinks doesnt not provide the starts and ends as found in the bam for each found transcript but the start and end found in the annotation. So one has to search for the 'real' coordinates of the transcripts in the pile up first then extractions is easy. Well i have to admit i didnt try to run cufflinks without reference and then see the output. Maybe then transcripts.gtf will contain the true coordinates.

Now my script is working but is slow for big bams cos then the pileup is also big and parsing it takes time. Reading about multi-threading at the moment. Or using PDL's PP programming to make it a bit faster.

Hi Kenietz

Can you upload your script for doing the things it will be great as I am also trying the same thing to do

Regards

**kenietz** · 09-28-2012, 01:03 AM

Hi figo1019,
here it is. Hope it does the job for you.

I've put some explanations inside the script. Please take a look before using it.

Comments and questions are also welcomed

Cheers

Attached Files

get_consensus_bam_batch.pl (7.1 KB, 118 views)

**kenietz** · 10-01-2012, 12:17 AM

Hi,
just figured out that my work around does not work for paired-end reads

Somehow pileup -c removes some reads except these ones:

0x0100 s the alignment is not primary
0x0200 f the read fails platform/vendor quality checks
0x0400 d the read is either a PCR or an optical duplicate

I suppose i will have to use directly the command: samtools pileup BAM > pileup.out, then parse the output. Its a bit more tricky but i think will do the job.
Will update when im done.

**upendra_35** · 11-07-2012, 03:37 PM

Originally posted by kenietz View Post

Hi,
just figured out that my work around does not work for paired-end reads

Somehow pileup -c removes some reads except these ones:

0x0100 s the alignment is not primary
0x0200 f the read fails platform/vendor quality checks
0x0400 d the read is either a PCR or an optical duplicate

I suppose i will have to use directly the command: samtools pileup BAM > pileup.out, then parse the output. Its a bit more tricky but i think will do the job.
Will update when im done.

Hi
I am just wondering have you fixed your script for paired end reads? I would like to try your script to extract the consensus sequences from bam file based on cufflinks gtf coordinates.

**kenietz** · 11-08-2012, 06:08 PM

Hi,
well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

I attach here the script.

But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

iter->flag_mask = BAM_DEF_MASK

you have to add the following lines:

iter->flag_mask &= -BAM_FSECONDARY;
iter->flag_mask &= -BAM_FDUP;

Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

Hope it does the job for you

Cheers

Attached Files

get_consensus_bam_batch_v3.pl (4.4 KB, 63 views)

**upendra_35** · 11-09-2012, 01:24 PM

Originally posted by kenietz View Post

Hi,
well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

I attach here the script.

But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

iter->flag_mask = BAM_DEF_MASK

you have to add the following lines:

iter->flag_mask &= -BAM_FSECONDARY;
iter->flag_mask &= -BAM_FDUP;

Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

Hope it does the job for you

Cheers

Thanks. I have sent a PM to you........

**figo1019** · 01-11-2013, 04:44 AM

Originally posted by kenietz View Post

Hi,
well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

I attach here the script.

But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

iter->flag_mask = BAM_DEF_MASK

you have to add the following lines:

iter->flag_mask &= -BAM_FSECONDARY;
iter->flag_mask &= -BAM_FDUP;

Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

Hope it does the job for you

Cheers

Hi Kenietz

I have send you a personal message

Regard

**Lizex** · 03-20-2013, 01:07 AM

Originally posted by figo1019 View Post

Hi Kenietz

I have send you a personal message

Regard

Hi Kenietz

I've tried to implement what you've suggested by adding the extra lines to samtools-0.1.18, saving it as samtools_ndsm and re-compiling it by cd into the bin where I keep samtools, typing
make
make[2]: Nothing to be done for `lib'.
make[2]: Nothing to be done for `lib'.
make[2]: Nothing to be done for `lib'.
gcc -g -Wall -O2 -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o -Lbcftools libbam.a -lbcf -lcurses -lm -lz
gcc -g -Wall -O2 -o bcftools call1.o main.o ../kstring.o ../bgzf.o ../knetfile.o ../bedidx.o -L. -lbcf -lm -lz
make[1]: Nothing to be done for `all'.

I don't know what this means?

I continued with this command: samtools mpileup -m 100000 accepted_hits.bam > parsed.bam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000

I do get a file parsed.bam, which I then used with the script.

Followed by this script which I've downloaded:
perl get_consensus_bam_batch_v3.pl -b parsed.bam -t transcripts.gtf > consensus_sequences
sh: samtools-0.1.18/samtools_ndsm: Permission denied

I do get a file with this in it:
E2_Tophat/parsed.bam;MDC000004.264:487-777;/Data_Analysis/E2_data/E2_Tophat/parsed.bam.v3.pileup

Clearly there's something wrong somewhere? Any pointers???

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Today, 11:49 AM	0 responses 12 views 0 likes	Last Post by seqadmin Today, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Yesterday, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin Yesterday, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News