SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
find ORFs for cufflinks assembled transcripts jiexiong Bioinformatics 1 05-08-2014 12:26 PM
fetch transcripts assembled by cufflinks asling Bioinformatics 6 09-27-2012 09:46 PM
Improvements to assembled genome NPalopoli De novo discovery 6 01-31-2012 09:00 AM
Assembled sequence submission to Genbank? Melissa General 0 04-26-2011 12:54 AM
454 assembled reads katussa10 Bioinformatics 7 12-02-2010 01:19 PM

Reply
 
Thread Tools
Old 12-11-2011, 05:09 PM   #21
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286
Default

Quote:
Originally Posted by huangjun View Post
the cufflinks only give us the rebuild assembled sequence based on the reference genome, but i want to know how to get unmapped contigs also , only that we could get some new transcripts.
Its my understanding that nothing tophat/cufflinks will do this. You'll something like trinity, ABySS/trans-ABySS, or Velvet/Oases to do de novo transcriptome assembly. Tophat and cufflinks require the reference sequence.
Wallysb01 is offline   Reply With Quote
Old 09-12-2012, 11:27 PM   #22
kenietz
Member
 
Location: Singapore

Join Date: Nov 2011
Posts: 85
Default

Hi guys,
this thread was really useful for creating my script for extracting consensus sequences from the bam file. I tried to use the new samtools mpileup but it didnt do the job that i wanted. I tried to use vcf-tools's vcf-consensus but it can only extract the whole chromosome. I couldnt make it extract only a region. So went back to samtools-0.1.16 'pileup -c' option. I didnt use the reference sequnce tho cos is more complicated. So i just parse the result of the command:

samtools pileup -c bam.file

I take the transcript.gtf from cufflinks and for every transcript extract the consensus from the pileup. Well this task by it self sounds simple enough but its not. Cos cufflinks doesnt not provide the starts and ends as found in the bam for each found transcript but the start and end found in the annotation. So one has to search for the 'real' coordinates of the transcripts in the pile up first then extractions is easy. Well i have to admit i didnt try to run cufflinks without reference and then see the output. Maybe then transcripts.gtf will contain the true coordinates.

Now my script is working but is slow for big bams cos then the pileup is also big and parsing it takes time. Reading about multi-threading at the moment. Or using PDL's PP programming to make it a bit faster.
kenietz is offline   Reply With Quote
Old 09-27-2012, 09:02 AM   #23
figo1019
Member
 
Location: germany

Join Date: Jun 2012
Posts: 32
Default

Quote:
Originally Posted by kenietz View Post
Hi guys,
this thread was really useful for creating my script for extracting consensus sequences from the bam file. I tried to use the new samtools mpileup but it didnt do the job that i wanted. I tried to use vcf-tools's vcf-consensus but it can only extract the whole chromosome. I couldnt make it extract only a region. So went back to samtools-0.1.16 'pileup -c' option. I didnt use the reference sequnce tho cos is more complicated. So i just parse the result of the command:

samtools pileup -c bam.file

I take the transcript.gtf from cufflinks and for every transcript extract the consensus from the pileup. Well this task by it self sounds simple enough but its not. Cos cufflinks doesnt not provide the starts and ends as found in the bam for each found transcript but the start and end found in the annotation. So one has to search for the 'real' coordinates of the transcripts in the pile up first then extractions is easy. Well i have to admit i didnt try to run cufflinks without reference and then see the output. Maybe then transcripts.gtf will contain the true coordinates.

Now my script is working but is slow for big bams cos then the pileup is also big and parsing it takes time. Reading about multi-threading at the moment. Or using PDL's PP programming to make it a bit faster.
Hi Kenietz

Can you upload your script for doing the things it will be great as I am also trying the same thing to do

Regards
figo1019 is offline   Reply With Quote
Old 09-28-2012, 01:03 AM   #24
kenietz
Member
 
Location: Singapore

Join Date: Nov 2011
Posts: 85
Default

Hi figo1019,
here it is. Hope it does the job for you.

I've put some explanations inside the script. Please take a look before using it.

Comments and questions are also welcomed

Cheers
Attached Files
File Type: pl get_consensus_bam_batch.pl (7.1 KB, 118 views)
kenietz is offline   Reply With Quote
Old 10-01-2012, 12:17 AM   #25
kenietz
Member
 
Location: Singapore

Join Date: Nov 2011
Posts: 85
Default

Hi,
just figured out that my work around does not work for paired-end reads
Somehow pileup -c removes some reads except these ones:

0x0100 s the alignment is not primary
0x0200 f the read fails platform/vendor quality checks
0x0400 d the read is either a PCR or an optical duplicate

I suppose i will have to use directly the command: samtools pileup BAM > pileup.out, then parse the output. Its a bit more tricky but i think will do the job.
Will update when im done.
kenietz is offline   Reply With Quote
Old 11-07-2012, 02:37 PM   #26
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102
Default

Quote:
Originally Posted by kenietz View Post
Hi,
just figured out that my work around does not work for paired-end reads
Somehow pileup -c removes some reads except these ones:

0x0100 s the alignment is not primary
0x0200 f the read fails platform/vendor quality checks
0x0400 d the read is either a PCR or an optical duplicate

I suppose i will have to use directly the command: samtools pileup BAM > pileup.out, then parse the output. Its a bit more tricky but i think will do the job.
Will update when im done.
Hi
I am just wondering have you fixed your script for paired end reads? I would like to try your script to extract the consensus sequences from bam file based on cufflinks gtf coordinates.
upendra_35 is offline   Reply With Quote
Old 11-08-2012, 05:08 PM   #27
kenietz
Member
 
Location: Singapore

Join Date: Nov 2011
Posts: 85
Default

Hi,
well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

I attach here the script.

But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

iter->flag_mask = BAM_DEF_MASK

you have to add the following lines:

iter->flag_mask &= -BAM_FSECONDARY;
iter->flag_mask &= -BAM_FDUP;

Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

Hope it does the job for you
Cheers
Attached Files
File Type: pl get_consensus_bam_batch_v3.pl (4.4 KB, 63 views)
kenietz is offline   Reply With Quote
Old 11-09-2012, 12:24 PM   #28
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102
Default

Quote:
Originally Posted by kenietz View Post
Hi,
well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

I attach here the script.

But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

iter->flag_mask = BAM_DEF_MASK

you have to add the following lines:

iter->flag_mask &= -BAM_FSECONDARY;
iter->flag_mask &= -BAM_FDUP;

Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

Hope it does the job for you
Cheers
Thanks. I have sent a PM to you........
upendra_35 is offline   Reply With Quote
Old 01-11-2013, 03:44 AM   #29
figo1019
Member
 
Location: germany

Join Date: Jun 2012
Posts: 32
Default

Quote:
Originally Posted by kenietz View Post
Hi,
well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

I attach here the script.

But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

iter->flag_mask = BAM_DEF_MASK

you have to add the following lines:

iter->flag_mask &= -BAM_FSECONDARY;
iter->flag_mask &= -BAM_FDUP;

Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

Hope it does the job for you
Cheers
Hi Kenietz

I have send you a personal message

Regard
figo1019 is offline   Reply With Quote
Old 03-20-2013, 01:07 AM   #30
Lizex
Member
 
Location: Cape Town South Africa

Join Date: Feb 2011
Posts: 22
Default

Quote:
Originally Posted by figo1019 View Post
Hi Kenietz

I have send you a personal message

Regard
Hi Kenietz

I've tried to implement what you've suggested by adding the extra lines to samtools-0.1.18, saving it as samtools_ndsm and re-compiling it by cd into the bin where I keep samtools, typing
make
make[2]: Nothing to be done for `lib'.
make[2]: Nothing to be done for `lib'.
make[2]: Nothing to be done for `lib'.
gcc -g -Wall -O2 -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o -Lbcftools libbam.a -lbcf -lcurses -lm -lz
gcc -g -Wall -O2 -o bcftools call1.o main.o ../kstring.o ../bgzf.o ../knetfile.o ../bedidx.o -L. -lbcf -lm -lz
make[1]: Nothing to be done for `all'.

I don't know what this means?

I continued with this command: samtools mpileup -m 100000 accepted_hits.bam > parsed.bam
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000

I do get a file parsed.bam, which I then used with the script.

Followed by this script which I've downloaded:
perl get_consensus_bam_batch_v3.pl -b parsed.bam -t transcripts.gtf > consensus_sequences
sh: samtools-0.1.18/samtools_ndsm: Permission denied

I do get a file with this in it:
E2_Tophat/parsed.bam;MDC000004.264:487-777;/Data_Analysis/E2_data/E2_Tophat/parsed.bam.v3.pileup

Clearly there's something wrong somewhere? Any pointers???
Lizex is offline   Reply With Quote
Old 03-25-2013, 08:15 PM   #31
kenietz
Member
 
Location: Singapore

Join Date: Nov 2011
Posts: 85
Default

Hi Lizex,
im sorry for the late reply, im very busy lately and could not find time. But better later than never

firstly it seems to me you compiled samtools before, i mean when you first installed the program. Then, try this:
1 Edit the source file
2 make clean
3 ./configure
4 make


This should work.

'Permission denied' shows that the program is owned most probably by root and you try running it as user. Two things possible here:
1. Copy/Move the program to a place in your PATH.
2. Change the owner of the file to be your user with 'chown'. Check 'man chown' for syntax. Also you need to be root to use chown.

Else provide me with email n i will send you the compiled version of modified samtools plus last version of my script.

Cheers
kenietz is offline   Reply With Quote
Old 03-25-2013, 11:15 PM   #32
Lizex
Member
 
Location: Cape Town South Africa

Join Date: Feb 2011
Posts: 22
Default

Hi Kenietz
Thanks for the reply.
I've send the my e-mail adress.

Regards
Lizex is offline   Reply With Quote
Old 06-06-2013, 09:58 AM   #33
YazBraimah
Junior Member
 
Location: USA

Join Date: Jul 2012
Posts: 7
Default

Kenietz,

I just used your script and it works beautifully. Thanks for writing it.

Yaz
YazBraimah is offline   Reply With Quote
Old 06-06-2013, 05:14 PM   #34
kenietz
Member
 
Location: Singapore

Join Date: Nov 2011
Posts: 85
Default

Hi YazBraimah,
glad to hear that it worked for you

Cheers
kenietz is offline   Reply With Quote
Old 07-09-2013, 05:57 AM   #35
YazBraimah
Junior Member
 
Location: USA

Join Date: Jul 2012
Posts: 7
Default

FYI,

A Trinity plug-in can also do this. See:

http://transdecoder.sourceforge.net/

under "Starting from a genome-based transcript structure GTF file (eg. cufflinks)"

YB
YazBraimah is offline   Reply With Quote
Old 01-09-2014, 06:07 PM   #36
xmubingo
Member
 
Location: China guangzhou

Join Date: Sep 2013
Posts: 13
Default

Quote:
Originally Posted by pkerrwall View Post
Here is the process that I am using:

samtools mpileup -uf ref.fa accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq | fq2fa.pl > new_ref.fa
gffread -w transcripts.fa -g new_ref.fa transcripts.gtf

where fq2fa.pl is a bioperl script to convert from fq to fasta

I also have an email into the cufflinks developers to see if there is a way that the gffread utility can be enhanced to get the consensus sequence from the bam file and not the reference genome.

Hi pkerrwall, I used commands same as yours. but i found that sequences in cns.fq generated by fq2fa.pl have different length with them in reference sequences. do you know why?
xmubingo is offline   Reply With Quote
Old 01-09-2014, 06:24 PM   #37
yueluo
Member
 
Location: Guangzhou China

Join Date: Aug 2013
Posts: 82
Default

Quote:
Originally Posted by xmubingo View Post
Hi pkerrwall, I used commands same as yours. but i found that sequences in cns.fq generated by fq2fa.pl have different length with them in reference sequences. do you know why?
Take a look at this post:
http://seqanswers.com/forums/showthread.php?t=35066
As gringer mentioned:
Quote:
Originally Posted by gringer View Post
But bear in mind that the vcf2fq script is designed for SNPs, not INDELs. If there is an INDEL in your sequence relative to the reference, then the INDEL and a few flanking bases will be changed to lower case, but not replaced. This means that any fastq file generated from this script will have the same length as the input reference sequence.

I sent a patch to their sourceforge page to fix this (and allow more compact partial vcf files with a provided reference sequence), but I don't think it's been implemented yet.
yueluo is offline   Reply With Quote
Old 01-10-2014, 01:56 AM   #38
xmubingo
Member
 
Location: China guangzhou

Join Date: Sep 2013
Posts: 13
Default

Quote:
Originally Posted by yueluo View Post
Take a look at this post:
http://seqanswers.com/forums/showthread.php?t=35066
As gringer mentioned:
Thanks, yueluo!! I asked gringer, and he replied. I guess some incorrect operations in my pipeline.
xmubingo is offline   Reply With Quote
Old 02-17-2014, 02:04 AM   #39
kiran0991
Analyst
 
Location: Pune, India

Join Date: Jan 2012
Posts: 2
Default

Quote:
Originally Posted by zchou View Post
Hi All,

I use Bowtie/TopHat/cufflink to analyze the RNA-Seq. I want to extract the rebuild transcripts by these softwares. However, I can only the BED file. Can anyone give idea to extract these assembled transcript sequence?

Thanks,
ZC
To extract the assembled transfrags from cufflink, following command is useful

$gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf

Source : Cufflink
kiran0991 is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:28 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO