![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
the low percentage of mapped reads | wangli | RNA Sequencing | 2 | 04-04-2012 06:56 AM |
Calculating Percentage of n's for Each Transcript in FASTA File | KDS | Bioinformatics | 7 | 06-08-2011 12:06 AM |
a question for the percentage of aligned reads | chenwb | SOLiD | 1 | 03-18-2011 08:58 AM |
Percentage of mapped reads ? | zack80.liu | Bioinformatics | 6 | 03-01-2011 10:08 AM |
low percentage of reads mapped | rahilsethi | SOLiD | 3 | 09-13-2010 07:01 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]()
I have done a blast on virus contigs and obtained hits matching to virus (strains/isolates) in the database. I would like to calculate the percentage of reads that are aligning to the viruses in the database. Can someone guide me on how to do this?
|
![]() |
![]() |
![]() |
#2 |
Senior Member
Location: Vienna Join Date: Oct 2011
Posts: 123
|
![]()
You can download the virus sequences in fasta format and use e.g. Bowtie2 to align the reads locally. Therefore, you need to build an index first. The log output of Bowtie2 tells you haw many reads mapped.
After aligning the reads, you can use samtools to get some statistics (e.g. samtools idxstats). |
![]() |
![]() |
![]() |
#3 |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]()
Thanks Michael,
I have not used Bowtie/ samtools before. How do I start off? |
![]() |
![]() |
![]() |
#4 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,082
|
![]()
1. What format are your blast results in (html, xml, text)? You may be able to parse that result file if all you want to know is how many sequences hit a "virus".
2. If you did the blast locally do you have a sequence file with all "virus" sequences available? You will be able to use that file as an input for bowtie2 and follow the path @Michael.Ante suggested. 3. Are you comfortable using command line (e.g. linux) applications? |
![]() |
![]() |
![]() |
#5 |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]()
1. blast results are in txt format
2. yes. sequence file is available ( database file?) 3. am fairly comfortable with command line 4. I would prefer the command prompt option as opposed to logging on the cluster (my internet is erratic) Last edited by kaps; 04-11-2015 at 06:05 AM. |
![]() |
![]() |
![]() |
#6 | |
Senior Member
Location: Vienna Join Date: Oct 2011
Posts: 123
|
![]() Quote:
you should have a look at the Bowtie2 homepage. There, it is explained in detail how the programs work. At the end of the manual is a "Lambda phage example", which has quite an overlap to your problem. It also has a SAMtools downstream section... Cheers, Michael |
|
![]() |
![]() |
![]() |
#7 |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]()
Hi, Michael
Thanks for pointing this to me! |
![]() |
![]() |
![]() |
#8 | |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]() Quote:
I am getting a comment as below; samtools idxstats lib4seq.sorted.bam [bam_idxstats] fail to load the index. what could be the problem? |
|
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Vienna Join Date: Oct 2011
Posts: 123
|
![]()
Have you created the index with
Code:
samtools index lib4seq.sorted.bam If yes how does your bam-dile header looks like? Code:
samtools view -H lib4seq.sorted.bam |
![]() |
![]() |
![]() |
#10 |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]()
I had not created the index,
I can now see the statistics in the index file! Thanks |
![]() |
![]() |
![]() |
#11 | |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#12 | |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]() Quote:
After getting the samtools idxstats (on number of mapped vs unmapped reads), is it possible to extract/select reads that mapped from the raw read files/query? how is it done? |
|
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,082
|
![]()
If you had used the "--un-conc and --al-conc" options (http://bowtie-bio.sourceforge.net/bo...output-options) the unmapped reads could have been written to separate files when you did the alignment.
1. You could repeat bowtie2 alignment with above parameters added to your original list (easier) OR 2. Identify read ID's of sequences that mapped and use a tool like seqtk to extract the mapped reads (e.g. seqtk subseq in.fq name.lst > out.fq) Use @Michael.Ante's easy suggestion below ![]() Last edited by GenoMax; 05-12-2015 at 05:34 AM. |
![]() |
![]() |
![]() |
#14 |
Senior Member
Location: Vienna Join Date: Oct 2011
Posts: 123
|
![]()
You can use samtools view to extract the mapped/unmapped reads by filtering the 'unmapped' flag:
Code:
samtools view -F 4 -bh lib4seq.sorted.bam > lib4seq.sorted.mapped.bam samtools view -f 4 -bh lib4seq.sorted.bam > lib4seq.sorted.unmapped.bam |
![]() |
![]() |
![]() |
#15 |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]()
if i want to convert lib4seq.sorted.mapped.bam to a fastq file (creating 2 files for paired end) do i need to sort this bam file again?
Last edited by kaps; 05-19-2015 at 03:39 AM. Reason: error correction |
![]() |
![]() |
![]() |
#16 |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]()
I have generated fastq files (R1& R2) from the mapped bam file and I am now interested in assembling those reads using velvet. I need guidance to how to run velveth, velvetg and viewing the output.
|
![]() |
![]() |
![]() |
#17 | |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]() Quote:
here is the feedback: samtools view -bS 4_S4_paired.sam 4_S4_paired.bam [main_samview] random alignment retrieval only works for indexed BAM or CRAM files. ▒BC▒m▒▒N▒0D▒ܜK▒X=p[▒qU| ▒ ▒▒▒5r[SY▒I▒▒▒O▒sH[@▒▒▒f▒▒▒a0▒w3▒Rʄq▒▒G▒[▒2▒▒Q▒Lk▒▒ vU▒1)J▒ei▒M&"▒qq5▒Q^ݒ▒▒▒▒▒`▒ U▒▒▒Q▒ vI▒▒▒f▒▒q▒▒:;▒=▒▒P▒▒U▒▒"n\o3▒ε▒ ▒;*(n▒J)*▒▒▒▒֙%▒zM19H▒I▒▒▒▒▒5{▒>▒S▒H▒Qt▒~▒▒▒ ▒▒y▒▒ when with -bT, I get a similar feedback as; samtools view -bT cp4genome.fa 4_S4_paired.sam 4_S4_paired.bam [samfaipath] build FASTA index... [fai_build_core] different line length in sequence '4_S4_contig_23'. [samfaipath] fail to build FASTA index. [main_samview] random alignment retrieval only works for indexed BAM or CRAM files. ▒BC▒m▒▒N▒0D▒ܜK▒X=p[▒qU| ▒ ▒▒▒5r[SY▒I▒▒▒O▒sH[@▒▒▒f▒▒▒a0▒w3▒Rʄq▒▒G▒[▒2▒▒Q▒Lk▒▒ vU▒1)J▒ei▒M&"▒qq5▒Q^ݒ▒▒▒▒▒`▒ U▒▒▒Q▒ vI▒▒▒f▒▒q▒▒:;▒=▒▒P▒▒U▒▒"n\o3▒ε▒ ▒;*(n▒J)*▒▒▒▒֙%▒zM19H▒I▒▒▒▒▒5{▒>▒S▒H▒Qt▒~▒▒▒ where is the problem? |
|
![]() |
![]() |
![]() |
#18 |
Senior Member
Location: Vienna Join Date: Oct 2011
Posts: 123
|
![]()
Hi Kaps,
try Code:
samtools view -S 4_S4_paired.sam | head Code:
samtools view -bSh 4_S4_paired.sam > 4_S4_paired.bam Cheers, Michael |
![]() |
![]() |
![]() |
#19 |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]()
its working.
Thanks |
![]() |
![]() |
![]() |
Thread Tools | |
|
|