
**Brian Bushnell** · 09-18-2017, 04:21 PM

I suggest you try BBSplit as in this thread. That will give you one fastq file for human reads, and one for viral reads. Then, map the viral output fastq file to the virus reference with a normal aligner (such as BBMap or NextGenMap, which you used previously). BBMap can output coverage data directly (well, BBSplit can too, actually...) using the covstats or basecov flags, if you want.
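As a rough sketch of those two steps (not a tested pipeline): all file names below are hypothetical placeholders, and the helper names `run` and `split_and_map` are made up for illustration. By default it only prints the commands, so you can check them before running anything.

```shell
# Sketch only. All file names are hypothetical placeholders.
# DRY_RUN=1 (the default) prints each command instead of executing it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# split_and_map READS HUMAN_REF VIRUS_REF
split_and_map() {
  reads=$1
  human_ref=$2
  virus_ref=$3

  # Step 1: bin reads by reference; % in basename is replaced by the reference name.
  run bbsplit.sh in="$reads" ref="$human_ref,$virus_ref" basename=out_%.fq

  # Assumption: the viral bin is named after the reference file minus one extension.
  virus_bin="out_$(basename "${virus_ref%.*}").fq"

  # Step 2: map the viral bin to the viral reference and write coverage summaries.
  run bbmap.sh in="$virus_bin" ref="$virus_ref" out=virus_mapped.sam \
    covstats=covstats.txt basecov=basecov.txt
}

split_and_map reads.fq human.fa virus.fa
```

Set DRY_RUN=0 to actually execute the commands once the printed lines look right.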

In answer to your last question, read IDs do not change with respect to reference genomes. However, unmapped bam files are not very useful and you can't use them for coverage analysis.

**Vca80553** · 09-19-2017, 04:27 AM

Dear Brian, Thanks a lot. I did as you suggested.
I also used BBMap (pileup.sh) to output coverage. I get the following results:
Avg fold 53035,994
Length 7904
Ref_GC 0.0000
Covered % 100
Covered bases 7904
Plus reads 3190731
Minus reads 3190731
Read GC 0.395
Median_fold 22882
Std 24900.22

I assume I covered the whole genome. Do you happen to know why Ref_GC equals 0.0000? I calculated the GC content of my reference genome and it is 36.51%. Thanks!

**Brian Bushnell** · 09-19-2017, 12:16 PM

That's because you did not specify the reference file with "ref=" when you ran pileup.sh. It's not necessary; you only need it if you want the Ref_GC column to be correct.
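As an independent sanity check on the reference GC figure, it can be computed straight from the FASTA. This is a minimal sketch: the function name `gc_content` is hypothetical, and it counts every sequence character (including N) in the denominator, which may differ from how pileup.sh computes Ref_GC.

```shell
# gc_content FILE: print the GC fraction over all sequence lines of a FASTA file.
# Header lines (starting with ">") are skipped; case is ignored.
gc_content() {
  awk '!/^>/ {
         seq = toupper($0)
         gsub(/\r/, "", seq)               # tolerate Windows line endings
         total += length(seq)
         gc += gsub(/[GC]/, "", seq)       # gsub returns the number of matches
       }
       END { if (total) printf "%.4f\n", gc / total }' "$1"
}
```

For example, `gc_content /home/sara/HPV16Reference.fasta` prints a single fraction that can be compared against the Ref_GC column.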

**Vca80553** · 09-19-2017, 01:23 PM

Ref_GC

Originally posted by Brian Bushnell:

That's because you did not specify the reference file with "ref=" when you ran pileup.sh. It's not necessary; you only need it if you want the Ref_GC column to be correct.

I ran the following, but still didn't get the Ref_GC value. Is something wrong with the command?

/home/sara/bbmap/pileup.sh in=1409_cat_sorted.bam "ref=/home/sara/HPV16Reference.fasta" out=1409.ref_stats1.txt hist=1409.ref1_histogram.txtll

Thanks

**Brian Bushnell** · 09-19-2017, 02:48 PM

That's strange - I tested it and it works fine for me. No reference:

Code:

pileup.sh in=mapped.sam.gz stats=covstats.txt

cat covstats.txt

#ID     Avg_fold        Length  Ref_GC  Covered_percent Covered_bases   Plus_reads      Minus_reads     Read_GC Median_fold     Std_Dev
chr1    0.0908  249250621       0.0000  8.1241  20249466        76413   74517   0.4179  0       0.33
chr10   0.0960  135534747       0.0000  8.6332  11700973        43491   43228   0.4160  0       0.33

With reference:

Code:

pileup.sh in=mapped.sam.gz stats=covstats2.txt ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz

cat covstats2.txt

#ID     Avg_fold        Length  Ref_GC  Covered_percent Covered_bases   Plus_reads      Minus_reads     Read_GC Median_fold     Std_Dev
chr1    0.0908  249250621       0.4183  8.1241  20249466        76413   74517   0.4179  0       0.33
chr10   0.0960  135534747       0.4167  8.6332  11700973        43491   43228   0.4160  0       0.33

I tried various things including adding "hist=" and using "out=" instead of "stats=", and putting quotes around the reference flag, but was unable to replicate this. Are you sure you are looking at the correct output file, rather than an old one?
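One way to rule out a stale file is to check its modification time and then pull out just the Ref_GC column. A small sketch, assuming the stats file is tab-delimited with the same column order as the output above; `ref_gc_column` is a hypothetical helper name.

```shell
# ref_gc_column FILE: print ID and Ref_GC (columns 1 and 4) from a coverage
# stats file, skipping the "#ID ..." header line.
# Assumption: the file is tab-delimited with Ref_GC in the fourth column.
ref_gc_column() {
  awk -F'\t' '!/^#/ { print $1 "\t" $4 }' "$1"
}
```

Running `ls -l covstats2.txt` shows when the file was last written, and `ref_gc_column covstats2.txt` lists each sequence with its Ref_GC so that all-zero values stand out immediately.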


Bam file with unmapped reads from another genome than reference
