SEQanswers

Old 01-24-2019, 11:46 AM   #661
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Have you verified that the $i variable is expanding correctly?

If you have multiple files to align you should first create an index by doing

Code:
bbmap.sh ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa
This will create a "ref" directory containing all the index files. Don't worry about what is in the directory (BBMap uses its own organization).

Then, when you run the alignment, use
Code:
path=dir_containing_ref_dir
on the command line instead of "ref=". This avoids re-indexing the genome each time.

If you are using a job scheduler, you should submit each alignment job separately (not the way you have the loop set up, which I assume is submitted as a single job?).
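
The index-once, one-job-per-sample approach described above could be sketched as follows. This is only a dry run that prints one bbmap.sh command per read pair so each line can be submitted as its own job. `INDEX_DIR`, `mate_of`, `sample_of` and `bbmap_cmd` are hypothetical names, and the file-name pattern is assumed from the file names posted in this thread.

```shell
# Dry-run sketch: print one bbmap.sh command per read pair instead of running it.
# INDEX_DIR is a placeholder for the directory that contains the "ref" folder.
INDEX_DIR="${INDEX_DIR:-/path/to/dir_containing_ref_dir}"

# Derive the R2 file name from an R1 name such as PoolCH-1_R1_001_val_1.fa.gz.
mate_of() {
    printf '%s\n' "$1" | sed -e 's/_R1_/_R2_/' -e 's/_val_1\./_val_2./'
}

# Derive the sample name (everything before the first "_R1_").
sample_of() {
    printf '%s\n' "${1%%_R1_*}"
}

# Build the command line for one pair, using path= so the genome is not re-indexed.
bbmap_cmd() {
    printf 'bbmap.sh path=%s in=%s in2=%s out=%s.sam\n' \
        "$INDEX_DIR" "$1" "$(mate_of "$1")" "$(sample_of "$1")"
}

for r1 in *_R1_*_val_1.fa.gz; do
    [ -e "$r1" ] || continue   # skip when the glob matches nothing
    bbmap_cmd "$r1"
done
```

Each printed line can then be wrapped in whatever submission command your scheduler uses.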
Old 01-24-2019, 12:03 PM   #662
chutfilz
Junior Member
 
Location: Providence, RI

Join Date: Jan 2019
Posts: 3
Default

Wow! Thanks for the fast reply. I've pared down my job to just one set of directly-called, paired files to eliminate the possibility of a malfunctioning $i.

Code:
bbmap.sh ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa in=PoolCH-1_R1_001_val_1.fa.gz in2=PoolCH-1_R2_001_val_2.fa.gz out=PoolCH-1.sam
The ref assignment was the first argument of my previous command as well, and I've retained it in this run (after first deleting the 'ref' folder left by the previous failed run).

No use in posting the error I received - it's exactly the same as before!
Old 01-25-2019, 03:41 AM   #663
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

I see that you are using the samtools module as well. With that you can write BAM files directly; there is no need to use SAM.

So let us try a modified command line and see what happens. In the command below I am going to assume that you have ~30G of RAM and 4 cores available for this job, and that the two fastq files are in the current directory. Is dm3.fa just a multi-fasta file of Drosophila chromosomes?

Code:
bbmap.sh -Xmx30g threads=4 ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa in1=PoolCH-1_R1_001_val_1.fa.gz in2=PoolCH-1_R2_001_val_2.fa.gz out=PoolCH-1.bam ambig=random maxindel=10000 trd=t
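
Since the text above refers to fastq files while the file names end in .fa.gz, it may be worth confirming what the read files actually contain. A minimal sketch, assuming only that FASTA records start with ">" and FASTQ records start with "@"; `guess_format` is a hypothetical helper name.

```shell
# Report whether a (possibly gzipped) read file looks like FASTA or FASTQ,
# judging only by the first character of its first line.
guess_format() {
    case "$1" in
        *.gz) first=$(gzip -dc "$1" | head -n 1) ;;
        *)    first=$(head -n 1 "$1") ;;
    esac
    case "$first" in
        '>'*) echo fasta ;;
        '@'*) echo fastq ;;
        *)    echo unknown ;;
    esac
}

# Check the two read files from this thread if they are present.
for f in PoolCH-1_R1_001_val_1.fa.gz PoolCH-1_R2_001_val_2.fa.gz; do
    if [ -e "$f" ]; then
        printf '%s\t%s\n' "$f" "$(guess_format "$f")"
    fi
done
```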
Old 01-25-2019, 10:38 AM   #664
chutfilz
Junior Member
 
Location: Providence, RI

Join Date: Jan 2019
Posts: 3
Default

No dice, same error.

dm3.fa is a file under the subdirectory "WholeGenomeFasta" in the file set downloaded from iGenomes.

Also in this subdirectory are files ending in .dict, .fa.fai, and an xml for genome size. I only imported the .fa to my institution's server for mapping purposes.

cat dm3.fa reveals unannotated sequence, as expected, as well as a few stretches of ~1,000 Ns.
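
Rather than paging through the whole genome with cat, the sequence names and lengths can be listed on their own. A minimal sketch using awk; `fasta_lengths` is a hypothetical helper name, and the file is assumed to be uncompressed with Unix line endings.

```shell
# List each sequence name and its length (tab-separated) from a FASTA file,
# without printing the sequence itself.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($0, 2)   # header line without the leading ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Summarize dm3.fa if it is in the current directory.
if [ -e dm3.fa ]; then
    fasta_lengths dm3.fa
fi
```

A multi-fasta genome should produce one line per chromosome or scaffold.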
Old 03-17-2019, 06:41 PM   #665
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
Default

How do I run the basic BBmap.sh script in Windows? What is the corresponding java command? I want to align four sets of paired-end Illumina RNA-Seq reads to a genome assembly. I am particularly concerned to correctly identify introns, as this genome is thought to have only a few intron-containing genes.

Last edited by ssully; 03-17-2019 at 06:43 PM.
Old 03-18-2019, 03:50 AM   #666
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by ssully View Post
How do I run the basic BBmap.sh script in Windows? What is the corresponding java command? I want to align four sets of paired-end Illumina RNA-Seq reads to a genome assembly. I am particularly concerned to correctly identify introns, as this genome is thought to have only a few intron-containing genes.

If your OS does not support shell scripts, replace 'bbmap.sh' like this:
Code:
java -XmxNNg -cp /path/to/current align2.BBMap in=reads.fq out=mapped.sam
(Replace NN with the amount of memory, in gigabytes, that you can allocate on your system.)
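
For paired-end input, the same java call would also take in2=. The sketch below only assembles and prints the command line so it can be adapted by hand. `BBMAP_DIR`, `MEM_GB`, `bbmap_java_cmd` and the sample file names are placeholders, not values from this thread.

```shell
# Placeholders (assumptions): where BBMap is unpacked and how much memory to give Java.
BBMAP_DIR="${BBMAP_DIR:-/path/to/bbmap}"
MEM_GB="${MEM_GB:-8}"

# Print the java command for one pair of read files.
# $1 = read 1 file, $2 = read 2 file, $3 = output file
bbmap_java_cmd() {
    printf 'java -Xmx%sg -cp %s/current align2.BBMap in=%s in2=%s out=%s\n' \
        "$MEM_GB" "$BBMAP_DIR" "$1" "$2" "$3"
}

bbmap_java_cmd sample1_R1.fq sample1_R2.fq sample1.sam
```

With four sets of paired-end reads, this would be run once per set with a different output name each time.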
Old 03-29-2019, 06:44 AM   #667
aushev
Member
 
Location: Europe

Join Date: Nov 2009
Posts: 21
Default

I'm wondering if anyone has tried applying callvariants.sh to RNA-seq data. When I ran it on my BAM file, it found about 140,000 variants, but all of them are homozygous, which is obviously impossible. I guess I should play with the parameters somehow...
Old 04-01-2019, 05:17 PM   #668
seqmore
Junior Member
 
Location: china

Join Date: Apr 2019
Posts: 4
Default RNAseq data analysis failed with BBMAP

Dear Brian,

BBMap is great for mapping coverage and mapping speed. I have tried it on my data several times, but each attempt failed. The versions of bbmap and samtools are 38.22 and 0.1.9, respectively. My data are RNA-seq reads generated from human cell lines. The command lines and output are listed below:

bbmap.sh ref=Homo_sapiens.GRCh38.dna.primary_assembly.fa

$bbmap.sh maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh

## a.fq has been trimmed using trim-galore and dynamictrim. The sequencer is an Illumina HiSeq.

$samtools flagstat a.bbmap.sam
[bam_header_read] EOF marker is absent.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_flagstat_core] Truncated file? Continue anyway.
0 in total
0 QC failure
0 duplicates
0 mapped (-nan%)
0 paired in sequencing
0 read1
0 read2
0 properly paired (-nan%)
0 with itself and mate mapped
0 singletons (-nan%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)


$more a.bbmap.sam
@HD VN:1.4 SO:unsorted
@SQ SN:1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF LN:248956422
@SQ SN:10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF LN:133797422
@SQ SN:11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF LN:135086622
@SQ SN:12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF LN:133275309
@SQ SN:13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF LN:114364328
@SQ SN:14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF LN:107043718
@SQ SN:15 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF LN:101991189
@SQ SN:16 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF LN:90338345
.......................
......................[omit other lines]
@PG ID:BBMap PN:BBMap VN:38.22 CL:java -Djava.library.path=/path/bbmap-38.22-1/jni/ -ea -Xmx158342m align2.BBMap build=1 overwrite=true fastareadlen=500 maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh
E00603:213:HVLFGCCXY:1:1101:20172:9431 1:N:0:ACGGAACA 16 5 dna:chromosome chromosome:GRCh38:5:1:181538259:1 REF 14481853 42 44= * 0 0 CAGAAACAAGCAGGACCGGGCTTTGTCTCTTGGGCCCAGTACTG FA<JJJAJJJAJFA7FJJJJFJFJJJJJFJJJJF7JJFJJJFFF NM:i:0 AM:i:42 XM:i:1 NH:i:1
E00603:213:HVLFGCCXY:1:1101:17056:10081 1:N:0:ACGGAACA 4 * 0 0 * * 0 0 AAGCAAGTCTTTATCTTTAGAATAAATGTAGT JJJJ7FAFFFFJJJJJAJJJJJJJJJJJJJ77
.......................
......................[omit other lines]
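
One thing that can be checked in a header like the one above is whether the @SQ reference names contain spaces, since the SAM specification does not allow whitespace in reference names. A minimal sketch with awk, assuming a tab-separated SAM header; `sq_names_with_spaces` is a hypothetical helper name.

```shell
# Print every @SQ reference name (the SN: field) that contains a space.
# SAM header fields are tab-separated, so a space inside a field is part of its value.
sq_names_with_spaces() {
    awk -F '\t' '
        /^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/ && $i ~ / /) print substr($i, 4)
        }
        !/^@/ { exit }   # stop at the first alignment record
    ' "$1"
}

# Inspect the SAM file from this post if it is present.
if [ -e a.bbmap.sam ]; then
    sq_names_with_spaces a.bbmap.sam
fi
```

If this prints anything, downstream tools may misread the columns of the alignment lines that use those names.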


$sh script.sh
Note: This script is designed to run with the amount of memory detected by BBMap.
If Samtools crashes, please ensure you are running on the same platform as BBMap,
or reduce Samtools' memory setting (the -m flag).
Note: Please ignore any warnings about 'EOF marker is absent'; this is a bug in samtools that occurs when using piped input.
[samopen] SAM header is present: 194 sequences.
sort: invalid option -- '@'
Parse error at line 197: invalid CIGAR character
open: No such file or directory
Aborted (core dumped)
[bam_sort_core] fail to open file 3
open: No such file or directory
[bam_index_build2] fail to open the BAM file.


Could you give some suggestions? Thanks a lot.
Old 04-02-2019, 03:54 AM   #669
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

@seqmore: I don't see you actually providing the sequence index you made in step 1 to your bbmap.sh command.

It would be included using "path=dir_with_index" in

Code:
$bbmap.sh maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh
Old 04-02-2019, 06:50 AM   #670
seqmore
Junior Member
 
Location: china

Join Date: Apr 2019
Posts: 4
Default

Thank you for your kind reply, @GenoMax. I did not specify ref= because I had copied the ref folder generated by BBMap's index building into the current working directory. As I learned from your post, bbmap will automatically find the ref folder in the current directory. I have also succeeded this way many times previously. Now, as you indicated, I have rerun the command with ref= specified, but it failed in the same way as above. I should mention that the screen output looks normal, as shown below.
So I'm confused. I would really appreciate it if you could clarify this issue. Thanks a lot in advance.

The screen output during bbmap:
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.dna.primary_assembly.fa, maxindel=100k, intronlen=10, in=a.fq, out=a.bb.sam, outu=a.unbbmap.fq, bs=script.sh]
Version 38.22

Retaining first best site only for ambiguous mappings.
Found samtools 0.1.9
Writing reference.
Executing dna.FastaToChromArrays2 [/mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.dna.primary_assembly.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

Set genScaffoldInfo=true
Writing chunk 1
Writing chunk 2
Writing chunk 3
Writing chunk 4
Writing chunk 5
Writing chunk 6
Writing chunk 7
Set genome to 1

Loaded Reference: 0.010 seconds.
Loading index for chunk 1-7, build 1
No index available; generating from reference genome: /mnt/e/Raw_seq/ref/index/1/chr1-3_index_k13_c2_b1.block
No index available; generating from reference genome: /mnt/e/Raw_seq/ref/index/1/chr4-7_index_k13_c2_b1.block
Indexing threads started for block 4-7
Indexing threads started for block 0-3
Indexing threads finished for block 0-3
Indexing threads finished for block 4-7
Generated Index: 213.256 seconds.
Finished Writing: 19.955 seconds.
Analyzed Index: 7.710 seconds.
Started output stream: 0.045 seconds.
Started output stream: 0.001 seconds.
Cleared Memory: 0.241 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 56 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55

------------------ Results ------------------

Genome: 1
Key Length: 13
Max Indel: 100000
Minimum Score Ratio: 0.56
Mapping Mode: normal
Reads Used: 1236379 (56305606 bases)

Mapping: 163.878 seconds.
Reads/sec: 7544.53
kBases/sec: 343.58


Read 1 data: pct reads num reads pct bases num bases

mapped: 42.1017% 520537 41.2560% 23229413
unambiguous: 28.9375% 357777 29.3317% 16515379
ambiguous: 13.1642% 162760 11.9243% 6714034
low-Q discards: 0.0000% 0 0.0000% 0

perfect best site: 34.4476% 425903 34.5461% 19451376
semiperfect site: 34.4530% 425970 34.5520% 19454715

Match Rate: NA NA 45.7553% 22917698
Error Rate: 7.7666% 94588 54.2440% 27169499
Sub Rate: 7.4116% 90264 0.6028% 301949
Del Rate: 0.7988% 9728 53.6224% 26858156
Ins Rate: 0.3819% 4651 0.0188% 9394
N Rate: 0.0053% 64 0.0007% 372
Splice Rate: 0.4858% 5917 (splices at least 10 bp)

Total time: 438.182 seconds.
Old 04-02-2019, 07:07 AM   #671
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

I am not sure I understand what is happening. Is the flagstat command showing no reads aligned?

At this point samtools 0.1.19 is ancient and should really NOT be used for anything. The errors you are seeing are also about samtools options that only the newer versions have.

You should upgrade to the latest samtools, which is now at v1.9. As long as samtools is in your $PATH, BBMap can write BAM files directly, so there is no need to create SAM files. Just specify out=yourfile.bam.
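
A quick way to see which samtools BBMap would pick up from $PATH is sketched below. It assumes that samtools 1.x prints "samtools <version>" as the first line of `samtools --version`; older 0.1.x builds may not support that option, in which case the version comes back empty. `is_legacy_samtools` is a hypothetical helper name.

```shell
# Return success when a version string belongs to the legacy 0.x series.
is_legacy_samtools() {
    case "$1" in
        0.*) return 0 ;;
        *)   return 1 ;;
    esac
}

if command -v samtools >/dev/null 2>&1; then
    # Take the second word of the first line, e.g. "samtools 1.9" gives "1.9".
    ver=$(samtools --version 2>/dev/null | head -n 1 | awk '{print $2}') || ver=""
    if [ -z "$ver" ]; then
        echo "samtools found, but --version printed nothing (possibly a 0.1.x build)"
    elif is_legacy_samtools "$ver"; then
        echo "samtools $ver is a legacy release; consider upgrading"
    else
        echo "samtools $ver found on PATH"
    fi
else
    echo "samtools is not on PATH"
fi
```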
Old 04-02-2019, 06:43 PM   #672
seqmore
Junior Member
 
Location: china

Join Date: Apr 2019
Posts: 4
Default

@GenoMax, your suggestion is great! I uninstalled samtools and reinstalled version 1.9, and samtools flagstat now works. Then I tried writing BAM output directly as you suggested, like this:
bbmap.sh ref=Homo_sapiens.GRCh38.dna.primary_assembly.fa ambig=all xstag=unstranded xmtag=t maxindel=100k intronlen=10 in=a.fq out=bbmap.bam outu=unbbmap.fq

Next, I ran cufflinks on the BAM file. The command line is
cufflinks bbmap.bam -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff_out
The standard output:
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[09:21:15] Loading reference annotation.
[09:21:42] Inspecting reads and determining fragment length distribution.
> Processed 37488 loci. [*************************] 100%
> Map Properties:
> Normalized Map Mass: 0.00
> Raw Map Mass: 0.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[09:21:46] Estimating transcript abundances.
> Processed 37488 loci. [*************************] 100%

The transcripts.gtf looks strange, with all FPKM=0:
$more ./cuff_out/transcripts.gtf
1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

I tried two methods to sort the BAM file: one is "$sort -k 3,3 -k 4,4n bbmap.bam >bbmap.bam.sort", and the other is "$samtools sort -n bbmap.bam >bbmap.sortn.bam". Both failed to produce FPKM values.
However, I get normal FPKMs with Tophat2 using the same genome assembly and GTF annotation.
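
Two general points about the sorting step: a BAM file is compressed binary, so the line-based sort utility cannot reorder its records, and the Cufflinks manual asks for input sorted by reference position, whereas samtools sort -n sorts by read name. A minimal sketch of a coordinate sort with samtools 1.x follows. `sorted_name` and `sort_bam` are hypothetical helper names, and this is not claimed to explain the FPKM=0 result by itself.

```shell
# Derive the output name, e.g. bbmap.bam gives bbmap.sorted.bam.
sorted_name() {
    printf '%s.sorted.bam\n' "${1%.bam}"
}

# Coordinate-sort a BAM file (the samtools sort default) and index the result.
sort_bam() {
    out=$(sorted_name "$1")
    samtools sort -o "$out" "$1" && samtools index "$out"
}

# Only run when both samtools and the input file are available.
if command -v samtools >/dev/null 2>&1 && [ -f bbmap.bam ]; then
    sort_bam bbmap.bam
else
    echo "skipping: samtools or bbmap.bam not found"
fi
```

The sorted file would then be passed to cufflinks in place of bbmap.bam.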

Any comments or suggestions are greatly appreciated.
Old 04-02-2019, 07:16 PM   #673
seqmore
Junior Member
 
Location: china

Join Date: Apr 2019
Posts: 4
Default

Additional information:
The outputs from both sort methods are listed below. Sorry to put so many words here. In fact, I have been tortured by this error for several weeks and cannot figure it out by myself. I would be very grateful if GenoMax or Brian or anyone could shed light on this issue. Thanks a lot!

sort method #1:
$sort -k 3,3 -k 4,4n bbmap.bam >bbmap.bam.sort

$cufflinks bbmap.bam.sort -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff_out
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File bbmap.bam.sort doesn't appear to be a valid BAM file, trying SAM...
[10:04:49] Loading reference annotation.
[10:05:15] Inspecting reads and determining fragment length distribution.
SAM error on line 148483: CIGAR op has zero length
SAM error on line 171044: CIGAR op has zero length
SAM error on line 172120: CIGAR op has zero length
SAM error on line 173571: CIGAR op has zero length
SAM error on line 186806: CIGAR op has zero length
[binary data omitted]
SAM error on line 191159: CIGAR op has zero length
SAM error on line 199865: CIGAR op has zero length
SAM error on line 213871: CIGAR op has zero length
> Processed 37488 loci. [*************************] 100%
> Map Properties:
> Normalized Map Mass: 0.00
> Raw Map Mass: 0.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[10:05:19] Estimating transcript abundances.
SAM error on line 401632: CIGAR op has zero length
SAM error on line 424193: CIGAR op has zero length
SAM error on line 425269: CIGAR op has zero length
SAM error on line 426720: CIGAR op has zero length
SAM error on line 439955: CIGAR op has zero length
[binary data omitted]
SAM error on line 444308: CIGAR op has zero length
SAM error on line 453014: CIGAR op has zero length
SAM error on line 467020: CIGAR op has zero length
> Processed 37488 loci. [*************************] 100%


$more ./cuff_out/transcripts.gtf
1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

sort method #2:
$samtools sort -n bbmap.bam >bbmap.sortn.bam

$cufflinks bbmap.sortn.bam -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff.sortn
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[09:48:22] Loading reference annotation.
[09:48:48] Inspecting reads and determining fragment length distribution.
> Processed 37488 loci. [*************************] 100%
> Map Properties:
> Normalized Map Mass: 0.00
> Raw Map Mass: 0.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[09:48:53] Estimating transcript abundances.

$ more ./cuff.sortn/transcripts.gtf
1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
............................

Last edited by seqmore; 04-02-2019 at 07:22 PM.
Old 04-04-2019, 09:50 AM   #674
darencard
Junior Member
 
Location: Boston, MA

Join Date: Apr 2019
Posts: 1
Default Divide by 0 error in randomreads.sh

I am having an issue with randomreads.sh that I cannot make sense of myself.

I am using this tool to try to extract a random subset of a genome. Most tools subset by selecting some proportion of sequences, but I want to randomly sample pieces of randomly-sampled sequences. So read simulators seem to be the better option for this.

In this case, I'm trying to sample a (giant!) salamander genome from NCBI. For now I just have some arbitrary length/number settings, as follows:

randomreads.sh ref=GCA_002915635.2_ASM291563v2_genomic.fna out=test.fq reads=100 minlength=50000 maxlength=500000 seed=5 banns=t adderrors=f

As the command shows, I do not want variants or errors added in at all; the sequences should be identical to the reference genome.

Here is the output I'm getting, which indicates some sort of 'divide by 0' error. Hopefully someone can help me diagnose and overcome this issue.

Executing align2.RandomReads3 [build=1, ref=GCA_002915635.2_ASM291563v2_genomic.fna, out=test.fq, reads=100, minlength=50000, maxlength=500000, seed=5, banns=t, adderrors=f]

Writing reference.
Executing dna.FastaToChromArrays2 [GCA_002915635.2_ASM291563v2_genomic.fna, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=500, nodisk=false]

Set genScaffoldInfo=true
Exception in thread "main" java.lang.ArithmeticException: / by zero
at align2.RandomReads3.fillRandomChrom(RandomReads3.java:1758)
at align2.RandomReads3.<init>(RandomReads3.java:585)
at align2.RandomReads3.main(RandomReads3.java:389)

Thanks!
Daren
Old 06-03-2019, 12:46 AM   #675
tamu_anand
Junior Member
 
Location: us

Join Date: May 2011
Posts: 7
Default

Has anyone used bbmap for QuantSeq data analysis (more precisely, the QuantSeq FWD protocol)? The Lexogen website recommends bbduk for quality trimming and suggests using STAR for downstream analysis.

Is it possible to do something similar (to STAR) with bbmap? In other words, is there an analogous bbmap command similar to how one does mapping with STAR (using the genome index and gtf together)?

Thanks in advance.
Tags
bbmap, metagenomics, rna-seq aligners, short read alignment
