  • #61
    Thanks for your quick reply. I did a test run with both references in the same command.



    • #62
      Hi Brian,

      I'm trying to use BBSplit to separate RNA-seq reads from two mixed fungal samples, using the individual transcriptomes as references. I was getting some unexpected results: more reads seemed to map unambiguously to whichever reference was listed first, so I swapped the order of the references and the results changed dramatically. I have ambiguous2=toss set, but it looks like it is still using the first best site. Below are my commands and refstats output. Is there anything I'm doing wrong?

      Thanks,
      Brian
      Code:
      bbsplit.sh ref=53.fasta,17.fasta \
              in=53_30_r1_S7_R1_001.fastq.gz in2=53_30_r1_S7_R2_001.fastq.gz \
              out_17=map17_53_30_r1_S7_R#_001.fastq.gz \
              out_53=map53_53_30_r1_S7_R#_001.fastq.gz \
              refstats=53_30_r1_S7.stats ambiguous2=toss
      
      #name	%unambiguousReads	unambiguousMB	%ambiguousReads	ambiguousMB	unambiguousReads	ambiguousReads
      53	41.51013	1625.01508	57.30665	2219.25878	11241396	15519266
      17	1.13394	44.03152	57.30665	2219.25878	307084	15519266        
              
      bbsplit.sh ref=17.fasta,53.fasta \
              in=53_30_r1_S7_R1_001.fastq.gz in2=53_30_r1_S7_R2_001.fastq.gz \
              out_17=map17_53_30_r1_S7_R#_001.fastq.gz \
              out_53=map53_53_30_r1_S7_R#_001.fastq.gz \
              refstats=53_30_r1_S7.stats2 ambiguous2=toss
      
      #name	%unambiguousReads	unambiguousMB	%ambiguousReads	ambiguousMB	unambiguousReads	ambiguousReads
      53	21.37940	838.36051	67.54242	2623.22348	5789774	18291224
      17	11.02890	426.72088	67.54242	2623.22348	2986746	18291224
      Last edited by GenoMax; 08-20-2018, 08:03 AM.
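      Both runs read the same input, so one quick sanity check is to total each refstats file: if the number of mapped reads (unambiguous plus ambiguous) stays about the same while the per-reference split changes, the reference order is moving reads between bins rather than changing what maps at all. A small awk sketch, assuming the whitespace-separated layout shown above (column 6 = unambiguousReads, column 7 = ambiguousReads, the latter repeated on every row); the summarize_refstats name is hypothetical:

      ```shell
      # Sum a BBSplit refstats table: unambiguous reads over all references,
      # plus the ambiguous count, which is repeated on every row and so is
      # taken once. Columns follow the header shown in the post:
      # $6 = unambiguousReads, $7 = ambiguousReads.
      summarize_refstats() {
          awk '
              /^#/ { next }            # skip the header line
              NF >= 7 {
                  unamb += $6          # per-reference unambiguous reads
                  amb = $7             # same value on every data row
              }
              END {
                  printf "unambiguous=%d ambiguous=%d total=%d\n", unamb, amb, unamb + amb
              }
          ' "$1"
      }

      # Print one summary line per refstats file given on the command line.
      for f in "$@"; do
          printf '%s\t' "$f"
          summarize_refstats "$f"
      done
      ```

      For example, saved under a name of your choosing, run it as `bash summarize_refstats.sh 53_30_r1_S7.stats 53_30_r1_S7.stats2` to compare the two runs side by side.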



      • #63
        Contamination from human genome?

        Hi,

        I am working with RNA-seq data from a non-model fish and am considering removing human contamination from the reads. Is this feasible, given the number of orthologs shared between human and fish?
        Is there any recommendation regarding the choice of "-minratio" for this case? It seems that 0.56 may be too low. (I don't have a reference genome for this non-model fish, by the way.)

        P.S.: I think sensitivity and specificity should be balanced differently for binning (two references, i.e. host vs. contaminant, so each read has alignment scores against both to compare) and for decontamination (only the contaminant reference is available, so the decision rests solely on the alignment to the contaminant).

        Thank you very much for any suggestions!
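        For the decontamination case in the P.S. (only the contaminant reference available), one possible setup is a plain BBMap run that writes reads aligning to the human reference and reads that do not to separate files. This is a sketch with hypothetical file names. BBMap's help describes minid as an approximate minimum alignment identity and minratio as the fraction of the maximum alignment score required to keep a site; minid is translated internally into a minratio, so raising minid is a more intuitive way to tighten the threshold than editing the 0.56 default directly. The minid=0.95 value below is an illustrative assumption, not a tested recommendation for fish data.

        ```shell
        # Hypothetical file names. Only the contaminant (human) reference is
        # given, so reads are judged solely on how well they align to it.
        # minid=0.95 is an illustrative, stricter-than-default threshold meant
        # to keep conserved fish orthologs from being pulled out as "human".
        # nodisk builds the index in memory instead of writing ref/ to disk.
        # outu= receives pairs not matching human; outm= receives the matches.
        bbmap.sh ref=human_ref.fa nodisk \
                in=fish_R1.fq.gz in2=fish_R2.fq.gz \
                minid=0.95 \
                outu=fish_clean_R#.fq.gz \
                outm=fish_human_R#.fq.gz
        ```

        Inspecting a sample of the reads that land in the outm= file (e.g. by BLASTing a few) is one way to judge whether the threshold is pulling out genuine human reads or conserved fish genes.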



        • #64
          Question about BBSplit ambig2=toss and BAM files

          Hello!

          I am using BBSplit to separate reads from a paired-end, three-species bacterial RNA-seq project. I set ambig2=toss, but then see this line in the program's output:

          "Retaining first best site only for ambiguous mappings."

          To me, that looks like the default ambiguous=best. Is that what I should be seeing? How can I tell whether the ambiguous reads are actually being tossed?

          Additionally, I am mapping directly into a BAM file. From earlier posts, it looks like BBSplit BAM files are incompatible with IGV, but would they be OK for feature counting with a tool like HTSeq, or for edgeR?

          Thanks very much,
          Amanda



          • #65
            @Amanda: I will need to dig through some past correspondence with Brian, but I think he recommended splitting first and then mapping, to avoid having all references present in the BAM file, which does cause problems with visualization programs.

            If you look at the in-line help for "ambiguous2", you can see what it does:
            Code:
            ambiguous2=<best>    Set behavior only for reads that map ambiguously to multiple different references.
                                 Normal 'ambiguous=' controls behavior on all ambiguous reads;
                                 Ambiguous2 excludes reads that map ambiguously within a single reference.
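            A sketch of the split-first, map-second workflow described above, assuming three hypothetical reference files (speciesA.fa, speciesB.fa, speciesC.fa) and paired input reads_R1.fq.gz/reads_R2.fq.gz. BBSplit names each bin after its reference file (as the refstats names 53 and 17 earlier in the thread suggest), so with basename=out_%_#.fq.gz the speciesA bin should become out_speciesA_1.fq.gz and out_speciesA_2.fq.gz. Each bin is then mapped with BBMap against its own reference only, so each BAM carries just one species' sequences. BAM output relies on samtools being installed.

            ```shell
            set -euo pipefail

            # Hypothetical reference base names; each must match <name>.fa on disk.
            refs=(speciesA speciesB speciesC)

            # Step 1: bin the reads. ambiguous2=toss discards reads that map
            # equally well to more than one reference; % is replaced by the
            # reference name and # by 1/2 for the two mates.
            bbsplit.sh ref=speciesA.fa,speciesB.fa,speciesC.fa \
                    in1=reads_R1.fq.gz in2=reads_R2.fq.gz \
                    ambiguous2=toss basename=out_%_#.fq.gz

            # Step 2: map each bin against its own reference only.
            # nodisk keeps each index in memory so the runs do not share a ref/ dir.
            for r in "${refs[@]}"; do
                bbmap.sh ref="$r.fa" nodisk \
                        in1="out_${r}_1.fq.gz" in2="out_${r}_2.fq.gz" \
                        out="$r.bam"
            done
            ```

            The per-species BAMs can then be loaded into IGV against the matching single-species reference, or passed to a feature counter one species at a time.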



            • #66
              Hi there,
              I am trying to run BBSplit against a huge chromosome-level reference assembly (~24 Gb) plus its contigs that were not assembled into chromosomes (ca. 1 Gb), using the following command on a remote server (the maximum memory I requested on the server is 64G).

              Code:
              bbsplit.sh -Xmx40g ambiguous=toss ambiguous2=toss \
                      in1=HKs_fq/HK002_L1_1_trimmed.fastq.gz in2=HKs_fq/HK002_L1_2_trimmed.fastq.gz \
                      ref=P.tabuliformis_V1.0_contig.fa,P.tabuliformis_V1.0_chr.fa \
                      basename=out_%_#.fq.gz

              However, the reference-merging step produces a much smaller (8 Gb) FASTA, and the mapping step then fails with the following error:

              Code:
              Exception in thread "main" java.lang.AssertionError: Resizing to an non-longer array (2147483627); probable array size overflow.
                      at structures.ByteBuilder.expand(ByteBuilder.java:606)
                      at structures.ByteBuilder.append(ByteBuilder.java:379)
                      at dna.FastaToChromArrays2.nextScaffold(FastaToChromArrays2.java:539)
                      at dna.FastaToChromArrays2.makeNextChrom(FastaToChromArrays2.java:460)
                      at dna.FastaToChromArrays2.makeChroms(FastaToChromArrays2.java:345)
                      at dna.FastaToChromArrays2.main2(FastaToChromArrays2.java:153)
                      at align2.RefToIndex.makeIndex(RefToIndex.java:147)
                      at align2.BBMap.setup(BBMap.java:280)
                      at align2.AbstractMapper.<init>(AbstractMapper.java:58)
                      at align2.BBMap.<init>(BBMap.java:42)
                      at align2.BBMap.main(BBMap.java:30)
                      at align2.BBSplitter.main(BBSplitter.java:48)

              Is there any way for me to handle a genome this large and carry out the merging and mapping properly?
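              The assertion reports an attempt to grow a byte array past 2147483627 entries, i.e. just under Java's 2^31 array-size limit, while FastaToChromArrays2 was reading a single scaffold. That suggests (unverified) that at least one chromosome in the 24 Gb assembly is longer than about 2.1 Gbp and cannot be held in one array. One workaround worth trying is to cut over-long records into shorter pieces before passing the file to BBSplit. A minimal streaming awk sketch; the chunk_fasta name, the _partN suffix, and the 500 Mbp default are illustrative choices, not BBTools features:

              ```shell
              # Hypothetical helper: split every FASTA record into pieces of at most
              # CHUNK bases, named <id>_part1, <id>_part2, ... Every record gets a
              # _partN suffix (short records become a single _part1), because the
              # record length is not known in advance while streaming.
              # Only the first word of each header line is kept as <id>.
              chunk_fasta() {
                  local chunk=${2:-500000000}   # illustrative default, well under 2^31
                  awk -v chunk="$chunk" '
                      /^>/ {
                          split(substr($0, 2), f, /[ \t]/)
                          id = f[1]; part = 0; pos = 0   # start a new input record
                          next
                      }
                      {
                          seq = $0
                          while (length(seq) > 0) {
                              # At a chunk boundary (including the record start), open a new piece.
                              if (pos % chunk == 0) { part++; print ">" id "_part" part }
                              room = chunk - (pos % chunk)   # bases left in this piece
                              piece = substr(seq, 1, room)
                              print piece
                              pos += length(piece)
                              seq = substr(seq, room + 1)
                          }
                      }
                  ' "$1"
              }

              # Run directly when given arguments: chunk_fasta in.fa [chunk_size]
              if [ "$#" -gt 0 ]; then
                  chunk_fasta "$@"
              fi
              ```

              For example, `chunk_fasta P.tabuliformis_V1.0_chr.fa > P.tabuliformis_V1.0_chr.chunked.fa`, then list the chunked file in ref= instead (the bin will then be named after the new file). Reads that straddle a cut point may align poorly or not at all; overlapping the pieces would avoid that at the cost of creating ambiguity within that reference. Whether the full index then fits in 40 GB is a separate question.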
