  • Very low mapping rate when mapping to a de novo assembly

    Hi everyone,

    I am working on a species with little genome information available. I have RNA-seq data for 4 samples and would like to know how many genes are differentially expressed among them. We used Illumina GAII 100 bp paired-end sequencing. First, I combined all the reads (all left reads into one left file and all right reads into one right file) and used the combined reads for de novo assembly with Trinity. Then I mapped each sample's reads back to the assembled sequences. However, I got a very low mapping rate (about 2%) with the paired-end reads. If I map the reads as single-end, I get about a 40% mapping rate. What could the problem be? It has bothered me for a whole week now. Thank you in advance for your help!

    Yan

  • #2
    Have you checked the quality of your reads and looked for possible contamination? Trinity just converts everything to FASTA and then assembles without regard to quality scores or errors, so it can help to get rid of the “junk” first.

    Also, if you had strand-specific data, did you specify that correctly? It shouldn’t make a huge difference in the assembly quality (the contigs will just all be on the wrong strand), but if you have it reversed between the assembly and the mapping commands, that could explain it, as I believe the RSEM in the Trinity package will only try to align on the strands you tell it to.

    Also, if you quality trimmed your reads, you’ll need to use the raw (untrimmed) reads for RSEM. It doesn’t seem to like reads of varying lengths.

    Finally, how many reads do you have total and is this a vertebrate sized transcriptome? Maybe very few paired end reads are mapping simply because your transcriptomic coverage is so low you have very few assembled transcripts long enough to map both sides of the fragments?
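The last point above (contigs too short to hold both mates) can be checked directly from the assembly FASTA. The following is a minimal sketch, not anything Trinity or RSEM provides: the function names are made up for illustration, and the minimum length you compare against is an assumption you would replace with your library's expected fragment size.

```python
def read_fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Contig length at which contigs sorted longest-first reach half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def fraction_at_least(lengths, min_len):
    """Fraction of contigs that are at least min_len bases long."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n >= min_len) / len(lengths)
```

To use it, open the assembly (for example `all_assembly.fasta`), pass the file handle to `read_fasta_lengths`, and call `fraction_at_least` with the fragment length you expect from library prep. If only a small fraction of contigs clears that length, short contigs are a plausible reason for pairs failing to map together.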

    • #3
      Hi Wallysb01,
      I checked the quality of the reads we got from Illumina, and they are good. I didn't trim the reads, so all 100 bp paired-end reads were used for both the assembly and the mapping step. I have strand-specific data, and I am sure I specified that correctly.

      I have about 10M paired-end reads per sample. I think it is a vertebrate-sized transcriptome, with about 30,000 genes. The assembly I got from Trinity has about 87,000 contigs in total, with a length N50 of 874 bp. I'm not sure whether what you mentioned applies here: that very few assembled transcripts are long enough to map both sides of the fragments.

      Or do you have other suggestions?
      Thanks very much!

      • #4
        That all sounds pretty good. That assembly isn't huge in number or length, but it should be enough to give you a lot better than the ~2% you are seeing. Did you convert the read names to have the /1,/2 business?

        I can't think of what else the issue could be. Do you have some sort of non-de novo assembled transcripts you could check against, like a closely related EST set or reference genome?
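The /1,/2 question above can be answered without eyeballing the files. This is a small sketch under two assumptions: the FASTQ files use plain 4-line records, and the two files list mates in the same order. The function names are hypothetical, not part of Trinity or Bowtie.

```python
import re
from itertools import zip_longest


def fastq_names(lines):
    """Yield the read name (first token, without '@') of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header lines are found by position, not by the '@' character
            line = line.strip()
            if line:
                yield line[1:].split()[0]


def strip_suffix(name):
    """Drop a trailing /1 or /2 from a read name."""
    return re.sub(r"/[12]$", "", name)


def check_pairing(left_lines, right_lines):
    """Compare read names record by record between the left and right files."""
    report = {"pairs": 0, "name_mismatch": 0, "missing_suffix": 0, "unpaired": 0}
    pairs = zip_longest(fastq_names(left_lines), fastq_names(right_lines))
    for left, right in pairs:
        if left is None or right is None:
            report["unpaired"] += 1  # one file has more records than the other
            continue
        report["pairs"] += 1
        if not (left.endswith("/1") and right.endswith("/2")):
            report["missing_suffix"] += 1
        if strip_suffix(left) != strip_suffix(right):
            report["name_mismatch"] += 1
    return report
```

Pass it two open file handles, one per mate file. A non-zero `name_mismatch` or `unpaired` count would mean the two files are out of sync, which by itself can wreck paired-end mapping while leaving single-end mapping untouched.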

        • #5
          I'd try some by hand. Pick a single end read that maps and then look why the paired end read doesn't. It shouldn't be too hard to figure out where the paired-end read should go on a 1 kb contig. If you can find it by hand, then something about the strandedness or not interpreting the reverse complement paired end is going on. If it falls off the edge of the contig, then the assembly might need some tuning.
          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
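The by-hand check suggested above can also be run over many reads at once. The sketch below assumes you have mapped the left and right files separately in single-end mode and have one SAM file for each; it takes the first reported alignment per read, assumes ungapped alignments (so the end position is start plus read length minus one), and uses a hypothetical `max_insert` threshold that you would set to match your mapper's limit. The function and category names are made up for illustration.

```python
import re


def first_hits(sam_lines):
    """Map read name -> (contig, start, end, is_reverse) for each read's first alignment."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        name = re.sub(r"/[12]$", "", fields[0].split()[0])
        if name in hits:
            continue
        start = int(fields[3])
        end = start + len(fields[9]) - 1  # valid only for ungapped alignments
        hits[name] = (fields[2], start, end, bool(flag & 0x10))
    return hits


def classify_pairs(mate1_sam, mate2_sam, max_insert=250):
    """Count why pairs whose mates both map single-end might fail as a pair."""
    h1, h2 = first_hits(mate1_sam), first_hits(mate2_sam)
    counts = {"different_contig": 0, "same_strand": 0, "outward_facing": 0,
              "too_far": 0, "looks_concordant": 0}
    for name in h1.keys() & h2.keys():
        c1, s1, e1, r1 = h1[name]
        c2, s2, e2, r2 = h2[name]
        if c1 != c2:
            counts["different_contig"] += 1
        elif r1 == r2:
            counts["same_strand"] += 1
        else:
            fwd_start = s2 if r1 else s1
            rev_start = s1 if r1 else s2
            span = max(e1, e2) - min(s1, s2) + 1
            if fwd_start > rev_start:
                counts["outward_facing"] += 1
            elif span > max_insert:
                counts["too_far"] += 1
            else:
                counts["looks_concordant"] += 1
    return counts
```

If most pairs land in `too_far`, the insert-size limit is the likely suspect; if they land in `same_strand` or `outward_facing`, look at the orientation settings; if they land in `different_contig`, the assembly may be fragmenting transcripts.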

          • #6
            Can you show the commands you are using for Trinity and Bowtie?

            • #7
              I tried another de novo assembly for our species and got similar mapping results with paired-end reads. That's really frustrating.
              The commands I used for Trinity and Bowtie are:
              Trinity assembly:
              Trinity.pl --seqType fq --JM 100G --left oyster-G_all_1.fq --right oyster-G_all_2.fq --CPU 6

              Bowtie alignment
              bowtie-build --offrate 1 all_assembly.fasta oyster_all
              bowtie -a -S -p 8 oyster_all -1 Oyster-2G-idx7_1.fastq -2 Oyster-2G-idx7_2.fastq oyster_2G.sam
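One thing the Bowtie command above leaves at its defaults is the paired-end geometry. In Bowtie 1, `-X`/`--maxins` (maximum insert size, default 250) and `--fr`/`--rf`/`--ff` (mate orientation, default `--fr`) decide whether two mates count as a valid pair, and neither affects single-end mapping. Whether that explains the 2% versus 40% gap here is only a guess. The sketch below just builds the same command with those two options made explicit; the helper name and the example value of 800 are assumptions for illustration, not recommended settings.

```python
import shlex


def bowtie_paired_command(index, mates1, mates2, out_sam,
                          max_insert=250, orientation="fr", threads=8):
    """Build a Bowtie 1 paired-end command with explicit insert size and orientation."""
    if orientation not in {"fr", "rf", "ff"}:
        raise ValueError("orientation must be 'fr', 'rf' or 'ff'")
    return [
        "bowtie", "-a", "-S", "-p", str(threads),
        "-X", str(max_insert),   # maximum insert size for a valid pair
        f"--{orientation}",      # expected relative orientation of the mates
        index, "-1", mates1, "-2", mates2, out_sam,
    ]


# Example: same files as in the post, with a hypothetical larger insert limit.
cmd = bowtie_paired_command("oyster_all", "Oyster-2G-idx7_1.fastq",
                            "Oyster-2G-idx7_2.fastq", "oyster_2G.sam",
                            max_insert=800)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)`. Comparing the paired mapping rate at a few `max_insert` values would show quickly whether the insert-size limit matters for this library.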
