
  • HiSeq small rna data adapter trimming using Adapter_trim.pl (mirTools)

    Hi,

    My HiSeq data for small RNA looks like this


    @HWI-ST705:254:C0G8HACXX:5:1101:1674:1995 1:N:0:AGTCAA
    TGAGATGAAGCACTGTAGCTCTGGAATTCTCGGGT
    +
    CCCFFFFFHHHHHJJJJJJJIJJJJJJJJJJJJJH
    @HWI-ST705:254:C0G8HACXX:5:1101:1765:1986 1:N:0:AGTCAA
    TGAGAACTGAATTCCATAGGCTGTTGGAATTCTCG
    +
    BCCFFFFDHFHHHIJJJIJJJJJHHJJEHIJIJJJ
    @HWI-ST705:254:C0G8HACXX:5:1101:1785:1990 1:N:0:AGTCAA
    GCTCTGTGATGAACCCTGGAATTCTCGGGTGCCAA
    +
    =?@DFFFA=AADFHIJJJJGAHHGIIIJI?CFGHC
    @HWI-ST705:254:C0G8HACXX:5:1101:1825:1999 1:N:0:AGTCAA
    TTTGGCAATGGTAGAACTCACACCTGGAATTCTCG
    +
    CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJ
    @HWI-ST705:254:C0G8HACXX:5:1101:2182:1985 1:N:0:AGTCAA
    CAACNGAATCCCAAAAGCAGCTGTGGAATTCTCGG
    +
    @@@D#2=BDDHFHBGHHIIIGHHHIGGGHH<?DHI
    @HWI-ST705:254:C0G8HACXX:5:1101:2106:1988 1:N:0:AGTCAA
    TAGCTTATCAGACTGATGTTGACTTGGAATTCTCG
    +
    ??@FFFD+=CFFFHGIJJGIHHHHHJCFHEHHHDH
    @HWI-ST705:254:C0G8HACXX:5:1101:2543:1995 1:N:0:AGTCAA
    TTCACAGTGGCTAAGTTCTGCTGGAATTCTCGGGT
    +
    CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJH

    When I use the Adapter_trim.pl script from miRTools with format option "3" (for Illumina format 1.3+), or even "2" for the older formats, I get an empty output file.

    The previous Illumina datasets had the complete IDs repeated before the quality value lines, and the script used to work well for me.

    Any suggestions?

    regards,

    Nandan
    P.S. The read grouping script from miRAnalyser also gives me issues with the HiSeq dataset (again, this worked well for GAII Illumina data).
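
A minimal sketch of 3' adapter trimming for records like the ones above, where the `+` line is bare rather than followed by the read ID. It is not a reimplementation of Adapter_trim.pl, and whether the bare `+` line is what breaks that script is only a guess. The function names, the `min_overlap` and `min_len` defaults, and the adapter sequence (assumed here to be the TruSeq Small RNA 3' adapter, which matches the `TGGAATTCTCG...` tails in the reads shown) are assumptions to adjust for your own library.

```python
import io
from typing import Iterator, TextIO, Tuple

# Assumption: 3' adapter of the Illumina TruSeq Small RNA kit.
# Replace with the adapter actually used in your library prep.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records.

    Only the leading '+' of the separator line is checked, so both a bare
    '+' and a '+' followed by the repeated read ID are accepted.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, qual


def adapter_start(seq: str, adapter: str = ADAPTER, min_overlap: int = 8) -> int:
    """Return the index where the adapter begins, or len(seq) if not found.

    Exact matching only: first the full adapter anywhere in the read, then
    an adapter prefix of at least min_overlap bases at the 3' end.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    first = max(0, len(seq) - len(adapter) + 1)
    for start in range(first, len(seq) - min_overlap + 1):
        if adapter.startswith(seq[start:]):
            return start
    return len(seq)


def trim_fastq(src: TextIO, dst: TextIO, min_len: int = 15) -> int:
    """Write trimmed records to dst and return how many were kept."""
    kept = 0
    for header, seq, qual in read_fastq(src):
        cut = adapter_start(seq)
        if cut < min_len:
            continue  # insert too short after trimming
        dst.write(f"{header}\n{seq[:cut]}\n+\n{qual[:cut]}\n")
        kept += 1
    return kept
```

Called as `trim_fastq(open("reads.fastq"), open("trimmed.fastq", "w"))` (file names hypothetical), the first read shown above would be cut to `TGAGATGAAGCACTGTAGCTC`. Reads in which no adapter is detected are kept untrimmed here; for small RNA data you may prefer to discard them. Mismatches are not tolerated, so a dedicated trimmer is likely the better choice for real data.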

  • #2
    I don't know miRTools, but I am pretty convinced that Biopieces will be quite helpful for this. Especially find_adaptor.


    • #3
      To my knowledge, the best way to do adapter filtering/trimming is



      If you are not a fan of Unix systems, then try Galaxy.
      Galaxy is a community-driven web-based analysis platform for life science research.


      • #4
        Hi all:
        I am new to Illumina sequencing and have a very basic question. Do the sequences we get after running the Illumina pipeline contain adapters in all reads, or only in a few?

        Recently we did a multiplexed sequencing run on a HiSeq 2000, and in the FASTQ file only a few reads (5%) contain adapters or primers. I searched with the adapter and primer sequences used in library prep (from Illumina TruSeq).

        I read somewhere that when the pipeline demultiplexes, it trims the reads and removes the barcode. Is that true?

        Please reply or direct me to some literature that explains the basics.

        Thank you
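
The 5% figure mentioned above can be checked with a few lines of code. This is a rough sketch rather than a standard tool: `adapter_fraction` and its `probe_len` parameter are hypothetical names, and only the first `probe_len` bases of the adapter are searched for, with exact matching. As general background, adapter sequence shows up in a read only when the sequenced insert is shorter than the read length, so a low fraction is not unusual for libraries with long inserts.

```python
from typing import Iterable


def adapter_fraction(seqs: Iterable[str], adapter: str, probe_len: int = 10) -> float:
    """Return the fraction of reads containing the first probe_len adapter bases.

    Exact substring matching only; sequencing errors inside the probe
    will cause a read to be missed.
    """
    probe = adapter[:probe_len]
    total = 0
    hits = 0
    for seq in seqs:
        total += 1
        if probe in seq:
            hits += 1
    return hits / total if total else 0.0
```

For example, `adapter_fraction(reads, "TGGAATTCTCGGGTGCCAAGG")` would report how many reads in `reads` carry the start of that adapter; substitute the adapter from your own library prep.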


        • #5
          I am facing a similar problem.

          I was interested in getting some information from publicly available HiSeq 2000 small RNA-seq data from Drosophila.
          However, the library was prepared by cloning and not using TruSeq (as reported in the SRA, accession number SRR513393).

          This isn't a great concern. I used FastQC to analyse the reads, and the quality distribution seemed pretty okay (PFA). However, no overrepresented sequence was detected, and I am unsure of the adapter sequences. The reads are 50 nt long (more than twice the size of any miRNA or similar RNA).

          I used bowtie v0.12.9 to align the reads against the Drosophila transcriptome index that I built from FlyBase transcripts release v5.49, with options (-v 2 --norc -a --best --strata). No reads aligned, and I suspect that this is because of some bogus sequence filling up the ends. I am not able to detect what those bogus sequences might be.

          Any tips for preprocessing?

          Also, when I used TopHat to align the reads against the genome index with annotations from the FlyBase GFF file v5.49, TopHat stopped with the report "gtf_to_fasta returned an error" (isn't TopHat supposed to accept GFF v3 files?).
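
On the unknown-adapter question above: one simple heuristic is to count which k-mers recur in the 3' portion of the reads, since an adapter following short inserts should appear in many reads even when its position varies. The sketch below is only that heuristic, not what FastQC or any trimmer does internally; `common_tail_kmers` and its `k`, `skip`, and `top` parameters are hypothetical, and `skip=20` assumes the first 20 bases are mostly insert.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def common_tail_kmers(
    seqs: Iterable[str], k: int = 12, skip: int = 20, top: int = 5
) -> List[Tuple[str, int]]:
    """Return the most frequent k-mers found after the first `skip` bases.

    Each k-mer is counted at most once per read, so the count is the
    number of reads whose tail contains it.
    """
    counts: Counter = Counter()
    for seq in seqs:
        tail = seq[skip:]
        counts.update({tail[i:i + k] for i in range(len(tail) - k + 1)})
    return counts.most_common(top)
```

If the top k-mers overlap one another and occur in a large share of the reads, they are candidates for the adapter and can be assembled by hand into a longer sequence to trim. Highly abundant small RNAs can also dominate such a list, so the result needs checking against known adapter sequences.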