
**Brian Bushnell** · 09-10-2015, 09:27 AM

You can do that with "filterbyname.sh" in the BBMap package.

```
filterbyname.sh in=reads.fq out=filtered.fq include=t names=names.txt
```

...where names.txt has one name per line. Alternatively, you can give a single name directly, e.g. "names=X01032:109:000000000-AGKF7:1:1101:11950:1779". Reads are still included if their headers contain extra, non-matching text after the first whitespace. You should not include the leading "@" in the query, as it is not part of the name. But if you do include the leading "@" for whatever reason, add the flag "truncateheadersymbol".
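To make the matching rule above concrete (the name is the header text after "@", up to the first whitespace), here is a minimal Python sketch. It is not BBMap's code: `read_name` and `load_names` are hypothetical helpers, the `strip_at` option only approximates the situation `truncateheadersymbol` is described as handling, and the header comment "1:N:0:1" is made up for the example.

```python
from typing import Iterable, Set


def read_name(header: str) -> str:
    """Read name = header text after the leading '@', up to the first whitespace."""
    text = header.rstrip("\n")
    if text.startswith("@"):
        text = text[1:]
    parts = text.split(maxsplit=1)
    return parts[0] if parts else ""


def load_names(lines: Iterable[str], strip_at: bool = False) -> Set[str]:
    """Build the query set from one-name-per-line input.

    strip_at=True drops a leading '@' from each query (hypothetical option,
    roughly the case the truncateheadersymbol flag is meant for)."""
    names: Set[str] = set()
    for line in lines:
        query = line.strip()
        if strip_at and query.startswith("@"):
            query = query[1:]
        if query:  # skip blank lines
            names.add(query)
    return names


name = "X01032:109:000000000-AGKF7:1:1101:11950:1779"
header = "@" + name + " 1:N:0:1"  # made-up text after the first whitespace

print(read_name(header) in load_names([name]))                        # True: trailing text ignored
print(read_name(header) in load_names(["@" + name]))                  # False: '@' left in the query
print(read_name(header) in load_names(["@" + name], strip_at=True))   # True
```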

**loba17** · 09-11-2015, 12:48 AM

Works - problem solved!

Dear Brian,

thanks for your suggestion!

I downloaded BBMap and tried filterbyname.sh.

```
filterbyname.sh in=in.fq out=out.fq names=select.list include=t truncateheadersymbol

Input is being processed as unpaired
Time:               53.202 seconds.
Reads Processed:    5747570 	108.03k reads/sec
Bases Processed:    2296943848 	43.17m bases/sec
Reads Out:          65246
Bases Out:          25944173
```

Number of reads for in.fq: 5,747,570
Number of headers selected: 66,182
Number of reads for out.fq: 65,246
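The numbers above can be cross-checked with a little arithmetic. The Python below reuses only figures from this post. The last step rests on an assumption the thread does not state: that select.list has no duplicate names and that each name matches at most one read. Under that assumption, 936 listed names found no read, which would account for the gap between 66,182 and 65,246.

```python
# Figures copied from the filterbyname.sh log and the counts posted above.
reads_in = 5_747_570
bases_in = 2_296_943_848
seconds = 53.202
names_selected = 66_182
reads_out = 65_246
bases_out = 25_944_173

# Throughput recomputed from the raw counts; should agree with the log.
reads_per_sec = reads_in / seconds   # about 108.03k reads/sec
bases_per_sec = bases_in / seconds   # about 43.17m bases/sec

# Share of input reads kept, and mean read length before and after filtering.
kept_fraction = reads_out / reads_in   # about 1.135%
mean_len_in = bases_in / reads_in      # about 399.6 bp
mean_len_out = bases_out / reads_out   # about 397.6 bp

# ASSUMPTION: names in select.list are distinct and each matches at most one
# read. Under that assumption, this many listed names matched no read at all.
names_without_read = names_selected - reads_out  # 936

print(f"{reads_per_sec / 1e3:.2f}k reads/sec, {bases_per_sec / 1e6:.2f}m bases/sec")
print(f"kept {kept_fraction:.3%} of reads; mean length {mean_len_in:.1f} -> {mean_len_out:.1f} bp")
print(f"listed names with no matching read: {names_without_read}")
```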

Works great and I really like the output summary!

Question 1: Is there a way (setting) to get a list of the records that did not match?

Question 2: BBMap seems to be a nice and very useful collection of tools (thanks a lot!), but is there an overview or summary that briefly describes the tools?

Thanks for the help!

**GenoMax** · 09-11-2015, 08:17 AM

> Originally posted by loba17
>
> Question 2: BBMap seems to be a nice and very useful collection of tools (thanks a lot!), but is there an overview or summary that briefly describes the tools?
>
> Thanks for the help!

See this thread for a recap of many things BBMap can do: http://seqanswers.com/forums/showthread.php?t=58221

I would suggest trying outu=filename with your command to see if that captures reads that did not match.

**Brian Bushnell** · 09-11-2015, 09:27 AM

> Originally posted by GenoMax
>
> I would suggest trying outu=filename with your command to see if that captures reads that did not match.

You know, to be consistent, I should really add that (I'll make a note to do so)! Unfortunately, filterbyname does not currently support outu. Instead, you need to run it twice: once with "include=t" to capture the matching reads, and once with "include=f" to capture the nonmatching reads.
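The two-run approach works because "include=t" and "include=f" select complementary sets, so every read ends up in exactly one output. For comparison, here is a minimal Python sketch of the single-pass split that an outu-style option would give; it also reports listed names that never matched a read, the other possible reading of Question 1. This is not BBMap code: `fastq_records`, `read_name` and `split_by_name` are hypothetical helpers, and the sketch assumes uncompressed FASTQ with exactly four lines per record, matching on the header text up to the first whitespace.

```python
import io
from typing import Iterable, Iterator, List, Set, TextIO, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into 4-line FASTQ records (assumes no wrapped sequence lines)."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield record
            record = []
    if record:
        raise ValueError("truncated FASTQ: last record has fewer than 4 lines")


def read_name(header: str) -> str:
    """Header text after the leading '@', up to the first whitespace."""
    text = header.rstrip("\n")
    if text.startswith("@"):
        text = text[1:]
    parts = text.split(maxsplit=1)
    return parts[0] if parts else ""


def split_by_name(lines: Iterable[str], names: Set[str],
                  matched_out: TextIO, unmatched_out: TextIO) -> Tuple[int, int, Set[str]]:
    """Stream each record to exactly one output in a single pass.

    Returns (matched count, unmatched count, listed names never seen)."""
    seen: Set[str] = set()
    n_match = n_other = 0
    for record in fastq_records(lines):
        name = read_name(record[0])
        if name in names:
            matched_out.writelines(record)
            seen.add(name)
            n_match += 1
        else:
            unmatched_out.writelines(record)
            n_other += 1
    return n_match, n_other, names - seen


# Tiny in-memory example; the read names are made up.
example = io.StringIO(
    "@r1 1:N:0:1\nACGT\n+\nIIII\n"
    "@r2 1:N:0:1\nTTGA\n+\nIIII\n"
    "@r3 1:N:0:1\nGGCC\n+\nIIII\n"
)
hits, misses = io.StringIO(), io.StringIO()
print(split_by_name(example, {"r1", "r3", "r9"}, hits, misses))  # (2, 1, {'r9'})
```

For a real file you would pass an open file handle as `lines`. Records are streamed rather than collected, so memory use grows with the size of the name set, not with the number of reads.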

**loba17** · 09-14-2015, 04:55 AM

Thanks

Dear Brian, thanks for the clarification and the help.

**maubp** · 09-21-2015, 03:38 AM

My Python script with a Galaxy interface:

GitHub (peterjc/pico_galaxy, tools/seq_filter_by_id; Galaxy tools and wrappers for sequence analysis): https://github.com/peterjc/pico_galaxy/tree/master/tools/seq_filter_by_id

Galaxy Tool Shed: http://toolshed.g2.bx.psu.edu/view/peterjc/seq_filter_by_id

Illumina Fastq Header Search
