Piphillin is a tool that infers a metagenome from a 16S rRNA OTU count table and the representative sequence of each OTU.

The algorithm matches each 16S rRNA representative sequence to its nearest neighbor among the 16S rRNA sequences in reference genomes, using a user-specified identity cutoff. Because the algorithm is this simple, an up-to-date, comprehensive genome database can be used as the reference.
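
The nearest-neighbor idea can be illustrated with a minimal Python sketch. It is not Piphillin's implementation: it assumes the sequence search has already been run and that its results are available as a list of (genome, % identity) hits per OTU, and it leaves out any further normalization the real tool may apply. All names (nearest_genome, infer_gene_counts, genomeA, the KO identifiers) and all numbers are hypothetical toy data.

```python
from collections import defaultdict


def nearest_genome(hits, cutoff):
    """Return the genome with the highest % identity, or None if it is below the cutoff."""
    if not hits:
        return None
    genome, identity = max(hits, key=lambda hit: hit[1])
    return genome if identity >= cutoff else None


def infer_gene_counts(otu_counts, otu_hits, genome_genes, cutoff=97.0):
    """Sum gene copies over OTUs for one sample, weighted by each OTU's count.

    Simplification (assumption): no normalization is applied here.
    """
    totals = defaultdict(float)
    for otu, count in otu_counts.items():
        genome = nearest_genome(otu_hits.get(otu, []), cutoff)
        if genome is None:
            continue  # no reference genome at or above the cutoff: OTU is dropped
        for gene, copies in genome_genes[genome].items():
            totals[gene] += count * copies
    return dict(totals)


# Hypothetical toy data for a single sample.
otu_counts = {"OTU1": 10, "OTU2": 5, "OTU3": 2}
otu_hits = {
    "OTU1": [("genomeA", 99.2), ("genomeB", 97.5)],
    "OTU2": [("genomeB", 98.0)],
    "OTU3": [("genomeA", 91.0)],  # best hit is below 97%, so OTU3 is ignored
}
genome_genes = {
    "genomeA": {"K00001": 1, "K00002": 2},
    "genomeB": {"K00002": 1},
}

result = infer_gene_counts(otu_counts, otu_hits, genome_genes, cutoff=97.0)
print(result)
```

In this toy case OTU1 is assigned to genomeA, OTU2 to genomeB, and OTU3 contributes nothing because its best hit falls below the cutoff.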

1. Upload input files

OTU abundance table (size limit 2MB): The file should contain one OTU per row and one sample per column. The OTU column must have the header 'OTU'. Save the file in .csv format.

Representative sequence file (size limit 1MB): The file should contain the representative sequences of all OTUs in the OTU abundance table, in FASTA format.
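
Before uploading, the two files can be checked locally. The following Python sketch is only an illustration of the requirements described above, not a validator provided by Piphillin. It assumes that the OTU column is the first column, that each FASTA header starts with the OTU identifier used in the table, and that "2MB" and "1MB" mean 2 and 1 times 1024 * 1024 bytes. The function name check_inputs is hypothetical.

```python
import csv
import io

MAX_TABLE_BYTES = 2 * 1024 * 1024  # assumption: "2MB" read as 2 MiB
MAX_FASTA_BYTES = 1 * 1024 * 1024  # assumption: "1MB" read as 1 MiB


def check_inputs(table_text, fasta_text):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if len(table_text.encode("utf-8")) > MAX_TABLE_BYTES:
        problems.append("OTU table exceeds the size limit")
    if len(fasta_text.encode("utf-8")) > MAX_FASTA_BYTES:
        problems.append("FASTA file exceeds the size limit")

    rows = list(csv.reader(io.StringIO(table_text)))
    if not rows or not rows[0] or rows[0][0].strip() != "OTU":
        problems.append("first column header must be 'OTU'")

    # OTU identifiers from the table (first column, header row skipped).
    otu_ids = {row[0].strip() for row in rows[1:] if row and row[0].strip()}
    # Identifiers from FASTA headers: text after '>' up to the first whitespace.
    fasta_ids = {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }
    missing = sorted(otu_ids - fasta_ids)
    if missing:
        problems.append("OTUs without a representative sequence: " + ", ".join(missing))
    return problems


# Hypothetical toy inputs.
good_table = "OTU,S1,S2\nOTU1,3,0\nOTU2,1,4\n"
good_fasta = ">OTU1\nACGT\n>OTU2\nGGCC\n"
print(check_inputs(good_table, good_fasta))            # no problems expected
print(check_inputs("ID,S1\nOTU1,3\n", ">OTU2\nACGT\n"))  # wrong header, missing sequence
```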

The maximum file size is 2MB for the OTU table and 1MB for the FASTA file (roughly 100 samples and 3000 sequences). If your data exceed these limits, split the samples into subsets and submit them separately, or contact us for further assistance.
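
One way to split samples into subsets is sketched below in Python. This is an illustration only: the function name split_samples and the chunk size are hypothetical, and the sketch assumes the OTU column is the first column. Each subset keeps the 'OTU' column and all OTU rows, so the same representative sequence file is assumed to accompany every submission.

```python
import csv
import io


def split_samples(table_text, max_samples):
    """Split a CSV OTU table into several CSV strings with at most max_samples sample columns each."""
    rows = list(csv.reader(io.StringIO(table_text)))
    header = rows[0]
    chunks = []
    # Column 0 is the OTU column; sample columns start at index 1.
    for start in range(1, len(header), max_samples):
        stop = min(start + max_samples, len(header))
        cols = [0] + list(range(start, stop))
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        for row in rows:
            writer.writerow([row[i] for i in cols])
        chunks.append(out.getvalue())
    return chunks


# Hypothetical toy table with three samples, split into sets of at most two.
table = "OTU,S1,S2,S3\nOTU1,1,2,3\nOTU2,4,5,6\n"
subsets = split_samples(table, max_samples=2)
for subset in subsets:
    print(subset)
```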

2. Choose reference database

Currently we support KEGG and BioCyc as reference databases; you can choose either or both. Choosing 'KEGG' produces a KEGG Ortholog (KO) count table, and choosing 'BioCyc' produces a BioCyc reaction (RXN) count table. You can also select an older database version.

3. Choose % identity cutoff

This % identity is used to match representative sequences to 16S rRNA genes in the reference genome database. A loose cutoff lets more sequence counts be used in the analysis, but it may add noise and lower the accuracy of the inferred metagenome. A stringent cutoff increases the accuracy of the inferred metagenome, but fewer sequences are used in the analysis. For samples whose genomes are relatively well characterized (such as clinical microbiome samples), a cutoff of 97% gave the highest accuracy.
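
The trade-off can be made concrete by computing how much of a sample's sequence count survives at different cutoffs. The Python sketch below assumes that the best % identity of each OTU against the reference database is already known. The function name fraction_retained and all numbers are hypothetical toy data, not Piphillin output.

```python
def fraction_retained(otu_counts, best_identity, cutoff):
    """Fraction of sequence counts whose OTU has a best match at or above the cutoff."""
    total = sum(otu_counts.values())
    if total == 0:
        return 0.0
    kept = sum(
        count
        for otu, count in otu_counts.items()
        if best_identity.get(otu, 0.0) >= cutoff
    )
    return kept / total


# Hypothetical toy data: counts for one sample and each OTU's best % identity.
otu_counts = {"OTU1": 50, "OTU2": 30, "OTU3": 20}
best_identity = {"OTU1": 99.5, "OTU2": 97.2, "OTU3": 92.0}

for cutoff in (90.0, 97.0, 99.0):
    share = fraction_retained(otu_counts, best_identity, cutoff)
    print(f"cutoff {cutoff}%: {share:.0%} of counts retained")
```

With these toy numbers, raising the cutoff from 90% to 99% reduces the retained counts from all of them to half, which is the loss of coverage that a stringent cutoff trades for closer matches.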

4. Submit data

You will receive a result download URL by email within 20 minutes. Make sure the email address you entered above is correct.

