PON-Sol2

Citing PON-Sol2

  1. PON-Sol2 was developed by Lianjie Zeng, Yang Yang and Mauno Vihinen. A manuscript describing the method has been submitted.

  2. The citation for the original PON-Sol predictor is:

    Yang Yang, Abhishek Niroula, Bairong Shen, Mauno Vihinen. PON-Sol: prediction of effects of amino acid substitutions on protein solubility. Bioinformatics, Volume 32, Issue 13, 1 July 2016, Pages 2032–2034, https://doi.org/10.1093/bioinformatics/btw066.

Start prediction

To start using PON-Sol2, you only need to provide a protein sequence and amino acid substitution information. Protein sequences can be given in two ways: directly as FASTA sequences, or as a list of IDs (GI, Ensembl ID or UniProt). The data can be typed or pasted into the boxes in the input forms, or uploaded as a file. Both input types allow variants for several proteins to be submitted at once.

In protein prediction, PON-Sol2 predicts all 19 possible single amino acid substitutions at each position of the protein.

To see an input example, click the "Example" text on the input page.

Input FASTA sequences

If complete FASTA sequences are available, you can paste them into the input FASTA sequences box. FASTA sequence(s) and amino acid substitution(s) must be provided; e-mail is optional. If an e-mail address is provided, the results will be sent to it when ready.

  1. Each FASTA sequence must contain a header line starting with a greater-than sign (>), followed by the amino acid sequence. The amino acid sequence must start on a new line.
  2. The amino acid substitution information must use the same header line as the sequence. An amino acid substitution consists of three parts in HGVS format: original amino acid, position, and new amino acid. For example, "A2M" means that the second amino acid, A (alanine), is substituted by M (methionine). Use single-letter amino acid codes. Each protein sequence can have multiple amino acid substitutions, each on its own line.
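
The substitution format described above can be checked before submission. The following is a minimal sketch, not part of PON-Sol2 itself: the function names `parse_substitution` and `check_against_sequence` are hypothetical, and the sketch assumes only the 20 standard single-letter amino acid codes are allowed.

```python
import re

# One-letter codes of the 20 standard amino acids (assumption: no other codes allowed).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# <original amino acid><position><new amino acid>, e.g. "A2M".
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(text):
    """Split e.g. 'A2M' into (original, position, new); position is 1-based."""
    match = SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not in <original><position><new> form: {text!r}")
    original, position, new = match.group(1), int(match.group(2)), match.group(3)
    if original not in AMINO_ACIDS or new not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {text!r}")
    return original, position, new


def check_against_sequence(sequence, text):
    """Verify that the original amino acid really occurs at the given position."""
    original, position, new = parse_substitution(text)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    found = sequence[position - 1]
    if found != original:
        raise ValueError(f"{text!r}: position {position} is {found}, not {original}")
    return original, position, new
```

For instance, `check_against_sequence("MAFMKKYLL", "F3K")` passes because the third residue is F, whereas `"A1M"` is rejected because the first residue is M.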

Example

  1. FASTA Sequence

    >AAK83653.1 11-beta hydroxysteroid dehydrogenase type 1 [Homo sapiens]
    MAFMKKYLLPILGLFMAYYYYSANEEFRPEMLQGKKVIVTGASKGIGREMAYHLAKMGAHVVVTARSKETLQKVVSHCLELGAASAHYIAGTMEDMTFAEQFVAQAGKLMGGLDMLILNHITNTSLNLFHDDIHHVRKSMEVNFLSYVVLTVAALPMLKQSNGSIVVVSSLAGKVAYPMVAAYSASKFALDGFFSSIRKEYSVSRVNVSITLCVLGLIDTETAMKAVSGIVHMQAAPKEECALEIIKGGALRQEEVYYDSSLWTTLLIRNPCRKILEFLYSTSYNMDRFINK

    >sp|Q8NH21|OR4F5_HUMAN Olfactory receptor 4F5 OS=Homo sapiens OX=9606 GN=OR4F5 PE=3 SV=1
    MVTEFIFLGLSDSQELQTFLFMLFFVFYGGIVFGNLLIVITVVSDSHLHSPMYFLLANLSLIDLSLSSVTAPKMITDFFSQRKVISFKGCLVQIFLLHFFGGSEMVILIAMGFDRYIAICKPLHYTTIMCGNACVGIMAVTWGIGFLHSVSQLAFAVHLLFCGPNEVDSFYCDLPRVIKLACTDTYRLDIMVIANSGVLTVCSFVLLIISYTIILMTIQHRPLDKSSKALSTLTAHITVVLLFFGPCVFIYAWPFPIKSLDKFLAVFYSVITPLLNPIIYTLRNKDMKTAIRQLRKWDAHSSVKF

  2. Amino Acid Substitution

    >AAK83653.1 11-beta hydroxysteroid dehydrogenase type 1 [Homo sapiens]
    M1A
    F3K
    >sp|Q8NH21|OR4F5_HUMAN Olfactory receptor 4F5 OS=Homo sapiens OX=9606 GN=OR4F5 PE=3 SV=1
    V2E
    T3F
    

Input Protein IDs

PON-Sol2 also accepts sequence IDs. As with FASTA input, the protein IDs, the amino acid substitutions and the ID type must be provided; e-mail is optional. If an e-mail address is provided, the results will be sent to it when ready.

Each ID should be preceded by a greater-than sign (>). The amino acid substitutions follow, starting on the next line, with all the variants of a protein given in a single list. After that, the details for another sequence can be provided. Substitutions are given in the HGVS format.
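
The layout described above (a ">" header line followed by one substitution per line) can be read with a few lines of code. This is a minimal sketch for preparing or checking input locally; `parse_variant_block` is a hypothetical helper name, not a PON-Sol2 function, and the sketch assumes blank lines may be ignored.

```python
def parse_variant_block(text):
    """Group substitution lines under the '>' header line that precedes them."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line.startswith(">"):
            # A new header starts the variant list of the next protein.
            current = line[1:].strip()
            records.setdefault(current, [])
        elif current is None:
            raise ValueError(f"substitution {line!r} appears before any '>' header")
        else:
            records[current].append(line)
    return records


example = """>Q5VT03
M1A
F2Q
>Q8NH21
V2A
"""
print(parse_variant_block(example))
```

The result maps each ID to the list of its substitutions, which makes it easy to spot a variant that was placed under the wrong header.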

Example

  1. ID and amino acid substitution

    >Q5VT03
    M1A
    F2Q
    >Q8NH21
    V2A

  2. ID Type

    UniProtKB/Swiss-Prot ID
    

Protein Prediction

Protein prediction predicts all 19 possible single amino acid substitutions at every position. Only one protein can be submitted at a time. Provide the sequence either in FASTA format or as an ID (GI, Ensembl ID or UniProt). The prediction takes a long time, so please be patient; the results will be sent by e-mail.
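
To illustrate why protein prediction is slow, the sketch below enumerates the variants it has to cover: every position can be replaced by each of the 19 other amino acids. This is an illustration only, not the PON-Sol2 implementation; `all_single_substitutions` is a hypothetical name, and the sketch assumes the sequence contains only the 20 standard amino acids.

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_substitutions(sequence):
    """Yield every single substitution, e.g. 'M1A', using 1-based positions."""
    for position, original in enumerate(sequence, start=1):
        for new in AMINO_ACIDS:
            if new != original:  # skip the unchanged residue, leaving 19 options
                yield f"{original}{position}{new}"


variants = list(all_single_substitutions("MAF"))
print(len(variants))  # 19 substitutions for each of the 3 positions
```

The number of variants grows as 19 times the sequence length, so a protein of a few hundred residues already produces several thousand predictions.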

Prediction results

PON-Sol2 presents the results on separate web pages and, if an e-mail address is provided, also mails them to the submitter. Each submission is called a "Task" and each amino acid substitution a "Record"; there is a detailed page for each.

Data sets

Extensive data mining was performed to obtain cases for training and testing.

Dataset                Variations   Decrease   No-change   Increase
Training dataset             5666       2798        1929        939
Blind Test1 dataset            46         12          22         12
Blind Test2 dataset           662        338         237         87
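
In each data set the three class counts (decrease, no-change, increase) add up to the number of variations. The short sketch below checks this and computes the class proportions; the dictionary layout and the name `class_fractions` are illustrative choices, and only the numbers come from the table.

```python
CLASSES = ("decrease", "no-change", "increase")

# Counts copied from the data set table.
DATASETS = {
    "Training dataset": {"variations": 5666, "decrease": 2798, "no-change": 1929, "increase": 939},
    "Blind Test1 dataset": {"variations": 46, "decrease": 12, "no-change": 22, "increase": 12},
    "Blind Test2 dataset": {"variations": 662, "decrease": 338, "no-change": 237, "increase": 87},
}


def class_fractions(row):
    """Return the share of each class after checking that the counts add up."""
    counted = sum(row[name] for name in CLASSES)
    if counted != row["variations"]:
        raise ValueError(f"class counts sum to {counted}, expected {row['variations']}")
    return {name: row[name] / row["variations"] for name in CLASSES}


for dataset, row in DATASETS.items():
    shares = class_fractions(row)
    print(dataset, {name: round(share, 3) for name, share in shares.items()})
```

The proportions show how unevenly the classes are represented, which is worth keeping in mind when comparing results across the data sets.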

The Blind Test1 data set was originally used to test PON-Sol. It is available for download.

Mirror website

Contact

If you have any problems, please contact Lianjie Zeng (1709404010@stu.suda.edu.cn).