If you find the StructureMapper algorithm useful, please cite the following article in your work:
StructureMapper: a high-throughput algorithm for analyzing and mapping protein sequence locations to structural data
Anssi Nurminen¹ and Vesa P. Hytönen¹,²
¹Faculty of Medicine and Life Sciences and BioMediTech, University of Tampere, Arvo Ylpön katu 34, 33520 Tampere, Finland
²Fimlab Laboratories, Biokatu 4, 33520 Tampere, Finland
Keywords: bioinformatics; algorithm; protein structure; protein sequence; accessible surface area.
Motivation: StructureMapper is a high-throughput algorithm for automated mapping of locations in protein primary amino acid sequences to existing three-dimensional protein structures. The algorithm is intended to facilitate easy and efficient use of structural information in protein characterization and proteomics. StructureMapper provides an analysis of the identified structural locations that includes surface accessibility, flexibility, protein-protein interfaces, intrinsic disorder prediction, secondary structure assignment, biological assembly information, and sequence identity percentages, among other metrics.
Results: We have showcased the use of the algorithm by estimating the coverage of the human proteome by structural information, identifying critical interface residues in DNA polymerase γ, structurally profiling protease cleavage sites and post-translational modification sites, and identifying putative novel phosphoswitches.
Availability: The StructureMapper algorithm is available as an online service and standalone implementation at http://structuremapper.uta.fi.
Full article: Bioinformatics 2018, open access.
StructureMapper is open-source (MIT license) Python software and is available for download on GitHub.
The StructureMapper online server does not guarantee long-term storage of query results. Depending on server load, results are deleted without notice on a first-in, first-out (FIFO) basis to free up space on the server.
Column | Description |
--- | --- |
DATASERIES | Name for the processing run. |
SEQ_ID | Input POI ID, consisting of the sequence ID and the position of the POI. |
GENE | Gene information from the input FASTA header. |
DESC | Sequence description from the input FASTA header. |
N_STRUCTURES | Number of structures found and analyzed for the POI. |
PYMOL | If PyMOL (third-party software) is installed and configured, this column contains a clickable link to view the POI in the result structure (the first BLAST structure). |
PYMOL_BIOMOL | If PyMOL (third-party software) is installed and the algorithm has been run with the '--biomol' option, this column contains a clickable link to open the result biological assembly ('biomol', first structure). |
ASA_AVG | Average accessible surface area (ASA) percentage for the POI in the result structures. This column contains the value from BIOL_ASA_AVG if available, otherwise from ISOL_ASA_AVG. Values: 0-15.0% Buried, 15.0-25.0% Intermediate, 25.0-100.0% Surface. |
TEMPF_SCORE | Normalized temperature factor (TempF, B-factor) score averaged over the result structures. TempF can be used as a measure of the flexibility of the structure at the POI, but its values depend on the experimental method and the resolution of the structure. The TEMPF_SCORE column therefore normalizes the TempF value of each POI to the range 0.0-100.0 within its structure file: a score of 100.0 means that the POI has the maximum TempF value in the structure (high flexibility), and a score of 0.0 means that it is among the least flexible regions of the structure. |
IUPRED | IUPred prediction of intrinsic disorder at the POI; values above 0.5 indicate predicted intrinsic disorder. For more information: http://iupred.enzim.hu/ |
DSSP | DSSP secondary structure assignment at the POI. http://swift.cmbi.ru.nl/gv/dssp/ |
ASYM_ASA_DECISION | Most common description of POI location in the native PDB (asymmetric unit) result structures (Surface/Intermediate/Buried) |
ASYM_ASA_DECISIONS | All descriptions of POI location in the native PDB (asymmetric unit) result structures (Surface/Intermediate/Buried) |
ASYM_ASA_VALS | Accessible surface area (ASA) percentages of the POI in the native PDB (asymmetric unit) result structures. If the POI contains multiple residues, their values are averaged. |
ASYM_ASA_AVG | Average of the values in ASYM_ASA_VALS. |
ISOL_ASA_DECISION | Most common description of POI location in the isolated chain in result structures (Surface/Intermediate/Buried) |
ISOL_ASA_DECISIONS | All descriptions of POI location in the isolated chain in result structures (Surface/Intermediate/Buried) |
ISOL_ASA_VALS | Accessible surface area (ASA) percentages of the POI in the isolated chain in the result structures. If the POI contains multiple residues, their values are averaged. |
ISOL_ASA_AVG | Average of the values in ISOL_ASA_VALS. |
BIOL_ASA_DECISION | Most common description of POI location in the determined biological assembly in result structures (Surface/Intermediate/Buried) |
BIOL_ASA_DECISIONS | All descriptions of POI location in the determined biological assembly in result structures (Surface/Intermediate/Buried) |
BIOL_ASA_VALS | Accessible surface area (ASA) percentages of the POI in the determined biological assembly in the result structures. If the POI contains multiple residues, their values are averaged. |
BIOL_ASA_AVG | Average of the values in BIOL_ASA_VALS. |
TEMPF_AVG | Average TempF value at the POI in the result structures. |
ASYM_ASA_POI_DELTA | Difference between ASYM_ASA_POI_MIN and ASYM_ASA_POI_MAX |
ASYM_ASA_POI_MIN | Smallest ASA value in POI residues |
ASYM_ASA_POI_MAX | Largest ASA value in POI residues |
ISOL_ASA_POI_DELTA | Difference between ISOL_ASA_POI_MIN and ISOL_ASA_POI_MAX |
ISOL_ASA_POI_MIN | Smallest ASA value in POI residues |
ISOL_ASA_POI_MAX | Largest ASA value in POI residues |
BIOL_ASA_POI_DELTA | Difference between BIOL_ASA_POI_MIN and BIOL_ASA_POI_MAX |
BIOL_ASA_POI_MIN | Smallest ASA value in POI residues |
BIOL_ASA_POI_MAX | Largest ASA value in POI residues |
INTERFACE | No/Homomer/Heteromer. The POI's interface status is determined from the ASA decision columns: if the POI is on the surface in the isolated chain but intermediate or buried in the biological assembly (or in the asymmetric unit, if no biological assembly has been determined), the POI is on an interface. If the occluding chain is a duplicate of the POI chain, the interface is Homomer; otherwise it is Heteromer. This column shows the most common type of interface in the result structures. |
INTERFACES | The interface in each result structure file. |
INT_ASA_DELTA | Change in ASA percentage, in each result structure, between the isolated POI chain and the same chain in the biological assembly (or in the asymmetric unit if no biological assembly is available). Use these values instead of the INTERFACE column when changes subtler than the surface/buried classification are significant. |
INT_ASA_DELTA_AVG | Average of values in INT_ASA_DELTA |
TEMPF_POI_VALS | POI TempF (B-factor column) values from the result structures. If the POI contains multiple residues, their values are averaged. |
TEMPF_POI_MIN | Minimum POI TempF value in the result structures. |
TEMPF_POI_MAX | Maximum POI TempF value in the result structures. |
TEMPF_FILE_MIN | Minimum TempF value within the result structures. |
TEMPF_FILE_MAX | Maximum TempF value within the result structures. |
TEMPF_DELTA | Maximum minus minimum TempF value within the result structures. |
PDB_FILES | PDB files used in analysis. |
PDB_SITES | Description of the POI residue in the PDB files. |
BLAST_RANKS | BLAST ranking of the PDB structures used; 1 denotes the highest-scoring homologous structure according to BLAST. |
ENTRY | SEQ_ID with BLAST ranking. |
SEQ_POIPOS | Amino acid position of POI in input sequence. |
SEQ_POISEQ | Sequence surrounding the POI in input sequence. |
PDBSEQ | Sequence surrounding the POI in result PDB structures (in primary sequence). Comparison with SEQ_POISEQ has been used to locate the POI in the structure file. |
ALIGNMENTS | Simple alignment score between SEQ_POISEQ and PDBSEQ. Scores range from 0 to 100, where 100 means perfect identity and 0 means no match at all. |
ALIGN_AVG | Average of the alignment scores (can be used to assess reliability). |
BITSCORE_AVG | Average BLAST bitscore of the resulting homologous structures. This value depends on the BLAST query sequence window size used. |
IDENTITY_AVG | Percentage of identical residues around the POI in input sequence vs. homologous structure. |
METHODS | List of methods by which the result structures have been acquired. |
R-FREE_AVG | Reported R-free values from the result PDB files. |
RESOLUTIONS | Reported resolutions from the result PDB files. |
ORGANISM_SCIENTIFIC | Reported organisms from the result PDB files. |
ORGANISM_TAXID | Reported organism taxonomy ID numbers from the result PDB files. |
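
The ASA thresholds given for ASA_AVG, together with the averaging and "most common" rules of the ASA value and decision columns (for example ISOL_ASA_VALS and ISOL_ASA_DECISION), can be sketched in Python as below. This is a minimal illustration, not StructureMapper's own code: the function names are hypothetical, and the handling of values exactly at 15.0% or 25.0% and of ties between decisions are assumptions.

```python
from __future__ import annotations

from collections import Counter


def asa_decision(asa_percent: float) -> str:
    """Classify a relative ASA percentage using the thresholds listed for ASA_AVG.

    0-15.0 Buried, 15.0-25.0 Intermediate, 25.0-100.0 Surface.
    Assumption: a value exactly on a boundary (15.0 or 25.0) goes to the upper class.
    """
    if asa_percent < 15.0:
        return "Buried"
    if asa_percent < 25.0:
        return "Intermediate"
    return "Surface"


def poi_asa(residue_asas: list[float]) -> float:
    """Average the ASA percentages of a multi-residue POI into one value."""
    return sum(residue_asas) / len(residue_asas)


def most_common_decision(decisions: list[str]) -> str:
    """Most common decision over the result structures.

    Counter.most_common orders equal counts by first occurrence, so ties go to
    the decision seen first (an assumption about StructureMapper's tie-breaking).
    """
    return Counter(decisions).most_common(1)[0][0]


def asa_avg(biol_avg: float | None, isol_avg: float | None) -> float | None:
    """ASA_AVG: BIOL_ASA_AVG if available, otherwise ISOL_ASA_AVG."""
    return biol_avg if biol_avg is not None else isol_avg


# Example: a two-residue POI in one structure and single values from two others.
vals = [poi_asa([12.0, 20.0]), 30.5, 27.0]
decisions = [asa_decision(v) for v in vals]
print(decisions)                        # ['Intermediate', 'Surface', 'Surface']
print(most_common_decision(decisions))  # Surface
```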
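
The TEMPF_SCORE description implies a per-file rescaling of B-factors to 0.0-100.0. A plausible reading, assumed here rather than taken from StructureMapper's source, is linear min-max scaling between the file's minimum and maximum TempF (the values reported in TEMPF_FILE_MIN and TEMPF_FILE_MAX), followed by averaging over the result structures. The function names below are hypothetical.

```python
from __future__ import annotations


def tempf_score(poi_tempf: float, file_min: float, file_max: float) -> float:
    """Scale a POI's TempF (B-factor) to 0.0-100.0 within its own structure file.

    Assumption: linear min-max scaling between the file's minimum and maximum
    TempF; the exact formula used by StructureMapper may differ.
    """
    span = file_max - file_min
    if span == 0:
        # All atoms share one B-factor; hypothetical convention: report 0.0.
        return 0.0
    return 100.0 * (poi_tempf - file_min) / span


def tempf_score_avg(per_structure: list[tuple[float, float, float]]) -> float:
    """Average the per-structure scores; each tuple is (poi_tempf, file_min, file_max)."""
    scores = [tempf_score(*entry) for entry in per_structure]
    return sum(scores) / len(scores)


# Two hypothetical result structures with different B-factor ranges.
print(tempf_score(60.0, 10.0, 110.0))                              # 50.0
print(tempf_score_avg([(60.0, 10.0, 110.0), (40.0, 20.0, 40.0)]))  # 75.0
```

Normalizing within each file is what makes scores from structures solved by different methods or at different resolutions comparable, which raw TempF values are not.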
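
The INTERFACE rule compares the isolated-chain decision with the biological assembly decision, falling back to the asymmetric unit. The sketch below encodes that rule for one result structure. The `PoiDecisions` record and its fields are hypothetical, and so is the sign convention of `int_asa_delta` (isolated minus assembled ASA, so burial on assembly gives a positive value); the source only says INT_ASA_DELTA is the change between the two.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PoiDecisions:
    """Hypothetical per-structure record; the field names are illustrative only."""
    isol: str                  # decision in the isolated POI chain
    biol: Optional[str]        # decision in the biological assembly, None if not determined
    asym: str                  # decision in the asymmetric unit
    occluder_same_chain: bool  # is the occluding chain a duplicate of the POI chain?


def interface_type(d: PoiDecisions) -> str:
    """Return 'No', 'Homomer' or 'Heteromer' following the INTERFACE rule."""
    # Fall back to the asymmetric unit when no biological assembly is available.
    assembly = d.biol if d.biol is not None else d.asym
    if d.isol == "Surface" and assembly in ("Intermediate", "Buried"):
        return "Homomer" if d.occluder_same_chain else "Heteromer"
    return "No"


def int_asa_delta(isol_asa: float, assembly_asa: float) -> float:
    """Assumed sign convention: isolated minus assembled ASA percentage."""
    return isol_asa - assembly_asa


print(interface_type(PoiDecisions("Surface", "Buried", "Surface", True)))       # Homomer
print(interface_type(PoiDecisions("Surface", None, "Intermediate", False)))     # Heteromer
print(interface_type(PoiDecisions("Intermediate", "Buried", "Buried", False)))  # No
print(int_asa_delta(42.0, 12.5))                                                # 29.5
```

The INTERFACE column would then report the most common of these per-structure results, in the same way as the ASA decision columns.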
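
The ALIGNMENTS description only fixes the scale: 0 for no match and 100 for perfect identity. One simple scoring with that behaviour, position-by-position identity between the input-sequence window and the structure-sequence window around the POI, is sketched below. It is an assumed stand-in for illustration, not necessarily the scoring StructureMapper uses, and `window_identity_score` is a hypothetical name.

```python
def window_identity_score(seq_window: str, pdb_window: str) -> float:
    """Hypothetical 0-100 score: percentage of positions with identical residues.

    Assumes both windows are centred on the POI and compared position by position.
    The longer window length is the denominator, so a length mismatch lowers the
    score, and gap characters ('-') never count as matches.
    """
    length = max(len(seq_window), len(pdb_window))
    if length == 0:
        return 0.0
    matches = sum(1 for a, b in zip(seq_window, pdb_window) if a == b and a != "-")
    return 100.0 * matches / length


print(window_identity_score("KLMS", "KLMS"))  # 100.0
print(window_identity_score("KLMS", "KLAS"))  # 75.0
```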