Structure Mapper Online


StructureMapper Algorithm

If you find the StructureMapper algorithm useful, please cite the following article in your work:

StructureMapper: a high-throughput algorithm for analyzing and mapping protein sequence locations to structural data

Anssi Nurminen¹ and Vesa P. Hytönen¹,²

¹ Faculty of Medicine and Life Sciences and BioMediTech, University of Tampere, Arvo Ylpön katu 34, 33520 Tampere, Finland
² Fimlab Laboratories, Biokatu 4, 33520 Tampere, Finland

Keywords: bioinformatics; algorithm; protein structure; protein sequence; accessible surface area.

Motivation: StructureMapper is a high-throughput algorithm for automated mapping of protein primary amino acid sequence locations to existing three-dimensional protein structures. The algorithm is intended to facilitate easy and efficient use of structural information in protein characterization and proteomics. StructureMapper provides an analysis of the identified structural locations that includes surface accessibility, flexibility, protein-protein interfacing, intrinsic disorder prediction, secondary structure assignment, biological assembly information, and sequence identity percentages, among other metrics.
Results: We have showcased the use of the algorithm by estimating the coverage of structural information for the human proteome, identifying critical interface residues in DNA polymerase γ, structurally profiling protease cleavage sites and post-translational modification sites, and identifying putative, novel phosphoswitches.
Availability: The StructureMapper algorithm is available as an online service and as a standalone implementation at http://structuremapper.uta.fi.

Full article: Bioinformatics, 2018 (open access)

Downloads

The StructureMapper algorithm is open-source (MIT license) Python software available for download from GitHub.

StructureMapper :.: Full algorithm (at GitHub)
ASA_UTA :.: Algorithm for accessible surface area (ASA) calculations (at GitHub)
AA_sampler :.: Script for making randomized amino acid selections (sampling) with a given pattern (at GitHub)

Storage of results

The StructureMapper online server does not guarantee long-term storage of query results. Depending on server load, results are deleted without notice on a first-in, first-out (FIFO) basis to free up space on the server.

Result File Columns

Column    Description
DATASERIES Name for the processing run.
SEQ_ID Input POI ID, including the sequence ID and the position of the POI.
GENE Gene information from the input FASTA header.
DESC Description of the sequence from the input FASTA header.
N_STRUCTURES Number of structures found and analyzed for the POI.
PYMOL If PyMOL (third-party software) is installed and configured, this column contains a clickable link to view the POI in the result structure (the first BLAST structure).
PYMOL_BIOMOL If PyMOL (third-party software) is installed and the algorithm has been run with the '--biomol' option, this column contains a clickable link to open the result biological assembly (biomol) of the first structure.
ASA_AVG Average accessible surface area (ASA) percentage for the POI in the result structures. This column contains the value from BIOL_ASA_AVG if available, otherwise the value from ISOL_ASA_AVG. Values of 0-15.0% are classified as Buried, 15.0-25.0% as Intermediate, and 25.0-100.0% as Surface.
TEMPF_SCORE Normalized tempf score averaged over the result structures. Tempf (the temperature factor, or B-factor) can be used as a measure of the flexibility of the structure at the POI. Raw tempf values depend on the method and the resolution of the structure, so the TEMPF_SCORE column normalizes (0.0-100.0) the tempf value of each POI within its structure file. A score of 100.0 means that the POI has the maximum tempf value within the structure (high flexibility), and a score of 0.0 means that it is among the least flexible regions in the structure.
IUPRED Prediction of disorder at the POI; values above 0.5 are predicted to be intrinsically disordered. For more information: http://iupred.enzim.hu/
DSSP Secondary structure at the POI, as assigned by DSSP. http://swift.cmbi.ru.nl/gv/dssp/
ASYM_ASA_DECISION Most common description of POI location in the native PDB (asymmetric unit) result structures (Surface/Intermediate/Buried)
ASYM_ASA_DECISIONS All descriptions of POI location in the native PDB (asymmetric unit) result structures (Surface/Intermediate/Buried)
ASYM_ASA_VALS Accessible surface area percentages of the POI in the native PDB (asymmetric unit) result structures. If the POI contains multiple residues, values are averaged.
ASYM_ASA_AVG AVG of values in column ASYM_ASA_VALS
ISOL_ASA_DECISION Most common description of POI location in the isolated chain in result structures (Surface/Intermediate/Buried)
ISOL_ASA_DECISIONS All descriptions of POI location in the isolated chain in result structures (Surface/Intermediate/Buried)
ISOL_ASA_VALS Accessible surface area percentages of the POI in the isolated chain in the result structures. If the POI contains multiple residues, values are averaged.
ISOL_ASA_AVG AVG of values in column ISOL_ASA_VALS
BIOL_ASA_DECISION Most common description of POI location in the determined biological assembly in result structures (Surface/Intermediate/Buried)
BIOL_ASA_DECISIONS All descriptions of POI location in the determined biological assembly in result structures (Surface/Intermediate/Buried)
BIOL_ASA_VALS Accessible surface area percentages of the POI in the determined biological assembly in the result structures. If the POI contains multiple residues, values are averaged.
BIOL_ASA_AVG AVG of values in column BIOL_ASA_VALS
TEMPF_AVG Average of the tempf values at the POI in the result structures
ASYM_ASA_POI_DELTA Difference between ASYM_ASA_POI_MIN and ASYM_ASA_POI_MAX
ASYM_ASA_POI_MIN Smallest ASA value in POI residues
ASYM_ASA_POI_MAX Largest ASA value in POI residues
ISOL_ASA_POI_DELTA Difference between ISOL_ASA_POI_MIN and ISOL_ASA_POI_MAX
ISOL_ASA_POI_MIN Smallest ASA value in POI residues
ISOL_ASA_POI_MAX Largest ASA value in POI residues
BIOL_ASA_POI_DELTA Difference between BIOL_ASA_POI_MIN and BIOL_ASA_POI_MAX
BIOL_ASA_POI_MIN Smallest ASA value in POI residues
BIOL_ASA_POI_MAX Largest ASA value in POI residues
INTERFACE No/Homomer/Heteromer. The POI's interface is determined from the ASA_DECISION columns. If the POI is on the surface in the isolated chain but intermediate or buried in the biological assembly (or in the asymmetric unit, if no biological assembly has been determined), the POI is on an interface. If the occluding chain is a duplicate of the POI chain, the interface is Homomer; otherwise it is Heteromer. This column shows the most common type of interface in the result structures.
INTERFACES The interface in each result structure file.
INT_ASA_DELTA Change in ASA percentage between the isolated POI chain and the biological assembly chain (or the asymmetric unit, if the biological assembly is not available) in each result structure. Use these values instead of the INTERFACE column when changes subtler than Surface/Buried are significant.
INT_ASA_DELTA_AVG Average of values in INT_ASA_DELTA
TEMPF_POI_VALS POI tempf (B-factor column) values from the result structures. If the POI contains multiple residues, their values are averaged.
TEMPF_POI_MIN Minimum POI tempf values in result structures.
TEMPF_POI_MAX Maximum POI tempf values in result structures.
TEMPF_FILE_MIN Minimum tempf values within the result structures.
TEMPF_FILE_MAX Maximum tempf values within the result structures.
TEMPF_DELTA Maximum minus minimum tempf value within the result structures.
PDB_FILES PDB files used in analysis.
PDB_SITES Description of the POI residue in the PDB files.
BLAST_RANKS BLAST ranking of the PDB structures used; 1 denotes the highest-scoring homologous structure according to BLAST.
ENTRY SEQ_ID with BLAST ranking.
SEQ_POIPOS Amino acid position of POI in input sequence.
SEQ_POISEQ Sequence surrounding the POI in input sequence.
PDBSEQ Sequence surrounding the POI in the result PDB structures (in primary sequence). Comparison with SEQ_POISEQ is used to locate the POI in the structure file.
ALIGNMENTS Simple alignment score between SEQ_POISEQ and PDBSEQ. Scores range from 0 to 100, where 100 means perfect identity and 0 means no match at all.
ALIGN_AVG Average of the alignment scores (can be used to assess reliability).
BITSCORE_AVG Average BLAST bitscore for the resulting homologous structures. This value depends on the size of the BLAST query sequence window used.
IDENTITY_AVG Percentage of identical residues around the POI in input sequence vs. homologous structure.
METHODS List of methods by which the result structures have been acquired
R-FREE_AVG Average of the R-free values reported in the result PDB files
RESOLUTIONS Reported resolutions from the result PDB files
ORGANISM_SCIENTIFIC Reported organisms from the result PDB files
ORGANISM_TAXID Reported organism taxonomy ID numbers from the result PDB files
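The ASA-based columns (ASA_AVG, the *_ASA_DECISION columns and INTERFACE) can be illustrated with a minimal Python sketch of the rules in the table above. It assumes the Buried/Intermediate/Surface thresholds listed under ASA_AVG, with values exactly at 15.0 and 25.0 assigned to the upper class (the boundary handling is not specified, so this is an assumption). The helper names `asa_decision`, `asa_avg`, `most_common` and `interface_type` and the `occluded_by_duplicate` flag are hypothetical and not part of the StructureMapper code base; this is an illustration, not the actual implementation.

```python
from collections import Counter
from typing import List, Optional

# Thresholds taken from the ASA_AVG description. Assigning the exact
# boundary values (15.0, 25.0) to the upper class is an assumption.
BURIED_MAX = 15.0
INTERMEDIATE_MAX = 25.0


def asa_decision(asa_percent: float) -> str:
    """Classify a relative ASA percentage as Buried, Intermediate or Surface."""
    if asa_percent < BURIED_MAX:
        return "Buried"
    if asa_percent < INTERMEDIATE_MAX:
        return "Intermediate"
    return "Surface"


def asa_avg(biol_avg: Optional[float], isol_avg: Optional[float]) -> Optional[float]:
    """ASA_AVG: use BIOL_ASA_AVG when available, otherwise ISOL_ASA_AVG."""
    return biol_avg if biol_avg is not None else isol_avg


def most_common(labels: List[str]) -> str:
    """Most common label over result structures; ties go to the first seen."""
    return Counter(labels).most_common(1)[0][0]


def interface_type(isol_asa: float, assembly_asa: float,
                   occluded_by_duplicate: bool) -> str:
    """Interface call for one result structure.

    assembly_asa is the POI's ASA in the biological assembly (or in the
    asymmetric unit when no assembly is available). occluded_by_duplicate
    is a hypothetical flag: True when the occluding chain duplicates the
    POI chain.
    """
    on_surface_alone = asa_decision(isol_asa) == "Surface"
    covered_in_complex = asa_decision(assembly_asa) != "Surface"
    if on_surface_alone and covered_in_complex:
        return "Homomer" if occluded_by_duplicate else "Heteromer"
    return "No"


# Hypothetical values for one POI in three result structures:
# (isolated-chain ASA %, assembly ASA %, occluded by a duplicate chain?)
structures = [(42.0, 12.0, True), (38.5, 20.0, True), (40.0, 39.0, False)]
interfaces = [interface_type(i, a, d) for i, a, d in structures]
print(interfaces)               # ['Homomer', 'Homomer', 'No']
print(most_common(interfaces))  # Homomer
```

Taking the most frequent label mirrors how the *_DECISION and INTERFACE columns summarize several result structures, while the *_DECISIONS and INTERFACES columns keep the per-structure labels.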
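The TEMPF_SCORE description is consistent with a min-max normalization of each POI's tempf value against the lowest and highest tempf values in the same file (compare TEMPF_FILE_MIN and TEMPF_FILE_MAX). The sketch below assumes exactly that, plus the per-residue averaging described under TEMPF_POI_VALS. The function names `tempf_score` and `tempf_score_avg` are hypothetical, and the real implementation may differ, for example in how it treats a file whose tempf values are all identical.

```python
from statistics import mean
from typing import Sequence, Tuple


def tempf_score(poi_tempf: float, file_min: float, file_max: float) -> float:
    """Scale a POI tempf value to 0.0-100.0 relative to its own structure file."""
    if file_max == file_min:
        # Assumption: a file with uniform tempf values gives no flexibility signal.
        return 0.0
    return 100.0 * (poi_tempf - file_min) / (file_max - file_min)


def tempf_score_avg(
    per_structure: Sequence[Tuple[Sequence[float], float, float]]
) -> float:
    """Average the normalized score over result structures.

    Each item holds (tempf values of the POI residues, file minimum, file
    maximum). Multi-residue POIs are averaged first, as for TEMPF_POI_VALS.
    """
    return mean(tempf_score(mean(residues), lo, hi)
                for residues, lo, hi in per_structure)


# Hypothetical input: a two-residue POI in one file, a one-residue POI in another.
example = [([28.0, 32.0], 10.0, 50.0), ([80.0], 20.0, 80.0)]
print(tempf_score_avg(example))  # 75.0
```

Normalizing within each file before averaging is what makes scores from structures solved by different methods or at different resolutions roughly comparable.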
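The exact scoring behind the ALIGNMENTS and IDENTITY_AVG columns is not specified on this page. As a rough illustration only, the following sketch computes a position-by-position identity percentage (0-100) between two equal-length windows, such as SEQ_POISEQ and PDBSEQ, and averages the scores the way ALIGN_AVG is described. The function names `window_identity` and `align_avg`, the gap character '-', and the assumption that the windows are already aligned are all hypothetical; StructureMapper's own scoring may work differently.

```python
from statistics import mean
from typing import List


def window_identity(seq_window: str, pdb_window: str, gap: str = "-") -> float:
    """Percentage (0-100) of positions where two pre-aligned windows match.

    Comparison is case-insensitive; gap characters never count as matches.
    """
    if len(seq_window) != len(pdb_window):
        raise ValueError("windows must have the same length")
    if not seq_window:
        return 0.0
    matches = sum(1 for a, b in zip(seq_window.upper(), pdb_window.upper())
                  if a == b and a != gap)
    return 100.0 * matches / len(seq_window)


def align_avg(scores: List[float]) -> float:
    """Average of per-structure scores, as described for ALIGN_AVG."""
    return mean(scores)


# Hypothetical windows around one POI: input sequence vs. two PDB structures.
poi_window = "ACDEFGHIKL"
scores = [window_identity(poi_window, "ACDEFGHIKL"),
          window_identity(poi_window, "ACDQFGHIKW")]
print(scores)             # [100.0, 80.0]
print(align_avg(scores))  # 90.0
```

A score near 100 means the structure window closely matches the input sequence around the POI, which is why the averaged score can serve as a reliability indicator.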
:.: Protein Dynamics Group :.: University of Tampere 2017 :.: