StructureMapper - Information

StructureMapper Algorithm

If you find the StructureMapper algorithm useful, please cite the following article in your work:

StructureMapper: a high-throughput algorithm for analyzing and mapping protein sequence locations to structural data

Anssi Nurminen¹ and Vesa P. Hytönen¹,²

¹Faculty of Medicine and Life Sciences and BioMediTech, University of Tampere, Arvo Ylpön katu 34, 33520 Tampere, Finland
²Fimlab Laboratories, Biokatu 4, 33520 Tampere, Finland

Keywords: bioinformatics; algorithm; protein structure; protein sequence; accessible surface area.

Motivation: StructureMapper is a high-throughput algorithm for automated mapping of protein primary amino sequence locations to existing three-dimensional protein structures. The algorithm is intended for facilitating easy and efficient utilization of structural information in protein characterization and proteomics. StructureMapper provides an analysis of the identified structural locations that includes surface accessibility, flexibility, protein-protein interfacing, intrinsic disorder prediction, secondary structure assignment, biological assembly information, and sequence identity percentages, among other metrics.
Results: We have showcased the use of the algorithm by estimating the coverage of structural information of the human proteome, identifying critical interface residues in DNA polymerase γ, profiling structurally protease cleavage sites and post-translational modification sites, and by identifying putative, novel phosphoswitches.
Availability: The StructureMapper algorithm is available as an online service and standalone implementation at http://structuremapper.uta.fi.

Full Article, Bioinformatics 2018, Open access

Downloads

The StructureMapper algorithm is an open-source (MIT license) Python implementation, available for download on GitHub.

Storage of results

The StructureMapper online server does not guarantee long-term storage of query results. Depending on server load, the oldest results are deleted first (FIFO) without notice to free up space on the server.

Result File Columns

Column

Description

DATASERIES

Name for the processing run.

SEQ_ID

Input POI (point of interest) ID, including the sequence ID and the position of the POI.

GENE

Gene information from the FASTA header of the input file.

DESC

Description of the sequence from the FASTA header of the input file.

N_STRUCTURES

Number of structures found and analyzed for the POI.

PYMOL

If PyMOL (third-party software) is installed and configured, this column will contain a clickable link to view the POI in the result structure (1st BLAST structure).

PYMOL_BIOMOL

If PyMOL (third-party software) is installed and the algorithm has been run with the '--biomol' option, this column will contain a clickable link to open the result biomol (1st structure).

ASA_AVG

Average accessible surface area (ASA) percentage for the POI in the resulting structures. This column contains the value from BIOL_ASA_AVG if available, otherwise from ISOL_ASA_AVG. Values of 0-15.0% are Buried, 15.0-25.0% Intermediate, and 25.0-100.0% Surface.

TEMPF_SCORE

Normalized Tempf score averaged across the result structures. Tempf can be used as a measure of the flexibility of the structure at the POI. The raw values depend on the experimental method and the resolution of the structure, so the TEMPF_SCORE column normalizes the Tempf value of each POI to the range 0.0-100.0 within its structure file. A score of 100.0 means that the POI has the maximum Tempf value within the structure (high flexibility), and a score of 0.0 means that it is among the least flexible regions in the structure.

IUPRED

Prediction of disorder at the POI. Values above 0.5 are predicted to be intrinsically disordered. For more information: http://iupred.enzim.hu/

DSSP

Secondary structure at the POI, as assigned by DSSP. For more information: http://swift.cmbi.ru.nl/gv/dssp/

ASYM_ASA_DECISION

Most common description of POI location in the native PDB (asymmetric unit) result structures (Surface/Intermediate/Buried)

ASYM_ASA_DECISIONS

All descriptions of POI location in the native PDB (asymmetric unit) result structures (Surface/Intermediate/Buried)

ASYM_ASA_VALS

Accessible surface area percentages of the POI in the native PDB (asymmetric unit) result structures. If the POI contains multiple residues, the values are averaged.

ASYM_ASA_AVG

AVG of values in column ASYM_ASA_VALS

ISOL_ASA_DECISION

Most common description of POI location in the isolated chain in result structures (Surface/Intermediate/Buried)

ISOL_ASA_DECISIONS

All descriptions of POI location in the isolated chain in result structures (Surface/Intermediate/Buried)

ISOL_ASA_VALS

Accessible surface area percentages of the POI in the isolated chain in the result structures. If the POI contains multiple residues, the values are averaged.

ISOL_ASA_AVG

AVG of values in column ISOL_ASA_VALS

BIOL_ASA_DECISION

Most common description of POI location in the determined biological assembly in result structures (Surface/Intermediate/Buried)

BIOL_ASA_DECISIONS

All descriptions of POI location in the determined biological assembly in result structures (Surface/Intermediate/Buried)

BIOL_ASA_VALS

Accessible surface area percentages of the POI in the determined biological assembly in the result structures. If the POI contains multiple residues, the values are averaged.

BIOL_ASA_AVG

AVG of values in column BIOL_ASA_VALS

TEMPF_AVG

AVG of TempF values at POI in result structures

ASYM_ASA_POI_DELTA

Difference between ASYM_ASA_POI_MIN and ASYM_ASA_POI_MAX

ASYM_ASA_POI_MIN

Smallest ASA value in POI residues

ASYM_ASA_POI_MAX

Largest ASA value in POI residues

ISOL_ASA_POI_DELTA

Difference between ISOL_ASA_POI_MIN and ISOL_ASA_POI_MAX

ISOL_ASA_POI_MIN

Smallest ASA value in POI residues

ISOL_ASA_POI_MAX

Largest ASA value in POI residues

BIOL_ASA_POI_DELTA

Difference between BIOL_ASA_POI_MIN and BIOL_ASA_POI_MAX

BIOL_ASA_POI_MIN

Smallest ASA value in POI residues

BIOL_ASA_POI_MAX

Largest ASA value in POI residues

INTERFACE

No/Homomer/Heteromer. The POI's interface is determined based on the ASA_DECISION columns. If the isolated-chain ASA is on the surface but the biological unit ASA (or the asymmetric unit ASA, if the biological unit has not been determined) is intermediate or buried, the POI is on an interface. If the occluding chain is a duplicate of the POI chain, the interface is Homomer, otherwise Heteromer. This column shows the most common type of interface in the result structures.

INTERFACES

The interface in each result structure file.

INT_ASA_DELTA

Changes in ASA percentage in the result structures between the isolated POI chain and the biological assembly (or the asymmetric unit, if the biological assembly is not available). Use these values instead of the INTERFACE column if changes more subtle than surface/buried are significant.

INT_ASA_DELTA_AVG

Average of values in INT_ASA_DELTA

TEMPF_POI_VALS

POI Tempf (B-factor column) values from the result structures. If the POI contains multiple residues, their values are averaged.

TEMPF_POI_MIN

Minimum POI tempf values in result structures.

TEMPF_POI_MAX

Maximum POI tempf values in result structures.

TEMPF_FILE_MIN

Minimum Tempf values within result structures.

TEMPF_FILE_MAX

Maximum Tempf values within result structures.

TEMPF_DELTA

Max - Min Tempf value within result structures.

PDB_FILES

PDB files used in analysis.

PDB_SITES

Description of the POI residue in the PDB files.

BLAST_RANKS

BLAST ranking of the PDB structures used. A rank of 1 denotes the highest-scoring homologous structure according to BLAST.

ENTRY

SEQ_ID with BLAST ranking.

SEQ_POIPOS

Amino acid position of POI in input sequence.

SEQ_POISEQ

Sequence surrounding the POI in input sequence.

PDBSEQ

Sequence surrounding the POI in result PDB structures (in primary sequence). Comparison with SEQ_POISEQ has been used to locate the POI in the structure file.

ALIGNMENTS

Simple alignment scoring between SEQ_POISEQ and PDBSEQ. Scores range from 0 to 100, where 100 means perfect identity and 0 means no match at all.

ALIGN_AVG

Average of the alignment scores (can be used to assess reliability).

BITSCORE_AVG

Average BLAST bitscore for the resulting homologous structures. This value depends on the size of the BLAST query sequence window used.

IDENTITY_AVG

Percentage of identical residues around the POI in input sequence vs. homologous structure.

METHODS

List of methods by which the result structures have been acquired.

R-FREE_AVG

Reported R-FREE values from result PDB files

RESOLUTIONS

Reported Resolutions from result PDB files

ORGANISM_SCIENTIFIC

Reported organisms from result PDB files

ORGANISM_TAXID

Reported organism taxonomy ID numbers from result PDB files
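
The decision rules described for the ASA_AVG, INTERFACE and TEMPF_SCORE columns can be illustrated with a short Python sketch. This is not taken from the StructureMapper source code: all function names are hypothetical, the handling of values that fall exactly on a threshold is an assumption, and the min-max formula for TEMPF_SCORE is only one plausible reading of "normalizes (0.0-100.0) within its structure file".

```python
from collections import Counter
from typing import Optional, Sequence

BURIED_MAX = 15.0        # upper bound of "Buried", in percent
INTERMEDIATE_MAX = 25.0  # upper bound of "Intermediate", in percent


def classify_asa(asa_percent: float) -> str:
    """Map an ASA percentage to Buried / Intermediate / Surface."""
    # Assumption: a value exactly on a threshold goes to the higher class;
    # the column description does not say which side it belongs to.
    if asa_percent < BURIED_MAX:
        return "Buried"
    if asa_percent < INTERMEDIATE_MAX:
        return "Intermediate"
    return "Surface"


def asa_avg(biol_asa_avg: Optional[float],
            isol_asa_avg: Optional[float]) -> Optional[float]:
    """ASA_AVG: use BIOL_ASA_AVG if available, otherwise ISOL_ASA_AVG."""
    return biol_asa_avg if biol_asa_avg is not None else isol_asa_avg


def classify_interface(isol_decision: str,
                       biol_decision: Optional[str],
                       asym_decision: Optional[str],
                       occluder_is_duplicate: bool) -> str:
    """INTERFACE for a single result structure: No / Homomer / Heteromer."""
    # Prefer the biological unit; fall back to the asymmetric unit.
    complex_decision = biol_decision if biol_decision is not None else asym_decision
    on_interface = (isol_decision == "Surface"
                    and complex_decision in ("Intermediate", "Buried"))
    if not on_interface:
        return "No"
    return "Homomer" if occluder_is_duplicate else "Heteromer"


def most_common(values: Sequence[str]) -> str:
    """Most frequent entry, as used for the *_DECISION and INTERFACE columns."""
    return Counter(values).most_common(1)[0][0]


def tempf_score(poi_tempf: float, file_min: float, file_max: float) -> float:
    """Assumed min-max normalization of a POI Tempf value to 0.0-100.0."""
    if file_max == file_min:
        return 0.0  # assumption: a flat Tempf profile carries no flexibility signal
    return 100.0 * (poi_tempf - file_min) / (file_max - file_min)
```

For example, under these assumptions a POI that is "Surface" in the isolated chain and "Buried" in the biological assembly, occluded by a copy of its own chain, would be reported as "Homomer", and a POI Tempf of 30.0 in a file whose Tempf values range from 10.0 to 50.0 would score 50.0.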