The DCA webservice is a public server to analyse protein multiple sequence alignments (MSA) and compute direct couplings between amino acid pairs within the MSA. User accounts serve only to organize the jobs submitted by each user. Every user has access to the workbench tab, where jobs can be submitted and managed.
Users are encouraged to register, but registration is not required to use the full feature set of the service. Unregistered users are assigned a cryptographically secure guest username, which is stored in a browser cookie. If a user starts jobs as a guest and later signs in as a registered user (or registers a new account and signs in), ownership of all jobs created as a guest is transferred to that user’s registered username. Registration therefore makes it possible to move from machine to machine, or to clear one’s cookies, without losing results.
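The guest mechanism described above can be sketched in Python. This is a minimal illustration, not the service's actual implementation: the "guest-" prefix, the 16-byte token length, and the in-memory job_owner dictionary are all hypothetical. It relies on the standard library secrets module, which is designed for generating unguessable tokens.

```python
import secrets

# Hypothetical in-memory store mapping job ids to owner usernames.
job_owner: dict[str, str] = {}


def new_guest_username() -> str:
    """Return an unguessable guest name: a 'guest-' prefix plus 16 random bytes."""
    return "guest-" + secrets.token_urlsafe(16)


def submit_job(job_id: str, owner: str) -> None:
    """Record that `owner` submitted `job_id`."""
    job_owner[job_id] = owner


def transfer_guest_jobs(guest: str, registered: str) -> int:
    """Reassign every job owned by `guest` to `registered`; return how many moved."""
    moved = 0
    for job_id, owner in job_owner.items():
        if owner == guest:
            # Updating a value (not adding/removing keys) is safe during iteration.
            job_owner[job_id] = registered
            moved += 1
    return moved


if __name__ == "__main__":
    guest = new_guest_username()      # would be stored in the browser cookie
    submit_job("job-1", guest)
    submit_job("job-2", guest)
    submit_job("job-3", "alice")
    print(transfer_guest_jobs(guest, "alice"))  # 2
```

Because the guest name is random rather than sequential, one guest cannot guess another guest's name and claim their jobs.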
The input of the DCA webservice is a multiple sequence alignment in FASTA format. It is compatible with the FASTA MSAs compiled by the Pfam database (http://pfam.sanger.ac.uk/). Due to resource limitations, the alignment length is limited to a maximum of 500 amino acids, and the input file may not exceed 30 MB. The user also chooses the sequence identity threshold. The default value is 0.8, which treats sequences with more than 80% identity as equivalent.
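The input limits and the identity threshold can be illustrated with a short Python sketch. It assumes that 30 MB means 30 * 1024 * 1024 bytes, that every aligned sequence must have the same length, and that identity is the fraction of matching alignment columns (gaps included). The weighting shown, where each sequence counts as 1 divided by the number of sequences (itself included) with more than 80% identity to it, is the usual DCA reweighting scheme; the helper names are hypothetical and the server's own code may differ.

```python
MAX_COLUMNS = 500                 # maximum alignment length stated by the service
MAX_BYTES = 30 * 1024 * 1024      # 30 MB; MB vs. MiB is an assumption


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, aligned sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_limits(records: list[tuple[str, str]], n_bytes: int) -> int:
    """Raise ValueError if the MSA breaks the limits; return the alignment length."""
    if n_bytes > MAX_BYTES:
        raise ValueError("input file is larger than 30 MB")
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences have different lengths, so this is not an MSA")
    length = lengths.pop()
    if length > MAX_COLUMNS:
        raise ValueError(f"alignment length {length} exceeds {MAX_COLUMNS}")
    return length


def sequence_weights(seqs: list[str], threshold: float = 0.8) -> list[float]:
    """Weight = 1 / (number of sequences with identity > threshold, itself included)."""
    length = len(seqs[0])
    weights = []
    for a in seqs:
        similar = 0
        for b in seqs:
            identity = sum(x == y for x, y in zip(a, b)) / length
            if identity > threshold:
                similar += 1
        # A sequence always matches itself, so similar >= 1 when threshold < 1.
        weights.append(1.0 / similar)
    return weights


if __name__ == "__main__":
    fasta = ">s1\nACDE-\n>s2\nACDE-\n>s3\nWYKLM\n"
    recs = read_fasta(fasta)
    check_limits(recs, len(fasta.encode()))
    print(sequence_weights([seq for _, seq in recs]))  # [0.5, 0.5, 1.0]
```

The two identical sequences share one unit of weight, so redundant families in the alignment do not dominate the statistics. The all-pairs loop is quadratic in the number of sequences, which is acceptable for a sketch but not for very large alignments.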
The user can also specify a PDB identifier to map the amino acid sites of the Pfam family onto an experimental structure. This mapping allows the user to compare the top DI-ranked pairs of the family with the actual residues in that structure. This can help evaluate whether the directly coupled pairs are physical contacts in the experimental structure, or investigate whether their coupling might be related to conformational plasticity or complex formation.
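The site-to-residue mapping can be sketched as follows, assuming the aligned sequence of the PDB chain is one row of the MSA, that '-' and '.' are gap characters, that MSA sites are numbered from 1, and that residues are numbered consecutively from a known first residue number. Real PDB numbering can contain gaps and insertion codes, so a production mapping would instead align against the residues actually present in the structure. The function names are hypothetical, and the 8 Å cutoff in is_contact is a common choice for defining contacts, not a value taken from the service.

```python
import math


def column_to_residue(aligned_seq: str, first_residue: int) -> dict[int, int]:
    """Map 1-based MSA sites to residue numbers for one aligned sequence.

    Assumes '-' and '.' are gaps and that residues are numbered consecutively
    starting at `first_residue` (a hypothetical simplification).
    """
    mapping: dict[int, int] = {}
    resnum = first_residue
    for site, ch in enumerate(aligned_seq, start=1):
        if ch in "-.":
            continue  # gap: this site has no residue in the structure's sequence
        mapping[site] = resnum
        resnum += 1
    return mapping


def is_contact(xyz_i, xyz_j, cutoff: float = 8.0) -> bool:
    """True if two residue coordinates (e.g. C-alpha atoms) lie within `cutoff` angstroms."""
    return math.dist(xyz_i, xyz_j) <= cutoff


if __name__ == "__main__":
    sites = column_to_residue("-AC-D", first_residue=10)
    print(sites)  # {2: 10, 3: 11, 5: 12}
    print(is_contact((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))  # True (distance 5.0)
```

A DI pair (i, j) whose sites both appear in the mapping can then be looked up in the structure; pairs involving a gap site have no counterpart in that chain.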
In the future, we plan to build MSAs from user-defined sequences and then analyze them with DCA.
The output of the webserver is a Direct Information (DI) file that contains all possible residue-residue pairs in the MSA and their corresponding DI values. This file can then be used to predict protein contacts, particularly from the top-ranked pairs. The output file consists of three columns: the first two give the indices of the two MSA sites, and the third contains the DI value.
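Reading the DI file and extracting the top-ranked pairs might look like this in Python, assuming whitespace-separated columns in the order described above. Excluding pairs that are close in sequence (here |i - j| < 5) is a common convention in contact prediction, because such pairs are near each other regardless of any coupling; the cutoff and the function names are assumptions, not part of the service.

```python
def read_di(text: str) -> list[tuple[int, int, float]]:
    """Parse 'site_i site_j DI' lines; blank or short lines are skipped."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        pairs.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return pairs


def top_pairs(pairs, n: int = 10, min_separation: int = 5):
    """Return the n highest-DI pairs whose sites are at least `min_separation` apart."""
    kept = [p for p in pairs if abs(p[0] - p[1]) >= min_separation]
    kept.sort(key=lambda p: p[2], reverse=True)  # highest DI first
    return kept[:n]


if __name__ == "__main__":
    di_text = "1 2 0.30\n1 10 0.12\n3 20 0.25\n5 40 0.05\n"
    print(top_pairs(read_di(di_text), n=2))  # [(3, 20, 0.25), (1, 10, 0.12)]
```

Sorting the whole list keeps the sketch simple; for very large files, heapq.nlargest(n, kept, key=...) would avoid a full sort.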