
4.12 PCDq - protein complex database constructed from protein-protein interactions (PPIs)

4.12.1 Overview of PCDq

PCDq is a database of comprehensively annotated human protein-protein interactions (PPIs), together with the protein complexes identified from a PPI network integrated from six PPI databases (BIND, DIP, MINT, HPRD, IntAct, and GNP_Y2H). Proteins interact with other proteins or biomolecules to perform their functions in the cell, and many cellular processes are carried out by protein complexes. Extracting information about protein complexes and their functions from the PPI network is therefore an important task.

We predicted human protein complexes from the integrated PPI network data by finding densely connected regions, together with their cluster properties, using the DPClus algorithm [7]. The existence of each predicted complex was then checked against the literature, and the result is summarized as the protein complex quality index (CQI). Finally, relevant data such as protein function, localization, structure, expression profile, gene locus, and binary interactions among complex member proteins and adjacent proteins outside the complex were annotated. A prominent feature of PCDq is "PPI-Map", a tool for viewing and editing the network landscape, which shows protein complexes, protein interactions, and Complex-Complex Interactions (CCIs) graphically. Users can edit (delete, move, expand, etc.) the nodes and edges of the network.
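As a rough illustration of what a "densely connected region" means, the sketch below computes the density of a candidate cluster, that is, the fraction of possible protein pairs that actually interact. It is not the DPClus algorithm itself, which also uses a cluster property and grows clusters step by step; the function name `cluster_density` and the toy protein names are hypothetical.

```python
from itertools import combinations

def cluster_density(members, ppis):
    """Fraction of member pairs that are connected by a PPI (0.0 to 1.0)."""
    members = set(members)
    if len(members) < 2:
        return 0.0
    # Store interactions as unordered pairs so that A-B equals B-A.
    edges = {frozenset(p) for p in ppis}
    possible = list(combinations(sorted(members), 2))
    present = sum(1 for a, b in possible if frozenset((a, b)) in edges)
    return present / len(possible)

# Hypothetical toy network: A, B, C form a triangle; D is attached to C only.
ppis = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(cluster_density(["A", "B", "C"], ppis))       # 1.0 (all 3 pairs interact)
print(cluster_density(["A", "B", "C", "D"], ppis))  # 4 of 6 possible pairs
```

Adding protein D lowers the density, which is why a density-based method would tend to report A, B, and C as the candidate complex and leave D as an adjacent protein.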

In brief, PCDq provides integrated information on PPIs in the "PPI view", protein complex information in the "protein complex view", and network information covering both PPIs and Complex-Complex Interactions (CCIs) in the "PPI-Map" (Figure 4.12.1).


Fig 4.12.1 Overview of the PCDq

4.12.2 Access to PCDq

You can access PCDq (human protein complex database with quality index) from its top page (http://h-invitational.jp/hinv/pcdq/).

4.12.3 Data of PCDq

HIP: H-Invitational Protein

A HIP is a protein predicted from an H-InvDB transcript. Every protein in H-InvDB is assigned its own HIP ID. If two protein sequences differ by even one residue, they have distinct HIP IDs.

Definition of identical proteins in the PCDq

Identical proteins in PCDq are defined as clusters of proteins joined by single linkage clustering, in which the sequence identity between proteins is at least 98% and the alignment covers at least 95% of the length of each protein.
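The definition above can be sketched as a small single linkage clustering over pairwise alignment results. This is only an illustration under stated assumptions: the input format (identity and the coverage of each protein, in percent), the function name `cluster_identical`, and the example IDs and values are all hypothetical and do not describe the actual PCDq pipeline.

```python
def cluster_identical(proteins, alignments, min_identity=98.0, min_coverage=95.0):
    """Group proteins by single linkage over pairs that pass both thresholds.

    alignments: iterable of
        (id_a, id_b, identity_pct, coverage_a_pct, coverage_b_pct).
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, cov_a, cov_b in alignments:
        # Coverage must hold in both directions ("to each other").
        if identity >= min_identity and min(cov_a, cov_b) >= min_coverage:
            parent[find(a)] = find(b)

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical IDs and alignment values, for illustration only.
proteins = ["HIP_A", "HIP_B", "HIP_C", "HIP_D"]
alignments = [
    ("HIP_A", "HIP_B", 99.2, 100.0, 97.0),  # linked
    ("HIP_B", "HIP_C", 98.0, 95.0, 96.5),   # linked, exactly at the thresholds
    ("HIP_A", "HIP_C", 96.0, 99.0, 99.0),   # identity too low
    ("HIP_C", "HIP_D", 99.5, 80.0, 100.0),  # coverage too low in one direction
]
print(cluster_identical(proteins, alignments))
# [['HIP_A', 'HIP_B', 'HIP_C'], ['HIP_D']]
```

Note that HIP_A and HIP_C end up in the same cluster even though their own alignment fails the identity threshold, because single linkage chains them through HIP_B.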

Data sources

We collected PPI data from BIND, DIP, MINT, HPRD, IntAct, and GNP_Y2H.


Fig 4.12.2 Statistics of the PPI databases

We removed redundancies in the PPI data among the databases on the basis of sequence similarity (at least 98% identity with at least 95% alignment length coverage of each protein) and integrated the data with the H-InvDB proteins. As a result, we obtained 32,198 human PPIs involving 9,268 proteins.
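The following sketch shows one way such an integration step could work once each protein has been mapped to a representative ID: every interaction is treated as an unordered pair of representatives, so the same PPI reported by several databases is kept only once. The record layout, the function name `integrate_ppis`, and the example IDs are assumptions made for illustration.

```python
def integrate_ppis(records, representative):
    """Merge PPI records from several databases into unique unordered pairs.

    records: iterable of (source_db, protein_a, protein_b).
    representative: dict mapping a protein ID to its cluster representative.
    Returns {frozenset pair: sorted list of source databases}.
    """
    merged = {}
    for source, a, b in records:
        # Map both partners to their representative ID; unmapped IDs are kept as is.
        pair = frozenset((representative.get(a, a), representative.get(b, b)))
        merged.setdefault(pair, set()).add(source)
    return {pair: sorted(sources) for pair, sources in merged.items()}

# Hypothetical records: P1 and P1b are identical proteins under the 98%/95% rule.
representative = {"P1b": "P1"}
records = [
    ("HPRD", "P1", "P2"),
    ("IntAct", "P2", "P1b"),  # same interaction, reversed order, redundant ID
    ("MINT", "P2", "P3"),
]
ppis = integrate_ppis(records, representative)
print(len(ppis))                      # 2
print(ppis[frozenset(("P1", "P2"))])  # ['HPRD', 'IntAct']
```

Keeping the list of source databases per pair preserves the cross reference information while removing the redundancy.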

Data download

You can download the data files from the download section of H-InvDB (http://h-invitational.jp/hinv/dataset/download.cgi). Under the menu "Results of computational analysis", find "Human protein complex database with quality index (PCDq), data set".

4.12.4 Explanation of the frequent icons

[icon] Link to the protein complex view.
[icon] Link to the protein-protein interaction (PPI) view. A number in parentheses, "()", is the number of PPIs.
[icon] Link to the PPI-Map (network drawing tool) for the PPI or protein complex concerned.


4.12.5 Complex Quality Index (CQI)

CQI (Complex Quality Index) is a reliability index for a protein complex annotation that we defined ourselves. During complex annotation, we assigned each member protein of a predicted complex to a category.

A CQI gives the number of proteins in each category in a complex and the total number of proteins in the complex, as below.

CQI: (number of Category I proteins).(number of Category II proteins).(number of Category III proteins)/Total proteins of a complex

For example, a CQI "5.2.1/8" indicates that the complex has five, two and one proteins as Category I, II, and III respectively, and eight proteins in total.

A CQI of "5.0.0/5" indicates that the complex has five proteins that match a known complex perfectly. A CQI of "0.0.3/3" indicates that the complex has three proteins, none of which has evidence of complex formation in the literature. We defined a complex with no evidence in the literature as "a hypothetical protein complex" (all member proteins are in Category III).
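If you process CQI values from the downloaded data, a string such as "5.2.1/8" can be split into its parts as in the minimal sketch below. The function names `parse_cqi` and `is_hypothetical` are hypothetical, and the check that the three category counts add up to the total is an assumption drawn from the examples above.

```python
def parse_cqi(cqi):
    """Split a CQI string such as "5.2.1/8" into category counts and the total."""
    counts, total = cqi.strip().split("/")
    cat1, cat2, cat3 = (int(n) for n in counts.split("."))
    total = int(total)
    # Assumption: every member protein belongs to exactly one category.
    if cat1 + cat2 + cat3 != total:
        raise ValueError(f"category counts do not add up to the total: {cqi!r}")
    return {"I": cat1, "II": cat2, "III": cat3, "total": total}

def is_hypothetical(cqi):
    """True when every member protein is in Category III."""
    parsed = parse_cqi(cqi)
    return parsed["total"] > 0 and parsed["III"] == parsed["total"]

print(parse_cqi("5.2.1/8"))        # {'I': 5, 'II': 2, 'III': 1, 'total': 8}
print(is_hypothetical("0.0.3/3"))  # True
print(is_hypothetical("5.0.0/5"))  # False
```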

4.12.6 Search functions of the PCDq

PCDq provides two search methods: one for protein complexes and another for proteins that have PPI data. Both methods show a list of all annotated protein complexes.

Search for protein complexes
You can search for protein complexes of interest by keyword, name, description, gene symbol (of a member protein), H-InvDB ID (HIX: gene cluster ID, HIT: transcript ID, HIP: protein ID), or accession No. (UniProt, RefSeq, GI, CCDS, PDB).

Search for proteins in PPIs
You can search for proteins of interest that have PPI data by keyword, gene symbol, H-InvDB ID (HIX: gene cluster ID, HIT: transcript ID, HIP: protein ID), or accession No. (UniProt, RefSeq, GI, CCDS, PDB).

Search results
The results page lists the protein complexes, or the proteins with PPI data, that match your query.

4.12.7 Protein complex annotation information page (Protein complex view)

The protein complex annotation has the following items.

Complex type
Indicates the type of a protein complex.
Protein complex (stable interactions)
Functional unit (transient interactions)
Protein complex & functional unit (mixed)

Functional category(ies)
Description of the function(s) of a protein complex.

Subcellular localization(s)
Description of the subcellular location(s) of a protein complex.

Complex name
The name of a protein complex is specified.

Description and evidences
Manually annotated description with information extracted from references.

CQI (Complex Quality Index)
CQI is a reliability index for a protein complex, derived from the categories assigned to the complex member proteins.
See the section "4.12.5 Complex Quality Index (CQI)" above.

PubMed reference ID relations
Shows the PubMed IDs, data sources, and experimental methods for each PPI among complex member proteins and adjacent proteins outside the complex.


Fig 4.12.3 Protein complex annotation information page (protein complex view)

Details of complex member proteins
Detailed information is shown for complex member proteins (red background) and for adjacent proteins outside the complex that interact with it (blue background): category of the member protein, accession No., protein definition, EC No., gene symbol, links to H-InvDB or PPI information, 3D structure (PDB), InterPro domains, GO, predicted subcellular localization, etc.


Fig 4.12.4 Details of complex member proteins

4.12.8 Protein-Protein Interaction (PPI) information page (PPI view)

It shows human PPI information. The query protein (red title) is displayed above and a view of its interacting proteins (blue titles) is displayed below.


Fig 4.12.5 Protein-Protein Interaction (PPI) information page (PPI view)

The following items are shown for each protein that interacts with the query protein.

Title
Protein ID: H-InvDB representative HIP ID
Symbol: A symbol of this protein
Links to the PPI information and the PPI-Map (network drawing tool) of this protein. The number in parentheses is the number of PPIs of this protein.

Definitions
Annotation of the protein.

Protein Complex Information
Shows the names of the complexes of which the protein is a member, with links to the complex information.

H-InvDB IDs
H-InvDB provides a series of data such as gene clusters, transcripts derived from the gene clusters, and proteins predicted from the transcripts.
Gene cluster ID: HIX ID (H-InvDB gene cluster ID)
To check the location of a gene on a chromosome, click the ID to open the H-InvDB Locus view.
Transcript ID: HIT ID (H-InvDB transcript ID)
To check detailed transcript information, click the ID to open the H-InvDB transcript view.
Protein ID: HIP ID (H-InvDB protein ID)
Shows the HIP IDs of proteins identical to the representative HIP ID.
See "HIP: H-Invitational Protein" in the section "4.12.3 Data of PCDq" for a detailed description of HIP.

Protein Accession Numbers
Shows the protein accession numbers in UniProt, RefSeq, GI, CCDS, and PDB, each linked to the corresponding database.

Human PPI cross reference information
Database cross reference information: Links to the PPI databases (BIND, DIP, MINT, HPRD, IntAct, and GNP_Y2H) that we used as data sources.
PubMed IDs (Number of reporting PPIs): PubMed IDs of the publications that provide evidence for the PPI.
The number in parentheses is the number of PPIs that the publication reports. It indicates whether the PPI experiments were performed on a large or small scale.
*The number of PPIs reported for a publication may differ from the actual number because of the data compilation procedures.
Experimental methods: Experimental methods used to detect the PPI.


Fig 4.12.6 Information of a protein which interacts with the query protein
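The "number of reporting PPIs" shown next to each PubMed ID can be thought of as a simple count of distinct PPIs per publication. The sketch below shows the idea only; the record layout, the function name `ppis_per_publication`, and the placeholder PubMed IDs are hypothetical, and the counts in PCDq itself depend on its own data compilation procedures.

```python
from collections import Counter

def ppis_per_publication(ppi_records):
    """Count how many distinct PPIs each PubMed ID reports.

    ppi_records: iterable of (protein_a, protein_b, pubmed_id).
    """
    seen = set()
    counts = Counter()
    for a, b, pmid in ppi_records:
        key = (frozenset((a, b)), pmid)
        if key not in seen:  # ignore repeated records of the same PPI
            seen.add(key)
            counts[pmid] += 1
    return counts

# Placeholder identifiers, for illustration only.
records = [
    ("P1", "P2", "PMID_X"),
    ("P2", "P1", "PMID_X"),  # same PPI as the record above, reversed order
    ("P1", "P3", "PMID_X"),
    ("P4", "P5", "PMID_Y"),
]
counts = ppis_per_publication(records)
print(counts["PMID_X"], counts["PMID_Y"])  # 2 1
```

A publication with a very large count is more likely to come from a large scale screen, whereas a count of one or two suggests a small scale experiment.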

4.12.9 Network drawing tool of the PPIs and the CCIs (PPI-Map)

The PPI-Map provides two kinds of network drawing window: a detailed network window and a whole network window for the Protein-Protein Interactions (PPIs) and the Complex-Complex Interactions (CCIs).
Both windows draw a protein as a node of the minimum size and a protein complex as a node whose size depends on the number of its member proteins. An interaction is drawn as an edge whose width depends on the "Edge Weight" described below.

Edge Weight
The edge weight is the number of interactions represented by an edge between proteins or complexes.
The edge weight is always "1" for a PPI. For "protein - complex" and "complex - complex" interactions, however, the edge weight can grow rapidly, because the interactions among the member proteins of each complex are counted.
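The behavior of the edge weight can be illustrated with the minimal sketch below, which assumes that the weight is a plain count of the PPIs connecting members of one node with members of the other, and that the two nodes share no member proteins. The function name `edge_weight` and the toy data are hypothetical.

```python
def edge_weight(node_a, node_b, ppis):
    """Count PPIs that connect a member of node_a with a member of node_b.

    A node is a set of protein IDs: one ID for a single protein,
    several IDs for a protein complex.
    """
    edges = {frozenset(p) for p in ppis}
    return sum(
        1
        for a in node_a
        for b in node_b
        if a != b and frozenset((a, b)) in edges
    )

# Hypothetical data: two complexes and one single protein.
ppis = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "P")]
complex_a = {"A1", "A2"}
complex_b = {"B1", "B2"}
protein = {"P"}

print(edge_weight({"A1"}, {"B1"}, ppis))        # 1 (protein - protein)
print(edge_weight(protein, complex_a, ppis))    # 1 (protein - complex)
print(edge_weight(complex_a, complex_b, ppis))  # 3 (complex - complex)
```

Because every pair of member proteins across two complexes can contribute, the weight of a complex - complex edge is bounded by the product of the complex sizes, which explains the rapid growth.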

Colors of the annotated protein complexes
Annotated protein complexes are colored as shown below, according to the combination of the protein types and the rate of matching to known protein complexes in the references.


Fig 4.12.7 Colors of the annotated protein complexes

PPI-Map detailed network window
The subject protein or complex is displayed at the center, surrounded by its adjacent interacting proteins and complexes. Detailed information on the protein or complex is displayed at the bottom of the window. Double-clicking a protein or complex in the window opens a new PPI-Map detailed network window for it.
Mode button: Switch the node labels among Gene Symbol, ID, and None.
Hide Node button: Hide the protein or complex under the mouse pointer.
Reload button: Initialize the window.
View CCI button: Open the PPI-Map whole network window.
Scale bar: Change the scale.


Fig 4.12.8 PPI-Map detailed network window

PPI-Map whole network window
The whole network of CCIs is displayed.
Double-clicking a complex in the window opens a new PPI-Map detailed network window for it.
Edge Weight: Choose an Edge Weight (number of interactions) to display only the interactions with that weight or higher.
Mode button: Switch the node labels among Gene Symbol, ID, and None.
Hide Node button: Hide the protein or complex under the mouse pointer.
Reload button: Initialize the window.
Scale bar: Change the scale.


Fig 4.12.9 PPI-Map whole network window

References

  1. Kikugawa S, Nishikata K, Murakami K, Sato Y, Suzuki M, Altaf-Ul-Amin M, Kanaya S, Imanishi T. PCDq: human protein complex database with quality index which summarizes different levels of evidences of protein complexes predicted from H-Invitational protein-protein interactions integrative dataset. BMC Syst. Biol. 2012. in press.
  2. Alfarano C, et al. The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 2005 Jan 1;33
  3. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004 Jan 1;32
  4. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G. MINT: a Molecular INTeraction database. FEBS Lett. 2002, 513(1):135-140.
  5. Peri S, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003, 13:2363-2371.
  6. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R. IntAct - an open source molecular interaction database. Nucleic Acids Res. 2004, 32:D452-D455.
  7. Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S: Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics 2006, 7:207.