Data integration for robust network-based disease gene prediction
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

For many complex diseases, the cause or mechanism cannot be tied to a single gene, and a systems-wide approach is needed to cope with the complexity. By combining evidence indicative of functional association, it is possible to infer networks of protein functional coupling. The reliability of these networks depends on having sufficient data and on the data being informative.

By combining evidence from multiple species, functional coupling networks can reach higher coverage and accuracy. Genes in different species derived from the same gene by a speciation event are orthologous and likely to have a conserved function. In order to enable the transfer of information across species we inferred orthology with the InParanoid algorithm and made the inferences available to the public in the associated database.
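As a rough illustration of pairwise orthology inference, the sketch below finds reciprocal best hits between the proteins of two species. This is a deliberate simplification and not the InParanoid algorithm itself, which works from BLAST scores and also groups inparalogs; the protein names and similarity scores are invented for the example.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring target protein."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (higher means more similar).
a_vs_b = {
    ("a1", "b1"): 250.0,
    ("a1", "b2"): 90.0,
    ("a2", "b2"): 180.0,
    ("a3", "b2"): 120.0,
}
b_vs_a = {
    ("b1", "a1"): 245.0,
    ("b2", "a2"): 175.0,
    ("b2", "a3"): 118.0,
}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data, a3 also hits b2, but b2's best hit is a2, so a3 is left without a putative ortholog.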

Identification of genes involved in diseases is an important biomedical goal. Based on the "guilt by association" principle, we implemented an approach, MaxLink, for identifying and prioritizing novel disease genes. By searching the FunCoup network for genes functionally coupled to cancer genes, we identified some 1800 novel cancer gene candidates showing characteristics of cancer genes.

While proteins are the active components, mRNA is often used as a proxy due to the difficulty of measuring protein abundance. We examined the relationship between mRNA and protein, using properties of expression profiles to identify subsets of genes with higher mRNA-protein concordance.

If technical and biological differences between patient/control studies of gene expression have a large impact, the results of studies of the same disease might be inconsistent. To determine this impact we examined the consistency in differential (co)expression between different studies of cancer, as well as non-cancer studies. Such consistency could generally be found, even between studies of different diseases, but only when common pitfalls of gene expression analysis are avoided.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2013. 71 p.
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
URN: urn:nbn:se:su:diva-87962; ISBN: 978-91-7447-629-3 (print); OAI: oai:DiVA.org:su-87962; DiVA: diva2:608489
Public defence
2013-04-12, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 5: Manuscript.

Available from: 2013-03-21. Created: 2013-02-27. Last updated: 2013-03-18. Bibliographically approved.
List of papers
1. InParanoid 6: eukaryotic ortholog clusters with inparalogs
2008 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 36, p. D263-D266. Article in journal (Refereed). Published.
Abstract [en]

The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.
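The figure of 595 species pairs follows directly from the 35 species: the number of unordered pairs is 35 × 34 / 2 = 595. A minimal check, using placeholder species labels rather than the actual species in the database:

```python
from itertools import combinations
from math import comb

# 35 placeholder labels standing in for the species in the database.
species = [f"species_{i}" for i in range(1, 36)]

# Every unordered pair of distinct species is one pairwise comparison.
pairs = list(combinations(species, 2))
n_pairs = len(pairs)

print(n_pairs, comb(35, 2))
```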

Keyword
Animals, Cluster Analysis, Databases, Protein, Gene Duplication, Humans, Internet, Phylogeny, Proteins/*genetics, Proteomics
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-17878 (URN), 10.1093/nar/gkm1020 (DOI), 000252545400048 (), 18055500 (PubMedID)
Available from: 2009-01-21. Created: 2009-01-21. Last updated: 2013-02-28. Bibliographically approved.
2. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis
2010 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 38, no 1, p. D196-D203. Article in journal (Refereed). Published.
Abstract [en]

The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment, but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, the response times have been improved and the site now sports a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare among these resources due to a lack of standardized source data and incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-34279 (URN), 10.1093/nar/gkp931 (DOI), 000276399100030 (), 19892828 (PubMedID)
Available from: 2010-01-18. Created: 2010-01-07. Last updated: 2017-12-12. Bibliographically approved.
3. Network-based Identification of Novel Cancer Genes
2010 (English). In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 9, no 4, p. 648-655. Article in journal (Refereed). Published.
Abstract [en]

Genes involved in cancer susceptibility and progression can serve as templates for searching protein networks for novel cancer genes. To this end, we introduce a general network searching method, MaxLink, and apply it to find and rank cancer gene candidates by their connectivity to known cancer genes. Using a comprehensive protein interaction network, we searched for genes connected to known cancer genes. First, we compiled a new set of 812 genes involved in cancer, more than twice the number in the Cancer Gene Census. Their network neighbors were then extracted. This candidate list was refined by selecting genes with unexpectedly high levels of connectivity to cancer genes and without previous association to cancer. This produced a list of 1891 new cancer candidates with up to 55 connections to known cancer genes. We validated our method by cross-validation, Gene Ontology term bias, and differential expression in cancer versus normal tissue. An example novel cancer gene candidate is presented with detailed analysis of the local network and neighbor annotation. Our study provides a ranked list of high priority targets for further studies in cancer research. Supplemental material is included.
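The search described above can be sketched roughly as follows: count each non-seed gene's links to known cancer genes, keep those whose count is unexpectedly high, and rank by link count. This is only an illustration of the idea, not the published MaxLink implementation. The hypergeometric test, the `alpha` threshold, and the toy network are all assumptions made for the example, since the abstract does not say how "unexpectedly high" is measured.

```python
from math import comb


def build_network(edges):
    """Build an undirected network as a dict of gene -> set of neighbours."""
    network = {}
    for a, b in edges:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def rank_candidates(network, seeds, alpha=0.05):
    """Rank non-seed genes by their number of links to seed genes.

    Assumption: enrichment is judged with a hypergeometric test; the
    source does not specify which statistic is used.
    """
    n_other = len(network) - 1  # possible neighbours of any single gene
    ranked = []
    for gene, neighbours in network.items():
        if gene in seeds:
            continue  # known cancer genes are not candidates
        links = len(neighbours & seeds)
        if links == 0:
            continue
        p = hypergeom_sf(links, n_other, len(seeds), len(neighbours))
        if p < alpha:
            ranked.append((gene, links, p))
    # Most links first; ties broken by p-value, then by gene name.
    return sorted(ranked, key=lambda r: (-r[1], r[2], r[0]))


# Hypothetical toy network: s1..s3 play the role of known cancer genes.
edges = [
    ("c1", "s1"), ("c1", "s2"), ("c1", "s3"),
    ("c2", "s1"), ("c2", "x1"), ("c2", "x2"), ("c2", "x3"),
    ("x1", "x2"), ("x3", "x4"), ("x4", "x5"), ("x5", "x6"), ("s2", "s3"),
]
network = build_network(edges)
seeds = {"s1", "s2", "s3"}

ranked = rank_candidates(network, seeds)
print(ranked)
```

Here c1 is retained because all three of its links go to seed genes, while c2 is dropped because a single seed link among four neighbours is not surprising in a network of this size.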

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-49937 (URN), 10.1074/mcp.M900227-MCP200 (DOI), 000276379400005 ()
Available from: 2010-12-20. Created: 2010-12-20. Last updated: 2017-12-11. Bibliographically approved.
4. Quality criteria for finding genes with high mRNA-protein expression correlation and coexpression correlation
2012 (English). In: Gene, ISSN 0378-1119, E-ISSN 1879-0038, Vol. 497, no 2, p. 228-236. Article in journal (Refereed). Published.
Abstract [en]

mRNA expression is widely used as a proxy for protein expression. However, their true relation is not known, and two genes with the same mRNA levels might have different abundances of their respective proteins. A related question is whether the coexpression of mRNA for gene pairs is reflected by the corresponding protein pairs. We examined the mRNA-protein correlation for both expression and coexpression. This analysis yielded insights into the relationship between mRNA and protein abundance, and allowed us to identify subsets of greater mRNA-protein coherence. The correlation between mRNA and protein was low for both expression and coexpression, 0.12 and 0.06 respectively. However, applying the best-performing quality measure, high-quality subsets reached a Spearman correlation of 0.31 for expression, 0.34 for coexpression and 0.49 for coexpression when restricted to functionally coupled genes. Our methodology can thus identify subsets for which the mRNA levels are expected to be most strongly correlated with protein levels.
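For readers unfamiliar with the statistic, the sketch below computes a Spearman correlation as the Pearson correlation of rank-transformed values, with tied values sharing their average rank. The expression values are invented and stand in for per-gene mRNA and protein abundances; this is not the paper's analysis pipeline.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical abundances for five genes.
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [10.0, 30.0, 20.0, 50.0, 40.0]

rho = spearman(mrna, protein)
print(round(rho, 3))
```

Because only ranks are used, the measure is insensitive to the very different scales on which mRNA and protein abundance are typically reported.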

Keyword
mRNA expression, mRNA coexpression, Protein expression, Protein coexpression, mRNA-protein expression concordance, Microarray
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-76043 (URN), 10.1016/j.gene.2012.01.029 (DOI), 000302584300012 ()
Available from: 2012-05-09. Created: 2012-05-08. Last updated: 2017-12-07. Bibliographically approved.
5. Pitfalls in gene (co)expression meta-analysis
(English). Manuscript (preprint) (Other academic)
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-87964 (URN)
Available from: 2013-02-27. Created: 2013-02-27. Last updated: 2013-02-28. Bibliographically approved.

Open Access in DiVA

Full text (1161 kB)
File information
File name: FULLTEXT01.pdf. File size: 1161 kB. Checksum: SHA-512
45131884bc164045c4add01c9e47c957cf5d9263e35929ccd82eb45c35c349e0bbee4b76c1dc729f2c6117771d9cc7699d518ef842e2b34d7422f5dc77b0c52e
Type: fulltext. Mimetype: application/pdf

Author/editor
Östlund, Gabriel
