SEMANTIC SIMILARITY AND RELATEDNESS
About
- Semantic similarity and relatedness measures quantify the degree to which two concepts are similar (e.g. liver-organ) or related (e.g. headache-aspirin). The automated discovery of groups of semantically similar or related concepts and terms is critical to improving the retrieval and clustering of biomedical and clinical documents, and the development of biomedical terminologies and ontologies.
- Relatedness measures quantify the degree to which two words are associated with each other (e.g. scissors-paper). Similarity is a special case of relatedness that quantifies how alike two concepts are based on their location within an is-a hierarchy (e.g. car-vehicle). The score assigned to a term pair indicates the degree to which the terms are connected through is-a relations. For example, "Lung Cancer" is a type of "Disease", so that pair would receive a high similarity score; "Lung Cancer" and "Coughing" would not, although the two are clearly related.
- In this work, we are exploring taxonomy-based metrics, corpus-based metrics, and hybrids of the two.
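To make the is-a idea concrete, here is a minimal sketch of a path-based similarity score: the reciprocal of the number of concepts on the shortest is-a path between two concepts. The hierarchy and the names `IS_A`, `ancestors`, and `path_similarity` are assumptions made up for illustration; they are not taken from the UMLS or from any particular software package.

```python
# Toy is-a hierarchy (child -> parent). Hypothetical; not taken from the UMLS.
IS_A = {
    "Entity": None,
    "Disease": "Entity",
    "Symptom": "Entity",
    "Cancer": "Disease",
    "Lung Cancer": "Cancer",
    "Coughing": "Symptom",
}


def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = IS_A[concept]
    return chain


def path_similarity(a, b):
    """Reciprocal of the node count on the shortest is-a path between a and b."""
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, c in enumerate(ancestors(a)):
        if c in steps_from_b:
            # c is the lowest shared ancestor; nodes on the path = edges + 1.
            return 1.0 / (steps_from_a + steps_from_b[c] + 1)
    return 0.0  # no shared ancestor


print(path_similarity("Lung Cancer", "Disease"))
print(path_similarity("Lung Cancer", "Coughing"))
```

In this toy hierarchy, "Lung Cancer" and "Disease" are three concepts apart while "Lung Cancer" and "Coughing" are six apart, so the first pair scores higher even though the second pair is clearly related.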
Software
- UMLS-Similarity -- a suite of Perl modules that implement a number of semantic similarity measures. The measures use the UMLS-Interface module to access the UMLS to generate similarity scores between concepts. Currently, this package includes programs that implement the similarity measures described by Leacock & Chodorow (1998), Wu & Palmer (1994), Nguyen & Al-Mubaid (2006), Rada et al. (1989), Jiang & Conrath (1997), Resnik (1995) and Lin (1998), and the relatedness measures proposed by Banerjee & Pedersen (2002) and Patwardhan (2003).
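As a rough illustration of two of the measures named above, the sketch below computes the Wu & Palmer (1994) score, 2 * depth(LCS) / (depth(a) + depth(b)), and the Resnik (1995) score, the information content of the least common subsumer (LCS). It is written in Python for brevity and is not the UMLS-Similarity API; the hierarchy, the corpus counts, and all function names are hypothetical.

```python
import math

# Toy is-a hierarchy (child -> parent). Hypothetical; not taken from the UMLS.
PARENT = {
    "Entity": None,
    "Disease": "Entity",
    "Symptom": "Entity",
    "Cancer": "Disease",
    "Infection": "Disease",
    "Lung Cancer": "Cancer",
    "Pneumonia": "Infection",
    "Coughing": "Symptom",
}

# Made-up corpus counts of each concept's own mentions.
COUNTS = {"Lung Cancer": 10, "Pneumonia": 20, "Coughing": 70}


def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: the deepest concept that is an ancestor of both."""
    above_b = set(ancestors(b))
    for c in ancestors(a):
        if c in above_b:
            return c
    raise ValueError("concepts share no ancestor")


def wu_palmer(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def information_content(concept):
    """-log of the probability of the concept or any of its descendants.

    Assumes the concept subsumes at least one counted concept.
    """
    total = sum(COUNTS.values())
    freq = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return -math.log(freq / total)


def resnik(a, b):
    return information_content(lcs(a, b))


print(wu_palmer("Lung Cancer", "Pneumonia"), wu_palmer("Lung Cancer", "Coughing"))
print(resnik("Lung Cancer", "Pneumonia"), resnik("Lung Cancer", "Coughing"))
```

Under both measures the two diseases score higher than the disease-symptom pair, because their least common subsumer ("Disease") is deeper and more informative than the root.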
Publications
- Measuring the quality of patient-physician communication.
Cuffy, Clint, Nao Hagiwara, Scott Vrana, and Bridget T. McInnes.
Journal of Biomedical Informatics 112 (2020): 103589.
- Association measures for estimating semantic similarity and relatedness between biomedical concepts.
Sam Henry, Alex McQuilkin and Bridget T. McInnes.
Artificial Intelligence in Medicine 93, 1-10, 2019.
- Unsupervised Text Similarity.
Clint Cuffy, Sam Henry and Bridget T. McInnes.
The 2019 n2c2/OHNLP Shared-Task and Workshop, Washington, D.C.
- Vector representations of multi-word terms for semantic relatedness.
Sam Henry, Clint Cuffy and Bridget McInnes. Journal of Biomedical Informatics (JBI), Volume 77, pages 111-119, January 2018.
- Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second-Order Vectors.
Bridget McInnes and Ted Pedersen.
In Proceedings of the 16th Workshop on Biomedical Natural Language Processing (BioNLP) at the Association for Computational Linguistics, 2017.
- VCU at Semeval-2016 Task 14: Evaluating Similarity Measures for Semantic Taxonomy Enrichment.
Bridget T. McInnes. Appears in the Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval), June 2016.
- Evaluating Semantic Similarity and Relatedness over the Semantic Grouping of Clinical Term Pairs. Bridget T. McInnes and Ted Pedersen. Journal of Biomedical Informatics. 2015 Apr; 54:329-336.
- U-path: An undirected path-based measure of semantic similarity. Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and Genevieve B. Melton. Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association (AMIA), November 2014.
- Determining the Difficulty of Word Sense Disambiguation. Bridget T. McInnes and Mark Stevenson. Journal of Biomedical Informatics. 2014 Feb; 47:83-90.
- Evaluating Measures of Semantic Similarity and Relatedness to Disambiguate Terms in Biomedical Text. Bridget T. McInnes and Ted Pedersen. Journal of Biomedical Informatics. 2013 December; 46(6):1116-24.
- UMLS::Similarity: Measuring the Relatedness and Similarity of Biomedical Concepts. Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and Genevieve B. Melton. In the Proceedings of the North American Chapter of the Association for Computational Linguistics Demonstration Systems. June 10-12, 2013, Atlanta, Georgia.
- Evaluating Semantic Relatedness and Similarity Measures with Standardized MedDRA Queries. Robert W. Bill, Ying Liu, Bridget T. McInnes, Genevieve B. Melton, Ted Pedersen, and Serguei Pakhomov. Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association (AMIA), November 2012.
- Semantic Relatedness Study Using Second Order Co-occurrence Vectors Computed from Biomedical Corpora, UMLS and WordNet. Ying Liu, Bridget T. McInnes, Ted Pedersen, Serguei Pakhomov and Genevieve B. Melton. Appears in the Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, January 28-30, 2012, Miami, FL.
- Measuring the Similarity and Relatedness of Concepts in the Medical Domain: IHI 2012 Tutorial Overview. Ted Pedersen, Serguei Pakhomov, Bridget T. McInnes, and Ying Liu. Appears in the Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, January 28-30, 2012, Miami, FL.
- Towards a Framework for Developing Semantic Relatedness Reference Standards. Serguei V. Pakhomov, Ted Pedersen, Bridget T. McInnes, Genevieve B. Melton, Alexander Ruggieri, and Christopher G. Chute. Journal of Biomedical Informatics. 2011 44(2), 251-265.
- Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms Using Information Content Measures of Similarity. Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and Genevieve B. Melton. Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association (AMIA). Oct. 2011, Washington DC.
- Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study. Serguei Pakhomov, Bridget T. McInnes, Terrence Adam, Ying Liu, Ted Pedersen and Genevieve B. Melton. Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association. Nov 13-17, 2010, Washington, DC.
- UMLS-Interface and UMLS-Similarity: Open Source Software for Measuring Paths and Semantic Similarity. Bridget T. McInnes, Ted Pedersen and Serguei V. Pakhomov. In the Proceedings of the Annual Symposium of the American Medical Informatics Association, Nov 14-18, 2009, San Francisco, CA.