Information Extraction
About
- Information extraction (IE) is the task of extracting structured information from unstructured text. IE has two main subtasks: Named Entity Recognition (NER) and Relation Extraction (RE). The goal of NER is to automatically identify terms in text and classify them into predefined categories. The goal of RE is to identify and classify the relationships between those entities. We have explored extracting information from a number of domains, including clinical texts, FDA drug labels, scientific literature, and PubMed abstracts.
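To make the two subtasks concrete, here is a minimal sketch in plain Python. It is not the API of medaCy, RelEx, or any other package listed here. It assumes that an NER model emits one BIO tag per token (`B-` begins an entity, `I-` continues it, `O` marks tokens outside any entity). It converts those tags into entity spans and then enumerates the entity pairs that an RE model would classify. The function names and the labels `Drug` and `ADE` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence (hypothetical NER output) into entity spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span on B-, O, or an I- tag whose label does not match.
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Tolerate an I- tag with no preceding B- by starting a new span.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def candidate_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Enumerate unordered entity pairs; an RE model would label each pair
    with a relation type (or "no relation")."""
    return list(combinations(spans, 2))


if __name__ == "__main__":
    tokens = ["Aspirin", "caused", "severe", "nausea", "."]
    tags = ["B-Drug", "O", "B-ADE", "I-ADE", "O"]  # hypothetical labels
    spans = bio_to_spans(tags)
    for (s1, e1, l1), (s2, e2, l2) in candidate_pairs(spans):
        print(" ".join(tokens[s1:e1]), l1, "<->", " ".join(tokens[s2:e2]), l2)
```

Enumerating every pair is the simplest candidate-generation strategy. Real systems often restrict pairs by entity type or sentence distance to limit the number of negative examples passed to the relation classifier.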
Software
- medaCy -- A freely available Python package for named entity recognition that includes three state-of-the-art models: CRF, BiLSTM+CRF and BERT+CRF.
- RelEx -- A freely available Python package for relation extraction using Convolutional Neural Networks.
- Multitasking Transformers -- A freely available BERT-based multitask architecture written in Python that includes an NER head.
Publications
- MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning.
Andriy Mulyar, Ozlem Uzuner, and Bridget T. McInnes.
Journal of the American Medical Informatics Association 28.10 (2021): 2108-2115.
- Identifying Chemical Reactions and Their Associated Attributes in Patents.
Darshini Mahendran, Gabrielle Gurdin, Nastassja Lewinski, Christina Tang, and Bridget T. McInnes.
Frontiers in Research Metrics and Analytics. 6:688353; 2021.
- Review of Temporal Reasoning in the Clinical Domain for Timeline Extraction: Where we are and where we need to be.
Amy Olex and Bridget T. McInnes.
Journal of Biomedical Informatics (2021): 103784.
- Extracting Adverse Drug Events from Clinical Notes.
Samantha Mahendran and Bridget T. McInnes.
In Proceedings of the 2020 American Medical Informatics Summit. March 2020.
- Pseudo-data generation for the extraction of Problems, Treatments and Tests.
Jeff Smith, Evan French, William Cramer, Ozlem Uzuner and Bridget T. McInnes.
In Proceedings of the 2020 American Medical Informatics Summit. March 2020.
- NLP@VCU: Identifying adverse effects in English tweets for unbalanced data.
Darshini Mahendran, Cora Lewis, and Bridget McInnes.
In Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task, pp. 158-160. 2020. (code).
- NLPatVCU CLEF 2020 ChEMU Shared Task System Description.
Darshini Mahendran, Gabrielle Gurdin, Nastassja Lewinski, Christina Tang, and Bridget T. McInnes.
In the Conference and Labs of the Evaluation Forum (CLEF) 2020 Working Notes.
- Adverse drug event detection using reason assignments in FDA drug labels.
Corey Sutphin, Kahyun Lee, Antonio Jimeno Yepes, Özlem Uzuner, and Bridget T. McInnes.
Journal of Biomedical Informatics 110 (2020): 103552.
- NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction.
Amy Olex, Luke Maffey, and Bridget T. McInnes.
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, pp. 3682-3692, 2019.
- SciREL at SemEval-2018 Task 7: A System for Semantic Relation Extraction and Classification. Darshini Mahendran, Chathurika S. Wickramasinghe and Bridget T. McInnes. Proceedings of The 12th International Workshop on Semantic Evaluation, pages 853-857, June 2018.
- Chrono at SemEval-2018 Task 6: A System for Normalizing Temporal Expressions. Amy Olex, Luke Maffey, Nicholas Morgan, and Bridget T. McInnes. Proceedings of The 12th International Workshop on Semantic Evaluation, pages 97-101, June 2018.
- An annotated corpus with nanomedicine and pharmacokinetic parameters.
Nastassja Lewinski, Ivan Jimenez and Bridget McInnes. International Journal of Nanomedicine. 2017; 12: 7519–7527.
- Nanomedicine Entity Extraction. Bridget McInnes, Ryan Murphy, Gabrielle Jones, Marley Hodson, Tanin Izadi, Ivan Jimenez, and Nastassja Lewinski. Appears in the American Medical Informatics Association (AMIA) Annual Symposium, November 2016. (poster)
- END, an Annotated Nanomedicine Corpus. Nastassja Lewinski, Marley Hodson, Tanin Izadi, Ivan Jimenez, Ryan Murphy, Gabrielle Jones and Bridget McInnes. Appears in the International Nanotoxicology Congress (nanoTOX), June 2016. (poster)