Information Extraction

About
  • Information extraction (IE) is the task of extracting structured information from unstructured text. IE has two main subtasks: Named Entity Recognition (NER) and Relation Extraction (RE). The goal of NER is to automatically identify terms in text and classify them into pre-defined categories. The goal of RE is to classify the relationship between entities. We have explored extracting information from several domains, including clinical texts, FDA drug labels, scientific literature, and PubMed abstracts.
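
The two subtasks can be illustrated with a toy sketch. It is not how medaCy or RelEx work: those packages use trained models, while this example assumes a small hand-written lexicon for NER and a trigger-word rule for RE. The names LEXICON, TRIGGERS, find_entities, find_relations and the "may_cause" relation label are hypothetical and exist only for this illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    text: str   # surface form as it appears in the input
    label: str  # pre-defined category assigned by the NER step
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


# Hypothetical toy lexicon; a real NER model learns categories from annotated data.
LEXICON = {
    "warfarin": "Drug",
    "aspirin": "Drug",
    "bleeding": "AdverseEvent",
}

# Hypothetical trigger words that suggest a drug/adverse-event relation.
TRIGGERS = re.compile(r"\b(cause[sd]?|increase[sd]?|induce[sd]?)\b", re.IGNORECASE)


def find_entities(text: str) -> List[Entity]:
    """NER step: find lexicon terms and tag each with its category."""
    entities = []
    for term, label in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            entities.append(Entity(match.group(0), label, match.start(), match.end()))
    return sorted(entities, key=lambda e: e.start)


def find_relations(text: str, entities: List[Entity]) -> List[Tuple[str, str, str]]:
    """RE step: classify Drug -> AdverseEvent pairs within one sentence."""
    relations = []
    for drug in entities:
        if drug.label != "Drug":
            continue
        for event in entities:
            if event.label != "AdverseEvent" or event.start < drug.end:
                continue
            between = text[drug.end:event.start]
            # Require a trigger word and no sentence boundary between the two mentions.
            if "." not in between and TRIGGERS.search(between):
                relations.append((drug.text, "may_cause", event.text))
    return relations


if __name__ == "__main__":
    sentence = "Warfarin may increase the risk of bleeding. Aspirin was also prescribed."
    found = find_entities(sentence)
    print([(e.text, e.label) for e in found])
    print(find_relations(sentence, found))
```

The NER step produces labelled spans, and the RE step consumes those spans to decide whether a pair of entities is related. Learned models replace both the lexicon lookup and the trigger rule in practice.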

Software
  • medaCy -- A freely available Python package for named entity recognition that includes three state-of-the-art models: CRF, BiLSTM+CRF and BERT+CRF.
  • RelEx -- A freely available Python package for relation extraction using Convolutional Neural Networks.
  • Multitasking Transformers -- A freely available BERT-based multi-task architecture, written in Python, that includes an NER head.

Publications