LNN-EL: A neuro-symbolic approach to short-text entity linking
Hang Jiang, Sairam Gurajada, et al.
ACL-IJCNLP 2021
As applications within and outside the enterprise encounter increasing volumes of unstructured data, there has been renewed interest in information extraction (IE), the discipline concerned with extracting structured information from unstructured text. Classical IE techniques developed by the NLP community were based on cascading grammars and regular expressions. However, due to the inherent limitations of grammar-based extraction, these techniques cannot (i) scale to large data sets or (ii) support the expressivity requirements of complex information tasks. At the IBM Almaden Research Center, we are developing SystemT, an IE system that addresses these limitations by adopting an algebraic approach. By leveraging well-understood database concepts such as declarative queries and cost-based optimization, SystemT enables scalable execution of complex information extraction tasks. In this paper, we motivate the SystemT approach to information extraction. We describe our extraction algebra and demonstrate the effectiveness of our optimization techniques in reducing the running time of complex extraction tasks by orders of magnitude.