
Hands-on knowledge from top Graph AI minds! Learn from Workshops, Masterclasses, Presentations & Networking sessions.

Machine Learning-powered Taxonomization: AI Lends Taxonomists a Hand

A Talk by Alena Vasilevich
Computational Linguist, Coreon GmbH


About this talk

In the realm of data-driven businesses, formalized knowledge is a valuable resource for AI projects, but one that is created at great expense.

IATE, with almost one million concepts storing multilingual terms and metadata, holds a large part of the textual knowledge of the EU. However, it can only be accessed lexically, and its concepts stand alone, with no links between them.

Taxonomization links a flat set of concepts into a hierarchical knowledge graph. If IATE were converted into a full-fledged ontology, its data could be consumed not only by linguists but also by machines, for example through a SPARQL endpoint.
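
To make the idea of "linking a flat set of concepts" concrete, here is a minimal sketch, not Coreon's or IATE's actual data model. It assumes each concept is a dictionary of preferred terms keyed by language, and that taxonomization yields one parent per concept. It emits W3C SKOS triples in N-Triples syntax, a format a triple store can load and then serve through a SPARQL endpoint. The namespace `http://example.org/concept/`, the concept IDs, and the helper names are hypothetical.

```python
# Sketch: flat multilingual concepts + parent links -> SKOS triples (N-Triples).
BASE = "http://example.org/concept/"  # hypothetical namespace, not IATE's
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str, lang: str) -> str:
    """Return a language-tagged N-Triples literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_skos_ntriples(concepts: dict[str, dict[str, str]],
                     broader: dict[str, str]) -> list[str]:
    """concepts: id -> {lang: preferred term}; broader: child id -> parent id."""
    lines = []
    for cid, terms in sorted(concepts.items()):
        subj = f"<{BASE}{cid}>"
        lines.append(f"{subj} <{RDF_TYPE}> <{SKOS}Concept> .")
        for lang, term in sorted(terms.items()):
            lines.append(f"{subj} <{SKOS}prefLabel> {literal(term, lang)} .")
        # The hierarchical link is what turns the flat list into a graph.
        if cid in broader:
            lines.append(f"{subj} <{SKOS}broader> <{BASE}{broader[cid]}> .")
    return lines


# Hypothetical example data (not taken from IATE).
concepts = {
    "disease": {"en": "infectious disease", "de": "Infektionskrankheit"},
    "covid": {"en": "COVID-19", "de": "COVID-19"},
    "vaccine": {"en": "vaccine", "de": "Impfstoff"},
    "covid_vaccine": {"en": "COVID-19 vaccine", "de": "COVID-19-Impfstoff"},
}
broader = {"covid": "disease", "covid_vaccine": "vaccine"}

if __name__ == "__main__":
    for line in to_skos_ntriples(concepts, broader):
        print(line)
```

Once such triples are loaded into a store, a SPARQL 1.1 property path such as `?c skos:broader+ <http://example.org/concept/disease>` retrieves every concept below "infectious disease". A purely lexical term lookup cannot answer that kind of question.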

In this talk, we will present our approach to the semi-automatic generation of taxonomized concept maps, elevating a sub-domain of IATE terminology into a multilingual knowledge graph. We taxonomized a flat list of concepts within the COVID sub-domain and benchmarked two approaches to the task: automatic concept map creation using an enhanced ML-powered language model, and manual creation of the graph by an expert linguist.

We will discuss the performance and resource-saving advantages of our collaborative method, made easy by Coreon's user-friendly UI, and show how the achieved productivity rate can make the taxonomization of even larger terminology databases economically viable.

To demonstrate empirically the effectiveness of the semi-automatic approach in a typical industry use case, we used the resulting IATE/Covid graph to initialize a CNN for a multilingual document classification task. By leveraging the created taxonomy, we achieved a classification granularity that state-of-the-art models, such as non-initialized CNNs and zero-shot classifiers, do not reach.


Alena Vasilevich

Alena Vasilevich holds an international MSc degree in Language Science and Technology from Saarland University. At Coreon, she focuses on pragmatic data conversion, hands-on natural language processing, and analytics.
