Arabic named entity recognition in social media based on BiLSTM-CRF using an attention mechanism
Hassan First Univ Settat, Fac Sci & Tech, IR2M Lab, Settat, Morocco.
Hassan First Univ Settat, Fac Sci & Tech, IR2M Lab, Settat, Morocco.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
Sultan Moulay Slimane Univ, Natl Sch Business & Management, Beni Mellal, Morocco.
2022 (English). In: Journal of Intelligent & Fuzzy Systems, ISSN 1064-1246, E-ISSN 1875-8967, Vol. 42, no 6, p. 5427-5436. Article in journal (Refereed), Published.
Abstract [en]

Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP): it finds named entities in natural language text and classifies them into predefined categories such as persons (PER), places (LOC), and organizations (ORG). For Arabic, current deep learning NER approaches mainly take either word embeddings or character-level embeddings as input. However, a single-granularity representation suffers from out-of-vocabulary (OOV) words, word embedding errors, and relatively simple semantic content. This paper presents a multi-head self-attention mechanism integrated into the BiLSTM-CRF neural network structure to recognize Arabic named entities on social media using two embeddings. Unlike other state-of-the-art approaches, this approach combines character and word embeddings at the embedding layer, and the attention mechanism computes similarity over the entire sequence of characters and captures local context information. The proposed approach better recognizes named entities in Dialectal Arabic, reaching an F1 score of 74.15% on Darwish's dataset (a publicly available Arabic NER benchmark for social media). To the best of our knowledge, these results outperform the current state-of-the-art models for Arabic Named Entity Recognition on social media.
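
The two ideas named in the abstract, joining word-level and character-level embeddings and applying multi-head self-attention over the whole sequence, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy vectors, the function names, and the head count are assumptions made for illustration, the learned projection matrices of a real multi-head layer are omitted (each head simply attends over a slice of the input vector), and the BiLSTM and CRF layers are left out entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_embeddings(word_vecs, char_vecs):
    """Join each token's word-level and character-level vector into one vector."""
    return [w + c for w, c in zip(word_vecs, char_vecs)]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        # Similarity of this position to every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Weighted average of all value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return out


def multi_head_self_attention(vectors, num_heads):
    """Split each vector into equal slices, attend per slice, then re-join.

    Assumption: no learned projections, unlike a trained multi-head layer.
    """
    d = len(vectors[0])
    assert d % num_heads == 0, "dimension must be divisible by num_heads"
    size = d // num_heads
    heads = []
    for h in range(num_heads):
        sliced = [v[h * size:(h + 1) * size] for v in vectors]
        heads.append(self_attention(sliced))
    # Concatenate the per-head outputs back into one vector per token.
    return [sum((head[t] for head in heads), []) for t in range(len(vectors))]


# Hypothetical toy embeddings for a three-token sentence.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
char_vecs = [[0.5, 0.5], [0.2, 0.1], [0.0, 0.3]]

tokens = concat_embeddings(word_vecs, char_vecs)   # 3 tokens, 4 dimensions each
attended = multi_head_self_attention(tokens, num_heads=2)
```

In a BiLSTM-CRF tagger, vectors such as `attended` would feed the recurrent layer, and the CRF would then choose the label sequence; where exactly the attention layer sits in the published model is not stated in the abstract.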

Place, publisher, year, edition, pages
IOS Press, 2022. Vol. 42, no 6, p. 5427-5436
Keywords [en]
Arabic named entity recognition (ANER), natural language processing (NLP), multi-head self-attention, BiLSTM, CRF, dialect arabic, social media
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:uu:diva-474816
DOI: 10.3233/JIFS-211944
ISI: 000790690300044
OAI: oai:DiVA.org:uu-474816
DiVA, id: diva2:1660606
Available from: 2022-05-24. Created: 2022-05-24. Last updated: 2025-02-07. Bibliographically approved.

Authority records

Ait-Mlouk, Addi