Uppsala University Publications
Basirat, Ali, Postdoctoral Researcher
Publications (10 of 14)
Basirat, A. & Tang, M. (2019). Linguistic information in word embeddings. In: Randy Goebel, Yuzuru Tanaka, and Wolfgang Wahlster (Eds.), Lecture notes in artificial intelligence. Dordrecht: Springer
2019 (English). In: Lecture notes in artificial intelligence / [ed] Randy Goebel, Yuzuru Tanaka, and Wolfgang Wahlster. Dordrecht: Springer, 2019. Chapter in book (Refereed)
Place, publisher, year, edition, pages
Dordrecht: Springer, 2019
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:uu:diva-366478 (URN)
Available from: 2018-11-20 Created: 2018-11-20 Last updated: 2019-03-27
Basirat, A., de Lhoneux, M., Kulmizev, A., Kurfal, M., Nivre, J. & Östling, R. (2019). Polyglot Parsing for One Thousand and One Languages (And Then Some). Paper presented at the First Workshop on Typology for Polyglot NLP, Florence, Italy, August 1, 2019.
2019 (English). Conference paper, Poster (with or without abstract) (Other academic)
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:uu:diva-392156 (URN)
Conference
First workshop on Typology for Polyglot NLP, Florence, Italy, August 1 2019
Available from: 2019-08-29 Created: 2019-08-29 Last updated: 2019-08-30. Bibliographically approved
Basirat, A. (2019). Random Word Vectors. Paper presented at the 3rd Swedish Symposium on Deep Learning (SSDL), Norrköping, June 10-11, 2019.
2019 (English). Conference paper, Poster (with or without abstract) (Other academic)
Place, publisher, year, edition, pages
Norrköping, 2019
National Category
General Language Studies and Linguistics Computer Systems
Identifiers
urn:nbn:se:uu:diva-392155 (URN)
Conference
3rd Swedish Symposium on Deep Learning (SSDL), Norrköping, June 10-11 2019
Available from: 2019-08-29 Created: 2019-08-29 Last updated: 2019-08-30. Bibliographically approved
Basirat, A. & Nivre, J. (2019). Real-valued syntactic word vectors. Journal of experimental and theoretical artificial intelligence (Print)
2019 (English). In: Journal of experimental and theoretical artificial intelligence (Print), ISSN 0952-813X, E-ISSN 1362-3079. Article in journal (Refereed). Published
Abstract [en]

We introduce a word embedding method that generates a set of real-valued word vectors from a distributional semantic space. The semantic space is built with a set of context units (words) which are selected by an entropy-based feature selection approach with respect to the certainty involved in their contextual environments. We show that the most predictive context of a target word is its preceding word. An adaptive transformation function is also introduced that reshapes the data distribution to make it suitable for dimensionality reduction techniques. The final low-dimensional word vectors are formed by the singular vectors of a matrix of transformed data. We show that the resulting word vectors are as good as other sets of word vectors generated with popular word embedding methods.
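The pipeline the abstract describes — count how often each word follows each context word, transform the skewed counts, then take singular vectors of the transformed matrix — can be sketched on a toy corpus (NumPy only; the corpus, the square-root transformation, and the dimensionality k are illustrative placeholders, not the paper's actual choices):

```python
import numpy as np

# Toy corpus; the paper finds the preceding word to be the most
# predictive context of a target word.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Co-occurrence matrix: rows = target words, columns = preceding-word contexts.
C = np.zeros((len(vocab), len(vocab)))
for prev, cur in zip(corpus, corpus[1:]):
    C[idx[cur], idx[prev]] += 1

# A simple power transformation to reshape the skewed count distribution
# (a stand-in for the paper's adaptive transformation function).
T = C ** 0.5

# Low-dimensional word vectors from the singular vectors of the
# transformed matrix.
k = 3
U, S, Vt = np.linalg.svd(T, full_matrices=False)
word_vectors = U[:, :k] * S[:k]
print(word_vectors.shape)  # one k-dimensional vector per vocabulary word
```

In practice the vocabulary and context set are large and the counts come from a full corpus; the sketch only shows how the three stages fit together.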

Keywords
Word embeddings, context selection, transformation, dependency parsing, singular value decomposition, entropy
National Category
Languages and Literature General Language Studies and Linguistics Computer Systems
Identifiers
urn:nbn:se:uu:diva-392095 (URN)10.1080/0952813X.2019.1653385 (DOI)
Available from: 2019-08-29 Created: 2019-08-29 Last updated: 2019-08-29. Bibliographically approved
Basirat, A. & Tang, M. (2018). Lexical and Morpho-syntactic Features in Word Embeddings: A Case Study of Nouns in Swedish. In: Proceedings of the 10th International Conference on Agents and Artificial Intelligence: Volume 2 (pp. 663-674). Setubal: SciTePress
2018 (English). In: Proceedings of the 10th International Conference on Agents and Artificial Intelligence: Volume 2, Setubal: SciTePress, 2018, p. 663-674. Chapter in book (Refereed)
Place, publisher, year, edition, pages
Setubal: SciTePress, 2018
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:uu:diva-351926 (URN)978-989-758-275-2 (ISBN)
Available from: 2018-05-31 Created: 2018-05-31 Last updated: 2018-11-21. Bibliographically approved
Basirat, A. (2018). Principal Word Vectors. (Doctoral dissertation). Uppsala: Acta Universitatis Upsaliensis
2018 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Word embedding is a technique for associating the words of a language with real-valued vectors, enabling us to use algebraic methods to reason about their semantic and grammatical properties. This thesis introduces a word embedding method called principal word embedding, which makes use of principal component analysis (PCA) to train a set of word embeddings for words of a language. The principal word embedding method involves performing a PCA on a data matrix whose elements are the frequency of seeing words in different contexts. We address two challenges that arise in the application of PCA to create word embeddings. The first challenge is related to the size of the data matrix on which PCA is performed and affects the efficiency of the word embedding method. The data matrix is usually a large matrix that requires a very large amount of memory and CPU time to be processed. The second challenge is related to the distribution of word frequencies in the data matrix and affects the quality of the word embeddings. We provide an extensive study of the distribution of the elements of the data matrix and show that it is unsuitable for PCA in its unmodified form.

We overcome the two challenges in principal word embedding by using a generalized PCA method. The problem with the size of the data matrix is mitigated by a randomized singular value decomposition (SVD) procedure, which improves the performance of PCA on the data matrix. The data distribution is reshaped by an adaptive transformation function, which makes it more suitable for PCA. These techniques, together with a weighting mechanism that generalizes many of the weighting and transformation approaches used in the literature, enable the principal word embedding method to train high-quality word embeddings in an efficient way.

We also provide a study on how principal word embedding is connected to other word embedding methods. We compare it to a number of word embedding methods and study how the two challenges in principal word embedding are addressed in those methods. We show that the other word embedding methods are closely related to principal word embedding and, in many instances, they can be seen as special cases of it.

The principal word embeddings are evaluated in both intrinsic and extrinsic ways. The intrinsic evaluations are directed towards the study of the distribution of word vectors. The extrinsic evaluations measure the contribution of principal word embeddings to some standard NLP tasks. The experimental results confirm that the newly proposed features of principal word embedding (i.e., the randomized SVD algorithm, the adaptive transformation function, and the weighting mechanism) are beneficial to the method and lead to significant improvements in the results. A comparison between principal word embedding and other popular word embedding methods shows that, in many instances, the proposed method is able to generate word embeddings that are better than or as good as other word embeddings while being faster than several popular word embedding methods.
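The randomized SVD used to tame the size of the data matrix can be illustrated with a basic Gaussian-sketch variant (in the style of Halko et al.); the matrix below is a random stand-in, not real word-context data, and this is a generic sketch rather than the thesis's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the (usually huge and sparse) word-context frequency matrix.
A = rng.poisson(0.3, size=(500, 200)).astype(float)

def randomized_svd(A, k, oversample=10):
    """Basic randomized SVD: project onto a small random subspace,
    then run an exact SVD on the reduced matrix."""
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)   # orthonormal basis for the approximate range
    B = Q.T @ A                      # small (k + oversample) x n matrix
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], S[:k], Vt[:k]

U, S, Vt = randomized_svd(A, k=20)
approx = (U * S) @ Vt
rel_err = np.linalg.norm(A - approx) / np.linalg.norm(A)
print(U.shape, S.shape)
```

The payoff is that the exact SVD runs on a small (k + oversample) × n matrix instead of the full data matrix, which is what makes PCA feasible at corpus scale.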

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2018. p. 159
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 22
Keywords
word, context, word embedding, principal component analysis, PCA, sparse matrix, singular value decomposition, SVD, entropy
National Category
General Language Studies and Linguistics Language Technology (Computational Linguistics) Computer Systems
Identifiers
urn:nbn:se:uu:diva-353866 (URN)978-91-513-0365-9 (ISBN)
Public defence
2018-09-08, Room 22-0008, Humanistiska teatern, 752 38, Uppsala, 09:00 (English)
Available from: 2018-08-14 Created: 2018-06-17 Last updated: 2018-08-27
Zarei, F., Basirat, A., Faili, H. & Mirain, M. (2017). A bootstrapping method for development of Treebank. Journal of experimental and theoretical artificial intelligence (Print), 29(1), 19-42
2017 (English). In: Journal of experimental and theoretical artificial intelligence (Print), ISSN 0952-813X, E-ISSN 1362-3079, Vol. 29, no 1, p. 19-42. Article in journal (Refereed). Published
Abstract [en]

Using statistical approaches alongside the traditional methods of natural language processing (NLP) can significantly improve both the quality and the performance of several NLP tasks. The effective use of these approaches depends on the availability of informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and linguistically rich elementary structure called a supertag. To this end, a hybrid method for supertagging is proposed that combines the generative and discriminative approaches to supertagging. The method was applied to a subset of the Wall Street Journal (WSJ) in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar, which uses a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way of annotating English sentences with these structures. The experiments show that the method can automatically annotate about 20% of the WSJ with an F-measure of about 80%, which is 12% higher than the F-measure of the XTAG Treebank automatically generated by the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].
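The bootstrapping loop — train on a seed set, annotate unlabeled data, keep only confident annotations, and fold them back into the training data — can be sketched with a deliberately trivial lexicon-lookup "tagger" (the data, confidence threshold, and tagger are placeholders, not the paper's supertagger):

```python
from collections import Counter, defaultdict

# Tiny seed set of (word, tag) pairs and a pool of unlabeled tokens.
seed = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("the", "DET")]
unlabeled = ["the", "cat", "sat", "mat", "the"]

# "Model": per-word tag frequencies learned from the seed set.
counts = defaultdict(Counter)
for word, tag in seed:
    counts[word][tag] += 1

def predict(word):
    """Return (tag, confidence), or (None, 0.0) for unknown words."""
    if word not in counts:
        return None, 0.0
    tag, n = counts[word].most_common(1)[0]
    return tag, n / sum(counts[word].values())

annotated = []
for word in unlabeled:
    tag, conf = predict(word)
    if tag is not None and conf >= 0.9:   # keep only confident annotations
        annotated.append((word, tag))
        counts[word][tag] += 1            # fold back into the training data

coverage = len(annotated) / len(unlabeled)
print(annotated, coverage)
```

The unknown word "mat" is left unannotated, mirroring how the real method annotates only the fraction of the corpus (about 20% of the WSJ) where it is confident.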

Keywords
Treebank, supertagging, parser, annotated corpus, bootstrapping, semi-supervised
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-316124 (URN)10.1080/0952813X.2015.1057239 (DOI)000392422400002 ()
Available from: 2017-03-10 Created: 2017-03-10 Last updated: 2018-01-13. Bibliographically approved
de Lhoneux, M., Yan, S., Basirat, A., Kiperwasser, E., Stymne, S., Goldberg, Y. & Nivre, J. (2017). From raw text to Universal Dependencies: look, no tags! In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Paper presented at CoNLL 2017, August 3-4, 2017, Vancouver, Canada (pp. 207-217). Vancouver, Canada: Association for Computational Linguistics
2017 (English). In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver, Canada: Association for Computational Linguistics, 2017, p. 207-217. Conference paper, Published paper (Refereed)
Abstract [en]

We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies. Our system is a simple pipeline consisting of two components. The first performs joint word and sentence segmentation on raw text; the second predicts dependency trees from raw words. The parser bypasses the need for part-of-speech tagging, but uses word embeddings based on universal tag distributions. We achieved a macro-averaged LAS F1 of 65.11 in the official test run and obtained the second-best result for sentence segmentation with a score of 89.03. After fixing two bugs, we obtained an unofficial LAS F1 of 70.49.
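The two-component pipeline — joint word and sentence segmentation followed by dependency parsing on raw words, with no part-of-speech tagging in between — can be pictured as a simple composition (both stages here are trivial placeholders, not the Uppsala components):

```python
def segment(raw_text):
    """Joint word and sentence segmentation (placeholder: split on '.')."""
    sentences = [s.strip() for s in raw_text.split(".") if s.strip()]
    return [s.split() for s in sentences]

def parse(words):
    """Dependency parsing from raw words (placeholder: attach every
    word to its predecessor; 0 marks the artificial root)."""
    return [(i + 1, i) for i in range(len(words))]

def pipeline(raw_text):
    # No tagger between the two stages: raw text -> words -> trees.
    return [parse(words) for words in segment(raw_text)]

trees = pipeline("the cat sat. the dog ran.")
print(trees)
```

The point of the sketch is only the shape of the system: each stage consumes the previous stage's raw output directly, so no part-of-speech layer is needed.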

Place, publisher, year, edition, pages
Vancouver, Canada: Association for Computational Linguistics, 2017
Keywords
dependency, parsing, multilingual, segmentation
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-333439 (URN)978-1-945626-70-8 (ISBN)
Conference
CoNLL 2017, August 3-4, 2017, Vancouver, Canada
Available from: 2017-11-13 Created: 2017-11-13 Last updated: 2018-04-10. Bibliographically approved
Basirat, A. & Tang, M. (2017). Neural network and human cognition: A case study of grammatical gender in Swedish. In: Proceedings of the 13th Swedish Cognitive Science Society (SweCog) national conference. Paper presented at the 13th Swedish Cognitive Science Society (SweCog) national conference (pp. 28-30). Uppsala
2017 (English). In: Proceedings of the 13th Swedish Cognitive Science Society (SweCog) national conference, Uppsala, 2017, p. 28-30. Conference paper, Oral presentation with published abstract (Other academic)
Place, publisher, year, edition, pages
Uppsala, 2017
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:uu:diva-336891 (URN)978-91-983667-2-3 (ISBN)
Conference
13th Swedish Cognitive Science Society (SweCog) national conference
Available from: 2017-12-18 Created: 2017-12-18 Last updated: 2018-01-13. Bibliographically approved
Basirat, A. & Nivre, J. (2017). Real-valued Syntactic Word Vectors (RSV) for Greedy Neural Dependency Parsing. Paper presented at the 21st Nordic Conference on Computational Linguistics (NoDaLiDa) (pp. 21-28). Linköping University
2017 (English). Conference paper, Published paper (Refereed)
Abstract [en]

We show that a set of real-valued word vectors formed by the right singular vectors of a transformed co-occurrence matrix is meaningful for determining different types of dependency relations between words. Our experimental results on the task of dependency parsing confirm the superiority of these word vectors over other sets of word vectors generated by popular word embedding methods. We also study the effect of using these vectors on the accuracy of dependency parsing in different languages, versus using more complex parsing architectures.

Place, publisher, year, edition, pages
Linköping University, 2017
Series
NEALT Proceedings Series, ISSN 1650-3686, E-ISSN 1650-3740
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics; Computer Science with specialization in Computer Communication
Identifiers
urn:nbn:se:uu:diva-336722 (URN)978-91-7685-601-7 (ISBN)
Conference
21st Nordic Conference on Computational Linguistics (NoDaLiDa)
Available from: 2017-12-15 Created: 2017-12-15 Last updated: 2018-01-13
