Publications from Uppsala University

Megyesi, Beáta, Professor. ORCID iD: orcid.org/0000-0002-4838-6518
Publications (10 of 99)
Megyesi, B., Tudor, C., Láng, B., Lehofer, A., Kopal, N., de Leeuw, K. & Waldispühl, M. (2024). Keys with nomenclatures in the early modern Europe. Cryptologia, 48(2), 97-139
2024 (English). In: Cryptologia, ISSN 0161-1194, E-ISSN 1558-1586, Vol. 48, no 2, p. 97-139. Article in journal (Refereed), Published
Abstract [en]

We give an overview of the development of European historical cipher keys originating from early modern times. We describe the nature and the structure of the keys with a special focus on the nomenclatures. We analyze what was encoded and how, and take into account chronological and regional differences. The study is based on the analysis of over 1,600 cipher keys, collected from archives and libraries in 10 European countries. We show that historical cipher keys evolved over time and became more secure, as shown by the symbol set used for encoding, the code length and the code types presented in the key, the size of the nomenclature, as well as the diversity and complexity of linguistic entities that are chosen to be encoded.

Place, publisher, year, edition, pages
Taylor & Francis Group, 2024
Keywords
cryptanalysis, cipher keys, DECRYPT project, historical cryptology
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-490560 (URN); 10.1080/01611194.2022.2113185 (DOI); 000878534700001 ()
Funder
Swedish Research Council, 2018-06074
Available from: 2022-12-12 Created: 2022-12-12 Last updated: 2025-02-07. Bibliographically approved
Megyesi, B., Sikora, J., Fornmark, F., Waldispühl, M., Kopal, N. & Mikhalev, V. (2023). Historical Language Models in Cryptanalysis: Case Studies on English and German. In: Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023. Paper presented at The 6th International Conference on Historical Cryptology HistoCrypt 2023.
2023 (English). In: Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we study the impact of language models (LM) on decipherment of historical homophonic substitution ciphers. In particular, we investigate if decipherment by using hill-climbing and simulated annealing can benefit from LMs generated from historical texts in general and century-specific texts in particular. We carry out experiments on homophonic substitution ciphers with English and German as plaintext languages. We take into account ciphertext length as well as n-gram size of the LMs. We compare the results on decipherment based on historical LMs with large LMs generated from modern texts. The results show that using historical LMs in decipherment of homophonic substitution ciphers leads to significantly better performance on ciphertext produced in the 17th century or earlier, and century-specific language models yield better results on longer and older ciphertexts.

Keywords
language models, historical texts, decipherment
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-507037 (URN); 10.3384/ecp195701 (DOI)
Conference
The 6th International Conference on Historical Cryptology HistoCrypt 2023
Funder
Swedish Research Council, 2018-06074
Available from: 2023-06-30 Created: 2023-06-30 Last updated: 2025-02-07
Ilinykh, N., Morger, F., Dannélls, D., Dobnik, S., Megyesi, B. & Nivre, J. (Eds.). (2023). Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023). Paper presented at The Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), Tórshavn, Faroe Islands, May 22nd, 2023. Association for Computational Linguistics
2023 (English). Conference proceedings (editor) (Refereed)
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2023. p. 140
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-518961 (URN); 978-1-959429-73-9 (ISBN)
Conference
The Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), Tórshavn, Faroe Islands, May 22nd, 2023
Available from: 2024-01-01 Created: 2024-01-01 Last updated: 2025-02-07. Bibliographically approved
Wu, Y., Nouri, J., Megyesi, B., Henriksson, A., Duneld, M. & Li, X. (2023). Towards Data-effective Educational Question Generation with Prompt-based Learning. In: Proceedings of 2023 Computing Conference. Paper presented at Proceedings of 2023 Computing Conference.
2023 (English). In: Proceedings of 2023 Computing Conference, 2023. Conference paper, Published paper (Refereed)
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-507023 (URN)
Conference
Proceedings of 2023 Computing Conference
Available from: 2023-06-30 Created: 2023-06-30 Last updated: 2025-02-07
Mikhalev, V., Kopal, N., Esslinger, B., Waldispühl, M., Láng, B. & Megyesi, B. (2023). What is the Code for the Code? Historical Cryptology Terminology. In: Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023. Paper presented at The 6th International Conference on Historical Cryptology HistoCrypt 2023.
2023 (English). In: Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

The cross-disciplinary nature of historical cryptology poses the challenge of finding a terminology that is both consistent and accepted across the different disciplines and applicable in the individual fields. In this paper, we propose a terminology based on concise principles developed by an interdisciplinary group of researchers. We present terms prominent in the study of historical cryptology, define them, and illustrate their usage. Our goal is to initiate and/or continue the discussion of how we use various terms for different types of historical encrypted sources and their study. Our hope is that this paper will contribute to consistent and systematic usage of terms in the HistoCrypt community.

Keywords
historical cryptology, decipherment, decryption, encryption, cipher keys
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-507036 (URN); 10.3384/ecp195702 (DOI)
Conference
The 6th International Conference on Historical Cryptology HistoCrypt 2023
Funder
Swedish Research Council, 2018-06074
Available from: 2023-06-30 Created: 2023-06-30 Last updated: 2025-02-07
Souibgui, M. A., Fornés, A., Kessentini, Y. & Megyesi, B. (2022). Few shots are all you need: A progressive learning approach for low resource handwritten text recognition. Pattern Recognition Letters, 160, 43-49
2022 (English). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 160, p. 43-49. Article in journal (Refereed), Published
Abstract [en]

Handwritten text recognition in low resource scenarios, such as manuscripts with rare alphabets, is a challenging problem. In this paper, we propose a few-shot learning-based handwriting recognition approach that significantly reduces the human annotation process by requiring only a few images of each alphabet symbol. The method consists of detecting all the symbols of a given alphabet in a textline image and decoding the obtained similarity scores to the final sequence of transcribed symbols. Our model is first pretrained on synthetic line images generated from an alphabet, which could differ from the alphabet of the target domain. A second training step is then applied to reduce the gap between the source and the target data. Since this retraining would require annotation of thousands of handwritten symbols together with their bounding boxes, we propose to avoid such human effort through an unsupervised progressive learning approach that automatically assigns pseudo-labels to the unlabeled data. The evaluation on different datasets shows that our model can lead to competitive results with a significant reduction in human effort.

Place, publisher, year, edition, pages
Elsevier BV, 2022
Keywords
Handwritten text recognition, Few-shot learning, Unsupervised progressive learning, encrypted manuscripts
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-478941 (URN); 10.1016/j.patrec.2022.06.003 (DOI); 000836819000007 ()
Funder
Swedish Research Council, 2018-06074
Available from: 2022-06-27 Created: 2022-06-27 Last updated: 2025-02-07. Bibliographically approved
Gambardell, M. E., Pettersson, E. & Megyesi, B. (2022). Identifying Cleartext in Historical Ciphers. In: Rachele Sprugnoli and Marco Passarotti (Eds.), Proceedings of the Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022. Paper presented at Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022.
2022 (English). In: Proceedings of the Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022 / [ed] Rachele Sprugnoli and Marco Passarotti, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

In historical encrypted sources we can find encrypted text sequences, also called ciphertext, as well as non-encrypted cleartexts written in a known language. While most cryptanalysis focuses on the decryption of ciphertext, cleartext is often overlooked, although it can give us important clues about the historical interpretation and contextualisation of the manuscript. In this paper, we investigate to what extent we can automatically distinguish cleartext from ciphertext in historical ciphers and to what extent we are able to identify its language. The problem is challenging as cleartext sequences in ciphers are often short, up to a few words, and in different languages due to historical code-switching. To identify the sequences and the language(s), we chose a rule-based approach and ran 7 different models using historical language models on various ciphertexts.

National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-478968 (URN); 979-10-95546-78-8 (ISBN)
Conference
Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022.
Funder
Swedish Research Council, 2018-06074
Available from: 2022-06-27 Created: 2022-06-27 Last updated: 2025-02-07. Bibliographically approved
Magnifico, G., Megyesi, B., Souibgui, M. A., Chen, J. & Fornés, A. (2022). Lost in Transcription of Graphic Signs in Ciphers. In: Carola Dahlke and Beáta Megyesi (Eds.), Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022. Paper presented at The 5th International Conference on Historical Cryptology. HistoCrypt 2022 (pp. 153-158). Linköping: Linköping University Electronic Press
2022 (English). In: Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022 / [ed] Carola Dahlke and Beáta Megyesi, Linköping: Linköping University Electronic Press, 2022, p. 153-158. Conference paper, Published paper (Refereed)
Abstract [en]

Hand-written Text Recognition techniques with the aim to automatically identify and transcribe hand-written text have been applied to historical sources including ciphers. In this paper, we compare the performance of two machine learning architectures, an unsupervised method based on clustering and a deep learning method with few-shot learning. Both models are tested on seen and unseen data from historical ciphers with different symbol sets consisting of various types of graphic signs. We compare the models and highlight their differences in performance, with their advantages and shortcomings.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2022
Keywords
transcription of ciphers, hand-written text recognition of symbols, graphic signs
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-477917 (URN); 10.3384/ecp188403 (DOI)
Conference
The 5th International Conference on Historical Cryptology. HistoCrypt 2022
Funder
Swedish Research Council, 2018-06074
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2025-02-07. Bibliographically approved
Dahlke, C. & Megyesi, B. (Eds.). (2022). Proceedings of the 5th International Conference on Historical Cryptology. Paper presented at The 5th International Conference on Historical Cryptology. Linköping: Linköping University Electronic Press
2022 (English). Conference proceedings (editor) (Refereed)
Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2022. p. 11
Keywords
Historical cryptology
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-477910 (URN); 10.3384/ecp188 (DOI); 978-91-7929-397-0 (ISBN)
Conference
The 5th International Conference on Historical Cryptology
Projects
VR 2018-06074
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2025-02-07. Bibliographically approved
Héder, M. & Megyesi, B. (2022). The DECODE Database of Historical Ciphers and Keys: Version 2. In: Carola Dahlke and Beáta Megyesi (Eds.), Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022. Paper presented at The 5th International Conference on Historical Cryptology. HistoCrypt 2022 (pp. 111-114). Linköping: Linköping University Electronic Press
2022 (English). In: Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022 / [ed] Carola Dahlke and Beáta Megyesi, Linköping: Linköping University Electronic Press, 2022, p. 111-114. Conference paper, Published paper (Refereed)
Abstract [en]

We report recent developments of the DECODE database aimed for the systematic collection and annotation of encrypted sources: ciphertexts, keys and related documents. We released a new, more functional graphical user interface, revised some metadata features and enlarged the collection and tripled its size.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2022
Keywords
historical ciphers, historical cipher keys, database, historical cryptology
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-477915 (URN); 10.3384/ecp188397 (DOI)
Conference
The 5th International Conference on Historical Cryptology. HistoCrypt 2022.
Funder
Swedish Research Council, 2018-06074
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2025-02-07. Bibliographically approved