Flavia-Elena Hațiegan, Luminiţa Dumănescu
Automating Historical Research Processes on Romanian Texts: A Case Study on University Annals from the Interwar Period
Article Information
Pages: 21-46
DOI: https://doi.org/10.24193/RJPS.2025.1.02
Flavia-Elena Hațiegan*, Luminița Dumănescu**
* Babeş-Bolyai University, Doctoral School “Population Studies and History of Minorities”, Cluj-Napoca, Romania, flavia.hatiegan@ubbcluj.ro
** Babeş-Bolyai University, Centre for Population Studies, Cluj-Napoca, Romania, luminita.dumanescu@ubbcluj.ro
Abstract. This study explores the role of academic discourse in constructing national and racial identity in interwar Romania, focusing on the Annals of the University of Cluj between 1919 and 1942. By employing text analysis methods from the field of digital humanities, such as natural language processing (NLP), bigram frequency analysis, and network visualisation, we examine how nationalist and racial categories were embedded in academic speech. The research reveals the systematic integration of concepts such as race, eugenics, and national identity across disciplines, from hygiene and ethnography to philosophy and psychology. These findings highlight the university’s central role in the Romanianisation process and the exclusion of ethnic minorities, particularly in the aftermath of the 1918 unification. The results also underscore the deep interconnection between intellectual production and state ideology during this formative period. While the analysis is limited by challenges in OCR quality and text standardisation, it demonstrates the value of digital tools for uncovering discursive patterns in historical sources. This interdisciplinary approach offers new pathways for understanding the socio-political functions of academic institutions and contributes to broader debates on nationalism, race, and memory in Central and Eastern Europe.
Keywords: nationalism, race, digital humanities, text networks, NLP
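As an illustration of the bigram frequency analysis and network construction named in the abstract, the following is a minimal sketch in Python using only the standard library. It is not the authors' pipeline (the reference list points to spaCy, NetworkX and Pyvis). The stopword list, the sample sentence, the `min_count` threshold, and all function names are assumptions made for this example.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for this sketch);
# a real analysis would need a much fuller Romanian list.
STOPWORDS = {"și", "a", "al", "de", "la", "în", "cu", "pe", "din"}


def tokenize(text):
    """Normalise, lowercase, and return alphabetic tokens minus stopwords."""
    text = unicodedata.normalize("NFC", text).lower()
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters only
    return [t for t in tokens if t not in STOPWORDS]


def bigram_counts(text):
    """Count adjacent token pairs after stopword removal."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


def to_edges(counts, min_count=2):
    """Keep frequent bigrams as weighted edges (source, target, weight)."""
    edges = [(a, b, n) for (a, b), n in counts.items() if n >= min_count]
    # Heaviest edges first; ties broken alphabetically for stable output.
    return sorted(edges, key=lambda e: (-e[2], e[0], e[1]))


# Invented sample text, not taken from the Annals.
sample = (
    "Universitatea din Cluj și identitatea națională. "
    "Identitatea națională în universitatea din Cluj."
)
edges = to_edges(bigram_counts(sample))
for source, target, weight in edges:
    print(source, target, weight)
```

The resulting edge list could be passed to a graph library for visualisation. Note that this sketch forms bigrams across sentence boundaries and after stopword removal, so "adjacent" here means adjacent among the retained tokens; either choice would need to be revisited for OCR-derived historical text.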
Sources
Anuarul Universităţii din Cluj. Anul I, 1919-1920. (1921). Cluj.
Anuarul Universităţii din Cluj. Anul şcolar 1922/23. (1924). Cluj.
Anuarul Universităţii Regele Ferdinand I Cluj pe anul şcolar 1929-30. (1930). Cluj.
Anuarul Universităţii Regele Ferdinand I Cluj pe anul şcolar 1930/1931. (1931). Cluj.
Anuarul Universităţii Regele Ferdinand I Cluj pe anul şcolar 1931/32. (1932). Cluj.
Anuarul Universităţii Regele Ferdinand I Cluj pe anul şcolar 1934/35. (1935). Cluj.
Anuarul Universităţii Regele Ferdinand I Cluj pe anul şcolar 1935/36. (1937). Cluj.
Anuarul. 1936-37. (1938). Cluj.
Anuarul Universităţii Regele Ferdinand I din Cluj. 1937-1938. (1939). Cluj.
Anuarul Universităţii Regele Ferdinand I din Cluj. 1938-1939. (1940). Cluj.
Anuarul Universităţii Regele Ferdinand I Cluj-Sibiu în al doilea an de refugiu. 1941-1942. (1943). Cluj.
Secondary sources
Berger, J., Grant, P. (2022). “Using Natural Language Processing to Understand People and Culture.” American Psychologist 77 (4): 525–37. https://doi.org/10.1037/amp0000882.
Bucur, M. (2002). Eugenics and Modernization in Interwar Romania. Pittsburgh: University of Pittsburgh Press.
Cârstocea, R. (2014). “The Path to the Holocaust. Fascism and Antisemitism in Interwar Romania”. S:I.M.O.N. – Shoah: Intervention. Methods. Documentation. 1 (1): 43–53.
Cârstocea, R. (2017). “Building a Fascist Romania: Voluntary Work Camps as Mobilisation Strategies of the Legionary Movement in Interwar Romania”. Fascism 6 (2): 163–195. https://doi.org/10.1163/22116257-00602002.
Clark, R. (2015). Holy Legionary Youth: Fascist Activism in Interwar Romania. Ithaca: Cornell University Press.
Clipici, R. M. (2021). “Ordinul Naţional Steaua României in Grad de Colan. Istorie şi Actualitate”. Revista Biserica Ortodoxă Română, Buletinul Oficial al Patriarhiei Române 2: 322–342.
Craioveanu, F. (2020). “Racism in Interwar Romanian Press. Disseminators and Influences in ’Societatea de Mâine’: A Case Study”. Acta Musei Porolissensis 42: 151-173.
Dan, P. (2018). “Identity, Collective Memory and Antisemitism”. Analele Universităţii Din Bucureşti, Seria ştiinţe Politice 1: 91–106.
Furtună, A-N. (2018). E Rroma Rumuniatar thaj o Holocausto: historia, teorie, kultura = Rromii din România şi Holocaustul: istorie, teorie, cultură = Rroma from Romania and the Holocaust: history, theory, culture. Ediţie trilingvă. Romane rodimata. Popeşti Leordeni: Dykhta! Publishing House.
Gifu, D., Dascalu, D., Trausan-Matu, S. and Allen, L. K. (2016). “Time Evolution of Writing Styles in Romanian Language”. In 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), 1048–54. San Jose, CA, USA: IEEE. https://doi.org/10.1109/ICTAI.2016.0161.
Hagberg, A. A., Schult, D. A. and Swart, P. J. (2008). “Exploring Network Structure, Dynamics, and Function Using NetworkX”. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman (Eds). Proceedings of the 7th Python in Science Conference (SciPy2008). Pasadena, CA USA, pp. 11-15.
Hitchins, K. (2002). “The Idea of Nation among the Romanians of Transylvania, 1700-1849”. In Nation and National Ideology. Past, Present and Prospects. The Center for the History of the Imaginary and New Europe College, pp. 78–109.
Honnibal, M., Montani, I., Van Landeghem, S. and Boyd, A. (2020). “spaCy: Industrial-Strength Natural Language Processing in Python”. https://doi.org/10.5281/zenodo.1212303.
Ioanid, R. (1990). The Sword of the Archangel: Fascist Ideology in Romania. East European Monographs, no. 292. Boulder [Colo.]: New York: East European Monographs ; Distributed by Columbia University Press.
Karády, V., Nastasă-Kovács, L. (2004). The University of Kolozsvár/Cluj and the Students of the Medical Faculty: 1872-1918. Budapest; Cluj: Central European University; Ethnocultural Diversity Resource Center.
Khurana, D., Koli, A., Khatter, K. and Singh, S. (2023). “Natural Language Processing: State of the Art, Current Trends and Challenges”. Multimedia Tools and Applications 82 (3): 3713–44. https://doi.org/10.1007/s11042-022-13428-4.
Koszor-Codrea, C. (2022). “Mismeasuring Diversity: Popularizing Scientific Racism in the Romanian Principalities Around the Mid-Nineteenth Century”. Journal of Romanian Studies 4 (1): 37–56. https://doi.org/10.3828/romanian.2022.4.
Livezeanu, I. (1995). Cultural Politics in Greater Romania: Regionalism, Nation Building and Ethnic Struggle, 1918-1930. Ithaca: Cornell University Press.
Lucy, L., Demszky, D., Bromley, P. and Jurafsky, D. (2020). “Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks”. AERA Open 6 (3): 233285842094031. https://doi.org/10.1177/2332858420940312.
Matei, P. (2022). Roma Deportations to Transnistria during WWII: Between Central Decision-Making and Local Initiatives. AT: Wiener Wiesenthal Institut. https://doi.org/10.23777/sn.0222/art_pmat01.
Motta, G. (2019). “Nationalism and Anti-Semitism in an Independent Romania”. Academic Journal of Interdisciplinary Studies 8 (2): 14–26. https://doi.org/10.2478/ajis-2019-0012.
Neagu, L. M. et al. (2020). “Automated Modeling of Romanian Literary Trends in History Using Topics Over Time and Co-Occurences”. Bucureşti. https://doi.org/10.12753/2066-026X-20-019.
Nguyen, T-T-H., Jatowt, A., Coustaty, M., Nguyen, N-Van and Doucet, A. (2019). “Deep Statistical Analysis of OCR Errors for Effective Post-OCR Processing”. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 29–38. Champaign, IL, USA: IEEE. https://doi.org/10.1109/JCDL.2019.00015.
Pârvulescu, A. and Boatcă, M. (2022). Creolizing the Modern: Transylvania across Empires. Ithaca ; London: Cornell University Press.
Perrone, G., Unpingco, J. and Lu, H-M. (2020). “Network Visualizations with Pyvis and VisJS”. arXiv Preprint arXiv:2006.04951. https://arxiv.org/abs/2006.04951.
Řehůřek, R., and Sojka, P. (2010). “Software Framework for Topic Modelling with Large Corpora”. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 45–50. Valletta, Malta: ELRA.
Sarkar, D. (2016). Text Analytics with Python. Berkeley, CA: Apress. https://doi.org/10.1007/978-1-4842-2388-8.
Smedley, A., and Smedley, B. D. (2005). “Race as Biology Is Fiction, Racism as a Social Problem Is Real. Anthropological and Historical Perspectives on the Social Construction of Race”. American Psychologist 60 (1): 16–26.
Stan, A-M. (2016). “Statutul profesional şi public al personalului academic de la universitatea românească din Cluj între 1919-1940”. In Irina Nastasă-Matei and Zoltán Rostás (Eds). Alma mater în derivă: aspecte alternative ale vieţii universitare interbelice. Şcoala ardeleană de istorie. Cluj-Napoca; Bucureşti: Şcoala Ardeleană; Eikon.
Stan, A-M. (2021). “De la separatism regional la centralizare: două proiecte legislative ale universitarilor clujeni privind reforma învăţământului superior românesc după 1918”. Plural 9 (1): 141-157.
Szabó, M. K., Ring, O., Nagy, B., Kiss, L., Koltai, J., Berend, G., Vidács, L., Gulyás, A., and Kmetty, Z. (2020). “Exploring the Dynamic Changes of Key Concepts of the Hungarian Socialist Era with Natural Language Processing Methods”. Historical Methods: A Journal of Quantitative and Interdisciplinary History 54 (1): 1–13. https://doi.org/10.1080/01615440.2020.1823289.
Turda, M. (2008). Eugenism si antropologie rasiala în România, 1874-1944. Bucharest: Fundatia Amfiteatru.
Turda, M. (Ed.). (2015). The History of East-Central European Eugenics, 1900-1945. London: Bloomsbury.
Turda, M. (2016). Eugenism şi modernitate. Naţiune, rasă şi biopolitică în Europa: 1870-1950. Libreka GmbH.
Turda, M., and Balogun, B. (2023). “Colonialism, Eugenics and ‘Race’ in Central and Eastern Europe”. Global Social Challenges Journal 20: 1–11. https://doi.org/10.1332/TQUQ2535.
Turda, M., Bokor, Z., Pârâianu, R., and Varga, A. (2022). ‘Războiul sfânt’ al rasei: Eugenia şi protecţia naţiunii în Ungaria: 1900-1919. Ediţia a 2-a. Cluj-Napoca: Academia Română. Centrul de Studii Transilvane and Şcoala Ardeleană.
Turda, M., and Furtuna, A. N. (2022). “The Roma and the Question of Ethnic Origin in Romania during the Holocaust”. Critical Romani Studies 4 (2): 8–32. https://doi.org/10.29098/crs.v4i2.143.
Vasiliev, Y. (2020). Natural Language Processing with Python and spaCy: A Practical Introduction. San Francisco: No Starch Press.
Volk, M., Lenz, F. and Sennrich, R. (2011). “Strategies for Reducing and Correcting OCR Errors”. In Language Technology for Cultural Heritage. Caroline Sporleder, Antal Van Den Bosch, and Kalliopi Zervanou (Eds). Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 3-22. https://doi.org/10.1007/978-3-642-20227-8_1.