This post was authored by Gabrielle Isgar, Doctoral Candidate in the Department of Modern Languages.
Did you know that about 30 different Mayan languages are still spoken across Mexico and Central America (Guatemala, Honduras, Belize) today? My name is Gabrielle Isgar, and I’m currently a third-year Ph.D. Candidate in Spanish Linguistics (linguistics being the scientific study of language) within the Department of Modern Languages & Linguistics at Florida State University. For the duration of my time at FSU, the sound systems and sociolinguistic context of Mayan languages have been the central focus of my research. My dissertation is still in its early stages, but it involves analyzing and comparing sound changes and patterns of stress across Guatemalan Spanish, K’iche’ and Mam.
The goal for my PEN & Inc project was to create a tool that I, along with other linguists, community members and language learners, could use to work more easily between the two main symbolic systems used to represent Mayan languages. The first is the Unified Mayan Alphabet (UMA) and its language-specific adaptations, which is used by community members and is widespread across pedagogical and descriptive materials. The second is the International Phonetic Alphabet (IPA), which linguists use as a one-size-fits-all representation to capture all the possible sounds across all the world’s languages. As is often the case for understudied and/or endangered languages like K’iche’ and Mam, most of the available material includes speech transcribed in the community’s orthography (in this case, UMA), which may cause some confusion for linguists (like me) who need to report the same speech samples using IPA notation in their theoretical descriptions. Therefore, my PEN & Inc proposal involved developing a transcription tool that makes this conversion more accessible, allowing the user to input text (words) in UMA and view how each sound would be represented using IPA.
Before I outline some of the challenges I’ve faced while creating this transcription tool, let’s work through some of the linguistic jargon related to capturing various levels of sound representations. Starting with the most familiar layer for native speakers of any written language, orthography (or orthographic representations) is composed of graphemes or, basically, letters from the alphabet (or abugida, abjad). Orthography demonstrates how things are “spelled”. Believe it or not, not all languages merit spelling bees, like in English, because their orthographic systems were designed to make pronunciation more obvious for language learners. This is the case for Mayan languages, where, due to their originally oral traditions, the development of their written system only occurred within the past few decades.
On a more theoretical level, the phonological transcription, composed of phonemes, represents each sound in a language that creates a meaning-bearing contrast. We can find the phonemes by thinking of minimal pairs, or two words that have different meanings and differ by only one sound, like pat /pæt/ vs. cat /kæt/ in English, papa /papa/ ‘potato’ vs. capa /kapa/ ‘cloak, cape’ in Spanish and pon /pon/ ‘to improvise something when not available’ vs. kon /kon/ ‘stupid, ignorant (noun or adjective)’ in K’iche’. Across all three languages, the existence of these minimal pairs provides evidence that the sounds /p/ and /k/ are phonemes. Transcribing speech phonologically results in something like what you saw in the previous examples, where the orthographic representations are italicized and the phonological transcription, using IPA, is provided between slashes (//).
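To make the idea of a minimal pair concrete, here is a small Python sketch that checks a toy word list for pairs whose transcriptions differ in exactly one sound. The function names and the tiny word lists are my own illustration (they are not part of the transcription tool), and each language is checked separately, since a minimal pair only counts within a single language.

```python
from itertools import combinations

def differs_by_one_sound(a, b):
    """True if two phoneme sequences have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def minimal_pairs(lexicon):
    """Return (word1, word2, sound1, sound2) for every minimal pair in the lexicon.

    `lexicon` maps a spelling to its phonological transcription,
    given as a list with one phoneme per item.
    """
    pairs = []
    for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2):
        if differs_by_one_sound(p1, p2):
            # Locate the single position where the two transcriptions differ.
            i = next(k for k in range(len(p1)) if p1[k] != p2[k])
            pairs.append((w1, w2, p1[i], p2[i]))
    return pairs

# Toy word lists built from the examples in this post (illustrative only).
KICHE = {"pon": ["p", "o", "n"], "kon": ["k", "o", "n"]}
SPANISH = {"papa": ["p", "a", "p", "a"], "capa": ["k", "a", "p", "a"]}

print(minimal_pairs(KICHE))
print(minimal_pairs(SPANISH))
```

In both toy lists, the only contrast found is /p/ vs. /k/, which is the kind of evidence a linguist would use to argue that the two sounds are separate phonemes.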
Getting even more specific, phonetic transcriptions account for the possible sounds, or allophones, that are produced when a phoneme occurs in different environments (e.g., at the beginning or end of a word) or is produced by different speakers/speech communities. Following IPA standards, phonetic transcriptions often include information about syllable structure, stress, tone and certain types of sound alterations (or changes that occur in different environments). Examples are latter /lætəɹ/ [ˈlæ.ɾɹ̩] in English or ¿Cómo estás? /komo estas/ [ko.mwe.ˈtah] in Puerto Rican Spanish. When working on transcriptions, phonemes (or allophones) and graphemes don’t always match up one-to-one; this was the first ‘problem’ I faced in the tool-creation process. For example, in English, I’m sure you can think of many examples where the way that you spell a word is completely different from the way that you pronounce it (for me, one that sticks out is knight /naɪt/ [naɪt]). Luckily, in K’iche’, the phonemes do tend to match up with graphemes, but this is not the case for allophones. Deciding whether to work on phonological or phonetic transcriptions was the first hurdle. To avoid overcomplicating my first attempt, I decided to go with phonological transcriptions. This makes the back-end process a bit more straightforward and more feasible to accomplish within the given timeframe.
When it came to approaching the coding side of this project, I must admit that I was completely lost at the beginning. With my minimal experience in classes and online courses for C++, SAS, R, HTML and CSS, I was confident in my ability to work through a tutorial and troubleshoot on my own. The only issue was that I didn’t even know which tutorial to search for. Thankfully, I had Sarah and Matt (plus all the lovely PEN & Inc modules) to lean on and, very fortunately, I also had help from two linguists with much more computational experience than me: Dr. Tom Juzek, Assistant Professor in Computational Linguistics at FSU within the Department of Modern Languages & Linguistics, and my dear friend from CoLang 2022 this past summer, Sunkulp (‘Sunny’) Ananthanarayan, who is currently a Post-baccalaureate Researcher for the Yale HistLing Lab. Coming from different computational backgrounds and languages of interest, Dr. Juzek and Sunny offered me very different, yet plausible, solutions to my ‘problem’. (Thank you both!)
Sunny spent time holding my hand through GitHub and showing me how he used HTML/CSS/JavaScript to accomplish a similar task with a transcription tool for Dâw, a Naduhup language of Brazil. Dr. Juzek talked me through how I could achieve a related goal using corpus data in Python and working towards an export package, like eng_to_ipa for English, or go even further and use a machine-learning-driven approach involving even more data. Because I had more confidence working with HTML/CSS/JavaScript from my Myspace days than with Python, I decided to develop a rule-based process, like Sunny, rather than relying on a database and probabilistic methods. This also made sense because there weren’t many databases available for K’iche’ (or Mam), so using the same process as English, for example, was less feasible for me, especially given the limitations of my skill set.
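For readers curious about what a rule-based process can look like, here is a minimal sketch. It is written in Python only to keep the example short; it is not the code behind our tool, which builds on Sunny's HTML/CSS/JavaScript work. The mapping table `UMA_TO_IPA` and the function `uma_to_ipa` are hypothetical names, and the table itself is a partial, simplified assumption about K’iche’ phonemes (for instance, it ignores vowel length and treats j as /χ/), so it should be checked against a descriptive grammar before being relied on. The key idea is to match the longest grapheme first, so that tz’ is read as one sound rather than as t + z + ’.

```python
# Illustrative, partial grapheme-to-phoneme table for K'iche'.
# ASSUMPTION: simplified values for demonstration, not the project's actual rules.
UMA_TO_IPA = {
    "ch'": "tʃʼ", "tz'": "tsʼ",                        # three-character graphemes
    "ch": "tʃ", "tz": "ts",                            # two-character graphemes
    "b'": "ɓ", "t'": "tʼ", "k'": "kʼ", "q'": "qʼ",     # glottalized consonants
    "'": "ʔ", "j": "χ", "x": "ʃ", "y": "j",            # single graphemes that change
}
# Letters assumed to keep the same symbol in IPA.
UMA_TO_IPA.update({c: c for c in "aeioupktqmnlrsw"})

# Try longer graphemes before shorter ones (longest-match-first).
GRAPHEMES = sorted(UMA_TO_IPA, key=len, reverse=True)

def uma_to_ipa(word: str) -> str:
    """Convert a word in UMA spelling to a phonological (slash-delimited) transcription."""
    text = word.lower().replace("’", "'")  # treat curly and straight apostrophes alike
    out = []
    i = 0
    while i < len(text):
        for g in GRAPHEMES:
            if text.startswith(g, i):
                out.append(UMA_TO_IPA[g])
                i += len(g)
                break
        else:
            out.append(text[i])  # unknown characters pass through unchanged
            i += 1
    return "/" + "".join(out) + "/"

for w in ["K’iche’", "pon", "kon", "tz’i’"]:
    print(w, uma_to_ipa(w))
```

Because each rule is just a row in a table, extending a sketch like this to another Mayan language would mostly mean swapping in a different table, which is one reason a rule-based approach suits languages with little corpus data.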
On my WordPress site for PEN & Inc, I look forward to getting into more detail related to both approaches. I will also be linking to my GitHub repository for the UMA-to-IPA transcriber, where Sunny and I are collaborating on adapting his real-time Dâw-to-IPA transcription tool for Mayan languages. To start, it will be specific to K’iche’, but we hope to continue working on this further and extend the tool to other Mayan languages. Although this process has been one of the more humbling experiences I have encountered in graduate school, it has provided me with the platform to collaborate with faculty and lean on friends to fill my gaps and gain a new skill which will, undoubtedly, benefit my research later. Seeing other scholars working on their passion projects over the past year and having a venue to share them all at the end of spring is an invaluable experience. Thank you, FSU Libraries and PEN & Inc, for the opportunity to participate in the 2022-2023 festivities! I can’t wait to see how far we can take the UMA-to-IPA transcription tool, the first attempt of its kind for Mayan languages.
I would also like to give a shoutout to my main advisor and fellow participant in the PEN & Inc 2022-2023 cohort, Dr. Carolina González, and to the language school which first made me fall in love with Mayan languages, the Guatemalensis Spanish School located in Quetzaltenango, Guatemala, where K’iche’, Mam and Spanish are all spoken by the same community. The Academia de Lenguas Mayas de Guatemala (ALMG) offers a community-led repository for educational materials in Mayan languages. Learn K’iche’ for free online through University of Texas at Austin’s Chqeta’maj le qach’ab’al K’iche’: A beginner to Advanced Level K’iche’ Online Course.