            The Unicode cookbook for linguists

            Managing writing systems using orthography profiles

            Author(s)
            Moran, Steven
            Cysouw, Michael
            Collection
            Knowledge Unlatched (KU)
            Language
            English
            Abstract
            This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection between the Unicode Standard and the International Phonetic Alphabet (IPA). Although these standards are often met with frustration by users, they nevertheless provide language researchers and programmers with the consistent computational architecture needed to process, publish and analyze lexical data from the world's languages. Thus, we bring to light common, but not always transparent, pitfalls that researchers face when working with Unicode and IPA. Having identified and overcome the pitfalls involved in making writing systems and character encodings syntactically and semantically interoperable (to the extent that they can be), we created a suite of open-source Python and R tools for working with languages using orthography profiles, which describe author- or document-specific orthographic conventions. In this cookbook we describe a formal specification of orthography profiles and provide recipes using open-source tools to show how users can segment text, analyze it, identify errors, and transform it into different written forms for comparative linguistics research.
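The core idea of segmenting text with an orthography profile can be illustrated with a toy example. The following is a minimal Python sketch under stated assumptions, not the authors' tools or their profile specification: the profile is a hypothetical dict `PROFILE` mapping graphemes (possibly multi-character) to transliterations, and `segment` is a hypothetical helper that normalizes text, applies greedy longest-match against the profile's graphemes, and flags characters the profile does not cover so that errors can be spotted.

```python
import unicodedata

# Hypothetical toy orthography profile (illustrative only, not from the book):
# each key is a grapheme in the source orthography, each value its transliteration.
PROFILE = {
    "a": "a",
    "c": "k",
    "ch": "tʃ",
    "e": "e",
    "h": "h",
    "s": "s",
    "sch": "ʃ",
}

UNKNOWN = "<?>"  # assumed marker for characters the profile does not cover


def segment(text, profile):
    """Segment `text` by greedy longest match against the profile's graphemes.

    Both the text and the profile keys are normalized to NFC (an assumed choice;
    what matters is that both use the same normalization form). Returns a list
    of (grapheme, transliteration) pairs.
    """
    text = unicodedata.normalize("NFC", text)
    graphemes = {unicodedata.normalize("NFC", k): v for k, v in profile.items()}
    max_len = max((len(g) for g in graphemes), default=1)

    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "sch" wins over "s" + "ch".
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in graphemes:
                result.append((chunk, graphemes[chunk]))
                i += n
                break
        else:
            # No grapheme matched: flag the single character as an error.
            result.append((text[i], UNKNOWN))
            i += 1
    return result


if __name__ == "__main__":
    print(segment("schach", PROFILE))
    print(segment("sax", PROFILE))
```

Here `segment("schach", PROFILE)` yields `[("sch", "ʃ"), ("a", "a"), ("ch", "tʃ")]`, while the uncovered `x` in `"sax"` comes back paired with `<?>`. Normalizing both profile and input to one form means a precomposed "é" (U+00E9) and "e" followed by a combining acute accent (U+0301) match the same profile entry, which is one of the Unicode pitfalls the abstract alludes to.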
            URI
            https://doab-dev.siscern.org/handle/20.500.12854/176928
            Keywords
            Linguistics
            DOI
            10.5281/zenodo.1296780
            ISBN
            9783961100903
            Publisher
            Language Science Press
            Publisher website
            http://langsci-press.org/
            Publication date and place
            Berlin, 2018-07-11
            Grantor
            • Knowledge Unlatched
            Series
            Translation and Multilingual Natural Language Processing,
            • OAPEN harvesting collection

            License

            • Unless otherwise noted, all contents are available under the Creative Commons Attribution 4.0 International license (CC BY 4.0).

            Directory of Open Access Books is a joint service of OAPEN, OpenEdition, CNRS and Aix-Marseille Université, provided by DOAB Foundation.
