楊允言, dissertation: 台語文處理技術:以變調及詞性標記為例
Processing Techniques for Written Taiwanese -- Tone Sandhi and POS Tagging
Updated: 2011/2/18
Table of Contents
Ch1 Introduction
     1.1 Background
     1.2 Different Types of Written Taiwanese Scripts
     1.3 Related Issues of Written Taiwanese Processing
     1.4 Organization of This Dissertation
Ch2 Resources and Survey of Written Taiwanese Processing
     2.1 Digital Resources of Written Taiwanese
     2.2 Survey of Written Taiwanese Processing Technique
     2.3 Summary
Ch3 Coding, I/O of POJ and Text Processing
     3.1 Character Code of POJ
     3.2 Two Kinds of POJ Representation
     3.3 Search Problem of POJ Text
     3.4 Display of POJ Text
     3.5 Some Text Processing Utilities for POJ
     3.6 Word Segmentation for HR Mixed Script
Ch4 Tone Sandhi Problem and Algorithm
     4.1 Tone Sandhi Problem of the Taiwanese Language
     4.2 Implementation of the Taiwanese Pronunciation System
     4.3 Rule-based Tone Sandhi Algorithm
     4.4 Results, Accuracy Rate and Discussion
     4.5 Summary and Possible Direction
Ch5 POS Tagging Method
     5.1 Problems of POS Tagging
     5.2 POS Tagging Methods
     5.3 Results
     5.4 Error Analysis
     5.5 Discussion
     5.6 Summary
Ch6 Conclusion and Future Work
     6.1 Our Contributions to Written Taiwanese Resources and Processing
     6.2 Future Works and Prospect of Written Taiwanese Processing Research
Reference
Appendix
     A.1 Brief Introduction to The Phoneme of Taiwanese
     A.2 Examples of Written Taiwanese
     A.3 Terminologies
     A.4 Webpages Made by Author
     A.5 Differences between POJ and TL
List of Tables
     Table 1 - 1 Living Languages in Taiwan
     Table 1 - 2 Southern Min Language Populations around the World
     Table 1 - 3 Visits to the Taiwan Southern Min Contents Website
     Table 1 - 4 The Names of “Taiwan Southern Min”
     Table 3 - 1 POJ Unicode Encoding
     Table 3 - 2 POJ and Numbered POJ Synopsis
     Table 3 - 3 Example of Taiwanese Article in 3 Scripts
     Table 3 - 4 Toneless Search for “hoe-chhia”
     Table 3 - 5 Checked Syllable Search for “cha”
     Table 3 - 6 Vowel Search for “uiN”
     Table 4 - 1 The Taiwanese Tone Sandhi Phenomena
     Table 4 - 2 Observation Data Sources
     Table 4 - 3 Test Data Sources
     Table 4 - 4 POS Classes
     Table 4 - 5 Tone Sandhi Marks
     Table 4 - 6 Tone Sandhi Marking Algorithm
     Table 4 - 7 Number of Errors and Accuracy Rate for Each Paragraph
     Table 4 - 8 Affected and Accurately Affected Syllables of Each Rule
     Table 4 - 9 Number of Dominant Rule, Accurate Dominate Rule and Accuracy Rate
     Table 4 - 10 Additional Rules to Obtain Higher Accuracy Rate
     Table 4 - 11 Error Conditions and Possible Solutions
     Table 5 - 1 Test Data List
     Table 5 - 2 Tagging Accuracy Rate for the Test Data
     Table 5 - 3 Example of POS Tagging Result
     Table 5 - 4 The Incorrect Mandarin Words Selected and Their Respective POS
     Table 5 - 5 Errors Caused by the Absence of Appropriate Mandarin Words in the OTMD
     Table 5 - 6 Unknown Words from the Viewpoint of Mandarin
     Table 5 - 7 The Reasons for the POS Tagging Errors
     Table 5 - 8 Tagging Accuracy Rates for Different Genres
     Table 5 - 9 Tagging Accuracy Rates for Different Eras
     Table A - 1 Consonants
     Table A - 2 Single Vowels
     Table A - 3 Compound Vowels
     Table A - 4 Nasal and Glottal
     Table A - 5 Nasal Finals and Final Stops
     Table A - 6 Syllabic Consonants
     Table A - 7 Tones
     Table A - 8 Terminologies
     Table A - 9 Differences between POJ and TL
List of Figures
     Fig 3 - 1 POJ Text Match Algorithm (Target: Word)
     Fig 3 - 2 POJ Toneless Search Algorithm (Target: Syllable)
     Fig 3 - 3 POJ Checked Syllable Search Algorithm (Target: Syllable)
     Fig 3 - 4 POJ Vowel Search Algorithm (Data: Syllable)
     Fig 3 - 5 POJ to Numbered POJ Algorithm
     Fig 3 - 6 Numbered POJ to POJ Algorithm
     Fig 3 - 7 Numbered POJ to POJ Graph Algorithm
     Fig 3 - 8 Unicode Display of POJ
     Fig 3 - 9 Graph Display of POJ
     Fig 3 - 10 Check If a Legal POJ Syllable Algorithm
     Fig 3 - 11 Result from Online Syllable/Word/Sentence Count System for POJ
     Fig 3 - 12 Backward Maximal Matching Algorithm for HR Mixed Script
     Fig 4 - 1 Taiwanese Tone Sandhi System Diagram
     Fig 5 - 1 Taiwanese Language POS Tagging System Architecture Diagram
     Fig A - 1 Han script (Taiwanese Folk Song)
     Fig A - 2 Han script (Taiwanese Textbook in Japanese-Ruled Period)
     Fig A - 3 POJ script (Taiwan Prefectural City Church News)
     Fig A - 4 HR Mixed Script (Taiwanese Writing Forum)