Table of Contents
|
Ch1 Introduction
1.1 Background
1.2 Different Types of Written Taiwanese Scripts
1.3 Related Issues of Written Taiwanese Processing
1.4 Organization of This Dissertation
|
Ch2 Resources and Survey of Written Taiwanese Processing
2.1 Digital Resources of Written Taiwanese
2.2 Survey of Written Taiwanese Processing Techniques
2.3 Summary
|
Ch3 Coding, I/O of POJ and Text Processing
3.1 Character Code of POJ
3.2 Two Kinds of POJ Representation
3.3 Search Problem of POJ Text
3.4 Display of POJ Text
3.5 Some Text Processing Utilities for POJ
3.6 Word Segmentation for HR Mixed Script
|
Ch4 Tone Sandhi Problem and Algorithm
4.1 Tone Sandhi Problem of the Taiwanese Language
4.2 Implementation of the Taiwanese Pronunciation System
4.3 Rule-based Tone Sandhi Algorithm
4.4 Results, Accuracy Rate and Discussion
4.5 Summary and Possible Directions
|
Ch5 POS Tagging Method
5.1 Problems of POS Tagging
5.2 POS Tagging Methods
5.3 Results
5.4 Error Analysis
5.5 Discussion
5.6 Summary
|
Ch6 Conclusion and Future Work
6.1 Our Contributions to Written Taiwanese Resources and Processing
6.2 Future Work and Prospects of Written Taiwanese Processing Research
|
References
|
Appendix
A.1 Brief Introduction to the Phonemes of Taiwanese
A.2 Examples of Written Taiwanese
A.3 Terminology
A.4 Webpages Made by the Author
A.5 Differences between POJ and TL
|
List of Tables
Table 1 - 1 Living Languages in Taiwan
Table 1 - 2 Southern Min Language Populations around the World
Table 1 - 3 Visits to the Taiwan Southern Min Contents Website
Table 1 - 4 The Names of “Taiwan Southern Min”
Table 3 - 1 POJ Unicode Encoding
Table 3 - 2 POJ and Numbered POJ Synopsis
Table 3 - 3 Example of Taiwanese Article in 3 Scripts
Table 3 - 4 Toneless Search for “hoe-chhia”
Table 3 - 5 Checked Syllable Search for “cha”
Table 3 - 6 Vowel Search for “uiN”
Table 4 - 1 The Taiwanese Tone Sandhi Phenomena
Table 4 - 2 Observation Data Sources
Table 4 - 3 Test Data Sources
Table 4 - 4 POS Classes
Table 4 - 5 Tone Sandhi Marks
Table 4 - 6 Tone Sandhi Marking Algorithm
Table 4 - 7 Number of Errors and Accuracy Rate for Each Paragraph
Table 4 - 8 Affected and Accurately Affected Syllables of Each Rule
Table 4 - 9 Number of Dominant Rules, Accurate Dominant Rules, and Accuracy Rate
Table 4 - 10 Additional Rules to Obtain Higher Accuracy Rate
Table 4 - 11 Error Conditions and Possible Solutions
Table 5 - 1 Test Data List
Table 5 - 2 Tagging Accuracy Rate for the Test Data
Table 5 - 3 Example of POS Tagging Result
Table 5 - 4 The Incorrect Mandarin Words Selected and Their Respective POS
Table 5 - 5 Errors Caused by the Absence of Appropriate Mandarin Words in the OTMD
Table 5 - 6 Unknown Words from the Viewpoint of Mandarin
Table 5 - 7 The Reasons for the POS Tagging Errors
Table 5 - 8 Tagging Accuracy Rates for Different Genres
Table 5 - 9 Tagging Accuracy Rates for Different Eras
Table A - 1 Consonants
Table A - 2 Single Vowels
Table A - 3 Compound Vowels
Table A - 4 Nasal and Glottal
Table A - 5 Nasal Finals and Final Stops
Table A - 6 Syllabic Consonants
Table A - 7 Tones
Table A - 8 Terminology
Table A - 9 Differences between POJ and TL
|
List of Figures
Fig 3 - 1 POJ Text Match Algorithm (Target: Word)
Fig 3 - 2 POJ Toneless Search Algorithm (Target: Syllable)
Fig 3 - 3 POJ Checked Syllable Search Algorithm (Target: Syllable)
Fig 3 - 4 POJ Vowel Search Algorithm (Data: Syllable)
Fig 3 - 5 POJ to Numbered POJ Algorithm
Fig 3 - 6 Numbered POJ to POJ Algorithm
Fig 3 - 7 Numbered POJ to POJ Graph Algorithm
Fig 3 - 8 Unicode Display of POJ
Fig 3 - 9 Graph Display of POJ
Fig 3 - 10 Algorithm to Check Whether a POJ Syllable Is Legal
Fig 3 - 11 Result from Online Syllable/Word/Sentence Count System for POJ
Fig 3 - 12 Backward Maximal Matching Algorithm for HR Mixed Script
Fig 4 - 1 Taiwanese Tone Sandhi System Diagram
Fig 5 - 1 Taiwanese Language POS Tagging System Architecture Diagram
Fig A - 1 Han Script (Taiwanese Folk Song)
Fig A - 2 Han Script (Taiwanese Textbook from the Japanese-Ruled Period)
Fig A - 3 POJ Script (Taiwan Prefectural City Church News)
Fig A - 4 HR Mixed Script (Taiwanese Writing Forum)
|