Semantic Arabic Encoding and Format
This page is available in Arabic under the title الترميز المعنوي للغة العربية (Semantic Encoding of the Arabic Language).
This page is intended as an ongoing exchange of ideas that will be updated as the project progresses. Suggestions and contributions are welcome.
First glance
Compared with other languages, especially English, computers still handle Arabic in a limited way. This holds at the textual level (form, storage, and encoding problems and their treatment) and at the level of the meaning the text carries. Moreover, in contrast to European languages, Arabic is a heavily morphological language: a rigid system links the structure of a word to its semantic and logical meaning.
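To make the link between word structure and meaning concrete, here is a minimal Python sketch that fills a pattern, written in the traditional ف-ع-ل placeholder notation, with the letters of a three-letter root. The helper name `apply_pattern` and the use of unvocalized strings are assumptions made for illustration only; this is not the project's encoding.

```python
# Illustrative sketch only: root-and-pattern derivation on unvocalized text.
# `apply_pattern` is a hypothetical helper, not part of the project.

PLACEHOLDERS = "فعل"  # traditional slots for the 1st, 2nd and 3rd root letters


def apply_pattern(root: str, pattern: str) -> str:
    """Fill the ف / ع / ل slots of `pattern` with the letters of `root`."""
    if len(root) != 3:
        raise ValueError("this sketch handles three-letter roots only")
    slots = dict(zip(PLACEHOLDERS, root))
    # Single pass over the pattern, so a root letter that happens to be
    # ف, ع or ل is never substituted a second time.
    return "".join(slots.get(ch, ch) for ch in pattern)


if __name__ == "__main__":
    for pattern in ("فاعل", "مفعول", "مفعل"):
        print(pattern, "->", apply_pattern("كتب", pattern))
```

With the root كتب (writing), the patterns فاعل, مفعول and مفعل yield كاتب (writer), مكتوب (written) and مكتب (office or desk): the root carries the core meaning and the pattern carries the role. Real Arabic morphology also involves weak roots, gemination and vocalization, none of which this sketch attempts to handle.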
This justifies the effort to understand the structural mechanisms of the Arabic language and to put them to use in meeting the pressing need to process unstructured text and to handle meaning in Arabic.
Currently, we need clear reference texts on human pronunciation, especially of Arabic letters, so that the binary specification of the Semantic Arabic Encoding and Format accurately reflects human characteristics, acoustic characteristics, and characteristics specific to the Arabic language, such as the way letters follow one another or group together. We hope this will help limit the total space that must be studied for this project.
We also need a chart of irregular words and loanwords in the Arabic language, along with irregular grammar rules. The Arab Translators Society appears to be making progress in this direction. We likewise need to compile charts of derived patterns and devise means for classifying and linking them.
In the advanced stages, we will also need to account for all components of the word, including attached pronouns, or even separate ones with genitive markers.
Project Objectives
The Arabic Semantic Encoding project aims to write mechanisms for storing Arabic texts (words), including their structure and derivational relationships, in a way that reflects as much semantic correlation as possible. Its objectives include:
- Developing a system that captures both meaning and sentence structure.
- Generating a non-itemized Arabic morphological lexicon that will serve as the basis for a complete Arabic lexicon.
- Easing search and categorization at both the morphological and semantic levels.
- Making it possible to link the encoding with other components to form a spell checker and then a grammar checker. (This is the first system we really wanted to develop: an interactive spell checker that simplifies writing in Modern Standard Arabic (Fusha), an effective dictionary for words and technical terms, etc.)
- Enabling the system to simplify working with Modern Standard Arabic (Fusha) and writing it accurately (to increase its use), to store texts in a way that gives structure to unstructured text, and then to find mechanisms for moving Arabic texts to a structured and correlated format. (Currently, less than 1% of Arabic texts are structured.)
- Providing a way to distinguish non-Arabic words (i.e. words that are not derived from Arabic or are not subject to Arabic structure) and to handle them within a framework, which must exist.
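The storage and search objectives above could be pictured roughly as follows. This is only a sketch under stated assumptions: the `EncodedWord` record, the `Lexicon` class, and the rule that a word without a root counts as non-Arabic are all hypothetical, and none of them is the project's actual format.

```python
# Illustrative sketch only: every name and field below is hypothetical.
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EncodedWord:
    surface: str            # the word as written (unvocalized here)
    root: Optional[str]     # e.g. "كتب"; None when no Arabic root applies
    pattern: Optional[str]  # e.g. "فاعل"; None when no pattern applies

    @property
    def is_non_arabic(self) -> bool:
        # Assumption: a word stored without a root is treated as a loanword.
        return self.root is None


class Lexicon:
    """Stores words so they can be found by root or by pattern."""

    def __init__(self) -> None:
        self._by_root = defaultdict(list)
        self._by_pattern = defaultdict(list)
        self._non_arabic: List[EncodedWord] = []

    def add(self, word: EncodedWord) -> None:
        if word.is_non_arabic:
            self._non_arabic.append(word)
            return
        self._by_root[word.root].append(word)
        self._by_pattern[word.pattern].append(word)

    def with_root(self, root: str) -> List[str]:
        return [w.surface for w in self._by_root.get(root, [])]

    def with_pattern(self, pattern: str) -> List[str]:
        return [w.surface for w in self._by_pattern.get(pattern, [])]

    def non_arabic(self) -> List[str]:
        return [w.surface for w in self._non_arabic]


def build_demo_lexicon() -> Lexicon:
    lexicon = Lexicon()
    lexicon.add(EncodedWord("كاتب", "كتب", "فاعل"))     # writer
    lexicon.add(EncodedWord("مكتوب", "كتب", "مفعول"))   # written
    lexicon.add(EncodedWord("عالم", "علم", "فاعل"))     # scholar
    lexicon.add(EncodedWord("تلفزيون", None, None))     # television (loanword)
    return lexicon


if __name__ == "__main__":
    demo = build_demo_lexicon()
    print(demo.with_root("كتب"))
    print(demo.with_pattern("فاعل"))
    print(demo.non_arabic())
```

Searching by root groups words that share a core meaning, while searching by pattern groups words that share a role (here, the agent pattern فاعل). Words with no Arabic root are kept apart so they can be handled separately.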
The Current Idea for the Project
As far as I see it, this project depends on breaking new ground in treating Arabic text as a series of utterances in an alphabetic encoding, and in treating word encoding at the alphabet level as a continuous classification device, similar to building a rule resolver. It attempts to preserve the authenticity of the symbol throughout and to simplify working with it from high-level programming languages.
Project Progress
We have now completed a first rough sketch of the project: a preliminary outline of the Semantic Encoding and Format Project, written by Brother Khaldoun Sinjab as a simple prototype to clarify the basic idea behind the project.
Links
What We Need
- The vocal research