Projects

Aswat Corpus:

Aswat is a general spoken corpus covering multiple linguistic levels, including both Modern Standard Arabic and dialects, sourced from various regions within Saudi Arabia. The audio recordings were collected from diverse social groups across five regions of the Kingdom. They are accompanied by key metadata and have been transcribed in accordance with international standards for structuring and managing linguistic audio data, such as TEI and CODA.

Objectives of the Corpus:
•    Strengthen the academy’s standing as a reference in the field of Arabic spoken corpus creation.
•    Promote scientific and research activity in the development of Arabic spoken corpora.
•    Collect audio data representing various Saudi dialects.
•    Build a spoken corpus based on modern methodologies used in the creation of global spoken corpora.
•    Represent different social strata within Saudi society and document their dialects in audio form.
•    Leverage modern technologies to deliver linguistic audio data to the academic linguistic community for the study of linguistic phenomena.
•    Provide machine-readable audio data, supported by morphological, syntactic, lexical, and semantic analyses, for use in artificial intelligence models.