Malaysian Parliamentary Hansard Corpus (1981–2025)
A cleaned, speaker-resolved, bilingual corpus of Malaysian parliamentary debates, with derived year-level discourse indicators.
| Field | Details |
|---|---|
| Coverage | 1981–2025; Dewan Rakyat (House of Representatives) and Dewan Negara (Senate) |
| Source | Official Parliament of Malaysia (Parlimen Malaysia) records |
| Language | Malay (primary) and English |
| Unit of observation | Speech turn (~1.37 million turns) |
| Processing | OCR (Tesseract, msa+eng) for scanned years; idempotent cleaning; speaker resolution at ~94% coverage |
| Derived measures | Year-level discourse indicator (projection of Islam-related terms onto a state-administration ↔ social-autonomy semantic axis); per-turn semantic scores; party-camp monthly aggregates |
| Format | Parquet and CSV |
| Status | Cleaned and validated; documentation available on request |
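
To illustrate what "projection onto a semantic axis" can mean in practice, here is a minimal sketch under stated assumptions. It is not the corpus's actual pipeline: the embedding model, the seed terms for each pole, and the sign convention are not specified above, so the toy 3-dimensional vectors, the function names, and the choice to make the state-administration pole positive are all hypothetical. The sketch takes the difference of the two pole centroids as the axis and scores a term by its cosine similarity to that axis.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def semantic_axis(pole_a, pole_b):
    """Axis pointing from the centroid of pole B toward the centroid of pole A."""
    centroid_a = mean_vector(pole_a)
    centroid_b = mean_vector(pole_b)
    return [a - b for a, b in zip(centroid_a, centroid_b)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real seed terms and vectors are not given in the source.
state_pole = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]      # assumed "state administration" seeds
autonomy_pole = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0]]   # assumed "social autonomy" seeds

# Assumed sign convention: positive scores lean toward the state-administration pole.
axis = semantic_axis(state_pole, autonomy_pole)

# Hypothetical embedding of one Islam-related term in one year.
term_vector = [0.9, 0.1, 0.3]
score = cosine(term_vector, axis)
print(f"axis projection score: {score:.3f}")
```

A year-level indicator could then be built by averaging such scores over the relevant terms within each year, though the aggregation actually used for this corpus may differ.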
This dataset is not available for direct download. It is shared upon request for academic, non-commercial use. Please email me to discuss access, intended use, and citation terms.
Suggested citation: Chiu, I. (2026). Malaysian parliamentary Hansard corpus, 1981–2025 [Unpublished data set].
Note: the underlying parliamentary records are official public documents of the Parliament of Malaysia. This corpus reflects my own compilation, digitization, cleaning, speaker resolution, and derived measures; please cite both this dataset and the primary source.