Ibrahim Chiu

Data

Below are research datasets I have assembled and processed. They are shared for academic, non-commercial use and are available on request rather than by direct download. If you would like to use any of them, please email me to discuss access and terms; appropriate citation is required.

Malaysian Parliamentary Hansard Corpus (1981–2025)

A cleaned, speaker-resolved, bilingual corpus of Malaysian parliamentary debates, with derived year-level discourse indicators.

Coverage: 1981–2025; Dewan Rakyat (House of Representatives) and Dewan Negara (Senate)
Source: Official Parliament of Malaysia (Parlimen Malaysia) records
Language: Malay (primary) and English
Unit of observation: Speech turn (~1.37 million turns)
Processing: OCR (Tesseract, msa+eng) for scanned years; idempotent cleaning; speaker resolution at ~94% coverage
Derived measures: Year-level discourse indicator (projection of Islam-related terms onto a state-administration ↔ social-autonomy semantic axis); per-turn semantic scores; party-camp monthly aggregates
Format: Parquet and CSV
Status: Cleaned and validated; documentation available on request
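
For orientation, a semantic-axis projection of the general kind named under "Derived measures" can be sketched as follows. This is an illustrative sketch only, not the pipeline used to build the corpus: the two-dimensional toy vectors, the pole term lists, the function names, and the sign convention (positive scores lean toward the state-administration pole) are all assumptions made for the example.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def unit(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def axis_from_poles(pole_a, pole_b):
    """Unit vector pointing from the centroid of pole_b to that of pole_a."""
    a, b = mean_vector(pole_a), mean_vector(pole_b)
    return unit([x - y for x, y in zip(a, b)])


def project(v, axis):
    """Cosine of the angle between v and the axis (range -1 to 1)."""
    return sum(x * y for x, y in zip(unit(v), axis))


def year_level_indicator(turns, axis):
    """Average the per-turn projection scores within each year."""
    scores = defaultdict(list)
    for year, vec in turns:
        scores[year].append(project(vec, axis))
    return {year: sum(s) / len(s) for year, s in sorted(scores.items())}


# Hypothetical toy embeddings; real embeddings would have hundreds of dimensions.
state_pole = [[1.0, 0.0], [0.8, 0.2]]      # assumed "state-administration" terms
autonomy_pole = [[0.0, 1.0], [0.2, 0.8]]   # assumed "social-autonomy" terms
axis = axis_from_poles(state_pole, autonomy_pole)

# Hypothetical (year, turn-embedding) pairs standing in for speech turns.
turns = [
    (1990, [1.0, 0.0]),
    (1990, [0.0, 1.0]),
    (2000, [1.0, 0.0]),
    (2000, [1.0, 1.0]),
]
indicator = year_level_indicator(turns, axis)
print(indicator)
```

In this toy setup, a year whose turns sit evenly between the two poles averages close to zero, while a year whose turns lean toward the first pole gets a positive value.
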

Access

This dataset is not available for direct download. It is shared upon request for academic, non-commercial use. Please email me to discuss access, intended use, and citation terms.

Suggested citation (APA 7th):
Chiu, I. (2026). Malaysian parliamentary Hansard corpus, 1981–2025 [Unpublished data set].

Note: the underlying parliamentary records are official public documents of the Parliament of Malaysia. This corpus reflects my own compilation, digitization, cleaning, speaker resolution, and derived measures; please cite both this dataset and the primary source.

Last updated: May 2026.