Microsoft announced plans to create digital data for AI models in over 12 languages to promote linguistic diversity in AI technologies. The initiative involves digitizing non-English books and preparing hundreds of hours of audio recordings across various languages.
Starting in September 2025, the company will open new branches of its research centers in Strasbourg, in eastern France, with the aim of expanding data availability in at least 10 of the European Union’s 24 official languages.
Microsoft President Brad Smith explained that AI training databases are primarily in English, which limits how well models perform in languages with scarce data. This imbalance may lead users to switch to English rather than use their native languages.