Why do we need a national language operator?

Anybody using Google Translate has surely noticed that it has recently developed into a rather handy service – even in Finnish!

Automatic translation is one example of how we benefit from natural language and the AI services built on it. In the future, we may encounter AI-based services that use text, sound and voice recognition in increasingly versatile and demanding contexts.

One of the greatest effects that AI development has had on society is the automation of different services and the possibility of tailoring them to individual needs. Beyond automation, systems that use Finnish-language AI may create entirely new tools that improve wellbeing and work productivity. AI also makes it possible to take into account groups that require special support, such as the elderly, people with visual or hearing impairments, and persons with reduced mobility.

For AI services to be developed in our mother tongue, we need Finnish (or Swedish or Sami) language resources. These include language collections – i.e. language corpora – that contain language produced by people as speech or text, and speech and language models based on them. In other words, the computer has to be taught how to “speak” Finnish. The intention is also that these models would take into account, for example, dialect words and differences in pronunciation, regardless of whether the speakers are young or old, or whether the language is their mother tongue.

Although there are language models on the market – such as the one behind Google’s service – the most significant ones are closed and primarily serve the needs of the multinational corporations that developed them. These companies naturally prioritise large markets, so Finnish-language solutions will become available long after large language areas get theirs – if at all.

Small language areas must drive the utilisation of their own language themselves

The need for services in our own mother tongue has been identified in Finland, but companies and public sector actors approach the problem separately and independently, and solutions are often restricted to research use, for example because of licensing. Small language areas will inevitably be left behind in AI development if they do not have their own language operator.

Consequently, we need an actor that ensures that our language resources are available reliably and over the long term. This organisation would maintain material, software, and models, and offer guidance on how these resources are used and shared. The same organisation could coordinate collaboration between all parties interested in Finnish language technologies.

In February, the EU Commission arranged a Stakeholder Consultation on Language Technologies in Digital Europe, in which the pioneering work conducted in Finland was presented as an example of best practice in Europe. Towards the end of 2019, the Finnish State Development Company Vake published a pre-study report on Finnish language resources as a part of AI development, for which 50 commercial and public sector representatives were interviewed about their needs. The study showed a commercial need for a large-scale, balanced corpus of everyday language. Where such a corpus should be located and how it should operate is currently under consideration. The resulting operating model could serve as an example for similar operators in other countries.

The market for AI-based solutions is growing rapidly, and Finnish companies could take a leading position in this development work. As the market lacks an actor that would boost the use of the Finnish language, the government has the opportunity to take on an accelerating role by creating a language operator.