Searching in multiple languages – the NLP implications
As regular readers of my blog know, I have for many years been highlighting that enterprise search strategies have to reflect corporate language policies and management. I have just come across an August 2020 post by Sebastian Ruder entitled ‘Why You Should Do NLP Beyond English’. Lest you put this down to the enthusiasm of a research student, Sebastian has a wealth of experience in the topic and works for DeepMind. It is a relief to see him supporting my assertion that far too little attention is paid to NLP, AI and ML in languages other than English, especially where a repository has content in multiple languages and the search queries may also be in multiple languages. Not only is there an index/query match issue, but there are also significant challenges in how the results are then presented on the SERP.
As it happens, a recent research paper investigates how 492 of the largest companies in Norway comply with the language requirement of the Norwegian Accounting Act Article 3-4. The results show that 36% of the companies presented their financial statements in Norwegian only and 45% in one or more languages in addition to Norwegian, while 19% had been granted dispensation and presented statements in English only. This is an outcome of a 2010 decision by the Norwegian Government on a national language policy. Note that a company has to apply for a dispensation; it cannot take the decision itself on the basis of a corporate language policy. It is worth noting that with Brexit now accomplished there is no ‘member state’ reason for translating documents into English.
The complex issues around the viewpoint that ‘English will do’ have given rise to the Journal of English as a Lingua Franca. The papers in this journal take a neutral stance on whether the organisational adoption of English as a Lingua Franca (ELF) is a benefit or an obstacle. As the papers show, it can be both, and it is incumbent on the organisation to ensure that the balance is optimal. In another recent paper, Guro Sanden notes, based on case studies from Sweden and Denmark, that the language policy and planning (LPP) activities of multinational corporations (MNCs) go beyond the boundaries of the organisations and interfere with the LPP activities of their home countries. The paper concludes that the language planning activities of MNCs may be even more important and impactful than those of the nation‐state.
In a 2018 paper, ‘Ten reasons why corporate language policies can create more problems than they solve’, Sanden looks at the wider picture and comments that an increasing number of multilingual organisations such as multinational corporations (MNCs) choose to address linguistic diversity through corporate language policies, for example by adopting a common corporate language. Although a common corporate language may improve efficiency of communication at the front-line level, previous research has demonstrated that there are several potentially negative consequences associated with the implementation of such policies. This conceptual paper reviews the role of language policies in multilingual organisations and identifies ten crucial language policy challenges in international business and management. The bottom line is that organisations need to manage language actively, and not just assume that a corporate policy will solve all the problems.
Just in case you think that Guro Sanden is an outlier, it is well worth looking at the website of Anne-Wil Harzing, Professor of International Management at Middlesex University, London. International language management has been a research interest of hers for many years.
Searching across multiple languages is difficult. Over the years I have found client after client sweeping the issues aside and hoping that multilingual staff will sort out the problem. The post by Sebastian Ruder broadens the discussion considerably. Whether you are a search developer, a customer or a prospective customer, language management issues need to be taken very seriously indeed.