Convert Language Code Three Characters (iso 639-2) To Two-character Code (iso 639-1)
Solution 1:
tl;dr
As a workaround, make your own Map< Locale , String>
mapping each known Locale
to its 2-letter language code per ISO 639-1.
new LocaleLookup().lookupTwoLetterLanguageCode( Locale.CANADA_FRENCH )
fr
Or maybe just parse the text of Locale::toString
.
Locale
.CANADA_FRENCH.toString() // fr_CA.split( "_" ) // Array: { "fr" , "CA" }[ 0 ]// Grab first element in array, "fr".
fr
For two-letter country code, use the second part of that split string. Use index of 1
instead of 0
.
Locale
.CANADA_FRENCH.toString() // fr_CA.split( "_" ) // Array: { "fr" , "CA" }[ 1 ]// Grab first element in array, "CA".
CA
Bug?
It seems to be a bug that Locale::getLanguage
would return a 3-letter code. The Javadoc uses a 2-letter code in its code example. But unfortunately the Javadoc fails to specify explicitly 2 or 3 letters. I suggest you file a request with the OpenJDK project to clarify this Javadoc.
Workaround
As a workaround, perhaps you could call Locale.getISOLanguages
to get an array of all known languages in 2-letter codes. Then loop those. For each, use the code seen in the Javadoc, passing 2-letter code to constrict a Locale
object for comparison:
if (locale.getLanguage().equals(new Locale("he").getLanguage()))
From this build your own Map
between locale and 2-letter code.
Example class
Here is my first stab at such a workaround map.
In the constructor, we get a list of all known locales, and all known 2-letter ISO 639-1 language codes.
Next we do a nested loop. For each locale, we loop all the 2-letter language codes until we find a match. Notice that we do not do a string match. The Javadoc warns us that the ISO 639 standard is not stable; the codes are changing. Quoting:
Note: ISO 639 is not a stable standard— some languages' codes have changed. Locale's constructor recognizes both the new and the old codes for the languages whose codes have changed, but this function always returns the old code. If you want to check for a specific language whose code has changed, don't do
if (locale.getLanguage().equals("he")) // BAD!
Instead, do
if (locale.getLanguage().equals(new Locale("he").getLanguage())) // GOOD.
So our inner loop looks at each known 2-letter language code, and gets a Locale
object for that language. Then our if
statement compares the output of getLanguage
for (a) our outer loop’s Locale
, and (b) our inner loop’s generated language-only Locale
(generated by our 2-letter code). In your case, you claim some device is outputting 3-letter code value for our call to getLanguage
. But whether 2 or 3 letters, does not matter. We are just looking for a match.
Once instantiated, we can ask our LocaleLookup
instance for the two-letter code matching a particular Locale
by calling the lookupTwoLetterLanguageCode
method.
LocaleLookuplocaleLookup=newLocaleLookup();
Localelocale= Locale.CANADA_FRENCH;
Stringcode= localeLookup.lookupTwoLetterLanguageCode( locale );
System.out.println( "Locale: " + locale.toString() + " " + locale.getDisplayName( Locale.getDefault() ) + " | ISO 639-1 code: " + code );
Locale: fr_CA French (Canada) | ISO 639-1 code: fr
I'm just guessing at all this. I have not thought it through, nor have I tested any of this. So buyer-beware, this solution is worth every penny you paid for it. Good luck.
Here is the entire class, with a public static void main
to use as demonstration.
package work.basil.example;
import java.util.*;
publicclassLocaleLookup
{
private Map < Locale, String > mapLocaleToTwoLetterLangCode;
publicLocaleLookup( )
{
this.mapLocaleToTwoLetterLangCode = newHashMap <>( Locale.getAvailableLocales().length );
this.makeMaps();
System.out.println( "mapLocaleToTwoLetterLangCode = " + mapLocaleToTwoLetterLangCode );
}
privatevoidmakeMaps( )
{
// Get all locales.
Set < Locale > locales = Set.of( Locale.getAvailableLocales() );
// Get all languages, per 2-letter code.
Set < String > twoLetterLanguageCodes = Set.of( Locale.getISOLanguages() ); // Returns: An array of ISO 639 two-letter language codes.for ( Locale locale : locales )
{
for ( String twoLetterLanguageCode : twoLetterLanguageCodes )
{
if ( locale.getLanguage().equals( newLocale( twoLetterLanguageCode ).getLanguage() ) )
{
this.mapLocaleToTwoLetterLangCode.put( locale , twoLetterLanguageCode );
break;
}
}
}
// System.out.println( "locales = " + locales );// System.out.println( "twoLetterLanguageCodes = " + twoLetterLanguageCodes );
}
public String lookupTwoLetterLanguageCode( final Locale locale )
{
Stringcode=this.mapLocaleToTwoLetterLangCode.get( locale );
Objects.requireNonNull( code );
return code;
}
publicstaticvoidmain( String[] args )
{
LocaleLookuplocaleLookup=newLocaleLookup();
Localelocale= Locale.CANADA_FRENCH;
Stringcode= localeLookup.lookupTwoLetterLanguageCode( locale );
System.out.println( "Locale: " + locale.toString() + " " + locale.getDisplayName( Locale.getDefault() ) + " | ISO 639-1 code: " + code );
}
}
And here is the map I produce in a pre-release version of Java 15. Note this may be incorrect, as I have seen some goofiness with locales in the pre-release version.
mapLocaleToTwoLetterLangCode = {nn=nn, ar_JO=ar, bg=bg, zu=zu, am_ET=am, fr_DZ=fr, ti_ET=ti, bo_CN=bo, qu_EC=qu, ta_SG=ta, lv=lv, en_NU=en, en_MS=en, zh_SG_#Hans=zh, ff_LR_#Adlm=ff, en_GG=en, en_JM=en, vo=vo, sd__#Arab=sd, sv_SE=sv, sr_ME=sr, dz_BT=dz, es_BO=es, en_ZM=en, fr_ML=fr, br=br, ha_NG=ha, fa_AF=fa, ar_SA=ar, sk=sk, os_GE=os, ml=ml, en_MT=en, en_LR=en, ar_TD=ar, en_GH=en, en_IL=en, sv=sv, cs=cs, el=el, af=af, ff_MR_#Latn=ff, sw_UG=sw, tk_TM=tk, sr_ME_#Cyrl=sr, ar_EG=ar, sd__#Deva=sd, ji_001=yi, yo_NG=yo, se_NO=se, ku=ku, sw_CD=sw, vo_001=vo, en_PW=en, pl_PL=pl, ff_MR_#Adlm=ff, it_VA=it, sr_CS=sr, ne_IN=ne, es_PH=es, es_ES=es, es_CO=es, bg_BG=bg, ji=yi, ar_EH=ar, bs_BA_#Latn=bs, en_VC=en, nb_SJ=nb, es_US=es, en_US_POSIX=en, en_150=en, ar_SD=ar, en_KN=en, ha_NE=ha, pt_MO=pt, ro_RO=ro, zh__#Hans=zh, lb_LU=lb, sr_ME_#Latn=sr, es_GT=es, so_KE=so, ff_LR_#Latn=ff, ff_GH_#Latn=ff, fr_PM=fr, ar_KM=ar, no_NO_NY=no, fr_MG=fr, es_CL=es, mn=mn, tr_TR=tr, eu=eu, fa_IR=fa, en_MO=en, wo=wo, en_BZ=en, sq_AL=sq, ar_MR=ar, es_DO=es, ru=ru, az=az, su__#Latn=su, fa=fa, kl_GL=kl, en_NR=en, nd=nd, kk=kk, en_MP=en, az__#Cyrl=az, en_GD=en, tk=tk, hy=hy, en_BW=en, en_AU=en, en_CY=en, ta_MY=ta, ti_ER=ti, en_RW=en, sv_FI=sv, nd_ZW=nd, lb=lb, ne=ne, su=su, zh_SG=zh, en_IE=en, ln_CD=ln, en_KI=en, om_ET=om, no=no, ja_JP=ja, my=my, ka=ka, ar_IL=ar, ff_GH_#Adlm=ff, or_IN=or, fr_MF=fr, ms_ID=ms, kl=kl, en_SZ=en, zh=zh, es_PE=es, ta=ta, az__#Latn=az, en_GB=en, zh_HK_#Hant=zh, ar_SY=ar, bo=bo, kk_KZ=kk, tt_RU=tt, es_PA=es, om_KE=om, ar_PS=ar, fr_VU=fr, en_AS=en, zh_TW=zh, sd_IN=sd, fr_MC=fr, kw=kw, fr_NE=fr, pt_MZ=pt, ur_IN=ur, ln=ln, en_JE=en, ln_CF=ln, en_CX=en, pt=pt, en_AT=en, gl=gl, sr__#Cyrl=sr, es_GQ=es, kn_IN=kn, ff__#Adlm=ff, ar_YE=ar, en_SX=en, to=to, ga=ga, qu=qu, ru_KZ=ru, en_TZ=en, et=et, en_PR=en, jv=jv, ko_KP=ko, in=in, sn=sn, ps=ps, nl_SR=nl, en_BS=en, km=km, fr_NC=fr, be=be, gv=gv, es=es, gd_GB=gd, nl_BQ=nl, ff_GN_#Adlm=ff, fr_CM=fr, uz_UZ_#Cyrl=uz, pa_IN_#Guru=pa, en_KE=en, ja=ja, fr_SN=fr, or=or, fr_MA=fr, pt_LU=pt, ff_GM_#Adlm=ff, fr_BL=fr, en_NL=en, ln_CG=ln, te=te, sl=sl, ha=ha, mr_IN=mr, ko_KR=ko, el_CY=el, ku_TR=ku, es_MX=es, es_HN=es, hu_HU=hu, ff_SN=ff, sq_MK=sq, sr_BA_#Cyrl=sr, fi=fi, bs__#Cyrl=bs, uz=uz, et_EE=et, sr__#Latn=sr, en_SS=en, bo_IN=bo, sw=sw, fy_NL=fy, ar_OM=ar, tr_CY=tr, rm=rm, fr_BI=fr, en_MG=en, uz_UZ_#Latn=uz, bn=bn, de_IT=de, kn=kn, fr_TN=fr, sr_RS=sr, bn_BD=bn, de_CH=de, fr_PF=fr, gu=gu, pt_GQ=pt, en_ZA=en, en_TV=en, lo=lo, fr_FR=fr, en_PN=en, fr_BJ=fr, en_MH=en, zh__#Hant=zh, zh_HK_#Hans=zh, cu_RU=cu, nl_NL=nl, en_GY=en, ps_AF=ps, bs__#Latn=bs, ky=ky, os=os, bs_BA_#Cyrl=bs, nl_CW=nl, ar_DZ=ar, sk_SK=sk, pt_CH=pt, fr_GQ=fr, xh=xh, ki_KE=ki, am=am, fr_CI=fr, en_NG=en, ia_001=ia, en_PK=en, zh_CN=zh, en_LC=en, rw=rw, ff_BF_#Adlm=ff, wo_SN=wo, gv_IM=gv, iw=iw, en_TT=en, mk_MK=mk, sl_SI=sl, fr_HT=fr, te_IN=te, nl_SX=nl, ce=ce, fr_CG=fr, xh_ZA=xh, fr_BE=fr, ff_NE_#Adlm=ff, es_VE=es, mt_MT=mt, mr=mr, mg=mg, ko=ko, en_BM=en, nb_NO=nb, ak=ak, dz=dz, vi_VN=vi, en_VU=en, ia=ia, en_US=en, ff_SL_#Latn=ff, to_TO=to, ff_SN_#Adlm=ff, fr_BF=fr, pa__#Guru=pa, it_SM=it, su_ID=su, fr_YT=fr, gu_IN=gu, ii_CN=ii, ff_CM_#Latn=ff, pa_PK_#Arab=pa, fr_RE=fr, fi_FI=fi, ca_FR=ca, sr_BA_#Latn=sr, bn_IN=bn, fr_GP=fr, pa=pa, tg=tg, fr_DJ=fr, rn=rn, uk_UA=uk, ks__#Arab=ks, hu=hu, fr_CH=fr, en_NF=en, ff_GW_#Adlm=ff, ha_GH=ha, sr_XK_#Cyrl=sr, bm=bm, ar_SS=ar, en_GU=en, nl_AW=nl, de_BE=de, en_AI=en, en_CM=en, cs_CZ=cs, ca_ES=ca, tr=tr, ff_GW_#Latn=ff, rm_CH=rm, ru_MD=ru, ms_MY=ms, ta_LK=ta, en_TO=en, ff_SN_#Latn=ff, ff_SL_#Adlm=ff, cy=cy, en_PG=en, fr_CF=fr, pt_TL=pt, sq=sq, tg_TJ=tg, fr=fr, en_ER=en, qu_PE=qu, sr_BA=sr, es_PY=es, de=de, es_EC=es, ff_CM_#Adlm=ff, lg_UG=lg, ff_NE_#Latn=ff, zu_ZA=zu, fr_TG=fr, su_ID_#Latn=su, sr_XK_#Latn=sr, en_PH=en, ig_NG=ig, fr_GN=fr, zh_MO_#Hans=zh, lg=lg, ru_RU=ru, se_FI=se, ff=ff, en_DM=en, en_CK=en, sd=sd, ar_MA=ar, ga_IE=ga, en_BI=en, en_AG=en, fr_TD=fr, fr_LU=fr, en_WS=en, fr_CD=fr, so=so, rn_BI=rn, en_NA=en, mi_NZ=mi, ar_ER=ar, ms=ms, sn_ZW=sn, iw_IL=iw, ug=ug, es_EA=es, ga_GB=ga, th_TH_TH_#u-nu-thai=th, hi=hi, fr_SC=fr, ca_IT=ca, ff_NG_#Latn=ff, en_SL=en, no_NO=no, ca_AD=ca, ff_NG_#Adlm=ff, zh_MO_#Hant=zh, en_SH=en, qu_BO=qu, vi=vi, sd_PK_#Arab=sd, fr_CA=fr, de_LU=de, sq_XK=sq, en_KY=en, mi=mi, mt=mt, it_CH=it, de_DE=de, si_LK=si, en_AE=en, en_DK=en, so_DJ=so, eo=eo, lt_LT=lt, it_IT=it, en_ZW=en, ar_SO=ar, ro=ro, en_UM=en, ps_PK=ps, eo_001=eo, ee=ee, fr_MU=fr, nn_NO=nn, se_SE=se, pl=pl, en_TK=en, en_SI=en, ur=ur, uz__#Arab=uz, pt_GW=pt, se=se, lo_LA=lo, af_ZA=af, ar_LB=ar, ms_SG=ms, ee_TG=ee, ln_AO=ln, be_BY=be, ff_GN=ff, in_ID=in, es_BZ=es, ar_AE=ar, hr_HR=hr, as=as, it=it, pt_CV=pt, ks_IN=ks, uk=uk, my_MM=my, mn_MN=mn, ur_PK=ur, en_FM=en, da_DK=da, es_PR=es, en_BE=en, ii=ii, fr_WF=fr, tt=tt, ru_BY=ru, fo_DK=fo, ee_GH=ee, en_SG=en, ar_BH=ar, ff_GM_#Latn=ff, om=om, en_CH=en, hi_IN=hi, fo_FO=fo, yo_BJ=yo, fr_KM=fr, fr_MQ=fr, ff_GN_#Latn=ff, en_SD=en, es_AR=es, ff__#Latn=ff, en_MY=en, ja_JP_JP_#u-ca-japanese=ja, es_SV=es, pt_BR=pt, ml_IN=ml, en_FK=en, uz__#Cyrl=uz, is_IS=is, hy_AM=hy, en_GM=en, en_DG=en, fo=fo, ne_NP=ne, pt_ST=pt, hr=hr, ak_GH=ak, lt=lt, uz_AF_#Arab=uz, ta_IN=ta, fr_GF=fr, en_SE=en, zh_CN_#Hans=zh, es_419=es, is=is, pt_AO=pt, si=si, en_001=en, jv_ID=jv, en=en, es_IC=es, fr_MR=fr, ca=ca, ru_KG=ru, ar_TN=ar, ks=ks, zh_TW_#Hant=zh, ff_BF_#Latn=ff, bm_ML=bm, kw_GB=kw, ug_CN=ug, as_IN=as, es_BR=es, zh_HK=zh, sw_KE=sw, en_SB=en, th_TH=th, rw_RW=rw, ar_IQ=ar, en_MW=en, mk=mk, en_IO=en, pa__#Arab=pa, en_DE=en, ar_QA=ar, en_CC=en, ro_MD=ro, en_FI=en, bs=bs, pt_PT=pt, fy=fy, az_AZ_#Cyrl=az, th=th, es_CU=es, ar=ar, en_SC=en, en_VI=en, eu_ES=eu, en_UG=en, en_NZ=en, es_UY=es, sg_CF=sg, ru_UA=ru, sg=sg, uz__#Latn=uz, el_GR=el, da_GL=da, en_FJ=en, de_LI=de, en_BB=en, km_KH=km, hr_BA=hr, de_AT=de, nl=nl, lu_CD=lu, ca_ES_VALENCIA=ca, ar_001=ar, so_SO=so, lv_LV=lv, sd_IN_#Deva=sd, es_CR=es, ar_KW=ar, fr_GA=fr, ar_LY=ar, sr=sr, sr_RS_#Cyrl=sr, en_MU=en, da=da, gl_ES=gl, az_AZ_#Latn=az, en_IM=en, en_LS=en, ig=ig, en_HK=en, en_GI=en, ce_RU=ce, gd=gd, en_CA=en, ka_GE=ka, fr_SY=fr, sw_TZ=sw, so_ET=so, fr_RW=fr, nl_BE=nl, ar_DJ=ar, mg_MG=mg, en_VG=en, cy_GB=cy, cu=cu, sr_RS_#Latn=sr, os_RU=os, en_TC=en, sv_AX=sv, ky_KG=ky, af_NA=af, lu=lu, en_IN=en, yo=yo, ki=ki, es_NI=es, nb=nb, sd_PK=sd, ti=ti, ms_BN=ms, br_FR=br}
Substring of Locale.toString
?
Now, after having done all that work, I notice that the toString
representation of the locale name starts with the two-letter language code! 🤦
If this always the case for all Locale
objects, we can simply parse that string.
String twoLetterLanguageCode = Locale.CANADA_FRENCH.toString().split( "_" )[ 0 ];
twoLetterCode = fr
For country code, do the same, but pull the second part. Use an index value of 1
versus 0
.
String twoLetterCountryCode = Locale.CANADA_FRENCH.toString().split( "_" )[ 1 ];
For this quick check on my pre-release Java 15, it does seem to be the case that every Locale
object’s toString
text starts with the 2-letter language code. But I do not know if you can count on that always being the case in the past and in the future.
System.out.println( Locale.getAvailableLocales().length ); ArrayList < Locale > problemLocales = new ArrayList <>( Locale.getAvailableLocales().length ); for ( Locale locale : Locale.getAvailableLocales() ) { String parsed = locale.toString().split( "_" )[ 0 ]; if ( ! parsed.equalsIgnoreCase( locale.getLanguage() ) ) { problemLocales.add( locale ); } }
System.out.println( "problemLocales = " + problemLocales );
problemLocales = []
Or, vice-versa:
System.out.println( "Locale.getAvailableLocales().length: " + Locale.getAvailableLocales().length );
ArrayList < Locale > matchingLocales = newArrayList <>( Locale.getAvailableLocales().length );
for ( Locale locale : Locale.getAvailableLocales() )
{
Stringparsed= locale.toString().split( "_" )[ 0 ];
if ( parsed.equalsIgnoreCase( locale.getLanguage() ) )
{
matchingLocales.add( locale );
}
}
System.out.println( "matchingLocales.size: " + matchingLocales.size() );
System.out.println( "matchingLocales = " + matchingLocales );
Locale.getAvailableLocales().length: 810
matchingLocales.size: 810
Post a Comment for "Convert Language Code Three Characters (iso 639-2) To Two-character Code (iso 639-1)"