
Unicode Const Char* To JString Using JNI And C++

Simple question. How can I get a jstring out of a unicode const char*, using JNI and C++? Here's my issue, and what I have already tried: const char* value = (some value from serve

Solution 1:

As the error message shows, your char* is not valid Modified UTF-8, so the JVM aborted.
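To make the difference concrete, here is a minimal sketch comparing the two encodings. The helper names (appendUnit, encodeModifiedUtf8, encodeStandardUtf8) are hypothetical and not part of JNI. Modified UTF-8 differs from standard UTF-8 in two ways: U+0000 is written as the two bytes C0 80, and code points above U+FFFF are written as a surrogate pair, each half encoded in three bytes. Some JNI implementations reject the standard 4-byte form, while the ART check quoted in this answer tolerates it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append one 16-bit unit using the 1- to 3-byte forms.
// U+0000 gets the two-byte form C0 80, as Modified UTF-8 requires.
static void appendUnit(std::string& out, uint32_t u) {
    if (u != 0 && u < 0x80) {
        out += static_cast<char>(u);
    } else if (u < 0x800) {
        out += static_cast<char>(0xC0 | (u >> 6));
        out += static_cast<char>(0x80 | (u & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (u >> 12));
        out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (u & 0x3F));
    }
}

// Hypothetical helper: encode one code point as Modified UTF-8.
std::string encodeModifiedUtf8(uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp >= 0x10000) {
        // Split into a UTF-16 surrogate pair, then encode each half.
        uint32_t v = cp - 0x10000;
        appendUnit(out, 0xD800 + (v >> 10));
        appendUnit(out, 0xDC00 + (v & 0x3FF));
    } else {
        appendUnit(out, cp);
    }
    return out;
}

// Hypothetical helper: encode one code point as standard UTF-8.
std::string encodeStandardUtf8(uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);  // includes U+0000 as a single 00 byte
    } else if (cp >= 0x10000) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        appendUnit(out, cp);  // the 2- and 3-byte forms are identical
    }
    return out;
}
```

For U+1F600, encodeStandardUtf8 produces F0 9F 98 80, while encodeModifiedUtf8 produces ED A0 BD ED B8 80. For code points from U+0001 to U+FFFF the two encodings are byte-for-byte the same, which is why many strings pass through NewStringUTF without trouble.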

There are two ways to avoid this.

  1. Validate the contents of the char* before calling NewStringUTF, so the crash cannot occur.

The validation logic in Android ART's check_jni.cc is here: https://android.googlesource.com/platform/art/+/35e827a/runtime/check_jni.cc#1273

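// Requires <jni.h>, <sstream>, and <system_error>.
// CheckUtfBytes is assumed to be a local copy of the ART routine linked above:
// it sets *error on failure and returns the offending byte.
jstring toJString(JNIEnv* env, const char* bytes) {
    const char* error = nullptr;
    auto utf8 = CheckUtfBytes(bytes, &error);
    if (error != nullptr) {
        // Report the kind of error and the offending byte in hex.
        std::ostringstream msg;
        msg << error << " 0x" << std::hex << static_cast<int>(utf8);
        throw std::system_error(-1, std::generic_category(), msg.str());
    }
    return env->NewStringUTF(bytes);
}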

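This way, NewStringUTF only ever receives valid input, so any jstring it returns is valid; invalid input surfaces as a C++ exception instead of a JVM abort.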

  2. Use the String constructor to build the string from a jbyteArray.

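// java_lang_String_class and java_lang_String_init are assumed to be cached
// once at startup (for example in JNI_OnLoad):
//   java_lang_String_class = (jclass) env->NewGlobalRef(env->FindClass("java/lang/String"));
//   java_lang_String_init = env->GetMethodID(java_lang_String_class, "<init>", "([BLjava/lang/String;)V");
jstring toJString(JNIEnv *env, const char *pat) {
    jsize len = (jsize) strlen(pat);  // requires <cstring>
    // Copy the raw bytes into a Java byte[].
    jbyteArray bytes = env->NewByteArray(len);
    env->SetByteArrayRegion(bytes, 0, len, (const jbyte *) pat);
    // Equivalent to Java's: new String(bytes, "utf-8")
    jstring encoding = env->NewStringUTF("utf-8");
    jstring jstr = (jstring) env->NewObject(java_lang_String_class,
            java_lang_String_init, bytes, encoding);
    env->DeleteLocalRef(encoding);
    env->DeleteLocalRef(bytes);
    return jstr;
}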

This approach only avoids the crash: the resulting string may still be invalid, and the data is copied twice, which hurts performance.

For reference, here is a simplified version of the check that returns only true or false:

inline bool checkUtfBytes(const char* bytes) {
  while (*bytes != '\0') {
    const uint8_t* utf8 = reinterpret_cast<const uint8_t*>(bytes++);
    // Switch on the high four bits.
    switch (*utf8 >> 4) {
      case 0x00:
      case 0x01:
      case 0x02:
      case 0x03:
      case 0x04:
      case 0x05:
      case 0x06:
      case 0x07:
        // Bit pattern 0xxx. No need for any extra bytes.
        break;
      case 0x08:
      case 0x09:
      case 0x0a:
      case 0x0b:
        // Bit patterns 10xx, which are illegal start bytes.
        return false;
      case 0x0f:
        // Bit pattern 1111, which might be the start of a 4 byte sequence.
        if ((*utf8 & 0x08) == 0) {
          // Bit pattern 1111 0xxx, which is the start of a 4 byte sequence.
          // We consume one continuation byte here, and fall through to consume two more.
          utf8 = reinterpret_cast<const uint8_t*>(bytes++);
          if ((*utf8 & 0xc0) != 0x80) {
            return false;
          }
        } else {
          return false;
        }
        // Fall through to the cases below to consume two more continuation bytes.
      case 0x0e:
        // Bit pattern 1110, so there are two additional bytes.
        utf8 = reinterpret_cast<const uint8_t*>(bytes++);
        if ((*utf8 & 0xc0) != 0x80) {
          return false;
        }
        // Fall through to consume one more continuation byte.
      case 0x0c:
      case 0x0d:
        // Bit pattern 110x, so there is one additional byte.
        utf8 = reinterpret_cast<const uint8_t*>(bytes++);
        if ((*utf8 & 0xc0) != 0x80) {
          return false;
        }
        break;
    }
  }
  return true;
}
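
To see what this check accepts and rejects, here is a compact sketch that applies the same rules to a few sample inputs. The name isAcceptedByCheck is hypothetical, and the loop is a condensed restatement of the switch above rather than the ART source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compact variant of the check above, with the same rules:
// the lead byte determines how many 10xxxxxx continuation bytes must follow.
bool isAcceptedByCheck(const char* bytes) {
    assert(bytes != nullptr);
    const uint8_t* p = reinterpret_cast<const uint8_t*>(bytes);
    while (*p != 0) {
        uint8_t lead = *p++;
        int extra;
        if (lead < 0x80) {
            extra = 0;  // 0xxxxxxx: single byte
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;  // 110xxxxx
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;  // 1110xxxx
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;  // 11110xxx: 4-byte form, tolerated by this check
        } else {
            return false;  // 10xxxxxx or 11111xxx cannot start a sequence
        }
        for (; extra > 0; --extra) {
            // A NUL terminator fails this test too, so we never read past it.
            if ((*p++ & 0xC0) != 0x80) {
                return false;
            }
        }
    }
    return true;
}
```

With this sketch, "hello" and the UTF-8 bytes C3 A9 are accepted, as is the 4-byte sequence F0 9F 98 80. A lone continuation byte such as 80, a truncated sequence such as C3 on its own, and a Latin-1 string such as "caf\xE9" are all rejected, and those are typical inputs that would otherwise abort the JVM in NewStringUTF.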
