Usage Of U2net Model In Android
Solution 1:
I will write a long answer here. Looking at the GitHub repo of U2Net, you are left with the task of examining the pre- and post-processing steps so you can apply the same ones inside the Android project.
First of all, the preprocessing:
In the u2net_test.py file you can see that all images are preprocessed with the function ToTensorLab(flag=0). Navigating to that function, you can see that with flag=0 the preprocessing is:
else: # with rgb color (flag = 0)
    tmpImg = np.zeros((image.shape[0], image.shape[1], 3))
    image = image / np.max(image)
    if image.shape[2] == 1:
        tmpImg[:,:,0] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,1] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,2] = (image[:,:,0] - 0.485) / 0.229
    else:
        tmpImg[:,:,0] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,1] = (image[:,:,1] - 0.456) / 0.224
        tmpImg[:,:,2] = (image[:,:,2] - 0.406) / 0.225
Pay attention to two steps.
First, every color pixel value is divided by the maximum of all color pixel values:
image = image/np.max(image)
Second, a per-channel mean/std normalization is applied to every color pixel value:
tmpImg[:,:,0] = (image[:,:,0] - 0.485) / 0.229
tmpImg[:,:,1] = (image[:,:,1] - 0.456) / 0.224
tmpImg[:,:,2] = (image[:,:,2] - 0.406) / 0.225
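The two steps above can be sketched in numpy as a small reference function (preprocess is a hypothetical helper name, not from the repo), which is handy for checking the Kotlin port against Python on the same image:

```python
import numpy as np

def preprocess(image):
    # image: H x W x 3 array of raw pixel values (e.g. uint8 in 0..255)
    image = image.astype(np.float32)
    image = image / np.max(image)  # step 1: divide by the global max
    # step 2: per-channel mean/std normalization (same constants as ToTensorLab)
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (image - mean) / std
```

For an all-white image, step 1 maps every value to 1.0, so each output channel is simply (1 - mean) / std.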
So basically in Kotlin if you have a bitmap you have to do something like:
fun bitmapToFloatArray(bitmap: Bitmap): Array<Array<Array<FloatArray>>> {
    val width: Int = bitmap.width
    val height: Int = bitmap.height
    val intValues = IntArray(width * height)
    bitmap.getPixels(intValues, 0, width, 0, 0, width, height)

    // Create an array to find the maximum value
    val fourDimensionalArray = Array(1) { Array(320) { Array(320) { FloatArray(3) } } }
    // https://github.com/xuebinqin/U-2-Net/blob/f2b8e4ac1c4fbe90daba8707bca051a0ec830bf6/data_loader.py#L204
    for (i in 0 until width) {
        for (j in 0 until height) {
            val pixelValue: Int = intValues[i * width + j]
            fourDimensionalArray[0][i][j][0] = Color.red(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][1] = Color.green(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][2] = Color.blue(pixelValue).toFloat()
        }
    }

    // Convert the multidimensional array to 1D to find the maximum pixel value
    val oneDFloatArray = ArrayList<Float>()
    for (m in fourDimensionalArray[0].indices) {
        for (x in fourDimensionalArray[0][0].indices) {
            for (y in fourDimensionalArray[0][0][0].indices) {
                oneDFloatArray.add(fourDimensionalArray[0][m][x][y])
            }
        }
    }
    val maxValue: Float = oneDFloatArray.maxOrNull() ?: 0f
    //val minValue: Float = oneDFloatArray.minOrNull() ?: 0f

    // Final array that is going to be fed to the interpreter:
    // divide by the max value, then apply the per-channel mean/std
    val finalFourDimensionalArray = Array(1) { Array(320) { Array(320) { FloatArray(3) } } }
    for (i in 0 until width) {
        for (j in 0 until height) {
            val pixelValue: Int = intValues[i * width + j]
            finalFourDimensionalArray[0][i][j][0] =
                ((Color.red(pixelValue).toFloat() / maxValue) - 0.485f) / 0.229f
            finalFourDimensionalArray[0][i][j][1] =
                ((Color.green(pixelValue).toFloat() / maxValue) - 0.456f) / 0.224f
            finalFourDimensionalArray[0][i][j][2] =
                ((Color.blue(pixelValue).toFloat() / maxValue) - 0.406f) / 0.225f
        }
    }
    return finalFourDimensionalArray
}
Then this array is fed into the interpreter, and since your model has multiple outputs we use runForMultipleInputsOutputs:
// Convert Bitmap to Float array
val inputStyle = ImageUtils.bitmapToFloatArray(loadedBitmap)

// Create six output arrays with size 1,320,320,1
val output1 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output2 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output3 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output4 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output5 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output6 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }

val outputs: MutableMap<Int, Any> = HashMap()
outputs[0] = output1
outputs[1] = output2
outputs[2] = output3
outputs[3] = output4
outputs[4] = output5
outputs[5] = output6

// Runs model inference and gets result.
val array = arrayOf(inputStyle)
interpreterDepth.runForMultipleInputsOutputs(array, outputs)
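For intuition, the shapes involved can be mirrored in numpy (a sketch, not the Android API): six buffers keyed by output index, of which only index 0 is consumed afterwards, and the saliency map is its single channel:

```python
import numpy as np

# Six output buffers shaped (1, 320, 320, 1), matching the Kotlin arrays above
outputs = {k: np.zeros((1, 320, 320, 1), dtype=np.float32) for k in range(6)}

# Downstream only output 0 (the fused prediction) is used;
# squeezing away the batch and channel dims leaves the 320x320 map
mask = np.squeeze(outputs[0])
```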
Then we use the first output of the interpreter, as you can see in the u2net_test.py file. (I have also printed the results of line 112, but it seems to have no effect. You are free to try that with the min and max values of the color pixels.)
So the post-processing looks like what you see in the save_output function:
// Convert output array to Bitmap
val finalBitmapGrey = ImageUtils.convertArrayToBitmapTensorFlow(
    output1, CONTENT_IMAGE_SIZE,
    CONTENT_IMAGE_SIZE
)
where the above function looks like:
fun convertArrayToBitmapTensorFlow(
    imageArray: Array<Array<Array<FloatArray>>>,
    imageWidth: Int,
    imageHeight: Int
): Bitmap {
    val conf = Bitmap.Config.ARGB_8888 // see other conf types
    val grayToneImage = Bitmap.createBitmap(imageWidth, imageHeight, conf)
    for (x in imageArray[0].indices) {
        for (y in imageArray[0][0].indices) {
            val color = Color.rgb(
                ((imageArray[0][x][y][0]) * 255f).toInt(),
                ((imageArray[0][x][y][0]) * 255f).toInt(),
                ((imageArray[0][x][y][0]) * 255f).toInt()
            )
            // this y, x order is correct: setPixel takes (x = column, y = row)
            grayToneImage.setPixel(y, x, color)
        }
    }
    return grayToneImage
}
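If the first output is not already in [0, 1], the min/max normalization that u2net_test.py applies (normPRED, the line-112 step mentioned above) can be reproduced before the * 255 scaling; a numpy sketch:

```python
import numpy as np

def norm_pred(pred):
    # Rescale predictions to [0, 1] exactly as normPRED in u2net_test.py:
    # (pred - min) / (max - min)
    ma, mi = pred.max(), pred.min()
    return (pred - mi) / (ma - mi)

# Example: map a (1, 320, 320, 1) output to 8-bit grayscale pixel values
pred = np.random.rand(1, 320, 320, 1).astype(np.float32)
gray = (norm_pred(pred)[0, :, :, 0] * 255).astype(np.uint8)
```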
Then you can use this grayscale image however you want.
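One common use of the grayscale map is as an alpha matte to cut the salient object out of the input image; a numpy sketch (apply_matte is a made-up helper), assuming both arrays are already at the model's 320x320 size:

```python
import numpy as np

def apply_matte(rgb, mask):
    # rgb: (H, W, 3) uint8 image, mask: (H, W) uint8 saliency map.
    # Stacking the mask as a fourth channel yields an RGBA cut-out
    # where background pixels (mask ~ 0) become transparent.
    return np.dstack([rgb, mask])

rgb = np.zeros((320, 320, 3), dtype=np.uint8)
mask = np.full((320, 320), 255, dtype=np.uint8)
rgba = apply_matte(rgb, mask)
```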
Due to the multiple preprocessing steps, I used the interpreter directly with no additional libraries. Later in the week I will try whether you can embed metadata with all the steps, but I doubt it.
If you need any clarification, please do not hesitate to ask.
Colab notebook link
Happy coding