Usage Of U2net Model In Android
Solution 1:
I will write a long answer here. Looking at the GitHub repo of U2Net, you are left with the task of examining the pre- and post-processing steps so you can apply the same ones inside the Android project.
First of all, the preprocessing:
In the u2net_test.py file you can see that all images are preprocessed with the function ToTensorLab(flag=0). Navigating to that function, you can see that with flag=0 the preprocessing is:
else: # with rgb color (flag = 0)
    tmpImg = np.zeros((image.shape[0], image.shape[1], 3))
    image = image / np.max(image)
    if image.shape[2] == 1:
        tmpImg[:,:,0] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,1] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,2] = (image[:,:,0] - 0.485) / 0.229
    else:
        tmpImg[:,:,0] = (image[:,:,0] - 0.485) / 0.229
        tmpImg[:,:,1] = (image[:,:,1] - 0.456) / 0.224
        tmpImg[:,:,2] = (image[:,:,2] - 0.406) / 0.225
Pay attention to two steps.
First, every color pixel value is divided by the maximum of all color pixel values:
image = image/np.max(image)
Second, a per-channel mean/std normalization is applied to every color pixel value:
tmpImg[:,:,0] = (image[:,:,0] - 0.485) / 0.229
tmpImg[:,:,1] = (image[:,:,1] - 0.456) / 0.224
tmpImg[:,:,2] = (image[:,:,2] - 0.406) / 0.225
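The two steps above can be sketched in numpy as a small reference function (preprocess is a hypothetical helper name, not from the repo), which is handy for checking the Kotlin port against Python on the same image:

```python
import numpy as np

def preprocess(image):
    # image: H x W x 3 array of raw pixel values (e.g. uint8 in 0..255)
    image = image.astype(np.float32)
    image = image / np.max(image)  # step 1: divide by the global max
    # step 2: per-channel mean/std normalization (same constants as ToTensorLab)
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (image - mean) / std
```

For an all-white image, step 1 maps every value to 1.0, so each output channel is simply (1 - mean) / std.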
So basically in Kotlin if you have a bitmap you have to do something like:
fun bitmapToFloatArray(bitmap: Bitmap): Array<Array<Array<FloatArray>>> {
    val width: Int = bitmap.width
    val height: Int = bitmap.height
    val intValues = IntArray(width * height)
    bitmap.getPixels(intValues, 0, width, 0, 0, width, height)

    // Create an array to find the maximum value
    val fourDimensionalArray = Array(1) { Array(320) { Array(320) { FloatArray(3) } } }
    // https://github.com/xuebinqin/U-2-Net/blob/f2b8e4ac1c4fbe90daba8707bca051a0ec830bf6/data_loader.py#L204
    for (i in 0 until width) {
        for (j in 0 until height) {
            val pixelValue: Int = intValues[i * width + j]
            fourDimensionalArray[0][i][j][0] = Color.red(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][1] = Color.green(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][2] = Color.blue(pixelValue).toFloat()
        }
    }

    // Convert the multidimensional array to 1D to find the maximum pixel value
    val oneDFloatArray = ArrayList<Float>()
    for (m in fourDimensionalArray[0].indices) {
        for (x in fourDimensionalArray[0][0].indices) {
            for (y in fourDimensionalArray[0][0][0].indices) {
                oneDFloatArray.add(fourDimensionalArray[0][m][x][y])
            }
        }
    }
    val maxValue: Float = oneDFloatArray.maxOrNull() ?: 0f
    //val minValue: Float = oneDFloatArray.minOrNull() ?: 0f

    // Final array that is going to be fed to the interpreter:
    // divide by the max value, then apply the per-channel mean/std
    val finalFourDimensionalArray = Array(1) { Array(320) { Array(320) { FloatArray(3) } } }
    for (i in 0 until width) {
        for (j in 0 until height) {
            val pixelValue: Int = intValues[i * width + j]
            finalFourDimensionalArray[0][i][j][0] =
                ((Color.red(pixelValue).toFloat() / maxValue) - 0.485f) / 0.229f
            finalFourDimensionalArray[0][i][j][1] =
                ((Color.green(pixelValue).toFloat() / maxValue) - 0.456f) / 0.224f
            finalFourDimensionalArray[0][i][j][2] =
                ((Color.blue(pixelValue).toFloat() / maxValue) - 0.406f) / 0.225f
        }
    }
    return finalFourDimensionalArray
}
Then this array is fed into the interpreter, and since your model has multiple outputs we use runForMultipleInputsOutputs:
// Convert Bitmap to Float array
val inputStyle = ImageUtils.bitmapToFloatArray(loadedBitmap)

// Create six output arrays with size 1,320,320,1
val output1 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output2 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output3 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output4 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output5 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output6 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }

val outputs: MutableMap<Int, Any> = HashMap()
outputs[0] = output1
outputs[1] = output2
outputs[2] = output3
outputs[3] = output4
outputs[4] = output5
outputs[5] = output6

// Runs model inference and gets result.
val array = arrayOf(inputStyle)
interpreterDepth.runForMultipleInputsOutputs(array, outputs)
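For intuition, the shapes involved can be mirrored in numpy (a sketch, not the Android API): six buffers keyed by output index, of which only index 0 is consumed afterwards, and the saliency map is its single channel:

```python
import numpy as np

# Six output buffers shaped (1, 320, 320, 1), matching the Kotlin arrays above
outputs = {k: np.zeros((1, 320, 320, 1), dtype=np.float32) for k in range(6)}

# Downstream only output 0 (the fused prediction) is used;
# squeezing away the batch and channel dims leaves the 320x320 map
mask = np.squeeze(outputs[0])
```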
Then we use the first output of the interpreter, as you can see in the u2net_test.py file. (I have also printed the results of line 112, but it seems to have no effect. You are free to try that with the min and max values of the color pixels.)
So the post-processing looks like what you see in the save_output function:
// Convert output array to Bitmap
val finalBitmapGrey = ImageUtils.convertArrayToBitmapTensorFlow(
    output1, CONTENT_IMAGE_SIZE,
    CONTENT_IMAGE_SIZE
)
where the above function looks like:
fun convertArrayToBitmapTensorFlow(
    imageArray: Array<Array<Array<FloatArray>>>,
    imageWidth: Int,
    imageHeight: Int
): Bitmap {
    val conf = Bitmap.Config.ARGB_8888 // see other conf types
    val grayToneImage = Bitmap.createBitmap(imageWidth, imageHeight, conf)
    for (x in imageArray[0].indices) {
        for (y in imageArray[0][0].indices) {
            val color = Color.rgb(
                ((imageArray[0][x][y][0]) * 255f).toInt(),
                ((imageArray[0][x][y][0]) * 255f).toInt(),
                ((imageArray[0][x][y][0]) * 255f).toInt()
            )
            // this y, x order is correct: setPixel takes (x = column, y = row)
            grayToneImage.setPixel(y, x, color)
        }
    }
    return grayToneImage
}
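If the first output is not already in [0, 1], the min/max normalization that u2net_test.py applies (normPRED, the line-112 step mentioned above) can be reproduced before the * 255 scaling; a numpy sketch:

```python
import numpy as np

def norm_pred(pred):
    # Rescale predictions to [0, 1] exactly as normPRED in u2net_test.py:
    # (pred - min) / (max - min)
    ma, mi = pred.max(), pred.min()
    return (pred - mi) / (ma - mi)

# Example: map a (1, 320, 320, 1) output to 8-bit grayscale pixel values
pred = np.random.rand(1, 320, 320, 1).astype(np.float32)
gray = (norm_pred(pred)[0, :, :, 0] * 255).astype(np.uint8)
```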
Then you can use this grayscale image however you want.
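One common use of the grayscale map is as an alpha matte to cut the salient object out of the input image; a numpy sketch (apply_matte is a made-up helper), assuming both arrays are already at the model's 320x320 size:

```python
import numpy as np

def apply_matte(rgb, mask):
    # rgb: (H, W, 3) uint8 image, mask: (H, W) uint8 saliency map.
    # Stacking the mask as a fourth channel yields an RGBA cut-out
    # where background pixels (mask ~ 0) become transparent.
    return np.dstack([rgb, mask])

rgb = np.zeros((320, 320, 3), dtype=np.uint8)
mask = np.full((320, 320), 255, dtype=np.uint8)
rgba = apply_matte(rgb, mask)
```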
Due to the multiple preprocessing steps, I used the interpreter directly with no additional libraries. Later in the week I will try whether you can embed metadata with all the steps, but I doubt it.
If you need any clarification, please do not hesitate to ask.
Colab notebook link
Happy coding