1 - Introduction
2 - NAND Dump Analysis
3 - Bit Errors Fixing with ECC
4 - UBI Image Analysis
5 - Firmware Extraction
6 - Conclusion
1 - Introduction
This paper describes, from a hacker's point of view, how to process a NAND dump and recover all the files it contains. Each step of the process is explained in detail together with an example. The focus is on the physical NAND dump, that is, the dump file obtained from a universal programmer. A dump file obtained through a bootloader such as u-boot is what I call a logical NAND dump.
For a logical NAND dump, the correctness of the data is ensured by the Flash Translation Layer (FTL). In other words, the FTL does all the bit error fixing with the Error Correcting Code (ECC) for you. For a physical NAND dump, however, the data comes along with the ECC, and you are on your own to work out how the ECC is used to ensure the correctness of the data. If bit errors exist, the ECC should be used to fix them. But it is not easy to guess how the ECC is associated with the data, and without knowing that association it is impossible to use the ECC to fix bit errors. So it is necessary to analyze the NAND dump systematically and uncover this secret association between ECC and data. Brute forcing it blindly is not a good idea. Instead, the results of a thorough analysis can turn blind brute forcing into guided brute forcing, which maximizes the chance of recovering the association.
Once the bit errors in the data are fixed and the ECCs are removed, the NAND dump has been transformed from physical into logical, and it is ready for the actual firmware image analysis. As the real case scenario for this paper, a UBI image is dealt with, and its analysis is discussed in considerable detail. Based on the knowledge gained from the UBI image analysis, a creative approach is proposed to recover the file system and extract all the files hosted inside it. It is important to note that the entire process discussed in this paper cannot be replicated with automated tools such as binwalk or unblob. Besides, the entire analysis is demonstrated manually, step by step, to make sure everything is explained clearly. Without wasting more time on mere talk, let's get started with the actual NAND dump analysis.
2 - NAND Dump Analysis
First of all, let's start with some fundamentals. A NAND flash comprises a large number of "pages" of a fixed size, and a fixed number of "pages" make up a "block". The sample NAND dump used for the demonstration was obtained from an actual NAND chip with part number MT29F2G08ABAEAWP, so that chip is used as the example to illustrate the relevant technical specification.
For MT29F2G08ABAEAWP, the size of a "page" is 2048+64=2112 bytes, 64 "pages" make up a "block", and 2048 "blocks" make up the entire storage of the NAND flash, which therefore contains 2048*64=131072 "pages".
In each 2112-byte "page", the first 2048 bytes are data and the remaining 64 bytes are the spare area, which hosts the ECC or some kind of vendor-specific metadata. The spare area is also known as Out Of Band (OOB) in some literature.
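The geometry above maps directly onto file offsets in a raw dump. Here is a minimal sketch, assuming the dump stores every "page" as 2048 data bytes immediately followed by its 64 OOB bytes. The helper names are hypothetical and only serve the illustration.

```python
# Geometry figures quoted above for MT29F2G08ABAEAWP.
PAGE_DATA = 2048          # data bytes per page
PAGE_OOB = 64             # spare (OOB) bytes per page
PAGES_PER_BLOCK = 64
BLOCKS = 2048

RAW_PAGE = PAGE_DATA + PAGE_OOB            # 2112 bytes per page in the dump
TOTAL_PAGES = PAGES_PER_BLOCK * BLOCKS     # 131072 pages
DUMP_SIZE = RAW_PAGE * TOTAL_PAGES         # size of a complete physical dump


def page_offset(page):
    """File offset of the data portion of a page (hypothetical helper)."""
    return page * RAW_PAGE


def oob_offset(page):
    """File offset of the OOB portion of a page (hypothetical helper)."""
    return page_offset(page) + PAGE_DATA


def block_first_page(block):
    """Page number of the first page in a block (hypothetical helper)."""
    return block * PAGES_PER_BLOCK


if __name__ == "__main__":
    print("expected dump size : 0x%X" % DUMP_SIZE)
    print("OOB of page 0      : 0x%X" % oob_offset(0))
    print("OOB of block 1     : 0x%X" % oob_offset(block_first_page(1)))
```

With these figures a complete dump is expected to be 0x10800000 bytes long, and the OOB of the first "page" starts at 0x800.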
As an overview, the first "page" of the sample NAND dump is shown below in hex mode: 0x0000 to 0x07ff is the data portion, and 0x0800 to 0x083f is the spare area or OOB portion.
cawan% hexdump -C -n 2112 ./MT29F2G08ABAEAWP@TSOP48.BIN
00000000 20 54 56 4e 00 02 00 00 a0 ac 00 00 ff ff ff ff | TVN............|
00000010 55 aa 55 aa 2e 00 00 00 20 02 00 b0 00 00 00 01 |U.U..... .......|
00000020 64 02 00 b0 18 00 00 c0 20 02 00 b0 18 00 00 01 |d....... .......|
00000030 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
00000040 28 18 00 b0 4a d8 dc 53 08 18 00 b0 14 80 00 00 |(...J..S........|
00000050 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
00000060 aa 55 aa 55 01 00 00 00 00 18 00 b0 76 04 03 00 |.U.U........v...|
00000070 aa 55 aa 55 01 00 00 00 04 18 00 b0 21 00 00 00 |.U.U........!...|
00000080 aa 55 aa 55 01 00 00 00 04 18 00 b0 23 00 00 00 |.U.U........#...|
00000090 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
000000a0 aa 55 aa 55 01 00 00 00 04 18 00 b0 27 00 00 00 |.U.U........'...|
000000b0 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
000000c0 aa 55 aa 55 01 00 00 00 20 18 00 b0 00 00 00 00 |.U.U.... .......|
000000d0 24 18 00 b0 00 00 00 00 1c 18 00 b0 00 40 00 00 |$............@..|
000000e0 18 18 00 b0 32 03 00 00 10 18 00 b0 06 00 00 00 |....2...........|
000000f0 04 18 00 b0 27 00 00 00 aa 55 aa 55 01 00 00 00 |....'....U.U....|
00000100 aa 55 aa 55 01 00 00 00 aa 55 aa 55 01 00 00 00 |.U.U.....U.U....|
00000110 04 18 00 b0 2b 00 00 00 04 18 00 b0 2b 00 00 00 |....+.......+...|
00000120 04 18 00 b0 2b 00 00 00 18 18 00 b0 32 02 00 00 |....+.......2...|
00000130 1c 18 00 b0 81 47 00 00 1c 18 00 b0 01 44 00 00 |.....G.......D..|
00000140 04 18 00 b0 20 00 00 00 34 18 00 b0 20 88 88 00 |.... ...4... ...|
00000150 aa 55 aa 55 01 00 00 00 18 02 00 b0 08 00 00 00 |.U.U............|
00000160 60 31 00 b8 00 80 00 00 a0 31 00 b8 00 80 00 00 |`1.......1......|
00000170 2c 02 00 b0 00 01 00 00 2c 02 00 b0 00 01 00 00 |,.......,.......|
00000180 2c 02 00 b0 00 01 00 00 00 00 00 00 00 00 00 00 |,...............|
00000190 13 00 00 ea 14 f0 9f e5 10 f0 9f e5 0c f0 9f e5 |................|
000001a0 08 f0 9f e5 04 f0 9f e5 00 f0 9f e5 04 f0 1f e5 |................|
000001b0 20 03 00 00 78 56 34 12 78 56 34 12 78 56 34 12 | ...xV4.xV4.xV4.|
000001c0 78 56 34 12 78 56 34 12 78 56 34 12 78 56 34 12 |xV4.xV4.xV4.xV4.|
000001d0 00 02 00 00 a0 ac 00 00 80 b5 00 00 a0 ac 00 00 |................|
000001e0 de c0 ad 0b 00 00 0f e1 1f 00 c0 e3 d3 00 80 e3 |................|
000001f0 00 f0 29 e1 bc d0 9f e5 07 d0 cd e3 00 00 a0 e3 |..).............|
00000200 70 05 00 eb 00 40 a0 e1 01 50 a0 e1 02 60 a0 e1 |p....@...P...`..|
00000210 04 d0 a0 e1 8c 00 4f e2 00 90 46 e0 06 00 50 e1 |......O...F...P.|
00000220 06 00 00 0a 06 10 a0 e1 5c 30 1f e5 03 20 80 e0 |........\0... ..|
00000230 00 06 b0 e8 00 06 a1 e8 02 00 50 e1 fb ff ff 3a |..........P....:|
00000240 74 00 9f e5 74 10 9f e5 00 20 a0 e3 01 00 50 e1 |t...t.... ....P.|
00000250 02 00 00 2a 00 20 80 e5 04 00 80 e2 fa ff ff ea |...*. ..........|
00000260 00 00 9f e5 00 f0 a0 e1 54 06 00 00 a0 ac 00 00 |........T.......|
00000270 a0 ac 00 00 a0 ac 00 00 00 00 a0 e3 17 0f 07 ee |................|
00000280 17 0f 08 ee 10 0f 11 ee 23 0c c0 e3 87 00 c0 e3 |........#.......|
00000290 02 00 80 e3 01 0a 80 e3 10 0f 01 ee 0e c0 a0 e1 |................|
000002a0 0a 00 00 eb 0c e0 a0 e1 0e f0 a0 e1 00 00 a0 e1 |................|
000002b0 e8 d0 1f e5 fe ff ff eb 00 80 00 bc a0 ae 00 00 |................|
000002c0 80 b7 00 00 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 |................|
000002d0 68 00 9f e5 00 10 e0 e3 00 10 80 e5 00 00 0f e1 |h...............|
000002e0 c0 00 80 e3 00 f0 21 e1 54 00 9f e5 54 10 9f e5 |......!.T...T...|
000002f0 00 10 80 e5 50 00 9f e5 50 10 9f e5 00 10 80 e5 |....P...P.......|
00000300 4c 00 9f e5 05 14 a0 e3 00 10 80 e5 44 00 9f e5 |L...........D...|
00000310 44 10 9f e5 00 10 80 e5 03 2a a0 e3 01 20 52 e2 |D........*... R.|
00000320 fd ff ff 1a 20 00 9f e5 30 10 9f e5 00 10 80 e5 |.... ...0.......|
00000330 01 2b a0 e3 01 20 52 e2 fd ff ff 1a 0e f0 a0 e1 |.+... R.........|
00000340 24 21 00 b8 04 10 00 b0 84 00 04 40 04 02 00 b0 |$!.........@....|
00000350 ff 0f 00 00 08 02 00 b0 0c 02 00 b0 24 4f 00 00 |............$O..|
00000360 fc 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000370 00 00 51 e3 1f 00 00 0a 01 30 a0 e3 00 20 a0 e3 |..Q......0... ..|
00000380 01 00 50 e1 19 00 00 3a 01 02 51 e3 00 00 51 31 |..P....:..Q...Q1|
00000390 01 12 a0 31 03 32 a0 31 fa ff ff 3a 02 01 51 e3 |...1.2.1...:..Q.|
000003a0 00 00 51 31 81 10 a0 31 83 30 a0 31 fa ff ff 3a |..Q1...1.0.1...:|
000003b0 01 00 50 e1 01 00 40 20 03 20 82 21 a1 00 50 e1 |..P...@ . .!..P.|
000003c0 a1 00 40 20 a3 20 82 21 21 01 50 e1 21 01 40 20 |..@ . .!!.P.!.@ |
000003d0 23 21 82 21 a1 01 50 e1 a1 01 40 20 a3 21 82 21 |#!.!..P...@ .!.!|
000003e0 00 00 50 e3 23 32 b0 11 21 12 a0 11 ef ff ff 1a |..P.#2..!.......|
000003f0 02 00 a0 e1 0e f0 a0 e1 04 e0 2d e5 c9 1c 00 eb |..........-.....|
00000400 00 00 a0 e3 00 80 bd e8 03 50 2d e9 d7 ff ff eb |.........P-.....|
00000410 06 50 bd e8 90 02 03 e0 03 10 41 e0 0e f0 a0 e1 |.P........A.....|
00000420 03 50 2d e9 09 00 00 eb 06 50 bd e8 90 02 03 e0 |.P-......P......|
00000430 03 10 41 e0 0e f0 a0 e1 00 00 a0 e1 00 00 a0 e1 |..A.............|
00000440 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 |................|
00000450 00 00 51 e3 01 c0 20 e0 42 00 00 0a 00 10 61 42 |..Q... .B.....aB|
00000460 01 20 51 e2 27 00 00 0a 00 30 b0 e1 00 30 60 42 |. Q.'....0...0`B|
00000470 01 00 53 e1 26 00 00 9a 02 00 11 e1 28 00 00 0a |..S.&.......(...|
00000480 0e 02 11 e3 81 11 a0 01 08 20 a0 03 01 20 a0 13 |......... ... ..|
00000490 01 02 51 e3 03 00 51 31 01 12 a0 31 02 22 a0 31 |..Q...Q1...1.".1|
000004a0 fa ff ff 3a 02 01 51 e3 03 00 51 31 81 10 a0 31 |...:..Q...Q1...1|
000004b0 82 20 a0 31 fa ff ff 3a 00 00 a0 e3 01 00 53 e1 |. .1...:......S.|
000004c0 01 30 43 20 02 00 80 21 a1 00 53 e1 a1 30 43 20 |.0C ...!..S..0C |
000004d0 a2 00 80 21 21 01 53 e1 21 31 43 20 22 01 80 21 |...!!.S.!1C "..!|
000004e0 a1 01 53 e1 a1 31 43 20 a2 01 80 21 00 00 53 e3 |..S..1C ...!..S.|
000004f0 22 22 b0 11 21 12 a0 11 ef ff ff 1a 00 00 5c e3 |""..!.........\.|
00000500 00 00 60 42 0e f0 a0 e1 00 00 3c e1 00 00 60 42 |..`B......<...`B|
00000510 0e f0 a0 e1 00 00 a0 33 cc 0f a0 01 01 00 80 03 |.......3........|
00000520 0e f0 a0 e1 01 08 51 e3 21 18 a0 21 10 20 a0 23 |......Q.!..!. .#|
00000530 00 20 a0 33 01 0c 51 e3 21 14 a0 21 08 20 82 22 |. .3..Q.!..!. ."|
00000540 10 00 51 e3 21 12 a0 21 04 20 82 22 04 00 51 e3 |..Q.!..!. ."..Q.|
00000550 03 20 82 82 a1 20 82 90 00 00 5c e3 33 02 a0 e1 |. ... ....\.3...|
00000560 00 00 60 42 0e f0 a0 e1 04 e0 2d e5 6d 1c 00 eb |..`B......-.m...|
00000570 00 00 a0 e3 04 f0 9d e4 00 00 a0 e1 00 00 a0 e1 |................|
00000580 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 |................|
00000590 20 30 52 e2 20 c0 62 e2 30 02 a0 41 31 03 a0 51 | 0R. .b.0..A1..Q|
000005a0 11 0c 80 41 31 12 a0 e1 0e f0 a0 e1 20 30 52 e2 |...A1....... 0R.|
000005b0 20 c0 62 e2 11 12 a0 41 10 13 a0 51 30 1c 81 41 | .b....A...Q0..A|
000005c0 10 02 a0 e1 0e f0 a0 e1 20 30 52 e2 20 c0 62 e2 |........ 0R. .b.|
000005d0 30 02 a0 41 51 03 a0 51 11 0c 80 41 51 12 a0 e1 |0..AQ..Q...AQ...|
000005e0 0e f0 a0 e1 2d de 4d e2 00 40 a0 e3 6c 31 9f e5 |....-.M..@..l1..|
000005f0 0d 00 a0 e1 00 30 8d e5 04 30 8d e5 1c 40 8d e5 |.....0...0...@..|
00000600 bc d2 8d e5 30 40 8d e5 50 40 8d e5 d1 01 00 eb |....0@..P@......|
00000610 1c 30 9d e5 04 00 53 e1 02 00 00 0a 04 10 a0 e1 |.0....S.........|
00000620 8a 0f 8d e2 33 ff 2f e1 8a 0f 8d e2 01 10 a0 e3 |....3./.........|
00000630 ca 1a 00 eb 00 00 50 e3 46 00 00 1a 70 04 00 eb |......P.F...p...|
00000640 38 42 9d e5 3c 52 9d e5 04 00 a0 e1 05 10 a0 e1 |8B..<R..........|
00000650 46 ff ff eb 04 10 a0 e1 0e a6 a0 e3 00 b0 a0 e1 |F...............|
00000660 0a 08 a0 e3 41 ff ff eb 04 10 a0 e1 00 70 a0 e1 |....A........p..|
00000670 ec 00 9f e5 3d ff ff eb 04 10 a0 e1 00 90 a0 e1 |....=...........|
00000680 0a 08 a0 e3 5f ff ff eb 01 00 a0 e1 05 10 a0 e1 |...._...........|
00000690 36 ff ff eb 00 60 a0 e1 24 00 00 ea 3c 12 9d e5 |6....`..$...<...|
000006a0 38 02 9d e5 31 ff ff eb 8a 4f 8d e2 bc 52 9d e5 |8...1....O...R..|
000006b0 50 10 a0 e3 00 20 a0 e3 90 07 03 e0 04 00 a0 e1 |P.... ..........|
000006c0 0f e0 a0 e1 34 f0 95 e5 04 00 a0 e1 0f e0 a0 e1 |....4...........|
000006d0 08 f0 95 e5 ff 00 50 e3 01 90 89 12 12 00 00 1a |......P.........|
000006e0 0e 00 00 ea bc 42 9d e5 38 02 9d e5 d4 51 94 e5 |.....B..8....Q..|
000006f0 3c 12 9d e5 00 00 55 e3 05 00 00 0a 1b ff ff eb |<.....U.........|
00000700 04 10 a0 e1 0a 20 a0 e1 90 67 23 e0 8a 0f 8d e2 |..... ...g#.....|
00000710 35 ff 2f e1 3c 32 9d e5 01 60 86 e2 03 a0 8a e0 |5./.<2...`......|
00000720 0b 00 56 e1 ee ff ff 3a 00 60 a0 e3 01 70 87 e2 |..V....:.`...p..|
00000730 09 00 57 e1 d8 ff ff 9a 1c 30 9d e5 00 00 53 e3 |..W......0....S.|
00000740 02 00 00 0a 8a 0f 8d e2 00 10 e0 e3 33 ff 2f e1 |............3./.|
00000750 0e 36 a0 e3 33 ff 2f e1 2d de 8d e2 1e ff 2f e1 |.6..3./.-...../.|
00000760 00 d0 00 b0 ff cf 11 00 f0 40 2d e9 02 60 d3 e5 |.........@-..`..|
00000770 00 40 d3 e5 00 00 d2 e5 01 c0 d3 e5 02 50 d2 e5 |.@...........P..|
00000780 01 30 d2 e5 00 40 24 e0 03 c0 2c e0 05 60 26 e0 |.0...@$...,..`&.|
00000790 ff 00 04 e2 06 30 8c e1 03 30 90 e1 01 70 a0 e1 |.....0...0...p..|
000007a0 03 00 a0 01 f0 80 bd 08 ac 50 a0 e1 0c 30 25 e0 |.........P...0%.|
000007b0 55 30 03 e2 55 00 53 e3 28 00 00 1a a0 30 20 e0 |U0..U.S.(....0 .|
000007c0 55 30 03 e2 55 00 53 e3 24 00 00 1a a6 30 26 e0 |U0..U.S.$....0&.|
000007d0 54 30 03 e2 54 00 53 e3 20 00 00 1a 80 20 a0 e1 |T0..T.S. .... ..|
000007e0 00 31 a0 e1 20 30 03 e2 40 20 02 e2 03 20 82 e1 |.1.. 0..@ ... ..|
000007f0 80 10 04 e2 80 31 a0 e1 01 20 82 e1 10 30 03 e2 |.....1... ...0..|
00000800 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000820 f6 89 f7 79 e5 60 c9 e0 d6 e3 ed cb 9c b0 f9 f0 |...y.`..........|
00000830 1f da d4 a4 9c d4 1b e0 e0 90 cc 85 d8 d2 e2 80 |................|
00000840
This sample NAND dump is in fact a physical NAND dump from a real industrial product. As mentioned earlier, it is used as a real case scenario to illustrate each step of the analysis until the full file system is extracted and recovered. Let's start with the DumpFlash tool and try to identify the ID codes of the NAND chip. However, it fails to report the correct geometry, as the output below shows.
This might happen because the ID codes are missing from the NAND dump or have been changed to something unexpected.
cawan% python2.7 dumpflash.py -i./MT29F2G08ABAEAWP@TSOP48.BIN
PageSize: 0x200
OOBSize: 0x10
PagePerBlock: 0x20
BlockSize: 0x4000
RawPageSize: 0x210
FileSize: 0x10800000
PageCount: 0x84000
So, just forget about the false output generated by DumpFlash, and go back to the technical specification provided by the datasheet of MT29F2G08ABAEAWP. Let's have a brief look at the 64-byte OOB of the first "page" in particular.
00000800 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000820 f6 89 f7 79 e5 60 c9 e0 d6 e3 ed cb 9c b0 f9 f0 |...y.`..........|
00000830 1f da d4 a4 9c d4 1b e0 e0 90 cc 85 d8 d2 e2 80 |................|
From this, two assumptions can be made. One, the first 32 bytes of the OOB might be a constant. Two, the second 32 bytes might be ECCs. Let's check whether the first assumption holds by looking at the OOB of the second "page", as shown below.
cawan% hexdump -v -C -n $((2112*2)) ./MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 5
00001040 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00001050 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00001060 8f ce f4 8b 1c 26 38 00 bd 61 a0 c7 48 c4 d3 60 |.....&8..a..H..`|
00001070 d2 1b 46 ab 53 8f 41 f0 8d 18 2b 3b 8d 54 21 50 |..F.S.A...+;.T!P|
Yes, it seems unchanged. How about the third "page" then?
cawan% hexdump -v -C -n $((2112*3)) ./MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 5
00001880 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00001890 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
000018a0 01 8b bb 0a bb 54 88 50 7e 0e b9 9a c2 7b bd 40 |.....T.P~....{.@|
000018b0 dd 63 cb 9a e3 5a bc 70 65 ca 16 7a 50 dc 60 e0 |.c...Z.pe..zP.`.|
Still unchanged. How about the first "page" of the next block then?
cawan% hexdump -C -v -n $((2112*64+2112)) ./MT29F2G08ABAEAWP@TSOP48.BIN | \
tail -n 5
00021800 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00021810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00021820 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00021830 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
Well, this is a blank page that should be ignored. Drawing a conclusion from just a few samples is really not a good idea. Let's check it properly.
############################### check_const.py ###############################
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN","rb")
suspect_const = \
b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
blank = \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
page_count = 0
diff_count = 0
while 1:
    data = input_file.read(2112)
    if len(data) == 0:
        break
    oob_first_32_bytes = data[2048:2048+32]
    page_count += 1
    if len(data) == 2112 and oob_first_32_bytes != blank:
        if oob_first_32_bytes != suspect_const:
            diff_count += 1
print("diff_count: %d page_count: %d\n" % (diff_count, page_count))
##################################### end ####################################
The output is,
cawan% python3.8 check_const.py
diff_count: 0 page_count: 131072
So, it is convincing enough to say that the first 32 bytes of the OOB are constant for all non-blank "pages". Next, let's verify the second assumption, that the second 32 bytes of the OOB are ECCs. The suspected ECC portion of the OOB for the first 4 "pages" is shown below.
00000820 f6 89 f7 79 e5 60 c9 e0 d6 e3 ed cb 9c b0 f9 f0 |...y.`..........|
00000830 1f da d4 a4 9c d4 1b e0 e0 90 cc 85 d8 d2 e2 80 |................|
00001060 8f ce f4 8b 1c 26 38 00 bd 61 a0 c7 48 c4 d3 60 |.....&8..a..H..`|
00001070 d2 1b 46 ab 53 8f 41 f0 8d 18 2b 3b 8d 54 21 50 |..F.S.A...+;.T!P|
000018a0 01 8b bb 0a bb 54 88 50 7e 0e b9 9a c2 7b bd 40 |.....T.P~....{.@|
000018b0 dd 63 cb 9a e3 5a bc 70 65 ca 16 7a 50 dc 60 e0 |.c...Z.pe..zP.`.|
000020e0 43 a9 36 70 be b0 5e 90 1c 4f c1 ad 19 54 4d 20 |C.6p..^..O...TM |
000020f0 b8 6a 20 ba 32 c2 74 80 76 73 45 10 64 3e 38 c0 |.j .2.t.vsE.d>8.|
The output looks positive, and it provides extra information about how the suspected ECC portion of the OOB is used by the system implementation. For each "page", it seems the 32 bytes of the suspected ECC portion can be divided into four ECCs of 8 bytes each. The reason is that the last 4 bits of each 8-byte group are always zero, as shown below.
f6 89 f7 79 e5 60 c9 e0
d6 e3 ed cb 9c b0 f9 f0
1f da d4 a4 9c d4 1b e0
e0 90 cc 85 d8 d2 e2 80
8f ce f4 8b 1c 26 38 00
bd 61 a0 c7 48 c4 d3 60
d2 1b 46 ab 53 8f 41 f0
8d 18 2b 3b 8d 54 21 50
01 8b bb 0a bb 54 88 50
7e 0e b9 9a c2 7b bd 40
dd 63 cb 9a e3 5a bc 70
65 ca 16 7a 50 dc 60 e0
43 a9 36 70 be b0 5e 90
1c 4f c1 ad 19 54 4d 20
b8 6a 20 ba 32 c2 74 80
76 73 45 10 64 3e 38 c0
                      ^
                      0
Since a "page" carries four ECCs, it is reasonable to deduce that the 2048-byte data portion of a "page" is divided into four 512-byte "sub-pages", each protected by its respective ECC, in sequence, as shown below.
f6 89 f7 79 e5 60 c9 e0 <- ECC of the 1st "sub-page" in 1st "page"
d6 e3 ed cb 9c b0 f9 f0 <- ECC of the 2nd "sub-page" in 1st "page"
1f da d4 a4 9c d4 1b e0 <- ECC of the 3rd "sub-page" in 1st "page"
e0 90 cc 85 d8 d2 e2 80 <- ECC of the 4th "sub-page" in 1st "page"
8f ce f4 8b 1c 26 38 00 <- ECC of the 1st "sub-page" in 2nd "page"
bd 61 a0 c7 48 c4 d3 60 <- ECC of the 2nd "sub-page" in 2nd "page"
d2 1b 46 ab 53 8f 41 f0 <- ECC of the 3rd "sub-page" in 2nd "page"
8d 18 2b 3b 8d 54 21 50 <- ECC of the 4th "sub-page" in 2nd "page"
01 8b bb 0a bb 54 88 50 <- ECC of the 1st "sub-page" in 3rd "page"
7e 0e b9 9a c2 7b bd 40 <- ECC of the 2nd "sub-page" in 3rd "page"
dd 63 cb 9a e3 5a bc 70 <- ECC of the 3rd "sub-page" in 3rd "page"
65 ca 16 7a 50 dc 60 e0 <- ECC of the 4th "sub-page" in 3rd "page"
43 a9 36 70 be b0 5e 90 <- ECC of the 1st "sub-page" in 4th "page"
1c 4f c1 ad 19 54 4d 20 <- ECC of the 2nd "sub-page" in 4th "page"
b8 6a 20 ba 32 c2 74 80 <- ECC of the 3rd "sub-page" in 4th "page"
76 73 45 10 64 3e 38 c0 <- ECC of the 4th "sub-page" in 4th "page"
                      ^
                      0
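The pairing deduced above can be written down as a small helper. This is only a sketch of the working hypothesis (four 512-byte "sub-pages", each matched in sequence with one 8-byte ECC taken from the second half of the OOB); the function name split_page is made up for illustration, and the pairing order is still an assumption at this point of the analysis.

```python
PAGE_DATA = 2048      # data bytes per page
PAGE_OOB = 64         # OOB bytes per page
SUBPAGE = 512         # assumed sub-page size
ECC_LEN = 8           # assumed bytes of ECC per sub-page
ECC_START = 32        # the ECCs are assumed to start at OOB offset 32


def split_page(raw_page):
    """Return [(sub_page_data, ecc), ...] for one raw 2112-byte page.

    Assumes sub-page N is protected by the N-th 8-byte ECC, in sequence.
    """
    if len(raw_page) != PAGE_DATA + PAGE_OOB:
        raise ValueError("expected a raw page of 2112 bytes")
    data = raw_page[:PAGE_DATA]
    ecc_area = raw_page[PAGE_DATA + ECC_START:]
    pairs = []
    for i in range(PAGE_DATA // SUBPAGE):
        sub = data[i * SUBPAGE:(i + 1) * SUBPAGE]
        ecc = ecc_area[i * ECC_LEN:(i + 1) * ECC_LEN]
        pairs.append((sub, ecc))
    return pairs


if __name__ == "__main__":
    # Synthetic page: dummy data plus the OOB of the first page shown above.
    oob = (b'\xff\xff\x00\x00' + b'\xff' * 28 +
           bytes.fromhex("f689f779e560c9e0 d6e3edcb9cb0f9f0"
                         "1fdad4a49cd41be0 e090cc85d8d2e280"))
    for n, (sub, ecc) in enumerate(split_page(b'\x00' * PAGE_DATA + oob)):
        print("sub-page %d: %d bytes, ECC %s" % (n, len(sub), ecc.hex()))
```

Reading a dump 2112 bytes at a time and passing each chunk to such a helper gives the (data, ECC) pairs that any later ECC experiment needs.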
Given that the last 4 bits of each ECC are zero, the length of the ECC is probably 8*8-4=60 bits. As a side note, ECC length is normally expressed in bits. Let's confirm that all the ECCs are 60 bits long by checking that the last 4 bits of every one of them are zero.
########################### check_ecc_last_4bit.py ###########################
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN","rb")
suspect_const = \
b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
blank = \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
masking = b'\x00\x00\x00\x00\x00\x00\x00\x0f'
page_count = 0
diff_count = 0
while 1:
    data = input_file.read(2112)
    if len(data) == 0:
        break
    oob_1st_32_bytes = data[2048:2048+32]
    oob_2nd_32_bytes = data[2048+32:2048+64]
    page_count += 1
    if len(data) == 2112 and oob_1st_32_bytes != blank:
        for i in range(4):
            last_4_bits = bytes([a & b for a, b in \
                zip(oob_2nd_32_bytes[i*8:i*8+8], masking)])
            if last_4_bits[7] != 0:
                diff_count += 1
print("diff_count: %d page_count: %d\n" % (diff_count, page_count))
##################################### end ####################################
The output is,
cawan% python3.8 check_ecc_last_4bit.py
diff_count: 0 page_count: 131072
With such a convincing result, it is reasonable to say that the ECC length is 60 bits.
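A 60-bit ECC already narrows the search space for a guided brute force. The sketch below rests on assumptions that are not confirmed at this point: that the ECC is a binary BCH code over GF(2^m), that the parity length is exactly m*t bits for a correction capability of t bits, that the code must be long enough to hold 512 bytes of data plus the parity, and that m is at most 15 (a limit chosen here for illustration). The function name is hypothetical.

```python
ECC_BITS = 60           # ECC length observed in the dump
DATA_BITS = 512 * 8     # one assumed "sub-page" of data


def candidate_bch_params(ecc_bits=ECC_BITS, data_bits=DATA_BITS, max_m=15):
    """List (m, t) pairs with m * t == ecc_bits that can cover the data.

    Assumption: parity length is exactly m * t bits, and the natural code
    length 2^m - 1 must be at least data_bits + ecc_bits.
    """
    candidates = []
    for m in range(5, max_m + 1):
        if ecc_bits % m != 0:
            continue
        t = ecc_bits // m
        if (1 << m) - 1 >= data_bits + ecc_bits:
            candidates.append((m, t))
    return candidates


if __name__ == "__main__":
    print(candidate_bch_params())
```

Under these assumptions only one pair survives, m=15 and t=4, which makes it a natural first guess rather than a proven fact.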
Now, let's get a brief hacker overview of ECC algorithms. In general, three types of implementation are commonly used: Hamming, Reed-Solomon (RS), and Binary BCH. However, because a Hamming code can only correct a single bit error, and an RS code requires more redundancy for a given error correction capability, Binary BCH is the most widely used modern ECC implementation. Thus, Binary BCH is assumed to be the ECC implementation here. In addition, some special characteristics of Binary BCH can help to further identify the ECC implementation. The first characteristic is that for data that is all zeros, regardless of its size, the respective Binary BCH ECC is also all zeros. Let's show it by example using bchlib. To be clear, all the parameters are just for demo at this stage; the actual parameters will be derived from the analysis part by part. Let's go ahead with the first characteristic.
############################## test_bchlib_01.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\x00'*512)
ecc = bch.encode(data)
for i in ecc:
    # %X does not zero-pad, so a 0x00 byte is printed as a single "0"
    print("%X" % i, end='')
print("")
##################################### end ####################################
bchlib is used for the Binary BCH encoding and decoding tasks.
Two parameters have to be specified to make it work: BCH_POLYNOMIAL and BCH_BITS. BCH_POLYNOMIAL is the primitive polynomial to be used, and BCH_BITS is the maximum number of bit errors in the data that the ECC can correct. These two parameters are discussed in detail in the coming section on the Binary BCH implementation, as they are crucial to uncovering the secret association between ECC and data. Now, let's get a first glance at bchlib and study the first characteristic of Binary BCH. The output of test_bchlib_01.py is shown below.
cawan% python3.8 test_bchlib_01.py
0000000
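The BCH encoded output of 512 bytes of zeros is indeed all zeros. Note that the script prints each byte with %X, which does not zero-pad, so the seven digits shown are 7 bytes of 0x00, the same ECC length as the 14 hex digits in the next example. How about 512 bytes of 0xFF then? Let's check.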
############################## test_bchlib_02.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\xFF'*512)
ecc = bch.encode(data)
for i in ecc:
    print("%X" % i, end='')
print("")
##################################### end ####################################
The output is,
cawan% python3.8 test_bchlib_02.py
D7EC33C6695380
The output is not all 0xFF, and that makes sense. Otherwise, if 512 bytes of 0xFF were BCH encoded as 7 bytes of 0xFF, it would not be convenient to differentiate such a "page" from a blank one. Now, let's proceed to the second characteristic, which is about zero padding. The question is what happens if 32 bytes of zeros are appended to the 512 bytes of 0xFF. Let's check it.
############################## test_bchlib_03.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\xFF'*512 + b'\x00'*32)
ecc = bch.encode(data)
for i in ecc:
    print("%X" % i, end='')
print("")
##################################### end ####################################
The output is,
cawan% python3.8 test_bchlib_03.py
BCE3B0AE479EB0
Well, it seems the zero-padded data has a different BCH encoded output than the unpadded data, provided the data is not all zeros. However, this is not how an inherent BCH encoder behaves. An inherent BCH encoder generates exactly the same output for zero-padded data and unpadded data. Since such a characteristic causes some kind of discrepancy, it should be avoided. A common way to overcome this inherent characteristic is to reverse the bit order of the entire data right before it is BCH encoded. So, it is reasonable to assume that bchlib follows this approach, but how to verify it? Under this assumption, for the 512 bytes of 0xFF with 32 bytes of zeros appended, the data actually BCH encoded by bchlib is in fact the 512 bytes of 0xFF with 32 bytes of zeros prepended. If this is the case, feeding bchlib zero-prepended data should give the same output as feeding it the data with nothing prepended. Let's verify it.
############################## test_bchlib_04.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data1 = bytearray(b'\x00'*32 + b'\xFF'*512)
ecc1 = bch.encode(data1)
data2 = bytearray(b'\xFF'*512)
ecc2 = bch.encode(data2)
print("Zeros Prepended:")
for i in ecc1:
    print("%X" % i, end='')
print("")
print("Nothing Prepended:")
for i in ecc2:
    print("%X" % i, end='')
print("")
##################################### end ####################################
As expected, both BCH encoded outputs are exactly the same, as shown below.
cawan% python3.8 test_bchlib_04.py
Zeros Prepended:
D7EC33C6695380
Nothing Prepended:
D7EC33C6695380
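Why an encoder can be blind to zeros at one end of the data is easier to see with plain polynomial division over GF(2). The following toy sketch does not use bchlib and does not use the parameters of the NAND controller: it borrows the 8-bit generator polynomial of the small BCH(15,7) code purely for illustration, and it treats the first bit of the buffer as the highest-order coefficient, which is only one possible convention.

```python
# Toy generator: g(x) = x^8 + x^7 + x^6 + x^4 + 1, the generator polynomial
# of the small BCH(15,7) code. Illustration only, not the NAND parameters.
GEN = 0b111010001
GEN_DEGREE = 8
MASK = (1 << GEN_DEGREE) - 1


def parity(data):
    """Remainder of data(x) * x^GEN_DEGREE divided by g(x) over GF(2).

    Bits are consumed from the first byte, most significant bit first.
    """
    rem = 0
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            feedback = ((rem >> (GEN_DEGREE - 1)) & 1) ^ bit
            rem = (rem << 1) & MASK
            if feedback:
                rem ^= GEN & MASK
    return rem


if __name__ == "__main__":
    sample = b'\xFF' * 4 + b'\xAA'
    print("all zeros -> parity:", parity(b'\x00' * 16))
    print("zeros in front keep the parity:",
          parity(b'\x00' * 4 + sample) == parity(sample))
    print("zeros at the back keep the parity:",
          parity(sample + b'\x00' * 4) == parity(sample))
```

Zeros fed in first leave the remainder register at zero, so they can never change the result, while zeros fed in last multiply the message by a power of x and change any non-zero remainder. Reversing the bit order of the whole buffer simply swaps which end of the data enjoys that immunity.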
One important point should be noted here. If the input data is bit order reversed, the BCH encoded output is in bit order reversed form as well. Thanks to bchlib for implementing this by default. Now, another question arises: is it possible to retain the original bit order of the input data that is going to be BCH encoded? Yes, it is possible by reversing the bit order of the input data before passing it to the bchlib encoder, and of course the bit order of the BCH encoded output has to be reversed accordingly. Let's show it by example.
############################## test_bchlib_05.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\xFF'*511 + b'\xAA')
data_reverse_bit = b''
# Reverse the bits inside each byte, then the byte order, so that the whole
# bit string ends up reversed.
for i in range(0, len(data)):
    data_reverse_bit += bytes([int("{:08b}".format(data[i])[::-1],2)])
data_reverse_bit = data_reverse_bit[::-1]
ecc = bch.encode(data_reverse_bit)
# Apply the same full bit reversal to the ECC.
ecc_reverse_bit = b''
for i in range(0, len(ecc)):
    ecc_reverse_bit += bytes([int("{:08b}".format(ecc[i])[::-1],2)])
ecc_reverse_bit = ecc_reverse_bit[::-1]
for i in ecc_reverse_bit:
    print("%X" % i, end='')
print("")
##################################### end ####################################
In test_bchlib_05.py, the last byte of the 512 bytes of input data is purposely changed from 0xFF to 0xAA to avoid symmetry in the data (0b11111111 after bit order reversing is still 0b11111111). Now, let's see the output.
cawan% python3.8 test_bchlib_05.py
72FFA2590ECDB
So, if everything is correct, when 32 bytes of zeros are appended to these 512 bytes of input data and the result is BCH encoded, the output should also be 72FFA2590ECDB. Let's verify it.
############################## test_bchlib_06.py #############################
import bchlib
BCH_POLYNOMIAL = 8219
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
data = bytearray(b'\xFF'*511 + b'\xAA' + b'\x00'*32)
data_reverse_bit = b''
for i in range(0, len(data)):
    data_reverse_bit += bytes([int("{:08b}".format(data[i])[::-1],2)])
data_reverse_bit = data_reverse_bit[::-1]
ecc = bch.encode(data_reverse_bit)
ecc_reverse_bit = b''
for i in range(0, len(ecc)):
    ecc_reverse_bit += bytes([int("{:08b}".format(ecc[i])[::-1],2)])
ecc_reverse_bit = ecc_reverse_bit[::-1]
for i in ecc_reverse_bit:
    print("%X" % i, end='')
print("")
##################################### end ####################################
Perfect, the output is exactly as expected, as shown below.
cawan% python3.8 test_bchlib_06.py
72FFA2590ECDB
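The bit order reversal used in the last two scripts is needed again and again when experimenting with ECC candidates, so it may be worth wrapping in a helper. Below is a minimal sketch with a made-up function name, reverse_bits; it uses a 256-entry lookup table instead of string formatting but is meant to produce the same result as the loops above.

```python
# Lookup table: entry b holds byte b with its 8 bits in reversed order.
_REVERSED_BYTE = bytes(int("{:08b}".format(b)[::-1], 2) for b in range(256))


def reverse_bits(data):
    """Reverse the bit order of the whole buffer.

    Each byte is bit-reversed through the table, then the byte order is
    reversed, so the last bit of the input becomes the first bit of the
    output.
    """
    return bytes(data).translate(_REVERSED_BYTE)[::-1]


if __name__ == "__main__":
    sample = b'\xFF' * 511 + b'\xAA'
    flipped = reverse_bits(sample)
    print("first byte after reversal: 0x%02X" % flipped[0])
    print("reversing twice restores the input:",
          reverse_bits(flipped) == sample)
```

With such a helper, the encoding step of the two scripts above would be reverse_bits(bch.encode(reverse_bits(data))), assuming the same bchlib object as in those scripts.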
That's enough for the "first glance" at bchlib and some characteristics of Binary BCH. To summarize the lesson learned from a hacker perspective, two points should be clear. First, input data of all zeros generates an all-zeros output. Second, input data padded with any number of zeros generates the same output as the unpadded input data. Back to the NAND dump, these two points inspire an idea. If a 60-bit BCH encoded ECC exists somewhere in the form of all zeros, the 512 bytes of data in the respective "sub-page" should be all zeros too. If so, the data being BCH encoded either has no padding or has all-zeros padding. If not, the padding being added is not all zeros. Sounds confusing? Let's grab a "sub-page" in the NAND dump whose BCH encoded ECC is all zeros. It is clearer to explain by example.
########################### check_all_zeros_ecc.py ###########################
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN","rb")
oob_const = b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
zeros_ecc = b'\x00\x00\x00\x00\x00\x00\x00\x00'
page_cnt = 0
positive_cnt = 0
while 1:
    data = input_file.read(2112)
    if len(data) == 0:
        break
    oob_1st_32_bytes = data[2048:2048+32]
    oob_2nd_32_bytes = data[2048+32:2048+32+32]
    if len(data) == 2112 and oob_1st_32_bytes == oob_const:
        for i in range(0, 4):
            ecc = oob_2nd_32_bytes[i*8:i*8+8]
            if ecc == zeros_ecc:
                positive_cnt += 1
                print("Page Num: %d, Address: 0x%X" % (page_cnt, page_cnt*2112))
                break
    if positive_cnt == 1:
        break
    page_cnt += 1
print("Completed")
##################################### end ####################################
Let's see whether any "page" meets the condition and, if so, show the "page" number and address of the first one found. The output is shown below.
cawan% python3.8 check_all_zeros_ecc.py
Page Num: 256, Address: 0x84000
Completed
Nice, the first found item is at address 0x84000. Let's display the full "page" in hex view.
cawan% hexdump -C -v -n $((0x84000+2112)) MT29F2G08ABAEAWP@TSOP48.BIN \
| tail -n $((0x840/16+1))
00084000 76 3d f5 33 62 61 75 64 72 61 74 65 3d 31 31 35 |v=.3baudrate=115|
00084010 32 30 30 00 62 6f 6f 74 61 72 67 73 3d 6d 65 6d |200.bootargs=mem|
00084020 3d 36 34 4d 20 63 6f 6e 73 6f 6c 65 3d 74 74 79 |=64M console=tty|
00084030 53 30 2c 31 31 35 32 30 30 20 75 62 69 2e 6d 74 |S0,115200 ubi.mt|
00084040 64 3d 32 20 72 6f 6f 74 3d 75 62 69 30 3a 75 62 |d=2 root=ubi0:ub|
00084050 69 66 73 20 72 77 20 72 6f 6f 74 66 73 74 79 70 |ifs rw rootfstyp|
00084060 65 3d 75 62 69 66 73 20 69 6e 69 74 3d 2f 6c 69 |e=ubifs init=/li|
00084070 6e 75 78 72 63 00 62 6f 6f 74 63 6d 64 3d 6e 62 |nuxrc.bootcmd=nb|
00084080 6f 6f 74 2e 65 20 30 78 37 46 43 30 20 30 20 30 |oot.e 0x7FC0 0 0|
00084090 78 32 30 30 30 30 30 3b 20 62 6f 6f 74 6d 20 30 |x200000; bootm 0|
000840a0 78 37 46 43 30 00 62 6f 6f 74 64 65 6c 61 79 3d |x7FC0.bootdelay=|
000840b0 31 00 65 74 68 61 63 74 3d 65 6d 61 63 00 65 74 |1.ethact=emac.et|
000840c0 68 61 64 64 72 3d 30 30 3a 30 30 3a 30 30 3a 31 |haddr=00:00:00:1|
000840d0 31 3a 36 36 3a 38 38 00 69 70 61 64 64 72 3d 31 |1:66:88.ipaddr=1|
000840e0 39 32 2e 31 36 38 2e 38 2e 32 30 33 00 6d 74 64 |92.168.8.203.mtd|
000840f0 70 61 72 74 73 3d 6d 74 64 70 61 72 74 73 3d 6e |parts=mtdparts=n|
00084100 61 6e 64 30 3a 32 6d 28 75 2d 62 6f 6f 74 29 2c |and0:2m(u-boot),|
00084110 34 6d 28 6b 65 72 6e 65 6c 29 2c 31 36 6d 28 75 |4m(kernel),16m(u|
00084120 62 69 66 73 29 2c 33 32 6d 28 61 70 70 6c 69 63 |bifs),32m(applic|
00084130 61 74 69 6f 6e 29 2c 33 32 6d 28 62 61 63 6b 75 |ation),32m(backu|
00084140 70 29 2c 2d 28 64 61 74 61 29 00 6e 65 74 6d 61 |p),-(data).netma|
00084150 73 6b 3d 32 35 35 2e 32 35 35 2e 30 2e 30 00 72 |sk=255.255.0.0.r|
00084160 6f 6f 74 76 65 72 3d 4c 59 30 43 2d 30 36 30 31 |ootver=LY0C-0601|
00084170 2d 52 54 30 30 2d 48 30 53 30 2d 32 31 30 31 32 |-RT00-H0S0-21012|
00084180 37 2d 30 30 00 73 65 72 76 65 72 69 70 3d 31 39 |7-00.serverip=19|
00084190 32 2e 31 36 38 2e 38 2e 34 00 73 74 64 65 72 72 |2.168.8.4.stderr|
000841a0 3d 73 65 72 69 61 6c 00 73 74 64 69 6e 3d 73 65 |=serial.stdin=se|
000841b0 72 69 61 6c 00 73 74 64 6f 75 74 3d 73 65 72 69 |rial.stdout=seri|
000841c0 61 6c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |al..............|
000841d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000841e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000841f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084230 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084250 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084260 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084270 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000842f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084300 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084310 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084320 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084330 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084340 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084350 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084360 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084370 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084390 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000843f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084430 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084440 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084450 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084460 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084470 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084490 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000844f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084500 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084510 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084520 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084530 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084540 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084550 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084560 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084570 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084580 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084590 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000845f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084600 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084610 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084620 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084630 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084640 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084650 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084660 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084670 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084680 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084690 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000846f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084700 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084710 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084720 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084730 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084740 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084750 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084760 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084770 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084780 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084790 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000847f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084800 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00084810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00084820 3b 8d c6 e5 19 b2 24 50 00 00 00 00 00 00 00 00 |;.....$P........|
00084830 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00084840
So, what can be deduced from this "page"? Well, it is almost certain that the 2048-byte data portion of a "page" is divided into four parts of 512 bytes each, which I named "sub-page" at the start of this article. In this "page", the first "sub-page" runs from 0x84000 to 0x841ff and contains non-zero data, with 3b8dc6e519b22450 as its BCH encoded ECC. The following three "sub-page" contain all-zeros data, each with an all-zeros BCH encoded ECC. In other words, the 512 bytes of zeros in each of these three "sub-page" are either BCH encoded directly or padded with a certain number of zeros ONLY, in order to generate an all-zeros ECC. Hence, once the other BCH encoding parameters are unveiled in the following discussion, recovering the secret association between ECC and data becomes straightforward. So, the second, third, and fourth "sub-page" in a "page" are clear now, and it is usually about the same for all the other "page". However, the padding scheme of the first "sub-page" remains uncertain, unless a "page" with four all-zeros ECCs can be found. Let's try it.
####################### check_all_zeros_in_all_ecc.py ########################
input_file = open("MT29F2G08ABAEAWP@TSOP48.BIN","rb")
oob_const = \
    b'\xff\xff\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff' + \
    b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
zeros_ecc = \
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
page_cnt = 0
while 1:
    data = input_file.read(2112)
    if len(data) == 0:
        break
    oob_1st_32_bytes = data[2048:2048+32]
    oob_2nd_32_bytes = data[2048+32:2048+32+32]
    if len(data) == 2112 and oob_1st_32_bytes == oob_const:
        # All four ECCs of the "page" must be zero this time.
        if oob_2nd_32_bytes[0:32] == zeros_ecc[0:32]:
            print("Page Num: %d, Address: 0x%X" % (page_cnt, page_cnt*2112))
            break
    page_cnt += 1
print("Completed")
##################################### end ####################################
Let's look for such a "page". Unfortunately, none is found, as shown below.
cawan% python3.8 check_all_zeros_in_all_ecc.py
Completed
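As a side note, the "page" layout deduced so far can be written down as a small helper. This is only a minimal sketch under the assumptions made in this article (2048 data bytes, 32 bytes of OOB "metadata", then four 8-byte ECCs); the names split_page and ecc_offset are mine, not from any tool.

```python
PAGE_DATA = 2048                    # data bytes per "page"
OOB_SIZE = 64                       # OOB bytes per "page"
PAGE_SIZE = PAGE_DATA + OOB_SIZE    # 2112 bytes in the physical dump
SUBPAGE = 512                       # assumed "sub-page" size
ECC_SIZE = 8                        # assumed ECC size per "sub-page"

def split_page(page):
    """Return (four sub-pages, first 32 OOB bytes, four 8-byte ECCs)."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a full 2112-byte page")
    subpages = [page[i*SUBPAGE:(i+1)*SUBPAGE] for i in range(4)]
    oob_meta = page[PAGE_DATA:PAGE_DATA+32]
    ecc_area = page[PAGE_DATA+32:PAGE_SIZE]
    eccs = [ecc_area[i*ECC_SIZE:(i+1)*ECC_SIZE] for i in range(4)]
    return subpages, oob_meta, eccs

def ecc_offset(page_num, sub):
    """File offset of the ECC belonging to sub-page `sub` of page `page_num`."""
    return page_num*PAGE_SIZE + PAGE_DATA + 32 + sub*ECC_SIZE

# The ECC of the first "sub-page" of page 256 sits where the hex view shows it.
print(hex(ecc_offset(256, 0)))      # 0x84820
```

With such a helper, every later experiment can address a "sub-page" and its ECC by index instead of repeating the slice arithmetic.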
Anyhow, let's put the unsolved part aside for now and get back to it later. Now, let's have a brief overview of the binary BCH implementation, solely from a hacker's perspective, not an academic one.
In general, the BCH codec needs a primitive polynomial in order to derive the generator polynomial used for code generation. The order of the Galois field determines which primitive polynomials can be used by the BCH codec. A polynomial can be represented by an integer, or equivalently by its binary form. The set bits of the integer mark the powers of x that have a non-zero coefficient in the selected primitive polynomial. Sounds confusing? Let's have an example.
                     0x201B
                        |
                        V
               0b0010000000011011
                        |
                        V
0b  0  0  1  0  0  0  0  0  0  0  0  1  1  0  1  1
    ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
    |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
For the hex value 0x201B, the binary form is 0b0010000000011011. Each set bit in this number marks a power of x with a non-zero coefficient in the primitive polynomial. For 0x201B, bit-0, bit-1, bit-3, bit-4, and bit-13 are the set bits. So, the primitive polynomial is,
x^13 + x^4 + x^3 + x^1 + 1
Yes, each set bit position gives the exponent of one term, and the greatest set bit position is the degree of the primitive polynomial. Again, for 0x201B, the degree is 13. Most of the time, the degree is written as m, the order of the Galois field GF(2^m), so for 0x201B it can be expressed as m=13. To protect data of a certain size in bits, that size should be less than 2^m (strictly speaking, the data bits plus the parity bits must fit within 2^m - 1 bits). For example, to protect data of 512 bytes, the data length in bits is 512*8=4096. This number is normally known as k, so it is more appropriate to write k=4096. So, 2^m should be greater than 4096, which means m should be greater than log(4096)/log(2)=12, and so m should be at least 13. Again, for 0x201B, since its m is 13, it is suitable for protecting data of 512 bytes. What is 0x201B in decimal? It is 8219. Sounds familiar? Yes, it was used in the "first glance" bchlib section to define the variable BCH_POLYNOMIAL.
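For those who prefer code over bit diagrams, the conversion above can be sketched in a few lines. The helper names poly_to_str, degree, and min_m_for are made up for illustration, and min_m_for only implements the simple 2^m > k rule of thumb used in the text.

```python
def poly_to_str(value):
    """Render an integer bit mask as a polynomial over GF(2)."""
    terms = []
    for bit in range(value.bit_length() - 1, -1, -1):
        if (value >> bit) & 1:
            terms.append("1" if bit == 0 else "x^%d" % bit)
    return " + ".join(terms)

def degree(value):
    """The degree is the position of the greatest set bit."""
    return value.bit_length() - 1

def min_m_for(k_bits):
    """Smallest m with 2^m > k_bits."""
    m = 1
    while (1 << m) <= k_bits:
        m += 1
    return m

print(poly_to_str(0x201B))    # x^13 + x^4 + x^3 + x^1 + 1
print(degree(0x201B))         # 13
print(min_m_for(512 * 8))     # 13
print(0x201B)                 # 8219
```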
When talking about data protection, one must talk about the protection strength, that is, how many bit errors the data can suffer and still be recovered to the correct state. The strength is normally known as t. So, when someone mentions t=4, it means the ECC can correct up to 4 bit errors. Alright, m, k, and t are clear now. Let's proceed to the length of the ECC, more commonly called the number of parity bits. For BCH, the number of parity bits is equal to m*t. Thus, given m=13, k=4096, and t=4, since 2^m=2^13=8192 is greater than k=4096, it is perfectly consistent to generate a BCH encoded ECC with m*t=13*4=52 parity bits. Remember the ECC size found in the NAND dump analysis in the previous part? Yes, it is 60 bits (8 bytes minus the last 4 bits of zeros).
Well, the boring stuff is getting interesting now. Let's see what can be deduced from this little clue. The data size to be protected is 512 bytes, which is 4096 bits. So m should be at least 13, since 2^13=8192 is sufficient to protect the 4096 bits of data. As the number of parity bits is 60, its factors are 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, and 60. Given m*t=60 and m>=13, the possible combinations of (m, t) are (15, 4), (20, 3), (30, 2), and (60, 1). As t=4 is a common choice in the majority of BCH based ECC implementations, the combination of m=15 and t=4 is the most probable. The other three combinations, (20, 3), (30, 2), and (60, 1), are not only unrealistic but also terribly overkill. At this stage, assuming m=15 and t=4, which primitive polynomial should be selected? Let's refer to the primitive polynomial list in [4]. For degree 15, the candidates are shown below.
x^15 + x^1 + 1
x^15 + x^4 + 1
x^15 + x^7 + 1
x^15 + x^7 + x^6 + x^3 + x^2 + x^1 + 1
x^15 + x^10 + x^5 + x^1 + 1
x^15 + x^10 + x^5 + x^4 + 1
x^15 + x^10 + x^5 + x^4 + x^2 + x^1 + 1
x^15 + x^10 + x^9 + x^7 + x^5 + x^3 + 1
x^15 + x^10 + x^9 + x^8 + x^5 + x^3 + 1
x^15 + x^11 + x^7 + x^6 + x^2 + x^1 + 1
x^15 + x^12 + x^3 + x^1 + 1
x^15 + x^12 + x^5 + x^4 + x^3 + x^2 + 1
x^15 + x^12 + x^11 + x^8 + x^7 + x^6 + x^4 + x^2 + 1
x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + \
x^5 + x^4 + x^3 + x^2 + 1
Well, the first candidate should be selected, which is,
x^15 + x^1 + 1
The polynomial can be represented in binary bit form as mentioned earlier, which is,
0b1000000000000011
In hex it is 0x8003, and in decimal it is 32771. So, back to bchlib, BCH_POLYNOMIAL and BCH_BITS should be set to 32771 and 4, respectively.
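The narrowing down of (m, t) and the conversion of the chosen polynomial into a number can be double checked with a short sketch. The function names are hypothetical; the only inputs are the 60 parity bits and the m>=13 bound derived above.

```python
def mt_candidates(parity_bits, min_m):
    """All (m, t) pairs with m*t == parity_bits and m >= min_m."""
    return [(m, parity_bits // m)
            for m in range(min_m, parity_bits + 1)
            if parity_bits % m == 0]

def poly_from_exponents(exponents):
    """Set one bit per exponent, e.g. [15, 1, 0] for x^15 + x^1 + 1."""
    value = 0
    for e in exponents:
        value |= 1 << e
    return value

print(mt_candidates(60, 13))       # [(15, 4), (20, 3), (30, 2), (60, 1)]

poly = poly_from_exponents([15, 1, 0])
print(hex(poly), poly)             # 0x8003 32771

# 60 parity bits need 8 bytes of storage, leaving 4 unused bits.
ecc_bytes = -(-60 // 8)
print(ecc_bytes, ecc_bytes*8 - 60) # 8 4
```

The 4 unused bits agree with the observation that the last 4 bits of every 8-byte ECC in the dump are zeros.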
Now, assuming nobody would be naive enough to do BCH encoding without reversing the bit order of the entire data input first, let's try the BCH encoding without any padding for the first "page".
###################### bch_encoding_without_padding.py #######################
import bchlib
BCH_POLYNOMIAL = 32771
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
# Only the first "page" of the dump is examined here.
page = input_file.read(2112)
ECC = page[2048+32:2048+32+32]
for i in range(0, 4):
    # Encode each 512-byte "sub-page" as is, without any padding.
    ecc_generated = bch.encode(page[i*512:i*512+512])
    print("\nSub-page: %d" % i)
    print("ECC Ori:", end=' ')
    print(ECC[i*8:i*8+8].hex().upper())
    print("ECC Generated:", end=' ')
    print(ecc_generated.hex().upper())
    if ECC[i*8:i*8+8] == ecc_generated:
        print("Match !")
    else:
        print("Wrong !")
print("\nCompleted")
##################################### end ####################################
The output is shown below.
cawan% python3.8 bch_encoding_without_padding.py
Sub-page: 0
ECC Ori: F689F779E560C9E0
ECC Generated: 8DE136AAF3E03F90
Wrong !
Sub-page: 1
ECC Ori: D6E3EDCB9CB0F9F0
ECC Generated: 6C6CF320EFAD8660
Wrong !
Sub-page: 2
ECC Ori: 1FDAD4A49CD41BE0
ECC Generated: 1058EAC213313D70
Wrong !
Sub-page: 3
ECC Ori: E090CC85D8D2E280
ECC Generated: B36A94B537E14BA0
Wrong !
Completed
None of the four "sub-page" generates the correct ECC. So, each "sub-page" is probably padded with a certain number of zeros before being BCH encoded. Let's try the BCH encoding with the "sub-page" padded by 1 to 32 bytes of zeros.
#################### bch_encoding_with_zeros_padding.py ######################
import bchlib
BCH_POLYNOMIAL = 32771
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
page = input_file.read(2112)
ECC = page[2048+32:2048+32+32]
found_flag = 0
for i in range(0, 4):
    print("\nSub-page: %d" % i)
    print("ECC Ori:", end=' ')
    print(ECC[i*8:i*8+8].hex().upper())
    # Try every padding length from 1 to 32 bytes of zeros.
    for j in range(1, 33):
        padding = b'\x00'*j
        ecc_generated = bch.encode(page[i*512:i*512+512]+padding)
        if ECC[i*8:i*8+8] == ecc_generated:
            print("ECC Generated:", end=' ')
            print(ecc_generated.hex().upper())
            print("Match !", end=' ')
            print("Zeros padded number: %d" % j)
            found_flag = 1
            break
    if found_flag == 0:
        print("Wrong !")
    found_flag = 0
print("\nCompleted")
#################################### end ####################################
Let's go and run the check. Hola, the output is interesting, as shown below.
cawan% python3.8 bch_encoding_with_zeros_padding.py
Sub-page: 0
ECC Ori: F689F779E560C9E0
Wrong !
Sub-page: 1
ECC Ori: D6E3EDCB9CB0F9F0
ECC Generated: D6E3EDCB9CB0F9F0
Match ! Zeros padded number: 24
Sub-page: 2
ECC Ori: 1FDAD4A49CD41BE0
ECC Generated: 1FDAD4A49CD41BE0
Match ! Zeros padded number: 24
Sub-page: 3
ECC Ori: E090CC85D8D2E280
ECC Generated: E090CC85D8D2E280
Match ! Zeros padded number: 24
Completed
So, of the four "sub-page" in a "page", the second, third, and fourth are each padded with 24 bytes of zeros before being BCH encoded in order to generate the correct ECC.
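The same guided brute force can be wrapped into a reusable function that takes the encoder as a parameter. In the minimal sketch below, find_zero_padding is a name of my own, and stand_in_encode is a fake 8-byte "ECC" built from SHA-256; it is NOT BCH and is there only so that the example runs without bchlib and the dump.

```python
import hashlib

def find_zero_padding(encode, data, target_ecc, max_pad=32):
    """Return how many zero bytes must be appended to `data` so that
    encode() reproduces `target_ecc`, or None if no length up to
    `max_pad` matches."""
    for pad in range(0, max_pad + 1):
        if encode(bytes(data) + b"\x00" * pad) == target_ecc:
            return pad
    return None

def stand_in_encode(buf):
    # Stand-in for a real encoder: any deterministic function of the
    # whole input is enough to demonstrate the search itself.
    return hashlib.sha256(buf).digest()[:8]

sub_page = b"\xa5" * 512
known_ecc = stand_in_encode(sub_page + b"\x00" * 24)
print(find_zero_padding(stand_in_encode, sub_page, known_ecc))   # 24
```

With the real thing, the call would look like find_zero_padding(bch.encode, page[512:1024], ECC[8:16]), assuming the bch object and the ECC slice from the script above.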
However, the first "sub-page" is still cryptic and needs a little tweaking. Since the rest of the "sub-page" are padded with 24 bytes of zeros, it is very likely that the first "sub-page" is padded with 24 bytes that are not all zeros. They should be some kind of "metadata" describing the "page" itself. Remember the first 32 bytes of OOB?
Let's check it again.
cawan% hexdump -C -v -n $((2112-32)) MT29F2G08ABAEAWP@TSOP48.BIN | tail -n 3
00000800 ff ff 00 00 ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000810 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000820
The two bytes of zeros at 0x802 and 0x803 are a little bit strange. So, is it possible that the first few bytes of the 24 bytes of zeros padding are replaced by some bytes from here? Let's replace the 24 bytes of zeros padding byte by byte, checking the ECC after each replacement, until the entire 24 bytes of padding become:
ffff0000ffffffffffffffffffffffffffffffffffffffff
Let's try it.
####################### bch_encoding_of_1st_subpage.py #######################
import bchlib
BCH_POLYNOMIAL = 32771
BCH_BITS = 4
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
page = input_file.read(2112)
subpage = page[0:512]
ECC = page[2048+32:2048+32+8]
# First 24 bytes of the OOB, used as the replacement pattern.
paddingx = \
    b'\xFF\xFF\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF' + \
    b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF'
padding0 = \
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
    b'\x00\x00\x00\x00\x00\x00\x00\x00'
data_input = subpage + padding0
data_input = bytearray(data_input)
for i in range(0, 24):
    # Replace one more padding byte, then test the ECC again.
    data_input[512+i] = paddingx[i]
    ecc_generated = bch.encode(data_input)
    if ecc_generated == ECC:
        print("Match !")
        print("Padding:", end=' ')
        print(data_input[512:].hex().upper())
        break
print("\nCompleted")
#################################### end ####################################
Let's run it. Bingo, the padding pattern is found, as shown below.
cawan% python3.8 bch_encoding_of_1st_subpage.py
Match !
Padding: FFFF00000000000000000000000000000000000000000000
Completed
3 - Bit Errors Fixing with ECC
Perfect. Now, the secret association between ECC and data is fully unveiled. In conclusion, for the four "sub-page" in a "page", the first "sub-page" has to be padded with 24 bytes, comprising 2 bytes of 0xFF followed by 22 bytes of zeros, before being BCH encoded to generate the correct ECC. For the second, third, and fourth "sub-page", a 24-byte all-zeros padding is all that is needed to generate the correct ECC. So, by doing the BCH decoding in the same manner for every "page" of the entire NAND dump, the bit errors get fixed. After that, the 64 bytes of OOB in each "page" should be removed to generate a new NAND dump with contiguous data, "page" by "page", without any bit errors. I name it cawan_output.bin, as shown below.
####################### NAND_dump_fix_bit_erros_ecc.py #######################
import bchlib
BCH_POLYNOMIAL = 32771
BCH_BITS = 4
input_file = open("./MT29F2G08ABAEAWP@TSOP48.BIN", "rb")
output_file = open("./cawan_output.bin", "wb")
# Padding of the first "sub-page": 2 bytes of 0xFF + 22 bytes of zeros.
pad_sub0 = \
    b'\xFF\xFF\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
    b'\x00\x00\x00\x00\x00\x00\x00\x00'
# Padding of the other three "sub-page": 24 bytes of zeros.
pad_subx = \
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' + \
    b'\x00\x00\x00\x00\x00\x00\x00\x00'
bch = bchlib.BCH(BCH_POLYNOMIAL, BCH_BITS)
count = 0
error_cnt = 0
while 1:
    page = input_file.read(2112)
    if len(page) != 2112:
        break
    for i in range(0, 4):
        data, ecc = page[512*i:512*i+512], page[2048+32+i*8:2048+32+i*8+8]
        if i == 0:
            data_padded = data + pad_sub0
        else:
            data_padded = data + pad_subx
        data_padded = bytearray(data_padded)
        bitflips = bch.decode_inplace(data_padded, ecc)
        if bitflips == 0:
            # No bit error, write the data as is.
            output_file.write(data_padded[:512])
        elif bitflips > 0:
            # Bit errors corrected in place.
            error_cnt += 1
            output_file.write(data_padded[:512])
        elif bitflips == -1:
            # Decoding failed, keep the data unchanged.
            output_file.write(data_padded[:512])
    count += 1
print("Sub-page with error count: %d\n" % error_cnt)
print("Completed.")
#################################### end ####################################
Well, 20 "sub-page" with bit errors have been fixed with the ECC, as shown below.
cawan% python3.8 NAND_dump_fix_bit_erros_ecc.py
Sub-page with error count: 20
Completed.
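Before moving on, a cheap sanity check on the output size does not hurt. The sketch below assumes the physical dump covers the whole 2 Gbit device, which is 131072 "page" of 2112 bytes; logical_size and strip_oob are illustrative names, and strip_oob only drops the OOB without attempting any correction.

```python
PAGE_DATA = 2048     # data bytes per "page"
PAGE_SIZE = 2112     # data + 64 bytes of OOB

def logical_size(physical_size):
    """Size of the dump once the OOB of every "page" is removed."""
    if physical_size % PAGE_SIZE:
        raise ValueError("dump size is not a whole number of 2112-byte pages")
    return physical_size // PAGE_SIZE * PAGE_DATA

def strip_oob(dump):
    """Keep the 2048 data bytes of each full "page", drop the OOB."""
    return b"".join(dump[off:off+PAGE_DATA]
                    for off in range(0, len(dump) - PAGE_SIZE + 1, PAGE_SIZE))

# 131072 pages * 2112 bytes in, 256 MiB out.
print(logical_size(131072 * 2112))     # 268435456
```

If cawan_output.bin does not come out at the expected size, a "sub-page" was skipped somewhere and every offset after it is shifted.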
Armed with knowledge, any suitable common tool can be weaponized for hacking purposes. Don't be stubborn in believing that a proprietary, special, commercial, or even automated tool can work as expected without any knowledge of the field. So, the firmware is ready now. Let's proceed to the firmware analysis.
4 - UBI Image Analysis
As a common approach, let's begin with binwalk and hope to strike gold, or for money to grow on trees, or both. The binwalk output is shown below.
cawan% binwalk cawan_output.bin
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
963584        0xEB400         CRC32 polynomial table, little endian
966688        0xEC020         CRC32 polynomial table, little endian
970868        0xED074         LZO compressed data
2097152       0x200000        uImage header, header size: ...
2097216       0x200040        Linux kernel ARM boot executable zImage ...
2115956       0x204974        gzip compressed data, maximum compression, ...
6291456       0x600000        UBI erase count header, version: 1, ...
It looks interesting. As stated in the title of this article, only the UBI image is going to be analyzed. The full description of the UBI header detected at address 0x600000 is shown below.
UBI erase count header,
version: 1,
EC: 0x1,
VID header offset: 0x800,
data offset: 0x1000
The header really makes sense: the UBI magic is at 0x600000, the version is 1, and the erase count is 1, which means it is a new NAND flash, or at least one that has just been reformatted. After that, the volume ID header is 0x800, or 2048 in decimal, away from 0x600000, which is a common layout for NAND flash. One important thing to emphasize here: the newly generated NAND dump is a logical NAND dump, with the OOB removed and a "page" size of 2048 bytes. So, it is really common to locate the volume ID header one "page" away from the UBI header. Then, the actual data is 0x1000, or 4096 in decimal, away from 0x600000, in other words another "page" away from the volume ID header. This is also common for NAND flash. So, is there a free lunch? Let's try to extract it with binwalk by passing in the well-known parameters, -Me. The lengthy output seems convincing. Let's get into the directory hosting the extracted files, as shown below.
cawan% cd _cawan_output.bin.extracted
cawan% ls
204974 _204974.extracted 600000.ubi ED074.lzo ubifs-root
As ubifs-root directory is generated, let's get into the directory.
cawan% cd ubifs-root
cawan% ls
1941946494 3823591600
Two more directories are found. Let's check each directory with the tree command.
cawan% tree -L 2 1941946494
1941946494
└── ubifs
    ├── bin
    ├── dev
    ├── etc
    ├── home
    ├── lib
    ├── linuxrc -> bin/busybox
    ├── mnt
    ├── proc
    ├── root
    ├── sbin
    ├── sys
    ├── tmp
    ├── usr
    ├── var
    └── work
15 directories, 1 file
cawan% tree -L 3 3823591600
3823591600
└── app
1 directory, 0 files
Well, it seems the file system is extracted into the directory 1941946494, while 3823591600 holds only an empty directory. Let's go further.
cawan% cd 1941946494
cawan% cd ubifs
cawan% ls
bin dev etc home lib linuxrc mnt proc root sbin sys tmp usr \
var work
cawan% cd etc
cawan% ls
fstab HOSTNAME inittab pointercal profile~ ts.conf
group inetd.conf networks ppp services vsftpd.conf
gshadow init.d passwd profile shadow
cawan% cat fstab
cawan% ls -la fstab
-rw-rw-r-- 1 user user 186 Mar 30 2015 fstab
cawan% cat fstab | xxd
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 ..........
Well, something must be wrong with the file system extraction. It seems the free lunch is not really free. Should we dig further into binwalk to find the reason? Don't get into mischief, that is really not the right track for a hardcore hacker. When it comes to analysis, each step of the entire process should be strictly under control, trackable and explainable, and this applies to firmware analysis too. Let's start from the beginning with dd again and carve the UBI image out manually.
cawan% dd if=./cawan_output.bin of=./ubi.bin bs=1 skip=$((0x600000))
262144000+0 records in
262144000+0 records out
262144000 bytes (262 MB, 250 MiB) copied, 281.069 s, 933 kB/s
cawan% file ubi.bin
ubi.bin: UBI image, version 1
It really takes a while to generate ubi.bin. Now, let's verify the UBI header, volume ID header, and the start of data in hex view.
cawan% hexdump -C -n $((2048*3)) ./600000.ubi
00000000 55 42 49 23 01 00 00 00 00 00 00 00 00 00 00 01 |UBI#............|
00000010 00 00 08 00 00 00 10 00 73 bf c0 7e 00 00 00 00 |........s..~....|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 01 9f 6b b3 |..............k.|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000800 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00001800
Let's interpret the UBI header with its data structure as shown below.
struct ubi_ec_hdr {
    __be32  magic;
    __u8    version;
    __u8    padding1[3];
    __be64  ec;
    __be32  vid_hdr_offset;
    __be32  data_offset;
    __be32  image_seq;
    __u8    padding2[32];
    __be32  hdr_crc;
};
The header magic is "UBI#", 4 bytes in size, followed by the version number 1 in 1 byte. After 3 bytes of padding comes the so-called erase counter, abbreviated as ec, which indicates how many times the block has been erased. A little bit of background knowledge here, which might not be hacker friendly. NAND flash storage has a limited lifespan. Each erase operation on the same place in the flash reduces that lifespan. So, once the lifespan is used up, the place becomes useless.
UBI works on the NAND flash storage in units of "block", each comprising a number of "page". For MT29F2G08ABAEAWP, a "block" comprises 64 "page" of 2048 bytes each. So, it is crucial to monitor the used count of every "block" in order to avoid data loss. Hence, when the used count of a "block" reaches a certain trigger level, the entire data in the "block" has to be relocated to another "block" which is in good condition. As relocating a physical "block" affects the order of the "block", some kind of abstraction is needed to manage the physical "block" in a logical way.
By keeping the order of the logical "block" at a high level, each logical "block" can be remapped to an appropriate physical "block" as needed. Such an abstraction is formally known as wear-leveling. Well, the so-called used count is identical to the erase count in UBI, or the worn count in wear-leveling. UBI is responsible for providing such a wear-leveling mechanism by managing the logical "block" in the most appropriate way.
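To make the idea of logical versus physical "block" more concrete, here is a toy model. It is a deliberately naive sketch and NOT how UBI is implemented: ToyWearLeveler and its methods are invented for illustration, every rewrite simply goes to the least-erased free physical "block", and the old copy is counted as erased.

```python
class ToyWearLeveler:
    """Toy logical-to-physical "block" map with per-block erase counts."""

    def __init__(self, n_physical):
        self.erase_count = [0] * n_physical
        self.mapping = {}     # logical block -> physical block
        self.data = {}        # physical block -> payload

    def _least_worn_free(self):
        # Assumes at least one physical block is free.
        used = set(self.mapping.values())
        free = [p for p in range(len(self.erase_count)) if p not in used]
        return min(free, key=lambda p: self.erase_count[p])

    def write(self, logical, payload):
        target = self._least_worn_free()
        old = self.mapping.get(logical)
        self.data[target] = payload
        self.mapping[logical] = target
        if old is not None:
            # The stale copy is erased, which costs one erase cycle.
            del self.data[old]
            self.erase_count[old] += 1
        return target

wl = ToyWearLeveler(3)
for payload in (b"v1", b"v2", b"v3"):
    wl.write(0, payload)
print(wl.mapping)        # {0: 2}
print(wl.erase_count)    # [1, 1, 0]
```

The logical "block" 0 never changes its number, yet its data wanders across three physical "block", which is exactly why the physical order in a dump tells nothing about the logical order.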
Let's get back to the 8 bytes of ec item of the UBI header. The ec is 1 means it is getting formatted for 1 time. After the ec, it is 4 bytes of volume ID offset from the beginning of UBI header, it is 0x800, which is about 1 "page" size. The volume ID is followed by data offset in 4 bytes size, it is 0x1000, which is another 1 "page" from the volume ID. Next to the data offset is another 4 bytes to represent image sequence for identifing the respective UBI block is belonging to which UBIFS for file system construction. So, the UBIFS is indeed the actual file system that a hacker should focus on. After that, there are 32 bytes of padding, and at last, it is the UBI header CRC checksum in 4 bytes.
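The field layout described above maps onto a fixed 64-byte big-endian record, so it can be decoded with Python's struct module. The following is a minimal sketch and not part of the original toolchain: the names EC_HDR_FMT, ubi_crc32, parse_ec_hdr and build_ec_hdr are made up for illustration, and the CRC convention (CRC-32 without the final bit inversion that zlib applies) is my understanding of how hdr_crc is computed, so treat it as an assumption to verify against the dump.

```python
import struct
import zlib
from collections import namedtuple

# Big-endian layout of struct ubi_ec_hdr: 4+1+3+8+4+4+4+32+4 = 64 bytes.
EC_HDR_FMT = ">4sB3sQIII32sI"
EC_HDR_SIZE = struct.calcsize(EC_HDR_FMT)

EcHdr = namedtuple(
    "EcHdr",
    "magic version padding1 ec vid_hdr_offset data_offset "
    "image_seq padding2 hdr_crc",
)


def ubi_crc32(data):
    # Assumption: UBI stores the CRC-32 without zlib's final inversion.
    return ~zlib.crc32(data) & 0xFFFFFFFF


def parse_ec_hdr(block):
    """Decode the header at the start of a block and report the CRC match."""
    hdr = EcHdr._make(struct.unpack(EC_HDR_FMT, block[:EC_HDR_SIZE]))
    crc_ok = ubi_crc32(block[:EC_HDR_SIZE - 4]) == hdr.hdr_crc
    return hdr, crc_ok


def build_ec_hdr(ec, vid_hdr_offset, data_offset, image_seq):
    """Build a synthetic header, for demonstration only."""
    body = struct.pack(">4sB3sQIII32s", b"UBI#", 1, b"\x00" * 3, ec,
                       vid_hdr_offset, data_offset, image_seq, b"\x00" * 32)
    return body + struct.pack(">I", ubi_crc32(body))


if __name__ == "__main__":
    # Synthetic header carrying the values discussed in the text.
    sample = build_ec_hdr(1, 0x800, 0x1000, 0x73BFC07E)
    hdr, crc_ok = parse_ec_hdr(sample)
    print("ec=%d vid_hdr_offset=0x%X data_offset=0x%X image_seq=%d crc_ok=%s"
          % (hdr.ec, hdr.vid_hdr_offset, hdr.data_offset, hdr.image_seq, crc_ok))
```

To use it on the real image, pass the first 64 bytes of any 128 KiB block to parse_ec_hdr. If the CRC never matches on blocks that otherwise look valid, the CRC assumption above is the first thing to revisit.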
Now, let's check how many UBIFS exist in the UBI image.
############################ check_ubifs_count.py ############################
input_file = open("./600000.ubi", "rb")
count = 0
img_seq = b''
tmp_seq = b''
while 1:
    # Read one 128 KiB block (64 pages of 2048 bytes) at a time.
    block = input_file.read(2048*64)
    if len(block) != 2048*64:
        break
    if block[0:4] == b'\x55\x42\x49\x23':    # "UBI#"
        img_seq = block[24:28]               # image_seq field
        # Print the sequence number only when it changes.
        if img_seq != tmp_seq:
            print("0x", end='')
            print(img_seq.hex().upper(), end=' -> ')
            print("%d" % int(img_seq.hex(),16))
            tmp_seq = img_seq
            count += 1
print("\nCompleted.")
#################################### end ####################################
The output is shown below.
cawan% python3.8 check_ubifs_count.py
0x73BFC07E -> 1941946494
0xE3E760B0 -> 3823591600
0x9F61AB77 -> 2673978231
0x49F558F2 -> 1240815858
Completed.
Sound familiar? Yes, definitely. 1941946494 and 3823591600 were used by binwalk to name the folders hosting the extracted files. How about the other two? Something definitely went wrong while binwalk was extracting the UBI image. Before proceeding further, let's try to estimate the size of the data in use in the UBI image. One thing to clarify first: whenever a UBI erase block is in use, it should carry a valid volume ID header, whose magic is "UBI!". Please note that the term "UBI erase block" is in fact the formal term for a logical UBI block.
############################ check_data_inuse.py #############################
input_file = open("./600000.ubi", "rb")
data_inuse = 0
UBI_hdr = b'\x55\x42\x49\x23'    # "UBI#"
VID_hdr = b'\x55\x42\x49\x21'    # "UBI!"
while 1:
    block = input_file.read(2048*64)
    if len(block) != 2048*64:
        break
    # A block counts as in use when both headers are present.
    if block[0:4] == UBI_hdr and block[2048:2048+4] == VID_hdr:
        data_inuse += 2048*64
print("Data size in use: %d" % data_inuse)
print("\nCompleted.")
#################################### end ####################################
The output is shown below.
cawan% python3.8 check_data_inuse.py
Data size in use: 40239104
Completed.
Nice, it is about 40 MB in size, including some extra space which is hard to estimate precisely. Now, it is time to talk about how to extract the UBIFS from the UBI image. Since it is a matter of re-arranging the UBI erase blocks according to the image_seq number, there is no harm in trying a well-known toolkit, UBI Reader. Let's see the result.
cawan% ubireader_extract_images ubi.bin
cawan% ls
cawan_output.bin ubi.bin ubifs-root
cawan% cd ubifs-root
cawan% ls
ubi.bin
cawan% cd ubi.bin
cawan% ls
img-1240815858_vol-data.ubifs img-2673978231_vol-backup.ubifs
img-1941946494_vol-ubifs.ubifs img-3823591600_vol-app.ubifs
cawan% ls -la
total 145212
drwxrwxr-x 2 user user 4096 May 29 16:46 .
drwxrwxr-x 3 user user 4096 May 29 16:46 ..
-rw-rw-r-- 1 user user 100438016 May 29 16:46 img-1240815858_vol-data.ubifs
-rw-rw-r-- 1 user user 11935744 May 29 16:46 img-1941946494_vol-ubifs.ubifs
-rw-rw-r-- 1 user user 27299840 May 29 16:46 img-2673978231_vol-backup.ubifs
-rw-rw-r-- 1 user user 9015296 May 29 16:46 img-3823591600_vol-app.ubifs
Cool. No error prompt at all, and 4 UBIFS images got extracted. Remember that the estimated size of the data in use is about 40 MB? It is reasonable to assume that something is wrong with the UBIFS named img-1240815858_vol-data.ubifs, which alone is about 100 MB. The remaining 3 UBIFS should be in good condition, because their total size of about 48 MB is in the same range as the estimate.
Let's use the UBI Reader toolkit again to extract files from the UBIFS, starting with img-1941946494_vol-ubifs.ubifs as shown below.
cawan% ubireader_extract_files img-1941946494_vol-ubifs.ubifs
Extracting files to: ubifs-root
decompress Warn: LZO Error: EResult.LookbehindOverrun
_process_reg_file Warn: inode num:693 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.InputOverrun
_process_reg_file Warn: inode num:592 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.LookbehindOverrun
_process_reg_file Warn: inode num:587 path:<...> :can't concat NoneType to bytearray
decompress Warn: LZO Error: EResult.InputOverrun
...
...
...
cawan% ls
img-1240815858_vol-data.ubifs img-2673978231_vol-backup.ubifs ubifs-root
img-1941946494_vol-ubifs.ubifs img-3823591600_vol-app.ubifs
cawan% cd ubifs-root
cawan% ls
bin dev etc home lib linuxrc mnt proc root sbin sys tmp usr \
var work
After a huge number of error prompts, it seems a file system has been generated. Is it the same thing as what binwalk generated earlier? Let's check.
cawan% cd etc
cawan% cat fstab
cawan% ls -la fstab
-rw-rw-r-- 1 user user 186 Mar 30 2015 fstab
cawan% cat fstab | xxd
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 ..........
Damn, it is indeed the same thing. It seems ubireader_extract_files is unable to fully interpret the UBIFS and generate correct files. How about the other two UBIFS? Let's check.
cawan% ubireader_extract_files img-2673978231_vol-backup.ubifs
Extracting files to: ubifs-root
index Fatal: LEB: 110 at 13998336, Node size smaller than expected.
cawan% ubireader_extract_files img-3823591600_vol-app.ubifs
Extracting files to: ubifs-root
index Fatal: LEB: 58 at 7461120, Node size smaller than expected.
Sorry, fatal errors this time, and nothing is generated. Since this NAND dump comes from a real device which is fully functional, and all the bit errors have been fixed, the UBIFS should work. Another route is to emulate the NAND chip behind MTD by using nandsim. In most of the hacking literature, the standard conventional approach with nandsim is to dd the entire UBI image into the MTD device emulated by nandsim and modprobe the ubi driver with some parameters, leaving the ubi driver on its own to deal with the UBI image blob.
Let's put in a few words of comment about this. As mentioned earlier, the UBI erase block exists for the purpose of implementing wear-leveling in the UBI layer. Since UBI erase blocks are logical, they are normally not stored in sequence physically, which is the case for this NAND dump.
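The mismatch between physical and logical order can be made visible by decoding the volume ID header of every block. The sketch below is a hedged illustration and not necessarily what UBI Reader does internally: map_blocks and its parameters are hypothetical names, the field positions follow struct ubi_vid_hdr in the UBI header file listed in the references, and the geometry (128 KiB blocks, volume ID header at 0x800) is the one observed in this dump.

```python
import struct

PEB_SIZE = 2048 * 64   # physical eraseblock size observed in this dump
VID_OFFSET = 0x800     # vid_hdr_offset taken from the UBI header
# struct ubi_vid_hdr: big-endian, 64 bytes in total.
VID_HDR_FMT = ">4sBBBBII4sIIII4sQ12sI"
VID_HDR_SIZE = struct.calcsize(VID_HDR_FMT)


def map_blocks(image, peb_size=PEB_SIZE, vid_offset=VID_OFFSET):
    """Map logical blocks to physical blocks for an in-memory image.

    Returns {(image_seq, vol_id): {lnum: (peb_index, sqnum)}}. When two
    physical blocks claim the same logical number, the one with the
    higher sequence number is kept.
    """
    volumes = {}
    for peb in range(len(image) // peb_size):
        base = peb * peb_size
        if image[base:base + 4] != b"UBI#":
            continue
        vid = image[base + vid_offset:base + vid_offset + VID_HDR_SIZE]
        if vid[:4] != b"UBI!":
            continue  # block carries no volume ID header, so it is unused
        image_seq = struct.unpack(">I", image[base + 24:base + 28])[0]
        fields = struct.unpack(VID_HDR_FMT, vid)
        vol_id, lnum, sqnum = fields[5], fields[6], fields[13]
        lebs = volumes.setdefault((image_seq, vol_id), {})
        if lnum not in lebs or sqnum > lebs[lnum][1]:
            lebs[lnum] = (peb, sqnum)
    return volumes


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as dump:
        image = dump.read()
    for (image_seq, vol_id), lebs in sorted(map_blocks(image).items()):
        pebs = [lebs[lnum][0] for lnum in sorted(lebs)]
        print("image_seq %d vol_id %d: %d LEBs, PEB order starts %s"
              % (image_seq, vol_id, len(lebs), pebs[:8]))
```

Run against the UBI image, the printed physical block numbers for consecutive logical numbers are not expected to be consecutive, which is exactly the remapping work that would otherwise be left to the driver.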
So, instead of relying on the UBI driver to do the extra work of block remapping, which has a high chance of causing errors under emulation, it is better to pre-process the UBI image offline by using ubireader_extract_images first. The output of ubireader_extract_images is already in UBIFS form, which is an actual file system just like squashfs, jffs2, yaffs2, or cramfs.
In other words, by dealing with UBIFS directly, the chance of getting errors is minimized. Anyway, there is no harm in going with the standard conventional approach first. Let's start by grabbing the low-hanging fruit. In order to emulate a NAND chip, one should know the ID codes of the chip. According to the datasheet of MT29F2G08ABAEAWP, the first 4 bytes are 0x2c, 0xda, 0x90, and 0x95. With this info, it is ready for nandsim.
cawan% sudo modprobe nandsim first_id_byte=0x2c second_id_byte=0xda \
third_id_byte=0x90 fourth_id_byte=0x95
cawan% cat /proc/mtd
dev: size erasesize name
mtd0: 10000000 00020000 "NAND simulator partition 0"
cawan% sudo mtdinfo -a
Count of MTD devices: 1
Present MTD devices: mtd0
Sysfs interface supported: yes
mtd0
Name: NAND simulator partition 0
Type: nand
Eraseblock size: 131072 bytes, 128.0 KiB
Amount of eraseblocks: 2048 (268435456 bytes, 256.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size: 512 bytes
OOB size: 64 bytes
Character device major/minor: 90:0
Bad blocks are allowed: true
Device is writable: true
Since this is treated as low-hanging fruit for now, just ignore the parameters shown above for the moment. Now, let's dd the UBI image into /dev/mtd0.
cawan% sudo dd if=ubi.bin of=/dev/mtd0 bs=2048
128000+0 records in
128000+0 records out
262144000 bytes (262 MB, 250 MiB) copied, 2.7339 s, 95.9 MB/s
Done. Now, modprobe the ubi driver.
cawan% sudo modprobe ubi mtd=0,2048
modprobe: ERROR: could not insert 'ubi': Invalid argument
Sorry, the low-hanging fruit is in fact not so low for this NAND dump. Let's proceed in the proper way as proposed earlier. Let's start again from the beginning, by removing nandsim with rmmod and loading it again with modprobe.
cawan% sudo rmmod nandsim
cawan% sudo modprobe nandsim first_id_byte=0x2c second_id_byte=0xda \
third_id_byte=0x90 fourth_id_byte=0x95
Well, nothing special here. The output of mtdinfo -a is nothing special either, because it just reflects the parameters of MT29F2G08ABAEAWP. The only thing that needs to be confirmed is that /dev/mtd0 has been created. After that, use ubiformat with the correct parameters to format the emulated NAND flash for UBI, matching the UBI layout used in the NAND dump, as shown below.
cawan% sudo ubiformat -s 2048 -O 2048 /dev/mtd0
ubiformat: mtd0 (nand), size 268435456 bytes (256.0 MiB), \
2048 eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
libscan: scanning eraseblock 2047 -- 100 % complete
ubiformat: 2048 eraseblocks are supposedly empty
ubiformat: formatting eraseblock 2047 -- 100 % complete
Let's explain the two compulsory input parameters of ubiformat. The -s is also known as sub-page-size, which is the minimum I/O unit used for UBI headers. Setting it to 2048 prevents UBI from dividing the 2048-byte page into smaller sub-page units. Next, -O is the volume ID header offset. Setting it to 2048 means the volume ID header starts 1 page, or 2048 bytes, away from the start of the UBI erase block.
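Why these two values matter can be reproduced with a little arithmetic. The sketch below assumes, based on my reading of the UBI driver and not on anything stated in the dump, that the volume ID header is placed at the UBI header size rounded up to the header I/O unit (the sub-page size), and that the data starts at the next minimum I/O boundary after the volume ID header. ubi_layout and align_up are hypothetical helper names.

```python
UBI_HDR_SIZE = 64  # the UBI header and the volume ID header are 64 bytes each


def align_up(value, unit):
    """Round value up to the next multiple of unit."""
    return (value + unit - 1) // unit * unit


def ubi_layout(peb_size, min_io, hdr_io):
    """Return (vid_hdr_offset, data_offset, leb_size) for a given geometry.

    hdr_io is the I/O unit used for headers: the sub-page size when
    sub-pages are used, otherwise the full page size.
    """
    vid_hdr_offset = align_up(UBI_HDR_SIZE, hdr_io)
    data_offset = align_up(vid_hdr_offset + UBI_HDR_SIZE, min_io)
    return vid_hdr_offset, data_offset, peb_size - data_offset


if __name__ == "__main__":
    # nandsim reports a 512-byte sub-page, while the dump uses whole pages.
    for hdr_io in (512, 2048):
        print(hdr_io, ubi_layout(2048 * 64, 2048, hdr_io))
```

Under these assumptions, the 512-byte sub-page reported by nandsim would put the volume ID header at 512 and the data at 2048, which does not match the 0x800 and 0x1000 stored in the headers of the dump. Forcing the header unit to 2048 reproduces both offsets.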
Please note that specifying these two parameters with wrong figures, or leaving everything at the defaults, will end up with errors in the following steps. Let's proceed further to modprobe the UBI driver.
cawan% sudo modprobe ubi
cawan%
No error prompt, so just assume it succeeded. Now, use ubiattach to create a UBI device file associated with /dev/mtd0, as shown below.
cawan% sudo ubiattach -p /dev/mtd0 -O 2048
UBI device number 0, \
total 2048 LEBs (260046848 bytes, 248.0 MiB), \
available 2002 LEBs (254205952 bytes, 242.4 MiB), \
LEB size 126976 bytes (124.0 KiB)
Again, the input parameter -O 2048 is crucial: it specifies that the volume ID header is 2048 bytes away from the start of the UBI eraseblock, the same as for ubiformat. It is extremely important to make sure the Logical Eraseblock (LEB) size is 126976 bytes. Why? Because an eraseblock is 2048*64=131072 bytes, and after deducting 2 pages of 2048 bytes each (one for the UBI header and one for the volume ID header), the LEB size becomes 131072-2048-2048=126976. So, they match each other. Otherwise, the following steps will end up with errors as well.
A new UBI device file is created as /dev/ubi0, and its details can be checked by using ubinfo, as shown below.
cawan% sudo ubinfo /dev/ubi0 -a
ubi0
Volumes count: 0
Logical eraseblock size: 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks: 2048 (260046848 bytes, 248.0 MiB)
Amount of available logical eraseblocks: 2002 (254205952 bytes, 242.4 MiB)
Maximum count of volumes 128
Count of bad physical eraseblocks: 0
Count of reserved physical eraseblocks: 40
Current maximum erase counter value: 0
Minimum input/output unit size: 2048 bytes
Character device major/minor: 237:0
Now, a UBI environment with exactly the same layout as the UBI image in the NAND dump is ready. Let's create a volume with sufficient storage to host a UBIFS created by ubireader_extract_images, as shown below.
cawan% sudo ubimkvol -N volume1 -s 50MiB /dev/ubi0
Volume ID 0, \
size 413 LEBs (52441088 bytes, 50.0 MiB), \
LEB size 126976 bytes (124.0 KiB), dynamic, name "volume1", alignment 1
Well, a new volume named "volume1" with a size of 50 MiB has been created successfully by ubimkvol, together with a new device file, /dev/ubi0_0. Now, it is time to let volume1 host a UBIFS by using ubiupdatevol. Let's start with img-1941946494_vol-ubifs.ubifs first, as shown below.
cawan% ls -la
total 145216
drwxrwxr-x 3 user user 4096 May 30 01:40 .
drwxrwxr-x 3 user user 4096 May 29 16:46 ..
-rw-rw-r-- 1 user user 100438016 May 29 16:46 img-1240815858_vol-data.ubifs
-rw-rw-r-- 1 user user 11935744 May 29 16:46 img-1941946494_vol-ubifs.ubifs
-rw-rw-r-- 1 user user 27299840 May 29 16:46 img-2673978231_vol-backup.ubifs
-rw-rw-r-- 1 user user 9015296 May 29 16:46 img-3823591600_vol-app.ubifs
drwxrwxr-x 2 user user 4096 May 30 01:40 ubifs-root
cawan% sudo ubiupdatevol /dev/ubi0_0 img-1941946494_vol-ubifs.ubifs
cawan%
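As a sanity check on the 50 MiB choice, the sizes of the extracted images can be converted into LEB counts. This is only a sketch that uses the file sizes from the directory listing above, the 126976-byte LEB size reported by ubiattach, and the 413 LEBs reported by ubimkvol. lebs_needed is a hypothetical helper.

```python
LEB_SIZE = 126976     # from the ubiattach output
VOLUME_LEBS = 413     # from the ubimkvol output (50 MiB volume)

# File sizes in bytes, copied from the directory listing.
IMAGES = {
    "img-1941946494_vol-ubifs.ubifs": 11935744,
    "img-2673978231_vol-backup.ubifs": 27299840,
    "img-3823591600_vol-app.ubifs": 9015296,
    "img-1240815858_vol-data.ubifs": 100438016,
}


def lebs_needed(size, leb_size=LEB_SIZE):
    """Return (whole LEBs needed, bytes used in a partial last LEB)."""
    full, rest = divmod(size, leb_size)
    return full + (1 if rest else 0), rest


if __name__ == "__main__":
    for name, size in IMAGES.items():
        count, rest = lebs_needed(size)
        verdict = "fits" if count <= VOLUME_LEBS else "does not fit"
        print("%-34s %4d LEBs, remainder %d, %s" % (name, count, rest, verdict))
```

All four sizes are exact multiples of 126976, which agrees with the LEB size of the emulated device. The ubifs, backup and app images need 94, 215 and 71 LEBs respectively, well within the 413 LEBs of volume1, while the 791-LEB data image would need a larger volume.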
5 - Firmware Extraction
Everything works perfectly without a single error so far. Let's see whether the low-hanging fruit, which turned out to be not so low, is within reach now.
cawan% mkdir /tmp/nand
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand
cawan% cd /tmp/nand
cawan% ls
bin dev etc home lib linuxrc mnt proc root sbin sys tmp usr \
var work
Hopefully this is not the same thing as what ubireader_extract_files generated in the previous section. Let's verify it.
cawan% cd etc
cawan% cat fstab
proc /proc proc defaults 0 0
none /var/shm shm defaults 0 0
sysfs /sys sysfs defaults 0 0
none /tmp tmpfs defaults 0 0
What an amazing moment. Let's try the other two UBIFS.
cawan% sudo umount /tmp/nand
cawan% sudo ubiupdatevol /dev/ubi0_0 img-2673978231_vol-backup.ubifs
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand
cawan% ls /tmp/nand
14x8.hzk dat.ini flat_backup libplat.so ParaAutoNet.db ParamUniq.db
acmet driver_gwzd.ko gsmMuxd lyzd ParamMeter.db ppp
check.ini factory icons.bmp manuf.xin ParamOther.db seting.ini
chs.bin filecheck libacmet.so metproto.so ParamTerm.db startup.sh
cawan% sudo umount /tmp/nand
cawan% sudo ubiupdatevol /dev/ubi0_0 img-3823591600_vol-app.ubifs
cawan% sudo mount -t ubifs /dev/ubi0_0 /tmp/nand
cawan% ls /tmp/nand
14x8.hzk driver_gwzd.ko libacmet.so manuf.xin startup.sh
check.ini filecheck libplat.so metproto.so tmt_info.log
chs.bin gsmMuxd lyzd ppp updateinfo.xin
dat.ini icons.bmp lyzd.xzip seting.ini
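Since the failed extraction earlier produced a file of the right length filled with zero bytes, it is worth checking a mounted volume for the same symptom before trusting it. The following is a rough sketch: find_zero_filled is a hypothetical helper, and /tmp/nand is the mount point used above.

```python
import os


def find_zero_filled(root):
    """Return sorted paths of regular, non-empty files containing only 0x00."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file, skip it
            if data and data.count(0) == len(data):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    import sys
    root = sys.argv[1] if len(sys.argv) > 1 else "/tmp/nand"
    for path in find_zero_filled(root):
        print(path)
```

An empty result on the mounted volume, next to a long list on the ubifs-root folder produced by ubireader_extract_files, would confirm that the mount route recovers the real file content. A file that legitimately contains only zero bytes would show up in both lists, so the comparison matters more than the raw count.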
6 - Conclusion
So, in conclusion, the file systems hosted in three different UBIFS have been fully extracted successfully.
Happy hacking, and keep hacking.
References:-
[1] MT29F2G08ABAEAWP Data Sheet, https://datasheet.lcsc.com/lcsc/1811032117_Micron-Tech-MT29F2G08ABAEAWP-E_C110895.pdf
[2] DumpFlash Tool, https://github.com/ohjeongwook/dumpflash
[3] python-bchlib, https://github.com/jkent/python-bchlib
[4] Primitive Polynomial List, https://www.partow.net/programming/polynomials/index.html
[5] UBI Header Structure, https://kernel.googlesource.com/pub/scm/linux/kernel/git/rw/mtd-utils/+/refs/heads/master/include/mtd/ubi-media.h
