Forensic question: What information can be located regarding searches conducted in the Google search bar?
OS Version: Nougat (220.127.116.11)
WinHex, Version 19.7 (Specialist License)
Search history. It is an excellent way to peer into someone’s mind and see what they are thinking at a particular moment in time. In a courtroom, search history can be used to show intent (mens rea). There are plenty of examples where search history has been used in court to establish a defendant’s intent. Probably the most gruesome was the New York City Cannibal Cop trial, where prosecutors used the accused’s search history against him. Of course, there is a fine line between intent and protected speech under the First Amendment.
Over the past month and a half I have published a couple of blog posts dealing with Google Assistant and some of the artifacts it leaves behind, which you can find here and here. While poking around I found additional artifacts present in the same area that have nothing to do with Google Assistant: search terms.
While I wasn’t surprised, I was; after all, the folder where this data was found had “search” in the title (com.google.android.googlequicksearchbox). The surprising thing about these search terms is that they are unique to this particular area in Android; they do not appear anywhere else, so it is possible that you or I (or both) could have been missing pertinent artifacts in our examinations (I have missed something). Conducting a search via this method can trigger Google Chrome to go to a particular location on the Internet, but the term used to conduct the search is missing from the usual spot in the History.db file in Chrome.
My background research on the Google Search Bar (as it is now known) found that this feature may not be used as much as, say, the search/URL bar inside Chrome. In fact, there are numerous tutorials online that show a user how to remove the Google Search Bar from Android’s Home Screen, presumably to make more space for home screen icons. I will say, however, that while creating two Android images (Nougat and Oreo), having that search bar there was handy, so I can’t figure out why people wouldn’t use it more. But, I digress…
Before I get started there are a few things to note. First, the data for this post comes from two different flavors of Android: Nougat (7.1.2) and Oreo (8.1). The images can be found here and here, respectively. Second, the device used for each image was the same (LG Nexus 5X), and it was rooted both times using TWRP and Magisk. Third, I will not provide a file structure breakdown here as I did within the Google Assistant blog posts. This post will focus on the pertinent contents along with content markers within the binarypb files. I found the binarypb files related to Google Search Bar activity to contain way more protobuf data than those from Google Assistant, so a file structure breakdown is impractical here.
Finally, I thought it might be a good idea to give some historical context about this feature by taking a trip down memory lane.
Back in 2009 Google introduced what, at the time, it called Quick Search Box for Android, which arrived with Android 1.6 (Donut). It was designed as a place a user could go to type a word or phrase and search not only the local device but also the Internet. Developers could adjust their app to expose services and content to Quick Search Box so returned results would include their app. The neat thing about this feature was that it was contextually/location aware, so, for example, I could type the word “weather” and it would display the weather conditions for my current location. All of this could occur without the need of another app on the phone (depending on the search).
Google Quick Search Box – circa 2009.
Showtimes…which one do you want?
Prior to Google Assistant, Quick Search Box had a vocal input feature (the microphone icon) that could execute commands (e.g. call Mike’s mobile) and that was about it. Compared to today this seems archaic, but, at the time, it was cutting edge.
Yes, I’m listening.
Fast forward three years to 2012’s Jelly Bean (4.1). By that time Quick Search Box (QSB) had been replaced by Google Now, Google’s search and prediction service. If we were doing Ancestry.com or 23andMe, Google Now would definitely be a genetic relative of Google Search Bar/Google Assistant. The resemblance is uncanny.
Mom, is that you? Google Now in Jelly Bean.
The following year, KitKat allowed a device to start listening for the hotword “Ok, Google.” The next big iteration was Now on Tap in 2015’s Marshmallow (6.x), and, with the arrival of Oreo (8.x), we have what we know today as Google Assistant and the Google Search Bar (GSB). Recently, in Android Pie (9.x), GSB moved from the top part of the home screen to the bottom.
Google Search Bar/Google Assistant at the bottom in Android Pie (9.x).
As of the Fall of 2018 Nougat and Oreo accounted for over half of the total Android install base. Since I had access to images of both flavors and conducted research on both, the following discussion covers both. There were a few differences between the two systems, which I will note, but, overall, there was no major divergence.
To understand where GSB lives and the data available, let’s review…
GSB and Google Assistant are roommates in both Nougat and Oreo; they both reside in the /data/data directory in the folder com.google.android.googlequicksearchbox. See Figure 1.
Figure 1. GSB & Google Assistant’s home in Android.
This folder holds data about searches that are done from GSB along with vocal input generated by interacting with Google Assistant. The folder has the usual suspect folders along with several others. See Figure 2 for the folder listings.
Figure 2. Folder listing inside of the googlequicksearchbox folder
The folder of interest here is app_session. This folder has a great deal of data, but just looking at what is here one would not suspect anything. The folder contains several binarypb files, which are binary protocol buffer files. These files are Google’s home-grown, XML-ish rival to JSON files. They contain data that is relevant to how a user interacts with their device via Google Assistant and GSB. See Figure 3.
Figure 3. binarypb files (Nougat)
A good deal of the overall structure of these binarypb files differs from those generated by Google Assistant. I found the GSB binarypb files harder to read than the Google Assistant files. However, the concept is similar: there are markers that allow an examiner to quickly locate and identify the pertinent data.
To start, I chose 18551.binarypb in the Nougat image (7.1.2). This search occurred on 11/30/2018 at 03:55 PM (EST). The search was conducted while the phone was sitting on my desk in front of me, unlocked and displaying the home screen. The term I typed in to the GSB was “dfir.” I was presented with a few choices, and then chose the option that took me to the “AboutDFIR” website via Google Chrome. The beginning of the file appears in Figure 4.
Figure 4. Oh hello!
While not a complete match, this structure is slightly similar to that of the Google Assistant binarypb files. The big takeaway here is the “search” in the blue box. This is what this file represents/where the request is coming from. The BNDLs in the red boxes are familiar to those who have read the Google Assistant posts. While BNDLs are scattered throughout these files, it is difficult to determine where the individual transactions occur within the binarypb files, thus I will ignore them for the remainder of the post.
Scrolling down a bit finds the first area of interest seen in Figure 5.
Figure 5. This looks familiar.
In the Google Assistant files, there was an 8-byte string that appeared just before each vocal input. Here there is a four-byte string (0x40404004 – green box) that appears before the search term (purple box). Also present is a time stamp in Unix Epoch Time format (red box). The string, 0x97C3676667010000 is read little endian and converted to decimal. Here, that value is 1543611335575.
Figure 6. The results of the decimal conversion.
This time is the time I conducted the search from GSB on the home screen.
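The conversion described above can be reproduced in a few lines. This is a minimal sketch, assuming the eight bytes are a little-endian count of milliseconds since the Unix epoch (which is what the decoded value suggests); the function name is illustrative and not part of any tool.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_le_millis(raw: bytes) -> datetime:
    """Read 8 bytes as little-endian milliseconds since the Unix epoch (assumed layout)."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    millis = int.from_bytes(raw, byteorder="little")
    # timedelta arithmetic keeps the millisecond part exact (no float rounding)
    return UNIX_EPOCH + timedelta(milliseconds=millis)

# Bytes in file order, as shown in the red box in Figure 5
raw = bytes.fromhex("97C3676667010000")
millis = int.from_bytes(raw, "little")   # 1543611335575
when_utc = decode_le_millis(raw)         # 2018-11-30 20:55:35.575 UTC (03:55 PM EST)
```

The result is in UTC; subtracting five hours gives the EST time of the search.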
Down further is the area seen in Figure 7. The bit in the orange box looks like the Java wrappers in the Google Assistant files. The string webj and.gsa.widget.text* search dfir and.gsa.widget.text has my search term “dfir” wrapped in two strings: “and.gsa.widget.text.” Based on Android naming schemas, I believe this to be “Android Google Search Assistant Widget” with text. This is speculation on my part as I haven’t been able to find anything that confirms or denies this.
Figure 7. More search information.
The 4-byte string (green box), my search term (purple box), and the time stamp (red box) are all here. Also present is the string in the blue box. This 5-byte string, 0xBAF1C8F803, is something seen in Google Assistant files. In the Google Assistant files, this string appeared just prior to the first vocal input in a binarypb file, regardless of when, chronologically, it occurred during the session (remember, the last thing chronologically in the session was the first thing in those binarypb files). Here, this string occurs at the second appearance of the search term.
Traveling further, I find the area depicted in Figure 8. This area of the file is very similar to that of the Google Assistant files.
Figure 8. A familiar layout.
The 16-byte string ending in 0x12 in the blue box is one that was seen in the Google Assistant files. In those files I postulated this string marked the end of a vocal transaction. Here, it appears to be doing the same thing. Just after that, a BNDL appears, then the 4-byte string in the green box, and finally my “dfir” search term (purple box). Just below this area, in Figure 9, there is a string “android.search.extra.EVENT_ID” and what appears to be some type of identifier (orange box). Just below that, is the same time stamp from before (red box).
Figure 9. An identifier.
I am showing Figure 10 just to show a similarity between GSB and Google Assistant files. In Google Assistant, there was a 16-byte string at the end of the file that looked like the one shown in Figure 8, but it ended in 0x18 instead of 0x12. In GSB files, that string is not present. Part of it is, but not all of it (see the red box). What is present is the and.gsa.d.ssc. string (blue box), which was also present in Google Assistant files.
Figure 10. The end (?).
The next file I chose was 33572.binarypb. This search occurred on 12/04/2018 at 08:48 AM (EST). The search was conducted while the phone was sitting on my desk in front of me, unlocked and displaying the home screen. The term I typed in to the GSB was “nist cfreds.” I was presented with a few choices, and then chose the option that took me to NIST’s CFReDS Project website via Google Chrome. The beginning of the file appears in Figure 11.
Figure 11. Looks the same.
This looks just about the same as Figure 4. As before, the pertinent piece is the “search” in the blue box. Traveling past a lot of protobuf data, I arrive at the area shown in Figure 12.
Figure 12. The same, but not.
Other than the search term (purple box) and time stamp (red box) this looks just like Figure 5. The time stamp converts to decimal 1543931294855 (Unix Epoch Time). See Figure 13.
Figure 13. Looks right.
As before, this was the time that I had conducted the search in GSB.
Figure 14 recycles what was seen in Figure 7.
Figure 14. Same as Figure 7.
Figure 15 is a repeat of what was seen in Figures 8 and 9.
Figure 15. Same as Figures 8 & 9.
While I am not showing it here, just know that the end of this file looks the same as the first (seen in Figure 10).
In both instances, after having received a set of results, I chose ones that I knew would trigger Google Chrome, so I thought there would be some traces of my activities there. I started looking at the History.db file, which shows a great deal of Google Chrome activity. If you aren’t familiar, you can find it in the data\com.android.chrome\app_chrome\Default folder. I used ol’ trusty DB Browser for SQLite (version 3.10.1) to view the contents.
As it turns out, I was partially correct.
Figure 16 shows the table “keyword_search_terms” in the History.db file.
Figure 16. Something(s) is missing.
This table shows search terms used in Google Chrome. The term shown, “george hw bush,” is one that I searched for via Chrome on 12/01/2018 at 08:35 AM (EST). The terms I typed in to GSB to conduct my searches, “dfir” and “nist cfreds,” do not appear. However, viewing the table “urls,” a table that shows the browsing history for my test Google account, you can see when I went to the AboutDFIR and CFReDS Project websites. See Figures 17 and 18.
Figure 17. My visit to About DFIR.
Figure 18. My visit to NIST’s CFReDS.
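The same two lookups can be scripted against a working copy of the database. The sketch below is illustrative only: the table names and the “url” and “last_visit_time” columns come from this post, while the `term`, `url_id`, and `id` columns are assumptions about Chrome’s schema that should be confirmed against the copy being examined. The function names are hypothetical.

```python
import sqlite3

def chrome_search_terms(history_path):
    """Return (term, url, last_visit_time) rows for searches recorded by Chrome.

    Assumes keyword_search_terms.url_id references urls.id (check the schema first).
    """
    con = sqlite3.connect(history_path)
    try:
        return con.execute(
            "SELECT k.term, u.url, u.last_visit_time "
            "FROM keyword_search_terms AS k "
            "JOIN urls AS u ON u.id = k.url_id "
            "ORDER BY u.last_visit_time"
        ).fetchall()
    finally:
        con.close()

def chrome_urls_matching(history_path, fragment):
    """Return (url, last_visit_time) rows whose URL contains the given fragment."""
    con = sqlite3.connect(history_path)
    try:
        return con.execute(
            "SELECT url, last_visit_time FROM urls "
            "WHERE url LIKE ? ORDER BY last_visit_time",
            (f"%{fragment}%",),
        ).fetchall()
    finally:
        con.close()
```

Run it against an exported copy of the file, never the original evidence. A term typed into GSB would be expected to be absent from the first result set even when the site it led to appears in the second.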
The column “last_visit_time” stores the time of last visit to the site seen in the “url” column. The times are stored in Google Chrome Time (aka WebKit time), which is a 64-bit value in microseconds since 01/01/1601 at 00:00 (UTC). Figure 19 shows the time I visited AboutDFIR and Figure 20 shows the time I visited CFReDS.
Figure 19. Time of my visit to AboutDFIR.
Figure 20. Time of my visit to NIST’s CFReDS.
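For readers who want to convert these values without a dedicated tool, here is a small sketch of the WebKit time conversion described above (microseconds since 01/01/1601 UTC). The helper names are illustrative. The reverse helper is handy for turning a known event time, such as the GSB search time, into a value that can be compared against “last_visit_time.”

```python
from datetime import datetime, timedelta, timezone

WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def webkit_to_utc(value: int) -> datetime:
    """Convert a WebKit/Chrome timestamp (microseconds since 1601-01-01 UTC) to a datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=value)

def utc_to_webkit(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to a WebKit/Chrome timestamp."""
    return (moment - WEBKIT_EPOCH) // timedelta(microseconds=1)

# The GSB search time decoded from 18551.binarypb, expressed as a WebKit value
gsb_search = datetime(2018, 11, 30, 20, 55, 35, 575000, tzinfo=timezone.utc)
gsb_as_webkit = utc_to_webkit(gsb_search)
```

A visit to the resulting page would be expected to carry a “last_visit_time” at or shortly after that value, but that is an inference to be checked against the actual rows, not a guarantee.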
I finished searching the Chrome directory and did not find any traces of the search terms I was looking for, so I went back over to the GSB directory and looked there (other than the binarypb files). Still nothing. In fact, I did not find any trace of the search terms other than in the binarypb files. As a last-ditch effort, I ran a raw keyword search across the entire Nougat image, and still did not find anything.
This could potentially be a problem. Could it be that we are missing parts of the search history in Android? The History.db file is a great and easy place to look, and I am certain the tool vendors are parsing that file, but are they looking at and parsing the binarypb files, too?
As I previously mentioned, I also had access to an Oreo image, so I loaded that one up and navigated to the com.google.android.googlequicksearchbox\app_session folder. Figure 21 shows the file listing.
Figure 21. File listing for Oreo.
The file I chose here was 26719.binarypb. This search occurred on 02/02/2019 at 08:48 PM (EST). The search was conducted while the phone was sitting in front of me, unlocked and displaying the home screen. The term I typed in to the GSB was “apple macintosh classic.” I was presented with a few choices but took no action beyond that. Figure 22 shows the beginning of the file in which the “search” string can be seen in the blue box.
Figure 22. Top of the new file.
Figure 23 shows an area just about identical to that seen in Nougat (Figures 5 and 12). My search term can be seen in the purple box and a time stamp in the red box. The time stamp converts to decimal 1549158503573 (Unix Epoch Time). The results can be seen in Figure 24.
Figure 23. An old friend.
Figure 24. Time when I searched for “apple macintosh classic.”
Figure 23 does show a spot where Oreo differs from Nougat. The 4-byte string in the green box that appears just before the search term, 0x50404004, is different. In Nougat, the first byte is 0x40, and here it is 0x50. A small change, but a change, nonetheless.
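Since the marker differs by only one byte between the two versions, a script can look for both. The sketch below assumes the marker bytes appear in the file in the order written here (40 40 40 04 and 50 40 40 04) and makes no claim about how the search term that follows is delimited; it simply prints a hex-editor-style preview of the bytes after each hit. The names are illustrative.

```python
# 4-byte markers as observed in this post (assumed to be in file order)
MARKERS = {
    "Nougat (7.1.2)": bytes.fromhex("40404004"),
    "Oreo (8.1)": bytes.fromhex("50404004"),
}

def find_markers(data: bytes, preview_len: int = 32):
    """Return (offset, label, preview) for every marker occurrence, sorted by offset."""
    hits = []
    for label, marker in MARKERS.items():
        offset = data.find(marker)
        while offset != -1:
            start = offset + len(marker)
            following = data[start:start + preview_len]
            # Show printable ASCII as-is and everything else as a dot
            preview = "".join(chr(b) if 32 <= b < 127 else "." for b in following)
            hits.append((offset, label, preview))
            offset = data.find(marker, offset + 1)
    return sorted(hits)
```

A four-byte pattern is short enough to produce false positives, so each hit still needs to be checked in context, for example by confirming that a plausible time stamp sits nearby.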
Figure 25 shows a few things that appeared in Nougat (Figures 7 & 14).
Figure 25. The same as Figures 7 and 14.
As seen, the search term is in the purple box, the search term is wrapped in the orange box, the 4-byte string appears in the green box, and the 5-byte string seen in the Nougat and the Google Assistant files is present (blue box).
Figure 26 shows the same objects as those in the Nougat files (Figures 8, 9, & 15): the 16-byte string ending in 0x12 (blue box), the 4-byte string (green box), my search term (purple box), some type of identifier (orange box), and the time stamp (red box).
Figure 26. Looks familiar…again.
While not depicted in this post, the end of the file looks identical to those seen in the Nougat files.
Just like before, I traveled to the History.db file to look at the “keyword_search_terms” table to see if I could find any artifacts left behind. See Figure 27.
Figure 27. Something is missing…again.
My search term is missing. Again. I looked back at the rest of the GSB directory and struck out. Again. I then ran a raw keyword search against the entire image. Nothing. Again.
Out of curiosity, I decided to try two popular forensic tools to see if they would find these search terms. The first tool I tried was Cellebrite Physical Analyzer (Version 18.104.22.168). I ran both images through PA, and the only search terms I saw (in the parsed data area of PA) were the ones that were present in Figures 16 & 27; these terms were pulled from the “keyword_search_terms” table in the History.db file. I ran a search across both images (from the PA search bar) using the keywords “dfir,” “cfreds,” and “apple macintosh classic.” The only returned hits were the ones from the “urls” table in the History.db file of the Nougat image; the search term in the Oreo image (“apple macintosh classic”) did not show up at all.
Next, I tried Internet Evidence Finder (Version 22.214.171.12477). It returned the same artifacts Physical Analyzer did, from the same location, but it did not find the search terms from GSB.
So, two tools that have a good footprint in the digital forensic community missed my search terms from GSB. My intentions here are not to speak ill of either Cellebrite or Magnet Forensics, but to show that our tools may not be getting everything that is available (the vendors can’t research everything). It is repeated often in our discipline, but it does bear repeating here: always test your tools.
There is a silver lining here. Just to check, I examined my Google Takeout data. As it turns out, these searches were present in what was provided by Google.
Search terms and search history are great evidence. They provide insight into a user’s mindset and can be compelling evidence in a courtroom, civil or criminal. Google Search Bar provides users a quick and convenient way to conduct searches from their home screen without opening any apps. These convenient searches can be spontaneous and, thus, dangerous; a user could conduct a search without much thought given to the consequences or how it may look to third parties. The spontaneity can be very revealing.
I will be the first to admit, now that I know this, that I have probably missed a search term or two. If you think a user conducted a search and you’re not seeing the search term(s) in the usual spot, try the area discussed in this post. Keep in mind, too, that your tools may be missing this data, depending on whether the vendor knows this data is present and has updated the tool to parse it.
And remember: Always. Test. Your. Tools.
A few days after this blog post was published, I had a chance to test Cellebrite Physical Analyzer, version 126.96.36.199. This version does parse the .binarypb files, although you will get multiple entries for the same search, and some entries may have different timestamps. So, caveat emptor; it will be up to you/the investigator/both of you to determine which is accurate.
I also have had some time to discuss this subject further with Phil Moore (This Week in 4n6), who has done a bit of work with protobuf files (Spotify and the KnowledgeC database). The thought was to use Google’s protoc.exe (found here) to decode the .binarypb files and then try to interpret the respective fields. Theoretically, this would be slightly easier than cruising through the hexadecimal and decoding the time by hand. To test this, I ran the file 26719.binarypb through protoc.exe. You can see the results for yourself in Figures 28, 29, and 30, with particular attention being paid to Figure 29.
Figure 28. Beginning of protoc output.
Figure 29. Middle part of the protoc output (spaces added for readability).
Figure 30. Footer of the protoc output.
In Figure 28 the “search” string is identified nicely, so a user could easily see that this represents a search, but you can also see there is a bunch of nonsensical data grouped in octets. These octets represent the data in the .binarypb file, but how it lines up with the hexadecimal values/ASCII values is anyone’s guess. It is my understanding that there is a bit of educated guessing that occurs when attempting to decode this type of data. Since protobuf data is serialized and the programmers have carte blanche in determining what key/value pairs exist, the octets could represent anything.
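To make the key/value idea concrete, here is a minimal sketch of walking the protobuf wire format without a schema. It handles only the four common wire types and does not recurse into nested messages. The sample message is fabricated for illustration: the field numbers are invented, and whether GSB actually stores its time stamp as a fixed 64-bit field is an assumption, not something established in this post.

```python
def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear marks the last byte
            return value, pos
        shift += 7

def walk_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each top-level field in buf."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:           # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:         # fixed 64-bit, little endian
            value = int.from_bytes(buf[pos:pos + 8], "little")
            pos += 8
        elif wire_type == 2:         # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:         # fixed 32-bit, little endian
            value = int.from_bytes(buf[pos:pos + 4], "little")
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value

# Hypothetical message: field 1 = "dfir" (length-delimited), field 2 = fixed64 time stamp
sample = bytes.fromhex("0A04") + b"dfir" + bytes.fromhex("11") + bytes.fromhex("97C3676667010000")
fields = list(walk_fields(sample))
```

Without the original .proto definition, a walker like this can recover only field numbers, wire types, and raw values; what each field means still has to be inferred, which is the educated guessing described above.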
That being said, the lone educated guess I have is that the octet 377 represents 0xFF, which is consistent with the arithmetic (octal 377 is decimal 255, or 0xFF). I counted the number of 377’s backwards from the end of the octal time (described below) and found that they matched (24 – there were 24 0xFF’s that preceded the time stamp seen in Figure 23). Again, speculation on my part.
Figure 29 is the middle of the output (I added spaces for readability). The area in the red box, as discovered by Phil, is believed to be the timestamp, but in an octal (base-8) format…sneaky, Google. The question mark at the end of the string lines up with the question mark seen at the end of each timestamp seen in the figures of this article. The area in the green box shows the first half of the Java wrapper that was discussed and seen in Figure 25. The orange box contains the search string and the last half of the Java wrapper.
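If the octal groups are C-style escape sequences, they can be turned back into raw bytes and decoded exactly like the hex values earlier in this post. The sketch below assumes that escaping convention. The sample string is constructed from the Nougat time stamp bytes shown in Figure 5, not copied from the protoc output in Figure 29, and the function name is illustrative.

```python
def unescape_protoc(text: str) -> bytes:
    """Turn text with C-style escapes (octal groups, \\n, \\t ...) back into raw bytes."""
    simple = {"n": 0x0A, "r": 0x0D, "t": 0x09, "\\": 0x5C, '"': 0x22, "'": 0x27}
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ord(ch))       # printable characters stand for themselves
            i += 1
            continue
        nxt = text[i + 1]
        if nxt in "01234567":
            j = i + 1                 # consume up to three octal digits
            while j < len(text) and j < i + 4 and text[j] in "01234567":
                j += 1
            out.append(int(text[i + 1:j], 8))
            i = j
        else:
            out.append(simple[nxt])   # single-character escape such as \n
            i += 2
    return bytes(out)

# Constructed example: 0x97 0xC3 'g' 'f' 'g' 0x01 0x00 0x00 written with octal escapes
escaped = r"\227\303gfg\001\000\000"
raw = unescape_protoc(escaped)
millis = int.from_bytes(raw, "little")   # 1543611335575, same value as the hex decoding
```

Note that bytes which happen to be printable ASCII (here 0x67 and 0x66, “g” and “f”) are not escaped, which is one reason the octal text does not line up neatly with a hex view.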
Figure 30 shows the end of the protoc output with the and.gsa.d.ssc.16 string.
So, while there is not an open-source method of parsing this data as of this writing, Cellebrite, as previously mentioned, has baked this into the latest version of Physical Analyzer, but care should be taken to determine which timestamp(s) is accurate.
The author provides step-by-step instructions which make it easy for other examiners to find traces on any dataset of the same version. The screenshots provided help to support the author’s conclusions. The traces located during the verified and validated reviews were consistent with the findings in the article, but it cannot be asserted that this was the exact mechanism that created this trace.
Additional research could be conducted into .binarypb files. Based on the article, it appears that each search generates an additional .binarypb file, but is this always the case? What if a search was conducted for the same term?
A script could be created to parse out search terms from an input file and provide the timestamps for those search terms.
Hannes Spichiger (Verified Review using Author Provided Data Sets, Validated Review using Reviewer Generated Datasets)
Linda Shou (Methodology Review)