
OK Computer...er...Google. Dissecting Google Assistant (Part 1)

Published on May 26, 2020

Synopsis

Forensic question: What information is recoverable from the use of Google Assistant when the device is connected to a car using Android Auto?

OS Version: Android 8.1 (Oreo)
Patch Level: December 5, 2018

Google Assistant v. 3.8.584564 - Installed 01/16/2019 13:15 (EST)

Google v8.91.5.21 - Installed 01/16/2019 13:10 (EST)

Maps v10.7.1 - Installed 01/16/2019 13:01 (EST)

Tools:
WinHex, Version 19.7 (Specialist License)
Cellebrite UFED 4PC, Version 7.12.1.100
DCode Version 4.02a

Introduction

A few weeks ago, I posted a blog about some research I conducted on Android Auto, and I mentioned there was some interesting data left behind by Google Assistant when using Android Auto. Based on what I found there, I decided to go further down the virtual assistant rabbit hole to see what I could find.

As far as virtual assistants go, I use Siri. When I had a newborn, Siri was used a lot. In addition to turning on/off lights or playing music, I used Siri to turn on certain appliances (via smart plugs), respond to texts, and make phone calls as my hands were usually holding a baby or trying to do something for/to/with a baby. Siri was really helpful then. Siri is still useful, but, nowadays, my primary use of Siri is in the car. There are times where I still yell at my phone or HomePod to find a TV show, play a song, turn on a light, or answer a quick math question. For other things such as search and typing a message (outside of the car), I’m old fashioned.

I have been, in one way or another, fascinated by talking computers/A.I. for a long time. According to my parents, I used to love Knight Rider. It had the Knight Industries Two Thousand (K.I.T.T.) – the snarky, crime-fighting automotive sidekick of the Hoff. As you can see from the GIF above, I also like Star Trek. Captain Kirk, Spock, et al. have been verbally interacting with their computers since the 1960s. WOPR/Joshua, check. Project 2501, check. The Architect, check. And, the crème de la crème: HAL 9000, check check check.

While researching Google Assistant, I stumbled across an article that had some interesting statistics. In 2017, there was a 128.9% year-over-year increase in the use of voice-activated assistants, with an expected growth of 40% in 2019. Another statistic: 2 out of 5 adults use voice search at least once a day; this was in 2017, so I suspect this number is higher by now. This explosion, in my opinion, started when Alexa arrived. Say what you like about Amazon, but they were smart to open Alexa to developers. Alexa is everywhere and owns around 70% of the virtual assistant market. With Amazon’s recent acquisition of eero (to my dismay), wireless routers, the televisions of the 21st century, will have Alexa in them. Alexa. Is. Everywhere. I am sure Google will follow.

There are three things to be aware of before I get started. First, the data I found resides in the /data/ directory, and, thus, is not easily accessible on newer devices unless the device is rooted, or you have a tool that can access this area. Second, a warning: this post is lengthy, but in order to understand the patterns and data in the three files I examine, it is necessary.

Finally, this will be a two-parter. There was way too much data to cover everything in just one post. This post will examine the data left behind by Google Assistant when used via Android Auto. The second post will examine the data left behind by Google Assistant when used outside of the car (yelling at the device).

One final note: this data was generated on a rooted Nexus 5X running Android Oreo (8.1) with a patch date of December 5, 2018. The data used in this article can be found here.

This research is an off-shoot of what I did with Android Auto, so if you want the full backstory, you can see the post here. But…

Let’s Review

Google Assistant resides in the /data/data directory. The folder is com.google.android.googlequicksearchbox. See Figure 1.

Figure 1.

This folder also holds data about searches that are done from the Quick Search Box that resides at the top of my home screen (in Oreo). This box has been around, in some fashion or another, since Donut, so it has had around 10 years or so to mature. The folder has the usual suspect folders along with several others. See Figure 2 for the folder listings.

Figure 2.

The folder of interest here is app_session. This folder has a great deal of data, but just looking at what is here one would not suspect anything. The folder contains several .binarypb files, which I have learned, after having done additional research, are binary protocol buffer files. These files are Google’s home-grown, XML-ish rival to JSON files. They contain data that is relevant to how a user interacts with their device via Google Assistant. See Figure 3.

Figure 3.
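If you are working from a file system extraction, listing the session files is a quick first step. The following is a minimal sketch, assuming the extraction has been exported to a local folder that preserves the /data/data layout; the function name and the `extraction_root` parameter are my own placeholders, not something any tool provides.

```python
from pathlib import Path

# Package folder named in this post; everything else here is illustrative.
PACKAGE = "com.google.android.googlequicksearchbox"


def list_session_files(extraction_root):
    """Return the .binarypb files under app_session, sorted by file name.

    extraction_root is a hypothetical local folder that mirrors the
    device file system (so it contains data/data/<package>/...).
    """
    session_dir = Path(extraction_root) / "data" / "data" / PACKAGE / "app_session"
    return sorted(session_dir.glob("*.binarypb"))


# Example usage (path is a placeholder):
# for path in list_session_files("/cases/nexus5x_export"):
#     print(path.name, path.stat().st_size)
```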

Each .binarypb file here represents a “session,” which I define as each time Google Assistant was invoked. Based on my notes, I know when I summoned Google Assistant, how I summoned it, and what I did when I summoned it. The first time I summoned Google Assistant is represented in the file whose name ends in the five digits 43320 (43320.binarypb). Figure 4 shows the header of the file.

Figure 4.

The ASCII “car_assistant” seems to imply this request had been passed to Google Assistant from Android Auto. In each test that I ran in Android Auto, this phrase appeared at the beginning of the file. Additionally, the string in the smaller orange box (0x5951EF) appeared at the beginning of the file at the same byte offset each time. I hesitate to call this a true “file header,” though. I think someone with more time in DFIR should make that call.

If you read my Android Auto post, you will know the string in the red box is the start of an MP3 file. You can see the end of the MP3 file in Figure 5.

Figure 5.

The string in the orange box is the marker of the LAME MP3 codec, and the strings in the red boxes in Figures 4 and 5 are what I called “yoda” strings. Seeing these things, I carved from the first yoda (seen in Figure 4), to the last (seen in Figure 5), for a total of 11.1 KB. I then saved the file with no extension and opened it in VLC Player. The following came out of my speakers:

“You’ve got a few choices. Pick the one you want.”

Based on my notes, this was the last phrase Google Assistant spoke to me via Android Auto prior to handing me off to Maps. In this session, I had asked for directions to Starbucks and had not been specific about which one, which caused the returned reply that I had just heard. There was other interesting data in this file, such as the text of what I had dictated to Google Assistant. I began to wonder if it would be possible to determine if there were any patterns or other identifying data in this file that would be useful to or could act as “markers” for digital forensic practitioners. Using 43320.binarypb as a starting point, I set off to see if I could map this file and the others on which I had taken notes.
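The carve I did by hand in WinHex (first yoda through last yoda) can be approximated in a few lines. This is only a sketch: the marker bytes are passed in as a parameter because the yoda byte values are not reproduced in this post, and I am assuming the carve includes the final marker itself.

```python
def carve_between_markers(data, marker):
    """Return the bytes from the first occurrence of marker through the
    end of the last occurrence (inclusive of both markers).

    The marker value is supplied by the caller; this sketch does not
    know what the "yoda" bytes actually are.
    """
    first = data.find(marker)
    last = data.rfind(marker)
    if first == -1 or first == last:
        raise ValueError("need at least two occurrences of the marker")
    return data[first:last + len(marker)]


# Example usage (file name from this post, marker left as a placeholder):
# raw = open("43320.binarypb", "rb").read()
# audio = carve_between_markers(raw, YODA_BYTES)
# open("carved_audio", "wb").write(audio)
```

Saving the result with no extension and opening it in VLC mirrors what I did manually.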

Looking at these files in hex and ASCII, I started to notice a bit of a pattern. While there is a difference between interactions in the car (via Android Auto) and outside of the car (yelling at the phone), there are some high-level similarities between these files regardless of how Google Assistant is used.

A Deep Dive

I chose 43320.binarypb as my starting point on purpose: there was a single request for directions in this session. Thus, I thought the file would be straightforward. I was right…sorta.

The session was started via Android Auto, and I had invoked Google Assistant via a button on my steering wheel (the phone was connected to the car). The session went like this:

Me: “I need directions to Starbucks.”

// Google Assistant thought for a few seconds //

GA: “You’ve got a few choices. Pick the one you want.”

After that I was handed off to Maps and presented with a few choices. I chose a particular location and route, and then went on my way. Figure 4 shows the top of the file, and I have already mentioned the MP3 data (Figures 4 and 5), so I will skip that portion of the file.

The first area of the file after the MP3 portion was a 4-byte string, BNDL (0x424E444C). Just make note of this for now, because it comes up a lot. After BNDL there was some information about the version of Android Auto I was running, and, potentially, where the voice input was coming from (/mic /mic); see the red box and orange box, respectively, in Figure 6.

Figure 6.
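Because BNDL shows up so often, it helps to list every offset where it occurs. Here is a small sketch of that search; in ASCII, BNDL is the bytes 42 4E 44 4C. The helper name is my own.

```python
BNDL = b"BNDL"  # same bytes as bytes.fromhex("424E444C")


def find_all(data, needle):
    """Return every offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


# Example usage:
# raw = open("43320.binarypb", "rb").read()
# for offset in find_all(raw, BNDL):
#     print(hex(offset))
```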

There is an additional string in there that, if you weren’t paying attention, you would miss as it’s stuck at the end of some repetitive data. I certainly missed it. Take a look at the string in the blue box in Figure 6. The string is 0x30CAF25768010000 (8 bytes) and appears at the end of some padding (please do not judge - I couldn’t come up with a better name for those 0xFF’s). I read it little endian, converted it to decimal, and got a pleasant surprise: 1547663755824. I recognized this format as Unix Epoch Time, so I turned to DCode, and had my Bob Ross-ian moment. See Figure 7.

Side note: I had been trying to find a date/time stamp in this file for two weeks, and, as frequently happens with me, I found it by accident.

Figure 7.

Based on my notes, this is when I had summoned Google Assistant in order to ask for directions: 01/16/2019 at 13:35 (EST).
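The conversion I did by hand and with DCode can be sketched as follows: read the 8 bytes little endian, treat the result as milliseconds since the Unix epoch, and shift to EST. The fixed UTC-5 offset is an assumption that holds for these January dates but ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EST = timezone(timedelta(hours=-5), "EST")  # fixed offset, no DST handling


def decode_epoch_ms_le(hex_string):
    """Decode an 8-byte little-endian Unix time stamp in milliseconds.

    Returns (milliseconds, timezone-aware datetime in EST).
    """
    raw = bytes.fromhex(hex_string)
    millis = int.from_bytes(raw, byteorder="little")
    moment = (EPOCH + timedelta(milliseconds=millis)).astimezone(EST)
    return millis, moment


millis, moment = decode_epoch_ms_le("30CAF25768010000")
print(millis)                                   # 1547663755824
print(moment.strftime("%m/%d/%Y %H:%M:%S"))     # 01/16/2019 13:35:55
```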

Next, com.google.android.apps.gsa.shared.search.QueryTriggerType (red box) caught my attention. Just below it was the following: webj gearhead* car_assistant gearhead (green box). If you read my Android Auto post, you will know the title of the folder in which Android Auto resides has “gearhead” in it (com.google.android.projection.gearhead). So, does this indicate Google Assistant was triggered via Android Auto? Maybe…or maybe not. This could be a one off. I filed this away and continued. See Figure 8.

Figure 8.

The next thing is something I mentioned in the Android Auto post: a 5-byte string (0xBAF1C8F803) and an 8-byte string (0x014C604080040200) that appeared just above my actual vocal inquiry. They can be seen in Figure 9: the 5-byte string is in the blue box, the 8-byte string is in the green box, and the voice inquiry is in the top purple box. Take note that there is a variation of what I actually said in the bottom purple box. Also note the BNDL in the red box.

Figure 9.
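Since the 8-byte string sits just before the vocal input, it can serve as an anchor for pulling the text out. The sketch below is a heuristic, not a parser: I am assuming the text is plain ASCII and that only a few non-printable bytes (if any) sit between the marker and the text. The `window` and `min_len` parameters are my own guesses.

```python
import string

QUERY_MARKER = bytes.fromhex("014C604080040200")
# Printable ASCII byte values, minus tab/newline-style whitespace.
PRINTABLE = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")


def text_after_marker(data, marker=QUERY_MARKER, window=16, min_len=4):
    """Return (marker_offset, text) for each printable run after marker.

    Skips up to `window` non-printable bytes after the marker, then
    collects the printable ASCII run that follows.
    """
    results = []
    pos = data.find(marker)
    while pos != -1:
        start = pos + len(marker)
        i = start
        while i < min(start + window, len(data)) and data[i] not in PRINTABLE:
            i += 1
        j = i
        while j < len(data) and data[j] in PRINTABLE:
            j += 1
        if j - i >= min_len:
            results.append((pos, data[i:j].decode("ascii")))
        pos = data.find(marker, pos + 1)
    return results
```

Any hit should be checked against the hex view, since a byte between the marker and the text could itself happen to be printable.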

Below that is data I had seen earlier in the file (in Figure 6): the Android Auto version number, /mic /mic (orange box), the same time stamp I had just seen (purple box), and QueryTriggerType with webj gearhead* car_assistant I need directions to Starbucks gearhead (green box). And, there is BNDL again. See Figures 10 and 11.

Figure 10.

Figure 11.

I want to draw attention to two additional things. The first, in Figure 11, is another time stamp in the blue box. This is another Unix time stamp (0xB176F15768010000). This time is 01/16/2019 at 13:34:28 (EST), which is just under 1:30 earlier than the time stamp I had seen previously (when I invoked Google Assistant). This is odd, but the string just below it may have something to do with it: com.google.android.apps.gsa.shared.logger.latency.LatencyEvents. I will say that 13:34 is when I connected the phone to the car and started Android Auto.

The second area is in the red box in Figure 12-1. There you see the following: velvet:query_state:search_result_id (red box) and then a 16-byte string ending in 0x12 (blue box). This area appears in every Google Assistant session I have examined. I have a theory about it but will wait until later in this article to explain. As with BNDL, just put it to the side for the moment.

Figure 12-1.

Figure 12-2.

In Figure 12-1, you can also see BNDL (yellow box), the 8-byte green box string just prior to my vocal inquiry, and then the inquiry itself (purple box). After a bit of padding, there is BNDL. After that, in Figure 12-2, there is the same data seen in Figures 6 and 10 (orange box), and…what’s this? Another time stamp (red box)? I did the same thing as before and got another Unix Epoch Time time stamp.

As with the previous time stamp, this one is also prior to the first time stamp I had encountered in the file, although it is within the same minute in which I had invoked Google Assistant. As before, this time stamp appears just before the string that contains LatencyEvents. Does this have something to do with any latency the device is experiencing between it and Google’s servers? Again, I am not sure.
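Given that I found the first time stamp by accident, a brute-force scan would have saved me two weeks. The sketch below slides an 8-byte window across the file and keeps any little-endian value that falls inside a plausible date range. The year bounds are my own assumption, and hits are only candidates: random data can land in the range, so each one still needs to be checked in context.

```python
from datetime import datetime, timezone


def scan_epoch_ms(data, start_year=2018, end_year=2020):
    """Return (offset, milliseconds) for every 8-byte little-endian value
    that falls between Jan 1 of start_year and Jan 1 of end_year (UTC)."""
    lo = int(datetime(start_year, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
    hi = int(datetime(end_year, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
    hits = []
    for offset in range(len(data) - 7):
        value = int.from_bytes(data[offset:offset + 8], "little")
        if lo <= value < hi:
            hits.append((offset, value))
    return hits


# Example usage:
# raw = open("43320.binarypb", "rb").read()
# for offset, millis in scan_epoch_ms(raw):
#     print(hex(offset), millis)
```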

Below this time stamp is a replay of what I had seen in Figure 10 (Figure 12-2 – orange box). The area I discussed in Figure 11 is also present, sans my vocal input (purple). See Figure 13.

Figure 13.

After that last BNDL, the same items I have already discussed are recycled again, and the first time stamp I had found is present again (red box Figure 14-2). See Figures 14-1, 14-2, and 14-3.

Figure 14-1.

Figure 14-2.

Figure 14-3.

The very last portion of the file is velvet:query_state:search_result_id (orange box) along with the 16-byte string (purple box); however, there is a small twist: the last byte has changed from 0x12 to 0x18. Just after that string is a 9-byte string, 0x01B29CF4AE04120A10 (blue box). This string appears at the end of each session file I have examined, along with the string and.gsa.d.ssc (red box). See Figure 15.

Figure 15.

So, just in this one file I saw patterns within the file, and recurring strings. Were these things unique to this particular file, or does this pattern span across all of these files?

The next file I chose was 12067.binarypb. As before, there was a single request for directions in this session. This session, I was a bit more specific about the location for which I was looking.

This session was also started via Android Auto, and I had invoked Google Assistant via a button on my steering wheel (the phone was connected to the car). The session went like this:

Me: “Give me directions to the Starbucks in Fuquay Varina.”

// Google Assistant thought for a few seconds //

GA: “Starbucks is 10 minutes from your location by car and light traffic.”

As can be seen in Figure 16, the strings 0x5951EF and car_assistant can be seen at the top of the file. Unlike the previous file, however, there is an additional bit of data here: com.android.apps.gsa.search.core.al.a.au, a BNDL, and ICING_CONNECTION_INITIALIZATION_TIME_MSEC. The “yoda” is also here. See the blue, green, purple, orange, and red box, respectively, in Figure 16.

Figure 16.

Figures 17-1 and 17-2 show the end of the MP3 data, a BNDL, and then some data seen in the 43320.binarypb file: the Android Auto version number, /mic /mic (orange box), a time stamp (red box), and QueryTriggerType with the webj gearhead* car_assistant gearhead (green box). The time stamp here is 0x9FC2CB5B68010000, which, when converted to decimal, is 1547728306847. Just like the previous file, this is also Unix Epoch Time. I used DCode to convert, and got 01/17/2019 at 07:31:46 (EST). According to my notes, this is the time I invoked Google Assistant, and asked for directions.

Figure 17-1.

Figure 17-2.

Traveling slightly further down I arrive in the area seen in Figure 18. Here I find the 5-byte (blue box) and 8-byte strings (green box) I had seen in 43320.binarypb. Then I see my request (purple box). Also note the lower purple boxes; these appear to be what I said, and variations of what I said. Just before each new variation, there is a number (4, 3, 5, and 9). I will note that the text behind 4 and 5 differ only by the period at the end of the request. I suspect that these numbers are assigned to each variation to keep tabs on each; however, I am not sure why. There is also a BNDL at the end of this area (red box).

Figure 18.

Just below the requests I found some familiar information (Figures 19-1 and 19-2). The Android Auto version number, /mic /mic (purple box), a time stamp (orange box), and QueryTriggerType with the webj gearhead* car_assistant gearhead (green box) are all here. The time stamp here is the same as the previous one. There is an additional piece of data here; just past the webj gearhead* car_assistant string is the 4give me directions to the Starbucks in Fuquay Varina gearhead (blue box). There is also a BNDL at the end (red box).

Figure 19-1.

Figure 19-2.

Below the area in Figure 19-2, there is a time stamp (Figure 20) shown in a blue box. The string (0xBC32C65B68010000) results in a Unix Epoch Time (

Figure 20.
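The byte string shown in Figure 20 can be decoded the same way as the earlier time stamps and compared against the invocation time stamp from Figures 17-1 and 17-2. This sketch uses `struct` for the unpacking; the fixed UTC-5 offset for EST is again an assumption, and the printed values are simply what the arithmetic produces from the hex quoted above.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EST = timezone(timedelta(hours=-5), "EST")  # fixed offset, no DST handling


def unpack_ms(hex_string):
    """Unpack 8 bytes as an unsigned little-endian integer (milliseconds)."""
    (millis,) = struct.unpack("<Q", bytes.fromhex(hex_string))
    return millis


def to_est(millis):
    """Convert Unix milliseconds to a timezone-aware EST datetime."""
    return (EPOCH + timedelta(milliseconds=millis)).astimezone(EST)


invoked = unpack_ms("9FC2CB5B68010000")   # invocation time stamp (Figure 17-2)
latency = unpack_ms("BC32C65B68010000")   # time stamp in Figure 20

print(latency, to_est(latency).strftime("%m/%d/%Y %H:%M:%S"))
print("seconds before invocation:", (invoked - latency) / 1000)
```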

Below the time stamp, the velvet:query_state:search_result_id appears again in Figures 21-1 and 21-2, along with the 16-byte string ending in 0x12 (green box) and a BNDL, the 8-byte string, and then my vocal inquiries and their variations, and another BNDL. See the red, green, blue, purple, and orange boxes, respectively.

Figure 21-1.

Figure 21-2.

Just after the BNDL is the information about the Android Auto version I was using, the /mic /mic string (orange box), and a Unix Epoch Time time stamp (red box). This one is the same as the first one I had seen in this file (the time I invoked Google Assistant). See Figure 22.

Figure 22.

Below that are some new things. First, the text of the MP3 file at the beginning of this file (purple box). Second, a string that fits a pattern that I see in other files: xxxxxxxxxxxx_xxxxxxxxx (green box). The content of the string is different, but, most of the time, the format is 12 characters underscore 12 characters. I am not sure what these are, so if any reader knows, please let me know so I can add it here (full credit given). For the purposes of this article I will refer to it as an identifier string.

Also present is the URL for the location I asked for in Google Maps (orange box), and another identifier string (yellow box). Beyond that is the velvet:query_state:search_result_id string, along with the 16-byte string ending in 0x18 (red box), the 9-byte string (0x01B29CF4AE04120A10 - blue box), and the string and.gsa.d.ssc (yellow box). See Figures 23-1 and 23-2.

Figure 23-1.

Figure 23-2.

So, for those keeping score, let’s review. While each request was slightly different, there were some consistencies between both files. The format, in particular, was fairly close:

  1. The beginning of the file

    1. The 3-byte string: 0x5951EF

    2. The string “car_assistant”

  2. The MP3 audio at the beginning of the file which contains the last audio interaction prior to being sent to a different app (Maps).

  3. BNDL.

  4. Android Auto Version along with the /mic /mic string.

  5. The date/time stamp of when Google Assistant is invoked, which appears just after some padding (0xFF).

  6. A 5-byte string (0xBAF1C8F803) that appears just before the vocal input appears the first time in a file. This string only appears here, and does not appear elsewhere.

  7. An 8-byte string (0x014C604080040200) that appears just before the vocal input, regardless of where it appears within the file.

  8. Text of the vocal input.

  9. BNDL.

  10. Android Auto Version along with the /mic /mic string.

  11. Another date/time stamp of when Google Assistant was invoked (same as the first).

  12. The string webj gearhead* car_assistant <my vocal input> gearhead (what I actually said)

  13. BNDL

  14. What I have decided to call a “latency time stamp,” although, it may indicate the last time any activity was done via Android Auto (including starting Android Auto) prior to the invocation of Google Assistant.

  15. The velvet:query_state:search_result_id string appears along with the 16-byte string ending in 0x12.

  16. Items 7, 8, 9, 10, and 11 recycle.

  17. The velvet:query_state:search_result_id string appears along with the 16-byte string ending in 0x18, which appears at the end of the file.

  18. The 9-byte string 0x01B29CF4AE04120A10 after Item 17.

  19. The string and.gsa.d.ssc that appears after Item 18.
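The list above can be turned into a quick triage check that counts how often each marker appears in a file. This is a sketch only: the dictionary keys are names I made up, and a count says nothing about order or position, so it complements a hex review rather than replacing it.

```python
# Byte patterns taken from the list above; the key names are my own labels.
MARKERS = {
    "header_3_byte": bytes.fromhex("5951EF"),
    "car_assistant": b"car_assistant",
    "bndl": b"BNDL",
    "first_input_5_byte": bytes.fromhex("BAF1C8F803"),
    "input_8_byte": bytes.fromhex("014C604080040200"),
    "search_result_id": b"velvet:query_state:search_result_id",
    "trailer_9_byte": bytes.fromhex("01B29CF4AE04120A10"),
    "gsa_ssc": b"and.gsa.d.ssc",
}


def marker_report(data):
    """Return a dict of marker name -> number of occurrences in data."""
    return {name: data.count(pattern) for name, pattern in MARKERS.items()}


# Example usage:
# raw = open("12067.binarypb", "rb").read()
# for name, count in marker_report(raw).items():
#     print(f"{name:20} {count}")
```

A file with zero hits for car_assistant, for example, would be a candidate for a session that did not come through Android Auto.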

There is some divergence in the files. In 43320, items 7, 8, 9, 10, and 11 recycle a second time, whereas they recycle only once in 12067; 43320 also contains an extra latency time stamp that was not present in 12067. Additionally, 12067 contains some extra data at the end of the file, specifically, the text of the MP3 file at the start of the file, an identifier string, a URL for Maps, and another identifier string.

File 12067 also had some extra data at the beginning that did not appear in 43320.

I also used Android Auto to test sending and receiving messages, and the file that represents that test is 22687.binarypb. There were three sessions on 01/27/2019. The first session, which started at 14:16 (EST) went as follows:

Me: “Read my newest text message.”

// Google Assistant thought for a few seconds //

GA: “You have two new messages. The first one is a chat message from Josh Hickman. Do you want to hear it or skip it?”

Me: “Read it.”

GA: “It says ‘This is a test message for the Google Assistant test. Hopefully, she will read this correctly.’ Do you want to reply, repeat it, or go on to the next one?”

Me: “Go on to the next one.”

GA: “The next one is a chat message from [Number from TracFone]. Do you want to hear it or is that it for now?”

Me: “That is it for now.”

GA: “Ok.”

About two minutes later, at 14:18 (EST), a second session was started. It went as follows:

Me: “Read my newest text messages.”

// Google Assistant thought for a few seconds //

GA: “You have two messages. The first one is a chat message from Josh Hickman. Do you want to hear it or skip it?”

Me: “Hear it.”

GA: “It says ‘Ok, let’s try this one. Make sure you record it. Oh, and don’t forget to document everything for your paper.’ Do you want to reply, repeat it, or go on to the next one?”

Me: “I’ll reply.”

GA: “What’s the message?”

Me: “Sure. I’ll make sure to write down everything before I leave.”

GA: “Here’s your message: ‘Sure. I’ll make sure to write down everything before I leave.’ Do you want to send it or change it?”

Me: “Send it.”

GA: “Sending message.”

About 3 minutes later (14:21 EST) I asked for directions using the same method as before (invocation via a button on my steering wheel). The session went as follows:

Me: “Get me directions to the Smithfield Chicken and Barbecue in Warsaw North Carolina.”

// Google Assistant thought for a few seconds //

GA: “Smithfield Chicken and BBQ is 51 minutes from your location by car and light traffic.”

The first thing I notice is there is a single binarypb file for 01/27/2019 (22687), even though there were three sessions. Inspection of the file finds the 3-byte string, 0x5951EF, is present along with the car_assistant string. There is also a “yoda.” See the orange, blue, and red boxes, respectively, in Figure 24. I carved from the yoda in Figure 24 to the end of the padding in Figure 25 (orange box).

Figure 24.

Figure 25.

The following came out of my computer speakers:

“Smithfield Chicken and BBQ is 51 minutes from your location by car and light traffic.”

Now, this is interesting. The first two sessions, which started at 14:16 and 14:18, did not include anything regarding directions. The third session at 14:21 did involve directions. I wonder if, because the three sessions were so close together, Google Assistant/Android just made one binarypb file to encompass all three. That would require more testing to confirm (or disprove) but is beyond the scope of this exercise and article.

Figures 26-1 and 26-2 show the end of the MP3 data and some familiar data: the Android Auto version information, /mic /mic, and a time stamp. They also show the QueryTriggerType and the webj gearhead* car_assistant gearhead string. See the blue, orange, red, and purple boxes, respectively. The time stamp here is 0x38CCC29068010000. I converted to decimal (

Figure 26-1.

Figure 26-2.

Below that is some more familiar data. The 5-byte string (0xBAF1C8F803) and 8-byte string (0x014C604080040200) appear (blue and green boxes in Figure 27), and there is the vocal input from my request for directions (that occurred roughly three minutes later). There are also variations of the actual vocal input; each variation is designated by a letter (J, K, O, P, and M) (purple boxes). After the variations are the Android Auto version string, the /mic /mic string, and the same time stamp from before (orange and red boxes).

Figure 27-1.

Figure 27-2.

The QueryTriggerType (red box) appears along with the webj gearhead* car_assistant J get me directions to the Smithfield Chicken & BBQ in Warsaw North Carolina gearhead (green box). A BNDL appears (blue box), and then another time stamp (purple). The byte string is 0x171DBC9068010000, and, in decimal is

Figure 28.

After that data is the velvet:query_state:search_result_id string, the accompanying 16-byte string ending in 0x12 (orange box), and a BNDL (blue box). The 8-byte string (0x014C604080040200) appears (green box), my vocal input that started the first session (“read my newest text message” – purple box), and then a BNDL (blue box). After that it is the Android Auto version, /mic /mic (yellow box), and a timestamp (red box). See Figures 29-1 and 29-2. The time stamp here is 0x0385BD9068010000, which is decimal

Figure 29-1.

Figure 29-2.

Also in Figure 29-2 is the QueryTriggerType and the webj gearhead* car_assistant (dark purple box) string.

Figure 30 has some new data in it. A string appears that, while not completely similar, is something I had seen before. It appears to be an identifier string: GeQNOXLPoNc3n_QaG4J3QCw. This is not 12 characters underscore 12 characters, but it is close. Right after the identifier string is my vocal input “read my new text message.” See the red box, and blue box, respectively in Figure 30.

Figure 30.

Figures 31-1 and 31-2 show the two new text messages that were identified. See the blue and red boxes.

Figure 31-1.

Figure 31-2.

Scrolling down a bit I find another identifier string: eQNOXLPoNc3n_QaG4J3QCw (red box). This identifier is the same as the first one, but without the leading “G.” After this identifier is the velvet:query_state:search_result_id and the accompanying 16-byte string ending in 0x12 (orange box). A BNDL appears at the end (green box). See Figure 32.

Figure 32.

Next up is the 8-byte string (0x014C604080040200), and my next vocal input “read it.” Just below my vocal input is the Android Auto version information, /mic /mic, and a time stamp. Just below the time stamp is the QueryTriggerType and the webj gearhead* car_assistant gearhead strings (not pictured). See the blue, orange, red, and purple boxes, respectively, in Figures 33-1 and 33-2. The time stamp here is 0xD796BD9068010000. I converted it to decimal (1548616570583), fired up DCode, and got 01/27/2019 at 14:16:10 (EST). While I was not keeping exact time, this would seem to be when Google Assistant asked me whether or not I wanted to read the chat message from Josh Hickman.

Figure 33-1.

Figure 33-2.

There is another identifier string further down the file: GeQNOXLPoNc3n_QaG4J3QCw and just below it my vocal input “read my newest text message.” See the blue and red boxes, respectively, in Figure 34. This is interesting. Could it be that Google Assistant is associating this newest vocal input (“read it”) with the original request (“read my newest text message”) by way of the identifier string in order to know that the second request is related to the first? Maybe. Confirming this would definitely require some additional research.

Figure 34.

Figures 35 and 36 show the text messages that were new when the request was made.

Figure 35.

Figure 36.

After some gobbly-goo, I found another identifier string: gwNOXPm3FfKzggflxo7QDg (red box). This format is completely different from the previous two I had seen. Maybe this is an identifier for the vocal input “read it.” Maybe it’s a transactional identifier…I am not sure. See Figure 37.

Figure 37.

In Figure 37 you can also see the velvet:query_state:search_result_id and the 16-byte string ending in 0x12 (orange box), a BNDL, (blue box) the 8-byte string (green box), my next vocal input (purple box), and another BNDL.

Figure 38 shows familiar data: Android Auto version, /mic /mic (green box), and a time stamp: 0x27BABD9068010000 (red box). This converts to 1548616579623 in decimal, which is 01/27/2019 at 14:16:19 (EST). As with the previous request, I wasn’t keeping exact time, but this would probably line up with when I said “Go on to the next one.”

Figure 38.

Figure 39 shows the QueryTriggerType string along with webj gearhead* car_assistant string.

Figure 39.

Figure 40 shows that identifier string again, and the vocal input that kicked off this session “read my newest text message.” I am beginning to suspect this is actually some type of transactional identifier to associate “go on to the next one” with “read my newest text message.”

Figure 40.

Figures 41 and 42 show the new text messages.

Figure 41.

Figure 42.

There is another identifier string in Figure 43: jwNOXMv_Ne_B_QbewpK4CQ (blue box). This format is completely new compared to the previous ones. Additionally, the velvet:query_state:search_result_id and the 16-byte string ending in 0x12 (orange box) appear, along with a BNDL (red box), the 8-byte string (green box), my next vocal input (purple box), and another BNDL.

Figure 43.

Figure 44 shows the Android Auto version, /mic /mic (blue box), and another time stamp (red box). This time stamp is 0xC1ECBD9068010000, which converts to 1548616592577. This is 01/27/2019 at 14:16:32 (EST). This probably coincides with my vocal input “That’s it for now.”

Figure 44.

Figure 45 has the QueryTriggerType and webj gearhead* car_assistant.

Figure 45.

Figure 46 shows a few things. The first is the velvet:query_state:search_result_id and the 16-byte string ending in 0x12 (orange box). The second thing is another identifier string, MgNOXNWTAtGp5wL03bHACg (blue box). As before, this format does not match anything I have seen previously.

The third, and the most interesting part, is the start of the second session. The only dividing line here is the velvet:query_state:search_result_id and the 16-byte string ending in 0x12, and BNDL (red box). The green box is the 8-byte string, and the purple box contains my vocal input, “read my newest text messages.” The purple boxes below are variations of what I said.

Figure 46.

Figure 47 shows the Android Auto version string, the /mic /mic string (blue box), and another time stamp (red box). This time stamp is 0x8C28C09068010000. This converts to 1548616738956 in decimal, a Unix Epoch Time value that corresponds to 01/27/2019 at 14:18:58 (EST), which is the time I invoked Google Assistant for the second session.

Figure 47.

The next strings that appear are the QueryTriggerType and webj gearhead* car_assistant strings. See Figure 48.

Figure 48.

The next string is another identifier string. This time, it is associated with my newest vocal input: “read my newest text messages” (blue box). The string is GJgROXL2qNeSIggfk05CQCg (green box). See Figure 49.

Figure 49.

Figures 50 and 51 show the text messages.

Figure 50.

Figure 51.

The next thing I see is a unique identifier string: JgROXL2qNeSIggfk05CQCg (orange box). This string is the same as the previous one (in Figure 49), but without the leading “G.” This behavior is the same that I saw in the first session. Beyond that there is the velvet:query_state:search_result_id and the 16-byte string ending in 0x12 (blue box), a BNDL (red box), the 8-byte string (green box), and my next vocal input, “hear it” (purple box), and another BNDL. See Figure 52.

Figure 52.

Figure 53 shows the Android Auto version string, the /mic /mic string (blue box), and another time stamp (red box). This time stamp is 0x153AC09068010000. This converts to 1548616743445 in decimal, a Unix Epoch Time value that corresponds to 01/27/2019 at 14:19:03 (EST), which would coincide with my vocal input “hear it.”

Figure 53.

The next strings that appear are the QueryTriggerType and webj gearhead* car_assistant strings. See Figure 54.

Figure 54.

Scrolling a bit finds an identifier string I have seen before: GJgROXL2qNeSIggfk05CQCg (green box). This is the first identifier seen in this session (the second one). Just below it is the vocal input that started this session: “read my newest text messages” (red box). See Figure 55.

Figure 55.

Figures 56 and 57 show the messages that were new.

Figure 56.

Figure 57.

Figure 58 shows a pattern I have seen before. First is another identifier string: MAROXPPhAcvn_QaPpI24BA (orange box). The second and third are velvet:query_state:search_result_id and the 16-byte string ending in 0x12 (blue box). There is another BNDL (red box), the 8-byte string (green box), my next vocal input, “I’ll reply” (purple box), and another BNDL. Also note the variations of what I said below my actual input (lower purple boxes).

Figure 58.

Figure 59 shows the Android Auto version string, the /mic /mic string (blue box), and another time stamp (red box). This time the stamp is 0xD85CC09068010000. This converts to 1548616752344 in decimal, which is 01/27/2019 at 14:19:12 (EST) in Unix Epoch Time, which would coincide with my vocal input “I’ll reply.”

Figure 59.

Figure 60 shows the next strings that appear: the QueryTriggerType and webj gearhead* car_assistant strings.

Figure 60.

The next thing of interest is what is seen in Figure 61. There is another identifier string, GPQROXMz3L6qOggfWpKeoCw (blue box). This is not a string we have seen before. Just below it is the vocal input that started this session (red box).

Figure 61.

I had to scroll quite a bit through some Klingon, but eventually I got to the area in Figure 62. The red box shows another identifier string: PQROXMz3L6qOggfWpKeoCw. This is the same string that we saw in Figure 61, sans the leading “G.” Again, this behavior is a pattern that we have seen in this particular file. It causes my suspicion to grow that it is some type of transactional identifier that keeps vocal input grouped together.

Figure 62.
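The pairing described above is easy to test mechanically. The sketch below is only an illustration of that check, using identifiers quoted in this post. The function names are hypothetical, and nothing here is known about how Google Assistant actually generates or uses these identifiers.

```python
def is_g_prefixed_pair(first, second):
    """True when `first` is `second` with a single leading "G" added."""
    return first == "G" + second


def group_by_base(identifiers):
    """Group identifiers so an identifier and its "G"-prefixed form share one key.

    Assumption: a leading "G" only counts as a prefix when the un-prefixed
    form was also seen; otherwise the identifier is kept as its own group.
    """
    seen = set(identifiers)
    groups = {}
    for ident in identifiers:
        base = ident[1:] if ident.startswith("G") and ident[1:] in seen else ident
        groups.setdefault(base, []).append(ident)
    return groups


ids = ["GPQROXMz3L6qOggfWpKeoCw", "PQROXMz3L6qOggfWpKeoCw", "RwROXPzPAcG5gge9s5n4DQ"]
print(is_g_prefixed_pair(ids[0], ids[1]))  # True
print(group_by_base(ids))
```

Grouping this way puts the Figure 61 and Figure 62 strings under one key and leaves the unrelated identifier on its own.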

The second thing is velvet:query_state:search_result_id and the 16-byte string ending in 0x12 (blue box). In Figure 63, there is another BNDL (red box), the 8-byte string (green box), and my next vocal input (the dictated message - purple box). Also note the variations of what I said below my actual input (lower purple boxes). Note that each variation is delineated by a character: B, =, and >.

Figure 63.

Figure 64 shows the Android Auto version string, the /mic /mic string (blue box), and another time stamp (red box). This time the stamp is 0x2192C09068010000. This converts to 1548616765985 in decimal, which is 01/27/2019 at 14:19:25 (EST) in Unix Epoch Time, which would coincide with my dictation of a message to Google Assistant.

Figure 64.

Figure 65 shows the next strings that appear: the QueryTriggerType and webj gearhead* car_assistant strings.

Figure 65.

The next thing of interest is what is seen in Figure 66. There is an identifier string that we have seen before, GPQROXMz3L6qOggfWpKeoCw (blue box), and the initial vocal input that started this session (green box). Again, I am beginning to think this is a method of keeping vocal inputs grouped within the same session.

Figure 66.

Scrolling through yet more Klingon, I find the area shown in Figure 67. It shows another identifier string: RwROXPzPAcG5gge9s5n4DQ (red box). This is a new identifier. The velvet:query_state:search_result_id string and the 16-byte string ending in 0x12 (orange box) are also present. There is another BNDL (blue box), the 8-byte string (green box), my next vocal input, “send it” (purple box), and another BNDL.

Figure 67.

Figure 68 shows the Android Auto version string, the /mic /mic string (blue box), and another time stamp (red box). This time the stamp is 0x16BAC09068010000. This converts to 1548616776214 in decimal, which is 01/27/2019 at 14:19:36 (EST) in Unix Epoch Time, which would coincide with my instructing Google Assistant to send the message I dictated.

Figure 68.

Figure 69 shows the next strings that appear: the QueryTriggerType and webj gearhead* car_assistant strings.

Figure 69.

Figure 70 shows an identifier string, UQROXJ6aGKixggfh64qYDg (orange box), which is new. Just below it are the velvet:query_state:search_result_id string and the 16-byte string ending in 0x12 (blue box). There is another BNDL (red box) and the 8-byte string (green box). Below the 8-byte string is the vocal input that started the third session: “get me directions to the Smithfield Chicken & BBQ in Warsaw North Carolina” (purple box). Figure 71 shows the variations of my vocal input, which are identified by J, K, O, P, and M (purple boxes). Below that is a BNDL. This is the same data seen in Figure 27.

Figure 70.

Figure 71.

Figure 72 shows the Android Auto version string, the /mic /mic string (blue box), and another time stamp (red box). This time the stamp is 0x38CCC29068010000. This converts to 1548616911928 in decimal, which is 01/27/2019 at 14:21:51 (EST) in Unix Epoch Time. This is the same time stamp seen in Figure 26.

Figure 72.
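Since every stamp in this file sits in the same narrow range, the stamps could in principle be located by brute force rather than by scrolling. The following sketch slides an 8-byte window across a buffer and keeps any little-endian value that lands inside a chosen date range. The sample buffer is synthetic (some 0xFF padding, the Figure 72 stamp, and zero bytes), and find_stamps is a hypothetical name, not an existing tool.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def find_stamps(data, earliest, latest):
    """Return (offset, UTC datetime) for each 8-byte little-endian value that,
    read as Unix milliseconds, falls between `earliest` and `latest`."""
    lo = (earliest - UNIX_EPOCH) // ONE_MS
    hi = (latest - UNIX_EPOCH) // ONE_MS
    hits = []
    for offset in range(len(data) - 7):
        value = int.from_bytes(data[offset:offset + 8], "little")
        if lo <= value <= hi:
            hits.append((offset, UNIX_EPOCH + value * ONE_MS))
    return hits


# Synthetic buffer for illustration only; a real binarypb file has far more around the stamp.
sample = b"\xff" * 4 + bytes.fromhex("38CCC29068010000") + b"\x00" * 4
window = (datetime(2019, 1, 27, tzinfo=timezone.utc),
          datetime(2019, 1, 28, tzinfo=timezone.utc))
for offset, when in find_stamps(sample, *window):
    print(offset, when.isoformat())
```

A tight date window keeps false positives down, because random bytes rarely decode to a value within a single day. Hits should still be confirmed against the surrounding hex.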

Figure 73 shows the next strings that appear: the QueryTriggerType and webj gearhead* car_assistant strings.

Figure 73.

Scrolling down oh so slightly, I find the text of the MP3 file at the beginning of the file (blue box), and the Google Maps URL for the location for which I had asked for directions (green box). See Figure 74. Figure 75 shows another identifier string, 1wROXOOFJc7j_Aa72aPAB (orange box). After the identifier string is the velvet:query_state:search_result_id string. Additionally, the 16-byte string ending in 0x18 (green box), the 9-byte string (0x01B29CF4AE04120A10 - blue box), and the string and.gsa.d.ssc (red box) appear.

Figure 74.

Figure 75.

Comparisons

So, what is the final score? If you’re still reading, let’s recap, and include what we found in the 22687.binarypb file. The differences are in italics:

  1. The beginning of the file

    1. The 3-byte string: 0x5951EF

    2. The string “car_assistant”

  2. The MP3 audio at the beginning of the file which contains the last audio interaction prior to being sent to a different app (Maps).

  3. BNDL.

  4. Android Auto Version along with the /mic /mic string.

  5. The date/time stamp of when Google Assistant is invoked, which appears just after some padding (0xFF). In the third file, 22687, the time stamp is the time for the third session.

  6. A 5-byte string (0xBAF1C8F803) that appears just before the vocal input appears the first time in a file. This string only appears here, and does not appear elsewhere. In the third file, 22687, this appeared before the first vocal input, which, as it turns out, is the vocal input that started the third session.

  7. An 8-byte string (0x014C604080040200) that appears just before the vocal input, regardless of where and how many times it appears within the file.

  8. Text of the vocal input.

  9. BNDL.

  10. Android Auto Version along with the /mic /mic string.

  11. Another date/time stamp of when Google Assistant was invoked (same as the first).

  12. The string webj gearhead* car_assistant <my vocal input> gearhead (what I actually said). This item only appeared once in 22687 (the inquiry asking for directions).

  13. BNDL

  14. What I have decided to call a “latency time stamp,” although, it may indicate the last time any activity was done via Android Auto (including starting Android Auto) prior to the invocation of Google Assistant. In 22687, this only happened once.

  15. The velvet:query_state:search_result_id string appears along with the 16-byte string ending in 0x12.

  16. Items 7, 8, 9, 10, and 11 recycle.

  17. The velvet:query_state:search_result_id string appears along with the 16-byte string ending in 0x18, which appears at the end of the file.

  18. The 9-byte string 0x01B29CF4AE04120A10 after Item 17.

  19. The string and.gsa.d.ssc that appears after Item 18.
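The fixed markers in the list above lend themselves to a simple offset scan. Here is a rough Python sketch that reports where each marker occurs in a buffer, in file order. The marker byte values come from this post. The dictionary keys, function names, and sample buffer are made up for illustration, and the sample is not a real binarypb layout.

```python
# Marker values as observed in this post; the key names are hypothetical labels.
MARKERS = {
    "car_assistant": b"car_assistant",
    "bndl": b"BNDL",
    "first_vocal_input_5_byte": bytes.fromhex("BAF1C8F803"),
    "vocal_input_8_byte": bytes.fromhex("014C604080040200"),
    "search_result_id": b"velvet:query_state:search_result_id",
    "end_9_byte": bytes.fromhex("01B29CF4AE04120A10"),
    "and_gsa_d_ssc": b"and.gsa.d.ssc",
}


def find_all(data, needle):
    """Return every offset at which `needle` occurs in `data`."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def scan_markers(data):
    """Return (offset, marker name) pairs sorted by offset."""
    hits = [(off, name) for name, needle in MARKERS.items() for off in find_all(data, needle)]
    return sorted(hits)


# Synthetic buffer for illustration only.
sample = (bytes.fromhex("5951EF") + b"car_assistant" + b"BNDL"
          + MARKERS["first_vocal_input_5_byte"] + MARKERS["vocal_input_8_byte"]
          + b"send it" + b"BNDL")
for offset, name in scan_markers(sample):
    print(f"{offset:#06x}  {name}")
```

Against a real file, the data would come from something like open(path, "rb").read(), and the resulting offset list gives a quick map of where to look in the hex editor.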

Conclusions

I feel comfortable enough at this point to draw a few conclusions based on my observations.

  1. Each binarypb file will start by telling you where the request is coming from (car_assistant).

  2. What is last chronologically is first in the binarypb file. Usually, this is Google Assistant’s response (MP3 file) to a vocal input just before being handed off to whatever service (e.g. Maps) you were trying to use. The timestamp associated with this is also at the beginning of the file.

  3. A session can be broken down into micro-sessions. I will call them vocal transactions.

  4. Vocal transactions have a visible line of demarcation by way of the 16-byte string ending in 0x12.

  5. A BNDL starts a vocal transaction, but also further divides the vocal transaction into smaller chunks.

  6. The first vocal input in the binarypb file is marked by a 5-byte string: 0xBAF1C8F803, regardless of when, chronologically, it occurred in the session.

  7. Each vocal input is marked by an 8-byte string: 0x014C604080040200. While the 5-byte string appears only before the first vocal input in the binarypb file, the 8-byte string appears just prior to each vocal input.

  8. When Google Assistant doesn’t think it understands you, it generates different variations of what you said…candidates…and then selects the one it thinks you said.

  9. In sessions where Google Assistant needs to keep things tidy, it will assign an identifier. There does not appear to be any consistency (as far as I can tell) as to the format of these identifiers.

  10. The end of the final vocal transaction is marked by a 16-byte string ending in 0x18.

Figure 76 shows a visual version of the session files overall, and Figure 77 shows the vocal transactions portion in more detail.

Figure 76.

Figure 77.
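As a rough illustration of how conclusion 7 might be put to work, the sketch below looks for the 8-byte marker and pulls the first run of printable ASCII that follows it. This rests on assumptions not confirmed here: that the text starts within a few hundred bytes of the marker, and that it is plain ASCII. The exact encoding between the marker and the text has not been worked out, so any length or tag bytes are simply skipped. The function name and sample buffer are hypothetical.

```python
import re

VOCAL_INPUT_MARKER = bytes.fromhex("014C604080040200")
# Assumption: vocal input is stored as a run of at least 3 printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{3,}")


def extract_vocal_inputs(data, window=256):
    """Return (offset, text) for the first printable run after each 8-byte marker.

    Only the `window` bytes after the marker are searched, so a longer
    input would be cut off at the window boundary.
    """
    results = []
    pos = data.find(VOCAL_INPUT_MARKER)
    while pos != -1:
        start = pos + len(VOCAL_INPUT_MARKER)
        match = PRINTABLE_RUN.search(data, start, start + window)
        if match:
            results.append((match.start(), match.group().decode("ascii")))
        pos = data.find(VOCAL_INPUT_MARKER, start)
    return results


# Synthetic buffer for illustration only.
sample = (VOCAL_INPUT_MARKER + b"read my newest text messages" + b"\x00BNDL\x00"
          + VOCAL_INPUT_MARKER + b"\x0a\x07hear it\x00")
for offset, text in extract_vocal_inputs(sample):
    print(offset, text)
```

Input containing non-ASCII characters (a curly apostrophe, for example) would be split or missed by this pattern, so the output is a starting point for manual review rather than a finished parse.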

What’s Next?

So, there is definitely a format to these files, and I believe there are enough markers that someone could create a tool to parse them to make them easier to read and examine. They contain what a device owner said…what they actually said…to their car/device. This data could be extremely valuable to examiners and investigators, regardless of the venue in which they operate (civil or criminal).

If I had scripting skills, I would try to see if something could be written to parse these files; however, alas, I do not. I have zero scripting skills. Most of what I do is via Google and Ctrl/Cmd-C and Ctrl/Cmd-V. If any reader can do this, and is willing, please let me know and I will post your product here and give full credit. It would be awesome.

The second part of this post is forthcoming. If you can’t wait, here’s a sneak peek: there are some similarities…

DFIR Review

The author provides clear documentation of the process and helpful analysis of the raw data. Clear conclusions are provided. The discoveries related to timestamps, geo-location information, and file artifacts were interesting.

Future Work

A step-by-step process may be helpful to allow other examiners to repeat the same process. A parser could be created to help determine what was said, possibly when it was said, and where it might have happened.

Reviewers

  • Terrence Nemayire (Methodology Review)

  • Joshua I. James (Methodology Review)
