Forensics Question: Can Google Takeout Location Data be trusted?
OS Version: Google Takeout data exported on 25th September 2020; data generated using Android 10 device
Tools: Google Pixel 3 running Android 10 to produce data, Mozilla Firefox (JSON viewer)
The Rise of Cloud Data
Data in the cloud is becoming increasingly important to forensic examinations. Services such as those offered by Google and Apple collect a vast quantity of data about their users, and in recent years there has been a shift towards allowing users to access this data too. Google's Takeout service allows all data held by Google about a user to be downloaded.
One of the most interesting artefacts within the Google Takeout is location data. With Android phones recording their users' every move, often without them realising, this data can be a trove of critical information that provides both potential evidence and alibis. As the data is downloaded directly from Google, there is a temptation to blindly trust this data. This leads to an important question: could a user alter the location data held by Google in order to mislead an investigator?
Location Manipulation
Manipulating the stored location data is trivial. When logged into the user’s account, visiting https://google.com/locationhistory allows a user to browse their location data by date. It is a comprehensive record of journeys and visited locations. It also gives the option to ‘remove this stop’, effectively erasing the visit from the location history. Not only that, it allows a user to add a stop, creating a log of a visit that never actually happened.
The below screenshot shows what is presented to the user when first accessing their Location History.
Experiments were performed by capturing the Google Takeout for the Location History data, removing a stop and re-downloading the Takeout, then finally adding a fictional stop in its place and downloading the Takeout again. The Takeouts were then compared to identify the impacts of the manipulation. The screenshots below depict the original unedited trip log showing a trip from home to the office and back (left), the trip log after deletion of the stop at the office (right), and finally with the fictional stop at the local shopping centre, ‘intu Metrocentre’, in its place (bottom).
After editing, the Location History map displayed inconsistencies: it attempted to combine the real location data for waypoints on the journey with sudden jumps to the edited location. The map on the left below was captured after the deletion, and still shows the journey to the office even though the ‘stop’ no longer exists. The map on the right shows the addition of the ‘Metrocentre’ stop, where the journey can be seen to be confused between the original and edited locations. Unfortunately, investigators do not usually use the native website, relying upon the Takeout data instead.
Google Takeout Analysis
Two sources of location data are provided by Google in the Takeout.
The first source is the ‘Location History.json’ file. This provides raw coordinates and timestamps with limited further context. Comparison of this file across the three Takeouts revealed no differences, other than some additional locations that were recorded whilst logging in to download the earlier Takeouts. Ultimately, deleting or adding stops does not change the raw location data in this file.
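The comparison above can be approximated with a short script. The sketch below is a minimal illustration rather than the method used in this paper. It assumes the 2020-era layout of ‘Location History.json’ (a top-level ‘locations’ array whose items carry ‘timestampMs’, ‘latitudeE7’ and ‘longitudeE7’), a layout this paper does not document, and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a Takeout JSON file (UTF-8) into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def raw_points(data):
    """Return a set of (timestampMs, latitudeE7, longitudeE7) tuples.

    Assumes the layout {"locations": [{"timestampMs": "...",
    "latitudeE7": int, "longitudeE7": int, ...}, ...]}; other fields
    (accuracy, activity, etc.) are ignored.
    """
    return {
        (str(p["timestampMs"]), p["latitudeE7"], p["longitudeE7"])
        for p in data.get("locations", [])
    }


def compare_raw(before, after):
    """Report raw points present in only one of two parsed exports."""
    b, a = raw_points(before), raw_points(after)
    return {"only_before": sorted(b - a), "only_after": sorted(a - b)}


# Example (paths are hypothetical):
# diff = compare_raw(load_json("takeout1/Location History.json"),
#                    load_json("takeout2/Location History.json"))
```

Points appearing only in a later export are expected (for example, locations recorded while logging in to download it); points missing from a later export would warrant closer attention.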
The second source of data is the set of files within the Semantic Location History. These are presented as separate JSON files for each month and record, in a more readable format, details about locations visited and journeys undertaken, adding Google’s interpretation of the raw data. To oversimplify, they consist largely of ‘activitySegment’ entries reflecting journeys and ‘placeVisit’ entries reflecting places visited. It is in these files that the editing has an effect.
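To see the alternating structure described above, it can help to list the entry type of each record in a month file. This is a minimal sketch assuming the 2020-era layout, in which each month file holds a ‘timelineObjects’ array whose items wrap either an ‘activitySegment’ or a ‘placeVisit’; the function name is hypothetical and the input is the already-parsed JSON.

```python
def timeline_kinds(data):
    """Return the entry type of each timeline object, in file order.

    Assumes the monthly file layout {"timelineObjects": [
    {"activitySegment": {...}} or {"placeVisit": {...}}, ...]}.
    Anything else is reported as "other".
    """
    kinds = []
    for obj in data.get("timelineObjects", []):
        if "activitySegment" in obj:
            kinds.append("activitySegment")
        elif "placeVisit" in obj:
            kinds.append("placeVisit")
        else:
            kinds.append("other")
    return kinds
```

For an unedited day with a trip to a single location, one would expect to see an activitySegment, then a placeVisit, then another activitySegment.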
When a location is deleted, its entry is simply removed from the Semantic Location History; the JSON files are otherwise identical before and after the removed entry.
When a location was added, two additional entries were created and one existing entry was updated in the Semantic Location History. The remaining entries were untouched.
The activitySegment preceding the added location was updated, retaining some original data alongside data reflecting the fictional visit. The ‘endLocation’ specifies the added location. Information such as the activityType (‘IN_PASSENGER_VEHICLE’) and its confidence levels, the deviceTag, and the startTimestampMs is retained from the unedited entry. The endTimestampMs value matches the start time of the added location as specified by the user.
A placeVisit entry is then inserted. This is a simple entry, reflecting the times and location specified by the user.
An activitySegment is then inserted. Its start location is the location specified by the user, and its startTimestampMs matches the end time specified by the user. Its end location matches the start location of the following, genuine activitySegment, and its endTimestampMs matches the start time of that following segment.
These changes were more substantial than expected, not only adding the location specified, but, in effect, inventing a journey to and from the location to fit with the surrounding genuine data.
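The added, updated and removed entries described above can be located by comparing two exports of the same month. The sketch below is one possible approach, not the procedure used in this paper. It keys each entry by its type and ‘duration.startTimestampMs’ (the ‘duration’ nesting is an assumption about the 2020-era format); because the updated activitySegment kept its original startTimestampMs, that key pairs it with its unedited version. Function names are hypothetical.

```python
def entry_key(obj):
    """Key a timeline object by (type, duration.startTimestampMs).

    Assumption: each entry holds a "duration" dict with "startTimestampMs".
    """
    for kind in ("activitySegment", "placeVisit"):
        if kind in obj:
            start = obj[kind].get("duration", {}).get("startTimestampMs")
            return (kind, str(start))
    return ("other", "")


def diff_timeline(before, after):
    """Classify entries as removed, added or updated between two parsed exports."""
    old = {entry_key(o): o for o in before.get("timelineObjects", [])}
    new = {entry_key(o): o for o in after.get("timelineObjects", [])}
    return {
        "removed": sorted(k for k in old if k not in new),
        "added": sorted(k for k in new if k not in old),
        # Same key in both exports but different content: edited in place.
        "updated": sorted(k for k in old if k in new and old[k] != new[k]),
    }
```

Applied to the exports before and after the fictional stop was added, this approach would be expected to report one updated activitySegment and two added entries, matching the observations above.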
Identification of Manipulation
Given that the ‘Location History.json’ file is unaltered, any indicators of edits must be found in the Semantic Location History files.
The deleted location does not leave behind any placeholder entry to show that it previously existed, so the only potential indicators are inconsistencies between the entries immediately before and after the removed entry. Only a single indicator was identified: two ‘activitySegment’ entries in a row. This occurred only where a location (stored as a ‘placeVisit’ entry) had been deleted. Whilst multiple ‘placeVisit’ entries can occur in succession, ‘activitySegment’ entries do not naturally do so. The method is not foolproof, however: if the deleted entry was one of a run of ‘placeVisit’ entries, its deletion will not be detectable in this way.
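The consecutive-activitySegment check lends itself to automation. This minimal sketch assumes the ‘timelineObjects’ layout of the monthly files (not documented in this paper) and uses a hypothetical function name; as noted above, it cannot detect a deletion from a run of placeVisit entries.

```python
def consecutive_segments(data):
    """Return index pairs (i, i + 1) where two activitySegment entries are adjacent.

    Adjacent activitySegments may indicate a deleted placeVisit.
    Assumes the {"timelineObjects": [...]} layout of a parsed month file.
    """
    objs = data.get("timelineObjects", [])
    return [
        (i, i + 1)
        for i in range(len(objs) - 1)
        if "activitySegment" in objs[i] and "activitySegment" in objs[i + 1]
    ]
```

Each reported pair marks a place in the timeline worth cross-checking against the raw data.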
The added fictional location differed from genuine location data in a number of ways. These differences can be used to identify where manipulation has taken place.
1. Precision of timestamps. Genuine entries are recorded to millisecond accuracy, whereas the editing functions only allow times to be specified to the minute. This means edited timestamps end in at least four trailing zeroes.
2. locationConfidence and visitConfidence values. Edited entries are set to 100.0 (note that the Firefox JSON Viewer displays this as just 100, but the raw data is ‘100.0’), suggesting that Google is fully confident the location is correct, as the user specified it. A small number of genuine entries were also found to be set to 100.0, so whilst these values can be an indicator of editing, they are not definitive.
3. placeConfidence values. This is set to ‘USER_CONFIRMED’ for edited entries. Genuine entries state ‘HIGH_CONFIDENCE’ or similar.
4. editConfirmationStatus values. This is set to ‘CONFIRMED’ for edited entries. Genuine entries are set to ‘NOT_CONFIRMED’.
5. activityType values. This is set to ‘UNKNOWN_ACTIVITY_TYPE’ for edited entries. Genuine entries record the mode of travel (e.g. ‘WALKING’ or ‘IN_PASSENGER_VEHICLE’). No other unknown values were identified, suggesting this may be a reliable indicator of edits. No method of manually assigning an activity type to an edit was identified.
6. A ‘parkingEvent’ subentry for the activitySegment preceding the inserted location was found to be retained from before the edits. This left a discrepancy between the end location and the parking location, an infeasible mismatch.
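The indicators listed above can be checked mechanically. The sketch below flags a single placeVisit or activitySegment and is illustrative only. The field names come from this paper, but their nesting (for example ‘locationConfidence’ inside ‘location’, and ‘parkingEvent.location’ holding ‘latitudeE7’/‘longitudeE7’) is an assumption about the 2020-era format, and the 1 km parking threshold is a hypothetical value that was not tested here.

```python
import math

MINUTE_MS = 60_000
# Hypothetical threshold: how far a parkingEvent may sit from endLocation
# before the pair is flagged. Not derived from this paper; tune per case.
PARKING_MISMATCH_M = 1_000


def minute_aligned(ts):
    """True if an epoch-millisecond timestamp falls exactly on a minute.

    Minute-aligned values are multiples of 60000, so they end in at least
    four zeroes; a genuine millisecond value does so about 1 time in 60000.
    """
    return int(ts) % MINUTE_MS == 0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def visit_flags(pv):
    """Indicators 1 to 4 for a placeVisit dict (field nesting assumed)."""
    d = pv.get("duration", {})
    flags = []
    if any(minute_aligned(d[k]) for k in ("startTimestampMs", "endTimestampMs") if k in d):
        flags.append("minute-precision timestamp")
    # 100.0 parses as a float, which compares equal to 100.
    if pv.get("location", {}).get("locationConfidence") == 100 or pv.get("visitConfidence") == 100:
        flags.append("confidence 100.0 (weak indicator)")
    if pv.get("placeConfidence") == "USER_CONFIRMED":
        flags.append("placeConfidence USER_CONFIRMED")
    if pv.get("editConfirmationStatus") == "CONFIRMED":
        flags.append("editConfirmationStatus CONFIRMED")
    return flags


def segment_flags(seg):
    """Indicators 5 and 6 for an activitySegment dict (field nesting assumed)."""
    flags = []
    if seg.get("activityType") == "UNKNOWN_ACTIVITY_TYPE":
        flags.append("activityType UNKNOWN_ACTIVITY_TYPE")
    end = seg.get("endLocation")
    park = seg.get("parkingEvent", {}).get("location")
    if end and park:
        dist = haversine_m(end["latitudeE7"] / 1e7, end["longitudeE7"] / 1e7,
                           park["latitudeE7"] / 1e7, park["longitudeE7"] / 1e7)
        if dist > PARKING_MISMATCH_M:
            flags.append(f"parkingEvent {dist:.0f} m from endLocation")
    return flags
```

Because confirmed genuine visits share indicators 2 to 4, a flagged entry is a lead for further examination rather than proof of editing.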
Note that Google will prompt the user within the Location History to ‘confirm’ a genuine location if it is not certain. Items 2, 3 and 4 of the indicators above were found to be shared with such confirmed entries. Item 1, however, is recorded as would be expected for a genuine entry (millisecond precision), allowing confirmed entries to be distinguished from edited ones. Item 5 generally does not apply, as it relates to activitySegments, whereas confirming a visit updates the placeVisit.
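The distinction above can be expressed as a rough triage rule. This sketch is a hypothetical heuristic built only on the observations in this paper (it was not validated against further datasets): a visit carrying the confirmation markers is treated as edited when both of its timestamps are minute-aligned, and as user-confirmed otherwise. Field nesting is assumed, as before.

```python
MINUTE_MS = 60_000


def classify_visit(pv):
    """Roughly classify a placeVisit as 'edited', 'user-confirmed' or 'unconfirmed'.

    Confirmed and edited visits share placeConfidence USER_CONFIRMED and
    editConfirmationStatus CONFIRMED, but only edited visits were observed
    with minute-precision timestamps.
    """
    confirmed = (
        pv.get("placeConfidence") == "USER_CONFIRMED"
        or pv.get("editConfirmationStatus") == "CONFIRMED"
    )
    if not confirmed:
        return "unconfirmed"
    d = pv.get("duration", {})
    stamps = [int(d[k]) for k in ("startTimestampMs", "endTimestampMs") if k in d]
    # Requiring both stamps to be aligned reduces chance false positives.
    if stamps and all(ts % MINUTE_MS == 0 for ts in stamps):
        return "edited"
    return "user-confirmed"
```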
Finally, given that the ‘Location History.json’ file is not changed by manipulation of the location data, it is possible to cross-reference the locations recorded within that file against those in the Semantic Location History, to identify where each source places the user in a different location at around the same time. These discrepancies appear to be the cause of the suspicious maps discussed earlier. Such circumstances would suggest that the location data had been manipulated.
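A minimal sketch of this cross-referencing follows, assuming the 2020-era layouts of both files (‘locations’ in the raw file, ‘timelineObjects’ with ‘placeVisit.location’ and ‘placeVisit.duration’ in the semantic file). For each placeVisit it finds the raw points recorded during the visit and flags the visit if even the nearest of them is further away than a threshold. The 2 km threshold and all names are hypothetical, and using the nearest point is a design choice to tolerate noisy GPS fixes.

```python
import math

# Hypothetical threshold; not taken from this paper.
MAX_OFFSET_M = 2_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cross_reference(raw, semantic, max_offset_m=MAX_OFFSET_M):
    """List placeVisits contradicted by raw points recorded during the visit.

    raw: parsed 'Location History.json'; semantic: one parsed month file.
    Returns (name, start ms, metres to nearest raw point) per suspect visit.
    """
    points = [
        (int(p["timestampMs"]), p["latitudeE7"] / 1e7, p["longitudeE7"] / 1e7)
        for p in raw.get("locations", [])
    ]
    hits = []
    for obj in semantic.get("timelineObjects", []):
        pv = obj.get("placeVisit")
        if not pv:
            continue
        loc, d = pv.get("location", {}), pv.get("duration", {})
        if "latitudeE7" not in loc or "startTimestampMs" not in d:
            continue
        start = int(d["startTimestampMs"])
        end = int(d.get("endTimestampMs", d["startTimestampMs"]))
        lat, lon = loc["latitudeE7"] / 1e7, loc["longitudeE7"] / 1e7
        dists = [haversine_m(lat, lon, plat, plon)
                 for ts, plat, plon in points if start <= ts <= end]
        # Flag only if every raw point in the window is far from the visit.
        if dists and min(dists) > max_offset_m:
            hits.append((loc.get("name", "?"), start, round(min(dists))))
    return hits
```

Visits with no raw points inside their time window are not flagged by this sketch, so gaps in the raw data would need separate review.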
Conclusion
The location data available within Google Takeout could be critical to help support or refute facts of a case, or to generate intelligence leads for further investigation. It is therefore essential that examiners are able to trust this data, but it is trivial for a user to edit their location data using tools readily provided by Google.
Such editing can, if care is not taken, easily mislead an examiner. This paper highlights a number of indicators that examiners can rely on to establish whether editing has taken place, the easiest to identify being:
Comparison of location data in the Semantic Location History against the ‘Location History.json’, which does not appear to be editable.
Precision of timestamps. Genuine entries are recorded to millisecond accuracy, whereas the editing functions only allow times to be specified to the minute. This means edited timestamps end in at least four trailing zeroes.
activityType values are set to ‘UNKNOWN_ACTIVITY_TYPE’ for edited entries.
Two ‘activitySegment’ entries are in a row, indicating deletion of a ‘placeVisit’ entry.
Ross Donnelly
Digital Team Leader
Keith Borer Consultants
The details shared provide investigators with indicators that can be used to determine the validity of Google Takeout data for a subject. The author explains the steps taken to establish the proposed process, which other interested investigators should be able to follow.
It is important to consider applications that can spoof the phone’s GPS position, and their impact on Google Maps. The author discusses the “comprehensive record of journeys and visited locations”; however, this data is only recorded if the user allows it. One of the reviewers also found a “placeVisit” entry instead of an “activitySegment” entry when testing with their own data. It may be worth analyzing this discrepancy, as the reviewer suggests it may depend on individual setups.
Additional work could explore the possibility of modifying the GPS data received by Google Maps. A study of the accuracy of the data depending on the type of journey (on foot, by bike, by car, etc.) could also be conducted. A script could be used to compare the “Location History.json” data with the data from the Semantic Location History to highlight possible manipulation.
Adrien Vincart (Methodology Review)
Eric Eppley (Methodology Review)
Manon Fischer (Methodology Review and Validated Review using Reviewer Generated Datasets)