When your car becomes a data vault. An unexpected lesson from the Spotify scrape.

Nine years ago, in a meeting room at Volvo Cars, Guy Fletcher, OBE (then Chairman of PRS for Music), and Thor Pettersen made an argument that seemed unrealistic at the time. They insisted that data was becoming the new oil and that collecting data from the dashboard would be a goldmine. I wish they had been wrong...

On December 19, 2025, news broke that hackers had systematically scraped almost the entire Spotify music library, extracting metadata for 256 million tracks and audio files for 86 million songs. The scope was staggering: about 300 terabytes of data, covering tracks that account for 99.6 percent of all streams on the platform. The incident is a stark reminder that digital content and data live in environments far more vulnerable than most users realize.

The connection to automotive security might not seem obvious at first, but it is direct and urgent. When preparing to sell a personal vehicle, the same data hygiene habits used to wipe a computer hard drive are now necessary for the car itself.

Modern cars are networked sensor platforms that continuously gather, process, and share large amounts of personal and behavioral data.

The dashboard is a data collection infrastructure

The car dashboard and infotainment system now serve as a hub for computing and communications. Navigation, entertainment, communications, diagnostics, and software updates all go through this interface. European data protection authorities explicitly classify connected vehicle data as personal data when it includes location histories, driving behavior, device identifiers, or content linked to identifiable individuals.

Music listening habits illustrate the range of data that can be collected. When users log in to native infotainment apps or connect their smartphones via projection systems, metadata flows into vehicle systems and cloud accounts. Timestamps, track IDs, playlists, session lengths, and voice commands used to select content all record behavior. Over time, these metadata streams can be linked to a person through the vehicle account, the paired mobile device, or persistent identifiers such as the vehicle identification number.
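To make the linkage concrete, here is a minimal sketch of how playback metadata could be tied to a person through a persistent identifier. Everything in it is hypothetical and for illustration only: the PlaybackEvent fields, the link_events_to_accounts function, and the lookup table from vehicle identification number to account do not describe any real manufacturer's system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackEvent:
    """One hypothetical infotainment log entry."""
    timestamp: str  # ISO 8601 string
    track_id: str
    vin: str        # vehicle identification number


def link_events_to_accounts(events, vin_to_account):
    """Group playback events by the account registered to each VIN.

    Events whose VIN has no known account are left out.
    """
    profiles = defaultdict(list)
    for event in events:
        account = vin_to_account.get(event.vin)
        if account is not None:
            profiles[account].append((event.timestamp, event.track_id))
    return dict(profiles)


if __name__ == "__main__":
    sample = [
        PlaybackEvent("2025-12-19T08:15:00", "track-1", "VIN-A"),
        PlaybackEvent("2025-12-19T08:19:00", "track-2", "VIN-A"),
        PlaybackEvent("2025-12-19T09:00:00", "track-3", "VIN-B"),
    ]
    print(link_events_to_accounts(sample, {"VIN-A": "owner@example.com"}))
```

The point of the sketch is how little is needed: a single stable key is enough to turn anonymous-looking timestamps and track IDs into a per-person listening history.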

The issue goes far beyond entertainment. Modern vehicles with advanced driver assistance systems gather continuous streams of camera and sensor data about their surroundings, location, speed, acceleration, steering, and braking patterns. Many also include microphones and cameras for cabin features or voice interfaces. Even when raw audio or video isn't uploaded constantly, event-triggered uploads and metadata remain highly sensitive.

National security dimensions of cross-border data flows

This data environment becomes a geopolitical concern when connectivity infrastructure and software ecosystems are controlled by entities subject to foreign state jurisdiction. The United States Department of Commerce issued a rule in January 2025 restricting certain connected vehicles and key connectivity components linked to China or Russia, explicitly grounding the action in national security risks related to data access and remote control surfaces. The rule considers foreign jurisdiction over vehicle software, connectivity modules, and over-the-air update channels as potential pathways for state-mandated data sharing or remote access.

The concern is real, not hypothetical. China's National Intelligence Law requires organizations and citizens to support and cooperate with national intelligence efforts. This law is often mentioned in policy debates as a reason why corporate assurances alone might not fully address national security risks. European regulators are also starting to consider geopolitical and supply chain risks in connected vehicle cybersecurity strategies, noting that current technical safeguards may be inadequate if they overlook foreign government influence over suppliers and data routes.

The confidentiality risk occurs when sensor, location, identity, and usage data are routed through vendor-controlled systems that might include offshore operations, foreign jurisdictional reach, or opaque subcontractors. Even if individual data points seem harmless, aggregation at the fleet level can reveal sensitive patterns of infrastructure use, mobility flows, and daily routines among targeted populations. Integrity risk arises because the same connectivity that enables software updates and remote diagnostics also creates potential remote access points for unauthorized actors.
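A small sketch shows why aggregation matters even when each data point looks harmless. It assumes location pings arrive as (timestamp, latitude, longitude) tuples, snaps them to a coarse grid, and reports the cell seen most often at night, which is a common proxy for where a vehicle is parked at home. The function names, the grid precision, and the night-hour window are illustrative assumptions, not a description of any real telemetry pipeline.

```python
from collections import Counter
from datetime import datetime


def grid_cell(lat, lon, precision=3):
    """Snap a coordinate to a coarse grid (3 decimals is roughly 100 m)."""
    return (round(lat, precision), round(lon, precision))


def most_common_night_cell(pings, night_hours=range(0, 6)):
    """Return the grid cell seen most often during night hours, or None.

    `pings` is an iterable of (iso_timestamp, lat, lon) tuples.
    """
    counts = Counter(
        grid_cell(lat, lon)
        for ts, lat, lon in pings
        if datetime.fromisoformat(ts).hour in night_hours
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    sample = [
        ("2025-01-01T02:00:00", 57.7089, 11.9746),
        ("2025-01-02T03:00:00", 57.7091, 11.9748),
        ("2025-01-02T14:00:00", 57.7000, 11.9000),
    ]
    print(most_common_night_cell(sample))
```

Run across a fleet rather than one car, the same counting step would surface commuting corridors and facility usage, which is the fleet-level pattern the paragraph above describes.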

The automotive data lifecycle and AI training pipelines

Beyond immediate privacy and security issues, connected vehicles produce high-quality training data for spatial intelligence and world models. These systems do not process language but focus on space, movement, context, and changes in the physical environment. Detailed location trails, routes, sensor observations linked to time and movement, and behavioral telemetry are valuable inputs for machine learning systems that model the physical world.

This is a natural extension of the web scraping methods that led to the creation of large language models. Previously, technology companies collected text, images, and videos from the internet, often without clear consent and in legal gray areas. This laid the groundwork for generative AI. Now, the focus is on passive, ongoing collection of geospatial and behavioral data from everyday products at scale, with little public awareness of what this data can do or how it might be used later.

The strategic aspect is that large-scale passive data collection can boost capabilities in foreign countries without data subjects understanding the downstream uses or cross-border effects. It is not necessary to assume a single severe misuse scenario. It is enough to recognize that, at scale, vehicle-derived geospatial and behavioral data has lasting value, and that geopolitical rivalry makes building such capabilities a logical goal for both states and companies.

Practical steps for data hygiene before vehicle handover

The Spotify scrape highlights that digital content and metadata can be extracted on a large scale when access controls, rate limiting, and digital rights management are inadequate. Vehicles present a similar vulnerability. When selling or transferring a car, data hygiene practices that are common on personal computers now need to extend to the car itself.

  1. Disconnect all paired devices. Bluetooth connections, smartphone integrations, and linked accounts create ongoing links between the vehicle and personal identities. Removing these connections prevents the next owner from inheriting contact lists, message histories, or location data associated with previous use.
  2. Erase all stored data from infotainment systems. Navigation history, saved destinations, call logs, and media libraries should be cleared. Many modern vehicles provide a factory reset option within the settings menu. If no such option is available, consult the manufacturer's documentation or dealership support to confirm that all data has been removed.
  3. Log out of all cloud-connected accounts. Services integrated into vehicle operating systems often keep persistent login sessions. Logging out ensures that future vehicle activity doesn't feed into personal cloud accounts or appear in activity logs linked to the original owner.
  4. Review and revoke application permissions and data sharing settings. Many infotainment systems include opt-in analytics programs, voice recording retention, and third-party data sharing. Disabling these features before handover limits the data accessible to future users and third parties.
  5. Verify that over-the-air update credentials and remote service accounts are disconnected. Some manufacturers maintain vehicle service accounts that remain active across ownership transfers unless they are explicitly deactivated. Closing these accounts prevents previous owners from retaining unauthorized access to vehicle diagnostics, remote start features, or location tracking.
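For anyone who prefers a checklist they can tick off, the five steps above can be sketched as a small tracker. This is only an illustration: the step keys and the HandoverChecklist class are hypothetical names, and the code records what a person has done by hand. It does not talk to any vehicle or manufacturer service.

```python
from dataclasses import dataclass, field

# The five handover steps from the list above; the short keys are hypothetical.
HANDOVER_STEPS = {
    "unpair_devices": "Disconnect all paired devices",
    "erase_infotainment": "Erase stored infotainment data",
    "logout_cloud": "Log out of cloud-connected accounts",
    "revoke_permissions": "Revoke app permissions and data sharing",
    "close_remote_accounts": "Disconnect OTA credentials and remote service accounts",
}


@dataclass
class HandoverChecklist:
    """Tracks which handover steps have been completed manually."""
    done: set = field(default_factory=set)

    def complete(self, step: str) -> None:
        if step not in HANDOVER_STEPS:
            raise KeyError(f"unknown step: {step}")
        self.done.add(step)

    def remaining(self) -> list:
        # Keep the order of the original list.
        return [step for step in HANDOVER_STEPS if step not in self.done]

    def ready_for_handover(self) -> bool:
        return not self.remaining()


if __name__ == "__main__":
    checklist = HandoverChecklist()
    checklist.complete("unpair_devices")
    for step in checklist.remaining():
        print("TODO:", HANDOVER_STEPS[step])
```

Keeping the steps in one ordered mapping means the car is only marked ready when every item has been explicitly confirmed, which mirrors the intent of the list: no step is optional.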

Broader policy implications and risk mitigation

Regulatory responses now address the national security and privacy aspects of connected vehicle ecosystems. Restrictions on high-risk connectivity and automated driving components, stricter cybersecurity standards, and improved incident reporting are appearing across various jurisdictions. The United States has adopted a national security supply chain rule for connected vehicles. European frameworks combine baseline cybersecurity rules with ongoing monitoring of connected vehicle risks and supply chain resilience.

For organizations managing vehicle fleets, a risk-aware approach generally involves procurement policies, contractual control over telemetry and cloud routing, strict oversight of updates, and clear restrictions on what vehicle applications and services can access. For individual consumers, it means transparency about what data is collected, where it is processed, whether essential vehicle functions stay operational when data sharing is limited, and what hygiene measures are necessary before transferring the vehicle.

The Spotify incident and the broader movement toward spatial intelligence models highlight the same basic reality: everyday products now serve as data-collection tools for systems whose full effects are not yet well understood by the general public. Vehicles are a particularly sensitive example because they combine mobility, location tracking, behavioral observation, and constant connectivity in a single platform, which is subject to cross-border supply chains and foreign government jurisdiction.

Data hygiene at the point of vehicle handover is not a minor administrative task. It is a crucial safety measure in an environment where connected systems routinely collect, transmit, and store data at scale, and where geopolitical competition increasingly influences the data flows embedded in consumer products. The Spotify scrape demonstrated that 300 terabytes of content can be extracted when safeguards are inadequate. The automotive equivalent is already underway, distributed across millions of vehicles, collecting data every day.