How the WhatsApp Database Works


Understanding WhatsApp's Architecture & System Design

Which app has over 2.5 billion active users, over 5 billion downloads, and is the most popular app in over 100 countries?

Hint: check the article title.

Yes, that’s right. WhatsApp is the most popular messaging service in the world. According to Mark Zuckerberg, over 100 billion messages are sent over WhatsApp every day.

With such almost-astronomical traffic, one can’t help but wonder how WhatsApp works: its system design, server architecture, and technology. How does it handle so many concurrent users and messages? What kind of frameworks and programming languages enable that kind of scale? How do they keep all that data secure? So many questions!

In this article, we are going to take a deep dive into WhatsApp’s architecture and system design. We’ll answer all the above-mentioned questions and more.
If you’ve ever wondered about the top dog in the chat app world, keep reading.

Disclaimer: We scoured the internet to collect every resource on WhatsApp architecture design and have compiled and summarized it here. To the best of our knowledge, this information is accurate. However, as companies do update their tech stack frequently, this information is subject to change.

WhatsApp Front-End Tech Stack


Let’s start with the frontend and work our way to the hardware on the backend.

The first part of the WhatsApp system design that a user interacts with is the mobile or web app. WhatsApp supports nearly all platforms. It has an iOS app, Android app, desktop app, web app, and Windows Phone app. Up until 2017, you could even use WhatsApp on a BlackBerry.

With so many supported platforms, you may have guessed that WhatsApp would be a hybrid app. But, in fact, it’s not. They actually built a native app for each platform. Here's a list of all the supported platforms with the front-end language(s) that were used to build each one:

  • Android: Java 
  • iOS: Swift
  • Windows Phone: C#
  • Web app: JavaScript/HTML/CSS
  • Mac Desktop app: Swift/Objective-C
  • PC Desktop app: C/C#/Java

How WhatsApp Stores Chat Locally

In addition to the programming language itself, another important technology that WhatsApp uses on the frontend is an SQLite database. SQLite is a stand-alone, self-contained relational database that is meant to be embedded into applications, which means it lives on your device. WhatsApp uses it to store conversations. Since it would be a waste of resources to download all the messages from the cloud every time you open the app, WhatsApp stores the messages locally. In fact, WhatsApp's servers only keep messages until they are delivered, at which point they are removed.
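As a minimal sketch of the idea, assuming nothing about WhatsApp's real client schema, the following Python snippet keeps a chat history in an embedded SQLite database using the standard sqlite3 module. The table name local_messages and its columns are hypothetical; the point is that opening a chat reads from the file on the device instead of downloading the history again.

```python
import sqlite3


def open_local_store(path=":memory:"):
    # Hypothetical on-device message cache; not WhatsApp's real schema.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS local_messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               chat_id TEXT NOT NULL,
               from_me INTEGER NOT NULL,
               body TEXT,
               sent_at INTEGER NOT NULL
           )"""
    )
    return conn


def save_message(conn, chat_id, from_me, body, sent_at):
    # Persist a message once it has arrived on (or left) this device.
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT INTO local_messages (chat_id, from_me, body, sent_at) VALUES (?, ?, ?, ?)",
            (chat_id, int(from_me), body, sent_at),
        )


def load_history(conn, chat_id):
    # Opening a chat reads from local storage; nothing is fetched from a server.
    rows = conn.execute(
        "SELECT from_me, body FROM local_messages WHERE chat_id = ? ORDER BY sent_at, id",
        (chat_id,),
    )
    return [(bool(from_me), body) for from_me, body in rows]


conn = open_local_store()
save_message(conn, "alice", False, "Hi!", 1700000000)
save_message(conn, "alice", True, "Hello back", 1700000005)
print(load_history(conn, "alice"))  # [(False, 'Hi!'), (True, 'Hello back')]
```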

Which Messaging Protocols Does WhatsApp Use?


WhatsApp uses a highly modified version of XMPP on an Ejabberd server (more on that later) to communicate with the clients.

The XMPP client opens an SSL socket to the WhatsApp servers. All sent messages are queued on the servers until the recipient's client opens or reconnects to this socket to retrieve them. Once the client has successfully retrieved a message, it sends a success status back to the WhatsApp server. The server then forwards this status to the original sender, whose app shows the “checkmark” icon next to the message to indicate that it was received.
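The store-and-forward flow described above can be sketched as a toy in-memory relay in Python. The RelayServer class and its method names are hypothetical illustrations, not WhatsApp's or Ejabberd's API; the sketch only shows queuing for offline recipients, handing messages over on reconnect, and forwarding the delivery receipt to the sender.

```python
from collections import defaultdict, deque


class RelayServer:
    """Toy store-and-forward relay (hypothetical, for illustration only)."""

    def __init__(self):
        self.queues = defaultdict(deque)   # recipient -> messages awaiting delivery
        self.in_flight = {}                # msg_id -> sender, handed over but not yet acked
        self.receipts = defaultdict(list)  # sender -> ids of messages confirmed as received

    def send(self, sender, recipient, msg_id, payload):
        # The server only holds the message until the recipient fetches it.
        self.queues[recipient].append((sender, msg_id, payload))

    def connect(self, client):
        # A client (re)connects: hand over everything queued for it.
        batch = list(self.queues.pop(client, deque()))
        for sender, msg_id, _payload in batch:
            self.in_flight[msg_id] = sender
        return batch

    def ack(self, msg_id):
        # The recipient confirmed receipt: forget the message and notify the
        # sender, whose app can now show the checkmark.
        sender = self.in_flight.pop(msg_id)
        self.receipts[sender].append(msg_id)


server = RelayServer()
server.send("alice", "bob", "m1", "hello")
server.send("alice", "bob", "m2", "are you there?")
inbox = server.connect("bob")          # bob comes online
for _sender, msg_id, _payload in inbox:
    server.ack(msg_id)
print(server.receipts["alice"])        # ['m1', 'm2']
```

A real server would re-queue messages that were handed over but never acknowledged (for example after a dropped connection); the sketch leaves that out to stay short.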

Keep in mind that, while XMPP is one of the most popular messaging protocols for chat apps, it is not the only option available.

WhatsApp Encryption Technology

WhatsApp uses end-to-end encryption. Ideally, this means that only the original sender and the true recipient of the message can read the message in plain text.

When you send a message, it gets encrypted using a specific encryption protocol (more on that next). WhatsApp then stores this encrypted message on their servers until it’s delivered to the recipient. Upon delivery, the recipient's device decrypts the message back into a readable, plaintext message using a unique cryptographic key. Across this entire process, WhatsApp never knows the content of your message.

WhatsApp’s encryption technology is the Signal Protocol, which was developed by Open Whisper Systems as a modern, open-source, strong encryption protocol for asynchronous messaging systems.
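To illustrate why the server never sees plaintext, here is a deliberately simplified Python toy: a finite-field Diffie-Hellman key agreement followed by a SHA-256-based keystream. It is NOT the Signal Protocol (which adds identity keys, prekeys, the Double Ratchet, and message authentication) and it is not secure; the prime, the generator, and the function names are assumptions chosen only to keep the example short.

```python
import hashlib
import secrets

# Toy group parameters: P is the Mersenne prime 2**127 - 1. Insecure, demo only.
P = 2**127 - 1
G = 3


def keypair():
    private = secrets.randbelow(P - 2) + 1     # random exponent in [1, P - 2]
    return private, pow(G, private, P)         # public value G^private mod P


def shared_key(my_private, their_public):
    # Both sides arrive at G^(a*b) mod P, then hash it into a 32-byte key.
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


def xor_stream(key, nonce, data):
    # SHA-256 in counter mode as a keystream; the same call encrypts and decrypts.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

nonce = secrets.token_bytes(12)  # fresh per message
ciphertext = xor_stream(shared_key(alice_private, bob_public), nonce, b"meet at 8")
# The relay server only ever stores (nonce, ciphertext) until delivery.
plaintext = xor_stream(shared_key(bob_private, alice_public), nonce, ciphertext)
print(plaintext)  # b'meet at 8'
```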

End-to-end encryption may make you feel safe in theory, but in practice it isn’t as privacy-protecting as one would hope.

WhatsApp Back-End Tech Stack


Let’s move on to the backend.

To the best of our knowledge, the current WhatsApp back-end system design looks like this:

  • Erlang is the main programming language
  • FreeBSD is the operating system
  • Ejabberd is the XMPP application server 
  • BEAM is the Erlang-based virtual machine
  • Mnesia is their Erlang-based database
  • YAWS is their multimedia web server

Let’s explore some of the more interesting aspects of WhatsApp’s back-end architecture:

Erlang

WhatsApp's choice of programming language is in large part what allows it to work on such a colossal scale. 

Erlang is a functional programming language that is oriented towards building concurrent, scalable, and reliable systems. It uses a process-based model called the “actor model” in which small, isolated processes communicate with each other through messages. These processes can create new processes, send messages and modify their state in response to receiving messages.

This process-based model is what gives Erlang its extremely high concurrency, scalability, and reliability.

These processes can also communicate with processes running on other cores or even on other machines. This makes it easy to scale the system horizontally (by adding more machines) or vertically (by adding more cores). Lastly, since processes can communicate with each other and, more importantly, restart each other, it’s easy to build self-healing systems: if a bug crashes a process, another process can restart it.
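The actor idea and the “let it crash” restart strategy can be mimicked in a few lines of Python. This is a single-threaded toy with hypothetical class names (Actor, Runtime), not how BEAM or OTP supervisors are implemented: each actor owns private state and a mailbox, and when a handler raises, the runtime replaces the actor with a fresh instance, much like a supervisor restarting a crashed process.

```python
from collections import deque


class Actor:
    """An isolated 'process': private state plus a mailbox, no shared memory."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler   # handler(state, message) -> new state
        self.state = 0
        self.mailbox = deque()


class Runtime:
    """Toy runtime that delivers messages and restarts crashed actors."""

    def __init__(self):
        self.actors = {}
        self.restarts = 0

    def spawn(self, name, handler):
        self.actors[name] = Actor(name, handler)

    def send(self, name, message):
        self.actors[name].mailbox.append(message)

    def run(self):
        # Deliver one message per actor per pass until all mailboxes are empty.
        busy = True
        while busy:
            busy = False
            for actor in list(self.actors.values()):
                if not actor.mailbox:
                    continue
                busy = True
                message = actor.mailbox.popleft()
                try:
                    actor.state = actor.handler(actor.state, message)
                except Exception:
                    # "Let it crash": replace the actor with a fresh one
                    # (initial state, empty mailbox), as a supervisor would.
                    self.restarts += 1
                    self.spawn(actor.name, actor.handler)


def counter(state, message):
    if message == "boom":
        raise ValueError("simulated bug")
    return state + message


rt = Runtime()
rt.spawn("counter", counter)
rt.send("counter", 5)
rt.send("counter", "boom")   # crashes the actor, which gets restarted
rt.run()
rt.send("counter", 2)
rt.run()
print(rt.actors["counter"].state, rt.restarts)  # 2 1
```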

FreeBSD


An interesting technical choice by WhatsApp's founders was picking FreeBSD as an operating system instead of a more widely used system (like Linux).

Brian Acton, one of the co-founders of WhatsApp, said this in an interview with Wired about the decision:

“Linux is a beast of complexity. FreeBSD has the advantage of being a single distribution with an extraordinarily good ports collection.”

FreeBSD also has a strong reputation for raw performance, especially in terms of system load per packet.

However, when it comes down to it, the real reason that they decided to use FreeBSD is probably because both co-founders had a long history of working with it at Yahoo!.

Ejabberd


Ejabberd is an open-source XMPP server that is written in Erlang. WhatsApp uses a modified version of XMPP as its protocol for handling message delivery. Even the Ejabberd server that WhatsApp uses is heavily customized to optimize for server performance.

What’s the purpose of Ejabberd?

Well, it handles the message routing, deliverability, and general instant messaging aspects of the app. Features of Ejabberd include:

  • One-on-one messaging
  • Group chat
  • Storing and forwarding offline messages
  • Contact list and presence

Mnesia


To store data and temporary messages, WhatsApp uses an Erlang-based, distributed DBMS (Database Management System) called Mnesia. This DBMS provides benefits that many traditional databases don’t, such as:

  • Real-time key/value lookup
  • High fault tolerance
  • Dynamic reconfiguration
  • Complex objects 

Mnesia is also written in Erlang and stores native Erlang terms. This in itself is a benefit because there are no data structure differences between Erlang in the application and Erlang in the DBMS, so no mapping layer is needed. Coding is, therefore, quicker and more explicit.

BEAM


BEAM, short for “Bogdan’s Erlang Abstract Machine”, is the virtual machine that executes compiled Erlang code. BEAM is designed specifically for highly concurrent applications, which makes it perfect for WhatsApp’s use case. BEAM’s secret sauce is lightweight processes that don’t share memory and are managed by schedulers. These schedulers can manage millions of processes across multiple cores. This makes BEAM highly scalable and resistant to failures, such as those caused by high traffic loads, system updates, and network outages.
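BEAM's schedulers preempt a process after it has used up a budget of “reductions” (roughly, function calls), so no single process can monopolize a core. The Python sketch below imitates that idea with generators as lightweight processes and a single round-robin run queue; the REDUCTIONS value and all names are illustrative assumptions, and the real BEAM runs one scheduler per core with a far larger budget.

```python
from collections import deque

REDUCTIONS = 3  # toy time slice; illustrative only


def worker(name, steps, log):
    # A lightweight "process": every yield marks one unit of work (one reduction).
    for i in range(steps):
        log.append(f"{name}{i}")
        yield


def schedule(processes):
    """Round-robin: run each process for at most REDUCTIONS steps, then preempt it."""
    run_queue = deque(processes)
    while run_queue:
        proc = run_queue.popleft()
        for _ in range(REDUCTIONS):
            try:
                next(proc)
            except StopIteration:
                break                # process finished; do not requeue it
        else:
            run_queue.append(proc)   # budget used up: back of the queue


log = []
schedule([worker("a", 5, log), worker("b", 2, log)])
print(log)  # ['a0', 'a1', 'a2', 'b0', 'b1', 'a3', 'a4']
```

Process "a" is preempted after three steps so that "b" gets a turn, even though "a" still has work left.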

BEAM is so crucial to the WhatsApp system design that the WhatsApp team has published many patches and fixes to the core source code.

YAWS


YAWS (Yet Another Web Server) is an Erlang-based web server that's ideal for dynamic content. WhatsApp uses YAWS for storing multimedia data. YAWS itself uses HTML5 WebSockets that simplify two-way communication by establishing a reliable and fast connection between the server and the app. Through the use of this technology, WhatsApp is able to send and receive multimedia data across billions of devices—in near real time.
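WebSockets start life as an HTTP request that is “upgraded” to a persistent two-way connection. As a small, protocol-level illustration (generic RFC 6455 behavior, not YAWS code), this is how a server computes the Sec-WebSocket-Accept header from the client's Sec-WebSocket-Key during that upgrade:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Value the server returns in the Sec-WebSocket-Accept response header."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Once the client sees the expected accept value, both sides keep the TCP connection open and exchange WebSocket frames in either direction.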

WhatsApp Hardware Components

In 2017, three years after being acquired by Facebook in 2014, WhatsApp was moved off IBM SoftLayer’s cloud and into Facebook’s proprietary data centers.

We don't know exactly how much hardware WhatsApp runs today. What we do know is that in 2014 WhatsApp required around 550 servers and over 11,000 cores running Erlang. We also know that WhatsApp’s user base was "only" around half a billion in 2014, compared to the more than 2 billion users it reached in 2020. So, with that data in mind, we'll let you imagine how many servers and cores WhatsApp now requires. We imagine it's a lot.
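For a rough sense of scale, the 2014 figures can be extrapolated linearly to 2 billion users. This is back-of-the-envelope arithmetic under a strong assumption (hardware needs grow in proportion to users, with no efficiency gains or new features), not a real estimate of WhatsApp's infrastructure:

```python
# 2014 figures quoted above; 2020 user count as stated in the text.
servers_2014 = 550
cores_2014 = 11_000
users_2014 = 500_000_000
users_2020 = 2_000_000_000

users_per_server = users_2014 / servers_2014   # about 909,091 users per server in 2014
growth = users_2020 / users_2014               # 4x more users

# Naive linear extrapolation (assumption: resources scale 1:1 with users).
naive_servers = servers_2014 * growth
naive_cores = cores_2014 * growth
print(round(users_per_server), naive_servers, naive_cores)  # 909091 2200.0 44000.0
```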

WhatsApp Architecture Diagram


The easiest way to get a full understanding of WhatsApp’s architecture design is, of course, through a WhatsApp architecture diagram.

Starting from the left side we have multiple different clients (mobile and web apps), each of which hosts a local SQLite database for storing conversations. 

The clients use WebSockets to send and retrieve multimedia data like images and videos from the YAWS web server. But, as you can see, XMPP is used to actually send those files and other messages to other users.

When an XMPP message is sent, it goes through the series of steps depicted above. First, it gets sent to WhatsApp’s custom Ejabberd server which runs on BEAM and FreeBSD. The Ejabberd server saves the message in a Mnesia database table where it gets put into a queue. When the receiving user opens the app, thereby reconnecting to the socket, the message in the queue gets routed through the Ejabberd server and delivered to the recipient. Once successful delivery can be confirmed, the message gets deleted from the Mnesia database.

Conclusion

While we don’t know the exact specifications of WhatsApp’s technical architecture and system design, we can get a good idea based on the technologies that WhatsApp employs. We hope this article, exploring the WhatsApp architecture design, has answered your burning questions. Now that you've gained an understanding of how the WhatsApp server works, learned what the WhatsApp tech stack looks like, and even scanned a WhatsApp architecture diagram...maybe you're feeling empowered to take on a chat app project of your own.  

If you’re ready to give WhatsApp a run for their money, sign up for our developer dashboard and start building your chat app for free.

But keep in mind that many of the technologies in the WhatsApp technology stack were specifically chosen for their ability to scale and handle extremely high concurrency. 

If you’re trying to build a dating app or a telemedicine app (or anything that doesn’t need almost the entire world to be online at the same time), you may not need the amount of scale that WhatsApp does.

In other words, the WhatsApp tech stack, while perfect for WhatsApp, may not be the best solution for you. To learn about the ideal architecture and tech stack for a chat app, head to this article.

If you still have questions about what IS right for you, feel free to talk to our experts before you start building your own chat app.

Just hungry for more? Here are some more great resources to dive into:

  • The Myth of End-to-End Encryption in Messaging Apps
  • Understanding the Architecture & System Design of a Chat Application
  • 11 Silly Mistakes Developers Make When Building a Chat Application

About the Author

Cosette Cressler is a passionate content marketer specializing in SaaS, technology, careers, productivity, entrepreneurship and self-development. She helps grow businesses of all sizes by creating consistent, digestible content that captures attention and drives action.

Analyzing My WhatsApp Database using SQL and Redash

How to create an interactive dashboard with some KPIs about how you and your friends use WhatsApp

Some months ago I was scrolling through my WhatsApp chats when an idea came to me: why not extract my WhatsApp database and perform some data analysis on it? A lot of interesting metadata about how my contacts and I use WhatsApp could be extracted.


It was not an easy process: you need to copy the database from your phone to your computer, then understand it, and once you have understood how it is structured, think about what useful information can be extracted and how to present it. It took me a lot of time, but now I feel very proud of the results, and I am really excited to share them with you.

In this post I will explain:

  • How to copy your WhatsApp database from your iPhone or Android phone to your computer
  • How to get the relevant data for this experiment from the raw database
  • The SQL queries to get the most interesting KPIs about our usage of WhatsApp
  • How to create an interactive dashboard to visualize those KPIs (we will use Redash, but feel free to choose any other tool)

If you want to try it yourself, you will need a couple of things:

  • A rooted Android phone or an iPhone (it does not need to be jailbroken)
  • A USB cable to connect the phone to your computer
  • adb if your phone is an Android, or a Windows PC with iTunes or a Mac if it is an iPhone
  • sqlite3 installed on your computer to work with the database
  • gcc and make installed on your computer to compile the backup extractor (unless you have a Mac, because the tool is already compiled for macOS)
  • Docker installed on your computer to deploy Redash or any other dashboarding tool

Are you ready to join me? Let’s go!

From an Android phone

If you have an Android phone, I am sorry to tell you that you need a rooted device. The WhatsApp database is stored in a location of the filesystem with restricted access, so you will need special permissions to get it. There are some alternative ways to get it, but in the end they all require root access.


If your phone is rooted, install adb following these steps, then connect your phone to your computer with a USB cable and run these commands inside your working directory:

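adb root
# Make a consistent copy of wa.db on the device (assumes a sqlite3 binary is available there)
adb shell "sqlite3 /data/data/com.whatsapp/databases/wa.db \".backup /data/data/com.whatsapp/databases/wa.db.bak\""
adb pull /data/data/com.whatsapp/databases/msgstore.db
adb pull /data/data/com.whatsapp/databases/wa.db.bak wa.db
# Dump both databases and load them into a single whatsapp.db
sqlite3 msgstore.db .dump > whatsapp.sql
sqlite3 wa.db .dump >> whatsapp.sql
sqlite3 whatsapp.db < whatsapp.sql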

Here we get root permissions, copy the two databases where WhatsApp stores its data, and merge them into a single one.
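If you prefer to stay in Python, the merge step can also be sketched with the standard sqlite3 module: copy msgstore.db with Connection.backup() and pull the contact table in from wa.db through ATTACH DATABASE. This sketch assumes both files were already pulled from the phone and only copies wa_contacts; note that CREATE TABLE ... AS SELECT does not carry over indexes or constraints.

```python
import sqlite3


def merge_databases(msgstore_path, wa_path, out_path):
    """Copy msgstore.db into out_path, then add wa_contacts from wa.db."""
    src = sqlite3.connect(msgstore_path)
    dst = sqlite3.connect(out_path)
    src.backup(dst)    # page-by-page copy of the whole msgstore.db (Python 3.7+)
    src.close()

    dst.execute("ATTACH DATABASE ? AS wa", (wa_path,))
    dst.execute("CREATE TABLE IF NOT EXISTS wa_contacts AS SELECT * FROM wa.wa_contacts")
    dst.commit()
    dst.execute("DETACH DATABASE wa")
    dst.close()


# Example: merge_databases("msgstore.db", "wa.db", "whatsapp.db")
```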

From an iPhone

With a jailbroken iPhone the process might be easier (it would probably look similar to the Android process explained above), but since my iPhone is not jailbroken (and neither are most iPhones out there), we will extract the database from a backup instead.

First of all, you have to connect your iPhone to your computer and back it up. You can find detailed instructions for Mac and Windows on the official Apple website. Then, we will extract the WhatsApp database from the backup using an open-source tool called imobax. If you have a Mac, you can directly download the executable file from here. If you are using Windows, you will need to compile it yourself using gcc and make (instructions on how to install them here): just download the repository to your computer and run the command “make” inside the folder.

If you used a Mac to make the backup, it will be stored in ~/Library/Application Support/MobileSync/Backup. If you used Windows, the directory will be either %userprofile%\Apple\MobileSync\Backup or %appdata%\Apple Computer\MobileSync\Backup. Inside those directories there will be a folder with your backup, and inside it all the backed-up files, including our database. The problem is that the files have random alphanumeric names. This is where imobax helps us, by telling us which file is the database.

./imobax -l <backup location> | grep ChatStorage.sqlite | awk '{print $1}'

The command above will give you the name of the database. Just search for it in the backup folder, copy it to your working directory and rename it as you wish, for example, whatsapp.db.
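As an alternative to imobax, the lookup can be sketched in Python by querying the backup's Manifest.db directly. This assumes an unencrypted backup from iOS 10 or later, where Manifest.db has a Files table with fileID, domain and relativePath columns and each file is stored in a subfolder named after the first two characters of its fileID. The WhatsApp domain string below is the commonly reported value and should be treated as an assumption.

```python
import os
import sqlite3

# Assumed values; verify them against your own backup.
WHATSAPP_DOMAIN = "AppDomainGroup-group.net.whatsapp.WhatsApp.shared"
DB_NAME = "ChatStorage.sqlite"


def find_chatstorage(backup_dir):
    """Return the path of ChatStorage.sqlite inside an iOS backup, or None."""
    manifest = sqlite3.connect(os.path.join(backup_dir, "Manifest.db"))
    try:
        row = manifest.execute(
            "SELECT fileID FROM Files WHERE domain = ? AND relativePath = ?",
            (WHATSAPP_DOMAIN, DB_NAME),
        ).fetchone()
    finally:
        manifest.close()
    if row is None:
        return None
    file_id = row[0]
    # Files are stored as <backup_dir>/<first two chars of fileID>/<fileID>.
    return os.path.join(backup_dir, file_id[:2], file_id)


# Example: shutil.copy(find_chatstorage("<backup location>"), "whatsapp.db")
```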

The WhatsApp database is full of tables, some of them with myriads of columns and rows. Most of them will not be relevant for us, so let’s see which tables and columns we are going to work with.

On Android

  • wa_contacts: information about your contacts and groups
    unseen_msg_count: number of unread messages from that contact/group
    jid: identifier of the contact or group: contacts will end with “@s.whatsapp.net” and groups with “@g.us”
    display_name: name of the contact or group
  • chat_view: chat sessions
    raw_string_jid: identifier of the contact/group
    last_message_row_id: FK of the last message from that chat in messages
    sort_timestamp: date of the last message from that contact/group
  • messages
    key_from_me: 0 for incoming messages, 1 for outgoing
    media_wa_type: whether the message is text (0), an image (1), an audio or voice message (2), a video (3)…
    timestamp: date and time when the message was sent or received, in UNIX time format (milliseconds).
    data: text of the message (null for multimedia messages without a text)
    key_remote_jid: identifier of the remote part (the sender if the message was received or the receiver if the message was sent)

On iPhone

  • ZWACHATSESSION: information about your contacts and groups
    ZMESSAGECOUNTER: number of messages exchanged with this contact/group
    ZSESSIONTYPE: 0 if it is a private message to/from a contact, 1 if it is a group, 2 for broadcast and 3 for status.
    ZUNREADCOUNT: number of unread messages from that contact/group
    ZLASTMESSAGE: FK for the last message from that contact/group in ZWAMESSAGE
    ZLASTMESSAGEDATE: date of the last message from that contact/group
    ZCONTACTJID: identifier of the contact or group: contacts will end with “@s.whatsapp.net” and groups with “@g.us”
    ZPARTNERNAME: name of the contact/group
  • ZWAMESSAGE: sent and received messages
    ZISFROMME: 0 for incoming messages, 1 for outgoing
    ZMESSAGETYPE: whether the message is text (0), an image (1), a video (2), a voice message (3)…
    ZMESSAGEDATE: date and time when the message was sent or received in UNIX time format, but with the timestamp starting on 1st January 2001 instead of 1st January 1970 (explanation here)
    ZTEXT: text of the message (null for multimedia messages without a text)
    ZFROMJID: identifier of the sender; if the message was sent by the user (ZISFROMME == 1), this field will be null
    ZTOJID: identifier of the receiver; if the message was received by the user (ZISFROMME == 0), this field will be null
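The two timestamp conventions can be converted with a fixed offset: Apple's reference date, 1 January 2001 UTC, is 978,307,200 seconds after the Unix epoch. A minimal Python sketch (the function names are illustrative), which also handles the Android timestamp under the assumption that it is stored in milliseconds:

```python
from datetime import datetime, timezone

# Seconds between 1970-01-01 (Unix epoch) and 2001-01-01 UTC (Apple reference date).
APPLE_EPOCH_OFFSET = 978_307_200


def iphone_date(zmessagedate):
    """ZMESSAGEDATE (seconds since 2001-01-01 UTC) -> aware UTC datetime."""
    return datetime.fromtimestamp(zmessagedate + APPLE_EPOCH_OFFSET, tz=timezone.utc)


def android_date(timestamp_ms):
    """messages.timestamp (milliseconds since 1970-01-01 UTC) -> aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


print(iphone_date(0))                 # 2001-01-01 00:00:00+00:00
print(android_date(978_307_200_000))  # 2001-01-01 00:00:00+00:00
```

Inside SQLite, the same conversion can be written as datetime(ZMESSAGEDATE + 978307200, 'unixepoch') on iPhone and datetime(timestamp / 1000, 'unixepoch') on Android.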

To make our SQL queries easier later on, it is a good idea to create two views in our database now: one for the private, individual messages (let’s call it friends_messages) and another for the messages from group chats (group_messages). This will not only simplify the SQL queries, but also allow us to use the same query for iPhone and Android.
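The original view definitions are not reproduced in this text. Purely as a hypothetical sketch for the Android schema listed above, the two views could look like the following; the output column names (friend_name, group_name, from_me, ts, text, sender_jid) are illustrative choices, the millisecond timestamp is converted to seconds, and remote_resource is assumed to be the column holding the sender's ID in group chats.

```python
import sqlite3

# Hypothetical view definitions for the Android schema; not the original post's SQL.
FRIENDS_VIEW = """
CREATE VIEW IF NOT EXISTS friends_messages AS
SELECT c.display_name       AS friend_name,
       m.key_from_me        AS from_me,
       m.timestamp / 1000   AS ts,          -- milliseconds -> seconds
       m.data               AS text
FROM messages m
JOIN wa_contacts c ON c.jid = m.key_remote_jid
WHERE m.key_remote_jid LIKE '%@s.whatsapp.net'
"""

GROUP_VIEW = """
CREATE VIEW IF NOT EXISTS group_messages AS
SELECT c.display_name       AS group_name,
       m.key_from_me        AS from_me,
       m.remote_resource    AS sender_jid,  -- who wrote it inside the group
       m.timestamp / 1000   AS ts,
       m.data               AS text
FROM messages m
JOIN wa_contacts c ON c.jid = m.key_remote_jid
WHERE m.key_remote_jid LIKE '%@g.us'
"""


def create_views(conn):
    conn.execute(FRIENDS_VIEW)
    conn.execute(GROUP_VIEW)
```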

On Android

On iPhone

For my experiment, I have used Redash to create a dashboard to visualize the KPIs that we will explore afterwards. This open-source tool can be easily deployed with Docker following this guide. It goes without saying that you are free to use any other visualization tool which supports SQLite, like PowerBI, Tableau, etc.

Once we have imported the database with the views created in the previous step, the next step is to create one query to get the list of contacts and another to get the list of groups. These are needed to show a dropdown list for filtering the KPIs that are about a single contact or group. In Redash, this is done by creating those two SQL queries and then adding a parameter of type “Query Based Dropdown List” to the corresponding KPIs, which can be referred to as “{{ parameter_name }}” in the SQL query.

Definition of a variable called “friend_name” inside a Redash query, which will take the value from a dropdown list with all your contacts names

SQL query to retrieve the names of all your contacts:

SQL query to retrieve the names of all your groups:

Now we are ready to get our hands dirty and write some SQL queries to extract interesting KPIs about how we use WhatsApp. The number of KPIs that can be extracted is limited only by your imagination and creativity, so you are welcome to create your own. Here are some of the ones I created:

KPI #1: People with whom you have talked the most during the last 30 days

This query can be visualized with a vertical bar chart, with the column “friend_name” on the X-axis and “number_of_messages” on the Y-axis.

Top 20 contacts with whom I have talked the most during the last 30 days (names at the bottom are cropped for privacy reasons)
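The KPI #1 query itself is not reproduced here. A hypothetical version, assuming a friends_messages view or table with friend_name and ts (Unix time in seconds) columns, which are illustrative names, could be run from Python like this:

```python
import sqlite3

# Hypothetical query: assumes friends_messages(friend_name, ts) with ts in Unix seconds.
TOP_CONTACTS_SQL = """
SELECT friend_name, COUNT(*) AS number_of_messages
FROM friends_messages
WHERE ts >= CAST(strftime('%s', 'now', '-30 days') AS INTEGER)
GROUP BY friend_name
ORDER BY number_of_messages DESC
LIMIT 20
"""


def top_contacts(conn):
    """Return (friend_name, number_of_messages) rows for the last 30 days."""
    return conn.execute(TOP_CONTACTS_SQL).fetchall()
```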

KPI #2: Number of messages per day with each of your friends

This query can be visualized with a line chart, with the column “day” on the X-axis and “ma_mom” on the Y-axis. Pay attention to the first lines of the query, where a 30-day moving average is calculated. Without this smoothing, the chart would look spiky and noisy.

It is also important to remark that {{ friend_name }} is a variable, so Redash (or the corresponding visualization tool) will replace it with the selected contact.

Number of messages per day exchanged with a contact over time. The value has been smoothed with a 30-day moving average.
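Since the original query is not reproduced here, this is a hypothetical reconstruction of the idea using SQLite window functions (RANGE frames with numeric offsets need SQLite 3.28 or newer). It assumes the same illustrative friends_messages(friend_name, ts) columns, counts messages per calendar day, and divides a 30-day rolling sum by 30, so days without messages count as zero. In Python the contact is bound as the :friend_name parameter, where Redash would substitute {{ friend_name }}.

```python
import sqlite3

# Hypothetical query; needs SQLite >= 3.28 for RANGE ... PRECEDING frames.
MOVING_AVERAGE_SQL = """
WITH daily AS (
    SELECT date(ts, 'unixepoch') AS day,
           CAST(julianday(date(ts, 'unixepoch')) AS INTEGER) AS day_num,
           COUNT(*) AS n
    FROM friends_messages
    WHERE friend_name = :friend_name
    GROUP BY day, day_num
)
SELECT day,
       SUM(n) OVER (ORDER BY day_num
                    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) / 30.0 AS ma_mom
FROM daily
ORDER BY day
"""


def messages_per_day(conn, friend_name):
    """Return (day, 30-day moving average of messages per day) rows."""
    return conn.execute(MOVING_AVERAGE_SQL, {"friend_name": friend_name}).fetchall()
```

Using a RANGE frame over a day number, rather than ROWS, keeps the window at 30 calendar days even when some days have no messages.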

KPI #3: Moments of the day and week when you talk the most with one of your friends

For this KPI, we group the messages by hour of the day and day of the week using the SQLite function strftime(), and then count the total number of exchanged messages for each hour of the day and day of the week. Then we can plot the result in a pivot table.

This heat map shows the number of messages exchanged with a friend on each hour of each day of the week
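A hypothetical sketch of this grouping, again assuming illustrative friends_messages(friend_name, ts) columns with ts in Unix seconds, returns the counts and pivots them into a 7 x 24 grid that a heat map can plot directly:

```python
import sqlite3

# Hypothetical query; strftime('%w') gives 0 = Sunday ... 6 = Saturday.
# Times are UTC; add the 'localtime' modifier to use the computer's time zone.
HEATMAP_SQL = """
SELECT CAST(strftime('%w', ts, 'unixepoch') AS INTEGER) AS weekday,
       CAST(strftime('%H', ts, 'unixepoch') AS INTEGER) AS hour,
       COUNT(*) AS messages
FROM friends_messages
WHERE friend_name = :friend_name
GROUP BY weekday, hour
"""


def heat_map(conn, friend_name):
    """Return a 7 x 24 grid: grid[weekday][hour] = number of messages."""
    grid = [[0] * 24 for _ in range(7)]
    for weekday, hour, count in conn.execute(HEATMAP_SQL, {"friend_name": friend_name}):
        grid[weekday][hour] = count
    return grid
```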

KPI #4: Top 10 friends who write the longest messages

In this query, we calculate the average length of the messages sent to us by each of our contacts, take the top 10, and then plot them in a vertical bar chart. However, it is quite common to write long texts split across several messages, so this KPI is not very accurate. I tried to calculate it taking this into account, but that would require quite complex algorithms, which unfortunately I could not implement using SQLite.

This chart shows the average number of characters in the messages sent by the top 10 contacts who write the longest messages on average (names at the bottom are cropped for privacy reasons)
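A hypothetical version of the simple (per-message) calculation, assuming illustrative friends_messages(friend_name, from_me, text) columns, averages LENGTH(text) over incoming text messages only:

```python
import sqlite3

# Hypothetical query: only incoming text messages (from_me = 0) are considered;
# media messages without text (text IS NULL) are skipped.
LONGEST_WRITERS_SQL = """
SELECT friend_name, AVG(LENGTH(text)) AS avg_length
FROM friends_messages
WHERE from_me = 0 AND text IS NOT NULL
GROUP BY friend_name
ORDER BY avg_length DESC
LIMIT 10
"""


def longest_writers(conn):
    """Return (friend_name, average message length in characters) rows."""
    return conn.execute(LONGEST_WRITERS_SQL).fetchall()
```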

KPI #5: Number of messages sent by each member of a group over time

Similarly to KPI #2, here we smooth the number of messages sent per day by each of the group members, including myself (“Me”), with a 30-day moving average. Since we are working with groups, the view “group_messages” is used instead of “friends_messages”, and a new variable (“group_name”) is defined in the dashboard, so the user can choose the group for which they want to see this chart.

Evolution of the participation rate (number of messages per day filtered with a 30 days moving average) of each group participant, including myself
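A hypothetical sketch of this per-member version reuses the windowed moving average with a PARTITION BY clause, so each member gets an independent 30-day window. It assumes illustrative group_messages(group_name, from_me, sender, ts) columns with ts in Unix seconds, and needs SQLite 3.28 or newer:

```python
import sqlite3

# Hypothetical query; assumes group_messages(group_name, from_me, sender, ts).
GROUP_ACTIVITY_SQL = """
WITH daily AS (
    SELECT CASE WHEN from_me = 1 THEN 'Me' ELSE sender END AS member,
           date(ts, 'unixepoch') AS day,
           CAST(julianday(date(ts, 'unixepoch')) AS INTEGER) AS day_num,
           COUNT(*) AS n
    FROM group_messages
    WHERE group_name = :group_name
    GROUP BY member, day, day_num
)
SELECT member, day,
       SUM(n) OVER (PARTITION BY member ORDER BY day_num
                    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) / 30.0 AS ma_mom
FROM daily
ORDER BY member, day
"""


def group_activity(conn, group_name):
    """Return (member, day, 30-day moving average) rows for one group."""
    return conn.execute(GROUP_ACTIVITY_SQL, {"group_name": group_name}).fetchall()
```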

This was a long post covering a long process: extracting the database from our phone, cleaning the data, doing some analysis with SQL, and then visualizing the results in charts. I had a lot of fun doing it, so I encourage you to try it yourself and come up with some new KPIs. Feel free to post a comment about your impressions, findings, or improvements; I will be happy to hear from you!

Sources

  1. ADB documentation: https://developer.android.com/studio/command-line/adb#shellcommands
  2. SQLite 3 window functions: https://sqlite.org/windowfunctions.html
  3. strftime manual: https://man7.org/linux/man-pages/man3/strftime.3.html
  4. WhatsApp in Plain Sight: Where and How You Can Collect Forensic Artifacts: https://blog.group-ib.com/whatsapp_forensic_artifacts
  5. Adventures in WhatsApp DB — extracting messages from backups (with code examples): https://medium.com/@Med1um1/extracting-whatsapp-messages-from-backups-with-code-examples-49186de94ab4

Where and how can forensic artifacts be found? / Sudo Null IT News

With this article, Igor Mikhailov, a specialist at the Group-IB Computer Forensics Laboratory, opens a series of publications about WhatsApp forensic research and what information can be obtained from device analysis.

Note that different operating systems store different types of WhatsApp artifacts, and if a researcher can extract certain types of WhatsApp data from one device, this does not at all mean that similar types of data can be extracted from another device. For example, if a desktop computer running Windows is seized, WhatsApp chats will probably not be found on its drives (the exception being backup copies of iOS devices, which may be stored on the same drives). Seizing laptops and mobile devices comes with its own peculiarities. Let's talk about this in more detail.

WhatsApp artifacts in Android device

In order to extract WhatsApp artifacts from an Android device, the researcher must have root access on the device under investigation, or be able to extract a physical memory dump of the device or its file system by other means (for example, by exploiting software vulnerabilities specific to that mobile device).

Application files are located in the phone's memory, in the partition where user data is stored. Typically, this partition is named ‘userdata’. The program's subdirectories and files are located under the path ‘/data/data/com.whatsapp/’.

The main files that contain WhatsApp forensic artifacts in Android OS are the databases 'wa.db' and 'msgstore.db'.

Database ‘wa.db’ contains the complete WhatsApp user contact list, including phone number, display name, timestamps and any other information provided during WhatsApp registration. File ‘wa.db’ is located along the path: ‘/data/data/com.whatsapp/databases/’ and has the following structure:

The most interesting tables in the database 'wa.db' for the researcher are:

  • 'wa_contacts'
    This table contains contact information: WhatsApp contact ID, status information, user display name, timestamps, etc.

    Table appearance:


    Table structure

    _id: sequence number of the record (in the SQL table)
    jid: WhatsApp contact ID, ending in @s.whatsapp.net
    is_whatsapp_user: contains '1' if the contact is an actual WhatsApp user, '0' otherwise
    status: contains the text displayed in the contact's status
    status_timestamp: contains a timestamp in Unix Epoch Time (ms) format
    number: phone number associated with the contact
    raw_contact_id: contact number
    display_name: contact display name
    phone_type: phone type
    phone_label: label associated with the contact number
    unseen_msg_count: number of messages sent by the contact but not read by the recipient
    photo_ts: contains a timestamp in Unix Epoch Time format
    thumb_ts: contains a timestamp in Unix Epoch Time format
    photo_id_timestamp: contains a timestamp in Unix Epoch Time (ms) format
    given_name: same value as 'display_name' for each contact
    wa_name: WhatsApp contact name (the name shown in the contact's profile)
    sort_name: contact name used in sort operations
    nickname: WhatsApp nickname of the contact (the nickname specified in the contact's profile)
    company: company (the company listed in the contact's profile)
    title: title (Madam/Mr.; the title configured in the contact's profile)
    offset: offset

  • ‘sqlite_sequence’
    This table contains information about the number of contacts.
  • ‘android_metadata’
    This table contains information about the language localization of WhatsApp.

The database 'msgstore.db' contains information about transferred messages, such as contact number, message text, message status, timestamps, information about transferred files included in messages, etc. File ‘msgstore.db’ is located along the path: ‘/data/data/com.whatsapp/databases/’ and has the following structure:

The most interesting tables in file ‘msgstore.db’ for the researcher are:

  • ‘sqlite_sequence’
    This table contains general information about this database, such as the total number of messages stored, the total number of chats, and so on.

    Table view:

  • ‘message_fts_content’
    Contains the text of the sent messages.

    Table view:

  • ‘messages’
    This table contains information such as contact number, message text, message status, timestamps, information about transferred files included in messages.

    Table appearance:


    Table structure

    _id: sequence number of the record (in the SQL table)
    key_remote_jid: WhatsApp ID of the communication partner
    key_from_me: message direction: '0' - incoming, '1' - outgoing
    key_id: unique message identifier
    status: message status: '0' - delivered, '4' - waiting on the server, '5' - received at destination, '6' - control message, '13' - message opened by the recipient (read)
    need_push: '2' if it is a broadcast message, otherwise '0'
    data: message text (when 'media_wa_type' is '0')
    timestamp: timestamp in Unix Epoch Time (ms) format, taken from the device clock
    media_url: URL of the transferred file (when 'media_wa_type' is '1', '2' or '3')
    media_mime_type: MIME type of the transferred file (when 'media_wa_type' is '1', '2' or '3')
    media_wa_type: message type: '0' - text, '1' - graphic file, '2' - audio file, '3' - video file, '4' - contact card, '5' - location data
    media_size: size of the transferred file (when 'media_wa_type' is '1', '2' or '3')
    media_name: name of the transferred file (when 'media_wa_type' is '1', '2' or '3')
    media_caption: contains the words 'audio' or 'video' for the corresponding values of 'media_wa_type' (when 'media_wa_type' is '1' or '3')
    media_hash: base64-encoded SHA-256 hash of the transferred file (when 'media_wa_type' is '1', '2' or '3')
    media_duration: duration of the media file in seconds (when 'media_wa_type' is '1', '2' or '3')
    origin: '2' if it is a broadcast message, otherwise '0'
    latitude: location data: latitude (when 'media_wa_type' is '5')
    longitude: location data: longitude (when 'media_wa_type' is '5')
    thumb_image: service information
    remote_resource: sender ID (group chats only)
    received_timestamp: time of receipt, in Unix Epoch Time (ms) format, taken from the device clock (when 'key_from_me' is '0', '-1' or another value)
    send_timestamp: not used, usually set to '-1'
    receipt_server_timestamp: time the message was received by the central server, in Unix Epoch Time (ms) format, taken from the device clock (when 'key_from_me' is '1', '-1' or another value)
    receipt_device_timestamp: time the message was received by the other party, in Unix Epoch Time (ms) format, taken from the device clock (when 'key_from_me' is '1', '-1' or another value)
    read_device_timestamp: time the message was opened (read), in Unix Epoch Time (ms) format, taken from the device clock
    played_device_timestamp: time the message was played, in Unix Epoch Time (ms) format, taken from the device clock
    raw_data: thumbnail of the transferred file (when 'media_wa_type' is '1' or '3')
    recipient_count: number of recipients (for broadcast messages)
    participant_hash: used when sending messages with geodata
    starred: not used
    quoted_row_id: unknown, usually '0'
    mentioned_jids: not used
    multicast_id: not used
    offset: offset

    This list of fields is not exhaustive. Depending on the WhatsApp version, some of these fields may be absent, and additional fields such as 'media_enc_hash', 'edit_version', 'payment_transaction_id', etc. may be present.
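The field descriptions above are enough to turn raw rows into readable records. The following Python sketch assumes a decrypted 'msgstore.db' whose 'messages' table has the columns 'key_from_me', 'timestamp', 'media_wa_type', 'media_mime_type', 'media_name' and 'remote_resource'; because the column set differs between WhatsApp versions, check the schema first. The function names are illustrative and not part of any existing tool.

```python
import sqlite3
from datetime import datetime, timezone

# media_wa_type codes, taken from the field descriptions above.
MEDIA_TYPES = {0: "text", 1: "image", 2: "audio", 3: "video", 4: "contact card", 5: "location"}
# key_from_me: '0' incoming, '1' outgoing; '-1' and other values are kept apart.
DIRECTIONS = {0: "incoming", 1: "outgoing"}


def ms_to_utc(ms):
    """Convert Unix Epoch Time in milliseconds to an aware UTC datetime (None if unset)."""
    if ms is None or ms <= 0:
        return None  # e.g. send_timestamp is usually -1
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def read_messages(conn):
    """Yield one readable dict per row of the 'messages' table (column set assumed)."""
    query = (
        "SELECT key_from_me, timestamp, media_wa_type, media_mime_type, "
        "media_name, remote_resource FROM messages ORDER BY timestamp"
    )
    for from_me, ts, wa_type, mime, name, sender in conn.execute(query):
        kind = MEDIA_TYPES.get(int(wa_type), "unknown") if wa_type is not None else "unknown"
        yield {
            "direction": DIRECTIONS.get(from_me, "other"),
            "time": ms_to_utc(ts),
            "type": kind,
            "mime": mime,
            "file_name": name,
            "group_sender": sender,  # only filled in group chats
        }
```

To avoid modifying the evidence file, the database can be opened read-only, for example with sqlite3.connect("file:msgstore.db?mode=ro", uri=True).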

  • ‘messages_thumbnails’
    This table contains information about transferred images and timestamps. The 'timestamp' column indicates the time in Unix Epoch Time (ms) format.
  • ‘chat_list’
    This table contains information about chats.


Also, when examining WhatsApp on an Android mobile device, you should pay attention to the following files:

You also need to pay attention to the following directories:

  • Directory ‘/data/media/0/WhatsApp/Media/WhatsApp Images/’. Contains transferred image files.
  • Directory ‘/data/media/0/WhatsApp/Media/WhatsApp Voice Notes/’. Contains voice messages as .opus files.
  • Directory ‘/data/data/com.whatsapp/cache/Profile Pictures/’. Contains image files: pictures of contacts.
  • Directory ‘/data/data/com.whatsapp/files/Avatars/’. Contains image files: contact thumbnails. These files have the '.j' extension but are ordinary JPEG (JPG) images.
  • Directory ‘/data/data/com.whatsapp/files/Avatars/’. Contains image files: the picture the account owner has set as an avatar, and its thumbnail.
  • Directory ‘/data/data/com.whatsapp/files/Logs/’. Contains the application log (file 'whatsapp.log') and backup copies of earlier logs (files named in the format whatsapp-yyyy-mm-dd.1.log.gz).

WhatsApp log files:


Log snippet

2017-01-10 09:37:09.757 LL_I D [524:WhatsApp Worker #1] missedcallnotification/init count:0 timestamp:0
2017-01-10 09:37:09.758 LL_I D [524:WhatsApp Worker #1] missedcallnotification/update cancel true
2017-01-10 09:37:09.768 LL_I D [1:main] app-init/load-me
2017-01-10 09:37:09.772 LL_I D [1:main] password file missing or unreadable
2017-01-10 09:37:09.782 LL_I D [1:main] statistics Text Messages: 59 sent, 82 received / Media Messages: 1 sent (0 bytes), 0 received (9850158 bytes) / Offline Messages: 81 received (19522 msec average delay) / Message Service: 116075 bytes sent, 211729 bytes received / Voip Calls: 1 outgoing calls, 0 incoming calls, 2492 bytes sent, 1530 bytes received / Google Drive: 0 bytes sent, 0 bytes received / Roaming: 1524 bytes sent, 1826 bytes received / Total Data: 118567 bytes sent, 10063417 bytes received
2017-01-10 09:37:09.785 LL_I D [1:main] media-state-manager/refresh-media-state/writable-media
2017-01-10 09:37:09.806 LL_I D [1:main] app-init/initialize/timer/stop: 24
2017-01-10 09:37:09.811 LL_I D [1:main] msgstore/checkhealth
2017-01-10 09:37:09.817 LL_I D [1:main] msgstore/checkhealth/journal/delete false
2017-01-10 09:37:09.818 LL_I D [1:main] msgstore/checkhealth/back/delete false
2017-01-10 09:37:09.818 LL_I D [1:main] msgstore/checkdb/data/data/com.whatsapp/databases/msgstore.db
2017-01-10 09:37:09.819 LL_I D [1:main] msgstore/checkdb/list _jobqueue-WhatsAppJobManager 16384 drw=011
2017-01-10 09:37:09.820 LL_I D [1:main] msgstore/checkdb/list _jobqueue-WhatsAppJobManager-journal 21032 drw=011
2017-01-10 09:37:09.820 LL_I D [1:main] msgstore/checkdb/list axolotl.db 184320 drw=011
2017-01-10 09:37:09.821 LL_I D [1:main] msgstore/checkdb/list axolotl.db-wal 436752 drw=011
2017-01-10 09:37:09.821 LL_I D [1:main] msgstore/checkdb/list axolotl.db-shm 32768 drw=011
2017-01-10 09:37:09.822 LL_I D [1:main] msgstore/checkdb/list msgstore.db 540672 drw=011
2017-01-10 09:37:09.823 LL_I D [1:main] msgstore/checkdb/list msgstore.db-wal 0 drw=011
2017-01-10 09:37:09.823 LL_I D [1:main] msgstore/checkdb/list msgstore.db-shm 32768 drw=011
2017-01-10 09:37:09.824 LL_I D [1:main] msgstore/checkdb/list wa.db 69632 drw=011
2017-01-10 09:37:09.825 LL_I D [1:main] msgstore/checkdb/list wa.db-wal 428512 drw=011
2017-01-10 09:37:09.825 LL_I D [1:main] msgstore/checkdb/list wa.db-shm 32768 drw=011
2017-01-10 09:37:09.826 LL_I D [1:main] msgstore/checkdb/list chatsettings.db 4096 drw=011
2017-01-10 09:37:09.826 LL_I D [1:main] msgstore/checkdb/list chatsettings.db-wal 70072 drw=011
2017-01-10 09:37:09.827 LL_I D [1:main] msgstore/checkdb/list chatsettings.db-shm 32768 drw=011
2017-01-10 09:37:09.838 LL_I D [1:main] msgstore/checkdb/version 1
2017-01-10 09:37:09.839 LL_I D [1:main] msgstore/canquery
2017-01-10 09:37:09.846 LL_I D [1:main] msgstore/canquery/count 1
2017-01-10 09:37:09.847 LL_I D [1:main] msgstore/canquery/timer/stop: 8
2017-01-10 09:37:09.847 LL_I D [1:main] msgstore/canquery 517 | time spent:8
2017-01-10 09:37:09.848 LL_I D [529:WhatsApp Worker #3] media-state-manager/refresh-media-state/internal-storage available:1,345,622,016 total:5,687,922,688
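Because every line of 'whatsapp.log' follows the same pattern, the log can be searched programmatically. The following Python sketch infers the line layout only from the snippet above (the meaning of the 'LL_I' and 'D' tokens is not documented here, so they are kept as opaque 'tag' and 'flag' fields) and collects the database file sizes reported by the 'msgstore/checkdb/list' entries. The helper names are hypothetical.

```python
import re

# Layout inferred only from the snippet above:
# "<date> <time.ms> <tag> <flag> [<thread id>:<thread name>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<tag>\S+) (?P<flag>\S+?)\s*"
    r"\[(?P<tid>\d+):(?P<thread>[^\]]+)\] (?P<msg>.*)$"
)
# Message body of the file-listing entries: "msgstore/checkdb/list <name> <size> drw=..."
DB_LIST = re.compile(r"^msgstore/checkdb/list (?P<name>\S+) (?P<size>\d+) ")


def parse_line(line):
    """Split one whatsapp.log line into named fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def database_sizes(lines):
    """Collect {file name: size in bytes} from 'msgstore/checkdb/list' entries."""
    sizes = {}
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        listing = DB_LIST.match(record["msg"])
        if listing:
            sizes[listing["name"]] = int(listing["size"])
    return sizes
```

The rotated logs are gzip-compressed; they can be read line by line with gzip.open(path, "rt", encoding="utf-8", errors="replace") and passed to database_sizes() in the same way.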



Mobile device models

Some models of Android mobile devices may store WhatsApp artifacts in a different location, because the device's system software changes where application data is stored. For example, Xiaomi devices have a feature for creating a second workspace (“SecondSpace”). When it is activated, the data location changes: on a regular Android device, user data is stored in the directory ‘/data/user/0/’ (a link to the usual ‘/data/data/’), whereas in the second workspace, application data is stored in the directory ‘/data/user/10/’. For example, the file 'wa.db' is located:

  • on a regular Android smartphone: ‘/data/user/0/com.whatsapp/databases/wa.db’ (equivalent to ‘/data/data/com.whatsapp/databases/wa.db’);
  • in the second workspace of a Xiaomi smartphone: ‘/data/user/10/com.whatsapp/databases/wa.db’.
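When working with a full file-system extraction, it is safer to check every user profile than to assume the '/data/data/' path. A minimal Python sketch, assuming the extraction has been copied into a local directory that mirrors the device layout ('root' below); the function name is hypothetical.

```python
from pathlib import Path


def find_whatsapp_db(root, db_name="wa.db", package="com.whatsapp"):
    """Return each distinct copy of a WhatsApp database in an extracted Android file system.

    Looks in /data/data/<package> and in every numeric /data/user/<N>/<package>
    (N = 0 for the main user, 10 for the Xiaomi second workspace described above).
    """
    root = Path(root)
    candidates = [root / "data" / "data" / package / "databases" / db_name]
    users = root / "data" / "user"
    if users.is_dir():
        for user_dir in sorted(users.iterdir()):
            if user_dir.name.isdigit():
                candidates.append(user_dir / package / "databases" / db_name)

    found, seen = [], set()
    for path in candidates:
        # /data/user/0 is often a link to /data/data, so skip paths that resolve to the same file.
        if path.is_file() and path.resolve() not in seen:
            seen.add(path.resolve())
            found.append(path)
    return found
```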

WhatsApp artifacts on iOS devices

Unlike on Android, on iOS WhatsApp data is included in the device backup (iTunes backup). Therefore, extracting data from this application does not require extracting the file system or creating a physical memory dump of the device under investigation. Most of the relevant information is contained in the database ‘ChatStorage.sqlite’, located at ‘/private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/’ (some programs display this path as ‘AppDomainGroup-group.net.whatsapp.WhatsApp.shared’).
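Inside an iTunes (or Finder) backup, files are not stored under their original names. In backups made by iOS 10 and later, each file is named after the SHA-1 hash of "<domain>-<relative path>" and placed in a subdirectory named after the first two hex characters of that hash, while the backup's 'Manifest.db' maps these IDs back to domains and paths in its 'Files' table. The Python sketch below applies this to 'ChatStorage.sqlite', assuming the domain 'AppDomainGroup-group.net.whatsapp.WhatsApp.shared' and the relative path 'ChatStorage.sqlite'. In encrypted backups, 'Manifest.db' and the file contents must be decrypted first, which this sketch does not cover.

```python
import hashlib
import sqlite3
from contextlib import closing
from pathlib import Path

# Domain of WhatsApp's shared app-group container, as shown by backup tools.
WA_DOMAIN = "AppDomainGroup-group.net.whatsapp.WhatsApp.shared"


def backup_file_id(domain, relative_path):
    """File ID used by iOS 10+ backups: SHA-1 of '<domain>-<relativePath>'."""
    return hashlib.sha1(f"{domain}-{relative_path}".encode("utf-8")).hexdigest()


def backup_file_path(backup_dir, domain, relative_path):
    """Expected location inside the backup: <backup>/<first 2 hex chars>/<file ID>."""
    file_id = backup_file_id(domain, relative_path)
    return Path(backup_dir) / file_id[:2] / file_id


def lookup_in_manifest(manifest_db, domain, relative_path):
    """Cross-check the file ID against the 'Files' table of an unencrypted Manifest.db."""
    with closing(sqlite3.connect(manifest_db)) as conn:
        row = conn.execute(
            "SELECT fileID FROM Files WHERE domain = ? AND relativePath = ?",
            (domain, relative_path),
        ).fetchone()
    return row[0] if row else None
```

For example, backup_file_path(backup_dir, WA_DOMAIN, "ChatStorage.sqlite") points to the file that can be copied out and opened as an ordinary SQLite database.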

Structure of ‘ChatStorage.sqlite’:

The most informative tables in the database 'ChatStorage.sqlite' are 'ZWAMESSAGE' and 'ZWAMEDIAITEM'.

Structure of the ‘ZWAMESSAGE’ table:

Field name Meaning
Z_PK sequence number of the record (in the SQL table)
Z_ENT table identifier, value is '9'
Z_OPT unknown, usually contains values from '1' to '6'
ZCHILDMESSAGESDELIVEREDCOUNT unknown, usually contains the value '0'
ZCHILDMESSAGESPLAYEDCOUNT unknown, usually '0'
ZCHILDMESSAGESREADCOUNT unknown, usually '0'
ZDATAITEMVERSION unknown, usually contains value '3', probably text message pointer
ZDOCID unknown
ZENCRETRYCOUNT unknown, usually '0'
ZFILTEREDRECIPIENTCOUNT unknown, usually contains the values '0', '2', '256'
ZISFROMME message direction: '0' - incoming, '1' - outgoing
ZMESSAGEERRORSTATUS message transfer status. If the message is sent/received, it has the value '0'
ZMESSAGETYPE type of the transmitted message
ZSORT unknown
ZSPOTLIGHTSTATUS unknown
ZSTARRED unknown, not used
ZCHATSESSION unknown
unknown, not used
ZLASTSESSION unknown
ZMEDIAITEM unknown
ZMESSAGEINFO unknown
ZPARENTMESSAGE unknown, not used
ZMESSAGEDATE timestamp in OS X Epoch Time format
ZSENTDATE the time the message was sent in OS X Epoch Time format
ZFROMJID WhatsApp Sender ID
ZMEDIASECTIONID contains the year and month the media file was sent
ZPHASH unknown, not used
ZPUSHNAME name of the contact who sent the media file, in UTF-8
ZSTANZAID unique message identifier
ZTEXT message text
ZTOJID WhatsApp Recipient ID
OFFSET offset
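The dates in 'ZWAMESSAGE' use what the table calls OS X Epoch Time, that is, Apple's Core Data (Cocoa) reference date: seconds counted from 2001-01-01 00:00:00 UTC, which is 978,307,200 seconds after the Unix epoch. A minimal Python sketch that converts them while reading a few of the columns listed above; it assumes a copy of 'ChatStorage.sqlite' and, following the table, treats 'ZISFROMME' = '1' as outgoing and everything else as incoming. The function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Apple Core Data ("OS X Epoch") reference date: 2001-01-01 00:00:00 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def apple_to_utc(seconds):
    """Convert a ZMESSAGEDATE / ZSENTDATE value (seconds, may be fractional) to UTC."""
    if seconds is None:
        return None
    return APPLE_EPOCH + timedelta(seconds=seconds)


def read_zwamessage(conn):
    """Yield (direction, date, sender JID, recipient JID, text) ordered by message date."""
    rows = conn.execute(
        "SELECT ZISFROMME, ZMESSAGEDATE, ZFROMJID, ZTOJID, ZTEXT "
        "FROM ZWAMESSAGE ORDER BY ZMESSAGEDATE"
    )
    for from_me, date, from_jid, to_jid, text in rows:
        direction = "outgoing" if from_me == 1 else "incoming"
        yield direction, apple_to_utc(date), from_jid, to_jid, text
```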

Structure of the ‘ZWAMEDIAITEM’ table:

Field name Meaning
Z_PK record number (in SQL table)
Z_ENT table identifier, value is '8'
Z_OPT unknown, usually contains values from '1' to '3'
ZCLOUDSTATUS contains the value '4' if the file has been downloaded
ZFILESIZE contains the file length (in bytes) for uploaded files
ZMEDIAORIGIN unknown, usually '0'
ZMOVIEDURATION duration of the media file; for PDF files it may contain the number of pages of the document
ZMESSAGE contains a serial number (number differs from the one shown in the 'Z_PK' column)
ZASPECTRATIO aspect ratio, not used, usually set to '0'
unknown, usually '0'
ZLATITUDE width in pixels
ZLONGITUDE height in pixels
ZMEDIAURLDATE timestamp in OS X Epoch Time format
author (for documents, may contain file name)
ZCOLLECTIONNAME not used
ZMEDIALOCALPATH file name (with path) in the file system of the device
ZMEDIAURL URL where the media file was located. If the file was transferred from one subscriber to another, it was encrypted, and the transferred file in the URL has the extension '.enc'
ZTHUMBNAILLOCALPATH path to the file thumbnail in the device file system
ZTITLE file title
ZVCARDNAME hash of the media file, when transferring a file to a group, it may contain the sender ID
ZVCARDSTRING contains information about the type of file being transferred (for example, image/jpeg), when transferring a file to a group, it may contain the recipient identifier
ZXMPPTHUMBPATH path to file thumbnail in device file system
ZMEDIAKEY unknown, probably contains the key to decrypt the encrypted file.
ZMETADATA metadata of the transmitted message
Offset offset
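To tie a media item to its message, the two tables have to be joined. The table description above only says that 'ZWAMEDIAITEM.ZMESSAGE' holds a number different from the row's own 'Z_PK'; the Python sketch below assumes that this number is the 'Z_PK' of the owning 'ZWAMESSAGE' row (the usual shape of a Core Data to-one relationship), which should be verified on the database under examination. It also flags URLs ending in '.enc', which, as described above, indicate an encrypted transfer. The function name is hypothetical.

```python
import sqlite3


def media_items(conn):
    """List media rows with their parent message, flagging encrypted (.enc) URLs.

    Assumes ZWAMEDIAITEM.ZMESSAGE holds the Z_PK of the owning ZWAMESSAGE row.
    """
    query = """
        SELECT m.Z_PK, m.ZFROMJID, i.ZMEDIALOCALPATH, i.ZFILESIZE, i.ZMEDIAURL
        FROM ZWAMEDIAITEM AS i
        JOIN ZWAMESSAGE AS m ON m.Z_PK = i.ZMESSAGE
        ORDER BY m.Z_PK
    """
    result = []
    for msg_pk, sender, local_path, size, url in conn.execute(query):
        result.append({
            "message": msg_pk,
            "sender": sender,
            "local_path": local_path,
            "size": size,
            "url": url,
            # Ignore any query string before checking the extension.
            "encrypted": bool(url) and url.split("?")[0].endswith(".enc"),
        })
    return result
```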

Other interesting tables in the database 'ChatStorage.sqlite' are:

  • ‘ZWAPROFILEPUSHNAME’. Maps WhatsApp IDs to contact names;
  • ‘ZWAPROFILEPICTUREITEM’. Maps WhatsApp IDs to contact avatars;
  • ‘Z_PRIMARYKEY’. Contains general information about the database, such as the total number of stored messages, the total number of chats, and so on.
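Reading 'Z_PRIMARYKEY' is simple, but its numbers need care. In Core Data stores this table normally has the columns 'Z_ENT', 'Z_NAME', 'Z_SUPER' and 'Z_MAX', where 'Z_MAX' is the highest primary key assigned so far for each entity. That makes it an upper bound on the number of records ever created rather than an exact count, and comparing it with the current row count can hint at deleted records. A minimal Python sketch; the function names are hypothetical, and the entity name used in the usage note ('WAMessage') is an assumption.

```python
import sqlite3


def entity_counters(conn):
    """Map each Core Data entity name to its Z_MAX value from Z_PRIMARYKEY."""
    return dict(conn.execute("SELECT Z_NAME, Z_MAX FROM Z_PRIMARYKEY"))


def pk_gap(conn, entity, table):
    """Z_MAX minus the current row count of 'table'; a positive gap may indicate deleted rows.

    'table' is interpolated into SQL, so pass only trusted names such as 'ZWAMESSAGE'.
    """
    z_max = entity_counters(conn).get(entity)
    if z_max is None:
        return None
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return z_max - count
```

For example, pk_gap(conn, "WAMessage", "ZWAMESSAGE"). A gap can have other causes too, so it is a lead for further analysis rather than proof of deletion.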

Also, when examining WhatsApp on an iOS mobile device, pay attention to the following files:

  • File ‘BackedUpKeyValue.sqlite’. Contains cryptographic keys and other data needed to identify the account owner. Located at: /private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/.
  • File ‘ContactsV2.sqlite’. Contains information about the user's contacts, such as full name, phone number, contact status (as text), WhatsApp ID, etc. Located at: /private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/.
  • File ‘consumer_version’. Contains the version number of the installed WhatsApp application. Located at: /private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/.
  • File ‘current_wallpaper.jpg’. Contains the current WhatsApp background wallpaper. Located at: /private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/. Older versions of the application use the file ‘wallpaper’, located at: '/private/var/mobile/Applications/net.whatsapp.WhatsApp/Documents/'.
  • File ‘blockedcontacts.dat’. Contains information about blocked contacts. Located at: /private/var/mobile/Applications/net.whatsapp.WhatsApp/Documents/.
  • File ‘pw.dat’. Contains an encrypted password. Located at: ‘/private/var/mobile/Applications/net.whatsapp.WhatsApp/Library/’.
  • File ‘net.whatsapp.WhatsApp.plist’ (or ‘group.net.whatsapp.WhatsApp.shared.plist’). Contains information about the WhatsApp account profile. Located at: ‘/private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/Library/Preferences/’.

The contents of the ‘group.net.whatsapp.WhatsApp.shared.plist’ file
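Property lists like this one may be stored in binary or XML form. Python's standard 'plistlib' module reads both, so a short sketch is enough to dump every setting for review. The key names inside WhatsApp's plist are not documented here, so the code makes no assumptions about them; the helper names are hypothetical.

```python
import plistlib


def load_plist(path):
    """Read a binary or XML property list; plistlib detects the format itself."""
    with open(path, "rb") as fh:
        return plistlib.load(fh)


def flatten(obj, prefix=""):
    """Yield (dotted key path, value) pairs so nested settings can be listed or searched."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, obj
```

Usage: for key, value in flatten(load_plist("group.net.whatsapp.WhatsApp.shared.plist")): print(key, value)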

You also need to pay attention to the following directories:

  • Directory ‘/private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/Media/Profile/’. Contains thumbnails of contacts and groups (files with the .thumb extension), contact avatars, and the account owner's avatar (file ‘Photo.jpg’).
  • Directory ‘/private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/Message/Media/’. Contains multimedia files and their thumbnails.
  • Directory ‘/private/var/mobile/Applications/net.whatsapp.WhatsApp/Documents/’. Contains the application log (file ‘calls.log’) and its backup copy (file ‘calls.backup.log’).
  • Directory ‘/private/var/mobile/Applications/group.net.whatsapp.WhatsApp.shared/stickers/’. Contains stickers (files in ‘.webp’ format).
  • Directory ‘/private/var/mobile/Applications/net.whatsapp.WhatsApp/Library/Logs/’. Contains program logs.

WhatsApp artifacts on Windows


WhatsApp artifacts on Windows can be found in several places. First of all, these are the directories containing the program's executable and auxiliary files (for Windows 8/10):

  • ‘C:\Program Files (x86)\WhatsApp\’
  • ‘C:\Users\%User profile%\AppData\Local\WhatsApp\’
  • ‘C:\Users\%User profile%\AppData\Local\VirtualStore\Program Files (x86)\WhatsApp\’

The directory ‘C:\Users\%User profile%\AppData\Local\WhatsApp\’ contains the log file ‘SquirrelSetup.log’, which records update checks and program installation.

The directory ‘C:\Users\%User profile%\AppData\Roaming\WhatsApp\’ contains several subdirectories:

The file ‘main-process.log’ contains information about the operation of WhatsApp.

The subdirectory 'databases' contains the file 'Databases.db', but this file does not contain any information about chats or contacts.

The most interesting files from a forensic point of view are located in the ‘Cache’ directory. These are mainly files named ‘f_*******’ (where * is a number from 0 to 9) containing encrypted multimedia files and documents, although some unencrypted files are also among them. Of particular interest are the files 'data_0', 'data_1', 'data_2' and 'data_3' in the same subdirectory. The files 'data_0', 'data_1' and 'data_3' contain external links to the transmitted encrypted multimedia files and documents.

Example of information contained in file 'data_1'
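Since encrypted and unencrypted files are mixed in the 'Cache' directory, a quick way to separate them is to check each file's leading signature (magic bytes). The Python sketch below does this for the 'f_' files and also extracts http(s) links from binary files such as 'data_1'. The signature table covers only a few common formats, an 'unknown' result means either encrypted data or an unlisted format, and the URL pattern may pick up trailing printable bytes, so the results should be reviewed manually. The helper names are hypothetical.

```python
import re
from pathlib import Path

# A few well-known file signatures ("magic bytes") found at offset 0.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip/office",
}
# Runs of printable ASCII starting with http:// or https://
URL = re.compile(rb"https?://[\x21-\x7e]+")


def sniff(data):
    """Guess a file type from its first bytes; 'unknown' may mean encrypted."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "unknown"


def scan_cache(cache_dir):
    """Classify every f_* file in the Cache directory by its signature."""
    result = {}
    for path in sorted(Path(cache_dir).glob("f_*")):
        if path.is_file():
            with open(path, "rb") as fh:
                result[path.name] = sniff(fh.read(16))
    return result


def extract_urls(data):
    """Pull http(s) URLs out of a binary blob such as data_0, data_1 or data_3."""
    return [m.group().decode("ascii") for m in URL.finditer(data)]
```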

The file 'data_3' may also contain image files.

The file 'data_2' contains contact avatars (they can be recovered by searching for file headers).

Avatars contained in the file ‘data_2’:
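Recovering the avatars from 'data_2' by their file headers is classic header/footer carving: find the JPEG start marker (FF D8 FF) and cut up to the next end marker (FF D9). A naive Python sketch, assuming the images are stored contiguously and uncompressed inside the file. Because an embedded EXIF thumbnail can contain its own end marker, some carved images may be truncated and should be checked visually. The function name is hypothetical.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(blob, max_size=1 << 20):
    """Return every JPEG_SOI..JPEG_EOI slice found in blob (naive header/footer carving)."""
    images = []
    start = blob.find(JPEG_SOI)
    while start != -1:
        end = blob.find(JPEG_EOI, start + len(JPEG_SOI), start + max_size)
        if end == -1:
            # No end marker within max_size bytes: skip this header and keep searching.
            start = blob.find(JPEG_SOI, start + 1)
            continue
        images.append(blob[start:end + len(JPEG_EOI)])
        start = blob.find(JPEG_SOI, end + len(JPEG_EOI))
    return images
```

Usage: read 'data_2' with open("data_2", "rb") and pass its bytes to carve_jpegs(); each returned slice can be written to its own .jpg file for viewing.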

Thus, the chats themselves cannot be found on the computer, but you can find:

  • multimedia files;
  • documents sent via WhatsApp;
  • account holder contact information.

WhatsApp artifacts on macOS

On macOS, you can find the same types of WhatsApp artifacts as on Windows.

Program files are located in the following directories:


Sources

  1. Forensic analysis of WhatsApp Messenger on Android smartphones, by Cosimo Anglano, 2014.
  2. Whatsapp Forensics: Eksplorasi sistem berkas dan basis data pada aplikasi Android dan iOS, by Ahmad Pratama, 2014.

In the following articles in this series:

Decryption of WhatsApp encrypted databases

An article that will explain how to generate a WhatsApp encryption key and give practical examples of decrypting encrypted WhatsApp databases.


Retrieve WhatsApp data from cloud storage

An article that will explain what WhatsApp data is stored in the clouds and describe methods for extracting this data from cloud storage.


WhatsApp data extraction: practical examples

An article that will describe, step by step, which programs to use and how to extract WhatsApp data from various devices.


Group-IB is a leading provider of cyber-attack detection, prevention, fraud detection, and online intellectual property protection, headquartered in Singapore.

How to read encrypted WhatsApp messages on Android without keys

One of the reasons WhatsApp has become one of the most popular messaging services is its strong security. It encrypts messages end to end, so the only people who can read them are the sender and the recipient, unless someone else can open the sender's or the recipient's phone.

But sometimes even the phone owner cannot access their phones due to technical failures. If you can't access your own phone, can you still read encrypted WhatsApp messages?

  • Part 1: WhatsApp message encryption types
  • Part 2: How to decrypt a WhatsApp crypt12/8 database without keys?
  • Bonus tip: How to read deleted WhatsApp messages on Android without root?

Part 1: WhatsApp Message Encryption Types

In September 2012, WhatsApp introduced data encryption as a security feature. This step was taken to prevent the session hijacking and packet sniffing that had often happened in the past. Over time, WhatsApp has used the crypt2, crypt5, crypt7, crypt8 and crypt12 formats to encrypt its data, which makes reading chat messages by simply opening the database files almost impossible.

But there are tricks you can use to decrypt the database without keys and supporting files. You can use this method to access your conversations.

Part 2: How to decrypt a WhatsApp crypt12/8 database without keys?

The trick below works for reading encrypted WhatsApp messages on Android devices. Before you start, make a copy of your WhatsApp database so that you don't damage the original file.

To do this, open Android Explorer or another file browser and create a new folder on the SD card. Then navigate to /WhatsApp/Databases/msgstore.db.crypt on your SD card and copy the msgstore.db.crypt files to the new folder you just created.
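The copying step can also be scripted, which makes it easy to confirm that each copy is identical to the original. A minimal Python sketch, assuming the SD card (or an image of it) is accessible as a local directory; the file-name pattern 'msgstore*.crypt*' is an assumption meant to catch msgstore.db.crypt12 and similarly named files, and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_databases(src_dir, dst_dir, pattern="msgstore*.crypt*"):
    """Copy matching database files to dst_dir and verify each copy by hash.

    Returns {file name: SHA-256}; raises OSError if a copy differs from its source.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = {}
    for src in sorted(src_dir.glob(pattern)):
        if not src.is_file():
            continue
        dst = dst_dir / src.name
        shutil.copy2(src, dst)  # copy2 also keeps the modification time
        original = sha256_of(src)
        if sha256_of(dst) != original:
            raise OSError(f"copy of {src.name} does not match the original")
        copied[src.name] = original
    return copied
```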

  • Method 1. (For rooted devices) Decrypt crypt12 WhatsApp database without key on PC
  • Method 2. (For non-rooted devices) Read crypt12 WhatsApp database without key on PC

Method 1. (For rooted devices) Decrypt crypt12 WhatsApp database without key on PC

WhatsApp encrypts all data in the .crypt5/7/8/12 formats. But on a rooted Android phone, you can easily decrypt and read these encrypted messages with WhatsApp Viewer.

  • Locate the backup file of your WhatsApp messages, such as msgstore.db.crypt12, in the device storage under /WhatsApp/Databases.

  • Find the key file containing the decryption key at /data/data/com.whatsapp/files/key.

  • Connect your phone to your computer and copy the database file (msgstore.db.crypt12) and the key file to your computer.
  • Download and install WhatsApp Viewer on your computer. Open WhatsApp Viewer and navigate to File > Decrypt .crypt12.

  • Now load the database file and the key file: click the "..." button next to the database file field to import it, and do the same for the key file. Then click OK to decrypt the database file.

  • When you see the message "Database has been decrypted to msgstore.decrypted.db", decryption is complete. You will find a file called "msgstore.decrypted.db" in the folder where you stored the database file and the key file.

  • Launch WhatsApp Viewer again and click File > Open. Click the "..." button to import the msgstore.decrypted.db file and click OK.

  • You can now select a mobile phone number in the right panel and view its chats in the left panel. If you like, you can export them in .txt, .html or .json format.
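Before opening the result in a viewer, it is worth confirming that decryption actually produced a SQLite database: every SQLite 3 file starts with the 16-byte header "SQLite format 3" followed by a zero byte. A small Python sketch that checks this header and then lists the tables through a read-only connection; the function name is hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

SQLITE_HEADER = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database


def check_decrypted(path):
    """Return the table names of a decrypted database; raise ValueError if it is not SQLite."""
    with open(path, "rb") as fh:
        if fh.read(16) != SQLITE_HEADER:
            raise ValueError(f"{path} is not a plain SQLite database (still encrypted?)")
    # Open read-only through a file: URI so the checked file is not modified.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        return [name for (name,) in rows]
```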

Method 2. (For non-rooted devices) Read crypt12 WhatsApp database without key on PC

To decrypt the database into a human-readable form, you can use one of the decryption apps available on the Google Play Store. The recommended application is Omni-crypt, which can decrypt a WhatsApp database without root. Note that to decrypt a database newer than crypt6, you will need whatsapp-key-db-extractor to extract the encryption key.

  • Connect your Android phone to your computer. Download and install Omni-crypt on your Android device.
  • Download whatsapp-key-db-extractor to your computer from GitHub.
  • Open the WhatsApp-Key-DB-Extractor folder and find the file named WhatsAppKeyDBExtract.sh . Right-click on it and select Properties.

  • On the Permissions tab, select the Allow executing file as a program check box.

  • After that run WhatsAppKeyDBExtract.sh file in Terminal on Mac.

  • When you are prompted to unlock your device and confirm the backup operation, open your Android phone and tap BACKUP MY DATA.

  • Wait for WhatsAppKeyDBExtract to restore WhatsApp and press Enter when finished to exit the Terminal.

  • Now open Omni-crypt on your Android phone. Click on ENABLE CRYPT BACKUP 6-12 and then click on WHATSAPP database DECRYPTION.

  • Now open the Whatsapp-Key-DB-Extractor folder and navigate to the extracted folder. There you will find the 'msgstore.db' and 'wa.db' files: 'msgstore.db' stores all messages along with attachment information, while 'wa.db' stores all contact-related information.

  • Use the WhatsApp Viewer utility and specify the paths to "msgstore.db" and "wa.db". You will see all chat messages retrieved from the database.
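Once both databases are decrypted, they can also be queried directly by attaching 'wa.db' to 'msgstore.db'. The Python sketch below counts messages per group-chat sender ('remote_resource', described earlier) and resolves each sender to a display name. It assumes that 'wa.db' contains a 'wa_contacts' table with 'jid' and 'display_name' columns; table and column names differ between WhatsApp versions, so check the schema first and run the query on copies of the databases. The function name is hypothetical.

```python
import sqlite3
from contextlib import closing


def group_senders(msgstore_path, wa_path):
    """Count messages per group sender, resolving remote_resource via wa.db contacts.

    Assumes wa.db has a 'wa_contacts' table with 'jid' and 'display_name' columns.
    """
    with closing(sqlite3.connect(msgstore_path)) as conn:
        conn.execute("ATTACH DATABASE ? AS wa", (wa_path,))
        query = """
            SELECT m.remote_resource, c.display_name, COUNT(*) AS n
            FROM messages AS m
            LEFT JOIN wa.wa_contacts AS c ON c.jid = m.remote_resource
            WHERE m.remote_resource IS NOT NULL AND m.remote_resource != ''
            GROUP BY m.remote_resource, c.display_name
            ORDER BY n DESC
        """
        return conn.execute(query).fetchall()
```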

Bonus tip: How to read deleted WhatsApp messages on Android without root?

That's all about how to read encrypted WhatsApp messages. These steps are a bit complicated for normal users. If you are looking for a way to read deleted WhatsApp messages, Tenorshare UltData WhatsApp Recovery offers you an easy way to recover WhatsApp messages and contacts from Android without root.