Accessing and Wrangling iMessage Data

We first need to find our data. Luckily, Mac users have easy access via the chat.db file saved on their computer.

First, make sure that your iPhone's messages are backed up to the cloud. On your phone, go to Settings > your profile > iCloud and check that the switch next to Messages is on. On your Mac, open the Messages app, go to Preferences > iMessage, and click "Enable Messages in iCloud". It's quite likely that both are already on from when you set up your devices.

If you search for chat.db on your Mac, you will find it is still elusive. To open it, first go to System Preferences > Security & Privacy > Privacy. Now click 'Full Disk Access' and grant access to Terminal. chat.db is now available!

Working in Python, I've found the built-in sqlite3 module the most effective way to peruse the file. In your notebook or environment, run the function below.
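The original function is not reproduced here, so what follows is only a minimal sketch of what such a reader might look like. The function name read_chat is hypothetical, and the table and column names (message, chat_message_join, associated_message_guid, and so on) are assumptions based on the commonly observed chat.db schema, which can differ between macOS versions.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def read_chat(db_filepath, chat_id):
    """Return every message in one conversation as a pandas DataFrame.

    Hypothetical helper: the table and column names below are assumed,
    so check them against your own chat.db before relying on them.
    """
    query = """
        SELECT m.ROWID AS message_id,
               m.guid,
               m.text,
               m.date,
               m.is_from_me,
               m.handle_id,
               m.associated_message_guid,
               m.associated_message_type
        FROM message AS m
        JOIN chat_message_join AS cmj ON cmj.message_id = m.ROWID
        WHERE cmj.chat_id = ?
        ORDER BY m.date
    """
    # closing() makes sure the connection is released once the query is done
    with closing(sqlite3.connect(db_filepath)) as conn:
        return pd.read_sql_query(query, conn, params=(chat_id,))


# Example call (path and chat_id are placeholders):
# df = read_chat("/Users/yourname/Library/Messages/chat.db", chat_id=1)
```

Passing chat_id as a query parameter rather than formatting it into the SQL string keeps the query safe to reuse while you try different values.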

db_filepath should be your path to chat.db. By default, this is likely /Users/yourname/Library/Messages/chat.db. chat_id is the arbitrary ID of the specific conversation you are hoping to access. The easiest way to find it is to try different values or to open the db file in a database browser (I use 'DB Browser for SQLite').

This function returns a pandas dataframe with one message and its metadata in each row. We need to clean it further.

Here's a function to convert the time column into a datetime format we can work with:
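As a rough sketch of that conversion: on recent macOS versions the time column (assumed here to be named date) appears to store nanoseconds since 2001-01-01 UTC, while older versions reportedly used seconds. The function name add_datetime and the new column name timestamp are hypothetical.

```python
import pandas as pd

# Assumption: chat.db timestamps count from Apple's 2001-01-01 UTC epoch
APPLE_EPOCH = pd.Timestamp("2001-01-01", tz="UTC")


def add_datetime(df, column="date", unit="ns"):
    """Add a timezone-aware 'timestamp' column derived from the raw time column.

    Pass unit="s" if your database stores seconds instead of nanoseconds.
    """
    out = df.copy()
    out["timestamp"] = APPLE_EPOCH + pd.to_timedelta(out[column], unit=unit)
    return out
```

If the resulting dates land in the wrong decade, the unit is probably off. The timestamps are in UTC, so you may want to follow up with .dt.tz_convert and your local timezone.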

Here's a function that will add a column "msg_type" to your dataframe identifying the type of message being sent. iMessage has reactions, polls, stickers, and many more.
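One possible version of that labelling is sketched below. It assumes the dataframe has an associated_message_type column and that codes 2000 to 2005 mark tapback reactions while 3000 to 3005 mark their removal. These values come from community reverse engineering rather than Apple documentation, so treat the mapping as an assumption. Anything unrecognised (stickers, polls, app messages) is simply labelled "other". The names add_msg_type and classify_message are hypothetical.

```python
import pandas as pd

# Assumed tapback codes; not officially documented by Apple
TAPBACKS = {
    2000: "love",
    2001: "like",
    2002: "dislike",
    2003: "laugh",
    2004: "emphasize",
    2005: "question",
}


def classify_message(code):
    """Map one associated_message_type value to a readable label."""
    if pd.isna(code) or int(code) == 0:
        return "message"  # an ordinary message
    code = int(code)
    if code in TAPBACKS:
        return "reaction_" + TAPBACKS[code]
    if code - 1000 in TAPBACKS:
        return "reaction_removed_" + TAPBACKS[code - 1000]
    return "other"  # stickers, polls, and anything else not mapped above


def add_msg_type(df):
    """Return a copy of df with a new 'msg_type' column."""
    out = df.copy()
    out["msg_type"] = out["associated_message_type"].map(classify_message)
    return out
```

Running out["msg_type"].value_counts() afterwards is a quick way to see how much of the conversation falls into "other" and whether more codes are worth mapping.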

In your df, "associated_message_guid" is used to identify the root message for reaction messages. Unfortunately, it does so using the convoluted GUIDs associated with each message. Here's a function that will clean up the GUIDs via enumeration:
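A minimal sketch of that enumeration might look like this. It assumes each row has a guid column, and that associated_message_guid holds the root message's GUID behind a prefix such as "p:0/" or "bp:". The prefix format is an assumption from commonly observed data. The names enumerate_guids, msg_id, and associated_msg_id are hypothetical.

```python
import pandas as pd


def enumerate_guids(df):
    """Replace long GUID strings with small integer IDs.

    Adds 'msg_id' (the row's own ID) and 'associated_msg_id' (the ID of the
    root message a reaction points to, or <NA> when there is none).
    """
    out = df.reset_index(drop=True).copy()

    # Number every message in its current order
    guid_to_id = {guid: i for i, guid in enumerate(out["guid"])}
    out["msg_id"] = out["guid"].map(guid_to_id)

    # Strip the assumed "p:<part>/" or "bp:" prefix to recover the bare GUID
    root_guid = out["associated_message_guid"].str.replace(
        r"^(?:p:\d+/|bp:)", "", regex=True
    )

    # Roots outside this dataframe stay missing; Int64 keeps the IDs integral
    out["associated_msg_id"] = pd.to_numeric(root_guid.map(guid_to_id)).astype("Int64")
    return out
```

With integer IDs in place, joining a reaction back to its root message becomes a simple lookup on msg_id.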

Now we can pull just the columns that we need:
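Which columns you need depends on your analysis, so the list below is purely illustrative. The column names are assumptions carried over from a typical chat.db extract plus the hypothetical cleaning steps, and the function name select_columns is likewise made up for this sketch.

```python
import pandas as pd

# Illustrative defaults; swap in whatever columns your analysis needs
DEFAULT_COLUMNS = (
    "msg_id",
    "timestamp",
    "is_from_me",
    "handle_id",
    "text",
    "msg_type",
    "associated_msg_id",
)


def select_columns(df, columns=DEFAULT_COLUMNS):
    """Return a copy of df restricted to the requested columns.

    Columns that are missing from df are skipped rather than raising an error.
    """
    keep = [name for name in columns if name in df.columns]
    return df[keep].copy()
```

Skipping absent columns is a design choice for exploratory work. If you would rather fail loudly on a typo, index with df[list(columns)] instead.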

Voila! Dataframe wrangled.