whatstk package

Subpackages

Submodules

whatstk.data module

Load sample chats.

This module contains the links to the sample chats that are currently available online. For more details, please refer to the source code.

Classes:

Urls(POKEMON, LOREM, LOREM1, LOREM2, LOREM_2000)

class whatstk.data.Urls(POKEMON, LOREM, LOREM1, LOREM2, LOREM_2000)

Bases: tuple

Attributes:

`LOREM`	Alias for field number 1
`LOREM1`	Alias for field number 2
`LOREM2`	Alias for field number 3
`LOREM_2000`	Alias for field number 4
`POKEMON`	Alias for field number 0

LOREM: Alias for field number 1

LOREM1: Alias for field number 2

LOREM2: Alias for field number 3

LOREM_2000: Alias for field number 4

POKEMON: Alias for field number 0

Module contents

Python wrapper and analysis tools for WhatsApp chats.

This library provides a wrapper for WhatsApp chats exported in multiple languages and from multiple operating systems. It also provides analytics tools.

Classes:

`FigureBuilder`([df, chat])	Generate a variety of figures from your loaded chat.
`WhatsAppChat`(df)	Load and process a WhatsApp chat file.

Functions:

`df_from_txt_whatsapp`(filepath, **kwargs)	Alias for `df_from_whatsapp`.
`df_from_whatsapp`(filepath[, auto_header, ...])	Load chat as a DataFrame.

class whatstk.FigureBuilder(df: DataFrame | None = None, chat: BaseChat = None)[source]

Bases: object

Generate a variety of figures from your loaded chat.

Integrates feature extraction and visualization logic to automate data plots.

Note: Either df or chat must be provided.

Parameters:

df (pandas.DataFrame, optional) – Chat data, i.e. the attribute df of a chat loaded using Chat. If a value is given, chat is ignored.
chat (Chat, optional) – Chat data, i.e. the object obtained when a chat is loaded using Chat. Required if df is None.

Attributes:

`user_color_mapping`	Get mapping between user and color.
`usernames`	Get list with users available in given chat.

Methods:

`user_interventions_count_linechart`([...])	Plot number of user interventions over time.
`user_message_responses_flow`([title])	Get the flow of message responses.
`user_message_responses_heatmap`([norm, title])	Get the response matrix heatmap.
`user_msg_length_boxplot`([title, xlabel])	Generate figure with boxplots of each user's message length.

property user_color_mapping: Dict[str, str]

Get mapping between user and color.

Each user is assigned a color automatically, so that this color is preserved for that user in all to-be-generated plots.

Returns: dict – Mapping from username to color (rgb).
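
One way to keep a user's color stable across plots is to build the mapping once from the sorted usernames and reuse it. The sketch below illustrates that idea with the standard library only. The palette and the helper name build_color_mapping are hypothetical and are not part of whatstk.

```python
from itertools import cycle

# Hypothetical palette (assumption): any list of rgb strings would do.
PALETTE = ["rgb(31, 119, 180)", "rgb(255, 127, 14)", "rgb(44, 160, 44)"]


def build_color_mapping(usernames, palette=PALETTE):
    """Assign one color per user, cycling the palette if users outnumber colors."""
    # Sorting makes the assignment deterministic regardless of input order.
    unique_users = sorted(set(usernames))
    return dict(zip(unique_users, cycle(palette)))


mapping = build_color_mapping(["Misty", "Ash Ketchum", "Brock", "Misty"])
```

Since the mapping depends only on the set of usernames, calling the helper again for another figure of the same chat yields the same colors.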

user_interventions_count_linechart(date_mode: str = 'date', msg_length: bool = False, cumulative: bool = False, all_users: bool = False, title: str = 'User interventions count', xlabel: str = 'Date/Time') → Figure[source]

Plot number of user interventions over time.

Parameters:

date_mode (str, optional) –
Choose mode to group interventions by. Defaults to 'date'. Available modes are:
- 'date': Grouped by particular date (year, month and day).
- 'hour': Grouped by hours.
- 'month': Grouped by months.
- 'weekday': Grouped by weekday (i.e. monday, tuesday, …, sunday).
- 'hourweekday': Grouped by weekday and hour.
msg_length (bool, optional) – Set to True to count the number of characters instead of number of messages sent.
cumulative (bool, optional) – Set to True to obtain cumulative counts.
all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.
title (str, optional) – Title for plot. Defaults to “User interventions count”.
xlabel (str, optional) – x-axis label title. Defaults to “Date/Time”.

Returns:

plotly.graph_objs.Figure – Plotly Figure.

Example

>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_interventions_count_linechart(cumulative=True)
>>> plot(fig)
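
To make the date_mode options more concrete, here is a minimal standard-library sketch of grouping messages by each mode and turning the result into a running total. The toy messages, the KEY_FUNCS table and the exact key used for each mode are assumptions of this sketch and may differ from what whatstk computes internally.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate

# Toy messages as (timestamp, username); not taken from a real chat.
messages = [
    (datetime(2020, 1, 15, 2, 22), "Mary"),
    (datetime(2020, 1, 15, 3, 33), "Mary"),
    (datetime(2020, 1, 16, 3, 5), "Giuseppe"),
    (datetime(2020, 2, 1, 18, 0), "Mary"),
]

# One key function per documented date_mode value.
KEY_FUNCS = {
    "date": lambda ts: ts.date(),
    "hour": lambda ts: ts.hour,
    "month": lambda ts: ts.month,
    "weekday": lambda ts: ts.weekday(),  # 0 = Monday, ..., 6 = Sunday
    "hourweekday": lambda ts: (ts.weekday(), ts.hour),
}


def count_interventions(msgs, date_mode="date"):
    """Count messages per (group key, username) pair."""
    key = KEY_FUNCS[date_mode]
    return Counter((key(ts), user) for ts, user in msgs)


by_date = count_interventions(messages, "date")
by_hour = count_interventions(messages, "hour")

# A cumulative count is a running total over the ordered groups.
all_dates = sorted({ts.date() for ts, _ in messages})
mary_daily = [by_date[(day, "Mary")] for day in all_dates]
mary_cumulative = list(accumulate(mary_daily))
```

Counting characters instead of messages, as msg_length=True does, would amount to summing message lengths per group rather than counting rows.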

user_message_responses_flow(title: str = 'Message flow') → Figure[source]

Get the flow of message responses.

A response from user X to user Y happens if user X sends a message right after a message from user Y.

Uses a Sankey diagram.

Parameters: title (str, optional) – Title for plot. Defaults to “Message flow”.
Returns: plotly.graph_objs.Figure – Plotly Figure.

Example

>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_message_responses_flow()
>>> plot(fig)
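
The response definition above can be sketched in a few lines: walk the time-ordered list of senders and count each (responder, previous sender) pair. The sender list below is toy data, and skipping consecutive messages from the same user is a choice made for this sketch rather than documented behavior.

```python
from collections import Counter

# Toy list of message senders (assumption: already sorted by time).
senders = ["Ash", "Brock", "Misty", "Ash", "Ash", "Brock"]


def count_responses(ordered_senders):
    """Count (responder, previous sender) pairs for consecutive messages."""
    pairs = zip(ordered_senders[1:], ordered_senders[:-1])
    # Consecutive messages by the same user are skipped in this sketch.
    return Counter((cur, prev) for cur, prev in pairs if cur != prev)


responses = count_responses(senders)

# A Sankey diagram is drawn from (source, target, value) links.
links = sorted((src, dst, n) for (src, dst), n in responses.items())
```

Each link then becomes one band in the Sankey diagram, with its width proportional to the number of responses.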

user_message_responses_heatmap(norm: str = 'absolute', title: str = 'Response matrix') → Figure[source]

Get the response matrix heatmap.

A response from user X to user Y happens if user X sends a message right after a message from user Y.

Parameters:

norm (str, optional) –
Specifies the type of normalization used for the response count. Can be:
- 'absolute': Absolute count of messages.
- 'joint': Normalized by total number of messages sent by all users.
- 'sender': Normalized per sender by total number of messages sent by user.
- 'receiver': Normalized per receiver by total number of messages sent by user.
title (str, optional) – Title for plot. Defaults to “Response matrix”.

Returns:

plotly.graph_objs.Figure – Plotly Figure.

Example

>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_message_responses_heatmap()
>>> plot(fig)
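
The four norm options can be illustrated on a small response matrix. In this sketch, counts[x][y] holds the number of responses from x to y, and normalization divides by the grand total, the row sum or the column sum. Treating rows as senders and columns as receivers, and using row and column sums as the per-user totals, are assumptions of this sketch.

```python
# Toy response counts: counts[x][y] = number of responses from x to y.
counts = {
    "Ash": {"Ash": 0, "Brock": 1, "Misty": 3},
    "Brock": {"Ash": 2, "Brock": 0, "Misty": 2},
    "Misty": {"Ash": 2, "Brock": 0, "Misty": 0},
}


def normalize(matrix, norm="absolute"):
    """Return a normalized copy of a nested-dict response matrix."""
    users = list(matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    sent = {x: sum(matrix[x].values()) for x in users}
    received = {y: sum(matrix[x][y] for x in users) for y in users}

    def denominator(sender, receiver):
        if norm == "absolute":
            return 1
        if norm == "joint":
            return total
        if norm == "sender":
            return sent[sender]
        if norm == "receiver":
            return received[receiver]
        raise ValueError(f"Unknown norm: {norm!r}")

    result = {}
    for x in users:
        result[x] = {}
        for y in users:
            d = denominator(x, y)
            # Guard against empty rows or columns.
            result[x][y] = matrix[x][y] / d if d else 0.0
    return result


joint = normalize(counts, "joint")
by_sender = normalize(counts, "sender")
by_receiver = normalize(counts, "receiver")
```

With 'sender' every non-empty row sums to 1, and with 'receiver' every non-empty column sums to 1, which makes users with very different activity levels comparable.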

user_msg_length_boxplot(title: str = 'User message length', xlabel: str = 'User') → Figure[source]

Generate figure with boxplots of each user’s message length.

Parameters:

title (str, optional) – Title for plot. Defaults to “User message length”.
xlabel (str, optional) – x-axis label title. Defaults to “User”.

Returns:

plotly.graph_objs.Figure – Plotly Figure.

See also

fig_boxplot_msglen

Example

>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_msg_length_boxplot()
>>> plot(fig)
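
The data behind such a boxplot is the list of message lengths per user. A minimal sketch of that feature extraction, using toy messages and a hypothetical helper message_lengths, could look like this:

```python
from collections import defaultdict
from statistics import median

# Toy messages as (username, text); not taken from a real chat.
messages = [
    ("Ash", "Hey guys!"),
    ("Brock", "Hey Ash"),
    ("Ash", "Indeed."),
    ("Misty", "Hi"),
    ("Ash", "OK"),
]


def message_lengths(msgs):
    """Collect the character length of every message, grouped by user."""
    lengths = defaultdict(list)
    for user, text in msgs:
        lengths[user].append(len(text))
    return dict(lengths)


lengths = message_lengths(messages)

# The median is the center line of each user's box.
medians = {user: median(values) for user, values in lengths.items()}
```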

property usernames: BaseChat

Get list with users available in given chat.

Returns: list – List with usernames available in chat DataFrame.

class whatstk.WhatsAppChat(df: DataFrame)[source]

Bases: BaseChat

Load and process a WhatsApp chat file.

Parameters: df (pandas.DataFrame) – Chat.

Example

This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly

Optionally, you can use the argument extra_metadata to add additional metadata to the chat:

>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON, extra_metadata=True)
>>> chat.name
'Pokemon Chat'
>>> chat.df_system
                 date                                            message
0   2016-04-15 15:04:00     Messages and calls are end-to-end encrypted. N...
>>> chat.df.head()
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly

Methods:

`from_source`(filepath[, extra_metadata])	Create an instance from a chat text file.
`from_sources`(filepaths[, auto_header, ...])	Load a WhatsAppChat instance from multiple sources.
`to_txt`(filepath[, hformat, encoding])	Export chat to a text file.
`to_zip`(filepath[, hformat, encoding])	Export chat to a zip file.

classmethod from_source(filepath: str, extra_metadata: bool | None = None, **kwargs: Any) → WhatsAppChat[source]

Create an instance from a chat text file.

Parameters:

filepath (str) –
Path to the file. Accepted sources are:
- Local file, e.g. ‘path/to/file.txt’ or ‘path/to/file.zip’ (iOS).
- URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
- Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
**kwargs – Refer to the docs from df_from_whatsapp for details on additional arguments.
extra_metadata (bool) – This is experimental. If True, additional metadata will be added to the DataFrame. This includes class attributes such as chat.name, chat.df_system (DataFrame with only system messages). Note that this attribute only works on group chats.

Returns:

WhatsAppChat – Class instance with loaded and parsed chat.

See also

df_from_whatsapp
WhatsAppChat.from_sources
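
The three accepted kinds of filepath can be told apart by their scheme. The sketch below shows one way to do that with urllib.parse. The helper names classify_source and gdrive_file_id are hypothetical, and this is not necessarily how whatstk dispatches internally.

```python
from urllib.parse import urlparse


def classify_source(filepath):
    """Return 'gdrive', 'url' or 'local' for a filepath string."""
    scheme = urlparse(filepath).scheme.lower()
    if scheme == "gdrive":
        return "gdrive"
    if scheme in ("http", "https"):
        return "url"
    # Anything else is treated as a path on the local file system.
    return "local"


def gdrive_file_id(filepath):
    """Extract FILE-ID from a 'gdrive://[FILE-ID]' string."""
    prefix = "gdrive://"
    if not filepath.startswith(prefix):
        raise ValueError(f"Not a Google Drive link: {filepath!r}")
    return filepath[len(prefix):]


kinds = [
    classify_source("path/to/file.txt"),
    classify_source("path/to/file.zip"),
    classify_source("http://www.url.to/file.txt"),
    classify_source("gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ"),
]
file_id = gdrive_file_id("gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ")
```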

classmethod from_sources(filepaths: str, auto_header: bool | None = None, hformat: str | None = None, encoding: str = 'utf-8') → WhatsAppChat[source]

Load a WhatsAppChat instance from multiple sources.

Parameters:

filepaths (list) – List with filepaths.
auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.
hformat (list, optional) – List with the header format to be used for each file. The list must be of length equal to len(filepaths). A valid header format might be ‘[%y-%m-%d %H:%M:%S] - %name:’.
encoding (str) – Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:

WhatsAppChat – Class instance with loaded and parsed chat.

See also

WhatsAppChat.from_source
merge_chats

Example

Load a chat using two text files. In this example, we use sample chats (available online, see the URLs in the source code of whatstk.data).

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
                 date        username                                            message
0 2019-10-20 10:16:00            John        Laborum sed excepteur id eu cillum sunt ut.
1 2019-10-20 11:15:00            Mary  Ad aliquip reprehenderit proident est irure mo...
2 2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
3 2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
4 2019-10-20 17:28:00            John                Do ex dolor consequat tempor et ex.
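
Conceptually, loading from several sources yields one chat whose messages are interleaved by date, as in the output above. The following sketch shows that interleaving on two already-sorted toy lists with heapq.merge. The message texts are invented, and the library's own merging logic may differ.

```python
from datetime import datetime
from heapq import merge

# Two toy chat fragments as (date, username, message), each sorted by date.
chat_1 = [
    (datetime(2019, 10, 20, 10, 16), "John", "first"),
    (datetime(2019, 10, 20, 12, 16), "+1 123 456 789", "third"),
]
chat_2 = [
    (datetime(2019, 10, 20, 11, 15), "Mary", "second"),
    (datetime(2019, 10, 20, 17, 28), "John", "fourth"),
]

# Interleave both fragments into a single date-ordered list.
merged = list(merge(chat_1, chat_2, key=lambda row: row[0]))
order = [text for _, _, text in merged]
```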

to_txt(filepath: str, hformat: str | None = None, encoding: str = 'utf8') → None[source]

Export chat to a text file.

Useful to export the chat to different formats (i.e. using different hformats).

Parameters:

filepath (str) – Name of the file to export (must be a local path).
hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
encoding (str, optional) –
Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

to_zip(filepath: str, hformat: str | None = None, encoding: str = 'utf8') → None[source]

Export chat to a zip file.

Useful to export the chat to different formats (i.e. using different hformats).

Parameters:

filepath (str) – Name of the file to export (must be a local path).
hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
encoding (str, optional) –
Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
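
For readers unfamiliar with zip exports, the sketch below shows the general idea with the standard zipfile module: encode the chat text and store it as a single member of an archive. The helper write_chat_zip and the inner file name chat.txt are hypothetical. How to_zip names and lays out the archive is not specified here.

```python
import io
import zipfile

# Toy chat lines using the default header format '%y-%m-%d, %H:%M - %name:'.
lines = [
    "2016-08-06, 13:23 - Ash Ketchum: Hey guys!",
    "2016-08-06, 13:25 - Brock: Hey Ash, good to have a common group!",
]


def write_chat_zip(target, chat_lines, inner_name="chat.txt", encoding="utf8"):
    """Write chat lines as one text file inside a zip archive."""
    text = "\n".join(chat_lines)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(inner_name, text.encode(encoding))


# An in-memory buffer stands in for a local file path.
buffer = io.BytesIO()
write_chat_zip(buffer, lines)

# Read the archive back to check the round trip.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    restored = archive.read("chat.txt").decode("utf8")
```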

whatstk.df_from_txt_whatsapp(filepath: str, **kwargs: Any) → DataFrame[source]: Alias for df_from_whatsapp.

whatstk.df_from_whatsapp(filepath: str, auto_header: bool = True, hformat: str | None = None, encoding: str = 'utf-8', message_type: bool | None = None) → DataFrame[source]

Load chat as a DataFrame.

Parameters:

filepath (str) –
Path to the file. Accepted sources are:
- Local file, e.g. ‘path/to/file.txt’ OR ‘path/to/_chat.zip’ (e.g. iOS export).
- URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
- Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
auto_header (bool, optional) – Detect header automatically. If False, hformat is required.
hformat (str, optional) –
Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use following keywords:
- '%y': for year ('%Y' is equivalent).
- '%m': for month.
- '%d': for day.
- '%H': for 24h-hour.
- '%I': for 12h-hour.
- '%M': for minutes.
- '%S': for seconds.
- '%P': for “PM”/“AM” or “p.m.”/“a.m.” characters.
- '%name': for the username.
Example 1: For the header ‘12/08/2016, 16:20 - username:’ we have hformat='%d/%m/%y, %H:%M - %name:'.

Example 2: For the header ‘2016-08-12, 4:20 PM - username:’ we have hformat='%y-%m-%d, %I:%M %P - %name:'.
encoding (str, optional) –
Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
message_type (bool, optional) – If True, label each message with its type. The label can be ‘user’ or ‘system’, based on who sent the message.

Returns:

pandas.DataFrame – DataFrame with the loaded and parsed chat.
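
To show how the hformat keywords relate to an actual header, here is a minimal sketch that translates an hformat string into a regular expression and parses the two example headers given above. The regex fragments, the helper names and the two-digit-year handling are assumptions of this sketch and do not reproduce the parser used by whatstk.

```python
import re
from datetime import datetime

# Regex fragment for each documented keyword (the patterns are assumptions).
KEYWORD_PATTERNS = {
    "%name": r"(?P<name>[^:]+)",
    "%y": r"(?P<year>\d{2,4})",
    "%Y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%I": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minute>\d{1,2})",
    "%S": r"(?P<second>\d{1,2})",
    "%P": r"(?P<ampm>[AaPp]\.?\s?[Mm]\.?)",
}

# Longest keywords first, so '%name' is matched as a whole.
TOKEN_RE = re.compile(
    "|".join(re.escape(k) for k in sorted(KEYWORD_PATTERNS, key=len, reverse=True))
)


def hformat_to_regex(hformat):
    """Replace keywords with regex groups and escape the literal text between them."""
    parts, pos = [], 0
    for match in TOKEN_RE.finditer(hformat):
        parts.append(re.escape(hformat[pos:match.start()]))
        parts.append(KEYWORD_PATTERNS[match.group()])
        pos = match.end()
    parts.append(re.escape(hformat[pos:]))
    return re.compile("".join(parts))


def parse_header(header, hformat):
    """Return (timestamp, username) for a header matching hformat."""
    match = hformat_to_regex(hformat).fullmatch(header)
    if match is None:
        raise ValueError(f"Header {header!r} does not match {hformat!r}")
    groups = match.groupdict()
    year = int(groups["year"])
    if year < 100:
        year += 2000  # Assumption: two-digit years belong to the 2000s.
    hour = int(groups["hour"])
    ampm = (groups.get("ampm") or "").lower().replace(".", "").replace(" ", "")
    if ampm == "pm" and hour < 12:
        hour += 12
    elif ampm == "am" and hour == 12:
        hour = 0
    timestamp = datetime(
        year,
        int(groups["month"]),
        int(groups["day"]),
        hour,
        int(groups["minute"]),
        int(groups.get("second") or 0),
    )
    return timestamp, groups["name"]


example_1 = parse_header("12/08/2016, 16:20 - username:", "%d/%m/%y, %H:%M - %name:")
example_2 = parse_header("2016-08-12, 4:20 PM - username:", "%y-%m-%d, %I:%M %P - %name:")
```

Both example headers describe the same moment, so the two calls return the same timestamp and username.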

Example

Read a chat

>>> from whatstk import df_from_whatsapp
>>> from whatstk.data import whatsapp_urls
>>> df = df_from_whatsapp(filepath=whatsapp_urls.LOREM)
>>> df.head(5)
                 date        username                                            message    message_type
0 2020-01-15 02:22:56            Mary                     Nostrud exercitation magna id.          system
1 2020-01-15 03:33:01            Mary     Non elit irure irure pariatur exercitation. 🇩🇰            user
2 2020-01-15 04:18:42  +1 123 456 789  Exercitation esse lorem reprehenderit ut ex ve...            user
3 2020-01-15 06:05:14        Giuseppe  Aliquip dolor reprehenderit voluptate dolore e...            user
4 2020-01-15 06:56:00            Mary              Ullamco duis et commodo exercitation.            user

Read a chat, labelling each message as ‘user’ or ‘system’. ‘system’ messages are those sent by the chat itself (creation of chat, etc.)

>>> from whatstk import df_from_whatsapp
>>> from whatstk.data import whatsapp_urls
>>> df = df_from_whatsapp(filepath=whatsapp_urls.POKEMON, message_type=True)
>>> df.head()

                 date        username                                            message    message_type
0 2016-04-15 15:04:00    Pokemon Chat  Messages and calls are end-to-end encrypted. N...          system
1 2016-08-06 13:23:00     Ash Ketchum                                          Hey guys!            user
2 2016-08-06 13:25:00           Brock              Hey Ash, good to have a common group!            user
3 2016-08-06 13:30:00           Misty  Hey guys! Long time since heard anything from you            user

See also