whatstk package

Submodules

whatstk.data module

Load sample chats.

This module contains the links to the sample chats currently available online. For more details, please refer to the source code.

Classes

Urls(POKEMON, LOREM, LOREM1, LOREM2, LOREM_2000)
class whatstk.data.Urls(POKEMON, LOREM, LOREM1, LOREM2, LOREM_2000)

Bases: tuple

Attributes

LOREM Alias for field number 1
LOREM1 Alias for field number 2
LOREM2 Alias for field number 3
LOREM_2000 Alias for field number 4
POKEMON Alias for field number 0
property LOREM

Alias for field number 1

property LOREM1

Alias for field number 2

property LOREM2

Alias for field number 3

property LOREM_2000

Alias for field number 4

property POKEMON

Alias for field number 0
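
Since Urls derives from tuple and exposes each field as an alias for a field number, it behaves like a named tuple. The sketch below rebuilds an equivalent structure with collections.namedtuple to show how named and positional access relate. The URL values are placeholders, not the real links stored in whatstk.data.

```python
from collections import namedtuple

# Same field names and order as documented for whatstk.data.Urls.
Urls = namedtuple("Urls", ["POKEMON", "LOREM", "LOREM1", "LOREM2", "LOREM_2000"])

# Placeholder values (assumption): the real links live in the whatstk.data source.
sample_urls = Urls(
    POKEMON="https://example.com/pokemon.txt",
    LOREM="https://example.com/lorem.txt",
    LOREM1="https://example.com/lorem1.txt",
    LOREM2="https://example.com/lorem2.txt",
    LOREM_2000="https://example.com/lorem_2000.txt",
)

# Each attribute is an alias for a position in the tuple.
pokemon_by_name = sample_urls.POKEMON
pokemon_by_index = sample_urls[0]
```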

Module contents

Python wrapper and analysis tools for WhatsApp chats.

This library provides a wrapper for WhatsApp chats exported in multiple languages and from multiple operating systems. In addition, analytics tools are provided.

Classes

WhatsAppChat(df) Load and process a WhatsApp chat file.
FigureBuilder([df, chat]) Generate a variety of figures from your loaded chat.
class whatstk.WhatsAppChat(df)[source]

Bases: whatstk._chat.BaseChat

Load and process a WhatsApp chat file.

Parameters:df (pandas.DataFrame) – Chat.

Methods

from_source(filepath, **kwargs) Create an instance from a chat text file.
from_sources(filepaths[, auto_header, …]) Load a WhatsAppChat instance from multiple sources.
to_txt(filepath[, hformat]) Export chat to a text file.

Example

This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly
classmethod from_source(filepath, **kwargs)[source]

Create an instance from a chat text file.

Parameters:
  • filepath (str) – Path to the file. It can be a local file (e.g. ‘path/to/file.txt’) or a URL to a hosted file (e.g. ‘http://www.url.to/file.txt’).
  • **kwargs – Refer to the docs from df_from_txt_whatsapp for details on additional arguments.
Returns:

WhatsAppChat – Class instance with loaded and parsed chat.

classmethod from_sources(filepaths, auto_header=None, hformat=None, encoding='utf-8')[source]

Load a WhatsAppChat instance from multiple sources.

Parameters:
  • filepaths (list) – List with filepaths.
  • auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.
  • hformat (list, optional) – List with the header format to be used for each file. The list must be of length equal to len(filepaths). A valid header format might be ‘[%y-%m-%d %H:%M:%S] - %name:’.
  • encoding (str) – Encoding to use when reading/writing files (e.g. ‘utf-8’). See the list of Python standard encodings.
Returns:

WhatsAppChat – Class instance with loaded and parsed chat.
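
The constraints between filepaths, auto_header and hformat described above can be sketched as a small validation step. check_header_args is a hypothetical helper written for illustration only; it is not part of whatstk, and ignoring hformat when automatic detection is active is an assumption.

```python
def check_header_args(filepaths, auto_header=None, hformat=None):
    """Return one header format per file, or None where it is auto-detected.

    Hypothetical helper, not part of whatstk.
    """
    if auto_header is None or auto_header:
        # None and True both mean: detect the header for every file.
        # Assumption: any hformat passed alongside is ignored in this case.
        return [None] * len(filepaths)
    if hformat is None:
        raise ValueError("hformat is required when auto_header is False")
    if len(hformat) != len(filepaths):
        raise ValueError("hformat must have the same length as filepaths")
    return list(hformat)


formats_auto = check_header_args(["chat_1.txt", "chat_2.txt"])
formats_manual = check_header_args(
    ["chat_1.txt"],
    auto_header=False,
    hformat=["[%y-%m-%d %H:%M:%S] - %name:"],
)
```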

Example

Load a chat using two text files. In this example, we use sample chats (available online, see the URLs in the source code of whatstk.data).

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
                 date        username                                            message
0 2019-10-20 10:16:00            John        Laborum sed excepteur id eu cillum sunt ut.
1 2019-10-20 11:15:00            Mary  Ad aliquip reprehenderit proident est irure mo...
2 2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
3 2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
4 2019-10-20 17:28:00            John                Do ex dolor consequat tempor et ex.
to_txt(filepath, hformat=None)[source]

Export chat to a text file.

Useful for exporting the chat in different formats (e.g. using different hformats).

Parameters:
  • filepath (str) – Name of the file to export (must be a local path).
  • hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
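
To illustrate what a header format does, the sketch below renders one message line from a format string. It is not the whatstk implementation: render_line is a hypothetical helper, and the date directives are expanded with Python's standard strftime (so %y yields a two-digit year), which may differ from how whatstk interprets them.

```python
from datetime import datetime


def render_line(date, username, message, hformat="%y-%m-%d, %H:%M - %name:"):
    """Build one exported line from a header format (hypothetical helper)."""
    # Swap %name for a marker first, so strftime never sees the
    # non-standard directive.
    template = hformat.replace("%name", "{name}")
    # Assumption: remaining directives follow standard strftime semantics.
    header = date.strftime(template).replace("{name}", username)
    return f"{header} {message}"


line = render_line(datetime(2016, 8, 6, 13, 23), "Ash Ketchum", "Hey guys!")
```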
class whatstk.FigureBuilder(df=None, chat=None)[source]

Bases: object

Generate a variety of figures from your loaded chat.

Integrates feature extraction and visualization logic to automate data plots.

Note: Either df or chat must be provided.

Parameters:
  • df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.
  • chat (Chat, optional) – Chat data. Object obtained when chat loaded using Chat. Required if df is None.
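
The precedence between df and chat described above can be sketched as follows. resolve_df and ChatStub are hypothetical names used only for illustration, and a plain list stands in for the DataFrame to keep the example self-contained.

```python
class ChatStub:
    """Stand-in for a loaded chat object exposing a df attribute."""

    def __init__(self, df):
        self.df = df


def resolve_df(df=None, chat=None):
    """Pick the chat data following the documented rule (hypothetical helper)."""
    if df is not None:
        # df takes precedence; chat is ignored when df is given.
        return df
    if chat is not None:
        return chat.df
    raise ValueError("Either df or chat must be provided.")


from_chat = resolve_df(chat=ChatStub(df=["row from chat"]))
from_df = resolve_df(df=["row from df"], chat=ChatStub(df=["row from chat"]))
```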

Attributes

user_color_mapping Get mapping between user and color.
usernames Get list with users available in given chat.

Methods

user_interventions_count_linechart([…]) Plot number of user interventions over time.
user_message_responses_flow([title]) Get the flow of message responses.
user_message_responses_heatmap([norm, title]) Get the response matrix heatmap.
user_msg_length_boxplot([title, xlabel]) Generate figure with boxplots of each user’s message length.
property user_color_mapping

Get mapping between user and color.

Each user is assigned a color automatically, so that this color is preserved for that user in all to-be-generated plots.

Returns:dict – Mapping from username to color (rgb).
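
One way to obtain a stable user-to-color mapping is to spread hues evenly over the sorted usernames. This is a minimal sketch under that assumption; the palette and the function build_color_mapping are illustrative and not taken from whatstk.

```python
import colorsys


def build_color_mapping(usernames):
    """Assign one rgb color string per user (hypothetical helper)."""
    # Sorting makes the assignment independent of message order.
    users = sorted(set(usernames))
    mapping = {}
    for index, user in enumerate(users):
        hue = index / len(users)  # evenly spaced hues in [0, 1)
        red, green, blue = colorsys.hsv_to_rgb(hue, 0.65, 0.9)
        mapping[user] = "rgb({}, {}, {})".format(
            round(red * 255), round(green * 255), round(blue * 255)
        )
    return mapping


colors = build_color_mapping(["Misty", "Ash Ketchum", "Brock", "Ash Ketchum"])
```

Because the mapping depends only on the set of usernames, every figure built from the same chat reuses the same color for a given user.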
user_interventions_count_linechart(date_mode='date', msg_length=False, cumulative=False, all_users=False, title='User interventions count', xlabel='Date/Time', cummulative=None)[source]

Plot number of user interventions over time.

Parameters:
  • date_mode (str, optional) –

    Choose mode to group interventions by. Defaults to 'date'. Available modes are:

    • 'date': Grouped by particular date (year, month and day).
    • 'hour': Grouped by hours.
    • 'month': Grouped by months.
    • 'weekday': Grouped by weekday (i.e. Monday, Tuesday, …, Sunday).
    • 'hourweekday': Grouped by weekday and hour.
  • msg_length (bool, optional) – Set to True to count the number of characters instead of number of messages sent.
  • cumulative (bool, optional) – Set to True to obtain cumulative counts.
  • all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.
  • title (str, optional) – Title for plot. Defaults to “User interventions count”.
  • xlabel (str, optional) – x-axis label title. Defaults to “Date/Time”.
  • cummulative (bool, optional) – Deprecated, use cumulative.
Returns:

plotly.graph_objs.Figure – Plotly Figure.
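
The grouping modes listed under date_mode can be sketched with the standard library. This is not the whatstk implementation: count_interventions is a hypothetical function, the messages are plain tuples rather than a DataFrame, and 'month' is assumed to mean the calendar month number.

```python
from collections import Counter
from datetime import datetime

# One key function per documented date_mode.
KEY_FUNCS = {
    "date": lambda d: d.date(),
    "hour": lambda d: d.hour,
    "month": lambda d: d.month,  # assumption: calendar month number
    "weekday": lambda d: d.weekday(),  # 0 = Monday, ..., 6 = Sunday
    "hourweekday": lambda d: (d.weekday(), d.hour),
}


def count_interventions(messages, date_mode="date", msg_length=False):
    """Count messages (or characters) per (username, time bucket)."""
    key = KEY_FUNCS[date_mode]
    counts = Counter()
    for date, username, text in messages:
        # msg_length=True counts characters instead of messages.
        counts[(username, key(date))] += len(text) if msg_length else 1
    return counts


messages = [
    (datetime(2016, 8, 6, 13, 23), "Ash Ketchum", "Hey guys!"),
    (datetime(2016, 8, 6, 13, 25), "Brock", "Hey Ash"),
    (datetime(2016, 8, 6, 13, 45), "Ash Ketchum", "Indeed."),
    (datetime(2016, 8, 7, 9, 0), "Ash Ketchum", "Morning"),
]
by_date = count_interventions(messages, date_mode="date")
by_hour = count_interventions(messages, date_mode="hour")
chars_by_date = count_interventions(messages, date_mode="date", msg_length=True)
```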

Example

>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_interventions_count_linechart(cumulative=True)
>>> plot(fig)
user_message_responses_flow(title='Message flow')[source]

Get the flow of message responses.

A response from user X to user Y happens if user X sends a message right after a message from user Y.

Uses a Sankey diagram.

Parameters:title (str, optional) – Title for plot. Defaults to “Message flow”.
Returns:plotly.graph_objs.Figure – Plotly Figure.
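
The definition of a response given above can be sketched by walking over consecutive senders. count_responses is a hypothetical function, and skipping consecutive messages from the same user is an assumption about how self-responses are handled.

```python
from collections import Counter


def count_responses(senders):
    """Count (responder, original sender) pairs from a chronological list."""
    responses = Counter()
    for previous, current in zip(senders, senders[1:]):
        # Assumption: a user writing twice in a row is not responding
        # to themselves.
        if current != previous:
            responses[(current, previous)] += 1
    return responses


senders = ["Ash", "Brock", "Ash", "Brock", "Misty", "Misty", "Ash"]
responses = count_responses(senders)
```

Each (responder, original sender) pair with its count corresponds to one link of a Sankey diagram.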

Example

>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_message_responses_flow()
>>> plot(fig)
user_message_responses_heatmap(norm='absolute', title='Response matrix')[source]

Get the response matrix heatmap.

A response from user X to user Y happens if user X sends a message right after a message from user Y.

Parameters:
  • norm (str, optional) –

    Specifies the type of normalization used for response count. Can be:

    • 'absolute': Absolute count of messages.
    • 'joint': Normalized by total number of messages sent by all users.
    • 'sender': Normalized per sender by total number of messages sent by user.
    • 'receiver': Normalized per receiver by total number of messages sent by user.
  • title (str, optional) – Title for plot. Defaults to “Response matrix”.
Returns:

plotly.graph_objs.Figure – Plotly Figure.
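
The four normalization modes can be sketched on a dictionary of response counts. normalize_responses is a hypothetical function; it assumes that 'sender' divides by the total number of responses sent by each user and 'receiver' by the total number received by each user, which is one reading of the descriptions above.

```python
def normalize_responses(counts, norm="absolute"):
    """Normalize a {(sender, receiver): count} mapping (hypothetical helper)."""
    if norm == "absolute":
        return dict(counts)
    if norm == "joint":
        total = sum(counts.values())
        return {pair: (n / total if total else 0.0) for pair, n in counts.items()}
    if norm in ("sender", "receiver"):
        # Position of the user whose total is used as denominator.
        position = 0 if norm == "sender" else 1
        totals = {}
        for pair, n in counts.items():
            totals[pair[position]] = totals.get(pair[position], 0) + n
        return {
            pair: (n / totals[pair[position]] if totals[pair[position]] else 0.0)
            for pair, n in counts.items()
        }
    raise ValueError("norm must be 'absolute', 'joint', 'sender' or 'receiver'")


counts = {
    ("Brock", "Ash"): 2,
    ("Ash", "Brock"): 1,
    ("Misty", "Brock"): 1,
    ("Ash", "Misty"): 1,
}
joint = normalize_responses(counts, norm="joint")
per_sender = normalize_responses(counts, norm="sender")
per_receiver = normalize_responses(counts, norm="receiver")
```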

Example

>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_message_responses_heatmap()
>>> plot(fig)
user_msg_length_boxplot(title='User message length', xlabel='User')[source]

Generate figure with boxplots of each user’s message length.

Parameters:
  • title (str, optional) – Title for plot. Defaults to “User message length”.
  • xlabel (str, optional) – x-axis label title. Defaults to “User”.
Returns:

dict – Dictionary with data and layout. Plotly compatible.
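
The data behind such a boxplot is the list of message lengths for each user. A minimal sketch of that step, using plain tuples instead of a DataFrame; message_lengths_by_user is a hypothetical function and not part of whatstk.

```python
from statistics import median


def message_lengths_by_user(messages):
    """Collect the character length of every message, grouped by user."""
    lengths = {}
    for username, text in messages:
        lengths.setdefault(username, []).append(len(text))
    return lengths


messages = [
    ("Ash Ketchum", "Hey guys!"),
    ("Brock", "Hey Ash!"),
    ("Ash Ketchum", "Indeed."),
]
lengths = message_lengths_by_user(messages)
# The median is the center line of each user's box.
medians = {user: median(values) for user, values in lengths.items()}
```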

Example

>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_msg_length_boxplot()
>>> plot(fig)
property usernames

Get list with users available in given chat.

Returns:list – List with usernames available in chat DataFrame.