whatstk package¶
- whatstk.analysis package
- whatstk.graph package
- whatstk.utils package
- whatstk.whatsapp package
whatstk.data module¶
Load sample chats.
This module contains the links to the sample chats that are currently available online. For more details, please refer to the source code.
- class whatstk.data.Urls(POKEMON, LOREM, LOREM1, LOREM2, LOREM_2000)¶
- LOREM¶
Alias for field number 1
- LOREM1¶
Alias for field number 2
- LOREM2¶
Alias for field number 3
- LOREM_2000¶
Alias for field number 4
- POKEMON¶
Alias for field number 0
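The "Alias for field number N" entries are what a named tuple exposes for each of its fields. The following is a minimal sketch, assuming Urls is built with collections.namedtuple; the stand-in class and the example.org values are hypothetical placeholders, not the real links stored in the library.

```python
from collections import namedtuple

# Hypothetical stand-in for whatstk.data.Urls, using the field order from the
# class signature. The values are placeholders, not the library's real links.
Urls = namedtuple("Urls", ["POKEMON", "LOREM", "LOREM1", "LOREM2", "LOREM_2000"])

sample_urls = Urls(
    POKEMON="https://example.org/pokemon.txt",
    LOREM="https://example.org/lorem.txt",
    LOREM1="https://example.org/lorem1.txt",
    LOREM2="https://example.org/lorem2.txt",
    LOREM_2000="https://example.org/lorem_2000.txt",
)

# Each field name is an alias for a position in the tuple.
for index, field in enumerate(Urls._fields):
    assert getattr(sample_urls, field) == sample_urls[index]

print(sample_urls.LOREM_2000 == sample_urls[4])
```

Reading a field by name (sample_urls.POKEMON) is usually clearer than reading it by index, and it keeps working if the field order ever changes.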
Module contents¶
Python wrapper and analysis tools for WhatsApp chats.
This library provides a wrapper that supports multiple languages and operating systems. In addition, analytics tools are provided.
FigureBuilder | Generate a variety of figures from your loaded chat.
WhatsAppChat | Load and process a WhatsApp chat file.
df_from_txt_whatsapp | Alias for
df_from_whatsapp | Load chat as a DataFrame.
- class whatstk.FigureBuilder(df: Optional[DataFrame] = None, chat: Optional[BaseChat] = None)[source]¶
Generate a variety of figures from your loaded chat.
Integrates feature extraction and visualization logic to automate data plots.
Note: Either df or chat must be provided.
- Parameters
df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.
chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.
user_color_mapping – Get mapping between user and color.
Get list with users available in given chat.
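user_interventions_count_linechart – Plot number of user interventions over time.
user_message_responses_flow([title]) – Get the flow of message responses.
user_message_responses_heatmap([norm, title]) – Get the response matrix heatmap.
user_msg_length_boxplot([title, xlabel]) – Generate figure with boxplots of each user's message length.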
- property user_color_mapping: Dict[str, str]¶
Get mapping between user and color.
Each user is assigned a color automatically, so that this color is preserved for that user in all to-be-generated plots.
- Returns
dict – Mapping from username to color (rgb).
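One way to keep a user's color stable across plots is to build the mapping once and reuse it. The sketch below is illustrative only: build_color_mapping and PALETTE are hypothetical names, and the library may assign colors differently.

```python
from itertools import cycle
from typing import Dict, Iterable, Sequence

# Hypothetical palette of rgb strings; not taken from the library.
PALETTE = ("rgb(31, 119, 180)", "rgb(255, 127, 14)", "rgb(44, 160, 44)")


def build_color_mapping(
    usernames: Iterable[str], palette: Sequence[str] = PALETTE
) -> Dict[str, str]:
    """Assign one color per distinct user, cycling the palette if needed."""
    # Sorting makes the assignment deterministic for a given set of users.
    unique_users = sorted(set(usernames))
    return dict(zip(unique_users, cycle(palette)))


mapping = build_color_mapping(["Mary", "John", "Mary", "Giuseppe", "Ash"])
print(mapping)
```

Because the mapping depends only on the set of usernames, every figure built from the same chat gets the same colors.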
- user_interventions_count_linechart(date_mode: str = 'date', msg_length: bool = False, cumulative: bool = False, all_users: bool = False, title: str = 'User interventions count', xlabel: str = 'Date/Time') Figure [source]¶
Plot number of user interventions over time.
- Parameters
date_mode (str, optional) –
Choose mode to group interventions by. Defaults to 'date'. Available modes are:
'date': Grouped by particular date (year, month and day).
'hour': Grouped by hours.
'month': Grouped by months.
'weekday': Grouped by weekday (i.e. monday, tuesday, …, sunday).
'hourweekday': Grouped by weekday and hour.
msg_length (bool, optional) – Set to True to count the number of characters instead of number of messages sent.
cumulative (bool, optional) – Set to True to obtain cumulative counts.
all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.
title (str, optional) – Title for plot. Defaults to “User interventions count”.
xlabel (str, optional) – x-axis label title. Defaults to “Date/Time”.
- Returns
plotly.graph_objs.Figure – Plotly Figure.
See also
>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_interventions_count_linechart(cumulative=True)
>>> plot(fig)
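To make the date_mode, msg_length and cumulative options concrete, here is a minimal standard-library sketch of the grouping they describe. It is not the library's implementation: group_key and count_interventions are hypothetical helpers, all users are counted together, and 'month' is assumed to mean the month of the year.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate


def group_key(timestamp, date_mode):
    """Return the bucket a timestamp falls into for the given date_mode."""
    if date_mode == "date":
        return timestamp.date()
    if date_mode == "hour":
        return timestamp.hour
    if date_mode == "month":
        return timestamp.month  # assumption: month of the year, 1-12
    if date_mode == "weekday":
        return timestamp.weekday()  # 0 = monday, ..., 6 = sunday
    if date_mode == "hourweekday":
        return (timestamp.weekday(), timestamp.hour)
    raise ValueError(f"Unknown date_mode: {date_mode!r}")


def count_interventions(messages, date_mode="date", msg_length=False, cumulative=False):
    """Count messages (or characters) per bucket, optionally as a running total."""
    counts = Counter()
    for timestamp, text in messages:
        counts[group_key(timestamp, date_mode)] += len(text) if msg_length else 1
    keys = sorted(counts)
    values = [counts[key] for key in keys]
    if cumulative:
        values = list(accumulate(values))
    return dict(zip(keys, values))


messages = [
    (datetime(2020, 1, 15, 2, 22, 56), "Hi"),
    (datetime(2020, 1, 15, 3, 33, 1), "Hello there"),
    (datetime(2020, 1, 16, 6, 5, 14), "Bye"),
]
print(count_interventions(messages, date_mode="date", cumulative=True))
```

A running total is mainly meaningful for the chronological 'date' mode; for the cyclic modes ('hour', 'weekday', ...) plain counts are usually easier to read.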
- user_message_responses_flow(title: str = 'Message flow') Figure [source]¶
Get the flow of message responses.
A response from user X to user Y happens if user X sends a message right after a message from user Y.
Uses a Sankey diagram.
- Parameters
title (str, optional) – Title for plot. Defaults to “Message flow”.
- Returns
plotly.graph_objs.Figure – Plotly Figure.
See also
>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_message_responses_flow()
>>> plot(fig)
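The definition of a response (user X writes right after user Y) amounts to counting pairs of consecutive messages. The sketch below illustrates that idea with a hypothetical count_responses helper; it is not the library's code, and it keeps consecutive messages by the same user as self-links, which is an assumption.

```python
from collections import Counter


def count_responses(senders):
    """Count (responder, previous sender) pairs over consecutive messages."""
    senders = list(senders)
    # Pair every message's sender with the sender of the message before it.
    # Assumption: two consecutive messages by the same user count as a self-link.
    return Counter(zip(senders[1:], senders[:-1]))


senders = ["Ash", "Brock", "Ash", "Brock", "Misty"]
links = count_responses(senders)
for (responder, previous), weight in sorted(links.items()):
    print(f"{responder} -> {previous}: {weight}")
```

Each counted pair corresponds to one link in a Sankey diagram, with the count as the link width.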
- user_message_responses_heatmap(norm: str = 'absolute', title: str = 'Response matrix') Figure [source]¶
Get the response matrix heatmap.
A response from user X to user Y happens if user X sends a message right after a message from user Y.
- Parameters
norm (str, optional) –
Specifies the type of normalization used for response count. Can be:
'absolute': Absolute count of messages.
'joint': Normalized by total number of messages sent by all users.
'sender': Normalized per sender by total number of messages sent by user.
'receiver': Normalized per receiver by total number of messages sent by user.
title (str, optional) – Title for plot. Defaults to “Response matrix”.
- Returns
plotly.graph_objs.Figure – Plotly Figure.
See also
>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_message_responses_heatmap()
>>> plot(fig)
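The four norm options can be illustrated on a small table of response counts. This is a rough sketch under stated assumptions: normalize_responses is a hypothetical helper, the counts are keyed by (sender, receiver), and the per-user totals are taken from the response counts themselves, which may differ from how the library computes them.

```python
from collections import Counter


def normalize_responses(counts, norm="absolute"):
    """Normalize a {(sender, receiver): count} table according to norm."""
    if norm == "absolute":
        return dict(counts)
    if norm == "joint":
        total = sum(counts.values())
        return {pair: n / total for pair, n in counts.items()} if total else {}
    if norm in ("sender", "receiver"):
        position = 0 if norm == "sender" else 1
        totals = Counter()
        for pair, n in counts.items():
            totals[pair[position]] += n
        # Skip zero counts so a user with no responses never divides by zero.
        return {pair: n / totals[pair[position]] for pair, n in counts.items() if n}
    raise ValueError(f"Unknown norm: {norm!r}")


counts = {("Brock", "Ash"): 2, ("Ash", "Brock"): 1, ("Misty", "Brock"): 1}
for norm in ("absolute", "joint", "sender", "receiver"):
    print(norm, normalize_responses(counts, norm))
```

With 'joint' all cells sum to 1, with 'sender' each sender's cells sum to 1, and with 'receiver' each receiver's cells sum to 1.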
- user_msg_length_boxplot(title: str = 'User message length', xlabel: str = 'User') Figure [source]¶
Generate figure with boxplots of each user’s message length.
- Parameters
title (str, optional) – Title for plot. Defaults to “User message length”.
xlabel (str, optional) – x-axis label title. Defaults to “User”.
- Returns
plotly.graph_objs.Figure – Plotly Figure.
See also
>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_msg_length_boxplot()
>>> plot(fig)
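A boxplot of message length per user summarizes, for each user, the distribution of the number of characters in their messages. The sketch below computes the underlying data with the standard library only; message_lengths_by_user and summarize are hypothetical helpers, not part of the library.

```python
from collections import defaultdict
from statistics import median


def message_lengths_by_user(messages):
    """Group message lengths (in characters) by username."""
    lengths = defaultdict(list)
    for username, text in messages:
        lengths[username].append(len(text))
    return dict(lengths)


def summarize(lengths):
    """Return (min, median, max) per user, a subset of what a boxplot shows."""
    return {user: (min(v), median(v), max(v)) for user, v in lengths.items()}


messages = [("Ash", "Hey guys!"), ("Brock", "Hi"), ("Ash", "Ok")]
lengths = message_lengths_by_user(messages)
print(summarize(lengths))
```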
- class whatstk.WhatsAppChat(df: DataFrame)[source]¶
Load and process a WhatsApp chat file.
- Parameters
df (pandas.DataFrame) – Chat.
This simple example loads a chat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly
Optionally, you can use the argument extra_metadata to add additional metadata to the chat:
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON, extra_metadata=True)
>>> chat.name
'Pokemon Chat'
>>> chat.df_system
                 date                                            message
0 2016-04-15 15:04:00  Messages and calls are end-to-end encrypted. N...
>>> chat.df.head()
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly
from_source(filepath[, extra_metadata]) – Create an instance from a chat text file.
from_sources(filepaths[, auto_header, ...]) – Load a WhatsAppChat instance from multiple sources.
to_txt(filepath[, hformat, encoding]) – Export chat to a text file.
to_zip(filepath[, hformat, encoding]) – Export chat to a zip file.
- classmethod from_source(filepath: str, extra_metadata: Optional[bool] = None, **kwargs: Any) WhatsAppChat [source]¶
Create an instance from a chat text file.
- Parameters
filepath (str) –
Path to the file. Accepted sources are:
Local file, e.g. ‘path/to/file.txt’ or ‘path/to/file.zip’ (iOS).
URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run
**kwargs – Refer to the docs from
for details on additional arguments.
extra_metadata (bool) – This is experimental. If True, additional metadata will be added to the DataFrame. This includes class attributes such as chat.name, chat.df_system (DataFrame with only system messages). Note that this attribute only works on group chats.
- Returns
WhatsAppChat – Class instance with loaded and parsed chat.
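The three accepted kinds of filepath (local file, remote URL, Google Drive link) can be told apart by their scheme. The sketch below shows one way to do so; classify_source is a hypothetical helper for illustration and does not reflect how the library dispatches internally.

```python
from urllib.parse import urlparse


def classify_source(filepath: str) -> str:
    """Return 'gdrive', 'url' or 'local' depending on the filepath's scheme."""
    scheme = urlparse(filepath).scheme.lower()
    if scheme == "gdrive":
        return "gdrive"
    if scheme in ("http", "https"):
        return "url"
    # Anything else (no scheme, or e.g. a Windows drive letter) is treated as local.
    return "local"


for source in (
    "path/to/file.txt",
    "path/to/file.zip",
    "http://www.url.to/file.txt",
    "gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ",
):
    print(source, "->", classify_source(source))
```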
- classmethod from_sources(filepaths: str, auto_header: Optional[bool] = None, hformat: Optional[str] = None, encoding: str = 'utf-8') WhatsAppChat [source]¶
Load a WhatsAppChat instance from multiple sources.
- Parameters
filepaths (list) – List with filepaths.
auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.
hformat (list, optional) – List with the header format to be used for each file. The list must have the same length as filepaths. A valid header format might be ‘[%y-%m-%d %H:%M:%S] - %name:’.
encoding (str) – Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.
- Returns
WhatsAppChat – Class instance with loaded and parsed chat.
See also
Load a chat using two text files. In this example, we use sample chats (available online; see the URLs in the source code).
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
                 date        username                                            message
0 2019-10-20 10:16:00            John        Laborum sed excepteur id eu cillum sunt ut.
1 2019-10-20 11:15:00            Mary  Ad aliquip reprehenderit proident est irure mo...
2 2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
3 2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
4 2019-10-20 17:28:00            John                Do ex dolor consequat tempor et ex.
- to_txt(filepath: str, hformat: Optional[str] = None, encoding: str = 'utf8') None [source]¶
Export chat to a text file.
Useful for exporting the chat to different formats (i.e. using different hformats).
- Parameters
filepath (str) – Name of the file to export (must be a local path).
hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
encoding (str, optional) –
Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.
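To illustrate what a header format such as '%y-%m-%d, %H:%M - %name:' produces on export, here is a minimal sketch that renders one header from a date and a username. render_header is a hypothetical helper, not the library's exporter, and it assumes that '%y' is written as a four-digit year.

```python
import re
from datetime import datetime

# Keywords handled by this sketch; '%name' is listed first so it is matched whole.
TOKEN = re.compile(r"%name|%[yYmdHMS]")


def render_header(hformat: str, date: datetime, username: str) -> str:
    """Fill the keywords of hformat with the given date and username."""
    values = {
        "%name": username,
        "%y": f"{date.year:04d}",  # assumption: four-digit year
        "%Y": f"{date.year:04d}",
        "%m": f"{date.month:02d}",
        "%d": f"{date.day:02d}",
        "%H": f"{date.hour:02d}",
        "%M": f"{date.minute:02d}",
        "%S": f"{date.second:02d}",
    }
    # A single regex pass avoids re-substituting text inserted by an earlier keyword.
    return TOKEN.sub(lambda match: values[match.group()], hformat)


header = render_header(
    "%y-%m-%d, %H:%M - %name:", datetime(2016, 8, 6, 13, 23), "Ash Ketchum"
)
print(header, "Hey guys!")
```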
- to_zip(filepath: str, hformat: Optional[str] = None, encoding: str = 'utf8') None [source]¶
Export chat to a zip file.
Useful for exporting the chat to different formats (i.e. using different hformats).
- Parameters
filepath (str) – Name of the file to export (must be a local path).
hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
encoding (str, optional) –
Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.
- whatstk.df_from_txt_whatsapp(filepath: str, **kwargs: Any) DataFrame [source]¶
Alias for
- whatstk.df_from_whatsapp(filepath: str, auto_header: bool = True, hformat: Optional[str] = None, encoding: str = 'utf-8', message_type: Optional[bool] = None) DataFrame [source]¶
Load chat as a DataFrame.
- Parameters
filepath (str) –
Path to the file. Accepted sources are:
Local file, e.g. ‘path/to/file.txt’ OR ‘path/to/_chat.zip’ (e.g. iOS export).
URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run
auto_header (bool, optional) – Detect header automatically. If False, hformat is required.
hformat (str, optional) –
Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use the following keywords:
'%y': for year ('%Y' is equivalent).
'%m': for month.
'%d': for day.
'%H': for 24h-hour.
'%I': for 12h-hour.
'%M': for minutes.
'%S': for seconds.
'%P': for “PM”/”AM” or “p.m.”/”a.m.” characters.
'%name': for the username.
Example 1: For the header ‘12/08/2016, 16:20 - username:’ we have hformat='%d/%m/%y, %H:%M - %name:'.
Example 2: For the header ‘2016-08-12, 4:20 PM - username:’ we have hformat='%y-%m-%d, %I:%M %P - %name:'.
encoding (str, optional) –
Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.
message_type (bool, optional) – Label for the message type. Can be ‘user’ or ‘system’, based on who sent the message.
- Returns
pandas.DataFrame – DataFrame with the loaded and parsed chat.
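The hformat keywords can be read as placeholders for pieces of a message header. As a rough illustration, the sketch below turns an hformat string into a regular expression and applies it to the two example headers from the parameter description. hformat_to_regex and the individual patterns are assumptions made for this example, not the library's parser.

```python
import re

# Hypothetical regex fragment for each keyword.
KEYWORD_PATTERNS = {
    "%name": r"(?P<username>[^:]+)",
    "%y": r"(?P<year>\d{2,4})",
    "%Y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%I": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minutes>\d{1,2})",
    "%S": r"(?P<seconds>\d{1,2})",
    "%P": r"(?P<ampm>[AaPp]\.?\s?[Mm]\.?)",
}

# Longest keywords first, so '%name' is matched as a whole.
TOKEN = re.compile(
    "|".join(re.escape(k) for k in sorted(KEYWORD_PATTERNS, key=len, reverse=True))
)


def hformat_to_regex(hformat: str) -> "re.Pattern":
    """Build a regex from hformat; literal text between keywords is escaped."""
    pattern, position = "", 0
    for match in TOKEN.finditer(hformat):
        pattern += re.escape(hformat[position:match.start()])
        pattern += KEYWORD_PATTERNS[match.group()]
        position = match.end()
    pattern += re.escape(hformat[position:])
    return re.compile(pattern)


example_1 = hformat_to_regex("%d/%m/%y, %H:%M - %name:").match(
    "12/08/2016, 16:20 - username:"
)
example_2 = hformat_to_regex("%y-%m-%d, %I:%M %P - %name:").match(
    "2016-08-12, 4:20 PM - username:"
)
print(example_1.groupdict())
print(example_2.groupdict())
```

This sketch does not support using two equivalent keywords (such as '%y' and '%Y') in the same format, since both map to the same group name.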
Read a chat
>>> from whatstk import df_from_whatsapp
>>> from whatstk.data import whatsapp_urls
>>> df = df_from_whatsapp(filepath=whatsapp_urls.LOREM)
>>> df.head(5)
                 date        username                                            message  message_type
0 2020-01-15 02:22:56            Mary                     Nostrud exercitation magna id.        system
1 2020-01-15 03:33:01            Mary     Non elit irure irure pariatur exercitation. 🇩🇰          user
2 2020-01-15 04:18:42  +1 123 456 789  Exercitation esse lorem reprehenderit ut ex ve...          user
3 2020-01-15 06:05:14        Giuseppe  Aliquip dolor reprehenderit voluptate dolore e...          user
4 2020-01-15 06:56:00            Mary              Ullamco duis et commodo exercitation.          user
Read a chat, labelling each message as ‘user’ or ‘system’. ‘system’ messages are those sent by the chat itself (creation of the chat, etc.).
>>> from whatstk import df_from_whatsapp
>>> from whatstk.data import whatsapp_urls
>>> df = df_from_whatsapp(filepath=whatsapp_urls.POKEMON, message_type=True)
>>> df.head()
                 date      username                                            message  message_type
0 2016-04-15 15:04:00  Pokemon Chat  Messages and calls are end-to-end encrypted. N...        system
1 2016-08-06 13:23:00   Ash Ketchum                                          Hey guys!          user
2 2016-08-06 13:25:00         Brock              Hey Ash, good to have a common group!          user
3 2016-08-06 13:30:00         Misty  Hey guys! Long time since heard anything from you          user