whatstk package
Subpackages
- whatstk.analysis package
- whatstk.graph package
- whatstk.utils package
- whatstk.whatsapp package
Submodules
whatstk.data module
Load sample chats.
This module contains the links to the sample chats currently available online. For more details, refer to the source code.
Classes:
- class whatstk.data.Urls(POKEMON, LOREM, LOREM1, LOREM2, LOREM_2000)
Bases: tuple
Attributes:
- LOREM
Alias for field number 1
- LOREM1
Alias for field number 2
- LOREM2
Alias for field number 3
- LOREM_2000
Alias for field number 4
- POKEMON
Alias for field number 0
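The "alias for field number N" entries follow from Urls being a named tuple: each attribute name is another way to read a fixed tuple position. A minimal sketch, using a stand-in class with the same field order and placeholder values (the real links live in the source code of whatstk.data and are not reproduced here):

```python
from collections import namedtuple

# Stand-in with the same field order as whatstk.data.Urls.
Urls = namedtuple("Urls", ["POKEMON", "LOREM", "LOREM1", "LOREM2", "LOREM_2000"])

# Placeholder values, NOT the real links.
urls = Urls(
    POKEMON="<pokemon-url>",
    LOREM="<lorem-url>",
    LOREM1="<lorem1-url>",
    LOREM2="<lorem2-url>",
    LOREM_2000="<lorem-2000-url>",
)

# Attribute access and positional access return the same value.
first_by_name = urls.POKEMON
first_by_index = urls[0]
```

Because the object is a plain tuple, it can also be unpacked or iterated over like any other tuple.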
Module contents
Python wrapper and analysis tools for WhatsApp chats.
This library provides a wrapper for WhatsApp chats exported in multiple languages and from multiple operating systems. It also provides analytics tools.
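Classes:
FigureBuilder
([df, chat]) Generate a variety of figures from your loaded chat.
WhatsAppChat
(df) Load and process a WhatsApp chat file.
Functions:
df_from_txt_whatsapp
(filepath, **kwargs) Alias for df_from_whatsapp.
df_from_whatsapp
(filepath[, auto_header, ...]) Load chat as a DataFrame.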
- class whatstk.FigureBuilder(df: Optional[DataFrame] = None, chat: Optional[BaseChat] = None)
Bases: object
Generate a variety of figures from your loaded chat.
Integrates feature extraction and visualization logic to automate data plots.
Note: Either df or chat must be provided.
- Parameters
df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.
chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.
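The precedence rule in the note above (df wins, chat is the fallback, at least one is required) can be sketched as follows. The helper name resolve_chat_data and the ValueError are hypothetical and only illustrate the documented behavior; they are not taken from the library.

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_chat_data(df: Optional[Any] = None, chat: Optional[Any] = None) -> Any:
    """Hypothetical helper: pick the data source as the note describes."""
    if df is not None:
        # A given df takes precedence; chat is ignored.
        return df
    if chat is not None:
        # Otherwise fall back to the df attribute of the chat object.
        return chat.df
    raise ValueError("Either df or chat must be provided.")


# Stand-ins for a DataFrame and a chat object.
fake_df = ["row-from-df"]
fake_chat = SimpleNamespace(df=["row-from-chat"])

from_df = resolve_chat_data(df=fake_df, chat=fake_chat)
from_chat = resolve_chat_data(chat=fake_chat)
```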
Attributes:
user_color_mapping
Get mapping between user and color.
usernames
Get list with users available in given chat.
Methods:
user_interventions_count_linechart
([date_mode, ...]) Plot number of user interventions over time.
user_message_responses_flow
([title]) Get the flow of message responses.
user_message_responses_heatmap
([norm, title]) Get the response matrix heatmap.
user_msg_length_boxplot
([title, xlabel]) Generate figure with boxplots of each user's message length.
- property user_color_mapping: Dict[str, str]
Get mapping between user and color.
Each user is automatically assigned a color, and keeps that color in every plot generated afterwards.
- Returns
dict – Mapping from username to color (rgb).
- user_interventions_count_linechart(date_mode: str = 'date', msg_length: bool = False, cumulative: bool = False, all_users: bool = False, title: str = 'User interventions count', xlabel: str = 'Date/Time') → Figure
Plot number of user interventions over time.
- Parameters
date_mode (str, optional) – Choose mode to group interventions by. Defaults to 'date'. Available modes are:
- 'date': Grouped by particular date (year, month and day).
- 'hour': Grouped by hours.
- 'month': Grouped by months.
- 'weekday': Grouped by weekday (i.e. Monday, Tuesday, …, Sunday).
- 'hourweekday': Grouped by weekday and hour.
msg_length (bool, optional) – Set to True to count the number of characters instead of the number of messages sent.
cumulative (bool, optional) – Set to True to obtain cumulative counts.
all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.
title (str, optional) – Title for plot. Defaults to “User interventions count”.
xlabel (str, optional) – x-axis label title. Defaults to “Date/Time”.
- Returns
plotly.graph_objs.Figure – Plotly Figure.
Example
>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_interventions_count_linechart(cumulative=True)
>>> plot(fig)
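To make the date_mode and msg_length options more concrete, here is a standard-library sketch of the grouping step only (no plotting). The function and the key table are hypothetical, the sample messages are illustrative, and reading 'month' as the calendar month is an assumption rather than confirmed library behavior.

```python
from collections import Counter
from datetime import datetime

# Hypothetical key functions, one per date_mode described above.
GROUP_KEYS = {
    "date": lambda ts: ts.date(),
    "hour": lambda ts: ts.hour,
    "month": lambda ts: ts.month,  # assumption: calendar month, 1..12
    "weekday": lambda ts: ts.weekday(),  # 0 = Monday, ..., 6 = Sunday
    "hourweekday": lambda ts: (ts.weekday(), ts.hour),
}


def count_interventions(messages, date_mode="date", msg_length=False):
    """Count messages (or characters) per (group, username) pair."""
    key = GROUP_KEYS[date_mode]
    counts = Counter()
    for timestamp, username, text in messages:
        counts[(key(timestamp), username)] += len(text) if msg_length else 1
    return counts


# Illustrative sample: (timestamp, username, message text).
messages = [
    (datetime(2016, 8, 6, 13, 23), "Ash Ketchum", "Hey guys!"),
    (datetime(2016, 8, 6, 13, 25), "Brock", "Hi"),
    (datetime(2016, 8, 6, 13, 45), "Ash Ketchum", "Indeed."),
    (datetime(2016, 8, 6, 14, 30), "Misty", "Definetly"),
]
by_hour = count_interventions(messages, date_mode="hour")
chars_by_date = count_interventions(messages, date_mode="date", msg_length=True)
```

A cumulative view would then be a running sum of these counts over the sorted group keys.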
- user_message_responses_flow(title: str = 'Message flow') → Figure
Get the flow of message responses.
A response from user X to user Y happens if user X sends a message right after a message from user Y.
Uses a Sankey diagram.
- Parameters
title (str, optional) – Title for plot. Defaults to “Message flow”.
- Returns
plotly.graph_objs.Figure – Plotly Figure.
Example
>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_message_responses_flow()
>>> plot(fig)
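The response definition used here (user X responds to user Y when X's message comes right after one from Y) can be sketched by pairing consecutive senders. The function name is hypothetical. This sketch also counts a user following themselves as a response, which the description above does not settle either way.

```python
from collections import Counter


def count_responses(usernames):
    """Count (responder, previous sender) pairs over consecutive messages."""
    responses = Counter()
    for previous, current in zip(usernames, usernames[1:]):
        responses[(current, previous)] += 1
    return responses


# Senders in chronological order (illustrative).
senders = ["Ash Ketchum", "Brock", "Misty", "Ash Ketchum", "Misty"]
responses = count_responses(senders)
```

Each (responder, previous sender) pair with its count corresponds to one link of a Sankey diagram.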
- user_message_responses_heatmap(norm: str = 'absolute', title: str = 'Response matrix') → Figure
Get the response matrix heatmap.
A response from user X to user Y happens if user X sends a message right after a message from user Y.
- Parameters
norm (str, optional) – Specifies the type of normalization used for the response count. Can be:
- 'absolute': Absolute count of messages.
- 'joint': Normalized by total number of messages sent by all users.
- 'sender': Normalized per sender by total number of messages sent by user.
- 'receiver': Normalized per receiver by total number of messages sent by user.
title (str, optional) – Title for plot. Defaults to “Response matrix”.
- Returns
plotly.graph_objs.Figure – Plotly Figure.
Example
>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_message_responses_heatmap()
>>> plot(fig)
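One possible reading of the four norm options, sketched on a small nested dictionary counts[sender][receiver]. The function name, the data layout and the choice to divide by the totals of the response counts (overall, per row, per column) are assumptions made for illustration; the library may compute its totals differently.

```python
def normalize_responses(counts, norm="absolute"):
    """Normalize counts[sender][receiver]; assumes no all-zero row or column."""
    if norm == "absolute":
        return {s: dict(row) for s, row in counts.items()}
    if norm == "joint":
        # Divide every cell by the grand total.
        total = sum(sum(row.values()) for row in counts.values())
        return {s: {r: c / total for r, c in row.items()} for s, row in counts.items()}
    if norm == "sender":
        # Divide every cell by its row (sender) total.
        return {
            s: {r: c / sum(row.values()) for r, c in row.items()}
            for s, row in counts.items()
        }
    if norm == "receiver":
        # Divide every cell by its column (receiver) total.
        totals = {}
        for row in counts.values():
            for receiver, count in row.items():
                totals[receiver] = totals.get(receiver, 0) + count
        return {s: {r: c / totals[r] for r, c in row.items()} for s, row in counts.items()}
    raise ValueError(f"Unknown norm: {norm!r}")


# Illustrative response counts.
counts = {
    "John": {"Mary": 3, "Giuseppe": 1},
    "Mary": {"John": 2, "Giuseppe": 2},
    "Giuseppe": {"John": 1, "Mary": 1},
}
joint = normalize_responses(counts, "joint")
per_sender = normalize_responses(counts, "sender")
per_receiver = normalize_responses(counts, "receiver")
```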
- user_msg_length_boxplot(title: str = 'User message length', xlabel: str = 'User') → Figure
Generate figure with boxplots of each user’s message length.
- Parameters
title (str, optional) – Title for plot. Defaults to “User message length”.
xlabel (str, optional) – x-axis label title. Defaults to “User”.
- Returns
dict – Dictionary with data and layout. Plotly compatible.
Example
>>> from whatstk import WhatsAppChat
>>> from whatstk.graph import plot, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM)
>>> fig = FigureBuilder(chat=chat).user_msg_length_boxplot()
>>> plot(fig)
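The data behind such a boxplot is one list of message lengths per user. A small standard-library sketch of that preparation step, with a hypothetical function name and illustrative messages; measuring length in characters is an assumption here.

```python
from collections import defaultdict
from statistics import median


def message_lengths_by_user(messages):
    """Map each username to the list of its message lengths in characters."""
    lengths = defaultdict(list)
    for username, text in messages:
        lengths[username].append(len(text))
    return dict(lengths)


# Illustrative (username, text) pairs.
messages = [
    ("Ash Ketchum", "Hey guys!"),
    ("Brock", "Hey Ash, good to have a common group!"),
    ("Misty", "Definetly"),
    ("Ash Ketchum", "Indeed."),
]
lengths = message_lengths_by_user(messages)
medians = {user: median(values) for user, values in lengths.items()}
```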
- class whatstk.WhatsAppChat(df: DataFrame)
Bases: BaseChat
Load and process a WhatsApp chat file.
- Parameters
df (pandas.DataFrame) – Chat.
Example
This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly
Optionally, you can use the argument extra_metadata to add additional metadata to the chat:
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON, extra_metadata=True)
>>> chat.name
'Pokemon Chat'
>>> chat.df_system
                 date                                            message
0 2016-04-15 15:04:00  Messages and calls are end-to-end encrypted. N...
>>> chat.df.head()
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly
Methods:
from_source
(filepath[, extra_metadata]) Create an instance from a chat text file.
from_sources
(filepaths[, auto_header, ...]) Load a WhatsAppChat instance from multiple sources.
to_txt
(filepath[, hformat, encoding]) Export chat to a text file.
to_zip
(filepath[, hformat, encoding]) Export chat to a zip file.
- classmethod from_source(filepath: str, extra_metadata: Optional[bool] = None, **kwargs: Any) → WhatsAppChat
Create an instance from a chat text file.
- Parameters
filepath (str) – Path to the file. Accepted sources are:
- Local file, e.g. ‘path/to/file.txt’ or ‘path/to/file.zip’ (iOS).
- URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
- Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
**kwargs – Refer to the docs of df_from_whatsapp for details on additional arguments.
extra_metadata (bool) – This is experimental. If True, additional metadata will be added to the DataFrame. This includes class attributes such as chat.name, chat.df_system (DataFrame with only system messages). Note that this attribute only works on group chats.
- Returns
WhatsAppChat – Class instance with loaded and parsed chat.
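The three accepted source kinds can be told apart by the scheme of the filepath string. The helpers below are hypothetical and only illustrate that distinction, including how a FILE-ID could be read from a ‘gdrive://[FILE-ID]’ link; they do not reflect how the library dispatches internally.

```python
from urllib.parse import urlparse


def classify_source(filepath: str) -> str:
    """Hypothetical helper: name the kind of source a filepath points at."""
    scheme = urlparse(filepath).scheme.lower()
    if scheme == "gdrive":
        return "gdrive"
    if scheme in ("http", "https"):
        return "url"
    # Anything else is treated as a local path in this sketch.
    return "local"


def gdrive_file_id(filepath: str) -> str:
    """Extract FILE-ID from a 'gdrive://[FILE-ID]' link."""
    prefix = "gdrive://"
    if not filepath.startswith(prefix):
        raise ValueError("Expected a 'gdrive://[FILE-ID]' link.")
    return filepath[len(prefix):]


kinds = [
    classify_source("path/to/file.zip"),
    classify_source("http://www.url.to/file.txt"),
    classify_source("gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ"),
]
file_id = gdrive_file_id("gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ")
```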
- classmethod from_sources(filepaths: str, auto_header: Optional[bool] = None, hformat: Optional[str] = None, encoding: str = 'utf-8') → WhatsAppChat
Load a WhatsAppChat instance from multiple sources.
- Parameters
filepaths (list) – List with filepaths.
auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.
hformat (list, optional) – List with the header format to be used for each file. The list must be of length equal to len(filepaths). A valid header format might be ‘[%y-%m-%d %H:%M:%S] - %name:’.
encoding (str) – Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
- Returns
WhatsAppChat – Class instance with loaded and parsed chat.
Example
Load a chat using two text files. In this example, we use sample chats (available online, see URLs in the source code of whatstk.data).
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
                 date        username                                            message
0 2019-10-20 10:16:00            John         Laborum sed excepteur id eu cillum sunt ut.
1 2019-10-20 11:15:00            Mary  Ad aliquip reprehenderit proident est irure mo...
2 2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
3 2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
4 2019-10-20 17:28:00            John                 Do ex dolor consequat tempor et ex.
- to_txt(filepath: str, hformat: Optional[str] = None, encoding: str = 'utf8') → None
Export chat to a text file.
Useful to export the chat to different formats (i.e. using different hformats).
- Parameters
filepath (str) – Name of the file to export (must be a local path).
hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
encoding (str, optional) – Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
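To illustrate how an hformat could turn into a header line on export, here is a sketch that fills the default format above with a timestamp and a username. The helper render_header is hypothetical. It assumes that '%y' stands for the full year and '%P' for the AM/PM marker, and it relies on Python's strftime for everything else, so details such as zero-padding may differ from what the library writes.

```python
from datetime import datetime


def render_header(hformat: str, timestamp: datetime, username: str) -> str:
    """Hypothetical helper: fill an hformat with a timestamp and a username."""
    rendered_parts = []
    # Split on %name so that strftime never sees the username itself.
    for part in hformat.split("%name"):
        # Assumption: '%y' is the full year and '%P' the AM/PM marker,
        # which map to strftime's '%Y' and '%p'.
        strftime_format = part.replace("%y", "%Y").replace("%P", "%p")
        rendered_parts.append(timestamp.strftime(strftime_format) if strftime_format else "")
    return username.join(rendered_parts)


header = render_header(
    "%y-%m-%d, %H:%M - %name:",
    datetime(2016, 8, 12, 16, 20),
    "username",
)
```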
- to_zip(filepath: str, hformat: Optional[str] = None, encoding: str = 'utf8') → None
Export chat to a zip file.
Useful to export the chat to different formats (i.e. using different hformats).
- Parameters
filepath (str) – Name of the file to export (must be a local path).
hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
encoding (str, optional) – Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
- whatstk.df_from_txt_whatsapp(filepath: str, **kwargs: Any) → DataFrame
Alias for df_from_whatsapp.
- whatstk.df_from_whatsapp(filepath: str, auto_header: bool = True, hformat: Optional[str] = None, encoding: str = 'utf-8', message_type: Optional[bool] = None) → DataFrame
Load chat as a DataFrame.
- Parameters
filepath (str) – Path to the file. Accepted sources are:
- Local file, e.g. ‘path/to/file.txt’ or ‘path/to/_chat.zip’ (e.g. iOS export).
- URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
- Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
auto_header (bool, optional) – Detect header automatically. If False, hformat is required.
hformat (str, optional) – Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use the following keywords:
- '%y': for year ('%Y' is equivalent).
- '%m': for month.
- '%d': for day.
- '%H': for 24h-hour.
- '%I': for 12h-hour.
- '%M': for minutes.
- '%S': for seconds.
- '%P': for “PM”/“AM” or “p.m.”/“a.m.” characters.
- '%name': for the username.
Example 1: For the header ‘12/08/2016, 16:20 - username:’ we have hformat='%d/%m/%y, %H:%M - %name:'.
Example 2: For the header ‘2016-08-12, 4:20 PM - username:’ we have hformat='%y-%m-%d, %I:%M %P - %name:'.
encoding (str, optional) – Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
message_type (bool, optional) – If True, label each message with its type. The label is ‘user’ or ‘system’, based on who sent the message.
- Returns
pandas.DataFrame – DataFrame with the loaded and parsed chat.
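The hformat keywords above can be read as placeholders in a pattern. The sketch below turns an hformat into a regular expression and applies it to the two example headers. The regex fragments chosen for each keyword and the helper name hformat_to_regex are assumptions for illustration; the library's own parser may work differently.

```python
import re

# Assumed regex fragment for each hformat keyword listed above.
KEYWORD_PATTERNS = {
    "%y": r"(?P<year>\d{2,4})",
    "%Y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%I": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minutes>\d{2})",
    "%S": r"(?P<seconds>\d{2})",
    "%P": r"(?P<ampm>[AaPp]\.?\s?[Mm]\.?)",
    "%name": r"(?P<username>[^:]+)",
}
TOKEN = re.compile(r"%name|%[yYmdHIMSP]")


def hformat_to_regex(hformat: str) -> re.Pattern:
    """Escape the literal text and swap each keyword for its fragment.

    Each keyword is expected at most once (and not both '%y' and '%Y',
    or both '%H' and '%I'), since group names must be unique.
    """
    parts = []
    position = 0
    for match in TOKEN.finditer(hformat):
        parts.append(re.escape(hformat[position:match.start()]))
        parts.append(KEYWORD_PATTERNS[match.group()])
        position = match.end()
    parts.append(re.escape(hformat[position:]))
    return re.compile("".join(parts))


example_1 = hformat_to_regex("%d/%m/%y, %H:%M - %name:").match(
    "12/08/2016, 16:20 - username:"
)
example_2 = hformat_to_regex("%y-%m-%d, %I:%M %P - %name:").match(
    "2016-08-12, 4:20 PM - username:"
)
```

The named groups (year, month, day, hour, minutes, ampm, username) can then be read from each match with match.group(...).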
Example
Read a chat.
>>> from whatstk import df_from_whatsapp
>>> from whatstk.data import whatsapp_urls
>>> df = df_from_whatsapp(filepath=whatsapp_urls.LOREM)
>>> df.head(5)
                 date        username                                            message message_type
0 2020-01-15 02:22:56            Mary                      Nostrud exercitation magna id.       system
1 2020-01-15 03:33:01            Mary      Non elit irure irure pariatur exercitation. 🇩🇰         user
2 2020-01-15 04:18:42  +1 123 456 789  Exercitation esse lorem reprehenderit ut ex ve...         user
3 2020-01-15 06:05:14        Giuseppe  Aliquip dolor reprehenderit voluptate dolore e...         user
4 2020-01-15 06:56:00            Mary               Ullamco duis et commodo exercitation.         user
Read a chat, labelling each message as ‘user’ or ‘system’. ‘system’ messages are those sent by the chat itself (creation of chat, etc.).
>>> from whatstk import df_from_whatsapp
>>> from whatstk.data import whatsapp_urls
>>> df = df_from_whatsapp(filepath=whatsapp_urls.POKEMON, message_type=True)
>>> df.head()
                 date      username                                            message message_type
0 2016-04-15 15:04:00  Pokemon Chat  Messages and calls are end-to-end encrypted. N...       system
1 2016-08-06 13:23:00   Ash Ketchum                                          Hey guys!         user
2 2016-08-06 13:25:00         Brock              Hey Ash, good to have a common group!         user
3 2016-08-06 13:30:00         Misty   Hey guys! Long time since heard anything from you         user