whatstk.whatsapp

WhatsApp parser.


whatstk.whatsapp.objects

Library WhatsApp objects.

Classes

WhatsAppChat(df) Load and process a WhatsApp chat file.
class whatstk.whatsapp.objects.WhatsAppChat(df)

Bases: whatstk._chat.BaseChat

Load and process a WhatsApp chat file.

Parameters:df (pandas.DataFrame) – Chat.

Attributes

df Chat as DataFrame.
end_date Chat end date.
start_date Chat starting date.
users List with users.

Methods

from_source(filepath, **kwargs) Create an instance from a chat text file.
from_sources(filepaths[, auto_header, …]) Load a WhatsAppChat instance from multiple sources.
merge(chat[, rename_users]) Merge current instance with chat.
rename_users(mapping) Rename users.
to_csv(filepath) Save chat as csv.
to_txt(filepath[, hformat, encoding]) Export chat to a text file.

Example

This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly
property df

Chat as DataFrame.

Returns:pandas.DataFrame
property end_date

Chat end date.

Returns:datetime
classmethod from_source(filepath, **kwargs)

Create an instance from a chat text file.

Parameters:
  • filepath (str) –

    Path to the file. Accepted sources are:

    • Local file, e.g. ‘path/to/file.txt’.
    • URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
    • Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
  • **kwargs – Refer to the docs from df_from_txt_whatsapp for details on additional arguments.
Returns:

WhatsAppChat – Class instance with loaded and parsed chat.

classmethod from_sources(filepaths, auto_header=None, hformat=None, encoding='utf-8')

Load a WhatsAppChat instance from multiple sources.

Parameters:
  • filepaths (list) – List with filepaths.
  • auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.
  • hformat (list, optional) – List with the header format to be used for each file. The list must have the same length as filepaths. A valid header format might be ‘[%y-%m-%d %H:%M:%S] - %name:’.
  • encoding (str) – Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
Returns:

WhatsAppChat – Class instance with loaded and parsed chat.

Example

Load a chat using two text files. In this example, we use sample chats (available online, see urls in source code whatstk.data).

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
                 date        username                                            message
0 2019-10-20 10:16:00            John        Laborum sed excepteur id eu cillum sunt ut.
1 2019-10-20 11:15:00            Mary  Ad aliquip reprehenderit proident est irure mo...
2 2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
3 2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
4 2019-10-20 17:28:00            John                Do ex dolor consequat tempor et ex.
merge(chat, rename_users=None)

Merge current instance with chat.

Parameters:
  • chat (WhatsAppChat) – Another chat.
  • rename_users (dict) – Dictionary mapping each new name to the list of old names it replaces. Example: {‘John’: [‘Jon’, ‘J’], ‘Ray’: [‘Raymond’]} will map ‘Jon’ and ‘J’ to ‘John’, and ‘Raymond’ to ‘Ray’. Note that old names must be given as a list (even if there is only one).
Returns:

WhatsAppChat – Merged chat.

Example

Merging two chats comes in handy when you have exported the same chat at different times, so that each exported file may contain messages that are missing from the other.

In this example, however, we merge files from different chats.

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat_1 = WhatsAppChat.from_source(filepath=filepath_1)
>>> chat_2 = WhatsAppChat.from_source(filepath=filepath_2)
>>> chat = chat_1.merge(chat_2)
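
To illustrate why merging overlapping exports is useful, here is a minimal, library-independent sketch. It assumes that merging amounts to combining the messages of both exports, dropping exact duplicates and sorting by date; this is an assumption made for illustration and not necessarily how merge is implemented. The helper merge_exports and the sample rows are hypothetical.

```python
from datetime import datetime


def merge_exports(rows_a, rows_b):
    """Hypothetical helper: combine two exports of (date, username, message) rows.

    Assumption: exact duplicate rows are dropped and the result is sorted
    by date. This is a sketch, not the whatstk implementation.
    """
    unique_rows = set(rows_a) | set(rows_b)
    # Tuples sort by their first element (the date) first.
    return sorted(unique_rows)


# Two exports of the same chat taken at different times; one row overlaps.
export_old = [
    (datetime(2019, 10, 20, 10, 16), "John", "Hello"),
    (datetime(2019, 10, 20, 11, 15), "Mary", "Hi John"),
]
export_new = [
    (datetime(2019, 10, 20, 11, 15), "Mary", "Hi John"),
    (datetime(2019, 10, 20, 12, 16), "John", "How are you?"),
]

merged = merge_exports(export_old, export_new)
for date, username, message in merged:
    print(date, username, message)
```

Note that a set-based approach would also collapse two genuinely identical messages sent by the same user within the same minute.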
rename_users(mapping)

Rename users.

This might be needed on several occasions:

  • Fix typos in user names stored in the phone.
  • If a user appears multiple times with different usernames, group these under the same name (this might happen when multiple chats are merged).
Parameters:mapping (dict) – Dictionary mapping each new name to the list of old names it replaces, e.g. {‘John’: [‘Jon’, ‘J’], ‘Ray’: [‘Raymond’]} will map ‘Jon’ and ‘J’ to ‘John’, and ‘Raymond’ to ‘Ray’. Note that old names must be given as a list (even if there is only one).
Returns:pandas.DataFrame – DataFrame with users renamed according to mapping.
Raises:ValueError – Raised if mapping is not correct.

Examples

Load LOREM2 chat and rename users Maria and Maria2 to Mary.

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM2)
>>> chat.users
['+1 123 456 789', 'Giuseppe', 'John', 'Maria', 'Maria2']
>>> chat = chat.rename_users(mapping={'Mary': ['Maria', 'Maria2']})
>>> chat.users
['+1 123 456 789', 'Giuseppe', 'John', 'Mary']
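
The shape of mapping (new name as key, list of old names as value) can be easier to reason about once inverted into an old-to-new lookup. The following is a minimal sketch in plain Python; invert_mapping is a hypothetical helper and does not reflect how rename_users is implemented.

```python
def invert_mapping(mapping):
    """Hypothetical helper: turn {new_name: [old_names]} into {old_name: new_name}."""
    old_to_new = {}
    for new_name, old_names in mapping.items():
        if not isinstance(old_names, list):
            # Mirrors the documented requirement that old names come as a list.
            raise ValueError(f"Old names for {new_name!r} must be given as a list.")
        for old_name in old_names:
            old_to_new[old_name] = new_name
    return old_to_new


mapping = {"Mary": ["Maria", "Maria2"]}
lookup = invert_mapping(mapping)

users = ["+1 123 456 789", "Giuseppe", "John", "Maria", "Maria2"]
# Users without an entry in the lookup keep their current name.
renamed = sorted({lookup.get(user, user) for user in users})
print(renamed)
```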
property start_date

Chat starting date.

Returns:datetime
to_csv(filepath)

Save chat as csv.

Parameters:filepath (str) – Name of file.
to_txt(filepath, hformat=None, encoding='utf8')

Export chat to a text file.

Useful to export the chat to different formats (e.g. using different hformats).

Parameters:
  • filepath (str) – Name of the file to export (must be a local path).
  • hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
  • encoding (str, optional) –

    Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
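
As a rough illustration of what a header format means when exporting, the sketch below renders a single message line from the default hformat. It is written in plain Python and is not the to_txt implementation; render_header is a hypothetical helper, and the four-digit year and zero-padded fields are assumptions.

```python
import re
from datetime import datetime


def render_header(hformat, date, username):
    """Hypothetical helper: fill an hformat string with the values of one message."""
    values = {
        "%y": f"{date.year:04d}",  # assumption: four-digit year
        "%m": f"{date.month:02d}",
        "%d": f"{date.day:02d}",
        "%H": f"{date.hour:02d}",
        "%M": f"{date.minute:02d}",
        "%S": f"{date.second:02d}",
        "%name": username,
    }
    # Replace every keyword in one pass so usernames are never re-scanned.
    return re.sub(r"%name|%[ymdHMS]", lambda m: values[m.group(0)], hformat)


header = render_header(
    "%y-%m-%d, %H:%M - %name:", datetime(2016, 8, 6, 13, 23), "Ash Ketchum"
)
line = f"{header} Hey guys!"
print(line)
```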

property users

List with users.

Returns:list
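
The relation between the chat rows and the start_date, end_date and users attributes can be sketched without the library. The snippet below uses three rows from the POKEMON example above and assumes that users is the sorted list of unique usernames, which is consistent with the output shown in the rename_users example; it is an illustration, not the actual property code.

```python
from datetime import datetime

# (date, username, message) rows taken from the POKEMON example output.
messages = [
    (datetime(2016, 8, 6, 13, 23), "Ash Ketchum", "Hey guys!"),
    (datetime(2016, 8, 6, 13, 25), "Brock", "Hey Ash, good to have a common group!"),
    (datetime(2016, 8, 6, 14, 30), "Misty", "Definetly"),
]

dates = [date for date, _, _ in messages]
start_date = min(dates)  # date of the earliest message
end_date = max(dates)    # date of the latest message
# Assumption: users are reported as a sorted list of unique names.
users = sorted({username for _, username, _ in messages})

print(start_date, end_date, users)
```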

whatstk.whatsapp.parser

Parser utils.

Functions

df_from_txt_whatsapp(filepath[, …]) Load chat as a DataFrame.
generate_regex(hformat) Generate regular expression from hformat.
whatstk.whatsapp.parser.df_from_txt_whatsapp(filepath, auto_header=True, hformat=None, encoding='utf-8')

Load chat as a DataFrame.

Parameters:
  • filepath (str) –

    Path to the file. Accepted sources are:

    • Local file, e.g. ‘path/to/file.txt’.
    • URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
    • Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
  • auto_header (bool, optional) – Detect header automatically. If False, hformat is required.
  • hformat (str, optional) –

    Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use following keywords:

    • '%y': for year ('%Y' is equivalent).
    • '%m': for month.
    • '%d': for day.
    • '%H': for 24h-hour.
    • '%I': for 12h-hour.
    • '%M': for minutes.
    • '%S': for seconds.
    • '%P': for “PM”/”AM” or “p.m.”/”a.m.” characters.
    • '%name': for the username.

    Example 1: For the header ‘12/08/2016, 16:20 - username:’ we have hformat='%d/%m/%y, %H:%M - %name:'.

    Example 2: For the header ‘2016-08-12, 4:20 PM - username:’ we have hformat='%y-%m-%d, %I:%M %P - %name:'.

  • encoding (str, optional) –

    Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:

pandas.DataFrame – DataFrame with the loaded and parsed chat.
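
To make the hformat keywords more concrete, the sketch below translates an hformat string into a regular expression and applies it to the header from Example 1. It is a simplified illustration in plain Python, not the library's own translation: hformat_to_regex is a hypothetical helper, '%I' and '%P' are left out, and the per-keyword patterns are assumptions modelled on the generate_regex output documented on this page.

```python
import re
from datetime import datetime

# One named group per keyword ('%I' and '%P' are omitted in this sketch).
TOKEN_PATTERNS = {
    "%y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minutes>\d{2})",
    "%S": r"(?P<seconds>\d{2})",
    "%name": r"(?P<username>[^:]*)",
}


def hformat_to_regex(hformat):
    """Hypothetical helper: translate an hformat string into a regex string."""
    # Split into keywords and literal text; literal text is escaped as-is.
    parts = re.split(r"(%name|%[ymdHMS])", hformat)
    return "".join(TOKEN_PATTERNS.get(part, re.escape(part)) for part in parts)


pattern = hformat_to_regex("%d/%m/%y, %H:%M - %name:")
fields = re.fullmatch(pattern, "12/08/2016, 16:20 - username:").groupdict()

# A two-digit year would need extra handling; here the year has four digits.
date = datetime(
    int(fields["year"]), int(fields["month"]), int(fields["day"]),
    int(fields["hour"]), int(fields["minutes"]),
)
print(date, fields["username"])
```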

whatstk.whatsapp.parser.generate_regex(hformat)

Generate regular expression from hformat.

Parameters:hformat (str) – Simplified syntax for the header, e.g. '%y-%m-%d, %H:%M:%S - %name:'.
Returns:str – Regular expression corresponding to the specified syntax.

Example

Generate regular expression corresponding to hformat='%y-%m-%d, %H:%M:%S - %name:'.

>>> from whatstk.whatsapp.parser import generate_regex
>>> generate_regex('%y-%m-%d, %H:%M:%S - %name:')
('(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - (?P<username>[^:]*): ', '(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - ')
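
As a usage sketch, the first regular expression of the output above can be applied with the standard re module to split a chat line into header fields and message text. The sample line is made up for illustration, and the pattern is copied by hand from the output rather than obtained by calling generate_regex.

```python
import re

# First element of the tuple shown above, written as a raw string.
header_regex = (
    r"(?P<year>\d{2,4})-(?P<month>\d{1,2})-(?P<day>\d{1,2}), "
    r"(?P<hour>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2}) - "
    r"(?P<username>[^:]*): "
)

# Hypothetical chat line matching '%y-%m-%d, %H:%M:%S - %name:'.
line = "2016-08-12, 16:20:05 - Ash Ketchum: Hey guys!"

match = re.match(header_regex, line)
fields = match.groupdict()   # named groups: year, month, ..., username
message = line[match.end():]  # everything after the header

print(fields["username"], "->", message)
```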

whatstk.whatsapp.auto_header

Detect header from chat.

Functions

extract_header_from_text(text[, encoding]) Extract header from text.
whatstk.whatsapp.auto_header.extract_header_from_text(text, encoding='utf-8')

Extract header from text.

Parameters:
  • text (str) – Loaded chat as string (whole text).
  • encoding (str) –

    Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:

str – Format extracted. None if no header was extracted.

Example

Extract the header format from a chat text. In this example, we use a sample chat (available online, see URLs in source code whatstk.data).

>>> from whatstk.whatsapp.auto_header import extract_header_from_text
>>> from urllib.request import urlopen
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.POKEMON
>>> with urlopen(filepath_1) as f:
...     text = f.read().decode('utf-8')
>>> extract_header_from_text(text)
'%d.%m.%y, %H:%M - %name:'

whatstk.whatsapp.generation


whatstk.whatsapp.hformat

Header format utils.

Example: Check if a header format is supported.

>>> from whatstk.whatsapp.hformat import is_supported
>>> is_supported('%y-%m-%d, %H:%M:%S - %name:')
(True, True)

Functions

get_supported_hformats_as_dict([encoding]) Get dictionary with supported formats and relevant info.
get_supported_hformats_as_list([encoding]) Get list of supported formats.
is_supported(hformat[, encoding]) Check if header hformat is currently supported.
is_supported_verbose(hformat) Check if header hformat is currently supported (both manually and using auto_header).
whatstk.whatsapp.hformat.get_supported_hformats_as_dict(encoding='utf8')

Get dictionary with supported formats and relevant info.

Parameters:encoding (str, optional) –

Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:dict
Dict with two elements:
  • format: Header format. All formats appearing are supported.
  • auto_header: 1 if auto_header is supported, 0 otherwise.
whatstk.whatsapp.hformat.get_supported_hformats_as_list(encoding='utf8')

Get list of supported formats.

Parameters:encoding (str, optional) – Encoding to use for UTF when reading/writing (e.g. ‘utf-8’).
Returns:list – List with supported formats (as str).
whatstk.whatsapp.hformat.is_supported(hformat, encoding='utf8')

Check if header hformat is currently supported.

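Parameters:
  • hformat (str) – Header format.
  • encoding (str, optional) – Encoding to use for UTF when reading/writing (e.g. ‘utf-8’).
Returns:

tuple –
  • bool: True if header is supported.
  • bool: True if header is supported with auto_header feature.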
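
The two booleans of the returned tuple are typically unpacked separately. The sketch below mimics that contract with a small hypothetical lookup table; check_support and KNOWN_HFORMATS are illustrative names, the table only lists the two formats used as examples on this page, and none of it reflects how is_supported works internally.

```python
# Hypothetical table: header format -> whether auto_header also supports it.
KNOWN_HFORMATS = {
    "%y-%m-%d, %H:%M:%S - %name:": True,
    "%y-%m-%d, %H:%M - %name:": True,
}


def check_support(hformat):
    """Hypothetical helper returning (supported, supported_with_auto_header)."""
    if hformat not in KNOWN_HFORMATS:
        return False, False
    return True, KNOWN_HFORMATS[hformat]


supported, auto_supported = check_support("%y-%m-%d, %H:%M:%S - %name:")
if supported and not auto_supported:
    print("Pass hformat explicitly and set auto_header=False.")

# A format missing from the table is reported as unsupported in both senses.
unknown = check_support("%name wrote at %H:%M:")
print(supported, auto_supported, unknown)
```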

whatstk.whatsapp.hformat.is_supported_verbose(hformat)

Check if header hformat is currently supported (both manually and using auto_header).

Result is shown as a string.

Parameters:hformat (str) – Header format.
Returns:str – Information message.

Example

Check if format '%y-%m-%d, %H:%M - %name:' is supported.

>>> from whatstk.whatsapp.hformat import is_supported_verbose
>>> is_supported_verbose('%y-%m-%d, %H:%M - %name:')
"The header '%y-%m-%d, %H:%M - %name:' is supported. `auto_header` for this header is supported."