whatstk.whatsapp package

Submodules

whatstk.whatsapp.auto_header module

Detect header from chat.

Functions

extract_header_from_text(text[, encoding]) Extract header from text.

whatstk.whatsapp.auto_header.extract_header_from_text(text, encoding='utf-8')

Extract header from text.

Parameters:
- text (str) – Loaded chat as a string (whole text).
- encoding (str) – Text encoding used when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
Returns:	str – Extracted header format. None if no header could be extracted.

Example

Extract the header format from the text of a chat. In this example, we use a sample chat (available online, see the URLs in the source code of whatstk.data).

>>> from whatstk.whatsapp.auto_header import extract_header_from_text
>>> from urllib.request import urlopen
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.POKEMON
>>> with urlopen(filepath_1) as f:
...     text = f.read().decode('utf-8')
>>> extract_header_from_text(text)
'%d.%m.%y, %H:%M - %name:'
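
To make the idea of header detection concrete, here is a minimal sketch of one possible approach: test a few candidate formats against the lines of the chat and keep the one that matches most often. This is not whatstk's implementation. The `CANDIDATES` table, its two hand-written regular expressions, and the function `guess_hformat` are assumptions made up for illustration.

```python
import re

# Hypothetical candidate table: hformat string -> hand-written regex for that header.
# These two entries are illustrative only, not whatstk's internal list.
CANDIDATES = {
    "%d.%m.%y, %H:%M - %name:": re.compile(
        r"^\d{1,2}\.\d{1,2}\.\d{2,4}, \d{1,2}:\d{2} - [^:]+:"
    ),
    "%y-%m-%d, %H:%M - %name:": re.compile(
        r"^\d{2,4}-\d{1,2}-\d{1,2}, \d{1,2}:\d{2} - [^:]+:"
    ),
}


def guess_hformat(text):
    """Return the candidate hformat matching the most lines, or None."""
    lines = [line for line in text.splitlines() if line.strip()]
    best, best_hits = None, 0
    for hformat, pattern in CANDIDATES.items():
        # Count how many lines start with a header in this format.
        hits = sum(1 for line in lines if pattern.match(line))
        if hits > best_hits:
            best, best_hits = hformat, hits
    return best


sample = (
    "06.08.16, 13:23 - Ash Ketchum: Hey guys!\n"
    "06.08.16, 13:25 - Brock: Hey Ash, good to have a common group!\n"
)
print(guess_hformat(sample))
```

Like the documented function, the sketch returns None when no candidate matches any line.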

whatstk.whatsapp.generation module

whatstk.whatsapp.hformat module

Header format utils.

Example: Check whether a header format is supported.

>>> from whatstk.whatsapp.hformat import is_supported
>>> is_supported('%y-%m-%d, %H:%M:%S - %name:')
(True, True)

Functions

`get_supported_hformats_as_dict`([encoding])	Get dictionary with supported formats and relevant info.
`get_supported_hformats_as_list`([encoding])	Get list of supported formats.
`is_supported`(hformat[, encoding])	Check if header hformat is currently supported.
`is_supported_verbose`(hformat)	Check if header hformat is currently supported (both manually and using auto_header).

whatstk.whatsapp.hformat.get_supported_hformats_as_dict(encoding='utf8')

Get dictionary with supported formats and relevant info.

Parameters:	encoding (str, optional) – Text encoding used when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
Returns:	dict – Dict with two elements:
- `format`: Header format. All formats listed are supported.
- `auto_header`: 1 if auto_header is supported, 0 otherwise.

whatstk.whatsapp.hformat.get_supported_hformats_as_list(encoding='utf8')

Get list of supported formats.

Parameters:	encoding (str, optional) – Text encoding used when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
Returns:	list – List of supported formats (as str).

whatstk.whatsapp.hformat.is_supported(hformat, encoding='utf8')

Check if header hformat is currently supported.

Parameters:
- hformat (str) – Header format.
- encoding (str, optional) – Text encoding used when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
Returns:	tuple –
- bool: True if the header is supported.
- bool: True if the header is supported with the auto_header feature.
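
The meaning of the two returned booleans can be illustrated with a small lookup over a table shaped like the one described for get_supported_hformats_as_dict. This is only a sketch: the `SUPPORTED` entries, their flag values, and the function `is_supported_sketch` are hypothetical and do not reflect the formats whatstk actually ships.

```python
# Hypothetical in-memory table mirroring the documented structure:
# each supported header format maps to a flag (1/0) telling whether
# auto_header can detect it. Entries and flags are made up for illustration.
SUPPORTED = {
    "%y-%m-%d, %H:%M:%S - %name:": 1,
    "%d.%m.%y, %H:%M - %name:": 1,
    "[%d/%m/%y %H:%M:%S] %name:": 0,
}


def is_supported_sketch(hformat, table=SUPPORTED):
    """Return (supported, supported_with_auto_header) for a header format."""
    if hformat not in table:
        # Unknown format: neither manual nor automatic support.
        return (False, False)
    return (True, bool(table[hformat]))


print(is_supported_sketch("%y-%m-%d, %H:%M:%S - %name:"))
print(is_supported_sketch("[%d/%m/%y %H:%M:%S] %name:"))
print(is_supported_sketch("%name says:"))
```

A format can therefore be supported when passed explicitly as hformat while not being detectable automatically, which is why the result is a pair rather than a single flag.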

whatstk.whatsapp.hformat.is_supported_verbose(hformat)

Check if header hformat is currently supported (both manually and using auto_header).

Result is shown as a string.

Parameters:	hformat (str) – Header format.
Returns:	str – Information message.

Example

Check if format '%y-%m-%d, %H:%M - %name:' is supported.

>>> from whatstk.whatsapp.hformat import is_supported_verbose
>>> is_supported_verbose('%y-%m-%d, %H:%M - %name:')
"The header '%y-%m-%d, %H:%M - %name:' is supported. `auto_header` for this header is supported."

whatstk.whatsapp.objects module

Library WhatsApp objects.

Classes

WhatsAppChat(df) Load and process a WhatsApp chat file.

class whatstk.whatsapp.objects.WhatsAppChat(df)

Bases: whatstk._chat.BaseChat

Load and process a WhatsApp chat file.

Parameters:	df (pandas.DataFrame) – Chat.

Methods

`from_source`(filepath, **kwargs)	Create an instance from a chat text file.
`from_sources`(filepaths[, auto_header, …])	Load a WhatsAppChat instance from multiple sources.
`to_txt`(filepath[, hformat, encoding])	Export chat to a text file.

Example

This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly

classmethod from_source(filepath, **kwargs)

Create an instance from a chat text file.

Parameters:

filepath (str) –
Path to the file. Accepted sources are:
- Local file, e.g. ‘path/to/file.txt’.
- URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
- Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
**kwargs – Refer to the docs from df_from_txt_whatsapp for details on additional arguments.

Returns:

WhatsAppChat – Class instance with loaded and parsed chat.
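
The three accepted source types can be told apart by the scheme of the filepath string. The sketch below shows one way to do that with the standard library. It is an assumption about how such dispatching could look, not the library's code, and the helper names `classify_source` and `gdrive_file_id` are hypothetical.

```python
from urllib.parse import urlparse


def classify_source(filepath):
    """Classify a filepath as 'gdrive', 'url', or 'local' (illustrative only)."""
    scheme = urlparse(filepath).scheme.lower()
    if scheme == "gdrive":
        return "gdrive"
    if scheme in ("http", "https"):
        return "url"
    # Anything else (no scheme, or e.g. a Windows drive letter) is treated as local.
    return "local"


def gdrive_file_id(filepath):
    """Extract FILE-ID from a 'gdrive://[FILE-ID]' string."""
    prefix = "gdrive://"
    if not filepath.startswith(prefix):
        raise ValueError("expected 'gdrive://[FILE-ID]'")
    return filepath[len(prefix):]


for path in (
    "path/to/file.txt",
    "http://www.url.to/file.txt",
    "gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ",
):
    print(path, "->", classify_source(path))
```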

whatstk.whatsapp.parser module

Parser utils.

Functions

`df_from_txt_whatsapp`(filepath[, …])	Load chat as a DataFrame.
`generate_regex`(hformat)	Generate regular expression from hformat.

whatstk.whatsapp.parser.df_from_txt_whatsapp(filepath, auto_header=True, hformat=None, encoding='utf-8')

Load chat as a DataFrame.

Parameters:

filepath (str) –
Path to the file. Accepted sources are:
- Local file, e.g. ‘path/to/file.txt’.
- URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
- Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
auto_header (bool, optional) – Detect header automatically. If False, hformat is required.
hformat (str, optional) –
Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use following keywords:
- '%y': for year ('%Y' is equivalent).
- '%m': for month.
- '%d': for day.
- '%H': for 24h-hour.
- '%I': for 12h-hour.
- '%M': for minutes.
- '%S': for seconds.
- '%P': for “PM”/“AM” or “p.m.”/“a.m.” characters.
- '%name': for the username.
Example 1: For the header ‘12/08/2016, 16:20 - username:’ we have hformat='%d/%m/%y, %H:%M - %name:'.

Example 2: For the header ‘2016-08-12, 4:20 PM - username:’ we have hformat='%y-%m-%d, %I:%M %P - %name:'.
encoding (str, optional) –
Text encoding used when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:

pandas.DataFrame – DataFrame with the loaded and parsed chat.
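
To show how the hformat keywords relate to an actual header line, the following sketch translates an hformat string into a regular expression with named groups and applies it to the two example headers above. It is not the implementation of generate_regex. The `TOKENS` mapping, its regex fragments, and the function `hformat_to_regex` are assumptions chosen for illustration.

```python
import re

# Illustrative keyword -> regex fragment mapping; not whatstk's actual patterns.
TOKENS = {
    "%name": r"(?P<username>[^:]+)",
    "%y": r"(?P<year>\d{2,4})",
    "%Y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%I": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minutes>\d{2})",
    "%S": r"(?P<seconds>\d{2})",
    "%P": r"(?P<ampm>[AaPp]\.? ?[Mm]\.?)",
}

# Longest keyword first so that '%name' is tried before the two-character keywords.
TOKEN_RE = re.compile(
    "|".join(sorted(map(re.escape, TOKENS), key=len, reverse=True))
)


def hformat_to_regex(hformat):
    """Translate an hformat string into a compiled regex with named groups."""
    parts, pos = [], 0
    for match in TOKEN_RE.finditer(hformat):
        parts.append(re.escape(hformat[pos:match.start()]))  # literal text
        parts.append(TOKENS[match.group()])                   # keyword pattern
        pos = match.end()
    parts.append(re.escape(hformat[pos:]))  # trailing literal text
    return re.compile("^" + "".join(parts))


# Example 1 from the parameter description.
line = "12/08/2016, 16:20 - username: Hey guys!"
m = hformat_to_regex("%d/%m/%y, %H:%M - %name:").match(line)
print(m.group("year"), m.group("hour"), m.group("username"), line[m.end():].strip())

# Example 2 from the parameter description (12-hour clock with AM/PM marker).
m2 = hformat_to_regex("%y-%m-%d, %I:%M %P - %name:").match(
    "2016-08-12, 4:20 PM - username: Hey guys!"
)
print(m2.group("hour"), m2.group("ampm"))
```

Everything after the matched header is the message text, so one pattern per chat is enough to split each line into date fields, username, and message.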

Module contents

WhatsApp parser.