whatstk.whatsapp package

Submodules

whatstk.whatsapp.auto_header module

Detect header from chat.

Functions:

extract_header_from_text(text[, encoding])

Extract header from text.

whatstk.whatsapp.auto_header.extract_header_from_text(text: str, encoding: str = 'utf-8') → str | None

Extract header from text.

Parameters:
  • text (str) – Loaded chat as string (whole text).

  • encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:

str – Format extracted. None if no header was extracted.

Example

Extract the header format from the text of a chat. In this example, we use a sample chat (available online, see urls in source code whatstk.data).

>>> from whatstk.whatsapp.auto_header import extract_header_from_text
>>> from urllib.request import urlopen
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.POKEMON
>>> with urlopen(filepath_1) as f:
...     text = f.read().decode('utf-8')
>>> extract_header_from_text(text)
'%d.%m.%y, %H:%M - %name:'
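
To make the idea of header detection concrete, here is a minimal standalone sketch. It is not the algorithm used by whatstk: the function name guess_hformat, the CANDIDATES table and the sample lines are all hypothetical. It only illustrates the contract described above, returning a format string when one fits and None otherwise.

```python
import re

# Hypothetical table of candidate formats and hand-written patterns for them.
# whatstk supports many more formats; these two are only for illustration.
CANDIDATES = {
    "%d.%m.%y, %H:%M - %name:": r"\d{1,2}\.\d{1,2}\.\d{2,4}, \d{1,2}:\d{2} - [^:]*:",
    "%y-%m-%d, %H:%M - %name:": r"\d{2,4}-\d{1,2}-\d{1,2}, \d{1,2}:\d{2} - [^:]*:",
}


def guess_hformat(text):
    """Return the candidate format matching the most lines, or None."""
    lines = text.splitlines()
    best, best_hits = None, 0
    for hformat, pattern in CANDIDATES.items():
        # Count the lines that start with a header of this shape.
        hits = sum(1 for line in lines if re.match(pattern, line))
        if hits > best_hits:
            best, best_hits = hformat, hits
    return best


# Made-up sample lines, not taken from the real sample chat file.
sample = (
    "06.08.16, 13:23 - Ash Ketchum: Hey guys!\n"
    "06.08.16, 13:25 - Brock: Hey Ash, good to have a common group!"
)
detected = guess_hformat(sample)
```

Counting matches over all lines, rather than checking only the first line, keeps the sketch tolerant of multi-line messages that do not start with a header.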

whatstk.whatsapp.generation module

Automatic generation of chat using Lorem Ipsum text and time series statistics.

Classes:

ChatGenerator(size[, users, seed])

Generate a chat.

Functions:

generate_chats_hformats(output_path[, size, ...])

Generate a chat and export using given header format.

class whatstk.whatsapp.generation.ChatGenerator(size: int, users: List[str] | None = None, seed: int = 100)

Bases: object

Generate a chat.

Parameters:
  • size (int) – Number of messages to generate.

  • users (list, optional) – List with names of the users. Defaults to module variable USERS.

  • seed (int, optional) – Seed for random processes. Defaults to 100.

Examples

This simple example generates a chat using ChatGenerator. Once generated, we can access its attribute df, which contains the generated chat as a DataFrame.

>>> from whatstk.whatsapp.generation import ChatGenerator
>>> from datetime import datetime
>>> from whatstk.data import whatsapp_urls
>>> chat = ChatGenerator(size=10).generate(last_timestamp=datetime(2020, 1, 1, 0, 0))
>>> chat.df.head(5)
                        date  username                                            message
0 2019-12-31 09:43:04.000525  Giuseppe                               Nisi ad esse cillum.
1 2019-12-31 10:19:21.980039  Giuseppe      Tempor dolore sint in eu lorem veniam veniam.
2 2019-12-31 13:56:45.575426  Giuseppe  Do quis fugiat sint ut ut, do anim eu est qui ...
3 2019-12-31 15:47:29.995420  Giuseppe  Do qui qui elit ea in sed culpa, aliqua magna ...
4 2019-12-31 16:23:00.348542      Mary  Sunt excepteur mollit voluptate dolor sint occ...

Methods:

generate([filepath, hformat, last_timestamp])

Generate random chat as WhatsAppChat.

generate(filepath: str | None = None, hformat: str | None = None, last_timestamp: datetime | None = None) → str

Generate random chat as WhatsAppChat.

Parameters:
  • filepath (str, optional) – If given, the generated chat is saved to filepath (must be a local path).

  • hformat (str, optional) – Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'.

  • last_timestamp (datetime, optional) – Datetime of last message. If None, defaults to current date.

Returns:

WhatsAppChat – Chat with random messages.
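
The roles of seed and last_timestamp can be illustrated with a small standalone sketch. This is not how ChatGenerator is implemented. The helper toy_chat and its gap distribution are hypothetical and only show the two properties documented above: a fixed seed gives a reproducible chat, and the final message falls on last_timestamp.

```python
import random
from datetime import datetime, timedelta


def toy_chat(size, users, seed=100, last_timestamp=None):
    """Return a list of (timestamp, username) pairs in chronological order."""
    if size <= 0:
        return []
    rng = random.Random(seed)  # local generator, so results depend only on seed
    end = last_timestamp or datetime.now()
    # Walk backwards from the last message using random gaps of 1 to 120 minutes.
    timestamps = [end]
    for _ in range(size - 1):
        timestamps.append(timestamps[-1] - timedelta(minutes=rng.randint(1, 120)))
    timestamps.reverse()
    return [(ts, rng.choice(users)) for ts in timestamps]


first = toy_chat(10, ["Giuseppe", "Mary"], seed=100, last_timestamp=datetime(2020, 1, 1))
second = toy_chat(10, ["Giuseppe", "Mary"], seed=100, last_timestamp=datetime(2020, 1, 1))
```

Calling the helper twice with the same arguments yields identical lists, which is the behaviour one would expect from a seeded generator.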

whatstk.whatsapp.generation.generate_chats_hformats(output_path: str, size: int = 2000, hformats: str | None = None, filepaths: str | None = None, last_timestamp: datetime | None = None, seed: int = 100, verbose: bool = False) → None

Generate a chat and export using given header format.

If no hformat is specified, the chat is generated and exported using all supported header formats.

Parameters:
  • output_path (str) – Path to directory to export all generated chats as txt.

  • size (int, optional) – Number of messages of the chat. Defaults to 2000.

  • hformats (list, optional) – List of header formats to use when exporting chat. If None, defaults to all supported header formats.

  • filepaths (list, optional) – List with filepaths. If None, defaults to whatstk.utils.utils._map_hformat_filename(filepath).

  • last_timestamp (datetime, optional) – Datetime of last message. If None, defaults to current date.

  • seed (int, optional) – Seed for random processes. Defaults to 100.

  • verbose (bool) – Set to True to print runtime messages.

whatstk.whatsapp.hformat module

Header format utils.

Example: Check if header is available.

>>> from whatstk.whatsapp.hformat import is_supported
>>> is_supported('%y-%m-%d, %H:%M:%S - %name:')
(True, True)

Functions:

get_supported_hformats_as_dict([encoding])

Get dictionary with supported formats and relevant info.

get_supported_hformats_as_list([encoding])

Get list of supported formats.

is_supported(hformat[, encoding])

Check if header hformat is currently supported.

is_supported_verbose(hformat)

Check if header hformat is currently supported (both manually and using auto_header).

whatstk.whatsapp.hformat.get_supported_hformats_as_dict(encoding: str = 'utf8') → Dict[str, int]

Get dictionary with supported formats and relevant info.

Parameters:

encoding (str, optional) –

Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:

dict

Dict with two elements:
  • format: Header format. All formats appearing are supported.

  • auto_header: 1 if auto_header is supported, 0 otherwise.

whatstk.whatsapp.hformat.get_supported_hformats_as_list(encoding: str = 'utf8') → List[str]

Get list of supported formats.

Parameters:

encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’).

Returns:

list – List with supported formats (as str).

whatstk.whatsapp.hformat.is_supported(hformat: str, encoding: str = 'utf8') → Tuple[bool, bool]

Check if header hformat is currently supported.

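Parameters:
  • hformat (str) – Header format.

  • encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’).

Returns:

tuple –
  • bool: True if header is supported.

  • bool: True if header is supported with auto_header feature.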

whatstk.whatsapp.hformat.is_supported_verbose(hformat: str) → str

Check if header hformat is currently supported (both manually and using auto_header).

Result is shown as a string.

Parameters:

hformat (str) – Header format.

Returns:

str – Information message.

Example

Check if format '%y-%m-%d, %H:%M - %name:' is supported.

>>> from whatstk.whatsapp.hformat import is_supported_verbose
>>> is_supported_verbose('%y-%m-%d, %H:%M - %name:')
"The header '%y-%m-%d, %H:%M - %name:' is supported. `auto_header` for this header is supported."

whatstk.whatsapp.objects module

Library WhatsApp objects.

Classes:

WhatsAppChat(df)

Load and process a WhatsApp chat file.

class whatstk.whatsapp.objects.WhatsAppChat(df: DataFrame)

Bases: BaseChat

Load and process a WhatsApp chat file.

Parameters:

df (pandas.DataFrame) – Chat.

Example

This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly

Methods:

from_source(filepath, **kwargs)

Create an instance from a chat text file.

from_sources(filepaths[, auto_header, ...])

Load a WhatsAppChat instance from multiple sources.

to_txt(filepath[, hformat, encoding])

Export chat to a text file.

classmethod from_source(filepath: str, **kwargs: Any) → WhatsAppChat

Create an instance from a chat text file.

Parameters:
  • filepath (str) –

    Path to the file. Accepted sources are:

    • Local file, e.g. ‘path/to/file.txt’.

    • URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.

    • Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.

  • **kwargs – Refer to the docs from df_from_txt_whatsapp for details on additional arguments.

Returns:

WhatsAppChat – Class instance with loaded and parsed chat.

classmethod from_sources(filepaths: str, auto_header: bool | None = None, hformat: str | None = None, encoding: str = 'utf-8') → WhatsAppChat

Load a WhatsAppChat instance from multiple sources.

Parameters:
  • filepaths (list) – List with filepaths.

  • auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.

  • hformat (list, optional) – List with the header format to be used for each file. The list must be of length equal to len(filepaths). A valid header format might be ‘[%y-%m-%d %H:%M:%S] - %name:’.

  • encoding (str) –

    Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:

WhatsAppChat – Class instance with loaded and parsed chat.

Example

Load a chat using two text files. In this example, we use sample chats (available online, see urls in source code whatstk.data).

>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
                 date        username                                            message
0 2019-10-20 10:16:00            John        Laborum sed excepteur id eu cillum sunt ut.
1 2019-10-20 11:15:00            Mary  Ad aliquip reprehenderit proident est irure mo...
2 2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
3 2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
4 2019-10-20 17:28:00            John                Do ex dolor consequat tempor et ex.

to_txt(filepath: str, hformat: str | None = None, encoding: str = 'utf8') → None

Export chat to a text file.

Useful for exporting the chat to different formats (i.e. using different hformats).

Parameters:
  • filepath (str) – Name of the file to export (must be a local path).

  • hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.

  • encoding (str, optional) –

    Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
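
As a rough illustration of what exporting with a given hformat involves, the sketch below renders a single header line from a timestamp and a username. The helper render_header is hypothetical and is not part of whatstk. It assumes that '%y' is written as a four-digit year, which this page does not state, and it handles only the date and time keywords that coincide with strftime directives.

```python
from datetime import datetime


def render_header(timestamp, username, hformat="%y-%m-%d, %H:%M - %name:"):
    """Build a header string such as '2016-08-06, 13:23 - Ash Ketchum:'."""
    # Assumption: '%y' stands for the full four-digit year in the output.
    strftime_format = hformat.replace("%y", "%Y")
    # Format the text around '%name' separately, so that a '%' inside the
    # username is never interpreted as a strftime directive.
    pieces = [timestamp.strftime(p) if p else "" for p in strftime_format.split("%name")]
    return username.join(pieces)


ts = datetime(2016, 8, 6, 13, 23)
default_header = render_header(ts, "Ash Ketchum")
bracket_header = render_header(ts, "Ash Ketchum", "[%y-%m-%d %H:%M:%S] - %name:")
```

Rendering the same message with two different formats mirrors the use case described above: one chat, several export layouts.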

whatstk.whatsapp.parser module

Parser utils.

Functions:

df_from_txt_whatsapp(filepath[, ...])

Load chat as a DataFrame.

generate_regex(hformat)

Generate regular expression from hformat.

whatstk.whatsapp.parser.df_from_txt_whatsapp(filepath: str, auto_header: bool = True, hformat: str | None = None, encoding: str = 'utf-8') → WhatsAppChat

Load chat as a DataFrame.

Parameters:
  • filepath (str) –

    Path to the file. Accepted sources are:

    • Local file, e.g. ‘path/to/file.txt’.

    • URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.

    • Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.

  • auto_header (bool, optional) – Detect header automatically. If False, hformat is required.

  • hformat (str, optional) –

    Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use following keywords:

    • '%y': for year ('%Y' is equivalent).

    • '%m': for month.

    • '%d': for day.

    • '%H': for 24h-hour.

    • '%I': for 12h-hour.

    • '%M': for minutes.

    • '%S': for seconds.

    • '%P': for “PM”/“AM” or “p.m.”/“a.m.” characters.

    • '%name': for the username.

    Example 1: For the header ‘12/08/2016, 16:20 - username:’ we have hformat='%d/%m/%y, %H:%M - %name:'.

    Example 2: For the header ‘2016-08-12, 4:20 PM - username:’ we have hformat='%y-%m-%d, %I:%M %P - %name:'.

  • encoding (str, optional) –

    Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.

Returns:

WhatsAppChat – Class instance with loaded and parsed chat.
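
The hformat keywords listed above can be read as placeholders for small regular expressions. The following standalone sketch applies that idea to the two example headers. It is a simplified illustration, not the parser used by whatstk: the names PATTERNS, hformat_to_regex and parse_header are hypothetical, and the '%P' pattern and the handling of two-digit years are assumptions.

```python
import re
from datetime import datetime

# Hypothetical keyword table modelled on the keyword list above.
PATTERNS = {
    "%name": r"(?P<username>[^:]*)",
    "%y": r"(?P<year>\d{2,4})",
    "%Y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%I": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minutes>\d{2})",
    "%S": r"(?P<seconds>\d{2})",
    "%P": r"(?P<ampm>[AaPp]\.?\s?[Mm]\.?)",  # assumed shape of AM/PM markers
}
TOKEN = re.compile(r"(%name|%[yYmdHIMSP])")


def hformat_to_regex(hformat):
    """Replace each keyword by its pattern and escape the literal text between."""
    parts = []
    for piece in TOKEN.split(hformat):
        if piece in PATTERNS:
            parts.append(PATTERNS[piece])
        elif piece:
            parts.append(re.escape(piece))
    return "".join(parts)


def parse_header(header, hformat):
    """Return (datetime, username), or None if the header does not match."""
    match = re.match(hformat_to_regex(hformat), header)
    if match is None:
        return None
    g = match.groupdict()
    year = int(g["year"])
    if year < 100:  # assumption: two-digit years belong to the 2000s
        year += 2000
    hour = int(g["hour"])
    if g.get("ampm"):  # convert a 12h clock to a 24h clock
        hour = hour % 12 + (12 if g["ampm"].lower().startswith("p") else 0)
    seconds = int(g.get("seconds") or 0)
    stamp = datetime(year, int(g["month"]), int(g["day"]), hour, int(g["minutes"]), seconds)
    return stamp, g["username"]


example_1 = parse_header("12/08/2016, 16:20 - username:", "%d/%m/%y, %H:%M - %name:")
example_2 = parse_header("2016-08-12, 4:20 PM - username:", "%y-%m-%d, %I:%M %P - %name:")
```

Both example headers describe the same moment, so the two calls produce the same timestamp and username.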

whatstk.whatsapp.parser.generate_regex(hformat: str) → Tuple[str, str]

Generate regular expression from hformat.

Parameters:

hformat (str) – Simplified syntax for the header, e.g. '%y-%m-%d, %H:%M:%S - %name:'.

Returns:

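tuple – Regular expressions corresponding to the specified syntax. In the example below, the first matches the full header including the username and the second matches only the date and time part.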

Example

Generate regular expression corresponding to hformat='%y-%m-%d, %H:%M:%S - %name:'.

>>> from whatstk.whatsapp.parser import generate_regex
>>> generate_regex('%y-%m-%d, %H:%M:%S - %name:')
('(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - (?P<username>[^:]*): ', '(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - ')
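
The first pattern in the output above can be used directly with Python's re module. The sketch below copies it as a raw string, so each doubled backslash in the printed repr becomes a single one, and applies it to one chat line. The sample line is made up for illustration.

```python
import re

# First element of the tuple shown above, written as a raw string.
HEADER_RE = re.compile(
    r"(?P<year>\d{2,4})-(?P<month>\d{1,2})-(?P<day>\d{1,2}), "
    r"(?P<hour>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2}) - "
    r"(?P<username>[^:]*): "
)

# Made-up chat line in the '%y-%m-%d, %H:%M:%S - %name:' format.
line = "2019-10-20, 10:16:00 - John: Laborum sed excepteur id eu cillum sunt ut."

match = HEADER_RE.match(line)
fields = match.groupdict()    # named groups: year, month, day, hour, ...
message = line[match.end():]  # everything after the header is the message
```

The named groups give access to each header component by name, and the text after the match is the message body.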

Module contents

WhatsApp parser.