whatstk.whatsapp package
Submodules
whatstk.whatsapp.auto_header module
Detect header from chat.
Functions:
extract_header_from_text(text[, encoding]): Extract header from text.
- whatstk.whatsapp.auto_header.extract_header_from_text(text: str, encoding: str = 'utf-8') -> str | None
Extract header from text.
- Parameters:
text (str) – Loaded chat as string (whole text).
encoding (str) – Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
- Returns:
str – Format extracted. None if no header was extracted.
Example
Extract the header format from a chat loaded as text. In this example, we use a sample chat (available online, see URLs in the source code of whatstk.data).
>>> from whatstk.whatsapp.auto_header import extract_header_from_text
>>> from urllib.request import urlopen
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.POKEMON
>>> with urlopen(filepath_1) as f:
...     text = f.read().decode('utf-8')
>>> extract_header_from_text(text)
'%d.%m.%y, %H:%M - %name:'
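As a rough illustration of what automatic header detection involves, the sketch below tries a few candidate header formats against the lines of a chat and returns the one that matches the most lines, or None if nothing matches. It is not whatstk's implementation: the CANDIDATES table, its hand-written regular expressions, the helper guess_hformat, and the sample lines are all assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical candidates: header format -> hand-written regex for that format.
CANDIDATES = {
    "%d.%m.%y, %H:%M - %name:": r"\d{1,2}\.\d{1,2}\.\d{2,4}, \d{1,2}:\d{2} - [^:]*: ",
    "%y-%m-%d, %H:%M - %name:": r"\d{2,4}-\d{1,2}-\d{1,2}, \d{1,2}:\d{2} - [^:]*: ",
}


def guess_hformat(text: str) -> Optional[str]:
    """Return the candidate format matching the most lines, or None."""
    lines = text.splitlines()
    best_format, best_hits = None, 0
    for hformat, pattern in CANDIDATES.items():
        # Count the lines that start with a header in this format.
        hits = sum(1 for line in lines if re.match(pattern, line))
        if hits > best_hits:
            best_format, best_hits = hformat, hits
    return best_format


# Made-up sample lines for the example.
sample = (
    "06.08.16, 13:23 - Ash Ketchum: Hey guys!\n"
    "06.08.16, 13:25 - Brock: Hey Ash, good to have a common group!\n"
)
detected = guess_hformat(sample)
```

Counting matches over all lines, rather than checking only the first one, makes the guess less sensitive to multi-line messages whose continuation lines carry no header.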
whatstk.whatsapp.generation module
Automatic generation of chat using Lorem Ipsum text and time series statistics.
Classes:
ChatGenerator(size[, users, seed]): Generate a chat.
Functions:
generate_chats_hformats(output_path[, size, ...]): Generate a chat and export using given header format.
- class whatstk.whatsapp.generation.ChatGenerator(size: int, users: List[str] | None = None, seed: int = 100)
Bases: object
Generate a chat.
- Parameters:
size (int) – Number of messages to generate.
users (list, optional) – List with names of the users. Defaults to module variable USERS.
seed (int, optional) – Seed for random processes. Defaults to 100.
Examples
This simple example generates a chat using ChatGenerator. Once generated, we can access its attribute df, which contains the generated chat as a DataFrame.
>>> from whatstk.whatsapp.generation import ChatGenerator
>>> from datetime import datetime
>>> from whatstk.data import whatsapp_urls
>>> chat = ChatGenerator(size=10).generate(last_timestamp=datetime(2020, 1, 1, 0, 0))
>>> chat.df.head(5)
   date                        username  message
0  2019-12-31 09:43:04.000525  Giuseppe  Nisi ad esse cillum.
1  2019-12-31 10:19:21.980039  Giuseppe  Tempor dolore sint in eu lorem veniam veniam.
2  2019-12-31 13:56:45.575426  Giuseppe  Do quis fugiat sint ut ut, do anim eu est qui ...
3  2019-12-31 15:47:29.995420  Giuseppe  Do qui qui elit ea in sed culpa, aliqua magna ...
4  2019-12-31 16:23:00.348542  Mary      Sunt excepteur mollit voluptate dolor sint occ...
Methods:
generate([filepath, hformat, last_timestamp]): Generate random chat as WhatsAppChat.
- generate(filepath: str | None = None, hformat: str | None = None, last_timestamp: datetime | None = None) -> str
Generate random chat as WhatsAppChat.
- Parameters:
filepath (str) – If given, generated chat is saved with name filepath (must be a local path).
hformat (str, optional) – Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'.
last_timestamp (datetime, optional) – Datetime of last message. If None, defaults to current date.
- Returns:
WhatsAppChat – Chat with random messages.
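To make the roles of size, users, seed and last_timestamp concrete, here is a minimal standard-library sketch of seeded random chat generation. It is an illustration only, not the ChatGenerator implementation: the word pool, the default user names, the gap distribution and the helper generate_messages are all assumptions.

```python
import random
from datetime import datetime, timedelta

# Hypothetical word pool and default users for this sketch.
WORDS = "lorem ipsum dolor sit amet consectetur adipiscing elit sed do".split()
DEFAULT_USERS = ["John", "Mary", "Giuseppe"]


def generate_messages(size, users=None, seed=100, last_timestamp=None):
    """Return `size` (timestamp, username, message) tuples in chronological order."""
    if size <= 0:
        return []
    rng = random.Random(seed)  # local generator, so results are reproducible
    users = users or DEFAULT_USERS
    end = last_timestamp or datetime.now()
    # Walk backwards from the last timestamp using random gaps in minutes.
    timestamps = [end]
    for _ in range(size - 1):
        timestamps.append(timestamps[-1] - timedelta(minutes=rng.randint(1, 180)))
    timestamps.reverse()
    rows = []
    for ts in timestamps:
        n_words = rng.randint(3, 8)
        text = " ".join(rng.choice(WORDS) for _ in range(n_words)).capitalize() + "."
        rows.append((ts, rng.choice(users), text))
    return rows


rows = generate_messages(size=10, last_timestamp=datetime(2020, 1, 1, 0, 0))
```

Using a dedicated random.Random(seed) instance keeps the output reproducible without touching the global random state.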
- whatstk.whatsapp.generation.generate_chats_hformats(output_path: str, size: int = 2000, hformats: str | None = None, filepaths: str | None = None, last_timestamp: datetime | None = None, seed: int = 100, verbose: bool = False, export_as_zip: bool = False) -> None
Generate a chat and export using given header format.
If no hformat specified, chat is generated & exported using all supported header formats.
- Parameters:
output_path (str) – Path to directory to export all generated chats as txt.
size (int, optional) – Number of messages of the chat. Defaults to 2000.
hformats (list, optional) – List of header formats to use when exporting chat. If None, defaults to all supported header formats.
filepaths (list, optional) – List with filepaths (only txt files). If None, defaults to whatstk.utils.utils._map_hformat_filename(filepath).
last_timestamp (datetime, optional) – Datetime of last message. If None, defaults to current date.
seed (int, optional) – Seed for random processes. Defaults to 100.
verbose (bool) – Set to True to print runtime messages.
export_as_zip (bool) – Set to True to export the chat(s) zipped, additionally.
whatstk.whatsapp.hformat module
Header format utils.
Example: Check if header is available.
>>> from whatstk.whatsapp.hformat import is_supported
>>> is_supported('%y-%m-%d, %H:%M:%S - %name:')
(True, True)
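Functions:
get_supported_hformats_as_dict([encoding]): Get dictionary with supported formats and relevant info.
get_supported_hformats_as_list([encoding]): Get list of supported formats.
is_supported(hformat[, encoding]): Check if header hformat is currently supported.
is_supported_verbose(hformat): Check if header hformat is currently supported (both manually and using auto_header).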
- whatstk.whatsapp.hformat.get_supported_hformats_as_dict(encoding: str = 'utf8') -> Dict[str, int]
Get dictionary with supported formats and relevant info.
- Parameters:
encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
- Returns:
dict – Dict with two elements:
  - format: Header format. All formats appearing are supported.
  - auto_header: 1 if auto_header is supported, 0 otherwise.
- whatstk.whatsapp.hformat.get_supported_hformats_as_list(encoding: str = 'utf8') -> List[str]
Get list of supported formats.
- Parameters:
encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’).
- Returns:
list – List with supported formats (as str).
- whatstk.whatsapp.hformat.is_supported(hformat: str, encoding: str = 'utf8') -> Tuple[bool, bool]
Check if header hformat is currently supported.
- Parameters:
hformat (str) – Header format.
encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
- Returns:
tuple –
  - bool: True if header is supported.
  - bool: True if header is supported with auto_header feature.
- whatstk.whatsapp.hformat.is_supported_verbose(hformat: str) -> str
Check if header hformat is currently supported (both manually and using auto_header).
Result is shown as a string.
- Parameters:
hformat (str) – Header format.
- Returns:
str – Information message.
Example
Check if format '%y-%m-%d, %H:%M - %name:' is supported.
>>> from whatstk.whatsapp.hformat import is_supported_verbose
>>> is_supported_verbose('%y-%m-%d, %H:%M - %name:')
"The header '%y-%m-%d, %H:%M - %name:' is supported. `auto_header` for this header is supported."
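The relationship between the dictionary of supported formats and the two support checks can be sketched as follows. This is a simplified stand-in, not the library code: the SUPPORTED table and its flags, the helper names check_support and check_support_verbose, and the wording of the "not supported" messages are assumptions for illustration.

```python
from typing import Dict, Tuple

# Hypothetical table: header format -> 1 if auto-detection is supported, else 0.
SUPPORTED: Dict[str, int] = {
    "%y-%m-%d, %H:%M - %name:": 1,
    "[%y-%m-%d %H:%M:%S] - %name:": 0,
}


def check_support(hformat: str) -> Tuple[bool, bool]:
    """Return (is supported, is supported with automatic header detection)."""
    if hformat not in SUPPORTED:
        return False, False
    return True, bool(SUPPORTED[hformat])


def check_support_verbose(hformat: str) -> str:
    """Return the result of check_support as a human-readable message."""
    supported, auto = check_support(hformat)
    first = "is supported" if supported else "is not supported"
    second = "is supported" if auto else "is not supported"
    return (
        f"The header '{hformat}' {first}. "
        f"`auto_header` for this header {second}."
    )


result = check_support("%y-%m-%d, %H:%M - %name:")
message = check_support_verbose("%y-%m-%d, %H:%M - %name:")
```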
whatstk.whatsapp.objects module
Library WhatsApp objects.
Classes:
WhatsAppChat(df): Load and process a WhatsApp chat file.
- class whatstk.whatsapp.objects.WhatsAppChat(df: DataFrame)
Bases: BaseChat
Load and process a WhatsApp chat file.
- Parameters:
df (pandas.DataFrame) – Chat.
Example
This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
   date                 username     message
0  2016-08-06 13:23:00  Ash Ketchum  Hey guys!
1  2016-08-06 13:25:00  Brock        Hey Ash, good to have a common group!
2  2016-08-06 13:30:00  Misty        Hey guys! Long time haven't heard anything fro...
3  2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4  2016-08-06 14:30:00  Misty        Definetly
Optionally, you can use the argument extra_metadata to add additional metadata to the chat:
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON, extra_metadata=True)
>>> chat.name
'Pokemon Chat'
>>> chat.df_system
   date                 message
0  2016-04-15 15:04:00  Messages and calls are end-to-end encrypted. N...
>>> chat.df.head()
   date                 username     message
0  2016-08-06 13:23:00  Ash Ketchum  Hey guys!
1  2016-08-06 13:25:00  Brock        Hey Ash, good to have a common group!
2  2016-08-06 13:30:00  Misty        Hey guys! Long time haven't heard anything fro...
3  2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4  2016-08-06 14:30:00  Misty        Definetly
Methods:
from_source(filepath[, extra_metadata]): Create an instance from a chat text file.
from_sources(filepaths[, auto_header, ...]): Load a WhatsAppChat instance from multiple sources.
to_txt(filepath[, hformat, encoding]): Export chat to a text file.
to_zip(filepath[, hformat, encoding]): Export chat to a zip file.
- classmethod from_source(filepath: str, extra_metadata: bool | None = None, **kwargs: Any) -> WhatsAppChat
Create an instance from a chat text file.
- Parameters:
filepath (str) – Path to the file. Accepted sources are:
  - Local file, e.g. ‘path/to/file.txt’ or ‘path/to/file.zip’ (iOS).
  - URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
  - Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
**kwargs – Refer to the docs of df_from_whatsapp for details on additional arguments.
extra_metadata (bool) – This is experimental. If True, additional metadata is made available as class attributes, such as chat.name and chat.df_system (DataFrame with only system messages). Note that this option only works on group chats.
- Returns:
WhatsAppChat – Class instance with loaded and parsed chat.
- classmethod from_sources(filepaths: str, auto_header: bool | None = None, hformat: str | None = None, encoding: str = 'utf-8') -> WhatsAppChat
Load a WhatsAppChat instance from multiple sources.
- Parameters:
filepaths (list) – List with filepaths.
auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.
hformat (list, optional) – List with the header format to be used for each file. The list must be of length equal to len(filepaths). A valid header format might be ‘[%y-%m-%d %H:%M:%S] - %name:’.
encoding (str) – Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
- Returns:
WhatsAppChat – Class instance with loaded and parsed chat.
Example
Load a chat using two text files. In this example, we use sample chats (available online, see URLs in the source code of whatstk.data).
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
   date                 username        message
0  2019-10-20 10:16:00  John            Laborum sed excepteur id eu cillum sunt ut.
1  2019-10-20 11:15:00  Mary            Ad aliquip reprehenderit proident est irure mo...
2  2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
3  2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
4  2019-10-20 17:28:00  John            Do ex dolor consequat tempor et ex.
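One plausible way to combine several parsed sources into a single chat is to concatenate their messages and sort them by date. The sketch below shows that idea with plain tuples. How from_sources actually merges, orders or deduplicates messages is not stated here, so the helper merge_chats and the sample messages are assumptions.

```python
from datetime import datetime
from typing import List, Tuple

Message = Tuple[datetime, str, str]  # (date, username, message)


def merge_chats(chats: List[List[Message]]) -> List[Message]:
    """Concatenate several chats and order the result chronologically."""
    merged = [msg for chat in chats for msg in chat]
    # list.sort is stable, so messages with equal dates keep their source order.
    merged.sort(key=lambda msg: msg[0])
    return merged


# Made-up messages standing in for two parsed files.
chat_1 = [
    (datetime(2019, 10, 20, 10, 16), "John", "First file, first message."),
    (datetime(2019, 10, 20, 12, 16), "Mary", "First file, second message."),
]
chat_2 = [
    (datetime(2019, 10, 20, 11, 15), "Mary", "Second file, first message."),
]
merged = merge_chats([chat_1, chat_2])
```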
- to_txt(filepath: str, hformat: str | None = None, encoding: str = 'utf8') -> None
Export chat to a text file.
Useful to export the chat to different formats (i.e. using different hformats).
- Parameters:
filepath (str) – Name of the file to export (must be a local path).
hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
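To illustrate what exporting with a given hformat means, the sketch below renders one message line from a header format. It is not the to_txt implementation: the helper render_line is hypothetical, and it assumes that the date keywords can be passed to datetime.strftime (so '%y' yields a two-digit year here) and that '%P' maps to strftime's '%p'.

```python
from datetime import datetime


def render_line(hformat: str, timestamp: datetime, username: str, message: str) -> str:
    """Render 'header message' for one message, given a header format."""
    # Assumption: '%P' (AM/PM marker) corresponds to strftime's '%p'.
    parts = hformat.replace("%P", "%p").split("%name")
    # Format the date pieces first, then insert the username verbatim,
    # so a '%' inside a username is never interpreted by strftime.
    header = username.join(timestamp.strftime(part) for part in parts)
    return f"{header} {message}"


line = render_line(
    "%y-%m-%d, %H:%M - %name:",
    datetime(2016, 8, 6, 13, 23),
    "Ash Ketchum",
    "Hey guys!",
)
```

Writing the chat to a file would then amount to joining such lines with newlines and opening the target path with the chosen encoding.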
- to_zip(filepath: str, hformat: str | None = None, encoding: str = 'utf8') -> None
Export chat to a zip file.
Useful to export the chat to different formats (i.e. using different hformats).
- Parameters:
filepath (str) – Name of the file to export (must be a local path).
hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
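A zipped export can be approximated with the standard zipfile module, as in the sketch below. It is only an illustration of the idea: the helper write_chat_zip, the inner file name chat.txt and the sample text are assumptions, and the archive layout that to_zip produces is not specified here.

```python
import io
import zipfile


def write_chat_zip(target, text: str, inner_name: str = "chat.txt", encoding: str = "utf8") -> None:
    """Write `text` as a single encoded text file inside a zip archive."""
    with zipfile.ZipFile(target, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(inner_name, text.encode(encoding))


# Use an in-memory buffer here; a local file path works the same way.
buffer = io.BytesIO()
write_chat_zip(buffer, "16-08-06, 13:23 - Ash Ketchum: Hey guys!\n")

# Read the archive back to check its content.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    restored = archive.read("chat.txt").decode("utf8")
```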
whatstk.whatsapp.parser module
Parser utils.
Functions:
df_from_txt_whatsapp(filepath, **kwargs): Alias for df_from_whatsapp.
df_from_whatsapp(filepath[, auto_header, ...]): Load chat as a DataFrame.
generate_regex(hformat): Generate regular expression from hformat.
- whatstk.whatsapp.parser.df_from_txt_whatsapp(filepath: str, **kwargs: Any) -> DataFrame
Alias for df_from_whatsapp.
- whatstk.whatsapp.parser.df_from_whatsapp(filepath: str, auto_header: bool = True, hformat: str | None = None, encoding: str = 'utf-8', message_type: bool | None = None) -> DataFrame
Load chat as a DataFrame.
- Parameters:
filepath (str) – Path to the file. Accepted sources are:
  - Local file, e.g. ‘path/to/file.txt’ or ‘path/to/_chat.zip’ (e.g. iOS export).
  - URL to a remote hosted file, e.g. ‘http://www.url.to/file.txt’.
  - Link to Google Drive file, e.g. ‘gdrive://35gKKrNk-i3t05zPLyH4_P1rPdOmKW9NZ’. The format is expected to be ‘gdrive://[FILE-ID]’. Note that in order to load a file from Google Drive you first need to run gdrive_init.
auto_header (bool, optional) – Detect header automatically. If False, hformat is required.
hformat (str, optional) – Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use the following keywords:
  - '%y': for year ('%Y' is equivalent).
  - '%m': for month.
  - '%d': for day.
  - '%H': for 24h-hour.
  - '%I': for 12h-hour.
  - '%M': for minutes.
  - '%S': for seconds.
  - '%P': for “PM”/“AM” or “p.m.”/“a.m.” characters.
  - '%name': for the username.
Example 1: For the header ‘12/08/2016, 16:20 - username:’ we have hformat='%d/%m/%y, %H:%M - %name:'.
Example 2: For the header ‘2016-08-12, 4:20 PM - username:’ we have hformat='%y-%m-%d, %I:%M %P - %name:'.
encoding (str, optional) – Encoding to use when reading/writing (e.g. ‘utf-8’). See the list of Python standard encodings.
message_type (bool, optional) – If True, adds a message_type column labelling each message as ‘user’ or ‘system’, based on who sent the message.
- Returns:
DataFrame – DataFrame with the loaded and parsed chat.
Example
Read a chat
>>> from whatstk import df_from_whatsapp
>>> from whatstk.data import whatsapp_urls
>>> df = df_from_whatsapp(filepath=whatsapp_urls.LOREM)
>>> df.head(5)
   date                 username        message                                            message_type
0  2020-01-15 02:22:56  Mary            Nostrud exercitation magna id.                     system
1  2020-01-15 03:33:01  Mary            Non elit irure irure pariatur exercitation. 🇩🇰     user
2  2020-01-15 04:18:42  +1 123 456 789  Exercitation esse lorem reprehenderit ut ex ve...  user
3  2020-01-15 06:05:14  Giuseppe        Aliquip dolor reprehenderit voluptate dolore e...  user
4  2020-01-15 06:56:00  Mary            Ullamco duis et commodo exercitation.              user
Read a chat, labelling each message as ‘user’ or ‘system’. ‘system’ messages are those sent by the chat itself (creation of chat, etc.)
>>> from whatstk import df_from_whatsapp
>>> from whatstk.data import whatsapp_urls
>>> df = df_from_whatsapp(filepath=whatsapp_urls.POKEMON, message_type=True)
>>> df.head()
   date                 username      message                                            message_type
0  2016-04-15 15:04:00  Pokemon Chat  Messages and calls are end-to-end encrypted. N...  system
1  2016-08-06 13:23:00  Ash Ketchum   Hey guys!                                          user
2  2016-08-06 13:25:00  Brock         Hey Ash, good to have a common group!              user
3  2016-08-06 13:30:00  Misty         Hey guys! Long time since heard anything from you  user
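The hformat keywords listed above can be read as placeholders that each stand for a small regular expression. The sketch below makes that reading concrete by translating an hformat string into a pattern with named groups. It is a simplified illustration, not the parser's code: the KEYWORD_PATTERNS table, the group names and the helper hformat_to_regex are assumptions.

```python
import re

# Assumed keyword-to-pattern table for this sketch (group names are illustrative).
KEYWORD_PATTERNS = {
    "%y": r"(?P<year>\d{2,4})",
    "%Y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%I": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minutes>\d{2})",
    "%S": r"(?P<seconds>\d{2})",
    "%P": r"(?P<ampm>[AaPp]\.? ?[Mm]\.?)",
    "%name": r"(?P<username>[^:]*)",
}
# The capturing group makes re.split keep the keywords in its output.
KEYWORD = re.compile(r"(%name|%[yYmdHIMSP])")


def hformat_to_regex(hformat: str) -> str:
    """Translate a header format into a regex; literal text is escaped."""
    pieces = []
    for part in KEYWORD.split(hformat):
        pieces.append(KEYWORD_PATTERNS.get(part, re.escape(part)))
    return "".join(pieces)


# The two header examples given in the parameter description.
match_24h = re.match(
    hformat_to_regex("%d/%m/%y, %H:%M - %name:"),
    "12/08/2016, 16:20 - username: hello",
)
match_12h = re.match(
    hformat_to_regex("%y-%m-%d, %I:%M %P - %name:"),
    "2016-08-12, 4:20 PM - username: hello",
)
```

Escaping the literal pieces with re.escape matters for formats that contain characters such as '[' or '.', which would otherwise be interpreted as regex syntax.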
- whatstk.whatsapp.parser.generate_regex(hformat: str) -> Tuple[str, str]
Generate regular expression from hformat.
- Parameters:
hformat (str) – Simplified syntax for the header, e.g. '%y-%m-%d, %H:%M:%S - %name:'.
- Returns:
tuple – Two regular expressions corresponding to the specified syntax: the first includes the username group, the second omits it.
Example
Generate regular expression corresponding to hformat='%y-%m-%d, %H:%M:%S - %name:'.
>>> from whatstk.whatsapp.parser import generate_regex
>>> generate_regex('%y-%m-%d, %H:%M:%S - %name:')
('(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - (?P<username>[^:]*): ', '(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - ')
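The two patterns returned for '%y-%m-%d, %H:%M:%S - %name:' can be tried out directly with the re module. The sketch below copies them as raw strings and applies them to two made-up chat lines. How the parser itself uses the second pattern is not described here; the example only shows that it also matches a line without a "name:" part.

```python
import re

# Patterns copied from the documented output, written as raw strings.
HEADER_WITH_USER = (
    r"(?P<year>\d{2,4})-(?P<month>\d{1,2})-(?P<day>\d{1,2}), "
    r"(?P<hour>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2}) - (?P<username>[^:]*): "
)
HEADER_NO_USER = (
    r"(?P<year>\d{2,4})-(?P<month>\d{1,2})-(?P<day>\d{1,2}), "
    r"(?P<hour>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2}) - "
)

# Hypothetical chat lines for the example.
user_line = "2019-10-20, 10:16:00 - John: Laborum sed excepteur."
system_line = "2019-10-20, 10:15:00 - Group created"

match = re.match(HEADER_WITH_USER, user_line)
fields = match.groupdict()          # date parts and username as strings
message = user_line[match.end():]   # everything after the header

# A line without "name: " only matches the pattern that omits the username.
with_user = re.match(HEADER_WITH_USER, system_line)
without_user = re.match(HEADER_NO_USER, system_line)
```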
Module contents
WhatsApp parser.