whatstk.whatsapp package
Submodules
whatstk.whatsapp.auto_header module
Detect header from chat.
Functions
extract_header_from_text (text[, encoding]) |
Extract header from text. |
-
whatstk.whatsapp.auto_header.extract_header_from_text(text, encoding='utf-8')
Extract header from text.
Parameters: - text (str) – Loaded chat as string (whole text).
- encoding (str) – Encoding to use when reading/writing (e.g. 'utf-8'). See the list of Python standard encodings.
Returns: str – Header format extracted. None if no header was extracted.
Example
Extract the header format from a chat loaded as text. In this example, we use a sample chat (available online, see the URLs in the source code of whatstk.data).
>>> from whatstk.whatsapp.auto_header import extract_header_from_text
>>> from urllib.request import urlopen
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.POKEMON
>>> with urlopen(filepath_1) as f:
...     text = f.read().decode('utf-8')
>>> extract_header_from_text(text)
'%d.%m.%y, %H:%M - %name:'
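To give a feel for what header detection involves, here is a minimal standard-library sketch. It is not how whatstk implements extract_header_from_text: the CANDIDATES table, the guess_hformat function and the sample text are all hypothetical and exist only for illustration. The idea is to try a few known header formats and keep the one whose pattern matches the most lines.

```python
import re

# Hypothetical candidate table: header format -> regex for a line starting
# with that header. This is NOT whatstk's internal list of formats.
CANDIDATES = {
    "%d.%m.%y, %H:%M - %name:": r"\d{1,2}\.\d{1,2}\.\d{2,4}, \d{1,2}:\d{2} - [^:]*: ",
    "%y-%m-%d, %H:%M - %name:": r"\d{2,4}-\d{1,2}-\d{1,2}, \d{1,2}:\d{2} - [^:]*: ",
    "[%y-%m-%d %H:%M:%S] - %name:": r"\[\d{2,4}-\d{1,2}-\d{1,2} \d{1,2}:\d{2}:\d{2}\] - [^:]*: ",
}


def guess_hformat(text):
    """Return the candidate format that matches the most lines, or None."""
    lines = text.splitlines()
    best, best_hits = None, 0
    for hformat, pattern in CANDIDATES.items():
        # re.match anchors at the start of each line.
        hits = sum(1 for line in lines if re.match(pattern, line))
        if hits > best_hits:
            best, best_hits = hformat, hits
    return best


# Illustrative sample text (made up for this sketch).
sample = (
    "06.08.16, 13:23 - Ash Ketchum: Hey guys!\n"
    "06.08.16, 13:25 - Brock: Hey Ash, good to have a common group!\n"
)
print(guess_hformat(sample))  # prints: %d.%m.%y, %H:%M - %name:
```

Like the documented function, the sketch returns None when no candidate matches any line.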
whatstk.whatsapp.generation module
Automatic generation of chat using Lorem Ipsum text and time series statistics.
Classes
ChatGenerator (size[, users, seed]) |
Generate a chat. |
Functions
generate_chats_hformats (output_path[, size, …]) |
Generate a chat and export using given header format. |
-
class whatstk.whatsapp.generation.ChatGenerator(size, users=None, seed=100)
Bases: object
Generate a chat.
Parameters: - size (int) – Number of messages to generate.
- users (list, optional) – List with names of the users. Defaults to module variable USERS.
- seed (int, optional) – Seed for random processes. Defaults to 100.
Methods
generate ([filepath, hformat, last_timestamp]) |
Generate random chat as WhatsAppChat. |
Examples
This simple example generates a chat using ChatGenerator. Once generated, we can access its attribute df, which contains the generated chat as a DataFrame.
>>> from whatstk.whatsapp.generation import ChatGenerator
>>> from datetime import datetime
>>> chat = ChatGenerator(size=10).generate(last_timestamp=datetime(2020, 1, 1, 0, 0))
>>> chat.df.head(5)
                        date  username                                            message
0 2019-12-31 09:43:04.000525  Giuseppe                               Nisi ad esse cillum.
1 2019-12-31 10:19:21.980039  Giuseppe      Tempor dolore sint in eu lorem veniam veniam.
2 2019-12-31 13:56:45.575426  Giuseppe  Do quis fugiat sint ut ut, do anim eu est qui ...
3 2019-12-31 15:47:29.995420  Giuseppe  Do qui qui elit ea in sed culpa, aliqua magna ...
4 2019-12-31 16:23:00.348542      Mary  Sunt excepteur mollit voluptate dolor sint occ...
-
generate(filepath=None, hformat=None, last_timestamp=None)
Generate random chat as WhatsAppChat.
Parameters: - filepath (str) – If given, generated chat is saved with name filepath (must be a local path).
- hformat (str, optional) – Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'.
- last_timestamp (datetime, optional) – Datetime of last message. If None, defaults to current date.
Returns: WhatsAppChat – Chat with random messages.
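The roles of size, users, seed and last_timestamp can be illustrated with a small standard-library sketch. It does not reproduce ChatGenerator's actual text or time-series model: the function sketch_generate, the word list and the gap distribution are assumptions made for this example. It only shows how a seed makes the output reproducible and how timestamps can be laid out so that they end before last_timestamp.

```python
import random
from datetime import datetime, timedelta


def sketch_generate(size, users, last_timestamp, seed=100):
    """Hypothetical stand-in for chat generation (not whatstk's algorithm)."""
    rng = random.Random(seed)  # seeded generator -> reproducible output
    words = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()
    timestamp = last_timestamp
    rows = []
    for _ in range(size):
        # Walk backwards in time with a random gap between messages.
        timestamp -= timedelta(minutes=rng.randint(1, 180))
        n_words = rng.randint(3, 8)
        message = " ".join(rng.choice(words) for _ in range(n_words)).capitalize() + "."
        rows.append((timestamp, rng.choice(users), message))
    rows.reverse()  # oldest message first
    return rows


rows = sketch_generate(10, ["Mary", "John"], datetime(2020, 1, 1, 0, 0))
for date, username, message in rows[:3]:
    print(date, username, message)
```

Calling the function twice with the same seed returns identical rows, which is the property the seed parameter is meant to provide.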
-
whatstk.whatsapp.generation.generate_chats_hformats(output_path, size=2000, hformats=None, filepaths=None, last_timestamp=None, seed=100, verbose=False)
Generate a chat and export using given header format.
If no hformat is specified, the chat is generated and exported using all supported header formats.
Parameters: - output_path (str) – Path to directory to export all generated chats as txt.
- size (int, optional) – Number of messages of the chat. Defaults to 2000.
- hformats (list, optional) – List of header formats to use when exporting chat. If None, defaults to all supported header formats.
- filepaths (list, optional) – List with filepaths. If None, defaults to hformat.replace(‘ ‘, ‘_’).replace(‘/’, ‘').
- last_timestamp (datetime, optional) – Datetime of last message. If None, defaults to current date.
- seed (int, optional) – Seed for random processes. Defaults to 100.
- verbose (bool) – Set to True to print runtime messages.
whatstk.whatsapp.hformat module
Header format utils.
Example: Check if header is available.
>>> from whatstk.whatsapp.hformat import is_supported
>>> is_supported('%y-%m-%d, %H:%M:%S - %name:')
(True, True)
Functions
get_supported_hformats_as_dict () |
Get dictionary with supported formats and relevant info. |
get_supported_hformats_as_list () |
Get list of supported formats. |
is_supported (hformat) |
Check if header hformat is currently supported. |
is_supported_verbose (hformat) |
Check if header hformat is currently supported (both manually and using auto_header). |
-
whatstk.whatsapp.hformat.get_supported_hformats_as_dict()
Get dictionary with supported formats and relevant info.
Returns: dict – Dict with two elements:
- format: Header format. All formats appearing are supported.
- auto_header: 1 if auto_header is supported, 0 otherwise.
-
whatstk.whatsapp.hformat.get_supported_hformats_as_list()
Get list of supported formats.
Returns: list – List with supported formats (as str).
-
whatstk.whatsapp.hformat.is_supported(hformat)
Check if header hformat is currently supported.
Parameters: hformat (str) – Header format.
Returns: tuple –
- bool: True if header is supported.
- bool: True if header is supported with auto_header feature.
-
whatstk.whatsapp.hformat.is_supported_verbose(hformat)
Check if header hformat is currently supported (both manually and using auto_header).
Result is shown as a string.
Parameters: hformat (str) – Header format.
Returns: str – Information message.
Example
Check if format '%y-%m-%d, %H:%M - %name:' is supported.
>>> from whatstk.whatsapp.hformat import is_supported_verbose
>>> is_supported_verbose('%y-%m-%d, %H:%M - %name:')
"The header '%y-%m-%d, %H:%M - %name:' is supported. `auto_header` for this header is supported."
whatstk.whatsapp.objects module
Library WhatsApp objects.
Classes
WhatsAppChat (df) |
Load and process a WhatsApp chat file. |
-
class whatstk.whatsapp.objects.WhatsAppChat(df)
Bases: whatstk._chat.BaseChat
Load and process a WhatsApp chat file.
Parameters: df (pandas.DataFrame) – Chat.
Methods
from_source (filepath, **kwargs) |
Create an instance from a chat text file. |
from_sources (filepaths[, auto_header, …]) |
Load a WhatsAppChat instance from multiple sources. |
to_txt (filepath[, hformat]) |
Export chat to a text file. |
Example
This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                 date     username                                            message
0 2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
1 2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2 2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
3 2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
4 2016-08-06 14:30:00        Misty                                          Definetly
-
classmethod from_source(filepath, **kwargs)
Create an instance from a chat text file.
Parameters: - filepath (str) – Path to the file. It can be a local file (e.g. 'path/to/file.txt') or a URL to a hosted file (e.g. 'http://www.url.to/file.txt').
- **kwargs – Refer to the docs of df_from_txt_whatsapp for details on additional arguments.
Returns: WhatsAppChat – Class instance with loaded and parsed chat.
-
classmethod from_sources(filepaths, auto_header=None, hformat=None, encoding='utf-8')
Load a WhatsAppChat instance from multiple sources.
Parameters: - filepaths (list) – List with filepaths.
- auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.
- hformat (list, optional) – List with the header format to be used for each file. The list must be of length equal to len(filepaths). A valid header format might be '[%y-%m-%d %H:%M:%S] - %name:'.
- encoding (str) – Encoding to use when reading/writing (e.g. 'utf-8'). See the list of Python standard encodings.
Returns: WhatsAppChat – Class instance with loaded and parsed chat.
Example
Load a chat using two text files. In this example, we use sample chats (available online, see the URLs in the source code of whatstk.data).
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
                 date        username                                            message
0 2019-10-20 10:16:00            John        Laborum sed excepteur id eu cillum sunt ut.
1 2019-10-20 11:15:00            Mary  Ad aliquip reprehenderit proident est irure mo...
2 2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
3 2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
4 2019-10-20 17:28:00            John                Do ex dolor consequat tempor et ex.
-
to_txt(filepath, hformat=None)
Export chat to a text file.
Useful to export the chat to different formats (i.e. using different hformats).
Parameters: - filepath (str) – Name of the file to export (must be a local path).
- hformat (str, optional) – Header format. Defaults to '%y-%m-%d, %H:%M - %name:'.
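Exporting with a given hformat amounts to rendering a header for every message. The sketch below shows that idea with the standard library only. The helper render_header is hypothetical and not part of whatstk, and it assumes that '%y' is written as a four-digit year and that all other fields are zero-padded, which may differ from what to_txt actually writes.

```python
from datetime import datetime


def render_header(hformat, date, username):
    """Fill an hformat template with a date and a username (illustrative only)."""
    fields = {
        "%y": f"{date.year:04d}",  # assumption: four-digit year
        "%Y": f"{date.year:04d}",
        "%m": f"{date.month:02d}",
        "%d": f"{date.day:02d}",
        "%H": f"{date.hour:02d}",
        "%M": f"{date.minute:02d}",
        "%S": f"{date.second:02d}",
    }
    header = hformat
    for keyword, value in fields.items():
        header = header.replace(keyword, value)
    # Insert the username last so its characters are never treated as keywords.
    return header.replace("%name", username)


header = render_header("%y-%m-%d, %H:%M - %name:", datetime(2016, 8, 6, 13, 23), "Ash Ketchum")
print(header + " Hey guys!")  # prints: 2016-08-06, 13:23 - Ash Ketchum: Hey guys!
```

The 12-hour keywords ('%I', '%P') are left out to keep the sketch short.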
whatstk.whatsapp.parser module
Parser utils.
Functions
df_from_txt_whatsapp (filepath[, …]) |
Load chat as a DataFrame. |
generate_regex (hformat) |
Generate regular expression from hformat. |
-
whatstk.whatsapp.parser.df_from_txt_whatsapp(filepath, auto_header=True, hformat=None, encoding='utf-8')
Load chat as a DataFrame.
Parameters: - filepath (str) – Path to the file. It can be a local file (e.g. 'path/to/file.txt') or a URL to a hosted file (e.g. 'http://www.url.to/file.txt').
- auto_header (bool, optional) – Detect header automatically. If False, hformat is required.
- hformat (str, optional) – Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use the following keywords:
  - '%y': for year ('%Y' is equivalent).
  - '%m': for month.
  - '%d': for day.
  - '%H': for 24h-hour.
  - '%I': for 12h-hour.
  - '%M': for minutes.
  - '%S': for seconds.
  - '%P': for "PM"/"AM" or "p.m."/"a.m." characters.
  - '%name': for the username.
  Example 1: For the header '12/08/2016, 16:20 - username:' we have hformat='%d/%m/%y, %H:%M - %name:'.
  Example 2: For the header '2016-08-12, 4:20 PM - username:' we have hformat='%y-%m-%d, %I:%M %P - %name:'.
- encoding (str, optional) – Encoding to use when reading/writing (e.g. 'utf-8'). See the list of Python standard encodings.
Returns: pandas.DataFrame – DataFrame with the loaded and parsed chat.
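As a rough illustration of what loading a chat involves, the following standard-library sketch splits raw chat text into records for the header of Example 1 ('%d/%m/%y, %H:%M - %name:'). It is not whatstk's parser: the HEADER pattern, the parse_chat function and the sample text are assumptions for this example, the year is assumed to have four digits, and the result is a list of dicts rather than a DataFrame.

```python
import re
from datetime import datetime

# Hand-written pattern for the header '%d/%m/%y, %H:%M - %name:' (assumption:
# four-digit year). MULTILINE lets '^' match at the start of every line.
HEADER = re.compile(
    r"^(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{4}), "
    r"(?P<hour>\d{1,2}):(?P<minutes>\d{2}) - (?P<username>[^:]*): ",
    flags=re.MULTILINE,
)


def parse_chat(text):
    """Split chat text into date/username/message records (illustrative only)."""
    headers = list(HEADER.finditer(text))
    rows = []
    for i, head in enumerate(headers):
        # A message runs until the next header, so multi-line messages survive.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        date = datetime(
            int(head["year"]), int(head["month"]), int(head["day"]),
            int(head["hour"]), int(head["minutes"]),
        )
        rows.append({
            "date": date,
            "username": head["username"],
            "message": text[head.end():end].strip(),
        })
    return rows


# Illustrative sample text (made up for this sketch).
sample = (
    "12/08/2016, 16:20 - Ash: Hey guys!\n"
    "12/08/2016, 16:25 - Brock: Hi Ash,\nsecond line\n"
)
rows = parse_chat(sample)
print(rows[0]["username"], rows[0]["date"])  # prints: Ash 2016-08-12 16:20:00
```

Records shaped like this map directly onto the date, username and message columns shown in the examples of this package.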
-
whatstk.whatsapp.parser.generate_regex(hformat)
Generate regular expression from hformat.
Parameters: hformat (str) – Simplified syntax for the header, e.g. '%y-%m-%d, %H:%M:%S - %name:'.
Returns: tuple – Regular expressions (str) corresponding to the specified syntax. As the example shows, the first includes the username group and the second covers only the date and time part.
Example
Generate regular expression corresponding to hformat='%y-%m-%d, %H:%M:%S - %name:'.
>>> from whatstk.whatsapp.parser import generate_regex
>>> generate_regex('%y-%m-%d, %H:%M:%S - %name:')
('(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - (?P<username>[^:]*): ', '(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - ')
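The translation from hformat keywords to named groups can be sketched with the standard library. The code below is not whatstk's implementation: sketch_regex and the DIRECTIVES table are hypothetical, the group patterns are copied from the example output above, and the 12-hour keywords are omitted. Literal characters are passed through re.escape, so the resulting pattern string is not character-for-character identical to the one generate_regex returns.

```python
import re

# Keyword -> named group, mirroring the groups visible in the example output.
DIRECTIVES = {
    "%name": r"(?P<username>[^:]*)",
    "%y": r"(?P<year>\d{2,4})",
    "%Y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minutes>\d{2})",
    "%S": r"(?P<seconds>\d{2})",
}

# Longest keywords first, so '%name' is tried before the two-character ones.
TOKEN = re.compile("|".join(re.escape(k) for k in sorted(DIRECTIVES, key=len, reverse=True)))


def sketch_regex(hformat):
    """Build a regex string from an hformat (illustrative only)."""
    parts = []
    position = 0
    for token in TOKEN.finditer(hformat):
        # Escape the literal text between keywords, e.g. '[' or '.'.
        parts.append(re.escape(hformat[position:token.start()]))
        parts.append(DIRECTIVES[token.group()])
        position = token.end()
    parts.append(re.escape(hformat[position:]))
    return "".join(parts)


pattern = sketch_regex("%y-%m-%d, %H:%M:%S - %name:")
match = re.match(pattern, "2016-08-12, 16:20:05 - Ash Ketchum: Hi")
print(match.group("username"))  # prints: Ash Ketchum
```

Named groups keep the later conversion to dates independent of the order in which day, month and year appear in the header.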
Module contents
WhatsApp parser.