whatstk.whatsapp
WhatsApp parser.
whatstk.whatsapp.objects
Library WhatsApp objects.
Classes
WhatsAppChat(df) | Load and process a WhatsApp chat file.
-
class whatstk.whatsapp.objects.WhatsAppChat(df)
Bases: whatstk._chat.BaseChat
Load and process a WhatsApp chat file.
Parameters: df (pandas.DataFrame) – Chat.
Attributes
- df: Chat as DataFrame.
- end_date: Chat end date.
- start_date: Chat starting date.
- users: List with users.
Methods
- from_source(filepath, **kwargs): Create an instance from a chat text file.
- from_sources(filepaths[, auto_header, …]): Load a WhatsAppChat instance from multiple sources.
- merge(chat[, rename_users]): Merge current instance with chat.
- rename_users(mapping): Rename users.
- to_csv(filepath): Save chat as csv.
- to_txt(filepath[, hformat]): Export chat to a text file.
Example
This simple example loads a chat using WhatsAppChat. Once loaded, we can access its attribute df, which contains the loaded chat as a DataFrame.
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> chat.df.head(5)
                        username                                            message
date
2016-08-06 13:23:00  Ash Ketchum                                          Hey guys!
2016-08-06 13:25:00        Brock              Hey Ash, good to have a common group!
2016-08-06 13:30:00        Misty  Hey guys! Long time haven't heard anything fro...
2016-08-06 13:45:00  Ash Ketchum  Indeed. I think having a whatsapp group nowada...
2016-08-06 14:30:00        Misty                                          Definetly
-
property df
Chat as DataFrame.
Returns: pandas.DataFrame
-
property end_date
Chat end date.
Returns: datetime
-
classmethod from_source(filepath, **kwargs)
Create an instance from a chat text file.
Parameters:
- filepath (str) – Path to the file. It can be a local file (e.g. ‘path/to/file.txt’) or a URL to a hosted file (e.g. ‘http://www.url.to/file.txt’).
- **kwargs – Refer to the docs of df_from_txt_whatsapp for details on additional arguments.
Returns: WhatsAppChat – Class instance with loaded and parsed chat.
-
classmethod from_sources(filepaths, auto_header=None, hformat=None, encoding='utf-8')
Load a WhatsAppChat instance from multiple sources.
Parameters:
- filepaths (list) – List with filepaths.
- auto_header (bool, optional) – Detect header automatically (applies to all files). If None, attempts to perform automatic header detection for all files. If False, hformat is required.
- hformat (list, optional) – List with the header format to be used for each file. The list must be of length equal to len(filepaths). A valid header format might be ‘[%y-%m-%d %H:%M:%S] - %name:’.
- encoding (str) – Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.
Returns: WhatsAppChat – Class instance with loaded and parsed chat.
Example
Load a chat using two text files. In this example, we use sample chats (available online, see URLs in source code whatstk.data).
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat = WhatsAppChat.from_sources(filepaths=[filepath_1, filepath_2])
>>> chat.df.head(5)
                           username                                            message
date
2019-10-20 10:16:00            John        Laborum sed excepteur id eu cillum sunt ut.
2019-10-20 11:15:00            Mary  Ad aliquip reprehenderit proident est irure mo...
2019-10-20 12:16:00  +1 123 456 789  Nostrud adipiscing ex enim reprehenderit minim...
2019-10-20 12:57:00  +1 123 456 789  Deserunt proident laborum exercitation ex temp...
2019-10-20 17:28:00            John                Do ex dolor consequat tempor et ex.
-
merge(chat, rename_users=None)
Merge current instance with chat.
Parameters:
- chat (WhatsAppChat) – Another chat.
- rename_users (dict) – Dictionary mapping old names to new names. Example: {‘John’: [‘Jon’, ‘J’], ‘Ray’: [‘Raymond’]} will map ‘Jon’ and ‘J’ to ‘John’, and ‘Raymond’ to ‘Ray’. Note that old names must come as a list (even if there is only one).
Returns: WhatsAppChat – Merged chat.
Example
Merging two chats comes in handy when you have exported the same chat from your phone at different times, so that each exported file may contain messages that the other lacks.
In this example, however, we merge files from different chats.
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.LOREM1
>>> filepath_2 = whatsapp_urls.LOREM2
>>> chat_1 = WhatsAppChat.from_source(filepath=filepath_1)
>>> chat_2 = WhatsAppChat.from_source(filepath=filepath_2)
>>> chat = chat_1.merge(chat_2)
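To illustrate why merging two exports of the same chat is useful, here is a minimal standard-library sketch. It treats each message as a plain (date, username, text) tuple and uses a hypothetical helper, merge_messages, to take the union of two exports, drop exact duplicates and sort by date. This is an assumption-laden illustration only, not how merge is implemented in whatstk.

```python
from datetime import datetime


def merge_messages(messages_a, messages_b):
    """Hypothetical helper: union of two exports, de-duplicated, sorted by date."""
    # Messages are (date, username, text) tuples, so a set drops exact duplicates.
    unique = set(messages_a) | set(messages_b)
    # Tuples sort by their first element (the date) first.
    return sorted(unique)


# Two made-up exports of the same chat that overlap in one message.
export_1 = [
    (datetime(2019, 10, 20, 10, 16), "John", "Hello"),
    (datetime(2019, 10, 20, 11, 15), "Mary", "Hi John"),
]
export_2 = [
    (datetime(2019, 10, 20, 11, 15), "Mary", "Hi John"),  # also in export_1
    (datetime(2019, 10, 20, 12, 16), "John", "How are you?"),
]

merged = merge_messages(export_1, export_2)
for date, username, text in merged:
    print(date, username, text)
```

The overlapping message appears only once in the merged result, and the messages that are unique to each export are both kept.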
-
rename_users(mapping)
Rename users.
This might be needed on multiple occasions:
- Fix typos in user names stored in the phone.
- If a user appears multiple times with different usernames, group these under the same name (this might happen when multiple chats are merged).
Parameters: mapping (dict) – Dictionary mapping old names to new names. Example: {‘John’: [‘Jon’, ‘J’], ‘Ray’: [‘Raymond’]} will map ‘Jon’ and ‘J’ to ‘John’, and ‘Raymond’ to ‘Ray’. Note that old names must come as a list (even if there is only one).
Returns: pandas.DataFrame – DataFrame with users renamed according to mapping.
Raises: ValueError – Raised if mapping is not correct.
Examples
Load LOREM2 chat and rename users Maria and Maria2 to Mary.
>>> from whatstk.whatsapp.objects import WhatsAppChat
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM2)
>>> chat.users
['+1 123 456 789', 'Giuseppe', 'John', 'Maria', 'Maria2']
>>> chat = chat.rename_users(mapping={'Mary': ['Maria', 'Maria2']})
>>> chat.users
['+1 123 456 789', 'Giuseppe', 'John', 'Mary']
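The mapping format (new name as key, list of old names as value) can be easier to follow once it is inverted into a plain old-to-new lookup. The sketch below does this with two hypothetical helpers, invert_mapping and rename. It works on a list of usernames rather than a chat, and the validation rule is an assumption about what an incorrect mapping might look like, not the library's actual check.

```python
def invert_mapping(mapping):
    """Hypothetical helper: turn {new: [old, ...]} into {old: new}."""
    # Assumed validation: old names must be given as a list, even a single one.
    if not all(isinstance(olds, list) for olds in mapping.values()):
        raise ValueError("Old names must be given as a list.")
    return {old: new for new, olds in mapping.items() for old in olds}


def rename(usernames, mapping):
    """Hypothetical helper: apply the mapping, leaving unmapped names untouched."""
    lookup = invert_mapping(mapping)
    return [lookup.get(name, name) for name in usernames]


users = ["+1 123 456 789", "Giuseppe", "John", "Maria", "Maria2"]
renamed = rename(users, {"Mary": ["Maria", "Maria2"]})
unique_users = sorted(set(renamed))
print(unique_users)
```

With the mapping from the example above, 'Maria' and 'Maria2' collapse into a single user 'Mary'.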
-
property start_date
Chat starting date.
Returns: datetime
-
to_csv(filepath)
Save chat as csv.
Parameters: filepath (str) – Name of file.
-
to_txt(filepath, hformat=None)
Export chat to a text file.
Useful to export the chat to different formats (e.g. using different hformats).
Parameters:
- filepath (str) – Name of the file to export (must be a local path).
- hformat (str, optional) – Header format. Defaults to ‘%y-%m-%d, %H:%M - %name:’.
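To give a feel for what a header format produces on export, the sketch below renders a single message header from an hformat string using only the standard library. The helper render_header is hypothetical, and the choices it makes (four-digit years for '%y', zero-padded fields, "AM"/"PM" for '%P') are assumptions for illustration, not a description of what to_txt writes.

```python
import re
from datetime import datetime

# Matches the header keywords; '%name' is tried before the single-letter ones.
_TOKEN = re.compile(r"%name|%[yYmdHIMSP]")


def render_header(hformat, timestamp, username):
    """Hypothetical helper: fill an hformat string with a timestamp and a username."""
    values = {
        "%y": timestamp.strftime("%Y"),  # assumption: '%y' renders a 4-digit year
        "%Y": timestamp.strftime("%Y"),
        "%m": timestamp.strftime("%m"),
        "%d": timestamp.strftime("%d"),
        "%H": timestamp.strftime("%H"),
        "%I": timestamp.strftime("%I"),
        "%M": timestamp.strftime("%M"),
        "%S": timestamp.strftime("%S"),
        "%P": "PM" if timestamp.hour >= 12 else "AM",
        "%name": username,
    }
    # A callable replacement avoids any escape processing of the inserted text.
    return _TOKEN.sub(lambda m: values[m.group(0)], hformat)


ts = datetime(2016, 8, 6, 13, 23)
default_header = render_header("%y-%m-%d, %H:%M - %name:", ts, "Ash Ketchum")
bracket_header = render_header("[%y-%m-%d %H:%M:%S] - %name:", ts, "Ash Ketchum")
print(default_header)
print(bracket_header)
```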
-
property users
List with users.
Returns: list
whatstk.whatsapp.parser
Parser utils.
Functions
df_from_txt_whatsapp(filepath[, …]) | Load chat as a DataFrame.
generate_regex(hformat) | Generate regular expression from hformat.
-
whatstk.whatsapp.parser.df_from_txt_whatsapp(filepath, auto_header=True, hformat=None, encoding='utf-8')
Load chat as a DataFrame.
Parameters:
- filepath (str) – Path to the file. It can be a local file (e.g. ‘path/to/file.txt’) or a URL to a hosted file (e.g. ‘http://www.url.to/file.txt’).
- auto_header (bool, optional) – Detect header automatically. If False, hformat is required.
- hformat (str, optional) – Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'. Use the following keywords:
  - '%y': for year ('%Y' is equivalent).
  - '%m': for month.
  - '%d': for day.
  - '%H': for 24h-hour.
  - '%I': for 12h-hour.
  - '%M': for minutes.
  - '%S': for seconds.
  - '%P': for “PM”/“AM” or “p.m.”/“a.m.” characters.
  - '%name': for the username.
  Example 1: For the header ‘12/08/2016, 16:20 - username:’ we have hformat='%d/%m/%y, %H:%M - %name:'.
  Example 2: For the header ‘2016-08-12, 4:20 PM - username:’ we have hformat='%y-%m-%d, %I:%M %P - %name:'.
- encoding (str, optional) – Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.
Returns: pandas.DataFrame – Loaded and parsed chat.
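The hformat keywords can be read as placeholders for regular-expression groups. The sketch below translates an hformat string into a regex and checks it against the two example headers given above. The helper sketch_header_regex, the pattern table and the group names (in particular 'ampm') are assumptions made for illustration. This is not the library's generate_regex.

```python
import re

# Assumed keyword-to-pattern table (illustrative, not taken from whatstk).
_PATTERNS = {
    "%y": r"(?P<year>\d{2,4})",
    "%Y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%I": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minutes>\d{2})",
    "%S": r"(?P<seconds>\d{2})",
    "%P": r"(?P<ampm>[AaPp]\.?\s?[Mm]\.?)",
    "%name": r"(?P<username>[^:]*)",
}
# The capturing group makes re.split keep the keywords in its output.
_TOKEN = re.compile(r"(%name|%[yYmdHIMSP])")


def sketch_header_regex(hformat):
    """Hypothetical helper: keywords become groups, everything else is escaped."""
    parts = _TOKEN.split(hformat)
    return "".join(_PATTERNS[p] if p in _PATTERNS else re.escape(p) for p in parts)


# Example 1 and Example 2 from the hformat description.
m1 = re.match(sketch_header_regex("%d/%m/%y, %H:%M - %name:"),
              "12/08/2016, 16:20 - username:")
m2 = re.match(sketch_header_regex("%y-%m-%d, %I:%M %P - %name:"),
              "2016-08-12, 4:20 PM - username:")
print(m1.groupdict())
print(m2.groupdict())
```

Literal characters such as brackets, dots and slashes are escaped so that a format like '[%y-%m-%d %H:%M:%S] - %name:' also yields a valid pattern.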
-
whatstk.whatsapp.parser.generate_regex(hformat)
Generate regular expression from hformat.
Parameters: hformat (str) – Simplified syntax for the header, e.g. '%y-%m-%d, %H:%M:%S - %name:'.
Returns: str – Regular expression corresponding to the specified syntax.
Example
Generate regular expression corresponding to hformat='%y-%m-%d, %H:%M:%S - %name:'.
>>> from whatstk.whatsapp.parser import generate_regex
>>> generate_regex('%y-%m-%d, %H:%M:%S - %name:')
('(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - (?P<username>[^:]*): ', '(?P<year>\\d{2,4})-(?P<month>\\d{1,2})-(?P<day>\\d{1,2}), (?P<hour>\\d{1,2}):(?P<minutes>\\d{2}):(?P<seconds>\\d{2}) - ')
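To see what such a regular expression is good for, the sketch below applies the first pattern from the output above to a single chat line using Python's re module. The sample line is made up for illustration; only the pattern itself comes from the example.

```python
import re

# First element of the tuple shown in the generate_regex example above.
header_regex = (
    r"(?P<year>\d{2,4})-(?P<month>\d{1,2})-(?P<day>\d{1,2}), "
    r"(?P<hour>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2}) - "
    r"(?P<username>[^:]*): "
)

# Made-up chat line following the '%y-%m-%d, %H:%M:%S - %name:' header format.
line = "2016-08-06, 13:23:00 - Ash Ketchum: Hey guys!"

match = re.match(header_regex, line)
fields = match.groupdict()     # named groups: date parts and username
message = line[match.end():]   # everything after the header is the message
print(fields)
print(message)
```

The named groups give direct access to the date components and the username, while the remainder of the line is the message text.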
whatstk.whatsapp.auto_header
Detect header from chat.
Functions
extract_header_from_text(text[, encoding]) | Extract header from text.
-
whatstk.whatsapp.auto_header.extract_header_from_text(text, encoding='utf-8')
Extract header from text.
Parameters:
- text (str) – Loaded chat as string (whole text).
- encoding (str) – Encoding to use for UTF when reading/writing (ex. ‘utf-8’). List of Python standard encodings.
Returns: str – Format extracted. None if no header was extracted.
Example
Extract the header format from a sample chat (available online, see URLs in source code whatstk.data).
>>> from whatstk.whatsapp.auto_header import extract_header_from_text
>>> from urllib.request import urlopen
>>> from whatstk.data import whatsapp_urls
>>> filepath_1 = whatsapp_urls.POKEMON
>>> with urlopen(filepath_1) as f:
...     text = f.read().decode('utf-8')
>>> extract_header_from_text(text)
'%d.%m.%y, %H:%M - %name:'
whatstk.whatsapp.generation
Automatic generation of chat using Lorem Ipsum text and time series statistics.
Classes
ChatGenerator(size[, users, seed]) | Generate a chat.
Functions
generate_chats_hformats(output_path[, size, …]) | Generate a chat and export using given header format.
-
class whatstk.whatsapp.generation.ChatGenerator(size, users=None, seed=100)
Bases: object
Generate a chat.
Parameters:
- size (int) – Number of messages to generate.
- users (list, optional) – List with names of the users. Defaults to module variable USERS.
- seed (int, optional) – Seed for random processes. Defaults to 100.
Methods
- generate([filepath, hformat, last_timestamp]): Generate random chat as WhatsAppChat.
Examples
This simple example generates a chat with 10 messages using ChatGenerator. Once generated, we can access its attribute df, which contains the generated chat as a DataFrame.
>>> from whatstk.whatsapp.generation import ChatGenerator
>>> from datetime import datetime
>>> chat = ChatGenerator(size=10).generate(last_timestamp=datetime(2020, 1, 1, 0, 0))
>>> chat.df.head(5)
                                  username                                            message
date
2019-12-31 09:43:04.000525            John          Quis labore laboris proident et deserunt.
2019-12-31 10:19:21.980039  +1 123 456 789               Non ullamco esse nulla voluptate. 🇩🇰
2019-12-31 13:56:45.575426            John  Duis non ut officia, enim enim qui cupidatat a...
2019-12-31 15:47:29.995420        Giuseppe              Non ut nulla laboris nostrud aute. 🏊🏻
2019-12-31 16:23:00.348542            John                      Tempor irure in velit tempor.
-
generate(filepath=None, hformat=None, last_timestamp=None)
Generate random chat as WhatsAppChat.
Parameters:
- filepath (str) – If given, generated chat is saved with name filepath (must be a local path).
- hformat (str, optional) – Format of the header, e.g. '[%y-%m-%d %H:%M:%S] - %name:'.
- last_timestamp (datetime, optional) – Datetime of last message. If None, defaults to current date.
Returns: WhatsAppChat – Chat with random messages.
-
whatstk.whatsapp.generation.generate_chats_hformats(output_path, size=2000, hformats=None, filepaths=None, last_timestamp=None, seed=100, verbose=False)
Generate a chat and export using given header format.
If no hformat is specified, the chat is generated and exported using all supported header formats.
Parameters: - output_path (str) – Path to directory to export all generated chats as txt.
- size (int, optional) – Number of messages of the chat. Defaults to 2000.
- hformats (list, optional) – List of header formats to use when exporting chat. If None, defaults to all supported header formats.
- filepaths (list, optional) – List with filepaths. If None, defaults to hformat.replace(‘ ‘, ‘_’).replace(‘/’, ‘').
- last_timestamp (datetime, optional) – Datetime of last message. If None, defaults to current date.
- seed (int, optional) – Seed for random processes. Defaults to 100.
- verbose (bool) – Set to True to print runtime messages.
whatstk.whatsapp.hformat
Header format utils.
Example: Check if a header format is supported.
>>> from whatstk.whatsapp.hformat import is_supported
>>> is_supported('%y-%m-%d, %H:%M:%S - %name:')
(True, True)
Functions
get_supported_hformats_as_dict() | Get dictionary with supported formats and relevant info.
get_supported_hformats_as_list() | Get list of supported formats.
is_supported(hformat) | Check if header hformat is currently supported.
is_supported_verbose(hformat) | Check if header hformat is currently supported (both manually and using auto_header).
-
whatstk.whatsapp.hformat.get_supported_hformats_as_dict()
Get dictionary with supported formats and relevant info.
Returns: dict – Dict with two elements:
- format: Header format. All formats appearing are supported.
- auto_header: 1 if auto_header is supported, 0 otherwise.
-
whatstk.whatsapp.hformat.get_supported_hformats_as_list()
Get list of supported formats.
Returns: list – List with supported formats (as str).
-
whatstk.whatsapp.hformat.is_supported(hformat)
Check if header hformat is currently supported.
Parameters: hformat (str) – Header format.
Returns: tuple –
- bool: True if header is supported.
- bool: True if header is supported with auto_header feature.
-
whatstk.whatsapp.hformat.is_supported_verbose(hformat)
Check if header hformat is currently supported (both manually and using auto_header).
Result is shown as a string.
Parameters: hformat (str) – Header format.
Returns: str – Information message.
Example
Check if format '%y-%m-%d, %H:%M - %name:' is supported.
>>> from whatstk.whatsapp.hformat import is_supported_verbose
>>> is_supported_verbose('%y-%m-%d, %H:%M - %name:')
"The header '%y-%m-%d, %H:%M - %name:' is supported. `auto_header` for this header is supported."