whatstk.analysis
Analysis tools.
Functions:
get_interventions_count – Get number of interventions per user per unit of time.
get_response_matrix – Get response matrix for given chat.
- whatstk.analysis.get_interventions_count(df: DataFrame = None, chat: BaseChat = None, date_mode: str = 'date', msg_length: bool = False, cumulative: bool = False, all_users: bool = False) DataFrame
Get number of interventions per user per unit of time.
The unit of time can be chosen by means of the argument date_mode.
Note: Either df or chat must be provided.
- Parameters:
df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.
chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.
date_mode (str, optional) – Mode to group interventions by. Defaults to 'date'. Available modes are:
    'date': Grouped by particular date (year, month and day).
    'hour': Grouped by hour of the day (24 hours).
    'month': Grouped by month (12 months).
    'weekday': Grouped by weekday (i.e. Monday, Tuesday, …, Sunday).
    'hourweekday': Grouped by weekday and hour.
msg_length (bool, optional) – Set to True to count the number of characters instead of the number of messages sent. Defaults to False.
cumulative (bool, optional) – Set to True to obtain cumulative counts. Defaults to False.
all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.
- Returns:
pandas.DataFrame – DataFrame with shape NxU, where N: number of time-slots and U: number of users.
- Raises:
ValueError – if the date_mode value is not supported.
Example
Get the number of interventions per user from the POKEMON chat. The counts are represented as an NxU matrix, where N: number of time-slots and U: number of users.
>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_interventions_count
>>> from whatstk.data import whatsapp_urls
>>> filepath = whatsapp_urls.POKEMON
>>> chat = WhatsAppChat.from_source(filepath)
>>> counts = get_interventions_count(chat=chat, date_mode='date', msg_length=False)
>>> counts.head(5)
username    Ash Ketchum  Brock  Jessie & James  ...  Prof. Oak  Raichu  Wobbuffet
date                                            ...
2016-08-06            2      2               0  ...          0       0          0
2016-08-07            1      1               0  ...          1       0          0
2016-08-10            1      0               1  ...          0       2          0
2016-08-11            0      0               0  ...          0       0          0
2016-09-11            0      0               0  ...          0       0          0

[5 rows x 8 columns]
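The grouping selected by date_mode can be illustrated without the library. The following is a minimal standard-library sketch, not whatstk's implementation: the function interventions_count, the KEY_FUNCS table and the sample messages are all hypothetical and exist only to show how each mode maps a timestamp to a time-slot, and how msg_length switches from counting messages to counting characters.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from each date_mode to a time-slot key (illustration only).
KEY_FUNCS = {
    "date": lambda ts: ts.date(),                        # year, month and day
    "hour": lambda ts: ts.hour,                          # 0..23
    "month": lambda ts: ts.month,                        # 1..12
    "weekday": lambda ts: ts.weekday(),                  # 0 = Monday, 6 = Sunday
    "hourweekday": lambda ts: (ts.weekday(), ts.hour),   # weekday and hour
}


def interventions_count(messages, date_mode="date", msg_length=False):
    """Count messages (or characters) per time-slot and per user.

    `messages` is an iterable of (timestamp, username, text) tuples.
    Returns a dict of the form {time_slot: {username: count}}.
    """
    if date_mode not in KEY_FUNCS:
        raise ValueError(f"Unsupported date_mode: {date_mode!r}")
    key = KEY_FUNCS[date_mode]
    counts = defaultdict(lambda: defaultdict(int))
    for ts, user, text in messages:
        # Add the message length when msg_length is set, otherwise add one message.
        counts[key(ts)][user] += len(text) if msg_length else 1
    return {slot: dict(users) for slot, users in counts.items()}


# Made-up sample data, unrelated to the POKEMON chat.
messages = [
    (datetime(2016, 8, 6, 13, 23), "alice", "Hey guys!"),
    (datetime(2016, 8, 6, 13, 25), "bob", "Hey"),
    (datetime(2016, 8, 7, 9, 0), "alice", "Morning"),
]
by_date = interventions_count(messages, date_mode="date")
by_hour = interventions_count(messages, date_mode="hour")
by_chars = interventions_count(messages, date_mode="date", msg_length=True)
```

In this sketch an unsupported mode raises ValueError, mirroring the behaviour documented under Raises. The real function returns a pandas.DataFrame rather than nested dictionaries.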
- whatstk.analysis.get_response_matrix(df: DataFrame | None = None, chat: BaseChat | None = None, zero_own: bool = True, norm: str = 'absolute') DataFrame
Get response matrix for given chat.
Obtains a DataFrame of shape [n_users, n_users] counting the number of responses between members. Responses can be counted in different ways, e.g. using absolute values or normalised values. Responses are counted based solely on consecutive messages. That is, if \(user_i\) sends a message right after \(user_j\), it will be counted as a response from \(user_i\) to \(user_j\).
Axis 0 lists senders and axis 1 lists receivers. That is, the value in cell (i, j) denotes the number of times \(user_i\) responded to a message from \(user_j\).
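The consecutive-message rule described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: count_responses and the sample sender list are hypothetical, and the result is a Counter keyed by (responder, original sender) rather than a DataFrame.

```python
from collections import Counter


def count_responses(senders, zero_own=True):
    """Count responses based solely on consecutive messages.

    `senders` is the chronological list of message authors. The returned
    Counter maps (user_i, user_j) to the number of times user_i sent a
    message right after user_j.
    """
    counts = Counter()
    for previous, current in zip(senders, senders[1:]):
        if zero_own and previous == current:
            continue  # skip a user following up on their own message
        counts[(current, previous)] += 1
    return counts


# Made-up message order, for illustration only.
senders = ["alice", "bob", "alice", "alice", "carol"]
responses = count_responses(senders)
with_own = count_responses(senders, zero_own=False)
```

Here responses[("bob", "alice")] corresponds to cell (i, j) with bob as the responder on axis 0 and alice as the receiver on axis 1.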
Note: Either df or chat must be provided.
- Parameters:
df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.
chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.
zero_own (bool, optional) – Set to True to avoid counting own responses. Defaults to True.
norm (str, optional) – Specifies the type of normalization used for the response count. Defaults to 'absolute'. Can be:
    'absolute': Absolute count of messages.
    'joint': Normalized by total number of messages sent by all users.
    'sender': Normalized per sender by total number of messages sent by user.
    'receiver': Normalized per receiver by total number of messages sent by user.
- Returns:
pandas.DataFrame – Response matrix.
Example
Get the absolute count of responses (consecutive messages) between users.
>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_response_matrix
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> responses = get_response_matrix(chat=chat)
>>> responses
                Ash Ketchum  Brock  ...  Raichu  Wobbuffet
Ash Ketchum               0      0  ...       1          0
Brock                     1      0  ...       0          0
Jessie & James            0      1  ...       0          0
Meowth                    0      0  ...       0          0
Misty                     2      1  ...       1          0
Prof. Oak                 0      1  ...       0          0
Raichu                    1      0  ...       0          0
Wobbuffet                 0      0  ...       0          0
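The norm options can be read as different choices of denominator. The sketch below shows one plausible interpretation and is an assumption, not confirmed by the text above: 'joint' divides by the sum of the whole matrix, 'sender' divides each row by its row total, and 'receiver' divides each column by its column total. The function normalize and the sample matrix are hypothetical; empty rows or columns are mapped to 0.0 here, which may differ from what the library returns.

```python
def normalize(matrix, norm="absolute"):
    """Normalize a response matrix given as nested dicts.

    matrix[i][j] is the number of times user i responded to user j.
    Assumed denominators: whole-matrix sum ('joint'), row sum ('sender'),
    column sum ('receiver').
    """
    users = list(matrix)

    def safe_div(value, total):
        # Assumption: report 0.0 when there is nothing to normalize by.
        return value / total if total else 0.0

    if norm == "absolute":
        return {i: dict(matrix[i]) for i in users}
    if norm == "joint":
        total = sum(sum(row.values()) for row in matrix.values())
        return {i: {j: safe_div(matrix[i][j], total) for j in users} for i in users}
    if norm == "sender":
        row_totals = {i: sum(matrix[i].values()) for i in users}
        return {i: {j: safe_div(matrix[i][j], row_totals[i]) for j in users} for i in users}
    if norm == "receiver":
        col_totals = {j: sum(matrix[i][j] for i in users) for j in users}
        return {i: {j: safe_div(matrix[i][j], col_totals[j]) for j in users} for i in users}
    raise ValueError(f"Unsupported norm: {norm!r}")


# Made-up absolute counts, for illustration only.
absolute = {
    "alice": {"alice": 0, "bob": 2, "carol": 2},
    "bob": {"alice": 1, "bob": 0, "carol": 0},
    "carol": {"alice": 3, "bob": 0, "carol": 0},
}
joint = normalize(absolute, "joint")
per_sender = normalize(absolute, "sender")
per_receiver = normalize(absolute, "receiver")
```

Under this reading, every non-empty row of the 'sender' result sums to 1, every non-empty column of the 'receiver' result sums to 1, and the whole 'joint' matrix sums to 1.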