whatstk.analysis package
Submodules
whatstk.analysis.interventions module
Base analysis tools.
Functions
get_interventions_count([df, chat, …]) – Get number of interventions per user per unit of time.
whatstk.analysis.interventions.get_interventions_count(df=None, chat=None, date_mode='date', msg_length=False, cumulative=False, all_users=False, cummulative=None)
Get number of interventions per user per unit of time.
The unit of time can be chosen by means of the argument date_mode.
Note: Either df or chat must be provided.
Parameters:
- df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.
- chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.
- date_mode (str, optional) – Mode to group interventions by. Defaults to 'date'. Available modes are:
  - 'date': Grouped by particular date (year, month and day).
  - 'hour': Grouped by hour of the day (24 hours).
  - 'month': Grouped by month (12 months).
  - 'weekday': Grouped by weekday (i.e. Monday, Tuesday, …, Sunday).
  - 'hourweekday': Grouped by weekday and hour.
- msg_length (bool, optional) – Set to True to count the number of characters instead of the number of messages sent.
- cumulative (bool, optional) – Set to True to obtain cumulative counts.
- all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.
- cummulative (bool, optional) – Deprecated, use cumulative.
Returns: pandas.DataFrame – DataFrame with shape NxU, where N: number of time-slots and U: number of users.
Raises: ValueError – if the date_mode value is not supported.
Example
Get number of interventions per user from POKEMON chat. The counts are represented as a NxU matrix, where N: number of time-slots and U: number of users.
>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_interventions_count
>>> from whatstk.data import whatsapp_urls
>>> filepath = whatsapp_urls.POKEMON
>>> chat = WhatsAppChat.from_source(filepath)
>>> counts = get_interventions_count(chat=chat, date_mode='date', msg_length=False)
>>> counts.head(5)
username    Ash Ketchum  Brock  Jessie & James  ...  Prof. Oak  Raichu  Wobbuffet
date                                            ...
2016-08-06            2      2               0  ...          0       0          0
2016-08-07            1      1               0  ...          1       0          0
2016-08-10            1      0               1  ...          0       2          0
2016-08-11            0      0               0  ...          0       0          0
2016-09-11            0      0               0  ...          0       0          0
[5 rows x 8 columns]
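To make the grouping options concrete, the following is a minimal pandas-only sketch of how such a count table could be built. It is not the library's implementation: the helper name count_interventions, the toy data and the column names 'date', 'username' and 'message' are assumptions made for illustration, and only four of the documented modes are covered.

```python
import pandas as pd

# Toy chat data; the column names 'date', 'username' and 'message' are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-08-06 13:23", "2016-08-06 13:25", "2016-08-06 18:01", "2016-08-07 09:10",
    ]),
    "username": ["Ash Ketchum", "Brock", "Ash Ketchum", "Brock"],
    "message": ["Hey!", "Hi Ash", "Anyone around?", "Morning"],
})


def count_interventions(df, date_mode="date", msg_length=False, cumulative=False):
    """Hypothetical re-implementation of the counting logic, for illustration only."""
    # Map each supported mode to a grouping key derived from the timestamp.
    keys = {
        "date": df["date"].dt.date,
        "hour": df["date"].dt.hour,
        "month": df["date"].dt.month,
        "weekday": df["date"].dt.weekday,
    }
    if date_mode not in keys:
        raise ValueError(f"Unsupported date_mode: {date_mode!r}")
    # Each message contributes 1, or its character count when msg_length=True.
    values = df["message"].str.len() if msg_length else pd.Series(1, index=df.index)
    counts = (
        values.groupby([keys[date_mode].rename(date_mode), df["username"]])
        .sum()
        .unstack(fill_value=0)  # one row per time slot, one column per user
    )
    # Cumulative counts are a running sum over the time slots.
    return counts.cumsum() if cumulative else counts


counts = count_interventions(df, date_mode="date")
```

The result has one row per time slot and one column per user, which matches the NxU shape described above.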
whatstk.analysis.responses module
Get information regarding responses between users.
Classes
Norms(ABSOLUTE, JOINT, SENDER, RECEIVER)
Functions
get_response_matrix([df, chat, zero_own, norm]) – Get response matrix for given chat.
class whatstk.analysis.responses.Norms(ABSOLUTE, JOINT, SENDER, RECEIVER)
Bases: tuple
Attributes
- ABSOLUTE (property): Alias for field number 0
- JOINT (property): Alias for field number 1
- SENDER (property): Alias for field number 2
- RECEIVER (property): Alias for field number 3
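The "Bases: tuple" and "Alias for field number" entries are what Sphinx prints for a collections.namedtuple. As a rough sketch of what such a definition might look like (the instance name NORMS and the string values are assumptions, taken from the options listed for the norm argument of get_response_matrix):

```python
from collections import namedtuple

# Field order follows the documented aliases (0: ABSOLUTE, 1: JOINT, 2: SENDER, 3: RECEIVER).
Norms = namedtuple("Norms", ["ABSOLUTE", "JOINT", "SENDER", "RECEIVER"])

# Assumption: the instance holds the strings accepted by the `norm` argument.
NORMS = Norms(ABSOLUTE="absolute", JOINT="joint", SENDER="sender", RECEIVER="receiver")

first = NORMS[0]        # positional access to field number 0
same = NORMS.ABSOLUTE   # attribute access to the same field
```

Attribute access and positional access return the same value, which is what "alias for field number 0" refers to.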
whatstk.analysis.responses.get_response_matrix(df=None, chat=None, zero_own=True, norm='absolute')
Get response matrix for given chat.
Obtains a DataFrame of shape [n_users, n_users] counting the number of responses between members. Responses can be counted in different ways, e.g. using absolute values or normalised values. Responses are counted based solely on consecutive messages. That is, if \(user_i\) sends a message right after \(user_j\), it will be counted as a response from \(user_i\) to \(user_j\).
Axis 0 lists senders and axis 1 lists receivers. That is, the value in cell (i, j) denotes the number of times \(user_i\) responded to a message from \(user_j\).
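The consecutive-message rule can be illustrated with a small sketch. This is a hypothetical re-implementation under simplifying assumptions (a plain list of sender names in chronological order, and a helper named response_matrix), not the library's code.

```python
import pandas as pd

# Toy message sequence in chronological order; only the sender of each message matters.
senders = ["Ash", "Brock", "Ash", "Ash", "Misty", "Brock"]


def response_matrix(senders, zero_own=True):
    """Hypothetical sketch: count consecutive-message responses between users."""
    users = sorted(set(senders))
    matrix = pd.DataFrame(0, index=users, columns=users)
    # Pair each message (the response) with the message sent right before it.
    for previous, current in zip(senders, senders[1:]):
        # Row: user who responds; column: user being responded to.
        matrix.loc[current, previous] += 1
    if zero_own:
        # Ignore a user sending two messages in a row.
        for user in users:
            matrix.loc[user, user] = 0
    return matrix


responses = response_matrix(senders)
```

Here the cell in row "Brock", column "Ash" counts how often Brock wrote immediately after Ash, following the (i, j) convention described above.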
Note: Either df or chat must be provided.
Parameters:
- df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.
- chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.
- zero_own (bool, optional) – Set to True to avoid counting own responses. Defaults to True.
- norm (str, optional) – Specifies the type of normalization used for the response count. Can be:
  - 'absolute': Absolute count of messages.
  - 'joint': Normalized by total number of messages sent by all users.
  - 'sender': Normalized per sender by total number of messages sent by user.
  - 'receiver': Normalized per receiver by total number of messages sent by user.
Returns: pandas.DataFrame – Response matrix.
Example
Get the absolute count of responses (consecutive messages) between users.
>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_response_matrix
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> responses = get_response_matrix(chat=chat)
>>> responses
                Ash Ketchum  Brock  ...  Raichu  Wobbuffet
Ash Ketchum               0      0  ...       1          0
Brock                     1      0  ...       0          0
Jessie & James            0      1  ...       0          0
Meowth                    0      0  ...       0          0
Misty                     2      1  ...       1          0
Prof. Oak                 0      1  ...       0          0
Raichu                    1      0  ...       0          0
Wobbuffet                 0      0  ...       0          0
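The norm options can be read as different ways of rescaling the absolute matrix. The sketch below shows one plausible reading, assuming 'joint' divides by the grand total, 'sender' divides each row by its sum and 'receiver' divides each column by its sum. The helper normalize and the toy counts are hypothetical, and the library may compute its totals differently.

```python
import pandas as pd

# Toy absolute response counts: rows are senders, columns are receivers.
users = ["Ash", "Brock", "Misty"]
absolute = pd.DataFrame([[0, 1, 1], [2, 0, 0], [1, 3, 0]], index=users, columns=users)


def normalize(matrix, norm="absolute"):
    """Hypothetical normalization of a response matrix (one reading of the options)."""
    if norm == "absolute":
        return matrix
    if norm == "joint":
        # All cells together sum to 1.
        return matrix / matrix.to_numpy().sum()
    if norm == "sender":
        # Each row (sender) sums to 1.
        return matrix.div(matrix.sum(axis=1), axis=0)
    if norm == "receiver":
        # Each column (receiver) sums to 1.
        return matrix.div(matrix.sum(axis=0), axis=1)
    raise ValueError(f"Unsupported norm: {norm!r}")


per_sender = normalize(absolute, norm="sender")
```

Under this reading, a row of per_sender describes how one user's responses are distributed over the other users.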
Module contents
Analysis tools.
Functions
get_interventions_count([df, chat, …]) – Get number of interventions per user per unit of time.
get_response_matrix([df, chat, zero_own, norm]) – Get response matrix for given chat.
whatstk.analysis.get_interventions_count(df=None, chat=None, date_mode='date', msg_length=False, cumulative=False, all_users=False, cummulative=None)
Get number of interventions per user per unit of time.
whatstk.analysis.get_response_matrix(df=None, chat=None, zero_own=True, norm='absolute')
Get response matrix for given chat.