whatstk.analysis

Analysis tools.

Functions

get_interventions_count([df, chat, …]) – Get number of interventions per user per unit of time.
get_response_matrix([df, chat, zero_own, norm]) – Get response matrix for given chat.
whatstk.analysis.get_interventions_count(df=None, chat=None, date_mode='date', msg_length=False, cumulative=False, all_users=False, cummulative=None)

Get number of interventions per user per unit of time.

The unit of time can be chosen by means of argument date_mode.

Note: Either df or chat must be provided.

Parameters:
  • df (pandas.DataFrame, optional) – Chat data, i.e. the df attribute of a chat loaded using Chat. If a value is given, chat is ignored.
  • chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.
  • date_mode (str, optional) –

    Mode to group interventions by. Defaults to 'date'. Available modes are:

    • 'date': Grouped by particular date (year, month and day).
    • 'hour': Grouped by hour of the day (24 hours).
    • 'month': Grouped by month (12 months).
    • 'weekday': Grouped by weekday (i.e. Monday, Tuesday, …, Sunday).
    • 'hourweekday': Grouped by weekday and hour.
  • msg_length (bool, optional) – Set to True to count the number of characters sent instead of the number of messages sent.
  • cumulative (bool, optional) – Set to True to obtain cumulative counts.
  • all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.
  • cummulative (bool, optional) – Deprecated, use cumulative.
Returns:

pandas.DataFrame – DataFrame with shape NxU, where N: number of time-slots and U: number of users.

Raises:

ValueError – if date_mode value is not supported.
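The grouping modes listed above can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the helper names group_key and count_interventions, the (timestamp, username, text) tuples, and the key values chosen for each mode (for instance integer weekdays) are assumptions made for illustration, and the sample messages are invented.

```python
from collections import Counter
from datetime import datetime


def group_key(timestamp, date_mode="date"):
    """Map a timestamp to the time slot it falls in (hypothetical helper)."""
    if date_mode == "date":
        return timestamp.date()
    if date_mode == "hour":
        return timestamp.hour
    if date_mode == "month":
        return timestamp.month
    if date_mode == "weekday":
        return timestamp.weekday()  # 0 = Monday, ..., 6 = Sunday
    if date_mode == "hourweekday":
        return (timestamp.weekday(), timestamp.hour)
    raise ValueError(f"Unsupported date_mode: {date_mode!r}")


def count_interventions(messages, date_mode="date", msg_length=False):
    """Count messages (or characters) per (time slot, user) pair."""
    counts = Counter()
    for timestamp, username, text in messages:
        increment = len(text) if msg_length else 1
        counts[(group_key(timestamp, date_mode), username)] += increment
    return counts


# Invented sample data, not taken from any real chat.
messages = [
    (datetime(2016, 8, 6, 13, 23), "alice", "Hey guys!"),
    (datetime(2016, 8, 6, 13, 25), "bob", "Hey"),
    (datetime(2016, 8, 6, 18, 2), "alice", "Anyone around?"),
    (datetime(2016, 8, 7, 9, 1), "alice", "Morning"),
]

by_date = count_interventions(messages, date_mode="date")
by_hour = count_interventions(messages, date_mode="hour")
chars_by_date = count_interventions(messages, date_mode="date", msg_length=True)
```

In this sketch each (time slot, user) pair corresponds to one cell of the NxU result described under Returns; pairs with no messages are simply absent rather than stored as zero.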

Example

Get number of interventions per user from POKEMON chat. The counts are represented as a NxU matrix, where N: number of time-slots and U: number of users.

>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_interventions_count
>>> from whatstk.data import whatsapp_urls
>>> filepath = whatsapp_urls.POKEMON
>>> chat = WhatsAppChat.from_source(filepath)
>>> counts = get_interventions_count(chat=chat, date_mode='date', msg_length=False)
>>> counts.head(5)
username    Ash Ketchum  Brock  Jessie & James  ...  Prof. Oak  Raichu  Wobbuffet
date                                            ...
2016-08-06            2      2               0  ...          0       0          0
2016-08-07            1      1               0  ...          1       0          0
2016-08-10            1      0               1  ...          0       2          0
2016-08-11            0      0               0  ...          0       0          0
2016-09-11            0      0               0  ...          0       0          0

[5 rows x 8 columns]
whatstk.analysis.get_response_matrix(df=None, chat=None, zero_own=True, norm='absolute')

Get response matrix for given chat.

Obtains a DataFrame of shape [n_users, n_users] counting the number of responses between members. Responses can be counted in different ways, e.g. using absolute values or normalised values. Responses are counted based solely on consecutive messages. That is, if \(user_i\) sends a message right after \(user_j\), it will be counted as a response from \(user_i\) to \(user_j\).

Axis 0 lists senders and axis 1 lists receivers. That is, the value in cell (i, j) denotes the number of times \(user_i\) responded to a message from \(user_j\).
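The counting rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: count_responses is a hypothetical helper, the matrix is a nested dict rather than a DataFrame, and the sender names are invented. Rows are the responding users (senders) and columns are the users being responded to (receivers).

```python
def count_responses(senders, zero_own=True):
    """Count responses from an ordered list of message senders.

    Each message counts as a response from its sender to the sender of the
    immediately preceding message.
    """
    users = sorted(set(senders))
    # matrix[i][j]: number of times user i responded to a message from user j.
    matrix = {u: {v: 0 for v in users} for u in users}
    for previous, current in zip(senders, senders[1:]):
        if zero_own and previous == current:
            continue  # skip a user following up on their own message
        matrix[current][previous] += 1
    return matrix


# Invented message order, oldest first.
senders = ["ash", "brock", "ash", "ash", "misty", "brock"]
responses = count_responses(senders)
responses_with_own = count_responses(senders, zero_own=False)
```

With zero_own=True the diagonal stays at zero, which matches the intent of ignoring a user's replies to themselves.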

Note: Either df or chat must be provided.

Parameters:
  • df (pandas.DataFrame, optional) – Chat data, i.e. the df attribute of a chat loaded using Chat. If a value is given, chat is ignored.
  • chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.
  • zero_own (bool, optional) – Set to True to avoid counting own responses. Defaults to True.
  • norm (str, optional) –

    Specifies the type of normalization used for the response count. Can be:

    • 'absolute': Absolute count of messages.
    • 'joint': Normalized by total number of messages sent by all users.
    • 'sender': Normalized per sender by total number of messages sent by user.
    • 'receiver': Normalized per receiver by total number of messages sent by user.
Returns:

pandas.DataFrame – Response matrix.
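One plausible reading of the norm options is sketched below on a nested-dict matrix (rows: senders, columns: receivers). It assumes that 'joint' divides by the grand total of the matrix, 'sender' divides each row by its row sum, and 'receiver' divides each column by its column sum; the documentation above does not spell out these details, so treat them as assumptions. The helper normalize_responses, its ValueError for unknown modes, and the sample counts are all hypothetical.

```python
def normalize_responses(matrix, norm="absolute"):
    """Normalize a square nested-dict response matrix (hypothetical helper)."""

    def safe_div(value, total):
        # Avoid dividing by zero for users with no responses.
        return value / total if total else 0.0

    if norm == "absolute":
        return {s: dict(row) for s, row in matrix.items()}
    if norm == "joint":
        total = sum(sum(row.values()) for row in matrix.values())
        return {s: {r: safe_div(c, total) for r, c in row.items()}
                for s, row in matrix.items()}
    if norm == "sender":
        return {s: {r: safe_div(c, sum(row.values())) for r, c in row.items()}
                for s, row in matrix.items()}
    if norm == "receiver":
        col_totals = {r: sum(row[r] for row in matrix.values()) for r in matrix}
        return {s: {r: safe_div(c, col_totals[r]) for r, c in row.items()}
                for s, row in matrix.items()}
    raise ValueError(f"Unsupported norm: {norm!r}")


# Invented absolute counts for three users.
counts = {
    "A": {"A": 0, "B": 3, "C": 1},
    "B": {"A": 1, "B": 0, "C": 1},
    "C": {"A": 1, "B": 0, "C": 0},
}

joint = normalize_responses(counts, norm="joint")
by_sender = normalize_responses(counts, norm="sender")
by_receiver = normalize_responses(counts, norm="receiver")
```

Under these assumptions, every row of by_sender with at least one response sums to 1, and likewise every column of by_receiver.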

Example

Get the absolute count of responses (consecutive messages) between users.

>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_response_matrix
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> responses = get_response_matrix(chat=chat)
>>> responses
                Ash Ketchum  Brock  ...  Raichu  Wobbuffet
Ash Ketchum               0      0  ...       1          0
Brock                     1      0  ...       0          0
Jessie & James            0      1  ...       0          0
Meowth                    0      0  ...       0          0
Misty                     2      1  ...       1          0
Prof. Oak                 0      1  ...       0          0
Raichu                    1      0  ...       0          0
Wobbuffet                 0      0  ...       0          0