whatstk.analysis package

Submodules

whatstk.analysis.interventions module

Base analysis tools.

Functions:

get_interventions_count([df, chat, ...])

Get number of interventions per user per unit of time.

whatstk.analysis.interventions.get_interventions_count(df: DataFrame = None, chat: BaseChat = None, date_mode: str = 'date', msg_length: bool = False, cumulative: bool = False, all_users: bool = False) → DataFrame

Get number of interventions per user per unit of time.

The unit of time can be chosen by means of argument date_mode.

Note: Either df or chat must be provided.

Parameters:
  • df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.

  • chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.

  • date_mode (str, optional) –

    Choose mode to group interventions by. Defaults to 'date'. Available modes are:

    • 'date': Grouped by particular date (year, month and day).

    • 'hour': Grouped by hour of the day (24 hours).

    • 'month': Grouped by month (12 months).

    • 'weekday': Grouped by weekday (i.e. Monday, Tuesday, …, Sunday).

    • 'hourweekday': Grouped by weekday and hour.

  • msg_length (bool, optional) – Set to True to count the number of characters instead of number of messages sent.

  • cumulative (bool, optional) – Set to True to obtain cumulative counts.

  • all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.

Returns:

pandas.DataFrame – DataFrame with shape NxU, where N: number of time-slots and U: number of users.

Raises:

ValueError – if date_mode value is not supported.
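The date_mode options can be pictured as different grouping keys over the message timestamps. The following is a minimal sketch in plain pandas, not the library's implementation: the helper name count_by, the toy data, and the assumption that the DataFrame has columns named date and username are illustrative only. Weekdays appear here as integers (0 = Monday), which may differ from how the library labels them.

```python
import pandas as pd

# Hypothetical toy data; the column names 'date' and 'username' are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-08-06 13:23", "2016-08-06 13:25",
        "2016-08-07 09:00", "2016-08-07 17:40",
    ]),
    "username": ["Ash Ketchum", "Brock", "Ash Ketchum", "Ash Ketchum"],
})


def count_by(df: pd.DataFrame, date_mode: str = "date") -> pd.DataFrame:
    """Count messages per user for each time slot (illustrative helper)."""
    ts = df["date"].dt
    # One list of grouping keys per supported mode.
    keys = {
        "date": [ts.date.rename("date")],
        "hour": [ts.hour.rename("hour")],
        "month": [ts.month.rename("month")],
        "weekday": [ts.weekday.rename("weekday")],
        "hourweekday": [ts.weekday.rename("weekday"), ts.hour.rename("hour")],
    }
    if date_mode not in keys:
        raise ValueError(f"Unsupported date_mode: {date_mode!r}")
    # Rows are time slots (N), columns are users (U).
    return pd.crosstab(index=keys[date_mode], columns=df["username"])


per_date = count_by(df, "date")
per_hourweekday = count_by(df, "hourweekday")
```

With 'hourweekday' the rows carry a two-level index (weekday, hour); the other modes in this sketch give a single-level index.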

Example

Get the number of interventions per user from the POKEMON chat. The counts are represented as an NxU matrix, where N is the number of time slots and U is the number of users.

>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_interventions_count
>>> from whatstk.data import whatsapp_urls
>>> filepath = whatsapp_urls.POKEMON
>>> chat = WhatsAppChat.from_source(filepath)
>>> counts = get_interventions_count(chat=chat, date_mode='date', msg_length=False)
>>> counts.head(5)
username    Ash Ketchum  Brock  Jessie & James  ...  Prof. Oak  Raichu  Wobbuffet
date                                            ...
2016-08-06            2      2               0  ...          0       0          0
2016-08-07            1      1               0  ...          1       0          0
2016-08-10            1      0               1  ...          0       2          0
2016-08-11            0      0               0  ...          0       0          0
2016-09-11            0      0               0  ...          0       0          0

[5 rows x 8 columns]

whatstk.analysis.responses module

Get information regarding responses between users.

Classes:

Norms(ABSOLUTE, JOINT, SENDER, RECEIVER)

Functions:

get_response_matrix([df, chat, zero_own, norm])

Get response matrix for given chat.

class whatstk.analysis.responses.Norms(ABSOLUTE, JOINT, SENDER, RECEIVER)

Bases: tuple

Attributes:

ABSOLUTE

Alias for field number 0

JOINT

Alias for field number 1

RECEIVER

Alias for field number 3

SENDER

Alias for field number 2

whatstk.analysis.responses.get_response_matrix(df: DataFrame | None = None, chat: BaseChat | None = None, zero_own: bool = True, norm: str = 'absolute') → DataFrame

Get response matrix for given chat.

Obtains a DataFrame of shape [n_users, n_users] counting the number of responses between members. Responses can be counted in different ways, e.g. using absolute values or normalised values. Responses are counted based solely on consecutive messages. That is, if \(user_i\) sends a message right after \(user_j\), it will be counted as a response from \(user_i\) to \(user_j\).

Axis 0 lists senders and axis 1 lists receivers. That is, the value in cell (i, j) denotes the number of times \(user_i\) responded to a message from \(user_j\).
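The consecutive-message rule can be illustrated with a short pandas sketch. It is only an approximation under stated assumptions, not the library's code: the helper response_counts and the toy sender sequence are hypothetical, and the input is taken to be the chronologically ordered author of each message.

```python
import pandas as pd

# Hypothetical, chronologically ordered authors (one entry per message).
senders = pd.Series(["Ash", "Brock", "Ash", "Ash", "Misty", "Brock"])


def response_counts(senders: pd.Series, zero_own: bool = True) -> pd.DataFrame:
    """Count how often each user writes right after each other user."""
    users = sorted(senders.unique())
    responder = senders.iloc[1:].to_numpy()  # user_i: author of each message
    previous = senders.iloc[:-1].to_numpy()  # user_j: author of the message before it
    # Cell (i, j): number of times user_i wrote directly after user_j.
    matrix = pd.crosstab(responder, previous).reindex(
        index=users, columns=users, fill_value=0
    )
    if zero_own:
        # Ignore a user following up on their own message.
        for user in users:
            matrix.loc[user, user] = 0
    return matrix


responses = response_counts(senders)
responses_with_own = response_counts(senders, zero_own=False)
```

In this toy sequence Ash writes twice in a row once, so the (Ash, Ash) cell is 1 when zero_own=False and 0 otherwise.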

Note: Either df or chat must be provided.

Parameters:
  • df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.

  • chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.

  • zero_own (bool, optional) – Set to True to avoid counting own responses. Defaults to True.

  • norm (str, optional) –

    Specifies the type of normalization used for response count. Can be:

    • 'absolute': Absolute count of messages.

    • 'joint': Normalized by total number of messages sent by all users.

    • 'sender': Normalized per sender by total number of messages sent by user.

    • 'receiver': Normalized per receiver by total number of messages sent by user.
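One plausible reading of the norm options, sketched in plain pandas: 'joint' divides by the grand total of the matrix, 'sender' divides each row by its row sum, and 'receiver' divides each column by its column sum. This interpretation, the helper normalize, and the toy counts are assumptions for illustration; the library's exact behaviour may differ.

```python
import pandas as pd

users = ["Ash", "Brock", "Misty"]
# Hypothetical absolute counts: rows are senders, columns are receivers.
absolute = pd.DataFrame(
    [[0, 1, 0], [1, 0, 1], [2, 0, 0]], index=users, columns=users
)


def normalize(matrix: pd.DataFrame, norm: str = "absolute") -> pd.DataFrame:
    """Apply one assumed interpretation of each normalization option."""
    if norm == "absolute":
        return matrix
    if norm == "joint":
        # All cells together sum to 1.
        return matrix / matrix.to_numpy().sum()
    if norm == "sender":
        # Each row (sender) sums to 1.
        return matrix.div(matrix.sum(axis=1), axis=0)
    if norm == "receiver":
        # Each column (receiver) sums to 1.
        return matrix.div(matrix.sum(axis=0), axis=1)
    raise ValueError(f"Unsupported norm: {norm!r}")


joint = normalize(absolute, "joint")
by_sender = normalize(absolute, "sender")
by_receiver = normalize(absolute, "receiver")
```

In this sketch a user with no responses at all would produce a division by zero and therefore NaN entries in the 'sender' or 'receiver' result.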

Returns:

pandas.DataFrame – Response matrix.

Example

Get the absolute count of responses (consecutive messages) between users.

>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_response_matrix
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> responses = get_response_matrix(chat=chat)
>>> responses
                Ash Ketchum  Brock  ...  Raichu  Wobbuffet
Ash Ketchum               0      0  ...       1          0
Brock                     1      0  ...       0          0
Jessie & James            0      1  ...       0          0
Meowth                    0      0  ...       0          0
Misty                     2      1  ...       1          0
Prof. Oak                 0      1  ...       0          0
Raichu                    1      0  ...       0          0
Wobbuffet                 0      0  ...       0          0

Module contents

Analysis tools.

Functions:

get_interventions_count([df, chat, ...])

Get number of interventions per user per unit of time.

get_response_matrix([df, chat, zero_own, norm])

Get response matrix for given chat.

whatstk.analysis.get_interventions_count(df: DataFrame = None, chat: BaseChat = None, date_mode: str = 'date', msg_length: bool = False, cumulative: bool = False, all_users: bool = False) → DataFrame

Get number of interventions per user per unit of time.

The unit of time can be chosen by means of argument date_mode.

Note: Either df or chat must be provided.

Parameters:
  • df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.

  • chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.

  • date_mode (str, optional) –

    Choose mode to group interventions by. Defaults to 'date'. Available modes are:

    • 'date': Grouped by particular date (year, month and day).

    • 'hour': Grouped by hour of the day (24 hours).

    • 'month': Grouped by month (12 months).

    • 'weekday': Grouped by weekday (i.e. Monday, Tuesday, …, Sunday).

    • 'hourweekday': Grouped by weekday and hour.

  • msg_length (bool, optional) – Set to True to count the number of characters instead of number of messages sent.

  • cumulative (bool, optional) – Set to True to obtain cumulative counts.

  • all_users (bool, optional) – Obtain number of interventions of all users combined. Defaults to False.
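The effect of msg_length, cumulative and all_users can be pictured with a small pandas sketch. It is a rough illustration only: the column names date, username and message, the toy messages, and the output column label "all users" are assumptions, not taken from the library.

```python
import pandas as pd

# Hypothetical toy chat; column names are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-08-06 13:23", "2016-08-06 13:25", "2016-08-07 09:00"]
    ),
    "username": ["Ash Ketchum", "Brock", "Ash Ketchum"],
    "message": ["Hey guys!", "Hi", "Anyone there?"],
})

daily = df.assign(day=df["date"].dt.date, n_chars=df["message"].str.len())

# Default behaviour: number of messages per user per day.
n_messages = daily.groupby(["day", "username"]).size().unstack(fill_value=0)

# msg_length=True analogue: number of characters instead of messages.
n_chars = (
    daily.groupby(["day", "username"])["n_chars"].sum().unstack(fill_value=0)
)

# cumulative=True analogue: running total over the time slots.
cumulative = n_messages.cumsum()

# all_users=True analogue: one combined column for every user.
combined = n_messages.sum(axis=1).to_frame("all users")
```

Each result keeps one row per time slot, so the variants can be compared or plotted side by side.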

Returns:

pandas.DataFrame – DataFrame with shape NxU, where N: number of time-slots and U: number of users.

Raises:

ValueError – if date_mode value is not supported.

Example

Get the number of interventions per user from the POKEMON chat. The counts are represented as an NxU matrix, where N is the number of time slots and U is the number of users.

>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_interventions_count
>>> from whatstk.data import whatsapp_urls
>>> filepath = whatsapp_urls.POKEMON
>>> chat = WhatsAppChat.from_source(filepath)
>>> counts = get_interventions_count(chat=chat, date_mode='date', msg_length=False)
>>> counts.head(5)
username    Ash Ketchum  Brock  Jessie & James  ...  Prof. Oak  Raichu  Wobbuffet
date                                            ...
2016-08-06            2      2               0  ...          0       0          0
2016-08-07            1      1               0  ...          1       0          0
2016-08-10            1      0               1  ...          0       2          0
2016-08-11            0      0               0  ...          0       0          0
2016-09-11            0      0               0  ...          0       0          0

[5 rows x 8 columns]

whatstk.analysis.get_response_matrix(df: DataFrame | None = None, chat: BaseChat | None = None, zero_own: bool = True, norm: str = 'absolute') → DataFrame

Get response matrix for given chat.

Obtains a DataFrame of shape [n_users, n_users] counting the number of responses between members. Responses can be counted in different ways, e.g. using absolute values or normalised values. Responses are counted based solely on consecutive messages. That is, if \(user_i\) sends a message right after \(user_j\), it will be counted as a response from \(user_i\) to \(user_j\).

Axis 0 lists senders and axis 1 lists receivers. That is, the value in cell (i, j) denotes the number of times \(user_i\) responded to a message from \(user_j\).

Note: Either df or chat must be provided.

Parameters:
  • df (pandas.DataFrame, optional) – Chat data. Attribute df of a chat loaded using Chat. If a value is given, chat is ignored.

  • chat (Chat, optional) – Chat data. Object obtained when a chat is loaded using Chat. Required if df is None.

  • zero_own (bool, optional) – Set to True to avoid counting own responses. Defaults to True.

  • norm (str, optional) –

    Specifies the type of normalization used for response count. Can be:

    • 'absolute': Absolute count of messages.

    • 'joint': Normalized by total number of messages sent by all users.

    • 'sender': Normalized per sender by total number of messages sent by user.

    • 'receiver': Normalized per receiver by total number of messages sent by user.

Returns:

pandas.DataFrame – Response matrix.

Example

Get the absolute count of responses (consecutive messages) between users.

>>> from whatstk import WhatsAppChat
>>> from whatstk.analysis import get_response_matrix
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.POKEMON)
>>> responses = get_response_matrix(chat=chat)
>>> responses
                Ash Ketchum  Brock  ...  Raichu  Wobbuffet
Ash Ketchum               0      0  ...       1          0
Brock                     1      0  ...       0          0
Jessie & James            0      1  ...       0          0
Meowth                    0      0  ...       0          0
Misty                     2      1  ...       1          0
Prof. Oak                 0      1  ...       0          0
Raichu                    1      0  ...       0          0
Wobbuffet                 0      0  ...       0          0