Custom plot
FigureBuilder provides some tools to easily visualize your chat. However, the possible visualizations are infinite. Here, we provide an example of a custom visualization that combines some library tools with pandas and plotly.
Number of messages vs. Number of characters sent
For each user, we will obtain a 2D plot relating the number of messages sent in a day to the average number of characters per message on that day. That is, for a given user we will have N points, where N is the number of days on which the user sent at least one message. Each point corresponds to a specific day: the x-axis measures the number of messages sent and the y-axis measures the average number of characters per message on that day.
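To make the two quantities concrete, here is a minimal sketch that computes them with plain pandas on a hand-made table. The table, its column names and the users are hypothetical toy data, not the whatstk schema; the real example below obtains the same quantities through the library.

```python
import pandas as pd

# Hypothetical toy chat with one row per message (not the whatstk schema).
messages = pd.DataFrame({
    "username": ["Ann", "Ann", "Ann", "Bob"],
    "date": pd.to_datetime([
        "2019-04-16 09:00", "2019-04-16 18:30",
        "2019-04-17 08:15", "2019-04-16 10:00",
    ]),
    "message": ["hello", "how are you?", "ok", "hi there"],
})
messages["num_characters"] = messages["message"].str.len()

# One point per (user, calendar day): x = messages sent, y = average length.
points = (
    messages
    .groupby(["username", messages["date"].dt.date])
    .agg(
        num_interventions=("message", "count"),
        avg_characters=("num_characters", "mean"),
    )
    .reset_index()
)
print(points)
```

In this toy data Ann sends two messages on 2019-04-16 (5 and 12 characters), so her point for that day is x = 2, y = 8.5.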
First of all, let's instantiate the objects WhatsAppChat (chat loading) and FigureBuilder (figure coloring).
>>> from whatstk import WhatsAppChat, FigureBuilder
>>> from whatstk.data import whatsapp_urls
>>> chat = WhatsAppChat.from_source(filepath=whatsapp_urls.LOREM_2000)
>>> fb = FigureBuilder(chat=chat)
Next, we obtain the number of messages and number of characters sent per user per day.
>>> from whatstk.analysis import get_interventions_count
>>> counts_interv = get_interventions_count(chat=chat, date_mode='date', msg_length=False, cumulative=False)
>>> counts_len = get_interventions_count(chat=chat, date_mode='date', msg_length=True, cumulative=False)
Now we process the data a bit. We obtain a DataFrame with five columns: username, date, num_characters, num_interventions and avg_characters.
>>> import pandas as pd
>>> counts_len = pd.DataFrame(counts_len.unstack(), columns=['num_characters'])
>>> counts_interv = pd.DataFrame(counts_interv.unstack(), columns=['num_interventions'])
>>> counts = counts_len.merge(counts_interv, left_index=True, right_index=True)
>>> # Remove all zero entries and get average number of characters
>>> counts = counts[~(counts['num_interventions'] == 0)].reset_index()
>>> counts['avg_characters'] = counts['num_characters']/counts['num_interventions']
>>> counts.head(5)
         username        date  num_characters  num_interventions  avg_characters
0  +1 123 456 789  2019-04-16              40                  1       40.000000
1  +1 123 456 789  2019-04-17              21                  1       21.000000
2  +1 123 456 789  2019-04-21              90                  2       45.000000
3  +1 123 456 789  2019-04-25             127                  3       42.333333
4  +1 123 456 789  2019-04-26              33                  1       33.000000
[5 rows x 5 columns]
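The reshaping above can be sketched on toy data. This assumes, as the unstack() calls suggest, that each count table is wide, with one row per date and one column per user; the small tables below are hypothetical stand-ins, and to_frame is used here as an equivalent shortcut for wrapping the unstacked Series in a DataFrame.

```python
import pandas as pd

# Hypothetical wide tables: one row per date, one column per user.
dates = pd.to_datetime(["2019-04-16", "2019-04-17"])
wide_len = pd.DataFrame({"Ann": [40, 0], "Bob": [12, 30]}, index=dates)
wide_msgs = pd.DataFrame({"Ann": [1, 0], "Bob": [2, 3]}, index=dates)
wide_len.index.name = wide_msgs.index.name = "date"
wide_len.columns.name = wide_msgs.columns.name = "username"

# unstack() turns each wide table into a long Series indexed by (username, date).
long_len = wide_len.unstack().to_frame("num_characters")
long_msgs = wide_msgs.unstack().to_frame("num_interventions")

# Join both measures on the shared (username, date) index.
toy = long_len.merge(long_msgs, left_index=True, right_index=True)

# Drop days without messages, then compute the average message length.
toy = toy[toy["num_interventions"] != 0].reset_index()
toy["avg_characters"] = toy["num_characters"] / toy["num_interventions"]
print(toy)
```

Dropping the zero rows before dividing avoids a 0/0 division for days on which a user sent nothing.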
So far we have obtained a dataframe counts, whose rows are keyed by user and timestamp. However, in this example we are interested in the aggregated values per day. Hence, we group this dataframe by user and calendar date and recalculate the number of messages sent and the average number of characters per message for each day.
>>> agg_operations = {'avg_characters': 'mean', 'num_interventions': 'mean'}
>>> counts = counts.groupby(['username', counts.date.dt.date]).agg(agg_operations)
>>> counts = counts.rename_axis(index=['username', 'date'])
>>> counts = counts.reset_index()
>>> counts.head(5)
         username        date  avg_characters  num_interventions
0  +1 123 456 789  2019-04-16       40.000000                  1
1  +1 123 456 789  2019-04-17       21.000000                  1
2  +1 123 456 789  2019-04-21       45.000000                  2
3  +1 123 456 789  2019-04-25       42.333333                  3
4  +1 123 456 789  2019-04-26       33.000000                  1
Once the dataframe is obtained, we plot it with plotly's Histogram2dContour, which draws one set of density contours per user over that user's daily points.
>>> from whatstk.graph import plot
>>> import plotly.graph_objs as go
>>> traces = []
>>> for username in fb.usernames:
...     counts_user = counts[counts['username'] == username]
...     traces.append(
...         go.Histogram2dContour(
...             contours={'coloring': 'none'},
...             x=counts_user['num_interventions'],
...             y=counts_user['avg_characters'],
...             # mode='markers',
...             # marker=dict(color=fb.user_color_mapping[username], opacity=0.2),
...             name=username,
...             showlegend=True,
...             line={'color': fb.user_color_mapping[username]},
...             nbinsx=10, nbinsy=20
...         )
...     )
...
>>> layout = {
...     'title': 'Average number of characters sent in a day vs Interventions per day',
...     'yaxis_title': 'avg characters',
...     'xaxis_title': 'num interventions',
... }
>>> fig = go.Figure(data=traces, layout=layout)
>>> plot(fig)