In Analytics insights

It’s not just our clients that like making magic with our Data Mining tools. Spread across our various teams, there is an elite band of individuals constantly experimenting with ways to get more out of data using SQL.

We asked each of our resident query kings to choose one favorite query that our clients could use to unearth valuable insights from their game data. These are the answers they gave.


 

Russell ‘Mighty Joe’ Young

Senior Analyst

I like to have a nose at how a random player flows through the game. Looking at individual flows is a really good way of getting a quick feel for typical event sequences in the platform.

This very simple query returns a whole lot of interesting and useful information:

select eventTimestamp, eventName, missionName, productName, transactionVector
from events
where userID = '1001-0211-7978-FAKE'
order by 1

The screencap below shows the event sequence of a player starting a session and how they fare in their first couple of sessions, breezing through 2 missions.

Looking at what happened immediately after, we can see transactions starting to come into the equation. After completing mission First Time User Forest #2, the player spends some of their gold coins in exchange for an inventory ‘shuffle’ before going onto the next mission with a new set of items.


 

Isaac ‘Wild’ Roseboom

Chief Product Officer

Along the same lines as Russell above, my query allows you to see the event stream for players that meet certain criteria. It’s super handy for QA and debugging because you’ll get a different user every time you run it.

If, you suspect that events aren’t being correctly recorded for players who installed a certain version of your game, for example, you can run something like the following:

select collectInsertedTimestamp, eventTimestamp, userID, eventName, sessionID, clientVersion from events 
where userID IN (
select
user_ID
from user_metrics
where eventNewPlayerCount is null
and user_first_seen_date between '2019-08-01' and '2019-09-25'
limit 1
)
order by collectInsertedTimestamp

What comes back is the event stream of a random player running the game version that seems to be experiencing problems. Do this a few times and you can very quickly start to spot patterns in the data.


 

Michael ‘The Rain’ Forrest

Product Manager

Your global playerbase will show peaks and troughs in user activity, but I really like to observe player behaviors in the context of specific timezones. By looking at activity patterns against common 24-hour time periods, you can start to build a clear picture of how individuals like to play and when.

Not all games track local time but, assuming yours does, you can do something like the following query:

select date_part('hour', userTimestamp)
, count(*) events
, count(distinct userID) users
, sum(msSinceLastEvent)/1000/60/60 as time_spent_hours
, sum(msSinceLastEvent)/count(distinct userID)/1000/60/60 as time_spent_hours_per_user
, count(case when eventName = 'transaction' and eventLevel = 0 then 1 else 0 end) as purchases
-- and so on
from events
group by 1
order by 1

Here’s a nice example of the kind of insights you can get from looking at things in terms of timezones:

For games that do track local time as a timestamp, you can do some magic to shift that timestamp and compare the behaviors of players from multiple timezones as if they were all on the same schedule. More on that particular wizardry here. What a world.


 

Ben ‘Brand’ Newbon

Senior Data Analyst

This is an absolute monster of a query but it’s a belter, honest. The complexity is justified because what you get back is a list of all the units/items/cards that a player can own, as well as performance and usage stats per player.

To help you deconstruct this behemoth, I’ve made some of the key terms bold so that I can refer to them specifically and explain what they mean below. God’s speed:

with starters as(
select 'deckChanged' eventName,'ADDED' changeType,'CARD_FIRETEAM' cardName
union
select 'deckChanged','ADDED','CARD_RPG'
union
select 'deckChanged','ADDED','CARD_SHOTGUNNER'
union
select 'deckChanged','ADDED','COMMAND_BOMB_DROP'
)
,activeDeckData as(
select eventTimestamp,userId,eventname,cardName,changeType,missiontype,outcome,userscore
from events
where eventName in('cardGained','deckChanged','cardUpgraded')
and userId in (select user_Id from userSelection)
union all
select min(eventTimestamp) as eventTimestamp,userId,s.eventname,s.cardName,s.changeType,'' as missiontype,'' as outcome,userscore
from events e full outer join starters s on 1
where e.eventName = 'newPlayer'
group by userid,s.eventname,s.cardname,s.changeType,missionType,outcome,userscore
)
,battleData as(
select eventTimestamp,userId,eventname,cardName,changeType,missiontype,outcome,userscore
from events
where eventName in ('missionStarted','missionSummary')
)
,userCardData as(
select userId,cardName
from activeDeckData
group by userId,cardName
)
,enrichedBattleData as(
select eventTimestamp,b.userId,eventName,u.cardname,changeType,missiontype,outcome,userscore
from battleData b inner join userCardData u on b.userId = u.userID
)
,combinedData as(
select * from activeDeckData
union all
select * from enrichedBattleData
)
,data as(
select *
,1 as unlocked
,greatest(0,sum(case when eventname = 'deckChanged' and changeType = 'ADDED' then 1
when eventname = 'deckChanged' and changeType = 'SWAPPED' then -1 else 0 end)
over (partition by userId,cardName order by eventTimestamp)) as inDeck
from combinedData
order by userid,cardname,eventtimestamp
)
select cardName
,count(distinct case when unlocked > 0 then userid end) as owners
,round(count(case when eventname = 'cardUpgraded' then 1 end) / count(distinct case when unlocked > 0 then userid end),2.0) as Upgrades 
,round( count(case when eventname = 'missionStarted' then 1 end) / count(distinct case when unlocked > 0 then userid end),2.0) as BattlesSinceUnlocked
,round( count(case when eventname = 'missionStarted' and inDeck > 0 then 1 end) / count(distinct case when unlocked > 0 then userid end),2.0) as BattlesPlayedEquipped
,round(count(case when eventname = 'missionStarted' and inDeck > 0 then 1 end)/greatest(1,count(case when eventname = 'missionStarted' and unlocked > 0 then 1 end))*100, 2.0) as 'usage %'
,round(count(case when eventname = 'missionSummary' and outcome = 'WON' and inDeck > 0 then 1 end)/greatest(1,count(case when eventname = 'missionSummary' and inDeck > 0 then 1 end))*100, 2.0) as 'win %'
,round(count(case when eventname = 'missionSummary' and outcome = 'LOST' and inDeck > 0 then 1 end)/greatest(1,count(case when eventname = 'missionSummary' and inDeck > 0 then 1 end))*100, 2.0) as 'loss %'
from data
group by 1
order by 2 desc

As intimidating as it may look in SQL, the results of this query are actually very easily digestible. Check them out below.

Notes & terms

The bracketed sections next to each term are referred to as CTEs (common table expressions) and can be used to set up a kind of sub-table that only exists within the runtime of the query.

  • starters – hard codes into the query the cards that the player begins the game with when there isn’t data sent for players receiving these cards like others they gain during gameplay.
  • activeDeckData – combines starters with actual data received surrounding the gaining, assigning and upgrading of cards in time order.
  • battleData – pulls out all the mission data for each player – the card related fields will come through as null from the mission events.
  • userCardData – lists all the cards ever owned by each player.
  • enrichedBattleData – combines battleData and userCardData so each battle now has a line for each card (even if the card wasn’t used for that battle).
  • combinedData – adds activeDeckData to enrichedBattleData.
  • data – sorts combinedData in time order and adds tags to track when a card is unlocked and when it is equipped or not.
  • The final section uses data to pull out the overall stats for each card.

 

Steven ‘The Transit’ Van Stiphout

Product Manager

I’ve chosen to highlight one particular trick that utilizes the structure of our data warehouse to make running queries more efficient. By examining specific subsets of users,  it really comes into its own when drilling into large amounts of data with complex computation.

Because our data warehouse stores all values of a column together, rather than rows, that means that limiting the number of columns included causes queries to run a LOT faster.

Let’s say you want to run this simple query against your data:

select count(distinct eventName, userID)/count(distinct userID) as 'avg number of event types per user' 
from events

You might expect it to run fairly quickly given its simplicity but, with a large dataset, that isn’t always the case. Taking advantage of the shape of our warehouse and the fact userIDs are stored as a hash, a quick trick can be used to reduce the number of users that are included to increase the query speed:

select count(distinct eventName, userID)/count(distinct userID) as 'avg number of event types per user' 
from events 
where mod(hash(userID, 100) = 66

Here, we’re splitting our users into 100 pseudo-randomly assigned and numbered buckets and taking a look at bucket #66.

Adding this code into simple queries won’t make much difference but, as we increase computational complexity, filtering data in this way becomes more and more important. It saves time and money, and it allows you to get insights on sample subsets of users without having to process entire datasets.


In the pursuit of knowledge, our SQL heroes have forgone countless Friday nights, relationships, weddings, and Christmas dinners* so that you don’t have to. When you run these queries, take a moment to remember the brave souls that created them.

*Probably

 

 

If you have any questions, about the contents of this piece or anything else, contact us at [email protected] and we’ll connect you to the relevant person.

 

Recent Posts

Leave a Comment

Start typing and press Enter to search

X