Posts

Showing posts from December, 2018

Adventures with R - Cricket Analysis (Clustering players based on ODI data)

Continuing the Cricket Analysis series, I wanted to take a deep dive into clustering. I have the ODI database and thought it would be instructive to put the data to use.

The main guide I used can be found HERE. This is an excellent guide to cluster analysis in R and I highly recommend it.

The main code can be found HERE. The final output file can be found HERE. The final Tableau Public dashboard can be found HERE.

The code walk through is as follows:

## Different packages that you need
library(mclust)
library(tidyverse)
library(cluster)
library(factoextra)
library(data.table)
library(reshape2)
library(sqldf)

setwd('C:/Training/R/CricketAnalysis/')
myData <- read.csv("ODIData.csv")
myData <- sqldf("select Player, sum(Runs) Runs, sum(Mins) Mins, sum(BF) BF, sum(Fours) Fours, sum(Sixes) Sixes,
                   ((100 * sum(Runs))/sum(BF)) SR from myData
                group by Player")

## Removing players that score less than the me
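To give a feel for the clustering step on a per-player table like the one aggregated above, here is a minimal sketch using base R's kmeans. It is not the code from the post: the data is synthetic, the column subset is an assumption, and k = 3 is an arbitrary choice for illustration.

```r
# Synthetic stand-in for the aggregated player table (all values are made up).
set.seed(42)
players <- data.frame(
  Player = paste0("Player", 1:30),
  Runs   = round(runif(30, 500, 10000)),
  BF     = round(runif(30, 800, 12000)),
  Fours  = round(runif(30, 20, 900)),
  Sixes  = round(runif(30, 0, 200))
)
players$SR <- 100 * players$Runs / players$BF

# Scale the features so large-valued columns (Runs, BF) do not dominate distances.
features <- scale(players[, c("Runs", "BF", "Fours", "Sixes", "SR")])

# k = 3 is an assumption; nstart = 25 tries several random starts and keeps the best.
fit <- kmeans(features, centers = 3, nstart = 25)

# Attach the cluster label to each player and count players per cluster.
players$Cluster <- fit$cluster
print(table(players$Cluster))
```

Scaling before k-means matters here because Runs and BF are in the thousands while SR is around 100; without it the distance calculation would mostly reflect the largest columns.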

Adventures with R - Cricket Analysis (creating a Player Database)

One of the things I had encountered in the main series was the inability to create a comprehensive player database. That part of the code was fairly manual. I kept tinkering to find an approach for creating a comprehensive player database that would replace the manual approach of extracting one player link at a time.

I had initially set out to use the excellent rvest package but ran into some issues trying to decipher the XPath that is required to make the link work. I believe that the player information is coded directly as HTML tags on Cricinfo, and it would have taken me a couple of XPath loops to get the player name and then the player profile id out. Definitely doable (but I will keep rvest for another piece of code I have in mind). I focused on using dplyr and tidyr to do my heavy lifting.

The code can be found HERE.

Code walk through: I am reproducing the first part of the code; the main code is available for everyone to look at.

library(stringr)
library
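The idea of pulling a player name and profile id out of raw HTML lines can be sketched in base R as follows. This is only an illustration: the HTML fragments, the link format, and the names html_lines and player_db are all made up, and the real page markup may well differ.

```r
# Made-up HTML fragments; the actual markup on the site is an assumption.
html_lines <- c(
  '<a href="/player/12345.html">A Player</a>',
  '<a href="/player/67890.html">B Player</a>'
)

# Capture group 1 is the numeric profile id, group 2 is the player name.
pattern <- '<a href="/player/([0-9]+)\\.html">([^<]+)</a>'

# regexec + regmatches return, per line: full match, group 1, group 2.
matches <- regmatches(html_lines, regexec(pattern, html_lines))

# Assemble a small player table from the captured groups.
player_db <- data.frame(
  PlayerId = vapply(matches, function(m) m[2], character(1)),
  Player   = vapply(matches, function(m) m[3], character(1)),
  stringsAsFactors = FALSE
)
print(player_db)
```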

Adventures with R - Cricket Analysis

On my journey for more interesting R packages, I stumbled onto cricketr. cricketr is a wonderful package found on CRAN HERE and written/maintained by tvganesh (GitHub link). Big hat tip to him for taking the Cricinfo Statsguru website and converting the query/output to an R package.

As usual, RStudio is the programming interface that I used. For ease of use, I go to Tools --> Global Options --> Appearance. I used Sky as my RStudio theme, Lucida Console at font size 11, and Idle Fingers as my editor theme. It gives a neutral black background and the fonts (comments, code, etc.) are much clearer to see.

All the files for the code are as follows: Main Code, Player File, Venue File, Result File.

A key objective I had in this exercise: avoid manual code runs or brute force. The package was easy enough to run one player at a time but I wanted to run it in an automated fashion for multiple players. The R package needs a player id to run but Cricinfo provid
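Running a per-player download for many players without manual reruns might be sketched like this. The function fetch_player is a hypothetical stand-in, not a cricketr function, and the ids are made up; in practice it would be replaced by whatever call retrieves one player's data.

```r
# Hypothetical stand-in for a per-player download; not part of any real package.
fetch_player <- function(player_id) {
  if (player_id < 0) stop("invalid player id")
  data.frame(PlayerId = player_id, Runs = player_id %% 100)
}

# Made-up ids; -1 simulates a player whose download fails.
player_ids <- c(101, 202, -1, 303)

# Fetch each player, returning NULL instead of stopping the whole run on an error.
results <- lapply(player_ids, function(id) {
  tryCatch(fetch_player(id), error = function(e) NULL)
})

# Drop the failures and stack the remaining data frames into one table.
results <- Filter(Negate(is.null), results)
all_players <- do.call(rbind, results)
print(all_players)
```

Wrapping each call in tryCatch means one bad id does not abort the batch, which is the main difference from running players one at a time by hand.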