Experimenting with Hierarchical Clustering in a galaxy far far away...

Introduction This post will be taking a bit of an unexpected diversion. As I was experimenting with hierarchical clustering I ran into the issue of how many clusters to assume. From that point I went deep into the rabbit hole and found out some really useful stuff that I wish I’d known when I wrote my previous post. I’ve discovered that choosing a number of clusters is a whole topic in itself, and there are, in general, two ways of validating a choice of cluster number: [Read More]

Use the k-means clustering, Luke

In my last post I scraped some character statistics from the mobile game Star Wars: Galaxy of Heroes. In this post, I’ll try out k-means clustering to see whether it produces an intuitive result, and to learn how to integrate this kind of analysis into a tidy workflow using broom. First I’ll load the required packages and set some plot preferences. library(tidyverse) [Read More]

Experimentation with Unsupervised Learning

Motivation I’ve written before about my learning plans, which always seem to be in a state of flux, and in particular about learning machine learning. Part of the reason I’m so reticent is that I’m a mathematician, and statistics does not come naturally or easily to me. My limited past experience has shown me just how much I don’t know. It’s fairly easy to apply a statistical model in R, and even have a go at assessing its performance; however, I am acutely aware that there is a certain ‘dark art’ to it, requiring a deeper understanding of exactly how to interpret results and how far you can take them. [Read More]

Are R ecosystems the future?

Some random thoughts… Over the past 6 months I’ve been creating, refining, and delivering a variety of ‘Introduction to R’ training courses. The more I do this, the more I come to the view that not nearly enough is made of taking an ecosystem-oriented view of packages. A good way of talking about #rstats functionality is in terms of ecosystems, rather than individual packages. Tidyverse, tidymodels, RMarkdown & Co, and HTML widgets are all worth highlighting. [Read More]

Let's call it tidysearch

R became 25 years old last year, and yet it’s only in relatively recent years that the language has really taken off with numerous conferences every year driven by a passionate and vibrant community of users. A large part of this has been driven by an ecosystem of R packages called the Tidyverse, which many new users nowadays begin their R journey with. This alternative ‘opinionated’ set of packages has been adopted now as canon by many users (including me) and the wave of hype and success associated with it has caused many experienced R users, well versed in long established ‘Base R’ functions, to take the leap into a new way of coding and a whole new set of functions. [Read More]

Mapping homelessness in England

Introduction For this blog post, I decided to try to find a dataset covering an issue I feel quite strongly about - homelessness. I managed to find a fairly large dataset from the Cambridgeshire Insight website. For a while I’ve wanted to try out R’s mapping potential and hopefully generate a heatmap, so I’ve deliberately tried to find a dataset where I can try this out. [Read More]

Two years in Data Science and not yet a Data Scientist

What’s in a name? Despite the potentially grumpy-sounding title of this post, this is more a positive reflection on the past two years since I started working in Data Science. I think I’ve come a long way, but there is still so far to go if I am to confidently call myself a Data Scientist. Why does a job title matter? It’s a good way of thinking about your competencies and describing where you want to go, and conveying that to other people. [Read More]

Portsmouth R User Group - 2nd Meeting

Last month I attended my first ever R User Group meeting, which was held at the University of Portsmouth in their impressive Future Technology Centre. I’d been itching to go to one of these meetups for a while, but unfortunately there was nothing in the South of England, so when this opportunity came around I couldn’t miss it, especially as I couldn’t attend the first one. It was really well attended by about 30 people from all manner of backgrounds, and two briefs were given. [Read More]

Learning some tough lessons!

It’s been a while since I’ve posted - mainly because I got myself into a mess with Git and I’ve been putting off trying to sort it out. I’ve been wanting to post about htmlwidgets for a while now, and in my naivety I thought I’d retrospectively try to use the DT package on one of my older posts to jazz up the tables to be more interactive…big mistake! Lesson 1 - Don’t enhance old posts! [Read More]

Embarking on nested dataframes

In a recent sprint, I was faced with the problem of carrying out analysis on data extracted from a database where there were several instances of the same table type and I wanted to do the same tasks on each of them. I know enough about the tidyverse to realise that this was a good opportunity to use functions such as map() and nest(). However, I fell at the first hurdle when the pressure of producing results meant I couldn’t spend the time I needed to get it to work…something to work on in slower time, hence this post. [Read More]