Category Archives: Data Science

Excel Map Charts with UK Data Sets

As part of my work for the Superhighways data project I’ve been looking at how familiar, generic software can be used to support data analysis and presentation. The obvious example is MS Excel, which offers not only analysis tools but a range of chart, graph and infographic options. One that isn’t used very frequently, however, is the ‘Map Chart’ option which generates choropleth maps by using Bing Maps. You need to have your data in columns with the names of geographical areas and the values to be plotted. With these highlighted you can select the ‘Insert’ menu and hit the ‘Maps’ button and the map will be generated.

The Map Chart button lurking on the Excel ‘Insert’ Toolbar

There is a short description and some examples online, with some advice about the need to provide ‘disambiguation’ in the form of an additional column of data. This locates the target areas within a higher-level geographical division – so ‘Counties’ are located within a ‘Province’. The disambiguation is important as I discovered pretty quickly. A test dataset with Kingston (the one in the UK) produced a nice map of Jamaica, and one with Cambridge (again, the UK one), a map of Massachusetts.

Some Features – and Limitations

Once you have generated a map you can adjust the colour scheme. The expectation is that you will have some kind of scalar value and that you will choose a colour gradient to reflect the values on this. Double clicking on the map that’s been generated brings up a contextual menu where you can adjust the colour schemes and opt for labels to be displayed. You can also choose to display a map showing only those areas with values, which allows you to produce regional maps. Here’s East Anglia, for example:

Map of East Anglia produced by Excel Map Charts
East Anglia. Note that the Essex coast has been smoothed significantly and there’s something odd going on on the Essex-Suffolk border.

There are some issues with the shapes that are generated – it appears that there are multiple shape definitions being used, so that a ‘filled’ area may overlap the boundaries of the ‘wireframe’ outline, as has happened in the image above, where the boundaries between Essex and Suffolk don’t match. You can see this more clearly here:

Something definitely awry here. Digitisation error? The ‘two-horned’ area intruding into Suffolk appears to be the boundary of the Dedham Vale Area of Outstanding Natural Beauty so perhaps that has been mistakenly read as a county boundary.

Since I was particularly interested in generating maps of London I was disappointed to find that as far as Excel Map Charts is concerned, ‘London’ is a single geographical unit. It can feature on a Map Chart, but you can’t display any subdivisions such as the London Boroughs within it. The reason is that the divisions which are used for the UK maps are the ‘ceremonial counties’ – these are different from the ‘historic counties’ and, perhaps more significantly for data science applications, they don’t match exactly to the administrative divisions used in government data sets. You can read more about these distinctions here.

Generating UK Map Charts

Despite these limitations, I decided to persevere and tried plotting some larger maps. Bing Maps can be quite picky about the names given to areas, so even with the recommended disambiguation, I had to try out different versions of county names. Let’s have a look at the four parts of the UK in turn. First, Northern Ireland. A few issues emerged immediately:

  • The counties are not the administrative divisions of NI – in fact they are very different – so most data sets would be hard to plot here
  • Belfast and Derry/Londonderry do not plot as separate areas
  • It is essential to include the “County” in the names
  • Both “County Derry” and “County Londonderry” will plot correctly
  • Lough Neagh (the UK’s largest inland area of water!) does not feature on the map

This example does shows how a Map Chart can capture aspects of data which a more conventional chart would not. Here, the East-West variation in population is evident: although this is again deceptive as it does not represent the concentration of population in Belfast (pop. 340,000 in the city itself and about double that in the metropolitan area).

Data set and Map of Northern Ireland
Map Chart of Northern Ireland showing population by county.

The Map Chart of England plots reasonably easily, as long as you remember that these are the counties rather than some of the more familiar areas. The East Riding of Yorkshire needs its name in full – not “East Yorkshire”.

English Counties on Map Chart. Some more overlapping boundaries are evident here.

When we plot Scotland, things get a little more complicated. Some of the counties like Borders incorporate a number of smaller ‘historic counties’ which will also plot onto the map, so consistency is important. More significantly, West Dumbartonshire does not appear to plot at all, and when the map is generated, it shows as having “No data”.

Scottish Counties with the ‘West Dumbartonshire’ bug showing.

There are similar issues with the map of Wales. A case in point is the County of Powys, which (like Scottish Borders) incorporates a number of older and smaller counties (Radnorshire, Brecknockshire and Montgomeryshire and part of Denbighshire). These will also plot, but some will map to the whole of Powys. So probably best avoided.

There are several bugs in the Wales map:

  • Blaenau Gwent plots in the wrong place – it replaces Monmouthshire, which doesn’t seem to want to plot at all (neither ‘Monmouth’ nor ‘Monmouthshire’ works)
  • Neath and Port Talbot plots onto Swansea, although Swansea does plot correctly

Counties of Wales with Neath and Port Talbot (centre bottom, in grey) and Blaenau Gwent plotting incorrectly to Monmouthshire (bottom right).

There’s a discussion on a Microsoft Support forum about these plotting errors here, but they clearly haven’t been resolved as yet.

In Summary …

In summary, then, Excel Map Charts is potentially a very useful tool offering the potential to generate choropleth maps very easily using familiar software. However:

  • The areas and shape set offered may not be the most useful for working with contemporary administrative data
  • The use of ceremonial counties means that maps of different parts of the UK will work differently – Edinburgh and Glasgow appear on the map of Scotland, but Belfast and Derry/Londonderry don’t, on the map of Northern Ireland
  • No subdivisions of Greater London
  • Some unresolved bugs as listed above

Download the UK Counties List

As a result of my investigations, I was able to produce a list of county names almost all of which will reliably plot onto the Map Chart if used in an Excel spreadsheet.

You can download it here but as I hope I’ve made clear here, I would ‘handle with care’ and carefully check any maps that are produced!

Data Science and Civil Society in London: 1

I have started work as Data and Evaluation Advisor to a project led by Superhighways (part of Kingston Voluntary Action) to develop data science, inquiry and evaluation across London’s voluntary sector. This ties in well with broader initiatives to strengthen ‘civil society’ more generally in the capital. As I get up to speed on the project, it will be reported on the Superhighways website, but I will also be reporting on the data science aspects of the project here.

One thing I want to do is to develop a more accurate ‘map’ of the networked data assets that are available for use by voluntary sector organisations, researchers, and workers – as well as looking at how these can be located, shared, used, added to, disassembled and reassembled. Given my prior interests in case study and case-based learning, I’m particularly interested in how heterogeneous data can be configured and reconfigured into ‘cases’ with different purposes, audiences and outcomes.

Some Common Themes

There are many analogies here with the work I carried out a few years ago as part of the ESRC funded ‘Ensemble’ Project (2009-2012). This was concerned to explore how semantic web and linked data approaches could be used to enhance and reframe teaching and learning in higher education – especially in support of case based learning. While some of the technologies have moved on since Ensemble, many of the issues and barriers to data use that we identified then in educational contexts, seem still to be current more broadly.

The other tie-in here is with my more recent work about the need for the development of ‘critical data literacies’ as part of broader critical digital literacies. These are not restricted to addressing concerns about the reliability of data on which policy decisions (for example) might be based, nor to highlighting issues around the use of personal data, but on laying the foundations for a more expansive ‘data activism’. Stefania Milan and Lonneke van der Velden have written an interesting article about this which is worth a read, and I have been writing about this myself with my colleague Fran Tracy. We presented a paper on this at the Networked Learning Conference 2018 which related critical digital literacies to the idea of ‘students as producers’ of knowledge. The slides from our presentation ‘Student Inquiry, Networks of Knowledge, and Linked Data‘ are here to download. A more developed version of this paper is to be published in late 2019 or early 2020 in a collection entitled Mobility, Data and Learner Agency in Networked Learning (Springer).

In subsequent posts I will be presenting some of the data assets – not only quantitative data sets, but qualitative resources too – that I find interesting. I’ll also be producing examples of how they can be used within existing organisations and activities, but also to support new forms of inquiry and activism.