
Last week we looked at the role of video in our analysis and scouting work. We also covered the types of video and how to start analysing video footage. If you want to read it again or haven’t read it yet, you can view it here.
Today we go a bit deeper into the actual tools you might need in recruitment analysis, scouting and opposition analysis, and we make a start on working with data. The actual analysis and the visualisations will come in a later part; this part is all about collecting, preparing and manipulating data.
Part IV is about data and we will discuss these elements of working with data:
- Collecting data
- Translating data
- Preparing data
Before we start, it’s important to know the role of data in your recruitment process and how much value to give it. Data on its own doesn’t say anything; without context it’s effectively useless. Always give proper context to the data you are using, so that it supports your process or workflow. In my opinion it should not become leading in cases such as recruitment. Sometimes data supports the assessment of style of play and sometimes it supports the assessment of performance. It’s important that you give context, make sure the data is representative and have a good database to work with.
Collecting data
There are two ways of getting data to work with. The first one is to look at different platforms that offer data. I’m working with match-level performance data and not event data, and these platforms do offer great stuff:
- Wyscout
- Opta
- Playmaker AI
- TransferLab – Analytics FC
- Statsbomb
I think all of these platforms do great things, but one thing about them is that they cost money. Again, it’s all about what you can spend and it might be a lot of money. But if you have the means or know people with access, these can be great tools. I will use Wyscout as an example to illustrate how to get data to work with in terms of preparing data for a recruitment profile.

Once you are logged in, you land on the start screen. On the right side you can choose men or women; for this Wyscout example I will use women’s football. The next step is to go to advanced search.


On the left side, you can select the league and the season you want to look at. I’m choosing the German Frauen-Bundesliga and focusing on the current 2022/2023 season.

The next step is to look at the positions you want to include when working with this database. I want to focus on the attacking players and that’s why I have included all strikers, wingers and attacking midfielders in the Frauen-Bundesliga.

The last thing before downloading is to select the data you want. Wyscout has some profiles already built into the website, and I will choose the attacking one. You can also opt to create one yourself; those can be found under ‘Custom’.

Finally, go to the bottom of the page and click ‘Export to Excel’. That downloads the database as an Excel file, which we can use later.
Again, this is for the people who can afford or have access to Wyscout, which comes at a price that is too high for many. The second way is to get data via websites like FBRef. They host Opta data and you can download it directly from the website to an Excel file.

I went to the page of the Primeira Liga and I wanted to download the stats on this table.

It’s quite easy to download it from here, and you can also modify the table first so it fits your needs. After that a download will start and you will have an Excel file to start working on. FBRef is free and gives you access to a variety of men’s and women’s leagues.
If you don’t like the manual work, you can always have a look at what experts have done in tutorials for R and Python. For R definitely go to WorldFootballR and for Python, I would go to McKay Johns.
Translating data
Data translation is the process of converting data from one format or structure to another format or structure. It is an important process because it enables data to be shared, analysed, and utilised across different systems and applications that may use different formats or structures.
Here are some reasons why data translation is important:
- Data integration: Data translation allows different systems and applications to share data and integrate with each other, enabling organisations to have a more complete view of their data and make better-informed decisions.
- Interoperability: Data translation ensures that different systems and applications can work together seamlessly, even if they use different formats or structures for their data.
- Data analysis: Data translation enables data to be analysed and processed using different tools and applications, allowing organisations to gain insights into their data and make better decisions.
Data translation is crucial for enabling data to be shared, integrated, and utilised across different systems. So when collecting data, you need to make sure it’s translated and prepared for analysis in R/Python, or for Tableau or for other systems in which you can do the analysis.
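To make that concrete, here is a minimal sketch of one such translation: turning a CSV export into JSON records with clean, script-friendly column names. The column names and values are made up for illustration and are not the real export layout of any platform. It uses only the Python standard library; the same steps map onto whatever tool you prefer.

```python
import csv
import io
import json

# Hypothetical export: column names and values are invented for illustration.
RAW_CSV = """Player,Team,Minutes played,Goals per 90
Player A,Team X,812,0.44
Player B,Team Y,455,0.20
"""


def to_snake_case(header: str) -> str:
    """Turn a spreadsheet header such as 'Goals per 90' into 'goals_per_90'."""
    return "_".join(header.strip().lower().split())


def parse_value(text: str):
    """Convert numeric strings to int or float; leave everything else as text."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def translate(csv_text: str) -> list[dict]:
    """Read CSV text and return one dictionary per player with clean keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {to_snake_case(key): parse_value(value) for key, value in row.items()}
        for row in reader
    ]


records = translate(RAW_CSV)
print(json.dumps(records, indent=2))
```

The point is not the specific formats but that headers and value types are made consistent once, so every later step can rely on them.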
Preparing data
For this example we continue with the data of the Frauen-Bundesliga and I want to make sure it’s representative to use and it fits my particular profile. I’m looking for an attacking player who has good striking abilities, can create and will also offer attacking action in possession.

Okay, so I have my Excel file with 98 players, but I need to filter it and make sure the data is prepared for further analysis. First of all, I think it’s important to require an adequate number of minutes played; otherwise your data gets skewed and isn’t representative. Someone with great numbers in one game can look better than someone who has played 10 games, even though one game is too small a sample to judge. That’s why I use a threshold of 450 minutes played (five full matches) for the Frauen-Bundesliga halfway through the season.
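If you would rather do this filter in a script than in Excel, a minimal sketch could look like this. The players and the `minutes_played` key are hypothetical; only the 450-minute threshold comes from the text above.

```python
MIN_MINUTES = 450  # threshold from the text: five full matches

# Hypothetical players; names and minutes are invented for illustration.
players = [
    {"player": "Player A", "minutes_played": 812},
    {"player": "Player B", "minutes_played": 455},
    {"player": "Player C", "minutes_played": 90},
    {"player": "Player D", "minutes_played": 450},
]


def filter_by_minutes(rows, min_minutes=MIN_MINUTES):
    """Keep only the players who have played at least `min_minutes`."""
    return [row for row in rows if row["minutes_played"] >= min_minutes]


eligible = filter_by_minutes(players)
print(f"{len(eligible)} of {len(players)} players kept")
```

Note that the comparison is "at least", so a player on exactly 450 minutes stays in. Whether that edge case should be in or out is a choice you make yourself.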

Now we see a change. Only players who meet the minutes threshold remain in my file, which means our database of 98 players has gone down to 52. These 52 players are representative in terms of minutes played and can be analysed further down the line.
But we are not done: there are metrics in this file that I will use and some that I won’t. I want to look at the data per 90 minutes and not at totals, because per-90 numbers make players with different amounts of playing time comparable.
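Per 90 simply means a total scaled to one full match: total × 90 / minutes played. The export used here already contains per-90 columns, so the sketch below is only meant to show what the number represents; the shot counts are hypothetical.

```python
def per_90(total: float, minutes: float) -> float:
    """Scale a total to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


# Hypothetical numbers: 3 shots in a single full match versus
# 20 shots spread over ten full matches.
one_game = per_90(total=3, minutes=90)
ten_games = per_90(total=20, minutes=900)
print(one_game, ten_games)  # 3.0 2.0
```

The one-game player comes out higher, which is exactly why the per-90 view only makes sense in combination with a minutes threshold.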

I have deleted all totals, as well as successful actions (a collection metric), the success rates of crosses and dribbles, and the offensive duels. I deleted the success rates because I’m looking for style and intention, and not so much (at least in this step of the process) for performance.
This is now the prepared database, which I will use for my data analysis. It’s not a big step and it is obviously quite basic, but it’s a necessary one so that your analysis will be sound and useful in the context of the recruitment process. Now you are ready for further data analysis.
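The same column selection can be sketched in code. All column names below are hypothetical stand-ins (the real export headers will differ), and the naming rule (`_per_90` for rates, `_pct` for success rates) is an assumption made for this example.

```python
# Hypothetical row; column names and values are invented for illustration.
rows = [
    {
        "player": "Player A",
        "shots": 24,
        "shots_per_90": 2.66,
        "crosses_per_90": 1.9,
        "accurate_crosses_pct": 31.2,
        "dribbles_per_90": 4.1,
        "successful_dribbles_pct": 55.0,
        "successful_attacking_actions_per_90": 3.2,
        "offensive_duels_per_90": 9.8,
    },
]

ID_COLUMNS = {"player"}
# Per-90 metrics that are dropped anyway, following the choices in the text.
DROPPED_PER_90 = {"successful_attacking_actions_per_90", "offensive_duels_per_90"}


def keep_column(name: str) -> bool:
    """Keep identifiers and per-90 metrics; drop totals, success rates and exclusions."""
    if name in ID_COLUMNS:
        return True
    if name in DROPPED_PER_90 or name.endswith("_pct"):
        return False
    return name.endswith("_per_90")


prepared = [
    {name: value for name, value in row.items() if keep_column(name)}
    for row in rows
]
print(list(prepared[0]))
```

Writing the rule down like this has one advantage over deleting columns by hand: you can rerun it on next month’s export and get the same selection.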
If you want to already see what it looks like and what you can do with data, I’ve included two files from Wyscout you can use. Download them here.

Next week we will come to the fifth part: the video analysis report. We will have a look at how you make a report based on video analysis.