What is the PEX algorithm and why is it important to study it?
Published: 30th June 2017
Author: Aisling Tuite
In December 2016 we received funding from the Irish Research Council under the New Horizons Interdisciplinary Project Scheme to carry out a pilot project titled Understanding Unemployment in the Era of Big Data: Exploring how data-driven theory and algorithmic knowledge can support better policy and personal decision making. The purpose of the research is to explore the PEX (Probability of Exit) algorithm. Below are some initial notes on how we have begun to explore this algorithm and its effect on unemployed people.
There is no doubt that in the digital age the collation of large data sets, and their organisation in ways that are useful for categorising and targeting, is set to become increasingly prominent and an interesting topic to study. Algorithms are essentially sets of functions that use logic and mathematical patterns to solve problems. They are part of the discrete mathematics family that feeds into computing. We hear a lot about them, especially in the ‘online’ world, where specific relevant adverts can be targeted towards us based on categories such as our age or gender. This is a very simplistic understanding of algorithms, many of which are far more complex. It is the everyday algorithms that we hear most about: those that control and influence our choices, whether as consumers of goods or of news. Algorithms are not new, but combined with large quantities of data stored online and increased computational power they will continue to influence our everyday lives. For this reason we need to study algorithms of all types. Kavanagh, McGarry & Kelly (2015), when exploring the possibility of carrying out an ethnography of an algorithm, note that the design of algorithms is inaccessible to the observer. However, this is not always the case. The PEX algorithm is localised and has the benefit of being transparent, as opposed to those used by global corporate entities. It is therefore a good starting point for the study of algorithms.
Our interest in the PEX algorithm is two-fold. PEX came to our attention through our research into unemployment. At the time of our initial WUERC research project, and during the subsequent years of producing the Sociology of Unemployment, PEX was in its infancy and was only being rolled out across the country with the introduction of the new DSP Intreo offices. While unemployment is our main interest, we are also concerned with modern and future developments in unemployment policy, and the use of longitudinal datasets and algorithms to inform the DSP and Intreo agents is part of this ongoing process. As primarily social science researchers we have the tools and knowledge to develop an understanding of the social consequences of the use of algorithms, but not the technical understanding. Therefore, we have teamed up with our STEM partner, the mathematician Dr Aoife Hennessy, to develop a more rounded understanding of how algorithms are developed and how they interact with society.
How and Why was PEX developed?
The Probability of Exit (PEX) algorithm was developed by the Economic and Social Research Institute (ESRI) at the request of the Department of Social Protection (DSP). It follows on from some previous attempts at understanding the requirements for reducing unemployment/live register numbers in the mid to late 1990s. It is important to note that attempts at understanding and profiling unemployed people have a longer history and that the driving force behind developing PEX was not the 2008 financial and employment crisis.
The algorithm was created by administering an additional questionnaire to all new entrants into the social welfare system in a three-month window from September 2006. The final sample comprised 30,762 people who received Jobseekers Benefit and/or Jobseekers Allowance in this time. Respondents were then traced over a further period of 78 weeks, which allowed for the development of six-, twelve- and fifteen-month profiles. The questions asked covered Age, Gender, Marital Status, Children, Perceived Health, Spousal Earnings, Employment/Unemployment History, Willingness to Relocate, Location, Transport and Education History. The result was a set of mathematical analyses that produced two algorithms, one for men and one for women, each with three classifications (low, medium, high) based on an individual’s probability of leaving the live register within 12 months. Subsequent intervention from case managers was to be informed by this classification, with more intervention given to those with the lowest probability of exit.
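The classification step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a logistic scoring function is used as a stand-in for the report's actual statistical model, and the weights, the cut-off values and the names (`WEIGHTS_BY_GENDER`, `exit_probability`, `classify`) are hypothetical and are not taken from the ESRI report.

```python
import math

# Hypothetical weights, for illustration only; these are NOT the ESRI figures.
# There is one set per gender, mirroring the two separate algorithms.
WEIGHTS_BY_GENDER = {
    "male": {"intercept": 0.2, "age_over_45": -0.6, "good_health": 0.5,
             "own_transport": 0.3, "previously_long_term_unemployed": -0.9},
    "female": {"intercept": 0.3, "age_over_45": -0.4, "good_health": 0.5,
               "own_transport": 0.4, "previously_long_term_unemployed": -0.8},
}

# Hypothetical cut-offs on the 12-month probability of exit.
LOW_CUTOFF = 0.4
HIGH_CUTOFF = 0.7


def exit_probability(gender, answers):
    """Turn yes/no questionnaire answers (1 or 0) into a probability of exit."""
    weights = WEIGHTS_BY_GENDER[gender]
    score = weights["intercept"]
    for question, value in answers.items():
        score += weights[question] * value
    # Logistic function: maps any score onto the range 0..1.
    return 1.0 / (1.0 + math.exp(-score))


def classify(probability):
    """Place a probability of exit into one of the three classifications."""
    if probability < LOW_CUTOFF:
        return "low"
    if probability < HIGH_CUTOFF:
        return "medium"
    return "high"


claimant = {"age_over_45": 1, "good_health": 0,
            "own_transport": 0, "previously_long_term_unemployed": 1}
print(classify(exit_probability("male", claimant)))
```

In this sketch, a claimant classified as "low" would be the one flagged for the most intervention from a case manager.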
The findings were reported in the document National Profiling of the Unemployed in Ireland in July 2009. This report largely presents a transparent process of collecting and analysing the data, and it is available for anyone to read and download. The authors note that this is only a first attempt at profiling and welcome any comments on it. If we were to follow the Australian example (as detailed in the report), a series of readjustments would be necessary to accurately represent the current economic and social climate; between 1994 and 2008 Australia made at least four significant changes to its profiling model. A follow-up ESRI report, Predicting the Probability of Long-Term Unemployment Using Administrative Data, was issued in June 2014. This report focused on those who did not leave the live register when predictions suggested that they should have found employment, returned to education or otherwise ‘signed off’. This study was in some ways an audit of PEX but, as I will discuss later, it could also be used for other, more sinister, reasons.
What are we interested in?
Over the past few months we have been considering the PEX algorithm and what it means for unemployed people and the future of welfare policy. To begin, I will point out that in our discussions we all accept that the use of large datasets and algorithms is a part of our world and will continue to be part of it. Where we see our research being of practical importance is in influencing policy at both national and European level. We want to ensure that the ‘human’ side is not forgotten. So, we have asked a number of questions that will inform our research.
Categories and classifications:
The categorisations that are chosen for the questionnaire are interesting. Age and Gender are pretty much the standard categories for any questionnaire. Yes, demographics are important, but gender is not so clear cut. The report authors have divided this algorithm into male and female, based on anomalies between the two groups in their findings. But is it such a clear-cut demographic? With marriage equality and a greater acceptance of personal gender identification, is such a category fit for the future of profiling? Similarly, age is a category that is not stable. The International Labour Organisation (ILO) defines youth unemployment as covering the ages of 15 to 24, whereas it now appears that transitions to adulthood may be happening later. We wonder if these are perhaps somewhat lazy categories that we rarely question. I do not want to criticise the use of demographic categorisation, as there cannot be an infinite number of categories, but this does highlight the inflexibility of using solely quantitative measures. The questionnaire does touch briefly on individual perceptions and responsibilities by asking questions about perceived health and willingness to relocate, but for the most part there is little personal input into developing the PEX or, as I will discuss later, into any future interventions. There are also mathematical elements that are questioned, which I discuss below.
As part of an overall discussion of the future use of algorithms and how they impact individuals who are unemployed, we need to return to philosophical reasoning and question the ethics of using this method to profile and characterise individual people. As I have already mentioned, there is very little personal input in the process of developing PEX and its subsequent use. Unemployment cannot be discussed without considering employment. Statistical profiling is rigid: the questions are asked and the answers determine where a person fits into an overall classification system. Intervention for unemployed people is (meant to be) given based on the likelihood of an individual finding work within 12 months. But what is this work and what type of interventions are given? Work needs to be sustainable and meaningful. We do not believe that individuals within the social welfare system are given much opportunity to discuss their aspirations for careers, their interests, or their abilities to carry out particular types of work. If we do not question this we may fall into the trap of stereotyping people based on well-worn categorisations of class and gender. It may be assumed that people from different social classes have differing perceptions of economic life, but such stereotyping is not cut and dried. What about those who fall outside these well-worn assumptions? We need to consider everyone. Greater involvement of individuals could be pivotal in helping them find employment that is meaningful and therefore sustainable in the long term. So, is it ethical to treat individuals as unthinking or unfeeling ‘numbers’ and force upon them employment that is uninteresting to them, where tasks may be difficult to accomplish based on their skills, or in industries that expect high staff turnover or are not sustainable in the long term?
We also question the robustness of the mathematical model, both technically and in practice. This is something that the report authors invite. They note the iterations that other profiling models have gone through and recognise that improvements may be required, and the development of the model is transparent. Some areas of interest are: how they picked the cut-off point for each of the classifications, the reliability of the data due to its age (as noted by the authors), and the usefulness of a model which is between 60 and 80% reliable in its predictions. So, we ask where the model is weakest and if there are any improvements that can be made. We are also considering what happens when questions are answered differently. That is, could a different answer on a single question move a borderline case into a different category, and what are the consequences of this? Currently we do not believe that many unemployed people are aware of this profiling, although it is information that is widely available. In the event that one classification of unemployed people was seen to be treated better than another, what would the consequences of ‘gaming’ the questionnaire be?
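The borderline-case question can be made concrete with a small sensitivity check: change one answer at a time and see whether the classification moves. The sketch below is a minimal illustration only. It assumes a logistic scoring function as a stand-in for the real model, and the weights, cut-offs and function names (`category`, `answers_that_change_category`) are hypothetical rather than taken from PEX.

```python
import math

# Hypothetical weights and cut-offs, for illustration only (not the PEX values).
WEIGHTS = {"own_transport": 0.4, "willing_to_relocate": 0.3,
           "recent_employment": 0.8}
INTERCEPT = -0.5
LOW_CUTOFF, HIGH_CUTOFF = 0.4, 0.7


def category(answers):
    """Classify yes/no answers (1 or 0) as low, medium or high."""
    score = INTERCEPT + sum(WEIGHTS[q] * v for q, v in answers.items())
    probability = 1.0 / (1.0 + math.exp(-score))
    if probability < LOW_CUTOFF:
        return "low"
    if probability < HIGH_CUTOFF:
        return "medium"
    return "high"


def answers_that_change_category(answers):
    """Flip each answer in turn and record any change of classification."""
    baseline = category(answers)
    changes = {}
    for question in answers:
        flipped = dict(answers)
        flipped[question] = 1 - answers[question]
        new_category = category(flipped)
        if new_category != baseline:
            changes[question] = new_category
    return baseline, changes


claimant = {"own_transport": 1, "willing_to_relocate": 0,
            "recent_employment": 0}
print(answers_that_change_category(claimant))
```

With these made-up numbers the claimant starts as "medium", and answering the transport question differently is enough to move them into "low". This is the kind of single-answer effect that would matter if one classification were seen to be treated better than another.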
The Consequences of PEX:
While we would like to delve further into the real-life consequences of PEX, this is just a pilot project of limited duration. It is certainly something that needs further academic exploration. Currently our colleague Kenny Doyle is interviewing unemployed people for his PhD research, so we know some of the consequences anecdotally. His work has allowed us a glimpse inside this system, where subcontractors to the DSP are now engaged to carry out the job-finding intervention services. The early indications (anecdotal rather than rigorously proven) are that these private companies may be giving more attention to those with a high PEX score (people for whom it would be easier to find a job) rather than to those with low PEX scores.
A second set of unintended consequences could be sanctions. The follow-up report from 2014 may have been an attempt by the authors to audit their own work, but it presents a worrying opportunity for sanctions to be imposed on those who do not leave the live register within the timeframe that the PEX algorithm says they should.
The above is a brief discussion of PEX, based on the project team’s discussions over the initial months of the research.
We have two ongoing streams of research at the moment:
One is to gain access to the DSP’s longitudinal datasets, which are administered by the CSO. Then we can use the PEX model on a sample of claimants and see the effects that adjustments to the model have on PEX scores.
The second is to conduct an ethnography of the algorithm. This is an emerging area of ethnography so the first stage is to conduct a literature review and then consider how this type of ethnography is possible. For this I am reading standard ethnography literature such as Van Maanen, literature on new forms of ethnographies, maths books, computing books, and design books. This is so I can understand the language, symbols and culture around an algorithm while pulling myself back from just concentrating on the social side. I will be putting some of my ideas and reports up here soon.
We are expecting to issue a call for papers soon for a Symposium to be held in December 2017. This will be designed to bring together anyone interested in unemployment for the future. We are also interested in expanding our research group to become more European focused and are looking for potential Horizon 2020 collaborators.
Understanding Unemployment in the Era of Big Data is funded through the IRC New Horizons Interdisciplinary Research Project Award. www.research.ie #loveirishresearch
Kavanagh, D., McGarry, S. & Kelly, S., 2015. Ethnography in and around an Algorithm. Athens, 30th EGOS Colloquium.
McGuinness, S., Kelly, E. & Walsh, J., 2014. Predicting the Probability of Long-Term Unemployment in Ireland using Administrative Data. [Online] Available at: https://www.esri.ie/publications/predicting-the-probability-of-long-term-unemployment-in-ireland-using-administrative-data-2/ [Accessed December 2016].
O’Connell, P. J., McGuinness, S., Kelly, E. & Walsh, J., 2009. National Profiling of the Unemployed in Ireland. [Online] Available at: https://www.esri.ie/publications/national-profiling-of-the-unemployed-in-ireland/ [Accessed December 2016].
Rosen, K. H., 1999. Discrete Mathematics and Its Applications. 4th ed. s.l.:WCB/McGraw-Hill.