18 Oct 2015, 7:30 AM

Ekstraklasa Patterns 1994-2015

This piece on the Ekstraklasa, the top level of Polish football (soccer), is a companion to my upcoming presentation at the Data+ conference in Warsaw at the end of November 2015. In it, data will be used to illustrate many of the patterns within the league over the last 21 seasons, from the 1994-95 campaign through the 2014-15 season. We will tell several different stories using this data, which comes from the us.soccerway.com website.

Before we begin, let's pause to consider a number of possible storylines we could pursue using the data. The following paragraphs highlight just a few of the many possibilities.

One simple storyline might be to explore the number of seasons per team over this 21-year period. Unlike many sports, soccer has a system where poor-performing teams are subject to demotion to a lower level of competition. It might prove interesting to understand which teams were present for all 21 seasons (if any), versus those teams that ascended to this level for only a few seasons. Another facet of this story could be to highlight teams that no longer exist, for financial or other business reasons.

Another angle might look at some of the more common metrics involved in determining success (or failure). In this dataset, we have a number of critical measures readily available, ranging from points per season to wins, draws, losses, goals scored and allowed, and rank. These can be examined within individual seasons, or as an aggregate over the 21-year period.

Yet another interesting view can be presented through the use of geographic data. In this case, we have dozens of teams from multiple cities that competed in at least one season. It may prove interesting to display each team at its unique geographic location, using latitude and longitude coordinates.

Each team also has a distinct logo we can use for easy visual identification. Coupling a logo with the team name beside it helps to create a much richer visualization, and can be especially useful for drawing in viewers via recognition of their favorite team's logo.

We have barely scraped the surface on available data, yet have quickly identified several potentially rich storylines. Other available data we could employ includes individual player statistics, game results, attendance levels, and more. There is also the possibility of merging external datasets to provide context to our story. In fact, our greatest challenge when telling this or any other story is what not to include. We want our story to be focused and easy to follow, and of course, we want to add multiple visual elements.

Defining Our Story Elements

After examining the data I spoke of earlier, as well as some additional possibilities, I have decided to refine the story into a few significant pieces. These are the elements that will be pulled together to tell an informative story about the Ekstraklasa and its respective teams over the last 21 seasons:

  1. Team statistics at the season level, including points, rank, and several other measures, will help us paint an overall picture of the level of success (or failure) of each team.

  2. Geographic data will allow us to place the story in context, giving it a local flavor that cannot be achieved through simple charts and tables. To do this effectively, we must use maps that will permit interactive exploration through panning and zooming. We may also choose to include other map layers to augment our story.

  3. Team logos or symbols will also be utilized to provide greater context and authenticity to the story. How and where these will be used is yet to be determined, but will become more apparent as we compose the various elements of the narrative.

  4. Additional elements may be added as the story progresses; I always like to leave a little room for new discoveries and improvisations as the story begins to unfold. Therefore, the final story may include timelines, photos, and other appropriate items that complement the story.

Ekstraklasa Overview

The Ekstraklasa was officially formed in 1927, and continues to be the top level for Polish football clubs. In recent years, the number of teams competing per season has typically been 16, although there has been some variance around that figure.

Teams are not guaranteed to remain in the league, as they can be demoted based on poor performance, or may leave based on business considerations if the club is unprofitable. This is why you will see only a handful of clubs who have fielded teams at this level for most or all of the 21 seasons between 1994 and 2015.

Team Profiles

In this section, the goal is to provide an overview of each team that participated in the Ekstraklasa for at least one season starting with the 1994-95 campaign. As part of telling a data-based story, we will view team locations using geocoded data. Precision levels for this data may vary, depending on availability, especially for inactive teams. Nonetheless, we will make our best effort to place each club accurately.

Two sections will follow as we attempt to provide a concise yet informative summary of each club. We'll look first at Team Summaries, followed by a section called Geographic Patterns.

So let's begin with a visual summary for each of the more than 40 teams to take the field over the 21-season period. For this section, we'll simply lay the teams out in alphabetical order while looking at three common measures; in subsequent sections, charts will often be sorted by team performance relative to their competitors. Not only will this provide readers with considerable information, it will also help illustrate the advantages of visual displays over more traditional text-based data.

Team Summaries

In this section we will take a brief look at each team using three simple measures: number of seasons played (out of 21 possible), seasons finished in 1st place, and seasons finished in the top three. In the interest of simplicity, we will disregard the post-season playoffs.
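Each of these three measures is a simple count over team-season records. As a minimal sketch, assuming the standings have already been collected as (season, team, final rank) rows, the tally could look like this in Python; the `Finish` record and the team names are hypothetical placeholders, not the actual data or pipeline behind this article:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Finish(NamedTuple):
    season: str  # e.g. "1994-95"
    team: str
    rank: int    # final regular-season position; 1 = champion (playoffs ignored)


def summarize(finishes: Iterable[Finish]) -> dict[str, dict[str, int]]:
    """Count seasons played, titles (rank 1) and top-3 finishes for each team."""
    summary = defaultdict(lambda: {"seasons": 0, "titles": 0, "top3": 0})
    for f in finishes:
        row = summary[f.team]
        row["seasons"] += 1
        row["titles"] += int(f.rank == 1)
        row["top3"] += int(f.rank <= 3)
    return dict(summary)


# Hypothetical rows, not real standings
rows = [
    Finish("1994-95", "Team A", 1),
    Finish("1995-96", "Team A", 3),
    Finish("1995-96", "Team B", 7),
]
print(summarize(rows)["Team A"])  # {'seasons': 2, 'titles': 1, 'top3': 2}
```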

Amica

Seasons: 11

Titles: 0

Top 3 Finishes: 1

Arka Gdynia

Seasons: 5

Titles: 0

Top 3 Finishes: 0

Belchatow

Seasons: 12

Titles: 0

Top 3 Finishes: 1

Cracovia Krakow

Seasons: 10

Titles: 0

Top 3 Finishes: 0

Dyskobolia Grodzisk

Seasons: 10

Titles: 0

Top 3 Finishes: 3

Gornik Leczna

Seasons: 5

Titles: 0

Top 3 Finishes: 0

Gornik Zabrze

Seasons: 20

Titles: 0

Top 3 Finishes: 0

Hutnik Nowa Huta

Seasons: 3

Titles: 0

Top 3 Finishes: 1

Jagiellona Bialystok

Seasons: 8

Titles: 0

Top 3 Finishes: 1

Katowice

Seasons: 10

Titles: 0

Top 3 Finishes: 2

Korona Kielce

Seasons: 9

Titles: 0

Top 3 Finishes: 0

KSZO

Seasons: 3

Titles: 0

Top 3 Finishes: 0

Lech Poznan

Seasons: 19

Titles: 1

Top 3 Finishes: 5

Lechia Gdansk

Seasons: 9

Titles: 0

Top 3 Finishes: 0

Legia Warszawa

Seasons: 21

Titles: 7

Top 3 Finishes: 17

Lodzki KS

Seasons: 10

Titles: 1

Top 3 Finishes: 1

Odra Wodzislaw Slaski

Seasons: 14

Titles: 1

Top 3 Finishes: 2

Orlen Plock

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Petrochemia

Seasons: 3

Titles: 0

Top 3 Finishes: 0

Piast Gliwice

Seasons: 5

Titles: 0

Top 3 Finishes: 0

Podbeskidzie

Seasons: 4

Titles: 0

Top 3 Finishes: 0

Pogon Szczecin

Seasons: 14

Titles: 0

Top 3 Finishes: 2

Polkowice

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Polonia Bytom

Seasons: 4

Titles: 0

Top 3 Finishes: 0

Polonia Warszawa

Seasons: 15

Titles: 1

Top 3 Finishes: 3

Radomsko

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Raków Czestochowa

Seasons: 4

Titles: 0

Top 3 Finishes: 0

Ruch Chorzow

Seasons: 16

Titles: 0

Top 3 Finishes: 4

Ruch Radzionków

Seasons: 3

Titles: 0

Top 3 Finishes: 0

Siarka Tarnobrzeg

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Slask Wroclaw

Seasons: 11

Titles: 1

Top 3 Finishes: 3

Sokol Pniewy

Seasons: 3

Titles: 0

Top 3 Finishes: 0

Stal Mielec

Seasons: 2

Titles: 0

Top 3 Finishes: 0

Stal Stalowa Wola

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Stomil Olsztyn

Seasons: 8

Titles: 0

Top 3 Finishes: 0

Swit

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Szczakowianka Jaworzno

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Warta Poznań

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Widzew Łódź

Seasons: 16

Titles: 2

Top 3 Finishes: 4

Wisla Krakow

Seasons: 19

Titles: 8

Top 3 Finishes: 13

Wisla Plock

Seasons: 19

Titles: 8

Top 3 Finishes: 13

Zagłębie Lubin

Seasons: 18

Titles: 1

Top 3 Finishes: 2

Zagłębie Sosnowiec

Seasons: 1

Titles: 0

Top 3 Finishes: 0

Zawisza Bydgoszcz

Seasons: 2

Titles: 0

Top 3 Finishes: 0

While this may have been an interesting exercise, made slightly more relevant by the inclusion of team logos, it was nonetheless lengthy and has multiple shortcomings. To borrow a concept from visualization legend Edward Tufte, this approach has a very poor data-ink ratio. Here are a few of the many limitations this format creates:

  1. It is difficult to compare team results without considerable effort
  2. There are no categorical rankings to provide context
  3. Nothing tells us when individual teams were more or less successful
  4. We cannot see any geographic patterns

As we will see shortly, a series of well-designed visuals (charts, maps, tables) will help us overcome these flaws.

Geographic Patterns

Another set of questions we can address, and perhaps answer, revolves around the geographic pattern of teams over this 21-year period. Through the use of season-level maps, we should be able to gain an understanding of any major changes that may have occurred. In simple terms, has the footprint of the Ekstraklasa changed? There are two principal ways for changes to occur:

  1. Poor performance may get a team demoted to a lower level of play. To return to the Ekstraklasa, that team will need to finish near the top of the lower level, so a very poor team may be demoted and never return.
  2. Financial considerations may also lead to the disappearance of an Ekstraklasa team. If the owners elect to fold the club, we will of course not see it return to the Ekstraklasa.
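Both kinds of change show up the same way in the data: a club present in one season's table is missing from the next. A minimal Python sketch, assuming we have a set of club names per season (the function name and the three-season sample are hypothetical), finds arrivals and departures between consecutive seasons. Note that it cannot tell relegation apart from a club folding; that distinction needs outside information.

```python
def league_changes(
    teams_by_season: dict[str, set[str]],
) -> list[tuple[str, set[str], set[str]]]:
    """For each pair of consecutive seasons, return (season, arrived, departed)."""
    # Labels like "1994-95" sort chronologically as strings because the start year leads.
    seasons = sorted(teams_by_season)
    changes = []
    for prev, curr in zip(seasons, seasons[1:]):
        arrived = teams_by_season[curr] - teams_by_season[prev]   # promoted or new clubs
        departed = teams_by_season[prev] - teams_by_season[curr]  # relegated or folded clubs
        changes.append((curr, arrived, departed))
    return changes


# Hypothetical three-season example, not real league membership
seasons = {
    "1994-95": {"Team A", "Team B", "Team C"},
    "1995-96": {"Team A", "Team B", "Team D"},
    "1996-97": {"Team A", "Team C", "Team D"},
}
for season, arrived, departed in league_changes(seasons):
    print(season, "in:", sorted(arrived), "out:", sorted(departed))
```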

So with these considerations in mind, we will now examine team maps to determine whether there have been any notable patterns taking place during our study period. Our goal is not simply to create some interesting visuals, but to use these maps to inform readers of any underlying patterns.

Using CartoDB, we can easily create multiple maps to understand a variety of patterns. Our first effort will be a temporal (time-based) geographic view of active clubs between the 1994-95 and 2014-15 seasons. You may navigate each of these maps using zoom and pan functionality.

Ekstraklasa Temporal Map by Club

There are a few patterns we can detect:

  1. There has been at least one active Warsaw team for the entire period
  2. Many teams are concentrated in the south central part of the country near the Czech border
  3. Fewer clubs are concentrated in the northern half of the country

Let's take another look, this time to understand the total number of seasons each club has spent in Ekstraklasa play. One simple approach is a static view of all clubs, with the clubs that played the most seasons highlighted.

Ekstraklasa Most Seasons Played Map

With this map, we can see that the teams with the most frequent Ekstraklasa participation are scattered around the country - Warsaw, Krakow, and elsewhere. There does not appear to be a bias toward the largest cities, at least in terms of number of seasons played. Later on, we'll see if this holds true for success in the standings.

Our third and final map clusters the teams by geography, so we can really understand where the most seasons have been played.

Ekstraklasa Geographic Cluster Map

Now the pattern suggested earlier becomes clear: the most seasons played are concentrated in the south central part of the country. This makes sense, given the large number of Ekstraklasa teams in this area over the 21-year period. We also see the influence of the Warsaw clubs, with two teams (Legia and Polonia) accounting for 36 of a possible 42 seasons. To see additional detail, double click any cluster to explore the teams that make up the group, or use the search capability to explore further.
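CartoDB computes its clusters internally, and its exact method is not described here. To make the arithmetic behind a cluster total concrete, the sketch below swaps in a much simpler technique, grid binning: clubs are assigned to one-degree latitude/longitude cells and their seasons are summed. The coordinates are approximate city-center values (an assumption, not the geocoded locations used in the maps); the season counts come from the team summaries above.

```python
import math
from collections import Counter


def cluster_seasons(
    clubs: list[tuple[str, float, float, int]], cell_deg: float = 1.0
) -> Counter:
    """Sum seasons played per grid cell; keys are (row, col) cell indices."""
    totals: Counter = Counter()
    for _name, lat, lon, seasons in clubs:
        # Clubs whose coordinates fall in the same cell_deg x cell_deg square share a key.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        totals[cell] += seasons
    return totals


# Approximate city-center coordinates (assumed); seasons from the team summaries
clubs = [
    ("Legia Warszawa", 52.23, 21.01, 21),
    ("Polonia Warszawa", 52.23, 21.01, 15),
    ("Wisla Krakow", 50.06, 19.94, 19),
    ("Cracovia Krakow", 50.06, 19.94, 10),
]
totals = cluster_seasons(clubs)
print(totals[(52, 21)])  # 36, of a possible 2 * 21 = 42 Warsaw seasons
```

A smaller `cell_deg` splits clubs into finer clusters, roughly what happens as you zoom into the map.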

As these brief examples illustrate, there are many ways to analyze geographic data using maps.

Aggregate Patterns, 1994-2015

This next section is where we really get into the performance of each team within the Ekstraklasa, examining teams by final rank, points, wins, goals, and more. Using charts to visually track these metrics will lead to greater insights than viewing the same data in a series of tables. There are times when tables are quite appropriate, and even preferred, but in most instances we can make our point more clearly with the proper chart type.

Rankings

The most critical variable for measuring a club's success is where it finishes at the end of the season. We are not looking at the post-season in this story, so the regular season finish is our best indicator of success or failure. All other measures (points, goals for and against, and so on) contribute to a club's final rank in the standings.

We'll initially look at this in aggregate, showing how often individual teams finished in a given position in the standings. A good way to do this is with a modified bubble plot; the modification is placing clubs (rather than numbers) along the x-axis. The y-axis will hold values from 1 through 18 to reflect annual positions in the standings. To make the chart intuitive, the axis should be inverted, since 1 represents first place. The chart is read from top to bottom, so a high proportion of values near the top reflects a high degree of success.
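The data behind this bubble plot is simply a count of how many times each club finished in each position. A minimal Python sketch with hypothetical rows; drawing is left to the charting library (in matplotlib, for instance, `Axes.invert_yaxis()` puts rank 1 at the top):

```python
from collections import Counter


def rank_frequency(finishes: list[tuple[str, str, int]]) -> Counter:
    """Count (team, final rank) pairs; each count sizes one bubble."""
    return Counter((team, rank) for _season, team, rank in finishes)


# Hypothetical (season, team, rank) rows, not real standings
finishes = [
    ("1994-95", "Team A", 1),
    ("1995-96", "Team A", 1),
    ("1996-97", "Team A", 4),
    ("1995-96", "Team B", 16),
]
freq = rank_frequency(finishes)
# One bubble per (team, rank): x = team, y = rank, size = count
for (team, rank), count in sorted(freq.items()):
    print(team, rank, count)
```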

Ranking Frequency by Team, 1994-2015

This chart enables us to see at a glance which clubs have had the greatest frequency of high or low positions in the standings over our 21-year period. Splitting the clubs into three groups based on their number of Ekstraklasa seasons makes the results easy to navigate. We can also customize the display by selecting only the clubs we wish to compare.

Next we'll move into some of the specific measures that will help explain how the positional rankings are achieved.

Points

We will begin by examining some scatter plots showing the relationships between points and other metrics. Mouse over any of the points in the following charts to see which teams are represented.

Average Wins & Points by Team, 1994-2015

As we might have expected, the relationship between wins and points is nearly linear. Since each win is worth 3 points (in every season except 1994-95, as we will see later), this makes perfect sense. Note the two clubs at the upper right of the chart, traditional powers Legia Warszawa and Wisla Krakow. These two have clearly been the top regular-season clubs over our 21-season period. While wins have the expected linear relationship with points, what about draws? Do successful teams collect the single point from a draw more often than unsuccessful clubs? Let's have a look.
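The near-straight line follows directly from how points are awarded: a win adds a fixed amount, a draw adds one point, and a loss adds nothing, so points are a linear function of wins, with draws nudging teams slightly off the line. A small Python sketch of that formula with a hypothetical season record; the win value is a parameter because, as noted later in this section, 1994-95 awarded only 2 points per win:

```python
def season_points(wins: int, draws: int, points_per_win: int = 3) -> int:
    """League points from a season record: a draw earns 1 point, a loss earns 0."""
    return points_per_win * wins + draws


# Hypothetical record: 20 wins, 6 draws (losses contribute nothing)
print(season_points(20, 6))                    # 66
print(season_points(20, 6, points_per_win=2))  # 46, under the 2-points-for-a-win rule
```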

Average Draws & Points by Team, 1994-2015

Now we see a slightly positive relationship, but much less linear than for wins. In fact, the top point teams, Legia Warszawa and Wisla Krakow, have only modest numbers of draws per season yet far outrank their competitors in total points. We do see the worst clubs having few draws, suggesting they are rarely able to hold better clubs to an even match.

Another metric that should be a strong predictor of success is goal differential (goals scored minus goals allowed). Let's plot it against points to see the strength of the relationship, using individual team-season combinations.

Differential & Points by Team, 1994-2015

As expected, a strongly linear relationship exists between differential and points. In most cases, the clubs with the greatest goal differentials earn the most points in a season. Very few cases stray far from the trend line, primarily a group of teams with slightly positive differentials yet very few points. These examples may be worth investigating for a common element, but we will not pursue that now. Two clubs stand out at the upper right of the chart, and both turn out to be from the 1995-96 season: Widzew Lodz and Legia Warszawa.
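To put a number on how linear the relationship is, we can compute Pearson's correlation coefficient and a least-squares trend line of points on goal differential. The sketch below uses only Python's standard library; the team-season rows are hypothetical, and the charting tool behind the scatter plot may fit its trend line differently:

```python
import math
from statistics import fmean


def pearson_and_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Return Pearson's r and the slope/intercept of the least-squares line."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope, my - slope * mx


# Hypothetical team-seasons: (goals for, goals against, points)
rows = [(60, 25, 70), (45, 40, 48), (38, 44, 40), (30, 55, 28)]
differentials = [gf - ga for gf, ga, _pts in rows]  # goals scored minus goals allowed
points = [pts for _gf, _ga, pts in rows]
r, slope, intercept = pearson_and_fit(differentials, points)
print(f"r = {r:.3f}; points ~ {slope:.2f} * differential + {intercept:.1f}")
```

An r close to 1 is what "very few cases stray far from the trend line" looks like numerically.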

Wins & Points by Team, 1994-2015

We should also expect a strong positive relationship between wins and points at an individual season level. Recall our earlier chart showing such a pattern using averages by team. Now let's see what happens at the more discrete team-season level.

Once again the results are strongly positively correlated, despite a modest amount of variation at each level of wins. Notice the group of lower point totals across win levels; these appear to be outliers, but in fact provide some interesting information. Each of these points turns out to be from the 1994-95 season, when a win was worth just 2 points. Every other season in the analysis awards 3 points for a win, leading to a more consistent pattern in the chart. Once again, the two markers at the upper right represent the same 1995-96 clubs we saw in the previous chart: Widzew Lodz and Legia Warszawa.
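Since only the value of a win changed, the 1994-95 totals can be restated on the 3-point basis by adding one extra point per win, which would pull those markers back onto the main trend. A minimal Python sketch; the function name and the record are hypothetical, and the charts in this piece plot the raw totals:

```python
def to_three_point_basis(points: int, wins: int, points_per_win: int) -> int:
    """Restate a season total as if every win had been worth 3 points.

    Draws (1 point) and losses (0 points) are unchanged, so only wins need adjusting.
    """
    return points + (3 - points_per_win) * wins


# Hypothetical 1994-95 record: 15 wins and 9 draws under the 2-point rule
old_total = 2 * 15 + 9  # 39 points as originally awarded
print(to_three_point_basis(old_total, wins=15, points_per_win=2))  # 54
```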

Before we move to visualizing individual seasons, there is one additional angle to explore. The next section will use heatmaps to examine overall patterns across time. Heatmaps should allow us to detect variations by category across time, and may provide some insight into overall trends.

Statistical Heatmaps

In this section, we're going to take a look at seasonal patterns in the aggregate. Our aim is to understand macro-level patterns by team, which should help confirm some of the patterns observed in the previous charts. With a heatmap, we can show multiple measures in a single matrix-style chart, placing teams along the y-axis and measures along the x-axis.

In this instance, white represents low values, while red represents the high end of each measure. Red is desirable in positive categories, such as points, wins, and goals scored, but it is a negative for categories such as goals allowed and losses. All variables are rescaled relative to their averages so that measures on different scales can be compared consistently. Here's what we see:
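The exact rescaling is not spelled out, so the following Python sketch assumes one simple option: divide each team's value by the league-wide average for that measure, so 1.0 means exactly average. Team names and values are hypothetical. With this scheme a high goals-allowed index is shaded red just like a high points index, which is why red reads as good in some columns and bad in others:

```python
from statistics import fmean


def index_to_average(table: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Express each team's value as a ratio to that measure's average across teams.

    table maps team -> {measure: value}; every team must have every measure.
    """
    measures = next(iter(table.values())).keys()
    averages = {m: fmean(row[m] for row in table.values()) for m in measures}
    return {
        team: {m: row[m] / averages[m] for m in measures}
        for team, row in table.items()
    }


# Hypothetical per-season averages for three teams
table = {
    "Team A": {"points": 60.0, "goals_against": 30.0},
    "Team B": {"points": 45.0, "goals_against": 40.0},
    "Team C": {"points": 30.0, "goals_against": 50.0},
}
for team, row in index_to_average(table).items():
    print(team, {m: round(v, 2) for m, v in row.items()})
```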

Not surprisingly, we see solid red for both Wisla Krakow and Legia Warszawa on key variables such as Wins, Points, and Differential, confirming what the previous charts showed. The beauty of this heatmap lies in its ability to provide insight across multiple variables for dozens of teams, with the eye naturally drawn toward extreme values by their color.

While we have explored just a handful of possibilities, the previous set of charts has nonetheless provided us with some new insights about aggregate patterns in the data.

Charting Individual Seasons

To this point, we've been concentrating primarily on aggregate data. Now it's time to go one level deeper to understand the individual season results that feed the aggregate patterns. This is where we can gain a much better understanding of the ups and downs of each team, and see whether recent dominant clubs are very different from those of 15 or 20 years earlier. To do this, we'll examine a few basic categories, starting with season standings.

Season Metrics

We could share this information with a variety of charts: bar charts, line charts, dotplots, and tables, to mention a few options. Our goal is to provide the maximum amount of knowledge in the simplest form possible, a bit of an Occam's Razor approach. The volume of data is a concern in this case: we have 21 seasons, typically with 16 teams per season. That translates to more than 300 team-season data points to encode, so our first question is whether a single visualization can display this data effectively.

Another question we need to address is this - do we wish to understand our information at a team or chronological level? Our answer to this question will help dictate our display type. In this case, we have the luxury of viewing the data through each of these lenses, which provides an opportunity to use multiple approaches.

In the case of a season-based approach, we can use some interesting techniques to share information. For example, let's have a look at some box plots that summarize information by season, while still allowing us to see individual data points. As noted earlier, this will not provide insight into any team-level trends, but it will answer questions about overall patterns by season. One of the goals here is to determine whether the competitive balance has changed over the 21-season period.
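Each box is drawn from a five-number summary of one season's team totals, and the interquartile range (IQR), the height of the box, gives a single number for how tightly packed the league was. A Python sketch using the standard library; the `"inclusive"` quartile method is an assumption (charting libraries differ in how they compute quartiles), and the two eight-team seasons are hypothetical:

```python
from statistics import quantiles


def box_summary(values: list[float]) -> dict[str, float]:
    """Five-number summary behind one season's box plot, plus the IQR.

    A smaller IQR means the middle half of the league finished closer together,
    i.e. better competitive balance by this measure.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "median": median,
        "q3": q3, "max": max(values), "iqr": q3 - q1,
    }


# Hypothetical point totals for two eight-team seasons
spread_out = [78, 70, 55, 48, 41, 35, 27, 18]
compact = [58, 54, 51, 49, 46, 44, 41, 38]
print(box_summary(spread_out)["iqr"], box_summary(compact)["iqr"])
```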

Points Box Plots by Season

Some of the early seasons, especially through the 1997-98 campaign, show very high levels of variation, with some weak clubs having very low points relative to the average, while a few teams have very high point totals. In other words, the competitive balance is weak, with a great disparity from top to bottom. On to goal differential, to determine whether this pattern holds.

Differential Box Plots by Season

Once again, we see a similar pattern with the early seasons. Notice also the tighter distribution in the more recent seasons, from about 2009 to the present. This suggests that the competitive balance has improved, with a smaller disparity from top to bottom, at least as measured by differential.

Goals Scored Box Plots by Season

Goals scored show a similar pattern, with a much closer top-to-bottom distribution in recent seasons. Everything points to greater competitive balance, with smaller differences from top to bottom.

Goals Allowed Box Plots by Season

Finally, the goals allowed measure displays a very similar trend, with the recent seasons showing a tight distribution, at least relative to the earlier seasons. All four measures suggest an increase in competitive balance that should encourage some of the traditionally less competitive teams.

Conclusion

I hope this article has helped shed some light on the Ekstraklasa and its teams over the last two decades, as well as on some different ways we can visually display this information. Data storytelling is an emerging field, enabled by many of the great open source tools available today. Tools such as rCharts, Dimple, Polycharts, D3, NVD3, and CartoDB allow us to convert raw data into more meaningful visual stories with many opportunities for user interaction.


