Soccer Table in Haskell: Core Logic

Part 2: Implementing the Library

In the first part of this series, the project was set up using Cabal. In this second part, the library code shall be written. The library shall only contain pure, i.e. side-effect free functions. Reading the files and printing the table will be part of the executable—to be described in the next article.

Working with GHCi

Haskell is a compiled language, but thanks to GHCi, it can be programmed using a REPL. Cabal integrates well into this REPL, which can be started as follows from the project folder:

cabal repl

In the REPL, the library module SoccerTable can be imported as follows (λ is my prompt, as customized in ~/.ghci):

λ import SoccerTable
λ greet "you"
"Hello, you!"

Having written and exported an additional function called thank:

module SoccerTable (greet, thank) where

greet :: String -> String
greet whom = "Hello, " <> whom <> "!"

thank :: String -> String
thank person = "Thank you, " <> person <> "."

The code can simply be reloaded using :r (short for :reload) in GHCi:

λ :r
λ thank "Joe"
"Thank you, Joe."

This is a convenient setup to implement all required data structures and functions one by one interactively.

The library shall be implemented in a bottom-up manner by identifying sub-problems, solving them one by one, and then integrating those parts into an overall solution, providing a high-level API.

Parsing Game Results

The first sub-problem is the parsing of game results, which look like this:

Liverpool 2:3 Leeds United

This problem could be resolved using string split operations, but a regular expression is more expressive. Haskell offers various implementations (backends) for a common regex API (frontend). In our case, POSIX regular expressions are sufficient.

Two libraries—regex-base and regex-posix—need to be made available in the library section of the Cabal file (soccer-table.cabal):

library
  import:           compilation
  exposed-modules:  SoccerTable
  build-depends:    base
                  , regex-base
                  , regex-posix
  hs-source-dirs:   src

The version numbers have been omitted yet and shall be added once the entire application is in a working condition.

The regex-posix library provides the =~~ operator, which returns a tuple of four values, of which the fourth contains the captured groups. This operator is imported as follows:

import Text.Regex.Posix ((=~~))

The result shall be put into a record called GameResult:

data GameResult = GameResult
  { homeTeam :: String
  , awayTeam :: String
  , homeGoals :: Int
  , awayGoals :: Int
  }
  deriving (Show)

It is made an instance of Show so that its result can be displayed on the REPL. The function fromRawResult attempts to parse a raw result, i.e. a string denoting a game result. If this works, a GameResult is returned; otherwise, nothing is returned. The Maybe type allows for both variants:

fromRawResult :: String -> Maybe GameResult
fromRawResult result =
  let
    matches :: Maybe (String, String, String, [String])
    matches = result =~~ "^(.+) ([0-9]+):([0-9]+) (.+)$"
  in
    case matches of
      Just (_, _, _, [hT, hG, aG, aT]) -> Just GameResult
        { homeTeam = hT
        , awayTeam = aT
        , homeGoals = read hG :: Int
        , awayGoals = read aG :: Int
        }
      _ -> Nothing

The regular expression ^(.+) ([0-9]+):([0-9]+) (.+)$ states:

  1. Any characters at the beginning of the string are matched and captured.
  2. After a space, a number consisting of at least a single digit is captured.
  3. There is a colon in the middle of the string.
  4. Another number is captured.
  5. After a space, the rest of the string is matched and captured.

If the result argument matches, a GameResult is built up from the result and returned. The numbers are read as Int. If the result argument does not match, Nothing is returned.

A demonstration in GHCi shows its workings:

λ fromRawResult "Liverpool 2:3 Leeds United"
Just (GameResult {homeTeam = "Liverpool", awayTeam = "Leeds United",
                  homeGoals = 2, awayGoals = 3})
λ fromRawResult "Unparsable Game Result"
Nothing

Table Entries

Every line of a league table represents the team’s result up to a certain round. From a single game result, two table entries can be calculated: one for each team.

The type TableEntry is defined to hold all relevant columns:

data TableEntry = TableEntry
  { rank :: Int
  , name :: String
  , points :: Int
  , won :: Int
  , tied :: Int
  , lost :: Int
  , scored :: Int
  , conceded :: Int
  , difference :: Int
  }
  deriving (Eq, Show)

Those columns and their meaning have already been discussed in the first part. TableEntry is made an instance of Eq (for equality checks) and Show (for debugging output). It will be made an instance of Ord (for ordering) later.

But first, two TableEntry values, each representing a single round for its respective team, shall be created out of a GameResult:

fromGameResult :: GameResult -> [TableEntry]
fromGameResult result =
  let
    (hG, aG) = (homeGoals result, awayGoals result)
    (hP, aP, hW, aW, hT, aT, hL, aL) =
      case (compare hG aG) of
        LT -> (0, 3, 0, 1, 0, 0, 1, 0)
        EQ -> (1, 1, 0, 0, 1, 1, 0, 0)
        GT -> (3, 0, 1, 0, 0, 0, 0, 1)
  in
    [ TableEntry
      { rank = 0
      , name = (homeTeam result)
      , points = hP
      , won = hW
      , tied = hT
      , lost = hL
      , scored = hG 
      , conceded = aG 
      , difference = hG - aG 
      }
    , TableEntry
      { rank = 0
      , name = (awayTeam result)
      , points = aP
      , won = aW
      , tied = aT
      , lost = aL
      , scored = aG
      , conceded = hG
      , difference = aG - hG
      }
    ]

The values for home/away goals are extracted and then compared. If there are fewer home goals than away goals, the away team won. If both teams scored the same amount of goals, the game was tied. And if there are more home goals than away goals, the home team won.

For each outcome (LT, EQ, GT), a tuple of according values are assigned: home points, away points, home wins, away wins, home tied, away tied, home lost, and away lost. Three points are given to the winning team, and zero to the losing one. Or one point is given to each team in case of a draw.

Those values are then assigned to the two TableEntry compound values, which are returned in a list: the first element representing the home team’s entry, the second one that one of the away team. Notice how the goal difference is calculated the opposite way for the home and away team.

This function is rather simple, but error-prone due to its density nonetheless, so an interactive test comes in handy. First, all three scenarios need to be created:

λ Just homeWin = fromRawResult "Liverpool 2:3 Leeds United"
λ Just awayWin = fromRawResult "Liverpool 1:4 Leeds United"
λ Just noWin = fromRawResult "Liverpool 2:2 Leeds United"

The Maybe is unwrapped by matching the result directly to the Just variant. The fromGameResult expects a GameResult, not a Maybe GameResult. (Dealing with those kinds of issues might be subject to a fifth article in the series.)

Now those parsed results shall be processed and checked:

λ fromGameResult homeWin
[TableEntry {rank = 0, name = "Liverpool", points = 0, won = 0, tied = 0, lost = 1, scored = 2, conceded = 3, difference = -1}
,TableEntry {rank = 0, name = "Leeds United", points = 3, won = 1, tied = 0, lost = 0, scored = 3, conceded = 2, difference = 1}]
λ fromGameResult awayWin
[TableEntry {rank = 0, name = "Liverpool", points = 0, won = 0, tied = 0, lost = 1, scored = 1, conceded = 4, difference = -3}
,TableEntry {rank = 0, name = "Leeds United", points = 3, won = 1, tied = 0, lost = 0, scored = 4, conceded = 1, difference = 3}]
λ fromGameResult noWin
[TableEntry {rank = 0, name = "Liverpool", points = 1, won = 0, tied = 1, lost = 0, scored = 2, conceded = 2, difference = 0}
,TableEntry {rank = 0, name = "Leeds United", points = 1, won = 0, tied = 1, lost = 0, scored = 2, conceded = 2, difference = 0}]

Which looks correct!

Merging Rounds

Each of those TableEntry compound values represents a league table entry for a single round. In order to build up the table for the entire season, all the TableEntry values belonging to the same team need to be merged.

This can be done by simply adding up all the numbers—but only if the team name of the two entries match:

merge :: TableEntry -> TableEntry -> Maybe TableEntry
merge left right =
  if (name left) == (name right)
  then Just TableEntry
  { rank = 0
  , name = (name left)
  , points = (points left) + (points right)
  , won = (won left) + (won right)
  , tied = (tied left) + (tied right)
  , lost = (lost left) + (lost right)
  , scored = (scored left) + (scored right)
  , conceded = (conceded left) + (conceded right)
  , difference = (difference left) + (difference right)
  }
  else Nothing

In order to make that work, the TableEntry values must be grouped by their team name. A map (dictionary, associative array) is a good fit for this problem: the team name shall serve as the key, and its accumulative TableEntry as the value.

Haskell offers a Map data type, for which the containers library must be made available in the Cabal file:

library
  import:           compilation
  exposed-modules:  SoccerTable
  build-depends:    base
                  , containers
                  , regex-base
                  , regex-posix
  hs-source-dirs:   src

The Map and its functions shall be made available under the symbol M:

import qualified Data.Map as M

This map is built up by merging each team’s TableEntry values one by one, which is carried further as an accumulator to be stored as the map’s value:

mergeAll :: [TableEntry] -> (M.Map String TableEntry)
mergeAll entries =
  foldl combine (M.fromList []) entries
  where
    combine :: (M.Map String TableEntry) -> TableEntry -> (M.Map String TableEntry)
    combine acc e =
      case M.lookup (name e) acc of
        Just x ->
          case (merge e x) of
            Just f -> M.insert (name f) f acc
            Nothing -> acc
        Nothing -> M.insert (name e) e acc

The functoin mergeAll expects a list of TableEntry values and returns a map (M.Map) with String as its key and TableEntry as its value type.

The entries are folded together using the combine auxiliary function defined in the where block, which integrates an additional TableEntry value into the map.

If the team is already represented in the map, i.e. if looking its name up in the map yields a result (Just x), the existing and the new TableEntry values are combined using the merge function discussed further above.

Otherwise, if the team is not in the map yet, the current entry is just put into the map.

An interactive test is due; shorter team names are used for the sake of readability:

λ rawResults = ["A 2:1 B", "B 3:0 C", "C 0:0 A"]
λ dayEntries = concat [fromGameResult r | Just r <- map fromRawResult rawResults]
λ mergeAll dayEntries
fromList
[("A",TableEntry {rank = 0, name = "A", points = 4, won = 1, tied = 1, lost = 0, scored = 2, conceded = 1, difference = 1})
,("B",TableEntry {rank = 0, name = "B", points = 3, won = 1, tied = 0, lost = 1, scored = 4, conceded = 2, difference = 2})
,("C",TableEntry {rank = 0, name = "C", points = 1, won = 0, tied = 1, lost = 1, scored = 0, conceded = 3, difference = -3})]

The rawResults are parsed and converted to TableEntry values in a list comprehension. The resulting nested list is flattened using concat, which then can be processed using mergeAll.

The result looks good: Team A won once and tied once; team B won once and lost once, and team C lost once and tied once. The individual entries are sound. But the proper order only happened by chance.

Soring the Table

The table needs to be ordered by four criteria: 1) points, 2) goal difference, 3) number of wins (all descendingly), and 4) name (ascendingly).

This is achieved by implementing TableEntry as an instance of Ord, for which the compare function must be implemented, which indicates how the entries are ordered ascendingly:

instance Ord TableEntry where
  compare l r = compare
    (points l, difference l, won l, name r)
    (points r, difference r, won r, name l)

This sorting function simply delegates the sorting of the relevant attributes in the proper order using a tuple.

Note that the first tuple contains the three numeric values of the left argument and the name of the right argument, while the second tuple contains the three numeric values of the right argument and the name of the left argument. This is because the numeric values shall be sorted in descendingly, while the name shall be sorted ascendingly. (The compare function defines an ascending order, which neesd to be reversed later.)

For clarification, consider the following interactive examples of two-element tuples being compared:

λ compare ('a', 1) ('a', 0)
GT

If the first element ('a' and 'a') is the same for both tuples, the second element is (1 and 0) is used for sorting. GT reads as ('a', 1) > ('a', 0).

λ compare ('a', 1) ('b', 0)
LT

Here, the left’s first argument is less than the second’s, so LT reads as ('a', 1) < ('b', 0); the second element is irrelevant.

λ compare ('a', 0) ('a', 0)
EQ

And finally, if two equal tuples are compared, the result is EQ, i.e. equal.

With the sorting of the TableEntry values set, the pieces (sub-solutions) can be combined to a whole.

Calculate the Table

A high-level API is needed to process a list of strings to a list of table entries. This function (calculateTable) looks like this:

calculateTable :: [String] -> [TableEntry]
calculateTable rawResults =
  let
    nested = concat [fromGameResult x | Just x <- map fromRawResult rawResults]
  in
    calcRank . reverse . L.sort . M.elems . mergeAll $ nested
  where
    calcRank = map (\(r, e) -> e { rank = r }) . zip [1..]

The definition of nested in the let part is similar to the demonstration of the mergeAll function in GHCi further above.

The in block applies compositions (using the compolsition operator .), which needs to be read from the right to the left:

  1. mergeAll: The individual table entries are merged.
  2. M.elems: The values are extracted from the map.
  3. L.sort: The table entries are sorted ascendingly.
  4. reverse: The table order is reversed to achieve the descending order (of points, goal difference, wins).
  5. calcRank: The ordered table is zipped with a sequence of numbers (1, 2, …) to assign the proper rank to each entry.

The sort function needs to be imported:

import qualified Data.List as L (sort)

Then a final test is due in GHCi:

λ calculateTable ["A 2:1 B", "B 3:0 C", "C 0:0 A"]
[TableEntry {rank = 1, name = "A", points = 4, won = 1, tied = 1, lost = 0, scored = 2, conceded = 1, difference = 1}
,TableEntry {rank = 2, name = "B", points = 3, won = 1, tied = 0, lost = 1, scored = 4, conceded = 2, difference = 2}
,TableEntry {rank = 3, name = "C", points = 1, won = 0, tied = 1, lost = 1, scored = 0, conceded = 3, difference = -3}]

All those functions and data types are exported by the following module definition on top, so that they can be tested interactively in GHCi.

module SoccerTable
( GameResult
, fromRawResult
, TableEntry
, fromGameResult
, merge
, mergeAll
, calculateTable
)
where
-- import statements

The full code of src/SoccerTable.hs can be found on GitHub.

Conclusion and Next Step

The library logic has been implemented using regular expressions, some custom data structures (GameResult, TableEntry), a Map, and Haskell’s built-in sorting facilities. A table can be calculated from a list of strings denoting game results.

Now those game results need to be read from a folder containing multiple text files. Having processed them to a list of TableEntry values, the table needs to be printed in a proper format. This will be done in the third article of this series.