BP_Kvapil_Ondřej_2020.tex

% vim: set tabstop=4 shiftwidth=4

% Neovim setup:
% let g:vimtex_compiler_latexrun = {
%	\ 'build_dir' : '',
%	\ 'options' : [
%	\   '-verbose-cmds',
%	\   '-xelatex',
%	\   '-shell-escape',
%	\   '-interaction=nonstopmode',
%	\   '-synctex=1',
%	\   '-file-line-error',
%	\   '--latex-args="-shell-escape"',
%	\ ],
%	\}

% options:
% thesis=B bachelor's thesis
% thesis=M master's thesis
% czech thesis in Czech language
% english thesis in English language
% hidelinks remove colour boxes around hyperlinks

\documentclass[thesis=B,english]{FITthesis}[2019/12/23]

\usepackage[utf8]{inputenc} % LaTeX source encoded as UTF-8
% \usepackage[latin2]{inputenc} % LaTeX source encoded as ISO-8859-2
% \usepackage[cp1250]{inputenc} % LaTeX source encoded as Windows-1250

% \usepackage{subfig} %subfigures
% \usepackage{amsmath} %advanced maths
% \usepackage{amssymb} %additional math symbols

\usepackage{dirtree} %directory tree visualisation
\usepackage{morewrites}
\usepackage{xcolor}
\usepackage{blindtext}
\usepackage{footnote}
\usepackage{tabularx}
\usepackage{ragged2e}
\usepackage{booktabs}
\usepackage{microtype}
\usepackage[numbers]{natbib}
\usepackage[htt]{hyphenat}
\usepackage{xparse,minted}
% taken from
% https://tex.stackexchange.com/questions/161124/how-to-make-a-minted-code-listing-centered-on-a-page
\RecustomVerbatimEnvironment{Verbatim}{BVerbatim}{}

\usepackage{tikz}
\usepackage{tcolorbox}
\tcbuselibrary{breakable}
\tcbuselibrary{fitting}
\tcbuselibrary{minted}

\usetikzlibrary{
	shapes.multipart, arrows, positioning
}


% custom commands
\newcommand{\eg}{\emph{e.g.}\xspace}
\newcommand{\ie}{\emph{i.e.}\xspace}
\newcommand{\todo}[1]{\textcolor{red}{\textbf{[[#1]]}}}
\newcommand{\blind}[1][1]{\textcolor{gray}{\Blindtext[#1][1]}}
\newcommand{\citationNeeded}{\textcolor{red}{\textbf{[citation needed]}}}
\newcommand{\hackage}[1]{\texttt{#1}}
\newcommand{\hsSignature}[1]{\hsCode{#1}}
\newcommand{\hsPat}[1]{\texttt{#1}}
\newcommand{\hsType}[1]{\texttt{#1}}
\newcommand{\hsIdent}[1]{\texttt{#1}}
\newcommand{\hsModule}[1]{\texttt{#1}}
\newcommand{\hsTC}[1]{\texttt{#1}}
\newcommand{\hsCode}[1]{\mintinline[
	breakbytokenanywhere,breaklines,escapeinside=&&,mathescape=true
]{haskell}{#1}}

% tabularx customisation
\newcolumntype{L}{>{\RaggedRight\arraybackslash}X}


% list of acronyms
\usepackage[acronym,nonumberlist,toc,numberedsection=autolabel]{glossaries}
\iflanguage{czech}{\renewcommand*{\acronymname}{Seznam pou{\v z}it{\' y}ch zkratek}}{}
\makeglossaries

% \newacronym{CVUT}{{\v C}VUT}{{\v C}esk{\' e} vysok{\' e} u{\v c}en{\' i} technick{\' e} v Praze}
% \newacronym{FIT}{FIT}{Fakulta informa{\v c}n{\' i}ch technologi{\' i}}
\newacronym{ghc}{GHC}{Glasgow Haskell Compiler}
\newacronym{ghci}{GHCi}{\acrshort{ghc} interpreter}
\newacronym{rts}{RTS}{Runtime System}
\newacronym{stm}{STM}{Software Transactional Memory}
\newacronym{ui}{UI}{User Interface}
\newacronym{ffi}{FFI}{Foreign Function Interface}
\newacronym{th}{TH}{Template Haskell}
\newacronym{os}{OS}{Operating System}
\newacronym{llvm}{LLVM}{Low-Level Virtual Machine}
\newacronym{syb}{SYB}{Scrap Your Boilerplate}
\newacronym{adt}{ADT}{Algebraic Data Type}
\newacronym{gadt}{GADT}{Generalised Algebraic Data Type}
\newacronym{repl}{REPL}{Read-Eval-Print Loop}
\newacronym{ast}{AST}{Abstract Syntax Tree}
\newacronym{ir}{IR}{Intermediate Representation}
\newacronym{csv}{CSV}{Comma-Separated Values}
\newacronym{api}{API}{Application Programming Interface}
\newacronym{rhs}{RHS}{Right-Hand Side}
\newacronym{hpc}{HPC}{Haskell Program Coverage}
\newacronym{stg}{STG}{Spineless Tagless G-machine}
\newacronym{gnu}{GNU}{GNU's Not Unix, a Unix-like operating system}
\newacronym{hls}{HLS}{Haskell Language Server}
\newacronym{bco}{BCO}{Byte Code Object}
\newacronym{lsp}{LSP}{Language Server Protocol}
\newacronym{tso}{TSO}{Thread State Object}
\newacronym{whnf}{WHNF}{Weak Head Normal Form}


% % % % % % % % % % % % % % % % % % % % % % % % % % % % % %
% EDIT THIS
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

\department{Programming Research Laboratory}
\title{Haskell Dynamic Tracing}
\authorGN{Ondřej} %author's given name/names
\authorFN{Kvapil} %author's surname
\author{Ondřej Kvapil} %author's name without academic degrees
\authorWithDegrees{Ondřej Kvapil} %author's name with academic degrees
\supervisor{Ing. Filip Křikava, Ph.D.}
\acknowledgements{
	I would like to thank Ing. Filip Křikava, Ph.D. and professor Jan Vitek for
	not only making this work possible, but for being open, friendly, and
	supportive throughout the whole process. Special thanks goes to Artem
	Pelenitsyn and Aviral Goel who provided me with advice and direction in my
	work.  Additional thanks goes to Benedict Allen, Abigail Magalhães de
	Alcantara, and Jonathan Coates, who have had a greater influence on my love
	of programming languages than they might think, and to my friends and
	family for an endless supply of entertainment.
}
\abstractEN{
	Haskell is one of the most well-known instances of a programming language
	that uses non-strict semantics. On the one hand, this brings the
	convenience of infinite data structures, user-defined control flow, and the
	possibility to avoid unnecessary computation. On the other hand, these
	benefits are hampered by the runtime overhead and hard-to-predict the
	behaviour of call-by-need. This begs the question: \emph{Is laziness worth
	it?} To answer this question, we need to understand how laziness is used in
	the wild. To this end, we develop a tool for dynamic analysis used to trace
	the evaluation of function parameters. It is implemented as a compiler
	plugin for the \acrlong{ghc}.
}
\abstractCS{
	Haskell je jeden z nejznámějších jazyků s non-strict sémantikou. Na jednu
	stranu přináší tato sémantika pohodlí nekonečných datových struktur,
	řídících konstrukcí definovaných uživatelem a možnost vyhnout se
	nepotřebným výpočtům. Na stranu druhou jsou tyto výhody postiženy daní na
	výkonu za běhu programu a těžko předvídatelným chováním call-by-need.
	Nabízí se otázka: \emph{Vyplatí se líná evaluace?} K zodpovězení této
	otázky musíme porozumět tomu, jak je lenost využívána v praxi. K tomuto
	účelu jsme vyvinuli nástroj pro dynamickou analýzu použitelný k trasování
	evaluace funkčních parametrů. Je implementován jako zásuvný modul
	kompilátoru \acrlong{ghc}.
}
\placeForDeclarationOfAuthenticity{Prague}
\keywordsCS{Haskell, dynamické trasování, líné vyhodnocování, zásuvné moduly
kompilátorů, generické programování, GHC}
\keywordsEN{Haskell, dynamic tracing, lazy evaluation, compiler plugins,
generic programming, GHC}
\declarationOfAuthenticityOption{4} %select as appropriate, according to the desired license (integer 1-6)
% \website{http://site.example/thesis} %optional thesis URL


\begin{document}

\setsecnumdepth{part}
\chapter{Introduction} \label{sec:intro}
Conventional programming languages of all paradigms use -- almost equivocally
-- eager evaluation strategies. Non-strict semantics has far-reaching
implications on the design of a language~\cite{haskell-is-pure} and comes with
both benefits in expressiveness and implementation challenges.

The non-strict semantics of the Haskell language were a guiding principle which
influenced or directly determined many of the decisions made at its inception
over thirty years ago~\cite{history-of-haskell}. Lazy evaluation is a
potentially powerful implementation strategy for non-strict languages, freeing
the programmer to focus on what a program means rather than on how it is
computed. Laziness naturally accommodates user-defined control flow and
evaluates only the required subset of a given program in a demand-driven
manner. However, the implementation of non-strict features via laziness in
\acrshort{ghc} brings many pitfalls which Haskell programmers need to deal
with. Automatic avoidance of unnecessary thunk allocations is
conservative~\cite{cmtary-demand-analysis}: if \acrshort{ghc} is unable to
prove the strictness of a function in an argument by static strictness
analysis, the function will remain lazy, possibly leading to pathological
memory behaviour at runtime.

Haskell code is pure.\footnote{
	Unless it uses unsafe facilities of the language.
} Functions in Haskell correspond closely to mathematical functions: they are
deterministic and free of side-effects. Haskell programs include pure
descriptions of effectful computations built in a compositional way via the
\hsType{IO} monad. Each program exports a top-level definition called
\hsIdent{main}. Invoking the program begins the demand-driven evaluation of its
\hsIdent{main} definition by the runtime. Haskell's strong static type system
provides a compile-time distinction between pure and effectful code and ensures
the two cannot be mixed in an impure way.

\begin{listing}[h]
	\centering
	\begin{minted}[autogobble]{haskell}
		length []     = 0
		length (_:xs) = 1 + length xs
	\end{minted}
	\caption{A naive implementation of list length.}
	\label{lst:length}
\end{listing}

When the runtime evaluates an expression, it does so to the least extent possible.
For example, take the list \hsCode{ys = fact 5 : fact 6 : fact 7 : []} of three
values of the factorial function.\footnote{
	The empty list is spelled \hsCode{[]} and the cons cell is written infix as
	\hsCode{:}.
} Evaluation of the function applications is delayed by storing the necessary
data in runtime structures called \textit{thunks}. When we apply the function
\hsIdent{length} (defined in Listing \ref{lst:length}) to \hsIdent{ys} and
force the value of the application, \eg by printing it to standard output, the
function pattern-matches on the \hsIdent{ys} value. Case analysis requires the
scrutinee to be in \textit{\acrfull{whnf}}. An expression is said to be in weak
head normal form if it has been evaluated to the outermost data constructor
(such as \hsCode{:} or \hsCode{[]}) or lambda abstraction.

The list \hsIdent{ys} is in \acrshort{whnf} already, it is an evaluated cons
cell with a thunk at the head and another evaluated cons cell at the tail.
Since \hsIdent{length} only counts the cons cells in a list and does not need
to evaluate their elements, the application of \hsIdent{length} to \hsIdent{ys}
will leave the thunks untouched and finish in linear time.

\begin{listing}[h]
	\centering
	\begin{minted}[autogobble]{haskell}
		-- snd is non-strict in the first
		-- component of the pair
		snd :: (Int, Int) -> Int
		snd (x, y) = y

		-- purity and laziness: foo reduces to 3,
		-- complexComputation is not evaluated
		foo = snd (complexComputation, 3)

		-- non-strict semantics can prevent
		-- runtime errors
		foo' = snd (error "oops!", 3)

		-- computations are shared,
		-- even across threads
		bar = let x = complexComputation
		      in x `par` f x
	\end{minted}
	\caption[Example lazy expressions.]{Example expressions where the semantics
	of Haskell notably differ from that of strict languages.}
	\label{lst:let-x}
\end{listing}

Other examples can be seen in Listing \ref{lst:let-x}. The binding
\hsIdent{foo} evaluates efficiently to the integer~3, but not until its value
is required for the evaluation of another computation. The \hsIdent{foo'}
example is more interesting. Some programs which would crash or diverge in
strict languages cleanly terminate in Haskell.

The Haskell Prelude, a collection of commonly used definitions imported into
every module, includes the special function \hsSignature{seq :: a -> b -> b},
which evaluates its first argument to \acrshort{whnf} and returns its second
argument.\footnote{Although not necessarily in this order.} Since evaluation to
\acrshort{whnf} happens automatically, one may wonder what is the purpose of
this function. Its usefulness becomes apparent when the user starts dealing
with programs in which performance is critical or with longer-running
applications. In these situations, the problems of laziness tend to surface.
Thunks which are never forced by the program but are still reachable in the
object graph waste memory. Haskell is a garbage collected language and similar
memory leaks slow the garbage collector down, adding a negative impact on
runtime performance. The problems with laziness are well-known and difficult to
debug. The user may be tempted to add calls to \hsIdent{seq} or other utilities
to force evaluation and avoid thunk build-up.

This fight against the semantics is detrimental to the developer experience of
the language. The question arises whether the benefits of laziness outweigh the
toll it takes on the programmer. This work focuses on laying the empirical
groundwork to help answer this question.


\setsecnumdepth{all}
\chapter{State-of-the-art} \label{sec:state-of-the-art}
Although there are many functional languages of the ML family which enjoy
widespread use (F\#, OCaml, SML), Haskell is the only non-strict language among
them. The \acrfull{ghc} implements Haskell's non-strict semantics by lazy
evaluation facilitated mainly by a runtime data structure called a
\textit{thunk}, which represents delayed computations.

Laziness leads to many issues with runtime behaviour of Haskell programs,
although it is an efficient implementation of non-strict semantics as required
by the Haskell spec~\cite{haskell2010}. The accumulation of thunks at runtime is
a frequent cause of pathological memory behaviour and unpredictable
performance. There is a number of libraries and tools which aim to help the
Haskell programmer inspect the runtime state of the Haskell heap, force the
evaluation of thunks known to be forced by the program at a later point anyway,
and avoid their creation altogether for certain expressions.

We open this chapter with an overview of \acrshort{ghc}. We then follow with a
survey of several debuggers and solutions for the inspection and management of
thunks. We found no existing tools that would directly capture enough
information to be suitable for dynamic tracing and strictness analysis, but two
came close (\nameref{sec:hat} and \nameref{sec:ghc-heap-view}).


\section{The Glasgow Haskell Compiler} \label{sec:ghc}
\acrshort{ghc} is the most widespread Haskell distribution. Its plethora of
language extensions~\cite{ghc-language-extensions}, which range from simple
syntactical utilities to complex type system add-ons, lets the programmer
customise the set of features provided by the language. We briefly discuss the
internal organisation of the project and in the process explain those basics of
the Haskell language that are needed for the later chapters.

\subsection{Architectural overview}
Although a thorough and authoritative -- if a little dated -- description of
the architecture of the compiler is available in the aptly named, freely
accessible Architecture of Open Source Applications \cite{arch-ghc}, we include
a summary of the key points relevant to our work as well as to the discussed
technicalities. \acrshort{ghc} is an optimising compiler for the Haskell
language. The project consists of three major components: (1) the compiler
itself, (2) the boot libraries (a collection of core libraries \acrshort{ghc}
itself depends on), and (3) the \acrlong{rts} (\acrshort{rts}, a large library
of C code linked into every compiled program). \acrshort{rts} provides
low-overhead runtime support for facilities abstracted away by Haskell code
such as garbage collection, exception handling, or concurrency primitives.

The compiler turns Haskell source code into object and interface
files.\footnote{
	These describe high-level information about a compiled module, including
	data type definitions and in\-line\-able functions.
} The process is organised in a pipeline that consists of the following phases:

\begin{description}
	\item[Parsing] constructs abstract syntax trees. Lexical and syntactical
		errors are reported here.
	\item[Renaming] resolves identifiers into fully qualified names. Undefined
		references are reported here. The renaming phase re\-associates
		operator applications in the \acrshort{ast} formed during parsing. This
		is because Haskell allows specifying the precedence and associativity
		of infix operators, but their properties are only available after their
		references have been resolved.
	\item[Typechecking] verifies the program's type-correctness. Type checking
		annotates all binders in the program with type signatures.  Type errors
		are reported here.
	\item[Desugaring] converts Haskell surface syntax to the much smaller
		intermediate language, Core.
	\item[Simplification] performs optimisations on the Core language,
		including demand analysis, \hsCode{let} floating, dead-code
		elimination, common subexpression elimination, constructor
		specialisation, and others.
	\item[Conversion to \acrshort{stg}] translates Core to the language of the
		\acrlong{stg}, suitable for code generation.
	\item[Code generation] produces machine code or \acrshort{llvm} bitcode
		for further processing by the \acrshort{llvm} toolchain.
\end{description}

\subsubsection*{Compiler front end}
Phases of the pipeline from parsing to desugaring form the compiler front end.
It starts out with a textual representation and gradually transforms it into
increasingly structured data before passing it on to the back end. During this
process, the front end identifies and reports all errors in the user's code.

As the program flows through the pipeline, its invariants gradually change. The
codebase reflects this by passing different data types from phase to phase. For
example, the type of binders changes from \hsType{Name}s, which represent
fully-qualified names, to \hsType{Id}s, which are annotated with type
information. The types of the nodes of the surface syntax tree are indexed by
an uninhabited type \hsType{GhcPass}, which is itself indexed by the
\hsType{Pass} data type (i.e. \hsType{GhcPass} has kind \hsType{Pass -> *}),
lifted to the type level.  Together, the \hsType{GhcPass} types represent the
various phases of the compiler front end, from parsing to renaming to type
checking. The type level distinction between phases complicates the type
signatures of almost all functions in the pipeline, but the choice comes with
important benefits.  First, indexing \acrshort{ast} types by the
\hsType{GhcPass} types provides a compile-time guarantee that nodes from
different phases cannot be mixed unintentionally. Second, the phase type
parameter allows one to use the Tress that Grow pattern~\cite{trees-that-grow},
that enables easy extensions of both sum and product types at various phases.

\begin{tcolorbox}[parbox=false, breakable, title=Trees that Grow]
Let us take a small detour to explain the concept of the design pattern,
invented specifically to add extensibility to \acrshort{ghc}'s abstract syntax
data types. The basic algorithm for making a data type extensible is as
follows:
\begin{enumerate}
	\item Index the data type of choice by a type parameter $\xi$, called the
		\textit{extension descriptor},
		\begin{center}
			\hsCode{data D = ...} $\rightarrow$ \hsCode{data D &$\xi$& = ...}
		\end{center}
	\item add one new constructor $\mathrm{Extra}$ to the data type,
		\begin{center}
			\hsCode{data D &$\xi$& = C&$_1$& ... | ... | C&$_n$& ...} \\
			$\downarrow$ \\
			\hsCode{data D &$\xi$& = C&$_1$& ... | ... | C&$_n$& ... | Extra}
		\end{center}
	\item create a \textit{type family} $X_{\mathrm{Con}}$ -- a function from
		types to types -- for all constructors,
		\begin{center}
			\hsCode{type family X&$_{C_1}$& &$\quad\xi$&} \\
			$\vdots$ \\
			\hsCode{type family X&$_{C_n}$& &$\quad\xi$&} \\
			\hsCode{type family X&$_{\mathrm{Extra}}$& &$\xi$&}
		\end{center}
	\item add one field of type $X_{\mathrm{Con}}~\xi$ to every constructor.
		\begin{center}
			\hsCode{data D &$\xi$& = C&$_1$& ... | ... | Extra} \\
			$\downarrow$ \\
			\hsCode{data D &$\xi$& = C&$_1$& (X&$_{C_1}$& &$\xi
				$&) ... | ... | Extra (X&$_{\mathrm{Extra}}$& &$\xi$&)}
		\end{center}
\end{enumerate}

This small refactoring enables the programmer to both restrict the use of
certain constructors and introduce new constructors depending on the extension
descriptor $\xi$. The programmer can apply these modifications by manipulating
the definitions of the type families rather than the original data type itself.
It also lets the programmer add new fields to the existing constructors, again
depending on the particular type $\xi$.

To define the original data type without extensions in terms of its extensible
variant, it suffices to fix the extension descriptor to some type, \eg to
\hsType{Void}, and omit any equations for the type families. Doing so leaves
the type level applications of the shape $X_{\mathrm{Con}}~\mathrm{Void}$
irreducible and thus isomorphic to any empty type, with the only valid value
being $\bot$ (such as \hsIdent{undefined}). In effect, the extension fields of
constructors cannot be pattern-matched against, because they have no
constructors. The extension constructor $\mathrm{Extra}$ can still be matched,
but cannot hold any data. It can be hidden completely by not exporting it from
the module of definition.

For extensions, it suffices to add type family instances -- the analogy of
function equations for type functions -- which resolve a particular assignment
of the extension descriptor to the desired type of the extension.

As presented, the Trees that Grow transformation leaves much to be desired from
a usage perspective: we have to pass a \hsIdent{void} or \hsIdent{undefined}
for unused extension fields during construction, these extensions also clutter
the pattern matches, and matching on both constructors and multiple fields
added via extensions is clunky at best. These grievances can be solved by the
use of a convenient syntactical feature of Haskell called \textit{pattern
synonyms}\cite{pattern-synonyms}. These let the programmer abstract over
patterns and so define reusable interfaces to the data types extended via the
Trees that Grow transformation, hiding the structural complexity of the
underlying flexible data type.

For an in-depth description of the design pattern, its generalisations to
multiple type parameters, existentials, \acrshortpl{gadt}, hierarchies of
extension descriptors, as well as relations to generic programming,
typeclasses, and for many other practically useful details, we
recommend~\cite{trees-that-grow}, which introduces the idea.

Although the Trees that Grow pattern is not used universally throughout the
\acrshort{ghc} project, its concepts play an important role in many of the core
data types.
\end{tcolorbox}

\subsubsection*{Compiler back end}
The back end of the compiler starts with the desugaring phase, which translates
the resolved and type checked surface syntax into an \acrfull{ir} called Core.
The Haskell language contains many redundancies and shorthands designed to make
the syntax more user-friendly. The \acrshort{ast} data types contains hundreds
of constructors. In contrast, Core only has about 10 syntactical forms.
Essentially, it is a variant of System F extended with type equality
coercions~\cite{system-fc}.

Although Core is typed, the compiler only type checks Core programs if the user
explicitly asks for it. Core types exist mostly to validate the compiler's
internal consistency -- desugaring a Haskell program that passed typechecking
into incorrectly typed Core would be a compiler bug.

Most of the optimisation passes \acrshort{ghc} performs are local
semantics-pre\-serv\-ing transformations of Core (Core-to-Core passes) which
are applied in many iterations during invocations of the
simplifier~\cite{cmtary-core2core}. These include \eg constant folding,
inlining, or fusion of nested case expressions. The local rewritings improve
intermediate code between applications of heavier optimisations, such as
specialisation (to eliminate overloading), demand analysis, \hsCode{let}
floating, and others.

The optimised Core is transformed to a slightly different representation which
corresponds to programs of an abstract graph reduction machine, the
\acrfull{stg} (\cite{stg-classic}, later revised in \cite{stg2}). It is
translated again to the low-level imperative language \texttt{Cmm}, a dialect
of C$--$, before entering one of the final stages of the code generation phase.
A successful run of the compiler typically terminates in \acrshort{ghc}'s
built-in native code back end, but the \texttt{Cmm} representation can be
translated to \acrshort{llvm} bitcode and additionally processed by the
\acrshort{llvm} pipeline.

A notable divergence in the compiler is the bytecode compilation pipeline.
Bytecode is executed by the \acrshort{rts} interpreter, the backbone of
\acrshort{ghc}'s interactive interface (\acrshort{ghci}). \acrshort{ghci}
includes a debugger which can pause and resume the evaluation of an interpreted
Haskell program and print the runtime values of local bindings. We will discuss
\acrshort{ghci} in greater detail in Chapter~\ref{sec:analysis-design}. The
bytecode pipeline does not involve optimisations, the conversion to
\acrshort{stg}, or any later passes. Instead, the separate generator translates
Core directly to bytecode instructions, although this is about to change in
\acrshort{ghc} 9.2 \cite{mr-ghci-stg-unboxed}.

\subsubsection*{Compiler plugins}
Both the front end and the back end of the compiler can be modified or extended
in a modular way using \textit{compiler plugins}, which come in two main
flavours:\footnote{
	There are other types of plugins as well, including typechecker, hole fit,
	front end, and DynFlags plugins, but these are not that relevant to our
	work.
} Core plugins on the back end and source plugins on the front
end~\cite{ghc-source-plugins}. The former act on the Core language and are best
suited for optimisations, while high level analyses, language extensions and
code generation are better handled by the latter. We will discuss source
plugins in more detail in Chapter~\ref{sec:analysis-design}.

\subsubsection*{\acrlong{rts}}
The runtime system consists of about 50,000 lines of C and C$--$ code. It
implements all the functionality Haskell programs require that is not compiled
into the programs themselves, much of which involves low-level interactions
with abstractions provided by the operating system. The major components of the
\acrshort{rts} are the following:
\begin{itemize}
	\item A user-space scheduler which multiplexes lightweight Haskell threads
		onto heavy \acrshort{os} threads,
	\item a storage manager, including a block allocation layer, which
		abstracts over memory management, and a parallel generational garbage
		collector,
	\item primitives for exception handling, concurrency, and built-in
		operations,
	\item a bytecode interpreter and a dynamic linker for \acrshort{ghci}, and
	\item support for \acrfull{stm}.
\end{itemize}

The scheduler is at the heart of the \acrshort{rts}. Haskell threads yield to
the scheduler when their assigned slice of execution time expires, when they
run out of heap or stack space, or when they need to switch between machine
code execution and bytecode interpretation. Any foreign calls into or out of
Haskell need to pass through the scheduler as well.

The storage manager defines the data structures which represent Haskell values
at runtime. Since the understanding of these representations is crucial for the
understanding of the implementation of laziness in \acrshort{ghc} and the
trade-offs involved, we will discuss the relevant parts of the storage manager
here.

\begin{figure}[h]
	\centering
	\begin{tikzpicture}[
		memory slab/.style={
			rectangle, draw,
		},
	]
		\node[
			memory slab, rectangle split, rectangle split parts = 2,
			rectangle split horizontal, align = center
		] (obj) at (0, 0) {
			Header
			\nodepart[text width = 70pt]{second} Payload
		};
		\node[
			memory slab, rectangle split, rectangle split parts = 2,
			align = center
		] (header) at (1, -2) {
			Info table
			\nodepart{second} Entry code
		};
		% for some reason, just (a) |- (b) doesn't work, even though (a) -| (b) works fine...
		\draw[->] (obj.one south) -- (obj.one south |- header.one split west) -- (header.one split west);
	\end{tikzpicture}
	\caption{The memory layout of a generic closure.}
	\label{fig:closure-layout}
\end{figure}

\textit{Closures} (the runtime objects of programs compiled with
\acrshort{ghc}) share the same basic representation shown in Figure
\ref{fig:closure-layout}. The \textit{header} contains primarily a pointer to
the metadata of a closure, though it also includes a profiling header if
profiling is enabled. The \textit{payload} of a closure usually contains data
not known at compile time. The \textit{info table} identifies the type of the
closure (data constructor, function, thunk, \ldots). It informs the garbage
collector about the pointer\-hood of the payload. The \textit{entry code} is
the code executed when \textit{entering}, \ie evaluating the closure. For
example, the entry code of functions represents the body of the function.

The \acrshort{stg} uses a number of registers, a heap, and a stack which stores
function arguments and continuations. Closures can also reside statically in
the compiled object code of a Haskell program. During execution, any heap
allocations are preceded by a \textit{heap check}, which invokes garbage
collection if not enough space is left on the heap. Similarly, when code needs
to push values onto the stack, it performs a \textit{stack check} and grows the
stack if necessary.

All the dynamic allocations are managed by the garbage collector, including
stack frames and lightweight threads.

There are over 60 different types of closures. Here is a summary of the most
important ones:

\begin{description}
	\item[Function closures] represent Haskell functions. When entered,
		functions assume that all their arguments are present at the top of the
		stack. This is known as the \textit{eval/apply} evaluation
		model~\cite{eval-apply}.

		The payload of a function closure carries pointers to the free
		variables of the function's body.

	\item[Thunks] represent unevaluated expressions. When entered, the
		corresponding expression is evaluated and the closure is replaced with
		an indirection to the resulting value. This ensures that thunks are not
		evaluated multiple times, as subsequent attempts at evaluation will
		instead enter the indirection which will simply return the existing
		value.

	\item[Indirections] are proxies to other closures. Their payload is simply
		a single pointer to the target object. To reduce the overhead of
		sharing, indirections are removed by the garbage collector and never
		outlive the youngest generation.

	\item[Black holes] are thunks under evaluation, with a layout identical to
		that of indirections. Since thunks are shared across threads, a thread
		entering a black hole blocks until it is overwritten with an
		indirection to the evaluated object.

	\item[Data constructors] carry their arguments (fields) as payload, ordered
		such that pointers come first. Their entry code returns immediately to
		the topmost stack frame (a constructor itself is always evaluated,
		although its arguments may not be).

	\item[Thread state objects] represent lightweight Haskell threads,
		including their stacks. Since a \acrshort{tso} is simply a closure, it
		is managed by the garbage collector, just like any other heap object.
		The garbage collector sends exceptions to blocked threads which become
		unreachable.
\end{description}


\subsection{Strictness features}
\acrshort{ghc} is not only used for production ready Haskell, but also serves
as an incubator of new language features -- including those directly related to
managing the amount of laziness in a program. These allow programmers to aid
the compiler in optimising by avoiding unnecessary non-strictness where its
static analysis does not suffice.

A simple and robust method of preventing undesired laziness is the
language extension \texttt{BangPatterns}, which introduces a new pattern syntax
\hsPat{!pat} for forcing an expression to \acrshort{whnf} before
pattern-matching it against \hsPat{pat}. For short functions and clear
algorithms which do not benefit from pervasive laziness it is often very easy
to simply annotate certain patterns in the program with exclamation marks and
observe a reduction in memory consumption.

The language extension shares the exclamation mark syntax with the Haskell 2010
strictness flags feature~\cite{haskell2010-strictness-flags}. While
\texttt{BangPatterns} add optional strictness to pattern matching, strictness
flags do the same for data types. Unfortunately, proper use of this flexibility
hinges on the programmer's knowledge of how is the particular piece of code
going to be used. While it is good practice to request the early evaluation of
values which will have to be forced anyway, sprinkling strictness annotations
throughout library code in an attempt to prevent space leaks may lead to the
unintentional sacrifice of the benefits of laziness, even preventing some usage
patterns in subtle ways. Additionally, since these strict evaluation facilities
only force thunks to \acrshort{whnf}, the evaluated objects may still retain
large delayed expressions. The ability to excise thunks from a Haskell value
completely was the core motivation for the development of the \hackage{deepseq}
library.

The lack of programmer insight into how a piece of code is used in a program
and what strictness properties it has is a major developer experience
issue~\cite{memory-profiling-blog-post, anatomy-of-thunk-leak-blog-post,
nothunks-blog-post, haskell-space-leaks, detecting-space-leaks-blog-post}.
Some of the discussed debugging tools help ameliorate the problem, but
\acrshort{ghc} itself includes features especially suited to doing so. The
compiler supports two profiling modes, cost-centre profiling and
``ticky-ticky'' profiling, which the \acrshort{ghc} User's Guide dedicates a
chapter to~\cite{ghc-profiling}. While the ``ticky-ticky'' mode is only of
interest to \acrshort{ghc} developers, the cost-centre profiling functionality
is an easy-to-use tool for understanding the time and space behaviour of
Haskell programs. All it requires of the programmer is a recompilation of the
modules of interest with a few specific compiler options.

Cost-centre profiling assigns the so-called ``cost-centres'' to certain
sections of code. The \acrshort{rts} records any time spent and allocations
performed during the evaluation of code associated with a cost-centre. These
recordings are summarised by a time and allocation profiling report, which the
profiled program generates. The report indicates the time and space
requirements of each cost centre in proportion to the entire program.
\acrshort{ghc} is able to introduce cost centres automatically by adding them
to all non-in\-lined bindings, but the user also has the option to annotate
terms with a pragma to fine-tune the placement of cost centres.

\acrshort{ghc}'s implementation of profiling can shed some light on the use of
call-by-need in a Haskell program. The compiler can also provide certain deeper
insights about the program's strictness, although it presents them in a
substantially less user-friendly manner. In particular, \acrshort{ghc} can
output the translation of surface syntax to its internal language, Core. Being
a fairly small $\lambda$ calculus, Core has a clearer semantics including a
strict pattern-matching operator \hsCode{case e of arms...}, which indicates
obviously strict subexpressions. Furthermore, the Core output features
\textit{demand signatures}, inferred by \acrshort{ghc}'s demand
analysis~\cite{cmtary-demand-analysis}, which classify binders depending on how
strict they are in their arguments and to what extent do they use the
components of arguments of product types. The results of demand analysis are
crucial for subsequent optimisation. Understanding the demand signatures of a
program can equip the programmer with the information necessary to determine
which patterns would most benefit from the \texttt{BangPatterns} extension,
which data types could be annotated with strictness flags, and which parts of
the program should be refactored in other ways in order to improve the native
code generated by the compiler.

The \acrshort{ghc}-provided tooling outlined above -- particularly the option
to dump Core code during compilation and analyse demand signatures -- is rather
obscure. It is reasonable to expect the average Haskell programmer to only
reach for the profiling tools in a time of dire need, \eg when writing
high-performance code or dealing with unacceptable space leaks. It is further
reasonable not to expect the average Haskell programmer to know the internals
of the compiler well enough to ask it for the Core representation of their
program, or indeed to be aware at all of the existence of demand signatures,
which are only described in the \acrshort{ghc} Commentary.\footnote{
	The commentary is intended for \acrshort{ghc} developers and is hosted on a
	GitLab instance (online), unlike the User's Guide which is bundled with the
	\acrshort{ghc} distribution and revised for every release.
} Perhaps it would be interesting to include the strictness information
inferred by the compiler in interfaces programmers often interact with, such as
the various widgets provided by the \acrfull{hls}~\cite{gh-hls}, but to our
knowledge no such tool exists at the time of writing.

In theory, the \acrlong{ghc}'s optimisations are advanced enough to compile the
majority of Haskell code fairly efficiently, without space leaks or allocation
slow-downs, while enabling the greater flexibility, code reuse, and abstraction
of a non-strict language. However, inefficiencies introduced to support
unnecessary laziness which are small enough not to cause substantial problems
could hide in the compiled program. It is a part of the motivation behind this
thesis to lay the groundwork necessary for their detection.

\section{Existing tools} \label{sec:existing-tools}
Apart from functionality implemented in the compiler itself, the Haskell
environment includes a number of practical solutions to help with debugging. A
few of these deal specifically with the issues with laziness that Haskell
programmers have to face.

\subsection*{Hoed} \label{sec:hoed}
Hoed~\cite{gh-hoed} is a tracer and debugger for Haskell. Unlike the built-in
debugger of \acrshort{ghci}, Hoed is implemented as a regular Haskell library.
Users of Hoed manually annotate functions of interest to make the tracer
capture relevant information during execution. The annotations are simply calls
to the provided debugging function \hsIdent{observe} with a signature similar
to that of the \hsIdent{trace} function from the \hsModule{Debug.Trace} module
of Haskell's standard library. Both \hsIdent{trace} and \hsIdent{observe}
circumvent the guarantees of the type system and are in fact impure.
\hsIdent{observe} has type \hsType{Observable a => Text -> a -> a}, its
\hsType{Text} argument has to equal the name of the function being annotated.
The \hsType{Observable} constraint on \hsType{a} is used by Hoed internally,
the typeclass has a default implementation. The resulting trace of the
debugging session is exposed via a web-based interface, to which the users
connect with a regular web browser. Hoed's traces include information about
which functions have been called during the execution of the annotated program
and what were their arguments. It only collects information about annotated
functions.

Hoed features several tools to help users analyse problems with their code and
find the culprits of test failures. One of these is \textit{algorithmic
debugging}, an interactive trace browser which uses an algorithm similar to
binary search to locate the deepest incorrect function in the recorded call
tree. It does so by asking the user questions about whether certain evaluations
were correct, working its way gradually deeper into the tree. The ``algorithmic
debugger'' ultimately reports the faults it located.

While Hoed's approach to debugging is certainly interesting and quite distinct
in comparison to debuggers in other languages, it lacks any kind of awareness
of the low-level details of non-strictness. Hoed is thus intended for use with
property testers like QuickCheck~\cite{quickcheck-paper}, and not as a tool for
the identification and resolution of language implementation -dependent issues,
such as memory leaks.


\subsection*{\hackage{nothunks}} \label{sec:nothunks}
\hackage{nothunks} is a recently released Haskell package which helps in writing
thunk-free code. It defines a new typeclass, \hsTC{NoThunks}, along with
instances for common Haskell types. Any type with a \hsTC{NoThunks} instance
can be inspected for unexpected thunks. The library also implements a number of
alternatives to common functions from the Prelude. These re\-implementations
check for unexpected thunks introduced during execution, throwing an exception
whenever a thunk is detected.

The exceptions of \hackage{nothunks} contain helpful information about the
context of the thunk which the library function detected, guiding the
programmer in locating the unexpectedly lazy code or data structure. The
library also allows various relaxations to the strictness of its inspection
policy, such as the \hsType{OnlyCheckWhnf} and \hsType{AllowThunk}
\hsCode{newtype}s. Thanks to \hsModule{GHC.Generics}\cite{ghc-generics},
\hackage{nothunks} also offers the convenient \hsCode{deriving (Generic,
NoThunks)} syntax to add instances of the necessary typeclasses for custom data
structures automatically.

The \hackage{nothunks} package can greatly help fix serious memory leaks caused
by thunk accumulation. However, it is intended primarily for the complete
removal of thunks from the runtime state of a program, and does not help with
careful strictness analysis.

\subsection*{Hat} \label{sec:hat}
The Haskell Tracer Hat~\cite{proj-hat} is a source-level tracer. It works by
compiling Haskell source files to annotated -- but still textual -- Haskell
source files. After this source-to-source translation, the user compiles the
annotated source code and runs it to produce a Hat trace.

The trace is a rich recording which contains high-level information about each
reduction the program performed. Hat comes with a number of utilities for
exploring the trace files, including some forms of forward and backward
debugging, filtering utilities which show all arguments passed to top-level
functions, virtual stack traces, and even an interactive tool for locating
errors in a program, similar to one of the features of \nameref{sec:hoed}.

Hat was initially developed for the \texttt{nhc} Haskell
compiler~\cite{hat-history}. It centred around the idea of using a single, rich
trace of a program's execution to support several different kinds of debugging.
Despite its advanced features, it did not seem to attract many
users~\cite{hat-history}, possibly due to feature disparities between the
supported syntax and new language extensions.

Hat's source-to-source translation makes it portable between different
compilers. The project uses the \hackage{haskell-src-exts} package to parse the
language, rather than relying \eg on the \acrshort{ghc} \acrshort{api}. While
Hat cannot directly answer questions about the strictness of debugged
functions, its approach to rewriting the source language is interesting.
Tracing necessarily leads to runtime overhead, but the code produced by Hat is
subject to compiler optimisations. Hat therefore does not need to worry about
low-level details of the optimiser and how it reorders, splits and combines
expressions. The connection between the tracing code and the original source is
maintained thanks to the semantics-preserving nature of optimisations.


\subsection*{\hackage{htrace}} \label{sec:htrace}
\hackage{htrace} \cite{hkg-htrace} is a simple package which exports a single
function: \hsCode{htrace :: String -> a -> a}. As the name and function
signature suggest, this function mirrors the behaviour of the standard
\hsIdent{trace}, except that \hsIdent{htrace} hierarchically indents the
tracing messages based on the current call depth. It works simply by
manipulating a global mutable variable and hiding this fact from the user with
\hsIdent{unsafePerformIO}.

Although very simple and oblivious to any laziness implementation details, this
approach is still useful for debugging purposes. The indented tracing messages
suggest the depth to which various thunks are evaluated at different points of
the program's operation.

\subsection*{\hackage{ghc-heap-view}} \label{sec:ghc-heap-view}
\hackage{ghc-heap-view} is a Haskell package which enables advanced
introspection of the Haskell heap from within pure Haskell code. It relies on
the \hackage{ghc-heap} library which comes bundled with \acrshort{ghc}.

The library's notable high-level features include a function which attempts to
recreate readable Haskell source code from a runtime value, using \hsCode{let}
bindings to express sharing. There are also tree and graph data structures for
heap mapping and a high-level algebraic data type for all Haskell closures,
complete with their info tables.


\subsection*{Haskell Program Coverage} \label{sec:hpc}
\acrlong{hpc}\cite{hpc-paper} is (unsurprisingly) a code coverage tool for
Haskell. Unlike the other tools in this section, \acrshort{hpc} is not directly
related to laziness control and debugging. Similarly to Hat, \acrshort{hpc} has
a source-to-source mode of operation but additionally offers tight integration
with \acrshort{ghc} and comes bundled with modern releases of the compiler. It
supports all \acrshort{ghc} language extensions.

\acrshort{hpc} allows easy instrumentation of arbitrarily complex Haskell
programs without source annotations. It wraps subexpressions in the program
with an unsafe side-effecting function which records its evaluation by mutating
a module-wide array of integer counters. The final state of the per-module
arrays forms the \acrshort{hpc} trace. This architecture is wired into the
\acrshort{ghc} compiler pipeline in all the major data structures (the surface
syntax, Core language, and \acrshort{stg}), which makes it both robust and
per\-for\-mant. The tool comes bundled with utilities for displaying the
original source code with colourful mark-up, highlighting interesting
subexpressions based on the information extracted from the trace. Notably,
\acrshort{hpc} supports traces of the boolean values of pattern guards, which
are added to the visualisation.

\acrshort{hpc}'s feature set can be of tremendous help to the Haskell
programmer, especially when combined with tools like
QuickCheck~\cite{quickcheck-paper}. However, its traces are tuned specifically
for code coverage and do not contain enough information to be useful for any
kind of dynamic strictness analysis. While the \acrshort{hpc} traces are
sufficiently granular, the subexpression counters lack necessary information
about their execution context and timing.


\subsection*{Summary} \label{sec:summary}
Table \ref{tbl:thunk-manager-comparison} summarizes the surveyed tooling. The
\textit{Memory awareness} column suggests to what extent is the particular
package or program aware of runtime representations. Independence of the
structures underlying Haskell values leads to better portability and a clean
interaction with regular Haskell code. On the other hand, more low-level
approaches such as \nameref{sec:ghc-heap-view} give a much clearer view of the
runtime state.

\begin{table}\noindent
\begin{minipage}{\textwidth} % support for footnotes within a tabular environment
	\centering
	\small
	\begin{tabularx}{\textwidth}{|| l *{3}{L} >{\raggedright\arraybackslash}X ||}
		\hline
		Tool
		& Source changes
		& Order of evaluation
		& Thunks
		& Memory awareness
		\\ \hline \hline

		\nameref{sec:hoed}
			& Required    % source changes
			& Recorded    % order of evaluation
			& Transparent % thunks
			& None        % memory awareness
			\\ \hline
		\nameref{sec:nothunks}
			& Required    % source changes
			& Ignored     % order of evaluation
			& Detected    % thunks
			& Limited     % memory awareness
			\\ \hline
		\nameref{sec:hat}
			& Unnecessary % source changes
			& Recorded    % order of evaluation
			& Transparent % thunks
			& None        % memory awareness
			\\ \hline
		\nameref{sec:htrace}
			& Required    % source changes
			& Illustrated % order of evaluation
			& Transparent % thunks
			& None        % memory awareness
			\\ \hline
		\nameref{sec:ghc-heap-view}
			& Unnecessary % source changes
			& Ignored     % order of evaluation
			& Reified     % thunks
			& Full        % memory awareness
			\\ \hline
	\end{tabularx}
	\caption{An overview of existing solutions to thunk discovery and laziness
	debugging.}
	\label{tbl:thunk-manager-comparison}
\end{minipage}
\end{table}

Despite Haskell users' considerable interest in avoiding the implicit delaying
of computations which the language is notorious for, there are no records of a
large-scale study of the use of laziness in practice akin to
\cite{emp-study-laziness-r}. The tool with a feature set closest to what is
necessary for a comprehensive analysis of the practical use of laziness is
likely \hackage{ghc-heap-view}, which allows the user to interactively inspect
the heap objects and look inside thunks using \acrshort{ghci}. However, the
package primarily provides a rich library interface. It does not implement a
tracing mode, which would facilitate collection of laziness-relevant
information during the execution of entire programs.


\chapter{Analysis and design} \label{sec:analysis-design}

The goal of this work is to design and implement a tool suitable for
understanding how is laziness used in real-life Haskell programs. To analyse
the practical implications of \acrshort{ghc}'s implementation of non-strict
semantics, we have to understand the strictness properties of functions. For
example, some arguments may be evaluated if and only if others are. Our tool
must capture these dependencies and usage patterns, as they may uncover both
use cases where laziness is essential and places where it could be safely
avoided, even though static analysis cannot determine so. In this chapter, we
focus on the task of dynamic tracing and evaluate two possible solutions to the
problem.

\section{Approach} \label{sec:approach}

Dynamically inferring the strictness properties of functions requires a peek
under the hood of Haskell's runtime machinery. Typical Haskell code is
oblivious to the underlying representation of the values it manipulates, as
reification of the heap objects underneath the abstractions would weaken
equational reasoning and parametricity.

Once we have the power to inspect the runtime representations of values, we
need to use it to determine the strictness of functions. A function \hsIdent{f}
is strict in an argument \hsIdent{a} if \hsIdent{a} has to be evaluated
whenever \hsCode{f a} is evaluated.

There is a number of possible approaches to this problem. As discussed in
Chapter~\ref{sec:state-of-the-art}, related projects which we could build on
already exist, at various levels of abstraction. At the lowest level, we could
modify the \acrshort{rts} and extract information about heap objects there. We
could also modify the compiler in various ways, since it already includes
support for \acrshort{hpc} and profiling, which is similar to the tracing we
would like to implement. Another option is to follow the path of
\nameref{sec:hat}, rewriting the textual source code of traced programs. In
this work, we explore two design directions: extending \acrshort{ghci} and
tracing with compiler plugins.

\section{Using \acrshort{ghci}} \label{sec:using-ghci}
The bytecode compilation pipeline and the interpreter offer a refreshing break
from the comparative complexity of \acrshort{ghc}'s back end. The interpreter
is at the perfect level of abstraction to directly track evaluations of
interpreted functions and the stream of control flow in a program.

One of the issues with this approach that is clear from the outset is the range
of supported language extensions. The bytecode compiler and the interpreter
lack support for un\-box\-ed tuples and sums, shrinking the set of programs the
tracer would be able to analyse. In our design of a \acrshort{ghci}-based
solution, we took inspiration from \cite{emp-study-laziness-r}, which modified
the virtual machine of the R language.

Before we describe the \acrshort{ghci} tracing modifications, let us take a
closer look at how this part of the compiler project works. \acrshort{ghci} is
an interactive interface built on \acrshort{ghc}'s bytecode compilation
pipeline and the bytecode interpreter of the \acrshort{rts}. It offers a
read-eval-print loop popular in other functional programming languages.

\acrshort{ghci} consists of several key components: the \acrshort{ui}, the
\acrshort{ghci} library code, the debugger, the bytecode generator, and the
bytecode interpreter. The following sections will introduce each of the
building blocks from which \acrshort{ghci} is composed, starting with an
overview of how they fit together.

\subsection*{The life of an interpreted expression}
\acrshort{ghci} can serve either as a \acrshort{repl} interface, processing
expressions one by one, or as an alternative compilation and execution
environment for entire modules. These two modes can be freely mixed. The
backbone of \acrshort{ghci} is a modified \acrshort{ghc} pipeline which
culminates in bytecode generation, producing a collection of bytecode objects
together with high-level information about breakpoints, pointers to allocated
string literals, and other data.

Compiled\footnote{That is, compiled to bytecode instructions.} bytecode objects
and their metadata together form \hsType{CompiledByteCode}. \acrshort{ghci}
includes a dynamic linker capable of resolving references between bytecode
objects as well as between \acrshortpl{bco} and object code.

These features are transparent to the user, who manipulates \acrshort{ghci} via
its user interface. The \acrshort{ui} is implemented separately from the core
functionality and communicates with the \acrshort{ghci} library code via
message-passing. This separation allows the \acrshort{ui} and the library code,
which is in charge of interpreted evaluation, to run in different processes.
\acrshort{ghci} features a mode of operation called ``Remote \acrshort{ghci},''
in which the \acrshort{ui} and the interpreter communicate over a Unix pipe.
Remote \acrshort{ghci} is useful for situations where the capabilities and heap
object definitions of the runtime systems of the compiler and the interpreter
differ, e.g. when the compiler and \acrshort{ghci} \acrshort{ui} were built
regularly with optimisations, but the interpreter was built with profiling.

When evaluating an expression, the library code forks a new thread to perform
evaluation independently of the interpreter server. This ensures that
exceptions raised during evaluation of an expression do not crash
\acrshort{ghci}. The server forwards exception handlers appropriately to ensure
this is the case. The two threads communicate via \textit{mutable variables},
or \hsType{MVar}s. These are concurrency primitives from the
\hsModule{Control.Concurrent.MVar} module which effectively implement
concurrent, mutable \hsType{Maybe}s~\cite{concurrent-haskell}. A mutable
variable of type \hsType{MVar a} contains either no values or a single value of
type \hsType{a}. It can be safely shared across threads and supports
operations \hsIdent{takeMVar} and \hsIdent{putMVar}. The former operation
extracts the value stored in an \hsType{MVar}, leaving the variable empty if a
value is present. If the variable is empty, the operation blocks. The
complementary operation \hsIdent{putMVar} blocks on a full variable and fills
it with a value as soon as it is empty.

Two \hsType{MVar}s play an important role in the design of \acrshort{ghci},
\hsIdent{statusMVar} and \hsIdent{breakMVar}. These variables form a
communication channel between the server thread and the thread responsible for
the evaluation of an expression, which we will call the \texttt{eval} thread.

When the server thread forks into the \texttt{eval} thread to begin expression
evaluation, it waits on the \hsIdent{statusMVar}. The \texttt{eval} thread
keeps running and eventually either produces a result, throws an exception, or
hits a breakpoint. In the former two cases, it simply fills the
\hsIdent{statusMVar} with the appropriate information (either the result of
evaluation or the exception) and exits. The server thread resumes execution,
passing the result from the \texttt{eval} thread to the \acrshort{ui}.

The case when the \texttt{eval} thread hits a breakpoint is more interesting.
First, the \texttt{eval} thread fills the \hsIdent{statusMVar} to wake the
server thread, notifying it of the breakpoint. Then it waits until the server
thread fills the \hsIdent{breakMVar}, pausing evaluation. The server thread
notifies the \acrshort{ui}, passing along an identifier of the breakpoint that
the \texttt{eval} thread hit. At this point, the \acrshort{ui} notifies the
user that evaluation paused on a breakpoint. The user can continue to enter
expressions into the \acrshort{ghci} prompt, these will be evaluated
independently by newly forked threads. These can also hit breakpoints and wake
the server thread, which notifies the \acrshort{ui} again. None of this
interferes with the initial \texttt{eval} thread, because every forked
\texttt{eval} thread gets a new pair of status and breakpoint mutable
variables. The user may resume execution in the \acrshort{ui}, which messages
the server thread, which in turn fills the appropriate \hsIdent{breakMVar},
waking the \texttt{eval} thread and blocking on \hsIdent{statusMVar} once
again.

\subsection*{Bytecode generation}
The bytecode facilities of \acrshort{ghc} involve a detour from the typical
sequence of steps performed to transform Haskell sources all the way to a form
suitable for linking or execution. After desugaring, the program is transformed
directly into bytecode instructions.\footnote{
	As previously mentioned, this approach will soon be replaced by a new
	bytecode pipeline which follows the usual compilation process all the way
	to \acrshort{stg}\cite{mr-ghci-stg-unboxed}.
} Optimisations implemented in the simplifier are not performed.
\acrshort{ghci} is intended for interactive evaluation and favours fast,
iterative development over runtime performance, making the naive code
generation approach a reasonable choice.

Every top-level definition, every scrutinee of a \hsCode{case} expression, and
every right-hand side of a non-trivial \hsCode{let} expression are compiled to
a \acrfull{bco}. Such an object contains an array of bytecode instructions
together with the data typically associated with a heap object: the arity of
the \acrshort{bco}, a bitmap indicating which of its arguments are pointers,
the literals it refers to, and pointers to various objects it refers to
(symbols, primitive operations, other \acrshortpl{bco}, or the object's array
of breakpoint information).

The bytecode format comprises 67 instructions in total, 35 of which only exist
to provide various ways of pushing values to the stack. The rest of the virtual
instruction set consists of a few instructions for heap allocation, various
less-than and equality tests, two instructions for invoking the C
\acrshort{ffi}, an explicit stack check instruction, and others. The wide
variety of instructions of a shared or similar purpose, particularly in the
case of stack pushes, is the consequence of distinguishing between the
representation of their arguments. These can be pointers subject to garbage
collection, word-sized integers, 64bit integers, floating point and double
precision numbers, etc.

There is one particular instruction that catches the eye: \texttt{BRK\_FUN}.
The bytecode generator places \texttt{BRK\_FUN} instructions at the very
beginning of every bytecode object. These instructions correspond to
breakpoints, though they are only relevant when the user has ``placed'' a
breakpoint at a position in the source code. Alas, ``placing'' breakpoints is
something of an illusion, the instructions are pervasive and every breakpoint
has a numeric identifier assigned at compilation time. The introduction of a
new breakpoint in the \acrshort{ghci} \acrshort{ui} simply sets a flag in a
breakpoint bitmap, enabling the corresponding breakpoint instruction from the
perspective of the interpreter.


\subsection*{The bytecode interpreter}
The interpreter which \acrshort{ghci} relies on is a part of the
\acrshort{rts}. Its primary workhorse is the \texttt{interpretBCO} function
which handles closure evaluation, unboxed returns, function application, and
interpretation of bytecode instructions. For tasks it is unable to deal with,
such as application of machine-code functions, it returns to the scheduler.

The interpreter looks at the top of the stack to decide what to do. If it finds
a closure, it inspects its type. Most closures are evaluated by
\textit{entering}, that is, execution jumps to their entry code. For some types
of closures (such as indirections), the interpreter includes shortcuts to avoid
the overhead of returning from the interpretation loop to the scheduler and
entering the closure. To handle other types of closures, the interpreter
returns to the scheduler, setting a field in the \acrshort{tso} which indicates
that execution should proceed with machine code evaluation when the thread is
woken again.

If the stack is set up for a \acrshort{bco} application with a
\texttt{RET\_BCO} closure below a bytecode object with its arguments, the
interpreter executes the \acrshort{bco} instructions. Interpretation of byte
code works simply by case analysis on the current instruction.

\subsection*{The debugger}
A notable feature of \acrshort{ghci} is its debugger, which allows the
programmer to place breakpoints on certain expressions in their code. The
interpreter then pauses execution when it is about to evaluate an expression
marked by a breakpoint.

Due to laziness, the order in which breakpoints are hit depends on the order in
which their respective thunks are forced to \acrshort{whnf}, not directly on
the order in which functions are called. Breakpoints thus equip the Haskell
programmer with a powerful tool for debugging order of evaluation issues caused
by the language's non-strict semantics.

Internally, breakpoints rely on a special bytecode instruction called
\texttt{BRK\_FUN}. Upon encountering this instruction, the interpreter first
checks whether it is already returning from a breakpoint (via a flag in the
\acrshort{tso}). If it is not returning from a breakpoint and the associated
breakpoint is enabled, the interpreter pauses execution at this point.

Pausing on a breakpoint is quite an involved action. The interpreter prepares
to call a ``breakpoint IO action,'' a Haskell function invoked to resume
\acrshort{ghci}'s server thread by filling the shared mutable variable. A
pointer to this function is kept in a global variable in the \acrshort{rts} and
updated from the Haskell side via \acrshort{ffi}. The preparation for an IO
action call saves the top stack frame to a new closure, a pointer to which is
passed to the IO action. The stack is then set up to call the IO action, and
the interpreter returns to the scheduler in order to perform the call.

At no point is the instruction pointer persisted -- the progress of evaluation
of the current \acrshort{bco} is lost whenever the interpreter stops at a
breakpoint. This is acceptable, as the bytecode generator makes sure to only
put \texttt{BRK\_FUN} instructions at the very start of bytecode objects and
the \acrshort{tso} flag ensures that a just-visited breakpoint is not stopped
at again.

When stopped at a breakpoint, the user can still evaluate expressions at the
\acrshort{ghci} prompt. It is thus possible to encounter another breakpoint
while stopped at a breakpoint or to hit the same breakpoint multiple times,
without ever resuming paused evaluation. The debugger maintains a stack of
contexts which makes this possible.

Additionally, the \acrshort{repl} has access to the free variables in the
paused expression. The \texttt{:print} command lets the user print the values
of the free variables without forcing their evaluation, binding thunks to fresh
variables. The \texttt{:force} command does the same, except that it forces the
evaluation of the reference it is applied to. These commands are available even
when not stopped at a breakpoint, but become especially useful when stepping
through a program.

\subsection*{Trace\-points}
Our design for dynamic tracing via \acrshort{ghci} builds on the existing
functionality of breakpoints. We introduce \textit{trace\-points}, a simpler
variant of breakpoints which uses the communication channel between the server
and \texttt{eval} threads to pause evaluation at every breakable expression.

The compiler features a generic approach to \acrshort{ast} annotations via the
\hsType{Tickish} type. Values of the \hsType{Tickish} data structure are called
\textit{ticks} and appear both in the surface syntax and in the Core language.
\nameref{sec:hpc} functionality, profiling, and breakpoints are all implemented
as ticks annotating Haskell expressions. The compiler includes scoping rules
for ticks, these specify how closely should a particular tick stick to the
expression it annotates, which is important for optimisations.

Profiling ticks and breakpoints are added to the Core program during the
desugaring phase. Breakpoint ticks capture the free variables of the expression
they annotate. When the interpreter pauses at a breakpoint and sets up the
stack to call the \acrshort{ghci} breakpoint IO action, it saves the top stack
frame\footnote{
	That is, the portion of the stack referenced by the just entered
	\acrshort{bco}. The size of the stack frame is determined from the
	\acrshort{bco}'s bitmap, which indicates the pointerhood of its arguments /
	free variables.
} into a stack application closure. A pointer to the closure is passed to the
IO action and then messaged to the UI. This lets the debugger print out the
values of free variables at the site of a breakpoint.

Tracepoints are equivalent to breakpoints, except that they do not need to
support nested hits. Our motivation for introducing a breakpoint analogy was to
open room for future extensions while preserving the functionality of
breakpoints. Tracepoints share the \hsIdent{breakMVar} with breakpoints.
Besides the communication channel between the server and the \texttt{eval}
threads, most of the changes necessary for tracepoints are simple duplications
of the existing breakpoint infrastructure.  We add a new constructor for
\hsType{Tickish} which represents a tracepoint, a new bytecode instruction
\texttt{TRC\_FUN}, a new IO action, a new status message for tracepoint hits,
and the accompanying code throughout \acrshort{ghci}. Modifications to the
module abstractions are not necessary because tracepoints do not need to be
enabled or disabled -- the interpreter should stop at each and every one.

We modify the \acrshort{ui} such that when a tracepoint is hit, all the values
of the free variables of the corresponding expressions are logged to a trace
file. We also save accompanying timing information and the corresponding
location in the source file. Evaluation then resumes without pause.

\subsection*{Summary}
The tracepoint approach could be extended and improved in many ways. For
example, we could tighten the loop between the interpreter and the reaction to
a tracepoint encounter. There is no good reason why it should have to route
through the \acrshort{ui}. We could also replace the built-in \texttt{:print}
command with \nameref{sec:ghc-heap-view} and perform additional analysis of the
runtime values while stopped at a tracepoint.

Nevertheless, our evaluation of the outlined solution points to significant
problems. The resulting trace does not contain enough information to make
useful inferences about the strictness properties of the code. The problem
seems to be that with nested thunks, evaluation does not follow a single
logical thread, but rather ``skips around'' depending on the demands made on
the delayed computations. The hard-to-predict, gradual normalisation of values
makes it prohibitively difficult to relate different trace entries. The context
in which a trace record originated is not guaranteed to be the same for the
previous or the following record.

These problems could be overcome by introducing state into the tracing
framework. It would be enough to mirror the tree-like hierarchy of
subexpressions in the source code with tokens which would relate them at
runtime, similarly to how \nameref{sec:htrace} uses a simple mutable counter to
track nested calls. Unfortunately, neither the bytecode pipeline nor the
interpreter offer a good point of extension for adding said state. Introducing
impurities in the Core program is difficult and error-prone. The bytecode
generator and interpreter are both too far removed from the source program and
would require sizeable changes to accommodate the sort of ad-hoc functionality
necessary to make tracepoints work well.

The \acrshort{ghci} approach has other downsides as well. The current bytecode
pipeline lacks support for some language extensions and interpretation is much
slower than execution of machine code. These issues are not as problematic as
the low utility of the tracing results, however.

\section{Using compiler plugins}
To produce useful tracing output, a dynamic tracing framework must capture
interesting events during a program's evaluation and relate them to one
another. In particular, the evaluation of function arguments must be clearly
related to the respective function call to enable reasoning about the
strictness of a function on a call-by-call basis. While retaining the order of
evaluation is trivial in a call-by-value language, laziness introduces
interleaving. This can only be dealt with by the introduction of state into the
program (or into the tracing framework) in order to recover the dependencies
between function calls and argument evaluations, which are no longer implicit
in the order of the trace events.

It is this function-call-specific state that becomes difficult to express
without high-level information about the program structure at hand, as is the
case with the \acrshort{ghci} approach described in Section
\ref{sec:using-ghci}.

\subsection*{Adding state}
Fortunately, introduction of function-call-specific state is rather trivial.
The source program can simply be rewritten to store the state in local
variables. It suffices to keep a unique identifier of the particular function
call that the argument evaluation traces can refer to. Such a unique identifier
necessarily needs to change with every function call. In clean Haskell code
without unsafe features, this is impossible in general, as the language
requires the use of the \hsType{IO} monad in order to perform side-effecting
computations.

Since rewriting functions into a monadic form would be a difficult undertaking,
we prefer the way of unsafe features. Integer counters are enough for call
identification purposes, so we choose to keep one counter per function. All
counters can be stored in a single mutable map indexed by function names.
Unsafe functions in the standard Prelude can also be used to persist tracing
information to a file.

Equipped with a means of introducing benign side-effects into programs for
tracing purposes, we are in search of a way of rewriting source code to put
these side-effects to use. One plausible approach would be direct source code
rewriting, akin to \nameref{sec:hat}. As described in Section
\ref{sec:existing-tools}, source-to-source transformations have the benefit of
generality, but also the downside of additional complexity in both the
rewriting process itself and the build process of the program, which the user
of our tool would have to deal with. Furthermore, true implementation
agnosticism of the tracing framework would require compiler-independent support
for inspection of the Haskell heap, for which no solution seems to exist at the
time of writing. A less general but more ergonomic way of rewriting source code
is via \acrshort{ghc}'s \textit{source plugins}, which hook directly into the
compiler pipeline and can operate on the surface-level syntax at different
stages.

\subsubsection*{Source plugins}
Source plugins~\cite{ghc-source-plugins} are a relatively recently introduced
feature of \acrshort{ghc}. Compiler source plugins are Haskell packages which
invoke the \acrshort{ghc} \acrshort{api} to hook into the compiler pipeline and
modify the compiled program at various stages of the front end. Unlike Core
plugins~\cite{ghc-compiler-plugins}, which operate on the internal language,
source plugins deal with the entirety of Haskell's surface syntax.

Rather than parsing, transforming, and serialising the source code separately
to the compilation step, we can design a plugin that performs the required
source transformations in the compiler pipeline directly. We introduce two
tracing functions, \hsIdent{traceEntry} and \hsIdent{traceArg}. We then rewrite
the source program to call \hsIdent{traceEntry} every time a function in the
program is invoked and we thread every reference to a function's argument
through \hsIdent{traceArg}. This introduces the opportunity to inspect the
runtime representations of the arguments passed to a function when the result
of the function is under scrutiny.

We can determine some of the strictness properties of a transformed function
from the calls it makes to the tracing utilities. If we record a call to a
(transformed) top-level function \hsCode{f :: Int -> Int -> Int} defined as
\hsCode{f x y = ...} via \hsIdent{traceEntry} but no calls to
\hsIdent{traceArg}, the function makes no use of any of its arguments, and is
therefore non-strict in both of them. Examples of functions of this behaviour
include \hsCode{f x y = 3}, \hsCode{f x y = undefined}, or \hsCode{f x y = f x
y}. Note that the latter example references the arguments on the
\acrshort{rhs}, but these references are never evaluated. If a call to
\hsIdent{f} is followed by a call to \hsIdent{traceArg} for the \hsIdent{x}
argument, but the program terminates and no calls to \hsIdent{traceArg} for the
\hsIdent{y} argument occur, we say that \hsIdent{f} is strict in \hsIdent{x}
and \textit{potentially lazy} in \hsIdent{y}. \hsIdent{f} could be lazy in
\hsIdent{y}, but it could also conditionally require \hsIdent{y} to be
evaluated based on the value of \hsIdent{x}. The property of a multi-argument
function being strict in one argument if another argument matches a predicate
(and being non-strict in that argument otherwise) is what makes the
interpretation of traces of nested functions tricky.

More interesting cases arise with types which can contain thunks themselves --
their \acrlong{whnf} differs from their normal form. Consider \hsCode{g :: Int
-> [Int]} (defined as \hsCode{g x = ...}). We could observe entries of
\hsIdent{g} to be always followed by evaluations of \hsIdent{x}. This does not
mean that \hsIdent{g} is necessarily strict in \hsIdent{x}, because the
evaluations we have observed may simply happen to be immediately followed by
evaluations of \hsIdent{g}'s return value beyond \acrshort{whnf}. This is the
case in Listing \ref{lst:list-generator}.

\begin{listing}[h]
	\centering
	\begin{minted}[autogobble]{haskell}
		g :: Int -> [Int]
		g x = [x]

		main = print . g $ 2 + 2
	\end{minted}
	\caption{Deep evaluation of an applied lazy function.}
	\label{lst:list-generator}
\end{listing}

However, a decisive analysis of the strictness of a function is not the point
of dynamic tracing. We wish to understand the use of laziness in practice. We
are interested not only in whether a function used its argument at all, but
also in \textit{how many} calls it did so. We would like to learn \eg whether
there are lazy functions which are only invoked in a strict manner (akin to the
use of \hsIdent{g} in Listing \ref{lst:list-generator}), or invoked in a strict
manner most of the time.\footnote{
	For small functions like the one in our example, inlining will take care of
	eliminating unnecessary laziness. For larger functions, inlining may not
	help, or the strictness analysis may be too conservative to eliminate
	unnecessary overhead.
} The relationships between the values of arguments in a single function call
are instrumental in providing the context necessary for the interpretation of
tracing results.

\subsection*{Rewriting the \acrshort{ast}}
Armed with the impure tracing functions and a plan on how to apply them, we
move on to the problem of syntax tree transformation. The \hsModule{GhcPlugins}
module~\cite{hkg-ghcplugins} of the \acrshort{ghc} \acrshort{api} includes the
necessary functions to hook into the compiler pipeline. A source plugin can
choose to modify the syntax tree at three different stages: right after
parsing, between renaming and typechecking, or just after the typechecker has
run. These hooks involve different trade-offs. Construction of new (sub)trees
becomes more and more difficult further down the pipeline as the internal
representation accumulates metadata from the various stages. On the other hand,
the available metadata may be necessary for certain tasks and can help plugin
authors write more robust implementations. For example, constructing parsed
expressions is almost as easy as writing the surface syntax in a source file,
using strings as identifiers, but it may result in accidental captures of
bindings in scope. Because the renaming phase disambiguates identifiers,
constructing renamed \acrshortpl{ast} avoids this issue, at the expense of
either working with abstract identifiers, or invoking a renaming phase
manually.

As Pickering's introduction to source plugins shows, the costs associated with
the construction of syntax trees later in the pipeline are not
prohibitive~\cite{blog-source-plugins}. The \acrshort{ghc} \acrshort{api}
exports high-level functions which let the plugin author take trees from parsed
to renamed to typechecked in only a few lines of code.  Moreover, the plugin
author can use the quasi\-quoting features~\cite{th-quasiquoting} of Template
Haskell~\cite{th-classic} to greatly simplify the construction of expressions.
The quasi\-quoting facilities even manage references to definitions in the
scope of the plugin's source code automatically. Common patterns in the
expressions created by the plugin can be included as regular top-level
definitions in the plugin's module or in a module the plugin depends on and
spliced into the syntax tree. With these high-level features in mind, the
suitable injection mechanisms for a dynamic tracing source plugin seem to be
before and after typechecking. We only discuss the latter approach in the
following text, even though a source plugin operating on the renamed
\acrshort{ast} would likely be very similar. Note that the \acrshort{api} makes
no hard distinction between the different approaches to pipeline extensions.
Indeed, a source plugin simply provides a value of the \hsType{Plugin} data
type, overriding the appropriate fields of a default plugin implementation with
monadic functions. A source plugin could run custom code after each of the
front end stages.

The actual process of rewriting the right-hand sides of function definitions
involves the data types for the surface syntax of Haskell, which has hundreds
of constructs~\cite[Key~Design~Choices]{arch-ghc}. The general task of
transforming hierarchies of deeply nested data types has many innovative
Haskell solutions, including optics and generic programming. While we could use
pro\-functor optics or novel generic approaches, we leverage a fairly simple,
if a bit dated, generic programming technique via the \acrfull{syb}
library~\cite{syb-paper}. \acrshort{syb}'s built-in querying and transformation
schemes empower the Haskell programmer with means of applying type-specific
functions in all appropriately typed fields of a nested data structure. The
library is built using powerful generalisations of folding and a number of
combinators, making it easy to create new traversal schemes as compositions of
existing building blocks.


\chapter{Implementation}
In this chapter, we confront some of the development-related issues that tend
to arise when working with the \acrshort{ghc} project, in the hope of easing
future endeavours. The description of the problems we encountered in our work
relates to features of the compiler which did not fit in earlier chapters.
Next, we discuss the implementation details of a compiler source plugin we
developed for dynamic tracing and present the results of the implementation.


\section{Working with \acrshort{ghc}}
While obtaining and compiling a local copy of \acrshort{ghc} source code is
unnecessary for compiler plugin development, a programmer inexperienced with
the internals of the project may find it helpful to occasionally peek under the
hood of its \acrshortpl{api}. Obtaining a working copy is naturally a
prerequisite for modifying the project -- we did so when evaluating the
\acrshort{ghci} approach.

The \acrshort{ghc} codebase is a large and complicated collection of source
files written primarily in Haskell and C~\cite{arch-ghc}. The project is
supported by a custom build system called Hadrian~\cite{hadrian}, itself
written in Haskell, introduced to replace architecture based on \acrshort{gnu}
Make. As of \acrshort{ghc} 8.10.2, either system can build the project. We do
not recommend using Make: Hadrian was developed to address many issues with
Make, including its excessive complexity and poor performance. Additionally,
the build tool of the programmer's choice can be combined with a Docker- or
Nix-assisted setup, simplifying the installation of other dependencies required
for the build process. Tools for simplifying library management are compelling
choices for projects with larger teams. However, we found it significantly
easier to set up a ``native'' development environment. Both Nix and Docker
suffer from substantial disk usage overhead. Our experience in developing a
Docker container for \acrshort{ghc} 8.10.2 indicates that large portions of the
native setup have to be replicated in the container.

For example, correctly configuring \acrshort{hls} to load the \acrshort{ghc}
source code is quite an involved task, eased by the \texttt{hie-bios}
project~\cite{gh-hie-bios} which ships with Hadrian (as a part of the
\acrshort{ghc} source tree). However, compiling \acrshort{ghc} inside the
container results in \texttt{hie-bios} reporting include paths specific to the
container's file\-system to the language server, which in turn breaks all of
its functionality. One possible solution is installing \acrshort{hls} inside
the Docker container instead and letting the code editor of choice talk to the
containerised process. In comparison, a native setup with \texttt{ghcup} is
far easier and comes with fewer surprises along the way, which is why we would
recommend it for a project similar in scale to ours.

\subsection*{Build process}
The first step to working on the project after obtaining the source code is
setting up the build system. \acrshort{ghc} is a self-hosting compiler, meaning
it can and does compile its own source code. Since the project quickly adapts
to use new language extensions, specific releases of \acrshort{ghc} require
specific versions of it already installed. As a result, the management of
\acrshort{ghc} versions on a Unix-like system with a system-wide package
manager can be difficult. The \texttt{ghcup} tool~\cite{ghcup} can greatly ease
this task and enable quick switching between versions. \texttt{ghcup} lets the
developer quickly install and switch between the releases of not only
\acrshort{ghc} itself, but also Cabal, the Haskell build system and dependency
manager, and the \acrfull{hls}, an \acrshort{lsp}-compliant language server
providing Haskell-specific editor integration features.

In our setup, we chose to use Hadrian. The build system bootstraps the compiler
in several steps. The \acrshort{ghc} provided by the system is referred to as
the \textbf{stage 0} compiler. \acrshort{ghc} comes with build scripts which
use the \textbf{stage 0} compiler to build first the Hadrian build system. Once
Hadrian has been built, the user invokes it to build the \textbf{stage 1}
compiler, which is a \acrshort{ghc} linked against the \textbf{stage 0}
\hackage{base} library. The \textbf{stage 1} compiler is subsequently used to
build the core libraries from scratch. It is then utilised again to build the
\textbf{stage 2} compiler, which is linked against the freshly built
\hackage{base}. The \textbf{stage 2} compiler constitutes a complete build of
\acrshort{ghc} from source code. There is an optional follow-up step, where the
\textbf{stage 2} compiler builds a \textbf{stage 3} compiler, which is useful
for profiling \acrshort{ghc} while building \acrshort{ghc}.

Hadrian offers many configuration options defined as a Haskell module called
\hsModule{UserSettings} and comes with a template settings file which explains
the various options the user can tweak. The main abstraction is a \textit{build
flavour}, a collection of build settings which fully define a \acrshort{ghc}
build. For debugging the \acrshort{rts}, it is important to know about a
lower-level abstraction called a \textit{build way}. Libraries and the
\acrshort{rts} can be built in multiple \textit{ways}, which configure
far-reaching \acrshort{ghc}-specific features, such as profiling and debugging
symbols. The \acrshort{rts} can be built in threaded and non-threaded ways, the
latter way runs in a single operating system thread. When exploring
\acrshort{ghci}, we developed a custom build flavour which built the
\acrshort{rts} with the debugging variant of the threaded way. There are still
some issues with the Hadrian \acrshort{rts} build process, which means the
configuration has to include more ways than necessary.\footnote{
	\url{https://gitlab.haskell.org/ghc/ghc/-/issues/17814}
}

After the initial build, the \textbf{stage 1} compiler can be \textit{frozen}
by passing a command-line option to the build system on subsequent invocations.
This prevents rebuilding the \textbf{stage 1} compiler every time a source file
changes, which speeds up the edit-compile-run cycle tremendously.

\subsection*{Developer experience}
The complexity of the \acrshort{ghc} codebase confuses many an editor plugin,
which makes code navigation and exploration difficult. Thankfully, the
\acrshort{hls} project continues to accumulate useful features and improve in
stability. Over the course of our work, \acrshort{hls} went through releases
0.9.0, 1.0.0, and 1.1.0. The latter can reliably search for symbol references
in most of the \acrshort{ghc} source files. This feature was introduced in
release 1.0.0, but the indexing process got stuck in the \acrshort{ghc}
project. Given the daunting scale of the compiler, similar tooling is
tremendously helpful to an inexperienced developer. If the language server
continues to mature, it could play an important role in attracting more
developers to \acrshort{ghc} or even to Haskell in general.

In stark contrast to the near out-of-the-box experience of \acrshort{hls}, the
Cabal package system is very user-unfriendly software. It consists of the Cabal
library and the \hackage{cabal-install} package, which provides the
\texttt{cabal} command-line tool. The project has outgrown its intended use
cases, as evidenced by the sorry state of the command-line interface. For
example, the interface can install but not uninstall packages. The commands
that \texttt{cabal} implements are split into variants with \texttt{v1-},
\texttt{v2-}, and \texttt{new-} prefixes. The best solution for certain
use-cases (such as installing multiple versions of a local package for a
specific package database) seems to be deleting the database and building the
package again from scratch. We ran into this problem when installing our
compiler plugin into the package database of a local fork of the compiler.

\section{Dynamic tracing with plugins}
With a Haskell development environment ready, we have all we need to dive into
the implementation of a compiler plugin. We take a look at the details first,
then take a step back and cover building and applying the plugin.

Our goal is to transform the following code into a variant that produces a
recording of laziness-related events during evaluation. Take the naive version
of the QuickSort algorithm shown in Listing \ref{lst:qsort} as an example. We
would like to log both calls to this function and evaluations of its arguments.

\begin{listing}[h]
	\centering
	\begin{minted}[autogobble]{haskell}
	qsort []     = []
	qsort (a:as) = qsort left ++ [a] ++ qsort right
	    where (left, right) = (filter (<=a) as, filter (>a) as)
	\end{minted}
	\caption{The QuickSort algorithm on linked lists.}
	\label{lst:qsort}
\end{listing}

A transformed version of \hsIdent{qsort} with dynamic tracing is shown in
Listing~\ref{lst:qsort-perfect}. Every reference to an argument is replaced
with a call to the helper function \hsIdent{traceArg}, patterns in function
definitions are split into binding and pattern matching, and the right hand
sides of functions are wrapped in \hsCode{let} expressions which introduce
\hsIdent{callNumber} variables.

\begin{listing}[h]
	\centering
	\begin{minted}[autogobble]{haskell}
	qsort xs = let !callNumber = traceEntry "qsort"
	           in case traceArg "qsort" "xs" callNumber xs of
	             []     -> []
	             (a:as) ->
	               qsort left
	                 ++ [traceArg "qsort" "a" callNumber a]
	                 ++ qsort right
	               where
	               (left, right)
	                = (filter
	                    (<= traceArg "qsort" "a" callNumber a)
	                    (traceArg "qsort" "as" callNumber as),
	                   filter
	                    (> traceArg "qsort" "a" callNumber a)
	                    (traceArg "qsort" "as" callNumber as))
	\end{minted}
	\caption[The QuickSort algorithm, rewritten.]{The QuickSort algorithm on
	linked lists, extended with impure tracing calls.}
	\label{lst:qsort-perfect}
\end{listing}

The call number variables let us group the trace records by the call they
originated in. Any references to function arguments first pass through
\hsIdent{traceArg}, opening the opportunity to inspect the underlying runtime
structure before it is actually used in the program.

\subsection*{Anatomy of a plugin}
Our source plugin consists of four modules.
\begin{itemize}
	\item \hsModule{TracingPlugin}, the entry point of execution and the only
		exposed module of the package,
	\item \hsModule{Typechecking}, which contains utilities for typechecking
		expressions constructed by the plugin,
	\item \hsModule{Logging}, which defines the tracing functions that we
		compile into source programs, and
	\item \hsModule{Rewriting}, where the magic happens.
\end{itemize}

The \hsModule{TracingPlugin} module simply defines and exports a
\hsType{Plugin} derived from the \hsIdent{defaultPlugin} implementation,
overriding \hsIdent{typeCheckResultAction}, the function invoked after the
typechecking phase. Neglecting command-line arguments, our action has the type
\hsType{ModSummary -> TcGblEnv -> TcM TcGblEnv}. As the type indicates, it
computes within the typechecking monad (\hsType{TcM}) with access to
information about the current module (\hsType{ModSummary}), modifying its
typechecking environment (\hsType{TcGblEnv}). The action is invoked once for
each compiled module. The typechecking environment is a large data structure
which describes the top level of a module with 58 fields. Of these, only
\hsCode{tcg_binds :: LHsBinds GhcTc} is interesting to us. The type
constructor \hsType{LHsBinds} stands roughly for ``located Haskell bindings''
and represents a collection of all the top-level bindings of a module annotated
with their source file locations. Our post-typechecking action simply threads
this field through our rewriting function, which also computes in the
typechecking monad, and returns the transformed bindings.

The rewriting function, shown in Listing \ref{lst:hs-rewrite}, resides in the
\hsModule{Rewriting} module. It initiates a stateful computation which
transforms the bindings in a generic manner using the \acrshort{syb} library.
\begin{listing}[h]
	\centering
	\begin{minted}[autogobble]{haskell}
        rewrite :: LHsBinds GhcTc -> TcM (LHsBinds GhcTc)
        rewrite binds = fst <$> (`runStateT` initialState)
                                (everywhereM' trans binds)
	\end{minted}
	\caption[The top-level rewriting function.]{The top-level rewriting
	function, a sole export of the \hsModule{Rewriting} module.}
	\label{lst:hs-rewrite}
\end{listing}

Since the \acrshort{ghc} \acrshort{api} abstracts over compiler state using
(among other types) the \hsType{TcM} monad, the generic transformation
involving any non-trivial compiler computations needs to be monadic as well.
This transformation is implemented by the \hsIdent{trans} function (shown in
Listing \ref{lst:hs-trans}), which additionally carries a context from the
roots of the top-level definitions down to their leaves. We combine the
stateful traversal with the typechecking monad by way of the \hackage{mtl}
package, itself inspired by \cite{higher-order-polymorphism}, using the
\hsType{StateT} monad transformer.

\begin{listing}[h]
	\centering
	\begin{minted}[autogobble]{haskell}
        trans :: Typeable a => a -> StateT WrapperState TcM a
        trans = mkM collectFunInfo `extM` wrapRef `extM` incrementCC
	\end{minted}
	\caption{The generic transformation function.}
	\label{lst:hs-trans}
\end{listing}

\hsIdent{trans} is applied in a single, top-down traversal of the
\acrshortpl{ast} via a \acrshort{syb} scheme derived from \hsIdent{everywhereM}.
Ultimately, the function pattern-matches on important structures in the syntax
trees of top-level bindings in three different ways:
\begin{enumerate}
	\item \hsIdent{collectFunInfo} adds information about the current function
		to the \hsType{WrapperState},
	\item \hsIdent{wrapRef} wraps argument references with a tracing function, and
	\item \hsIdent{incrementCC} wraps the right-hand side of each function with
		a \hsCode{let} binding, introducing a call counter variable into
		its scope.
\end{enumerate}

Each of these building blocks of the complete transformation operates slightly
differently.

\begin{description}
	\item[\hsCode{collectFunInfo :: Bind -> StateT WrapperState TcM Bind}]
		pattern-\\matches on the various sorts of bindings that can appear in
		an \acrshort{ast} and extracts the names of the named ones, saving them
		to the \hsType{WrapperState} context, thus providing the name of the
		innermost named function to the other transformations.

	\item[\hsCode{incrementCC :: RHS -> StateT WrapperState TcM RHS}]
		pattern-matches \\on right-hand sides of functions and introduces calls
		to the tracing function \hsIdent{traceEntry} using \acrfull{th}.
		Calling \hsIdent{traceEntry} with a function name increments a global
		call counter for that function and returns the counter's current value.
		\hsIdent{incrementCC} has to introduce a new binding in the scope of
		the right-hand side so that tracing calls on the \acrshort{rhs} can
		refer to the call ID.  Since \acrshort{th} cannot lift the Haskell
		\acrshort{ast} types, the binding has to be constructed in two steps.

		First we read the \hsType{WrapperState} to find out the name of the
		function we are currently transforming. We construct a \acrshort{th}
		expression for the application of \hsIdent{traceEntry} to the function
		name and bind it via a \hsCode{let} binding which assigns the result to
		a new call counter variable in the scope of a dummy expression (a proxy
		to \hsIdent{undefined}). Then we typecheck this expression and run a
		\acrshort{syb} transformation which replaces the dummy subexpression
		with the original right-hand side. Care must be taken when replacing a
		node in the typechecked \acrshort{ast} because the typechecker inserts
		type applications for polymorphic terms such as \hsIdent{undefined}.

		Finally, \hsIdent{incrementCC} also finds the \hsType{Id} of the call
		counter variable via a \acrshort{syb} query and saves it in the
		\hsType{WrapperState}.

	\item[\hsCode{wrapRef :: LExpr -> StateT WrapperState TcM LExpr}]
		pattern-matches \\on references to function arguments in function
		bodies. Its purpose is to transform every argument reference into a
		call to \hsIdent{traceArg}.

		To identify references to function arguments, \hsIdent{wrapRef}
		consults the \hsCode{boundVars :: [Id]} collection. This collection is
		built independently of the \hsIdent{wrapRef} transformation, since it
		needs no function-specific information. We rely on the fact that while
		references to bindings are semantically valid only in local
		(lexically-scoped) contexts, they have globally unique identifiers.
		Collecting the identifiers of function arguments is thus a simple task
		of traversing all the syntactical pattern-matching structures which
		bind them. We once again leverage \acrshort{syb} to do this without
		having to pattern-match on the entirety of surface syntax.

		When the reference wrapping transformation identifies a function
		argument, it constructs a partial application of the \hsIdent{traceArg}
		tracing function and applies the original binding reference to it. The
		partially-applied \hsIdent{traceArg} is an unsafe identity function
		which logs information about the argument's runtime representation to a
		file.

		Since the overall rewriting operation proceeds in a top-down manner,
		the \hsIdent{wrapRef} transformation runs into the issue of producing
		subexpressions it could recursively match on again. This could be
		avoided by tagging the transformed expressions somehow. Unfortunately,
		this is difficult to achieve, because the \acrshort{ast} datatypes lack
		useful typeclass instances for doing so. Crucially, there is no notion
		of equality on syntax trees and no hashing implementation which would
		let us store the transformed expressions in a hash set (or at least a
		set). We work around this limitation by stripping the source location
		tags from the \acrshort{ast} nodes and checking for their presence
		before invoking \hsIdent{wrapRef}'s rewriting logic, but we are aware
		of the problems with this approach. However, issues with error
		reporting are largely mitigated by the fact that the plugin is invoked
		after the source program passed the typechecking phase.
\end{description}

\subsection*{Implementation of tracing utilities} \label{sec:tracing-util-impl}
The tracing functions inserted into the \acrshort{ast} by the rewriting logic
reside in the \hsModule{Logging} module. They leverage the unsafe IO features
of Haskell, specifically the standard \hsCode{unsafePerformIO :: IO a -> a}
from \hsModule{System.IO.Unsafe}, to hide the side-effects of tracing from the
type system. When invoked, these functions append a row of
\acrshort{csv}-encoded data to a \textit{trace file}, a log of interesting
events that occurred during the evaluation of a Haskell program, which is
suitable for further analysis.

\begin{description}
	\item[\hsCode{traceEntry :: String -> Int}] marks the evaluation entry
		point of a function. Taking the function's name, it increments its call
		counter in the background and returns its new value. The call counters
		are stored in a global map called \hsCode{functionEntries :: IORef (Map
		String Int)}. The \hsType{IORef} indirection makes
		\hsIdent{functionEntries} a mutable variable which can be manipulated
		in the \hsType{IO} monad. The map is explicitly marked with a
		\mintinline[
			breakbytokenanywhere,breaklines
		]{haskell}|{-# NOINLINE #-}|
		pragma to ensure it is shared between the tracing calls. Since the
		\hsType{IORef} constructor returns a reference in the \hsType{IO}
		monad, we allocate the global variable via \hsIdent{unsafePerformIO}.

		The call counter map is empty at first, individual counters are
		initialised on-demand. The initialisation of a new call counter and the
		increment of an existing one are both described concisely by the
		\hsIdent{insertWith} operation on \hsType{Map}s, which takes a binary
		function on values, a key, an initial value, and a map, and either
		initialises the key to the initial value or updates it by applying the
		binary function to its current value and the initial one. This
		operation is applied atomically via \hsIdent{atomicModifyIORef'} to
		accommodate concurrent updates.

	\item[\hsCode{traceArg :: String -> String -> Int -> a -> a}] indicates a
		reference to a function argument. Partially applying this function to
		the name of the enclosing function, the name of the referenced
		argument, and the number of the call to the enclosing function leaves
		an impure identity, which is applied to the actual argument.
		\hsIdent{traceArg} leverages the \hackage{ghc-heap-view} library to
		take a peek at the runtime representation of the argument to determine
		whether it has been evaluated or not.

	\item[\hsCode{logt :: TraceSort -> [String] -> IO ()}] persists a tracing
		message to the trace file. Calls to this function are not introduced
		during the rewriting process directly, but both \hsIdent{traceEntry}
		and \hsIdent{traceArg} call it internally. The function can thus stay
		in the safe realm of the language, as its type indicates. File system
		operations in Haskell require a value of type \hsType{Handle} which the
		\acrshort{rts} uses to manage IO with file system objects. Allocating a
		handle corresponds to opening a file. Since that is a potentially
		expensive operation, we store the handle in another non-in\-line\-able
		\hsType{IORef}, again created globally with \hsIdent{unsafePerformIO}.
		\hsIdent{logt} then simply reads the \hsType{IORef}, appends tracing
		data to the file, and flushes the handle, to avoid problems with lazy
		IO and prevent data loss when the program exits.
\end{description}

The types \hsType{Bind}, \hsType{RHS}, and \hsType{LExpr} are aliases for the
more verbose structures of the surface syntax.

\subsection*{Using the plugin}
Applying a compiler plugin in the pipeline of \acrshort{ghc} is
straightforward. The user simply passes a command-line option \texttt{-fplugin}
set to the name of the module that exports the plugin definition. Multiple
plugins can be specified at once by passing more than one \texttt{-fplugin}
option, the compiler applies them in the order they are defined.\footnote{
	Still subject primarily to the order of phases in the pipeline, but ordered
	secondarily according to the succession of command-line arguments to avoid
	ambiguities.
} To make the module visible to the compiler, it has to be installed in
\acrshort{ghc}'s package database. Installation is a matter of executing the
Cabal command \mintinline{bash}{cabal install --lib} in the plugin's directory.
An example project with a Cabal configuration that applies the plugin on all
modules is included on the enclosed medium.

\subsection*{Shortcomings}
Our implementation work does not include all the transformations required to
perform the perfect rewriting as illustrated in Listing
\ref{lst:qsort-perfect}. Instead, it performs a simplified transformation which
neither separates variable binding from destructuring nor untangles nested
pattern matches. Due in part to limitations in Template Haskell, \hsCode{let}
bindings of call counters do not include bang patterns. Several styles of
pattern definitions are also unsupported. A notable omission is the support for
the popular \hsCode{do} notation, commonly used when working with monads.

The plugin unfortunately suffers from a bug caused by the traversal scheme used
in the rewriting transformation. The error stems from the way the call counter
variables are referred to in the stateful computation.  When the traversal
exits a lambda, it should restore the call counter reference to that of the
surrounding context. This does not happen because \acrshort{syb} does not offer
a way of detecting this step. In effect, the plugin inserts undefined
references to code of certain shapes, crashing the compiler during desugaring.
Fixing this bug in the scope of \acrshort{syb} is challenging, because the
monadic transformation proceeds sequentially through the syntax tree.

Despite these issues, we feel that the design laid out and partially
implemented in this work is suitable for dynamic tracing. All of the
shortcomings listed above are implementation problems which can be addressed in
future work. Handling of \hsCode{do} notation can be implemented via
source-level desugaring, mirroring the work of the desugaring phase of the
compiler. This is in fact the technique already implemented by
\nameref{sec:hat}. The traversal issue can similarly be fixed by the choice of
another generic programming library better suited for monadic tree operations,
or by replacing the generic programming approach with another. The remaining
deficiencies have straightforward solutions and could be alleviated with more
time.

One simple but necessary improvement to the implementation is changing the
representation of names in the trace. The examples and our implementation uses
strings, which can lead to name clashes. However, \acrshort{ghc} assigns
globally unique identifiers to all names. During compilation, the plugin should
produce a table associating fully qualified names to their numeric identifiers.
The trace can then refer to functions and arguments with integers instead.

It remains to be seen whether the choice of rewriting after the typechecking
phase supports the future extensions. It may be easier to perform some
transformations without the annotations introduced by the typechecker and
trigger rewriting just after the renaming phase, or even right after parsing.


\setsecnumdepth{part}
\chapter{Conclusion}

This thesis aimed to design and implement a scalable dynamic tracing framework
for Haskell. Although the implementation does not support all of Haskell
syntax, the design allows for great scalability thanks to tight integration
with the \acrlong{ghc}. Tracing is added by rewriting the syntax trees of
source programs. The traced programs produce recordings of function call and
argument evaluations in \acrshort{csv} format, suitable for subsequent
analysis.

\section*{Future work}
Our work is only the beginning of the journey towards understanding the
practical implications of laziness. The compiler plugin we developed needs to
be finalised and extended before it can start recording data on unmodified
real-world programs. After incorporating these fixes, it can support
large-scale collection of traces, providing the empirical evidence for use in a
future study.


\bibliography{./bibliography/bbl.bib}
\bibliographystyle{./styles/iso690.bst}

\setsecnumdepth{all}
\appendix

\printglossary[type=\acronymtype]


\chapter{Contents of enclosed SD card}

%change appropriately
\begin{figure}[h]
	\dirtree{%
		.1 readme.md\DTcomment{an overview of the SD card contents}.
		.1 src\DTcomment{the directory of source codes}.
		.2 examples\DTcomment{example programs for use with the plugin}.
		.2 plugin\DTcomment{implementation sources}.
		.2 thesis\DTcomment{the directory of \LaTeX{} source codes of the thesis}.
		.1 text\DTcomment{the thesis text directory}.
		.2 thesis.pdf\DTcomment{the thesis text in PDF format}.
	}
\end{figure}

\end{document}