LSA & LaTeX

A somewhat comprehensive treatment of LaTeX for the purposes of linguistics and LSA in particular. Also, info on (some) LSA templates.

Compiled examples are here; takes a while to compile.


Trees

Don’t use qtree.

Compared with forest, qtree produces less compact trees, offers fewer formatting options, and is generally a tad clumsier. The choice is ultimately personal, but for the sake of style and space, forest is more advisable. (We will also compare the two packages side by side at the bottom of the page.)

Instead, use forest.

Forest (here, links to CTAN) is very flexible. So that you don’t have to read the documentation (which you are strongly advised to do anyway), here’s a sample tree:

% a simple forest tree; requires \usepackage[linguistics]{forest}
\begin{forest}
sn edges
[S [NP [N [Matthew]]] [VP [V [kissed]] [NP [N [John]]] ]]
\end{forest}
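
For readers who want to compile the sample on its own, here is a minimal sketch of a complete file. The article class is only an assumption for illustration; any class should do as long as forest is loaded with its linguistics library.

```latex
\documentclass{article}
% the linguistics library provides "sn edges" (and "roof", used further down the page)
\usepackage[linguistics]{forest}

\begin{document}
\begin{forest}
sn edges
[S [NP [N [Matthew]]] [VP [V [kissed]] [NP [N [John]]]]]
\end{forest}
\end{document}
```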

A more involved tree

The parameter sn edges sets the edges to the conventional linguistic format. Include it with every tree right after the begin bit. Somewhat more involved examples combine forest with expex and multicols in order to get a numbered example whose a. and b. parts sit side by side. An example would be this:

% a more involved tree; requires \usepackage[linguistics]{forest}, \usepackage{multicols}, \usepackage{expex}
\pex Explaining what's going on below.
% this \pex bit is from expex; it starts a numbered example; the parts of the example will be given by \a..\a..\a..
\begin{multicols}{2}
% trivially, this is multicols; {2} at the end indicates the number of columns you need -- this argument is mandatory: if you forget to specify it, LaTeX will be grumpy and throw something like "Missing number, treated as zero" at you
% also, multicols is definitely not the only way to achieve two columns (expex has requisite settings too) -- choose whichever suits you
\scriptsize
% trees can get large, like these X-bar ones, so manipulating font size with \scriptsize, \footnotesize, \tiny, etc. can be useful
\a Tree 1
\begin{forest}
sn edges
[S [NP\\Mary] [S' [S\\is] [VP [V\\thinking] [CP [C\\that] [S [NP\\John] [S' [S\\is] [VP [V\\reading] [NP [D\\the] [N\\book]]]]]]]]]
% note "\\" between, e.g. NP and Mary; in the previous example [NP [Mary]] would create a line going from NP to Mary;
% but since this was pretty much abandoned for the reasons Andrew Carnie explains in "Constituent Structure", "\\" just creates a line break
\end{forest}
\columnbreak
\scriptsize
\a Tree 2
\begin{forest}
sn edges
[S [NP\\Mary-ga] [S' [VP [CP [S [NP\\John-ga] [S' [VP [NP\\ hon-o] [V\\yom-]] [S\\ -da]]] [C\\-to]] [V\\omotte-]] [S\\ -iru]]]
\end{forest}
\end{multicols}
\xe

A very involved tree with lots of arrows, some of them blue and some of them red

Note that whether a. and b. fit your page and are indeed in two columns depends on the geometry you pick for your document as well as on the size of your tree. Further, if you need arrows to and fro, tikz can be effectively combined with forest.

\begin{forest}
sn edges
[CP
[D\\Who,name=1]
% name=1 will be used later to anchor arrows
[C' [C\\did ] [IP [NP [N' [N\\you ] ] ] [I' [I\\$t_{did}$ ] [ VP [$\emptyset$,name=2 ] [VP [ ]
[V'
% note the use of \emptyset, which is mathematical, and so is enclosed in $...$; do not use \(...\) or (particularly) \[...\] for in-line math here (\[...\] is display math)
[V' [V\\hear ] [NP [D\\the] [N' [N\\{rumour} ] ] ] ]
[CP
[ ]
[C' [C\\that ] [IP [NP [N' [N\\Mary ] ] ] [I' [I ]
[VP [$\emptyset$,name=3 ] [VP [] [V' [V\\loves ]
[NP\\$t_{who}$,name=x]
] ] ] ] ] ] ] ] ] ] ] ] ] ]
% now this is TikZ time
\begin{pgfinterruptboundingbox}
\draw[->,dotted,looseness=1,overlay] (x) to[out=south,in=west] (1);
% all arrows will start with \draw[->]; the parameters could be dotted, dashdotted, dashed, looseness (to experiment with)
% (x) is the from where, and (1) is to where; the names in the example are very creative: (x) and (1)
% [out=__,in=__] are direction where arrow comes out and comes in respectively; south is bottom, west is left, and so on
\draw[->,dashed,red,looseness=2] (x) to[out=south,in=south west] (3);
\draw[->,dashed,blue,looseness=1.5] (3) to[out=south west,in=west] (2);
% looseness is useful when an arrow from a lower node to an upper node crosses the tree -- loosening it leads it out of the tree;
% there are other more elegant options not to be explored here
% but some, e.g., Sportiche et al's textbook, choose to use arrows that cross branches of the tree quite often
\draw[->,dashed,looseness=2] (2) to[out=south west,in=west] (1);
\end{pgfinterruptboundingbox}
\end{forest}

Note that the arrows above are not at all the most exquisite. They will do beautifully for beginners, however. If you tried the trees above and you do not like the arrows, think of it as motivation to consult both the TikZ and forest documentation. Conveniently, the CTAN link to forest was given above; here’s a minimal guide for TikZ on CTAN, and here’s the full PGF/TikZ guide if you’re very interested (fair warning: it’s 1321 pages in version 3.1.9a).

Rectangular arrows are not discussed here but are straightforward; see at the bottom of the page on PRO- and trace-arrows in linear sentences (and examples in 7.1.).
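
For trees, a rectangular arrow can be sketched with TikZ's -| path operation (horizontal segment first, then vertical). The tree, the node names target and source, and the 0.6cm drop below are illustrative assumptions, not taken from the compiled examples.

```latex
% requires \usepackage[linguistics]{forest}
\begin{forest}
sn edges
[CP [DP\\what,name=target]
  [C' [C\\did] [TP [DP\\you] [VP [V\\see] [DP\\$t$,name=source]]]]]
% go 0.6cm straight down from the trace, then left, then up to the moved phrase
\draw[->] (source.south) -- ++(0,-0.6) -| (target.south);
\end{forest}
```

Because the arrow leaves the tree at the bottom, it does not cross any branches; adjust the drop if your tree has deeper nodes between the two anchors.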

Another thing to look at is just a long tree where some edges are dashed, some are dotted, some are blue, some are yellow, and some nodes are formatted with strikethrough.

The original tree for the example below is (adapted) from Masaya Yoshida’s “Antecedent-contained sluicing” (Linguistic Inquiry 41:2, 2010; p.349). Masaya did not intend to have the dotted bit or the colored bit (I reckon), but everything else is kept from the original.

\begin{forest}
[IP
[NP\\John] [I' [I\\must] [VP
[VP [t\textsubscript{John}] [V' [V\\love] [NP\\someone]]]
[PP [P\\without] [CP [C] [IP [NP\\PRO] [I' [I] [VP [V\\knowing] [CP [who] [\sout{IP}
% ``sout'' requires \usepackage{ulem}
[{\sout{NP}\\he},edge=dashed] [\sout{I'},edge={dashed,green} [\sout{I},edge=dashed] [\sout{VP},edge=dashed
[t\textsubscript{John}] [\sout{V'},edge={dotted,brown} [{\sout{V}\\love},edge=dotted] [t\textsubscript{who},edge={dotted,blue}] ]
]]]]]]]]]]]]
\end{forest}

Roofs

Often, there’s no need to spell out the internal structure of, say, a DP or PP. The convention in the literature is to use a triangle “roof” in such cases. The package documentation (forest) gives a very useful bit of code on page 74 that achieves this. To be precise, after \begin{forest} and before the first [, the following fragment is inserted:

delay={where n children=0{if={instr("P",content("!u"))}{roof}{}}{}},

What it does, in fact, is say that every leaf (where n children=0) whose parent’s label contains “P” (that is the instr("P",content("!u")) test, where !u refers to the parent) is attached with a triangle instead of a plain edge. If, in addition, you put tier=word in front of the if, all leaves are aligned on one tier, so you get a tree with a linear sentence at the bottom and branches of varying length. The description of this is rather tangled, so examples convey the idea much better. Examples of this are given in the doc linked at the top and at the bottom of this page, and the full fragment to use is below.

delay={where n children=0{tier=word,if={instr("P",content("!u"))}{roof}{}}{}},

To get a better sense of the difference, compare the two fragments of code below and make sure to see the compiled examples.

\begin{forest}
delay={where n children=0{if={instr("P",content("!u"))}{roof}{}}{}},
sn edges
[S [XP [XP [x]] [XP [XP [x] ] [XP[x]]]] [XP [x] ] [XP [XP [x]] [XP [x]]]]
\end{forest}
\begin{forest}
delay={where n children=0{tier=word,if={instr("P",content("!u"))}{roof}{}}{}},
sn edges
[S [XP [XP [x]] [XP [XP [x] ] [XP[x]]]] [XP [x] ] [XP [XP [x]] [XP [x]]]]
\end{forest}
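
If only one or two phrases need a triangle, the roof key from forest's linguistics library can also be set directly on a node rather than through the delay rule. A minimal sketch, with an invented sentence for illustration:

```latex
% requires \usepackage[linguistics]{forest}
\begin{forest}
sn edges
% "roof" on a leaf draws the edge from its parent as a triangle
[S [DP [the linguist,roof]] [VP [V [read]] [DP [a very long paper,roof]]]]
\end{forest}
```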

Multidominance

Finally, multidominance. There are multiple ways to do this, but the most straightforward and simple (albeit perhaps less stylistically appealing) one is still rudimentary TikZ. For two kinds of multidominance (resulting from movement and parallel merge respectively):

% from Citko 2011:119 (adapted), which was in turn from Citko 2005 (and the same below);
% requires \usepackage[linguistics]{forest} and \usepackage{tikz}
\begin{forest}
sn edges
[L,name=L [,name=D] [ [$\beta$] [K [$\gamma$,name=G] [$\alpha$]]] ]
\begin{pgfinterruptboundingbox}
\draw[-,looseness=1] (G) to [out=west,in=south] (D);
\end{pgfinterruptboundingbox}
\end{forest}
% a slightly different one; requires the same packages as the example immediately above
\begin{forest}where n children=0{tier=T}{}
[,phantom [K,name=A [$\alpha$, ][,phantom]][,phantom [$\beta$,name=C ]][L,name=D [,phantom][$\gamma$]]]
\draw[dotted] (C.north) -- (A.south);
\draw[dotted] (C.north) -- (D.south);
\end{forest}

Another way to go with multidominance

There is another way to go (in fact there are many – TikZ is uniquely versatile, as noted above). The code given below will indeed yield a tree much like the one in the second multidominance example (the parallel merge/horizontal sharing one). This example, however, will not be in the .pdf with the other examples from this page – to learn more about how structures of this type are typeset, see the Heyting algebra/Rieger–Nishimura lattice from the page on symbols, math, and logic (9).

\begin{tikzpicture}[x=1cm,y=1cm]
\node at (0,0)    (o)  {$\alpha$};
\node at (-1,0)   (l) {$\beta$};
\node at (1,0)    (r) {$\gamma$};
\node at (-0.5,1)    (u1) {$\delta$};
\node at (0.5,1)    (u2) {$\sigma$};
\draw (o) -- (u1);
\draw (o) -- (u2);
\draw (l) -- (u1);
\draw (r) -- (u2);
\end{tikzpicture}

A note on size

If you have a big X-bar tree with all the possible agreement phrases and the left periphery, chances are you are concerned about its size (particularly if you’re using beamer, a document class for (La)TeX slides). Generally, forest adjusts the step between nodes based on the font size of your tree (in other words, just put something like \scriptsize in front of your tree). However, if you want big letters and a small step, consider changing the step manually; see here. Trees written this way are not particularly exquisite, but they would perhaps serve the purpose you are trying to achieve.

More on size: if you’re indeed using beamer, then even more changes are needed. You’ll need to change both the font and the step. And if you’re using arrows, probably indentation too (or else parts of your arrows might be outside the slide). An example of a setup is this:

%indentation: -- this is not necessary unless you have arrows that go beyond the boundaries of the tree
\hspace*{20pt}
%fontsize:
\tiny
%step:
\pgfkeys{/pgf/inner sep=0.01em}
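
As an alternative to the global \pgfkeys setting, the spacing can be scoped to a single tree with forest's own options. This is a sketch under assumptions: the values are illustrative starting points to experiment with, l sep and s sep are forest's minimum distances between levels and between siblings, and inner sep is passed on to the TikZ nodes.

```latex
% requires \usepackage[linguistics]{forest}
{\tiny % the braces keep the font change local to this tree
\begin{forest}
sn edges,
for tree={inner sep=1pt, l sep=4pt, s sep=2pt}
[CP [C] [TP [DP] [T' [T] [VP [V] [DP]]]]]
\end{forest}
}
```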

Trees with lambda

A textbook on formal semantics (in my view, the single best one!) by Elizabeth Coppock and Lucas Champollion (see here, links to Prof. Coppock’s website) uses the so-called (typed) lambda calculus (Alonzo Church came up with that, aka the “simple theory of types”). The trees in lambda calculus mostly follow ordinary syntactic trees in their structure (apart from added nodes, e.g., for the various kinds of raising that are peculiar to semantics), but include formulae of lambda calculus. It might be useful to show some of them here and demonstrate that they are not at all scary to write, even though the trees can appear rather complex. Also, I think they provide a nice ground to demonstrate why and how forest is better than qtree.

First, a tree that is not very complex (from the preface to C&C’s textbook, January 2022 version). This one we’ll do with forest:

\begin{forest}
% the math mode will be used pervasively
% macros to decrease the amount of math mode and such can be written, of course
% but the purpose here is to show a version of the tree with minimal toolkit
[{DP \\ $e$ \\ $\iota x.[$Textbook$(x) \wedge $On$(x, sem)]$}
% note that to make LaTeX treat something as one element, you can wrap it in {...}
[{D \\ $\langle\langle e,t \rangle, t\rangle$} [\textit{the}]]
% you can have the line breaks (\\) inside {...}
[{N' \\ $\langle e,t \rangle$ \\ $\lambda x. [$Textbook$(x) \wedge $On$(x, sem)]$}
% note the interruptions to math mode for "Textbook", etc.; it's done with $...$ text $...$, creating two math modes, really
% this is not an ideal solution, but a more or less viable option, more on this in the following examples
[{N \\ $\langle e,t \rangle $ \\ $\lambda x. [$Textbook$(x)]$} [\textit{textbook}] ]
[{PP \\ $\langle e,t \rangle$ \\ $\lambda x. [$On$(x, sem)]$}
[{P \\ $\langle e, \langle e,t \rangle \rangle$ \\ $\lambda y \lambda x. [$On$(x,y)]$} [on]]
[{DP \\ $e$ \\ sem} [semantics]]
]  ]  ]
\end{forest}

The tree, really, is no different here from a (possible) syntactic one save for three lines on every node (label, type, and formula). The more complex the tree gets, the harder it is to achieve a neat structure and keep it all on the same page.
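
The macros mentioned in the comments of the tree above could look roughly like this. The names \pred and \semtype are hypothetical helpers invented for this sketch (they are not from the textbook or from any package); \text comes from amsmath.

```latex
\documentclass{article}
\usepackage{amsmath}               % provides \text
\usepackage[linguistics]{forest}
% hypothetical helpers: upright predicate names and angle-bracketed types
\newcommand{\pred}[1]{\text{#1}}
\newcommand{\semtype}[1]{\langle #1 \rangle}

\begin{document}
\begin{forest}
[{PP \\ $\semtype{e,t}$ \\ $\lambda x.[\pred{On}(x,\pred{sem})]$}
  [{P \\ $\semtype{e,\semtype{e,t}}$ \\ $\lambda y \lambda x.[\pred{On}(x,y)]$} [on]]
  [{DP \\ $e$ \\ $\pred{sem}$} [semantics]]]
\end{forest}
\end{document}
```

With such helpers each formula stays inside a single math mode, so there is no need to switch in and out with $...$ text $...$.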

Let’s try to reconstruct a tree given on p. 449 of C&C’s textbook (January 2022 revision) for the sentence Jones buttered the toast slowly (the simplified version where buttered the toast is a single unit). First, let’s see whether the qtree version of this looks good (without any macros, plain math mode):

% note how qtree requires a dot before the label of a node and a space after it, as in [.text ], to parse it correctly
% i.e., it won't parse [text ] or [.text] (because there's no space after the text), etc.
\Tree [.{S
\\ \textit{t}
\\ $\exists e.\text{agent}(e) = \text{j} \wedge \text{Butter}(e) \wedge \text{theme}(e) = \text{t} \wedge \text{Slow}(e)$
% so amsmath (American Mathematical Society) enables you to use \text to have normal text inside math mode; there are also options like mathrm, and so on
% depending on what it is you are trying to typeset, some of them might work and some of them might not
% trying to manage this big tree with {$...$ text $...$} is not the best idea; you'll likely get "Missing { inserted." errors
\\ $\Uparrow$
\\ $\langle \langle v, t \rangle, t \rangle$
\\ $\lambda f \exists e. \text{agent}(e) = \text{j} \wedge \text{Butter}(e) \wedge \text{theme}(e) = \text{t} \wedge \text{Slow}(e) \wedge f(e)$}
[.{DP
\\ $\langle\langle \langle v, t \rangle, t \rangle, \langle \langle v, t \rangle, t \rangle \rangle$
\\ $\lambda V \lambda f. V (\lambda e.\text{agent}(e) = \text{j} \wedge f(e)) $}
[.{$\theta$
\\ $\langle e, \langle\langle \langle v, t \rangle, t \rangle, \langle \langle v, t \rangle, t \rangle \rangle \rangle$
\\ $\lambda x \lambda V \lambda f. V (\lambda e. \text{agent}(e) = x \wedge f(e))$} [.[agent] ] ]
[.{DP
\\ $e$
\\ j} [.Jones ] ]
]
[.{VP
\\ $\langle \langle v, t \rangle, t \rangle$
\\ $\lambda f \exists e. \text{Butter}(e) \wedge \text{theme}(e) = \text{t} \wedge \text{Slow}(e) \wedge f(e)$}
[.{V
\\ $\langle \langle v, t \rangle, t \rangle$
\\ $\lambda f \exists e. \text{Butter}(e) \wedge \text{theme}(e) = \text{t} \wedge f(e)$} [.{buttered the toast} ] ]
[.{AdvP
\\ $\langle\langle \langle v, t \rangle, t \rangle, \langle \langle v, t \rangle, t \rangle \rangle$  
\\ $\lambda V \lambda f. V(\lambda e.\text{Slow}(e) \wedge f(e))$} [.slowly ]
]
]
]

Do take a look at the compiled example document to see this. Now, let’s try forest (even without any tweaking):

\begin{forest}
sn edges
[{S \\ \textit{t}
\\ $\exists e.\text{agent}(e) = \text{j} \wedge \text{Butter}(e) \wedge \text{theme}(e) = \text{t} \wedge \text{Slow}(e)$
\\ $\Uparrow$
\\ $\langle \langle v, t \rangle, t \rangle$
\\ $\lambda f \exists e. \text{agent}(e) = \text{j} \wedge \text{Butter}(e) \wedge \text{theme}(e) = \text{t} \wedge \text{Slow}(e) \wedge f(e)$}
[{DP
\\ $\langle\langle\langle v, t \rangle, t \rangle, \langle\langle v, t \rangle, t \rangle\rangle$
\\ $\lambda V \lambda f. V (\lambda e.\text{agent}(e) = \text{j} \wedge f(e))$}
[ {$\theta$
\\ $\langle e, \langle\langle \langle v, t \rangle, t \rangle, \langle \langle v, t \rangle, t \rangle \rangle \rangle$
\\ $\lambda x \lambda V \lambda f. V (\lambda e. \text{agent}(e) = x \wedge f(e))$ } [agent]]
[{DP \\ \textit{e} \\ j} [Jones]]]
[ {VP \\
$\langle \langle v, t \rangle, t \rangle$
\\ $\lambda f \exists e. \text{Butter}(e) \wedge \text{theme}(e) = \text{t} \wedge \text{Slow}(e) \wedge f(e)$}
[{V
\\ $\langle\langle v, t \rangle, t \rangle$
\\ $\lambda f \exists e. \text{Butter}(e) \wedge \text{theme}(e) = \text{t} \wedge f(e)$}
[ {buttered the toast}]]
[{AdvP \\
$\langle\langle \langle v, t \rangle, t \rangle, \langle \langle v, t \rangle, t \rangle \rangle$
\\
$\lambda V \lambda f. V(\lambda e.\text{Slow}(e) \wedge f(e))$ } [slowly]]]]
\end{forest}

The code is essentially the same, apart from the dots and other peculiarities of qtree, yet there is, in my view, a striking difference in presentation, which strongly favors forest.

A couple of other notes:

{\draw[<->] () .. controls +(left:1cm) and +(south west:0.4cm) ..
  node[very near start,below,sloped]{\tiny agree} (!us);}

Lastly, do consult the documentation to customize and learn more about the packages!


Compiled examples are here; takes a while to compile.
