LSA & LaTeX

A somewhat comprehensive treatment of LaTeX for the purposes of linguistics and LSA in particular. Also, info on (some) LSA templates.

Compiled examples are here (7) and here (7.1).


Arrows with sentences

Those of us working (predominantly) on syntax very often need to draw lines within a tree; however, it is not always convenient or desirable to spend the space on an entire tree where a linear diagram would do nicely.

This section proceeds as follows: pst-nodes, to-paths, more TikZ, why not use these options, and finally forest.

I shall not describe the first three in detail; I just give an example of each and cite the source the syntax comes from.

pst-nodes

Indeed, the expex documentation gives a seeming way to go about this: using expex with pstricks. For more on how the two interact, see the expex documentation (but see also the pstricks documentation here). However, pstricks seems to require a separate .sty document and some not insignificant fine-tuning to boot. It seems very difficult to compile the pstricks/expex combination successfully without going to quite some trouble locally, let alone on Overleaf.

Therefore, perhaps the most straightforward way here would be to use either (a) pst-node package or (b) topaths library for TikZ. Examples for both are given below. Note also that there definitely are many other ways to achieve the same goal, even in TikZ alone (it’s phenomenally versatile with a wealth of macros and libraries).

Here’s an example which uses pst-node, syntax due to tex.stackexchange.com/a/408181/272269 (see the requisite packages there):

\pex
This is a very long\Rnode{st}{ \underline{sentence that}} appears in this very \Rnode{ss}{\underline{short short}} document.
\ncarc[arrows = ->, linecolor =blue, arcangle = 15, nodesep = 1pt]{st}{ss}
\ncbar[arrows = ->, linecolor =red, angle = -90, arm = 0.75em]{ss}{st}
\xe

Note that this requires XeLaTeX: it does not compile with pdfLaTeX, in which LSA’s templates are set, nor with LuaLaTeX.
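
For orientation, a complete minimal document around the pst-node example above might look roughly as follows. The preamble here is my assumption (pst-node for \Rnode, \ncarc and \ncbar; expex for \ex and \xe); the linked answer lists the packages it actually relies on, so treat this as a sketch rather than as that answer's setup.

```latex
% Compile with XeLaTeX; pdfLaTeX will not work.
\documentclass{article}
\usepackage{pst-node} % loads pstricks; provides \Rnode, \ncarc, \ncbar
\usepackage{expex}    % assumed here for \ex ... \xe

\begin{document}
\ex
This is a very long\Rnode{st}{ \underline{sentence that}} appears in this very
\Rnode{ss}{\underline{short short}} document.
% curved arrow from the first marked phrase to the second
\ncarc[arrows = ->, linecolor = blue, arcangle = 15, nodesep = 1pt]{st}{ss}
% angled arrow back, running below the line
\ncbar[arrows = ->, linecolor = red, angle = -90, arm = 0.75em]{ss}{st}
\xe
\end{document}
```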

to-paths

The first thing to know is that this clashes with the forest package, so you won’t be able to have both forest trees and to-paths diagrams in one document without further trouble.

Here’s an example which uses to-paths, syntax again due to tex.stackexchange.com/a/408181/272269 (see the requisite packages there):

\pex This is a  very long \tikz[baseline=(node1.base)]\node (node1)  {\underline{sentence that}}; appears
in this very \tikz[baseline=(node2.base)]\node (node2) {\underline{short short}}; document.

\begin{tikzpicture}[overlay]
    % Bend above text line
    \draw[-latex] (node2.north) to[bend right] (node1.north);
    % Bend below text line
    %\draw[-latex] (node2.south) to[bend left] (node1.south);
    % Angled
    \draw[-latex] (node2.south) -- ++(0,-1.5ex) -| (node1.south);
\end{tikzpicture}
\xe
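
One thing worth knowing about this kind of diagram: for one TikZ picture to refer to nodes defined in another, the pictures generally need the remember picture option, and the document has to be compiled twice so that the positions settle. A minimal sketch of a full document under that assumption is below. The \tikzset line and the node names a and b are mine, not taken from the linked answer. The to[bend right] syntax comes from the topaths library, which TikZ loads by default.

```latex
\documentclass{article}
\usepackage{tikz}
\usepackage{expex} % assumed here for \ex ... \xe

% assumption: make every picture remember its position on the page,
% so that the overlay picture can find the inline nodes
\tikzset{every picture/.append style={remember picture}}

\begin{document}
\ex
This is a very long
\tikz[baseline=(a.base)]\node[inner sep=0pt] (a) {\underline{sentence that}};
appears in this very
\tikz[baseline=(b.base)]\node[inner sep=0pt] (b) {\underline{short short}};
document.
\begin{tikzpicture}[overlay]
  % bend above the text line
  \draw[-latex] (b.north) to[bend right] (a.north);
  % angled arrow below the text line
  \draw[-latex] (b.south) -- ++(0,-1.5ex) -| (a.south);
\end{tikzpicture}
\xe
\end{document}
```

If the arrows show up in the wrong place after the first run, compile once more.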

…and more TikZ

Here’s another option to do this, syntax due to Gonzalo Medina from tex.stackexchange.com (links to the question):

\ex
W\Tikzmark{enda}{h}\Tikzmark{endb}{o}m(A) did John persuade \Tikzmark{starta}{t}(B) [ PRO to visit w\Tikzmark{startb}{h}om(C) ]
\xe
\DrawArrow{starta}{enda}{above}{$M_{sp}=2$}
\DrawArrow{startb}{endb}{above,pos=0.15}{$M_{sp}=6$}[3.5]
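
\Tikzmark and \DrawArrow are not built-in commands; they are defined in the preamble of the linked answer. To give an idea of what such definitions involve, here is a rough sketch of macros with the same calling pattern. To be clear, this is my own hypothetical reconstruction, not Medina's code: the default height of 2.5 (in ex) and the way the label is placed are assumptions.

```latex
\documentclass{article}
\usepackage{tikz}  % on LaTeX older than 2020-10, also load xparse
\usepackage{expex}

% \Tikzmark{<name>}{<text>}: typeset <text> in a node that later pictures can refer to
\newcommand{\Tikzmark}[2]{%
  \tikz[remember picture, baseline=(#1.base)]
    \node[inner sep=0pt, outer sep=0pt] (#1) {#2};%
}

% \DrawArrow{<from>}{<to>}{<label options>}{<label>}[<height in ex>]
% (hypothetical definition; the optional height defaults to 2.5)
\NewDocumentCommand{\DrawArrow}{m m m m O{2.5}}{%
  \begin{tikzpicture}[remember picture, overlay]
    \draw[->] (#1.north) -- ++(0,#5ex) -| node[#3] {#4} (#2.north);
  \end{tikzpicture}%
}

\begin{document}
\ex
W\Tikzmark{enda}{h}om did John persuade \Tikzmark{starta}{t} [ PRO to visit whom ]
\xe
\DrawArrow{starta}{enda}{above}{$M_{sp}=2$}
\end{document}
```

As with any remember picture setup, this needs two compilation runs.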

Compiled versions of all of these examples are available via the link to 7 (not 7.1) on the main page, or at the top or the bottom of this page.

Some other options I encountered upon a brief search are these: here, here, here, here, here, and here.

Why not use the options above

I think the options above are suboptimal for various reasons: one requires XeLaTeX, which is inflexible; another clashes with forest; yet another has fairly complex syntax. Medina’s example, for instance, looks simple enough, but only at the cost of expanding the preamble with custom settings (see the page linked above for those). Such additions are fine and convenient, but using them without understanding what is going on (and hence without being able to fix things should something go wrong), as most people would (remember the target audience of this website), is resolutely suboptimal. Forest, on the other hand, is already familiar to everyone through trees.

Forest!

I like forest, I think it’s great. So I see no reason not to use forest for the purposes of PRO-type diagrams. It seems there’s no tree, but trees can come in different shapes. This is a tree:

Note that the examples from the one immediately below up to the next notice like this one are not included in the compiled document (main page link to 7.1).

[ [ [][] ] [ [][] ] ]

but this is a tree as well:

[] [] [] [] [] [] []

So, I suggest that we use the arrows we used with trees (they can be rectangular as well), splitting the original sentence into parts by putting them in separate nodes, as in the example above. For example, a simple partition would look like this:

[The] [curse] [has] [come] [upon] [me] [cried] [the] [Lady] [of] [Shalott.]

It’s not entirely necessary to separate everything. That is, if you want to draw a line from [Lady] to [curse], a partition like this should do:

[The] [curse] [has come upon me cried the] [Lady] [of Shalott.]
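
As a sketch of where this is heading, the coarser partition can be turned into a complete document with a line from Lady to curse. The node names lady and curse are mine; the phantom root, the inner sep setting and the arrow syntax are the ones used in the full examples further down this page.

```latex
\documentclass{article}
\usepackage{forest}

\begin{document}
% same spacing tweak as in the full examples further down
\pgfkeys{/pgf/inner sep=0.05em}
\begin{forest}
[,phantom,
  [The] [curse,name=curse] [{has come upon me cried the}] [Lady,name=lady] [{of Shalott.}]
]
% curved arrow running below the sentence, from Lady to curse
\draw[->, looseness=0.3] (lady) to[out=south,in=south] (curse);
\end{forest}
\end{document}
```

Only the two nodes that the arrow touches need a name; the rest of the sentence can sit in unnamed nodes.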

Interestingly, examples in the literature often use similar kinds of notation to demarcate boundaries of phrases or similar units, e.g. [ForceP [ [TopP [ [FocP [ [TopP’ [ [FinP [ [IP ]]]]]]]]]]]. So, it might become somewhat difficult to put these units into nodes of a forest-tree. For a simple example, consider [ForceP [TopP [FocP]]]. Also note that to mark that something is a single unit in LaTeX, for whatever purposes, curly braces are often used: cf. \textit{word} word word vs. \textit{word word word}. So,

[{ [ForceP }] [{ [TopP }] [{ [FocP }] [{ ]]] }]

should do. The partitions here are [ForceP, then [TopP, then [FocP, then ]]]. Each of the partitions is within {}, indicating it’s a single bit of code; and within [ ], indicating a forest-node. Now let’s consider some real examples.

The examples below are available in the compiled form following the link to 7.1 on the homepage, or the one at the top/bottom of this page.

Note that, apart from forest, the examples below require \usepackage{fixltx2e} (for \textsubscript{}, etc.) and \usetikzlibrary{matrix}. As for fixltx2e, if you can get superscripts and subscripts some other way, feel free to. I do recommend staying away from math mode if there’s no real math, as it is easy to lose track, and LaTeX will throw many Missing $ inserted as well as Missing { inserted errors at you.
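
Put together, the preamble for the examples below might look roughly like this. It is a sketch, not the preamble of the compiled documents: expex is assumed because the examples use \pex, \a and \xe, and amsmath is assumed because the last example uses \text inside math mode. On LaTeX releases from 2015 onwards, \textsubscript is part of the kernel, so fixltx2e is only needed on older installations.

```latex
\usepackage{forest}       % loads TikZ as well
\usetikzlibrary{matrix}
\usepackage{expex}        % assumed: provides \pex, \a, \xe
\usepackage{amsmath}      % assumed: provides \text{} in math mode
% \usepackage{fixltx2e}   % only on LaTeX older than 2015, for \textsubscript
```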

This example is from Masaya Yoshida’s “Constraints and Mechanisms in Long-Distance Dependency Formation”, ex. 4 on p. 376. University of Maryland, 2006.

\pex
% this will be a two-part example: one with a rectangular arrow, one with an elliptical one
\a
% this helps your sentence stay together (try the example without it and see what happens)
\pgfkeys{/pgf/inner sep=0.05em}
\begin{forest}
% this bit is helpful as well; you saw phantom bit in multidominance trees already
% so the structure is not just [tree], but really [,phantom, [tree]]
[,phantom,
% this is the partition itself
[{Wh-NP-Dat-},name=1] [...] [{[\textsubscript{NP} GNC}] [{[\textsubscript{NP}}] [{[\textsubscript{CP}}] [\textit{Op}] [{\textsubscript{IP}}] [Subject] [{...},name=2] [{]]}] [$NP_{host}$] [{]}] ]
% now to the arrows, just as in an ordinary tree
\begin{pgfinterruptboundingbox}
% this is syntax for rectangular-shaped arrows
% south is self-explanatory; see notes on other bits below
\draw[->, dashed, >=latex] (1.south) |- ++(0,-0.4) -| (2.south);
\end{pgfinterruptboundingbox}
% ending the first forest, and inserting a vertical space so that the arrow doesn't interfere with the example below
\end{forest} \vspace{1em}
% second example
\a
\pgfkeys{/pgf/inner sep=0.05em}
\begin{forest}
[,phantom,
[{Wh-NP-Dat-},name=1] [...] [{[\textsubscript{NP} GNC}] [{[\textsubscript{NP}}] [{[\textsubscript{CP}}] [\textit{Op}] [{\textsubscript{IP}}] [Subject] [{...},name=2] [{]]}] [$NP_{host}$] [{]}]
]
\begin{pgfinterruptboundingbox}
% the usual setting, which we saw on the trees page
% one can adjust looseness as desired to control how far above or below the arrow goes
\draw[->,looseness=0.3,overlay] (1) to[out=south,in=south] (2);
\end{pgfinterruptboundingbox}
\end{forest}
\xe

Let’s consider the fragment \draw[->, dashed, >=latex] (1.south) |- ++(0,-0.4) -| (2.south); in some more detail. It’s pretty similar to what we had on the page with trees, but south has moved into the parentheses with 1 and 2, and the new fragment |- ++(0,-0.4) -| has appeared. In this fragment, two things matter, and both have to do with -0.4. The sign controls which side of the sentence the arrow runs on: - is for below, + is for above. Make sure to combine south with -, or north with + (though do try a mismatched pair, e.g. + with south, and see what happens). Lastly, the number itself (0.4 in the case above) controls how far below or above the sentence the arrow goes. Two more examples are below.
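
To see the two combinations side by side, here is a small sketch that draws one arrow below and one above the same sentence. The sentence and the node names lady and curse are reused from the partition examples earlier on this page purely for illustration. In TikZ, ++(0,-0.4) is a point relative to the previous one (0.4cm down, since cm is the default unit), |- connects two points by going vertically first and then horizontally, and -| goes horizontally first and then vertically.

```latex
\documentclass{article}
\usepackage{forest}

\begin{document}
\pgfkeys{/pgf/inner sep=0.05em}
\begin{forest}
[,phantom,
  [The] [curse,name=curse] [{has come upon me cried the}] [Lady,name=lady] [{of Shalott.}]
]
% below the sentence: south anchors with a negative offset
\draw[->, >=latex] (lady.south) |- ++(0,-0.4) -| (curse.south);
% above the sentence: north anchors with a positive offset
\draw[->, dashed, >=latex] (lady.north) |- ++(0,0.4) -| (curse.north);
\end{forest}
\end{document}
```

Increasing 0.4 pushes the horizontal part of the arrow further away from the text.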

The following example is from Cedric Boeckx and Norbert Hornstein’s “Superiority, Reconstruction, and Islands” (p. 198, ex. 4) in “Foundational Issues in Linguistic Theory”, Freidin, Otero, and Zubizarreta, eds. MIT Press 2008.

\pex
\a
\pgfkeys{/pgf/inner sep=0.05em}
\begin{forest}
[,phantom,
[{[\textsubscript{CP}}] [{\_},name=1] [{[C\textsuperscript{0}}] [{[\textsubscript{IP}}] [{koj},name=2] [{[I\textsuperscript{0}}] [{[\textsubscript{IP}}] [kakvo] [V\textsuperscript{0}] [kogo] [{]]]]]}]
]
\begin{pgfinterruptboundingbox}
\draw[->,looseness=0.3,overlay] (2) to[out=south,in=south] (1);
\end{pgfinterruptboundingbox}
\end{forest} \vspace{1em}
\a
\pgfkeys{/pgf/inner sep=0.05em}
\begin{forest}
[,phantom,
[{[\textsubscript{CP}}] [{\_},name=1] [{[C\textsuperscript{0}}] [{[\textsubscript{IP}}] [{koj},name=2] [{[I\textsuperscript{0}}] [{[\textsubscript{IP}}] [kakvo] [V\textsuperscript{0}] [kogo] [{]]]]]}]
]
\begin{pgfinterruptboundingbox}
\draw[->, dotted, >=latex] (2.south) |- ++(0,-0.3) -| (1.south);
\end{pgfinterruptboundingbox}
\end{forest}
\xe

In the last one, the partitions are rather complex.

The following example (the structure, not the formatting) is from Uli Sauerland’s “Flat Binding”, p. 239, ex. (83) in “Interfaces + Recursion = Language?”, Sauerland and Gärtner, eds. Mouton de Gruyter 2007.

\pex
\a
\pgfkeys{/pgf/inner sep=0.05em}
\begin{forest}
[,phantom,
% there's no way out of math mode here; \phi cannot be got at in any other way
% also, in the original example, D subscript was a Greek letter, but html for this website would have none of that
[{the child who $\phi$P[the child]$_{D'}$ dropped the}] [{$[-]_{\text{F}}$},name=1] [{didn't pick up}] [{$\phi\text{P}[$the$ -]_D$},name=2]
]
\begin{pgfinterruptboundingbox}
\draw[-,looseness=0.3,overlay] (2) to[out=south,in=south] (1);
\end{pgfinterruptboundingbox}
\end{forest} \vspace{1em}
\a
\pgfkeys{/pgf/inner sep=0.05em}
\begin{forest}
[,phantom,
[{the child who $\phi$P[the child]$_{D'}$ dropped the}] [{$[-]_{\text{F}}$},name=1] [{didn't pick up}] [{$\phi\text{P}[$the$ -]_D$},name=2]
]
\begin{pgfinterruptboundingbox}
\draw[-, >=latex] (2.south) |- ++(0,-0.3) -| (1.south);
\end{pgfinterruptboundingbox}
\end{forest}
\xe

Please take a look at the compiled examples. Ultimately, there’s no difference in terms of how the goal is achieved – whether with forest, topaths, or something else – it’s only a matter of convenience and getting the right output.


Compiled examples are here (7) and here (7.1).
