Data Structures

TexSoup transforms a LaTeX document into a tree of Python objects. Every object falls into one of the following three categories: TexNode, TexExpr (environments and commands), and Args.

Node

class TexSoup.data.TexNode[source]

A tree node representing an expression in the LaTeX document.

Every node in the parse tree is a TexNode, equipped with navigation, search, and modification utilities. To navigate the parse tree, use abstractions such as children and descendants. To access content in the parse tree, use abstractions such as contents, text, string, and args.

Note that the LaTeX parse tree is largely shallow: only environments such as itemize or enumerate have children and thus descendants. Typical LaTeX expressions such as \section have arguments but not children.

all

Returns all content in this node, including whitespace. This covers all LaTeX needed to reconstruct the original source.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \newcommand{reverseconcat}[3]{#3#2#1}
... ''')
>>> list(soup.all)
['\n', \newcommand{reverseconcat}[3]{#3#2#1}, '\n']
append(*nodes)[source]

Add node(s) to this node’s list of children.

Parameters:nodes (TexNode) – List of nodes to add
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
... \end{itemize}
... \section{Hey}
... \textit{Willy}''')
>>> soup.section
\section{Hey}
>>> soup.section.append(soup.textit)  
Traceback (most recent call last):
...
TypeError: ...
>>> soup.section
\section{Hey}
>>> soup.itemize.append('    ', soup.item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Hello
\end{itemize}
args

Arguments for this node. Note that this attribute is settable.

Return type:TexArgs
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\newcommand{reverseconcat}[3]{#3#2#1}''')
>>> soup.newcommand.args
[RArg('reverseconcat'), OArg('3'), RArg('#3#2#1')]
>>> soup.newcommand.args = soup.newcommand.args[:2]
>>> soup.newcommand
\newcommand{reverseconcat}[3]
char_pos_to_line(char_pos)[source]

Map position in the original string to parsed LaTeX position.

Parameters:char_pos (int) – Character position in the original string
Returns:(line number, index of character in line)
Return type:Tuple[int, int]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textbf{Silly}
... \textit{Willy}''')
>>> soup.char_pos_to_line(10)
(1, 9)
>>> soup.char_pos_to_line(20)
(2, 5)
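
The mapping above can be reproduced with the standard library alone. The following is a minimal sketch, not TexSoup's actual implementation: the helper line_starts and the module-level char_pos_to_line function are hypothetical names introduced here for illustration. It assumes line numbers and in-line indices both start at zero, which matches the example output above.

```python
from bisect import bisect_right
from typing import List, Tuple


def line_starts(source: str) -> List[int]:
    """Offsets at which each line of `source` begins (hypothetical helper)."""
    starts = [0]
    for offset, char in enumerate(source):
        if char == "\n":
            starts.append(offset + 1)
    return starts


def char_pos_to_line(source: str, char_pos: int) -> Tuple[int, int]:
    """Map a character offset to (line number, index within that line)."""
    starts = line_starts(source)
    # The last line start that is <= char_pos identifies the line.
    line = bisect_right(starts, char_pos) - 1
    return line, char_pos - starts[line]


# Same document as in the example above, written as an ordinary string.
source = "\n\\section{Hey}\n\\textbf{Silly}\n\\textit{Willy}"
first = char_pos_to_line(source, 10)   # (1, 9)
second = char_pos_to_line(source, 20)  # (2, 5)
```

Precomputing the line starts once and reusing them would avoid rescanning the source on every lookup; the sketch recomputes them to stay short.
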
children

Immediate children of this TeX element that are valid TeX objects.

This is equivalent to contents, excluding text elements and keeping only TeX expressions.

Returns:generator of all children
Return type:Iterator[TexExpr]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> next(soup.itemize.children)
\item Hello
contents

Any non-whitespace contents inside of this TeX element.

Returns:generator of all nodes, tokens, and strings
Return type:Iterator[Union[TexNode,str]]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> contents = soup.itemize.contents
>>> next(contents)
'\n    Random text!\n    '
>>> next(contents)
\item Hello
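
The relationship between contents and children can be pictured as a filter over one mixed sequence. The sketch below is a toy model under stated assumptions: Expr is a hypothetical stand-in for a parsed expression, and the list literal simply mirrors the itemize example above. It does not use TexSoup itself.

```python
from typing import List, NamedTuple, Union


class Expr(NamedTuple):
    """Hypothetical stand-in for a parsed TeX expression such as \\item."""
    name: str
    body: str


Content = Union[Expr, str]

# Mixed contents, mirroring the itemize example above.
contents: List[Content] = ["\n    Random text!\n    ", Expr("item", "Hello")]

# children: keep only the expressions, drop the plain text.
children = [c for c in contents if isinstance(c, Expr)]

# The complementary filter keeps only the plain text.
text = [c for c in contents if isinstance(c, str)]
```
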
count(name=None, **attrs)[source]

Number of descendants matching criteria.

Parameters:
  • name (Union[None,str]) – name of LaTeX expression
  • attrs – LaTeX expression attributes, such as item text.
Returns:

number of matching expressions

Return type:

int

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> soup.count('section')
1
>>> soup.count('textit')
2
delete()[source]

Delete this node from the parse tree.

Where applicable, this will remove all descendants of this node from the parse tree.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textit{\color{blue}{Silly}}\textit{keep me!}''')
>>> soup.textit.color.delete()
>>> soup
\textit{}\textit{keep me!}
>>> soup.textit.delete()
>>> soup
\textit{keep me!}
descendants

Returns all descendants for this TeX element.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> descendants = list(soup.itemize.descendants)
>>> descendants[1]
\item Nested
find(name=None, **attrs)[source]

First descendant node matching criteria.

Returns None if no descendant node found.

Returns:descendant node matching criteria
Return type:Union[None,TexExpr]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> soup.find('textit')
\textit{eee}
>>> soup.find('textbf')
find_all(name=None, **attrs)[source]

Return all descendant nodes matching criteria.

Parameters:
  • name (Union[None,str]) – name of LaTeX expression
  • attrs – LaTeX expression attributes, such as item text.
Returns:

All descendant nodes matching criteria

Return type:

Iterator[TexNode]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> gen = soup.find_all('textit')
>>> next(gen)
\textit{eee}
>>> next(gen)
\textit{ooo}
>>> next(soup.find_all('textbf'))
Traceback (most recent call last):
...
StopIteration
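
The three search utilities, count, find, and find_all, behave as if they shared one traversal. The following toy tree is a sketch of that idea only: the Node class is hypothetical, matches on name alone (the attrs filter is omitted), and is not how TexSoup is implemented.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical tree node with a name and child nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)

    @property
    def descendants(self) -> Iterator["Node"]:
        # Depth-first: each child, then that child's own descendants.
        for child in self.children:
            yield child
            yield from child.descendants

    def find_all(self, name: Optional[str] = None) -> Iterator["Node"]:
        # Lazily yield every descendant whose name matches.
        for node in self.descendants:
            if name is None or node.name == name:
                yield node

    def find(self, name: Optional[str] = None) -> Optional["Node"]:
        # First match, or None when the generator is empty.
        return next(self.find_all(name), None)

    def count(self, name: Optional[str] = None) -> int:
        return sum(1 for _ in self.find_all(name))


root = Node("root", [Node("section"), Node("textit"), Node("textit")])
n_textit = root.count("textit")
missing = root.find("textbf")
```

Because find_all is a generator, calling next on it with no matches raises StopIteration, as in the example above, while find supplies a default and returns None instead.
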
insert(i, *nodes)[source]

Add node(s) to this node’s list of children, inserted at position i.

Parameters:
  • i (int) – Position to add nodes to
  • nodes (TexNode) – List of nodes to add
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> item = soup.item
>>> soup.item.delete()
>>> soup.itemize.insert(1, item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Bye
\end{itemize}
name

Name of the expression. Used for search functions.

Return type:str
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.name
'textbf'
>>> soup.textbf.name = 'textit'
>>> soup.textit
\textit{Hello}
remove(node)[source]

Remove a node from this node’s list of contents.

Parameters:node (TexExpr) – Node to remove
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> soup.itemize.remove(soup.item)
>>> soup.itemize
\begin{itemize}
    \item Bye
\end{itemize}
replace(child, *nodes)[source]

Replace provided node with node(s).

Parameters:
  • child (TexNode) – Child node to replace
  • nodes (TexNode) – List of nodes to substitute in
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.itemize.replace(soup.item, bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
\item Bye
\end{itemize}
replace_with(*nodes)[source]

Replace this node in the parse tree with the provided node(s).

Parameters:nodes (TexNode) – List of nodes to substitute in
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.item.replace_with(bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
\item Bye
\end{itemize}
string

The string content of this node. This is valid if and only if

  1. the expression is a TexCmd AND
  2. the command has only one argument.
Return type:Union[None,str]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.string
'Hello'
>>> soup.textbf.string = 'Hello World'
>>> soup.textbf.string
'Hello World'
>>> soup.textbf
\textbf{Hello World}
text

All text in descendant nodes.

This is equivalent to contents, keeping text elements and excluding TeX expressions.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> next(soup.text)
'Nested\n    '

Expressions

class TexSoup.data.TexExpr[source]

General abstraction for a TeX expression.

An expression may be a command or an environment and is identified by a name, arguments, and place in the parse tree. This is an abstract class and is not meant to be instantiated directly.

all

Returns all content in this expression, including whitespace. This covers all LaTeX needed to reconstruct the original source.

>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr1.all) == list(expr2.all)
True
append(*exprs)[source]

Add contents to the expression.

Parameters:exprs (Union[TexExpr,str]) – List of contents to add
>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.append('world')
>>> expr
TexExpr('textbf', ['hello', 'world'])
contents

Returns all contents in this expression.

Whitespace is included only if preserve_whitespace was set when the expression was created.

>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> list(expr1.contents)
['hi']
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr2.contents)
['\n', 'hi']
insert(i, *exprs)[source]

Insert content at specified position into expression.

Parameters:
  • i (int) – Position to add content to
  • exprs (Union[TexExpr,str]) – List of contents to add
>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.insert(0, 'world')
>>> expr
TexExpr('textbf', ['world', 'hello'])
remove(expr)[source]

Remove a provided expression from its list of contents.

Parameters:expr (Union[TexExpr,str]) – Content to remove
Returns:index of the expression removed
Return type:int
>>> expr = TexExpr('textbf', ('hello',))
>>> expr.remove('hello')
0
>>> expr
TexExpr('textbf', [])
tokens

Further breaks down all tokens for a particular expression into words and other expressions.

>>> tex = TexEnv('lstlisting', ('var x = 10',))
>>> list(tex.tokens)
['var x = 10']
class TexSoup.data.TexEnv[source]

Abstraction for a LaTeX environment, denoted by \begin{env} and \end{env}. Contains three attributes:

  1. the environment name itself,
  2. the environment arguments, whether optional or required, and
  3. the environment’s contents.
>>> t = TexEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'],
...     [RArg('c | c c')])
>>> t
TexEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'], [RArg('c | c c')])
>>> print(t)
\begin{tabular}{c | c c}
0 & 0 & * \\
1 & 1 & * \\
\end{tabular}
>>> len(list(t.children))
0
class TexSoup.data.TexCmd[source]

Abstraction for a LaTeX command. Contains two attributes:

  1. the command name itself and
  2. the command arguments, whether optional or required.
>>> textit = TexCmd('textit', args=[RArg('slant')])
>>> t = TexCmd('textbf', args=[RArg('big ', textit, '.')])
>>> t
TexCmd('textbf', [RArg('big ', TexCmd('textit', [RArg('slant')]), '.')])
>>> print(t)
\textbf{big \textit{slant}.}
>>> children = list(map(str, t.children))
>>> len(children)
1
>>> print(children[0])
\textit{slant}

Arguments

class TexSoup.data.Arg[source]

Abstraction for a LaTeX expression argument.

>>> arg = Arg('huehue')
>>> arg[0]
'h'
>>> arg[1:]
'uehue'
classmethod delims()[source]

Returns delimiters specific to an argument type.

>>> RArg.delims()
['{', '}']
>>> OArg.delims()
['[', ']']
static parse(s)[source]

Parse a string or list and return an argument object.

Parameters:s (Union[str,iterable]) – Either a string or a list, where the first and last elements are valid argument delimiters.
>>> Arg.parse(RArg('arg0'))
RArg('arg0')
>>> Arg.parse('[arg0]')
OArg('arg0')
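
Delimiter-based parsing of this kind can be sketched without TexSoup. In the snippet below, DELIMS and parse_arg are hypothetical names, the returned (kind, value) tuple stands in for the RArg and OArg objects, and raising ValueError on unknown delimiters is an assumption rather than documented behavior.

```python
from typing import Dict, Tuple

# Delimiter pairs per argument kind, mirroring RArg.delims() and OArg.delims().
DELIMS: Dict[str, Tuple[str, str]] = {
    "required": ("{", "}"),
    "optional": ("[", "]"),
}


def parse_arg(s: str) -> Tuple[str, str]:
    """Return (kind, value) for a delimited argument string."""
    for kind, (opener, closer) in DELIMS.items():
        if len(s) >= 2 and s.startswith(opener) and s.endswith(closer):
            # Strip exactly one opening and one closing delimiter.
            return kind, s[1:-1]
    raise ValueError("unrecognized argument delimiters: %r" % s)


optional = parse_arg("[arg0]")
required = parse_arg("{arg0}")
```
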
value

Argument value, without its delimiters.

>>> arg = RArg('hello')
>>> arg
RArg('hello')
>>> arg.value
'hello'
class TexSoup.data.OArg[source]

Optional argument, denoted as [arg].

class TexSoup.data.RArg[source]

Required argument, denoted as {arg}.

class TexSoup.data.TexArgs[source]

List of arguments for a TeX expression. Supports all standard list ops.

Additional support for conversion from and to unparsed argument strings.

>>> arguments = TexArgs(['\n', RArg('arg0'), '[arg1]', '{arg2}'])
>>> arguments
[RArg('arg0'), OArg('arg1'), RArg('arg2')]
>>> arguments.all
['\n', RArg('arg0'), OArg('arg1'), RArg('arg2')]
>>> arguments[2]
RArg('arg2')
>>> len(arguments)
3
>>> arguments[:2]
[RArg('arg0'), OArg('arg1')]
>>> isinstance(arguments[:2], TexArgs)
True
append(arg)[source]

Append whitespace, an unparsed argument string, or an argument object.

Parameters:arg (Arg) – argument to add to the end of the list
>>> arguments = TexArgs([RArg('arg0'), '[arg1]', '{arg2}'])
>>> arguments.append('[arg3]')
>>> arguments[3]
OArg('arg3')
>>> arguments.append(RArg('arg4'))
>>> arguments[4]
RArg('arg4')
>>> len(arguments)
5
>>> arguments.append('\n')
>>> len(arguments)
5
>>> len(arguments.all)
6
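
The split between the list itself and its .all proxy can be modeled as two parallel sequences. The following is a minimal sketch under stated assumptions: ArgList is a hypothetical class, arguments are kept as plain strings rather than parsed into argument objects, and only construction and append are covered.

```python
from typing import Iterable, List


class ArgList(list):
    """Hypothetical list that hides whitespace but remembers it in .all."""

    def __init__(self, items: Iterable[str] = ()) -> None:
        super().__init__()
        self.all: List[str] = []
        for item in items:
            self.append(item)

    def append(self, item: str) -> None:
        # Everything is recorded in .all so the source can be rebuilt.
        self.all.append(item)
        # Only non-whitespace items count as arguments.
        if not item.isspace():
            super().append(item)


arguments = ArgList(["{arg0}", "[arg1]", "{arg2}"])
arguments.append("[arg3]")
arguments.append("{arg4}")
arguments.append("\n")
visible = len(arguments)    # whitespace is not counted here
total = len(arguments.all)  # whitespace is counted here
```

A full version would also need to keep both sequences in sync for insert, pop, remove, and the other list operations documented in this section.
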
clear()[source]

Clear both the list and the proxy .all.

>>> args = TexArgs(['\n', RArg('arg1'), OArg('arg2')])
>>> args.clear()
>>> len(args) == len(args.all) == 0
True
extend(args)[source]

Extend the list with a mixture of unparsed argument strings, argument objects, and whitespace.

Parameters:args (List[Arg]) – Arguments to add to end of the list
>>> arguments = TexArgs([RArg('arg0'), '[arg1]', '{arg2}'])
>>> arguments.extend(['[arg3]', RArg('arg4'), '\t'])
>>> len(arguments)
5
>>> arguments[4]
RArg('arg4')
insert(i, arg)[source]

Insert whitespace, an unparsed argument string, or an argument object.

Parameters:
  • i (int) – Index to insert argument into
  • arg (Arg) – Argument to insert
>>> arguments = TexArgs(['\n', RArg('arg0'), '[arg2]'])
>>> arguments.insert(1, '[arg1]')
>>> len(arguments)
3
>>> arguments
[RArg('arg0'), OArg('arg1'), OArg('arg2')]
>>> arguments.all
['\n', RArg('arg0'), OArg('arg1'), OArg('arg2')]
>>> arguments.insert(10, '[arg3]')
>>> arguments[3]
OArg('arg3')
pop(i)[source]

Pop argument object at provided index.

Parameters:i (int) – Index to pop from the list
>>> arguments = TexArgs([RArg('arg0'), '[arg2]', '{arg3}'])
>>> arguments.pop(1)
OArg('arg2')
>>> len(arguments)
2
>>> arguments[0]
RArg('arg0')
remove(item)[source]

Remove either an unparsed argument string or an argument object.

Parameters:item (Union[str,Arg]) – Item to remove
>>> arguments = TexArgs([RArg('arg0'), '[arg2]', '{arg3}'])
>>> arguments.remove('{arg0}')
>>> len(arguments)
2
>>> arguments[0]
OArg('arg2')
reverse()[source]

Reverse both the list and the proxy .all.

>>> args = TexArgs(['\n', RArg('arg1'), OArg('arg2')])
>>> args.reverse()
>>> args.all
[OArg('arg2'), RArg('arg1'), '\n']
>>> args
[OArg('arg2'), RArg('arg1')]
sort()[source]

Sort both the list and the proxy .all.

Since it doesn’t make sense to sort the proxy, all whitespace is dropped.

>>> args = TexArgs(['\n', RArg('arg1'), OArg('arg2')])
>>> args.sort()
>>> len(args) == len(args.all)
True