Data Structures

TexSoup transforms a LaTeX document into a complex tree of various Python objects, but all objects fall into one of the following three categories:

TexNode, TexExpr (environments and commands), and TexGroup.

Node

class TexSoup.data.TexNode[source]

A tree node representing an expression in the LaTeX document.

Every node in the parse tree is a TexNode, equipped with navigation, search, and modification utilities. To navigate the parse tree, use abstractions such as children and descendants. To access content in the parse tree, use abstractions such as contents, text, string, and args.

Note that the LaTeX parse tree is largely shallow: only environments such as itemize or enumerate have children and thus descendants. Typical LaTeX expressions such as \section have arguments but not children.

all

Returns all content in this node, including whitespace. This covers all LaTeX needed to reconstruct the original source.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \newcommand{reverseconcat}[3]{#3#2#1}
... ''')
>>> alls = soup.all
>>> alls[0]


>>> alls[1]
\newcommand{reverseconcat}[3]{#3#2#1}
append(*nodes)[source]

Add node(s) to this node’s list of children.

Parameters:nodes (TexNode) – List of nodes to add
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
... \end{itemize}
... \section{Hey}
... \textit{Willy}''')
>>> soup.section
\section{Hey}
>>> soup.section.append(soup.textit)  
Traceback (most recent call last):
...
TypeError: ...
>>> soup.section
\section{Hey}
>>> soup.itemize.append('    ', soup.item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Hello
\end{itemize}
args

Arguments for this node. Note that this property is settable.

Return type:TexArgs
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\newcommand{reverseconcat}[3]{#3#2#1}''')
>>> soup.newcommand.args
[BraceGroup('reverseconcat'), BracketGroup('3'), BraceGroup('#3#2#1')]
>>> soup.newcommand.args = soup.newcommand.args[:2]
>>> soup.newcommand
\newcommand{reverseconcat}[3]
char_pos_to_line(char_pos)[source]

Map position in the original string to parsed LaTeX position.

Parameters:char_pos (int) – Character position in the original string
Returns:(line number, index of character in line)
Return type:Tuple[int, int]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textbf{Silly}
... \textit{Willy}''')
>>> soup.char_pos_to_line(10)
(1, 9)
>>> soup.char_pos_to_line(20)
(2, 5)
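
The mapping from a character offset to a line and column can be reasoned about independently of the parser. The following is a minimal standalone sketch, not TexSoup's actual implementation, that reproduces the two results above using only the standard library. The helper name offset_to_line_col and its text parameter are hypothetical.

```python
from bisect import bisect_right


def offset_to_line_col(text, char_pos):
    """Hypothetical helper: map a character offset to (line, column), both zero-based."""
    # Offsets at which each line starts: 0, then one past every newline.
    line_starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]
    # The containing line is the last one that starts at or before char_pos.
    line = bisect_right(line_starts, char_pos) - 1
    return line, char_pos - line_starts[line]


# Same source string as in the example above (it begins with a newline).
source = "\n\\section{Hey}\n\\textbf{Silly}\n\\textit{Willy}"
print(offset_to_line_col(source, 10))  # (1, 9)
print(offset_to_line_col(source, 20))  # (2, 5)
```

Because the source begins with a newline, line 0 is the empty line before \section, which is why \section{Hey} is reported as line 1.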
children

Immediate children of this TeX element that are valid TeX objects.

This is equivalent to contents, excluding text elements and keeping only TeX expressions.

Returns:generator of all children
Return type:Iterator[TexExpr]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> soup.itemize.children[0]
\item Hello
contents

Any non-whitespace contents inside of this TeX element.

Returns:generator of all nodes, tokens, and strings
Return type:Iterator[Union[TexNode,str]]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> contents = soup.itemize.contents
>>> contents[0]
'\n    Random text!\n    '
>>> contents[1]
\item Hello
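
The relationship between all, contents, and children can be pictured as successive filters over the same sequence. The sketch below illustrates that idea with plain Python objects. It assumes a made-up Expr class and a hand-written list of items, and does not use or reflect TexSoup's internals.

```python
from dataclasses import dataclass


@dataclass
class Expr:
    """Hypothetical stand-in for a parsed TeX expression."""
    source: str


# Made-up body of an itemize environment, in source order.
everything = ["\n    Random text!\n    ", Expr(r"\item Hello"), "\n"]


def is_whitespace(item):
    """True for strings that consist only of whitespace."""
    return isinstance(item, str) and item.isspace()


# all: every element, whitespace included.
all_items = list(everything)
# contents: everything except whitespace-only strings.
contents = [item for item in everything if not is_whitespace(item)]
# children: only the expressions among the contents.
children = [item for item in contents if isinstance(item, Expr)]

print(len(all_items), len(contents), len(children))  # 3 2 1
```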
copy()[source]

Create a copy of the current node. The copy is detached from the parse tree, so its parent is None.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> s = soup.section.copy()
>>> s.parent is None
True
count(name=None, **attrs)[source]

Number of descendants matching criteria.

Parameters:
  • name (Union[None,str]) – name of LaTeX expression
  • attrs – LaTeX expression attributes, such as item text.
Returns:number of matching expressions
Return type:int

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> soup.count('section')
1
>>> soup.count('textit')
2
delete()[source]

Delete this node from the parse tree.

Where applicable, this will remove all descendants of this node from the parse tree.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \textit{\color{blue}{Silly}}\textit{keep me!}''')
>>> soup.textit.color.delete()
>>> soup

\textit{}\textit{keep me!}
>>> soup.textit.delete()
>>> soup

\textit{keep me!}
descendants

Returns all descendants for this TeX element.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> descendants = list(soup.itemize.descendants)
>>> descendants[1]
\item Nested
find(name=None, **attrs)[source]

First descendant node matching criteria.

Returns None if no descendant node found.

Returns:descendant node matching criteria
Return type:Union[None,TexExpr]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> soup.find('textit')
\textit{eee}
>>> soup.find('textbf')
find_all(name=None, **attrs)[source]

Return all descendant nodes matching criteria.

Parameters:
  • name (Union[None,str,list]) – name of LaTeX expression
  • attrs – LaTeX expression attributes, such as item text.
Returns:All descendant nodes matching criteria
Return type:Iterator[TexNode]

If name is a list of strings, expressions matching any of the listed names are returned.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> gen = soup.find_all('textit')
>>> gen[0]
\textit{eee}
>>> gen[1]
\textit{ooo}
>>> soup.find_all('textbf')[0]
Traceback (most recent call last):
...
IndexError: list index out of range
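
The search methods share one idea: walk the descendants and keep those whose name matches. The following is a simplified sketch of that idea under stated assumptions, not the library's code. The Node class and the free functions are hypothetical, and matching on attrs is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: a name plus child nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)


def descendants(node):
    """Yield every node below `node`, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def find_all(node, name=None):
    """Return matching descendants; `name` may be a str, a list of str, or None."""
    if name is None:
        return list(descendants(node))
    names = {name} if isinstance(name, str) else set(name)
    return [d for d in descendants(node) if d.name in names]


def find(node, name=None):
    """Return the first match, or None if nothing matches."""
    return next(iter(find_all(node, name)), None)


root = Node("root", [Node("section"), Node("textit"), Node("textit")])
print(len(find_all(root, "textit")))               # 2
print(find(root, "textbf"))                        # None
print(len(find_all(root, ["section", "textit"])))  # 3
```

In this sketch, count would simply be the length of the list returned by find_all.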
insert(i, *nodes)[source]

Add node(s) to this node’s list of children, at position i.

Parameters:
  • i (int) – Position to add nodes to
  • nodes (TexNode) – List of nodes to add
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> item = soup.item.copy()
>>> soup.item.delete()
>>> soup.itemize.insert(1, item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Bye
\end{itemize}
>>> item.parent.name == soup.itemize.name
True
name

Name of the expression. Used for search functions.

Return type:str
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.name
'textbf'
>>> soup.textbf.name = 'textit'
>>> soup.textit
\textit{Hello}
position

Position of first character in expression, in original source.

Note this position is NOT updated as the parsed tree is modified.

remove(node)[source]

Remove a node from this node’s list of contents.

Parameters:node (TexExpr) – Node to remove
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> soup.itemize.remove(soup.item)
>>> soup.itemize
\begin{itemize}
    \item Bye
\end{itemize}
replace(child, *nodes)[source]

Replace provided node with node(s).

Parameters:
  • child (TexNode) – Child node to replace
  • nodes (TexNode) – List of nodes to substitute in
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.itemize.replace(soup.item, bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
\item Bye
\end{itemize}
replace_with(*nodes)[source]

Replace this node in the parse tree with the provided node(s).

Parameters:nodes (TexNode) – List of nodes to substitute in
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.item.replace_with(bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
\item Bye
\end{itemize}
string

The string content of this expression. This is valid if and only if

  1. the expression is a TexCmd AND has only one argument OR
  2. the expression is a TexEnv AND has only one TexText child
Return type:Union[None,str]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.string
'Hello'
>>> soup.textbf.string = 'Hello World'
>>> soup.textbf.string
'Hello World'
>>> soup.textbf
\textbf{Hello World}
>>> soup = TexSoup(r'''\begin{equation}1+1\end{equation}''')
>>> soup.equation.string
'1+1'
>>> soup.equation.string = '2+2'
>>> soup.equation.string
'2+2'
text

All text in descendant nodes.

This is equivalent to contents, keeping text elements and excluding TeX expressions.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> soup.text[0]
' Nested\n    '

Expressions

class TexSoup.data.TexExpr[source]

General abstraction for a TeX expression.

An expression may be a command or an environment and is identified by a name, arguments, and place in the parse tree. This is an abstract class and is not meant to be instantiated directly.

all

Returns all content in this expression, including whitespace. This covers all LaTeX needed to reconstruct the original source.

>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr1.all) == list(expr2.all)
True
append(*exprs)[source]

Add contents to the expression.

Parameters:exprs (Union[TexExpr,str]) – List of contents to add
>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.append('world')
>>> expr
TexExpr('textbf', ['hello', 'world'])
contents

Returns all contents in this expression.

Whitespace is included only if preserve_whitespace was set when the expression was created.

>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> list(expr1.contents)
['hi']
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr2.contents)
['\n', 'hi']
>>> expr = TexExpr('textbf', ('\n', 'hi'))
>>> expr.contents = ('hehe', '👻')
>>> list(expr.contents)
['hehe', '👻']
>>> expr.contents = 35  
Traceback (most recent call last):
    ...
TypeError: ...
insert(i, *exprs)[source]

Insert content at specified position into expression.

Parameters:
  • i (int) – Position to add content to
  • exprs (Union[TexExpr,str]) – List of contents to add
>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.insert(0, 'world')
>>> expr
TexExpr('textbf', ['world', 'hello'])
>>> expr.insert(0, TexText('asdf'))
>>> expr
TexExpr('textbf', ['asdf', 'world', 'hello'])
remove(expr)[source]

Remove a provided expression from its list of contents.

Parameters:expr (Union[TexExpr,str]) – Content to remove
Returns:index of the expression removed
Return type:int
>>> expr = TexExpr('textbf', ('hello',))
>>> expr.remove('hello')
0
>>> expr
TexExpr('textbf', [])
string

All contents stringified. A convenience property.

>>> expr = TexExpr('hello', ['naw'])
>>> expr.string
'naw'
>>> expr.string = 'huehue'
>>> expr.string
'huehue'
>>> type(expr.string)
<class 'TexSoup.data.TexText'>
>>> str(expr)
"TexExpr('hello', ['huehue'])"
>>> expr.string = 35  
Traceback (most recent call last):
    ...
TypeError: ...
class TexSoup.data.TexEnv[source]

Abstraction for a LaTeX environment, with starting and ending markers. Contains three attributes:

  1. a human-readable environment name,
  2. the environment delimiters, and
  3. the environment’s contents.
>>> t = TexEnv('displaymath', r'\[', r'\]',
...     ['\\mathcal{M} \\circ \\mathcal{A}'])
>>> t
TexEnv('displaymath', ['\\mathcal{M} \\circ \\mathcal{A}'], [])
>>> print(t)
\[\mathcal{M} \circ \mathcal{A}\]
>>> len(list(t.children))
0
class TexSoup.data.TexCmd[source]

Abstraction for a LaTeX command. Contains two attributes:

  1. the command name itself and
  2. the command arguments, whether optional or required.
>>> textit = TexCmd('textit', args=[BraceGroup('slant')])
>>> t = TexCmd('textbf', args=[BraceGroup('big ', textit, '.')])
>>> t
TexCmd('textbf', [BraceGroup('big ', TexCmd('textit', [BraceGroup('slant')]), '.')])
>>> print(t)
\textbf{big \textit{slant}.}
>>> children = list(map(str, t.children))
>>> len(children)
1
>>> print(children[0])
\textit{slant}

Groups

class TexSoup.data.TexGroup[source]

Abstraction for a LaTeX environment with single-character delimiters.

Used primarily to identify and associate arguments with commands.

classmethod parse(s)[source]

Parse a string or list and return an Argument object.

Naive implementation, does not parse expressions in provided string.

Parameters:s (Union[str,iterable]) – Either a string or a list, where the first and last elements are valid argument delimiters.
>>> TexGroup.parse('[arg0]')
BracketGroup('arg0')
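
To make "naive" concrete, here is a small sketch of delimiter-based parsing that only inspects the first and last characters. It is an illustration rather than TexSoup's implementation: parse_group and DELIMITERS are hypothetical names, the result is a plain tuple instead of a group object, and raising ValueError on bad input is an assumption of this sketch.

```python
# Opening delimiter -> (closing delimiter, group kind). The kind labels are
# borrowed from the class names documented in this section.
DELIMITERS = {"[": ("]", "BracketGroup"), "{": ("}", "BraceGroup")}


def parse_group(s):
    """Hypothetical helper: split a delimited string into (kind, inner text)."""
    if len(s) < 2 or s[0] not in DELIMITERS:
        raise ValueError("not a delimited argument: %r" % (s,))
    closing, kind = DELIMITERS[s[0]]
    if s[-1] != closing:
        raise ValueError("mismatched delimiters: %r" % (s,))
    # Naive on purpose: the inner text is returned as-is, not parsed further.
    return kind, s[1:-1]


print(parse_group("[arg0]"))  # ('BracketGroup', 'arg0')
print(parse_group("{arg1}"))  # ('BraceGroup', 'arg1')
```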
class TexSoup.data.BracketGroup[source]

Optional argument, denoted as [arg].

class TexSoup.data.BraceGroup[source]

Required argument, denoted as {arg}.

class TexSoup.data.TexArgs[source]

List of arguments for a TeX expression. Supports all standard list ops.

Additional support for conversion from and to unparsed argument strings.

>>> arguments = TexArgs(['\n', BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments
[BraceGroup('arg0'), BracketGroup('arg1'), BraceGroup('arg2')]
>>> arguments.all
['\n', BraceGroup('arg0'), BracketGroup('arg1'), BraceGroup('arg2')]
>>> arguments[2]
BraceGroup('arg2')
>>> len(arguments)
3
>>> arguments[:2]
[BraceGroup('arg0'), BracketGroup('arg1')]
>>> isinstance(arguments[:2], TexArgs)
True
append(arg)[source]

Append whitespace, an unparsed argument string, or an argument object.

Parameters:arg (TexGroup) – argument to add to the end of the list
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments.append('[arg3]')
>>> arguments[3]
BracketGroup('arg3')
>>> arguments.append(BraceGroup('arg4'))
>>> arguments[4]
BraceGroup('arg4')
>>> len(arguments)
5
>>> arguments.append('\n')
>>> len(arguments)
5
>>> len(arguments.all)
6
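
The examples above show two views of the same data: the list itself skips whitespace, while .all keeps it. One way to picture this is a list subclass that records every item in a side list. The ArgList class below is a hypothetical sketch of that idea only. Unlike TexArgs, it does not convert strings such as '[arg1]' into group objects.

```python
class ArgList(list):
    """Hypothetical sketch: a list that hides whitespace but remembers it in `all`."""

    def __init__(self, items=()):
        super().__init__()
        self.all = []  # every item in order, whitespace included
        for item in items:
            self.append(item)

    def append(self, item):
        self.all.append(item)
        # Only non-whitespace items count as arguments.
        if not (isinstance(item, str) and item.isspace()):
            super().append(item)


args = ArgList(["\n", "{arg0}", "[arg1]"])
args.append("\t")
print(len(args), len(args.all))  # 2 4
```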
clear()[source]

Clear both the list and the proxy .all.

>>> args = TexArgs(['\n', BraceGroup('arg1'), BracketGroup('arg2')])
>>> args.clear()
>>> len(args) == len(args.all) == 0
True
extend(args)[source]

Extend the list with a mixture of unparsed argument strings, argument objects, and whitespace.

Parameters:args (List[TexGroup]) – Arguments to add to end of the list
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments.extend(['[arg3]', BraceGroup('arg4'), '\t'])
>>> len(arguments)
5
>>> arguments[4]
BraceGroup('arg4')
insert(i, arg)[source]

Insert whitespace, an unparsed argument string, or an argument object.

Parameters:
  • i (int) – Index to insert argument into
  • arg (TexGroup) – Argument to insert
>>> arguments = TexArgs(['\n', BraceGroup('arg0'), '[arg2]'])
>>> arguments.insert(1, '[arg1]')
>>> len(arguments)
3
>>> arguments
[BraceGroup('arg0'), BracketGroup('arg1'), BracketGroup('arg2')]
>>> arguments.all
['\n', BraceGroup('arg0'), BracketGroup('arg1'), BracketGroup('arg2')]
>>> arguments.insert(10, '[arg3]')
>>> arguments[3]
BracketGroup('arg3')
pop(i)[source]

Pop argument object at provided index.

Parameters:i (int) – Index to pop from the list
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg2]', '{arg3}'])
>>> arguments.pop(1)
BracketGroup('arg2')
>>> len(arguments)
2
>>> arguments[0]
BraceGroup('arg0')
remove(item)[source]

Remove either an unparsed argument string or an argument object.

Parameters:item (Union[str,TexGroup]) – Item to remove
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg2]', '{arg3}'])
>>> arguments.remove('{arg0}')
>>> len(arguments)
2
>>> arguments[0]
BracketGroup('arg2')
>>> arguments.remove(arguments[0])
>>> arguments[0]
BraceGroup('arg3')
>>> arguments.remove(BraceGroup('arg3'))
>>> len(arguments)
0
>>> arguments = TexArgs([
...     BraceGroup(TexCmd('color')),
...     BraceGroup(TexCmd('color', [BraceGroup('blue')]))
... ])
>>> arguments.remove(arguments[0])
>>> len(arguments)
1
>>> arguments.remove(arguments[0])
>>> len(arguments)
0
reverse()[source]

Reverse both the list and the proxy .all.

>>> args = TexArgs(['\n', BraceGroup('arg1'), BracketGroup('arg2')])
>>> args.reverse()
>>> args.all
[BracketGroup('arg2'), BraceGroup('arg1'), '\n']
>>> args
[BracketGroup('arg2'), BraceGroup('arg1')]

Environments

class TexSoup.data.TexNamedEnv[source]

Abstraction for a LaTeX environment, denoted by \begin{env} and \end{env}. Contains three attributes:

  1. the environment name itself,
  2. the environment arguments, whether optional or required, and
  3. the environment’s contents.

Warning: Setting TexNamedEnv.begin or TexNamedEnv.end has no effect. The begin and end tokens are always constructed from TexNamedEnv.name.

>>> t = TexNamedEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'],
...     [BraceGroup('c | c c')])
>>> t
TexNamedEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'], [BraceGroup('c | c c')])
>>> print(t)
\begin{tabular}{c | c c}
0 & 0 & * \\
1 & 1 & * \\
\end{tabular}
>>> len(list(t.children))
0
>>> t = TexNamedEnv('equation', [r'5\sum_{i=0}^n i^2'])
>>> str(t)
'\\begin{equation}5\\sum_{i=0}^n i^2\\end{equation}'
>>> t.name = 'eqn'
>>> str(t)
'\\begin{eqn}5\\sum_{i=0}^n i^2\\end{eqn}'
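
The renaming behaviour shown above follows naturally if the begin and end tokens are derived from the name each time they are needed. The NamedEnv class below is a hypothetical sketch of that design, not the library's code. One difference to note: here begin and end are read-only properties, so assigning to them raises AttributeError rather than being silently ignored.

```python
class NamedEnv:
    """Hypothetical sketch of an environment whose delimiters derive from its name."""

    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)

    @property
    def begin(self):
        # Recomputed on every access, so it always reflects the current name.
        return "\\begin{%s}" % self.name

    @property
    def end(self):
        return "\\end{%s}" % self.name

    def __str__(self):
        return self.begin + "".join(self.contents) + self.end


env = NamedEnv("equation", [r"5\sum_{i=0}^n i^2"])
print(env)  # \begin{equation}5\sum_{i=0}^n i^2\end{equation}
env.name = "eqn"
print(env)  # \begin{eqn}5\sum_{i=0}^n i^2\end{eqn}
```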
class TexSoup.data.TexUnNamedEnv[source]
class TexSoup.data.TexMathEnv[source]
class TexSoup.data.TexDisplayMathEnv[source]
class TexSoup.data.TexMathModeEnv[source]
class TexSoup.data.TexDisplayMathModeEnv[source]

Text

class TexSoup.data.TexText[source]

Abstraction for LaTeX text.

Representing regular text as objects in the parse tree lets users search and modify text in the same way as any other expression.

>>> obj = TexNode(TexText('asdf gg'))
>>> 'asdf' in obj
True
>>> 'err' in obj
False
>>> TexText('df ').strip()
'df'