Data Structures

TexSoup transforms a LaTeX document into a tree of Python objects. Every object falls into one of three categories:

TexNode, TexExpr (environments and commands), and TexGroup.

Node

class TexSoup.data.TexNode

A tree node representing an expression in the LaTeX document.

Every node in the parse tree is a TexNode, equipped with navigation, search, and modification utilities. To navigate the parse tree, use abstractions such as children and descendants. To access content in the parse tree, use abstractions such as contents, text, string, and args.

Note that the LaTeX parse tree is largely shallow: only environments such as itemize or enumerate have children and thus descendants. Typical LaTeX expressions such as \section have arguments but not children.

all

Returns all content in this node, including whitespace. This includes all LaTeX needed to reconstruct the original source.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \newcommand{reverseconcat}[3]{#3#2#1}
... ''')
>>> alls = soup.all
>>> alls[0]


>>> alls[1]
\newcommand{reverseconcat}[3]{#3#2#1}

append(*nodes)

Add node(s) to this node’s list of children.

Parameters:	nodes (TexNode) – List of nodes to add

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
... \end{itemize}
... \section{Hey}
... \textit{Willy}''')
>>> soup.section
\section{Hey}
>>> soup.section.append(soup.textit)  
Traceback (most recent call last):
...
TypeError: ...
>>> soup.section
\section{Hey}
>>> soup.itemize.append('    ', soup.item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Hello
\end{itemize}

args

Arguments for this node. Note that this attribute is settable.

Return type:	TexArgs

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\newcommand{reverseconcat}[3]{#3#2#1}''')
>>> soup.newcommand.args
[BraceGroup('reverseconcat'), BracketGroup('3'), BraceGroup('#3#2#1')]
>>> soup.newcommand.args = soup.newcommand.args[:2]
>>> soup.newcommand
\newcommand{reverseconcat}[3]

char_pos_to_line(char_pos)

Map a character position in the original string to a line number and a character index within that line.

Parameters:	char_pos (int) – Character position in the original string
Returns:	(line number, index of character in line)
Return type:	Tuple[int, int]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textbf{Silly}
... \textit{Willy}''')
>>> soup.char_pos_to_line(10)
(1, 9)
>>> soup.char_pos_to_line(20)
(2, 5)
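
The mapping can be pictured with plain string operations. The following is a minimal sketch, not TexSoup's implementation: offset_to_line_col is a hypothetical helper, and it assumes both the line number and the in-line index are zero-based, which matches the outputs shown above.

```python
def offset_to_line_col(source, char_pos):
    """Map a character offset to (line number, index in line), both zero-based.

    Hypothetical helper for illustration; not part of TexSoup.
    """
    # The number of newlines before the offset is the zero-based line number.
    line_no = source.count("\n", 0, char_pos)
    # The line starts just after the previous newline, or at 0 if there is none.
    line_start = source.rfind("\n", 0, char_pos) + 1
    return line_no, char_pos - line_start


source = r'''
\section{Hey}
\textbf{Silly}
\textit{Willy}'''

print(offset_to_line_col(source, 10))  # (1, 9)
print(offset_to_line_col(source, 20))  # (2, 5)
```

Because the source string starts with a newline, line 0 is empty and \section{Hey} sits on line 1.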

children

Immediate children of this TeX element that are valid TeX objects.

This is equivalent to contents, excluding text elements and keeping only TeX expressions.

Returns:	generator of all children
Return type:	Iterator[TexExpr]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> soup.itemize.children[0]
\item Hello

contents

Any non-whitespace contents inside of this TeX element.

Returns:	generator of all nodes, tokens, and strings
Return type:	Iterator[Union[TexNode,str]]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> contents = soup.itemize.contents
>>> contents[0]
'\n    Random text!\n    '
>>> contents[1]
\item Hello
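
The relationship between all, contents, and children can be pictured as successive filters over one list. This is only a sketch under stated assumptions: Cmd is a hypothetical stand-in for a parsed expression, and the helper functions below are illustrative, not TexSoup's code.

```python
class Cmd:
    """Hypothetical stand-in for a parsed LaTeX expression."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "\\" + self.name


# Everything inside an imagined itemize environment, whitespace included.
everything = ["\n", "    Random text!\n    ", Cmd("item"), "\n"]


def contents(items):
    """Drop whitespace-only strings; keep text and expressions."""
    return [x for x in items if not (isinstance(x, str) and x.isspace())]


def children(items):
    """Keep only expressions, dropping every string."""
    return [x for x in contents(items) if not isinstance(x, str)]


print(len(everything))       # 4
print(contents(everything))  # ['    Random text!\n    ', \item]
print(children(everything))  # [\item]
```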

copy()

Create another copy of the current node.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> s = soup.section.copy()
>>> s.parent is None
True

count(name=None, **attrs)

Number of descendants matching criteria.

Parameters:
name (Union[None,str]) – name of the LaTeX expression
attrs – LaTeX expression attributes, such as item text
Returns:	number of matching expressions
Return type:	int

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> soup.count('section')
1
>>> soup.count('textit')
2

delete()

Delete this node from the parse tree.

Where applicable, this will remove all descendants of this node from the parse tree.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \textit{\color{blue}{Silly}}\textit{keep me!}''')
>>> soup.textit.color.delete()
>>> soup

\textit{}\textit{keep me!}
>>> soup.textit.delete()
>>> soup

\textit{keep me!}

descendants

Returns all descendants for this TeX element.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> descendants = list(soup.itemize.descendants)
>>> descendants[1]
\item Nested

find(name=None, **attrs)

First descendant node matching criteria.

Returns None if no matching descendant node is found.

Returns:	descendant node matching criteria
Return type:	Union[None,TexExpr]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> soup.find('textit')
\textit{eee}
>>> soup.find('textbf')

find_all(name=None, **attrs)

Return all descendant nodes matching criteria.

Parameters:
name (Union[None,str,list]) – name of the LaTeX expression
attrs – LaTeX expression attributes, such as item text
Returns:	All descendant nodes matching criteria
Return type:	Iterator[TexNode]

If name is a list of strs, an expression matching any name in the list will be matched.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> gen = soup.find_all('textit')
>>> gen[0]
\textit{eee}
>>> gen[1]
\textit{ooo}
>>> soup.find_all('textbf')[0]
Traceback (most recent call last):
...
IndexError: list index out of range
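
How find, find_all, and count relate to one another can be sketched on a toy tree. Everything here is an assumption made for illustration: Node is a hypothetical class, matching is by name only (no attrs), and find_all returns a plain list rather than TexSoup's own result type.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Union


@dataclass
class Node:
    """Hypothetical tree node; not TexSoup's TexNode."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def descendants(self) -> Iterator["Node"]:
        # Depth-first, pre-order walk over every node below this one.
        for child in self.children:
            yield child
            yield from child.descendants()

    def find_all(self, name: Union[None, str, List[str]] = None) -> List["Node"]:
        # Normalize `name` to a list of accepted names, or None for "match all".
        if name is None:
            names = None
        elif isinstance(name, str):
            names = [name]
        else:
            names = list(name)
        return [n for n in self.descendants() if names is None or n.name in names]

    def find(self, name=None) -> Optional["Node"]:
        # First match, or None when nothing matches.
        matches = self.find_all(name)
        return matches[0] if matches else None

    def count(self, name=None) -> int:
        return len(self.find_all(name))


root = Node("root", [Node("section"), Node("textit"), Node("textit")])
print(root.count("textit"))                                       # 2
print(root.find("textbf"))                                        # None
print([n.name for n in root.find_all(["section", "textit"])])
```

In this sketch, find and count are thin wrappers over find_all, which keeps the matching rule in one place.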

insert(i, *nodes)

Add node(s) to this node’s list of children, at position i.

Parameters:
i (int) – Position to add nodes to
nodes (TexNode) – List of nodes to add

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> item = soup.item.copy()
>>> soup.item.delete()
>>> soup.itemize.insert(1, item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Bye
\end{itemize}
>>> item.parent.name == soup.itemize.name
True

name

Name of the expression. Used for search functions.

Return type:	str

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.name
'textbf'
>>> soup.textbf.name = 'textit'
>>> soup.textit
\textit{Hello}

position

Position of first character in expression, in original source.

Note this position is NOT updated as the parsed tree is modified.

remove(node)

Remove a node from this node’s list of contents.

Parameters:	node (TexExpr) – Node to remove

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> soup.itemize.remove(soup.item)
>>> soup.itemize
\begin{itemize}
    \item Bye
\end{itemize}

replace(child, *nodes)

Replace provided node with node(s).

Parameters:
child (TexNode) – Child node to replace
nodes (TexNode) – List of nodes to substitute in

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.itemize.replace(soup.item, bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
\item Bye
\end{itemize}

replace_with(*nodes)

Replace this node in the parse tree with the provided node(s).

Parameters:	nodes (TexNode) – List of nodes to substitute in

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.item.replace_with(bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
\item Bye
\end{itemize}

string

This is valid if and only if one of the following holds:

the expression is a TexCmd with exactly one argument, or
the expression is a TexEnv with exactly one TexText child.

Return type:	Union[None,str]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.string
'Hello'
>>> soup.textbf.string = 'Hello World'
>>> soup.textbf.string
'Hello World'
>>> soup.textbf
\textbf{Hello World}
>>> soup = TexSoup(r'''\begin{equation}1+1\end{equation}''')
>>> soup.equation.string
'1+1'
>>> soup.equation.string = '2+2'
>>> soup.equation.string
'2+2'

text

All text in descendant nodes.

This is equivalent to contents, keeping text elements and excluding TeX expressions.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> soup.text[0]
' Nested\n    '

Expressions

class TexSoup.data.TexExpr

General abstraction for a TeX expression.

An expression may be a command or an environment and is identified by a name, arguments, and place in the parse tree. This is an abstract class and is not directly instantiated.

all

Returns all content in this expression, including whitespace. This includes all LaTeX needed to reconstruct the original source.

>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr1.all) == list(expr2.all)
True

append(*exprs)

Add contents to the expression.

Parameters:	exprs (Union[TexExpr,str]) – List of contents to add

>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.append('world')
>>> expr
TexExpr('textbf', ['hello', 'world'])

contents

Returns all contents in this expression.

Whitespace is included only if preserve_whitespace was set when the node was created.

>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> list(expr1.contents)
['hi']
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr2.contents)
['\n', 'hi']
>>> expr = TexExpr('textbf', ('\n', 'hi'))
>>> expr.contents = ('hehe', '👻')
>>> list(expr.contents)
['hehe', '👻']
>>> expr.contents = 35  
Traceback (most recent call last):
    ...
TypeError: ...

insert(i, *exprs)

Insert content at specified position into expression.

Parameters:
i (int) – Position to add content to
exprs (Union[TexExpr,str]) – List of contents to add

>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.insert(0, 'world')
>>> expr
TexExpr('textbf', ['world', 'hello'])
>>> expr.insert(0, TexText('asdf'))
>>> expr
TexExpr('textbf', ['asdf', 'world', 'hello'])

remove(expr)

Remove a provided expression from its list of contents.

Parameters:	expr (Union[TexExpr,str]) – Content to remove
Returns:	index of the expression removed
Return type:	int

>>> expr = TexExpr('textbf', ('hello',))
>>> expr.remove('hello')
0
>>> expr
TexExpr('textbf', [])

string

All contents stringified. A convenience property.

>>> expr = TexExpr('hello', ['naw'])
>>> expr.string
'naw'
>>> expr.string = 'huehue'
>>> expr.string
'huehue'
>>> type(expr.string)
<class 'TexSoup.data.TexText'>
>>> str(expr)
"TexExpr('hello', ['huehue'])"
>>> expr.string = 35  
Traceback (most recent call last):
    ...
TypeError: ...

class TexSoup.data.TexEnv

Abstraction for a LaTeX environment, with starting and ending markers. Contains three attributes:

a human-readable environment name,
the environment delimiters, and
the environment’s contents.

>>> t = TexEnv('displaymath', r'\[', r'\]',
...     ['\\mathcal{M} \\circ \\mathcal{A}'])
>>> t
TexEnv('displaymath', ['\\mathcal{M} \\circ \\mathcal{A}'], [])
>>> print(t)
\[\mathcal{M} \circ \mathcal{A}\]
>>> len(list(t.children))
0

class TexSoup.data.TexCmd

Abstraction for a LaTeX command. Contains two attributes:

the command name itself and
the command arguments, whether optional or required.

>>> textit = TexCmd('textit', args=[BraceGroup('slant')])
>>> t = TexCmd('textbf', args=[BraceGroup('big ', textit, '.')])
>>> t
TexCmd('textbf', [BraceGroup('big ', TexCmd('textit', [BraceGroup('slant')]), '.')])
>>> print(t)
\textbf{big \textit{slant}.}
>>> children = list(map(str, t.children))
>>> len(children)
1
>>> print(children[0])
\textit{slant}

Groups

class TexSoup.data.TexGroup

Abstraction for a LaTeX environment with single-character delimiters.

Used primarily to identify and associate arguments with commands.

classmethod parse(s)

Parse a string or list and return an Argument object.

Naive implementation, does not parse expressions in provided string.

Parameters:	s (Union[str,iterable]) – Either a string or a list, where the first and last elements are valid argument delimiters.

>>> TexGroup.parse('[arg0]')
BracketGroup('arg0')
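
The "naive" behavior described above, where only the first and last elements decide the group type and the interior is left unparsed, might look like the following sketch. The function parse_group, the DELIMITERS table, and the choice of ValueError are all assumptions made for illustration; the group kinds are returned as plain strings rather than TexSoup objects.

```python
# Hypothetical delimiter table; the names mirror the classes documented here.
DELIMITERS = {
    ("[", "]"): "BracketGroup",
    ("{", "}"): "BraceGroup",
}


def parse_group(s):
    """Return (group kind, inner contents) for a delimited string or list.

    Only the first and last elements are inspected; the interior is not parsed.
    """
    items = list(s)
    if len(items) < 2:
        raise ValueError("expected an opening and a closing delimiter")
    kind = DELIMITERS.get((items[0], items[-1]))
    if kind is None:
        raise ValueError("unrecognized delimiters: %r ... %r" % (items[0], items[-1]))
    inner = items[1:-1]
    # Strings come back as strings; lists keep their element structure.
    if isinstance(s, str):
        inner = "".join(inner)
    return kind, inner


print(parse_group("[arg0]"))              # ('BracketGroup', 'arg0')
print(parse_group(["{", "a", "b", "}"]))  # ('BraceGroup', ['a', 'b'])
```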

class TexSoup.data.BracketGroup

Optional argument, denoted as [arg].

class TexSoup.data.BraceGroup

Required argument, denoted as {arg}.

class TexSoup.data.TexArgs

List of arguments for a TeX expression. Supports all standard list ops.

Additional support for conversion from and to unparsed argument strings.

>>> arguments = TexArgs(['\n', BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments
[BraceGroup('arg0'), BracketGroup('arg1'), BraceGroup('arg2')]
>>> arguments.all
['\n', BraceGroup('arg0'), BracketGroup('arg1'), BraceGroup('arg2')]
>>> arguments[2]
BraceGroup('arg2')
>>> len(arguments)
3
>>> arguments[:2]
[BraceGroup('arg0'), BracketGroup('arg1')]
>>> isinstance(arguments[:2], TexArgs)
True
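
The split between the list itself and its .all proxy can be sketched with a small list subclass. This is a simplified illustration under stated assumptions: ArgList is a hypothetical class, arguments are kept as raw strings instead of being parsed into groups, and slicing returns a plain list, unlike TexArgs.

```python
class ArgList(list):
    """Hypothetical list of arguments with an `all` view that keeps whitespace."""

    def __init__(self, items=()):
        super().__init__()
        self.all = []  # every item, including whitespace-only strings
        for item in items:
            self.append(item)

    def append(self, item):
        # Whitespace is recorded in `all` but hidden from the list proper.
        self.all.append(item)
        if not (isinstance(item, str) and item.isspace()):
            super().append(item)

    def clear(self):
        # Keep both views in sync.
        super().clear()
        self.all.clear()

    def reverse(self):
        super().reverse()
        self.all.reverse()


args = ArgList(["\n", "{arg0}", "[arg1]"])
print(len(args), len(args.all))  # 2 3
args.reverse()
print(args)      # ['[arg1]', '{arg0}']
print(args.all)  # ['[arg1]', '{arg0}', '\n']
```

Keeping whitespace only in the proxy lets indexing and len behave as if the whitespace were absent, while the original source can still be reconstructed from all.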

append(arg)

Append whitespace, an unparsed argument string, or an argument object.

Parameters:	arg (TexGroup) – argument to add to the end of the list

>>> arguments = TexArgs([BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments.append('[arg3]')
>>> arguments[3]
BracketGroup('arg3')
>>> arguments.append(BraceGroup('arg4'))
>>> arguments[4]
BraceGroup('arg4')
>>> len(arguments)
5
>>> arguments.append('\n')
>>> len(arguments)
5
>>> len(arguments.all)
6

clear()

Clear both the list and the proxy .all.

>>> args = TexArgs(['\n', BraceGroup('arg1'), BracketGroup('arg2')])
>>> args.clear()
>>> len(args) == len(args.all) == 0
True

extend(args)

Extend with a mixture of unparsed argument strings, argument objects, and whitespace.

Parameters:	args (List[TexGroup]) – Arguments to add to end of the list

>>> arguments = TexArgs([BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments.extend(['[arg3]', BraceGroup('arg4'), '\t'])
>>> len(arguments)
5
>>> arguments[4]
BraceGroup('arg4')

insert(i, arg)

Insert whitespace, an unparsed argument string, or an argument object.

Parameters:
i (int) – Index to insert argument into
arg (TexGroup) – Argument to insert

>>> arguments = TexArgs(['\n', BraceGroup('arg0'), '[arg2]'])
>>> arguments.insert(1, '[arg1]')
>>> len(arguments)
3
>>> arguments
[BraceGroup('arg0'), BracketGroup('arg1'), BracketGroup('arg2')]
>>> arguments.all
['\n', BraceGroup('arg0'), BracketGroup('arg1'), BracketGroup('arg2')]
>>> arguments.insert(10, '[arg3]')
>>> arguments[3]
BracketGroup('arg3')

pop(i)

Pop argument object at provided index.

Parameters:	i (int) – Index to pop from the list

>>> arguments = TexArgs([BraceGroup('arg0'), '[arg2]', '{arg3}'])
>>> arguments.pop(1)
BracketGroup('arg2')
>>> len(arguments)
2
>>> arguments[0]
BraceGroup('arg0')

remove(item)

Remove either an unparsed argument string or an argument object.

Parameters:	item (Union[str,TexGroup]) – Item to remove

>>> arguments = TexArgs([BraceGroup('arg0'), '[arg2]', '{arg3}'])
>>> arguments.remove('{arg0}')
>>> len(arguments)
2
>>> arguments[0]
BracketGroup('arg2')
>>> arguments.remove(arguments[0])
>>> arguments[0]
BraceGroup('arg3')
>>> arguments.remove(BraceGroup('arg3'))
>>> len(arguments)
0
>>> arguments = TexArgs([
...     BraceGroup(TexCmd('color')),
...     BraceGroup(TexCmd('color', [BraceGroup('blue')]))
... ])
>>> arguments.remove(arguments[0])
>>> len(arguments)
1
>>> arguments.remove(arguments[0])
>>> len(arguments)
0

reverse()

Reverse both the list and the proxy .all.

>>> args = TexArgs(['\n', BraceGroup('arg1'), BracketGroup('arg2')])
>>> args.reverse()
>>> args.all
[BracketGroup('arg2'), BraceGroup('arg1'), '\n']
>>> args
[BracketGroup('arg2'), BraceGroup('arg1')]

Environments

class TexSoup.data.TexNamedEnv

Abstraction for a LaTeX environment, denoted by \begin{env} and \end{env}. Contains three attributes:

the environment name itself,
the environment arguments, whether optional or required, and
the environment’s contents.

Warning: Setting TexNamedEnv.begin or TexNamedEnv.end has no effect. The begin and end tokens are always constructed from TexNamedEnv.name.

>>> t = TexNamedEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'],
...     [BraceGroup('c | c c')])
>>> t
TexNamedEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'], [BraceGroup('c | c c')])
>>> print(t)
\begin{tabular}{c | c c}
0 & 0 & * \\
1 & 1 & * \\
\end{tabular}
>>> len(list(t.children))
0
>>> t = TexNamedEnv('equation', [r'5\sum_{i=0}^n i^2'])
>>> str(t)
'\\begin{equation}5\\sum_{i=0}^n i^2\\end{equation}'
>>> t.name = 'eqn'
>>> str(t)
'\\begin{eqn}5\\sum_{i=0}^n i^2\\end{eqn}'
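
One way to picture why renaming the environment updates both markers is to derive them from the name on every access. The sketch below is an assumption about the general idea, not TexSoup's code: NamedEnv is a hypothetical class, and unlike the documented behavior, assigning to begin or end here raises AttributeError instead of being silently ignored.

```python
class NamedEnv:
    """Hypothetical environment whose markers are always derived from its name."""

    def __init__(self, name, contents=""):
        self.name = name
        self.contents = contents

    @property
    def begin(self):
        # Recomputed on every access, so it always tracks the current name.
        return "\\begin{%s}" % self.name

    @property
    def end(self):
        return "\\end{%s}" % self.name

    def __str__(self):
        return self.begin + self.contents + self.end


env = NamedEnv("equation", "1+1")
print(env)  # \begin{equation}1+1\end{equation}
env.name = "eqn"
print(env)  # \begin{eqn}1+1\end{eqn}
```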

class TexSoup.data.TexUnNamedEnv

class TexSoup.data.TexMathEnv

class TexSoup.data.TexDisplayMathEnv

class TexSoup.data.TexMathModeEnv

class TexSoup.data.TexDisplayMathModeEnv

Text

class TexSoup.data.TexText

Abstraction for LaTeX text.

Representing regular text as objects in the parse tree lets users search and modify text in the same way as any other expression.

>>> obj = TexNode(TexText('asdf gg'))
>>> 'asdf' in obj
True
>>> 'err' in obj
False
>>> TexText('df ').strip()
'df'