Data Structures
TexSoup transforms a LaTeX document into a complex tree of Python objects, but every object falls into one of three categories: TexNode, TexExpr (environments and commands), and TexGroup.
Node
class TexSoup.data.TexNode
A tree node representing an expression in the LaTeX document.
Every node in the parse tree is a TexNode, equipped with navigation, search, and modification utilities. To navigate the parse tree, use abstractions such as children and descendants. To access content in the parse tree, use abstractions such as contents, text, string, and args.
Note that the LaTeX parse tree is largely shallow: only environments such as itemize or enumerate have children and thus descendants. Typical LaTeX expressions such as \section have arguments but not children.
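To make the difference between children and descendants concrete, here is a minimal sketch built on a hypothetical ToyNode class (not TexSoup's TexNode). It assumes children are the immediate sub-nodes and descendants come from a depth-first, pre-order walk of the whole subtree, with only environments carrying sub-nodes, as described above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a parse-tree node (not TexSoup's TexNode)."""
    name: str
    children: List["ToyNode"] = field(default_factory=list)

    @property
    def descendants(self) -> Iterator["ToyNode"]:
        # Depth-first, pre-order: yield each child, then everything below it.
        for child in self.children:
            yield child
            yield from child.descendants


# Models \begin{itemize}\begin{itemize}\item Nested\end{itemize}\end{itemize}
outer = ToyNode("itemize", [ToyNode("itemize", [ToyNode("item")])])
# A command such as \section has arguments but no children in this model.
section = ToyNode("section")

print([n.name for n in outer.children])     # ['itemize']
print([n.name for n in outer.descendants])  # ['itemize', 'item']
print(list(section.descendants))            # []
```

In this model the second descendant of the outer environment is the nested item, which matches the shape of the descendants example for TexNode.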
all
Returns all content in this node, including whitespace. This includes all LaTeX needed to reconstruct the original source.
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \newcommand{reverseconcat}[3]{#3#2#1}
... ''')
>>> alls = soup.all
>>> alls[0]
>>> alls[1]
\newcommand{reverseconcat}[3]{#3#2#1}
append(*nodes)
Add node(s) to this node’s list of children.
Parameters: nodes (TexNode) – List of nodes to add
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
... \end{itemize}
... \section{Hey}
... \textit{Willy}''')
>>> soup.section
\section{Hey}
>>> soup.section.append(soup.textit)
Traceback (most recent call last):
...
TypeError: ...
>>> soup.section
\section{Hey}
>>> soup.itemize.append(' ', soup.item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Hello
\end{itemize}
args
Arguments for this node. This property is settable.
Return type: TexArgs
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\newcommand{reverseconcat}[3]{#3#2#1}''')
>>> soup.newcommand.args
[BraceGroup('reverseconcat'), BracketGroup('3'), BraceGroup('#3#2#1')]
>>> soup.newcommand.args = soup.newcommand.args[:2]
>>> soup.newcommand
\newcommand{reverseconcat}[3]
char_pos_to_line(char_pos)
Map a character position in the original string to a line number and the index of that character within its line. Both values are zero-based, as the example shows.
Parameters: char_pos (int) – Character position in the original string
Returns: (line number, index of character in line)
Return type: Tuple[int, int]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textbf{Silly}
... \textit{Willy}''')
>>> soup.char_pos_to_line(10)
(1, 9)
>>> soup.char_pos_to_line(20)
(2, 5)
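One way such a mapping can work is to record the offset at which each line starts and binary-search those offsets. The sketch below illustrates that approach; char_pos_to_line and line_starts here are our own standalone functions, not TexSoup's implementation. With zero-based line and column numbers, it reproduces both results of the example above.

```python
from bisect import bisect_right
from typing import List, Tuple


def line_starts(source: str) -> List[int]:
    """Offsets at which each line of `source` begins (line 0 starts at 0)."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def char_pos_to_line(source: str, char_pos: int) -> Tuple[int, int]:
    """Map a character offset to (zero-based line, zero-based column)."""
    starts = line_starts(source)
    # The last line start that is <= char_pos identifies the containing line.
    line = bisect_right(starts, char_pos) - 1
    return line, char_pos - starts[line]


source = "\n\\section{Hey}\n\\textbf{Silly}\n\\textit{Willy}"
print(char_pos_to_line(source, 10))  # (1, 9)
print(char_pos_to_line(source, 20))  # (2, 5)
```

Precomputing the line starts once makes each lookup logarithmic in the number of lines instead of rescanning the string.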
children
Immediate children of this TeX element that are valid TeX objects.
This is equivalent to contents, excluding text elements and keeping only TeX expressions.
Returns: list of all children
Return type: List[TexExpr]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> soup.itemize.children[0]
\item Hello
contents
Any non-whitespace contents inside of this TeX element.
Returns: list of all nodes, tokens, and strings
Return type: List[Union[TexNode,str]]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> contents = soup.itemize.contents
>>> contents[0]
'\n    Random text!\n    '
>>> contents[1]
\item Hello
copy()
Create a copy of the current node. The copy is detached from the parse tree.
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> s = soup.section.copy()
>>> s.parent is None
True
count(name=None, **attrs)
Number of descendants matching criteria.
Parameters: name – name of the expressions to count; attrs – additional attributes to match
Returns: number of matching expressions
Return type: int
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> soup.count('section')
1
>>> soup.count('textit')
2
delete()
Delete this node from the parse tree.
Where applicable, this also removes all descendants of this node from the parse tree.
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \textit{\color{blue}{Silly}}\textit{keep me!}''')
>>> soup.textit.color.delete()
>>> soup
\textit{}\textit{keep me!}
>>> soup.textit.delete()
>>> soup
\textit{keep me!}
descendants
Returns all descendants of this TeX element.
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> descendants = list(soup.itemize.descendants)
>>> descendants[1]
\item Nested
find(name=None, **attrs)
First descendant node matching criteria.
Returns None if no matching descendant is found.
Returns: descendant node matching criteria
Return type: Union[None,TexExpr]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> soup.find('textit')
\textit{eee}
>>> soup.find('textbf')
find_all(name=None, **attrs)
Return all descendant nodes matching criteria.
Parameters: name – name, or list of names, of the expressions to match; attrs – additional attributes to match
Returns: list of all descendant nodes matching criteria
Return type: List[TexNode]
If name is a list of strings, nodes matching any name in the list are returned.
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> gen = soup.find_all('textit')
>>> gen[0]
\textit{eee}
>>> gen[1]
\textit{ooo}
>>> soup.find_all('textbf')[0]
Traceback (most recent call last):
...
IndexError: list index out of range
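The name-matching rule shared by find, find_all, and count can be sketched as a filter over descendant names. This is a simplified, hypothetical model: the helpers below compare names only, ignore **attrs, and operate on plain strings rather than nodes.

```python
from typing import Iterable, List, Optional, Union

# A search criterion: a single name or a list of acceptable names.
Name = Union[str, List[str]]


def name_matches(node_name: str, name: Name) -> bool:
    """True if `node_name` satisfies the criterion `name`."""
    if isinstance(name, str):
        return node_name == name
    return node_name in name  # a list matches any of its entries


def find_all(names: Iterable[str], name: Name) -> List[str]:
    return [n for n in names if name_matches(n, name)]


def find(names: Iterable[str], name: Name) -> Optional[str]:
    matches = find_all(names, name)
    return matches[0] if matches else None  # None when nothing matches


def count(names: Iterable[str], name: Name) -> int:
    return len(find_all(names, name))


# Names of the command descendants of: \section{Ooo} \textit{eee} \textit{ooo}
descendants = ["section", "textit", "textit"]
print(find_all(descendants, "textit"))               # ['textit', 'textit']
print(find_all(descendants, ["section", "textbf"]))  # ['section']
print(find(descendants, "textbf"))                   # None
print(count(descendants, "textit"))                  # 2
```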
insert(i, *nodes)
Add node(s) to this node’s list of children, at position i.
Parameters: i (int) – index at which to insert; nodes (TexNode) – nodes to insert
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> item = soup.item.copy()
>>> soup.item.delete()
>>> soup.itemize.insert(1, item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Bye
\end{itemize}
>>> item.parent.name == soup.itemize.name
True
name
Name of the expression. Used for search functions.
Return type: str
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.name
'textbf'
>>> soup.textbf.name = 'textit'
>>> soup.textit
\textit{Hello}
position
Position of the first character of this expression in the original source.
Note that this position is NOT updated as the parse tree is modified.
remove(node)
Remove a node from this node’s list of contents.
Parameters: node (TexExpr) – Node to remove
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> soup.itemize.remove(soup.item)
>>> soup.itemize
\begin{itemize}
    \item Bye
\end{itemize}
replace(child, *nodes)
Replace the provided child node with node(s).
Parameters: child (TexNode) – child node to replace; nodes (TexNode) – nodes to substitute in
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.itemize.replace(soup.item, bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
    \item Bye
\end{itemize}
replace_with(*nodes)
Replace this node in the parse tree with the provided node(s).
Parameters: nodes (TexNode) – List of nodes to substitute in
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.item.replace_with(bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
    \item Bye
\end{itemize}
string
This property is valid if and only if
- the expression is a TexCmd AND has only one argument, OR
- the expression is a TexEnv AND has only one TexText child.
Return type: Union[None,str]
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.string
'Hello'
>>> soup.textbf.string = 'Hello World'
>>> soup.textbf.string
'Hello World'
>>> soup.textbf
\textbf{Hello World}
>>> soup = TexSoup(r'''\begin{equation}1+1\end{equation}''')
>>> soup.equation.string
'1+1'
>>> soup.equation.string = '2+2'
>>> soup.equation.string
'2+2'
text
All text in descendant nodes.
This is equivalent to contents, keeping text elements and excluding TeX expressions.
>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> soup.text[0]
' Nested\n    '
Expressions
class TexSoup.data.TexExpr
General abstraction for a TeX expression.
An expression may be a command or an environment and is identified by a name, arguments, and its place in the parse tree. This is an abstract class and is not typically instantiated directly.
all
Returns all content in this expression, including whitespace. This includes all LaTeX needed to reconstruct the original source.
>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr1.all) == list(expr2.all)
True
append(*exprs)
Add contents to the expression.
Parameters: exprs (Union[TexExpr,str]) – List of contents to add
>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.append('world')
>>> expr
TexExpr('textbf', ['hello', 'world'])
contents
Returns all contents in this expression.
Optionally includes whitespace, if whitespace preservation was requested when the node was created.
>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> list(expr1.contents)
['hi']
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr2.contents)
['\n', 'hi']
>>> expr = TexExpr('textbf', ('\n', 'hi'))
>>> expr.contents = ('hehe', '👻')
>>> list(expr.contents)
['hehe', '👻']
>>> expr.contents = 35
Traceback (most recent call last):
...
TypeError: ...
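The effect of preserve_whitespace on contents can be modelled with a small filter. This is a minimal sketch, assuming (as the examples suggest) that contents drops whitespace-only strings unless preservation was requested; visible_contents is a hypothetical helper, not TexSoup code.

```python
from typing import Iterable, List


def visible_contents(items: Iterable[object], preserve_whitespace: bool = False) -> List[object]:
    """Drop whitespace-only strings unless whitespace is preserved."""
    if preserve_whitespace:
        return list(items)
    # Non-string items (e.g. nested expressions) are always kept.
    return [item for item in items if not (isinstance(item, str) and item.isspace())]


print(visible_contents(("\n", "hi")))                            # ['hi']
print(visible_contents(("\n", "hi"), preserve_whitespace=True))  # ['\n', 'hi']
```

Under this model, all corresponds to the unfiltered branch, which is consistent with all returning the same list for both expressions in the all example.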
insert(i, *exprs)
Insert content at the specified position into the expression.
Parameters: i (int) – index at which to insert; exprs (Union[TexExpr,str]) – content to insert
>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.insert(0, 'world')
>>> expr
TexExpr('textbf', ['world', 'hello'])
>>> expr.insert(0, TexText('asdf'))
>>> expr
TexExpr('textbf', ['asdf', 'world', 'hello'])
remove(expr)
Remove a provided expression from its list of contents.
Parameters: expr (Union[TexExpr,str]) – Content to remove
Returns: index of the expression removed
Return type: int
>>> expr = TexExpr('textbf', ('hello',))
>>> expr.remove('hello')
0
>>> expr
TexExpr('textbf', [])
string
All contents, stringified. This is a convenience property.
>>> expr = TexExpr('hello', ['naw'])
>>> expr.string
'naw'
>>> expr.string = 'huehue'
>>> expr.string
'huehue'
>>> type(expr.string)
<class 'TexSoup.data.TexText'>
>>> str(expr)
"TexExpr('hello', ['huehue'])"
>>> expr.string = 35
Traceback (most recent call last):
...
TypeError: ...
class TexSoup.data.TexEnv
Abstraction for a LaTeX environment with starting and ending markers. Contains three attributes:
- a human-readable environment name,
- the environment delimiters, and
- the environment’s contents.
>>> t = TexEnv('displaymath', r'\[', r'\]',
...     ['\\mathcal{M} \\circ \\mathcal{A}'])
>>> t
TexEnv('displaymath', ['\\mathcal{M} \\circ \\mathcal{A}'], [])
>>> print(t)
\[\mathcal{M} \circ \mathcal{A}\]
>>> len(list(t.children))
0
class TexSoup.data.TexCmd
Abstraction for a LaTeX command. Contains two attributes:
- the command name itself and
- the command arguments, whether optional or required.
>>> textit = TexCmd('textit', args=[BraceGroup('slant')])
>>> t = TexCmd('textbf', args=[BraceGroup('big ', textit, '.')])
>>> t
TexCmd('textbf', [BraceGroup('big ', TexCmd('textit', [BraceGroup('slant')]), '.')])
>>> print(t)
\textbf{big \textit{slant}.}
>>> children = list(map(str, t.children))
>>> len(children)
1
>>> print(children[0])
\textit{slant}
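To illustrate how a command's name and arguments combine back into LaTeX source, here is a hypothetical render_cmd helper (not part of TexSoup). It assumes each argument is a (kind, text) pair and wraps required arguments in braces and optional ones in brackets, matching how BraceGroup and BracketGroup print in the examples.

```python
from typing import List, Tuple

# Hypothetical argument representation: (kind, text), kind is "required" or "optional".
Arg = Tuple[str, str]

DELIMITERS = {"required": ("{", "}"), "optional": ("[", "]")}


def render_cmd(name: str, args: List[Arg]) -> str:
    """Serialize a command name and its arguments back to LaTeX source."""
    parts = ["\\" + name]
    for kind, text in args:
        open_, close = DELIMITERS[kind]
        parts.append(open_ + text + close)
    return "".join(parts)


inner = render_cmd("textit", [("required", "slant")])
print(render_cmd("textbf", [("required", "big " + inner + ".")]))
# \textbf{big \textit{slant}.}
print(render_cmd("newcommand", [("required", "reverseconcat"),
                                ("optional", "3"),
                                ("required", "#3#2#1")]))
# \newcommand{reverseconcat}[3]{#3#2#1}
```

Keeping the delimiters in a lookup table means supporting another bracket style only requires a new table entry.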
Groups
class TexSoup.data.TexGroup
Abstraction for a LaTeX environment with single-character delimiters.
Used primarily to identify and associate arguments with commands.
classmethod parse(s)
Parse a string or list and return an Argument object.
Naive implementation; does not parse expressions in the provided string.
Parameters: s (Union[str,iterable]) – Either a string or a list, where the first and last elements are valid argument delimiters.
>>> TexGroup.parse('[arg0]')
BracketGroup('arg0')
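The naive parsing described above amounts to inspecting the first and last characters and keeping everything in between unparsed. A minimal sketch under that assumption; parse_group and its (type name, body) return value are hypothetical, chosen to mirror BracketGroup('arg0') in the example, and only the string form of s is handled.

```python
from typing import Tuple

# Map each opening delimiter to (closing delimiter, group type name).
GROUP_TYPES = {"[": ("]", "BracketGroup"), "{": ("}", "BraceGroup")}


def parse_group(s: str) -> Tuple[str, str]:
    """Naively split '[arg0]' into ('BracketGroup', 'arg0') without parsing the body."""
    if len(s) < 2 or s[0] not in GROUP_TYPES:
        raise ValueError(f"not a delimited argument: {s!r}")
    close, type_name = GROUP_TYPES[s[0]]
    if s[-1] != close:
        raise ValueError(f"mismatched delimiters in {s!r}")
    # The body is returned verbatim; nested commands are not parsed.
    return type_name, s[1:-1]


print(parse_group("[arg0]"))  # ('BracketGroup', 'arg0')
print(parse_group("{x}"))     # ('BraceGroup', 'x')
```

Because only the outermost characters are checked, an input such as '{a}{b}' would yield the body 'a}{b'; that is the trade-off a naive implementation accepts.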
class TexSoup.data.TexArgs
List of arguments for a TeX expression. Supports all standard list operations.
Additional support for conversion from and to unparsed argument strings.
>>> arguments = TexArgs(['\n', BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments
[BraceGroup('arg0'), BracketGroup('arg1'), BraceGroup('arg2')]
>>> arguments.all
['\n', BraceGroup('arg0'), BracketGroup('arg1'), BraceGroup('arg2')]
>>> arguments[2]
BraceGroup('arg2')
>>> len(arguments)
3
>>> arguments[:2]
[BraceGroup('arg0'), BracketGroup('arg1')]
>>> isinstance(arguments[:2], TexArgs)
True
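The split between the visible argument list and .all can be sketched with two parallel lists: whitespace is recorded only in .all, while real arguments go into both. ArgList below is a hypothetical simplification of that bookkeeping; its arguments are plain strings, and unparsed strings are not converted into group objects.

```python
from typing import List


class ArgList:
    """Hypothetical model of TexArgs' bookkeeping: `args` hides whitespace, `all` keeps it."""

    def __init__(self) -> None:
        self.args: List[str] = []
        self.all: List[str] = []

    def append(self, item: str) -> None:
        self.all.append(item)
        if not item.isspace():  # whitespace is tracked only in `all`
            self.args.append(item)

    def clear(self) -> None:
        # Both lists must be cleared together to stay consistent.
        self.args.clear()
        self.all.clear()

    def __len__(self) -> int:
        return len(self.args)


arguments = ArgList()
for item in ["\n", "{arg0}", "[arg1]", "{arg2}"]:
    arguments.append(item)
print(len(arguments), len(arguments.all))  # 3 4
arguments.clear()
print(len(arguments), len(arguments.all))  # 0 0
```

Keeping whitespace in a separate list lets len() and indexing behave as if whitespace were absent, while the full sequence remains available for reconstructing the source.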
append(arg)
Append whitespace, an unparsed argument string, or an argument object.
Parameters: arg (TexGroup) – argument to add to the end of the list
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments.append('[arg3]')
>>> arguments[3]
BracketGroup('arg3')
>>> arguments.append(BraceGroup('arg4'))
>>> arguments[4]
BraceGroup('arg4')
>>> len(arguments)
5
>>> arguments.append('\n')
>>> len(arguments)
5
>>> len(arguments.all)
6
clear()
Clear both the list and the proxy .all.
>>> args = TexArgs(['\n', BraceGroup('arg1'), BracketGroup('arg2')])
>>> args.clear()
>>> len(args) == len(args.all) == 0
True
extend(args)
Extend the list with a mixture of unparsed argument strings, argument objects, and whitespace.
Parameters: args (List[TexGroup]) – Arguments to add to the end of the list
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments.extend(['[arg3]', BraceGroup('arg4'), '\t'])
>>> len(arguments)
5
>>> arguments[4]
BraceGroup('arg4')
insert(i, arg)
Insert whitespace, an unparsed argument string, or an argument object.
Parameters: i (int) – index at which to insert; arg – whitespace, an unparsed argument string, or an argument object to insert
>>> arguments = TexArgs(['\n', BraceGroup('arg0'), '[arg2]'])
>>> arguments.insert(1, '[arg1]')
>>> len(arguments)
3
>>> arguments
[BraceGroup('arg0'), BracketGroup('arg1'), BracketGroup('arg2')]
>>> arguments.all
['\n', BraceGroup('arg0'), BracketGroup('arg1'), BracketGroup('arg2')]
>>> arguments.insert(10, '[arg3]')
>>> arguments[3]
BracketGroup('arg3')
pop(i)
Pop the argument object at the provided index.
Parameters: i (int) – Index to pop from the list
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg2]', '{arg3}'])
>>> arguments.pop(1)
BracketGroup('arg2')
>>> len(arguments)
2
>>> arguments[0]
BraceGroup('arg0')
remove(item)
Remove either an unparsed argument string or an argument object.
Parameters: item (Union[str,TexGroup]) – Item to remove
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg2]', '{arg3}'])
>>> arguments.remove('{arg0}')
>>> len(arguments)
2
>>> arguments[0]
BracketGroup('arg2')
>>> arguments.remove(arguments[0])
>>> arguments[0]
BraceGroup('arg3')
>>> arguments.remove(BraceGroup('arg3'))
>>> len(arguments)
0
>>> arguments = TexArgs([
...     BraceGroup(TexCmd('color')),
...     BraceGroup(TexCmd('color', [BraceGroup('blue')]))
... ])
>>> arguments.remove(arguments[0])
>>> len(arguments)
1
>>> arguments.remove(arguments[0])
>>> len(arguments)
0
Environments
class TexSoup.data.TexNamedEnv
Abstraction for a LaTeX environment, denoted by \begin{env} and \end{env}. Contains three attributes:
- the environment name itself,
- the environment arguments, whether optional or required, and
- the environment’s contents.
Warning: Setting TexNamedEnv.begin or TexNamedEnv.end has no effect; the begin and end tokens are always constructed from TexNamedEnv.name.
>>> t = TexNamedEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'],
...     [BraceGroup('c | c c')])
>>> t
TexNamedEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'], [BraceGroup('c | c c')])
>>> print(t)
\begin{tabular}{c | c c}
0 & 0 & * \\
1 & 1 & * \\
\end{tabular}
>>> len(list(t.children))
0
>>> t = TexNamedEnv('equation', [r'5\sum_{i=0}^n i^2'])
>>> str(t)
'\\begin{equation}5\\sum_{i=0}^n i^2\\end{equation}'
>>> t.name = 'eqn'
>>> str(t)
'\\begin{eqn}5\\sum_{i=0}^n i^2\\end{eqn}'
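The warning about begin and end follows naturally if those tokens are computed from name on every access rather than stored. Below is a minimal sketch of that design using a hypothetical NamedEnv class, not TexSoup's implementation. One difference: in this sketch, assigning to begin raises AttributeError (a read-only property) instead of being silently ignored.

```python
class NamedEnv:
    """Hypothetical environment whose delimiters are derived from its name."""

    def __init__(self, name: str, body: str) -> None:
        self.name = name
        self.body = body

    @property
    def begin(self) -> str:
        # Recomputed on each access, so renaming the environment updates it.
        return "\\begin{" + self.name + "}"

    @property
    def end(self) -> str:
        return "\\end{" + self.name + "}"

    def __str__(self) -> str:
        return self.begin + self.body + self.end


env = NamedEnv("equation", r"5\sum_{i=0}^n i^2")
print(env)  # \begin{equation}5\sum_{i=0}^n i^2\end{equation}
env.name = "eqn"
print(env)  # \begin{eqn}5\sum_{i=0}^n i^2\end{eqn}
```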
Text
class TexSoup.data.TexText
Abstraction for LaTeX text.
Representing regular text objects in the parse tree allows users to search and modify text objects just as they would any other expression.
>>> obj = TexNode(TexText('asdf gg'))
>>> 'asdf' in obj
True
>>> 'err' in obj
False
>>> TexText('df ').strip()
'df'