Data Structures
TexSoup transforms a LaTeX document into a complex tree of various Python objects, but all objects fall into one of the following three categories: TexNode, TexExpr (environments and commands), and TexGroup.
Node
class TexSoup.data.TexNode
A tree node representing an expression in the LaTeX document.

Every node in the parse tree is a TexNode, equipped with navigation, search, and modification utilities. To navigate the parse tree, use abstractions such as children and descendants. To access content in the parse tree, use abstractions such as contents, text, string, and args.

Note that the LaTeX parse tree is largely shallow: only environments such as itemize or enumerate have children and thus descendants. Typical LaTeX expressions such as \section have arguments but not children.
all
Returns all content in this node, including whitespace. This includes all LaTeX needed to reconstruct the original source.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \newcommand{reverseconcat}[3]{#3#2#1}
... ''')
>>> alls = soup.all
>>> alls[0]
<BLANKLINE>
<BLANKLINE>
>>> alls[1]
\newcommand{reverseconcat}[3]{#3#2#1}
append(*nodes)
Add node(s) to this node's list of children. Commands such as \section have no children, so appending to them raises a TypeError.

Parameters: nodes (TexNode) – List of nodes to add

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
... \end{itemize}
... \section{Hey}
... \textit{Willy}''')
>>> soup.section
\section{Hey}
>>> soup.section.append(soup.textit)
Traceback (most recent call last):
...
TypeError: ...
>>> soup.section
\section{Hey}
>>> soup.itemize.append('    ', soup.item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Hello
\end{itemize}
args
Arguments for this node. Note that this property is settable.

Return type: TexArgs

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\newcommand{reverseconcat}[3]{#3#2#1}''')
>>> soup.newcommand.args
[BraceGroup('reverseconcat'), BracketGroup('3'), BraceGroup('#3#2#1')]
>>> soup.newcommand.args = soup.newcommand.args[:2]
>>> soup.newcommand
\newcommand{reverseconcat}[3]
char_pos_to_line(char_pos)
Map a character position in the original string to a (line, column) position.

Parameters: char_pos (int) – Character position in the original string
Returns: (line number, index of character in line), both counted from 0
Return type: Tuple[int, int]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textbf{Silly}
... \textit{Willy}''')
>>> soup.char_pos_to_line(10)
(1, 9)
>>> soup.char_pos_to_line(20)
(2, 5)
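The mapping can be computed by recording the offset at which each line starts and binary-searching those offsets. The sketch below is a standalone illustration of that idea, not TexSoup's implementation; `line_starts` and `char_pos_to_line_sketch` are hypothetical names. It reproduces the two results in the example above.

```python
from bisect import bisect_right
from typing import List, Tuple


def line_starts(source: str) -> List[int]:
    """Return the character offset at which each line of `source` begins."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def char_pos_to_line_sketch(source: str, char_pos: int) -> Tuple[int, int]:
    """Map a character offset to a 0-based (line, column) pair.

    Hypothetical helper illustrating the idea; not TexSoup's own code.
    """
    starts = line_starts(source)
    # The containing line is the last one that starts at or before char_pos.
    line = bisect_right(starts, char_pos) - 1
    return line, char_pos - starts[line]


source = "\n\\section{Hey}\n\\textbf{Silly}\n\\textit{Willy}"
print(char_pos_to_line_sketch(source, 10))  # (1, 9)
print(char_pos_to_line_sketch(source, 20))  # (2, 5)
```

Because `line_starts` scans the whole string, a caller answering many queries would compute it once and reuse it, making each lookup logarithmic in the number of lines.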
children
Immediate children of this TeX element that are valid TeX objects.

This is equivalent to contents, excluding text elements and keeping only TeX expressions.

Returns: list of all children
Return type: List[TexExpr]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> soup.itemize.children[0]
\item Hello
contents
Any non-whitespace contents inside of this TeX element.

Returns: list of all nodes, tokens, and strings
Return type: List[Union[TexNode,str]]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     Random text!
...     \item Hello
... \end{itemize}''')
>>> contents = soup.itemize.contents
>>> contents[0]
'\n    Random text!\n    '
>>> contents[1]
\item Hello
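The relationship between contents and children can be modeled with plain Python lists. This is a hypothetical sketch, not TexSoup's code: `Expr`, `contents_of`, and `children_of` are illustrative names, and the sketch assumes that the whitespace contents omits is exactly the whitespace-only strings.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Expr:
    """Hypothetical stand-in for a parsed TeX expression."""
    name: str
    contents: list = field(default_factory=list)


def contents_of(items: List[Union[Expr, str]]) -> List[Union[Expr, str]]:
    """Keep expressions and text, dropping whitespace-only strings."""
    return [c for c in items if not (isinstance(c, str) and c.isspace())]


def children_of(items: List[Union[Expr, str]]) -> List[Expr]:
    """Keep only expressions: the contents minus any text."""
    return [c for c in contents_of(items) if isinstance(c, Expr)]


# Raw items of an itemize environment like the one in the example above.
itemize = ["\n    Random text!\n    ", Expr("item", [" Hello\n"]), "\n"]
print(len(contents_of(itemize)))               # 2
print([c.name for c in children_of(itemize)])  # ['item']
```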
copy()
Create a copy of the current node. The copy is detached from the parse tree, so its parent is None.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> s = soup.section.copy()
>>> s.parent is None
True
count(name=None, **attrs)
Number of descendants matching criteria.

Parameters:
name (Union[None,str]) – name of the LaTeX expression to match
attrs – additional keyword criteria for matching
Returns: number of matching expressions
Return type: int

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Hey}
... \textit{Silly}
... \textit{Willy}''')
>>> soup.count('section')
1
>>> soup.count('textit')
2
delete()
Delete this node from the parse tree.

Where applicable, this also removes all descendants of this node from the parse tree.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \textit{\color{blue}{Silly}}\textit{keep me!}''')
>>> soup.textit.color.delete()
>>> soup
<BLANKLINE>
\textit{}\textit{keep me!}
>>> soup.textit.delete()
>>> soup
<BLANKLINE>
\textit{keep me!}
descendants
Returns all descendants of this TeX element.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> descendants = list(soup.itemize.descendants)
>>> descendants[1]
\item Nested
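Descendants can be produced by a depth-first, pre-order walk in which each child is yielded before its own descendants. The sketch below uses a hypothetical minimal `Node` class that models only expression nodes (text is ignored); it illustrates the traversal order and is not TexSoup's implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical expression node; only environments carry children."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def descendants(self) -> Iterator["Node"]:
        """Yield every node below this one in depth-first pre-order."""
        for child in self.children:
            yield child
            yield from child.descendants()


# Mirrors the nested itemize example above, without the text nodes.
outer = Node("itemize", [Node("itemize", [Node("item")])])
print([n.name for n in outer.descendants()])  # ['itemize', 'item']
```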
find(name=None, **attrs)
First descendant node matching criteria.

Returns None if no matching descendant node is found.

Returns: descendant node matching criteria
Return type: Union[None,TexExpr]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> soup.find('textit')
\textit{eee}
>>> soup.find('textbf')
find_all(name=None, **attrs)
Return all descendant nodes matching criteria.

Parameters:
name (Union[None,str,List[str]]) – name of the LaTeX expression to match; if name is a list of strings, nodes matching any of those names are returned
attrs – additional keyword criteria for matching
Returns: all descendant nodes matching criteria
Return type: List[TexNode]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \section{Ooo}
... \textit{eee}
... \textit{ooo}''')
>>> found = soup.find_all('textit')
>>> found[0]
\textit{eee}
>>> found[1]
\textit{ooo}
>>> soup.find_all('textbf')[0]
Traceback (most recent call last):
...
IndexError: list index out of range
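find, find_all, and count can all be built on one traversal that filters descendants by name. The sketch below is hypothetical: `Node` and its methods are illustrative, and it matches on names only, ignoring the **attrs criteria that TexSoup also accepts. A list of names matches a node whose name is any one of them.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Sequence, Union

Names = Union[str, Sequence[str]]


@dataclass
class Node:
    """Hypothetical parse-tree node used to illustrate name-based search."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def descendants(self) -> Iterator["Node"]:
        for child in self.children:
            yield child
            yield from child.descendants()

    def _matches(self, name: Names) -> Iterator["Node"]:
        # A single name, or a list of names where any one may match.
        names = {name} if isinstance(name, str) else set(name)
        return (n for n in self.descendants() if n.name in names)

    def find_all(self, name: Names) -> List["Node"]:
        return list(self._matches(name))

    def find(self, name: Names) -> Optional["Node"]:
        # Stops at the first match instead of building the full list.
        return next(self._matches(name), None)

    def count(self, name: Names) -> int:
        return sum(1 for _ in self._matches(name))


doc = Node("root", [Node("section"), Node("textit"), Node("textit")])
print(doc.count("textit"))                       # 2
print(doc.find("textbf"))                        # None
print(len(doc.find_all(["section", "textit"])))  # 3
```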
insert(i, *nodes)
Add node(s) to this node's list of children, at position i.

Parameters:
i (int) – position at which to insert
nodes (TexNode) – List of nodes to insert

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> item = soup.item.copy()
>>> soup.item.delete()
>>> soup.itemize.insert(1, item)
>>> soup.itemize
\begin{itemize}
    \item Hello
    \item Bye
\end{itemize}
>>> item.parent.name == soup.itemize.name
True
name
Name of the expression. Used for search functions. This property is settable.

Return type: str

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.name
'textbf'
>>> soup.textbf.name = 'textit'
>>> soup.textit
\textit{Hello}
position
Position of the first character of the expression in the original source.

Note that this position is NOT updated as the parse tree is modified.
remove(node)
Remove a node from this node's list of contents.

Parameters: node (TexExpr) – Node to remove

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> soup.itemize.remove(soup.item)
>>> soup.itemize
\begin{itemize}
    \item Bye
\end{itemize}
replace(child, *nodes)
Replace the provided child node with node(s).

Parameters:
child (TexNode) – child node to replace
nodes (TexNode) – List of nodes to substitute in

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.itemize.replace(soup.item, bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
    \item Bye
\end{itemize}
replace_with(*nodes)
Replace this node in the parse tree with the provided node(s).

Parameters: nodes (TexNode) – List of nodes to substitute in

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \item Hello
...     \item Bye
... \end{itemize}''')
>>> items = list(soup.find_all('item'))
>>> bye = items[1]
>>> soup.item.replace_with(bye)
>>> soup.itemize
\begin{itemize}
    \item Bye
    \item Bye
\end{itemize}
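replace_with can be understood as asking the node's parent to replace it in the parent's list of children. The sketch below models that relationship with a hypothetical `Node` class carrying a parent pointer; it is an illustration under that assumption, not TexSoup's code.

```python
from typing import List, Optional


class Node:
    """Hypothetical tree node with a parent pointer."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None):
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []
        for child in children or []:
            self.append(child)

    def append(self, *nodes: "Node") -> None:
        for node in nodes:
            node.parent = self
            self.children.append(node)

    def replace(self, child: "Node", *nodes: "Node") -> None:
        """Swap `child` for `nodes` at the same position in the list."""
        i = self.children.index(child)
        child.parent = None  # detach first, in case child is also among nodes
        for node in nodes:
            node.parent = self
        self.children[i:i + 1] = list(nodes)

    def replace_with(self, *nodes: "Node") -> None:
        """Replace this node, within its parent's children, with `nodes`."""
        if self.parent is None:
            raise ValueError("node is not attached to a parse tree")
        self.parent.replace(self, *nodes)


itemize = Node("itemize", [Node("hello"), Node("bye")])
hello = itemize.children[0]
hello.replace_with(Node("hi"), Node("hey"))
print([c.name for c in itemize.children])  # ['hi', 'hey', 'bye']
print(hello.parent is None)                # True
```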
string
The string content of this node. This is valid if and only if

- the expression is a TexCmd AND has only one argument, OR
- the expression is a TexEnv AND has only one TexText child.

This property is settable.

Return type: Union[None,str]

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''\textbf{Hello}''')
>>> soup.textbf.string
'Hello'
>>> soup.textbf.string = 'Hello World'
>>> soup.textbf.string
'Hello World'
>>> soup.textbf
\textbf{Hello World}
>>> soup = TexSoup(r'''\begin{equation}1+1\end{equation}''')
>>> soup.equation.string
'1+1'
>>> soup.equation.string = '2+2'
>>> soup.equation.string
'2+2'
text
All text in descendant nodes.

This is equivalent to contents, keeping text elements and excluding TeX expressions.

>>> from TexSoup import TexSoup
>>> soup = TexSoup(r'''
... \begin{itemize}
...     \begin{itemize}
...         \item Nested
...     \end{itemize}
... \end{itemize}''')
>>> soup.text[0]
' Nested\n    '
Expressions
class TexSoup.data.TexExpr
General abstraction for a TeX expression.

An expression may be a command or an environment and is identified by a name, arguments, and place in the parse tree. This is an abstract class and is not normally instantiated directly.
all
Returns all content in this expression, including whitespace. This includes all LaTeX needed to reconstruct the original source.

>>> from TexSoup.data import TexExpr
>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr1.all) == list(expr2.all)
True
append(*exprs)
Add contents to the expression.

Parameters: exprs (Union[TexExpr,str]) – List of contents to add

>>> from TexSoup.data import TexExpr
>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.append('world')
>>> expr
TexExpr('textbf', ['hello', 'world'])
contents
Returns all contents in this expression.

Whitespace is included only if preserve_whitespace=True was set when the expression was created. This property is settable; assigning a non-iterable value raises a TypeError.

>>> from TexSoup.data import TexExpr
>>> expr1 = TexExpr('textbf', ('\n', 'hi'))
>>> list(expr1.contents)
['hi']
>>> expr2 = TexExpr('textbf', ('\n', 'hi'), preserve_whitespace=True)
>>> list(expr2.contents)
['\n', 'hi']
>>> expr = TexExpr('textbf', ('\n', 'hi'))
>>> expr.contents = ('hehe', '👻')
>>> list(expr.contents)
['hehe', '👻']
>>> expr.contents = 35
Traceback (most recent call last):
...
TypeError: ...
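The whitespace filtering and the setter's type check can be sketched as a getter/setter property pair. `Expr` here is a hypothetical class, not TexSoup's TexExpr, and treating only whitespace-only strings as whitespace is an assumption of the sketch.

```python
from collections.abc import Iterable
from typing import Iterator, List


class Expr:
    """Hypothetical expression that stores raw contents, whitespace included."""

    def __init__(self, name: str, contents: Iterable = (),
                 preserve_whitespace: bool = False):
        self.name = name
        self.preserve_whitespace = preserve_whitespace
        self._contents: List[object] = list(contents)

    @property
    def contents(self) -> Iterator[object]:
        for item in self._contents:
            # Whitespace-only strings are hidden unless preservation was requested.
            if (not self.preserve_whitespace and isinstance(item, str)
                    and item.isspace()):
                continue
            yield item

    @contents.setter
    def contents(self, value: Iterable) -> None:
        if not isinstance(value, Iterable):
            raise TypeError(f"contents must be iterable, not {type(value).__name__}")
        self._contents = list(value)


print(list(Expr("textbf", ("\n", "hi")).contents))  # ['hi']
print(list(Expr("textbf", ("\n", "hi"), preserve_whitespace=True).contents))  # ['\n', 'hi']
```

Storing everything and filtering on read keeps the original source recoverable, which is what an all-style accessor needs.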
insert(i, *exprs)
Insert content at the specified position into the expression.

Parameters:
i (int) – position at which to insert
exprs (Union[TexExpr,str]) – List of contents to insert

>>> from TexSoup.data import TexExpr, TexText
>>> expr = TexExpr('textbf', ('hello',))
>>> expr
TexExpr('textbf', ['hello'])
>>> expr.insert(0, 'world')
>>> expr
TexExpr('textbf', ['world', 'hello'])
>>> expr.insert(0, TexText('asdf'))
>>> expr
TexExpr('textbf', ['asdf', 'world', 'hello'])
remove(expr)
Remove a provided expression from its list of contents.

Parameters: expr (Union[TexExpr,str]) – Content to remove
Returns: index of the expression removed
Return type: int

>>> from TexSoup.data import TexExpr
>>> expr = TexExpr('textbf', ('hello',))
>>> expr.remove('hello')
0
>>> expr
TexExpr('textbf', [])
string
All contents, stringified. A convenience property; it is settable.

>>> from TexSoup.data import TexExpr
>>> expr = TexExpr('hello', ['naw'])
>>> expr.string
'naw'
>>> expr.string = 'huehue'
>>> expr.string
'huehue'
>>> type(expr.string)
<class 'TexSoup.data.TexText'>
>>> str(expr)
"TexExpr('hello', ['huehue'])"
>>> expr.string = 35
Traceback (most recent call last):
...
TypeError: ...
class TexSoup.data.TexEnv
Abstraction for a LaTeX environment with starting and ending markers. Contains three attributes:

- a human-readable environment name,
- the environment delimiters, and
- the environment's contents.

>>> from TexSoup.data import TexEnv
>>> t = TexEnv('displaymath', r'\[', r'\]',
...     ['\\mathcal{M} \\circ \\mathcal{A}'])
>>> t
TexEnv('displaymath', ['\\mathcal{M} \\circ \\mathcal{A}'], [])
>>> print(t)
\[\mathcal{M} \circ \mathcal{A}\]
>>> len(list(t.children))
0
class TexSoup.data.TexCmd
Abstraction for a LaTeX command. Contains two attributes:

- the command name itself and
- the command arguments, whether optional or required.

>>> from TexSoup.data import TexCmd, BraceGroup
>>> textit = TexCmd('textit', args=[BraceGroup('slant')])
>>> t = TexCmd('textbf', args=[BraceGroup('big ', textit, '.')])
>>> t
TexCmd('textbf', [BraceGroup('big ', TexCmd('textit', [BraceGroup('slant')]), '.')])
>>> print(t)
\textbf{big \textit{slant}.}
>>> children = list(map(str, t.children))
>>> len(children)
1
>>> print(children[0])
\textit{slant}
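How a command's name and argument groups combine into LaTeX source can be sketched with two small dataclasses. `Group`, `Cmd`, `brace`, and `bracket` are hypothetical names used only for this illustration, not TexSoup classes; the sketch reproduces the printed output in the example above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Group:
    """Hypothetical argument group: delimiters around mixed contents."""
    begin: str
    end: str
    contents: List[Union["Cmd", str]]

    def __str__(self) -> str:
        return self.begin + "".join(map(str, self.contents)) + self.end


@dataclass
class Cmd:
    """Hypothetical command: a backslash, a name, then argument groups."""
    name: str
    args: List[Group]

    def __str__(self) -> str:
        return "\\" + self.name + "".join(map(str, self.args))


def brace(*contents: Union[Cmd, str]) -> Group:
    """Required argument, delimited by braces."""
    return Group("{", "}", list(contents))


def bracket(*contents: Union[Cmd, str]) -> Group:
    """Optional argument, delimited by brackets."""
    return Group("[", "]", list(contents))


textit = Cmd("textit", [brace("slant")])
t = Cmd("textbf", [brace("big ", textit, ".")])
print(t)  # \textbf{big \textit{slant}.}
```

Rendering recursively through `str` lets a command nested inside an argument group print itself without any special casing.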
Groups
class TexSoup.data.TexGroup
Abstraction for a LaTeX environment with single-character delimiters.

Used primarily to identify and associate arguments with commands.
classmethod parse(s)
Parse a string or list and return the corresponding argument object.

This is a naive implementation: it does not parse expressions inside the provided string.

Parameters: s (Union[str,iterable]) – Either a string or a list, where the first and last elements are valid argument delimiters.

>>> from TexSoup.data import TexGroup
>>> TexGroup.parse('[arg0]')
BracketGroup('arg0')
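A naive parse of this kind only needs to inspect the first and last characters. The sketch below uses a hypothetical function `parse_group` and a hypothetical `DELIMITERS` table; unlike the method above, it returns a (group type, inner text) pair instead of a group object and handles strings only.

```python
from typing import Tuple

# Hypothetical table mapping (opening, closing) delimiters to a group type.
DELIMITERS = {("{", "}"): "BraceGroup", ("[", "]"): "BracketGroup"}


def parse_group(s: str) -> Tuple[str, str]:
    """Naively split an argument string into (group type, inner text).

    Nested expressions inside the string are left unparsed.
    """
    if len(s) < 2:
        raise ValueError(f"too short to be a delimited argument: {s!r}")
    key = (s[0], s[-1])
    if key not in DELIMITERS:
        raise ValueError(f"unrecognized delimiters in {s!r}")
    return DELIMITERS[key], s[1:-1]


print(parse_group("[arg0]"))  # ('BracketGroup', 'arg0')
```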
class TexSoup.data.TexArgs
List of arguments for a TeX expression. Supports all standard list operations.

Additional support for conversion from and to unparsed argument strings. Whitespace is excluded from the list itself but kept in the proxy .all.

>>> from TexSoup.data import TexArgs, BraceGroup
>>> arguments = TexArgs(['\n', BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments
[BraceGroup('arg0'), BracketGroup('arg1'), BraceGroup('arg2')]
>>> arguments.all
['\n', BraceGroup('arg0'), BracketGroup('arg1'), BraceGroup('arg2')]
>>> arguments[2]
BraceGroup('arg2')
>>> len(arguments)
3
>>> arguments[:2]
[BraceGroup('arg0'), BracketGroup('arg1')]
>>> isinstance(arguments[:2], TexArgs)
True
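The split between the argument list and its .all proxy can be sketched by keeping two lists in step. `ArgList`, `PAIRS`, and the tuple representation of parsed arguments are hypothetical choices for this illustration, not TexSoup's internals.

```python
from typing import List, Tuple, Union

Arg = Tuple[str, str]   # hypothetical (group type, inner text) pair
Item = Union[Arg, str]  # a parsed argument or a whitespace string

# Hypothetical table mapping (opening, closing) delimiters to a group type.
PAIRS = {("{", "}"): "BraceGroup", ("[", "]"): "BracketGroup"}


class ArgList:
    """Hypothetical argument list with an `all` view that keeps whitespace."""

    def __init__(self, items=()):
        self.args: List[Arg] = []  # arguments only
        self.all: List[Item] = []  # arguments plus whitespace, in source order
        for item in items:
            self.append(item)

    @staticmethod
    def _coerce(item: Item) -> Item:
        if not isinstance(item, str) or item.isspace():
            return item  # already parsed, or whitespace kept verbatim
        kind = PAIRS.get((item[0], item[-1])) if len(item) >= 2 else None
        if kind is None:
            raise ValueError(f"not an argument string: {item!r}")
        return (kind, item[1:-1])

    def append(self, item: Item) -> None:
        item = self._coerce(item)
        self.all.append(item)
        if not isinstance(item, str):
            self.args.append(item)

    def __len__(self) -> int:
        return len(self.args)

    def __getitem__(self, i):
        return self.args[i]


arguments = ArgList([("BraceGroup", "arg0"), "[arg1]", "{arg2}"])
arguments.append("[arg3]")
arguments.append("\n")
print(len(arguments), len(arguments.all))  # 4 5
print(arguments[3])                        # ('BracketGroup', 'arg3')
```

Keeping two lists makes indexing and len cheap; the cost is that every mutation has to update both views consistently.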
append(arg)
Append whitespace, an unparsed argument string, or an argument object.

Parameters: arg (Union[str,TexGroup]) – argument to add to the end of the list

>>> from TexSoup.data import TexArgs, BraceGroup
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments.append('[arg3]')
>>> arguments[3]
BracketGroup('arg3')
>>> arguments.append(BraceGroup('arg4'))
>>> arguments[4]
BraceGroup('arg4')
>>> len(arguments)
5
>>> arguments.append('\n')
>>> len(arguments)
5
>>> len(arguments.all)
6
clear()
Clear both the list and the proxy .all.

>>> from TexSoup.data import TexArgs, BraceGroup, BracketGroup
>>> args = TexArgs(['\n', BraceGroup('arg1'), BracketGroup('arg2')])
>>> args.clear()
>>> len(args) == len(args.all) == 0
True
extend(args)
Extend the list with a mixture of unparsed argument strings, argument objects, and whitespace.

Parameters: args (List[Union[str,TexGroup]]) – Arguments to add to the end of the list

>>> from TexSoup.data import TexArgs, BraceGroup
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg1]', '{arg2}'])
>>> arguments.extend(['[arg3]', BraceGroup('arg4'), '\t'])
>>> len(arguments)
5
>>> arguments[4]
BraceGroup('arg4')
insert(i, arg)
Insert whitespace, an unparsed argument string, or an argument object.

Parameters:
i (int) – index at which to insert; as with list.insert, an index past the end appends
arg (Union[str,TexGroup]) – argument to insert

>>> from TexSoup.data import TexArgs, BraceGroup
>>> arguments = TexArgs(['\n', BraceGroup('arg0'), '[arg2]'])
>>> arguments.insert(1, '[arg1]')
>>> len(arguments)
3
>>> arguments
[BraceGroup('arg0'), BracketGroup('arg1'), BracketGroup('arg2')]
>>> arguments.all
['\n', BraceGroup('arg0'), BracketGroup('arg1'), BracketGroup('arg2')]
>>> arguments.insert(10, '[arg3]')
>>> arguments[3]
BracketGroup('arg3')
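Because whitespace appears only in .all, an index into the argument list has to be translated into a position in .all before inserting. The sketch below shows one way to do that translation with a hypothetical `insert_arg` function over a plain list, where arguments are tuples and whitespace is a string; it is not TexSoup's implementation, and negative indices are not handled.

```python
from typing import List, Tuple, Union

Item = Union[Tuple[str, str], str]  # parsed argument or whitespace string


def insert_arg(all_items: List[Item], i: int, arg: Tuple[str, str]) -> None:
    """Insert `arg` before the i-th argument in `all_items`.

    `i` counts arguments only, skipping whitespace strings, so it is
    translated into a position in `all_items`. As with list.insert, an
    index past the end appends. Negative indices are not supported here.
    """
    seen = 0
    for pos, item in enumerate(all_items):
        if not isinstance(item, str):
            if seen == i:
                all_items.insert(pos, arg)
                return  # stop iterating right after mutating the list
            seen += 1
    all_items.append(arg)


items = ["\n", ("BraceGroup", "arg0"), ("BracketGroup", "arg2")]
insert_arg(items, 1, ("BracketGroup", "arg1"))
print(items)
# ['\n', ('BraceGroup', 'arg0'), ('BracketGroup', 'arg1'), ('BracketGroup', 'arg2')]
```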
pop(i)
Pop the argument object at the provided index.

Parameters: i (int) – Index to pop from the list

>>> from TexSoup.data import TexArgs, BraceGroup
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg2]', '{arg3}'])
>>> arguments.pop(1)
BracketGroup('arg2')
>>> len(arguments)
2
>>> arguments[0]
BraceGroup('arg0')
remove(item)
Remove either an unparsed argument string or an argument object.

Parameters: item (Union[str,TexGroup]) – Item to remove

>>> from TexSoup.data import TexArgs, TexCmd, BraceGroup
>>> arguments = TexArgs([BraceGroup('arg0'), '[arg2]', '{arg3}'])
>>> arguments.remove('{arg0}')
>>> len(arguments)
2
>>> arguments[0]
BracketGroup('arg2')
>>> arguments.remove(arguments[0])
>>> arguments[0]
BraceGroup('arg3')
>>> arguments.remove(BraceGroup('arg3'))
>>> len(arguments)
0
>>> arguments = TexArgs([
...     BraceGroup(TexCmd('color')),
...     BraceGroup(TexCmd('color', [BraceGroup('blue')]))
... ])
>>> arguments.remove(arguments[0])
>>> len(arguments)
1
>>> arguments.remove(arguments[0])
>>> len(arguments)
0
Environments
class TexSoup.data.TexNamedEnv
Abstraction for a LaTeX environment, denoted by \begin{env} and \end{env}. Contains three attributes:

- the environment name itself,
- the environment arguments, whether optional or required, and
- the environment's contents.

Warning: Note that setting TexNamedEnv.begin or TexNamedEnv.end has no effect. The begin and end tokens are always constructed from TexNamedEnv.name.

>>> from TexSoup.data import TexNamedEnv, BraceGroup
>>> t = TexNamedEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'],
...     [BraceGroup('c | c c')])
>>> t
TexNamedEnv('tabular', ['\n0 & 0 & * \\\\\n1 & 1 & * \\\\\n'], [BraceGroup('c | c c')])
>>> print(t)
\begin{tabular}{c | c c}
0 & 0 & * \\
1 & 1 & * \\
\end{tabular}
>>> len(list(t.children))
0
>>> t = TexNamedEnv('equation', [r'5\sum_{i=0}^n i^2'])
>>> str(t)
'\\begin{equation}5\\sum_{i=0}^n i^2\\end{equation}'
>>> t.name = 'eqn'
>>> str(t)
'\\begin{eqn}5\\sum_{i=0}^n i^2\\end{eqn}'
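Deriving the begin and end tokens from the name on every access is what makes renaming an environment update both markers at once. The sketch below illustrates that with a hypothetical `NamedEnv` class (not TexSoup's TexNamedEnv) whose arguments are pre-rendered strings; it reproduces the equation example above.

```python
from typing import Iterable


class NamedEnv:
    """Hypothetical named environment whose delimiters derive from its name."""

    def __init__(self, name: str, contents: Iterable = (), args: Iterable[str] = ()):
        self.name = name
        self.contents = list(contents)
        self.args = list(args)  # pre-rendered argument strings, e.g. '{c | c c}'

    @property
    def begin(self) -> str:
        return "\\begin{" + self.name + "}"

    @property
    def end(self) -> str:
        return "\\end{" + self.name + "}"

    def __str__(self) -> str:
        body = "".join(map(str, self.contents))
        return self.begin + "".join(self.args) + body + self.end


t = NamedEnv("equation", [r"5\sum_{i=0}^n i^2"])
print(t)  # \begin{equation}5\sum_{i=0}^n i^2\end{equation}
t.name = "eqn"
print(t)  # \begin{eqn}5\sum_{i=0}^n i^2\end{eqn}
```

In this sketch begin and end are read-only properties, so assigning to them raises AttributeError instead of being silently ignored as the warning above describes for TexNamedEnv.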
Text
class TexSoup.data.TexText
Abstraction for LaTeX text.

Representing regular text objects in the parse tree allows users to search and modify text just like any other expression.

>>> from TexSoup.data import TexNode, TexText
>>> obj = TexNode(TexText('asdf gg'))
>>> 'asdf' in obj
True
>>> 'err' in obj
False
>>> TexText('df ').strip()
'df'