Parsing Mechanics

The parsing functions below are internal. Do not call them directly from user code, as they are subject to change.

Tokenizer

TexSoup.reader.tokenize(text)

Generator that yields LaTeX tokens from text, ignoring comments.

Parameters: text (Union[str,iterator,Buffer]) – LaTeX to process
>>> print(*tokenize(r'\textbf{Do play \textit{nice}.}'))
\textbf { Do play  \textit { nice } . }
>>> print(*tokenize(r'\begin{tabular} 0 & 1 \\ 2 & 0 \end{tabular}'))
\begin { tabular }  0 & 1 \\ 2 & 0  \end { tabular }
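
The idea of lazily yielding tokens while skipping comments can be sketched without the library. The following is a minimal, self-contained illustration; `TOKEN_RE` and `sketch_tokenize` are hypothetical names and the token rules are simplified assumptions, not TexSoup's actual implementation (for instance, this sketch emits a double backslash as its own token and does not treat an escaped percent sign specially).

```python
import re

# Hypothetical, simplified token rules (an assumption, not TexSoup's own).
TOKEN_RE = re.compile(
    r"""
    \\\\              # line break: two backslashes
  | \\[A-Za-z]+       # named command
  | \\.               # single-character command
  | \\                # lone trailing backslash
  | [{}\[\]]          # argument delimiters
  | \$\$?             # math delimiters, two dollars preferred over one
  | [^\\{}\[\]$%]+    # run of plain text
    """,
    re.VERBOSE,
)


def sketch_tokenize(text):
    """Yield tokens one at a time, skipping line comments."""
    pos = 0
    while pos < len(text):
        if text[pos] == "%":
            # Drop everything up to (not including) the end of the line.
            newline = text.find("\n", pos)
            pos = len(text) if newline == -1 else newline
            continue
        match = TOKEN_RE.match(text, pos)
        yield match.group()
        pos = match.end()


print(*sketch_tokenize(r'\textbf{Do play \textit{nice}.}'))
```

Because the function is a generator, callers can stop early without scanning the rest of the input.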
TexSoup.reader.next_token(text)

Returns the next token and advances the iterator to the position where processing should resume.

Parameters: text (Union[str,iterator,Buffer]) – LaTeX to process
Returns: the token
Return type: str
>>> b = Buffer(r'\textbf{Do play\textit{nice}.}   $$\min_w \|w\|_2^2$$')
>>> print(next_token(b), next_token(b), next_token(b), next_token(b))
\textbf { Do play \textit
>>> print(next_token(b), next_token(b), next_token(b), next_token(b))
{ nice } .
>>> print(next_token(b))
}
>>> print(next_token(Buffer('.}')))
.
>>> next_token(b)
'   '
>>> next_token(b)
'$$'
>>> b2 = Buffer(r'\gamma = \beta')
>>> print(next_token(b2), next_token(b2), next_token(b2))
\gamma  =  \beta
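
The "return one token, then leave the cursor right after it" contract can be illustrated with a small stand-in. This is only a sketch under stated assumptions: `SketchBuffer`, its `forward` and `has_next` methods, and `sketch_next_token` are hypothetical names, and the token rules cover only commands, braces, and plain text.

```python
class SketchBuffer:
    """Hypothetical stand-in for a buffer: a string plus a current position."""

    def __init__(self, text):
        self.text = text
        self.position = 0

    def peek(self):
        """Return the next character without consuming it ('' at the end)."""
        return self.text[self.position:self.position + 1]

    def forward(self, count=1):
        """Consume and return up to `count` characters."""
        start = self.position
        self.position = min(len(self.text), start + count)
        return self.text[start:self.position]

    def has_next(self):
        return self.position < len(self.text)


def sketch_next_token(buf):
    """Return one token and leave the buffer positioned right after it."""
    first = buf.forward()
    if first in ("{", "}"):
        return first
    if first == "\\":
        # A command: the backslash followed by letters.
        name = first
        while buf.peek().isalpha():
            name += buf.forward()
        return name
    # Otherwise, a run of plain text up to the next special character.
    run = first
    while buf.has_next() and buf.peek() not in "\\{}":
        run += buf.forward()
    return run


b = SketchBuffer(r'\textbf{Do play\textit{nice}.}')
print(sketch_next_token(b), sketch_next_token(b), sketch_next_token(b))
```

Each call consumes exactly one token, so repeated calls walk through the input in order.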
TexSoup.reader.token(name)

Marker for a token.

Parameters: name (str) – Name of tokenizer
TexSoup.reader.tokenize_punctuation_command(text)

Process a command that augments or modifies punctuation.

This matters when tokenizing a string, because the opening and closing punctuation attached to such commands is not required to match.

Parameters: text (Buffer) – iterator over text, with current position
TexSoup.reader.tokenize_command(text)

Process a command, ignoring line breaks (double backslashes).

Parameters: text (Buffer) – iterator over line, with current position
TexSoup.reader.tokenize_line_comment(text)

Process a line comment.

Parameters: text (Buffer) – iterator over line, with current position
>>> tokenize_line_comment(Buffer('hello %world'))
>>> tokenize_line_comment(Buffer('%hello world'))
'%hello world'
>>> tokenize_line_comment(Buffer('%hello\n world'))
'%hello'
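
The behavior in the examples above (no result unless the cursor sits on a percent sign, otherwise everything up to the end of the line) can be mimicked on a plain string. This is a minimal sketch; `sketch_line_comment` is a hypothetical helper that takes a string and an index instead of a Buffer, and it does not handle an escaped percent sign.

```python
def sketch_line_comment(text, position=0):
    """Return the comment starting at `position`, or None if there is none."""
    if text[position:position + 1] != "%":
        return None
    end = text.find("\n", position)
    if end == -1:
        end = len(text)  # the comment runs to the end of the input
    return text[position:end]


print(sketch_line_comment('%hello\n world'))
```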
TexSoup.reader.tokenize_argument(text)

Process both optional and required arguments.

Parameters: text (Buffer) – iterator over line, with current position
TexSoup.reader.tokenize_math(text)

Prevents math from being tokenized.

Parameters: text (Buffer) – iterator over line, with current position
>>> b = Buffer(r'$\min_x$ \command')
>>> tokenize_math(b)
'$'
>>> b = Buffer(r'$$\min_x$$ \command')
>>> tokenize_math(b)
'$$'
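
Both examples above return only the delimiter, and the two-character form wins when both could match. A minimal sketch of that longest-match check follows; `sketch_math_delimiter` is a hypothetical helper operating on a plain string, not the library's function.

```python
def sketch_math_delimiter(text, position=0):
    """Return '$$' or '$' if one starts at `position`, else None."""
    # Test the longer delimiter first so '$$' is not split into two '$'.
    for delimiter in ("$$", "$"):
        if text.startswith(delimiter, position):
            return delimiter
    return None


print(sketch_math_delimiter(r'$$\min_x$$ \command'))
```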
TexSoup.reader.tokenize_string(text, delimiters=None)

Process a string of text.

Parameters:
  • text (Buffer) – iterator over line, with current position
  • delimiters (Union[None,iterable,str]) – defines the delimiters
>>> tokenize_string(Buffer('hello'))
'hello'
>>> b = Buffer(r'hello again\command')
>>> tokenize_string(b)
'hello again'
>>> print(b.peek())
\
>>> print(tokenize_string(Buffer(r'0 & 1 \\\command')))
0 & 1 \\
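
The examples above suggest that a text run stops at the next delimiter but keeps a double backslash inside the run. A minimal sketch of that scanning loop follows; `sketch_read_string` and its default delimiter set are assumptions made for illustration, not the library's values.

```python
def sketch_read_string(text, position=0, delimiters="\\{}[]$%"):
    """Return (string, new_position) for the text run starting at `position`."""
    end = position
    while end < len(text):
        if text.startswith("\\\\", end):
            end += 2  # a line break (two backslashes) stays inside the run
            continue
        if text[end] in delimiters:
            break
        end += 1
    return text[position:end], end


run, stop = sketch_read_string(r'hello again\command')
print(run)
```

Returning the stop index alongside the string plays the role of the buffer's current position in the original examples.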

Mapper

TexSoup.reader.read_tex(src)

Read the next expression from the buffer.

Parameters: src (Buffer) – a buffer of tokens
TexSoup.reader.read_item(src)

Read the item content.

There can be any number of whitespace characters between \item and the first non-whitespace character. After that first non-whitespace character, however, the item tolerates only one successive line break at a time.

\item can also take an argument.

Parameters: src (Buffer) – a buffer of tokens
Returns: contents of the item and any item arguments
TexSoup.reader.read_math_env(src, expr)

Read the math environment from the buffer.

Advances the buffer until right after the end of the environment. Adds parsed content to the expression automatically.

Parameters:
  • src (Buffer) – a buffer of tokens
  • expr (TexExpr) – expression for the environment
Return type: TexExpr

TexSoup.reader.read_env(src, expr)

Read the environment from the buffer.

Advances the buffer until right after the end of the environment. Adds parsed content to the expression automatically.

Parameters:
  • src (Buffer) – a buffer of tokens
  • expr (TexExpr) – expression for the environment
Return type: TexExpr
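
Reading until "right after the end of the environment" can be sketched on a flat token list. The code below is a minimal illustration under stated assumptions: `sketch_read_env` is a hypothetical helper, tokens are plain strings, the begin and end markers are each split into four tokens, and nested environments of the same name are tracked with a depth counter.

```python
def sketch_read_env(tokens, start, name):
    """Collect tokens from `start` until the matching end marker for `name`.

    Returns (contents, index right after the end of the environment).
    """
    begin_marker = ["\\begin", "{", name, "}"]
    end_marker = ["\\end", "{", name, "}"]
    contents = []
    depth = 0
    index = start
    while index < len(tokens):
        window = tokens[index:index + 4]
        if window == end_marker:
            if depth == 0:
                return contents, index + 4
            depth -= 1  # closes a nested environment of the same name
        elif window == begin_marker:
            depth += 1
        contents.append(tokens[index])
        index += 1
    raise EOFError("environment %r was never closed" % name)


tokens = ["\\begin", "{", "tabular", "}", " 0 & 1 ", "\\\\", " 2 & 0 ",
          "\\end", "{", "tabular", "}", " tail"]
body, after = sketch_read_env(tokens, 4, "tabular")
print(body, tokens[after])
```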

TexSoup.reader.read_args(src, args=None)

Read all arguments from the buffer.

Advances the buffer to the end of the last valid argument. There can be any number of whitespace characters between the command and its first argument. After that first argument, however, the command tolerates only one successive line break before the chain of arguments ends.

Parameters:
  • src (Buffer) – a buffer of tokens
  • args (TexArgs) – existing arguments to extend
Returns: parsed arguments
Return type: TexArgs
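
The whitespace rule described above can be made concrete with a small sketch. It rests on one interpretation, stated here as an assumption: any whitespace is allowed before the first argument, and later arguments may be separated by whitespace containing at most one line break. `sketch_read_args` and `_match_end` are hypothetical helpers that work on a plain string, and escaped braces are not handled.

```python
CLOSERS = {"{": "}", "[": "]"}


def _match_end(text, start):
    """Return the index of the delimiter closing the one at `start`."""
    opener = text[start]
    closer = CLOSERS[opener]
    depth = 0
    for index in range(start, len(text)):
        if text[index] == opener:
            depth += 1
        elif text[index] == closer:
            depth -= 1
            if depth == 0:
                return index
    raise ValueError("unbalanced argument starting at %d" % start)


def sketch_read_args(text, position=0):
    """Return (args, new_position), where each arg keeps its delimiters."""
    args = []
    while True:
        cursor = position
        while cursor < len(text) and text[cursor].isspace():
            cursor += 1
        gap = text[position:cursor]
        # After the first argument, more than one line break ends the chain.
        if args and gap.count("\n") > 1:
            break
        if cursor >= len(text) or text[cursor] not in CLOSERS:
            break
        end = _match_end(text, cursor)
        args.append(text[cursor:end + 1])
        position = end + 1  # position stays at the end of the last argument
    return args, position


print(sketch_read_args("  {a}[b]\n{c}\n\n{d}"))
```

The returned position points just past the last accepted argument, so trailing whitespace and any rejected argument remain unread.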
TexSoup.reader.read_arg(src, c)

Read the argument from the buffer.

Advances the buffer until right before the end of the argument.

Parameters:
  • src (Buffer) – a buffer of tokens
  • c (str) – argument token (starting token)
Returns: the parsed argument
Return type: Arg