Installation and Run#

Pre-Requisites#

PyParse was written for Python v3.7, and the following packages should be installed if not already available:

The suggested version of each package is provided in parentheses. A conda text/yml file is provided as part of the GitHub repository. PyParse has also been tested and used with Python 3.11, using up-to-date versions of the above libraries.

PyParse is currently capable of analysing LCMS data in the Waters OpenLynx™ browser report (.rpt) file format (single file for the whole plate), or LCMS data saved in the .daml file format from Shimadzu machines (one .daml file per well). All data for the analysis must be in one of these two formats.

Preparing a Platemap#

To run the analysis correctly, PyParse requires a platemap detailing the structures of the materials that are expected in each well. This platemap should be provided as a comma-separated values (.csv) file.

As a minimum, the structure of the desired product expected in each well is required, provided as a SMILES string:

Table 1. Example Platemap#

| Well | Desired product SMILES |
|------|------------------------|
| A1   | CN(C)c1ccccc1          |
| A2   | CN(C)c1ccccc1          |
| A3   | CN(C)c1ccccc1          |
| A4   | CN(CC)c1ccccc1         |
| A5   | CN(CC)c1ccccc1         |
| A6   | CN(CC)c1ccccc1         |
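If you prefer to generate the platemap programmatically, the following is a minimal sketch that writes the contents of Table 1 as CSV text using Python's standard csv module. The function name write_platemap is hypothetical and is not part of PyParse.

```python
import csv
import io

# Rows mirror Table 1: one well per row, one expected product per well.
ROWS = [
    ("A1", "CN(C)c1ccccc1"),
    ("A2", "CN(C)c1ccccc1"),
    ("A3", "CN(C)c1ccccc1"),
    ("A4", "CN(CC)c1ccccc1"),
    ("A5", "CN(CC)c1ccccc1"),
    ("A6", "CN(CC)c1ccccc1"),
]


def write_platemap(handle, rows):
    """Write a minimal platemap: a Well column plus the product SMILES."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Well", "Desired Product SMILES"])
    writer.writerows(rows)


# Write to an in-memory buffer here; pass a real file handle to save to disk.
buffer = io.StringIO()
write_platemap(buffer, ROWS)
platemap_text = buffer.getvalue()
print(platemap_text)
```

To save the platemap to disk, pass a handle opened with open("example_platemap.csv", "w", newline="") instead of the in-memory buffer.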

You can optionally specify the structure of the internal standard, the limiting reactant, and any byproducts that may be expected. Add columns to the platemap using the following headers:

  • InternalSTD SMILES

  • Limiting Reactant SMILES

  • Byproduct1 SMILES

  • Byproduct2 SMILES (etc.)

Note

You can only specify a single product and a single limiting reactant in each well, though different products and limiting reactants can appear across a plate.

Multiple byproducts can be listed for a well, but each byproduct should appear in its own column. This may result in some cells in the platemap table being empty, where a particular byproduct cannot be observed in that well. This is expected behaviour.
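The layout described above can be sketched as follows. This is only an illustration of one way to lay out the columns, assuming that a given byproduct keeps the same column across the whole plate; the function byproduct_columns and the example data are hypothetical and not part of PyParse.

```python
def byproduct_columns(expected):
    """Give each distinct byproduct its own column, leaving gaps elsewhere.

    `expected` maps a well label to the list of byproduct SMILES expected
    in that well. Returns the column headers and, per well, one cell per
    column (an empty string where that byproduct is not expected).
    """
    # Preserve first-seen order so the column numbering is deterministic.
    distinct = []
    for smiles_list in expected.values():
        for smiles in smiles_list:
            if smiles not in distinct:
                distinct.append(smiles)

    headers = [f"Byproduct{i} SMILES" for i in range(1, len(distinct) + 1)]
    table = {
        well: [s if s in smiles_list else "" for s in distinct]
        for well, smiles_list in expected.items()
    }
    return headers, table


# Hypothetical expectations for three wells.
expected = {
    "A1": ["Oc1ccccc1"],
    "A2": ["Oc1ccccc1", "c1ccccc1"],
    "A3": ["c1ccccc1"],
}
headers, table = byproduct_columns(expected)
for well, cells in table.items():
    print(well, cells)
```

Wells A1 and A3 each end up with one empty cell, which is the expected behaviour described above.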

Caution

Only a single internal standard (InternalSTD SMILES) can be specified for a given plate.
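Before running an analysis, you may wish to check this constraint yourself. The sketch below reads platemap rows with csv.DictReader and collects the distinct internal standard SMILES; the helper internal_standards is hypothetical and does not reflect how PyParse itself validates the platemap.

```python
import csv
import io


def internal_standards(rows):
    """Return the set of distinct, non-empty internal standard SMILES."""
    found = set()
    for row in rows:
        for header, value in row.items():
            # Skip malformed rows (DictReader may yield None keys or values).
            if not isinstance(header, str) or not isinstance(value, str):
                continue
            # Headers are compared case-insensitively.
            if header.lower() == "internalstd smiles" and value.strip():
                found.add(value.strip())
    return found


sample = io.StringIO(
    "Well,Desired Product SMILES,InternalSTD SMILES\n"
    "A1,CN(C)c1ccccc1,Fc1ccccc1\n"
    "A2,CN(C)c1ccccc1,Fc1ccccc1\n"
)
standards = internal_standards(csv.DictReader(sample))
if len(standards) > 1:
    raise ValueError(f"More than one internal standard found: {sorted(standards)}")
print(standards)
```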

Further detail can be provided in the platemap by the addition of a retention time and a name for each material. This is done by adding “RT” or “Name” respectively after the compound class in the column header, e.g.

  • Desired Product Name

  • Desired Product RT

  • Byproduct3 Name

Note

All column headers are case-insensitive, but must otherwise exactly match the names given here.
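A quick pre-flight check of the column headers could look like the sketch below. The regular expression is an assumption built only from the header names listed in this section (a compound class followed by SMILES, Name or RT); the function unrecognised_headers is hypothetical, and PyParse's own checks may differ.

```python
import re

# Assumed pattern: "<compound class> <field>", compared in lower case.
HEADER_PATTERN = re.compile(
    r"(desired product|limiting reactant|internalstd|byproduct\d+) (smiles|name|rt)"
)


def unrecognised_headers(headers):
    """Return the headers that match neither 'Well' nor the assumed pattern."""
    bad = []
    for header in headers:
        key = header.lower()  # case is ignored, everything else must match
        if key != "well" and not HEADER_PATTERN.fullmatch(key):
            bad.append(header)
    return bad


headers = ["Well", "desired product smiles", "ByProduct2 SMILES", "Limiting Reactant RT"]
print(unrecognised_headers(headers))
print(unrecognised_headers(["Well", "Product SMILES", "Desired_Product Name"]))
```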

Table 2. Fully Elaborated Platemap#

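| Well | Desired Product SMILES | Desired Product Name | Limiting Reactant SMILES | Limiting Reactant RT | InternalSTD SMILES | Byproduct1 SMILES | Byproduct2 SMILES |
|------|------------------------|----------------------|--------------------------|----------------------|--------------------|-------------------|-------------------|
| A1   | CN(C)c1ccccc1          | DiMethyl Product     | Brc1ccccc1               | 1.01                 | Fc1ccccc1          | Oc1ccccc1         |                   |
| A2   | CN(C)c1ccccc1          | DiMethyl Product     | Brc1ccccc1               | 1.01                 | Fc1ccccc1          | Oc1ccccc1         |                   |
| A3   | CN(C)c1ccccc1          | DiMethyl Product     | Brc1ccccc1               | 1.01                 | Fc1ccccc1          | Oc1ccccc1         |                   |
| A4   | CCN(CC)c1ccccc1        | DiEthyl Product      | Clc1ccccc1               | 0.95                 | Fc1ccccc1          |                   | c1ccccc1          |
| A5   | CCN(CC)c1ccccc1        | DiEthyl Product      | Clc1ccccc1               | 0.95                 | Fc1ccccc1          |                   | c1ccccc1          |
| A6   | CCN(CC)c1ccccc1        | DiEthyl Product      | Clc1ccccc1               | 0.95                 | Fc1ccccc1          |                   | c1ccccc1          |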

Specifying a Retention Time in the Platemap#

Specifying a retention time in the platemap guides PyParse to use only the peaks that are close to that time.

In a typical analysis, several sets of peaks may contain the m/z corresponding to a given compound. In the example below, a retention time of 0.73 min is provided in the platemap, which forces PyParse to use only peaks at or close to that retention time.

A diagram describing the change in output when a retention time is specified.

Figure 1. Hit validation graph before and after a retention time is specified#

Tip

See the section on PyParse Outputs for further information about Hit Validation Graphs.
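The idea of restricting the analysis to peaks near a specified retention time can be sketched as follows. The tolerance of 0.05 min, the peak dictionaries and the function peaks_near are all assumptions made for illustration; this section does not state the window PyParse actually uses.

```python
def peaks_near(peaks, target_rt, tolerance=0.05):
    """Keep peaks whose retention time is within `tolerance` min of the target.

    The default tolerance is a hypothetical value chosen for this example.
    """
    return [p for p in peaks if abs(p["rt"] - target_rt) <= tolerance]


# Hypothetical peaks that all contain the m/z of the compound of interest.
peaks = [
    {"rt": 0.52, "area": 12.0},
    {"rt": 0.73, "area": 48.5},
    {"rt": 0.75, "area": 3.1},
    {"rt": 1.10, "area": 20.4},
]

selected = peaks_near(peaks, target_rt=0.73)
print([p["rt"] for p in selected])
```

With a retention time of 0.73 min, only the peaks at 0.73 and 0.75 min remain candidates; the peaks at 0.52 and 1.10 min are ignored even though they contain the same m/z.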

Running a PyParse Analysis#

PyParse is written as a standalone script that is called from the command line to carry out the automated analysis of a single reaction plate at a time.

Each analysis should specify, at a minimum, the locations of the LCMS data and the platemap. By default, the output of this run will be saved in a new folder called “output” in the current working directory.

Standard Analysis for a 96-Well Plate for Waters data#

python PyParse.py example_rpt.rpt example_platemap.csv

Standard Analysis for a 96-Well Plate for Shimadzu data#

python PyParse.py folder_containing_daml_files example_platemap.csv -i Shimadzu

Note that for data generated using a Shimadzu machine, the instrument (-i) must be specified.
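If you want to launch these analyses from another Python script, for example to process several plates in turn, the commands above can be assembled with the standard subprocess module. This is a minimal sketch: build_command and run_analysis are hypothetical helpers, and the sketch assumes PyParse.py is in the current working directory.

```python
import subprocess
import sys


def build_command(data_path, platemap_path, instrument=None):
    """Assemble the argument list for one PyParse run."""
    command = [sys.executable, "PyParse.py", data_path, platemap_path]
    if instrument is not None:
        # The instrument flag (-i) is required for Shimadzu data.
        command += ["-i", instrument]
    return command


def run_analysis(command):
    """Run PyParse and raise CalledProcessError if it exits with an error."""
    return subprocess.run(command, check=True)


waters = build_command("example_rpt.rpt", "example_platemap.csv")
shimadzu = build_command(
    "folder_containing_daml_files", "example_platemap.csv", instrument="Shimadzu"
)
print(waters)
print(shimadzu)

# Call run_analysis(waters) or run_analysis(shimadzu) to start the analysis.
```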

Commonly Used Optional Parameters#

PyParse now determines the number of rows and columns in the plate itself, for data obtained from both Waters and Shimadzu machines. It also generates heatmaps for all plot types by default.

The options below let you adjust the number of rows and columns in the heatmaps [-r and -c] (e.g. when only the top left of the physical plate contained samples), change the default output folder [-o], and specify an alternative metric to determine the “best well” for a compound [-pt].

Use the following optional parameters to change each of these:

  • -o: specify the output_directory

  • -r: specify the number of rows in the plate

  • -c: specify the number of columns in the plate

  • -pt: specify the metric by which different wells should be compared. The following options are available:

    • Parea: Compare wells by their LCMS UV percentage area of product

    • P/SM+P: Compare wells by their ratio of product to the sum of starting material and product, using absolute UV area

    • P/STD: Compare wells by their ratio of product to internal standard, using absolute UV area

    • corrParea: As Parea, but normalised to the maximum value observed for that product

    • corrP/SM_P: As P/SM+P, but normalised to the maximum value observed for that product

    • corrP/STD: As P/STD, but normalised to the maximum value observed for that product
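To make the metrics above concrete, the sketch below computes them for two hypothetical wells containing the same product. The function names and input values are illustrative only, and "normalised to the maximum value" is assumed here to mean division by the largest value observed for that product; PyParse's own implementation may differ in detail.

```python
def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def plot_metrics(product_area, sm_area, std_area, product_pct_area):
    """Return the three uncorrected metrics for one well."""
    return {
        "Parea": product_pct_area,
        "P/SM+P": ratio(product_area, sm_area + product_area),
        "P/STD": ratio(product_area, std_area),
    }


def normalise(values):
    """Scale values so that the largest becomes 1 (the 'corr' variants)."""
    largest = max(values)
    return [ratio(v, largest) for v in values]


# Hypothetical absolute UV areas and percentage areas for two wells.
wells = [
    plot_metrics(product_area=30.0, sm_area=30.0, std_area=120.0, product_pct_area=20.0),
    plot_metrics(product_area=60.0, sm_area=20.0, std_area=120.0, product_pct_area=40.0),
]

corr_p_std = normalise([w["P/STD"] for w in wells])
print(wells)
print(corr_p_std)
```

In this example the second well has the highest P/STD, so it scores 1.0 after normalisation, and the first well scores 0.5 relative to it.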