The separation of recombinant human growth hormone variants by UHPLC. - PubMed - NCBI
A Nature Research Journal. The quality of therapeutic proteins such as hormones, subunit and conjugate vaccines, and antibodies is critical to the safety and efficacy of modern medicine. Identifying malformed proteins at the point-of-care can prevent adverse immune reactions in patients; this is of special concern when there is an insecure supply chain resulting in the delivery of degraded, or even counterfeit, drug product.
Identification of degraded protein, for example human growth hormone, is demonstrated by applying automated anomaly detection algorithms. Detection of the degraded protein differs from previous applications of machine-learning and classification to spectral analysis: The use of therapeutic proteins has become common practice in medicine and their success has stimulated significant investment towards the discovery and manufacture of a wide range of biopharmaceuticals.
A central challenge of protein therapies is the potential to trigger antibody formation. Triggering such an immune response has consequences for both the efficacy of the therapy and the safety of the patient 1 , 2. Several factors have been correlated to immunogenicity: Rapid quality assessment of therapeutic proteins is essential when there is the possibility of an insecure supply chain — resulting in the shipment of degraded proteins or in the sale of counterfeit products — or when there is a need to closely monitor and control biopharmaceutical manufacturing.
Formulated proteins require cold distribution and storage, a major limiting factor for treating populations in rural and developing areas. Point-of-care applications are especially challenging as a quality determination must be made rapidly on small sample volumes in clinical settings that may be resource constrained.
The above immunogenic factors all involve changes in the molecular characteristics of the protein that may be possible to measure at the point of care using Raman spectroscopy.
Here we present a Raman instrument capable of measuring proteins in solution at dose relevant concentrations and with a sample volume small enough that it could be taken from the overfill volume used in vial packing of drug products 12 , and demonstrate a classification technique that is capable of identifying degraded material based on changes in the scattered Raman spectrum.
Raman spectroscopy, discovered in by C. Krishnan 13, is commonly used in the analysis of biological materials 14 and has been used for the verification of primary and secondary structure of proteins since 15, 16. Previous work has been limited to high protein concentrations because of the small cross-sections for Raman scattering coupled with the potential for photo-degradation of the protein under high illumination. Surface Enhanced Raman Scattering (SERS) is a technique where the protein is adsorbed to a metallic substrate, resulting in resonance enhancement of the excitation field 18, 19. The resonance enhancement enables detection down to the single molecule level. The SERS signal is related to the constituent amino acids and amide backbone of the protein 21, 22, but the spectrum differs from the spectra of the protein in solution because the SERS technique is sensitive to the orientation of the adsorbed protein. Therefore, if the orientation of the protein cannot be controlled, there will not be a unique spectrum associated with any given protein when using SERS.
Additionally, this technique requires a consumable substrate with a limited shelf life 24. Fortunately, these complications can be avoided, as several therapeutic proteins are provided in formulation at concentrations high enough that the signal enhancement from SERS is not needed.
Here we show that characterization of these proteins is possible using spontaneous Raman spectroscopy of free-proteins in solution, provided the sample holder and optical system are well designed. Our approach was to employ the double-pass, confocal optical geometry illustrated in Fig.
This sample volume is smaller than the U. The total volume is kept low by recognizing that the volume of sample directly illuminated by the excitation light will produce most of the collected Raman signal. A schematic diagram of the Raman system beyond the sample holder is shown in c. For the concentrations and samples examined here, the Raman signal has significantly lower intensity than the raw signal, requiring careful background subtraction to retrieve the protein Raman signal.
To gauge the repeatability of the system, several spectra were taken of rhGH over a year-long period at a fixed concentration, and spectra of insulin were taken over a three-month period across a range of concentrations, as indicated in e. All spectra are plotted after normalization. Alongside the Raman instrument reported here, we have developed a classification algorithm that is capable of classifying a protein that has degraded. While automated classification algorithms have been widely used for spectral classification 26, 27, 28, 29, 30, the challenge here is to develop a classifier that can detect a myriad of degraded protein forms that the algorithm has never encountered before.
This is a specific case of the general One Class Classification problem in chemometrics. This problem occurs in a number of domains, with examples including anomaly detection in gas turbines 32 and classification of documents as relevant to a user. One class classification has been applied for Raman spectroscopy based identification of unknown bacterial strains 34 and the detection of simple mixtures of chlorinated solvents. This work expands the technique to the Raman spectral analysis of complex molecules, with a particular focus on therapeutic proteins.
In both cases the goal is to train the classifier so that it ignores systematic variations associated with the instrument, such as shot noise, alignment drift, etc. In the first approach the algorithm is trained using only spectra from high-quality recombinant human growth hormone (rhGH), and in the second approach the training uses both the high-quality rhGH and spectra from three unrelated high-quality proteins of various sizes.
Raman spectra were collected using a purpose built system, shown in Fig. A microscope objective was used to focus the excitation light and collect the Raman scattered light. The sample holder did not need to be any larger than this because the increased sample volume would not improve the amount of sample-laser interaction and the walls and mirror in the sample holder were metallic, so they added no background Raman signal. The depth of the sample holder was set to allow the microscope objective to focus light on the back mirror to achieve an increase in excitation and collection as described above, while at the same time moving the excitation spot as far from the fused silica window as possible to reduce the background signal from the window.
This prevented the formation of an air bubble between the sample and mirror, resulted in a consistent sample volume and geometry with no meniscus, and effectively sealed the sample preventing it from drying during long spectrum acquisitions.
The inverted configuration ensured that neither the sample holder nor the microscope objective needed to move during sample loading or cleaning operations. This allowed the system to be used without re-aligning the optics each time a new sample was presented to the system. The impact of the background signal due to the capillary material can be mitigated by choosing a material with a small Raman cross section such as Teflon, but this approach has not been shown to improve the protein limit of detection. The filtered light was coupled into the optical path of the microscope objective by a Semrock long pass filter operated as a dichroic mirror.
Collected light was passed back through the Semrock filter and then through an additional long pass filter to further attenuate Rayleigh scattered excitation light before being coupled into a fiber bundle for delivery to the spectrometer. A diagram of the system is given in Fig.
Spectra were acquired using an Acton SPi grating spectrometer. The use of a mirror in this sample holder does significantly increase the amount of Rayleigh scattered light collected by the system. To prevent the scattered excitation light from saturating the detector the grating in the spectrometer was adjusted to position the scattered excitation light off of the sensor.
To generate the spectra presented here the following procedure was used. After cosmic ray removal, the individual spectra were scaled to match the median value of pixels to across the spectra set to account for variation in the laser power. This region was chosen because it was beyond the filter turn on region and generally contained no strong spectral peaks.
A representative sample spectrum was created by taking the mean value of the filtered and smoothed spectra at each wavelength, and a noise spectrum was created for each measurement by taking the standard deviation at each wavelength across the sample spectra.
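The scaling and averaging steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `summarize_spectra` and the `ref_slice` argument are hypothetical, the reference pixel range must be supplied by the user, and the cosmic ray removal, filtering, and smoothing steps are assumed to have been applied beforehand.

```python
import numpy as np

def summarize_spectra(spectra, ref_slice):
    """Scale repeated acquisitions to a common level, then average them.

    spectra   : array of shape (n_acquisitions, n_pixels), already despiked
    ref_slice : slice of pixels used as the intensity reference
                (hypothetical argument; choose a peak-free region)
    """
    spectra = np.asarray(spectra, dtype=float)
    # Median of the reference region for each individual acquisition
    ref_levels = np.median(spectra[:, ref_slice], axis=1)
    # Common target: median of the reference region over the whole set
    target = np.median(spectra[:, ref_slice])
    # Rescale each acquisition to compensate for laser power variation
    scaled = spectra * (target / ref_levels)[:, None]
    # Representative spectrum and per-wavelength noise estimate
    mean_spectrum = scaled.mean(axis=0)
    noise_spectrum = scaled.std(axis=0, ddof=1)
    return mean_spectrum, noise_spectrum
```

If the acquisitions differ only by an overall intensity factor, the noise spectrum returned by this sketch is zero, which is the behaviour the scaling step is meant to achieve.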
The sample spectrum resulting from this processing contained Raman and fluorescence signal from the protein, sample holder, and the buffer that made up the bulk of the protein solution. A characteristic background spectrum for each protein buffer was created using the process described above but with the appropriate buffer solution rather than a protein sample.
The characteristic background spectrum for the rhGH buffer is shown in Fig. To generate the protein Raman spectra presented in the results section this characteristic background spectrum was subtracted from the sample signal and any residual fluorescence was removed by performing a positive residual style polynomial subtraction as described in ref. Calibration of the Raman shift was performed using a polystyrene sample with a well-known Raman spectrum.
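As a rough illustration of the background removal described above, the sketch below subtracts a buffer spectrum and then removes a slowly varying residual with an iteratively clipped polynomial fit, so that the remaining signal stays largely positive. This is one common way to implement such a subtraction and is an assumption here, not necessarily the procedure in the cited reference; the function name, polynomial order, and iteration count are all hypothetical choices.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def remove_background(sample, buffer_bg, order=5, n_iter=100):
    """Subtract the buffer spectrum, then an iteratively fitted polynomial."""
    residual = np.asarray(sample, float) - np.asarray(buffer_bg, float)
    # A scaled axis in [-1, 1] keeps the polynomial fit well conditioned
    x = np.linspace(-1.0, 1.0, residual.size)
    work = residual.copy()
    for _ in range(n_iter):
        coeffs = P.polyfit(x, work, order)
        baseline = P.polyval(x, coeffs)
        # Clip points above the fit so peaks stop pulling the baseline up
        clipped = np.minimum(work, baseline)
        if np.allclose(clipped, work):
            break
        work = clipped
    # The baseline now follows the fluorescence underneath the peaks
    return residual - baseline
```

The polynomial order controls how much curvature is attributed to fluorescence; too high an order will start to absorb broad Raman bands, so it should be chosen conservatively.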
The proteins used in this experiment were obtained from several suppliers. The interferon was used directly after reconstituting to avoid the potential damage of a freeze-thaw cycle. The IgG was supplied in phosphate buffered saline containing 0.
Lyophilized insulin powder was obtained from Sigma I and reconstituted in a 0. The proteins examined here were supplied in, or reconstituted into, variants of Phosphate Buffered Saline (PBS), except for insulin, which was reconstituted in 0.
Drug product buffers may contain sugars or similar molecules with unique Raman signatures, but here we maintained similar buffers where possible to allow for direct comparison of the limits of detection and to avoid buffer exchange. A series of oxidized human growth hormone samples were prepared from reference material in order to examine the sensitivity of the instrument to changes in proteins due to degradation processes. This resulted in varying levels of oxidation of the methionine groups M14, M, and M. Of particular importance are the methionine residues, each of which contains a sulphur atom capable of forming a sulfoxide bond.
The degraded material shows significant, but variable, differences when compared with the control material. In preparation for digestion, the proteins were then dialyzed three times in Tris-HCl buffer, pH at 6. The proteins were then digested with Trypsin at room temperature overnight to fragment the proteins into peptides while minimizing artificial oxidation.
Thermo Proteome Discoverer 1. The tolerance for precursor ions was set at 50 ppm and the tolerance for product ions was set at 1. Carbamidomethylation of cysteine was selected as a static modification, and deamidation on asparagine and oxidation on methionine were selected for dynamic modification search. The final confirmation of the modifications was by manual search.
The goal of this work was to develop a classifier capable of recognizing if a particular sample under test was high-quality rhGH or degraded rhGH.
The standard classification algorithms for this problem work by defining a boundary between the two classes, based on examples from each class. These algorithms assume that the samples presented to the classifier adequately sample the space of possible spectra generated by each class. This problem falls into the general field of one-class classification. A variety of classifiers including Support Vector Machines 40, 35, neural networks 33, and nearest-neighbors 41 have been applied to this problem.
The approach used in this paper is to define an ellipsoid in N dimensions (where N is the size of the spectrum for classification), making the classifier presented a variant of the Minimum Volume Ellipsoid approach developed by Rousseeuw 42, 43. Given the high dimensionality of the raw spectral data and the small number (5) of repeated experiments that generated degraded samples, overfitting in the classifier is an important concern.
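To make the idea of an ellipsoidal one-class boundary concrete, the sketch below fits an ellipsoid to high-quality training data only and accepts a new sample if it falls inside. It uses the ordinary sample mean and covariance (a Mahalanobis distance threshold), which is a simplification and not Rousseeuw's Minimum Volume Ellipsoid estimator or the variant used in this work. The class name and the `margin` and `ridge` parameters are hypothetical.

```python
import numpy as np

class EllipsoidClassifier:
    """One-class classifier: accept points inside a covariance ellipsoid."""

    def __init__(self, margin=1.1, ridge=1e-9):
        self.margin = margin  # hypothetical safety factor on the boundary
        self.ridge = ridge    # small diagonal term keeps the covariance invertible

    def fit(self, X):
        # X holds only in-class (high-quality) samples, shape (n, d) with d >= 2
        X = np.asarray(X, dtype=float)
        self.center_ = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + self.ridge * np.eye(X.shape[1])
        self.precision_ = np.linalg.inv(cov)
        # Boundary: the farthest training point, inflated by the margin
        self.threshold_ = self.margin * self.distance2(X).max()
        return self

    def distance2(self, X):
        # Squared Mahalanobis distance of each row from the ellipsoid center
        diff = np.atleast_2d(np.asarray(X, dtype=float)) - self.center_
        return np.einsum("ij,jk,ik->i", diff, self.precision_, diff)

    def predict(self, X):
        # True means "consistent with the training class"
        return self.distance2(X) <= self.threshold_
```

With only a handful of training spectra, the covariance estimate in the full spectral dimension would be badly conditioned, which is why a classifier of this kind needs to operate on a reduced set of features.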
Information about protein structure and condition is not uniformly distributed in the spectrum of the protein, so the first step after spectral processing is a dimension reduction to reduce the impact of spectral variation in low information regions on the classification algorithm.
After background subtraction and residual polynomial fitting, the first step in dimension reduction is to truncate the spectrum to an appropriate spectral region. The data is then scaled into a fixed range to reduce the impact of small changes in concentration. The principal component analysis (PCA) algorithm finds a set of vectors in the original space that can be used to represent the data. The vectors are ranked by the amount of variation in the original data each vector captures.
By using a set of principal components smaller than the dimension of the original space, the length of the vector representing each data point is reduced. Additionally, if the data set for the PCA algorithm is chosen correctly then the principal components themselves should be directly related to spectral information relevant to classifying the proteins.
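The scaling and projection steps can be illustrated with a short NumPy sketch. The helper names are hypothetical, and scaling each spectrum to the interval [0, 1] is an assumption made for the example rather than a statement of the range used in this work. PCA is computed here through a singular value decomposition of the mean-centred training spectra.

```python
import numpy as np

def minmax_scale(spectra):
    """Scale each spectrum (row) to [0, 1]; assumes no spectrum is constant."""
    spectra = np.asarray(spectra, dtype=float)
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    return (spectra - lo) / (hi - lo)

def fit_pca(train, n_components):
    """Return the mean, leading principal components, and variance fractions."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value
    _, s, vt = np.linalg.svd(train - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, vt[:n_components], explained[:n_components]

def project(spectra, mean, components):
    """Represent each spectrum by its scores on the retained components."""
    return (spectra - mean) @ components.T
```

Each spectrum is then described by `n_components` scores instead of one value per spectral pixel, and it is these shorter vectors that would be passed to the classifier.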
Two methods of performing the initial principal component analysis for dimensionality reduction were investigated.