Biochemometric 2D NMR-Based Heterocovariance Analysis: A Targeted Approach for Identifying Bioactive Compounds in Complex Mixtures

Biochemometric strategies combining bioactivity data with spectroscopic information can accelerate the discovery of bioactive compounds, yet complex mixtures and closely related analogs remain challenging. This study presents a targeted 2D NMR-based heterocovariance analysis (HetCA) workflow to identify chemical features that correlate positively or negatively with biological activity.
The approach was validated using artificial mixtures of pentacyclic triterpenes screened for modulation of RORγ and TGR5, and subsequently applied to a triterpene-rich Eriobotrya japonica leaf extract. The workflow enabled accurate, targeted identification of bioactive constituents, demonstrating the potential of 2D NMR HetCA as a powerful tool for bioactivity-guided compound discovery in complex natural products.
The original article
Biochemometric 2D NMR-Based Heterocovariance Analysis: A Targeted Approach for Identifying Bioactive Compounds in Complex Mixtures
Sigrid Adelsberger, Alexander F. Perhal, Lorenza Bertaina, Patrik F. Schwarz, Verena M. Dirsch, Judith M. Rollinger, Ulrike Grienke*
Anal. Chem. 2025, 97, 41, 22508–22517
https://doi.org/10.1021/acs.analchem.5c02419
licensed under CC-BY 4.0
Selected sections from the article follow. Formats and hyperlinks were adapted from the original.
In search of new drug leads, natural products (NPs) are an important source (1) with the disadvantage of having to deal with complex multicomponent crude extracts as starting materials. (2) Biochemometric approaches allow the precise and targeted detection of bioactive constituents. (3−7) The previously developed workflow ELINA (Eliciting Nature’s Activities) (8) primarily relies on 1H NMR-based statistical heterocovariance analysis (HetCA), (9) which detects specific chemical features, i.e. resonances, that are positively or negatively correlated with bioactivity prior to isolation. (10−13)
The interpretation of 1H NMR HetCA pseudospectra derived from mixtures of structural analogs with overlapping resonances pushes biochemometric methods to their limits. (14−16) In particular, triterpenes (TTs) represent a class of compounds that includes many structural analogs. (17) In addition to overlapping resonances, especially in the aliphatic region, TTs pose several analytical challenges, e.g. reduced sensitivity for UV absorption, (18) low volatility, (19) and poor ionizability in MS-based experiments. (20) Hence, 2D NMR experiments are crucial for unambiguous structure determination of TTs. (8)
The retinoic acid receptor-related orphan receptor gamma isoform RORγt and the Takeda G protein-coupled receptor 5 (TGR5) were selected as suitable targets since their endogenous ligands show a structural similarity to the investigated compound class of tetracyclic and pentacyclic TTs. (21,22) RORγ promotes the differentiation of naïve T cells into interleukin (IL)-17 producing T helper cells (Th17), which release pro-inflammatory cytokines such as IL-17 and IL-22. (21) Small molecules can act as inverse agonists of RORγ thereby reducing cytokine production and inflammation. (23) TGR5, also known as GPBAR-1 or M-BAR, (22) is involved in the regulation of metabolic diseases and inflammation. (24) When stimulated by agonists, TGR5 exerts anti-inflammatory effects via inhibition of transcriptional nuclear factor-κB (NF-κB) as well as modulating other signaling pathways. (25−27)
2D NMR techniques such as heteronuclear single quantum coherence (HSQC) have found their way into dereplication workflows. (28) However, statistical correlation of 2D NMR with bioactivity data, while promising, is still in its infancy. The aim of this study was to move beyond dereplication and to unravel RORγ- and TGR5-modulating TTs in mixtures in a targeted and unambiguous manner. Therefore, a biochemometric 2D NMR-based HetCA approach was developed. As a proof of concept, artificial mixtures of TTs were subjected to biochemometric analysis; as an authentic application sample, a dichloromethane extract of TT-rich Eriobotrya japonica leaves (EJD) was selected.
Experimental Section
Fractionation and Isolation
EJD was fractionated by flash chromatography on an Interchim puriFlash 4250 instrument (Interchim, Montluçon, France) controlled by Interchim Software into 45 microfractions (MFs), named MF1–MF45. The device was equipped with a photo diode array (PDA) detector, an evaporative light scattering detector (ELSD), and a Silica HP 220 Gramm 25 μm 220 bar puriFlash column (PF-25SIHC-F0220, Interchim, Montluçon, France). 13.75 g of EJD were prepared for dry load application by mixing it with silica gel 60 (0.040–0.063 mm, CAS-No 7631-86-9, 60.08 g/mol, Merck KGaA, Darmstadt, Germany) in a ratio of 1:1. The gradient for n-hexane (A) and acetone (B), with a flow rate of 50.0 mL/min, was: 2% B for 3 column volumes (cv), 2% B to 5% B in 7 cv, 5% B for 5 cv, 5% B to 10% B in 7 cv, 10% B for 5 cv, 10% B to 15% B in 7 cv, 15% B for 5 cv, 15% B to 20% B in 7 cv, 20% B to 40% B in 5 cv, 40% B for 7 cv, 40% B to 100% B in 10 cv, 100% B for 7 cv. To combine in total 2121 test tubes with 20 mL each into 45 MFs, results of ultrahigh performance supercritical fluid chromatography (UHPSFC) and thin layer chromatography (TLC) were used. The fractionation of MF33 with the same device and detectors via a C18 HQ 12 Gramm 15 μm 22 bar column (PF-15C18HQ-F0012) was conducted with a flow rate of 15.0 mL/min with dd H2O (A) and acetonitrile (B): 5% B for 3 min, 5% B to 45% B in 4 min, 45% B to 98% B in 80 min, 98% B for 17 min, 98% to 5% B in 0 min, 5% B for 15 min. 124.20 mg of MF33 were homogenized 1:1 with 125.0 mg Silica gel 60 for dry load application. UHPSFC and TLC analyses were used to guide the pooling of 359 test tubes into eight fractions A1–A8. Fractionation of 30 mg A3 in methanol (25.29 mg/mL) was performed on a SFC Prep-15 device (Waters, Milford, MA, USA) equipped with an ELSD and PDA detector. A Torus 1-aminoanthracene column (OBD, 130 Å, 10 × 250 mm, 5 μm) (Waters, Milford, MA, USA) was kept at 45 °C. 
The mobile phase consisted of supercritical CO2 (A) and absolute ethanol (>99.7%) (B) with a gradient [time (min)/% B]: 0.0/10, 1.5/10, 16.5/11, 18.5/50, 20.5/50, 21.5/10, 22.5/10. Three fractions, named B1–B3, were collected according to the ELSD chromatogram. TLC was performed using silica gel 60 F254 plates (Merck, Darmstadt, Germany). The mobile phase consisted of methanol, dichloromethane, and ethyl acetate (10:1.2:0.5, v/v/v). Plates were inspected under visible light after derivatization with vanillin (1% in methanol) and sulfuric acid (5% in methanol) and heating at 105 °C for 2 min.
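As a plausibility check of the flash chromatography conditions reported for EJD in this section, the gradient segments can be tabulated and summed. The following Python sketch is illustrative only and not part of the published workflow. All variable names are hypothetical, and the collection-time estimate assumes that the flow rate stayed at 50.0 mL/min and that the 2121 test tubes represent the entire collected eluate.

```python
# Flash gradient for EJD as (start %B, end %B, length in column volumes),
# transcribed from the experimental description.
segments = [
    (2, 2, 3), (2, 5, 7), (5, 5, 5), (5, 10, 7), (10, 10, 5), (10, 15, 7),
    (15, 15, 5), (15, 20, 7), (20, 40, 5), (40, 40, 7), (40, 100, 10),
    (100, 100, 7),
]

# Each segment should start at the %B where the previous one ended.
is_continuous = all(prev[1] == nxt[0] for prev, nxt in zip(segments, segments[1:]))

# Total gradient length in column volumes.
total_cv = sum(length for _, _, length in segments)

tubes, tube_volume_ml, flow_ml_min = 2121, 20.0, 50.0
collected_l = tubes * tube_volume_ml / 1000.0
# Assumption: constant flow and no eluate discarded between tubes.
collection_h = tubes * tube_volume_ml / flow_ml_min / 60.0

print(f"continuous: {is_continuous}, total length: {total_cv} cv")
print(f"collected: {collected_l:.2f} L, about {collection_h:.1f} h of collection")
```

Under these assumptions the gradient spans 75 column volumes, and the pooled tubes correspond to roughly 42 L of eluate.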
Ultrahigh-Performance Supercritical Fluid Chromatography (UHPSFC)
UHPSFC analysis of MF1–MF45 was performed on an Acquity UPC2 device (Waters, Milford, MA, USA) equipped with a PDA detector and ELSD. A Torus 1-aminoanthracene column (130 Å, 3.0 × 100 mm, 1.7 μm) (Waters, Milford, MA, USA) was kept at 45 °C. Supercritical CO2 (A) and methanol (B), flow rate 1.0 mL/min, were used for the gradient [time (min)/% B]: 0/10, 6.5/19, 7.5/50, 8.5/50, 9/10, 10/10. Samples were prepared at 5 mg/mL in n-hexane/2-propanol (7:3). The measurements for A1–A8 and B1–B3 (5 mg/mL in acetonitrile and methanol) were performed in the same way except for the following changes. The gradient [time (min)/% B] was: 0/10, 15/13, 15.2/50, 16.2/50, 16.5/10, 17/10. Additionally, a single quadrupole mass spectrometry detector (QDa; Waters, Milford, MA, USA) was used with a flow rate of 0.6 mL/min of 10 μM ammonium formate (in dd H2O/methanol 1:9) as makeup solvent. Both positive (15 V cone voltage) and negative mode (30 V) were recorded with a scan range from 100.00 to 1000.00 Da.
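The [time (min)/% B] notation used for the UHPSFC gradients lists breakpoints only. A minimal Python sketch of how the modifier content at an arbitrary time point could be read from such a table is given below. It assumes linear ramps between the listed points, which the text does not state explicitly, and the function name `percent_b_at` is hypothetical.

```python
import numpy as np

# UHPSFC gradient for MF1-MF45 as [time (min), % B], from the method description.
times = np.array([0.0, 6.5, 7.5, 8.5, 9.0, 10.0])
percent_b = np.array([10.0, 19.0, 50.0, 50.0, 10.0, 10.0])

def percent_b_at(t_min):
    """Return % B at t_min, assuming linear ramps between the listed points."""
    return float(np.interp(t_min, times, percent_b))

for t in (0.0, 3.25, 7.0, 8.0, 9.5):
    print(f"{t:5.2f} min -> {percent_b_at(t):5.1f} % B")
```

The same lookup applies to the 17 min gradient used for A1–A8 and B1–B3 once the two arrays are replaced by its breakpoints.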
NMR Data Acquisition
All NMR experiments were acquired on a spectrometer consisting of a Bruker Ascend 500 MHz magnet (Bruker BioSpin, Billerica, MA, USA), a 5 mm triple-resonance Prodigy CryoProbe, a SampleJet automated sample changer, and an AVANCE NEO console. All measurements were performed at 298 K. The receiver gain was kept constant to ensure that the obtained spectra can be compared quantitatively. The resonance frequency was 500.19 MHz for 1H NMR and 125.77 MHz for 13C NMR. 1H NMR spectra were acquired using a standard Bruker pulse sequence program with default settings (zg30) and 128 scans. The acquisition was performed with FID resolution = 0.183 Hz, pulse width (PW) = 3.3 μs (corresponding to 1/3 of the 90° pulse length of 9.9 μs), relaxation delay (D1) = 1.5 s (in addition to the acquisition time of 5.45 s), size of real spectrum (SI) = 64k, and spectral width (SW) = 6009.615 Hz. The total acquisition time for each 1H NMR experiment was ∼15 min. 1H–13C HSQC NMR spectra were recorded using a standard Bruker pulse sequence program with default settings (hsqcetgpsisp2) and the following parameters: 4 scans, D1 = 2 s, a SW of 25.2 kHz and an acquired spectral size of 256 data points for 13C, and 7.5 kHz and 2048 data points for 1H. The total acquisition time for each experiment was ∼37 min. For all samples, both spectra were recorded. For structure elucidation of B1–B3, in addition to 1H NMR spectra and 1H–13C HSQC, 13C APT, 1H–13C HMBC, and 1H–1H COSY experiments were also conducted (for parameters, see Table S8).
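The reported acquisition parameters can be cross-checked against each other with a few lines of arithmetic. The Python sketch below is a rough estimate, not the spectrometer's own calculation: it neglects dummy scans, pulse durations, and delays inside the pulse sequences. It also assumes that the 2048 1H data points of the HSQC are the total acquired points, so that the acquisition time is points / (2 × SW). All variable names are hypothetical.

```python
# 1H parameters taken from the acquisition description.
sf_h_mhz = 500.19    # 1H resonance frequency (MHz)
sw_h_hz = 6009.615   # spectral width (Hz)
aq_h_s = 5.45        # acquisition time (s)
d1_h_s = 1.5         # relaxation delay (s)
scans_h = 128

sw_h_ppm = sw_h_hz / sf_h_mhz    # Hz divided by MHz gives ppm
fid_res_hz = 1.0 / aq_h_s        # FID resolution is the inverse of the acquisition time
time_h_min = scans_h * (aq_h_s + d1_h_s) / 60.0

# HSQC: 4 scans, 256 increments in the 13C dimension, D1 = 2 s.
# Assumption: AQ = TD / (2 * SW) with TD = 2048 and SW = 7.5 kHz.
aq_hsqc_s = 2048 / (2 * 7500.0)
time_hsqc_min = 4 * 256 * (aq_hsqc_s + 2.0) / 60.0

print(f"1H: SW = {sw_h_ppm:.2f} ppm, FID resolution = {fid_res_hz:.3f} Hz, "
      f"about {time_h_min:.1f} min")
print(f"HSQC: AQ = {aq_hsqc_s:.3f} s, about {time_hsqc_min:.1f} min")
```

Within these simplifications, the estimates agree with the stated FID resolution of 0.183 Hz and with total experiment times of about 15 min (1H) and 37 min (HSQC).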
Results and Discussion
1D HetCA Pseudospectra
To perform HetCA, two prerequisites must be met: (i) a bioactive crude extract must be fractionated into microfractions (MFs) that contain an intersecting quantitative variance of constituents, and (ii) this quantitative variance must be reflected in a correlated ascending or descending variance of bioactivity. (8) 1H NMR experiments of the selected microfraction package (AMF1–AMF4) were recorded as the basis for the 1D HetCA of the proof-of-concept study. To ensure consistent signal-to-noise ratios, samples were prepared in the same way and measured under the same conditions (Figure S8). For the statistical correlation between chemical features and bioactivity data, the raw 1H NMR spectral data were processed in MATLAB according to a standard protocol, (8,9) to obtain color-coded pseudospectra with positively (red, upward signals) or negatively (blue, downward signals) correlated resonances based on the correlation coefficient (cc), calculated via normalization of the covariance (Figure 2).
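The core of this calculation, the covariance between each spectral variable and the bioactivity, normalized to a correlation coefficient, can be sketched in a few lines. The Python example below is a simplified illustration and not the authors' MATLAB protocol, which may include further steps such as alignment, binning, or scaling. The function name `hetca_1d` and the toy numbers are hypothetical.

```python
import numpy as np

def hetca_1d(spectra, activity):
    """Covariance and correlation of every spectral point with bioactivity.

    spectra:  array of shape (n_fractions, n_points), one spectrum per row
    activity: array of shape (n_fractions,), one bioactivity value per fraction
    """
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(activity, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # center each spectral point
    yc = y - y.mean()                # center the bioactivity
    cov = Xc.T @ yc / (n - 1)        # heterocovariance per spectral point
    scale = X.std(axis=0, ddof=1) * y.std(ddof=1)
    # Normalizing the covariance gives the correlation coefficient (cc);
    # points without any variance are set to 0 instead of dividing by zero.
    cc = np.divide(cov, scale, out=np.zeros_like(cov), where=scale > 0)
    return cov, cc

# Toy data: four fractions, three spectral points (invented values).
spectra = [[1, 4, 5], [2, 3, 5], [3, 2, 5], [4, 1, 5]]
activity = [10, 20, 30, 40]
cov, cc = hetca_1d(spectra, activity)
print(cc)
```

In a pseudospectrum of this kind, the covariance would typically set the signal direction and height, while the cc would set the color.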
The compounds contained in the MFs possess no aromatic structures, only a few alkenyl hydrogens, and many aliphatic protons. Thus, resonances in the chemical shift region δH 2.5–6.0 are limited, while the upfield region (δH 0.0–2.5) is rather crowded with overlapping signals. Based on the 1D HetCA results, 1 was statistically determined to be active, and 8 to be inactive. This is consistent with the experimental TGR5 bioactivity data. 7 and 12 could not be assigned to any noticeable signals in the 1D pseudospectrum and were therefore not covered by 1D HetCA. Quantitatively, 8 was the main component in the AMF1–AMF4 set, based on the high intensity of its proton resonances. Due to overlapping signals, the 1D HetCA analysis provided no further insight.
Anal. Chem. 2025, 97, 41, 22508–22517: Figure 2. (A) 1D HetCA pseudospectrum of AMF1–AMF4 depicting TGR5 bioactivity with δH resonances on the x-axis and the correlation coefficient on the y-axis. (B) Zoom into the region δH 2.30–5.50 and assignment to the involved TTs 1 and 8. (C) In the aliphatic region, only negatively correlated signals related to 8 could be determined.
2D HetCA Pseudospectra for the Proof-Of-Concept Study
To overcome the limitations of the 1D NMR-based approach, a 2D HetCA plot for the same sample set (AMF1–AMF4) was generated based on 1H–13C HSQC measurements (Figures S9–S12). To calculate HetCA with 2D NMR spectra, peak picking was performed for each individual spectrum. All cross-peak intensities together with their δH and δC values were collected in one file and imported into MATLAB. The next step was to import the bioactivity values for each microfraction package. HetCA was then performed in the same way as for the 1D spectra. (8,9) Finally, the 2D HetCA pseudospectrum was visualized, again using a color-code to show the cc. Independent of the original intensity, each cross-peak in the 2D HetCA spectrum was normalized to the same size so that all calculated signals have the same visibility. This means that each calculated pseudo cross-peak is represented graphically as a data point of uniform size, minimizing the risk of overlooking less intense signals.
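The steps described above (peak picking per spectrum, merging all cross-peaks into one table, correlating with bioactivity) could look roughly as follows in Python. This is a sketch under stated assumptions, not the published MATLAB code. The text does not specify how cross-peaks are matched between spectra, so the grid rounding used here (0.01 ppm in δH, 0.1 ppm in δC) is an assumption, as is treating a missing cross-peak as zero intensity. Function names, intensities, and activity values are invented; only the two shift pairs are taken from the article.

```python
import numpy as np

def peak_table(peak_lists, h_tol=0.01, c_tol=0.1):
    """Merge per-fraction HSQC peak lists into one intensity vector per cross-peak.

    peak_lists: one list per fraction of (delta_H, delta_C, intensity) tuples.
    Cross-peaks are matched by rounding onto an h_tol x c_tol ppm grid
    (assumed rule); a cross-peak missing in a fraction counts as intensity 0.
    """
    table = {}
    for i, peaks in enumerate(peak_lists):
        for d_h, d_c, intensity in peaks:
            key = (round(d_h / h_tol), round(d_c / c_tol))
            if key not in table:
                table[key] = {"shift": (d_h, d_c),
                              "intensity": np.zeros(len(peak_lists))}
            table[key]["intensity"][i] += intensity
    return table

def hetca_2d(peak_lists, activity):
    """Return (delta_H, delta_C, cc) for every cross-peak."""
    y = np.asarray(activity, dtype=float)
    yc = y - y.mean()
    result = []
    for entry in peak_table(peak_lists).values():
        xc = entry["intensity"] - entry["intensity"].mean()
        denom = np.sqrt((xc @ xc) * (yc @ yc))
        cc = float(xc @ yc / denom) if denom > 0 else 0.0
        result.append((*entry["shift"], cc))
    return result

# Invented intensities and activities for four fractions.
fractions = [
    [(5.25, 126.0, 9.0), (0.76, 15.1, 2.0)],
    [(5.25, 126.0, 7.0), (0.76, 15.1, 4.0)],
    [(5.25, 126.0, 3.5), (0.76, 15.1, 7.0)],
    [(0.76, 15.1, 9.0)],
]
activity = [80.0, 60.0, 30.0, 10.0]
pseudo = hetca_2d(fractions, activity)
for d_h, d_c, cc in pseudo:
    print(f"dH {d_h:.2f}, dC {d_c:.1f}: cc = {cc:+.3f}")
```

The resulting triples could then be drawn as a scatter plot with a fixed marker size and the cc as color, which corresponds to the uniform cross-peak size described in the text.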
Lessons learned from the 1D HetCA results were considered when preparing the data for the 2D analysis. For instance, essential marker signals in the 1D pseudospectrum, such as δH 5.25 for 1 and δH 3.18 for 8, were specifically targeted for 2D HetCA. The 2D HetCA of AMF1–AMF4 resulted in a 2D pseudospectrum with δH values on the x-axis, δC values on the left y-axis and the color-coded cc on the right y-axis (Figure 3).
Anal. Chem. 2025, 97, 41, 22508–22517: Figure 3. 1H-based 1D HetCA pseudospectrum (A) and 1H–13C HSQC-based 2D HetCA pseudospectrum (B) for TGR5 bioactivity of AMF1–AMF4. The x-axis and left y-axis of the 2D plot show the δH and δC values in ppm, the right y-axis represents the calculated correlation coefficient, color-coded from active (dark red), to inactive (dark blue) signals. Listed δH, δC value pairs see Table S4.
2D HetCA plots display signals based on δH, δC values, enabling a clearer interpretation due to a distinction between similar δH values that overlap in the 1D HetCA plot. In the δH 0.0–2.5 region, it allowed better differentiation of active and inactive signals. 1 could be assigned as positively correlated based on red cross-peaks at δH, δC 0.72, 55.0; 0.93, 15.2; 0.95, 20.9; 1.08, 23.3; 1.33, 38.8; 1.50, 32.7; 1.67, 23.9; and 2.19, 52.5. 8 was confirmed inactive by blue cross-peaks at δH, δC 0.76, 15.1; 0.82, 16.1; 0.97, 28.1; 0.98, 14.9; 1.02, 16.3; 1.60, 48.9; and 2.38, 47.9, but two false calculations occurred (1.21, 29.3 (dark orange) and 1.93, 29.2 (medium red)). The moderately active status of 12 was demonstrated by signals of different colors at δH, δC 0.87, 11.1 (light red); 0.90, 23.4 (green); 0.94, 16.9 (green); 0.97, 49.5 (orange); 1.29, 33.9 (medium red), 1.74, 44.6 (orange), 1.77, 29.2 (orange), 2.01, 39.7 (green), and 2.16, 39.7 (orange). 7 could be correctly assigned as inactive based on signals at δH, δC 1.04, 47.3 (yellow); 1.33, 36.7 (light blue); and 2.10, 37.2 (turquoise).
In the δH 3.00–5.50 region of the 1D HetCA plot, the signals at δH 4.58, 4.68, 3.80, and 3.33, belonging to 8, were correctly calculated as negatively correlated. In the 2D plot, the cross-peaks calculated for these resonances are displayed as signals with a weak or moderate positive correlation with activity (yellow, orange). Additionally, two 2D HetCA calculations in the same range were incorrect. The δH, δC value pair 5.30, 122.0 belonging to the moderately active compound 12 was predicted to be inactive (blue), and the δH, δC value pair 3.44, 76.1 belonging to the inactive compound 7 was calculated to be moderately active (orange). In the same ppm area, the 2D approach correctly identified 12 as moderately active (based on δH, δC 3.73, 71.2; 3.68, 75.9; and 3.43, 71.2), and 1 as active (based on δH, δC 5.25, 126.0).
Of 51 signals in the 2D plot, 34 were correctly classified as active, moderately active, or inactive, five were ambiguous, and 12 were incorrect. Dark red cross-peaks appeared only for active compound 1 and moderately active compound 12, while nine of 14 dark blue cross-peaks matched inactive compound 8. Whereas 1D HetCA only enabled the detection and correct TGR5 bioactivity classification of 1 and 8, 2D HetCA provided the mapping and classification of 1, 7, 8, and 12. Not only were the dispersion and visibility of the signals improved in the 2D HetCA, but a greater variation of the cc color-code gradations was also observed. Notably, a subsequent targeted isolation would focus on the correct NMR signals of the 2D HetCA.
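Expressed as proportions, the classification counts reported for the 2D plot can be recomputed as follows. The short Python snippet uses only the numbers given in the text; the variable names are hypothetical.

```python
# Classification of the 51 cross-peaks in the 2D HetCA plot (counts from the text).
counts = {"correct": 34, "ambiguous": 5, "incorrect": 12}
total = sum(counts.values())
shares = {label: 100.0 * n / total for label, n in counts.items()}

# Nine of 14 dark blue cross-peaks matched inactive compound 8.
dark_blue_share = 100.0 * 9 / 14

for label, share in shares.items():
    print(f"{label}: {counts[label]}/{total} = {share:.1f}%")
print(f"dark blue cross-peaks matching 8: {dark_blue_share:.1f}%")
```

About two thirds of the cross-peaks were thus classified correctly, and just under one quarter incorrectly.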
1D and 2D NMR-Based HetCA Analyses of MF33–35
The 1D NMR-based HetCA analysis revealed positively correlated and negatively correlated 1H NMR resonances associated with activity on RORγ; the 2D NMR-based HetCA calculations yielded 207 cross-peaks with color coding ranging from dark red to dark blue (Figure 4).
Anal. Chem. 2025, 97, 41, 22508–22517: Figure 4. 1D HetCA pseudospectrum of MF33–35 based on RORγ activity (A), with enlarged regions δH 1.50–5.05 (B) and δH 0.50–1.50 (C). (D) 2D HetCA pseudospectrum, based on 1H–13C HSQC data, with δH values on the x-axis, δC values on the left y-axis, correlation coefficient on the right y-axis. Positively correlated signals shown in dark red, negatively correlated signals in dark blue.
While many positively correlated features in the 1D plot matched the known bioactive compounds 1 and 2, others remained unassigned. A positively correlated doublet of doublets at δH 4.55 showed no correspondence to any reference TT and was used as a marker feature in subsequent 2D NMR-based HetCA and targeted isolation. In the 2D NMR-based HetCA, various positively and negatively correlated δH, δC pairs (Table S5) were observed, mostly at extreme cc values (1.0, dark red and −1.0, dark blue). Comparison with TT references 1 and 2 provided insights into the possible chemical structures responsible for the respective cross-peaks. Signals in the region between δH, δC 5.14–5.36, 125.2–128.9 likely arise from alkene moieties, while those in the region δH, δC 2.18–4.55, 41.0–78.9, could be assigned to protons of secondary alcohol carbons or tertiary protons at C-18 adjacent to a carboxyl group. Additional cross-peaks may originate from methylene groups adjacent to hydroxyl groups, next to the C-12 double bond or to a carboxyl group, as well as from tertiary protons (e.g., at C-9 or C-5), and various methylene and methyl groups.
To filter out the cross-peaks with the highest positive correlation to activity in the 2D pseudospectrum, signals with a cc > 0.990 were evaluated. Among the 14 δH, δC pairs identified, the marker feature at δH 4.55 (δC 69.3) reappeared. These signals spanned the entire 2D pseudospectrum (Table S6), suggesting diverse structural origins near different functional groups. None of the 14 features could be assigned to reference compounds.
All signals previously assigned in 1D HetCA also appeared as positively correlated with bioactivity in the 2D HetCA plot, with cc values between 0.934 and 0.985. The marker feature at δH, δC 4.55, 69.3 again showed the highest value (0.9916). Several methyl proton cross-peaks had strong correlations (cc > 0.97), including δH, δC 0.86, 21.9; 0.85, 21.7; 0.85, 19.7; 0.84, 19.7; 0.79, 17.0; 0.75, 17.0; and 0.70, 12.1. Unlike in 1D HetCA, the 2D plot allowed clearer differentiation of overlapping signals, mostly present in the aliphatic region, and a graded assessment of bioactivity. All 14 selected features, including the marker signal at δH, δC 4.55, 69.3, showed higher signal intensity in MF33 compared to MF34 and MF35, leading to its selection for targeted isolation of the underlying putative RORγ-modulating compound.
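The selection of the most strongly correlated cross-peaks amounts to a simple threshold filter on the cc. The Python sketch below illustrates this step with a shortened, partly invented list. Only the marker feature and its cc of 0.9916 are taken from the text. The cc values of the two methyl cross-peaks are placeholders (the text states only cc > 0.97 for them), and the last entry is entirely hypothetical.

```python
# (delta_H, delta_C, cc) triples for MF33-35.
cross_peaks = [
    (4.55, 69.3, 0.9916),   # marker feature, values from the text
    (0.86, 21.9, 0.975),    # methyl cross-peak, cc is a placeholder
    (0.70, 12.1, 0.972),    # methyl cross-peak, cc is a placeholder
    (1.00, 30.0, -1.000),   # hypothetical negatively correlated signal
]

CC_THRESHOLD = 0.990  # cut-off used in the text

# Keep cross-peaks above the threshold, strongest correlation first.
selected = sorted(
    (peak for peak in cross_peaks if peak[2] > CC_THRESHOLD),
    key=lambda peak: peak[2],
    reverse=True,
)
for d_h, d_c, cc in selected:
    print(f"dH {d_h:.2f}, dC {d_c:.1f}: cc = {cc:.4f}")
```

With the full peak table in place of this shortened list, the same filter would return the 14 δH, δC pairs listed in Table S6.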
Conclusions
In this study, a 2D HetCA approach was developed using an artificially mixed set of TTs. In a follow-up application, 2D HetCA pinpointed the bioactive compound 16 in a dichloromethane extract of E. japonica leaves (compound 16 content in EJD: 1.98%; Figure S25). This demonstrates the ability of the 2D HetCA approach to identify compounds that might otherwise be overlooked, even when they exhibit similar bioactivity to compounds present at higher concentrations. Although evaluating NMR data in 2D requires longer measurement times and more data interpretation, 2D HetCA provides reduced spectral overlap due to the second dimension (δC) and access to information that cannot be deduced from 1D HetCA pseudospectra. Further advantages include the finely subdivided gradation of the color coding and the size standardization of the cross-peaks in the pseudospectrum, which makes even less intense signals more visible.
In summary, 2D HetCA is a useful complement to the 1D HetCA approach. It provides a valuable tool for finding bioactive compounds in complex mixtures in a targeted, resource-efficient, and unambiguous manner, and it reduces the number of bioactivity tests required.




