Regional comparison study of Epimedium koreanum using UHPLC-QTOF/MS-based metabolomics approach

The untargeted metabolomics-based molecular networking approach, combined with multivariate analysis, is an effective strategy for distinguishing herbal medicine raw materials according to specific criteria. It reveals correlations between chemical constituents and the geographical habitats of plants, providing a valuable tool for quality control in industrial mass production. In this study, we investigated the chemical composition of Epimedium koreanum Nakai in detail and compared four extracts collected from distinct regions in South Korea using untargeted metabolomics tools. By combining UPLC-QTOF/MS analysis with multivariate statistical techniques, we elucidated the chemical composition and identified key chemical markers. In addition, molecular networking analysis revealed distinct clusters of flavonoids and phenolic acids, highlighting the influence of regional factors on metabolite profiles. These findings offer a promising avenue for improving quality control and traceability in the herbal medicine industry and underscore the important role of geographical variation in the chemical profiles of herbal products.


Introduction
Metabolomics is a scientific field dedicated to the comprehensive study of small molecules and metabolites within organisms. It involves identifying and quantifying these molecules at specific time points within a biological system using advanced analytical instruments such as mass spectrometry (MS) and nuclear magnetic resonance (NMR). Metabolomics is known for its potential to accelerate the discovery of new bioactive compounds [1], and its effectiveness increases substantially when it is integrated with statistical methods such as multivariate analysis. This integration not only simplifies data presentation but also substantially improves the interpretation of the data obtained from these techniques [2].
Multivariate analysis, applied to data sets with multiple variables, effectively discerns patterns, correlations, and structures within the data [3]. In plant metabolite profiling studies using UPLC-QTOF/MS in particular, this analysis helps pinpoint potential chemical markers based on diverse criteria. These markers, which are quantifiable characteristics or indicators, are essential for evaluating plant quality and for distinguishing plant sources from different regions [4]. Researchers increasingly use computational methods to extract and analyze chemical information. Molecular networking (MN), a notable computational strategy, substantially improves the visualization and interpretation of MS data, aiding the identification of molecules and the mining of chemical markers [5]. Integrating multivariate analysis with molecular networking therefore offers a promising untargeted approach for identifying chemical markers.
Epimedium koreanum Nakai (EKN), commonly known as horny goat weed, belongs to the family Berberidaceae; it is indigenous to South Korea and widely found in China and Japan [6,7]. EKN is a traditional herb with a history of use in functional foods, nutraceuticals, and pharmaceuticals. More than 130 secondary metabolites have been analyzed and classified from different Epimedium species [8], including prenyl-flavonoids, lignans, phenol glycosides, phenylethanoid glycosides, sesquiterpenes, acids, alkaloids, xanthones, and aldehydes. Notably, EKN is rich in flavonoids, especially 8-prenyl-flavonoid derivatives [9,10]. These secondary metabolites are responsible for various bioactivities, including antimicrobial, antioxidant, antimutagenic, immunomodulatory, estrogenic, hypercholesterolemia-regulating, anti-rheumatic, and androgenic activities [11-19]. Given the broad use of EKN, identifying its chemical markers is important for ensuring the quality and authenticity of its sources. This study analyzes four EKN extracts from distinct South Korean regions to authenticate their chemical constituents and distinguish their regional sources, using an integrated, untargeted metabolomics approach that combines multivariate analysis and molecular networking.

Collection and preparation of samples
Aerial parts of Epimedium koreanum Nakai (EKN) were collected from wild fields in four regions of South Korea: [S1] Wando, Jeollanam-do; [S2] Cheorwon, Gangwon-do; [S3] Yongin, Gyeonggi-do; and [S4] Hwacheon, Gangwon-do (Fig. S1). The collected plants were obtained from the Natural Product Central Bank at the Korea Research Institute of Bioscience and Biotechnology (KRIBB, Daejeon, Korea), where the voucher specimens (KPM028-045, PA000855, PA001124, and PA001125) were deposited. The samples were weighed in ten replicates for each region. Each dried sample (300 mg) was extracted with 10 mL of methanol by sonication at room temperature for 1 h, filtered, and evaporated using a rotary evaporator below 40 ℃. This process was repeated three times to obtain the total extract. Each of the four dried materials was thus extracted ten times independently, and all extracts were analyzed. The forty extracts (3 mg each) were each dissolved in 1 mL of MeOH for UPLC-QTOF/MS analysis.

UPLC-QTOF/MS analysis
EKN extracts were analyzed using a Waters Acquity UPLC system coupled to a Xevo G2-XS QTOF mass detector (Waters, Milford, MA, USA) and equipped with an Atlantis T3 C18 column (1.7 μm, 1.2 mm i.d. × 100 mm) operated at 35 ℃, with 0.1% formic acid in water as mobile phase A and 0.1% formic acid in acetonitrile as mobile phase B. Water was purified using a Milli-Q Academic system (Merck Millipore, Burlington, MA, USA). Acetonitrile and formic acid for UPLC-DAD (diode array detector)-QTOF/MS analysis were purchased from Merck Millipore and Sigma-Aldrich (St. Louis, MO, USA). The analysis, which focused on phenolic compounds including flavonoids, used the following gradient elution: 13% B for 0.00-1.00 min, 13-28% B for 1.00-7.00 min, 28-36% B for 7.00-10.00 min, 36-38% B for 10.00-12.00 min, 38-65% B for 12.00-16.00 min, 65-100% B for 16.00-16.01 min, 100% B for 16.01-18.50 min, 100-13% B for 18.50-18.51 min, and 13% B for 18.51-21.00 min. The flow rate was 0.4 mL/min, and the injection volume was 1 μL.
MS analysis was conducted exclusively in negative mode because a wider range of compounds was effectively ionized and detected in this mode, whereas positive mode did not yield satisfactory detection of these substances (Fig. S2). Data-dependent analysis was performed in negative mode under the following conditions: source temperature, 110 ℃; desolvation temperature, 350 ℃; capillary voltage, 2.3 kV; cone voltage, 40 V; and collision energy ramps of 20-40 eV at low mass (LM) and 50-90 eV at high mass (HM). Throughout the analysis, leucine enkephalin (m/z 554.2615) was used as the reference mass for mass correction. All raw data sets were converted to mzML format using MSConvert 3.0 and then processed with MZmine 2.53 [20] to extract molecular features (deconvolution, alignment, and integration) using manually set parameters based on ion peak m/z, retention time, and relative intensity. The aligned data were used for multivariate analysis and for Global Natural Products Social Molecular Networking (GNPS).
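The gradient program above is a piecewise-linear function of time, so the percentage of mobile phase B at any moment can be recovered by linear interpolation between the listed breakpoints. The sketch below assumes linear changes between consecutive breakpoints (the text implies this but does not state the curve type); the names `GRADIENT`, `percent_b`, and `solvent_b_volume_ml` are hypothetical. It also estimates how much mobile phase B one 21 min run consumes at 0.4 mL/min.

```python
import numpy as np

# Breakpoints (min, %B) transcribed from the gradient described in the text.
# Assumption: composition changes linearly between consecutive breakpoints.
GRADIENT = [
    (0.00, 13.0), (1.00, 13.0), (7.00, 28.0), (10.00, 36.0),
    (12.00, 38.0), (16.00, 65.0), (16.01, 100.0), (18.50, 100.0),
    (18.51, 13.0), (21.00, 13.0),
]
TIMES = np.array([t for t, _ in GRADIENT])
PCT_B = np.array([b for _, b in GRADIENT])


def percent_b(t_min):
    """Return %B (acetonitrile phase) at time t_min by linear interpolation."""
    return np.interp(t_min, TIMES, PCT_B)


def solvent_b_volume_ml(flow_ml_min=0.4):
    """Volume of mobile phase B per run: flow x integral of (%B / 100) dt.

    The trapezoidal rule is exact here because %B is piecewise linear.
    """
    area = np.sum(np.diff(TIMES) * (PCT_B[1:] + PCT_B[:-1]) / 2.0)  # %B * min
    return flow_ml_min * area / 100.0


if __name__ == "__main__":
    for t in (0.5, 4.0, 14.0, 17.0, 20.0):
        print(f"t = {t:5.2f} min -> {percent_b(t):5.1f} % B")
    print(f"B consumed per run: {solvent_b_volume_ml():.2f} mL")
```

Because `np.interp` clamps to the end values, times outside 0-21 min simply return the re-equilibration composition (13% B).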

Multivariate analysis
The processed data, containing ion peak information (m/z, retention time, and relative intensity), were exported as a CSV file. Each sample was labeled by group and replicate number (e.g., S1-1 to S1-10). Before principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) in SIMCA-P 12.0 (Umetrics, Umeå, Sweden), the data were mean-centered and Pareto-scaled. The heatmap and the variable importance in projection (VIP) score plot were generated using the web-based platform MetaboAnalyst 5.0 (https://www.metaboanalyst.ca/).
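Pareto scaling divides each mean-centered variable by the square root of its standard deviation, which reduces the dominance of high-intensity ions while inflating noise less than unit-variance scaling does. The sketch below applies it to a feature table and then computes PCA scores and explained-variance ratios by singular value decomposition. It is a NumPy illustration, not the SIMCA-P implementation; the random matrix is a hypothetical stand-in for the exported CSV (40 samples labeled S1-1 to S4-10), and the function names are hypothetical.

```python
import numpy as np


def pareto_scale(X):
    """Mean-centre each column and divide it by the square root of its SD."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    return (X - X.mean(axis=0)) / np.sqrt(sd)


def pca_svd(Xs, n_components=2):
    """PCA of an already centred matrix via SVD.

    Returns scores (samples x components), loadings (features x components)
    and the fraction of total variance explained by every component.
    """
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, explained


# Sample labels in the S<region>-<replicate> convention used in the text.
labels = [f"S{g}-{r}" for g in range(1, 5) for r in range(1, 11)]

# Hypothetical stand-in for the exported peak table (rows = samples).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=5.0, sigma=1.0, size=(len(labels), 50))

Xs = pareto_scale(X)
scores, loadings, explained = pca_svd(Xs, n_components=2)
print(f"PC1 {explained[0]:.1%}, PC2 {explained[1]:.1%}")
```

A constant column has zero SD and would need to be removed before scaling, since the division would otherwise produce NaN values.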

Molecular networking workflow
The processed MS/MS data were submitted to the GNPS web platform to generate molecular networks (MN). The MN and its settings are available at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=1f0eecd27ea04bb7a5d46bcaeefe7979. The MN was generated with a precursor ion mass tolerance of 0.1 Da, an MS/MS fragment ion tolerance of 0.5 Da, a minimum cosine score of 0.7, at least 6 matched fragment ions, and a minimum cluster size of 1. The spectra in the network were then searched against the GNPS spectral libraries; library matches were considered valid if they had a cosine score above 0.7 with at least 6 matched peaks. The resulting MN was visualized using Cytoscape 3.7.0. Components were tentatively identified by manual interpretation of the MS/MS spectra.
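Both the network edges and the library matches are gated by a cosine score (at least 0.7) and a minimum number of matched fragment ions (6), with fragments paired within a 0.5 Da tolerance. The sketch below computes a plain cosine score with greedy one-to-one peak matching to show how the two thresholds interact. It is a simplification: GNPS uses a modified cosine that can also pair fragments shifted by the precursor mass difference, which is not reproduced here, and the function names and example spectrum are hypothetical.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.5):
    """Cosine similarity of two centroided MS/MS spectra.

    spec_a, spec_b: lists of (mz, intensity) tuples.
    Peaks are paired one-to-one within `tol` Da, greedily taking the pairs
    with the largest intensity products first. Returns (score, n_matched).
    """
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0

    candidates = [
        (ia * ib, a, b)
        for a, (mza, ia) in enumerate(spec_a)
        for b, (mzb, ib) in enumerate(spec_b)
        if abs(mza - mzb) <= tol
    ]
    candidates.sort(reverse=True)  # largest intensity products first

    used_a, used_b, total = set(), set(), 0.0
    for product, a, b in candidates:
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            total += product
    return total / (norm_a * norm_b), len(used_a)


def passes_edge_filter(spec_a, spec_b, min_cosine=0.7, min_matched=6, tol=0.5):
    """Apply the thresholds stated in the text (cosine >= 0.7, >= 6 peaks)."""
    score, n_matched = cosine_score(spec_a, spec_b, tol)
    return score >= min_cosine and n_matched >= min_matched


# Hypothetical six-peak spectrum and a copy shifted by +5 Da.
spectrum = [(121.03, 15), (133.03, 10), (151.00, 20),
            (255.03, 30), (300.03, 45), (367.12, 60)]
shifted = [(mz + 5.0, i) for mz, i in spectrum]

if __name__ == "__main__":
    print(cosine_score(spectrum, spectrum), passes_edge_filter(spectrum, spectrum))
    print(cosine_score(spectrum, shifted), passes_edge_filter(spectrum, shifted))
```

The shifted copy shares no fragments within 0.5 Da, so a plain cosine rejects it; a modified cosine is designed to recover such pairs when the shift equals the precursor mass difference.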

UPLC-QTOF/MS analysis
The chemical composition of EKN extracts was investigated using UPLC-QTOF/MS in the negative ESI mode to identify various components.
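Annotation by accurate mass compares each detected [M-H]− ion with the value calculated from a candidate molecular formula and reports the difference in ppm, as in the Error (ppm) column of Table 1. A minimal sketch follows, using standard monoisotopic atomic masses and the proton mass; it handles only C, H, N, and O, and the helper names are hypothetical. Hyperoside (12, C21H20O12) serves as the worked example.

```python
# Monoisotopic masses of the most abundant isotopes (u).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688  # proton mass (u)


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass from an element -> count mapping."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def deprotonated_mz(neutral_mass):
    """m/z of the singly charged [M-H]- ion (loss of one proton)."""
    return neutral_mass - PROTON


def ppm_error(observed_mz, calculated_mz):
    """Mass error in parts per million."""
    return (observed_mz - calculated_mz) / calculated_mz * 1e6


hyperoside = {"C": 21, "H": 20, "O": 12}
calc = deprotonated_mz(monoisotopic_mass(hyperoside))
print(f"hyperoside [M-H]- calculated m/z {calc:.4f}")  # prints 463.0882
```

A detected ion at a hypothetical m/z 463.0875 would then give `ppm_error(463.0875, calc)` of about −1.5 ppm.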

Multivariate statistical analysis
Multivariate statistical analysis was conducted to assess relative variability and identify potential chemical markers among EKN samples from different locations. PCA, OPLS-DA, heatmap, and VIP score analyses were applied to visualize and pinpoint the chemical constituents correlated with the regional differences among EKN samples.

Principal component analysis (PCA)
PCA (Fig. 2) was performed to visualize clustering patterns among the EKN samples by region and to identify the metabolites associated with chemical variability. In the PCA score plot, PC1 accounted for 43.8% of the variance and PC2 for 14.3%. The samples formed four distinct clusters, Wando (S1), Cheorwon (S2), Yongin (S3), and Hwacheon (S4), each representing one region, with each point representing an individual sample. Notably, the S1 and S2 groups lay close together, whereas the S3 and S4 groups were more clearly separated. Furthermore, the S2 and S3 clusters pointed in opposite directions, suggesting a negative correlation between these groups. The S3 group was particularly distinct, positioned away from the other three groups in a positive region of the PCA score plot. The model's goodness of fit (R²X = 84.2%) and predictive ability (Q² = 56.4%) indicate that it discriminated effectively among the four groups, each comprising 10 samples (Fig. 2a). In addition, the PCA loading plot (Fig. 2b) highlighted the metabolites responsible for the differentiation among groups. Metabolites including rhamnocitrin 3-O-glucoside (6), hyperoside (12), ikarisoside B (21), epimedin B (30), epimedin C isomers (32, 33), epimedin L (43), and epimedin K (44) contributed to distinguishing the samples by geographical origin.
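For reference, the two PCA quality parameters quoted above have the following generic forms, where $\mathbf{E}_A$ is the residual matrix left after fitting $A$ components to the scaled data $\mathbf{X}$, PRESS is the sum of squared prediction errors from cross-validation, and SS is the corresponding total sum of squares. SIMCA accumulates Q² component by component, so these expressions are a simplified statement rather than the software's exact computation. The second line shows that PC1 and PC2 alone explain less than the reported R²X, which implies the model summarized by R²X contains more than two components.

```latex
R^2X = 1 - \frac{\lVert \mathbf{E}_A \rVert_F^2}{\lVert \mathbf{X} \rVert_F^2},
\qquad
Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}}

R^2X_{\mathrm{PC1}} + R^2X_{\mathrm{PC2}} = 43.8\% + 14.3\% = 58.1\% < 84.2\%
```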

Heatmap and VIP score analysis
The heatmap and VIP score plot (Fig. 3) revealed key metabolites based on their relative intensities, facilitating the identification of potential chemical markers. The heatmap visualized twelve markers that differed significantly among the four EKN samples; variation in metabolite intensity is shown by color depth, with deeper colors indicating greater intensity differences. A VIP score > 1 indicates that a variable is important within the dataset (Fig. 3b). Here, eight metabolites were considered the most important contributors to the overall model. Among these, some variables (15, 19, and 39) showed low intensity, which could compromise the accuracy of the analysis if they were used as chemical markers. Consequently, five compounds (4, 6, 12, 19, and 43) stood out as promising candidate chemical markers for differentiating EKN samples from the four locations.
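VIP scores combine, for each variable j, its weight in every PLS component with the response variance that component explains: with p variables, VIP_j = sqrt(p · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a). Because the squared VIP scores always average exactly 1, the "VIP > 1" rule selects variables with above-average influence. The NumPy sketch below fits a single-response PLS model by NIPALS on synthetic two-group data and computes VIP; MetaboAnalyst's PLS-DA implementation may differ in details (for example, it handles more than two classes), and all names here are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Single-response PLS (PLS1) by NIPALS. X and y must already be centred.

    Returns weights W (features x A), scores T (samples x A) and y-loadings q.
    """
    X = np.array(X, dtype=float)  # copies, so the caller's arrays are untouched
    y = np.array(y, dtype=float)
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_components))
    T = np.zeros((n_samples, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = X @ w                       # scores for component a
        tt = t @ t
        p_load = X.T @ t / tt           # X loadings, used for deflation
        q[a] = (y @ t) / tt
        X -= np.outer(t, p_load)        # deflate X
        y -= q[a] * t                   # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """Variable importance in projection for a PLS1 model."""
    n_features = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)    # y sum of squares explained per component
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(n_features * (w_sq @ ss) / ss.sum())


# Synthetic two-group example: only feature 0 carries a group difference.
rng = np.random.default_rng(1)
y = np.repeat([0.0, 1.0], 10)
X = rng.normal(size=(20, 6))
X[:, 0] += 3.0 * y
Xc, yc = X - X.mean(axis=0), y - y.mean()

W, T, q = pls1_nipals(Xc, yc, n_components=2)
vip = vip_scores(W, T, q)
print(np.round(vip, 2))
```

Feature 0, the only one carrying a group difference, receives the largest VIP score, while the noise features cluster around or below 1.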

Orthogonal projections to latent structures discriminant analysis (OPLS-DA)
OPLS-DA (Figs. S3-S8) was performed to discover potential chemical markers distinguishing pairs of groups, yielding six OPLS-DA models, one for each pairwise comparison.
In the comparison between S1 and S2, the loading S-plot (Fig. S3a) showed compounds 4, 12, 22, 32, and 40 lying far from the origin, i.e., the data mean. Of these, only compounds 22, 32, and 40 had VIP scores above 1 and were therefore considered potential discriminants between S1 and S2 (Fig. S3). A similar analysis comparing S1 and S3 (Fig. S4) highlighted compounds 6, 32, 33, and 43, all of which deviated from the mean and had VIP scores above 1, supporting their potential as markers differentiating S1 from S3. For S1 versus S4, the S-plot and subsequent VIP score analysis pinpointed compounds 6, 12, 22, 33, and 40 (Fig. S5). The comparison between S2 and S3 identified compounds 6 and 30 as distinguishing markers (Fig. S6), and the analysis of S2 versus S4 (Fig. S7) recognized compounds 6, 30, 43, and 44 as effective discriminators. Finally, the comparison of S3 and S4 identified compounds 30, 43, and 44 as potential markers, supported by the loading S-plot and VIP score analysis (Fig. S8). Together, these pairwise models highlight the key chemical markers for differentiating the groups.
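In an S-plot, each variable is placed by its covariance with the model's predictive score vector t (x-axis, p[1]) and by its correlation with t (y-axis, p(corr)[1]); variables at the two tips of the "S" combine a large contribution with high reliability. The sketch below computes both coordinates for any score vector, for instance predictive scores exported from an OPLS-DA model, and enumerates the six pairwise comparisons among S1-S4. The OPLS-DA fit itself is not reproduced, and the function name and toy data are hypothetical.

```python
import itertools

import numpy as np


def s_plot_coordinates(X, t):
    """Covariance and correlation of every column of X with score vector t.

    Returns (p_cov, p_corr), the x- and y-coordinates of an S-plot.
    Columns with zero variance give NaN correlations.
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    n = X.shape[0]
    p_cov = Xc.T @ tc / (n - 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        p_corr = p_cov / (Xc.std(axis=0, ddof=1) * tc.std(ddof=1))
    return p_cov, p_corr


# Six pairwise OPLS-DA models for four regions.
pairs = list(itertools.combinations(["S1", "S2", "S3", "S4"], 2))

# Toy check: feature 0 equals t, feature 1 equals -2t, feature 2 is noise.
rng = np.random.default_rng(2)
t = rng.normal(size=20)
X = np.column_stack([t, -2.0 * t, rng.normal(size=20)])
p_cov, p_corr = s_plot_coordinates(X, t)
print(pairs)
print(np.round(p_corr, 3))
```

Features 0 and 1 land at the two tips of the S (correlation +1 and −1), while the noise feature falls near the center.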

Comparison analysis of selected chemical marker
Based on these analyses, three metabolites (6, 12, and 43) were selected as chemical markers to distinguish the four EKN samples (Table 2). Fig. 4 presents a bar graph comparing the relative intensities of each compound across regions S1-S4. Notably, compound 6 showed higher intensity in Cheorwon (S2) than in the other regions, with its lowest level in Yongin (S3). In contrast, compounds 12 and 43 were most abundant in Hwacheon (S4) and least abundant in Wando (S1). Thus, the relative intensities of these compounds, as measured by mass spectrometry (Fig. S9), serve as a key metric for differentiating EKN samples from the four locations.
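The regional comparison in Fig. 4 amounts to summarizing each marker's relative intensity by region (mean and standard deviation over the replicates) and ranking the regions. A minimal sketch using only the Python standard library follows; the intensity values are made-up placeholders for illustration, not measurements from this study, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_marker(intensities):
    """Mean and SD of relative intensity per region, plus highest/lowest region.

    intensities: dict mapping region label -> list of replicate intensities.
    """
    summary = {r: (mean(v), stdev(v)) for r, v in intensities.items()}
    highest = max(summary, key=lambda r: summary[r][0])
    lowest = min(summary, key=lambda r: summary[r][0])
    return summary, highest, lowest


# Placeholder values (three replicates per region), NOT data from this study.
toy = {
    "S1": [1.0, 1.2, 1.1],
    "S2": [3.0, 3.3, 2.9],
    "S3": [0.4, 0.5, 0.45],
    "S4": [1.5, 1.6, 1.4],
}
summary, highest, lowest = summarize_marker(toy)
for region, (m, sd) in summary.items():
    print(f"{region}: {m:.2f} +/- {sd:.2f}")
print("highest:", highest, "lowest:", lowest)
```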

Molecular networking analysis
Molecular networking (MN) analysis (Fig. 5) was conducted to elucidate chemical characteristics and trends in metabolites.This analysis grouped

Discussion
This study used UPLC-QTOF/MS, multivariate statistical analysis, and molecular networking to identify and characterize chemical markers in EKN samples. This approach elucidated the chemical diversity and spatial variability among EKN samples from four regions in South Korea, showing the influence of geographical location on the chemical profile of EKN. Multivariate analyses (PCA, OPLS-DA, heatmap, and VIP score analysis) identified key metabolites that reflect this variability and serve as potential chemical markers. The three chemical markers identified (6, 12, and 43) not only facilitate the authentication of EKN but also improve understanding of how geographical factors shape its chemical variability. Furthermore, molecular networking provided a detailed visualization of the chemical relationships and classes of the constituents, revealing differences in cluster distribution among samples and underscoring the geographic specificity of the EKN chemical profile.
In conclusion, this study demonstrates the value of advanced analytical approaches for the comprehensive chemical profiling of natural products and suggests that the identified chemical markers could be used to trace geographical origin. Future research should investigate how environmental factors, such as soil and climate conditions, affect the chemical composition of EKN. Such work could deepen our understanding of how environmental variation shapes metabolite profiles, thereby improving the traceability and quality evaluation of herbal products.

Fig. 1 UPLC fingerprints of Epimedium koreanum by UV absorption at 280 nm and by mass spectrometry in negative mode

Fig. 3 Comparative metabolomic profiling of EKN from four regions. (a) Heat map analysis of the top twenty annotated peaks and (b) variable importance in projection (VIP) score plot

Table 1 (continued)
No. | tR (min) | Molecular formula | UV (nm) | Detected ion [M-H]− | Calculated ion [M-H]− | Error (ppm) | MS/MS fragment ions | Tentative identification
a Elucidated using NMR spectra and confirmed by comparison with reference standards.
... ion fragments at m/z 367 in compounds 16, 29-44, and 47-50.

Table 2
Quantitative comparison of peak areas for selected chemical markers

Table 3
Compound information clustered via molecular networking analysis based on MS/MS data in negative mode