Natural variation in tocopherols, B vitamins, and isoflavones in seeds of 13 Korean conventional soybean varieties

Soybean seeds are excellent sources of tocopherols, B vitamins, and isoflavones, which are well known for their health benefits. This study investigated the influence of environment and genotype on these constituents across 13 Korean soybean varieties cultivated in three locations during the 2017–2019 growing seasons. Statistical analyses, employing both univariate and multivariate methods, revealed significant impacts of genetic and environmental factors on the composition of tocopherols, B vitamins, and isoflavones. Through permutational univariate analysis of variance, the primary contributors to each measured component were identified. Genotype strongly influenced the levels of β- and δ-tocopherols, whereas the interaction between location and year predominantly affected α- and γ-tocopherols. Vitamin B1 content was predominantly determined by genotype, whereas B3 and B6 were influenced by annual variations. Vitamin B2 level was primarily affected by the interplay between environmental and genotypic effects. Genotype had a significant effect on isoflavone components, with the exception of daidzein. Furthermore, early maturing varieties and those with black seed coats exhibited low levels of isoflavone components and total isoflavones, suggesting a relationship between maturity group and seed coat color in isoflavone variation. These findings can be used as reference values for compositional equivalence assessment of genetically modified soybeans.


Introduction
Soybean (Glycine max (L.) Merr.) is an economically important crop worldwide and is consumed in various forms, such as soybean oil, soybean sprouts, soy paste, soymilk, and tofu [1]. Soybean seeds are excellent sources of protein, essential fatty acids, carbohydrates, and vitamins [1]. In addition, soybean seeds contain several high-value, health-beneficial secondary metabolites, including isoflavones, phenolic acids, and soyasaponins, which are considered to be the most effective natural antioxidants [2,3]. Owing to the high economic importance of soybean, new varieties with various traits are continuously developed using genetic engineering technologies and conventional breeding strategies and then introduced into the global market [4].
Soybean is the world's largest genetically modified (GM) crop owing to its agronomic, nutritional, and industrial interest and its amenability to genetic transformation. GM soybeans comprise 50% of global biotech crop production [5]. For GM crop biosafety assessment, the compositions of GM crops are compared with those of direct comparators (i.e., near-isogenic conventional controls) and of conventional comparators that have a history of safe consumption [6,7]. The range of compositional data of conventional comparators (reference data) is needed to evaluate whether the composition of a GM crop falls within the natural range of variation [8]. Reference data from comparators grown concurrently in the same field trials as the GM crop, ranges of their compositions reported in the Organization for Economic Co-operation and Development (OECD) consensus documents, the Crop Composition Database (CCDB, www.cropcomposition.org), and peer-reviewed scientific literature can be incorporated into the evaluation of GM crops. Therefore, we started developing a crop composition database (the National Institute of Agricultural Sciences) in South Korea to provide reference data for conventionally commercialized crops such as rice, red pepper, and soybean. Studies were conducted for several years in different regions of South Korea to obtain the ranges of compositional data according to genotype and environmental conditions, as recommended in the OECD consensus documents. Our composition data for rice and pepper were deposited in the CCDB (version 9.1) to expand the available crop composition data.
Soybean seeds are a rich source of vitamin E tocochromanols, which occur exclusively as tocopherols. Tocopherols are potent chain-breaking antioxidants that protect against lipid peroxidation [9]. Tocopherols can be differentiated into four isoforms (α, β, γ, and δ) based on the number and position of the methyl groups attached to their chromanol head. In soybean seeds, γ-tocopherol is the most abundant isoform and accounts for approximately 70% of the total tocopherol content. B vitamins are water-soluble vitamins that are important for the growth and metabolism of living organisms and act as cofactors in various metabolic reactions. Vitamin B2 comprises riboflavin, flavin adenine dinucleotide (FAD), and flavin mononucleotide (FMN). Riboflavin is a precursor to FAD and FMN. The major components of vitamin B3 are niacin and nicotinamide. Nicotinamide is converted into niacin by nicotinamide deamidase. Vitamin B6 comprises pyridoxal, pyridoxamine, and pyridoxine, which are converted to pyridoxal 5′-phosphate (PLP), the active form of vitamin B6. PLP plays a pivotal role in amino acid metabolism. In soybean, isoflavones occur as aglycones (daidzein, genistein, and glycitein) and as glycosides, including β-glycosides, acetyl glycosides, and malonyl glycosides. Malonyl glycosides are the most abundant type of isoflavone, whereas aglycones are present in very low concentrations [10].
Previous studies have demonstrated that the composition of soybean seeds is influenced by genotype, maturity, growing season, location, and agronomic practices [11–14]. The total amounts and proportions of α-, β-, γ-, and δ-tocopherols differ according to genotype [15,16]. In soybean seeds, higher levels of α-, γ-, and total tocopherols were observed in early maturing accessions, whereas higher levels of β-tocopherol were obtained from late-maturing accessions [17]. The content of α-tocopherol increased under warm temperatures or drought stress during seed maturation [18–20]. Few studies have identified the factors that affect B vitamins in soybean seeds. Kim et al. [21] showed that the contents of vitamins B1, B2, B3, B5, B6, and total B vitamins in seeds of 10 black and one yellow soybean varieties varied according to variety. Vedrina-Dragojević et al. [22] determined the contents of vitamins B1, B2, and B3 in four soybean genotypes over the course of 3 years and showed that climatic and genetic factors played a role in vitamin B synthesis. Isoflavone contents in soybean seeds are highly affected by both genetic and environmental factors, such as climate, planting location, crop year, and agricultural management [23–27]. In addition, isoflavone contents have been reported to correlate with seed coat color and days to maturity [28,29].
We recently reported how genotype and growth environment (three locations during the 2017 and 2018 growing seasons) influence the proximate, mineral, fatty acid, phytic acid, and trypsin inhibitor contents of 13 Korean commercial soybean varieties widely used for food in South Korea [30]. To expand on our previous results, the contents of vitamin E (tocopherols), B vitamins [B1 (thiamine), B2 (riboflavin), B3 (nicotinic acid), and B6 (pyridoxine)], and isoflavones (aglycones and glycosides) were determined in the same soybean seeds used in our previous study [30], in addition to soybean seeds grown in 2019 at the same locations. Understanding the natural variation in commercialized soybean varieties provides a critical baseline for comparing the characteristics of GM soybeans. We evaluated the natural variations in these components by identifying the effects of genotype, environment, and their interactions. These findings provide reference data for compositional equivalence assessment of GM soybeans.

Soybean materials and growing conditions
A total of 13 conventional Korean soybean varieties were grown in Suwon (37°27′50.02″N, 126°98′49.59″E), Iksan (35°94′40.02″N, 126°99′36.60″E), and Dalseong (35°90′66.92″N, 128°44′76.59″E), South Korea, during the 2017, 2018, and 2019 growing seasons. Information on the soybean varieties used in this study is presented in Additional file: Table S1. The plots at each site were arranged in a balanced strip design. Each plot consisted of two 10 m long rows with 20 cm seed spacing. Rows were approximately 0.6 m apart, and plots were separated by at least 0.8 m. The soil was a silt clay loam at all sites. Fertilizer was applied prior to planting at a rate of 30-30-32 (N-P-K) kg/ha. Appropriate pesticides were used to control disease and insects. Weeds were removed manually. Seeds were collected from individual plants at the R8 (full maturity) growth stage and then pooled and stored at 4 °C. The planting and harvesting dates are listed in Additional file: Table S2. The monthly precipitation (mm) and average temperature (°C) at the cultivation sites are presented in Additional file: Table S3. Climate data for each cultivation region were collected from the Korean Meteorological Administration website (http://weather.go.kr/w/index/do).

Compositional analysis of vitamin B1 (thiamine) and vitamin B2 (riboflavin)
Vitamins B1 and B2 were determined as described by Arella et al. [31], with slight modifications. A finely ground sample (0.1 g) was added to 0.1 M HCl and incubated in a water bath at 100 °C for 30 min. After cooling, the solution was adjusted to pH 4.5 with 2.5 M sodium acetate. A small quantity of distilled water (DW) was added to takadiastase (Sigma-Aldrich, St. Louis, MO, USA), and the solution was incubated for 3 h at 37 °C in a shaking incubator and then diluted to 4 mL with DW. The solution was filtered using a hydrophilic filter (0.45 μm), and the filtrate obtained was used for chromatographic determination of vitamin B2. For analysis of vitamin B1, an aliquot of the filtrate (0.5 mL) was transferred to a new tube containing an alkaline solution (1.5 mL) of potassium ferricyanide (1 mL of 1% potassium ferricyanide solution and 24 mL of 3.75 M sodium hydroxide solution). The solution was vortexed, allowed to stand for 1 min, and then passed through a Sep-Pak C18 cartridge (Waters Co., Milford, MA, USA). The cartridge was washed with 0.05 M sodium acetate (5 mL) and then eluted with methanol-water (70:30 v/v) (2 mL). The eluate was filtered through a PTFE 0.45 μm syringe filter (Hangzhou Anow Microfiltration Co., Ltd.), and the filtrate was used for HPLC analysis of vitamin B1 (as thiochrome). HPLC analysis was performed using a 1260 Infinity II HPLC system (Agilent Technologies, Santa Clara, CA, USA) with a C18 column (250 mm × 4.6 mm i.d., 5 μm particle size; Waters Co., Milford, MA, USA) under isocratic conditions, with a mobile phase consisting of methanol and 0.05 M sodium acetate (30:70 v/v) at a flow rate of 1.0 mL/min and a temperature of 30 °C. The fluorometric detector was operated at an excitation wavelength of 366 nm and emission wavelength of 522 nm for vitamin B2. Thiamine-HCl and riboflavin standards were purchased from Sigma-Aldrich, St. Louis, MO, USA.

Compositional analysis of vitamin B3 (niacin)
Niacin content was determined using gas chromatography-time-of-flight mass spectrometry (GC-TOFMS) as described by Lee et al. [32]. Briefly, 0.95 mL of 2.5 M sulfonic acid and 0.05 mL of an internal standard, D4-nicotinic acid (100 ppm in 0.1 N HCl), were added to 0.1 g of sample. After vortexing, samples were autoclaved for 15 min at 121 °C and then cooled down to 20-25 °C. The samples were centrifuged at 13,000 g for 5 min at 4 °C, and 0.2 mL of the supernatant was passed through an HLB PLUS LP extraction cartridge (Waters Co., Milford, MA, USA). After washing the cartridge with 2 mL of DW, niacin was eluted with 70% methanol, and the eluate was dried in a centrifugal concentrator (CVE-2000, Eyela, Tokyo, Japan). For derivatization, 50 µL of N-methyl-N-(trimethylsilyl) trifluoroacetamide (MSTFA) and 50 µL of pyridine were added, and the mixture was incubated in a Thermomixer comfort (Eppendorf, Hamburg, Germany) at 60 °C for 30 min with a 1,200 g mixing frequency. The derivatized samples were analyzed using an Agilent 7890 A gas chromatograph (Agilent, Atlanta, GA, USA) equipped with a 30 m × 0.25 mm i.d. fused silica capillary column coated with 0.25 μm CP-SIL 8 CB low bleed (Varian, Palo Alto, CA, USA). One microliter of each extract was injected into the capillary column at a split ratio of 1:15. Helium was used as the carrier gas at a flow rate of 1.0 mL/min, and the injector temperature was set at 280 °C. The oven temperature was programmed initially at 250 °C for 2 min and then from 250 °C to 290 °C at a rate of 1 mL/min with a final holding time of 8 min. The GC column effluent was analyzed using a Pegasus HT TOF mass spectrometer (LECO, St. Joseph, MI, USA). The temperatures of the source and interface were 250 °C and 290 °C, respectively. The MS spectra were monitored in full scan mode from m/z 70 to 600, and the detector voltage was set at 1800 V.

Compositional analysis of vitamin B6 (pyridoxine)
Pyridoxine was determined according to the procedure described by Choi et al. [33], with modifications. Briefly, 2.5 mL of 0.05 M sodium acetate (adjusted to pH 4.5 with formic acid) was added to 0.1 g of sample, and extraction was performed in a sonication water bath for 30 min at 40 °C. The sample was placed in a shaking incubator for 18 h at 37 °C. Then, 1.5 mL of DW was added, and the sample was vortexed. The sample was centrifuged at 13,000 g for 20 min at 4 °C. The supernatant was transferred into a new tube and then filtered using a PTFE 0.45 μm syringe filter (Hangzhou Anow Microfiltration Co., Ltd.). The filtrates obtained were injected into an HPLC system (Agilent Technologies 1260 Infinity II) equipped with a C18 column (Waters Symmetry, 5 μm, 4.6 × 250 mm, Waters Co., Milford, MA, USA). The mobile phase consisted of 0.02 M sodium acetate (adjusted to pH 3.6 with formic acid; mobile phase A) and acetonitrile (mobile phase B), with binary gradient elution according to the following program: 0-15 min, 98% A/2% B; 15-20 min, 60% A/40% B; 20-25 min, 60% A/40% B; 25-30 min, 98% A/2% B; 30-42 min, 98% A/2% B, at a flow rate of 1.0 mL/min and a column temperature of 30 °C. The fluorometric detector was operated at excitation and emission wavelengths of 292 and 396 nm, respectively. The pyridoxine standard was purchased from Sigma-Aldrich, St. Louis, MO, USA.

Compositional analysis of vitamin E (Tocopherol)
Vitamin E was determined using GC-TOFMS according to the procedure of Park et al. [34]. Ethanol containing 0.1% ascorbic acid (w/v) and 0.05 mL of 5α-cholestane (10 µg/mL) as an internal standard was added to the powdered soybean seed sample (0.1 g). After vortexing, the sample was placed in a water bath at 85 °C for 5 min. Thereafter, 120 µL of potassium hydroxide (80%) was added to the sample, and after vortexing, the sample was further incubated in a water bath for 10 min. The samples were immediately placed on ice, and deionized water (1.5 mL) and hexane (1.5 mL) were then added sequentially. After vortexing, the sample was centrifuged (1,200 g, 5 min, 20 °C). The upper layer was transferred to a new tube, and the pellet was re-extracted with hexane. The hexane fraction was then dried using a centrifugal concentrator (CVE-2000; Eyela, Tokyo, Japan). For derivatization, MSTFA (30 µL) and pyridine (30 µL) were added, and the mixture was incubated in a Thermomixer comfort (Eppendorf, Hamburg, Germany) at 85 °C for 5 min with a 1,200 g mixing frequency. The GC-TOFMS system was the same as that used for niacin analysis, except for the oven temperature, which was programmed from 250 °C to 290 °C at a rate of 10 °C/min with a final holding time of 10 min. The temperatures of the source and interface were 250 °C and 290 °C, respectively. The MS spectra were monitored in full scan mode from m/z 50 to 800, and the detector voltage was set at 1800 V. The tocopherol standard set was purchased from EMD Millipore Corp. (Billerica, MA, USA).

Compositional analysis of isoflavone
For isoflavone extraction, 1.2 mL of a 75% (v/v) ethanol solution was added to 0.3 g of ground sample, and the mixture was sonicated for 1 h in a sonication water bath at 25 °C. After centrifugation (2,000 g for 10 min at 4 °C), 800 µL of the supernatant was transferred to a new tube and 150 µL of 2 N NaOH was added. The sample was allowed to stand at ambient temperature for 10 min and then mixed with 50 µL of acetic acid. After filtration through a PTFE 0.45 μm syringe filter (Hangzhou Anow Microfiltration Co., Ltd.), the isoflavone concentration was analyzed using the HPLC method described in [35], with a slight modification. Specifically, 0.3 µL of the filtered extract was injected into an HPLC system (Agilent Technologies 1260 Infinity) equipped with a Cosmosil 2.5 C18-MS-II column (50 mm × 2.0 mm ID, 2.5 μm, Nacalai Tesque, Inc., Japan). A linear HPLC gradient was employed. Solvent A was acetonitrile, and solvent B was 1% trifluoroacetic acid in water, according to the following program: 0-0.35 min, 90% A/10% B; 0.35-3.96 min, 30% A/70% B; 3.96-4.32 min, 30% A/70% B; 4.32-9 min, 10% A/90% B, at a flow rate of 0.58 mL/min and a column temperature of 30 °C. The isoflavone standards (daidzin, daidzein, genistin, genistein, glycitin, and glycitein) were purchased from Sigma-Aldrich, St. Louis, MO, USA.

Statistical analysis
Statistical analysis was performed using SAS Enterprise Guide 7.0 (SAS Institute, 1999). One-way analysis of variance (ANOVA) was conducted to identify differences among soybean varieties, locations, and cultivation years. Means were separated using Bonferroni-corrected t-tests, and statistically significant differences were determined at a probability level of p < 0.05. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were performed with auto-scaled and log-transformed data using SIMCA version 13 (Umetrics, Umeå, Sweden) [36]. The quality of each PLS-DA model was evaluated based on the goodness of fit, measured by R²X(cum) and R²Y(cum), and the predictive ability, measured by Q²(cum). To assess whether the PLS-DA models were overfitted, a permutation test (n = 200) was performed with 7-fold cross-validation [37]. Permutational univariate analysis of variance (PERMANOVA) was used to estimate the explanatory power of the variance components of varieties (V), years (Y), locations (L), and their interactions (V×L, V×Y, L×Y, V×L×Y) for each component, using the Plymouth Routines in Multivariate Ecological Research (PRIMER) software package version 7.0 with the PERMANOVA add-on (PRIMER-E Ltd, UK) [38,39]. The test was computed on raw data using 999 permutations at a significance level of 0.01.
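The preprocessing and PCA step described above can be illustrated with a minimal sketch in Python with NumPy. It is not the SIMCA implementation: the order of operations (log-transform first, then auto-scaling) is an assumption, the function names are hypothetical, and the data values are invented purely for illustration.

```python
import numpy as np

def log_autoscale(X):
    """Log10-transform, then auto-scale each column (mean 0, unit variance).

    Assumption: the log transform is applied before auto-scaling.
    """
    logged = np.log10(X)
    return (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)

def pca_scores(Z, n_components=2):
    """Return PCA scores and the fraction of variance explained per component."""
    # SVD of the centered, scaled matrix; columns of U * s are the scores.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical data: 6 samples x 3 analytes (invented values, not from this study).
X = np.array([
    [5.1, 60.2, 3.4],
    [4.8, 58.9, 3.9],
    [6.4, 55.2, 2.9],
    [3.5, 57.0, 4.2],
    [5.9, 59.5, 3.1],
    [4.2, 56.3, 3.7],
])

Z = log_autoscale(X)
scores, explained = pca_scores(Z)
print("t1/t2 scores:\n", scores)
print("fraction of variance explained:", explained)
```

Auto-scaling gives every analyte equal weight regardless of its concentration range, which matters here because the measured components span very different magnitudes.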

Tocopherol contents
The contents of individual and total tocopherols in 13 Korean soybean varieties across three locations (Suwon, Iksan, and Dalseong) over 3 years (2017, 2018, and 2019) are presented in Table 1. γ-Tocopherol is known to be the major form of seed tocopherol in soybeans, accounting for 60 to 70% of total tocopherols in soybean seeds. In contrast, α-, β-, and δ-tocopherols are often present at lower concentrations [13]. It has been reported that the total amounts and proportions of α-, β-, γ-, and δ-tocopherols differ according to genotype [14,15]. In the present study, the total tocopherol content of soybeans ranged from 66.7 (CHO) to 89.2 µg/g (DW) across all the locations and years. The contents of γ- and δ-tocopherols accounted for 65 to 80% and 8 to 12% of the total tocopherols, respectively, whereas those of α- and β-tocopherols accounted for 5 to 15% and 2 to 8%, respectively. The concentrations of α- and β-tocopherols ranged from 4.8 µg/g (DW) to 11.1 µg/g (SP) and from 1.2 µg/g (PSN) to 3.9 µg/g (SP), respectively. The concentrations of δ- and γ-tocopherols ranged from 6.3 (SCJ) to 14.5 µg/g (DW) and from 44.4 µg/g (SO) to 68.3 µg/g (DW), respectively. In addition, Ghosh et al. [17] used 493 soybean accessions of different origins belonging to seven maturity groups and showed a relationship between the maturity groups of cultivars and their tocopherol concentrations: higher levels of α-, γ-, and total tocopherols were observed in early maturing accessions, whereas higher levels of β-tocopherol were obtained from late-maturing accessions. However, in our study, no consistent relationship was observed between maturity groups and their tocopherol content (Table 1, Additional file: Tables S4-S6). Moreover, two early maturing varieties, CHO and SO, showed the lowest γ- and total tocopherol contents across environments (Table 1). The absence of a clear relationship between tocopherol content and maturity group in our data is most likely due to our small sample size.
In addition to the genotypic factor, the yearly difference at the same location was important for individual and total tocopherol contents, with the exception of β-tocopherol at Dalseong (Table 2). Total tocopherol and the four individual tocopherol contents were the lowest in 2017 in Iksan. Overall, this pattern was observed in all the varieties grown in Iksan in 2017 compared to those grown in the other 2 years (Additional file: Table S4). Some varieties, such as CHO, DC-2, DP, and UR, had lower tocopherol contents in Iksan in 2017 than in the other 2 years. Tocopherol composition is greatly affected by the growth environment, especially during stages R5-R7 [13,14]. For instance, higher α-tocopherol concentrations and lower δ- and total tocopherol concentrations were observed in warmer environments than in cooler environments [19]. In addition, soil moisture and irrigation during the seed-filling period affect the tocopherol composition of soybean seeds. Drought stress increased α-tocopherol concentrations but decreased δ- and γ-tocopherol concentrations [18]. However, in Iksan during 2017, the average air temperature and precipitation in September and October, when seed filling occurs, were not exceptional compared to those in 2018 and 2019 (Additional file: Table S3). The lower tocopherol concentrations in Iksan in 2017 might be attributable to other conditions such as soil fertility and crop management practices [13].

B vitamin contents
It is known that the contents of B vitamins in crops such as beans and wheat grains are influenced by variety, seed maturity, and cultivation environment [22,40]. However, the effects of environmental and genetic factors on B vitamin accumulation in soybean seeds have not been sufficiently studied. Vedrina-Dragojević et al. [22] reported that thiamine and riboflavin concentrations differed markedly among four soybean varieties and with climatic conditions over 3 years. In contrast, niacin content was similar between the cultivars in the same year. Kim et al. [21] showed that the vitamin B1, B2, B3, B5, and B6 contents in seeds of 10 black soybean varieties were affected by genotypic factors alongside the differences in cotyledon color. In the present study, the differences in vitamin B1, B2, and B6 contents between the varieties were significant, whereas vitamin B3 content was not significantly different (Table 1). The contents of vitamins B1, B2, B3, and B6 ranged from 3.5 (SP) to 6.4 (SO), 0.8 (CHO) to 1.1 (PSN), 55.2 (DC) to 60.0 (PSN), and 2.9 (SP) to 4.2 µg/g (SCJ), respectively. A year effect at each location across varieties was observed for all four B vitamins, with the exception of vitamin B2 in Dalseong (Table 2). In Suwon, vitamin B1 was the highest in 2017, whereas vitamins B2, B3, and B6 were the lowest in 2017.
For the Iksan-grown samples, the contents of vitamins B1 and B2 were the highest in 2019, whereas those of vitamins B3 and B6 were the highest in 2018.In Dalseong, the contents of vitamins B1 and B6 were the highest in 2019, whereas that of vitamin B3 was the highest in 2018.
In addition, the effect of year at each growth location on isoflavone biosynthesis was significant (Table 4). The contents of daidzin, genistin, and total isoflavones were significantly higher in 2018 than in the other years at all locations (Table 4). The content of glycitein did not differ according to year at any location, whereas that of glycitin was the lowest in 2017 at Iksan and Dalseong. In previous studies, lower isoflavone concentrations were generally observed in early maturing cultivars than in late-maturing soybean cultivars [28,29]. In our study, the varieties CHO and SO, which belong to the early maturing ecotype, and SCJ and CJ-3, which have a black seed coat, tended to have lower isoflavone concentrations (Table 3). It is known that soybean seed isoflavone concentration and composition are influenced by temperature during seed development [19,25]. When plants at the R6 growth stage that had been grown under intermediate night/daytime temperatures of 18/28 °C were subjected to either intermediate (18/28 °C), low (13/23 °C), or high (23/33 °C) temperature conditions, the decrease in temperature significantly increased the isoflavone concentrations [25]. The lower isoflavone content in early maturing varieties might therefore be attributable to higher temperatures during seed maturation than those experienced by mid-late- and late-maturing varieties (Table 1). Some studies have investigated the relationship between isoflavone content and seed coat color; for instance, high total isoflavone content has been reported in black soybeans [29,41]. In contrast, Lee et al. [42] found less isoflavones in black soybeans than in yellow soybeans. These authors [27,29,42] suggested that there is no consensus regarding the relationship between isoflavone content and seed coat color.
Despite significant environmental effects on isoflavone concentration, varieties with consistently high and low isoflavone concentrations across environments were observed by Seguin et al. [26], who investigated 20 cultivars grown in replicated trials at two sites in Montreal, Canada, in 2002/2003. Similarly, in our study, the ranking of some varieties with the highest and lowest total isoflavone concentrations was relatively stable across locations and years. DP-2, PW, SP, and UR consistently had the highest total isoflavone concentrations at each location in each year (Additional file: Tables S7-S9). In contrast, SO and TG consistently had the lowest total isoflavone concentrations in each of the nine location-by-year environments. These results revealed the existence of genetic differences in total isoflavone concentrations.

Chemometric analyses
Principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) are among the most common chemometric tools for extracting information from multivariate data of biological systems [43]. When unsupervised PCA was performed to visualize the separation of compositional data among varieties, locations, and years, the first two components (t1 and t2) accounted for 26% and 18% of the total variance, respectively (Fig. 1). Each point represents a particular sample. Large variances among samples of the same variety were clearly observed on the PCA score plot (Fig. 1a), indicating that there are considerable environmental effects on the composition. Notably, SO and SP were separated along PC1. The loading plot of the corresponding PCA indicated that the discrimination of SO and SP could be attributed in part to differences in the levels of vitamin B1, daidzin, and genistin (Fig. 1d). The PCA was in agreement with the levels of vitamin B1, daidzin, and genistin in SO and SP (Tables 1 and 3). There was no clear separation among the three locations, except that some of the Iksan-grown samples were differentiated from the Suwon- and Dalseong-grown samples (Fig. 1b). The year 2017 could be separated from 2018 and 2019 along t2 (Fig. 1c).
Since PCA is an unsupervised method that does not take varieties, locations, or years into account when defining the components, PLS-DA, a supervised classification method, was also applied (Fig. 2). The R² and Q² parameters of the PLS-DA model were used to measure the goodness of fit and predictive ability of the model, respectively. These values range from zero to one. The model fits and predicts better as these values approach 1.0, and a model with Q² > 0.5 is considered to have good predictive capacity [36]. The score plot of PLS-DA according to variety (R²X = 0.578, R²Y = 0.267, Q² = 0.183) showed some differences among the varieties on t1 and t2, although some varieties were not significantly different (Fig. 2a). The two components t1 and t2 accounted for 25.2% and 10.6% of the total model variance, respectively. Variable importance in projection (VIP) scores were obtained from the PLS-DA and then used to identify potential metabolites for discrimination. Variables with a VIP score > 1 were considered more important for classification. β-Tocopherol, glycitein, δ-tocopherol, vitamin B1, and glycitin contributed to the separation of varieties. The score plot of PLS-DA by location (R²X = 0.564, R²Y = 0.297, Q² = 0.09) showed that there were no apparent differences among the three locations (Fig. 2b). Daidzein, vitamin B2, and genistein contributed to the differences between the locations (Fig. 2b). The score plot of PLS-DA by year (R²X = 0.583, R²Y = 0.588, Q² = 0.502) showed some differences among the 3 years (Fig. 2c). Data from soybeans grown in 2017 were different from the data from 2018 and 2019. Vitamins B6 and B3 were the most significant components in the PLS-DA model for the separation of data according to cultivation year (Fig. 2c).
The Q² values of the PLS-DA models for variety (Fig. 2a) and location (Fig. 2b) were 0.183 and 0.09, respectively, suggesting poor predictive ability. The Q² value of the PLS-DA model for year (Fig. 2c) was > 0.5, indicating good predictability. Q² values strongly depend on the properties of a dataset, such as the number of observations. Models with low Q² can nevertheless be validated by a permutation test when the predictive ability of the original model is greater than that of any model built on randomly permuted y variables [37,44]. Results of the permutation tests for the three models are shown in Additional file: Fig. S1. All the permuted R² and Q² values were smaller than the original values of their models, and the y-axis intercept of the permuted Q² values was negative. This suggests that the models were acceptable.
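The goodness-of-fit and predictive-ability statistics share one formula and differ only in where the predictions come from: fitted values for R², held-out cross-validation predictions for Q². The short Python sketch below illustrates this with invented class-membership values and predictions; the function name and all numbers are hypothetical and are not taken from this study or from SIMCA.

```python
import numpy as np

def explained_fraction(y, y_pred):
    """1 - (residual sum of squares / total sum of squares about the mean)."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical dummy response: 1 = sample belongs to the class, 0 = it does not.
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

# Invented predictions from a model fitted on all samples (used for R2Y).
y_fit = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.3])

# Invented predictions for each sample while it was held out (used for Q2).
y_cv = np.array([0.7, 0.6, 0.4, 0.4, 0.2, 0.5])

r2y = explained_fraction(y, y_fit)
q2 = explained_fraction(y, y_cv)

print(f"R2Y = {r2y:.3f}, Q2 = {q2:.3f}")
print("good predictive capacity (Q2 > 0.5):", q2 > 0.5)
```

In this invented case the fit is good but the held-out predictions are much weaker, which is the same pattern as a model with acceptable R²Y and low Q².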
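The logic of a permutation test can be sketched in a few lines of Python with NumPy. This is a deliberately simplified stand-in, not the SIMCA procedure: it swaps PLS-DA for a nearest-centroid classifier and Q² for leave-one-out accuracy, and the data are synthetic. Only the principle is the same: the statistic for the true labels is compared with its distribution under randomly permuted labels.

```python
import numpy as np

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for Q2)."""
    hits = 0
    n = len(y)
    for i in range(n):
        keep = np.arange(n) != i
        classes = np.unique(y[keep])
        centroids = np.array([X[keep & (y == c)].mean(axis=0) for c in classes])
        nearest = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        hits += int(nearest == y[i])
    return hits / n

def permutation_test(X, y, n_perm=200, seed=0):
    """Compare the observed statistic with statistics from permuted labels."""
    rng = np.random.default_rng(seed)
    observed = loo_accuracy(X, y)
    permuted = np.array([loo_accuracy(X, rng.permutation(y)) for _ in range(n_perm)])
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + np.sum(permuted >= observed)) / (1 + n_perm)
    return observed, permuted, p_value

# Synthetic data: two well-separated classes of 10 samples each.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 1.0, (10, 2)),
               data_rng.normal(10.0, 1.0, (10, 2))])
y = np.array([0] * 10 + [1] * 10)

observed, permuted, p_value = permutation_test(X, y)
print("observed accuracy:", observed)
print("mean permuted accuracy:", permuted.mean())
print("p-value:", p_value)
```

If the observed statistic sits well above everything obtained from permuted labels, the apparent class separation is unlikely to be an artifact of overfitting.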

Analysis of variance using permutational univariate analysis (PERMANOVA)
PERMANOVA, a nonparametric analysis of variance, can partition variation directly among individual terms in a multifactorial ANOVA model [45]. In this study, PERMANOVA based on 999 permutations and Euclidean distances was performed for each component, with variation partitioned among the factors (genotype, location, year, and their interactions) using Type III sums of squares, to determine the contribution of each factor to composition (Table 5). The component of variation in PERMANOVA (COV) is a value that indicates the degree of influence of each factor. A higher COV indicates a greater influence of a specific factor or interaction effect [38]. Table 5 summarizes the PERMANOVA results as pseudo-F, p (perm), and COV values.
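As a rough illustration of how a pseudo-F and a permutation p-value arise from Euclidean distances, the following Python sketch implements a one-factor PERMANOVA for a single variable. It is a simplified assumption-laden example: it handles one factor only (no interactions, no Type III partitioning, no COV estimate), it is not the PRIMER implementation, and the nine data values and group labels are invented.

```python
import numpy as np

def pseudo_f(D2, groups):
    """Pseudo-F from a matrix of squared distances and a vector of group labels."""
    n = len(groups)
    labels = np.unique(groups)
    # Total sum of squares: sum of squared pairwise distances divided by n.
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    a = len(labels)
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permanova(X, groups, n_perm=999, seed=0):
    """One-factor PERMANOVA on Euclidean distances with label permutation."""
    rng = np.random.default_rng(seed)
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(axis=2)  # squared Euclidean distances
    f_obs = pseudo_f(D2, groups)
    f_perm = np.array([pseudo_f(D2, rng.permutation(groups)) for _ in range(n_perm)])
    p_perm = (1 + np.sum(f_perm >= f_obs)) / (1 + n_perm)
    return f_obs, p_perm

# Invented data: one measured component in three hypothetical groups of three samples.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]).reshape(-1, 1)
groups = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])

f_obs, p_perm = permanova(X, groups)
print("pseudo-F:", f_obs)
print("p(perm):", p_perm)
```

For a single variable with Euclidean distances, the pseudo-F equals the classical ANOVA F statistic; the difference lies in obtaining the p-value from permutations rather than from the F distribution.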
Tocopherol concentrations are mainly influenced mainly by genetic and environmental factors such as temperature during seed filling and soil moisture [13,18].The relative contribution of these factors of variation to tocopherol composition is quite contradictory in previous studies; Whent et al. ( 2009) reported that α-(57%), γ-(70%), δ-(43%), and total tocopherol (69%) contents were most affected by genotype.The second most important source of variation for individual tocopherol was the environment, followed by genotype-by-environment interactions.However, Carrera et al. [13] reported that the environment accounted for most of the total variation in the concentration of α-(84%), γ-(38%), δ-(84%), and total tocopherol (41%).In our study, the L×Y effect was the most significant factor for α-(COV = 197.62)and γ-(COV = 76.91)tocopherols, whereas the V effect was for β-(COV = 59.85) and δ-(COV = 26.62)tocopherols (Table 5).Notably, tocopherol isomers with the same benzoquinol structures present in the same biosynthetic pathway are influenced by the same variance factors.The different results between studies may be due to differences in growing locations, evaluation years, and genotypes.
To date, few studies have examined the genotypic and/or environmental factors affecting the concentrations of B vitamins. Our results revealed that variation in vitamin B1 content was attributable mainly to the V effect (COV = 15.31), followed by the L×Y effect (COV = 8.28). Vitamin B3 (COV = 2.99) and B6 (COV = 49.13) contents were attributable to the year effect, whereas the vitamin B2 content was mainly affected by the V×L×Y effect (COV = 19.86), followed by the V×L effect (COV = 12.75) (Table 5). With regard to the isoflavone content, the V effect was the largest contributor for daidzin (COV = 11.02), genistein (COV = 53.15), genistin (COV = 61.42), glycitein (COV = 236.66), and glycitin (COV = 118.74). However, the daidzein content was influenced by the V×L×Y effect (COV = 123.38) more than by the V effect alone (COV = 63.51). Our results agree with those of Hoeck et al. [46] and Zhang et al. [28], who reported that genetic factors play a more important role in isoflavone accumulation than environmental factors, such as site and year, or interactions between genetic and environmental factors. These results were further supported by the variables identified as contributing to the discrimination by variety, location, and year in the PLS-DA models: β-tocopherol, glycitein, and δ-tocopherol for variety; daidzein, vitamin B2, and genistein for location; and vitamins B6 and B3 for year (Fig. 2). This agreement indicates that our evaluation of the major factors determining the contents of these components is reliable.
This study investigated the impact of genotypic and environmental factors on the tocopherol, B vitamin, and isoflavone contents of 13 Korean soybean varieties differing in seed coat color, maturity duration, and food usage. Our findings revealed significant effects of both genotypic and environmental variables on these seed constituents. Using the PLS-DA models, we observed a greater influence of the cultivation year on the measured components than of variety or location. PERMANOVA highlighted genetic factors as the primary sources of variation in β- and δ-tocopherol, vitamin B1, daidzin, genistein, genistin, glycitein, and glycitin accumulation. Additionally, location-by-year interactions significantly affected α- and γ-tocopherols. Thus, optimizing the growing environment is crucial for enhancing α- and γ-tocopherols in soybean seeds. Furthermore, cultivation year was a key determinant of vitamin B3 and B6 contents, whereas daidzein and vitamin B2 contents were influenced by genotype-by-environment interactions. Notably, isoflavone accumulation was lower in early maturing varieties than in late maturing varieties. These findings contribute to a better understanding of the factors governing seed composition in soybeans and expand the compositional dataset of commercial soybeans for the safety assessment of genetically modified soybeans.

Fig. 1 Score and loading plots of principal components 1 and 2 of the principal component analysis (PCA) generated from tocopherols, B vitamins, and isoflavones. PCA score plots colored according to variety (a), location (b), and year (c)

Fig. 2 Projection to latent structures discriminant analysis (PLS-DA) score plots and variable importance in the projection (VIP) score plots of tocopherols, B vitamins, and isoflavones. Compositional data were subjected to PLS-DA according to variety (a), location (b), and year (c)

Table 5
Results of the PERMANOVA for tocopherols, B vitamins, and isoflavone contents in seeds of 13 soybean cultivars grown at three locations for 3 consecutive years
Data are presented as the mean and standard deviation (µg/g dry weight basis). Means in the same column followed by the same letter(s) are not significantly different at P < 0.05 based on the Bonferroni test. Total toc summed up the contents of four tocopherols. Toc, tocopherol

Table 2
Contents of tocopherols and B vitamins in seeds of 13 soybean varieties in each environment across varieties. Data are presented as the mean and standard deviation (µg/g dry weight basis). Means in the same column followed by the same letter(s) are not significantly different at P < 0.05 based on the Bonferroni test. Total toc summed up the contents of four tocopherols. Toc, tocopherol

Table 3
Contents of isoflavones in seeds of 13 soybean varieties across three locations for 3 years. Data are presented as the mean and standard deviation (µg/g dry weight basis). Means in the same column followed by the same letter(s) are not significantly different at P < 0.05 based on the Bonferroni test

Table 4
Contents of isoflavones in seeds of 13 soybean varieties in each environment across varieties. Data are presented as the mean and standard deviation (µg/g dry weight basis). Means in the same column followed by the same letter(s) are not significantly different at P < 0.05 based on the Bonferroni test