Nondestructive Sequencing of Enantiopure Oligoesters by Nuclear Magnetic Resonance Spectroscopy
- Photo: JACS Au 2022 2 (9), 2108-2118: graphical abstract.
In a research article published in JACS Au, researchers from Seoul National University, Korea, demonstrated a method for nondestructively decoding digital information stored in sequence-defined oligomers of enantiopure α-hydroxy acids using 13C nuclear magnetic resonance spectroscopy.
The researchers successfully decoded a 192-bit bitmap image encoded within a library of oligo(L-mandelic-co-D-phenyl lactic acid)s and oligo(L-lactic-co-glycolic acid)s, synthesized rapidly via semi-automated flow chemistry. This approach bypasses traditional degradative sequencing methods, preserving the information-storing polymers. These findings underscore the potential of sequence-defined oligomer bundles for large-scale, nondestructive data storage, leveraging automated synthesis and efficient decoding techniques.
The original article
Nondestructive Sequencing of Enantiopure Oligoesters by Nuclear Magnetic Resonance Spectroscopy
Jeong Min Lee, Heejeong Jang, Seul Woo Lee, and Kyoung Taek Kim
JACS Au 2022 2 (9), 2108-2118
DOI: 10.1021/jacsau.2c00388
licensed under CC-BY 4.0
Selected sections from the article follow. Formats and hyperlinks were adapted from the original.
Abstract
Sequence-defined synthetic oligomers and polymers are promising molecular media for permanently storing digital information. However, the information decoding process relies on degradative sequencing methods such as mass spectrometry, which consumes the information-storing polymers upon decoding. Here, we demonstrate the nondestructive decoding of sequence-defined oligomers of enantiopure α-hydroxy acids, oligo(l-mandelic-co-d-phenyl lactic acid)s (oMPs), and oligo(l-lactic-co-glycolic acid)s (oLGs) by 13C nuclear magnetic resonance spectroscopy. We were able to nondestructively decode a bitmap image (192 bits) encoded using a library of 12 equimolar mixtures of an 8-bit-storing oMP and oLG, synthesized through semiautomated flow chemistry in less than 1% of the reaction time required for the repetition of conventional batch reactions. Our results highlight the potential of bundles of sequence-defined oligomers as efficient media for encoding and decoding large-scale information based on the automation of their synthesis and nondestructive sequencing processes.
Introduction
The storage of information produced by human activities is essential for civilization. The explosion in data production in recent decades demands a corresponding expansion in data storage capacity. However, conventional technologies based on magnetic, optical, and electronic media consume a significant amount of physical space and energy for storing and maintaining information. (1−3) Long-term storage also requires the periodic refreshment of the data, given the deterioration of media over time. (4) Large-scale information storage in sequence-defined biomacromolecules such as DNA has evolved from curiosity-driven research to a promising alternative to existing information storage technologies. (5−8) DNAs can store digital information in their chemical structures using only a few atoms per bit, the unit of digital information. (9−11) In addition, the structural integrity of these information-storing macromolecules can be preserved for an extended period without requiring additional energy for extensive cooling or the periodic refreshment of the stored data. (12) Compared to DNA, sequence-defined polymers are beneficial thanks to the simpler chemical structures of monomers, ease of synthesis, and enhanced stability in various conditions. (13,14) Therefore, sequence-defined oligomers and polymers have attracted recent interest as information storage media. (15−18)
However, realization of these macromolecular media for information storage requires that several key challenges be overcome. In these media, information is encoded in the form of sequences of the monomers constituting the polymer chains through the repetitive coupling of the individual monomers in a stepwise manner. (19−21) Consequently, encoding information by chemical synthesis imposes significant cost and time constraints for large-scale information storage. To overcome this challenge, parallel synthesis of sequence-defined macromolecules by automated and continuous processes is necessary to accelerate the rate of chemical encoding. (22−26)
Decoding the information stored in sequence-defined macromolecules also presents a challenge. Most of the current methods for decoding the information stored in sequence-defined polymers rely on destructive techniques such as tandem mass spectrometry, which involves the fragmentation of the parent molecules. (27−31) Consequently, these methods inevitably consume the information-storing polymers during each decoding attempt, making large-scale or additional synthesis processes necessary for replenishing the polymers, especially when frequent decoding is required. Therefore, nondestructive methods for sequencing synthetic polymers must be developed for macromolecular media to become practically suitable for information storage. (32−37) Recently, nanopore sequencing of oligonucleotides has been adopted for decoding sequence-defined oligomers. (38−40)
On the other hand, NMR spectroscopy, which can detect the structural differences around the atom of interest, is widely used for the analysis of microstructures (arrangements of enantiomeric repeating units along the polymer backbone) of polymers. In particular, 13C NMR has been used to measure comonomer distributions and tacticity of stereoregular polymers such as vinyl polymers and poly(l/d-lactide)s. (41−44) However, the results of 13C NMR spectroscopy only show the cumulative populations of the relative orientation of the enantiomeric repeating units constituting the polymer backbone; (45−47) this is especially true for high-molecular-weight polymers with a molecular weight distribution. Previously, Meyer and co-workers reported the sequencing of the repeating fragments composed of lactic, glycolic, or caproic acid by 1H and 13C NMR. (48,49) Despite these previous works, full sequencing of oligoesters up to octamers has not been achieved.
Here, we report the spectroscopic sequencing of sequence-defined oligoesters composed of enantiopure α-hydroxy acids. A library of sequence-defined oligomers of the enantiopure α-hydroxy acids, oligo(l-mandelic-co-d-phenyl lactic acid)s (oMPs), and oligo(l-lactic-co-glycolic acid)s (oLGs) was constructed by semiautomated flow chemistry within less than 1% of the reaction time required to prepare the same set of oligomers by conventional batch reactions and the accompanying purification processes. The sequence of each oligoester could be unambiguously decoded from a single 13C nuclear magnetic resonance (NMR) spectrum. In addition, we show that a maximum of 32 bits (4 bytes) of digital information can be stored in an NMR sample containing an equimolar mixture of oMP and oLG and that this information can be decoded by a single 13C NMR measurement based on the nonoverlapping chemical shifts in the sequence-indicating peaks of oMP and oLG. Our results highlight the potential of bundles of sequence-defined oligomers as efficient media for encoding and decoding large-scale information through the automation of their synthesis and nondestructive sequencing processes.
Results and Discussion
Accelerated Synthesis of Sequence-Defined Octamers of α-Hydroxy Acids by Flow Chemistry
We employed a cross-convergent approach to synthesize the sequence-defined oligomers in a step-economical manner. (50−54) The cross-convergent approach involves the deconstruction of the target sequence into smaller segments built from building blocks composed of a minimum number of uniquely identifiable monomers. Therefore, the sequence-defined building blocks obtained from the permutation of the monomers covering all of the possible sequences of a minimal number of repeating units are prerequisites for the cross-convergent approach (Scheme 1). The permutation in which l-mandelic acid (M) represents 0 and d-phenyl lactic acid (P) represents 1 yielded four dyads (MM, MP, PM, and PP), which cover all possible sequences of the dimers of M and P having protective groups for the hydroxyl and carboxyl end groups. Cross convergence of these dyads would produce 16 tetramers of M and P (tetrads) covering all of the possible sequences during the divergent stage of the synthesis, such that the number of possible products is maximized.
JACS Au 2022, 2, 9, 2108-2118: Scheme 1. Synthesis of Sequence-Defined Oligoesters.
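The combinatorics of the cross-convergent approach can be illustrated with a minimal Python sketch. It uses the bit assignment stated above (M = 0, P = 1); the function names are hypothetical and chosen only for this illustration, and the code models the enumeration of sequences, not the chemistry.

```python
from itertools import product

# Bit assignment stated in the text: M encodes 0 and P encodes 1.
BIT_TO_MONOMER = {"0": "M", "1": "P"}


def all_sequences(monomers, length):
    """Return every sequence of the given length over the monomer alphabet."""
    return ["".join(p) for p in product(monomers, repeat=length)]


def bits_to_sequence(bits):
    """Translate a bit string such as '0111' into a monomer sequence."""
    return "".join(BIT_TO_MONOMER[b] for b in bits)


# Four dyads cover all dimers of M and P.
dyads = all_sequences("MP", 2)

# Cross convergence: coupling every dyad with every dyad gives all tetrads,
# and coupling every tetrad with every tetrad gives all octamers.
tetrads = sorted({a + b for a in dyads for b in dyads})
octamers = sorted({a + b for a in tetrads for b in tetrads})

print(dyads)          # the four dyads
print(len(tetrads))   # number of distinct tetrads
print(len(octamers))  # number of distinct 8-bit octamers
```

Under this mapping, an octamer is fully specified by choosing two tetrads, which is why a 16-tetrad library is sufficient to reach any 8-bit sequence in one further coupling step.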
The cross-convergent synthesis of the sequence-defined oligomers and polymers relies on the repetition of a set of chemical reactions, wherein the number of required synthetic steps increases in proportion to the target molecular weight or the number of sequence-defined products. To encode information in oligoesters at a rate higher than that of conventional batch processes, we used a semiautomated method to synthesize all the possible tetrads in a continuous flow process (26) (Figure 1). The controlled feeding of the desired dyads to the corresponding deprotection reactor line, one for the desilylation of the tert-butyldimethylsilyl (TBDMS) group with trifluoroborane etherate (BF3·Et2O) and the other for the allyl transfer reaction to morpholine catalyzed by tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4), was achieved by using a set of computer-controlled six-way valve systems. The hydroxyl product resulting from the desilylation process was purified with water while the carboxyl product from the deallylation process was purified with a 1 M HCl aqueous solution in an in-line extractor. Finally, the two deprotected precursors were converged for esterification while injecting a coupling agent, 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDC), and 4-(dimethylamino)pyridinium 4-toluenesulfonate (DPTS).
JACS Au 2022, 2, 9, 2108-2118: Figure 1. Flow chemistry for encoding 8-bit information in sequence-defined oMPs. A continuous flow synthesis of four dyads yielded a 16-tetrad library. Subsequently, a sequence-defined octameric oMP, MPPPPMPM, could be obtained by cross convergence of MPPP and PMPM using a semiautomated flow system.
However, the coupling yield of this fully continuous setup was lower (35–40% after purification) than that of the batch reaction (typically >90%) to form an identical tetrad; this was due to the residual byproducts produced during the deprotection steps, such as allyl morpholinium and PPh3, which remained in the reaction mixture after the allyl transfer reaction. To counter the detrimental effects of these residual byproducts, we included an offline purification step for their removal using an automated instrument for silica column chromatography. This step could be completed within 15 min. The reinjection of the purified dyads into the flow reactor for esterification improved the coupling yield to 90% or greater, which was comparable to that of the batch reaction. The repetition of this semiautomated process allowed the synthesis of a 16-tetrad library to be completed on a gram scale within 24 h.
The resulting tetrads of M and P were subsequently subjected to cross convergence to form an octameric oMP having the desired sequence representing an 8-bit binary code. The target tetrads encoding 4-bit fragments of information (200 mg) were injected into the flow reactor. This was followed by offline purification via column chromatography. This flow process produced an octameric oMP encoding the target 8-bit information within 1 h; this period included the offline purification step. The purified oMPs, obtained in yields similar to those of the tetrads, were fully characterized by 1H and 13C NMR spectroscopy and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, which confirmed their purity.
Based on the nonoverlapping of the peaks of l-lactic acid (L) and glycolic acid (G) units with those of the M and P units in 13C NMR spectroscopy, we also synthesized a sequence-defined oligoester of L and G as an 8-bit-storing molecular medium using the flow chemistry setup described above under the same conditions. The injection of four dyads of L and G having the same protective groups as those of the dyads of M and P into the flow chemistry setup yielded 16 tetrads on the 0.5 g scale in 24 h. The purified tetrads, obtained in yields of 88–92%, were subsequently converged to the targeted octameric oLGs using the flow reactor employed for the synthesis of the oMPs. The overall yields of oLGs from dyads were above 75%.
Nondestructive Sequencing of oMPs and oLGs by NMR Spectroscopy
The use of enantiomeric α-hydroxy acids as monomers for the cross convergence to oMPs and oLGs gives the resulting oligoesters an absolutely defined stereochemical configuration. This simplified the analysis of the NMR signals, as it removed the complexities arising from the splitting of the peaks caused by the uncertainty of the chemistry and stereochemical configuration of the neighboring monomers.
We first examined the 13C NMR spectra of the tetrads of M and P to obtain information for decoding the sequence of the constituting monomers of the oMP in relation to the neighboring monomer units. We determined that the peaks corresponding to the ipso-carbons of the aromatic substituents of M and P could be used for sequencing, as these peaks appeared as singlets at the specific chemical shifts corresponding to the relative position of the TBDMS protective group (Si-terminus) with respect to that of the carboxyl terminus having an allyl protective group (All-terminus). The positions of the monomers were numbered from 1 to 4 in the direction from the Si-terminus to the All-terminus. The peaks corresponding to the ipso-carbons of the four M units of the homotetrad MMMM from the Si-terminus to the All-terminus appeared in the order M(1), M(2), M(4), and M(3) at the chemical shifts of 138.64, 133.16, 133.16, and 132.76 parts per million (ppm), respectively. Similarly, the 13C NMR spectrum of the tetrad PPPP showed a set of ipso-carbon peaks of the P repeating units in the order P(1), P(2), P(4), and P(3) at 137.76, 136.13, 135.58, and 135.28 ppm, respectively.
In contrast to the NMR spectra of the homotetrads, MMMM and PPPP, those of the heterotetrads indicated that the ipso-carbon of the phenyl group of the tetrads experiences an electronic shielding/deshielding effect arising from the presence of aromatic substituents in the vicinity. The ipso-carbons of the M units were deshielded when they were neighbors with P units but shielded when they were neighbors with M units. Similar trends were observed in the case of the 13C NMR peaks of the ipso-carbons of the P units, which were shifted downfield when the repeating units were neighbors with P units and upfield when the repeating units were neighbors with M units. For example, the 13C NMR spectrum of MMPM shows the downfield shift of the ipso-carbon peaks of M(2) and M(4), in contrast to the spectrum of MMMM owing to the presence of P(3) (Figure 2A). In the case of tetrad PPMP, the ipso-carbon peaks corresponding to P(2) and P(4) were upfield-shifted, in contrast to the spectrum of PPPP, because of the presence of M(3) (Figure 2B). These slight variances in the chemical shifts of the peaks of the repeating units of the oligoesters arising from the chemical structures of the neighboring units were used to estimate the sequence of the repeating units (Figure S1).
JACS Au 2022, 2, 9, 2108-2118: Figure 2. 13C NMR spectra of homotetrads and heterotetrads of M and P units. (A) 13C NMR spectra of MMMM and MMPM. The overlapped ipso-carbons of M units at positions 2 and 4 (133.16 ppm) were deshielded (133.55 and 133.30 ppm, respectively) when neighboring with the P unit. (B) 13C NMR spectra of PPPP and PPMP. The ipso-carbons of P units at positions 2 and 4 (136.13 and 135.58 ppm, respectively) were shielded (135.67 and 135.39 ppm, respectively) when neighboring with the M unit.
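The neighbor effect described above can be made concrete by computing the shift changes from the values quoted in the Figure 2 caption. The following Python sketch is only a worked-arithmetic illustration; the dictionary and function names are hypothetical.

```python
# ipso-carbon chemical shifts (ppm) quoted in the Figure 2 caption.
# M units: homotetrad MMMM versus heterotetrad MMPM.
# P units: homotetrad PPPP versus heterotetrad PPMP.
HOMOTETRAD = {"M(2)": 133.16, "M(4)": 133.16, "P(2)": 136.13, "P(4)": 135.58}
HETEROTETRAD = {"M(2)": 133.55, "M(4)": 133.30, "P(2)": 135.67, "P(4)": 135.39}


def shift_changes(before, after):
    """Return the change in chemical shift (ppm) for each labeled unit."""
    return {unit: round(after[unit] - before[unit], 2) for unit in before}


changes = shift_changes(HOMOTETRAD, HETEROTETRAD)
for unit, delta in changes.items():
    direction = "downfield (deshielded)" if delta > 0 else "upfield (shielded)"
    print(f"{unit}: {delta:+.2f} ppm, {direction}")
```

The changes are a few tenths of a ppm at most, which is why sequencing relies on well-resolved singlets and reference spectra rather than on coarse chemical-shift regions alone.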
The sequencing of octameric oMPs by 13C NMR spectroscopy was expectedly more complicated than the sequencing of tetrads because of the increased number of possible sequences corresponding to the eight peaks appearing within a narrow chemical shift range. To identify a subtle change in the sequences of oMPs, we investigated a series of 13C NMR spectra of oMPs having one M unit moving from the 2-position to the 7-position. Despite the identical composition of monomers, these oMPs clearly exhibited the ipso-carbon peaks corresponding to all repeating units at distinct chemical shifts (Figure S13). Encouraged by these results, we established the rules for decoding the sequence of oMPs based on the electronic shielding/deshielding effects caused by the neighboring monomers. These decoding rules are summarized in Figure 3A.
JACS Au 2022, 2, 9, 2108-2118: Figure 3. NMR sequencing of octameric oMPs. (A) Decoding diagrams of sequence-defined octameric oMPs. (B) Decoding of octameric oMPs based on the 13C NMR spectrum and deciphered chemical structure of oMP. Red circles indicate the chemical shift that should be checked for sequencing.
Based on these rules, we attempted NMR sequencing of an oMP, Si-MPMMPPPM-All, as shown in Figure 3B. The 13C NMR spectrum exhibited four peaks in the 134.5–137.5 ppm range, indicating that the oMP consists of four M and four P units as per Rule i. The presence of the peak at 138.6 ppm suggests that the units at positions 1 and 2 are M and P, respectively. Rule iii suggests that an M unit is present at position 3, given the absence of a peak at 136.0 ppm. The three units at the All-terminus are P, P, and M units, which are present at positions 6, 7, and 8, respectively. This is based on the peak of the ω-carbon of the allyl protective group at 118.7 ppm. Finally, given that no peak was present at 133.0 ppm, Rule v suggests that an M unit is at position 4 and a P unit at position 5. The results of the nondestructive sequencing of the oMP using the above-described rules to decode its 13C spectrum were verified by comparing them with the results of tandem mass sequencing performed using a MALDI-TOF/TOF mass spectrometer (Figure S15). Thus, this nondestructive sequencing method can be applied repeatedly without a loss of the oligoester, which remains intact in the solution. In addition, the solution can be stored in a conventional NMR tube for more than a year.
Similarly, the sequence of an oLG could be decoded by analyzing the 13C NMR spectrum to determine the chemical shifts of the peaks corresponding to the α-carbons of the L and G units composing the sequence-defined octamer. The positions of the monomers of the oLG were assigned from A to H, starting from the Si-terminus to the All-terminus. After confirming the decodability of a single G unit at different positions of oLG, we composed the rules for decoding the sequence of oLGs based on the peak of the α-carbon. These decoding rules are summarized in Figure 4A.
JACS Au 2022, 2, 9, 2108-2118: Figure 4. NMR sequencing of octameric oLGs. (A) Decoding diagrams of sequence-defined octameric oLGs. (B) Decoding of octameric oLGs based on the 13C NMR spectrum and the deciphered chemical structure of oLG. Red circles indicate the checkpoint for sequencing.
As a demonstration, Si-LLGGLLLL-All was sequenced based on the decoding rules, as shown in Figure 4B. The number of peaks present in the 68.0–70.0 ppm range suggests that the oLG is composed of six L and two G units (Rule i). The presence of a peak at 68.0 ppm suggests that the first unit at the A position is L. Rule iii suggests that an L unit is present at position B, given the presence of a peak at 68.5 ppm. The absence of a peak at 68.8 ppm suggests that the third repeating unit is G. The appearance of a peak at 60.5 ppm is indicative of a fourth repeating unit of G. Rule v suggests that an L unit is present at position H, given the absence of a peak in the 61.0–61.2 ppm region. The absence of a peak in the 69.4–69.5 ppm range suggests that the seventh repeating unit is L. Finally, the appearance of a peak at 69.2 ppm is indicative of an L unit at position F. Therefore, the final sequence of the oLG was determined to be Si-LLGGLLLL-All, which is in keeping with the proposed structure of oLG. Moreover, tandem mass sequencing performed using an electrospray ionization (ESI) mass spectrometer yielded a sequence identical to that obtained from NMR sequencing of oLG (Figure S16).
Our rules for decoding the sequences of oligoesters based on their 13C NMR spectra indicate that there is no duplication between the 256 possible 13C NMR spectra of oMPs or the 256 possible spectra of oLGs. This one-to-one correspondence between oligoesters and their respective NMR spectra suggests that the sequences of these oligoesters can be determined by comparing the acquired spectrum with a library of the spectra of 8-bit-storing oligoesters. We envisage that these presynthesized oligoesters covering all of the possible sequences could be used as a pool of 8-bit packets to compose and store large-size digital information that can be readily decoded by nondestructive NMR sequencing.
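The idea of identifying a sequence by comparing an acquired spectrum with a reference library can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the two-entry library reuses the homotetrad ipso-carbon shifts quoted earlier as a stand-in for the 256-entry octamer libraries, and the matching tolerance of 0.05 ppm and all function names are assumptions made for this example.

```python
# Stand-in reference library: ipso-carbon shifts (ppm) of the homotetrads
# quoted in the text. A real library would hold the octamer spectra.
LIBRARY = {
    "MMMM": [138.64, 133.16, 133.16, 132.76],
    "PPPP": [137.76, 136.13, 135.58, 135.28],
}


def matches(observed, reference, tol=0.05):
    """True if both peak lists agree peak by peak within tol (ppm).

    tol is an assumed tolerance, not a value taken from the article.
    """
    if len(observed) != len(reference):
        return False
    pairs = zip(sorted(observed), sorted(reference))
    return all(abs(o - r) <= tol for o, r in pairs)


def identify(observed, library, tol=0.05):
    """Return the unique library entry matching the peaks, or None."""
    hits = [name for name, ref in library.items() if matches(observed, ref, tol)]
    return hits[0] if len(hits) == 1 else None


# A hypothetical measurement with small deviations from the reference.
print(identify([132.78, 133.15, 133.17, 138.62], LIBRARY))
```

Returning None when zero or several entries match keeps the lookup conservative: an ambiguous spectrum is reported as undecoded instead of being assigned to the closest candidate.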
Encoding and Decoding of Digital Information in Oligoesters
We chose two pairs of enantiomeric α-hydroxy acids, l-mandelic acid/d-phenyl lactic acid and l-lactic acid/glycolic acid, to compose oMPs and oLGs with the aim of confirming that there is no overlapping of the peaks of oligoesters having markedly different substituent chemistries. The NMR spectra of these oligoesters showed that their peaks of interest were present in different chemical shift regions and did not overlap (Figure S17). Hence, the sequences of an oMP and oLG could be decoded simultaneously from a single NMR spectrum of a mixture of the two oligoesters. Therefore, a mixture of an oMP and oLG can store 16 bits, which can be decoded based on a single 13C NMR measurement of the mixture.
To demonstrate this idea, we attempted the accelerated encoding of information in sequence-defined oligoesters through flow chemistry to show that the nondestructive sequencing of oligoesters by NMR spectroscopy can be exploited for the archival storage and retrieval of digital information (Figure 5). A bitmap image (192 bits) was converted into a chemical sequence distributed into 12 sets of oMPs and oLGs. A set of an oMP and oLG, each storing 8 bits, was constructed by repeating the semiautomatic flow processes using tetrads of M and P or L and G as the precursors. The encoding of all 192 bits of information in the library of 12 oMPs and oLGs could be completed within 12 h by running two flow processes in parallel. The encoding time was significantly lower (∼1%) than the time required to complete the synthesis of the same set of oligoesters by repeating the conventional batch reaction and purification processes. Following the decoding rules, the 12 sets of oMPs and oLGs could be completely decoded (Figure S18).
JACS Au 2022, 2, 9, 2108-2118: Figure 5. Encoding and decoding process of enantiopure oligoesters. The converted bitmap image was encoded into 12 sets of sequence-defined octameric oMP and oLG by a semiautomated flow process. Nondestructive decoding of mixture 04 revealed the absolute sequences of oMP and oLG, which could be converted back to digital information (highlighted by red rectangles in the bitmap image).
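The mapping from 192 bits to 12 oMP/oLG sets can be sketched in Python. Only the assignment M = 0 and P = 1 is stated in the article; the assignment L = 0 and G = 1, the choice to place the first byte of each pair in the oMP, and the function names are assumptions made for this illustration.

```python
OMP_CODE = {"0": "M", "1": "P"}  # stated in the article
OLG_CODE = {"0": "L", "1": "G"}  # assumed for this sketch


def encode(data):
    """Split bytes into (oMP, oLG) sets, 16 bits per set."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes (16 bits per set)")
    sets = []
    for i in range(0, len(data), 2):
        omp_bits = format(data[i], "08b")      # first byte goes to the oMP
        olg_bits = format(data[i + 1], "08b")  # second byte goes to the oLG
        omp = "".join(OMP_CODE[b] for b in omp_bits)
        olg = "".join(OLG_CODE[b] for b in olg_bits)
        sets.append((omp, olg))
    return sets


def decode(sets):
    """Recover the original bytes from a list of (oMP, oLG) sets."""
    omp_bits = {monomer: bit for bit, monomer in OMP_CODE.items()}
    olg_bits = {monomer: bit for bit, monomer in OLG_CODE.items()}
    out = bytearray()
    for omp, olg in sets:
        out.append(int("".join(omp_bits[m] for m in omp), 2))
        out.append(int("".join(olg_bits[m] for m in olg), 2))
    return bytes(out)


image = bytes(range(24))       # placeholder 192-bit payload, not the real bitmap
tubes = encode(image)
print(len(tubes))              # number of oMP/oLG sets
print(decode(tubes) == image)  # round trip
```

In this sketch the list index plays the role of the tube number, that is, the externally assigned address of each 16-bit set.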
The synthesized oligoesters were grouped into 12 sets of equimolar mixtures of the oMP and oLG (1:2 w/w, 15 μmol), which were dissolved in 0.5 mL of CDCl3 and stored in conventional NMR tubes labeled 1–12, respectively. The assigned tube numbers corresponded to the externally given addresses for the 12 sets of 16 bits of information. The spectrum of each tube was acquired on an NMR spectrometer (Varian, 125 MHz for 13C). The sequencing of oMP and oLG was completed using a single spectrum containing the ipso-carbon peaks of oMPs and the α-carbon peaks of oLG, which did not overlap. The 12 spectra of the mixtures of oMP and oLG could be decoded completely, and the stored information could be fully retrieved within 1 h by comparing the recorded spectra acquired by the minimum number of scans by NMR (32 scans for 110 s) with the existing reference spectra of oMPs and oLGs (Figure 6).
JACS Au 2022, 2, 9, 2108-2118: Figure 6. 13C NMR spectra in a range of ipso-carbons (left), ω-carbon (center), and α-carbons (right) of the octameric mixture 09 with different numbers of scans. The peaks in the spectrum with 32 scans (maroon) were distinguishable and matched the reference spectrum (black).
Oligoesters could be degraded by hydrolysis or the epimerization of the α-proton in solution, in particular at high temperatures or in the presence of basic catalysts. (55,56) The end groups of oMPs and oLGs used in this study are protected by a TBDMS group and an allyl ester group, which prevent degradation by hydrolysis. To investigate stability, the oligoesters were kept in NMR tubes sealed with a cap and parafilm, without any additives. The decoding and retrieval of the stored image could be achieved without any reading errors even after 10 months using the same NMR samples (Figure S19). We also note that the storage capacity in the mixture of oMP and oLG could be expanded by the correlation of two sequences. For example, 4-bit information can be encoded by the correlation of position 1 of oMP and position A of oLG, which allows the mixture of oMP and oLG to store 32 bits.
Conclusions
In conclusion, we demonstrated the nondestructive sequencing of oligoesters composed of enantiopure α-hydroxy acids by NMR spectroscopy based on the sequence-specific peaks in the 13C NMR spectra of absolutely configured oligoesters. The sequence-defined octameric oligoesters were synthesized at an accelerated rate by a step-economical cross-convergent synthesis based on semiautomated flow chemistry while using a feed of sequence-defined dyads and tetrads of M and P or L and G. The flow chemistry-based synthesis of the information-storing oligoesters accelerates the rate of encoding by a factor of 100 compared with the rate of synthesis for conventional batch processes. The synthesized sequence-defined octaesters, oMPs and oLGs, could both store 8-bit information. The 13C NMR spectra of the oMPs and oLGs contained the peaks arising from the enantiopure monomers with respect to their relative positions between the Si- and All-protective groups. This, in turn, allowed for sequencing without the degradation of the information-storing molecules. We also demonstrated that the nondestructive decoding of the information-storing oligoesters can be combined with accelerated encoding through flow chemistry to allow for the permanent storage of digital information without requiring any additional energy or synthesis processes for maintaining the stored information. Thus, our results suggest that the bundles of sequence-defined oligomers can serve as efficient media for storing large-scale information based on their automated synthesis and nondestructive sequencing.
Methods
Materials
l-Lactic acid (≥98%), l-mandelic acid (≥99%), allyl bromide, tert-butyldimethylsilyl chloride, trifluoroborane etherate, tetrakis(triphenylphosphine)palladium (0), and morpholine were purchased from Sigma-Aldrich and used without further purification. Glycolic acid (≥98%) and 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride were purchased from Tokyo Chemical Industry and used without purification. d-Phenyl lactic acid (≥95%) was purchased from AK Scientific, Inc. and used without purification. Dichloromethane was distilled over CaH2 under N2.
General Instruments
A Legato 101 syringe pump was purchased from Kd Scientific. A Cadent 3™ syringe pump was purchased from IMI Norgren. SEP-10 was purchased from Zaiput Flow Technologies. PFA tubing (1/16″ OD/0.02″ and 0.03″ ID) was purchased from Revodix. An Omnifit EZ column was purchased from Diba Industries Inc. Gastorr AG-42-01 was purchased from GL Sciences. A CF-2 fraction collector was purchased from Spectrum Chemical Mfg. Corp. Automated column chromatography was performed on a Biotage Selekt flash chromatography purification system equipped with a Sfär silica column cartridge. Hexane and ethyl acetate were used as eluents. 1H NMR and 13C NMR spectra were recorded on a Varian INOVA 500 MHz NMR spectrometer in CDCl3. ESI-MS analyses were performed on a SCIEX TripleTOF 5600. MALDI-TOF MS/MS analyses were performed on a Bruker Ultraflex TOF/TOF mass spectrometer.
Continuous Flow Synthesis for Sequence-Defined Tetrads
All permutations of enantiopure oMP and oLG tetrads were generated by a continuous flow system consisting of a synthetic step and a cleaning step, which is operated via six-way syringe pumps connected with a programmable controller. About 2 mmol of a dyad (1 M in DCM) and BF3·Et2O (7 M in DCM) were injected at 0.1 mL/min and mixed through a T-mixer. The mixture was allowed to react in the reaction loop (volume of 2 mL). Simultaneously, 2 mmol of a dyad (1 M in tetrahydrofuran (THF)) and a mixture of Pd(PPh3)4 and morpholine (0.03 M and 1.05 M in THF) were injected at 0.1 mL/min and mixed through the T-mixer. The mixture was allowed to react in the reaction loop (volume of 2 mL). The deprotected dyads, bearing free hydroxyl and carboxylic acid groups, were purified by automated flash column chromatography using HEX/EA and ether/MeOH eluents, respectively. Subsequently, the purified dyads (1 M in DCM, 0.1 mL/min) were reinjected and mixed with EDC·HCl and DPTS (0.7 M and 0.07 M in DCM, 0.2 mL/min). The mixture was allowed to react in the reaction loop (volume of 6 mL). After the synthesis step, DCM (6 mL) was flushed through the flow reactor to clean the reaction loop. Synthesis and cleaning cycles were repeated 16 times to generate every permutation of tetrads. Collected tetrads were purified with automated flash column chromatography (86–92% yield).
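The loop volumes and flow rates above allow a rough estimate of feed and residence times. The Python sketch below is a back-of-the-envelope calculation under explicit assumptions that the article does not state: each listed stream runs at its own quoted flow rate, the listed streams are the only ones entering the loop, and mixing is ideal plug flow. The function names are hypothetical.

```python
def feed_volume_ml(amount_mmol, concentration_m):
    """Volume (mL) of solution holding the given amount at the given molarity."""
    return amount_mmol / concentration_m


def residence_time_min(loop_volume_ml, *flow_rates_ml_min):
    """Mean residence time (min) = loop volume / total flow rate.

    Assumes every listed stream enters the loop at its own flow rate.
    """
    total = sum(flow_rates_ml_min)
    if total <= 0:
        raise ValueError("total flow rate must be positive")
    return loop_volume_ml / total


# About 2 mmol of dyad at 1 M corresponds to about 2 mL of feed solution.
dyad_feed_ml = feed_volume_ml(2.0, 1.0)

# Deprotection loop: 2 mL, assumed dyad and reagent streams at 0.1 mL/min each.
deprotection_min = residence_time_min(2.0, 0.1, 0.1)

# Coupling loop: 6 mL, assumed streams at 0.1 and 0.2 mL/min.
coupling_min = residence_time_min(6.0, 0.1, 0.2)

print(f"dyad feed: {dyad_feed_ml:.1f} mL, fed over {dyad_feed_ml / 0.1:.0f} min")
print(f"deprotection residence time: {deprotection_min:.0f} min")
print(f"coupling residence time: {coupling_min:.0f} min")
```

If the quoted 0.1 mL/min were instead the combined flow rate of both deprotection streams, the estimated residence time in the 2 mL loop would double, so these numbers should be read as order-of-magnitude guides only.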
Synthesis of Octameric Oligoesters by the Semiautomated Flow System
The octameric oMPs and oLGs were obtained by flow synthesis following the same conditions described for the synthesis of tetrads. About 200 mg of a tetrad (0.25–0.5 M in DCM) and BF3·Et2O (7 M in DCM) were injected at 0.1 mL/min and mixed through a T-mixer. The mixture was allowed to react in the reaction loop (volume of 2 mL). Simultaneously, 200 mg of a tetrad (0.25–0.5 M in THF) and Pd(PPh3)4 and morpholine mixture (0.03 and 1.05 M in THF) were injected at 0.1 mL/min and mixed through the T-mixer. The mixture was allowed to react in the reaction loop (volume of 2 mL). Deprotected tetrads were purified by automated flash column chromatography. The purified tetrads (0.25–0.5 M in DCM, 0.1 mL/min) were reinjected and mixed with EDC·HCl and DPTS (0.7 M and 0.07 M in DCM, 0.3 mL/min). The mixture was allowed to react in the reaction loop (volume of 9 mL). The collected octameric oligoester was purified with automated flash column chromatography (85–91% yield).
13C NMR Sequencing of Octameric oMPs
(Rule i) The number of M units in oMP can be determined by counting the number of peaks in the chemical shift regions of 132.5–133.5 ppm and 138–139 ppm. The number of peaks in the 134.5–137.5 ppm range is identical to the number of P units (Figure S2).
(Rule ii) The monomer units at positions 1 and 2 can be determined based on the peak of the ipso-carbon that is deshielded most by a silicon atom present in its proximity (Figure S3). In the range of 138.4–138.6 ppm, the peak corresponding to the ipso-carbon of Si-M(1), which neighbors M(2), appears downfield to that of the ipso-carbon that neighbors P(2). Similarly, the peak related to the ipso-carbon of Si-P(1) appears in the range of 137.24–137.60 ppm, depending on the adjacent monomer units.
(Rule iii) The presence of a P unit at position 3 can be predicted based on the peak of the monomer unit at position 2 (Figure S4). The peak corresponding to P(3) appears at 133.4 ppm in the case of M(2) and 135.9 ppm in the case of P(2).
(Rule iv) The ω-carbon peak of the allyl protective group appears at a characteristic chemical shift in the range of 118.36–119.13 ppm that depends on the sequence of the monomer units at positions 6–8 (Figure S5).
(Rule v) The monomer units at positions 4 and 5 can be predicted by an indirect method (Figure S6). With P at position 3, the presence of the peak at 134.9 ppm is indicative of M(4). In the case of M(3) and P(2), the peak at 133.0 ppm suggests that the P unit is at position 4. In the case of M(3) and M(2), the peak at 132.9 ppm suggests that the P unit is at position 4.
13C NMR Sequencing of Octameric oLGs
(Rule i) The number of L units in the oLG can be determined by counting the number of peaks in the chemical shift region of 67.9–69.5 ppm. The number of peaks in the range of 60.2–61.5 ppm is identical to the number of G units (Figures S7 and S8).
(Rule ii) The monomer unit at position A can be determined based on the presence of the peak of the α-carbon at a specific chemical shift (Figures S9 and S10). The peak corresponding to the α-carbon of L(A) appears at 68.0–68.1 ppm and that of G(A) appears at 61.3–61.5 ppm.
(Rule iii) The monomer unit at position B can also be revealed depending on the presence of the peak in specific regions (Figures S9 and S10). The peak corresponding to the α-carbon of L(B) appears at 68.5 ppm and that of G(B) appears at 60.3–60.5 ppm.
(Rule iv) The monomer units at positions C and D are predicted by an indirect method (Figure S11). When the L unit is at position B, the peak at 68.8 ppm suggests that the L unit is at position C. In the case of L(B) and G(C), the peak at 60.5–60.6 ppm suggests that the G unit is at position D. In the case of L(B) and L(C), the peak at 68.9 ppm suggests that the L unit is at position D. The α-carbon of G(B) appears at 60.3, 60.4, or 60.5 ppm in the case of neighboring G(C), L(C) and L(D), or L(C) and G(D), respectively. When the G units are at positions B and C, the peak at 60.7 ppm suggests that the G unit is at position D.
(Rule v) The monomer unit at position H is predicted by the presence of the α-carbon peak at the specific chemical shift (Figure S12). The peak corresponding to the α-carbon of G(H) appears at 61.0–61.2 ppm.
(Rule vi) The monomer units at positions F and G are predicted by an indirect method (Figure S12). The α-carbon of G(H) appears at 61.0, 61.1, or 61.2 ppm in the case of neighboring L(F) and L(G), G(F) and L(G), or G(G), respectively. The peak corresponding to the α-carbon of L(H) appears at 69.4 or 69.5 ppm in the case of neighboring L(F) and G(G) or G(F) and G(G), respectively. In the case of G(G) and G(H), the peak at 60.9 ppm suggests that the G unit is at position F. In the case of L(G) and L(H), the peak at 69.2 ppm suggests that the L unit is at position F.
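Rule i for both oligoester families amounts to counting peaks inside fixed chemical shift windows, which can be sketched in Python as follows. The windows are the ones quoted in the rules above; the function name and the two peak lists are hypothetical and serve only to exercise the code (they are not measured spectra). The sketch covers Rule i only, since the remaining rules depend on neighbor-specific shifts documented in the Supporting Information figures.

```python
# Chemical shift windows (ppm) from Rule i of each rule set.
OMP_REGIONS = {"M": [(132.5, 133.5), (138.0, 139.0)], "P": [(134.5, 137.5)]}
OLG_REGIONS = {"L": [(67.9, 69.5)], "G": [(60.2, 61.5)]}


def count_units(peaks, regions):
    """Count the peaks falling inside each monomer's shift windows."""
    counts = {unit: 0 for unit in regions}
    for ppm in peaks:
        for unit, windows in regions.items():
            if any(low <= ppm <= high for low, high in windows):
                counts[unit] += 1
    return counts


# Hypothetical peak lists chosen only to illustrate the counting step.
omp_peaks = [138.5, 136.4, 135.9, 135.2, 134.9, 133.2, 133.1, 132.9]
olg_peaks = [69.3, 69.2, 69.0, 68.9, 68.5, 68.0, 60.8, 60.5]

print(count_units(omp_peaks, OMP_REGIONS))
print(count_units(olg_peaks, OLG_REGIONS))
```

A peak list produced by automatic peak picking would report two coincident signals as a single peak, so in practice the counts need to be checked against the expected total of eight units.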