MicroRNA clustering on the biogenesis of suboptimal microRNAs

Most microRNAs (miRNAs) are processed by two ribonuclease III enzymes. The first cleavage is performed by Microprocessor, which is composed of the RNase III enzyme Drosha and DGCR8, and the second by another RNase III enzyme, Dicer. There are many examples of miRNAs that are poor substrates for Drosha and Dicer, owing to their suboptimal structures. However, a number of these suboptimal miRNAs are known to be expressed at the same level as, or a higher level than, their neighboring structurally optimal miRNAs. Recent studies suggest that the clustered arrangement of these suboptimal miRNAs explains this phenomenon. It has been observed that the biogenesis of these suboptimal miRNAs can be affected by the expression of their neighboring optimal miRNAs. This principle is expected to apply more broadly, as it has been shown that a large percentage of suboptimal miRNAs reside within operons.


Biogenesis of canonical and non-canonical microRNAs
MicroRNAs (miRNAs) are small, non-coding RNAs comprising ~ 22 nucleotides that play important roles in gene regulation [1,2]. miRNAs form a silencing complex with Argonaute protein (AGO) and together direct the posttranscriptional repression of mRNAs. After one strand of miRNA duplex is loaded into AGO, an miRNA seeks out and pairs with the target mRNA, which is then repressed by AGO [3].
In the canonical pathway of miRNA biogenesis, a long primary miRNA transcript (pri-miRNA) undergoes two cleavages (Fig. 1A). Pri-miRNAs are generated via the transcription of miRNA genes by RNA polymerase II (RNAPII), and they carry distinct stem-loop structures that are necessary for both of the cleavage reactions [4]. The first cleavage is performed by the Microprocessor complex, which consists of Drosha, a member of RNase III family, and two molecules of its cofactor, DGCR8 [5]. The Microprocessor recognizes and binds to the distinct stem-loop structure of a pri-miRNA and cleaves the hairpin at about one helical turn from its base, releasing the precursor miRNA (pre-miRNA) [6]. This pre-miRNA is then exported to the cytoplasm, where a second cleavage is performed by Dicer, another RNase-III-type endonuclease [7]. Similar to Drosha, Dicer recognizes certain structures of pre-miRNAs rather than being sequence specific. Dicer cleaves the pre-miRNA at about two helical turns away from its base and releases an miRNA duplex. As mentioned earlier, one strand of this miRNA is then loaded into AGO to form a silencing complex that regulates mRNA expression [8].
Along with this canonical pathway of miRNA biogenesis, there are also numerous variations in which miRNA substrates undergo non-canonical biogenesis (Fig. 1B). A number of Drosha-independent and Dicer-independent miRNA biogenesis pathways have been reported [9][10][11][12]. For example, "mirtrons", which are introns that mimic the structural features of pre-miRNAs and thereby enter the miRNA biogenesis pathway, have been identified [10]. It has also been found that some pre-miRNAs can be derived from tRNA genes without being processed by Drosha [11]. Despite these examples of miRNAs that bypass Drosha processing, there is only one known example of a Dicer-independent miRNA: miR-451 [12]. Therefore, many studies have examined the biogenesis pathway of miR-451, and numerous unique characteristics of this miRNA have been discovered recently [13][14][15][16].
Including miR-451, around one-third of vertebrate miRNAs exist in operons [17], and around 50% of conserved miRNA genes in the human genome are clustered together [18]. Although the biological implications of this clustering are still mostly unclear, many recent studies have suggested that the clustering of miRNAs provides an advantage during miRNA biogenesis [13,15,16,19]. More specifically, this clustered arrangement may allow pri-miRNAs that are structurally poor substrates of Microprocessor to be efficiently processed, allowing them to successfully enter the miRNA biogenesis pathway.
Recently, many studies have been conducted on miRNAs that are present on the same primary transcript, for which the expression of one of the miRNAs is dependent on that of the other [13,15,16,19]. This phenomenon is referred to as "cluster assistance" [13].

Biogenesis of Drosophila miR-998 is dependent on neighboring miR-11
Two miRNAs within the Drosophila E2f1 gene, miR-11 and miR-998, comprise the mir-11~998 cluster. Although both miR-11 and miR-998 are known to be co-expressed within the host gene, it has recently been reported that the expression of miR-998 is strongly dependent on the existence of the neighboring miR-11 gene [20]. This dependency was found by comparing quantitative RT-PCR and Northern blot results for miR-11 and miR-998 in the wild type and in reciprocal mutant alleles: the miR-11 deletion (miR-11Δ1) and a miR-998 mutant allele generated by imprecise P-element excision (miR-998 exc222 ) [21,22]. miR-11 was expressed at the usual level in both the wild type and the miR-998 exc222 mutant, whereas it was absent in miR-11Δ1. The expression of miR-11 was reduced in the miR-11Δ1/miR-998 exc222 heterozygote.
Fig. 1 Canonical and non-canonical pathways of miRNA biogenesis. A Canonical pathway of miRNA biogenesis. Primary transcripts (pri-miRNAs) are transcribed from miRNA genes by RNA polymerase II and are then cleaved by Microprocessor (Drosha+DGCR8) to form precursor miRNAs (pre-miRNAs). Pre-miRNAs are then exported to the cytoplasm, where they are once again cleaved by Dicer, forming miRNA duplexes. One strand of this miRNA duplex is loaded into Argonaute protein (AGO) to form RISC, which takes part in mRNA regulation. B Non-canonical pathway of miRNA biogenesis. During the non-canonical pathway of miRNA biogenesis, miRNAs bypass the Drosha- or Dicer-dependent cleavage step, and this step is replaced by another cleavage reaction carried out by different proteins.
When these results were compared with the expression of miR-998 in each mutant, miR-998 expression was not observed in either qRT-PCR or Northern blot analysis of miR-11Δ1. Furthermore, the results for miR-11Δ1/miR-998 exc222 show that the expression of miR-11 in trans did not rescue the expression of miR-998. These data suggest that the existence of miR-11 within the cluster is required for the expression of miR-998 [20].
To investigate the mechanism of this regulation, a primary miR-11~998 transcript was inserted downstream of the luciferase gene in the 3′ UTR [20]. If the pri-miRNA is cleaved by Microprocessor, the luciferase transcript is degraded, which can be detected as a decrease in luciferase activity. A decrease in luciferase activity was detected with the wild-type pri-miR-11~998. However, with the pri-miR-11Δ1~998 transcript, luciferase activity was maintained at the same level [20]. The results of this experiment suggest a mechanism in which the presence of miR-11 is necessary for miR-998 to be successfully processed by Drosha.

Biogenesis of Epstein-Barr virus miR-BHRF1-3 is dependent on neighboring miR-BHRF1-2
Regulation among clustered miRNA genes has also been observed in viral miRNA clusters. The BHRF1 miRNA cluster in Epstein-Barr virus (EBV) consists of three genes: miR-BHRF1-1, miR-BHRF1-2, and miR-BHRF1-3. The co-expression of all three genes is required for EBV to transform resting B cells [23]. The efficiency of this viral transformation drops by about 20-fold if the BHRF1 miRNAs are not present. This significant decrease in efficiency is mainly caused by the absence of miR-BHRF1-2 and miR-BHRF1-3. It has been observed that a virus without pre-miR-BHRF1-2 shows a decrease in the level of expression of miR-BHRF1-3 [24,25], as demonstrated by the downregulation of miR-BHRF1-3 in the miR-BHRF1-2 deleted (Δ2) mutant. Both the Δ2 and the wild-type sequences were cloned into a eukaryotic expression plasmid and the expression levels of the BHRF1 miRNAs were measured. The results obtained from these cloned plasmids showed that the expression of miR-BHRF1-3 is highly dependent on the existence of miR-BHRF1-2, a finding that is in line with the results obtained from the Δ2 mutant virus and indicates that this expression pattern depends only on the genetic elements of the BHRF1 locus [24].

Biogenesis of miR-497a is dependent on neighboring miR-195a
CRISPR/Cas9 technology has been used to discover numerous other examples of gene regulation in clustered miRNA genes, including the miR-497~195 cluster. This cluster is composed of two miRNAs: miR-497a and miR-195a [26]. When the hairpin structure of miR-195a was targeted with CRISPR/Cas9, the expression of miR-195a, as measured using qPCR, was downregulated by 55%, and this downregulation led to a significant decrease in the expression level of miR-497a as well [27]. As no mutation was detected in the sequence of miR-497a, it appears that this decreased expression of miR-497a was due to the downregulation of miR-195a [27].
Fig. 2 Cluster assistance between miR-144 and miR-451. miR-451 is a poor substrate for Microprocessor due to its suboptimal structural features. Therefore, it is not expressed abundantly when it exists alone. However, in actual cells, miR-451 is known to be processed efficiently by Microprocessor and is highly expressed. This is due to "cluster assistance" between miR-451 and its neighboring miRNA, miR-144. The existence of this helper hairpin (miR-144) enables the recipient hairpin (miR-451) to be recognized and processed by Microprocessor.

Biogenesis of non-canonical miR-451 is dependent on biogenesis of neighboring miR-144
The cleavage of pri-miRNAs by the Microprocessor is highly selective. Only a small set of substrates can be processed by the Microprocessor to become pre-miRNAs. This selectivity results from the Microprocessor's preference for certain hairpin structures in its substrates. It strongly prefers hairpins with a stem length of 35 ± 1 bp, flanking single-stranded regions, an unstructured terminal loop of more than 10 nucleotides, and four specific sequence motifs within the flanking regions and terminal loop [6,[28][29][30][31][32]. These four motifs are the basal UG motif, the apical UGU motif, the flanking CNNC motif, and the mismatched GHG (mGHG) motif [28,30]. The first three are simple primary-sequence motifs, whereas the mGHG motif is a complex primary- and secondary-structural motif [28]. Most of the pri-miRNA hairpins that have been conserved throughout evolution possess several features of this ideal Microprocessor substrate.
However, there are exceptions, and miR-451 is one of them. Its stem is only 31 bp and its apical loop is only 4 nucleotides long [13], making it a structurally very poor Microprocessor substrate. Despite these structural disadvantages, miR-451 is still processed by Microprocessor to produce pre-miRNA and is even one of the most highly expressed miRNAs in erythroblasts and erythrocytes [14]. To understand this unusual situation, many studies have focused on the biogenesis of miR-451, and it has recently been suggested that cluster-assisted processing coupled with miR-144 expression is a possible explanation (Fig. 2) [13][14][15][16].
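The structural preferences described above can be read as a simple checklist. The following Python snippet is a minimal sketch of that checklist, not a validated predictor of Microprocessor processing. The `Hairpin` class, the `structural_shortfalls` function, and the constant names are hypothetical and introduced only for illustration. The thresholds encode just the stem length (35 ± 1 bp) and terminal loop (more than 10 nt) criteria quoted in the text, and the four sequence motifs are deliberately left out.

```python
from dataclasses import dataclass
from typing import List

# Illustrative thresholds from the text; all names here are hypothetical.
IDEAL_STEM_BP = 35       # preferred stem length
STEM_TOLERANCE_BP = 1    # the "+/- 1 bp" tolerance
MIN_LOOP_NT = 11         # "more than 10 nucleotides"


@dataclass
class Hairpin:
    name: str
    stem_bp: int   # stem length in base pairs
    loop_nt: int   # terminal (apical) loop length in nucleotides


def structural_shortfalls(hairpin: Hairpin) -> List[str]:
    """Return the structural criteria that the hairpin fails to meet."""
    problems = []
    if abs(hairpin.stem_bp - IDEAL_STEM_BP) > STEM_TOLERANCE_BP:
        problems.append("stem length")
    if hairpin.loop_nt < MIN_LOOP_NT:
        problems.append("terminal loop")
    return problems


# miR-451 values as stated in the text: 31 bp stem, 4 nt apical loop.
mir451 = Hairpin("miR-451", stem_bp=31, loop_nt=4)
# A hypothetical variant lengthened to 35 bp and 12 nt.
lengthened = Hairpin("miR-451 (lengthened)", stem_bp=35, loop_nt=12)

for h in (mir451, lengthened):
    print(h.name, structural_shortfalls(h))
```

Under these assumptions, miR-451 fails both structural criteria, whereas the lengthened variant passes both. A fuller check would also need to score the UG, UGU, CNNC, and mGHG motifs, which requires sequence and secondary-structure information not modeled here.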
miR-144 is located in the same primary transcript as miR-451, but it has structural features that make it a good Microprocessor substrate. It has two motifs and an almost ideal stem-loop structure [13]. However, the expression levels of these two contrasting miRNAs, miR-144 and miR-451, are similar and the accumulation of miR-144 seems to benefit the processing of neighboring miR-451.
To observe cluster assistance in the expression of miR-451, miR-451 was expressed in HEK293 cells through a plasmid with a bidirectional promoter. miR-451 was transcribed in one direction, whereas miR-144, which exists in a cluster with miR-451, was transcribed in the opposite direction [13]. The expression level of miR-451 increased by around 40-fold when it was expressed together with miR-144 from the same pri-miRNA transcript, compared to when it was expressed alone. The expression level of miR-144, however, did not change significantly in either situation. Furthermore, when miR-144 was expressed from a different transcript, the benefit it had on miR-451 was not maintained. The benefit remained when the order of miR-451 and miR-144 was switched on the same transcript and also when miR-144 was replaced with another optimal substrate of Microprocessor, miR-125a [13]. Lastly, when the hairpin structure of miR-144 was replaced with the hairpin of miR-451, the benefit was lost. These results suggest the existence of "cluster assistance," in which a poor substrate of Microprocessor can be efficiently expressed when clustered on the same transcript with an miRNA that has an optimal hairpin structure [13].
In another study, the expression levels of miR-451 and miR-144 were measured under a number of different circumstances [15]. Experiments such as expressing the two miRNAs from different transcripts, using genetic mutants (deletion of pre-mir-144 or deletion of the terminal loop of pre-mir-144), substituting miR-144 with miR-7a or miR-545, other good substrates of Microprocessor, all showcased similar results as mentioned earlier [29]. In addition, these experiments were performed in vivo using CRISPR/Cas9 technology to overcome the errors of in vitro tests and to confirm that cluster assistance can occur under endogenous environments. The results obtained suggested the existence of cluster assistance in vivo as well as in vitro [13,15].
Studies also suggested that the miR-451 hairpin is not processed efficiently by Microprocessor on its own because of its suboptimal stem-loop structure [13,15]. When both the stem and the apical loop were lengthened to 35 bp and 12 nt, respectively, the ideal length of each feature, the processing rate of miR-451 increased by 170-fold. This observation suggests that unfavorable structural features of miR-451 are the key elements hindering its processing by Microprocessor [13].

Biogenesis of miR-15a is dependent on biogenesis of neighboring miR-16-1
miR-15a, which exists within the miR-15a-16-1 cluster, is a recently documented example of cluster assistance. The primary miR-15a hairpin is processed efficiently in the presence of the neighboring optimal hairpin of miR-16-1 [19]. Like miR-451, miR-15a has a suboptimal hairpin structure, which makes it a poor substrate for Microprocessor. Its lower stem has an atypical extended region that has no base pairing, and when one or two pairing point mutations were introduced in this region, miR-15a was expressed at a higher level, even in the absence of the helper hairpin, miR-16-1 [19,30]. When this mutation was reversed to mimic the structure of the original lower stem, the expression level of miR-15a decreased again, demonstrating that this suboptimal stem-loop structure is the reason for its poor expression and its need for cluster assistance [19].
The existence of cluster assistance between miR-15a and miR-16-1 was shown using a GFP- and dsRed-based reporter system, which measured the mature miRNA activity of various mutated miRNAs. When the stem-loop structure of miR-16-1 was mutated and destabilized, the expression level of the pri-miRNA did not change substantially, but the mature miR-15a activity decreased. Similar results were also observed under endogenous circumstances, where CRISPR/Cas9 was used to disrupt the miR-16-1 gene [19]. As with miR-451 and miR-144 [13,15], substituting the assisting miRNA (miR-16-1, in this case) with another optimal miRNA had the same beneficial effects on the processing of miR-15a [19]. Table 1 lists the helper and recipient hairpin pairs that are mentioned in this section.

Characteristics of cluster assistance
A number of other characteristics of cluster assistance were found through a series of experiments (Fig. 3) [13,15]. The basal stem of the miR-144 hairpin was mutated to make the hairpin a less optimal Microprocessor substrate and to investigate whether the effect of cluster assistance depends on how efficiently Microprocessor recognizes the helper hairpin [13]. As a result, the expression of the recipient gene decreased. When subsequent mutations were made to restore the optimal stem length, the level of expression was restored. These results indicate a correlation between the efficient recognition of the helper hairpin (miR-144) and the expression of the recipient hairpin (miR-451) [13].
To test the possible role of RNAPII in cluster assistance, the miR-144~451 was expressed using a U6 snRNA promoter, which directs RNA polymerase III transcription, instead of RNAPII. Under these conditions, cluster assistance was still evident, indicating that there was no clear effect of RNAPII on cluster assistance and that RNAPII coupling to Microprocessor is not essential to cluster assistance during miR-451 biogenesis [13,15].
The effect of the spacing between the recipient and the helper hairpin on cluster assistance was also tested by increasing the length of the linker region located between miR-144 and miR-451. The results showed that cluster assistance was still evident in the presence of long linkers, but the expression of the miR-451 hairpin clearly decreased as the length between the two miRNAs increased [13,15].
In addition, it has been shown that cluster assistance occurs even after the helper hairpin is cleaved. This situation allows the prolonged association of Microprocessor with the miRNAs it is processing, thus significantly increasing the benefits of cluster assistance [13].
Finally, experiments were conducted to determine whether the cluster assistance that enhances the efficiency of suboptimal miRNA biogenesis could be generalized [15]. An analysis of numerous suboptimal miRNA hairpins with short loops and their relationship with the closest pri-miRNAs again supported the idea that interactions between neighboring miRNAs can enhance the biogenesis of suboptimal canonical miRNAs. However, several individual pri-miRNAs that have short terminal loops but are still expressed efficiently have been discovered. The existence of these miRNAs suggests that different mechanisms may enhance the biogenesis of suboptimal miRNAs when they do not occur within operons [15].

Local recruitment of microprocessor
Many experimental results, including the finding that expression of the suboptimal miRNA decreased as the linker region between the helper and recipient hairpins lengthened, suggest that the enhancement of miR-451 expression involves local recruitment and transfer of Microprocessor from neighboring optimal miRNAs [13,15,19]. The exact mechanism by which the local recruitment of Microprocessor to the helper hairpin enhances the expression of the recipient hairpin is still unknown. Binding of Microprocessor to the helper miRNA will generally increase the local population of Microprocessors in the neighboring area. However, this cannot be the complete explanation of cluster assistance, as there is no guarantee that an increased population of Microprocessors will result in an increased number of Microprocessors binding to the suboptimal hairpin.
Fig. 4 miRNA expression can be affected by neighboring clustered miRNA in trans. miR-451 is the only known miRNA whose biogenesis is Dicer-independent and also the most abundant miRNA in erythrocytes. It bypasses the global downregulation of other miRNAs that takes place in erythrocytes. This situation is caused by two elements: miR-144 and miR-451's Dicer-independent biogenesis. After pre-miR-144 is exported to the cytoplasm, it is cleaved by Dicer and becomes miR-144. miR-144 then downregulates Dicer, resulting in a negative-feedback system. As other miRNAs have to be processed by Dicer in order to become mature miRNAs, miR-451 is the only miRNA that is not affected in this way, making it the most abundant miRNA.
Recently, numerous studies have suggested a specific mechanism for this local transfer of Microprocessor [13,15,19]. The key elements of the proposed mechanism are the SAFB2 and ERH proteins. It was first observed that the loss of SAFB2 or ERH results in a significant decrease in the processing of miR-15a mut (seed mutants of miR-15a), and that when SAFB2 and ERH are re-expressed, the normal level of biogenesis is restored. This result indicates that both genes are possible mediators of cluster assistance [19]. It has also been observed that SAFB1, which is highly homologous to SAFB2, can compensate for the loss of SAFB2 to some extent, but not completely [19].
To define how exactly SAFB2 assists biogenesis of miR-15a, and whether it is required for cluster assistance or only for the cleavage of pri-miR-15a after Microprocessor has bound to the miRNA, the seed mutant of miR-15a (miR-15a mut ) was used. This miR-15a mut has intrinsic processing activity, and therefore, if SAFB2 is mainly related to cluster assistance, its repression will have effects on miRNA function only when clustering is present. As a result, the loss of SAFB2 and SAFB1 had an impact on miRNA biogenesis only under conditions in which cluster assistance was present, an observation that supports the idea that SAFB proteins are directly involved in the cluster assistance mechanism [19].
In addition, more examples that showcase the involvement of SAFB2 in cluster assistance have been found through experiments. miR-181b in the miR-181a-181b cluster, miR-92a in the miR-17-92 cluster, and miR-425 in the miR-191-425 cluster were shown to be processed by cluster assistance to some extent, and they all had decreased expression when SAFB1 and SAFB2 were absent [19].
A recent study showed a specific interaction between ERH and Microprocessor [33]. The results obtained using knockdown in cells, crystal structures, and various other methods demonstrate that ERH forms a stable complex, with 2:2 stoichiometry, with a conserved region in the N-terminus of DGCR8 [33].
As ERH interacts with SAFB1/SAFB2 and also with Microprocessor [33][34][35], and both SAFB2 and ERH can dimerize, one model of a mechanism for local Microprocessor transfer can be postulated. In this hypothesis, dimerization of SAFB2 and ERH eventually recruits another Microprocessor, which then binds to a nearby recipient hairpin and enhances its processing [13]. Although this model presently lacks supporting evidence, the idea that the dimerization of SAFB2 is somehow required for cluster assistance has been suggested by a recent study [13,15,19]. In that study, the N-terminus of SAFB2 was shortened to produce SAFB2 min (incrementally smaller versions of SAFB2), and the experiment demonstrated that the minimal functional region of the SAFB2 protein during cluster assistance consists of a putative coiled-coil domain [19]. As this domain is responsible for the dimerization of SAFB2, it is logical to suggest that the dimerization of SAFB2 is required for cluster assistance.

miRNA expression can be affected by neighboring clustered miRNA in trans
Recently, data suggesting that neighboring clustered miRNAs regulate the biogenesis of suboptimal miRNA not only in cis, but also in trans have been obtained [16]. An example of this mechanism is the non-canonical, Dicer-independent biogenesis of miR-451.
As mentioned earlier, miR-451 has suboptimal structural features with its short stem and apical loop [6]. Due to its unusually short stem, after being processed by Drosha, pre-miR-451 is not cleaved by Dicer, but instead it goes through a Dicer-independent, non-canonical pathway after it is exported to the cytoplasm. As a result, its second cleavage reaction is performed by AGO2, which cleaves the middle of the 3′ arm of the hairpin [36,37]. It is then trimmed by poly(A)-specific ribonuclease (PARN), finally producing mature miR-451 [38]. miR-451 is currently the only known miRNA that is Dicer-independent. miR-451 is the most highly expressed miRNA in erythrocytes and does not go through the global downregulation of canonical miRNAs that takes place in erythrocytes [14]. A recent study has suggested that this phenomenon could be explained by the trans regulation of miR-144 which is located in the same primary transcript as miR-451 [16].
The efficiencies of AGO2-dependent and Dicer-dependent processing of pre-miRNAs were compared using Northern blot analysis. It was observed that miR-451 was 7.5-fold more abundant than miR-144 in the peripheral blood of adult fish [16]. However, when the efficiency of AGO2 and Dicer was directly measured by comparing wild-type zebrafish pre-miR-451 (pre-miR-451 Ago2 ) and pre-miR-451 Dicer , pre-miR-451 Dicer was processed around 20-fold more efficiently than pre-miR-451 Ago2 [16]. This result does not agree with the relative abundance of miR-451 and miR-144 described above. The inconsistency can be explained by mass spectrometry data on protein abundance during human erythropoiesis, in which Dicer shows a steady decline, whereas the AGO2 concentration remains the same [39]. Furthermore, three sites that are complementary to the miR-144-3p seed sequence were found in the dicer1 3′UTR [16]. This targeting of Dicer by miR-144 was confirmed by reporter assays in zebrafish. Together, these results suggest a possible role for miR-144 in the downregulation of Dicer during erythropoiesis, forming a negative-feedback loop (Fig. 4) [16].
As PARN trims not only the 3′ end of miR-451, but also numerous other canonical miRNAs [40], the decrease in canonical miRNAs caused by the repression of Dicer through the negative feedback of miR-144 allows Dicer-independent miR-451 to be the most abundant miRNA during erythropoiesis, owing to low competition for the final trimming step performed by PARN [4,38,41].
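The idea of sites "complementary to the seed sequence" can be illustrated with a short Python sketch. This is not the method used in the cited study, which is not described here. It assumes one common convention, a perfect 7-nt match to miRNA nucleotides 2-8, and both sequences below are made-up toy examples rather than the real miR-144-3p or dicer1 3′UTR sequences. The function and variable names are hypothetical.

```python
from typing import List

# Watson-Crick complement for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def seed_sites(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions in `utr` that perfectly match the
    reverse complement of miRNA nucleotides 2-8 (an assumed seed definition).
    """
    seed = mirna[1:8]                      # nucleotides 2-8 of the miRNA
    target = reverse_complement_rna(seed)  # sequence expected in the UTR
    positions = []
    start = utr.find(target)
    while start != -1:
        positions.append(start)
        start = utr.find(target, start + 1)
    return positions


# Toy sequences for illustration only (not real miRNA or UTR sequences).
TOY_MIRNA = "UAGCAUCGGAUCCAUGCAUGCA"
TOY_UTR = "AAACGAUGCUGGGUUUCGAUGCUCC"

print(seed_sites(TOY_MIRNA, TOY_UTR))
```

In this toy example the seed is AGCAUCG, its reverse complement is CGAUGCU, and the search reports two candidate sites. Sequence complementarity alone only nominates candidate sites, which is why the targeting described in the text was additionally confirmed with reporter assays.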
miR-144 thus mediates the repression of Dicer which then eventually allows efficient biogenesis of miR-451 by downregulating other canonical miRNAs. This mechanism is an example of clustered miRNAs regulating each other in trans (Fig. 4) [16].
Understanding the biogenesis of miRNAs will provide better insights into gene regulation and might even provide invaluable experimental tools. However, more remains unknown than known about this mechanism, and numerous studies are being conducted in an attempt to answer the open questions. Recent studies suggest a new mechanism in the biogenesis of suboptimal miRNAs [13,15,16,19]. Although it has long been known that a large percentage of miRNAs are located in operons, the biological reasons for this genetic arrangement were not fully understood. However, many recent studies suggest that this clustering of miRNAs assists in the biogenesis of some canonical miRNAs with suboptimal structural features [13,15,16,19]. More examples of cluster assistance are rapidly being found. Cluster assistance between clustered miRNAs occurs both in cis and in trans [16]. Cluster assistance explains why certain suboptimal miRNAs are expressed at the same levels as, or even higher levels than, their neighboring optimal miRNAs. However, many gaps remain in our understanding of the mechanisms of cluster assistance, and miRNA regulation in general needs considerable further study. The mechanism by which local recruitment of Microprocessor benefits the recipient hairpins is one of the areas that needs to be studied in more detail [19]. Although a large percentage of miRNA genes reside in operons and many of them are not involved in cluster assistance, cluster assistance will still provide a basis for further investigations into the regulation of miRNA.