EP4612301A1 - Chemical stability of mrna - Google Patents
Chemical stability of mRNA
- Publication number
- EP4612301A1 (application number EP23817920.4A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- orf
- mrna
- encoded
- polypeptide
- fewer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/67—General methods for enhancing the expression
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K31/00—Medicinal preparations containing organic active ingredients
- A61K31/70—Carbohydrates; Sugars; Derivatives thereof
- A61K31/7088—Compounds having three or more nucleosides or nucleotides
- A61K31/7105—Natural ribonucleic acids, i.e. containing only riboses attached to adenine, guanine, cytosine or uracil and having 3'-5' phosphodiester links
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K48/00—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
- A61K48/005—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy characterised by an aspect of the 'active' part of the composition delivered, i.e. the nucleic acid delivered
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K48/00—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
- A61K48/005—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy characterised by an aspect of the 'active' part of the composition delivered, i.e. the nucleic acid delivered
- A61K48/0066—Manipulation of the nucleic acid to modify its expression pattern, e.g. enhance its duration of expression, achieved by the presence of particular introns in the delivered nucleic acid
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K48/00—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
- A61K48/0091—Purification or manufacturing processes for gene therapy compositions
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K9/00—Medicinal preparations characterised by special physical form
- A61K9/0012—Galenical forms characterised by the site of application
- A61K9/0019—Injectable compositions; Intramuscular, intravenous, arterial, subcutaneous administration; Compositions to be administered through the skin in an invasive manner
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K9/00—Medicinal preparations characterised by special physical form
- A61K9/48—Preparations in capsules, e.g. of gelatin, of chocolate
- A61K9/50—Microcapsules having a gas, liquid or semi-solid filling; Solid microparticles or pellets surrounded by a distinct coating layer, e.g. coated microspheres, coated drug crystals
- A61K9/51—Nanocapsules; Nanoparticles
- A61K9/5107—Excipients; Inactive ingredients
- A61K9/5123—Organic compounds, e.g. fats, sugars
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/88—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation using microencapsulation, e.g. using amphiphile liposome vesicle
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P19/00—Preparation of compounds containing saccharide radicals
- C12P19/26—Preparation of nitrogen-containing carbohydrates
- C12P19/28—N-glycosides
- C12P19/30—Nucleotides
- C12P19/34—Polynucleotides, e.g. nucleic acids, oligoribonucleotides
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2830/00—Vector systems having a special element relevant for transcription
- C12N2830/50—Vector systems having a special element relevant for transcription regulating RNA stability, not being an intron, e.g. poly A signal
Definitions
- mRNA-based therapeutics have shown promise, e.g., as vaccines for infectious diseases.
- mRNAs are susceptible to cleavage through multiple pathways, such as hydrolysis of phosphodiester bonds.
- the disclosure provides RNAs (e.g., mRNAs) in which CpA dinucleotide content has been reduced relative to a wild-type nucleic acid sequence, or minimized, to improve stability of the RNA.
- the disclosure is based, at least in part, on the discovery by the inventors that the phosphodiester bond between the cytidine and adenosine nucleotides of the CpA dinucleotide may be particularly susceptible to non-enzymatic cleavage (e.g., via spontaneous hydrolysis).
- RNA stability provides multiple benefits in the production of RNA therapeutics and prophylactics.
- the improved stability of RNAs in stored RNA compositions allows efficacy to be maintained for longer durations, thereby improving the efficiency of RNA manufacturing.
- Reducing CpA dinucleotide content may be achieved by modifying one or more codons in the open reading frame (ORF) of the RNA without changing the amino acid sequence of an encoded protein.
- one or more UCA codons encoding serine may be changed to UCU, UCC, or UCG, which still encode serine but do not contain a CpA dinucleotide.
- This same approach may be used to reduce or eliminate the presence of CpA dinucleotides in codons encoding proline, threonine, and/or alanine, each of which has exactly one CpA-containing codon (CCA, ACA, and GCA, respectively).
- the only amino acids that must be encoded by a codon containing a CpA dinucleotide are histidine (encoded by CAU and CAC) and glutamine (encoded by CAA and CAG), and so the theoretical minimum of CpA dinucleotides in an RNA sequence is limited only by the number of histidine and glutamine residues present in an encoded protein.
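As a rough illustration of the counting argument above, the following Python sketch counts CpA dinucleotides in an ORF and derives the theoretical minimum as the number of histidine and glutamine codons. The function names (`count_cpa`, `translate`, `theoretical_minimum`) and the short example ORF are hypothetical and do not come from the disclosure; the codon table is the standard genetic code.

```python
# Standard genetic code, built compactly in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_cpa(rna: str) -> int:
    """Count every C immediately followed by A, within or across codons."""
    return sum(rna[i:i + 2] == "CA" for i in range(len(rna) - 1))


def translate(orf: str) -> str:
    """Translate an ORF (length divisible by 3) into one-letter amino acids."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def theoretical_minimum(orf: str) -> int:
    """His (H) and Gln (Q) codons always contain CpA, so they set the floor."""
    protein = translate(orf)
    return protein.count("H") + protein.count("Q")


# Hypothetical 18-nt ORF: Met-His-Ser-Gln-Thr-stop.
example_orf = "AUGCAUUCACAGACAUAA"
print(count_cpa(example_orf), theoretical_minimum(example_orf))
```

In this toy ORF the serine codon UCA and the threonine codon ACA each contribute an avoidable CpA, which is why the observed count is above the floor set by the single histidine and single glutamine.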
- methionine, isoleucine, threonine, lysine, and asparagine must be encoded by codons beginning with an adenosine (A) nucleotide, and so a preceding codon that ends in a cytidine (C) nucleotide will result in a CpA dinucleotide at the junction between the two codons.
- A adenosine
- C cytidine
- a first codon ending in a cytidine (C) nucleotide that immediately precedes a second codon encoding methionine, isoleucine, threonine, lysine, or asparagine may be changed to a codon that encodes the same amino acid as the first codon, but does not end in a C nucleotide.
- GACAUG the first codon
- UAC first codon
- AUG encoding methionine
- one or more serine- or arginine-encoding codons that begin with adenosine nucleotides may be changed to codons that encode the same amino acid, but do not begin with adenosine nucleotides.
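The within-codon and codon-junction substitutions described above can be combined in one pass. The sketch below is one possible approach, not the method of the disclosure: a small dynamic program (hypothetical name `minimize_cpa`) that picks, for every codon, a synonymous codon so that the total number of CpA dinucleotides is minimal, and breaks ties by changing as few codons as possible. It assumes the standard genetic code and an ORF whose length is a multiple of three.

```python
from collections import defaultdict

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS[amino_acid].append(codon)


def count_cpa(rna: str) -> int:
    return sum(rna[i:i + 2] == "CA" for i in range(len(rna) - 1))


def translate(orf: str) -> str:
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def minimize_cpa(orf: str) -> str:
    """Return a synonymous ORF with the fewest CpA dinucleotides."""
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    # best[c] = (CpA count, codons changed, sequence) for the best suffix
    # whose first codon is c; "" stands for the empty suffix after the ORF.
    best = {"": (0, 0, "")}
    for original in reversed(codons):
        current = {}
        for cand in SYNONYMS[CODON_TABLE[original]]:
            internal = int("CA" in cand)  # CpA inside the codon itself
            options = []
            for nxt, (cpa, changed, seq) in best.items():
                # CpA formed at the junction with the following codon.
                junction = int(cand[2] == "C" and nxt.startswith("A"))
                options.append((cpa + internal + junction,
                                changed + int(cand != original),
                                cand + seq))
            current[cand] = min(options)
        best = current
    return min(best.values())[2]


print(minimize_cpa("GACAUG"))
```

For the six-nucleotide string GACAUG that appears in the text, the only CpA sits at the junction between GAC and AUG, and the sketch resolves it by switching the aspartate codon to GAU. This sketch ignores codon optimality and %G/C content, which the disclosure says may also be kept within a desired range.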
- other regions of the RNA, such as the 5′ and 3′ untranslated regions (UTRs), may be modified to reduce CpA dinucleotide abundance.
- UTRs untranslated regions of RNA
- one or more nucleotides of a CpA dinucleotide may be mutated to eliminate CpA dinucleotides from the UTRs.
- a minimum number of CpA dinucleotides that are present in regulatory motifs may be maintained in a UTR.
- a Kozak sequence that serves as the site of translation initiation may comprise one or more CpA dinucleotides, to allow efficient translation, while other CpA dinucleotides are eliminated to improve stability without reducing translation efficiency.
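A minimal sketch of the UTR editing idea follows, assuming each unprotected CpA is removed by changing its cytidine to guanosine (one of several substitutions the text allows). The function name `remove_utr_cpa`, the example UTR, and the protected span standing in for a Kozak-like motif are all hypothetical.

```python
def remove_utr_cpa(utr: str, protected=()) -> str:
    """Replace the C of each CpA with G, skipping protected spans.

    `protected` is a sequence of half-open (start, end) index pairs; a CpA
    is left alone if either of its two nucleotides falls inside a span.
    """
    bases = list(utr)
    for i in range(len(bases) - 1):
        if bases[i] == "C" and bases[i + 1] == "A":
            touches_protected = any(
                start <= pos < end
                for start, end in protected
                for pos in (i, i + 1)
            )
            if not touches_protected:
                bases[i] = "G"  # C -> G cannot create a new CpA
    return "".join(bases)


# Hypothetical 16-nt 5' UTR whose last six nucleotides are kept intact.
utr = "GGCAUUCACAGCCACC"
print(remove_utr_cpa(utr, protected=[(10, 16)]))
```

Choosing guanosine as the replacement keeps the %G/C content unchanged; a real design would also need to check that the edit does not disturb other regulatory motifs or structure.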
- Codon and UTR modification to reduce CpA dinucleotide content may comprise specific substitutions that maintain other features of an mRNA, such as nucleotide composition, codon optimality, and/or structure, within a desired range.
- RNAs having higher %G/C contents may be more stable than RNAs having lower %G/C contents.
- the inventors posit that the formation of intramolecular secondary structures contributes to RNA thermodynamic stability, with G/C-rich RNAs forming more and stronger secondary structures.
- a specific codon may be substituted to maintain or increase the %G/C content of the resulting RNA sequence.
- a first codon ending in a cytidine nucleotide and preceding a second codon beginning with an adenosine nucleotide may be replaced by a codon ending in a guanosine nucleotide, if possible, to avoid reducing the %G/C content of the RNA sequence.
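The preference for a guanosine-ending replacement can be sketched as follows. This is an illustrative helper (hypothetical name `swap_c_ending_codon`) that changes only the third position of a codon ending in C, prefers a synonym ending in G, and never introduces a new CpA inside the codon; the standard genetic code is assumed.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def gc_percent(rna: str) -> float:
    """Percentage of G and C nucleotides in the sequence."""
    return 100.0 * sum(base in "GC" for base in rna) / len(rna)


def swap_c_ending_codon(codon: str) -> str:
    """Pick a synonym that does not end in C, preferring one ending in G."""
    if not codon.endswith("C"):
        return codon
    amino_acid = CODON_TABLE[codon]
    candidates = [
        c for c, aa in CODON_TABLE.items()
        if aa == amino_acid
        and c[:2] == codon[:2]                     # change third position only
        and not c.endswith("C")
        and c.count("CA") <= codon.count("CA")     # no new internal CpA
    ]
    if not candidates:
        return codon
    g_ending = [c for c in candidates if c.endswith("G")]
    return sorted(g_ending or candidates)[0]


print(swap_c_ending_codon("GCC"), swap_c_ending_codon("GAC"))
```

Alanine GCC can become GCG with no loss of %G/C, whereas aspartate GAC has no G-ending synonym and falls back to GAU, which lowers %G/C slightly. That is the "if possible" caveat in the text.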
- some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is greater than or equal to a theoretical minimum and less than or equal to 300% of the theoretical minimum.
- Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is: (i) greater than or equal to a theoretical minimum; and (ii) no more than 11 CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum.
- ORF open reading frame
- Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a CpA dinucleotide content of 6.5% or less.
- Some aspects of the disclosure relate to an mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the mRNA has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%, wherein each of the uridine nucleotides of the ORF comprises a chemical modification, wherein: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleo
- lipid nanoparticle comprising an mRNA described herein, and an ionizable cationic lipid, a non-cationic lipid, a sterol, and a polyethylene glycol (PEG)-modified lipid.
- PEG polyethylene glycol
- Some aspects of the disclosure relate to a method of producing a modified mRNA sequence comprising an ORF encoding a polypeptide, the method comprising modifying a reference mRNA sequence comprising a reference ORF to produce the modified mRNA sequence by: (a) replacing one or more codons in the reference ORF comprising a CpA dinucleotide with a codon that encodes the same amino acid but does not comprise a CpA dinucleotide; and/or (b) replacing one or more codons in the reference ORF that: (1) ends in a cytidine nucleotide; and (2) is immediately followed in the reference ORF by a codon that encodes an isoleucine, methionine, threonine, asparagine, or lysine, or a codon that encodes a serine or arginine and begins with an adenosine nucleotide, with a codon encoding the same amino acid as the replaced cod
- FIG.1 shows the results of sequencing mRNA fragments generated by spontaneous cleavage of a reference mRNA, as a frequency map of cleavage positions, used to determine the positions of spontaneous (non-enzymatic) cleavage. Sequencing reads were aligned to the full-length mRNA sequence, with the 3′ end of the read indicating the nucleotide in the mRNA sequence where cleavage occurred.
- FIGs.2A–2C show the effects of %G/C content and CpA dinucleotide abundance on mRNA structure and stability.
- FIGs.2A and 2B show the kinetics of mRNA purity, as measured by FACE, during storage of unformulated mRNA at 40 °C (FIG.2A) or 25 °C (FIG.2B), for each of three mRNAs containing reduced CpA dinucleotide contents and for a control mRNA.
- FIG.2C shows the kinetics of mRNA purity, as measured by reverse-phase ion pair (RPIP) chromatography, during storage of the same mRNAs formulated in lipid nanoparticles (LNPs) at 25 °C.
- FIGs.3A–3C show the effects of CpA dinucleotide content on in vitro expression of a protein encoded by an mRNA.
- FIG.3A shows the effects of CpA dinucleotide abundance on immunogenicity of mRNAs comprised in lipid nanoparticles (LNP-mRNA compositions).
- mice were administered two doses of the same LNP-mRNA composition on days 1 and 22, with sera collected on day 21, three weeks after administration of the first dose, and day 36, 14 days after administration of the second dose. All mRNAs tested encoded the same antigen with the same amino acid sequence, but individual mRNAs differed in CpA dinucleotide content.
Detailed Description
- Aspects of the disclosure relate to non-naturally occurring (modified) mRNAs containing relatively reduced abundances of CpA dinucleotides, and methods of improving mRNA stability by reducing the number of CpA dinucleotides in the mRNA sequence.
- the disclosure is based, in part, on the discovery by the inventors that the CpA dinucleotide is the most susceptible to spontaneous cleavage in mRNAs containing 1-methylpseudouridine nucleotides in place of conventional uridine nucleotides.
- the compositions and methods described herein are useful, in some embodiments, for providing RNA therapeutics with improved stability, increased expression of encoded proteins, and/or improved efficacy.
- Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is greater than or equal to a theoretical minimum and less than or equal to 300% of the theoretical minimum.
- Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is: (i) greater than or equal to a theoretical minimum; and (ii) no more than 11 CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum.
- the number of CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum is no more than 10, no more than 9, no more than 8, no more than 7, no more than 6, no more than 5, no more than 4, no more than 3, no more than 2, or no more than 1.
- Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a CpA dinucleotide content of 6.5% or less.
- the ORF comprises a CpA dinucleotide content of 6.0% or less, 5.5% or less, 5% or less, 4.5% or less, 4% or less, 3.5% or less, 3.0% or less, 2.5% or less, 2.0% or less, 1.5% or less, 1.0% or less, or 0.5% or less.
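The three kinds of threshold above (percentage of the theoretical minimum, excess CpA per 100 nucleotides, and CpA content) can be computed side by side. In the sketch below the exact definitions are assumptions: the excerpt does not state how "CpA dinucleotide content" is normalized, so it is taken here as CpA count divided by the number of dinucleotide positions (length minus one). The function `cpa_metrics` and the example ORF are hypothetical.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_cpa(rna: str) -> int:
    return sum(rna[i:i + 2] == "CA" for i in range(len(rna) - 1))


def theoretical_minimum(orf: str) -> int:
    """Number of His and Gln codons, whose codons always contain CpA."""
    codons = (orf[i:i + 3] for i in range(0, len(orf), 3))
    return sum(CODON_TABLE[c] in "HQ" for c in codons)


def cpa_metrics(orf: str) -> dict:
    n_cpa = count_cpa(orf)
    floor = theoretical_minimum(orf)
    return {
        # Observed count as a percentage of the floor (undefined if floor is 0).
        "percent_of_minimum": 100.0 * n_cpa / floor if floor else None,
        # CpA above the floor, normalized per 100 nucleotides of ORF.
        "excess_per_100_nt": 100.0 * (n_cpa - floor) / len(orf),
        # ASSUMPTION: content = CpA count / dinucleotide positions.
        "cpa_content_percent": 100.0 * n_cpa / (len(orf) - 1),
    }


metrics = cpa_metrics("AUGCAUUCACAGACAUAA")
print(metrics)
```

A toy 18-nucleotide ORF is far too short to give realistic percentages; the point is only to show how the three quantities relate to the same two counts.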
- (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucle
- the nucleotide sequence of the mRNA comprises a %G/C content of 30% – 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
- one or more nucleotides of the mRNA comprises a chemically modified nucleotide.
- each uridine nucleotide of the mRNA comprises a chemically modified nucleotide.
- Some aspects of the disclosure relate to an mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the mRNA has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%, wherein each of the uridine nucleotides of the ORF comprises a chemical modification, wherein: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleo
- the chemically modified nucleotide comprises N1-methylpseudouridine.
- fewer than 15% of serine residues, fewer than 27% of proline residues, fewer than 28% of threonine residues, and fewer than 23% of alanine residues in the polypeptide are encoded by codons in the ORF comprising a CpA dinucleotide.
- no serine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide
- no proline residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide
- no threonine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide
- no alanine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide.
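The per-residue limits for serine, proline, threonine, and alanine can be checked with a short tally. The sketch below assumes the standard genetic code, in which UCA, CCA, ACA, and GCA are the only CpA-containing codons for those four amino acids; the function name `cpa_codon_fractions` and the example ORF are hypothetical.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The single CpA-containing codon for each of the four amino acids.
CPA_CODONS = {"S": "UCA", "P": "CCA", "T": "ACA", "A": "GCA"}


def cpa_codon_fractions(orf: str) -> dict:
    """Fraction of Ser/Pro/Thr/Ala residues encoded by a CpA-containing codon."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    fractions = {}
    for amino_acid, cpa_codon in CPA_CODONS.items():
        used = [c for c in codons if CODON_TABLE[c] == amino_acid]
        fractions[amino_acid] = used.count(cpa_codon) / len(used) if used else 0.0
    return fractions


# Hypothetical ORF fragment: Ser-Ser-Pro-Thr-Ala-Ala-Ser.
fractions = cpa_codon_fractions("UCAUCUCCAACUGCAGCUAGC")
print(fractions)
```

Each fraction can then be compared against the corresponding limit in the text (for example, fewer than 15% for serine), or against zero for the stricter "no residue" embodiments.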
- no amino acid that immediately precedes an isoleucine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide;
- no amino acid that immediately precedes a methionine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide;
- no amino acid that immediately precedes a threonine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide;
- no amino acid that immediately precedes an asparagine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide;
- no amino acid that immediately precedes a lysine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide.
- no amino acid that immediately precedes an isoleucine, methionine, threonine, asparagine, or lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- no codon in the ORF beginning with an adenosine nucleotide is immediately preceded by a codon in the ORF that ends in a cytidine nucleotide.
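The codon-junction conditions above lend themselves to a direct check. The following sketch, with hypothetical helper names and a hypothetical example ORF, reports the fraction of residues preceding a given amino acid that are encoded by codons ending in cytidine, and lists the codon positions where a C-ending codon is followed by an A-starting codon.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(orf: str) -> list:
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def c_ending_fraction_before(orf: str, residue: str) -> float:
    """Fraction of codons preceding `residue` (one-letter code) that end in C."""
    codons = split_codons(orf)
    preceding = [
        codons[i - 1]
        for i in range(1, len(codons))
        if CODON_TABLE[codons[i]] == residue
    ]
    if not preceding:
        return 0.0
    return sum(c.endswith("C") for c in preceding) / len(preceding)


def junction_cpa_codons(orf: str) -> list:
    """Indices of codons that end in C and are followed by an A-starting codon."""
    codons = split_codons(orf)
    return [
        i for i in range(len(codons) - 1)
        if codons[i].endswith("C") and codons[i + 1].startswith("A")
    ]


# Hypothetical ORF: Asp-Met-Ala-Ile-Asp-Met-stop.
orf = "GACAUGGCCAUUGAUAUGUAA"
print(c_ending_fraction_before(orf, "M"), junction_cpa_codons(orf))
```

An empty list from `junction_cpa_codons` corresponds to the strictest embodiment, in which no A-starting codon is preceded by a C-ending codon.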
- the ORF is codon-optimized for expression in a cell.
- the cell is a mammalian cell.
- the mRNA further comprises: (i) a 5′ untranslated region (UTR); and/or (ii) a 3′ UTR.
- the 5′ UTR is a heterologous UTR and/or the 3′ UTR is a heterologous UTR.
- the 5′ UTR comprises five or fewer, four or fewer, three or fewer, two or fewer, one or fewer, or zero CpA dinucleotides. In some embodiments, the 5′ UTR does not comprise a CpA dinucleotide.
- the 3′ UTR comprises five or fewer, four or fewer, three or fewer, two or fewer, one or fewer, or zero CpA dinucleotides. In some embodiments, the 3′ UTR does not comprise a CpA dinucleotide. In some embodiments, the last nucleotide of the 5′ UTR is not a cytidine nucleotide.
- the 5′ UTR has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
- the ORF has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
- the 3′ UTR has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
- the mRNA further comprises: (iii) a 5′ cap structure; and/or (iv) a poly-A tail.
- the last nucleotide of the 3′ UTR is not a cytidine nucleotide.
- the 5′ cap structure comprises 7mG(5')ppp(5')NlmpNp.
- the level of expression in a mammalian cell of the encoded polypeptide from the mRNA is at least 50% of the level of expression of a reference mRNA comprising a reference open reading frame (rORF) encoding the polypeptide, wherein the rORF comprises a higher number of CpA dinucleotides than the ORF.
- rORF reference open reading frame
- one or more CpA dinucleotides of the mRNA comprises a modified cytidine nucleotide and/or a modified adenosine nucleotide.
- the number of CpA dinucleotides comprising an unmodified cytidine nucleotide and an unmodified adenosine nucleotide in the ORF is 100%, 95% or less, 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, or 10% or less of the total number of histidine and glutamine residues in the polypeptide.
- the polypeptide comprises 9–5,000, 20–4,000, 30–3,000, 40–2,000, or 50–1,500 amino acids.
- the polypeptide is a vaccine antigen or a therapeutic protein.
- a coefficient of degradation at 25 °C of the mRNA is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide.
- a composition comprising a plurality of the mRNAs remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of mRNAs comprising a wild-type ORF encoding the polypeptide.
- storage of the mRNA is conducted at a temperature between about 2 °C and about 8 °C.
- the mRNA is stored in a buffer comprising 10–50 mM Tris and 5–10% sucrose, wherein the buffer has a pH of about 7.3 to about 7.6.
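To make the storage-time comparison concrete, the sketch below assumes, purely for illustration, that purity decays by first-order kinetics and that the "coefficient of degradation" behaves like a first-order rate constant. Neither assumption is stated in this excerpt, and the rate values are invented for the example.

```python
import math


def days_above_threshold(initial_purity: float, rate_per_day: float,
                         threshold: float = 50.0) -> float:
    """Days until purity falls to `threshold`, assuming first-order decay.

    ASSUMPTION: purity(t) = initial_purity * exp(-rate_per_day * t).
    """
    if initial_purity <= threshold:
        return 0.0
    return math.log(initial_purity / threshold) / rate_per_day


# Hypothetical rate constants: the modified mRNA degrades at 50% of the
# reference rate.
reference_days = days_above_threshold(100.0, 0.010)
modified_days = days_above_threshold(100.0, 0.005)
print(round(reference_days, 1), round(modified_days, 1),
      round(modified_days - reference_days, 1))
```

Under this assumed model, halving the rate constant doubles the time spent above 50% purity, so the extra days gained depend on how long the reference mRNA lasts in the first place.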
- the stability of the mRNA is increased relative to a reference mRNA having a higher number of CpA dinucleotides, the reference mRNA comprising a reference open reading frame (rORF) encoding the polypeptide, wherein the rORF has a higher number of CpA dinucleotides than the ORF.
- lipid nanoparticle comprising an mRNA described herein, and an ionizable cationic lipid, a non-cationic lipid, a sterol, and a polyethylene glycol (PEG)-modified lipid.
- the lipid nanoparticle comprises 20–60% ionizable cationic lipid, 5–25% non-cationic lipid, 25–55% cholesterol, and 0.5–15% polyethylene glycol (PEG)-modified lipid.
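A candidate formulation can be checked against these ranges with a few lines of code. In the sketch below the dictionary keys, the function name `out_of_range`, and the example composition are hypothetical, and the percentage basis (for instance, mole percent) is not stated in this excerpt, so the values are compared as given.

```python
# Ranges taken from the text; the percentage basis is not specified here.
LNP_RANGES = {
    "ionizable_cationic_lipid": (20.0, 60.0),
    "non_cationic_lipid": (5.0, 25.0),
    "cholesterol": (25.0, 55.0),
    "peg_lipid": (0.5, 15.0),
}


def out_of_range(composition: dict) -> list:
    """Return the component names whose percentage falls outside its range.

    A component missing from `composition` is treated as 0% and reported.
    """
    return [
        name
        for name, (low, high) in LNP_RANGES.items()
        if not low <= composition.get(name, 0.0) <= high
    ]


# Hypothetical composition used only to exercise the check.
candidate = {
    "ionizable_cationic_lipid": 50.0,
    "non_cationic_lipid": 10.0,
    "cholesterol": 38.5,
    "peg_lipid": 1.5,
}
print(out_of_range(candidate))
```

An empty list means every component lies within its stated range; the check does not verify that the percentages sum to 100.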
- a coefficient of degradation at 25 °C of the mRNA in the lipid nanoparticle is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide.
- a composition comprising a plurality of the lipid nanoparticles remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of the lipid nanoparticles and mRNAs comprising a wild-type ORF encoding the polypeptide.
- the lipid nanoparticle further comprises a stabilizing compound of Formula (I) [structure not reproduced], or a tautomer or solvate thereof, wherein: [bond symbol not reproduced] is a single bond or a double bond; R1 is H; R2 is OCH3, or together with R3 is OCH2O; R3 is OCH3, or together with R2 is OCH2O; R4 is H; R5 is H or OCH3; R6 is OCH3; R7 is H or OCH3; R8 is H; R9 is H or CH3; and X is a pharmaceutically acceptable anion.
- the stabilizing compound is a compound of Formula (Ic) [structure not reproduced], or a tautomer or solvate thereof.
- the lipid nanoparticle further comprises a stabilizing compound of Formula (II) [structure not reproduced], or a tautomer or solvate thereof, wherein: R10 is H; R11 is H; R12 together with R13 is OCH2O; R14 is H; R15 together with R16 is OCH2O; R17 is H; and X is a pharmaceutically acceptable anion.
- Some aspects of the disclosure relate to a pharmaceutical composition comprising a lipid nanoparticle described herein, and a pharmaceutically acceptable excipient.
- Some aspects of the disclosure relate to a method of producing a modified mRNA sequence comprising an ORF encoding a polypeptide, the method comprising modifying a reference mRNA sequence comprising a reference ORF to produce the modified mRNA sequence by: (a) replacing one or more codons in the reference ORF comprising a CpA dinucleotide with a codon that encodes the same amino acid but does not comprise a CpA dinucleotide; and/or (b) replacing one or more codons in the reference ORF that: (1) ends in a cytidine nucleotide; and (2) is immediately followed in the reference ORF by a codon that encodes an isoleucine, methionine, threonine, asparagine, or lysine, or a codon that encodes a serine or arginine and begins with an adenosine nucleotide, with a codon encoding the same amino acid as the replaced cod
- the reference mRNA sequence further comprises: (i) a reference 5′ untranslated region (UTR); and/or (ii) a reference 3′ UTR.
- the reference 5′ UTR is a heterologous 5′ UTR and/or the reference 3′ UTR is a heterologous 3′ UTR.
- the replacing comprises changing the last nucleotide of the reference 5′ UTR from a cytidine nucleotide to a non-cytidine nucleotide.
- the reference mRNA sequence further comprises: (iii) a 5′ cap structure; and/or (iv) a poly-A region.
- the replacing comprises changing the last nucleotide of the reference 3′ UTR from a cytidine nucleotide to a non-cytidine nucleotide.
- the method further comprises replacing one or more cytidine nucleotides in the reference mRNA sequence with guanosine nucleotides.
- the method further comprises replacing one or more unmodified cytidine nucleotides in the reference mRNA sequence with modified cytidine nucleotides.
- the method further comprises replacing one or more unmodified adenosine nucleotides in the reference mRNA sequence with modified adenosine nucleotides.
- the method further comprises replacing one or more adenosine nucleotides in the reference mRNA sequence with uracil nucleotides. In some embodiments, the method further comprises replacing one or more adenosine nucleotides in the reference mRNA sequence, that are not immediately followed by a second adenosine nucleotide, with cytidine nucleotides. In some embodiments, the method further comprises replacing one or more adenosine nucleotides in the reference mRNA sequence with guanosine nucleotides.
- the ORF of the modified mRNA sequence comprises a number of CpA dinucleotides that is greater than or equal to the theoretical minimum and less than or equal to 300% of the theoretical minimum.
- the ORF of the modified mRNA sequence comprises a number of CpA dinucleotides that is: (i) greater than or equal to a theoretical minimum; and (ii) no more than 11 CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum.
- the number of CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum is no more than 10, no more than 9, no more than 8, no more than 7, no more than 6, no more than 5, no more than 4, no more than 3, no more than 2, or no more than 1.
- the ORF of the modified mRNA sequence comprises a CpA dinucleotide content of 6.5% or less.
- the ORF of the modified mRNA sequence comprises a CpA dinucleotide content of 6.0% or less, 5.5% or less, 5% or less, 4.5% or less, 4% or less, 3.5% or less, 3.0% or less, 2.5% or less, 2.0% or less, 1.5% or less, 1.0% or less, or 0.5% or less.
- in the modified mRNA sequence: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end
- fewer than 15% of serine residues, fewer than 27% of proline residues, fewer than 28% of threonine residues, and fewer than 23% of alanine residues in the polypeptide are encoded by codons in the ORF that comprise a CpA dinucleotide.
- in the modified mRNA sequence: (a) no serine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (b) no proline residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (c) no threonine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; and/or (d) no alanine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide.
- in the modified mRNA sequence: (a) no amino acid that immediately precedes an isoleucine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (b) no amino acid that immediately precedes a methionine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (c) no amino acid that immediately precedes a threonine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (d) no amino acid that immediately precedes an asparagine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (e) no amino acid that immediately precedes a lysine residue in the polypeptide is encoded by a codon in the ORF that ends in a
- no amino acid that immediately precedes an isoleucine, methionine, threonine, asparagine, or lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- no codon in the ORF beginning with an adenosine nucleotide is immediately preceded by a codon in the ORF that ends in a cytidine nucleotide.
- the modified mRNA sequence comprises a %G/C content of 30% – 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
- one or more nucleotides of the modified mRNA sequence comprises a chemically modified nucleotide.
- each of the uridine nucleotides of the modified mRNA sequence comprises a chemically modified nucleotide.
- the chemically modified nucleotide comprises N1-methylpseudouridine.
- one or more CpA dinucleotides of the modified mRNA sequence comprises a modified cytidine nucleotide and/or a modified adenosine nucleotide.
- the number of CpA dinucleotides comprising an unmodified cytidine nucleotide and an unmodified adenosine nucleotide in the ORF of the modified mRNA sequence is 100%, 95% or less, 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, or 10% or less of the total number of histidine and glutamine residues in the polypeptide.
- the polypeptide comprises 9–5,000, 20–4,000, 30–3,000, 40–2,000, or 50–1,500 amino acids.
- the polypeptide is a vaccine antigen or a therapeutic protein.
- the ORF of the modified mRNA sequence is codon-optimized for expression in a cell.
- the cell is a mammalian cell.
- the cell is a human cell.
- the method further comprises transcribing the modified mRNA sequence to produce a modified mRNA.
- a level of expression in a mammalian cell of the encoded polypeptide from the modified mRNA is at least 80% of a level of expression of the reference mRNA.
- a coefficient of degradation at 25 °C of the modified mRNA is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising the reference ORF.
- a composition comprising a plurality of the mRNAs remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of mRNAs comprising the reference ORF.
- storage of the modified mRNA is conducted at a temperature between about 2 °C and about 8 °C.
- the modified mRNA has increased stability relative to a reference mRNA comprising the reference mRNA sequence.
- CpA dinucleotide contents and mRNA stability
- [0054] Some aspects relate to mRNAs encoding polypeptides, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, where the number of CpA dinucleotides in the ORF is at least equal to (i.e., equal to or greater than) a theoretical minimum number of CpA dinucleotides and at most (i.e., less than or equal to) 500% of the theoretical minimum.
- a “theoretical minimum” number of CpA dinucleotides refers to the number of histidine and glutamine residues present in a polypeptide encoded by an open reading frame. If a histidine or glutamine is present in an amino acid sequence, a codon beginning with CA is required to encode that amino acid, and so some CpA dinucleotides are required for a nucleic acid to encode a protein comprising histidine and/or glutamine residues.
- amino acids encoded by codons containing CpA dinucleotides may also be encoded by codons that do not contain a CpA dinucleotide (e.g., threonine is encoded by ACA, but ACU, ACC, and ACG codons also encode threonine).
- portions of an mRNA sequence other than codons encoding histidine or glutamine may be mutated to reduce the number of CpA dinucleotides in an mRNA sequence to a level closer to the theoretical minimum.
- the number of CpA dinucleotides in an ORF of a modified mRNA or modified sequence is 100% – 400%, 100% – 300%, 100% – 200%, 100% – 150%, or 100% – 125% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 400% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 300% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 250% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 200% of the theoretical minimum.
- the number of CpA dinucleotides is at most 150% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 125% of the theoretical minimum.
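The relationship between the CpA count of an ORF and its theoretical minimum can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and chosen only for illustration; the sketch assumes the standard genetic code, one-letter amino acid codes, and an RNA sequence written with A, C, G, and U.

```python
def count_cpa(seq: str) -> int:
    """Count CpA dinucleotides (a C immediately followed by an A)."""
    # "CA" cannot overlap itself, so str.count finds every occurrence.
    return seq.upper().count("CA")


def theoretical_minimum(protein: str) -> int:
    """Number of histidine (H) and glutamine (Q) residues in the polypeptide."""
    protein = protein.upper()
    return protein.count("H") + protein.count("Q")


def percent_of_minimum(orf: str, protein: str) -> float:
    """CpA count of the ORF expressed as a percentage of the theoretical minimum."""
    minimum = theoretical_minimum(protein)
    if minimum == 0:
        raise ValueError("polypeptide has no His/Gln residues; the ratio is undefined")
    return 100.0 * count_cpa(orf) / minimum


# Toy example (hypothetical sequences): Met-His-Gln-Thr followed by a stop codon.
protein = "MHQT"
reference_orf = "AUGCAUCAGACAUGA"  # Thr encoded by ACA, which contains a CpA
modified_orf = "AUGCAUCAGACUUGA"   # Thr encoded by ACU instead

print(percent_of_minimum(reference_orf, protein))  # 150.0
print(percent_of_minimum(modified_orf, protein))   # 100.0
```

In this toy case the single synonymous change from ACA to ACU brings the ORF from 150% of the theoretical minimum down to exactly the minimum.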
- References to the ORF of an mRNA, its length, the polypeptide it encodes, and codons within the ORF, are to be understood as referring to the longest ORF in the mRNA, not internal open reading frames in the same frame as the ORF, alternative reading frames, or sequences that may be translated due to initiation at a start codon that is downstream from the first occurrence of the sequence AUG in the mRNA.
- Some aspects relate to mRNAs comprising an ORF encoding a polypeptide, with the ORF having a %CpA dinucleotide content of 6.5% or less.
- Some embodiments of such mRNAs contain ORFs with %CpA dinucleotide contents that are reduced, relative to a nucleic acid sequence encoding the same polypeptide (i.e., having the same amino acid sequence).
- the %CpA dinucleotide content (percentage CpA dinucleotide content) of a sequence can be determined by dividing the number of CpA dinucleotides in the sequence by the total number of dinucleotides in the sequence.
- the number of dinucleotides in a sequence is one fewer than the number of nucleotides.
- an ORF having 60 CpA dinucleotides and being 301 nucleotides in length has a %CpA dinucleotide content of 20%.
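The calculation described above can be sketched in a few lines of Python. The function name and the synthetic test sequence are assumptions made for illustration only; the sequence is not a real ORF and is built solely to reproduce the 60-in-301 example.

```python
def percent_cpa_content(seq: str) -> float:
    """%CpA content: CpA dinucleotides divided by total dinucleotides, times 100."""
    seq = seq.upper()
    total_dinucleotides = len(seq) - 1  # one fewer than the number of nucleotides
    if total_dinucleotides < 1:
        raise ValueError("sequence must contain at least two nucleotides")
    return 100.0 * seq.count("CA") / total_dinucleotides


# Synthetic 301-nucleotide sequence containing exactly 60 CpA dinucleotides.
synthetic = "CAUUU" * 60 + "U"
print(len(synthetic))                  # 301
print(percent_cpa_content(synthetic))  # 20.0
```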
- the ORF of an mRNA described herein has a %CpA dinucleotide content of 6.0% or less, 5.0% or less, 4.5% or less, 4.0% or less, 3.5% or less, 3.0% or less, 2.5% or less, 2.0% or less, 1.5% or less, 1.0% or less, or 0.5% or less.
- the ORF has a %CpA dinucleotide content of 6.0% or less.
- the ORF has a %CpA dinucleotide content of 5.5% or less.
- the ORF has a %CpA dinucleotide content of 5.0% or less.
- the ORF has a %CpA dinucleotide content of 4.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 4.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 3.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 3.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 2.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 2.0% or less.
- the ORF has a %CpA dinucleotide content of 1.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 1.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 0.5% or less. [0057] In some embodiments of the modified mRNAs described herein or modified mRNA sequences produced by the methods described herein, an increased percentage of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine.
- a CpA dinucleotide is comprised within a codon if it forms either (i) the first and second nucleotides of a codon, or (ii) the second and third nucleotides of the codon, but not if it forms the third nucleotide of one codon and the first nucleotide of the second codon (i.e., the CpA dinucleotide bridges two codons).
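The distinction between CpA dinucleotides within a codon and those bridging two codons depends only on the position of the cytidine relative to the reading frame. The following Python sketch illustrates this, assuming an ORF that starts in frame at index 0 and the standard genetic code (in which every codon beginning with CA encodes histidine or glutamine). All names and the example sequence are hypothetical.

```python
def classify_cpa(orf: str):
    """Return (within, bridging): 0-based indices of the C of each CpA.

    A CpA whose C is at codon position 1 or 2 lies within a single codon;
    a CpA whose C is at codon position 3 bridges two adjacent codons.
    """
    orf = orf.upper()
    within, bridging = [], []
    for i in range(len(orf) - 1):
        if orf[i:i + 2] == "CA":
            (bridging if i % 3 == 2 else within).append(i)
    return within, bridging


def fraction_in_his_gln(orf: str) -> float:
    """Fraction of all CpA dinucleotides that start a His/Gln codon (CAN)."""
    within, bridging = classify_cpa(orf)
    total = len(within) + len(bridging)
    if total == 0:
        return 0.0
    # Under the standard code, a CpA at codon positions 1-2 means a CAN codon.
    in_his_gln = sum(1 for i in within if i % 3 == 0)
    return in_his_gln / total


# Hypothetical ORF: AUG CAU ACA GCC AAG UGA
orf = "AUGCAUACAGCCAAGUGA"
print(classify_cpa(orf))  # ([3, 7], [11])
```

Here the CpA at index 3 is in the histidine codon CAU, the one at index 7 is inside the threonine codon ACA, and the one at index 11 bridges GCC and AAG.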
- At least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or up to 100% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine.
- CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 50% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine.
- At least 60% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 70% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 80% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 90% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine.
- CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, 100% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. [0058] In some embodiments of the modified mRNAs described herein or modified mRNA sequences produced by the methods described herein, the %CpA dinucleotide content in the ORF is reduced, relative to the %CpA dinucleotide content in a wild-type or reference ORF encoding the same polypeptide (e.g., having the same amino acid sequence).
- a “wild-type ORF,” as used herein, is the nucleotide sequence of a naturally occurring ORF that encodes the same polypeptide (having the same amino acid sequence) as the ORF of a modified mRNA or modified mRNA sequence, where the naturally occurring ORF is present on a naturally occurring mRNA.
- a “reference ORF,” as a starting sequence for modification to reduce %CpA dinucleotide content in a modified mRNA sequence, may be a wild-type ORF, or a non-naturally occurring ORF.
- an ORF of a modified mRNA or modified mRNA sequence has a %CpA dinucleotide content that is 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, or 30% or less of the %CpA dinucleotide content in a wild-type or reference ORF encoding the same polypeptide.
- At least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of CpA dinucleotides in the wild-type or reference ORF that are not comprised in a codon encoding histidine or glutamine are absent in a modified mRNA sequence encoding the polypeptide.
- Some aspects relate to mRNAs comprising an ORF encoding a polypeptide, where the ORF comprises a number of CpA dinucleotides that is greater than or equal to a theoretical minimum, but the number of CpA dinucleotides above (greater than) the theoretical minimum is no more than 11 per every 100 nucleotides of the ORF.
- an mRNA having a theoretical minimum of 20 CpA dinucleotides (due to encoding a polypeptide with a total of 20 histidine and/or glutamine residues), and encoding a protein that is 99 amino acids in length, thus having an ORF 300 nucleotides in length (including the STOP codon), could have 33 CpA dinucleotides above the minimum of 20 and still satisfy the requirement of having no more than 11 CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum.
- the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 10.
- the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 9. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 8. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 7. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 6. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 5.
- the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 4. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 3. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 2. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 1.
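The arithmetic behind the "per 100 nucleotides above the theoretical minimum" criterion can be written out as a short Python sketch. The function name is hypothetical; the numbers are those of the 99-amino-acid example given above.

```python
def excess_cpa_per_100_nt(cpa_count: int, theoretical_minimum: int, orf_length: int) -> float:
    """CpA dinucleotides above the theoretical minimum, per 100 nucleotides of ORF."""
    if orf_length <= 0:
        raise ValueError("ORF length must be positive")
    return 100.0 * (cpa_count - theoretical_minimum) / orf_length


# 20 His/Gln residues, 300-nucleotide ORF, 53 CpA dinucleotides in total
# (33 above the minimum of 20).
rate = excess_cpa_per_100_nt(cpa_count=53, theoretical_minimum=20, orf_length=300)
print(rate)        # 11.0
print(rate <= 11)  # True
```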
- the proportion of codons encoding a given amino acid is lower than the expected proportion based on codon usage frequencies in nature. For example, approximately 15% of serine residues in human proteins are encoded by codons having the RNA sequence UCA (DNA sequence TCA). Similarly, approximately 27% of proline residues are encoded by CCA codons, approximately 28% of threonine residues are encoded by ACA codons, and approximately 23% of alanine residues are encoded by GCA codons.
- fewer than 15%, fewer than 12%, fewer than 10%, fewer than 8%, fewer than 6%, fewer than 5%, fewer than 4%, fewer than 3%, fewer than 2%, or fewer than 1% of serine residues are encoded by UCA codons.
- fewer than 27%, fewer than 25%, fewer than 20%, fewer than 15%, fewer than 12%, fewer than 10%, fewer than 8%, fewer than 6%, fewer than 5%, fewer than 4%, fewer than 3%, fewer than 2%, or fewer than 1% of proline residues are encoded by CCA codons.
- fewer than 28%, fewer than 25%, fewer than 20%, fewer than 15%, fewer than 12%, fewer than 10%, fewer than 8%, fewer than 6%, fewer than 5%, fewer than 4%, fewer than 3%, fewer than 2%, or fewer than 1% of threonine residues are encoded by ACA codons.
- fewer than 23%, fewer than 20%, fewer than 15%, fewer than 12%, fewer than 10%, fewer than 8%, fewer than 6%, fewer than 5%, fewer than 4%, fewer than 3%, fewer than 2%, or fewer than 1% of alanine residues are encoded by GCA codons.
- fewer than 2% of serine residues are encoded by codons comprising the sequence UCA.
- fewer than 12% of proline residues are encoded by codons comprising the sequence CCA.
- fewer than 3% of threonine residues are encoded by codons comprising the sequence ACA.
- fewer than 5% of alanine residues are encoded by codons comprising the sequence GCA.
- no serine residue is encoded by a codon comprising the RNA sequence UCA.
- no proline residue is encoded by a codon comprising the sequence CCA.
- no threonine residue is encoded by a codon comprising the sequence ACA. In some embodiments, no alanine residue is encoded by a codon comprising the sequence GCA. In some embodiments, each serine, proline, threonine, and alanine residue is encoded by a codon that does not comprise a CpA dinucleotide. In some embodiments, none of the serine, proline, threonine, and alanine residues is encoded by a codon comprising a CpA dinucleotide.
- codons encoding serine, proline, threonine, and/or alanine are contemplated because such codons may contain CpA dinucleotides in humans, but similar approaches are contemplated for reducing numbers of CpA dinucleotides in mRNAs suitable for introduction into cells with different genetic codes in which other amino acids may be encoded by codons containing CpA dinucleotides.
- the proportion of codons immediately preceding a codon encoding a given amino acid is lower than the expected proportion based on codon usage frequencies in nature.
- For example, approximately 30% of codons in human open reading frames end in cytidine nucleotides.
- codons encoding isoleucine, methionine, threonine, asparagine, and lysine cannot be mutated to begin with a different nucleotide without changing the encoded amino acid
- an upstream codon may be substituted with a codon that does not end in a cytidine nucleotide, to reduce the abundance of CpA dinucleotides formed at the junction between two codons.
- serine may be encoded by codons comprising the sequence AGU or AGC
- arginine may be encoded by codons comprising the sequence AGA or AGG.
- substituting the codons immediately preceding such serine-encoding AGU and AGC codons, and/or such arginine-encoding AGA and AGG codons, may also reduce the abundance of such CpA dinucleotides at the junctions between two codons.
- serine and arginine may also be encoded by codons that do not begin with adenosine nucleotides.
- serine may be encoded by codons beginning with UC and ending with a guanosine, uridine, or cytidine nucleotide
- arginine may be encoded by codons beginning with CG and ending with any third nucleotide.
- codons encoding serine or arginine, and beginning with adenosine nucleotides may be substituted with alternative codons that encode the same amino acid but do not begin with an adenosine nucleotide.
- substitution of codons immediately preceding codons encoding isoleucine, methionine, asparagine, lysine, serine, or arginine is specifically contemplated because all codons encoding isoleucine, methionine, asparagine, and lysine, and certain codons encoding serine and arginine, begin with adenosine nucleotides in humans, but similar approaches are contemplated for reducing numbers of CpA dinucleotides in mRNAs suitable for introduction into cells with different genetic codes in which other amino acids are encoded by codons beginning with adenosine nucleotides.
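The junction CpA dinucleotides discussed above can be measured with a minimal Python sketch that reports the fraction of adenosine-initial codons preceded by a cytidine-terminal codon. The function name and sequences are hypothetical. As a labeled assumption, the first codon of the ORF (typically AUG) is left out of the denominator because no codon of the ORF precedes it.

```python
def fraction_a_codons_preceded_by_c(orf: str) -> float:
    """Fraction of A-initial codons whose upstream codon ends in C.

    The first codon is excluded because it has no upstream codon in the ORF.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore any trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    a_initial = 0
    preceded_by_c = 0
    for upstream, codon in zip(codons, codons[1:]):
        if codon[0] == "A":
            a_initial += 1
            if upstream[2] == "C":
                preceded_by_c += 1
    return preceded_by_c / a_initial if a_initial else 0.0


# Hypothetical ORF: AUG GCC AAG GCU AUC UGA
reference = "AUGGCCAAGGCUAUCUGA"
# Same polypeptide, with the alanine codon GCC upstream of AAG changed to GCU.
modified = "AUGGCUAAGGCUAUCUGA"

print(fraction_a_codons_preceded_by_c(reference))  # 0.5
print(fraction_a_codons_preceded_by_c(modified))   # 0.0
```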
- fewer than 30% of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 25% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 20% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide.
- 15% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 12% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 10% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide.
- codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 6% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 5% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide.
- 4% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 3% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 2% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide.
- 1% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, no codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. [0063] In some embodiments, fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- no amino acid that immediately precedes an isoleucine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- no amino acid that immediately precedes a methionine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- no amino acid that immediately precedes a threonine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- no amino acid that immediately precedes an asparagine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- no amino acid that immediately precedes a lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- fewer than 30% of amino acids that immediately precede a serine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede a serine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- no amino acid that immediately precedes a serine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- fewer than 30% of amino acids that immediately precede an arginine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede an arginine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide.
- no amino acid that immediately precedes an arginine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- no amino acid that immediately precedes an isoleucine, methionine, threonine, asparagine, or lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
- no amino acid that immediately precedes a serine or arginine in the polypeptide, where the serine or arginine is encoded by a codon beginning with an adenosine nucleotide, is encoded by a codon that ends in a cytidine nucleotide.
- a codon comprising a CpA dinucleotide may be substituted with any synonymous codon (i.e., a codon encoding the same amino acid as the substituted codon) that does not comprise a CpA dinucleotide.
- Multiple codons comprising CpA dinucleotides may be substituted with the same synonymous codon, or with different synonymous codons.
- two or more ACA codons may each be substituted with an ACU codon, or one ACA codon may be substituted with an ACC codon and another may be substituted with an ACG codon.
- Substituting multiple instances of the same codon with different synonymous codons may be useful, for example, to achieve a desired distribution of codons encoding a given amino acid in an mRNA sequence.
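The synonymous substitution described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of any particular embodiment: the names `SYNONYMS` and `substitute_cpa_codons` are hypothetical, only the four CpA-containing codons UCA, CCA, ACA, and GCA are handled, and the default replacement is the U-ending synonym. A caller may supply several replacements per codon, which are then used in rotation.

```python
from itertools import cycle

# CpA-containing codons and their CpA-free synonyms in the same codon box.
SYNONYMS = {
    "UCA": ["UCU", "UCC", "UCG"],  # serine
    "CCA": ["CCU", "CCC", "CCG"],  # proline
    "ACA": ["ACU", "ACC", "ACG"],  # threonine
    "GCA": ["GCU", "GCC", "GCG"],  # alanine
}


def substitute_cpa_codons(orf: str, choices=None) -> str:
    """Replace UCA/CCA/ACA/GCA codons with synonymous CpA-free codons.

    choices maps a codon to a list of replacements used in rotation,
    e.g. {"ACA": ["ACC", "ACG"]}; codons not listed default to the U-ending synonym.
    """
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of three")
    pools = {codon: cycle(alts[:1]) for codon, alts in SYNONYMS.items()}
    for codon, alts in (choices or {}).items():
        if codon not in SYNONYMS or not alts or any(a not in SYNONYMS[codon] for a in alts):
            raise ValueError(f"invalid replacement list for {codon!r}")
        pools[codon] = cycle(alts)
    out = []
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        out.append(next(pools[codon]) if codon in pools else codon)
    return "".join(out)


# Hypothetical ORF: AUG ACA UCA ACA UGA
orf = "AUGACAUCAACAUGA"
print(substitute_cpa_codons(orf))                           # AUGACUUCUACUUGA
print(substitute_cpa_codons(orf, {"ACA": ["ACC", "ACG"]}))  # AUGACCUCUACGUGA
```

Replacements ending in cytidine (e.g., ACC) can create a new CpA at the codon junction when the next codon begins with adenosine, so the output of such a substitution would still need to be checked for bridging CpA dinucleotides.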
- 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer UCA codons are substituted with a UCC codon.
- 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer UCA codons are substituted with a UCG codon.
- the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding serine residues are UCU codons.
- the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding serine residues are UCC codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding serine residues are UCG codons.
- 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer GCA codons are substituted with a GCC codon.
- 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer GCA codons are substituted with a GCG codon.
- 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of GCA codons are substituted with a GCC codon.
- the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding alanine residues are GCU codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding alanine residues are GCC codons.
- the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding alanine residues are GCG codons.
- 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer ACA codons are substituted with an ACC codon.
- 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer ACA codons are substituted with an ACG codon.
- 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of ACA codons are substituted with an ACC codon.
- 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of ACA codons are substituted with an ACG codon.
- the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding threonine residues are ACU codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding threonine residues are ACC codons.
- the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding threonine residues are ACG codons.
- 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer CCA codons are substituted with a CCC codon.
- 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer CCA codons are substituted with a CCG codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of CCA codons are substituted with a CCC codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of CCA codons are substituted with a CCG codon.
- the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding proline residues are CCU codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding proline residues are CCC codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding proline residues are CCG codons.
- substituting multiple instances of a given codon with the same synonymous codon may be useful, for example, to achieve a desired property of an mRNA sequence (e.g., %G/C content).
- one or more codons are substituted with codons comprising a higher %G/C content.
- 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of UCA codons are substituted with codons comprising either UCC or UCG.
- 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CCA codons are substituted with codons comprising either CCC or CCG.
- 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of ACA codons are substituted with codons comprising either ACC or ACG.
- 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of GCA codons are substituted with codons comprising either GCC or GCG.
- one or more codons are substituted with codons comprising an equal %G/C content.
- 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of UCA codons are substituted with UCU codons.
- 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CCA codons are substituted with CCU codons.
- 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of ACA codons are substituted with ACU codons.
- CpA dinucleotide abundance may also be reduced by substituting nucleotides in untranslated regions (UTRs) of an mRNA, such as a 5′ UTR or 3′ UTR.
- the extent to which mRNA stability may be improved by substituting one or more nucleotides of the 5′ UTR or 3′ UTR depends on the abundance of CpA dinucleotides in the sequence of unmodified UTRs. In some embodiments, 50% or more, 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a 5′ UTR are removed by substitution.
- CpA dinucleotides in a 3′ UTR are removed by substitution.
- Removing one or more CpA dinucleotides from an mRNA sequence may be achieved by substituting the cytidine nucleotide, the adenosine nucleotide, or both nucleotides of a CpA dinucleotide with different nucleotides, provided that the substitution does not introduce a new CpA dinucleotide into the sequence.
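The proviso that a substitution must not introduce a new CpA dinucleotide can be checked mechanically. The following Python sketch is a hypothetical illustration (the function names and the short example sequence are assumptions): it applies a single-nucleotide substitution and rejects it if any CpA appears at a position where none existed before.

```python
def cpa_positions(seq: str) -> set:
    """0-based indices of the C of every CpA dinucleotide in seq."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CA"}


def substitute_without_new_cpa(seq: str, pos: int, new_nt: str) -> str:
    """Replace the nucleotide at pos, refusing substitutions that create a new CpA."""
    seq = seq.upper()
    new_nt = new_nt.upper()
    if not 0 <= pos < len(seq) or len(new_nt) != 1:
        raise ValueError("pos must index seq and new_nt must be a single nucleotide")
    candidate = seq[:pos] + new_nt + seq[pos + 1:]
    if cpa_positions(candidate) - cpa_positions(seq):
        raise ValueError("substitution introduces a new CpA dinucleotide")
    return candidate


# Hypothetical UTR fragment with one CpA (C at index 1, A at index 2).
fragment = "GCAAU"
print(substitute_without_new_cpa(fragment, 2, "U"))  # GCUAU
# Replacing the same A with C would give GCCAU, which moves the CpA
# to index 2 rather than removing it, so that substitution is rejected.
```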
- the modified mRNA comprises a 5′ UTR that does not comprise a CpA dinucleotide.
- an mRNA described herein comprises a 3′ UTR that does not comprise a CpA dinucleotide.
- an mRNA sequence comprises one or more CpA dinucleotides that are present in regulatory motifs.
- the 5′ UTR comprises 5 or fewer, 4 or fewer, 3 or fewer, 2 or fewer, 1 or fewer, or 0 CpA dinucleotides.
- the 5′ UTR comprises no more than five CpA dinucleotides.
- the 5′ UTR comprises no more than four CpA dinucleotides.
- the 5′ UTR comprises no more than three CpA dinucleotides. In some embodiments, the 5′ UTR comprises no more than two CpA dinucleotides. In some embodiments, the 5′ UTR comprises no more than one CpA dinucleotide. In some embodiments, the 5′ UTR does not comprise a CpA dinucleotide. In some embodiments, the 3′ UTR comprises 5 or fewer, 4 or fewer, 3 or fewer, 2 or fewer, 1 or fewer, or 0 CpA dinucleotides. In some embodiments, the 3′ UTR comprises no more than five CpA dinucleotides.
- the 3′ UTR comprises no more than four CpA dinucleotides. In some embodiments, the 3′ UTR comprises no more than three CpA dinucleotides. In some embodiments, the 3′ UTR comprises no more than two CpA dinucleotides. In some embodiments, the 3′ UTR comprises no more than one CpA dinucleotide. In some embodiments, the 3′ UTR does not comprise a CpA dinucleotide. In some embodiments, the last nucleotide of the 5′ UTR (immediately preceding the AUG start codon) is not a cytidine nucleotide.
- the last nucleotide of the 3′ UTR (immediately preceding the polyA tail) is not a cytidine nucleotide.
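The CpA bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicants' method: the function names (`cpa_positions`, `count_cpa`, `substitute`, `junction_has_cpa`) are hypothetical, and sequences are assumed to be plain strings over A, C, G, U.

```python
def cpa_positions(seq):
    """Return the 0-based start positions of every CpA (a C immediately followed by an A)."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CA"}


def count_cpa(seq):
    """Count CpA dinucleotides in a sequence."""
    return len(cpa_positions(seq))


def substitute(seq, pos, new_base):
    """Replace the base at 0-based position pos.

    Raises ValueError if the substitution introduces a CpA at a position
    where the original sequence had none.
    """
    seq = seq.upper()
    new_seq = seq[:pos] + new_base.upper() + seq[pos + 1:]
    introduced = cpa_positions(new_seq) - cpa_positions(seq)
    if introduced:
        raise ValueError(f"substitution creates a new CpA at {sorted(introduced)}")
    return new_seq


def junction_has_cpa(upstream, downstream):
    """True if joining two segments creates a CpA at the junction,
    e.g. a 5' UTR ending in C followed by the AUG start codon."""
    return upstream[-1:].upper() == "C" and downstream[:1].upper() == "A"
```

For example, `substitute("GCAUCAG", 1, "G")` removes the first of two CpA dinucleotides, while `junction_has_cpa(utr5, orf)` flags a 5′ UTR whose last nucleotide is a cytidine.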
- Some embodiments of mRNAs described herein, and modified mRNAs made by described methods comprise a sequence with a %G/C content of 30% – 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
- the nucleic acid sequence of the full-length mRNA comprises a %G/C content of 30% to 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
- the mRNA comprises an ORF with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%.
- the mRNA comprises a 5′ UTR with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%.
- the mRNA comprises a 3′ UTR with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%.
- a modified mRNA made by a method described herein comprises a higher %G/C content than a reference mRNA sequence.
- the %G/C content of the modified mRNA sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more than the %G/C content of the reference RNA sequence.
- the %G/C content of the modified ORF sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more than the %G/C content of the reference ORF sequence.
- the %G/C content of the modified 5′ UTR sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more than the %G/C content of the reference 5′ UTR sequence.
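As a rough illustration of the %G/C comparisons above, the sketch below computes %G/C for a sequence and the difference between a modified and a reference sequence. The function names are hypothetical, and the difference is taken in percentage points, which is one possible reading of "2% or more"; the text does not state whether an absolute or relative difference is intended.

```python
def gc_percent(seq):
    """Percentage of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_increase(modified, reference):
    """Difference in %G/C (percentage points, an assumption) between
    a modified sequence and its reference."""
    return gc_percent(modified) - gc_percent(reference)
```

The same helper applies to a full-length mRNA, an ORF, or a UTR, since each is handled as a plain string.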
- Some embodiments of mRNAs described herein, and modified mRNAs made by described methods express one or more encoded proteins in a mammalian cell at a level that is at least 50% of the level of expression of a reference mRNA encoding a protein with the same amino acid sequence, but containing a higher number of CpA dinucleotides.
- Expression of an encoded protein may refer to the number of copies of an encoded polypeptide produced by translation of a given mRNA molecule. Typically, a reduction in the level of an mRNA (e.g., by mRNA cleavage) results in a reduction in the level of a polypeptide translated therefrom.
- the level of expression may be determined using standard techniques for measuring protein.
- an mRNA has a level of expression in a mammalian cell that is at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or at least 100% of the level of expression of a reference mRNA encoding a protein with the same amino acid sequence, but containing a higher number of CpA dinucleotides.
- Examples of mammals whose cells may be used in evaluating expression of an mRNA include, without limitation, humans, mice, rats, hamsters, guinea pigs, cats, dogs, chimpanzees, macaques, baboons, and gorillas.
- the mammalian cell is a human cell.
- Some embodiments of the mRNAs described herein or produced by a method described herein are stable for longer periods of time than reference mRNAs having higher numbers of CpA dinucleotides but encoding a protein with the same amino acid sequence.
- the modified mRNA has a coefficient of degradation below a threshold value.
- a “coefficient of degradation” refers to a parameter of an equation describing the loss of nucleic acid purity over time.
- nucleic acid purity refers to the percentage of nucleic acid in a composition having a desired sequence and structure.
- compositions may be prepared using nucleic acids having a specific sequence encoding a protein to be expressed in cells.
- the nucleic acid may be degraded by environmental factors such as water or nucleases.
- Water molecules can hydrolyze the phosphodiester bond that bridges a phosphate moiety and sugar moiety in the sugar-phosphate backbone of a nucleic acid, resulting in the production of two separate nucleic acid molecules, neither of which contains an intact sequence encoding the full-length protein encoded by the unhydrolyzed nucleic acid.
- Nucleases are enzymes that can facilitate this process, but nucleic acids are susceptible to degradation by water molecules even in the absence of environmental nucleases.
- Nucleic acid purity may be measured by any one of multiple methods known in the art, such as mass spectrometry or high-performance liquid chromatography (HPLC) (see, e.g., Papadoyannis et al., J Liq Chrom Relat Tech. 2007. 27(6):1083–1092).
- a sample to be analyzed such as nucleic acid
- a column containing a solid material stationary phase
- the rate at which molecules of the sample move through the stationary phase depends on multiple factors, including size, such that different components of the sample will be observed at different times.
- a sample containing 100% pure nucleic acid will produce a single peak (main peak) on a chromatogram when analyzed by HPLC, while a sample containing multiple different nucleic acid molecules will produce multiple peaks, including a main peak and one or more impurity peaks, for a total of N peaks.
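One common way to turn such a chromatogram into a purity figure is to divide the main-peak area by the summed area of all N peaks. The excerpt does not spell out this formula, so the sketch below is an assumption; `purity_percent` and its parameters are hypothetical names.

```python
def purity_percent(peak_areas, main_index=None):
    """Purity as main-peak area over total peak area, in percent.

    peak_areas: integrated areas of all N peaks on the chromatogram.
    main_index: index of the main peak; if omitted, the largest peak
                is assumed to be the main peak.
    """
    areas = list(peak_areas)
    if not areas or any(a < 0 for a in areas):
        raise ValueError("peak areas must be a non-empty list of non-negative values")
    total = sum(areas)
    if total == 0:
        raise ValueError("total peak area is zero")
    main = max(areas) if main_index is None else areas[main_index]
    return 100.0 * main / total
```

A chromatogram with a single peak yields 100%, matching the description of a pure sample.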
- the coefficient of degradation is expressed in units of day⁻¹.
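A coefficient with units of day⁻¹ is consistent with a first-order decay model, purity(t) = purity(0) · exp(−λt). This excerpt does not give the governing equation, so the sketch below assumes that model and estimates λ as the negative least-squares slope of ln(purity) against time. The function name and arguments are hypothetical.

```python
import math


def fit_degradation_coefficient(days, purities):
    """Estimate lambda (day^-1) assuming purity(t) = purity(0) * exp(-lambda * t).

    days:     storage times in days.
    purities: measured purities (percent, all > 0) at those times.
    """
    if len(days) != len(purities) or len(days) < 2:
        raise ValueError("need at least two (time, purity) pairs")
    logs = [math.log(p) for p in purities]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    return -sxy / sxx  # slope of ln(purity) is -lambda
```

Under this assumed model, a modified mRNA with a fitted coefficient of 0.005 day⁻¹ compared against a reference at 0.01 day⁻¹ would have a coefficient that is 50% of the reference.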
- the modified mRNA has a coefficient of degradation at 25 °C that is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide.
- the coefficient of degradation of the modified mRNA at a temperature of 2 °C – 8 °C is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the coefficient of degradation of the modified mRNA is 90% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the coefficient of degradation of the modified mRNA is 80% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide.
- the coefficient of degradation of the modified mRNA is 70% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the coefficient of degradation of the modified mRNA is 60% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the coefficient of degradation of the modified mRNA is 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the decrease in degradation coefficient is calculated with respect to storage of modified mRNAs in the absence of lipid nanoparticles.
- the decrease in degradation coefficient is calculated with respect to storage of modified mRNAs in a buffer lacking lipid nanoparticles.
- the buffer comprises 10–100 mM Tris.
- the buffer comprises 5–10% sucrose.
- the buffer has a pH of about 7.3 to about 7.6.
- the buffer comprises 10–100 mM Tris, 5–10% sucrose, and has a pH of 7.3 to 7.6.
- the decrease in degradation coefficient is calculated with respect to storage of mRNAs formulated in lipid nanoparticles.
- the lipid nanoparticles may be any lipid nanoparticle described herein.
- the lipid nanoparticles may be another lipid nanoparticle known in the art.
- reduction in degradation coefficient is measured in mRNAs having an ORF of a length in a specific range, as it is understood that the length of an mRNA affects stability during storage (e.g., shorter mRNAs are less susceptible to degradation than longer mRNAs).
- the modified mRNA having a reduced degradation coefficient comprises an ORF that is 100–500, 500–1,000, 1,000–2,000, 2,000–3,000, 3,000–5,000, 100–5,000, 100–2,500, 100–1,500, 100–1,000, 500–5,000, 500–2,500, 500–1,000, 1,000–5,000, 1,000–4,000, 1,000–3,000, 1,000–2,000, 2,000–5,000, 2,000–5,000, or 3,000–4,000 nucleotides in length.
- the modified mRNA having a reduced degradation coefficient comprises an ORF that is 300–5,000 nucleotides in length.
- the modified mRNA having a reduced degradation coefficient comprises an ORF that is 300–1,500 nucleotides in length.
- the modified mRNA having a reduced degradation coefficient comprises an ORF that is 1,500–3,000 nucleotides in length. In some embodiments, the modified mRNA having a reduced degradation coefficient comprises an ORF that is 3,000–5,000 nucleotides in length. In some embodiments, the nucleic acid degrades (e.g., as measured by capillary electrophoresis) about 2% or less per month during storage, such as about 1% or less, about 0.75% or less, about 0.5% or less, about 0.4% or less, about 0.3% or less, about 0.2% or less, or about 0.1% or less per month during storage (e.g., at 4 °C).
- the methods comprise producing compositions comprising modified nucleic acid, where the modified nucleic acid in the composition is at least 50% pure (such as about 50% pure, about 55% pure, about 60% pure, about 65% pure, about 70% pure, or about 75% pure or more) after storage at 0°C or more (such as 0 °C, 2 °C, 4 °C, 5 °C, 8 °C, 10 °C, 15 °C, 20 °C, 25 °C, or 2–8 °C) for a given length of time.
- a composition comprising a plurality of the modified mRNAs remains above 50% purity (such as about 50% pure, about 55% pure, about 60% pure, about 65% pure, about 70% pure, or about 75% pure or more) for at least 30 days, at least 40 days, at least 50 days, at least 60 days, at least 75 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of mRNA comprising a wild-type ORF encoding the polypeptide.
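To make the "days longer above 50% purity" comparison concrete, the sketch below assumes first-order decay, purity(t) = purity(0) · exp(−λt), which this excerpt does not specify. Under that assumption the time to reach a threshold is ln(purity(0)/threshold)/λ. All names and the example values are hypothetical.

```python
import math


def days_above_threshold(initial_purity, coefficient, threshold=50.0):
    """Days until purity falls to the threshold under assumed first-order decay.

    initial_purity: purity at day 0, in percent.
    coefficient:    degradation coefficient lambda, in day^-1 (must be > 0).
    threshold:      purity threshold in percent (50% in the text above).
    """
    if coefficient <= 0:
        raise ValueError("coefficient must be positive")
    if initial_purity <= threshold:
        return 0.0
    return math.log(initial_purity / threshold) / coefficient


def extra_days(initial_purity, modified_coeff, reference_coeff, threshold=50.0):
    """Additional days the modified mRNA stays above the threshold
    compared with the reference mRNA."""
    return (days_above_threshold(initial_purity, modified_coeff, threshold)
            - days_above_threshold(initial_purity, reference_coeff, threshold))
```

With these assumptions, halving the coefficient doubles the time spent above the threshold, so a reference that lasts 30 days would correspond to a modified mRNA lasting 60 days.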
- the increase in duration of maintenance above 50% purity is during storage of modified mRNAs in the absence of lipid nanoparticles.
- the increase in duration of maintenance above 50% purity is during storage of modified mRNAs in a buffer lacking lipid nanoparticles.
- the buffer comprises 10–100 mM Tris.
- the buffer comprises 5–10% sucrose.
- the buffer has a pH of about 7.3 to about 7.6.
- the buffer comprises 10–100 mM Tris, 5–10% sucrose, and has a pH of 7.3 to 7.6.
- the increased duration of maintenance above 50% purity is during storage of mRNAs formulated in lipid nanoparticles.
- the lipid nanoparticles may be any lipid nanoparticle described herein.
- the lipid nanoparticles may be another lipid nanoparticle known in the art.
- improved stability is measured in mRNAs having an ORF of a length in a specific range, as it is understood that the length of an mRNA affects stability during storage (e.g., longer mRNAs are less stable than shorter mRNAs).
- the mRNA having improved stability comprises an ORF that is 100–500, 500–1,000, 1,000–2,000, 2,000–3,000, 3,000–5,000, 100–5,000, 100–2,500, 100–1,500, 100–1,000, 500–5,000, 500–2,500, 500–1,000, 1,000–5,000, 1,000–4,000, 1,000–3,000, 1,000–2,000, 2,000–5,000, 2,000–5,000, or 3,000–4,000 nucleotides in length.
- the mRNA having improved stability comprises an ORF that is 300–5,000 nucleotides in length.
- the mRNA having improved stability comprises an ORF that is 300–1,500 nucleotides in length.
- the mRNA having improved stability comprises an ORF that is 1,500–3,000 nucleotides in length. In some embodiments, the mRNA having improved stability comprises an ORF that is 3,000–5,000 nucleotides in length.
- the storage is conducted at a temperature between about 2 °C and about 40 °C. In some embodiments, the storage is conducted at a temperature between about 22 °C and about 28 °C. In some embodiments, the storage is conducted at about 25 °C. In some embodiments, the storage is conducted at a temperature between about 2 °C and about 15 °C. In some embodiments, the storage is conducted at a temperature between about 2 °C and about 8 °C.
- the storage is conducted at about 3 °C. In some embodiments, the storage is conducted at about 5 °C.
- Degradation of nucleic acids is a chemical reaction that occurs more readily at higher temperatures, and as such the coefficient of degradation and kinetics of purity depend on the temperature at which nucleic acids are stored.
- the stability of a modified mRNA is evaluated by storing the mRNA in a buffer with a defined composition. In some embodiments, the mRNA is stored in a buffer comprising 10–100 mM Tris. In some embodiments, the mRNA is stored in a buffer comprising 5–10% sucrose.
- the mRNA is stored in a buffer having a pH of about 7.3 to about 7.6.
- the storage buffer comprises 10–100 mM Tris, 5–10% sucrose, and a pH of 7.3 to 7.6.
Codon optimization
- In some embodiments, an mRNA is codon-optimized. Codon optimization methods are known in the art.
- Codon optimization may be used to match codon frequencies in target and host organisms to ensure proper folding; bias %G/C content to increase mRNA thermodynamic stability or reduce secondary structures; minimize tandem repeat codons or base runs that may impair gene construction or expression; customize transcriptional and translational control regions; insert or remove protein trafficking sequences; remove/add post translation modification sites in encoded protein (e.g., glycosylation sites); add, remove or shuffle protein domains; insert or delete restriction sites; modify ribosome binding sites and mRNA degradation sites; adjust translational rates to allow the various domains of the protein to fold properly; or reduce or eliminate problem secondary structures within the polynucleotide.
- Codon optimization tools, algorithms and services are known in the art – non-limiting examples include services from GeneArt (Life Technologies), DNA2.0 (Menlo Park CA) and/or proprietary methods.
- the open reading frame (ORF) sequence is optimized using optimization algorithms.
- a codon optimized sequence shares less than 95% sequence identity to a naturally-occurring or wild-type sequence ORF (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide).
- a codon optimized sequence shares less than 90% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide).
- a codon optimized sequence shares less than 85% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). In some embodiments, a codon optimized sequence shares less than 80% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). In some embodiments, a codon optimized sequence shares less than 75% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide).
- a codon optimized sequence shares between 65% and 85% (e.g., between about 67% and about 85% or between about 67% and about 80%) sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). In some embodiments, a codon optimized sequence shares between 65% and 75% or about 80% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide).
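When a codon-optimized ORF encodes the same polypeptide as the wild-type ORF, the two sequences have the same length, so a simple position-by-position comparison gives a percent identity. The sketch below makes that equal-length, ungapped assumption; general identity calculations usually rely on an alignment tool instead. The function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)
```

For instance, `AUGCAUCAG` and `AUGCACCAA` both encode Met-His-Gln and differ at two of nine positions.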
- When transfected into mammalian host cells, some embodiments of modified mRNAs have a stability of between 12 and 18 hours, or greater than 18 hours, e.g., 24, 36, 48, 60, 72, or greater than 72 hours, and are capable of being expressed by the mammalian host cells.
- a codon optimized RNA may be one in which the levels of GC are enhanced.
- the G/C-content of nucleic acid molecules (e.g., mRNA) may influence the stability of the RNA.
- RNA having an increased amount of guanine (G) and/or cytosine (C) residues may be more thermodynamically stable than RNA containing a large amount of adenine (A) and thymine (T) or uracil (U) nucleotides.
- WO02/098443 discloses a pharmaceutical composition containing an mRNA stabilized by sequence modifications in the translated region. Due to the degeneracy of the genetic code, the modifications work by substituting existing codons for those that promote greater RNA stability without changing the resulting amino acid. The approach is limited to coding regions of the RNA.
- one or more cytidine or adenosine nucleotides of a CpA dinucleotide comprises a modified nucleotide.
- one or more cytidine nucleotides of a CpA dinucleotide comprises a modified nucleotide.
- substitutions are useful, for example, to improve mRNA stability where CpA dinucleotides are necessary, such as in codons encoding histidine or glutamine or in regulatory motifs (e.g., Kozak sequence).
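Histidine (CAU, CAC) and glutamine (CAA, CAG) codons all begin with CA, so a CpA at the first two codon positions cannot be removed by a synonymous substitution. The sketch below sorts the CpA dinucleotides of an ORF by reading-frame position to show which ones fall in that category. The function name and category labels are hypothetical, and the ORF is assumed to be in frame with a length that is a multiple of three.

```python
def classify_cpa(orf):
    """Group CpA start positions (0-based) in an in-frame ORF by codon context.

    "required":      CA at codon positions 1-2 (His or Gln codon).
    "within_codon":  CA at codon positions 2-3 (e.g. GCA, UCA, CCA, ACA).
    "across_codons": C ends one codon and A starts the next.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    groups = {"required": [], "within_codon": [], "across_codons": []}
    for i in range(len(orf) - 1):
        if orf[i:i + 2] != "CA":
            continue
        frame = i % 3
        if frame == 0:
            groups["required"].append(i)
        elif frame == 1:
            groups["within_codon"].append(i)
        else:
            groups["across_codons"].append(i)
    return groups
```

Positions reported as "required" are the natural candidates for modified cytidine or adenosine nucleotides, since the other two groups can in principle be addressed by changing the sequence.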
- 10% or more, 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a modified mRNA sequence comprise a modified cytidine nucleotide and/or a modified adenosine nucleotide.
- 10% or more, 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a modified mRNA sequence comprise a modified cytidine nucleotide.
- 10% or more, 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a modified mRNA sequence comprise a modified adenosine nucleotide.
- 10% or more, 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a modified mRNA sequence comprise a modified cytidine nucleotide and a modified adenosine nucleotide.
- Multiple cytidine nucleotides may be substituted with the same or different modified cytidine nucleotides, and multiple adenosine nucleotides may be substituted with the same or different modified adenosine nucleotides.
- a modified cytidine nucleotide refers to a nucleotide that comprises a structure different from the conventional structure of cytidine monophosphate (CMP) in an mRNA, but that is still capable of hydrogen bonding with guanine (e.g., guanine of a guanosine nucleotide on a tRNA).
- a modified adenosine nucleotide refers to a nucleotide that comprises a structure different from the conventional structure of adenosine monophosphate (AMP) in an mRNA, but that is still capable of hydrogen bonding with uracil (e.g., uracil of a uridine nucleotide on a tRNA).
- a modified cytidine nucleotide may comprise a modified cytosine nucleobase (i.e., nucleobase that is capable of hydrogen bonding with guanine but has a different structure than canonical cytosine), a modified sugar (i.e., sugar other than ribose), and/or a modified phosphate (i.e., internucleoside linkage different from the canonical phosphate structure).
- a modified adenosine nucleotide may comprise a modified adenine nucleobase (i.e., nucleobase that is capable of hydrogen bonding with uracil but has a different structure than canonical adenine), a modified sugar, and/or a modified phosphate.
- modified nucleotides including examples of modified nucleobases, modified sugars, and modified phosphates, are described in the section below entitled “Nucleic acids.”
Nucleic acids
- Some aspects relate to compositions comprising nucleic acids and methods of producing nucleic acids.
- nucleic acid includes multiple nucleotides (i.e., molecules comprising a sugar (e.g., ribose or deoxyribose) linked to a phosphate group and to an exchangeable organic base, which is either a substituted pyrimidine (e.g., cytosine (C), thymine (T) or uracil (U)) or a substituted purine (e.g., adenine (A) or guanine (G))).
- nucleic acid includes polyribonucleotides as well as polydeoxyribonucleotides.
- nucleic acid also includes polynucleosides (i.e., a polynucleotide minus the phosphate) and any other organic base containing polymer.
- Non-limiting examples of nucleic acids include chromosomes, genomic loci, genes, or gene segments that encode polynucleotides or polypeptides, coding sequences, non-coding sequences (e.g., intron, 5′-UTR, or 3′-UTR) of a gene, pri-mRNA, pre-mRNA, cDNA, mRNA, etc.
- a nucleic acid e.g., mRNA
- the substitution and/or modification is in one or more bases and/or sugars.
- mRNA includes nucleotides having an organic group, such as a methyl group, attached to a nucleic acid base at the N6 position.
- an mRNA includes one or more N6-methyladenosine nucleotides.
- a phosphate, sugar, or nucleic acid base of a nucleotide may also be substituted for another phosphate, sugar, or nucleic acid base.
- a uridine base may be substituted for a pseudouridine base, in which the uracil base is attached to the sugar by a carbon-carbon bond rather than a nitrogen-carbon bond.
- mRNA is heterogeneous in backbone composition thereby containing any possible combination of polymer units linked together such as peptide-nucleic acids (which have an amino acid backbone with nucleic acid bases).
- the nucleic acids described herein may include nucleic acid sequences that have been removed from their naturally occurring environment, recombinant or cloned DNA isolates, and chemically synthesized analogues or analogues biologically synthesized by heterologous systems.
- an “engineered nucleic acid” is a nucleic acid that does not occur in nature. It should be understood, however, that while an engineered nucleic acid as a whole is not naturally-occurring, it may include nucleotide sequences that occur in nature.
- an engineered nucleic acid comprises nucleotide sequences from different organisms (e.g., from different species).
- an engineered nucleic acid includes a bacterial nucleotide sequence, a human nucleotide sequence, and/or a viral nucleotide sequence.
- Engineered nucleic acids include recombinant nucleic acids and synthetic nucleic acids.
- a “recombinant nucleic acid” is a molecule that is constructed by joining nucleic acids (e.g., isolated nucleic acids, synthetic nucleic acids, or a combination thereof) and, in some embodiments, can replicate in a living cell.
- a “synthetic nucleic acid” is a molecule that is amplified or chemically, or by other means, synthesized.
- a synthetic nucleic acid includes those that are chemically modified, or otherwise modified, but can base pair with naturally-occurring nucleic acid molecules.
- Recombinant and synthetic nucleic acids also include those molecules that result from the replication of either of the foregoing.
- a nucleic acid may comprise naturally occurring nucleotides and/or non-naturally occurring nucleotides such as modified nucleotides.
- a nucleic acid is present in (or on) a vector.
- vectors include but are not limited to bacterial plasmids, phage, cosmids, phasmids, fosmids, bacterial artificial chromosomes, yeast artificial chromosomes, viruses, and retroviruses (for example vaccinia, adenovirus, adeno-associated virus, lentivirus, herpes-simplex virus, Epstein-Barr virus, fowlpox virus, pseudorabies, baculovirus) and vectors derived therefrom.
- a nucleic acid used as an input molecule for in vitro transcription (IVT) is present in a plasmid vector.
- isolated denotes that the polynucleotide sequence has been removed from its natural genetic milieu and is thus free of other extraneous or unwanted coding sequences (but may include naturally occurring 5′ and 3′ untranslated regions such as promoters and terminators), and is in a form suitable for use within genetically engineered protein production systems.
- isolated molecules are those that are separated from their natural environment.
- 5′ and 3′ are used herein to describe features of a nucleic acid sequence related to either the position of genetic elements and/or the direction of events (5′ to 3′), such as e.g. transcription by RNA polymerase or translation by the ribosome which proceeds in 5′ to 3′ direction. Synonyms are upstream (5′) and downstream (3′). Conventionally, DNA sequences, gene maps, vector cards and RNA sequences are drawn with 5′ to 3′ from left to right or the 5′ to 3′ direction is indicated with arrows, wherein the arrowhead points in the 3′ direction.
- a nucleic acid typically comprises a plurality of nucleotides.
- a nucleotide includes a nitrogenous base, a five-carbon sugar (ribose or deoxyribose), and at least one phosphate group.
- Nucleotides include nucleoside monophosphates, nucleoside diphosphates, and nucleoside triphosphates.
- a nucleoside monophosphate includes a nucleobase linked to a ribose and a single phosphate; a nucleoside diphosphate (NDP) includes a nucleobase linked to a ribose and two phosphates; and a nucleoside triphosphate (NTP) includes a nucleobase linked to a ribose and three phosphates.
- Nucleotide analogs are compounds that have the general structure of a nucleotide or are structurally similar to a nucleotide.
- Nucleotide analogs include an analog of the nucleobase, an analog of the sugar and/or an analog of the phosphate group(s) of a nucleotide.
- a nucleoside includes a nitrogenous base and a 5-carbon sugar. Thus, a nucleoside plus a phosphate group yields a nucleotide.
- Nucleoside analogs are compounds that have the general structure of a nucleoside or are structurally similar to a nucleoside. Nucleoside analogs, for example, include an analog of the nucleobase and/or an analog of the sugar of a nucleoside.
- nucleotide includes naturally-occurring nucleotides, synthetic nucleotides and modified nucleotides, unless indicated otherwise.
- naturally-occurring nucleotides used for the production of RNA include adenosine triphosphate (ATP), guanosine triphosphate (GTP), cytidine triphosphate (CTP), uridine triphosphate (UTP), and 5-methyluridine triphosphate (m5UTP).
- ADP adenosine diphosphate
- GDP guanosine diphosphate
- CDP cytidine diphosphate
- UDP uridine diphosphate
- nucleotide analogs include, but are not limited to, antiviral nucleotide analogs, phosphate analogs (soluble or immobilized, hydrolyzable or non-hydrolyzable), dinucleotide, trinucleotide, tetranucleotide, e.g., a cap analog, or a precursor/substrate for enzymatic capping (vaccinia or ligase), a nucleotide labeled with a functional group to facilitate ligation/conjugation of cap or 5′ moiety (IRES), a nucleotide labeled with a 5′ PO4 to facilitate ligation of cap or 5′ moiety, or a nucleotide label
- antiviral nucleotide/nucleoside analogs include, but are not limited to, Ganciclovir, Entecavir, Telbivudine, Vidarabine and Cidofovir.
- Modified nucleotides may include modified nucleobases.
- an RNA transcript (e.g., mRNA transcript) described herein may include a modified nucleobase selected from pseudouracil (ψ), N1-methylpseudouracil (m1ψ), 1-ethylpseudouracil, 2-thiouracil, 4′-thiouracil, 2-thio-1-methyl-1-deaza-pseudouracil, 2-thio-1-methyl-pseudouracil, 2-thio-5-aza-uracil, 2-thio-dihydropseudouracil, 2-thio-dihydrouracil, 2-thio-pseudouracil, 4-methoxy-2-thio-pseudouracil, 4-methoxy-pseudouracil, 4-thio-1-methyl-pseudouracil, 4-thio-pseudouracil, 5-aza-uracil, dihydropseudouracil, 5-methyluracil, 5-methoxy
- an RNA transcript may include a modified cytosine nucleobase selected from digoxigeninated cytosine, 2-thiocytosine, 5-aminoallylcytosine, 5-bromocytosine, 5-carboxycytosine, 5-formylcytosine, 5-hydroxycytosine, 5-hydroxymethylcytosine, 5-methoxycytosine, 5-methylcytosine, 5-propargylaminocytosine, 5-propynylcytosine, 6-azacytosine, aracytosine, cyanine 3-5-propargylaminocytosine, cyanine 3-aminoallylcytosine, cyanine 5-6-propargylaminocytosine, cyanine 5-aminoallylcytosine, desthiobiotin-6-aminoallylcytosine, N4-biotin-OBEA-cytosine, N4-methylcytosine, pseudoisocytosine, and thienocytosine.
- an RNA transcript may include a modified adenine nucleobase selected from digoxigeninated adenine, N6-methyladenine, 7-deazaadenine, 7-deaza-7-propargylaminoadenine, 8-azaadenine, 8-azidoadenine, 8-chloroadenine, 8-oxoadenine, araadenine, N1-methyladenine, N6-methyladenine, 3-deazaadenine, 2,6-diaminoadenine, 2-methyl-thio-N6-isopentenyladenine (ms2i6A), 2-methylthio-N6-methyladenine (ms2m6A), N6-(cis-hydroxyisopentenyl)adenine (io6A), 2-methylthio-N6-(cis-hydroxyisopentenyl)adenine (ms2io6A), N6-glycinylcarb
- an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified nucleobases.
- Modified nucleotides may include modified sugars.
- an RNA transcript (e.g., mRNA transcript) described herein may include a modified sugar selected from 2′-thioribose, 2′,3′-dideoxyribose, 2′-amino-2′-deoxyribose, 2′-deoxyribose, 2′-azido-2′-deoxyribose, 2′-fluoro-2′-deoxyribose, 2′-O-methylribose, 2′-O-methyldeoxyribose, 3′-amino-2′,3′-dideoxyribose, 3′-azido-2′,3′-dideoxyribose, 3′-deoxyribose, 3′-O-(2-nitrobenzyl)-2′-deoxyribose, 3′-O-methylribose, 5′-aminoribose, 5′-thioribose, 5-nitro-1-indolyl-2′-deoxyribo
- an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified sugars.
- Modified nucleotides may include modified phosphates.
- a modified phosphate group is a phosphate group that differs from the canonical structure of phosphate.
- An example of a canonical structure of a phosphate is shown below: [chemical structure not reproduced], where R5 and R3 are atoms or molecules to which the canonical phosphate is bonded.
- R5 may refer to the upstream nucleotide of the nucleic acid.
- R3 may refer to the downstream nucleotide of the nucleic acid.
- the canonical structure of phosphate also refers to structures in which one or more hydroxyl groups of the phosphate are deprotonated, or in which an oxygen atom of the phosphate is bonded to an adjacent nucleotide in a nucleic acid sequence.
- an RNA transcript (e.g., mRNA transcript) described herein may include a modified phosphate selected from phosphorothioate (PS), thiophosphate, 5′-O-methylphosphonate, 3′-O-methylphosphonate, 5′-hydroxyphosphonate, hydroxyphosphanate, phosphoroselenoate, selenophosphate, phosphoramidate, carbophosphonate, methylphosphonate, phenylphosphonate, ethylphosphonate, H-phosphonate, guanidinium ring, triazole ring, boranophosphate (BP), methylphosphonate, and guanidinopropyl phosphoramidate.
- an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified phosphates.
- mRNAs described herein may be used to produce polypeptides of interest, such as therapeutic proteins and/or vaccine antigens.
- an mRNA encodes a vaccine antigen.
- an mRNA encodes a therapeutic protein.
- the encoded polypeptide comprises 9–10,000, 9–9,000, 9–8,000, 9–7,000, 9–6,000, 9–5,000, 9–4,000, 9–3,000, 9–2,000, 9–1,000, 9–500, 9–400, 9–300, 9–200, 9–100, 9–10,000, 100–9,000, 100–8,000, 100–7,000, 100–6,000, 100–5,000, 100–4,000, 100–3,000, 100–2,000, 100–1,000, 100–500, 100–400, 100–300, 100–200, 100–9,000, 200–10,000, 200–9,000, 200–8,000, 200–7,000, 200–6,000, 200–5,000, 200–4,000, 200–3,000, 200–2,000, 200–1,000, 200–500, 200–400, 500–10,000, 500–9,000, 500–8,000, 500–7,000, 500–6,000, 500–5,000, 500–4,000, 500–3,000, 500–2,000, 500–1,000, 1,000–10,000, 1,000–9,000, 1,000–8,000, 1,000–7,000, 1,000–6,000, 1,000–5,000, 1,000–4,000, 1,000–3,000, or 1,000
- the encoded polypeptide consists of 9–10,000, 9–9,000, 9–8,000, 9–7,000, 9–6,000, 9–5,000, 9–4,000, 9–3,000, 9–2,000, 9–1,000, 9–500, 9–400, 9–300, 9–200, 9–100, 9–10,000, 100–9,000, 100–8,000, 100–7,000, 100–6,000, 100–5,000, 100–4,000, 100–3,000, 100–2,000, 100–1,000, 100–500, 100–400, 100–300, 100–200, 100–9,000, 200–10,000, 200–9,000, 200–8,000, 200–7,000, 200–6,000, 200–5,000, 200–4,000, 200–3,000, 200–2,000, 200–1,000, 200–500, 200–400, 500–10,000, 500–9,000, 500–8,000, 500–7,000, 500–6,000, 500–5,000, 500–4,000, 500–3,000, 500–2,000, 500–1,000, 1,000–10,000, 1,000–9,000, 1,000–8,000, 1,000–7,000, 1,000–6,000, 1,000–5,000, 1,000–5,000, 1,000–4,000, 1,000
- the encoded polypeptide comprises 9–5,000 amino acids. In some embodiments, the encoded polypeptide consists of 9–5,000 amino acids. In some embodiments, the encoded polypeptide comprises 20–4,000 amino acids. In some embodiments, the encoded polypeptide consists of 20–4,000 amino acids. In some embodiments, the encoded polypeptide comprises 30–3,000 amino acids. In some embodiments, the encoded polypeptide consists of 30–3,000 amino acids. In some embodiments, the encoded polypeptide comprises 40–2,000 amino acids. In some embodiments, the encoded polypeptide consists of 40–2,000 amino acids. In some embodiments, the encoded polypeptide comprises 50–1,500 amino acids. In some embodiments, the encoded polypeptide consists of 50–1,500 amino acids.
- the encoded polypeptide comprises 100–5,000 amino acids. In some embodiments, the encoded polypeptide consists of 100–5,000 amino acids. In some embodiments, the encoded polypeptide comprises 200–4,000 amino acids. In some embodiments, the encoded polypeptide consists of 200–4,000 amino acids. In some embodiments, the encoded polypeptide comprises 300–3,000 amino acids. In some embodiments, the encoded polypeptide consists of 300–3,000 amino acids. In some embodiments, the encoded polypeptide comprises 400–2,000 amino acids. In some embodiments, the encoded polypeptide consists of 400–2,000 amino acids. In some embodiments, the encoded polypeptide comprises 500–1,500 amino acids. In some embodiments, the encoded polypeptide consists of 500–1,500 amino acids.
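The amino-acid ranges above translate directly into open reading frame sizes, since each residue is encoded by one three-nucleotide codon. The following is a minimal Python sketch of that arithmetic. The function name `orf_length_nt` is hypothetical, and the sketch assumes that the stated polypeptide length counts the initiator methionine and that the ORF length includes one stop codon.

```python
def orf_length_nt(n_amino_acids: int, include_stop: bool = True) -> int:
    """Return the number of nucleotides needed to encode a polypeptide.

    Assumption (not from the source): the ORF is counted from the start
    codon through one stop codon when include_stop is True.
    """
    if n_amino_acids < 1:
        raise ValueError("polypeptide length must be a positive integer")
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


if __name__ == "__main__":
    # A few of the ranges named in the text, converted to ORF sizes.
    for low, high in [(9, 5000), (100, 5000), (500, 1500)]:
        print(f"{low}-{high} aa -> {orf_length_nt(low)}-{orf_length_nt(high)} nt")
```

Under these assumptions, a 9-residue peptide needs a 30-nucleotide ORF and a 5,000-residue polypeptide needs 15,003 nucleotides, before any UTRs, cap, or poly(A) tail are added.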
- a therapeutic mRNA is an mRNA that encodes a therapeutic protein (the term ‘protein’ encompasses peptides).
- RNA compositions described herein comprise one or more RNAs that encode peptides or proteins that interact or complex in a cell or subject to form a multi-subunit protein (e.g., an antibody comprising a heavy chain and a light chain, a multi-subunit receptor protein, a multi-subunit signaling protein, a multi-subunit antigen, etc.) or a multivalent vaccine.
- Therapeutic proteins mediate a variety of effects in a host cell or in a subject to treat a disease or ameliorate the signs and symptoms of a disease.
- a therapeutic protein can replace a protein that is deficient or abnormal, augment the function of an endogenous protein, provide a novel function to a cell (e.g., inhibit or activate an endogenous cellular activity), or act as a delivery agent for another therapeutic compound (e.g., an antibody-drug conjugate).
- Therapeutic mRNA may be useful for the treatment of the following diseases and conditions: bacterial infections, viral infections, parasitic infections, cell proliferation disorders, genetic disorders, and autoimmune disorders. Other diseases and conditions are encompassed herein.
- a protein or proteins of interest encoded by an RNA composition as described herein can be essentially any protein or peptide (e.g., peptide antigen).
- a therapeutic peptide or therapeutic protein is a biologic.
- a biologic is a polypeptide-based molecule that may be used to treat, cure, mitigate, prevent, or diagnose a serious or life-threatening disease or medical condition.
- Biologics include, but are not limited to, allergenic extracts (e.g. for allergy shots and tests), blood components, gene therapy products, human tissue or cellular products used in transplantation, vaccines, monoclonal antibodies, cytokines, growth factors, enzymes, thrombolytics, and immunomodulators, among others.
- the therapeutic protein is a cytokine, a growth factor, an antibody (e.g., monoclonal antibody), a fusion protein, or a vaccine (e.g., an RNA encoding one or more peptide antigens designed to elicit an immune response in a subject).
- therapeutic proteins include blood factors (such as Factor VIII and Factor VII), complement factors, Low Density Lipoprotein Receptor (LDLR) and MUT1.
- cytokines include interleukins, interferons, chemokines, lymphokines and the like.
- Non-limiting examples of growth factors include erythropoietin, EGFs, PDGFs, FGFs, TGFs, IGFs, TNFs, CSFs, MCSFs, GMCSFs and the like.
- Non-limiting examples of antibodies include adalimumab, infliximab, rituximab, ipilimumab, tocilizumab, canakinumab, itolizumab, tralokinumab, anti-influenza virus monoclonal antibody, anti-Chikungunya virus monoclonal antibody, anti-Zika virus monoclonal antibody, and anti-SARS-CoV-2 monoclonal antibody.
- Non-limiting examples of fusion proteins include, for example, etanercept, abatacept and belatacept.
- Non-limiting examples of multivalent vaccines include, for example, multivalent cytomegalovirus (CMV) vaccine, and personalized cancer vaccines.
- One or more biologics currently being marketed or in development may be encoded by the RNA. While not wishing to be bound by theory, it is believed that incorporation of the encoding polynucleotides of a known biologic into the RNA described herein will result in improved therapeutic efficacy due at least in part to the specificity, purity and/or selectivity of the construct designs.
- RNA composition described herein may encode one or more antibodies (e.g., may comprise a first mRNA encoding an antibody heavy chain and a second RNA encoding an antibody light chain).
- the term “antibody” includes monoclonal antibodies (including full length antibodies which have an immunoglobulin Fc region), antibody compositions with polyepitopic specificity, multispecific antibodies (e.g., bispecific antibodies, diabodies, and single-chain molecules), as well as antibody fragments.
- the term “immunoglobulin” (Ig) is used interchangeably with “antibody” herein.
- a monoclonal antibody is an antibody obtained from a population of substantially homogeneous antibodies, i.e., the individual antibodies comprising the population are identical except for possible naturally occurring mutations and/or post-translation modifications (e.g., isomerizations, amidations) that may be present in minor amounts. Monoclonal antibodies are highly specific, being directed against a single antigenic site.
- Monoclonal antibodies specifically include chimeric antibodies (immunoglobulins) in which a portion of the heavy and/or light chain is identical with or homologous to corresponding sequences in antibodies derived from a particular species or belonging to a particular antibody class or subclass, while the remainder of the chain(s) is(are) identical with or homologous to corresponding sequences in antibodies derived from another species or belonging to another antibody class or subclass, as well as fragments of such antibodies, so long as they exhibit the desired biological activity.
- Chimeric antibodies include, but are not limited to, “primatized” antibodies comprising variable domain antigen-binding sequences derived from a non-human primate (e.g., Old World Monkey, Ape etc.) and human constant region sequences.
- Antibodies encoded in the RNA compositions may be utilized to treat conditions or diseases in many therapeutic areas such as, but not limited to, blood, cardiovascular, CNS, poisoning (including antivenoms), dermatology, endocrinology, gastrointestinal, medical imaging, musculoskeletal, oncology, immunology, respiratory, sensory and anti-infective.
- An RNA composition described herein may encode one or more vaccine antigens.
- a vaccine antigen is a biological preparation that improves immunity to a particular disease or infectious agent.
- One or more vaccine antigens currently being marketed or in development may be encoded by the RNA.
- Vaccine antigens encoded in the RNA may be utilized to treat conditions or diseases in many therapeutic areas such as, but not limited to, cancer, allergy, and infectious disease.
- a vaccine may be a personalized vaccine in the form of a concatemer or individual RNAs encoding peptide epitopes or a combination thereof.
- An RNA composition described herein may be designed to encode one or more antimicrobial peptides (AMP) or antiviral peptides (AVP).
- AMPs and AVPs have been isolated and described from a wide range of organisms such as, but not limited to, microorganisms, invertebrates, plants, amphibians, birds, fish, and mammals.
- the anti-microbial polypeptides may block cell fusion and/or viral entry by one or more enveloped viruses (e.g., HIV, HCV).
- the anti-microbial polypeptide can comprise or consist of a synthetic peptide corresponding to a region, e.g., a consecutive sequence of at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 amino acids of the transmembrane subunit of a viral envelope protein, e.g., HIV-1 gp120 or gp41.
- the amino acid and nucleotide sequences of HIV-1 gp120 or gp41 are described in, e.g., Kuiken et al., (2008).
- RNA transcripts are used for in vitro translation and microinjection.
- RNA transcripts are used for RNA structure, processing and catalysis studies.
- RNA transcripts are used for RNA amplification.
- RNA transcripts are used as anti-sense RNA for gene expression modulation.
- Other applications are also encompassed.
- 5′ cap structures
- a composition includes an RNA polynucleotide having an open reading frame encoding at least one polypeptide having at least one modification, and at least one 5′ terminal cap.
- 5′ terminal caps can include endogenous caps or cap analogs.
- a 5′ terminal cap can comprise a guanine analog.
- Useful guanine analogs include, but are not limited to, inosine, N1-methyl-guanosine, 2′-fluoro-guanosine, 7-deaza-guanosine, 8-oxo-guanosine, 2-amino-guanosine, LNA-guanosine, and 2-azido-guanosine.
- caps include those that can be used in co-transcriptional capping methods for ribonucleic acid (RNA) synthesis using an RNA polymerase, e.g., a wild-type RNA polymerase or a variant thereof, such as the variants described herein.
- caps can be added when RNA is produced in a “one-pot” reaction, without the need for a separate capping reaction.
- the methods, in some embodiments, comprise reacting a polynucleotide template with an RNA polymerase variant, nucleoside triphosphates, and a cap analog under in vitro transcription reaction conditions to produce an RNA transcript.
- the cap analog binds to a polynucleotide template that comprises a promoter region comprising a transcriptional start site having a first nucleotide at nucleotide position +1, a second nucleotide at nucleotide position +2, and a third nucleotide at nucleotide position +3.
- the cap analog hybridizes to the polynucleotide template at least at nucleotide position +1, such as at the +1 and +2 positions, or at the +1, +2, and +3 positions.
- a cap analog may be, for example, a dinucleotide cap, a trinucleotide cap, or a tetranucleotide cap.
- a cap analog is a dinucleotide cap.
- a cap analog is a trinucleotide cap.
- a cap analog is a tetranucleotide cap.
- the term “cap” includes the inverted G nucleotide and can comprise additional nucleotides 3′ of the inverted G, e.g., 1, 2, or more nucleotides 3′ of the inverted G and 5′ to the 5′ UTR.
- Exemplary caps comprise a sequence GG, GA, or GGA wherein the underlined, italicized G is an inverted G.
- a trinucleotide cap comprises a compound of Formula (III) or (IV), or a stereoisomer, tautomer, or salt thereof.
- a trinucleotide cap, in some embodiments, comprises a compound of formula (III), or a stereoisomer, tautomer, or salt thereof, wherein ring B1 is a modified or unmodified guanine; ring B2 and ring B3 each independently is a nucleobase or a modified nucleobase; X2 is O, S(O)p, NR24, or CR25R26, in which p is 0, 1, or 2; Y0 is O or CR6R7; Y1 is O, S(O)n, CR6R7, or NR8, in which n is 0, 1, or 2; each --- is a single bond or absent, wherein when each --- is a single bond, Y1 is O, S(O)n, CR6R7, or NR8; and when each --- is absent, Y1 is void; Y2 is (OP(O)R4)m in which m is 0, 1, or
- a cap analog may include any of the cap analogs described in international publication WO 2017/066797, published on 20 April 2017, incorporated by reference herein in its entirety.
- the B2 middle position can be a non-ribose molecule, such as arabinose.
- R2 is ethyl-based.
- a trinucleotide cap comprises the following structure: (IIIa), or a stereoisomer, tautomer, or salt thereof.
- a trinucleotide cap comprises the following structure: (IIIb), or a stereoisomer, tautomer, or salt thereof.
- In still other embodiments, a trinucleotide cap comprises the following structure: (IIIc), or a stereoisomer, tautomer, or salt thereof.
- R is an alkyl (e.g., C1-C6 alkyl). In some embodiments, R is a methyl group (e.g., C1 alkyl). In some embodiments, R is an ethyl group (e.g., C2 alkyl).
- a trinucleotide cap in some embodiments, comprises a sequence selected from the following sequences: GAA, GAC, GAG, GAU, GCA, GCC, GCG, GCU, GGA, GGC, GGG, GGU, GUA, GUC, GUG, and GUU.
- a trinucleotide cap comprises GAA.
- a trinucleotide cap comprises GAC.
- a trinucleotide cap comprises GAG.
- a trinucleotide cap comprises GAU.
- a trinucleotide cap comprises GCA.
- a trinucleotide cap comprises GCC. In some embodiments, a trinucleotide cap comprises GCG. In some embodiments, a trinucleotide cap comprises GCU. In some embodiments, a trinucleotide cap comprises GGA. In some embodiments, a trinucleotide cap comprises GGC. In some embodiments, a trinucleotide cap comprises GGG. In some embodiments, a trinucleotide cap comprises GGU. In some embodiments, a trinucleotide cap comprises GUA. In some embodiments, a trinucleotide cap comprises GUC. In some embodiments, a trinucleotide cap comprises GUG.
- a trinucleotide cap comprises GUU.
- a trinucleotide cap comprises a sequence selected from the following sequences: m7GpppApA, m7GpppApC, m7GpppApG, m7GpppApU, m7GpppCpA, m7GpppCpC, m7GpppCpG, m7GpppCpU, m7GpppGpA, m7GpppGpC, m7GpppGpG, m7GpppGpU, m7GpppUpA, m7GpppUpC, m7GpppUpG, and m7GpppUpU.
- a trinucleotide cap comprises m7GpppApA. In some embodiments, a trinucleotide cap comprises m7GpppApC. In some embodiments, a trinucleotide cap comprises m7GpppApG. In some embodiments, a trinucleotide cap comprises m7GpppApU. In some embodiments, a trinucleotide cap comprises m7GpppCpA. In some embodiments, a trinucleotide cap comprises m7GpppCpC. In some embodiments, a trinucleotide cap comprises m7GpppCpG.
- a trinucleotide cap comprises m7GpppCpU. In some embodiments, a trinucleotide cap comprises m7GpppGpA. In some embodiments, a trinucleotide cap comprises m7GpppGpC. In some embodiments, a trinucleotide cap comprises m7GpppGpG. In some embodiments, a trinucleotide cap comprises m7GpppGpU. In some embodiments, a trinucleotide cap comprises m7GpppUpA. In some embodiments, a trinucleotide cap comprises m7GpppUpC.
- a trinucleotide cap comprises m7GpppUpG. In some embodiments, a trinucleotide cap comprises m7GpppUpU.
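The sixteen trinucleotide cap sequences listed above follow a single pattern: G followed by every ordered pair drawn from A, C, G, and U. The following minimal Python sketch reproduces that enumeration and the corresponding m7GpppNpN-style names. The function names are illustrative only and do not come from the source.

```python
from itertools import product

BASES = "ACGU"  # order chosen so the output matches the order of the list above


def trinucleotide_cap_sequences() -> list[str]:
    """Enumerate G followed by each ordered pair of standard bases."""
    return ["G" + n1 + n2 for n1, n2 in product(BASES, repeat=2)]


def m7g_name(seq: str) -> str:
    """Render a three-letter cap sequence such as 'GAG' as 'm7GpppApG'."""
    if len(seq) != 3 or seq[0] != "G" or any(b not in BASES for b in seq[1:]):
        raise ValueError(f"not a G-initiated trinucleotide: {seq!r}")
    return f"m7Gppp{seq[1]}p{seq[2]}"


if __name__ == "__main__":
    for seq in trinucleotide_cap_sequences():
        print(seq, m7g_name(seq))
```

The sketch covers only the unmethylated-ribose naming; the 3′OMe and 2′OMe variants named elsewhere in the text would need additional fields.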
- a trinucleotide cap comprises a sequence selected from the following sequences: m7G3′OMepppApA, m7G3′OMepppApC, m7G3′OMepppApG, m7G3′OMepppApU, m7G3′OMepppCpA, m7G3′OMepppCpC, m7G3′OMepppCpG, m7G3′OMepppCpU, m7G3′OMepppGpA, m7G3′OMepppGpA, m7G3′OMepppGp
- a trinucleotide cap comprises m7G3′OMepppApA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppApC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppApG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppApU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppCpA.
- a trinucleotide cap comprises m7G3′OMepppCpC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppCpG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppCpU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppGpA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppGpC.
- a trinucleotide cap comprises m7G3′OMepppGpG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppGpU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppUpA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppUpC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppUpG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppUpU.
- a trinucleotide cap, in other embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppA2′OMepA, m7G3′OMepppA2′OMepC, m7G3′OMepppA2′OMepG, m7G3′OMepppA2′OMepU, m7G3′OMepppC2′OMepA, m7G3′OMepppC2′OMepC, m7G3′OMepppC2′OMepG, m7G3′OMepppC2′OMepG, m7G3′OMepppC2′OMepU, m7G3′OMeppp
- a trinucleotide cap comprises m7G3′OMepppA2′OMepA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppA2′OMepC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppA2′OMepG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppA2′OMepU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppC2′OMepA.
- a trinucleotide cap comprises m7G3′OMepppC2′OMepC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppC2′OMepG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppC2′OMepU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppG2′OMepA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppG2′OMepC.
- a trinucleotide cap comprises m7G3′OMepppG2′OMepG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppG2′OMepU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppU2′OMepA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppU2′OMepC.
- a trinucleotide cap comprises m7G3′OMepppU2′OMepG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppU2′OMepU.
- a trinucleotide cap, in still other embodiments, comprises a sequence selected from the following sequences: m7GpppA2′OMepA, m7GpppA2′OMepC, m7GpppA2′OMepG, m7GpppA2′OMepU, m7GpppC2′OMepA, m7GpppC2′OMepC, m7GpppC2′OMepG, m7GpppC2′OMepU, m7GpppG2′OMepA, m7GpppG2′OMepC, m7GpppG2′OMepG, m7GpppG2′OMepU, m7GpppU2′OMepA, m7GpppU2′OMepC, m7GpppU2′OMe
- a trinucleotide cap comprises m7GpppA2′OMepA. In some embodiments, a trinucleotide cap comprises m7GpppA2′OMepC. In some embodiments, a trinucleotide cap comprises m7GpppA2′OMepG. In some embodiments, a trinucleotide cap comprises m7GpppA2′OMepU. In some embodiments, a trinucleotide cap comprises m7GpppC2′OMepA.
- a trinucleotide cap comprises m7GpppC2′OMepC. In some embodiments, a trinucleotide cap comprises m7GpppC2′OMepG. In some embodiments, a trinucleotide cap comprises m7GpppC2′OMepU. In some embodiments, a trinucleotide cap comprises m7GpppG2′OMepA. In some embodiments, a trinucleotide cap comprises m7GpppG2′OMepC. In some embodiments, a trinucleotide cap comprises m7GpppG2′OMepG.
- a trinucleotide cap comprises m7GpppG2′OMepU. In some embodiments, a trinucleotide cap comprises m7GpppU2′OMepA. In some embodiments, a trinucleotide cap comprises m7GpppU2′OMepC. In some embodiments, a trinucleotide cap comprises m7GpppU2′OMepG. In some embodiments, a trinucleotide cap comprises m7GpppU2′OMepU.
- a trinucleotide cap comprises m7Gpppm6A2′OMepG.
- a trinucleotide cap comprises m7Gpppe6A2′OMepG.
- a trinucleotide cap comprises GAG.
- a trinucleotide cap comprises GCG.
- a trinucleotide cap comprises GUG.
- a trinucleotide cap comprises GGG.
- a trinucleotide cap comprises any one of the following structures: , or a stereoisomer, tautomer, or salt thereof.
- the cap analog comprises a tetranucleotide cap.
- the tetranucleotide cap comprises a trinucleotide as set forth above.
- the tetranucleotide cap comprises m7GpppN1N2N3, where N1, N2, and N3 are optional (i.e., can be absent or one or more can be present) and are independently a natural, a modified, or an unnatural nucleoside base.
- m7G is further methylated, e.g., at the 3′ position.
- the m7G comprises an O-methyl at the 3′ position.
- N1, N2, and N3, if present, optionally are independently an adenine, a uracil, a guanine, a thymine, or a cytosine.
- one or more (or all) of N1, N2, and N3, if present, are methylated, e.g., at the 2′ position.
- one or more (or all) of N1, N2, and N3, if present, have an O-methyl at the 2′ position.
- the tetranucleotide cap comprises formula (IV), or a stereoisomer, tautomer, or salt thereof, wherein B1, B2, and B3 are independently a natural, a modified, or an unnatural nucleoside base; and R1, R2, R3, and R4 are independently OH or O-methyl.
- R3 is O-methyl and R4 is OH.
- R3 and R4 are O-methyl.
- R4 is O-methyl.
- R1 is OH, R2 is OH, R3 is O-methyl, and R4 is OH.
- R1 is OH, R2 is OH, R3 is O-methyl, and R4 is O-methyl.
- at least one of R1 and R2 is O-methyl, R3 is O-methyl, and R4 is OH.
- at least one of R1 and R2 is O-methyl, R3 is O-methyl, and R4 is O-methyl.
- B1, B2, and B3 are natural nucleoside bases.
- at least one of B1, B2, and B3 is a modified or unnatural base.
- at least one of B1, B2, and B3 is N6-methyladenine.
- B1 is adenine, cytosine, thymine, or uracil. In some embodiments, B1 is adenine, B2 is uracil, and B3 is adenine. In some embodiments, R1 and R2 are OH, R3 and R4 are O-methyl, B1 is adenine, B2 is uracil, and B3 is adenine.
- the tetranucleotide cap comprises a sequence selected from the following sequences: GAAA, GACA, GAGA, GAUA, GCAA, GCCA, GCGA, GCUA, GGAA, GGCA, GGGA, GGUA, GUCA, and GUUA.
- the tetranucleotide cap comprises a sequence selected from the following sequences: GAAG, GACG, GAGG, GAUG, GCAG, GCCG, GCGG, GCUG, GGAG, GGCG, GGGG, GGUG, GUCG, GUGG, and GUUG.
- the tetranucleotide cap comprises a sequence selected from the following sequences: GAAU, GACU, GAGU, GAUU, GCAU, GCCU, GCGU, GCUU, GGAU, GGCU, GGGU, GGUU, GUAU, GUCU, GUGU, and GUUU.
- the tetranucleotide cap comprises a sequence selected from the following sequences: GAAC, GACC, GAGC, GAUC, GCAC, GCCC, GCGC, GCUC, GGAC, GGCC, GGGC, GGUC, GUAC, GUCC, GUGC, and GUUC.
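The four lists above group G-initiated tetranucleotide sequences by their final base. A minimal Python sketch of the full enumeration follows; the function name is hypothetical. Note that exhaustive enumeration yields sixteen sequences per group, whereas the A-terminal and G-terminal lists in the text contain fourteen and fifteen entries respectively, so the sketch produces a superset of what is recited and should not be read as restating the text.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGU"


def tetranucleotide_caps_by_last_base() -> dict[str, list[str]]:
    """Group every sequence G-N1-N2-N3 by its final base N3."""
    groups: dict[str, list[str]] = defaultdict(list)
    for n1, n2, n3 in product(BASES, repeat=3):
        groups[n3].append("G" + n1 + n2 + n3)
    return dict(groups)


if __name__ == "__main__":
    for last_base, seqs in tetranucleotide_caps_by_last_base().items():
        print(last_base, len(seqs), ", ".join(seqs))
```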
- a tetranucleotide cap, in some embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppApApN, m7G3′OMepppApCpN, m7G3′OMepppApGpN, m7G3′OMepppApUpN, m7G3′OMepppCpApN, m7G3′OMepppCpCpN, m7G3′OMepppCpGpN, m7G3′OMepppCpUpN, m7G3′OMepppGpApN, m7G3′OMepppGpCpN, m7G3′OMepppGpCpN, m7G3′OMepppGpCpN, m7G3′OMepppGpCpN,
- a tetranucleotide cap, in other embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppA2′OMepApN, m7G3′OMepppA2′OMepCpN, m7G3′OMepppA2′OMepGpN, m7G3′OMepppA2′OMepUpN, m7G3′OMepppC2′OMepApN, m7G3′OMepppC2′OMepCpN, m7G3′OMepppC2′OMepGpN, m7G3′OMepppC2′OMepUpN, m7G3′OMepppG2′OMepApN, m7G3′OMepppG
- a tetranucleotide cap, in still other embodiments, comprises a sequence selected from the following sequences: m7GpppA2′OMepApN, m7GpppA2′OMepCpN, m7GpppA2′OMepGpN, m7GpppA2′OMepUpN, m7GpppC2′OMepApN, m7GpppC2′OMepCpN, m7GpppC2′OMepGpN, m7GpppC2′OMepUpN, m7GpppG2′OMepApN, m7GpppG2′OMepCpN, m7GpppG2′OMepCpN, m7GpppG2′OMepApN,
- a tetranucleotide cap, in other embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppA2′OMepA2′OMepN, m7G3′OMepppA2′OMepC2′OMepN, m7G3′OMepppA2′OMepG2′OMepN, m7G3′OMepppA2′OMepU2′OMepN, m7G3′OMepppC2′OMepA2′OMepN, m7G3′OMepppC2′OMepC2′OMepN, m7G3′OMepppC2′OMepG2′OMepN, m7G3′OMepppC2′OMepG2
- a tetranucleotide cap, in still other embodiments, comprises a sequence selected from the following sequences: m7GpppA2′OMepA2′OMepN, m7GpppA2′OMepC2′OMepN, m7GpppA2′OMepG2′OMepN, m7GpppA2′OMepU2′OMepN, m7GpppC2′OMepA2′OMepN, m7GpppC2′OMepC2′OMepN, m7GpppC2′OMepG2′OMepN, m7GpppC2′OMepU2′OMepN, m7GpppG2′OMepA2′OMepN, m7GpppG2′OMepC2′OMepN, m7GpppG2′OMepG2′OMepN, m7GpppG2
- a tetranucleotide cap comprises GGAG. In some embodiments, a tetranucleotide cap comprises the following structure:
- the capping efficiency of a post-transcriptional or co-transcriptional capping reaction may vary. As used herein, “capping efficiency” refers to the amount (e.g., expressed as a percentage) of mRNAs comprising a cap structure relative to the total mRNAs in a mixture (e.g., a post-transcriptional capping reaction or a co-transcriptional capping reaction).
- the capping efficiency of a capping reaction is at least 60%, 70%, 80%, 90%, 95%, 99%, or 99.9% (e.g., after the capping reaction at least 60%, 70%, 80%, 90%, 95%, 99%, or 99.9% of the input mRNAs comprise a cap).
- multivalent co-IVT reactions described herein do not affect the capping efficiency of the mRNAs resulting from the IVT reaction.
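Capping efficiency as defined above is a simple ratio: capped mRNAs divided by total mRNAs, expressed as a percentage. The following minimal Python sketch computes it and reports which of the recited thresholds a reaction meets. The function names and the example counts are hypothetical; how capped and total amounts are actually measured is not specified here.

```python
# Thresholds recited in the text, in percent.
THRESHOLDS = (60, 70, 80, 90, 95, 99, 99.9)


def capping_efficiency(capped: float, total: float) -> float:
    """Percentage of mRNAs in a mixture that carry a cap structure."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= capped <= total:
        raise ValueError("capped must lie between 0 and total")
    return 100.0 * capped / total


def thresholds_met(efficiency_percent: float) -> list[float]:
    """Return every recited threshold that the given efficiency reaches."""
    return [t for t in THRESHOLDS if efficiency_percent >= t]


if __name__ == "__main__":
    # Illustrative numbers only: 950 capped molecules out of 1,000.
    eff = capping_efficiency(950, 1000)
    print(eff, thresholds_met(eff))
```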
- a 3′-poly(A) tail is typically a stretch of adenine nucleotides added to the 3′-end of the transcribed mRNA. It can, in some instances, comprise up to about 400 adenine nucleotides.
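As a small illustration of the poly(A) tail description above, the following Python sketch measures the run of adenines at the 3′ end of a sequence string. The function name is hypothetical. The sketch counts every trailing A, so adenines belonging to the end of the preceding sequence (for example a stop codon ending in A) would be counted as part of the tail.

```python
def poly_a_tail_length(mrna: str) -> int:
    """Count consecutive adenines at the 3' end of an RNA sequence string."""
    seq = mrna.upper()
    return len(seq) - len(seq.rstrip("A"))


if __name__ == "__main__":
    # Illustrative transcript: a short body ending in C, then 100 adenines.
    transcript = "GGGAUGUAGC" + "A" * 100
    print(poly_a_tail_length(transcript))
```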
- a composition comprises an RNA (e.g., mRNA) having an ORF that encodes a signal peptide fused to the expressed polypeptide.
- Signal peptides, usually comprising the N-terminal 15-60 amino acids of proteins, are typically needed for translocation across the membrane on the secretory pathway and, thus, universally control the entry of most proteins, in both eukaryotes and prokaryotes, to the secretory pathway.
- a signal peptide may have a length of 15-60 amino acids.
- an ORF encoding a polypeptide is codon optimized. Codon optimization methods are known in the art. For example, an ORF of any one or more of the sequences provided herein may be codon optimized. Codon optimization, in some embodiments, may be used to match codon frequencies in target and host organisms to ensure proper folding; bias %G/C content to increase mRNA thermodynamic stability or reduce secondary structures; minimize tandem repeat codons or base runs that may impair gene construction or expression; customize transcriptional and translational control regions; insert or remove protein trafficking sequences; remove/add post translation modification sites in encoded protein (e.g., glycosylation sites); add, remove or shuffle protein domains; insert or delete restriction sites; modify ribosome binding sites and mRNA degradation sites; adjust translational rates to allow the various domains of the protein to fold properly; or reduce or eliminate problem secondary structures within the polynucleotide.
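Two of the sequence properties named above, %G/C content and base runs, are straightforward to measure on a candidate ORF. The following minimal Python sketch computes both. It is not a codon optimization method and does not represent the approach used in the source; the function names are hypothetical.

```python
from itertools import groupby


def gc_percent(seq: str) -> float:
    """Percentage of positions that are G or C."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in s) / len(s)


def longest_base_run(seq: str) -> tuple[str, int]:
    """Return (base, length) for the longest run of one repeated base."""
    best = ("", 0)
    for base, group in groupby(seq.upper()):
        run = sum(1 for _ in group)
        if run > best[1]:
            best = (base, run)
    return best


if __name__ == "__main__":
    orf = "AUGGGGGCCAAAUAA"  # illustrative sequence only
    print(gc_percent(orf), longest_base_run(orf))
```

A real optimization workflow would compare these metrics across synonymous variants of the same ORF; that comparison is outside this sketch.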
- an RNA is not chemically modified and comprises the standard ribonucleotides consisting of adenosine, guanosine, cytidine and uridine.
- nucleotides and nucleosides comprise standard nucleoside residues such as those present in transcribed RNA (e.g. A, G, C, or U).
- nucleotides and nucleosides comprise standard deoxyribonucleosides such as those present in DNA (e.g. dA, dG, dC, or dT).
- the compositions can comprise, in some embodiments, an RNA having an open reading frame encoding a polypeptide, wherein the nucleic acid comprises nucleotides and/or nucleosides that can be standard (unmodified) or modified as is known in the art.
- nucleotides and nucleosides comprise modified nucleotides or nucleosides.
- modified nucleotides and nucleosides can be naturally-occurring modified nucleotides and nucleosides or non-naturally occurring modified nucleotides and nucleosides. Such modifications can include those at the sugar, backbone, or nucleobase portion of the nucleotide and/or nucleoside as are recognized in the art.
- a naturally-occurring modified nucleotide or nucleoside is one as is generally known or recognized in the art. Non-limiting examples of such naturally occurring modified nucleotides and nucleosides can be found, inter alia, in the widely recognized MODOMICS database.
- a “nucleoside” refers to a compound containing a sugar molecule (e.g., a pentose or ribose) or a derivative thereof in combination with an organic base (e.g., a purine or pyrimidine) or a derivative thereof (also referred to herein as “nucleobase”).
- a “nucleotide” refers to a nucleoside, including a phosphate group.
- Modified nucleotides may be synthesized by any useful method, such as, for example, chemically, enzymatically, or recombinantly, to include one or more modified or non-natural nucleosides.
- Nucleic acids can comprise a region or regions of linked nucleosides. Such regions may have variable backbone linkages. The linkages can be standard phosphodiester linkages, in which case the nucleic acids would comprise regions of nucleotides.
- modified nucleosides in nucleic acids comprise N1-methyl-pseudouridine (m1ψ), 1-ethyl-pseudouridine (e1ψ), 5-methoxy-uridine (mo5U), 5-methyl-cytidine (m5C), and/or pseudouridine (ψ).
- modified nucleobases in nucleic acids comprise 5-methoxymethyl uridine, 5-methylthio uridine, 1-methoxymethyl pseudouridine, 5-methyl cytidine, and/or 5-methoxycytidine.
- the polyribonucleotide includes a combination of at least two (e.g., 2, 3, 4 or more) of any of the aforementioned modified nucleobases, including but not limited to chemical modifications.
- an mRNA comprises N1-methyl-pseudouridine (m1Ψ) substitutions at one or more or all uridine positions of the nucleic acid.
- an mRNA comprises N1-methyl-pseudouridine (m1Ψ) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid.
- an mRNA comprises pseudouridine (Ψ) substitutions at one or more or all uridine positions of the nucleic acid.
- an mRNA comprises uridine at one or more or all uridine positions of the nucleic acid.
- mRNAs are uniformly modified (e.g., fully modified, modified throughout the entire sequence) for a particular modification.
- a nucleic acid can be uniformly modified with N1-methyl-pseudouridine, meaning that all uridine residues in the mRNA sequence are replaced with N1-methyl-pseudouridine.
- a nucleic acid can be uniformly modified for any type of nucleoside residue present in the sequence by replacement with a modified residue such as those set forth above.
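The distinction between uniform (full) and partial modification can be illustrated with a minimal Python sketch. The function names, the list-of-residues representation, and the ASCII label `m1psi` (standing in for m1Ψ) are illustrative assumptions, not part of the disclosure.

```python
def modify_residues(rna, target="U", replacement="m1psi", positions=None):
    """Return the sequence as a list of residue labels with `target` replaced.

    positions=None replaces every occurrence (uniform modification);
    otherwise only the given 0-based indices are replaced (partial modification).
    """
    residues = list(rna.upper())
    for i, residue in enumerate(residues):
        if residue == target and (positions is None or i in positions):
            residues[i] = replacement
    return residues


def modified_fraction(rna, residues, target="U", replacement="m1psi"):
    """Fraction of the original `target` positions that carry the replacement."""
    total = rna.upper().count(target)
    return residues.count(replacement) / total if total else 0.0


sequence = "AUGUCU"
uniform = modify_residues(sequence)                  # all uridines replaced
partial = modify_residues(sequence, positions={1})   # only index 1 replaced
print(uniform, modified_fraction(sequence, uniform))
print(partial, modified_fraction(sequence, partial))
```

A list of labels is used rather than a string so that multi-character modification names do not shift residue positions.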
- the nucleic acids may be partially or fully modified along the entire length of the molecule.
- one or more or all or a given type of nucleotide e.g., purine or pyrimidine, or any one or more or all of A, G, U, C
- nucleotides X in a nucleic acid are modified nucleotides, wherein X may be any one of nucleotides A, G, U, C, or any one of the combinations A+G, A+U, A+C, G+U, G+C, U+C, A+G+U, A+G+C, G+U+C or A+U+C.
- the mRNAs may comprise one or more regions or parts which act or function as an untranslated region. Where mRNAs are designed to encode at least one polypeptide of interest, the nucleic acid may comprise one or more of these untranslated regions (UTRs).
- Wild-type untranslated regions of a nucleic acid are transcribed but not translated.
- the 5′ UTR starts at the transcription start site and continues to the start codon but does not include the start codon; whereas the 3′ UTR starts immediately following the stop codon and continues until the transcriptional termination signal.
- the regulatory features of a UTR can be incorporated into the polynucleotides to, among other things, enhance the stability of the molecule. The specific features can also be incorporated to ensure controlled down-regulation of the transcript in case it is misdirected to undesired organ sites.
- a variety of 5′ UTR and 3′ UTR sequences are known and available in the art.
- Untranslated regions are sections of a nucleic acid before a start codon (5′ UTR) and after a stop codon (3′ UTR) that are not translated.
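The UTR boundaries described above can be sketched in Python as follows. This is a simplified illustration that assumes the first AUG in the sequence is the start codon and that the ORF ends at the first in-frame stop codon; the function name and example sequence are hypothetical, and any polyA tail present would simply be returned as part of the downstream segment.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(seq):
    """Split an mRNA into (5' UTR, ORF, 3' region).

    The 5' UTR runs up to, but does not include, the start codon.
    The 3' region begins immediately after the stop codon.
    """
    seq = seq.upper().replace("T", "U")
    start = seq.find("AUG")  # assumption: first AUG is the start codon
    if start == -1:
        raise ValueError("no start codon found")
    # Walk the reading frame codon by codon until an in-frame stop codon.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            end = i + 3
            return seq[:start], seq[start:end], seq[end:]
    raise ValueError("no in-frame stop codon found")


five_utr, orf, three_region = split_mrna("GGGAAAUGGCCUAAUUUU")
print(five_utr, orf, three_region)
```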
- a nucleic acid e.g., a ribonucleic acid (RNA), e.g., a messenger RNA (mRNA)
- mRNA messenger RNA
- ORF open reading frame
- a UTR can be homologous or heterologous to the coding region in a nucleic acid.
- the UTR is homologous to the ORF encoding the one or more proteins. In some embodiments, the UTR is heterologous to the ORF encoding the one or more proteins.
- the nucleic acid comprises two or more 5′ UTRs or functional fragments thereof, each of which have the same or different nucleotide sequences. In some embodiments, the nucleic acid comprises two or more 3′ UTRs or functional fragments thereof, each of which have the same or different nucleotide sequences. In some embodiments, the 5′ UTR or functional fragment thereof, 3′ UTR or functional fragment thereof, or any combination thereof is sequence optimized.
- the 5′ UTR or functional fragment thereof, 3′ UTR or functional fragment thereof, or any combination thereof comprises at least one chemically modified nucleobase, e.g., 5-methoxyuracil.
- UTRs can have features that provide a regulatory role, e.g., increased or decreased stability, localization, and/or translation efficiency.
- a nucleic acid comprising a UTR can be administered to a cell, tissue, or organism, and one or more regulatory features can be measured using routine methods.
- a functional fragment of a 5′ UTR or 3′ UTR comprises one or more regulatory features of a full length 5′ or 3′ UTR, respectively.
- Natural 5′ UTRs bear features that play roles in translation initiation. They harbor signatures like Kozak sequences that are commonly known to be involved in the process by which the ribosome initiates translation of many genes. 5′ UTRs also have been known to form secondary structures that are involved in elongation factor binding. By engineering the features typically found in abundantly expressed genes of specific target organs, one can enhance the stability and protein production of a nucleic acid.
- liver-expressed mRNA such as albumin, serum amyloid A, Apolipoprotein A/B/E, transferrin, alpha fetoprotein, erythropoietin, or Factor VIII
- introduction of 5′ UTR of liver-expressed mRNA can enhance expression of nucleic acids in hepatic cell lines or liver.
- tissue-specific mRNA to improve expression in that tissue is possible for muscle (e.g., MyoD, Myosin, Myoglobin, Myogenin, Herculin), for endothelial cells (e.g., Tie-1, CD36), for myeloid cells (e.g., C/EBP, AML1, G-CSF, GM-CSF, CD11b, MSR, Fr-1, i-NOS), for leukocytes (e.g., CD45, CD18), for adipose tissue (e.g., CD36, GLUT4, ACRP30, adiponectin), and for lung epithelial cells (e.g., SP-A/B/C/D).
- UTRs are selected from a family of transcripts whose proteins share a common function, structure, feature, or property.
- an encoded polypeptide can belong to a family of proteins (i.e., that share at least one function, structure, feature, localization, origin, or expression pattern), which are expressed in a particular cell, tissue or at some time during development.
- the UTRs from any of the genes or mRNA can be swapped for any other UTR of the same or different family of proteins to create a new nucleic acid.
- the 5′ UTR and the 3′ UTR can be heterologous.
- the 5′ UTR can be derived from a different species than the 3′ UTR.
- the 3′ UTR can be derived from a different species than the 5′ UTR.
- International Patent Application No. PCT/US2014/021522 (Publ. No. WO/2014/164253) provides a listing of exemplary UTRs that may be utilized in the nucleic acids as flanking regions to an ORF. This publication is incorporated by reference herein for this purpose.
- Additional exemplary UTRs that may be utilized in the nucleic acids include, but are not limited to, one or more 5′ UTRs and/or 3′ UTRs derived from the nucleic acid sequence of: a globin, such as an α- or β-globin (e.g., a Xenopus, mouse, rabbit, or human globin); a strong Kozak translational initiation signal; a CYBA (e.g., human cytochrome b-245 α polypeptide); an albumin (e.g., human albumin7); a HSD17B4 (hydroxysteroid (17-β) dehydrogenase); a virus (e.g., a tobacco etch virus (TEV), a Venezuelan equine encephalitis virus (VEEV), a Dengue virus, a cytomegalovirus (CMV; e.g., CMV immediate early 1 (IE1)), a hepatitis virus (e.g.,
- the 5′ UTR is selected from the group consisting of a β-globin 5′ UTR; a 5′ UTR containing a strong Kozak translational initiation signal; a cytochrome b-245 α polypeptide (CYBA) 5′ UTR; a hydroxysteroid (17-β) dehydrogenase (HSD17B4) 5′ UTR; a Tobacco etch virus (TEV) 5′ UTR; a Venezuelan equine encephalitis virus (VEEV) 5′ UTR; a 5′ proximal open reading frame of rubella virus (RV) RNA encoding nonstructural proteins; a Dengue virus (DEN) 5′ UTR; a heat shock protein 70 (Hsp70) 5′ UTR; an eIF4G 5′ UTR; a GLUT1 5′ UTR; functional fragments thereof and any combination thereof.
- the 3′ UTR is selected from the group consisting of a β-globin 3′ UTR; a CYBA 3′ UTR; an albumin 3′ UTR; a growth hormone (GH) 3′ UTR; a VEEV 3′ UTR; a hepatitis B virus (HBV) 3′ UTR; an α-globin 3′ UTR; a DEN 3′ UTR; a PAV barley yellow dwarf virus (BYDV-PAV) 3′ UTR; an elongation factor 1 α1 (EEF1A1) 3′ UTR; a manganese superoxide dismutase (MnSOD) 3′ UTR; a β subunit of mitochondrial H(+)-ATP synthase (β-mRNA) 3′ UTR; a GLUT1 3′ UTR; a MEF2A 3′ UTR; a β-F1-ATPase 3′ UTR; functional fragments thereof and combinations thereof.
- Wild-type UTRs derived from any gene or mRNA can be incorporated into the nucleic acids.
- a UTR can be altered relative to a wild type or native UTR to produce a variant UTR, e.g., by changing the orientation or location of the UTR relative to the ORF; or by inclusion of additional nucleotides, deletion of nucleotides, swapping or transposition of nucleotides.
- variants of 5′ or 3′ UTRs can be utilized, for example, mutants of wild type UTRs, or variants wherein one or more nucleotides are added to or removed from a terminus of the UTR.
- one or more synthetic UTRs can be used in combination with one or more non-synthetic UTRs. See, e.g., Mandal and Rossi, Nat. Protoc. 2013 8(3):568-82, and sequences available at www.addgene.org, the contents of each are incorporated herein by reference in their entirety. UTRs or portions thereof can be placed in the same orientation as in the transcript from which they were selected or can be altered in orientation or location. Hence, a 5′ and/or 3′ UTR can be inverted, shortened, lengthened, or combined with one or more other 5′ UTRs or 3′ UTRs.
- the nucleic acid may comprise multiple UTRs, e.g., a double, a triple or a quadruple 5′ UTR or 3′ UTR.
- a double UTR comprises two copies of the same UTR either in series or substantially in series.
- a double beta-globin 3′ UTR can be used (see, e.g., US 2010/0129877, the contents of which are incorporated herein by reference for this purpose).
- the nucleic acids can comprise combinations of features.
- the ORF can be flanked by a 5′ UTR that comprises a strong Kozak translational initiation signal and/or a 3′ UTR comprising an oligo(dT) sequence for templated addition of a polyA tail.
- a 5′ UTR can comprise a first nucleic acid fragment and a second nucleic acid fragment from the same and/or different UTRs (see, e.g., US 2010/0293625, herein incorporated by reference in its entirety for this purpose).
- Other non-UTR sequences can be used as regions or subregions within the nucleic acids. For example, introns or portions of intron sequences can be incorporated into the nucleic acids.
- the nucleic acid comprises an internal ribosome entry site (IRES) instead of or in addition to a UTR (see, e.g., Yakubov et al., Biochem. Biophys. Res. Commun. 2010. 394(1):189-193, the contents of which are incorporated herein by reference in their entirety).
- the nucleic acid comprises an IRES instead of a 5′ UTR sequence.
- the nucleic acid comprises an IRES that is located between a 5′ UTR and an open reading frame.
- the nucleic acid comprises an ORF encoding a viral capsid sequence.
- the nucleic acid comprises a synthetic 5′ UTR in combination with a non-synthetic 3′ UTR.
- the UTR can also include at least one translation enhancer nucleic acid, translation enhancer element, or translational enhancer elements (collectively, “TEE”), which refers to nucleic acid sequences that increase the amount of polypeptide or protein produced from a polynucleotide.
- TEE translation enhancer nucleic acid, translation enhancer element, or translational enhancer elements
- the TEE can include those described in US2009/0226470, incorporated herein by reference in its entirety for this purpose, and others known in the art.
- the TEE can be located between the transcription promoter and the start codon.
- the 5′ UTR comprises a TEE.
- a TEE is a conserved element in a UTR that can promote translational activity of a nucleic acid such as, but not limited to, cap-dependent or cap-independent translation.
- the TEE comprises the TEE sequence in the 5′-leader of the Gtx homeodomain protein. See, e.g., Chappell et al., PNAS. 2004. 101:9590-9594, incorporated herein by reference in its entirety for this purpose.
- Poly(A) tails. Some aspects relate to methods of producing RNAs containing one or more polyA tails.
- a “polyA tail” is a region of mRNA that is downstream, e.g., directly downstream (i.e., 3′), from the open reading frame and/or the 3′ UTR that contains multiple, consecutive adenosine monophosphates.
- a polyA tail may contain 10 to 300 adenosine monophosphates.
- a polyA tail may contain 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 or 300 adenosine monophosphates.
- a polyA tail contains 50 to 250 adenosine monophosphates.
- the poly(A) tail functions to protect mRNA from enzymatic degradation, e.g., in the cytoplasm, and aids in transcription termination, export of the mRNA from the nucleus, and translation.
- polyA-tailing efficiency refers to the amount (e.g., expressed as a percentage) of mRNAs having polyA tail that are produced by an IVT reaction using an input DNA relative to the total number of mRNAs produced in the IVT reaction using the input DNA.
- the polyA-tailing efficiency of an IVT reaction may vary, for example depending upon the RNA polymerase used, amount or purity of input DNA used, etc. In some embodiments, the polyA-tailing efficiency of an IVT reaction is greater than 85%, 90%, 95%, or 99.9%. Methods of calculating polyA-tailing efficiency are known, for example by determining the amount of polyA tail-containing mRNA relative to total mRNA produced in an IVT reaction by column chromatography (e.g., oligo-dT chromatography).
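The polyA-tailing efficiency defined above is a simple percentage. The following Python sketch shows the calculation under the assumption that the tailed and total mRNA amounts have already been measured in the same units (for example, by oligo-dT chromatography); the function names and the example numbers are hypothetical.

```python
def polya_tailing_efficiency(tailed_amount, total_amount):
    """Percentage of mRNA carrying a polyA tail relative to total mRNA produced."""
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    if not 0 <= tailed_amount <= total_amount:
        raise ValueError("tailed_amount must lie between 0 and total_amount")
    return 100.0 * tailed_amount / total_amount


def exceeds(efficiency_percent, threshold_percent=85.0):
    """True if the efficiency is greater than the chosen threshold."""
    return efficiency_percent > threshold_percent


efficiency = polya_tailing_efficiency(tailed_amount=9.2, total_amount=10.0)
print(round(efficiency, 1), exceeds(efficiency, 85.0), exceeds(efficiency, 95.0))
```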
- RNAs in an RNA composition produced by a method described herein comprise a polyA tail.
- at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.9% of the RNAs in an RNA composition produced by a method described herein comprise a polyA tail.
- the efficiency e.g., percentage of polyA tail-containing RNAs in an RNA composition may be measured i) after the IVT reaction and before purification, or ii) after the RNA composition has been purified (e.g., by chromatography, such as oligo-dT chromatography).
- Unique polyA tail lengths provide certain advantages to nucleic acids. Generally, the length of a polyA tail, when present, is greater than 30 nucleotides in length.
- the polyA tail is greater than 35 nucleotides in length (e.g., at least or greater than about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, 1,400, 1,500, 1,600, 1,700, 1,800, 1,900, 2,000, 2,500, or 3,000 nucleotides).
- the polyA tail is designed relative to the length of the overall nucleic acid or the length of a particular region of the nucleic acid.
- the polyA tail can be 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% greater in length than the nucleic acid or feature thereof.
- the polyA tail can also be designed as a fraction of the nucleic acid to which it belongs.
- the polyA tail can be 10, 20, 30, 40, 50, 60, 70, 80, or 90% or more of the total length of the construct, a construct region, or the total length of the construct minus the polyA tail.
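The two ways of expressing tail length as a fraction of the construct (relative to the total length, or relative to the construct minus the tail) give different numbers for the same molecule. A minimal Python sketch, with a hypothetical function name and example lengths:

```python
def polya_fractions(tail_len, body_len):
    """Express polyA tail length relative to the construct.

    tail_len: number of nucleotides in the polyA tail
    body_len: number of nucleotides in the construct excluding the tail
    """
    if tail_len < 0 or body_len <= 0:
        raise ValueError("tail_len must be >= 0 and body_len must be > 0")
    return {
        "of_total": tail_len / (tail_len + body_len),
        "of_construct_minus_tail": tail_len / body_len,
    }


# Example: a 100-nt tail on a 900-nt construct body.
print(polya_fractions(100, 900))
```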
- engineered binding sites and conjugation of nucleic acids for PolyA-binding protein can enhance expression.
- IVT methods produce (e.g., synthesize) an RNA transcript (e.g., mRNA transcript) by contacting a DNA template (e.g., a first input DNA and a second input DNA) with an RNA polymerase (e.g., a T7 RNA polymerase, a T7 RNA polymerase variant, etc.) under conditions that result in the production of the RNA transcript.
- IVT conditions typically require a purified DNA template containing a promoter, nucleoside triphosphates, a buffer system that includes dithiothreitol (DTT) and magnesium ions, and an RNA polymerase.
- DTT dithiothreitol
- IVT methods further comprise a step of separating (e.g., purifying) in vitro transcription products (e.g., mRNA) from other reaction components.
- the separating comprises performing chromatography on the IVT reaction mixture.
- the method comprises reverse phase chromatography.
- the method comprises reverse phase column chromatography.
- the chromatography comprises size-based (e.g., length-based) chromatography.
- the method comprises size exclusion chromatography.
- the chromatography comprises oligo-dT chromatography.
- Multivalent in vitro transcription refers to contacting two or more DNA templates (e.g., a first input DNA and a second input DNA) with an RNA polymerase (e.g., a T7 RNA polymerase) under conditions that result in the production of RNA transcripts.
- Each input DNA (e.g., in a population of input DNA templates) in a co-IVT reaction may be obtained from a different source than other input DNAs.
- each input DNA may be obtained from a different bacterial cell or population of bacterial cells.
- a first input DNA can be produced in bacterial cell population A
- a second input DNA can be produced in bacterial cell population B
- a third input DNA can be produced in bacterial cell population C, where each of A, B, and C are not the same bacterial culture (e.g., co-cultured in the same container or plate).
- different input DNAs are obtained by separate synthesis reactions or produced by separate amplification reactions.
- the amounts of input DNAs used in multivalent co-IVT reactions may be normalized. Normalization may be based, for example, on the molar masses, lengths, nucleotide contents, degradation rates, and/or purity of input DNAs. In some embodiments, normalization is based on the degradation rate of resulting RNAs. Normalization may be based on the lowest level of a certain characteristic present among the input DNAs (e.g., lowest molar mass, degradation rate (e.g., of the input DNA and/or output RNA), nucleotide content, purity, and/or polyA-tailing efficiency).
- normalization may be based on the highest level of a certain characteristic present among the input DNAs (e.g., highest molar mass, degradation rate (e.g., of the input DNA and/or output RNA), nucleotide content, purity, and/or polyA-tailing efficiency). In some embodiments, normalization is based on the rate of RNA production from the input DNAs (e.g., the highest rate of RNA production of an input DNA or the lowest rate of RNA production of an input DNA in a reaction mixture). The amount of one or more input DNAs may be adjusted and/or normalized to improve production of RNA compositions having a pre-defined or desired ratio of RNA components.
- Adjusting and/or normalizing amounts of input DNAs may compensate for differences between input DNAs (e.g., large differences in lengths of two input DNAs, or different polyA tailing efficiencies) that can affect the ratio of RNAs in a multivalent RNA composition, thereby allowing for the production of RNA compositions having desired ratios of different RNAs.
- the amount of two input DNAs present in a co-IVT reaction may be determined by selecting a desired molar ratio of a first RNA to a second RNA, calculating the mass of each DNA template necessary to achieve the same molar ratio between input DNAs, and combining input DNAs encoding each of the first and second RNAs in the same molar ratio.
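The mass calculation described above can be sketched as follows. The sketch assumes that the molar mass of each double-stranded DNA template is proportional to its length, so that the mass share of each template is proportional to (molar ratio × length); the function name, template names, and numbers are hypothetical.

```python
def template_masses(total_mass_ug, lengths_bp, molar_ratio):
    """Split a total DNA mass among templates to reach a target molar ratio.

    Assumption: molar mass scales linearly with template length, so
    mass_i is proportional to molar_ratio_i * length_i.
    """
    weights = {name: molar_ratio[name] * lengths_bp[name] for name in molar_ratio}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("ratios and lengths must give a positive total")
    return {name: total_mass_ug * w / total_weight for name, w in weights.items()}


# Two templates at a 1:1 molar ratio; the longer template needs more mass.
masses = template_masses(
    total_mass_ug=4.0,
    lengths_bp={"DNA_A": 1000, "DNA_B": 3000},
    molar_ratio={"DNA_A": 1, "DNA_B": 1},
)
print(masses)
```

The example makes the point of the passage concrete: equal moles of templates that differ threefold in length require a threefold difference in mass.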
- the number of input DNAs (e.g., populations of input DNA molecules) used in an IVT reaction may vary, depending upon the number of different RNA molecules desired to be included in the multivalent RNA composition.
- An IVT reaction mixture may comprise 2 or more different input DNAs (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different input DNAs).
- the concentration of each of the populations of DNA molecules may also vary.
- the input DNAs may be added to an IVT reaction at a predefined DNA ratio, which may comprise a ratio between 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different input DNAs (e.g., depending on the number of different RNAs in a composition).
- the size of two or more input DNAs may also vary.
- the mass of each population of input DNA molecules in an IVT reaction may also vary.
- the molar ratio between populations of input DNA molecules in an IVT reaction may also vary.
- Different input DNA molecules used in an IVT reaction may have a different length (e.g., comprises a different number of nucleotides).
- a co-IVT reaction may include co-transcription of at least 2 different input DNAs (e.g., at least 2 of DNA A, B, C, D, E, F, G, H, I, J, etc.) at a ratio of A:B:C:D:E:F:G:H:I:J, wherein if DNA A is normalized to 1, one or more of DNA B, C, D, E, F, G, H, I, J, etc. can each independently be present at an amount (e.g., a concentration) that is from 0.01 to 100 times the amount (e.g., a concentration) of A.
- One or more of DNA B, C, D, E, F, G, H, I, or J may also be absent.
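The normalization of a co-IVT input ratio to DNA A can be sketched in Python as below. The function name is hypothetical; an amount of zero is treated as an absent template, and any present template is checked against the 0.01 to 100 range stated above.

```python
def normalize_to_reference(amounts, reference="A", low=0.01, high=100.0):
    """Express each input DNA amount relative to the reference template (set to 1)."""
    ref_amount = amounts[reference]
    if ref_amount <= 0:
        raise ValueError("reference amount must be positive")
    relative = {name: amount / ref_amount for name, amount in amounts.items()}
    for name, ratio in relative.items():
        # A ratio of 0 means the template is absent, which is permitted.
        if name != reference and ratio != 0 and not (low <= ratio <= high):
            raise ValueError(f"DNA {name} is outside {low}-{high}x of DNA {reference}")
    return relative


print(normalize_to_reference({"A": 2.0, "B": 1.0, "C": 0.0, "D": 50.0}))
```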
- a multivalent RNA composition may be produced by combining RNA transcripts (e.g., mRNAs) from separate sources. For example, each of two or more DNA templates may be transcribed in separate IVT reactions, and combined to produce a multivalent RNA composition. RNAs may be combined in any desired amount to produce a multivalent RNA composition comprising two or more RNAs in a specific ratio.
- Identification and Ratio Determination (IDR) sequences. In some embodiments, one or more nucleic acids comprise an Identification and Ratio Determination sequence.
- An Identification and Ratio Determination (IDR) sequence is a sequence of a biological molecule (e.g., nucleic acid or protein) that, when combined with the sequence of a target biological molecule, serves to identify the target biological molecule.
- an IDR sequence is a heterologous sequence that is incorporated within or appended to a sequence of a target biological molecule and can be used as a reference to identify the target molecule.
- a nucleic acid e.g., mRNA
- a target sequence of interest e.g., a coding sequence encoding a therapeutic and/or antigenic peptide or protein
- RNA species may comprise an IDR sequence that differs from the IDR sequence of other RNA species (e.g., RNA(s) having different coding sequence(s)).
- Each IDR sequence thus identifies a particular RNA species, and so the abundance of IDR sequences may be measured to determine the abundance of each RNA species in a composition.
- Use of distinct IDR sequences to identify RNA species allows for analysis of multivalent RNA compositions (e.g., containing multiple RNA species) containing RNA species with similar coding sequences and/or lengths, which could otherwise be difficult to distinguish using PCR- or chromatography-based analysis of full-length RNAs.
- Each RNA species in a multivalent RNA composition may comprise an IDR sequence that is not a sequence isomer of an IDR sequence of another RNA species in a multivalent RNA composition (e.g., the IDR sequence does not have the same number of adenosine nucleotides, the same number of cytosine nucleotides, the same number of guanine nucleotides, and the same number of uracil nucleotides, as another IDR sequence in the composition, even if those sequences have different sequences).
- Having identical nucleotide compositions causes sequence isomers to have the same mass, presenting a challenge to distinguishing sequence isomers using mass-based identification methods (e.g., mass spectrometry).
- Each RNA species in a multivalent RNA composition may comprise an IDR sequence having a mass that differs from the mass of IDR sequences of each other RNA species in a multivalent RNA composition.
- the mass of each IDR sequence may differ from the mass of other IDR sequences by at least 9 Da, at least 25 Da, or at least 50 Da.
- Use of IDR sequences with distinct masses allows RNA fragments comprising different IDR sequences to be distinguished using mass-based analysis methods (e.g., mass spectrometry), which do not require reverse transcription, amplification, or sequencing of RNAs.
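A set of candidate IDR sequences can be screened for sequence isomers and for insufficient mass separation with a short Python sketch. The residue masses below are approximate average masses of unmodified ribonucleotide monophosphate residues and are an assumption for illustration; terminal groups and chemical modifications are ignored, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Approximate average residue masses in Da (assumption; unmodified residues).
RESIDUE_MASS_DA = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}


def approx_mass(seq):
    """Sum of approximate residue masses; terminal groups are not included."""
    return sum(RESIDUE_MASS_DA[base] for base in seq.upper())


def idr_conflicts(idrs, min_delta_da=9.0):
    """List pairs that are sequence isomers or differ in mass by less than min_delta_da."""
    conflicts = []
    for a, b in combinations(idrs, 2):
        if Counter(a.upper()) == Counter(b.upper()):
            # Same nucleotide composition means the same mass.
            conflicts.append((a, b, "sequence isomers"))
        elif abs(approx_mass(a) - approx_mass(b)) < min_delta_da:
            conflicts.append((a, b, "mass difference too small"))
    return conflicts


print(idr_conflicts(["ACGU", "UGCA", "AAC", "AAU", "AAG"]))
```

In this example "ACGU" and "UGCA" are flagged as isomers, and "AAC" and "AAU" are flagged because C and U residues differ by only about 1 Da.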
- Each RNA species in an RNA composition may comprise an IDR sequence with a different length.
- each IDR sequence may have a length independently selected from 0 to 25 nucleotides.
- the length of a nucleic acid influences the rate at which the nucleic acid traverses a chromatography column, and so the use of IDR sequences of different lengths on different RNA species allows RNA fragments having different IDR sequences to be distinguished using chromatography-based methods (e.g., LC-UV).
- IDR sequences may be chosen such that no IDR sequence comprises a start codon, ‘AUG’. Lack of a start codon in an IDR sequence prevents undesired translation of nucleotide sequences within and/or downstream from the IDR sequence.
- IDR sequences may be chosen such that no IDR sequence comprises a recognition site for a restriction enzyme.
- no IDR sequence comprises a recognition site for XbaI, ‘UCUAG’.
- Lack of a recognition site for a restriction enzyme (e.g., XbaI recognition site ‘UCUAG’) allows the restriction enzyme to be used in generating and modifying a DNA template for in vitro transcription, without affecting the IDR sequence or sequence of the transcribed RNA.
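The IDR design constraints described above (no start codon, no XbaI site as written in this text, and a length of 0 to 25 nucleotides) can be checked with a small Python sketch. The function name and the dictionary of forbidden motifs are illustrative assumptions; the motif strings are taken from the text.

```python
# Motifs that an IDR sequence should not contain, as stated in the text.
FORBIDDEN_MOTIFS = {"start codon": "AUG", "XbaI site": "UCUAG"}


def idr_violations(idr, max_len=25):
    """Return the list of design rules that a candidate IDR sequence violates."""
    seq = idr.upper().replace("T", "U")  # accept DNA-style input as well
    violations = [name for name, motif in FORBIDDEN_MOTIFS.items() if motif in seq]
    if len(seq) > max_len:
        violations.append("too long")
    return violations


for candidate in ["ACCGUA", "CCAUGC", "GGUCUAGC"]:
    print(candidate, idr_violations(candidate))
```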
- Lipid Compositions
- the nucleic acids are formulated as a lipid composition, such as a composition comprising a lipid nanoparticle, a liposome, and/or a lipoplex.
- nucleic acids are formulated as lipid nanoparticle (LNP) compositions.
- LNP lipid nanoparticle
- Lipid nanoparticles typically comprise amino lipid, non-cationic lipid, structural lipid, and PEG lipid components along with the nucleic acid cargo of interest.
- the lipid nanoparticles can be generated using components, compositions, and methods as are generally known in the art, see for example PCT/US2016/052352; PCT/US2016/068300; PCT/US2017/037551; PCT/US2015/027400; PCT/US2016/047406; PCT/US2016000129; PCT/US2016/014280; PCT/US2017/038426; PCT/US2014/027077; PCT/US2014/055394; PCT/US2016/52117; PCT/US2012/069610; PCT/US2017/027492; PCT/US2016/059575; PCT/US2016/069491; PCT/US2016/069493; and PCT/US2014/66242, all of which are incorporated by reference herein in their entirety.
- the lipid nanoparticle comprises at least one ionizable amino lipid, at least one non-cationic lipid, at least one sterol, and/or at least one polyethylene glycol (PEG)-modified lipid.
- the lipid nanoparticle comprises a molar ratio of 20-60% ionizable amino lipid, 5-25% non-cationic lipid, 25-55% structural lipid, and 0.5-15% PEG-modified lipid.
- the lipid nanoparticle comprises a molar ratio of 20-60% ionizable amino lipid, 5-30% non-cationic lipid, 10-55% structural lipid, and 0.5-15% PEG-modified lipid.
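A candidate lipid composition can be checked against molar-percentage ranges such as those listed above. The Python sketch below uses the first set of ranges (20-60%, 5-25%, 25-55%, 0.5-15%) as defaults and assumes that the four components are expressed in mol% and sum to 100; the function name, that summation assumption, and the example composition are illustrative only.

```python
# Default ranges in mol% (low, high), taken from the first set of ranges above.
DEFAULT_RANGES = {
    "ionizable amino lipid": (20.0, 60.0),
    "non-cationic lipid": (5.0, 25.0),
    "structural lipid": (25.0, 55.0),
    "PEG-modified lipid": (0.5, 15.0),
}


def lnp_composition_issues(composition, ranges=DEFAULT_RANGES, tol=1e-6):
    """Return a list of problems found in a mol% composition (empty if none)."""
    issues = []
    total = sum(composition.values())
    if abs(total - 100.0) > tol:  # assumption: components sum to 100 mol%
        issues.append(f"components sum to {total} mol%, not 100")
    for component, (low, high) in ranges.items():
        value = composition.get(component, 0.0)
        if not low <= value <= high:
            issues.append(f"{component} at {value} mol% is outside {low}-{high}")
    return issues


example = {
    "ionizable amino lipid": 50.0,
    "non-cationic lipid": 10.0,
    "structural lipid": 38.5,
    "PEG-modified lipid": 1.5,
}
print(lnp_composition_issues(example))
```

Passing a different `ranges` dictionary allows the same check to be run against the alternative ranges (for example, 5-30% non-cationic lipid and 10-55% structural lipid).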
- the lipid nanoparticle comprises 40-50 mol% ionizable lipid, optionally 45-50 mol%, for example, 45-46 mol%, 46-47 mol%, 47-48 mol%, 48-49 mol%, or 49-50 mol%, for example about 45 mol%, 45.5 mol%, 46 mol%, 46.5 mol%, 47 mol%, 47.5 mol%, 48 mol%, 48.5 mol%, 49 mol%, or 49.5 mol%.
- the lipid nanoparticle comprises 20-60 mol% ionizable amino lipid.
- the lipid nanoparticle may comprise 20-50 mol%, 20-40 mol%, 20-30 mol%, 30-60 mol%, 30-50 mol%, 30-40 mol%, 40-60 mol%, 40-50 mol%, or 50-60 mol% ionizable amino lipid.
- the lipid nanoparticle comprises 20 mol%, 30 mol%, 40 mol%, 50 mol%, or 60 mol% ionizable amino lipid.
- the lipid nanoparticle comprises 35 mol%, 36 mol%, 37 mol%, 38 mol%, 39 mol%, 40 mol%, 41 mol%, 42 mol%, 43 mol%, 44 mol%, 45 mol%, 46 mol%, 47 mol%, 48 mol%, 49 mol%, 50 mol%, 51 mol%, 52 mol%, 53 mol%, 54 mol%, or 55 mol% ionizable amino lipid.
- the lipid nanoparticle comprises 45 – 55 mole percent (mol%) ionizable amino lipid.
- lipid nanoparticle may comprise 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 mol% ionizable amino lipid.
- Ionizable amino lipids Formula (AI)
- the ionizable amino lipid is a compound of Formula (AI): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched ; wherein R’ branched denotes a point of attachment; wherein R a ⁇ , R a ⁇ , R a ⁇ , and R a ⁇ are each independently selected from the group consisting of H, C 2-12 alkyl, and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C2-14 alkenyl; R 4 is selected from the group consisting of -(CH 2 ) n OH, wherein n is selected from the group consisting wherein denotes a point of
- R’ a is R’ branched ;
- R’ branched is denotes a point of attachment;
- R aα, R aβ, R aγ, and R aδ are each H;
- R 2 and R 3 are each C 1-14 alkyl;
- R 4 is -(CH 2 ) n OH; n is 2;
- each R 5 is H;
- each R 6 is H;
- M and M’ are each -C(O)O-;
- R’ is a C 1-12 alkyl; l is 5; and
- m is 7.
- R’ a is R’ branched ; point of attachment; R aα, R aβ, R aγ, and R aδ are each H; R 2 and R 3 are each C1-14 alkyl; R 4 is -(CH2)nOH; n is 2; each R 5 is H; each R 6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 3; and m is 7.
- R’ a is R’ branched ; denotes a point of attachment; R a ⁇ is C2-12 alkyl; R a ⁇ , R a ⁇ , and R a ⁇ are each H; R 2 and R 3 are each C 1-14 alkyl; R 10 NH(C 1-6 alkyl); n2 is 2; R 5 is H; each R 6 is H; M and M’ are each -C(O)O-; R’ is a C 1-12 alkyl; l is 5; and m is 7.
- R’ a is R’ branched ; point of attachment; R a ⁇ , R a ⁇ , and R a ⁇ are each H; R a ⁇ is C2-12 alkyl; R 2 and R 3 are each C1-14 alkyl; R 4 is -(CH2)nOH; n is 2; each R 5 is H; each R 6 is H; M and M’ are each -C(O)O-; R’ is a C 1-12 alkyl; l is 5; and m is 7.
- the compound of Formula (AI) is selected from: , , and .
- the ionizable amino lipid of Formula (AI) is a compound of Formula (AIa): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched ; wherein denotes a point of attachment; wherein R a ⁇ , R a ⁇ , and R a ⁇ are each independently selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C1-14 alkyl and C 2-14 alkenyl; R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting wherein denotes a point of attachment; wherein R 10 is N(R) 2 ; each R is independently selected from the group consisting of C 1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10
- the ionizable amino lipid of Formula (AI) is a compound of Formula (AIb): wherein R a ⁇ , R a ⁇ , R a ⁇ , and R a ⁇ are each independently selected from the group consisting of H, C 2-12 alkyl, and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R 4 is -(CH 2 ) n OH, wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; each R 5 is independently selected from the group consisting of C 1-3 alkyl, C2-3 alkenyl, and H; each R 6 is independently selected from the group consisting of C1-3 alkyl, C 2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-; R’ is a C 1-12 alkyl or C 2-12
- R’ a is R’ branched ;
- R’ branched is denotes a point of attachment;
- R a ⁇ , R a ⁇ , and R a ⁇ are each H;
- R 2 and R 3 are each C 1-14 alkyl;
- R 4 is -(CH 2 ) n OH; n is 2;
- each R 5 is H;
- each R 6 is H;
- M and M’ are each -C(O)O-;
- R’ is a C 1-12 alkyl; l is 5; and
- m is 7.
- R’ a is R’ branched ;
- R’ branched is denotes a point of attachment;
- R a ⁇ , R a ⁇ , and R a ⁇ are each H;
- R 2 and R 3 are each C 1-14 alkyl;
- R 4 is -(CH 2 ) n OH; n is 2;
- each R 5 is H;
- each R 6 is H;
- M and M’ are each -C(O)O-;
- R’ is a C1-12 alkyl; l is 3; and
- m is 7.
- R’ a is R’ branched ;
- R’ branched is denotes a point of attachment;
- R a ⁇ and R a ⁇ are each H;
- R a ⁇ is C2-12 alkyl;
- R 2 and R 3 are each C 1-14 alkyl;
- R 4 is -(CH 2 ) n OH;
- n is 2;
- each R 5 is H;
- each R 6 is H;
- M and M’ are each -C(O)O-;
- R’ is a C1-12 alkyl; l is 5; and m is 7.
- the ionizable amino lipid of Formula (AI) is a compound of Formula (AIc): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched ; wherein denotes a point of attachment; wherein R a ⁇ , R a ⁇ , R a ⁇ , and R a ⁇ are each independently selected from the group consisting of H, C 2-12 alkyl, and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C2-14 alkenyl; wherein denotes a point of attachment; wherein R 10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R 5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkeny
- R a ⁇ , R a ⁇ , and R a ⁇ are each H; R a ⁇ is C2-12 alkyl; R 2 and R 3 are NH(C1-6 alkyl); n2 is 2; each R 5 is H; each R 6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7.
- the compound of Formula (AIc) is: .
- the ionizable amino lipid is a compound of Formula (AII): wherein R’ a is R’ branched or R’ cyclic ; wherein a R a ⁇ and R a ⁇ are each independently selected from the group consisting of H, C 1-12 alkyl, and C2-12 alkenyl, wherein at least one of R a ⁇ and R a ⁇ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R b ⁇ and R b ⁇ are each independently selected from the group consisting of H, C 1-12 alkyl, and C 2-12 alkenyl, wherein at least one of R b ⁇ and R b ⁇ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C 2-14 alkenyl; R 4 is selected from the group consisting of -(CH2)nOH
- the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-a): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched or R’ cyclic ; wherein wherein denotes a point of attachment; R a ⁇ and R a ⁇ are each independently selected from the group consisting of H, C 1-12 alkyl, and C2-12 alkenyl, wherein at least one of R a ⁇ and R a ⁇ is selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl; R b ⁇ and R b ⁇ are each independently selected from the group consisting of H, C 1-12 alkyl, and C2-12 alkenyl, wherein at least one of R b ⁇ and R b ⁇ is selected from the group consisting of C 1-12 alkyl and C 2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 al
- the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-b): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched or R’ cyclic ; wherein wherein denotes a point of attachment; R a ⁇ and R b ⁇ are each independently selected from the group consisting of C 1-12 alkyl and C2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C 2-14 alkenyl; R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting wherein denotes a point of attachment; wherein R 10 is N(R)2; each R is independently selected from the group consisting of C 1-6 alkyl, C 2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’
- the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-c): wherein denotes a point of attachment; wherein R a ⁇ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C 2-14 alkenyl; R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting wherein denotes a point of attachment; wherein R 10 is N(R)2; each R is independently selected from the group consisting of C 1-6 alkyl, C 2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; R’ is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5,
- the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-d): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched or R’ cyclic ; wherein R’ branched is: and R’ b is: ; wherein denotes a point of attachment; wherein R a ⁇ and R b ⁇ are each independently selected from the group consisting of C1-12 alkyl and C 2-12 alkenyl; R 4 is selected from the group consisting of -(CH 2 ) n OH wherein n is selected from the group consisting wherein denotes a point of attachment; wherein R 10 is N(R)2; each R is independently selected from the group consisting of C 1-6 alkyl, C 2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12
- the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-e): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched or R’ cyclic ; wherein wherein denotes a point of attachment; wherein R a ⁇ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R 2 and R 3 are each independently selected from the group consisting of C 1-14 alkyl and C2-14 alkenyl; R 4 is -(CH2)nOH wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; R’ is a C 1-12 alkyl or C 2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9.
- each R’ independently is a C 1-12 alkyl.
- each R’ independently is a C2-5 alkyl.
- R’ b is: and R 2 and R 3 are each independently a C 1-14 alkyl.
- R’ b is: and R 2 and R 3 are each independently a C 6-10 alkyl.
- R’ b is: are each a C 8 alkyl.
- R’ branched is: alkyl and R 2 and R 3 are each independently a C6-10 alkyl.
- R’ branched is: are each independently a C 6-10 alkyl.
- (AII- R a ⁇ is a C2-6 alkyl, and R 2 and R 3 are each a C8 alkyl.
- m and l are each independently selected from 4, 5, and 6 and each R’ independently is a C1-12 alkyl.
- m and l are each 5 and each R’ independently is a C 2-5 alkyl.
- R’ branched is: are each independently selected from 4, 5, and 6, each R’ independently is a C1-12 alkyl, and R a ⁇ and R b ⁇ are each a C 1-12 alkyl.
- R’ independently is a C 2-5 alkyl.
- R a ⁇ and R b ⁇ are each a C 2-6 alkyl.
- R’ branched is: are each independently selected from 4, 5, and 6, R’ is a C 1-12 alkyl, R a ⁇ is a C 1-12 alkyl and R 2 and R 3 are each independently a C6-10 alkyl.
- R’ branched is: are each 5, R’ is a C 2-5 alkyl, R a ⁇ is a C 2-6 alkyl, and R 2 and R 3 are each a C 8 alkyl.
- R 10 is NH(C1-6 alkyl) and n2 is 2.
- R 10 is NH(CH3) and n2 is 2.
- R’ branched is: each independently selected from 4, 5, and 6, each R’ independently is a C 1-12 alkyl, R a ⁇ and R b ⁇ are each a C1-12 alkyl, wherein R 10 is NH(C1-6 alkyl), and n2 is 2.
- R’ branched is: each 5, each R’ independently is a C2-5 alkyl, R a ⁇ and R b ⁇ are each a C2-6 alkyl, and R 4 is , wherein R 10 is NH(CH 3 ) and n2 is 2.
- R’ branched is: are each independently selected from 4, 5, and 6, R’ is a C1-12 alkyl, R 2 and R 3 are each independently a C 6-10 alkyl, R a ⁇ is a C 1-12 alkyl, wherein R 10 is NH(C 1-6 alkyl) and n2 is 2.
- R’ branched is: are each 5, R’ is a C2-5 alkyl, R a ⁇ is a C2-6 alkyl, R 2 and R 3 are each a C8 alkyl, and R 4 is , wherein R 10 is NH(CH 3 ) and n2 is 2. [0266] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R 4 is -(CH2)nOH and n is 2, 3, or 4.
- R 4 is -(CH2)nOH and n is 2.
- R’ branched is: each independently selected from 4, 5, and 6, each R’ independently is a C1-12 alkyl, R a ⁇ and R b ⁇ are each a C 1-12 alkyl, R 4 is -(CH 2 ) n OH, and n is 2, 3, or 4.
- R’ branched is: , m and l are each 5, each R’ independently is a C2-5 alkyl, R a ⁇ and R b ⁇ are each a C2-6 alkyl, R 4 is -(CH2)nOH, and n is 2.
- the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-f): its N-oxide, or a salt or isomer thereof, wherein R’ a is R’ branched or R’ cyclic ; wherein wherein denotes a point of attachment; R a ⁇ is a C1-12 alkyl; R 2 and R 3 are each independently a C1-14 alkyl; R 4 is -(CH 2 ) n OH wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; R’ is a C 1-12 alkyl; m is selected from 4, 5, and 6; and l is selected from 4, 5, and 6.
- m and l are each 5, and n is 2, 3, or 4.
- R’ is a C 2-5 alkyl, R a ⁇ is a C2-6 alkyl, and R 2 and R 3 are each a C6-10 alkyl.
- m and l are each 5, n is 2, 3, or 4, R’ is a C 2-5 alkyl, R a ⁇ is a C 2-6 alkyl, and R 2 and R 3 are each a C 6-10 alkyl.
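The index ranges recited for Formula (AII-f) define a small combinatorial space, and the narrower embodiment picks a subset of it. The following is a minimal sketch, assuming only the integer indices m, l, and n are varied (the alkyl substituents R’, R a, R 2, and R 3 are ignored here); the variable names are illustrative and not part of the source.

```python
from itertools import product

# Index ranges as recited for Formula (AII-f):
# n is selected from 1-5; m and l are each selected from 4, 5, and 6.
N_VALUES = range(1, 6)
M_VALUES = (4, 5, 6)
L_VALUES = (4, 5, 6)

# Every (m, l, n) combination covered by the broad definition.
all_combos = list(product(M_VALUES, L_VALUES, N_VALUES))

# Narrower embodiment: m and l are each 5, and n is 2, 3, or 4.
narrowed = [(m, l, n) for (m, l, n) in all_combos
            if m == 5 and l == 5 and n in (2, 3, 4)]

print(len(all_combos), narrowed)
```

The count covers index combinations only; each combination still spans many compounds once the alkyl groups are chosen.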
- the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-g): its N-oxide, or a salt or isomer thereof; wherein R a ⁇ is a C 2-6 alkyl; R’ is a C2-5 alkyl; and R 4 is selected from the group consisting of -(CH 2 ) n OH wherein n is selected from the group consisting wherein denotes a point of attachment, R 10 is NH(C 1-6 alkyl), and n2 is selected from the group consisting of 1, 2, and 3.
- the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-h): its N-oxide, or a salt or isomer thereof; wherein R a ⁇ and R b ⁇ are each independently a C2-6 alkyl; each R’ independently is a C 2-5 alkyl; and R 4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting wherein denotes a point of attachment, R 10 is NH(C1-6 alkyl), and n2 is selected from the group consisting of 1, 2, and 3.
- R 4 is , wherein R 10 is NH(CH 3 ) and n2 is 2. [0275] In some embodiments of the compound of Formula (AII-g) or (AII-h), R 4 is -(CH2)2OH.
- the ionizable amino lipids may be one or more of compounds of Formula (AIII): or their N-oxides, or salts or isomers thereof, wherein: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of hydrogen, a C 3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a
- another subset of compounds of Formula (AIII) includes those in which: R 1 is selected from the group consisting of C 5-30 alkyl, C 5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of H, C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of a C 3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heteroaryl having one or more heteroatoms selected from N, O, and S,
- another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of H, C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of a C 3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heterocycle having one or more heteroatoms selected from N, O, and S, -
- another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of H, C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of a C 3-6 carbocycle, -(CH 2 ) n Q, -(CH 2 ) n CHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heteroaryl having one or more heteroatoms selected from N,
- another subset of compounds of Formula (AIII) includes those in which R 1 is selected from the group consisting of C 5-30 alkyl, C 5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C2-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is -(CH 2 ) n Q or -(CH 2 ) n CHQR, where Q is -N(R) 2 , and n is selected from 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H
- another subset of compounds of Formula (AIII) includes those in which R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R 2 and R 3 are independently selected from the group consisting of C 1-14 alkyl, C 2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R 4 is selected from the group consisting of -(CH 2 ) n Q, -(CH 2 ) n CHQR, -CHQR, and -CQ(R) 2 , where Q is -N(R) 2 , and n is selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R 6 is
- a subset of compounds of Formula (AIII) includes those of Formula (AIII-B): or its N-oxide, or a salt or isomer thereof in which all variables are as defined herein.
- m is selected from 5, 6, 7, 8, and 9;
- M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an
- m is 5, 7, or 9.
- Q is OH, -NHC(S)N(R)2, or -NHC(O)N(R)2.
- Q is -N(R)C(O)R, or -N(R)S(O)2R.
- the compounds of Formula (AIII) are of Formula (AIII-D), or their N-oxides, or salts or isomers thereof, wherein R 4 is as described herein.
- the compounds of Formula (AIII) are of Formula (AIII-E), or their N-oxides, or salts or isomers thereof, wherein R4 is as described herein.
- the compounds of Formula (AIII) are of Formula (AIII-F) or (AIII-G): or their N-oxides, or salts or isomers thereof, wherein R 4 is as described herein.
- the compounds of Formula (AIII) are of Formula (AIII-H): their N-oxides, or salts or isomers thereof, wherein M is -C(O)O- or –OC(O)-, M” is C 1-6 alkyl or C 2-6 alkenyl, R 2 and R 3 are independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl, and n is selected from 2, 3, and 4.
- the compounds of Formula (AIII) are of Formula (AIII-I): or their N-oxides, or salts or isomers thereof, wherein n is 2, 3, or 4; and m, R’, R”, and R 2 through R 6 are as described herein.
- each of R 2 and R 3 may be independently selected from the group consisting of C 5-14 alkyl and C 5-14 alkenyl.
- an ionizable amino lipid comprises a compound having structure: (Compound 1).
- an ionizable amino lipid comprises a compound having structure: (Compound 2).
- the compounds of Formula (AIII) are of Formula (AIII-J), or their N-oxides, or salts or isomers thereof, wherein l is selected from 1, 2, 3, 4, and 5; m is selected from 5, 6, 7, 8, and 9; M 1 is a bond or M’; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R 2 and R 3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl.
- M is C1-6 alkyl (e.g., C1-4 alkyl) or C2-6 alkenyl (e.g. C2-4 alkenyl).
- R2 and R3 are independently selected from the group consisting of C 5-14 alkyl and C 5-14 alkenyl.
- the ionizable amino lipids are one or more of the compounds described in U.S. Application Nos.
- the central amine moiety of a lipid according to Formula (AIII), (AIII-A), (AIII-B), (AIII-C), (AIII-D), (AIII-E), (AIII-F), (AIII-G), (AIII-H), (AIII-I), or (AIII-J) may be protonated at a physiological pH.
- a lipid may have a positive or partial positive charge at physiological pH.
- Such amino lipids may be referred to as cationic lipids, ionizable lipids, cationic amino lipids, or ionizable amino lipids.
- Amino lipids may also be zwitterionic, i.e., neutral molecules having both a positive and a negative charge.
- the ionizable amino lipids may be one or more of compounds of formula (AIV), or salts or isomers thereof, wherein t is 1 or 2; A1 and A2 are each independently selected from CH or N; Z is CH 2 or absent wherein when Z is CH 2 , the dashed lines (1) and (2) each represent a single bond; and when Z is absent, the dashed lines (1) and (2) are both absent; R1, R2, R3, R4, and R5 are independently selected from the group consisting of C5-20 alkyl, C 5-20 alkenyl, -R”MR’, -R*YR”, -YR”, and -R*OR”; R X1 and R X2 are each independently H or C 1-3 alkyl; each M is independently selected from the group consisting of -C(O)O-, -OC(O)-, -OC(O)O-, -C(O)N(R’)-, -N(R
- the compound is of any of formulae (AIVa)-(AIVh): (AIVe), (AIVf), (AIVg), or (AIVh).
- the ionizable amino lipid is: or a salt thereof.
- the central amine moiety of a lipid according to Formula (AIV), (AIVa), (AIVb), (AIVc), (AIVd), (AIVe), (AIVf), (AIVg), or (AIVh) may be protonated at a physiological pH.
- a lipid may have a positive or partial positive charge at physiological pH.
- the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof, wherein: each R 1a is independently hydrogen, R 1c , or R 1d ; each R 1b is independently R 1c or R 1d ; each R 1c is independently –[CH 2 ] 2 C(O)X 1 R 3 ; each R 1d is independently -C(O)R 4 ; each R 2 is independently -[C(R 2a )2]cR 2b ; each R 2a is independently hydrogen or C 1 -C 6 alkyl; each R 3 and R 4 is independently C6-C30 aliphatic; each I.
- each B is independently hydrogen or an ionizable nitrogen-containing group
- each X 1 is independently a covalent bond or O
- each a is independently an integer of 1-10
- each b is independently an integer of 1-10
- each c is independently an integer of 1-10.
- the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof, wherein R 1 and R 2 are the same or different, each a linear or branched alkyl with 1-9 carbons, or an alkenyl or alkynyl with 2 to 11 carbon atoms, L1 and L2 are the same or different, each a linear alkyl having 5 to 18 carbon atoms, or form a heterocycle with N, X 1 is a bond, or is -CO-O- whereby L2-CO-O-R 2 is formed, X 2 is S or O, L3 is a bond or a lower alkyl, or form a heterocycle with N, R 3 is a lower alkyl, and R 4 and R 5 are the same or different, each a lower alkyl.
- the lipid nanoparticle comprises an ionizable lipid having the structure: or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: (A4), or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: (A6), or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: (A7), or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: (A10), or a pharmaceutically acceptable salt thereof.
- the lipid nanoparticle comprises a lipid having the structure: or a pharmaceutically acceptable salt thereof.
- Non-cationic lipids [0324] In certain embodiments, the lipid nanoparticles described herein comprise one or more non-cationic lipids. Non-cationic lipids may be phospholipids. [0325] In some embodiments, the lipid nanoparticle comprises 5-25 mol% non-cationic lipid.
- the lipid nanoparticle may comprise 5-20 mol%, 5-15 mol%, 5-10 mol%, 10-25 mol%, 10-20 mol%, 15-25 mol%, 15-20 mol%, or 20-25 mol% non-cationic lipid.
- the lipid nanoparticle comprises 5 mol%, 10 mol%, 15 mol%, 20 mol%, or 25 mol% non-cationic lipid.
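The mol% figures above are each component's share of the total lipid on a molar basis. As a minimal sketch, the conversion from molar amounts to mol% and a check against the 5-25 mol% window for the non-cationic lipid can be written as follows; the component names and molar amounts are hypothetical values chosen for illustration, not a formulation from the source.

```python
def mol_percent(moles):
    """Convert a mapping of component -> molar amount into component -> mol%."""
    total = sum(moles.values())
    if total <= 0:
        raise ValueError("total molar amount must be positive")
    return {name: 100.0 * amount / total for name, amount in moles.items()}


def in_range(value, low, high):
    """Inclusive range check, matching ranges written as 'low-high mol%'."""
    return low <= value <= high


# Hypothetical molar amounts in arbitrary units (illustrative only).
example = {
    "ionizable_amino_lipid": 50.0,
    "non_cationic_lipid": 10.0,
    "structural_lipid": 38.5,
    "peg_lipid": 1.5,
}

percent = mol_percent(example)
# 5-25 mol% is the non-cationic lipid window recited in the text.
non_cationic_ok = in_range(percent["non_cationic_lipid"], 5.0, 25.0)
print(percent["non_cationic_lipid"], non_cationic_ok)
```

Treating the bounds as inclusive is an assumption of this sketch; the text does not say how endpoint values are handled.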
- a non-cationic lipid comprises 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine
- the lipid nanoparticle comprises 5 – 15 mol%, 5 – 10 mol%, or 10 – 15 mol% DSPC.
- the lipid nanoparticle may comprise 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 mol% DSPC.
- the lipid composition of the lipid nanoparticle composition disclosed herein can comprise one or more phospholipids, for example, one or more saturated or (poly)unsaturated phospholipids or a combination thereof.
- phospholipids comprise a phospholipid moiety and one or more fatty acid moieties.
- a phospholipid moiety can be selected, for example, from the non-limiting group consisting of phosphatidyl choline, phosphatidyl ethanolamine, phosphatidyl glycerol, phosphatidyl serine, phosphatidic acid, 2-lysophosphatidyl choline, and a sphingomyelin.
- a fatty acid moiety can be selected, for example, from the non-limiting group consisting of lauric acid, myristic acid, myristoleic acid, palmitic acid, palmitoleic acid, stearic acid, oleic acid, linoleic acid, alpha-linolenic acid, erucic acid, phytanoic acid, arachidic acid, arachidonic acid, eicosapentaenoic acid, behenic acid, docosapentaenoic acid, and docosahexaenoic acid.
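The two non-limiting groups above (phospholipid moieties and fatty acid moieties) can be paired combinatorially. The sketch below enumerates single head-group/fatty-acid pairings only; it assumes one fatty acid per pairing, whereas the text allows one or more fatty acid moieties, and it makes no claim that every pairing is chemically meaningful or preferred.

```python
from itertools import product

# Phospholipid moieties listed in the text (non-limiting group).
HEAD_GROUPS = [
    "phosphatidyl choline", "phosphatidyl ethanolamine", "phosphatidyl glycerol",
    "phosphatidyl serine", "phosphatidic acid", "2-lysophosphatidyl choline",
    "sphingomyelin",
]

# Fatty acid moieties listed in the text (non-limiting group).
FATTY_ACIDS = [
    "lauric acid", "myristic acid", "myristoleic acid", "palmitic acid",
    "palmitoleic acid", "stearic acid", "oleic acid", "linoleic acid",
    "alpha-linolenic acid", "erucic acid", "phytanoic acid", "arachidic acid",
    "arachidonic acid", "eicosapentaenoic acid", "behenic acid",
    "docosapentaenoic acid", "docosahexaenoic acid",
]

# Every (head group, fatty acid) pairing drawn from the two lists.
pairs = list(product(HEAD_GROUPS, FATTY_ACIDS))
print(len(pairs))
```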
- Particular phospholipids can facilitate fusion to a membrane.
- a cationic phospholipid can interact with one or more negatively charged phospholipids of a membrane (e.g., a cellular or intracellular membrane). Fusion of a phospholipid to a membrane can allow one or more elements (e.g., a therapeutic agent) of a lipid-containing composition (e.g., LNPs) to pass through the membrane permitting, e.g., delivery of the one or more elements to a target tissue.
- Non-natural phospholipid species including natural species with modifications and substitutions including branching, oxidation, cyclization, and alkynes are also contemplated.
- a phospholipid can be functionalized with or cross-linked to one or more alkynes (e.g., an alkenyl group in which one or more double bonds is replaced with a triple bond).
- an alkyne group can undergo a copper-catalyzed cycloaddition upon exposure to an azide.
- Such reactions can be useful in functionalizing a lipid bilayer of a nanoparticle composition to facilitate membrane permeation or cellular recognition or in conjugating a nanoparticle composition to a useful component such as a targeting or imaging moiety (e.g., a dye).
- Phospholipids include, but are not limited to, glycerophospholipids such as phosphatidylcholines, phosphatidylethanolamines, phosphatidylserines, phosphatidylinositols, phosphatidylglycerols, and phosphatidic acids. Phospholipids also include phosphosphingolipids, such as sphingomyelin.
- a phospholipid comprises 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine (DSPE), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC),
- a phospholipid is an analog or variant of DSPC.
- a phospholipid is a compound of Formula (HI): or a salt thereof, wherein: each R 1 is independently optionally substituted alkyl; or optionally two R 1 are joined together with the intervening atoms to form optionally substituted monocyclic carbocyclyl or optionally substituted monocyclic heterocyclyl; or optionally three R 1 are joined together with the intervening atoms to form optionally substituted bicyclic carbocyclyl or optionally substituted bicyclic heterocyclyl; n is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; m is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; A is of the formula: each instance of L 2 is independently a bond or optionally substituted C 1-6 alkylene, wherein one methylene unit of the optionally substituted C1-6 alkylene is optionally replaced with O, N(R N ), S, C(O), C(O)
- the compound is not of the formula: , wherein each instance of R 2 is independently unsubstituted alkyl, unsubstituted alkenyl, or unsubstituted alkynyl.
- the phospholipids may be one or more of the phospholipids described in PCT Application No. PCT/US2018/037922.
- the lipid nanoparticle comprises a molar ratio of 5-25% non-cationic lipid relative to the other lipid components.
- the lipid nanoparticle may comprise a molar ratio of 5-30%, 5-15%, 5-10%, 10-25%, 10-20%, 15-25%, 15-20%, 20-25%, or 25-30% non-cationic lipid.
- the lipid nanoparticle comprises a molar ratio of 5%, 10%, 15%, 20%, 25%, or 30% non-cationic lipid.
- the lipid nanoparticle comprises a molar ratio of 5-25% phospholipid relative to the other lipid components.
- the lipid nanoparticle may comprise a molar ratio of 5-30%, 5-15%, 5-10%, 10-25%, 10-20%, 15-25%, 15-20%, 20-25%, or 25-30% phospholipid.
- the lipid nanoparticle comprises a molar ratio of 5%, 10%, 15%, 20%, 25%, or 30% phospholipid.
- Structural lipids [0340]
- the lipid composition of a pharmaceutical composition disclosed herein can comprise one or more structural lipids.
- the term “structural lipid” includes sterols and also lipids containing sterol moieties.
- Structural lipids can be selected from the group including but not limited to, cholesterol, fecosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, tomatine, ursolic acid, alpha-tocopherol, hopanoids, phytosterols, steroids, and mixtures thereof.
- the structural lipid is a sterol.
- sterols are a subgroup of steroids consisting of steroid alcohols.
- the structural lipid is a steroid.
- the structural lipid is cholesterol. In certain embodiments, the structural lipid is an analog of cholesterol. In certain embodiments, the structural lipid is alpha-tocopherol. [0342] In some embodiments, the structural lipids may be one or more of the structural lipids described in U.S. Application No.16/493,814. [0343] In some embodiments, the lipid nanoparticle comprises a molar ratio of 25-55% structural lipid relative to the other lipid components.
- the lipid nanoparticle may comprise a molar ratio of 10-55%, 25-50%, 25-45%, 25-40%, 25-35%, 25-30%, 30-55%, 30-50%, 30-45%, 30-40%, 30-35%, 35-55%, 35-50%, 35-45%, 35-40%, 40-55%, 40-50%, 40-45%, 45-55%, 45-50%, or 50-55% structural lipid.
- the lipid nanoparticle comprises a molar ratio of 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or 55% structural lipid.
- the lipid nanoparticle comprises 30-45 mol% sterol, optionally 35-40 mol%, for example, 30-31 mol%, 31-32 mol%, 32-33 mol%, 33-34 mol%, 34-35 mol%, 35-36 mol%, 36-37 mol%, 37-38 mol%, 38-39 mol%, or 39-40 mol%. In some embodiments, the lipid nanoparticle comprises 25-55 mol% sterol.
- the lipid nanoparticle may comprise 25-50 mol%, 25-45 mol%, 25-40 mol%, 25-35 mol%, 25-30 mol%, 30-55 mol%, 30-50 mol%, 30-45 mol%, 30-40 mol%, 30-35 mol%, 35-55 mol%, 35-50 mol%, 35-45 mol%, 35-40 mol%, 40-55 mol%, 40-50 mol%, 40-45 mol%, 45-55 mol%, 45-50 mol%, or 50-55 mol% sterol.
- the lipid nanoparticle comprises 25 mol%, 30 mol%, 35 mol%, 40 mol%, 45 mol%, 50 mol%, or 55 mol% sterol.
- the lipid nanoparticle comprises 35 – 40 mol% cholesterol.
- the lipid nanoparticle may comprise 35, 35.5, 36, 36.5, 37, 37.5, 38, 38.5, 39, 39.5, or 40 mol% cholesterol.
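The sterol ranges above are nested: 35-40 mol% sits inside 30-45 mol%, which sits inside 25-55 mol%. A minimal sketch of finding the narrowest recited window that contains a given sterol content is shown below; the function name and the inclusive treatment of the bounds are assumptions made for illustration.

```python
# Nested sterol windows recited in the text, ordered narrowest first.
STEROL_WINDOWS = [
    ("35-40 mol%", 35.0, 40.0),
    ("30-45 mol%", 30.0, 45.0),
    ("25-55 mol%", 25.0, 55.0),
]


def narrowest_sterol_window(sterol_mol_percent):
    """Return the label of the narrowest window containing the value, or None."""
    for label, low, high in STEROL_WINDOWS:
        if low <= sterol_mol_percent <= high:
            return label
    return None


for value in (38.5, 42.0, 50.0, 20.0):
    print(value, narrowest_sterol_window(value))
```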
- Polyethylene glycol (PEG)-Lipids [0346]
- the lipid composition of a pharmaceutical composition disclosed herein can comprise one or more polyethylene glycol (PEG) lipids.
- PEG-lipid or “PEG-modified lipid” refers to polyethylene glycol (PEG)-modified lipids.
- PEG-lipids include PEG-modified phosphatidylethanolamine and phosphatidic acid, PEG-ceramide conjugates (e.g., PEG-CerC14 or PEG-CerC20), PEG-modified dialkylamines, and PEG-modified 1,2-diacyloxypropan-3-amines.
- Such lipids are also referred to as PEGylated lipids.
- a PEG lipid can be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or a PEG-DSPE lipid.
- the PEG-lipid includes, but is not limited to, 1,2-dimyristoyl-sn-glycerol methoxypolyethylene glycol (PEG-DMG), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[amino(polyethylene glycol)] (PEG-DSPE), PEG-disteryl glycerol (PEG-DSG), PEG-dipalmetoleyl, PEG-dioleyl, PEG-distearyl, PEG-diacylglycamide (PEG-DAG), PEG-dipalmitoyl phosphatidylethanolamine (PEG-DPPE), or PEG-1,2-dimyristyloxlpropy
- the PEG-lipid is selected from the group consisting of a PEG-modified phosphatidylethanolamine, a PEG-modified phosphatidic acid, a PEG-modified ceramide, a PEG-modified dialkylamine, a PEG-modified diacylglycerol, a PEG-modified dialkylglycerol, and mixtures thereof.
- the PEG-modified lipid is PEG-DMG, PEG-c-DOMG (also referred to as PEG-DOMG), PEG-DSG, and/or PEG-DPG.
- the lipid moiety of the PEG-lipids includes those having lengths of from about C14 to about C22, preferably from about C14 to about C16.
- a PEG moiety, for example an mPEG-NH2, has a size of about 1000, 2000, 5000, 10,000, 15,000 or 20,000 daltons.
- the PEG-lipid is PEG2k-DMG.
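A PEG size in daltons can be related to an approximate number of ethylene oxide repeat units. The sketch below is a rough estimate that assumes a repeat-unit mass of about 44.05 Da for -CH2CH2O- and ignores end groups and polydispersity; neither the constant nor the function name comes from the source.

```python
# Approximate mass of one -CH2CH2O- repeat unit (assumption of this sketch).
ETHYLENE_OXIDE_UNIT_DA = 44.05


def approx_repeat_units(peg_mass_da):
    """Estimate the number of ethylene oxide units, ignoring end groups."""
    return round(peg_mass_da / ETHYLENE_OXIDE_UNIT_DA)


# PEG sizes listed in the text, in daltons.
sizes = [1000, 2000, 5000, 10000, 15000, 20000]
units = {size: approx_repeat_units(size) for size in sizes}
print(units)
```

Under these assumptions a 2000 Da PEG, as in PEG2k-DMG, corresponds to roughly 45 repeat units.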
- the lipid nanoparticles described herein can comprise a PEG lipid which is a non-diffusible PEG.
- Non-limiting examples of non-diffusible PEGs include PEG-DSG and PEG-DSPE.
- PEG-lipids are known in the art, such as those described in U.S. Patent No. 8158601 and International Publ. No. WO 2015/130584 A2, which are incorporated herein by reference in their entirety.
- some of the other lipid components (e.g., PEG lipids) of various formulae described herein may be synthesized as described in International Patent Application No. PCT/US2016/000129, filed December 10, 2016, entitled “Compositions and Methods for Delivery of Therapeutic Agents,” which is incorporated by reference in its entirety.
- the lipid component of a lipid nanoparticle composition may include one or more molecules comprising polyethylene glycol, such as PEG or PEG-modified lipids.
- a PEG lipid is a lipid modified with polyethylene glycol.
- a PEG lipid may be selected from the non-limiting group including PEG-modified phosphatidylethanolamines, PEG-modified phosphatidic acids, PEG-modified ceramides, PEG-modified dialkylamines, PEG-modified diacylglycerols, PEG-modified dialkylglycerols, and mixtures thereof.
- a PEG lipid may be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or a PEG-DSPE lipid.
- the PEG-modified lipids are a modified form of PEG DMG.
- PEG-DMG has the following structure: [0356]
- PEG lipids can be PEGylated lipids described in International Publication No. WO2012099755, the contents of which are herein incorporated by reference in their entirety. Any of the exemplary PEG lipids described herein may be modified to comprise a hydroxyl group on the PEG chain.
- the PEG lipid is a PEG-OH lipid.
- a “PEG-OH lipid” (also referred to herein as “hydroxy-PEGylated lipid”) is a PEGylated lipid having one or more hydroxyl (–OH) groups on the lipid.
- the PEG-OH lipid includes one or more hydroxyl groups on the PEG chain.
- a PEG-OH or hydroxy-PEGylated lipid comprises an –OH group at the terminus of the PEG chain.
- a PEG lipid is a compound of Formula (PI): or salts thereof, wherein: R 3 is –OR O ; R O is hydrogen, optionally substituted alkyl, or an oxygen protecting group; r is an integer between 1 and 100, inclusive; L 1 is optionally substituted C1-10 alkylene, wherein at least one methylene of the optionally substituted C1-10 alkylene is independently replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, O, N(R N ), S, C(O), C(O)N(R N ), NR N C(O), C(O)O, - OC(O), OC(O)O, OC(O)N(R N ), NR N C(O)O, or NR N C(O)N(R N ); D is a moiety obtained by click chemistry or a moiety cleav
- the compound of Formula (PI) is a PEG-OH lipid (i.e., R 3 is –OR O , and R O is hydrogen).
- the compound of Formula (PI) is of Formula (PI-OH): (PI-OH), or a salt thereof.
- Formula (PII) [0359]
- a PEG lipid is a PEGylated fatty acid.
- a PEG lipid is a compound of Formula (PII).
- the compound of Formula (PII) is of Formula (PII-OH): (PII-OH), or a salt thereof.
- r is 40-50.
- in yet other embodiments, the compound of Formula (PII) is: or a salt thereof.
- the compound of Formula (PII) is
- the lipid composition of the pharmaceutical compositions disclosed herein does not comprise a PEG-lipid.
- the PEG-lipids may be one or more of the PEG lipids described in U.S. Application No. US15/674,872.
- the lipid nanoparticle comprises a molar ratio of 0.5-15% PEG lipid relative to the other lipid components.
- the lipid nanoparticle may comprise a molar ratio of 0.5-10%, 0.5-5%, 1-15%, 1-10%, 1-5%, 2-15%, 2-10%, 2-5%, 5-15%, 5-10%, or 10-15% PEG lipid.
- the lipid nanoparticle comprises a molar ratio of 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, or 15% PEG-lipid.
- the lipid nanoparticle comprises 1-5% PEG-modified lipid, optionally 1-3 mol%, for example 1.5 to 2.5 mol%, 1-2 mol%, 2-3 mol%, 3-4 mol%, or 4-5 mol%.
- the lipid nanoparticle comprises 0.5-15 mol% PEG-modified lipid.
- the lipid nanoparticle may comprise 0.5-10 mol%, 0.5-5 mol%, 1-15 mol%, 1-10 mol%, 1-5 mol%, 2-15 mol%, 2-10 mol%, 2-5 mol%, 5-15 mol%, 5-10 mol%, or 10-15 mol%.
- the lipid nanoparticle comprises 0.5 mol%, 1 mol%, 2 mol%, 3 mol%, 4 mol%, 5 mol%, 6 mol%, 7 mol%, 8 mol%, 9 mol%, 10 mol%, 11 mol%, 12 mol%, 13 mol%, 14 mol%, or 15 mol% PEG-modified lipid.
- Some embodiments comprise adding PEG to a composition comprising an LNP encapsulating a nucleic acid (e.g., which already includes PEG in the amounts listed above).
- the lipid nanoparticle comprises 20-60 mol% ionizable amino lipid, 5-25 mol% non-cationic lipid, 25-55 mol% sterol, and 0.5-15 mol% PEG-modified lipid.
- a LNP comprises an ionizable amino lipid of Compound 1, wherein the non-cationic lipid is DSPC, the structural lipid is cholesterol, and the PEG lipid is DMG-PEG.
- a LNP comprises an ionizable amino lipid of Compound 2, wherein the non-cationic lipid is DSPC, the structural lipid is cholesterol, and the PEG lipid is DMG-PEG.
- a LNP comprises an ionizable amino lipid of any of Formula (AIII), (AIV), or (AV), a phospholipid comprising DSPC, a structural lipid, and a PEG lipid comprising PEG-DMG.
- a LNP comprises an ionizable amino lipid of any of Formula (AIII), (AIV), or (AV), a phospholipid comprising DSPC, a structural lipid, and a PEG lipid comprising a compound having Formula (PII).
- a LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid comprising a compound having Formula (HI), a structural lipid, and the PEG lipid comprising a compound having Formula (PI) or (PII).
- a LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid having Formula (HI), a structural lipid, and a PEG lipid comprising a compound having Formula (PII).
- the lipid nanoparticle comprises 49 mol% ionizable amino lipid, 10 mol% DSPC, 38.5 mol% cholesterol, and 2.5 mol% DMG-PEG.
- the lipid nanoparticle comprises 49 mol% ionizable amino lipid, 11 mol% DSPC, 38.5 mol% cholesterol, and 1.5 mol% DMG-PEG.
- the lipid nanoparticle comprises 48 mol% ionizable amino lipid, 11 mol% DSPC, 38.5 mol% cholesterol, and 2.5 mol% DMG-PEG.
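By way of non-limiting illustration, the three exemplary molar compositions above can be checked against the 20-60 / 5-25 / 25-55 / 0.5-15 mol% ranges recited earlier. The following is a minimal sketch; the function name `check_composition` and the dictionary keys are hypothetical labels chosen for this example and do not appear in the disclosure.

```python
from math import isclose

# Mol% ranges recited in the text for the four lipid classes.
RANGES = {
    "ionizable_amino_lipid": (20.0, 60.0),
    "non_cationic_lipid": (5.0, 25.0),   # e.g., DSPC
    "sterol": (25.0, 55.0),              # e.g., cholesterol
    "peg_lipid": (0.5, 15.0),            # e.g., DMG-PEG
}


def check_composition(mol_percent):
    """Return a list of problems; an empty list means the composition passes."""
    problems = []
    total = sum(mol_percent.values())
    if not isclose(total, 100.0, abs_tol=1e-6):
        problems.append(f"components sum to {total} mol%, not 100")
    for name, (low, high) in RANGES.items():
        value = mol_percent.get(name)
        if value is None:
            problems.append(f"missing component: {name}")
        elif not (low <= value <= high):
            problems.append(f"{name} = {value} mol% is outside {low}-{high}")
    return problems


# The three exemplary compositions recited above.
EXAMPLES = [
    {"ionizable_amino_lipid": 49.0, "non_cationic_lipid": 10.0, "sterol": 38.5, "peg_lipid": 2.5},
    {"ionizable_amino_lipid": 49.0, "non_cationic_lipid": 11.0, "sterol": 38.5, "peg_lipid": 1.5},
    {"ionizable_amino_lipid": 48.0, "non_cationic_lipid": 11.0, "sterol": 38.5, "peg_lipid": 2.5},
]

if __name__ == "__main__":
    for composition in EXAMPLES:
        print(composition, check_composition(composition) or "OK")
```

Each of the three compositions sums to 100 mol% and falls within the recited ranges, so the check returns an empty list for all of them.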
- a LNP comprises an N:P ratio of from about 2:1 to about 30:1.
- a LNP comprises an N:P ratio of about 6:1.
- a LNP comprises an N:P ratio of about 3:1, 4:1, or 5:1.
- a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of from about 10:1 to about 100:1.
- a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of about 20:1.
- a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of about 10:1.
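The N:P ratio and the lipid:RNA weight ratio describe the same formulation in different units and can be interconverted once molecular weights are fixed. The following sketch illustrates the arithmetic only. It assumes one ionizable nitrogen per lipid molecule and an average nucleotide mass of about 330 g/mol (one phosphate per nucleotide); both values, and the lipid molecular weight of 710 g/mol used in the usage lines, are assumptions made for illustration and are not taken from the disclosure.

```python
AVG_NT_MW = 330.0  # g/mol per nucleotide (one phosphate each); assumed approximation


def np_ratio(lipid_to_rna_wt, lipid_mw, nitrogens_per_lipid=1, nt_mw=AVG_NT_MW):
    """Molar ratio of ionizable nitrogens (N) to RNA phosphates (P)."""
    if lipid_to_rna_wt <= 0 or lipid_mw <= 0 or nt_mw <= 0:
        raise ValueError("weights and molecular weights must be positive")
    mol_n = lipid_to_rna_wt / lipid_mw * nitrogens_per_lipid  # mol N per gram RNA
    mol_p = 1.0 / nt_mw                                       # mol P per gram RNA
    return mol_n / mol_p


def wt_ratio_for_np(target_np, lipid_mw, nitrogens_per_lipid=1, nt_mw=AVG_NT_MW):
    """Lipid:RNA weight ratio needed to reach a target N:P ratio."""
    return target_np * lipid_mw / (nitrogens_per_lipid * nt_mw)


if __name__ == "__main__":
    hypothetical_lipid_mw = 710.0  # g/mol; illustrative value only
    print(np_ratio(20.0, hypothetical_lipid_mw))
    print(wt_ratio_for_np(6.0, hypothetical_lipid_mw))
```

The two functions are inverses of each other, so a weight ratio computed for a target N:P ratio returns that same N:P ratio when passed back through `np_ratio`.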
- Some embodiments comprise a composition having one or more LNPs having a diameter of about 150 nm or less, such as about 140 nm, 130 nm, 120 nm, 110 nm, 100 nm, 90 nm, 80 nm, 70 nm, 60 nm, 50 nm, 40 nm, 30 nm, or 20 nm or less.
- Some embodiments comprise a composition having a mean LNP diameter of about 150 nm or less, such as about 140 nm, 130 nm, 120 nm, 110 nm, 100 nm, 90 nm, 80 nm, 70 nm, 60 nm, 50 nm, 40 nm, 30 nm, or 20 nm or less.
- the composition has a mean LNP diameter from about 30 nm to about 150 nm, or a mean diameter from about 60 nm to about 120 nm.
- a LNP may comprise one or more types of lipids, including but not limited to amino lipids (e.g., ionizable amino lipids), neutral lipids, non-cationic lipids, charged lipids, PEG-modified lipids, phospholipids, structural lipids and sterols.
- a LNP may further comprise one or more cargo molecules, including but not limited to nucleic acids (e.g., mRNA, plasmid DNA, DNA or RNA oligonucleotides, siRNA, shRNA, snRNA, snoRNA, lncRNA, etc.), small molecules, proteins and peptides.
- the composition comprises a liposome.
- a liposome is a lipid particle comprising lipids arranged into one or more concentric lipid bilayers around a central region. The central region of a liposome may comprise an aqueous solution, suspension, or other aqueous composition.
- a lipid nanoparticle may comprise two or more components (e.g., amino lipid and nucleic acid, PEG-lipid, phospholipid, structural lipid).
- a lipid nanoparticle may comprise an amino lipid and a nucleic acid.
- compositions comprising the lipid nanoparticles, such as those described herein, may be used for a wide variety of applications, including the stealth delivery of therapeutic payloads with minimal adverse innate immune response.
- Effective in vivo delivery of nucleic acids represents a continuing medical challenge. Exogenous nucleic acids (i.e., originating from outside of a cell or organism) are readily degraded in the body, e.g., by the immune system. Accordingly, effective delivery of nucleic acids to cells often requires the use of a particulate carrier (e.g., lipid nanoparticles).
- the particulate carrier should be formulated to have minimal particle aggregation, be relatively stable prior to intracellular delivery, effectively deliver nucleic acids intracellularly, and elicit no or minimal immune response.
- many conventional particulate carriers have relied on the presence and/or concentration of certain components (e.g., PEG-lipid).
- certain components may decrease the stability of encapsulated nucleic acids (e.g., mRNA molecules).
- the reduced stability may limit the broad applicability of the particulate carriers.
- the lipid nanoparticles comprise one or more of ionizable molecules, polynucleotides, and optional components, such as structural lipids, sterols, neutral lipids, phospholipids and a molecule capable of reducing particle aggregation (e.g., polyethylene glycol (PEG), PEG-modified lipid), such as those described above.
- a LNP described herein may include one or more ionizable molecules (e.g., amino lipids or ionizable lipids). The ionizable molecule may comprise a charged group and may have a certain pKa.
- the pKa of the ionizable molecule may be greater than or equal to about 6, greater than or equal to about 6.2, greater than or equal to about 6.5, greater than or equal to about 6.8, greater than or equal to about 7, greater than or equal to about 7.2, greater than or equal to about 7.5, greater than or equal to about 7.8, or greater than or equal to about 8.
- the pKa of the ionizable molecule may be less than or equal to about 10, less than or equal to about 9.8, less than or equal to about 9.5, less than or equal to about 9.2, less than or equal to about 9.0, less than or equal to about 8.8, or less than or equal to about 8.5.
- an ionizable molecule comprises one or more charged groups.
- an ionizable molecule may be positively charged or negatively charged.
- an ionizable molecule may be positively charged.
- an ionizable molecule may comprise an amine group.
- the term “ionizable molecule” has its ordinary meaning in the art and may refer to a molecule or matrix comprising one or more charged moieties.
- a “charged moiety” is a chemical moiety that carries a formal electronic charge, e.g., monovalent (+1, or -1), divalent (+2, or -2), trivalent (+3, or -3), etc.
- the charged moiety may be anionic (i.e., negatively charged) or cationic (i.e., positively charged).
- Examples of positively-charged moieties include amine groups (e.g., primary, secondary, and/or tertiary amines), ammonium groups, pyridinium groups, guanidine groups, and imidazolium groups.
- the charged moieties comprise amine groups.
- Examples of negatively-charged groups or precursors thereof include carboxylate groups, sulfonate groups, sulfate groups, phosphonate groups, phosphate groups, hydroxyl groups, and the like.
- the charge of the charged moiety may vary, in some cases, with the environmental conditions, for example, changes in pH may alter the charge of the moiety, and/or cause the moiety to become charged or uncharged.
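The pH dependence of the charge on an ionizable amine can be estimated with the Henderson-Hasselbalch relationship. The following sketch is offered only as an illustration of that standard relationship, treating the amine as a single, independent titratable group; this is a simplifying assumption, and the apparent pKa of a lipid within a particle may differ from that of the isolated molecule. The function name and the pKa and pH values used in the usage lines are hypothetical.

```python
def fraction_protonated(pka, ph):
    """Fraction of a monoprotic amine carrying a +1 charge at the given pH.

    Henderson-Hasselbalch for a base B with conjugate acid BH+:
        [BH+] / ([B] + [BH+]) = 1 / (1 + 10**(pH - pKa))
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    hypothetical_pka = 6.5  # within the pKa ranges recited above; illustrative only
    for ph in (5.0, 6.5, 7.4):
        print(f"pH {ph}: {fraction_protonated(hypothetical_pka, ph):.3f} protonated")
```

At a pH equal to the pKa, half of the groups are protonated; the protonated fraction rises at lower pH and falls at higher pH.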
- an ionizable molecule may include one or more precursor moieties that can be converted to charged moieties.
- the ionizable molecule may include a neutral moiety that can be hydrolyzed to form a charged moiety, such as those described above.
- the molecule or matrix may include an amide, which can be hydrolyzed to form an amine.
- the molecular weight of an ionizable molecule is less than or equal to about 2,500 g/mol, less than or equal to about 2,000 g/mol, less than or equal to about 1,500 g/mol, less than or equal to about 1,250 g/mol, less than or equal to about 1,000 g/mol, less than or equal to about 900 g/mol, less than or equal to about 800 g/mol, less than or equal to about 700 g/mol, less than or equal to about 600 g/mol, less than or equal to about 500 g/mol, less than or equal to about 400 g/mol, less than or equal to about 300 g/mol, less than or equal to about 200 g/mol, or less than or equal to about 100 g/mol.
- the molecular weight of an ionizable molecule is greater than or equal to about 100 g/mol, greater than or equal to about 200 g/mol, greater than or equal to about 300 g/mol, greater than or equal to about 400 g/mol, greater than or equal to about 500 g/mol, greater than or equal to about 600 g/mol, greater than or equal to about 700 g/mol, greater than or equal to about 1000 g/mol, greater than or equal to about 1,250 g/mol, greater than or equal to about 1,500 g/mol, greater than or equal to about 1,750 g/mol, greater than or equal to about 2,000 g/mol, or greater than or equal to about 2,250 g/mol.
- each type of ionizable molecule may independently have a molecular weight in one or more of the ranges described above.
- the percentage (e.g., by weight, or by mole) of a single type of ionizable molecule (e.g., amino lipid or ionizable lipid) and/or of all the ionizable molecules within a particle may be greater than or equal to about 15%, greater than or equal to about 16%, greater than or equal to about 17%, greater than or equal to about 18%, greater than or equal to about 19%, greater than or equal to about 20%, greater than or equal to about 21%, greater than or equal to about 22%, greater than or equal to about 23%, greater than or equal to about 24%, greater than or equal to about 25%, greater than or equal to about 30%, greater than or equal to about 35%, greater than or equal to about 40%, greater than or equal to about 42%, greater than or equal to about 45%, greater than or equal to about 48%, greater than or equal to about 50%, greater than or equal to about 52%, greater than or equal to about 55%, greater than or equal to about 5
- the percentage (e.g., by weight, or by mole) may be less than or equal to about 70%, less than or equal to about 68%, less than or equal to about 65%, less than or equal to about 62%, less than or equal to about 60%, less than or equal to about 58%, less than or equal to about 55%, less than or equal to about 52%, less than or equal to about 50%, or less than or equal to about 48%. Combinations of the above referenced ranges are also possible (e.g., greater than or equal to 20% and less than or equal to about 60%, greater than or equal to 40% and less than or equal to about 55%, etc.).
- each type of ionizable molecule may independently have a percentage (e.g., by weight, or by mole) in one or more of the ranges described above.
- the percentage may be determined by extracting the ionizable molecule(s) from the dried particles using, e.g., organic solvents, and measuring the quantity of the agent using high pressure liquid chromatography (i.e., HPLC), liquid chromatography-mass spectrometry (LC-MS), nuclear magnetic resonance (NMR), or mass spectrometry (MS).
- HPLC may be used to quantify the amount of a component, by, e.g., comparing the area under the curve of a HPLC chromatogram to a standard curve.
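The standard-curve comparison described above can be sketched as a linear least-squares fit of peak area against known amounts, followed by interpolation of the unknown. This is a minimal illustration, assuming a linear detector response over the calibrated range; the function names and all numerical values are hypothetical and are not data from the disclosure.

```python
def fit_standard_curve(amounts, areas):
    """Least-squares line: area = slope * amount + intercept."""
    if len(amounts) != len(areas) or len(amounts) < 2:
        raise ValueError("need at least two paired standards")
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(area, slope, intercept):
    """Amount of analyte corresponding to a measured peak area."""
    return (area - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: amount injected (ug) versus integrated peak area.
    std_amounts = [1.0, 2.0, 4.0, 8.0]
    std_areas = [10.0, 20.0, 40.0, 80.0]
    slope, intercept = fit_standard_curve(std_amounts, std_areas)
    print(quantify(50.0, slope, intercept))
```

Dividing the interpolated amount of the ionizable molecule by the total mass of the extracted particles would then give the weight percentage referred to above.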
- “charge” or “charged moiety” does not refer to a “partial negative charge” or “partial positive charge” on a molecule.
- “partial negative charge” and “partial positive charge” are given their ordinary meaning in the art.
- a “partial negative charge” may result when a functional group comprises a bond that becomes polarized such that electron density is pulled toward one atom of the bond, creating a partial negative charge on the atom.
- a lipid composition may comprise one or more lipids as described herein. Such lipids may include those useful in the preparation of lipid nanoparticle formulations as described above or as known in the art.
- Stabilizing compounds
- Some embodiments of the compositions described herein are stabilized pharmaceutical compositions.
- Various non-viral delivery systems, including nanoparticle formulations present attractive opportunities to overcome many challenges associated with mRNA delivery.
- Lipid nanoparticles have drawn particular attention in recent years as various LNP formulations have shown promise in a variety of pharmaceutical applications.
- lipids have been shown to degrade nucleic acids, including mRNA, and lipid nanoparticle formulations undergo rapid loss of purity when stored as refrigerated liquids.
- the storage stability of mRNA encapsulated within LNPs is lower than that of unencapsulated mRNA.
- a class of compounds has been found to stabilize nucleic acids within a lipid carrier such as an LNP, an unexpected and unprecedented discovery which enables applications including extended refrigerated liquid shelf-life, extended in-use periods at room temperature, and extended in-use stability at physiological temperatures up to higher temperatures such as 40°C.
- the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a stabilizing compound (e.g., a compound of Formula (I), of Formula (II), or a tautomer or solvate thereof).
- the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a lipid, and a compound of Formula (I): or a tautomer or solvate thereof, wherein: is a single bond or a double bond; R 1 is H; R 2 is OCH 3 , or together with R 3 is OCH 2 O; R 3 is OCH 3 , or together with R 2 is OCH 2 O; R 4 is H; R 5 is H or OCH 3 ; R 6 is OCH 3 ; R 7 is H or OCH 3 ; R 8 is H; R 9 is H or CH 3 ; and X is a pharmaceutically acceptable anion, e.g., a halide such as chloride.
- the compound of Formula (I) has the structure of: Formula (Ia) Formula (Ib) Formula (Ic) or a tautomer or solvate thereof.
- the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a lipid, and a compound of Formula (II): (II), or a tautomer or solvate thereof, wherein: R 10 is H; R 11 is H; R 12 together with R 13 is OCH2O; R 14 is H; R 15 together with R 16 is OCH 2 O; R 17 is H; and X is a pharmaceutically acceptable anion, e.g., a halide such as chloride.
- the compound of Formula (II) has the structure of: or a tautomer or solvate thereof.
- Stabilizing compounds of Formulas (I), (Ia), (Ib), (Ic), (II), and (IIa) are described in International Application No. PCT/US2022/025967, which is incorporated by reference herein in its entirety.
- the nucleic acid formulation comprises lipid nanoparticles.
- the nucleic acid is mRNA.
- the stabilizing compound (“the compound”) has a purity of at least 70%, 80%, 90%, 95%, or 99%.
- the compound contains fewer than 100 ppm of elemental metals.
- the stabilized pharmaceutical composition (“the composition”) comprises a pharmaceutically acceptable metal chelator, e.g., EDTA (ethylenediaminetetraacetic acid) or DTPA (diethylenetriaminepentaacetic acid).
- the composition is an aqueous solution.
- the compound is present at a concentration between about 0.1 mM and about 10 mM in the aqueous solution.
- the aqueous solution has a pH of or about 5 to 8, including pH of about 5, 5.5, 6, 6.5, 7, 7.5, or 8.
- the aqueous solution does not comprise NaCl.
- the aqueous solution comprises NaCl in a concentration of or about 150 mM.
- the aqueous solution comprises a phosphate buffer, a tris buffer, an acetate buffer, a histidine buffer, or a citrate buffer.
- microbial growth in the composition is inhibited by the compound.
- the composition is characterized as having a mRNA purity level of greater than 60%, greater than 70%, greater than 80%, or greater than 90% main peak mRNA purity after at least thirty days of storage.
- the composition comprises a mRNA purity level of greater than 50% main peak mRNA purity after at least six months of storage.
- the storage is at room temperature.
- the composition comprises a lipid nanoparticle encapsulating a mRNA, and the composition comprises less than 50%, less than 60%, less than 70%, less than 80%, less than 90%, or less than 95% RNA fragments after at least thirty days of storage.
- the storage temperature is greater than room temperature. In some embodiments, the storage temperature is about 4°C.
- the compound interacts with the nucleic acid comprised within a lipid nanostructure (e.g., a lipid nanoparticle, liposome, or lipoplex), e.g., via pi-pi stacking and/or by changing backbone helicity of the nucleic acid.
- the compound intercalates with a nucleic acid. In some embodiments, the compound binds with a nucleic acid, e.g., reversible binding, and/or binding to the stranded regions of the nucleic acid. In some embodiments, the compound self-associates, binds to nucleic acid ribose contacts, and/or binds to nucleic acid base contacts. In some embodiments, the compound does not substantially bind to nucleic acid phosphate contacts. In some embodiments, the positive charge of the compound contributes to nucleic acid binding.
- the compound interacts with a nucleic acid and provides shielding from solvent, e.g., water.
- the compound shields ribose from solvent more than the compound shields the phosphate groups of the nucleic acid.
- the solvent exposure is measured by the solvent accessible surface area (SASA).
- a stabilizing compound decreases the solvent accessible area of ribose to about 5-10 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of ribose to about 6-8 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of phosphate to about 9-12 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of phosphate to about 10-11 nm².
- a nucleic acid that is conformationally stabilized by the compound exhibits thermal unfolding temperatures (measured by circular dichroism or DSC, for example) that are higher than in the absence of the compound.
- the compound confers increased stability, e.g., thermal stability, to the nucleic acid in a folded structure, e.g., relative to its unfolded or less folded or more linear form.
- the compound causes compaction of the nucleic acid upon interaction with the nucleic acid.
- the compound causes a decrease in the hydrodynamic radius of the nucleic acid molecule upon interaction with the nucleic acid.
- a stabilizing compound causes compaction or a decrease in the hydrodynamic radius of a nucleic acid molecule by 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, or more.
- a stabilizing compound causes compaction or a decrease in the hydrodynamic radius of a nucleic acid molecule when the compound is in a concentration of 1 μM, 2 μM, 3 μM, 4 μM, 5 μM, 6 μM, 7 μM, 8 μM, 9 μM, 10 μM, 15 μM, 20 μM, 25 μM, 30 μM, 35 μM, 40 μM, 45 μM, 50 μM, 60 μM, 70 μM, 80 μM, 90 μM, or 100 μM.
- Example 1 Chemical stability of CpA dinucleotide in mRNA
- The susceptibility of different dinucleotide pairs to spontaneous cleavage was analyzed by incubating a test mRNA in water for 4 hours, and analyzing the resulting mRNA cleavage fragments by Illumina 3′ end sequencing.
- fragments were sequenced, and reads were aligned to the reference sequence, with the 3′ nucleotide of each read corresponding to the first nucleotide in a dinucleotide pair that was cleaved to generate the sequenced mRNA fragment (e.g., a read ending in AAGCAC (SEQ ID NO: 1) that aligned to the sequence AAGCACAAUC (SEQ ID NO: 2) indicated that the CpA dinucleotide at positions 6 and 7 of SEQ ID NO: 2 (bolded in the original) was cleaved to generate the 3′ end of the mRNA fragment).
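The mapping from a read's 3′ end to the cleaved dinucleotide can be sketched as follows, using the example sequences recited above. This is a simplified illustration only: it places the read by exact substring search rather than with a sequence aligner, and the function name is hypothetical.

```python
def cleaved_dinucleotide(reference, read):
    """Return (index, dinucleotide) for the cleavage implied by a read.

    `index` is the 0-based position in `reference` of the read's 3' nucleotide,
    which is the first nucleotide of the cleaved dinucleotide. Returns None if
    the read is not found or ends at the last nucleotide of the reference.
    """
    start = reference.find(read)  # exact match stands in for alignment
    if start == -1:
        return None
    end = start + len(read) - 1
    if end + 1 >= len(reference):
        return None  # no downstream nucleotide, so no dinucleotide to report
    return end, reference[end:end + 2]


if __name__ == "__main__":
    reference = "AAGCACAAUC"  # SEQ ID NO: 2
    read = "AAGCAC"           # SEQ ID NO: 1
    print(cleaved_dinucleotide(reference, read))
```

For the recited example, the read ends at the second C of the reference and the next nucleotide is A, so the function reports a CA dinucleotide, consistent with the CpA cleavage described above.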
- Low CA mRNAs #2 and #3 contained increased %G/C content, relative to Low CA mRNA #1, and Low CA mRNAs #2 and #3 differed in 5′ UTR sequences.
- the CpA dinucleotide content (# of CpA dinucleotides in the open reading frame), %G/C content (in mRNA sequence), and time to 50% purity during storage at (i) 40 °C unformulated; (ii) 25 °C unformulated; or (iii) 25 °C when formulated in a lipid nanoparticle (LNP), are shown in Table 1.
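The two sequence metrics named above, the number of CpA dinucleotides and the %G/C content, can be computed directly from a sequence string. The sketch below is illustrative only; the function names are hypothetical, and the short sequence in the usage lines is SEQ ID NO: 2 from Example 1 rather than any of the tested mRNAs.

```python
def count_cpa(orf):
    """Number of CpA (5'-CA-3') dinucleotides in an ORF given as RNA or DNA text."""
    seq = orf.upper().replace("T", "U")
    # "CA" cannot overlap with itself, so a non-overlapping count is exact.
    return seq.count("CA")


def gc_percent(sequence):
    """Percentage of G and C nucleotides in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    example = "AAGCACAAUC"  # SEQ ID NO: 2, used here only as a short test string
    print(count_cpa(example), gc_percent(example))
```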
- Example 2 In vitro expression and in vivo immunogenicity of mRNAs with low CpA dinucleotide content
- The panel of mRNAs tested in Example 1 was also tested in cultured EXPI293 cells to evaluate expression of mRNAs with reduced CpA dinucleotide content.
- mice were immunized with two doses of a composition containing 1 μg mRNA, receiving the first dose on day 0 and the second dose on day 22. On day 21, three weeks after the first dose, and day 36, two weeks after the second dose, sera were collected to evaluate antibody responses elicited by each LNP-mRNA composition.
- Example 3 In vitro transcription (IVT) Materials and Methods
- the open reading frame (ORF) of the gene of interest may be flanked by a 5′ untranslated region (UTR) containing a strong Kozak translational initiation signal, and an alpha-globin 3′ UTR.
- the ORF may also include various upstream or downstream additions (such as, but not limited to, β-globin, tags, etc.), may be ordered from an optimization service such as, but not limited to, DNA2.0 (Menlo Park, Calif.), and may contain multiple cloning sites which may have XbaI recognition.
- NEB DH5-alpha Competent E. coli may be used. Transformations are performed according to NEB instructions using 100 ng of plasmid. The protocol is as follows: Thaw a tube of NEB 5-alpha Competent E. coli cells on ice for 10 minutes. Add 1-5 μl containing 1 pg-100 ng of plasmid DNA to the cell mixture. Carefully flick the tube 4-5 times to mix cells and DNA. Do not vortex. Place the mixture on ice for 30 minutes. Do not mix. Heat shock at 42 °C for exactly 30 seconds. Do not mix. Place on ice for 5 minutes. Do not mix.
- a maxi prep is performed using the Invitrogen PURELINK™ HiPure Maxiprep Kit (Carlsbad, Calif.), following the manufacturer's instructions.
- In order to generate cDNA for in vitro transcription (IVT), the plasmid is first linearized using a restriction enzyme such as XbaI.
- a typical restriction digest with XbaI will comprise the following: Plasmid 1.0 μg; 10× Buffer 1.0 μl; XbaI 1.5 μl; dH2O up to 10 μl; incubated at 37 °C for 1 hr.
- the reaction is cleaned up using Invitrogen's PURELINK™ PCR Micro Kit (Carlsbad, Calif.) per manufacturer's instructions. Larger scale purifications may need to be done with a product that has a larger load capacity such as Invitrogen's standard PURELINK™ PCR Kit (Carlsbad, Calif.). Following the cleanup, the linearized vector is quantified using the NanoDrop and analyzed to confirm linearization using agarose gel electrophoresis.
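Because the digest recipe above specifies the plasmid by mass and the final volume as “up to 10 μl”, the water volume depends on the concentration of the plasmid stock. The sketch below works through that arithmetic. The function name is hypothetical, and the stock concentration of 0.5 μg/μl in the usage lines is an assumed value chosen only for illustration.

```python
def xbai_digest_volumes(plasmid_conc_ug_per_ul, plasmid_ug=1.0,
                        buffer_ul=1.0, enzyme_ul=1.5, total_ul=10.0):
    """Volumes (in microliters) for the typical XbaI digest recited in the text."""
    if plasmid_conc_ug_per_ul <= 0:
        raise ValueError("plasmid concentration must be positive")
    plasmid_ul = plasmid_ug / plasmid_conc_ug_per_ul
    water_ul = total_ul - buffer_ul - enzyme_ul - plasmid_ul
    if water_ul < 0:
        raise ValueError("plasmid stock is too dilute for the stated total volume")
    return {
        "plasmid_ul": plasmid_ul,
        "buffer_10x_ul": buffer_ul,
        "xbai_ul": enzyme_ul,
        "water_ul": water_ul,
    }


if __name__ == "__main__":
    # Assumed stock concentration of 0.5 ug/ul, for illustration only.
    print(xbai_digest_volumes(0.5))
```

With the default values, 1.0 μl of 10× buffer in a 10 μl reaction gives a 1× final buffer concentration.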
- IVT Reaction
- The in vitro transcription reaction generates mRNA containing alternative nucleotides or alternative RNA. The input nucleotide triphosphate (NTP) mix is made in-house using natural and unnatural NTPs.
- a typical in vitro transcription reaction includes the following: Template cDNA 1.0 μg; 10× transcription buffer (400 mM Tris-HCl pH 8.0, 190 mM MgCl2, 50 mM DTT, 10 mM Spermidine) 2.0 μl; Custom NTPs (25 mM each) 7.2 μl; RNase Inhibitor 20 U; T7 RNA polymerase 3000 U; dH2O up to 20.0 μl; incubation at 37 °C for 3 hr-5 hrs.
- The crude IVT mix may be stored at 4 °C overnight for cleanup the next day. 1 U of RNase-free DNase is then used to digest the original template.
- the T7 RNA polymerase may be selected from, T7 RNA polymerase, T3 RNA polymerase and mutant polymerases such as, but not limited to, the novel polymerases able to incorporate alternative NTPs as well as those polymerases described by Liu (Esvelt et al.
- Nanodrop Alternative mRNA Quantification and UV Spectral Data
- Alternative mRNAs in TE buffer (1 μl) are used for Nanodrop UV absorbance readings to quantitate the yield of each alternative mRNA from an in vitro transcription reaction (UV absorbance traces are not shown).
- Example 3 Enzymatic capping of mRNA
- Capping of the mRNA is performed as follows where the mixture includes: IVT RNA 60 μg–180 μg and dH2O up to 72 μl. The mixture is incubated at 65 °C for 5 minutes to denature RNA, and then is transferred immediately to ice.
- the protocol then involves the mixing of 10× Capping Buffer (0.5 M Tris-HCl (pH 8.0), 60 mM KCl, 12.5 mM MgCl2) (10.0 μl); 20 mM GTP (5.0 μl); 20 mM S-Adenosyl Methionine (2.5 μl); RNase Inhibitor (100 U); 2′-O-Methyltransferase (400 U); Vaccinia capping enzyme (Guanylyl transferase) (40 U); dH2O (up to 28 μl); and incubation at 37 °C for 30 minutes for 60 μg RNA or up to 2 hours for 180 μg of RNA.
- RNA is then purified using Ambion's MEGACLEAR™ Kit (Austin, Tex.) following the manufacturer's instructions. Following the cleanup, the RNA is quantified using the NANODROP™ (ThermoFisher, Waltham, Mass.) and analyzed by agarose gel electrophoresis to confirm the RNA is the proper size and that no degradation of the RNA has occurred. The RNA product may also be sequenced by running a reverse-transcription-PCR to generate the cDNA for sequencing.
- Example 4: 5′-Guanosine capping Materials and Methods
- The cloning, gene synthesis and vector sequencing may be performed by DNA2.0 Inc. (Menlo Park, Calif.).
- the ORF is restriction digested using XbaI and used for cDNA synthesis using tailed- or tail-less-PCR.
- the tailed-PCR cDNA product is used as the template for the alternative mRNA synthesis reaction using 25 mM each alternative nucleotide mix (all alternative nucleotides may be custom synthesized or purchased from TriLink Biotech, San Diego, Calif. except pyrrolo-C triphosphate which may be purchased from Glen Research, Sterling Va.; unmodified nucleotides are purchased from Epicenter Biotechnologies, Madison, Wis.) and CellScript MEGASCRIPT™ (Epicenter Biotechnologies, Madison, Wis.) complete mRNA synthesis kit.
- the in vitro transcription reaction is run for 4 hours at 37 °C.
- Alternative mRNAs incorporating adenosine analogs are poly (A) tailed using yeast Poly (A) Polymerase (Affymetrix, Santa Clara, Calif.).
- the PCR reaction uses HiFi PCR 2× MASTER MIX™ (Kapa Biosystems, Woburn, Mass.).
- Alternative mRNAs are post-transcriptionally capped using recombinant Vaccinia Virus Capping Enzyme (New England BioLabs, Ipswich, Mass.) and a recombinant 2′-O-methyltransferase (Epicenter Biotechnologies, Madison, Wis.) to generate the 5′-guanosine Cap1 structure.
- Cap 2 and Cap 3 structures may be generated using additional 2′-O-methyltransferases.
- the in vitro transcribed mRNA product is run on an agarose gel and visualized.
- Alternative mRNA may be purified with Ambion/Applied Biosystems (Austin, Tex.) MEGAClear RNA™ purification kit.
- the PCR uses PURELINK™ PCR purification kit (Invitrogen, Carlsbad, Calif.).
- the product is quantified on NANODROP™ UV Absorbance (ThermoFisher, Waltham, Mass.). Quality, UV absorbance quality and visualization of the product were assessed on a 1.2% agarose gel.
- the product is resuspended in TE buffer.
- 5′-Capping Alternative Nucleic Acid (mRNA) Structure
- 5′-capping of alternative mRNA may be completed concomitantly during the in vitro transcription reaction using the following chemical RNA cap analogs to generate the 5′-guanosine cap structure according to manufacturer protocols: 3′-O-Me-m7G(5′)ppp(5′)G (the ARCA cap); G(5′)ppp(5′)A; G(5′)ppp(5′)G; m7G(5′)ppp(5′)A; m7G(5′)ppp(5′)G (New England BioLabs, Ipswich, Mass.).
- 5′-capping of alternative mRNA may be completed post-transcriptionally using a Vaccinia Virus Capping Enzyme to generate the “Cap 0” structure: m7G(5′)ppp(5′)G (New England BioLabs, Ipswich, Mass.).
- Cap 1 structure may be generated using both Vaccinia Virus Capping Enzyme and a 2′-O-methyltransferase to generate: m7G(5′)ppp(5′)G-2′-O-methyl.
- Cap 2 structure may be generated from the Cap 1 structure followed by the 2′-O-methylation of the 5′-antepenultimate nucleotide using a 2′-O-methyltransferase.
- Cap 3 structure may be generated from the Cap 2 structure followed by the 2′-O-methylation of the 5′-preantepenultimate nucleotide using a 2′-O-methyltransferase.
- Enzymes are preferably derived from a recombinant source.
- When transfected into mammalian cells, the alternative mRNAs have a stability of 12-18 hours or more than 18 hours, e.g., 24, 36, 48, 60, 72 or greater than 72 hours.
- Example 5: In vivo expression of selected sequences. [0436] Lipid nanoparticles containing modified or unmodified mRNA are administered to mice at an mRNA dose of 0.05 mg/kg intravenously, subcutaneously, or intramuscularly. Expression of polypeptides encoded by the mRNAs is evaluated by any method known in the art. For example, expression of an encoded fluorescent protein may be evaluated by isolating cells and measuring fluorescence intensity by fluorescence-activated cell sorting (FACS) or fluorescence microscopy.
- Example 6: Method of screening for protein expression
- Electrospray Ionization: A biological sample which may contain proteins encoded by modified RNA administered to the subject is prepared and analyzed according to the manufacturer protocol for electrospray ionization (ESI) using 1, 2, 3 or 4 mass analyzers.
- A biological sample may also be analyzed using a tandem ESI mass spectrometry system.
- Patterns of protein fragments, or whole proteins, are compared to known controls for a given protein and identity is determined by comparison.
- Matrix-Assisted Laser Desorption/Ionization: A biological sample which may contain proteins encoded by alternative RNA administered to the subject is prepared and analyzed according to the manufacturer protocol for matrix-assisted laser desorption/ionization (MALDI).
- Patterns of protein fragments, or whole proteins, are compared to known controls for a given protein and identity is determined by comparison.
- Liquid Chromatography-Mass Spectrometry-Mass Spectrometry: A biological sample, which may contain proteins encoded by alternative RNA, may be treated with trypsin to digest the proteins contained within. The resulting peptides are analyzed by liquid chromatography-mass spectrometry-mass spectrometry (LC/MS/MS). The peptides are fragmented in the mass spectrometer to yield diagnostic patterns that can be matched to protein sequence databases via computer algorithms. The digested sample may be diluted to achieve 1 ng or less starting material for a given protein.
- Biological samples containing a simple buffer background e.g., water or volatile salts
- complex backgrounds e.g., detergent, non-volatile salts, glycerol
- Patterns of protein fragments, or whole proteins, are compared to known controls for a given protein and identity is determined by comparison.
- Example 7 In vivo assays with human EPO containing alternative nucleotides formulation
- Modified mRNAs encoding human erythropoietin (hEPO) are formulated in lipid nanoparticles (LNPs) comprising DLin-KC2-DMA, DSPC, Cholesterol, and PEG-DMG at 50:10:38.5:1.5 mol % respectively.
- The LNPs are made by direct injection utilizing nanoprecipitation of ethanol-solubilized lipids into a pH 4.0, 50 mM citrate mRNA solution.
- the EPO LNP particle size distributions are characterized by DLS.
- Encapsulation efficiency is determined using a Ribogreen™ fluorescence-based assay for detection and quantification of nucleic acids.
- inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein.
- any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.
- a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in some embodiments, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
- “or” should be understood to have the same meaning as “and/or” as defined above.
- At least one of A and B can refer, in some embodiments, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
Abstract
Aspects of the disclosure relate to mRNAs comprising a relatively low abundance of cytidine-adenosine (CpA) dinucleotides that benefit from increased stability relative to mRNAs containing more CpA dinucleotides. The disclosure also relates to methods of modifying an mRNA sequence to improve stability. In some aspects, the disclosure relates to mRNAs comprising modified mRNA sequences with relatively reduced numbers of CpA dinucleotides, and compositions comprising mRNAs with relatively reduced numbers of CpA dinucleotides.
Description
CHEMICAL STABILITY OF MRNA
RELATED APPLICATION
[0001] This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application No. 63/422,103, filed November 3, 2022, the contents of which are incorporated by reference herein in their entirety.
BACKGROUND
[0002] Recently, messenger ribonucleic acid (mRNA)-based therapeutics have shown promise, e.g., as vaccines for infectious diseases. However, mRNAs are susceptible to cleavage through multiple pathways, such as hydrolysis of phosphodiester bonds. Unlike DNA and self-amplifying RNAs, which can generate additional mRNAs after introduction into cells, administered mRNAs are not replenished, so their cleavage reduces the amount of protein that can be translated.
SUMMARY
[0003] Described herein are RNAs (e.g., mRNAs) in which CpA dinucleotide content has been reduced, relative to a wild-type nucleic acid sequence, or minimized, to improve stability of the RNA. The disclosure is based, at least in part, on the discovery by the inventors that the phosphodiester bond between the cytidine and adenosine nucleotides of the CpA dinucleotide may be particularly susceptible to non-enzymatic cleavage (e.g., via spontaneous hydrolysis). These results are surprising, in part because previous reports in the literature suggested that the UA dinucleotide, rather than CA, is particularly susceptible to cleavage. See, e.g., Kierzek, Nucleic Acids Res. 1992. 20(19):5079–5084; and Kaukinen et al., Nucleic Acids Res. 2002. 30(2):468–474. Without wishing to be bound by theory, the inventors posit that reducing the abundance of CpA dinucleotides in an RNA sequence reduces the frequency of such spontaneous cleavage, thereby improving stability of the RNA (e.g., in stored RNA compositions). Such improved RNA stability provides multiple benefits in the production of RNA therapeutics and prophylactics.
For example, the improved stability of RNAs in stored RNA compositions allows efficacy to be maintained for longer durations, thereby improving the efficiency of RNA manufacturing. [0004] Reducing CpA dinucleotide content may be achieved by modifying one or more codons in the open reading frame (ORF) of the RNA without changing the amino acid sequence of an encoded protein. For example, one or more UCA codons encoding serine may be changed to UCU, UCC, or UCG, which still encode serine but do not contain a CpA
dinucleotide. This same approach may be used to reduce or eliminate the presence of CpA dinucleotides in codons encoding proline, threonine, and/or alanine. The only amino acids that must be encoded by a codon containing a CpA dinucleotide are histidine (encoded by CAU and CAC) and glutamine (encoded by CAA and CAG), and so the theoretical minimum of CpA dinucleotides in an RNA sequence is limited only by the number of histidine and glutamine residues present in an encoded protein. As another example, methionine, isoleucine, threonine, lysine, and asparagine must be encoded by codons beginning with an adenosine (A) nucleotide, and so a preceding codon that ends in a cytidine (C) nucleotide will result in a CpA dinucleotide at the junction between the two codons. To eliminate such CpA dinucleotides, a first codon ending in a cytidine (C) nucleotide that immediately precedes a second codon encoding methionine, isoleucine, threonine, lysine, or asparagine may be changed to a codon that encodes the same amino acid as the first codon, but does not end in a C nucleotide. A first codon “immediately precedes” a second codon in a nucleic acid sequence if there are no intervening nucleotides between the last nucleotide of the first codon and the first nucleotide of the second codon (e.g., in the sequence GACAUG, the first codon (GAC) encoding aspartate immediately precedes the second codon (AUG) encoding methionine). The same approach may be applied to codons preceding serine- or arginine-encoding codons that begin with adenosine nucleotides. Alternatively, one or more serine- or arginine-encoding codons that begin with adenosine nucleotides may be changed to codons that encode the same amino acid, but do not begin with adenosine nucleotides. [0005] In addition or as an alternative to modifying the ORF, untranslated regions (UTRs) of the RNA, such as the 5′ and 3′ UTRs, may be modified to reduce CpA dinucleotide abundance. 
In such UTRs, one or more nucleotides of a CpA dinucleotide may be mutated to eliminate CpA dinucleotides from the UTRs. Alternatively, a minimum number of CpA dinucleotides that are present in regulatory motifs may be maintained in a UTR. For example, a Kozak sequence that serves as the site of translation initiation may comprise one or more CpA dinucleotides, to allow efficient translation, while other CpA dinucleotides are eliminated to improve stability without reducing translation efficiency. [0006] Codon and UTR modification to reduce CpA dinucleotide content may comprise specific substitutions that maintain other features of an mRNA, such as nucleotide composition, codon optimality, and/or structure, within a desired range. For example, RNAs having higher %G/C contents (percentage of nucleotides in a sequence being guanosine or cytidine nucleotides) may be more stable than RNAs having lower %G/C contents. Without wishing to be bound by theory, the inventors posit that the formation of intramolecular secondary
structures contributes to RNA thermodynamic stability, with G/C-rich RNAs forming more and stronger secondary structures. Thus, in modifying a codon to remove a CpA dinucleotide, a specific codon may be substituted to maintain or increase the %G/C content of the resulting RNA sequence. For example, a first codon ending in a cytidine nucleotide and preceding a second codon beginning with an adenosine nucleotide may be replaced by a codon ending in a guanosine nucleotide, if possible, to avoid reducing the %G/C content of the RNA sequence. [0007] Accordingly, some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is greater than or equal to a theoretical minimum and less than or equal to 300% of the theoretical minimum. [0008] Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is: (i) greater than or equal to a theoretical minimum; and (ii) no more than 11 CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum. [0009] Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a CpA dinucleotide content of 6.5% or less. 
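The three CpA thresholds in [0007] to [0009] can be checked with a few lines of Python. This is a minimal sketch under stated assumptions, not the applicants' method: the function name `cpa_stats` and the toy ORF are hypothetical, the theoretical minimum is taken as the number of histidine and glutamine codons (codons beginning with CA) as described in the Summary, and the denominators used for the per-100-nucleotide excess and for the percent CpA content are assumptions, because this passage does not define them.

```python
def cpa_stats(orf: str) -> dict:
    """Summarize CpA dinucleotide usage in an open reading frame (sketch)."""
    seq = orf.upper().replace("T", "U")  # accept DNA or RNA letters
    if not seq or len(seq) % 3 != 0:
        raise ValueError("ORF must be non-empty with a length divisible by 3")

    # "CA" cannot overlap itself, so str.count gives the dinucleotide count.
    cpa_count = seq.count("CA")

    # His (CAU, CAC) and Gln (CAA, CAG) are the only residues whose codons
    # all contain CpA, so codons starting with "CA" set the floor.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    theoretical_min = sum(1 for codon in codons if codon.startswith("CA"))

    excess = cpa_count - theoretical_min
    return {
        "cpa_count": cpa_count,
        "theoretical_min": theoretical_min,
        # ASSUMPTION: excess CpA normalized per 100 ORF nucleotides.
        "excess_per_100_nt": 100.0 * excess / len(seq),
        # ASSUMPTION: content = CpA count / number of dinucleotide positions.
        "cpa_content_pct": 100.0 * cpa_count / (len(seq) - 1),
        # None when the polypeptide has no His/Gln (ratio undefined).
        "pct_of_min": (100.0 * cpa_count / theoretical_min
                       if theoretical_min else None),
    }


# Hypothetical toy ORF encoding Met-His-Ser-Asp-Met-stop.
stats = cpa_stats("AUGCAUUCAGACAUGUAA")
print(stats)
```

In the toy ORF only the histidine codon is unavoidable, so the UCA codon and the GAC/AUG junction account for the excess.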
[0010] Some aspects of the disclosure relate to an mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the mRNA has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%, wherein each of the uridine nucleotides of the ORF comprises a chemical modification, wherein: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end in
cytidine nucleotides; (f) fewer than 30% of amino acids that immediately precede a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides; and/or (g) fewer than 30% of amino acids that immediately precede an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides. [0011] Some aspects of the disclosure relate to a lipid nanoparticle comprising an mRNA described herein, and an ionizable cationic lipid, a non-cationic lipid, a sterol, and a polyethylene glycol (PEG)-modified lipid. [0012] Some aspects of the disclosure relate to a pharmaceutical composition comprising a lipid nanoparticle described herein, and a pharmaceutically acceptable excipient. [0013] Some aspects of the disclosure relate to a method of producing a modified mRNA sequence comprising an ORF encoding a polypeptide, the method comprising modifying a reference mRNA sequence comprising a reference ORF to produce the modified mRNA sequence by: (a) replacing one or more codons in the reference ORF comprising a CpA dinucleotide with a codon that encodes the same amino acid but does not comprise a CpA dinucleotide; and/or (b) replacing one or more codons in the reference ORF that: (1) ends in a cytidine nucleotide; and (2) is immediately followed in the reference ORF by a codon that encodes an isoleucine, methionine, threonine, asparagine, or lysine, or a codon that encodes a serine or arginine and begins with an adenosine nucleotide, with a codon encoding the same amino acid as the replaced codon but does not end in a cytidine nucleotide. 
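Steps (a) and (b) of the method in [0013] can be illustrated as a single left-to-right pass over the codons. The sketch below is an assumption-laden illustration rather than the claimed procedure: `reduce_cpa` and `translate` are hypothetical names, the standard genetic code is assumed, the pass is greedy (it is not shown here to reach the theoretical minimum for every sequence), and the tie-breaking rules (fewest nucleotide changes, then a G- or C-ending codon to preserve %G/C as suggested in [0006]) are one possible choice among many.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code; "*" marks stop codons.
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(orf: str) -> str:
    seq = orf.upper().replace("T", "U")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def reduce_cpa(orf: str) -> str:
    """Greedy synonymous recoding that avoids CpA where possible (sketch)."""
    seq = orf.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    out = []
    for i, original in enumerate(codons):
        prev_last = out[-1][-1] if out else ""
        next_first = codons[i + 1][0] if i + 1 < len(codons) else ""

        def cost(candidate):
            cpa = candidate.count("CA")                         # step (a)
            cpa += prev_last == "C" and candidate[0] == "A"     # new junction
            cpa += candidate[-1] == "C" and next_first == "A"   # step (b)
            changes = sum(a != b for a, b in zip(candidate, original))
            au_ending = candidate[-1] not in "GC"  # prefer keeping %G/C
            return (cpa, changes, au_ending, candidate)

        out.append(min(SYNONYMS[CODON_TABLE[original]], key=cost))
    return "".join(out)


reference = "AUGCAUUCAGACAUGUAA"   # hypothetical reference ORF
modified = reduce_cpa(reference)
print(modified, modified.count("CA"))
```

The histidine codon keeps its CpA because both histidine codons contain one; the serine codon UCA and the aspartate codon GAC (which precedes AUG) are the ones replaced.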
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] FIG. 1 shows the results of sequencing mRNA fragments generated by spontaneous cleavage of a reference mRNA, as a frequency map of cleavage positions, used to determine the positions of spontaneous (non-enzymatic) cleavage. Sequencing reads were aligned to the full-length mRNA sequence, with the 3′ end of the read indicating the nucleotide in the mRNA sequence where cleavage occurred. [0015] FIGs. 2A–2C show the effects of %G/C content and CpA dinucleotide abundance on mRNA structure and stability. FIGs. 2A and 2B show the kinetics of mRNA purity, as measured by FACE, during storage of unformulated mRNA at 40 °C (FIG. 2A) or 25 °C (FIG. 2B), for each of three mRNAs containing reduced CpA dinucleotide contents and for a control mRNA. FIG. 2C shows the kinetics of mRNA purity, as measured by reverse-phase ion pair (RPIP) chromatography, during storage of the same mRNAs formulated in lipid nanoparticles (LNPs) at 25 °C. [0016] FIGs. 3A–3C show the effects of CpA dinucleotide content on in vitro expression of a protein encoded by an mRNA. Lipid nanoparticles containing mRNAs were added to EXPI293 cells, and cells were analyzed by staining with an antibody specific to the protein, followed by flow cytometry to determine the percentage of cells expressing the encoded protein (Ag+ cells) (FIG. 3A), total fluorescence measured by the product of median fluorescence intensity and the frequency of protein-expressing cells (FIG. 3B), and normalized total fluorescence measured as the product of FIG. 3B divided by the product measured for mock-transfected cells (FIG. 3C). [0017] FIG. 4 shows the effects of CpA dinucleotide abundance on immunogenicity of mRNAs comprised in lipid nanoparticles (LNP-mRNA compositions). Mice were administered two doses of the same LNP-mRNA composition on days 1 and 22, with sera collected on day 21, three weeks after administration of the first dose, and day 36, 14 days after administration of the second dose. All mRNAs tested encoded the same antigen with the same amino acid sequence, but individual mRNAs differed in CpA dinucleotide content.
DETAILED DESCRIPTION
[0018] Aspects of the disclosure relate to non-naturally occurring (modified) mRNAs containing relatively reduced abundances of CpA dinucleotides, and methods of improving mRNA stability by reducing the number of CpA dinucleotides in the mRNA sequence. The disclosure is based, in part, on the discovery by the inventors that the CpA dinucleotide is the most susceptible to spontaneous cleavage in mRNAs containing 1-methylpseudouridine nucleotides in place of conventional uridine nucleotides. The compositions and methods described herein are useful, in some embodiments, for providing RNA therapeutics with improved stability, increased expression of encoded proteins, and/or improved efficacy. 
[0019] Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is greater than or equal to a theoretical minimum and less than or equal to 300% of the theoretical minimum. [0020] Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is: (i) greater than or equal to a theoretical minimum; and (ii) no more than 11 CpA dinucleotides per 100
nucleotides of the ORF greater than the theoretical minimum. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum is no more than 10, no more than 9, no more than 8, no more than 7, no more than 6, no more than 5, no more than 4, no more than 3, no more than 2, or no more than 1. [0021] Some aspects of the disclosure relate to a non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a CpA dinucleotide content of 6.5% or less. In some embodiments, the ORF comprises a CpA dinucleotide content of 6.0% or less, 5.5% or less, 5% or less, 4.5% or less, 4% or less, 3.5% or less, 3.0% or less, 2.5% or less, 2.0% or less, 1.5% or less, 1.0% or less, or 0.5% or less. [0022] In some embodiments, (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (f) fewer than 30% of amino acids that immediately precede a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides; and/or (g) fewer than 30% of amino acids that immediately precede an 
arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides. In some embodiments, the nucleotide sequence of the mRNA comprises a %G/C content of 30% – 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%. [0023] In some embodiments, one or more nucleotides of the mRNA comprises a chemically modified nucleotide. In some embodiments, each uridine nucleotide of the mRNA comprises a chemically modified nucleotide. [0024] Some aspects of the disclosure relate to an mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the mRNA has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%,
65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%, wherein each of the uridine nucleotides of the ORF comprises a chemical modification, wherein: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (f) fewer than 30% of amino acids that immediately precede a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides; and/or (g) fewer than 30% of amino acids that immediately precede an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides. [0025] In some embodiments, the chemically modified nucleotide comprises N1-methylpseudouridine. [0026] In some embodiments, fewer than 15% of serine residues, fewer than 27% of proline residues, fewer than 28% of threonine residues, and fewer than 23% of alanine residues in the polypeptide are encoded by codons in the ORF comprising a CpA dinucleotide. 
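The percentage criteria in (a) through (g) and in [0026] are simple codon tallies, which the following sketch illustrates. It is not taken from the disclosure: the function names and the toy ORF are hypothetical, and fractions are reported only for residue types that actually occur in the sequence.

```python
from collections import Counter

# All 16 codons that begin with adenosine, keyed to the residue they encode.
A_START_CODONS = {
    "AUU": "Ile", "AUC": "Ile", "AUA": "Ile", "AUG": "Met",
    "ACU": "Thr", "ACC": "Thr", "ACA": "Thr", "ACG": "Thr",
    "AAU": "Asn", "AAC": "Asn", "AAA": "Lys", "AAG": "Lys",
    "AGU": "Ser", "AGC": "Ser", "AGA": "Arg", "AGG": "Arg",
}
# Four-codon families whose A-ending member (UCA, CCA, ACA, GCA) holds a CpA.
CPA_FAMILIES = {"UC": "Ser", "CC": "Pro", "AC": "Thr", "GC": "Ala"}


def split_codons(orf: str) -> list:
    seq = orf.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def preceding_c_fractions(orf: str) -> dict:
    """Fraction of codons preceding each A-start residue that end in C."""
    codons = split_codons(orf)
    total, c_ending = Counter(), Counter()
    for prev, nxt in zip(codons, codons[1:]):
        residue = A_START_CODONS.get(nxt)
        if residue is None:
            continue
        total[residue] += 1
        c_ending[residue] += int(prev.endswith("C"))
    return {res: c_ending[res] / total[res] for res in total}


def cpa_codon_fractions(orf: str) -> dict:
    """Fraction of Ser/Pro/Thr/Ala residues encoded by a CpA-containing codon."""
    total, with_cpa = Counter(), Counter()
    for codon in split_codons(orf):
        residue = CPA_FAMILIES.get(codon[:2])
        if residue is None and codon in ("AGU", "AGC"):
            residue = "Ser"  # serine also has two A-start codons
        if residue is None:
            continue
        total[residue] += 1
        with_cpa[residue] += int(codon.endswith("CA"))
    return {res: with_cpa[res] / total[res] for res in total}


toy_orf = "AUGGACAUGGCCAAAUCAAGCUAA"  # hypothetical: M-D-M-A-K-S-S-stop
junctions = preceding_c_fractions(toy_orf)
families = cpa_codon_fractions(toy_orf)
print(junctions, families)
```

A design would satisfy criterion (b), for example, when `junctions["Met"]` is below 0.30; the toy ORF fails it because GAC precedes its only internal AUG.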
In some embodiments, (a) no serine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (b) no proline residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (c) no threonine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; and/or (d) no alanine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide. [0027] In some embodiments, (a) no amino acid that immediately precedes an isoleucine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (b) no amino acid that immediately precedes a methionine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (c) no amino acid that immediately precedes a threonine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (d) no amino acid that immediately
precedes an asparagine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (e) no amino acid that immediately precedes a lysine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (f) no amino acid that immediately precedes a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, is encoded by a codon in the ORF that ends in a cytidine nucleotide; and/or (g) no amino acid that immediately precedes an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, is encoded by a codon in the ORF that ends in a cytidine nucleotide. In some embodiments, no amino acid that immediately precedes an isoleucine, methionine, threonine, asparagine, or lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. In some embodiments, no codon in the ORF beginning with an adenosine nucleotide is immediately preceded by a codon in the ORF that ends in a cytidine nucleotide. [0028] In some embodiments, the ORF is codon-optimized for expression in a cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the mRNA further comprises: (i) a 5′ untranslated region (UTR); and/or (ii) a 3′ UTR. In some embodiments, the 5′ UTR is a heterologous UTR and/or the 3′ UTR is a heterologous UTR. In some embodiments, the 5′ UTR comprises five or fewer, four or fewer, three or fewer, two or fewer, one or fewer, or zero CpA dinucleotides. In some embodiments, the 5′ UTR does not comprise a CpA dinucleotide. In some embodiments, the 3′ UTR comprises five or fewer, four or fewer, three or fewer, two or fewer, one or fewer, or zero CpA dinucleotides. In some embodiments, the 3′ UTR does not comprise a CpA dinucleotide. In some embodiments, the last nucleotide of the 5′ UTR is not a cytidine nucleotide. 
[0029] In some embodiments, the 5′ UTR has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%. In some embodiments, the ORF has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%. In some embodiments, the 3′ UTR has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%. In some embodiments, the mRNA further comprises: (iii) a 5′ cap structure; and/or (iv) a poly-A tail. In some embodiments, the last nucleotide of the 3′ UTR is not a cytidine nucleotide. In some embodiments, the 5′ cap structure comprises 7mG(5')ppp(5')NlmpNp.
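The region-level properties listed in [0028] and [0029] (%G/C content, CpA count, and whether a UTR ends in cytidine) can be tabulated per region. The sketch below uses hypothetical helper names and short made-up sequences purely for illustration; it only reports the values and does not decide which of the recited ranges a design should target.

```python
def gc_percent(seq: str) -> float:
    """Percentage of nucleotides that are guanosine or cytidine."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must not be empty")
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def region_report(seq: str) -> dict:
    """Collect the per-region properties discussed above (sketch)."""
    s = seq.upper().replace("T", "U")
    return {
        "gc_percent": gc_percent(s),
        "cpa_count": s.count("CA"),
        "ends_in_c": s.endswith("C"),
    }


# Hypothetical toy regions, not sequences from the disclosure.
regions = {
    "5' UTR": "GGGAGAUCGG",
    "ORF": "AUGGCCAAGUAA",
    "3' UTR": "UGAUAGGCUG",
}
reports = {name: region_report(seq) for name, seq in regions.items()}
for name, report in reports.items():
    print(name, report)
```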
[0030] In some embodiments, the level of expression in a mammalian cell of the encoded polypeptide from the mRNA is at least 50% of the level of expression of a reference mRNA comprising a reference open reading frame (rORF) encoding the polypeptide, wherein the rORF comprises a higher number of CpA dinucleotides than the ORF. In some embodiments, one or more CpA dinucleotides of the mRNA comprises a modified cytidine nucleotide and/or a modified adenosine nucleotide. In some embodiments, the number of CpA dinucleotides comprising an unmodified cytidine nucleotide and an unmodified adenosine nucleotide in the ORF is 100%, 95% or less, 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, or 10% or less of the total number of histidine and glutamine residues in the polypeptide. In some embodiments, the polypeptide comprises 9–5,000, 20–4,000, 30–3,000, 40–2,000, or 50–1,500 amino acids. In some embodiments, the polypeptide is a vaccine antigen or a therapeutic protein. [0031] In some embodiments, a coefficient of degradation at 25 °C of the mRNA is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, a composition comprising a plurality of the mRNAs remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of mRNAs comprising a wild-type ORF encoding the polypeptide. In some embodiments, storage of the mRNA is conducted at a temperature between about 2 °C and about 8 °C. In some embodiments, the mRNA is stored in a buffer comprising 10–50 mM Tris and 5–10% sucrose, wherein the buffer has a pH of about 7.3 to about 7.6. 
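To see how a lower coefficient of degradation relates to extra days above 50% purity, a small worked calculation may help. This passage does not define the coefficient of degradation, so the sketch below assumes, purely for illustration, that purity decays by first-order kinetics and that the coefficient is the first-order rate constant; the function name and the 30-day reference value are hypothetical and are not data from the disclosure.

```python
import math


def days_above_threshold(k_per_day: float,
                         initial_purity: float = 100.0,
                         threshold: float = 50.0) -> float:
    """Days until purity falls to the threshold.

    ASSUMPTION: purity(t) = initial_purity * exp(-k_per_day * t).
    """
    if k_per_day <= 0 or not 0 < threshold < initial_purity:
        raise ValueError("need k > 0 and 0 < threshold < initial purity")
    return math.log(initial_purity / threshold) / k_per_day


# Hypothetical reference mRNA that reaches 50% purity on day 30.
k_reference = math.log(2) / 30
# A modified mRNA whose coefficient is 50% of the reference value.
k_modified = 0.5 * k_reference

extra_days = (days_above_threshold(k_modified)
              - days_above_threshold(k_reference))
print(round(extra_days, 1))
```

Under this assumed model, halving the rate constant doubles the time spent above any given purity threshold.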
[0032] In some embodiments, the stability of the mRNA is increased relative to a reference mRNA having a higher number of CpA dinucleotides, the reference mRNA comprising a reference open reading frame (rORF) encoding the polypeptide, wherein the rORF has a higher number of CpA dinucleotides than the ORF. [0033] Some aspects of the disclosure relate to a lipid nanoparticle comprising an mRNA described herein, and an ionizable cationic lipid, a non-cationic lipid, a sterol, and a polyethylene glycol (PEG)-modified lipid. In some embodiments, the lipid nanoparticle comprises 20–60% ionizable cationic lipid, and 5–25% non-cationic lipid, 25–55% cholesterol, and 0.5–15% polyethylene glycol (PEG)-modified lipid. In some embodiments, a coefficient of degradation at 25 °C of the mRNA in the lipid nanoparticle is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, a composition comprising a plurality
of the lipid nanoparticles remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of the lipid nanoparticles and mRNAs comprising a wild-type ORF encoding the polypeptide. In some embodiments, storage of the lipid nanoparticle is conducted at a temperature between about 2 °C and about 8 °C. [0034] In some embodiments, the lipid nanoparticle further comprises a stabilizing compound of Formula (I):
or a tautomer or solvate thereof, wherein: is a single bond or a double bond; R1 is H; R2 is OCH3, or together with R3 is OCH2O; R3 is OCH3, or together with R2 is OCH2O; R4 is H; R5 is H or OCH3; R6 is OCH3; R7 is H or OCH3; R8 is H; R9 is H or CH3; and X is a pharmaceutically acceptable anion. [0035] In some embodiments, the stabilizing compound is a compound of Formula (Ic), or a tautomer or solvate thereof. [0036] In some embodiments, the lipid nanoparticle further comprises a stabilizing compound of Formula (II):
or a tautomer or solvate thereof, wherein: R10 is H; R11 is H; R12 together with R13 is OCH2O; R14 is H; R15 together with R16 is OCH2O; R17 is H; and X is a pharmaceutically acceptable anion. [0037] Some aspects of the disclosure relate to a pharmaceutical composition comprising a lipid nanoparticle described herein, and a pharmaceutically acceptable excipient. [0038] Some aspects of the disclosure relate to a method of producing a modified mRNA sequence comprising an ORF encoding a polypeptide, the method comprising modifying a reference mRNA sequence comprising a reference ORF to produce the modified mRNA sequence by: (a) replacing one or more codons in the reference ORF comprising a CpA dinucleotide with a codon that encodes the same amino acid but does not comprise a CpA dinucleotide; and/or (b) replacing one or more codons in the reference ORF that: (1) ends in a
cytidine nucleotide; and (2) is immediately followed in the reference ORF by a codon that encodes an isoleucine, methionine, threonine, asparagine, or lysine, or a codon that encodes a serine or arginine and begins with an adenosine nucleotide, with a codon encoding the same amino acid as the replaced codon but does not end in a cytidine nucleotide. [0039] In some embodiments, the reference mRNA sequence further comprises: (i) a reference 5′ untranslated region (UTR); and/or (ii) a reference 3′ UTR. In some embodiments, the reference 5′ UTR is a heterologous 5′ UTR and/or the reference 3′ UTR is a heterologous 3′ UTR. In some embodiments, the replacing comprises changing the last nucleotide of the reference 5′ UTR from a cytidine nucleotide to a non-cytidine nucleotide. In some embodiments, the reference mRNA sequence further comprises: (iii) a 5′ cap structure; and/or (iv) a poly-A region. [0040] In some embodiments, the replacing comprises changing the last nucleotide of the reference 3′ UTR from a cytidine nucleotide to a non-cytidine nucleotide. In some embodiments, the method further comprises replacing one or more cytidine nucleotides in the reference mRNA sequence with guanosine nucleotides. In some embodiments, the method further comprises replacing one or more unmodified cytidine nucleotides in the reference mRNA sequence with modified cytidine nucleotides. In some embodiments, the method further comprises replacing one or more unmodified adenosine nucleotides in the reference mRNA sequence with modified adenosine nucleotides. In some embodiments, the method further comprises replacing one or more adenosine nucleotides in the reference mRNA sequence with uracil nucleotides. In some embodiments, the method further comprises replacing one or more adenosine nucleotides in the reference mRNA sequence, that are not immediately followed by a second adenosine nucleotide, with cytidine nucleotides. 
In some embodiments, the method further comprises replacing one or more adenosine nucleotides in the reference mRNA sequence with guanosine nucleotides. [0041] In some embodiments, the ORF of the modified mRNA sequence comprises a number of CpA dinucleotides that is greater than or equal to the theoretical minimum and less than or equal to 300% of the theoretical minimum. [0042] In some embodiments, the ORF of the modified mRNA sequence comprises a number of CpA dinucleotides that is: (i) greater than or equal to a theoretical minimum; and (ii) no more than 11 CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum is no more than 10, no more
than 9, no more than 8, no more than 7, no more than 6, no more than 5, no more than 4, no more than 3, no more than 2, or no more than 1. [0043] In some embodiments, the ORF of the modified mRNA sequence comprises a CpA dinucleotide content of 6.5% or less. In some embodiments, the ORF of the modified mRNA sequence comprises a CpA dinucleotide content of 6.0% or less, 5.5% or less, 5% or less, 4.5% or less, 4% or less, 3.5% or less, 3.0% or less, 2.5% or less, 2.0% or less, 1.5% or less, 1.0% or less, or 0.5% or less. [0044] In some embodiments, in the modified mRNA sequence: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (f) fewer than 30% of amino acids that immediately precede a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides; and/or (g) fewer than 30% of amino acids that immediately precede an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides. 
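The per-residue junction criteria of paragraph [0044] can be checked mechanically. The following is a minimal sketch, not part of the disclosed method: the function name `junction_c_fractions` and the table `A_INITIAL` are hypothetical, the standard genetic code is assumed, and the ORF is assumed to be supplied in frame. For each residue encoded by a codon beginning with an adenosine nucleotide, it reports the fraction of immediately preceding codons that end in a cytidine nucleotide.

```python
# Hypothetical helper; assumes the standard genetic code and an in-frame ORF.
# Codons that begin with A, mapped to the amino acid they encode.
A_INITIAL = {
    "AUU": "Ile", "AUC": "Ile", "AUA": "Ile", "AUG": "Met",
    "ACU": "Thr", "ACC": "Thr", "ACA": "Thr", "ACG": "Thr",
    "AAU": "Asn", "AAC": "Asn", "AAA": "Lys", "AAG": "Lys",
    "AGU": "Ser", "AGC": "Ser", "AGA": "Arg", "AGG": "Arg",
}


def junction_c_fractions(orf):
    """Map each amino acid encoded by an A-initial codon to the fraction of
    its occurrences whose preceding codon ends in C (a CpA at the junction)."""
    seq = orf.upper().replace("T", "U")  # accept DNA or RNA spelling
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = {}  # amino acid -> [preceded by codon ending in C, total]
    for prev, cur in zip(codons, codons[1:]):
        aa = A_INITIAL.get(cur)
        if aa is None:
            continue  # current codon does not begin with A
        tally = counts.setdefault(aa, [0, 0])
        tally[1] += 1
        if prev.endswith("C"):
            tally[0] += 1
    return {aa: hit / total for aa, (hit, total) in counts.items()}
```

A returned value below 0.30 for isoleucine, for example, would correspond to criterion (a). The first codon of the ORF is skipped because no codon precedes it; for serine and arginine, only occurrences encoded by codons beginning with adenosine are counted, as in criteria (f) and (g).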
[0045] In some embodiments, in the modified mRNA sequence, fewer than 15% of serine residues, fewer than 27% of proline residues, fewer than 28% of threonine residues, and fewer than 23% of alanine residues in the polypeptide are encoded by codons in the ORF that comprise a CpA dinucleotide. In some embodiments, in the modified mRNA sequence: (a) no serine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (b) no proline residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (c) no threonine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; and/or (d) no alanine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide. [0046] In some embodiments, in the modified mRNA sequence: (a) no amino acid that immediately precedes an isoleucine residue in the polypeptide is encoded by a codon in the
ORF that ends in a cytidine nucleotide; (b) no amino acid that immediately precedes a methionine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (c) no amino acid that immediately precedes a threonine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (d) no amino acid that immediately precedes an asparagine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (e) no amino acid that immediately precedes a lysine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (f) no amino acid that immediately precedes a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, is encoded by a codon in the ORF that ends in a cytidine nucleotide; and/or (g) no amino acid that immediately precedes an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, is encoded by a codon in the ORF that ends in a cytidine nucleotide. In some embodiments, in the modified mRNA sequence, no amino acid that immediately precedes an isoleucine, methionine, threonine, asparagine, or lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. In some embodiments, in the modified mRNA sequence, no codon in the ORF beginning with an adenosine nucleotide is immediately preceded by a codon in the ORF that ends in a cytidine nucleotide. [0047] In some embodiments, the modified mRNA sequence comprises a %G/C content of 30% – 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%. [0048] In some embodiments, one or more nucleotides of the modified mRNA sequence comprises a chemically modified nucleotide. 
In some embodiments, each of the uridine nucleotides of the modified mRNA sequence comprises a chemically modified nucleotide. In some embodiments, the chemically modified nucleotide comprises N1-methylpseudouridine. [0049] In some embodiments, one or more CpA dinucleotides of the modified mRNA sequence comprises a modified cytidine nucleotide and/or a modified adenosine nucleotide. In some embodiments, the number of CpA dinucleotides comprising an unmodified cytidine nucleotide and an unmodified adenosine nucleotide in the ORF of the modified mRNA sequence is 100%, 95% or less, 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, or 10% or less of the total number of histidine and glutamine residues in the polypeptide.
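Steps (a) and (b) of the method of paragraph [0038] can be illustrated with a short sketch. This is a simplified greedy pass under stated assumptions, not the procedure of the disclosure: the function name `reduce_cpa` is hypothetical, the standard genetic code is assumed, and codon-usage, expression, %G/C, and UTR considerations are ignored. Codons are visited from the 3′ end to the 5′ end so that the first nucleotide of the downstream codon is already fixed when the upstream codon is chosen.

```python
# Standard genetic code in UCAG order (an assumption of this sketch).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reduce_cpa(orf):
    """Return a synonymous ORF with fewer CpA dinucleotides (greedy sketch)."""
    seq = orf.upper().replace("T", "U")  # accept DNA or RNA spelling
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    next_first = ""  # first nucleotide of the already-chosen downstream codon
    for k in range(len(codons) - 1, -1, -1):
        original = codons[k]
        aa = CODON_TO_AA[original]
        synonyms = [c for c, a in CODON_TO_AA.items() if a == aa]

        def cost(codon):
            internal = "CA" in codon                               # step (a)
            junction = codon.endswith("C") and next_first == "A"   # step (b)
            # Fewest CpA first; on a tie, keep the original codon.
            return (internal + junction, codon != original)

        codons[k] = min(synonyms, key=cost)
        next_first = codons[k][0]
    return "".join(codons)
```

For example, `reduce_cpa("AUGCAUACAGCCAAGUCAUAA")` returns `"AUGCAUACUGCUAAGUCUUAA"`: the encoded amino acid sequence is unchanged, and the only remaining CpA dinucleotide is the one in the histidine codon CAU, which cannot be removed without changing the polypeptide.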
[0050] In some embodiments, the polypeptide comprises 9–5,000, 20–4,000, 30–3,000, 40–2,000, or 50–1,500 amino acids. In some embodiments, the polypeptide is a vaccine antigen or a therapeutic protein. [0051] In some embodiments, the ORF of the modified mRNA sequence is codon-optimized for expression in a cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. [0052] In some embodiments, the method further comprises transcribing the modified mRNA sequence to produce a modified mRNA. [0053] In some embodiments, a level of expression in a mammalian cell of the encoded polypeptide from the modified mRNA is at least 80% of a level of expression of the reference mRNA. In some embodiments, a coefficient of degradation at 25 °C of the modified mRNA is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising the reference ORF. In some embodiments, a composition comprising a plurality of the mRNAs remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of mRNAs comprising the reference ORF. In some embodiments, storage of the modified mRNA is conducted at a temperature between about 2 °C to about 8 °C. In some embodiments, the modified mRNA has increased stability relative to a reference mRNA comprising the reference mRNA sequence.
CpA dinucleotide contents and mRNA stability
[0054] Some aspects relate to mRNAs encoding polypeptides, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, where the mRNA comprises a number of CpA dinucleotides in the ORF that is at least equal to (i.e., equal to or greater than) a theoretical minimum number of CpA dinucleotides and at most (i.e., less than or equal to) 500% of the theoretical minimum. 
Other aspects relate to methods of modifying a reference mRNA sequence to produce a modified RNA sequence having fewer CpA dinucleotides than the reference mRNA sequence. As used herein, a “theoretical minimum” number of CpA dinucleotides refers to the number of histidine and glutamine residues present in a polypeptide encoded by an open reading frame. If a histidine or glutamine is present in an amino acid sequence, a codon beginning with CA is required to encode that amino acid, and so some CpA dinucleotides are required for a nucleic acid to encode a protein comprising histidine and/or glutamine residues. However, other amino acids that may be encoded by codons containing CpA dinucleotides (e.g., threonine, encoded by the codon ACA) may be
also encoded by codons that do not contain a CpA dinucleotide (e.g., ACU, ACC, and ACG codons also encode threonine). Thus, portions of an mRNA sequence other than codons encoding histidine or glutamine may be mutated to reduce the number of CpA dinucleotides in an mRNA sequence to a level closer to the theoretical minimum. In some embodiments, the number of CpA dinucleotides in an ORF of a modified mRNA or modified sequence is 100% – 400%, 100% – 300%, 100% – 200%, 100% – 150%, or 100% – 125% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 400% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 300% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 250% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 200% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 150% of the theoretical minimum. In some embodiments, the number of CpA dinucleotides is at most 125% of the theoretical minimum. [0055] References to the ORF of an mRNA, its length, the polypeptide it encodes, and codons within the ORF, are to be understood as referring to the longest ORF in the mRNA, not internal open reading frames in the same frame as the ORF, alternative reading frames, or sequences that may be translated due to initiation at a start codon that is downstream from the first occurrence of the sequence AUG in the mRNA. [0056] Some aspects relate to mRNAs comprising an ORF encoding a polypeptide, with the ORF having a %CpA dinucleotide content of 6.5% or less. Some embodiments of such mRNAs contain ORFs with %CpA dinucleotide contents that are reduced, relative to a nucleic acid sequence encoding the same polypeptide (i.e., having the same amino acid sequence). 
The %CpA dinucleotide content (percentage CpA dinucleotide content) of a sequence can be determined by dividing the number of CpA dinucleotides in the sequence by the total number of dinucleotides in the sequence. Because consecutive dinucleotides in a nucleic acid sequence overlap (e.g., in an ORF beginning with the start codon AUG, the first dinucleotide is an AU dinucleotide, and the second dinucleotide is a UG dinucleotide), the number of dinucleotides in a sequence is one fewer than the number of nucleotides. For example, an ORF having 60 CpA dinucleotides and being 301 nucleotides in length has a %CpA dinucleotide content of 20%. In some embodiments, the ORF of an mRNA described herein has a %CpA dinucleotide content of 6.0% or less, 5.0% or less, 4.5% or less, 4.0% or less, 3.5% or less, 3.0% or less, 2.5% or less, 2.0% or less, 1.5% or less, 1.0% or less, or 0.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 6.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 5.5% or less. In
some embodiments, the ORF has a %CpA dinucleotide content of 5.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 4.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 4.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 3.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 3.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 2.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 2.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 1.5% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 1.0% or less. In some embodiments, the ORF has a %CpA dinucleotide content of 0.5% or less. [0057] In some embodiments of the modified mRNAs described herein or modified mRNA sequences produced by the methods described herein, an increased percentage of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. A CpA dinucleotide is comprised within a codon if it forms either (i) the first and second nucleotides of a codon, or (ii) the second and third nucleotides of the codon, but not if it forms the third nucleotide of one codon and the first nucleotide of the second codon (i.e., the CpA dinucleotide bridges two codons). In some embodiments, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or up to 100% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, 30–100%, 30–80%, 30–50%, 40–100%, 40–90%, 40–80%, 40–60%, 50–100%, 50–90%, 50–80%, 50–70%, 50–60%, 60–100%, 60–90%, 60–80%, 60–70%, 70–100%, 70–90%, 70–80%, 80–100%, 80–90%, or 90–100% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. 
In some embodiments, at least 50% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 60% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 70% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 80% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 90% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, at least 95% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine. In some embodiments, 100% of CpA dinucleotides in the ORF are comprised within codons encoding histidine or glutamine.
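The quantities defined in paragraphs [0054] to [0057] can be computed directly from an ORF sequence. The following is a minimal sketch under stated assumptions: the function name `cpa_summary` and the dictionary keys are hypothetical, the standard genetic code is assumed (so that every codon beginning with CA encodes histidine or glutamine), and the ORF is assumed to be supplied in frame, including its stop codon.

```python
# Hypothetical helper; assumes the standard genetic code and an in-frame ORF.
HIS_GLN_CODONS = {"CAU", "CAC", "CAA", "CAG"}


def cpa_summary(orf):
    """Summarise CpA dinucleotide statistics for an ORF."""
    seq = orf.upper().replace("T", "U")  # accept DNA or RNA spelling
    if len(seq) < 3 or len(seq) % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # 0-based start index of every CA dinucleotide (dinucleotides overlap).
    starts = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CA"]
    total = len(starts)
    # Theoretical minimum: one CpA per histidine or glutamine residue.
    minimum = sum(c in HIS_GLN_CODONS for c in codons)
    # A CpA starting at the third codon position bridges two codons.
    bridging = sum(1 for i in starts if i % 3 == 2)
    in_his_gln = sum(
        1 for i in starts if i % 3 != 2 and codons[i // 3] in HIS_GLN_CODONS
    )
    return {
        "total_cpa": total,
        "theoretical_minimum": minimum,
        "bridging": bridging,
        # %CpA content: CpA count over the number of dinucleotides (n - 1).
        "percent_cpa": 100.0 * total / (len(seq) - 1),
        "percent_of_minimum": 100.0 * total / minimum if minimum else None,
        "excess_per_100_nt": 100.0 * (total - minimum) / len(seq),
        "fraction_in_his_gln": in_his_gln / total if total else None,
    }
```

For the 21-nucleotide ORF `AUGCAUACAGCCAAGCAGUAA`, the sketch finds four CpA dinucleotides: two within the histidine and glutamine codons CAU and CAG, one within the threonine codon ACA, and one bridging the codons GCC and AAG. The theoretical minimum is therefore 2, the %CpA dinucleotide content is 4/20 = 20%, and half of the CpA dinucleotides are comprised within codons encoding histidine or glutamine.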
[0058] In some embodiments of the modified mRNAs described herein or modified mRNA sequences produced by the methods described herein, the %CpA dinucleotide content in the ORF is reduced, relative to the %CpA dinucleotide content in a wild-type or reference ORF encoding the same polypeptide (e.g., having the same amino acid sequence). A “wild-type ORF,” as used herein, is the nucleotide sequence of a naturally occurring ORF that encodes the same polypeptide (having the same amino acid sequence) as the ORF of a modified mRNA or modified mRNA sequence, where the naturally occurring ORF is present on a naturally occurring mRNA. A “reference ORF,” as a starting sequence for modification to reduce %CpA dinucleotide content in a modified mRNA sequence, may be a wild-type ORF, or a non-naturally occurring ORF. In some embodiments, an ORF of a modified mRNA or modified mRNA sequence has a %CpA dinucleotide content that is 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, or 30% or less of the %CpA dinucleotide content in a wild-type or reference ORF encoding the same polypeptide. In some embodiments, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of CpA dinucleotides in the wild-type or reference ORF that are not comprised in a codon encoding histidine or glutamine are absent in a modified mRNA sequence encoding the polypeptide. [0059] Some aspects relate to mRNAs comprising an ORF encoding a polypeptide, where the ORF comprises a number of CpA dinucleotides that is greater than or equal to a theoretical minimum, but the number of CpA dinucleotides above (greater than) the theoretical minimum is no more than 11 per every 100 nucleotides of the ORF. 
For example, an mRNA having a theoretical minimum of 20 CpA dinucleotides (due to encoding a polypeptide with a total of 20 histidine and/or glutamine residues), and encoding a protein that is 99 amino acids in length, thus having an ORF 300 nucleotides in length (including the STOP codon), could have 33 CpA dinucleotides above the minimum of 20 and still satisfy the requirement of having no more than 11 CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 10. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 9. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 8. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 7. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than
6. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 5. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 4. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 3. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 2. In some embodiments, the number of CpA dinucleotides per 100 nucleotides of the ORF above the theoretical minimum is no more than 1. [0060] In some embodiments of the modified mRNAs described herein or modified mRNA sequences produced by methods described herein, the proportion of codons encoding a given amino acid is lower than the expected proportion based on codon usage frequencies in nature. For example, approximately 15% of serine residues in human proteins are encoded by codons having the RNA sequence UCA (DNA sequence TCA). Similarly, approximately 27% of proline residues are encoded by CCA codons, approximately 28% of threonine residues are encoded by ACA codons, and approximately 23% of alanine residues are encoded by GCA codons. Thus, in some embodiments, (a) fewer than 15% of serine residues in an encoded polypeptide are encoded by codons comprising the sequence UCA; (b) fewer than 27% of proline residues are encoded by codons comprising the sequence CCA; (c) fewer than 28% of threonine residues are encoded by codons comprising the sequence ACA; and (d) fewer than 23% of alanine residues are encoded by codons comprising the sequence GCA. In some embodiments, fewer than 15%, fewer than 12%, fewer than 10%, fewer than 8%, fewer than 6%, fewer than 5%, fewer than 4%, fewer than 3%, fewer than 2%, or fewer than 1% of serine residues are encoded by UCA codons. 
In some embodiments, fewer than 27%, fewer than 25%, fewer than 20%, fewer than 15%, fewer than 12%, fewer than 10%, fewer than 8%, fewer than 6%, fewer than 5%, fewer than 4%, fewer than 3%, fewer than 2%, or fewer than 1% of proline residues are encoded by CCA codons. In some embodiments, fewer than 28%, fewer than 25%, fewer than 20%, fewer than 15%, fewer than 12%, fewer than 10%, fewer than 8%, fewer than 6%, fewer than 5%, fewer than 4%, fewer than 3%, fewer than 2%, or fewer than 1% of threonine residues are encoded by ACA codons. In some embodiments, fewer than 23%, fewer than 20%, fewer than 15%, fewer than 12%, fewer than 10%, fewer than 8%, fewer than 6%, fewer than 5%, fewer than 4%, fewer than 3%, fewer than 2%, or fewer than 1% of alanine residues are encoded by GCA codons. In some embodiments, fewer than 2% of serine residues are encoded by codons comprising the sequence UCA. In some embodiments, fewer than 12% of proline residues are encoded by
codons comprising the sequence CCA. In some embodiments, fewer than 3% of threonine residues are encoded by codons comprising the sequence ACA. In some embodiments, fewer than 5% of alanine residues are encoded by codons comprising the sequence GCA. In some embodiments, no serine residue is encoded by a codon comprising the RNA sequence UCA. In some embodiments, no proline residue is encoded by a codon comprising the sequence CCA. In some embodiments, no threonine residue is encoded by a codon comprising the sequence ACA. In some embodiments, no alanine residue is encoded by a codon comprising the sequence GCA. In some embodiments, each serine, proline, threonine, and alanine residue is encoded by a codon that does not comprise a CpA dinucleotide. In some embodiments, none of the serine, proline, threonine, and alanine residues is encoded by a codon comprising a CpA dinucleotide. Replacement of codons encoding serine, proline, threonine, and/or alanine is contemplated because such codons may contain CpA dinucleotides in humans, but similar approaches are contemplated for reducing numbers of CpA dinucleotides in mRNAs suitable for introduction into cells with different genetic codes in which other amino acids may be encoded by codons containing CpA dinucleotides. [0061] In some embodiments of the modified mRNAs described herein or modified mRNA sequences produced by methods described herein, the proportion of codons immediately preceding a codon encoding a given amino acid is lower than the expected proportion based on codon usage frequencies in nature. For example, approximately 30% of codons in human open reading frames end in cytidine nucleotides. When such a codon ending in a cytidine (C) nucleotide is immediately followed by a codon encoding isoleucine, methionine, threonine, asparagine, or lysine, which must begin with an adenosine (A) nucleotide, a CpA dinucleotide is formed at the junction between the first (5′) and second (3′) codon. 
While codons encoding isoleucine, methionine, threonine, asparagine, and lysine cannot be mutated to begin with a different nucleotide without changing the encoded amino acid, an upstream codon may be substituted with a codon that does not end in a cytidine nucleotide, to reduce the abundance of CpA dinucleotides formed at the junction between two codons. Similarly, serine may be encoded by codons comprising the sequence AGU or AGC, and arginine may be encoded by codons comprising the sequence AGA or AGG. Therefore, substituting the codons immediately preceding such serine-encoding AGU and AGC codons, and/or such arginine-encoding AGA and AGG codons, may also reduce the abundance of such CpA dinucleotides at the junctions between two codons. Unlike isoleucine, methionine, threonine, asparagine, and lysine, however, serine and arginine may also be encoded by codons that do not begin with adenosine nucleotides. Instead, serine may be encoded by
codons beginning with UC and ending with a guanosine, uridine, or cytidine nucleotide, and arginine may be encoded by codons beginning with CG and ending with any third nucleotide. Thus, codons encoding serine or arginine, and beginning with adenosine nucleotides, may be substituted with alternative codons that encode the same amino acid but do not begin with an adenosine nucleotide. Replacement of codons immediately preceding codons encoding isoleucine, methionine, asparagine, lysine, serine, or arginine, is specifically contemplated because all codons encoding isoleucine, methionine, asparagine, and lysine, and certain codons encoding serine and arginine, begin with adenosine nucleosides in humans, but similar approaches are contemplated for reducing numbers of CpA dinucleotides in mRNAs suitable for introduction into cells with different genetic codes in which other amino acids are encoded by codons beginning with adenosine residues. [0062] In some embodiments of the modified mRNAs described herein or modified mRNA sequences produced by methods described herein, fewer than 30% of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 25% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 20% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 15% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 12% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 10% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. 
In some embodiments, 8% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 6% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 5% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 4% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 3% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 2% or fewer of codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, 1% or fewer of codons
beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. In some embodiments, no codons beginning with an adenosine nucleotide are immediately preceded by a codon ending in a cytidine nucleotide. [0063] In some embodiments, fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, no amino acid that immediately precedes an isoleucine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. [0064] In some embodiments, fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, no amino acid that immediately precedes a methionine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. [0065] In some embodiments, fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. 
In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, no amino acid that immediately precedes a threonine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. [0066] In some embodiments, fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments,
no amino acid that immediately precedes an asparagine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. [0067] In some embodiments, fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, no amino acid that immediately precedes a lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. [0068] In some embodiments, fewer than 30% of amino acids that immediately precede a serine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede a serine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, no amino acid that immediately precedes a serine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. [0069] In some embodiments, fewer than 30% of amino acids that immediately precede an arginine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. In some embodiments, 25% or fewer, 20% or fewer, 15% or fewer, 12% or fewer, 10% or fewer, 8% or fewer, 6% or fewer, 5% or fewer, 4% or fewer, 3% or fewer, 2% or fewer, or 1% or fewer of amino acids that immediately precede an arginine residue in the polypeptide are encoded by codons that end in a cytidine nucleotide. 
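The percentages recited in these embodiments can be checked computationally. The following is a minimal sketch, not a method taken from this disclosure, that reports the fraction of residues immediately preceding a chosen amino acid whose codons end in a cytidine nucleotide. The function names and the example ORF are hypothetical and serve only as illustration.

```python
def split_codons(orf):
    """Split an in-frame ORF into a list of three-nucleotide codons."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def fraction_preceded_by_c_ending(orf, target_codons):
    """Fraction of codons immediately preceding a target codon that end in C.

    target_codons is the set of codons encoding the residue of interest,
    e.g. {"AUU", "AUC", "AUA"} for isoleucine (standard genetic code).
    Returns 0.0 when the ORF contains no target codon after the first codon.
    """
    # Accept DNA or RNA spelling; work in the RNA alphabet.
    codons = split_codons(orf.upper().replace("T", "U"))
    preceding = [codons[i - 1] for i in range(1, len(codons))
                 if codons[i] in target_codons]
    if not preceding:
        return 0.0
    return sum(codon.endswith("C") for codon in preceding) / len(preceding)


# Hypothetical ORF: AUG GCC AUU GCU AUC UAA
ISOLEUCINE = {"AUU", "AUC", "AUA"}
example = "AUGGCCAUUGCUAUCUAA"
print(fraction_preceded_by_c_ending(example, ISOLEUCINE))  # 0.5
```

In the example, the two isoleucine codons are preceded by GCC and GCU, so one of the two preceding codons ends in cytidine. For serine or arginine, passing only the adenosine-initial codons (AGU and AGC, or AGA and AGG) restricts the count to junctions that can actually form a CpA dinucleotide.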
In some embodiments, no amino acid that immediately precedes an arginine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. [0070] In some embodiments, no amino acid that immediately precedes an isoleucine, methionine, threonine, asparagine, or lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide. In some embodiments, no amino acid that immediately precedes a serine or arginine in the polypeptide, where the serine or arginine is encoded by a codon beginning with an adenosine nucleotide, is encoded by a codon that ends in a cytidine nucleotide. [0071] To reduce the number of CpA dinucleotides of an mRNA sequence, a codon comprising a CpA dinucleotide may be substituted with any synonymous codon (i.e., a codon
encoding the same amino acid as the substituted codon) that does not comprise a CpA dinucleotide. Multiple codons comprising CpA dinucleotides may be substituted with the same synonymous codon, or with different synonymous codons. For example, two or more ACA codons may each be substituted with an ACU codon, or one ACA codon may be substituted with an ACC codon and another may be substituted with an ACG codon. Substituting multiple instances of the same codon with different synonymous codons may be useful, for example, to achieve a desired distribution of codons encoding a given amino acid in an mRNA sequence. In some embodiments, 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer UCA codons are substituted with a UCC codon. In some embodiments, 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer UCA codons are substituted with a UCG codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of UCA codons are substituted with a UCC codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of UCA codons are substituted with a UCG codon. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding serine residues are UCU codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding serine residues are UCC codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding serine residues are UCG codons. [0072] In some embodiments, 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer GCA codons are substituted with a GCC codon. 
In some embodiments, 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer GCA codons are substituted with a GCG codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of GCA codons are substituted with a GCC codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of GCA codons are substituted with a GCG codon. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding alanine residues are GCU codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding alanine residues are GCC codons. In some embodiments, the modified mRNA sequence comprises an ORF in
which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding alanine residues are GCG codons. [0073] In some embodiments, 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer ACA codons are substituted with an ACC codon. In some embodiments, 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer ACA codons are substituted with an ACG codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of ACA codons are substituted with an ACC codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of ACA codons are substituted with an ACG codon. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding threonine residues are ACU codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding threonine residues are ACC codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding threonine residues are ACG codons. [0074] In some embodiments, 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer CCA codons are substituted with a CCC codon. In some embodiments, 50% or fewer, 40% or fewer, 30% or fewer, 25% or fewer, 20% or fewer, 15% or fewer, 10% or fewer, or 5% or fewer CCA codons are substituted with a CCG codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of CCA codons are substituted with a CCC codon. In some embodiments, 5–75%, 10–60%, 15–50%, 20–40%, or 25–35% of CCA codons are substituted with a CCG codon. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding proline residues are CCU codons. 
In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding proline residues are CCC codons. In some embodiments, the modified mRNA sequence comprises an ORF in which 5–80%, 10–70%, 15–60%, 20–50%, 25–40%, or 25–35% of codons encoding proline residues are CCG codons. [0075] In some embodiments, substituting multiple instances of a given codon with the same synonymous codon may be useful, for example, to achieve a desired property of an mRNA sequence (e.g., %G/C content). In some embodiments, one or more codons are substituted with codons comprising a higher %G/C content. In some embodiments, 50% or
more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of UCA codons are substituted with codons comprising either UCC or UCG. In some embodiments, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CCA codons are substituted with codons comprising either CCC or CCG. In some embodiments, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of ACA codons are substituted with codons comprising either ACC or ACG. In some embodiments, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of GCA codons are substituted with codons comprising either GCC or GCG. [0076] In some embodiments, one or more codons are substituted with codons comprising an equal %G/C content. In some embodiments, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of UCA codons are substituted with UCU codons. In some embodiments, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CCA codons are substituted with CCU codons. In some embodiments, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of ACA codons are substituted with ACU codons. In some embodiments, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of GCA codons are substituted with GCU codons. 
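The synonymous substitutions described in the preceding paragraphs can be illustrated with a short sketch. It is an assumption-laden example rather than the method of this disclosure: the table `SYNONYMS`, the function `substitute_cpa_codons`, and the round-robin choice of replacement codon are all hypothetical. Only the four CpA-containing codons named above (UCA, CCA, ACA, GCA) are handled.

```python
from itertools import cycle

# Replacement codons named in the text for each CpA-containing codon.
SYNONYMS = {
    "UCA": ["UCU", "UCC", "UCG"],  # serine
    "CCA": ["CCU", "CCC", "CCG"],  # proline
    "ACA": ["ACU", "ACC", "ACG"],  # threonine
    "GCA": ["GCU", "GCC", "GCG"],  # alanine
}


def substitute_cpa_codons(orf, choices=SYNONYMS):
    """Replace each listed codon with a synonymous codon lacking CpA.

    Replacements for a given codon are taken in rotation from its list,
    which spreads substitutions across the synonymous codons. Passing a
    one-element list (e.g. {"GCA": ["GCU"]}) applies a single replacement,
    which keeps %G/C content unchanged for U-ending codons.
    """
    orf = orf.upper().replace("T", "U")
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    pickers = {codon: cycle(alts) for codon, alts in choices.items()}
    out = []
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        out.append(next(pickers[codon]) if codon in pickers else codon)
    return "".join(out)


# Hypothetical ORF: AUG ACA ACA GCA UAA
print(substitute_cpa_codons("AUGACAACAGCAUAA"))  # AUGACUACCGCUUAA
```

This sketch changes codons independently of their neighbors. A replacement codon ending in cytidine (for example, ACC) creates a new CpA dinucleotide if the next codon begins with adenosine, so a fuller implementation would also check codon junctions.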
[0077] In addition to substituting codons to reduce the abundance of CpA dinucleotides in the ORF of an mRNA, CpA dinucleotide abundance may also be reduced by substituting nucleotides in untranslated regions (UTRs) of an mRNA, such as a 5′ UTR or 3′ UTR. The extent to which mRNA stability may be improved by substituting one or more nucleotides of the 5′ UTR or 3′ UTR depends on the abundance of CpA dinucleotides in the sequence of unmodified UTRs. In some embodiments, 50% or more, 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a 5′ UTR are removed by substitution. In some embodiments, 50% or more, 60% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a 3′ UTR are removed by substitution. Removing one
or more CpA dinucleotides from an mRNA sequence may be achieved by substituting the cytidine nucleotide, the adenosine nucleotide, or both nucleotides of a CpA dinucleotide with different nucleotides, provided that the substitution does not introduce a new CpA dinucleotide into the sequence. For example, substituting the first adenosine nucleotide in the sequence CAA with a cytidine nucleotide would produce the sequence CCA, which contains the same number of CpA dinucleotides, and thus an alternative substitution would be required to reduce the number of CpA dinucleotides in this sequence. [0078] In some embodiments of the modified mRNAs described herein or modified mRNA sequences produced by methods described herein, the modified mRNA comprises a 5′ UTR that does not comprise a CpA dinucleotide. In some embodiments, an mRNA described herein comprises a 3′ UTR that does not comprise a CpA dinucleotide. In some embodiments, the only CpA dinucleotides present in an mRNA sequence are located in codons encoding histidine or glutamine residues. [0079] In some embodiments, an mRNA sequence comprises one or more CpA dinucleotides that are present in regulatory motifs. In some embodiments, the 5′ UTR comprises 5 or fewer, 4 or fewer, 3 or fewer, 2 or fewer, 1 or fewer, or 0 CpA dinucleotides. In some embodiments, the 5′ UTR comprises no more than five CpA dinucleotides. In some embodiments, the 5′ UTR comprises no more than four CpA dinucleotides. In some embodiments, the 5′ UTR comprises no more than three CpA dinucleotides. In some embodiments, the 5′ UTR comprises no more than two CpA dinucleotides. In some embodiments, the 5′ UTR comprises no more than one CpA dinucleotide. In some embodiments, the 5′ UTR does not comprise a CpA dinucleotide. In some embodiments, the 3′ UTR comprises 5 or fewer, 4 or fewer, 3 or fewer, 2 or fewer, 1 or fewer, or 0 CpA dinucleotides. In some embodiments, the 3′ UTR comprises no more than five CpA dinucleotides. 
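The requirement that a UTR substitution not introduce a new CpA dinucleotide, illustrated above with the CAA example, can be checked mechanically. The sketch below uses hypothetical helper names (`count_cpa`, `substitute`, `reduces_cpa`) and is offered only as an illustration of that check.

```python
def count_cpa(seq):
    """Count CpA dinucleotides (a C immediately followed by an A)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CA")


def substitute(seq, position, new_base):
    """Return seq with the nucleotide at a 0-based position replaced."""
    return seq[:position] + new_base + seq[position + 1:]


def reduces_cpa(seq, position, new_base):
    """True only if the substitution lowers the total CpA count."""
    return count_cpa(substitute(seq, position, new_base)) < count_cpa(seq)


# The example from the text: CAA -> CCA keeps one CpA dinucleotide.
print(count_cpa("CAA"), count_cpa(substitute("CAA", 1, "C")))  # 1 1
print(reduces_cpa("CAA", 1, "C"))  # False
print(reduces_cpa("CAA", 1, "G"))  # True (CGA has no CpA)
```

The same `count_cpa` function can be applied to a complete 5′ UTR or 3′ UTR to test the "no more than N CpA dinucleotides" conditions recited in these embodiments.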
In some embodiments, the 3′ UTR comprises no more than four CpA dinucleotides. In some embodiments, the 3′ UTR comprises no more than three CpA dinucleotides. In some embodiments, the 3′ UTR comprises no more than two CpA dinucleotides. In some embodiments, the 3′ UTR comprises no more than one CpA dinucleotide. In some embodiments, the 3′ UTR does not comprise a CpA dinucleotide. In some embodiments, the last nucleotide of the 5′ UTR (immediately preceding the AUG start codon) is not a cytidine nucleotide. In some embodiments, the last nucleotide of the 3′ UTR (immediately preceding the polyA tail) is not a cytidine nucleotide. [0080] Some embodiments of mRNAs described herein, and modified mRNAs made by described methods, comprise a sequence with a %G/C content of 30% – 80%, 40% – 70%,
50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%. In some embodiments, the nucleic acid sequence of the full-length mRNA comprises a %G/C content of 30% – 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%. In some embodiments, the mRNA comprises an ORF with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%. In some embodiments, the mRNA comprises a 5′ UTR with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%. In some embodiments, the mRNA comprises a 3′ UTR with a %G/C content from about 30% to about 80%, about 35% to about 70%, about 40% to about 60%, about 45% to about 55%, about 40% to about 70%, about 50% to about 60%, about 35% to about 50%, about 50% to about 65%, about 65% to about 70%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 70%, about 70% to about 75%, or about 75% to about 80%. In some embodiments, a modified mRNA made by a method described herein comprises a higher %G/C content than a reference mRNA sequence. 
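For illustration, %G/C content and its change relative to a reference sequence can be computed as follows. This is a minimal sketch; the function names and the example sequences are hypothetical, and the difference is reported in percentage points, which is one possible reading of "higher %G/C content".

```python
def gc_percent(seq):
    """Percentage of nucleotides in seq that are guanosine or cytidine."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_increase(modified, reference):
    """Difference in %G/C content (percentage points), modified minus reference."""
    return gc_percent(modified) - gc_percent(reference)


# Hypothetical reference and modified sequences (ACA -> ACC, GCA -> GCC).
reference = "AUGACAGCAUAA"
modified = "AUGACCGCCUAA"
print(round(gc_percent(reference), 1))   # 33.3
print(round(gc_percent(modified), 1))    # 50.0
print(round(gc_increase(modified, reference), 1))  # 16.7
```

The same function applies equally to a full-length mRNA, an ORF, or a UTR, provided the sequence passed in covers only the region of interest.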
In some embodiments, the %G/C content of the modified mRNA sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more higher than the %G/C content of the reference mRNA sequence. In some embodiments, the %G/C content of the modified ORF sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more higher than the %G/C content of the reference ORF sequence. In some embodiments, the %G/C content of the modified 5′ UTR sequence is 2% or more, 3% or more, 4% or more, 5% or more, 6% or more, 7% or more, 8% or more, 9% or more, 10% or more, 12% or more, 15% or more, or 20% or more higher than the %G/C content of the reference 5′ UTR sequence. [0081] Some embodiments of mRNAs described herein, and modified mRNAs made by described methods, express one or more encoded proteins in a mammalian cell at a level that
is at least 50% of the level of expression of a reference mRNA encoding a protein with the same amino acid sequence, but containing a higher number of CpA dinucleotides. Expression of an encoded protein may refer to the number of copies of an encoded polypeptide produced by translation of a given mRNA molecule. Typically, a reduction in the level of an mRNA (e.g., by mRNA cleavage) results in a reduction in the level of a polypeptide translated therefrom. The level of expression may be determined using standard techniques for measuring protein. In some embodiments, an mRNA has a level of expression in a mammalian cell that is at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or at least 100% of the level of expression of a reference mRNA encoding a protein with the same amino acid sequence, but containing a higher number of CpA dinucleotides. Examples of mammalian cells for use in evaluating expression of an mRNA include, without limitation, cells of humans, mice, rats, hamsters, guinea pigs, cats, dogs, chimpanzees, macaques, baboons, and gorillas. In some embodiments, the mammalian cell is a human cell. [0082] Some embodiments of the mRNAs described herein or produced by a method described herein are stable for longer periods of time than reference mRNAs having higher numbers of CpA dinucleotides but encoding a protein with the same amino acid sequence. In some embodiments, the modified mRNA has a coefficient of degradation below a threshold value. As used herein, a “coefficient of degradation” refers to a parameter of an equation describing the loss of nucleic acid purity over time. As used herein, “nucleic acid purity” refers to the percentage of nucleic acid in a composition having a desired sequence and structure. Compositions may be prepared using nucleic acids having a specific sequence encoding a protein to be expressed in cells. 
During storage, the nucleic acid may be degraded by environmental factors such as water or nucleases. Water molecules can hydrolyze the phosphodiester bond that bridges a phosphate moiety and sugar moiety in the sugar-phosphate backbone of a nucleic acid, resulting in the production of two separate nucleic acid molecules, neither of which contains an intact sequence encoding the full-length protein encoded by the unhydrolyzed nucleic acid. Nucleases are enzymes that can facilitate this process, but nucleic acids are susceptible to degradation by water molecules even in the absence of environmental nucleases. Nucleic acid purity may be measured by any one of multiple methods known in the art, such as mass spectrometry or high-performance liquid chromatography (HPLC) (see, e.g., Papadoyannis et al., J Liq Chrom Relat Tech. 2007. 27(6):1083–1092). In HPLC, a sample to be analyzed, such as nucleic acid, is dissolved in a solvent (mobile phase) and passed through a column containing a solid material (stationary
phase), with a detector measuring the presence of dissolved sample molecules as the mobile phase is eluted from the column. The rate at which molecules of the sample move through the stationary phase depends on multiple factors, including size, such that different components of the sample will be observed at different times. A sample containing 100% pure nucleic acid will produce a single peak (main peak) on a chromatogram when analyzed by HPLC, while a sample containing multiple different nucleic acid molecules will produce multiple peaks, including a main peak and one or more impurity peaks, for a total of N peaks. To calculate the purity of a nucleic acid using HPLC analysis, the area under the curve (A.U.C.) of each of N peaks is calculated by integration, and the percent purity is calculated using the equation $\%\ \text{purity} = \dfrac{\text{A.U.C.}(\text{main peak})}{\sum_{n=1}^{N} \text{A.U.C.}(\text{peak}_n)} \times 100$. [0083] Loss of nucleic acid purity over time may be described by a differential equation of the form $\dfrac{dP}{dt} = -\lambda P$, where P is nucleic acid purity (%), λ is the coefficient of degradation, and dP/dt is the rate of change in nucleic acid purity. Alternatively, nucleic acid purity over time may be described by an equation of the form $P(t) = P_0 e^{-\lambda t}$, where P(t) is nucleic acid purity (%) at a given time, t, P0 is initial nucleic acid purity at time t=0, e is the base of the natural logarithm, and λ is the coefficient of degradation. In both equation forms, a positive value of λ indicates exponential decay, while a negative λ indicates exponential growth, with larger absolute values of λ indicating faster decay or growth, respectively. In some embodiments, the coefficient of degradation is expressed in units of day⁻¹. In some embodiments, the modified mRNA has a coefficient of degradation at 25 °C that is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. 
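The purity calculation and the exponential model above can be sketched in code. The example below is an illustration under stated assumptions: the function names are hypothetical, the peak areas and time points are invented for demonstration, and the coefficient of degradation is estimated by an ordinary least-squares fit of ln P against t, which is one common way to fit an exponential decay and is not stated in this disclosure.

```python
import math


def percent_purity(main_peak_area, peak_areas):
    """Percent purity: main-peak area over the summed area of all N peaks.

    peak_areas must include the main peak itself.
    """
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * main_peak_area / total


def fit_degradation_coefficient(times_days, purities):
    """Estimate lambda (day^-1) from P(t) = P0 * exp(-lambda * t).

    Taking logarithms gives ln P = ln P0 - lambda * t, so lambda is the
    negative of the least-squares slope of ln P against t.
    """
    n = len(times_days)
    if n < 2 or n != len(purities):
        raise ValueError("need at least two matched time/purity points")
    logs = [math.log(p) for p in purities]
    t_mean = sum(times_days) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_days)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_days, logs))
    return -sxy / sxx


# Invented chromatogram: main peak plus two impurity peaks.
print(percent_purity(90.0, [90.0, 6.0, 4.0]))  # 90.0

# Invented purity series generated with lambda = 0.01 day^-1.
times = [0, 10, 20, 30]
series = [95.0 * math.exp(-0.01 * t) for t in times]
print(round(fit_degradation_coefficient(times, series), 4))  # 0.01
```

Fitting the same model to a modified mRNA and to an mRNA comprising a wild-type ORF, stored under the same conditions, gives two coefficients whose ratio corresponds to the "90% or less" through "50% or less" comparisons recited above.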
In some embodiments, the coefficient of degradation of the modified mRNA at a temperature of 2 °C – 8 °C is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the coefficient of degradation of the modified mRNA is 90% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the coefficient of degradation of the modified mRNA is 80% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the coefficient of degradation of the modified mRNA is 70% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the coefficient of degradation of the modified mRNA is 60% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the
coefficient of degradation of the modified mRNA is 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide. [0084] In some embodiments, the decrease in degradation coefficient is calculated with respect to storage of modified mRNAs in the absence of lipid nanoparticles. In some embodiments, the decrease in degradation coefficient is calculated with respect to storage of modified mRNAs in a buffer lacking lipid nanoparticles. In some embodiments, the buffer comprises 10–100 mM Tris. In some embodiments, the buffer comprises 5–10% sucrose. In some embodiments, the buffer has a pH of about 7.3 to about 7.6. In some embodiments, the buffer comprises 10–100 mM Tris, 5–10% sucrose, and has a pH of 7.3 to 7.6. In some embodiments, the decrease in degradation coefficient is calculated with respect to storage of mRNAs formulated in lipid nanoparticles. The lipid nanoparticles may be any lipid nanoparticle described herein. Alternatively, the lipid nanoparticles may be another lipid nanoparticle known in the art. [0085] In some embodiments, reduction in degradation coefficient is measured in mRNAs having an ORF of a length in a specific range, as it is understood that the length of an mRNA affects stability during storage (e.g., shorter mRNAs are less susceptible to degradation than longer mRNAs). In some embodiments, the modified mRNA having a reduced degradation coefficient comprises an ORF that is 100–500, 500–1,000, 1,000–2,000, 2,000–3,000, 3,000–5,000, 100–5,000, 100–2,500, 100–1,500, 100–1,000, 500–5,000, 500–2,500, 500–1,000, 1,000–5,000, 1,000–4,000, 1,000–3,000, 1,000–2,000, 2,000–5,000, or 3,000–4,000 nucleotides in length. In some embodiments, the modified mRNA having a reduced degradation coefficient comprises an ORF that is 300–5,000 nucleotides in length. In some embodiments, the modified mRNA having a reduced degradation coefficient comprises an ORF that is 300–1,500 nucleotides in length. 
In some embodiments, the modified mRNA having a reduced degradation coefficient comprises an ORF that is 1,500–3,000 nucleotides in length. In some embodiments, the modified mRNA having a reduced degradation coefficient comprises an ORF that is 3,000–5,000 nucleotides in length. [0086] [0087] In some embodiments, the nucleic acid degrades (e.g., as measured by capillary electrophoresis) about 2% or less per month during storage, such as about 1% or less, about 0.75% or less, about 0.5% or less, about 0.4% or less, about 0.3% or less, about 0.2% or less, or about 0.1% or less per month during storage (e.g., at 4 °C). In some embodiments, the methods comprise producing compositions comprising modified nucleic acid, where the
modified nucleic acid in the composition is at least 50% pure (such as about 50% pure, about 55% pure, about 60% pure, about 65% pure, about 70% pure, or about 75% pure or more) after storage at 0 °C or more (such as 0 °C, 2 °C, 4 °C, 5 °C, 8 °C, 10 °C, 15 °C, 20 °C, 25 °C, or 2–8 °C) for a given length of time. The length of time for which a composition will comprise at least 50% pure nucleic acid can be predicted by measuring a) the initial purity of the nucleic acid in a composition, and b) the coefficient of degradation of nucleic acid, as described above, then using the equation $P(t) = P_0 e^{-\lambda t}$ to calculate the value of t at which P(t) = 50% or 0.5. This length of time is given by the formula $t = \dfrac{\ln 50 - \ln P_0}{-\lambda}$ if P0 is expressed as a percentage or $t = \dfrac{\ln 0.5 - \ln P_0}{-\lambda}$ if P0 is expressed as a proportion. [0088] In some embodiments, a composition comprising a plurality of the modified mRNAs remains above 50% purity (such as about 50% pure, about 55% pure, about 60% pure, about 65% pure, about 70% pure, or about 75% pure or more) for at least 30 days, at least 40 days, at least 50 days, at least 60 days, at least 75 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of mRNA comprising a wild-type ORF encoding the polypeptide. In some embodiments, the increase in duration of maintenance above 50% purity is during storage of modified mRNAs in the absence of lipid nanoparticles. In some embodiments, the increase in duration of maintenance above 50% purity is during storage of modified mRNAs in a buffer lacking lipid nanoparticles. In some embodiments, the buffer comprises 10–100 mM Tris. In some embodiments, the buffer comprises 5–10% sucrose. In some embodiments, the buffer has a pH of about 7.3 to about 7.6. In some embodiments, the buffer comprises 10–100 mM Tris, 5–10% sucrose, and has a pH of 7.3 to 7.6. 
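The time for a composition to fall to a purity threshold follows directly from the exponential model: setting P(t) equal to the threshold and solving for t gives t = (ln P0 − ln threshold) / λ. The sketch below implements that rearrangement. The function name and the numerical values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def days_until_purity(initial_purity, degradation_coefficient, threshold=50.0):
    """Days until purity decays from initial_purity to threshold.

    initial_purity and threshold must use the same scale (both percentages
    or both proportions); degradation_coefficient is lambda in day^-1.
    """
    if degradation_coefficient <= 0:
        raise ValueError("lambda must be positive for decay")
    if initial_purity <= threshold:
        return 0.0  # already at or below the threshold
    return (math.log(initial_purity) - math.log(threshold)) / degradation_coefficient


# Illustrative values only: a coefficient giving a 30-day half-life,
# and a modified mRNA whose coefficient is 50% of that value.
lam_reference = math.log(2) / 30
lam_modified = 0.5 * lam_reference

t_ref = days_until_purity(100.0, lam_reference)
t_mod = days_until_purity(100.0, lam_modified)
print(round(t_ref, 1), round(t_mod, 1), round(t_mod - t_ref, 1))  # 30.0 60.0 30.0
```

Because t is inversely proportional to λ, halving the coefficient of degradation doubles the time above the threshold. In this illustrative case, the modified composition would remain above 50% purity for 30 days longer than the reference.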
In some embodiments, the increased duration of maintenance above 50% purity is during storage of mRNAs formulated in lipid nanoparticles. The lipid nanoparticles may be any lipid nanoparticle described herein. Alternatively, the lipid nanoparticles may be another lipid nanoparticle known in the art. In some embodiments, improved stability is measured in mRNAs having an ORF of a length in a specific range, as it is understood that the length of an mRNA affects stability during storage (e.g., longer mRNAs are less stable than shorter mRNAs). In some embodiments, the mRNA having improved stability comprises an ORF that is 100–500, 500–1,000, 1,000–2,000, 2,000–3,000, 3,000–5,000, 100–5,000, 100–2,500, 100–1,500, 100–1,000, 500–5,000, 500–2,500, 500–1,000, 1,000–5,000, 1,000–4,000, 1,000–3,000, 1,000–2,000, 2,000–5,000, or 3,000–4,000 nucleotides in length. In some embodiments, the mRNA having improved stability comprises an ORF that is 300–5,000 nucleotides in length. In some
embodiments, the mRNA having improved stability comprises an ORF that is 300–1,500 nucleotides in length. In some embodiments, the mRNA having improved stability comprises an ORF that is 1,500–3,000 nucleotides in length. In some embodiments, the mRNA having improved stability comprises an ORF that is 3,000–5,000 nucleotides in length. [0089] In some embodiments, the storage is conducted at a temperature between about 2 °C and about 40 °C. In some embodiments, the storage is conducted at a temperature between about 22 °C and about 28 °C. In some embodiments, the storage is conducted at about 25 °C. In some embodiments, the storage is conducted at a temperature between about 2 °C and about 15 °C. In some embodiments, the storage is conducted at a temperature between about 2 °C and about 8 °C. In some embodiments, the storage is conducted at about 3 °C. In some embodiments, the storage is conducted at about 5 °C. Degradation of nucleic acids is a chemical reaction that occurs more readily at higher temperatures, and as such the coefficient of degradation and kinetics of purity depend on the temperature at which nucleic acids are stored. [0090] In some embodiments, the stability of a modified mRNA is evaluated by storing the mRNA in a buffer with a defined composition. In some embodiments, the mRNA is stored in a buffer comprising 10–100 mM Tris. In some embodiments, the mRNA is stored in a buffer comprising 5–10% sucrose. In some embodiments, the mRNA is stored in a buffer having a pH of about 7.3 to about 7.6. In some embodiments, the storage buffer comprises 10–100 mM Tris, 5–10% sucrose, and a pH of 7.3 to 7.6.
Codon optimization
[0091] In some embodiments, an mRNA is codon-optimized. Codon optimization methods are known in the art. 
Codon optimization, in some embodiments, may be used to match codon frequencies in target and host organisms to ensure proper folding; bias %G/C content to increase mRNA thermodynamic stability or reduce secondary structures; minimize tandem repeat codons or base runs that may impair gene construction or expression; customize transcriptional and translational control regions; insert or remove protein trafficking sequences; remove/add post translation modification sites in encoded protein (e.g., glycosylation sites); add, remove or shuffle protein domains; insert or delete restriction sites; modify ribosome binding sites and mRNA degradation sites; adjust translational rates to allow the various domains of the protein to fold properly; or reduce or eliminate problem secondary structures within the polynucleotide. Codon optimization tools, algorithms and services are known in the art – non-limiting examples include services from GeneArt (Life
Technologies), DNA2.0 (Menlo Park CA) and/or proprietary methods. In some embodiments, the open reading frame (ORF) sequence is optimized using optimization algorithms. [0092] In some embodiments, a codon optimized sequence shares less than 95% sequence identity to a naturally-occurring or wild-type ORF sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). In some embodiments, a codon optimized sequence shares less than 90% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). In some embodiments, a codon optimized sequence shares less than 85% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). In some embodiments, a codon optimized sequence shares less than 80% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). In some embodiments, a codon optimized sequence shares less than 75% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). [0093] In some embodiments, a codon optimized sequence shares between 65% and 85% (e.g., between about 67% and about 85% or between about 67% and about 80%) sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). In some embodiments, a codon optimized sequence shares between 65% and 75% or about 80% sequence identity to a naturally-occurring or wild-type sequence (e.g., a naturally-occurring or wild-type mRNA sequence encoding the polypeptide). 
[0094] When transfected into mammalian host cells, some embodiments of modified mRNAs have a stability of between 12-18 hours, or greater than 18 hours, e.g., 24, 36, 48, 60, 72, or greater than 72 hours and are capable of being expressed by the mammalian host cells. [0095] In some embodiments, a codon optimized RNA may be one in which the levels of GC are enhanced. The G/C-content of nucleic acid molecules (e.g., mRNA) may influence the stability of the RNA. RNA having an increased amount of guanine (G) and/or cytosine (C) residues may be more thermodynamically stable than RNA containing a large amount of adenine (A) and thymine (T) or uracil (U) nucleotides. As an example, WO02/098443 discloses a pharmaceutical composition containing an mRNA stabilized by sequence modifications in the translated region. Due to the degeneracy of the genetic code, the modifications work by substituting existing codons for those that promote greater RNA
stability without changing the resulting amino acid. The approach is limited to coding regions of the RNA. [0096] In some embodiments, one or more cytidine or adenosine nucleotides of a CpA dinucleotide comprises a modified nucleotide. In some embodiments, one or more cytidine nucleotides of a CpA dinucleotide comprises a modified nucleotide. Without wishing to be bound by any particular theory, it is believed that the substitution of a conventional cytidine or adenosine nucleotide for a modified cytidine or adenosine nucleotide, respectively, is useful for reducing the susceptibility of the internucleoside linkage of a CpA dinucleotide to hydrolysis. Such substitutions are useful, for example, to improve mRNA stability where CpA dinucleotides are necessary, such as in codons encoding histidine or glutamine or in regulatory motifs (e.g., Kozak sequence). In some embodiments, 10% or more, 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a modified mRNA sequence comprise a modified cytidine nucleotide and/or a modified adenosine nucleotide. In some embodiments, 10% or more, 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a modified mRNA sequence comprise a modified cytidine nucleotide. In some embodiments, 10% or more, 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a modified mRNA sequence comprise a modified adenosine nucleotide. 
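As a non-limiting illustration of the percentages recited above, the following minimal Python sketch locates CpA dinucleotides in an mRNA sequence and computes the percentage of them that comprise a modified cytidine and/or a modified adenosine. Representing modified nucleotides as a set of zero-based positions is an assumption made for this example only, and the function names and the toy sequence are hypothetical.

```python
def cpa_positions(mrna: str) -> list:
    """Zero-based indices of the C in every CpA dinucleotide."""
    seq = mrna.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CA"]


def percent_modified_cpa(mrna: str, modified_positions: set) -> float:
    """Percentage of CpA dinucleotides whose C and/or A sits at a modified position.

    modified_positions is a hypothetical representation: the set of
    zero-based indices of nucleotides that are modified.
    """
    sites = cpa_positions(mrna)
    if not sites:
        return 0.0
    modified = sum(
        1 for i in sites
        if i in modified_positions or (i + 1) in modified_positions
    )
    return 100.0 * modified / len(sites)


# Hypothetical toy ORF: Met-His-Gln-stop. The His (CAC) and Gln (CAG)
# codons each necessarily contain a CpA dinucleotide.
example = "AUGCACCAGUAA"

sites = cpa_positions(example)                  # C at index 3 and C at index 6
half = percent_modified_cpa(example, {3})       # only the first CpA is modified
full = percent_modified_cpa(example, {3, 7})    # C of the first, A of the second
```

Restricting the count to modified cytidines only, or to modified adenosines only, would amount to testing `i` or `i + 1` alone in the membership check.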
In some embodiments, 10% or more, 20% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or up to 100% of CpA dinucleotides in a modified mRNA sequence comprise a modified cytidine nucleotide and a modified adenosine nucleotide. [0097] Multiple cytidine nucleotides may be substituted with the same or different modified cytidine nucleotides, and multiple adenosine nucleotides may be substituted with the same or different modified adenosine nucleotides. A modified cytidine nucleotide refers to a nucleotide comprising a structure different from the conventional structure of cytidine monophosphate (CMP) in an mRNA, but is still capable of hydrogen bonding with guanine (e.g., guanine of a guanosine nucleotide on a tRNA). A modified adenosine nucleotide refers to a nucleotide comprising a structure different from the conventional structure of adenosine monophosphate (AMP) in an mRNA, but is still capable of hydrogen bonding with uracil
(e.g., uracil of a uridine nucleotide on a tRNA). A modified cytidine nucleotide may comprise a modified cytosine nucleobase (i.e., nucleobase that is capable of hydrogen bonding with guanine but has a different structure than canonical cytosine), a modified sugar (i.e., sugar other than ribose), and/or a modified phosphate (i.e., internucleoside linkage different from the canonical phosphate structure). Similarly, a modified adenosine nucleotide may comprise a modified adenine nucleobase (i.e., nucleobase that is capable of hydrogen bonding with uracil but has a different structure than canonical adenine), a modified sugar, and/or a modified phosphate. Non-limiting examples of modified nucleotides, including examples of modified nucleobases, modified sugars, and modified phosphates, are described in the section below entitled “Nucleic acids.” Nucleic acids [0098] Some aspects relate to compositions comprising nucleic acids and methods of producing nucleic acids. As used herein, the term “nucleic acid” includes multiple nucleotides (i.e., molecules comprising a sugar (e.g., ribose or deoxyribose) linked to a phosphate group and to an exchangeable organic base, which is either a substituted pyrimidine (e.g., cytosine (C), thymine (T) or uracil (U)) or a substituted purine (e.g., adenine (A) or guanine (G))). The term nucleic acid includes polyribonucleotides as well as polydeoxyribonucleotides. The term nucleic acid also includes polynucleosides (i.e., a polynucleotide minus the phosphate) and any other organic base containing polymer. Non- limiting examples of nucleic acids include chromosomes, genomic loci, genes, or gene segments that encode polynucleotides or polypeptides, coding sequences, non-coding sequences (e.g., intron, 5′-UTR, or 3′-UTR) of a gene, pri-mRNA, pre-mRNA, cDNA, mRNA, etc. A nucleic acid (e.g., mRNA) may include a substitution and/or modification. In some embodiments, the substitution and/or modification is in one or more bases and/or sugars. 
For example, in some embodiments a nucleic acid (e.g., mRNA) includes nucleotides having an organic group, such as a methyl group, attached to a nucleic acid base at the N6 position. Thus, in some embodiments, an mRNA includes one or more N6-methyladenosine nucleotides. A phosphate, sugar, or nucleic acid base of a nucleotide may also be replaced with another phosphate, sugar, or nucleic acid base. For example, a uridine base may be replaced with a pseudouridine base, in which the uracil base is attached to the sugar by a carbon-carbon bond rather than a nitrogen-carbon bond. Thus, in some embodiments, a nucleic acid (e.g., mRNA) is heterogeneous in backbone composition thereby containing any
possible combination of polymer units linked together such as peptide-nucleic acids (which have an amino acid backbone with nucleic acid bases). [0099] The nucleic acids described herein may include nucleic acid sequences that have been removed from their naturally occurring environment, recombinant or cloned DNA isolates, and chemically synthesized analogues or analogues biologically synthesized by heterologous systems. [0100] An “engineered nucleic acid” is a nucleic acid that does not occur in nature. It should be understood, however, that while an engineered nucleic acid as a whole is not naturally-occurring, it may include nucleotide sequences that occur in nature. In some embodiments, an engineered nucleic acid comprises nucleotide sequences from different organisms (e.g., from different species). For example, in some embodiments, an engineered nucleic acid includes a bacterial nucleotide sequence, a human nucleotide sequence, and/or a viral nucleotide sequence. [0101] Engineered nucleic acids include recombinant nucleic acids and synthetic nucleic acids. A “recombinant nucleic acid” is a molecule that is constructed by joining nucleic acids (e.g., isolated nucleic acids, synthetic nucleic acids, or a combination thereof) and, in some embodiments, can replicate in a living cell. A “synthetic nucleic acid” is a molecule that is amplified or chemically, or by other means, synthesized. A synthetic nucleic acid includes those that are chemically modified, or otherwise modified, but can base pair with naturally-occurring nucleic acid molecules. Recombinant and synthetic nucleic acids also include those molecules that result from the replication of either of the foregoing. A nucleic acid may comprise naturally occurring nucleotides and/or non-naturally occurring nucleotides such as modified nucleotides. [0102] In some embodiments, a nucleic acid is present in (or on) a vector.
Examples of vectors include but are not limited to bacterial plasmids, phage, cosmids, phasmids, fosmids, bacterial artificial chromosomes, yeast artificial chromosomes, viruses, and retroviruses (for example vaccinia, adenovirus, adeno-associated virus, lentivirus, herpes-simplex virus, Epstein-Barr virus, fowlpox virus, pseudorabies, baculovirus) and vectors derived therefrom. In some embodiments, a nucleic acid (e.g., DNA) used as an input molecule for in vitro transcription (IVT) is present in a plasmid vector. [0103] When applied to a nucleic acid sequence, the term “isolated” denotes that the polynucleotide sequence has been removed from its natural genetic milieu and is thus free of other extraneous or unwanted coding sequences (but may include naturally occurring 5′ and 3′ untranslated regions such as promoters and terminators), and is in a form suitable for use
within genetically engineered protein production systems. Such isolated molecules are those that are separated from their natural environment. [0104] The terms 5′ and 3′ are used herein to describe features of a nucleic acid sequence related to either the position of genetic elements and/or the direction of events (5′ to 3′), such as e.g. transcription by RNA polymerase or translation by the ribosome which proceeds in 5′ to 3′ direction. Synonyms are upstream (5′) and downstream (3′). Conventionally, DNA sequences, gene maps, vector cards and RNA sequences are drawn with 5′ to 3′ from left to right or the 5′ to 3′ direction is indicated with arrows, wherein the arrowhead points in the 3′ direction. Accordingly, 5′ (upstream) indicates genetic elements positioned towards the left- hand side, and 3′ (downstream) indicates genetic elements positioned towards the right-hand side, when following this convention. [0105] A nucleic acid (e.g., mRNA) typically comprises a plurality of nucleotides. A nucleotide includes a nitrogenous base, a five-carbon sugar (ribose or deoxyribose), and at least one phosphate group. Nucleotides include nucleoside monophosphates, nucleoside diphosphates, and nucleoside triphosphates. A nucleoside monophosphate (NMP) includes a nucleobase linked to a ribose and a single phosphate; a nucleoside diphosphate (NDP) includes a nucleobase linked to a ribose and two phosphates; and a nucleoside triphosphate (NTP) includes a nucleobase linked to a ribose and three phosphates. Nucleotide analogs are compounds that have the general structure of a nucleotide or are structurally similar to a nucleotide. Nucleotide analogs, for example, include an analog of the nucleobase, an analog of the sugar and/or an analog of the phosphate group(s) of a nucleotide. [0106] A nucleoside includes a nitrogenous base and a 5-carbon sugar. Thus, a nucleoside plus a phosphate group yields a nucleotide. 
Nucleoside analogs are compounds that have the general structure of a nucleoside or are structurally similar to a nucleoside. Nucleoside analogs, for example, include an analog of the nucleobase and/or an analog of the sugar of a nucleoside. [0107] It should be understood that the term “nucleotide” includes naturally-occurring nucleotides, synthetic nucleotides and modified nucleotides, unless indicated otherwise. Examples of naturally-occurring nucleotides used for the production of RNA, e.g., in an IVT reaction, as described herein include adenosine triphosphate (ATP), guanosine triphosphate (GTP), cytidine triphosphate (CTP), uridine triphosphate (UTP), and 5-methyluridine triphosphate (m5UTP). In some embodiments, adenosine diphosphate (ADP), guanosine diphosphate (GDP), cytidine diphosphate (CDP), and/or uridine diphosphate (UDP) are used.
[0108] Examples of nucleotide analogs include, but are not limited to, antiviral nucleotide analogs, phosphate analogs (soluble or immobilized, hydrolyzable or non-hydrolyzable), dinucleotide, trinucleotide, tetranucleotide, e.g., a cap analog, or a precursor/substrate for enzymatic capping (vaccinia or ligase), a nucleotide labeled with a functional group to facilitate ligation/conjugation of cap or 5′ moiety (IRES), a nucleotide labeled with a 5′ PO4 to facilitate ligation of cap or 5′ moiety, or a nucleotide labeled with a functional group/protecting group that can be chemically or enzymatically cleaved. Examples of antiviral nucleotide/nucleoside analogs include, but are not limited to, Ganciclovir, Entecavir, Telbivudine, Vidarabine and Cidofovir. [0109] Modified nucleotides may include modified nucleobases. For example, an RNA transcript (e.g., mRNA transcript) described herein may include a modified nucleobase selected from pseudouracil (ψ), N1-methylpseudouracil (m1ψ), 1-ethylpseudouracil, 2-thiouracil, 4′-thiouracil, 2-thio-1-methyl-1-deaza-pseudouracil, 2-thio-1-methyl-pseudouracil, 2-thio-5-aza-uracil, 2-thio-dihydropseudouracil, 2-thio-dihydrouracil, 2-thio-pseudouracil, 4-methoxy-2-thio-pseudouracil, 4-methoxy-pseudouracil, 4-thio-1-methyl-pseudouracil, 4-thio-pseudouracil, 5-aza-uracil, dihydropseudouracil, 5-methyluracil, 5-methoxyuracil (mo5U) and 2′-O-methyluracil.
In some embodiments, an RNA transcript may include a modified cytosine nucleobase selected from digoxigeninated cytosine, 2-thiocytosine, 5- aminoallylcytosine, 5-bromocytosine, 5-carboxycytosine, 5-formylcytosine, 5- hydroxycytosine, 5-hydroxymethylcytosine, 5-methoxycytosine, 5-methylcytosine, 5- propargylaminocytosine, 5-propynylcytosine, 6-azacytosine, aracytosine, cyanine 3-5- propargylaminocytosine, cyanine 3-aminoallylcytosine, cyanine 5-6-propargylaminocytosine, cyanine 5-aminoallylcytosine, desthiobiotin-6-aminoallylcytosine, N4-biotin-OBEA- cytosine, N4-methylcytosine, pseudoisocytosine, and thienocytosine. In some embodiments, an RNA transcript may include a modified adenine nucleobase selected from digoxigeninated adenine, N6-methyladenine, 7-deazaadenine, 7-deaza-7-propargylaminoadenine, 8- azaadenine, 8-azidoadenine, 8-chloroadenine, 8-oxoadenine, araadenine, N1-methyladenine, N6-methyladenine [0110] 3-deazaadenine, 2,6-diaminoadenine, 2-methyl-thio-N6-isopentenyladenine (ms2i6A), 2-methylthio-N6-methyladenine (ms2m6A), N6-(cis-hydroxyisopentenyl)adenine (io6A), 2-methylthio-N6-(cis-hydroxyisopentenyl)adenine (ms2io6A), N6- glycinylcarbamoyladenine (g6A), N6-threonylcarbamoyladenine (t6A), 2-methylthio-N6- threonyl carbamoyladenine (ms2t6A), N6-methyl-N6-threonylcarbamoyladenine (m6t6A), N6-hydroxynorvalylcarbamoyladenine (hn6A), 2-methylthio-N6-hydroxynorvalyl
carbamoyladenine (ms2hn6A), N6,N6-dimethyladenine (m62A), and N6-acetyladenine (ac6A). In some embodiments, an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified nucleobases. [0111] Modified nucleotides may include modified sugars. For example, an RNA transcript (e.g., mRNA transcript) described herein may include a modified sugar selected from 2′-thioribose, 2′,3′-dideoxyribose, 2′-amino-2′-deoxyribose, 2′-deoxyribose, 2′-azido-2′-deoxyribose, 2′-fluoro-2′-deoxyribose, 2′-O-methylribose, 2′-O-methyldeoxyribose, 3′-amino-2′,3′-dideoxyribose, 3′-azido-2′,3′-dideoxyribose, 3′-deoxyribose, 3′-O-(2-nitrobenzyl)-2′-deoxyribose, 3′-O-methylribose, 5′-aminoribose, 5′-thioribose, 5-nitro-1-indolyl-2′-deoxyribose, 5′-biotin-ribose, 2′-O,4′-C-methylene-linked ribose, 2′-O,4′-C-amino-linked ribose, and 2′-O,4′-C-thio-linked ribose. In some embodiments, an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified sugars. [0112] Modified nucleotides may include modified phosphates. A modified phosphate group is a phosphate group that differs from the canonical structure of phosphate. An example of a canonical structure of a phosphate is shown below: [chemical structure not reproduced], where R5 and R3 are atoms or molecules to which the canonical phosphate is bonded. For example, for a phosphate in a nucleic acid sequence, R5 may refer to the upstream nucleotide of the nucleic acid, and R3 may refer to the downstream nucleotide of the nucleic acid. The canonical structure of phosphate also refers to structures in which one or more hydroxyl groups of the phosphate are deprotonated, or in which an oxygen atom of the phosphate is bonded to an adjacent nucleotide in a nucleic acid sequence.
In some embodiments, an RNA transcript (e.g., mRNA transcript) described herein may include a modified phosphate selected from phosphorothioate (PS), thiophosphate, 5′-O-methylphosphonate, 3′-O- methylphosphonate, 5′-hydroxyphosphonate, hydroxyphosphanate, phosphoroselenoate, selenophosphate, phosphoramidate, carbophosphonate, methylphosphonate, phenylphosphonate, ethylphosphonate, H-phosphonate, guanidinium ring, triazole ring, boranophosphate (BP), methylphosphonate, and guanidinopropyl phosphoramidate. In some embodiments, an RNA transcript (e.g., mRNA transcript) includes a combination of at least two (e.g., 2, 3, 4 or more) of the foregoing modified phosphates.
[0113] mRNAs described herein may be used to produce polypeptides of interest, such as therapeutic proteins and/or vaccine antigens. In some embodiments, an mRNA encodes a vaccine antigen. In some embodiments, an mRNA encodes a therapeutic protein. In some embodiments, the encoded polypeptide comprises 9–10,000, 9–9,000, 9–8,000, 9–7,000, 9–6,000, 9–5,000, 9–4,000, 9–3,000, 9–2,000, 9–1,000, 9–500, 9–400, 9–300, 9–200, 9–100, 9–10,000, 100–9,000, 100–8,000, 100–7,000, 100–6,000, 100–5,000, 100–4,000, 100–3,000, 100–2,000, 100–1,000, 100–500, 100–400, 100–300, 100–200, 100–9,000, 200–10,000, 200–9,000, 200–8,000, 200–7,000, 200–6,000, 200–5,000, 200–4,000, 200–3,000, 200–2,000, 200–1,000, 200–500, 200–400, 500–10,000, 500–9,000, 500–8,000, 500–7,000, 500–6,000, 500–5,000, 500–4,000, 500–3,000, 500–2,000, 500–1,000, 1,000–10,000, 1,000–9,000, 1,000–8,000, 1,000–7,000, 1,000–6,000, 1,000–5,000, 1,000–4,000, 1,000–3,000, or 1,000–2,000 amino acids. In some embodiments, the encoded polypeptide consists of 9–10,000, 9–9,000, 9–8,000, 9–7,000, 9–6,000, 9–5,000, 9–4,000, 9–3,000, 9–2,000, 9–1,000, 9–500, 9–400, 9–300, 9–200, 9–100, 9–10,000, 100–9,000, 100–8,000, 100–7,000, 100–6,000, 100–5,000, 100–4,000, 100–3,000, 100–2,000, 100–1,000, 100–500, 100–400, 100–300, 100–200, 100–9,000, 200–10,000, 200–9,000, 200–8,000, 200–7,000, 200–6,000, 200–5,000, 200–4,000, 200–3,000, 200–2,000, 200–1,000, 200–500, 200–400, 500–10,000, 500–9,000, 500–8,000, 500–7,000, 500–6,000, 500–5,000, 500–4,000, 500–3,000, 500–2,000, 500–1,000, 1,000–10,000, 1,000–9,000, 1,000–8,000, 1,000–7,000, 1,000–6,000, 1,000–5,000, 1,000–4,000, 1,000–3,000, or 1,000–2,000 amino acids. In some embodiments, the encoded polypeptide comprises 9–5,000 amino acids. In some embodiments, the encoded polypeptide consists of 9–5,000 amino acids. In some embodiments, the encoded polypeptide comprises 20–4,000 amino acids. In some embodiments, the encoded polypeptide consists of 20–4,000 amino acids.
In some embodiments, the encoded polypeptide comprises 30–3,000 amino acids. In some embodiments, the encoded polypeptide consists of 30–3,000 amino acids. In some embodiments, the encoded polypeptide comprises 40–2,000 amino acids. In some embodiments, the encoded polypeptide consists of 40–2,000 amino acids. In some embodiments, the encoded polypeptide comprises 50–1,500 amino acids. In some embodiments, the encoded polypeptide consists of 50–1,500 amino acids. In some embodiments, the encoded polypeptide comprises 100–5,000 amino acids. In some embodiments, the encoded polypeptide consists of 100–5,000 amino acids. In some embodiments, the encoded polypeptide comprises 200–4,000 amino acids. In some embodiments, the encoded polypeptide consists of 200–4,000 amino acids. In some embodiments, the encoded polypeptide comprises 300–3,000 amino acids. In some
embodiments, the encoded polypeptide consists of 300–3,000 amino acids. In some embodiments, the encoded polypeptide comprises 400–2,000 amino acids. In some embodiments, the encoded polypeptide consists of 400–2,000 amino acids. In some embodiments, the encoded polypeptide comprises 500–1,500 amino acids. In some embodiments, the encoded polypeptide consists of 500–1,500 amino acids. [0114] A therapeutic mRNA is an mRNA that encodes a therapeutic protein (the term ‘protein’ encompasses peptides). In some embodiments, RNA compositions described herein comprise one or more RNAs that encode peptides or proteins that interact or complex in a cell or subject to form a multi-subunit protein (e.g., an antibody comprising a heavy chain and a light chain, a multi-subunit receptor protein, a multi-subunit signaling protein, a multi-subunit antigen, etc.) or a multivalent vaccine. [0115] Therapeutic proteins mediate a variety of effects in a host cell or in a subject to treat a disease or ameliorate the signs and symptoms of a disease. For example, a therapeutic protein can replace a protein that is deficient or abnormal, augment the function of an endogenous protein, or provide a novel function to a cell (e.g., inhibit or activate an endogenous cellular activity, or act as a delivery agent for another therapeutic compound (e.g., an antibody-drug conjugate)). Therapeutic mRNA may be useful for the treatment of the following diseases and conditions: bacterial infections, viral infections, parasitic infections, cell proliferation disorders, genetic disorders, and autoimmune disorders. Other diseases and conditions are encompassed herein. [0116] A protein or proteins of interest encoded by an RNA composition as described herein can be essentially any protein or peptide (e.g., peptide antigen). [0117] In some embodiments, a therapeutic peptide or therapeutic protein is a biologic.
A biologic is a polypeptide-based molecule that may be used to treat, cure, mitigate, prevent, or diagnose a serious or life-threatening disease or medical condition. Biologics include, but are not limited to, allergenic extracts (e.g. for allergy shots and tests), blood components, gene therapy products, human tissue or cellular products used in transplantation, vaccines, monoclonal antibodies, cytokines, growth factors, enzymes, thrombolytics, and immunomodulators, among others. [0118] In some embodiments, the therapeutic protein is a cytokine, a growth factor, an antibody (e.g., monoclonal antibody), a fusion protein, or a vaccine (e.g., an RNA encoding one or more peptide antigens designed to elicit an immune response in a subject). Non- limiting examples of therapeutic proteins include blood factors (such as Factor VIII and Factor VII), complement factors, Low Density Lipoprotein Receptor (LDLR) and MUT1.
Non-limiting examples of cytokines include interleukins, interferons, chemokines, lymphokines and the like. Non-limiting examples of growth factors include erythropoietin, EGFs, PDGFs, FGFs, TGFs, IGFs, TNFs, CSFs, MCSFs, GMCSFs and the like. Non- limiting examples of antibodies include adalimumab, infliximab, rituximab, ipilimumab, tocilizumab, canakinumab, itolizumab, tralokinumab, anti-influenza virus monoclonal antibody, anti-Chikungunya virus monoclonal antibody, anti-Zika virus monoclonal antibody, anti-SARS-CoV-2 monoclonal antibody. Non-limiting examples of fusion proteins include, for example, etanercept, abatacept and belatacept. Non-limiting examples of multivalent vaccines include, for example, multivalent cytomegalovirus (CMV) vaccine, and personalized cancer vaccines. [0119] One or more biologics currently being marketed or in development may be encoded by the RNA. While not wishing to be bound by theory, it is believed that incorporation of the encoding polynucleotides of a known biologic into the RNA described herein will result in improved therapeutic efficacy due at least in part to the specificity, purity and/or selectivity of the construct designs. [0120] An RNA composition described herein may encode one or more antibodies (e.g., may comprise a first mRNA encoding an antibody heavy chain and a second RNA encoding an antibody light chain). The term “antibody” includes monoclonal antibodies (including full length antibodies which have an immunoglobulin Fc region), antibody compositions with polyepitopic specificity, multispecific antibodies (e.g., bispecific antibodies, diabodies, and single-chain molecules), as well as antibody fragments. The term “immunoglobulin” (Ig) is used interchangeably with “antibody” herein. 
A monoclonal antibody is an antibody obtained from a population of substantially homogeneous antibodies, i.e., the individual antibodies comprising the population are identical except for possible naturally occurring mutations and/or post-translation modifications (e.g., isomerizations, amidations) that may be present in minor amounts. Monoclonal antibodies are highly specific, being directed against a single antigenic site. [0121] Monoclonal antibodies specifically include chimeric antibodies (immunoglobulins) in which a portion of the heavy and/or light chain is identical with or homologous to corresponding sequences in antibodies derived from a particular species or belonging to a particular antibody class or subclass, while the remainder of the chain(s) is(are) identical with or homologous to corresponding sequences in antibodies derived from another species or belonging to another antibody class or subclass, as well as fragments of such antibodies, so long as they exhibit the desired biological activity. Chimeric antibodies
include, but are not limited to, “primatized” antibodies comprising variable domain antigen-binding sequences derived from a non-human primate (e.g., Old World Monkey, Ape etc.) and human constant region sequences. [0122] Antibodies encoded in the RNA compositions may be utilized to treat conditions or diseases in many therapeutic areas such as, but not limited to, blood, cardiovascular, CNS, poisoning (including antivenoms), dermatology, endocrinology, gastrointestinal, medical imaging, musculoskeletal, oncology, immunology, respiratory, sensory and anti-infective. [0123] An RNA composition described herein may encode one or more vaccine antigens. A vaccine antigen is a biological preparation that improves immunity to a particular disease or infectious agent. One or more vaccine antigens currently being marketed or in development may be encoded by the RNA. Vaccine antigens encoded in the RNA may be utilized to treat conditions or diseases in many therapeutic areas such as, but not limited to, cancer, allergy, and infectious disease. In some embodiments, a vaccine may be a personalized vaccine in the form of a concatemer or individual RNAs encoding peptide epitopes or a combination thereof. [0124] An RNA composition described herein may be designed to encode one or more antimicrobial peptides (AMP) or antiviral peptides (AVP). AMPs and AVPs have been isolated and described from a wide range of organisms such as, but not limited to, microorganisms, invertebrates, plants, amphibians, birds, fish, and mammals. The anti-microbial polypeptides may block cell fusion and/or viral entry by one or more enveloped viruses (e.g., HIV, HCV). For example, the anti-microbial polypeptide can comprise or consist of a synthetic peptide corresponding to a region, e.g., a consecutive sequence of at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 amino acids of the transmembrane subunit of a viral envelope protein, e.g., HIV-1 gp120 or gp41.
The amino acid and nucleotide sequences of HIV-1 gp120 or gp41 are described in, e.g., Kuiken et al., (2008). “HIV Sequence Compendium,” Los Alamos National Laboratory. [0125] In some embodiments, RNA transcripts (e.g., mRNA) are used for in vitro translation and microinjection. In some embodiments, RNA transcripts are used for RNA structure, processing and catalysis studies. In some embodiments, RNA transcripts are used for RNA amplification. In some embodiments, RNA transcripts are used as anti-sense RNA for gene expression modulation. Other applications are also encompassed.
5′ cap structures [0126] In some embodiments, a composition includes an RNA polynucleotide having an open reading frame encoding at least one polypeptide having at least one modification, and at least one 5′ terminal cap. [0127] 5′ terminal caps can include endogenous caps or cap analogs. A 5′ terminal cap can comprise a guanine analog. Useful guanine analogs include, but are not limited to, inosine, N1-methyl-guanosine, 2′-fluoro-guanosine, 7-deaza-guanosine, 8-oxo-guanosine, 2-amino-guanosine, LNA-guanosine, and 2-azido-guanosine. [0128] Also provided herein are exemplary caps including those that can be used in co-transcriptional capping methods for ribonucleic acid (RNA) synthesis, using RNA polymerase, e.g., wild type RNA polymerase or variants thereof, e.g., such as those variants described herein. In one embodiment, caps can be added when RNA is produced in a “one-pot” reaction, without the need for a separate capping reaction. Thus, the methods, in some embodiments, comprise reacting a polynucleotide template with an RNA polymerase variant, nucleoside triphosphates, and a cap analog under in vitro transcription reaction conditions to produce an RNA transcript. [0129] In some embodiments, the cap analog binds to a polynucleotide template that comprises a promoter region comprising a transcriptional start site having a first nucleotide at nucleotide position +1, a second nucleotide at nucleotide position +2, and a third nucleotide at nucleotide position +3. In some embodiments, the cap analog hybridizes to the polynucleotide template at least at nucleotide position +1, such as at the +1 and +2 positions, or at the +1, +2, and +3 positions. [0130] A cap analog may be, for example, a dinucleotide cap, a trinucleotide cap, or a tetranucleotide cap. In some embodiments, a cap analog is a dinucleotide cap. In some embodiments, a cap analog is a trinucleotide cap. In some embodiments, a cap analog is a tetranucleotide cap.
As used herein, the term “cap” includes the inverted G nucleotide and can comprise additional nucleotides 3′ of the inverted G, e.g., 1, 2, or more nucleotides 3′ of the inverted G and 5′ to the 5′ UTR. [0131] Exemplary caps comprise a sequence GG, GA, or GGA wherein the underlined, italicized G is an inverted G. [0132] In some embodiments, a trinucleotide cap comprises a compound of Formula (III) or (IV), or a stereoisomer, tautomer, or salt thereof.
Formula (III) [0133] As described herein, a trinucleotide cap, in some embodiments, comprises a compound of formula (III):
[chemical structure of Formula (III) not reproduced] or a stereoisomer, tautomer, or salt thereof, wherein
ring B1 is a modified or unmodified guanine; ring B2 and ring B3 each independently is a nucleobase or a modified nucleobase; X2 is O, S(O)p, NR24 or CR25R26 in which p is 0, 1, or 2; Y0 is O or CR6R7; Y1 is O, S(O)n, CR6R7, or NR8, in which n is 0, 1, or 2; each --- is a single bond or absent, wherein when each --- is a single bond, Y1 is O, S(O)n, CR6R7, or NR8; and when each --- is absent, Y1 is void; Y2 is (OP(O)R4)m in which m is 0, 1, or 2, or -O-(CR40R41)u-Q0-(CR42R43)v-, in which Q0 is a bond, O, S(O)r, NR44, or CR45R46, r is 0, 1, or 2, and each of u and v independently is 1, 2, 3 or 4; each R2 and R2' independently is halo, LNA, or OR3; each R3 independently is H, C1-C6 alkyl, C2-C6 alkenyl, or C2-C6 alkynyl and R3, when being C1-C6 alkyl, C2-C6 alkenyl, or C2-C6 alkynyl, is optionally substituted with one or more of halo, OH and C1-C6 alkoxyl that is optionally substituted with one or more OH or OC(O)-C1-C6 alkyl; each R4 and R4' independently is H, halo, C1-C6 alkyl, OH, SH, SeH, or BH3-;
each of R6, R7, and R8, independently, is -Q1-T1, in which Q1 is a bond or C1-C3 alkyl linker optionally substituted with one or more of halo, cyano, OH and C1-C6 alkoxy, and T1 is H, halo, OH, COOH, cyano, or Rs1, in which Rs1 is C1-C3 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C1-C6 alkoxyl, C(O)O-C1-C6 alkyl, C3-C8 cycloalkyl, C6-C10 aryl, NR31R32, (NR31R32R33)+, 4 to 12-membered heterocycloalkyl, or 5- or 6-membered heteroaryl, and Rs1 is optionally substituted with one or more substituents selected from the group consisting of halo, OH, oxo, C1-C6 alkyl, COOH, C(O)O-C1-C6 alkyl, cyano, C1-C6 alkoxyl, NR31R32, (NR31R32R33)+, C3-C8 cycloalkyl, C6-C10 aryl, 4 to 12-membered heterocycloalkyl, and 5- or 6-membered heteroaryl; each of R10, R11, R12, R13, R14, and R15, independently, is -Q2-T2, in which Q2 is a bond or C1-C3 alkyl linker optionally substituted with one or more of halo, cyano, OH and C1-C6 alkoxy, and T2 is H, halo, OH, NH2, cyano, NO2, N3, Rs2, or ORs2, in which Rs2 is C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C3-C8 cycloalkyl, C6-C10 aryl, NHC(O)-C1-C6 alkyl, NR31R32, (NR31R32R33)+, 4 to 12-membered heterocycloalkyl, or 5- or 6-membered heteroaryl, and Rs2 is optionally substituted with one or more substituents selected from the group consisting of halo, OH, oxo, C1-C6 alkyl, COOH, C(O)O-C1-C6 alkyl, cyano, C1-C6 alkoxyl, NR31R32, (NR31R32R33)+, C3-C8 cycloalkyl, C6-C10 aryl, 4 to 12-membered heterocycloalkyl, and 5- or 6-membered heteroaryl; or alternatively R12 together with R14 is oxo, or R13 together with R15 is oxo;
each of R20, R21, R22, and R23 independently is -Q3-T3, in which Q3 is a bond or C1-C3 alkyl linker optionally substituted with one or more of halo, cyano, OH and C1-C6 alkoxy, and T3 is H, halo, OH, NH2, cyano, NO2, N3, Rs3, or ORs3, in which Rs3 is C1-C6 alkyl, C2-C6 alkenyl, C2-C6 alkynyl, C3-C8 cycloalkyl, C6-C10 aryl, NHC(O)-C1-C6 alkyl, mono-C1-C6 alkylamino, di-C1-C6 alkylamino, 4 to 12-membered heterocycloalkyl, or 5- or 6-membered heteroaryl, and Rs3 is optionally substituted with one or more substituents selected from the group consisting of halo, OH, oxo, C1-C6 alkyl, COOH, C(O)O-C1-C6 alkyl, cyano, C1-C6 alkoxyl, amino, mono-C1-C6 alkylamino, di-C1-C6 alkylamino, C3-C8 cycloalkyl, C6-C10 aryl, 4 to 12-membered heterocycloalkyl, and 5- or 6-membered heteroaryl; each of R24, R25, and R26 independently is H or C1-C6 alkyl; each of R27 and R28 independently is H or OR29; or R27 and R28 together form O-R30-O; each R29 independently is H, C1-C6 alkyl, C2-C6 alkenyl, or C2-C6 alkynyl and R29, when being C1-C6 alkyl, C2-C6 alkenyl, or C2-C6 alkynyl, is optionally substituted with one or more of halo, OH and C1-C6 alkoxyl that is optionally substituted with one or more OH or OC(O)-C1-C6 alkyl;
R30 is C1-C6 alkylene optionally substituted with one or more of halo, OH and C1-C6 alkoxyl; each of R31, R32, and R33, independently is H, C1-C6 alkyl, C3-C8 cycloalkyl, C6-C10 aryl, 4 to 12-membered heterocycloalkyl, or 5- or 6-membered heteroaryl; each of R40, R41, R42, and R43 independently is H, halo, OH, cyano, N3, OP(O)R47R48, or C1-C6 alkyl optionally substituted with one or more OP(O)R47R48, or one R41 and one R43, together with the carbon atoms to which they are attached and Q0, form C4-C10 cycloalkyl, 4- to 14-membered heterocycloalkyl, C6-C10 aryl, or 5- to 14-membered heteroaryl, and each of the cycloalkyl, heterocycloalkyl, phenyl, or 5- to 6-membered heteroaryl is optionally substituted with one or more of OH, halo, cyano, N3, oxo, OP(O)R47R48, C1-C6 alkyl, C1-C6 haloalkyl, COOH, C(O)O-C1-C6 alkyl, C1-C6 alkoxyl, C1-C6 haloalkoxyl, amino, mono-C1-C6 alkylamino, and di-C1-C6 alkylamino; R44 is H, C1-C6 alkyl, or an amine protecting group; each of R45 and R46 independently is H, OP(O)R47R48, or C1-C6 alkyl optionally substituted with one or more OP(O)R47R48, and each of R47 and R48, independently is H, halo, C1-C6 alkyl, OH, SH, SeH, or BH3. [0134] It should be understood that a cap analog, as provided herein, may include any of the cap analogs described in international publication WO 2017/066797, published on 20 April 2017, incorporated by reference herein in its entirety. [0135] In some embodiments, the B2 middle position can be a non-ribose molecule, such as arabinose. [0136] In some embodiments R2 is ethyl-based. [0137] Thus, in some embodiments, a trinucleotide cap comprises the following structure:
(IIIa),
or a stereoisomer, tautomer, or salt thereof. [0138] In yet other embodiments, a trinucleotide cap comprises the following structure:
(IIIb), or a stereoisomer, tautomer or salt thereof. [0139] In still other embodiments, a trinucleotide cap comprises the following structure:
(IIIc), or a stereoisomer, tautomer, or salt thereof. [0140] In some embodiments, R is an alkyl (e.g., C1-C6 alkyl). In some embodiments, R is a methyl group (e.g., C1 alkyl). In some embodiments, R is an ethyl group (e.g., C2 alkyl). [0141] A trinucleotide cap, in some embodiments, comprises a sequence selected from the following sequences: GAA, GAC, GAG, GAU, GCA, GCC, GCG, GCU, GGA, GGC, GGG, GGU, GUA, GUC, GUG, and GUU. In some embodiments, a trinucleotide cap comprises GAA. In some embodiments, a trinucleotide cap comprises GAC. In some embodiments, a trinucleotide cap comprises GAG. In some embodiments, a trinucleotide cap
comprises GAU. In some embodiments, a trinucleotide cap comprises GCA. In some embodiments, a trinucleotide cap comprises GCC. In some embodiments, a trinucleotide cap comprises GCG. In some embodiments, a trinucleotide cap comprises GCU. In some embodiments, a trinucleotide cap comprises GGA. In some embodiments, a trinucleotide cap comprises GGC. In some embodiments, a trinucleotide cap comprises GGG. In some embodiments, a trinucleotide cap comprises GGU. In some embodiments, a trinucleotide cap comprises GUA. In some embodiments, a trinucleotide cap comprises GUC. In some embodiments, a trinucleotide cap comprises GUG. In some embodiments, a trinucleotide cap comprises GUU. [0142] In some embodiments, a trinucleotide cap comprises a sequence selected from the following sequences: m7GpppApA, m7GpppApC, m7GpppApG, m7GpppApU, m7GpppCpA, m7GpppCpC, m7GpppCpG, m7GpppCpU, m7GpppGpA, m7GpppGpC, m7GpppGpG, m7GpppGpU, m7GpppUpA, m7GpppUpC, m7GpppUpG, and m7GpppUpU. [0143] In some embodiments, a trinucleotide cap comprises m7GpppApA. In some embodiments, a trinucleotide cap comprises m7GpppApC. In some embodiments, a trinucleotide cap comprises m7GpppApG. In some embodiments, a trinucleotide cap comprises m7GpppApU. In some embodiments, a trinucleotide cap comprises m7GpppCpA. In some embodiments, a trinucleotide cap comprises m7GpppCpC. In some embodiments, a trinucleotide cap comprises m7GpppCpG. In some embodiments, a trinucleotide cap comprises m7GpppCpU. In some embodiments, a trinucleotide cap comprises m7GpppGpA. In some embodiments, a trinucleotide cap comprises m7GpppGpC. In some embodiments, a trinucleotide cap comprises m7GpppGpG. In some embodiments, a trinucleotide cap comprises m7GpppGpU. In some embodiments, a trinucleotide cap comprises m7GpppUpA. In some embodiments, a trinucleotide cap comprises m7GpppUpC. In some embodiments, a trinucleotide cap comprises m7GpppUpG. In some embodiments, a trinucleotide cap comprises m7GpppUpU. 
[0144] A trinucleotide cap, in some embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppApA, m7G3′OMepppApC, m7G3′OMepppApG, m7G3′OMepppApU, m7G3′OMepppCpA, m7G3′OMepppCpC, m7G3′OMepppCpG, m7G3′OMepppCpU, m7G3′OMepppGpA, m7G3′OMepppGpC, m7G3′OMepppGpG, m7G3′OMepppGpU, m7G3′OMepppUpA, m7G3′OMepppUpC, m7G3′OMepppUpG, and m7G3′OMepppUpU. [0145] In some embodiments, a trinucleotide cap comprises m7G3′OMepppApA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppApC. In some embodiments, a
trinucleotide cap comprises m7G3′OMepppApG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppApU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppCpA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppCpC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppCpG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppCpU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppGpA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppGpC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppGpG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppGpU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppUpA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppUpC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppUpG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppUpU. [0146] A trinucleotide cap, in other embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppA2′OMepA, m7G3′OMepppA2′OMepC, m7G3′OMepppA2′OMepG, m7G3′OMepppA2′OMepU, m7G3′OMepppC2′OMepA, m7G3′OMepppC2′OMepC, m7G3′OMepppC2′OMepG, m7G3′OMepppC2′OMepU, m7G3′OMepppG2′OMepA, m7G3′OMepppG2′OMepC, m7G3′OMepppG2′OMepG, m7G3′OMepppG2′OMepU, m7G3′OMepppU2′OMepA, m7G3′OMepppU2′OMepC, m7G3′OMepppU2′OMepG, and m7G3′OMepppU2′OMepU. [0147] In some embodiments, a trinucleotide cap comprises m7G3′OMepppA2′OMepA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppA2′OMepC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppA2′OMepG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppA2′OMepU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppC2′OMepA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppC2′OMepC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppC2′OMepG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppC2′OMepU. 
In some embodiments, a trinucleotide cap comprises m7G3′OMepppG2′OMepA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppG2′OMepC. In some embodiments, a trinucleotide cap comprises m7G3′OMepppG2′OMepG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppG2′OMepU. In some embodiments, a trinucleotide cap comprises m7G3′OMepppU2′OMepA. In some embodiments, a trinucleotide cap comprises m7G3′OMepppU2′OMepC. In some embodiments, a trinucleotide cap comprises
m7G3′OMepppU2′OMepG. In some embodiments, a trinucleotide cap comprises m7G3′OMepppU2′OMepU. [0148] A trinucleotide cap, in still other embodiments, comprises a sequence selected from the following sequences: m7GpppA2′OMepA, m7GpppA2′OMepC, m7GpppA2′OMepG, m7GpppA2′OMepU, m7GpppC2′OMepA, m7GpppC2′OMepC, m7GpppC2′OMepG, m7GpppC2′OMepU, m7GpppG2′OMepA, m7GpppG2′OMepC, m7GpppG2′OMepG, m7GpppG2′OMepU, m7GpppU2′OMepA, m7GpppU2′OMepC, m7GpppU2′OMepG, and m7GpppU2′OMepU. [0149] In some embodiments, a trinucleotide cap comprises m7GpppA2′OMepA. In some embodiments, a trinucleotide cap comprises m7GpppA2′OMepC. In some embodiments, a trinucleotide cap comprises m7GpppA2′OMepG. In some embodiments, a trinucleotide cap comprises m7GpppA2′OMepU. In some embodiments, a trinucleotide cap comprises m7GpppC2′OMepA. In some embodiments, a trinucleotide cap comprises m7GpppC2′OMepC. In some embodiments, a trinucleotide cap comprises m7GpppC2′OMepG. In some embodiments, a trinucleotide cap comprises m7GpppC2′OMepU. In some embodiments, a trinucleotide cap comprises m7GpppG2′OMepA. In some embodiments, a trinucleotide cap comprises m7GpppG2′OMepC. In some embodiments, a trinucleotide cap comprises m7GpppG2′OMepG. In some embodiments, a trinucleotide cap comprises m7GpppG2′OMepU. In some embodiments, a trinucleotide cap comprises m7GpppU2′OMepA. In some embodiments, a trinucleotide cap comprises m7GpppU2′OMepC. In some embodiments, a trinucleotide cap comprises m7GpppU2′OMepG. In some embodiments, a trinucleotide cap comprises m7GpppU2′OMepU. [0150] In some embodiments, a trinucleotide cap comprises m7Gpppm6A2′OMepG. In some embodiments, a trinucleotide cap comprises m7Gpppe6A2′OMepG. [0151] In some embodiments, a trinucleotide cap comprises GAG. In some embodiments, a trinucleotide cap comprises GCG. In some embodiments, a trinucleotide cap comprises GUG. In some embodiments, a trinucleotide cap comprises GGG.
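The sixteen-member listings above all follow one pattern: a fixed cap prefix followed by every ordered pair of the four bases A, C, G, and U. As an illustrative sketch only, the listings can be regenerated programmatically. The function name `trinucleotide_caps` and its parameters are hypothetical helpers introduced here for illustration and are not part of the disclosure.

```python
from itertools import product

BASES = "ACGU"  # same base order as the listings above


def trinucleotide_caps(prefix="m7Gppp", first_2ome=False):
    """Return the 16 cap names of the form <prefix>N1pN2.

    prefix      -- cap moiety string, e.g. "m7Gppp" (assumed notation)
    first_2ome  -- if True, mark the first nucleotide as 2'-O-methylated
    """
    mark = "2′OMe" if first_2ome else ""
    return [f"{prefix}{n1}{mark}p{n2}" for n1, n2 in product(BASES, repeat=2)]


# Short-form names (GAA, GAC, ..., GUU) use the same enumeration.
short_names = ["G" + n1 + n2 for n1, n2 in product(BASES, repeat=2)]

if __name__ == "__main__":
    print(short_names)
    print(trinucleotide_caps())
    print(trinucleotide_caps(first_2ome=True))
```

Passing a different prefix string (for example, one that carries the 3′-O-methyl marker on m7G) yields the other listings in the same order.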
[0152] In some embodiments, a trinucleotide cap comprises any one of the following structures:
, or a stereoisomer, tautomer, or salt thereof. [0153] In some embodiments, the cap analog comprises a tetranucleotide cap. In some embodiments, the tetranucleotide cap comprises a trinucleotide as set forth above. In some embodiments, the tetranucleotide cap comprises m7GpppN1N2N3, where N1, N2, and N3 are optional (i.e., can be absent or one or more can be present) and are independently a natural, a modified, or an unnatural nucleoside base. In some embodiments, m7G is further methylated, e.g., at the 3′ position. In some embodiments, the m7G comprises an O-methyl at the 3′ position. In some embodiments, N1, N2, and N3, if present, optionally, are independently an adenine, a uracil, a guanine, a thymine, or a cytosine. In some embodiments, one or more (or all) of N1, N2, and N3, if present, are methylated, e.g., at the 2′ position. In some
embodiments, one or more (or all) of N1, N2, and N3, if present, have an O-methyl at the 2′ position.
Formula (IV)
[0154] As described herein, in some embodiments, the tetranucleotide cap comprises formula (IV):
or a stereoisomer, tautomer, or salt thereof, wherein B1, B2, and B3 are independently a natural, a modified, or an unnatural nucleoside base; and R1, R2, R3, and R4 are independently OH or O-methyl. In some embodiments, R3 is O-methyl and R4 is OH. In some embodiments, R3 and R4 are O-methyl. In some embodiments, R4 is O-methyl. In some embodiments, R1 is OH, R2 is OH, R3 is O-methyl, and R4 is OH. In some embodiments, R1 is OH, R2 is OH, R3 is O-methyl, and R4 is O-methyl. In some embodiments, at least one of R1 and R2 is O-methyl, R3 is O-methyl, and R4 is OH. In some embodiments, at least one of R1 and R2 is O-methyl, R3 is O-methyl, and R4 is O-methyl. [0155] In some embodiments, B1, B2, and B3 are natural nucleoside bases. In some embodiments, at least one of B1, B2, and B3 is a modified or unnatural base. In some embodiments, at least one of B1, B2, and B3 is N6-methyladenine. In some embodiments, B1 is adenine, cytosine, thymine, or uracil. In some embodiments, B1 is adenine, B2 is uracil, and B3 is adenine. In some embodiments, R1 and R2 are OH, R3 and R4 are O-methyl, B1 is adenine, B2 is uracil, and B3 is adenine. [0156] In some embodiments, the tetranucleotide cap comprises a sequence selected from the following sequences: GAAA, GACA, GAGA, GAUA, GCAA, GCCA, GCGA, GCUA, GGAA, GGCA, GGGA, GGUA, GUCA, and GUUA. In some embodiments, the
tetranucleotide cap comprises a sequence selected from the following sequences: GAAG, GACG, GAGG, GAUG, GCAG, GCCG, GCGG, GCUG, GGAG, GGCG, GGGG, GGUG, GUCG, GUGG, and GUUG. In some embodiments, the tetranucleotide cap comprises a sequence selected from the following sequences: GAAU, GACU, GAGU, GAUU, GCAU, GCCU, GCGU, GCUU, GGAU, GGCU, GGGU, GGUU, GUAU, GUCU, GUGU, and GUUU. In some embodiments, the tetranucleotide cap comprises a sequence selected from the following sequences: GAAC, GACC, GAGC, GAUC, GCAC, GCCC, GCGC, GCUC, GGAC, GGCC, GGGC, GGUC, GUAC, GUCC, GUGC, and GUUC. [0157] A tetranucleotide cap, in some embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppApApN, m7G3′OMepppApCpN, m7G3′OMepppApGpN, m7G3′OMepppApUpN, m7G3′OMepppCpApN, m7G3′OMepppCpCpN, m7G3′OMepppCpGpN, m7G3′OMepppCpUpN, m7G3′OMepppGpApN, m7G3′OMepppGpCpN, m7G3′OMepppGpGpN, m7G3′OMepppGpUpN, m7G3′OMepppUpApN, m7G3′OMepppUpCpN, m7G3′OMepppUpGpN, and m7G3′OMepppUpUpN, where N is a natural, a modified, or an unnatural nucleoside base. [0158] A tetranucleotide cap, in other embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppA2′OMepApN, m7G3′OMepppA2′OMepCpN, m7G3′OMepppA2′OMepGpN, m7G3′OMepppA2′OMepUpN, m7G3′OMepppC2′OMepApN, m7G3′OMepppC2′OMepCpN, m7G3′OMepppC2′OMepGpN, m7G3′OMepppC2′OMepUpN, m7G3′OMepppG2′OMepApN, m7G3′OMepppG2′OMepCpN, m7G3′OMepppG2′OMepGpN, m7G3′OMepppG2′OMepUpN, m7G3′OMepppU2′OMepApN, m7G3′OMepppU2′OMepCpN, m7G3′OMepppU2′OMepGpN, and m7G3′OMepppU2′OMepUpN, where N is a natural, a modified, or an unnatural nucleoside base. 
[0159] A tetranucleotide cap, in still other embodiments, comprises a sequence selected from the following sequences: m7GpppA2′OMepApN, m7GpppA2′OMepCpN, m7GpppA2′OMepGpN, m7GpppA2′OMepUpN, m7GpppC2′OMepApN, m7GpppC2′OMepCpN, m7GpppC2′OMepGpN, m7GpppC2′OMepUpN, m7GpppG2′OMepApN, m7GpppG2′OMepCpN, m7GpppG2′OMepGpN, m7GpppG2′OMepUpN, m7GpppU2′OMepApN, m7GpppU2′OMepCpN, m7GpppU2′OMepGpN, and m7GpppU2′OMepUpN, where N is a natural, a modified, or an unnatural nucleoside base. [0160] A tetranucleotide cap, in other embodiments, comprises a sequence selected from the following sequences: m7G3′OMepppA2′OMepA2′OMepN, m7G3′OMepppA2′OMepC2′OMepN, m7G3′OMepppA2′OMepG2′OMepN, m7G3′OMepppA2′OMepU2′OMepN, m7G3′OMepppC2′OMepA2′OMepN,
m7G3′OMepppC2′OMepC2′OMepN, m7G3′OMepppC2′OMepG2′OMepN, m7G3′OMepppC2′OMepU2′OMepN, m7G3′OMepppG2′OMepA2′OMepN, m7G3′OMepppG2′OMepC2′OMepN, m7G3′OMepppG2′OMepG2′OMepN, m7G3′OMepppG2′OMepU2′OMepN, m7G3′OMepppU2′OMepA2′OMepN, m7G3′OMepppU2′OMepC2′OMepN, m7G3′OMepppU2′OMepG2′OMepN, and m7G3′OMepppU2′OMepU2′OMepN, where N is a natural, a modified, or an unnatural nucleoside base. [0161] A tetranucleotide cap, in still other embodiments, comprises a sequence selected from the following sequences: m7GpppA2′OMepA2′OMepN, m7GpppA2′OMepC2′OMepN, m7GpppA2′OMepG2′OMepN, m7GpppA2′OMepU2′OMepN, m7GpppC2′OMepA2′OMepN, m7GpppC2′OMepC2′OMepN, m7GpppC2′OMepG2′OMepN, m7GpppC2′OMepU2′OMepN, m7GpppG2′OMepA2′OMepN, m7GpppG2′OMepC2′OMepN, m7GpppG2′OMepG2′OMepN, m7GpppG2′OMepU2′OMepN, m7GpppU2′OMepA2′OMepN, m7GpppU2′OMepC2′OMepN, m7GpppU2′OMepG2′OMepN, and m7GpppU2′OMepU2′OMepN, where N is a natural, a modified, or an unnatural nucleoside base. [0162] In some embodiments, a tetranucleotide cap comprises GGAG. In some embodiments, a tetranucleotide cap comprises the following structure:
[0163] The capping efficiency of a post-transcriptional or co-transcriptional capping reaction may vary. As used herein, “capping efficiency” refers to the amount (e.g., expressed as a percentage) of mRNAs comprising a cap structure relative to the total mRNAs in a mixture (e.g., a post-transcriptional capping reaction or a co-transcriptional capping reaction). In some embodiments, the capping efficiency of a capping reaction is at least 60%, 70%, 80%, 90%, 95%, 99%, or 99.9% (e.g., after the capping reaction at least 60%, 70%, 80%, 90%, 95%, 99%, or 99.9% of the input mRNAs comprise a cap). In some embodiments,
multivalent co-IVT reactions described herein do not affect the capping efficiency of the mRNAs resulting from the IVT reaction. [0164] A 3′-poly(A) tail is typically a stretch of adenine nucleotides added to the 3′-end of the transcribed mRNA. It can, in some instances, comprise up to about 400 adenine nucleotides. In some embodiments, the length of the 3′-poly(A) tail may be an essential element with respect to the stability of the individual mRNA. [0165] In some embodiments, a composition comprises an RNA (e.g., mRNA) having an ORF that encodes a signal peptide fused to the expressed polypeptide. Signal peptides, usually comprising the N-terminal 15-60 amino acids of proteins, are typically needed for the translocation across the membrane on the secretory pathway and, thus, universally control the entry of most proteins both in eukaryotes and prokaryotes to the secretory pathway. A signal peptide may have a length of 15-60 amino acids. [0166] In some embodiments, an ORF encoding a polypeptide is codon optimized. Codon optimization methods are known in the art. For example, an ORF of any one or more of the sequences provided herein may be codon optimized. Codon optimization, in some embodiments, may be used to match codon frequencies in target and host organisms to ensure proper folding; bias %G/C content to increase mRNA thermodynamic stability or reduce secondary structures; minimize tandem repeat codons or base runs that may impair gene construction or expression; customize transcriptional and translational control regions; insert or remove protein trafficking sequences; remove/add post translation modification sites in encoded protein (e.g., glycosylation sites); add, remove or shuffle protein domains; insert or delete restriction sites; modify ribosome binding sites and mRNA degradation sites; adjust translational rates to allow the various domains of the protein to fold properly; or reduce or eliminate problem secondary structures within the polynucleotide. 
Codon optimization tools, algorithms and services are known in the art; non-limiting examples include services from GeneArt (Life Technologies), DNA2.0 (Menlo Park CA) and/or proprietary methods. In some embodiments, the open reading frame (ORF) sequence is optimized using optimization algorithms. [0167] In some embodiments, an RNA (e.g., mRNA) is not chemically modified and comprises the standard ribonucleotides consisting of adenosine, guanosine, cytidine and uridine. In some embodiments, nucleotides and nucleosides comprise standard nucleoside residues such as those present in transcribed RNA (e.g., A, G, C, or U). In some embodiments, nucleotides and nucleosides comprise standard deoxyribonucleosides such as those present in DNA (e.g., dA, dG, dC, or dT).
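Two of the quantities named in the codon optimization discussion above, %G/C content and base runs, are simple to compute for a candidate ORF. The following is a minimal sketch under stated assumptions: the function names and the two example ORFs are hypothetical and chosen only for illustration, and the code does not represent any of the commercial or proprietary optimization methods mentioned above.

```python
from itertools import groupby


def gc_percent(seq: str) -> float:
    """Percentage of G and C residues in an RNA or DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def longest_base_run(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


# Two hypothetical ORFs that encode the same peptide (Met-Leu-Lys-stop)
# using different synonymous codons.
orf_a = "AUGUUAAAAUAA"  # UUA (Leu), AAA (Lys), UAA (stop)
orf_b = "AUGCUGAAGUGA"  # CUG (Leu), AAG (Lys), UGA (stop)

if __name__ == "__main__":
    for name, orf in (("orf_a", orf_a), ("orf_b", orf_b)):
        print(name, round(gc_percent(orf), 2), longest_base_run(orf))
```

In this toy comparison the second ORF has the higher G/C fraction and the shorter base run while encoding the same peptide, which is the kind of trade-off a codon optimization procedure evaluates.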
[0168] The compositions can comprise, in some embodiments, an RNA having an open reading frame encoding a polypeptide, wherein the nucleic acid comprises nucleotides and/or nucleosides that can be standard (unmodified) or modified as is known in the art. In some embodiments, nucleotides and nucleosides comprise modified nucleotides or nucleosides. Such modified nucleotides and nucleosides can be naturally-occurring modified nucleotides and nucleosides or non-naturally occurring modified nucleotides and nucleosides. Such modifications can include those at the sugar, backbone, or nucleobase portion of the nucleotide and/or nucleoside as are recognized in the art. [0169] In some embodiments, a naturally-occurring modified nucleotide or nucleoside is one as is generally known or recognized in the art. Non-limiting examples of such naturally occurring modified nucleotides and nucleosides can be found, inter alia, in the widely recognized MODOMICS database. [0170] Also provided are modified nucleosides and nucleotides of a nucleic acid (e.g., RNA nucleic acids, such as mRNA nucleic acids). A “nucleoside” refers to a compound containing a sugar molecule (e.g., a pentose or ribose) or a derivative thereof in combination with an organic base (e.g., a purine or pyrimidine) or a derivative thereof (also referred to herein as “nucleobase”). A “nucleotide” refers to a nucleoside that includes a phosphate group. Modified nucleotides may be synthesized by any useful method, such as, for example, chemically, enzymatically, or recombinantly, to include one or more modified or non-natural nucleosides. Nucleic acids can comprise a region or regions of linked nucleosides. Such regions may have variable backbone linkages. The linkages can be standard phosphodiester linkages, in which case the nucleic acids would comprise regions of nucleotides. 
[0171] In some embodiments, modified nucleosides in nucleic acids (e.g., RNA nucleic acids, such as mRNA nucleic acids) comprise N1-methyl-pseudouridine (m1ψ), 1-ethyl-pseudouridine (e1ψ), 5-methoxy-uridine (mo5U), 5-methyl-cytidine (m5C), and/or pseudouridine (ψ). In some embodiments, modified nucleobases in nucleic acids (e.g., RNA nucleic acids, such as mRNA nucleic acids) comprise 5-methoxymethyl uridine, 5-methylthio uridine, 1-methoxymethyl pseudouridine, 5-methyl cytidine, and/or 5-methoxycytidine. In some embodiments, the polyribonucleotide includes a combination of at least two (e.g., 2, 3, 4 or more) of any of the aforementioned modified nucleobases, including but not limited to chemical modifications. [0172] In some embodiments, an mRNA comprises N1-methyl-pseudouridine (m1ψ) substitutions at one or more or all uridine positions of the nucleic acid.
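The phrase “at one or more or all uridine positions” distinguishes partial from complete substitution. As a purely illustrative sketch, a sequence can be represented as a list of residue labels in which selected uridines are relabeled. The function `substitute_uridines`, its parameters, and the example sequence are hypothetical and are introduced here only to make the distinction concrete.

```python
def substitute_uridines(seq, modified="m1ψ", positions=None):
    """Return a list of residue labels with uridines replaced by `modified`.

    positions=None replaces every U (complete substitution);
    otherwise only the U residues at the given 0-based indices are replaced.
    """
    residues = []
    for index, base in enumerate(seq.upper()):
        if base == "U" and (positions is None or index in positions):
            residues.append(modified)
        else:
            residues.append(base)
    return residues


if __name__ == "__main__":
    example = "AUGUUC"  # hypothetical fragment
    print(substitute_uridines(example))                 # all uridines replaced
    print(substitute_uridines(example, positions={1}))  # one uridine replaced
```

A list of labels is used rather than a string because a modified residue such as m1ψ has no single-letter code in the standard four-letter alphabet.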
[0173] In some embodiments, an mRNA comprises N1-methyl-pseudouridine (m1ψ) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid. [0174] In some embodiments, an mRNA comprises pseudouridine (ψ) substitutions at one or more or all uridine positions of the nucleic acid. [0175] In some embodiments, an mRNA comprises pseudouridine (ψ) substitutions at one or more or all uridine positions of the nucleic acid and 5-methyl cytidine substitutions at one or more or all cytidine positions of the nucleic acid. [0176] In some embodiments, an mRNA comprises uridine at one or more or all uridine positions of the nucleic acid. [0177] In some embodiments, mRNAs are uniformly modified (e.g., fully modified, modified throughout the entire sequence) for a particular modification. For example, a nucleic acid can be uniformly modified with N1-methyl-pseudouridine, meaning that all uridine residues in the mRNA sequence are replaced with N1-methyl-pseudouridine. Similarly, a nucleic acid can be uniformly modified for any type of nucleoside residue present in the sequence by replacement with a modified residue such as those set forth above. [0178] The nucleic acids may be partially or fully modified along the entire length of the molecule. For example, one or more or all of a given type of nucleotide (e.g., purine or pyrimidine, or any one or more or all of A, G, U, C) may be uniformly modified in a nucleic acid, or in a predetermined sequence region thereof (e.g., in the mRNA including or excluding the poly(A) tail). In some embodiments, all nucleotides X in a nucleic acid (or in a sequence region thereof) are modified nucleotides, wherein X may be any one of nucleotides A, G, U, C, or any one of the combinations A+G, A+U, A+C, G+U, G+C, U+C, A+G+U, A+G+C, G+U+C or A+G+C. [0179] The mRNAs may comprise one or more regions or parts which act or function as an untranslated region. 
Where mRNAs are designed to encode at least one polypeptide of interest, the nucleic acid may comprise one or more of these untranslated regions (UTRs). Wild-type untranslated regions of a nucleic acid are transcribed but not translated. In mRNA, the 5′ UTR starts at the transcription start site and continues to the start codon but does not include the start codon; whereas the 3′ UTR starts immediately following the stop codon and continues until the transcriptional termination signal. The regulatory features of a UTR can be incorporated into the polynucleotides to, among other things, enhance the stability of the molecule. The specific features can also be incorporated to ensure controlled down-regulation
of the transcript in case they are misdirected to undesired organ sites. A variety of 5′ UTR and 3′ UTR sequences are known and available in the art.
Untranslated regions
[0180] Untranslated regions (UTRs) are sections of a nucleic acid before a start codon (5′ UTR) and after a stop codon (3′ UTR) that are not translated. In some embodiments, a nucleic acid (e.g., a ribonucleic acid (RNA), e.g., a messenger RNA (mRNA)) comprising an open reading frame (ORF) encoding one or more proteins or peptides further comprises one or more UTR (e.g., a 5′ UTR or functional fragment thereof, a 3′ UTR or functional fragment thereof, or a combination thereof). [0181] A UTR can be homologous or heterologous to the coding region in a nucleic acid. In some embodiments, the UTR is homologous to the ORF encoding the one or more proteins. In some embodiments, the UTR is heterologous to the ORF encoding the one or more proteins. In some embodiments, the nucleic acid comprises two or more 5′ UTRs or functional fragments thereof, each of which has the same or different nucleotide sequences. In some embodiments, the nucleic acid comprises two or more 3′ UTRs or functional fragments thereof, each of which has the same or different nucleotide sequences. [0182] In some embodiments, the 5′ UTR or functional fragment thereof, 3′ UTR or functional fragment thereof, or any combination thereof is sequence optimized. [0183] In some embodiments, the 5′ UTR or functional fragment thereof, 3′ UTR or functional fragment thereof, or any combination thereof comprises at least one chemically modified nucleobase, e.g., 5-methoxyuracil. [0184] UTRs can have features that provide a regulatory role, e.g., increased or decreased stability, localization, and/or translation efficiency. A nucleic acid comprising a UTR can be administered to a cell, tissue, or organism, and one or more regulatory features can be measured using routine methods. 
In some embodiments, a functional fragment of a 5′ UTR or 3′ UTR comprises one or more regulatory features of a full length 5′ or 3′ UTR, respectively. [0185] Natural 5′ UTRs bear features that play roles in translation initiation. They harbor signatures like Kozak sequences that are commonly known to be involved in the process by which the ribosome initiates translation of many genes. 5′ UTRs also have been known to form secondary structures that are involved in elongation factor binding. [0186] By engineering the features typically found in abundantly expressed genes of specific target organs, one can enhance the stability and protein production of a nucleic acid. For example, introduction of 5′ UTR of liver-expressed mRNA, such as albumin, serum
amyloid A, Apolipoprotein A/B/E, transferrin, alpha fetoprotein, erythropoietin, or Factor VIII, can enhance expression of nucleic acids in hepatic cell lines or liver. Likewise, use of 5′ UTRs from other tissue-specific mRNA to improve expression in that tissue is possible for muscle (e.g., MyoD, Myosin, Myoglobin, Myogenin, Herculin), for endothelial cells (e.g., Tie-1, CD36), for myeloid cells (e.g., C/EBP, AML1, G-CSF, GM-CSF, CD11b, MSR, Fr-1, i-NOS), for leukocytes (e.g., CD45, CD18), for adipose tissue (e.g., CD36, GLUT4, ACRP30, adiponectin), and for lung epithelial cells (e.g., SP-A/B/C/D). [0187] In some embodiments, UTRs are selected from a family of transcripts whose proteins share a common function, structure, feature, or property. For example, an encoded polypeptide can belong to a family of proteins (i.e., that share at least one function, structure, feature, localization, origin, or expression pattern), which are expressed in a particular cell, tissue or at some time during development. The UTRs from any of the genes or mRNA can be swapped for any other UTR of the same or different family of proteins to create a new nucleic acid. [0188] In some embodiments, the 5′ UTR and the 3′ UTR can be heterologous. In some embodiments, the 5′ UTR can be derived from a different species than the 3′ UTR. In some embodiments, the 3′ UTR can be derived from a different species than the 5′ UTR. [0189] International Patent Application No. PCT/US2014/021522 (Publ. No. WO/2014/164253) provides a listing of exemplary UTRs that may be utilized in the nucleic acids as flanking regions to an ORF. This publication is incorporated by reference herein for this purpose. 
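The arrangement described above, an ORF flanked by a 5′ UTR and a 3′ UTR, with UTRs that can be swapped for others, can be modeled as a small immutable record. This is a sketch under stated assumptions: the class `MRNADesign`, its field names, the placeholder sequences, and the default tail length are all hypothetical and are not sequences or values from the disclosure or from the publication cited above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MRNADesign:
    """Hypothetical container for the parts of an mRNA construct."""
    utr5: str             # 5' UTR: everything before the start codon
    orf: str              # open reading frame, start codon through stop codon
    utr3: str             # 3' UTR: begins immediately after the stop codon
    poly_a_len: int = 100  # illustrative tail length only

    def sequence(self) -> str:
        """Concatenate 5' UTR, ORF, 3' UTR, and poly(A) tail in 5'-to-3' order."""
        return self.utr5 + self.orf + self.utr3 + "A" * self.poly_a_len

    def orf_span(self):
        """Return the (start, end) indices of the ORF in the full sequence."""
        start = len(self.utr5)
        return start, start + len(self.orf)


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    design = MRNADesign(utr5="GGGAAACC", orf="AUGCUGAAGUGA", utr3="CCUUGG", poly_a_len=10)
    start, end = design.orf_span()
    print(design.sequence()[start:end])

    # Swapping in a different (heterologous) 5' UTR leaves the ORF unchanged.
    swapped = replace(design, utr5="GGACUC")
    print(swapped.sequence())
```

Keeping the record frozen means a UTR swap always produces a new design object, so the original construct remains available for comparison.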
[0190] Additional exemplary UTRs that may be utilized in the nucleic acids include, but are not limited to, one or more 5′ UTRs and/or 3′ UTRs derived from the nucleic acid sequence of: a globin, such as an α- or β-globin (e.g., a Xenopus, mouse, rabbit, or human globin); a strong Kozak translational initiation signal; a CYBA (e.g., human cytochrome b-245 α polypeptide); an albumin (e.g., human albumin); a HSD17B4 (hydroxysteroid (17-β) dehydrogenase); a virus (e.g., a tobacco etch virus (TEV), a Venezuelan equine encephalitis virus (VEEV), a Dengue virus, a cytomegalovirus (CMV; e.g., CMV immediate early 1 (IE1)), a hepatitis virus (e.g., hepatitis B virus), a sindbis virus, or a PAV barley yellow dwarf virus); a heat shock protein (e.g., hsp70); a translation initiation factor (e.g., eIF4G); a glucose transporter (e.g., hGLUT1 (human glucose transporter 1)); an actin (e.g., human α or β actin); a GAPDH; a tubulin; a histone; a citric acid cycle enzyme; a topoisomerase (e.g., a 5′ UTR of a TOP gene lacking the 5′ TOP motif (the oligopyrimidine tract)); a ribosomal protein Large 32 (L32); a ribosomal protein (e.g., human or mouse ribosomal protein, such
as, for example, rps9); an ATP synthase (e.g., ATP5A1 or the β subunit of mitochondrial H+-ATP synthase); a growth hormone (e.g., bovine (bGH) or human (hGH)); an elongation factor (e.g., elongation factor 1 α1 (EEF1A1)); a manganese superoxide dismutase (MnSOD); a myocyte enhancer factor 2A (MEF2A); a β-F1-ATPase, a creatine kinase, a myoglobin, a granulocyte-colony stimulating factor (G-CSF); a collagen (e.g., collagen type I, alpha 2 (Col1A2), collagen type I, alpha 1 (Col1A1), collagen type VI, alpha 2 (Col6A2), collagen type VI, alpha 1 (Col6A1)); a ribophorin (e.g., ribophorin I (RPNI)); a low density lipoprotein receptor-related protein (e.g., LRP1); a cardiotrophin-like cytokine factor (e.g., Nnt1); calreticulin (Calr); a procollagen-lysine, 2-oxoglutarate 5-dioxygenase 1 (Plod1); and a nucleobindin (e.g., Nucb1). [0191] In some embodiments, the 5′ UTR is selected from the group consisting of a β-globin 5′ UTR; a 5′ UTR containing a strong Kozak translational initiation signal; a cytochrome b-245 α polypeptide (CYBA) 5′ UTR; a hydroxysteroid (17-β) dehydrogenase (HSD17B4) 5′ UTR; a Tobacco etch virus (TEV) 5′ UTR; a Venezuelan equine encephalitis virus (VEEV) 5′ UTR; a 5′ proximal open reading frame of rubella virus (RV) RNA encoding nonstructural proteins; a Dengue virus (DEN) 5′ UTR; a heat shock protein 70 (Hsp70) 5′ UTR; an eIF4G 5′ UTR; a GLUT1 5′ UTR; functional fragments thereof and any combination thereof. 
[0192] In some embodiments, the 3′ UTR is selected from the group consisting of a β-globin 3′ UTR; a CYBA 3′ UTR; an albumin 3′ UTR; a growth hormone (GH) 3′ UTR; a VEEV 3′ UTR; a hepatitis B virus (HBV) 3′ UTR; an α-globin 3′ UTR; a DEN 3′ UTR; a PAV barley yellow dwarf virus (BYDV-PAV) 3′ UTR; an elongation factor 1 α1 (EEF1A1) 3′ UTR; a manganese superoxide dismutase (MnSOD) 3′ UTR; a β subunit of mitochondrial H(+)-ATP synthase (β-mRNA) 3′ UTR; a GLUT1 3′ UTR; a MEF2A 3′ UTR; a β-F1-ATPase 3′ UTR; functional fragments thereof and combinations thereof. [0193] Wild-type UTRs derived from any gene or mRNA can be incorporated into the nucleic acids. In some embodiments, a UTR can be altered relative to a wild type or native UTR to produce a variant UTR, e.g., by changing the orientation or location of the UTR relative to the ORF; or by inclusion of additional nucleotides, deletion of nucleotides, swapping or transposition of nucleotides. In some embodiments, variants of 5′ or 3′ UTRs can be utilized, for example, mutants of wild type UTRs, or variants wherein one or more nucleotides are added to or removed from a terminus of the UTR. [0194] Additionally, one or more synthetic UTRs can be used in combination with one or more non-synthetic UTRs. See, e.g., Mandal and Rossi, Nat. Protoc. 2013 8(3):568-82, and
sequences available at www.addgene.org, the contents of each of which are incorporated herein by reference in their entirety. UTRs or portions thereof can be placed in the same orientation as in the transcript from which they were selected or can be altered in orientation or location. Hence, a 5′ and/or 3′ UTR can be inverted, shortened, lengthened, or combined with one or more other 5′ UTRs or 3′ UTRs.
[0195] In some embodiments, the nucleic acid may comprise multiple UTRs, e.g., a double, a triple or a quadruple 5′ UTR or 3′ UTR. For example, a double UTR comprises two copies of the same UTR either in series or substantially in series. For example, a double beta-globin 3′ UTR can be used (see, e.g., US 2010/0129877, the contents of which are incorporated herein by reference for this purpose).
[0196] The nucleic acids can comprise combinations of features. For example, the ORF can be flanked by a 5′ UTR that comprises a strong Kozak translational initiation signal and/or a 3′ UTR comprising an oligo(dT) sequence for templated addition of a polyA tail. A 5′ UTR can comprise a first nucleic acid fragment and a second nucleic acid fragment from the same and/or different UTRs (see, e.g., US 2010/0293625, herein incorporated by reference in its entirety for this purpose).
[0197] Other non-UTR sequences can be used as regions or subregions within the nucleic acids. For example, introns or portions of intron sequences can be incorporated into the nucleic acids. Incorporation of intronic sequences can increase protein production as well as nucleic acid expression levels. In some embodiments, the nucleic acid comprises an internal ribosome entry site (IRES) instead of or in addition to a UTR (see, e.g., Yakubov et al., Biochem. Biophys. Res. Commun. 2010. 394(1):189-193, the contents of which are incorporated herein by reference in their entirety). In some embodiments, the nucleic acid comprises an IRES instead of a 5′ UTR sequence.
In some embodiments, the nucleic acid comprises an IRES that is located between a 5′ UTR and an open reading frame. In some embodiments, the nucleic acid comprises an ORF encoding a viral capsid sequence. In some embodiments, the nucleic acid comprises a synthetic 5′ UTR in combination with a non-synthetic 3′ UTR.
[0198] In some embodiments, the UTR can also include at least one translation enhancer nucleic acid, translation enhancer element, or translational enhancer elements (collectively, “TEE”), which refers to nucleic acid sequences that increase the amount of polypeptide or protein produced from a polynucleotide. As a non-limiting example, the TEE can include those described in US2009/0226470, incorporated herein by reference in its entirety for this purpose, and others known in the art. As a non-limiting example, the TEE can be located
between the transcription promoter and the start codon. In some embodiments, the 5′ UTR comprises a TEE. In one aspect, a TEE is a conserved element in a UTR that can promote translational activity of a nucleic acid such as, but not limited to, cap-dependent or cap-independent translation. In one non-limiting example, the TEE comprises the TEE sequence in the 5′-leader of the Gtx homeodomain protein. See, e.g., Chappell et al., PNAS. 2004. 101:9590-9594, incorporated herein by reference in its entirety for this purpose.
Poly(A) tails
[0199] Some aspects relate to methods of producing RNAs containing one or more polyA tails. A “polyA tail” is a region of mRNA that is downstream, e.g., directly downstream (i.e., 3′), from the open reading frame and/or the 3′ UTR that contains multiple, consecutive adenosine monophosphates. A polyA tail may contain 10 to 300 adenosine monophosphates. For example, a polyA tail may contain 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 or 300 adenosine monophosphates. In some embodiments, a polyA tail contains 50 to 250 adenosine monophosphates. In a relevant biological setting (e.g., in cells, in vivo, etc.) the poly(A) tail functions to protect mRNA from enzymatic degradation, e.g., in the cytoplasm, and aids in transcription termination, export of the mRNA from the nucleus, and translation.
[0200] As used herein, “polyA-tailing efficiency” refers to the amount (e.g., expressed as a percentage) of mRNAs having a polyA tail that are produced by an IVT reaction using an input DNA relative to the total number of mRNAs produced in the IVT reaction using the input DNA. The polyA-tailing efficiency of an IVT reaction may vary, for example depending upon the RNA polymerase used, amount or purity of input DNA used, etc. In some embodiments, the polyA-tailing efficiency of an IVT reaction is greater than 85%, 90%, 95%, or 99.9%.
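The definition of polyA-tailing efficiency in [0200] amounts to a ratio of tailed mRNA to total mRNA, expressed as a percentage. The following is a minimal sketch of that arithmetic only; the function names and the choice of units are illustrative assumptions and are not part of the disclosure. Both inputs must be in the same unit (e.g., moles or number of molecules).

```python
def polya_tailing_efficiency(tailed_mrna: float, total_mrna: float) -> float:
    """Return the percentage of mRNA produced in an IVT reaction that has a polyA tail.

    Both arguments must use the same unit (hypothetical helper, not from the source).
    """
    if total_mrna <= 0:
        raise ValueError("total_mrna must be positive")
    if not 0 <= tailed_mrna <= total_mrna:
        raise ValueError("tailed_mrna must lie between 0 and total_mrna")
    return 100.0 * tailed_mrna / total_mrna


def exceeds_threshold(efficiency_pct: float, threshold_pct: float = 85.0) -> bool:
    """Check an efficiency against one of the thresholds listed in the text (85% by default)."""
    return efficiency_pct > threshold_pct


# Illustrative values only: 92 units of tailed mRNA out of 100 units in total.
efficiency = polya_tailing_efficiency(92.0, 100.0)
print(efficiency, exceeds_threshold(efficiency))
```

How the two quantities are measured (for example, by oligo-dT chromatography) is outside the scope of this sketch.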
Methods of calculating polyA-tailing efficiency are known, for example by determining the amount of polyA tail-containing mRNA relative to total mRNA produced in an IVT reaction by column chromatography (e.g., oligo-dT chromatography).
[0201] In some embodiments, at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.9% of RNAs in an RNA composition produced by a method described herein comprise a polyA tail. In some embodiments, at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.9% of each RNA in an RNA composition produced by a method described herein comprise a polyA tail. The efficiency (e.g., percentage of polyA tail-containing RNAs in an RNA composition) may be measured i) after
the IVT reaction and before purification, or ii) after the RNA composition has been purified (e.g., by chromatography, such as oligo-dT chromatography).
[0202] Unique polyA tail lengths provide certain advantages to nucleic acids. Generally, the length of a polyA tail, when present, is greater than 30 nucleotides in length. In another embodiment, the polyA tail is greater than 35 nucleotides in length (e.g., at least or greater than about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, 1,400, 1,500, 1,600, 1,700, 1,800, 1,900, 2,000, 2,500, or 3,000 nucleotides).
[0203] In some embodiments, the polyA tail is designed relative to the length of the overall nucleic acid or the length of a particular region of the nucleic acid. This design can be based on the length of a coding region, the length of a particular feature or region or based on the length of the ultimate product expressed from the nucleic acids.
[0204] In this context, the polyA tail can be 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% greater in length than the nucleic acid or feature thereof. The polyA tail can also be designed as a fraction of the nucleic acid to which it belongs. In this context, the polyA tail can be 10, 20, 30, 40, 50, 60, 70, 80, or 90% or more of the total length of the construct, a construct region, or the total length of the construct minus the polyA tail. Further, engineered binding sites and conjugation of nucleic acids for PolyA-binding protein can enhance expression.
In vitro transcription
[0205] Some aspects relate to mRNAs produced by “in vitro transcription” or IVT. IVT methods produce (e.g., synthesize) an RNA transcript (e.g., mRNA transcript) by contacting a DNA template (e.g., a first input DNA and a second input DNA) with an RNA polymerase (e.g., a T7 RNA polymerase, a T7 RNA polymerase variant, etc.)
under conditions that result in the production of the RNA transcript. IVT conditions typically require a purified DNA template containing a promoter, nucleoside triphosphates, a buffer system that includes dithiothreitol (DTT) and magnesium ions, and an RNA polymerase. The exact conditions used in the transcription reaction depend on the amount of RNA needed for a specific application. Typical IVT reactions are performed by incubating a DNA template with an RNA polymerase and nucleoside triphosphates, including GTP, ATP, CTP, and UTP (or nucleotide analogs) in a transcription buffer. An RNA transcript having a 5′ terminal guanosine triphosphate is produced from this reaction.
[0206] In some embodiments, IVT methods further comprise a step of separating (e.g., purifying) in vitro transcription products (e.g., mRNA) from other reaction components. In
some embodiments, the separating comprises performing chromatography on the IVT reaction mixture. In some embodiments, the method comprises reverse phase chromatography. In some embodiments, the method comprises reverse phase column chromatography. In some embodiments, the chromatography comprises size-based (e.g., length-based) chromatography. In some embodiments, the method comprises size exclusion chromatography. In some embodiments, the chromatography comprises oligo-dT chromatography.
Multivalent in vitro transcription (IVT)
[0207] Some aspects relate to multivalent in vitro transcription. Multivalent in vitro transcription refers to contacting two or more DNA templates (e.g., a first input DNA and a second input DNA) with an RNA polymerase (e.g., a T7 RNA polymerase) under conditions that result in the production of RNA transcripts.
[0208] Each input DNA (e.g., in a population of input DNA templates) in a co-IVT reaction may be obtained from a different source than other input DNAs. For example, each input DNA may be obtained from a different bacterial cell or population of bacterial cells. For example, in a co-IVT reaction having three populations of input DNAs, a first input DNA can be produced in bacterial cell population A, a second input DNA can be produced in bacterial cell population B, and a third input DNA can be produced in bacterial cell population C, where each of A, B, and C are not the same bacterial culture (e.g., co-cultured in the same container or plate). In another example, different input DNAs are obtained by separate synthesis reactions or produced by separate amplification reactions.
[0209] The amounts of input DNAs used in multivalent co-IVT reactions may be normalized. Normalization may be based, for example, on the molar masses, lengths, nucleotide contents, degradation rates, and/or purity of input DNAs. In some embodiments, normalization is based on the degradation rate of resulting RNAs.
[0210] Normalization may be based on the lowest level of a certain characteristic present among the input DNAs (e.g., lowest molar mass, degradation rate (e.g., of the input DNA and/or output RNA), nucleotide content, purity, and/or polyA-tailing efficiency). Alternatively, normalization may be based on the highest level of a certain characteristic present among the input DNAs (e.g., highest molar mass, degradation rate (e.g., of the input DNA and/or output RNA), nucleotide content, purity, and/or polyA-tailing efficiency). In some embodiments, normalization is based on the rate of RNA production from the input
DNAs (e.g., the highest rate of RNA production of an input DNA or the lowest rate of RNA production of an input DNA in a reaction mixture).
[0211] The amount of one or more input DNAs may be adjusted and/or normalized to improve production of RNA compositions having a pre-defined or desired ratio of RNA components. Adjusting and/or normalizing amounts of input DNAs may compensate for differences between input DNAs (e.g., large differences in lengths of two input DNAs, or different polyA tailing efficiencies) that can affect the ratio of RNAs in a multivalent RNA composition, thereby allowing for the production of RNA compositions having desired ratios of different RNAs. For example, the amount of two input DNAs present in a co-IVT reaction may be determined by selecting a desired molar ratio of a first RNA to a second RNA, calculating the mass of each DNA template necessary to achieve the same molar ratio between input DNAs, and combining input DNAs encoding each of the first and second RNAs in the same molar ratio.
[0212] The number of input DNAs (e.g., populations of input DNA molecules) used in an IVT reaction may vary, depending upon the number of different RNA molecules desired to be included in the multivalent RNA composition. An IVT reaction mixture may comprise 2 or more different input DNAs (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different input DNAs).
[0213] The concentration of each of the populations of DNA molecules may also vary.
[0214] The input DNAs may be added to an IVT reaction at a predefined DNA ratio, which may comprise a ratio between 2, 3, 4, 5, 6, 7, 8, 9, 10, or more different input DNAs (e.g., depending on the number of different RNAs in a composition).
[0215] The size of two or more input DNAs (e.g., DNAs in two or more different populations of input DNAs) may also vary.
[0216] The mass of each population of input DNA molecules in an IVT reaction may also vary.
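The mass calculation described in [0211] can be illustrated as follows. This is a minimal sketch, assuming double-stranded DNA templates whose molar mass is approximated from their length; the constants `AVG_BP_MASS` and `END_MASS` are commonly used approximations introduced here as assumptions, and the function names, template lengths, and total mass are hypothetical. The sketch does not account for the other normalization factors mentioned above (e.g., purity, degradation rate, or polyA-tailing efficiency).

```python
# Approximate average molar mass per base pair of dsDNA (g/mol) and an
# end-group correction (g/mol). Both are assumptions, not values from the source.
AVG_BP_MASS = 617.96
END_MASS = 36.04


def dsdna_molar_mass(length_bp: int) -> float:
    """Approximate molar mass (g/mol) of a double-stranded DNA of the given length."""
    return length_bp * AVG_BP_MASS + END_MASS


def template_masses(lengths_bp, molar_ratio, total_mass_ug):
    """Split total_mass_ug among templates so that their mole amounts follow molar_ratio.

    The mass of template i is proportional to molar_ratio[i] * molar_mass[i].
    """
    if len(lengths_bp) != len(molar_ratio):
        raise ValueError("lengths_bp and molar_ratio must have the same length")
    weights = [r * dsdna_molar_mass(n) for n, r in zip(lengths_bp, molar_ratio)]
    total_weight = sum(weights)
    return [total_mass_ug * w / total_weight for w in weights]


# Hypothetical example: two templates of 1,000 bp and 2,000 bp at a 1:1 molar ratio.
masses = template_masses([1000, 2000], [1.0, 1.0], total_mass_ug=3.0)
print(masses)  # the longer template needs roughly twice the mass
```

Because mass scales with length at a fixed number of moles, equal masses of templates with different lengths would give unequal molar amounts, which is the difference the normalization step is meant to compensate for.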
[0217] The molar ratio between populations of input DNA molecules in an IVT reaction may also vary.
[0218] Different input DNA molecules used in an IVT reaction may have a different length (e.g., comprise a different number of nucleotides).
[0219] A co-IVT reaction may include co-transcription of at least 2 different input DNAs (e.g., at least 2 of DNA A, B, C, D, E, F, G, H, I, J, etc.) at a ratio of A:B:C:D:E:F:G:H:I:J, wherein if DNA A is normalized to 1, one or more of DNA B, C, D, E, F, G, H, I, J, etc. can each independently be present at an amount (e.g., a concentration) that is from 0.01 to 100
times the amount (e.g., a concentration) of A. One or more of DNA B, C, D, E, F, G, H, I, or J may also be absent.
[0220] A multivalent RNA composition may be produced by combining RNA transcripts (e.g., mRNAs) from separate sources. For example, each of two or more DNA templates may be transcribed in separate IVT reactions, and combined to produce a multivalent RNA composition. RNAs may be combined in any desired amount to produce a multivalent RNA composition comprising two or more RNAs in a specific ratio.
Identification and Ratio Determination (IDR) sequences
[0221] In some embodiments, one or more nucleic acids comprise an Identification and Ratio Determination sequence. An Identification and Ratio Determination (IDR) sequence is a sequence of a biological molecule (e.g., nucleic acid or protein) that, when combined with the sequence of a target biological molecule, serves to identify the target biological molecule. Typically, an IDR sequence is a heterologous sequence that is incorporated within or appended to a sequence of a target biological molecule and can be used as a reference to identify the target molecule. Thus, in some embodiments, a nucleic acid (e.g., mRNA) comprises (i) a target sequence of interest (e.g., a coding sequence encoding a therapeutic and/or antigenic peptide or protein); and (ii) a unique IDR sequence.
[0222] An RNA species (e.g., RNA having a given coding sequence) may comprise an IDR sequence that differs from the IDR sequence of other RNA species (e.g., RNA(s) having different coding sequence(s)). Each IDR sequence thus identifies a particular RNA species, and so the abundance of IDR sequences may be measured to determine the abundance of each RNA species in a composition.
Use of distinct IDR sequences to identify RNA species allows for analysis of multivalent RNA compositions (e.g., containing multiple RNA species) containing RNA species with similar coding sequences and/or lengths, which could otherwise be difficult to distinguish using PCR- or chromatography-based analysis of full-length RNAs.
[0223] Each RNA species in a multivalent RNA composition may comprise an IDR sequence that is not a sequence isomer of an IDR sequence of another RNA species in a multivalent RNA composition (e.g., the IDR sequence does not have the same number of adenosine nucleotides, the same number of cytosine nucleotides, the same number of guanine nucleotides, and the same number of uracil nucleotides, as another IDR sequence in the composition, even if the order of nucleotides in those sequences differs). Having identical nucleotide compositions causes sequence isomers to have the same mass, presenting a challenge to
distinguishing sequence isomers using mass-based identification methods (e.g., mass spectrometry).
[0224] Each RNA species in a multivalent RNA composition may comprise an IDR sequence having a mass that differs from the mass of IDR sequences of each other RNA species in a multivalent RNA composition. For example, the mass of each IDR sequence may differ from the mass of other IDR sequences by at least 9 Da, at least 25 Da, or at least 50 Da. Use of IDR sequences with distinct masses allows RNA fragments comprising different IDR sequences to be distinguished using mass-based analysis methods (e.g., mass spectrometry), which do not require reverse transcription, amplification, or sequencing of RNAs.
[0225] Each RNA species in an RNA composition may comprise an IDR sequence with a different length. For example, each IDR sequence may have a length independently selected from 0 to 25 nucleotides. The length of a nucleic acid influences the rate at which the nucleic acid traverses a chromatography column, and so the use of IDR sequences of different lengths on different RNA species allows RNA fragments having different IDR sequences to be distinguished using chromatography-based methods (e.g., LC-UV).
[0226] IDR sequences may be chosen such that no IDR sequence comprises a start codon, ‘AUG’. Lack of a start codon in an IDR sequence prevents undesired translation of nucleotide sequences within and/or downstream from the IDR sequence.
[0227] IDR sequences may be chosen such that no IDR sequence comprises a recognition site for a restriction enzyme. In one example, no IDR sequence comprises a recognition site for XbaI, ‘UCUAG’. Lack of a recognition site for a restriction enzyme (e.g., XbaI recognition site ‘UCUAG’) allows the restriction enzyme to be used in generating and modifying a DNA template for in vitro transcription, without affecting the IDR sequence or sequence of the transcribed RNA.
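The IDR selection criteria in [0223] to [0227] can be combined into a simple screening routine. The following is a minimal sketch under stated assumptions: the residue masses in `RESIDUE_MASS` are approximate average values for unmodified nucleotide residues, and the function names, the 9 Da default, and the example sequences are illustrative only. Modified nucleotides, terminal groups, and the fragments actually generated for analysis are not modeled.

```python
from collections import Counter
from itertools import combinations

# Approximate average masses (Da) of unmodified nucleotide residues in an RNA
# chain. These values are assumptions used only for illustration.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}


def are_sequence_isomers(a: str, b: str) -> bool:
    """True if both sequences contain the same number of each nucleotide."""
    return Counter(a) == Counter(b)


def approx_mass(seq: str) -> float:
    """Approximate mass (Da) of a sequence as the sum of its residue masses."""
    return sum(RESIDUE_MASS[n] for n in seq)


def check_idr_set(idrs, min_mass_diff=9.0, max_len=25, forbidden=("AUG", "UCUAG")):
    """Return a list of human-readable problems found in a set of candidate IDRs."""
    problems = []
    for seq in idrs:
        if len(seq) > max_len:
            problems.append(f"{seq}: longer than {max_len} nucleotides")
        for motif in forbidden:
            if motif in seq:
                problems.append(f"{seq}: contains {motif}")
    for a, b in combinations(idrs, 2):
        if are_sequence_isomers(a, b):
            problems.append(f"{a}/{b}: sequence isomers (identical composition)")
        elif abs(approx_mass(a) - approx_mass(b)) < min_mass_diff:
            problems.append(f"{a}/{b}: mass difference below {min_mass_diff} Da")
    return problems


# Hypothetical candidates: the first two are isomers and the third contains AUG.
for problem in check_idr_set(["ACGU", "UGCA", "CAUGC"]):
    print(problem)
```

Note that cytidine and uridine residues differ by only about 1 Da, so two candidates that differ by a single C-to-U substitution would fail the mass-difference check even though they are not sequence isomers.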
Lipid Compositions
[0228] In some embodiments, the nucleic acids are formulated as a lipid composition, such as a composition comprising a lipid nanoparticle, a liposome, and/or a lipoplex. In some embodiments, nucleic acids are formulated as lipid nanoparticle (LNP) compositions. Lipid nanoparticles typically comprise amino lipid, non-cationic lipid, structural lipid, and PEG lipid components along with the nucleic acid cargo of interest. The lipid nanoparticles can be generated using components, compositions, and methods as are generally known in the art, see for example PCT/US2016/052352; PCT/US2016/068300; PCT/US2017/037551;
PCT/US2015/027400; PCT/US2016/047406; PCT/US2016000129; PCT/US2016/014280; PCT/US2017/038426; PCT/US2014/027077; PCT/US2014/055394; PCT/US2016/52117; PCT/US2012/069610; PCT/US2017/027492; PCT/US2016/059575; PCT/US2016/069491; PCT/US2016/069493; and PCT/US2014/66242, all of which are incorporated by reference herein in their entirety.
[0229] In some embodiments, the lipid nanoparticle comprises at least one ionizable amino lipid, at least one non-cationic lipid, at least one sterol, and/or at least one polyethylene glycol (PEG)-modified lipid.
[0230] In some embodiments, the lipid nanoparticle comprises a molar ratio of 20-60% ionizable amino lipid, 5-25% non-cationic lipid, 25-55% structural lipid, and 0.5-15% PEG-modified lipid.
[0231] In some embodiments, the lipid nanoparticle comprises a molar ratio of 20-60% ionizable amino lipid, 5-30% non-cationic lipid, 10-55% structural lipid, and 0.5-15% PEG-modified lipid.
[0232] In some embodiments, the lipid nanoparticle comprises 40-50 mol% ionizable lipid, optionally 45-50 mol%, for example, 45-46 mol%, 46-47 mol%, 47-48 mol%, 48-49 mol%, or 49-50 mol%, for example about 45 mol%, 45.5 mol%, 46 mol%, 46.5 mol%, 47 mol%, 47.5 mol%, 48 mol%, 48.5 mol%, 49 mol%, or 49.5 mol%.
[0233] In some embodiments, the lipid nanoparticle comprises 20-60 mol% ionizable amino lipid. For example, the lipid nanoparticle may comprise 20-50 mol%, 20-40 mol%, 20-30 mol%, 30-60 mol%, 30-50 mol%, 30-40 mol%, 40-60 mol%, 40-50 mol%, or 50-60 mol% ionizable amino lipid. In some embodiments, the lipid nanoparticle comprises 20 mol%, 30 mol%, 40 mol%, 50 mol%, or 60 mol% ionizable amino lipid. In some embodiments, the lipid nanoparticle comprises 35 mol%, 36 mol%, 37 mol%, 38 mol%, 39 mol%, 40 mol%, 41 mol%, 42 mol%, 43 mol%, 44 mol%, 45 mol%, 46 mol%, 47 mol%, 48 mol%, 49 mol%, 50 mol%, 51 mol%, 52 mol%, 53 mol%, 54 mol%, or 55 mol% ionizable amino lipid.
[0234] In some embodiments, the lipid nanoparticle comprises 45-55 mole percent (mol%) ionizable amino lipid. For example, the lipid nanoparticle may comprise 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 mol% ionizable amino lipid.
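The molar ranges recited in [0230] can be checked against a candidate formulation as sketched below. This is an illustrative sketch only: the dictionary keys, the function name, and the example composition are hypothetical, and the requirement that the four components sum to 100 mol% is an assumption added for the check rather than a statement from the text.

```python
# Ranges (mol%) taken from paragraph [0230]; the key names are hypothetical.
RANGES_0230 = {
    "ionizable_amino_lipid": (20.0, 60.0),
    "non_cationic_lipid": (5.0, 25.0),
    "structural_lipid": (25.0, 55.0),
    "peg_modified_lipid": (0.5, 15.0),
}


def check_lnp_composition(mol_pct, ranges=RANGES_0230, tol=1e-6):
    """Return a list of problems for a lipid composition given in mol%.

    Assumes the listed components make up the whole lipid fraction, so the
    values are expected to sum to 100 mol% (an assumption of this sketch).
    """
    problems = []
    total = sum(mol_pct.values())
    if abs(total - 100.0) > tol:
        problems.append(f"components sum to {total} mol%, expected 100")
    for name, (low, high) in ranges.items():
        value = mol_pct.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} mol% outside {low}-{high} mol%")
    return problems


# Hypothetical composition used only to exercise the check.
candidate = {
    "ionizable_amino_lipid": 50.0,
    "non_cationic_lipid": 10.0,
    "structural_lipid": 38.5,
    "peg_modified_lipid": 1.5,
}
print(check_lnp_composition(candidate))
```

The alternative ranges in [0231] (5-30% non-cationic lipid, 10-55% structural lipid) can be checked by passing a different `ranges` dictionary.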
Ionizable amino lipids
Formula (AI)
[0235] In some embodiments, the ionizable amino lipid is a compound of Formula (AI):
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched; wherein R’branched
denotes a point of attachment; wherein Raα, Raβ, Raγ, and Raδ are each independently selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH, wherein n is selected from the group consisting
wherein
denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-; R’ is a C1-12 alkyl or C2-12 alkenyl; l is selected from the group consisting of 1, 2, 3, 4, and 5; and m is selected from the group consisting of 5, 6, 7, 8, 9, 10, 11, 12, and 13.
In some embodiments of the compounds of Formula (AI), R’a is R’branched; R’branched is
denotes a point of attachment; Raα, Raβ, Raγ, and Raδ are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. [0236] In some embodiments of the compounds of Formula (AI), R’a is R’branched;
point of attachment; Raα, Raβ, Raγ, and Raδ are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 3; and m is 7. [0237] In some embodiments of the compounds of Formula (AI), R’a is R’branched;
denotes a point of attachment; Raα is C2-12 alkyl; Raβ, Raγ, and Raδ are each H; R2 and R3 are each C1-14 alkyl;
R10 NH(C1-6 alkyl); n2 is 2; R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. [0238] In some embodiments of the compounds of Formula (AI), R’a is R’branched;
point of attachment; Raα, Raβ, and Raδ are each H; Raγ is C2-12 alkyl; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. [0239] In some embodiments, the compound of Formula (AI) is selected from:
[chemical structures not reproduced].
[0240] In some embodiments, the ionizable amino lipid of Formula (AI) is a compound of Formula (AIa):
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched; wherein
denotes a point of attachment; wherein Raβ, Raγ, and Raδ are each independently selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
wherein
denotes a point of attachment; wherein
R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-; R’ is a C1-12 alkyl or C2-12 alkenyl; l is selected from the group consisting of 1, 2, 3, 4, and 5; and m is selected from the group consisting of 5, 6, 7, 8, 9, 10, 11, 12, and 13. [0241] In some embodiments, the ionizable amino lipid of Formula (AI) is a compound of Formula (AIb):
wherein Raα, Raβ, Raγ, and Raδ are each independently selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is -(CH2)nOH, wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-;
R’ is a C1-12 alkyl or C2-12 alkenyl; l is selected from the group consisting of 1, 2, 3, 4, and 5; and m is selected from the group consisting of 5, 6, 7, 8, 9, 10, 11, 12, and 13. [0242] In some embodiments of Formula (AI) or (AIb), R’a is R’branched; R’branched is
denotes a point of attachment; Raβ, Raγ, and Raδ are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. [0243] In some embodiments of Formula (AI) or (AIb), R’a is R’branched; R’branched is
denotes a point of attachment; Raβ, Raγ, and Raδ are each H; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 3; and m is 7. [0244] In some embodiments of Formula (AI) or (AIb), R’a is R’branched; R’branched is
denotes a point of attachment; Raβ and Raδ are each H; Raγ is C2-12 alkyl; R2 and R3 are each C1-14 alkyl; R4 is -(CH2)nOH; n is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. [0245] In some embodiments, the ionizable amino lipid of Formula (AI) is a compound of Formula (AIc):
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched; wherein
denotes a point of attachment; wherein Raα, Raβ, Raγ, and Raδ are each independently selected from the group consisting of H, C2-12 alkyl, and C2-12 alkenyl;
R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl;
wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are each independently selected from the group consisting of -C(O)O- and -OC(O)-; R’ is a C1-12 alkyl or C2-12 alkenyl; l is selected from the group consisting of 1, 2, 3, 4, and 5; and m is selected from the group consisting of 5, 6, 7, 8, 9, 10, 11, 12, and 13.
[0246] In some embodiments,
denotes a point of attachment; Raβ, Raγ, and Raδ are each H; Raα is C2-12 alkyl; R2 and R3 are
NH(C1-6 alkyl); n2 is 2; each R5 is H; each R6 is H; M and M’ are each -C(O)O-; R’ is a C1-12 alkyl; l is 5; and m is 7. [0247] In some embodiments, the compound of Formula (AIc) is:
.
Formula (AII)
[0248] In some embodiments, the ionizable amino lipid is a compound of Formula (AII):
wherein R’a is R’branched or R’cyclic; wherein
a Raγ and Raδ are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of Raγ and Raδ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; Rbγ and Rbδ are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of Rbγ and Rbδ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
wherein
denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12 alkyl or C2-12 alkenyl; Ya is a C3-6 carbocycle;
R*”a is selected from the group consisting of C1-15 alkyl and C2-15 alkenyl; and s is 2 or 3; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. [0249] In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-a):
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched or R’cyclic; wherein
wherein
denotes a point of attachment; Raγ and Raδ are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of Raγ and Raδ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; Rbγ and Rbδ are each independently selected from the group consisting of H, C1-12 alkyl, and C2-12 alkenyl, wherein at least one of Rbγ and Rbδ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12 alkyl or C2-12 alkenyl;
m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. [0250] In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-b):
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched or R’cyclic; wherein
wherein
denotes a point of attachment; Raγ and Rbγ are each independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
wherein
denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9.
[0251] In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-c):
wherein
denotes a point of attachment; wherein Raγ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; R’ is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. [0252] In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-d):
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched or R’cyclic; wherein
R’branched is: and R’b is: ; wherein
denotes a point of attachment; wherein Raγ and Rbγ are each independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
wherein denotes a point of attachment; wherein R10 is N(R)2; each R is independently selected from the group consisting of C1-6 alkyl, C2-3 alkenyl, and H; and n2 is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10; each R’ independently is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9; l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9. [0253] In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-e):
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched or R’cyclic; wherein
wherein
denotes a point of attachment; wherein Raγ is selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; R2 and R3 are each independently selected from the group consisting of C1-14 alkyl and C2-14 alkenyl; R4 is -(CH2)nOH wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; R’ is a C1-12 alkyl or C2-12 alkenyl; m is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9;
l is selected from 1, 2, 3, 4, 5, 6, 7, 8, and 9.
[0254] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each independently selected from 4, 5, and 6. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each 5.
[0255] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), each R’ independently is a C1-12 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), each R’ independently is a C2-5 alkyl.
[0256] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’b is:
and R2 and R3 are each independently a C1-14 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’b is:
and R2 and R3 are each independently a C6-10 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’b is:
are each a C8 alkyl. [0257] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII- c), (AII-d), or (AII-e), R’branched is:
is:
alkyl and R2 and R3 are each independently a C6-10 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
are each independently a C6-10 alkyl. In some embodiments of the compound of Formula (AII), (AII-
Raγ is a C2-6 alkyl, and R2 and R3 are each a C8 alkyl. [0258] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-
Rbγ are each a C1-12 alkyl. In some embodiments of the compound of Formula (AII), (AII-a),
, and Raγ and Rbγ are each a C2-6 alkyl.
[0259] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each independently selected from 4, 5, and 6 and each R’ independently is a C1-12 alkyl. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), m and l are each 5 and each R’ independently is a C2-5 alkyl.
[0260] In some embodiments of the compound of (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
are each independently selected from 4, 5, and 6, each R’ independently is a C1-12 alkyl, and Raγ and Rbγ are each a C1-12 alkyl. In some embodiments of the compound of Formula (AII), (AII-a),
, m and l are each 5, each R’ independently is a C2-5 alkyl, and Raγ and Rbγ are each a C2-6 alkyl.
[0261] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
are each independently selected from 4, 5, and 6, R’ is a C1-12 alkyl, Raγ is a C1-12 alkyl and R2 and R3 are each independently a C6-10 alkyl.
[0262] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
are each 5, R’ is a C2-5 alkyl, Raγ is a C2-6 alkyl, and R2 and R3 are each a C8 alkyl. [0263] In some embodiments of the compound of (AII), (AII-a), (AII-b), (AII-c), (AII-d),
wherein R10 is NH(C1-6 alkyl) and n2 is 2. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e),
wherein R10 is NH(CH3) and n2 is 2.
[0264] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
each independently selected from 4, 5, and 6, each R’ independently is a C1-12 alkyl, Raγ and Rbγ are each a C1-12 alkyl,
wherein R10 is NH(C1-6 alkyl), and n2 is 2. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
each 5, each R’ independently is a C2-5 alkyl, Raγ and Rbγ are each a C2-6 alkyl, and R4 is
, wherein R10 is NH(CH3) and n2 is 2.
[0265] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
is:
are each independently selected from 4, 5, and 6, R’ is a C1-12 alkyl, R2 and R3 are each independently a C6-10 alkyl, Raγ is a C1-12 alkyl,
wherein R10 is NH(C1-6 alkyl) and n2 is 2. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
are each 5, R’ is a C2-5 alkyl, Raγ is a C2-6 alkyl, R2 and R3 are each a C8 alkyl, and R4 is
, wherein R10 is NH(CH3) and n2 is 2.
[0266] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R4 is -(CH2)nOH and n is 2, 3, or 4. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R4 is -(CH2)nOH and n is 2.
[0267] In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
each independently selected from 4, 5, and 6, each R’ independently is a C1-12 alkyl, Raγ and Rbγ are each a C1-12 alkyl, R4 is -(CH2)nOH, and n is 2, 3, or 4. In some embodiments of the compound of Formula (AII), (AII-a), (AII-b), (AII-c), (AII-d), or (AII-e), R’branched is:
, m and l are each 5, each R’ independently is a C2-5 alkyl, Raγ and Rbγ are each a C2-6 alkyl, R4 is -(CH2)nOH, and n is 2. [0268] In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-f):
its N-oxide, or a salt or isomer thereof, wherein R’a is R’branched or R’cyclic; wherein
wherein
denotes a point of attachment; Raγ is a C1-12 alkyl; R2 and R3 are each independently a C1-14 alkyl; R4 is -(CH2)nOH wherein n is selected from the group consisting of 1, 2, 3, 4, and 5; R’ is a C1-12 alkyl; m is selected from 4, 5, and 6; and l is selected from 4, 5, and 6. [0269] In some embodiments of the compound of Formula (AII-f), m and l are each 5, and n is 2, 3, or 4.
[0270] In some embodiments of the compound of Formula (AII-f) R’ is a C2-5 alkyl, Raγ is a C2-6 alkyl, and R2 and R3 are each a C6-10 alkyl. [0271] In some embodiments of the compound of Formula (AII-f), m and l are each 5, n is 2, 3, or 4, R’ is a C2-5 alkyl, Raγ is a C2-6 alkyl, and R2 and R3 are each a C6-10 alkyl. [0272] In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-g):
its N-oxide, or a salt or isomer thereof; wherein Raγ is a C2-6 alkyl; R’ is a C2-5 alkyl; and R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
wherein denotes a point of attachment, R10 is NH(C1-6 alkyl), and n2 is selected from the group consisting of 1, 2, and 3. [0273] In some embodiments, the ionizable amino lipid of Formula (AII) is a compound of Formula (AII-h):
its N-oxide, or a salt or isomer thereof; wherein Raγ and Rbγ are each independently a C2-6 alkyl; each R’ independently is a C2-5 alkyl; and R4 is selected from the group consisting of -(CH2)nOH wherein n is selected from the group consisting
wherein denotes a point of attachment, R10 is NH(C1-6 alkyl), and n2 is selected from the group consisting of 1, 2, and 3. [0274] In some embodiments of the compound of Formula (AII-g) or (AII-h), R4 is
, wherein R10 is NH(CH3) and n2 is 2.
[0275] In some embodiments of the compound of Formula (AII-g) or (AII-h), R4 is -(CH2)2OH.
Formula (AIII) [0276] In some embodiments, the ionizable amino lipids may be one or more of compounds of Formula (AIII):
or their N-oxides, or salts or isomers thereof, wherein: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of hydrogen, a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a carbocycle, heterocycle, -OR, -O(CH2)nN(R)2, -C(O)OR, -OC(O)R, -CX3, -CX2H, -CXH2, -CN, -N(R)2, -C(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)C(O)N(R)2, -N(R)C(S)N(R)2, -N(R)R8, -N(R)S(O)2R8, -O(CH2)nOR, -N(R)C(=NR9)N(R)2, -N(R)C(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, -N(OR)C(O)R, -N(OR)S(O)2R, -N(OR)C(O)OR, -N(OR)C(O)N(R)2, -N(OR)C(S)N(R)2, -N(OR)C(=NR9)N(R)2, -N(OR)C(=CHR9)N(R)2, -C(=NR9)N(R)2,
-C(=NR9)R, -C(O)N(R)OR, and -C(R)N(R)2C(O)OR, and each n is independently selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group, in which M” is a bond, C1-13 alkyl or C2-13 alkenyl; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; R8 is selected from the group consisting of C3-6 carbocycle and heterocycle; R9 is selected from the group consisting of H, CN, NO2, C1-6 alkyl, -OR, -S(O)2R, -S(O)2N(R)2, C2-6 alkenyl, C3-6 carbocycle and heterocycle; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-15 alkyl and C3-15 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13; and wherein when R4 is -(CH2)nQ, -(CH2)nCHQR, -CHQR, or -CQ(R)2, then (i) Q is not -N(R)2 when n is 1, 2, 3, 4 or 5, or (ii) Q is not 5, 6, or 7-membered heterocycloalkyl when n is 1 or 2.
[0277] In some embodiments, another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’;
R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heteroaryl having one or more heteroatoms selected from N, O, and S, -OR, -O(CH2)nN(R)2, -C(O)OR, -OC(O)R, -CX3, -CX2H, -CXH2, -CN, -C(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)C(O)N(R)2, -N(R)C(S)N(R)2, -CRN(R)2C(O)OR, -N(R)R8, -O(CH2)nOR, -N(R)C(=NR9)N(R)2, -N(R)C(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, -N(OR)C(O)R, -N(OR)S(O)2R, -N(OR)C(O)OR, -N(OR)C(O)N(R)2, -N(OR)C(S)N(R)2, -N(OR)C(=NR9)N(R)2, -N(OR)C(=CHR9)N(R)2, -C(=NR9)N(R)2, -C(=NR9)R, -C(O)N(R)OR, and a 5- to 14-membered heterocycloalkyl having one or more heteroatoms selected from N, O, and S which is substituted with one or more substituents selected from oxo (=O), OH, amino, mono- or di-alkylamino, and C1-3 alkyl, and each n is independently selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; R8 is selected from the group consisting of C3-6 carbocycle and heterocycle; R9 is selected from the group consisting of H, CN, NO2, C1-6 alkyl, -OR, -S(O)2R, -S(O)2N(R)2, C2-6 alkenyl, C3-6 carbocycle and heterocycle; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group 
consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl;
each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof.
[0278] In some embodiments, another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heterocycle having one or more heteroatoms selected from N, O, and S, -OR, -O(CH2)nN(R)2, -C(O)OR, -OC(O)R, -CX3, -CX2H, -CXH2, -CN, -C(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)C(O)N(R)2, -N(R)C(S)N(R)2, -CRN(R)2C(O)OR, -N(R)R8, -O(CH2)nOR, -N(R)C(=NR9)N(R)2, -N(R)C(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, -N(OR)C(O)R, -N(OR)S(O)2R, -N(OR)C(O)OR, -N(OR)C(O)N(R)2, -N(OR)C(S)N(R)2, -N(OR)C(=NR9)N(R)2, -N(OR)C(=CHR9)N(R)2, -C(=NR9)R, -C(O)N(R)OR, and -C(=NR9)N(R)2, and each n is independently selected from 1, 2, 3, 4, and 5; and when Q is a 5- to 14-membered heterocycle and (i) R4 is -(CH2)nQ in which n is 1 or 2, or (ii) R4 is -(CH2)nCHQR in which n is 1, or (iii) R4 is -CHQR, and -CQ(R)2, then Q is either a 5- to 14-membered heteroaryl or 8- to 14-membered heterocycloalkyl; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H;
M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; R8 is selected from the group consisting of C3-6 carbocycle and heterocycle; R9 is selected from the group consisting of H, CN, NO2, C1-6 alkyl, -OR, -S(O)2R, -S(O)2N(R)2, C2-6 alkenyl, C3-6 carbocycle and heterocycle; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof.
[0279] In some embodiments, another subset of compounds of Formula (AIII) includes those in which: R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of a C3-6 carbocycle, -(CH2)nQ, -(CH2)nCHQR, -CHQR, -CQ(R)2, and unsubstituted C1-6 alkyl, where Q is selected from a C3-6 carbocycle, a 5- to 14-membered heteroaryl having one or more heteroatoms selected from N, O, and S, -OR, -O(CH2)nN(R)2, -C(O)OR, -OC(O)R, -CX3, -CX2H, -CXH2, -CN, -C(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)C(O)N(R)2, -N(R)C(S)N(R)2, -CRN(R)2C(O)OR, -N(R)R8,
-O(CH2)nOR, -N(R)C(=NR9)N(R)2, -N(R)C(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, -N(OR)C(O)R, -N(OR)S(O)2R, -N(OR)C(O)OR, -N(OR)C(O)N(R)2, -N(OR)C(S)N(R)2, -N(OR)C(=NR9)N(R)2, -N(OR)C(=CHR9)N(R)2, -C(=NR9)R, -C(O)N(R)OR, and -C(=NR9)N(R)2, and each n is independently selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; R8 is selected from the group consisting of C3-6 carbocycle and heterocycle; R9 is selected from the group consisting of H, CN, NO2, C1-6 alkyl, -OR, -S(O)2R, -S(O)2N(R)2, C2-6 alkenyl, C3-6 carbocycle and heterocycle; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof. 
[0280] In some embodiments, another subset of compounds of Formula (AIII) includes those in which R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of H, C2-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle;
R4 is -(CH2)nQ or -(CH2)nCHQR, where Q is -N(R)2, and n is selected from 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C1-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof. [0281] In some embodiments, another subset of compounds of Formula (AIII) includes those in which R1 is selected from the group consisting of C5-30 alkyl, C5-20 alkenyl, -R*YR”, -YR”, and -R”M’R’; R2 and R3 are independently selected from the group consisting of C1-14 alkyl, C2-14 alkenyl, -R*YR”, -YR”, and -R*OR”, or R2 and R3, together with the atom to which they are attached, form a heterocycle or carbocycle; R4 is selected from the group consisting of -(CH2)nQ, -(CH2)nCHQR, -CHQR, and -CQ(R)2, where Q is -N(R)2, and n is selected from 1, 2, 3, 4, and 5; each R5 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H;
each R6 is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; M and M’ are independently selected from -C(O)O-, -OC(O)-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -S-S-, an aryl group, and a heteroaryl group; R7 is selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R is independently selected from the group consisting of C1-3 alkyl, C2-3 alkenyl, and H; each R’ is independently selected from the group consisting of C1-18 alkyl, C2-18 alkenyl, -R*YR”, -YR”, and H; each R” is independently selected from the group consisting of C3-14 alkyl and C3-14 alkenyl; each R* is independently selected from the group consisting of C1-12 alkyl and C1-12 alkenyl; each Y is independently a C3-6 carbocycle; each X is independently selected from the group consisting of F, Cl, Br, and I; and m is selected from 5, 6, 7, 8, 9, 10, 11, 12, and 13, or salts or isomers thereof. [0282] In certain embodiments, a subset of compounds of Formula (AIII) includes those of Formula (AIII-A):
or its N-oxide, or a salt or isomer thereof, wherein l is selected from 1, 2, 3, 4, and 5; m is selected from 5, 6, 7, 8, and 9; M1 is a bond or M’; R4 is hydrogen, unsubstituted C1-3 alkyl, or -(CH2)nQ, in which Q is -OH, -NHC(S)N(R)2, -NHC(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)R8, -NHC(=NR9)N(R)2, -NHC(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, heteroaryl or heterocycloalkyl; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl. For example, m is 5, 7, or 9. For example, Q is OH, -NHC(S)N(R)2, or -NHC(O)N(R)2. For example, Q is -N(R)C(O)R, or -N(R)S(O)2R.
[0283] In certain embodiments, a subset of compounds of Formula (AIII) includes those of Formula (AIII-B):
or its N-oxide, or a salt or isomer thereof in which all variables are as defined herein. For example, m is selected from 5, 6, 7, 8, and 9; R4 is hydrogen, unsubstituted C1-3 alkyl, or -(CH2)nQ, in which Q is H, -NHC(S)N(R)2, -NHC(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)R8, -NHC(=NR9)N(R)2, -NHC(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, heteroaryl or heterocycloalkyl; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl. For example, m is 5, 7, or 9. For example, Q is OH, -NHC(S)N(R)2, or -NHC(O)N(R)2. For example, Q is -N(R)C(O)R, or -N(R)S(O)2R. [0284] In certain embodiments, a subset of compounds of Formula (AIII) includes those of Formula (AIII-C):
or its N-oxide, or a salt or isomer thereof, wherein l is selected from 1, 2, 3, 4, and 5; M1 is a bond or M’; R4 is hydrogen, unsubstituted C1-3 alkyl, or -(CH2)nQ, in which n is 2, 3, or 4, and Q is OH, -NHC(S)N(R)2, -NHC(O)N(R)2, -N(R)C(O)R, -N(R)S(O)2R, -N(R)R8, -NHC(=NR9)N(R)2, -NHC(=CHR9)N(R)2, -OC(O)N(R)2, -N(R)C(O)OR, heteroaryl or heterocycloalkyl; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl.
[0285] In some embodiments, the compounds of Formula (AIII) are of Formula (AIII-D),
or their N-oxides, or salts or isomers thereof, wherein R4 is as described herein.
[0286] In another embodiment, the compounds of Formula (AIII) are of Formula (AIII-E),
or their N-oxides, or salts or isomers thereof, wherein R4 is as described herein. [0287] In another embodiment, the compounds of Formula (AIII) are of Formula (AIII-F) or (AIII-G):
or their N-oxides, or salts or isomers thereof, wherein R4 is as described herein. [0288] In another embodiment, the compounds of Formula (AIII) are of Formula (AIII-H):
their N-oxides, or salts or isomers thereof,
wherein M is -C(O)O- or -OC(O)-, M” is C1-6 alkyl or C2-6 alkenyl, R2 and R3 are independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl, and n is selected from 2, 3, and 4.
[0289] In a further embodiment, the compounds of Formula (AIII) are of Formula (AIII-I):
or their N-oxides, or salts or isomers thereof, wherein n is 2, 3, or 4; and m, R’, R”, and R2 through R6 are as described herein. For example, each of R2 and R3 may be independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl. [0290] In some embodiments, an ionizable amino lipid comprises a compound having structure:
(Compound 1). [0291] In some embodiments, an ionizable amino lipid comprises a compound having structure:
(Compound 2).
[0292] In a further embodiment, the compounds of Formula (AIII) are of Formula (AIII-J),
(AIII-J), or their N-oxides, or salts or isomers thereof, wherein l is selected from 1, 2, 3, 4, and 5; m is selected from 5, 6, 7, 8, and 9; M1 is a bond or M’; M and M’ are independently selected from -C(O)O-, -OC(O)-, -OC(O)-M”-C(O)O-, -C(O)N(R’)-, -P(O)(OR’)O-, -S-S-, an aryl group, and a heteroaryl group; and R2 and R3 are independently selected from the group consisting of H, C1-14 alkyl, and C2-14 alkenyl. For example, M” is C1-6 alkyl (e.g., C1-4 alkyl) or C2-6 alkenyl (e.g., C2-4 alkenyl). For example, R2 and R3 are independently selected from the group consisting of C5-14 alkyl and C5-14 alkenyl.
[0293] In some embodiments, the ionizable amino lipids are one or more of the compounds described in U.S. Application Nos. 62/220,091, 62/252,316, 62/253,433, 62/266,460, 62/333,557, 62/382,740, 62/393,940, 62/471,937, 62/471,949, 62/475,140, and 62/475,166, and PCT Application No. PCT/US2016/052352.
[0294] The central amine moiety of a lipid according to Formula (AIII), (AIII-A), (AIII-B), (AIII-C), (AIII-D), (AIII-E), (AIII-F), (AIII-G), (AIII-H), (AIII-I), or (AIII-J) may be protonated at a physiological pH. Thus, a lipid may have a positive or partial positive charge at physiological pH. Such amino lipids may be referred to as cationic lipids, ionizable lipids, cationic amino lipids, or ionizable amino lipids. Amino lipids may also be zwitterionic, i.e., neutral molecules having both a positive and a negative charge.
Formula (AIV) [0295] In some embodiments, the ionizable amino lipids may be one or more of compounds of formula (AIV),
or salts or isomers thereof, wherein
t is 1 or 2; A1 and A2 are each independently selected from CH or N; Z is CH2 or absent wherein when Z is CH2, the dashed lines (1) and (2) each represent a single bond; and when Z is absent, the dashed lines (1) and (2) are both absent; R1, R2, R3, R4, and R5 are independently selected from the group consisting of C5-20 alkyl, C5-20 alkenyl, -R”MR’, -R*YR”, -YR”, and -R*OR”; RX1 and RX2 are each independently H or C1-3 alkyl; each M is independently selected from the group consisting of -C(O)O-, -OC(O)-, -OC(O)O-, -C(O)N(R’)-, -N(R’)C(O)-, -C(O)-, -C(S)-, -C(S)S-, -SC(S)-, -CH(OH)-, -P(O)(OR’)O-, -S(O)2-, -C(O)S-, -SC(O)-, an aryl group, and a heteroaryl group; M* is C1-C6 alkyl, W1 and W2 are each independently selected from the group consisting of -O- and -N(R6)-; each R6 is independently selected from the group consisting of H and C1-5 alkyl; X1, X2, and X3 are independently selected from the group consisting of a bond, -CH2-, -(CH2)2-, -CHR-, -CHY-, -C(O)-, -C(O)O-, -OC(O)-, -(CH2)n-C(O)-, -C(O)-(CH2)n-, -(CH2)n-C(O)O-, -OC(O)-(CH2)n-, -(CH2)n-OC(O)-, -C(O)O-(CH2)n-, -CH(OH)-, -C(S)-, and -CH(SH)-; each Y is independently a C3-6 carbocycle; each R* is independently selected from the group consisting of C1-12 alkyl and C2-12 alkenyl; each R is independently selected from the group consisting of C1-3 alkyl and a C3-6 carbocycle;
each R’ is independently selected from the group consisting of C1-12 alkyl, C2-12 alkenyl, and H; each R” is independently selected from the group consisting of C3-12 alkyl, C3-12 alkenyl and -R*MR’; and n is an integer from 1-6; wherein when ring
then i) at least one of X1, X2, and X3 is not -CH2-; and/or ii) at least one of R1, R2, R3, R4, and R5 is -R”MR’. [0296] In some embodiments, the compound is of any of formulae (AIVa)-(AIVh):
(AIVe), (AIVf), (AIVg), or (AIVh). [0297] In some embodiments, the ionizable amino lipid is
salt thereof. [0298] The central amine moiety of a lipid according to Formula (AIV), (AIVa), (AIVb), (AIVc), (AIVd), (AIVe), (AIVf), (AIVg), or (AIVh) may be protonated at a physiological pH. Thus, a lipid may have a positive or partial positive charge at physiological pH. Formula (AV) [0299] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, tautomer, or stereoisomer thereof, wherein: R1 is optionally substituted C1-C24 alkyl or optionally substituted C2-C24 alkenyl; R2 and R3 are each independently optionally substituted C1-C36 alkyl;
R4 and R5 are each independently optionally substituted C1-C6 alkyl, or R4 and R5 join, along with the N to which they are attached, to form a heterocyclyl or heteroaryl; L1, L2, and L3 are each independently optionally substituted C1-C18 alkylene; G1 is a direct bond, -(CH2)nO(C=O)-, -(CH2)n(C=O)O-, or -(C=O)-; G2 and G3 are each independently -(C=O)O- or -O(C=O)-; and n is an integer greater than 0.
Formula (AVI) [0300] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, tautomer, or stereoisomer thereof, wherein: G1 is -N(R3)R4 or -OR5; R1 is optionally substituted branched, saturated or unsaturated C12-C36 alkyl; R2 is optionally substituted branched or unbranched, saturated or unsaturated C12-C36 alkyl when L is -C(=O)-; or R2 is optionally substituted branched or unbranched, saturated or unsaturated C4-C36 alkyl when L is C6-C12 alkylene, C6-C12 alkenylene, or C2-C6 alkynylene; R3 and R4 are each independently H, optionally substituted branched or unbranched, saturated or unsaturated C1-C6 alkyl; or R3 and R4 are each independently optionally substituted branched or unbranched, saturated or unsaturated C1-C6 alkyl when L is C6-C12 alkylene, C6-C12 alkenylene, or C2-C6 alkynylene; or R3 and R4, together with the nitrogen to which they are attached, join to form a heterocyclyl; R5 is H or optionally substituted C1-C6 alkyl; L is -C(=O)-, C6-C12 alkylene, C6-C12 alkenylene, or C2-C6 alkynylene; and n is an integer from 1 to 12.
Formula (AVII) [0301] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt thereof, wherein: each R1a is independently hydrogen, R1c, or R1d; each R1b is independently R1c or R1d; each R1c is independently -[CH2]2C(O)X1R3; each R1d is independently -C(O)R4; each R2 is independently -[C(R2a)2]cR2b; each R2a is independently hydrogen or C1-C6 alkyl;
each R3 and R4 is independently C6-C30 aliphatic; each I.3 is independently C1-C10 alkylene; each B is independently hydrogen or an ionizable nitrogen-containing group; each X1 is independently a covalent bond or O; each a is independently an integer of 1-10; each b is independently an integer of 1-10; and each c is independently an integer of 1-10. Formula (AVIII) [0302] In some embodiments, the lipid nanoparticle comprises a lipid having the structure: (AVIII), or a pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein: X is N, and Y is absent; or X is CR, and Y is NR;
L1 is -O(C-O)R1, -(C=O)OR1, -C(=O)R1, -OR1, -S(O)xR1, -S-SR1, -C(=O)SR1, - SC(=O)R1, -NRaC(=O)R1, -C(=O)NRbRc, -NRaC(=O)NRbRc, -OC(=O)NRbRc, or - NRaC(=O)OR1; L2 is -O(C=O)R2, -(C=O)OR2, -C(=O)R2, -OR2, -S(O)xR2, -S-SR2, -C(=O)SR2, - SC(=O)R2, -NRdC(=O)R2, -C(=O)NReRf, -NRdC(=O)NReRf, -OC(=O)NReRf; - NRdC(=O)OR2 or a direct bond to R2;
G1 and G2 are each independently C2-C12 alkylene or C2-C12 alkenylene; G3 is C1-C24 alkylene, C2-C24 alkenylene, C1-C24 heteroalkylene or C2- C24 heteroalkenylene when X is CR, and Y is NR; and G3 is C1-C24 heteroalkylene or C2- C24 heteroalkenylene when X is N, and Y is absent; Ra, Rb, Rd and Re are each independently H or C1-C12 alkyl or C1-C12 alkenyl; Rc and Rf are each independently C1-C12 alkyl or C2-C12 alkenyl; each R is independently H or C1-C12 alkyl; R1, R2 and R3 are each independently C1-C24 alkyl or C2-C24 alkenyl; and x is 0, 1 or 2, and wherein each alkyl, alkenyl, alkylene, alkenylene, heteroalkylene and heteroalkenylene is independently substituted or unsubstituted unless otherwise specified. Formula (AIX) [0303] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, tautomer, prodrug or stereoisomer thereof, wherein: L1 and L2 are each independently -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)x-, -S-S-, -C(=O)S-, -SC(=O)-, -NRaC(=O)-, -C(=O)NRa-, -NRaC(=O)NRa-, -OC(=O)NRa-, -NRaC(=O)O- or a direct bond; G1 is C1-C2 alkylene, -(C=O)-, -O(C=O)-, -SC(=O)-, -NRaC(=O)- or a direct bond; G2 is -C(=O)-, -(C=O)O-, -C(=O)S-, -C(=O)NRa- or a direct bond;
G3 is C1-C6 alkylene; Ra is H or C1-C12 alkyl; R1a and R1b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R1a is H or C1-C12 alkyl, and R1b together with the carbon atom to which it is bound is taken together with an adjacent R1b and the carbon atom to which it is bound to form a carbon-carbon double bond; R2a and R2b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R2a is H or C1-C12 alkyl, and R2b together with the carbon atom to which it is bound is taken together with an adjacent R2b and the carbon atom to which it is bound to form a carbon-carbon double bond; R3a and R3b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R3a is H or C1-C12 alkyl, and R3b together with the carbon atom to which it is bound is taken together with an adjacent R and the carbon atom to which it is bound to form a carbon-carbon double bond; R4A and R4B are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R4A is H or C1-C12 alkyl, and R4B together with the carbon atom to which it is bound is taken together with an adjacent R4B and the carbon atom to which it is bound to form a carbon-carbon double bond; R5 and R6 are each independently H or methyl; R7 is H or C1-C20 alkyl; R8 is OH, -N(R9)(C=O)R10, -(C=O)NR9R10, -NR9R10, -(C=O)OR11 or -O(C=O)R11, provided that G3 is C4-C6 alkylene when R8 is -NR9R10; R9 and R10 are each independently H or C1-C12 alkyl; R11 is aralkyl; a, b, c and d are each independently an integer from 1 to 24; and x is 0, 1 or 2, wherein each alkyl, alkylene and aralkyl is optionally substituted.
Formula (AX) [0304] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein:
X and X' are each independently N or CR; Y and Y' are each independently absent, -O(C=O)-, -(C=O)O- or NR, provided that: a) Y is absent when X is N; b) Y' is absent when X' is N; c) Y is -O(C=O)-, -(C=O)O- or NR when X is CR; and d) Y' is -O(C=O)-, -(C=O)O- or NR when X' is CR; L1 and L1' are each independently -O(C=O)R1, -(C=O)OR1, -C(=O)R1, -OR1, -S(O)zR1, -S-SR1, -C(=O)SR1, -SC(=O)R1, -NRaC(=O)R1, -C(=O)NRbRc, -NRaC(=O)NRbRc, -OC(=O)NRbRc or -NRaC(=O)OR1; L2 and L2' are each independently -O(C=O)R2, -(C=O)OR2, -C(=O)R2, -OR2, -S(O)zR2, -S-SR2, -C(=O)SR2, -SC(=O)R2, -NRdC(=O)R2, -C(=O)NReRf, -NRdC(=O)NReRf, -OC(=O)NReRf, -NRdC(=O)OR2 or a direct bond to R2; G1, G1', G2 and G2' are each independently C2-C12 alkylene or C2-C12 alkenylene; G is C2-C24 heteroalkylene or C2-C24 heteroalkenylene; Ra, Rb, Rd and Re are, at each occurrence, independently H, C1-C12 alkyl or C2-C12 alkenyl; Rc and Rf are, at each occurrence, independently C1-C12 alkyl or C2-C12 alkenyl; R is, at each occurrence, independently H or C1-C12 alkyl; R1 and R2 are, at each occurrence, independently branched C6-C24 alkyl or branched C6-C24 alkenyl; z is 0, 1 or 2, and wherein each alkyl, alkenyl, alkylene, alkenylene, heteroalkylene and heteroalkenylene is independently substituted or unsubstituted unless otherwise specified.
Formula (AXI) [0305] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein: L1 is -O(C=O)R1, -(C=O)OR1, -C(=O)R1, -OR1, -S(O)xR1, -S-SR1, - C(=O)SR1, -SC(=O)R1, -NRaC(=O)R1, -C(=O)NRbRc, -NRaC(=O)NRbRc, -OC(=O)NRbRc or - NRaC(=O)OR1;
L2 is -O(C=O)R2, -(C=O)OR2, -C(=O)R2, -OR2, -S(O)xR2, -S-SR2, - C(=O)SR2, -SC(=O)R2, -NRdC(=O)R2, -C(=O)NReRf, -NRdC(=O)NReRf, -OC(=O)NReRf; -NRdC(=O)OR2 or a direct bond to R2; G1 and G2 are each independently C2-C12 alkylene or C2-C12 alkenylene; G3 is C1-C24 alkylene, C2-C24 alkenylene, C3-C8 cycloalkylene or C3- C8 cycloalkenylene; Ra, Rb, Rd and Re are each independently H or C1-C12 alkyl or C1-C12 alkenyl; Rc and Rf are each independently C1-C12 alkyl or C2-C12 alkenyl; R1 and R2 are each independently branched C6-C24 alkyl or branched C6- C24 alkenyl; R3 is -N(R4)R5; R4 is C1-C12 alkyl; R5 is substituted C1-C12 alkyl; and x is 0, 1 or 2, and wherein each alkyl, alkenyl, alkylene, alkenylene, cycloalkylene, cycloalkenylene, aryl and aralkyl is independently substituted or unsubstituted unless otherwise specified. [0306] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein: L1 is -O(C=O)R1, -(C=O)OR1, -C(=O)R1, -OR1, -S(O)xR1, -S-SR1, -C(=O)SR1, - SC(=O)R1, -NRaC(=O)R1, -C(=O)NRbRc, -NRaC(=O)NRbRc, -OC(=O)NRbRc or - NRaC(=O)OR1; L2 is -O(C=O)R2, -(C=O)OR2, -C(=O)R2, -OR2, -S(O)xR2, -S-SR2, -C(=O)SR2, - SC(=O)R2, -NRdC(=O)R2, -C(=O)NReRf, -NRdC(=O)NReRf, -OC(=O)NReRf;-NRdC(=O)OR2 or a direct bond to R2; G1a and G2b are each independently C2-C12 alkylene or C2-C12 alkenylene; G1b and G2b are each independently C1-C12 alkylene or C2-C12 alkenylene; G3 is C1-C24 alkylene, C2-C24 alkenylene, C3-C8 cycloalkylene or C3- C8 cycloalkenylene; Ra, Rb, Rd and Re are each independently H or C1-C12 alkyl or C2-C12 alkenyl;
Rc and Rf are each independently C1-C12 alkyl or C2-C12 alkenyl; R1 and R2 are each independently branched C6-C24 alkyl or branched C6- C24 alkenyl; R3a is -C(=O)N(R4a)R5a or -C(=O)OR6; R3b is -NR4bC(=O)R5b; R4a is C1-C12 alkyl; R4b is H, C1-C12 alkyl or C2-C12 alkenyl; R5a is H, C1-C8 alkyl or C2-C8 alkenyl; R5b is C2-C12 alkyl or C2-C12 alkenyl when R4b is H; or R5b is C1-C12 alkyl or C2- C12 alkenyl when R4b is C1-C12 alkyl or C2-C12 alkenyl; R6 is H, aryl or aralkyl; and x is 0, 1 or 2, and wherein each alkyl, alkenyl, alkylene, alkenylene, cycloalkylene, cycloalkenylene, aryl and aralkyl is independently substituted or unsubstituted. Formula (AXII) [0307] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein: G1 is -OH, -NR3R4, -(C=O)NR5 or -NR3(C=O)R5; G2 is -CH2- or -(C=O)-; R is, at each occurrence, independently H or OH; R1 and R2 are each independently optionally substituted branched, saturated or unsaturated C12-C36 alkyl; R3 and R4 are each independently H or optionally substituted straight or branched, saturated or unsaturated C1-C6 alkyl; R5 is optionally substituted straight or branched, saturated or unsaturated C1-C6 alkyl; and n is an integer from 2 to 6.
Formula (AXIII) [0308] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein: one of G1 or G2 is, at each occurrence, -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)y-, -S-S-, -C(=O)S-, -SC(=O)-, -N(Ra)C(=O)-, -C(=O)N(Ra)-, -N(Ra)C(=O)N(Ra)-, -OC(=O)N(Ra)- or -N(Ra)C(=O)O-, and the other of G1 or G2 is, at each occurrence, -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)y-, -S-S-, -C(=O)S-, -SC(=O)-, -N(Ra)C(=O)-, -C(=O)N(Ra)-, -N(Ra)C(=O)N(Ra)-, -OC(=O)N(Ra)- or -N(Ra)C(=O)O- or a direct bond; L is, at each occurrence, ~O(C=O)-, wherein ~ represents a covalent bond to X; X is CRa; Z is alkyl, cycloalkyl or a monovalent moiety comprising at least one polar functional group when n is 1; or Z is alkylene, cycloalkylene or a polyvalent moiety comprising at least one polar functional group when n is greater than 1; Ra is, at each occurrence, independently H, C1-C12 alkyl, C1-C12 hydroxylalkyl, C1-C12 aminoalkyl, C1-C12 alkylaminylalkyl, C1-C12 alkoxyalkyl, C1-C12 alkoxycarbonyl, C1-C12 alkylcarbonyloxy, C1-C12 alkylcarbonyloxyalkyl or C1-C12 alkylcarbonyl; R is, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R together with the carbon atom to which it is bound is taken together with an adjacent R and the carbon atom to which it is bound to form a carbon-carbon double bond; R1 and R2 have, at each occurrence, the following structure, respectively:
a1 and a2 are, at each occurrence, independently an integer from 3 to 12; b1 and b2 are, at each occurrence, independently 0 or 1; c1 and c2 are, at each occurrence, independently an integer from 5 to 10; d1 and d2 are, at each occurrence, independently an integer from 5 to 10; y is, at each occurrence, independently an integer from 0 to 2; and n is an integer from 1 to 6, wherein each alkyl, alkylene, hydroxylalkyl, aminoalkyl, alkylaminylalkyl, alkoxyalkyl, alkoxycarbonyl, alkylcarbonyloxy, alkylcarbonyloxyalkyl and alkylcarbonyl is optionally substituted with one or more substituent. Formula (AXIV) [0309] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, prodrug or stereoisomer thereof, wherein:
-NRaC(=O)-, -C(=O)NRa-, -NRaC(=O)NRa-, -OC(=O)NRa- or -NRaC(=O)O- or a direct bond; G1 and G2 are each independently unsubstituted C1-C12 alkylene or C1-C12 alkenylene; G3 is C1-C24 alkylene, C1-C24 alkenylene, C3-C8 cycloalkylene, C3-C8 cycloalkenylene; Ra is H or C1-C12 alkyl; R1 and R2 are each independently C6-C24 alkyl or C6-C24 alkenyl; R3 is H, OR5, CN, -C(=O)OR4, -OC(=O)R4 or -NR5C(=O)R4; R4 is C1-C12 alkyl; R5 is H or C1-C6 alkyl; and x is 0, 1 or 2.
Formula (AXV) [0310] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, tautomer, prodrug or stereoisomer thereof, wherein: L1 and L2 are each independently -O(C=O)-, -(C=O)O-, -C(=O)-, -O-, -S(O)x-, -S-S-, -C(=O)S-, -SC(=O)-, -NRaC(=O)-, -C(=O)NRa-, -NRaC(=O)NRa-, -OC(=O)NRa-, -NRaC(=O)O- or a direct bond; G1 is C1-C2 alkylene, -(C=O)-, -O(C=O)-, -SC(=O)-, -NRaC(=O)- or a direct bond; G2 is -C(=O)-, -(C=O)O-, -C(=O)S-, -C(=O)NRa- or a direct bond; G3 is C1-C6 alkylene; Ra is H or C1-C12 alkyl; R1a and R1b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R1a is H or C1-C12 alkyl, and R1b together with the carbon atom to which it is bound is taken together with an adjacent R1b and the carbon atom to which it is bound to form a carbon-carbon double bond; R2a and R2b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R2a is H or C1-C12 alkyl, and R2b together with the carbon atom to which it is bound is taken together with an adjacent R2b and the carbon atom to which it is bound to form a carbon-carbon double bond; R3a and R3b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R3a is H or C1-C12 alkyl, and R3b together with the carbon atom to which it is bound is taken together with an adjacent R and the carbon atom to which it is bound to form a carbon-carbon double bond; R4a and R4b are, at each occurrence, independently either: (a) H or C1-C12 alkyl; or (b) R4a is H or C1-C12 alkyl, and R4b together with the carbon atom to which it is bound is taken
together with an adjacent R4b and the carbon atom to which it is bound to form a carbon- carbon double bond; R5 and R6 are each independently H or methyl; R7 is C4-C20 alkyl; R8 and R9 are each independently C1-C12 alkyl; or R8 and R9, together with the nitrogen atom to which they are attached, form a 5, 6 or 7-membered heterocyclic ring; a, b, c and d are each independently an integer from 1 to 24; and x is 0, 1 or 2. Formula (AXVI) [0311] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt, tautomer, prodrug or stereoisomer thereof, wherein: L1 and L2 are each independently -O(C=O)-, -(C=O)O- or a carbon-carbon double bond; R1a and R1b are, at each occurrence, independently either (a) H or C1-C12 alkyl, or (b) R1a is H or C1-C12 alkyl, and R1b together with the carbon atom to which it is bound is taken together with an adjacent R1b and the carbon atom to which it is bound to form a carbon-carbon double bond; R2a and R2b are, at each occurrence, independently either (a) H or C1-C12 alkyl, or (b) R2a is H or C1-C12 alkyl, and R2b together with the carbon atom to which it is bound is taken together with an adjacent R2b and the carbon atom to which it is bound to form a carbon-carbon double bond; R3a and R3b are, at each occurrence, independently either (a) H or C1-C12 alkyl, or (b) R3a is H or C1-C12 alkyl, and R3b together with the carbon atom to which it is bound is taken together with an adjacent R3b and the carbon atom to which it is bound to form a carbon-carbon double bond;
R4a and R4b are, at each occurrence, independently either (a) H or C1-C12 alkyl, or (b) R4a is H or C1-C12 alkyl, and R4b together with the carbon atom to which it is bound is taken together with an adjacent R4b and the carbon atom to which it is bound to form a carbon-carbon double bond; R5 and R6 are each independently methyl or cycloalkyl; R7 is, at each occurrence, independently H or C1-C12 alkyl; R8 and R9 are each independently unsubstituted C1-C12 alkyl; or R8 and R9, together with the nitrogen atom to which they are attached, form a 5, 6 or 7-membered heterocyclic ring comprising one nitrogen atom; a and d are each independently an integer from 0 to 24; b and c are each independently an integer from 1 to 24; and e is 1 or 2, provided that: at least one of R1a, R2a, R3a or R4a is C1-C12 alkyl, or at least one of L1 or L2 is -O(C=O)- or -(C=O)O-; and R1a and R1b are not isopropyl when a is 6 or n-butyl when a is 8.
Formula (AXVII) [0312] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt thereof, wherein R1 and R2 are the same or different, each a linear or branched alkyl with 1-9 carbons, or an alkenyl or alkynyl with 2 to 11 carbon atoms, L1 and L2 are the same or different, each a linear alkyl having 5 to 18 carbon atoms, or form a heterocycle with N, X1 is a bond, or is -CO-O- whereby L2-CO-O-R2 is formed, X2 is S or O, L3 is a bond or a lower alkyl, or forms a heterocycle with N, R3 is a lower alkyl, and R4 and R5 are the same or different, each a lower alkyl.
Compounds (A1)-(A11) [0313] In some embodiments, the lipid nanoparticle comprises an ionizable lipid having the structure:
or a pharmaceutically acceptable salt thereof.
[0314] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt thereof.
[0315] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt thereof.
[0316] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
(A4), or a pharmaceutically acceptable salt thereof.
[0317] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt thereof.
[0318] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
(A6), or a pharmaceutically acceptable salt thereof.
[0319] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
(A7), or a pharmaceutically acceptable salt thereof.
[0320] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt thereof.
[0321] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt thereof.
[0322] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
(A10), or a pharmaceutically acceptable salt thereof.
[0323] In some embodiments, the lipid nanoparticle comprises a lipid having the structure:
or a pharmaceutically acceptable salt thereof.
Non-cationic lipids [0324] In certain embodiments, the lipid nanoparticles described herein comprise one or more non-cationic lipids. Non-cationic lipids may be phospholipids.
[0325] In some embodiments, the lipid nanoparticle comprises 5-25 mol% non-cationic lipid. For example, the lipid nanoparticle may comprise 5-20 mol%, 5-15 mol%, 5-10 mol%, 10-25 mol%, 10-20 mol%, 10-25 mol%, 15-25 mol%, 15-20 mol%, or 20-25 mol% non-cationic lipid. In some embodiments, the lipid nanoparticle comprises 5 mol%, 10 mol%, 15 mol%, 20 mol%, or 25 mol% non-cationic lipid.
[0326] In some embodiments, a non-cationic lipid comprises 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphocholine, 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-rac-(1-glycerol) sodium salt (DOPG), sphingomyelin, or mixtures thereof.
[0327] In some embodiments, the lipid nanoparticle comprises 5 – 15 mol%, 5 – 10 mol%, or 10 – 15 mol% DSPC. For example, the lipid nanoparticle may comprise 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 mol% DSPC.
[0328] In certain embodiments, the lipid composition of the lipid nanoparticle composition disclosed herein can comprise one or more phospholipids, for example, one or more saturated or (poly)unsaturated phospholipids or a combination thereof. In general, phospholipids comprise a phospholipid moiety and one or more fatty acid moieties.
[0329] A phospholipid moiety can be selected, for example, from the non-limiting group consisting of phosphatidyl choline, phosphatidyl ethanolamine, phosphatidyl glycerol, phosphatidyl serine, phosphatidic acid, 2-lysophosphatidyl choline, and a sphingomyelin.
[0330] A fatty acid moiety can be selected, for example, from the non-limiting group consisting of lauric acid, myristic acid, myristoleic acid, palmitic acid, palmitoleic acid, stearic acid, oleic acid, linoleic acid, alpha-linolenic acid, erucic acid, phytanoic acid, arachidic acid, arachidonic acid, eicosapentaenoic acid, behenic acid, docosapentaenoic acid, and docosahexaenoic acid.
[0331] Particular phospholipids can facilitate fusion to a membrane. For example, a cationic phospholipid can interact with one or more negatively charged phospholipids of a membrane (e.g., a cellular or intracellular membrane). Fusion of a phospholipid to a membrane can allow one or more elements (e.g., a therapeutic agent) of a lipid-containing composition (e.g., LNPs) to pass through the membrane, permitting, e.g., delivery of the one or more elements to a target tissue.
[0332] Non-natural phospholipid species, including natural species with modifications and substitutions including branching, oxidation, cyclization, and alkynes, are also contemplated. For example, a phospholipid can be functionalized with or cross-linked to one or more alkynes (e.g., an alkenyl group in which one or more double bonds is replaced with a triple bond). Under appropriate reaction conditions, an alkyne group can undergo a copper-catalyzed cycloaddition upon exposure to an azide. Such reactions can be useful in functionalizing a lipid bilayer of a nanoparticle composition to facilitate membrane permeation or cellular recognition or in conjugating a nanoparticle composition to a useful component such as a targeting or imaging moiety (e.g., a dye).
[0333] Phospholipids include, but are not limited to, glycerophospholipids such as phosphatidylcholines, phosphatidylethanolamines, phosphatidylserines, phosphatidylinositols, phosphatidylglycerols, and phosphatidic acids. Phospholipids also include phosphosphingolipids, such as sphingomyelin.
[0334] In some embodiments, a phospholipid comprises 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine (DSPE), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,2-dilinoleoyl-sn-glycero-3-phosphocholine (DLPC), 1,2-dimyristoyl-sn-glycero-phosphocholine (DMPC), 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-diundecanoyl-sn-glycero-phosphocholine (DUPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-di-O-octadecenyl-sn-glycero-3-phosphocholine (18:0 Diether PC), 1-oleoyl-2-cholesterylhemisuccinoyl-sn-glycero-3-phosphocholine (OChemsPC), 1-hexadecyl-sn-glycero-3-phosphocholine (C16 Lyso PC), 1,2-dilinolenoyl-sn-glycero-3-phosphocholine, 1,2-diarachidonoyl-sn-glycero-3-phosphocholine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphocholine, 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (ME 16.0 PE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinoleoyl-sn-glycero-3-phosphoethanolamine, 1,2-dilinolenoyl-sn-glycero-3-phosphoethanolamine, 1,2-diarachidonoyl-sn-glycero-3-phosphoethanolamine, 1,2-didocosahexaenoyl-sn-glycero-3-phosphoethanolamine, 1,2-dioleoyl-sn-glycero-3-phospho-rac-(1-glycerol) sodium salt (DOPG), sphingomyelin, or mixtures thereof.
Formula (HI) [0335] In certain embodiments, a phospholipid is an analog or variant of DSPC. In certain embodiments, a phospholipid is a compound of Formula (HI):
or a salt thereof, wherein: each R1 is independently optionally substituted alkyl; or optionally two R1 are joined together with the intervening atoms to form optionally substituted monocyclic carbocyclyl or optionally substituted monocyclic heterocyclyl; or optionally three R1 are joined together with the intervening atoms to form optionally substituted bicyclic carbocyclyl or optionally substitute bicyclic heterocyclyl; n is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; m is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; A is of the formula:
each instance of L2 is independently a bond or optionally substituted C1-6 alkylene, wherein one methylene unit of the optionally substituted C1-6 alkylene is optionally replaced with O, N(RN), S, C(O), C(O)N(RN), NRNC(O), C(O)O, OC(O), OC(O)O, OC(O)N(RN), - NRNC(O)O, or NRNC(O)N(RN); each instance of R2 is independently optionally substituted C1-30 alkyl, optionally substituted C1-30 alkenyl, or optionally substituted C1-30 alkynyl; optionally wherein one or more methylene units of R2 are independently replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, N(RN), O, S, C(O), C(O)N(RN), NRNC(O), - NRNC(O)N(RN), C(O)O, OC(O), OC(O)O, OC(O)N(RN), NRNC(O)O, C(O)S, SC(O), - C(=NRN), C(=NRN)N(RN), NRNC(=NRN), NRNC(=NRN)N(RN), C(S), C(S)N(RN), NRNC(S), NRNC(S)N(RN), S(O), OS(O), S(O)O, OS(O)O, OS(O)2, S(O)2O, OS(O)2O, N(RN)S(O), - S(O)N(RN), N(RN)S(O)N(RN), OS(O)N(RN), N(RN)S(O)O, S(O)2, N(RN)S(O)2, S(O)2N(RN), N(RN)S(O)2N(RN), OS(O)2N(RN), or N(RN)S(O)2O; each instance of RN is independently hydrogen, optionally substituted alkyl, or a nitrogen protecting group; Ring B is optionally substituted carbocyclyl, optionally substituted heterocyclyl, optionally substituted aryl, or optionally substituted heteroaryl; and p is 1 or 2. [0336] In certain embodiments, the compound is not of the formula:
, wherein each instance of R2 is independently unsubstituted alkyl, unsubstituted alkenyl, or unsubstituted alkynyl.
[0337] In some embodiments, the phospholipids may be one or more of the phospholipids described in PCT Application No. PCT/US2018/037922.
[0338] In some embodiments, the lipid nanoparticle comprises a molar ratio of 5-25% non-cationic lipid relative to the other lipid components. For example, the lipid nanoparticle may comprise a molar ratio of 5-30%, 5-15%, 5-10%, 10-25%, 10-20%, 10-25%, 15-25%, 15-20%, 20-25%, or 25-30% non-cationic lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 5%, 10%, 15%, 20%, 25%, or 30% non-cationic lipid.
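By way of illustration only, the molar ranges recited above for the non-cationic lipid (5-25 mol%) and for DSPC (5-15 mol%) can be checked against a candidate lipid composition with a short script. The following is a minimal sketch and not part of the disclosed embodiments: the function name `check_non_cationic`, the component keys, and the example composition are hypothetical assumptions introduced solely for this illustration.

```python
from typing import Dict, List, Set, Tuple

# Ranges quoted from the text above, in mol% of total lipid.
NON_CATIONIC_RANGE: Tuple[float, float] = (5.0, 25.0)
DSPC_RANGE: Tuple[float, float] = (5.0, 15.0)


def check_non_cationic(
    mol_percent: Dict[str, float],
    non_cationic: Set[str],
    tol: float = 1e-6,
) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []

    # All lipid components together should account for 100 mol%.
    total = sum(mol_percent.values())
    if abs(total - 100.0) > tol:
        problems.append(f"components sum to {total:g} mol%, expected 100")

    # Sum only the components the caller designates as non-cationic.
    nc_total = sum(v for k, v in mol_percent.items() if k in non_cationic)
    low, high = NON_CATIONIC_RANGE
    if not low <= nc_total <= high:
        problems.append(f"non-cationic lipid {nc_total:g} mol% outside {low:g}-{high:g}")

    # DSPC has its own, narrower range; skip the check if DSPC is absent.
    dspc = mol_percent.get("DSPC")
    if dspc is not None:
        low, high = DSPC_RANGE
        if not low <= dspc <= high:
            problems.append(f"DSPC {dspc:g} mol% outside {low:g}-{high:g}")

    return problems


# Hypothetical composition, chosen only to exercise the checks.
example = {"ionizable_lipid": 50.0, "DSPC": 10.0, "cholesterol": 38.5, "PEG_lipid": 1.5}
print(check_non_cationic(example, non_cationic={"DSPC"}))  # []
```

The check treats the recited ranges as inclusive at both ends, which is an assumption of this sketch; the narrower sub-ranges listed above could be tested the same way by substituting different bounds.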
[0339] In some embodiments, the lipid nanoparticle comprises a molar ratio of 5-25% phospholipid relative to the other lipid components. For example, the lipid nanoparticle may comprise a molar ratio of 5-30%, 5-15%, 5-10%, 10-25%, 10-20%, 10-25%, 15-25%, 15-20%, 20-25%, or 25-30% phospholipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 5%, 10%, 15%, 20%, 25%, or 30% phospholipid.
Structural lipids [0340] The lipid composition of a pharmaceutical composition disclosed herein can comprise one or more structural lipids. As used herein, the term “structural lipid” refers to sterols and also to lipids containing sterol moieties.
[0341] Incorporation of structural lipids in the lipid nanoparticle may help mitigate aggregation of other lipids in the particle. Structural lipids can be selected from the group including, but not limited to, cholesterol, fecosterol, sitosterol, ergosterol, campesterol, stigmasterol, brassicasterol, tomatidine, tomatine, ursolic acid, alpha-tocopherol, hopanoids, phytosterols, steroids, and mixtures thereof. In some embodiments, the structural lipid is a sterol. As defined herein, “sterols” are a subgroup of steroids consisting of steroid alcohols. In certain embodiments, the structural lipid is a steroid. In certain embodiments, the structural lipid is cholesterol. In certain embodiments, the structural lipid is an analog of cholesterol. In certain embodiments, the structural lipid is alpha-tocopherol.
[0342] In some embodiments, the structural lipids may be one or more of the structural lipids described in U.S. Application No. 16/493,814.
[0343] In some embodiments, the lipid nanoparticle comprises a molar ratio of 25-55% structural lipid relative to the other lipid components. For example, the lipid nanoparticle may comprise a molar ratio of 10-55%, 25-50%, 25-45%, 25-40%, 25-35%, 25-30%, 30-55%, 30-50%, 30-45%, 30-40%, 30-35%, 35-55%, 35-50%, 35-45%, 35-40%, 40-55%, 40-50%, 40-45%, 45-55%, 45-50%, or 50-55% structural lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or 55% structural lipid.
[0344] In some embodiments, the lipid nanoparticle comprises 30-45 mol% sterol, optionally 35-40 mol%, for example, 30-31 mol%, 31-32 mol%, 32-33 mol%, 33-34 mol%, 35-35 mol%, 35-36 mol%, 36-37 mol%, 38-38 mol%, 38-39 mol%, or 39-40 mol%. In some embodiments, the lipid nanoparticle comprises 25-55 mol% sterol. For example, the lipid nanoparticle may comprise 25-50 mol%, 25-45 mol%, 25-40 mol%, 25-35 mol%, 25-30 mol%, 30-55 mol%, 30-50 mol%, 30-45 mol%, 30-40 mol%, 30-35 mol%, 35-55 mol%, 35-50 mol%, 35-45 mol%, 35-40 mol%, 40-55 mol%, 40-50 mol%, 40-45 mol%, 45-55 mol%, 45-50 mol%, or 50-55 mol% sterol. In some embodiments, the lipid nanoparticle comprises 25 mol%, 30 mol%, 35 mol%, 40 mol%, 45 mol%, 50 mol%, or 55 mol% sterol.
[0345] In some embodiments, the lipid nanoparticle comprises 35 – 40 mol% cholesterol. For example, the lipid nanoparticle may comprise 35, 35.5, 36, 36.5, 37, 37.5, 38, 38.5, 39, 39.5, or 40 mol% cholesterol.
Polyethylene glycol (PEG)-Lipids [0346] The lipid composition of a pharmaceutical composition disclosed herein can comprise one or more polyethylene glycol (PEG) lipids.
[0347] As used herein, the term “PEG-lipid” or “PEG-modified lipid” refers to polyethylene glycol (PEG)-modified lipids. Non-limiting examples of PEG-lipids include PEG-modified phosphatidylethanolamine and phosphatidic acid, PEG-ceramide conjugates (e.g., PEG-CerC14 or PEG-CerC20), PEG-modified dialkylamines, and PEG-modified 1,2-diacyloxypropan-3-amines. Such lipids are also referred to as PEGylated lipids. For example, a PEG lipid can be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or a PEG-DSPE lipid.
[0348] In some embodiments, the PEG-lipid includes, but is not limited to, 1,2-dimyristoyl-sn-glycerol methoxypolyethylene glycol (PEG-DMG), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[amino(polyethylene glycol)] (PEG-DSPE), PEG-disteryl glycerol (PEG-DSG), PEG-dipalmetoleyl, PEG-dioleyl, PEG-distearyl, PEG-diacylglycamide (PEG-DAG), PEG-dipalmitoyl phosphatidylethanolamine (PEG-DPPE), or PEG-1,2-dimyristyloxypropyl-3-amine (PEG-c-DMA).
[0349] In some embodiments, the PEG-lipid is selected from the group consisting of a PEG-modified phosphatidylethanolamine, a PEG-modified phosphatidic acid, a PEG-modified ceramide, a PEG-modified dialkylamine, a PEG-modified diacylglycerol, a PEG-modified dialkylglycerol, and mixtures thereof. In some embodiments, the PEG-modified lipid is PEG-DMG, PEG-c-DOMG (also referred to as PEG-DOMG), PEG-DSG, and/or PEG-DPG.
[0350] In some embodiments, the lipid moiety of the PEG-lipids includes those having lengths of from about C14 to about C22, preferably from about C14 to about C16. In some embodiments, a PEG moiety, for example an mPEG-NH2, has a size of about 1000, 2000, 5000, 10,000, 15,000 or 20,000 daltons. In some embodiments, the PEG-lipid is PEG2k-DMG.
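For orientation, a nominal PEG size in daltons can be related to an approximate number of ethylene oxide repeat units by dividing by the average mass of one -CH2CH2O- unit (about 44.05 Da). The sketch below is illustrative only; the function name `approx_repeat_units` is hypothetical, and the estimate assumes that end groups (for example a terminal methoxy or amine) contribute negligibly and that the PEG is treated as monodisperse, neither of which is stated in the text.

```python
# Average mass of one ethylene oxide repeat unit, C2H4O, in daltons.
ETHYLENE_OXIDE_UNIT_DA = 44.05


def approx_repeat_units(peg_mass_da: float) -> int:
    """Estimate the number of -CH2CH2O- units in a PEG of the given nominal mass.

    Assumption of this sketch: end-group mass is ignored and the polymer is
    treated as a single species rather than a distribution.
    """
    if peg_mass_da <= 0:
        raise ValueError("PEG mass must be positive")
    return round(peg_mass_da / ETHYLENE_OXIDE_UNIT_DA)


# A nominal 2000 Da PEG, as in PEG2k-DMG, corresponds to roughly 45 units.
print(approx_repeat_units(2000))  # 45
```

Commercial PEG reagents are polydisperse, so an estimate of this kind describes an average chain rather than any individual molecule.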
[0351] In some embodiments, the lipid nanoparticles described herein can comprise a PEG lipid which is a non-diffusible PEG. Non-limiting examples of non-diffusible PEGs include PEG-DSG and PEG-DSPE.
[0352] PEG-lipids are known in the art, such as those described in U.S. Patent No. 8158601 and International Publ. No. WO 2015/130584 A2, which are incorporated herein by reference in their entirety.
[0353] In general, some of the other lipid components (e.g., PEG lipids) of various formulae described herein may be synthesized as described in International Patent Application No. PCT/US2016/000129, filed December 10, 2016, entitled “Compositions and Methods for Delivery of Therapeutic Agents,” which is incorporated by reference in its entirety.
[0354] The lipid component of a lipid nanoparticle composition may include one or more molecules comprising polyethylene glycol, such as PEG or PEG-modified lipids. Such species may be alternately referred to as PEGylated lipids. A PEG lipid is a lipid modified with polyethylene glycol. A PEG lipid may be selected from the non-limiting group including PEG-modified phosphatidylethanolamines, PEG-modified phosphatidic acids, PEG-modified ceramides, PEG-modified dialkylamines, PEG-modified diacylglycerols, PEG-modified dialkylglycerols, and mixtures thereof. For example, a PEG lipid may be PEG-c-DOMG, PEG-DMG, PEG-DLPE, PEG-DMPE, PEG-DPPC, or a PEG-DSPE lipid.
[0355] In some embodiments, the PEG-modified lipids are a modified form of PEG-DMG. PEG-DMG has the following structure:
[0356] In some embodiments, PEG lipids can be PEGylated lipids described in International Publication No. WO2012099755, the contents of which are herein incorporated by reference in their entirety. Any of these exemplary PEG lipids described herein may be modified to comprise a hydroxyl group on the PEG chain. In certain embodiments, the PEG lipid is a PEG-OH lipid. As generally defined herein, a “PEG-OH lipid” (also referred to herein as “hydroxy-PEGylated lipid”) is a PEGylated lipid having one or more hydroxyl (–OH) groups on the lipid. In certain embodiments, the PEG-OH lipid includes one or more hydroxyl groups on the PEG chain. In certain embodiments, a PEG-OH or hydroxy-PEGylated lipid comprises an –OH group at the terminus of the PEG chain. Each possibility represents a separate embodiment.
Formula (PI) [0357] In certain embodiments, a PEG lipid is a compound of Formula (PI):
or salts thereof, wherein: R3 is –ORO; RO is hydrogen, optionally substituted alkyl, or an oxygen protecting group; r is an integer between 1 and 100, inclusive; L1 is optionally substituted C1-10 alkylene, wherein at least one methylene of the optionally substituted C1-10 alkylene is independently replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, O, N(RN), S, C(O), C(O)N(RN), NRNC(O), C(O)O, - OC(O), OC(O)O, OC(O)N(RN), NRNC(O)O, or NRNC(O)N(RN); D is a moiety obtained by click chemistry or a moiety cleavable under physiological conditions; m is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; A is of the formula:
each instance of L2 is independently a bond or optionally substituted C1-6 alkylene, wherein one methylene unit of the optionally substituted C1-6 alkylene is optionally replaced with O, N(RN), S, C(O), C(O)N(RN), NRNC(O), C(O)O, OC(O), OC(O)O, OC(O)N(RN), - NRNC(O)O, or NRNC(O)N(RN); each instance of R2 is independently optionally substituted C1-30 alkyl, optionally substituted C1-30 alkenyl, or optionally substituted C1-30 alkynyl; optionally wherein one or more methylene units of R2 are independently replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, N(RN), O, S, C(O), C(O)N(RN), NRNC(O), - NRNC(O)N(RN), C(O)O, OC(O), OC(O)O, OC(O)N(RN), NRNC(O)O, C(O)S, SC(O), - C(=NRN), C(=NRN)N(RN), NRNC(=NRN), NRNC(=NRN)N(RN), C(S), C(S)N(RN), NRNC(S), NRNC(S)N(RN), S(O) , OS(O), S(O)O, OS(O)O, OS(O)2, S(O)2O, OS(O)2O, N(RN)S(O), - S(O)N(RN), N(RN)S(O)N(RN), OS(O)N(RN), N(RN)S(O)O, S(O)2, N(RN)S(O)2, S(O)2N(RN), N(RN)S(O)2N(RN), OS(O)2N(RN), or N(RN)S(O)2O;
each instance of RN is independently hydrogen, optionally substituted alkyl, or a nitrogen protecting group; Ring B is optionally substituted carbocyclyl, optionally substituted heterocyclyl, optionally substituted aryl, or optionally substituted heteroaryl; and p is 1 or 2.
[0358] In certain embodiments, the compound of Formula (PI) is a PEG-OH lipid (i.e., R3 is –ORO, and RO is hydrogen). In certain embodiments, the compound of Formula (PI) is of Formula (PI-OH):
(PI-OH), or a salt thereof. Formula (PII) [0359] In certain embodiments, a PEG lipid is a PEGylated fatty acid. In certain embodiments, a PEG lipid is a compound of Formula (PII). In some embodiments, compounds of Formula (PII) have the following formula:
or a salt thereof, wherein: R3 is –ORO; RO is hydrogen, optionally substituted alkyl, or an oxygen protecting group; r is an integer between 1 and 100, inclusive; R5 is optionally substituted C10-40 alkyl, optionally substituted C10-40 alkenyl, or optionally substituted C10-40 alkynyl; and optionally one or more methylene groups of R5 are replaced with optionally substituted carbocyclylene, optionally substituted heterocyclylene, optionally substituted arylene, optionally substituted heteroarylene, N(RN), O, S, C(O), C(O)N(RN), NRNC(O), NRNC(O)N(RN), C(O)O, OC(O), OC(O)O, OC(O)N(RN), NRNC(O)O, C(O)S, SC(O), C(=NRN), C(=NRN)N(RN), NRNC(=NRN), NRNC(=NRN)N(RN), C(S), C(S)N(RN), NRNC(S), NRNC(S)N(RN), S(O), OS(O), S(O)O, OS(O)O, OS(O)2, S(O)2O, OS(O)2O, N(RN)S(O), S(O)N(RN), N(RN)S(O)N(RN), OS(O)N(RN), N(RN)S(O)O, S(O)2, N(RN)S(O)2, S(O)2N(RN), N(RN)S(O)2N(RN), OS(O)2N(RN), or N(RN)S(O)2O; and each instance of RN is independently hydrogen, optionally substituted alkyl, or a nitrogen protecting group.
[0360] In certain embodiments, the compound of Formula (PII) is of Formula (PII-OH):
(PII-OH), or a salt thereof. In some embodiments, r is 40-50. [0361] In yet other embodiments the compound of Formula (PII) is:
or a salt thereof. [0362] In some embodiments, the compound of Formula (PII) is
[0363] In some embodiments, the lipid composition of the pharmaceutical compositions disclosed herein does not comprise a PEG-lipid.
[0364] In some embodiments, the PEG-lipids may be one or more of the PEG lipids described in U.S. Application No. US15/674,872.
[0365] In some embodiments, the lipid nanoparticle comprises a molar ratio of 0.5-15% PEG lipid relative to the other lipid components. For example, the lipid nanoparticle may comprise a molar ratio of 0.5-10%, 0.5-5%, 1-15%, 1-10%, 1-5%, 2-15%, 2-10%, 2-5%, 5-15%, 5-10%, or 10-15% PEG lipid. In some embodiments, the lipid nanoparticle comprises a molar ratio of 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, or 15% PEG-lipid.
[0366] In some embodiments, the lipid nanoparticle comprises 1-5% PEG-modified lipid, optionally 1-3 mol%, for example 1.5 to 2.5 mol%, 1-2 mol%, 2-3 mol%, 3-4 mol%, or 4-5 mol%. In some embodiments, the lipid nanoparticle comprises 0.5-15 mol% PEG-modified lipid. For example, the lipid nanoparticle may comprise 0.5-10 mol%, 0.5-5 mol%, 1-15 mol%, 1-10 mol%, 1-5 mol%, 2-15 mol%, 2-10 mol%, 2-5 mol%, 5-15 mol%, 5-10 mol%, or 10-15 mol%. In some embodiments, the lipid nanoparticle comprises 0.5 mol%, 1 mol%, 2 mol%, 3 mol%, 4 mol%, 5 mol%, 6 mol%, 7 mol%, 8 mol%, 9 mol%, 10 mol%, 11 mol%, 12 mol%, 13 mol%, 14 mol%, or 15 mol% PEG-modified lipid.
[0367] Some embodiments comprise adding PEG to a composition comprising an LNP encapsulating a nucleic acid (e.g., which already includes PEG in the amounts listed above). Some embodiments comprise adding about 0.5 mol% or more PEG to an LNP composition, such
as about 1 mol%, about 1.5 mol%, about 2 mol%, about 2.5 mol%, about 3 mol%, about 3.5 mol%, about 4 mol%, about 5 mol%, or more after formation of an LNP composition (e.g., which already contains PEG in the amounts listed elsewhere herein).
[0368] In some embodiments, the lipid nanoparticle comprises 20-60 mol% ionizable amino lipid, 5-25 mol% non-cationic lipid, 25-55 mol% sterol, and 0.5-15 mol% PEG-modified lipid.
[0369] In some embodiments, a LNP comprises an ionizable amino lipid of Compound 1, wherein the non-cationic lipid is DSPC, the structural lipid is cholesterol, and the PEG lipid is DMG-PEG.
[0370] In some embodiments, a LNP comprises an ionizable amino lipid of Compound 2, wherein the non-cationic lipid is DSPC, the structural lipid is cholesterol, and the PEG lipid is DMG-PEG.
[0371] In some embodiments, a LNP comprises an ionizable amino lipid of any of Formula (AIII), (AIV), or (AV), a phospholipid comprising DSPC, a structural lipid, and a PEG lipid comprising PEG-DMG.
[0372] In some embodiments, a LNP comprises an ionizable amino lipid of any of Formula (AIII), (AIV), or (AV), a phospholipid comprising DSPC, a structural lipid, and a PEG lipid comprising a compound having Formula (PII).
[0373] In some embodiments, a LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid comprising a compound having Formula (HI), a structural lipid, and a PEG lipid comprising a compound having Formula (PI) or (PII).
[0374] In some embodiments, a LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid comprising a compound having Formula (HI), a structural lipid, and a PEG lipid comprising a compound having Formula (PI) or (PII).
[0375] In some embodiments, a LNP comprises an ionizable amino lipid of Formula (AIII), (AIV), or (AV), a phospholipid having Formula (HI), a structural lipid, and a PEG lipid comprising a compound having Formula (PII).
[0376] In some embodiments, the lipid nanoparticle comprises 49 mol% ionizable amino lipid, 10 mol% DSPC, 38.5 mol% cholesterol, and 2.5 mol% DMG-PEG. [0377] In some embodiments, the lipid nanoparticle comprises 49 mol% ionizable amino lipid, 11 mol% DSPC, 38.5 mol% cholesterol, and 1.5 mol% DMG-PEG. [0378] In some embodiments, the lipid nanoparticle comprises 48 mol% ionizable amino lipid, 11 mol% DSPC, 38.5 mol% cholesterol, and 2.5 mol% DMG-PEG.
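As an arithmetic check on the example compositions of paragraphs [0376]-[0378], the following Python sketch confirms that each set of mole percentages sums to 100 mol% and lies within the component ranges of paragraph [0368]. The dictionary keys, the function name, and the tolerance are illustrative assumptions and are not terms used in the specification.

```python
# Illustrative sketch only: key names, function name, and tolerance are assumptions.

# Component ranges (mol%) taken from paragraph [0368].
RANGES = {
    "ionizable_amino_lipid": (20.0, 60.0),
    "non_cationic_lipid": (5.0, 25.0),   # DSPC in the examples
    "sterol": (25.0, 55.0),              # cholesterol in the examples
    "peg_lipid": (0.5, 15.0),            # DMG-PEG in the examples
}

# Example compositions (mol%) taken from paragraphs [0376]-[0378].
EXAMPLES = {
    "[0376]": {"ionizable_amino_lipid": 49.0, "non_cationic_lipid": 10.0,
               "sterol": 38.5, "peg_lipid": 2.5},
    "[0377]": {"ionizable_amino_lipid": 49.0, "non_cationic_lipid": 11.0,
               "sterol": 38.5, "peg_lipid": 1.5},
    "[0378]": {"ionizable_amino_lipid": 48.0, "non_cationic_lipid": 11.0,
               "sterol": 38.5, "peg_lipid": 2.5},
}


def check_composition(comp, tol=1e-9):
    """Return (sums_to_100, all_in_range) for a mol% composition."""
    total = sum(comp.values())
    sums_to_100 = abs(total - 100.0) <= tol
    all_in_range = all(RANGES[name][0] <= value <= RANGES[name][1]
                       for name, value in comp.items())
    return sums_to_100, all_in_range


for label, comp in EXAMPLES.items():
    print(label, check_composition(comp))
```

The same check can be applied to any other four-component composition by adding an entry to `EXAMPLES`.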
[0379] In some embodiments, a LNP comprises an N:P ratio of from about 2:1 to about 30:1.
[0380] In some embodiments, a LNP comprises an N:P ratio of about 6:1.
[0381] In some embodiments, a LNP comprises an N:P ratio of about 3:1, 4:1, or 5:1.
[0382] In some embodiments, a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of from about 10:1 to about 100:1.
[0383] In some embodiments, a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of about 20:1.
[0384] In some embodiments, a LNP comprises a wt/wt ratio of the ionizable amino lipid component to the RNA of about 10:1.
[0385] Some embodiments comprise a composition having one or more LNPs having a diameter of about 150 nm or less, such as about 140 nm, 130 nm, 120 nm, 110 nm, 100 nm, 90 nm, 80 nm, 70 nm, 60 nm, 50 nm, 40 nm, 30 nm, or 20 nm or less. Some embodiments comprise a composition having a mean LNP diameter of about 150 nm or less, such as about 140 nm, 130 nm, 120 nm, 110 nm, 100 nm, 90 nm, 80 nm, 70 nm, 60 nm, 50 nm, 40 nm, 30 nm, or 20 nm or less. In some embodiments, the composition has a mean LNP diameter from about 30 nm to about 150 nm, or a mean diameter from about 60 nm to about 120 nm.
[0386] A LNP may comprise one or more types of lipids, including but not limited to amino lipids (e.g., ionizable amino lipids), neutral lipids, non-cationic lipids, charged lipids, PEG-modified lipids, phospholipids, structural lipids and sterols. In some embodiments, a LNP may further comprise one or more cargo molecules, including but not limited to nucleic acids (e.g., mRNA, plasmid DNA, DNA or RNA oligonucleotides, siRNA, shRNA, snRNA, snoRNA, lncRNA, etc.), small molecules, proteins and peptides.
[0387] In some embodiments, the composition comprises a liposome. A liposome is a lipid particle comprising lipids arranged into one or more concentric lipid bilayers around a central region.
The central region of a liposome may comprise an aqueous solution, suspension, or other aqueous composition.
[0388] In some embodiments, a lipid nanoparticle may comprise two or more components (e.g., amino lipid and nucleic acid, PEG-lipid, phospholipid, structural lipid). For instance, a lipid nanoparticle may comprise an amino lipid and a nucleic acid. Compositions comprising the lipid nanoparticles, such as those described herein, may be used for a wide variety of applications, including the stealth delivery of therapeutic payloads with minimal adverse innate immune response.
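The N:P ratios of paragraphs [0379]-[0381] and the wt/wt ratios of paragraphs [0382]-[0384] can be related by simple arithmetic. The sketch below takes N:P in its commonly used sense, the molar ratio of ionizable nitrogen atoms in the lipid to phosphate groups in the RNA. The default of one ionizable nitrogen per lipid molecule, the average nucleotide mass of 330 g/mol, and the lipid molecular weight of 700 g/mol in the usage lines are assumptions chosen for illustration and are not values from the specification.

```python
# Illustrative sketch only: default parameter values below are assumptions.

def np_ratio(lipid_mass, lipid_mw, rna_mass, n_per_lipid=1, nt_mass=330.0):
    """Molar ratio of ionizable lipid nitrogens to RNA phosphates.

    lipid_mass and rna_mass must share the same mass unit;
    lipid_mw and nt_mass are in g/mol (one phosphate per nucleotide assumed).
    """
    mol_nitrogen = lipid_mass / lipid_mw * n_per_lipid
    mol_phosphate = rna_mass / nt_mass
    return mol_nitrogen / mol_phosphate


def lipid_to_rna_weight_ratio(target_np, lipid_mw, n_per_lipid=1, nt_mass=330.0):
    """wt/wt ratio of ionizable lipid to RNA that gives the target N:P ratio."""
    return target_np * lipid_mw / (n_per_lipid * nt_mass)


# Usage with a hypothetical ionizable lipid of 700 g/mol and an N:P of 6:1.
weight_ratio = lipid_to_rna_weight_ratio(6.0, lipid_mw=700.0)
print(f"lipid:RNA wt/wt for N:P 6:1 = {weight_ratio:.1f}:1")
print(f"round-trip N:P = {np_ratio(weight_ratio, 700.0, 1.0):.2f}")
```

Because both functions depend on the assumed nucleotide mass and on the number of ionizable nitrogens per lipid, a given wt/wt ratio corresponds to different N:P ratios for different lipids.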
[0389] Effective in vivo delivery of nucleic acids represents a continuing medical challenge. Exogenous nucleic acids (i.e., originating from outside of a cell or organism) are readily degraded in the body, e.g., by the immune system. Accordingly, effective delivery of nucleic acids to cells often requires the use of a particulate carrier (e.g., lipid nanoparticles). The particulate carrier should be formulated to have minimal particle aggregation, be relatively stable prior to intracellular delivery, effectively deliver nucleic acids intracellularly, and elicit no or minimal immune response. To achieve minimal particle aggregation and pre-delivery stability, many conventional particulate carriers have relied on the presence and/or concentration of certain components (e.g., PEG-lipid). However, it has been discovered that certain components may decrease the stability of encapsulated nucleic acids (e.g., mRNA molecules). The reduced stability may limit the broad applicability of the particulate carriers. As such, there remains a need for methods by which to improve the stability of nucleic acid (e.g., mRNA) encapsulated within lipid nanoparticles.
[0390] In some embodiments, the lipid nanoparticles comprise one or more of ionizable molecules, polynucleotides, and optional components, such as structural lipids, sterols, neutral lipids, phospholipids and a molecule capable of reducing particle aggregation (e.g., polyethylene glycol (PEG), PEG-modified lipid), such as those described above.
[0391] In some embodiments, a LNP described herein may include one or more ionizable molecules (e.g., amino lipids or ionizable lipids). The ionizable molecule may comprise a charged group and may have a certain pKa.
In certain embodiments, the pKa of the ionizable molecule may be greater than or equal to about 6, greater than or equal to about 6.2, greater than or equal to about 6.5, greater than or equal to about 6.8, greater than or equal to about 7, greater than or equal to about 7.2, greater than or equal to about 7.5, greater than or equal to about 7.8, greater than or equal to about 8. In some embodiments, the pKa of the ionizable molecule may be less than or equal to about 10, less than or equal to about 9.8, less than or equal to about 9.5, less than or equal to about 9.2, less than or equal to about 9.0, less than or equal to about 8.8, or less than or equal to about 8.5. Combinations of the above referenced ranges are also possible (e.g., greater than or equal to 6 and less than or equal to about 8.5). Other ranges are also possible. In embodiments in which more than one type of ionizable molecule are present in a particle, each type of ionizable molecule may independently have a pKa in one or more of the ranges described above. [0392] In general, an ionizable molecule comprises one or more charged groups. In some embodiments, an ionizable molecule may be positively charged or negatively charged. For instance, an ionizable molecule may be positively charged. For example, an ionizable
molecule may comprise an amine group. As used herein, the term “ionizable molecule” has its ordinary meaning in the art and may refer to a molecule or matrix comprising one or more charged moieties. As used herein, a “charged moiety” is a chemical moiety that carries a formal electronic charge, e.g., monovalent (+1, or -1), divalent (+2, or -2), trivalent (+3, or -3), etc. The charged moiety may be anionic (i.e., negatively charged) or cationic (i.e., positively charged). Examples of positively-charged moieties include amine groups (e.g., primary, secondary, and/or tertiary amines), ammonium groups, pyridinium groups, guanidine groups, and imidazolium groups. In a particular embodiment, the charged moieties comprise amine groups. Examples of negatively-charged groups or precursors thereof include carboxylate groups, sulfonate groups, sulfate groups, phosphonate groups, phosphate groups, hydroxyl groups, and the like. The charge of the charged moiety may vary, in some cases, with the environmental conditions, for example, changes in pH may alter the charge of the moiety, and/or cause the moiety to become charged or uncharged. In general, the charge density of the molecule and/or matrix may be selected as desired.
[0393] In some cases, an ionizable molecule (e.g., an amino lipid or ionizable lipid) may include one or more precursor moieties that can be converted to charged moieties. For instance, the ionizable molecule may include a neutral moiety that can be hydrolyzed to form a charged moiety, such as those described above. As a non-limiting specific example, the molecule or matrix may include an amide, which can be hydrolyzed to form an amine.
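The dependence of charge on pH and pKa described above can be illustrated with the Henderson-Hasselbalch relationship. The following Python sketch estimates the fraction of a single basic group (e.g., an amine) that is protonated at a given pH. It assumes ideal, single-site behavior in dilute solution, which is a simplification; the apparent pKa of a lipid within a particle may differ. The function name and the example pKa and pH values are chosen for illustration only.

```python
# Illustrative sketch only: assumes an ideal, single-site acid-base equilibrium.

def fraction_protonated(pka, ph):
    """Fraction of a basic group carrying a +1 charge at the given pH.

    Derived from the Henderson-Hasselbalch equation:
    pH = pKa + log10([B] / [BH+])  =>  [BH+] / ([B] + [BH+]) = 1 / (1 + 10**(pH - pKa))
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Usage: pKa values drawn from the ranges recited above, evaluated at
# a mildly acidic pH and at a near-neutral pH (both pH values are assumptions).
for pka in (6.0, 6.5, 7.0, 8.5):
    for ph in (5.0, 7.4):
        print(f"pKa {pka:>4}: pH {ph} -> {fraction_protonated(pka, ph):.3f} protonated")
```

When the pH equals the pKa, half of the groups are protonated; each pH unit above the pKa reduces the ratio of protonated to unprotonated groups tenfold.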
Those of ordinary skill in the art will be able to determine whether a given chemical moiety carries a formal electronic charge (for example, by inspection, pH titration, ionic conductivity measurements, etc.), and/or whether a given chemical moiety can be reacted (e.g., hydrolyzed) to form a chemical moiety that carries a formal electronic charge. [0394] The ionizable molecule (e.g., amino lipid or ionizable lipid) may have any suitable molecular weight. In certain embodiments, the molecular weight of an ionizable molecule is less than or equal to about 2,500 g/mol, less than or equal to about 2,000 g/mol, less than or equal to about 1,500 g/mol, less than or equal to about 1,250 g/mol, less than or equal to about 1,000 g/mol, less than or equal to about 900 g/mol, less than or equal to about 800 g/mol, less than or equal to about 700 g/mol, less than or equal to about 600 g/mol, less than or equal to about 500 g/mol, less than or equal to about 400 g/mol, less than or equal to about 300 g/mol, less than or equal to about 200 g/mol, or less than or equal to about 100 g/mol. In some instances, the molecular weight of an ionizable molecule is greater than or equal to about 100 g/mol, greater than or equal to about 200 g/mol, greater than or equal to about 300 g/mol, greater than or equal to about 400 g/mol, greater than or equal to about 500 g/mol,
greater than or equal to about 600 g/mol, greater than or equal to about 700 g/mol, greater than or equal to about 1000 g/mol, greater than or equal to about 1,250 g/mol, greater than or equal to about 1,500 g/mol, greater than or equal to about 1,750 g/mol, greater than or equal to about 2,000 g/mol, or greater than or equal to about 2,250 g/mol. Combinations of the above ranges (e.g., at least about 200 g/mol and less than or equal to about 2,500 g/mol) are also possible. In embodiments in which more than one type of ionizable molecules are present in a particle, each type of ionizable molecule may independently have a molecular weight in one or more of the ranges described above. [0395] In some embodiments, the percentage (e.g., by weight, or by mole) of a single type of ionizable molecule (e.g., amino lipid or ionizable lipid) and/or of all the ionizable molecules within a particle may be greater than or equal to about 15%, greater than or equal to about 16%, greater than or equal to about 17%, greater than or equal to about 18%, greater than or equal to about 19%, greater than or equal to about 20%, greater than or equal to about 21%, greater than or equal to about 22%, greater than or equal to about 23%, greater than or equal to about 24%, greater than or equal to about 25%, greater than or equal to about 30%, greater than or equal to about 35%, greater than or equal to about 40%, greater than or equal to about 42%, greater than or equal to about 45%, greater than or equal to about 48%, greater than or equal to about 50%, greater than or equal to about 52%, greater than or equal to about 55%, greater than or equal to about 58%, greater than or equal to about 60%, greater than or equal to about 62%, greater than or equal to about 65%, or greater than or equal to about 68%. 
In some instances, the percentage (e.g., by weight, or by mole) may be less than or equal to about 70%, less than or equal to about 68%, less than or equal to about 65%, less than or equal to about 62%, less than or equal to about 60%, less than or equal to about 58%, less than or equal to about 55%, less than or equal to about 52%, less than or equal to about 50%, or less than or equal to about 48%. Combinations of the above referenced ranges are also possible (e.g., greater than or equal to 20% and less than or equal to about 60%, greater than or equal to 40% and less than or equal to about 55%, etc.). In embodiments in which more than one type of ionizable molecule is present in a particle, each type of ionizable molecule may independently have a percentage (e.g., by weight, or by mole) in one or more of the ranges described above. The percentage (e.g., by weight, or by mole) may be determined by extracting the ionizable molecule(s) from the dried particles using, e.g., organic solvents, and measuring the quantity of the agent using high pressure liquid chromatography (i.e., HPLC), liquid chromatography-mass spectrometry (LC-MS), nuclear magnetic resonance (NMR), or mass spectrometry (MS). Those of ordinary skill in the art
would be knowledgeable of techniques to determine the quantity of a component using the above-referenced techniques. For example, HPLC may be used to quantify the amount of a component, by, e.g., comparing the area under the curve of an HPLC chromatogram to a standard curve.
[0396] It should be understood that the terms “charged” and “charged moiety” do not refer to a “partial negative charge” or “partial positive charge” on a molecule. The terms “partial negative charge” and “partial positive charge” are given their ordinary meaning in the art. A “partial negative charge” may result when a functional group comprises a bond that becomes polarized such that electron density is pulled toward one atom of the bond, creating a partial negative charge on the atom. Those of ordinary skill in the art will, in general, recognize bonds that can become polarized in this way.
[0397] A lipid composition may comprise one or more lipids as described herein. Such lipids may include those useful in the preparation of lipid nanoparticle formulations as described above or as known in the art.
Stabilizing compounds
[0398] Some embodiments of the compositions described herein are stabilized pharmaceutical compositions. Various non-viral delivery systems, including nanoparticle formulations, present attractive opportunities to overcome many challenges associated with mRNA delivery. Lipid nanoparticles (LNPs) have drawn particular attention in recent years as various LNP formulations have shown promise in a variety of pharmaceutical applications. However, lipids have been shown to degrade nucleic acids, including mRNA, and lipid nanoparticle formulations undergo rapid loss of purity when stored as refrigerated liquids. Moreover, the storage stability of mRNA encapsulated within LNPs is lower than that of unencapsulated mRNA.
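The standard-curve quantification mentioned in paragraph [0395] can be sketched as a linear fit of peak area against known concentrations, followed by interpolation of an unknown sample. The Python example below is a minimal illustration using ordinary least squares; the standard concentrations, peak areas, units, and function names are hypothetical values invented for the example and do not come from the specification.

```python
# Illustrative sketch only: all numeric values below are hypothetical.

def fit_standard_curve(concentrations, areas):
    """Ordinary least-squares fit of area = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_a = sum(areas) / n
    s_cc = sum((c - mean_c) ** 2 for c in concentrations)
    s_ca = sum((c - mean_c) * (a - mean_a) for c, a in zip(concentrations, areas))
    slope = s_ca / s_cc
    intercept = mean_a - slope * mean_c
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the standard curve to estimate the concentration of a sample."""
    return (area - intercept) / slope


# Hypothetical standards: concentration (mg/mL) versus integrated peak area.
standard_conc = [0.10, 0.25, 0.50, 1.00]
standard_area = [120.0, 300.0, 600.0, 1200.0]

slope, intercept = fit_standard_curve(standard_conc, standard_area)
sample_conc = concentration_from_area(450.0, slope, intercept)
print(f"slope={slope:.1f}, intercept={intercept:.2f}, sample={sample_conc:.3f} mg/mL")
```

In practice the unknown should fall within the concentration range spanned by the standards, since the fit is not validated outside that range.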
[0399] A class of compounds has been found to stabilize nucleic acids within a lipid carrier such as an LNP, an unexpected and unprecedented discovery which enables applications including extended refrigerated liquid shelf-life, extended in-use periods at room temperature, and extended in-use stability at physiological temperatures up to higher temperatures such as 40°C. Such stabilizing compounds solve a critical problem, as current manufacturing processes and formulations experience a 5-10% purity loss during LNP formation and processing that is typical with current large-scale LNP production. [0400] In some embodiments, the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a stabilizing compound (e.g., a
compound of Formula (I), of Formula (II), or a tautomer or solvate thereof). In some embodiments, the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a lipid, and a compound of Formula (I):
or a tautomer or solvate thereof, wherein: is a single bond or a double bond; R1 is H; R2 is OCH3, or together with R3 is OCH2O; R3 is OCH3, or together with R2 is OCH2O; R4 is H; R5 is H or OCH3; R6 is OCH3; R7 is H or OCH3; R8 is H; R9 is H or CH3; and X is a pharmaceutically acceptable anion, e.g., a halide such as chloride. [0401] In some embodiments, the compound of Formula (I) has the structure of:
Formula (Ia) Formula (Ib) Formula (Ic) or a tautomer or solvate thereof. [0402] In some embodiments, the stabilized pharmaceutical composition comprises a nucleic acid formulation comprising a nucleic acid and a lipid, and a compound of Formula (II):
(II), or a tautomer or solvate thereof, wherein: R10 is H; R11 is H; R12 together with R13 is OCH2O; R14 is H; R15 together with R16 is OCH2O; R17 is H; and X is a pharmaceutically acceptable anion, e.g., a halide such as chloride.
[0403] In some embodiments, the compound of Formula (II) has the structure of:
or a tautomer or solvate thereof.
[0404] Stabilizing compounds of Formulas (I), (Ia), (Ib), (Ic), (II), and (IIa) are described in International Application No. PCT/US2022/025967, which is incorporated by reference herein in its entirety.
[0405] In some embodiments, the nucleic acid formulation comprises lipid nanoparticles. In some embodiments, the nucleic acid is mRNA.
[0406] In some embodiments, the stabilizing compound (“the compound”) has a purity of at least 70%, 80%, 90%, 95%, or 99%. In some embodiments, the compound contains fewer than 100 ppm of elemental metals. In some embodiments, the stabilized pharmaceutical composition (“the composition”) comprises a pharmaceutically acceptable metal chelator, e.g., EDTA (ethylenediaminetetraacetic acid) or DTPA (diethylenetriaminepentaacetic acid).
[0407] In some embodiments, the composition is an aqueous solution. In some embodiments, the compound is present at a concentration between about 0.1 mM and about 10 mM in the aqueous solution. In some embodiments, the aqueous solution has a pH of about 5 to about 8, including a pH of about 5, 5.5, 6, 6.5, 7, 7.5, or 8. In some embodiments, the aqueous solution does not comprise NaCl. In some embodiments, the aqueous solution comprises NaCl at a concentration of about 150 mM. In some embodiments, the aqueous solution comprises a phosphate buffer, a tris buffer, an acetate buffer, a histidine buffer, or a citrate buffer.
[0408] In some embodiments, microbial growth in the composition is inhibited by the compound.
[0409] In some embodiments, the composition is characterized as having a mRNA purity level of greater than 60%, greater than 70%, greater than 80%, or greater than 90% main peak mRNA purity after at least thirty days of storage. In some embodiments, the composition comprises a mRNA purity level of greater than 50% main peak mRNA purity after at least six months of storage. In some embodiments, the storage is at room temperature.
[0410] In some embodiments, the composition comprises a lipid nanoparticle encapsulating a mRNA, and the composition comprises less than 50%, less than 60%, less
than 70%, less than 80%, less than 90%, or less than 95% RNA fragments after at least thirty days of storage. In some embodiments, the storage temperature is greater than room temperature. In some embodiments, the storage temperature is about 4°C.
[0411] In some embodiments, the compound interacts with the nucleic acid comprised within a lipid nanostructure (e.g., a lipid nanoparticle, liposome, or lipoplex), e.g., via pi-pi stacking and/or by changing backbone helicity of the nucleic acid. In some embodiments, the compound intercalates with a nucleic acid. In some embodiments, the compound binds with a nucleic acid, e.g., reversible binding, and/or binding to the stranded regions of the nucleic acid. In some embodiments, the compound self-associates, binds to nucleic acid ribose contacts, and/or binds to nucleic acid base contacts. In some embodiments, the compound does not substantially bind to nucleic acid phosphate contacts. In some embodiments, the positive charge of the compound contributes to nucleic acid binding. In some embodiments, the compound interacts with the nucleic acid with a binding affinity defined by an equilibrium dissociation constant of less than 10⁻³ M (e.g., less than 10⁻⁴ M, less than 10⁻⁵ M, less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, or less than 10⁻⁹ M).
[0412] In some embodiments, the compound interacts with a nucleic acid and provides shielding from solvent, e.g., water. In some embodiments, the compound shields ribose from solvent more than the compound shields the phosphate groups of the nucleic acid. In some embodiments, the solvent exposure is measured by the solvent accessible surface area (SASA). In some embodiments, a stabilizing compound decreases the solvent accessible area of ribose to about 5-10 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of ribose to about 6-8 nm². In some embodiments, a stabilizing compound decreases the solvent accessible area of phosphate to about 9-12 nm².
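To give a sense of scale for the equilibrium dissociation constants recited in paragraph [0411], the following Python sketch computes the fraction of binding sites occupied at compound concentrations within the 0.1 mM to 10 mM range of paragraph [0407]. It assumes a simple, non-cooperative 1:1 binding model with the compound in large excess over binding sites, so that free and total compound concentrations are approximately equal. That model, and the function name, are assumptions made for illustration; the specification does not state a binding model.

```python
# Illustrative sketch only: assumes simple 1:1 binding with the compound in excess.

def fraction_bound(compound_molar, kd_molar):
    """Fraction of sites occupied: [L] / (Kd + [L]) for a 1:1 binding isotherm."""
    if compound_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return compound_molar / (kd_molar + compound_molar)


# Usage: Kd thresholds from paragraph [0411], concentrations from paragraph [0407].
for kd in (1e-3, 1e-4, 1e-5):
    for conc in (1e-4, 1e-3, 1e-2):
        occupancy = fraction_bound(conc, kd)
        print(f"Kd = {kd:.0e} M, [compound] = {conc:.0e} M -> {occupancy:.2f} occupied")
```

Under this model, half of the sites are occupied when the compound concentration equals the dissociation constant, and a lower dissociation constant gives higher occupancy at the same concentration.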
In some embodiments, a stabilizing compound decreases the solvent accessible area of phosphate to about 10-11 nm².
[0413] In some embodiments, a nucleic acid that is conformationally stabilized by the compound exhibits thermal unfolding temperatures (measured by circular dichroism or DSC, for example) that are higher than in the absence of the compound. In some embodiments, the compound confers increased stability, e.g., thermal stability, to the nucleic acid in a folded structure, e.g., relative to its unfolded or less folded or more linear form. In some embodiments, the compound causes compaction of the nucleic acid upon interaction with the nucleic acid. In some embodiments, the compound causes a decrease in the hydrodynamic radius of the nucleic acid molecule upon interaction with the nucleic acid. In some embodiments, a stabilizing compound causes compaction or a decrease in the hydrodynamic
radius of a nucleic acid molecule by 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, or more. In some embodiments, a stabilizing compound causes compaction or a decrease in the hydrodynamic radius of a nucleic acid molecule when the compound is in a concentration of 1 μM, 2 μM, 3 μM, 4 μM, 5 μM, 6 μM, 7 μM, 8 μM, 9 μM, 10 μM, 15 μM, 20 μM, 25 μM, 30 μM, 35 μM, 40 μM, 45 μM, 50 μM, 60 μM, 70 μM, 80 μM, 90 μM, or 100 μM. EXAMPLES Example 1: Chemical stability of CpA dinucleotide in mRNA [0414] The susceptibility of different dinucleotide pairs to spontaneous cleavage was analyzed by incubating a test mRNA in water for 4 hours, and analyzing the resulting mRNA cleavage fragments by Illumina 3′ end sequencing. After incubation, fragments were sequenced, and reads were aligned to the reference sequence, with the 3′ nucleotide of each read corresponding to the first nucleotide in a dinucleotide pair that was cleaved to generate the sequenced mRNA fragment (e.g., a read ending in AAGCAC (SEQ ID NO: 1) that aligned to the sequence AAGCACAAUC (SEQ ID NO: 2) indicated that the bolded CpA dinucleotide was cleaved to generate the 3′ of the mRNA fragment). Analysis of the resulting abundance of cleaved dinucleotides indicated that the CpA dinucleotide was the most represented dinucleotide, indicating that this dinucleotide is particularly susceptible to cleavage (FIG.1). [0415] Next, a panel of mRNAs, each encoding the same antigen (Ag) with the same amino acid sequence, but varying in CpA dinucleotide content, was generated to test the effects of CpA dinucleotide content on stability during mRNA storage. Control mRNAs contained open reading frames with 366 CpA dinucleotides, while others (“Low CA”) contained open reading frames with only 79 CpA dinucleotides. Low CA mRNAs #2 and 3 contained increased %G/C content, relative to Low CA mRNA #1, and Low CA mRNAs #2 and #3 differed in 5′ UTR sequences. 
For each mRNA, the CpA dinucleotide content (# of CpA dinucleotides in the open reading frame), %G/C content (in mRNA sequence), and time to 50% purity during storage at (i) 40 °C unformulated; (ii) 25 °C unformulated; or (iii) 25 °C when formulated in a lipid nanoparticle (LNP), is shown in Table 1. At both temperatures, mRNAs having fewer CpA dinucleotides decayed more slowly than the control mRNA, indicating that the stability of a given mRNA may be increased by reducing the abundance of CpA dinucleotides.
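The sequence metrics used in Example 1 can be sketched in a few lines of Python. The example below counts CpA dinucleotides in a sequence, computes %G/C content, and identifies the dinucleotide whose cleavage would produce a given read, using the sequences of SEQ ID NO: 1 and SEQ ID NO: 2 from paragraph [0414]. The function names are hypothetical, and the exact-substring lookup is a simplification standing in for read alignment; it is not the analysis pipeline used in the Example.

```python
# Illustrative sketch only: function names are hypothetical, and exact substring
# matching stands in for alignment of sequencing reads to the reference.

def count_dinucleotide(seq, dinucleotide="CA"):
    """Count occurrences of a dinucleotide (CpA by default) in an RNA or DNA sequence."""
    seq = seq.upper().replace("T", "U")
    dinucleotide = dinucleotide.upper().replace("T", "U")
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide)


def gc_percent(seq):
    """Percentage of G and C nucleotides in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cleaved_dinucleotide(reference, read):
    """Dinucleotide spanning the 3' end of the read and the next reference nucleotide.

    Returns None if the read is empty, is not found, or ends at the 3' end
    of the reference (no downstream nucleotide, so no cleavage is implied).
    """
    if not read:
        return None
    start = reference.find(read)
    if start < 0:
        return None
    end = start + len(read)  # index of the first nucleotide after the read
    if end >= len(reference):
        return None
    return reference[end - 1:end + 1]


reference = "AAGCACAAUC"  # SEQ ID NO: 2
read = "AAGCAC"           # SEQ ID NO: 1
print(cleaved_dinucleotide(reference, read))
print(count_dinucleotide(reference), gc_percent(reference))
```

Tallying the output of `cleaved_dinucleotide` over all reads (for example with `collections.Counter`) would give a per-dinucleotide cleavage abundance of the kind summarized in FIG. 1.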
Table 1: Stability of mRNAs with low CpA dinucleotide content
Example 2: In vitro expression and in vivo immunogenicity of mRNAs with low CpA dinucleotide content
[0416] The panel of mRNAs tested in Example 1 was also tested in cultured EXPI293 cells to evaluate expression of mRNAs with reduced CpA dinucleotide content. Following addition of LNP-mRNA compositions to cells and sufficient time to allow antigen expression, cells were collected, stained with an Ag-specific antibody, and analyzed by flow cytometry to evaluate antigen expression. The results of this analysis are shown in FIGs. 3A–3C. All Low CA mRNA compositions allowed translation of the encoded antigen in cells, with at least 40% of cells expressing detectable antigen (FIG. 3A), and total protein expression being similar to that of cells contacted with compositions containing control mRNAs (FIGs. 3B and 3C).
[0417] The same panel of mRNA vaccine compositions was tested in C57BL/6 mice. Mice were immunized with two doses of a composition containing 1 µg mRNA, receiving the first dose on day 0 and the second dose on day 22. On day 21, three weeks after the first dose, and day 36, two weeks after the second dose, sera were collected to evaluate antibody responses elicited by each LNP-mRNA composition. The results of ELISAs, measuring titers of antibodies specific to the encoded antigen, are shown in FIG. 4. These results indicate that reduction of CpA dinucleotide content may be used to improve mRNA stability, while still allowing expression in vitro and in vivo (e.g., sufficient expression to elicit an antibody response to an encoded antigen).
Example 3: In vitro transcription (IVT)
Materials and Methods
[0418] Alternative mRNAs are made using standard laboratory methods and materials for in vitro transcription. The open reading frame (ORF) of the gene of interest may be flanked by a 5′ untranslated region (UTR) containing a strong Kozak translational initiation signal, and an alpha-globin 3′ UTR.
[0419] The ORF may also include various upstream or downstream additions (such as, but not limited to, β-globin, tags, etc.), may be ordered from an optimization service such as, but not limited to, DNA2.0 (Menlo Park, Calif.), and may contain multiple cloning sites which may have XbaI recognition. Upon receipt of the construct, it may be reconstituted and transformed into chemically competent E. coli. NEB DH5-alpha Competent E. coli may be used. Transformations are performed according to NEB instructions using 100 ng of plasmid. The protocol is as follows:
1. Thaw a tube of NEB 5-alpha Competent E. coli cells on ice for 10 minutes.
2. Add 1-5 μl containing 1 pg-100 ng of plasmid DNA to the cell mixture. Carefully flick the tube 4-5 times to mix cells and DNA. Do not vortex.
3. Place the mixture on ice for 30 minutes. Do not mix.
4. Heat shock at 42 °C for exactly 30 seconds. Do not mix.
5. Place on ice for 5 minutes. Do not mix.
6. Pipette 950 μl of room temperature SOC into the mixture.
7. Place at 37 °C for 60 minutes. Shake vigorously (250 rpm) or rotate.
8. Warm selection plates to 37 °C.
9. Mix the cells thoroughly by flicking the tube and inverting.
10. Spread 50-100 μl of each dilution onto a selection plate and incubate overnight at 37 °C. Alternatively, incubate at 30 °C for 24-36 hours or 25 °C for 48 hours.
[0420] A single colony is then used to inoculate 5 ml of LB growth media using the appropriate antibiotic and then allowed to grow (250 RPM, 37 °C) for 5 hours. This is then used to inoculate a 200 ml culture medium and allowed to grow overnight under the same conditions.
[0421] To isolate the plasmid (up to 850 μg), a maxi prep is performed using the Invitrogen PURELINK™ HiPure Maxiprep Kit (Carlsbad, Calif.), following the manufacturer's instructions.
[0422] In order to generate cDNA for in vitro transcription (IVT), the plasmid is first linearized using a restriction enzyme such as XbaI. A typical restriction digest with XbaI will comprise the following:
Plasmid: 1.0 μg
10× Buffer: 1.0 μl
XbaI: 1.5 μl
dH2O: up to 10 μl
Incubation: 37 °C for 1 hr
If performing at lab scale (<5 μg), the reaction is cleaned up using Invitrogen's PURELINK™ PCR Micro Kit (Carlsbad, Calif.) per manufacturer's instructions. Larger scale purifications may need to be done with a product that has a larger load capacity such as Invitrogen's standard PURELINK™ PCR Kit (Carlsbad, Calif.). Following the cleanup, the linearized vector is quantified using the NanoDrop and analyzed to confirm linearization using agarose gel electrophoresis.
IVT Reaction
[0423] The in vitro transcription reaction generates mRNA containing alternative nucleotides or alternative RNA. The input nucleotide triphosphate (NTP) mix is made in-house using natural and unnatural NTPs. A typical in vitro transcription reaction includes the following:
Template cDNA: 1.0 μg
10× transcription buffer (400 mM Tris-HCl pH 8.0, 190 mM MgCl2, 50 mM DTT, 10 mM Spermidine): 2.0 μl
Custom NTPs (25 mM each): 7.2 μl
RNase Inhibitor: 20 U
T7 RNA polymerase: 3000 U
dH2O: up to 20.0 μl
Incubation: 37 °C for 3 hr-5 hrs
[0424] The crude IVT mix may be stored at 4 °C overnight for cleanup the next day. 1 U of RNase-free DNase is then used to digest the original template. After 15 minutes of incubation at 37 °C, the mRNA is purified using Ambion's MEGACLEAR™ Kit (Austin, Tex.) following the manufacturer's instructions. This kit can purify up to 500 μg of RNA. Following the cleanup, the RNA is quantified using the NanoDrop and analyzed by agarose gel electrophoresis to confirm the RNA is the proper size and that no degradation of the RNA has occurred.
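The XbaI digest recipe in paragraph [0422] fixes the plasmid mass (1.0 μg), buffer volume (1.0 μl), enzyme volume (1.5 μl), and final volume (10 μl), leaving the water volume to be worked out from the concentration of the plasmid stock. The sketch below illustrates that arithmetic only. The function name and the stock concentration argument are hypothetical, since the source does not state a plasmid stock concentration.

```python
def xbai_digest_volumes(
    stock_ug_per_ul: float,      # hypothetical plasmid stock concentration (ug/ul)
    plasmid_ug: float = 1.0,     # plasmid mass from the recipe
    buffer_ul: float = 1.0,      # 10x buffer volume from the recipe
    enzyme_ul: float = 1.5,      # XbaI volume from the recipe
    total_ul: float = 10.0,      # final reaction volume from the recipe
) -> dict[str, float]:
    """Return the volume of each component, with water making up the remainder."""
    if stock_ug_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    plasmid_ul = plasmid_ug / stock_ug_per_ul
    water_ul = total_ul - (plasmid_ul + buffer_ul + enzyme_ul)
    if water_ul < 0:
        # The stock is too dilute to fit the required mass in the final volume.
        raise ValueError("plasmid stock too dilute for the stated final volume")
    return {
        "plasmid_ul": plasmid_ul,
        "buffer_ul": buffer_ul,
        "enzyme_ul": enzyme_ul,
        "water_ul": water_ul,
    }


if __name__ == "__main__":
    # Assumed example: a 0.5 ug/ul plasmid stock.
    print(xbai_digest_volumes(0.5))
```

With the assumed 0.5 μg/μl stock, 2.0 μl of plasmid and 5.5 μl of water complete the 10 μl reaction. A stock below about 0.133 μg/μl cannot deliver 1.0 μg within the 7.5 μl left after buffer and enzyme, which the function reports as an error.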
[0425] The RNA polymerase may be selected from T7 RNA polymerase, T3 RNA polymerase, and mutant polymerases such as, but not limited to, the novel polymerases able to incorporate alternative NTPs as well as those polymerases described by Liu (Esvelt et al., Nature (2011) 472(7344):499-503 and U.S. Publication No. US 2011/0177495), which
recognize alternate promoters; Ellington (Chelliserrykattil and Ellington, Nature Biotechnology (2004) 22(9):1155-1160), describing a T7 RNA polymerase variant to transcribe 2′-O-methyl RNA; and Sousa (Padilla and Sousa, Nucleic Acids Research (2002) 30(24):e128), describing a T7 RNA polymerase double mutant; herein incorporated by reference in their entireties.
Agarose Gel Electrophoresis of Alternative mRNA
[0426] Individual alternative mRNAs (200-400 ng in a 20 μl volume) are loaded into a well on a non-denaturing 1.2% Agarose E-Gel (Invitrogen, Carlsbad, Calif.) and run for 12-15 minutes according to the manufacturer protocol.
Agarose Gel Electrophoresis of RT-PCR Products
[0427] Individual reverse transcribed-PCR products (200-400 ng) are loaded into a well of a non-denaturing 1.2% Agarose E-Gel (Invitrogen, Carlsbad, Calif.) and run for 12-15 minutes according to the manufacturer protocol.
Nanodrop Alternative mRNA Quantification and UV Spectral Data
[0428] Alternative mRNAs in TE buffer (1 μl) are used for Nanodrop UV absorbance readings to quantitate the yield of each alternative mRNA from an in vitro transcription reaction (UV absorbance traces are not shown).
Example 3: Enzymatic capping of mRNA
[0429] Capping of the mRNA is performed as follows, where the mixture includes: IVT RNA 60 μg–180 μg and dH2O up to 72 μl. The mixture is incubated at 65 °C for 5 minutes to denature RNA, and then is transferred immediately to ice.
[0430] The protocol then involves the mixing of 10× Capping Buffer (0.5 M Tris-HCl (pH 8.0), 60 mM KCl, 12.5 mM MgCl2) (10.0 μl); 20 mM GTP (5.0 μl); 20 mM S-Adenosyl Methionine (2.5 μl); RNase Inhibitor (100 U); 2′-O-Methyltransferase (400 U); Vaccinia capping enzyme (Guanylyl transferase) (40 U); dH2O (up to 28 μl); and incubation at 37 °C for 30 minutes for 60 μg RNA or up to 2 hours for 180 μg of RNA.
[0431] The mRNA is then purified using Ambion's MEGACLEAR™ Kit (Austin, Tex.) following the manufacturer's instructions.
Following the cleanup, the RNA is quantified using the NANODROP™ (ThermoFisher, Waltham, Mass.) and analyzed by agarose gel electrophoresis to confirm the RNA is the proper size and that no degradation of the RNA has
occurred. The RNA product may also be sequenced by running a reverse-transcription-PCR to generate the cDNA for sequencing.
Example 4: 5′-Guanosine capping
Materials and Methods
[0432] The cloning, gene synthesis and vector sequencing may be performed by DNA2.0 Inc. (Menlo Park, Calif.). The ORF is restriction digested using XbaI and used for cDNA synthesis using tailed- or tail-less-PCR. The tailed-PCR cDNA product is used as the template for the alternative mRNA synthesis reaction using 25 mM each alternative nucleotide mix (all alternative nucleotides may be custom synthesized or purchased from TriLink Biotech, San Diego, Calif. except pyrrolo-C triphosphate which may be purchased from Glen Research, Sterling Va.; unmodified nucleotides are purchased from Epicenter Biotechnologies, Madison, Wis.) and CellScript MEGASCRIPT™ (Epicenter Biotechnologies, Madison, Wis.) complete mRNA synthesis kit.
[0433] The in vitro transcription reaction is run for 4 hours at 37 °C. Alternative mRNAs incorporating adenosine analogs are poly (A) tailed using yeast Poly (A) Polymerase (Affymetrix, Santa Clara, Calif.). The PCR reaction uses HiFi PCR 2× MASTER MIX™ (Kapa Biosystems, Woburn, Mass.). Alternative mRNAs are post-transcriptionally capped using recombinant Vaccinia Virus Capping Enzyme (New England BioLabs, Ipswich, Mass.) and a recombinant 2′-O-methyltransferase (Epicenter Biotechnologies, Madison, Wis.) to generate the 5′-guanosine Cap 1 structure. Cap 2 and Cap 3 structures may be generated using additional 2′-O-methyltransferases. The in vitro transcribed mRNA product is run on an agarose gel and visualized. Alternative mRNA may be purified with Ambion/Applied Biosystems (Austin, Tex.) MEGAClear RNA™ purification kit. The PCR uses PURELINK™ PCR purification kit (Invitrogen, Carlsbad, Calif.). The product is quantified on NANODROP™ UV Absorbance (ThermoFisher, Waltham, Mass.).
Quality, UV absorbance quality, and visualization of the product are performed on a 1.2% agarose gel. The product is resuspended in TE buffer.
5′-Capping Alternative Nucleic Acid (mRNA) Structure
[0434] 5′-capping of alternative mRNA may be completed concomitantly during the in vitro transcription reaction using the following chemical RNA cap analogs to generate the 5′-guanosine cap structure according to manufacturer protocols: 3′-O-Me-m7G(5′)ppp(5′)G (the
ARCA cap); G(5′)ppp(5′)A; G(5′)ppp(5′)G; m7G(5′)ppp(5′)A; m7G(5′)ppp(5′)G (New England BioLabs, Ipswich, Mass.). 5′-capping of alternative mRNA may be completed post-transcriptionally using a Vaccinia Virus Capping Enzyme to generate the “Cap 0” structure: m7G(5′)ppp(5′)G (New England BioLabs, Ipswich, Mass.). Cap 1 structure may be generated using both Vaccinia Virus Capping Enzyme and a 2′-O methyl-transferase to generate: m7G(5′)ppp(5′)G-2′-O-methyl. Cap 2 structure may be generated from the Cap 1 structure followed by the 2′-O-methylation of the 5′-antepenultimate nucleotide using a 2′-O methyl-transferase. Cap 3 structure may be generated from the Cap 2 structure followed by the 2′-O-methylation of the 5′-preantepenultimate nucleotide using a 2′-O methyl-transferase. Enzymes are preferably derived from a recombinant source.
[0435] When transfected into mammalian cells, the alternative mRNAs have a stability of 12-18 hours or more than 18 hours, e.g., 24, 36, 48, 60, 72 or greater than 72 hours.
Example 5: In vivo expression of selected sequences
[0436] Lipid nanoparticles containing modified or unmodified mRNA are administered to mice at mRNA doses of 0.05 mg/kg intravenously, subcutaneously, or intramuscularly. Expression of polypeptides encoded by the mRNAs is evaluated by any method known in the art. For example, expression of an encoded fluorescent protein may be evaluated by isolating cells and measuring fluorescence intensity by fluorescence activated cell sorting (FACS) or fluorescence microscopy.
Example 6: Method of screening for protein expression
Electrospray Ionization
[0437] A biological sample which may contain proteins encoded by modified RNA administered to the subject is prepared and analyzed according to the manufacturer protocol for electrospray ionization (ESI) using 1, 2, 3 or 4 mass analyzers. A biological sample may also be analyzed using a tandem ESI mass spectrometry system.
[0438] Patterns of protein fragments, or whole proteins, are compared to known controls for a given protein and identity is determined by comparison.
Matrix-Assisted Laser Desorption/Ionization
[0439] A biological sample which may contain proteins encoded by alternative RNA administered to the subject is prepared and analyzed according to the manufacturer protocol for matrix-assisted laser desorption/ionization (MALDI).
[0440] Patterns of protein fragments, or whole proteins, are compared to known controls for a given protein and identity is determined by comparison.
Liquid Chromatography-Mass Spectrometry-Mass Spectrometry
[0441] A biological sample, which may contain proteins encoded by alternative RNA, may be treated with a trypsin enzyme to digest the proteins contained within. The resulting peptides are analyzed by liquid chromatography-mass spectrometry-mass spectrometry (LC/MS/MS). The peptides are fragmented in the mass spectrometer to yield diagnostic patterns that can be matched to protein sequence databases via computer algorithms. The digested sample may be diluted to achieve 1 ng or less starting material for a given protein. Biological samples containing a simple buffer background (e.g., water or volatile salts) are amenable to direct in-solution digest; more complex backgrounds (e.g., detergent, non-volatile salts, glycerol) require an additional clean-up step to facilitate the sample analysis.
[0442] Patterns of protein fragments, or whole proteins, are compared to known controls for a given protein and identity is determined by comparison.
Example 7: In vivo assays with human EPO containing alternative nucleotides
Formulation
[0443] Modified mRNAs encoding human erythropoietin (hEPO) are formulated in lipid nanoparticles (LNPs) comprising DLin-KC2-DMA, DSPC, Cholesterol, and PEG-DMG at 50:10:38.5:1.5 mol % respectively. The LNPs are made by direct injection utilizing nanoprecipitation of ethanol solubilized lipids into a pH 4.0, 50 mM citrate mRNA solution. The EPO LNP particle size distributions are characterized by DLS. Encapsulation efficiency
(EE) is determined using a Ribogreen™ fluorescence-based assay for detection and quantification of nucleic acids.
Methods
[0444] Female Balb/c mice (n=5) are administered 0.05 mg/kg IM (50 μl in the quadriceps) or IV (100 μl in the tail vein) of human EPO mRNA. At 8 hours after the injection, mice are euthanized and blood is collected in serum separator tubes. The samples are spun, and serum samples are then run on an EPO ELISA following the kit protocol (Stem Cell Technologies Catalog #01630).
EQUIVALENTS AND SCOPE
[0445] While several inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or
applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure. [0446] All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms. [0447] All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document. [0448] The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.” [0449] The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. 
Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in some embodiments, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc. As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items.
Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law. [0450] As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in some embodiments, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc. 
Each possibility represents a separate embodiment of the present invention. [0451] It should be understood that, unless clearly indicated to the contrary, the disclosure of numerical values and ranges of numerical values in the specification includes both i) the exact value(s) or range specified, and ii) values that are “about” the value(s) or ranges specified (e.g., values or ranges falling within a reasonable range (e.g., about 10% similar)) as would be understood by a person of ordinary skill in the art. [0452] It should also be understood that, unless clearly indicated to the contrary, in any methods disclosed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are disclosed. [0453] In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,”
“composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03.
Claims
What is claimed is:
1. A non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is greater than or equal to a theoretical minimum and less than or equal to 300% of the theoretical minimum.
2. A non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a number of CpA dinucleotides that is: (i) greater than or equal to a theoretical minimum; and (ii) no more than 11 CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum.
3. The mRNA of claim 2, wherein the number of CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum is no more than 10, no more than 9, no more than 8, no more than 7, no more than 6, no more than 5, no more than 4, no more than 3, no more than 2, or no more than 1.
4. A non-naturally occurring mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the ORF comprises a CpA dinucleotide content of 6.5% or less.
5. The mRNA of claim 4, wherein the ORF comprises a CpA dinucleotide content of 6.0% or less, 5.5% or less, 5% or less, 4.5% or less, 4% or less, 3.5% or less, 3.0% or less, 2.5% or less, 2.0% or less, 1.5% or less, 1.0% or less, or 0.5% or less.
6. The mRNA of any one of the preceding claims, wherein: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides;
(c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (f) fewer than 30% of amino acids that immediately precede a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides; and/or (g) fewer than 30% of amino acids that immediately precede an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides.
7. The mRNA of any one of the preceding claims, wherein the nucleotide sequence of the mRNA comprises a %G/C content of 30% – 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
8. The mRNA of any one of the preceding claims, wherein one or more nucleotides of the mRNA comprises a chemically modified nucleotide.
9. The mRNA of any one of the preceding claims, wherein each uridine nucleotide of the mRNA comprises a chemically modified nucleotide.
10. An mRNA encoding a polypeptide, the mRNA comprising an open reading frame (ORF) encoding the polypeptide, wherein the mRNA has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%, wherein each of the uridine nucleotides of the ORF comprises a chemical modification, wherein: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides;
(b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (f) fewer than 30% of amino acids that immediately precede a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides; and/or (g) fewer than 30% of amino acids that immediately precede an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides.
11. The mRNA of any one of claims 8–10, wherein the chemically modified nucleotide comprises N1-methylpseudouridine.
12. The mRNA of any one of the preceding claims, wherein fewer than 15% of serine residues, fewer than 27% of proline residues, fewer than 28% of threonine residues, and fewer than 23% of alanine residues in the polypeptide are encoded by codons in the ORF comprising a CpA dinucleotide.
13. The mRNA of any one of the preceding claims, wherein: (a) no serine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (b) no proline residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (c) no threonine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; and/or (d) no alanine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide.
14. The mRNA of any one of the preceding claims, wherein: (a) no amino acid that immediately precedes an isoleucine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (b) no amino acid that immediately precedes a methionine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (c) no amino acid that immediately precedes a threonine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (d) no amino acid that immediately precedes an asparagine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (e) no amino acid that immediately precedes a lysine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (f) no amino acid that immediately precedes a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, is encoded by a codon in the ORF that ends in a cytidine nucleotide; and/or (g) no amino acid that immediately precedes an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, is encoded by a codon in the ORF that ends in a cytidine nucleotide.
15. The mRNA of any one of the preceding claims, wherein no amino acid that immediately precedes an isoleucine, methionine, threonine, asparagine, or lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
16. The mRNA of any one of the preceding claims, wherein no codon in the ORF beginning with an adenosine nucleotide is immediately preceded by a codon in the ORF that ends in a cytidine nucleotide.
17. The mRNA of any one of the preceding claims, wherein the ORF is codon-optimized for expression in a cell.
18. The mRNA of claim 17, wherein the cell is a mammalian cell.
19. The mRNA of any one of the preceding claims, wherein the mRNA further comprises: (i) a 5′ untranslated region (UTR); and/or
(ii) a 3′ UTR.
20. The mRNA of claim 19, wherein the 5′ UTR is a heterologous UTR and/or the 3′ UTR is a heterologous UTR.
21. The mRNA of claim 19 or 20, wherein the 5′ UTR comprises five or fewer, four or fewer, three or fewer, two or fewer, one or fewer, or zero CpA dinucleotides.
22. The mRNA of any one of claims 19–21, wherein the 5′ UTR does not comprise a CpA dinucleotide.
23. The mRNA of any one of claims 19–22, wherein the 3′ UTR comprises five or fewer, four or fewer, three or fewer, two or fewer, one or fewer, or zero CpA dinucleotides.
24. The mRNA of any one of claims 19–23, wherein the 3′ UTR does not comprise a CpA dinucleotide.
25. The mRNA of any one of claims 19–24, wherein the last nucleotide of the 5′ UTR is not a cytidine nucleotide.
26. The mRNA of any one of claims 19–25, wherein the 5′ UTR has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
27. The mRNA of any one of claims 19–26, wherein the ORF has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
28. The mRNA of any one of claims 19–27, wherein the 3′ UTR has a %G/C content of 30–80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
29. The mRNA of any of the preceding claims, wherein the mRNA further comprises: (iii) a 5′ cap structure; and/or
(iv) a poly-A tail.
30. The mRNA of claim 29, wherein the last nucleotide of the 3′ UTR is not a cytidine nucleotide.
31. The mRNA of claim 29 or 30, wherein the 5′ cap structure comprises 7mG(5')ppp(5')NlmpNp.
32. The mRNA of any one of the preceding claims, wherein the level of expression in a mammalian cell of the encoded polypeptide from the mRNA is at least 50% of the level of expression of a reference mRNA comprising a reference open reading frame (rORF) encoding the polypeptide, wherein the rORF comprises a higher number of CpA dinucleotides than the ORF.
33. The mRNA of any one of the preceding claims, wherein one or more CpA dinucleotides of the mRNA comprises a modified cytidine nucleotide and/or a modified adenosine nucleotide.
34. The mRNA of any one of the preceding claims, wherein the number of CpA dinucleotides comprising an unmodified cytidine nucleotide and an unmodified adenosine nucleotide in the ORF is 100%, 95% or less, 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, or 10% or less of the total number of histidine and glutamine residues in the polypeptide.
35. The mRNA of any one of the preceding claims, wherein the polypeptide comprises 9–5,000, 20–4,000, 30–3,000, 40–2,000, or 50–1,500 amino acids.
36. The mRNA of any one of the preceding claims, wherein the polypeptide is a vaccine antigen or a therapeutic protein.
37. The mRNA of any one of the preceding claims, wherein a coefficient of degradation at 25 °C of the mRNA is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide.
38. The mRNA of any one of the preceding claims, wherein a composition comprising a plurality of the mRNAs remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of mRNAs comprising a wild-type ORF encoding the polypeptide.
39. The mRNA of claim 38, wherein storage of the mRNA is conducted at a temperature between about 2 °C and about 8 °C.
40. The mRNA of claim 38 or 39, wherein the mRNA is stored in a buffer comprising 10–50 mM Tris and 5–10% sucrose, wherein the buffer has a pH of about 7.3 to about 7.6.
41. The mRNA of any one of the preceding claims, wherein the stability of the mRNA is increased relative to a reference mRNA having a higher number of CpA dinucleotides, the reference mRNA comprising a reference open reading frame (rORF) encoding the polypeptide, wherein the rORF has a higher number of CpA dinucleotides than the ORF.
42. A lipid nanoparticle comprising the mRNA of any one of the preceding claims, and an ionizable cationic lipid, a non-cationic lipid, a sterol, and a polyethylene glycol (PEG)-modified lipid.
43. The lipid nanoparticle of claim 42, wherein the lipid nanoparticle comprises 20–60% ionizable cationic lipid, and 5–25% non-cationic lipid, 25–55% cholesterol, and 0.5–15% polyethylene glycol (PEG)-modified lipid.
44. The lipid nanoparticle of claim 42 or 43, wherein a coefficient of degradation at 25 °C of the mRNA in the lipid nanoparticle is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising a wild-type ORF encoding the polypeptide.
45. The lipid nanoparticle of any one of claims 42–44, wherein a composition comprising a plurality of the lipid nanoparticles remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of the lipid nanoparticles and mRNAs comprising a wild-type ORF encoding the polypeptide.
46. The lipid nanoparticle of claim 45, wherein storage of the lipid nanoparticle is conducted at a temperature between about 2 °C and about 8 °C.
47. The lipid nanoparticle of any one of claims 42–46, further comprising a stabilizing compound of Formula (I):
or a tautomer or solvate thereof, wherein: is a single bond or a double bond; R1 is H; R2 is OCH3, or together with R3 is OCH2O; R3 is OCH3, or together with R2 is OCH2O; R4 is H; R5 is H or OCH3; R6 is OCH3; R7 is H or OCH3; R8 is H; R9 is H or CH3; and X is a pharmaceutically acceptable anion.
48. The lipid nanoparticle of claim 47, wherein the stabilizing compound is a compound of Formula (Ia), Formula (Ib), or Formula (Ic), or a tautomer or solvate thereof.
49. The lipid nanoparticle of any one of claims 42–48, further comprising a stabilizing compound of Formula (II):
or a tautomer or solvate thereof, wherein: R10 is H; R11 is H; R12 together with R13 is OCH2O; R14 is H; R15 together with R16 is OCH2O; R17 is H; and X is a pharmaceutically acceptable anion.
50. A pharmaceutical composition comprising the lipid nanoparticle of any one of claims 42–49, and a pharmaceutically acceptable excipient.
51. A method of producing a modified mRNA sequence comprising an ORF encoding a polypeptide, the method comprising modifying a reference mRNA sequence comprising a reference ORF to produce the modified mRNA sequence by: (a) replacing one or more codons in the reference ORF comprising a CpA dinucleotide with a codon that encodes the same amino acid but does not comprise a CpA dinucleotide; and/or (b) replacing one or more codons in the reference ORF that: (1) ends in a cytidine nucleotide; and (2) is immediately followed in the reference ORF by a codon that encodes an isoleucine, methionine, threonine, asparagine, or lysine, or a codon that encodes a serine or arginine and begins with an adenosine nucleotide, with a codon encoding the same amino acid as the replaced codon but does not end in a cytidine nucleotide.
52. The method of claim 51, wherein the reference mRNA sequence further comprises: (i) a reference 5′ untranslated region (UTR); and/or (ii) a reference 3′ UTR.
53. The method of claim 52, wherein the reference 5′ UTR is a heterologous 5′ UTR and/or the reference 3′ UTR is a heterologous 3′ UTR.
54. The method of claim 52 or 53, wherein the replacing comprises changing the last nucleotide of the reference 5′ UTR from a cytidine nucleotide to a non-cytidine nucleotide.
55. The method of any one of claims 52–54, wherein the reference mRNA sequence further comprises: (iii) a 5′ cap structure; and/or (iv) a poly-A region.
56. The method of claim 55, wherein the replacing comprises changing the last nucleotide of the reference 3′ UTR from a cytidine nucleotide to a non-cytidine nucleotide.
57. The method of any one of claims 51–56, further comprising replacing one or more cytidine nucleotides in the reference mRNA sequence with guanosine nucleotides.
58. The method of any one of claims 51–57, further comprising replacing one or more unmodified cytidine nucleotides in the reference mRNA sequence with modified cytidine nucleotides.
59. The method of any one of claims 51–58, further comprising replacing one or more unmodified adenosine nucleotides in the reference mRNA sequence with modified adenosine nucleotides.
60. The method of any one of claims 51–59, further comprising replacing one or more adenosine nucleotides in the reference mRNA sequence with uracil nucleotides.
61. The method of any one of claims 51–60, further comprising replacing one or more adenosine nucleotides in the reference mRNA sequence, that are not immediately followed by a second adenosine nucleotide, with cytidine nucleotides.
62. The method of any one of claims 51–61, further comprising replacing one or more adenosine nucleotides in the reference mRNA sequence with guanosine nucleotides.
63. The method of any one of claims 51–62, wherein the ORF of the modified mRNA sequence comprises a number of CpA dinucleotides that is greater than or equal to the theoretical minimum and less than or equal to 300% of the theoretical minimum.
64. The method of any one of claims 51–63, wherein the ORF of the modified mRNA sequence comprises a number of CpA dinucleotides that is: (i) greater than or equal to a theoretical minimum; and (ii) no more than 11 CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum.
65. The method of claim 64, wherein the number of CpA dinucleotides per 100 nucleotides of the ORF greater than the theoretical minimum is no more than 10, no more than 9, no more than 8, no more than 7, no more than 6, no more than 5, no more than 4, no more than 3, no more than 2, or no more than 1.
66. The method of any one of claims 51–65, wherein the ORF of the modified mRNA sequence comprises a CpA dinucleotide content of 6.5% or less.
67. The method of claim 66, wherein the ORF of the modified mRNA sequence comprises a CpA dinucleotide content of 6.0% or less, 5.5% or less, 5% or less, 4.5% or less, 4% or less, 3.5% or less, 3.0% or less, 2.5% or less, 2.0% or less, 1.5% or less, 1.0% or less, or 0.5% or less.
68. The method of any one of claims 51–67, wherein, in the modified mRNA sequence: (a) fewer than 30% of amino acids that immediately precede an isoleucine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (b) fewer than 30% of amino acids that immediately precede a methionine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (c) fewer than 30% of amino acids that immediately precede a threonine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (d) fewer than 30% of amino acids that immediately precede an asparagine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (e) fewer than 30% of amino acids that immediately precede a lysine residue in the polypeptide are encoded by codons in the ORF that end in cytidine nucleotides; (f) fewer than 30% of amino acids that immediately precede a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides; and/or (g) fewer than 30% of amino acids that immediately precede an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, are encoded by codons in the ORF that end in cytidine nucleotides.
69. The method of any one of claims 51–68, wherein, in the modified mRNA sequence, fewer than 15% of serine residues, fewer than 27% of proline residues, fewer than 28% of threonine residues, and fewer than 23% of alanine residues in the polypeptide are encoded by codons in the ORF that comprise a CpA dinucleotide.
70. The method of any one of claims 51–69, wherein, in the modified mRNA sequence: (a) no serine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (b) no proline residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; (c) no threonine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide; and/or (d) no alanine residue of the polypeptide is encoded by a codon in the ORF comprising a CpA dinucleotide.
71. The method of any one of claims 51–70, wherein, in the modified mRNA sequence: (a) no amino acid that immediately precedes an isoleucine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (b) no amino acid that immediately precedes a methionine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (c) no amino acid that immediately precedes a threonine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (d) no amino acid that immediately precedes an asparagine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (e) no amino acid that immediately precedes a lysine residue in the polypeptide is encoded by a codon in the ORF that ends in a cytidine nucleotide; (f) no amino acid that immediately precedes a serine residue, wherein the serine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, is encoded by a codon in the ORF that ends in a cytidine nucleotide; and/or (g) no amino acid that immediately precedes an arginine residue, wherein the arginine residue is encoded by a codon in the ORF beginning with an adenosine nucleotide, is encoded by a codon in the ORF that ends in a cytidine nucleotide.
72. The method of any one of claims 51–71, wherein, in the modified mRNA sequence, no amino acid that immediately precedes an isoleucine, methionine, threonine, asparagine, or lysine residue in the polypeptide is encoded by a codon that ends in a cytidine nucleotide.
73. The method of any one of claims 51–72, wherein, in the modified mRNA sequence, no codon in the ORF beginning with an adenosine nucleotide is immediately preceded by a codon in the ORF that ends in a cytidine nucleotide.
74. The method of any one of claims 51–73, wherein the modified mRNA sequence comprises a %G/C content of 30% – 80%, 40% – 70%, 50% – 60%, 35% – 50%, 50% – 65%, 65% – 70%, 40% – 45%, 45% – 50%, 50% – 55%, 55% – 70%, 70% – 75%, or 75% – 80%.
75. The method of any one of claims 51–74, wherein one or more nucleotides of the modified mRNA sequence comprises a chemically modified nucleotide.
76. The method of any one of claims 51–74, wherein each of the uridine nucleotides of the modified mRNA sequence comprises a chemically modified nucleotide.
77. The method of claim 75 or 76, wherein the chemically modified nucleotide comprises N1-methylpseudouridine.
78. The method of any one of claims 75–77, wherein one or more CpA dinucleotides of the modified mRNA sequence comprises a modified cytidine nucleotide and/or a modified adenosine nucleotide.
79. The method of any one of claims 51–78, wherein the number of CpA dinucleotides comprising an unmodified cytidine nucleotide and an unmodified adenosine nucleotide in the ORF of the modified mRNA sequence is 100%, 95% or less, 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, or 10% or less of the total number of histidine and glutamine residues in the polypeptide.
80. The method of any one of claims 51–79, wherein the polypeptide comprises 9–5,000, 20–4,000, 30–3,000, 40–2,000, or 50–1,500 amino acids.
81. The method of any one of claims 51–80, wherein the polypeptide is a vaccine antigen or a therapeutic protein.
82. The method of any one of claims 51–81, wherein the ORF of the modified mRNA sequence is codon-optimized for expression in a cell.
83. The method of claim 82, wherein the cell is a mammalian cell.
84. The method of claim 82 or 83, wherein the cell is a human cell.
85. The method of any one of claims 51–84, further comprising transcribing the modified mRNA sequence to produce a modified mRNA.
86. The method of claim 85, wherein a level of expression in a mammalian cell of the encoded polypeptide from the modified mRNA is at least 80% of a level of expression of the reference mRNA.
87. The method of claim 85 or 86, wherein a coefficient of degradation at 25 °C of the modified mRNA is 90% or less, 80% or less, 70% or less, 60% or less, or 50% or less, relative to an mRNA comprising the reference ORF.
88. The method of any one of claims 85–87, wherein a composition comprising a plurality of the mRNAs remains above 50% purity for at least 30 days, at least 60 days, at least 90 days, at least 120 days, at least 150 days, or at least 180 days longer in storage than a composition comprising a plurality of mRNAs comprising the reference ORF.
89. The method of claim 88, wherein storage of the modified mRNA is conducted at a temperature between about 2 °C and about 8 °C.
90. The method of any one of claims 85–89, wherein the modified mRNA has increased stability relative to a reference mRNA comprising the reference mRNA sequence.
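The codon-replacement steps of claim 51 can be illustrated with a short Python sketch. It is not the applicant's algorithm: it assumes the standard genetic code and folds steps (a) and (b) into one greedy left-to-right pass that, for each codon, picks the synonymous codon contributing the fewest CpA dinucleotides given its current neighbours. The function names (`split_codons`, `translate`, `count_cpa`, `reduce_cpa`) and the example sequence are hypothetical. Note that every codon beginning with an adenosine encodes isoleucine, methionine, threonine, asparagine, lysine, serine, or arginine, so the junction condition in step (b) reduces to "a codon ending in C followed by a codon beginning with A".

```python
# Illustrative sketch only; all names are hypothetical, not from the application.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in the RNA alphabet ("*" marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(orf):
    """Split an ORF (RNA letters, length divisible by 3) into codons."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf):
    """Translate an ORF into a one-letter amino acid string."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(orf))


def count_cpa(seq):
    """Count CpA dinucleotides ("CA" cannot overlap itself, so str.count is exact)."""
    return seq.count("CA")


def reduce_cpa(orf):
    """Greedy pass: swap each codon for the synonym with the fewest local CpAs.

    The local cost covers CpAs inside the codon (step (a) of claim 51) and at
    both codon junctions (step (b)). A swap is made only when it strictly
    lowers that cost, so the total CpA count never increases.
    """
    codons = split_codons(orf)
    for i, codon in enumerate(codons):
        prev_last = codons[i - 1][-1] if i > 0 else ""
        next_first = codons[i + 1][0] if i + 1 < len(codons) else ""

        def local_cpa(candidate):
            return count_cpa(prev_last + candidate + next_first)

        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        best = min(synonyms, key=local_cpa)
        if local_cpa(best) < local_cpa(codon):
            codons[i] = best
    return "".join(codons)


reference = "AUGUCACACAUCACCAAAUAA"  # hypothetical ORF encoding M-S-H-I-T-K-stop
modified = reduce_cpa(reference)
print(count_cpa(reference), count_cpa(modified))
print(translate(reference) == translate(modified))
```

For this hypothetical input the CpA count drops from 5 to 1 while the encoded polypeptide is unchanged. The remaining CpA sits in the histidine codon: both histidine codons (CAU, CAC) and both glutamine codons (CAA, CAG) contain CpA, which is consistent with claims 34 and 79 expressing the CpA count relative to the number of histidine and glutamine residues. A production tool would also need to weigh codon usage, G/C content, and expression, which this sketch ignores.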
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263422103P | 2022-11-03 | 2022-11-03 | |
| PCT/US2023/078516 WO2024097874A1 (en) | 2022-11-03 | 2023-11-02 | Chemical stability of mrna |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP4612301A1 (en) | 2025-09-10 |
Family
ID=89076347
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23817920.4A Pending EP4612301A1 (en) | 2022-11-03 | 2023-11-02 | Chemical stability of mrna |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4612301A1 (en) |
| WO (1) | WO2024097874A1 (en) |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017020026A1 (en) | 2015-07-30 | 2017-02-02 | Modernatx, Inc. | Concatemeric peptide epitopes rnas |
| TW201729835A (en) | 2015-10-22 | 2017-09-01 | 現代公司 | Respiratory virus vaccine |
| CN116837052A (en) | 2016-09-14 | 2023-10-03 | 摩登纳特斯有限公司 | High-purity RNA composition and preparation method thereof |
| WO2018089851A2 (en) | 2016-11-11 | 2018-05-17 | Modernatx, Inc. | Influenza vaccine |
| WO2019148101A1 (en) | 2018-01-29 | 2019-08-01 | Modernatx, Inc. | Rsv rna vaccines |
| MA53650A (en) | 2018-09-19 | 2021-07-28 | Modernatx Inc | PEG LIPIDS AND THEIR USES |
| AU2019345067A1 (en) | 2018-09-19 | 2021-04-08 | Modernatx, Inc. | High-purity peg lipids and uses thereof |
| CA3132975A1 (en) | 2019-03-11 | 2020-09-17 | Modernatx, Inc. | Fed-batch in vitro transcription process |
| US12329811B2 (en) | 2021-01-11 | 2025-06-17 | Modernatx, Inc. | Seasonal RNA influenza virus vaccines |
| US20220363937A1 (en) | 2021-05-14 | 2022-11-17 | Armstrong World Industries, Inc. | Stabilization of antimicrobial coatings |
| WO2025054383A1 (en) * | 2023-09-06 | 2025-03-13 | Modernatx, Inc. | Chemical stability of mrna |
Family Cites Families (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| ATE490267T1 (en) | 2001-06-05 | 2010-12-15 | Curevac Gmbh | STABILIZED MRNA WITH INCREASED G/C CONTENT CODING A VIRAL ANTIGEN |
| DE102005046490A1 (en) | 2005-09-28 | 2007-03-29 | Johannes-Gutenberg-Universität Mainz | New nucleic acid molecule comprising promoter, a transcriptable nucleic acid sequence, a first and second nucleic acid sequence for producing modified RNA with transcriptional stability and translational efficiency |
| DK2205618T3 (en) | 2007-09-26 | 2017-02-20 | Intrexon Corp | SYNTHETIC 5 'NON-TRANSLATED REGIONS, EXPRESSION VECTORS AND PROCEDURES FOR INCREASING TRANSGENIC EXPRESSION |
| WO2009075886A1 (en) | 2007-12-11 | 2009-06-18 | The Scripps Research Institute | Compositions and methods related to mrna translational enhancer elements |
| EP2342336B1 (en) | 2008-09-05 | 2016-12-14 | President and Fellows of Harvard College | Continuous directed evolution of proteins and nucleic acids |
| MX342785B (en) | 2009-06-10 | 2016-10-12 | Alnylam Pharmaceuticals Inc | IMPROVED LIPID FORMULATION. |
| CA2824526C (en) | 2011-01-11 | 2020-07-07 | Alnylam Pharmaceuticals, Inc. | Pegylated lipids and their use for drug delivery |
| CN104114572A (en) * | 2011-12-16 | 2014-10-22 | 现代治疗公司 | Modified nucleosides, nucleotides and nucleic acid compositions |
| US20160022840A1 (en) | 2013-03-09 | 2016-01-28 | Moderna Therapeutics, Inc. | Heterologous untranslated regions for mrna |
| US10821175B2 (en) | 2014-02-25 | 2020-11-03 | Merck Sharp & Dohme Corp. | Lipid nanoparticle vaccine adjuvants and antigen delivery systems |
| US11866754B2 (en) | 2015-10-16 | 2024-01-09 | Modernatx, Inc. | Trinucleotide mRNA cap analogs |
| EP3374504B1 (en) * | 2015-11-09 | 2025-03-19 | CureVac SE | Optimized nucleic acid molecules |
| EP3576751A4 (en) * | 2017-02-01 | 2021-08-04 | ModernaTX, Inc. | RNA CANCER VACCINES |
| CA3194325A1 (en) * | 2020-11-06 | 2022-05-12 | Danilo Casimiro | Lipid nanoparticles for delivering mrna vaccines |
- 2023
- 2023-11-02 EP EP23817920.4A patent/EP4612301A1/en active Pending
- 2023-11-02 WO PCT/US2023/078516 patent/WO2024097874A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024097874A9 (en) | 2025-06-12 |
| WO2024097874A1 (en) | 2024-05-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP4612301A1 (en) | | Chemical stability of mrna |
| WO2022212711A2 (en) | | Methods for identification and ratio determination of rna species in multivalent rna compositions |
| US20250251391A1 (en) | | Ribosomal engagement potency assay |
| CN111315359A (en) | | Methods of preparing lipid nanoparticles |
| CN113271926A (en) | | Preparation of lipid nanoparticles and methods of administration thereof |
| WO2021155171A1 (en) | | Delivery of compositions comprising circular polyribonucleotides |
| JP2023531511A (en) | | LNP compositions comprising mRNA therapeutics with extended half-lives |
| CN115398546A (en) | | Improved method for in vitro transcription of messenger RNA |
| WO2024206835A1 (en) | | Circular mrna and production thereof |
| JP2026503718A (en) | | Epstein-Barr virus mRNA vaccine |
| CN120858178A (en) | | DNA compositions containing modified cytosine |
| CN120082547B (en) | | Capping analogs for self-replicating mRNA and uses thereof |
| CA3158013A1 (en) | | Mrnas encoding granulocyte-macrophage colony stimulating factor for treating parkinson's disease |
| WO2025054383A1 (en) | | Chemical stability of mrna |
| US20250313830A1 (en) | | Messenger ribonucleic acids with extended half-life |
| HK40126401B (en) | | Capped analog for self-replicating mrna and use thereof |
| WO2025010420A2 (en) | | Compositions and methods for delivering molecules |
| HK40126401A (en) | | Capped analog for self-replicating mrna and use thereof |
| KR20260052931A (en) | | Capped mRNA and method for preparing the same |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20250522 |
| | AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |