WO2022161294A1 - Method for constructing a medium-throughput single-cell copy number library and its application
- Publication number
- WO2022161294A1 (PCT/CN2022/073321)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequencing
- cell
- library
- sequence
- primer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/16—Primer sets for multiplex assays
Definitions
- the invention relates to the field of single-cell sequencing, in particular to a method for constructing a medium-throughput single-cell copy number library and its application.
- next-generation sequencing includes genome sequencing, transcriptome sequencing, epigenetic sequencing, etc.
- sequencing library preparation: special sequencing adapters need to be added to both ends of the target sequence; this is the so-called sequencing library preparation.
- single-cell sequencing technology has developed rapidly and achieved important results in the fields of reproduction, development, aging, and cancer research.
- expensive experimental costs and high-quality library preparation are the key obstacles standing in front of researchers. Therefore, high-throughput, low-cost, high-quality single-cell library preparation technology and corresponding sequencing strategies have broad prospects.
- Single-cell genome sequencing technology and population-based cell genome sequencing technology are basically the same in library preparation, and both require steps such as fragmentation, adding adapters, and polymerase chain reaction (PCR). But the difference is that single-cell sequencing generally requires the use of special single-cell genome amplification methods for pre-amplification, such as MDA , MALBAC or DOP-PCR based amplification methods. But in any case, the cost of single-cell genome sequencing has increased. Therefore, due to various limitations, single-cell genome sequencing technology is often time-consuming, labor-intensive, and expensive in library preparation; from the acquisition of a single cell to the completion of the actual sequencing library preparation, the steps involved are cumbersome and require a lot of reagents and consumables. The cost of constructing a cell genome sequencing library is far greater than that of transcriptome sequencing.
- Single-cell genome sequencing mainly includes copy number variation (CNV) sequencing and single nucleotide variation (SNV) sequencing (SNV is not covered by this patent).
- CNV copy number variation
- SNV single nucleotide variation
- Low-throughput (usually single-cell independent and whole-process library construction) single-cell genome sequencing is expensive, time-consuming and labor-intensive.
- high-throughput single-cell genome sequencing has greatly improved the throughput efficiency.
- the number of cells in these clinical samples is not large. Taking preimplantation genetic testing (PGT) as an example, only 8-13 trophoblast cells, or even 3-5 cells, are required.
- PGT preimplantation genetic testing
- CTCs circulating tumor cells
- the cost lies in both library construction and sequencing.
- the cost of scCNV sequencing lies mainly in library construction, while scSNV sequencing is more expensive in both library construction and sequencing (this patent does not involve scSNV innovation).
- the object of the present invention is to overcome the shortcomings of the above-mentioned prior art and provide MT-scCNV-seq, a low-cost, high-efficiency medium-throughput single-cell copy number sequencing method based on Tn5 transposase specific primers (CNV: copy number variation, i.e. variation in the copy number of chromosomal or subchromosomal regions or DNA segments; sc: single cell; MT: medium throughput).
- MT/medium throughput is only compared to high throughput (HT) and low throughput for single-cell sequencing.
- Single-cell HT now refers to the parallel operation of more than thousands of cells in one operation program, but sometimes hundreds of cells or even dozens of cells are sometimes considered HT, while low-throughput means that a single cell independently builds a library throughout the entire process.
- Our technology can perform CNV-seq of several to hundreds of accurately labeled single cells in parallel in one program, and the combination of multiple programs can process thousands to tens of thousands of single cells, so it can also belong to HT technology.
- MT-scCNV-seq it is now called MT-scCNV-seq.
- scCNV-seq is a powerful tool in the fields of tumor heterogeneity and evolution, tumor biomarker identification, reproductive health, drug screening, and disease pathology research.
- its application is currently held back by a clinical bottleneck, especially the low-throughput operation of "third-generation IVF" preimplantation genetic testing (PGT).
- PGT third-generation IVF preimplantation genetic testing
- the current scCNV-seq technology is not only low-throughput, but more seriously, it is generally based on independent single-cell whole-genome amplification technology, plus independent library construction and sequencing methods of amplified DNA, which are inefficient in cost and time.
- Our MT-scCNV-seq is based on an innovatively designed nucleic acid sequence combined with Tn5 transposase, which enables it to randomly capture nucleic acid fragments and insert a cell-specific barcode sequence when building a library by next-generation sequencing. Then, a large number of single cells are mixed, and a one-step mixed amplification is performed under the micro-reaction system in the subsequent steps to build a library, and the batch index sequence is used to achieve fast, efficient and medium-throughput single-cell copy number sequencing.
- a method for constructing a medium-throughput single-cell copy number library, comprising: separately lysing the single cells sorted into a multi-well plate, and performing Tn5 transposase-based DNA fragmentation and library construction to obtain a single-cell genome sequencing library that can be used directly for subsequent sequencing; the steps include:
- Sorting and capturing single cells capturing single cells into multi-well plates including but not limited to 96-well or 384-well plates, or multiple test tubes but not limited to 8 or 12 tubes;
- Reaction treatment by inactivating the enzyme and purifying DNA or diluting the sample, the inhibitory reaction of the aforementioned reaction to the downstream is relieved;
- Tn5 transposase Using Tn5 transposase to build a library: Fragmenting genomic DNA based on Tn5 transposase and adding a single-cell barcode recognition sequence formed by a combination of N single nucleotides to the DNA fragment;
- step 1) sorting single cells flow cytometry or other alternative or cell type-specific enrichment and sorting equipment may be used, including but not limited to cellenone or namocell single cell sorter.
- the step 2) lysing cells is performed with Zymo lysis buffer (cat#D3004-1-50).
- step 2) lysis of cells is performed with Qiagen Protease (cat#19155/19157), and the enzyme is inactivated by heating instead of purification after lysis is complete.
- Qiagen Protease cat#19155/19157
- the step 3) purifying DNA is performed with AMPure XP (cat#A63881) magnetic beads, or other magnetic beads that can purify DNA.
- AMPure XP catalog#A63881
- magnetic beads or other magnetic beads that can purify DNA.
- the Tn5 transposase library construction in step 4) includes the following steps: adding Tn5 transposase to the single-cell DNA solution for reaction, and then adding an enzyme inhibitor to completely stop the fragmentation reaction and enzymatic activity of Tn5.
- the Tn5 transposase contains a binding primer
- the binding primer is composed of three parts A, B, and C
- the A primer contains a cell recognition sequence formed by a combination of N single nucleotides, a P5 end adapter sequence and the reverse ME sequence
- the B primer contains the P7 end linker sequence and the reverse ME sequence
- the C primer is an oligonucleotide fragment with phosphorylation at the 5 end, and can be partially complementary to the A primer and the B primer respectively.
- the nucleotide sequence of the A primer is shown in SEQ ID NO: 1 to 48
- the nucleotide sequence of the B primer is shown in SEQ ID NO: 49
- the nucleotide sequence of the C primer is shown in SEQ ID NO: 50.
- the step 6) is constructed into a specially designed sequencing library, in which an anchor sequence and a cell barcode sequence are added to the 5' end of each nucleic acid fragment;
- the downstream primer is added with an amplification adapter sequence compatible with the sequencing system;
- the DNA fragments obtained by amplification include the P5 end adapter sequence, index sequence 1, sequencing primer binding site 1, and cell barcode recognition sequence in order from the 5' end to the 3' end.
- the barcode sequence is a nucleotide sequence of 3 random bases plus a length of 8bp bases;
- the anchor sequence is AGATGTGTATAAGAGACAG;
- the sequencing primer binding site 1 is:
- the specific structure of the nucleotide fragments in the sequencing library is as follows:
- the anchor sequence is a nucleic acid sequence used to stably find the insertion position of the recognition sequence in the later sequencing data, and the index sequence 1 and the index sequence 2 are both index sequences used to mark the experimental batch.
- the step 7) library purification and library length selection adopts but is not limited to DNA fragment length selective magnetic beads, and gel electrophoresis to classify fragments and selectively recover them.
- the specific step of performing next-generation sequencing in step 8) is as follows: multiple libraries with different index sequences are mixed and then sequenced on a high-throughput sequencing platform, either on the same lane or as individually submitted samples sized to the amount of data required.
- DNA purification and sequencing can be performed after fragment screening, or DNA purification can be directly performed without fragment screening and then sequencing.
- the single cell of each sample can be replaced by a plurality of cells, which can be 1-50, 50-100, 100-200, 200-500, 500-1000, 1000-10000 cells, purified 1ng to 1ug of genomic DNA.
- the invention also provides the application of the method in preparing detection kits, experimental devices or detection systems related to basic research on cancer, reproductive health and general health, as well as clinical diagnosis, treatment, and pharmacy.
- the method can reach a medium-throughput level or even a high-throughput level depending on the requirements of the experiment. This is mainly reflected in the fact that, after the sample is prepared as a single-cell suspension, single cells are captured and separated with a pipette fitted with a 10 μl filter tip, or by sorting-grade flow cytometry when higher throughput is required.
- the single-cell sorting system such as Namocell, which has been put into production on the market, is used for sorting.
- FIG. 1 is a technical flow chart of the present invention.
- Figure 2 is a schematic diagram of the assembly of primers bound to Tn5 transposase.
- Figure 3 is a schematic diagram of single cell capture.
- Figure 4 is a schematic diagram of the structure of the sequencing library after PCR amplification and purification.
- Figure 5 is a schematic diagram of E-Gel analysis of sequencing libraries for medium-throughput single-cell copy number variation of K562 cells, followed by gel cutting (300-500 bp) recovery.
- Figure 7 is a schematic diagram of E-Gel analysis of the sequencing library (48 single-cell pooled library) of medium-throughput single-cell copy number variation for GM12878 cell line, followed by gel cutting (300-500bp) recovery.
- Figure 8 is a schematic diagram of the library fragment analysis on the 2100 nucleic acid analyzer after single-cell CNV library construction for the K562 cell line. The peak lies between 300 and 800, which meets the standard for loading onto the sequencer.
- Figure 9 is a schematic diagram of the library fragment analysis on the 2100 nucleic acid analyzer after single-cell CNV sequencing library construction for the normal control and the Jurkat cell line. The peak lies between 300 and 800, which meets the standard for loading onto the sequencer. The normal control is normal human peripheral blood mononuclear cells; 48 single cells were used for the normal-control library and 48 for the Jurkat library.
- Figure 10 is a schematic diagram of the library fragment analysis on the 2100 nucleic acid analyzer after single-cell CNV library construction for the GM12878 cell line. The peak lies between 300 and 800, which meets the standard for loading onto the sequencer; 48 single cells were pooled for the library.
- Figure 11 is a schematic diagram of the quality of sequencing library data for medium-throughput single-cell copy number variation in K562 cells.
- Figure 12 is a graph showing the quality of pooled sequencing library data for mid-throughput single-cell copy number variation for Jurkat cell line and normal human peripheral blood mononuclear cells.
- Figure 13 is a graph showing the quality of sequencing library data for mid-throughput single-cell copy number variation for the GM12878 cell line.
- the binding primer must contain the ME sequence in order to combine with the transposase and complete the one-step process of breaking, building a library and adding a linker, Furthermore, a complementary double-stranded structure is required. Therefore, the synthesized primers need to be pre-annealed, that is, two primers designed to have a complementary sequence are re-integrated into a double strand according to the principle of annealing.
- the Tn5 transposase binding primer of the present invention is composed of three parts: A primer, B primer and C primer. The A primer consists of a barcode recognition sequence of 3 random bases + 8 bp of bases, a P5 end adapter sequence and the reverse ME sequence.
- the B primer consists of the P7 end adapter sequence and the reverse ME sequence; the C primer is an oligonucleotide fragment phosphorylated at the 5' end, and the A primer and the B primer are each partially complementary to the C primer.
- the primer nucleotide sequence is shown in any one of SEQ ID NOs: 1 to 48, the B primer nucleotide sequence is shown in SEQ ID NO: 49, and the C primer nucleotide sequence is shown in SEQ ID NO: 50.
- the P5 end adapter matches the P5-end PCR amplification sequence of the Illumina sequencing platform, so that the official tag sequence (index1) and sequencing adapter 1 can conveniently be added by PCR after pooling;
- the P7 end adapter matches the P7-end PCR amplification sequence of the Illumina sequencing platform, so that the official tag sequence (index2) and sequencing adapter 2 can likewise be added by PCR after pooling. This gives N×M combinations, which allows medium-throughput single-cell sequencing and saves cost (there is no need to buy an entire flowcell or lane; samples can be mixed for sequencing).
- a and C can be partially complementary
- B and C can be partially complementary, therefore, primers A and C, and primers B and C need to be annealed to form double strands respectively before the library construction reaction, that is, to obtain P5 and P7 linkers.
- Pre-annealed nucleic acid products can be stored in a -20°C refrigerator for subsequent single-cell copy number sequencing library construction experiments.
- Tn5 transposase recognizes the double-stranded portion of the above-mentioned P5 and P7 adapters, and the two different double-stranded nucleic acid products are assembled with Tn5 transposase to form a Tn5 transposase complex that can be used for next-generation sequencing library construction, as shown in Figure 2.
- the above-mentioned linker P7 is the above-mentioned transposase and sequence conjugate
- the above-mentioned linker P5 is the above-mentioned transposase and sequence conjugate.
- the above reaction product is the reaction enzyme that has assembled the adapter, which can be used for the following single-cell copy number variation sequencing library construction, or stored at -20°C.
- the state of the cells has a great influence on the method of the present invention. If there are too many debris in the cell culture medium, the cell sorting under the microscope will be affected. If the cells are nutrient deficient, the chromosomal three-dimensional structure or chromatin structure of the overall cell may have a certain impact, or cause cell death to produce debris.
- the specific steps of cell culture in the present embodiment are as follows:
- the cell samples selected in this example include: K562 cells, Jurkat cells, and GM12878 cells, among which K562 is taken as an example.
- K562 cells were centrifuged at 800 rpm using a low-speed centrifuge for 5 min
- the concentration of cultured cells is about 1×10^5, and the cells are transferred to a 15 ml centrifuge tube.
- thermo sterile enzyme-free water: a solution of 1 μl Thermo sterile enzyme-free water, then lyse for 10 min (at 7.5 min, flick the bottom of the tube 3 times with a finger, and centrifuge briefly after mixing).
- Tn5 transposase was added in sequence according to the number of single cells needed to build the library, and the reaction was performed at 55° C. for 20 minutes to perform nucleic acid fragmentation and addition of amplification linker sequences (ie, the above-mentioned library building and sequencing linkers, AC, BC).
- the number of cycles is determined according to the number of single cells mixed into the library, generally 27-28 cycles for a single cell, and 22-23 cycles for a mixture of 48 cells.
- the P7 primer and P5 primer come from commercial kits, which can be purchased from Vazyme or Illumina.
- an anchor sequence (ME sequence), which is used to locate the barcode sequence and mimic the ME sequence AGATGTGTATAAGAGACAG, so that it can be assembled normally with the Tn5 enzyme.
- the DNA insert in Figure 4 in the gray part represents the fragment that needs to be sequenced.
- Rd2SP is the other end sequencing primer binding sequence for paired-end sequencing.
- Index sequence 2 (index2) is the tag sequence at the P7 end of the anchor sequence.
- the purpose of designing this sequence is to reduce cost, high efficiency and match the existing platform, so paired-end sequencing and double-end index are used.
- the amount of sequencing data can be selected according to the needs, and there is no need to package the entire sequencing lane or the entire sequencing pool (flowcell), which reduces the cost of sequencing to a certain extent.
- Instrument standardization: take two tubes, add 199 μl of working buffer and then 1 μl of fluorescent dye to each, centrifuge briefly and vortex to mix. Discard 10 μl of liquid with a pipette tip, add 10 μl of the standard reagent, centrifuge briefly and vortex to mix well. After incubating at room temperature for 2 minutes, place the tube in the instrument and press the on-screen button to run the automatic standardization.
- Table 6 K562 cell line library preparation: Qubit nucleic acid analyzer concentration analysis table
- Table 8 GM12878 cell line library preparation: Qubit nucleic acid analyzer concentration analysis table
- Table 9 Data quality of single-cell copy number variation sequencing libraries of K562 cell line (take one group as an example)
- the data quality obtained by this method for the K562 cell line library generally meets the expected standard. Because this run used a whole purchased lane, 7 indexes were added to the same batch of cells when building the library, both to avoid wasting data and to test whether commercial standard dual indexes are compatible with this method. The figure and table show that the clean reads rate is 98.62% of the total data volume and that the Q30 rate of both raw data and clean data exceeds 93%. The quality of the library constructed by this method therefore meets the requirements of downstream bioinformatic analysis, while producing little redundant data and saving cost.
- Table 10 Data quality of single-cell copy number variation sequencing libraries of Jurkat cell lines and normal human peripheral blood mononuclear cells. (Take one of the groups as an example)
- Table 11 Data quality of single-cell copy number variation sequencing libraries for the GM12878 cell line.
- Primer A of the present embodiment is shown in following table 12:
- the lowercase part is the independently designed Barcode sequence
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Wood Science & Technology (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Biochemistry (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Analytical Chemistry (AREA)
- Immunology (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- General Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
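A method for constructing and sequencing a medium-throughput single-cell copy number library. The method comprises: sorting single cells; independently lysing each single cell; inserting a Tn5 transposase system into the genomic DNA while precisely labeling the whole genome with a single-cell-specific barcode; then pooling multiple samples, constructing the multi-sample sequencing library in a single tube, and sequencing; after sequencing, the data are precisely decoded by single-cell sample barcode to identify and analyze the data of each corresponding cell. The core of the method is that a Tn5 transposase carrying a sample-specific barcode oligonucleotide fragments the DNA of each single-cell sample while adding the recognition sequence, after which multiple samples are pooled directly at an early stage. Pre-amplification is therefore unnecessary, samples are handled together in a single tube from an early stage, and the library is compatible with general sequencing systems, enabling efficient genome sequencing library construction and sequencing.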
Description
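The invention relates to the field of single-cell sequencing, and in particular to a method for constructing a medium-throughput single-cell copy number library and its application.
With the vigorous development of basic human medicine, next-generation sequencing (NGS) platforms have become increasingly mature. NGS includes genome sequencing, transcriptome sequencing, epigenome sequencing and so on. Its main prerequisite is that special sequencing adapters be added to both ends of the target sequence, which is the so-called sequencing library preparation. In recent years single-cell sequencing technology has developed rapidly and achieved important results in reproduction, development, aging and cancer research, but expensive experiments and the need for high-quality library preparation are key obstacles for researchers. High-throughput, low-cost, high-quality single-cell library preparation technology and corresponding sequencing strategies therefore have broad prospects.
Unfortunately, even the most mature high-throughput single-cell transcriptome sequencing technology to date is very expensive: the 10x Genomics Chromium platform, based on drop-seq technology, costs up to tens of thousands of yuan per sample (about 3000-6000 single cells), and it has many other limitations besides.
Traditional single-cell genome sequencing and bulk (population) cell genome sequencing are essentially the same in library preparation: both require fragmentation, adapter addition, polymerase chain reaction (PCR) and similar steps. The difference is that single-cell sequencing, in order to reach enough starting material for the genomic DNA to be fragmented by sonication or enzymatic digestion, generally requires pre-amplification with a dedicated single-cell genome amplification method such as MDA, MALBAC or DOP-PCR-based amplification. In every case this raises the cost of single-cell genome sequencing. Owing to these limitations, library preparation for single-cell genome sequencing is often time-consuming, laborious and costly; from obtaining a single cell to completing the sequencing library, the steps are cumbersome and consume large amounts of reagents and consumables, and the cost of constructing each single-cell genome sequencing library far exceeds that of transcriptome sequencing.
Single-cell genome sequencing mainly comprises copy number variation (CNV) sequencing and single nucleotide variation (SNV) sequencing (SNV is not covered by this patent). Low-throughput single-cell genome sequencing (usually each cell is taken through the whole library construction independently) is expensive, time-consuming and laborious. The high-throughput single-cell genome sequencing methods that have appeared in recent years greatly improve throughput and have great potential value in some research fields such as tumor research, but quite apart from their still prohibitive cost, they face many practical limitations in certain important clinical testing applications. 1. These clinical samples do not contain many cells. In preimplantation genetic testing (PGT), for example, only 8-13 trophoblast cells, or even 3-5 cells, are needed. For circulating tumor cells (CTCs), a routine 2 ml blood sample from a patient generally contains only 3-20 cells, and sometimes no CTCs can be purified at all; the throughput needed is generally tens to hundreds of samples. 2. A designated single cell cannot be precisely labeled in advance. In current high-throughput techniques, barcode-based single-cell library construction cannot label an individual cell precisely and at a defined position during library construction; the barcodes serve only to assign data to different single cells during later bioinformatic analysis, and cannot identify which pre-designated single cell a given piece of data belongs to. 3. High price. The cost lies in both library construction and sequencing: for scCNV sequencing it lies mainly in library construction, while scSNV sequencing is more expensive in both (this patent does not involve scSNV innovation).
At present there is no ideal technique for constructing medium-(high-)throughput single-cell copy number libraries at the single-cell level that can precisely label each designated cell while being fast, economical, efficient and suitable for clinical use, i.e. a medium-(high-)throughput scCNV (MT-scCNV) technique.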
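Summary of the invention
The object of the present invention is to overcome the above shortcomings of the prior art and to provide MT-scCNV-seq, a low-cost, high-efficiency medium-throughput single-cell copy number sequencing method based on Tn5 transposase specific primers (CNV: copy number variation, i.e. variation in the copy number of chromosomal or subchromosomal regions or DNA segments; sc: single cell; MT: medium throughput).
MT (medium throughput) is defined only relative to high throughput (HT) and low throughput in single-cell sequencing. Single-cell HT currently refers to the parallel handling of thousands of cells or more in one procedure, although hundreds or even dozens of cells are sometimes also counted as HT, while low throughput means that each single cell is taken through library construction independently. Our technology can perform CNV-seq on several to hundreds of precisely labeled single cells in parallel in one procedure, and combining multiple procedures allows thousands to tens of thousands of single cells to be processed, so it can also be regarded as an HT technology; to highlight its characteristics, however, it is here called MT-scCNV-seq.
As one of the newest single-cell sequencing technologies, scCNV-seq is a powerful tool in tumor heterogeneity and evolution, tumor biomarker identification, reproductive health, drug screening and research on disease mechanisms. However, its low-throughput operation is a technical bottleneck in the clinic, especially in "third-generation IVF" preimplantation genetic testing (PGT), and this hinders its application. Current scCNV-seq is not only low-throughput; more seriously, it generally relies on independent single-cell whole-genome amplification followed by independent library construction and sequencing of the amplified DNA, which is inefficient in both cost and time. Although several high-throughput scCNV-seq techniques have been reported in high-level international journals in recent years, they require very large numbers of samples, label single cells randomly (they cannot label a designated cell precisely), and need genome pre-amplification, microfluidic chips and special sequencing schemes. Their time and efficiency therefore do not meet the requirements of clinical sample testing, and these methods have not been adopted by later researchers, let alone used clinically.
Our MT-scCNV-seq is based on an innovatively designed nucleic acid sequence that binds Tn5 transposase, so that when a next-generation sequencing library is built, a cell-specific barcode sequence is inserted at the same time as nucleic acid fragments are randomly captured. A large number of single cells are then mixed, a one-step pooled amplification is carried out in a micro-volume reaction in the subsequent steps to build the library, and batch index sequences are used to achieve fast, efficient, medium-throughput single-cell copy number sequencing. The core design points are: replacing the independent amplification plus independent library construction of each single cell in current techniques with one-step pooled library construction of multiple single cells; replacing the random single-cell labeling of the most recently published international techniques with precise single-cell labeling; and replacing their incompatibility with current sequencing platforms with a design that integrates readily with next-generation sequencing platforms. This greatly improves efficiency and quality and meets the requirements of clinical and research laboratories.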
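The technical solution adopted by the present invention is:
A method for constructing a medium-throughput single-cell copy number library, the method comprising: separately lysing the sorted single cells in a multi-well plate and performing Tn5 transposase-based DNA fragmentation and library construction, to obtain a single-cell genome sequencing library that can be used directly for subsequent sequencing; the steps include:
1) Sorting and capturing single cells: capturing single cells into multi-well plates, including but not limited to 96-well or 384-well plates, or into strip tubes, including but not limited to 8-tube or 12-tube strips;
2) Lysing the cells: fully exposing the genomic DNA;
3) Reaction treatment: relieving the inhibition of downstream reactions by the preceding reaction, by inactivating the enzyme and purifying the DNA or by diluting the sample;
4) Library construction with Tn5 transposase: fragmenting the genomic DNA with Tn5 transposase while adding to the DNA fragments a single-cell barcode recognition sequence formed by a combination of N single nucleotides;
5) Pooling multiple samples into a single tube, then purifying and concentrating the volume;
6) Building the multi-sample library in parallel in a single tube: carried out by PCR amplification, with each batch using uniquely designed primers that carry a specific index and are compatible with next-generation sequencing systems;
7) Purifying the library and selecting the library length;
8) Next-generation sequencing and single-cell-specific decoding of the data;
9) Downstream analysis.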
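Preferably, in step 1) single cells may be sorted with a flow cytometric sorter or with other alternative or cell type-specific enrichment and sorting equipment, including but not limited to the cellenONE or Namocell single-cell sorters.
Preferably, in step 2) the cells are lysed with Zymo lysis buffer (cat#D3004-1-50).
Preferably, in step 2) the cells are lysed with Qiagen Protease (cat#19155/19157), and after lysis is complete the enzyme is inactivated by heating instead of purification.
Preferably, in step 3) the DNA is purified with AMPure XP (cat#A63881) magnetic beads, or with other magnetic beads capable of purifying DNA.
Preferably, the Tn5 transposase library construction in step 4) comprises the following steps: adding Tn5 transposase to the single-cell DNA solution for reaction, and then adding an enzyme inhibitor to completely stop the fragmentation reaction and the enzymatic activity of Tn5.
Preferably, the Tn5 transposase carries binding primers composed of three parts, A, B and C. The A primer contains a cell recognition sequence formed by a combination of N single nucleotides, a P5 end adapter sequence and the reverse ME sequence; the B primer contains the P7 end adapter sequence and the reverse ME sequence; the C primer is an oligonucleotide fragment phosphorylated at the 5' end and is partially complementary to the A primer and to the B primer. The nucleotide sequence of the A primer is shown in SEQ ID NO: 1 to 48, that of the B primer in SEQ ID NO: 49, and that of the C primer in SEQ ID NO: 50.
Preferably, step 6) builds a specially designed sequencing library in which an anchor sequence and a cell barcode sequence are added to the 5' end of each nucleic acid fragment; then, during amplification of the DNA fragments, amplification adapter sequences compatible with the sequencing system are added through the upstream and downstream amplification primers. From the 5' end to the 3' end, the amplified DNA fragments comprise, in order, the P5 end adapter sequence, index sequence 1, sequencing primer binding site 1, the cell barcode recognition sequence, the anchor sequence, the sequence to be determined, sequencing primer binding site 2, index sequence 2 and the P7 end adapter sequence, finally forming a next-generation sequencing library compatible with the Illumina sequencing system.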
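Preferably, the barcode sequence is a nucleotide sequence of 3 random bases followed by 8 bp of bases; the anchor sequence is AGATGTGTATAAGAGACAG; sequencing primer binding site 1 is:
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGATGTGTATAAGAGACAG; sequencing primer binding site 2 is:
GTCTCGTGGGCTCGAGATGTGTATAAGAGACAG.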
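Preferably, the specific structure of the nucleotide fragments in the sequencing library is as follows:
5'-AATGATACGGCGACCACCGAGATCTACAC(index1)TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG(NNN + N-nt barcode)
AGATGTGTATAAGAGACAG-TARGET-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC(index2)ATCTCGTATGCCGTCTTCTGCTTG-3'; "TARGET" denotes the nucleic acid fragment of interest.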
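The fragment layout above can be sketched in Python to make the order of the segments explicit. This is only an illustration under stated assumptions: the fixed segments are copied from the structure given in the text, while the function name, the 8-nt index length and all example sequences (indexes, random bases, barcode, target) are hypothetical placeholders and are not taken from the patent.

```python
# Fixed segments copied from the fragment structure given in the text.
P5_END = "AATGATACGGCGACCACCGAGATCTACAC"
RD1SP = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"      # sequencing primer site 1
ANCHOR = "AGATGTGTATAAGAGACAG"                   # anchor (ME) sequence
RD2SP_RC = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"  # read-2 side, as written 5'->3'
P7_END = "ATCTCGTATGCCGTCTTCTGCTTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_fragment(index1, random3, barcode, target, index2):
    """Concatenate the segments in the 5'->3' order listed in the text."""
    if len(random3) != 3:
        raise ValueError("expected 3 random bases")
    if len(barcode) != 8:
        raise ValueError("expected an 8-nt cell barcode")
    return "".join(
        (P5_END, index1, RD1SP, random3, barcode, ANCHOR,
         target, RD2SP_RC, index2, P7_END)
    )


# Hypothetical placeholder values, for illustration only.
example = build_fragment(
    index1="ACGTACGT",
    random3="ACG",
    barcode="AACCGGTT",
    target="GATTACA" * 3,
    index2="TGCATGCA",
)
print(len(example), example)
```

In this layout the anchor appears once at the end of sequencing primer site 1 and once after the barcode, and its reverse complement opens the read-2 side, so the target is flanked by ME sequences on both ends.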
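Preferably, the anchor sequence is a nucleic acid sequence used to reliably locate the insertion position of the recognition sequence in the later sequencing data, and index sequence 1 and index sequence 2 are both index sequences used to mark the experimental batch.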
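A minimal sketch of how the anchor could be used to decode reads, assuming read 1 begins directly after sequencing primer binding site 1, so that it reads 3 random bases, the 8-nt cell barcode, the anchor and then the insert. The function names, the tolerance parameter `slack` and the barcode-to-cell table are hypothetical and only illustrate the idea; exact string matching is used here, whereas a real pipeline would probably tolerate mismatches.

```python
from collections import Counter

ANCHOR = "AGATGTGTATAAGAGACAG"  # anchor sequence given in the text


def split_read(read, barcode_len=8, random_len=3, slack=2):
    """Locate the anchor and return (barcode, insert), or None if not found.

    The anchor is expected near position random_len + barcode_len; `slack`
    (an assumed tolerance) allows for small shifts at the start of the read.
    """
    pos = read.find(ANCHOR)
    expected = random_len + barcode_len
    if pos < barcode_len or abs(pos - expected) > slack:
        return None
    return read[pos - barcode_len:pos], read[pos + len(ANCHOR):]


def demultiplex(reads, barcode_to_cell):
    """Count reads per cell; reads without a known barcode are 'unassigned'."""
    counts = Counter()
    for read in reads:
        parsed = split_read(read)
        cell = barcode_to_cell.get(parsed[0]) if parsed else None
        counts[cell if cell is not None else "unassigned"] += 1
    return counts


# Hypothetical barcodes and reads, for illustration only.
cells = {"AACCGGTT": "well_A1", "TTGGCCAA": "well_A2"}
reads = [
    "ACG" + "AACCGGTT" + ANCHOR + "TTTTGGGGCCCC",
    "TGA" + "TTGGCCAA" + ANCHOR + "CCCCAAAATTTT",
    "TGA" + "GGGGGGGG" + ANCHOR + "CCCCAAAATTTT",  # unknown barcode
    "ACGTACGTACGTACGTACGTACGT",                    # no anchor
]
print(demultiplex(reads, cells))
```

Because each barcode is tied to a known well, the decoded counts map back to the pre-designated cell rather than to an anonymous cluster.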
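Preferably, the library purification and library length selection in step 7) use, without being limited to, DNA fragment size-selective magnetic beads, or gel electrophoresis to separate the fragments followed by selective recovery.
Preferably, the specific step of next-generation sequencing in step 8) is: mixing multiple libraries with different index sequences and then sequencing them on a high-throughput sequencing platform, either on the same lane or as individually submitted samples sized to the amount of data required.
Preferably, depending on the amount of data actually needed, the DNA may be purified and sequenced after fragment size selection, or purified and sequenced directly without fragment size selection.
Preferably, the single cell of each sample may be replaced by multiple cells, which may be 1-50, 50-100, 100-200, 200-500, 500-1000 or 1000-10000 cells, or by 1 ng to 1 μg of purified genomic DNA.
The invention also provides the use of the method in preparing detection kits, experimental devices or detection systems for basic research on cancer, reproductive health and general health, and for clinical diagnosis, treatment and pharmaceutical development.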
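Beneficial effects of the invention: depending on the needs of the experiment, the method can reach a medium-throughput or even a high-throughput level. In practice, after the sample is prepared as a single-cell suspension, single cells are captured and separated with a pipette fitted with a 10 μl filter tip; when higher throughput is needed, a sorting-grade flow cytometer or a commercially available single-cell sorting system such as Namocell can be used. With this method, the cell sorting step needs only an ordinary 96-well plate or 8-tube strip, and not the special microfluidic chips, water-in-oil beads or microwell systems required by some single-cell sequencing companies. When each well of the 96-well plate or 8-tube strip contains one single cell (in a volume of about 1 μl), the core component of the method, the independently designed barcode-carrying Tn5 transposase, is used for fragmentation (i.e. for adding the recognizable sequence). In the optimized system, fragmentation and adapter addition are carried out as a combined reaction in a 5 μl volume, labeling each single cell. The samples are then directly pooled and purified, and a one-step PCR amplification builds the genome sequencing library without any pre-amplification step. This works because different adapters are used at the two ends: owing to the PCR suppression effect (see references), when the transposase happens to produce A-A or B-B ended fragments, these form hairpin structures during amplification and cannot be amplified, which ultimately safeguards the amplification efficiency of the library. If required, for example for different cell types or to increase the sequencing throughput, commercial indexes can also be added at this step. The method has been tested and is compatible with the index primer adapters of commercial kits (from brands such as Vazyme and Illumina), so in theory it allows single-cell copy number variation sequencing libraries to be built quickly and conveniently for hundreds to thousands of cells. It also provides a new key core technology for research and clinical application on single tumor cells from liquid biopsies of clinical samples, such as circulating tumor cells (CTCs), on reproductive health, such as PGT (preimplantation genetic screening) and NIPD (non-invasive prenatal diagnosis), and on the early diagnosis of other diseases, advancing biomedicine as a whole.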
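The adapter-pairing argument can be illustrated with a short enumeration. The sketch below assumes, as a simplification not stated in the text, that each fragment end independently receives adapter A with probability `p_a` and adapter B otherwise; only fragments with two different adapters are counted as amplifiable, in line with the suppression effect described above. The function name and the parameter are hypothetical.

```python
from itertools import product


def amplifiable_fraction(p_a=0.5):
    """Expected fraction of fragments carrying one A and one B adapter.

    Assumes each end is tagged independently (a simplifying assumption).
    """
    total = 0.0
    for left, right in product("AB", repeat=2):
        p_left = p_a if left == "A" else 1.0 - p_a
        p_right = p_a if right == "A" else 1.0 - p_a
        if left != right:  # A-B or B-A: amplifiable with the P5/P7 primers
            total += p_left * p_right
    return total


print(amplifiable_fraction())     # equal loading of A and B complexes
print(amplifiable_fraction(0.3))  # unequal loading lowers the usable fraction
```

Under this assumption the usable fraction is 2·p_a·(1 − p_a), which is largest when the A and B complexes are loaded in equal amounts.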
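Figure 1 is a technical flow chart of the present invention.
Figure 2 is a schematic diagram of the assembly of Tn5 transposase with its binding primers.
Figure 3 is a schematic diagram of single-cell capture.
Figure 4 is a schematic diagram of the structure of the sequencing library after PCR amplification and purification.
Figure 5 is a schematic diagram of the E-Gel analysis of the medium-throughput single-cell copy number variation sequencing library for K562 cells, followed by gel excision (300-500 bp) and recovery.
Figure 6 is a schematic diagram of the E-Gel analysis of the medium-throughput single-cell copy number variation sequencing libraries for the Jurkat cell line (n=40) and normal human peripheral blood mononuclear cells (n=56), followed by gel excision (300-500 bp) and recovery.
Figure 7 is a schematic diagram of the E-Gel analysis of the medium-throughput single-cell copy number variation sequencing library for the GM12878 cell line (48 single cells pooled into one library), followed by gel excision (300-500 bp) and recovery.
Figure 8 is a schematic diagram of the library fragment analysis on the 2100 analyzer after single-cell CNV library construction for the K562 cell line; the peak lies between 300 and 800, which meets the standard for loading onto the sequencer.
Figure 9 is a schematic diagram of the library fragment analysis on the 2100 analyzer after single-cell CNV sequencing library construction for the normal control and the Jurkat cell line; the peak lies between 300 and 800, which meets the standard for loading onto the sequencer. The normal control is normal human peripheral blood mononuclear cells, with 48 single cells in the library; the Jurkat library also contains 48 cells.
Figure 10 is a schematic diagram of the library fragment analysis on the 2100 nucleic acid analyzer after single-cell CNV library construction for the GM12878 cell line; the peak lies between 300 and 800, which meets the standard for loading onto the sequencer, and 48 single cells were pooled for the library.
Figure 11 is a schematic diagram of the data quality of the medium-throughput single-cell copy number variation sequencing library for K562 cells.
Figure 12 is a schematic diagram of the data quality of the pooled medium-throughput single-cell copy number variation sequencing library for the Jurkat cell line and normal human peripheral blood mononuclear cells.
Figure 13 is a schematic diagram of the data quality of the medium-throughput single-cell copy number variation sequencing library for the GM12878 cell line.
To present the technical solution, objects and advantages of the present invention more clearly and concisely, the invention is described in further detail below with reference to specific examples and the accompanying drawings.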
实施例
一、设计Tn5转座酶的结合引物
由于设计的引物要符合Tn5转座酶的组装,而该酶的组装需要符合以下条件:结合引物必须含有ME序列才可与转座酶相结合并完成打断与建库加接头的一步过程,而且需要互补的双链结构。因此需要把合成的引物进行预退火,即把两个设计为存在一段互补序列的引物重新根据退火的原理整合成一条双链。
因此,本发明的Tn5转座酶结合引物由A引物、B引物、C引物三部分组分,A引物由3个随机碱基+8bp碱基的barcode识别序列和P5端接头序列,以及反向ME序列组成;B引物由P7端接头序列和反向ME序列组成;C引物为5端带有磷酸化的寡聚核苷酸片段,且A引物和B引物分别与C引物有部分互补,A引物核苷酸序列如SEQ ID NO:1~48所示的任一种,B引物核苷酸序列如SEQ ID NO:49所示,C引物核苷酸序列如SEQ ID NO:50所示。
其中,P5端接头用于匹配上illuminate测序平台的5端PCR扩增序列,可方便在混合(pooling)的后通过PCR技术把官方便签序列(index1)和测序接头1加上;P7端接头用于匹配illuminate测序平台的7端PCR扩增序列,同理可方便在混合(pooling)的后通过PCR技术把官方标签序列(index2)和测序 接头2加上。这样就形成了N×M的组合,可进行中通量的单细胞测序,并且节省成本(无需打包下整个flowcell或lane而是可以混样品测序)。
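The N×M combination mentioned above can be sketched as a simple enumeration. This is a minimal illustration, assuming N counts the cell barcodes on primer A (48 in this document) and M counts the index combinations added by PCR after pooling; the index names are hypothetical placeholders.

```python
from itertools import product


def cell_labels(n_barcodes, index_names):
    """Return every (index, cell barcode number) combination.

    n_barcodes corresponds to the primer A barcodes (SEQ ID NO: 1-48 here);
    index_names stands for the index combinations added by PCR after pooling.
    """
    barcode_numbers = range(1, n_barcodes + 1)
    return list(product(index_names, barcode_numbers))


# Hypothetical example: three pooled batches, each tagged with its own index.
labels = cell_labels(48, ["index_1", "index_2", "index_3"])
print(len(labels))  # number of cells that can share one sequencing run
```

Each pool of up to 48 barcoded cells receives one index combination, so the number of distinguishable cells grows linearly with the number of indexes used.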
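1. Preparation of the Tn5 transposase binding primers:
(1) Primer pre-annealing:
a. Because A and C are partially complementary, and B and C are partially complementary, primers A and C and primers B and C must each be annealed into a duplex before the library construction reaction, giving the P5 and P7 adapters respectively.
b. Have the primers synthesized by Tsingke Biotechnology Co., Ltd. and add TE buffer according to the supplied instructions to a concentration of 100 μmol/ml.
c. Set up the annealing reactions in 1.5 ml centrifuge tubes as follows:
Table 1: P5 adapter reaction system:
Table 2: P7 adapter reaction system:
d. Wrap the 1.5 ml tubes in foil so that they heat evenly in the following step.
e. Transfer the tubes containing the reactions to a 94 °C water bath; after 2 min, lower the temperature gradually to 80 °C over 10 min, then move the tubes to a clean environment and let them cool naturally to room temperature.
f. The pre-annealed product can be stored at -20 °C for use in subsequent single-cell copy number sequencing library construction.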
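The partial complementarity that pre-annealing relies on can be checked directly from the sequences given in this document. The sketch below tests whether primer C (SEQ ID NO: 50) is the reverse complement of the ME stretch at the 3’ end of primer B (SEQ ID NO: 49); the helper names are illustrative, not part of the described method. The same check applies to any primer A sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ME = "AGATGTGTATAAGAGACAG"                               # anchor (ME) sequence
PRIMER_B = "GTCTCGTCGACGACTGGGCTCGAGATGTGTATAAGAGACAG"   # SEQ ID NO: 49
PRIMER_C = "CTGTCTCTTATACACATCT"                         # SEQ ID NO: 50


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_to_3prime_end(long_primer, short_primer):
    """True if short_primer can base-pair with the 3' end of long_primer."""
    return long_primer.endswith(reverse_complement(short_primer))


print(reverse_complement(PRIMER_C) == ME)         # C pairs with the ME stretch
print(anneals_to_3prime_end(PRIMER_B, PRIMER_C))  # B/C duplex forms at the 3' end
```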
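2. Assembly of the Tn5 transposase
Tn5 transposase recognizes the double-stranded portion of the P5 and P7 adapters described above. The two different duplex products are assembled with Tn5 transposase to form a Tn5 transposase complex suitable for next-generation sequencing library construction, as shown in Figure 2.
The procedure is as follows:
a. Dilute the P5 and P7 adapter stocks 2-fold (1:1) to a final concentration of 10 μM/ml.
b. Set up the reaction as follows:
Table 3: Reaction system
Adapter P7 and adapter P5 in the table each denote the corresponding transposase-sequence complex described above.
c. Incubate the reaction in a metal bath at 37 °C for 30 min.
d. The product is the adapter-loaded enzyme, which can be used for the single-cell copy number variation library construction below or stored at -20 °C.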
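II. Obtaining single cells
1. Cell culture
Cell condition has a considerable influence on the method of the present invention. Excess debris in the culture medium interferes with cell picking under the microscope. If the cells are undernourished, the three-dimensional chromosome structure or chromatin structure of the cells may be affected, or the cells may die and produce debris. The cell culture steps in this example are as follows:
(1) The cell samples used in this example are K562, Jurkat and GM12878 cells; K562 is taken as the example.
(2) Thaw the cryovial of K562 cells rapidly in a 37 °C water bath.
(3) Centrifuge the thawed K562 cells in a low-speed centrifuge at 800 rpm for 5 min.
(4) Spray the cryovial containing the K562 cells with 75% ethanol and move it to the clean bench for the following steps.
(5) Remove the supernatant with a 1000 μl pipette, add 1000 μl of PBS, and resuspend the cells by pipetting.
(6) Centrifuge in a low-speed centrifuge at 800 rpm for 4 min.
(7) Remove the supernatant and resuspend the cells in 1000 μl of 1640 medium containing 10% FBS.
(8) Transfer all of the resuspended K562 cells to a culture flask containing 4 ml of 1640 medium with 10% FBS.
(9) Mix with a cross-shaped motion, then check the condition of the cells under the microscope.
(10) Culture the flask at 37 °C in a 5% CO2 incubator.
(11) Change the medium after 24 hours.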
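2. Preparation of the single-cell suspension
3. Single-cell capture:
(1) Transfer the cultured cells, at a concentration of about 1×10^5, to a 15 ml centrifuge tube.
(2) Centrifuge at 800 rpm for 3 min and discard the supernatant.
(3) Add 5 ml of PBS pre-chilled to 4 °C, centrifuge at 800 rpm for 3 min, and discard the supernatant.
(4) Repeat the previous step to wash once more, and discard the supernatant.
(5) Resuspend the cells in 100 μl of pre-chilled 1640 medium and keep them on ice.
(6) Prepare a 6-well plate or a 60 mm dish and add 1 ml of pre-chilled PBS containing 10% FBS and 10 μl of cells.
(7) Observe under an inverted microscope; if the cell density is too high, dilute further until there are 1-2 cells per field under the 10X objective.
(8) Capture single cells under the inverted microscope using a long 10 μl filter tip.
(9) The final volume of solution containing the captured single cell is 1 μl; transfer it to the bottom of a well of a 96-well plate or 8-tube strip for the subsequent CNV library construction.
As shown in Figure 3, single cells were screened and captured with a 2.5 μl pipette fitted with a 10 μl filter tip. The red circle in the field of view marks a single cell. A 1 μl volume is enough to aspirate the whole cell and, at a suitable cell density, uptake of any other cell or debris can be avoided, so the 1 μl volume contains only one cell. Because microscopic inspection and single-cell capture happen in the same step, the viability and overall quality of the single cell are also assured to some extent.
The above describes cell preparation for the pilot experiments. In practical applications, cell samples obtained by physical, chemical or biological means can all serve as study material, whether solid tissue, blood, enriched clinical samples (for example CTC enrichment or flow cytometry enrichment), or directly picked samples (for example laser-captured cells or tip-picked cells).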
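III. Single-cell library construction
1. Single-cell lysis:
(1) Add 1 μl of Zymolysis buffer to the above 1 μl of solution containing the single cell.
(2) Incubate at room temperature for 10 min (at 7.5 min, flick the bottom of the tube 3 times with a finger to mix, then spin down briefly).
(3) Add 1 μl of Thermo sterile nuclease-free water and lyse for a further 10 min (at 7.5 min, flick the bottom of the tube 3 times with a finger to mix, then spin down briefly).
2. Single-cell DNA purification:
(1) Add 2 volumes (6 μl) of AMPure beads (equilibrated at room temperature for 30 min beforehand) to the above reaction and incubate for 15 min.
(2) Place the tubes on a magnetic rack for 1-2 min, until the DNA-bound beads have clumped and been drawn to the magnet.
(3) Discard the supernatant, wash the beads with 200 μl of 80% ethanol (on the magnetic rack), and remove the supernatant.
(4) Repeat the previous step to wash the DNA again.
(5) Remove the ethanol with a 200 μl filter tip, then remove all remaining ethanol with a 10 μl filter tip.
(6) Air-dry on the magnetic rack in a biosafety cabinet for 10-15 min until the beads are dry, but do not let them dry to the point of cracking.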
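The 6 μl bead volume in the purification step follows from the lysis volumes above: 1 μl of cell suspension, 1 μl of lysis buffer and 1 μl of water give 3 μl, and 2 volumes of beads are added. A minimal sketch of that bookkeeping, with a hypothetical helper name:

```python
def bead_volume_ul(component_volumes_ul, ratio=2.0):
    """Bead volume needed for a given bead-to-sample volume ratio."""
    return ratio * sum(component_volumes_ul)


# Cell suspension, lysis buffer and nuclease-free water, 1 ul each.
lysate_components = [1.0, 1.0, 1.0]
print(bead_volume_ul(lysate_components))  # 6.0
```

The same helper can be reused if the lysis volumes are scaled, provided the 2:1 ratio is kept.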
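3. Fragmentation and adapter addition:
(1) Add 3 μl of sterile nuclease-free water pre-warmed to 60 °C to the bead pellet and incubate for 1-2 min to elute the DNA.
(2) Spin down briefly and add 1 μl of 5× LM buffer.
(3) Add the assembled Tn5 transposase to each reaction in turn, according to the number of single cells in the library, and react at 55 °C for 20 min to fragment the nucleic acid and add the amplification adapter sequences (the library sequencing adapters described above, AC and BC).
(4) Stop the Tn5 fragmentation reaction with 1 μl of NT buffer or 0.2% SDS at 55 °C for 8 min.
4. Pooling and purification:
(1) Place the 8-tube strip or 96-well plate on a magnetic rack for 1-2 min and transfer all of the supernatants to a new 1.5 ml EP tube.
(2) Add 5 volumes of binding buffer (Zymo DNA concentration & purification kit) and vortex for 2-5 s to mix.
(3) Pre-load the purification column with 1 μl of carrier DNA (arh35F, synthesized by Sangon) and incubate for 1 min.
(4) Transfer the mixture from step (2) to the purification column and centrifuge at 12000 rpm for 1 min. If the pooled volume is too large, load part of it, centrifuge, and then load the remainder, until all of the DNA in the mixture from step (2) has been bound to the column. Discard the flow-through.
(5) Add 200 μl of wash buffer to the purification column and centrifuge at 12000 rpm for 1 min.
(6) Repeat step (5).
(7) Add 6 μl of sterile nuclease-free water at 60 °C to the purification column, move the column to a new EP tube, incubate for 1 min, and centrifuge at 12000 rpm for 1 min.
(8) Repeat the previous step; the solution finally collected in the new EP tube is the purified DNA.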
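4. Polymerase chain reaction (PCR) amplification
Set up the PCR reaction according to the table below.
Table 4: PCR reaction system
Set the PCR program according to the table below.
Table 5: PCR program
Note: the number of cycles depends on the number of single cells pooled into the library, typically 27-28 cycles for a single cell and 22-23 cycles for 48 pooled cells. The P7 and P5 primers come from existing commercial kits and can be purchased from Vazyme or Illumina.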
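The two cycle numbers in the note are roughly consistent with a simple doubling model: if template scales with cell number and each cycle doubles the product, pooling n cells should save about log2(n) cycles. The sketch below is only a heuristic under that assumption, not a rule stated in the document; the function name and the 27.5-cycle baseline (the midpoint of 27-28) are illustrative choices.

```python
import math


def suggested_cycles(n_cells, single_cell_cycles=27.5):
    """Estimate PCR cycles for a pool of n_cells.

    Assumptions (not from the source): product doubles every cycle and the
    amount of template is proportional to the number of pooled cells.
    """
    if n_cells < 1:
        raise ValueError("n_cells must be at least 1")
    return round(single_cell_cycles - math.log2(n_cells))


for n in (1, 48):
    print(n, suggested_cycles(n))
```

For 1 and 48 cells the estimate falls inside the ranges quoted in the note; for other pool sizes it should be treated as a starting point for empirical optimization.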
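5. PCR product purification
(1) Because the reaction contains other impurities after PCR, purify the PCR product with the Zymo concentration & purification kit before E-Gel analysis.
(2) Transfer all of the PCR product (100 μl) to a new 1.5 ml centrifuge tube, add 5 volumes (500 μl) of binding buffer, and shake for 5 s to mix.
(3) Transfer all of the solution to a purification column, centrifuge at room temperature at 12000 rpm or above for 1 min, and discard the flow-through.
(4) Add 200 μl of wash buffer to the purification column, centrifuge at room temperature at 12000 rpm or above for 1 min, and discard the flow-through.
(5) Repeat step (4).
(6) Move the purification column to a new 1.5 ml centrifuge tube, add 10 μl of sterile nuclease-free water pre-warmed to 60 °C to the center of the column, and centrifuge at room temperature at 12000 rpm or above for 1 min.
(7) Add a further 10 μl of sterile nuclease-free water pre-warmed to 60 °C to the center of the column and centrifuge at room temperature at 12000 rpm or above for 1 min.
(8) The 1.5 ml tube now holds about 20 μl of purified product, which can be analyzed by E-Gel immediately or stored at -20 °C.
The purified sequencing library obtained by the above steps has the following structure, shown in Figure 4:
5’-AATGATACGGCGACCACCGAGATCTACAC(index1)TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG(NNN + N-nt barcode)
-AGATGTGTATAAGAGACAG-TARGET-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC(index2)ATCTCGTATGCCGTCTTCTGCTTG-3’
From left to right (5’ to 3’), the first element is the standard P5-end adapter, which anchors the fragment to the flowcell (the bridge-PCR sequencing surface) of the Illumina next-generation sequencing platform; its sequence is 5’-AATGATACGGCGACCACCGAGATCTACAC-3’. It is followed by index1, the index sequence that identifies the sample. Rd1SP is the sequencing primer binding sequence for one end of the paired-end read; its sequence is 5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’. BC is the barcode that identifies the single cell; three random bases (NNN) are placed in front of it so that unstable signal at the start of the read does not reduce the barcode recognition rate. Next comes an anchor sequence (the ME sequence), AGATGTGTATAAGAGACAG, which locates the barcode and mimics the ME sequence so that the primer can bind and assemble with the Tn5 enzyme normally. The DNA insert, shown in grey in Figure 4, is the fragment to be sequenced. Rd2SP is the sequencing primer binding sequence for the other end of the paired-end read. Index sequence 2 (index2) is the index sequence at the P7 end.
The structure was designed to reduce cost, to be efficient, and to match existing platforms, hence the use of paired-end sequencing with dual indexes. Because the indexes at both the P5 and P7 ends match existing platforms, the amount of sequencing data can be chosen as needed, without buying a whole sequencing lane or flowcell, which reduces sequencing cost to some extent.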
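Given the layout described above, read 1 is expected to start with the 3 random bases, followed by the 8 bp cell barcode, the 19 nt anchor (ME) sequence and then the genomic insert. The following is a minimal demultiplexing sketch under that assumption, using exact matching only; the function name and the barcode in the usage example are hypothetical, since the actual barcodes of Table 12 are not reproduced here.

```python
ME = "AGATGTGTATAAGAGACAG"  # anchor (ME) sequence from the library structure
RANDOM_LEN = 3               # NNN random bases in front of the barcode
BARCODE_LEN = 8              # cell barcode length


def split_read1(read, known_barcodes):
    """Return (barcode, insert) for a valid read 1, otherwise None.

    A read is accepted only if its barcode is in known_barcodes and the
    anchor sequence follows immediately (no mismatches tolerated).
    """
    barcode_end = RANDOM_LEN + BARCODE_LEN
    barcode = read[RANDOM_LEN:barcode_end]
    if barcode not in known_barcodes:
        return None
    anchor_end = barcode_end + len(ME)
    if read[barcode_end:anchor_end] != ME:
        return None
    return barcode, read[anchor_end:]


# Hypothetical barcode and insert, for illustration only.
example_read = "GCA" + "ACGTACGT" + ME + "GGCCTTAA"
print(split_read1(example_read, {"ACGTACGT"}))
```

A production pipeline would normally tolerate a mismatch in the barcode or anchor and work on FASTQ records; exact matching is used here only to keep the structure of the read visible.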
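6. E-Gel analysis
(1) This experiment uses a 2% precast gel (E-Gel) from Invitrogen. Unwrap the gel, seat it in its dedicated device, and label the lanes on the cassette with the corresponding samples.
(2) Loading: if the 50 bp DNA marker (Thermo Fisher, cat. no. 10488099) is used, add 16 μl of sterile nuclease-free water and 4 μl of marker to each of the two marker wells (the marker wells are at the edges and sometimes leak a little liquid, in which case top the well up to 20 μl with sterile nuclease-free water); if a different marker is used, load 20 μl of it directly. Whatever the operator's habits and technique, take care when loading: to prevent cross-contamination between two samples during the run and during gel excision, leave one well between sample wells. Load the 20 μl of purified product into the gel and fill the spacer wells with 20 μl of sterile nuclease-free water. If a sample is less than 20 μl, make it up to 20 μl with sterile nuclease-free water.
(3) Running: to check library construction and recover the 300-500 bp fragments, a 0.8%-2% precast gel generally needs 18 min; run until the 50 bp band of the marker approaches the black tape on the E-Gel cassette.
(4) Initial inspection: view the library bands on a gel fluorescence imaging system and photograph them.
(5) Gel excision: cut out the 300-500 bp region.
(6) Collect the excised gel in a 1.5 ml EP tube and weigh it; proceed to gel purification or store at 4 °C.
The results are shown in Figures 5 to 7; the bright bands indicate that library preparation succeeded.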
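7. Recovery and purification of DNA from the gel
(1) Recover and purify the DNA fragments from the gel with the Zymo gel purification kit.
(2) Add AD buffer to the recovered gel at a ratio of 1:3 (that is, 3 ml per 1 mg). (The 300-500 bp slice is typically 0.9 mg, to which 270 μl of AD buffer is added; the gel from each lane goes into its own 1.5 ml centrifuge tube.)
(3) Incubate in a metal bath at 55 °C for 15 min, until the gel has dissolved completely.
(4) Transfer all of the solution to the spin column, centrifuge at room temperature at 10000 rpm or above for 1 min, and discard the flow-through.
(5) Add 200 μl of wash buffer to the spin column, centrifuge at room temperature at 10000 rpm or above for 1 min, and discard the flow-through.
(6) Repeat step (5).
(7) Move the column to a new 1.5 ml centrifuge tube, add 8 μl of sterile nuclease-free water pre-warmed to 60 °C to the center of the column, and centrifuge at room temperature at 10000 rpm or above for 1 min.
(8) Add a further 10 μl of sterile nuclease-free water pre-warmed to 60 °C to the center of the column and centrifuge at room temperature at 10000 rpm or above for 1 min.
(9) The 1.5 ml tube now holds about 16 μl of purified product, ready for pre-sequencing analysis on the 2100 nucleic acid analyzer and the Qubit, or for storage at -20 °C.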
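8. Concentration measurement with the Qubit 3.0 fluorometer
(1) Calibrate the instrument: take two tubes, add 199 μl of working buffer to each, then add 1 μl of fluorescent dye, spin down briefly and vortex to mix. Remove 10 μl with a pipette tip and replace it with 10 μl of standard, spin down briefly and vortex again. After incubating at room temperature for 2 min, place the tubes in the instrument and use the on-screen buttons to run the automatic calibration.
(2) Measure concentration: take the required number of matching tubes, add 199 μl of working buffer and then 1 μl of fluorescent dye to each, label them, vortex to mix and spin down briefly.
(3) Remove 1 μl of the above solution and add 1 μl of sample to each tube; vortex to mix, spin down briefly, incubate at room temperature for 2 min, and place the tubes in the instrument.
(4) Select dsDNA, set the dilution factor as prompted on the panel, and read the final concentration of the library DNA.
The results are shown in the tables below:
Table 6: Qubit concentration analysis of the K562 cell line library
Table 7: Qubit concentration analysis of the mixed library of the Jurkat cell line and normal human peripheral blood mononuclear cells
Table 8: Qubit concentration analysis of the GM12878 cell line library
Library quality must be assessed before sequencing, so the concentration was measured with the Qubit fluorometer developed by Invitrogen. As the tables above show, the libraries built from all of the above cells meet the sequencing concentration requirement of 2 ng/ml.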
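9. Analysis on the 2100 nucleic acid analyzer
(1) Add 650 μl of gel to an EP tube fitted with a filter membrane, take the filtered gel from the lower chamber, add 1 μl of nucleic acid dye, vortex to mix, and centrifuge at 13000 rpm for 10 min.
(2) Pipette 9 μl of gel into the well marked with a circled G on the 2100 analyzer chip, taking care that the tip does not touch the bottom of the chip.
(3) Align the chip on the priming station, close the station firmly, press the syringe plunger down for 60 s, then release the clip and let the plunger spring back on its own (it normally returns to about 0.9 and is then pulled back to about 1.0; if it returns only to 0.7, the priming has leaked and must be repeated).
(4) Add 9 μl of gel to each of the other two wells marked with a circled G; no further syringe pressure is needed.
(5) Add 5 μl of marker to every well on the chip except the circled-G wells, dispensing at the bottom of the well.
(6) Add 1 μl of sample to each well, avoiding bubbles.
(7) Add 1 μl of ladder to the well marked with the ladder symbol, shake on the vortexer at 2000 rpm for 1 min, and load the chip into the 2100 analyzer.
(8) Open the 2100 analyzer software, set the assay type under Assay, and click START to begin the run.
(9) When the run has finished, select the relevant fragments as required. After shutting down the computer and the 2100, clean the electrodes: fill the cleaning chip with sterile nuclease-free water, soak the electrodes in it for 3 min, air-dry the electrodes at room temperature for 5-10 min, and place desiccant beneath the electrodes to keep them dry for the next use.
The results are shown in Figures 8 to 10. For the single-cell CNV libraries built with the method of the present invention from the K562 cell line (120 single cells in total), the normal control group and the Jurkat cell line (96 single cells in total), and the GM12878 cell line (48 single cells in total), fragment analysis on the 2100 shows a peak between 300 and 800 bp in every case, which meets the requirements for sequencing.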
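Sequencing data quality analysis: see Figures 11 to 13 and Tables 9 to 11.
Table 9: Data quality of the single-cell copy number variation sequencing library of the K562 cell line (one group shown as an example)
As the table shows, the data obtained from the K562 libraries built with this method overall meet the expected standard. Because this run was sequenced as a whole lane, 7 indexes were added within the same batch of cells during library construction, both to avoid wasting data and to test whether commercial standard dual indexes are compatible with the method. From the figure and the table, the clean reads rate is 98.62% of the total data, and the Q30 rate of both the raw data and the clean data exceeds 93%. The libraries built by this method therefore meet the requirements of downstream bioinformatic analysis, generate little redundant data, and save cost.
Table 10: Data quality of the single-cell copy number variation sequencing library of the Jurkat cell line and normal human peripheral blood mononuclear cells (one group shown as an example)
To test whether the barcodes can distinguish different cell lines and whether mixed sequencing causes mutual interference, 48 Jurkat cells and 48 normal human peripheral blood mononuclear cells were pooled into one library. From the figure and table above, the total data volume is about 120 G, the clean reads rate is about 98%, and the Q30 percentage is 91%. This shows that the data are reliable, essentially free of cross-contamination and low-quality effects, and suitable for downstream bioinformatic analysis.
Table 11: Data quality of the single-cell copy number variation sequencing library of the GM12878 cell line
To verify that the barcodes can be detected normally within a single batch of data and to test compatibility with the sequencing platform, a single-batch single-cell copy number variation library was prepared from 48 GM12878 cells and sequenced as a shared run (not a whole lane) on the Illumina NovaSeq PE150 platform. The target data volume was 48 G and the final yield was 62 G. As the table above shows, data quality remained good: the clean reads rate was 99.48%, with essentially no adapter contamination or low-quality reads, and Q30 was above 90.7%. The data clearly meet the requirements of downstream bioinformatic analysis.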
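Primer A of this example is shown in Table 12 below:
The lowercase portion is the self-designed barcode sequence.
Primer B of this example:
49:GTCTCGTCGACGACTGGGCTCGAGATGTGTATAAGAGACAG
Primer C of this example:
50:CTGTCTCTTATACACATCT
The examples described above represent only some embodiments of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make variations and improvements without departing from the concept of the invention, and all of these fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.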
Claims (15)
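- A method for constructing a medium-throughput single-cell copy number library, characterized in that the method comprises: sorting single cells, lysing each single cell separately, and performing Tn5 transposase-based DNA fragmentation and library construction, to obtain a single-cell genome sequencing library that can be used directly for subsequent sequencing; the steps comprising: 1) sorting and capturing single cells: capturing single cells into tube strips, including but not limited to 8-tube or 12-tube strips, or multi-well plates, including but not limited to 96-well or 384-well plates; 2) lysing the cells: fully exposing the genomic DNA; 3) reaction treatment: relieving inhibition of downstream reactions by the preceding reaction, by inactivating the enzyme and purifying the DNA or by diluting the sample; 4) library construction with Tn5 transposase: fragmenting the genomic DNA with Tn5 transposase while adding to the DNA fragments a single-cell barcode identification sequence formed from a combination of N single nucleotides; 5) pooling multiple samples into a single tube, then purifying and concentrating the volume; 6) building the multi-sample library in parallel in a single tube by PCR amplification, each batch using specifically designed primers that contain a specific index and are compatible with next-generation sequencing systems; 7) purifying the library and selecting the library length; 8) next-generation sequencing and single-cell-specific decoding of the data; 9) downstream analysis.
- The method of claim 1, characterized in that in step 1) the single cells may be sorted with a flow cytometric cell sorter or with other alternative equipment, or equipment for cell-type-specific enrichment and sorting, including but not limited to the cellenone or namocell single-cell sorters.
- The method of claim 1, characterized in that in step 2) the cells are lysed with Zymolysis buffer (cat#D3004-1-50).
- The method of claim 1, characterized in that in step 2) the cells are lysed with Qiagen Protease (cat#19155/19157), and after lysis the enzyme is inactivated by heating instead of by purification.
- The method of claim 1, characterized in that in step 3) the DNA is purified with AMPure XP (cat#A63881) magnetic beads or other magnetic beads capable of purifying DNA.
- The method of claim 1, characterized in that the Tn5 transposase library construction of step 4) comprises the following steps: adding Tn5 transposase to the single-cell DNA solution and allowing it to react, then adding an enzyme inhibitor to stop the Tn5 fragmentation reaction and the enzyme activity completely.
- The method of claim 6, characterized in that the Tn5 transposase carries binding primers consisting of three components, A, B and C; primer A contains a cell identification sequence formed from a combination of N single nucleotides, the P5-end adapter sequence and the reverse ME sequence; primer B contains the P7-end adapter sequence and the reverse ME sequence; primer C is a 5’-phosphorylated oligonucleotide that is partially complementary to each of primer A and primer B; the nucleotide sequence of primer A is as shown in SEQ ID NO: 1-48, the nucleotide sequence of primer B is as shown in SEQ ID NO: 49, and the nucleotide sequence of primer C is as shown in SEQ ID NO: 50.
- The method of claim 1, characterized in that step 6) builds a specially designed sequencing library in which an anchor sequence and a cell barcode sequence are added to the 5’ end of each nucleic acid fragment; subsequently, during amplification of the DNA fragments, amplification adapter sequences compatible with the sequencing system are added via the upstream and downstream amplification primers respectively; the amplified DNA fragment comprises, in the 5’ to 3’ direction, the P5-end adapter sequence, index sequence 1, sequencing primer binding site 1, the cell barcode identification sequence, the anchor sequence, the sequence to be determined, sequencing primer binding site 2, index sequence 2 and the P7-end adapter sequence, finally forming a next-generation sequencing library compatible with the Illumina sequencing system.
- The method of claim 8, characterized in that the barcode sequence is a nucleotide sequence of 3 random bases plus 8 bp; the anchor sequence is AGATGTGTATAAGAGACAG; sequencing primer binding site 1 is: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGATGTGTATAAGAGACAG; and sequencing primer binding site 2 is: GTCTCGTGGGCTCGAGATGTGTATAAGAGACAG.
- The method of claim 8, characterized in that the nucleotide fragments in the sequencing library have the following structure: 5’-AATGATACGGCGACCACCGAGATCTACAC(index1)TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG(NNN + N-nt barcode)AGATGTGTATAAGAGACAG-TARGET-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC(index2)ATCTCGTATGCCGTCTTCTGCTTG-3’, where "TARGET" denotes the target nucleic acid fragment.
- The method of claim 1, characterized in that in step 7) library purification and library length selection use, without limitation, size-selective magnetic beads for DNA fragments, or gel electrophoresis to separate the fragments followed by selective recovery.
- The method of claim 1, characterized in that the next-generation sequencing of step 8) specifically comprises: mixing multiple libraries with different index sequences, and then sequencing them on a high-throughput sequencing platform either in the same sequencing lane or as a shared run sized to the amount of data required.
- The method of claim 1, characterized in that, depending on the actual data volume required, the DNA is purified and sequenced after fragment size selection, or purified and sequenced directly without fragment size selection.
- The method of any one of claims 1 to 13, characterized in that the single cell of each sample may be replaced by multiple cells (1-50, 50-100, 100-200, 200-500, 500-1000 or 1000-10000 cells), or by 1 ng to 1 μg of purified genomic DNA.
- Use of the method of claim 1 in the preparation of detection kits, experimental devices or detection systems for basic research, clinical diagnosis, treatment and pharmaceutical development relating to cancer, reproductive health and general health.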
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/228,664 US20240043919A1 (en) | 2021-02-01 | 2023-07-31 | Method for traceable medium-throughput single-cell copy number sequencing |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110133128.5 | 2021-02-01 | ||
| CN202110133128.5A CN114836838A (zh) | 2021-02-01 | 2021-02-01 | 一种中通量单细胞拷贝数文库构建的方法及其应用 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/228,664 Continuation-In-Part US20240043919A1 (en) | 2021-02-01 | 2023-07-31 | Method for traceable medium-throughput single-cell copy number sequencing |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022161294A1 true WO2022161294A1 (zh) | 2022-08-04 |
Family
ID=82561272
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/073321 Ceased WO2022161294A1 (zh) | 2021-02-01 | 2022-01-21 | 一种中通量单细胞拷贝数文库构建的方法及其应用 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20240043919A1 (zh) |
| CN (1) | CN114836838A (zh) |
| WO (1) | WO2022161294A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116515955A (zh) * | 2023-06-20 | 2023-08-01 | 中国科学院海洋研究所 | 一种高效低成本的多基因靶向分型方法 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115537408B (zh) * | 2022-10-08 | 2025-04-18 | 厦门大学 | 一种单细胞多组学文库及其构建方法 |
| CN116254611A (zh) * | 2022-12-16 | 2023-06-13 | 南方科技大学 | 一种多样本超高通量单细胞转录组测序文库的构建方法 |
| CN117683866B (zh) * | 2024-01-22 | 2024-08-06 | 湛江中心人民医院 | 检测细胞中dna的方法 |
| CN118086545A (zh) * | 2024-04-08 | 2024-05-28 | 上海奕检智造生命科技有限公司 | 一种检测结核分枝杆菌及其耐药基因方法 |
| US12467087B1 (en) | 2024-06-25 | 2025-11-11 | Guardant Health, Inc. | Sequencing methods with partitioning |
| CN120026092B (zh) * | 2025-04-21 | 2025-09-16 | 中国水产科学研究院黄海水产研究所 | 一种利用分子测序技术分析南极磷虾胃肠道食物残留物的方法 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2016126871A2 (en) * | 2015-02-04 | 2016-08-11 | The Regents Of The University Of California | Sequencing of nucleic acids via barcoding in discrete entities |
| CN109526228A (zh) * | 2017-05-26 | 2019-03-26 | 10X基因组学有限公司 | 转座酶可接近性染色质的单细胞分析 |
| CN109811045A (zh) * | 2017-11-22 | 2019-05-28 | 深圳华大智造科技有限公司 | 高通量的单细胞全长转录组测序文库的构建方法及其应用 |
| CN110268059A (zh) * | 2016-07-22 | 2019-09-20 | 俄勒冈健康与科学大学 | 单细胞全基因组文库及制备其的组合索引方法 |
| CN110886021A (zh) * | 2018-09-07 | 2020-03-17 | 深圳华大生命科学研究院 | 一种单细胞dna文库的构建方法 |
-
2021
- 2021-02-01 CN CN202110133128.5A patent/CN114836838A/zh active Pending
-
2022
- 2022-01-21 WO PCT/CN2022/073321 patent/WO2022161294A1/zh not_active Ceased
-
2023
- 2023-07-31 US US18/228,664 patent/US20240043919A1/en active Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2016126871A2 (en) * | 2015-02-04 | 2016-08-11 | The Regents Of The University Of California | Sequencing of nucleic acids via barcoding in discrete entities |
| CN110268059A (zh) * | 2016-07-22 | 2019-09-20 | 俄勒冈健康与科学大学 | 单细胞全基因组文库及制备其的组合索引方法 |
| CN109526228A (zh) * | 2017-05-26 | 2019-03-26 | 10X基因组学有限公司 | 转座酶可接近性染色质的单细胞分析 |
| CN109811045A (zh) * | 2017-11-22 | 2019-05-28 | 深圳华大智造科技有限公司 | 高通量的单细胞全长转录组测序文库的构建方法及其应用 |
| CN110886021A (zh) * | 2018-09-07 | 2020-03-17 | 深圳华大生命科学研究院 | 一种单细胞dna文库的构建方法 |
Non-Patent Citations (1)
| Title |
|---|
| SIMONE PICELLI, ÅSA K. BJÖRKLUND, BJÖRN REINIUS, SVEN SAGASSER, GÖSTA WINBERG, RICKARD SANDBERG: "Tn5 transposase and tagmentation procedures for massively scaled sequencing projects", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, US, vol. 24, no. 12, 1 December 2014 (2014-12-01), US , pages 2033 - 2040, XP055236186, ISSN: 1088-9051, DOI: 10.1101/gr.177881.114 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116515955A (zh) * | 2023-06-20 | 2023-08-01 | 中国科学院海洋研究所 | 一种高效低成本的多基因靶向分型方法 |
| CN116515955B (zh) * | 2023-06-20 | 2023-11-17 | 中国科学院海洋研究所 | 一种多基因靶向分型方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114836838A (zh) | 2022-08-02 |
| US20240043919A1 (en) | 2024-02-08 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22745169 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 22745169 Country of ref document: EP Kind code of ref document: A1 |














