TWI912823B

TWI912823B - A method for establishing a system for determining nutrition requirement based on metagenomic sequencing data

Info

Publication number: TWI912823B
Application number: TW113125494A
Authority: TW
Inventors: 胡兆棨; 洪源懋; 莊曜宇; 蔡孟勳
Original assignee: 國立臺灣大學
Priority date: 2023-07-17
Filing date: 2024-07-05
Publication date: 2026-01-21
Also published as: TW202505534A

Abstract

The invention discloses a method for establishing a system that determines nutrient requirements from metagenomic sequencing data. Specifically, the method integrates a next-generation sequencing (NGS) bacterial 16S rRNA analysis pipeline, metabolic pathway prediction tools, and a food compound database. Together these identify missing nutrients and provide individualized dietary suggestions.

Description

A method for establishing a nutrient intake requirement system using metagenomic data

This invention discloses a method for establishing a nutrient intake requirement system using metagenomic data. Specifically, the method integrates several components: a next-generation sequencing (NGS) microbial 16S ribosomal RNA (rRNA) analysis pipeline, metabolic pathway prediction tools, and food and compound databases. It uses them to predict which nutrients an individual lacks and to provide dietary recommendations.

As preventive medicine and nutritional health care receive increasing attention, precision nutrition has developed as a research field. The symbiotic relationship between the gut microbiota and their host is complex. Recent studies have linked the gut microbiota closely to disease, mood, exercise, and dietary habits.

However, individuals respond to the same preventive-medicine or nutritional health regimen to different degrees and with different outcomes. A major reason is that gut microbiota composition differs between individuals. This leads to differences in immunity and in how nutrients from food are metabolized. Likewise, the benefit of consuming probiotics to colonize the gut and improve the gut microbiota often varies from person to person. These factors are technical bottlenecks for preventive medicine and precision nutrition.

In view of the above, a technical system for precision nutrition is still needed. It should give industry a systematic understanding of the relationship between the gut microbiota and diet. It should also help formulate personalized dietary recommendations and improvement plans. Developing such a system remains an important goal for preventive medicine and nutritional health care.

Against this technical background, and to meet industry needs, the purpose of this invention is to systematically establish the relationship between microorganisms and dietary nutrition, thereby achieving the goal of precision nutrition. Accordingly, the first objective of this invention is to provide a method for establishing a nutrient intake requirement system using metagenomic data. The technical means of this invention comprise microbial species analysis, linking of enzymes to compounds, and dietary assessment and recommendation. Specifically, the metagenomic data used in this invention are ribosomal RNA sequencing data. After quality control, these data are output as an analysis file. A species classifier processes this file, and the results are then linked to and compared against multiple datasets, including enzyme, compound, food, and baseline microbiota datasets. Finally, dietary assessment and an optimization method produce a recommended food list. Together, these steps constitute the technical solution of this invention for establishing a nutrient intake requirement system using metagenomic data.

Specifically, the above method for establishing a nutrient intake requirement system using metagenomic data includes, but is not limited to, the following steps.

Step 1: Provide metagenomic data, the metagenomic data including ribosomal RNA sequencing data.

Step 2: Perform a species classification procedure to obtain the microbial community or species corresponding to the metagenomic data.

Step 3: Perform a gene function prediction procedure to obtain the gene functions of the microbial community or species, the gene functions being expressed as enzymes or metabolic pathways.

Step 4: Perform a differential analysis procedure. It analyzes the abundance of each enzyme and its difference from a baseline dataset to identify deficient enzymes or metabolic pathways. Then obtain from the KEGG database the nutrients or compounds corresponding to those deficient enzymes or metabolic pathways.

Step 5: Perform a food screening procedure. It screens a food database for foods that provide the nutrients or compounds compensating for the deficient enzymes or metabolic pathways, and builds a recommended food list.

Step 6: Perform a linear programming procedure to optimize the recommended food list, thereby completing a nutrient intake requirement system. The system contains the composition of the microbial community or species, enzyme differences relative to the baseline dataset, metabolic pathway differences, an analysis of deficient nutrients, and dietary recommendations.

Typically, this invention obtains the ribosomal RNA sequencing data using next-generation sequencing technology. The steps described above further include extracting DNA samples from biological specimens.

More specifically, this invention produces a personalized microbiota analysis and food recommendation report. The report is presented as visual charts so that results can be interpreted quickly. Raw data, such as tables, can also be downloaded for further analysis and application.

The second objective of this invention is to provide a method for establishing a database of associations between metagenomes and food nutrients. Specifically, this method includes, but is not limited to, the following steps.

Step 2: Perform a species classification procedure to obtain the microbial community or species corresponding to the metagenomic data, the microbial community or species being a gut microbial community or species.

Step 6: Perform a linear programming procedure to optimize the recommended food list, thereby completing a metagenome and food nutrient association database. The database contains the ribosomal RNA sequencing data, the composition of the microbial community or species, enzyme differences relative to the baseline dataset, metabolic pathway differences, an analysis of deficient nutrients, and dietary recommendations.

In summary, the technical means of this invention establish an analysis workflow that runs from ribosomal RNA sequence data to gut microbes, enzymes, metabolic pathways, nutrient compounds, and foods. They also construct a system and database grounded in bioinformatic and statistical analysis. The technical effects are as follows. Users can clearly see a test subject's gut microbiota composition and nutrient changes, or the subject's differences from a control group. Nutrition and health professionals can systematically identify the nutrients an individual lacks. They can use these findings as a reference for further assessment and for planning diets or nutrient intake. Accordingly, the nutrient intake requirement system and the metagenome and food nutrient association database provided by this invention extend the application of microbial community sequencing to nutrition and health care. They meet the needs of the food, nutrition, and health-related preventive medicine industries and advance precision nutrition.

[Figure 1] is a flowchart of the steps by which this invention establishes the nutrient intake requirement system and the metagenome and food nutrient association database.

Based on the foregoing summary of the invention, the following embodiments and specific examples describe in detail the technical means employed and the technical effects achieved by this invention.

The first embodiment of this invention discloses a method for establishing a nutrient intake requirement system using metagenomic data, comprising, but not limited to, the following steps.

Step 6: Perform a linear programming procedure to optimize the recommended food list, thereby completing a nutrient intake requirement system. The system contains the composition of the microbial community or species, enzyme differences relative to the baseline dataset, metabolic pathway differences, an analysis of deficient nutrients, and dietary recommendations. Preferably, the linear programming procedure is an integer linear programming procedure.

The second embodiment of this invention discloses a method for establishing a metagenome and food nutrient association database, comprising, but not limited to, the following steps.

Step 6: Perform a linear programming procedure to optimize the recommended food list, thereby completing a metagenome and food nutrient association database. The database contains the ribosomal RNA sequencing data, the composition of the microbial community or species, enzyme differences relative to the baseline dataset, metabolic pathway differences, an analysis of deficient nutrients, and dietary recommendations. Preferably, the linear programming procedure is an integer linear programming procedure.

In one application embodiment, the nutrient intake requirement system and the metagenome and food nutrient association database described above serve as guidelines for balanced animal diets or for formulating nutritional supplements. Specifically, the animals include humans, pets, and livestock.

Based on the first and second embodiments disclosed above, the following specific examples illustrate the technical content described above and the technical means used to achieve the technical effects.

In one specific example, the ribosomal RNA sequencing data include prokaryotic 16S rRNA sequencing data, eukaryotic 18S rRNA sequencing data, prokaryotic 23S rRNA sequencing data, eukaryotic 28S rRNA sequencing data, or a combination thereof.

In one specific example, the sequence analysis system for nutrient intake requirements and the metagenome and food nutrient association database of this invention analyze 16S rRNA gut microbial sequence data. The system database is built mainly on 16S rRNA V3-V4 sequence fragments and provides a personalized nutrient requirement report. The final outputs include microbial composition analysis, microbial gene function prediction, deficient nutrient compounds, and dietary recommendations.

In one specific example, the microbial community or species is a gut microbial community or species.

In one specific example, the classifier first processes the ribosomal RNA sequence data. A baseline dataset is then selected for differential analysis, and the results are presented in a microbial abundance table or field.

In one specific example, this invention uses the species classifier SPINGO (version 1.2) to analyze the microbial community; SPINGO can accurately classify 16S rRNA sequences to the species level. Its parameters are set as follows: k-mer size (-kmersize) 8, number of bootstrap samples (-bootstrap) 10, number of k-mers used for each bootstrap subsample (-subsample) 8, and microbial reference database RDP_11.2.

In one specific example, the species classifier outputs two files after classifying species: "results.out" and "level_count.out". The file results.out contains the feature IDs and the species assigned to each sequence. The file level_count.out contains a per-species count. However, this count only indicates how many representative sequences were assigned to a species. It does not indicate how many reads each of those representative sequences originally represents. This invention therefore does not estimate abundance directly from the representative-sequence counts in level_count.out.

In one specific example, this invention discloses a method for correctly calculating species abundance percentages. First, each species name in level_count.out is matched to the identical species names in results.out; one species name may match several entries. This yields the feature sequence IDs assigned to each species. Next, these feature sequence IDs are looked up in the user-uploaded BIOM file, converted to .tsv, to obtain each ID's original read count. From these counts, the relative abundance of each species can be calculated.
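The read-weighted calculation above can be sketched as follows. This is a minimal illustration, not the invention's code. It assumes that results.out has already been parsed into a hypothetical `{feature_id: species}` dictionary, since the exact column layout is not given here. It also assumes the BIOM table was converted with `biom convert --to-tsv`, which writes a `#OTU ID` header row. The function names are hypothetical.

```python
from collections import defaultdict


def parse_biom_tsv(text, sample):
    """Read per-feature read counts for one sample from `biom convert --to-tsv` output."""
    counts = {}
    col = None
    for line in text.splitlines():
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            col = fields.index(sample)  # column holding the requested sample
            continue
        if line.startswith("#") or not line.strip() or col is None:
            continue  # skip comment lines, blank lines, and anything before the header
        counts[fields[0]] = int(float(fields[col]))
    return counts


def species_relative_abundance(assignments, feature_counts):
    """Return {species: percent of reads}.

    assignments: {feature_id: species}, assumed to be parsed from SPINGO's results.out.
    feature_counts: {feature_id: read count} for one sample.
    """
    reads_per_species = defaultdict(int)
    for feature_id, species in assignments.items():
        # Sum the original reads behind each feature, not the number of features.
        reads_per_species[species] += feature_counts.get(feature_id, 0)
    total = sum(reads_per_species.values())
    if total == 0:
        return {}
    return {sp: 100.0 * n / total for sp, n in reads_per_species.items()}
```

Consider a toy case where features f1 (30 reads) and f2 (10 reads) are species A, and f3 (60 reads) is species B. Counting representative sequences would put A at two of three (about 66.7%). Weighting by reads gives A 40% and B 60%.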

In one specific example, the invention first obtains the abundance of each species in the sample. It then computes the median and standard deviation of each species' abundance in the baseline dataset and ranks the species by baseline abundance. The top 1 to top 30 species are selected and merged with the user's data. The merged data are then output as a species abundance table and as data for a forest plot.
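A minimal sketch of the baseline summary and merge step follows. The source does not say which statistic drives the ranking, so ranking by the baseline median is an assumption. The input shapes (`{species: [abundance per baseline sample]}` and `{species: abundance}`) and the function names are hypothetical.

```python
import statistics


def baseline_summary(baseline, top_n=30):
    """Return [(species, median, sd)] for the top_n species by baseline median.

    baseline: {species: [relative abundance in each baseline sample]}.
    """
    rows = []
    for species, values in baseline.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        rows.append((species, statistics.median(values), sd))
    # Assumption: rank by the baseline median, highest first.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top_n]


def merge_with_user(summary, user):
    """Attach the user's abundance to each baseline row; absent species count as 0."""
    return [
        {"species": sp, "baseline_median": med, "baseline_sd": sd, "user": user.get(sp, 0.0)}
        for sp, med, sd in summary
    ]
```

Each merged row holds one species' baseline center, spread, and user value, which is what a forest plot draws.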

In one specific example, the baseline gut microbiota and enzyme population dataset of this invention contains baseline microbiota data from different Taiwanese populations. Users can select an appropriate baseline population for comparison according to the characteristics of their uploaded data. The baseline populations comprise 287 healthy Taiwanese individuals aged 18 to 80 years. Of these, 119 samples come from a Taiwanese microbiome database, and 118 samples come from the control group of a Taiwanese gut microbiota study on colorectal abnormalities. The remaining 50 samples come from a study of the gut microbiota of healthy middle-aged and elderly Taiwanese. All samples were sequenced on the Illumina platform, targeting the V3-V4 region. The sequencing files were demultiplexed with QIIME 2 and passed quality control. Species were then classified with the SPINGO classifier, and enzyme abundances were predicted with the microbial gene function prediction tool PICRUSt2. After data processing, species abundance and enzyme abundance proportions were obtained. The analyzed data were stored in the database as baseline data.

In one specific example, the microbial gene function prediction tool PICRUSt2 (version 2.5.1) predicts the potential enzyme abundances of microorganisms from marker gene sequences. Its main workflow consists of five steps:

- sequence placement (place_seqs.py)
- hidden-state prediction (hsp.py)
- metagenome prediction (metagenome_pipeline.py)
- pathway abundance inference (pathway_pipeline.py)
- pathway feature descriptions (add_descriptions.py)

All parameters are left at their default values in every step. The output files are compared with the baseline dataset, and the corresponding KEGG compounds are identified.
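The five PICRUSt2 steps above can be chained as shown below. The flags follow the public PICRUSt2 tutorial as we recall it, and the file names (`rep_seqs.fna`, `table.biom`, output paths) are hypothetical placeholders. Verify each flag against the installed scripts' `--help` before relying on this sketch. Hidden-state prediction runs twice: once for marker copy number with NSTI, and once for EC content.

```python
import subprocess


def picrust2_commands(seqs="rep_seqs.fna", table="table.biom", threads=1):
    """Return the PICRUSt2 steps as argument lists, in execution order."""
    p = str(threads)
    ec_unstrat = "EC_metagenome_out/pred_metagenome_unstrat.tsv.gz"
    return [
        # 1. Place study sequences into the reference tree.
        ["place_seqs.py", "-s", seqs, "-o", "out.tre", "-p", p],
        # 2. Hidden-state prediction: marker copy number (+ NSTI), then EC content.
        ["hsp.py", "-i", "16S", "-t", "out.tre", "-o", "marker_predicted_and_nsti.tsv.gz", "-p", p, "-n"],
        ["hsp.py", "-i", "EC", "-t", "out.tre", "-o", "EC_predicted.tsv.gz", "-p", p],
        # 3. Metagenome prediction: per-sample EC abundances.
        ["metagenome_pipeline.py", "-i", table, "-m", "marker_predicted_and_nsti.tsv.gz",
         "-f", "EC_predicted.tsv.gz", "-o", "EC_metagenome_out"],
        # 4. Infer pathway abundances from the EC table.
        ["pathway_pipeline.py", "-i", ec_unstrat, "-o", "pathways_out", "-p", p],
        # 5. Add human-readable EC descriptions.
        ["add_descriptions.py", "-i", ec_unstrat, "-m", "EC",
         "-o", "EC_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz"],
    ]


def run_picrust2(**kwargs):
    """Run each step in order, stopping at the first failure (needs PICRUSt2 on PATH)."""
    for cmd in picrust2_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

The commands are returned as lists, not run directly, so the sequence can be inspected or logged before `run_picrust2` executes it.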

In a more specific example, this invention analyzes gut microbiota composition on the basis of metagenomics. It analyzes the distribution of each species and the content of probiotics in the subject's gut. It then builds a table of species abundance differences between the subject and the selected baseline gut microbiota dataset, showing which species in the subject's gut are deficient or sufficient relative to the baseline. The subject's long species list is then visualized in charts such as forest plots. Each plot draws the baseline distribution range of a species together with the subject's abundance for that species, giving a quick overview of the subject's species relative to the baseline gut microbiota. This invention also highlights the abundance of dominant species. Typically, the top ten to thirty species by abundance in the baseline gut microbiota dataset are listed in a table in a sub-window, showing which dominant species are deficient or sufficient in the subject's gut. Subtracting the baseline dataset from the subject's data yields a list of deficient enzymes, and linking these to the KEGG database yields a list of deficient nutrient compounds.
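Flagging dominant species against the baseline range can be sketched as below. The source does not define the "distribution range". This sketch assumes a hypothetical band of median ± k × SD with k = 1, and takes the `(species, median, sd)` rows as input.

```python
def flag_dominant_species(summary_rows, user, k=1.0):
    """Label each dominant species as 'lower', 'within', or 'higher' than baseline.

    summary_rows: [(species, baseline_median, baseline_sd)] for the top baseline species.
    user: {species: the subject's relative abundance}; absent species count as 0.
    k: hypothetical band width in standard deviations.
    """
    flags = {}
    for species, median, sd in summary_rows:
        value = user.get(species, 0.0)
        low, high = median - k * sd, median + k * sd
        if value < low:
            flags[species] = "lower"
        elif value > high:
            flags[species] = "higher"
        else:
            flags[species] = "within"
    return flags
```

A percentile interval, for example the 2.5th to 97.5th percentile of baseline samples, would be an equally reasonable substitute for the SD band.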

In one specific example, the KEGG database covers genes, metabolic pathways, enzymes, compounds, and other data types. The system retrieved 5,621 EC (Enzyme Commission) numbers from the KEGG API, along with a database of 18,905 compound IDs and names.
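One way to pull enzyme-to-compound links from KEGG is shown below. It assumes the KEGG REST `link` operation (`https://rest.kegg.jp/link/compound/enzyme`), whose output is two tab-separated columns such as `ec:1.1.1.1` and `cpd:C00001`. The document only says "KEGG API", so treat this endpoint choice as an assumption; the function names are hypothetical.

```python
from collections import defaultdict
from urllib.request import urlopen

KEGG_LINK_URL = "https://rest.kegg.jp/link/compound/enzyme"


def parse_kegg_link(text):
    """Turn KEGG `link` output into {EC number: set of compound IDs}."""
    ec_to_cpds = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        left, right = line.split("\t")
        # Strip the database prefixes, e.g. "ec:1.1.1.1" -> "1.1.1.1", "cpd:C00001" -> "C00001".
        ec_to_cpds[left.removeprefix("ec:")].add(right.removeprefix("cpd:"))
    return dict(ec_to_cpds)


def fetch_ec_compound_links(url=KEGG_LINK_URL):
    """Download and parse the link table (network access required)."""
    with urlopen(url) as resp:
        return parse_kegg_link(resp.read().decode("utf-8"))
```

Parsing is kept separate from downloading so the table can be cached locally; KEGG asks users not to hammer its REST service.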

In one specific example, the differential analysis starts from enzymes found deficient in the predicted microbial gene functions and maps them to nutrient compounds. The analysis is presented qualitatively and identifies the deficient compounds and the corresponding foods. The recommended foods are computed from the nutrient compounds broken down (metabolized) by the microorganisms.

In one specific example, the FooDB food and compound database covers food constituents, chemical composition, and nutrient compound information. It is organized into two main sections, "Food Search" and "Compound Search". This invention obtained the following from FooDB: the food table, the nutrient compound-food relationship table, nutrient compound synonyms, and nutrient compound IDs. It also built a dataset of 53 phytochemical entries for fruits and vegetables. After screening, 962 foods and 23,000 nutrient compounds were retained for the subsequent optimization analysis.
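The food screening step can be sketched as an inverted lookup from deficient compounds to the foods that contain them. FooDB and KEGG use different compound identifiers. This sketch therefore assumes an optional, hypothetical `synonyms` map that resolves a FooDB compound name to the identifier used in the deficiency table. The `(food, compound)` pairs stand in for the FooDB food-compound relationship table after preprocessing.

```python
from collections import defaultdict


def candidate_foods(food_compound_pairs, deficient_compounds, synonyms=None):
    """Return {food: set of deficient compounds it supplies}.

    food_compound_pairs: iterable of (food, compound) from the food-compound table.
    deficient_compounds: compound IDs the individual lacks.
    synonyms: optional {alternate name: canonical compound ID} (hypothetical).
    """
    synonyms = synonyms or {}
    deficient = set(deficient_compounds)
    supplies = defaultdict(set)
    for food, compound in food_compound_pairs:
        cid = synonyms.get(compound, compound)  # normalize to the shared identifier
        if cid in deficient:
            supplies[food].add(cid)
    return dict(supplies)
```

Foods that supply none of the deficient compounds drop out here, which keeps the optimization matrix small.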

In another specific example, PICRUSt2 first predicts gene functions from an individual's ribosomal RNA sequence data. The predictions are then compared with the selected baseline dataset by subtracting the baseline enzyme abundance from the individual's enzyme (EC) abundance. If the difference is negative, the individual's data are judged to lack that enzyme, and the enzyme is included in subsequent analyses. If the difference is zero or positive, the enzyme is not lacking and is excluded from subsequent analyses. The result is a table of all enzymes the individual lacks. Compounds associated with these deficient enzymes are then retrieved from the KEGG database to build a table of deficient nutrient compounds. Separately, this invention obtains the nutrient compounds contained in each food from FooDB. It then compiles which foods contain compounds that can remedy the deficient nutrients and uses this as input to the optimization analysis.
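The subtraction rule above (individual minus baseline, negative means deficient) and the KEGG lookup can be sketched as follows. Two assumptions are made here, since the source does not specify either. First, the baseline is summarized as a single value per EC number, such as a group median. Second, an EC absent from the individual's table counts as zero abundance. The function names are hypothetical.

```python
def deficient_enzymes(user_ec, baseline_ec):
    """Return {EC: difference} for ECs where user abundance minus baseline is negative.

    user_ec, baseline_ec: {EC number: abundance}. ECs missing from user_ec count as 0.
    """
    deficits = {}
    for ec, base in baseline_ec.items():
        diff = user_ec.get(ec, 0.0) - base
        if diff < 0:  # zero or positive differences are not deficiencies
            deficits[ec] = diff
    return deficits


def deficient_compounds(deficits, ec_to_cpds):
    """Collect the compounds linked (e.g. via KEGG) to every deficient EC."""
    compounds = set()
    for ec in deficits:
        compounds |= ec_to_cpds.get(ec, set())
    return compounds
```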

In one specific example, the starting point is the table of deficient nutrient compounds obtained above. Each food typically supplies many different nutrients, a one-to-many relationship. Eating every food that could address a deficiency would therefore risk excessive intake and is impractical. This invention instead uses integer linear programming (ILP) to arrange the candidate foods and the individual's deficient nutrient compounds into a matrix. It then solves the model with a mathematical optimization package, achieving the technical effect of meeting all deficient nutrient requirements with the fewest foods.
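One natural reading of "the fewest foods that cover every deficient nutrient" is a 0-1 set-cover ILP, described here as an assumption rather than the invention's exact model. Each binary variable x_f means "recommend food f". The objective minimizes the sum of x_f, subject to sum over f of A[c][f]·x_f ≥ 1 for every deficient compound c that at least one food supplies. The sketch builds the matrix A and solves this tiny ILP exactly by enumerating subsets in increasing size. That is only viable for a handful of foods. With 962 foods, the same objective and matrix would go to a real solver, for example `scipy.optimize.milp` or an external MILP package.

```python
from itertools import combinations


def build_matrix(supplies, compounds):
    """A[c][f] = 1 if food f supplies compound c (rows: compounds, columns: foods)."""
    foods = sorted(supplies)
    comps = sorted(compounds)
    A = [[1 if c in supplies[f] else 0 for f in foods] for c in comps]
    return foods, comps, A


def min_food_cover(supplies, compounds):
    """Exactly solve: minimize sum(x_f) s.t. A x >= 1, x binary (tiny inputs only).

    supplies: {food: set of deficient compounds it supplies}.
    compounds: deficient compound IDs.
    """
    foods, _, A = build_matrix(supplies, compounds)
    # Compounds no food supplies would make the ILP infeasible, so drop those rows.
    rows = [row for row in A if any(row)]
    # Enumerating by subset size means the first feasible subset is optimal.
    for k in range(len(foods) + 1):
        for chosen in combinations(range(len(foods)), k):
            if all(any(row[j] for j in chosen) for row in rows):
                return [foods[j] for j in chosen]
    return []
```

Dropping uncoverable compounds is a design choice. A report would likely list them separately, since no food in the database can address them.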

In one specific example, the invention collects data from the same subject at two points: a pre-test, and a post-test several weeks or months after dietary advice is given. Comparing the two projects reduces experimental bias and yields more accurate nutrient intake and food recommendations. In the technical solution provided by this invention, each nutrient is classified for a subject as either "deficient" or "sufficient" by comparison with the system's baseline database. These results are qualitative. As more microbiota data are collected, this invention can also combine artificial intelligence (AI) model predictions with the subject's physiological data to provide quantitative recommended nutrient intakes. This allows the subject's nutritional needs to be assessed more precisely.
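Because each nutrient gets a binary deficient/sufficient label, a pre-test versus post-test comparison reduces to set operations. A minimal sketch follows. The category names `resolved`, `persistent`, and `new` are hypothetical labels, not terms from the source.

```python
def compare_pre_post(pre_deficient, post_deficient):
    """Compare deficient-nutrient sets from a pre-test and a post-test."""
    pre, post = set(pre_deficient), set(post_deficient)
    return {
        "resolved": sorted(pre - post),    # deficient before, sufficient after
        "persistent": sorted(pre & post),  # deficient at both time points
        "new": sorted(post - pre),         # sufficient before, deficient after
    }
```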

In a representative embodiment, referring to Figure 1, RNA samples are first extracted from a biological sample and subjected to next-generation sequencing. This produces ribosomal RNA sequence data, which constitute the metagenomic data 1 of this invention. The metagenomic data 1 are input into a computer system, which performs a species classification procedure 2 to obtain the corresponding microbial community or species. A gene function prediction procedure 3 is performed to obtain the gene functions of the microbial community or species, expressed as enzymes or metabolic pathways. A differential analysis procedure 4 analyzes the abundance of each enzyme and its difference from the baseline dataset to identify deficient enzymes or metabolic pathways. The corresponding nutrients or compounds are obtained from the KEGG database. A food screening procedure 5 screens a food database for foods that provide the nutrients or compounds compensating for the deficient enzymes or metabolic pathways, and builds a recommended food list. Finally, a linear programming procedure 6 optimizes the recommended food list using integer linear programming. This completes the nutrient intake requirement system and the metagenome and food nutrient association database 7 of this invention.

In one specific example, the species classification procedure of this invention was performed to obtain the microbial community or species corresponding to the metagenomic data; the results are shown in Table 1.

In one specific example, a differential analysis procedure was performed to analyze enzyme abundance and its difference from the baseline dataset; the results are shown in Table 2.

Table 2.

In one specific example, Table 3 shows the nutrients or compounds obtained from the KEGG database for the deficient enzymes or metabolic pathways.

In one specific example, dietary recommendations derived from comparison with the baseline dataset are shown in Table 4. The left-hand columns list the foods and their categories. The frequency column gives how often each food was recommended across the whole group (experimental or control). The right-hand columns list the food IDs recommended for three excerpted samples, and each ID can be looked up in the left-hand columns.
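The group-level frequency column could be computed as sketched below. This assumes "frequency" means the fraction of samples in a group whose recommendation list includes the food; the source does not define it precisely. The input shape `{sample_id: [food IDs]}` is hypothetical.

```python
from collections import Counter


def recommendation_frequency(recommendations):
    """Return {food_id: fraction of samples in the group recommending that food}.

    recommendations: {sample_id: list of recommended food IDs}.
    """
    n = len(recommendations)
    if n == 0:
        return {}
    # set() so a food listed twice for one sample is counted once for that sample.
    counts = Counter(food for foods in recommendations.values() for food in set(foods))
    return {food: c / n for food, c in counts.items()}
```

Running this separately on the experimental and control groups gives two frequency columns that can be compared side by side.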

In summary, the technical means of this invention proceed as follows:

- Species classification, prediction of enzymes from gene functions, and analysis of species abundance differences output species-level microbiota difference values.
- Enzyme difference analysis subtracts the selected baseline dataset from the user's data to identify deficient enzymes and outputs a detailed enzyme deficiency table.
- The deficient enzymes are linked through the KEGG database to their corresponding nutrient compounds, which are compiled into a table of deficient nutrient compounds.
- The nutrient compounds contained in each food are looked up in FooDB to identify candidate foods that can supply those nutrients.
- The candidate foods and deficient nutrients are assembled into a large matrix and optimized with integer linear programming (ILP) to output individual and group food recommendation lists.

The nutrient intake requirement system and the metagenome and food nutrient association database established by this invention let users perform differential analysis across projects. Users can compare food recommendation results and recommendation frequencies, and then design diets and nutrient intake to meet individual needs, achieving the technical effect of precision nutrition.

The system and database established by the method of this invention give industry a systematic understanding of the links between the gut microbiota and dietary nutrition. They also help industry develop precise dietary recommendations and nutritional improvement plans. This invention can therefore be widely applied in nutrition- and food-related industries, to individuals or groups of humans, pets, and livestock.

Although the present invention has been described above with reference to specific examples, these examples do not limit its scope. Those skilled in the art will understand that various modifications or alterations can be made without departing from the spirit and scope of the invention.

Claims

A method for establishing a nutrient intake requirement system using metagenomic data, comprising the following steps: (1) providing metagenomic data, the metagenomic data comprising ribosomal RNA sequencing data selected from prokaryotic 16S ribosomal RNA sequencing data, eukaryotic 18S ribosomal RNA sequencing data, prokaryotic 23S ribosomal RNA sequencing data, eukaryotic 28S ribosomal RNA sequencing data, or a combination thereof; (2) performing a species classification procedure to obtain the microbial community or species corresponding to the metagenomic data; (3) performing a gene function prediction procedure to obtain the gene functions of the microbial community or species, wherein the gene functions of the microbial community or species are expressed as enzymes or metabolic pathways; (4) performing a differential analysis procedure that analyzes the abundance of the enzymes or metabolic pathways and their differences from a benchmark dataset to identify deficient enzymes or metabolic pathways, and retrieving the nutrients or compounds corresponding to the deficient enzymes or metabolic pathways from the KEGG database; (5) performing a food screening procedure that uses a food database to screen the nutrients or compounds capable of supplementing the deficient enzymes or metabolic pathways and establishes a recommended food list from them; and (6) performing a linear programming procedure to optimize the recommended food list, thereby establishing the nutrient intake requirement system, wherein the system contains information including the composition of the microbial community or species, enzyme differences relative to the benchmark dataset, metabolic pathway differences, nutrient deficiency analysis, and dietary recommendations.

The method for establishing a nutrient intake requirement system using metagenomic data as described in claim 1, wherein the microbial community or species is an intestinal flora community or species.

The method for establishing a nutrient intake requirement system using metagenomic data as described in claim 1, wherein the food database comprises the FooDB food and compound database.

The method for establishing a nutrient intake requirement system using metagenomic data as described in claim 1, wherein the linear programming procedure is an integer linear programming procedure.

The method for establishing a nutrient intake requirement system using metagenomic data as described in claim 1, wherein the nutrient intake requirement system is applied as a guide for balanced animal diets or as a formulation design guide for nutritional supplements.

A method for establishing a metagenomic and food nutrient association database, comprising the following steps: (1) providing metagenomic data, the metagenomic data comprising ribosomal RNA sequencing data selected from prokaryotic 16S ribosomal RNA sequencing data, eukaryotic 18S ribosomal RNA sequencing data, prokaryotic 23S ribosomal RNA sequencing data, eukaryotic 28S ribosomal RNA sequencing data, or a combination thereof; (2) performing a species classification procedure to obtain the microbial community or species corresponding to the metagenomic data, wherein the microbial community or species is an intestinal flora community or species; (3) performing a gene function prediction procedure to obtain the gene functions of the microbial community or species, wherein the gene functions of the microbial community or species are expressed as enzymes or metabolic pathways; (4) performing a differential analysis procedure that analyzes the abundance of the enzymes and their differences from a benchmark dataset to identify deficient enzymes or metabolic pathways, and retrieving the nutrients or compounds corresponding to the deficient enzymes or metabolic pathways from the KEGG database; (5) performing a food screening procedure that uses a food database to screen the nutrients or compounds capable of supplementing the deficient enzymes or metabolic pathways and establishes a recommended food list from them; and (6) performing a linear programming procedure to optimize the recommended food list, thereby establishing the metagenomic and food nutrient association database, wherein the database contains information including the ribosomal RNA sequencing data, the composition of the microbial community or species, enzyme differences relative to the benchmark dataset, metabolic pathway differences, nutrient deficiency analysis, and dietary recommendations.

The method for establishing a metagenomic and food nutrient association database as described in claim 6, wherein the food database comprises the FooDB food and compound database.

The method for establishing a metagenomic and food nutrient association database as described in claim 6, wherein the linear programming procedure is an integer linear programming procedure.