Q&A of Metabolomic Data Analysis

What criteria are generally used for differential metabolite screening?

A: VIP values from multivariate statistical models and P values from univariate tests (such as the t-test) are generally used together to screen for differential metabolites. Univariate methods such as the t-test and ANOVA focus on changes in each metabolite's level independently. Multivariate analysis focuses on the relationships between metabolites, including their synergistic or antagonistic relationships in biological processes. Considering both types of results lets us examine the data from different perspectives and helps avoid the false positives or model overfitting that can arise from relying on only one type of method.
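
As background on where the VIP threshold comes from, the sketch below computes VIP (variable importance in projection) with the commonly used formula, in which each variable's normalized squared weight is averaged across components in proportion to the Y variance each component explains. The function name `vip_scores`, the weight vectors, and the explained-variance values are hypothetical illustrations and are not taken from any particular software package. A useful property of this definition is that the mean of the squared VIP values equals 1, which is why VIP > 1 is read as "more important than average".

```python
import math


def vip_scores(weights, ssy_per_component):
    """Return one VIP value per variable.

    weights: list of PLS weight vectors, one per component (each of length p).
    ssy_per_component: Y variance explained by each component.
    """
    n_vars = len(weights[0])
    total_ssy = sum(ssy_per_component)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w, ssy in zip(weights, ssy_per_component):
            norm_sq = sum(v * v for v in w)  # squared length of the weight vector
            acc += ssy * (w[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ssy))
    return scores


# Hypothetical weights for 4 metabolites and 2 components (illustration only).
weights = [[0.8, 0.5, 0.3, 0.1], [0.1, 0.2, 0.6, 0.7]]
ssy = [0.6, 0.2]
vip = vip_scores(weights, ssy)
mean_squared_vip = sum(v * v for v in vip) / len(vip)
```

With these made-up numbers the first metabolite dominates the first (most explanatory) component and therefore gets a VIP above 1.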

The screening thresholds are generally VIP > 1 and P < 0.05. If this yields a large number of differential metabolites, a fold change (FC) threshold can be added as a further screening condition.
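
A minimal sketch of this screening rule, assuming the VIP values, P values, and log2 fold changes have already been computed elsewhere. The function name `screen_metabolites`, the field names, and the example rows are hypothetical; the optional fold-change cutoff corresponds to the additional screening condition mentioned above.

```python
def screen_metabolites(rows, vip_min=1.0, p_max=0.05, min_abs_log2_fc=None):
    """Return names of metabolites with VIP > vip_min and P < p_max.

    If min_abs_log2_fc is given, also require |log2 fold change| >= that value.
    """
    selected = []
    for row in rows:
        if row["vip"] <= vip_min or row["p"] >= p_max:
            continue
        if min_abs_log2_fc is not None and abs(row["log2_fc"]) < min_abs_log2_fc:
            continue
        selected.append(row["name"])
    return selected


# Made-up values for illustration only.
rows = [
    {"name": "metabolite_A", "vip": 1.8, "p": 0.003, "log2_fc": 1.6},
    {"name": "metabolite_B", "vip": 1.2, "p": 0.040, "log2_fc": 0.4},
    {"name": "metabolite_C", "vip": 0.7, "p": 0.010, "log2_fc": 2.1},
    {"name": "metabolite_D", "vip": 1.5, "p": 0.200, "log2_fc": 1.1},
]
default_hits = screen_metabolites(rows)
strict_hits = screen_metabolites(rows, min_abs_log2_fc=1.0)
```

Here `default_hits` keeps metabolites A and B, and adding the fold-change cutoff narrows the list to A.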

What should I do if I find no differential metabolites?

A: If the commonly used thresholds (VIP>1 and P<0.05) yield no differential metabolites, the thresholds can be adjusted. Note that stricter cutoffs such as VIP>1.5 or P<0.01 can only shorten the list, so they suit the opposite case of too many hits; here the thresholds would have to be relaxed instead. If still no differential metabolites are found, KEGG pathway analysis can be performed on all detected substances. The metabolic pathways in which these metabolites participate are examined to see whether compensatory (replenishment) pathways exist and whether the pathways are associated with the disease.

What is the difference between PLS-DA and OPLS-DA models?

A: Compared with PLS-DA, OPLS-DA adds an orthogonal signal correction step, which filters out variation that is irrelevant (orthogonal) to the group classification. For example, when between-group differences are relatively small and within-group differences are relatively large, the variables selected by VIP in PLS-DA may actually reflect within-group differences, which is easily misleading, whereas OPLS-DA can isolate the between-group differences more accurately.

In the PCA and OPLS-DA models, some samples fall outside the 95% confidence interval. Do such data need to be excluded?

A: Excluding them is not recommended. It is normal for individual samples to fall outside the 95% confidence interval, and this generally does not affect the subsequent data analysis.

What is the basis for deciding whether to extract 2 or 3 principal components in PCA?

A: In SIMCA, the decision is based on Q2. When adding a principal component causes Q2 to decrease, the model is becoming overfitted, so no further components should be added.
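
The stopping rule can be sketched as follows, assuming the cumulative Q2 after each added component is already available (for example, read from the model summary). This is a simplification for illustration and not a reproduction of SIMCA's autofit logic; the function name `choose_n_components` and the Q2 values are hypothetical.

```python
def choose_n_components(q2_cumulative):
    """Return how many components to keep before cumulative Q2 first decreases."""
    n_keep = 0
    best = float("-inf")
    for q2 in q2_cumulative:
        if q2 < best:  # Q2 dropped: the extra component is fitting noise
            break
        best = q2
        n_keep += 1
    return n_keep


# Made-up cumulative Q2 values after 1, 2, 3 and 4 components.
q2_by_component = [0.31, 0.47, 0.44, 0.40]
n_components = choose_n_components(q2_by_component)
```

With these values Q2 rises from the first to the second component and falls at the third, so two components are kept.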

Why is the explanation rate (explained variance) of a PCA/OPLS-DA model sometimes very low?

A: It is usually related to the samples themselves. It also depends on the transformation and scaling applied to the data. In this case, we can adjust the normalization method used in data processing and the transformation and scaling methods used in modeling, and observe whether the result improves.
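
To make the transform and scaling options concrete, here is a small sketch of three choices that are common in metabolomics: unit-variance (UV) scaling, Pareto scaling, and a log10 transform. Which options a given workflow actually offers is an assumption here, and the function names and peak areas are hypothetical.

```python
import math
import statistics


def uv_scale(values):
    """Unit-variance scaling: mean-center, then divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pareto_scale(values):
    """Pareto scaling: mean-center, then divide by the square root of the SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / math.sqrt(sd) for v in values]


def log_transform(values):
    """Log10 transform; compresses large values (inputs must be positive)."""
    return [math.log10(v) for v in values]


# Made-up peak areas of one metabolite across four samples.
areas = [1200.0, 1500.0, 900.0, 30000.0]
uv = uv_scale(areas)
par = pareto_scale(areas)
logged = log_transform(areas)
```

UV scaling gives every metabolite the same variance, Pareto scaling keeps part of the original magnitude differences, and the log transform reduces the influence of a single very large value such as the last sample here.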

Does a Q2 value of less than 0.5 for PLS model cross-validation mean that the model cannot be used?

A: In general, the closer the Q2 value is to 1, the better the model's predictive ability, but there is no strict requirement that Q2 be greater than 0.5. A Q2 below 0.5 means that the model's predictive ability and reliability are limited, but the model can still be used.

The Q2 value is used as a reference for judgment, and is not absolute.
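
For reference, Q2 is commonly defined as 1 - PRESS/SS, where PRESS is the sum of squared errors of the cross-validated predictions and SS is the total sum of squares of the mean-centered response. The sketch below applies that basic definition; software packages may accumulate Q2 component by component, so treat this as an illustration rather than a specific package's calculation. The function name `q2_score` and the numbers are hypothetical.

```python
def q2_score(observed, predicted_cv):
    """Q2 = 1 - PRESS / SS, using cross-validated predictions."""
    mean_y = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted_cv))
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - press / ss_total


# Made-up class labels (0 = control, 1 = treated) and cross-validated predictions.
observed = [0, 0, 0, 1, 1, 1]
predicted = [0.2, 0.4, 0.1, 0.6, 0.9, 0.7]
q2_value = q2_score(observed, predicted)
```

Perfect predictions give Q2 = 1, and predictions worse than simply guessing the mean give a negative Q2, which is why the value is read on a continuous scale rather than against a single hard cutoff.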

If my data volume is not very large and complex, how can I use multivariate methods for analysis?

A: If the data volume is not very large, the same multivariate methods can still be applied in software such as SIMCA. However, with a small data set the model may be overfitted. Multivariate analysis is therefore not mandatory; you can choose other approaches, such as univariate analysis methods.

Isn't multivariate statistical analysis suitable for cases with many variables and small sample sizes? Why is it better to do multivariate statistical analysis with 6 replicates than 3 replicates?

A: In statistical analysis, statistical significance can only be demonstrated with an adequate sample size. In metabolomics, many factors affect metabolism, so a larger sample size reduces the influence of individual differences.

Why is metabolomics analysis usually limited to pairwise comparisons?

A: The main limitation is the OPLS-DA analysis. When more than two groups are compared, it is difficult for the OPLS-DA model to calculate each metabolite's contribution to the differences between groups. An even greater difficulty is giving a reasonable interpretation of the results.

Can the sample size be different for the two comparison groups?

A: Yes, as long as the number of biological replicates in each group meets the minimum requirement.

Does the "area" in "area normalization" refer to the total area of a sample or the total area of all samples?

A: The total area of all substances tested in a sample.
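
In other words, each peak area is divided by the summed area of all peaks in the same sample, so every sample's normalized values add up to 1. A minimal sketch, with a hypothetical function name and made-up peak areas:

```python
def area_normalize(peak_areas):
    """Divide each peak area by the total area of all peaks in the same sample."""
    total = sum(peak_areas.values())
    return {name: area / total for name, area in peak_areas.items()}


# Made-up peak areas for a single sample.
sample = {"metabolite_A": 2000.0, "metabolite_B": 500.0, "metabolite_C": 7500.0}
normalized = area_normalize(sample)
```

The function is applied to each sample separately, never to the pooled areas of all samples.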

How can I find the peak of the substance of interest from the TIC graph?

A: Combine the retention time (RT) and the characteristic mass-to-charge ratio (m/z) to locate the peak of interest.
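
Because the TIC sums the signal of all ions at each time point, retention time alone cannot separate co-eluting compounds, which is why the m/z value is needed as well. The sketch below matches a target against a list of detected features within tolerance windows; the function name `find_peaks`, the tolerance values, and the feature list are hypothetical and would need to be adapted to the instrument and method.

```python
def find_peaks(features, target_rt, target_mz, rt_tol=0.2, mz_tol=0.01):
    """Return features whose RT (minutes) and m/z both fall within tolerance."""
    return [
        f
        for f in features
        if abs(f["rt"] - target_rt) <= rt_tol and abs(f["mz"] - target_mz) <= mz_tol
    ]


# Made-up detected features: retention time in minutes and m/z.
features = [
    {"id": 1, "rt": 5.28, "mz": 180.0630},
    {"id": 2, "rt": 5.31, "mz": 255.2320},  # co-elutes, but different m/z
    {"id": 3, "rt": 9.10, "mz": 180.0636},  # same m/z, but different RT
]
matches = find_peaks(features, target_rt=5.30, target_mz=180.0634)
matched_ids = [f["id"] for f in matches]
```

Only the first feature satisfies both conditions; the other two each match on one criterion alone.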

* For Research Use Only. Not for use in diagnostic procedures.