WinSPC Custom Web Reporter
Advanced statistical analysis tools available in Custom Web Reporter
CWR offers a complete set of advanced analytical tools and rich graphics to support Six Sigma and other complex data analysis. This allows manufacturers to perform dynamic data visualization from the desktop to make better, faster decisions, answer questions, and capitalize on opportunities. The advanced analytical tools in CWR include:
Univariate Statistics: The univariate statistics procedure computes various univariate statistics: mean, median, variance, maximum, minimum, coefficient of variation, corrected sum of squares, geometric mean, standard error of the geometric mean, harmonic mean, standard error of the harmonic mean, interquartile range, interquartile range of the median, kurtosis, standard error of the median, midrange, number of missing cases, first quartile, third quartile, range, sample size, skewness, standard deviation, standard error of the mean, sum, sum of case weights, and number of valid cases. Percentiles may also be computed. The results may be displayed separately for each variable or in summary form for all variables. The results may also be saved for use in other calculations.
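A pass like the one described above can be sketched with the Python standard library alone. This is an illustrative sketch, not CWR's implementation; the function name and sample data are invented, and only a subset of the listed statistics is computed.

```python
# Illustrative univariate-statistics pass using only the stdlib.
import statistics as st

def univariate(data):
    n = len(data)
    mean = st.mean(data)
    sd = st.stdev(data)                 # sample standard deviation
    return {
        "n": n,
        "mean": mean,
        "median": st.median(data),
        "variance": st.variance(data),  # sample variance (n - 1 divisor)
        "std_dev": sd,
        "std_error": sd / n ** 0.5,     # standard error of the mean
        "minimum": min(data),
        "maximum": max(data),
        "range": max(data) - min(data),
        "midrange": (max(data) + min(data)) / 2,
        "cv": sd / mean,                # coefficient of variation
        "sum": sum(data),
    }

sample = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8, 4.3]   # invented measurements
stats = univariate(sample)
```

Returning the results as a dictionary mirrors the ability to save the statistics for use in later calculations.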
Frequency Distributions: The frequency distribution procedure computes a frequency distribution for measurement variables. Rather than computing counts for individual values, this procedure computes counts for values that fall into continuous intervals. The output consists of: lower and upper endpoints of the intervals, frequency counts, and relative and cumulative percentages.
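The interval-based counting described above can be illustrated in a few lines. This is a hedged sketch under the assumption of equal-width bins; the function name and data are invented, and CWR's actual binning rules may differ.

```python
# Frequency distribution over equal-width intervals: counts per bin
# plus relative and cumulative percentages, as in the output described.
def frequency_distribution(values, n_bins=4):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last interval.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    total, cum = len(values), 0
    rows = []
    for i, c in enumerate(counts):
        cum += c
        rows.append({
            "lower": lo + i * width,
            "upper": lo + (i + 1) * width,
            "count": c,
            "relative_pct": 100.0 * c / total,
            "cumulative_pct": 100.0 * cum / total,
        })
    return rows

rows = frequency_distribution([1, 2, 2, 3, 5, 6, 8, 9], n_bins=4)
```

Each row carries the lower and upper endpoints, the frequency count, and the relative and cumulative percentages, matching the output columns listed above.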
Frequency Tables: The frequency table procedure produces 1-way to n-way frequency and cross-tabulation tables and multiple response tables. Frequency tables show the distribution of the values of a variable, with the number of occurrences of each unique value. Cross-tabulation tables show combined frequencies for two or more variables. The results of the cross tabulation may be saved for later use. When the Statistics Module is licensed, the frequency table procedure also performs tests and computes measures of association. For n-way tables, it does stratified analysis, computing statistics within and across strata.
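A two-way cross tabulation of the kind described can be sketched with `collections.Counter`. The record data and labels below are invented for illustration only.

```python
# Two-way cross tabulation: combined frequencies for two variables,
# plus marginal (row/column) totals.
from collections import Counter

records = [
    ("line_1", "pass"), ("line_1", "pass"), ("line_1", "fail"),
    ("line_2", "pass"), ("line_2", "fail"), ("line_2", "fail"),
]
crosstab = Counter(records)                    # cell counts per (row, col) pair
row_totals = Counter(r for r, _ in records)    # marginal totals by first variable
col_totals = Counter(c for _, c in records)    # marginal totals by second variable
```

Higher-way tables follow the same pattern by counting longer tuples of category values.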
Multi-Way Univariate Statistics: The multi-way univariate statistics procedure provides a technique for examining various statistics for dependent or analysis variables among various groupings in a sample or population. The groupings are determined by using categorical class variables; e.g., group the dependent variable GPA by Sex and Class. The default statistics are: frequency count, mean, standard deviation, and number of valid cases. The following statistics may be computed: C.O.V., maximum, mean, midrange, minimum, missing cases, valid cases, range, standard deviation, standard error, sum, sum of case weights, and variance.
Tabular Reporting: The tabular report procedure builds tables of descriptive statistics from classification variables and analysis variables. Tables are constructed in up to three dimensions: stub, banner, and page. The stub (row dimension) and banner (column dimension) may have multiple variables, nested or concatenated.
The body of the table is made up of cells, which contain the information in the table: frequency counts, percentages, means, or other statistics. The cells are defined by the values of the variable, or combination of variables, for the table. In a one-dimensional table, the cells are formed by rows; in a two-dimensional table, by the intersection of rows and columns; and in a three-dimensional table, by the intersection of rows, columns, and pages.
Statistics for each cell are calculated on values from all cases defined by that cell. That is, each value of a classification variable such as Academic_class, freshman, sophomore, etc., defines a cell. When calculating statistics for an analysis variable such as GPA, statistics are calculated for the values of GPA that correspond to the different academic classes.
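The cell-by-cell computation described above amounts to grouping the analysis variable by the classification variable and computing statistics per group. The sketch below uses the document's GPA-by-academic-class example with invented values; the structure, not the data, is the point.

```python
# Cell statistics for an analysis variable (GPA) within each level of a
# classification variable (academic class).
import statistics as st
from collections import defaultdict

rows = [
    ("freshman", 3.2), ("freshman", 2.8), ("sophomore", 3.5),
    ("sophomore", 3.1), ("sophomore", 3.3), ("junior", 3.9),
]
cells = defaultdict(list)
for academic_class, gpa in rows:
    cells[academic_class].append(gpa)

summary = {
    cls: {"n": len(v), "mean": st.mean(v),
          "std_dev": st.stdev(v) if len(v) > 1 else 0.0}
    for cls, v in cells.items()
}
```

Each entry in `summary` corresponds to one cell of the table: the statistics are computed only from the GPA values belonging to that academic class.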
Graphics Capabilities: CWR provides procedures to graphically explore the shapes, patterns, and relationships of your data. Graphics are available for:
- scatter or curve, contour, bubble, sunflower
- scatterplot matrix
One- And Two-Sample Inference
CWR provides procedures for testing and estimation in one- or two-sample problems, covering both “continuous” responses and exact tests and other inferences for proportions. For the one-sample case, a confidence interval for the population mean is provided, along with an optional test of a hypothesized mean.
For the two-sample case and the paired-data case, a test for equal population means is provided along with confidence limits for the difference in means. Some diagnostics are provided, indicating when the procedures may not be appropriate. In these situations, more robust procedures may be used: the Location procedure provides inference about either the population mean or median, and the Dispersion procedure provides inference about either the population standard deviation or interquartile range (IQR), based on a single sample.
The Location and Dispersion procedures include diagnostics to indicate when methods for normally-distributed data are not suitable, along with suggestions as to how to proceed in such cases. An approximation to the Shapiro-Wilk W test is used to test for normality.
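The one-sample interval and test can be sketched as follows. This is a simplified illustration, not CWR's method: it uses a large-sample normal quantile in place of the t quantile, and the function names and data are invented.

```python
# One-sample inference: confidence interval for the mean plus a z-style
# test of a hypothesized mean (normal approximation to the t procedure).
import statistics as st
from statistics import NormalDist

def mean_ci(data, confidence=0.95):
    n = len(data)
    mean = st.mean(data)
    se = st.stdev(data) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # ~1.96 for 95%
    return mean - z * se, mean + z * se

def z_test(data, hypothesized_mean):
    n = len(data)
    se = st.stdev(data) / n ** 0.5
    z = (st.mean(data) - hypothesized_mean) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided p-value
    return z, p

sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
lo, hi = mean_ci(sample)
```

For the small samples common in practice, a t quantile with n-1 degrees of freedom would widen the interval; the normal quantile is used here only to keep the sketch stdlib-only.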
The following procedures are also available for two samples:
- Compare Location provides inferences comparing either the population means, medians, or geometric means.
- Compare Dispersion provides inferences comparing either the population standard deviations or the interquartile ranges.
- Guided Compare provides an interactive, guided comparison of location for two samples.
Each of these procedures includes diagnostics to indicate when methods for normally distributed data are not suitable, and suggestions as to how to proceed in such cases. The following rank methods are included:
- Wilcoxon test for comparing two independent samples
- Sign and Signed Rank test for paired data
- Median test for two independent samples
- Runs test
- Kolmogorov-Smirnov test for comparing two samples
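One of the simplest of the rank-based methods above, the sign test for paired data, can be shown exactly with the standard library. This sketch assumes invented before/after measurements; under the null hypothesis of no shift, the number of positive differences is Binomial(n, 0.5).

```python
# Exact sign test for paired data: count positive differences and
# compute a two-sided p-value from the Binomial(n, 0.5) tail.
from math import comb

def sign_test(before, after):
    diffs = [b - a for b, a in zip(before, after) if b != a]  # drop ties
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    k = min(pos, n - pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, min(1.0, 2 * tail)

before = [12, 14, 11, 13, 15, 12, 14, 13]   # invented paired data
after  = [10, 13, 11, 12, 13, 11, 12, 12]
pos, p = sign_test(before, after)
```

The signed-rank and Wilcoxon tests add the magnitudes of the differences via ranks; the sign test uses only their directions, which is why it fits in a few lines.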
CWR provides enumerative data procedures for:
- Binomial data, which includes both one- and two-sample applications and regression models. Binomial regression performs maximum likelihood fitting of regression models where the data are proportions, following the binomial distribution, using logistic (logit) or probit models.
- Poisson Regression for maximum likelihood fitting using a loglinear model.
- Contingency tables, including one-way to n-way frequency and crosstabulation tables and multiple response tables.
For n-way tables, CWR does stratified analysis, computing statistics within and across strata. The following statistics can be requested:
- Likelihood Ratio Chi-square
- Mantel-Haenszel Chi-square
- Phi Coefficient
- Contingency Coefficient
- Cramer's V
For 2 X 2 tables, the following are also computed:
- Continuity Adjusted Chi-square
- Fisher Exact Test (1-tail and 2-tail)
- McNemar's Test (+ continuity adjusted)
For tests across strata, the Cochran-Mantel-Haenszel correlation statistic (df=1) may be computed for an n-way table. If all of the tables are 2 X 2, then summary estimates of the relative risk are also computed. The following measures of association and their asymptotic standard errors can be requested:
- Gamma
- Kendall’s Tau b
- Stuart’s Tau c
- Somers’ D
- Pearson's Correlation
- Lambda Asymmetric
- Uncertainty Coefficient
- Uncertainty Coefficient Symmetric
For 2 X 2 tables, relative risk estimates plus confidence intervals are computed. Also, loglinear models may be fitted via:
- The Parameter Estimates procedure which uses a Newton-Raphson method to find parameter estimates and standard errors for such models.
- The Fitted Values procedure which uses iterative proportional fitting and does not give parameter estimates. It is mainly used to determine whether interactions are significant, and to fit models assuming specified higher order interactions are absent.
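Two of the 2 X 2 statistics discussed above, the Pearson chi-square and the relative risk with a 95% confidence interval, can be computed directly. The cell counts below are invented, and the CI uses the standard log-scale normal approximation; this is an illustration, not CWR's code.

```python
# 2 x 2 contingency table: Pearson chi-square (shortcut formula) and
# relative risk with a log-scale 95% confidence interval.
from math import log, exp
from statistics import NormalDist

#              outcome+  outcome-
table = [[30, 70],        # exposed      (invented counts)
         [15, 85]]        # unexposed

(a, b), (c, d) = table
n = a + b + c + d

# Pearson chi-square via the 2 x 2 shortcut formula.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Relative risk: P(outcome | exposed) / P(outcome | unexposed).
rr = (a / (a + b)) / (c / (c + d))
se_log = (1 / a - 1 / (a + b) + 1 / c - 1 / (c + d)) ** 0.5
z = NormalDist().inv_cdf(0.975)
ci = (exp(log(rr) - z * se_log), exp(log(rr) + z * se_log))
```

The stratified Cochran-Mantel-Haenszel statistic generalizes this by pooling the per-stratum 2 X 2 information before testing.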
Analysis Of Variance
CWR provides several parametric and nonparametric procedures for analysis of variance.
- The one-way procedure includes the post-hoc tests: Fisher's LSD, Tukey's W, Newman-Keuls, Duncan's New Multiple Range and Scheffe's S.
- N-way factorial designs with either balanced or unbalanced data, provided there are no empty cells.
- Repeated measures such as split-plot and changeover designs with either balanced or unbalanced cell sizes; missing cells are not supported.
- Analysis of Covariance for a one-way treatment design and one numerical covariable.
- The General Linear Models procedure provides for regression models with factors specified by matrices, each matrix containing one or more columns of covariables; it also provides for both univariate and multivariate analysis.
- Kruskal-Wallis one-way rank ANOVA.
- Friedman ANOVA by ranks for randomized block designs, including Kendall’s coefficient of concordance.
- Cochran’s Q test for matched frequencies.
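The F statistic underlying the one-way procedure can be sketched in a few lines; the function name and groups below are illustrative, not CWR's API, and post-hoc tests such as Tukey's W are omitted.

```python
# One-way ANOVA: partition total variation into between-group and
# within-group sums of squares, then form the F ratio.
import statistics as st

def one_way_anova(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (st.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - st.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

groups = [[5.1, 4.9, 5.3], [6.0, 6.2, 5.8], [4.2, 4.4, 4.0]]  # invented
f, df1, df2 = one_way_anova(groups)
```

A large F relative to the F(df1, df2) distribution indicates that at least one group mean differs; the post-hoc tests then identify which pairs differ.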
CWR provides both parametric and nonparametric procedures for computing correlation analysis. The Pearson product-moment and Spearman rank order correlation coefficients are calculated. Options for calculating t-tests and computing with case weights are also provided. Correlation matrices may be saved and used as input into other procedures.
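The parametric/nonparametric pairing above can be illustrated by computing Pearson's coefficient on raw values and Spearman's as Pearson's on ranks. This sketch skips tie handling and case weights; the names and data are invented.

```python
# Pearson product-moment correlation, and Spearman rank-order
# correlation implemented as Pearson on the ranks.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(v):
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)          # no tie handling in this sketch
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
```

A monotone but nonlinear relationship (e.g. y = x³) shows the difference: Spearman's coefficient stays at 1 while Pearson's drops below it.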
CWR provides procedures for simple, multiple, stepwise, all possible subset, binomial, Poisson, Weibull, and nonlinear regression. Its simple and multiple linear regression models use least squares or weighted least squares methods. Optional statistics and output for simple regression include:
- Beta covariance and correlation matrices, variance inflation factor, partial correlations, and semi-partial correlations
- Collinearity diagnostics
- Influence statistics: residual, standard error of residual, Studentized residual, Studentized residual with current observation deleted, Cook’s D influence statistic, leverage, Durbin-Watson, sum of residuals, sum of squared residuals, press statistic, and the minimum and maximum residual
- Predicted diagnostics: predicted value, standard error of the individual predicted value, standard error of the mean predicted value, 95% confidence intervals for individual and mean predicted value
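A simple least-squares fit together with two of the diagnostics listed above, residuals and leverage, can be sketched with the standard library. The data and function name are invented; this illustrates the arithmetic, not CWR's implementation.

```python
# Simple OLS fit of y on x, with per-observation residuals and leverage.
def simple_ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    # Leverage in simple regression: h_i = 1/n + (x_i - mean(x))^2 / Sxx.
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    return slope, intercept, residuals, leverage

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # invented responses
slope, intercept, resid, lev = simple_ols(x, y)
```

Two built-in checks on any OLS fit: the residuals sum to zero, and the leverages sum to the number of fitted parameters (here 2, for slope and intercept).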
CWR's stepwise multiple regression includes weighted least squares, using either the forward selection, backward elimination, stepwise, or maximum R2 method. Options include those for simple regression and also Mallows' Cp.
Graphical diagnostics for multiple regression include:
- Partial residual plots for detecting nonlinearity.
- Leverage plots for detecting observations which may be having inordinate influence on the regression fitting.
- Residual analysis which displays either the fitted values or any one of the independent variables plotted against any one of: Cook’s D, leverage values, predicted values, or various versions of the residuals (standardized, studentized, studentized based on deletion, etc.).
- Ridge trace analysis which shows how regression coefficients change in “ridge regression” as the value of the “ridge parameter” is increased.
- Linear and Polynomial displays the ordinary least squares fit of Y on X, X², X³, or X⁴, superimposed over a scatterplot of the data.
Binomial regression performs maximum likelihood fitting of regression models where the data are proportions, following the binomial distribution, using logistic (logit) or probit models. Poisson regression performs maximum likelihood fitting of regression models where the response is a Poisson variable, using a loglinear model.
Nonlinear regression fits models by least squares or weighted least squares using one of four methods: Gauss-Newton, modified Gauss-Newton, Marquardt, or DUD (doesn't use derivatives). Grid searches for initial estimates may be requested as well as specifying a loss function to be minimized.
CWR's all possible subsets regression is performed using one of four methods: maximizing R2, maximizing adjusted R2, minimizing mean square error, or minimizing Mallows’ Cp.
CWR provides a variety of multivariate analysis procedures:
- Multivariate analysis of variance, including repeated measures and profile analysis.
- Principal components analysis which provides standardized or unstandardized principal component scores.
- Factor analysis which provides five methods of factor extraction: principal components, iterated principal components, image, alpha factor analysis, and principal factor analysis. A scree plot and Bartlett's sphericity test are also available. Three methods of orthogonal rotation (varimax, equamax, and quartimax) and the promax oblique rotation are provided. Plots of all loadings and rotated loadings can be requested, and factor scores can be calculated and saved.
- Canonical correlation analysis and canonical redundancy analysis whose output consists of eigenvalues, canonical correlations, variance ratio, chi-square statistic, and standardized canonical coefficients. Options are provided for calculating among and between group correlations, canonical loadings, cross loadings, Stewart and Love redundancy analysis, orthogonal rotation of the loadings, and plots of the loadings.
- Cluster analysis using either centroid linkage with euclidean, chi-square or phi-square distance measure or K Means clustering with initial cluster estimation.
- Discriminant analysis can optionally save the Mahalanobis’ distances of each observation to each group mean, probabilities for the Mahalanobis’ distances, classifications, posterior probabilities, and the group means and within-groups covariance matrix.
Other types of discriminant analysis include:
- Stepwise discriminant analysis, which adds predictor variables stepwise, can optionally save the classifications, posterior probabilities, group means, and within-groups covariance matrix.
- Quadratic discriminant analysis, in which the data are assumed to come from a population that has a multivariate normal distribution but the equality of the covariance matrices of the groups is not assumed, can optionally save the classifications, posterior probabilities, and the group means.
- K nearest neighbor discriminant analysis is non-parametric and makes no assumption about the underlying distribution of the data.
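The nonparametric k-nearest-neighbor classifier above is simple enough to sketch directly: classify a point by the majority label among its k closest training observations. The training data and function name are invented for illustration.

```python
# k-nearest-neighbor classification with Euclidean distance and
# majority vote among the k closest training points.
from collections import Counter

def knn_classify(train, point, k=3):
    # train: list of (features, label) pairs.
    by_dist = sorted(train, key=lambda t: sum((a - b) ** 2
                                              for a, b in zip(t[0], point)))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B")]
```

Because classification depends only on distances, no distributional assumption about the groups is needed, which is exactly the point of the nonparametric variant.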
Time Series Analysis
CWR’s time series analysis procedures include:
- Estimating the parameters of an ARIMA model (Box-Jenkins) and generating forecasts for seasonal and nonseasonal models.
- Analyzing auto-regressive vector models. This is suitable for forecasting, where typically one of the coordinates of the time series is the variable of primary interest and the others are associated variables which might aid in the forecast.
- Computing and plotting the autocorrelation function.
- Computing and plotting seasonal or periodic averages to assist in identifying seasonal trends.
- Computing and plotting the cross-correlation function.
- Computing the lagged difference of a variable.
- Performing a Difference-Sign test of randomness.
- Computing and plotting the partial autocorrelation function (used to help identify the AR parameters for the ARIMA procedure).
- Computing polynomial distributed lag regression, also known as an Almon lag. A regression is performed on the dependent variable and its lags, and optionally, other exogenous variables.
- Performing a test of randomness based on the ranks of the data for detecting trends in data.
- Performing one or more of: moving average, single or double exponential smoothing, Holt’s two-parameter smoothing, Winters’ three-parameter smoothing, and Classical Decomposition forecasting.
- Performing a test of randomness based on the number of turning points in the data.
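Of the smoothing methods listed, single exponential smoothing is the simplest to show: each smoothed value is a weighted blend of the newest observation and the previous smoothed value. The series and smoothing constant below are invented for illustration.

```python
# Single exponential smoothing: s[0] = x[0];
# s[t] = alpha * x[t] + (1 - alpha) * s[t-1].
def exp_smooth(series, alpha=0.5):
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10.0, 12.0, 11.0, 13.0, 12.0]
smoothed = exp_smooth(series)
```

Double exponential smoothing and the Holt and Winters variants extend this recursion with trend (and, for Winters, seasonal) components.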
Reliability And Survival Analysis
These procedures are for the analysis of response-time data, also called survival analysis. They include:
- Kaplan-Meier estimator of the survival curve from censored data.
- Cox regression, which relates response times to explanatory variables in a way which does not require specification of the distribution of the response times.
- Weibull analysis, which offers a one-sample procedure to fit a Weibull distribution to possibly censored response-time data, and a regression procedure for relating response times to explanatory variables (which could include treatments and thus be used for two-sample problems).
Even though the assumptions are different, the formulations of the Weibull and Cox regression models have strong similarities. Either can be considered a "proportional hazards" model. For the Weibull case, the hazard function is assumed to have a simple parametric form, while for Cox regression this form need not be specified. Weibull methods will often be more useful in reliability work, and Cox regression in biostatistics.
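The Kaplan-Meier estimator mentioned above can be sketched compactly: at each distinct event time, the survival estimate is multiplied by (1 - deaths / number at risk), and censored cases leave the risk set without contributing an event. The times and censoring flags below are invented.

```python
# Kaplan-Meier survival curve from possibly censored response times.
def kaplan_meier(times, events):
    # times: response times; events: 1 = event observed, 0 = censored.
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = at_this_time = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            at_this_time += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))      # step down at each event time
        n_at_risk -= at_this_time
    return curve

curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

The returned list of (time, survival) pairs traces the characteristic step function; censored observations reduce the risk set but produce no step.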
Pie Chart. Displays several values, each as a slice of a pie. Each slice may be labeled with the percentage of the total pie it represents.
Bar/Line Charts. Display one or more sets of Y values in relation to a single X value. CWR allows a wide variety of attributes for individual Bar/Line items to be controlled. CWR supports 7 different styles of Bar/Line charts:
- Bar Chart. Displays vertical or horizontal bars next to each other.
- Area Chart. Plots one or more variables with the area between the X axis and the values filled in, creating a colored, shaded or pattern filled area.
- Curve (spline) Chart. Plots a fitted curve through each value in the variable.
- Line Chart. Plots one or more variables in a fashion similar to that of an Area Chart, but without the filled area beneath the plot.
- LoWeSS (Locally Weighted Scatterplot Smoother) Chart. Plots values for a variable with a robust smoothed curve fitted to the values added.
- Point Chart / Scatterplot. Plots the values of a variable as individual points on the chart.
- Trend Chart (linear regression). Plots the values of a variable and overlays a "trend" line.
XY Chart. An XY chart typically plots one or more Y values against a single X value. CWR allows up to 6 unique X values to each be plotted against a corresponding unique Y value. There are three types of XY charts:
- Point Chart / Scatterplot. Plots one or more XY pairs of variables on a single chart, showing each pair as an individual point in the plot.
- Line Chart. Plots one or more XY pairs of variables on a single chart, with each plot shown as a connected line.
- Curve (spline) Chart. Plots one or more XY pairs of variables on a single chart, with each plot shown as a smoothed spline curve running through the points in the plot.
- Histogram. Plots a histogram for measurement variables.
- Box Plot. Displays a box plot for each variable in the variable list. The procedure requires a real variable and can handle a grouping variable that is numeric or string.
- Probability Plot. Displays a normal probability plot for a single variable. The purpose of this plot is to show whether the data approximate a normal distribution, which is an important assumption in many statistical analyses.
- Q-Q Plot. Examines the distribution of one variable or compares the distributions of two variables. It may be used to generate any one of three types of plots:
- Percentile plot
- Percentile Comparison
- Empirical Q-Q Plot
- X-Y and Contour Plots:
- XY Plot. Displays a single variable on the X axis and one or more variables on the Y axis. The default XY plot produced is a scatter plot. Each point in the graph is identified by a marker symbol. Where there are multiple Y variables, a different marker is used for each Y variable. An XY plot may also have the points connected; these are called curve plots.
- XYZ Plot. Displays a scatter plot where a classification variable is used to determine groups for a single Y variable. Each of these groups will be plotted as a separate Y variable, up to a maximum of 12 groups.
- Bubble Plot. Displays a single XY plot with the marker symbol size based on a response variable. The marker symbol is always a circle.
- Sunflower Plot. Useful when both the X and Y variables are categorical and the response variable contains counts or frequencies. The values of the response variable are represented by petals; a single point is represented by a dot.
- Contour Plot. For each point (x, y) in an equally-spaced grid of points in the X-Y plane, a representative value of z is computed by local smoothing (fitting a local quadratic regression). Then a contour plot is made representing the relation of these computed z-values to the (x, y) points in the grid.
- Scatterplot Matrix. Displays scatter plot matrices, that is, all variables in the list are plotted against each other. This makes it easy to track an interesting point or group of points from plot to plot. An optional smooth curve can be drawn through each scatterplot to help visualize the relationship between the two variables.
- Function Contour Plot. Displays contour plots of a mathematically-defined relation Z = f(X,Y), as opposed to a contour plot for empirical data. The plot is drawn in a manner that represents three-dimensional relationships in two dimensions. Lines or areas in the plot represent levels of magnitude, Z, corresponding to a position (X,Y) on a plane.