Data continues to become more abundant, and so do the datasets that contain it. Although big datasets can offer insights and opportunities, they can pose significant challenges for statistical analysis. One of the biggest challenges is the computational resources required to process and analyze large datasets. Regression can be problematic in the case of big datasets, due to the...
In this work, the focus is on the Bayesian variable selection problem for high-dimensional linear regression. The use of shrinkage priors when the number n of available observations is less than the number p of explanatory variables is a well-established method with strong theoretical and empirical properties. By using imaginary data and shrinkage priors as baseline...
In many empirical domains, the availability of ultrahigh-dimensional data has led to the development of feature screening and variable selection procedures aiming to detect the informative variables of datasets and consequently remove unimportant features.
In this context, we propose a ranking-based variable selection procedure that extends the Ranking Based Variable Selection technique...
Composite indicators are a common choice for synthesizing complex phenomena. Over the years, they have grown in popularity and are now applied in many social and environmental sciences. One subject of increasing interest is gender equality analysis. Gender composite indicators, even if easy to read, may provide a limited picture of the problem. Here we discuss the potential of...
In the context of students' ability assessment, considering collateral information in addition to item responses can help increase the accuracy of the measurement. In this vein, the evaluation of students' abilities via computer-based devices has made response time data available at the item level (Wang et al., 2019). Besides, the literature (Becker et al., 2022) has highlighted an...
Business and consumer survey data are the basis for several indicators describing the trend of macro-economic variables that are fundamental for monitoring the overall performance of the economic system. Qualitative surveys typically ask interviewees to express their perceptions or expectations about the current or future tendency of a reference economic variable (such as inflation or...
Standard models for categorical and ordinal data, such as log-linear, association models and logistic regression models for binary or ordinal responses, as well as the Mallows model for rank data are revisited and defined through statistical information theoretic properties in terms of the Kullback–Leibler (KL) divergence. In the sequel, replacing the KL by the φ-divergence, which is a family...
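As a rough illustration of how the KL divergence sits inside the φ-divergence family, the sketch below uses the usual Csiszár form D_φ(p, q) = Σ q_i φ(p_i / q_i) for discrete distributions. The function names and the two choices of φ are illustrative assumptions, not taken from the abstract.

```python
import math

def phi_divergence(p, q, phi):
    """Csiszar form: D_phi(p, q) = sum_i q_i * phi(p_i / q_i), with q_i > 0."""
    return sum(qi * phi(pi / qi) for pi, qi in zip(p, q))

def phi_kl(x):
    # phi(x) = x log x - x + 1 recovers the KL divergence; phi(0) = 1 by continuity.
    return x * math.log(x) - x + 1.0 if x > 0 else 1.0

def phi_pearson(x):
    # phi(x) = (x - 1)^2 / 2 gives half of Pearson's chi-squared divergence.
    return 0.5 * (x - 1.0) ** 2

def kl_divergence(p, q):
    """Direct definition, used only as a cross-check."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical probability vectors, chosen only for the demonstration.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
d_kl = phi_divergence(p, q, phi_kl)
d_direct = kl_divergence(p, q)
d_pearson = phi_divergence(p, q, phi_pearson)
```

Swapping the `phi` argument is the only change needed to move from the KL-based quantity to another member of the family, which is the sense in which a model defined through KL might be generalized.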
There is a growing interest in the analysis of replication studies of original findings across many disciplines. When testing a hypothesis for an effect size, two Bayesian approaches stand out for their principled use of the Bayes factor (BF), namely the replication BF (Verhagen and Wagenmakers, 2014) and the skeptical BF (Pawel and Held, 2022). In both cases replication data are used to...
Survey data items are commonly collected on a Likert scale and may have an additional “don’t know” category. It is also typical to have questions that are not applicable to some individuals or to observe floor or ceiling effects on ordinal or interval responses. These situations necessitate the use of mixture models to properly account for the structure of the data. The model formulation also...
In recent times, the integration of historical data in the design and analysis of new clinical trials has gained considerable attention, owing to ethical reasons and difficulties encountered in recruiting patients. In the Bayesian framework, the process of informative prior elicitation is widely recognized as a complex and multifaceted undertaking, requiring the careful quantification and...
We propose a novel estimation method for multivariate regime switching models based on a Student-t copula function. These models account for the interdependencies between multiple variables by considering the correlation strength controlled by specific parameters. Moreover, they address fat-tailed distributions through the number of degrees of freedom. These parameters, in turn, are governed...
For the analysis of ordered categorical data, the CUB modelling approach entails the estimation of two main structural latent components of the rating process, feeling and uncertainty, parameterized within a two-component mixture of Binomial and uniform distributions (see Piccolo and Simone, 2019, for an overview). Featuring parameters can be linked to subject covariates to determine twofold...
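The two-component mixture can be sketched as follows. This assumes the parameterization commonly used for CUB models without covariates, P(R = r) = π·C(m−1, r−1)(1−ξ)^(r−1) ξ^(m−r) + (1−π)/m for r = 1, ..., m. The function name and the parameter values are hypothetical.

```python
from math import comb

def cub_pmf(m, pi, xi):
    """Probabilities P(R = r), r = 1..m, for a CUB mixture without covariates.

    pi weights the shifted Binomial (feeling) component; 1 - pi weights the
    discrete uniform (uncertainty) component. Assumed parameterization.
    """
    probs = []
    for r in range(1, m + 1):
        shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        probs.append(pi * shifted_binomial + (1 - pi) / m)
    return probs

# Hypothetical example: a 7-point rating scale.
probs = cub_pmf(m=7, pi=0.8, xi=0.3)
```

Setting `pi=0` collapses the model to a pure uniform distribution, which is one way to read 1 − π as a measure of uncertainty in the responses.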
Latent class models rely on the conditional independence assumption, i.e., it is assumed that the categorical variables are independent given the cluster memberships. Within the Bayesian framework, we propose a suitable specification of priors for the latent class model to identify the clusters in multivariate categorical data where the independence assumption is not fulfilled. Each...
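For reference, the conditional independence assumption mentioned above means that the marginal probability of a response pattern factorizes within each class: P(x) = Σ_k w_k Π_j P(x_j | class k). The sketch below computes this for the standard latent class model only; it does not implement the prior specification proposed in the abstract, and all names and numbers are hypothetical.

```python
def latent_class_prob(x, weights, item_probs):
    """Marginal probability of pattern x under conditional independence.

    weights[k]          : prior probability of class k
    item_probs[k][j][c] : P(variable j takes category c | class k)
    """
    total = 0.0
    for w, class_probs in zip(weights, item_probs):
        joint = w
        # Within a class, the variables are independent, so probabilities multiply.
        for j, category in enumerate(x):
            joint *= class_probs[j][category]
        total += joint
    return total

# Hypothetical model: two classes, two binary variables.
weights = [0.6, 0.4]
item_probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]
p00 = latent_class_prob((0, 0), weights, item_probs)
total_mass = sum(
    latent_class_prob((a, b), weights, item_probs) for a in (0, 1) for b in (0, 1)
)
```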
Regularized regression models are well studied and, under appropriate conditions, offer fast and statistically interpretable results. However, large data in many applications are heterogeneous in the sense of harboring distributional differences between latent groups. Then, the assumption that the conditional distribution of response Y given features X is the same for all samples may not hold....
The cause of failure in cohort studies that involve competing risks is frequently incompletely observed. Failure to deal with this issue can lead to substantially biased estimates. To the best of our knowledge, all the methods that have addressed the issue in the context of semiparametric competing risks models rely on a missing at random (MAR) assumption. Nevertheless, the MAR assumption is...
We consider a Bayesian approach for the analysis of rating data when a scaling component is taken into account, thus incorporating a specific form of heteroskedasticity. Our approach includes model-based probability effect measures that enable comparisons of distributions among multiple groups. These effect measures are adjusted for explanatory variables that have an impact on both the...
The contribution discusses some preliminary results on the evaluation of prediction performance for the class of mixture models with uncertainty (Piccolo and Simone, 2019). The ultimate goal of the analysis is to evaluate the extent to which the uncertainty specification constitutes an added value for prediction of ordinal scores. A small simulation study is presented to assess...
Multivariate time series data is becoming an increasingly common research topic. Unlike univariate time series, the temporal dependence of a multivariate series includes both serial dependences and interdependences across different marginal series. Consequently, as the number of component series increases, multivariate time series models become overparameterized. In addition, there are many...
The digital revolution has dramatically changed not only the way people interact but also their relationship with their self-image. Increased data availability and computational power have significantly improved algorithms for facial feature detection, which have also been successfully applied to develop face filter apps that enhance and “beautify” self-portraits.
The potential of these filters in...
The use of networks as a tool for studying complex systems has gained popularity in various scientific disciplines. In the past decade, the “network takeover” reached psychology, and networks were used to abstract complex psychological phenomena. In psychopathology, a network-based framework known as the network theory of mental illness posits that mental disorders emerge as systems of...
Assume that we would like to estimate the expected value of a function f with respect to a density π by using an importance density function q. We prove that if π and q are close enough under KL divergence, an independent Metropolis sampler estimator that obtains samples from π with proposal density q, enriched with a variance reduction computational strategy based on control variates,...
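A minimal sketch of a plain independent Metropolis sampler is given below to fix ideas. It omits the control-variate variance reduction described in the abstract, and the target (standard normal), the proposal (normal with standard deviation 1.5), and f(x) = x² are illustrative assumptions only.

```python
import math
import random

def log_target(x):
    # Unnormalised log-density of the target pi: standard normal (assumed).
    return -0.5 * x * x

def log_proposal(x, scale):
    # Unnormalised log-density of the proposal q: N(0, scale^2) (assumed).
    return -0.5 * (x / scale) ** 2

def independent_metropolis(n_iter, scale=1.5, seed=0):
    """Draw a chain targeting pi using proposals drawn independently from q."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, scale)
    log_w = log_target(x) - log_proposal(x, scale)  # log importance weight of x
    chain = []
    for _ in range(n_iter):
        y = rng.gauss(0.0, scale)  # proposal does not depend on the current state
        log_w_y = log_target(y) - log_proposal(y, scale)
        # Accept with probability min(1, w(y) / w(x)); constants cancel in the ratio.
        if math.log(1.0 - rng.random()) < log_w_y - log_w:
            x, log_w = y, log_w_y
        chain.append(x)
    return chain

chain = independent_metropolis(20000)
# Ergodic average estimating E_pi[f] with f(x) = x^2 (true value 1 for this target).
estimate = sum(v * v for v in chain) / len(chain)
```

The acceptance step depends only on the ratio of importance weights π/q, which is why the closeness of π and q governs how well the sampler mixes.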
In many application fields, the variables used to measure a phenomenon are gathered into homogeneous blocks that measure partial aspects of the phenomenon. For example, in sensory analysis, the overall quality of products may depend on the taste and odor variables, etc. In consumer analysis, consumer preferences may depend on physical-chemical and sensory variables. In some contexts, a...