Statistics
New submissions
New submissions for Fri, 23 Aug 19
 [1] arXiv:1908.08093 [pdf, other]

Title: Statistical approaches using longitudinal biomarkers for disease early detection: A comparison of methodologies
Authors: Yongli Han, Paul S. Albert, Christine D. Berg, Nicolas Wentzensen, Hormuzd A. Katki, Danping Liu
Subjects: Applications (stat.AP); Methodology (stat.ME)
Early detection of clinical outcomes such as cancer may be possible based on longitudinal biomarker measurements. Tracking longitudinal biomarkers as a way to identify early disease onset may help to reduce mortality from diseases like ovarian cancer that are more treatable if detected early. Two general frameworks for disease risk prediction, the shared random effects model (SREM) and the pattern mixture model (PMM), could be used to assess longitudinal biomarkers for disease early detection. In this paper, we studied the predictive performances of SREM and PMM on disease early detection through an application to ovarian cancer, where early detection using the risk of ovarian cancer algorithm (ROCA) has been evaluated. Comparisons of the above three methods were performed via analyses of the ovarian cancer data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial and extensive simulation studies. The time-dependent receiver operating characteristic (ROC) curve and the area under it (AUC) were used to evaluate the prediction accuracy. The out-of-sample predictive performance was calculated using leave-one-out cross-validation (LOOCV), aiming to minimize the problem of model overfitting. A careful analysis of the use of the biomarker cancer antigen 125 for ovarian cancer early detection showed improved performance of PMM as compared with SREM and ROCA. More generally, simulation studies showed that PMM outperforms ROCA unless biomarkers are taken at very frequent screening settings.
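The abstract's accuracy metric, AUC, can be sketched minimally: the empirical AUC equals the Mann-Whitney probability that a randomly chosen case outranks a randomly chosen control. The function and toy data below are illustrative only, not the paper's SREM/PMM estimators.

```python
def auc(scores, labels):
    """Empirical AUC: probability that a random positive's score exceeds a
    random negative's score, counting ties as one half (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfectly ranked toy risks: every case scores above every control.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # -> 1.0
```

Time-dependent ROC analysis and LOOCV wrap this basic statistic with censoring-aware risk sets and repeated refitting, which the paper handles in full.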
 [2] arXiv:1908.08095 [pdf, other]

Title: Paired Test of Matrix Graphs and Brain Connectivity Analysis
Comments: 30 pages, 1 figure, 5 tables
Subjects: Methodology (stat.ME); Neurons and Cognition (q-bio.NC)
Inferring brain connectivity networks and quantifying the significance of interactions between brain regions are of paramount importance in neuroscience. Although some tests for graph inference based on independent samples have recently emerged, there is no readily available solution for testing the change of a brain network with paired and correlated samples. In this article, we develop a paired test of matrix graphs to infer brain connectivity networks when the groups of samples are correlated. The proposed test statistic is both bias corrected and variance corrected, and achieves a small estimation error rate. The subsequent multiple testing procedure built on this test statistic is guaranteed to asymptotically control the false discovery rate at the pre-specified level. Both the methodology and theory of the new test are considerably different from the two-independent-samples framework, owing to the strong correlations of measurements on the same subjects before and after the stimulus activity. We illustrate the efficacy of our proposal through simulations and an analysis of an Alzheimer's Disease Neuroimaging Initiative dataset.
 [3] arXiv:1908.08098 [pdf, ps, other]

Title: BRIDGE: Byzantine-resilient Decentralized Gradient Descent
Comments: 18 pages, 1 figure, 1 table; preprint of a conference paper
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Signal Processing (eess.SP)
Decentralized optimization techniques are increasingly being used to learn machine learning models from data distributed over multiple locations without gathering the data at any one location. Unfortunately, methods that are designed for faultless networks typically fail in the presence of node failures. In particular, Byzantine failures, corresponding to the scenario in which faulty/compromised nodes are allowed to arbitrarily deviate from an agreed-upon protocol, are the hardest to safeguard against in decentralized settings. This paper introduces a Byzantine-resilient decentralized gradient descent (BRIDGE) method for decentralized learning that, when compared to existing works, is more efficient and scalable in higher-dimensional settings and that is deployable in networks having topologies that go beyond the star topology. The main contributions of this work include theoretical analysis of BRIDGE for strongly convex learning objectives and numerical experiments demonstrating the efficacy of BRIDGE for both convex and non-convex learning tasks.
 [4] arXiv:1908.08116 [pdf, other]

Title: An Email Experiment to Identify the Effect of Racial Discrimination on Access to Lawyers: A Statistical Approach
Subjects: Applications (stat.AP)
We consider the problem of conducting an experiment to study the prevalence of racial bias against individuals seeking legal assistance, in particular whether lawyers use clues about a potential client's race in deciding whether to reply to email requests for representation. The problem of discriminating between potential linear and nonlinear effects of a racial signal is formulated as a statistical inference problem, whose objective is to infer a parameter determining the shape of a specific function. Various complexities associated with the design and analysis of this experiment are handled by applying a novel combination of rigorous, semi-rigorous and rudimentary statistical techniques. The actual experiment is performed with a population of lawyers in Florida.
 [5] arXiv:1908.08133 [pdf, other]

Title: Insights and Inference for the Proportion Below the Relative Poverty Line
Subjects: Methodology (stat.ME)
We examine a commonly used relative poverty measure $H_p$, defined as the proportion of incomes falling below the relative poverty line, which in turn is defined as a fraction $p$ of the median income. We do this by considering this concept for theoretical income populations, and its potential for detecting actual changes following transfers of income from the wealthy to those whose incomes fall below the relative poverty line. In the process we derive and evaluate the performance of large-sample confidence intervals for $H_p$. Finally, we illustrate the estimators on real income data sets.
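The measure $H_p$ is straightforward to compute from a sample; a minimal sketch (with toy incomes, not the paper's data):

```python
from statistics import median

def relative_poverty_rate(incomes, p=0.5):
    """H_p: proportion of incomes strictly below p times the median income."""
    poverty_line = p * median(incomes)
    return sum(x < poverty_line for x in incomes) / len(incomes)

incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# median = 55, poverty line = 27.5, two of ten incomes fall below it
print(relative_poverty_rate(incomes, p=0.5))  # -> 0.2
```

Note the measure's relative character: transfers that raise the median also raise the poverty line, which is exactly the subtlety the abstract's transfer analysis addresses.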
 [6] arXiv:1908.08144 [pdf, ps, other]

Title: There is no Reliable Way to Detect Hacked Ballot-Marking Devices
Authors: Philip B. Stark
Subjects: Applications (stat.AP); Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Election system vendors are marketing ballot-marking devices (BMDs) as a universal system, and some states are deploying them for all voters, not just those who need a BMD to vote independently. Like all devices with CPUs, BMDs can be hacked, misprogrammed, or misconfigured. BMD printout might not reflect what the BMD screen or audio confirmed. If a voter complains that the BMD altered votes, officials have no way to tell whether there was a BMD malfunction, the voter erred, or the voter is attempting to cast doubt on the election. Pre-election logic and accuracy (L&A) tests have little ability to detect outcome-changing problems, in part because BMDs know the time and date, and in part because it is in practice impossible to simulate the full range of voter interactions with the device. _Parallel_ or _live_ tests of BMDs on election day address the first problem but not the second. In practice, neither L&A nor parallel tests can probe all combinations of voter preferences, device settings, ballot language, duration of voter interaction, input and output interfaces, and other variables that could include enough votes to change election outcomes. Even if less extensive parallel testing sufficed, it would still require extra BMDs, new infrastructure for creating test patterns, new infrastructure for reporting test results, additional polling-place staff, and additional staff training. And if parallel testing discovers an error, the only remedy is to hold a new election: there is no way to reconstruct the correct election result from an untrustworthy paper trail. Minimizing the number of votes cast using BMDs is prudent election administration.
 [7] arXiv:1908.08243 [pdf, ps, other]

Title: Expectile-based measures of skewness
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
In the literature, quite a few measures have been proposed for quantifying the deviation of a probability distribution from symmetry. The most popular of these skewness measures are based on the third central moment and on quantiles. However, there are major drawbacks in using these quantities. These include a strong emphasis on the distributional tails and poor asymptotic behaviour for the (empirical) moment-based measure, as well as difficult statistical inference and strange behaviour for discrete distributions for quantile-based measures.
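The expectile-based alternatives introduced in the next paragraph can be computed directly from a sample. A minimal sketch follows: a fixed-point iteration for the sample expectile, plus one Bowley-style skewness candidate that is only illustrative and may differ from the paper's exact definitions.

```python
def expectile(xs, tau, tol=1e-12):
    """Sample tau-expectile: fixed point of the asymmetric least-squares
    first-order condition; tau = 0.5 recovers the sample mean."""
    e = sum(xs) / len(xs)
    while True:
        num = sum((tau if x > e else 1 - tau) * x for x in xs)
        den = sum(tau if x > e else 1 - tau for x in xs)
        e_new = num / den
        if abs(e_new - e) < tol:
            return e_new
        e = e_new

def expectile_skew(xs, tau=0.9):
    """Bowley-style skewness built from expectiles (illustrative definition)."""
    lo, mid, hi = (expectile(xs, t) for t in (1 - tau, 0.5, tau))
    return (hi + lo - 2 * mid) / (hi - lo)

print(round(expectile_skew([1.0, 2.0, 3.0, 4.0, 5.0]), 6))  # ~ 0 for symmetric data
```

Unlike a sample quantile, the expectile depends smoothly on every observation, which is the "smoothed quantile" property the abstract exploits.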
Therefore, in this paper, we introduce skewness measures based on or connected with expectiles. Since expectiles can be seen as smoothed versions of quantiles, they preserve the advantages over the moment-based measure while not exhibiting most of the disadvantages of quantile-based measures. We introduce corresponding empirical counterparts and derive asymptotic properties. Finally, we conduct a simulation study, comparing the newly introduced measures with established ones, and evaluating the performance of the respective estimators.
 [8] arXiv:1908.08258 [pdf, ps, other]

Title: Adaptive Configuration Oracle for Online Portfolio Selection Methods
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Financial markets are complex environments that produce enormous amounts of noisy and non-stationary data. One fundamental problem is online portfolio selection, the goal of which is to exploit this data to sequentially select portfolios of assets to achieve positive investment outcomes while managing risks. Various algorithms have been proposed for solving this problem in fields such as finance, statistics and machine learning, among others. Most of the methods have parameters that are estimated from backtests for good performance. Since these algorithms operate on non-stationary data that reflects the complexity of financial markets, we posit that adaptively tuning these parameters in an intelligent manner is a remedy for dealing with this complexity. In this paper, we model the mapping between the parameter space and the space of performance metrics using a Gaussian process prior. We then propose an oracle based on adaptive Bayesian optimization for automatically and adaptively configuring online portfolio selection methods. We test the efficacy of our solution on algorithms operating on equity and index data from various markets.
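The abstract's core modeling step, placing a Gaussian process prior over the parameter-to-performance mapping, can be sketched minimally. The kernel, data, and names below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def gp_posterior(X, y, X_star, length_scale=1.0, noise=1e-6):
    """Posterior mean and variance of a GP with an RBF kernel: the surrogate
    that Bayesian optimization queries to pick the next parameter setting."""
    def kern(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)
    K = kern(X, X) + noise * np.eye(len(X))          # kernel matrix + jitter
    K_star = kern(X, X_star)                         # train-test covariances
    alpha = np.linalg.solve(K, y)
    mean = K_star.T @ alpha
    cov = kern(X_star, X_star) - K_star.T @ np.linalg.solve(K, K_star)
    return mean, np.diag(cov)

# Observed performance of three parameter settings; query one of them back.
X = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0])
mean, var = gp_posterior(X, y, np.array([1.0]))
print(round(float(mean[0]), 3))  # -> 1.0 (posterior interpolates the observation)
```

An acquisition function (e.g. expected improvement) built on this mean/variance pair would close the Bayesian-optimization loop the paper describes.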
 [9] arXiv:1908.08264 [pdf, other]

Title: `Regression Anytime' with Brute-Force SVD Truncation
Subjects: Statistics Theory (math.ST); Numerical Analysis (math.NA); Optimization and Control (math.OC); Probability (math.PR); Computational Finance (q-fin.CP)
We propose a new least-squares Monte Carlo algorithm for the approximation of conditional expectations in the presence of stochastic derivative weights. The algorithm can serve as a building block for solving dynamic programming equations, which arise, e.g., in nonlinear option pricing problems or in probabilistic discretization schemes for fully nonlinear parabolic partial differential equations. Our algorithm can be generically applied when the underlying dynamics stem from an Euler approximation to a stochastic differential equation. A built-in variance reduction ensures that the convergence in the number of samples to the true regression function takes place at an arbitrarily fast polynomial rate, if the problem under consideration is smooth enough.
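The paper's algorithm is specific (stochastic derivative weights, built-in variance reduction), but its SVD-truncation ingredient can be sketched in isolation. Everything below is an illustrative stand-in, not the paper's estimator.

```python
import numpy as np

def truncated_svd_least_squares(X, y, k):
    """Least-squares coefficients using only the top-k singular directions of X;
    brute-force truncation discards small singular values that amplify noise."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.01 * rng.normal(size=200)
beta_hat = truncated_svd_least_squares(X, y, k=5)  # k = full rank: plain OLS
print(np.round(beta_hat, 1))
```

Choosing `k` below full rank trades a little bias for stability when the design matrix is ill-conditioned, which is the rationale behind truncating at all.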
 [10] arXiv:1908.08320 [pdf, other]

Title: Spatial and Spatiotemporal GARCH Models - A Unified Approach
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO)
In time-series analyses, and particularly in finance, generalised autoregressive conditional heteroscedasticity (GARCH) models are widely applied statistical tools for modelling volatility clusters (i.e. periods of increased or decreased risk). In contrast, the spatial dependence in conditional second moments of spatial and spatiotemporal processes has been considered rather uncritically until now. Only a few models have been proposed for modelling local clusters of increased risk. In this paper, we introduce a unified spatial and spatiotemporal GARCH-type model, which covers all previously proposed spatial autoregressive conditional heteroscedasticity (ARCH) models but also introduces novel spatial GARCH (spGARCH) and E-GARCH processes. For this common modelling framework, maximum-likelihood estimators are derived. In addition to the theoretical contributions, we suggest a model selection strategy verified by a series of Monte Carlo simulation studies. Eventually, the use of the unified model is demonstrated by an empirical example. In particular, we focus on real-estate prices from 1995 to 2014 in all Berlin ZIP-code areas. For these data, a spatial autoregressive model has been applied, which shows locally varying model uncertainties captured by the spatial GARCH-type models.
 [11] arXiv:1908.08389 [pdf, other]

Title: Chaotic Time Series Prediction using Spatio-Temporal RBF Neural Networks
Comments: Published in: 2018 3rd International Conference on Emerging Trends in Engineering, Sciences and Technology (ICEEST). arXiv admin note: substantial text overlap with arXiv:1908.01321
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Due to their dynamic nature, chaotic time series are difficult to predict. In conventional signal processing approaches, signals are treated either in the time or in the space domain only. Spatio-temporal analysis of a signal provides advantages over conventional uni-dimensional approaches by harnessing information from both the temporal and spatial domains. Herein, we propose a spatio-temporal extension of RBF neural networks for the prediction of chaotic time series. The proposed algorithm utilizes the concept of time-space orthogonality and separately deals with the temporal dynamics and spatial non-linearity (complexity) of the chaotic series. The proposed RBF architecture is explored for the prediction of the Mackey-Glass time series and results are compared with the standard RBF. The spatio-temporal RBF is shown to outperform the standard RBFNN by achieving a significantly reduced estimation error.
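The Mackey-Glass benchmark named in the abstract is generated from a delay differential equation; a minimal Euler-discretized sketch follows, with standard benchmark parameters that are an assumption rather than taken from the paper.

```python
def mackey_glass(n, beta=0.2, gamma=0.1, tau=17, x0=1.2):
    """Generate n samples of the Mackey-Glass series by Euler-discretizing
    dx/dt = beta*x(t - tau) / (1 + x(t - tau)**10) - gamma*x(t) with step 1."""
    hist = [x0] * (tau + 1)  # constant history supplies the initial delay term
    for _ in range(n):
        x, x_delayed = hist[-1], hist[-(tau + 1)]
        hist.append(x + beta * x_delayed / (1 + x_delayed**10) - gamma * x)
    return hist[tau + 1:]

series = mackey_glass(1000)
print(len(series))  # -> 1000
```

With delay tau = 17 the dynamics are chaotic, which is what makes one-step and multi-step prediction of this series a meaningful test for the proposed RBF variants.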
 [12] arXiv:1908.08444 [pdf, other]

Title: Hierarchical Bayes Modeling for Large-Scale Inference
Subjects: Methodology (stat.ME)
Bayesian modeling is now ubiquitous in problems of large-scale inference, even when frequentist criteria are in mind for evaluating the performance of a procedure. By far the most popular in the literature of the past decade and a half are empirical Bayes methods, which have been shown in practice to improve significantly over strictly frequentist competitors in many different problems. As an alternative to empirical Bayes methods, in this paper we propose hierarchical Bayes modeling for large-scale problems, and address two separate points that, in our opinion, deserve more attention. The first is nonparametric "deconvolution" methods that are applicable also outside the sequence model. The second point is the adequacy of Bayesian modeling for situations where the parameters are by assumption deterministic. We provide partial answers to both: first, we demonstrate how our methodology applies in the analysis of a logistic regression model. Second, we appeal to Robbins's compound decision theory and provide an extension, to give formal justification for the Bayesian approach in the sequence case.
 [13] arXiv:1908.08484 [pdf, ps, other]

Title: Minimum Description Length Revisited
Comments: under submission
Subjects: Methodology (stat.ME); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of {\em MDL estimators}. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes, can, to a large extent, be viewed from a unified perspective.
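As a toy illustration of the classical two-part code idea that MDL generalizes (the overview's luckiness functions and worst-case machinery go well beyond this), the description length of a binary sequence under a maximum-likelihood Bernoulli model is L(model) + L(data | model):

```python
from math import log2

def two_part_code_length(bits):
    """Crude two-part MDL: 0.5*log2(n) bits to encode the ML Bernoulli
    parameter (the standard per-parameter rate) plus the Shannon code
    length of the data under that parameter."""
    n, k = len(bits), sum(bits)
    p = k / n
    data_bits = 0.0 if p in (0.0, 1.0) else -(k * log2(p) + (n - k) * log2(1 - p))
    return 0.5 * log2(n) + data_bits

# A maximally unpredictable sequence costs ~1 bit per symbol plus the model cost.
print(round(two_part_code_length([0, 1] * 50), 2))  # -> 103.32
```

Comparing such code lengths across model classes is the rudimentary form of the model-selection principle; modern MDL replaces the two-part code with normalized maximum likelihood and related universal codes.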
 [14] arXiv:1908.08489 [pdf]

Title: Time series model selection with a meta-learning approach; evidence from a pool of forecasting algorithms
Comments: 30 pages, 10 tables, and 7 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
One of the challenging questions in time series forecasting is how to find the best algorithm. In recent years, a recommender system scheme has been developed for time series analysis using a meta-learning approach. This system selects the best forecasting method with consideration of the time series characteristics. In this paper, we propose a novel approach focusing on some of the unanswered questions resulting from the use of meta-learning in time series forecasting. Three main gaps in previous works are addressed: analyzing various subsets of top forecasters as inputs for meta-learners; evaluating the effect of forecasting error measures; and assessing the role of the dimensionality of the feature space on the forecasting errors of meta-learners. All of these objectives are achieved with the help of a diverse state-of-the-art pool of forecasters and meta-learners. For this purpose, first, a pool of forecasting algorithms is implemented on the NN5 competition dataset and ranked based on two error measures. Then, six machine-learning classifiers, known as meta-learners, are trained on the extracted features of the time series in order to assign the most suitable forecasting method for the various subsets of the pool of forecasters. Furthermore, two dimensionality-reduction methods are implemented in order to investigate the role of feature space dimension on the performance of meta-learners. In general, it was found that meta-learners were able to defeat all of the individual benchmark forecasters; this performance was improved even further after applying the feature selection method.
Cross-lists for Fri, 23 Aug 19
 [15] arXiv:1810.01483 (cross-list from astro-ph.CO) [pdf, other]

Title: DeepCMB: Lensing Reconstruction of the Cosmic Microwave Background with Deep Neural Networks
Comments: 19 pages; LaTeX; 12 figures; changes to match published version
Journal-ref: Astronomy and Computing 28 100307 (2019)
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Next-generation cosmic microwave background (CMB) experiments will have lower noise and therefore increased sensitivity, enabling improved constraints on fundamental physics parameters such as the sum of neutrino masses and the tensor-to-scalar ratio r. Achieving competitive constraints on these parameters requires high signal-to-noise extraction of the projected gravitational potential from the CMB maps. Standard methods for reconstructing the lensing potential employ the quadratic estimator (QE). However, the QE performs suboptimally at the low noise levels expected in upcoming experiments. Other methods, like maximum likelihood estimators (MLE), are under active development. In this work, we demonstrate reconstruction of the CMB lensing potential with deep convolutional neural networks (CNNs), i.e., a ResUNet. The network is trained and tested on simulated data, and otherwise has no physical parametrization related to the physical processes of the CMB and gravitational lensing. We show that, over a wide range of angular scales, ResUNets recover the input gravitational potential with a higher signal-to-noise ratio than the QE method, reaching levels comparable to analytic approximations of MLE methods. We demonstrate that the network outputs quantifiably different lensing maps when given input CMB maps generated with different cosmologies. We also show we can use the reconstructed lensing map for cosmological parameter estimation. This application of CNNs provides a few innovations at the intersection of cosmology and machine learning. First, while training and regressing on images, we predict a continuous-variable field rather than discrete classes. Second, we are able to establish uncertainty measures for the network output that are analogous to standard methods. We expect this approach to excel in capturing hard-to-model non-Gaussian astrophysical foreground and noise contributions.
 [16] arXiv:1908.08035 (cross-list from eess.IV) [pdf, other]

Title: More unlabelled data or label more data? A study on semi-supervised laparoscopic image segmentation
Authors: Yunguan Fu, Maria R. Robu, Bongjin Koo, Crispin Schneider, Stijn van Laarhoven, Danail Stoyanov, Brian Davidson, Matthew J. Clarkson, Yipeng Hu
Comments: Accepted to MICCAI MIL3ID 2019
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Improving a semi-supervised image segmentation task has the option of adding more unlabelled images, labelling the unlabelled images, or combining both, as neither image acquisition nor expert labelling can be considered trivial in most clinical applications. With a laparoscopic liver image segmentation application, we investigate the performance impact of altering the quantities of labelled and unlabelled training data, using a semi-supervised segmentation algorithm based on the mean teacher learning paradigm. We first report a significantly higher segmentation accuracy, compared with supervised learning. Interestingly, this comparison reveals that the training strategy adopted in the semi-supervised algorithm is also responsible for this observed improvement, in addition to the added unlabelled data. We then compare different combinations of labelled and unlabelled data set sizes for training semi-supervised segmentation networks, to provide a quantitative example of the practically useful trade-off between the two data planning strategies in this surgical guidance application.
 [17] arXiv:1908.08037 (cross-list from cs.LG) [pdf]

Title: Hebbian Graph Embeddings
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Representation learning has recently been successfully used to create vector representations of entities in language learning, recommender systems and in similarity learning. Graph embeddings exploit the locality structure of a graph and generate embeddings for nodes, which could be words in a language or products on a retail website; the nodes are connected based on a context window. In this paper, we consider graph embeddings with an error-free (errorless) associative learning update rule, which models the embedding vector of a node as a non-convex Gaussian mixture of the embeddings of the nodes in its immediate vicinity, with some constant variance that is reduced as iterations progress. It is very easy to parallelize our algorithm without any form of shared memory, which makes it possible to use it on very large graphs with a much higher dimensionality of the embeddings. Results show that our algorithm performs well when the dimensionality of the embeddings is large.
 [18] arXiv:1908.08045 (cross-list from astro-ph.IM) [pdf, other]

Title: Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates
Comments: 15 pages, 8 figures
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA); Machine Learning (cs.LG); Machine Learning (stat.ML)
We demonstrate an algorithm for learning a flexible color-magnitude diagram from noisy parallax and photometry measurements using a normalizing flow, a deep neural network capable of learning an arbitrary multidimensional probability distribution. We present a catalog of 640M photometric distance posteriors to nearby stars derived from this data-driven model using Gaia DR2 photometry and parallaxes. Dust estimation and dereddening is done iteratively inside the model and without prior distance information, using the Bayestar map. The signal-to-noise ratio (precision) of distance measurements improves on average by more than 48% over the raw Gaia data, and we also demonstrate how the accuracy of distances has improved over other models, especially in the noisy-parallax regime. Applications are discussed, including significantly improved Milky Way disk separation and substructure detection. We conclude with a discussion of future work, which exploits the normalizing flow architecture to allow us to exactly marginalize over missing photometry, enabling the inclusion of many surveys without losing coverage.
 [19] arXiv:1908.08082 (cross-list from cs.LG) [pdf, ps, other]

Title: Dynamic Scheduling of MPI-based Distributed Deep Learning Training Jobs
Journal-ref: Published at MLSys Workshop @ NeurIPS 2018 (https://nips.cc/Conferences/2018/Schedule?showEvent=10919) December 7th, 2018
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
There is a general trend towards solving problems suited to deep learning with more complex deep learning architectures trained on larger training sets. This requires longer compute times and greater data parallelization or model parallelization. Both data and model parallelism have been historically faster in parameter server architectures, but data parallelism is starting to be faster in ring architectures due to algorithmic improvements. In this paper, we analyze the math behind ring architectures and make an informed adaptation of dynamic scheduling to ring architectures. To do so, we formulate a non-convex, non-linear, NP-hard integer programming problem and a new efficient doubling heuristic for its solution. We build upon Horovod, an open-source ring architecture framework over TensorFlow. We show that Horovod jobs have a low cost to stop and restart and that stopping and restarting ring architecture jobs leads to faster completion times. These two facts make dynamic scheduling of ring architecture jobs feasible. Lastly, we simulate a scheduler using these runs and show a more than halving of average job time on some workload patterns.
 [20] arXiv:1908.08118 (cross-list from cs.NE) [pdf, other]

Title: Neural Plasticity Networks
Comments: arXiv admin note: text overlap with arXiv:1904.04432
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Machine Learning (stat.ML)
Neural plasticity is an important functionality of the human brain, in which the number of neurons and synapses can shrink or expand in response to stimuli throughout the span of life. We model this dynamic learning process as an $L_0$-norm regularized binary optimization problem, in which each unit of a neural network (e.g., weight, neuron or channel, etc.) is attached to a stochastic binary gate, whose parameters determine the level of activity of a unit in the network. At the beginning, only a small portion of binary gates (and therefore the corresponding neurons) are activated, while the remaining neurons are in a hibernation mode. As learning proceeds, some neurons might be activated or deactivated if doing so can be justified by the cost-benefit trade-off measured by the $L_0$-norm regularized objective. As the training matures, the probability of transition between activation and deactivation diminishes until a final hardening stage. We demonstrate that all of these learning dynamics can be modulated seamlessly by a single parameter $k$. Our neural plasticity network (NPN) can prune or expand a network depending on the initial capacity of the network provided by the user; it also unifies dropout (when $k=0$) and traditional training of DNNs (when $k=\infty$), and interpolates between these two. To the best of our knowledge, this is the first learning framework that unifies network sparsification and network expansion in an end-to-end training pipeline. Extensive experiments on a synthetic dataset and multiple image classification benchmarks demonstrate the superior performance of NPN. We show that both network sparsification and network expansion can yield compact models of similar architectures and of similar predictive accuracies that are close to or sometimes even higher than baseline networks. We plan to release our code to facilitate research in this area.
 [21] arXiv:1908.08127 (cross-list from econ.GN) [pdf]

Title: Forecasting e-scooter competition with direct and access trips by mode and distance in New York City
Comments: 18 pages, 6 figures
Subjects: General Economics (econ.GN); Applications (stat.AP)
Given the lack of demand forecasting models for e-scooter sharing systems, we address this research gap using data from Portland, OR, and New York City. A log-log regression model is estimated for e-scooter trips based on user age, income, labor force participation, and health insurance coverage, with an adjusted R-squared value of 0.663. When applied to the Manhattan market, the model predicts 66K daily e-scooter trips, which would translate to 67 million USD in annual revenue (based on average 12-minute trips and historical fare pricing models). We propose a novel non-linear, multi-factor model to break down the number of daily trips by the alternate modes of transportation that they would likely substitute. The final model parameters reveal a relationship with taxi trips as well as access/egress trips with public transit in Manhattan. Our model estimates that e-scooters would replace at most 1% of taxi trips; this competition accounts for about $800,000 of the projected annual revenue. The distance structure of revenue from access/egress trips is found to differ significantly from that of substituted taxi trips.
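In a log-log regression of the kind the abstract estimates, coefficients read directly as elasticities; a minimal sketch on synthetic data (the predictor and numbers are illustrative, not the paper's):

```python
import numpy as np

# ln(trips) linear in ln(income): the slope is the elasticity of demand.
income = np.array([30.0, 45.0, 60.0, 90.0, 120.0])  # hypothetical predictor
trips = 5.0 * income**0.8                           # synthetic power law
slope, intercept = np.polyfit(np.log(income), np.log(trips), 1)
print(round(slope, 3))  # -> 0.8 (a 1% income rise implies a 0.8% trip rise)
```

This elasticity interpretation is the usual reason a log-log specification is chosen over a linear one for demand models.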
 [22] arXiv:1908.08142 (cross-list from cs.LG) [pdf, other]

Title: Transferability and Hardness of Supervised Classification Tasks
Comments: This paper is published at the International Conference on Computer Vision (ICCV) 2019
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose a novel approach for estimating the difficulty and transferability of supervised classification tasks. Unlike previous work, our approach is solution agnostic and does not require or assume trained models. Instead, we estimate these values using an information theoretic approach: treating training labels as random variables and exploring their statistics. When transferring from a source to a target task, we consider the conditional entropy between two such variables (i.e., label assignments of the two tasks). We show analytically and empirically that this value is related to the loss of the transferred model. We further show how to use this value to estimate task hardness. We test our claims extensively on three large-scale data sets: CelebA (40 tasks), Animals with Attributes 2 (85 tasks), and Caltech-UCSD Birds 200 (312 tasks), together representing 437 classification tasks. We provide results showing that our hardness and transferability estimates are strongly correlated with empirical hardness and transferability. As a case study, we transfer a learned face recognition model to CelebA attribute classification tasks, showing state-of-the-art accuracy for tasks estimated to be highly transferable.
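The abstract's transferability measure is the conditional entropy between label assignments; a minimal empirical estimator is easy to write (the toy labels below are illustrative, not the paper's exact formulation):

```python
from collections import Counter
from math import log2

def conditional_entropy(target, source):
    """Empirical H(Y|Z): uncertainty left in target labels Y once source
    labels Z are known, computed from the joint label counts."""
    n = len(target)
    joint = Counter(zip(source, target))
    marginal = Counter(source)
    return sum(c / n * log2(marginal[z] / c) for (z, _), c in joint.items())

src = [0, 0, 1, 1]
print(conditional_entropy([1, 1, 0, 0], src))  # target determined by source -> 0.0
print(conditional_entropy([0, 1, 0, 1], src))  # target independent of source -> 1.0
```

Low conditional entropy means the source labels nearly determine the target labels, which is the intuition behind using it as a transferability score.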
 [23] arXiv:1908.08145 (cross-list from cs.LG) [pdf, other]

Title: A Neural Network for Semi-Supervised Learning on Manifolds
Comments: 12 pages, 4 figures, accepted in ICANN 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Semi-supervised learning algorithms typically construct a weighted graph of data points to represent a manifold. However, an explicit graph representation is problematic for neural networks operating in the online setting. Here, we propose a feedforward neural network capable of semi-supervised learning on manifolds without using an explicit graph representation. Our algorithm uses channels that represent localities on the manifold such that correlations between channels represent manifold structure. The proposed neural network has two layers. The first layer learns to build a representation of low-dimensional manifolds in the input data, as proposed recently in [8]. The second learns to classify data using both occasional supervision and similarity of the manifold representation of the data. The channel carrying label information for the second layer is assumed to be "silent" most of the time. Learning in both layers is Hebbian, making our network design biologically plausible. We experimentally demonstrate the effect of semi-supervised learning on nontrivial manifolds.
 [24] arXiv:1908.08169 (cross-list from cs.LG) [pdf, other]

Title: Semi-supervised Adversarial Active Learning on Attributed Graphs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Active learning (AL) on attributed graphs has received increasing attention with the prevalence of graph-structured data. Although AL has been widely studied for alleviating label sparsity issues with conventional independent and identically distributed (i.i.d.) data, how to make it effective over attributed graphs remains an open research question. Existing AL algorithms on graphs attempt to reuse the classic AL query strategies designed for i.i.d. data. However, they suffer from two major limitations. First, different AL query strategies calculated in distinct scoring spaces are often naively combined to determine which nodes to label. Second, the AL query engine and the learning of the classifier are treated as two separate processes, resulting in unsatisfactory performance. In this paper, we propose a SEmi-supervised Adversarial active Learning (SEAL) framework on attributed graphs, which fully leverages the representation power of deep neural networks and devises a novel AL query strategy in an adversarial way. Our framework learns two adversarial components: a graph embedding network that encodes both the unlabelled and labelled nodes into a latent space, expecting to trick the discriminator into regarding all nodes as already labelled, and a semi-supervised discriminator network that distinguishes the unlabelled from the existing labelled nodes in the latent space. The divergence score, generated by the discriminator in a unified latent space, serves as the informativeness measure to actively select the most informative node to be labelled by an oracle. The two adversarial components form a closed loop to mutually and simultaneously reinforce each other towards enhancing active learning performance. Extensive experiments on four real-world networks validate the effectiveness of the SEAL framework, with superior performance improvements over state-of-the-art baselines.
 [25] arXiv:1908.08200 (cross-list from cs.LG) [pdf, ps, other]

Title: Finite Precision Stochastic Optimisation - Accounting for the Bias
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider first-order stochastic optimization where the oracle must quantize each subgradient estimate to $r$ bits. We treat two oracle models: the first where the Euclidean norm of the oracle output is almost surely bounded and the second where it is mean square bounded. Prior work in this setting assumes the availability of unbiased quantizers. While this assumption is valid in the case of almost surely bounded oracles, it does not hold true for the standard setting of mean square bounded oracles, and the bias can dramatically affect the convergence rate. We analyze the performance of standard quantizers from prior work in combination with projected stochastic gradient descent for both of these oracle models and present two new adaptive quantizers that outperform the existing ones.
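A minimal sketch of a randomized (dithered) uniform quantizer of the kind analyzed in this line of work, unbiased for inputs within its dynamic range [-B, B]; the coordinate-wise design and names below are illustrative, not the RATQ/RAQ constructions:

```python
import numpy as np

def uniform_quantize(g, r, B, rng):
    """Map each coordinate of g, assumed to lie in [-B, B], to one of 2**r
    grid levels; randomized rounding makes the output unbiased in expectation."""
    levels = 2 ** r - 1
    # position of each coordinate on the [0, levels] grid
    pos = (np.clip(g, -B, B) + B) / (2 * B) * levels
    low = np.floor(pos)
    # round up with probability equal to the fractional part -> unbiasedness
    up = rng.random(g.shape) < (pos - low)
    q = low + up
    return q / levels * (2 * B) - B
```

The bias discussed in the abstract arises exactly when a coordinate falls outside [-B, B] and is clipped, which unbounded (mean square bounded) oracle outputs can do.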
Specifically, for almost surely bounded oracles, we first establish a lower bound for the precision needed to attain the standard convergence rate of $T^{-\frac{1}{2}}$ for optimizing convex functions over a $d$-dimensional domain. Our proposed Rotated Adaptive Tetra-iterated Quantizer (RATQ) is merely a factor of $O(\log \log \log^\ast d)$ away from this lower bound. For mean square bounded oracles, we show that a state-of-the-art Rotated Uniform Quantizer (RUQ) from prior work would need at least $\Omega(d\log T)$ bits to achieve the convergence rate of $T^{-\frac{1}{2}}$, using any optimization protocol. However, our proposed Rotated Adaptive Quantizer (RAQ) outperforms RUQ in this setting and attains a convergence rate of $T^{-\frac{1}{2}}$ using a precision of only $O(d\log\log T)$. For mean square bounded oracles, in the communication-starved regime where the precision $r$ is fixed to a constant independent of $T$, we show that RUQ cannot attain a convergence rate better than $T^{-\frac{1}{4}}$ for any $r$, while RAQ can attain convergence at rates arbitrarily close to $T^{-\frac{1}{2}}$ as $r$ increases.
 [26] arXiv:1908.08204 (cross-list from eess.IV) [pdf, other]

Title: Convolutional Recurrent Reconstructive Network for Spatiotemporal Anomaly Detection in Solder Paste Inspection
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Surface mount technology (SMT) is a process for producing printed circuit boards. A solder paste printer (SPP), package mounter, and solder reflow oven are used for SMT. The board on which the solder paste is deposited by the SPP is monitored by a solder paste inspector (SPI). If the SPP malfunctions due to printer defects, it produces defective products, and abnormal patterns are then detected by the SPI. In this paper, we propose a convolutional recurrent reconstructive network (CRRN) that decomposes the anomaly patterns generated by printer defects from SPI data. CRRN learns only normal data and detects anomaly patterns through reconstruction error. CRRN consists of a spatial encoder (S-Encoder), a spatiotemporal encoder and decoder (ST-Encoder-Decoder), and a spatial decoder (S-Decoder). The ST-Encoder-Decoder consists of multiple convolutional spatiotemporal memories (CSTMs) with a spatiotemporal attention (ST-Attention) mechanism. CSTM is developed to extract spatiotemporal patterns efficiently, and ST-Attention is designed to facilitate transmitting information from the ST-Encoder to the ST-Decoder, which can solve the long-term dependency problem. We demonstrate that the proposed CRRN outperforms other conventional models in anomaly detection. Moreover, we show the discriminative power of the anomaly map decomposed by the proposed CRRN through printer defect classification.
 [27] arXiv:1908.08223 (cross-list from cs.LG) [pdf, other]

Title: NL-LinkNet: Toward Lighter but More Accurate Road Extraction with Non-Local Operations
Comments: Under review
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Road extraction from very high resolution satellite images is one of the most important topics in the field of remote sensing. For the road segmentation problem, spatial properties of the data can usually be captured using convolutional neural networks. However, this approach only considers a few local neighborhoods at a time and has difficulty capturing long-range dependencies. To overcome this problem, we propose Non-Local LinkNet with non-local blocks that can grasp relations between global features. It enables each spatial feature point to refer to all other contextual information, resulting in more accurate road segmentation. Specifically, our method achieved a 65.00% mIOU score on the DeepGlobe 2018 Road Extraction Challenge dataset. Our best model outperformed D-LinkNet, the 1st-ranked solution, by a significant gap of 0.88% mIOU with far fewer parameters. We also present empirical analyses on the proper usage of non-local blocks for the baseline model.
 [28] arXiv:1908.08281 (cross-list from cs.SI) [pdf, other]

Title: Block Randomized Optimization for Adaptive Hypergraph Learning
Comments: 5 pages, 1 figure, International Conference on Image Processing (ICIP) 2019
Subjects: Social and Information Networks (cs.SI); Machine Learning (stat.ML)
The high-order relations between content in social media sharing platforms are frequently modeled by a hypergraph. Both the hypergraph Laplacian matrix and the adjacency matrix are large, so randomized algorithms are used for low-rank factorizations in order to approximately decompose, and eventually invert, such big matrices fast. Here, as a first approach, block randomized Singular Value Decomposition (SVD) via subspace iteration is integrated within adaptive hypergraph weight estimation for image tagging. Specifically, creating low-rank submatrices along the main diagonal by tessellation permits fast matrix inversions via randomized SVD. Moreover, a second approach is proposed for solving the linear system in the optimization problem of hypergraph learning by employing the conjugate gradient method. Both proposed approaches achieve high accuracy in image tagging, measured by F1 score, and succeed in reducing the computational requirements of adaptive hypergraph weight estimation.
 [29] arXiv:1908.08286 (cross-list from cs.LG) [pdf, other]

Title: Learning stochastic differential equations using RNN with log signature features
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper contributes to the challenge of learning a function on streamed multimodal data through evaluation. The core of the result of our paper is the combination of two quite different approaches to this problem. One comes from the mathematically principled technology of signatures and log-signatures as representations for streamed data, while the other draws on the techniques of recurrent neural networks (RNN). The ability of the former to manage high sample rate streams and the latter to manage large scale nonlinear interactions allows hybrid algorithms that are easy to code, quicker to train, and of lower complexity for a given accuracy.
We illustrate the approach by approximating the unknown functional as a controlled differential equation. Linear functionals on solutions of controlled differential equations are the natural universal class of functions on data streams. They are mathematically very explicit and interpretable, allow quantitative arguments, and yet are able to approximate any continuous function on streams arbitrarily well. They form the basis of rough path theory. Stochastic differential equations are examples of controlled differential equations where the controlling stream is a stochastic process. Following this approach, we give a hybrid Logsig-RNN algorithm that learns functionals on streamed data with outstanding performance.
 [30] arXiv:1908.08314 (cross-list from eess.SP) [pdf, other]

Title: LEAP nets for power grid perturbations
Authors: Benjamin Donnot (TAU), Balthazar Donon (TAU), Isabelle Guyon (TAU), Zhengying Liu (TAU), Antoine Marot, Patrick Panciatici, Marc Schoenauer (TAU)
Journal-ref: ESANN, Apr 2019, Bruges, Belgium
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose a novel neural network embedding approach to model power transmission grids, in which high voltage lines are disconnected and reconnected with one another from time to time, either accidentally or willfully. We call our architecture LEAP net, for Latent Encoding of Atypical Perturbation. Our method implements a form of transfer learning, permitting training on a few source domains and then generalizing to new target domains without learning on any example of those domains. We evaluate the viability of this technique to rapidly assess curative actions that human operators take in emergency situations, using real historical data from the French high voltage power grid.
 [31] arXiv:1908.08339 (cross-list from cs.LG) [pdf]

Title: The Learning of Fuzzy Cognitive Maps With Noisy Data: A Rapid and Robust Learning Method With Maximum Entropy
Comments: The manuscript has been published in IEEE Transactions on Cybernetics
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Numerous learning methods for fuzzy cognitive maps (FCMs), such as the Hebbian-based and population-based learning methods, have been developed for modeling and simulating dynamic systems. However, these methods are faced with several obvious limitations. Most of these models are extremely time consuming when learning large-scale FCMs with hundreds of nodes. Furthermore, the FCMs learned by those algorithms lack robustness when the experimental data contain noise. In addition, reasonable distribution of the weights is rarely considered in these algorithms, which could result in reduced performance of the resulting FCM. In this article, a straightforward, rapid, and robust learning method is proposed to learn FCMs from noisy data, especially to learn large-scale FCMs. The crux of the proposed algorithm is to equivalently transform the learning problem of FCMs into a classic constrained convex optimization problem, in which the least-squares term ensures the robustness of the well-learned FCM and the maximum entropy term regularizes the distribution of its weights. A series of experiments covering two frequently used activation functions (the sigmoid and hyperbolic tangent functions) are performed on both synthetic datasets with noise and real-world datasets. The experimental results show that the proposed method is rapid and robust against data containing noise and that the well-learned weights have a better distribution. In addition, the FCMs learned by the proposed method also exhibit superior performance in comparison with the existing methods. Index Terms: Fuzzy cognitive maps (FCMs), maximum entropy, noisy data, rapid and robust learning.
 [32] arXiv:1908.08340 (cross-list from cs.LG) [pdf, ps, other]

Title: An End-to-End Encrypted Neural Network for Gradient Updates Transmission in Federated Learning
Comments: 8 pages, 3 figures
Journal-ref: This paper is an extended version of a summary published in the Proc. of 2019 Data Compression Conference (DCC). The 1-page summary in the DCC proceedings can be found at: https://ieeexplore.ieee.org/document/8712695
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Federated learning is a distributed learning method that trains a shared model by aggregating locally computed gradient updates. In federated learning, bandwidth and privacy are the two main concerns of gradient updates transmission. This paper proposes an end-to-end encrypted neural network for gradient updates transmission. This network first encodes the input gradient updates to a lower-dimensional space in each client, which significantly mitigates the pressure of data communication in federated learning. The encoded gradient updates are directly recovered as a whole, i.e., the aggregated gradient updates of the trained model, in the decoding layers of the network on the server. In this way, gradient updates encrypted in each client are not only prevented from interception during communication, but are also unknown to the server. Based on the encrypted neural network, a novel federated learning framework is designed for real applications. Experimental results show that the proposed network can effectively achieve two goals, privacy protection and data compression, with only a small sacrifice in model accuracy in federated learning.
 [33] arXiv:1908.08346 (cross-list from cs.LG) [pdf, ps, other]

Title: LoRAS: An oversampling approach for imbalanced datasets
Comments: 16 pages, 1 figure (4 subfigures)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The Synthetic Minority Oversampling TEchnique (SMOTE) is widely used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassifications for the majority class and affecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine Shadowsampling (LoRAS) to oversample from an approximated data manifold of the minority class. We benchmarked our LoRAS algorithm on 28 publicly available datasets and show that drawing samples from an approximated data manifold of the minority class is the key to successful oversampling. We compared the performance of LoRAS, SMOTE, and several SMOTE extensions and observed that for imbalanced datasets LoRAS on average generates better Machine Learning (ML) models in terms of F1-score and Balanced Accuracy. Moreover, to explain the success of the algorithm, we have constructed a mathematical framework to prove that LoRAS is a more effective oversampling technique since it provides a better estimate of the mean of the underlying local data distribution of the minority class data space.
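For concreteness, a minimal SMOTE-style interpolation step, the baseline that LoRAS refines; this is not the LoRAS algorithm itself, and the function name is illustrative:

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating each chosen
    point toward one of its k nearest minority neighbours (SMOTE-style)."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours per point
    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)
        b = nbrs[a, rng.integers(min(k, n - 1))]
        lam = rng.random()                      # interpolation weight in [0, 1]
        out[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return out
```

Because every synthetic point lies on a segment between two real minority points, samples can fall outside the true minority manifold, which is the over-generalization the abstract describes.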
 [34] arXiv:1908.08351 (cross-list from cs.CL) [pdf, other]

Title: The compositionality of neural networks: integrating symbolism and connectionism
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Despite a multitude of empirical studies, little consensus exists on whether neural networks are able to generalise compositionally, a controversy that, in part, stems from a lack of agreement about what it means for a neural model to be compositional. As a response to this controversy, we present a set of tests that provide a bridge between, on the one hand, the vast amount of linguistic and philosophical theory about compositionality and, on the other, the successful neural models of language. We collect different interpretations of compositionality and translate them into five theoretically grounded tests that are formulated on a task-independent level. In particular, we provide tests to investigate (i) if models systematically recombine known parts and rules; (ii) if models can extend their predictions beyond the length they have seen in the training data; (iii) if models' composition operations are local or global; (iv) if models' predictions are robust to synonym substitutions; and (v) if models favour rules or exceptions during training. To demonstrate the usefulness of this evaluation paradigm, we instantiate these five tests on a highly compositional data set which we dub PCFG SET and apply the resulting tests to three popular sequence-to-sequence models: a recurrent, a convolution-based, and a transformer model. We provide an in-depth analysis of the results, which uncovers the strengths and weaknesses of these three architectures and points to potential areas of improvement.
 [35] arXiv:1908.08368 (cross-list from cs.LG) [pdf, other]

Title: A General Data Renewal Model for Prediction Algorithms in Industrial Data Analytics
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)
In industrial data analytics, one of the fundamental problems is to utilize the temporal correlation of industrial data to make timely predictions in the production process, such as fault prediction and yield prediction. However, traditional prediction models are fixed while the conditions of the machines change over time, so the errors of predictions increase with the lapse of time. In this paper, we propose a general data renewal model to address this problem. Combined with a similarity function and a loss function, it estimates when the existing prediction model should be updated, then updates it according to an evaluation function, iteratively and adaptively. We have applied the data renewal model to two prediction algorithms. The experiments demonstrate that the data renewal model can effectively identify changes in the data, and update and optimize the prediction model so as to improve the accuracy of prediction.
 [36] arXiv:1908.08379 (cross-list from cs.LG) [pdf, other]

Title: Practical Risk Measures in Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Practical application of Reinforcement Learning (RL) often involves risk considerations. We study a generalized approximation scheme for risk measures, based on Monte-Carlo simulations, where the risk measures need not necessarily be coherent. We demonstrate that, even in simple problems, measures such as the variance of the reward-to-go do not capture the risk in a satisfactory manner. In addition, we show how a risk measure can be derived from the model's realizations. We propose a neural architecture for estimating the risk and suggest a risk critic architecture that can be used to optimize a policy under general risk measures. We conclude our work with experiments that demonstrate the efficacy of our approach.
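Monte-Carlo estimates of simple risk measures over sampled rewards-to-go, such as the variance criticized above and the tail-focused CVaR, can be sketched as follows (illustrative only; not the paper's architecture):

```python
import numpy as np

def mc_risk_measures(returns, alpha=0.05):
    """Monte-Carlo estimates of two risk measures over sampled rewards-to-go:
    variance, and CVaR_alpha (the mean of the worst alpha-fraction of outcomes)."""
    returns = np.sort(np.asarray(returns, dtype=float))
    var = returns.var()
    k = max(1, int(np.ceil(alpha * len(returns))))
    cvar = returns[:k].mean()   # average of the worst alpha-fraction
    return var, cvar
```

Variance penalizes good and bad surprises symmetrically, whereas CVaR looks only at the lower tail, which is one reason variance alone can misrepresent risk.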
 [37] arXiv:1908.08380 (cross-list from eess.SP) [pdf, other]

Title: Analysis of Wide and Deep Echo State Networks for Multiscale Spatiotemporal Time Series Forecasting
Comments: 10 pages, 10 figures, Proceedings of the Neuro-inspired Computational Elements Workshop (NICE '19), March 26-28, 2019, Albany, NY, USA
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Echo state networks are computationally lightweight reservoir models inspired by the random projections observed in cortical circuitry. As interest in reservoir computing has grown, networks have become deeper and more intricate. While these networks are increasingly applied to nontrivial forecasting tasks, there is a need for comprehensive performance analysis of deep reservoirs. In this work, we study the influence of partitioning neurons given a budget and the effect of parallel reservoir pathways across different datasets exhibiting multiscale and nonlinear dynamics.
 [38] arXiv:1908.08394 (cross-list from math.OC) [pdf, ps, other]

Title: A General Analysis Framework of Lower Complexity Bounds for Finite-Sum Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper studies the lower bound complexity for the optimization problem whose objective function is the average of $n$ individual smooth convex functions. We consider algorithms which get access to gradient and proximal oracles for each individual component. For the strongly convex case, we prove that such an algorithm cannot reach an $\varepsilon$-suboptimal point in fewer than $\Omega((n+\sqrt{\kappa n})\log(1/\varepsilon))$ iterations, where $\kappa$ is the condition number of the objective function. This lower bound is tighter than previous results and perfectly matches the upper bound of the existing proximal incremental first-order oracle algorithm Point-SAGA. We develop a novel construction to show the above result, which partitions the tridiagonal matrix of classical examples into $n$ groups. This construction is friendly to the analysis of the proximal oracle and can also be naturally extended to the general convex and average smooth cases.
 [39] arXiv:1908.08401 (cross-list from cs.LG) [pdf, ps, other]

Title: A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access
Comments: 14 figures. arXiv admin note: text overlap with arXiv:1810.03695
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
To make efficient use of limited spectral resources, in this work we propose a deep actor-critic reinforcement learning based framework for dynamic multichannel access. We consider both a single-user case and a scenario in which multiple users attempt to access channels simultaneously. We employ the proposed framework as a single agent in the single-user case, and extend it to a decentralized multi-agent framework in the multi-user scenario. In both cases, we develop algorithms for actor-critic deep reinforcement learning and evaluate the proposed learning policies via experiments and numerical results. In the single-user model, in order to evaluate the performance of the proposed channel access policy and the framework's tolerance against uncertainty, we explore different channel switching patterns and different switching probabilities. In the case of multiple users, we analyze the probabilities of each user accessing channels with favorable channel conditions and the probability of collision. We also address a time-varying environment to identify the adaptive ability of the proposed framework. Additionally, we provide comparisons (in terms of both average reward and time efficiency) between the proposed actor-critic deep reinforcement learning framework, a Deep Q-network (DQN) based approach, random access, and the optimal policy when the channel dynamics are known.
 [40] arXiv:1908.08450 (cross-list from cs.LG) [pdf, other]

Title: Efficient Cross-Validation of Echo State Networks
Comments: Accepted in ICANN'19 Workshop on Reservoir Computing
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Echo State Networks (ESNs) are known for their fast and precise one-shot learning of time series, but they often need good hyper-parameter tuning for best performance. For this, good validation is key, yet usually a single validation split is used. In this rather practical contribution we suggest several schemes for cross-validating ESNs and introduce an efficient algorithm for implementing them. The component that dominates the time complexity of the already quite fast ESN training remains constant (does not scale up with $k$) in our proposed method of doing $k$-fold cross-validation. The component that does scale linearly with $k$ starts dominating only in some not very common situations. Thus, in many situations $k$-fold cross-validation of ESNs can be done for virtually the same time complexity as a simple single-split validation. Space complexity can also remain the same. We also discuss when the proposed validation schemes for ESNs could be beneficial and empirically investigate them on several different real-world datasets.
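The basic k-fold splitting being made efficient here can be sketched as follows, using contiguous validation blocks, which are the natural choice for time series (illustrative code, not the paper's algorithm):

```python
def k_fold_splits(n, k):
    """Yield (train_idx, val_idx) pairs for contiguous k-fold cross-validation
    over n time-ordered samples; the last fold absorbs the remainder."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        start = i * fold
        stop = n if i == k - 1 else start + fold
        val = idx[start:stop]
        train = idx[:start] + idx[stop:]
        yield train, val
```

A naive ESN evaluation would retrain once per fold; the abstract's point is that the dominant part of ESN training can be shared across all $k$ folds.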
 [41] arXiv:1908.08452 (cross-list from cs.SI) [pdf, other]

Title: A new measure of modularity density for community detection
Subjects: Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Using an intuitive concept of what constitutes a meaningful community, a novel metric is formulated for detecting non-overlapping communities in undirected, weighted heterogeneous networks. This metric, modularity density, is shown to be superior to the versions of modularity density in the present literature. Compared to the previous versions of modularity density, maximization of our metric is proven to be free from bias and to better detect weakly-separated communities, particularly in heterogeneous networks. In addition to these characteristics, the computational running time of our modularity density is found to be on par with or faster than that of the previous variants. Our findings further reveal that community detection by maximization of our metric is mathematically related to partitioning a network by minimization of the normalized cut criterion.
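For reference, the standard Newman-Girvan modularity that modularity-density variants build on can be computed as follows; this is a sketch of the baseline metric, not the new measure proposed in this paper:

```python
import numpy as np

def modularity(A, communities):
    """Standard Newman-Girvan modularity Q for a weighted undirected graph,
    given a symmetric adjacency matrix A and a list of node-index lists."""
    m2 = A.sum()            # equals 2m for an undirected graph
    k = A.sum(axis=1)       # weighted node degrees
    q = 0.0
    for c in communities:
        idx = np.array(c)
        # fraction of edge weight inside c minus its expectation under
        # the configuration (random-rewiring) null model
        q += A[np.ix_(idx, idx)].sum() / m2 - (k[idx].sum() / m2) ** 2
    return q
```

Modularity-density variants modify this objective to counter its known resolution limit, which is the bias the abstract refers to.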
 [42] arXiv:1908.08469 (cross-list from cs.LG) [pdf, other]

Title: Data Context Adaptation for Accurate Recommendation with Additional Information
Comments: 10 pages, 7 figures, 5 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Given a sparse rating matrix and an auxiliary matrix of users or items, how can we accurately predict missing ratings considering the different data contexts of entities? Many previous studies have proved that utilizing additional information together with rating data helps improve performance. However, existing methods are limited in that 1) they ignore the fact that the data contexts of the rating and auxiliary matrices are different, 2) they have restricted capability of expressing independence information of users or items, and 3) they assume the relation between a user and an item is linear. We propose DaConA, a neural network based method for recommendation with a rating matrix and an auxiliary matrix. DaConA is designed with the following three main ideas. First, we propose a data context adaptation layer to extract pertinent features for different data contexts. Second, DaConA represents each entity with a latent interaction vector and a latent independence vector. Unlike previous methods, the two vectors are not limited in size. Lastly, while previous matrix factorization based methods predict missing values through the inner product of latent vectors, DaConA learns a non-linear function of them via a neural network. We show that DaConA is a generalized algorithm including the standard matrix factorization and the collective matrix factorization as special cases. Through comprehensive experiments on real-world datasets, we show that DaConA provides state-of-the-art accuracy.
 [43] arXiv:1908.08479 (cross-list from math.NA) [pdf, other]

Title: Iterative Hard Thresholding for Low CP-rank Tensor Models
Subjects: Numerical Analysis (math.NA); Machine Learning (stat.ML)
Recovery of low-rank matrices from a small number of linear measurements is now well-known to be possible under various model assumptions on the measurements. Such results demonstrate robustness and are backed by provable theoretical guarantees. However, extensions to tensor recovery have only recently begun to be studied and developed, despite an abundance of practical tensor applications. Recently, a tensor variant of the Iterative Hard Thresholding method was proposed, and theoretical results were obtained that guarantee exact recovery of tensors with low Tucker rank. In this paper, we utilize the same tensor version of the Restricted Isometry Property (RIP) to extend these results to tensors with low CANDECOMP/PARAFAC (CP) rank. In doing so, we leverage recent results on efficient approximations of CP decompositions that remove the need for challenging assumptions in prior works. We complement our theoretical findings with empirical results that showcase the potential of the approach.
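The classic vector form of Iterative Hard Thresholding, whose tensor analogue this paper extends to low CP-rank models, can be sketched as follows (illustrative; the tensor versions replace the keep-top-s step with a projection onto low Tucker- or CP-rank tensors):

```python
import numpy as np

def iterative_hard_thresholding(A, y, s, n_iter=500, step=1.0):
    """Classic IHT for s-sparse recovery from y = A x: alternate a gradient
    step on ||Ax - y||^2 with hard thresholding to the s largest entries."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + step * A.T @ (y - A @ x)      # gradient step
        keep = np.argsort(np.abs(x))[-s:]     # indices of s largest magnitudes
        mask = np.zeros_like(x, dtype=bool)
        mask[keep] = True
        x[~mask] = 0.0                        # hard threshold
    return x
```

Exact recovery hinges on the measurement operator satisfying a Restricted Isometry Property over sparse (here) or low-rank (in the tensor setting) signals.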
 [44] arXiv:1908.08497 (cross-list from cs.LG) [pdf, other]

Title: DynGraph2Seq: Dynamic-Graph-to-Sequence Interpretable Learning for Health Stage Prediction in Online Health Forums
Comments: 6 pages. Accepted as ICDM 2019 Short Paper. Final Version
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Online health communities such as the online breast cancer forum enable patients (i.e., users) to interact and help each other within various subforums, which are subsections of the main forum devoted to specific health topics. The changing nature of users' activities in different subforums can be a strong indicator of their health status changes. This additional information could allow healthcare organizations to respond promptly and provide additional help for the patient. However, modeling the complex transitions of an individual user's activities among different subforums over time, and learning how these correspond to his or her health stage, are extremely challenging. In this paper, we first formulate the transition of user activities as a dynamic graph with multi-attributed nodes, then formalize the health stage inference task as a dynamic graph-to-sequence learning problem, and hence propose a novel dynamic graph-to-sequence neural network architecture (DynGraph2Seq) to address all the challenges. Our proposed DynGraph2Seq model consists of a novel dynamic graph encoder and an interpretable sequence decoder that learn the mapping between a sequence of time-evolving user activity graphs and a sequence of target health stages. We go on to propose dynamic graph hierarchical attention mechanisms to facilitate the necessary multi-level interpretability. A comprehensive experimental analysis of its use for a health stage prediction task demonstrates both the effectiveness and the interpretability of the proposed models.
 [45] arXiv:1908.08507 (cross-list from cs.LG) [pdf, other]

Title: Transfer Learning for Relation Extraction via Relation-Gated Adversarial Learning
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Relation extraction aims to extract relational facts from sentences. Previous models mainly rely on manually labeled datasets, seed instances or human-crafted patterns, and distant supervision. However, human annotation is expensive, human-crafted patterns suffer from semantic drift, and distant supervision samples are usually noisy. Domain adaptation methods enable leveraging labeled data from a different but related domain. However, different domains usually have various textual relation descriptions and different label spaces (the source label space is usually a superset of the target label space). To solve these problems, we propose a novel model of relation-gated adversarial learning for relation extraction, which extends adversarial-based domain adaptation. Experimental results have shown that the proposed approach outperforms previous domain adaptation methods in partial domain adaptation and can improve the accuracy of distantly supervised relation extraction through fine-tuning.
 [46] arXiv:1908.08526 (cross-list from cs.LG) [pdf, ps, other]

Title: Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of $q$-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness efficiently.
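The doubly-robust structure the abstract refers to can be sketched with the classical backward recursion: a q-based direct term plus an importance-weighted correction of the Bellman residual. DRL's refinement is to use cross-fitted q-functions and marginalized density ratios in place of the per-step ratio `rho` used here; the code below is a simplification for illustration, not the paper's exact estimator.

```python
def dr_value(traj, q, v, rho, gamma):
    """Doubly-robust off-policy value estimate for one trajectory.
    q(s, a): fitted q-function; v(s) = E_{a~pi}[q(s, a)];
    rho(s, a): per-step importance ratio pi(a|s)/b(a|s).
    If either q or rho is correct, the estimate is unbiased."""
    est = 0.0
    for s, a, r in reversed(traj):   # backward recursion over the trajectory
        est = v(s) + rho(s, a) * (r + gamma * est - q(s, a))
    return est

# Toy check: deterministic 3-step chain, reward 1 per step, gamma = 0.5,
# behaviour = target policy (rho = 1), exact value-to-go used for q and v.
gamma = 0.5
value_to_go = {0: 1.75, 1: 1.5, 2: 1.0}        # 1 + 0.5 + 0.25, etc.
q = lambda s, a: value_to_go[s]
v = lambda s: value_to_go[s]
rho = lambda s, a: 1.0
traj = [(0, 0, 1.0), (1, 0, 1.0), (2, 0, 1.0)]  # (state, action, reward)
estimate = dr_value(traj, q, v, rho, gamma)     # -> 1.75
```

With exact nuisance functions every Bellman residual vanishes and the recursion returns the true value; with estimated nuisances, cross-fitting the two components is what yields the efficiency and fourth-root-rate robustness claimed in the abstract.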
 [47] arXiv:1908.08529 (cross-list from cs.CV) [pdf, other]

Title: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
Comments: Accepted to ICCV 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Diverse and accurate vision+language modeling is an important goal to retain creative freedom and maintain user engagement. However, adequately capturing the intricacies of diversity in language models is challenging. Recent works commonly resort to latent variable models augmented with more or less supervision from object detectors or part-of-speech tags. Common to all those methods is the fact that the latent variable either only initializes the sentence generation process or is identical across the steps of generation. Both methods offer no fine-grained control. To address this concern, we propose Seq-CVAE, which learns a latent space for every word position. We encourage this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation which summarizes the future. We illustrate the efficacy of the proposed approach to anticipate the sentence continuation on the challenging MSCOCO dataset, significantly improving diversity metrics compared to baselines while performing on par w.r.t. sentence quality.
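The per-word-position latent idea can be sketched minimally: a fresh latent z_t is drawn and mixed into the recurrent state at every decoding step, instead of a single sentence-level latent fixed once at the start. The toy transition cell and all names below are assumptions for illustration, not the Seq-CVAE model.

```python
import numpy as np

def decode_with_per_step_latents(h0, n_steps, d_z, transition, rng):
    """Per-position latents: at each step a new z_t is sampled and fed
    into the recurrent transition, giving fine-grained control over how
    the sentence continues. `transition` stands in for the learned cell."""
    h, states = h0, []
    for _ in range(n_steps):
        z_t = rng.standard_normal(d_z)   # step-specific latent ('intention')
        h = transition(h, z_t)
        states.append(h)
    return np.stack(states)

d_h, d_z = 16, 4
rng = np.random.default_rng(2)
Wh = rng.standard_normal((d_h, d_h)) / np.sqrt(d_h)
Wz = rng.standard_normal((d_z, d_h)) / np.sqrt(d_z)
cell = lambda h, z: np.tanh(h @ Wh + z @ Wz)     # toy recurrent cell
S = decode_with_per_step_latents(np.zeros(d_h), 7, d_z, cell, rng)  # (7, 16)
```

Repeated decoding with fresh latents yields distinct state trajectories, which is the mechanism behind the diversity gains the abstract reports; in the paper the z_t distribution is additionally trained to mimic a summary of the sentence's future.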
Replacements for Fri, 23 Aug 19
 [48] arXiv:1709.01577 (replaced) [pdf, other]

Title: Auto-G-Computation of Causal Effects on a Network
Subjects: Methodology (stat.ME)
 [49] arXiv:1711.00562 (replaced) [pdf, other]

Title: Percent Change Estimation in Large Scale Online Experiments
Authors: Jacopo Soriano
Subjects: Methodology (stat.ME)
 [50] arXiv:1712.00710 (replaced) [pdf, ps, other]

Title: Iterative Collaborative Filtering for Sparse Matrix Estimation
Subjects: Statistics Theory (math.ST)
 [51] arXiv:1712.07822 (replaced) [pdf, other]

Title: Geometrical Insights for Implicit Generative Modeling
Comments: this version fixes a typo in a definition
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [52] arXiv:1802.03774 (replaced) [pdf, other]

Title: On Kernel Method-Based Connectionist Models and Supervised Deep Learning Without Backpropagation
Comments: 8 pages main text, 16 pages of references and appendix, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [53] arXiv:1810.01963 (replaced) [pdf, other]

Title: Learning Scheduling Algorithms for Data Processing Clusters
Authors: Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [54] arXiv:1811.08083 (replaced) [pdf, other]

Title: Complete Subset Averaging with Many Instruments
Comments: 54 pages, 2 figures, 10 tables
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
 [55] arXiv:1903.12648 (replaced) [pdf, other]

Title: Overcoming Catastrophic Forgetting with Unlabeled Data in the Wild
Comments: ICCV 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [56] arXiv:1904.01670 (replaced) [pdf, other]

Title: Lautum Regularization for Semi-supervised Transfer Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [57] arXiv:1904.05703 (replaced) [pdf, other]

Title: Bayesian optimal design using stochastic gradient optimisation and Fisher information gain
Comments: V2 fixes some typos and improves explanations
Subjects: Computation (stat.CO)
 [58] arXiv:1904.05742 (replaced) [pdf, other]

Title: One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
Comments: Interspeech 2019
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
 [59] arXiv:1905.03658 (replaced) [pdf, other]

Title: Differentiable Approximation Bridges For Training Networks Containing Non-Differentiable Functions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [60] arXiv:1905.10418 (replaced) [pdf, other]

Title: Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach
Comments: 10 pages, 4 figures, 8 tables
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [61] arXiv:1905.12418 (replaced) [pdf, other]

Title: Expected Tight Bounds for Robust Deep Neural Network Training
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [62] arXiv:1906.07644 (replaced) [pdf, other]

Title: Towards White-box Benchmarks for Algorithm Control
Comments: 8 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)
 [63] arXiv:1906.09116 (replaced) [pdf, ps, other]

Title: Differentially Private Summation with Multi-Message Shuffling
Subjects: Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [64] arXiv:1908.01406 (replaced) [pdf, other]

Title: Uncertainty in the Hot Hand Fallacy: Detecting Streaky Alternatives in Random Bernoulli Sequences
Subjects: Econometrics (econ.EM); Applications (stat.AP)
 [65] arXiv:1908.03747 (replaced) [pdf, other]

Title: Estimation of Spectral Clustering Hyper Parameters
Comments: 5 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Accelerator Physics (physics.acc-ph)
 [66] arXiv:1908.03901 (replaced) [pdf, ps, other]

Title: Almost Surely Asymptotic Freeness for Jacobian Spectrum of Deep Network
Authors: Tomohiro Hayase
Subjects: Probability (math.PR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [67] arXiv:1908.05762 (replaced) [pdf, ps, other]

Title: Entity-aware ELMo: Learning Contextual Entity Representation for Entity Disambiguation
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [68] arXiv:1908.06315 (replaced) [pdf, other]

Title: Implicit Deep Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [69] arXiv:1908.06319 (replaced) [pdf, other]

Title: Locally Linear Embedding and fMRI feature selection in psychiatric classification
Authors: Gagan Sidhu
Comments: Main article is 10 pages. Supplementary Information is 15 pages and includes figures/results for six additional datasets, with performance plots (as a function of dimensionality 'd'), proportion(s) of brain regions defined by the respective atlases, and subject ID partitioning for all eleven datasets. statmaps_datasets.zip and cobre_schiz_grph.zip are in the supplementary material on the journal website
Journal-ref: IEEE Journal of Translational Engineering in Health & Medicine 7:10, 2019
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [70] arXiv:1908.06597 (replaced) [pdf, other]

Title: Model-free Feature Screening and FDR Control with Knockoff Features
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
 [71] arXiv:1908.07409 (replaced) [pdf, other]

Title: Onset detection: A new approach to QBH system
Comments: 30 pages, 26 figures
Subjects: Applications (stat.AP); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
 [72] arXiv:1908.07636 (replaced) [pdf, other]

Title: How to gamble with non-stationary $\mathcal{X}$-armed bandits and have no regrets
Authors: Valeriy Avanesov
Comments: Fixed a typo in my own name
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [73] arXiv:1908.07645 (replaced) [pdf, other]

Title: K-Nearest Neighbor Approximation Via the Friend-of-a-Friend Principle
Comments: 27 pages, 5 figures
Subjects: Combinatorics (math.CO); Statistics Theory (math.ST)