- Large-scale parallelized learning of nonlinear sparse coding models (2013)
- The human brain is an unparalleled system: Through millions of years of evolution and during a lifespan of learning, our brains have developed remarkable abilities for dealing with incoming sensory data, extracting structure and useful information, and finally drawing the conclusions that result in the actions we take. Understanding the principles behind this machinery and building artificial systems that mimic at least some of these capabilities is a long-standing goal in both the scientific and the engineering communities. While this goal still seems unreachable, we have seen tremendous progress when it comes to training data-driven algorithms on vast amounts of training data, e.g. to learn an optimal data model and its parameters in order to accomplish some task. Such algorithms are now omnipresent: they are part of recommender systems, they perform speech recognition and generally build the foundation for many semi-autonomous systems. They are becoming an integral part of many technical systems that modern societies rely on for their everyday functioning. Many of these algorithms were originally inspired by biological systems or act as models for sensory data processing in mammalian brains. The response properties of a certain population of neurons in the first stages of the mammalian visual pathway, for example, can be modeled by algorithms such as Sparse Coding (SC), Independent Component Analysis (ICA) or Factor Analysis (FA). These well-established learning algorithms typically assume linear interactions between the variables of the model. Most often these relationships are expressed in the form of matrix-vector products between a matrix with learned dictionary elements (basis vectors as column vectors) and the latent variables of these models.
While on the one hand this linear interaction can sometimes be justified by the physical process for which the machine learning model is proposed, it is on the other hand often chosen just because of its mathematical and practical convenience. From an optimal coding point of view, though, one would generally expect that the ideal model closely reflects the core interactions of the system it is modeling. In vision, for example, one of the dominant processes giving rise to our sensory percepts is occlusion. Occluding objects are omnipresent in visual scenes and it would not be surprising if the mammalian visual system were optimized to process occluding structures in the visual data stream. Yet, the established mathematical models of the first stages of the visual processing path (like, e.g., SC, ICA or FA) all assume linear interactions between the active image components. In this thesis we will discuss new models that aim to approximate the effects of occluding components by assuming nonlinear interactions between their activated dictionary elements. We will present learning algorithms that infer optimal parameters for these models given data. In the experiments, we will validate the algorithms on artificial ground truth data and demonstrate their ability to recover the correct model parameters. We will show that the predictions made by these nonlinear models correspond better to the experimental data measured in vivo than the predictions made by the established linear models. Furthermore, we systematically explore and compare a large space of plausible combinations of hyperparameters and preprocessing schemes in order to eliminate any effects of artefacts on the observed results. Training nonlinear sparse coding models is computationally more demanding than training linear models.
In order to perform the numerical experiments described in this thesis we developed a software framework that facilitates the implementation of massively parallel expectation maximization (EM) based learning algorithms. This infrastructure was used for all experiments described here, as well as by collaborators in projects we will not discuss. Some of the experiments required more than 10^17 floating point operations and were run on a computer cluster using up to 5000 CPU cores in parallel. Our parallel framework enabled these experiments to be performed.
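
The difference between linear and nonlinear combination of dictionary elements can be illustrated with a small sketch. It assumes, purely for illustration, binary latent variables, non-negative dictionary entries, and a pointwise maximum as the nonlinear combination rule; the abstract does not name the nonlinearity used in the thesis, and the function names `combine_linear` and `combine_max` are hypothetical.

```python
def combine_linear(W, s):
    """Linear superposition: y_d = sum_h W[d][h] * s[h] (a matrix-vector product)."""
    return [sum(w * s_h for w, s_h in zip(row, s)) for row in W]


def combine_max(W, s):
    """Assumed nonlinear alternative: y_d = max_h W[d][h] * s[h].

    Where two active components overlap, the larger one determines the
    output instead of both adding up, which loosely mimics occlusion.
    """
    return [max(w * s_h for w, s_h in zip(row, s)) for row in W]


# Toy dictionary: 4 pixels (rows) and 2 dictionary elements (columns).
# Both elements cover pixel 1, so they overlap there.
W = [
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
    [0.0, 0.0],
]
s = [1, 1]  # both latent variables are active

linear = combine_linear(W, s)
nonlinear = combine_max(W, s)
print(linear)     # the overlapping pixel receives the sum of both components
print(nonlinear)  # the overlapping pixel receives only the stronger component
```

In this toy setting the two models differ only at the overlapping pixel, which is where an occlusion-like model and a linear model would be expected to disagree.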

- A Genome-Wide Longitudinal Transcriptome Analysis of the Aging Model Podospora anserina (2013)
- Aging of biological systems is controlled by various processes which have a potential impact on gene expression. Here we report a genome-wide transcriptome analysis of the fungal aging model Podospora anserina. Total RNA of three individuals of defined age was pooled and analyzed by SuperSAGE (serial analysis of gene expression). A bioinformatics analysis identified different molecular pathways to be affected during aging. While the abundance of transcripts linked to ribosomes and to the proteasome quality control system was found to decrease during aging, that of transcripts associated with autophagy increased, suggesting that autophagy may act as a compensatory quality control pathway. Transcript profiles associated with the energy metabolism including mitochondrial functions were found to fluctuate during aging. Comparison of wild-type transcripts, which are continuously down-regulated during aging, with those down-regulated in the long-lived, copper-uptake mutant grisea, validated the relevance of age-related changes in cellular copper metabolism. Overall, we (i) present a unique age-related data set of a longitudinal study of the experimental aging model P. anserina which represents a reference resource for future investigations in a variety of organisms, (ii) suggest autophagy to be a key quality control pathway that becomes active once other pathways fail, and (iii) present testable predictions for subsequent experimental investigations.

- Virtual machine scheduling in dedicated computing clusters (2012)
- Time-critical applications process a continuous stream of input data and have to meet specific timing constraints. A common approach to ensure that such an application satisfies its constraints is over-provisioning: The application is deployed in a dedicated cluster environment with enough processing power to achieve the target performance for every specified data input rate. This approach comes with a drawback: At times of decreased data input rates, the cluster resources are not fully utilized. A typical use case is the HLT-Chain application that processes physics data at runtime of the ALICE experiment at CERN. From a perspective of cost and efficiency it is desirable to exploit temporarily unused cluster resources. Existing approaches aim for that goal by running additional applications. These approaches, however, a) lack the flexibility to dynamically grant the time-critical application the resources it needs, b) are insufficient for isolating the time-critical application from harmful side-effects introduced by additional applications or c) are not general because application-specific interfaces are used. In this thesis, a software framework is presented that makes it possible to exploit unused resources in a dedicated cluster without harming a time-critical application. Additional applications are hosted in Virtual Machines (VMs) and unused cluster resources are allocated to these VMs at runtime. In order to avoid resource bottlenecks, the resource usage of VMs is dynamically modified according to the needs of the time-critical application. For this purpose, a number of methods that have not previously been combined are used. On a global level, appropriate VM manipulations like hot migration, suspend/resume and start/stop are determined by an informed search heuristic and applied at runtime. Locally on cluster nodes, a feedback-controlled adaptation of VM resource usage is carried out in a decentralized manner.
Employing this framework makes it possible to increase a cluster’s utilization by running additional applications, while at the same time preventing negative impact on a time-critical application. This capability of the framework is shown for the HLT-Chain application: In an empirical evaluation the cluster CPU usage is increased from 49% to 79%, additional results are computed, and no negative effects on the HLT-Chain application are observed.

- Analyzing user feedback of on-line communities (2013)
- The economic success of the World Wide Web makes it a highly competitive environment for web businesses. For this reason, it is crucial for web business owners to learn what their customers want. This thesis provides a conceptual framework and an implementation of a system that helps to better understand the behavior and potential interests of web site visitors by accounting for both explicit and implicit feedback. This thesis is divided into two parts. The first part is rooted in computer science and information systems and uses graph theory and an extended click-stream analysis to define a framework and a system tool that is useful for analyzing web user behavior by calculating the interests of the users. The second part is rooted in behavioral economics, mathematics, and psychology and investigates the factors that influence different types of web user choices. In detail, a model for the cognitive process of rating products on the Web is defined and an importance hierarchy of the influencing factors is discovered. Both parts make use of techniques from a variety of research fields and, therefore, contribute to the area of Web Science.

- Design of competitive paging algorithms with good behaviour in practice (2013)
- Paging is one of the most prominent problems in the field of online algorithms. We have to serve a sequence of page requests using a cache that can hold up to k pages. If the currently requested page is in the cache we have a cache hit; otherwise we say that a cache miss occurs, and the requested page needs to be loaded into the cache. The goal is to minimize the number of cache misses by providing a good page-replacement strategy. This problem is part of memory management when data is stored in a two-level memory hierarchy, more precisely a small and fast memory (cache) and a slow but large memory (disk). The most important application area is the virtual memory management of operating systems. Accessed pages are either already in the RAM or need to be loaded from the hard disk into the RAM using expensive I/O. The time needed to access the RAM is insignificant compared to an I/O operation, which takes several milliseconds. The traditional evaluation framework for online algorithms is competitive analysis, where the online algorithm is compared to the optimal offline solution. A shortcoming of competitive analysis is its overly pessimistic worst-case guarantees. For example, LRU has a theoretical competitive ratio of k, but in practice this ratio rarely exceeds the value 4. Reducing the gap between theory and practice has been an active research topic in recent years. More recent evaluation models have been used to prove that LRU is an optimal online algorithm or part of a class of optimal algorithms respectively, which was motivated by the assumption that LRU is one of the best algorithms in practice. Most of the newer models make LRU-friendly assumptions regarding the input, thus not leaving much room for new algorithms. Only a few works in the field of online paging have introduced new algorithms which can compete with LRU in terms of the number of cache misses.
In the first part of this thesis we study strongly competitive randomized paging algorithms, i.e. algorithms with optimal competitive guarantees. Although the tight bound for the competitive ratio has been known for decades, current algorithms matching this bound are complex and have high running times and memory requirements. We propose the algorithm OnlineMin which processes a page request in O(log k/log log k) time in the worst case. The best previously known solution requires O(k^2) time. Usually the memory requirement of a paging algorithm is measured by the maximum number of pages that the algorithm keeps track of. Any algorithm stores information about the k pages in the cache. In addition it can also store information about pages not in cache, denoted bookmarks. We answer the open question of Bein et al. '07 whether strongly competitive randomized paging algorithms using only o(k) bookmarks exist or not. To do so we modify the Partition algorithm of McGeoch and Sleator '85 which has an unbounded bookmark complexity, and obtain Partition2 which uses O(k/log k) bookmarks. In the second part we extract ideas from theoretical analysis of randomized paging algorithms in order to design deterministic algorithms that perform well in practice. We refine competitive analysis by introducing the attack rate parameter r, which ranges between 1 and k. We show that r is a tight bound on the competitive ratio of deterministic algorithms. We give empirical evidence that r is usually much smaller than k and thus r-competitive algorithms have a reasonable performance on real-world traces. By introducing the r-competitive priority-based algorithm class OnOPT we obtain a collection of promising algorithms to beat the LRU-standard. We single out the new algorithm RDM and show that it outperforms LRU and some of its variants on a wide range of real-world traces. 
Since RDM is more complex than LRU, one might think at first sight that the gain from fewer cache misses is offset by a higher runtime for processing page requests. We engineer a fast implementation of RDM, and compare it to LRU and the very fast FIFO algorithm in an overall evaluation scheme, where we measure the runtime of the algorithms and add penalties for each cache miss. Experimental results show that for realistic penalties RDM still outperforms these two algorithms even if we grant the competitors an idealized runtime of 0.
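
The paging setting and the idea of comparing an online strategy with the optimal offline solution can be sketched in a few lines. The sketch below is not taken from the thesis and does not implement OnlineMin, Partition2, OnOPT or RDM; it only counts cache misses for LRU, FIFO and the offline farthest-in-future rule (Belady's rule), with hypothetical function names and a toy request sequence.

```python
from collections import OrderedDict, deque


def lru_misses(requests, k):
    """Count cache misses of LRU with a cache of k pages."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= k:
                cache.popitem(last=False)  # evict least recently used page
            cache[page] = True
    return misses


def fifo_misses(requests, k):
    """Count cache misses of FIFO with a cache of k pages."""
    cache, order, misses = set(), deque(), 0
    for page in requests:
        if page not in cache:
            misses += 1
            if len(cache) >= k:
                cache.discard(order.popleft())  # evict the oldest loaded page
            cache.add(page)
            order.append(page)
    return misses


def opt_misses(requests, k):
    """Offline optimum: evict the page whose next request is farthest away."""
    cache, misses = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= k:
            def next_use(p):
                for j in range(i + 1, len(requests)):
                    if requests[j] == p:
                        return j
                return float("inf")  # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses


trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
k = 3
for name, count in [("LRU", lru_misses), ("FIFO", fifo_misses), ("OPT", opt_misses)]:
    print(name, count(trace, k))

# Cycling through k + 1 pages makes LRU miss on every single request.
cyclic = list(range(k + 1)) * 5
print("LRU on cyclic trace:", lru_misses(cyclic, k), "of", len(cyclic))
```

Dividing the miss count of an online algorithm by that of `opt_misses` on the same trace gives the empirical ratio that competitive analysis bounds in the worst case; the cyclic trace is the standard kind of input on which LRU performs worst.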

- An erasure-resilient and compute-efficient coding scheme for storage applications (2013)
- Driven by rapid technological advancements, the amount of data that is created, captured, communicated, and stored worldwide has grown exponentially over the past decades. Along with this development it has become critical for many disciplines of science and business to be able to gather and analyze large amounts of data. The sheer volume of the data often exceeds the capabilities of classical storage systems, with the result that current large-scale storage systems are highly distributed and comprise a large number of individual storage components. As with any other electronic device, the reliability of storage hardware is governed by certain probability distributions, which in turn are influenced by the physical processes utilized to store the information. The traditional way to deal with the inherent unreliability of combined storage systems is to replicate the data several times. Another popular approach to achieve failure tolerance is to calculate the block-wise parity in one or more dimensions. With better understanding of the different failure modes of storage components, it has become evident that sophisticated high-level error detection and correction techniques are indispensable for the ever-growing distributed systems. The utilization of powerful cyclic error-correcting codes, however, comes with a high computational penalty, since the required operations over finite fields do not map very well onto current commodity processors. This thesis introduces a versatile coding scheme with fully adjustable fault-tolerance that is tailored specifically to modern processor architectures. To reduce stress on the memory subsystem the conventional table-based algorithm for multiplication over finite fields has been replaced with a polynomial version.
This arithmetically intense algorithm is better suited to the wide SIMD units of the currently available general purpose processors, but also displays significant benefits when used with modern many-core accelerator devices (for instance the popular general purpose graphics processing units). A CPU implementation using SSE and a GPU version using CUDA are presented. The performance of the multiplication depends on the distribution of the polynomial coefficients in the finite field elements. This property has been used to create suitable matrices that generate a linear systematic erasure-correcting code which shows a significantly increased multiplication performance for the relevant matrix elements. Several approaches to obtain the optimized generator matrices are elaborated and their implications are discussed. A Monte-Carlo-based construction method makes it possible to influence the specific shape of the generator matrices and thus to adapt them to special storage and archiving workloads. Extensive benchmarks on CPU and GPU demonstrate the superior performance and the future application scenarios of this novel erasure-resilient coding scheme.
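
The contrast between table-based and polynomial multiplication over a finite field can be sketched as follows. This is a scalar illustration only, not the SSE or CUDA implementation from the thesis; the field GF(2^8), the reduction polynomial `0x11D`, and the function names are assumptions chosen for the example.

```python
# Assumed field: GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x^2 + 1.
POLY = 0x11D


def gf_mul_poly(a, b):
    """Polynomial (shift-and-XOR) multiplication: arithmetic only, no memory lookups."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the current multiple of a
        a <<= 1          # multiply a by x
        if a & 0x100:
            a ^= POLY    # reduce modulo the field polynomial
        b >>= 1
    return result


# Conventional alternative: logarithm/exponential tables for the generator 2.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul_poly(x, 2)
for i in range(255, 510):
    EXP[i] = EXP[i - 255]  # duplicate so that LOG[a] + LOG[b] never overflows


def gf_mul_table(a, b):
    """Table-based multiplication: two log lookups and one exp lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


print(hex(gf_mul_poly(0x02, 0x80)))  # x * x^7 = x^8, reduced by POLY
print(all(gf_mul_poly(a, b) == gf_mul_table(a, b)
          for a in range(256) for b in range(256)))
```

In `gf_mul_poly` the number of XOR steps depends on how many bits are set in `b`, which gives a rough intuition for why the cost of polynomial multiplication can depend on the coefficients of the field elements involved.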

- Tree-width for first order formulae (2012)
- We introduce tree-width for first order formulae φ, fotw(φ). We show that computing fotw is fixed-parameter tractable with parameter fotw. Moreover, we show that on classes of formulae of bounded fotw, model checking is fixed parameter tractable, with parameter the length of the formula. This is done by translating a formula φ with fotw(φ)<k into a formula of the k-variable fragment Lk of first order logic. For fixed k, the question whether a given first order formula is equivalent to an Lk formula is undecidable. In contrast, the classes of first order formulae with bounded fotw are fragments of first order logic for which the equivalence is decidable. Our notion of tree-width generalises tree-width of conjunctive queries to arbitrary formulae of first order logic by taking into account the quantifier interaction in a formula. Moreover, it is more powerful than the notion of elimination-width of quantified constraint formulae, defined by Chen and Dalmau (CSL 2005): for quantified constraint formulae, both bounded elimination-width and bounded fotw allow for model checking in polynomial time. We prove that fotw of a quantified constraint formula φ is bounded by the elimination-width of φ, and we exhibit a class of quantified constraint formulae with bounded fotw, that has unbounded elimination-width. A similar comparison holds for strict tree-width of non-recursive stratified datalog as defined by Flum, Frick, and Grohe (JACM 49, 2002). Finally, we show that fotw has a characterization in terms of a cops and robbers game without monotonicity cost.

- Long-term potentiation through calcium-mediated N-Cadherin interaction is tightly controlled by the three-dimensional architecture of the synapse (2013)
- Poster presentation: Twenty Second Annual Computational Neuroscience Meeting: CNS*2013. Paris, France. 13-18 July 2013. The synaptic cleft is an extracellular domain that is capable of relaying a presynaptically received electrical signal by diffusive neurotransmitters to the postsynaptic membrane. The cleft is trans-synaptically bridged by ring-like shaped clusters of pre- and postsynaptically localized calcium-dependent adhesion proteins of the N-Cadherin type and is possibly the smallest intercircuit in nervous systems [1]. The strength of association between the pre- and postsynaptic membranes can account for synaptic plasticity such as long-term potentiation [2]. Through neuronal activity the intra- and extracellular calcium levels are modulated through calcium exchangers embedded in the pre- and postsynaptic membrane. Variations of the concentration of cleft calcium induce changes in the N-Cadherin-zipper, which in synaptic resting states is rigid and tightly connects the pre- and postsynaptic domain. During synaptic activity calcium concentrations are hypothesized to drop below critical thresholds, which leads to loosening of the N-Cadherin connections and subsequently "unzips" the Cadherin-mediated connection. These processes may result in changes in synaptic strength [2]. In order to investigate the calcium-mediated N-Cadherin dynamics at the synaptic cleft, we developed a three-dimensional model including the cleft morphology and all prominent calcium exchangers and corresponding density distributions [3-6]. The necessity for a fully three-dimensional model becomes apparent when investigating the effects of the spatial architecture of the synapse [7], [8]. Our data show that the localization of calcium channels with respect to the N-Cadherin ring has substantial effects on the time-scales on which the Cadherin-zipper switches between states, ranging from seconds to minutes. This will have significant effects on synaptic signaling.
Furthermore, we see that high-frequency action potential firing can only be relayed to the Calcium/N-Cadherin-system at a synapse under precise spatial synaptic reorganization.

- Synaptic bouton sizes are tuned to best fit their physiological performances (2013)
- Poster presentation: Twenty Second Annual Computational Neuroscience Meeting: CNS*2013. Paris, France. 13-18 July 2013. To truly appreciate the myriad of events which relate synaptic function and vesicle dynamics, simulations should be done in a spatially realistic environment. This holds true in particular for explaining both the rather astonishing motor patterns that we observed in in vivo recordings and that underlie peristaltic contractions, and the shape of the EPSPs under different forms of long-term stimulation, both presented here at a well-characterized synapse, the neuromuscular junction (NMJ) of the Drosophila larva (cf. Figure 1). To this end, we have employed a reductionist approach and generated three-dimensional models of single presynaptic boutons at the Drosophila larval NMJ. Vesicle dynamics are described by diffusion-like partial differential equations which are solved numerically on unstructured grids using the uG platform. In our model we varied parameters such as bouton size, vesicle output probability (Po), stimulation frequency and number of synapses, to observe how altering these parameters affected bouton function. Hence we demonstrate that the morphologic and physiologic specialization may be a convergent evolutionary adaptation to regulate the trade-off between sustained, low output, and short-term, high output, synaptic signals. There seems to be a biologically meaningful explanation for the co-existence of the two different bouton types as previously observed at the NMJ (characterized especially by the relation between size and Po): assigning two different tasks with respect to short- and long-time behaviour could allow for an optimized interplay of different synapse types. We present astonishingly similar experimental and simulation results, obtained without any data fitting and based only on biophysical values taken from different experimental results.
As a by-product, we demonstrate how advanced methods from numerical mathematics could in future also help to resolve other difficult experimental neurobiological issues.

- On functional module detection in metabolic networks (2013)
- Functional modules of metabolic networks are essential for understanding the metabolism of an organism as a whole. With the vast amount of experimental data and the construction of complex and large-scale, often genome-wide, models, the computer-aided identification of functional modules becomes more and more important. Since steady states play a key role in biology, many methods have been developed in that context, for example, elementary flux modes, extreme pathways, transition invariants and place invariants. Metabolic networks can also be studied from the point of view of graph theory, and algorithms for graph decomposition have been applied for the identification of functional modules. A prominent and currently intensively discussed field of methods in graph theory addresses the Q-modularity. In this paper, we recall known concepts of module detection based on the steady-state assumption, focusing on transition-invariants (elementary modes) and their computation as minimal solutions of systems of Diophantine equations. We present the Fourier-Motzkin algorithm in detail. Afterwards, we introduce the Q-modularity as an example of a useful non-steady-state method and its application to metabolic networks. To illustrate and discuss the concepts of invariants and Q-modularity, we use a part of the central carbon metabolism in potato tubers (Solanum tuberosum) as a running example. The intention of the paper is to give a compact presentation of known steady-state concepts from a graph-theoretical viewpoint in the context of network decomposition and reduction and to introduce the application of Q-modularity to metabolic Petri net models.
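
As a rough illustration of what a Q-modularity score measures, the sketch below computes the Newman-Girvan modularity of a given partition of a small undirected, unweighted graph. The undirected simplification, the toy graph, and the function name `modularity` are assumptions made for this example; the paper itself works with metabolic Petri net models, which are directed and bipartite.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2 * m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the vertices in c.
    """
    m = len(edges)
    intra = {}   # community -> number of edges with both ends inside it
    degree = {}  # community -> total degree of its vertices
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Toy graph: two triangles connected by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {v: "A" for v in range(6)}

print(modularity(edges, two_modules))  # positive: the split matches the structure
print(modularity(edges, one_module))   # zero: a single module carries no information
```

A module-detection method based on Q-modularity searches for the partition that maximizes this score; the sketch only evaluates the score for partitions that are already given.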