When we think of machine learning, the first step is to acquire a large dataset and train a model on it. However, the data is often unavailable, for example because of confidentiality. Many developers face this problem, especially in projects with no prior data to build on. Certain GAN (Generative Adversarial Network) models, specifically the Recurrent GAN (RGAN) and the Recurrent Conditional GAN (RCGAN), have been introduced to produce realistic real-valued multi-dimensional time-series data.
Some of the limitations of these GAN-based approaches include:
- GANs are hard to train.
- GAN training is often unstable and might never converge.
- GANs are significantly harder to train on text than on images.
- GANs generally require large amounts of training data, so they might not be the right choice when little or no data is available.
This paper addresses the problem by introducing tsBNgen, a Python library that generates time series and sequential data from an arbitrary dynamic Bayesian network. The package lets developers and researchers generate time series data according to whatever model they specify.
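To make "generating data from a dynamic Bayesian network" concrete, here is a minimal sketch in plain Python. It is not tsBNgen's API: the function `sample_discrete_chain` and all probability values are hypothetical. It assumes the simplest possible network, a single discrete node X whose value at time t depends only on its value at time t-1, and draws a sequence by ancestral sampling: first from an initial distribution, then from one transition-table row per step.

```python
import random

def sample_discrete_chain(initial, transition, length, rng):
    """Draw one sequence from a single-node discrete DBN (hypothetical sketch).

    initial[k]       = P(X_0 = k)
    transition[i][k] = P(X_t = k | X_{t-1} = i)
    """
    states = list(range(len(initial)))
    # t = 0: the node has no temporal parent, so sample from the initial distribution.
    seq = [rng.choices(states, weights=initial)[0]]
    for _ in range(1, length):
        # The parent of X_t is X_{t-1}: use the matching row of the transition table.
        seq.append(rng.choices(states, weights=transition[seq[-1]])[0])
    return seq

rng = random.Random(0)
initial = [0.6, 0.4]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
print(sample_discrete_chain(initial, transition, length=10, rng=rng))
```

Each new seed yields a new synthetic sequence whose statistics follow the chosen tables, which is the core idea behind generating synthetic data from a known model instead of collecting it.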
Features of tsBNgen:
- It handles discrete nodes, continuous nodes, and hybrid (mixed discrete and continuous) networks.
- It uses a multinomial distribution for discrete nodes and a Gaussian distribution for continuous nodes.
- It handles arbitrary Bayesian network structures.
- It supports arbitrary loopback values, that is, how many previous time steps a node can depend on.
- The code can be modified easily to handle arbitrary static and temporal structures.
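These features can be illustrated with a small hybrid network, again as a hypothetical sketch rather than tsBNgen's actual interface. It assumes one discrete node D, drawn from a multinomial (categorical) distribution whose table row is selected by the previous `loopback` values of D, and one continuous node C, drawn from a Gaussian whose mean depends on the current D and on the previous C. Reading "loopback" as the number of past time steps a node may depend on is an interpretation; `sample_hybrid`, the `ar_coef` weight, and the rule of drawing from `initial` while fewer than `loopback` past values exist are illustrative assumptions.

```python
import random

def sample_hybrid(initial, transition, means, stds, ar_coef, loopback, length, rng):
    """Sample a discrete node D and a continuous node C over time (hypothetical sketch).

    initial:    P(D_t = .) used while fewer than `loopback` past values exist
    transition: maps a tuple of the last `loopback` D values to P(D_t = .)
    means/stds: per-state Gaussian parameters for C_t given D_t
    ar_coef:    weight of C_{t-1} in the mean of C_t (an assumed form)
    loopback:   number of past D values the discrete node depends on (>= 1)
    """
    if loopback < 1:
        raise ValueError("loopback must be at least 1")
    states = list(range(len(initial)))
    d_seq, c_seq = [], []
    for t in range(length):
        # Discrete node: categorical draw from the table row selected
        # by its `loopback` most recent values.
        if t < loopback:
            weights = initial
        else:
            weights = transition[tuple(d_seq[-loopback:])]
        d = rng.choices(states, weights=weights)[0]
        # Continuous node: Gaussian whose mean depends on the discrete
        # parent D_t and on its own previous value C_{t-1}.
        prev_c = c_seq[-1] if c_seq else 0.0
        c = rng.gauss(means[d] + ar_coef * prev_c, stds[d])
        d_seq.append(d)
        c_seq.append(c)
    return d_seq, c_seq

rng = random.Random(42)
transition = {  # loopback = 2: each key is (D_{t-2}, D_{t-1})
    (0, 0): [0.8, 0.2], (0, 1): [0.3, 0.7],
    (1, 0): [0.6, 0.4], (1, 1): [0.1, 0.9],
}
d_seq, c_seq = sample_hybrid(
    initial=[0.5, 0.5], transition=transition,
    means=[-1.0, 2.0], stds=[0.5, 0.5], ar_coef=0.3,
    loopback=2, length=20, rng=rng)
print(d_seq)
print([round(c, 2) for c in c_seq])
```

In this sketch, changing `loopback` only changes the keys of the `transition` table (tuples of length `loopback`), so the sampling loop itself stays the same; that is one simple way a generator could support arbitrary loopback values.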
tsBNgen is released under the MIT license. It has been applied to various architectures, such as hidden Markov models (HMMs), with a reported accuracy of approximately 93%.