11/06/2013

SQL Scripts Runner

Introduction


A simple console application, which executes all SQL files added by developers working with decentralised development environment.

For downloading the source code: Codeproject Article

PROBLEM

How do you manage SQL scripts in a decentralised development environment?

The Figure below kind of describes the what I had in mind when I refer to a decentralised development environment?


Figure 1 
Decentralised Development Environment is the one with the local storage - the dashed box
Centralised Development Environment has only centralised data storage. 

Usually there is a main database and every developer has his own copy locally.

This type of development environment allows for independent work without the need of a network connection; if there is an error with the main storage, the others will still be able to work with their local database copies.

However a great disadvantage is that during development each developer has to maintain the main database and it’s local one. This means creating a script constantly with the necessary updates. Another option is to create a backup and restore the main database locally (this would take a lot of time and is not considerate as good practice).

So how can one keep track of the Database changes?
In this article I propose two solutions.
  1. Store the script in a common storage available for all developers – for example RedMine (http://www.redmine.org).
  2. Add all the SQL scripts in the project source control.


This first option separates the DB scripts from the concrete project, but does not provide for versioning. 
The second option however allows for versioning and when the latest version of the SQL scripts have been retrieved from the source control the developer can run it as a console executable manually. (It would be ideal if there were a way to do this automatically, this I leave to the general public.). 

What the executable should do?
  • Should be configurable – this should specify the SQL scripts folder.
  • The executable should run the scripts in order.
  • If there is a syntactic error with the script, the application should display it before even running the script - validation.
  • If there is a runtime error with the script all the scripts should roll back. The file and the line at which the error occurred should be displayed.
  • When there is no error then all the scripts should be executed and the changes applied.
  • Should have a way to test the results – simple Unit Testing.

One very important thing to be mentioned is that the SQL script should be written in a way that it can be executed many times (this means having check conditions or a drop and create statement).

Implementation

Returning a list of SQL files

All the files from a directory and its subdirectories should be returned but in the right order – this means that the files from folder Ver11.0 should be not after Ver1.0 if there is folder with name Ver2.0. This means that the sort order should be based on the values of the versions, but not by the string values.
That’s very easily implemented by a LINQ expression:
foreach (var d in Directory.GetDirectories(dir).OrderBy(x => float.Parse(Path.GetFileName(x).Substring(Configuration.Configuration.Pproperties.ScriptVersionPrefix.Length))))
{

Display the execution percentage


When \r is used the cursor goes back to the beginning of the current line and then can be rewritten. 

Console.Write("\r{0}{1}% complete", message, percent);

How to find SQL compile time errors?


The parsing capabilities of SQL Server Management Studio were used for this. I have used the two assemblies: Microsoft.Data.Schema.ScriptDom; Microsoft.Data.Schema.ScriptDom.Sql;

These two assemblies can be found in: C:\Windows\Microsoft.NET\assembly\GAC_MSIL
If you have installed MS SQL Management Studio
public static bool ParseSqlFile(string file)
{
     IList errors = new List();
     bool hasErros = false;
     using (TextReader reader = File.OpenText(file))
     {
         var parser = new TSql100Parser(true);
         var script = parser.Parse(reader, out errors) as TSqlScript;
         hasErros = errors.Count > 0;
         foreach (var parseError in errors)
         {
              Errors.ProcessErrors(file, parseError);
         }
      }
      return hasErros;
}

Main method of the application

All comes together in the main method. Here after the connection to the database is open, the SQL files are returned in a collection, then for every file its SQL script is parsed and executed as part of transaction so that when the execution is unsuccessful the data can be rolled back.

Please note:
//TODO: remove
Thread.Sleep(500);

This should be removed when it is used in practice. It is good for demo purposes so that the execution percentage can be seen.

The next section is not important for the current project and can be skipped. If after you get the source code you do not understand something about it you can get back to it.

Getting the configuration in OOP way

In “Configuration” folder can be found three files, which are responsible for getting the configuration settings from the App.config in OOP way. A custom attribute defines the key of the setting and after that the attribute is applied to a property, which is set by reflection when the console application starts.

The custom attribute  class:
[AttributeUsage(AttributeTargets.Property, Inherited = false)]
class ConfigAttribute : Attribute
{
     ///

     /// The key
     ///

 private readonly string key;
Class with properties which have attributes of the custom attrubute type
public class ConfigProperties
{
   ///

   /// Gets or sets a value indicating whether [use AU s_ AIS].
   ///

   ///
   ///   true if [use AU s_ AIS]; otherwise, false.
   ///

   [Config("SQLDir")]
public string SqlDir { get; set; }
In the “Configuration.cs” file is done the reading of the attributes by reflection.
static Configuration()
{
   properties = new ConfigProperties();
   var props = typeof(ConfigProperties).GetProperties();
   foreach (PropertyInfo prop in props)
   {
       string auth = string.Empty;
       object[] attrs = prop.GetCustomAttributes(true);
       foreach (object attr in attrs)
       {
          var authAttr = attr as ConfigAttribute;
          if (authAttr != null)
          {
             string propName = prop.Name;
             auth = authAttr.Key;
          }
   try
   {
    var value = Convert.ChangeType(ConfigurationManager.AppSettings.Get(auth), prop.PropertyType);
       prop.SetValue(properties, value, null);
   }
   catch (Exception ex)
   {
    throw ex;
   }
   break;
   }
 }
}


9/12/2013

Switching between two threads in console application

I will present a very simple idea of switching between two threads. The switch can be done by user pressing the enter key or by timer.

This is an idea which I am going to use to switch between main and backup service.
In my case I am getting the data from TCP stream. It is AIS data, and if one of the streams do not receive data, should be able to switch to backup stream.

This is the example which used the user key pressing:

using System;
using System.Threading;

class Demo1
{
    /*
     * Switch between two threads. When the user presses enter key, he wakes up one of the two threads dependig on counter.
     * When the counter reaches 10, then a variable inside the threads is set to true - which indicates that the thread will finish.
     * The body of the thread is executed one more time.
     *
     *
     *
     */

    static readonly object _locker1 = new object();
    static readonly object _locker2 = new object();

    //private static EventWaitHandle _mainWait = new AutoResetEvent(false);

    static bool exit_1 = false;
    static bool exit_2 = false;

    static void Main()
    {
        new Thread(Work1).Start();
        new Thread(Work2).Start();

        int x2 = 0;

        while (true)
        {

            Console.ReadLine(); // Wait for user to hit Enter
            x2 += 1;

            Console.WriteLine(x2);

            if (x2 % 10 == 0)
            {
                //_mainWait.Set();
                lock (_locker1)
                {
                    exit_1 = true;
                    Monitor.Pulse(_locker1);
                }
                lock (_locker2)
                {
                    exit_2 = true;
                    Monitor.Pulse(_locker2);
                }

                break;
            }

            if (x2 % 2 == 0)
            {
                lock (_locker1) // Let's now wake up the thread by
                {
                    // setting _go=true and pulsing.
                    Monitor.Pulse(_locker1);
                }
            }
            else
            {
                lock (_locker2) // Let's now wake up the thread by
                {
                    // setting _go=true and pulsing.
                    Monitor.Pulse(_locker2);
                }
            }
        }

        //_mainWait.WaitOne();
    }

    static void Work1()
    {
        lock (_locker1)
        {
            while (true)
            {
                Monitor.Wait(_locker1); // Lock is released while we’re waiting
                if (!exit_1)
                    Console.WriteLine("Worker1 - Woken!!!");
                else
                    Console.WriteLine("lastly called worker1");
                //Monitor.Wait(_locker1);
                if (exit_1)
                    break;
            }
        }
    }

    static void Work2()
    {
        lock (_locker2)
        {
            while (true)
            {
                Monitor.Wait(_locker2); // Lock is released while we’re waiting
                if (!exit_2)
                    Console.WriteLine("Worker2 - Woken!!!");
                else
                    Console.WriteLine("lastly called worker2");
                //Monitor.Wait(_locker2);
                if (exit_2)
                    break;

            }
        }
    }
}

Example with automatic switch done by timer:
using System;
using System.Threading;
using System.Timers;

class Demo1
{
    /*
     * Switch between two threads. When the user presses enter key, he wakes up one of the two threads dependig on counter.
     * When the counter reaches 10, then a variable inside the threads is set to true - which indicates that the thread will finish.
     * The body of the thread is executed one more time.
     *
     *
     *
     */

    static readonly object _locker1 = new object();
    static readonly object _locker2 = new object();

    static readonly object _locker3 = new object();

    public static System.Timers.Timer HealthCheckTimer = new System.Timers.Timer();
    public static System.Timers.Timer HealthCheckTimerBackup = new System.Timers.Timer();
   
    //private static EventWaitHandle _mainWait = new AutoResetEvent(false);
   
    static bool exit_1 = false;
    static bool exit_2 = false;

    static void Main()
    {                                
        new Thread(Work1).Start();  
        new Thread(Work2).Start();
       
        int x2 = 0;
        HealthCheckTimer.Elapsed += new ElapsedEventHandler(OnElapsedTime);
        HealthCheckTimer.Interval = 100;
        HealthCheckTimer.Enabled = true;

        while (true)
        {
            lock (_locker3)
            {
                Monitor.Wait(_locker3);
            }

            x2 += 1;
           
            Console.WriteLine(x2);

            if (x2 % 10 == 0)
            {
                lock (_locker1)
                {
                    exit_1 = true;
                    Monitor.Pulse(_locker1);
                }
                lock (_locker2)
                {
                    exit_2 = true;
                    Monitor.Pulse(_locker2);  
                }

                break;
            }

            if (x2 % 2 == 0)
            {
                lock (_locker1)
                {
                    Monitor.Pulse(_locker1);
                }
            }
            else
            {
                lock (_locker2)
                {
                    Monitor.Pulse(_locker2);
                }
            }
        }
       
        //_mainWait.WaitOne();
    }

    private static void OnElapsedTime(object sender, ElapsedEventArgs e)
    {
        lock (_locker3)
        {
            Monitor.Pulse(_locker3);
        }
    }

    static void Work1()
    {
        lock (_locker1)
        {
            while (true)
            {
                Monitor.Wait(_locker1); // Lock is released while we’re waiting
                if(!exit_1)
                    Console.WriteLine("Worker1 - Woken!!!");
                else
                    Console.WriteLine("lastly called worker1");
                //Monitor.Wait(_locker1);
                if(exit_1)
                    break;
            }
        }
    }

    static void Work2()
    {
        lock (_locker2)
        {
            while (true)
            {
                Monitor.Wait(_locker2); // Lock is released while we’re waiting
                if(!exit_2)
                    Console.WriteLine("Worker2 - Woken!!!");
                else
                    Console.WriteLine("lastly called worker2");
                //Monitor.Wait(_locker2);
                if(exit_2)
                    break;
               
            }
        }
    }
}

6/27/2013

Visual Studio: the project type is not supported by this installation error

I installed ASP.NET MVC and downloaded a few example projects. I kept getting an error when opening the project file: the project type is not supported by this installation.

After some research, I found the solution over at stackoverflow: Replace theProjectTypeGuids in the csproj file to:
{349c5851-65df-11da-9384-00065b846f21};{fae04ec0-301f-11d3-bf4b-00c04f79efbc}


http://stackoverflow.com/questions/336235/the-project-type-is-not-supported-by-this-installation-error

6/09/2013

MSc Thesis in the University of Bristol - Visualising spatial and temporal electricity consumption data from Smart Meters

Since there is almost 1 year since I finished my MSc Thesis for the University of Bristol I think that it is time to share what I did with everyone interested in the topic.

Before writing the final thesis I did extensive research and got distinction result in Research Skills subject in the University. This helped me a lot for the final push when I had to research, implement and analyse the final result.

My supervisor - Prof. Peter Flach, was professor in Artificial Intelligence and unfortunately he didn't have much experience with visualization but just general knowledge. I was working with Center for Sustainable Energy (CSE) in Bristol, which have done some cool staff with heat-map visualizations. They tried to help but they were not really involved but just waited for the final output. I think that they didn't liked that I did because I used different set of technologies with which they are not so familiar with.

You can enjoy my whole thesis the way I printed. I got 68% (almost distinction) despite the negative factors which I mention above.

5/17/2013

Testing Large file upload in bleuimp Multiple file upload jQuery plugin

I had to create multiple file upload user interface for an old style asp.net web forms app. I know... old technology but what can you do when a system which is created 8 years ago is still working and you have to support it till the new version win MVC 4.0 eventually starts.
For that task I used one very nice jQuery plugin made by blueimp. It is very nice and works for most of the browsers ... who supports IE9... :) anyway in order to test the upload of large files I had create some with particular size equal to the exact maximum possible upload size.
For that reason I used cmd tool


Below is a screenshot showing an example of the execution of the program:



1/28/2013

University of Bristol - Spam Filter Assignment


Lab 5: Spam filter - Part II

In this part of the assignment, you will train your classifier on real-world e-mails, which you can download from here. They have the same convention of file-names as in part I.

Specification

The program should be written in Java, and the name of the main file should be filter.java and all files should compile using command:
javac *.java
But now, the program should take only one argument:
java filter testfile
where testfile is the name of a file containg a single e-mail, which is to be classified. The program should return the same output as in part I. Since you do not specify the directory with the training files, the program needs to store its knowledge gained during training somewhere, e.g. in a separate file (which you can submit together with your code).
The program will be tested using automatic marking on a testing set with the same number of e-mails to that in the training set, and with the same class distribution to that in the training set. The testing set will contain e-mails from the same sources as the training set. The tests will be performed on the departmental Linux machines (MVB 2.11), and your program needs to classify all testing e-mails within 30 minutes. Sample automarking script (with a single testing e-mail) can be downloaded from here.
You also need to test your program yourself, using 10-fold cross validation.
In this part of the assignment, you will discover that in many practical machine learning problems implementing the learning algorithm is often only a small part of the overall system. Thus to get a high mark from the assignment you need to implement any of the more advanced classifying techniqes or clever pre-processing methods. You will find plenty of them on the internet. If you do not know where to look for them, ask Google ;-).

Problems you might have

  1. Arithmetic underflow: Because the number of words is very large, the probabilities dealt with are very small. When multiplied together, the values may become so small they cannot be represented. The solution is to use logarithms. The formula for classification becomes:

    ci = argmax (log pi + sum (log P(wj|ci))).

    The multiplications become additions because log(a*b)=log(a)+log(b).
  2. Memory: You need to be slightly careful about how you store some information while preprocessing.

Submission

Submit your code and any files it uses via the online submission system. Also submit 2 slides that you will be using during your presentation. The presentations will take place in January in MVB 3.43, and the detailed schedule is available here. Please submit your slides in a single pdf file (NOT in Power Point native format - so if you use Power Point to prepare the slides, please print them to pdf). Please note that you will be ONLY allowed to use 2 slides during your presentation (so if you submit more, you will NOT be able to show them). Also, please use maximum 10 lines of text per slide (more lines of text will result in reduced mark), but you are welcome to include charts and tables.
Each presentation will last up to 5 minutes (if you do not finish within 5 minutes, the presentation will be interupted). The presentation should describe your work, including the methods used to preprocess the data, any simplifying assumptions made and the results of a 10-fold cross validation carried out on the data provided using your program. Do not describe the Naive Bayes algorithm. If you implement extra preprocessing options, you should report the accuracy of 10-fold cross validation with and without these options (e.g. on a slide).

Marking criteria

Below the provisional marking criteria are listed, but I reserve the right to change them.
Your program classifies the testing set with the accuracy significantly higher than chance within 30 minutes -- 30%
Your presentation cointains the results of 10-fold cross validation -- 10%
The remaining 60% of the mark will be based on the extra work you do. I attach below the criteria I used last year
 - but they are likely to change this year (as last year, different weighting was given to this assignment), 
so just treat them as very rough guideline.

Preprocessing
I divided preprocessing techniques used in two groups:
Simple techniques - for each simple technique I gave:
- 1% for implementation, and 
- 2% for comparison of accuracy with and without the technique.
Advanced techniques - for each of advanced technique, I gave
- 2% for implementation,
- 2% for comparison of accuracy with and without, and
- 2% for analysis how the accuracy depended on parameters of a technique.

Result analysis
- analyses of different types of errors (false positive, etc) - 1%
- listing top words predicting spam/non-spam - 2%
- statistical comparison of different classifiers 1-4%

Other
Marked depending on the technique

And of course there will be 2 prizes:
- Highest accuracy on the testing set
- Most interesting approach
The winners will receive original 8'' diskettes, and some bonus points.

---------------------------------------------------------------------------------

My result for this assignment was the second best. I got 99,6% accuracy. In the report you will be able to see the techniques which I used to achieve this, but it is rally subjective - it depends of the training emails.

Presentation: http://velchev.co.uk/blogger/SpamFilter1_1_lv1594.pdf
Source Code: http://velchev.co.uk/blogger/SpamFilter1.1.zip


1/09/2013

1/05/2013

The herding instinct...

Fascinating and amusing illustration of the tendency for people to herd or swarm. Raises questions about motivation and reward where people flock to follow or support a particular cause or product. People like to belong. Early adopters/pioneers are risk-takers. The swarm effect becomes part of the product's appeal. The internet and modern communications technologies accelerate and expand the phenomenon dramatically.

1/02/2013