Using queXF in Ubuntu for Automatic Processing of Handwritten Questionnaires, Tests and Surveys

Overview of queXF

queXF is an open source system for automatically extracting data from questionnaires, especially those which require the user to mark boxes (it doesn't automatically recognise free text). To do so, the questionnaire must have been created with the associated queXML. The workflow for using queXF, which is captured in a nice picture on the queXF web page, is:

  1. Create the questionnaire as a text XML file, specifically using queXML. You can write the queXML yourself (which is all I have done) or use other software like LimeSurvey to assist (I haven't tried this). A quick way to get started is to take one of the examples included with queXML and modify it to suit your needs. There is an Introduction to queXML that lists the different question types.
  2. Convert the XML file to a PDF. queXML comes with stylesheets/code for doing the conversion, or you can use the free online converter. The output is a PDF of the questionnaire that can be printed, as well as a banding XML file that will be used later.
  3. Print as many copies of the PDF as you want, distribute them, and have people fill them in.
  4. Scan the forms to PDF, creating one PDF per printed questionnaire.
  5. Import the scanned PDFs into the queXF database. queXF provides a basic web-interface that allows you to import a batch of scanned PDFs. There is some setting up to do in queXF before the import, in particular loading the original PDF and the accompanying banding XML. During the import, queXF automatically detects the answers to the questions.
  6. Manually check the answers that queXF auto-detected. queXF provides a convenient web interface for verifying each detected answer against the scanned PDF. You can make changes if necessary.
  7. Output the results of all questionnaires to a spreadsheet. queXF supports CSV and other data formats.
  8. Process the results in the spreadsheet (e.g. calculate average scores). You must do this yourself; queXF doesn't provide any support for this.
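
As a rough illustration of step 8, a column average can be computed on the command line without opening a spreadsheet. This is only a sketch: the sample data, the column name score and the function name column_average are made up for the example and are not the actual queXF export layout. The simple comma split also assumes no quoted fields containing commas.

```shell
#!/bin/sh
# Hypothetical export: the real queXF CSV columns will differ.
csv=$(mktemp)
cat > "$csv" <<'EOF'
formid,q1,score
1,a,4
2,b,5
3,a,3
EOF

# column_average FILE COLUMN_NAME: print the mean of a numeric column.
column_average() {
    awk -F',' -v name="$2" '
        # Header row: find the index of the requested column.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        # Data rows: accumulate non-empty values of that column.
        col && $col != "" { sum += $col; n++ }
        END { if (n) printf "%.2f\n", sum / n; else print "no data" }
    ' "$1"
}

avg=$(column_average "$csv" score)
echo "Average score: $avg"
rm -f "$csv"
```

For anything beyond simple totals and averages, a spreadsheet or statistics package is likely to be more convenient.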

Read through the instructions on the queXF website for further information. This article is not about how to use queXF, but how to install it on Ubuntu.

Install Ubuntu

If you don't have Ubuntu, it is probably easiest to install it in a virtual machine, e.g. VirtualBox. In fact, even if you already have Ubuntu, I suggest using a fresh install in a virtual machine rather than your own, everyday system. This way you don't have to worry about the security problems of running a web server on your computer, nor about any package/software conflicts.

I used Ubuntu 12.04.2 LTS Server in a VirtualBox virtual machine. It should work for later versions of Ubuntu. The Server (rather than Desktop) edition is sufficient because everything is done on the command line or via a web interface. queXF and queXML do not have any GUI.

When installing Ubuntu you are given a chance to select a set of software packages to install based on the purpose of your computer, e.g. OpenSSH server, LAMP server, Email server. At this step you should select LAMP server, so that Apache, MySQL and PHP are installed.

Once installed and booted, update the system, install some extra software and reboot:

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install nano unzip ghostscript
$ sudo reboot
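
After the reboot, a quick check can confirm that the tools used in the rest of this article are actually on the PATH. The following is a minimal sketch, not part of queXF: missing_commands is a made-up helper name, and the list of commands is my assumption of what the LAMP selection and the packages above provide (gs is the Ghostscript binary).

```shell
# missing_commands CMD...: print each command that is not found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Commands used later in this article; adjust the list to your own setup.
missing=$(missing_commands nano unzip gs apache2ctl mysql htpasswd)
if [ -z "$missing" ]; then
    echo "All required commands found"
else
    echo "Missing commands:" $missing
fi
```

If anything is reported as missing, install the corresponding package before continuing.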

A Note About Security

The following instructions assume quexf and related software will be run on a private web server, e.g. on your own computer where no-one else has access to your computer (either via a login or via http). Hence securing the web server and files, and choosing strong passwords, is not an important consideration. If you want other people to access your quexf, especially via an untrusted network (e.g. WiFi, Internet), then you should carefully consider securing your web server, for example by using https, strong passwords, and appropriate permissions on files/directories. I provide no guidelines for doing this in the following instructions.

Install quexf, quexml and other required software

quexf requires Adodb, which is not normally installed on an Ubuntu LAMP server. quexml requires tcpdf. If you are not going to use your own instance of quexml (and instead use quexmltools, the free online converter to PDF) then you can skip the instructions relating to quexml and tcpdf.

Install Adodb, which is used by quexf to access the database. The GD extension for PHP is also needed.

$ sudo apt-get install libphp-adodb php5-adodb php5-gd

Download and install quexf into a directory on the web server:

$ wget http://downloads.sourceforge.net/project/quexf/quexf/quexf-1.13.5/quexf-1.13.5.zip
$ unzip quexf-1.13.5.zip
$ sudo mv quexf-1.13.5 /var/www/quexf

Download and install quexml and tcpdf into directories on the web server:

$ wget http://downloads.sourceforge.net/project/quexml/quexml/quexml-1.3.12/quexml-1.3.12.zip
$ unzip quexml-1.3.12.zip
$ sudo mv quexml-1.3.12 /var/www/quexml
$ wget http://downloads.sourceforge.net/project/tcpdf/tcpdf_6_0_020.zip
$ unzip tcpdf_6_0_020.zip
$ sudo mv tcpdf /var/www/

Set the permissions on the directories under /var/www. I am setting the owner to the user www-data and group www-data, then adding my user sgordon to the www-data group.

$ cd /var/www
$ sudo chown -R www-data:www-data quexf/ quexml/ tcpdf/
$ sudo chmod 774 quexf/ quexml/ tcpdf/
$ sudo adduser sgordon www-data
Adding user `sgordon' to group `www-data' ...
Adding user sgordon to group www-data
Done.
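
To see what mode 774 means before applying it under /var/www, the same chmod can be tried on a throwaway directory. This is just a sketch: it assumes GNU stat, as shipped with Ubuntu, and the directory created here is only an example that is deleted again at the end.

```shell
# Work in a temporary directory so nothing under /var/www is touched.
demo=$(mktemp -d)
mkdir "$demo/quexf"
chmod 774 "$demo/quexf"

# GNU stat: %a prints the octal mode, %A the symbolic form.
mode=$(stat -c '%a' "$demo/quexf")
symbolic=$(stat -c '%A' "$demo/quexf")
echo "octal: $mode  symbolic: $symbolic"

rm -rf "$demo"
```

Mode 774 gives the owner and the group read, write and execute permission, and everyone else read permission only. Note that chmod without -R changes only the named top-level directories, not the files inside them.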

Setup quexf

Set up PHP for quexf by setting the following variables in /etc/php5/apache2/php.ini (Update: according to the comments, in Ubuntu 14.04 you also need to set short_open_tag to On):


upload_max_filesize = 10M
memory_limit = 128M
post_max_size = 10M
short_open_tag = On
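
After editing php.ini it is worth checking that the values are really set, and not just present in a commented-out line. The helper below is a rough sketch and not part of quexf: ini_value is a made-up function name, and it runs against a temporary sample file so it can be tried anywhere. On the server you would point it at /etc/php5/apache2/php.ini instead. It does not handle trailing comments on a value line.

```shell
# ini_value FILE KEY: print the value of the last uncommented "KEY = value" line.
ini_value() {
    awk -F'=' -v key="$2" '
        /^[ \t]*;/ { next }    # skip lines commented out with a semicolon
        {
            k = $1
            gsub(/[ \t]/, "", k)                  # strip spaces around the key
            if (k == key) {
                v = $2
                gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim the value
                val = v
            }
        }
        END { print val }
    ' "$1"
}

# Sample file standing in for php.ini (hypothetical contents).
ini=$(mktemp)
printf '%s\n' '; short_open_tag = Off' 'upload_max_filesize = 10M' \
    'memory_limit = 128M' 'post_max_size = 10M' 'short_open_tag = On' > "$ini"

limit=$(ini_value "$ini" memory_limit)
tag=$(ini_value "$ini" short_open_tag)
echo "memory_limit=$limit short_open_tag=$tag"
rm -f "$ini"
```

Remember that changes to php.ini only take effect once Apache has been restarted.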

Set up authentication in Apache by first adding the following directives to /etc/apache2/sites-available/default. You can set the username to your own; I used sgordon. If you have multiple users, you can allow a different set of users to access the admin section by listing them in the second Require user directive. In my case, I have just one user.

<Directory "/var/www/quexf">
                AuthType Basic
                AuthName "quexf"
                AuthUserFile /etc/apache2/passwords
                Require user sgordon
</Directory>
<Directory "/var/www/quexf/admin">
                AuthType Basic
                AuthName "quexf admin"
                AuthUserFile /etc/apache2/passwords
                Require user sgordon
</Directory>

Now create the file to store Apache users/passwords and choose a password for your user:

$ sudo htpasswd -c /etc/apache2/passwords sgordon

As the MySQL root user, add a new user called quexf and create the quexf database. Choose a password.

$ mysql -u root -p mysql
mysql> CREATE USER 'quexf'@'localhost' IDENTIFIED BY 'some_password';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE DATABASE quexf;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON quexf.* TO 'quexf'@'localhost';
Query OK, 0 rows affected (0.00 sec)

mysql> exit

Now enter the quexf directory and use the script provided in the database directory to load the tables into the new database.

$ cd /var/www/quexf
$ mysql -u quexf -p quexf < database/quexf.sql

Enter the password for the quexf database user into the file config.inc.php. Also set the DB_HOST to localhost:

define('DB_USER', 'quexf');
define('DB_PASS', 'some_password');
define('DB_HOST', 'localhost');
define('DB_NAME', 'quexf');

Restart Apache for changes to take effect:

$ sudo apache2ctl restart

To check whether quexf has been set up correctly, open your browser and visit: http://localhost/quexf/admin/. You should see the message Passed Configuration Test. If not, check the messages shown to see what failed, and then revisit my instructions and also those provided by quexf.

Now you can start using quexf. Read the Administration Manual for instructions.

Setup quexml

With both quexml and tcpdf installed in the web directory, you first need to configure quexml to use tcpdf. In the file /var/www/quexml/quexmlpdf.php set the two require_once commands to point to the correct location of the tcpdf files:

require_once('/var/www/tcpdf/config/tcpdf_config.php');
require_once('/var/www/tcpdf/tcpdf.php');

Check that quexml and tcpdf have been installed correctly. First visit http://localhost/tcpdf/examples/ and try one of the tcpdf examples. A PDF should be displayed.

Now test quexml by visiting http://localhost/quexml/quexmlpdf_example.php. A ZIP file should be downloaded. It contains a PDF (the questionnaire) and the accompanying banding XML file.

Now you can get started by creating your own XML questionnaire in a text editor, then uploading it via http://localhost/quexml/. For help with the XML format see the Beginners guide and Schema.