How to use JasperReports Server Pro Analytics to analyze Twitter data

System Requirements

Here is the software I used:

  • JasperReports Server Professional 5.0.1
  • Jaspersoft ETL Plus 5.2.2
  • tTwitter connectors and app account credentials
  • MongoDB 2.4.1

I installed JasperETL on my Windows 7 desktop using the All-In-One installer, and set up a VM instance running Debian 6 to run JasperReports Server and MongoDB. This was purely personal preference; there's no reason all of these components couldn't be installed on Windows alone or on Debian alone.

You can download the complete job (for Jaspersoft ETL Plus 5.2.2) at: https://jaspersoft.egnyte.com/dl/fJKfg29yKB

Software Setup

I followed these instructions to set up MongoDB:

http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian/

There are similar instructions that link from this page to install MongoDB on other platforms.

I then set up a Twitter application in order to use the application API. The application API can pull more tweets at a time than simple web searches, but it requires OAuth authentication. You can set up a Twitter application here:

https://dev.twitter.com/apps/new

Once you've set up your application, you'll receive your Consumer Key, Consumer Secret, Access Token and Access Token Secret credentials. Copy and paste these into a text file, because you'll need them to configure the tTwitterOAuth component.

After installing JasperETL, I found and installed some excellent community-contributed tTwitter components available here:

Once you download these components, extract the contents of the Zip archives into the /studio/JETLPlus-r-V5.2.2/org.talend.designer.components.localprovider_5.2.2.r/components folder, relaunch the Studio and create an ETL job. The components should appear in the palette under Social Analytics/Twitter.

Creating the ETL Job

To use these components in a job, just drag and drop them into the job design area and link them together using OnSubjobOk triggers:

Click the tTwitterOAuth component, select the Component tab and enter the credentials for your Twitter app. Then click the tTwitterInput component and open its Component tab. Here, you can define the search query used to pull tweets. You can either use the query builder, providing search terms and combining them with AND or OR logic, or you can provide a raw query:
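If you go the raw-query route, the AND/OR combinations map onto Twitter's search syntax. The sketch below is a hypothetical helper, not part of the tTwitter components; it assumes Twitter's standard search operators, where a space between terms means AND, the uppercase keyword OR means OR, and double quotes match an exact phrase. The search terms are placeholders.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for building a raw Twitter search query.
// Assumes Twitter's standard search syntax: a space between terms means AND,
// the uppercase keyword OR means OR, and double quotes match an exact phrase.
public class TwitterQuery {

    // Quote multi-word terms so they are matched as a phrase.
    static String quote(String term) {
        return term.contains(" ") ? "\"" + term + "\"" : term;
    }

    // Tweets matching ANY of the terms.
    static String anyOf(List<String> terms) {
        return terms.stream().map(TwitterQuery::quote).collect(Collectors.joining(" OR "));
    }

    // Tweets matching ALL of the terms.
    static String allOf(List<String> terms) {
        return terms.stream().map(TwitterQuery::quote).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Placeholder terms, not the ones used in the original job.
        System.out.println(anyOf(List.of("jaspersoft", "ad hoc")));  // jaspersoft OR "ad hoc"
        System.out.println(allOf(List.of("mongodb", "analytics")));  // mongodb analytics
    }
}
```

The resulting string is what you would paste into the raw query field; the query builder produces the same kind of combination from its AND/OR settings.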

The tTwitterInput component can output either structured data or raw JSON. I chose the structured option and fed the output to a tMap component. In the tMap component, you can choose which fields from the structured Twitter data to include, apply filters, and do some data processing. In this example, I filtered out retweets and did some simple text parsing to identify the platform used to submit the tweet:

Expression for the platform variable:
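As a minimal sketch of this kind of parsing, the plain Java below pulls the platform name out of a tweet's source field. It assumes the source field holds either plain text such as web or an HTML anchor whose link text names the client; the class and method names are hypothetical, and the retweet check is a simple text-prefix heuristic rather than the exact filter used in the job.

```java
// Hypothetical stand-ins for the tMap retweet filter and "platform" variable.
// Assumes the tweet's source field is either plain text (for example "web")
// or an HTML anchor such as <a href="..." rel="nofollow">Twitter for iPhone</a>.
public class TweetParsing {

    // Heuristic retweet check: old-style retweets start with "RT @".
    // If the structured schema exposes a retweet flag, filtering on it is more reliable.
    static boolean looksLikeRetweet(String text) {
        return text != null && text.startsWith("RT @");
    }

    // Returns the anchor text of the source field, or the field itself if it is not a link.
    static String platformOf(String source) {
        if (source == null || source.isEmpty()) {
            return "unknown";
        }
        int open = source.indexOf('>');          // end of the opening <a ...> tag
        int close = source.lastIndexOf("</a>");  // start of the closing tag
        if (open >= 0 && close > open) {
            return source.substring(open + 1, close).trim();
        }
        return source.trim();
    }

    public static void main(String[] args) {
        // Example value only; the href is a placeholder.
        String iphone = "<a href=\"http://example.com/client\" rel=\"nofollow\">Twitter for iPhone</a>";
        System.out.println(platformOf(iphone));                   // Twitter for iPhone
        System.out.println(platformOf("web"));                    // web
        System.out.println(looksLikeRetweet("RT @someone: hi"));  // true
    }
}
```

Inside tMap the same logic would usually be collapsed into a single ternary expression on the incoming source column, since tMap variables take one Java expression each.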

I also capture the subject of the tweet, based on which of the two search terms it contains. Since the search terms are case-insensitive but the standard Java String.contains() method is case-sensitive, I used the case-insensitive containsIgnoreCase() method from the Apache Commons Lang StringUtils class. The commons-lang3 jar file is already used by other JasperETL components, but I still needed to add a tLibraryLoad component to make sure it's available to this job:
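A JDK-only sketch of the subject logic follows, so it can run without the commons-lang3 jar. For ASCII search terms, lower-casing both strings with a fixed locale gives the same answer as StringUtils.containsIgnoreCase(). The two terms and the "both"/"other" labels are hypothetical placeholders, not the values from the original job.

```java
import java.util.Locale;

// JDK-only stand-in for the tMap "subject" expression. The original job used
// StringUtils.containsIgnoreCase() from Apache Commons Lang 3; for ASCII search
// terms, lower-casing both strings with a fixed locale gives the same answer.
// TERM_A, TERM_B and the "both"/"other" labels are hypothetical placeholders.
public class SubjectDetector {

    static final String TERM_A = "jaspersoft";
    static final String TERM_B = "mongodb";

    // Locale.ROOT avoids surprises such as the Turkish dotless i on machines
    // whose default locale is not English.
    static boolean containsIgnoreCase(String text, String term) {
        if (text == null || term == null) {
            return false;
        }
        return text.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    // Labels a tweet by which search term(s) it mentions.
    static String subjectOf(String text) {
        boolean a = containsIgnoreCase(text, TERM_A);
        boolean b = containsIgnoreCase(text, TERM_B);
        if (a && b) {
            return "both";
        }
        if (a) {
            return TERM_A;
        }
        return b ? TERM_B : "other";
    }

    public static void main(String[] args) {
        System.out.println(subjectOf("Trying out MongoDB today"));    // mongodb
        System.out.println(subjectOf("JasperSoft + MONGODB = fun"));  // both
        System.out.println(subjectOf("Unrelated tweet"));             // other
    }
}
```

Once tLibraryLoad makes the jar available, a tMap expression can call the library version directly by its fully qualified name, org.apache.commons.lang3.StringUtils.containsIgnoreCase(...), without needing an import.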

Note that the hashtags field is a List. I used a tConvertType component to convert it to a string, then fed the output to a tNormalize component to generate a separate row for each hashtag. As a result, a tweet that contains multiple hashtags appears in multiple rows, which would inflate the tweet counts. I accounted for this by turning the original count into a hashtag count and adding a separate tweet count, which is zero when the tweet text repeats the text of the previous row. I did this with another tMap component:
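The flatten-then-count logic can be sketched in plain Java as follows. The sketch assumes tConvertType renders the List as a string like [jasper, mongodb] (the List.toString() format), that tNormalize splits it on commas, and that rows for the same tweet arrive one after another, so a previousTweet value carried from row to row can spot repeats. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the hashtag flattening and counting steps, assuming:
//  - tConvertType turns the hashtag List into a string like "[jasper, mongodb]"
//    (the format produced by List.toString()),
//  - tNormalize splits that string on commas into one row per hashtag,
//  - rows for the same tweet arrive one after another.
// Class and method names are hypothetical.
public class HashtagCounts {

    static final class Row {
        final String text;
        final String hashtag;
        final int hashtagCount; // always 1: each row is one hashtag
        final int tweetCount;   // 1 on the first row of a tweet, 0 on repeats

        Row(String text, String hashtag, int hashtagCount, int tweetCount) {
            this.text = text;
            this.hashtag = hashtag;
            this.hashtagCount = hashtagCount;
            this.tweetCount = tweetCount;
        }
    }

    // tConvertType + tNormalize: one {text, hashtag} pair per hashtag.
    // A tweet without hashtags ("[]") keeps one row with an empty hashtag.
    static List<String[]> flatten(List<String[]> tweets) {
        List<String[]> rows = new ArrayList<>();
        for (String[] tweet : tweets) {
            String tags = tweet[1].trim();
            if (tags.startsWith("[") && tags.endsWith("]")) {
                tags = tags.substring(1, tags.length() - 1);
            }
            for (String tag : tags.split(",", -1)) {
                rows.add(new String[] {tweet[0], tag.trim()});
            }
        }
        return rows;
    }

    // Second tMap: previousTweet carries the text of the preceding row,
    // so the tweet count is 1 only when the text changes.
    static List<Row> count(List<String[]> rows) {
        List<Row> out = new ArrayList<>();
        String previousTweet = null;
        for (String[] row : rows) {
            int tweetCount = row[0].equals(previousTweet) ? 0 : 1;
            out.add(new Row(row[0], row[1], 1, tweetCount));
            previousTweet = row[0];
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> tweets = List.of(
                new String[] {"New release #jasper #mongodb", "[jasper, mongodb]"},
                new String[] {"No tags here", "[]"});
        for (Row r : count(flatten(tweets))) {
            System.out.println(r.hashtag + " hashtags=" + r.hashtagCount + " tweets=" + r.tweetCount);
        }
    }
}
```

Summing hashtagCount then counts hashtag mentions, while summing tweetCount counts each tweet once. Because the comparison is on tweet text, two different tweets with identical text arriving back to back would be counted once; keying on the tweet ID instead would avoid that, if the ID is part of the tMap output.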

I then used a tFilterColumns component to remove the previousTweet column, and then sent the final output to MongoDB.  Here's a screenshot showing the complete job:

Since this job is using some external components, it's necessary to export the job as an Autonomous Job before it can run on the JasperETL Administration Console.  Once this is done, you can create a job definition on the Job Conductor page and import the job archive:

I then created a simple job trigger that runs once an hour and retrieves 1500 tweets into MongoDB on each run.

Setting up the Ad Hoc Topic

I created a simple topic in iReport that makes the data available in the Ad Hoc editor. The MongoDB query was pretty simple:

The _id field causes the Ad Hoc editor to crash, so I deleted it from the topic before uploading it.

Running Analytics in the Ad Hoc Editor

Once the topic is created and uploaded to JasperReports Server, we can analyze Twitter data! Here is a screenshot of a simple scenario showing the total number of tweets by date and platform:
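The Ad Hoc editor does this aggregation itself, but a small Java sketch makes clear what such a view totals and why the tweet count field matters. It assumes each stored row carries a date, a platform and the tweetCount produced by the ETL job; summing tweetCount, rather than counting rows, keeps a tweet that was split into several hashtag rows from being counted more than once. The class names and sample rows are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the aggregation behind a "tweets by date and platform" view.
// Assumes each stored row has a date, a platform and the tweetCount field
// produced by the ETL job. Class and field names are hypothetical.
public class TweetTotals {

    static final class Doc {
        final String date;
        final String platform;
        final int tweetCount;

        Doc(String date, String platform, int tweetCount) {
            this.date = date;
            this.platform = platform;
            this.tweetCount = tweetCount;
        }
    }

    // date -> (platform -> total tweets), both levels sorted by key.
    // Summing tweetCount instead of counting rows avoids double-counting
    // tweets that produced several hashtag rows.
    static Map<String, Map<String, Integer>> totals(List<Doc> docs) {
        Map<String, Map<String, Integer>> byDate = new TreeMap<>();
        for (Doc d : docs) {
            byDate.computeIfAbsent(d.date, k -> new TreeMap<>())
                  .merge(d.platform, d.tweetCount, Integer::sum);
        }
        return byDate;
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration only.
        List<Doc> docs = List.of(
                new Doc("2013-04-15", "web", 1),
                new Doc("2013-04-15", "web", 0), // second hashtag row of the same tweet
                new Doc("2013-04-15", "Twitter for iPhone", 1),
                new Doc("2013-04-16", "web", 1));
        System.out.println(totals(docs));
        // {2013-04-15={Twitter for iPhone=1, web=1}, 2013-04-16={web=1}}
    }
}
```

Counting rows instead would report two web tweets on the first date, which is exactly the inflation the separate tweet count was designed to prevent.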
