GeoFlow is going public

The public preview of GeoFlow is now available here

http://office.microsoft.com/en-us/download-geoflow-for-excel-FX104036784.aspx

I would definitely grab the “Data Explorer” plug in while I were there…(also available here http://office.microsoft.com/en-us/excel/download-data-explorer-for-excel-FX104018616.aspx)

There are plenty of reviews and good tutorials of both products available elsewhere, but in the coming few days I will try to write about where I think these technologies fit in with respect to Self Service BI

 

PowerPivot File Size Limit in Excel 2013

This is more of a note to self as I am constantly self-confusing myself with how much data can be loaded into a PowerPivot model.

Basically, there are two things to consider, version of office one is running and whether a SharePoint deployment option is required.

First, whether one is running a 32 or a 64 Bit version of Excel.  The 32 Bit version can only address 2GB of memory which effectively limits the size of the PowerPivot workbook to about 500-700Mb.  The 64 bit version used to be limited to 4Gb file size in Excel 2010 but in Excel 2013 that limitation has been removed which means that the size of the workbook is no longer limited by the software but rather by the physical configuration of one’s machine.

Unfortunately, should one plan on posting that workbook in SharePoint, the SharePoint size limitation will dictate that only up to a 2GB file can be uploaded.

Here is the link for more information: http://office.microsoft.com/en-us/excel-help/data-model-specification-and-limits-HA102837464.aspx?CTT=5&origin=HA102893837

SAP vs. Microsoft, Information Assets vs. End User Experience positioning

If your company runs SAP, chances are the picture below represents your environment pretty well

The black colors represent the existing investments/business processes and the size of the bubbles represents the effort/investment $$$. Your company has made a very large investment in the SAP back-end (ERP, BW, etc.) which is represented by a large black circle on the lower left. The information worker end-user experience in SAP is represented by a relatively smaller bubble (BobJ, BEx) and we can also see a solid line that connects these two Information Assets (one could argue that this solid line is not quite as solid in reality but that’s not the point I am trying to make here). The important point that I am trying to make however is characterized by a significantly larger bubble that represents the End User experience that the information workers in your company enjoy using Microsoft tools such as Excel, PowerPoint and SharePoint.

Many IT folks seem surprised by the size of that Information Asset but if you think about the herculean work that is done every month across various groups in your organization to dump data out of SAP (BEx, etc.), link and cleanse and scrub that data and then build countless reports in Excel using HLOOKUPs, VLOOKUPs, SUMIFs and whatever other Excel black magic necessary to get the job done, it should become clear that the size of that circle is very well justified.

Unfortunately, many companies tend to completely downplay the significance of the End User experience while pouring millions of dollars in the infrastructure and back-end.

I think that there are at least two reasons why the end-user experience gets no respect from IT:

  1. Disconnect between IT and the business audience – this is a topic that has gotten a lot of attention and coverage and is pretty universally accepted at this point and, therefore, I am not going to dwell on it too much
  2. Philosophical encampment of the SAP ideology within IT which leads to denial and rejection of anything non-SAP

The second argument is unfortunately very commonplace.

Is there a way to reframe this situation such that we can stop bickering and arguing about what vendor/technology is better and instead focus on what is the right thing to do for the company? I think there is…

I think that it is very clear that both vendors, SAP and Microsoft, are very strategic for an organization and therefore we should not think about who is better or worse but rather we should think about how do we leverage and maximize the existing investments with both vendors. It, therefore, seems pretty natural that some Microsoft Office infrastructure must be deployed to create a solid line (i.e. set up the plumbing) between the SAP back-end and front end-user tools (represented by red color on the diagram). Typically, a large organization that has an enterprise agreement with Microsoft will have most of these components already licensed and, frankly speaking, most organizations will have these components already deployed (I am talking about SQL Server, SharePoint, etc.) although not necessarily configured to enable the flow data from SAP into the hands of end users.

Having this Microsoft infrastructure does not facilitate any more replication of data than what is already taking place as the outputs (manually built Excel reports and PowerPoint decks) remain the same, so the information is already replicated from SAP and sourced into these visualizations, albeit manually. So the volume of data movement will remain the same, but all of the non-value adding work will be reduced to the bare minimum. The following is a list of some of the benefits of this approach:

  • Automation of manual data extracts
  • Governance and controllership of the information flow
  • Security and oversight
  • Build Excel reports once and refresh them from the reporting model, no manual updates necessary
  • Same for PowerPoint (build one deck that will refresh from the model with up to date information rendered for the individual user credentials)
  • Empowering end users to own content creation therefore significantly reducing the IT reporting backlog
  • Ultimately – giving the End Users what they want, like and expect

SAP Visual Intelligence vs. Microsoft PowerPivot and PowerView

In this post I will review the latest version of SAP Visual Intelligence – 1.08 – and see how it stacks up against the Microsoft self-service BI tools, Excel, PowerPivot and PowerView. In the initial release, Visual Intelligence only supported SAP Hana, but in its latest iteration few other data sources are supported:

  1. CSV File
  2. SAP Hana
  3. SAPBW as exposed as a view in SAP Hana
  4. MS Excel
  5. Freehand SQL (basically ODBC)
  6. SAP Business Objects Universe

Just to clear some confusion that this list may generate, Visual Intelligence does not access BW objects directly, instead, Hana has some connectivity to connect to BW models, DSOs and Query Snapshots and then expose those as a Hana view, in short, Visual Intelligence knows how to plug into Hana but not BW.

Data import

Since I don’t have Hana running on my laptop, I decided to use a CSV file to load in both Visual Intelligence and PowerPivot. I used a 173MB file that contained about 1.2M records in it. It took just under 30 seconds to import the file:

After that I was able to bring the data into the pivot table and start slicing and dicing it. Loading data in Visual Intelligence wasn’t quite as smooth. Having selected the same file, I had to wait for almost 40 seconds looking at the frozen screen. Then, after the meta data have been read, I clicked on Acquire. After about a minute of looking at the spinning cursor, I was rewarded with a warning message: I was surprised to find out that the acquisition will fetch more than 30 million cells, but having been known to boldly go where no man has gone before, I took my chances and bravely hit the Yes button. Unfortunately, that was as far as this adventure would take me with respect to that data file, as the following message ended my hopes of being able to analyze a meekly 1.2M of rows of data. I have not been able to find the theoretical cell limit for Visual Intelligence, all I know is that my data file had 1.2M of rows and fewer than 30 columns, but it was clearly too much.

Lesson #1: Do not plan to use Visual Intelligence on datasets larger than 1M rows.

I therefore had to resort to a much smaller file for my testing, so I used some baseball stats CSV file with 95K rows. I was able to finally load that file into Visual Intelligence after about 20 second wait… a similar task in Excel took 7 seconds.

Data Exploration

Before I jump into data exploration, we have to take a quick pause and think about the audience for it. I think it makes sense to partition the audience up by technical vs non-technical users (IT vs the rest of the world) and I am going to ignore the technical audience altogether in this review because I think that the primary market for data exploration is the non-technical audience. Also, I have to divide that non-technical audience in two groups, Power User and Novice. I realize that most people are actually somewhere in between, but I think that reviewing these tools in the light of those two outliers will give us the best context for forming an opinion. Another thing I have to say that I am probably a PowerUser outlier within the PowerUser group and therefore it is very difficult for me sometime to just see this tool from the complete Novice perspective, but I will definitely try to do my best. So, here is what I think about my data exploration experience. Visual Intelligence does a decent job of inferring measures, attributes and hierarchies out of the data. When I say decent I mean that to somebody who has completely no clue about data, does not know how to sum data up in excel, it does a job good enough to get one going. For me, however, it was a very frustrating experience as the tool tried to do everything backwards from the way I am used to. For fairness sake, I am sure that if I spent more time using the tool, I am pretty sure I would be able to get comfortable with it. But being as it was, I was able to create a Line chart and a Bar chart that looked like this Visual Intelligence has a number of different chart types available, some of which are not available in excel:

The chart I was excited the most was the Waterfall chart as it is not offered in excel, although it is possible to create one with a trick that I will have to write about in another post. But, this is where I ran into my second biggest disappointment with the product, you can only create one chart at a time. In other words, one cannot create a dashboard that would show a trend chart and a donut chart on the same page.

Lesson #2: Visualizations are limited to only one chart at a time.

Modeling

This is my biggest disappointment with the product which, frankly speaking renders it virtually useless for a power user. Modeling is practically not existent in Visual Intelligence. Although the tool does a good enough job for a Novice level user with some basic calculations and hierarchies, it is impossible to add additional data to the model. In my case, I was able to load the batting data into Visual Intelligence, but I was not able to load any team related information as a dimension so I could slice the batting data by team attributes. Essentially, the only way to do this would have been to add all of the team attributes to the batting file, but that would dramatically increase the number of cells in the data set and we already know that the tool does not like that very much.

Lesson #3: Visual Intelligence offers virtually no modeling capabilities outside of basic calculations and hierarchies.

Collaboration

After one creates a particularly good looking chart (we already know that anything beyond that is not supported in VI) one has find a way to share one’s work with the world. In the Microsoft world, an Excel file that contains both a PowerPivot model (data) and traditional excel charts or PowerView dashboards/reports, that file can be simply saved into SharePoint that then will render all of the visualization to and end user as a web page. No such luck with Visual Intelligence as its visualizations are saved as just in image in to the Streamwork portal and the end user will not be able to interact with it online. Lesson #4: Visual Intelligence is a fat client solutions, no collaboration or online viewing is available.

Random tidbits

  • Calculations in VI are very simple, nothing like PowerPivot with respect to time intelligence, financial intelligence, balance sheet logic, etc.
  • Charts in VI are not drillable, they are in PowerView
  • No drilling to detail in VI, there is one in Excel
  • No dimensional modeling in Visual Intelligence, there is on in PowerPivot
  • Tree map is available in VI, not in Excel, heat maps are available in both, although more intuitive in VI than excel (conditional formatting)

…. More things to come later tonight…

How to think strategic about Business Intelligence

Many people reach out to me asking for advice in helping them evaluate various BI platforms for their companies. In my mind I am obviously the wrong guy to ask as I am not shy to admit my fairly heavy bias towards Microsoft.  However, over the course of the last twenty years, I have been exposed to just about every major BI player on the market and I think that my bias is well justified as it is actually based on a very pragmatic and rational set of criteria.

Having said that, I find it very interesting, however, that a lot of companies go about selecting their Business Intelligence platform in a very non-strategic, irrational and tactical way. I, therefore, decided to compile a quick list of things that I think would be important to consider as these evaluations take place.

Hardware evolution

The changes that are taking place on the hardware front are very important. Majority of the top enterprise BI vendors will say that some sort of a BI appliance (Hana, Exadata, and PDW just to name a few) is absolutely necessary to catch up with the ever increasing data volumes. I happen to think that as a general rule this argument is completely ridiculous. As a rule of thumb, hardware is commoditized on a daily basis. A $20k server today will likely have 32+ cores and up to a TB of RAM. Few years ago, similar hardware would require a $1M investment. Today I routinely load hundreds of millions of rows of data in a PowerPivot model that runs on my laptop allowing me to do the kind of analysis that required something like Teradata + Microstrategy just a few years ago. My recommendation would be to exhaust the commodity hardware options before an appliance is even considered.

Self Service Vs. Enterprise

Much of my time is spent fighting philosophical battles with various IT organizations around this topic. IT wants to be in control and own the content creation (they want to build all dashboards and reports themselves). At the same time, I have yet to see an IT shop that did not have a huge backlog of reports and an upset customer (business) who is starving for data and reporting tools.  I therefore recommend reframing the role of IT in the business intelligence strategy of a company this way: IT will own data, security, governance and plumbing required to wire up all the pieces of the BI architecture, business will own all (well, most…) of the content creation. IT would be required to have someone on staff who can assist business customers if they get stuck but whatever BI tool an organization is looking at, it should be self-service focused first and foremost.

Platform consideration

Over the last few years the BI landscape has changed very significantly. One of the major changes was the movement away from the “best in breed” and towards the “platform buy”. Generally speaking, this makes sense. A platform buy will sacrifice on some features and usability for the potential gains in interoperability. In reality, however, out of the four big vendors, IBM, Oracle, SAP and Microsoft, only one can honestly say that all of its offerings are wired together natively.

Bill of materials

Many people don’t think about this aspect of the decision, but I think it is very important to consider how many different products and installs will have to be stood up in order to get the BI platform implemented. All but one vendor will typically have a separate tool (or most likely several different tools that came from different acquisitions) for different aspects of BI. So an ETL, Dashboarding, Reporting, Analysis, Master Data, Data Quality, Predictive Analysis, Data Warehousing, Semantic Layer, Self Service tools, Data Exploration, Drill-though, Report bursting, Knowledge management, Social, and many other elements of business intelligence platform will often require a different infrastructure investment, install management and most importantly a separate license to obtain.

Conclusion

This is obviously not a complete list of things to consider for someone who is trying to evaluate a BI platform but I think it is a good start. When I am involved in an evaluation process like this, my number one goal is to spend as much of the budget $$$ as possible on content – dashboards, reports, analytical views – something that my end users will get to use on a day-to-day basis as opposed to adding more hardware to the data centers or software on a balance sheet.

Customizing PowerView in PowerPoint

Although “customizing” may be too strong of a word, there are few things that we can tweak to improve the look and feel of PowerView slides that had just been exported from SharePoint. The first thing to do after the fresh export is to adjust the size of the exported image. For some reason as you can see in the image below, the PowerView image does not take up the entire slide, leaving some white edges around which would not be a problem if the PowerView image has a white background, but in my case, I have used a grey background in PowerView and it does not look good on a white PowerPoint background.

Luckily, this can be easily remedied by just clicking on the image and resizing it to fit the entire slide. That helps while presenting the static slide, however, after one clicks the “Interact” button the, unsightly white lines come back again around the edges of the slide.

The only way that I found to make that white space go away is to do the following:

  1. Right click on the PowerView image and click on Property Sheet
  2. A property screen will come up with the following parameters: ViewMode=CachedPreview,ReportSection=ReportSection2,AllowSectionNavigation=False,Fit=True,PreviewBar=False,BackgroundColor=White,Border=True,AllowEditViewMode=False,AllowFullScreenViewMode=False,Trace_HostApplication=Powerpoint
  3. Note that the BackgroundColor is set to White which is why we are see those white lines on the edges of our visualization
  4. Change the background to black or any other color that matches the background of your PowerView slide
  5. Don’t forget to hit save :)

That should be all you need to do to make sure that your PowerView looks nice in PowerPoint if its background is anything but white.

The rest of the parameters are interesting as well…

For example, by default, each PowerView view is exported into a separate slide, so one needs to physically move from one slide to another and click on “Interact” again and again. Setting the AllowSectionNavigation to True allows us to navigate from one PowerView view to another from the same slide, which is similar to the end-user experience in SharePoint. Both, Thumbnail button, Font Selection as well as the Navigation Arrows now become available


Microsoft BI ranks highest in ability to execute by Gartner

Microsoft once again is ranked the highest in its ability to execute by Gartner in its magic quadrant for Business Intelligence.  The report is pretty long and boring, but if I were distill it down to a few salient points, I would choose the following:

  1. Companies see a dramatic improvement in BI capabilities going from SQL Server 2008 R2 to SQL Server 2012
  2. Microsoft got a node for some of the upcoming stuff including Explorer, GeoFlow and PolyBase
  3. Gartner mentioned “shortening product update cycles” for Microsoft SQL, Office and Sharepoint
  4. “it is widely deployed in large enterprises as a standard with among the highest data volumes and user counts”
  5. Microsoft is dinged on basically two fronts, mobility and CPM.

Some comments on the CTP of the Microsoft SQL Server 2012 With PowerView For Multidimensional Models

I have had a pretty frustrating experience with it so far.  I think that the only way get it to work is to have it deployed as a SSRS Service Application inside SharePoint and then configure PowerView.  In other words, if you are planning to install a multidemensional instance of the CTP, install your OLAP cube and then enjoy some PowerView magic from Excel (which is pretty close to what I was hoping I should be able to do), you are likely to experience the same level of frustration and dissapointment that I just have.

I wish this link was a little bit more descriptive with respect to what PowerView functionality will and will not be supported with this release.

Microsoft SQL Server 2012 With PowerView For Multidimensional Models is available in CTP

Microsoft demonstrated PowerView on top of the Multidimensional model in Analysis Services several weeks ago at SQL PASS and now the CTP version of the product is available here.

There are two pieces of new functionality delivered with this CTP:

  1. Update for Analysis Services engine to allow DAX support for Multidimensional models
  2. Update for the Reporting Services Integrated mode for PowerView. One new feature is ability to create Global Filters that span multiple PowerView Tabs

SAP Hana vs Microsoft SQL Server, the war is on…

and the Germans are making the first move… I have written two posts about Hana vs. xVelocity (Part I and Part II) making the basic point that there is nothing truly visionary or revolutionary in SAP Hana from the technology perspective and that a customer can get all the benefits (or rather alleged benefits as many early Hana adopters will attest to) and the same or better level of performance on a commodity hardware without having to pay millions for some hardware appliance. I was not saying that there is anything particularly wrong with Hana, although it is obviously still an immature product since there is not a single SAP customer running its SAP ERP in production on it yet. Rather, my point was that there is no need to pay millions of dollars for functionality that is readily available for a lot less.

SAP’s official target with Hana until 6:46:59 AM today was Oracle, but it seems that Microsoft SQL Server is now officially a target as well, which at the very least is a good indication of how seriously SAP is taking Microsoft’s in-memory story.

I am still working on my Part III of the xVelocity vs Hana saga, so I just wanted write up a quick critique on the aforementioned post from SAP:
 

  • There are lots of technical reasons to believe that HANA is a far superior product today than what Microsoft has announced will be available a few years out” – again, not a single link to a reference to this bold claim. Last time I checked, 40% or so of SAP deployments run on Microsoft SQL Server as compared to 0% that run on Hana, so calling Hana a “far superior product” is just silly. But why provide evidence when misinformation will suffice? Here are some more unsubstantiated pearls:
    • The SQL Server products require a batch process to build the index” (obviously he has no clue how clustered indexes work)
    • OLTP product does only OLTP so a redundant copy of the data is required” (this is so not true it made me laugh)
    • There are odd SQL limitations that require hand-tuning to get it all to work” (seriously? Odd limitations? How odd??? And what are they??????)
  • The third paragraph of the post gives us a good insight on what SAP thinks of what a DBA’s job description is. Unfortunately it also makes it apparent that whoever wrote that piece has no clue about what real DBAS do, and more importantly what they don’t do (build cubes? re-architect the eco-system?). Clearly, the article is not intended for a technical audience.
  • Shared-nothing gave us the basis for “big data” databases” this is an oxymoronic statement, by definition, big data is something that is too big to be stored in memory and SAP itself positions Hadoop is the big data answer. I am starting to get a feeling that this article was written by a marketing guy who does not even know SAP’s own products and how they work.
  • The starting point for HANA is based on the recognition that the current and upcoming hardware technologies are capable of solving for all of this in a single database instance if only the database was re-written to fully utilize the hardware” – this is just silly, give us at least one example to prove this point. HANA does not even support the newest Intel Ivy bridge processors, how fully optimized for hardware is it really??? Sounds like SAP’s strategy is for IT departments to become hostages of the hardware vendors.
  • I am going to skip the apples to oranges comparison between the two test cases from Microsoft and SAP because other then they are obviously not apple to apples I don’t have (and the article does not provide) and factual information to compare and contrast the two

And then the author is closing with stating that Microsoft SQL Server’s “1995 architecture was designed for single-core x486 systems with 256MB of RAM”. This is not cool because it’s just a straight up lie. But it is also ironic because the so-called innovative Hana platform according to this Wiki page is cobbled from several not so new technologies:

  1. MaxDB also known as Adabas – a 35 year old database that used to only live on a mainframe
  2. TREX – at least 15 years old
  3. P*Time – acquired by SAP (notice a pattern there, all of the IP in HANA is acquired) in 2005

So it does not take a lot of research to understand that if one takes the marketing fluff out, there is nothing really innovating or new in Hana, nothing that has not existed or been available form many other vendors for a very long time. This SAP blog post is not very significant in making a good case for HANA, but my guessis it was not really intended to. The real intention is for the SAP’s marketing department to officially declare war on Microsoft SQL Server, now it’s up to Microsoft to make the next move.