Are Master Data Management and Hadoop a Good Match?

Master Data is the critical electronic information about the company that we cannot afford to lose. Accordingly, we should sanitise it, look after it, and store it safely in several separate places that are independent of each other. The advent of Big Data introduced the current era of huge repositories “in the clouds”. They are not in the clouds, of course, but at least they are remote. This short article discusses Hadoop and whether it is a good platform for backing up your Master Data.

About Hadoop

Hadoop is an open-source Apache software framework built on the assumption that hardware failure is so common that backups are unavoidable. It comprises a storage area (the Hadoop Distributed File System, HDFS) and a management part (MapReduce) that distributes the data to smaller nodes where it processes faster and more efficiently. Prominent users include Yahoo! and Facebook. In fact, more than half of the Fortune 50 companies were using Hadoop in 2013.

Hadoop, whose version 1.0 was released in December 2011, has survived its baptism of fire and become a respected, reliable option. But is this something the average business owner can tackle on their own? Bear in mind that open source software generally comes with little implementation support from the vendor.

The Hadoop Strong Suit

  • Free to download, use and contribute to
  • Everything you need “in the box” to get started
  • Distributed across multiple fire-walled computers
  • Fast processing of data held in efficient cluster nodes
  • Massive scalable storage you are unlikely to run out of

Practical Constraints

There is more to Hadoop than writing to WordPress. The most straightforward solutions are uploading using Java commands, obtaining an interface mechanism, or using third-party vendor connectors such as SAS/ACCESS. The system is cheap and exceptionally powerful, but it does not replace the need for IT support.
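As a rough illustration of the command-based route, the sketch below wraps Hadoop's own `hdfs dfs -put` shell command from Python. It assumes a configured Hadoop client is available on the PATH. The file name and HDFS directory are hypothetical, chosen only for illustration.

```python
import subprocess

def build_put_command(local_path, hdfs_dir, overwrite=False):
    """Build the `hdfs dfs -put` command that copies a local file into HDFS."""
    command = ["hdfs", "dfs", "-put"]
    if overwrite:
        command.append("-f")  # -f replaces the destination if it already exists
    command += [local_path, hdfs_dir]
    return command

def upload(local_path, hdfs_dir, overwrite=False):
    """Run the upload; raises CalledProcessError if the hdfs client reports failure."""
    subprocess.run(build_put_command(local_path, hdfs_dir, overwrite), check=True)

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(build_put_command("master_data.csv", "/backups/master_data/")))
```

Even a wrapper this small needs someone to install and configure the client, manage credentials and monitor failures, which is where the IT support comes in.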

The Not-Free Safer Option

Smaller companies without in-depth in-house support are wise to engage with a technical intermediary. There are companies providing commercial implementations followed by support. Microsoft, Amazon and Google among others all have commercial versions in their catalogues, and support teams at the end of the line.

Check our similar posts

New Focus on Monitoring Soil

There is nothing new about monitoring soil in arid conditions. South Africa and Israel have been doing it for decades. However, climate change has increased its urgency as the world comes to terms with pressure on the food chain. Denizon decided to explore trends at the macro, first-world level and the micro, third-world one.

In America, the Coordinated National Soil Moisture Network is going ahead with plans to create a database of federal and state monitoring networks and numerical modelling techniques, with an eye on soil-moisture database integration. This is a component of the National Drought Resilience Partnership that slots into Barack Obama’s Climate Action Plan.

This far-reaching program touches every corner of American life to address the twin scourges of drought and inundation, and the agency director has called it “probably … one of the most innovative inter-agency tools on the planet”. The pilot project, involving remote moisture sensing and satellite observation, targets Oklahoma, North Texas and surrounding areas.

Africa has similar needs but lacks America’s financial muscle. Princeton University ecohydrologist Kelly Caylor is bridging the gap in Kenya and Zambia by using cell phone technology to transmit ecodata collected by low-cost “pulsepods”.

He deploys the pods, each about the size of a smoke alarm, to measure plants and their environment. Aspects include soil moisture to estimate how much water the plants are using, and sunlight to approximate the rate of photosynthesis. Each pod holds seven to eight sensors, can operate on or above the ground, and transmits the data via SMS.

While the system is working well at academic level, there is more to do before the information is useful to subsistence rural farmers living from hand to mouth. The raw data stream requires interpretation, and the analysis must come through trusted channels, most likely the government and tribal chiefs. Kelly Caylor cites the example of a sick child: the temperature reading has no use until a trusted source interprets it.

He has a vision of climate-smart agriculture where tradition gives way to global warming. He involves local farmers in his research by enrolling them when he places pods, and asking them to send him weekly weather reports by SMS, which he correlates with the sensor data. As trust builds, he hopes to help them choose more climate-friendly crops and learn how to reallocate labour as seasons change.

The Better Way of Applying Benford’s Law for Fraud Detection

Applying Benford’s Law to large collections of data is an effective way of detecting fraud. In this article, we’ll introduce you to Benford’s Law, talk about how auditors are employing it in fraud detection, and introduce you to a more effective way of integrating it into an IT solution.

Benford’s Law in a nutshell

Benford’s Law states that certain data sets – including certain accounting numbers – exhibit a non-uniform distribution of first digits. Simply put, if you gather all the first digits (e.g. 8 is the first digit of 814 and 1 is the first digit of 1768) of all the numbers that make up one of these data sets, the smaller digits will appear more frequently than the larger ones.

That is, according to Benford’s Law,

1 should comprise roughly 30.1% of all first digits;
2 should be 17.6%;
3 should be 12.5%;
4 should be 9.7%, and so on.
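These percentages come from the standard formula for Benford’s Law, P(d) = log10(1 + 1/d), where d is the first digit. A minimal sketch that reproduces the table for all nine digits (the function name is only illustrative):

```python
import math

def benford_expected(digit):
    """Expected share of first digits equal to `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Print the expected first-digit frequencies for digits 1 to 9.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The nine shares add up to exactly 100%, because the sum of the logarithms collapses to log10(10).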

Notice that the 1s (ones) occur far more frequently than the rest. Those who are not familiar with Benford’s Law tend to assume that all digits should be distributed uniformly. So when fraudulent individuals tinker with accounting data, they may end up putting in more 9s or 8s than there actually should be.

Once an accounting data set is found to show a large deviation from this distribution, auditors move in to make a closer inspection.
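One common way to put a number on “a large deviation” is a chi-square goodness-of-fit statistic over the nine first digits. The sketch below is one possible approach, not necessarily the test any particular auditor uses. The function names are illustrative, and the 5% cut-off of 15.507 (the chi-square critical value for 8 degrees of freedom) is a conventional choice rather than a rule.

```python
import math
from collections import Counter

# Conventional 5% critical value for a chi-square test with 8 degrees of freedom.
CHI_SQUARE_5_PERCENT = 15.507

def first_digit(value):
    """Return the leading non-zero digit of a number, or None for zero."""
    value = abs(value)
    if value == 0:
        return None
    # Scientific notation puts the leading digit first, e.g. 814 -> "8.14...e+02".
    return int(f"{value:.15e}"[0])

def benford_deviation(amounts):
    """Compare observed first-digit shares with Benford's expected shares.

    Returns (chi_square, rows), where each row is (digit, observed, expected).
    """
    digits = [d for d in map(first_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("no non-zero amounts to analyse")
    counts = Counter(digits)
    total = len(digits)
    chi_square = 0.0
    rows = []
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts[d] / total
        chi_square += total * (observed - expected) ** 2 / expected
        rows.append((d, observed, expected))
    return chi_square, rows

if __name__ == "__main__":
    # Illustrative data only: powers of two are known to track Benford's Law closely.
    chi_square, _ = benford_deviation([2 ** n for n in range(100)])
    print("needs a closer look:", chi_square > CHI_SQUARE_5_PERCENT)
```

A statistic above the threshold does not prove fraud. It only flags the data set for a closer look.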

Benford’s Law spreadsheets and templates

Because Benford’s Law has been proven to be effective in discovering unnaturally-behaving data sets (such as those manipulated by fraudsters), many auditors have created simple software solutions that apply this law. Most of these solutions, owing to the fact that a large majority of accounting departments use spreadsheets, come in the form of spreadsheet templates.

You can easily find free downloadable spreadsheet templates that apply Benford’s Law as well as simple How-To articles that can help you to implement the law on your own existing spreadsheets. Just Google “Benford’s law template” or “Benford’s law spreadsheet”.

I suggest you try out some of them yourself to get a feel for how they work.

The problem with Benford’s Law when used on spreadsheets

There’s actually another reason why I wanted you to try those spreadsheet templates and How-To’s yourself. I wanted you to see how susceptible these solutions are to trivial errors. Whenever you work on these spreadsheet templates – or your own spreadsheets for that matter – when implementing Benford’s Law, you can commit mistakes when copy-pasting values, specifying ranges, entering formulas, and so on.

Furthermore, some of the data might be located in different spreadsheets, which can likewise be found in different departments and have to be emailed for consolidation. The departments that own this data will have to extract the needed data from their own spreadsheets, transfer it to another spreadsheet, and send it to the person in charge of consolidation.

These activities can introduce errors as well. That’s why we think that, while Benford’s Law can be an effective tool for detecting fraud, spreadsheet-based working environments can taint the entire fraud detection process.

There’s actually a better IT solution where you can use Benford’s Law.

Why a server-based solution works better

In order to apply Benford’s Law more effectively, you need to use it in an environment that implements better controls than what spreadsheets can offer. What we propose is a server-based system.

In a server-based system, your data is placed in a secure database. People who want to input data or access existing data will have to go through access controls such as login procedures. These systems also have features that log access history so that you can trace who accessed what and when.

If Benford’s Law is integrated into such a system, there would be no need for any error-prone copy-pasting activities because all the data is stored in one place. Thus, fraud detection initiatives can be much faster and more reliable.
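To make the idea concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for the central database. The table names, column names and user name are hypothetical, and a real server-based system would add authentication in front of this. The point is only that the first-digit check reads the data where it lives and records who ran it.

```python
import sqlite3
from collections import Counter

def build_ledger(amounts):
    """Load amounts into an in-memory SQLite database (stand-in for the server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
    conn.execute(
        "CREATE TABLE access_log ("
        "username TEXT NOT NULL, activity TEXT NOT NULL, "
        "accessed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.executemany("INSERT INTO ledger (amount) VALUES (?)", [(a,) for a in amounts])
    conn.commit()
    return conn

def first_digit_counts(conn, username):
    """Count first digits straight from the database and log who ran the check."""
    conn.execute(
        "INSERT INTO access_log (username, activity) VALUES (?, ?)",
        (username, "benford_check"),
    )
    conn.commit()
    counts = Counter()
    for (amount,) in conn.execute("SELECT amount FROM ledger WHERE amount <> 0"):
        # Scientific notation puts the leading digit first, e.g. 814 -> "8.14...e+02".
        counts[int(f"{abs(amount):.15e}"[0])] += 1
    return dict(counts)

if __name__ == "__main__":
    conn = build_ledger([814, 1768, 120, 95])
    print(first_digit_counts(conn, "auditor1"))
    print(conn.execute("SELECT username, activity FROM access_log").fetchall())
```

Nobody has to extract, email or paste figures before the analysis can run, and the access log keeps a trace of each check.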

You can get more information on this site regarding the disadvantages of spreadsheets. We can also tell you more about the advantages of server application solutions.

Spreadsheet Risk Issues

It is interesting to note that the riskiness of operational spreadsheets is overlooked even by companies with high standards of risk management. Only when errors amount to actual losses do they realize that these risks have been staring them in the face all along.

Common spreadsheet risk issues

Susceptibility to trivial manual errors

Due to the fundamental structure of spreadsheets, a slight change in the formula or value in any of their populated cells may affect their overall output. An

  • accidental copy-paste,
  • omission of a negative sign,
  • erroneous range selection,
  • incorrect data input or
  • unintentional deletion of a character, cell, range, column, or row

are just some of the simple errors spreadsheet users frequently encounter. Rarely are there any counter-checking controls in place in a spreadsheet-based activity, so manual errors easily go undetected.

Possibility of the user working on the wrong version

How do you store spreadsheet files?

Since the most common reports are usually generated on a monthly basis, users tend to store them using variations of these two configurations:

[Figure: spreadsheet storage]

With either of these structures, a user can accidentally work on the wrong version.

Prone to inconsistent company-wide reporting

This happens when a summary or “final” spreadsheet is fed information by different departments coming from their own spreadsheets. Even if most of the data in their spreadsheets comes from one source (the company-wide database), erroneous copy-pasting and linking, or even different interpretations of the same data, can result in contradictory information in the end.

Often defenceless against unauthorised access

Some spreadsheets contain information needed by various individuals or department units in an organisation. Hence, they are often shared via email or through shared folders in a network. Because spreadsheets don’t normally use any access control, any user can easily open a spreadsheet file and view or modify the contents as they wish.

Highly vulnerable to fraud

A complex spreadsheet system with zero or very minimal controls provides the perfect setting for would-be fraudsters. Hidden cells with malicious formulas and links to bogus information can go unnoticed for a long time, especially if the final figures don’t deviate much from expected values.

Spreadsheet risk mitigation solutions may not suffice

Inherent complexity makes testing and logic inspection very time consuming

Deep testing can uncover possible errors hidden in spreadsheet cells and consequently mitigate risks. But spreadsheets used to support financial reporting are normally large, complex, highly-personalised and, without ample supporting documentation, understandably hard to follow.

No clear ownership of risk management responsibilities

There’s always a dilemma when an organisation starts assigning risk management responsibilities for spreadsheets. IT personnel believe users in the business side of the organisation should be responsible since they are the ones who create, edit, store, duplicate, and share the spreadsheet files. On the other hand, users believe IT should be responsible since they have always been in charge of managing IT infrastructure, applications, and files.

To get rid of spreadsheet risks, you’ll have to get rid of spreadsheets altogether

One remedy is to have a risk management activity that involves both IT personnel and spreadsheet users. But wouldn’t you rather avoid the complexity of splitting the responsibilities between two parties altogether?

Learn more about Denizon’s server application solutions and how you can get rid of spreadsheet risk issues.


Contact Us

  • (+353)(0)1-443-3807 – IRL
  • (+44)(0)20-7193-9751 – UK

Ready to work with Denizon?