Thursday, 6 March 2025

Data is the new oil

Almost 20 years ago Clive Humby coined the phrase “data is the new oil” and it has been repeated like a mantra ever since. The phrase has come to define a generation of business thinking. Data has changed the way we think, and the way we pay for things.

Clive Humby was a data scientist before we knew that we needed data scientists. He worked with Tesco and found value in a place where no one else had looked: the weekly shop. Back then, each unremarkable purchase of toothpaste or bread was just that - unremarkable. Today we know differently. We now recognise the value in data that shows, for example, the purchase of nappies. It indicates a high probability of a household with a baby, which in turn means a high probability of tired parents likely to redeem coupons against toys or premium-brand nappies. And conversely, there’s no point offering money off toys or nappies to households without young children.

The result was the Tesco Clubcard, which became a win-win for both consumers and Tesco. And, of course, Clive Humby didn’t do too badly out of it either. Together, they had struck oil.

Data needs to be refined

As with many catchy sayings, there’s more than a grain of truth in what he said. Like oil, valuable data is often hidden and needs research to be identified as such. And like crude oil, data also needs to be refined to be made useful. There was investment and experimentation in the Clubcard project before it became the success we know today.

Where is your data hidden?

A lot has happened since then, including of course AI. Whilst AI existed 20 years ago, it was not in the accessible form it is today. Azure AI Services make AI models available on a pay-as-you-go basis, requiring an understanding of software engineering rather than machine learning to implement them.

But a lot of valuable data is still hidden. Some companies have made big investments in data analysis, but many have not. And almost all businesses have vast amounts of data in databases, spreadsheets, legacy systems, and many other places. This is data that no one has yet identified as valuable.

Competitive advantage in an AI world

In a world where AI makes general knowledge easier to access, competitive advantage will come from business-specific knowledge. That is, the knowledge that individual businesses or organisations have built up over time and is unique to their organisation and industry. This is company-confidential data that has value to decision-makers - provided it can be easily accessed when it’s needed. 

Consider the needs of a medium sized business or department that wants to make in-house data easily available to their people. In the past, this might have been done using a combination of induction training, on-going training courses, operating manuals, and data shared on intranets.

ChatGPT has shown us a friendlier way to access data. AI now makes it much easier to access company data in the same way. Azure AI Search provides functionality to augment large language models like ChatGPT with domain specific data, such as company knowledge bases or information held in a database. 

Azure AI Search

Azure AI Search enables advanced querying of different types of data. It indexes content so that queries can be answered quickly, supporting full-text, vector, and hybrid search. And it integrates with other Azure services to provide natural language processing.
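To make the idea of indexing concrete, here’s a toy inverted index in Python. This is not how Azure AI Search is implemented - it’s a minimal sketch, with invented documents, of why an index speeds up querying: each word maps to the documents that contain it, so a query only touches matching documents instead of scanning everything.

```python
from collections import defaultdict

# A toy inverted index: map each word to the set of documents containing it.
documents = {
    "doc1": "Clubcard points and weekly shop offers",
    "doc2": "nappies and premium brand toys",
    "doc3": "weekly toothpaste and bread offers",
}

index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query):
    """Return the ids of documents containing every word in the query."""
    results = set(documents)
    for word in query.lower().split():
        results &= index.get(word, set())  # only documents matching all words
    return sorted(results)

print(search("weekly offers"))  # both doc1 and doc3 mention weekly offers
```

The same principle - do the expensive work once at indexing time so that each query is cheap - is what makes a managed search service practical over large document sets.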

It is no longer news that data has value. Now the question is how businesses should use their corporate data to best advantage. Azure AI Services, which include Azure AI Search, put better data within the reach of more businesses.

If you want to explore what Azure AI services could do you for you, then get in touch. We are Azure software developers with solid experience of software engineering and data manipulation. And we’d love to help you find your oil. 


Tuesday, 25 February 2025

What can Azure AI Services do for you?

AI is revolutionizing industries worldwide. From automating customer service with ChatGPT to enabling early disease detection, the possibilities are limitless. But how can businesses unlock this power without employing expensive AI specialists? That’s where Microsoft Azure AI services come in.

Business leaders are rightly focused on results, not building AI models. Azure AI services let you integrate powerful, pre-built AI capabilities into your business applications without requiring a team of specialists or a deep knowledge of machine learning.

Azure AI services offer pre-built, easy-to-deploy AI models that can be provisioned through the Azure portal, just like adding a database or virtual machine. Capabilities like image analysis and document intelligence are now available to anyone who can see an opportunity to improve a business process. Azure AI services make it much easier to turn a good idea into a better business.

Azure AI Services include Language, Speech, Vision, Content Safety, and more. For example, Azure AI Language enables sentiment analysis, key phrase extraction, and summarization—helping you better understand customer feedback and improve decision-making. Some Azure AI services can be customized, allowing you to develop innovative solutions to address specific challenges. You might want to improve customer experiences or streamline operations. And with a pay-as-you-go pricing model, building a proof-of-concept application is low-risk.
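As an illustration of what sentiment analysis does, here’s a deliberately simple Python sketch that scores text against a small, made-up word list. Azure AI Language uses trained models rather than a lexicon like this; the point is only to show the shape of the input and output.

```python
# A hand-rolled sketch of sentiment analysis: score text by counting
# positive and negative words. The lexicon below is invented for illustration.
POSITIVE = {"great", "love", "helpful", "fast", "excellent"}
NEGATIVE = {"slow", "broken", "poor", "unhappy", "terrible"}

def sentiment(text):
    """Classify a piece of feedback as positive, negative, or neutral."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

feedback = [
    "Delivery was fast and the support team were helpful",
    "The app is slow and the checkout is broken",
]
for line in feedback:
    print(sentiment(line), "-", line)
```

A managed service returns richer results than this (per-sentence scores, confidence values), but the workflow is the same: send text in, get structured sentiment back, and feed it into reporting or decision-making.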

Azure also provides a comprehensive suite of tools—such as storage, app services, and containers—that support AI-powered application development. Now, instead of needing AI experts, we can use Azure software engineering skills to create powerful AI solutions.

With Azure AI, businesses of all sizes can tackle challenges and seize opportunities in ways that were once only possible for larger organizations.

If you are ready to explore the potential of AI in your business, contact us now to find out how we can help. We have the Azure software development skills needed to incorporate Azure AI services in ways that could transform your operations and drive innovation.

Monday, 17 February 2025

Retrieval Augmented Generation

I’ve got a riddle for you. When is ChatGPT not the best thing since sliced bread? Answer: When it makes things up! 

What? You didn’t think that was funny? Well, if getting things wrong is a big deal in your work, you can be forgiven for not rolling on the floor laughing.  

There are many industries where it’s important to get things right, such as software development, renewable energy, science, medicine, education, and government. In all these knowledge-intensive fields, people depend on having timely and correct information to do their work and advise their customers.

What are “hallucinations” in LLMs?

Large language models (LLMs), such as ChatGPT and others, have revolutionised the way we interact with information. But despite the impressive communication skills of LLMs, they do have some limitations, such as making things up, also known as hallucinating. It may not happen very often but for some businesses and organisations, it's a big deal.

Hallucinations include generating information that is plain wrong, providing information that’s not relevant, or attributing things incorrectly. But why do they happen? And are the LLMs actually making things up? 

LLM hallucinations often occur because of gaps in the training data. When the question is specialised, relating to a niche area, or outside the time period covered by the training data, the LLM may lack the domain-specific sources and so helpfully does the best it can.

Sometimes the response is obviously wrong, but often the response is highly plausible. And if the person requesting the information doesn’t have specialised knowledge, they may well think the answer is correct. This is the point where the riddle becomes even less funny than it was to begin with. And it wasn’t very funny even then…

How Retrieval Augmented Generation (RAG) helps

RAG is the process of optimising large language models (LLMs) for specific knowledge-intensive tasks. The “augmented” in RAG is the process of supplementing the data available to the LLM with additional, domain-specific data. This may be internal to an organisation, or specialist data for a particular sphere of knowledge. It might include searching databases and selected documents. It means that the LLM has access to more recent and more specialist data. Which means that hallucinations become less frequent, and relevance and accuracy are increased for that particular area. And by making verified company-specific data available to the LLM, there’s massive business benefit.
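The retrieve-then-augment flow can be sketched in a few lines of Python. This is a toy: the knowledge base is invented, and retrieval here is naive word overlap where a real system would use a search service such as Azure AI Search. But the shape - find the relevant documents, then put them into the prompt - is the essence of RAG.

```python
# A minimal sketch of the "retrieval" and "augmentation" steps in RAG.
# The knowledge base entries below are invented for illustration.
knowledge_base = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Premium support is available on the Gold plan from 8am to 8pm.",
    "Clubcard points expire two years after they are earned.",
]

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question):
    """Augment the user's question with retrieved, verified company data."""
    context = "\n".join(retrieve(question, knowledge_base))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When do Clubcard points expire?"))
```

Because the LLM is instructed to answer from the retrieved context rather than from its general training data, the response is grounded in your organisation’s own, verifiable information.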

Why RAG matters to business

As our world becomes more complex, so does our need to manage information. In industries like software development, manufacturing, or renewable energy, accurate and reliable information is crucial. By integrating RAG into your business processes, you’re giving your teams better data for decision-making, customer service, and product development. RAG provides a way for organisations to improve their products and services by harnessing AI in a way that’s tailored to their needs. 

Ready to leverage RAG in your business?

The first step in creating a RAG solution is identifying and collecting relevant content ready for indexing. If you’d like to discuss a proof-of-concept project for your business, we’ve got the relevant skills to tailor a solution to fit your specific needs. Get in touch to see whether our skills are a good fit for your goals. 

Wednesday, 12 February 2025

Is your data AI ready?

Intelligent data: what it is, and why it matters

I know a lot of people are wary of AI. The idea that software might make better decisions than lawyers, scientists, or doctors takes a bit of getting used to. But AI is already improving the world in wonderful ways. 

AI is processing images to improve scientists’ understanding of wildlife populations. Advanced research shows that AI could do a better job than some doctors of identifying eye problems. And AI is already analysing documents to provide lawyers with relevant case histories.

None of this is science fiction. These are real projects that are reducing costs, improving accuracy, and removing repetitive tasks from highly skilled people, freeing them up to do more valuable and interesting work.

Whilst we marvel at the intelligence and friendliness of ChatGPT, many business executives are trying to figure out how best to use AI to stay competitive. The irony is that the answer may be surprisingly simple.

  1. Learn about AI: what it really is, how it works, and why it matters in your industry.
  2. Bring data together: AI works best on large volumes of trustworthy data. That means you need to bring disparate data sources together into one unified and secure data platform. 
  3. Get your data in shape. Clean it up. Match different data sources. Remove duplicates. Have an audit trail so you can trust the output. Add metadata. 
  4. Rinse and repeat. Put processes in place so new data is added and cleaned ready for analysis. 
  5. Make it visual. Microsoft Power BI has AI capabilities that make it much easier to visualise data, and to find data insights. 
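Steps 2 to 4 above - bringing sources together, cleaning, deduplicating, and adding metadata - can be sketched in plain Python. The record fields and source systems here are invented for illustration.

```python
from datetime import date

# Invented records from two hypothetical source systems.
crm_records = [
    {"email": "ana@example.com", "name": "Ana"},
    {"email": "ben@example.com", "name": "Ben"},
]
shop_records = [
    {"email": "ANA@example.com", "name": "Ana"},   # duplicate, different case
    {"email": "cleo@example.com", "name": "Cleo"},
]

def unify(sources):
    """Merge records from several systems, deduplicate, and add metadata."""
    seen, unified = set(), []
    for system, records in sources.items():
        for rec in records:
            key = rec["email"].lower()          # match across data sources
            if key in seen:
                continue                        # remove duplicates
            seen.add(key)
            unified.append({**rec,
                            "email": key,
                            "source_system": system,            # audit trail
                            "loaded_on": date.today().isoformat()})
    return unified

clean = unify({"crm": crm_records, "shop": shop_records})
print(len(clean))  # 3 unique customers
```

Real pipelines use proper matching rules and a data platform rather than in-memory lists, but the steps are the same: one unified, deduplicated, auditable dataset ready for analysis.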

AI is a disruptor; there can be little doubt about that. Exactly how it will disrupt specific industries is harder to predict. But businesses that use data to uncover insights, and help their people make better decisions, are outperforming those that do not. Everyone has to start by understanding what the opportunities might be, and prototyping the possibilities.

Microsoft’s Azure intelligent data platform is a leader in analytics and business intelligence. From Power BI and Microsoft Fabric to Azure SQL Database and Cosmos DB, there’s a data platform to meet data needs large and small.

If you are thinking about your next steps into an AI-powered world, get in touch to see whether our data engineering experience could help.

Thursday, 6 February 2025

Would your data win a gold medal?

Whether you work for a global enterprise or a local business, data quality is vital for analysis and reporting. I’ve talked about this before in How to Improve Data Quality and Designing for Data Protection but today I’m exploring a structured approach to data quality known as the medallion architecture. 

Data can come from a variety of systems, including legacy systems that have been used for many years. Some well-designed relational database systems will have tight constraints to help improve data quality, but others may have a lot of errors. You therefore need a way of getting your data into a format that can be trusted by decision-makers. 

This architecture comes from the world of big data and data lakes, but the ideas behind it are useful for all data projects. It’s called the medallion architecture, and it processes data in three distinct stages: Bronze, Silver, and Gold—just like Olympic medals. Clever, right?

Bronze layer: raw data

Data from source systems are imported “raw” without making any changes to the data. The purpose of this stage is to:

Validate import integrity. Ensure no data are missing, the original schema has been preserved, data has not been corrupted, etc.

Add metadata. Columns are added to identify the import date and time, originating system, etc. 

Provide an audit trail. This bronze layer data is not modified, and so can be used to validate queries that emerge in later stages.

Avoid re-importing. This initial stage might not be the most glamorous, but it provides the foundation for everything that follows and needs to be done with care. You want to avoid reimporting the data if problems emerge later. 

Data are appended to the bronze layer periodically, and so files will increase in size over time. Data in the bronze layer is never accessed directly by business users, data scientists, or analysts. Instead, it forms the foundation for the silver and gold layers.
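A bronze-layer load can be sketched in a few lines of Python. The field names and source system below are invented; the point is that the raw values are appended untouched, with metadata columns added alongside them.

```python
from datetime import datetime, timezone

def load_bronze(bronze, source_rows, source_system):
    """Append raw source rows to the bronze layer, adding metadata columns."""
    batch_time = datetime.now(timezone.utc).isoformat()
    for row in source_rows:
        bronze.append({**row,                       # raw values, untouched
                       "_source": source_system,    # originating system
                       "_loaded_at": batch_time})   # import date and time
    return bronze

bronze = []
load_bronze(bronze, [{"id": 1, "qty": "3"}, {"id": 2, "qty": None}], "legacy_pos")
# Even the suspicious None value survives: cleaning happens later, in silver.
```

Because nothing is corrected at this stage, the bronze layer remains a faithful record of what the source system actually sent - the audit trail everything else is validated against.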

Silver layer: clean data

The silver layer uses data from the bronze layer and is never created directly from source data. The purpose of the silver layer is to:

Clean the data. Fix issues such as missing or null values, duplicates, out-of-range values, incorrect data types, and other data quality problems.

Validate the data. Check no errors have been introduced by comparing and testing against the bronze layer. 

Normalize the data. Data may be split into separate tables ready for processing at the gold layer.

Data is typically not aggregated at the silver layer but if aggregation is done, at least one non-aggregated record is preserved. Data at the silver layer might be used by data scientists or analysts. Business users would normally have access to the gold layer. 
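Here’s a minimal Python sketch of a silver-layer transform, using invented rows and rules: it reads from bronze, deduplicates, fixes types and nulls, and validates the result against the bronze layer.

```python
# Invented bronze-layer rows, including typical quality issues.
bronze = [
    {"id": 1, "qty": "3", "_source": "legacy_pos"},
    {"id": 1, "qty": "3", "_source": "legacy_pos"},   # duplicate row
    {"id": 2, "qty": None, "_source": "legacy_pos"},  # missing value
]

def to_silver(bronze_rows):
    """Clean bronze data: deduplicate, fix types, handle nulls."""
    silver, seen = [], set()
    for row in bronze_rows:
        if row["id"] in seen:
            continue                                  # deduplicate
        seen.add(row["id"])
        qty = int(row["qty"]) if row["qty"] is not None else 0  # types and nulls
        silver.append({"id": row["id"], "qty": qty})
    return silver

silver = to_silver(bronze)

# Validate against bronze: every distinct id in bronze survives into silver.
assert {r["id"] for r in bronze} == {r["id"] for r in silver}
```

The final assertion is the validation step in miniature: because bronze is never modified, it can always be used to check that cleaning has not silently dropped or invented records.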

Gold layer: business-ready data 

The gold layer is where data becomes business-reporting-ready. At this stage:

Data is denormalized and aggregated according to business needs.

Data models and measures are created, in line with how users want to query and analyse the data.

The gold layer is focussed on optimizing the data for business intelligence reporting. Data is presented in a format suitable for business users to work with, using tools such as Power BI to create dashboards and reports. 
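A gold-layer build can be sketched as a simple aggregation, again with invented columns and measures: clean silver rows are rolled up into the business-ready shape a Power BI report would consume.

```python
from collections import defaultdict

# Invented silver-layer rows: one record per sale, already cleaned.
silver_sales = [
    {"store": "Leeds", "month": "2025-01", "total": 120.0},
    {"store": "Leeds", "month": "2025-01", "total": 80.0},
    {"store": "York",  "month": "2025-01", "total": 50.0},
]

def to_gold(rows):
    """Aggregate sales per store per month - a business-ready measure."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["store"], row["month"])] += row["total"]
    return [{"store": s, "month": m, "monthly_sales": t}
            for (s, m), t in sorted(totals.items())]

gold = to_gold(silver_sales)
print(gold[0])  # {'store': 'Leeds', 'month': '2025-01', 'monthly_sales': 200.0}
```

In a real project this shape would be a star schema or a set of Power BI measures, but the principle is the same: aggregate and denormalize only at the final stage, once the data beneath it can be trusted.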

Data-led decisions need gold-standard data 

I may be overdoing the Olympic theme, but the concepts behind the medallion architecture are now considered best practice. It is built on good data practices that actually work. In summary:

1. Separate data ingestion and validation from later stages. Preserve this “raw” version of the data to create a base from which to validate future processing. 

2. Manage data quality issues as an intermediate step before aggregating, modelling or creating measures. 

3. Optimize for business use. Create aggregations and measures suited to business reporting requirements at the final stage. 

If you are tackling a reporting project, moving data to the cloud, or need help improving your data’s reliability, get in touch for a chat. We have decades of experience in improving data quality and data manipulation. Together we could make a winning team to turn raw data into golden insights (alright, alright, enough).