Monday, 17 February 2025

Retrieval Augmented Generation

I’ve got a riddle for you. When is ChatGPT not the best thing since sliced bread? Answer: When it makes things up! 

What? You didn’t think that was funny? Well, if getting things wrong is a big deal in your work, you can be forgiven for not rolling on the floor laughing.  

There are many industries where it’s important to get things right, such as software development, renewable energy, science, medicine, education, government, or any other knowledge-intensive field. In all these areas, people depend on having timely and correct information to do their work and advise their customers.

What are “hallucinations” in LLMs?

Large language models (LLMs), such as ChatGPT and others, have revolutionised the way we interact with information. But despite the impressive communication skills of LLMs, they do have some limitations, such as making things up, also known as hallucinating. It may not happen very often but for some businesses and organisations, it's a big deal.

Hallucinations include generating information that is plain wrong, providing information that’s not relevant, or attributing things incorrectly. But why do they happen? And are the LLMs actually making things up? 

LLM hallucinations often occur because of gaps in the training data. When a question is specialised, relates to a niche area, or falls outside the time period covered by the training data, the LLM may lack the domain-specific sources and so helpfully does the best it can. 

Sometimes the response is obviously wrong, but often the response is highly plausible. And if the person requesting the information doesn’t have specialised knowledge, they may well think the answer is correct. This is the point where the riddle becomes even less funny than it was to begin with. And it wasn’t very funny even then ….

How Retrieval Augmented Generation (RAG) helps

RAG is the process of optimising large language models (LLMs) for specific knowledge-intensive tasks. The “retrieval” in RAG is the step of searching databases and selected documents for content relevant to the question. The “augmented” in RAG is the process of supplementing the data available to the LLM with that additional, domain-specific data. This may be internal to an organisation, or specialist data for a particular sphere of knowledge. It means that the LLM has access to more recent and more specialist data, so hallucinations become less frequent, and relevance and accuracy increase for that particular area. And making verified company-specific data available to the LLM can bring massive business benefit.
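To make the idea concrete, here is a minimal sketch of the retrieve-then-augment step. It is an illustration only: the documents, function names, and prompt wording are all hypothetical, and simple word overlap stands in for the embedding-based search that production RAG systems typically use. The call to the LLM itself is left out.

```python
import re

# Hypothetical in-house documents standing in for an organisation's verified content.
DOCUMENTS = [
    "Annual leave requests must be submitted at least two weeks in advance.",
    "The solar inverter warranty covers parts and labour for ten years.",
    "Support tickets are answered within one working day.",
]


def tokenize(text):
    """Lower-case the text and return its words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Augment the question with retrieved context before it goes to the LLM."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


prompt = build_prompt("How long is the inverter warranty?", DOCUMENTS)
print(prompt)
```

The point of the design is that the LLM is asked to answer from the supplied context rather than from memory alone, which is what reduces the scope for made-up answers.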

Why RAG matters to business

As our world becomes more complex, so does our need to manage information. In industries like software development, manufacturing, or renewable energy, accurate and reliable information is crucial. By integrating RAG into your business processes, you’re giving your teams better data for decision-making, customer service, and product development. RAG provides a way for organisations to improve their products and services by harnessing AI in a way that’s tailored to their needs. 

Ready to leverage RAG in your business?

The first step in creating a RAG solution is identifying and collecting relevant content ready for indexing. If you’d like to discuss a proof-of-concept project for your business, we’ve got the relevant skills to tailor a solution to fit your specific needs. Get in touch to see whether our skills are a good fit for your goals. 

Wednesday, 12 February 2025

Is your data AI ready?

Intelligent data: what it is, and why it matters

I know a lot of people are wary of AI. The idea that software might make better decisions than lawyers, scientists, or doctors takes a bit of getting used to. But AI is already improving the world in wonderful ways. 

AI is processing images to improve scientists’ understanding of wildlife populations. Advanced research shows that AI could do a better job than some doctors of identifying eye problems. And AI is already analysing documents to provide lawyers with relevant case histories.

None of this is science fiction. These are real projects that are reducing costs, improving accuracy, and removing repetitive tasks from highly skilled people, freeing them up to do more valuable and interesting work. 

Whilst we marvel at the intelligence and friendliness of ChatGPT, many business executives are trying to figure out how best to use AI to stay competitive. The irony is that the answer may be surprisingly simple.

  1. Learn about AI: what it really is, how it works, and why it matters in your industry.
  2. Bring data together: AI works best on large volumes of trustworthy data. That means you need to bring disparate data sources together into one unified and secure data platform. 
  3. Get your data in shape. Clean it up. Match different data sources. Remove duplicates. Have an audit trail so you can trust the output. Add metadata. 
  4. Rinse and repeat. Put processes in place so new data is added and cleaned ready for analysis. 
  5. Make it visual. Microsoft Power BI has AI capabilities that make it much easier to visualise data, and to find data insights. 

AI is a disruptor, there can be little doubt about that. Exactly how it will disrupt specific industries is more difficult to predict. What we do know is that businesses that use data to uncover insights and help their people make better decisions are outperforming those that do not. Everyone has to get started by understanding what the opportunities might be, and prototyping the possibilities.

Microsoft, with its Azure intelligent data platform, is a leader in analytics and business intelligence platforms. From Power BI and Microsoft Fabric to Azure SQL Database and Azure Cosmos DB, there’s a data platform to meet data needs large and small. 

If you are thinking about your next steps into an AI-powered world, get in touch to see whether our data engineering experience could help.

Thursday, 6 February 2025

Would your data win a gold medal?

Whether you work for a global enterprise or a local business, data quality is vital for analysis and reporting. I’ve talked about this before in How to Improve Data Quality and Designing for Data Protection but today I’m exploring a structured approach to data quality known as the medallion architecture. 

Data can come from a variety of systems, including legacy systems that have been used for many years. Some well-designed relational database systems will have tight constraints to help improve data quality, but others may have a lot of errors. You therefore need a way of getting your data into a format that can be trusted by decision-makers. 

This architecture comes from the world of big data and data lakes, but the ideas behind it are useful for all data projects. It’s called the medallion architecture, and it processes data in three distinct stages: Bronze, Silver, and Gold—just like Olympic medals. Clever, right?

Bronze layer: raw data

Data from source systems are imported “raw” without making any changes to the data. The purpose of this stage is to:

Validate import integrity. Ensure no data are missing, the original schema has been preserved, data has not been corrupted, etc.

Add metadata. Columns are added to identify the import date and time, originating system, etc. 

Provide an audit trail. This bronze layer data is not modified, and so can be used to validate queries that emerge in later stages.

Avoid re-importing. This initial stage might not be the most glamorous, but it provides the foundation for everything that follows and needs to be done with care. You want to avoid re-importing the data if problems emerge later. 

Data are appended to the bronze layer periodically, and so files will increase in size over time. Data in the bronze layer is never accessed directly by business users, data scientists, or analysts. Instead, it forms the foundation for the silver and gold layers.
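As a rough illustration of the bronze stage, the sketch below appends rows from a raw CSV export without changing any source values, checks the row count, and adds metadata columns. The sample data, the `legacy_crm` system name, and the underscore-prefixed column names are assumptions made for the example, and an in-memory list stands in for real bronze storage.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical raw export from a source system (note the blank value and duplicate).
SOURCE_CSV = "id,name,amount\n1,Alice,10.50\n2,Bob,\n2,Bob,\n"


def ingest_to_bronze(raw_csv, source_system, bronze):
    """Append raw rows to the bronze store unchanged, plus import metadata."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))

    # Validate import integrity: the row count must match the source line count.
    # (Simplification: assumes no quoted line breaks inside field values.)
    expected = len(raw_csv.strip().splitlines()) - 1  # minus the header line
    if len(rows) != expected:
        raise ValueError(f"Expected {expected} rows, imported {len(rows)}")

    imported_at = datetime.now(timezone.utc).isoformat()
    for row in rows:
        # Source values are kept exactly as received; only metadata is added.
        bronze.append(
            {**row, "_source_system": source_system, "_imported_at": imported_at}
        )
    return len(rows)


bronze_layer = []
count = ingest_to_bronze(SOURCE_CSV, "legacy_crm", bronze_layer)
print(count, bronze_layer[1])
```

Notice that the blank amount and the duplicate row are left alone. Fixing them is a job for the silver layer, and keeping them here is what makes the bronze layer usable as an audit trail.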

Silver layer: clean data

The silver layer uses data from the bronze layer and is never created directly from source data. The purpose of the silver layer is to:

Clean the data. Handle missing or null values, deduplication, out-of-range values, data types, normalization, and other data quality issues.

Validate the data. Check no errors have been introduced by comparing and testing against the bronze layer. 

Normalize the data. Data may be split into separate tables ready for processing at the gold layer.

Data is typically not aggregated at the silver layer but if aggregation is done, at least one non-aggregated record is preserved. Data at the silver layer might be used by data scientists or analysts. Business users would normally have access to the gold layer. 
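Here is a small sketch of what the silver stage might look like. The sample rows and the cleaning rules (keep the first occurrence of each id, treat blank amounts as missing, reject negative amounts) are assumptions chosen for illustration; your own rules will depend on your data.

```python
# Hypothetical bronze rows: raw strings plus import metadata.
BRONZE = [
    {"id": "1", "name": " Alice ", "amount": "10.50", "_imported_at": "2025-02-01T09:00:00"},
    {"id": "2", "name": "Bob", "amount": "", "_imported_at": "2025-02-01T09:00:00"},
    {"id": "2", "name": "Bob", "amount": "", "_imported_at": "2025-02-02T09:00:00"},
    {"id": "3", "name": "Cara", "amount": "-5", "_imported_at": "2025-02-02T09:00:00"},
]


def build_silver(bronze_rows):
    """Return (clean_rows, rejected_rows); the bronze input is not modified."""
    clean, rejected, seen_ids = [], [], set()
    for row in bronze_rows:
        record_id = int(row["id"])
        if record_id in seen_ids:
            continue  # deduplicate: keep the first occurrence of each id
        seen_ids.add(record_id)
        # Fix data types and treat empty strings as missing values.
        amount = float(row["amount"]) if row["amount"] else None
        if amount is not None and amount < 0:
            rejected.append(row)  # out of range (assumed rule: no negatives)
            continue
        clean.append({"id": record_id, "name": row["name"].strip(), "amount": amount})
    return clean, rejected


silver, rejected = build_silver(BRONZE)

# Validate against bronze: every distinct source id is either clean or rejected.
bronze_ids = {int(r["id"]) for r in BRONZE}
assert bronze_ids == {r["id"] for r in silver} | {int(r["id"]) for r in rejected}
print(silver)
```

Rejected rows are set aside rather than silently dropped, so anyone querying a number later can trace it back to the bronze layer.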

Gold layer: business-ready data 

The gold layer is where data becomes business-reporting-ready. At this stage:

Data is denormalized and aggregated according to business needs.

Data models and measures are created, in line with how users want to query and analyse the data.

The gold layer is focussed on optimizing the data for business intelligence reporting. Data is presented in a format suitable for business users to work with in tools such as Power BI to create dashboards and reports. 
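As a sketch of the gold stage, the example below aggregates clean, row-level sales into one row per region and month, with a few simple measures. The sales data and the measure names (`total_sales`, `order_count`, `average_order`) are hypothetical; in practice these would follow whatever your business users want to report on.

```python
from collections import defaultdict

# Hypothetical clean silver rows: one row per sale.
SILVER_SALES = [
    {"region": "North", "month": "2025-01", "amount": 120.0},
    {"region": "North", "month": "2025-01", "amount": 80.0},
    {"region": "South", "month": "2025-01", "amount": 50.0},
    {"region": "North", "month": "2025-02", "amount": 200.0},
]


def build_gold(sales):
    """Aggregate sales to one row per region and month, with simple measures."""
    totals = defaultdict(lambda: {"total_sales": 0.0, "order_count": 0})
    for sale in sales:
        key = (sale["region"], sale["month"])
        totals[key]["total_sales"] += sale["amount"]
        totals[key]["order_count"] += 1
    return [
        {
            "region": region,
            "month": month,
            **measures,
            "average_order": measures["total_sales"] / measures["order_count"],
        }
        for (region, month), measures in sorted(totals.items())
    ]


gold = build_gold(SILVER_SALES)
for row in gold:
    print(row)
```

The result is a small, denormalized table that a reporting tool can load directly, without business users needing to know how the underlying rows were cleaned.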

Data-led decisions need gold-standard data 

I may be overdoing the Olympic theme, but the concepts behind the medallion architecture are now considered best practice. It is built on good data practices that actually work. In summary:

1. Separate data ingestion and validation from later stages. Preserve this “raw” version of the data to create a base from which to validate future processing. 

2. Manage data quality issues as an intermediate step before aggregating, modelling or creating measures. 

3. Optimize for business use. Create aggregations and measures suited to business reporting requirements at the final stage. 

If you are tackling a reporting project, moving data to the cloud, or need help improving your data’s reliability, get in touch for a chat. We have decades of experience in improving data quality and data manipulation. Together we could make a winning team to turn raw data into golden insights (alright, alright, enough).

Thursday, 30 January 2025

Do you need a data strategy?

Three "A"s to improve your data management

In previous blog posts I’ve discussed the benefits of creating an L&D dashboard and the need to keep historical data, including training data. But a dashboard is only as good as the data behind it. Bad data with visual impact is still bad data. And worse, it will lead to bad decisions without anyone realising why. 

A well-thought-out data management strategy is the necessary step before reporting. It will save you time and money, and will improve decision-making. 

Small and medium sized businesses (SMEs), and departments within large organisations, can all benefit from managing their data in a structured way. A data strategy provides a framework that makes it easier to discuss the costs and benefits of data management. 

In this article, I’m going to discuss how to develop a data strategy to keep your data accurate and trustworthy. 

The basics

There are some basic principles to think about, regardless of what type of business or department you work for. These are the three “A”s of data management:

1. Accurate

2. Available

3. Adheres to the rules

Accurate

Data that doesn’t contain errors sounds simple enough, but mistakes creep into data in all sorts of ways. Incorrect dates, spelling mistakes, the wrong person or the wrong course, etc. Each individual error might not be important, but as you amass more data it all adds up. Whether you are using dashboards to monitor progress, creating insights for strategy, or preparing for AI, you need a solid data foundation. 

For data to be accurate it must also be complete. It needs to include all relevant time periods, parts of the organisation, and other relevant data. That often means thinking about where data is stored. You are likely to have data in relational databases such as SQL Server or Oracle, often used as back-end data stores for learning management systems (LMS), accounts systems, production systems, etc. Other data will be semi-structured or unstructured, such as data in spreadsheets, Word documents, emails, training evaluation forms, or PDF files. 
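A completeness check can be as simple as looking for time periods with no records at all. The sketch below does that for monthly training records; the record layout and the `missing_months` helper are hypothetical, and a gap is only a prompt to investigate, since a quiet month may be perfectly genuine.

```python
from datetime import date


def missing_months(records, start, end):
    """Return the 'YYYY-MM' periods between start and end with no records."""
    present = {r["completed"].strftime("%Y-%m") for r in records}
    missing = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        period = f"{year:04d}-{month:02d}"
        if period not in present:
            missing.append(period)
        # Step to the next month, rolling over at the end of the year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return missing


# Hypothetical training records.
records = [
    {"person": "A", "completed": date(2024, 11, 5)},
    {"person": "B", "completed": date(2025, 1, 20)},
]
gaps = missing_months(records, date(2024, 11, 1), date(2025, 2, 1))
print(gaps)
```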

Available

Availability means ensuring data is easily accessible to those who need it, when they need it, and in a format they can use. This involves:

1. Granting access to those authorised to use the data and restricting access to those who shouldn’t have access.

2. Providing user friendly data models that can be used to create dashboards or for self-service reporting.

3. Storing data with appropriate redundancy safeguards. Data can be lost through human error, hardware failures, or other disasters. 

4. Providing training so people know how to access the data, and what they can and cannot do with it.

Adheres to the rules

Data rules and regulations come from both inside and outside the organisation. 

Compliance regulations such as GDPR require organisations to safeguard personal information, with heavy penalties for those who do not safeguard sensitive data.

Within your organisation you may have guidelines that define which roles have access to what data to ensure private data stays private. 

Think about who owns the data and who is responsible for safeguarding it. Use metadata (data about data) to identify different types of data within your organisation so you can put policies in place to manage it. 

Creating your data strategy

Get your team together and start the discussion. Identify a project that will provide immediate business benefits, then:

1. Identify your key data sources.

2. Agree who owns and manages each data set.

3. Identify and implement quality control measures.

Data management is the necessary first stage in generating and communicating data insights, and in making data-led decisions. Azure has a range of data governance, storage, and redundancy options to help you modernize your data management whilst improving accuracy, availability, and adherence to the rules. 

If you want to improve your data management let’s talk about how you can get your data in better shape. Get in touch for an initial chat about what you want to achieve. 


Thursday, 23 January 2025

6 Reasons you don’t need an L&D dashboard

Creating a learning and development (L&D) dashboard provides many benefits, including getting your data in good shape, getting an overview of L&D activities, and finding insights to help you make better decisions. 

But is it right for you? Here are six reasons you don’t need an L&D dashboard. 

1. You already have an at-a-glance training report (and it's accurate)

If your current system already provides a clear and comprehensive view of completed and planned training, you don’t need an L&D dashboard. But you do need to check that the data is accurate and up to date. Have employees reviewed their own training data? Can they correct mistakes? Is it complete, or are there gaps in time or for people? Have you verified historical records? 

Data is only valuable if it is correct; otherwise, you risk basing decisions on a shaky foundation. And that’s all the more important if you are considering how to use AI. 

2. Your L&D activities are closely aligned to business goals (and are clear to everyone)

When people understand why the training is important, and what role it plays in achieving organizational goals, you get the greatest bang for your L&D buck. But aligning training with business objectives, and making it visible, isn’t always easy. 

A well-designed L&D dashboard provides this visibility, ensuring stakeholders understand the purpose of what they are doing. If your L&D reporting achieves this level of clarity, you are probably already getting big paybacks for your clear thinking. If not, business goals are a great place to start when thinking about what you want your L&D dashboard to achieve.

3. Your employees are engaged in their learning plans (and can see their training history)

Engaged employees are key to the success of any L&D program. This requires employees having access to their learning plans and a view of what they have already completed. When people understand how their training connects to business strategy, they become more invested in their development and make better decisions.

For organizations with well-designed L&D dashboards, these tools enhance employee engagement. They already know that implementing a dashboard is a game-changer for improving transparency and motivation.

4. Skills gaps are clearly visible (based on today’s and tomorrow’s objectives)

If your systems enable you to identify skills gaps, and prioritize training based on current and future business objectives, you’ve got this covered. 

But for many, a well-designed L&D dashboard can help to identify high-priority areas like onboarding new employees, addressing compliance gaps, or upskilling for emerging technologies. Nothing stands still with L&D.

5. Your L&D data is being linked to employee performance (and metrics are hotly debated)

Organisations that are working on measuring the impact of L&D on employee performance already have a competitive advantage. Metrics such as productivity, quality, and employee retention are being used together with data for training, coaching, and other L&D initiatives. Dashboards help to communicate both data and insights and get everyone thinking about how to improve.

If you’ve started on this journey, you already know the benefits. However, for those ready to explore these connections further, a highly visual dashboard can provide the impact needed to communicate insights effectively to busy stakeholders.

6. Your L&D activities are visible (and everyone is engaged)

Organizations that make L&D activities understandable for all stakeholders tend to achieve better results. Transparency encourages both trust and accountability. 

If your current systems already provide this level of transparency, an L&D dashboard might not be necessary. However, for organizations looking to enhance visibility and collaboration, dashboards can play a vital role.

Conclusion

An L&D dashboard is a valuable tool at every stage of the learning and development journey. With AI helping us make better decisions, accurate and actionable L&D data has never been more important. The key is to evaluate your current capabilities and ensure your data is accurate and up to date.

Whether you need to refine your reporting systems or explore tools like Microsoft Power BI or Microsoft Fabric, the right solution starts with accurate data. Contact us to discover how we can help you build meaningful insights and communicate them effectively.