🌕 Introducing Galileo Luna – the new standard for GenAI evaluation 🌕

After a year of R&D, we're excited to announce the Galileo Luna family of Evaluation Foundation Models!

💰 Cost effective: 97% cheaper than GPT-3.5
⚡ Fast: 11x faster than GPT-3.5
🎯 Accurate: 18% more accurate than GPT-3.5
📈 No ground truth data needed: simplifies deployment and maintenance
🔧 Customizable: quickly fine-tune for your specific evaluation requirements
🔒 Robust security: detect hallucinations, prevent prompt attacks, and enforce data privacy
📜 Extended context support: handles 16k+ context length seamlessly

Fortune 500 teams already use Luna to tackle the most difficult challenge in productionizing GenAI. Get started with Luna today: https://lnkd.in/g2g5pf_e

#GenAI #GenerativeAI #LLM #LLMOps #MachineLearning #ML #AI #ArtificialIntelligence #CIO #CTO #CDO #DataEngineer #DataEngineering #DataScience #DataScientist
🔭 Galileo
Software Development
San Francisco, California 5,872 followers
Generative AI Evaluation, Experimentation, and Observability Platform
About us
At Galileo we are building the first algorithm-powered LLMOps platform for the enterprise. Galileo currently powers ML teams across the Fortune 500, as well as startups in multiple industries.

If you are interested in working at the intersection of machine learning and data, alongside industry and academic veterans at a well-funded early-stage company going after a big, real, and massively underserved problem, we are currently hiring for multiple positions. Feel free to apply directly here: https://www.rungalileo.io/team Or reach out to us at team@rungalileo.io -- we would love to hear from you!
- Website: http://www.rungalileo.io
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2021
Locations
- Primary: 525 Brannan St, San Francisco, California 94107, US
Employees at 🔭 Galileo
- Dharmesh Thakker, General Partner at Battery Ventures - Supporting Cloud, DevOps, AI and Security Entrepreneurs
- Brent Chalker, GTM @ Galileo - Build and Evaluate GenAI Apps Faster | Lean Thinker | Passionate Sales Leader | Business Value Creator
- Masha Belyi, ML Scientist at Galileo
- Vikram Chatterji, Co-founder/CEO at Galileo | Magical AI Evaluations and Observability
Updates
-
MMLU, MT-Bench, MATH, GSM8k... Are AI benchmarks as reliable as we think? 🤔

According to Princeton professor Arvind Narayanan, many fall short. Once a benchmark gains popularity, it's tough to shift away from it, even if it's flawed, because everyone wants to compare new models against old standards.

But what are these benchmarks really testing? Investigations show that many benchmarks were built for simpler systems and are outdated. Surprisingly, many were created using content from platforms like WikiHow and Reddit rather than by consulting experts. Others relied on Mechanical Turk gig workers to craft questions aimed at evaluating morals and ethics.

These tests cover a wide array of knowledge, from eighth-grade math to advanced fields like law and medicine. However, Emily M. Bender, a professor of linguistics at the University of Washington, argues that many benchmarks lack construct validity, meaning they don't measure what they claim to. Bender also warned of the disconnect between what these benchmarks actually measure and how model makers present a high score on a benchmark. "What do we need automated systems for taking multiple choice tests or standardized tests for? What's the purpose of that?" said Bender. She points out that AI systems like Gemini and Llama don't truly understand or reason; they merely predict the next sequence of letters based on their training data.

This issue is becoming more prominent as society grapples with the broader impacts of AI. Policymakers are taking note: numerous AI-related bills are pending in California, and Colorado recently passed the nation's first comprehensive AI legislation.

As we continue to integrate AI into various aspects of life, it's crucial to scrutinize the tools we use to measure its capabilities. Are these benchmarks truly reflective of AI's potential, or are they relics of a simpler past?
👇 Full article linked in the comments below #AI #ArtificialIntelligence #ML #MachineLearning #DataScience #DataScientist #DataEngineer #DataEngineering #LLM #GenAI #LLMOps #MLOps
-
📕 Just wrapped up reading Lilian Weng's insightful post on extrinsic hallucinations in LLMs. Here are some key takeaways...

Challenges in hallucination ⚠

Hallucinations in LLMs have a dual nature: in-context and extrinsic. In-context hallucinations occur when the model's output contradicts the provided context, while extrinsic hallucinations are more elusive, arising when the model fabricates information not supported by its pre-training dataset.

The root causes of hallucinations are found in both the pre-training and fine-tuning stages. During pre-training, the model ingests vast amounts of internet data, including outdated or incorrect information. Fine-tuning, which aims to enhance specific capabilities, introduces another layer of complexity. Notably, LLMs learn new knowledge introduced during fine-tuning more slowly than pre-existing knowledge, which increases the model's tendency to hallucinate.

Solutions for hallucination 💭

Researchers have developed sophisticated detection and evaluation frameworks to tackle hallucinations. They found that an LLM's knowledge can be categorized into Known and Unknown groups, with Known further divided into HighlyKnown, MaybeKnown, and WeaklyKnown. The best performance is achieved when the model learns the majority of Known examples but only a few Unknown ones. Prompting the model to generate responses to unanswerable or unknown questions can trigger hallucinations.

Retrofit Attribution using Research and Revision (RARR) retroactively enables LLMs to support attributions to external evidence via editing for attribution. Methods like Rethinking with Retrieval (RR) rely on retrieval of relevant external knowledge without additional editing; instead of using a search query generation model, RR's retrieval is based on decomposed chain-of-thought (CoT) prompting. Self-reflective retrieval-augmented generation (Self-RAG) trains a model to reflect on its own generation by outputting both the task output and intermittent special reflection tokens. This approach improves the quality of the generated content by critiquing and selecting the best segments.

Even without grounding in external retrieved knowledge, verification and revision processes can reduce hallucination. Chain-of-Verification (CoVe) involves planning and executing verification steps to ensure factual correctness. Recitation-Augmented Generation (RECITE) relies on recitation as an intermediate step to improve factual correctness: the model first recites relevant information and then generates the output, reducing hallucination through self-consistency and multi-hop QA support.

The battle against hallucinations continues, and understanding the root causes while employing innovative detection and mitigation strategies is how we can win it!

#LLM #LLMOps #GenAI #GenerativeAI #DataScience #DataScientist #DataEngineer #DataEngineering #AI #ArtificialIntelligence #ML #MachineLearning
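The Chain-of-Verification loop described above can be sketched as a simple draft–verify–revise pipeline. A minimal sketch, assuming only a generic text-in/text-out completion function: the `llm` callable and the prompt wording below are placeholders, not any specific vendor's API.

```python
def chain_of_verification(question, llm):
    """Chain-of-Verification (CoVe) sketch: draft, verify, revise.

    `llm` is any text-in/text-out completion function (placeholder).
    """
    # 1. Draft a baseline answer.
    baseline = llm(f"Answer concisely: {question}")

    # 2. Plan verification questions probing the draft's factual claims.
    plan = llm("List one fact-checking question per line for this answer:\n"
               + baseline)
    questions = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Answer each verification question independently of the draft,
    #    so the checks are not biased by the original wording.
    checks = [(q, llm(f"Answer independently: {q}")) for q in questions]

    # 4. Revise the baseline in light of the verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return llm(f"Revise this answer using the checks below.\n"
               f"Answer: {baseline}\nChecks:\n{evidence}")
```

Answering the verification questions in a fresh context (step 3) is the key design choice: it is what keeps the checks from simply repeating the draft's mistakes.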
-
📢 📢 Today, we're excited to share news of 🔭 Galileo's partnership with HP. By now, most will agree that to build a production-grade #genAI solution, you need an evaluation and observability solution to minimize hallucinations and ensure accurate and trustworthy responses. This problem becomes even more pronounced when working with proprietary and private data. Earlier this year, HP announced the Z by HP AI Studio, a centralized platform that combines data, workflows, and compute to accelerate AI and data science model development. Today we're excited to share that Galileo, alongside partners like NVIDIA, will be integrated directly into the Z by HP AI Studio. With Galileo, HP users will be able to detect and correct hallucinations, drift, and bias in their models, while proactively protecting against inaccurate or biased outputs—all within the HP AI Studio platform. Read more at the link in comments below ⬇ #AI #LLM #TrustworthyAI #Evals #partnership #hp
-
👉 Optimizing RAG: Actionable Insights from Recent Research

RAG's complexity can be daunting for any team. Here are some actionable ways to streamline your RAG workflows and optimize performance.

🏆 Rerank for relevance: use monoT5 or TILDEv2, based on your efficiency requirements
📈 Efficient embedding: choose embedding models like LLM-Embedder for better retrieval performance
🛠 Implement query classification: automate the decision of whether retrieval is necessary at all
🏎 Select appropriate retrieval methods: depending on your performance vs. efficiency needs, choose between Hybrid with HyDE or Hybrid
🍪 Optimize chunking: use methods like Small2Big and sliding windows for effective chunking

Adopting these strategies can enhance the efficiency of your RAG systems, making them more robust and responsive to user needs!

#LLMOps #LLM #GenAI #GenerativeAI #DataEngineer #DataEngineering #DataScience #DataScientist #ML #MachineLearning #AI #ArtificialIntelligence
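To make the chunking point concrete, here is a minimal sliding-window chunker in plain Python. This is a sketch, not a tuned recommendation: "tokens" are approximated by whitespace-split words, and the window/stride defaults are illustrative.

```python
def sliding_window_chunks(text, window=128, stride=64):
    """Split text into overlapping chunks of up to `window` words,
    advancing `stride` words per step so neighboring chunks share
    context (a common RAG chunking strategy)."""
    words = text.split()
    if not words:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):  # tail is covered; stop
            break
        start += stride
    return chunks
```

The overlap (`window - stride` words) is what keeps a fact that straddles a chunk boundary retrievable: it appears whole in at least one chunk.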
-
We are excited to introduce our newest addition to the Galileo Team, Rob Bennett! 💫🔭 You can learn more about employees on our team page: https://lnkd.in/githwpaE Read up on our Blogs: https://lnkd.in/gpyev2E8 We are hiring! https://lnkd.in/gWKtq9wg #MeetTheGalileans #LLM #LLMOps #MLOps #ML #AI #data #hiring #techjobs #mldata #mljobs #rungalileo #mlengineer #nlp #naturallanguageprocessing #Platform #BackendEngineer #MLResearcher #Researcher #DeepLearning #prompt #monitor #finetuning #hallucinations #hallucination #genai #generativeai #ArtificialIntelligence #MachineLearning #DataEngineering #DataEngineer #DataScience #DataScientist #sales #SDR #AccountExecutive
-
🔭 Galileo reposted this
Building a GenAI app? Pro hack to save 💰💰💰

The cost-efficiency of your GenAI app depends on the model you select for each task, which in turn depends on understanding the task's hardness and building an efficient router.

Pro hack: use a discriminative LLM to categorize each prompt's hardness level, then route the conversation to the appropriate generation LLM. Below are the models' capabilities on hard tasks. Use GPT-4/Opus only for the hardest tasks. Haiku is the most efficient option right now!

Amazing work from the LMSYS team!

#machinelearning #rag #genai #generativeai #llm
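The routing idea can be sketched in a few lines. Everything below is illustrative: `classify_hardness` is a stub standing in for a real discriminative model, and the model names in the routing table are examples, not an endorsed configuration.

```python
# Illustrative tiers: cheapest model assumed able to handle each level.
ROUTES = {
    "easy": "claude-3-haiku",
    "medium": "gpt-3.5-turbo",
    "hard": "gpt-4",
}

def classify_hardness(prompt):
    """Placeholder for a discriminative hardness classifier.
    A real router would call a small fine-tuned model here; this
    stub just uses prompt length as a crude stand-in."""
    n = len(prompt.split())
    if n < 20:
        return "easy"
    return "medium" if n < 100 else "hard"

def route(prompt):
    """Pick the cheapest model tier predicted to handle the prompt."""
    return ROUTES[classify_hardness(prompt)]
```

In production the classifier is the whole game: its cost must be far below the price gap between tiers, or the router eats its own savings.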
-
Multimodal models are gaining popularity across industries, but they are just as prone to hallucinations as LLMs. Learn the different types of hallucinations across modalities, what causes them, and how to mitigate them: https://lnkd.in/gwj3Rnin #LLM #LLMOps #ML #MachineLearning #AI #ArtificialIntelligence #DataScience #DataScientist #DataEngineer #DataEngineering
Survey of Hallucinations in Multimodal Models - Galileo
rungalileo.io
-
Excited to share how Chegg Inc. uses Galileo to build agentic applications! Don't miss tomorrow's session with Yash Sheth and Taranveer Singh at the AI Engineer World's Fair https://lnkd.in/gtaCWAH9
-
🍻 Get drinks with the Galileo team! We're excited to co-host a happy hour with Cohere and Baseten for the AI Engineer World's Fair – don't miss it: https://lu.ma/tmxrddfq #DataEngineering #ML #MachineLearning #AI #ArtificialIntelligence #DataEngineer #DataScience #DataScientist