A 4-Tiered Approach to Revolutionizing Security Investigations

Edward Johns

Solutions Engineer

Security teams are no longer just reacting to alerts; they are conducting deep, forensic-style investigations to uncover sophisticated threats that unfold over months or even years.

Situation: Traditional SIEMs

Traditional Security Information and Event Management (SIEM) systems, while excellent for real-time threat detection and incident response, often fall short when it comes to the nuanced demands of these investigations. They prioritize immediate alerting on hot data, leaving historical logs in cold storage that's cumbersome to access and analyze.

Enter AI-driven data tiering, a paradigm shift that optimizes data storage and retrieval specifically for investigative workflows. By pairing AWS S3 for scalable object storage with Parquet, an efficient columnar file format, the Observo platform transforms raw telemetry into investigation-ready intelligence. This approach ensures that every byte of data retains its potential value, enabling analysts to reconstruct attack timelines, identify subtle patterns, and achieve cost efficiencies at scale, all without the binary hot/cold constraints of legacy systems.

In this blog, we'll explore how AI-driven data tiering addresses the unique challenges of security investigations, drawing on Observo's innovative capabilities to illustrate a future where investigations are proactive, efficient, and intelligence-led.

Complication: Why SIEMs Aren't Enough

Security investigations differ fundamentally from threat detection, investigation, and response (TDIR). While SIEMs shine in spotting known threats through real-time rules and alerts, investigations require piecing together historical puzzles: tracing lateral movements across six months of logs, correlating benign administrative actions with anomalous network traffic, or pinpointing an initial compromise amid thousands of authentication events.

Traditional SIEM tiering exacerbates these challenges. Data is either kept in expensive, searchable hot storage for a short period (often 90 days) or archived in cold storage, where it loses enrichment, normalization, and queryability. This leads to bottlenecks such as:

Cost Prohibitions: Maintaining months of detailed logs in hot storage skyrockets expenses.
Performance Issues: Historical queries on large datasets degrade quickly.
Context Erosion: Raw archived data lacks behavioral insights or threat intelligence, forcing manual reconstruction during crises.

A real-world example from a financial institution underscores the challenge: during a breach investigation, their SIEM’s 90-day hot data window missed an eight-month-old initial compromise. Analysts spent weeks manually sifting through disparate cold storage systems, struggling to correlate events. With enriched logs and dynamic clustering, Observo’s pattern-matching algorithms could have uncovered and linked those events automatically in hours, revealing the attack timeline with precision.

The solution? Intelligence-driven tiering that dynamically assesses data value based on investigative potential, using AI to optimize storage while preserving accessibility. By integrating AWS S3's cost-effective object storage with Parquet's compression and columnar efficiency, this approach scales to petabytes without sacrificing investigative fidelity.

Implication: SIEM Investigation Limitations

Failing to address these limitations results in prolonged investigation times, increased vulnerability to sophisticated adversaries, and significant financial losses. For instance, analysts may waste weeks pulling data from disparate systems, which delays threat identification and response: an APT reconstruction can take days, weeks, or months instead of hours, and an insider threat can expose $847M in trades. This not only heightens risks such as data exfiltration, compliance exposure such as SEC audits, and reputational damage, but also drives up operational costs, with organizations potentially overspending by millions on SIEM storage while achieving only reactive alert-chasing. Without proactive, intelligence-led tools, security teams face alert fatigue, higher false positives (up to 70% unaddressed), and an inability to scale to petabytes, ultimately lagging behind nation-state campaigns or malware distributions affecting thousands of customers.

Position: An Intelligence-First Investigations Framework

To overcome these challenges, organizations should adopt an AI-driven data tiering approach that optimizes storage and retrieval for investigative workflows, transforming raw telemetry into investigation-ready intelligence. Observo redefines data management by placing investigations at the core, using machine learning to create adaptive systems that dynamically assess data value, enrich logs with context, and write to cost-effective storage such as AWS S3 in the Parquet format. Parquet's columnar structure means a query on specific fields such as timestamps or IPs reads only those columns, which makes it well suited to investigative drilling. This setup achieves up to 74% storage cost reductions through compression and intelligent sampling, all while maintaining full fidelity for compliance.
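To see why a columnar layout suits investigative drilling, the toy sketch below contrasts row-oriented events with a column-oriented layout in plain Python. It is not Parquet itself, and the field names (`ts`, `src_ip`, `user`, `message`) are illustrative assumptions. It only shows that a filter on one field scans one column rather than every full record.

```python
from typing import Any, Dict, List

# Row-oriented layout: each event is stored as a whole record.
# All field names and values here are made up for illustration.
rows = [
    {"ts": "2024-01-03T02:14:00Z", "src_ip": "10.0.0.5", "user": "alice", "message": "login ok"},
    {"ts": "2024-01-03T02:15:10Z", "src_ip": "203.0.113.9", "user": "bob", "message": "login failed"},
    {"ts": "2024-01-03T02:15:12Z", "src_ip": "203.0.113.9", "user": "bob", "message": "login failed"},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per field (assumes every record has the same fields)."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def matching_row_ids(columns: Dict[str, List[Any]], field: str, wanted: Any) -> List[int]:
    """Scan a single column and return the positions of the matching rows."""
    return [i for i, value in enumerate(columns[field]) if value == wanted]


columns = to_columns(rows)

# Only the "src_ip" column is scanned; "user" and "message" are never touched.
hits = matching_row_ids(columns, "src_ip", "203.0.113.9")

# Fetch just the timestamps for the matching rows.
timestamps = [columns["ts"][i] for i in hits]
```

Real Parquet files add per-column compression and statistics on top of this idea, so an engine can often skip whole chunks of a file without reading them.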

Unlike rule-based tools like Cribl, which require predefined criteria for what counts as "valuable" data, Observo uses machine learning to anticipate investigative needs, turning static rules into adaptive investigative responses.

Key capabilities include:

Dynamic Pattern Extraction & Anomaly Detection: The system uses memory-efficient pattern mining to group similar log entries and create baselines of normal behavior. This capability enables robust anomaly detection by highlighting deviations from established patterns in real time.
Log Metadata Enricher Configurations: This initial step enriches incoming log data with contextual metadata—such as timestamps, log levels, host details, and event identifiers—based on predefined or dynamic configurations. By tagging logs with this additional context, the system sets the stage for more accurate pattern recognition.
Pattern Extractor: Leveraging memory-efficient algorithms, the Pattern Extractor processes the enriched logs in real time to identify recurring patterns and group similar events. This step not only condenses multiple instances of similar events into a single representation, reducing data dimensionality, but also establishes a baseline of normal behavior that is critical for detecting anomalies.
Enricher Sentiment Analyzer: Once patterns are identified, this module applies deep learning-based sentiment analysis to assess the contextual tone of each pattern. By assigning sentiment labels (such as positive, neutral, or negative), it highlights patterns that might indicate potential issues or significant events. Patterns are enriched preemptively so that critical context is always available, which helps teams prioritize critical alerts and streamline incident response.
Accelerated Incident Resolution: Enriched with sentiment analysis, the platform accelerates incident resolution by over 40%. By tagging log data with positive or negative sentiments, it prioritizes alerts for rapid troubleshooting while minimizing alert fatigue.
Data Enrichment: Transforms raw telemetry into context-rich, analytics-ready logs by combining AI-driven pattern extraction, sentiment analysis, and lookups against dynamic or static threat intelligence feeds such as Abuse.ch or AlienVault OTX. This cuts false positives by 70% and enables proactive threat hunting with 60-80% faster detection.
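The enrichment and pattern-extraction steps above can be sketched in a few lines. This is a deliberately simple stand-in, not Observo's algorithm: it masks variable tokens (IPs, numbers) with regular expressions so that similar messages collapse into one template, counts the templates to form a baseline, and flags templates that make up only a small share of traffic. The function names, the `max_share` threshold, and the sample log lines are all assumptions for illustration. The sentiment step is omitted because it depends on a trained model.

```python
import re
from collections import Counter

# Masks applied in order: IPs first, then any remaining bare numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def enrich(raw_line, host):
    """Attach simple metadata (host, a crude log level) to a raw log line."""
    level = "ERROR" if "failed" in raw_line.lower() else "INFO"
    return {"host": host, "level": level, "message": raw_line}


def to_pattern(message):
    """Replace variable tokens so similar messages share one template."""
    for regex, token in _MASKS:
        message = regex.sub(token, message)
    return message


def extract_patterns(events):
    """Count how often each template occurs; this acts as the baseline."""
    return Counter(to_pattern(e["message"]) for e in events)


def rare_patterns(counts, max_share=0.25):
    """Return templates whose share of all events is at or below max_share."""
    total = sum(counts.values())
    return [p for p, n in counts.items() if n / total <= max_share]


lines = [
    "Accepted login for user 1001 from 10.0.0.5",
    "Accepted login for user 1002 from 10.0.0.6",
    "Accepted login for user 1003 from 10.0.0.7",
    "Accepted login for user 1004 from 10.0.0.8",
    "Login failed for user 1001 from 203.0.113.9",
]
events = [enrich(line, host="auth-01") for line in lines]
counts = extract_patterns(events)
rare = rare_patterns(counts)
```

Five raw lines reduce to two templates here, which is the dimensionality reduction the Pattern Extractor description refers to, and the single failed-login template is the one that stands out from the baseline.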

Action: Four-Tier Investigation Process

Observo revolutionizes security investigations by replacing the outdated hot/warm/cold storage models of traditional SIEMs with a dynamic, Four-Tier Investigation Process tailored for uncovering complex, long-term threats. This innovative framework leverages AI-driven data tiering, cost-effective storage solutions like AWS S3, and efficient Parquet formats to ensure seamless access to enriched, analytics-ready data at scale. Each tier is purpose-built to address the unique demands of investigative workflows, enabling analysts to navigate from high-level insights to granular details with unparalleled speed and precision, all while achieving significant cost efficiencies and compliance readiness.

The Four-Tier Investigation Process

Why It Matters

This four-tier process eliminates the binary constraints of traditional SIEMs, where data is either expensively searchable or inaccessible in cold storage. By dynamically assessing data value with AI, Observo achieves up to 74% storage cost reductions through compression and intelligent sampling, while enabling 98%+ field accuracy for reliable analytics. The pipeline, which comprises AI-powered parsing, noise reduction (up to 80%), and serialization to standards such as OCSF, ensures seamless data flow from ingestion to investigation. For instance, querying Tier 3 Parquet files can deliver sub-second results on 2.3 petabytes, empowering analysts to trace lateral movements or pinpoint initial compromises efficiently. This scalable, intelligence-driven approach transforms data lakes into investigative powerhouses, ensuring every event’s value is preserved for proactive threat hunting and compliance.

This low-cost storage bucket (S3) example structure exemplifies this:

Our pipelines support sources, functions, optimizers, serializers and destinations that ensure data flows efficiently. Parsers and AI-powered transforms handle unstructured logs, extracting fields like IPs and usernames without manual regex. Optimizers drop noise (up to 80% reduction), and serializers map to standards such as OCSF for easy integration.
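As a rough illustration of the parse, optimize, and serialize stages described above, here is a minimal sketch. A hand-written regex stands in for the AI-powered parser (the platform's point is precisely that you would not write this by hand), the "optimizer" simply drops DEBUG lines, and the output uses a simplified, OCSF-style shape. The log format, function names, and the exact output fields are assumptions and should be checked against the OCSF schema before real use.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical log format; a stand-in for what an AI-powered parser would infer.
LINE_RE = re.compile(
    r"(?P<ts>\S+) (?P<level>[A-Z]+) login (?P<result>success|failure) "
    r"user=(?P<user>\S+) src=(?P<ip>\S+)"
)


def parse(line: str) -> Optional[dict]:
    """Extract fields from one line, or return None if it does not match."""
    match = LINE_RE.fullmatch(line.strip())
    return match.groupdict() if match else None


def optimize(events: list) -> list:
    """Drop low-value noise; here that just means DEBUG-level events."""
    return [e for e in events if e["level"] != "DEBUG"]


def serialize(event: dict) -> dict:
    """Map parsed fields to a simplified OCSF-style authentication record."""
    ts = datetime.fromisoformat(event["ts"])  # timestamp must carry a UTC offset
    return {
        "class_uid": 3002,  # assumed to be OCSF's Authentication class
        "class_name": "Authentication",
        "time": int(ts.timestamp() * 1000),  # epoch milliseconds
        "status": "Success" if event["result"] == "success" else "Failure",
        "user": {"name": event["user"]},
        "src_endpoint": {"ip": event["ip"]},
    }


def run_pipeline(lines: list) -> list:
    parsed = [e for e in map(parse, lines) if e is not None]
    return [serialize(e) for e in optimize(parsed)]


raw_lines = [
    "2024-01-03T02:14:00+00:00 INFO login success user=alice src=10.0.0.5",
    "2024-01-03T02:14:01+00:00 DEBUG login success user=healthcheck src=10.0.0.9",
    "2024-01-03T02:15:10+00:00 WARN login failure user=bob src=203.0.113.9",
    "malformed line that the parser cannot handle",
]
records = run_pipeline(raw_lines)
```

In this sample, four input lines become two output records: the DEBUG line is dropped as noise and the malformed line is skipped. A production pipeline would route unparsed lines somewhere for review rather than discard them silently.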

Security Investigations in Action: Use Cases

Observo’s Four-Tier Investigation Process, powered by AI-driven data tiering, data enrichment, pattern extraction, sentiment analysis, and low-cost storage such as AWS S3 with Parquet, transforms how organizations tackle complex threats. The following hypothetical scenarios illustrate its investigative power:

Uncovering a 14-Month APT Campaign

In a hypothetical scenario, a government defense contractor faces a nation-state attack in which the SIEM’s 90-day data window misses the initial compromise. Observo’s enriched logs, integrated with threat intelligence from Abuse.ch and AlienVault OTX, could flag suspicious patterns like unusual DNS queries and PowerShell executions. By querying 2.3 petabytes of data in seconds using Amazon Athena, analysts could reconstruct the 14-month attack timeline in days, not months, identifying phishing, lateral movement, and exfiltration. This could save $2.8M in storage costs, accelerate resolution by 80%, and ensure compliance with preserved evidence.
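A timeline query of the kind this scenario describes might look roughly like the sketch below, which only builds the SQL text. The table name, the `dt` date partition column, and the selected column names are hypothetical, since the article does not specify a schema. The inputs are validated before being placed in the query string because they are interpolated directly.

```python
import ipaddress
import re
from datetime import date

_TABLE_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_timeline_query(table: str, start: str, end: str, src_ip: str) -> str:
    """Build a SQL query that pivots on one source IP over a date range.

    Assumes a table partitioned by a string column `dt` (YYYY-MM-DD), so the
    date filter lets the engine skip partitions outside the range.
    """
    if not _TABLE_RE.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    start_d = date.fromisoformat(start)  # raises ValueError on bad input
    end_d = date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError("start date must not be after end date")
    ip = ipaddress.ip_address(src_ip)  # raises ValueError on bad input

    return (
        "SELECT event_time, src_ip, user_name, event_type\n"
        f"FROM {table}\n"
        f"WHERE dt BETWEEN '{start_d.isoformat()}' AND '{end_d.isoformat()}'\n"
        f"  AND src_ip = '{ip}'\n"
        "ORDER BY event_time"
    )


query = build_timeline_query(
    "security_lake.enriched_logs",  # hypothetical database.table
    start="2023-01-01",
    end="2024-03-01",
    src_ip="203.0.113.9",
)
```

Selecting only the columns needed and filtering on the partition column are what keep a query like this cheap on Parquet data: the engine reads four columns from the matching date partitions instead of scanning every record.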

Stopping a Supply Chain Attack

In a potential scenario, TechFlow Industries detects malware in a software update affecting 15,000 customers. Observo’s pipeline could analyze and store six months of CI/CD logs in low-cost storage such as AWS S3, pinpointing anomalies like build time deviations and offshore code signing. Clustering and sentiment analysis would trace the compromise to a phishing attack 127 days prior, enabling containment in 4 hours and a clean release in 24 hours, potentially cutting storage costs by 74% and minimizing reputation damage through rapid response.
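One simple way to surface a "build time deviation" like the one in this scenario is a z-score against a historical baseline. The sketch below is an assumption about how such a check could work, not a description of Observo's clustering; the build names, durations, and the threshold of 3 standard deviations are invented for illustration.

```python
from statistics import mean, stdev


def zscores_against_baseline(baseline, candidates):
    """Score each candidate build duration against the baseline mean and stdev."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        raise ValueError("baseline has no variation; z-score is undefined")
    return {name: (value - mu) / sigma for name, value in candidates.items()}


def flag_deviations(baseline, candidates, threshold=3.0):
    """Return the names of builds whose |z| exceeds the threshold."""
    scores = zscores_against_baseline(baseline, candidates)
    return sorted(name for name, z in scores.items() if abs(z) > threshold)


# Historical build durations in seconds (made-up numbers).
baseline_seconds = [312, 305, 298, 320, 310, 301, 315, 307]

# New builds to evaluate (hypothetical identifiers).
new_builds = {"build-1041": 309, "build-1042": 455}

scores = zscores_against_baseline(baseline_seconds, new_builds)
flagged = flag_deviations(baseline_seconds, new_builds)
```

Scoring new builds against a separate historical window, rather than including them in the mean, keeps a single large outlier from inflating the baseline and hiding itself.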

Uncovering an Insider Threat

In a potential scenario, a global investment bank flags unusual after-hours database queries by a specific quantitative analyst, potentially indicating insider trading. Observo’s pipeline could cluster 24 months of logs, enriched with GreyNoise threat intelligence and user context, to baseline normal behavior (150-200 daily queries, 7 AM-8 PM). Sentiment analysis might flag anomalies like a 23% increase in after-hours queries and access to unassigned portfolios. By querying 18 months of Parquet-formatted logs in seconds via Amazon Athena, analysts could correlate $847M in affected trades, confirm exfiltration attempts, and provide SEC-compliant audit trails within 72 hours. This could reduce SIEM costs by 50%, cut manual effort by 67%, and accelerate resolution by 42%, ensuring rapid, evidence-driven investigations.
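The after-hours baseline in this scenario can be expressed as a small calculation. The sketch below counts queries outside the 7 AM-8 PM window for a baseline period and a current period, then reports the percentage increase. The business-hours window comes from the scenario; the alert threshold, function names, and synthetic timestamps are assumptions, and 8:00 PM itself is treated as after hours.

```python
from datetime import datetime

BUSINESS_START_HOUR = 7   # 7 AM, from the scenario
BUSINESS_END_HOUR = 20    # 8 PM, from the scenario (exclusive)


def is_after_hours(ts: datetime) -> bool:
    """True when the timestamp falls outside the 7 AM-8 PM window."""
    return not (BUSINESS_START_HOUR <= ts.hour < BUSINESS_END_HOUR)


def after_hours_count(timestamps) -> int:
    return sum(1 for ts in timestamps if is_after_hours(ts))


def percent_increase(baseline: int, current: int) -> float:
    """Percentage change from baseline to current."""
    if baseline == 0:
        raise ValueError("baseline count is zero; increase is undefined")
    return (current - baseline) * 100 / baseline


def exceeds_threshold(baseline_ts, current_ts, threshold_pct: float = 20.0) -> bool:
    """Flag when after-hours activity grew by more than threshold_pct."""
    increase = percent_increase(after_hours_count(baseline_ts), after_hours_count(current_ts))
    return increase > threshold_pct


# Synthetic data: daytime queries plus 100 (baseline) or 123 (current) late-night ones.
daytime = [datetime(2024, 1, 1 + i % 28, 10, i % 60) for i in range(500)]
baseline_ts = daytime + [datetime(2024, 1, 1 + i % 28, 22, i % 60) for i in range(100)]
current_ts = daytime + [datetime(2024, 2, 1 + i % 28, 22, i % 60) for i in range(123)]

increase = percent_increase(after_hours_count(baseline_ts), after_hours_count(current_ts))
flagged = exceeds_threshold(baseline_ts, current_ts)
```

A real baseline would be per user and would account for time zones, holidays, and on-call rotations, but the comparison itself stays this simple once the logs are queryable over a long enough window.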

Benefit: Cost Savings & Enhanced Capabilities

By adopting Observo's intelligence-first approach, organizations can achieve up to 80% reductions in data volumes through AI-driven compression, sampling, anomaly detection, and routing to cost-effective storage such as AWS S3 in Parquet format, leading to 50-75% lower SIEM costs overall. This enables 60-80% faster threat detection, a 42% quicker mean time to resolution (MTTR) for incidents, and proactive investigations with 70% fewer false positives via enriched insights, sentiment analysis, and scalable querying on petabytes, delivering sub-second results with tools like Amazon Athena. Potential outcomes include reconstructing 14-month APT timelines in days, all while reducing manual effort by up to 85% and ensuring compliance readiness through preserved chain-of-custody and full-fidelity data rehydration.

Focusing on our Four-Tier Investigation Process avoids traditional SIEM alert overload and emphasizes long-term intelligence and operational efficiency. As threats become more persistent, security evolves from reactive systems to proactive, AI-optimized platforms, with Observo's low-cost storage solutions turning data lakes into investigative powerhouses. By replacing rigid rules with dynamic intelligence, supported by integrations with over 500 sources and open formats like OCSF, organizations gain scalable tools for hypothesis-driven hunts, enhanced resilience, and processing capabilities for 100 PB of data per month. Embracing this investigation-first model isn't just an upgrade; it's essential for preserving every event's value and staying ahead of sophisticated adversaries.

For more information on AI-native data pipelines as essential components of modern SOCs, read The CISO Field Guide to AI Security Data Pipelines.
