Episodes

  • This story was originally published on HackerNoon at: https://hackernoon.com/go-clean-to-be-lean-data-optimization-for-improved-business-efficiency.
    The article discusses cost optimization with clean data, explaining how businesses can save resources by reducing the workload for data analysts and more.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-cleaning, #data-optimization, #data-cleansing, #clean-data, #big-data, #big-data-processing, #data-processing, #business-data, and more.

    This story was written by: @karolisdidziulis. Learn more about this writer by checking @karolisdidziulis's about page, and for more stories, please visit hackernoon.com.

    This article discusses cost optimization with clean data. It explains how businesses can save resources by decreasing the load for data analysts, among other opportunities. It also discusses the differences between raw and clean data and who can benefit from switching to the latter. You'll also find 4 ways in which clean data reduces time to value.

  • This story was originally published on HackerNoon at: https://hackernoon.com/efficient-data-management-and-workflow-orchestration-with-apache-doris-job-scheduler.
    Apache Doris 2.1.0's built-in Job Scheduler simplifies task automation with high efficiency, flexibility, and easy integration for seamless data management.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-engineering, #big-data, #database, #open-source, #programming, #apache-doris, #task-automation, #workflow-orchestration, and more.

    This story was written by: @frankzzz. Learn more about this writer by checking @frankzzz's about page, and for more stories, please visit hackernoon.com.

    The built-in Doris Job Scheduler triggers pre-defined operations efficiently and reliably. It is useful in many cases including ETL and data lake analytics.

  • This story was originally published on HackerNoon at: https://hackernoon.com/scaling-ethereum-data-bloat-data-availability-and-the-cloudless-solution.
    Determining how to persist Ethereum’s excess data will allow it to scale indefinitely into the future, and Codex has arrived to help.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-storage, #decentralized-storage, #peer-to-peer, #web3-storage, #ethereum, #ethereum-scaling, #good-company, #data-bloat, and more.

    This story was written by: @logos. Learn more about this writer by checking @logos's about page, and for more stories, please visit hackernoon.com.

    Codex is a cloudless, trustless, p2p storage protocol seeking to offer strong data persistence and durability guarantees for the Ethereum ecosystem and beyond. Due to the rapid development and implementation of new protocols, the Ethereum blockchain has become bloated with data. This data bloat can also be described as “network congestion,” where transaction data clogs the network and undermines scalability. Codex offers a solution to the DA problem, with data persistence added on top.

  • This story was originally published on HackerNoon at: https://hackernoon.com/what-frontend-devs-want-from-backend-devs.
    Backend developers can help frontend developers work with their API more efficiently and ship the product with as little friction as possible.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-structure, #backend-developer, #typescript, #programming-advice, #api, #coding-teamwork, #how-to-have-clean-code, #figma, and more.

    This story was written by: @smileek. Learn more about this writer by checking @smileek's about page, and for more stories, please visit hackernoon.com.

    Backend developers can help frontend developers work with their API more efficiently and ship the product with as little friction as possible. Here are a few simple things that can decrease your time-to-market or improve other fancy metrics your managers want you to improve. I will describe them from the web developer’s point of view, but from what I remember, the same applies to mobile development.

  • This story was originally published on HackerNoon at: https://hackernoon.com/how-to-build-an-ai-chatbot-with-python-and-gemini-api.
    Learn how to create a web-based AI chatbot using Python and the Gemini API with this step-by-step beginner-friendly guide.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #python-programming, #ai-chatbot, #google-gemini, #google-ai, #gemini-api, #python-tutorials, #python-flask, #chatbot-development, and more.

    This story was written by: @proflead. Learn more about this writer by checking @proflead's about page, and for more stories, please visit hackernoon.com.

    This guide walks you through building a web-based AI chatbot using Python and the Gemini API. From setting up your environment to running your chatbot, you'll learn each step to create your own AI assistant.
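    The chat-history plumbing such a guide implies can be sketched in plain Python. The helper below is a hypothetical illustration, and the client calls in the trailing comment are assumptions about the google-generativeai package, not the article's exact code:

```python
# Hypothetical sketch: turn stored (user, bot) exchanges into the
# role-based history format the Gemini chat API expects.

def build_history(turns):
    """Map (user_msg, bot_msg) pairs to Gemini-style history entries."""
    history = []
    for user_msg, bot_msg in turns:
        history.append({"role": "user", "parts": [user_msg]})
        history.append({"role": "model", "parts": [bot_msg]})
    return history

# Wiring it to the real client (requires `pip install google-generativeai`
# and an API key; the model name here is an assumption):
#
# import os
# import google.generativeai as genai
# genai.configure(api_key=os.environ["GEMINI_API_KEY"])
# model = genai.GenerativeModel("gemini-1.5-flash")
# chat = model.start_chat(history=build_history(previous_turns))
# print(chat.send_message("Hello!").text)
```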

  • This story was originally published on HackerNoon at: https://hackernoon.com/how-to-set-up-a-local-dns-server-with-python.
    DNS servers play a crucial role in translating human-friendly domain names into IP addresses that computers use to identify each other on the network.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #python-programming, #networking, #dns-server-guide, #how-to-set-up-dns-server, #how-to-creatw-html-files, #http-server-guide, #troubleshooting-dns-server, #python-and-dns-servers, and more.

    This story was written by: @hackerclukchp0j00003b6oy80p1nrw. Learn more about this writer by checking @hackerclukchp0j00003b6oy80p1nrw's about page, and for more stories, please visit hackernoon.com.

    DNS servers play a crucial role in translating human-friendly domain names into IP addresses that computers use to identify each other on the network. Setting up your own local DNS server can be beneficial for various reasons, including local development, internal network management, and educational purposes. We’ll create a simple HTTP server using Python’s built-in `http.server` module to serve the HTML files.
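    The HTTP-serving step mentioned above can be sketched with the standard library alone. This is a minimal, hypothetical version rather than the article's exact code:

```python
# Serve a directory of HTML files over HTTP using only the standard
# library, in a background thread so it can run alongside other code.
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

def serve_directory(directory, port=0):
    """Start serving `directory` on 127.0.0.1; port=0 picks a free port.

    Returns the HTTPServer instance (call .shutdown() to stop it).
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

    When `port=0` is used, the port the OS actually assigned is available as `server.server_address[1]`, which is convenient for local testing.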

  • This story was originally published on HackerNoon at: https://hackernoon.com/the-collective-loves-data-how-big-data-is-shaping-and-predicting-our-future.
    Big data shapes our future! Explore how massive datasets are used to predict trends & make smarter decisions.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #big-data, #what-is-big-data, #examples-of-big-data, #digital-footprint, #machine-world, #big-data-storage, #big-data-processing, #what-to-know-about-big-data, and more.

    This story was written by: @manoj123. Learn more about this writer by checking @manoj123's about page, and for more stories, please visit hackernoon.com.

    Big data surrounds us! From social media posts to sensor readings, vast amounts of information shape our world. This article by a Google engineer dives into what big data is (think massive, varied, and ever-growing data sets) and how it's analyzed to predict trends and make smarter decisions. Learn about real-world applications and exciting future possibilities like AI and quantum computing.

  • This story was originally published on HackerNoon at: https://hackernoon.com/apache-doris-for-log-and-time-series-data-analysis-in-netease-why-not-elasticsearch-and-influxdb.
    NetEase has replaced Elasticsearch and InfluxDB with Apache Doris in its monitoring and time series data analysis platforms, respectively.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-engineering, #logging, #time-series-analysis, #time-series-database, #big-data-analytics, #elasticsearch, #database, #netease, and more.

    This story was written by: @frankzzz. Learn more about this writer by checking @frankzzz's about page, and for more stories, please visit hackernoon.com.

    NetEase has replaced Elasticsearch and InfluxDB with Apache Doris in its monitoring and time series data analysis platforms, respectively, achieving 11X query performance and saving 70% of resources.

  • This story was originally published on HackerNoon at: https://hackernoon.com/unlocking-the-power-of-data-lakes-for-embedded-analytics-in-multi-tenant-saas.
    Discover why data lakes are superior to traditional data warehouses for embedded analytics in SaaS applications.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-analytics, #embedded-analytics, #data-lake, #data-warehouse, #qrvey, #b2b-saas, #data-storage, #good-company, and more.

    This story was written by: @goqrvey. Learn more about this writer by checking @goqrvey's about page, and for more stories, please visit hackernoon.com.

    Analytics should extract maximum insight, right? To do that, you’ll need complete access to all relevant data. A data lake is a central store for all kinds of data in its original, unstructured form. Data lakes are generally more cost-effective than data warehouses for embedded analytics use cases.

  • This story was originally published on HackerNoon at: https://hackernoon.com/the-linkedin-nanotargeting-experiment-that-broke-all-the-rules.
    Discover how a groundbreaking nanotargeting experiment on LinkedIn defies audience size restrictions, unlocking new ad campaign strategies.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #nanotargeting, #online-advertising, #user-privacy, #user-data-security, #hyper-personalized-ads, #public-data-risks, #linkedin-advertising, #hackernoon-top-story, and more.

    This story was written by: @netizenship. Learn more about this writer by checking @netizenship's about page, and for more stories, please visit hackernoon.com.

    A study demonstrates the feasibility of nanotargeting on LinkedIn, bypassing audience size restrictions and achieving successful campaigns by employing JavaScript code to reactivate campaign launch buttons, employing various targeting strategies, and verifying success through campaign metrics and user interaction.

  • This story was originally published on HackerNoon at: https://hackernoon.com/data-science-interview-question-creating-roc-and-precision-recall-curves-from-scratch.
    This is one of the popular data science interview questions which requires one to create the ROC and similar curves from scratch.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-science, #data-science-interview, #precision-and-recall, #precision-recall-curves, #roc-data-science, #data-analysis, #data-science-job-questions, #hackernoon-top-story, and more.

    This story was written by: @varunnakra1. Learn more about this writer by checking @varunnakra1's about page, and for more stories, please visit hackernoon.com.

    This is one of the popular data science interview questions, which requires one to create the ROC and similar curves from scratch. For the purposes of this story, I will assume that readers are aware of the meaning of these metrics, the calculations behind them, what they represent, and how they are interpreted. We start by importing the necessary libraries (including math, since that module is used in the calculations).
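    A from-scratch ROC computation of the kind the question asks for can be sketched as follows. This is a minimal illustration assuming binary labels and real-valued scores, not the author's exact code:

```python
# Sweep a threshold over the distinct predicted scores and record the
# (false positive rate, true positive rate) point at each threshold.
def roc_points(y_true, y_score):
    """Return ROC curve points as (fpr, tpr) pairs, starting at (0, 0)."""
    thresholds = sorted(set(y_score), reverse=True)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for t in thresholds:
        # Predictions with score >= t are classified positive.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points
```

    A precision-recall curve follows the same threshold sweep, recording (recall, precision) instead of (fpr, tpr) at each step.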

  • This story was originally published on HackerNoon at: https://hackernoon.com/why-should-companies-outsource-data-processing.
    Data processing outsourcing boosts efficiency, reduces costs, and enhances decision-making, helping businesses manage and leverage vast data effectively.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data, #data-processing, #data-outsourcing, #data-mangagement, #data-security, #data-cost-reduction, #data-efficiency, #data-science, and more.

    This story was written by: @rayanpotterr. Learn more about this writer by checking @rayanpotterr's about page, and for more stories, please visit hackernoon.com.

    Data processing is an essential business process consisting of activities like order processing, form processing, compilation of mailing lists, and processing of other organizational and business information. Outsourcing data processing offers a two-fold benefit: lower operational expenses and increased operational efficiency. It also helps enhance data quality and surface insights more quickly, enabling well-informed and timely business decisions.

  • This story was originally published on HackerNoon at: https://hackernoon.com/the-role-of-big-data-in-developing-new-medicines.
    Drug development is one of the most crucial — and time-consuming — processes in medicine. Here's how big data can help.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #big-data, #drug-discovery, #medicine, #drug-development, #artificial-intelligence, #healthcare, #pharmaceutical, #hackernoon-top-story, and more.

    This story was written by: @zacamos. Learn more about this writer by checking @zacamos's about page, and for more stories, please visit hackernoon.com.

    Developing a new medicine takes an average of 12 years, but big data can improve every stage of the process. It helps fuel AI drug discovery, identify underserved needs, streamline clinical trials, and monitor for potential issues.

  • This story was originally published on HackerNoon at: https://hackernoon.com/building-ci-pipeline-with-databricks-asset-bundle-and-gitlab.
    Databricks Asset Bundle streamlines the development of complex data, analytics, and ML projects for the Databricks platform.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #databricks, #gitlab, #devops, #mlops-platforms, #databricks-asset-bundles, #databricks-gui, #how-to-build-a-ci-pipeline, #hackernoon-top-story, and more.

    This story was written by: @neshom. Learn more about this writer by checking @neshom's about page, and for more stories, please visit hackernoon.com.

    In the previous blog, I showed you how to build a CI pipeline using Databricks CLI eXtensions and GitLab. In this post, I will show you how to achieve the same objective with the latest and recommended Databricks deployment framework, Databricks Asset Bundles.

  • This story was originally published on HackerNoon at: https://hackernoon.com/how-im-building-an-ai-for-analytics-service.
    In this article I want to share my experience with developing an AI service for a web analytics platform called Swetrix.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-analysis, #ai, #analytics, #website-traffic, #software-architecture, #machine-learning, #predictive-analytics, #hackernoon-top-story, and more.

    This story was written by: @pro1code1hack. Learn more about this writer by checking @pro1code1hack's about page, and for more stories, please visit hackernoon.com.

    In this article, I want to share my experience developing an AI service for a web analytics platform called Swetrix. My aim was to develop a machine learning model that would predict future website traffic based on the data displayed on the following screenshot. The end goal is to give the customer a clear vision of what traffic their website will see in the future.

  • This story was originally published on HackerNoon at: https://hackernoon.com/real-time-anomaly-detection-in-underwater-gliders-experimental-evaluation.
    This paper presents a real-time anomaly detection algorithm to enhance underwater glider safety using datasets from actual deployments.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-analysis, #machine-learning, #underwater-gliders, #anomaly-detection, #oceanography, #glider-navigation, #ocean-data, #marine-robotics, and more.

    This story was written by: @oceanography. Learn more about this writer by checking @oceanography's about page, and for more stories, please visit hackernoon.com.

    We apply the anomaly detection algorithm to four glider deployments across the coastal ocean of Florida and Georgia, USA. For evaluation, the anomaly detected by the algorithm is cross-validated by high-resolution glider DBD data and pilot notes. We simulate the online detection process on SBD and compare the result with that detected from DBD.

  • This story was originally published on HackerNoon at: https://hackernoon.com/real-time-anomaly-detection-in-underwater-gliders-abstract-and-intro.
    This paper presents a real-time anomaly detection algorithm to enhance underwater glider safety, using datasets from actual deployments.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-analysis, #machine-learning, #underwater-gliders, #anomaly-detection, #oceanography, #glider-navigation, #ocean-data, #marine-robotics, and more.

    This story was written by: @oceanography. Learn more about this writer by checking @oceanography's about page, and for more stories, please visit hackernoon.com.

    Underwater gliders are widely used in oceanography for a range of applications. However, unpredictable events like shark strikes or remora attachments can lead to abnormal glider behavior or even loss of the instrument. This paper employs an anomaly detection algorithm to assess operational conditions of underwater gliders in the real-world ocean environment. Prompt alerts are provided to glider pilots upon detecting any anomaly.

  • This story was originally published on HackerNoon at: https://hackernoon.com/the-power-of-universal-semantic-layers-insights-from-cube-co-founder-artyom-keydunov.
    What is a universal semantic layer, and how is it different from a semantic layer? Is there actual semantics involved? Who uses that, how, and what for?
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #analytics, #business-intelligence, #data-modeling, #data-integration, #knowledge-graph, #universal-semantic-layer, #data-visualization, #data-warehouses, and more.

    This story was written by: @linked_do. Learn more about this writer by checking @linked_do's about page, and for more stories, please visit hackernoon.com.

    What is a universal semantic layer, and how is it different from a semantic layer? Is there actual semantics involved? Who uses that, how, and what for?

  • This story was originally published on HackerNoon at: https://hackernoon.com/a-comprehensive-guide-to-building-dolphinscheduler-320-production-grade-cluster-deployment.
    In version 3.2.0, DolphinScheduler introduces a series of new features and improvements, significantly enhancing its stability.
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-science, #workflow-management, #opensource, #programming, #dolphinscheduler, #apache-dolphinscheduler, #big-data, #cluster-deployment, and more.

    This story was written by: @zhoujieguang. Learn more about this writer by checking @zhoujieguang's about page, and for more stories, please visit hackernoon.com.

    DolphinScheduler provides powerful workflow management and scheduling capabilities for data engineers by simplifying complex task dependencies. In version 3.2.0, DolphinScheduler introduces a series of new features and improvements, significantly enhancing its stability and availability in production environments.

  • This story was originally published on HackerNoon at: https://hackernoon.com/why-monitoring-a-distributed-database-more-complex-than-you-might-expect.
    Why is monitoring a distributed database more complex than you might expect?
    Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #distributed-databases, #opentelemetry, #monitor-distributed-databases, #apache-ignite, #monitoring-systems, #database-monitoring-challenges, #data-acquisition-models, #push-vs-pull-acquisition, and more.

    This story was written by: @ingvard. Learn more about this writer by checking @ingvard's about page, and for more stories, please visit hackernoon.com.

    In this article, we will dive into the complexities of monitoring distributed databases from the perspective of a monitoring system developer. I will try to cover the following topics: managing multiple nodes, network restrictions, and issues related to high throughput caused by a large number of metrics.