Introduction to Presto (PrestoDB)

Presto (or PrestoDB) is an open source, distributed SQL query engine, designed from the ground up for fast analytic queries against data of any size. It supports both non-relational sources, such as the Hadoop Distributed File System (HDFS), Amazon S3, Cassandra, MongoDB, and HBase, and relational data sources such as MySQL, PostgreSQL, Amazon Redshift, Microsoft SQL Server, and Teradata.

Presto can query data where it is stored, without needing to move data into a separate analytics system. Query execution runs in parallel over a pure memory-based architecture, with most results returning in seconds. You’ll find it used by many well-known companies like Facebook, Airbnb, Netflix, Atlassian, and Nasdaq.

What is the history of Presto?

Presto started as a project at Facebook to run interactive analytic queries against a 300 PB data warehouse built with large Hadoop/HDFS-based clusters. Prior to building Presto, Facebook used Apache Hive, which it created and rolled out in 2008, to bring the familiarity of SQL syntax to the Hadoop ecosystem. Hive had a significant impact on the Hadoop ecosystem by simplifying complex Java MapReduce jobs into SQL-like queries while still executing jobs at high scale. However, it wasn’t optimized for the fast performance needed in interactive queries.

In 2012, the Facebook Data Infrastructure group built Presto, an interactive query system that could operate quickly at petabyte scale. It was rolled out company-wide in spring 2013. In November 2013, Facebook open sourced Presto under the Apache License and made it available for anyone to download on GitHub. Today, Presto has become a popular choice for interactive queries on Hadoop and receives many contributions from Facebook and other organizations. Facebook’s implementation of Presto is used by over a thousand employees, who run more than 30,000 queries, processing one petabyte of data daily.

How does Presto work?

Presto is a distributed system that is typically deployed alongside Hadoop and uses an architecture similar to a classic massively parallel processing (MPP) database management system. It has one coordinator node working in sync with multiple worker nodes. Users submit their SQL query to the coordinator, which uses a custom query and execution engine to parse, plan, and schedule a distributed query plan across the worker nodes. It is designed to support standard ANSI SQL semantics, including complex queries, aggregations, joins, left/right outer joins, sub-queries, window functions, distinct counts, and approximate percentiles.
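The coordinator/worker split can be illustrated with a toy scatter-gather sketch. This is not Presto code: the function names, the sample rows, and the use of a thread pool as a stand-in for worker nodes are all assumptions made for illustration. It only shows how a count grouped by key can be computed as partial aggregates on workers and then merged by a coordinator.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def worker_partial_count(split):
    """Hypothetical worker task: count rows per country within one split."""
    return Counter(row["country"] for row in split)


def coordinator_count_by_key(splits, num_workers=3):
    """Hypothetical coordinator: schedule one task per split, then merge."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each split is processed independently; partial results come back
        # to the coordinator, which performs the final aggregation.
        for partial in pool.map(worker_partial_count, splits):
            total.update(partial)
    return dict(total)


# Three splits of a made-up table. Conceptually this mirrors
# SELECT country, count(*) FROM visits GROUP BY country.
splits = [
    [{"country": "US"}, {"country": "DE"}],
    [{"country": "US"}, {"country": "US"}],
    [{"country": "DE"}, {"country": "FR"}],
]
result = coordinator_count_by_key(splits)
print(result)
```

In this sketch, adding splits and raising `num_workers` increases the amount of work done concurrently, which is the intuition behind scaling out with more worker nodes.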

After the query is compiled, Presto splits the request into multiple stages that execute across the worker nodes. All processing is in-memory and pipelined across the network between stages, which avoids unnecessary I/O overhead. Adding more worker nodes allows for more parallelism and faster processing.
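Pipelining between stages can be sketched with Python generators, which pass rows from one stage to the next without writing intermediate results to disk. The stage names and the sample data are hypothetical, and everything here runs in a single process rather than across a network. It is an analogy for the idea, not a description of Presto's internals.

```python
def scan(rows):
    """Stage 1: emit rows one at a time (stand-in for reading a data split)."""
    for row in rows:
        yield row


def filter_stage(rows, min_amount):
    """Stage 2: pass through only rows that satisfy the predicate."""
    for row in rows:
        if row["amount"] >= min_amount:
            yield row


def sum_stage(rows):
    """Stage 3: consume the stream and produce the final aggregate."""
    total = 0
    for row in rows:
        total += row["amount"]
    return total


orders = [{"amount": 5}, {"amount": 20}, {"amount": 50}]

# Each row flows through scan -> filter -> sum as soon as it is produced;
# no stage materializes a full intermediate table.
pipeline_total = sum_stage(filter_stage(scan(orders), min_amount=10))
print(pipeline_total)
```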

To make Presto extensible to any data source, it was designed with a storage abstraction that makes it easy to build pluggable connectors. As a result, Presto has many connectors, including ones for non-relational sources such as the Hadoop Distributed File System (HDFS), Amazon S3, Cassandra, MongoDB, and HBase, and for relational sources such as MySQL, PostgreSQL, Amazon Redshift, Microsoft SQL Server, and Teradata. The data is queried where it is stored, without the need to move it into a separate analytics system.
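The idea of a storage abstraction with pluggable connectors can be sketched as a small interface that the engine programs against. The `Connector` class, its `scan` method, and the two implementations below are hypothetical and far simpler than Presto's actual connector interface. They only show why the engine-side code does not need to change when a new source is added.

```python
import csv
import io
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical storage abstraction: the engine only sees rows."""

    @abstractmethod
    def scan(self, table):
        """Yield the rows of `table` as dictionaries."""


class InMemoryConnector(Connector):
    """Stand-in for a source that already holds structured records."""

    def __init__(self, tables):
        self._tables = tables

    def scan(self, table):
        yield from self._tables[table]


class CsvConnector(Connector):
    """Stand-in for a file-based source; parses CSV text on demand."""

    def __init__(self, csv_text_by_table):
        self._csv = csv_text_by_table

    def scan(self, table):
        yield from csv.DictReader(io.StringIO(self._csv[table]))


def count_rows(catalogs, qualified_name):
    """Engine-side code: identical no matter which connector backs the table."""
    catalog, table = qualified_name.split(".", 1)
    return sum(1 for _ in catalogs[catalog].scan(table))


catalogs = {
    "memory": InMemoryConnector({"users": [{"id": 1}, {"id": 2}]}),
    "files": CsvConnector({"events": "id,kind\n1,click\n2,view\n3,click\n"}),
}
user_count = count_rows(catalogs, "memory.users")
event_count = count_rows(catalogs, "files.events")
print(user_count, event_count)
```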

Presto and Hadoop

Presto is an open source, distributed SQL query engine designed for fast, interactive queries on data in HDFS and other sources. Unlike Hadoop/HDFS, it does not have its own storage system. Thus, Presto is complementary to Hadoop, with organizations adopting both to solve a broader business challenge. Presto can be installed with any implementation of Hadoop, and is packaged in the Amazon EMR Hadoop distribution.

Who uses Presto?

Presto is used in production at very large scale at many well-known organizations. You’ll find it used at Facebook, Airbnb, Netflix, Atlassian, Nasdaq, and many more. Facebook’s implementation of Presto is used by over a thousand employees, who run more than 30,000 queries, processing one petabyte of data daily. On average, Netflix runs around 3,500 queries per day on its Presto clusters. Airbnb built and open sourced Airpal, a web-based query execution tool that works on top of Presto. The broader Presto community can be found on its community forum and on the Presto page on Facebook.

Deploying Presto in the Cloud

Presto is an ideal workload in the cloud, because the cloud provides performance, scalability, reliability, availability, and massive economies of scale. You can launch a Presto cluster in minutes. You don’t need to worry about node provisioning, cluster setup, Presto configuration, or cluster tuning.

Build your Presto implementation in the cloud on Amazon Web Services

Amazon EMR and Amazon Athena are the best places to deploy Presto in the cloud, because they handle the integration and testing of Presto for you, with the scale, simplicity, and cost effectiveness of AWS. With Amazon EMR, you can launch Presto clusters in minutes without needing to do node provisioning, cluster setup, Presto configuration, or cluster tuning. EMR enables you to provision one, hundreds, or thousands of compute instances in minutes. Amazon Athena lets you deploy Presto using the AWS Serverless platform, with no servers, virtual machines, or clusters to set up, manage, or tune. Simply point to your data in Amazon S3, define the schema, and start querying using the built-in query editor or your existing Business Intelligence (BI) tools. Athena automatically parallelizes your query and dynamically scales resources so that queries run quickly. You pay only for the queries that you run.

Presto: what is this program and do you need it?

Presto! PageManager is a document management application for Macintosh. Because it is compatible with most image editors and word processors, Presto! PageManager gives you full control over your Macintosh files. You can easily manage documents, edit e-mail and files, and read documents using the optical character recognition (OCR) software built into Presto! PageManager.

When using the machine as a scanner, it is recommended that you install Presto! PageManager. For instructions on installing Presto! PageManager, see the Quick Setup Guide.

Note

The complete NewSoft Presto! PageManager 7 User's Guide can be viewed from the Help icon in NewSoft Presto! PageManager 7.

Features

Optical character recognition: in a single step, you scan an image, recognize the text, and edit it with a word processor.

Image editing: enhance, crop, and rotate images, or open them in the image editor of your choice.

Fast and Reliable SQL Engine for Data Analytics and the Open Lakehouse

See how some of the largest internet-scale companies are using Presto today. It doesn’t matter if you’re operating at Meta-like scale or at just a few nodes: Presto is for everyone!

300 PB data lakehouse
1K daily active users
30K queries/day
2 regions
20 clusters
8K nodes
7K weekly active users
100M+ queries/day
50 PB HDFS bytes read/day
10K+ compute cores
1M queries/day


Federate queries and query data where it lives — data lakes, lakehouses, and more

Presto can query relational and NoSQL databases, data warehouses, data lakes, and more, with dozens of connectors available today. It queries data where it lives, and a single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.
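A single query that joins data from two sources can be sketched with Python's built-in `sqlite3` module. This is only an analogy: SQLite's `ATTACH DATABASE` stands in for two separately configured data sources, and the names `crm`, `sales`, `customers`, and `orders` are made up for the example. In Presto itself, tables are addressed as `catalog.schema.table`, with each catalog backed by a connector, so the real query would carry one more level of qualification.

```python
import sqlite3

# One connection, two separate in-memory databases standing in for two sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS sales")

conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)", [(1, 30), (1, 12), (2, 7)])

# A single SQL statement that joins tables living in different databases.
federated_sql = """
    SELECT c.name, SUM(o.amount) AS total
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""
rows = conn.execute(federated_sql).fetchall()
print(rows)
conn.close()
```

The point of the sketch is that the join is expressed once, in SQL, and the engine is responsible for reaching into each source.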

Blazing fast analytics

Presto is an in-memory distributed SQL engine, positioned as faster than other compute engines in the disaggregated stack.

Standardize your SQL with one engine

Presto can be used for interactive and batch workloads, for small and large amounts of data, and scales from a few to thousands of users. With Presto, you get one familiar ANSI SQL language and one engine for your data analytics, so you don’t need to graduate to another lakehouse engine.

Open source

Presto is a neutrally governed open source project under The Linux Foundation with dozens of member companies (and growing!). Run Presto wherever you want, on-prem or in any cloud.

Join the Presto community

There are many ways to get involved with the Presto community. Join our Slack channel to connect with other Presto engineers and users, contribute to the project, and join our Virtual Meetup to stay up to date on community events.

The Presto Foundation

The Presto Foundation is the organization that oversees the development of the Presto open source project. Presto is an independent open-source project and not controlled by any single company. Members of the Presto Foundation provide essential financial support for the collaborative development process, including tooling, infrastructure, and community conferences.

Sources used in preparing this material:
https://aws.amazon.com/ru/big-data/what-is-presto/
https://support.brother.com/g/s/id/htmldoc/mfc/dcp8070d/ru/html/sug/chapter9_3.html
https://prestodb.io/