RDD Programming Guide - Spark 3.5.4 Documentation
Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. There are two ways to create RDDs: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system, such as a shared filesystem ...
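As a rough sketch of those two creation paths (not taken from the guide itself; the application name and file path below are placeholders), in PySpark this looks like:

```python
# Minimal PySpark sketch of the two ways to create an RDD.
# "rdd-creation" and "data/words.txt" are illustrative placeholders.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("rdd-creation").setMaster("local[*]")
sc = SparkContext(conf=conf)

# 1) Parallelize an existing collection in the driver program.
nums = sc.parallelize([1, 2, 3, 4, 5])

# 2) Reference a dataset in an external storage system (local file, HDFS, ...).
lines = sc.textFile("data/words.txt")

print(nums.count())  # 5
```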
RDD (Spark 3.5.4 JavaDoc) - Apache Spark
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel. This class contains the basic operations available on all RDDs, such as map, filter, and persist.
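A small sketch of those basic operations, assuming the SparkContext `sc` from the sketch above:

```python
from pyspark import StorageLevel

rdd = sc.parallelize(range(10))

# map and filter are lazy transformations; nothing runs until an action.
evens_squared = (rdd
                 .filter(lambda x: x % 2 == 0)  # keep even elements
                 .map(lambda x: x * x))         # square each remaining element

# persist keeps the computed partitions in memory for reuse across actions.
evens_squared.persist(StorageLevel.MEMORY_ONLY)

print(evens_squared.collect())  # [0, 4, 16, 36, 64]
```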
pyspark.RDD — PySpark 3.5.4 documentation - Apache Spark
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel. Methods
spark.RDD
A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel. This class contains the basic operations available on all RDDs, such as map, filter, and persist.
Spark 3.5.4 ScalaDoc - org.apache.spark.rdd.RDD
Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.
Quick Start - Spark 3.5.4 Documentation - Apache Spark
Note that, before Spark 2.0, the main programming interface of Spark was the Resilient Distributed Dataset (RDD). After Spark 2.0, RDDs were replaced by the Dataset, which is strongly typed like an RDD but comes with richer optimizations under the hood.
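In Python the Dataset API surfaces as the untyped DataFrame, so a hedged sketch of the post-2.0 interface (the column names are made up for illustration) looks like:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("quickstart-sketch").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "letter"])
df.filter(df.id > 1).show()

# The underlying RDD is still reachable when the lower-level API is needed.
print(df.rdd.map(lambda row: row.letter).collect())  # ['a', 'b', 'c']
```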
Spark SQL and DataFrames - Spark 3.5.4 Documentation - Apache …
Spark SQL, DataFrames and Datasets Guide. Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed.
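Continuing with the `spark` session from the previous sketch, the structured interface can also be queried with SQL through a temporary view (the view and column names here are invented for illustration):

```python
people = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)], ["name", "age"])

# Registering the DataFrame as a view gives Spark SQL the schema it needs
# to plan and optimize the query.
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```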
Examples - Apache Spark
Spark RDD Example. The Spark RDD APIs are suitable for unstructured data. The Spark DataFrame API is easier and more performant for structured data. Suppose you have a text file called some_text.txt with the following three lines of data:
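The contents of some_text.txt are not reproduced in the snippet above, so as an assumed illustration only, a typical word count over such a file with the RDD API (again using the `sc` from the first sketch) might look like:

```python
# Hypothetical word count; the real example page may differ.
counts = (sc.textFile("some_text.txt")
            .flatMap(lambda line: line.split())   # split each line into words
            .map(lambda word: (word, 1))          # pair each word with a 1
            .reduceByKey(lambda a, b: a + b))     # sum the counts per word

print(counts.collect())
```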
pyspark.RDD.mean — PySpark 3.5.4 documentation - Apache Spark
RDD.mean() → float. Compute the mean of this RDD's elements.
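A tiny usage sketch, again assuming a SparkContext `sc`:

```python
sc.parallelize([1, 2, 3, 4]).mean()  # 2.5
```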