Pulse · apache/spark · GitHub

July 2, 2022 – July 9, 2022

Overview

29 Active pull requests
- 0 Merged pull requests
- 29 Open pull requests

0 Active issues
- 0 Closed issues
- 0 New issues

29 Pull requests opened by 21 people

[WIP][SPARK-39660][SQL] Support v2 `DESCRIBE TABLE .. PARTITION`
#37055 opened Jul 2, 2022
[SPARK-39665][INFRA] Bump workflow versions in GitHub Actions
#37056 opened Jul 2, 2022
[SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding
#37065 opened Jul 3, 2022
[SPARK-39667][SQL] Add another workaround when there is not enough memory to build and broadcast the table
#37069 opened Jul 4, 2022
[SPARK-39672][SQL][3.1] Don't remove project before filter for IN or correlated EXISTS subquery
#37074 opened Jul 4, 2022
[SparkConnect] Initial Protobuf Definitions
#37075 opened Jul 4, 2022
[SPARK-35208][SQL][DOCS] Add docs for LATERAL subqueries
#37080 opened Jul 5, 2022
[SPARK-39678][SQL] Improve stats estimation for v2 tables
#37083 opened Jul 5, 2022
[SPARK-39690][SQL] Fixes Reuse exchange across subqueries with AQE if subquery side exchange materialized first
#37098 opened Jul 6, 2022
[SPARK-37287][SQL] Pull out dynamic partition and bucket sort from FileFormatWriter
#37099 opened Jul 6, 2022
[SPARK-39694][TESTS] Change to use the `${projectName}/Test/runMain` to execute Benchmarks
#37102 opened Jul 6, 2022
[SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the topKSortFallbackThreshold
#37104 opened Jul 6, 2022
[SPARK-39700][SQL] Deprecate Catalog API that has input parameters (dbName, tableName/FunctionName)
#37105 opened Jul 6, 2022
[SPARK-39702][CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel
#37110 opened Jul 7, 2022
[SPARK-39704][SQL] Implement createIndex & dropIndex & indexExists in JDBC (H2 dialect)
#37112 opened Jul 7, 2022
[WIP] Supports url encode/decode function
#37113 opened Jul 7, 2022
[SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`
#37115 opened Jul 7, 2022
[SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions
#37116 opened Jul 7, 2022
[WIP][SPARK-39714][python] Try to fix the mypy annotation tests
#37117 opened Jul 7, 2022
[SPARK-38910][YARN][FOLLOWUP] Unmanaged AM should clean staging dir before unregister
#37119 opened Jul 7, 2022
[SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging
#37123 opened Jul 8, 2022
[SPARK-39385][SQL] Supports push down `REGR_AVGX` and `REGR_AVGY`
#37126 opened Jul 8, 2022
[SPARK-39710][SQL] Support push local topK through outer join
#37129 opened Jul 8, 2022
[SPARK-39719][R] Make databaseExists/listTables/tables/tableNames in SparkR support 3L namespace
#37132 opened Jul 8, 2022
[SPARK-39720][R] Implement tableExists/getTable in SparkR for 3L namespace
#37133 opened Jul 8, 2022
[SPARK-39723][R] Implement functionExists in SparkR for 3L namespace
#37135 opened Jul 8, 2022
[SPARK-39724][CORE] Remove duplicate `.setAccessible(true)` call in `kvstore.KVTypeInfo`
#37136 opened Jul 8, 2022
[SPARK-39726][SQL] Change the default value of spark.sql.execution.topKSortFallbackThreshold to 800000
#37140 opened Jul 8, 2022
[SPARK-39024] Notify External Shuffle Service when Yarn Sends a Node in Decommissioning State
#37141 opened Jul 8, 2022

31 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[SPARK-39625][SPARK-38904][SQL] Add Dataset.as(StructType)
#37011 commented on Jul 5, 2022 • 15 new comments
[SPARK-39624][SQL] Support coalesce partition through CartesianProduct
#37014 commented on Jul 8, 2022 • 13 new comments
[SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager
#37052 commented on Jul 7, 2022 • 13 new comments
[SPARK-38292][PYTHON]Support na_filter for pyspark.pandas.read_csv
#37009 commented on Jul 8, 2022 • 11 new comments
[SPARK-39639][SQL] Fix possible null pointer in MySQLDialect listIndexes
#37031 commented on Jul 8, 2022 • 8 new comments
[SPARK-39607][SQL][DSV2] Distribution and ordering support V2 function in writing
#36995 commented on Jul 8, 2022 • 6 new comments
[SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic
#37040 commented on Jul 9, 2022 • 6 new comments
[SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart
#35906 commented on Jul 5, 2022 • 5 new comments
[SPARK-39469][SQL] Infer date type for CSV schema inference
#36871 commented on Jul 8, 2022 • 4 new comments
[SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down
#36991 commented on Jul 7, 2022 • 4 new comments
[SPARK-39452][GraphX] Extend EdgePartition1D with Destination based Strategy
#37053 commented on Jul 8, 2022 • 4 new comments
[MINOR][SQL] Add docstring for function pyspark.sql.functions.timestamp_seconds
#36944 commented on Jul 6, 2022 • 3 new comments
[SPARK-39312][SQL] Use parquet native In predicate for in filter push down
#36696 commented on Jul 8, 2022 • 2 new comments
[SPARK-39654][PYTHON] Parameters quotechar and escapechar need limitation in read_csv
#37044 commented on Jul 5, 2022 • 2 new comments
[SPARK-38041][SQL] DataFilter pushed down dynamically
#35669 commented on Jul 8, 2022 • 1 new comment
[SPARK-38946][PYTHON][PS] Generates a new dataframe instead of operating inplace in setitem
#36353 commented on Jul 8, 2022 • 1 new comment
[SPARK-39131][SQL] Rewrite exists as LeftSemi earlier to allow filters to be inferred
#36505 commented on Jul 8, 2022 • 1 new comment
[SPARK-39380][SQL] Ignore comment syntax in dfs command
#36768 commented on Jul 8, 2022 • 1 new comment
[SPARK-39398][GRAPHX]message checkpointer support storage level
#36806 commented on Jul 8, 2022 • 1 new comment
[SPARK-39494][PYTHON] Support `createDataFrame` from a list of scalars when schema is not provided
#36893 commented on Jul 8, 2022 • 1 new comment
[SPARK-39541][YARN]Diagnostics of yarn UI did not display the exception of driver when driver exit before regiserAM
#36952 commented on Jul 7, 2022 • 1 new comment
[SPARK-39600][SQL] Enhance pushdown limit through window
#36990 commented on Jul 6, 2022 • 1 new comment
[SPARK-39148][SQL] DS V2 aggregate push down can work with OFFSET or LIMIT
#37001 commented on Jul 6, 2022 • 1 new comment
[SPARK-39617][CORE] Driver cores mult be a positive number fix
#37016 commented on Jul 7, 2022 • 1 new comment
[SPARK-39640][SQL] Improve window statistics estimation
#37032 commented on Jul 6, 2022 • 1 new comment
[SPARK-39642][SQL] Improve Expand statistics estimation
#37034 commented on Jul 5, 2022 • 1 new comment
[SPARK-39655][SQL] Add a config to limit CartesianProductExec's partition number
#37048 commented on Jul 8, 2022 • 1 new comment
[SPARK-38814][BUILD][TESTS] Migrate Junit 4 to Junit 5
#36078 commented on Jul 6, 2022 • 0 new comments
[SPARK-38909][CORE][YARN] Encapsulate `LevelDB` used by `ExternalShuffleBlockResolver` and `YarnShuffleService` as `DB`
#36200 commented on Jul 5, 2022 • 0 new comments
[POC][SPARK-39522][INFRA] Uses Docker image cache over a custom image
#36980 commented on Jul 5, 2022 • 0 new comments
[SPARK-39522][INFRA] Uses Docker image cache over a custom image in sparkr job
#37006 commented on Jul 4, 2022 • 0 new comments