Questions tagged [partitioning]

1

votes
2

answer
98

Views

Inconsistent querying on a partitioned CosmosDB collection

I have a partitioned cosmos DB collection which is defined as unlimited with a throughput of 1000. It has the following document structure: 'Id': 'b42129d2-5467-450c-9f7e-744f78dfe1e7', // Primary key 'ArrayOfObjects': [ { // other properties omitted for brevity 'SubId': 'ed2a49fb-51d4-45b4-9690-df0...
Sylvoo
1

votes
0

answer
23

Views

How to detect duplicates in large json file using PySpark HashPartitioner

I have a large JSON file with over 20GB of JSON-structured metadata. It contains simple user metadata across some application, and I would like to sift through it to detect duplicates. Here is an example of what the data looks like: {'created': '2015-08-04', 'created_at': '2010-03-15', 'username': 'k...
John Lexus
0

votes
0

answer
20

Views

Partition key for mutual acquaintances recommendations in CosmosDB

When defining a Graph Database in CosmosDB, a Partition Key must be specified. The Partition Key is used for sharding the database. Each partition has a hard storage limit of 10GB. As such, queries that do writes or reads across partitions are a lot more expensive. I want to use CosmosDB to find mutua...
Aran Mulholland
1

votes
0

answer
6

Views

NULL in column used for range partitioning in Postgres

I have a table partitioned by range in Postgres 10.6. Is there a way to tell one of its partitions to accept NULL for the column used as partition key? The reason I need this is: my table size is 200GB and it's actually not yet partitioned. I want to partition it going forward, so I thought I would...
MondKin
1

votes
1

answer
687

Views

How to combine multiple ORC files (belonging to each partition) in a Partitioned Hive ORC table into a single big ORC file

I have a partitioned ORC table in Hive. After loading the table with all possible partitions I get on HDFS - multiple ORC files i.e. each partition directory on HDFS has an ORC file in it. I need to combine all these ORC files under each partition to a single big ORC file for some use-case. Can some...
Anchit Jatana
1

votes
0

answer
303

Views

Spark coalesce on rdd resulting in less partitions than expected

We are running a Spark batch job which performs the following operations: (1) create a dataframe by reading from a Hive table, (2) convert the dataframe to an RDD, (3) store the RDD in a list. The above steps are performed for 2 different tables, and a variable (called minNumberPartitions) is set which holds the minimum number of...
Debanjan Dhar
1

votes
0

answer
47

Views

Oracle execution plan differs if using partition extended syntax

I am querying in Oracle 12c a large subpartitioned table which has statistics on the table and partition level but nothing gathered on the subpartition level. I get notably different explain plan results for different partition syntax, presumably because one of these relies on the nonexistent subpar...
Caitlin M. Shaw
1

votes
0

answer
100

Views

Erlang mnesia node getting isolated from cluster

I have an Erlang (release 17.3) Mnesia cluster of 3 nodes running in 1 datacenter with disk+ram based tables. Once in a while I see that one node at random, say A, shows the other 2 nodes as stopped (stopped_db_nodes). The other 2 nodes, say B and C, also show A in stopped_db_nodes. This basica...
Saurav Prakash
1

votes
0

answer
31

Views

Avoid chunk / batch processing in Spark

I often encounter a pattern of dividing big processing steps into batches when these steps can't be processed entirely in our Big Data Spark cluster. For instance, we have a large cross join or some calculation that fails when done with all the input data, and then we usually divide these spar...
1

votes
0

answer
428

Views

What is the initial number of partitions created for a dataframe?

I am new to Spark. I am trying to understand the number of partitions produced by default by a hiveContext.sql('query') statement. I know that we can repartition the dataframe after it has been created using df.repartition. But, what is the number of partitions produced by default when the dataframe...
Hemanth
1

votes
0

answer
96

Views

How to calculate a table partition key from alphanumeric string?

Goal: Create a 300-partition table that evenly distributes records. The primary key is an email address plus the partition key [1-300]. We are not permitted to use a hash partitioned table due to performance issues. Doing this for number-heavy values is easy: SQL: MOD(NVL(REGEXP_REPLACE(fi...
ScrappyDev
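
One possible way to approach the bucketing this question describes, sketched in Python rather than the SQL the asker uses: derive a stable integer from the whole string and reduce it modulo 300. The function name `partition_key`, the choice of CRC32, and the lowercasing step are all assumptions made for illustration; none of them come from the question.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 300  # from the question: partition keys in the range 1..300


def partition_key(email: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an alphanumeric string to a stable bucket in 1..num_partitions."""
    # Assumption: addresses are treated case-insensitively, so normalise first.
    normalized = email.strip().lower().encode("utf-8")
    # CRC32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(normalized) % num_partitions + 1


if __name__ == "__main__":
    # Hypothetical sample data, used only to eyeball how even the spread is.
    sample = [f"user{i}@example.com" for i in range(30_000)]
    counts = Counter(partition_key(address) for address in sample)
    print("buckets used:", len(counts))
    print("smallest bucket:", min(counts.values()))
    print("largest bucket:", max(counts.values()))
```

Because the key is computed from the value itself, the same address always lands in the same partition. Whether the spread is even enough for a given data set would need to be checked against real data.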
1

votes
1

answer
51

Views

Calculate Total Ending Quantity by using previous and next row value (LAG & LEAD) in SQL Server

Calculate Total Ending Quantity by using previous and next row value (LAG & LEAD) in SQL Server. Here is the input data (columns: Date, Account, Type, Quantity): 12/28/2007 A 2N 719; 3/28/2008 A 2N 806; 6/27/2008 A 2N 622; 9/26/200...
rachel
1

votes
0

answer
206

Views

Difference between MySQL Range partitioning RANGE TO_DAYS(DtCol) vs RANGE COLUMNS(DtCol)

I would like to understand whether there is any difference between RANGE TO_DAYS(DateCol) and RANGE COLUMNS(DateCol) for MySQL RANGE partitioning by dates. MySQL Version : 5.7.12 Sample test scripts: CREATE TABLE log_tbl_1 ( id bigint(20) NOT NULL AUTO_INCREMENT, stime datetime not NULL, primary KEY id (id, s...
Hariharan Suresh
1

votes
0

answer
131

Views

SQL Server : the partition scheme cannot be changed because there exists one or more incremental statistics on the table

I get this error and cannot find a solution for it: 'The partition scheme cannot be changed because there exists one or more incremental statistics on the table'. The background: SQL Server 2017. I have a clustered columnstore index, partitioned by product. I am trying to change to a different partition scheme (b...
Yorik
1

votes
0

answer
643

Views

Partition existing tables using PostgreSQL 10

I have gone through a bunch of documentation for PostgreSQL 10 partitioning but I am still not clear on whether existing tables can be partitioned. Most of the posts talk about partitioning existing tables using PostgreSQL 9. Also, in the official PostgreSQL website : https://www.postgresql.org...
user1715513
1

votes
1

answer
40

Views

Alternative to the default HashPartitioner provided with Hadoop

I have a Hadoop MapReduce program that distributes keys unevenly. Some reducers end up with two keys, some with one key, and some with none. How do I force Hadoop to send each partition with a certain key to a separate reducer? I have nine unique keys of the form: 0,0 0,1 0,2 1,0 1,1 1,2 2,0 2...
zaranaid
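
The mapping this question asks for can be sketched as follows. In Hadoop itself this logic would live in a custom `Partitioner` (its `getPartition` method receives the key, the value, and the number of partitions); the Python below only illustrates the arithmetic. It assumes the keys are `i,j` pairs with both parts in 0..2, since the key list in the question is cut off. The name `partition_for` and the constant `GRID_SIZE` are hypothetical.

```python
GRID_SIZE = 3  # assumption: keys look like "i,j" with i and j in 0..2


def partition_for(key: str, num_reducers: int, grid_size: int = GRID_SIZE) -> int:
    """Return a reducer index for a key of the form 'i,j'."""
    row_text, col_text = key.split(",")
    row, col = int(row_text), int(col_text)
    # Row-major index: distinct (row, col) pairs give distinct values 0..grid_size**2 - 1.
    # The modulo keeps the result valid if fewer reducers are configured.
    return (row * grid_size + col) % num_reducers


if __name__ == "__main__":
    keys = [f"{i},{j}" for i in range(GRID_SIZE) for j in range(GRID_SIZE)]
    for key in keys:
        print(key, "->", partition_for(key, num_reducers=9))
```

With nine reducers, each of the nine keys maps to its own reducer index. A generic hash-based partitioner gives no such guarantee, which is consistent with the uneven spread described in the question.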
1

votes
0

answer
186

Views

Partitioned table on timestamp::date query scanning all partitions

Problem setting (PostgreSQL 9.6): I have one table where partitioning works as I intended (1), and one where it does not (2): CASE(1) Table, partitioning on s_date timestamp without time zone NOT NULL CREATE TABLE 'diagnoseAW'.'AWIORECORDERAWCOMMAND' ( 'ebpZone' text COLLATE pg_catalog.'default' NOT NULL, 'elem...
1

votes
1

answer
363

Views

Choosing partition key in DynamoDB

I am writing a service that queries some occupation data from remote stations in carparks and stores it in DynamoDB. This is a sample dataset: 2018-05-01T10:57:15 1 Azrieli Sarona 1242 478 712 0 1 3 1 מפלס -2 2018-05-01T10:57:16 171 11 159 0 0 1 What is the best way to define a partition key fo...
Alex Gill
1

votes
0

answer
49

Views

SQL Partition Data by Date Range ignoring date gaps and weekends

Thank you in advance for your patience and help! I am trying to partition my data in a way that displays date ranges. IMAGE: Data Set - Current Results - Desired Results. In the image you can see what my data set looks like, the results I'm currently getting, and the results I would like to...
1

votes
1

answer
375

Views

Delete data from a specific partition in SQL Server 2012

I would like to delete data from a specific partition using the partition ID. I found queries to truncate data from a specific partition for SQL Server 2016 but did not find any query for lower versions. I tried the query below to delete only data from the partitions with partition id 14 and 15. DELETE FROM pa...
l.lijith
1

votes
0

answer
92

Views

Is it possible to make a list partition in PostgreSQL based on a join of the partition list key?

Consider the following tables Table 'public.Foo' Column | Type | ------------------+-----------------------------+ foo_id | integer | PK bar_id | integer | FK to bars .... Table 'public.Bar' Column |...
Whelchel
1

votes
2

answer
358

Views

Why select result takes long time in partitioned table in postgreSql?

I have a daily partitioned table in PostgreSQL. It uses cdr_date for partitioning. When I run a simple query, it takes a long time and I don't know why. This is the query: EXPLAIN (ANALYZE , BUFFERS ) select * FROM cdr WHERE cdr_date >= '2018-05-24 11:59:00.937000 +00:00' AND cdr_date = ''2018-05-24...
N'bia
1

votes
0

answer
44

Views

HIVE - increment value on column change

I'm just basically trying to add a column with a unique identifier for a journey. I have a table that looks similar to this: Time id station newtrip 2017-11-15 16:45 100 St.George TRUE 2017-11-15 16:46 100 Bloor FALSE 2017-11-15 16:47 110 Wellesley TRUE 2017-1...
Alexander Witte
1

votes
2

answer
249

Views

Gathering statistics on partitioned tables

Ultimately I need to know if this will be enough. In Oracle, there is a setting on a table to gather statistics incrementally, rather than for the full table. Basically, it will only gather stats on partitions where the data has changed. We need to make sure all partitioned tables have INCREMENTAL set t...
user9766188
1

votes
2

answer
129

Views

Hive Partitioning validation

I have created a partitioned Hive table and inserted data into it. Now suppose I execute a select * query with a where clause. How can I make sure that the Hive query is using partitioning?
Kamal Tomar
1

votes
1

answer
135

Views

Hive partitioning using current date

I have some sample data like this: 1,prasad,Newyork 2,Tarak,Mexico. I want to load this data into a Hive table partitioned by the current date, and when I load this data again tomorrow it should be partitioned by tomorrow's date. Is it possible to achieve this in Hive?
Venkat J
1

votes
1

answer
265

Views

BigQuery - Converting from non-partitioned to partitioned table - Legacy SQL

I've looked at previous questions, but the links given to GCP were outdated, so I would like to learn the best way to do the conversion while inserting into the correct partition (meaning not the day I inserted the records, but according to the 'date' column. Could someone point me in the right d...
ShiraP
1

votes
0

answer
50

Views

partitioning mariadb for large table error

We have a problem: a table is growing fast and now has 270M records. We don't need data older than a week, so we want to delete it on a two-node InnoDB Galera cluster running the latest version of MariaDB server. We have figured out that partitioning is the best solution (maybe there are other solutions, please let me k...
Maestroi
1

votes
2

answer
70

Views

Kafka Consumer distribution not working as expected

I have three topics, each with three partitions, on a Kafka cluster, so there are 9 partitions in total. When I create 9 consumers, 6 of them sit idle and only three are used. The expectation is that each consumer should pick up one partition, and hence 9 consumers should pick up d...
Radhi
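
The symptom described in this question is what a range-style, per-topic assignment would produce. The question does not say which assignment strategy is configured, so that is an assumption here. The sketch below is a plain Python simulation, not Kafka client code; `range_assign`, the topic names, and the consumer names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def range_assign(
    topics: Dict[str, int], consumers: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Simulate range-style assignment: each topic is split independently."""
    assignment = defaultdict(list)
    members = sorted(consumers)
    for topic, num_partitions in sorted(topics.items()):
        # Each consumer gets per_consumer partitions; the first `extra`
        # consumers in sorted order get one more.
        per_consumer, extra = divmod(num_partitions, len(members))
        start = 0
        for index, member in enumerate(members):
            count = per_consumer + (1 if index < extra else 0)
            for partition in range(start, start + count):
                assignment[member].append((topic, partition))
            start += count
    return {member: assignment.get(member, []) for member in members}


if __name__ == "__main__":
    topics = {"t1": 3, "t2": 3, "t3": 3}
    consumers = [f"c{i}" for i in range(1, 10)]
    for member, partitions in range_assign(topics, consumers).items():
        print(member, partitions or "idle")
```

Because every topic is divided on its own, the same first three consumers receive one partition from each topic and the remaining six receive nothing. A strategy that spreads all nine partitions across the whole group, such as a round-robin style assignment, would avoid the idle consumers.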
1

votes
0

answer
60

Views

Create partition function based on join with other table

I have 2 tables: TableA ( Id uniqueidentifier, Date datetime ) TabelB ( Body XML, KeyA uniqueidentifier ) And I want to create partition for both tables based on date field in TableA. I checked documentation but can't find any clue about it. Is it possible to do?
Oleg L
1

votes
1

answer
43

Views

I can't get the complete list of tables using jooq 3.9.1 from Postgres database

I am creating the table as follows: CREATE TABLE plist1 (c1 NUMERIC, c2 VARCHAR(10)) PARTITION BY LIST (c1). I am trying to read the complete list of tables from the Postgres database, including the master tables that we have used for partitioning. Interestingly, when I used the stand alone progr...
Patan
1

votes
0

answer
258

Views

NoSQL database design using single table

How satisfied are you with the statement 'You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table.'? I have listed my use cases for a NoSQL design, but restricting myself to a single-table design makes the design complex a...
Alok Sharma
1

votes
1

answer
29

Views

Oracle table is locked for inserts during a procedure execution but allows updates

We have a procedure which does operation on multiple records in multiple tables, in some cases the procedure needs to insert in a table named TAX, when this insert happens another table named PAYMENT is locked for insertion, but not for update. So if another transaction tries to insert in PAYMENT it...
Amir Pashazadeh
1

votes
0

answer
13

Views

create a lineal-log-log-lineal partition of interval [a,b] in matlab, but matching steps

I want to create a vector x from a to b, in the following way. I have an interest point c , a
kurokirasama
1

votes
1

answer
39

Views

Diskpart directing resulting output to second text file

Have the following, which works great: echo BASEBOARD>>%computername%.txt wmic /APPEND:'%computername%.txt' baseboard get Manufacturer, Model, Name, PartNumber, slotlayout, serialnumber, poweredon echo BIOS>>%computername%.txt wmic /APPEND:'%computername%.txt' bios get name, version, serialnumber, I...
Leptonator
1

votes
0

answer
69

Views

np.argpartition when sorting 2d arrays and nan

Consult np.argpartition when sorting 2d arrays, can you exclude the influence of nan? Such as: a = np.array([ 1., -1., nan, 0., nan, 2., 4., -2., -10., nan]) # Get effective Element Center np.argpartition(a, (~np.isnan(a)).sum()//2)] But what if it's a 2d array query?
weidong
1

votes
0

answer
26

Views

Can I partition by id that already exists on worker?

I am coding a streaming fault detection app for several counters, with input: RDD(Int, BreezeDenseMatrix[Double]). For every RDD, I want to do some computations and write the RDD to a text file in HDFS to compare it with the next new RDD. What I want to do is, when the new RDD has arrived, read the t...
mkey
1

votes
1

answer
289

Views

uniformly partition a rdd in spark

I have a text file in HDFS, which has about 10 million records. I am trying to read the file and do some transformations on that data. I am trying to uniformly partition the data before I process it. Here is the sample code: var myRDD = sc.textFile('input file location') myRDD = myRDD.repar...
Sudharnath
1

votes
1

answer
53

Views

Partitioning Not working fine in Google BigQuery when SQL Query contains a sub-query

I have the following table structure in BigQuery: **query_all_partition**: property_unique_date DATE REQUIRED, page_url STRING REQUIRED, click INTEGER REQUIRED, impression INTEGER REQUIRED, position FLOAT REQUIRED. Here, I have specified partitioning over property...
abhiphanse
1

votes
1

answer
19

Views

Do I need to include partition name in the query to get the actual benefits of partitioning?

I have restructured one of my data tables (DeviceLogs) with range partition by month using date (LogDate) field. Following is a minimal version of my table. UUID | LogDate | DeviceId | Counter ------|----------------------|-----------|--------- xxxx | 2018-08-21 15:00:00 | 23...
BlueBird
