Questions tagged [partitioning]
1489 questions
1
votes
2
answer
98
Views
Inconsistent querying on a partitioned CosmosDB collection
I have a partitioned Cosmos DB collection that is defined as unlimited, with a throughput of 1000. It has the following document structure:
'Id': 'b42129d2-5467-450c-9f7e-744f78dfe1e7', // Primary key
'ArrayOfObjects': [
{
// other properties omitted for brevity
'SubId': 'ed2a49fb-51d4-45b4-9690-df0...
1
votes
0
answer
23
Views
How to detect duplicates in large JSON file using PySpark HashPartitioner
I have a large JSON file with over 20GB of JSON-structured metadata. It contains simple user metadata across some application, and I would like to sift through it to detect duplicates. Here is an example of what the data looks like:
{'created': '2015-08-04', 'created_at': '2010-03-15', 'username': 'k...
0
votes
0
answer
20
Views
Partition key for mutual acquaintances recommendations in CosmosDB
When defining a Graph Database in CosmosDB, a Partition Key must be specified. The Partition Key is used for sharding the database. Each partition has a hard storage limit of 10GB. As such, queries that do writes or reads across partitions are a lot more expensive. I want to use CosmosDb to find mutua...
1
votes
0
answer
6
Views
NULL in column used for range partitioning in Postgres
I have a table partitioned by range in Postgres 10.6. Is there a way to tell one of its partitions to accept NULL for the column used as partition key?
The reason I need this is: my table size is 200GB and it's actually not yet partitioned. I want to partition it going forward, so I thought I would...
1
votes
1
answer
687
Views
How to combine multiple ORC files (belonging to each partition) in a Partitioned Hive ORC table into a single big ORC file
I have a partitioned ORC table in Hive. After loading the table with all possible partitions, I get multiple ORC files on HDFS, i.e. each partition directory on HDFS has an ORC file in it. I need to combine all these ORC files under each partition into a single big ORC file for some use case.
Can some...
1
votes
0
answer
303
Views
Spark coalesce on rdd resulting in fewer partitions than expected
We are running a Spark batch job which performs the following operations:
Create dataframe by reading from hive table
Convert dataframe to rdd
Store the rdd into list
The above steps are performed for 2 different tables, and a variable (called minNumberPartitions) is set which holds the minimum number of...
1
votes
0
answer
47
Views
Oracle execution plan differs if using partition extended syntax
I am querying in Oracle 12c a large subpartitioned table which has statistics on the table and partition level but nothing gathered on the subpartition level. I get notably different explain plan results for different partition syntax, presumably because one of these relies on the nonexistent subpar...
1
votes
0
answer
100
Views
Erlang mnesia node getting isolated from cluster
I have an Erlang (release 17.3) mnesia cluster of 3 nodes running in 1 datacenter with disk+ram based tables.
Once in a while I would see that one node at random, say A, would show the other 2 nodes as stopped (stopped_db_nodes). Also, the other 2 nodes, say B and C, would show A in stopped_db_nodes. This basica...
1
votes
0
answer
31
Views
Avoid chunk / batch processing in Spark
I often encounter a pattern of dividing big processing steps into batches when these steps can't be processed entirely in our Big Data Spark cluster.
For instance, we have a large cross join or some calculation that fails when done with all the input data, and then we usually end up dividing these spar...
1
votes
0
answer
428
Views
What is the initial number of partitions created for a dataframe?
I am new to Spark. I am trying to understand the number of partitions produced by default by a hiveContext.sql('query') statement. I know that we can repartition the dataframe after it has been created using df.repartition. But, what is the number of partitions produced by default when the dataframe...
1
votes
0
answer
96
Views
How to calculate a table partition key from alphanumeric string?
Goal:
Create a 300-partition table that evenly distributes records.
The primary key is an email address plus the partition key [1-300].
We are not permitted to use a hash partitioned table due to performance issues.
Doing this for number heavy values is easy:
SQL: MOD(NVL(REGEXP_REPLACE(fi...
1
votes
1
answer
51
Views
Calculate Total Ending Quantity by using previous and next row value (LAG & LEAD) in SQL Server
Calculate Total Ending Quantity by using previous and next row value (LAG & LEAD) in SQL Server. Here is the input data.
Input Data
Date Account Type Quantity
12/28/2007 A 2N 719
3/28/2008 A 2N 806
6/27/2008 A 2N 622
9/26/200...
1
votes
0
answer
206
Views
Difference between MySQL Range partitioning RANGE TO_DAYS(DtCol) vs RANGE COLUMNS(DtCol)
I would like to understand whether there is any difference between RANGE TO_DAYS(DateCol) and RANGE COLUMNS(DateCol) in MySQL RANGE partitioning by dates.
MySQL Version : 5.7.12
Sample test scripts:
CREATE TABLE log_tbl_1 (
id bigint(20) NOT NULL AUTO_INCREMENT,
stime datetime not NULL,
primary KEY id (id, s...
1
votes
0
answer
131
Views
SQL Server : the partition scheme cannot be changed because there exists one or more incremental statistics on the table
I get this error and I cannot find a solution for it:
The partition scheme cannot be changed because there exists one or more incremental statistics on the table
The background:
SQL Server 2017
I have a clustered columnstore index, partitioned by product
I try to change to different partition scheme (b...
1
votes
0
answer
643
Views
Partition existing tables using PostgreSQL 10
I have gone through a bunch of documentation for PostgreSQL 10 partitioning, but I am still not clear on whether existing tables can be partitioned. Most of the posts discuss partitioning existing tables using PostgreSQL 9.
Also, on the official PostgreSQL website: https://www.postgresql.org...
1
votes
1
answer
40
Views
Alternative to the default HashPartitioner provided with Hadoop
I have a Hadoop MapReduce program that distributes keys unevenly.
Some reducers end up with two keys, some with one key, and some with none.
How do I force Hadoop to send each partition with a certain key to a separate reducer? I have nine unique keys of the form:
0,0
0,1
0,2
1,0
1,1
1,2
2,0
2...
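One way to read the situation above: a hash-based partitioner gives no guarantee that nine keys land on nine distinct reducers, whereas a mapping that computes the partition directly from the key does. The following is a minimal sketch of such a mapping, written in Python for illustration only. It assumes the keys are exactly the strings "row,col" with row and col in 0..2; the function name `partition_for` and the `cols` parameter are hypothetical.

```python
def partition_for(key: str, cols: int = 3) -> int:
    """Map a key of the form "row,col" to a distinct partition index.

    Assumption: row and col are both integers in the range 0..cols-1,
    so the nine keys "0,0" .. "2,2" map to the indices 0 .. 8.
    """
    row, col = (int(part) for part in key.split(","))
    return row * cols + col


# Every one of the nine keys gets its own partition index.
keys = [f"{r},{c}" for r in range(3) for c in range(3)]
mapping = {key: partition_for(key) for key in keys}
```

In a Hadoop job, logic of this kind would typically live in the `getPartition` method of a custom `Partitioner`, with the number of reduce tasks set to nine; that wiring is not shown here.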
1
votes
0
answer
186
Views
Partitioned table on timestamp::date query scanning all partitions
Problem setting (PostgreSQL 9.6)
I have one table where partitioning works as I intended(1), one that does not(2):
CASE(1)
Table, partitioning on s_date timestamp without time zone NOT NULL
CREATE TABLE 'diagnoseAW'.'AWIORECORDERAWCOMMAND'
(
'ebpZone' text COLLATE pg_catalog.'default' NOT NULL,
'elem...
1
votes
1
answer
363
Views
Choosing partition key in DynamoDB
I am writing a service that queries some occupancy data from remote stations in car parks and stores it in DynamoDB. This is a sample dataset:
2018-05-01T10:57:15
1
Azrieli Sarona
1242
478
712
0
1
3
1
מפלס -2
2018-05-01T10:57:16
171
11
159
0
0
1
What is the best way to define a partition key fo...
1
votes
0
answer
49
Views
SQL Partition Data by Date Range ignoring date gaps and weekends
Thank you in advance for your patience, and help!
I am trying to partition my data in a way that displays date ranges.
IMAGE: Data Set - Current Results - Desired Results
In the image you can see what my data set looks like, the results I'm currently getting, and the results I would like to...
1
votes
1
answer
375
Views
Delete data from a specific partition in SQL Server 2012
I would like to delete data from a specific partition using the partition ID. I found queries to truncate data from a specific partition for SQL Server 2016, but did not find any query for earlier versions.
I tried the query below to delete only data from the partitions with partition id 14 and 15.
DELETE FROM pa...
1
votes
0
answer
92
Views
Is it possible to make a list partition in PostgreSQL based on a join of the partition list key?
Consider the following tables
Table 'public.Foo'
Column | Type |
------------------+-----------------------------+
foo_id | integer | PK
bar_id | integer | FK to bars
....
Table 'public.Bar'
Column |...
1
votes
2
answer
358
Views
Why does a SELECT take a long time on a partitioned table in PostgreSQL?
I have a daily partitioned table in PostgreSQL. It uses cdr_date for partitioning. When I run a simple query, it takes a long time, and I don't know why.
This is the simple SQL:
EXPLAIN (ANALYZE , BUFFERS )
select * FROM cdr
WHERE cdr_date >= '2018-05-24 11:59:00.937000 +00:00'
AND cdr_date = ''2018-05-24...
1
votes
0
answer
44
Views
HIVE - increment value on column change
I'm just basically trying to add a column with a unique identifier for a journey. I have a table that looks similar to this:
Time id station newtrip
2017-11-15 16:45 100 St.George TRUE
2017-11-15 16:46 100 Bloor FALSE
2017-11-15 16:47 110 Wellesley TRUE
2017-1...
1
votes
2
answer
249
Views
Gathering statistics on partitioned tables
Ultimately I need to know if this will be enough. In Oracle, there is a setting on a table to gather statistics incrementally, rather than for the full table. Basically, it will only gather stats on partitions where the data has changed. We need to make sure all partitioned tables have INCREMENTAL set t...
1
votes
2
answer
129
Views
Hive Partitioning validation
I have created a partitioned Hive table and inserted data into it. Now suppose I execute a select * query with a where clause: how can I make sure that the Hive query is using partitioning?
1
votes
1
answer
135
Views
Hive partitioning using current date
I have some sample data like this:
1,prasad,Newyork
2,Tarak,Mexico
I want to load this data into a Hive table partitioned by the current date, and when I load the data again tomorrow it should be partitioned by tomorrow's date.
Is it possible to achieve this in Hive?
1
votes
1
answer
265
Views
BigQuery - Converting from non-partitioned to partitioned table - Legacy SQL
I've looked at previous questions, but the links given to GCP were outdated, so I would like to learn what is the best way to do the conversion while inserting the correct partition (meaning not the day I inserted the records, but according to the 'date' column).
Could someone point me in the right d...
1
votes
0
answer
50
Views
Partitioning MariaDB for large table error
We have a problem: a table is growing fast and now has 270M records.
We don't need data older than a week, so we want to delete it on a twee InnoDB Galera cluster running the latest version of MariaDB Server.
We have figured out partitioning is the best solution (maybe there are other solutions please let me k...
1
votes
2
answer
70
Views
Kafka Consumer distribution not working as expected
I have three topics, each having three partitions, on a Kafka cluster.
So there are 9 partitions in total. When I create 9 consumers, 6 of them are idle and only three consumers are being used.
the expectation is: each consumer should pickup one partitions and hence, 9 consumer should pick up d...
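The symptom described above (nine partitions across three topics, nine consumers, six idle) is what a per-topic range assignment would produce, because it divides each topic's partitions among the sorted consumers independently. The sketch below is a plain-Python simulation, not Kafka client code. It assumes all nine consumers are in one consumer group, subscribed to all three topics, with a range-style assignment strategy in effect. The function names, topic names and consumer ids are hypothetical.

```python
def range_assign(topics, consumers):
    """Simulate range-style assignment: each topic is split on its own."""
    assignment = {c: [] for c in consumers}
    ordered = sorted(consumers)
    for topic, n_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(n_partitions, len(ordered))
        start = 0
        for i, consumer in enumerate(ordered):
            # The first `extra` consumers receive one additional partition.
            count = per_consumer + (1 if i < extra else 0)
            assignment[consumer].extend(
                (topic, p) for p in range(start, start + count)
            )
            start += count
    return assignment


def round_robin_assign(topics, consumers):
    """Simulate round-robin assignment over all topic partitions at once."""
    assignment = {c: [] for c in consumers}
    ordered = sorted(consumers)
    all_partitions = [
        (topic, p) for topic, n in sorted(topics.items()) for p in range(n)
    ]
    for i, topic_partition in enumerate(all_partitions):
        assignment[ordered[i % len(ordered)]].append(topic_partition)
    return assignment


topics = {"topicA": 3, "topicB": 3, "topicC": 3}
consumers = [f"consumer-{i}" for i in range(9)]

idle_with_range = sum(
    1 for parts in range_assign(topics, consumers).values() if not parts
)
idle_with_round_robin = sum(
    1 for parts in round_robin_assign(topics, consumers).values() if not parts
)
```

Under these assumptions the range simulation leaves six consumers idle while the round-robin one leaves none. In a real consumer the strategy is chosen through the `partition.assignment.strategy` setting; whether that is the cause here depends on configuration not shown in the question.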
1
votes
0
answer
60
Views
Create partition function based on join with other table
I have 2 tables:
TableA
(
Id uniqueidentifier,
Date datetime
)
TableB
(
Body XML,
KeyA uniqueidentifier
)
And I want to create partitions for both tables based on the date field in TableA.
I checked the documentation but can't find any clue about it. Is it possible?
1
votes
1
answer
43
Views
I can't get the complete list of tables using jOOQ 3.9.1 from a Postgres database
I am creating the table as follows:
CREATE TABLE plist1 (c1 NUMERIC, c2 VARCHAR(10)) PARTITION BY
LIST (c1)
I tried to read the complete list of tables from the Postgres database, including the master tables that we use for partitioning.
Interestingly when I have used the stand alone progr...
1
votes
0
answer
258
Views
NoSQL database design using single table
How satisfied are you with the statement 'You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table.' ?
I have listed my use cases for a NoSQL design, but restricting myself to a single-table design makes the design complex a...
1
votes
1
answer
29
Views
Oracle table is locked for inserts during a procedure execution but allows updates
We have a procedure which operates on multiple records in multiple tables. In some cases the procedure needs to insert into a table named TAX; when this insert happens, another table named PAYMENT is locked for insertion, but not for update.
So if another transaction tries to insert in PAYMENT it...
1
votes
0
answer
13
Views
Create a linear-log-log-linear partition of interval [a,b] in MATLAB, but matching steps
I want to create a vector x from a to b in the following way.
I have an interest point c , a
1
votes
1
answer
39
Views
Diskpart directing resulting output to second text file
Have the following, which works great:
echo BASEBOARD>>%computername%.txt
wmic /APPEND:'%computername%.txt' baseboard get Manufacturer, Model, Name, PartNumber, slotlayout, serialnumber, poweredon
echo BIOS>>%computername%.txt
wmic /APPEND:'%computername%.txt' bios get name, version, serialnumber, I...
1
votes
0
answer
69
Views
np.argpartition when sorting 2d arrays and nan
When using np.argpartition on 2D arrays, can the influence of NaN be excluded?
For example:
import numpy as np
a = np.array([1., -1., np.nan, 0., np.nan, 2., 4., -2., -10., np.nan])
# Index of the middle element among the non-NaN values (NaN sorts last)
k = (~np.isnan(a)).sum() // 2
np.argpartition(a, k)[k]
But what if it's a 2d array query?
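For the 2D case, the difficulty is that the number of non-NaN values, and therefore the middle position, can differ from row to row, so a single `kth` argument does not fit. One possible sketch, assuming a row-wise answer is wanted, swaps `np.argpartition` for a full `np.argsort` along each row (NaN sorts last) and then picks a per-row position with `np.take_along_axis`. The function name `nan_middle_index` is hypothetical.

```python
import numpy as np


def nan_middle_index(a):
    """Per row, return the column index of the middle non-NaN element.

    Relies on NumPy's sort order, which places NaN after all other values.
    A row containing only NaN yields the index of one of its NaN entries.
    """
    a = np.asarray(a, dtype=float)
    # Middle position among the valid (non-NaN) values of each row.
    k = (~np.isnan(a)).sum(axis=1) // 2
    # Full sort per row; a per-row kth is not possible with argpartition.
    order = np.argsort(a, axis=1)
    return np.take_along_axis(order, k[:, None], axis=1)[:, 0]


b = np.array([[1.0, np.nan, 3.0, 2.0],
              [np.nan, np.nan, 7.0, 5.0]])
idx = nan_middle_index(b)  # → array([3, 2])
```

This trades the linear-time partition for an O(n log n) sort per row, which is usually acceptable unless the rows are very long.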
1
votes
0
answer
26
Views
Can I partition by id that already exists on worker?
I am coding a streaming fault detection app for several counters, with input: RDD(Int, BreezeDenseMatrix[Double]).
For every RDD, I want to do some computations and write the RDD to a text file in HDFS to compare it with the next new RDD.
What I want to do is, when the new RDD has arrived, read the t...
1
votes
1
answer
289
Views
Uniformly partition an RDD in Spark
I have a text file in HDFS, which has about 10 million records. I am trying to read the file and do some transformations on that data. I am trying to uniformly partition the data before I do the processing on it. Here is the sample code:
var myRDD = sc.textFile('input file location')
myRDD = myRDD.repar...
1
votes
1
answer
53
Views
Partitioning not working properly in Google BigQuery when SQL query contains a sub-query
I have the following table structure in BigQuery:
**query_all_partition**
property_unique_date DATE REQUIRED
page_url STRING REQUIRED
click INTEGER REQUIRED
impression INTEGER REQUIRED
position FLOAT REQUIRED
Here, I have specified partitioning over property...
1
votes
1
answer
19
Views
Do I need to include partition name in the query to get the actual benefits of partitioning?
I have restructured one of my data tables (DeviceLogs) with range partition by month using date (LogDate) field. Following is a minimal version of my table.
UUID | LogDate | DeviceId | Counter
------|----------------------|-----------|---------
xxxx | 2018-08-21 15:00:00 | 23...