Questions tagged [warehouse]

1

votes
1

answer
51

Views

Can a Data Warehouse include a Data Lake?

I want to understand data warehouses and data lakes in more detail. It seems to me that there is differing information on the topic. Inmon defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. Now I...
A.Dumas
1

votes
3

answer
266

Views

Move SQL Server Database data to SAP BW

I have read a few articles about moving data out of SAP BW and into SQL Server. I can't find any articles on moving data from SQL Server to SAP BW. Is it even possible, and if so, what would be the best way to handle this?
Etienne
1

votes
1

answer
71

Views

Azure SQL Data Warehouse - Max concurrent queries

I have to decide whether to use Azure SQL Data Warehouse or a SQL data warehouse based on Microsoft SQL Server running on a VM. What I do not understand is the limit of 32 max concurrent queries. The same limit for Azure SQL Database is 6400. To be honest, when I want to use the Azure D...
STORM B.
1

votes
0

answer
215

Views

Using JDBC to read SQL file in Spark Scala, getting Warehouse error

I am trying to read a MySQL file using Spark Scala. The following is the code I tried: val dataframe_mysql = sqlContext.read.format('jdbc') .option('url','jdbc:mysql://xx.xx.xx.xx:xx') .option('driver', 'com.mysql.jdbc.Driver') .option('dbtable', 'schema.xxxx') .option('user', 'xxxx').option('password', '...
Fouad Haddud
1

votes
1

answer
175

Views

Finding the most used Hive tables

We use Hive extensively in our data warehouse solution. Many scheduled jobs and ad hoc queries access these tables. How can I find which Hive tables are most popular in my company, so that I can take some action to improve them?
rajnish
1

votes
0

answer
102

Views

How to “replace” a SQLite table without overwriting indexes and keys?

Currently I push my dataframes into several SQLite tables using pandas' basic functionality df.to_sql('df', if_exists='replace'). My idea is to 'optimize' the table afterwards in 'DB Browser for SQLite' by setting primary keys, foreign keys, indexes and so on. Problem: Everything gets overwritten...
Christian
1

votes
0

answer
129

Views

Can Entity Framework handle Type 2 slowly changing dimensions?

First off...apologies if this is a really dumb question! I have developed a CRM type application using C# web forms that updates records in our company database. Some areas of the DB use type 2 slowly changing dimensions (SCD) and my application uses stored procedures to read and update these SCDs....
Simon Tindall
1

votes
0

answer
132

Views

How can a columnstore index retrieve data if it is not ordered?

I'm learning how to use columnstore in SQL Server and I can't figure out how it works. This is a normal rowstore sample: If we use columnstore, it should look something like this, according to this article: Now how is it possible to retrieve Ross Taylor's info if the order is not preserved in the col...
bakaou baka
1

votes
1

answer
56

Views

How to achieve zero downtime in ETL

I have an ETL process which takes data from a transactional DB and, after processing, stores the data in another DB. While storing the data, we truncate the old data and insert the new data for better performance, as an update takes a lot more time than truncate-and-insert. So in this process we experi...
Subhrajit
1

votes
2

answer
32

Views

jOOQ with SQL DataWarehouse?

Does jOOQ support a dialect for 'SQL DataWarehouse'? Any pointers?
Sid
1

votes
1

answer
72

Views

Bulk data processing using Dapper and a data warehouse

I am using Dapper in the front end to process data and insert it into a data warehouse. I have a scenario where I need to send bulk data from Dapper to the data warehouse and perform a few operations on it. I can do that using a data table: I can create a data table, fill it with data and then pass tha...
Jai
1

votes
0

answer
474

Views

Azure Data Factory Pipeline hangs/timeouts

I'm building a data warehouse and extracting data from two source systems (A & B). The main pipeline executes them in parallel, and the two systems are independent: The extraction is done via a set of queries that are stored in a table in the DB and read by each of the pipelines: When the tw...
MDreamer
1

votes
2

answer
231

Views

Star Schema: Triple Relations between Fact and Dimension

I'm modelling a Star Schema from an ERD. The database belongs to a rental firm. My fact table now contains a single Rental Booking. Customers are able to book cars, collect cars and return cars. The date on which each of these happens should be recorded. So, my Star Schema now has a Time Dimension with Day/Month/Year...
coolMan
1

votes
1

answer
30

Views

Do SSIS lookup transformations cache only relevant columns?

I have a full cache lookup that seems to be causing a failure. It was pointing directly to the lookup table via the drop-down list in the Lookup configuration, which, as I understand it, is equivalent to a SELECT *. I thought that by changing the lookup configuration to use results from a SQL statement,...
Shane
1

votes
1

answer
83

Views

Loading only the latest files' data into Azure SQL Data Warehouse

Step #1: We are supposed to copy the CSV files from an on-premises file server to Azure Blob Storage (say, the 'Staging' container in Blob Storage). Step #2: Using PolyBase, we will load the data from these files into Azure SQL Data Warehouse. We are maintaining the same file name (sync with the Staging DB Tables), e...
Koushik
1

votes
0

answer
180

Views

Data warehouse - multiple choice survey

I want to use a data warehouse to store questions and answers from a survey of multiple choice questions, so my proposal is to design a star schema. For this I have done the following: I built a fact table with the following fields: userID, surveyID, questionID, answerID and date. On the other hand I bui...
dPrieto
1

votes
1

answer
33

Views

What's the alternative to Redshift's EXTRACT function

What is the alternative to EXTRACT in Azure Data Warehouse? We are using DATEPART right now, but it does not work with FROM, so what would be a straightforward alternative to EXTRACT?
Dead Man
1

votes
0

answer
191

Views

Pandas to_gbq method: DataFrame schema doesn't match the table

We're working on an app and are collecting stats from it. Initially we were dumping the stats into Google Sheets, so we downloaded them as CSV and uploaded the data to BigQuery. We successfully uploaded the data with the schema we wanted. When we are re-running the app and added a step to upload it to the test ta...
Kel
1

votes
2

answer
354

Views

Snowflake Python connector not working on larger data set in AWS Lambda

I'm using Snowflake's Python connector to try to retrieve a set of data from our data warehouse for processing. This job executes within an AWS Lambda function and has trouble when the number of rows being returned is ~20 or so. When I set a limit 10 or limit 20 I'm able to get the data set back. If I...
Austin
1

votes
1

answer
51

Views

FactLoanVolume - One or Many Fact Tables

I am designing a Fact table to report on loan volume. The grain is one row per loan transaction. A loan has a few major milestones that we report on: In order of sequence, these are Lock Volume, Loan Funding Volume and Loan Sales Volume. I have Lock Date, Loan Funding Date and Loan Sale Date as F...
EyeBikeRide
1

votes
0

answer
13

Views

How are accumulating snapshots implemented in Hive

Accumulating snapshots have rows that need to be updated for every process change. How is this implemented in Hive, when we cannot update the record? So far I can only think of implementing this as an SCD2 table.
Ravi R
1

votes
0

answer
40

Views

Is it possible to integrate Google Analytics 360 with a data warehouse without BigQuery

One of our clients wonders if they can get raw data from Google Analytics 360 into their own data warehouse directly. From reading many materials I believe it's impossible, but I couldn't find an official document or notice which states that it's impossible. Does anyone know what I am looki...
Hong SeongWoong
1

votes
0

answer
40

Views

Does AdomdDataReader work with queries with no columns?

I'm currently trying to switch from using CellSet to using AdomdDataReader for a project, because I noticed that performance is significantly better with the latter. So far it works great, with one exception: Queries that don't specify any columns will make the DataReader return false on the very f...
user6314158
1

votes
2

answer
78

Views

Azure DataWarehouse load CSV with external Table

I can't find a complete example of how to load a CSV file directly into a SQL Data Warehouse with an external table. The file is on a Storage account https://tstodummy.blob.core.windows.net/ Blob container referencedata-in, folder csv-uploads, file something.csv. This is my code CREAT...
Harry Leboeuf
1

votes
1

answer
67

Views

How to add a partition boundary only when not exists in SQL Data Warehouse?

I am using Azure SQL Data Warehouse Gen 1, and I created a partitioned table like this: CREATE TABLE [dbo].[StatsPerBin1]( [Bin1] [varchar](100) NOT NULL, [TimeWindow] [datetime] NOT NULL, [Count] [int] NOT NULL, [Timestamp] [datetime] NOT NULL) WITH ( DISTRIBUTION = HASH ( [Bin1] ), CLUSTERED INDEX([Bi...
Lucas Yang
1

votes
2

answer
39

Views

Dimensional Modeling: how to create a table without Surrogate Primary Keys?

From what I have understood, we don't have a primary key in the fact table, and adding a surrogate key is somewhat a waste of space. Hence, the foreign key combination is the primary key for the fact table. But in my case, I was not able to do that because the unique keys can potentially repeat in the fact...
Nicolas Tang
1

votes
0

answer
60

Views

"location option must be specified when creating a temporary table" in SSIS Parallel Data Warehouse

I am loading data into a temp table in my data flow task, and I keep getting the following error when I try to hit 'Preview': location option must be specified when creating a temporary table
Kamran
1

votes
1

answer
103

Views

About surrogate keys in the data warehouse loading process

When you load from the stage table into the fact and dimension tables, does that mean you also load the surrogate key from stage into the dimension table for new rows? Or do you create a new surrogate key in the dimension table by using the SQL IDENTITY property for the table? (https...
What'sUP
1

votes
1

answer
73

Views

Case when statement SQL

I am facing some difficulties with a data warehouse transformation task. I have some source columns which arrive in varchar format and contain blanks, -, and decimal numbers such as (1234.44). Those columns are declared as number in the target. I am trying to treat that data with this code but I kee...
Lorik Berisha
1

votes
0

answer
90

Views

SSIS alternatives for ETL in Azure data factory

Please could you all assist us in answering what we believe to be a rather simple question, but one that is proving really difficult to solve. We have explored things like Databricks and Snowflake for our Azure-based data warehouse, but keep getting stuck at the same point. Do you have any...
Dominic Albrecht
1

votes
1

answer
36

Views

Using a subset of a fact table for another fact table?

I'm pretty new to data warehousing so I'm not sure whether the question makes any sense. I have a Sales fact table that shows the purchases by customers. This table is connected to dimension tables like Customers and Product. I plan to have a PromotionStatus fact table that tracks the response of c...
Jon Coal
1

votes
1

answer
28

Views

Historical data sets in an initial build

The issue I am currently facing is, I think, a logical one and maybe a limitation of SSIS. My data has a set of accounts; at any point an account can be owned by an organisation. This combination controls my historical dimension of 'Account'. E.g. ╔════════════╦═══...
Caz1224
1

votes
0

answer
32

Views

SQL Server Analysis Services passes null and blank values in dimension string attribute

SSAS is expected to show duplicate key error messages when an nvarchar column contains both null and blank values. We have a dimension whose distinct values include some null and blank values in an nvarchar attribute. When it is processed in full, SSAS doesn't show any error message for a duplicate key. Null...
programmer21
1

votes
2

answer
44

Views

Modeling a multilanguage data warehouse

I need your help. I work for a survey company and I am responsible for creating its architecture and modeling a data warehouse that analyzes the results of an international survey (50 countries). For the architecture, we decided to create a tabular model in Power BI to analyze our data and to create...
Lidou123
1

votes
2

answer
85

Views

Dimensional modeling on columnar databases

I have started learning cloud architecture and found out that they all use columnar databases, which claim to be more efficient because they store data by column rather than by row to reduce duplication. From a data mart perspective (let's say for an organization a department only wants to monitor intern...
Zerotoinfinity
1

votes
1

answer
35

Views

How to delete rows from a heap in batches of 10000

Not supported: DELETE TOP(10000) FROM dataArchival.MyTable WHERE DateLocal BETWEEN '2018-03-01' AND '2018-10-01' delete dataArchival.MyTable from dataArchival.MyTable d,#myTemp d2 where d.DateLocal=d2.DateLocal delete d from dataArchival.MyTable d ( SELECT *, RN = ROW_NUMBER() OVER(ORDER BY (SELECT...
Alivia
1

votes
1

answer
32

Views

How to fix the ROLLUP grouping function error in SQL DW?

I'm getting an error that ROLLUP is not a function name, but the documentation says it should work: Msg 104162, Level 16, State 1, Line 2 'ROLLUP' is not a recognized built-in function name. I've tried GROUP BY GROUPING SETS but it told me the syntax was wrong; that's when I saw that grouping sets doe...
1

votes
1

answer
30

Views

What should my Hive partitioning strategy and view strategy be so that queries run efficiently and return results within 10 seconds?

My use case is that I have two data sources: 1. Source1 (as speed layer) 2. Hive external table on top of S3 (as batch layer). I am using Presto to query data from both data sources by using a view. I want to create a view that will union data from both sources, like: 'create view test as select *...
unknown_k
1

votes
1

answer
19

Views

Design Fact Table

I'm trying to design a model in Power BI. I've developed a model with a fact table like this. 1- SurveyFact as Respondant| Date | Question | IdResponse | Count Frank | 201801 | Where do you live ? |Germany | 1 Stephane | 201801...
Lidou123
1

votes
0

answer
30

Views

Update changes in Azure SQL Data Warehouse using PolyBase

I need help with Azure SQL Data Warehouse. I'm using PolyBase to ELT data from Azure Data Lake Storage Gen2 into Azure SQL DW. When we load data into the DW for the first time, there are no issues. But when we load data again as an incremental load, how do we upsert the data? The flow we are using: ADLS2 -> (PolyBase) -> External tab...
Soni007
