HBASE-25601 Use ASF-official mailing list archives

Signed-off-by: Peter Somogyi <psomogyi@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>

Closes #2983
Josh Elser 2021-02-24 11:15:30 -05:00
parent ed2693f123
commit a7d0445a21
9 changed files with 15 additions and 31 deletions

View File

@@ -112,7 +112,6 @@
 <archive>https://lists.apache.org/list.html?user@hbase.apache.org</archive>
 <otherArchives>
 <otherArchive>https://dir.gmane.org/gmane.comp.java.hadoop.hbase.user</otherArchive>
-<otherArchive>https://search-hadoop.com/?q=&amp;fc_project=HBase</otherArchive>
 </otherArchives>
 </mailingList>
 <mailingList>
@@ -123,7 +122,6 @@
 <archive>https://lists.apache.org/list.html?dev@hbase.apache.org</archive>
 <otherArchives>
 <otherArchive>https://dir.gmane.org/gmane.comp.java.hadoop.hbase.devel</otherArchive>
-<otherArchive>https://search-hadoop.com/?q=&amp;fc_project=HBase</otherArchive>
 </otherArchives>
 </mailingList>
 <mailingList>

View File

@@ -37,13 +37,13 @@ Just request the name of your branch be added to JIRA up on the developer's mail
 Thereafter you can file issues against your feature branch in Apache HBase JIRA.
 Your code you keep elsewhere -- it should be public so it can be observed -- and you can update dev mailing list on progress.
 When the feature is ready for commit, 3 +1s from committers will get your feature merged.
-See link:http://search-hadoop.com/m/asM982C5FkS1[HBase, mail # dev - Thoughts
+See link:https://lists.apache.org/thread.html/200513c7e7e4df23c8b9134eeee009d61205c79314e77f222d396006%401346870308%40%3Cdev.hbase.apache.org%3E[HBase, mail # dev - Thoughts
 about large feature dev branches]
 [[hbase.fix.version.in.jira]]
 .How to set fix version in JIRA on issue resolve
-Here is how link:http://search-hadoop.com/m/azemIi5RCJ1[we agreed] to set versions in JIRA when we
+Here is how we agreed to set versions in JIRA when we
 resolve an issue. If master is going to be 3.0.0, branch-2 will be 2.4.0, and branch-1 will be
 1.7.0 then:
@@ -59,7 +59,7 @@ resolve an issue. If master is going to be 3.0.0, branch-2 will be 2.4.0, and br
 [[hbase.when.to.close.jira]]
 .Policy on when to set a RESOLVED JIRA as CLOSED
-We link:http://search-hadoop.com/m/4cIKs1iwXMS1[agreed] that for issues that list multiple releases in their _Fix Version/s_ field, CLOSE the issue on the release of any of the versions listed; subsequent change to the issue must happen in a new JIRA.
+We agreed that for issues that list multiple releases in their _Fix Version/s_ field, CLOSE the issue on the release of any of the versions listed; subsequent change to the issue must happen in a new JIRA.
 [[no.permanent.state.in.zk]]
 .Only transient state in ZooKeeper!
@@ -101,7 +101,7 @@ NOTE: End-of-life releases are not included in this list.
 [[hbase.commit.msg.format]]
 == Commit Message format
-We link:http://search-hadoop.com/m/Gwxwl10cFHa1[agreed] to the following Git commit message format:
+We agreed to the following Git commit message format:
 [source]
 ----
 HBASE-xxxxx <title>. (<contributor>)

View File

@@ -31,8 +31,6 @@
 NOTE: Codecs mentioned in this section are for encoding and decoding data blocks or row keys.
 For information about replication codecs, see <<cluster.replication.preserving.tags,cluster.replication.preserving.tags>>.
-Some of the information in this section is pulled from a link:http://search-hadoop.com/m/lL12B1PFVhp1/v=threaded[discussion] on the HBase Development mailing list.
 HBase supports several different compression algorithms which can be enabled on a ColumnFamily.
 Data block encoding attempts to limit duplication of information in keys, taking advantage of some of the fundamental designs and patterns of HBase, such as sorted row keys and the schema of a given table.
 Compressors reduce the size of large, opaque byte arrays in cells, and can significantly reduce the storage space needed to store uncompressed data.
@@ -122,7 +120,7 @@ Prefix Tree::
 The compression or codec type to use depends on the characteristics of your data. Choosing the wrong type could cause your data to take more space rather than less, and can have performance implications.
-In general, you need to weigh your options between smaller size and faster compression/decompression. Following are some general guidelines, expanded from a discussion at link:http://search-hadoop.com/m/lL12B1PFVhp1[Documenting Guidance on compression and codecs].
+In general, you need to weigh your options between smaller size and faster compression/decompression. Following are some general guidelines, expanded from a discussion at link:https://lists.apache.org/thread.html/481e67a61163efaaf4345510447a9244871a8d428244868345a155ff%401378926618%40%3Cdev.hbase.apache.org%3E[Documenting Guidance on compression and codecs].
 * If you have long keys (compared to the values) or many columns, use a prefix encoder.
 FAST_DIFF is recommended.
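
To make the guidance above concrete, here is a minimal sketch of enabling a compressor and the FAST_DIFF data block encoding on a column family with the HBase 2.x Java client; the table name, family name, and choice of Snappy are assumptions for illustration, and Snappy must actually be available on the RegionServers.

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableCompressionExample {
  public static void main(String[] args) throws Exception {
    // "mytable" and "cf" are placeholder names for this sketch.
    TableName table = TableName.valueOf("mytable");
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
          .setCompressionType(Compression.Algorithm.SNAPPY)   // compressor for opaque cell values
          .setDataBlockEncoding(DataBlockEncoding.FAST_DIFF)  // key encoding for long/repetitive keys
          .build();
      // Applies to new StoreFiles; existing data is rewritten as compactions run.
      admin.modifyColumnFamily(table, cf);
    }
  }
}
----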

View File

@@ -1079,7 +1079,7 @@ the top of the webpage).
 If a big 40ms or so occasional delay is seen in operations against HBase, try the Nagles' setting.
 For example, see the user mailing list thread,
-link:http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&subj=Re+Inconsistent+scan+performance+with+caching+set+to+1[Inconsistent scan performance with caching set to 1]
+link:https://lists.apache.org/thread.html/3d7ceb41c04a955b1b1c80480cdba95208ca3e97bf6895a40e0c1bbb%401346186127%40%3Cuser.hbase.apache.org%3E[Inconsistent scan performance with caching set to 1]
 and the issue cited therein where setting `notcpdelay` improved scan speeds. You might also see the
 graphs on the tail of
 link:https://issues.apache.org/jira/browse/HBASE-7008[HBASE-7008 Set scanner caching to a better default]
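
As a hedged illustration of "the Nagle's setting" mentioned above, the sketch below toggles the HBase RPC TCP_NODELAY properties programmatically; it assumes the `hbase.ipc.client.tcpnodelay` and `hbase.ipc.server.tcpnodelay` property names, whose defaults vary by HBase release, and in practice these values normally go into hbase-site.xml rather than code.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TcpNoDelayExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Disable Nagle's algorithm on HBase RPC sockets. The client-side key is shown first;
    // the server-side key would be set in the RegionServers' hbase-site.xml.
    conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
    conf.setBoolean("hbase.ipc.server.tcpnodelay", true);
    System.out.println("client tcpnodelay: "
        + conf.getBoolean("hbase.ipc.client.tcpnodelay", false));
  }
}
----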

View File

@@ -35,7 +35,7 @@ Being familiar with these guidelines will help the HBase committers to use your
 Apache HBase gets better only when people contribute! If you are looking to contribute to Apache HBase, look for link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20labels%20in%20(beginner)%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)[issues in JIRA tagged with the label 'beginner'].
 These are issues HBase contributors have deemed worthy but not of immediate priority and a good way to ramp on HBase internals.
-See link:http://search-hadoop.com/m/DHED43re96[What label
+See link:https://lists.apache.org/thread.html/b122265f4e4054cf08f8cd38609fb06af72f398c44f9086b05ef4e21%401407246237%40%3Cdev.hbase.apache.org%3E[What label
 is used for issues that are good on ramps for new contributors?] from the dev mailing list for background.
 Before you get started submitting code to HBase, please refer to <<developing,developing>>.
@@ -1022,7 +1022,7 @@ requirements of the ASF policy on releases.
 ____
 Regards the latter, run `mvn apache-rat:check` to verify all files are suitably licensed.
-See link:http://search-hadoop.com/m/DHED4dhFaU[HBase, mail # dev - On recent discussion clarifying ASF release policy]
+See link:https://mail-archives.apache.org/mod_mbox/hbase-dev/201406.mbox/%3CCA%2BRK%3D_B8EP0JMFV%2Bdt-k1g%3DBmedzyq2z1GSqrnMMiH6%3DcdoiAA%40mail.gmail.com%3E[HBase, mail # dev - On recent discussion clarifying ASF release policy]
 for how we arrived at this process.
 To help with the release verification, please follow the guideline below and vote based on the your verification.
@@ -2607,7 +2607,7 @@ A committer should.
 [[git.patch.flow]]
 ====== Patching Etiquette
-In the thread link:http://search-hadoop.com/m/DHED4EiwOz[HBase, mail # dev - ANNOUNCEMENT: Git Migration In Progress (WAS =>
+In the thread link:https://lists.apache.org/thread.html/186fcd5eb71973a7b282ecdba41606d3d221efd505d533bb729e1fad%401400648690%40%3Cdev.hbase.apache.org%3E[HBase, mail # dev - ANNOUNCEMENT: Git Migration In Progress (WAS =>
 Re: Git Migration)], it was agreed on the following patch flow
 . Develop and commit the patch against master first.

View File

@@ -3376,7 +3376,6 @@ Physical data size on disk is distinct from logical size of your data and is aff
 See <<regions.arch,regions.arch>>.
 * Decreased by <<compression,compression>> and data block encoding, depending on data.
-See also link:http://search-hadoop.com/m/lL12B1PFVhp1[this thread].
 You might want to test what compression and encoding (if any) make sense for your data.
 * Increased by size of region server <<wal,wal>> (usually fixed and negligible - less than half of RS memory size, per RS).
 * Increased by HDFS replication - usually x3.
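
As a rough worked example of how these factors combine (illustrative numbers only): 1 TB of logical data that compresses 2:1 occupies about 0.5 TB per copy, and with the usual 3x HDFS replication that comes to roughly 1.5 TB of physical disk, before counting WAL and StoreFile index overhead.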

View File

@@ -593,7 +593,6 @@ See <<precreate.regions>>, as well as <<perf.configurations>>
 == Reading from HBase
 The mailing list can help if you are having performance issues.
-For example, here is a good general thread on what to look at addressing read-time issues: link:http://search-hadoop.com/m/qOo2yyHtCC1[HBase Random Read latency > 100ms]
 [[perf.hbase.client.caching]]
 === Scan Caching
@@ -846,7 +845,7 @@ See the link:https://issues.apache.org/jira/browse/HDFS-1599[Umbrella Jira Ticke
 Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via link:https://issues.apache.org/jira/browse/HDFS-2246[HDFS-2246], it is possible for the DFSClient to take a "short circuit" and read directly from the disk instead of going through the DataNode when the data is local.
 What this means for HBase is that the RegionServers can read directly off their machine's disks instead of having to open a socket to talk to the DataNode, the former being generally much faster.
 See JD's link:http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf[Performance Talk].
-Also see link:http://search-hadoop.com/m/zV6dKrLCVh1[HBase, mail # dev - read short circuit] thread for more discussion around short circuit reads.
+Also see link:https://lists.apache.org/thread.html/ce2ce3a3bbd20806d0c017b2e7528e78a46ccb87c063831db051949d%401347548325%40%3Cdev.hbase.apache.org%3E[HBase, mail # dev - read short circuit] thread for more discussion around short circuit reads.
 To enable "short circuit" reads, it will depend on your version of Hadoop.
 The original shortcircuit read patch was much improved upon in Hadoop 2 in link:https://issues.apache.org/jira/browse/HDFS-347[HDFS-347].
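
For the Hadoop 2 (HDFS-347) case, a minimal sketch of the two properties usually involved is shown below; the domain socket path is only an assumed, commonly seen value, and in a real deployment these settings live in hbase-site.xml on the RegionServers and hdfs-site.xml on the DataNodes rather than being set in code.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShortCircuitReadExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Turn on HDFS short-circuit local reads for the DFSClient used by the RegionServer.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // The DataNodes must expose a matching UNIX domain socket; this path is an example only.
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
    System.out.println("short-circuit reads enabled: "
        + conf.getBoolean("dfs.client.read.shortcircuit", false));
  }
}
----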

View File

@@ -250,7 +250,7 @@ If your rows and column names are large, especially compared to the size of the
 One such is the case described by Marc Limotte at the tail of link:https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272[HBASE-3551] (recommended!). Therein, the indices that are kept on HBase storefiles (<<hfile>>) to facilitate random access may end up occupying large chunks of the HBase allotted RAM because the cell value coordinates are large.
 Mark in the above cited comment suggests upping the block size so entries in the store file index happen at a larger interval or modify the table schema so it makes for smaller rows and column names.
 Compression will also make for larger indices.
-See the thread link:http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize[a question storefileIndexSize] up on the user mailing list.
+See the thread link:https://lists.apache.org/thread.html/b158eae5d8888d3530be378298bca90c17f80982fdcdfa01d0844c3d%401306240189%40%3Cuser.hbase.apache.org%3E[a question storefileIndexSize] up on the user mailing list.
 Most of the time small inefficiencies don't matter all that much. Unfortunately, this is a case where they do.
 Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated several billion times in your data.
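
A minimal sketch of "upping the block size" with the HBase 2.x Java client follows; the family name and the 128 KB figure are assumptions for illustration, not recommendations.

[source,java]
----
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class BlockSizeExample {
  public static void main(String[] args) {
    // Raising the block size above the 64 KB default means fewer StoreFile index entries,
    // at the cost of coarser-grained random reads.
    ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
        .setBlocksize(128 * 1024)
        .build();
    System.out.println("block size: " + cf.getBlocksize());
  }
}
----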
@@ -639,7 +639,7 @@ However, HBase scales better at larger data volumes, so this is a feature trade-
 Pay attention to <<performance>> when implementing any of these approaches.
-Additionally, see the David Butler response in this dist-list thread link:http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&subj=Stargate+hbase[HBase, mail # user - Stargate+hbase]
+Additionally, see the David Butler response in this dist-list thread link:https://lists.apache.org/thread.html/b0ca33407f010d5b1be67a20d1708e8d8bb1e147770f2cb7182a2e37%401300972712%40%3Cuser.hbase.apache.org%3E[HBase, mail # user - Stargate+hbase]
 [[secondary.indexes.filter]]
 === Filter Query

View File

@@ -32,7 +32,7 @@
 Always start with the master log (TODO: Which lines?). Normally it's just printing the same lines over and over again.
 If not, then there's an issue.
-Google or link:http://search-hadoop.com[search-hadoop.com] should return some hits for those exceptions you're seeing.
+Google should return some hits for those exceptions you're seeing.
 An error rarely comes alone in Apache HBase, usually when something gets screwed up what will follow may be hundreds of exceptions and stack traces coming from all over the place.
 The best way to approach this type of problem is to walk the log up to where it all began, for example one trick with RegionServers is that they will print some metrics when aborting so grepping for _Dump_ should get you around the start of the problem.
@@ -221,12 +221,6 @@ For more information on GC pauses, see the link:https://blog.cloudera.com/blog/2
 [[trouble.resources]]
 == Resources
-[[trouble.resources.searchhadoop]]
-=== search-hadoop.com
-link:http://search-hadoop.com[search-hadoop.com] indexes all the mailing lists and is great for historical searches.
-Search here first when you have an issue as its more than likely someone has already had your problem.
 [[trouble.resources.lists]]
 === Mailing Lists
@@ -235,7 +229,6 @@ The 'dev' mailing list is aimed at the community of developers actually building
 Before going to the mailing list, make sure your question has not already been answered by searching the mailing list
 archives first. For those who prefer to communicate in Chinese, they can use the 'user-zh' mailing list instead of the
 'user' list.
-Use <<trouble.resources.searchhadoop>>.
 Take some time crafting your question.
 See link:http://www.mikeash.com/getting_answers.html[Getting Answers] for ideas on crafting good questions.
 A quality question that includes all context and exhibits evidence the author has tried to find answers in the manual and out on lists is more likely to get a prompt response.
@@ -639,8 +632,6 @@ The upside is that more data is packed into the same region, but performance is
 And smaller StoreFiles become targets for compaction.
 Without compression the files are much bigger and don't need as much compaction, however this is at the expense of I/O.
-For additional information, see this thread on link:http://search-hadoop.com/m/WUnLM6ojHm1/Long+client+pauses+with+compression&subj=Long+client+pauses+with+compression[Long client pauses with compression].
 [[trouble.client.security.rpc.krb]]
 === Secure Client Connect ([Caused by GSSException: No valid credentials provided...])
@@ -693,7 +684,7 @@ The utility <<trouble.tools.builtin.zkcli>> may help investigate ZooKeeper issue
 [[trouble.client.oome.directmemory.leak]]
 === Client running out of memory though heap size seems to be stable (but the off-heap/direct heap keeps growing)
-You are likely running into the issue that is described and worked through in the mail thread link:http://search-hadoop.com/m/ubhrX8KvcH/Suspected+memory+leak&subj=Re+Suspected+memory+leak[HBase, mail # user - Suspected memory leak] and continued over in link:http://search-hadoop.com/m/p2Agc1Zy7Va/MaxDirectMemorySize+Was%253A+Suspected+memory+leak&subj=Re+FeedbackRe+Suspected+memory+leak[HBase, mail # dev - FeedbackRe: Suspected memory leak].
+You are likely running into the issue that is described and worked through in the mail thread link:https://lists.apache.org/thread.html/d12bbe56be95cf68478d1528263042730670ff39159a01eaf06d8bc8%401322622090%40%3Cuser.hbase.apache.org%3E[HBase, mail # user - Suspected memory leak] and continued over in link:https://lists.apache.org/thread.html/621dde35479215f0b07b23af93b8fac52ff4729949b5c9af18e3a85b%401322971078%40%3Cuser.hbase.apache.org%3E[HBase, mail # dev - FeedbackRe: Suspected memory leak].
 A workaround is passing your client-side JVM a reasonable value for `-XX:MaxDirectMemorySize`.
 By default, the `MaxDirectMemorySize` is equal to your `-Xmx` max heapsize setting (if `-Xmx` is set). Try setting it to something smaller (for example, one user had success setting it to `1g` when they had a client-side heap of `12g`). If you set it too small, it will bring on `FullGCs` so keep it a bit hefty.
 You want to make this setting client-side only especially if you are running the new experimental server-side off-heap cache since this feature depends on being able to use big direct buffers (You may have to keep separate client-side and server-side config dirs).
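
If you need to confirm that the direct (off-heap) pool is what keeps growing, a small probe of the standard JMX buffer-pool beans can help; this is a generic JVM sketch rather than an HBase API, and the "direct" pool it reports is what `-XX:MaxDirectMemorySize` caps.

[source,java]
----
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe {
  public static void main(String[] args) {
    // Print usage of the JVM's buffer pools; run this inside (or alongside) the client
    // process and watch whether the "direct" pool grows without bound.
    for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      System.out.printf("%s pool: used=%d bytes, capacity=%d bytes, buffers=%d%n",
          pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity(), pool.getCount());
    }
  }
}
----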
@@ -1333,12 +1324,11 @@ Use the internal EC2 host names when configuring the ZooKeeper quorum peer list.
 === Instability on Amazon EC2
 Questions on HBase and Amazon EC2 come up frequently on the HBase dist-list.
-Search for old threads using link:http://search-hadoop.com/[Search Hadoop]
 [[trouble.ec2.connection]]
 === Remote Java Connection into EC2 Cluster Not Working
-See Andrew's answer here, up on the user list: link:http://search-hadoop.com/m/sPdqNFAwyg2[Remote Java client connection into EC2 instance].
+See Andrew's answer here, up on the user list: link:https://lists.apache.org/thread.html/666bfa863bc2eb2ec7bbe5ecfbee345e0cbf1d58aaa6c1636dfcb527%401269010842%40%3Cuser.hbase.apache.org%3E[Remote Java client connection into EC2 instance].
 [[trouble.versions]]
 == HBase and Hadoop version issues